Skip to content

Fix formatting#79

Open
claude-marie wants to merge 5 commits intomainfrom
fix_formatting
Open

Fix formatting#79
claude-marie wants to merge 5 commits intomainfrom
fix_formatting

Conversation

@claude-marie
Copy link
Copy Markdown
Contributor

It is not related to any ticket, this fix come from a bug I find helping Giulia in the workspace.

  1. After running the DRC formatting pipeline I find out a bug in one of the code file which needed a tiny fix
  2. The formatting report totally was crashing in DRC due to too many one to many merge with millions of line which lead to a RAM explosion. After many try to optimize it I decided to completrly rework it using the same plot but slighlty different data (aggregation). You can find the new report here and for comparaison an old one here

@EstebanMontandon
Copy link
Copy Markdown
Collaborator

Hey, let's continue to improve our workflow by reusing code from previous notebooks. Moving forward, let's keep an eye on how we can generalize these functions to work across different pipelines.

@claude-marie
Copy link
Copy Markdown
Contributor Author

I'll let it in stand by until next week, waiting for @sPuntinG to end up with the workshop

},
"outputs": [],
"source": [
"if (!exists(\"SNT_ROOT_PATH\") || length(SNT_ROOT_PATH) == 0L || !nzchar(as.character(SNT_ROOT_PATH)[1])) {\n",
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say that at least in the case of "notebooks", we can ignore validations of the root path.. let's just assume we are always working in OpenHexa where the root will never change.

so, to load the functions you can just do:
source("~workspace/pipelines/snt_dhis2_formatting/utils/snt_dhis2_formatting_report.r") # hardcoded path...

" SNT_ROOT_PATH <- \"~/workspace\"\n",
"}\n",
"source(file.path(SNT_ROOT_PATH, \"pipelines/snt_dhis2_formatting/utils/snt_dhis2_formatting_report.r\"))\n",
"formatting_report_paths_and_outputs()\n",
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would go for a generic "bootstrap" option. I like the option of calling a single function that handles all the necessary setup for me, so as a user I don't have to worry about it anymore, makes sense?

If I'm building a report in the future, and I forget to call any of these functions, I might have to start debugging what's going on...
just call 1 "bootstrap" function for example: snt_setup_report() Could be a wrapper function that calls get_setup_variables() and additionally includes specific report configurations, instead of :
formatting_report_paths_and_outputs()
formatting_report_apply_streamlined_defaults()
formatting_report_source_core_and_helpers()
formatting_report_openhexa_sdk()
...etc

Now that I think about it, the naming snt_setup() makes more sense than get_setup_variables()...



# Simplifie `shapes_data` en place (utilise `REPORT_SHAPE_SIMPLIFY_TOL`).
formatting_report_simplify_shapes_inplace <- function() {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please never search for global R objects in functions that are applying a transformation. This could lead to very confusing results.

The way we use functions (at least in a notebook) should always be:
transformed_object = may_transformation(object)

This allow us to know what and when a variable is being transform.


# Routine parquet + filtre années + printdim.
formatting_report_load_routine_data <- function() {
if (!exists("openhexa", inherits = TRUE)) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remember: all these loading functions can be replaced by the existing function . You can copy paste this function in the *_reporting.r (let's keep things separated for now)

Nice addition: This "openhexa" check could be included in the function I shared with you, but first I would just change the message to something generic.
"OpenHEXA SDK is not loaded or available ...or something"

Note: the "bootstrap" function should load the sdk and raise a warning if not available.

stop("Définir SNT_ROOT_PATH (cellule parameters Papermill) avant formatting_report_paths_and_outputs().")
}
root <- path.expand(as.character(SNT_ROOT_PATH)[1])
.report_nb_assign("SNT_ROOT_PATH", root)
Copy link
Copy Markdown
Collaborator

@EstebanMontandon EstebanMontandon Apr 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's a bit of duplication here. We store the paths in variables and then we return the same paths in the result, I would avoid the former.
From now on, let's try to enforce notebook developments using only the paths provided by the "setup variable" returned by this "bootstrap" function.
(Using the list you are returning at the end of this function)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants