update quality of care pipeline #73

Open
claude-marie wants to merge 4 commits into main from SNT25-434
@claude-marie
Contributor

my version of the reworked pipeline

bootstrap_quality_of_care_context <- function(
root_path = "~/workspace",
required_packages = c("jsonlite", "data.table", "arrow", "sf", "ggplot2", "glue", "reticulate", "RColorBrewer", "dplyr", "writexl", "knitr", "scales", "gridExtra")
) {
Collaborator

I think we're doing the same sort of setup steps across all R pipelines. The logical thing to do would be to unify the bootstrap in a generic function. I'm doing this here, calling that function from all 5 formatting scripts, but your function seems more complete (we just need to make sure we provide all output paths in the result).

So the next step would be to include this bootstrap in the snt_utils.r ;)
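
A minimal sketch of what such a shared bootstrap could look like, assuming it only needs to build and return the paths. The function name `bootstrap_snt_context` and the `reporting/outputs` layout are hypothetical, not existing SNT utilities; package loading is left out on purpose.

```r
# Sketch of a generic bootstrap that could live in snt_utils.r.
# NOTE: `bootstrap_snt_context` and the "reporting/outputs" layout are
# assumptions made for illustration, not part of the current code base.
bootstrap_snt_context <- function(pipeline_name, root_path = "~/workspace") {
  pipeline_path <- file.path(root_path, "pipelines", pipeline_name)
  report_outputs_path <- file.path(pipeline_path, "reporting", "outputs")

  # Return every path the notebooks need, so none are redefined downstream.
  list(
    ROOT_PATH = root_path,
    CODE_PATH = file.path(root_path, "code"),
    PIPELINE_PATH = pipeline_path,
    REPORT_OUTPUTS_PATH = report_outputs_path,
    FIGURES_PATH = file.path(report_outputs_path, "figures")
  )
}

# Usage: one call per pipeline, only the pipeline name changes.
ctx <- bootstrap_snt_context("snt_dhis2_quality_of_care")
```

Returning a named list keeps the output paths visible to the notebook instead of scattering them across global variables.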

#' @param country_code Country code (e.g. COD).
#' @param data_action Action suffix (`imputed` or `removed`).
#' @return File name of the selected routine parquet.
select_latest_qoc_routine_file <- function(dataset_last_version, country_code, data_action) {
Collaborator

I think this can also be made generic and simpler, like this

Collaborator

Why do we need this function? We're supposed to always read from the latest dataset version, right?

"\n",
"ROOT_PATH <- \"~/workspace\"\n",
"CODE_PATH <- file.path(ROOT_PATH, \"code\")\n",
"PIPELINE_PATH <- file.path(ROOT_PATH, \"pipelines\", \"snt_dhis2_quality_of_care\")\n",
Collaborator

If these paths are set in the bootstrap, we don't need them here.

#' @return Named list with routine data, shapes data, and selected filename.
load_quality_of_care_inputs <- function(setup_ctx, data_action) {
data_action <- validate_quality_of_care_action(data_action)
log_msg(glue::glue("Searching latest routine file for data_action: {data_action}"))
Collaborator

@EstebanMontandon, Apr 8, 2026

- Add the glue library directly to the list of required packages at bootstrap. It is still good practice to reference the library explicitly inside functions (as in `glue::glue()`), so you can keep that.

- I'm not sure I would put the data loading steps inside a function: this is not really generalizable, and these steps are better handled at the R notebook level.

In general, functions should be as generic as possible (within the SNT context, of course). The goal is not to hide pipeline steps or make the computation obscure from the user’s perspective, but rather to make each step clearer and more readable, building on a set of functions defined in the SNT utils "toolbox".

#'
#' @param routine Routine dataframe loaded from outliers dataset.
#' @return Data table with district-year indicators.
compute_quality_of_care_indicators <- function(routine) {
Collaborator

Following the idea of "transparency", I would suggest decomposing this function into smaller functions. Each one could handle a specific step and make it clearer what is being computed. Imagine you are the user trying to understand what is going on.
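
To illustrate the kind of decomposition meant here, a rough sketch in base R with one small function per step. The column names (`ADM2_ID`, `YEAR`, `SUSP`, `TEST`) and the testing-rate indicator are hypothetical placeholders, not the actual indicators computed in this PR.

```r
# Sketch only: column names and the indicator below are assumptions.

# Step 1: aggregate raw counts to district-year level.
aggregate_district_year <- function(routine, count_cols) {
  aggregate(
    routine[count_cols],
    by = routine[c("ADM2_ID", "YEAR")],
    FUN = sum,
    na.rm = TRUE
  )
}

# Step 2: ratio that returns NA instead of Inf/NaN when the denominator is 0.
safe_ratio <- function(numerator, denominator) {
  ifelse(denominator > 0, numerator / denominator, NA_real_)
}

# Step 3: one function per indicator, so each formula is readable on its own.
add_testing_rate <- function(dt) {
  dt$TESTING_RATE <- safe_ratio(dt$TEST, dt$SUSP)
  dt
}

# Usage in the notebook: each step is a visible line.
routine <- data.frame(
  ADM2_ID = c("A", "A", "B"),
  YEAR = c(2024, 2024, 2024),
  SUSP = c(10, 10, 0),
  TEST = c(5, 10, 0)
)
indicators <- add_testing_rate(aggregate_district_year(routine, c("SUSP", "TEST")))
```

With this shape the notebook shows the sequence of steps, and each formula can be checked in isolation.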

Comment thread snt_dhis2_quality_of_care/pipeline.py Outdated
Collaborator

I think this warning, as well as the check for existing_files, might not be necessary, since the function add_files_to_dataset() logs any files not found; please see here

)
assign("openhexa", openhexa, envir = .GlobalEnv)

config_json <- load_snt_config(config_path, "SNT_config.json")
Collaborator

I would do this in the pipeline notebook directly

figures_path <- file.path(report_outputs_path, "figures")

required_packages <- packages
install_and_load(required_packages)
Collaborator

install_and_load(packages) ??

#' @param filename Name of file to load from latest dataset version.
#' @param verbose Whether to log informative messages.
#' @return Loaded object (data.frame/data.table/sf) depending on file format.
load_dataset_file_qoc <- function(dataset_id, filename, verbose = TRUE) {
Collaborator

Why not a generic function that reads a parquet file?
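
Something along these lines could work as a starting point, assuming the file has already been fetched to a local path. The name `read_parquet_file` is hypothetical; `arrow::read_parquet()` is the only external call, and retrieving the file from the latest dataset version is deliberately left out of the sketch.

```r
# Sketch of a generic parquet reader for snt_utils.r.
# NOTE: `read_parquet_file` is a hypothetical name; the function assumes
# `path` points to a file that is already available locally.
read_parquet_file <- function(path, ...) {
  # Fail early with a readable message instead of a low-level reader error.
  if (!file.exists(path)) {
    stop("Parquet file not found: ", path, call. = FALSE)
  }
  if (!requireNamespace("arrow", quietly = TRUE)) {
    stop("Package 'arrow' is required to read parquet files.", call. = FALSE)
  }
  # Extra arguments (e.g. col_select) are passed through to arrow.
  arrow::read_parquet(path, ...)
}

# Usage (local_path is a placeholder):
# routine <- read_parquet_file(local_path)
```

Keeping the reader independent of the pipeline name means the same helper can serve every pipeline, not only quality of care.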

#' @param country_code Country code (e.g. COD).
#' @param data_action Action suffix (`imputed` or `removed`).
#' @return File name of the selected routine parquet.
select_latest_qoc_routine_file <- function(dataset_last_version, country_code, data_action) {
Collaborator

Why do we need this function? We're supposed to always read from the latest dataset version, right?

"\n",
"source(file.path(CODE_PATH, \"snt_utils.r\"))\n",
"source(file.path(PIPELINE_PATH, \"utils\", \"snt_dhis2_quality_of_care.r\"))\n",
"source(file.path(ROOT_PATH, \"code\", \"snt_utils.r\"))\n",
Collaborator

This could be loaded by default at the beginning of the script snt_dhis2_quality_of_care.r:
source(file.path(ROOT_PATH, "code", "snt_utils.r"))

2 participants