
Snt25 425 #65

Open
claude-marie wants to merge 12 commits into main from SNT25-425

Conversation

@claude-marie
Contributor

The reporting rate dataset/dataelement rework

arrow::write_parquet(reporting_rate_dataelement, file_path)
log_msg(glue::glue("Exported : {file_path}"))

file_path <- file.path(output_data_path, paste0(COUNTRY_CODE, "_reporting_rate_dataelement.csv"))
Collaborator


Let's try to make generic code. If we want to have files saved in a function, then let's:
- Give the full path where the file should be saved, instead of a hardcoded file.path(DATA_PATH, "reporting_rate")
- Pass the name of the file to be saved, like function(data, output_dir, base_name), where we pass something like paste0(COUNTRY_CODE, "_reporting_rate_dataelement")

However, this operation is quite straightforward, so I would even just leave it in the notebook:

write.csv(reporting_rate_dataelement, file.path(setup$DATA_PATH, "reporting_something", paste0(COUNTRY_CODE, "_reporting_rate_dataelement.csv")), row.names = FALSE)
write_parquet(reporting_rate_dataelement, file.path(setup$DATA_PATH, "reporting_something", paste0(COUNTRY_CODE, "_reporting_rate_dataelement.parquet")))
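A minimal sketch of the generic alternative suggested above. The function name save_csv_and_parquet is hypothetical (it is not an existing SNT utility); the Parquet step is guarded so the sketch also runs where the arrow package is absent:

```r
# Hypothetical generic saver: the caller passes the output directory and the
# base file name, so nothing is hardcoded inside the function.
save_csv_and_parquet <- function(data, output_dir, base_name) {
  dir.create(output_dir, recursive = TRUE, showWarnings = FALSE)
  csv_path <- file.path(output_dir, paste0(base_name, ".csv"))
  write.csv(data, csv_path, row.names = FALSE)
  # Parquet output only if {arrow} is available in the environment
  if (requireNamespace("arrow", quietly = TRUE)) {
    arrow::write_parquet(data, file.path(output_dir, paste0(base_name, ".parquet")))
  }
  invisible(csv_path)
}

# e.g. save_csv_and_parquet(reporting_rate_dataelement,
#                           file.path(setup$DATA_PATH, "reporting_rate"),
#                           paste0(COUNTRY_CODE, "_reporting_rate_dataelement"))
```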

Comment thread snt_dhis2_reporting_rate_dataelement/pipeline.py Outdated
},
"outputs": [],
"source": [
"# Load SNT metadata\n",
Collaborator


Perhaps we can have something similar to what we do when we load the SNT config?
Something like a load_snt_metadata()?
Or maybe it's a good idea to generalize to a function that just loads JSON files?

config_json <- load_snt_config(file.path(CONFIG_PATH, "SNT_config.json"))
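For instance, a minimal generic loader along those lines. The name load_json_file is hypothetical, and it assumes the jsonlite package as the parser:

```r
# Hypothetical generic JSON loader: SNT_config.json and the metadata file
# could both go through the same function instead of dedicated loaders.
load_json_file <- function(path, verbose = TRUE) {
  if (!file.exists(path)) stop("File not found: ", path)
  parsed <- jsonlite::fromJSON(path, simplifyVector = FALSE)
  if (verbose) message("Loaded: ", path)
  parsed
}

# config_json  <- load_json_file(file.path(CONFIG_PATH, "SNT_config.json"))
# snt_metadata <- load_json_file(file.path(CONFIG_PATH, "SNT_metadata.json"))
```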

"CODE_PATH <- file.path(SNT_ROOT_PATH, 'code') # this is where we store snt_utils.r\n",
"CONFIG_PATH <- file.path(SNT_ROOT_PATH, 'configuration') # .json config file\n",
"DATA_PATH <- file.path(SNT_ROOT_PATH, 'data', 'dhis2') \n",
"SNT_ROOT_PATH <- \"~/workspace\"\n",
Collaborator


I think these paths are set in the get_setup_variables() function.
You can try to replicate what you did in snt_dhis2_reporting_rate_dataelement.ipynb.
But not urgent, for the future.

},
"outputs": [],
"source": [
"# Important: this will break if reporting rate was calculated as DataSet method because it will not find the file\n",
Collaborator


Can we re-use the code?

load_dataset_file <- function (dataset_id, filename, verbose=TRUE) {

},
"outputs": [],
"source": [
"shapes <- tryCatch({ get_latest_dataset_file_in_memory(DHIS2_FORMATTED_DATASET_NAME, paste0(COUNTRY_CODE, \"_shapes.geojson\")) }, \n",
Collaborator


We can re-use the functions here.

"Sys.setenv(PROJ_LIB = \"/opt/conda/share/proj\")\n",
"Sys.setenv(GDAL_DATA = \"/opt/conda/share/gdal\")\n",
"Sys.setenv(RETICULATE_PYTHON = \"/opt/conda/bin/python\")\n",
"CODE_PATH <- file.path(SNT_ROOT_PATH, \"code\")\n",
Collaborator


(related to previous comment) these paths are available in the snt_environment variable, better to use that.



#' Write CSV + Parquet under `<DATA_PATH>/dhis2/reporting_rate/`.
write_reporting_rate_dataelement_outputs <- function(reporting_rate_tbl, snt_environment, country_code) {
Collaborator


Related to previous comments: I think this type of function, just to save some files, is a bit of an overkill in complexity. If needed, let's try to find a generic one-size-fits-all solution. If that's not possible, let's just keep these things in the notebook (at least for now).

},
"source": [
"cx <- parse_reporting_rate_dataset_snt_settings(config_json)\n",
"list2env(cx, envir = .GlobalEnv)\n"
Collaborator


Where is this used?
I think this is the same as in the dataelements RR pipeline: parse_reporting_rate_dataset_snt_settings() seems unnecessary, since we already have config_json from which to collect the variables. Let's not hide that.

},
"source": [
"dhis2_reporting <- load_dataset_file(\n",
" config_json$SNT_DATASET_IDENTIFIERS$DHIS2_DATASET_FORMATTED,\n",
Collaborator


Better to save these parameters in variables:
formatting_dataset_id <- config_json$SNT_DATASET_IDENTIFIERS$DHIS2_DATASET_FORMATTED

}
},
"source": [
"# NER-specific normalization quality check\n",
Collaborator


Nothing is done in this step? Perhaps the code was moved to the country-specific notebook? If so, we should get rid of this step in the generic notebook.



#' Write CSV + Parquet under `<DATA_PATH>/dhis2/reporting_rate/`.
write_reporting_rate_dataset_outputs <- function(reporting_rate_tbl, snt_environment, country_code) {
Collaborator


If it's not used in the notebook, you can remove it.

@EstebanMontandon
Collaborator

Just a note to reflect here, so I don't forget:

My main concern with how functions are being used (not only in this PR, but across SNT) is that it sometimes feels like the right pieces of logic aren't always properly grouped or encapsulated. It's not about just copying notebook code into functions, since that can actually make things harder to read. But at the same time, functions shouldn't be created just for the sake of having functions.
Ideally, they should follow the notebook workflow and help structure it. The additional difficulty is that notebooks can sometimes get a bit "spaghetti-like", but that's when it's useful to step back and identify the key logical parts that can be turned into meaningful functions that make sense in the bigger picture. So, hopefully, functions should clarify and structure the workflow, not only replicate or fragment it. Not easy stuff anyway.

#' YYYYMM sequence covering the routine period range (inclusive by month).
monthly_period_vector_from_routine <- function(dhis2_routine) {
# Legacy alias.
validate_indicator_columns_in_routine <- check_required_indicators_present_in_routine
Collaborator


Please do not do this: let's keep a clean-code policy and avoid legacy aliases (unless strictly necessary).


#' Check that routine columns exist for the chosen activity / volume indicators.
#' @export
check_required_indicators_present_in_routine <- function(
Collaborator


This is a good example of what we need to start "generalizing". This implementation is tightly coupled to the specifics of this pipeline.
The alternative could be a generic function that receives a dataframe and a list of columns, and generically checks that they exist.

Should we create functions for checks that are specific to a pipeline's workflow? Perhaps, but if we do, they should be designed as general-purpose tools rather than rigid checks. The goal isn't to hide what's happening in a black box; it's just to simplify by building simple, clear functions we can reuse.
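A minimal sketch of such a generic check, in base R. The name check_columns_exist is hypothetical; the point is that the pipeline-specific column lists stay visible in the pipeline and only the check itself is shared:

```r
# Generic column check: takes any data frame and any character vector of
# expected column names, and fails with a clear message listing what's missing.
check_columns_exist <- function(df, required_cols, df_name = "data") {
  missing_cols <- setdiff(required_cols, colnames(df))
  if (length(missing_cols) > 0) {
    stop(df_name, " is missing required columns: ",
         paste(missing_cols, collapse = ", "))
  }
  invisible(TRUE)
}

# e.g. check_columns_exist(dhis2_routine, c("CONF", "PRES", "SUSP"), "dhis2_routine")
```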


#' Save the final reporting-rate table as CSV + Parquet under `data/dhis2/reporting_rate/`.
#' @export
save_dataelement_reporting_rate_csv_and_parquet <- function(reporting_rate_tbl, snt_environment, country_code) {
Collaborator


We are doing this same saving action across many different pipelines, so having a function this specific doesn't really add value to the SNT library, as it's too tied to the pipeline (same for RR dataset); if we do the same for each pipeline, it will lead to code duplication.

As this saving step can be done in a couple of lines, we should either use a generic utility for everyone or just handle it directly in the pipeline.

parse_reporting_rate_dataset_snt_settings <- function(config_json) {
assert_papermill_reporting_rate_dataset_params()
# Legacy alias.
assert_papermill_reporting_rate_dataset_params <- stop_if_dataset_reporting_papermill_params_missing
Collaborator


remove

#' country / admins / product UID and the fixed routine column list from `config_json`.
#'
#' @export
build_dataset_method_reporting_settings_from_config <- function(config_json) {
Collaborator


This is making the parametrization look like a black box; you can move the code to the pipeline.
Also remove fixed_columns_for_dataset_reporting_rate_routine_slice; these names should be accessible in the pipeline.


build_facility_master_dataelement <- function(
# Legacy alias.
write_reporting_rate_dataelement_outputs <- save_dataelement_reporting_rate_csv_and_parquet
Collaborator


to remove

if (is.null(vol)) {
vol <- c("CONF", "PRES")

resolve_volume_indicator_column_names <- function(rc, volume_activity_indicators) {
Collaborator


delete

act <- rc$ACTIVITY_INDICATORS
if (is.null(act)) {
act <- c("CONF", "PRES", "SUSP")
resolve_activity_indicator_column_names <- function(rc, activity_indicators) {
Collaborator


delete

}


resolve_weighted_reporting_rate_toggle <- function(rc) {
Collaborator


delete

#' already called it).
#'
#' @export
build_dataelement_reporting_settings_from_config <- function(
Collaborator


Please simplify this, it's too convoluted; let's just assign the variables directly in the pipeline.
You can move this to pipeline section 1.1 (Load and check config_json file):

ADMIN_1 <- toupper(config_json$SNT_CONFIG$DHIS2_ADMINISTRATION_1)
ADMIN_2 <- toupper(config_json$SNT_CONFIG$DHIS2_ADMINISTRATION_2)
DHIS2_INDICATORS <- c("CONF", "PRES", "SUSP", "TEST")
DATAELEMENT_METHOD_DENOMINATOR <- # pipeline parameter i think?
USE_WEIGHTED_REPORTING_RATES <- # pipeline parameter i think?
ACTIVITY_INDICATORS <- c("CONF", "PRES", "SUSP")
VOLUME_ACTIVITY_INDICATORS <- c("CONF", "PRES")
fixed_cols <- c("PERIOD", "YEAR", "MONTH", "ADM1_ID", "ADM2_ID", "OU_ID")
fixed_cols_rr <- c("YEAR", "MONTH", "ADM2_ID", "REPORTING_RATE")


2 participants