---
title: "Data"
---

```{r}
#| include: false

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  echo = TRUE,
  eval = FALSE
)
```

```{r}
#| label: setup
#| echo: false
#| message: false
#| eval: true

library(Pmetrics)
library(glue)
library(knitr)
library(dplyr)
library(tidyr)

r_help <- function(pkg, name) {
    glue::glue("[`{name}`](https://rdrr.io/pkg/{pkg}/man/{name}.html)")
}

gh_help <- function(name) {

    glue::glue("[`{name}`](https://lapkb.github.io/Pmetrics/reference/{name}.html)")

}

pmetrics <- function(){
    knitr::asis_output("[Pmetrics]{style=\"color: #841010; font-family: 'Arial', Arial, sans-serif; font-weight: 900;\"}")
}
exData <- dataEx

```

## Introduction

:::{.callout-tip}
Make sure you have run the [tutorial setup code](index.qmd#tutorial-setup) in your R session before copying, pasting and running example code here.
:::


**Pmetrics always needs data and a model to run.** Pmetrics data objects are typically read into memory from files. Although the file format is usually comma-separated (.csv), it is possible to use other separators, like the semicolon, by setting the appropriate argument with `r gh_help("setPMoptions")`.

```{r}

# look at and change global Pmetrics options
setPMoptions()

```

Any text editor (e.g. TextEdit on Mac, Notepad on Windows) or spreadsheet program (e.g. Excel) can save .csv files.

It is possible to create a data object in R directly, without reading a file. This is useful for simulation purposes, where you may want to create a small dataset on the fly. We'll cover this below.

## R6 objects

Most Pmetrics objects, including data, follow the [R6](https://r6.r-lib.org/reference/R6Class.html) framework. A `PM_data` object represents a dataset that is going to be modeled or simulated, and its behaviour is defined by the class `r gh_help("PM_data")`. This class allows datasets to be checked, plotted, written to disk and more. Use `PM_data$new("filename")` to create a `PM_data` object by reading the file.


## First data object
```{r}
# if not using the Rscript/Learn.R template created by PM_tutorial(),
# modify the path as needed
dat <- PM_data$new("src/ex.csv")
```

You can also build an appropriate data frame in R and provide that as an argument to `PM_data$new()`.


```{r}
# ensure data frame has at least these columns:
# id, time, dose, out
df <- data.frame(id = c(1,1,1,2,2),
                 time = c(0,1,2,0,1),
                 dose = c(100,NA,NA,200,NA),
                 out = c(NA,5.2,3.1,NA,7.4)
)
dat_df <- PM_data$new(df)
```

Lastly, you can take advantage of the `addEvent` method in `PM_data` objects to build a data object on the fly. This can be particularly useful for making quick simulation templates. Start with an empty call to `PM_data$new()` and add successive rows. See `r gh_help("PM_data")` for details under the `addEvent` method.

```{r}
#| eval: false
# build a PM_data object row by row
dat_add <- PM_data$new()$
    addEvent(id = 1, time = 0, dose = 100, addl = 5, ii = 24)$ # add 6 doses of 100 every 24 hours
    addEvent(id = 1, time = 144, out = -1)$ # add an observation of -1 at time 144
    addEvent(id = 1, wt = 75, validate = TRUE) # add wt of 75 to all rows for id = 1 and validate
```

**Notes:**

1.  Omitting `time` in the last `addEvent` adds *wt = 75* to all rows for *id = 1*.
2.  Use `validate = TRUE` as an argument in the last `addEvent` to finalize creation.
3.  You can chain events as shown above by placing `$` between them.
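To see what `addl` and `ii` mean in terms of explicit dose times, the base R sketch below expands a single dose row into one row per administered dose. The helper `expand_addl()` is hypothetical and for illustration only; Pmetrics handles `ADDL` internally, and this sketch only covers non-negative `ADDL` values.

```{r}
# Hypothetical helper (not part of Pmetrics): expand one dose row with
# ADDL additional doses given every II hours into explicit dose rows.
expand_addl <- function(id, time, dose, addl = 0, ii = 0) {
  stopifnot(addl >= 0)             # this sketch only handles non-negative ADDL
  n_doses <- addl + 1              # the original dose plus ADDL extra doses
  data.frame(
    id   = id,
    time = time + (seq_len(n_doses) - 1) * ii,
    dose = dose
  )
}

# Same regimen as the first addEvent() call above: 100 at time 0, addl = 5, ii = 24
doses <- expand_addl(id = 1, time = 0, dose = 100, addl = 5, ii = 24)
doses$time
#> [1]   0  24  48  72  96 120
```

The last dose falls at 120 h, so the observation added at time 144 in the example above is 24 h after the final dose.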

:::{.callout-note}
For those familiar with [tidyverse](https://tidyverse.org) or the native R pipe for joining functions ("%>%" or "|>", respectively), chaining in R6 is similar but restricted to methods defined for the object. In this case we chain the `addEvent` methods. We could even chain an additional `PM_data` method like `$plot()` at the end of the above code. However, that would make `dat_add` a plotly plot object, not a `PM_data` one.
:::
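Chaining of this kind works when each method returns the object itself. The sketch below shows the general R6 idiom with a hypothetical `ToyData` class built directly with the R6 package; it only illustrates the pattern and does not reproduce the internals of `PM_data`.

```{r}
library(R6)

# Hypothetical class illustrating R6 method chaining (not part of Pmetrics)
ToyData <- R6Class("ToyData",
  public = list(
    rows = NULL,
    # Append one event; returning invisible(self) is what allows chaining
    add_event = function(id, time, dose = NA, out = NA) {
      self$rows <- rbind(self$rows,
                         data.frame(id = id, time = time, dose = dose, out = out))
      invisible(self)
    }
  )
)

toy <- ToyData$new()$
  add_event(id = 1, time = 0, dose = 100)$
  add_event(id = 1, time = 144, out = -1)

nrow(toy$rows)
#> [1] 2
```

Because the last method in the chain returns the object, `toy` is still a `ToyData` object. A final method that returns something else, such as a plot, makes the whole expression evaluate to that value instead.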

Below are the data standardization and validation reports that are generated when the `dat_add` object above is created, followed by the contents of `dat_add$data` and `dat_add$standard_data`. The former is your original data, and the latter is what it looks like after standardization to the full Pmetrics format.


```{r}
#| echo: false
#| eval: true
#| label: display_dat_add
dat_add <- PM_data$new()$
    addEvent(id = 1, time = 0, dose = 100, addl = 5, ii = 24)$
    addEvent(id = 1, time = 144, out = -1)$
    addEvent(id = 1, wt = 75, validate = TRUE)
```


**Original data:**

```{r}
#| echo: false
#| eval: true
knitr::kable(dat_add$data)
```


**Standardized data:**


```{r}
#| echo: false
#| eval: true
knitr::kable(dat_add$standard_data)
```


Once you have created the `PM_data` object, you never need to create it again during your R session. You also don't have to copy the data file to the Runs folder each time you run the model, as you did with older **("Legacy")** versions of Pmetrics. The data are stored in memory and can be used in any Pmetrics function that needs them.

## Data format

R6 Pmetrics can use file or data frame input. The format is very flexible. A truncated example is shown below, with `NA` values replaced by "." as they would appear in a file.

```{r}
#| echo: false
#| results: 'asis'
#| eval: true

tab <- Pmetrics::dataEx$data %>%
    filter(id %in% 1:2) %>%
    select(id, time, dose, out, wt) %>%
    mutate(across(everything(), ~replace_na(as.character(.), "."))) %>%
    mutate(across(everything(), as.character))
knitr::kable(tab)
```

The only required columns are those below. Unlike Legacy Pmetrics, there is no requirement for a header or for prefixing the ID column with "\#". However, any subsequent row (after the column names) that begins with "\#" will be ignored, which is helpful if you want to exclude data from the analysis while preserving the integrity of the original dataset, or to add comment lines. The columns can be in any order, but their names must match those below. Ultimately, `PM_data$new()` converts all valid data into the standardized format discussed below.

-   [***ID***]{#data-id} This field can be numeric or character and identifies each individual. All rows must contain an ID, and all records from one individual must be contiguous. IDs may be any alphanumeric combination. The number of subjects is unlimited.

-   [***TIME***]{#data-time} This is the elapsed time in decimal hours since the first event, which is always `TIME = 0`, unless you specify `TIME` as clock time. In that case, you must include a `DATE` column, described below. For clock time, the default format is HH:MM. Other formats can be specified. See `r gh_help("PM_data")` for more details. Every row must have an entry, and within a given ID, rows must be sorted chronologically, earliest to latest.

     -   [***DATE***]{#data-date} This column is only required if `TIME` is clock time, detected by the presence of ":". The default format of the date column is YYYY-MM-DD. As for `TIME`, other formats can be specified. See `r gh_help("PM_data")` for more details.

-   [***DOSE***]{#data-dose} This is the dose amount. It should be "." for observation rows. All subjects must have a dose event at time 0, which is the first row for that subject. The dose amount can be any numeric value, including 0. If the dose is an infusion, the `DUR` column must also be included. In other software packages, `AMT` is equivalent to `DOSE`.

-   [***OUT***]{#data-out} This is the observation, or output value, and it is always required. If `EVID = 0`, there must be an entry. For such events, if the observation is missing, e.g. a sample was lost or not obtained, this must be coded as -99. It will be ignored for any other `EVID` and therefore should be ".". Other software packages may call this column `DV`. When `OUT = -99`, this is equivalent to `MDV = 1`, or missing dependent variable in other packages, but Pmetrics does not use `MDV`.

**Not required:**

-   ***COVARIATES...*** Covariates are optional and discussed below. Here, **wt** was included as an example of a covariate.
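Several of the rules above can be checked in a data frame before it is passed to Pmetrics. The base R sketch below uses a hypothetical helper, `check_basic_rules()`, that tests a data frame with lowercase `id`, `time`, `dose` and `out` columns for contiguous IDs, chronological times within each ID, and a dose at time 0 in each subject's first row. It is not `PMcheck()`, covers only a small subset of what Pmetrics validates, and assumes decimal times with no `EVID = 4` resets.

```{r}
# Hypothetical helper (not part of Pmetrics). Assumes decimal times and no
# EVID = 4 reset events, which restart time at 0.
check_basic_rules <- function(df) {
  problems <- character(0)
  # All records for one ID must be contiguous: each ID forms a single run
  runs <- rle(as.character(df$id))
  if (anyDuplicated(runs$values) > 0) {
    problems <- c(problems, "records for at least one id are not contiguous")
  }
  for (subj in unique(df$id)) {
    rows <- df[df$id == subj, ]
    # Times must be sorted chronologically within each ID (ties allowed)
    if (is.unsorted(rows$time)) {
      problems <- c(problems, paste("times not sorted for id", subj))
    }
    # The first row must be a dose at time 0
    if (rows$time[1] != 0 || is.na(rows$dose[1])) {
      problems <- c(problems, paste("first row is not a dose at time 0 for id", subj))
    }
  }
  problems
}

df_check <- data.frame(id   = c(1, 1, 1, 2, 2),
                       time = c(0, 1, 2, 0, 1),
                       dose = c(100, NA, NA, 200, NA),
                       out  = c(NA, 5.2, -99, NA, 7.4))  # -99 = missing sample
check_basic_rules(df_check)
#> character(0)
```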

When `PM_data` reads a file, it standardizes the data to the format below, which requires some inferences. For example, in the absence of `DUR`, all doses are interpreted as oral; if they are infusions, `DUR` must be included to indicate the duration of the infusion. `EVID` only needs to be included if a reset event (`EVID = 4`, described below) is required. Similarly, `INPUT` and `OUTEQ` are only required if multiple inputs or outputs are being modeled. `ADDL` and `II` are optional.

Finally, the standardized data are checked for errors. If any are found, Pmetrics generates a report listing them and attempts to fix those that it can.

### Standardized Data

Data are standardized when `PM_data$new()` is invoked, and the data frame is placed in the `PM_data` object's `$standard_data` field. When the `$save()` method is called on a `PM_data` object, the data are saved in this standardized format. The first several rows of example standardized data are below, with details following.

```{r}
#| echo: false
#| results: 'asis'
#| eval: true

tab <- Pmetrics::dataEx$standard_data %>%
    filter(id %in% 1:2) %>%
    select(-africa, -gender, -age, -height) %>%
    #mutate(across(everything(), ~replace_na(as.character(.), "."))) %>%
    mutate(across(everything(), as.character))
knitr::kable(tab)
```

-   ***ID*** See [above](#data-id).

-   ***EVID*** This is the event ID field. It can be 0, 1, or 4. It is only required if `EVID = 4` is included in the data, in which case every row must have an entry. If there are no `EVID = 4` events, the entire `EVID` column can be omitted from the data.

    -   0 = observation

    -   1 = input (e.g. dose)

    -   2, 3 are currently unused

    -   4 = reset, where all compartment values are set to 0 and the time counter is reset to 0. This is useful when an individual has multiple sampling episodes that are widely spaced in time with no new information gathered. This is a dose event, so dose information needs to be complete. The `TIME` value for `EVID = 4` should be 0, and subsequent rows should increase monotonically from 0 until the last record or until another `EVID = 4` event, which will restart time at 0.

-   ***TIME*** See [above](#data-time).

-   ***DATE*** See [above](#data-date).

-   ***DUR*** This is the duration of an infusion in hours. If `EVID = 0` (observation event), `DUR` is ignored and should have a "." placeholder. For a bolus (e.g. an oral dose), set the value equal to 0. As mentioned above, if all doses are oral, `DUR` can be omitted from the data altogether. Some other packages use `RATE` instead of `DUR`; rate can be converted to duration with `DUR = DOSE / RATE`.

-   ***DOSE*** See [above](#data-dose).

-   ***ADDL*** This specifies the number of additional doses to give at interval `II`. `ADDL` can be positive or negative. If positive, it is the number of doses to give after the dose at time 0. If negative, it is the number of doses to give before the dose at time 0. It may be missing (".") for dose events (`EVID = 1` or `EVID = 4`), in which case it is assumed to be 0. It is ignored for observation (`EVID = 0`) events. Be sure to adjust the time entry for the subsequent row, if necessary, to account for the extra doses. All compartments in the model will contain the predicted amounts of drug at the end of the `II` interval after the last `ADDL` dose.

-   ***II*** This is the interdose interval and is only relevant if `ADDL` is not equal to 0, in which case `II` cannot be missing. If `ADDL = 0` or is missing, `II` is ignored.

-   ***INPUT*** This defines which input (i.e. drug) the `DOSE` corresponds to. The model defines which compartments receive the input(s). If only modeling one drug, `INPUT` is unnecessary, as all values will be assumed to be 1. Other packages may use `CMT` (compartment) for both inputs and outputs. In Pmetrics these must be separated: for inputs, designate the corresponding model input number with `INPUT` (e.g. R[x] or B[x] for infusions and boluses in the model object), not the compartment.

-   ***OUT*** See [above](#data-out).

-   ***OUTEQ*** This is the output equation number that corresponds to the `OUT` value. Output equations are defined in the model file. If only modeling one output, this column is unnecessary, as all values are assumed to be 1. As discussed under `INPUT`, other packages may use `CMT` (compartment) for both inputs and outputs. In Pmetrics these must be separated: for outputs, designate the corresponding model output equation number with `OUTEQ`, not the compartment.

-   ***CENS*** This is a new column as of Pmetrics 3.0.0. It indicates whether the observation is censored, i.e. below a lower limit of quantification or above an upper limit of quantification. It can take on four values:

    - Missing for dose events which are not observations. Use a "." as a placeholder in your data file.

    - 0 or "none" = not censored

    - 1 or "bloq" = left censored (below lower limit of quantification)

    - -1 or "aloq" = right censored (above upper limit of quantification)

    If there are no censored observations, the entire `CENS` column can be omitted from the data. In data fitting, left censored observations are handled using the M3 method described by Beal [@bealWaysFitPK2001a]. Right censored observations are handled similarly, but using the complementary probability. The value in the `OUT` column is the censoring lower limit of quantification (LLOQ) for left censored observations. It is the upper limit of quantification (ULOQ) for right censored observations. For uncensored observations, `OUT` is the observed value as usual. For example, if `OUT = 5` and `CENS = 1` or `CENS = "bloq"`, this indicates that the observation is below the LLOQ of 5. If `OUT = 10` and `CENS = -1` or `CENS = "aloq"`, this indicates that the observation is above the ULOQ of 10.

-   ***C0, C1, C2, C3*** These are the coefficients for the assay error polynomial for that observation. Each subject may have up to one set of coefficients per output equation. If more than one set is detected for a given subject and output equation, the last set will be used. If there are no available coefficients, these cells may be omitted. If they are included, for events which are not observations, they can be filled with "." as a placeholder. In data fitting, if the coefficients are present in the data file, Pmetrics will use them. If missing, Pmetrics will look for coefficients defined in the model.

-   [***COVARIATES...***]{#data-cov} Any column named other than above is assumed to be a covariate, one column per covariate. The first row for any subject must have a value for all covariates, since the first row is always a dose. **Covariates are handled differently than in Legacy Pmetrics.** In Legacy, they were only considered at the times of dose events (`EVID = 1` or `EVID = 4`). In *Pmetrics 3.0* and later, they are considered at all times, including observation events (`EVID = 0`). Therefore, to enter a new covariate value at a time other than a dose or an observation, create a row at the appropriate time (and possibly date if using clock/calendar), making the row either a dose row with `DOSE = 0` or an observation row with `OUT = -99` (missing). By default, covariate values are linearly interpolated between entries. This is useful for covariates like weight, which may vary from measurement to measurement. You can change this behavior in the model definition to make them piece-wise constant, i.e. carried forward from the previous value until a new value causes an instant change. This could be used, for example, to indicate periods of off and on dialysis. See the chapter on [Models](models.qmd) for more details.
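The `CENS` and `C0` to `C3` columns above can be made concrete with a small likelihood calculation. The base R sketch below assumes normally distributed residual error whose standard deviation comes from the assay error polynomial, taken here as SD = C0 + C1·y + C2·y² + C3·y³, and follows the M3 idea cited above for left censoring and its complement for right censoring. The helper names are hypothetical, only the numeric `CENS` codes are handled, and the exact error model used in Pmetrics fitting (including any error terms defined in the model) is not reproduced here.

```{r}
# Hypothetical helpers (not Pmetrics internals), assuming normal residual error.

# Assay error polynomial: SD = C0 + C1*y + C2*y^2 + C3*y^3
assay_sd <- function(y, c0, c1, c2, c3) {
  c0 + c1 * y + c2 * y^2 + c3 * y^3
}

# Likelihood contribution of one observation, following the CENS coding:
#   cens =  0: OUT is an observed value            -> normal density
#   cens =  1: OUT is the LLOQ (value below LLOQ)  -> P(Y < LLOQ), M3 method
#   cens = -1: OUT is the ULOQ (value above ULOQ)  -> P(Y > ULOQ), complement
obs_likelihood <- function(out, cens, pred, sd) {
  if (cens == 0) {
    dnorm(out, mean = pred, sd = sd)
  } else if (cens == 1) {
    pnorm(out, mean = pred, sd = sd)
  } else if (cens == -1) {
    pnorm(out, mean = pred, sd = sd, lower.tail = FALSE)
  } else {
    stop("cens must be 0, 1 or -1")
  }
}

sd_at_5 <- assay_sd(5, c0 = 0.5, c1 = 0.1, c2 = 0, c3 = 0)  # 0.5 + 0.1 * 5 = 1
obs_likelihood(out = 5, cens = 1, pred = 5, sd = sd_at_5)
#> [1] 0.5
```

When the prediction equals the LLOQ, half of the predicted distribution lies below it, so a left-censored observation with `OUT = 5` contributes a probability of 0.5 rather than a density value.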

## Manipulation of CSV files

#### Read

As we have seen, `PM_data$new("path/filename")` will create a new `PM_data` object by reading an appropriate data file in the `path` directory, or in the current working directory if `path` is omitted. Change the column separator in the file from the default "," (.csv files) to ";" (.ssv files) using `setPMoptions()`.

#### Save

Calling `$save("path/filename")` on a `PM_data` object, e.g. `dat$save("path/filename")`, will save its `$standard_data` field to a file called "filename" in the `path` directory, or in the current working directory if `path` is omitted. This can be useful if you have loaded or created a data file and then changed it in R. Change the column separator in the file from the default "," (.csv files) to ";" (.ssv files) using `setPMoptions()`.

#### Standardize

`PM_data$new()` automatically standardizes the data into the full format. This includes conversion of calendar date / clock time into decimal elapsed time.
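The conversion from calendar date and clock time to elapsed decimal hours is simple arithmetic. The base R sketch below shows the idea for the default formats (YYYY-MM-DD and HH:MM), measuring time from each subject's first event. The helper `elapsed_hours()` is hypothetical; `PM_data$new()` performs this conversion for you.

```{r}
# Hypothetical helper (not part of Pmetrics): convert DATE (YYYY-MM-DD) and
# clock TIME (HH:MM) into decimal hours since each subject's first event.
elapsed_hours <- function(id, date, clock) {
  stamp <- as.POSIXct(paste(date, clock), format = "%Y-%m-%d %H:%M", tz = "UTC")
  secs  <- as.numeric(stamp)
  first <- ave(secs, id, FUN = min)   # time of the first event per subject
  (secs - first) / 3600               # seconds -> hours
}

elapsed_hours(id    = c(1, 1, 1),
              date  = c("2024-03-01", "2024-03-01", "2024-03-02"),
              clock = c("08:00", "09:30", "08:00"))
#> [1]  0.0  1.5 24.0
```

Using `tz = "UTC"` keeps daylight-saving transitions in a local time zone from adding or removing an hour between two clock times.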

#### Validate

`PM_data$new()` automatically calls `r gh_help("PMcheck")` so the data are validated as the data object is created.

#### Data conversion

-   `PMwrk2csv()` This function will convert old-style, single-drug USC\*PACK .wrk formatted files into Pmetrics data .csv files.

-   `NM2PM()` Although the structure of Pmetrics data files is similar to NONMEM, there are some differences. This function attempts to convert NONMEM data to Pmetrics format automatically. It has been tested on several examples, but some NONMEM files will probably cause it to fail.
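For simple cases, the column mapping involved in a NONMEM-to-Pmetrics conversion can be written out by hand, which may help when checking the output of `NM2PM()`. The base R sketch below assumes a hypothetical NONMEM-style data frame where `AMT > 0` marks dose rows, doses go into compartment 1 and samples come from compartment 2. The `CMT` lookup tables are assumptions for this example only; the correct mapping depends on how your Pmetrics model numbers its inputs and output equations.

```{r}
# Hypothetical NONMEM-style data: one infusion, two samples, one of them missing
nm <- data.frame(ID   = c(1, 1, 1),
                 TIME = c(0, 2, 6),
                 AMT  = c(500, NA, NA),
                 RATE = c(250, NA, NA),   # 500 units at 250 units/h -> 2 h infusion
                 DV   = c(NA, 12.1, NA),
                 MDV  = c(1, 0, 1),
                 CMT  = c(1, 2, 2))

input_map <- c("1" = 1)  # assumed: doses into CMT 1 are model input 1
outeq_map <- c("2" = 1)  # assumed: samples from CMT 2 are output equation 1

is_dose <- !is.na(nm$AMT) & nm$AMT > 0
pm <- data.frame(
  id    = nm$ID,
  time  = nm$TIME,
  dose  = ifelse(is_dose, nm$AMT, NA),
  # DUR = DOSE / RATE; boluses (RATE missing or 0) get DUR = 0
  dur   = ifelse(is_dose, ifelse(is.na(nm$RATE) | nm$RATE == 0, 0, nm$AMT / nm$RATE), NA),
  # missing observations (MDV = 1) become OUT = -99
  out   = ifelse(is_dose, NA, ifelse(nm$MDV == 1, -99, nm$DV)),
  input = ifelse(is_dose, input_map[as.character(nm$CMT)], NA),
  outeq = ifelse(is_dose, NA, outeq_map[as.character(nm$CMT)])
)
pm
```

A data frame like `pm` could then be passed to `PM_data$new()`, which standardizes and validates it as described above.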

## More Examples

**Pmetrics** comes with an example dataset called `dataEx` already loaded. You can practice with it. It is the same data as in "src/ex.csv" used above to create the `dat` object.

:::{.callout-tip}
In the code below and often in this book, `r r_help("base", "file.path")` is a base R function used to create file paths that are compatible with your operating system.
:::

```{r}

# Save data somewhere
path <- "src2"
dir.create(path) # create a temporary folder
dataEx$save(file.path(path, "ex2.csv")) # save the data there
dataEx$save("src2/ex2.csv") # alternative without file.path()

# Load it again with one of these alternatives
exData <- PM_data$new(file.path(path, "ex2.csv"))
exData <- PM_data$new("src2/ex2.csv")

unlink(path, recursive = TRUE) # clean up
```

You can look at the `src/ex.csv` file directly by opening it from your hard drive in a spreadsheet program like Excel or in a text editor.

`exData` is an R6 object, which means that it contains both data and methods to process that data.

```{r}
#| eval: true
# See the contents of the object
names(exData)
```

The first element is an artifact of the R6 class. The remaining elements are documented in the help for `r gh_help("PM_data")`. You can of course inspect the data directly.

```{r}
#| eval: true
# Your original data (first few rows)
head(exData$data)
```

Typing the name of the `PM_data` object will display it nicely in the viewer.
```{r}
# See the standardized data nicely formatted in the viewer
exData
```

Below we show it truncated for brevity.
```{r}
#| echo: false
#| eval: true
knitr::kable(head(exData$standard_data))
```


Most `r pmetrics()` objects are R6 objects. As a reminder, you can use the `$` operator to access their data fields and methods. Many of them have a `$summary()` method that prints a summary of the object to the console and a `$plot()` method that creates a plot of the object. See `r gh_help("PM_data")` for more information on the `PM_data` class and its methods.

**Note:** We recognize that many users are familiar with the "S3 framework" in R, which uses functions like `summary(object)` and `plot(object)`. To comply with better programming standards, `r pmetrics()` uses the R6 framework. However, we have provided S3 methods for most functions, so you can use `summary(object)` and `plot(object)` if you prefer.
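A common way to provide such S3 wrappers is an S3 method for the object's class that simply calls the corresponding R6 method. The sketch below shows this pattern with a hypothetical `ToySummary` class built with the R6 package; it illustrates the idea and is not the Pmetrics source.

```{r}
library(R6)

# Hypothetical R6 class with a $summary() method (not part of Pmetrics)
ToySummary <- R6Class("ToySummary",
  public = list(
    data = NULL,
    initialize = function(data) {
      self$data <- data
    },
    summary = function(...) {
      list(n_subjects = length(unique(self$data$id)),
           n_rows     = nrow(self$data))
    }
  )
)

# S3 method: summary(obj) dispatches on class "ToySummary" and calls obj$summary()
summary.ToySummary <- function(object, ...) {
  object$summary(...)
}

obj <- ToySummary$new(data.frame(id = c(1, 1, 2), time = c(0, 1, 0)))
identical(summary(obj), obj$summary())
#> [1] TRUE
```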

```{r}
#| label: example-npag-6
# S3 method to summarize data
summary(exData)
```

`PM_data` has a `plot()` method that creates a plot of the data. See `r gh_help("plot.PM_data")` for more information.

```{r}
#| label: example-npag-7
#| eval: false
exData$plot()
```

```{r}
#| eval: true
#| echo: false
p <- exData$plot()
plotly::plotly_build(p) #needed to display in output
```


## Citations