Overview
participantFlowDiagram helps a study team find issues in the progress of study participants through a research protocol by graphically presenting summary data of their progress. It is designed to show the complexity of that flow and summary state of the participants with as little code as possible.
create_mermaid_diagram() represents the data graphically:
create_table() represents the same data in a table:
Characteristic | N = 400¹ |
---|---|
Interest | |
Interested | 262 (66%) |
Not Interested | 138 (35%) |
Eligibility Scheduling | |
Willing to schedule | 197 (75%) |
Unknown | 30 (11%) |
Lost to follow-up | 19 (7.3%) |
Unwilling to schedule | 16 (6.1%) |
Eligibility | |
Eligible | 171 (87%) |
Ineligible | 13 (6.6%) |
Lost to followup | 11 (5.6%) |
Eligibility unknown | 2 (1.0%) |
Consent Scheduling | |
Scheduled | 155 (91%) |
Unknown | 11 (6.4%) |
Lost to followup | 5 (2.9%) |
Consent | |
Consented | 129 (83%) |
Unknown | 24 (15%) |
¹ n (%)
Getting started
Using participantFlowDiagram requires a detailed dataset, participant_level_progress, that documents each step in the workflow. The dataset has multiple constraints:
- Each column must be a factor. Each level in the factor is a possible state at that step.
- Each factor should define all possible levels at that step. This includes an unknown level to be set when no other logic at that step is true.
- Each factor level needs to be unique across all the factors because the factor levels become node names in the diagram. It is easy to manage this by prefixing each factor level with the column name.
- The columns should appear in the order of the workflow, because column order governs row order in the summary table. While not required, this improves the readability of the table.
- At each step, the count of non-missing (non-NA) values should equal the count for the parent node.
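The first and third constraints above can be checked mechanically before building a diagram. The following is a minimal sketch in base R, not part of the package: the helper name check_progress_data() and the toy data are assumptions made for illustration.

```r
# Toy data for illustration only; this is not the package's example dataset.
progress <- data.frame(
  interest = factor(c("interest_yes", "interest_no", "interest_yes")),
  consent  = factor(
    c("consent_yes", NA, "consent_unknown"),
    levels = c("consent_no", "consent_unknown", "consent_yes")
  )
)

# Hypothetical helper: returns a character vector describing each problem
# found, or an empty vector when both checks pass.
check_progress_data <- function(df) {
  problems <- character(0)

  # Constraint: every column is a factor.
  not_factor <- names(df)[!vapply(df, is.factor, logical(1))]
  if (length(not_factor) > 0) {
    problems <- c(
      problems,
      paste("Not a factor:", paste(not_factor, collapse = ", "))
    )
  }

  # Constraint: each level name appears in only one column.
  all_levels <- unlist(lapply(df, levels), use.names = FALSE)
  dupes <- unique(all_levels[duplicated(all_levels)])
  if (length(dupes) > 0) {
    problems <- c(
      problems,
      paste("Level used in more than one column:", paste(dupes, collapse = ", "))
    )
  }

  problems
}

check_progress_data(progress)
```

A column with an unprefixed level such as "unknown" shared with another column would be reported by the second check, which is the situation the prefixing convention is meant to prevent.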
This package provides an example dataset, consent_tracking_data.csv, that describes a multi-step recruiting, eligibility, and consent workflow. This dataset conforms to the above constraints.
# Read the example data and convert every column to a factor
consent_tracking_data <- readr::read_csv("consent_tracking_data.csv") |>
  dplyr::mutate(dplyr::across(dplyr::everything(), as.factor))
consent_tracking_data |> str()
#> tibble [400 × 5] (S3: tbl_df/tbl/data.frame)
#> $ interest : Factor w/ 2 levels "interest_no",..: 2 1 2 2 2 2 1 2 1 2 ...
#> $ eligibility_scheduling: Factor w/ 4 levels "eligibility_scheduling_ltfu",..: 2 NA 4 4 4 1 NA 4 NA 4 ...
#> $ eligibility : Factor w/ 4 levels "eligibility_ltfu",..: NA NA 2 4 4 NA NA 1 NA 4 ...
#> $ consent_scheduling : Factor w/ 3 levels "consent_scheduling_ltfu",..: NA NA NA 3 3 NA NA NA NA 3 ...
#> $ consent : Factor w/ 3 levels "consent_no","consent_unknown",..: NA NA NA 3 3 NA NA NA NA 3 ...
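The level names in this output show the column-name prefix on every factor level. For raw data that does not yet follow that convention, the prefixing can be done in a few lines. This is a minimal sketch in base R under assumed inputs: the raw data frame and its values are invented for illustration.

```r
# Toy raw data with unprefixed values; invented for illustration.
raw <- data.frame(
  interest    = c("yes", "no", "yes"),
  eligibility = c("yes", NA, "unknown"),
  stringsAsFactors = FALSE
)

# Prefix each non-missing value with its column name, then convert to factor.
# Missing values stay NA so they are not counted at that step.
prefixed <- raw
for (col in names(raw)) {
  values <- raw[[col]]
  prefixed[[col]] <- factor(
    ifelse(is.na(values), NA_character_, paste(col, values, sep = "_"))
  )
}

levels(prefixed$interest)
levels(prefixed$eligibility)
```

Note that factor() here only creates levels for values that occur in the data. To define every possible state at a step, including ones not yet observed, pass an explicit levels argument to factor().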
The second required dataset is a small, two-column dataset that names, for each step in the workflow, the parent node and the child column that holds that step's states.
steps
#> # A tibble: 5 × 2
#> parent child
#> <chr> <chr>
#> 1 Approached interest
#> 2 interest_yes eligibility_scheduling
#> 3 eligibility_scheduling_willing eligibility
#> 4 eligibility_yes consent_scheduling
#> 5 consent_scheduling_yes consent
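With the parent and child pairs in hand, the constraint that each step's non-missing count equals its parent node's count can be verified directly. The sketch below uses base R and toy data. The helper count_at_node() is hypothetical, and it assumes that a parent name matching no factor level (such as "Approached") is the root node covering every row.

```r
# Toy data and steps, invented for illustration.
progress <- data.frame(
  interest    = factor(c("interest_yes", "interest_no", "interest_yes", "interest_yes")),
  eligibility = factor(c("eligibility_yes", NA, "eligibility_unknown", "eligibility_no"))
)
steps <- data.frame(
  parent = c("Approached", "interest_yes"),
  child  = c("interest", "eligibility"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: number of participants at a node.
# Assumption: a node that is not a level of any column is the root,
# so it covers every row of the data.
count_at_node <- function(df, node) {
  is_level <- FALSE
  n <- 0L
  for (col in names(df)) {
    if (node %in% levels(df[[col]])) {
      is_level <- TRUE
      n <- n + sum(df[[col]] == node, na.rm = TRUE)
    }
  }
  if (is_level) n else nrow(df)
}

# Compare each parent's count with the non-missing count in its child column.
check <- data.frame(
  child    = steps$child,
  n_parent = vapply(steps$parent, function(p) count_at_node(progress, p),
                    integer(1), USE.NAMES = FALSE),
  n_child  = vapply(steps$child, function(ch) sum(!is.na(progress[[ch]])),
                    integer(1), USE.NAMES = FALSE)
)
check$ok <- check$n_parent == check$n_child
check
```

A row with ok equal to FALSE points at a step where participants were recorded without having reached the parent state, or reached it without being given a state at the next step.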
A third dataset, pretty_labels, can be generated with the package function get_pretty_labels_template():
pretty_labels_template <- get_pretty_labels_template(
  participant_level_progress = consent_tracking_data,
  parents = steps$parent,
  children = steps$child
)
pretty_labels_template
#> # A tibble: 21 × 4
#> variable row_type plain_label pretty_label
#> <chr> <chr> <chr> <chr>
#> 1 interest label interest interest
#> 2 interest level interest_no interest_no
#> 3 interest level interest_yes interest_yes
#> 4 eligibility_scheduling label eligibility_scheduling eligibility…
#> 5 eligibility_scheduling level eligibility_scheduling_ltfu eligibility…
#> 6 eligibility_scheduling level eligibility_scheduling_unknown eligibility…
#> 7 eligibility_scheduling level eligibility_scheduling_unwilling eligibility…
#> 8 eligibility_scheduling level eligibility_scheduling_willing eligibility…
#> 9 eligibility label eligibility eligibility
#> 10 eligibility level eligibility_ltfu eligibility…
#> # ℹ 11 more rows
The output of get_pretty_labels_template() can be used as-is in the inputs to create_mermaid_diagram() and create_table(). The labels will be plain labels taken from the factor levels and column names. Putting all that together, the code looks like this:
consent_tracking_data <- readr::read_csv("consent_tracking_data.csv") |>
  dplyr::mutate(dplyr::across(dplyr::everything(), as.factor))

# Name the parent node and family name of the children at each step
steps <- dplyr::tribble(
  ~parent, ~child,
  "Approached", "interest",
  "interest_yes", "eligibility_scheduling",
  "eligibility_scheduling_willing", "eligibility",
  "eligibility_yes", "consent_scheduling",
  "consent_scheduling_yes", "consent"
)

# Build plain labels from the factor levels and column names
pretty_labels <- get_pretty_labels_template(
  participant_level_progress = consent_tracking_data,
  parents = steps$parent,
  children = steps$child
)

diagram <- create_mermaid_diagram(
  participant_level_progress = consent_tracking_data,
  parents = steps$parent,
  children = steps$child,
  pretty_labels = pretty_labels
)
It generates this diagram with plain labels.
To control the labels, save the output of get_pretty_labels_template() and edit the pretty_label column. get_pretty_labels_template() fills pretty_label with the same value as plain_label. One way to make the edits is to convert the template to tribble code with timesaveR::to_tribble(), paste that code inline in your script, and edit the pretty_label values there. Use \n to insert newline characters that wrap the text of the pretty labels in the diagram. These newline codes are ignored in the table.
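To see what a label with \n looks like once the newline is dropped, the effect can be reproduced in base R. This is only an illustration of the idea, not the package's internal code: strip_newlines() is a hypothetical helper, and how the package itself removes the newline codes is not shown here.

```r
# Hypothetical helper: replace each newline, plus any spaces around it,
# with a single space so the label reads as one line.
strip_newlines <- function(x) {
  gsub(" *\n *", " ", x)
}

label_for_diagram <- "Unwilling to\n schedule"
label_for_table <- strip_newlines(label_for_diagram)

cat(label_for_diagram, "\n")  # wraps onto two lines
cat(label_for_table, "\n")    # stays on one line
```

Because the spaces around the newline are collapsed as well, it does not matter whether the space in a label sits before or after the \n.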
pretty_labels_template <- get_pretty_labels_template(
  participant_level_progress = consent_tracking_data,
  parents = steps$parent,
  children = steps$child
)
# Uncomment this code and run it to turn pretty_labels_template
# into tribble code. Paste the tribble code below, assigning
# it to the object "pretty_labels", re-comment these lines,
# then edit the text in the pretty_label column to make the pretty
# labels you'd like to see in the gtsummary table and the mermaid diagram.
#
# devtools::install_github("LukasWallrich/timesaveR")
# pretty_labels_template |>
#   timesaveR::to_tribble(show = TRUE)
pretty_labels <- tibble::tribble(
  ~variable, ~row_type, ~plain_label, ~pretty_label,
  "interest", "label", "interest", "Interest",
  "interest", "level", "interest_no", "Not Interested",
  "interest", "level", "interest_yes", "Interested",
  "eligibility_scheduling", "label", "eligibility_scheduling", "Eligibility Scheduling",
  "eligibility_scheduling", "level", "eligibility_scheduling_ltfu", "Lost to follow-up",
  "eligibility_scheduling", "level", "eligibility_scheduling_unknown", "Unknown",
  "eligibility_scheduling", "level", "eligibility_scheduling_unwilling", "Unwilling to\n schedule",
  "eligibility_scheduling", "level", "eligibility_scheduling_willing", "Willing to\n schedule",
  "eligibility", "label", "eligibility", "Eligibility",
  "eligibility", "level", "eligibility_ltfu", "Lost to followup",
  "eligibility", "level", "eligibility_no", "Ineligible",
  "eligibility", "level", "eligibility_unknown", "Eligibility \nunknown",
  "eligibility", "level", "eligibility_yes", "Eligible",
  "consent_scheduling", "label", "consent_scheduling", "Consent Scheduling",
  "consent_scheduling", "level", "consent_scheduling_ltfu", "Lost to \nfollowup",
  "consent_scheduling", "level", "consent_scheduling_unknown", "Unknown",
  "consent_scheduling", "level", "consent_scheduling_yes", "Scheduled",
  "consent", "label", "consent", "Consent",
  "consent", "level", "consent_no", "Did not \nconsent",
  "consent", "level", "consent_unknown", "Unknown",
  "consent", "level", "consent_yes", "Consented"
)
These labels produce the diagram and table shown in the Overview.