Getting Started

library(participantFlowDiagram)

Overview

participantFlowDiagram helps a study team find issues in how participants progress through a research protocol by presenting summary data of that progress graphically. It is designed to show the complexity of the flow and the summary state of the participants with as little code as possible.

create_mermaid_diagram() represents the data graphically:

create_table() represents the same data in a table:

Characteristic	N = 400¹
Interest
Interested	262 (66%)
Not Interested	138 (35%)
Eligibility Scheduling
Willing to schedule	197 (75%)
Unknown	30 (11%)
Lost to follow-up	19 (7.3%)
Unwilling to schedule	16 (6.1%)
Eligibility
Eligible	171 (87%)
Ineligible	13 (6.6%)
Lost to followup	11 (5.6%)
Eligibility unknown	2 (1.0%)
Consent Scheduling
Scheduled	155 (91%)
Unknown	11 (6.4%)
Lost to followup	5 (2.9%)
Consent
Consented	129 (83%)
Unknown	24 (15%)
Did not consent	2 (1.3%)
¹ n (%)

Using participantFlowDiagram requires a detailed dataset, participant_level_progress, that documents each step in the workflow. The dataset has multiple constraints:

Each column must be a factor. Each level in the factor is a possible state at that step.
Each factor should define all possible levels at that step. This includes an unknown level to be set when none of the other levels at that step applies.
Each factor value needs to be unique across all the factors because the factor values will be node names in the diagram. It is easy to manage this by prefixing each factor level with the column name.
The columns should appear in the order of the workflow because this order governs the order of the rows in the table. While not required, this will improve the readability of the summary table.
At each step, the count of non-null values should equal the count for the parent node.
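
As an illustration of the first three constraints, here is a minimal base R sketch that turns raw values into conforming factors. The `raw` data frame, the `states` list, and the values in them are hypothetical and are not part of the package. The sketch declares every state up front (including an unknown level), prefixes each state with its column name, and then checks that all columns are factors with levels that are unique across columns.

```r
# Hypothetical raw data: NA marks a participant who never reached a step.
raw <- data.frame(
  interest    = c("yes", "no", "yes", "yes", "yes"),
  eligibility = c("yes", NA, "no", "unknown", "yes"),
  stringsAsFactors = FALSE
)

# Declare every possible state per step, including an "unknown" fallback.
states <- list(
  interest    = c("no", "unknown", "yes"),
  eligibility = c("no", "unknown", "yes")
)

# Prefix each state with its column name so that levels are unique
# across columns, and keep NA for steps that were never reached.
progress <- raw
for (col in names(states)) {
  progress[[col]] <- factor(
    ifelse(is.na(raw[[col]]), NA, paste(col, raw[[col]], sep = "_")),
    levels = paste(col, states[[col]], sep = "_")
  )
}

# Check the constraints: every column is a factor, no level is reused.
all_levels <- unlist(lapply(progress, levels), use.names = FALSE)
stopifnot(
  all(vapply(progress, is.factor, logical(1))),
  anyDuplicated(all_levels) == 0
)
```

Without the prefix, the levels "no", "unknown", and "yes" would appear in both columns and the uniqueness check would fail.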

This package provides an example dataset, consent_tracking_data.csv, that describes a multi-step recruiting, eligibility, and consent workflow. This dataset conforms to the above constraints.

consent_tracking_data <- readr::read_csv("consent_tracking_data.csv") |>
  dplyr::mutate(dplyr::across(dplyr::everything(), as.factor))

consent_tracking_data |> str()
#> tibble [400 × 5] (S3: tbl_df/tbl/data.frame)
#>  $ interest              : Factor w/ 2 levels "interest_no",..: 2 1 2 2 2 2 1 2 1 2 ...
#>  $ eligibility_scheduling: Factor w/ 4 levels "eligibility_scheduling_ltfu",..: 2 NA 4 4 4 1 NA 4 NA 4 ...
#>  $ eligibility           : Factor w/ 4 levels "eligibility_ltfu",..: NA NA 2 4 4 NA NA 1 NA 4 ...
#>  $ consent_scheduling    : Factor w/ 3 levels "consent_scheduling_ltfu",..: NA NA NA 3 3 NA NA NA NA 3 ...
#>  $ consent               : Factor w/ 3 levels "consent_no","consent_unknown",..: NA NA NA 3 3 NA NA NA NA 3 ...

The second required dataset is a small, two-column dataset that names the parent node and the child step at each step in the project.

steps
#> # A tibble: 5 × 2
#>   parent                         child                 
#>   <chr>                          <chr>                 
#> 1 Approached                     interest              
#> 2 interest_yes                   eligibility_scheduling
#> 3 eligibility_scheduling_willing eligibility           
#> 4 eligibility_yes                consent_scheduling    
#> 5 consent_scheduling_yes         consent
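
The parent and child pairs also make it possible to check the last constraint on the participant-level data: at each step, the number of non-null child values should equal the count for the parent node. The following is a rough sketch in base R, not a function from the package. `check_steps()` is a hypothetical helper, the toy data are made up, and the sketch assumes that the first parent (here "Approached") is the root node whose count is the number of rows.

```r
# Toy data following the naming convention above (values are made up).
progress <- data.frame(
  interest = factor(c("interest_yes", "interest_no", "interest_yes", "interest_yes")),
  eligibility_scheduling = factor(c(
    "eligibility_scheduling_willing", NA,
    "eligibility_scheduling_unknown", "eligibility_scheduling_willing"
  ))
)
steps <- data.frame(
  parent = c("Approached", "interest_yes"),
  child  = c("interest", "eligibility_scheduling"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one row per step comparing parent and child counts.
# Assumes the first parent is the root, counted as the number of rows.
check_steps <- function(progress, steps, root = steps$parent[1]) {
  stopifnot(all(steps$child %in% names(progress)))
  all_values <- unlist(lapply(progress, as.character), use.names = FALSE)
  parent_n <- vapply(steps$parent, function(p) {
    if (p == root) nrow(progress) else sum(all_values == p, na.rm = TRUE)
  }, integer(1))
  child_n <- vapply(steps$child, function(ch) {
    sum(!is.na(progress[[ch]]))
  }, integer(1))
  data.frame(
    steps,
    parent_n = unname(parent_n),
    child_n = unname(child_n),
    ok = unname(parent_n == child_n),
    row.names = NULL
  )
}

result <- check_steps(progress, steps)
result
```

A row with `ok` equal to `FALSE` would point to a step where some participants in the parent state have no recorded child state, or the reverse.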

A third dataset, pretty_labels, can be generated with the package function get_pretty_labels_template():

pretty_labels_template <- get_pretty_labels_template(
  participant_level_progress = consent_tracking_data,
  parents = steps$parent,
  children = steps$child
)

pretty_labels_template
#> # A tibble: 21 × 4
#>    variable               row_type plain_label                      pretty_label
#>    <chr>                  <chr>    <chr>                            <chr>       
#>  1 interest               label    interest                         interest    
#>  2 interest               level    interest_no                      interest_no 
#>  3 interest               level    interest_yes                     interest_yes
#>  4 eligibility_scheduling label    eligibility_scheduling           eligibility…
#>  5 eligibility_scheduling level    eligibility_scheduling_ltfu      eligibility…
#>  6 eligibility_scheduling level    eligibility_scheduling_unknown   eligibility…
#>  7 eligibility_scheduling level    eligibility_scheduling_unwilling eligibility…
#>  8 eligibility_scheduling level    eligibility_scheduling_willing   eligibility…
#>  9 eligibility            label    eligibility                      eligibility 
#> 10 eligibility            level    eligibility_ltfu                 eligibility…
#> # ℹ 11 more rows
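
To make the shape of this template concrete, the sketch below builds a similar table by hand: one "label" row per column and one "level" row per factor level, with pretty_label starting as a copy of plain_label. This is only an approximation inferred from the output above, not the package's implementation. `build_label_template()` is a hypothetical name, and unlike get_pretty_labels_template() it ignores the parents and children arguments.

```r
# Toy data with two steps (values are made up).
progress <- data.frame(
  interest = factor(c("interest_yes", "interest_no", "interest_yes")),
  consent  = factor(c("consent_yes", NA, "consent_unknown"),
                    levels = c("consent_no", "consent_unknown", "consent_yes"))
)

# Hypothetical stand-in that mimics the structure of the template.
build_label_template <- function(progress) {
  rows <- lapply(names(progress), function(col) {
    data.frame(
      variable    = col,
      row_type    = c("label", rep("level", nlevels(progress[[col]]))),
      plain_label = c(col, levels(progress[[col]])),
      stringsAsFactors = FALSE
    )
  })
  template <- do.call(rbind, rows)
  # Start with pretty labels identical to the plain labels.
  template$pretty_label <- template$plain_label
  template
}

template <- build_label_template(progress)
template
```

The row count is the number of columns plus the total number of factor levels, which is consistent with the 21 rows shown above for five columns with 2, 4, 4, 3, and 3 levels.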

The output of get_pretty_labels_template() can be used as-is in the inputs to create_mermaid_diagram() and create_table(). The labels will be plain labels taken from the factor levels and column names. Putting all that together, the code looks like this:


consent_tracking_data <- readr::read_csv("consent_tracking_data.csv") |>
  dplyr::mutate(dplyr::across(dplyr::everything(), as.factor))

# Name the parent node and family name of the children at each step
steps <- dplyr::tribble(
  ~parent,                           ~child,
  "Approached",                      "interest",
  "interest_yes",                    "eligibility_scheduling",
  "eligibility_scheduling_willing",  "eligibility",
  "eligibility_yes",                 "consent_scheduling",
  "consent_scheduling_yes",          "consent"
)

pretty_labels <- get_pretty_labels_template(
  participant_level_progress = consent_tracking_data,
  parents = steps$parent,
  children = steps$child
)

diagram <- create_mermaid_diagram(
  participant_level_progress = consent_tracking_data,
  parents = steps$parent,
  children = steps$child,
  pretty_labels = pretty_labels)

This code generates the diagram with plain labels.

To control the labels, save the output of get_pretty_labels_template() and edit the pretty_label column. get_pretty_labels_template() initially uses the same value in pretty_label as in plain_label. You can use timesaveR::to_tribble() to turn the saved template into tribble code, paste that code inline in your script, and edit the pretty_label values there.

Use \n to insert newline characters that wrap the text of the pretty labels in the diagram. These newline codes will be ignored in the table.
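
The small base R sketch below illustrates the effect of \n in a label. It does not show how the package handles newlines internally; the `labels` data frame is hypothetical, and replacing newlines with spaces is just one way to picture a label that reads as a single line in a table.

```r
# Hypothetical pretty labels containing newline codes.
labels <- data.frame(
  plain_label  = c("eligibility_scheduling_unwilling", "consent_no"),
  pretty_label = c("Unwilling to\nschedule", "Did not\nconsent"),
  stringsAsFactors = FALSE
)

# Printed with cat(), "\n" becomes a real line break, as in a diagram node.
cat(labels$pretty_label[1], "\n")

# Collapsing each newline (and surrounding whitespace) into a single space
# gives a one-line version of the same label.
table_label <- gsub("\\s*\n\\s*", " ", labels$pretty_label)
table_label
```

Here `table_label` holds "Unwilling to schedule" and "Did not consent", so one pretty_label value can serve both the wrapped diagram text and the single-line table text.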

pretty_labels_template <- get_pretty_labels_template(
  participant_level_progress = consent_tracking_data,
  parents = steps$parent,
  children = steps$child
)

# Uncomment this code and run it to turn the pretty_labels_template
# into dplyr::tribble() code. Paste the tribble code below, assigning
# it to the object "pretty_labels", re-comment these lines,
# then edit the text in the pretty_label column to make the pretty
# labels you'd like to see in the gtsummary table and the mermaid diagram.
#
# devtools::install_github("LukasWallrich/timesaveR")
# pretty_labels_template |>
#   timesaveR::to_tribble(show = TRUE)

pretty_labels <- tibble::tribble(
  ~variable,                 ~row_type, ~plain_label,                        ~pretty_label,                       
   "interest",                "label",   "interest",                          "Interest",                         
   "interest",                "level",   "interest_no",                       "Not Interested",                      
   "interest",                "level",   "interest_yes",                      "Interested",                     
   "eligibility_scheduling",  "label",   "eligibility_scheduling",            "Eligibility Scheduling",           
   "eligibility_scheduling",  "level",   "eligibility_scheduling_ltfu",       "Lost to follow-up",      
   "eligibility_scheduling",  "level",   "eligibility_scheduling_unknown",    "Unknown",   
   "eligibility_scheduling",  "level",   "eligibility_scheduling_unwilling",  "Unwilling to\n schedule", 
   "eligibility_scheduling",  "level",   "eligibility_scheduling_willing",    "Willing to\n schedule",   
   "eligibility",             "label",   "eligibility",                       "Eligibility",                      
   "eligibility",             "level",   "eligibility_ltfu",                  "Lost to followup",                 
   "eligibility",             "level",   "eligibility_no",                    "Ineligible",                   
   "eligibility",             "level",   "eligibility_unknown",               "Eligibility \nunknown",              
   "eligibility",             "level",   "eligibility_yes",                   "Eligible",                  
   "consent_scheduling",      "label",   "consent_scheduling",                "Consent Scheduling",               
   "consent_scheduling",      "level",   "consent_scheduling_ltfu",           "Lost to \nfollowup",          
   "consent_scheduling",      "level",   "consent_scheduling_unknown",        "Unknown",       
   "consent_scheduling",      "level",   "consent_scheduling_yes",            "Scheduled",           
   "consent",                 "label",   "consent",                           "Consent",                          
   "consent",                 "level",   "consent_no",                        "Did not \nconsent",                       
   "consent",                 "level",   "consent_unknown",                   "Unknown",                  
   "consent",                 "level",   "consent_yes",                       "Consented"
)

These labels produce the diagram and table shown in the Overview.