Overview

participantFlowDiagram helps a study team spot problems in how participants progress through a research protocol by presenting summary data on that progress graphically. It is designed to show the complexity of that flow and the summary state of the participants with as little code as possible.

create_mermaid_diagram() represents the data graphically:

create_table() represents the same data in a table:

Characteristic               N = 400¹
Interest
    Interested               262 (66%)
    Not Interested           138 (35%)
Eligibility Scheduling
    Willing to schedule      197 (75%)
    Unknown                  30 (11%)
    Lost to follow-up        19 (7.3%)
    Unwilling to schedule    16 (6.1%)
Eligibility
    Eligible                 171 (87%)
    Ineligible               13 (6.6%)
    Lost to followup         11 (5.6%)
    Eligibility unknown      2 (1.0%)
Consent Scheduling
    Scheduled                155 (91%)
    Unknown                  11 (6.4%)
    Lost to followup         5 (2.9%)
Consent
    Consented                129 (83%)
    Unknown                  24 (15%)
    Did not consent          2 (1.3%)
¹ n (%)

Getting started

Using participantFlowDiagram requires a detailed dataset, participant_level_progress, that documents each step in the workflow. The dataset has multiple constraints:

  • Each column must be a factor. Each level in the factor is a possible state at that step.
  • Each factor should define all possible levels at that step. This includes an unknown level, to be set when no other logic at that step is true.
  • Each factor value needs to be unique across all the factors because the factor values will be node names in a diagram. It is easy to manage this by prefixing each factor level with the column name.
  • The steps should appear in the order of the workflow, as this order governs the order in the table. While not required, this will improve the readability of the summary table.
  • At each step, the count of non-null values should equal the count for the parent node.
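
The constraints above can be checked before any diagram is drawn. The sketch below uses a hypothetical helper, check_progress_constraints(), written in base R; it is not part of participantFlowDiagram. It assumes that a parent name matching no factor level (such as "Approached") is the root node, whose count is the number of rows in the dataset.

```r
# Hypothetical helper, not part of participantFlowDiagram.
# Returns a character vector of problems; an empty vector means all checks passed.
check_progress_constraints <- function(progress, parents, children) {
  problems <- character(0)

  # Constraint: every column is a factor.
  not_factor <- names(progress)[!vapply(progress, is.factor, logical(1))]
  if (length(not_factor) > 0) {
    problems <- c(problems,
                  paste("Not a factor:", paste(not_factor, collapse = ", ")))
  }

  # Constraint: level names are unique across all columns.
  all_levels <- unlist(lapply(progress, levels), use.names = FALSE)
  dup_levels <- unique(all_levels[duplicated(all_levels)])
  if (length(dup_levels) > 0) {
    problems <- c(problems,
                  paste("Level used in more than one column:",
                        paste(dup_levels, collapse = ", ")))
  }

  # Constraint: non-missing count in each child column equals the parent count.
  for (i in seq_along(children)) {
    n_child <- sum(!is.na(progress[[children[i]]]))
    if (parents[i] %in% all_levels) {
      # Count the rows that carry the parent level in any column.
      n_parent <- sum(vapply(
        progress,
        function(col) sum(as.character(col) == parents[i], na.rm = TRUE),
        integer(1)
      ))
    } else {
      # Assumption: a parent that is not a level is the root node.
      n_parent <- nrow(progress)
    }
    if (n_child != n_parent) {
      problems <- c(problems,
                    sprintf("%s has %d non-missing values but parent %s has %d",
                            children[i], n_child, parents[i], n_parent))
    }
  }

  problems
}

# Toy data that satisfies the constraints.
toy <- data.frame(
  interest = factor(c("interest_yes", "interest_yes", "interest_no")),
  consent  = factor(c("consent_yes", "consent_no", NA))
)
toy_problems <- check_progress_constraints(
  toy,
  parents  = c("Approached", "interest_yes"),
  children = c("interest", "consent")
)
```

Here toy_problems is empty because two participants have the level interest_yes and exactly two have a non-missing consent value.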

This package provides an example dataset, consent_tracking_data.csv, that describes a multi-step recruiting, eligibility, and consent workflow. This dataset conforms to the above constraints.

consent_tracking_data <- readr::read_csv("consent_tracking_data.csv") |>
  dplyr::mutate(dplyr::across(dplyr::everything(), as.factor))

consent_tracking_data |> str()
#> tibble [400 × 5] (S3: tbl_df/tbl/data.frame)
#>  $ interest              : Factor w/ 2 levels "interest_no",..: 2 1 2 2 2 2 1 2 1 2 ...
#>  $ eligibility_scheduling: Factor w/ 4 levels "eligibility_scheduling_ltfu",..: 2 NA 4 4 4 1 NA 4 NA 4 ...
#>  $ eligibility           : Factor w/ 4 levels "eligibility_ltfu",..: NA NA 2 4 4 NA NA 1 NA 4 ...
#>  $ consent_scheduling    : Factor w/ 3 levels "consent_scheduling_ltfu",..: NA NA NA 3 3 NA NA NA NA 3 ...
#>  $ consent               : Factor w/ 3 levels "consent_no","consent_unknown",..: NA NA NA 3 3 NA NA NA NA 3 ...
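
The levels in the example data are already prefixed with their column names (interest_no, eligibility_ltfu, and so on). If your own raw data uses generic values such as "yes" and "no" in several columns, a minimal base R sketch for adding the prefix might look like this. The raw data frame here is made up for illustration.

```r
# Hypothetical raw data with generic values repeated across columns.
raw <- data.frame(
  interest = c("yes", "no", "yes"),
  consent  = c("yes", NA, "no"),
  stringsAsFactors = FALSE
)

# Prefix each non-missing value with its column name, then convert to a factor.
# Missing values stay NA so that they are not counted at that step.
prefixed <- raw
for (col in names(raw)) {
  values <- raw[[col]]
  has_value <- !is.na(values)
  values[has_value] <- paste(col, values[has_value], sep = "_")
  prefixed[[col]] <- factor(values)
}

levels(prefixed$interest)
levels(prefixed$consent)
```

Note that factor() only creates levels for values that occur in the data. To declare levels that are possible but not yet observed, such as an unknown level, pass them explicitly through the levels argument of factor().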

The second required dataset is a small, two-column dataset that names the parent node and the child step at each step in the project.

steps
#> # A tibble: 5 × 2
#>   parent                         child                 
#>   <chr>                          <chr>                 
#> 1 Approached                     interest              
#> 2 interest_yes                   eligibility_scheduling
#> 3 eligibility_scheduling_willing eligibility           
#> 4 eligibility_yes                consent_scheduling    
#> 5 consent_scheduling_yes         consent
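
Because each parent is a factor level and each child is a column name, a typo in either silently breaks the link between steps. The sketch below is one way to cross-check a steps table against the data in base R. It uses made-up toy data and assumes the first parent is a root label that is not a factor level.

```r
# Toy participant-level data and a matching steps table (both hypothetical).
progress <- data.frame(
  interest    = factor(c("interest_yes", "interest_no", "interest_yes")),
  eligibility = factor(c("eligibility_yes", NA, "eligibility_no"))
)
steps <- data.frame(
  parent = c("Approached", "interest_yes"),
  child  = c("interest", "eligibility"),
  stringsAsFactors = FALSE
)

# Every child should name a column in the data.
unknown_children <- setdiff(steps$child, names(progress))

# Every parent after the root should be a level of some factor.
all_levels <- unlist(lapply(progress, levels), use.names = FALSE)
unknown_parents <- setdiff(steps$parent[-1], all_levels)

unknown_children
unknown_parents
```

Both vectors are empty when the steps table and the data agree.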

A third dataset, pretty_labels, can be generated with the package function get_pretty_labels_template():

pretty_labels_template <- get_pretty_labels_template(
  participant_level_progress = consent_tracking_data,
  parents = steps$parent,
  children = steps$child
)

pretty_labels_template
#> # A tibble: 21 × 4
#>    variable               row_type plain_label                      pretty_label
#>    <chr>                  <chr>    <chr>                            <chr>       
#>  1 interest               label    interest                         interest    
#>  2 interest               level    interest_no                      interest_no 
#>  3 interest               level    interest_yes                     interest_yes
#>  4 eligibility_scheduling label    eligibility_scheduling           eligibility…
#>  5 eligibility_scheduling level    eligibility_scheduling_ltfu      eligibility…
#>  6 eligibility_scheduling level    eligibility_scheduling_unknown   eligibility…
#>  7 eligibility_scheduling level    eligibility_scheduling_unwilling eligibility…
#>  8 eligibility_scheduling level    eligibility_scheduling_willing   eligibility…
#>  9 eligibility            label    eligibility                      eligibility 
#> 10 eligibility            level    eligibility_ltfu                 eligibility…
#> # ℹ 11 more rows

The output of get_pretty_labels_template() can be used as-is in the inputs to create_mermaid_diagram() and create_table(). The labels will be plain labels taken from the factor levels and column names. Putting all that together, the code looks like this:


consent_tracking_data <- readr::read_csv("consent_tracking_data.csv") |>
  dplyr::mutate(dplyr::across(dplyr::everything(), as.factor))

# Name the parent node and family name of the children at each step
steps <- dplyr::tribble(
  ~parent,                           ~child,
  "Approached",                      "interest",
  "interest_yes",                    "eligibility_scheduling",
  "eligibility_scheduling_willing",  "eligibility",
  "eligibility_yes",                 "consent_scheduling",
  "consent_scheduling_yes",          "consent"
)

pretty_labels <- get_pretty_labels_template(
  participant_level_progress = consent_tracking_data,
  parents = steps$parent,
  children = steps$child
)

diagram <- create_mermaid_diagram(
  participant_level_progress = consent_tracking_data,
  parents = steps$parent,
  children = steps$child,
  pretty_labels = pretty_labels)

It generates this diagram with plain labels.

To control the labels, save the output of get_pretty_labels_template() and edit the pretty_label column, which initially holds the same value as plain_label. One way to do this is to generate tribble code with timesaveR::to_tribble(), paste it inline in your code, and edit the pretty_label values there.
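
If hand-editing every row is more than you need, the pretty_label column can also be given a rough first pass in code. The sketch below is an illustration in base R, not a package feature. It builds a small stand-in for the template, using the four column names shown in the output above, and derives labels from plain_label.

```r
# Small stand-in for the output of get_pretty_labels_template() (made-up rows).
template <- data.frame(
  variable     = c("interest", "interest", "interest"),
  row_type     = c("label", "level", "level"),
  plain_label  = c("interest", "interest_no", "interest_yes"),
  pretty_label = c("interest", "interest_no", "interest_yes"),
  stringsAsFactors = FALSE
)

# Replace underscores with spaces and capitalise the first letter.
spaced <- gsub("_", " ", template$plain_label)
template$pretty_label <- paste0(
  toupper(substring(spaced, 1, 1)),
  substring(spaced, 2)
)

template$pretty_label
```

Labels produced this way usually still need manual edits to read well, for example changing "Interest no" to "Not Interested".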

Use \n to insert newline characters that wrap the text of the pretty labels in the diagram. These newlines are ignored in the table.

pretty_labels_template <- get_pretty_labels_template(
  participant_level_progress = consent_tracking_data,
  parents = steps$parent,
  children = steps$child
)

# Uncomment this code and run it to turn the pretty_labels_template
# into tribble() code. Paste the tribble code below, assigning
# it to the object "pretty_labels", re-comment these lines,
# then edit the text in the pretty_label column to make the pretty
# labels you'd like to see in the gtsummary table and the mermaid diagram.
#
# devtools::install_github("LukasWallrich/timesaveR")
# pretty_labels_template |>
#   timesaveR::to_tribble(show = TRUE)

pretty_labels <- tibble::tribble(
  ~variable,                 ~row_type, ~plain_label,                        ~pretty_label,                       
   "interest",                "label",   "interest",                          "Interest",                         
   "interest",                "level",   "interest_no",                       "Not Interested",                      
   "interest",                "level",   "interest_yes",                      "Interested",                     
   "eligibility_scheduling",  "label",   "eligibility_scheduling",            "Eligibility Scheduling",           
   "eligibility_scheduling",  "level",   "eligibility_scheduling_ltfu",       "Lost to follow-up",      
   "eligibility_scheduling",  "level",   "eligibility_scheduling_unknown",    "Unknown",   
   "eligibility_scheduling",  "level",   "eligibility_scheduling_unwilling",  "Unwilling to\n schedule", 
   "eligibility_scheduling",  "level",   "eligibility_scheduling_willing",    "Willing to\n schedule",   
   "eligibility",             "label",   "eligibility",                       "Eligibility",                      
   "eligibility",             "level",   "eligibility_ltfu",                  "Lost to followup",                 
   "eligibility",             "level",   "eligibility_no",                    "Ineligible",                   
   "eligibility",             "level",   "eligibility_unknown",               "Eligibility \nunknown",              
   "eligibility",             "level",   "eligibility_yes",                   "Eligible",                  
   "consent_scheduling",      "label",   "consent_scheduling",                "Consent Scheduling",               
   "consent_scheduling",      "level",   "consent_scheduling_ltfu",           "Lost to \nfollowup",          
   "consent_scheduling",      "level",   "consent_scheduling_unknown",        "Unknown",       
   "consent_scheduling",      "level",   "consent_scheduling_yes",            "Scheduled",           
   "consent",                 "label",   "consent",                           "Consent",                          
   "consent",                 "level",   "consent_no",                        "Did not \nconsent",                       
   "consent",                 "level",   "consent_unknown",                   "Unknown",                  
   "consent",                 "level",   "consent_yes",                       "Consented"
)

These labels produce the diagram and table shown in the Overview.