getting_started.Rmd
REDCap is electronic data capture software that is widely used in the academic research community. It provides tools for building and managing online surveys and databases. With these tools, a designer can create complex data collection instruments that have rich data typing, validation rules, repeating data structures, and time-series data collection, and present them in a data collection project. Data collected in a REDCap project must conform to the rules in the project definition. REDCap Filler provides a testing and development service for REDCap users. It generates synthetic test data and loads it into a REDCap project, using the project’s design to guide the generation. This article gives a basic example of using redcapfiller to populate a REDCap project.
Populating a project with 5 records of data is as simple as this:
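```r
library(redcapfiller)

# redcap_uri and token must already be defined for your REDCap host and project
generated_values <- get_project_values(redcap_uri, token)

# Write each generated data frame to the project
purrr::walk(
  generated_values,
  ~ REDCapR::redcap_write(
    redcap_uri = redcap_uri,
    token = token,
    ds_to_write = .x
  )
)
```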
The goal of `redcapfiller` is to generate a test dataset as complex as
the project design, using nothing more than the project design. The only
required inputs are `redcap_uri`, the URI of a REDCap host’s API
endpoint, and `token`, an API token for a project on that host.
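Both values are usually kept out of the script itself. The sketch below assumes you store them in environment variables; the variable names `REDCAP_URI` and `REDCAP_TOKEN` and the helper `read_redcap_credentials()` are hypothetical and not part of redcapfiller.

```r
# Hypothetical helper: read the REDCap API URL and token from environment
# variables so neither is hard-coded. REDCAP_URI and REDCAP_TOKEN are assumed names.
read_redcap_credentials <- function(uri_var = "REDCAP_URI",
                                    token_var = "REDCAP_TOKEN") {
  redcap_uri <- Sys.getenv(uri_var)
  token <- Sys.getenv(token_var)
  # Sys.getenv() returns "" for unset variables; fail early with a clear message
  missing <- c(uri_var, token_var)[c(redcap_uri, token) == ""]
  if (length(missing) > 0) {
    stop("Missing environment variable(s): ", paste(missing, collapse = ", "))
  }
  list(redcap_uri = redcap_uri, token = token)
}

# Usage (assumes the variables are set, for example in ~/.Renviron):
# credentials <- read_redcap_credentials()
# redcap_uri <- credentials$redcap_uri
# token <- credentials$token
```

Failing early keeps a missing token from surfacing later as a less obvious API error.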
```r
library(redcapfiller)
generated_values <- get_project_values(redcap_uri, token)
#> 1 rows were read from REDCap in 0.1 seconds. The http status code was 200.
#> The data dictionary describing 20 fields was read from REDCap in 0.1 seconds. The http status code was 200.
#> 26 variable metadata records were read from REDCap in 0.2 seconds. The http status code was 200.
#> The data dictionary describing 20 fields was read from REDCap in 0.1 seconds. The http status code was 200.
#> 1 instrument metadata records were read from REDCap in 0.1 seconds. The http status code was 200.
#> 1 rows were read from REDCap in 0.1 seconds. The http status code was 200.
#> 0 data access groups were read from REDCap in 0.1 seconds. The http status code was 200.
#> 225 records and 1 columns were read from REDCap in 0.2 seconds. The http status code was 200.
#> Starting to read 225 records at 2025-06-11 22:30:07.9766.
#> Reading batch 1 of 3, with subjects 1 through 100 (ie, 100 unique subject records).
#> 100 records and 26 columns were read from REDCap in 0.2 seconds. The http status code was 200.
#> Reading batch 2 of 3, with subjects 101 through 200 (ie, 100 unique subject records).
#> 100 records and 26 columns were read from REDCap in 0.2 seconds. The http status code was 200.
#> Reading batch 3 of 3, with subjects 201 through 225 (ie, 25 unique subject records).
#> 25 records and 26 columns were read from REDCap in 0.1 seconds. The http status code was 200.
```
The object returned by `get_project_values()` is always a list:
```r
typeof(generated_values)
#> [1] "list"
```
For classic projects like this example, the list always has a length
of 1. This will make more sense when we run `get_project_values()`
against a longitudinal project.
```r
length(generated_values)
#> [1] 1
```
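To see the shape of what came back, for this classic project or for a longitudinal one, a small helper can tabulate the rows and columns of each element. This is a minimal sketch: `summarize_generated_values()` is a hypothetical name, not a redcapfiller function, and it relies only on each element being a data frame (tibbles are data frames).

```r
# Hypothetical helper: one summary row per element of the list returned
# by get_project_values(), giving its row and column counts.
summarize_generated_values <- function(generated_values) {
  data.frame(
    element = seq_along(generated_values),
    rows    = unname(vapply(generated_values, nrow, integer(1))),
    columns = unname(vapply(generated_values, ncol, integer(1)))
  )
}

# Usage with the classic project above (expect a single summary row):
# summarize_generated_values(generated_values)
```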
Each list element is a tibble containing a filled rectangle of data:
```r
generated_values[[1]] |>
  rmarkdown::paged_table()
```
Don’t be distracted by the character data type of the numeric columns; the values still conform to the project design.
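If you want to eyeball the values with their natural types, base R can convert the columns locally. The sketch below uses a made-up sample rather than real redcapfiller output, and the conversion is only for inspection; nothing in this vignette suggests it is needed before writing to REDCap.

```r
# Made-up sample mimicking an all-character element (values are illustrative)
ds <- data.frame(
  record_id = c("1", "2"),
  age       = c("34", "57"),
  weight_kg = c("70.5", "82.1"),
  notes     = c("alpha", "beta"),
  stringsAsFactors = FALSE
)

# utils::type.convert() has a data frame method that converts each column to
# the simplest type that fits; as.is = TRUE keeps text as character, not factor.
ds_typed <- utils::type.convert(ds, as.is = TRUE)

# record_id and age become integer, weight_kg numeric, notes stays character
vapply(ds_typed, function(column) class(column)[1], character(1))
```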
Writing the synthetic data to REDCap requires you to walk the list and
write each list element. This is easy with `purrr::walk()`:
```r
purrr::walk(
  generated_values,
  ~ REDCapR::redcap_write(
    redcap_uri = redcap_uri,
    token = token,
    ds_to_write = .x
  )
)
#> Starting to update 5 records to be written at 2025-06-11 22:30:10.581212.
#> Writing batch 1 of 1, with indices 1 through 5.
#> 5 records were written to REDCap in 0.6 seconds.
```
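`purrr::walk()` discards the value returned by each write. If you want to confirm that every write succeeded, one option is to keep the results and check them. This sketch assumes that `REDCapR::redcap_write()` returns a list with a logical `success` element; `write_generated_values()` is a hypothetical helper, and its `writer` argument exists only so the check can be exercised without a live server.

```r
# Hypothetical helper: write every element, then stop if any write reports failure.
# `writer` takes one data frame and returns a list with a logical `success`
# element (assumed to match what REDCapR::redcap_write() returns).
write_generated_values <- function(generated_values, writer) {
  results <- lapply(generated_values, writer)
  succeeded <- vapply(results, function(result) isTRUE(result$success), logical(1))
  if (!all(succeeded)) {
    stop("Write failed for element(s): ", paste(which(!succeeded), collapse = ", "))
  }
  invisible(results)
}

# Usage against a real project (requires REDCapR, redcap_uri, and token):
# write_generated_values(
#   generated_values,
#   function(ds) REDCapR::redcap_write(ds_to_write = ds, redcap_uri = redcap_uri, token = token)
# )
```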
`redcapfiller` can handle longitudinal projects with any
number of forms, events, and form-event relationships. The code is
exactly the same as before, but the generated data is more complex. In
our longitudinal example, we add five records to a project
with this form-event matrix:
`get_project_values()` detects the longitudinal features of the project
and creates a list element for each of the nine longitudinal events.
Each element contains data for the forms and fields designated for
that event.
```r
library(redcapfiller)
generated_values <- get_project_values(redcap_uri, token)
length(generated_values)
#> [1] 9
```
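With nine elements, it helps to know which event each one fills. This sketch assumes that every element carries a single-valued `redcap_event_name` column, which REDCap requires when importing longitudinal data; `event_of_each_element()` is a hypothetical helper.

```r
# Hypothetical helper: return the event that each list element fills.
# Assumes each element has a redcap_event_name column holding one value.
event_of_each_element <- function(generated_values) {
  vapply(
    generated_values,
    function(ds) {
      events <- as.character(unique(ds$redcap_event_name))
      if (length(events) != 1) stop("Expected exactly one event per element")
      events
    },
    character(1)
  )
}

# Usage: name the list by event for easier browsing
# names(generated_values) <- event_of_each_element(generated_values)
```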
The write operation uses the same code, but this time there are nine writes of five records each, one per event.
```r
purrr::walk(
  generated_values,
  ~ REDCapR::redcap_write(
    redcap_uri = redcap_uri,
    token = token,
    ds_to_write = .x
  )
)
#> Starting to update 5 records to be written at 2025-06-11 22:30:17.368387.
#> Writing batch 1 of 1, with indices 1 through 5.
#> 5 records were written to REDCap in 2.0 seconds.
#> Starting to update 5 records to be written at 2025-06-11 22:30:19.84957.
#> Writing batch 1 of 1, with indices 1 through 5.
#> 5 records were written to REDCap in 1.9 seconds.
#> Starting to update 5 records to be written at 2025-06-11 22:30:22.220468.
#> Writing batch 1 of 1, with indices 1 through 5.
#> 5 records were written to REDCap in 2.0 seconds.
#> Starting to update 5 records to be written at 2025-06-11 22:30:24.703048.
#> Writing batch 1 of 1, with indices 1 through 5.
#> 5 records were written to REDCap in 2.0 seconds.
#> Starting to update 5 records to be written at 2025-06-11 22:30:27.21596.
#> Writing batch 1 of 1, with indices 1 through 5.
#> 5 records were written to REDCap in 1.7 seconds.
#> Starting to update 5 records to be written at 2025-06-11 22:30:29.427052.
#> Writing batch 1 of 1, with indices 1 through 5.
#> 5 records were written to REDCap in 1.9 seconds.
#> Starting to update 5 records to be written at 2025-06-11 22:30:31.859502.
#> Writing batch 1 of 1, with indices 1 through 5.
#> 5 records were written to REDCap in 1.5 seconds.
#> Starting to update 5 records to be written at 2025-06-11 22:30:33.824294.
#> Writing batch 1 of 1, with indices 1 through 5.
#> 5 records were written to REDCap in 0.5 seconds.
#> Starting to update 5 records to be written at 2025-06-11 22:30:34.796597.
#> Writing batch 1 of 1, with indices 1 through 5.
#> 5 records were written to REDCap in 0.7 seconds.
```
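Nine writes of five records each come to 9 × 5 = 45 rows across the project’s events. To review those rows together, a base R sketch can stack the elements into one data frame. Because different events hold different forms, the columns differ between elements, so the hypothetical `stack_generated_values()` fills the missing columns with `NA` (`dplyr::bind_rows()` behaves similarly if you already use dplyr).

```r
# Hypothetical helper: stack all elements into one data frame for review.
# Columns missing from an element (forms not on that event) are filled with NA.
stack_generated_values <- function(generated_values) {
  all_columns <- unique(unlist(lapply(generated_values, names)))
  padded <- lapply(generated_values, function(ds) {
    ds <- as.data.frame(ds)
    for (column in setdiff(all_columns, names(ds))) {
      ds[[column]] <- rep(NA, nrow(ds))
    }
    ds[all_columns]  # same column order in every element
  })
  stacked <- do.call(rbind, padded)
  rownames(stacked) <- NULL
  stacked
}

# Usage: nrow(stack_generated_values(generated_values)) should equal the
# total number of rows across all elements (45 in the example above).
```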
You can specify the number of records to generate with the `number_of_records_to_populate` argument:
```r
library(redcapfiller)
generated_values <- get_project_values(
  redcap_uri,
  token,
  number_of_records_to_populate = 10
)
generated_values[[1]] |>
  nrow()
#> [1] 10
```
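The example above checks only the first element. A quick guard can confirm that every element has the requested number of rows. This sketch assumes that each element holds one row per generated record, which is consistent with the nine writes of five records shown earlier; `check_record_counts()` is a hypothetical helper.

```r
# Hypothetical guard: confirm every element has the requested number of rows.
check_record_counts <- function(generated_values, expected) {
  counts <- vapply(generated_values, nrow, integer(1))
  if (any(counts != expected)) {
    stop("Unexpected row counts: ", paste(counts, collapse = ", "))
  }
  invisible(TRUE)
}

# Usage: check_record_counts(generated_values, expected = 10)
```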
In a longitudinal project, you can pass a vector of unique event names to the `events` argument to fill only those events:
```r
library(redcapfiller)
generated_values <- get_project_values(
  redcap_uri,
  token,
  events = c("screening_arm_1", "week_2_arm_1")
)
length(generated_values)
#> [1] 2
```
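Rather than typing event names by hand, you could derive them from the project’s event metadata. This sketch assumes that the metadata is a data frame with a `unique_event_name` column, as in the `data` element returned by `REDCapR::redcap_event_read()`, and that unique event names end in REDCap’s usual `_arm_<n>` suffix. `events_in_arm()` is a hypothetical helper.

```r
# Hypothetical helper: pick the unique event names that belong to one arm.
# `event_metadata` needs a unique_event_name column (assumed column name).
events_in_arm <- function(event_metadata, arm = 1) {
  event_names <- event_metadata$unique_event_name
  # Anchor the suffix so arm 1 does not also match arm 10
  event_names[grepl(paste0("_arm_", arm, "$"), event_names)]
}

# Usage (requires a live longitudinal project):
# event_metadata <- REDCapR::redcap_event_read(redcap_uri, token)$data
# generated_values <- get_project_values(
#   redcap_uri,
#   token,
#   events = events_in_arm(event_metadata, arm = 1)
# )
```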