| Title: | Robustness and Drift Auditing for Longitudinal Decision Systems |
|---|---|
| Description: | Provides tools for constructing longitudinal decision paths, quantifying temporal drift, tracking subgroup disparity trajectories, and stress-testing longitudinal conclusions under hidden bias. Implements three signature metrics: the Drift Intensity Index (DII), which measures structural instability in transition dynamics using the Frobenius norm of consecutive transition matrix differences; the Bias Amplification Index (BAI), which quantifies whether group disparities widen or converge over time; and the Temporal Fragility Index (TFI), which estimates the minimum hidden-bias perturbation required to nullify a longitudinal trend conclusion. An interactive 'shiny' application supports exploratory analysis, visualization, and reproducible reporting. Methods are motivated by applications in educational and social science research, including the Early Childhood Longitudinal Study (ECLS). The DII is based on the Frobenius norm as described in Golub and Van Loan (2013, ISBN:9781421407944). The TFI extends the hidden-bias sensitivity framework of Rosenbaum (2002, ISBN:9781441912633). The BAI draws on disparity-trajectory methods discussed in Duncan and Murnane (2011, ISBN:9780871542731). |
| Authors: | Subir Hait [aut, cre] (ORCID: <https://orcid.org/0009-0004-9871-9677>) |
| Maintainer: | Subir Hait <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 0.1.1 |
| Built: | 2026-05-23 07:40:22 UTC |
| Source: | https://github.com/causalfragility-lab/robustflow |
For each individual, concatenates their sequence of decisions (or outcomes) over time into a single path string. Returns individual-level paths, aggregate frequency counts, the pooled transition matrix, and path entropy.
build_paths(data, id, time, decision, sep = "->")
data |
A data frame in long format, pre-sorted by |
id |
Character scalar. Name of the individual identifier variable. |
time |
Character scalar. Name of the time variable. |
decision |
Character scalar. Name of the decision or outcome variable. |
sep |
Character scalar. Separator inserted between consecutive decision
states in the path string. Default |
A named list with the following elements:
individual_paths: Data frame with one row per individual and columns id and path.
path_counts: Data frame of unique paths with columns path, n (frequency), and pct (percentage), sorted in descending order of frequency.
transition_matrix: Integer matrix of pooled transition counts (rows = from-state, columns = to-state).
path_entropy: Numeric scalar. Shannon entropy (bits) of the path frequency distribution. Higher values indicate greater diversity of individual trajectories.
df <- data.frame(
  id = rep(1:4, each = 3),
  time = rep(1:3, times = 4),
  dec = c(0L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 0L, 0L)
)
result <- build_paths(df, id = "id", time = "time", decision = "dec")
head(result$path_counts)
result$transition_matrix
result$path_entropy
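The path strings and the entropy value described above can be reproduced by hand in base R. The following is a minimal sketch, not the package's implementation; the object names `paths`, `p`, and `path_entropy_bits` are hypothetical and chosen only for illustration.

```r
# Same toy panel as in the example above: 4 individuals, 3 waves each.
df <- data.frame(
  id   = rep(1:4, each = 3),
  time = rep(1:3, times = 4),
  dec  = c(0L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 0L, 0L)
)

# Sort by individual, then time, so decisions are concatenated in order.
df <- df[order(df$id, df$time), ]

# One path string per individual, e.g. "0->1->1".
paths <- tapply(df$dec, df$id, paste, collapse = "->")

# Relative frequency of each distinct path.
p <- as.numeric(table(paths)) / length(paths)

# Shannon entropy in bits (log base 2).
path_entropy_bits <- -sum(p * log2(p))
```

In this toy panel the four individuals follow four different paths, so the path distribution is uniform over four outcomes and the entropy equals log2(4) = 2 bits.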
The BAI measures whether the disparity gap between two groups widens or narrows from the first to the last observed time point. Writing $G_t$ for the gap at time point $t$ (with $t = 1, \dots, T$), it is defined as:

$$\mathrm{BAI} = G_T - G_1$$
compute_bai(gap_series, standardize = FALSE, threshold = 0.05)
gap_series |
Numeric vector of gap values, one per time point, ordered
chronologically. |
standardize |
Logical. If |
threshold |
Numeric scalar. Absolute BAI threshold used to classify
the direction as amplification or convergence. Values with
|
A positive BAI indicates amplification (widening gap); a negative BAI indicates convergence (narrowing gap); values near zero indicate stability.
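An optional standardized version divides by the standard deviation of the gap series:

$$\mathrm{BAI}^{*} = \frac{G_T - G_1}{\mathrm{SD}(G)}$$

where $G_1$ and $G_T$ are the gaps at the first and last time points and $\mathrm{SD}(G)$ is the standard deviation of the gap series.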
A named list with the following elements:
bai: Numeric scalar. The (optionally standardized) BAI.
gap_start: Gap at the first time point.
gap_end: Gap at the last time point.
direction: Character. "amplification", "convergence", or "stable".
# Widening gap over 5 waves
gaps <- c(0.10, 0.12, 0.15, 0.18, 0.22)
compute_bai(gaps)

# Narrowing gap
compute_bai(c(0.20, 0.15, 0.10, 0.05))

# Standardized
compute_bai(gaps, standardize = TRUE)
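To make the arithmetic concrete, here is a minimal base R sketch of a first-to-last gap comparison. It rests on two assumptions that are not confirmed by the text: that the BAI is the last gap minus the first gap, and that the direction label comes from a symmetric rule around `threshold`. The function name `bai_sketch` is hypothetical and is not part of the package.

```r
# Sketch only: assumes BAI = last gap - first gap, and a symmetric
# threshold rule for labelling the direction (both are assumptions).
bai_sketch <- function(gap_series, standardize = FALSE, threshold = 0.05) {
  stopifnot(is.numeric(gap_series), length(gap_series) >= 2)
  gap_start <- gap_series[1]
  gap_end   <- gap_series[length(gap_series)]
  bai <- gap_end - gap_start
  if (standardize) {
    s <- stats::sd(gap_series)
    # A constant series has zero spread; leave the value unscaled then.
    if (s > 0) bai <- bai / s
  }
  direction <- if (bai > threshold) {
    "amplification"
  } else if (bai < -threshold) {
    "convergence"
  } else {
    "stable"
  }
  list(bai = bai, gap_start = gap_start, gap_end = gap_end,
       direction = direction)
}

widening  <- bai_sketch(c(0.10, 0.12, 0.15, 0.18, 0.22))
narrowing <- bai_sketch(c(0.20, 0.15, 0.10, 0.05))
```

Under these assumptions the widening series is labelled as amplification and the narrowing series as convergence, matching the interpretation given for positive and negative BAI values.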
The DII quantifies structural instability in the decision transition system between consecutive time periods. For each pair of adjacent time points $(t-1, t)$, a period-specific transition matrix $P_t$ is estimated from observed consecutive-state pairs, and the DII is defined as the Frobenius norm of the difference between consecutive period matrices:

$$\mathrm{DII}_t = \lVert P_t - P_{t-1} \rVert_F$$
compute_drift(data, id, time, decision, normalize = TRUE)
data |
A data frame in long format. |
id |
Character scalar. Name of the individual identifier variable. |
time |
Character scalar. Name of the time variable. |
decision |
Character scalar. Name of the decision or outcome variable. |
normalize |
Logical. If |
When normalize = TRUE (default), each matrix is row-normalized before
computing the norm, so DII is scale-free and comparable across datasets.
The period-specific transition matrix $P_t$ is constructed from transitions observed between time $t-1$ and time $t$ only (not cumulatively). Individuals present at both $t-1$ and $t$ contribute one transition pair. Individuals missing at either wave are excluded from that period's matrix.
A named list with the following elements:
summary: Data frame with columns time and DII. The first time point always has DII = NA (no preceding period).
matrices: Named list of period-specific transition matrices (one per time interval).
mean_dii: Numeric scalar. Mean DII across all non-missing periods.
max_dii_period: The time value at which DII is largest.
Hait, S. (2025). RobustFlow: Robustness and drift auditing for longitudinal decision systems. R package version 0.1.0.
set.seed(42)
df <- data.frame(
  id = rep(seq_len(50), each = 4),
  time = rep(seq_len(4), times = 50),
  dec = sample(0:1, 200, replace = TRUE)
)
result <- compute_drift(df, id = "id", time = "time", decision = "dec")
result$summary
result$mean_dii
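The construction described above (one row-normalized transition matrix per interval, then the Frobenius norm of consecutive differences) can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: the helper `period_matrix` and the objects `mats` and `dii` are hypothetical names, and the alignment of each value to a row of the `summary` data frame is not reproduced here.

```r
set.seed(42)
df <- data.frame(
  id   = rep(seq_len(50), each = 4),
  time = rep(seq_len(4), times = 50),
  dec  = sample(0:1, 200, replace = TRUE)
)

states <- sort(unique(df$dec))
times  <- sort(unique(df$time))

# Row-normalized transition matrix for one interval (t_from -> t_to),
# using only individuals observed at both waves.
period_matrix <- function(data, t_from, t_to) {
  from  <- data[data$time == t_from, c("id", "dec")]
  to    <- data[data$time == t_to,   c("id", "dec")]
  pairs <- merge(from, to, by = "id", suffixes = c("_from", "_to"))
  counts <- table(factor(pairs$dec_from, levels = states),
                  factor(pairs$dec_to,   levels = states))
  m <- matrix(as.numeric(counts), nrow = length(states),
              dimnames = list(as.character(states), as.character(states)))
  row_totals <- rowSums(m)
  # Guard against empty rows so they stay at zero instead of NaN.
  m / ifelse(row_totals == 0, 1, row_totals)
}

# One matrix per interval between adjacent time points.
mats <- lapply(seq_len(length(times) - 1), function(k) {
  period_matrix(df, times[k], times[k + 1])
})

# Frobenius norm of the difference between consecutive period matrices.
dii <- vapply(seq_len(length(mats) - 1), function(k) {
  norm(mats[[k + 1]] - mats[[k]], type = "F")
}, numeric(1))
```

With four waves there are three interval matrices and therefore two consecutive differences. Because each matrix is row-normalized, the values do not grow with the number of individuals.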
Aggregates the focal-event rate for each group at each time point and computes the pairwise gap between the first two levels of the group variable (sorted alphabetically).
compute_group_gaps(data, time, decision, group, focal_value = 1)
data |
A data frame in long format. |
time |
Character scalar. Name of the time variable. |
decision |
Character scalar. Name of the decision or outcome variable. |
group |
Character scalar. Name of the grouping variable. |
focal_value |
Numeric or character scalar. The decision value treated
as the "event" when computing group rates (e.g., |
A named list with the following elements:
long_format: Data frame with columns time, group, and rate (proportion of the focal event within each group-time cell).
gap_df: Data frame with columns time and gap (Group 1 rate minus Group 2 rate, where groups are the first two alphabetically sorted levels). gap is NA when fewer than two group levels are present.
gap: Numeric vector of gap values ordered by time (convenience accessor for compute_bai()).
group_levels: Character vector of all group levels found.
set.seed(1)
df <- data.frame(
  id = rep(seq_len(60), each = 3),
  time = rep(seq_len(3), times = 60),
  dec = sample(0:1, 180, replace = TRUE),
  grp = rep(c("Low", "High"), each = 90)
)
gaps <- compute_group_gaps(
  data = df,
  time = "time",
  decision = "dec",
  group = "grp",
  focal_value = 1
)
gaps$gap_df
gaps$group_levels
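The group-by-time rates and the resulting gap can be checked by hand with `tapply()`. The sketch below is a base R illustration, not the package's implementation; `rates`, `lev`, and `gap` are hypothetical object names.

```r
set.seed(1)
df <- data.frame(
  id   = rep(seq_len(60), each = 3),
  time = rep(seq_len(3), times = 60),
  dec  = sample(0:1, 180, replace = TRUE),
  grp  = rep(c("Low", "High"), each = 90)
)
focal_value <- 1

# Focal-event rate in every time-by-group cell (rows = time, columns = group).
rates <- tapply(df$dec == focal_value, list(df$time, df$grp), mean)

# First two group levels in alphabetical order: here "High", then "Low".
lev <- sort(colnames(rates))

# Gap = rate of the first level minus rate of the second level, per time point.
gap <- rates[, lev[1]] - rates[, lev[2]]
```

Note the sign convention this implies: with levels "High" and "Low", the gap is the "High" rate minus the "Low" rate, because "High" sorts first alphabetically.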
The TFI estimates the minimum amount of hidden bias (modeled as a scalar attenuation parameter) required to nullify a longitudinal trend conclusion. The observed trend is summarized as the OLS slope of effect_series on the time index, and an adjusted slope is computed under each perturbation value in perturb_seq.
compute_tfi_simple(effect_series, perturb_seq = seq(0, 2, by = 0.01))
effect_series |
Numeric vector of observed effects over time
(e.g., DII values, gap values). The trend is estimated as the slope of
a simple OLS regression of |
perturb_seq |
Numeric vector of perturbation values to evaluate.
Must be non-negative. Defaults to |
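The TFI is the smallest perturbation value at which the adjusted slope reaches or crosses zero (falls to zero or below for positive observed slopes, rises to zero or above for negative observed slopes). If no such value exists within perturb_seq, TFI is returned as Inf, indicating a highly robust conclusion.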
This is an intentionally accessible, first-generation operationalization of temporal robustness. Future versions will support perturbation models based on E-values, ITCV (impact threshold for a confounding variable), and simulation-based tipping-point approaches.
A named list with the following elements:
tfi: Numeric scalar. The minimum perturbation that nullifies the trend, or Inf if none in perturb_seq does so.
observed_slope: Numeric scalar. OLS slope of effect_series on the time index.
sensitivity_curve: Data frame with columns perturbation and adjusted_effect (the slope under each perturbation value).
summary_table: One-row data frame with columns Metric and Value, summarizing the observed slope, TFI, and interpretation.
Hait, S. (2025). RobustFlow: Robustness and drift auditing for longitudinal decision systems. R package version 0.1.0.
# Upward drift trend - moderately robust
dii_vals <- c(0.05, 0.10, 0.14, 0.19, 0.25)
result <- compute_tfi_simple(dii_vals)
result$tfi
result$summary_table

# Flat trend - TFI is 0
compute_tfi_simple(c(0.1, 0.1, 0.1, 0.1))$tfi
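The general shape of a tipping-point search over perturb_seq can be sketched in base R. The observed slope below is a plain OLS fit; the attenuation rule `slope * (1 - u)` is a placeholder assumption chosen for illustration and is not a formula taken from the package. The names `attenuate` and `tfi_sketch` are hypothetical.

```r
dii_vals   <- c(0.05, 0.10, 0.14, 0.19, 0.25)
time_index <- seq_along(dii_vals)

# Observed trend: OLS slope of the effect series on the time index.
observed_slope <- unname(coef(lm(dii_vals ~ time_index))[2])

# Hypothetical attenuation rule (an assumption for illustration only).
attenuate <- function(slope, u) slope * (1 - u)

perturb_seq <- seq(0, 2, by = 0.01)
adjusted    <- attenuate(observed_slope, perturb_seq)

# First perturbation at which the adjusted slope no longer keeps its sign.
nullified  <- if (observed_slope > 0) adjusted <= 0 else adjusted >= 0
tfi_sketch <- if (any(nullified)) perturb_seq[which(nullified)[1]] else Inf

# Adjusted slope at every perturbation value, for plotting or inspection.
sensitivity_curve <- data.frame(perturbation    = perturb_seq,
                                adjusted_effect = adjusted)
```

Under this particular multiplicative rule the tipping point sits near a perturbation of 1 for any nonzero slope, so the rule serves only as a stand-in. The search logic (evaluate the adjusted slope on a grid, take the first value that nullifies the trend, fall back to Inf) carries over to other perturbation models.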
Pools all consecutive decision pairs (from time $t$ to $t+1$) across all individuals and returns a matrix of transition counts.
compute_transition_matrix_all(data, id, time, decision)
data |
A data frame in long format. |
id |
Character scalar. Individual identifier variable name. |
time |
Character scalar. Time variable name. |
decision |
Character scalar. Decision variable name. |
An integer matrix of transition counts with named rows (from-state)
and columns (to-state). Returns a matrix if no
transitions can be extracted.
df <- data.frame(
  id = rep(1:3, each = 3),
  time = rep(1:3, times = 3),
  dec = c(0L, 1L, 1L, 1L, 0L, 1L, 0L, 0L, 1L)
)
compute_transition_matrix_all(df, "id", "time", "dec")
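Pooled transition counts of this kind can be tallied by hand in base R. The sketch below is an illustration rather than the package's implementation. It assumes every individual is observed at consecutive waves with no gaps, so that adjacent rows within an id form a transition; `from`, `to`, and `pooled` are hypothetical names.

```r
df <- data.frame(
  id   = rep(1:3, each = 3),
  time = rep(1:3, times = 3),
  dec  = c(0L, 1L, 1L, 1L, 0L, 1L, 0L, 0L, 1L)
)
df <- df[order(df$id, df$time), ]

n <- nrow(df)
# A row and its successor form a transition only if they share an id.
same_id <- df$id[-1] == df$id[-n]
from <- df$dec[-n][same_id]
to   <- df$dec[-1][same_id]

# Fix the state set so that unobserved transitions appear as zero counts.
states <- sort(unique(df$dec))
pooled <- table(from = factor(from, levels = states),
                to   = factor(to,   levels = states))
```

Three individuals with three waves each yield six transitions in total, three of which move from state 0 to state 1.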
Writes a self-contained R script to output_file that replicates the
analysis performed in the RobustFlow Shiny application with the given
variable mappings. The generated script is intended as a starting point
for users who want to reproduce or extend the app analysis programmatically.
generate_r_script(
  id_var,
  time_var,
  decision_var,
  group_var = NULL,
  cluster_var = NULL,
  focal_value = 1,
  output_file
)
id_var |
Character scalar. ID variable name. |
time_var |
Character scalar. Time variable name. |
decision_var |
Character scalar. Decision variable name. |
group_var |
Character scalar or |
cluster_var |
Character scalar or |
focal_value |
Numeric scalar. Focal decision value for gap computation.
Default |
output_file |
Character scalar. Path to write the |
Invisibly returns output_file.
tmp <- tempfile(fileext = ".R")
generate_r_script(
  id_var = "child_id",
  time_var = "wave",
  decision_var = "risk_math",
  group_var = "ses_group",
  focal_value = 1,
  output_file = tmp
)

# Show first 10 lines
cat(readLines(tmp, n = 10), sep = "\n")
unlink(tmp)
Opens the interactive RobustFlow application in a browser window. The app provides a seven-tab workflow for uploading panel data, constructing decision paths, diagnosing temporal drift, tracking subgroup disparities, auditing robustness, identifying intervention points, and exporting reproducible reports.
run_app(
  onStart = NULL,
  options = list(),
  enableBookmarking = NULL,
  uiPattern = "/",
  ...
)
onStart |
A function to call before the app is started. Passed to
|
options |
Named list of options passed to |
enableBookmarking |
Enable bookmarking. See |
uiPattern |
A regular expression matching the URL paths for which the Shiny UI is rendered. |
... |
Additional arguments passed to |
Invisibly returns the Shiny app object (class "shiny.appobj").
if (interactive()) {
  run_app()
}
Checks that required variables exist in data, sorts the data by
individual and time, and returns a structured list with diagnostic
information about the panel.
validate_panel_data(data, id, time, decision, group = NULL, cluster = NULL)
data |
A data frame in long format (one row per individual per time point). |
id |
Character scalar. Name of the individual identifier variable. |
time |
Character scalar. Name of the time variable. |
decision |
Character scalar. Name of the decision or outcome variable. |
group |
Character scalar or |
cluster |
Character scalar or |
A named list with the following elements:
data: Sorted data frame (by id, then time).
n_ids: Integer. Number of unique individuals.
n_times: Integer. Number of unique time points.
balanced: Logical. TRUE if every individual appears at every time point (balanced panel).
missingness: Named integer vector. Count of NA values for each required variable.
df <- data.frame(
  child_id = rep(1:5, each = 3),
  wave = rep(1:3, times = 5),
  outcome = sample(0:1, 15, replace = TRUE)
)
result <- validate_panel_data(
  data = df,
  id = "child_id",
  time = "wave",
  decision = "outcome"
)
result$n_ids
result$balanced
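The diagnostics listed in the return value can be approximated with a few base R calls. The sketch below only illustrates what each element means; it is not the package's implementation. The seed and the object names (`required`, `sorted`, `balanced`, `missingness`) are hypothetical additions.

```r
set.seed(123)  # hypothetical seed, added so the sketch is reproducible
df <- data.frame(
  child_id = rep(1:5, each = 3),
  wave     = rep(1:3, times = 5),
  outcome  = sample(0:1, 15, replace = TRUE)
)
required <- c("child_id", "wave", "outcome")

# Required variables must exist before anything else is computed.
stopifnot(all(required %in% names(df)))

# Sort by individual, then time.
sorted  <- df[order(df$child_id, df$wave), ]
n_ids   <- length(unique(sorted$child_id))
n_times <- length(unique(sorted$wave))

# Balanced: every id-by-wave cell holds exactly one row.
balanced <- all(table(sorted$child_id, sorted$wave) == 1)

# NA count per required variable, as a named integer vector.
missingness <- vapply(sorted[required], function(x) sum(is.na(x)), integer(1))
```

Dropping any single row from `df` would leave one id-by-wave cell empty, and the same check would then report an unbalanced panel.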