| Title: | Modeling Achievement Gap Trajectories with Hierarchical Penalized Splines |
|---|---|
| Description: | Implements a hierarchical penalized spline framework for estimating achievement gap trajectories in longitudinal educational data. The achievement gap between two groups (e.g., low versus high socioeconomic status) is modeled directly as a smooth function of grade while the baseline trajectory is estimated simultaneously within a mixed-effects model. Smoothing parameters are selected using restricted maximum likelihood (REML), and simultaneous confidence bands with correct joint coverage are constructed using posterior simulation. The package also includes functions for simulation-based benchmarking, visualization of gap trajectories, and hypothesis testing for global and grade-specific differences. The modeling framework builds on penalized spline methods (Eilers and Marx, 1996, <doi:10.1214/ss/1038425655>) and generalized additive modeling approaches (Wood, 2017, <doi:10.1201/9781315370279>), with uncertainty quantification following Marra and Wood (2012, <doi:10.1111/j.1467-9469.2011.00760.x>). |
| Authors: | Subir Hait [aut, cre] (ORCID: <https://orcid.org/0009-0004-9871-9677>) |
| Maintainer: | Subir Hait <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 0.1.0 |
| Built: | 2026-06-04 06:42:52 UTC |
| Source: | https://github.com/causalfragility-lab/achievegap |
Convenience wrapper around gap_trajectory() that provides a simple
formula interface: score ~ grade. The group indicator and nested
random effects are supplied via group and random.
achieve_gap(
  formula,
  group = NULL,
  random = ~1 | school/student,
  data,
  k = 6,
  bs = "cr",
  n_sim = 10000,
  conf_level = 0.95,
  grade_grid = NULL,
  verbose = TRUE
)
| Argument | Description |
|---|---|
| formula | A two-sided formula of the form score ~ grade. |
| group | A single character string naming the binary group variable (0/1, FALSE/TRUE, or 2-level factor) indicating reference vs focal group. |
| random | Random intercept structure in lme4-style notation. Currently only nested intercepts are supported; the default nests students within schools, as shown in the usage above. |
| data | A data.frame containing all variables. |
| k | Basis dimension of the smooth terms. Default is 6. |
| bs | Spline basis type. Default is "cr". |
| n_sim | Number of posterior simulations used for simultaneous bands. Default is 10000. |
| conf_level | Confidence level for bands (e.g., 0.95). |
| grade_grid | Optional numeric vector of grades/measurement occasions at which to evaluate trajectories. |
| verbose | Logical; if TRUE, prints a compact model summary message. |
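The group argument accepts a 0/1, FALSE/TRUE, or two-level factor variable. A minimal sketch of how such inputs could be normalized to a two-level factor is shown below. The helper as_group_factor() is hypothetical and not part of achieveGap, and it assumes the reference group is the first sorted level, which may differ from the package's own rule.

```r
# Hypothetical helper (not part of achieveGap): coerce a binary group
# variable to a two-level factor whose first level is treated as reference.
# Assumption: levels are ordered by R's default sort order.
as_group_factor <- function(x) {
  if (is.logical(x)) x <- as.integer(x)          # FALSE/TRUE -> 0/1
  f <- if (is.factor(x)) droplevels(x) else factor(x)
  if (nlevels(f) != 2L) {
    stop("group must have exactly two levels", call. = FALSE)
  }
  f
}

g_logical <- as_group_factor(c(TRUE, FALSE, TRUE))
g_numeric <- as_group_factor(c(0, 1, 1, 0))
g_factor  <- as_group_factor(factor(c("high", "low", "low")))

levels(g_logical)  # first level = reference, second = focal
levels(g_factor)

# A three-level variable is rejected.
bad <- tryCatch(as_group_factor(c("a", "b", "c")),
                error = function(e) conditionMessage(e))
```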
An object of class "achieveGap" as returned by gap_trajectory().
sim <- simulate_gap(n_students = 200, n_schools = 20, seed = 1)
fit <- achieve_gap(
  score ~ grade,
  group = "SES_group",
  random = ~ 1 | school/student,
  data = sim$data,
  n_sim = 500,
  verbose = FALSE
)
summary(fit)
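To illustrate how a formula interface of this kind can be mapped onto the character-name arguments of gap_trajectory(), the sketch below extracts variable names with base R's all.vars(). The helper parse_gap_formula() is hypothetical; the package's internal parsing may work differently.

```r
# Hypothetical helper (not part of achieveGap): pull the outcome, grade,
# school, and student names out of the two formulas.
parse_gap_formula <- function(formula, random = ~ 1 | school/student) {
  stopifnot(inherits(formula, "formula"), length(formula) == 3L)
  vars <- all.vars(formula)            # e.g. c("score", "grade")
  if (length(vars) != 2L) {
    stop("expected a formula of the form score ~ grade", call. = FALSE)
  }
  ids <- all.vars(random)              # e.g. c("school", "student")
  if (length(ids) != 2L) {
    stop("expected random = ~ 1 | school/student", call. = FALSE)
  }
  list(score = vars[1], grade = vars[2], school = ids[1], student = ids[2])
}

parsed <- parse_gap_formula(test_score ~ grade, ~ 1 | school/student)
str(parsed)
```

The resulting names could then be passed on as the score, grade, school, and student arguments.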
The achieveGap package provides a joint hierarchical penalized spline framework for estimating achievement gap trajectories in longitudinal educational data. The gap between two groups (e.g., low vs. high socioeconomic status) is parameterized directly as a smooth function of grade, estimated simultaneously with the baseline trajectory within a mixed effects model. Smoothing parameters are selected via restricted maximum likelihood (REML), and simultaneous confidence bands with correct joint coverage are constructed via posterior simulation.
| Function | Description |
|---|---|
| gap_trajectory | Fit the joint hierarchical spline model. |
| plot.achieveGap | Plot the estimated gap trajectory. |
| summary.achieveGap | Tabular summary of estimates. |
| test_gap | Hypothesis tests for the gap trajectory. |
| fit_separate | Separate-model benchmark. |
| simulate_gap | Synthetic data generator. |
| run_simulation | Benchmark simulation study. |
Eilers & Marx (1996); Marra & Wood (2012); Wood (2017); Raudenbush & Bryk (2002).
Useful links:
Report bugs at https://github.com/causalfragility-lab/achieveGap/issues
Fits independent penalized spline mixed models to each group and computes the achievement gap as a post hoc difference between fitted curves. Pointwise standard errors are computed via a naive delta method assuming independence between the two fitted smooths:
$$\widehat{\mathrm{SE}}\{\hat{g}(t)\} = \sqrt{\widehat{\mathrm{SE}}_{\mathrm{ref}}(t)^2 + \widehat{\mathrm{SE}}_{\mathrm{focal}}(t)^2}$$
This is included for benchmarking against the proposed joint model gap_trajectory.
fit_separate(
  data,
  score,
  grade,
  group,
  school,
  student,
  k = 6,
  bs = "cr",
  conf_level = 0.95,
  grade_grid = NULL,
  verbose = TRUE
)
| Argument | Description |
|---|---|
| data | A data frame in long format. |
| score | Character string. Name of the outcome variable. |
| grade | Character string. Name of the grade/time variable. |
| group | Character string. Name of the binary group indicator. |
| school | Character string. Name of the school ID variable. |
| student | Character string. Name of the student ID variable. |
| k | Integer. Number of spline basis functions. Default is 6. |
| bs | Character string. Spline basis type. Default is "cr". |
| conf_level | Numeric. Confidence level for intervals. Default is 0.95. |
| grade_grid | Numeric vector. Evaluation grid for the gap function. Defaults to 100 equally spaced points across the observed grade range. |
| verbose | Logical. Print progress. Default is TRUE. |
This function fits two separate models and subtracts fitted values. Because the two fits are obtained from disjoint subsets, the resulting uncertainty quantification is not directly comparable to the joint-model simultaneous bands (and can be inefficient for gap inference). It is provided as a simple baseline/benchmark.
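The post hoc subtraction and independence-based standard error can be sketched in a few lines of base R. The fitted values and standard errors below are made-up numbers standing in for predictions from the two per-group models; only the arithmetic is the point.

```r
# Hypothetical per-group predictions on a shared grade grid
# (illustrative numbers, not output from fit_separate()).
grade_grid <- seq(0, 7, length.out = 5)
fit_ref    <- c(0.00, 0.50, 0.95, 1.30, 1.60)   # reference-group curve
fit_focal  <- c(-0.20, 0.20, 0.50, 0.70, 0.85)  # focal-group curve
se_ref     <- c(0.05, 0.04, 0.04, 0.05, 0.07)
se_focal   <- c(0.06, 0.05, 0.05, 0.06, 0.08)
conf_level <- 0.95

# Gap as reference minus focal, with variances added under independence.
gap_hat <- fit_ref - fit_focal
gap_se  <- sqrt(se_ref^2 + se_focal^2)

# Pointwise normal-theory interval.
z        <- qnorm(1 - (1 - conf_level) / 2)
ci_lower <- gap_hat - z * gap_se
ci_upper <- gap_hat + z * gap_se

data.frame(grade = grade_grid, gap_hat, gap_se, ci_lower, ci_upper)
```

Because these intervals are pointwise, they do not carry the joint coverage guarantee of a simultaneous band.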
A named list with eight elements: grade_grid (numeric
evaluation grid); gap_hat (estimated gap: reference minus focal);
gap_se (delta-method pointwise standard errors); ci_lower
and ci_upper (pointwise confidence bounds); mod_ref and
mod_focal (fitted mgcv::gamm objects for each group); and
group_levels (character vector c(reference, focal)).
sim <- simulate_gap(n_students = 300, n_schools = 25, seed = 42)
sep <- fit_separate(
  data = sim$data,
  score = "score",
  grade = "grade",
  group = "SES_group",
  school = "school",
  student = "student"
)
head(sep$gap_hat)
Fits a joint mixed-effects spline model in which the achievement gap between two groups is modeled directly as a smooth function of grade or time. The baseline trajectory and the group contrast trajectory are estimated simultaneously using penalized regression splines with restricted maximum likelihood (REML) smoothing parameter selection. Simultaneous confidence bands are constructed by posterior simulation from the approximate sampling distribution of the spline coefficients.
gap_trajectory(
  data,
  score,
  grade,
  group,
  school,
  student,
  covariates = NULL,
  k = 6,
  bs = "cr",
  n_sim = 10000,
  conf_level = 0.95,
  grade_grid = NULL,
  verbose = TRUE
)
| Argument | Description |
|---|---|
| data | A data frame in long format containing one row per observation. |
| score | Character string giving the outcome variable name. |
| grade | Character string giving the numeric grade or time variable name. |
| group | Character string giving the binary group indicator variable name. |
| school | Character string giving the school identifier variable name. |
| student | Character string giving the student identifier variable name. |
| covariates | Optional character vector of additional covariate names. |
| k | Integer basis dimension for each smooth term. Must be smaller than the number of unique observed grade values. Default is 6. |
| bs | Character string giving the spline basis type. Default is "cr". |
| n_sim | Integer number of posterior draws used to construct simultaneous confidence bands. Default is 10000. |
| conf_level | Numeric confidence level for pointwise and simultaneous intervals. Default is 0.95. |
| grade_grid | Optional numeric vector giving the grid of grade values at which the fitted gap trajectory is evaluated. If NULL (the default), a default grid is used. |
| verbose | Logical. If TRUE, prints progress messages. Default is TRUE. |
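Since k must be smaller than the number of unique observed grade values, it can be useful to check this before fitting. The sketch below is a hypothetical pre-flight check in base R, not a function exported by the package.

```r
# Hypothetical pre-flight check (not part of achieveGap): the basis
# dimension k must be strictly below the number of distinct grades.
check_basis_dim <- function(grade, k) {
  n_unique <- length(unique(grade))
  if (k >= n_unique) {
    stop(sprintf("k = %d must be smaller than the %d unique grade values",
                 k, n_unique), call. = FALSE)
  }
  invisible(TRUE)
}

grade <- rep(0:7, times = 10)     # eight distinct measurement occasions

ok <- check_basis_dim(grade, k = 6)   # passes: 6 < 8

# k = 8 is rejected because there are only eight distinct grades.
too_big <- tryCatch(check_basis_dim(grade, k = 8),
                    error = function(e) conditionMessage(e))
too_big
```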
The estimated gap is the contrast between the two group trajectories,
where the reference group is the first observed level of group and the
focal group is the second observed level.
An object of class "achieveGap" containing the estimated gap
trajectory, pointwise and simultaneous confidence bands, fitted model
objects, and supporting metadata.
sim <- simulate_gap(n_students = 20, n_schools = 5, seed = 1)
fit <- gap_trajectory(
  data = sim$data,
  score = "score",
  grade = "grade",
  group = "SES_group",
  school = "school",
  student = "student",
  k = 5,
  n_sim = 200,
  verbose = FALSE
)
summary(fit)
plot(fit)
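The general idea behind simulation-based simultaneous bands can be sketched in base R. The example below is a generic illustration under stated assumptions: X is a small polynomial design matrix standing in for the spline basis evaluated on the grid, and beta_hat and V are made-up coefficient estimates and their covariance. It is not the package's internal code.

```r
set.seed(1)

# Hypothetical inputs (assumptions for illustration only).
grid       <- seq(0, 7, length.out = 50)
X          <- cbind(1, grid, grid^2)           # stand-in for a spline basis
beta_hat   <- c(0.2, 0.15, -0.01)
V          <- diag(c(0.01, 0.001, 0.00005))    # coefficient covariance
n_sim      <- 2000
conf_level <- 0.95

# Point estimate and pointwise standard error: sqrt(diag(X V X')).
gap_hat <- drop(X %*% beta_hat)
gap_se  <- sqrt(rowSums((X %*% V) * X))

# Draw coefficient deviations from N(0, V) via the Cholesky factor.
R   <- chol(V)                                  # V = t(R) %*% R
Z   <- matrix(rnorm(n_sim * length(beta_hat)), nrow = n_sim)
dev <- Z %*% R                                  # each row ~ N(0, V)

# Maximum absolute standardized deviation over the grid, per draw.
sim_dev <- X %*% t(dev)                         # grid points x draws
max_abs <- apply(abs(sim_dev) / gap_se, 2, max)

# The critical value replaces the usual normal quantile.
crit      <- quantile(max_abs, conf_level, names = FALSE)
z         <- qnorm(1 - (1 - conf_level) / 2)
pw_lower  <- gap_hat - z * gap_se
pw_upper  <- gap_hat + z * gap_se
sim_lower <- gap_hat - crit * gap_se
sim_upper <- gap_hat + crit * gap_se
```

The simultaneous critical value exceeds the pointwise normal quantile, so the simultaneous band is wider at every grid point.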
Plot the estimated achievement gap trajectory with pointwise and/or simultaneous confidence bands.
## S3 method for class 'achieveGap'
plot(
  x,
  band = c("both", "simultaneous", "pointwise"),
  true_gap = NULL,
  grade_labels = NULL,
  title = NULL,
  ...
)
| Argument | Description |
|---|---|
| x | An object of class "achieveGap". |
| band | Which band(s) to display: "both", "simultaneous", or "pointwise". |
| true_gap | Optional numeric vector of the same length as the evaluation grid. |
| grade_labels | Optional character labels for the x-axis tick marks, for example a named character vector mapping numeric grade values to labels. |
| title | Optional plot title. |
| ... | Additional arguments (ignored). |
A ggplot2 object.
Print Method for achieveGap Objects
## S3 method for class 'achieveGap'
print(x, ...)
| Argument | Description |
|---|---|
| x | An object of class "achieveGap". |
| ... | Additional arguments (ignored). |
Invisibly returns x.
Runs a structured simulation study comparing the proposed joint spline model
(gap_trajectory) against (1) a linear growth model and
(2) separate splines with post hoc subtraction (fit_separate).
Computes RMSE, bias, simultaneous band coverage, and pointwise coverage.
run_simulation(
  n_reps = 100,
  conditions = NULL,
  k = 6,
  n_sim = 3000,
  alpha = 0.05,
  seed = NULL,
  verbose = TRUE
)
| Argument | Description |
|---|---|
| n_reps | Integer. Number of simulation replications. Default is 100. |
| conditions | A list of named lists specifying simulation conditions. If NULL (the default), a default set of conditions is used. |
| k | Integer. Spline basis dimension. Default is 6. |
| n_sim | Integer. Posterior draws for simultaneous bands in the joint model. Default is 3000. |
| alpha | Numeric. Significance level used only for linear-model pointwise intervals; default is 0.05 (95% CI). |
| seed | Integer or NULL. Random seed. Default is NULL. |
| verbose | Logical. Print progress. Default is TRUE. |
A data.frame with one row per replication-condition containing RMSE, bias, and coverage metrics for each method.
simulate_gap, gap_trajectory, fit_separate
results <- run_simulation(n_reps = 5, seed = 1)
summarize_simulation(results)
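For a single replication, the reported metrics can be computed from an estimated gap curve, the true gap, and the band limits. The helper gap_metrics() below is a hypothetical sketch in base R with made-up inputs. The exact definitions used inside run_simulation() may differ, for example in how coverage is aggregated over replications.

```r
# Hypothetical helper (not part of achieveGap): per-replication metrics.
gap_metrics <- function(gap_hat, true_gap, pw_lower, pw_upper,
                        sim_lower, sim_upper) {
  err <- gap_hat - true_gap
  list(
    rmse = sqrt(mean(err^2)),
    bias = mean(err),
    # Share of grid points whose pointwise interval contains the truth.
    pointwise_coverage = mean(true_gap >= pw_lower & true_gap <= pw_upper),
    # Simultaneous coverage: the whole curve must lie inside the band.
    simultaneous_covered = all(true_gap >= sim_lower & true_gap <= sim_upper)
  )
}

# Made-up numbers for illustration.
true_gap <- c(0.2, 0.3, 0.4, 0.5)
gap_hat  <- true_gap + c(0.02, -0.06, 0.02, 0.06)

m <- gap_metrics(
  gap_hat, true_gap,
  pw_lower  = gap_hat - 0.04, pw_upper  = gap_hat + 0.04,
  sim_lower = gap_hat - 0.08, sim_upper = gap_hat + 0.08
)
str(m)
```

Averaging simultaneous_covered across replications would give an empirical simultaneous coverage rate.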
Generates synthetic longitudinal multilevel data with a known achievement
gap trajectory, suitable for evaluating the performance of
gap_trajectory and other methods.
simulate_gap(
  n_students = 200,
  n_schools = 20,
  gap_shape = c("monotone", "nonmonotone"),
  grades = 0:7,
  sigma_u = 0.2,
  sigma_v = 0.3,
  sigma_e = 0.5,
  prop_low = 0.5,
  seed = NULL
)
| Argument | Description |
|---|---|
| n_students | Integer. Total number of students. Default is 200. |
| n_schools | Integer. Total number of schools. Default is 20. |
| gap_shape | Character string. Shape of the true gap function. One of "monotone" or "nonmonotone". |
| grades | Numeric vector. Assessment grade points. Default is 0:7. |
| sigma_u | Numeric. School-level random effect standard deviation. Default is 0.2. |
| sigma_v | Numeric. Student-level random effect standard deviation. Default is 0.3. |
| sigma_e | Numeric. Residual standard deviation. Default is 0.5. |
| prop_low | Numeric. Proportion of students in the focal (low-SES) group. Default is 0.5. |
| seed | Integer or NULL. Random seed. Default is NULL. |
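Data-generating model:
$$y_{ijt} = f_0(t) - G_i \, f_1(t) + u_j + v_i + \varepsilon_{ijt}$$
where $G_i$ indicates membership in the focal group, $u_j$, $v_i$, and $\varepsilon_{ijt}$ are the school-level, student-level, and residual terms with standard deviations sigma_u, sigma_v, and sigma_e, $f_1(t)$ is the (positive) gap magnitude, and the focal group
has lower scores by construction.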
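A data-generating process of this kind can be sketched in base R. The example assumes the model has the form score = f0(grade) - focal * f1(grade) + school effect + student effect + residual, with normally distributed random terms. The functions f0 and f1 below are hypothetical shapes chosen for illustration, not the ones used by simulate_gap().

```r
set.seed(123)

n_students <- 200; n_schools <- 20; grades <- 0:7
sigma_u <- 0.2; sigma_v <- 0.3; sigma_e <- 0.5; prop_low <- 0.5

# Hypothetical baseline and (positive) gap functions.
f0 <- function(g) 0.4 * g
f1 <- function(g) 0.2 + 0.05 * g

# Assign students to schools and to the focal (1) or reference (0) group.
school_of <- sample(rep_len(seq_len(n_schools), n_students))
focal     <- rbinom(n_students, 1, prop_low)

# Random intercepts (normality is an assumption of this sketch).
u <- rnorm(n_schools, 0, sigma_u)     # school level
v <- rnorm(n_students, 0, sigma_v)    # student level

# Long format: one row per student-by-grade observation.
dat <- expand.grid(grade = grades, student = seq_len(n_students))
dat$school    <- school_of[dat$student]
dat$SES_group <- focal[dat$student]
dat$score <- f0(dat$grade) - dat$SES_group * f1(dat$grade) +
  u[dat$school] + v[dat$student] + rnorm(nrow(dat), 0, sigma_e)

head(dat)
```

Subtracting f1 for the focal group is what makes focal scores lower on average while keeping the gap function itself positive.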
A named list with five elements: data (a long-format data
frame with columns student, grade, school,
SES_group, and score); true_gap (a data frame with
columns grade and gap giving the true gap at each grade);
f0_fun (the true baseline function); f1_fun (the true gap
function, always positive); and params (a list of the simulation
parameters used).
gap_trajectory, run_simulation
sim <- simulate_gap(n_students = 200, n_schools = 20,
                    gap_shape = "monotone", seed = 123)
head(sim$data)
sim$true_gap
Prints formatted summary tables from a simulation study produced by
run_simulation and returns them invisibly.
summarize_simulation(sim_results)
| Argument | Description |
|---|---|
| sim_results | A data.frame returned by run_simulation. |
Invisibly returns a list with two data frames: table1
(overall performance averaged across conditions) and table2
(joint model coverage broken down by simulation condition).
results <- run_simulation(n_reps = 5, seed = 1)
summarize_simulation(results)
Prints a compact table of estimated gap values (with standard errors) and simultaneous confidence band bounds at selected points on the grade grid. Also reports the range of the estimated gap and the grade span where the simultaneous band excludes zero.
## S3 method for class 'achieveGap'
summary(object, n_points = 8, ...)
| Argument | Description |
|---|---|
| object | An object of class "achieveGap". |
| n_points | Integer. Number of points from the grade grid to display. Default is 8. |
| ... | Additional arguments (ignored). |
Invisibly returns a data.frame with the displayed summary rows.
Provides (1) a global test of whether the gap trajectory is identically zero,
and (2) identification of grade intervals where the gap is statistically
different from zero using the simultaneous confidence band from
gap_trajectory.
test_gap(
  x,
  type = c("both", "global", "simultaneous"),
  alpha = 0.05,
  verbose = TRUE
)
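| Argument | Description |
|---|---|
| x | An object of class "achieveGap". |
| type | Character string. One of "both", "global", or "simultaneous". |
| alpha | Significance level. Default is 0.05. |
| verbose | Logical; if TRUE, prints a human-readable summary. |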
A list with class "achieveGap_test" containing:

| Element | Description |
|---|---|
| type | Requested test type. |
| alpha | Significance level. |
| global | List with stat, df, p_value, reject. |
| simultaneous | List with any_significant and a data.frame of significant intervals (if any). |
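Identifying grade intervals where a simultaneous band excludes zero amounts to finding consecutive runs of grid points whose band lies entirely above or below zero. The sketch below uses base R's rle() on made-up band limits. The helper sig_intervals() is hypothetical and may not match how test_gap() reports its intervals.

```r
# Hypothetical helper (not part of achieveGap): collapse grid points where
# the band excludes zero into contiguous [from, to] intervals.
sig_intervals <- function(grid, lower, upper) {
  sig    <- lower > 0 | upper < 0          # band excludes zero
  runs   <- rle(sig)
  ends   <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1L
  keep   <- runs$values                    # keep only the significant runs
  data.frame(from = grid[starts[keep]], to = grid[ends[keep]])
}

# Made-up simultaneous band limits on grades 0 to 7.
grid  <- 0:7
lower <- c(-0.20, -0.10, 0.05, 0.10, 0.12, -0.02, 0.03, 0.08)
upper <- lower + 0.4

res <- sig_intervals(grid, lower, upper)
res
```

With these inputs the band excludes zero on grades 2 to 4 and again on grades 6 to 7.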