| Title: | Nonlinear Causal Dose-Response Estimation via Splines |
|---|---|
| Description: | Estimates nonlinear causal dose-response functions for continuous treatments using spline-based methods under standard causal assumptions (unconfoundedness / ignorability). Implements three identification strategies: Inverse Probability Weighting (IPW) via the generalised propensity score (GPS), G-computation (outcome regression), and a doubly-robust combination. Natural cubic splines and B-splines are supported for both the exposure-response curve f(T) and the propensity nuisance model. Pointwise confidence bands are obtained via the sandwich estimator or nonparametric bootstrap. Also provides fragility diagnostics including pointwise curvature-based fragility, uncertainty-normalised fragility, and regional integration over user-defined treatment intervals. Builds on the framework of Hirano and Imbens (2004) <doi:10.1111/j.1468-0262.2004.00481.x> for continuous treatments and extends it to fully nonparametric spline estimation. |
| Authors: | Subir Hait [aut, cre] (ORCID: <https://orcid.org/0009-0004-9871-9677>) |
| Maintainer: | Subir Hait <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 0.1.0 |
| Built: | 2026-05-25 07:58:49 UTC |
| Source: | https://github.com/causalfragility-lab/causalspline |
CausalSpline estimates causal dose-response functions
for continuous treatments under unconfoundedness, without
imposing linearity on the treatment effect. The package fills a gap in the
causal inference ecosystem: most tools assume a scalar
treatment effect, whereas real policy or health applications frequently
exhibit thresholds, diminishing returns, and non-monotone relationships.
causal_splineMain fitting function. Supports IPW, G-computation, and doubly-robust estimation.
gps_modelFit the generalised propensity score model for continuous treatments.
trim_weightsWinsorise extreme IPW weights.
check_overlapDiagnose positivity (ESS, weight plots).
dose_response_curveExtract the curve data frame.
simulate_dose_responseSimulate nonlinear dose-response data for benchmarking.
gradient_curveNumerical first and second derivatives of the fitted dose-response curve.
fragility_curvePointwise fragility with adaptive eps, interpretation zones, and uncertainty normalisation.
region_fragilityIntegrated fragility over a treatment interval.
Identification relies on two standard assumptions:
Unconfoundedness (strong ignorability):
for all .
Positivity (overlap):
for all in the support of
and all in the support of .
The outcome is regressed on a spline of T using GPS-based inverse probability weights. Consistent if the GPS model is correctly specified.
The outcome model includes spline(T) + X. The curve is obtained by marginalising over the observed X distribution. Consistent if the outcome model is correctly specified.
Combines IPW and g-computation. Consistent if at least one of the two models is correctly specified.
Maintainer: Subir Hait [email protected] (ORCID)
Hirano, K., & Imbens, G. W. (2004). The propensity score with continuous treatments. doi:10.1002/0470090456.ch7
Imbens, G. W. (2000). The role of the propensity score in estimating dose-response functions. Biometrika, 87(3), 706-710. doi:10.1093/biomet/87.3.706
Useful links:
Report bugs at https://github.com/causalfragility-lab/CausalSpline/issues
Estimates the causal dose-response function for a continuous
treatment under unconfoundedness, using either Inverse Probability
Weighting (IPW) with the generalised propensity score (GPS) or
G-computation (outcome regression). The exposure-response curve is modelled
as a natural cubic spline or B-spline, allowing thresholds, diminishing
returns, and other nonlinear patterns to be recovered without parametric
assumptions on the functional form.
causal_spline( formula, data, method = c("ipw", "gcomp", "dr"), spline_type = c("ns", "bs"), df_exposure = 4L, knots_exposure = NULL, df_gps = 4L, gps_family = c("gaussian", "gamma", "beta"), trim_quantiles = c(0.01, 0.99), eval_grid = 100L, eval_points = NULL, se_method = c("sandwich", "bootstrap"), boot_reps = 500L, conf_level = 0.95, verbose = FALSE )causal_spline( formula, data, method = c("ipw", "gcomp", "dr"), spline_type = c("ns", "bs"), df_exposure = 4L, knots_exposure = NULL, df_gps = 4L, gps_family = c("gaussian", "gamma", "beta"), trim_quantiles = c(0.01, 0.99), eval_grid = 100L, eval_points = NULL, se_method = c("sandwich", "bootstrap"), boot_reps = 500L, conf_level = 0.95, verbose = FALSE )
formula |
A two-sided formula of the form |
data |
A data frame containing all variables in |
method |
Character string. Estimation method: |
spline_type |
Character string. Type of spline basis for |
df_exposure |
Integer. Degrees of freedom for the treatment spline
|
knots_exposure |
Numeric vector of interior knot positions for the
treatment spline. If |
df_gps |
Integer. Degrees of freedom for the GPS (propensity) spline
used in the |
gps_family |
Character string. Family for the GPS model:
|
trim_quantiles |
Numeric vector of length 2 giving lower and upper
quantiles for weight trimming. Default |
eval_grid |
Integer. Number of equally-spaced treatment values at
which to evaluate |
eval_points |
Numeric vector of specific treatment values at which to
evaluate the curve. Overrides |
se_method |
Character string. Method for standard errors:
|
boot_reps |
Integer. Number of bootstrap replications when
|
conf_level |
Numeric. Confidence level for intervals. Default
|
verbose |
Logical. Print progress messages. Default |
An object of class "causal_spline" with elements:
curveA data frame with columns t (treatment grid),
estimate (E[Y(t)]), se, lower, upper.
ateEstimated average treatment effect over the observed treatment range (scalar).
weightsNumeric vector of final (trimmed, normalised)
IPW weights (only for method = "ipw" or "dr").
gps_fitThe fitted GPS model object.
outcome_fitThe fitted outcome model object.
callThe matched call.
methodThe estimation method used.
spline_typeSpline type used.
df_exposureDegrees of freedom for the exposure spline.
data_summarySummary statistics of the treatment variable.
Hirano, K., & Imbens, G. W. (2004). The propensity score with continuous treatments. Applied Bayesian Modeling and Causal Inference from Incomplete-Data Perspectives, 226164, 73-84. doi:10.1002/0470090456.ch7
Flores, C. A., Flores-Lagunes, A., Gonzalez, A., & Neumann, T. C. (2012). Estimating the effects of length of exposure to instruction in a training program: the case of job corps. Review of Economics and Statistics, 94(1), 153-171. doi:10.1162/REST_a_00177
Imbens, G. W. (2000). The role of the propensity score in estimating dose-response functions. Biometrika, 87(3), 706-710. doi:10.1093/biomet/87.3.706
# Simulate nonlinear dose-response data set.seed(42) dat <- simulate_dose_response(n = 200, dgp = "threshold") # Fit with IPW fit <- causal_spline(Y ~ T | X1 + X2 + X3, data = dat, method = "ipw", df_exposure = 5) summary(fit) plot(fit) # Fit with G-computation fit_gc <- causal_spline(Y ~ T | X1 + X2 + X3, data = dat, method = "gcomp", df_exposure = 5) plot(fit_gc, truth = dat)# Simulate nonlinear dose-response data set.seed(42) dat <- simulate_dose_response(n = 200, dgp = "threshold") # Fit with IPW fit <- causal_spline(Y ~ T | X1 + X2 + X3, data = dat, method = "ipw", df_exposure = 5) summary(fit) plot(fit) # Fit with G-computation fit_gc <- causal_spline(Y ~ T | X1 + X2 + X3, data = dat, method = "gcomp", df_exposure = 5) plot(fit_gc, truth = dat)
Plots the distribution of the treatment variable conditional on covariate strata and returns effective sample size (ESS) and weight diagnostics to assess the positivity (overlap) assumption.
check_overlap(treatment, weights, plot = TRUE)check_overlap(treatment, weights, plot = TRUE)
treatment |
Numeric vector of treatment values. |
weights |
Numeric vector of IPW weights (length must equal
|
plot |
Logical. If |
A list with:
essEffective sample size: .
weight_summaryFive-number summary of the weights.
plotggplot2 object (if plot = TRUE).
dat <- simulate_dose_response(200) X <- as.matrix(dat[, c("X1", "X2", "X3")]) gps <- gps_model(dat$T, X) w <- trim_weights(abs(1 / stats::residuals(gps$model)), c(0.01, 0.99)) check_overlap(dat$T, w)dat <- simulate_dose_response(200) X <- as.matrix(dat[, c("X1", "X2", "X3")]) gps <- gps_model(dat$T, X) w <- trim_weights(abs(1 / stats::residuals(gps$model)), c(0.01, 0.99)) check_overlap(dat$T, w)
Convenience function to pull the estimated curve from a
fitted causal_spline object.
dose_response_curve(fit)dose_response_curve(fit)
fit |
A |
A data frame with columns t, estimate, se,
lower, upper.
dat <- simulate_dose_response(200) fit <- causal_spline(Y ~ T | X1 + X2 + X3, data = dat) curve_df <- dose_response_curve(fit) head(curve_df)dat <- simulate_dose_response(200) fit <- causal_spline(Y ~ T | X1 + X2 + X3, data = dat) curve_df <- dose_response_curve(fit) head(curve_df)
Computes a pointwise shape-based fragility measure from the first and second derivatives of the fitted dose-response curve, with five enhancements:
Adaptive eps: stabilisation constant is
so interpretation is
consistent across datasets.
Interpretation zones: fragility is classified into
"low", "moderate", and "high" based on the 50th
and 75th percentiles of the pointwise fragility distribution.
Uncertainty-normalised fragility: an additional column
combines shape
instability with statistical uncertainty.
Support density: the kernel density of the treatment
variable (if t_obs is supplied) is attached, flagging regions
with sparse data.
High-fragility flag: logical column high_fragility
marks points above the 75th percentile.
fragility_curve( object, grid = NULL, h = NULL, eps = NULL, type = c("curvature_ratio", "inverse_slope"), t_obs = NULL, ... )fragility_curve( object, grid = NULL, h = NULL, eps = NULL, type = c("curvature_ratio", "inverse_slope"), t_obs = NULL, ... )
object |
A fitted |
grid |
Optional numeric evaluation grid. If |
h |
Step size for finite differences. Default |
eps |
Numeric or |
type |
Character. Fragility definition: |
t_obs |
Optional numeric vector of observed treatment values (used to
compute support density). If |
... |
Ignored. |
A data frame of class "fragility_curve" with columns:
tTreatment grid.
estimateFitted .
seStandard error of fitted curve.
derivativeFirst derivative.
second_derivativeSecond derivative.
fragilityPointwise fragility .
fragility_normUncertainty-normalised fragility
.
fragility_zoneFactor: "low", "moderate",
"high".
high_fragilityLogical: TRUE if above 75th
percentile.
support_densityKernel density of T at each grid point
(only if t_obs supplied).
fragility_typeCharacter. Type used.
dat <- simulate_dose_response(200, dgp = "threshold") fit <- causal_spline(Y ~ T | X1 + X2 + X3, data = dat) fc <- fragility_curve(fit, t_obs = dat$T) plot(fc)dat <- simulate_dose_response(200, dgp = "threshold") fit <- causal_spline(Y ~ T | X1 + X2 + X3, data = dat) fc <- fragility_curve(fit, t_obs = dat$T) plot(fc)
Fits a model for the conditional distribution of a continuous treatment
(the generalised propensity score). Covariates are entered
linearly by default; spline transformations of the treatment are not needed
here since this models the treatment, not the outcome.
gps_model( treatment, covariates, spline_type = c("ns", "bs"), df = 4L, family = c("gaussian", "gamma", "beta"), verbose = FALSE )gps_model( treatment, covariates, spline_type = c("ns", "bs"), df = 4L, family = c("gaussian", "gamma", "beta"), verbose = FALSE )
treatment |
Numeric vector of treatment values. |
covariates |
Numeric matrix of confounders. |
spline_type |
Character. |
df |
Integer. Degrees of freedom for covariate splines. Default
|
family |
Character. Distribution for the GPS model.
|
verbose |
Logical. Default |
A list with:
modelFitted model object (lm or glm).
familyFamily used.
r_squaredR-squared of the GPS model (diagnostic).
dat <- simulate_dose_response(n = 200, dgp = "linear") X <- as.matrix(dat[, c("X1", "X2", "X3")]) gps <- gps_model(dat$T, X) summary(gps$model)dat <- simulate_dose_response(n = 200, dgp = "linear") X <- as.matrix(dat[, c("X1", "X2", "X3")]) gps <- gps_model(dat$T, X) summary(gps$model)
Computes first and second numerical derivatives of the fitted dose-response
curve using central finite differences applied to
predict.causal_spline. Useful for identifying regions of rapid
change (high first derivative) or inflection / diminishing returns (second
derivative changes sign).
gradient_curve(object, grid = NULL, h = NULL, eps = 1e-06, ...)gradient_curve(object, grid = NULL, h = NULL, eps = 1e-06, ...)
object |
A fitted |
grid |
Optional numeric vector of treatment values at which to evaluate
the derivatives. If |
h |
Numeric. Step size for finite differences. If |
eps |
Small positive constant used by |
... |
Ignored. |
A data frame with columns:
tTreatment grid values.
estimateFitted .
seStandard error of from the fitted curve.
derivativeFirst derivative .
second_derivativeSecond derivative .
dat <- simulate_dose_response(200, dgp = "threshold") fit <- causal_spline(Y ~ T | X1 + X2 + X3, data = dat) gd <- gradient_curve(fit) head(gd)dat <- simulate_dose_response(200, dgp = "threshold") fit <- causal_spline(Y ~ T | X1 + X2 + X3, data = dat) gd <- gradient_curve(fit) head(gd)
Plots the estimated causal dose-response curve against
with pointwise confidence bands and an optional rug for the observed
treatment distribution.
## S3 method for class 'causal_spline' plot( x, rug = TRUE, truth = NULL, xlab = "Treatment (T)", ylab = "E[Y(t)]", title = NULL, ... )## S3 method for class 'causal_spline' plot( x, rug = TRUE, truth = NULL, xlab = "Treatment (T)", ylab = "E[Y(t)]", title = NULL, ... )
x |
A |
rug |
Logical. Add a treatment distribution rug. Default |
truth |
Optional data frame with columns |
xlab |
Character. x-axis label. Default |
ylab |
Character. y-axis label. Default |
title |
Character. Plot title. Default |
... |
Ignored. |
A ggplot2 object.
dat <- simulate_dose_response(200, dgp = "threshold") fit <- causal_spline(Y ~ T | X1 + X2 + X3, data = dat) plot(fit)dat <- simulate_dose_response(200, dgp = "threshold") fit <- causal_spline(Y ~ T | X1 + X2 + X3, data = dat) plot(fit)
Produces a dual-panel plot: the dose-response curve (top) and the
fragility curve (bottom), with high-fragility regions shaded. If
support_density is present in x (i.e. t_obs was
supplied to fragility_curve), a scaled density ribbon is
overlaid on the fragility panel to flag low-support regions.
## S3 method for class 'fragility_curve' plot(x, ...)## S3 method for class 'fragility_curve' plot(x, ...)
x |
A |
... |
Ignored. |
A combined patchwork plot if the patchwork package is
installed, otherwise the two panels are printed separately and a list of
two ggplot2 objects is returned invisibly.
dat <- simulate_dose_response(200, dgp = "threshold") fit <- causal_spline(Y ~ T | X1 + X2 + X3, data = dat) fc <- fragility_curve(fit, t_obs = dat$T) plot(fc)dat <- simulate_dose_response(200, dgp = "threshold") fit <- causal_spline(Y ~ T | X1 + X2 + X3, data = dat) fc <- fragility_curve(fit, t_obs = dat$T) plot(fc)
Predict E[Y(t)] at new treatment values
## S3 method for class 'causal_spline' predict(object, newt, se_fit = FALSE, warn_extrap = TRUE, ...)## S3 method for class 'causal_spline' predict(object, newt, se_fit = FALSE, warn_extrap = TRUE, ...)
object |
A |
newt |
Numeric vector of treatment values at which to predict. |
se_fit |
Logical. Return standard errors? Default |
warn_extrap |
Logical. Warn if any values in |
... |
Ignored. |
A data frame with columns t, estimate,
extrapolated, and optionally se, lower, upper.
The extrapolated column is TRUE for any newt value
outside the observed treatment support.
dat <- simulate_dose_response(200) fit <- causal_spline(Y ~ T | X1 + X2 + X3, data = dat) predict(fit, newt = c(1, 2, 3, 4, 5))dat <- simulate_dose_response(200) fit <- causal_spline(Y ~ T | X1 + X2 + X3, data = dat) predict(fit, newt = c(1, 2, 3, 4, 5))
Print method for causal_spline objects
## S3 method for class 'causal_spline' print(x, ...)## S3 method for class 'causal_spline' print(x, ...)
x |
A |
... |
Ignored. |
Integrates the pointwise fragility curve over a treatment interval
using the trapezoidal rule. Useful for comparing sensitivity
across dose ranges (e.g., low vs. high dose) or summarising instability
at a policy-relevant threshold.
region_fragility( object, a, b, grid = NULL, h = NULL, eps = NULL, type = c("curvature_ratio", "inverse_slope"), normalize = TRUE, t_obs = NULL, ... )region_fragility( object, a, b, grid = NULL, h = NULL, eps = NULL, type = c("curvature_ratio", "inverse_slope"), normalize = TRUE, t_obs = NULL, ... )
object |
A fitted |
a |
Numeric scalar. Lower bound of the integration interval. |
b |
Numeric scalar. Upper bound of the integration interval. |
grid |
Optional numeric evaluation grid within |
h |
Step size for finite differences. Default |
eps |
Adaptive eps passed to |
type |
Fragility definition: |
normalize |
Logical. Divide integral by interval length? Default
|
t_obs |
Optional numeric vector of observed treatment values for
support density. Default |
... |
Ignored. |
A list with elements:
intervalNamed numeric vector c(a, b) after clamping
to the observed support.
typeFragility type used.
integral_fragilityTrapezoidal integral of fragility over
.
average_fragilityIntegral divided by interval length.
normalizedLogical flag.
curveThe full fragility_curve data frame.
dat <- simulate_dose_response(200, dgp = "threshold") fit <- causal_spline(Y ~ T | X1 + X2 + X3, data = dat) region_fragility(fit, a = 2, b = 5)dat <- simulate_dose_response(200, dgp = "threshold") fit <- causal_spline(Y ~ T | X1 + X2 + X3, data = dat) region_fragility(fit, a = 2, b = 5)
Generates synthetic observational data with a continuous treatment, confounders, and a nonlinear dose-response outcome. Useful for testing, benchmarking, and illustrating the CausalSpline package.
simulate_dose_response( n = 500L, dgp = c("threshold", "diminishing", "nonmonotone", "linear", "sinusoidal"), confounding = 0.5, sigma_y = 1, seed = NULL )simulate_dose_response( n = 500L, dgp = c("threshold", "diminishing", "nonmonotone", "linear", "sinusoidal"), confounding = 0.5, sigma_y = 1, seed = NULL )
n |
Integer. Sample size. Default |
dgp |
Character string. Data-generating process:
|
confounding |
Numeric scalar in |
sigma_y |
Numeric. Standard deviation of the outcome noise.
Default |
seed |
Integer or |
A data frame with columns:
YObserved outcome.
TObserved (confounded) treatment.
X1, X2, X3
Confounders.
Y0Potential outcome at T = 0 (for evaluation).
true_effectf(T) at each observed T value.
dat <- simulate_dose_response(n = 200, dgp = "threshold", seed = 1) plot(dat$T, dat$true_effect, type = "l", xlab = "Treatment", ylab = "True causal effect") dat2 <- simulate_dose_response(n = 200, dgp = "nonmonotone", confounding = 0.8, seed = 42) hist(dat2$T, main = "Treatment distribution", xlab = "T")dat <- simulate_dose_response(n = 200, dgp = "threshold", seed = 1) plot(dat$T, dat$true_effect, type = "l", xlab = "Treatment", ylab = "True causal effect") dat2 <- simulate_dose_response(n = 200, dgp = "nonmonotone", confounding = 0.8, seed = 42) hist(dat2$T, main = "Treatment distribution", xlab = "T")
Summary method for causal_spline objects
## S3 method for class 'causal_spline' summary(object, ...)## S3 method for class 'causal_spline' summary(object, ...)
object |
A |
... |
Ignored. |
Winsorises (clips) weights at specified quantiles to reduce variance from extreme propensity scores. This is a standard stabilisation step when using inverse probability weighting for continuous treatments.
trim_weights(weights, quantiles = c(0.01, 0.99))trim_weights(weights, quantiles = c(0.01, 0.99))
weights |
Numeric vector of (possibly unstabilised) IPW weights. |
quantiles |
Numeric vector of length 2: lower and upper quantile
thresholds. Default |
Numeric vector of trimmed weights (same length as input).
w <- rexp(200) w_trimmed <- trim_weights(w, c(0.01, 0.99)) summary(w_trimmed)w <- rexp(200) w_trimmed <- trim_weights(w, c(0.01, 0.99)) summary(w_trimmed)