| Title: | Spline-Based Nonlinear Modeling for Multilevel and Longitudinal Data |
|---|---|
| Description: | Provides a unified framework for fitting, predicting, and interpreting nonlinear relationships in single-level, multilevel, and longitudinal regression models. Flexible functional forms are supported using natural cubic splines ('splines'), B-splines ('splines'), and GAM smooths ('mgcv'). Supports two-way and nested clustering via 'lme4', automatic knot selection by AIC or BIC, multilevel R-squared decomposition (Nakagawa-Schielzeth marginal and conditional R-squared with level-specific variance partitioning), a postestimation suite returning first and second derivatives with confidence bands, turning points and inflection regions, and a model comparison workflow contrasting linear, polynomial, and spline fits by AIC, BIC, and likelihood-ratio tests. Cluster heterogeneity in nonlinear effects is supported via random-slope spline terms. |
| Authors: | Subir Hait [aut, cre] (ORCID: <https://orcid.org/0009-0004-9871-9677>) |
| Maintainer: | Subir Hait <[email protected]> |
| License: | GPL-3 |
| Version: | 0.2.0 |
| Built: | 2026-05-16 06:23:33 UTC |
| Source: | https://github.com/causalfragility-lab/multispline |
Compares a nonlinear spline model against simpler alternatives—a linear model and optional polynomial terms—to help researchers justify the spline approach over simpler specifications.
For each model, the function reports:
AIC and BIC
Log-likelihood (and likelihood-ratio test versus the linear model where available)
Number of parameters (df)
Residual variance / deviance
nl_compare( object, polynomial_degrees = c(2L, 3L), digits = 3L, return_models = FALSE )nl_compare( object, polynomial_degrees = c(2L, 3L), digits = 3L, return_models = FALSE )
object |
An |
polynomial_degrees |
Integer vector of polynomial degrees to include
as additional comparators. Default |
digits |
Integer; decimal places for the output table. Default |
return_models |
Logical; if |
A list (invisibly) of class "nl_compare" with:
tableA data frame with the comparison statistics.
modelsNamed list of fitted models (when
return_models = TRUE).
bestName of the model with the lowest AIC.
The table is pretty-printed automatically.
## Not run: fit <- nl_fit(data = mydata, y = "score", x = "age", df = 4) nl_compare(fit) nl_compare(fit, polynomial_degrees = c(2, 3, 4)) ## End(Not run)## Not run: fit <- nl_fit(data = mydata, y = "score", x = "age", df = 4) nl_compare(fit) nl_compare(fit, polynomial_degrees = c(2, 3, 4)) ## End(Not run)
Uses numerical differentiation on the prediction grid to compute the
first derivative (slope / marginal effect), the second derivative
(curvature), and propagated confidence bands for both, using the
delta method from the se.fit column of nl_predict().
Results are passed to nl_turning_points to identify
local turning points and inflection regions.
nl_derivatives(pred_df, x, time = NULL, h = NULL, level = 0.95)nl_derivatives(pred_df, x, time = NULL, h = NULL, level = 0.95)
pred_df |
A data frame returned by |
x |
Character; name of the focal predictor column. |
time |
Optional character; name of the time column. If present, derivatives are computed separately within each time level. |
h |
Numeric; step size for finite-difference approximation.
Default |
level |
Confidence level for derivative CIs. Default |
A data frame with columns x (the focal predictor),
time (if applicable), fit (predicted outcome),
d1 (first derivative), d1_lwr and d1_upr
(lower and upper CI for the first derivative), d2 (second
derivative), and d2_lwr and d2_upr (CI for the second
derivative).
Fits a nonlinear regression model for an outcome y with a focal
predictor x, modeled using natural cubic splines ("ns"),
B-splines ("bs"), or GAM smooths ("gam").
Version 2 additions: two-way and nested clustering via the
nested argument; random spline slopes via random_slope;
B-spline basis via method = "bs"; automatic df selection via
df = "auto".
nl_fit( data, y, x, time = NULL, cluster = NULL, nested = FALSE, controls = NULL, method = c("ns", "bs", "gam"), df = 4, df_range = 2:8, df_criterion = c("AIC", "BIC"), k = 5, bs_degree = 3, random_slope = FALSE, family = stats::gaussian(), ... )nl_fit( data, y, x, time = NULL, cluster = NULL, nested = FALSE, controls = NULL, method = c("ns", "bs", "gam"), df = 4, df_range = 2:8, df_criterion = c("AIC", "BIC"), k = 5, bs_degree = 3, random_slope = FALSE, family = stats::gaussian(), ... )
data |
A data frame (often long format for longitudinal data). |
y |
Outcome variable name (string). |
x |
Focal nonlinear predictor name (string). Must be numeric. |
time |
Optional time variable name (string). |
cluster |
Optional character vector of grouping variable name(s) for
random effects, e.g. |
nested |
Logical; only used when |
controls |
Optional character vector of additional covariate names to include as linear fixed effects. |
method |
Spline basis to use: |
df |
Degrees of freedom for the spline basis. Supply a single integer
greater than or equal to 1, or the string |
df_range |
Integer vector of candidate df values evaluated when
|
df_criterion |
Information criterion used for automatic df selection:
|
k |
Basis dimension for |
bs_degree |
Polynomial degree for |
random_slope |
Logical; if |
family |
A family object such as |
... |
Additional arguments passed to the underlying fitting function
( |
An object of class "nl_fit" (a named list). It contains
the fitted model object, the method, variable names
(y, x, time, cluster, controls),
spline settings (df, df_selected, df_search,
k, bs_degree), flags (nested,
random_slope), the family, the model formula,
the call, and metadata used for prediction (x_info,
levels_info, control_defaults).
nl_predict, nl_derivatives,
nl_compare, nl_r2, nl_knots
## Not run: # Single-level natural spline with automatic df selection fit <- nl_fit(data = mydata, y = "score", x = "age", df = "auto") # Two-way cross-classified clustering fit2 <- nl_fit( data = mydata, y = "score", x = "age", cluster = c("student_id", "school_id"), nested = FALSE ) # Nested clustering (students within schools) fit3 <- nl_fit( data = mydata, y = "score", x = "age", cluster = c("student_id", "school_id"), nested = TRUE ) # Random spline slopes fit4 <- nl_fit( data = mydata, y = "score", x = "age", cluster = "id", random_slope = TRUE ) ## End(Not run)## Not run: # Single-level natural spline with automatic df selection fit <- nl_fit(data = mydata, y = "score", x = "age", df = "auto") # Two-way cross-classified clustering fit2 <- nl_fit( data = mydata, y = "score", x = "age", cluster = c("student_id", "school_id"), nested = FALSE ) # Nested clustering (students within schools) fit3 <- nl_fit( data = mydata, y = "score", x = "age", cluster = c("student_id", "school_id"), nested = TRUE ) # Random spline slopes fit4 <- nl_fit( data = mydata, y = "score", x = "age", cluster = "id", random_slope = TRUE ) ## End(Not run)
Quantifies and visualises how much the nonlinear relationship between
x and y varies across clusters, using a model with random
spline slopes.
The function:
Refits the model with random spline slopes (if not already fitted
with random_slope = TRUE).
Extracts cluster-specific predicted curves (BLUPs).
Returns a heterogeneity summary: variance of slopes across clusters at each x value, and a plot of the cluster-specific trajectories.
Performs a likelihood-ratio test comparing the random-slope model against a random-intercept-only model to assess whether heterogeneity is statistically significant.
nl_het( object, n_clusters_plot = 30L, x_seq = NULL, level = 0.95, plot = TRUE, seed = 42L )nl_het( object, n_clusters_plot = 30L, x_seq = NULL, level = 0.95, plot = TRUE, seed = 42L )
object |
An |
n_clusters_plot |
Maximum number of cluster curves to display in the
trajectory plot. Default |
x_seq |
Optional numeric vector of x values for prediction. |
level |
Confidence level for the population-mean ribbon. Default
|
plot |
Logical; print the trajectory plot. Default |
seed |
Integer seed for reproducibility when sub-sampling clusters.
Default |
A list (invisibly) with:
trajectory_dfLong data frame of cluster-specific predicted values.
slope_varianceNamed numeric: SD of first-derivative estimates across clusters at each x value.
lrtLRT comparing random-slope vs random-intercept model
(NULL if random_slope = TRUE was supplied).
plotA ggplot object.
Extracts variance components from a multilevel nl_fit object and
computes intraclass correlation coefficients (ICCs) for each grouping
level plus the residual.
The ICC for grouping factor is defined as:
nl_icc(object, include_residual = TRUE)nl_icc(object, include_residual = TRUE)
object |
An |
include_residual |
Logical; if |
A named numeric vector of ICCs, one per grouping factor (named
ICC_<groupname>) plus Residual for the residual variance
(when include_residual = TRUE). All values sum to 1.
## Not run: fit <- nl_fit( data = mydata, y = "math_score", x = "SES", cluster = c("id", "schid"), df = 4 ) nl_icc(fit) ## End(Not run)## Not run: fit <- nl_fit( data = mydata, y = "math_score", x = "SES", cluster = c("id", "schid"), df = 4 ) nl_icc(fit) ## End(Not run)
Performs a grid search over candidate degrees of freedom (df) for a
natural cubic spline ("ns") or B-spline ("bs") model and
selects the best value by an information criterion (AIC or BIC). A
diagnostic plot of the criterion against df is returned.
This function is called internally by nl_fit when
df = "auto", but it can also be called directly for exploration
before fitting the final model.
nl_knots( data, y, x, time = NULL, cluster = NULL, nested = FALSE, controls = NULL, method = c("ns", "bs"), df_range = 2:10, criterion = c("AIC", "BIC"), family = stats::gaussian(), plot = TRUE, ... )nl_knots( data, y, x, time = NULL, cluster = NULL, nested = FALSE, controls = NULL, method = c("ns", "bs"), df_range = 2:10, criterion = c("AIC", "BIC"), family = stats::gaussian(), plot = TRUE, ... )
data |
A data frame. |
y |
Outcome variable name (string). |
x |
Focal predictor name (string). |
time |
Optional time variable name. |
cluster |
Optional character vector of cluster variable names. |
nested |
Logical; nested clustering. Default |
controls |
Optional character vector of control variable names. |
method |
Either |
df_range |
Integer vector of candidate df values. Default |
criterion |
Either |
family |
A family object. Default |
plot |
Logical; if |
... |
Additional arguments passed to the underlying fitter. |
A list with:
best_dfThe df value with the lowest criterion value.
search_tableData frame with columns df and
criterion.
criterionThe criterion used ("AIC" or
"BIC").
plotA ggplot object (if plot = TRUE).
Visualises results from nl_predict or
nl_derivatives. The type argument selects the plot:
"trajectory": predicted outcome vs x with optional
confidence ribbon (v1 behaviour, now default).
"slope": first derivative (marginal effect) vs x.
"curvature": second derivative vs x.
"combo": trajectory + slope panels side by side.
nl_plot( pred_df = NULL, deriv_df = NULL, x, time = NULL, type = c("trajectory", "slope", "curvature", "combo"), show_ci = TRUE, ci_level = 0.95, show_turning_points = FALSE, turning_points = NULL, zero_line = TRUE, y_lab = NULL, x_lab = NULL, title = NULL, legend_title = NULL )nl_plot( pred_df = NULL, deriv_df = NULL, x, time = NULL, type = c("trajectory", "slope", "curvature", "combo"), show_ci = TRUE, ci_level = 0.95, show_turning_points = FALSE, turning_points = NULL, zero_line = TRUE, y_lab = NULL, x_lab = NULL, title = NULL, legend_title = NULL )
pred_df |
A data frame from |
deriv_df |
A data frame from |
x |
Character; name of the focal predictor column in |
time |
Optional character; name of the time column. Auto-detected if
|
type |
Plot type: |
show_ci |
Logical; add confidence ribbons. Default |
ci_level |
Numeric; confidence level when deriving CI from
|
show_turning_points |
Logical; overlay turning-point markers on the
trajectory plot. Default |
turning_points |
Data frame from |
zero_line |
Logical; for slope / curvature plots, draw a dashed
horizontal line at zero. Default |
y_lab |
Y-axis label. |
x_lab |
X-axis label. Defaults to |
title |
Optional plot title. |
legend_title |
Optional legend title. |
A ggplot object (or a combined plot list for "combo").
nl_predict, nl_derivatives,
nl_turning_points
Creates a prediction data frame over a grid of the focal predictor x
(and optionally over time), holding control variables at typical
values. For mixed models, predictions default to population-level curves
(random effects set to zero).
v2 improvements:
CI for glmerMod: approximate confidence intervals are
computed via the parametric bootstrap or the delta method. Set
glmer_ci = "delta" (default, fast) or "boot" (more
accurate, slower).
Cluster-specific predictions: set re_form = NULL
to include random effects in the predictions.
nl_predict( object, x_seq = NULL, time_levels = NULL, controls_fixed = NULL, se = TRUE, level = 0.95, re_form = NA, glmer_ci = c("delta", "boot"), n_boot = 500L, ... )nl_predict( object, x_seq = NULL, time_levels = NULL, controls_fixed = NULL, se = TRUE, level = 0.95, re_form = NA, glmer_ci = c("delta", "boot"), n_boot = 500L, ... )
object |
An |
x_seq |
Optional numeric vector of x values. If |
time_levels |
Optional vector of time levels. |
controls_fixed |
Optional named list of fixed control values. |
se |
Logical; include SEs and CIs. Default |
level |
Confidence level. Default |
re_form |
For mixed models: |
glmer_ci |
Method for glmerMod CIs: |
n_boot |
Number of bootstrap replicates when |
... |
Reserved for future use. |
A data frame with columns for the focal predictor, time (if any),
controls at fixed values, fit, se.fit, lwr,
and upr.
nl_fit, nl_plot,
nl_derivatives
Computes a suite of R-squared statistics for models fitted by
nl_fit.
For single-level models the standard R-squared and adjusted R-squared are
returned. For multilevel models (lmerMod / glmerMod) two quantities are
reported: the Nakagawa-Schielzeth marginal R-squared (variance explained by
fixed effects only) and the conditional R-squared (fixed plus all random
effects), together with a level-specific variance partition table analogous
to the r2_mlm / Raudenbush-Bryk approach.
nl_r2(object, digits = 4L)nl_r2(object, digits = 4L)
object |
An |
digits |
Integer; decimal places for display. Default |
Marginal and conditional R-squared for LMMs follow the Nakagawa and
Schielzeth (2013) formulae extended to multiple random effects by Nakagawa,
Johnson and Schielzeth (2017). The fixed-effects variance
is computed as the variance of the linear predictor
from fixed effects only ().
The level-specific variance partition (r2_mlm-style) decomposes the total
modelled variance ()
to show how much each source contributes, printed as a breakdown table.
A list of class "nl_r2" returned invisibly and
pretty-printed automatically. It contains type (one of
"OLS", "GAM", "LMM", or "GLMM"),
r2 (a named numeric vector: R2 and R2_adj for OLS;
R2_dev for GAM; R2m and R2c for LMM/GLMM), and
variance_partition (a data frame with columns component,
variance, and proportion for multilevel models, or
NULL for single-level models).
Nakagawa, S., & Schielzeth, H. (2013). A general and simple method for obtaining R-squared from generalized linear mixed-effects models. Methods in Ecology and Evolution, 4(2), 133–142.
Nakagawa, S., Johnson, P. C. D., & Schielzeth, H. (2017). The coefficient of determination R-squared and intra-class correlation coefficient from generalized linear mixed-effects models revisited and expanded. Journal of the Royal Society Interface, 14(134), 20170213.
Rights, J. D., & Sterba, S. K. (2019). Quantifying explained variance in multilevel models: An integrative framework for defining R-squared measures. Psychological Methods, 24(3), 309–338.
Produces a tidy coefficient table for a model fitted by
nl_fit. For linear mixed models (lmerMod), optional
p-values and denominator degrees of freedom are obtained via
lmerTest (Satterthwaite method). For GLMMs (glmerMod),
z-tests are reported from summary().
nl_summary( object, digits = 3, pvals = TRUE, df_method = c("satterthwaite", "none") )nl_summary( object, digits = 3, pvals = TRUE, df_method = c("satterthwaite", "none") )
object |
An |
digits |
Integer; number of decimal places for rounding. If
|
pvals |
Logical; if |
df_method |
For |
A data frame (or tibble) with columns: Term, Estimate,
Std.Error, df (if available), statistic, and
p.value (if available).
From the derivative table returned by nl_derivatives, finds
turning points (values of x where the first derivative crosses zero),
inflection regions (intervals where the second derivative changes sign),
and slope regions (contiguous stretches where the curve is increasing,
decreasing, or flat).
nl_turning_points(deriv_df, x, time = NULL, tol = 1e-06)nl_turning_points(deriv_df, x, time = NULL, tol = 1e-06)
deriv_df |
A data frame returned by |
x |
Character; name of the focal predictor column. |
time |
Optional character; name of the time column. Turning points are identified separately within each time level when supplied. |
tol |
Numeric tolerance for near-zero detection of d1. Default
|
A named list with three elements.
turning_points is a data frame with columns x,
time (if applicable), fit, and type
(one of "maximum", "minimum", or "saddle").
inflection_regions is a data frame with columns
x_start, x_end, time, and direction
("concave_to_convex" or "convex_to_concave").
slope_regions is a data frame with columns x_start,
x_end, direction, and time.
Compact console display for objects returned by nl_fit.
Shows key metadata (method, outcome, predictor, time, clustering, family,
and controls) and the fitted model formula.
## S3 method for class 'nl_fit' print(x, ...)## S3 method for class 'nl_fit' print(x, ...)
x |
An object of class |
... |
Further arguments (currently ignored). |
x invisibly.
Prints a summary_nl_fit object returned by
summary.nl_fit.
## S3 method for class 'summary_nl_fit' print(x, ...)## S3 method for class 'summary_nl_fit' print(x, ...)
x |
A |
... |
Further arguments passed to |
x invisibly.
Produces a tidy coefficient table via nl_summary and wraps
it in a summary_nl_fit object for pretty printing.
## S3 method for class 'nl_fit' summary( object, digits = 3, pvals = TRUE, df_method = c("satterthwaite", "none"), ... )## S3 method for class 'nl_fit' summary( object, digits = 3, pvals = TRUE, df_method = c("satterthwaite", "none"), ... )
object |
An |
digits |
Number of decimal places for rounding. Default |
pvals |
Logical; if |
df_method |
For |
... |
Further arguments passed to |
An object of class summary_nl_fit containing call,
formula, method, and table.