Advanced Analytics

Linear Regression Lab

Build, test, and validate custom OLS models on the monthly modeling dataset. Select a target (Y) and predictors (X), then review inference, diagnostics, and interpretation generated by the backend regression engine.

Notes: Results describe statistical associations, not causality. For time-series data, autocorrelation and heteroskedasticity are common—use robust/HAC inference when diagnostics suggest it.

Regression Settings

1. Target Variable (Y)

Dependent variable you want to explain or forecast.

2. Predictors (X)

Independent variables used to explain the target.

3. Advanced Configuration (Optional)

Show Settings

Select the regression family to run. Additional models will appear here as they are implemented.

Adjusts p-values and confidence intervals so inference remains valid under violations of OLS assumptions, such as heteroskedasticity or autocorrelation (robust HC or HAC/Newey–West covariance).

Dataset: features_monthly.parquet (monthly)

4. Run Model

Executes OLS on the cleaned sample (listwise deletion of rows with missing or infinite values). Results include inference, diagnostic tests, and interpretation.

Select variables to begin.

Ready to Analyze

Choose a target (Y) and predictors (X), then run the model. You’ll get robust inference, diagnostics, plots, coefficient tables, VIF, and ANOVA.

Tip: If diagnostics flag heteroskedasticity or autocorrelation, prefer robust (HC) or HAC (Newey–West) standard errors when interpreting p-values and confidence intervals.
Statistical Terms Guide
t-test & p-value: Per-coefficient test of whether a predictor’s coefficient differs from zero under the chosen standard errors.
F-test (overall model): Tests whether the predictors jointly improve fit relative to a constant-only model.
Adj. R²: In-sample explanatory power adjusted for model complexity (penalizes adding weak predictors).
AIC & BIC: Relative model selection scores (lower is better) for comparing models fit on the same target and sample window.
Breusch–Pagan / White: Tests for heteroskedasticity (non-constant error variance). If detected, prefer robust (HC) or HAC inference.
Durbin–Watson / Breusch–Godfrey: Diagnostics for residual autocorrelation. If present, HAC (Newey–West) inference is recommended.
VIF (Multicollinearity): Indicates redundancy among predictors. VIF ≥ 5 suggests moderate multicollinearity; ≥ 10 is high risk.
ANOVA: Decomposes explained vs unexplained variance to support model-fit interpretation.
Practical caution: In financial datasets, predictor transforms may embed trends and shared exposures. Strong diagnostics and robust/HAC inference improve reliability, but model design (lagging predictors, avoiding look-ahead leakage) is equally important.