Polynomial Regression Calculator

Introduction

This polynomial regression calculator analyses non-linear relationships between two numeric datasets, X and Y. By fitting a polynomial of degree n to the observations, it lets researchers model curved trends that a simple linear model cannot capture and examine how Y changes across the range of X.

What this calculator does

The tool performs a least-squares polynomial fit by constructing a Vandermonde (design) matrix from the X values, whose columns are successive powers of x. It requires two equal-length datasets and a specified polynomial degree. It then reports the regression equation, the coefficient for each power of x, and the coefficient of determination R² as a measure of fit. It also flags potential outliers using modified Z-scores based on the median absolute deviation (MAD).
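The Vandermonde matrix described above can be built in a few lines. This is a minimal Python sketch, assuming the columns are ordered by ascending power (1, x, x², …); the function name `vandermonde` is hypothetical, and the calculator may order its columns differently internally.

```python
def vandermonde(xs, degree):
    """Design matrix with one row [1, x, x^2, ..., x^degree] per X value."""
    return [[x ** p for p in range(degree + 1)] for x in xs]


print(vandermonde([1, 2, 3], 2))  # [[1, 1, 1], [1, 2, 4], [1, 3, 9]]
```

Each row corresponds to one observation, and each column to one coefficient, so a degree-n fit always has n + 1 columns.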

Formula used

The calculation uses the normal equation to find the coefficient vector a. This involves the design matrix X, its transpose Xᵀ, the inverse of the product XᵀX, and the vector y of dependent values. Model quality is assessed with the coefficient of determination R², which compares the residual sum of squares SS_res with the total sum of squares SS_tot.

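$$\mathbf{a} = (X^\top X)^{-1} X^\top \mathbf{y}$$
$$R^2 = 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}}, \qquad SS_{\text{res}} = \sum_i (y_i - \hat{y}_i)^2, \qquad SS_{\text{tot}} = \sum_i (y_i - \bar{y})^2$$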
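The two formulas can be sketched in plain Python (standard library only) to make the steps concrete. This is a minimal illustration, not the calculator's code: `solve`, `polyfit` and `r_squared` are hypothetical names, the design matrix is assumed to use ascending powers, and the fixed singularity tolerance is an assumption. Forming XᵀX explicitly can lose precision at high degrees; library routines such as NumPy's `numpy.polyfit` instead solve the least-squares problem with `lstsq`.

```python
def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        # Choose the row with the largest pivot to limit round-off error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        # Crude singularity test with a fixed tolerance (an assumption).
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def polyfit(xs, ys, degree):
    """Least-squares coefficients [a0, a1, ..., a_degree] via the normal equations."""
    X = [[x ** p for p in range(degree + 1)] for x in xs]  # Vandermonde design matrix
    k = degree + 1
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(X, ys)) for i in range(k)]
    return solve(xtx, xty)


def r_squared(xs, ys, coeffs):
    """R^2 = 1 - SS_res / SS_tot (undefined when all y values are equal)."""
    def predict(x):
        return sum(a * x ** p for p, a in enumerate(coeffs))
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - predict(x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


coeffs = polyfit([1, 2, 3], [2, 4, 9], 2)
print([round(c, 6) for c in coeffs])  # [3.0, -2.5, 1.5]
```

The printed list is [a₀, a₁, a₂]. Calling `r_squared` on the same three points returns 1 up to rounding, because three points determine a quadratic exactly.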

How to use this calculator

1. Enter the independent values for Dataset X as comma-separated numbers.
2. Enter the dependent values for Dataset Y, ensuring the count matches Dataset X.
3. Select the desired polynomial degree, outlier sensitivity, and decimal precision.
4. Execute the calculation to view the regression equation, coefficients, and statistical fit analysis.
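The input rules in the steps above can be expressed as a small validation routine. This Python sketch is hypothetical: the calculator's actual parsing rules are not documented, so the function names, skipping empty entries, and rejecting `nan`/`inf` are all assumptions.

```python
import math


def parse_dataset(text):
    """Turn '1, 2.5, -3' into [1.0, 2.5, -3.0]; raise ValueError on non-numeric tokens."""
    values = []
    for token in text.split(","):
        token = token.strip()
        if not token:
            continue  # assumption: empty entries (e.g. a trailing comma) are skipped
        value = float(token)  # float() raises ValueError for text such as 'abc'
        if not math.isfinite(value):
            raise ValueError(f"non-finite value: {token!r}")
        values.append(value)
    return values


def parse_inputs(x_text, y_text, degree):
    """Check the two datasets against each other and against the chosen degree."""
    xs, ys = parse_dataset(x_text), parse_dataset(y_text)
    if len(xs) != len(ys):
        raise ValueError(f"Dataset X has {len(xs)} values but Dataset Y has {len(ys)}")
    if len(set(xs)) <= degree:
        raise ValueError("need more distinct X values than the polynomial degree")
    return xs, ys


print(parse_inputs("1, 2, 3", "2, 4, 9", 2))  # ([1.0, 2.0, 3.0], [2.0, 4.0, 9.0])
```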

Example calculation

Scenario: A researcher in environmental science is modelling soil nutrient measurements over time to determine whether the trend follows a quadratic curve. To keep the arithmetic visible, the example uses only three observations.

Inputs: Dataset X is 1, 2, 3; Dataset Y is 2, 4, 9; Degree is 2.

Working:

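Step 1: Write the model as $y = a_2 x^2 + a_1 x + a_0$.

Step 2: Build the design matrix X with columns 1, x and x², then solve the normal equations $X^\top X \mathbf{a} = X^\top \mathbf{y}$.

Step 3: $a_2 = 1.5,\ a_1 = -2.5,\ a_0 = 3.0$

Step 4: $y = 1.5x^2 - 2.5x + 3.0$

Result: $y = 1.5x^2 - 2.5x + 3.0$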
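For this three-point example the normal equations can be written out in full. With design-matrix columns ordered 1, x, x², the entries of XᵀX are the power sums n = 3, Σx = 6, Σx² = 14, Σx³ = 36 and Σx⁴ = 98, and Xᵀy holds Σy = 15, Σxy = 37 and Σx²y = 99:

```latex
\begin{pmatrix} 3 & 6 & 14 \\ 6 & 14 & 36 \\ 14 & 36 & 98 \end{pmatrix}
\begin{pmatrix} a_0 \\ a_1 \\ a_2 \end{pmatrix}
=
\begin{pmatrix} 15 \\ 37 \\ 99 \end{pmatrix}
\quad\Longrightarrow\quad
a_0 = 3,\; a_1 = -2.5,\; a_2 = 1.5
```

Substituting back confirms each row, for example 3(3) + 6(−2.5) + 14(1.5) = 15 for the first.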

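Interpretation: The coefficients are the weights on the quadratic, linear and constant terms that minimise the residual sum of squares. Because three points determine a quadratic exactly, this curve passes through all three observations, so every residual is zero and R² = 1.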

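Summary: The positive quadratic coefficient (1.5) indicates upward curvature. With only three points, however, a perfect fit says nothing about how well the model would predict new data.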

Understanding the result

The regression equation is the fitted model for the data. R² is the proportion of the variance in Y explained by the model: values closer to 1.0 indicate a closer fit to the observed points. Because R² never decreases when the degree is raised, a high value alone does not show that a higher degree is justified. Residuals (observed minus predicted values) show where the model departs from the actual observations.
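Residuals are straightforward to compute once coefficients are known. A minimal Python sketch; `residuals` and `largest_residual` are hypothetical names, and the coefficients in the usage line are the least-squares straight line for three small illustrative points.

```python
def residuals(xs, ys, coeffs):
    """Observed minus predicted value at each point; coeffs are [a0, a1, ..., an]."""
    return [y - sum(a * x ** p for p, a in enumerate(coeffs)) for x, y in zip(xs, ys)]


def largest_residual(xs, ys, coeffs):
    """Index and value of the point the model misses by the most."""
    res = residuals(xs, ys, coeffs)
    i = max(range(len(res)), key=lambda k: abs(res[k]))
    return i, res[i]


# Straight-line fit y = 1/6 + 0.5x to the points (0, 0), (1, 1), (2, 1).
print(largest_residual([0, 1, 2], [0, 1, 1], [1 / 6, 0.5]))
```

Here the middle point is missed by about 1/3, twice as much as either end point, which is where a curved term would help.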

Assumptions and limitations

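The model assumes that the relationship between X and Y is well approximated by a polynomial of the chosen degree. The number of distinct X values must exceed the polynomial degree; with exactly degree + 1 points, the curve passes through every point and R² is trivially 1. High-degree polynomials may overfit, describing noise rather than the underlying trend, and often behave erratically between and beyond the observed points.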
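When the degree equals the number of points minus one, the least-squares polynomial is the unique curve through every point, so it can be evaluated directly in Lagrange form without solving the normal equations. The sketch below uses that shortcut (a swapped-in technique, equivalent only in this exact-fit case) with made-up data lying close to the line y = x to show overfitting: a perfect in-sample fit can still extrapolate badly. The function name `interpolate` and the data are hypothetical.

```python
def interpolate(xs, ys, x):
    """Evaluate the unique degree-(n-1) polynomial through the n points at x (Lagrange form)."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


# Six illustrative points near y = x, with small made-up noise.
xs = [0, 1, 2, 3, 4, 5]
ys = [0.1, 0.9, 2.2, 2.8, 4.1, 5.0]

# In-sample, the degree-5 curve reproduces every observation (zero residuals, R^2 = 1).
print([round(interpolate(xs, ys, x), 6) for x in xs])
# One step beyond the data it predicts about -3.2, far from the trend value near 6.
print(round(interpolate(xs, ys, 6), 6))
```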

Common mistakes to avoid

Typical errors include using a polynomial degree that is too high for the sample size, which reduces the model's predictive reliability. Additionally, entering datasets of unequal lengths or including non-numeric characters will result in calculation failures.

Sensitivity and robustness

Polynomial regression is sensitive to outliers: least squares minimises squared residuals, so a single extreme value can pull the curve strongly, and higher-power terms give points at extreme X values large leverage. The outlier sensitivity setting controls how readily the tool flags such points, helping researchers judge whether specific observations are disproportionately affecting the calculated coefficients.
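Modified Z-scores are commonly computed as 0.6745 · (v − median) / MAD, with |score| > 3.5 as a typical cut-off. The Python sketch below follows that convention; it is an assumption which values the calculator scores (residuals or raw Y values) and how its sensitivity setting maps to a threshold, so here a lower `threshold` simply means more sensitive. Function names and the sample values are hypothetical.

```python
from statistics import median


def modified_z_scores(values):
    """Modified Z-score 0.6745 * (v - median) / MAD for each value."""
    med = median(values)
    mad = median([abs(v - med) for v in values])  # median absolute deviation
    if mad == 0:
        raise ValueError("MAD is zero, so modified Z-scores are undefined")
    return [0.6745 * (v - med) / mad for v in values]


def flag_outliers(values, threshold=3.5):
    """Indices whose |modified Z| exceeds the threshold (3.5 is a common default)."""
    return [i for i, z in enumerate(modified_z_scores(values)) if abs(z) > threshold]


# Hypothetical residuals from a fit; the last point sits far from the rest.
print(flag_outliers([1.0, 1.2, 0.9, 1.1, 8.0]))  # [4]
```

Using the median and MAD rather than the mean and standard deviation keeps the extreme point from inflating the scale it is measured against.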

Troubleshooting

If the calculator reports a singular matrix, XᵀX cannot be inverted. This usually means there are fewer distinct X values than coefficients to estimate (degree + 1), for example when X values are repeated. Very high degrees or very large X values can also make the matrix numerically close to singular. Ensure that the input contains enough distinct numeric X values for the chosen degree.
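The most common cause can be checked before any matrix is formed. This hypothetical helper (`diagnose_design` is not part of the calculator) compares the number of distinct X values with the number of coefficients; it only detects exact singularity and will not catch a numerically ill-conditioned matrix.

```python
def diagnose_design(xs, degree):
    """Rough check of whether X^T X is invertible for the chosen degree."""
    n_coeffs = degree + 1
    distinct = len(set(xs))
    if distinct < n_coeffs:
        return (f"singular: {distinct} distinct X values cannot determine "
                f"{n_coeffs} coefficients")
    residual_dof = len(xs) - n_coeffs
    if residual_dof == 0:
        return "exact fit: the curve passes through every point, so R^2 = 1"
    return f"ok: {residual_dof} residual degrees of freedom"


print(diagnose_design([2, 2, 5, 5], 2))  # singular: 2 distinct X values cannot determine 3 coefficients
```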

Frequently asked questions

What is the maximum number of data points?

The calculator accepts up to 1000 data points for processing.

What does a singular matrix error mean?

This occurs when the matrix cannot be inverted, often due to redundant data or a lack of unique values required for the selected polynomial degree.

How is the model fit evaluated?

The fit is evaluated using the R-squared (R²) value, the proportion of the variance in Y explained by the fitted curve; values closer to 1 mean the curve tracks the data points more closely.

Where this calculation is used

Polynomial regression is widely applied in educational and research settings to model phenomena that do not follow a straight line. In social research, it may be used to analyse population growth trends. In sports analysis, it can model projectile trajectories or athlete performance curves. Environmental studies often use it to track changes in chemical concentrations or temperature over time, where a curve describes the data more faithfully than a straight line.

Results are based on standard mathematical and statistical methods and may involve rounding or approximation. If precise accuracy is required, please verify results independently.