Building on Linear Regression
Linear regression is a simple and widely used approach for modeling the relationship between a dependent variable and one or more independent variables. However, it assumes that this relationship is linear. When dealing with non-linear data, we can extend linear regression by incorporating non-linear transformations of the features.
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
import numpy as np
# Generate sample non-linear data
np.random.seed(0)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X**2 + np.random.randn(100, 1)
# Transform features by adding higher-degree terms (here x and x^2)
poly_features = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly_features.fit_transform(X)
# Fit linear regression model on transformed features
lin_reg = LinearRegression()
lin_reg.fit(X_poly, y)
# Predictions
y_pred = lin_reg.predict(X_poly)
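One detail worth noting: new inputs must be passed through the same fitted transformer before calling predict. A minimal sketch, where X_new is a hypothetical test point not taken from the example above:
# Predict at a new input: reuse the fitted transformer, not a new one
X_new = np.array([[1.5]])  # hypothetical test point
X_new_poly = poly_features.transform(X_new)
y_new = lin_reg.predict(X_new_poly)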
Non-linear data can be handled in various ways, including polynomial regression, spline regression (sketched after the next example), or inherently non-linear models such as decision trees and neural networks. Polynomial regression fits a polynomial function to the data, allowing for non-linear relationships between the variables; note that the model is still linear in its coefficients, so it can be fit with ordinary least squares.
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
import numpy as np
# Generate non-linear data
np.random.seed(0)
X = 6 * np.random.rand(100, 1) - 3
y = 0.5 * X**2 + X + 2 + np.random.randn(100, 1)
# Transform features by adding higher-degree terms (here x and x^2)
poly_features = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly_features.fit_transform(X)
# Fit polynomial regression model
lin_reg = LinearRegression()
lin_reg.fit(X_poly, y)
# Predictions
y_pred = lin_reg.predict(X_poly)
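Spline regression, mentioned above, can be sketched in the same style using scikit-learn's SplineTransformer (available since scikit-learn 1.0); the n_knots and degree values below are illustrative choices, not prescribed ones:
from sklearn.preprocessing import SplineTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
import numpy as np
# Same quadratic data as the polynomial example
np.random.seed(0)
X = 6 * np.random.rand(100, 1) - 3
y = 0.5 * X**2 + X + 2 + np.random.randn(100, 1)
# Piecewise-polynomial (B-spline) basis instead of global polynomial terms
spline_reg = make_pipeline(
    SplineTransformer(n_knots=5, degree=3),  # illustrative settings
    LinearRegression()
)
spline_reg.fit(X, y)
y_pred_spline = spline_reg.predict(X)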
Regularization is a technique used to prevent overfitting by adding a penalty term to the loss function. Two common types are Ridge Regression (L2 regularization), which adds alpha * sum(w_j^2) to the squared-error loss, and Lasso Regression (L1 regularization), which adds alpha * sum(|w_j|); in both cases the hyperparameter alpha controls the strength of the penalty.
from sklearn.linear_model import Ridge, Lasso
import numpy as np
# Generate sample data
np.random.seed(0)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)
# Fit Ridge regression model
ridge_reg = Ridge(alpha=1)
ridge_reg.fit(X, y)
# Fit Lasso regression model
lasso_reg = Lasso(alpha=0.1)
lasso_reg.fit(X, y)
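To see the shrinkage in action, one can compare the fitted coefficients against an unregularized fit; a minimal check, assuming the ridge_reg and lasso_reg objects fitted above (printed values depend on the random data):
from sklearn.linear_model import LinearRegression
# The regularized fits pull the slope toward zero relative to plain OLS
lin_reg = LinearRegression().fit(X, y)
print("OLS coefficient:  ", lin_reg.coef_)
print("Ridge coefficient:", ridge_reg.coef_)
print("Lasso coefficient:", lasso_reg.coef_)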
Ridge and Lasso regression can be represented geometrically as constraints on the coefficients. In Ridge regression, the constraint is an L2 norm ball, which forms a circular constraint region; in Lasso regression, the constraint is an L1 norm ball, which forms a diamond-shaped region. Because the diamond's corners sit on the coordinate axes, the Lasso solution often lands on a corner, which is why Lasso tends to set some coefficients exactly to zero and can act as a form of feature selection.
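A small sketch of what the diamond-shaped region implies in practice: with several features, Lasso typically zeros some coefficients out entirely while Ridge only shrinks them. The data below is synthetic, and the alpha values are illustrative:
from sklearn.linear_model import Ridge, Lasso
import numpy as np
# Synthetic data: only the first two of ten features actually matter
np.random.seed(0)
X = np.random.randn(100, 10)
y = 3 * X[:, 0] - 2 * X[:, 1] + np.random.randn(100)
ridge_multi = Ridge(alpha=1.0).fit(X, y)
lasso_multi = Lasso(alpha=0.1).fit(X, y)
# Ridge shrinks every coefficient a little; Lasso sets several exactly to zero
print("Ridge:", np.round(ridge_multi.coef_, 2))
print("Lasso:", np.round(lasso_multi.coef_, 2))
print("Coefficients zeroed by Lasso:", np.sum(lasso_multi.coef_ == 0))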