When evaluating regression models, there are several metrics you can use to assess their performance beyond just Mean Squared Error (MSE) or Root Mean Squared Error (RMSE). Here are some commonly used evaluation metrics for regression:
Mean Absolute Error (MAE): This metric measures the average absolute difference between the predicted and actual values. It is expressed in the same units as the target and, unlike MSE, does not weight large errors more heavily than small ones.
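R-squared (R²) or Coefficient of Determination: R-squared indicates the proportion of the variance in the dependent variable (target) that is predictable from the independent variables (features). A value of 1 indicates a perfect fit and 0 means the model does no better than always predicting the mean of the target. It can be negative, particularly on held-out data, when the model performs worse than that baseline.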
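Mean Squared Logarithmic Error (MSLE): MSLE is the mean of the squared differences between the logarithms of the predicted and actual values, usually computed as log(1 + value). It penalizes relative rather than absolute differences, which can be useful when the target variable grows exponentially or spans several orders of magnitude. It requires non-negative targets and predictions.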
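Explained Variance Score: This score measures the proportion of the target’s variance explained by the model, computed as 1 minus the variance of the residuals divided by the variance of the target. The best possible value is 1, and it can be negative for poor models. Unlike R-squared, it ignores any systematic offset (bias) in the predictions, so the two scores coincide only when the mean residual is zero.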
Median Absolute Error (MedAE): Similar to MAE, this metric calculates the median absolute difference between the predicted and actual values. It is less sensitive to outliers compared to MAE.
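R-squared Adjusted (Adjusted R²): Adjusted R-squared takes into account the number of predictors in the model relative to the number of samples. It penalizes the addition of variables that do not improve the fit, which helps flag models whose R-squared rises only because more predictors were added.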
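Mean Percentage Error (MPE): This metric calculates the average signed percentage difference between the predicted and actual values. Because positive and negative errors cancel, it is mainly useful for detecting systematic over- or under-prediction (bias) rather than overall accuracy. It is undefined when an actual value is zero.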
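Mean Absolute Percentage Error (MAPE): MAPE calculates the average absolute percentage difference between the predicted and actual values. Unlike MPE, errors of opposite sign do not cancel, so it reflects the typical relative size of the error. It is commonly used in time series forecasting, but it is undefined when an actual value is zero and becomes unstable when actual values are close to zero.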
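Quantile Loss: Quantile loss (also called pinball loss) measures the accuracy of predicting a specific quantile of the target variable by penalizing over- and under-prediction asymmetrically. For the 0.9 quantile, for example, under-predictions cost nine times as much as over-predictions of the same size. Evaluating it at several quantiles shows how well the model captures the spread of the target, not just its center.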
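The definitions above can be made concrete with a minimal sketch in plain Python. The function name regression_metrics, its n_features and quantile parameters, and the sample data are illustrative assumptions, not part of the original example. The sign convention for MPE (actual minus predicted, divided by actual) is one common choice among several.

```python
import math
from statistics import mean, median


def regression_metrics(y_true, y_pred, n_features=1, quantile=0.5):
    """Compute the metrics described above from two equal-length sequences.

    Assumes non-negative values (for MSLE), no zeros in y_true (for MPE and
    MAPE), and len(y_true) > n_features + 1 (for adjusted R-squared).
    """
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]  # actual minus predicted

    y_mean = mean(y_true)
    ss_res = sum(e ** 2 for e in errors)               # residual sum of squares
    ss_tot = sum((t - y_mean) ** 2 for t in y_true)    # total sum of squares
    r2 = 1 - ss_res / ss_tot

    err_mean = mean(errors)
    var_err = sum((e - err_mean) ** 2 for e in errors) / n
    var_true = ss_tot / n

    return {
        "MAE": mean(abs(e) for e in errors),
        "MedAE": median(abs(e) for e in errors),
        "R2": r2,
        "Adjusted R2": 1 - (1 - r2) * (n - 1) / (n - n_features - 1),
        "Explained variance": 1 - var_err / var_true,
        "MSLE": mean(
            (math.log1p(t) - math.log1p(p)) ** 2 for t, p in zip(y_true, y_pred)
        ),
        "MPE (%)": 100 * mean(e / t for e, t in zip(errors, y_true)),
        "MAPE (%)": 100 * mean(abs(e / t) for e, t in zip(errors, y_true)),
        # Pinball loss: under-prediction costs `quantile` per unit,
        # over-prediction costs `1 - quantile` per unit.
        "Quantile loss": mean(
            max(quantile * e, (quantile - 1) * e) for e in errors
        ),
    }


# Small made-up data set, for illustration only.
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]

metrics = regression_metrics(y_true, y_pred, n_features=1, quantile=0.5)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

In practice, scikit-learn's sklearn.metrics module provides tested implementations of most of these (mean_absolute_error, median_absolute_error, r2_score, explained_variance_score, mean_squared_log_error, mean_absolute_percentage_error, mean_pinball_loss). Note that its MAPE is returned as a fraction rather than a percentage.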
Visualization techniques for regression evaluation can include:
Scatter plots: Plotting the predicted values against the actual values can help visualize the overall performance of the model. Ideally, the points should lie close to the diagonal line where the predicted value equals the actual value.
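Residual plots: Residuals are the differences between the actual and predicted values (conventionally actual minus predicted). Plotting the residuals against the predicted values or the independent variables can help identify patterns the model has missed, such as curvature, or heteroscedasticity (unequal variance). For a well-specified model, the residuals scatter randomly around zero with roughly constant spread.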
Distribution plots: Comparing the distribution of predicted values with the actual values can provide insights into the model’s accuracy and whether it is capturing the underlying data distribution.
Regression line plot: Visualizing the regression line along with the data points can help understand the relationship between the independent and dependent variables.
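As a rough sketch of the first three plot types, the function below draws a predicted-vs-actual scatter plot, a residual plot, and overlaid histograms using matplotlib. The function name plot_diagnostics and the sample values are hypothetical and chosen only for illustration.

```python
import matplotlib.pyplot as plt


def plot_diagnostics(y_true, y_pred):
    """Draw three diagnostic plots side by side and return the figure."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]  # actual minus predicted
    fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(15, 4))

    # Predicted vs. actual: points near the dashed diagonal are accurate.
    ax1.scatter(y_true, y_pred, alpha=0.7)
    lo = min(min(y_true), min(y_pred))
    hi = max(max(y_true), max(y_pred))
    ax1.plot([lo, hi], [lo, hi], linestyle="--", color="gray")
    ax1.set_xlabel("Actual")
    ax1.set_ylabel("Predicted")
    ax1.set_title("Predicted vs. actual")

    # Residuals vs. predicted: look for curvature or a funnel shape.
    ax2.scatter(y_pred, residuals, alpha=0.7)
    ax2.axhline(0, linestyle="--", color="gray")
    ax2.set_xlabel("Predicted")
    ax2.set_ylabel("Residual (actual - predicted)")
    ax2.set_title("Residuals")

    # Distributions: overlaid histograms of actual and predicted values.
    ax3.hist(y_true, bins=10, alpha=0.5, label="Actual")
    ax3.hist(y_pred, bins=10, alpha=0.5, label="Predicted")
    ax3.set_xlabel("Value")
    ax3.set_ylabel("Count")
    ax3.set_title("Distributions")
    ax3.legend()

    fig.tight_layout()
    return fig


# Made-up values, for illustration only.
y_true = [3.0, 5.0, 2.5, 7.0, 4.2, 6.1, 3.8, 5.5]
y_pred = [2.5, 5.0, 4.0, 8.0, 4.0, 5.7, 4.1, 5.9]

fig = plot_diagnostics(y_true, y_pred)
```

In a notebook with inline plotting enabled the figure is displayed automatically; in a script, call plt.show() or fig.savefig(...) afterwards.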
The following code shows some examples:
%matplotlib inline