When building a regression model, it is often difficult to grasp the relationship between the variables, or how well the model fits, merely by examining numerical summaries. This is especially the case when fitting models in which the relationship between predictor and response is mediated by a link function (as in generalized linear models). In such cases it can be useful to visualize the model fit. There is no single, concise way to do this in R, but it is entirely possible with a combination of several functions.
To begin with, we need a regression model. To make it more interesting, we’ll skip simple linear regression and create a logistic regression model with multiple predictors. The mtcars dataset contains data on 32 automobiles, and our model will include transmission (0 = automatic, 1 = manual) as the response and gross horsepower and weight (1000 lbs) as predictors.
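A model of this kind could be fitted roughly as follows. This is a minimal sketch rather than the original code of the post: the object name model is chosen here for illustration, and the column names am, hp and wt are those used in the built-in mtcars data.

```r
# Logistic regression: transmission (am) as the response,
# gross horsepower (hp) and weight (wt) as predictors.
# `model` is a name chosen for this sketch.
model <- glm(am ~ hp + wt, data = mtcars, family = binomial)

# Coefficients are reported on the log-odds (logit) scale.
summary(model)
```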
To illustrate the relationship between transmission and weight, we’ll plot the two variables against each other.
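One way to draw such a plot with base graphics is sketched below. The axis labels and the name mean_wt are additions made for this example, and the group means are included only as a quick numerical check of what the plot shows.

```r
# Observed data: weight on the x-axis, transmission (0/1) on the y-axis.
plot(mtcars$wt, mtcars$am,
     xlab = "Weight (1000 lbs)",
     ylab = "Transmission (0 = automatic, 1 = manual)")

# Mean weight within each transmission group (hypothetical helper name).
mean_wt <- tapply(mtcars$wt, mtcars$am, mean)
mean_wt
```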
Perhaps somewhat unexpectedly, manual transmission appears to be more common among lighter cars and automatic transmission among heavier ones.
We can predict values from the model with the predict() function. Using this function to add the fit from our model to the plot is a bit tricky, since we need to provide a data frame with values of the predictors (the newdata argument). To get a smooth, continuous line, we’ll first create a vector that covers the entire range of the plotted values of weight. Then we’ll plot it against the predicted values, which are obtained from this vector, the mean value of horsepower, and our model. Note that horsepower must be held at a single fixed value (here its mean) so that the line reflects the effect of weight alone.
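The steps just described might look like the sketch below. The names wt_grid, new_cars and prob_manual are hypothetical, and the model is refitted so that the snippet runs on its own. Note that predict() on a glm returns values on the link (log-odds) scale by default, so type = "response" is used here to obtain probabilities.

```r
# Refit the model so that this snippet is self-contained.
model <- glm(am ~ hp + wt, data = mtcars, family = binomial)

# Observed values.
plot(mtcars$wt, mtcars$am,
     xlab = "Weight (1000 lbs)",
     ylab = "Transmission (0 = automatic, 1 = manual)")

# A fine grid over the observed range of weight, horsepower fixed at its mean.
wt_grid  <- seq(min(mtcars$wt), max(mtcars$wt), length.out = 200)
new_cars <- data.frame(wt = wt_grid, hp = mean(mtcars$hp))

# Predicted probability of manual transmission for each grid point.
prob_manual <- predict(model, newdata = new_cars, type = "response")

# Add the fitted curve to the existing scatter plot.
lines(wt_grid, prob_manual)
```

The grid here spans only the observed weights; a wider range could be passed to seq() if the tails of the curve are of interest.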
The result indicates that the model fits the data better at the more extreme values of weight (as is usual with logistic regression models).
We can also plot only the curve using the (again aptly named) curve() function. In this case, we again use the predict() function, but now inside an expression in which weight is represented by the variable x. The range of weight is determined by the from and to arguments of curve(). However, the resulting plot does not include the observed values and so does not show how well the model fits the data.
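A sketch of this approach is given below. The name fit_curve and the axis labels are illustrative choices, and the from and to values are simply set to the observed range of weight. curve() evaluates the expression at a sequence of x values and invisibly returns the computed points, which are stored here only so they can be inspected.

```r
# Refit the model so that this snippet is self-contained.
model <- glm(am ~ hp + wt, data = mtcars, family = binomial)

# curve() supplies a vector of weights as `x`; horsepower stays at its mean.
fit_curve <- curve(
  predict(model,
          newdata = data.frame(wt = x, hp = mean(mtcars$hp)),
          type = "response"),
  from = min(mtcars$wt), to = max(mtcars$wt),
  xlab = "Weight (1000 lbs)",
  ylab = "Probability of manual transmission"
)
```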
The resulting curve can be interpreted as (1) the probability of a car having manual transmission (2) given the weight (3) when horsepower remains constant (at its mean value).
This post was originally written in rmarkdown and the code can be found here.