204.1.8 Practice : Multiple Regression Issues

Practicing with a multiple-variable linear regression model.

Link to the previous post: https://course.dvanalyticsmds.com/204-1-7-adjusted-r-squared-in-python/

In the last post of this session, we covered the basics of multiple-variable linear regression. In this post, we will practice and work through some of the issues associated with multiple regression.

Practice : Multiple Regression Issues

  • Import Final Exam Score data
  • Build a model to predict final score using the rest of the variables.
  • How are Sem2_Math & Final score related? As Sem2_Math score increases, what happens to Final score?
  • Remove “Sem1_Math” variable from the model and rebuild the model
  • Is there any change in R square or Adj R square?
  • How are Sem2_Math & Final score related now? As Sem2_Math score increases, what happens to Final score?
  • Draw a scatter plot between Sem1_Math & Sem2_Math
  • Find the correlation between Sem1_Math & Sem2_Math
In [34]:
#Import Final Exam Score data
import pandas as pd
final_exam=pd.read_csv("datasets\\Final Exam\\Final Exam Score.csv")
In [35]:
#Size of the data
final_exam.shape
Out[35]:
(24, 5)
In [36]:
#Variable names
final_exam.columns
Out[36]:
Index(['Sem1_Science', 'Sem2_Science', 'Sem1_Math', 'Sem2_Math',
       'Final_exam_marks'],
      dtype='object')
In [37]:
#Build a model to predict final score using the rest of the variables.
from sklearn.linear_model import LinearRegression
predictors1 = ["Sem1_Science", "Sem2_Science", "Sem1_Math", "Sem2_Math"]
lr1 = LinearRegression()
lr1.fit(final_exam[predictors1], final_exam[["Final_exam_marks"]])
predictions1 = lr1.predict(final_exam[predictors1])

#Fit the same model with statsmodels to get the full regression summary
import statsmodels.formula.api as sm
model1 = sm.ols(formula='Final_exam_marks ~ Sem1_Science+Sem2_Science+Sem1_Math+Sem2_Math', data=final_exam)
fitted1 = model1.fit()
fitted1.summary()
Out[37]:
OLS Regression Results

Dep. Variable:      Final_exam_marks    R-squared:             0.990
Model:              OLS                 Adj. R-squared:        0.987
Method:             Least Squares       F-statistic:           452.3
Date:               Wed, 27 Jul 2016    Prob (F-statistic):    1.50e-18
Time:               11:48:28            Log-Likelihood:        -38.099
No. Observations:   24                  AIC:                   86.20
Df Residuals:       19                  BIC:                   92.09
Df Model:           4
Covariance Type:    nonrobust

                coef       std err    t         P>|t|    [95.0% Conf. Int.]
Intercept       -1.6226    1.999      -0.812    0.427    -5.806    2.561
Sem1_Science     0.1738    0.063       2.767    0.012     0.042    0.305
Sem2_Science     0.2785    0.052       5.379    0.000     0.170    0.387
Sem1_Math        0.7890    0.197       4.002    0.001     0.376    1.202
Sem2_Math       -0.2063    0.191      -1.078    0.294    -0.607    0.194

Omnibus:            6.343               Durbin-Watson:         1.863
Prob(Omnibus):      0.042               Jarque-Bera (JB):      4.332
Skew:               0.973               Prob(JB):              0.115
Kurtosis:           3.737               Cond. No.              1.20e+03
In [38]:
fitted1.rsquared
Out[38]:
0.98960765475687229
  • How are Sem2_Math & Final score related? As Sem2_Math score increases, what happens to Final score?

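With all four predictors in the model, the Sem2_Math coefficient is -0.2063, so the model says that Final score decreases as Sem2_Math score increases (holding the other scores fixed). Note that this coefficient is not statistically significant (p = 0.294).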
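
To make the reading of the coefficient concrete, the sketch below plugs the rounded coefficients from the summary above into the regression equation by hand. The student scores and the helper name `predict_final_model1` are hypothetical, chosen only for illustration.

```python
# Rounded coefficients copied from the model 1 summary above.
COEF_MODEL1 = {
    "Intercept": -1.6226,
    "Sem1_Science": 0.1738,
    "Sem2_Science": 0.2785,
    "Sem1_Math": 0.7890,
    "Sem2_Math": -0.2063,
}

def predict_final_model1(scores):
    """Evaluate the fitted model 1 equation for one student (hypothetical helper)."""
    total = COEF_MODEL1["Intercept"]
    for name, value in scores.items():
        total += COEF_MODEL1[name] * value
    return total

# Hypothetical student; the scores are made up for illustration.
student = {"Sem1_Science": 60, "Sem2_Science": 65, "Sem1_Math": 70, "Sem2_Math": 70}
# Same student with one extra mark in Sem2_Math, everything else unchanged.
student_plus_one = dict(student, Sem2_Math=student["Sem2_Math"] + 1)

change = predict_final_model1(student_plus_one) - predict_final_model1(student)
print(round(change, 4))  # the change equals the Sem2_Math coefficient
```

Whatever scores are chosen, a one-mark increase in Sem2_Math moves the prediction by exactly the Sem2_Math coefficient, because the model is linear.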

In [39]:
#Remove "Sem1_Math" variable from the model and rebuild the model
from sklearn.linear_model import LinearRegression
predictors2 = ["Sem1_Science", "Sem2_Science", "Sem2_Math"]
lr2 = LinearRegression()
lr2.fit(final_exam[predictors2], final_exam[["Final_exam_marks"]])
predictions2 = lr2.predict(final_exam[predictors2])

#Fit the same model with statsmodels to get the full regression summary
import statsmodels.formula.api as sm
model2 = sm.ols(formula='Final_exam_marks ~ Sem1_Science+Sem2_Science+Sem2_Math', data=final_exam)
fitted2 = model2.fit()
fitted2.summary()
Out[39]:
OLS Regression Results

Dep. Variable:      Final_exam_marks    R-squared:             0.981
Model:              OLS                 Adj. R-squared:        0.978
Method:             Least Squares       F-statistic:           341.4
Date:               Wed, 27 Jul 2016    Prob (F-statistic):    2.44e-17
Time:               11:48:29            Log-Likelihood:        -45.436
No. Observations:   24                  AIC:                   98.87
Df Residuals:       20                  BIC:                   103.6
Df Model:           3
Covariance Type:    nonrobust

                coef       std err    t         P>|t|    [95.0% Conf. Int.]
Intercept       -2.3986    2.632      -0.911    0.373    -7.889    3.092
Sem1_Science     0.2130    0.082       2.595    0.017     0.042    0.384
Sem2_Science     0.2686    0.068       3.925    0.001     0.126    0.411
Sem2_Math        0.5320    0.067       7.897    0.000     0.391    0.673

Omnibus:            5.869               Durbin-Watson:         2.424
Prob(Omnibus):      0.053               Jarque-Bera (JB):      3.793
Skew:               0.864               Prob(JB):              0.150
Kurtosis:           3.898               Cond. No.              1.03e+03
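  • Is there any change in R square or Adj R square?

Model     R2       Adj R2
model1    0.990    0.987
model2    0.981    0.978

Both values drop only slightly after removing Sem1_Math: R square goes from 0.990 to 0.981 and Adj R square from 0.987 to 0.978.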
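
The adjusted R-squared values in the table can be reproduced from R-squared, the number of observations (24) and the number of predictors in each model. Here is a minimal sketch using the standard formula, adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1). The function name `adjusted_r2` is an assumption made for this example.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Standard adjusted R-squared: penalises R-squared for each extra predictor."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

# Values taken from the two summaries above (24 observations in both models).
adj1 = adjusted_r2(0.98960765475687229, n_obs=24, n_predictors=4)  # model1, full-precision R-squared
adj2 = adjusted_r2(0.981, n_obs=24, n_predictors=3)                # model2, rounded R-squared

print(round(adj1, 3), round(adj2, 3))  # 0.987 0.978
```

The denominators 19 and 20 are the "Df Residuals" values reported in the two summaries.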
  • How are Sem2_Math & Final score related now? As Sem2_Math score increases, what happens to Final score?

With Sem1_Math removed, the Sem2_Math coefficient becomes 0.5320 (p < 0.001), so Final score now increases as Sem2_Math score increases.
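
The negative sign in the first model should be read with care. A quick way to check is to compare the 95% confidence intervals reported for Sem2_Math in the two summaries. The sketch below copies those intervals by hand; `contains_zero` is a hypothetical helper.

```python
# 95% confidence intervals for the Sem2_Math coefficient, copied from the two summaries.
SEM2_MATH_CI = {
    "model1": (-0.607, 0.194),  # Sem1_Math included
    "model2": (0.391, 0.673),   # Sem1_Math removed
}

def contains_zero(interval):
    """True when the interval spans zero, i.e. the sign of the effect is not settled."""
    low, high = interval
    return low <= 0 <= high

for name, interval in SEM2_MATH_CI.items():
    verdict = "sign is uncertain" if contains_zero(interval) else "sign is clear"
    print(name, interval, verdict)
```

In the first model the interval spans zero, so the data do not pin down the direction of the Sem2_Math effect. In the second model the whole interval is positive.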

In [40]:
#Draw a scatter plot between Sem1_Math & Sem2_Math

import matplotlib.pyplot as plt
%matplotlib inline
plt.scatter(final_exam.Sem1_Math,final_exam.Sem2_Math)
Out[40]:
<matplotlib.collections.PathCollection at 0xb2cf0f0>
In [41]:
#Find the correlation between Sem1_Math & Sem2_Math
import numpy as np
np.corrcoef(final_exam.Sem1_Math,final_exam.Sem2_Math)
Out[41]:
array([[ 1.       ,  0.9924948],
       [ 0.9924948,  1.       ]])
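
A correlation of about 0.99 means the two math scores carry almost the same information, which is why the Sem2_Math coefficient is so poorly determined in the full model. The sketch below illustrates the effect on synthetic data, not on the Final Exam Score data: the names `x1`, `x2`, `y`, the seed and the noise levels are all assumptions. It fits ordinary least squares with NumPy and compares the standard error of the `x2` coefficient with and without the correlated predictor `x1`.

```python
import numpy as np

def ols_with_se(X, y):
    """Ordinary least squares fit; returns (coefficients, standard errors), intercept first."""
    X = np.column_stack([np.ones(len(y)), X])      # add intercept column
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]                      # residual degrees of freedom
    sigma2 = resid @ resid / dof                   # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)          # coefficient covariance matrix
    return beta, np.sqrt(np.diag(cov))

rng = np.random.default_rng(0)                     # fixed seed (assumption) for repeatability
n = 200
x1 = rng.normal(50, 10, n)                         # synthetic "first semester" score
x2 = x1 + rng.normal(0, 1, n)                      # nearly a copy of x1
y = 0.8 * x1 + rng.normal(0, 2, n)                 # outcome depends on x1 only

corr = np.corrcoef(x1, x2)[0, 1]
beta_full, se_full = ols_with_se(np.column_stack([x1, x2]), y)  # both predictors
beta_red, se_red = ols_with_se(x2, y)                           # x2 alone

print("correlation:", round(corr, 3))
print("x2 coef, both predictors:", beta_full[2], "+/-", se_full[2])
print("x2 coef, x2 alone:       ", beta_red[1], "+/-", se_red[1])
```

With both predictors in the model, the standard error of the `x2` coefficient is much larger, so its estimated value (and even its sign) can swing from sample to sample. With `x2` alone, the coefficient is positive and tightly estimated.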

The next post covers the issue of multicollinearity in Python.

Link to the next post : https://course.dvanalyticsmds.com/204-1-9-issue-of-multicollinearity-in-python/
