Algorithm assignment

Date: 2022-04-18 09:03

Download the file Coursework2Data.csv from the Moodle webpage and load it into R.

The dataset contains information on the retail prices of second-hand cars. The variables are:

  • Price: the retail price of the second-hand car (in 1000 £);
  • Age: the age of the car (in months);
  • Mileage: the mileage, that is, the distance that the car has driven in its lifetime (in 1000 miles);
  • MOT: time passed since the last MOT, a vehicle safety inspection that a registered car needs to pass every year;
  • ABS: whether the car has ABS, that is, an anti-lock brake system, which is an enhanced safety feature;
  • Sunroof: whether the car has a sunroof.

(a) [1 mark] Produce a scatterplot of Price against Mileage.

(b) [3 marks] Consider polynomials up to degree 3 to model the relationship between Mileage and Price. Fit each model and then add to your scatterplot from (a) the corresponding fitted lines/curves using different colours/line types for each. Don’t forget to add a legend. Judging from your plot, which seems the most appropriate model? Explain why.

(c) [3 marks] Perform a sequential ANOVA on the cubic model from (b) and include the output in your report. What conclusion can you draw from the results?

(d) [8 marks] Explain how to use the results from (c) to compute the entries for the standard ANOVA table for the existence of regression for the quadratic model. Write out this ANOVA table.
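As a rough sketch of the bookkeeping involved, suppose the cubic model was fitted with its terms entered in the order Mileage, Mileage^2, Mileage^3, so that the sequential ANOVA reports one sum of squares per term. The symbols SS_1, SS_2, SS_3 (sequential sums of squares for the linear, quadratic and cubic terms), RSS_cubic (residual sum of squares of the cubic model) and n (number of observations) are notation assumed here, not taken from the coursework. Under that assumption the table for the quadratic model would be assembled as:

$$
\begin{array}{lccc}
\text{Source} & \text{df} & \text{SS} & \text{MS} \\
\hline
\text{Regression} & 2 & SS_1 + SS_2 & (SS_1 + SS_2)/2 \\
\text{Residual} & n - 3 & SS_3 + RSS_{\text{cubic}} & (SS_3 + RSS_{\text{cubic}})/(n - 3) \\
\text{Total} & n - 1 & SS_1 + SS_2 + SS_3 + RSS_{\text{cubic}} &
\end{array}
$$

The point is that dropping the cubic term moves its sequential sum of squares, and its one degree of freedom, from the regression line into the residual line, while the total stays the same.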

(e) [7 marks] Perform the test for existence of regression for the quadratic model at a 5% significance level. In your answer state clearly

  • the null and the alternative hypothesis;
  • the definition of the relevant test statistic;
  • the distribution of the test statistic under the null hypothesis;
  • the observed value of the test statistic;
  • the corresponding p-value;
  • the outcome of the test and
  • the conclusion you draw from the test.

(Hint: you may either use the results from (d) or use any other approach to obtain the relevant quantities, but in the latter case explain how you obtained the relevant quantities.)
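For orientation, the general shape of such a test can be written down without any of the numerical values. The sketch below assumes the quadratic model is written as Price_j = β_0 + β_1 Mileage_j + β_2 Mileage_j^2 + ε_j with independent normal errors; SS_Reg and SS_Res denote the regression and residual sums of squares of that model, and n the number of observations (all notation assumed here):

$$
H_0: \beta_1 = \beta_2 = 0 \quad \text{versus} \quad H_1: \beta_1 \neq 0 \text{ or } \beta_2 \neq 0,
$$

$$
F = \frac{SS_{\text{Reg}} / 2}{SS_{\text{Res}} / (n - 3)} \sim F_{2,\, n-3} \quad \text{under } H_0,
\qquad
p = P\left(F_{2,\, n-3} \geq f_{\text{obs}}\right).
$$

H_0 is rejected at the 5% level when p < 0.05, or equivalently when f_obs exceeds the 95% quantile of the F distribution with 2 and n - 3 degrees of freedom.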

(f) [4 marks] Produce the four default diagnostic plots (Residuals vs Fitted Values plot, Normal Q-Q plot, Scale-Location plot and Residuals versus Leverages plot) for the quadratic model. Briefly (1-2 sentences) comment on each plot.

(g) [2 marks] Next fit the model

$$\text{Price}_j = \beta_0 + \beta_1 \text{Mileage}_j + \beta_2 \text{Mileage}_j^2 + \beta_3 \text{Age}_j + \beta_4 \text{Age}_j^2 + \varepsilon_j,$$

where j = 1, ..., 172. Explain how to use the information provided in the model summary output for this model to obtain an unbiased estimate for the variance of the errors. Give the numerical value of the unbiased estimate for the variance of the errors.
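As a reminder of the relationship involved (a sketch, with RSS denoting the residual sum of squares and p the number of regression coefficients, both notation assumed here): the summary output of a linear model in R reports a residual standard error together with its degrees of freedom, and the unbiased variance estimate is the square of that reported value,

$$
\hat{\sigma}^2 = \frac{RSS}{n - p} = \left(\text{residual standard error}\right)^2,
\qquad n = 172,\; p = 5,\; n - p = 167.
$$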

(h) [7 marks] Perform a hypothesis test at a 5% significance level to decide whether the quadratic term in Age is needed in the model in (g). In your answer state clearly

  • the null and the alternative hypothesis;
  • the definition of the relevant test statistic;
  • the distribution of the test statistic under the null hypothesis;
  • the observed value of the test statistic;
  • the corresponding p-value;
  • the outcome of the test and
  • the conclusion you draw from the test.
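One common way to frame this test, sketched here under the assumption of independent normal errors and with β_4 denoting the coefficient of the squared Age term in the model from (g), is a t-test on that single coefficient:

$$
H_0: \beta_4 = 0 \quad \text{versus} \quad H_1: \beta_4 \neq 0,
\qquad
T = \frac{\hat{\beta}_4}{\operatorname{se}(\hat{\beta}_4)} \sim t_{167} \quad \text{under } H_0,
\qquad
p = 2\, P\left(t_{167} \geq |t_{\text{obs}}|\right).
$$

An equivalent route is the partial F-test comparing the model with and without the squared Age term, since T^2 then follows an F distribution with 1 and 167 degrees of freedom under H_0.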


(i) [2 marks] Use the function influenceIndexPlot from the car package, applied to the fitted model in (g), to produce an index plot of the leverages. Which observations have the six highest leverages? (Hint: use the option id=list(n=6) in the command influenceIndexPlot to label the observations with the six highest leverages.)

(j) [3 marks] Produce a scatterplot of Age against Mileage such that the observations with the six highest leverages identified in (i) have a different colour from the other data points. How would you characterise these observations in terms of their age and mileage?

(k) [2 marks] Produce an index plot of the Cook’s distances for the model in (g). Which are the observations with the three highest Cook’s distances?

(l) [3 marks] For the model in (g), use the command influencePlot(model, id=list(n=3)) to produce a bubble plot that flags the data points with the three largest absolute studentised residuals, the data points with the three highest leverages and the data points with the three highest Cook’s distances. Explain, in terms of their leverage and residual, why the observations identified in (k) have the highest Cook’s distances.

(m) [2 marks] Consider a new model produced by adding the explanatory variables MOT, ABS and Sunroof to the model in (g). Give the R code that you would use to fit the model and to perform a hypothesis test to decide whether the new model is a significant improvement over the model in (g).
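The general pattern for comparing two nested linear models in R can be sketched as follows. The data frame below is simulated and purely hypothetical (it only borrows the column names from the coursework), ABS and Sunroof are assumed to be two-level factors, and the object names cars, model_g, model_m and cmp are illustrative choices, not names from the coursework.

```r
# Simulated stand-in for the coursework data (hypothetical values,
# only the column names follow the variable list in the brief).
set.seed(1)
n <- 172
cars <- data.frame(
  Mileage = runif(n, 5, 120),
  Age     = runif(n, 6, 150),
  MOT     = runif(n, 0, 12),
  ABS     = factor(sample(c("no", "yes"), n, replace = TRUE)),
  Sunroof = factor(sample(c("no", "yes"), n, replace = TRUE))
)
cars$Price <- 25 - 0.15 * cars$Mileage + 0.0005 * cars$Mileage^2 -
  0.05 * cars$Age + rnorm(n)

# Model from (g): quadratic in both Mileage and Age.
model_g <- lm(Price ~ Mileage + I(Mileage^2) + Age + I(Age^2), data = cars)

# Larger model: the same terms plus MOT, ABS and Sunroof.
model_m <- update(model_g, . ~ . + MOT + ABS + Sunroof)

# Partial F-test of the smaller model against the larger one.
cmp <- anova(model_g, model_m)
print(cmp)
```

With real data, the data frame would come from reading the CSV file instead of being simulated; the second row of the printed table carries the F statistic and p-value for the added terms.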

(n) [3 marks] Perform a forward stepwise variable selection using the AIC as the model selection criterion. Use as the minimal model

$$\text{Price}_j = \beta_0 + \beta_1 \text{Mileage}_j + \beta_3 \text{Age}_j + \varepsilon_j \quad \text{for } j = 1, \ldots, n.$$

As the maximal model use the model in (m). Include the output in your report. Which model is selected as the final model and what value does the AIC take for this model?
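A minimal sketch of forward selection with the base R function step is given below. As before, the data are simulated and hypothetical, and the object names cars, minimal and selected are illustrative assumptions. The lower and upper scope formulas encode the minimal and maximal models described above.

```r
# Simulated stand-in for the coursework data (hypothetical values).
set.seed(2)
n <- 172
cars <- data.frame(
  Mileage = runif(n, 5, 120),
  Age     = runif(n, 6, 150),
  MOT     = runif(n, 0, 12),
  ABS     = factor(sample(c("no", "yes"), n, replace = TRUE)),
  Sunroof = factor(sample(c("no", "yes"), n, replace = TRUE))
)
cars$Price <- 25 - 0.15 * cars$Mileage + 0.0005 * cars$Mileage^2 -
  0.05 * cars$Age + rnorm(n)

# Start from the minimal model and let step() add terms one at a time.
minimal <- lm(Price ~ Mileage + Age, data = cars)
selected <- step(
  minimal,
  scope = list(
    lower = ~ Mileage + Age,
    upper = ~ Mileage + I(Mileage^2) + Age + I(Age^2) + MOT + ABS + Sunroof
  ),
  direction = "forward"
)

# Final model formula and its (equivalent degrees of freedom, AIC) pair.
print(formula(selected))
print(extractAIC(selected))
```

The AIC values printed in the trace of step are on the scale used by extractAIC, which differs from the value returned by AIC() by an additive constant, so it is worth stating which of the two is being reported.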
