
Date: 2019-10-03 10:46

2.6 Exercises

Conceptual Exercises

2.1–2.7 True or False? Each of the statements in Exercises 2.1–2.7 is either true or false. For each statement, indicate whether it is true or false and, if it is false, give a reason.

2.1 If dataset A has a larger correlation between Y and X than dataset B, then the slope between Y and X for dataset A will be larger than the slope between Y and X for dataset B.

2.2 The degrees of freedom for the model is always 1 for a simple linear regression model.

2.3 The magnitude of the critical value (t*) used to compute a confidence interval for the slope of a regression model decreases as the sample size increases.

2.4 The variability due to error (SSE) is always smaller than the variation explained by the model (SSModel).

2.5 If the size of the typical error increases, then the prediction interval for a new observation

becomes narrower.

2.6 For the same value of the predictor, the 95% prediction interval for a new observation is

always wider than the 95% confidence interval for the mean response.

2.7 If the correlation between X1 and Y is greater (in magnitude) than the correlation between X2 and Y, then the coefficient of determination for regressing Y on X1 is greater than the coefficient of determination for regressing Y on X2.
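Statement 2.6 can be checked directly from the interval formulas: at a given predictor value, the standard error for predicting a new observation carries an extra "1" under the square root compared with the standard error of the mean response, so the prediction interval is always wider. A minimal numeric sketch (all summary values below are made up purely for illustration):

```python
import math

# Made-up summary statistics for a fitted simple linear regression
s = 2.5       # regression standard error
n = 30        # sample size
xbar = 10.0   # mean of the predictor
sxx = 45.0    # sum of squared deviations of x
x_new = 12.0  # predictor value at which we form the intervals

common = 1 / n + (x_new - xbar) ** 2 / sxx
se_mean = s * math.sqrt(common)      # SE for the mean response (CI)
se_pred = s * math.sqrt(1 + common)  # SE for a new observation (PI)

# The prediction interval uses the larger standard error, so for the
# same t* multiplier it is always wider than the confidence interval.
print(se_mean < se_pred)  # → True
```

The extra "1" reflects the irreducible variability of an individual observation around the mean response, which never shrinks as n grows.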

2.8 Using correlation. A regression equation was fit to a set of data for which the correlation, r, between X and Y was 0.6. Which of the following must be true?

a. The slope of the regression line is 0.6.

b. The regression model explains 60% of the variability in Y.

c. The regression model explains 36% of the variability in Y.

d. At least half of the residuals are smaller than 0.6 in absolute value.
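For Exercise 2.8, the key fact is that in simple linear regression the coefficient of determination equals the square of the correlation:

```python
r = 0.6  # correlation between X and Y

# R-squared, the proportion of variability in Y explained by the model,
# is the square of the correlation r in simple linear regression.
r_squared = round(r ** 2, 2)
print(r_squared)  # → 0.36
```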

2.9 Interpreting the size of r².

a. Does a high value of r², say, 0.90 or 0.95, indicate that a linear relationship is the best possible model for the data? Explain.

b. Does a low value of r², say, 0.20 or 0.30, indicate that some relationship other than linear would be the best model for the data? Explain.


3.9 Exercises

Conceptual Exercises

3.1 Predicting a statistics final exam grade. A statistics professor assigned various grades during the semester, including a midterm exam (out of 100 points) and a logistic regression project (out of 30 points). The prediction equation below was fit, using data from 24 students in the class, to predict the final exam score (out of 100 points) based on the midterm and project grades:

Final = 11.0 + 0.53 · Midterm + 1.20 · Project

a. What would this tell you about a student who got perfect scores on the midterm and project?

b. Michael got a grade of 87 on his midterm, 21 on the project, and an 80 on the final. Compute his residual and write a sentence to explain what that value means in Michael's case.
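Both parts of Exercise 3.1 are plug-in calculations with the fitted equation; a quick check:

```python
def predict_final(midterm, project):
    """Fitted prediction equation from Exercise 3.1."""
    return 11.0 + 0.53 * midterm + 1.20 * project

# Part (a): predicted final for a student with perfect scores on both
perfect = round(predict_final(100, 30), 2)

# Part (b): Michael's residual = observed final minus predicted final
fitted = round(predict_final(87, 21), 2)        # 11.0 + 46.11 + 25.2
residual = round(80 - predict_final(87, 21), 2)
print(perfect, fitted, residual)  # → 100.0 82.31 -2.31
```

The negative residual means Michael's actual final score was about 2.3 points below what the model predicts from his midterm and project grades.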

3.2 Predicting a statistics final exam grade (continued). Does t he prediction equation for

final exam scores in Exercise 3.1 suggest that the project score has a stronger relationship wit h the

final exam than the midt erm exam? Explain why or why not.

3.3 Breakfast cereals. A regression model was fit to a sample of breakfast cereals. The response variable Y is calories per serving. The predictor variables are X1, grams of sugar per serving, and X2, grams of fiber per serving. The fitted regression model is

Y = 109.3 + 1.0 · X1 − 3.7 · X2

In the context of this setting, interpret −3.7, the coefficient of X2. That is, describe how fiber is related to calories per serving, in the presence of the sugar variable.
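The interpretation asked for in Exercise 3.3 can be seen numerically: holding sugar fixed, raising fiber by one gram changes the predicted calories by exactly the fiber coefficient. A small sketch (the serving values below are arbitrary):

```python
def predict_calories(sugar, fiber):
    """Fitted model from Exercise 3.3: calories per serving."""
    return 109.3 + 1.0 * sugar - 3.7 * fiber

# Hold sugar fixed at 10 g and raise fiber from 2 g to 3 g.
diff = predict_calories(10, 3) - predict_calories(10, 2)
print(round(diff, 1))  # → -3.7
```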

3.4 Adjusting R²? Decide if the following statements are true or false, and explain why:

a. For a multiple regression problem, the adjusted coefficient of determination, R²adj, will always be smaller than the regular, unadjusted R².

b. If we fit a multiple regression model and then add a new predictor to the model, the adjusted coefficient of determination, R²adj, will always increase.

3.5 Body measurements. Suppose that you are interested in predicting the percentage of body fat (BodyFat) on a man using the explanatory variables waist size (Waist) and Height.

a. Do you think that BodyFat and Waist are positively correlated? Explain.

b. For a fixed waist size (say, 38 inches), would you expect BodyFat to be positively or negatively correlated with a man's Height? Explain why.


c. Predict the amount of titanium (Titanium) in a well based on a possible quadratic relationship with the distance (Miles) from a mining site.

d. Predict the amount of sulfide (Sulfide) in a well based on Year, distance (Miles) from a mining site, depth (Depth) of the well, and any interactions of pairs of explanatory variables.

3.9 Degrees of freedom for well water models. Suppose that the environmental expert in

Exercise 3.8 gives you data from 198 wells. Identify the degrees of freedom for error in each of the

models from the previous exercise.
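For Exercise 3.9, the error degrees of freedom are the sample size minus the number of estimated coefficients (the predictor terms plus the intercept). A sketch for the two well-water models visible above, with n = 198 (the term counts follow from the model descriptions; the exercise also covers models (a) and (b), which are not shown here):

```python
def df_error(n, predictor_terms):
    """Error df for a multiple regression: n minus (terms + intercept)."""
    return n - (predictor_terms + 1)

n = 198
# Model (c): Miles and Miles^2                        -> 2 predictor terms
# Model (d): Year, Miles, Depth plus all three
#            pairwise interactions of those variables -> 6 predictor terms
print(df_error(n, 2), df_error(n, 6))  # → 195 191
```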

3.10 Predicting faculty salaries. A dean at a small liberal arts college is interested in fitting

a multiple regression model to try to predict salaries for faculty members. If the residuals are

unusually large for any individual faculty member, then adjustments in the person's annual salary

are considered.

a. Identify the model for predicting Salary from age of the faculty member (Age), years of experience (Seniority), number of publications (Pub), and an indicator variable for gender (IGender). The dean wants this initial model to include only pairwise interaction terms.

b. Do you think that Age and Seniority will be correlated? Explain.

c. Do you think that Seniority and Pub will be correlated? Explain.

d. Do you think that the dean will be happy if the coefficient for IGender is significantly different from zero? Explain.

Guided Exercises

3.11 Active pulse rates. The computer output below comes from a study to model Active pulse rates (after climbing several flights of stairs) based on resting pulse rate (Rest, in beats per minute), weight (Wgt, in pounds), and amount of Exercise (in hours per week). The data were obtained from 232 students taking Stat2 courses in past semesters.

The regression equation is Active = 11.8 + 1.12 Rest + 0.0342 Wgt - 1.09 Exercise

Predictor    Coef      SE Coef   T      P
Constant     11.84     11.95     0.99   0.323
Rest         1.1194    0.1192    9.39   0.000
Wgt          0.03420   0.03173   1.08   0.282
Exercise     -1.085    1.600     -0.68  0.498

S = 15.0452   R-Sq = 36.9%   R-Sq(adj) = 36.1%


a. Test the hypotheses that β1 = 0 versus β1 ≠ 0 and interpret the result in the context of this problem. You may assume that the conditions for a linear model are satisfied for these data.

b. Construct and interpret a 90% confidence interval for the coefficient β1 in this model.

c. What active pulse rate would this model predict for a 200-pound student who exercises 7 hours per week and has a resting pulse rate of 76 beats per minute?
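For part (a) of Exercise 3.11, each t-statistic in the computer output is simply the coefficient divided by its standard error; the printed T column can be reproduced from the Coef and SE Coef columns:

```python
# Coefficient and standard error for each predictor, read from the output
rows = {"Rest": (1.1194, 0.1192),
        "Wgt": (0.03420, 0.03173),
        "Exercise": (-1.085, 1.600)}

for name, (coef, se) in rows.items():
    # t-statistic = estimate / standard error
    print(name, round(coef / se, 2))
# Matches the T column: Rest 9.39, Wgt 1.08, Exercise -0.68
```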

3.12 Major League Baseball winning percentage. In Example 3.1, we considered a model for the winning percentages of football teams based on measures of offensive (PointsFor) and defensive (PointsAgainst) ability. The file MLB2007Standings contains similar data on many variables for Major League Baseball (MLB) teams from the 2007 regular season. The winning percentages are in the variable WinPct, and scoring variables include Runs (scored by a team for the season) and ERA (essentially the average runs against a team per game).

a. Fit a multiple regression model to predict WinPct based on Runs and ERA. Write down the prediction equation.

b. The Boston Red Sox had a winning percentage of 0.593 for the 2007 season. They scored 867

runs and had an ERA of 3.87. Use this information and the fitted model to find the residual

for the Red Sox.

c. Comment on the effectiveness of each of the two predictors in this model. Would you recommend

dropping one or the other (or both) from the model? Explain why or why not.

d. Does this model for team winning percentages in baseball appear to be more or less effective than the model for football teams in Example 3.1 on page 95? Give a numerical justification

for your answer.

3.13 Enrollments in mathematics courses. In Exercise 2.23 on page 85, we considered a model to predict spring enrollment in mathematics courses based on the fall enrollment. The residuals for that model showed a pattern of growing over the years in the data. Thus, it might be beneficial to add the academic year variable AYear to our model and fit a multiple regression. The data are provided in the file MathEnrollment.

a. Fit a multiple regression model for predicting spring enrollment (Spring) from fall enrollment (Fall) and academic year (AYear), after removing the data from 2003 that had special circumstances. Report the fitted prediction equation.

b. Prepare appropriate residual plots and comment on the conditions for inference. Did the slight problems with the residual plots (e.g., increasing residuals over time) that we noticed for the simple linear model disappear?


3.14 Enrollments in mathematics courses (continued). Refer to the model in Exercise 3.13 to predict Spring mathematics enrollments with a two-predictor model based on Fall enrollments and academic year (AYear) for the data in MathEnrollment.

a. What percent of the variability in spring enrollment is explained by the multiple regression model based on fall enrollment and academic year?

b. What is the size of the typical error for this multiple regression model?

c. Provide the ANOVA table for partitioning the total variability in spring enrollment based on this model and interpret the associated F-test.

d. Are the regression coefficients for both explanatory variables significantly different from zero? Provide appropriate hypotheses, test statistics, and p-values in order to make your conclusion.

3.15 More breakfast cereal. The regression model in Exercise 3.3 was fit to a sample of 36 breakfast cereals with calories per serving as the response variable. The two predictors were grams of sugar per serving and grams of fiber per serving. The partition of the sums of squares for this model is

SSTotal = SSModel + SSE
 17190  =   9350  + 7840

a. Calculate R² for this model and interpret the value in the context of this setting.

b. Calculate the regression standard error of this multiple regression model.

c. Calculate the F-ratio for testing the null hypothesis that neither sugar nor fiber is related to

the calorie content of cereals.

d. Assuming the regression conditions hold, the p-value for the F-ratio in (c) is about 0.000002.

Interpret what this tells you about the variables in this situation.
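Parts (a)–(c) of Exercise 3.15 are direct computations from the sums-of-squares partition, with n = 36 cereals and k = 2 predictors:

```python
import math

ss_total, ss_model, ss_error = 17190, 9350, 7840
n, k = 36, 2

r_squared = ss_model / ss_total                       # part (a)
se = math.sqrt(ss_error / (n - k - 1))                # part (b): regression standard error
f_ratio = (ss_model / k) / (ss_error / (n - k - 1))   # part (c): MSModel / MSE

print(round(r_squared, 3), round(se, 2), round(f_ratio, 2))  # → 0.544 15.41 19.68
```

An F-ratio near 20 on (2, 33) degrees of freedom is consistent with the tiny p-value quoted in part (d).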

3.16 Combining explanatory variables. Suppose that X1 and X2 are positively related with

X1 = 2X2 - 4. Let Y = 0.5X1 + 5 summarize a positive linear relationship between Y and X1.

a. Substitute the first equation into the second to show a linear relationship between Y and X2.

Comment on the direction of the association between Y and X2 in the new equation.

b. Now add the original two equations and rearrange terms to give an equation in the form

Y = aX1 + bX2 + c. Are the coefficients of X1 and X2 both in the direction you would

expect based on the signs in the separate equations? Combining explanatory variables that

are related to each other can produce surprising results.
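The algebra in Exercise 3.16 can be verified numerically: substitution gives Y = X2 + 3, while adding the two equations and rearranging gives Y = −0.5·X1 + 2·X2 + 1, in which the sign on X1 has flipped:

```python
for x2 in range(-3, 4):
    x1 = 2 * x2 - 4   # given relationship between X1 and X2
    y = 0.5 * x1 + 5  # positive linear relationship between Y and X1

    # Part (a): substituting X1 = 2*X2 - 4 gives Y = X2 + 3,
    # a positive association between Y and X2.
    assert y == x2 + 3

    # Part (b): adding the equations gives Y = -0.5*X1 + 2*X2 + 1;
    # the X1 coefficient is now negative, opposite to its sign above.
    assert y == -0.5 * x1 + 2 * x2 + 1

print("both forms agree")
```

This is the "surprising result" the exercise alludes to: a predictor with a positive marginal relationship can pick up a negative coefficient once a correlated predictor is in the model.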

