R10 Practice: Simple Linear Regression
Syllabus Coverage
- Describe a simple linear regression model, how the least squares criterion is used to estimate regression coefficients, and the interpretation of these coefficients.
- Explain the assumptions underlying the simple linear regression model, and describe how residuals and residual plots indicate if these assumptions may have been violated.
- Calculate and interpret measures of fit and formulate and evaluate tests of fit and of regression coefficients in a simple linear regression.
- Describe the use of analysis of variance (ANOVA) in regression analysis, interpret ANOVA results, and calculate and interpret the standard error of estimate in a simple linear regression.
- Calculate and interpret the predicted value for the dependent variable, and a prediction interval for it, given an estimated linear regression model and a value for the independent variable.
- Describe different functional forms of simple linear regressions.
Q1.
Ordinary least squares (OLS) estimates the intercept and slope coefficients of the regression model by minimizing the sum of squared errors (SSE). SSE refers to:
A. the sum of squared vertical distances between the observations of the dependent variable and the regression line.
B. the sum of squared differences between the predicted value of the dependent variable by the regression and the mean value of the dependent variable.
C. total sum of squares.
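Answer and explanation
Answer: A
Explanation: The question tests the definition of SSE (the sum of squared errors, also called the residual sum of squares).
| Option | Verdict | Explanation |
|---|---|---|
| A | ✓ | $SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$, the sum of squared vertical distances between the observed values of the dependent variable and the values predicted by the regression line |
| B | ✗ | This defines SSR (the regression sum of squares): $SSR = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2$ |
| C | ✗ | The total sum of squares (SST, also written TSS) is $SST = SSR + SSE = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$ |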
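The three sums of squares can be illustrated with a minimal sketch. The data below are hypothetical and chosen only for illustration (they do not come from the question); the helper name `sse_for` is also an assumption. The sketch fits the line by OLS, then shows that SST splits into SSR and SSE and that moving the coefficients away from the OLS estimates raises SSE.

```python
# Hypothetical data, for illustration only (not from the question).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# OLS estimates: slope = S_xy / S_xx, intercept = y_bar - slope * x_bar
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar


def sse_for(intercept, slope):
    """Sum of squared vertical distances between observations and a line."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


y_hat = [b0 + b1 * xi for xi in x]
sse = sse_for(b0, b1)                          # unexplained variation
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)   # explained variation
sst = sum((yi - y_bar) ** 2 for yi in y)       # total variation

print(f"SSE={sse:.4f}  SSR={ssr:.4f}  SST={sst:.4f}")
print("SSE after nudging the intercept:", round(sse_for(b0 + 0.1, b1), 4))
```

Any other intercept and slope pair passed to `sse_for` should give a larger value than the OLS pair, which is what the least squares criterion means.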
Q2.
An analyst is investigating the correlation between a company’s net profit margin and its Research & Development (R&D) expenditures. The analyst gathers data from various companies within an industry and calculates two key ratios: the ratio of R&D expenditures to revenues (RDR) and the net profit margin (NPM) for 11 companies. The objective is to provide an explanation for the changes in NPM by examining the changes in RDR among the companies. If the covariance between NPM and RDR is -0.00048, the standard deviations of NPM and RDR are 0.00069 and 0.01613, respectively. What is the slope coefficient for this linear regression model?
A. -1.84.
B. -0.03.
C. -0.7.
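Answer and explanation
Answer: A
Explanation: The question tests the calculation of the slope coefficient. NPM is the dependent variable and RDR is the independent variable.
Calculation:
$\hat{b}_1 = \frac{\text{Cov}(NPM, RDR)}{\text{Var}(RDR)} = \frac{-0.00048}{0.01613^2} \approx -1.84$
| Option | Verdict | Explanation |
|---|---|---|
| A | ✓ | $\hat{b}_1$ = Cov / Var(X) = −0.00048 / 0.01613² ≈ −1.84 |
| B | ✗ | −0.03 results from dividing the covariance by the standard deviation of RDR instead of its variance: −0.00048 / 0.01613 ≈ −0.03 |
| C | ✗ | −0.7 results from dividing the covariance by the standard deviation of NPM: −0.00048 / 0.00069 ≈ −0.70 |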
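The slope arithmetic can be checked with a short sketch that uses only the figures given in the question. The variable names are illustrative assumptions. It also reproduces the two distractors by dividing the covariance by a standard deviation instead of the variance of the independent variable.

```python
# Inputs taken from the question.
cov_npm_rdr = -0.00048   # covariance between NPM (Y) and RDR (X)
sd_npm = 0.00069         # standard deviation of NPM
sd_rdr = 0.01613         # standard deviation of RDR

# Slope of Y on X: covariance divided by the variance of X.
slope = cov_npm_rdr / sd_rdr ** 2

# Common slips: dividing by a standard deviation instead of Var(X).
slip_sd_x = cov_npm_rdr / sd_rdr
slip_sd_y = cov_npm_rdr / sd_npm

print(round(slope, 2), round(slip_sd_x, 2), round(slip_sd_y, 2))
```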
Q3.
Gary, a quantitative analyst, is using the following simple linear regression to find the relationship between a firm’s return on equity (ROE) and debt-to-equity (D/E) ratio:
$\widehat{ROE}_i = 2 + 1.5 \times (D/E)_i$
where ROE is stated in percentages and D/E is in decimals.
If D/E is 2, the predicted value of the firm’s ROE is ___; If D/E changes from 1.5 to 2, the change of ROE is ___; If D/E is 4 and the actual ROE is 8.2%, the residual is ___.
A. 2.3%, 4.25%, 0.2%.
B. 5%, 0.75%, -0.2%.
C. 5%, 0.75%, 0.2%.
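Answer and explanation
Answer: C
Explanation: The question applies the fitted regression line and the definition of a residual.
Calculation (intercept 2, slope 1.5):
- When D/E = 2: $\widehat{ROE} = 2 + 1.5 \times 2 = 5\%$
- When D/E rises from 1.5 to 2 (a change of 0.5): $\Delta\widehat{ROE} = 1.5 \times 0.5 = 0.75\%$
- When D/E = 4: $\widehat{ROE} = 2 + 1.5 \times 4 = 8\%$, so residual = actual − predicted = 8.2% − 8% = 0.2%
| Option | Verdict | Explanation |
|---|---|---|
| A | ✗ | The predicted value and the change are both computed incorrectly |
| B | ✗ | The residual has the wrong sign; it should be +0.2%, not −0.2% |
| C | ✓ | 5%, 0.75%, and 0.2% are all correct |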
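The three blanks can be reproduced with a minimal sketch. The coefficients 2 and 1.5 are the values consistent with the answer key, and the function name `predict_roe` is a hypothetical helper introduced for illustration.

```python
# Coefficients consistent with the answer key (ROE in percent, D/E in decimals).
B0 = 2.0   # intercept
B1 = 1.5   # slope


def predict_roe(de_ratio):
    """Predicted ROE (in percent) for a given debt-to-equity ratio."""
    return B0 + B1 * de_ratio


predicted_at_2 = predict_roe(2.0)
change_1_5_to_2 = predict_roe(2.0) - predict_roe(1.5)
residual_at_4 = 8.2 - predict_roe(4.0)   # residual = actual - predicted

print(predicted_at_2, change_1_5_to_2, round(residual_at_4, 2))
```

The change in the prediction depends only on the slope and the change in D/E, so the intercept drops out of the second blank.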
Q4.
Erik and Eric are discussing some topics about simple linear regression. Erik states that there are four key assumptions and presents these assumptions as follows:
- Assumption I: the relationship between the dependent variable and the independent variable is linear.
- Assumption II: the variance of the error term is the same for all observations.
- Assumption III: the residuals are correlated across observations.
- Assumption IV: both the error term and the explanatory variable are normally distributed.
Which assumptions are correct?
A. I and III.
B. I and II.
C. II and IV.
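Answer and explanation
Answer: B
Explanation: The question tests the assumptions of the simple linear regression model.
| Assumption | Verdict | Explanation |
|---|---|---|
| I | ✓ | Correct. Linearity is a basic assumption |
| II | ✓ | Correct. Homoskedasticity (constant error variance) is a basic assumption |
| III | ✗ | Incorrect. The assumption is that the residuals are uncorrelated across observations, not correlated |
| IV | ✗ | Incorrect. The assumption is that the error term is normally distributed; the independent variable is not required to be normally distributed |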
Q5.
Which of the following statements is not an assumption of a simple linear regression model?
A. Homoskedasticity: The variance of the regression residuals is the same across the observations.
B. Dependence: The variables, Y and X, are dependent on one another.
C. Normality: The regression residuals are normally distributed.
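Answer and explanation
Answer: B
Explanation: The question asks which statement is not an assumption of simple linear regression.
| Option | Status | Explanation |
|---|---|---|
| A | Is an assumption | Homoskedasticity is one of the regression assumptions |
| B | Not an assumption | No regression assumption states that Y and X depend on one another. The model in fact assumes that X is non-random (or uncorrelated with the error term) |
| C | Is an assumption | Normality of the residuals is one of the regression assumptions |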
Q6.
The analysis of variance (ANOVA) of a simple linear regression is presented in the following table:
| Source | df | Sum of Squares | Mean Square |
|---|---|---|---|
| Regression | 1 | 0.068 | 0.068 |
| Residual | 25 | 0.025 | 0.001 |
| Total | 26 | 0.093 | |
The coefficient of determination is closest to:
A. 68.
B. 0.27.
C. 0.73.
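Answer and explanation
Answer: C
Explanation: The coefficient of determination is $R^2 = SSR / SST$.
Calculation:
$R^2 = \frac{0.068}{0.093} \approx 0.73$
| Option | Verdict | Explanation |
|---|---|---|
| A | ✗ | 68 is MSR / MSE = 0.068 / 0.001, which is the F-statistic |
| B | ✗ | 0.27 = SSE / SST = 1 − R² |
| C | ✓ | R² = 0.068 / 0.093 ≈ 0.73 |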
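As a quick check, the sketch below recomputes the coefficient of determination from the ANOVA table in the question and confirms that the explained and unexplained shares sum to one. The variable names are illustrative.

```python
# Sums of squares from the ANOVA table in the question.
ssr = 0.068   # regression (explained) sum of squares
sse = 0.025   # residual (unexplained) sum of squares
sst = 0.093   # total sum of squares

r_squared = ssr / sst          # share of variation explained
unexplained_share = sse / sst  # equals 1 - R^2

print(round(r_squared, 2), round(unexplained_share, 2))
```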
Q7.
Which of the following statements is least correct about the coefficient of determination?
A. The coefficient of determination measures the percentage of the variation in the dependent variable that is explained by the independent variable.
B. The minimum and maximum values of the coefficient of determination are 0 and 1, respectively.
C. The lower the coefficient of determination, the better the fit.
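Answer and explanation
Answer: C
Explanation: The question tests the properties of the coefficient of determination.
| Option | Verdict | Explanation |
|---|---|---|
| A | ✓ | Correct. R² measures the proportion of the variation in the dependent variable that is explained by the independent variable |
| B | ✓ | Correct. R² lies in the range [0, 1] |
| C | ✗ | Incorrect. A higher R² indicates a better fit, not a lower one |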
Q8.
An analyst runs a simple linear regression to explain the price movements of a manufacturing company by using monthly outputs over the past 60 months. The regression result shows that the regression sum of squares is 0.0562, and the sum of squared errors is 0.0229. The F-statistic is closest to:
A. 142.3406.
B. 144.7948.
C. 147.2489.
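Answer and explanation
Answer: A
Explanation: The F-statistic is $F = MSR / MSE$.
Calculation:
- n = 60, k = 1 (simple regression)
- MSR = SSR / df_regression = 0.0562 / 1 = 0.0562
- MSE = SSE / df_residual = 0.0229 / (60 − 2) = 0.0229 / 58 ≈ 0.000395
- F = MSR / MSE = 0.0562 / (0.0229 / 58) ≈ 142.34
| Option | Verdict | Explanation |
|---|---|---|
| A | ✓ | F = 0.0562 / (0.0229 / 58) ≈ 142.34 |
| B | ✗ | 144.79 results from using n − 1 = 59 residual degrees of freedom |
| C | ✗ | 147.25 results from using n = 60 residual degrees of freedom |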
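The steps above can be sketched as a small helper. The function name `f_statistic` and its parameter names are assumptions made for illustration; the inputs are the figures from the question.

```python
def f_statistic(ssr, sse, n, k=1):
    """F = MSR / MSE with k regressors and n observations."""
    msr = ssr / k              # regression mean square, df = k
    mse = sse / (n - k - 1)    # residual mean square, df = n - k - 1
    return msr / mse


f_correct = f_statistic(0.0562, 0.0229, n=60)

# Distractors: same ratio with the wrong residual degrees of freedom.
f_df_59 = 0.0562 / (0.0229 / 59)
f_df_60 = 0.0562 / (0.0229 / 60)

print(round(f_correct, 4), round(f_df_59, 4), round(f_df_60, 4))
```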
Q9.
An economist is analyzing the relationship between household incomes and the expenditure of households. The results of this estimation are based on 400 observations provided below.
| | Coefficient | Standard Error | t-Statistic | p-value |
|---|---|---|---|---|
| Intercept | 380.5269 | 212.3630 | 1.791870 | 0.1109 |
| Household incomes | 0.484532 | 0.032382 | 14.96298 | 0.0000 |
Which of the following should the economist conclude?
A. The average household income is 380.5269.
B. The estimated slope coefficient is different from the one at the 0.05 level of significance.
C. The household incomes explain 48.45% of the variation in household expenditures.
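Answer and explanation
Answer: B
Explanation: The question tests the interpretation of regression output.
| Option | Verdict | Explanation |
|---|---|---|
| A | ✗ | 380.5269 is the estimated intercept, not the average household income |
| B | ✓ | The slope's p-value is 0.0000 < 0.05, so the slope is significantly different from zero at the 5% level. If the option is read as a test against a hypothesized slope of one, the conclusion is the same: $t = (0.484532 - 1) / 0.032382 \approx -15.92$, far beyond the two-tailed critical value of roughly ±1.97 |
| C | ✗ | 0.484532 is the slope coefficient, not R², so it does not show that income explains 48.45% of the variation in expenditures |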
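The slope test can be sketched for either reading of option B. The helper name `slope_t_stat` is hypothetical, and the critical value of 1.97 is an approximation for a two-tailed 5% test with 398 degrees of freedom, stated here as an assumption rather than taken from the question.

```python
def slope_t_stat(b1_hat, se_b1, b1_null=0.0):
    """t-statistic for H0: slope = b1_null."""
    return (b1_hat - b1_null) / se_b1


B1_HAT = 0.484532     # estimated slope from the table
SE_B1 = 0.032382      # its standard error from the table
T_CRIT_APPROX = 1.97  # assumed approximate critical value, df = 400 - 2

t_vs_zero = slope_t_stat(B1_HAT, SE_B1)               # H0: slope = 0
t_vs_one = slope_t_stat(B1_HAT, SE_B1, b1_null=1.0)   # H0: slope = 1

for label, t in (("slope = 0", t_vs_zero), ("slope = 1", t_vs_one)):
    decision = "reject" if abs(t) > T_CRIT_APPROX else "fail to reject"
    print(f"H0: {label}: t = {t:.2f}, {decision} at the 5% level")
```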
Q10.
An analyst collects 200 observations and runs a simple linear regression to forecast the return of a beverage company, Shining Star, by using monthly CPI. The regression parameters are b₀ = 2.33%, b₁ = 0.45, where b₀ and b₁ indicate the estimated intercept and slope respectively. The R-square of the regression is 49%. If he wants to test whether the correlation between return and CPI is equal to zero, the test statistic should be:
A. -13.79.
B. 13.79.
C. 13.83.
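Answer and explanation
Answer: B
Explanation: In a simple linear regression, testing whether the correlation is zero is equivalent to testing whether the slope coefficient is zero.
Calculation:
- $R^2 = 49\%$, so $r = \sqrt{0.49} = 0.7$ (positive, because $b_1 > 0$)
- n = 200, so df = n − 2 = 198
- $t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{0.7\sqrt{198}}{\sqrt{0.51}} \approx 13.79$
| Option | Verdict | Explanation |
|---|---|---|
| A | ✗ | The slope is positive, so the correlation is positive and the t-statistic must be positive |
| B | ✓ | t = 0.7 × √198 / √0.51 ≈ 13.79 |
| C | ✗ | 13.83 results from using n − 1 = 199 in place of n − 2 = 198 |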
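The calculation can be sketched as follows, using the figures from the question. The function name `correlation_t_stat` is an illustrative assumption; the sign of the correlation is taken from the sign of the estimated slope.

```python
import math


def correlation_t_stat(r, n):
    """t-statistic for H0: correlation = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


r_squared = 0.49
slope = 0.45
n_obs = 200

# In simple regression |r| = sqrt(R^2); r carries the sign of the slope.
r = math.copysign(math.sqrt(r_squared), slope)
t_stat = correlation_t_stat(r, n_obs)

print(round(t_stat, 2))
```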
Q11.
For a 0.05 level of significance, the critical F-value for the test of whether the simple linear regression model is a good fit is 7.71. Based on the Exhibit below, is the F-test significant at the 0.05 significance level?
| Source | df | Sum of Squares | Mean Square |
|---|---|---|---|
| Regression | 1 | 123.9 | 123.9 |
| Residual | 4 | 26.2 | 6.55 |
| Total | 5 | 150.1 | |
A. Yes. With a calculated F-statistic of 18.92, we conclude that the slope of the model is different from zero.
B. No. With a calculated F-statistic of 18.92, we cannot conclude that the slope of the model is different from zero.
C. No. With a calculated F-statistic of 5.34, we cannot conclude that the slope of the model is different from zero.
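Answer and explanation
Answer: A
Explanation: The F-test assesses the overall significance of the regression model.
Calculation:
$F = \frac{MSR}{MSE} = \frac{123.9}{6.55} \approx 18.92$
F = 18.92 > 7.71 (the critical value), so H₀ is rejected and the regression model is significant overall.
| Option | Verdict | Explanation |
|---|---|---|
| A | ✓ | F = 18.92 > 7.71, so the model is significant and the slope is significantly different from zero |
| B | ✗ | The F-statistic exceeds the critical value, so H₀ should be rejected |
| C | ✗ | The F-statistic is computed incorrectly |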
Q12.
Teddy, a stock researcher, notices that the return on the stock of Ping An Bank is correlated with the bank’s EPS. He gets the following analysis of variance (ANOVA) table through linear regression analysis.
| Source | df | Sum of Squares (SS) | Mean Square (MS) |
|---|---|---|---|
| Regression | 1 | 0.0492 | 0.0492 |
| Error | 10 | 1.0528 | 0.1053 |
| Total | 11 | 1.1020 | |
Based on the table, the standard error of estimate is closest to:
A. 0.3245.
B. 0.4216.
C. 1.1378.
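Answer and explanation
Answer: A
Explanation: The standard error of estimate is $SEE = \sqrt{MSE}$.
Calculation:
$SEE = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{1.0528}{10}} = \sqrt{0.1053} \approx 0.3245$
| Option | Verdict | Explanation |
|---|---|---|
| A | ✓ | SEE = √0.1053 ≈ 0.3245 |
| B | ✗ | Does not follow from the table; even the common slip √(SSE / (n − 1)) gives about 0.3094 |
| C | ✗ | Does not follow from the table; √SST is about 1.0498 |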
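A short sketch of the SEE calculation follows. The sample size n = 12 is inferred from the total degrees of freedom (11) in the table, which is an assumption about how the table was built; the variable names are illustrative. Two common slips are computed for comparison.

```python
import math

sse = 1.0528   # error sum of squares from the ANOVA table
sst = 1.1020   # total sum of squares from the ANOVA table
n = 12         # assumed: total df of 11 implies n - 1 = 11

# SEE is the square root of the mean squared error, with n - 2 df.
see = math.sqrt(sse / (n - 2))

# Common slips, for comparison with the answer choices.
slip_wrong_df = math.sqrt(sse / (n - 1))
slip_root_sst = math.sqrt(sst)

print(round(see, 4), round(slip_wrong_df, 4), round(slip_root_sst, 4))
```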
Q13.
Which of the following statements is the least correct about the standard error of the estimate (SEE)?
A. The standard error of the estimate is the square root of the sum of squared errors (SSE).
B. The smaller the SEE, the better the fit of the model.
C. SEE is an absolute measure of fit.
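Answer and explanation
Answer: A
Explanation: The question tests the definition and properties of the standard error of the estimate.
| Option | Verdict | Explanation |
|---|---|---|
| A | ✗ | Incorrect. SEE = √(SSE / (n − 2)) = √MSE, not √SSE; SSE must first be divided by the degrees of freedom |
| B | ✓ | Correct. A smaller SEE indicates a better fit |
| C | ✓ | Correct. SEE is an absolute measure of fit, in contrast to R², which is a relative measure |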
Q14.
An analyst analyzed the impact of China’s household income on consumer spending. He collected the data of per capita disposable income (PCDI) and per capita consumption expenditure (PCCE) of 100 Chinese cities in 2020 to run a linear regression with PCCE as the dependent variable and PCDI as the independent variable, and obtained the following results:
| | Coefficients | Standard Error | t-Statistic | p-Value |
|---|---|---|---|---|
| Intercept | 1,353.2316 | 680.3423 | 1.9890 | 0.0001 |
| PCDI (CNY) | 0.7856 | 0.0264 | 29.7576 | 0.0000 |
Based on the regression results, if the per capita disposable income for a Chinese city is 43,834 RMB, the predicted per capita consumption expenditure is closest to:
A. 35,789.2220.
B. 32,234.9873.
C. 48,187.2316.
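Answer and explanation
Answer: A
Explanation: The question uses the estimated regression equation to make a prediction.
Calculation:
$\widehat{PCCE} = 1{,}353.2316 + 0.7856 \times 43{,}834 = 35{,}789.2220$
This matches option A.
| Option | Verdict | Explanation |
|---|---|---|
| A | ✓ | 1,353.2316 + 0.7856 × 43,834 = 35,789.2220 |
| B | ✗ | Computed incorrectly |
| C | ✗ | Possibly the result of using the wrong coefficient |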
Q15.
Gloria, a quantitative analyst, notices that the gross profit margin of a real estate developer is correlated with the GDP growth rate. Based on 20 observations, she conducts a simple linear regression using the gross profit margin as the dependent variable, and the GDP growth rate as the independent variable. The regression results are presented in the table below:
| | Coefficient | Standard Error |
|---|---|---|
| Intercept | 0.0212 | 0.556 |
| GDP Growth Rate | 0.253 | 0.108 |
Notes:
- The absolute value of the critical value for the t-statistic with 18 degrees of freedom is 2.101 at the 5% level of significance.
- The standard error of the forecast ($s_f$) is 0.07324.
If the forecasted value of the GDP growth rate is 2%, the 95% prediction interval for the actual gross profit margin is closest to:
A. -0.1341 to 0.1490
B. -0.1276 to 0.1801
C. -0.0922 to 0.1573
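Answer and explanation
Answer: B
Explanation: The question tests the calculation of a prediction interval.
Calculation:
- Predicted value: $\hat{Y}_f = 0.0212 + 0.253 \times 0.02 = 0.02626$
- Prediction interval: $\hat{Y}_f \pm t_c \times s_f$
- $t_c \times s_f = 2.101 \times 0.07324 \approx 0.15388$
- Lower bound = 0.02626 − 0.15388 ≈ −0.1276
- Upper bound = 0.02626 + 0.15388 ≈ 0.1801
| Option | Verdict | Explanation |
|---|---|---|
| A | ✗ | The predicted value or the critical value is computed incorrectly |
| B | ✓ | [−0.1276, 0.1801] |
| C | ✗ | Computed incorrectly |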
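The interval can be reproduced with a minimal sketch. The function name `prediction_interval` and its parameter names are assumptions for illustration; the standard error of the forecast is taken as given in the question rather than derived.

```python
def prediction_interval(b0, b1, x_forecast, t_crit, s_f):
    """Return (lower, upper) bounds: y_hat -/+ t_crit * s_f."""
    y_hat = b0 + b1 * x_forecast   # point forecast
    half_width = t_crit * s_f      # margin around the forecast
    return y_hat - half_width, y_hat + half_width


lower, upper = prediction_interval(
    b0=0.0212,        # estimated intercept
    b1=0.253,         # estimated slope
    x_forecast=0.02,  # forecasted GDP growth rate (2%)
    t_crit=2.101,     # two-tailed 5% critical value, 18 df
    s_f=0.07324,      # standard error of the forecast
)

print(round(lower, 4), round(upper, 4))
```

Because the interval is symmetric around the point forecast, its midpoint should equal 0.02626.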
Q16.
When illustrating the relative change in the dependent variable for a relative change in the independent variable, which of the following functional forms is the most appropriate?
A. The log-lin model
B. The lin-log model
C. The log-log model
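Answer and explanation
Answer: C
Explanation: The question tests the interpretation of regression models with different functional forms.
| Option | Verdict | Explanation |
|---|---|---|
| A | ✗ | Log-lin model: ln(Y) = b₀ + b₁X. An absolute change in X produces a relative (percentage) change in Y |
| B | ✗ | Lin-log model: Y = b₀ + b₁ln(X). A relative change in X produces an absolute change in Y |
| C | ✓ | Log-log model: ln(Y) = b₀ + b₁ln(X). A relative change in X produces a relative change in Y, and b₁ is the elasticity |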
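The three interpretations can be illustrated numerically. The coefficients and the starting value of X below are hypothetical, picked only to make the pattern visible, and the function names are assumptions. For small changes, the log-log slope behaves as an elasticity, the lin-log slope times the relative change in X gives an absolute change in Y, and the log-lin slope times the absolute change in X gives a relative change in Y.

```python
import math

# Hypothetical coefficients, for illustration only.
b0, b1 = 0.5, 1.2


def log_lin(x):   # ln(Y) = b0 + b1 * X
    return math.exp(b0 + b1 * x)


def lin_log(x):   # Y = b0 + b1 * ln(X)
    return b0 + b1 * math.log(x)


def log_log(x):   # ln(Y) = b0 + b1 * ln(X)
    return math.exp(b0 + b1 * math.log(x))


def pct_change(new, old):
    return (new / old - 1) * 100


x = 10.0

# Log-log: a 1% rise in X changes Y by about b1 percent, at any level of X.
loglog_pct_at_10 = pct_change(log_log(x * 1.01), log_log(x))
loglog_pct_at_50 = pct_change(log_log(50.0 * 1.01), log_log(50.0))

# Lin-log: a 1% rise in X changes Y by about b1 * 0.01 units.
linlog_abs = lin_log(x * 1.01) - lin_log(x)

# Log-lin: a 0.01-unit rise in X changes Y by about 100 * b1 * 0.01 percent.
loglin_pct = pct_change(log_lin(x + 0.01), log_lin(x))

print(round(loglog_pct_at_10, 3), round(linlog_abs, 4), round(loglin_pct, 3))
```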
Q17.
Rui Wen is studying the relationship between the earnings per share (EPS) of companies and their capital expenditure (CAPEX). He collects a sample of 100 listed companies and runs a model as follows:
$EPS_i = b_0 + b_1 \ln(CAPEX_i) + \varepsilon_i$
It is known that the slope coefficient is significantly different from zero at a 0.05 significance level. Which of the following statements is most likely correct?
A. The variation of the natural log of capital expenditure explains the variation of EPS.
B. The variation of EPS explains the variation of the natural log of capital expenditure.
C. The variation of capital expenditure explains the variation of EPS.
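Answer and explanation
Answer: A
Explanation: The question tests how the variables relate in a lin-log model.
| Option | Verdict | Explanation |
|---|---|---|
| A | ✓ | Correct. The independent variable is ln(CAPEX) and the dependent variable is EPS, so the variation of ln(CAPEX) explains the variation of EPS |
| B | ✗ | The direction is reversed: EPS is the dependent variable, not the independent variable |
| C | ✗ | The model uses ln(CAPEX), not CAPEX itself |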
Q18.
Xin, an equity analyst, wants to figure out the relationship between the annual sales of ABC company and the annual GDP. He runs two models as follows:
| Model 1 | Model 2 |
|---|---|
Which of the following least likely provides evidence to support the conclusion that Model 2 fits the data better than Model 1?
A. The coefficient of determination ($R^2$) of model 2 is higher than that of model 1.
B. The slope coefficient of model 2 is higher than that of model 1.
C. The F-statistic (for testing the overall model significance) of model 2 is higher than that of model 1.
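Answer and explanation
Answer: B
Explanation: The question tests how to compare the fit of regression models with different functional forms.
| Option | Supports the conclusion? | Explanation |
|---|---|---|
| A | Yes | A higher R² means greater explanatory power. Note, however, that R² values are not directly comparable when the dependent variables differ (Sales versus ln(Sales)) |
| B | No | The size of the slope coefficient is unrelated to goodness of fit; a larger slope does not mean a better fit |
| C | Yes | A higher F-statistic indicates stronger overall model significance |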