Data Science Questions and Answers Part-27

1. Power is the probability of rejecting the null hypothesis when it is true.
a) True
b) False

Answer: b
Explanation: Power is the probability of rejecting the null hypothesis when it is false.
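
The definition above can be checked numerically. The following is a minimal sketch, assuming a one-sided z-test with known standard deviation; the function name `z_test_power` and the numbers used are illustrative and not part of the question.

```python
from statistics import NormalDist


def z_test_power(mu0, mu_alt, sigma, n, alpha=0.05):
    """Probability that a one-sided z-test of H0: mu = mu0 rejects when the true mean is mu_alt."""
    se = sigma / n ** 0.5
    # Reject H0 when the sample mean exceeds this critical value.
    critical = mu0 + NormalDist().inv_cdf(1 - alpha) * se
    # The sample mean is Normal(mu_alt, se) when the true mean is mu_alt.
    return 1 - NormalDist(mu_alt, se).cdf(critical)


# H0 is false (true mean is 32, not 30): this rejection probability is the power.
power = z_test_power(mu0=30, mu_alt=32, sigma=4, n=16)

# H0 is true (true mean is 30): this rejection probability is the Type I error rate.
type_i_rate = z_test_power(mu0=30, mu_alt=30, sigma=4, n=16)
```

Under these assumptions, rejecting when the null is true happens with probability alpha, which is the Type I error rate rather than the power.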

2. Point out the correct statement.
a) The mean is a measure of central tendency of the data
b) Empirical mean is related to “centering” the random variables
c) The empirical standard deviation is a measure of spread
d) All of the mentioned

Answer: d
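Explanation: All three statements are true. The mean measures central tendency, subtracting the empirical mean centers the data, and the empirical standard deviation measures spread. Centering and then scaling the data is called “normalizing” the data.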

3. Which of the following implies no relationship with respect to correlation?
a) Cor(X, Y) = 1
b) Cor(X, Y) = 0
c) Cor(X, Y) = 2
d) All of the mentioned

Answer: b
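Explanation: Correlation shows whether and how strongly pairs of variables are linearly related. Cor(X, Y) = 0 implies no linear relationship, Cor(X, Y) = 1 is a perfect positive linear relationship, and Cor(X, Y) = 2 is impossible because correlation always lies between -1 and 1.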
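
As a rough illustration, the sketch below computes the Pearson correlation by hand for two small made-up data sets. The helper name `correlation` and the data are assumptions for this example. Note that a correlation of 0 only rules out a linear relationship: here Y is fully determined by X and the correlation is still 0.

```python
from math import sqrt


def correlation(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


xs = [-2.0, -1.0, 0.0, 1.0, 2.0]

# Perfect positive linear relationship: correlation is 1.
r_linear = correlation(xs, [3 * x + 1 for x in xs])

# Symmetric quadratic relationship: correlation is 0 even though Y depends on X.
r_quadratic = correlation(xs, [x ** 2 for x in xs])
```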

4. Normalized data are centered at ___ and have units equal to standard deviations of the original data.
a) 0
b) 5
c) 1
d) 10

Answer: a
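Explanation: In statistics and applications of statistics, normalization can have a range of meanings. Here it means subtracting the empirical mean and dividing by the empirical standard deviation, so the normalized data are centered at 0 and have a standard deviation of 1.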
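
A minimal sketch of this kind of normalization, assuming the sample standard deviation is used for scaling; the function name `normalize` and the data are made up for illustration.

```python
from statistics import mean, stdev


def normalize(values):
    """Center values at 0 and scale them to units of standard deviations."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 in the denominator)
    return [(v - m) / s for v in values]


data = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]
z = normalize(data)

# After normalizing, the mean is 0 and the standard deviation is 1
# (up to floating-point rounding).
z_mean = mean(z)
z_sd = stdev(z)
```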

5. Point out the wrong statement.
a) Regression through the origin yields an equivalent slope if you center the data first
b) Normalizing variables results in the slope being the correlation
c) Least squares is not an estimation tool
d) None of the mentioned

Answer: c
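Explanation: Least squares is an estimation tool: it estimates the regression coefficients by minimizing the sum of squared residuals. Statements a and b are both true.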
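
Statements a and b can be checked on a small example. This is a sketch under simple assumptions (one predictor, made-up data); the helper names `ols_slope` and `origin_slope` are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def ols_slope(xs, ys):
    """Least-squares slope of y on x when an intercept is included."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def origin_slope(xs, ys):
    """Least-squares slope of y on x when the line is forced through the origin."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
mx, my = mean(xs), mean(ys)

# (a) Regression through the origin on centered data gives the usual slope.
slope = ols_slope(xs, ys)
slope_centered = origin_slope([x - mx for x in xs], [y - my for y in ys])

# (b) After normalizing both variables, the slope equals the correlation.
zx = [(x - mx) / stdev(xs) for x in xs]
zy = [(y - my) / stdev(ys) for y in ys]
slope_normalized = ols_slope(zx, zy)
r = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sqrt(
    sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)
)
```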

6. Which of the following is correct with respect to residuals?
a) Positive residuals are above the line, negative residuals are below
b) Positive residuals are below the line, negative residuals are above
c) Positive residuals and negative residuals are below the line
d) All of the mentioned

Answer: a
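Explanation: A residual is the observed outcome minus the value predicted by the fitted line, so points above the line have positive residuals and points below it have negative residuals. Residuals can be thought of as the outcome with the linear association of the predictor removed.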
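
The sign convention can be seen in a small worked example. This is a sketch on made-up data, fitting a simple least-squares line by hand; the variable names are illustrative.

```python
from statistics import mean

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

# Fit y = intercept + slope * x by least squares.
mx, my = mean(xs), mean(ys)
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
intercept = my - slope * mx

# Residual = observed outcome minus fitted value.
fitted = [intercept + slope * x for x in xs]
residuals = [y - f for y, f in zip(ys, fitted)]

# A positive residual means the observed point lies above the fitted line.
above_line = [y > f for y, f in zip(ys, fitted)]
```

With an intercept in the model, the residuals also sum to zero, up to floating-point rounding.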

7. Minimizing the likelihood is the same as maximizing -2 log likelihood.
a) True
b) False

Answer: a
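Explanation: The logarithm is an increasing function, so minimizing the likelihood is the same as minimizing the log likelihood, and multiplying by -2 flips the direction: it is the same as maximizing -2 log likelihood. Equivalently, maximizing the likelihood is the same as minimizing -2 log likelihood.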
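
The equivalence can be illustrated with a grid search. This is a minimal sketch, assuming a Bernoulli model with 7 successes in 10 trials; the data and the function names are made up for this example.

```python
from math import log

successes, trials = 7, 10


def likelihood(p):
    """Bernoulli likelihood of the observed data at success probability p."""
    return p ** successes * (1 - p) ** (trials - successes)


def minus_two_log_likelihood(p):
    return -2 * log(likelihood(p))


# Candidate values 0.01, 0.02, ..., 0.99 (endpoints excluded so the log is defined).
grid = [i / 100 for i in range(1, 100)]

# The value that maximizes the likelihood also minimizes -2 log likelihood.
best_by_likelihood = max(grid, key=likelihood)
best_by_minus_two_log = min(grid, key=minus_two_log_likelihood)

# Likewise, the value that minimizes the likelihood maximizes -2 log likelihood.
worst_by_likelihood = min(grid, key=likelihood)
worst_by_minus_two_log = max(grid, key=minus_two_log_likelihood)
```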

8. Which of the following refers to the circumstance in which the variability of a variable is unequal across the range of values of a second variable that predicts it?
a) Heterogeneity
b) Heteroskedasticity
c) Heteroelasticity
d) None of the mentioned

Answer: b
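Explanation: Heteroskedasticity means the variance of the errors is not constant across values of the predictor. It has serious consequences for the OLS estimator: the coefficient estimates remain unbiased, but the usual standard errors are no longer valid and OLS is no longer the most efficient linear unbiased estimator.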
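
One informal way to see heteroskedasticity is to compare the spread of the residuals at low and high values of the predictor. The sketch below simulates data whose noise grows with x; the simulation settings (seed, slope, noise scale) are assumptions chosen only for illustration, and the comparison is not a formal test.

```python
import random
from statistics import mean, pstdev

random.seed(0)

# Simulated data: the noise standard deviation is proportional to x.
xs = [float(i) for i in range(1, 201)]
ys = [2.0 * x + random.gauss(0.0, 0.5 * x) for x in xs]

# Fit a simple least-squares line.
mx, my = mean(xs), mean(ys)
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
intercept = my - slope * mx
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Compare residual spread for the lower and upper halves of x.
half = len(xs) // 2
spread_low = pstdev(residuals[:half])
spread_high = pstdev(residuals[half:])
```

With noise that scales with x, `spread_high` comes out larger than `spread_low`, which is the unequal variability the question describes.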

9. Residuals are useful for investigating best model fit.
a) True
b) False

Answer: b
Explanation: Residuals are useful for investigating poor model fit.

10. Which of the following is the correct formula for total variation?
a) Total Variation = Residual Variation – Regression Variation
b) Total Variation = Residual Variation + Regression Variation
c) Total Variation = Residual Variation * Regression Variation
d) All of the mentioned

Answer: b
Explanation: Total variation splits into the variation explained by the regression and the complementary part, which is called the unexplained or residual variation.
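
The decomposition can be verified on a small example. This is a sketch on made-up data for a simple least-squares line with an intercept; the variable names are illustrative.

```python
from statistics import mean

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

# Fit y = intercept + slope * x by least squares.
mx, my = mean(xs), mean(ys)
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
intercept = my - slope * mx
fitted = [intercept + slope * x for x in xs]

# Sums of squares around the mean, explained by the line, and left over.
total_variation = sum((y - my) ** 2 for y in ys)
regression_variation = sum((f - my) ** 2 for f in fitted)
residual_variation = sum((y - f) ** 2 for y, f in zip(ys, fitted))

# Share of the total variation explained by the regression (R squared).
r_squared = regression_variation / total_variation
```

For this data the total variation is 6, of which 3.6 is regression variation and 2.4 is residual variation.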