Data Science Questions and Answers Part-30

1. Point out the correct statement.
a) Asymptotics are usually used for inference
b) Caret includes several functions to pre-process the predictor data
c) The function dummyVars can be used to generate a complete set of dummy variables from one or more factors
d) All of the mentioned

Answer: d
Explanation: The function dummyVars takes a formula and a data set and outputs an object that can be used to create the dummy variables using the predict method.
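The idea of a complete set of dummy variables can be sketched outside R. The Python below is only an illustration of the encoding, not caret's code: the helper name dummy_columns, the "factor.level" column naming, and the sample data are all assumptions made for this example.

```python
def dummy_columns(rows, factor):
    """Return one 0/1 indicator per level of `factor` for every row.

    Hypothetical helper: no level is dropped, so each row gets the
    complete set of indicators rather than a reduced (full-rank) set.
    """
    levels = sorted({row[factor] for row in rows})
    return [
        {f"{factor}.{level}": int(row[factor] == level) for level in levels}
        for row in rows
    ]


# Made-up data with a single factor that has two levels.
rows = [{"color": "red"}, {"color": "blue"}, {"color": "red"}]
encoded = dummy_columns(rows, "color")
```

Each row ends up with exactly one indicator set to 1, the one matching its level.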

2. Which of the following can be used to create sub-samples using a maximum dissimilarity approach?
a) minDissim
b) maxDissim
c) inmaxDissim
d) all of the mentioned

Answer: b
Explanation: maxDissim splits the data based on the predictors, creating sub-samples with a maximum dissimilarity approach.

3. caret does not use the proxy package.
a) True
b) False

Answer: b
Explanation: caret uses the proxy package to calculate dissimilarity between samples.

4. Which of the following functions can be used to create balanced splits of the data?
a) newDataPartition
b) createDataPartition
c) renameDataPartition
d) none of the mentioned

Answer: b
Explanation: If the y argument to this function is a factor, the random sampling occurs within each class and should preserve the overall class distribution of the data.
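Sampling within each class can be sketched as follows. This is a minimal Python illustration of stratified splitting, not caret's implementation: the function name stratified_split, the rounding rule, and the sample labels are assumptions.

```python
import random
from collections import defaultdict


def stratified_split(labels, p=0.8, seed=0):
    """Return sorted training indices, sampling a fraction p within each class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train = []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(p * len(indices))  # assumed rounding rule
        train.extend(indices[:n_train])
    return sorted(train)


# Made-up labels with an 80/20 class imbalance.
labels = ["a"] * 80 + ["b"] * 20
train_idx = stratified_split(labels, p=0.75)
```

Because the sampling happens per class, the training indices keep the same 80/20 class ratio as the full data.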

5. Which of the following tools are present in the caret package?
a) pre-processing
b) feature selection
c) model tuning
d) all of the mentioned

Answer: d
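Explanation: There are many different modeling functions in R; caret provides a common set of tools around them, including data splitting, pre-processing, feature selection, and model tuning using resampling.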

6. Which of the following functions is a wrapper for different lattice plots to visualize the data?
a) levelplot
b) featurePlot
c) plotsample
d) none of the mentioned

Answer: b
Explanation: featurePlot is used for data visualization in caret.

7. Which of the following functions can be used to identify near zero-variance variables?
a) zeroVar
b) nearVar
c) nearZeroVar
d) all of the mentioned

Answer: c
Explanation: nearZeroVar identifies near zero-variance predictors. Its saveMetrics argument defaults to FALSE and can be set to TRUE to show the details.
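The kind of check involved can be sketched in Python. This is an illustration, not caret's source: it assumes two metrics, a frequency ratio (most common value count over second most common) and a percentage of unique values, with cutoffs of 95/5 and 10 chosen to mirror caret's documented defaults. The function name and sample data are made up.

```python
from collections import Counter


def near_zero_var_metrics(values, freq_cut=95 / 5, unique_cut=10):
    """Compute assumed near zero-variance metrics for one predictor."""
    counts = Counter(values).most_common()
    zero_var = len(counts) == 1
    # Ratio of the most common value's count to the runner-up's count.
    freq_ratio = 0.0 if zero_var else counts[0][1] / counts[1][1]
    percent_unique = 100 * len(counts) / len(values)
    nzv = zero_var or (freq_ratio > freq_cut and percent_unique <= unique_cut)
    return {
        "freq_ratio": freq_ratio,
        "percent_unique": percent_unique,
        "zero_var": zero_var,
        "nzv": nzv,
    }


# Made-up predictors: one dominated by a single value, one fully varied.
skewed = near_zero_var_metrics([0] * 98 + [1] * 2)
varied = near_zero_var_metrics(list(range(100)))
```

The skewed predictor is flagged because one value dominates and there are few unique values; the varied predictor is not.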

8. Which of the following functions can be used to flag highly correlated predictors for removal?
a) searchCorrelation
b) findCausation
c) findCorrelation
d) none of the mentioned

Answer: c
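Explanation: Some models thrive on correlated predictors, but others may benefit from reducing the correlation between them. findCorrelation takes a correlation matrix and flags predictors for removal based on a pairwise correlation cutoff.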
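A simplified version of this kind of filter can be sketched in Python. It is not caret's algorithm verbatim: it assumes that, for each pair whose absolute correlation exceeds the cutoff, the predictor with the larger mean absolute correlation against the others is flagged. The function names and data are hypothetical, and at least two columns are assumed.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def flag_correlated(columns, cutoff=0.9):
    """Return sorted names of predictors flagged for removal."""
    names = list(columns)
    corr = {
        a: {b: abs(pearson(columns[a], columns[b])) for b in names if b != a}
        for a in names
    }
    mean_abs = {a: sum(corr[a].values()) / len(corr[a]) for a in names}

    flagged = set()
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if a in flagged or b in flagged:
                continue
            if corr[a][b] > cutoff:
                # Flag whichever is more correlated with everything else.
                flagged.add(a if mean_abs[a] >= mean_abs[b] else b)
    return sorted(flagged)


# Made-up predictors: x1 and x2 are nearly collinear, x3 is unrelated.
columns = {
    "x1": [1, 2, 3, 4, 5, 6],
    "x2": [1.1, 2.0, 2.9, 4.2, 5.1, 5.8],
    "x3": [5, 1, 4, 2, 6, 3],
}
flagged = flag_correlated(columns, cutoff=0.9)
```

Only one member of the correlated pair is flagged, so the information it carries stays in the data through the other predictor.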

9. Point out the correct statement.
a) findLinearColumns will also return a vector of column positions that can be removed to eliminate the linear dependencies
b) findLinearCombos will return a list that enumerates dependencies
c) the function findLinearRows can be used to generate a complete set of row variables from one factor
d) none of the mentioned

Answer: b
Explanation: For each linear combination, findLinearCombos will incrementally remove columns from the matrix and test to see if the dependencies have been resolved.

10. Which of the following can be used to impute data sets based only on information in the training set?
a) postProcess
b) preProcess
c) process
d) all of the mentioned

Answer: b
Explanation: preProcess can impute missing values using only information in the training set, for example with K-nearest neighbors.