Data Science Questions and Answers Part-5

1. The 3 V’s are not sufficient to describe big data.
a) True
b) False

Answer: a
Explanation: IBM data scientists break big data into four dimensions: volume, variety, velocity and veracity.

2. Which of the following focuses on the discovery of (previously) unknown properties in the data?
a) Data mining
b) Big Data
c) Data wrangling
d) Machine Learning

Answer: a
Explanation: Data mining focuses on discovering previously unknown properties in data. Data munging or data wrangling, by contrast, is loosely the process of converting or mapping data from one “raw” form into another format that allows for more convenient consumption, either manually or with the help of semi-automated tools.

3. Beyond volume, variety and velocity, veracity is also an issue of big data.
a) True
b) False

Answer: a
Explanation: Data veracity refers to the uncertainty or imprecision of data.

4. Point out the correct statement.
a) If equations are known but the parameters are not, they may be inferred with data analysis
b) If equations are not known but the parameters are, they may be inferred with data analysis
c) If neither the equations nor the parameters are known, they may be inferred with data analysis
d) none of the mentioned

Answer: a
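Explanation: When the form of the equations is known, the unknown parameters can be estimated from the data. The random component of the data is usually measurement error.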
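
As an illustration of inferring unknown parameters when the equation is known, here is a minimal sketch that assumes the known form is a straight line y = a*x + b and that the random component is Gaussian measurement error. The true values, the noise level, and the function name fit_line are hypothetical choices made for this example only.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares estimates (slope, intercept) for y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical "true" parameters, unknown to the analyst in practice.
TRUE_SLOPE, TRUE_INTERCEPT = 2.0, 1.0

rng = random.Random(0)  # fixed seed so the run is repeatable
xs = [i / 10 for i in range(200)]
# Known equation plus a random component standing in for measurement error.
ys = [TRUE_SLOPE * x + TRUE_INTERCEPT + rng.gauss(0, 0.5) for x in xs]

slope, intercept = fit_line(xs, ys)
print(f"estimated slope={slope:.3f}, intercept={intercept:.3f}")
```

The estimates should land close to the hypothetical true values, with the gap shrinking as more data points are added or the measurement error gets smaller.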

5. Which of the following is the most important thing in data science?
a) answer
b) question
c) data
d) none of the mentioned

Answer: b
Explanation: The question comes first. The second most important thing is the data.

6. Which of the following approaches should be used if you can’t fix the variable?
a) randomize it
b) non stratify it
c) generalize it
d) none of the mentioned

Answer: a
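Explanation: If you can’t fix the variable, stratify it. If you can’t stratify it, randomize it. Stratification is not among the options, so randomization is the answer.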
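
The difference between randomizing and stratifying can be sketched as follows. This is an illustrative example only: the unit list, the "old"/"new" device labels, and the function names randomize and stratified_randomize are hypothetical and not taken from any particular library.

```python
import random
from collections import defaultdict


def randomize(units, rng):
    """Randomly split units into two groups of (nearly) equal size."""
    shuffled = list(units)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def stratified_randomize(units, stratum_of, rng):
    """Randomize separately inside each stratum so both groups stay balanced."""
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    treatment, control = [], []
    for members in strata.values():
        t, c = randomize(members, rng)
        treatment.extend(t)
        control.extend(c)
    return treatment, control


rng = random.Random(42)  # fixed seed so the assignment is repeatable
# Hypothetical units: (id, device type), where device type cannot be fixed.
units = [(i, "old" if i % 2 else "new") for i in range(20)]

treatment, control = stratified_randomize(units, lambda u: u[1], rng)
print("treatment:", sorted(treatment))
print("control:  ", sorted(control))
```

Plain randomization balances the uncontrolled variable only on average, while the stratified version guarantees that each group receives the same number of units from every stratum.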

7. Which of the following is a good way of performing experiments in data science?
a) Measure variability
b) Generalize to the problem
c) Have Replication
d) All of the mentioned

Answer: d
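Explanation: Measuring variability, generalizing to the problem of interest and replicating are all part of good experimental practice. Experiments on causal relationships investigate the effect of one or more variables on one or more outcome variables.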

8. Which of the following is commonly referred to as ‘data fishing’?
a) Data bagging
b) Data booting
c) Data merging
d) none of the mentioned

Answer: d
Explanation: Data dredging is sometimes referred to as “data fishing”.

9. Which of the following data mining techniques is used to uncover patterns in data?
a) Data bagging
b) Data booting
c) Data merging
d) Data Dredging

Answer: d
Explanation: Data dredging, also called data snooping, refers to the practice of misusing data mining techniques to uncover patterns in data that are then presented as misleading scientific ‘research’.
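
Why dredging produces misleading patterns can be shown with a small simulation. This is a sketch under stated assumptions: the outcome and every predictor are pure noise, the sample size and predictor count are arbitrary, and the cutoff 0.444 approximates the two-sided 5% critical value of Pearson’s r for 20 observations.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


N_SAMPLES = 20      # arbitrary small sample size
N_PREDICTORS = 200  # arbitrary number of candidate predictors to "fish" through
R_CRITICAL = 0.444  # approximate |r| cutoff for p < 0.05 at n = 20 (assumption)

rng = random.Random(7)  # fixed seed so the run is repeatable
outcome = [rng.gauss(0, 1) for _ in range(N_SAMPLES)]

hits = []
for index in range(N_PREDICTORS):
    # Each predictor is independent noise, so any "pattern" is spurious.
    predictor = [rng.gauss(0, 1) for _ in range(N_SAMPLES)]
    r = pearson(predictor, outcome)
    if abs(r) > R_CRITICAL:
        hits.append((index, r))

print(f"{len(hits)} of {N_PREDICTORS} noise predictors look 'significant'")
```

With a 5% threshold, roughly one in twenty unrelated predictors is expected to pass by chance, so reporting only the hits without accounting for the number of comparisons gives a misleading picture.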

10. If X predicts Y, it does mean X causes Y.
a) True
b) False

Answer: b
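Explanation: If X predicts Y, it does not mean X causes Y. Prediction only requires association, which can also arise when a third variable influences both X and Y.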
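
A simulated confounder makes the distinction concrete. In this minimal sketch, a hypothetical hidden variable Z drives both X and Y, and Y never depends on X. The variable names, noise levels, and sample size are assumptions made for the example.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(1)  # fixed seed so the run is repeatable
n = 2000

# Hidden common cause Z influences both X and Y.
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 0.5) for zi in z]
y = [zi + rng.gauss(0, 0.5) for zi in z]  # Y depends on Z only, never on X
observed = pearson(x, y)

# "Intervention": set X ourselves, independently of Z, and regenerate Y.
x_set = [rng.gauss(0, 1) for _ in range(n)]
y_after = [zi + rng.gauss(0, 0.5) for zi in z]
interventional = pearson(x_set, y_after)

print(f"observed correlation:        {observed:.2f}")
print(f"correlation after setting X: {interventional:.2f}")
```

In the observational data X predicts Y well, yet once X is set independently of Z the association disappears, which is what one would expect if X has no causal effect on Y.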