Data Science Questions and Answers Part-13

1. Which of the following is a trait of tidy data?
a) each variable in one column
b) each observation in different row
c) one table for each kind of variable
d) all of the mentioned

Answer: d
Explanation: In tidy data, each variable forms a column, each observation forms a row, and each kind of variable (observational unit) gets its own table.

2. Which of the following packages is used for tidy data?
a) tidyr
b) souryr
c) NumPy
d) all of the mentioned

Answer: a
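Explanation: tidyr is an R package for tidying data. Its gather function collapses several columns into key-value pairs, and spread does the reverse; newer tidyr versions offer pivot_longer and pivot_wider for the same jobs.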
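
The effect of gather and spread can be sketched on a small made-up table. This assumes the tidyr package is installed; the table and its country, year, and cases names are hypothetical and chosen only for illustration.

```r
library(tidyr)

# Hypothetical messy table: the headers "2019" and "2020" are really
# values of a year variable, not variable names.
wide <- data.frame(
  country = c("A", "B"),
  `2019` = c(10, 20),
  `2020` = c(11, 21),
  check.names = FALSE
)

# gather() collapses the year columns into key-value pairs,
# giving one row per country-year observation.
long <- gather(wide, key = "year", value = "cases", -country)

# spread() reverses the operation and restores the wide layout.
wide_again <- spread(long, key = year, value = cases)

print(long)
print(wide_again)
```

The long table has one column per variable (country, year, cases) and one row per observation, which is the tidy layout described in question 1.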

3. Point out the wrong statement.
a) Tidy datasets are all alike but every messy dataset is messy in its own way
b) Most statistical datasets are data frames made up of rows and columns
c) Tidy datasets provide a standardized way to link the structure of a dataset with its semantics
d) None of the mentioned

Answer: d
Explanation: The tidy data standard has been designed to simplify the development of data analysis tools that work well together.

4. Which of the following processes involves structuring datasets to facilitate analysis?
a) Data tidying
b) Data mining
c) Data booting
d) All of the mentioned

Answer: a
Explanation: The principles of tidy data provide a standard way to organize data values within a dataset.

5. A strange binary file generated by a machine is an example of tidy data.
a) True
b) False

Answer: b
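Explanation: A machine-generated binary file is raw data and has to be processed before it becomes tidy. Data sets stored in spreadsheets, such as Microsoft Excel workbooks, are likewise binary rather than raw ASCII data files.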

6. Which of the following is a common problem with messy data?
a) Column headers are values
b) Variables are stored in both rows and columns
c) A single observational unit is stored in multiple tables
d) All of the mentioned

Answer: d
Explanation: Real datasets can, and often do, violate the three precepts of tidy data in almost every way imaginable.

7. tidyr is a reframing of _______ designed to accompany the tidy data framework.
a) reshape5
b) dplyr
c) reshape2
d) all of the mentioned

Answer: c
Explanation: tidyr does less than reshape2: it is designed specifically for tidying data, not for general reshaping and aggregation.

8. Raw data in the real world is tidy and properly formatted.
a) True
b) False

Answer: b
Explanation: Raw data in the real world is usually messy and poorly formatted, so it has to be cleaned and tidied before analysis.

9. Which of the following functions is used for loading flat files?
a) read.data
b) read.sheet
c) read.table
d) none of the mentioned

Answer: c
Explanation: read.table reads a flat file into a data frame, which is held in RAM.
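
A minimal sketch of loading a flat file with read.table, using base R only. The file contents and the id and score column names are made up for illustration, and the file is written to a temporary path so the example is self-contained.

```r
# Write a small, made-up flat file to a temporary location.
path <- tempfile(fileext = ".csv")
writeLines(c("id,score", "1,3.5", "2,4.0"), path)

# read.table defaults to whitespace-separated fields and no header row,
# so sep and header are set explicitly for a comma-separated file.
scores <- read.table(path, sep = ",", header = TRUE)

# The result is an ordinary data frame held in memory.
str(scores)
```

read.csv is a wrapper around read.table that sets sep = "," and header = TRUE by default, so it would read the same file without the extra arguments.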

10. Point out the correct statement.
a) XLConnect package has more options for manipulating access files
b) XLConnect vignette package can also be used for manipulating excel files
c) write.xlsx writes out an Excel file with similar arguments
d) None of the mentioned

Answer: c
Explanation: write.xlsx, from the xlsx package, writes a data frame out to an Excel file and takes arguments similar to those of read.xlsx.