Hadoop Questions and Answers Part-25

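1. _________ is the input format that should be used to read the output produced by TextOutputFormat, Hadoop's default OutputFormat.
a) KeyValueTextInputFormat
b) KeyValueTextOutputFormat
c) FileValueTextInputFormat
d) All of the mentioned

Answer: a
Explanation: TextOutputFormat writes each record as a line containing the key, a tab, and the value. To interpret such files correctly, KeyValueTextInputFormat is appropriate, because it splits each line back into a key and a value at the separator. Hadoop has no KeyValueTextOutputFormat class.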
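A minimal sketch of the parsing rule, in plain Java with no Hadoop dependency. It assumes a tab separator and follows KeyValueTextInputFormat's behavior of splitting each line at the first separator, with the whole line becoming the key (and the value empty) when no separator is present. The class and method names are hypothetical.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Hypothetical sketch of how a TextOutputFormat line is split into key and value.
public class KeyValueLineSketch {

    // Split at the FIRST occurrence of the separator; later separators stay in the value.
    // If the separator is absent, the whole line is the key and the value is empty.
    public static Map.Entry<String, String> parse(String line, char separator) {
        int pos = line.indexOf(separator);
        if (pos == -1) {
            return new SimpleEntry<>(line, "");
        }
        return new SimpleEntry<>(line.substring(0, pos), line.substring(pos + 1));
    }

    public static void main(String[] args) {
        String[] lines = {"apple\t3", "line\twith\ttabs", "no-separator"};
        for (String line : lines) {
            Map.Entry<String, String> kv = parse(line, '\t');
            System.out.println("key=[" + kv.getKey() + "] value=[" + kv.getValue() + "]");
        }
    }
}
```

Splitting only at the first tab means a value that itself contains tabs survives the round trip intact.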

2. Point out the wrong statement.
a) Hadoop sequence file format stores sequences of binary key-value pairs
b) SequenceFileAsBinaryInputFormat is a variant of SequenceFileInputFormat that retrieves the sequence file’s keys and values as opaque binary objects
c) SequenceFileAsTextInputFormat is a variant of SequenceFileInputFormat that retrieves the sequence file’s keys and values as opaque binary objects.
d) None of the mentioned

Answer: c
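Explanation: SequenceFileAsTextInputFormat converts the sequence file's keys and values to Text objects. It is SequenceFileAsBinaryInputFormat that retrieves them as opaque binary objects, reading keys and values from SequenceFiles in raw binary format.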

3. __________ is a variant of SequenceFileInputFormat that converts the sequence file’s keys and values to Text objects.
a) SequenceFile
b) SequenceFileAsTextInputFormat
c) SequenceAsTextInputFormat
d) All of the mentioned

Answer: b
Explanation: SequenceFileAsTextInputFormat performs the conversion by calling toString() on the keys and values. This makes sequence files suitable input for Streaming.

4. _________ class allows you to specify the InputFormat and Mapper to use on a per-path basis.
a) MultipleOutputs
b) MultipleInputs
c) SingleInputs
d) None of the mentioned

Answer: b
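Explanation: A job's input may come from several sources in different formats; for example, one might be tab-separated plain text and the other a binary sequence file. Even if they share a format, they may have different representations and therefore need to be parsed differently. MultipleInputs handles this by letting each input path have its own InputFormat and Mapper.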
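In a real driver, each path is registered with a call such as MultipleInputs.addInputPath(job, path, inputFormatClass, mapperClass). The sketch below models only the per-path dispatch idea in plain Java; the paths, record layouts, and parsers are hypothetical, and both sources are text in different representations.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical plain-Java model of per-path dispatch (the idea behind MultipleInputs).
public class PerPathDispatchSketch {

    // Each "mapper" normalizes a raw record to "station,temperature".
    static final Map<String, Function<String, String>> MAPPERS = new LinkedHashMap<>();
    static {
        // Hypothetical source 1: tab-separated "station<TAB>temp".
        MAPPERS.put("/data/tsv", rec -> {
            String[] fields = rec.split("\t");
            return fields[0] + "," + fields[1];
        });
        // Hypothetical source 2: fixed-width, station in columns 0-4, temperature after.
        MAPPERS.put("/data/fixed",
                rec -> rec.substring(0, 5).trim() + "," + rec.substring(5).trim());
    }

    // Choose the parser registered for the path the record came from.
    public static String process(String path, String record) {
        Function<String, String> mapper = MAPPERS.get(path);
        if (mapper == null) {
            throw new IllegalArgumentException("no mapper registered for " + path);
        }
        return mapper.apply(record);
    }

    public static void main(String[] args) {
        System.out.println(process("/data/tsv", "S1\t21"));    // prints S1,21
        System.out.println(process("/data/fixed", "S2   18")); // prints S2,18
    }
}
```

Both sources end up in the same normalized form, so a single reduce phase can consume them together.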

5. ___________ is an input format for reading data from a relational database, using JDBC.
a) DBInput
b) DBInputFormat
c) DBInpFormat
d) All of the mentioned

Answer: b
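Explanation: DBInputFormat reads rows from a relational database over JDBC. Because it has no sharding capability, running too many mappers can overwhelm the database, so it is best used for loading relatively small datasets.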

6. Which of the following is the default output format?
a) TextFormat
b) TextOutput
c) TextOutputFormat
d) None of the mentioned

Answer: c
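Explanation: TextOutputFormat writes each record as a line of text. Its keys and values may be of any type, because they are turned into strings by calling toString(); by default, key and value are separated by a tab character.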
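A plain-Java sketch of how TextOutputFormat renders one record, based on our understanding of its line writer: keys and values are converted with toString(), joined by the separator (a tab by default), and a missing key or value (NullWritable in Hadoop, modeled here as null) is omitted together with the separator. The class name is hypothetical.

```java
// Hypothetical model of TextOutputFormat's line writing; null stands in for NullWritable.
public class TextLineSketch {

    public static String formatRecord(Object key, Object value, String separator) {
        boolean nullKey = (key == null);
        boolean nullValue = (value == null);
        if (nullKey && nullValue) {
            return ""; // nothing is written for this record
        }
        StringBuilder sb = new StringBuilder();
        if (!nullKey) sb.append(key.toString());               // any type works via toString()
        if (!nullKey && !nullValue) sb.append(separator);      // separator only between two parts
        if (!nullValue) sb.append(value.toString());
        sb.append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(formatRecord("apple", 3, "\t"));  // "apple<TAB>3"
        System.out.print(formatRecord(null, 2.5, "\t"));   // "2.5"
    }
}
```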

7. Which of the following writes MapFiles as output?
a) DBInpFormat
b) MapFileOutputFormat
c) SequenceFileAsBinaryOutputFormat
d) None of the mentioned

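Answer: b
Explanation: MapFileOutputFormat writes MapFiles as output, so the reducers must emit keys in sorted order. SequenceFileAsBinaryOutputFormat, by contrast, writes keys and values in raw binary format into a SequenceFile container.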

8. The split size is normally the size of a ________ block, which is appropriate for most applications.
a) Generic
b) Task
c) Library
d) HDFS

Answer: d
Explanation: FileInputFormat splits only large files (here "large" means larger than an HDFS block); a file smaller than a block becomes a single split.
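The sketch below, in plain Java, mirrors our understanding of how FileInputFormat carves one splittable file into splits: it keeps cutting full-size splits while the remaining bytes exceed 1.1 times the split size, then emits the remainder as a final split. The 128 MB block size (the default in Hadoop 2 and later) and the class name are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of splitting one file into input splits.
public class SplitSketch {

    // The last split may be up to 10% larger than splitSize instead of leaving a tiny tail.
    static final double SPLIT_SLOP = 1.1;

    // Returns the length of each split for a file of the given size.
    public static List<Long> splitLengths(long fileLength, long splitSize) {
        List<Long> lengths = new ArrayList<>();
        long remaining = fileLength;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            lengths.add(splitSize);
            remaining -= splitSize;
        }
        if (remaining != 0) {
            lengths.add(remaining); // final (possibly short or slightly long) split
        }
        return lengths;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long block = 128 * mb;
        System.out.println(splitLengths(10 * mb, block).size());  // 1: smaller than a block
        System.out.println(splitLengths(300 * mb, block).size()); // 3: 128 + 128 + 44 MB
        System.out.println(splitLengths(135 * mb, block).size()); // 1: within the 10% slop
    }
}
```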

9. Point out the correct statement.
a) The minimum split size is usually 1 byte, although some formats have a lower bound on the split size
b) Applications may impose a minimum split size
c) The maximum split size defaults to the maximum value that can be represented by a Java long type
d) All of the mentioned

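Answer: d
Explanation: All three statements are correct. Sequence files, for example, impose a lower bound because each split must be large enough to contain a sync point. The maximum split size has an effect only when it is less than the block size, forcing splits to be smaller than a block.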
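FileInputFormat combines the three settings as max(minimumSize, min(maximumSize, blockSize)); the minimal sketch below evaluates that expression for a few cases. The minimum and maximum are normally set through mapreduce.input.fileinputformat.split.minsize and mapreduce.input.fileinputformat.split.maxsize; the 128 MB block size and the class name are assumptions.

```java
// Hypothetical sketch of the split-size calculation.
public class SplitSizeSketch {

    // max(minSize, min(maxSize, blockSize))
    public static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long block = 128 * mb;
        // Defaults (min 1 byte, max Long.MAX_VALUE): split size equals the block size.
        System.out.println(computeSplitSize(block, 1L, Long.MAX_VALUE) / mb);      // 128
        // A maximum below the block size forces smaller splits.
        System.out.println(computeSplitSize(block, 1L, 64 * mb) / mb);             // 64
        // A minimum above the block size forces larger splits.
        System.out.println(computeSplitSize(block, 256 * mb, Long.MAX_VALUE) / mb); // 256
    }
}
```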

10. Point out the wrong statement.
a) Hadoop works better with a small number of large files than a large number of small files
b) CombineFileInputFormat is designed to work well with small files
c) CombineFileInputFormat does not compromise the speed at which it can process the input in a typical MapReduce job
d) None of the mentioned

Answer: c
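Explanation: If files are very small ("small" means significantly smaller than an HDFS block) and there are many of them, each map task processes very little input, and there is one map task per file, each imposing extra bookkeeping overhead. CombineFileInputFormat mitigates this by packing many files into each split, but processing many small files still increases the number of seeks a job needs, so it is still better to avoid many small files where possible.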
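A deliberately simplified sketch of the packing idea behind CombineFileInputFormat, in plain Java: it greedily fills each split with whole files up to a maximum split size. The real class also takes node and rack locality into account, so this is not its actual algorithm; the file sizes, the 128 MB limit, and the class name are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical greedy packing of small files into combined splits (locality ignored).
public class CombinePackingSketch {

    // Add files to the current split until the next one would exceed maxSplitSize,
    // then start a new split. A file larger than the limit gets a split of its own.
    public static List<List<Long>> pack(List<Long> fileSizes, long maxSplitSize) {
        List<List<Long>> splits = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentBytes = 0;
        for (long size : fileSizes) {
            if (!current.isEmpty() && currentBytes + size > maxSplitSize) {
                splits.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(size);
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            splits.add(current);
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        List<Long> files = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            files.add(mb); // 1,000 files of 1 MB each
        }
        // One split per file would mean 1,000 map tasks; 128 MB splits need only 8.
        System.out.println(pack(files, 128 * mb).size()); // 8
    }
}
```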