Hadoop Questions and Answers Part-24

1. An input _________ is a chunk of the input that is processed by a single map.
a) textformat
b) split
c) datanode
d) all of the mentioned

Answer: b
Explanation: Each split is divided into records, and the map processes each record—a key-value pair—in turn.
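
The relationship between a split, its records, and the map function can be sketched in plain Java. This is a minimal illustration, not Hadoop code: `MapTaskSketch` and `runMapTask` are hypothetical names, and a split is modeled as an in-memory list of key-value pairs.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

// Hypothetical model of one map task working through one split.
class MapTaskSketch {
    // Calls the map function once per record (key-value pair), in order.
    static <K, V, R> List<R> runMapTask(List<Map.Entry<K, V>> split,
                                        BiFunction<K, V, R> mapFn) {
        List<R> output = new ArrayList<>();
        for (Map.Entry<K, V> rec : split) {
            output.add(mapFn.apply(rec.getKey(), rec.getValue()));
        }
        return output;
    }

    public static void main(String[] args) {
        // Two records keyed by an assumed offset, valued by a line of text.
        List<Map.Entry<Long, String>> split = new ArrayList<>();
        split.add(new SimpleEntry<>(0L, "a b"));
        split.add(new SimpleEntry<>(4L, "c d e"));
        // The map function here counts the words in each line.
        List<Integer> counts = runMapTask(split, (k, v) -> v.split(" ").length);
        System.out.println(counts);
    }
}
```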

2. Point out the wrong statement.
a) If V2 and V3 are the same, you only need to use setOutputValueClass()
b) The overall effect of Streaming job is to perform a sort of the input
c) A Streaming application can control the separator that is used when a key-value pair is turned into a series of bytes and sent to the map or reduce process over standard input
d) None of the mentioned

Answer: d
Explanation: All three statements are correct. If a combine function is used, it has the same form as the reduce function, except that its output types are the intermediate key and value types (K2 and V2), so that its output can feed the reduce function.
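
The type relationship between a reduce function and a combine function can be expressed with generics. The sketch below is an assumption-laden illustration in plain Java: `ReduceFn`, `CombineFn`, `Pair`, and `CombinerDemo` are hypothetical names, not Hadoop interfaces.

```java
import java.util.Arrays;
import java.util.List;

// Simple immutable pair used as a key-value output.
class Pair<A, B> {
    final A first;
    final B second;
    Pair(A first, B second) { this.first = first; this.second = second; }
}

// Hypothetical reduce signature: (K2, list of V2) -> (K3, V3).
interface ReduceFn<K2, V2, K3, V3> {
    Pair<K3, V3> reduce(K2 key, List<V2> values);
}

// A combiner is a reduce function whose output types are K2 and V2,
// so its output can be fed to the reduce function.
interface CombineFn<K2, V2> extends ReduceFn<K2, V2, K2, V2> {}

class CombinerDemo {
    // Taking a maximum is safe to apply partially, so it works as a combiner.
    static final CombineFn<String, Integer> MAX = (key, values) -> {
        int max = Integer.MIN_VALUE;
        for (int v : values) max = Math.max(max, v);
        return new Pair<>(key, max);
    };

    // Combines two map outputs separately, then reduces the combined values.
    static int maxWithCombiner() {
        Pair<String, Integer> c1 = MAX.reduce("1950", Arrays.asList(0, 20, 10));
        Pair<String, Integer> c2 = MAX.reduce("1950", Arrays.asList(25, 15));
        return MAX.reduce("1950", Arrays.asList(c1.second, c2.second)).second;
    }

    public static void main(String[] args) {
        System.out.println(maxWithCombiner()); // prints 25
    }
}
```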

3. An ___________ is responsible for creating the input splits, and dividing them into records.
a) TextOutputFormat
b) TextInputFormat
c) OutputInputFormat
d) InputFormat

Answer: d
Explanation: As a MapReduce application writer, you don’t need to deal with InputSplits directly, as they are created by an InputFormat.

4. ______________ is another implementation of the MapRunnable interface that runs mappers concurrently in a configurable number of threads.
a) MultithreadedRunner
b) MultithreadedMap
c) MultithreadedMapRunner
d) SinglethreadedMapRunner

Answer: c
Explanation: A RecordReader is little more than an iterator over records, and the map task uses one to generate record key-value pairs, which it passes to the map function.
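
Both ideas, a record reader acting as an iterator and a runner that maps records on several threads, can be sketched with the standard library. This is a simplified illustration and not the MultithreadedMapRunner implementation: `ThreadedRunnerSketch` is a hypothetical name, and a plain `Iterator` stands in for the RecordReader.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

class ThreadedRunnerSketch {
    // Pulls records from the iterator and runs the map function on a
    // fixed-size thread pool; results are collected in submission order.
    static <R, O> List<O> run(Iterator<R> reader, Function<R, O> mapFn, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<O>> pending = new ArrayList<>();
            while (reader.hasNext()) {
                R rec = reader.next();
                Callable<O> task = () -> mapFn.apply(rec);
                pending.add(pool.submit(task));
            }
            List<O> output = new ArrayList<>();
            for (Future<O> f : pending) {
                output.add(f.get());
            }
            return output;
        } catch (Exception e) {
            throw new IllegalStateException("map task failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("a", "bb", "ccc");
        System.out.println(run(records.iterator(), String::length, 2));
    }
}
```

The thread count is a parameter here to mirror the "configurable number of threads" in the question; a map function used this way has to be thread-safe.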

5. Which of the following is the default implementation of MapRunnable used to run mappers?
a) MapReducer
b) MapRunner
c) MapRed
d) All of the mentioned

Answer: b
Explanation: Having calculated the splits, the client sends them to the jobtracker.

6. _________ is the base class for all implementations of InputFormat that use files as their data source.
a) FileTextFormat
b) FileInputFormat
c) FileOutputFormat
d) None of the mentioned

Answer: b
Explanation: FileInputFormat provides implementation for generating splits for the input files.
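
Split generation for a single file can be approximated as follows. This is a simplified sketch in plain Java, assuming the commonly documented rule that the split size is max(minimum size, min(maximum size, block size)); `SplitCalc` is a hypothetical name, and details of the real implementation (such as allowing the last split to run slightly over) are left out.

```java
import java.util.ArrayList;
import java.util.List;

class SplitCalc {
    // Assumed split size rule: max(minSize, min(maxSize, blockSize)).
    static long computeSplitSize(long minSize, long maxSize, long blockSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Returns {start, length} pairs that cover a file of the given length.
    static List<long[]> splits(long fileLength, long splitSize) {
        List<long[]> result = new ArrayList<>();
        for (long start = 0; start < fileLength; start += splitSize) {
            result.add(new long[] {start, Math.min(splitSize, fileLength - start)});
        }
        return result;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long splitSize = computeSplitSize(1, Long.MAX_VALUE, 128 * mb);
        for (long[] s : splits(300 * mb, splitSize)) {
            System.out.println("start=" + s[0] + " length=" + s[1]);
        }
    }
}
```

With the default-like bounds used in `main`, the split size equals the block size, so a 300 MB file yields two full splits and one 44 MB remainder.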

7. Which of the following methods adds a path to the list of inputs?
a) setInputPaths()
b) addInputPath()
c) setInput()
d) none of the mentioned

Answer: b
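Explanation: FileInputFormat offers four static convenience methods for setting a JobConf's input paths: addInputPath() and addInputPaths() add to the list of inputs, while the two setInputPaths() variants replace the list in one go.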
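
The difference between adding to and replacing the input list can be shown with a toy model. The class below is hypothetical and is not Hadoop's JobConf or FileInputFormat; it only borrows the method names to illustrate the add-versus-set semantics.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical toy model of a job's list of input paths.
class InputPathList {
    private final List<String> paths = new ArrayList<>();

    // Appends one path, keeping those already present.
    void addInputPath(String path) {
        paths.add(path);
    }

    // Appends every path in a comma-separated list.
    void addInputPaths(String commaSeparated) {
        for (String p : commaSeparated.split(",")) {
            paths.add(p.trim());
        }
    }

    // Replaces the whole list in one call.
    void setInputPaths(String... newPaths) {
        paths.clear();
        paths.addAll(Arrays.asList(newPaths));
    }

    List<String> getInputPaths() {
        return new ArrayList<>(paths);
    }

    public static void main(String[] args) {
        InputPathList job = new InputPathList();
        job.addInputPath("/data/a");
        job.addInputPaths("/data/b, /data/c");
        System.out.println(job.getInputPaths());
        job.setInputPaths("/data/d");
        System.out.println(job.getInputPaths());
    }
}
```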

8. ___________ takes node and rack locality into account when deciding which blocks to place in the same split.
a) CombineFileOutputFormat
b) CombineFileInputFormat
c) TextFileInputFormat
d) None of the mentioned

Answer: b
Explanation: Because it takes locality into account when packing blocks into splits, CombineFileInputFormat does not compromise the speed at which it can process the input in a typical MapReduce job.
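
A rough sketch of locality-aware packing, under heavy simplification: blocks are grouped by the node that holds them, and each node's blocks are packed into splits up to a maximum size, so no split mixes blocks from different nodes. `Block` and `LocalityPacker` are hypothetical names; rack-level grouping and the other details of the real CombineFileInputFormat are not modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical block descriptor: the node holding it and its size in bytes.
class Block {
    final String host;
    final long size;
    Block(String host, long size) { this.host = host; this.size = size; }
}

class LocalityPacker {
    // Groups blocks by host, then fills splits of at most maxSplitSize bytes
    // from each host's blocks (a single oversized block gets its own split).
    static List<List<Block>> pack(List<Block> blocks, long maxSplitSize) {
        Map<String, List<Block>> byHost = new LinkedHashMap<>();
        for (Block b : blocks) {
            byHost.computeIfAbsent(b.host, h -> new ArrayList<>()).add(b);
        }
        List<List<Block>> splits = new ArrayList<>();
        for (List<Block> hostBlocks : byHost.values()) {
            List<Block> current = new ArrayList<>();
            long currentSize = 0;
            for (Block b : hostBlocks) {
                if (!current.isEmpty() && currentSize + b.size > maxSplitSize) {
                    splits.add(current);
                    current = new ArrayList<>();
                    currentSize = 0;
                }
                current.add(b);
                currentSize += b.size;
            }
            if (!current.isEmpty()) {
                splits.add(current);
            }
        }
        return splits;
    }

    public static void main(String[] args) {
        List<Block> blocks = Arrays.asList(
            new Block("n1", 40), new Block("n2", 30), new Block("n1", 50),
            new Block("n1", 30), new Block("n2", 20));
        System.out.println(pack(blocks, 100).size() + " splits");
    }
}
```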

9. Point out the correct statement.
a) With TextInputFormat and KeyValueTextInputFormat, each mapper receives a variable number of lines of input
b) With StreamXmlRecordReader, the page elements can be interpreted as records for processing by a mapper
c) The number of lines each mapper receives depends on the size of the split and the length of the lines
d) All of the mentioned

Answer: d
Explanation: Large XML documents that are composed of a series of “records” can be broken into these records using simple string or regular-expression matching to find start and end tags of records.
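
The simple string matching described above can be sketched in a few lines. This is an illustration of the idea only, not StreamXmlRecordReader itself: `XmlRecordSplitter` is a hypothetical name, the whole document is assumed to fit in memory, and nested elements with the same tag are not handled.

```java
import java.util.ArrayList;
import java.util.List;

class XmlRecordSplitter {
    // Returns each substring that starts at beginTag and ends at the next endTag.
    static List<String> records(String doc, String beginTag, String endTag) {
        List<String> out = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = doc.indexOf(beginTag, from);
            if (start < 0) {
                break; // no more records
            }
            int end = doc.indexOf(endTag, start + beginTag.length());
            if (end < 0) {
                break; // unterminated record: stop
            }
            end += endTag.length();
            out.add(doc.substring(start, end));
            from = end;
        }
        return out;
    }

    public static void main(String[] args) {
        String doc = "<wiki><page>A</page><page>B</page></wiki>";
        for (String rec : records(doc, "<page>", "</page>")) {
            System.out.println(rec);
        }
    }
}
```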

10. The key, a ____________, is the byte offset within the file of the beginning of the line.
a) LongReadable
b) LongWritable
c) ShortReadable
d) All of the mentioned

Answer: b
Explanation: The value is the contents of the line, excluding any line terminators (newline, carriage return), and is packaged as a Text object.
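
How byte-offset keys and line values pair up can be sketched without Hadoop. The code below is a plain-Java approximation: `OffsetKeys` is a hypothetical name, `Long` and `String` stand in for LongWritable and Text, and the input is assumed to be UTF-8 with "\n" or "\r\n" line endings.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class OffsetKeys {
    // Maps each line's starting byte offset to its text, terminators stripped.
    static Map<Long, String> records(byte[] data) {
        Map<Long, String> out = new LinkedHashMap<>();
        int lineStart = 0;
        for (int i = 0; i <= data.length; i++) {
            boolean atEnd = (i == data.length);
            // A line ends at a newline, or at end of data if text remains.
            boolean lineEnds = atEnd ? lineStart < data.length : data[i] == '\n';
            if (lineEnds) {
                int lineEnd = i;
                if (lineEnd > lineStart && data[lineEnd - 1] == '\r') {
                    lineEnd--; // drop the carriage return as well
                }
                out.put((long) lineStart,
                        new String(data, lineStart, lineEnd - lineStart, StandardCharsets.UTF_8));
                lineStart = i + 1;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] data = "ab\r\ncde\nf".getBytes(StandardCharsets.UTF_8);
        for (Map.Entry<Long, String> e : records(data).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

Because the keys are byte offsets rather than line numbers, they depend on the encoded length of every preceding line, including its terminator.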