Hadoop Questions and Answers Part-17

1. Point out the correct statement.
a) Mapper maps input key/value pairs to a set of intermediate key/value pairs
b) Applications typically implement the Mapper and Reducer interfaces to provide the map and reduce methods
c) Mapper and Reducer interfaces form the core of the job
d) None of the mentioned

Answer: a
Explanation: Mapper maps input key/value pairs to a set of intermediate key/value pairs; the transformed intermediate records need not be of the same type as the input records.
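As a concrete illustration that the intermediate types need not match the input types, a word-count map step turns a (long byte-offset, String line) input pair into (String word, int 1) intermediate pairs. A minimal sketch of that transformation in plain Java, without the actual Hadoop Mapper interface:

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map.Entry;

public class MapSketch {
    // Input pair: (byte offset, line of text); intermediate pairs: (word, 1).
    // Note the intermediate key/value types differ from the input types.
    static List<Entry<String, Integer>> map(long offset, String line) {
        List<Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(map(0L, "to be or not to be"));
    }
}
```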

2. The Hadoop MapReduce framework spawns one map task for each __________ generated by the InputFormat for the job.
a) OutputSplit
b) InputSplit
c) InputSplitStream
d) All of the mentioned

Answer: b
Explanation: The framework spawns one map task per InputSplit. Mapper implementations are passed the JobConf for the job via the JobConfigurable.configure(JobConf) method and can override it to initialize themselves.

3. Users can control which keys (and hence records) go to which Reducer by implementing a custom ____________
a) Partitioner
b) OutputSplit
c) Reporter
d) All of the mentioned

Answer: a
Explanation: The Partitioner controls which partition (and hence which reduce task) each intermediate key is sent to; users can additionally control grouping by specifying a Comparator via JobConf.setOutputKeyComparatorClass(Class).
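The routing decision a custom Partitioner makes can be sketched as a plain method (the real class would implement the Partitioner interface and its getPartition method); the route-by-first-letter scheme here is a made-up example, not a standard Hadoop partitioner:

```java
public class LetterPartitionSketch {
    // Hypothetical scheme: route keys by their first letter, so all keys
    // starting with the same letter land on the same reduce task.
    static int getPartition(String key, int numPartitions) {
        char first = Character.toLowerCase(key.charAt(0));
        return (first % numPartitions + numPartitions) % numPartitions;
    }

    public static void main(String[] args) {
        System.out.println(getPartition("apple", 4));
        System.out.println(getPartition("avocado", 4)); // same partition as "apple"
    }
}
```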

4. Point out the wrong statement.
a) The Mapper outputs are sorted and then partitioned per Reducer
b) The total number of partitions is the same as the number of reduce tasks for the job
c) The intermediate, sorted outputs are always stored in a simple (key-len, key, value-len, value) format
d) None of the mentioned

Answer: d
Explanation: All intermediate values associated with a given output key are subsequently grouped by the framework, and passed to the Reducer(s) to determine the final output.

5. Applications can use the ____________ to report progress and set application-level status messages.
a) Partitioner
b) OutputSplit
c) Reporter
d) All of the mentioned

Answer: c
Explanation: Reporter is also used to update Counters, or simply to indicate that the task is alive.
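A hedged sketch of how a map implementation in the old org.apache.hadoop.mapred API might use Reporter (this fragment assumes a surrounding Mapper class; Counters.RECORDS_SEEN is a hypothetical application-defined counter enum, not part of Hadoop):

```java
// Inside a Mapper in the old org.apache.hadoop.mapred API.
public void map(LongWritable key, Text value,
                OutputCollector<Text, IntWritable> output, Reporter reporter) {
    reporter.setStatus("processing offset " + key.get());  // application-level status
    reporter.incrCounter(Counters.RECORDS_SEEN, 1);        // update a Counter
    reporter.progress();                                   // just signal liveness
    // ... actual map logic ...
}
```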

6. The right level of parallelism for maps seems to be around _________ maps per node.
a) 1-10
b) 10-100
c) 100-150
d) 150-200

Answer: b
Explanation: Task setup takes a while, so it is best if the maps take at least a minute to execute.

7. The number of reduces for the job is set by the user via _________
a) JobConf.setNumTasks(int)
b) JobConf.setNumReduceTasks(int)
c) JobConf.setNumMapTasks(int)
d) All of the mentioned

Answer: b
Explanation: Reducer has 3 primary phases: shuffle, sort and reduce.
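In the old org.apache.hadoop.mapred API the call looks like the following configuration fragment (the driver class name and reduce count are placeholders):

```java
// WordCount is a placeholder driver class for this sketch.
JobConf conf = new JobConf(WordCount.class);
conf.setJobName("wordcount");
conf.setNumReduceTasks(4);  // the reduce count is set by the user, not the framework
```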

8. The framework groups Reducer inputs by key in _________ stage.
a) sort
b) shuffle
c) reduce
d) none of the mentioned

Answer: a
Explanation: The shuffle and sort phases occur simultaneously; while map-outputs are being fetched they are merged.

9. The output of the reduce task is typically written to the FileSystem via _____________
a) OutputCollector.collect
b) OutputCollector.get
c) OutputCollector.receive
d) OutputCollector.put

Answer: a
Explanation: The output of the Reducer is not sorted.
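The reduce step itself (summing the grouped values for one key, which the real API would then emit via OutputCollector.collect) can be sketched in plain Java as:

```java
import java.util.Iterator;
import java.util.List;

public class ReduceSketch {
    // Mirrors the body of a word-count reduce(): sum all grouped values for a key.
    // In the real API the result would be emitted via OutputCollector.collect(key, sum).
    static int reduce(String key, Iterator<Integer> values) {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next();
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(reduce("be", List.of(1, 1).iterator()));
    }
}
```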

10. Which of the following is the default Partitioner for MapReduce?
a) MergePartitioner
b) HashedPartitioner
c) HashPartitioner
d) None of the mentioned

Answer: c
Explanation: HashPartitioner is the default; it picks a partition from the key's hashCode(), and the total number of partitions is the same as the number of reduce tasks for the job.
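HashPartitioner computes (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks; masking off the sign bit keeps the result non-negative even for keys whose hashCode() is negative. A self-contained sketch of the same arithmetic:

```java
public class HashPartitionSketch {
    // Same arithmetic HashPartitioner uses: mask off the sign bit, then
    // take the remainder modulo the number of reduce tasks.
    static int getPartition(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        System.out.println(getPartition("hadoop", 4));
        System.out.println(getPartition(-12345, 4)); // negative hashCode still maps to 0..3
    }
}
```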