Hadoop Interview Questions and Answers

1. Mapper implementations are passed the JobConf for the job via the ________ method.
a) JobConfigure.configure
b) JobConfigurable.configure
c) JobConfigurable.configurable
d) None of the mentioned

Answer: b
Explanation: Mapper implementations override the JobConfigurable.configure(JobConf) method to initialize themselves.
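
The call order behind this answer can be sketched in plain Python, without Hadoop itself. This is only an illustration under stated assumptions: `WordLengthMapper`, `run_map_task` and the `min.length` setting are hypothetical names, and a dict stands in for the JobConf.

```python
# Plain-Python stand-in for the configure-then-map pattern; not the Hadoop API.
class WordLengthMapper:
    """Hypothetical mapper that keeps words of at least a configured length."""

    def configure(self, job_conf):
        # Called once by the simulated framework with the job configuration.
        self.min_length = int(job_conf.get("min.length", 1))

    def map(self, key, value, output):
        # Emit (word, 1) for every word that passes the configured threshold.
        for word in value.split():
            if len(word) >= self.min_length:
                output.append((word, 1))


def run_map_task(mapper, job_conf, records):
    """Simulated framework: configure first, then feed every record to map."""
    mapper.configure(job_conf)
    output = []
    for key, value in records:
        mapper.map(key, value, output)
    return output


result = run_map_task(WordLengthMapper(), {"min.length": "4"}, [(0, "a tiny hadoop job")])
```

The point of the sketch is that the mapper reads its settings once in `configure`, so `map` can stay free of configuration lookups.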

2. Point out the correct statement.
a) Applications can use the Reporter to report progress
b) The Hadoop MapReduce framework spawns one map task for each InputSplit generated by the InputFormat for the job
c) The intermediate, sorted outputs are always stored in a simple (key-len, key, value-len, value) format
d) All of the mentioned

Answer: d
Explanation: All three statements are correct. Besides reporting progress, applications can use the Reporter to set application-level status messages and update Counters.
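
The (key-len, key, value-len, value) layout from option (c) can be illustrated with a short Python sketch. It assumes fixed 4-byte big-endian length fields purely for readability; the byte-level encoding Hadoop actually uses may differ, and the function names are hypothetical.

```python
import struct


def encode_record(key: bytes, value: bytes) -> bytes:
    # Assumption: each length is a fixed 4-byte big-endian unsigned integer.
    return struct.pack(">I", len(key)) + key + struct.pack(">I", len(value)) + value


def decode_records(data: bytes):
    """Walk the buffer, reading key-len, key, value-len, value in turn."""
    offset = 0
    while offset < len(data):
        (key_len,) = struct.unpack_from(">I", data, offset)
        offset += 4
        key = data[offset:offset + key_len]
        offset += key_len
        (value_len,) = struct.unpack_from(">I", data, offset)
        offset += 4
        value = data[offset:offset + value_len]
        offset += value_len
        yield key, value


blob = encode_record(b"apple", b"1") + encode_record(b"pear", b"22")
records = list(decode_records(blob))
```

Because every field is prefixed by its length, records can be concatenated and read back without any separator characters.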

3. Input to the _______ is the sorted output of the mappers.
a) Reducer
b) Mapper
c) Shuffle
d) All of the mentioned

Answer: a
Explanation: The Reducer's input is the sorted output of the mappers. In the shuffle phase, the framework fetches the relevant partition of the output of all the mappers via HTTP.
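
A rough Python sketch of "the relevant partition of the output of all the mappers" follows. It is a simulation under assumptions: map outputs are in-memory lists rather than files fetched over HTTP, CRC32 is a stand-in for whatever partitioning function the job uses, and all function names are hypothetical.

```python
import zlib


def partition_for(key: str, num_reduces: int) -> int:
    # Stand-in hash: CRC32 is deterministic across runs, unlike Python's hash().
    return zlib.crc32(key.encode("utf-8")) % num_reduces


def partition_map_output(pairs, num_reduces):
    """Split one mapper's output into one bucket per reducer."""
    partitions = [[] for _ in range(num_reduces)]
    for key, value in pairs:
        partitions[partition_for(key, num_reduces)].append((key, value))
    return partitions


def shuffle(map_outputs, reducer_index, num_reduces):
    """Collect this reducer's partition from every mapper and sort it by key."""
    fetched = []
    for pairs in map_outputs:
        fetched.extend(partition_map_output(pairs, num_reduces)[reducer_index])
    return sorted(fetched)


map_outputs = [[("a", 1), ("b", 1)], [("a", 1), ("c", 1)]]
reducer_inputs = [shuffle(map_outputs, r, 2) for r in range(2)]
```

Every pair for a given key lands at the same reducer, which is what lets that reducer see all values for the key.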

4. The right number of reduces seems to be ____________ multiplied by (number of nodes * maximum number of containers per node).
a) 0.90
b) 0.80
c) 0.36
d) 0.95

Answer: d
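Explanation: The right number of reduces seems to be 0.95 or 1.75 multiplied by (number of nodes * maximum number of containers per node). With 0.95, all of the reduces can launch immediately and start transferring map outputs as the maps finish. With 1.75, the faster nodes finish their first round of reduces and launch a second wave, which improves load balancing.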
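
As a worked example of the arithmetic, the sketch below applies the 0.95 and 1.75 factors to the product of node count and maximum containers per node, which is how the Hadoop MapReduce tutorial states the rule. The function name, the cluster size and the choice to round down are assumptions made for illustration.

```python
def suggested_reduces(nodes: int, max_containers_per_node: int, factor_percent: int = 95) -> int:
    """Return factor * nodes * containers, rounded down (rounding is an assumption)."""
    if factor_percent not in (95, 175):
        raise ValueError("factor_percent must be 95 or 175")
    slots = nodes * max_containers_per_node
    # Integer arithmetic avoids floating-point surprises such as 0.95 * 20 != 19.
    return (factor_percent * slots) // 100


single_wave = suggested_reduces(nodes=10, max_containers_per_node=8)
two_waves = suggested_reduces(nodes=10, max_containers_per_node=8, factor_percent=175)
```

For this hypothetical cluster of 10 nodes with 8 containers each, the two factors give 76 and 140 reduces respectively.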

5. Point out the wrong statement.
a) Reducer has 2 primary phases
b) Increasing the number of reduces increases the framework overhead, but increases load balancing and lowers the cost of failures
c) It is legal to set the number of reduce-tasks to zero if no reduction is desired
d) The framework groups Reducer inputs by keys (since different mappers may have output the same key) in the sort stage

Answer: a
Explanation: Reducer has 3 primary phases: shuffle, sort and reduce.
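
The three phases can be walked through on toy data with a few lines of Python. This is a single-process sketch, assuming the map outputs are already in memory; `run_reducer` and the word-count reduce function are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def run_reducer(map_outputs, reduce_fn):
    # Shuffle: gather this reducer's pairs from every mapper (in memory here).
    fetched = [pair for output in map_outputs for pair in output]
    # Sort: order by key so equal keys from different mappers sit together.
    fetched.sort(key=itemgetter(0))
    # Reduce: call reduce_fn once per key with all of that key's values.
    return [
        (key, reduce_fn(key, [value for _, value in group]))
        for key, group in groupby(fetched, key=itemgetter(0))
    ]


counts = run_reducer(
    [[("hadoop", 1), ("map", 1)], [("hadoop", 1), ("reduce", 1)]],
    lambda key, values: sum(values),
)
```

The sort step is what makes the grouping work: `groupby` only merges adjacent items, so the two `hadoop` pairs from different mappers must be next to each other first.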

6. The output of the _______ is not sorted in the MapReduce framework for Hadoop.
a) Mapper
b) Cascader
c) Scalding
d) None of the mentioned

Answer: d
Explanation: None of the listed options applies; it is the output of the Reducer that is not sorted. The output of the reduce task is typically written to the FileSystem.

7. Which of the following phases occur simultaneously?
a) Shuffle and Sort
b) Reduce and Sort
c) Shuffle and Map
d) All of the mentioned

Answer: a
Explanation: The shuffle and sort phases occur simultaneously; while map-outputs are being fetched they are merged.
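
Merging while fetching can be approximated in Python with `heapq.merge`, folding each newly arrived sorted run into the result so far. This is a simplified in-memory sketch; the function name is hypothetical, and it does not model the framework's actual buffering or spilling.

```python
import heapq
from operator import itemgetter


def merge_as_fetched(sorted_map_outputs):
    """Merge each fetched run into the running result as soon as it arrives."""
    merged = []
    for fetched in sorted_map_outputs:  # each run is already sorted by key
        merged = list(heapq.merge(merged, fetched, key=itemgetter(0)))
    return merged


merged = merge_as_fetched([
    [("a", 1), ("c", 1)],
    [("b", 1), ("c", 2)],
    [("a", 3)],
])
```

Because every run is already sorted, each merge is a linear pass rather than a full re-sort of everything fetched so far.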

8. Mapper and Reducer implementations can use the ________ to report progress or just indicate that they are alive.
a) Partitioner
b) OutputCollector
c) Reporter
d) All of the mentioned

Answer: c
Explanation: Reporter is a facility for MapReduce applications to report progress, set application-level status messages and update Counters.
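
The three uses named here (progress, status messages, counters) can be mimicked with a small in-memory stand-in. `SketchReporter` and its method names are hypothetical and are not the Hadoop Reporter interface; the counter group and name are likewise made up.

```python
from collections import Counter


class SketchReporter:
    """Hypothetical in-memory stand-in; method names are illustrative only."""

    def __init__(self):
        self.status = ""
        self.progress_calls = 0
        self.counters = Counter()

    def progress(self):
        # Signals that the task is still alive.
        self.progress_calls += 1

    def set_status(self, message):
        self.status = message

    def incr_counter(self, group, name, amount=1):
        self.counters[(group, name)] += amount


def map_lines(lines, reporter):
    """Toy map step that counts empty lines and reports after each record."""
    out = []
    for line in lines:
        if not line.strip():
            reporter.incr_counter("Sketch", "EMPTY_LINES")
        else:
            out.append((line.strip(), 1))
        reporter.progress()
    reporter.set_status(f"processed {len(lines)} lines")
    return out


reporter = SketchReporter()
pairs = map_lines(["alpha", "", "beta"], reporter)
```

Calling `progress` once per record is the pattern that matters for long-running tasks, since it shows the task is alive even when it emits little output.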

9. _________ is a generalization of the facility provided by the MapReduce framework to collect data output by the Mapper or the Reducer.
a) Partitioner
b) OutputCollector
c) Reporter
d) All of the mentioned

Answer: b
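Explanation: OutputCollector collects the data output by the Mapper or the Reducer, whether intermediate outputs or the output of the job. Hadoop MapReduce also comes bundled with a library of generally useful mappers, reducers, and partitioners.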

10. _________ is the primary interface for a user to describe a MapReduce job to the Hadoop framework for execution.
a) Map Parameters
b) JobConf
c) MemoryConf
d) None of the above

Answer: b
Explanation: JobConf represents a MapReduce job configuration.
