Hadoop Questions and Answers Part-18

1. Point out the correct statement.
a) The right number of reduces seems to be 0.95 or 1.75
b) Increasing the number of reduces increases the framework overhead
c) With 0.95 all of the reduces can launch immediately and start transferring map outputs as the maps finish
d) All of the mentioned

Answer: c
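Explanation: The factors 0.95 and 1.75 are multiplied by (number of nodes * mapred.tasktracker.reduce.tasks.maximum). With 0.95, all of the reduces can launch immediately and start transferring map outputs as the maps finish. With 1.75, the faster nodes finish their first round of reduces and launch a second wave, which does a much better job of load balancing.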
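
As a rough illustration of the rule of thumb behind this question, the sketch below multiplies each factor by the total number of reduce slots in a cluster. The class and method names, the 10-node cluster with 2 reduce slots per node, and the decision to round down are assumptions made for this example; none of them come from Hadoop itself.

```java
// Sketch only: mirrors the rule of thumb, not any Hadoop source code.
final class ReduceCountSketch {

    /** Returns factor * (nodes * reduce slots per node), rounded down (assumed rounding). */
    static int suggestedReduces(double factor, int nodes, int reduceSlotsPerNode) {
        return (int) Math.floor(factor * (nodes * reduceSlotsPerNode));
    }

    public static void main(String[] args) {
        int nodes = 10;             // hypothetical cluster size
        int reduceSlotsPerNode = 2; // hypothetical per-node reduce slot limit

        // One wave: every reduce can start at once.
        System.out.println("0.95 -> " + suggestedReduces(0.95, nodes, reduceSlotsPerNode));
        // Nearly two waves: faster nodes pick up a second round.
        System.out.println("1.75 -> " + suggestedReduces(1.75, nodes, reduceSlotsPerNode));
    }
}
```

With the lower factor the reduce count stays just under the number of available slots, so a single wave suffices; the higher factor deliberately oversubscribes the slots to smooth out slow nodes.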

2. Which of the following partitions the key space?
a) Partitioner
b) Compactor
c) Collector
d) All of the mentioned

Answer: a
Explanation: Partitioner controls the partitioning of the keys of the intermediate map-outputs.
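
To make the idea of partitioning the key space concrete, here is a minimal plain-Java sketch of hash-based partitioning. It follows the same arithmetic that Hadoop's default HashPartitioner is generally documented to use, but the class and method names below are hypothetical and the code does not depend on Hadoop.

```java
// Sketch only: a stand-alone stand-in for a hash-based partitioner.
final class HashPartitionSketch {

    /** Maps a key to a partition index in the range [0, numReduceTasks). */
    static int partitionFor(Object key, int numReduceTasks) {
        // Mask the sign bit so a negative hashCode still yields a valid index.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int numReduceTasks = 4; // hypothetical number of reduce tasks
        for (String key : new String[] {"apple", "banana", "cherry"}) {
            System.out.println(key + " -> partition " + partitionFor(key, numReduceTasks));
        }
    }
}
```

Because the partition depends only on the key, every record with the same key reaches the same reduce task.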

3. ____________ is a generalization of the facility provided by the MapReduce framework to collect data output by the Mapper or the Reducer.
a) OutputCompactor
b) OutputCollector
c) InputCollector
d) All of the mentioned

Answer: b
Explanation: OutputCollector collects the data output by the Mapper or the Reducer, whether that is the intermediate output or the final output of the job.
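
The collecting role can be sketched without Hadoop on the classpath. In the example below, CollectorSketch is a hypothetical stand-in for the real org.apache.hadoop.mapred.OutputCollector interface (which also exposes a collect(key, value) method), and ListCollector and WordCountMapSketch are names invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the collector interface; not the Hadoop type itself.
interface CollectorSketch<K, V> {
    void collect(K key, V value);
}

// Keeps collected pairs in memory so they can be inspected.
final class ListCollector<K, V> implements CollectorSketch<K, V> {
    final List<String> records = new ArrayList<>();

    @Override
    public void collect(K key, V value) {
        records.add(key + "\t" + value);
    }
}

final class WordCountMapSketch {

    /** Emits (word, 1) for every whitespace-separated word in the line. */
    static void map(String line, CollectorSketch<String, Integer> out) {
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.collect(word, 1);
            }
        }
    }

    public static void main(String[] args) {
        ListCollector<String, Integer> out = new ListCollector<>();
        map("to be or not to be", out);
        out.records.forEach(System.out::println);
    }
}
```

The map logic never needs to know where its output goes; it only hands pairs to the collector, which is the point of the abstraction.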

4. Point out the wrong statement.
a) It is legal to set the number of reduce-tasks to zero if no reduction is desired
b) The outputs of the map-tasks go directly to the FileSystem
c) The MapReduce framework does not sort the map-outputs before writing them out to the FileSystem
d) None of the mentioned

Answer: d
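Explanation: When the number of reduce-tasks is set to zero, the outputs of the map-tasks go directly to the FileSystem, into the output path set by setOutputPath(Path), and the framework does not sort them first. None of the statements is therefore wrong.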

5. __________ is the primary interface for a user to describe a MapReduce job to the Hadoop framework for execution.
a) JobConfig
b) JobConf
c) JobConfiguration
d) All of the mentioned

Answer: b
Explanation: JobConf is typically used to specify the Mapper, combiner (if any), Partitioner, Reducer, InputFormat, OutputFormat and OutputCommitter implementations.

6. The ___________ executes the Mapper/Reducer task as a child process in a separate JVM.
a) JobTracker
b) TaskTracker
c) TaskScheduler
d) None of the mentioned

Answer: b
Explanation: The TaskTracker executes the Mapper/Reducer task as a child process in a separate JVM, and the child-task inherits the environment of the parent TaskTracker.

7. The maximum virtual memory of the launched child-task is specified using _________
a) mapv
b) mapred
c) mapvim
d) All of the mentioned

Answer: b
Explanation: Admins can also specify the maximum virtual memory of the launched child-task, and any sub-process it launches recursively, using mapred.

8. Which of the following parameters is the threshold for the accounting and serialization buffers?
a) io.sort.spill.percent
b) io.sort.record.percent
c) io.sort.mb
d) None of the mentioned

Answer: a
Explanation: When this percentage of either buffer has filled, its contents are spilled to disk in the background.
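
The threshold arithmetic can be sketched as follows. This is a simplified model, not Hadoop's implementation: it assumes the sort buffer (io.sort.mb) is split into an accounting part (sized by io.sort.record.percent) and a serialization part, and that a spill starts once either part reaches io.sort.spill.percent of its size. The values 100 MB, 0.05 and 0.80 are assumed settings chosen for the example, and the class and method names are hypothetical.

```java
// Simplified model of the map-side sort buffer; not Hadoop's actual implementation.
final class SpillThresholdSketch {

    /** Bytes that may fill a buffer before a background spill would start. */
    static long spillThresholdBytes(long bufferBytes, double spillPercent) {
        return (long) (bufferBytes * spillPercent);
    }

    public static void main(String[] args) {
        long sortBytes = 100L * 1024 * 1024; // io.sort.mb = 100 (assumed setting)
        double recordPercent = 0.05;         // io.sort.record.percent (assumed setting)
        double spillPercent = 0.80;          // io.sort.spill.percent (assumed setting)

        // Split the buffer into accounting and serialization parts.
        long accountingBytes = (long) (sortBytes * recordPercent);
        long serializationBytes = sortBytes - accountingBytes;

        System.out.println("accounting buffer spills at "
                + spillThresholdBytes(accountingBytes, spillPercent) + " bytes");
        System.out.println("serialization buffer spills at "
                + spillThresholdBytes(serializationBytes, spillPercent) + " bytes");
    }
}
```

Whichever buffer crosses its threshold first triggers the spill, which is why the question describes a single threshold for both.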

9. ______________ is the percentage of memory, relative to the maximum heap size, in which map outputs may be retained during the reduce.
a) mapred.job.shuffle.merge.percent
b) mapred.job.reduce.input.buffer.percent
c) mapred.inmem.merge.threshold
d) io.sort.factor

Answer: b
Explanation: When the reduce begins, map outputs will be merged to disk until those that remain are under the resource limit this defines.

10. Which of the following classes provides access to configuration parameters?
a) Config
b) Configuration
c) OutputConfig
d) None of the mentioned

Answer: b
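Explanation: Configuration provides access to configuration parameters. Configurations are specified by resources, each of which contains a set of name/value pairs as XML data.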