Point out the wrong statement.

a) Hadoop works better with a small number of large files than a large number of small files
b) CombineFileInputFormat is designed to work well with small files
c) CombineFileInputFormat does not compromise the speed at which it can process the input in a typical MapReduce job
d) None of the mentioned

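Answer: d
Explanation: All three statements are correct. When files are small ("small" means significantly smaller than an HDFS block) and there are many of them, each map task processes very little input, and there is one map task per file, each of which imposes extra bookkeeping overhead. This is why Hadoop works better with a small number of large files. CombineFileInputFormat addresses the problem by packing many files into each split, and because it takes node and rack locality into account when deciding which blocks to place in the same split, it does not compromise the speed at which it can process the input in a typical MapReduce job.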
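The per-file overhead described in the explanation can be made concrete with a rough count of map tasks. The sketch below is only an illustration in plain Python, not Hadoop code: the function names, the 128 MiB block size, and the greedy packing rule are assumptions chosen for the example, and the packing ignores the node and rack locality that a real combining input format considers.

```python
import math

# Assumed HDFS block size for this illustration (128 MiB).
BLOCK_SIZE = 128 * 1024 * 1024


def splits_per_file(file_sizes, block_size=BLOCK_SIZE):
    """Simplified one-split-per-block model: every non-empty file
    yields at least one split, so each small file gets its own map task."""
    return sum(max(1, math.ceil(size / block_size)) for size in file_sizes)


def splits_combined(file_sizes, max_split_size=BLOCK_SIZE):
    """Hypothetical greedy packing of whole files into splits of at most
    max_split_size bytes. Locality is ignored, and a file larger than
    max_split_size simply occupies a split by itself."""
    splits = 0
    current = 0  # bytes accumulated in the split being built
    for size in file_sizes:
        if current > 0 and current + size > max_split_size:
            splits += 1  # close the current split and start a new one
            current = 0
        current += size
    if current > 0:
        splits += 1  # count the final, partially filled split
    return splits


# 10,000 files of 1 MiB each, far smaller than one block.
small_files = [1024 * 1024] * 10_000

print(splits_per_file(small_files))   # 10000 map tasks, one per file
print(splits_combined(small_files))   # 79 map tasks after packing
```

Under these assumptions, 128 one-MiB files fit into each 128 MiB split, so the 10,000 files need 79 splits instead of 10,000, which is the reduction in per-task bookkeeping that the explanation refers to.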
