The default hive.input.format is org.apache.hadoop.hive.ql.io.CombineHiveInputFormat. Because this input format combines several small files or blocks into a single split, a job can run with fewer mappers than the number of input splits (i.e., the number of HDFS blocks) of the input table.

To get one split per block again, try setting hive.input.format to org.apache.hadoop.hive.ql.io.HiveInputFormat:

set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
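To confirm that the change took effect, you can print the property before and after setting it. The snippet below is a minimal sketch for a Hive CLI or Beeline session; the table name `my_table` is a hypothetical placeholder, and the mapper count that the job actually reports depends on your data layout and cluster configuration.

```sql
-- Print the current value (CombineHiveInputFormat unless it was overridden).
set hive.input.format;

-- Switch to the non-combining input format for this session only.
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;

-- Print the value again to verify the change.
set hive.input.format;

-- Run a query on a hypothetical table and compare the number of mappers
-- reported in the job log with the number reported before the change.
SELECT COUNT(*) FROM my_table;
```

A `set` issued this way applies only to the current session, so other users and jobs keep the default.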

Note that Apache Tez uses org.apache.hadoop.hive.ql.io.HiveInputFormat by default, as the value of hive.tez.input.format shows:

set hive.tez.input.format;

hive.tez.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat

You can then request a number of mappers with the following setting. Hadoop treats this value as a hint to the input format rather than as a strict maximum:

set mapreduce.job.maps=128;
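Put together, a session might look like the sketch below. It assumes that the query runs on the MapReduce engine and uses a hypothetical table named `web_logs`; the value 128 is taken from the example above and is not a recommendation. Because mapreduce.job.maps is only a hint, the final mapper count can still differ from the requested value.

```sql
-- Assumption: run on MapReduce; Tez takes its input format from hive.tez.input.format.
set hive.execution.engine=mr;

-- Do not combine small splits into larger ones.
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;

-- Requested number of map tasks (a hint, not a hard limit).
set mapreduce.job.maps=128;

-- Hypothetical query: check the job log for the number of mappers actually launched.
SELECT COUNT(*) FROM web_logs;
```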

The number of mappers is less than input splits in Hadoop 2.x
