January 20, 2011

Shuffle Error: MAX_FAILED_UNIQUE_FETCHES; bailing-out

First, it has been a long time since I last used MapReduce! Today I wasted some time figuring out this error: "Shuffle Error: MAX_FAILED_UNIQUE_FETCHES; bailing-out".

If you run into this error message on a version newer than hadoop-0.20.2, check the "mapred-site.xml" file in the ${HADOOP_HOME}/conf directory and "/etc/hosts", because this happens when the IP addresses are all confused and things aren't on the right ports. Here is the relevant part of "/etc/hosts", with the IPv6 entries commented out:

# The following lines are desirable for IPv6 capable hosts
#::1     localhost ip6-localhost ip6-loopback
#fe00::0 ip6-localnet
#ff00::0 ip6-mcastprefix
#ff02::1 ip6-allnodes
#ff02::2 ip6-allrouters

and, in "mapred-site.xml":

<property>
    <name>mapreduce.tasktracker.http.address</name>
    <value>0.0.0.0:50060</value>
    <description>
    The task tracker http server address and port.
    If the port is 0 then the server will start on a free port.
    </description>
</property>
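
To check the address side of this, here is a minimal sketch in Python (it is not part of Hadoop). It assumes that one common form of "confused" addresses is a node's own hostname resolving to a loopback address such as 127.0.0.1, in which case other nodes cannot fetch map output from it. The function name loopback_addresses is my own, purely for illustration.

```python
import ipaddress
import socket


def loopback_addresses(hostname):
    """Return the sorted loopback addresses that `hostname` resolves to."""
    infos = socket.getaddrinfo(hostname, None)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for both IPv4 and IPv6.
    addrs = {info[4][0] for info in infos}
    result = []
    for addr in addrs:
        # Strip an IPv6 zone suffix such as "%eth0" before parsing.
        if ipaddress.ip_address(addr.split("%")[0]).is_loopback:
            result.append(addr)
    return sorted(result)


if __name__ == "__main__":
    name = socket.gethostname()
    try:
        bad = loopback_addresses(name)
    except socket.gaierror as err:
        print(f"{name} does not resolve at all: {err}")
    else:
        if bad:
            print(f"{name} resolves to loopback address(es): {bad}")
        else:
            print(f"{name} does not resolve to a loopback address")
```

Run it on each node. If a node's hostname comes back as a loopback address, its "/etc/hosts" entries are the first thing I would look at.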

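Or, if you have a lot of map and reduce processes in your cluster, check the "tasktracker.http.threads" property, which sets the number of worker threads the TaskTracker's HTTP server uses to serve map output to reducers (the default is 40).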

<property>
    <name>mapreduce.tasktracker.http.threads</name>
    <value>400</value>
</property>
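
To confirm which values are actually set in the file, a small Python sketch like the following may help. It assumes the usual Hadoop layout of `<property>` elements under a `<configuration>` root. The helper name read_properties and the SAMPLE text are mine, for illustration only.

```python
import xml.etree.ElementTree as ET


def read_properties(xml_text):
    """Return a dict of property name -> value from Hadoop-style config XML."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            # Missing <value> elements are reported as empty strings.
            props[name.strip()] = (value or "").strip()
    return props


# Illustrative stand-in for the contents of mapred-site.xml.
SAMPLE = """<configuration>
  <property>
    <name>mapreduce.tasktracker.http.threads</name>
    <value>400</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    for name, value in read_properties(SAMPLE).items():
        print(f"{name} = {value}")
```

For a real cluster, read the text of "mapred-site.xml" from your conf directory and pass it in instead of SAMPLE.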