1. a clean VM: Ubuntu 16.04 LTS
2. follow the single node cluster tutorial: http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html
    • download hadoop-2.7.3-src
    • follow BUILDING.txt
      • install all packages listed
        $ sudo apt-get purge 'openjdk*'
        $ sudo apt-get install software-properties-common
        $ sudo add-apt-repository ppa:webupd8team/java
        $ sudo apt-get update
        $ sudo apt-get install oracle-java7-installer
        $ sudo apt-get -y install maven
        $ sudo apt-get -y install build-essential autoconf automake libtool cmake zlib1g-dev pkg-config libssl-dev
        $ sudo apt-get -y install libprotobuf-dev protobuf-compiler
        $ sudo apt-get install snappy libsnappy-dev
        $ sudo apt-get install bzip2 libbz2-dev
        $ sudo apt-get install libjansson-dev
        $ sudo apt-get install fuse libfuse-dev
    • building hadoop binary
      $ mvn package -Pdist -DskipTests -Dtar

      • ERROR: protoc version is 'libprotoc 2.6.1', expected version is '2.5.0'
      • FIX: the build needs protoc 2.5.0, not the packaged 2.6.1; install 2.5.0 as described at http://codetips.coloza.com/compile-hadoop-from-source/
      • ERROR: libprotoc.so.8: cannot open shared object file: No such file or directory
      • FIX: $ sudo ldconfig /usr/local/lib    (Note: libprotoc.so.8 should be in /usr/local/lib)
    • install: $ mvn install
    • edit hadoop-dist/target/etc/hadoop/hadoop-env.sh
      • set export JAVA_HOME=/usr/lib/jvm/java-7-oracle
    • standalone mode test passed
    • pseudo-distributed mode:
      $ bin/hdfs dfs -mkdir -p /user/<username>/input
      $ bin/hdfs dfs -put etc/hadoop/*.xml /user/<username>/input
      $ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep input output 'dfs[a-z.]+'
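
The pseudo-distributed commands above only work once HDFS is configured and running. A minimal sketch of that setup, following the SingleCluster tutorial: the two property values (fs.defaultFS = hdfs://localhost:9000, dfs.replication = 1) are the ones the tutorial gives; the helper name write_pseudo_dist_conf is hypothetical, and it overwrites any existing core-site.xml and hdfs-site.xml in the directory it is given.

```shell
#!/bin/sh
# Sketch only: writes the minimal pseudo-distributed config files.
# write_pseudo_dist_conf is a hypothetical helper name, not part of Hadoop.
# WARNING: overwrites core-site.xml and hdfs-site.xml in the target directory.

write_pseudo_dist_conf() {
    conf_dir="$1"
    if [ -z "$conf_dir" ]; then
        echo "usage: write_pseudo_dist_conf <conf-dir>" >&2
        return 1
    fi
    mkdir -p "$conf_dir" || return 1

    # NameNode address used by all HDFS clients on this machine.
    cat > "$conf_dir/core-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF

    # Single node, so keep one replica of each block.
    cat > "$conf_dir/hdfs-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF
}
```

Assumed usage, run from the unpacked distribution directory: call write_pseudo_dist_conf etc/hadoop, then $ bin/hdfs namenode -format and $ sbin/start-dfs.sh (start-dfs.sh needs passphraseless ssh to localhost) before the hdfs dfs commands.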

