HDFS

Notes about HDFS Daemons

$ bin/hdfs namenode -format

HDFS is not a native on-disk filesystem like ext3. It stores its data on top of a regular local filesystem such as ext3 and provides an API to access that data, so it behaves more like a database. The format command above initializes that database by creating fresh, empty NameNode metadata.

By default, the NameNode directory is: /tmp/hadoop-/dfs/name

To change the NameNode and DataNode storage locations, add the following properties to hdfs-site.xml:

<property>
   <name>dfs.namenode.name.dir</name>
   <value>file:/dfs/namenode</value>
</property>

<property>
   <name>dfs.datanode.data.dir</name>
   <value>file:/dfs/datanode</value>
</property>

Make sure you have the right permissions to access the paths specified.
Make sure both dfs.namenode.name.dir and dfs.datanode.data.dir are spelled correctly. When you copy one property block to create the other, it is easy to leave a "name" that should have been replaced by "data". 😦
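
The copy-and-paste mistake described above can be caught with a small check. This is only a sketch: it assumes the two properties sit inside the usual <configuration> root element, and the HDFS_SITE string and the check_storage_dirs helper are made up for illustration. The "same directory" warning is an extra sanity check of my own, not something Hadoop itself reports in this form.

```python
import xml.etree.ElementTree as ET

# Hypothetical hdfs-site.xml content; the paths mirror the example above.
HDFS_SITE = """
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/dfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/dfs/datanode</value>
  </property>
</configuration>
"""

REQUIRED = ("dfs.namenode.name.dir", "dfs.datanode.data.dir")


def read_properties(xml_text):
    """Return a {name: value} dict for every <property> in the config."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name", default="").strip()
        value = prop.findtext("value", default="").strip()
        props[name] = value
    return props


def check_storage_dirs(xml_text):
    """Return a list of problems found with the two storage-dir properties."""
    props = read_properties(xml_text)
    problems = [f"missing property: {key}" for key in REQUIRED if key not in props]
    if not problems and props[REQUIRED[0]] == props[REQUIRED[1]]:
        problems.append("NameNode and DataNode point to the same directory")
    return problems


# An empty list means both properties are present and point to different paths.
print(check_storage_dirs(HDFS_SITE))
```

To check a real file, read its text and pass it to check_storage_dirs instead of the sample string.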

HDFS has a master/slave architecture. An HDFS cluster consists of a single NameNode, a master server that manages the file system namespace and regulates access to files by clients. There are a number of DataNodes, usually one per node in the cluster, which manage storage attached to the nodes that they run on. HDFS exposes a file system namespace and allows user data to be stored in files.

The NameNode executes file system namespace operations like opening, closing, and renaming files and directories. It also determines the mapping of blocks to DataNodes.

A file is split into one or more blocks and these blocks are stored in a set of DataNodes.

The DataNodes are responsible for serving read and write requests from the file system’s clients. The DataNodes also perform block creation, deletion, and replication upon instruction from the NameNode.
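
To make the split between NameNode and DataNodes more concrete, here is a toy model of it. It is a sketch under stated assumptions, not how HDFS is implemented: the 128 MB block size and the replication factor of 3 are assumed values that do not come from these notes, the DataNode names are invented, and the round-robin placement is a deliberate simplification of the real placement policy.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size in bytes (128 MB)
REPLICATION = 3                 # assumed replication factor


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the size in bytes of each block for a file of file_size bytes."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_blocks(num_blocks, datanodes, replication=REPLICATION):
    """Toy NameNode: map each block index to a list of distinct DataNodes.

    Placement is plain round-robin, which is NOT the real HDFS policy.
    """
    replicas = min(replication, len(datanodes))
    mapping = {}
    for block_id in range(num_blocks):
        mapping[block_id] = [
            datanodes[(block_id + i) % len(datanodes)] for i in range(replicas)
        ]
    return mapping


# A hypothetical 300 MB file on a hypothetical four-node cluster.
blocks = split_into_blocks(300 * 1024 * 1024)
print(len(blocks), "blocks; last block is", blocks[-1] // (1024 * 1024), "MB")
print(place_blocks(len(blocks), ["dn1", "dn2", "dn3", "dn4"]))
```

Only the last block of a file can be smaller than the block size. The mapping returned by place_blocks plays the role of the block-to-DataNode mapping that the NameNode keeps.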

----

; Start NameNode daemon and DataNode daemon
$ sbin/start-dfs.sh

; create the folders /usr and /usr/abc in HDFS
$ bin/hdfs dfs -mkdir /usr
$ bin/hdfs dfs -mkdir /usr/abc

; copy a local file to an HDFS location
$ bin/hdfs dfs -put <localsrc> <dst>
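
If you want to script the steps above, something like the following could work. It is a rough sketch: the helper names and the local file notes.txt are hypothetical, and it assumes the script is run from the Hadoop install directory so that bin/hdfs resolves. It uses -mkdir -p so that /usr and /usr/abc are created in one call.

```python
import shutil
import subprocess


def mkdir_cmd(hdfs_path, hdfs_bin="bin/hdfs"):
    """Build the command that creates hdfs_path, including missing parents."""
    return [hdfs_bin, "dfs", "-mkdir", "-p", hdfs_path]


def put_cmd(local_path, hdfs_path, hdfs_bin="bin/hdfs"):
    """Build the command that copies a local file to an HDFS location."""
    return [hdfs_bin, "dfs", "-put", local_path, hdfs_path]


def run(cmd):
    """Run cmd if its executable exists; otherwise only print it."""
    if shutil.which(cmd[0]) is None:
        print("executable not found, would run:", " ".join(cmd))
        return None
    return subprocess.run(cmd, check=True)


# notes.txt is a hypothetical local file used only for illustration.
for cmd in (mkdir_cmd("/usr/abc"), put_cmd("notes.txt", "/usr/abc")):
    run(cmd)
```

Building the command as a list avoids shell quoting problems with paths that contain spaces.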

----

HDFS daemons are NameNode, SecondaryNameNode, and DataNode.

- etc/hadoop/core-site.xml
fs.defaultFS
The NameNode URI, in the form hdfs://host:port/

- etc/hadoop/hdfs-site.xml
dfs.namenode.name.dir
Path on the local filesystem where the NameNode persistently stores the namespace and transaction logs. If this is a comma-delimited list of directories, the name table is replicated in all of them for redundancy.

dfs.datanode.data.dir
Comma-separated list of paths on the local filesystem of a DataNode where it should store its blocks. If this is a comma-delimited list of directories, data will be stored in all named directories, typically on different devices.
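
The shape of these values is easy to see by pulling them apart. The snippet below is only an illustration: localhost, port 9000 and the /disk1 and /disk2 paths are made-up example values, and both helper functions are hypothetical. Stripping whitespace around the commas is a convenience of this sketch.

```python
from urllib.parse import urlparse


def parse_default_fs(uri):
    """Split an fs.defaultFS value such as hdfs://host:port/ into (host, port)."""
    parsed = urlparse(uri)
    if parsed.scheme != "hdfs":
        raise ValueError(f"expected an hdfs:// URI, got {uri!r}")
    return parsed.hostname, parsed.port


def parse_dir_list(value):
    """Split a comma-delimited directory list, ignoring blank entries."""
    return [item.strip() for item in value.split(",") if item.strip()]


# Example values only; use whatever your own configuration files contain.
print(parse_default_fs("hdfs://localhost:9000/"))
print(parse_dir_list("file:/disk1/dfs/data, file:/disk2/dfs/data"))
```

The host and port returned by parse_default_fs are the ones clients connect to when they talk to the NameNode.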

---- TROUBLESHOOTING
Connection Refused
- Run jps to check whether the Hadoop daemons are running: NameNode, DataNode
- In /etc/hosts, remove the line "127.0.1.1 ubuntu"
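
Both checks can be scripted in a rough way. This is a sketch rather than an official diagnostic: the function names are hypothetical, the sample hosts content is invented, and the host and port to probe are whatever your fs.defaultFS points at.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def loopback_alias_lines(hosts_text, address="127.0.1.1"):
    """Return the non-comment lines of a hosts file that start with address."""
    hits = []
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()
        if fields and fields[0] == address:
            hits.append(line.strip())
    return hits


# Invented sample; on a real machine read the text of /etc/hosts instead.
SAMPLE_HOSTS = """127.0.0.1 localhost
127.0.1.1 ubuntu
# 127.0.1.1 commented-out
"""
print(loopback_alias_lines(SAMPLE_HOSTS))
```

If port_is_open returns False for the NameNode host and port while jps lists a NameNode, the /etc/hosts entry is the next thing to look at.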

http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
http://stackoverflow.com/questions/27143409/what-the-command-hadoop-namenode-format-will-do
https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html
https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/ClusterSetup.html
