Notes about the HDFS Daemons
$ bin/hdfs namenode -format
HDFS isn’t a real filesystem that runs directly on a disk the way ext3 does. It stores its data in ordinary files on a regular local file system (such as ext3) and provides an API to access that data, so in this respect it behaves more like a database than a disk filesystem. The command above initializes that on-disk storage.
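To make this concrete, here is a toy sketch in plain Python (not real HDFS code, and the names and toy block size are invented for illustration): the "filesystem" is just numbered block files plus a small in-memory namespace, all kept on the ordinary local filesystem.

```python
import os
import tempfile

BLOCK_SIZE = 16  # toy block size in bytes; real HDFS defaults to 128 MB

def toy_format(storage_dir):
    """Like 'hdfs namenode -format': create the on-disk layout."""
    os.makedirs(os.path.join(storage_dir, "blocks"), exist_ok=True)
    return {}  # empty namespace: filename -> list of block file names

def toy_put(namespace, storage_dir, name, data):
    """Store 'data' as numbered block files on the local filesystem."""
    blocks = []
    for i in range(0, len(data), BLOCK_SIZE):
        blk = f"blk_{name}_{i // BLOCK_SIZE}"
        with open(os.path.join(storage_dir, "blocks", blk), "wb") as f:
            f.write(data[i:i + BLOCK_SIZE])
        blocks.append(blk)
    namespace[name] = blocks

with tempfile.TemporaryDirectory() as d:
    ns = toy_format(d)
    toy_put(ns, d, "hello.txt", b"0123456789" * 4)  # 40 bytes -> 3 blocks
    print(len(ns["hello.txt"]))  # 3
```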
By default the NameNode stores its data under /tmp/hadoop-${user.name}/dfs/name
To change the NameNode and DataNode storage locations, add the following properties to hdfs-site.xml:
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/dfs/namenode</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:/dfs/datanode</value>
</property>
Make sure you have the right permissions to access the paths specified.
Double-check that dfs.namenode.name.dir and dfs.datanode.data.dir are both correct: when copy-pasting one property to create the other, it is easy to leave a stray "name" where "data" should be. 😦
HDFS has a master/slave architecture. An HDFS cluster consists of a single NameNode, a master server that manages the file system namespace and regulates access to files by clients. There are a number of DataNodes, usually one per node in the cluster, which manage storage attached to the nodes that they run on. HDFS exposes a file system namespace and allows user data to be stored in files.
The NameNode executes file system namespace operations like opening, closing, and renaming files and directories. It also determines the mapping of blocks to DataNodes.
A file is split into one or more blocks and these blocks are stored in a set of DataNodes.
The DataNodes are responsible for serving read and write requests from the file system’s clients. The DataNodes also perform block creation, deletion, and replication upon instruction from the NameNode.
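The division of labor described above can be sketched in a few lines of Python. This is an illustrative model, not Hadoop's API: class and node names are made up, and replica placement is simplified to round-robin (real HDFS uses rack-aware placement). The point is that the NameNode holds only metadata, the mapping from files to blocks and from blocks to DataNodes.

```python
import math
from itertools import cycle

BLOCK_SIZE = 128 * 1024 * 1024  # HDFS default block size: 128 MB
REPLICATION = 3                 # HDFS default replication factor

class ToyNameNode:
    """Holds only metadata: which blocks make up a file, and where each replica lives."""
    def __init__(self, datanodes):
        self.datanodes = datanodes
        self.block_map = {}  # filename -> [(block_id, [datanode, ...]), ...]

    def create(self, filename, size):
        # A file is split into one or more blocks of at most BLOCK_SIZE bytes.
        n_blocks = max(1, math.ceil(size / BLOCK_SIZE))
        picker = cycle(self.datanodes)  # simplified round-robin placement
        self.block_map[filename] = [
            (f"{filename}#blk{i}", [next(picker) for _ in range(REPLICATION)])
            for i in range(n_blocks)
        ]
        return self.block_map[filename]

nn = ToyNameNode(["dn1", "dn2", "dn3", "dn4"])
layout = nn.create("big.log", 300 * 1024 * 1024)  # 300 MB file
print(len(layout))        # 3 blocks (300 MB / 128 MB, rounded up)
print(len(layout[0][1]))  # each block has 3 replicas on different datanodes
```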
; Start NameNode daemon and DataNode daemon
$ sbin/start-dfs.sh
; create folders /usr and /usr/abc in HDFS
$ bin/hdfs dfs -mkdir /usr
$ bin/hdfs dfs -mkdir /usr/abc
; put a local file into an HDFS directory
$ bin/hdfs dfs -put <localsrc> /usr/abc
HDFS daemons are NameNode, SecondaryNameNode, and DataNode.
Key configuration properties:
fs.defaultFS (core-site.xml): the NameNode URI, hdfs://host:port/
dfs.namenode.name.dir (hdfs-site.xml): path on the local filesystem where the NameNode stores the namespace and transaction logs persistently. If this is a comma-delimited list of directories then the name table is replicated in all of the directories, for redundancy.
dfs.datanode.data.dir (hdfs-site.xml): comma-separated list of paths on the local filesystem of a DataNode where it should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices.
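For example, fs.defaultFS belongs in core-site.xml; the host and port here are placeholders to replace with your own NameNode address:

```xml
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://host:9000/</value>
</property>
```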
---- TROUBLESHOOTING
– jps # check whether the Hadoop daemons are running: NameNode, DataNode (and SecondaryNameNode) should appear
– /etc/hosts -> remove the "127.0.1.1 ubuntu" line (this loopback alias can make the daemons bind to the wrong address)