Before we begin, a quick intro. We have been working on a lot of “Big Data” and machine learning projects for some future products and services we intend to offer, so we would like to share some of our initial findings and code from this realm. Haven’t heard of Hadoop yet? This tutorial is a good place to start.
As the title indicates, we will be installing Hadoop 1.2.1 on a CentOS 6.4 Linux server. This will be a single-node install; I will write a separate article on installing Hadoop on a cluster. If you are just getting into Hadoop, this tutorial will help get your first Hadoop installation out of the way.
First, let’s get started with some Java.
Installing OpenJDK 1.6
Luckily, Hadoop runs well (from what we have tested) on OpenJDK, so there is no need to download any JDK binaries from Oracle.
yum install java-1.6.0-openjdk.x86_64
Hadoop will run just fine with the vanilla OpenJDK runtime. However, for Maven to run properly we are also going to need the devel package installed.
yum install java-1.6.0-openjdk-devel.x86_64
Let’s make sure Java is registered on your system. First, check that the correct version is set up in your path.
java -version
java version "1.6.0_28"
OpenJDK Runtime Environment (IcedTea6 1.13.0pre) (rhel-1.66.1.13.0.el6-x86_64)
OpenJDK 64-Bit Server VM (build 23.25-b01, mixed mode)
javac -version
If you do not see this version of Java, you probably had a previous Java install on your system. Luckily we are using CentOS, so we can easily change this by running the alternatives config.
alternatives --config java

There is 1 program that provides 'java'.

  Selection    Command
-----------------------------------------------
*+ 1           /usr/lib/jvm/jre-1.6.0-openjdk.x86_64/bin/java
If you had any previous versions of java installed make sure the 1.6 JDK is selected. We will need this version for our later examples.
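If you would rather script the selection than answer the interactive prompt, alternatives also takes --set with the full path from the listing above. A sketch (adjust the path to the build actually registered on your box):

```shell
# Non-interactive selection (needs root); the path must already be
# registered as an alternative on the system.
alternatives --set java /usr/lib/jvm/jre-1.6.0-openjdk.x86_64/bin/java
java -version
```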
Now that we have all that java stuff out of the way let’s get down to installing and configuring hadoop.
Note: As with anything, there are many ways to do this; knowing the manual steps will save you a lot of time when we get into installing Hadoop on a cluster. Sometimes it’s best to do things the hard way first to familiarize yourself with something new.
Before we begin let’s create some credentials for Hadoop to use.
useradd hadoop
passwd hadoop
You do not have to create a specific user account for Hadoop to run properly; root works just fine too. You can even enable key-based SSH login for your cluster.
Note: For this demo we ran everything as the hadoop account with no key login.
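We ran without key login here, but Hadoop’s start scripts ssh to localhost even on a single node, so key-based login saves retyping the password at every start. A minimal sketch, run as the hadoop user:

```shell
# A sketch of key-based ssh login to localhost for the hadoop user.
mkdir -p "$HOME/.ssh"
# Generate a key only if one does not already exist.
if [ ! -f "$HOME/.ssh/id_rsa" ]; then
    ssh-keygen -t rsa -N "" -f "$HOME/.ssh/id_rsa"
fi
# Authorize the key for logins back into this same box.
cat "$HOME/.ssh/id_rsa.pub" >> "$HOME/.ssh/authorized_keys"
chmod 700 "$HOME/.ssh"
chmod 600 "$HOME/.ssh/authorized_keys"
```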
Let’s create a directory for all the Hadoop binaries (okay, mostly jars) to live in.
Create a Hadoop Directory
mkdir /opt/hadoop
cd /opt/hadoop
wget http://apache.cs.utah.edu/hadoop/common/hadoop-1.2.1/hadoop-1.2.1-bin.tar.gz
Note: Default minimum installs of CentOS don’t include wget.
yum install wget
tar -xzf hadoop-1.2.1-bin.tar.gz
mv hadoop-1.2.1 hadoop
chown -R hadoop:hadoop /opt/hadoop
Edit Hadoop Configs
Now that we have everything extracted and proper ownership applied there are a few hadoop configs we will need to change.
vi conf/core-site.xml
# Add the following inside the configuration tag
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000/</value>
</property>
<property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>
Edit hdfs-site.xml
vi conf/hdfs-site.xml
# Add the following inside the configuration tag
<property>
  <name>dfs.data.dir</name>
  <value>/opt/hadoop/hadoop/dfs/name/data</value>
  <final>true</final>
</property>
<property>
  <name>dfs.name.dir</name>
  <value>/opt/hadoop/hadoop/dfs/name</value>
  <final>true</final>
</property>
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>

Note: with only one node there is no second machine to replicate blocks to, so you may want to set dfs.replication to 1 to avoid under-replication warnings.
Edit mapred-site.xml
vi conf/mapred-site.xml
# Add the following inside the configuration tag
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:9001</value>
</property>
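A typo in any of these XML files makes the daemons fail in confusing ways, so a cheap sanity check helps before starting anything. This sketch just counts opening and closing property tags against a sample written to a temp file; run the same greps against your own conf/*.xml (xmllint, if installed, is a stricter check):

```shell
# Quick sanity check (a sketch): opening and closing <property> tags
# should balance in each edited config file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
EOF
opens=$(grep -c '<property>' "$sample")
closes=$(grep -c '</property>' "$sample")
if [ "$opens" -eq "$closes" ]; then
    echo "tags balanced: $opens property block(s)"
else
    echo "tag mismatch: $opens opening vs $closes closing"
fi
rm -f "$sample"
```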
Edit hadoop-env.sh
vi conf/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/jre-1.6.0-openjdk.x86_64/
Set JAVA_HOME path as per your system configuration for java.
Let’s format our first namenode!
su - hadoop
cd /opt/hadoop/hadoop
bin/hadoop namenode -format
Start Hadoop
bin/start-all.sh
Each Service Has Its Own Status Page
http://hnode1.vaurent.com:50030/ for the JobTracker
http://hnode1.vaurent.com:50070/ for the NameNode
http://hnode1.vaurent.com:50060/ for the TaskTracker
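You can check the same thing from the shell on the node itself. A sketch (the ports are the Hadoop 1.x defaults; hnode1.vaurent.com is just this demo box, so we probe localhost):

```shell
# Probe each daemon's web UI; prints up / no response per port.
for port in 50030 50070 50060; do
    if curl -s --max-time 2 -o /dev/null "http://localhost:$port/"; then
        echo "port $port: up"
    else
        echo "port $port: no response"
    fi
done
```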
To stop Hadoop
bin/stop-all.sh
That about sums it up. I will update this article with links on setting up Maven and running our first Hadoop test with Apache Pig.