Sunday, May 11, 2014

Setting up Hadoop 1.2.1 on CentOS 6.4 with OpenJDK 1.6

Before we begin I would like to give a little intro. We have been working on a lot of “Big Data” and machine learning type projects for some future products and services we intend to offer, so we would like to share some of our initial findings and code from this realm. Haven’t heard of Hadoop yet? Read on.
As the title indicates, we will be installing Hadoop 1.2.1 on a CentOS 6.4 Linux server. This will be a “single node” install; I will write a separate article on installing Hadoop on a cluster. If you are just getting into Hadoop, this tutorial will help you get your first installation out of the way.


First, let’s get started with some Java.

 Installing OpenJDK 1.6

Luckily, Hadoop runs well on OpenJDK (from what we have tested), so there is no need to download any JDK binaries from Oracle.
yum install java-1.6.0-openjdk.x86_64
Hadoop will run just fine with the vanilla OpenJDK JRE. However, for Maven to run properly later on, we are going to need the devel JDK (which includes javac) installed as well.
yum install java-1.6.0-openjdk-devel.x86_64
Let’s make sure Java is registered on your system. First, let’s see if the correct version is set up in our path:
java -version

java version "1.6.0_28"
OpenJDK Runtime Environment (IcedTea6 1.13.0pre) (rhel-1.66.1.13.0.el6-x86_64)
OpenJDK 64-Bit Server VM (build 23.25-b01, mixed mode)
javac -version
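If the devel package installed correctly, this should report a matching compiler version, along the lines of:
javac 1.6.0_28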
If you do not see this version of Java, you probably had a previous install of Java on your system. Luckily, we are using CentOS, so we can easily change this by running the alternatives config.
alternatives --config java
There is 1 program that provides 'java'.
Selection Command
 -----------------------------------------------
 *+ 1 /usr/lib/jvm/jre-1.6.0-openjdk.x86_64/bin/java
If you had any previous versions of Java installed, make sure the 1.6 JDK is selected. We will need this version for our later examples.

Now that we have all that Java stuff out of the way, let’s get down to installing and configuring Hadoop.
Note: As with anything, there are many ways to do this; automating these steps will save you a lot of time when we get into installing Hadoop on a cluster. However, sometimes it’s best to do things the hard way first to familiarize yourself with something new.
Before we begin let’s create some credentials for Hadoop to use.
useradd hadoop
passwd hadoop
You do not have to create a specific user account for Hadoop to run properly; of course, root works just fine. You can even enable key-based login for your cluster.
Note: For this demo we ran everything as the hadoop account with no key login.
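If you do want key-based login (the start and stop scripts ssh into each node, even localhost), a minimal sketch of setting it up for the hadoop user looks like this; it assumes the default ~/.ssh paths:
su - hadoop
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
ssh localhost
If the last command logs you in without a password prompt, you are set.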
Let’s create a directory for all the Hadoop binaries (okay, mostly jars) to live in.

Create a Hadoop Directory

mkdir /opt/hadoop
cd /opt/hadoop
wget http://apache.cs.utah.edu/hadoop/common/hadoop-1.2.1/hadoop-1.2.1-bin.tar.gz
Note: Default minimal installs of CentOS don’t include wget, so install it first if needed.
yum install wget
tar -xzf hadoop-1.2.1-bin.tar.gz
mv hadoop-1.2.1 hadoop
chown -R hadoop:hadoop /opt/hadoop
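A quick sanity check that the ownership took; both the owner and group columns should read hadoop:
ls -ld /opt/hadoop/hadoop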

 Edit Hadoop Configs

Now that we have everything extracted and proper ownership applied, there are a few Hadoop configs we will need to change. They all live in the conf directory of the extracted tree, so change into it first.
 cd /opt/hadoop/hadoop
 vi conf/core-site.xml
# Add the following inside the configuration tag
<property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000/</value>
</property>
<property>
    <name>dfs.permissions</name>
    <value>false</value>
</property>
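These property blocks go inside the existing configuration element; don’t add a second one. For reference, the finished core-site.xml should look roughly like this (the two header lines are already in the file that ships with Hadoop):
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000/</value>
    </property>
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>
</configuration>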
Edit hdfs-site.xml
 vi conf/hdfs-site.xml
# Add the following inside the configuration tag
<property>
 <name>dfs.data.dir</name>
 <value>/opt/hadoop/hadoop/dfs/name/data</value>
 <final>true</final>
</property>
<property>
 <name>dfs.name.dir</name>
 <value>/opt/hadoop/hadoop/dfs/name</value>
 <final>true</final>
</property>
<property>
 <name>dfs.replication</name>
 <value>1</value>
</property>
Note: Since this is a single-node install, set dfs.replication to 1; with only one DataNode, any higher value would leave every block permanently under-replicated.
Edit mapred-site.xml
 vi conf/mapred-site.xml
# Add the following inside the configuration tag
<property>
 <name>mapred.job.tracker</name>
 <value>localhost:9001</value>
</property>
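Malformed XML in any of these files is a common reason the daemons refuse to start. If libxml2 is installed (it usually is on CentOS), you can sanity-check all three in one shot; xmllint prints nothing when the files are well-formed:
xmllint --noout conf/core-site.xml conf/hdfs-site.xml conf/mapred-site.xml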
Edit hadoop-env.sh
 vi conf/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/jre-1.6.0-openjdk.x86_64/
Set the JAVA_HOME path as per your system’s Java configuration.
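If you are not sure where your JDK lives, resolving the java symlink will show you; JAVA_HOME is everything above the bin directory (the path below is just an example, yours may differ):
readlink -f $(which java)
/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/bin/java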
Let’s format our first namenode!
 su - hadoop
 cd /opt/hadoop/hadoop
 bin/hadoop namenode -format

 Start Hadoop

 bin/start-all.sh
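Once the scripts finish, run jps (it ships with the devel JDK we installed earlier) to confirm everything came up. On a healthy single node you should see NameNode, SecondaryNameNode, DataNode, JobTracker, and TaskTracker listed, plus Jps itself.
 jps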
Each Service Has Its Own Status Page
  http://hnode1.vaurent.com:50030/   for the Jobtracker
  http://hnode1.vaurent.com:50070/   for the Namenode
  http://hnode1.vaurent.com:50060/   for the Tasktracker
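Before stopping anything, a quick smoke test proves the whole stack works end to end. This runs the wordcount example that ships in the Hadoop 1.2.1 tarball against our own config files; the /tmp HDFS paths are just illustrative:
 bin/hadoop fs -mkdir /tmp/input
 bin/hadoop fs -put conf/*.xml /tmp/input
 bin/hadoop jar hadoop-examples-1.2.1.jar wordcount /tmp/input /tmp/output
 bin/hadoop fs -cat /tmp/output/part-r-00000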

To stop Hadoop

bin/stop-all.sh
That about sums it all up. I will update this article with links to setting up Maven and running our first Hadoop test with Apache Pig.
