Tuesday, May 20, 2014

Modifying a Manifest File

You use the m command-line option to add custom information to the manifest during creation of a JAR file. This section describes the m option.
The Jar tool automatically puts a default Manifest with the pathname META-INF/MANIFEST.MF into any JAR file you create. You can enable special JAR file functionality, such as package sealing, by modifying the default manifest. Typically, modifying the default manifest involves adding special-purpose headers to the manifest that allow the JAR file to perform a particular desired function.
To modify the manifest, you must first prepare a text file containing the information you wish to add to the manifest. You then use the Jar tool's m option to add the information in your file to the manifest.

Warning: The text file from which you are creating the manifest must end with a new line or carriage return. The last line will not be parsed properly if it does not end with a new line or carriage return.

The basic command has this format:
jar cfm jar-file manifest-addition input-file(s)
Let's look at the options and arguments used in this command:
  • The c option indicates that you want to create a JAR file.
  • The m option indicates that you want to merge information from an existing file into the manifest file of the JAR file you're creating.
  • The f option indicates that you want the output to go to a file (the JAR file you're creating) rather than to standard output.
  • manifest-addition is the name (or path and name) of the existing text file whose contents you want to add to the JAR file's manifest.
  • jar-file is the name that you want the resulting JAR file to have.
  • The input-file(s) argument is a space-separated list of one or more files that you want to be placed in your JAR file.
The m and f options must be in the same order as the corresponding arguments.

Note: The contents of the manifest must be encoded in UTF-8.
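Putting the pieces together, a minimal sketch looks like this (the file names and the Sealed header are illustrative; the build step is guarded so it is a no-op when no JDK is on the PATH):

```shell
# 1. Prepare the manifest addition; it must end with a newline.
printf 'Sealed: true\n' > manifest-addition.txt
# 2. Build the JAR, merging the addition into the default manifest.
touch MyClass.class
if command -v jar >/dev/null; then
    jar cfm MyJar.jar manifest-addition.txt MyClass.class
    # 3. Extract and display the merged manifest.
    jar xf MyJar.jar META-INF/MANIFEST.MF
    cat META-INF/MANIFEST.MF
fi
```

The extracted manifest should contain the standard Manifest-Version header plus the Sealed header merged from the addition file.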

The remaining sections of this lesson demonstrate specific modifications you may want to make to the manifest file.

Sunday, May 11, 2014

Apache Hadoop

What Is Apache Hadoop?

The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing.
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, thus delivering a highly available service on top of a cluster of computers, each of which may be prone to failures.
The project includes these modules:
  • Hadoop Common: The common utilities that support the other Hadoop modules.
  • Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to application data.
  • Hadoop YARN: A framework for job scheduling and cluster resource management.
  • Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.
Other Hadoop-related projects at Apache include:
  • Ambari™: A web-based tool for provisioning, managing, and monitoring Apache Hadoop clusters, with support for Hadoop HDFS, Hadoop MapReduce, Hive, HCatalog, HBase, ZooKeeper, Oozie, Pig, and Sqoop. Ambari also provides a dashboard for viewing cluster health (such as heatmaps) and the ability to view MapReduce, Pig, and Hive applications visually, along with features to diagnose their performance characteristics in a user-friendly manner.
  • Avro™: A data serialization system.
  • Cassandra™: A scalable multi-master database with no single points of failure.
  • Chukwa™: A data collection system for managing large distributed systems.
  • HBase™: A scalable, distributed database that supports structured data storage for large tables.
  • Hive™: A data warehouse infrastructure that provides data summarization and ad hoc querying.
  • Mahout™: A scalable machine learning and data mining library.
  • Pig™: A high-level data-flow language and execution framework for parallel computation.
  • Spark™: A fast and general compute engine for Hadoop data. Spark provides a simple and expressive programming model that supports a wide range of applications, including ETL, machine learning, stream processing, and graph computation.
  • Tez™: A generalized data-flow programming framework, built on Hadoop YARN, which provides a powerful and flexible engine to execute an arbitrary DAG of tasks to process data for both batch and interactive use-cases. Tez is being adopted by Hive™, Pig™ and other frameworks in the Hadoop ecosystem, and also by other commercial software (e.g. ETL tools), to replace Hadoop™ MapReduce as the underlying execution engine.
  • ZooKeeper™: A high-performance coordination service for distributed applications.

Getting Started

To get started, begin here:
  1. Learn about Hadoop by reading the documentation.
  2. Download Hadoop from the release page.
  3. Discuss Hadoop on the mailing list.

Setting up Hadoop 1.2.1 on CentOS 6.4 with OpenJDK 1.6

Before we begin, I would like to give a little intro.  We have been working on a lot of “Big Data” and machine learning projects for some future products and services we intend to offer, so we would like to share some of our initial findings and code in this big realm.  Haven’t heard of Hadoop yet?
As the title indicates, we will be installing Hadoop 1.2.1 on a CentOS 6.4 Linux server.  This will be a “single node” install; I will write a separate article on installing Hadoop on a cluster.  If you are just getting into Hadoop, this tutorial will help get your first Hadoop installation out of the way.


First let’s get started, with some java.

 Installing OpenJDK 1.6

Luckily, Hadoop works and runs well (from what we have tested) on OpenJDK.  No need to download any JDK binaries from Oracle.
yum install java-1.6.0-openjdk.x86_64
Hadoop will run just fine with the vanilla OpenJDK.  However, for Maven to run properly we are going to need the devel JDK installed as well.
yum install java-1.6.0-openjdk-devel.x86_64
Let’s make sure Java is registered on your system.
First, let’s see if the correct version is set up in our path.
java -version

java version "1.6.0_28"
OpenJDK Runtime Environment (IcedTea6 1.13.0pre) (rhel-1.66.1.13.0.el6-x86_64)
OpenJDK 64-Bit Server VM (build 23.25-b01, mixed mode)
javac -version
If you do not see this version of Java, you probably had a previous install of Java on your system.  Luckily, we are using CentOS, so we can easily change this by running the alternatives config.
alternatives --config java
There is 1 program that provides 'java'.
Selection Command
 -----------------------------------------------
 *+ 1 /usr/lib/jvm/jre-1.6.0-openjdk.x86_64/bin/java
If you had any previous versions of java installed make sure the 1.6 JDK is selected.  We will need this version for our later examples.

Now that we have all that java stuff out of the way let’s get down to installing and configuring hadoop.
Note: As with anything, there are many ways to do this; some of them will save you a lot of time when we get into installing Hadoop on a cluster.  However, sometimes it’s best to do it the hard way first to familiarize yourself with something new.
Before we begin let’s create some credentials for Hadoop to use.
useradd hadoop
passwd hadoop
You do not have to create a specific user account for Hadoop to run properly; of course, root works just fine.  You can even enable key-based login for your cluster.
Note: For this demo we ran everything as the hadoop account with no key login.
Let’s create a directory for all the Hadoop binaries (okay, mostly jars) to live in.

Create a Hadoop Directory

mkdir /opt/hadoop
cd /opt/hadoop
wget http://apache.cs.utah.edu/hadoop/common/hadoop-1.2.1/hadoop-1.2.1-bin.tar.gz
Note: Default minimum installs of CentOS don’t include wget.
yum install wget
tar -xzf hadoop-1.2.1-bin.tar.gz
mv hadoop-1.2.1 hadoop
chown -R hadoop /opt/hadoop

 Edit Hadoop Configs

Now that we have everything extracted and proper ownership applied there are a few hadoop configs we will need to change.
 vi conf/core-site.xml
# Add the following inside the configuration tag
<property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000/</value>
</property>
<property>
    <name>dfs.permissions</name>
    <value>false</value>
</property>
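To be explicit about where these lines go, the complete core-site.xml for this setup would look roughly like this (the skeleton tags come from the stock file):

```xml
<?xml version="1.0"?>
<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000/</value>
    </property>
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>
</configuration>
```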
Edit hdfs-site.xml
 vi conf/hdfs-site.xml
# Add the following inside the configuration tag
<property>
    <name>dfs.data.dir</name>
    <value>/opt/hadoop/hadoop/dfs/name/data</value>
    <final>true</final>
</property>
<property>
    <name>dfs.name.dir</name>
    <value>/opt/hadoop/hadoop/dfs/name</value>
    <final>true</final>
</property>
<property>
    <name>dfs.replication</name>
    <value>1</value> <!-- one replica; this is a single-node install -->
</property>
Edit mapred-site.xml
 vi conf/mapred-site.xml
# Add the following inside the configuration tag
<property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
</property>
Edit hadoop-env.sh
 vi conf/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/jre-1.6.0-openjdk.x86_64/
Set JAVA_HOME path as per your system configuration for java.
Let’s format our first namenode!
 su - hadoop
 cd /opt/hadoop/hadoop
 bin/hadoop namenode -format

 Start Hadoop

 bin/start-all.sh
Each Service Has Its Own Status Page
  http://hnode1.vaurent.com:50030/   for the Jobtracker
  http://hnode1.vaurent.com:50070/   for the Namenode
  http://hnode1.vaurent.com:50060/   for the Tasktracker
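Once the daemons are up, a quick HDFS smoke test looks like this (the directory name is illustrative; the block is guarded so it is a no-op on machines without this Hadoop layout):

```shell
if [ -x /opt/hadoop/hadoop/bin/hadoop ]; then
    cd /opt/hadoop/hadoop
    bin/hadoop fs -mkdir /smoke-test   # create a directory in HDFS
    bin/hadoop fs -ls /                # list the HDFS root
fi
```

If the listing shows your new directory, the NameNode and DataNode are talking to each other.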

To stop Hadoop

bin/stop-all.sh
That about sums it all up. I will update this article with links to setting up Maven and running our first Hadoop test with Apache Pig.

Enable SUDO for RHEL & CENTOS

Sudo is an arguably safer alternative to logging in to the root account (or using the su command). Sudo allows you to partition and delegate superuser commands (functions) without giving a user total "root" power on the system. Here are a few other advantages:
  • Privileged commands are logged. It is a simple way to audit who did what at what point in time.
  • It is more efficient, in terms of keystrokes, to use sudo than to use su or to log in as root.
  • You don't have to change the root password when an administrator has his root functions revoked, leaves the company, changes roles, etc. The change part is easy, but coordinating the new password with every other administrator can be a hassle.

# Is sudo installed?
Login with the root user.

Let's first determine if the sudo package is installed.
# rpm -q sudo

If the package is not installed, we can retrieve/install it with the following command:
# yum install sudo

# Create a normal user
Create the user and add to the wheel group. The wheel group is usually predefined as the container for administrator accounts.
# useradd -G wheel -c "Test User" testNew

Create a password for the user.
# passwd testNew
Changing password for user testNew.
New UNIX password: P@$$w0rd
Retype new UNIX password: P@$$w0rd
passwd: all authentication tokens updated successfully.

# Or modify an existing user
Add an existing user (the user testMod in my example) to the wheel group.
# usermod -aG wheel testMod

# Modify the sudoers file
Use the visudo command to safely modify the sudoers file.
# visudo

Search for the Allows people in group wheel to run all commands directive and uncomment the %wheel line below it to enable the wheel group to run all commands.
...
## Allows people in group wheel to run all commands
%wheel ALL=(ALL) ALL
...
Save the file.
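The %wheel rule above grants everything. Sudoers can also delegate individual commands; a hypothetical fragment restricting a group to package management only (the group name and command paths here are illustrative) would look like:

```
## Members of pkgadmins may run only the package manager as root
%pkgadmins ALL=(root) /usr/bin/yum, /bin/rpm
```

As with any sudoers change, edit it only through visudo so syntax errors cannot lock you out.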

# Test with a privileged command (logged in as a normal user)
We will first attempt to run the visudo command with our normal user account. As expected, the operation will fail.
$ /usr/sbin/visudo
visudo: /etc/sudoers: Permission denied

Now we will run the command within the context of sudo to temporarily elevate the privileges of our normal user.
$ sudo -i visudo
[sudo] password for test: P@$$w0rd

# Verify the command is logged
Check the secure log to verify the event is recorded.
$ sudo grep visudo /var/log/secure
...
Aug 21 20:01:20 centos sudo: test : TTY=pts/0 ; PWD=/home/test ; USER=root ; COMMAND=/bin/bash -c visudo
...

This is just a single use case of how to implement sudo. I encourage you to check out the man pages and other documentation to see how you can tailor it to your specific environment.

WildFly – A New Improved JBoss Application Server for Linux

As we all know, JBoss AS has been renamed to WildFly.

WildFly 8 is Red Hat‘s Java EE 7 compliant open source application server. The main features are as below:

Java EE 7 Compatibility: The biggest change is that WildFly 8 is now officially Java EE 7 certified.
High Performance Web Server: Undertow is a new high-performance web server written in Java, and it is now the web server in WildFly 8. It is designed for high throughput and scalability and can handle millions of connections. Undertow’s lifecycle is completely controlled by the embedding application. It is extremely lightweight, with a core jar of about 1MB and an embedded server using less than 4MB of heap space. This is really great.
Port Reduction: Since WildFly uses Undertow, which supports HTTP upgrade, multiple protocols can be multiplexed over a single HTTP port. WildFly 8 has moved nearly all of its protocols to be multiplexed over two HTTP ports: one management port and one application port. This is a big change and a benefit to cloud providers (such as OpenShift) who run hundreds to thousands of instances on a single server. In total, it has two default ports: 9990 (web administration console) and 8080 (applications).
Management Role Based Access Control & Auditing: This is a new and interesting feature in WildFly 8. Using it, we can create different users and assign them to different roles as per requirements. I’ll show you later with screen shots.
Logging: The management API now supports the ability to list and view the available log files on a server. There is now an attribute called “add-logging-api-dependencies”, available for any deployment in which we want to skip container logging; it disables the adding of the implicit server logging dependencies. Alternatively, we can use a jboss-deployment-structure.xml to exclude the logging subsystem, which stops the logging subsystem from processing the deployment.
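For the jboss-deployment-structure.xml route just mentioned, a minimal sketch that excludes the logging subsystem for a deployment looks like this (packaged in the deployment's META-INF or WEB-INF):

```xml
<jboss-deployment-structure>
    <deployment>
        <exclude-subsystems>
            <subsystem name="logging"/>
        </exclude-subsystems>
    </deployment>
</jboss-deployment-structure>
```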

We can also use another parameter, use-deployment-logging-config, to enable or disable processing of logging configuration files within a deployment.
Note: The system property that we were using for disabling per-deployment logging has been deprecated in this version.

Clustering: Another big change is clustering. All clustering support has been reworked in WildFly 8, including:
  • Distributed web sessions have been optimized for the new Java-based web server, Undertow.
  • mod_cluster support for Undertow.
  • Optimized distributed SSO (Single Sign-On) capabilities and support for Undertow.
  • New/optimized distributed @Stateful EJB caching implementation.
  • WildFly 8 adds some new public clustering APIs.
  • New public APIs for creating singleton services.
CLI Improvements: The CLI configuration has also been improved. You know, all admins love to work on the CLI ;). We can now create an alias for a particular server and then use that alias whenever we want to connect to that server using the connect command.
There are still lots of enhancements and updates in WildFly 8. You can check them all at: http://wildfly.org/news/2014/02/11/WildFly8-Final-Released/

Installation of WildFly 8 in Linux

Before moving ahead with installation, make sure that you have Java SE 7 installed on your system. WildFly 8 will not work with earlier Java revisions. Please follow the guide below to install Java 7 on Linux systems (Install JDK/JRE 7u25 in Linux).

Step 1: Downloading WildFly 8

[root@anuppc-02 jboss]# wget http://download.jboss.org/wildfly/8.1.0.CR1/wildfly-8.1.0.CR1.tar.gz
--2014-05-11 19:00:31--  http://download.jboss.org/wildfly/8.1.0.CR1/wildfly-8.1.0.CR1.tar.gz
Resolving download.jboss.org... 23.67.250.131, 23.67.250.122
Connecting to download.jboss.org|23.67.250.131|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 115039521 (110M) [application/x-gzip]
Saving to: `wildfly-8.1.0.CR1.tar.gz'

100%[======================================================================================================>] 115,039,521  975K/s   in 2m 10s

2014-05-11 19:02:41 (867 KB/s) - `wildfly-8.1.0.CR1.tar.gz' saved [115039521/115039521]

[root@anuppc-02 data3]#


Step 2: Extract it

# tar -xvf yourfile.tar [to extract to current directory]
# tar -C /myfolder -zxvf yourfile.tar.gz [ to extract to another directory]
$ tar -zxvf wildfly-8.1.0.CR1.tar.gz

Step 3: Setting Environment variable

Now set some environment variables. You can set these system-wide or within the configuration files. Here I am setting them within the configuration files standalone.sh and standalone.conf in the ‘bin‘ folder.
[root@anuppc-02 data]# cd wildfly-8.1.0.CR1
[root@anuppc-02 data]# cd bin/
 

# Add the following 2 lines to standalone.sh / standalone.conf
JBOSS_HOME="/var/cemp/data3/wildfly-8.1.0.CR1"
JAVA_HOME="/usr/java/jdk1.8.0_05"

Note: For whole system wide, you can set it under ‘/etc/profile‘ file.

standalone.xml — optionally remap the default ports by editing the socket bindings:
<socket-binding-group name="standard-sockets" default-interface="public" port-offset="${jboss.socket.binding.port-offset:0}">
    <socket-binding name="management-http" interface="management" port="${jboss.management.http.port:9990}"/>
    <socket-binding name="management-https" interface="management" port="${jboss.management.https.port:9993}"/>
    <socket-binding name="ajp" port="${jboss.ajp.port:8009}"/>
    <socket-binding name="http" port="${jboss.http.port:9080}"/> <!-- 8080 to 9080 -->
    <socket-binding name="https" port="${jboss.https.port:9443}"/> <!-- 8443 to 9443 -->
    <socket-binding name="txn-recovery-environment" port="4712"/>
    <socket-binding name="txn-status-manager" port="4713"/>
    <outbound-socket-binding name="mail-smtp">
        <remote-destination host="localhost" port="25"/>
    </outbound-socket-binding>
</socket-binding-group>

Step 4: Starting WildFly 8

Now start the server: for standalone mode use ‘standalone.sh‘ and for domain mode use ‘domain.sh‘.
[root@anuppc-02 bin]# ./standalone.sh
[root@anuppc-02 bin]# ./domain.sh

But here I am starting in standalone mode. By default it starts with the ‘standalone.xml‘ file, but you can also start with some other configuration using the ‘--server-config‘ option.
Below I am starting the server with ‘standalone-full-ha.xml‘; this file is present in “$JBOSS_HOME/standalone/configuration/”.
[root@anuppc-02 bin]# ./standalone.sh --server-config standalone-full-ha.xml

Step 5: Accessing WildFly 8

Now you can point your browser to ‘http://localhost:9080‘ (if using the remapped HTTP port shown above; the stock port is 8080), which brings you to the Welcome Screen.
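From the server itself, a quick reachability check might look like this (the URL assumes the port remap from this setup; curl prints 000 if nothing answered):

```shell
# Print just the HTTP status code for the WildFly welcome page.
# A running server returns 200.
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:9080/ || true
```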
From here, you can access WildFly community documentation guides and enhanced web-based Administration Console access.

Step 6: Managing WildFly 8

WildFly 8 provides two administrative consoles for managing a running instance:
  • web-based Administration Console
  • command-line interface
Before connecting to administration console or remotely using the command line, you will need to create a new user using the ‘add-user.sh‘ script in the bin folder.
Next, go to the ‘bin‘ directory, set ‘JBOSS_HOME‘ in add-user.sh (if the variable is not set system-wide), and create a user as below.
[root@anuppc-02 bin]# ./add-user.sh
Once the script starts, you will be guided through the process of adding a new user:

Sample Output
What type of user do you wish to add?
 a) Management User (mgmt-users.properties)
 b) Application User (application-users.properties)
(a):
Enter the details of the new user to add.
Using realm 'ManagementRealm' as discovered from the existing property files.
Username : admin
The username 'admin' is easy to guess
Are you sure you want to add user 'admin' yes/no? yes
Password recommendations are listed below. To modify these restrictions edit the add-user.properties configuration file.
 - The password should not be one of the following restricted values {root, admin, administrator}
 - The password should contain at least 8 characters, 1 alphanumeric character(s), 1 digit(s), 1 non-alphanumeric symbol(s)
 - The password should be different from the username
Password :
Re-enter Password :
What groups do you want this user to belong to? (Please enter a comma separated list, or leave blank for none)[  ]:
About to add user 'admin' for realm 'ManagementRealm'
Is this correct yes/no? yes
Added user 'admin' to file '/data/wildfly-8.0.0.Final/standalone/configuration/mgmt-users.properties'
Added user 'admin' to file /data/wildfly-8.0.0.Final/domain/configuration/mgmt-users.properties'
Added user 'admin' with groups  to file /data/wildfly-8.0.0.Final/standalone/configuration/mgmt-groups.properties'
Added user 'admin' with groups  to file /data/wildfly-8.0.0.Final/domain/configuration/mgmt-groups.properties'
Is this new user going to be used for one AS process to connect to another AS process?
e.g. for a slave host controller connecting to the master or for a Remoting connection for server to server EJB calls.
yes/no? yes
To represent the user add the following to the server-identities definition
Press any key to continue . . .

Now access the web-based Administration Console at ‘http://localhost:9990/console‘ and enter the newly created username and password to access the Management Console.

If you prefer to handle your server from the CLI, run the ‘jboss-cli.sh‘ script from the ‘bin‘ directory that offers the same capabilities available via the web-based UI.
[root@anuppc-02 bin]# cd bin
[root@anuppc-02 bin]# ./jboss-cli.sh --connect
Connected to standalone controller at localhost:9999

For more information, follow the official WildFly 8 documentation at https://docs.jboss.org/author/display/WFLY8/Documentation.