Thursday, February 23, 2012

Maven log4j-1.2.15 dependency problem

If you’re using Maven to manage your project’s build and dependencies, you may have encountered some problems when trying to include the latest version of log4j as a dependency. Specifically, log4j 1.2.15 depends on some artifacts that are not available in the central Maven repository due to licensing issues, and thus when you try to build a project that depends on this version of log4j, you may not be able to download the artifacts and your build will fail.
We could download and install these artifacts to the local repository, if we really needed them. But in most cases, they’re not needed and thus you won’t want your project relying on these artifacts just because some parts of log4j do. Thus, we need to exclude them.

The problem: Not really needed

In going from log4j 1.2.14 to 1.2.15, the developers added some features that depend on various com.sun and javax packages. In most cases you won’t be using this extra functionality, but if you simply include log4j 1.2.15, Maven’s transitive dependency rule makes your project require those extra artifacts too.
Because some of these artifacts are not available from the central Maven repository, due to licensing issues, they will not be automatically installed to your local repository. So, if you attempt to run mvn install, you’re likely to encounter this sort of error:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[INFO] Unable to find resource 'com.sun.jdmk:jmxtools:jar:1.2.1' in repository central (http://repo1.maven.org/maven2)
[INFO] Unable to find resource 'javax.jms:jms:jar:1.1' in repository central (http://repo1.maven.org/maven2)
[INFO] Unable to find resource 'com.sun.jmx:jmxri:jar:1.2.1' in repository central (http://repo1.maven.org/maven2)
[INFO] ------------------------------------------------------------------------
[ERROR] BUILD ERROR
[INFO] ------------------------------------------------------------------------
[INFO] Failed to resolve artifact.
Missing:
----------
1) com.sun.jdmk:jmxtools:jar:1.2.1
2) javax.jms:jms:jar:1.1
3) com.sun.jmx:jmxri:jar:1.2.1
----------
3 required artifacts are missing.
 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

And if you’re using Eclipse and have used the Maven Eclipse plugin command (mvn eclipse:eclipse) to generate the project settings, Eclipse will not be able to find the artifacts referenced on the build path, resulting in an error like so:

DIAGRAM 1:

This causes a big problem as it essentially prevents you from building your project. You could download and install these artifacts to your local repository, but since they’re not really needed, we should exclude them from the dependency list for log4j.

Excluding dependencies

Thankfully, Maven makes it easy to exclude transitive dependencies from a particular dependency. Looking at the log4j 1.2.15 POM file (you may have to select “View Source”), we can see several dependencies that weren’t there in the previous release. These are likely there to support new features, and aren’t needed for the most common uses of log4j. Here are the actual dependencies for log4j 1.2.15:

Dependencies

----------------------------------------------------------
GroupId          |    ArtifactId    |    Version
----------------------------------------------------------
javax.mail       |    mail          |    1.4
javax.jms        |    jms           |    1.1
com.sun.jdmk     |    jmxtools      |    1.2.1
com.sun.jmx      |    jmxri         |    1.2.1
oro              |    oro           |    2.0.8
junit            |    junit         |    3.8.1
----------------------------------------------------------
We only need to exclude the first four and not the last two, since those have a scope of test and won’t be included anyway. To exclude these dependencies, add the log4j 1.2.15 dependency as shown below.
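
A dependency declaration along these lines should do it. This is a sketch built from the group and artifact IDs in the table above, using Maven's standard exclusions element; adapt it to your own POM:

```xml
<dependency>
  <groupId>log4j</groupId>
  <artifactId>log4j</artifactId>
  <version>1.2.15</version>
  <exclusions>
    <!-- Each unwanted transitive dependency is listed by groupId and artifactId -->
    <exclusion>
      <groupId>javax.mail</groupId>
      <artifactId>mail</artifactId>
    </exclusion>
    <exclusion>
      <groupId>javax.jms</groupId>
      <artifactId>jms</artifactId>
    </exclusion>
    <exclusion>
      <groupId>com.sun.jdmk</groupId>
      <artifactId>jmxtools</artifactId>
    </exclusion>
    <exclusion>
      <groupId>com.sun.jmx</groupId>
      <artifactId>jmxri</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

An exclusion names no version, because it applies to whatever version the excluded artifact would have been pulled in with.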


This tells Maven not to add those artifacts to the classpath, so they are no longer needed to build your project. Note that you have to exclude each one explicitly; Maven versions before 3.2.1 offer no way to exclude all of a dependency’s transitive dependencies at once.

If you’re using Eclipse, after running mvn eclipse:clean and then mvn eclipse:eclipse, you should have the build path properly set up without any missing artifacts:

Everything should now work!

Transitive Dependencies and Exclusions

The issue here is that the log4j 1.2.15 POM file probably should have marked these dependencies as optional, which would have had the same effect as excluding them in every project that references that version of log4j. What does an optional dependency mean? The Maven website has a pretty good explanation.

Basically, if you have a large project that requires a lot of dependencies, but the “core” features only require a subset of those dependencies, you may want to mark the others as “optional” so as not to burden any projects that reference yours. Your project will still need all of the dependencies to build, but other projects that reference yours will only need the optional dependencies if they are using the additional features. In this case, they’ll have to explicitly add those dependencies, as the transitive dependency rule won’t kick in for “optional” ones.

Also worth noting: exclusions are done on a per-dependency basis. The dependencies that we excluded from log4j are only excluded where they would come in through log4j; they are not excluded globally. So, for example, if we added another dependency that really did require the javax.jms/jms artifact, it would not be prevented from being added. Furthermore, if we wanted, we could manually add a dependency on that JMS artifact to our own list, and it would show up as normal.

Friday, February 3, 2012

Java and Regular Expressions

Overview:
A regular expression defines a search pattern for strings. This pattern may match one or several times or not at all for a given string. The abbreviation for regular expression is regex.
A simple example of a regular expression is a (literal) string. For example, the regex Hello World will match the string "Hello World".
"." (dot) is another example of a regular expression. "." matches any single character; it would match, for example, "a" or "z" or "1".

Usage:
Regular expressions can be used to search, edit and manipulate text.
Regular expressions are supported by most programming languages, e.g. Java, Perl, Groovy, etc.
Unfortunately, each language supports regular expressions slightly differently.
When a regular expression is used to analyse or modify a text, we say that the regular expression is applied to the text.
The pattern defined by the regular expression is applied on the string from left to right. Once a source character has been used in a match, it cannot be reused. For example the regex "aba" will match "ababababa" only two times (aba_aba__).
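
The left-to-right, no-reuse behaviour can be checked with a small counter. This is a minimal sketch; the class and method names are made up for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NonOverlappingCount {
    // Counts how often the regex matches while scanning the input left to right.
    // Each call to find() resumes after the end of the previous match,
    // so characters consumed by one match are not reused by the next.
    static int countMatches(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countMatches("aba", "ababababa")); // prints 2
    }
}
```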

Common Matching Symbols:

Regular Expression |    Description
----------------------------------------------------------
.                  |    Matches any single character
^regex             |    regex must match at the beginning of the line
regex$             |    regex must match at the end of the line
[abc]              |    Set definition; matches the letter a, b or c
[abc][vz]          |    Set definition; matches a, b or c followed by either v or z
[^abc]             |    A "^" as the first character inside [] negates the set; matches any character except a, b or c
[a-d1-7]           |    Ranges; matches one letter between a and d or one digit from 1 to 7, but not the two-character string d1
X|Z                |    Finds X or Z
XZ                 |    Finds X directly followed by Z
$                  |    Checks whether a line end follows
----------------------------------------------------------
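
A few of the rows above, tried out with String.matches(). This is a minimal sketch; the class and helper names are hypothetical:

```java
class CharacterClassDemo {
    // String.matches() only returns true if the whole input matches the regex.
    static boolean matchesWhole(String regex, String input) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        // a, b or c followed by v or z
        System.out.println(matchesWhole("[abc][vz]", "av"));
        // any single character except a, b or c
        System.out.println(matchesWhole("[^abc]", "d"));
        // a set matches exactly one character, so the two characters "d1" do not fit
        System.out.println(matchesWhole("[a-d1-7]", "d1"));
        // alternation: X or Z
        System.out.println(matchesWhole("X|Z", "Z"));
    }
}
```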

Meta Characters:
The following metacharacters have a predefined meaning and make certain common patterns easier to use, e.g. \d instead of [0-9].

Regular Expression |    Description
----------------------------------------------------------
\d                 |    Any digit, short for [0-9]
\D                 |    A non-digit, short for [^0-9]
\s                 |    A whitespace character, short for [ \t\n\x0b\r\f]
\S                 |    A non-whitespace character, short for [^\s]
\w                 |    A word character, short for [a-zA-Z_0-9]
\W                 |    A non-word character, short for [^\w]
\S+                |    One or more non-whitespace characters
----------------------------------------------------------
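
As an illustration of these shorthands, here is a small sketch with three hypothetical helper methods. Note the doubled backslashes, which Java string literals require:

```java
class MetaCharacterDemo {
    // \D matches every non-digit, so removing those leaves only the digits.
    static String digitsOnly(String s) {
        return s.replaceAll("\\D", "");
    }

    // \s+ matches a run of whitespace (spaces, tabs, line breaks).
    static String collapseWhitespace(String s) {
        return s.trim().replaceAll("\\s+", " ");
    }

    // \w+ matches one or more letters, digits or underscores.
    static boolean isWordCharactersOnly(String s) {
        return s.matches("\\w+");
    }

    public static void main(String[] args) {
        System.out.println(digitsOnly("Tel: 555-1234"));
        System.out.println(collapseWhitespace("  a \t b\n c "));
        System.out.println(isWordCharactersOnly("foo_1"));
    }
}
```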

Quantifier:
A quantifier defines how often an element can occur. The symbols ?, *, + and {} define the quantity of the regular expressions.

Regular Expression |    Description                                                                              |    Examples
----------------------------------------------------------
*                  |    Occurs zero or more times, short for {0,}                                                |    X* - finds no or several letters X, .* - any character sequence
+                  |    Occurs one or more times, short for {1,}                                                 |    X+ - finds one or several letters X
?                  |    Occurs zero or one time, short for {0,1}                                                 |    X? - finds no or exactly one letter X
{X}                |    Occurs X number of times; {} applies to the preceding element                            |    \d{3} - three digits, .{10} - any character sequence of length 10
{X,Y}              |    Occurs between X and Y times                                                             |    \d{1,4} - \d must occur at least once and at most four times
*?                 |    ? after a quantifier makes it a "reluctant quantifier"; it tries to find the smallest match |
----------------------------------------------------------
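
The difference between a greedy and a reluctant quantifier is easiest to see side by side. This is a minimal sketch; the class name, helper and sample input are made up for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // Returns the first substring matched by the regex, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        String html = "<b>one</b><b>two</b>";
        // Greedy: .* grabs as much as possible, so the match runs to the last </b>
        System.out.println(firstMatch("<b>.*</b>", html));
        // Reluctant: .*? grabs as little as possible, so the match stops at the first </b>
        System.out.println(firstMatch("<b>.*?</b>", html));
        // {1,4} takes at most four digits
        System.out.println(firstMatch("\\d{1,4}", "123456"));
    }
}
```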

Grouping and Back reference:
You can group parts of your regular expression. In your pattern you group elements with round brackets, e.g. "()". This allows you to apply a repetition operator to a complete group.
In addition, these groups capture the text they match, which creates a backreference. A backreference stores the part of the String which matched the group. This allows you to use this part in the replacement.
Via the $ you can refer to a group. $1 is the first group, $2 the second, etc.
For example, assume you want to remove all whitespace between a word character and a following point or comma. The word character and the point or comma have to be part of the pattern, yet they should still appear in the result.

// Removes whitespace between a word character and . or ,
String pattern = "(\\w)(\\s+)([\\.,])";
// $1 keeps the word character and $3 keeps the punctuation; group 2 (the whitespace) is dropped
System.out.println(EXAMPLE_TEST.replaceAll(pattern, "$1$3"));

This example extracts the text between the opening and closing title tags.
// Extract the text between the two title elements
pattern = "(?i)(<title.*?>)(.+?)(</title>)";
String updated = EXAMPLE_TEST.replaceAll(pattern, "$2");
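
Group references in the replacement can also reorder text. The following sketch uses hypothetical method names and sample inputs; it swaps two captured words and tidies punctuation:

```java
class BackreferenceDemo {
    // Group 1 is the word before the comma, group 2 the word after it.
    // The replacement emits them in reverse order.
    static String swapNames(String s) {
        return s.replaceAll("(\\w+),\\s*(\\w+)", "$2 $1");
    }

    // Keeps the word character ($1) and the punctuation ($3),
    // dropping the whitespace captured by group 2.
    static String tidyPunctuation(String s) {
        return s.replaceAll("(\\w)(\\s+)([\\.,])", "$1$3");
    }

    public static void main(String[] args) {
        System.out.println(swapNames("Doe, John"));
        System.out.println(tidyPunctuation("Hello , world ."));
    }
}
```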


Backslashes in Java
The backslash is an escape character in Java Strings, i.e. it has a predefined meaning in Java. You have to write "\\" to define a single backslash. If you want to define "\w", you must write "\\w" in your regex. If you want to match a backslash as a literal, you have to type "\\\\", because \ is also an escape character in regular expressions.
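
The two layers of escaping can be checked directly. This is a small sketch with made-up helper names and sample strings:

```java
class BackslashDemo {
    // The Java literal "a\\.b" is the regex a\.b, which matches a literal dot.
    static boolean isAThenDotThenB(String s) {
        return s.matches("a\\.b");
    }

    // The Java literal "\\\\" is the regex \\, which matches one literal backslash.
    static boolean containsBackslash(String s) {
        return s.matches(".*\\\\.*");
    }

    public static void main(String[] args) {
        // The literal "\\w" is just two characters: a backslash and w
        System.out.println("\\w".length());
        System.out.println(isAThenDotThenB("a.b"));
        System.out.println(isAThenDotThenB("axb"));
        // "C:\\temp" is the seven-character string C:\temp
        System.out.println(containsBackslash("C:\\temp"));
    }
}
```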

Using regular expression with String.matches()
Strings in Java have built-in support for regular expressions. Strings have several built-in methods for regular expressions, e.g. matches(), split() and replaceAll(); replace(), by contrast, treats its argument as literal text.
These methods are not optimized for performance, since they typically compile the pattern again on every call. We will later use classes which are optimized for performance.

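Method                                  |    Description
----------------------------------------------------------
s.matches("regex")                      |    Evaluates if "regex" matches s. Returns true only if the WHOLE string can be matched
s.split("regex")                        |    Creates an array with the substrings of s divided at each occurrence of "regex". "regex" is not included in the result.
s.replaceAll("regex", "replacement")    |    Replaces every match of "regex" with "replacement"
----------------------------------------------------------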
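
The methods can be compared on a short input. This is a minimal sketch; the class and method names are hypothetical, and the last method shows that replace() works on literal text rather than on a regex:

```java
import java.util.Arrays;

class StringRegexMethodsDemo {
    // split() cuts the string at every match of the regex; the matches are dropped.
    static String[] splitOnDigits(String s) {
        return s.split("\\d+");
    }

    // replaceAll() treats its first argument as a regex.
    static String maskDigits(String s) {
        return s.replaceAll("\\d+", "#");
    }

    // replace() treats its first argument as literal text, so "." is just a dot here.
    static String dotsToDashes(String s) {
        return s.replace(".", "-");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnDigits("a1b22c")));
        System.out.println(maskDigits("a1b22c"));
        System.out.println(dotsToDashes("a.b"));
        // With replaceAll() the same "." is a regex that matches every character
        System.out.println("a.b".replaceAll(".", "-"));
    }
}
```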

For the following example, create the Java project com.dixit.regex.test.
package com.dixit.regex.test;

public class RegexTestStrings {
    public static final String EXAMPLE_TEST = "This is my small example "
            + "string which I'm going to " + "use for pattern matching.";

    public static void main(String[] args) {
        System.out.println(EXAMPLE_TEST.matches("\\w.*"));
        String[] splitString = (EXAMPLE_TEST.split("\\s+"));
        System.out.println(splitString.length);// Should be 14
        for (String string : splitString) {
            System.out.println(string);
        }
        // Replace all whitespace with tabs
        System.out.println(EXAMPLE_TEST.replaceAll("\\s+", "\t"));
    }
}


Examples:
The following class gives several examples of using regular expressions with strings. See the comments for the purpose of each method.
If you want to test these examples, create the Java project com.dixit.regex.string.

package com.dixit.regex.string;

public class StringMatcher {
    // Returns true if the string matches exactly "true"
    public boolean isTrue(String s){
        return s.matches("true");
    }
    // Returns true if the string matches exactly "true" or "True"
    public boolean isTrueVersion2(String s){
        return s.matches("[tT]rue");
    }
   
    // Returns true if the string matches exactly "true" or "True"
    // or "yes" or "Yes"
    public boolean isTrueOrYes(String s){
        return s.matches("[tT]rue|[yY]es");
    }
   
    // Returns true if the string contains "true" anywhere
    public boolean containsTrue(String s){
        return s.matches(".*true.*");
    }
   

    // Returns true if the string consists of exactly three letters
    public boolean isThreeLetters(String s){
        return s.matches("[a-zA-Z]{3}");
        // Equivalent, more verbose form:
        // return s.matches("[a-zA-Z][a-zA-Z][a-zA-Z]");
    }
   
    // Returns true if the string does not have a number at the beginning
    public boolean isNoNumberAtBeginning(String s){
        return s.matches("^[^\\d].*");
    }
    // Returns true if the string consists of an arbitrary number of word characters, none of them b
    public boolean isIntersection(String s){
        return s.matches("([\\w&&[^b]])*");
    }
    // Returns true if the string contains a number less than 300
    public boolean isLessThenThreeHundret(String s){
        return s.matches("[^0-9]*[12]?[0-9]{1,2}[^0-9]*");
    }
   
}


And a small JUnit test to validate the examples.
-----------------------------------------------------------------------------------------------------------------------------------------
package com.dixit.regex.string;

import org.junit.Before;
import org.junit.Test;

import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

public class StringMatcherTest {
    private StringMatcher m;

    @Before
    public void setup(){
        m = new StringMatcher();
    }

    @Test
    public void testIsTrue() {
        assertTrue(m.isTrue("true"));
        assertFalse(m.isTrue("true2"));
        assertFalse(m.isTrue("True"));
    }

    @Test
    public void testIsTrueVersion2() {
        assertTrue(m.isTrueVersion2("true"));
        assertFalse(m.isTrueVersion2("true2"));
        assertTrue(m.isTrueVersion2("True"));
    }

    @Test
    public void testIsTrueOrYes() {
        assertTrue(m.isTrueOrYes("true"));
        assertTrue(m.isTrueOrYes("yes"));
        assertTrue(m.isTrueOrYes("Yes"));
        assertFalse(m.isTrueOrYes("no"));
    }

    @Test
    public void testContainsTrue() {
        assertTrue(m.containsTrue("thetruewithin"));
    }

    @Test
    public void testIsThreeLetters() {
        assertTrue(m.isThreeLetters("abc"));
        assertFalse(m.isThreeLetters("abcd"));
    }
   
    @Test
    public void testisNoNumberAtBeginning() {
        assertTrue(m.isNoNumberAtBeginning("abc"));
        assertFalse(m.isNoNumberAtBeginning("1abcd"));
        assertTrue(m.isNoNumberAtBeginning("a1bcd"));
        assertTrue(m.isNoNumberAtBeginning("asdfdsf"));
    }
   
    @Test
    public void testisIntersection() {
        assertTrue(m.isIntersection("1"));
        assertFalse(m.isIntersection("abcksdfkdskfsdfdsf"));
        assertTrue(m.isIntersection("skdskfjsmcnxmvjwque484242"));
    }
   

    @Test
    public void testLessThenThreeHundret() {
        assertTrue(m.isLessThenThreeHundret("288"));
        assertFalse(m.isLessThenThreeHundret("3288"));
        assertFalse(m.isLessThenThreeHundret("328 8"));
        assertTrue(m.isLessThenThreeHundret("1"));
        assertTrue(m.isLessThenThreeHundret("99"));
        assertFalse(m.isLessThenThreeHundret("300"));
    }
}


Pattern and Matcher
For advanced regular expressions the java.util.regex.Pattern and java.util.regex.Matcher classes are used.
You first create a Pattern object which defines the regular expression. This Pattern object allows you to create a Matcher object for a given string. This Matcher object then allows you to do regex operations on a String.
           
package com.dixit.regex.test;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTestPatternMatcher {
    public static final String EXAMPLE_TEST = "This is my small example string which I'm going to use for pattern matching.";

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("\\w+");
        // In case you would like to ignore case sensitivity you could use this
        // statement
        // Pattern pattern = Pattern.compile("\\s+", Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(EXAMPLE_TEST);
        // Check all occurrences
        while (matcher.find()) {
            System.out.print("Start index: " + matcher.start());
            System.out.print(" End index: " + matcher.end() + " ");
            System.out.println(matcher.group());
        }
        // Now create a new pattern and matcher to replace whitespace with tabs
        Pattern replace = Pattern.compile("\\s+");
        Matcher matcher2 = replace.matcher(EXAMPLE_TEST);
        System.out.println(matcher2.replaceAll("\t"));
    }
}
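
Because a Pattern object can be reused, compiling it once pays off when the same regex is applied to many strings. The following is a sketch of that idea; the class, constant and method names are hypothetical:

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class CompiledPatternReuse {
    // Compiled once and shared; Pattern instances are immutable and safe to reuse.
    private static final Pattern THREE_DIGITS = Pattern.compile("\\d{3}");

    // Counts the lines that contain three consecutive digits somewhere.
    static int countLinesWithThreeDigits(List<String> lines) {
        int count = 0;
        for (String line : lines) {
            // A fresh Matcher per input; find() looks for a match anywhere in the line
            if (THREE_DIGITS.matcher(line).find()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("abc", "a123", "99", "2300 x");
        System.out.println(countLinesWithThreeDigits(lines)); // prints 2
    }
}
```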


Java Regex Examples:
The following lists typical examples of using regular expressions. I hope you find them similar to your own use cases.

OR
Task: Write a regular expression which matches a text line if this text line contains either the word "Joe" or the word "Jim" or both.
Create a project com.dixit.regex.eitheror and the following class.
-------------------------------------------------------------------------------------------------------------------------------------
package com.dixit.regex.eitheror;

import org.junit.Test;

import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

public class EitherOrCheck {
    @Test
    public void testSimpleTrue() {
        String s = "humbapumpa jim";
        assertTrue(s.matches(".*(jim|joe).*"));
        s = "humbapumpa jom";
        assertFalse(s.matches(".*(jim|joe).*"));
        s = "humbaPumpa joe";
        assertTrue(s.matches(".*(jim|joe).*"));
        s = "humbapumpa joe jim";
        assertTrue(s.matches(".*(jim|joe).*"));
    }
}
-------------------------------------------------------------------------------------------------------------------------------------

Phone number
Task: Write a regular expression which matches a phone number.
A phone number in this example consists either of 7 digits in a row, or of 3 digits, a whitespace character or a dash, and then 4 digits.

-----------------------------------------------------------------------------------------------------------------------------------------               
package com.dixit.regex.phonenumber;

import org.junit.Test;
import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;


public class CheckPhone {
   
    @Test
    public void testSimpleTrue() {
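        // Three digits, an optional whitespace character or dash, then four digits
        String pattern = "\\d\\d\\d([-\\s])?\\d\\d\\d\\d";
        String s= "1233323322";
        assertFalse(s.matches(pattern));
        s = "1233323";
        assertTrue(s.matches(pattern));
        s = "123 3323";
        assertTrue(s.matches(pattern));
        s = "123-3323";
        assertTrue(s.matches(pattern));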
    }
}
-----------------------------------------------------------------------------------------------------------------------------------------
           

Check for a certain number range
The following example will check if a text contains a number with 3 digits.
Create the Java project "com.dixit.regex.numbermatch" and the following class.
-----------------------------------------------------------------------------------------------------------------------------------------
package de.vogella.regex.numbermatch;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.junit.Test;

import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

public class CheckNumber {
    @Test
    public void testSimpleTrue() {
        String s= "1233";
        assertTrue(test(s));
        s= "0";
        assertFalse(test(s));
        s = "29 Kasdkf 2300 Kdsdf";
        assertTrue(test(s));
        s = "99900234";
        assertTrue(test(s));
    }
    // Returns true if s contains three consecutive digits anywhere
    public static boolean test (String s){
        Pattern pattern = Pattern.compile("\\d{3}");
        Matcher matcher = pattern.matcher(s);
        return matcher.find();
    }

}
-----------------------------------------------------------------------------------------------------------------------------------------

Building a link checker
The following example allows you to extract all valid links from a webpage. It ignores links which start with "javascript:" or "mailto:".
Create the Java project com.dixit.regex.weblinks and the following class:
-----------------------------------------------------------------------------------------------------------------------------------------
package com.dixit.regex.weblinks;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkGetter {
    private Pattern htmltag;
    private Pattern link;
    private final String root;

    public LinkGetter(String root) {
        this.root = root;
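        // Matches a complete anchor element that carries an href attribute
        htmltag = Pattern.compile("<a\\b[^>]*href=\"[^>]*>(.*?)</a>");
        link = Pattern.compile("href=\"[^>]*\">");
    }

    public List<String> getLinks(String url) {
        List<String> links = new ArrayList<String>();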
        try {
            StringBuilder builder = new StringBuilder();
            // try-with-resources closes the reader even if reading fails
            try (BufferedReader bufferedReader = new BufferedReader(
                    new InputStreamReader(new URL(url).openStream()))) {
                String s;
                while ((s = bufferedReader.readLine()) != null) {
                    builder.append(s);
                }
            }

            Matcher tagmatch = htmltag.matcher(builder.toString());
            while (tagmatch.find()) {
                Matcher matcher = link.matcher(tagmatch.group());
                // Skip anchors whose href does not fit the link pattern
                if (!matcher.find()) {
                    continue;
                }
                String href = matcher.group().replaceFirst("href=\"", "")
                        .replaceFirst("\">", "");
                if (valid(href)) {
                    links.add(makeAbsolute(url, href));
                }
            }
        } catch (MalformedURLException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
        return links;
    }

    private boolean valid(String s) {
        if (s.matches("javascript:.*|mailto:.*")) {
            return false;
        }
        return true;
    }

    // Joins url and link with exactly one "/" between them.
    // Links that already carry a scheme are returned unchanged.
    private String makeAbsolute(String url, String link) {
        if (link.matches("https?://.*")) {
            return link;
        }
        boolean urlEndsWithSlash = url.endsWith("/");
        boolean linkStartsWithSlash = link.startsWith("/");
        if (urlEndsWithSlash && linkStartsWithSlash) {
            return url + link.substring(1);
        }
        if (!urlEndsWithSlash && !linkStartsWithSlash) {
            return url + "/" + link;
        }
        return url + link;
    }
}
-----------------------------------------------------------------------------------------------------------------------------------------