Wednesday 26 December 2012

Lucene - Quickly add Index and Search Capability

What is Lucene?

Apache Lucene™ is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.


Lucene can index plain text, integers, and text extracted from PDFs, Office documents, etc.

How Does Lucene Enable Faster Search?

Lucene creates something called an inverted index. Normally we map document -> terms in the document. Lucene does the reverse: it creates an index that maps term -> list of documents containing the term, which makes searching faster because a lookup does not have to scan every document.
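The idea can be sketched in a few lines of plain Java. This is only a toy illustration and not Lucene's actual data structure; the class name InvertedIndexSketch and its methods are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy inverted index in plain Java; this is NOT Lucene's implementation.
class InvertedIndexSketch
{
    // term -> sorted set of ids of the documents containing that term
    private final Map<String, Set<Integer>> index = new TreeMap<String, Set<Integer>>();

    void addDocument(int docId, String text)
    {
        // Naive analysis: lower-case the text and split on anything that is not a letter or digit
        for (String term : text.toLowerCase().split("[^a-z0-9]+"))
        {
            if (term.isEmpty())
            {
                continue;
            }
            Set<Integer> docs = index.get(term);
            if (docs == null)
            {
                docs = new TreeSet<Integer>();
                index.put(term, docs);
            }
            docs.add(docId);
        }
    }

    // A lookup is a single map access instead of a scan over every document
    List<Integer> search(String term)
    {
        Set<Integer> docs = index.get(term.toLowerCase());
        return docs == null ? new ArrayList<Integer>() : new ArrayList<Integer>(docs);
    }

    public static void main(String[] args)
    {
        InvertedIndexSketch idx = new InvertedIndexSketch();
        idx.addDocument(1, "Lucene is a search library");
        idx.addDocument(2, "Search engines use an inverted index");
        idx.addDocument(3, "Java is a language");
        System.out.println(idx.search("search"));  // [1, 2]
        System.out.println(idx.search("java"));    // [3]
    }
}
```

The cost of the lookup depends on the number of distinct terms, not on the number or size of the documents, which is the property the inverted index is built for.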

 

Install Lucene

Maven Dependency

<dependency>
 <groupId>org.apache.lucene</groupId>
 <artifactId>lucene-core</artifactId>
 <version>3.0.2</version>
 <type>jar</type>
 <scope>compile</scope>
</dependency>

Download Dependency

Download Lucene from http://lucene.apache.org/ and add lucene-core.jar to the classpath.

How does Lucene Work? 

  

Let's read the picture first from bottom to center. The raw text is used to create a Lucene "Document", which is analyzed using the specified Analyzer. The Document is then added to the index according to the Store, TermVector and Analyzed properties of its Fields.

Next, the search, from top to center. The user specifies the query in a text format. The Query object is built from the query text, and the result of the executed query is returned as TopDocs.

Core Lucene Classes 


Directory, FSDirectory, RAMDirectory
  Directory - the directory containing the index
  FSDirectory - file-system-based index directory
  RAMDirectory - memory-based index directory
  Example: Directory indexDirectory = FSDirectory.open(new File("c://lucene//nodes"));

IndexWriter
  Handles writing to the index - addDocument, updateDocument, deleteDocuments, merge etc.
  Example: IndexWriter writer = new IndexWriter(indexDirectory,
                                    new StandardAnalyzer(Version.LUCENE_30),
                                    new MaxFieldLength(1010101));

IndexSearcher
  Searches the index through an IndexReader - search(query, int)
  Example: IndexSearcher searcher = new IndexSearcher(indexDirectory);

Document
  The DTO used to index and search
  Example: Document document = new Document();

Field
  Each document contains multiple fields. A field has 2 parts: name and value.
  Example: new Field("id", "1", Store.YES, Index.NOT_ANALYZED)

Term
  A word from the text, used in search. It has 2 parts: the field to search and the value to search for.
  Example: Term term = new Term("id", "1");

Query
  Base class of all types of queries - TermQuery, BooleanQuery, PrefixQuery, TermRangeQuery, WildcardQuery, PhraseQuery etc.
  Example: Query query = new TermQuery(term);

Analyzer
  Builds tokens from text, and helps in building index terms from text
  Example: new StandardAnalyzer(Version.LUCENE_30)

 

 The Lucene Directory

Directory is the data space on which Lucene operates. It can be backed by the file system or by memory.
Below are the most often used Directory implementations.

FSDirectory
  File-system-based directory
  Example: Directory dir = FSDirectory.open(file);   // file -> java.io.File pointing to the index directory path

RAMDirectory
  Memory-based Lucene directory
  Example: Directory dir = new RAMDirectory();
           Directory dir = new RAMDirectory(fsDirectory);   // load a file-based Directory into memory

 

 Create an Index Entry

The Lucene "Document" object is the main object used in indexing. A Document contains multiple fields. The Analyzer breaks the document fields down into tokens, and the IndexWriter then writes them to the Directory.

IndexWriter

IndexWriter writer = new IndexWriter(indexDirectory, new StandardAnalyzer(Version.LUCENE_30), true, MaxFieldLength.UNLIMITED);

 

Analyzers

An Analyzer turns text into the tokens or keywords that can be searched. Lucene provides a few default Analyzers. The choice of Analyzer determines how the indexed text is tokenized and searched.
Below are some standard analyzers.


Example - How analyzers work on sample text
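
As a rough illustration of why the choice of analyzer matters, the plain-Java sketch below imitates two strategies: splitting on whitespace only (similar in spirit to Lucene's WhitespaceAnalyzer) and lower-casing, splitting on non-letters and dropping stop words (similar in spirit to StopAnalyzer). The class AnalyzerSketch, its methods and the stop-word list are assumptions made for this example; Lucene's real analyzers are more elaborate.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Plain-Java imitation of two analysis strategies; these are NOT Lucene classes.
class AnalyzerSketch
{
    // Hypothetical stop-word list, chosen only for this example
    private static final Set<String> STOP_WORDS =
        new HashSet<String>(Arrays.asList("the", "a", "an", "is", "over"));

    // Split on whitespace only; case and punctuation are kept
    static List<String> whitespaceTokens(String text)
    {
        List<String> tokens = new ArrayList<String>();
        for (String token : text.trim().split("\\s+"))
        {
            if (!token.isEmpty())
            {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Lower-case, split on non-letters, then drop stop words
    static List<String> stopTokens(String text)
    {
        List<String> tokens = new ArrayList<String>();
        for (String token : text.toLowerCase().split("[^a-z]+"))
        {
            if (!token.isEmpty() && !STOP_WORDS.contains(token))
            {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args)
    {
        String text = "The quick brown fox jumped over the lazy dog";
        System.out.println(whitespaceTokens(text));  // [The, quick, brown, fox, jumped, over, the, lazy, dog]
        System.out.println(stopTokens(text));        // [quick, brown, fox, jumped, lazy, dog]
    }
}
```

With the first strategy a search for "the" or "The" matches different tokens; with the second neither is searchable at all, because stop words never reach the index.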


 Properties that define Field indexing

  • Store - Should the Field be stored so it can be retrieved in the future?
  • Analyzed - Should the contents be split into tokens?
  • TermVector - Should term-based details be stored or not?


Store:

Should the field be stored so it can be retrieved later?

Store.YES
  Store the value; it can be retrieved later from the index.
Store.NO
  Don’t store the value. Used along with Index.ANALYZED when the tokens are needed only for search.


Analyzed:

How should the text be analyzed?

Index.ANALYZED
  Break the text into tokens and index each token to make it searchable.
Index.NOT_ANALYZED
  Index the whole text as a single token, without analyzing (splitting) it.
Index.ANALYZED_NO_NORMS
  Same as ANALYZED, but does not store norms.
Index.NOT_ANALYZED_NO_NORMS
  Same as NOT_ANALYZED, but does not store norms.
Index.NO
  Don’t make this field searchable at all.


Term Vector

Needed when term details are used for features such as finding similar documents or highlighting.

TermVector.YES
  Record unique terms + counts in each document (no positions, no offsets).
TermVector.WITH_POSITIONS
  Record unique terms + counts + positions in each document (no offsets).
TermVector.WITH_OFFSETS
  Record unique terms + counts + offsets in each document (no positions).
TermVector.WITH_POSITIONS_OFFSETS
  Record unique terms + counts + positions + offsets in each document.
TermVector.NO
  Don’t record term vector information.


Example of Creating Index

// true -> create a new index, overwriting any existing one
IndexWriter writer = new IndexWriter(indexDirectory, new StandardAnalyzer(Version.LUCENE_30), true, MaxFieldLength.UNLIMITED);

Document document = new Document();
document.add(new Field("id", "1", Store.YES, Index.NOT_ANALYZED));
document.add(new Field("name", "user1", Store.YES, Index.NOT_ANALYZED));
document.add(new Field("age", "20", Store.YES, Index.NOT_ANALYZED));
writer.addDocument(document);
writer.close();   // commit the changes and release the write lock

Example of Updating Index

IndexWriter writer = new IndexWriter(indexDirectory, new StandardAnalyzer(Version.LUCENE_30), MaxFieldLength.UNLIMITED);

Document document = new Document();
document.add(new Field("id", "1", Store.YES, Index.NOT_ANALYZED));
document.add(new Field("name", "changed-user1", Store.YES, Index.NOT_ANALYZED));
document.add(new Field("age", "30", Store.YES, Index.NOT_ANALYZED));
Term term = new Term("id", "1");
writer.updateDocument(term, document);   // delete the documents matching the term, then add the new document
writer.close();

Example of Deleting Index

IndexWriter writer = new IndexWriter(indexDirectory, new StandardAnalyzer(Version.LUCENE_30), MaxFieldLength.UNLIMITED);

Term term = new Term("id", "1");
writer.deleteDocuments(term);   // delete every document containing this term
writer.close();                 // commit the deletion
 

Searching an Index : 

The user specifies the query in a text format. The query text is analyzed, a Query object is built from it, and the result of the executed query is returned as TopDocs.


Query:

Queries are the main input for the search.

TermQuery
  Matches documents that contain a single exact term
BooleanQuery
  Combines multiple queries with AND, OR and NOT

  Boolean query operator shortcuts:
    Verbose syntax     Shortcut syntax
    a AND b            +a +b
    a OR b             a b
    a AND NOT b        +a -b

PrefixQuery
  Terms that start with the given prefix
WildcardQuery
  ? and * wildcards; * is not allowed at the beginning
PhraseQuery
  Exact phrase
TermRangeQuery / NumericRangeQuery
  Term range or numeric range
FuzzyQuery
  Similar words search

Sample Queries


Example on Search:

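IndexSearcher searcher = new IndexSearcher(indexDirectory);
Term term = new Term("id", "1");
Query query = new TermQuery(term);
TopDocs docs = searcher.search(query, 3);   // fetch at most the top 3 hits
// scoreDocs holds only the returned hits; each ScoreDoc carries the internal document id
for (ScoreDoc scoreDoc : docs.scoreDocs)
{
     System.out.println(searcher.doc(scoreDoc.doc));
}
searcher.close();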


Lucene Diagnostic Tools: 

  • Luke  - http://code.google.com/p/luke/
    Luke is a handy development and diagnostic tool, which accesses already existing Lucene indexes and allows you to display and modify their content in several ways:

  • Limo  - http://limo.sourceforge.net/
    The idea is to have a small tool, running as a web application, that gives basic information about indexes used by the Lucene search engine


Complete Example: 

Download here : LuceneTester.java



Tuesday 25 December 2012

Enum - in Java





Enumeration


Wikipedia says "An enumeration of a collection of items is a complete, ordered listing of all of the items in that collection"


What is an Enumerated Type?


An enumerated type is a predefined set of constant values that form a group.

It has been available since JDK 1.5.

Before enums were introduced, a similar effect could be achieved with a class that has a private constructor and public constant objects. An example is shown below:

public class State
{
    private State() {}   // private constructor: no instances other than the constants below
    public static final State START = new State();
    public static final State STOP = new State();
    public static final State RUN = new State();
}
 

Enums in Java 1.5


  • All enums extend java.lang.Enum
  • Enum constants are implicitly public static final

Example of Enum in JDK 1.5 and above: 


enum State
{
     START (1),
     STOP (2),
     RUN (3);

     private int level;

     private State(int level)
     {
         this.level = level;
     }
}

1. Enums can be used in switch statements


switch (state)   // state is a variable of type State
{
 case START:
          ………….
 case STOP:
          ……….
 case RUN:
         ………..  
} 

2. Enum constants are final.

  • They can be compared with ==
  • State threadState = State.RUN; threadState == State.RUN is always true

3. Enum constants can override methods

enum State
    {
        START (1)
        {
            @Override
            public void log()
            {
                System.out.println("--STARTING--");
            }
        },
        STOP (2)
        {
            @Override
            public void log()
            {
                System.out.println("--STOPPING--");
            }
        },
        RUN (3)
        {
            @Override
            public void log()
            {
                System.out.println("--RUNNING--");
            }
        };

        private int level;

        private State(int level)
        {
            this.level = level;
        }

        public abstract void log();

    }
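
The three points above can be combined into one small runnable sketch. The class name EnumBasicsSketch and the describe() and label() methods are made up for this illustration.

```java
// Runnable sketch combining the three points above; describe() and label() are hypothetical helpers.
class EnumBasicsSketch
{
    enum State
    {
        START { @Override String label() { return "starting"; } },
        STOP  { @Override String label() { return "stopping"; } },
        RUN   { @Override String label() { return "running"; } };

        // Each constant supplies its own implementation
        abstract String label();
    }

    // 1. An enum value can drive a switch; case labels use the unqualified constant names
    static String describe(State state)
    {
        switch (state)
        {
            case START:
                return "about to run";
            case STOP:
                return "halted";
            case RUN:
                return "in progress";
            default:
                throw new IllegalArgumentException("Unknown state: " + state);
        }
    }

    public static void main(String[] args)
    {
        State threadState = State.RUN;
        System.out.println(describe(threadState));      // in progress
        // 2. Each constant exists exactly once, so == is a safe comparison
        System.out.println(threadState == State.RUN);   // true
        // 3. Constant-specific method body
        System.out.println(threadState.label());        // running
    }
}
```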

Important methods of java.lang.Enum


String name()
  Returns the name of this enum constant, exactly as declared in its enum declaration.
  State.START.name() returns "START"

int ordinal()
  Returns the ordinal of this enumeration constant (its position in its enum declaration, where the initial constant is assigned an ordinal of zero).
  State.START.ordinal() returns 0

static <T extends Enum<T>> T valueOf(Class<T> enumType, String name)
  Returns the enum constant of the specified enum type with the specified name.
  Enum.valueOf(State.class, "START") returns the State.START object

static T[] values()
  Returns the enum constants in the order they are declared. This method is generated by the compiler for every enum type rather than declared in java.lang.Enum.
  State.values() returns State[]

 

 What is EnumSet?

EnumSet is a Set implementation for use with enum types. All of the elements in an enum set must come from a single enum type, which is specified when the set is created.
static EnumSet<State> set = EnumSet.range(STOP, RUN);

What is EnumMap?

EnumMap implements the Map interface. All of its keys must come from a single enum type.
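
A short sketch of both collections follows, assuming a simplified State enum without constructor arguments; the class name EnumCollectionsSketch and the buildLabels() helper are hypothetical. It shows that an EnumMap iterates in declaration order regardless of insertion order.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;

// Illustrative sketch; State is a simplified stand-in for the enum used in this post.
class EnumCollectionsSketch
{
    enum State { START, STOP, RUN }

    static EnumMap<State, String> buildLabels()
    {
        EnumMap<State, String> labels = new EnumMap<State, String>(State.class);
        // Insertion order does not matter: EnumMap iterates in declaration (ordinal) order
        labels.put(State.RUN, "running");
        labels.put(State.START, "starting");
        return labels;
    }

    public static void main(String[] args)
    {
        EnumSet<State> active = EnumSet.of(State.START, State.RUN);
        EnumSet<State> inactive = EnumSet.complementOf(active);
        System.out.println(active);     // [START, RUN]
        System.out.println(inactive);   // [STOP]

        // Prints "START -> starting" and then "RUN -> running"
        for (Map.Entry<State, String> entry : buildLabels().entrySet())
        {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
    }
}
```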

 Complete Example - 

Download StateTester.java

package test;

import java.util.EnumMap;
import java.util.EnumSet;

public class StateTester
{

    enum State
    {
        START (1)
        {
            @Override
            public void log()
            {
                System.out.println("--STARTING--");
            }
        },
        STOP (2)
        {
            @Override
            public void log()
            {
                System.out.println("--STOPPING--");
            }
        },
        RUN (3)
        {
            @Override
            public void log()
            {
                System.out.println("--RUNNING--");
            }
        };

        private int level;

        static EnumSet<State> set = EnumSet.range(STOP, RUN);

        private State(int level)
        {
            this.level = level;
        }

        public abstract void log();

    }

    public static void main(String args[])
    {
        System.out.println("Name : "+State.START.name()); // OUTPUT : START
        System.out.println("Ordinal : "+State.RUN.ordinal()); // OUTPUT : 2
        System.out.println("valueOf: "+ State.valueOf("STOP")); // OUTPUT : STOP
        System.out.println("Enum.valueOf : "+ Enum.valueOf(State.class, "START")); // OUTPUT : START
        
        System.out.println("Enumset.contains :"+ State.set.contains(State.START)); //false
        System.out.println("Enumset.contains :"+ State.set.contains(State.STOP)); //true
        System.out.println("values : "+ State.values());   // array [Ltest.StateTester$State;@152b6651

        EnumMap<State, String> stateEnum = new EnumMap<State,String>(State.class);
        stateEnum.put(State.RUN, "run");
        System.out.println("ENUM MAP: "+stateEnum.get(State.RUN));
        System.out.println("ENUM MAP null: "+stateEnum.get(State.START));
        
        System.out.println("wrong valueOf :  "+ State.valueOf("run")); // Exception in thread "main" java.lang.IllegalArgumentException: No enum const class test.StateTester$State.run
    }
}

Output from the above java file:


Name : START
Ordinal : 2
valueOf: STOP
Enum.valueOf : START
Enumset.contains :false
Enumset.contains :true
values : [Ltest.StateTester$State;@6bbc4459
ENUM MAP: run
ENUM MAP null: null

Exception in thread "main" java.lang.IllegalArgumentException: No enum const class test.StateTester$State.run
 at java.lang.Enum.valueOf(Enum.java:196)
 at test.StateTester$State.valueOf(StateTester.java:1)
 at test.StateTester.main(StateTester.java:65)

Wednesday 12 December 2012

REST - Basics

Advanced - Exception Handling in java

How to create an Exception

A checked exception is created by subclassing java.lang.Exception (outside the RuntimeException hierarchy) or java.lang.Throwable directly.

An unchecked exception is created by subclassing java.lang.Error, java.lang.RuntimeException or one of their subclasses.

Example 
public class LowerBoundException extends java.lang.Exception
{
    int errorCode=-1;
    public LowerBoundException(final int errorCode, final String message)
    {
        super(message);
        this.errorCode=errorCode;
    }
}

Creating Exceptions - with Throws and Throw

throws - declares that the method may throw the specified exceptions. The caller should either handle them with an appropriate try/catch block or let them propagate up the call stack.

throw - raises an error condition to the caller of the method.


Example
public class  Validator
{
      static int lower=50;
      public void validate(final int value) throws LowerBoundException
      {
            if (value < lower)
            {
                  throw new LowerBoundException(100,  value + " is less than "+lower);
            }
      }

}

Exception Handling - with try, catch & finally

Try - the code inside the try block is monitored for exceptions.
Catch - handles the respective type of exception.
Finally - regardless of whether the code inside the try throws an exception or not, the finally block is executed. It is used for any form of cleanup activity.

The finally block is a key tool for preventing resource leaks. When closing a file or otherwise recovering resources, place the code in a finally block to ensure that resource is always recovered.


Example
public class ValidatorTest
{
 public static void main(String args[])
 {
  Validator validator = new Validator();
  try
  {
      validator.validate(30); //throws LowerBoundException
      validator.validate(50);
      validator.validate(100);
  }catch(LowerBoundException lbe)
  {
        System.out.println("--Exception :: "+lbe.getMessage());
        lbe.printStackTrace();
  }
  finally   //cleanup
  {
        validator=null;
  }
 }
}

 

Catching More Than One Type of Exception with One Exception Handler

In Java SE 7 and later, a single catch block can handle more than one type of exception. This feature can reduce code duplication and lessen the temptation to catch an overly broad exception.

In the catch clause, specify the types of exceptions that block can handle, and separate each exception type with a vertical bar (|):

Example
catch (IOException|SQLException ex) {
    logger.log(ex);
    throw ex;
}
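
A self-contained sketch of the same syntax is shown below. The class MultiCatchSketch and its parse() helper are hypothetical and exist only to make the example runnable.

```java
// Sketch of Java SE 7 multi-catch; parse() is a hypothetical helper.
class MultiCatchSketch
{
    static String parse(String value)
    {
        try
        {
            // value.trim() fails for null, Integer.parseInt fails for non-numeric text
            int number = Integer.parseInt(value.trim());
            return "parsed " + number;
        }
        catch (NumberFormatException | NullPointerException ex)
        {
            // One handler for two unrelated exception types
            return "invalid input: " + ex.getClass().getSimpleName();
        }
    }

    public static void main(String[] args)
    {
        System.out.println(parse(" 42 "));  // parsed 42
        System.out.println(parse("abc"));   // invalid input: NumberFormatException
        System.out.println(parse(null));    // invalid input: NullPointerException
    }
}
```

The alternatives in one catch clause must not be subclasses of each other; otherwise the compiler rejects the clause.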

 

The try-with-resources Statement 

This is available in JDK 1.7. The try-with-resources statement is a try statement that declares one or more resources. A resource is an object that must be closed after the program is finished with it. The try-with-resources statement ensures that each resource is closed at the end of the statement. Any object that implements java.lang.AutoCloseable, which includes all objects which implement java.io.Closeable, can be used as a resource.

The following example reads the first line from a file. It uses an instance of BufferedReader to read data from the file. BufferedReader is a resource that implements java.lang.AutoCloseable in Java SE 7. Because the BufferedReader instance is declared in a try-with-resources statement, it will be closed regardless of whether the try statement completes normally or abruptly.


static String readFirstLineFromFile(String path) throws IOException {
    try (BufferedReader br =  new BufferedReader(new FileReader(path))) {
        return br.readLine();
    }
}

Prior to Java SE 7, the finally block was used to achieve the same result. The following example uses a finally block instead of a try-with-resources statement:

static String readFirstLineFromFileWithFinallyBlock(String path)  throws IOException {
    BufferedReader br = new BufferedReader(new FileReader(path));
    try {
        return br.readLine();
    } finally {
        if (br != null) br.close();
    }
}

Exception Chaining:

Exception chaining is a technique of handling exceptions by re-throwing a caught exception after wrapping it inside a new exception. It is very helpful for wrapping unchecked exceptions in a checked exception. The entire trail of errors is captured in the stack trace of the exception.
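
A minimal sketch of chaining is shown below. ConfigException, readFile() and loadConfig() are hypothetical names, and the low-level failure here is a checked IOException rather than an unchecked exception; the wrapping works the same way in both cases.

```java
import java.io.IOException;

// Sketch of exception chaining; all names in this class are made up for the example.
class ChainingSketch
{
    // Hypothetical application-level checked exception
    static class ConfigException extends Exception
    {
        ConfigException(String message, Throwable cause)
        {
            super(message, cause);   // keeps the original exception as the cause
        }
    }

    // Stand-in for a low-level operation that fails
    static void readFile() throws IOException
    {
        throw new IOException("disk not readable");
    }

    static void loadConfig() throws ConfigException
    {
        try
        {
            readFile();
        }
        catch (IOException ioe)
        {
            // Wrap and rethrow: callers see ConfigException, the cause is preserved
            throw new ConfigException("Could not load configuration", ioe);
        }
    }

    public static void main(String[] args)
    {
        try
        {
            loadConfig();
        }
        catch (ConfigException ce)
        {
            System.out.println(ce.getMessage());              // Could not load configuration
            System.out.println(ce.getCause().getMessage());   // disk not readable
        }
    }
}
```

Calling printStackTrace() on the outer exception also prints the wrapped one under a "Caused by:" line, so no part of the trail is lost.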

Logging Exceptions:

With Exception e:

e.toString()
  Returns the exception class name together with the message
e.printStackTrace()
  Prints the entire stack trace
e.getMessage()
  Returns the message of the exception
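
The difference between the three methods can be seen in a small sketch. The class ExceptionInfoSketch and the stackTraceOf() helper are hypothetical; the helper captures what printStackTrace() would print so that it can be handed to a logger instead of the console.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Sketch comparing getMessage(), toString() and printStackTrace(); names are made up for the example.
class ExceptionInfoSketch
{
    // Captures the output of printStackTrace() as a String
    static String stackTraceOf(Throwable t)
    {
        StringWriter buffer = new StringWriter();
        t.printStackTrace(new PrintWriter(buffer, true));
        return buffer.toString();
    }

    public static void main(String[] args)
    {
        Exception e = new IllegalStateException("queue is full");
        System.out.println(e.getMessage());   // queue is full
        System.out.println(e.toString());     // java.lang.IllegalStateException: queue is full
        // Class name and message first, then one "at ..." line per stack frame
        System.out.println(stackTraceOf(e));
    }
}
```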

Best Practices:

  1. Do not propagate implementation-specific exceptions to the business/user layer. If there is a SQLException while logging in, the user does not need to know about the SQLException. All they want to see is "Could not Login".
  2. Never swallow or ignore exceptions. Do not catch an exception and simply return. Exceptions should be logged, or nested in another exception when rethrowing. The empty catch block below is what to avoid:
    try
    {
      …………..
    }catch(SQLException se)
    {
    }
    
  3. Always catch the subclass exception first and then the superclass.
  4. Log an exception only once. Do not print the stack trace unless it is really necessary for debugging.
  5. Print the stack trace using log.debug().
  6. Always clean up after an exception using finally.
  7. Exception naming is very important. Use appropriate names that match what the exception carries.
  8. Throw early and catch late. Throw an exception as soon as the problem is found, and catch it only where it can actually be handled.
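
Practice 3 can be sketched as follows. The class CatchOrderSketch and its open() and tryOpen() helpers are hypothetical. FileNotFoundException is a subclass of IOException, so its handler has to come first; with the two catch blocks swapped the code does not compile, because the second handler could never be reached.

```java
import java.io.FileNotFoundException;
import java.io.IOException;

// Sketch of catching the subclass before the superclass; helper names are made up.
class CatchOrderSketch
{
    // Hypothetical helper that always fails, in one of two ways
    static void open(String name) throws IOException
    {
        if (name.isEmpty())
        {
            throw new IOException("no name given");
        }
        throw new FileNotFoundException(name + " not found");
    }

    static String tryOpen(String name)
    {
        try
        {
            open(name);
            return "opened";
        }
        catch (FileNotFoundException fnfe)   // subclass first
        {
            return "missing: " + fnfe.getMessage();
        }
        catch (IOException ioe)              // superclass afterwards
        {
            return "io error: " + ioe.getMessage();
        }
    }

    public static void main(String[] args)
    {
        System.out.println(tryOpen("a.txt"));  // missing: a.txt not found
        System.out.println(tryOpen(""));       // io error: no name given
    }
}
```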

References

http://docs.oracle.com/javase/tutorial/essential/exceptions/index.htm
http://doc.sumy.ua/prog/java/langref/ch09_01.htm

Wednesday 5 December 2012

Basics- Exceptions in Java




 

 

 

 

What are Exceptions?



An exception is a way to indicate that an abnormal condition occurred in the called method.



Exception Handling:


  • When an error occurs within a method, the method creates an exception object and hands it off to the runtime system. The Exception object holds the information on the error, its type and state of the program when the error occurred. Creating an exception object and handing it to the runtime system is called throwing an exception.
  • After a method throws an exception, the runtime system attempts to find a block of code (an exception handler) to handle it. The handler is searched for in the call stack. The exception handler chosen is said to catch the exception.
  • If the runtime system searches all the methods on the call stack and doesn’t find an appropriate exception handler, the program terminates.

Exception Handler:

A block of code that can catch and handle the exception. It mostly takes the form of a try..catch..finally block. When an exception is thrown, the appropriate handler is searched for in the call stack, starting at the method that threw the exception and moving towards the bottom of the stack.
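
The search through the call stack can be sketched with three nested calls. The class CallStackSketch and its level1(), level2() and level3() methods are made-up names for this illustration.

```java
// Sketch of an exception propagating down the call stack until a handler is found.
class CallStackSketch
{
    static void level3()
    {
        throw new IllegalStateException("failure in level3");
    }

    static void level2()
    {
        level3();   // no handler here, so the exception keeps propagating
    }

    static String level1()
    {
        try
        {
            level2();
            return "no error";
        }
        catch (IllegalStateException ise)
        {
            // First matching handler found while unwinding the call stack
            return "handled in level1: " + ise.getMessage();
        }
    }

    public static void main(String[] args)
    {
        System.out.println(level1());  // handled in level1: failure in level3
    }
}
```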

Types of Exceptions:

There are 2 major categories of exception classes: checked and unchecked exceptions. Below is the high-level hierarchy of the exception classes.


Differences between Checked and unchecked Exceptions


Type
  Unchecked: runtime exceptions (not checked by the compiler)
  Checked: compile-time exceptions (checked by the compiler)
extends
  Unchecked: subclasses of Error and RuntimeException
  Checked: subclasses of Throwable and Exception other than Error and RuntimeException
throws
  Unchecked: do not need a throws clause
  Checked: need an appropriate throws clause
Exception handling
  Unchecked: do not mandate try/catch
  Checked: mandate try/catch somewhere in the call stack
Example
  Unchecked: OutOfMemoryError, NullPointerException
  Checked: ClassNotFoundException
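
The difference shows up in method signatures. In the sketch below (the class CheckedVsUncheckedSketch and its helpers are hypothetical), load() has to declare the checked ClassNotFoundException, while length() can fail with an unchecked NullPointerException without declaring anything.

```java
// Sketch contrasting a checked and an unchecked exception; helper names are made up.
class CheckedVsUncheckedSketch
{
    // Checked: must be declared with throws, and callers must catch or redeclare it
    static Class<?> load(String className) throws ClassNotFoundException
    {
        return Class.forName(className);
    }

    // Unchecked: no throws clause required, although a NullPointerException can occur
    static int length(String text)
    {
        return text.length();
    }

    static String demo()
    {
        StringBuilder out = new StringBuilder();
        try
        {
            load("no.such.Type");
        }
        catch (ClassNotFoundException cnfe)
        {
            out.append("checked caught; ");
        }
        try
        {
            length(null);
        }
        catch (NullPointerException npe)
        {
            out.append("unchecked caught");
        }
        return out.toString();
    }

    public static void main(String[] args)
    {
        System.out.println(demo());  // checked caught; unchecked caught
    }
}
```

Removing the first try/catch from demo() makes the class fail to compile, while removing the second one only changes the behavior at runtime.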


Exception Hierarchy:


The java.lang package hosts the core exception classes in Java. As seen above, Throwable has 2 subtypes, Error and Exception; their hierarchy is shown in the snapshot below.


 

Conclusion:


This post covered the basics of exceptions, exception handling and the java.lang exception classes. More advanced information about creating exceptions, handling exceptions with examples, and best practices will be covered in the next post.