marf.nlp.Storage
Class Corpus

java.lang.Object
  extended by marf.nlp.Storage.Corpus
All Implemented Interfaces:
java.io.Serializable, java.lang.Cloneable, java.lang.Comparable

public class Corpus
extends java.lang.Object
implements java.io.Serializable, java.lang.Cloneable, java.lang.Comparable

Corpus container bean.

Since:
0.3.0.6
Version:
$Revision: 1.3 $
Author:
Serguei Mokhov
See Also:
Serialized Form

Field Summary
static int COMPARE_CASE_INSENSITIVE
          Case is not altered, but internal casing is ignored.
static int COMPARE_CASE_SENSITIVE
           
static int COMPARE_LOWER_CASE
          Lowercased prior to action.
static int COMPARE_UPPER_CASE
          Uppercased prior to action.
protected  int iCaseSensitivity
           
protected  int iDeletes
           
protected  int iInserts
           
protected  int iMatches
           
protected  int iModifications
           
protected  java.lang.StringBuffer oRawCorpusTextBuffer
          Plain-text container.
protected  java.util.List oTokenizedCorpus
           
protected  NLPStreamTokenizer oTokenizer
          Tokenizer to use to convert a raw string to WordStat tokens.
 
Constructor Summary
Corpus()
           
 
Method Summary
 Corpus append(java.lang.Object poObjectToAppend)
           
 Corpus appendToken(java.lang.String pstrTokenToAppend)
           
protected  java.lang.Object clone()
           
 void compare(Corpus poCorpus)
          Compares this corpus to another one and reports the differences in terms of lexeme tokens.
 int compareTo(java.lang.Object poObjectToCompare)
           
 boolean equals(java.lang.Object arg0)
           
protected  void finalize()
           
 int getCaseSensitivity()
           
 int getDeletes()
           
 int getInserts()
           
 int getMatches()
           
 int getModifications()
           
 java.lang.StringBuffer getRawCorpusTextBuffer()
           
 java.util.List getTokenizedCorpus()
           
 NLPStreamTokenizer getTokenizer()
           
 int hashCode()
           
 void setCaseSensitivity(int caseSensitivity)
           
 void setDeletes(int deletes)
           
 void setInserts(int inserts)
           
 void setMatches(int matches)
           
 void setModifications(int modifications)
           
 void setRawCorpusTextBuffer(java.lang.StringBuffer poRawCorpusTextBuffer)
           
 void setTokenizedCorpus(java.util.List tokenizedCorpus)
           
 void setTokenizer(NLPStreamTokenizer poTokenizer)
           
 void tokenize()
          Tokenizes the raw contained corpus text into a list of tokens.
 java.lang.String toString()
           
 
Methods inherited from class java.lang.Object
getClass, notify, notifyAll, wait, wait, wait
 

Field Detail

COMPARE_CASE_SENSITIVE

public static final int COMPARE_CASE_SENSITIVE
See Also:
Constant Field Values

COMPARE_LOWER_CASE

public static final int COMPARE_LOWER_CASE
Lowercased prior to action.

See Also:
Constant Field Values

COMPARE_UPPER_CASE

public static final int COMPARE_UPPER_CASE
Uppercased prior to action.

See Also:
Constant Field Values

COMPARE_CASE_INSENSITIVE

public static final int COMPARE_CASE_INSENSITIVE
Case is not altered, but internal casing is ignored.

See Also:
Constant Field Values
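
The four COMPARE_* modes above can be illustrated with a minimal sketch. It is not taken from the MARF sources: the class CaseModeSketch, its constant names and values, and the method tokensMatch are hypothetical stand-ins, and the actual values of the Corpus constants are not shown on this page. The sketch only assumes that each mode decides how two tokens are normalized before they are checked for equality.

```java
import java.util.Locale;

final class CaseModeSketch {
    // Hypothetical stand-ins for the Corpus.COMPARE_* constants;
    // the real constant values are not documented on this page.
    static final int CASE_SENSITIVE = 0;
    static final int LOWER_CASE = 1;
    static final int UPPER_CASE = 2;
    static final int CASE_INSENSITIVE = 3;

    // Returns true if the two tokens are considered equal under the given mode.
    static boolean tokensMatch(String a, String b, int mode) {
        switch (mode) {
            case LOWER_CASE:
                // Both tokens are lowercased before the comparison.
                return a.toLowerCase(Locale.ROOT).equals(b.toLowerCase(Locale.ROOT));
            case UPPER_CASE:
                // Both tokens are uppercased before the comparison.
                return a.toUpperCase(Locale.ROOT).equals(b.toUpperCase(Locale.ROOT));
            case CASE_INSENSITIVE:
                // Tokens are left unaltered; case is ignored while comparing.
                return a.equalsIgnoreCase(b);
            default:
                // Exact, case-sensitive comparison.
                return a.equals(b);
        }
    }
}
```

Under these assumptions, "Word" and "word" differ only in the case-sensitive mode and match in the other three.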

iCaseSensitivity

protected int iCaseSensitivity

oRawCorpusTextBuffer

protected java.lang.StringBuffer oRawCorpusTextBuffer
Plain-text container.


oTokenizer

protected transient NLPStreamTokenizer oTokenizer
Tokenizer to use to convert a raw string to WordStat tokens.


iMatches

protected int iMatches

iInserts

protected int iInserts

iDeletes

protected int iDeletes

iModifications

protected int iModifications

oTokenizedCorpus

protected java.util.List oTokenizedCorpus

Constructor Detail

Corpus

public Corpus()

Method Detail

compare

public void compare(Corpus poCorpus)
Compares this corpus to another one and reports the differences in terms of lexeme tokens. This corpus acts as the main (master) corpus, and the parameter is the corpus being compared against it. Statistics are gathered on the number of deletes, inserts, and modifications with respect to the master corpus.

Parameters:
poCorpus - secondary tokenized corpus to compare with this one
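
How such counts might be derived can be sketched with a token-level edit-distance alignment. This is only an illustration: this page does not say which algorithm compare() uses, and the class TokenDiffSketch and its method diff are hypothetical. The sketch assumes that an "insert" is a token present only in the compared corpus, a "delete" is a token present only in the master, and a "modification" is a token replaced by a different one.

```java
import java.util.List;

final class TokenDiffSketch {
    int matches;
    int inserts;
    int deletes;
    int modifications;

    // Aligns two token lists with a classic edit-distance table and
    // counts the operations found along one minimal-cost alignment.
    static TokenDiffSketch diff(List<String> master, List<String> other) {
        int n = master.size();
        int m = other.size();
        int[][] cost = new int[n + 1][m + 1];
        for (int i = 0; i <= n; i++) cost[i][0] = i; // delete every master token
        for (int j = 0; j <= m; j++) cost[0][j] = j; // insert every other token
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                boolean same = master.get(i - 1).equals(other.get(j - 1));
                int substitute = cost[i - 1][j - 1] + (same ? 0 : 1);
                int delete = cost[i - 1][j] + 1;
                int insert = cost[i][j - 1] + 1;
                cost[i][j] = Math.min(substitute, Math.min(delete, insert));
            }
        }

        // Walk back from the bottom-right corner, tallying each step.
        TokenDiffSketch result = new TokenDiffSketch();
        int i = n;
        int j = m;
        while (i > 0 || j > 0) {
            if (i > 0 && j > 0) {
                boolean same = master.get(i - 1).equals(other.get(j - 1));
                if (cost[i][j] == cost[i - 1][j - 1] + (same ? 0 : 1)) {
                    if (same) result.matches++; else result.modifications++;
                    i--;
                    j--;
                    continue;
                }
            }
            if (i > 0 && cost[i][j] == cost[i - 1][j] + 1) {
                result.deletes++; // token exists only in the master
                i--;
            } else {
                result.inserts++; // token exists only in the compared list
                j--;
            }
        }
        return result;
    }
}
```

For a master of "the quick brown fox" and a compared list of "the slow brown fox jumps", this sketch counts 3 matches, 1 modification, 1 insert, and 0 deletes. When several alignments share the same minimal cost, the counts depend on the order in which the walk-back prefers the steps, so a different implementation may split the same difference differently.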

append

public Corpus append(java.lang.Object poObjectToAppend)

appendToken

public Corpus appendToken(java.lang.String pstrTokenToAppend)

tokenize

public void tokenize()
              throws java.io.IOException
Tokenizes the raw contained corpus text into a list of tokens.

Throws:
java.io.IOException
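
The general idea of turning a raw text buffer into a token list can be sketched with the JDK's java.io.StreamTokenizer as a stand-in. This is an assumption for illustration only: the behavior of NLPStreamTokenizer is not described on this page, and the class TokenizeSketch and its method tokenize are hypothetical names. The sketch keeps word tokens and skips everything else.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

final class TokenizeSketch {
    // Reads the buffer through a StreamTokenizer and collects word tokens.
    // Reading can fail with an IOException, which is passed on to the caller.
    static List<String> tokenize(StringBuffer rawText) throws IOException {
        StreamTokenizer tokenizer =
            new StreamTokenizer(new StringReader(rawText.toString()));
        List<String> tokens = new ArrayList<>();
        while (tokenizer.nextToken() != StreamTokenizer.TT_EOF) {
            // With default settings, numbers and punctuation are reported
            // under other token types and are skipped here.
            if (tokenizer.ttype == StreamTokenizer.TT_WORD) {
                tokens.add(tokenizer.sval);
            }
        }
        return tokens;
    }
}
```

Calling TokenizeSketch.tokenize(new StringBuffer("the quick brown fox")) yields the four words in order. A tokenizer tuned for natural-language text would likely treat punctuation, quotes, and numbers differently from these JDK defaults.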

getRawCorpusTextBuffer

public java.lang.StringBuffer getRawCorpusTextBuffer()

setRawCorpusTextBuffer

public void setRawCorpusTextBuffer(java.lang.StringBuffer poRawCorpusTextBuffer)

getTokenizer

public NLPStreamTokenizer getTokenizer()

setTokenizer

public void setTokenizer(NLPStreamTokenizer poTokenizer)

getTokenizedCorpus

public java.util.List getTokenizedCorpus()

setTokenizedCorpus

public void setTokenizedCorpus(java.util.List tokenizedCorpus)

getCaseSensitivity

public int getCaseSensitivity()

setCaseSensitivity

public void setCaseSensitivity(int caseSensitivity)

getMatches

public int getMatches()

setMatches

public void setMatches(int matches)

getInserts

public int getInserts()

setInserts

public void setInserts(int inserts)

getDeletes

public int getDeletes()

setDeletes

public void setDeletes(int deletes)

getModifications

public int getModifications()

setModifications

public void setModifications(int modifications)

clone

protected java.lang.Object clone()
                          throws java.lang.CloneNotSupportedException
Overrides:
clone in class java.lang.Object
Throws:
java.lang.CloneNotSupportedException

equals

public boolean equals(java.lang.Object arg0)
Overrides:
equals in class java.lang.Object

finalize

protected void finalize()
                 throws java.lang.Throwable
Overrides:
finalize in class java.lang.Object
Throws:
java.lang.Throwable

hashCode

public int hashCode()
Overrides:
hashCode in class java.lang.Object

toString

public java.lang.String toString()
Overrides:
toString in class java.lang.Object

compareTo

public int compareTo(java.lang.Object poObjectToCompare)
Specified by:
compareTo in interface java.lang.Comparable

