marf.Classification.Stochastic
Class ZipfLaw

java.lang.Object
  extended by marf.Storage.StorageManager
      extended by marf.Classification.Classification
          extended by marf.Classification.Stochastic.Stochastic
              extended by marf.Classification.Stochastic.ZipfLaw
All Implemented Interfaces:
java.io.Serializable, java.lang.Cloneable, IClassification, IStorageManager

public class ZipfLaw
extends Stochastic

Module exercising Zipf's Law.

$Id: ZipfLaw.java,v 1.32 2007/12/31 00:17:05 mokhov Exp $

Since:
0.3.0.2
Version:
$Revision: 1.32 $
Author:
Serguei Mokhov
See Also:
Serialized Form
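
Zipf's Law observes that, in a large corpus, a word's frequency is roughly inversely proportional to its frequency rank, so rank times frequency stays approximately constant. The following is a minimal, self-contained sketch of that relationship. It does not use MARF; the class ZipfSketch and its methods are hypothetical names chosen for illustration only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not part of MARF.
final class ZipfSketch {
    /** Counts occurrences of each whitespace-separated, lower-cased word. */
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : text.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Returns frequencies in descending order; index i holds rank i + 1. */
    static List<Integer> rankedFrequencies(Map<String, Integer> counts) {
        List<Integer> frequencies = new ArrayList<>(counts.values());
        frequencies.sort((a, b) -> Integer.compare(b, a));
        return frequencies;
    }

    public static void main(String[] args) {
        List<Integer> frequencies = rankedFrequencies(countWords("a a a a b b c"));
        // Prints rank, frequency and their product, one row per word.
        for (int i = 0; i < frequencies.size(); i++) {
            int rank = i + 1;
            int frequency = frequencies.get(i);
            System.out.println(rank + "\t" + frequency + "\t" + (rank * frequency));
        }
    }
}
```

On real text the product only approximates a constant, and the fit is usually judged on a much larger corpus than the toy string used here.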

Field Summary
static int DEFAULT_OUTPUT_PAGE_SIZE
          Default number of entries to display/output per page.
 
Fields inherited from class marf.Classification.Classification
adFeatureVector, oFeatureExtraction, oResultSet, oTrainingSet
 
Fields inherited from class marf.Storage.StorageManager
bDumpOnNotFound, iCurrentDumpMode, oObjectToSerialize, strFilename
 
Fields inherited from interface marf.Classification.IClassification
MARF_INTERFACE_CODE_REVISION
 
Fields inherited from interface marf.Storage.IStorageManager
DUMP_BINARY, DUMP_CSV_TEXT, DUMP_GZIP_BINARY, DUMP_HTML, DUMP_SQL, DUMP_XML, MARF_INTERFACE_CODE_REVISION, STORAGE_FILE_EXTENSIONS
 
Constructor Summary
ZipfLaw(IFeatureExtraction poFeatureExtraction)
          Classification API.
ZipfLaw(java.lang.String pstrStatsFilename)
          Takes a filename argument.
 
Method Summary
 void backSynchronizeObject()
          Must be overridden by modules that use object serialization with the generic implementation of restore().
 boolean classify(double[] padFeatureVector)
          Not Implemented.
 void collectStatistics(double[] padFeatures)
          Collects result statistics.
 void collectStatistics(java.io.StreamTokenizer poStreamTokenizer)
          Collects result statistics.
 void dump()
          An object must know how to dump itself or its data structures to a file.
 void dumpAll()
          Dumps results to STDOUT.
 void dumpCSV()
          Implements CSV dump through the dumpGraphValues() method.
 void dumpGraphValues()
          Dumps CSV values of the rank and frequency into a file.
static java.lang.String getMARFSourceCodeRevision()
          Retrieves the class's revision.
 int getMaxWordLength()
          Allows getting the length of the longest word found (in characters).
 int getMinWordLength()
          Allows getting the length of the shortest word found (in characters).
 Result getResult()
          Retrieves the maximum-probability classification result.
 StatisticalObject[] getSortedStatRefs()
          Allows getting an array of sorted references to WordStats objects.
 java.util.Hashtable getStats()
          Allows getting the raw Hashtable of the WordStats objects.
 WordStats getWordStats(java.lang.String pstrLexeme)
          Allows getting a particular WordStats object by its lexeme.
 boolean isDumpLogariphmOn()
          Allows examining the value of the log-log flag.
 void restore()
          An object must know how to restore itself or its data structures from a file.
 void setDumpLogariphm(boolean pbDumpLogariphm)
          Allows setting the dump log-log flag, which tells the module to dump graphs on a log-log scale.
 java.lang.String toString()
          Reports the minimum and maximum word lengths and the dictionary itself in the form of a String.
 boolean train(double[] padFeatureVector)
          Not Implemented.
 
Methods inherited from class marf.Classification.Classification
classify, clone, getFeatureExtraction, getResultSet, getTrainingSetFilename, loadTrainingSet, setFeatureExtraction, train
 
Methods inherited from class marf.Storage.StorageManager
dumpBinary, dumpGzipBinary, dumpHTML, dumpSQL, dumpXML, enableDumpOnNotFound, equals, getDefaultExtension, getDefaultExtension, getDumpMode, getFilename, getObjectToSerialize, hashCode, restoreBinary, restoreCSV, restoreGzipBinary, restoreHTML, restoreSQL, restoreXML, setDumpMode, setFilename
 
Methods inherited from class java.lang.Object
finalize, getClass, notify, notifyAll, wait, wait, wait
 

Field Detail

DEFAULT_OUTPUT_PAGE_SIZE

public static final int DEFAULT_OUTPUT_PAGE_SIZE
Default number of entries to display/output per page.

Since:
0.3.0.5
See Also:
Constant Field Values
Constructor Detail

ZipfLaw

public ZipfLaw(java.lang.String pstrStatsFilename)
Takes a filename argument.

Parameters:
pstrStatsFilename - the desired file to process

ZipfLaw

public ZipfLaw(IFeatureExtraction poFeatureExtraction)
Classification API.

Parameters:
poFeatureExtraction - feature extraction module to get the data from
Method Detail

classify

public boolean classify(double[] padFeatureVector)
                 throws ClassificationException
Description copied from class: Stochastic
Not Implemented.

Specified by:
classify in interface IClassification
Overrides:
classify in class Stochastic
Parameters:
padFeatureVector - vector of features to compare with the stored ones
Returns:
nothing
Throws:
ClassificationException - never thrown
Since:
0.3.0.6
See Also:
IClassification.classify(double[])

train

public boolean train(double[] padFeatureVector)
              throws ClassificationException
Description copied from class: Stochastic
Not Implemented.

Specified by:
train in interface IClassification
Overrides:
train in class Stochastic
Parameters:
padFeatureVector - feature vector to train on
Returns:
nothing
Throws:
ClassificationException - never thrown
Since:
0.3.0.6
See Also:
IClassification.train(double[])

getResult

public Result getResult()
Description copied from class: Stochastic
Retrieves the maximum-probability classification result.

Specified by:
getResult in interface IClassification
Overrides:
getResult in class Stochastic
Returns:
Result object
Since:
0.3.0.6
See Also:
IClassification.getResult()

collectStatistics

public final void collectStatistics(double[] padFeatures)
                             throws ClassificationException
Collects result statistics. TODO: employ StatsCollector.

Parameters:
padFeatures - array of features to collect statistics from
Throws:
ClassificationException - in case of inner exceptions

collectStatistics

public final void collectStatistics(java.io.StreamTokenizer poStreamTokenizer)
                             throws ClassificationException
Collects result statistics. TODO: employ StatsCollector.

Parameters:
poStreamTokenizer - desired stream tokenizer
Throws:
ClassificationException - in case of inner exceptions
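
As a rough illustration of collecting word statistics from a java.io.StreamTokenizer, the sketch below counts word tokens by lexeme. It is a minimal, self-contained example and not MARF's implementation: the class TokenCountSketch, the lower-casing of lexemes, and the use of plain Integer counts in place of WordStats objects are all assumptions. I/O errors are wrapped in an unchecked exception, loosely mirroring the way this method wraps inner exceptions.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Hashtable;

// Hypothetical illustration; not part of MARF.
final class TokenCountSketch {
    /** Reads the tokenizer to the end and counts word tokens by lower-cased lexeme. */
    static Hashtable<String, Integer> count(StreamTokenizer tokenizer) {
        Hashtable<String, Integer> stats = new Hashtable<>();
        try {
            while (tokenizer.nextToken() != StreamTokenizer.TT_EOF) {
                // Numbers and punctuation are skipped; only word tokens are counted.
                if (tokenizer.ttype == StreamTokenizer.TT_WORD) {
                    stats.merge(tokenizer.sval.toLowerCase(), 1, Integer::sum);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return stats;
    }

    public static void main(String[] args) {
        StreamTokenizer tokenizer =
            new StreamTokenizer(new StringReader("The cat saw the dog"));
        System.out.println(count(tokenizer));
    }
}
```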

dumpAll

public final void dumpAll()
Dumps results to STDOUT.


dumpGraphValues

public final void dumpGraphValues()
                           throws java.io.IOException
Dumps CSV values of the rank and frequency into a file. The filename is composed of the original corpus name plus the .csv extension. By default, the values are dumped on a logarithmic scale.

Throws:
java.io.IOException
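
The rank/frequency dump can be pictured with the following minimal sketch, which builds the CSV text in memory instead of writing a file. It is not MARF's code: the class RankFrequencyCsvSketch, the four-decimal formatting, and the choice of the natural logarithm for both columns are assumptions made for illustration.

```java
import java.util.Locale;

// Hypothetical illustration; not part of MARF.
final class RankFrequencyCsvSketch {
    /** Composes the output name as the corpus name plus the .csv extension. */
    static String csvName(String corpusName) {
        return corpusName + ".csv";
    }

    /**
     * Builds one "rank,frequency" row per entry. Frequencies must be positive
     * and sorted in descending order. When logScale is true, both columns are
     * replaced by their natural logarithms (an assumed choice of base).
     */
    static String toCsv(int[] sortedFrequencies, boolean logScale) {
        StringBuilder csv = new StringBuilder();
        for (int i = 0; i < sortedFrequencies.length; i++) {
            double rank = i + 1;
            double frequency = sortedFrequencies[i];
            if (logScale) {
                rank = Math.log(rank);
                frequency = Math.log(frequency);
            }
            csv.append(String.format(Locale.ROOT, "%.4f,%.4f\n", rank, frequency));
        }
        return csv.toString();
    }

    public static void main(String[] args) {
        System.out.println(csvName("corpus.txt"));
        System.out.print(toCsv(new int[] {4, 2, 1}, true));
    }
}
```

Plotted on log-log axes, data that follows Zipf's Law falls roughly on a straight line with a slope near -1, which is why a logarithmic dump is convenient for graphing.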

backSynchronizeObject

public void backSynchronizeObject()
Description copied from class: StorageManager
Must be overridden by modules that use object serialization with the generic implementation of restore(). By default this method is unimplemented.

Overrides:
backSynchronizeObject in class StorageManager
Since:
0.3.0.5
See Also:
restore(), StorageManager.backSynchronizeObject()
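
The source does not show how back-synchronization is wired internally. A common pattern, sketched below under that caveat, is that a generic restore deserializes into an untyped Object reference and a module-specific hook then copies it back into the module's typed fields. The class BackSyncSketch and all of its members are hypothetical, and the data is serialized to a byte array rather than a file to keep the example self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.Hashtable;

// Hypothetical illustration; not part of MARF.
final class BackSyncSketch {
    /** Typed data the module actually works with. */
    private Hashtable<String, Integer> stats = new Hashtable<>();

    /** Untyped reference filled in by the generic restore step. */
    private Object restored;

    void put(String lexeme, int frequency) {
        stats.put(lexeme, frequency);
    }

    Integer get(String lexeme) {
        return stats.get(lexeme);
    }

    /** Serializes the typed data to a byte array. */
    byte[] dump() {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(stats);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException("dump failed", e);
        }
    }

    /** Generic restore: knows only about Object, then triggers back-synchronization. */
    void restore(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            restored = in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("restore failed", e);
        }
        backSynchronize();
    }

    /** Module-specific hook: copies the restored object back into the typed field. */
    @SuppressWarnings("unchecked")
    private void backSynchronize() {
        stats = (Hashtable<String, Integer>) restored;
    }
}
```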

dump

public void dump()
          throws StorageException
An object must know how to dump itself or its data structures to a file. Options are: object serialization and CSV. Internally, the method calls the appropriate dump*() method based on the current dump mode. This derivative uses only the DUMP_GZIP_BINARY, DUMP_BINARY and DUMP_CSV_TEXT modes.

Specified by:
dump in interface IStorageManager
Overrides:
dump in class Classification
Throws:
StorageException - if saving to a file fails for some reason or the dump mode is set to an unsupported value
Since:
0.3.0.5
See Also:
StorageManager.dumpGzipBinary(), dumpCSV(), StorageManager.dumpBinary(), backSynchronizeObject()
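
The mode-based dispatch described above can be sketched as follows. This is an illustration only: the enum Mode is a hypothetical stand-in for the DUMP_* constants (whose actual values are not shown on this page), the method returns the name of the routine it would call instead of calling it, and an unchecked exception stands in for StorageException.

```java
// Hypothetical illustration; not part of MARF.
final class DumpDispatchSketch {
    /** Stand-ins for the DUMP_* constants; XML represents a mode this derivative does not use. */
    enum Mode { GZIP_BINARY, BINARY, CSV_TEXT, XML }

    /** Returns the name of the routine that would handle the mode, or throws if unsupported. */
    static String dispatch(Mode mode) {
        switch (mode) {
            case GZIP_BINARY:
                return "dumpGzipBinary";
            case BINARY:
                return "dumpBinary";
            case CSV_TEXT:
                return "dumpCSV";
            default:
                throw new UnsupportedOperationException("Unsupported dump mode: " + mode);
        }
    }

    public static void main(String[] args) {
        System.out.println(dispatch(Mode.CSV_TEXT));
    }
}
```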

restore

public void restore()
             throws StorageException
An object must know how to restore itself or its data structures from a file. Options are: object serialization and CSV. Internally, the method calls the appropriate restore*() method based on the current dump mode.

Specified by:
restore in interface IStorageManager
Overrides:
restore in class Classification
Throws:
StorageException - if loading from a file fails for some reason or the dump mode is set to an unsupported value
Since:
0.3.0.5
See Also:
IStorageManager.DUMP_GZIP_BINARY, IStorageManager.DUMP_BINARY, IStorageManager.DUMP_CSV_TEXT, StorageManager.dumpGzipBinary(), StorageManager.dumpBinary(), dumpCSV(), backSynchronizeObject(), StorageManager.iCurrentDumpMode

dumpCSV

public void dumpCSV()
             throws StorageException
Implements CSV dump through the dumpGraphValues() method.

Specified by:
dumpCSV in interface IStorageManager
Overrides:
dumpCSV in class StorageManager
Throws:
StorageException - in case of any I/O error
Since:
0.3.0.5
See Also:
dumpGraphValues()

isDumpLogariphmOn

public boolean isDumpLogariphmOn()
Allows examining the value of the log-log flag.

Returns:
the current value of the flag
Since:
0.3.0.5
See Also:
setDumpLogariphm(boolean)

setDumpLogariphm

public void setDumpLogariphm(boolean pbDumpLogariphm)
Allows setting the dump log-log flag, which tells the module to dump graphs on a log-log scale.

Parameters:
pbDumpLogariphm - new value of the log-log flag
Since:
0.3.0.5

getSortedStatRefs

public final StatisticalObject[] getSortedStatRefs()
Allows getting an array of sorted references to WordStats objects.

Returns:
the sorted WordStats array
Since:
0.3.0.5
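
The idea of an array of sorted references can be sketched as below: the array is ordered, but its elements are the same objects held in the statistics table, not copies. This is a minimal example and not MARF's code. The class SortedRefsSketch, the nested Entry record, and the ordering (most frequent first, ties broken by lexeme) are assumptions; the page itself does not state the sort order.

```java
import java.util.Arrays;
import java.util.Hashtable;

// Hypothetical illustration; not part of MARF.
final class SortedRefsSketch {
    /** Stand-in for a per-word statistics record. */
    static final class Entry {
        final String lexeme;
        final int frequency;

        Entry(String lexeme, int frequency) {
            this.lexeme = lexeme;
            this.frequency = frequency;
        }
    }

    /** Returns references to the table's entries, most frequent first; ties ordered by lexeme. */
    static Entry[] sortedRefs(Hashtable<String, Entry> stats) {
        Entry[] refs = stats.values().toArray(new Entry[0]);
        Arrays.sort(refs, (a, b) -> {
            int byFrequency = Integer.compare(b.frequency, a.frequency);
            return byFrequency != 0 ? byFrequency : a.lexeme.compareTo(b.lexeme);
        });
        return refs;
    }

    public static void main(String[] args) {
        Hashtable<String, Entry> stats = new Hashtable<>();
        stats.put("dog", new Entry("dog", 1));
        stats.put("the", new Entry("the", 2));
        stats.put("cat", new Entry("cat", 1));
        Entry[] refs = sortedRefs(stats);
        for (int i = 0; i < refs.length; i++) {
            System.out.println((i + 1) + "\t" + refs[i].lexeme + "\t" + refs[i].frequency);
        }
    }
}
```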

getStats

public final java.util.Hashtable getStats()
Allows getting the raw Hashtable of the WordStats objects.

Returns:
the stats hashtable
Since:
0.3.0.5

getWordStats

public final WordStats getWordStats(java.lang.String pstrLexeme)
Allows getting a particular WordStats object by its lexeme.

Parameters:
pstrLexeme - lexeme to look up the WordStats entry
Returns:
the corresponding WordStats entry or null if not found
Since:
0.3.0.5

getMaxWordLength

public final int getMaxWordLength()
Allows getting the length of the longest word found (in characters).

Returns:
the length of the longest word in the dictionary
Since:
0.3.0.5

getMinWordLength

public final int getMinWordLength()
Allows getting the length of the shortest word found (in characters).

Returns:
the length of the shortest word in the dictionary
Since:
0.3.0.5

toString

public java.lang.String toString()
Reports the minimum and maximum word lengths and the dictionary itself in the form of a String.

Overrides:
toString in class StorageManager
Since:
0.3.0.5
See Also:
Object.toString()

getMARFSourceCodeRevision

public static java.lang.String getMARFSourceCodeRevision()
Retrieves the class's revision.

Returns:
revision string

