marf.Stats.StatisticalEstimators
Class StatisticalEstimator

java.lang.Object
  extended by marf.Storage.StorageManager
      extended by marf.Stats.StatisticalEstimators.StatisticalEstimator
All Implemented Interfaces:
java.io.Serializable, java.lang.Cloneable, IStatisticalEstimator, IStorageManager
Direct Known Subclasses:
GLI, KatzBackoff, SLI, Smoothing

public abstract class StatisticalEstimator
extends StorageManager
implements IStatisticalEstimator

Implements generic statistical estimator routines. Must be subclassed by concrete implementations of statistical estimators.

$Id: StatisticalEstimator.java,v 1.30 2007/12/18 21:57:15 mokhov Exp $

Since:
0.3.0.2
Version:
$Revision: 1.30 $
Author:
Serguei Mokhov
See Also:
Serialized Form

Field Summary
protected  ProbabilityTable oProbabilityTable
          Probabilities table to perform estimation on.
protected  NLPStreamTokenizer oStreamTokenizer
          Stream tokenizer for NLP processing to get word and non-word tokens from.
private static long serialVersionUID
          For serialization versioning.
 
Fields inherited from class marf.Storage.StorageManager
bDumpOnNotFound, iCurrentDumpMode, oObjectToSerialize, strFilename
 
Fields inherited from interface marf.Stats.StatisticalEstimators.IStatisticalEstimator
MARF_INTERFACE_CODE_REVISION
 
Fields inherited from interface marf.Storage.IStorageManager
DUMP_BINARY, DUMP_CSV_TEXT, DUMP_GZIP_BINARY, DUMP_HTML, DUMP_SQL, DUMP_XML, MARF_INTERFACE_CODE_REVISION, STORAGE_FILE_EXTENSIONS
 
Constructor Summary
StatisticalEstimator()
          Default constructor creates a new probability table based on language with the default filename.
 
Method Summary
 void backSynchronizeObject()
          Updates the reference to the probabilities table for future serialization after restoration.
 void dumpCSV()
          Not implemented.
 void dumpXML()
          Not implemented.
 java.lang.String getFilename()
          Sets the default filename for dumps, e.g. nlp.StatisticalEstimators.Smoothing.WittenBell.1.en.gzbin.
 java.lang.String getLanguage()
          Retrieves current language.
static java.lang.String getMARFSourceCodeRevision()
          Retrieves the class's revision.
 ProbabilityTable getProbabilityTable()
          Retrieves current probabilities table.
 NLPStreamTokenizer getStreamTokenizer()
          Retrieves current stream tokenizer.
 double p()
          N-gram-based probability classification.
 java.lang.String resetFilename()
          Resets the internal filename to the default and returns it.
 void restoreCSV()
          Not implemented.
 void restoreXML()
          Not implemented.
 void setLanguage(java.lang.String pstrLang)
          Allows alteration of the current language being processed.
 void setStreamTokenizer(NLPStreamTokenizer poStreamTokenizer)
          Sets desired stream tokenizer.
 boolean train()
          Every estimator needs to implement its specific training method.
 
Methods inherited from class marf.Storage.StorageManager
clone, dump, dumpBinary, dumpGzipBinary, dumpHTML, dumpSQL, enableDumpOnNotFound, equals, getDefaultExtension, getDefaultExtension, getDumpMode, getObjectToSerialize, hashCode, restore, restoreBinary, restoreGzipBinary, restoreHTML, restoreSQL, setDumpMode, setFilename, toString
 
Methods inherited from class java.lang.Object
finalize, getClass, notify, notifyAll, wait, wait, wait
 

Field Detail

oProbabilityTable

protected ProbabilityTable oProbabilityTable
Probabilities table to perform estimation on.


oStreamTokenizer

protected NLPStreamTokenizer oStreamTokenizer
Stream tokenizer for NLP processing to get word and non-word tokens from.


serialVersionUID

private static final long serialVersionUID
For serialization versioning. When adding new members or making other structural changes, regenerate this number with the serialver tool that comes with the JDK.

Since:
0.3.0.5
See Also:
Constant Field Values
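
The role of serialVersionUID can be illustrated with plain JDK serialization. The following is a minimal sketch, not MARF code: the class names SerialVersionDemo and Table and the value 1L are placeholders chosen for illustration. It pins a version number on a serializable type, reads it back through ObjectStreamClass, and round-trips an instance in memory.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.Serializable;

final class SerialVersionDemo {
    // Hypothetical serializable type with an explicitly pinned version number.
    static final class Table implements Serializable {
        private static final long serialVersionUID = 1L; // placeholder value
        int entries = 3;
    }

    // Serializes the object to a byte array and reads it back.
    static Table roundTrip(Table table) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(table);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Table) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Reports the version number the serialization runtime associates with a class.
    static long declaredVersion(Class<?> type) {
        return ObjectStreamClass.lookup(type).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println(declaredVersion(Table.class));
        System.out.println(roundTrip(new Table()).entries);
    }
}
```

If the declared number differs between the class that wrote a stream and the class that reads it, deserialization fails with java.io.InvalidClassException, which is why the number is regenerated only on structural changes.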
Constructor Detail

StatisticalEstimator

public StatisticalEstimator()
Default constructor creates a new probability table based on language with the default filename.

See Also:
MARF.NLP.getLanguage(), getFilename()
Method Detail

p

public final double p()
N-gram-based probability classification.

Specified by:
p in interface IStatisticalEstimator
Returns:
calculated probability value

train

public boolean train()
Every estimator needs to implement its specific training method.

Specified by:
train in interface IStatisticalEstimator
Returns:
true if training was successful
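
The division of labour between a shared base class and estimator-specific training can be sketched as follows. This is an illustration with stand-in types, not the MARF classes: EstimatorSketch, RelativeFrequencyEstimator, and probabilityOf are hypothetical names, and the signatures are simplified (the documented train() takes no arguments, whereas this sketch passes the tokens in directly).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Small driver so the sketch can be run on its own.
final class EstimatorDemo {
    public static void main(String[] args) {
        RelativeFrequencyEstimator estimator = new RelativeFrequencyEstimator();
        boolean trained = estimator.train(Arrays.asList("a", "b", "a", "c"));
        System.out.println(trained + " " + estimator.probabilityOf("a"));
    }
}

// Stand-in for an abstract estimator base class; NOT the MARF class itself.
abstract class EstimatorSketch {
    // Plays the role of the probability table that training fills in.
    protected final Map<String, Double> probabilities = new HashMap<>();

    // Each concrete estimator supplies its own training procedure.
    public abstract boolean train(List<String> tokens);

    // Shared lookup; unseen tokens get probability 0 in this sketch.
    public final double probabilityOf(String token) {
        return probabilities.getOrDefault(token, 0.0);
    }
}

// Hypothetical concrete estimator: relative unigram frequencies, no smoothing.
final class RelativeFrequencyEstimator extends EstimatorSketch {
    @Override
    public boolean train(List<String> tokens) {
        if (tokens.isEmpty()) {
            return false; // nothing to learn from
        }
        Map<String, Integer> counts = new HashMap<>();
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
        }
        probabilities.clear();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            probabilities.put(entry.getKey(), entry.getValue() / (double) tokens.size());
        }
        return true;
    }
}
```

A smoothing estimator would differ only in how train() turns counts into probabilities; the lookup in the base class stays the same.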

backSynchronizeObject

public void backSynchronizeObject()
Updates the reference to the probabilities table for future serialization after restoration. A part of the StorageManager interface.

Overrides:
backSynchronizeObject in class StorageManager
Since:
0.3.0.5
See Also:
StorageManager.backSynchronizeObject()
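
One plausible reading of back-synchronization, sketched with stand-in types: the storage layer restores into a generic object slot, and the typed field used by the estimator must then be re-pointed at the restored object. Everything here (BackSyncSketch, its fields, and the in-memory dump and restore) is an assumption made for illustration and does not reproduce the MARF implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.HashMap;

final class BackSyncSketch {
    // Typed view the estimator works with.
    private HashMap<String, Double> table = new HashMap<>();
    // Generic slot that the storage layer writes and reads.
    private Object objectToSerialize = table;

    HashMap<String, Double> table() {
        return table;
    }

    // Serializes whatever the generic slot currently refers to.
    byte[] dump() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(objectToSerialize);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Restores into the generic slot, then re-points the typed field.
    void restore(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            objectToSerialize = in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        backSynchronize();
    }

    // Without this step the typed field would still refer to the pre-restore object.
    @SuppressWarnings("unchecked")
    private void backSynchronize() {
        table = (HashMap<String, Double>) objectToSerialize;
    }

    public static void main(String[] args) {
        BackSyncSketch sketch = new BackSyncSketch();
        sketch.table().put("a", 0.5);
        byte[] saved = sketch.dump();
        sketch.table().clear();
        sketch.restore(saved);
        System.out.println(sketch.table());
    }
}
```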

dumpCSV

public void dumpCSV()
             throws StorageException
Not implemented.

Specified by:
dumpCSV in interface IStorageManager
Overrides:
dumpCSV in class StorageManager
Throws:
NotImplementedException
StorageException - never thrown
See Also:
StorageManager.dump()

dumpXML

public void dumpXML()
             throws StorageException
Not implemented.

Specified by:
dumpXML in interface IStorageManager
Overrides:
dumpXML in class StorageManager
Throws:
NotImplementedException
StorageException - never thrown
See Also:
StorageManager.dump()

restoreCSV

public void restoreCSV()
                throws StorageException
Not implemented.

Specified by:
restoreCSV in interface IStorageManager
Overrides:
restoreCSV in class StorageManager
Throws:
NotImplementedException
StorageException - never thrown
See Also:
StorageManager.restore()

restoreXML

public void restoreXML()
                throws StorageException
Not implemented.

Specified by:
restoreXML in interface IStorageManager
Overrides:
restoreXML in class StorageManager
Throws:
NotImplementedException
StorageException - never thrown
See Also:
StorageManager.restore()
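
Because NotImplementedException is listed under Throws but not in the throws clause, it is an unchecked exception for the four CSV and XML methods above, so catching StorageException alone does not intercept it. The following sketch shows one way a caller might fall back to a supported format. It uses stand-in types only: StorageFailure, DumpAction, and dumpWithFallback are hypothetical, and the JDK's UnsupportedOperationException plays the role of the unchecked exception.

```java
final class DumpFallbackSketch {
    // Stand-in for a checked storage exception.
    static final class StorageFailure extends Exception {
        StorageFailure(String message) {
            super(message);
        }
    }

    // Stand-in for a single dump routine such as a CSV or binary dump.
    interface DumpAction {
        void run() throws StorageFailure;
    }

    // Tries the preferred format; falls back when it is reported as unimplemented.
    static String dumpWithFallback(DumpAction preferred, DumpAction fallback) {
        try {
            preferred.run();
            return "preferred";
        } catch (UnsupportedOperationException e) {
            // Unchecked: a catch clause for the checked exception alone would miss it.
        } catch (StorageFailure e) {
            return "failed";
        }
        try {
            fallback.run();
            return "fallback";
        } catch (StorageFailure e) {
            return "failed";
        }
    }

    public static void main(String[] args) {
        String outcome = dumpWithFallback(
            () -> { throw new UnsupportedOperationException("CSV dump not implemented"); },
            () -> { /* a supported format would be written here */ });
        System.out.println(outcome);
    }
}
```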

setStreamTokenizer

public final void setStreamTokenizer(NLPStreamTokenizer poStreamTokenizer)
Description copied from interface: IStatisticalEstimator
Sets desired stream tokenizer.

Specified by:
setStreamTokenizer in interface IStatisticalEstimator
Parameters:
poStreamTokenizer - NLPStreamTokenizer or a derivative to use for tokens
See Also:
IStatisticalEstimator.setStreamTokenizer(marf.nlp.util.NLPStreamTokenizer)

getStreamTokenizer

public NLPStreamTokenizer getStreamTokenizer()
Description copied from interface: IStatisticalEstimator
Retrieves current stream tokenizer.

Specified by:
getStreamTokenizer in interface IStatisticalEstimator
Returns:
the stream tokenizer being used
See Also:
IStatisticalEstimator.getStreamTokenizer()

getProbabilityTable

public ProbabilityTable getProbabilityTable()
Description copied from interface: IStatisticalEstimator
Retrieves current probabilities table.

Specified by:
getProbabilityTable in interface IStatisticalEstimator
Returns:
probabilities table being used
See Also:
IStatisticalEstimator.getProbabilityTable()

setLanguage

public final void setLanguage(java.lang.String pstrLang)
Description copied from interface: IStatisticalEstimator
Allows alteration of the current language being processed.

Specified by:
setLanguage in interface IStatisticalEstimator
Parameters:
pstrLang - desired language
See Also:
IStatisticalEstimator.setLanguage(java.lang.String)

getLanguage

public final java.lang.String getLanguage()
Description copied from interface: IStatisticalEstimator
Retrieves current language.

Specified by:
getLanguage in interface IStatisticalEstimator
Returns:
name of the language being processed
See Also:
IStatisticalEstimator.getLanguage()

resetFilename

public final java.lang.String resetFilename()
Resets the internal filename to the default and returns it.

Returns:
the reset filename
See Also:
getFilename()

getFilename

public final java.lang.String getFilename()
Sets the default filename for dumps as e.g. nlp.StatisticalEstimators.Smoothing.WittenBell.1.en.gzbin. More generally, <estimator/smoothing>.<ngram-model>.<lang>.gzbin

Overrides:
getFilename in class StorageManager
Returns:
filename string
See Also:
StorageManager.getFilename()
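
The naming pattern described for getFilename() can be reproduced with simple string concatenation. This is a sketch under assumptions: DumpFilenameSketch and defaultFilename are hypothetical names, all three components are treated as plain strings, and the real method derives them from the estimator's own state rather than from parameters.

```java
final class DumpFilenameSketch {
    // Joins the three documented components and appends the .gzbin extension.
    static String defaultFilename(String estimator, String ngramModel, String lang) {
        return estimator + "." + ngramModel + "." + lang + ".gzbin";
    }

    public static void main(String[] args) {
        // prints nlp.StatisticalEstimators.Smoothing.WittenBell.1.en.gzbin
        System.out.println(
            defaultFilename("nlp.StatisticalEstimators.Smoothing.WittenBell", "1", "en"));
    }
}
```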

getMARFSourceCodeRevision

public static java.lang.String getMARFSourceCodeRevision()
Retrieves the class's revision.

Returns:
revision string

