java.lang.Object
  extended by marf.Storage.StorageManager
      extended by marf.nlp.Parsing.ProbabilisticParser
public class ProbabilisticParser
A probabilistic parser parses a natural-language (e.g., English) sentence using a probabilistic grammar. Since natural-language sentences are ambiguous and a single sentence may have more than one parse, each grammar rule is assigned a probability, and the parse with the highest overall probability is chosen. This class implements the well-known probabilistic CYK parsing algorithm.
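To make the scoring concrete: under a probabilistic grammar, the probability of a parse tree is the product of the probabilities of the rules used to build it. The sketch below scores the two prepositional-phrase attachments of "saw the man with the telescope" and keeps the more probable one. The class name, rule strings, and probabilities are hypothetical toy values, not MARF's grammar format.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class ParseScoring {
    // Hypothetical toy PCFG: rule -> P(rule). Not MARF's grammar file format.
    static final Map<String, Double> GRAMMAR = Map.of(
        "VP -> V NP PP", 0.3,
        "VP -> V NP", 0.7,
        "NP -> NP PP", 0.2,
        "NP -> Det N", 0.8,
        "PP -> P NP", 1.0);

    // A parse tree's probability is the product of the probabilities of its rules.
    static double parseProbability(List<String> rulesUsed) {
        double p = 1.0;
        for (String rule : rulesUsed) {
            p *= GRAMMAR.get(rule);
        }
        return p;
    }

    public static void main(String[] args) {
        // Two readings of "saw the man with the telescope". Both readings share the
        // same lexical rules (V -> saw, Det -> the, ...), so those are omitted.
        List<String> verbAttach = List.of(
            "VP -> V NP PP", "NP -> Det N", "PP -> P NP", "NP -> Det N");
        List<String> nounAttach = List.of(
            "VP -> V NP", "NP -> NP PP", "NP -> Det N", "PP -> P NP", "NP -> Det N");

        double pVerb = parseProbability(verbAttach);
        double pNoun = parseProbability(nounAttach);
        System.out.printf(Locale.ROOT, "verb attachment %.4f, noun attachment %.4f%n", pVerb, pNoun);
        // Keep the more probable analysis.
        System.out.println(pVerb >= pNoun ? "choose verb attachment" : "choose noun attachment");
    }
}
```

With these numbers, the verb attachment scores 0.3 × 0.8 × 1.0 × 0.8 = 0.192 and the noun attachment scores 0.7 × 0.2 × 0.8 × 1.0 × 0.8 = 0.0896, so the verb-attachment reading is chosen.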
The CYK algorithm is given below in pseudocode. The main reference is here.
 function CYK(words, grammar) returns the most probable parse
                                  and its probability
 
     create and clear pi[num_words, num_words, num_nonterminals]
     create back[num_words, num_words, num_nonterminals]
 
     # base case
     for i <-- 1 to num_words
         for A <-- 1 to num_nonterminals
             if (A --> w_i) is in grammar then
                 pi[i, i, A] = P(A --> w_i)
 
     # recursive case
     for span <-- 2 to num_words
         for begin <-- 1 to num_words - span + 1
             end <-- begin + span - 1
             for m <-- begin to end - 1
 
                 for A <-- 1 to num_nonterminals
                     for B <-- 1 to num_nonterminals
                         for C <-- 1 to num_nonterminals
 
                             prob = pi[begin, m, B] * pi[m + 1, end, C] * P(A --> B C)
 
                             if (prob > pi[begin, end, A]) then
                                 pi[begin, end, A] = prob
                                 back[begin, end, A] = {m, B, C}
 
     return build_tree(back[1, num_words, 1]), pi[1, num_words, 1]
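A minimal Java sketch of the pseudocode above, assuming a grammar already in Chomsky normal form (only A -> B C and A -> w rules), 0-based word indices, and nonterminal 0 as the start symbol (the pseudocode's nonterminal 1). The class, record, and grammar names are hypothetical; this is not the ProbabilisticParser source. Instead of looping over every (A, B, C) triple, it loops over the binary rules that exist, which is equivalent because P(A -> B C) = 0 for any rule not in the grammar.

```java
import java.util.List;

public class ProbabilisticCyk {
    record Binary(int a, int b, int c, double p) {}  // A -> B C with probability p
    record Lexical(int a, String word, double p) {}  // A -> word with probability p

    static final int S = 0, NP = 1, VP = 2, PP = 3, V = 4, P = 5, DET = 6, N = 7;
    static final String[] NT = {"S", "NP", "VP", "PP", "V", "P", "Det", "N"};

    // Hypothetical toy grammar in Chomsky normal form.
    static final List<Binary> BINARY = List.of(
        new Binary(S, NP, VP, 1.0),
        new Binary(VP, V, NP, 0.6), new Binary(VP, VP, PP, 0.4),
        new Binary(NP, NP, PP, 0.2), new Binary(NP, DET, N, 0.7),
        new Binary(PP, P, NP, 1.0));
    static final List<Lexical> LEXICAL = List.of(
        new Lexical(NP, "I", 0.1), new Lexical(V, "saw", 1.0), new Lexical(P, "with", 1.0),
        new Lexical(DET, "the", 1.0), new Lexical(N, "man", 0.5), new Lexical(N, "telescope", 0.5));

    double[][][] pi;   // pi[begin][end][A]: best probability of A spanning words begin..end
    int[][][][] back;  // back[begin][end][A] = {m, B, C} for that best analysis
    String[] words;

    double parse(String[] w) {
        words = w;
        int n = w.length, k = NT.length;
        pi = new double[n][n][k];
        back = new int[n][n][k][];
        // Base case: lexical rules A -> w_i.
        for (int i = 0; i < n; i++)
            for (Lexical r : LEXICAL)
                if (r.word().equals(w[i])) pi[i][i][r.a()] = r.p();
        // Recursive case: combine two adjacent sub-spans with A -> B C.
        for (int span = 2; span <= n; span++)
            for (int begin = 0; begin + span - 1 < n; begin++) {
                int end = begin + span - 1;
                for (int m = begin; m < end; m++)
                    for (Binary r : BINARY) {
                        double prob = pi[begin][m][r.b()] * pi[m + 1][end][r.c()] * r.p();
                        if (prob > pi[begin][end][r.a()]) {
                            pi[begin][end][r.a()] = prob;
                            back[begin][end][r.a()] = new int[] {m, r.b(), r.c()};
                        }
                    }
            }
        return pi[0][n - 1][S];
    }

    // build_tree: follows back-pointers to produce a bracketed tree for A over begin..end.
    String buildTree(int begin, int end, int a) {
        if (begin == end) return "(" + NT[a] + " " + words[begin] + ")";
        int[] bp = back[begin][end][a];
        return "(" + NT[a] + " " + buildTree(begin, bp[0], bp[1]) + " "
             + buildTree(bp[0] + 1, end, bp[2]) + ")";
    }

    public static void main(String[] args) {
        ProbabilisticCyk cyk = new ProbabilisticCyk();
        double p = cyk.parse("I saw the man with the telescope".split(" "));
        if (p > 0) System.out.println(p + " " + cyk.buildTree(0, cyk.words.length - 1, S));
        else System.out.println("no parse");
    }
}
```

On the toy sentence, the chart keeps the verb-attachment reading, with probability 0.1 × 0.4 × 0.6 × 0.7 × 0.5 × 0.7 × 0.5 = 0.00294, over the noun-attachment reading (0.00147). The back-pointer table has the same dimensions as the probability chart and lets build_tree recover the winning tree without re-parsing.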
 
 $Id: ProbabilisticParser.java,v 1.31 2007/12/18 21:37:54 mokhov Exp $
| Field Summary |
|---|
| Fields inherited from class marf.Storage.StorageManager: bDumpOnNotFound, iCurrentDumpMode, oObjectToSerialize, strFilename |
| Fields inherited from interface marf.Storage.IStorageManager: DUMP_BINARY, DUMP_CSV_TEXT, DUMP_GZIP_BINARY, DUMP_HTML, DUMP_SQL, DUMP_XML, MARF_INTERFACE_CODE_REVISION, STORAGE_FILE_EXTENSIONS |
| Constructor Summary | Description |
|---|---|
| ProbabilisticParser() | Initializes a default probabilistic parser with an empty grammar. |
| ProbabilisticParser(java.io.StreamTokenizer poStreamTokenizer) | Initializes a probabilistic parser with the specified tokenizer. |
| ProbabilisticParser(java.lang.String pstrGrammarFilename) | Initializes a probabilistic parser with the specified grammar filename. |
| Modifier and Type | Method Summary | Description |
|---|---|---|
| void | backSynchronizeObject() | Implements the StorageManager interface. |
| void | dumpBackPointersContents() | Dumps the back-pointers to STDOUT. |
| void | dumpParseMatrix() | Dumps the parse matrix to STDOUT. |
| void | dumpParseTree() | Dumps the parse tree to STDOUT. |
| void | dumpParseTree(int piLevel, int i, int j, int piA) | Dumps a parse sub-tree to STDOUT. The initial level of the S non-terminal should be 0. |
| static java.lang.String | getMARFSourceCodeRevision() | Retrieves the class' revision. |
| protected java.lang.String | getSentencePart(int i, int j) | Gets a sentence span given indices. |
| protected void | indent(int piTabSize) | Indents by the specified number of tabs. |
| boolean | parse() | Parses a natural-language sentence using the CYK algorithm. |
| void | setStreamTokenizer(java.io.StreamTokenizer poStreamTokenizer) | Allows setting the desired stream tokenizer. |
| boolean | train() | Trains the parser by compiling the source probabilistic grammar and then dumping it onto disk as a precompiled binary file for future re-loading. |
| Methods inherited from class marf.Storage.StorageManager |
|---|
| clone, dump, dumpBinary, dumpCSV, dumpGzipBinary, dumpHTML, dumpSQL, dumpXML, enableDumpOnNotFound, equals, getDefaultExtension, getDefaultExtension, getDumpMode, getFilename, getObjectToSerialize, hashCode, restore, restoreBinary, restoreCSV, restoreGzipBinary, restoreHTML, restoreSQL, restoreXML, setDumpMode, setFilename, toString |

| Methods inherited from class java.lang.Object |
|---|
| finalize, getClass, notify, notifyAll, wait, wait, wait |
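train() is described as compiling the source grammar and dumping it as a precompiled binary file for later re-loading, and the inherited fields include a DUMP_GZIP_BINARY mode. The sketch below illustrates that compile-once, reload-later idea with standard Java serialization over GZIP streams. It assumes the compiled grammar can be represented as a serializable rule-to-probability map; the class and method names are hypothetical and are not MARF's StorageManager or GrammarCompiler API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GrammarDumpSketch {
    // Serializes an object with Java serialization, gzip-compressed.
    static byte[] toGzipBinary(Serializable obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // Closing the ObjectOutputStream also closes the GZIP stream and writes its trailer.
        try (ObjectOutputStream out = new ObjectOutputStream(new GZIPOutputStream(bytes))) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reverses toGzipBinary().
    static Object fromGzipBinary(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(
                new GZIPInputStream(new ByteArrayInputStream(data)))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical "compiled" grammar: rule -> probability.
        HashMap<String, Double> compiled = new HashMap<>();
        compiled.put("S -> NP VP", 1.0);
        compiled.put("NP -> Det N", 0.7);
        compiled.put("NP -> NP PP", 0.2);

        // Training run: dump the compiled grammar to disk once.
        Path file = Files.createTempFile("grammar", ".bin.gz");
        Files.write(file, toGzipBinary(compiled));

        // Later runs: re-load the binary instead of recompiling the source grammar.
        Object reloaded = fromGzipBinary(Files.readAllBytes(file));
        System.out.println("reloaded equals compiled: " + compiled.equals(reloaded));
        Files.delete(file);
    }
}
```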
| Constructor Detail | 
|---|
public ProbabilisticParser(java.lang.String pstrGrammarFilename)
Parameters: pstrGrammarFilename - the filename of the probabilistic grammar

public ProbabilisticParser(java.io.StreamTokenizer poStreamTokenizer)
Parameters: poStreamTokenizer - the stream tokenizer to read the tokens from

public ProbabilisticParser()
| Method Detail |
|---|

public boolean parse()
              throws SyntaxError
Returns: true if the parse was successful
Throws: SyntaxError - in case of some unusual syntax breakage

public void dumpBackPointersContents()

public void dumpParseMatrix()

public boolean train()
              throws StorageException
Returns: true if the training was successful
Throws: StorageException - in case of any GrammarCompiler error

public void dumpParseTree()
public void dumpParseTree(int piLevel,
                          int i,
                          int j,
                          int piA)
Parameters:
piLevel - starting level (depth) of the tree; also acts as indentation marker
i - left index of the span
j - right index of the span
piA - the non-terminal index

protected void indent(int piTabSize)
Parameters: piTabSize - the number of tab characters to indent by
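dumpParseTree(piLevel, i, j, piA) walks the back-pointers from a span and a non-terminal, with piLevel serving as both depth and indentation, while indent(piTabSize) and getSentencePart(i, j) hint at how each node is printed. The sketch below shows one plausible shape for that recursion. The output format, the class name, and the hand-filled back-pointer table (standing in for a CYK run on "the dog barks") are assumptions, and output goes to a StringBuilder rather than STDOUT so it can be inspected.

```java
import java.util.Arrays;

public class ParseTreeDumpSketch {
    static final String[] NT = {"S", "NP", "VP", "Det", "N"};
    static final int S = 0, NP = 1, VP = 2, DET = 3, N = 4;

    final String[] words;
    final int[][][][] back;  // back[i][j][A] = {m, B, C}, as a CYK pass would fill it
    final StringBuilder out = new StringBuilder();

    ParseTreeDumpSketch(String[] words, int[][][][] back) {
        this.words = words;
        this.back = back;
    }

    // Analogous to indent(piTabSize): one tab per tree level.
    void indent(int tabs) {
        out.append("\t".repeat(tabs));
    }

    // Analogous to getSentencePart(i, j): words i..j inclusive.
    String sentencePart(int i, int j) {
        return String.join(" ", Arrays.copyOfRange(words, i, j + 1));
    }

    // level is both the tree depth and the indentation, as described for piLevel.
    void dumpParseTree(int level, int i, int j, int a) {
        indent(level);
        out.append(NT[a]).append(" -> ").append(sentencePart(i, j)).append('\n');
        if (i == j) return;  // leaf: lexical rule A -> w_i
        int[] bp = back[i][j][a];
        dumpParseTree(level + 1, i, bp[0], bp[1]);
        dumpParseTree(level + 1, bp[0] + 1, j, bp[2]);
    }

    public static void main(String[] args) {
        String[] words = {"the", "dog", "barks"};
        int[][][][] back = new int[3][3][NT.length][];
        back[0][2][S] = new int[] {1, NP, VP};   // S -> NP VP, split after "dog"
        back[0][1][NP] = new int[] {0, DET, N};  // NP -> Det N, split after "the"
        ParseTreeDumpSketch d = new ParseTreeDumpSketch(words, back);
        d.dumpParseTree(0, 0, 2, S);  // the S non-terminal starts at level 0
        System.out.print(d.out);
    }
}
```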
protected java.lang.String getSentencePart(int i,
                                           int j)
Parameters:
i - leftmost word index
j - rightmost word index

public void setStreamTokenizer(java.io.StreamTokenizer poStreamTokenizer)
Parameters: poStreamTokenizer - the NLP stream tokenizer to read tokens from

public void backSynchronizeObject()
backSynchronizeObject in class StorageManager
See Also: StorageManager.backSynchronizeObject()

public static java.lang.String getMARFSourceCodeRevision()