java.lang.Object
  marf.Storage.StorageManager
    marf.nlp.Parsing.ProbabilisticParser
public class ProbabilisticParser
A probabilistic parser parses a natural language (e.g. English) given a probabilistic grammar. Since natural language sentences are ambiguous and a single sentence may have more than one parse, each grammar rule is assigned a probability, and the most probable parse is chosen according to those rule probabilities. This class implements the well-known CYK probabilistic parsing algorithm.
The CYK algorithm is cited below. The main reference is here.
    function CYK(words, grammar) returns the most probable parse and its probability
        create and clear pi[num_words, num_words, num_nonterminals]

        # base case
        for i <-- 1 to num_words
            for A <-- 1 to num_nonterminals
                if (A --> wi) is in grammar then
                    pi[i, i, A] = P(A --> wi)

        # recursive case
        for span <-- 2 to num_words
            for begin <-- 1 to num_words - span + 1
                end <-- begin + span - 1
                for m = begin to end - 1
                    for A = 1 to num_nonterminals
                        for B = 1 to num_nonterminals
                            for C = 1 to num_nonterminals
                                prob = pi[begin, m, B] * pi[m + 1, end, C] * P(A --> BC)
                                if (prob > pi[begin, end, A]) then
                                    pi[begin, end, A] = prob
                                    back[begin, end, A] = {m, B, C}

        return build_tree(back[1, num_words, 1]), pi[1, num_words, 1]

$Id: ProbabilisticParser.java,v 1.31 2007/12/18 21:37:54 mokhov Exp $
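The pseudocode above can be sketched in plain Java as follows. This is a minimal illustration, not MARF's implementation: the class `CykSketch`, its nested `Lexical` and `Binary` rule types, the 0-based indices, and the toy grammar in `demo()` are all assumptions made for the example. It iterates over the list of binary rules instead of over every (A, B, C) triple, which visits the same rules with non-zero probability.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical, self-contained sketch of probabilistic CYK; not MARF's code. */
final class CykSketch {
    /** Lexical rule A -> word with probability p. */
    static final class Lexical {
        final int a; final String word; final double p;
        Lexical(int a, String word, double p) { this.a = a; this.word = word; this.p = p; }
    }

    /** Binary rule A -> B C with probability p. */
    static final class Binary {
        final int a, b, c; final double p;
        Binary(int a, int b, int c, double p) { this.a = a; this.b = b; this.c = c; this.p = p; }
    }

    final String[] words;    // the sentence; assumed non-empty
    final String[] names;    // non-terminal names; index 0 is the start symbol
    final double[][][] pi;   // pi[begin][end][A]: best probability of A over the span
    final int[][][][] back;  // back[begin][end][A] = {m, B, C}, or null for a one-word span

    CykSketch(String[] words, String[] names, List<Lexical> lexicon, List<Binary> rules) {
        this.words = words;
        this.names = names;
        int n = words.length;
        pi = new double[n][n][names.length];
        back = new int[n][n][names.length][];
        // Base case: spans of one word are covered by lexical rules.
        for (int i = 0; i < n; i++) {
            for (Lexical r : lexicon) {
                if (r.word.equals(words[i]) && r.p > pi[i][i][r.a]) {
                    pi[i][i][r.a] = r.p;
                }
            }
        }
        // Recursive case: combine two adjacent sub-spans split after word m.
        for (int span = 2; span <= n; span++) {
            for (int begin = 0; begin + span <= n; begin++) {
                int end = begin + span - 1;
                for (int m = begin; m < end; m++) {
                    for (Binary r : rules) {
                        double prob = pi[begin][m][r.b] * pi[m + 1][end][r.c] * r.p;
                        if (prob > pi[begin][end][r.a]) {
                            pi[begin][end][r.a] = prob;
                            back[begin][end][r.a] = new int[] {m, r.b, r.c};
                        }
                    }
                }
            }
        }
    }

    /** Probability of the best parse rooted at the start symbol (0 if none). */
    double probability() { return pi[0][words.length - 1][0]; }

    /** Bracketed best parse, or null if the sentence cannot be parsed. */
    String tree() { return probability() > 0 ? tree(0, words.length - 1, 0) : null; }

    private String tree(int begin, int end, int a) {
        int[] bp = back[begin][end][a];
        if (bp == null) {
            return "(" + names[a] + " " + words[begin] + ")";
        }
        return "(" + names[a] + " " + tree(begin, bp[0], bp[1]) + " "
                + tree(bp[0] + 1, end, bp[2]) + ")";
    }

    /** Toy ambiguous grammar: "time flies fast" has two parses. */
    static CykSketch demo() {
        String[] names = {"S", "NP", "VP", "Adv"};
        List<Lexical> lexicon = Arrays.asList(
                new Lexical(1, "time", 0.4), new Lexical(1, "flies", 0.3),
                new Lexical(2, "flies", 0.5), new Lexical(2, "fast", 0.1),
                new Lexical(3, "fast", 1.0));
        List<Binary> rules = Arrays.asList(
                new Binary(0, 1, 2, 1.0),   // S  -> NP VP
                new Binary(1, 1, 1, 0.3),   // NP -> NP NP
                new Binary(2, 2, 3, 0.4));  // VP -> VP Adv
        return new CykSketch(new String[] {"time", "flies", "fast"}, names, lexicon, rules);
    }

    public static void main(String[] args) {
        CykSketch cyk = demo();
        System.out.println(cyk.tree() + " p=" + cyk.probability());
    }
}
```

In the toy grammar the reading "time (flies fast)" scores about 0.08 and the reading "(time flies) fast" scores about 0.0036, so the first is kept in `pi` and `back`.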
Field Summary |
---|
Fields inherited from class marf.Storage.StorageManager |
---|
bDumpOnNotFound, iCurrentDumpMode, oObjectToSerialize, strFilename |
Fields inherited from interface marf.Storage.IStorageManager |
---|
DUMP_BINARY, DUMP_CSV_TEXT, DUMP_GZIP_BINARY, DUMP_HTML, DUMP_SQL, DUMP_XML, MARF_INTERFACE_CODE_REVISION, STORAGE_FILE_EXTENSIONS |
Constructor Summary | |
---|---|
ProbabilisticParser() | Initializes default probabilistic parser with empty grammar. |
ProbabilisticParser(java.io.StreamTokenizer poStreamTokenizer) | Initializes probabilistic parser with the specified tokenizer. |
ProbabilisticParser(java.lang.String pstrGrammarFilename) | Initializes probabilistic parser with the grammar filename. |
Method Summary | |
---|---|
void | backSynchronizeObject() - Implements StorageManager interface. |
void | dumpBackPointersContents() - Dumps back-pointers to STDOUT. |
void | dumpParseMatrix() - Dumps parse matrix to STDOUT. |
void | dumpParseTree() - Dumps parse tree to STDOUT. |
void | dumpParseTree(int piLevel, int i, int j, int piA) - Dumps a parse sub-tree to STDOUT. The initial level of the S non-terminal should be 0. |
static java.lang.String | getMARFSourceCodeRevision() - Retrieves the class's revision. |
protected java.lang.String | getSentencePart(int i, int j) - Gets a sentence span given indices. |
protected void | indent(int piTabSize) - Indents by the specified number of tabs. |
boolean | parse() - Performs parse of a natural language sentence using the CYK algorithm. |
void | setStreamTokenizer(java.io.StreamTokenizer poStreamTokenizer) - Sets the desired stream tokenizer. |
boolean | train() - Performs training of the parser by compiling the source probabilistic grammar and then dumping it onto disk as a precompiled binary file for future re-load. |
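The "compile once, dump to disk, re-load later" idea behind train() can be illustrated with standard Java serialization. This is only a sketch under assumptions: the class `GrammarCacheSketch`, the `HashMap<String, Double>` stand-in for a compiled grammar, and the `.bin` temporary file are hypothetical and do not reflect how MARF's GrammarCompiler or StorageManager actually store data.

```java
import java.io.File;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.util.HashMap;

/** Hypothetical sketch of caching a compiled grammar as a binary file. */
final class GrammarCacheSketch {
    /** Writes the grammar to disk using Java's binary serialization format. */
    static void dump(HashMap<String, Double> grammar, File file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file.toPath()))) {
            out.writeObject(grammar);
        }
    }

    /** Reads back a grammar previously written by dump(). */
    @SuppressWarnings("unchecked")
    static HashMap<String, Double> restore(File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file.toPath()))) {
            return (HashMap<String, Double>) in.readObject();
        }
    }

    /** Dumps a toy grammar to a temporary file and loads it again. */
    static HashMap<String, Double> roundTrip() {
        HashMap<String, Double> grammar = new HashMap<>();
        grammar.put("S -> NP VP", 1.0);   // rule text mapped to its probability
        grammar.put("NP -> NP NP", 0.3);
        try {
            File file = File.createTempFile("grammar", ".bin");
            file.deleteOnExit();
            dump(grammar, file);
            return restore(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Re-loading the binary file skips recompiling the grammar source on every run, which is the benefit the train() description points at.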
Methods inherited from class marf.Storage.StorageManager |
---|
clone, dump, dumpBinary, dumpCSV, dumpGzipBinary, dumpHTML, dumpSQL, dumpXML, enableDumpOnNotFound, equals, getDefaultExtension, getDefaultExtension, getDumpMode, getFilename, getObjectToSerialize, hashCode, restore, restoreBinary, restoreCSV, restoreGzipBinary, restoreHTML, restoreSQL, restoreXML, setDumpMode, setFilename, toString |
Methods inherited from class java.lang.Object |
---|
finalize, getClass, notify, notifyAll, wait, wait, wait |
Constructor Detail |
---|
public ProbabilisticParser(java.lang.String pstrGrammarFilename)
Parameters: pstrGrammarFilename - the filename of the probabilistic grammar

public ProbabilisticParser(java.io.StreamTokenizer poStreamTokenizer)
Parameters: poStreamTokenizer - the stream tokenizer to read tokens from

public ProbabilisticParser()
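The tokenizer accepted by the constructor is a standard java.io.StreamTokenizer. The sketch below shows one way such a tokenizer could split a sentence into lower-cased words. The class `TokenizerSketch` and the syntax-table settings are assumptions for illustration; the configuration MARF expects may differ.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: splitting a sentence into words with StreamTokenizer. */
final class TokenizerSketch {
    static List<String> words(String sentence) {
        StreamTokenizer tokenizer = new StreamTokenizer(new StringReader(sentence));
        // Start from a blank syntax table so '.' and digits are not parsed as numbers.
        tokenizer.resetSyntax();
        tokenizer.wordChars('a', 'z');
        tokenizer.wordChars('A', 'Z');
        tokenizer.whitespaceChars(0, ' ');
        tokenizer.lowerCaseMode(true);

        List<String> result = new ArrayList<>();
        try {
            while (tokenizer.nextToken() != StreamTokenizer.TT_EOF) {
                // Keep word tokens only; punctuation arrives as single-character tokens.
                if (tokenizer.ttype == StreamTokenizer.TT_WORD) {
                    result.add(tokenizer.sval);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words("Time flies, fast."));
    }
}
```

The explicit `resetSyntax()` call matters because the default syntax table treats '.' as a numeric character, so a trailing full stop would otherwise be glued onto the last word.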
Method Detail |
---|
public boolean parse() throws SyntaxError
Returns: true if the parse was successful
Throws: SyntaxError - in case of some unusual syntax breakage

public void dumpBackPointersContents()

public void dumpParseMatrix()

public boolean train() throws StorageException
Returns: true if the training was successful
Throws: StorageException - in case of any GrammarCompiler error

public void dumpParseTree()

public void dumpParseTree(int piLevel, int i, int j, int piA)
Parameters:
piLevel - starting level (depth) of the tree; also acts as indentation marker
i - left index of the span
j - right index of the span
piA - the non-terminal index

protected void indent(int piTabSize)
Parameters: piTabSize - the number of tab characters to indent by

protected java.lang.String getSentencePart(int i, int j)
Parameters:
i - leftmost word index
j - rightmost word index
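The interplay of a recursive tree dump, tab indentation, and span extraction can be sketched as below. This is not MARF's code: the class `TreeDumpSketch`, the 0-based inclusive indices, the hard-coded back-pointer table, and the "name: words" line format are all assumptions chosen to keep the example self-contained.

```java
import java.util.Arrays;

/** Hypothetical sketch of dumping a parse tree from CYK back-pointers. */
final class TreeDumpSketch {
    final String[] words = {"time", "flies", "fast"};
    final String[] names = {"S", "NP", "VP", "Adv"};
    // back[i][j][A] = {m, B, C}; null marks a single-word (lexical) span.
    final int[][][][] back = new int[3][3][4][];

    TreeDumpSketch() {
        back[0][2][0] = new int[] {0, 1, 2}; // S  -> NP[0..0] VP[1..2]
        back[1][2][2] = new int[] {1, 2, 3}; // VP -> VP[1..1] Adv[2..2]
    }

    /** Joins words i..j (inclusive) with single spaces. */
    String sentencePart(int i, int j) {
        return String.join(" ", Arrays.copyOfRange(words, i, j + 1));
    }

    /** Returns the given number of tab characters. */
    static String indent(int tabs) {
        StringBuilder sb = new StringBuilder();
        for (int t = 0; t < tabs; t++) {
            sb.append('\t');
        }
        return sb.toString();
    }

    /** Appends the sub-tree rooted at non-terminal a over span i..j, one node per line. */
    void dump(StringBuilder out, int level, int i, int j, int a) {
        out.append(indent(level)).append(names[a]).append(": ")
           .append(sentencePart(i, j)).append('\n');
        int[] bp = back[i][j][a];
        if (bp != null) {
            dump(out, level + 1, i, bp[0], bp[1]);      // left child covers i..m
            dump(out, level + 1, bp[0] + 1, j, bp[2]);  // right child covers m+1..j
        }
    }

    /** Dumps the whole tree, starting from S at level 0. */
    static String demo() {
        StringBuilder out = new StringBuilder();
        new TreeDumpSketch().dump(out, 0, 0, 2, 0);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(demo());
    }
}
```

Starting the recursion at level 0 for the S non-terminal matches the note in the method summary; each level of depth adds one tab.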
public void setStreamTokenizer(java.io.StreamTokenizer poStreamTokenizer)
Parameters: poStreamTokenizer - the NLP stream tokenizer to read tokens from

public void backSynchronizeObject()
backSynchronizeObject in class StorageManager
See Also: StorageManager.backSynchronizeObject()

public static java.lang.String getMARFSourceCodeRevision()