marf.nlp.Parsing.GrammarCompiler
Class GrammarCompiler

java.lang.Object
  extended by marf.Storage.StorageManager
      extended by marf.nlp.Parsing.GrammarCompiler.GrammarCompiler
All Implemented Interfaces:
java.io.Serializable, java.lang.Cloneable, IStorageManager
Direct Known Subclasses:
ProbabilisticGrammarCompiler

public class GrammarCompiler
extends StorageManager

Grammar compiler -- compiles a source grammar file and produces a corresponding transition table for the language described by the grammar.

$Id: GrammarCompiler.java,v 1.30 2007/12/18 21:37:57 mokhov Exp $

Since:
0.3.0.2
Version:
$Revision: 1.30 $
Author:
Serguei Mokhov
See Also:
Serialized Form

Field Summary
protected  Grammar oGrammar
          Instance of the grammar as a set of production Rules, First and Follow sets.
protected  GrammarAnalyzer oGrammarAnalyzer
          Lexical Analyzer for the grammar.
protected  GrammarElement oGrammarElement
          Current grammar element.
protected  Rule oRule
          Current grammar rule.
protected  Token oToken
          Current lexical token.
private static long serialVersionUID
          For serialization versioning.
protected static TransitionTable soTransitionTable
          Instance of the TransitionTable, generated on demand from the source grammar file.
protected  java.lang.String strGrammarFileName
          Source grammar filename.
static java.lang.String TOKEN_ACTION_BREAK
          Action for adding the next token to the RHS of the current rule, signifying that adding should stop.
static java.lang.String TOKEN_ACTION_CONTINUE
          Action for adding the next token to the RHS of the current rule, signifying to skip the current token and continue with the next one.
static java.lang.String TOKEN_ACTION_PROCEED
          Action for adding the next token to the RHS of the current rule, signifying to proceed with adding the current token to the RHS.
 
Fields inherited from class marf.Storage.StorageManager
bDumpOnNotFound, iCurrentDumpMode, oObjectToSerialize, strFilename
 
Fields inherited from interface marf.Storage.IStorageManager
DUMP_BINARY, DUMP_CSV_TEXT, DUMP_GZIP_BINARY, DUMP_HTML, DUMP_SQL, DUMP_XML, MARF_INTERFACE_CODE_REVISION, STORAGE_FILE_EXTENSIONS
 
Constructor Summary
GrammarCompiler()
          Default Constructor.
GrammarCompiler(java.lang.String pstrGrammarFileName)
          Constructor with the grammar filename.
 
Method Summary
protected  boolean addIDToken()
          Adds a non-terminal grammar ID token type to the RHS of the current rule.
protected  void addNextRHSElement()
          Adds next element to the RHS of the current rule.
protected  void addTerminalToken()
          Adds a terminal token type to the RHS of the current rule.
protected  void checkUndefinedNonTerminals()
          Checks for undefined non-terminals in the grammar.
 void compileGrammar()
          Compiles grammar.
protected  void createEOFTerminal()
          Creates an end-of-file indicator terminal.
protected  void createEpsilonToken()
          Creates an instance of an empty (epsilon) token.
protected  void createGrammarAnalyzer()
          Instantiates grammar analyzer.
protected  boolean createNextNonTerminal()
          Creates the next non-terminal of a rule from the upcoming token.
protected  void createRule()
          Creates an embryo of a rule given the LHS non-terminal and the rule operator `::='.
private  void fillInTransitionTable()
          Fills in the TransitionTable data structure.
protected  void getBusted()
          Dies on unexpected grammar token type.
 Grammar getGrammar()
          Allows querying for inner instance of the grammar.
protected  GrammarElement getGrammarElement(java.lang.String pstrName)
          Returns a grammar element object by its name (lexeme) if it exists; null otherwise.
 java.lang.String getGrammarFileName()
          Allows querying for the filename of the grammar.
static java.lang.String getMARFSourceCodeRevision()
          Retrieves the class's revision.
protected  java.lang.String getNextRHSToken()
          Acquires the next RHS token for a rule from the token stream.
static TransitionTable getTransitionTable()
          Allows querying for the inner transition table data structure.
static TransitionTable loadTT(java.lang.String pstrTTFileName)
          Loads (previously serialized) state of the TT.
protected  void outputStats()
          Outputs statistics by serializing the grammar analyzer to a file, as well as errors, if any, to the error log file.
protected  void parseGrammar()
          Parses grammar and outputs stats at the end.
 boolean serialize(int piOperation)
          Text serialization routine for grammar compilation.
 
Methods inherited from class marf.Storage.StorageManager
backSynchronizeObject, clone, dump, dumpBinary, dumpCSV, dumpGzipBinary, dumpHTML, dumpSQL, dumpXML, enableDumpOnNotFound, equals, getDefaultExtension, getDefaultExtension, getDumpMode, getFilename, getObjectToSerialize, hashCode, restore, restoreBinary, restoreCSV, restoreGzipBinary, restoreHTML, restoreSQL, restoreXML, setDumpMode, setFilename, toString
 
Methods inherited from class java.lang.Object
finalize, getClass, notify, notifyAll, wait, wait, wait
 

Field Detail

TOKEN_ACTION_BREAK

public static final java.lang.String TOKEN_ACTION_BREAK
Action for adding the next token to the RHS of the current rule, signifying that adding should stop.

Since:
0.3.0.5
See Also:
Constant Field Values

TOKEN_ACTION_CONTINUE

public static final java.lang.String TOKEN_ACTION_CONTINUE
Action for adding the next token to the RHS of the current rule, signifying to skip the current token and continue with the next one.

Since:
0.3.0.5
See Also:
Constant Field Values

TOKEN_ACTION_PROCEED

public static final java.lang.String TOKEN_ACTION_PROCEED
Action for adding the next token to the RHS of the current rule, signifying to proceed with adding the current token to the RHS.

Since:
0.3.0.5
See Also:
Constant Field Values

oGrammar

protected Grammar oGrammar
Instance of the grammar as a set of production Rules, First and Follow sets.


strGrammarFileName

protected java.lang.String strGrammarFileName
Source grammar filename.


oGrammarAnalyzer

protected GrammarAnalyzer oGrammarAnalyzer
Lexical Analyzer for the grammar.


soTransitionTable

protected static TransitionTable soTransitionTable
Instance of the TransitionTable, generated on demand from the source grammar file.


oGrammarElement

protected GrammarElement oGrammarElement
Current grammar element.


oToken

protected Token oToken
Current lexical token.


oRule

protected Rule oRule
Current grammar rule.


serialVersionUID

private static final long serialVersionUID
For serialization versioning. When adding new members or making other structural changes, regenerate this number with the serialver tool that comes with the JDK.

Since:
0.3.0.4
See Also:
Constant Field Values
Constructor Detail

GrammarCompiler

public GrammarCompiler()
                throws CompilerError
Default Constructor. Assumes grammar-original.txt as the filename.

Throws:
CompilerError - if initialization failed

GrammarCompiler

public GrammarCompiler(java.lang.String pstrGrammarFileName)
                throws CompilerError
Constructor with the grammar filename.

Parameters:
pstrGrammarFileName - the filename of the grammar to compile
Throws:
CompilerError - if initialization failed
Method Detail

createGrammarAnalyzer

protected void createGrammarAnalyzer()
Instantiates grammar analyzer.


compileGrammar

public void compileGrammar()
                    throws CompilerError
Compiles the grammar. Compilation consists of parsing the source grammar file and creating the rules, terminals, and non-terminals; then computing the first and follow sets; and finally filling in a TransitionTable data structure.

Throws:
CompilerError - in case there was a lexical or syntax error

createEpsilonToken

protected void createEpsilonToken()
Creates an instance of an empty (epsilon) token. By default this token is denoted by an ampersand &.


createNextNonTerminal

protected boolean createNextNonTerminal()
                                 throws CompilerError
Creates the next non-terminal of a rule from the upcoming token.

Returns:
true if the token was created; and false if the end of file was reached
Throws:
CompilerError - either a SyntaxError or a LexicalError
See Also:
SyntaxError, LexicalError

createRule

protected void createRule()
                   throws CompilerError
Creates an embryo of a rule given the LHS non-terminal and the rule operator `::='.

Throws:
CompilerError - either a SyntaxError or a LexicalError
See Also:
SyntaxError, LexicalError

outputStats

protected void outputStats()
Outputs statistics by serializing the grammar analyzer to a file, as well as errors, if any, to the error log file.

See Also:
GrammarAnalyzer.serialize(int), GrammarAnalyzer.getLexicalGrammarErrors(), GenericLexicalAnalyzer.getErrorsPresent(), GenericLexicalAnalyzer.getErrorLogFilename()

getNextRHSToken

protected java.lang.String getNextRHSToken()
                                    throws CompilerError
Acquires the next RHS token for a rule from the token stream.

Returns:
string literal indicating whether to "break", "continue", or "proceed" when searching for the next RHS tokens: "break" is returned when the method encounters a %EOL or EOF token, "continue" when a semantic token is read, and "proceed" otherwise.
Throws:
CompilerError - either a SyntaxError or a LexicalError
See Also:
SyntaxError, LexicalError
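The break/continue/proceed protocol can be illustrated with a small stand-alone sketch. It is not MARF's implementation: the class RhsCollectorSketch, the "#" prefix marking a semantic token, and the use of a null token to model EOF are hypothetical. Only the three action strings and the %EOL token come from the description above.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch of the RHS-collection loop driven by token actions. */
class RhsCollectorSketch {
    static final String BREAK = "break";
    static final String CONTINUE = "continue";
    static final String PROCEED = "proceed";

    /** Classifies one token; null models end of file (an assumption of this sketch). */
    static String actionFor(String token) {
        if (token == null || token.equals("%EOL")) {
            return BREAK;    // end of line or end of input: stop collecting
        }
        if (token.startsWith("#")) {
            return CONTINUE; // assumed marker for a semantic token: skip it
        }
        return PROCEED;      // ordinary grammar symbol: add it to the RHS
    }

    /** Collects RHS symbols (after "LHS ::=") until a break action is seen. */
    static List<String> collectRhs(Iterator<String> tokens) {
        List<String> rhs = new ArrayList<>();
        while (true) {
            String token = tokens.hasNext() ? tokens.next() : null;
            String action = actionFor(token);
            if (action.equals(BREAK)) {
                break;
            }
            if (action.equals(CONTINUE)) {
                continue;
            }
            rhs.add(token);
        }
        return rhs;
    }
}
```

For the token stream `a #act B %EOL c`, this collects `[a, B]`: the semantic token is skipped and `%EOL` ends the rule before `c`.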

addNextRHSElement

protected void addNextRHSElement()
                          throws SyntaxError
Adds next element to the RHS of the current rule.

Throws:
SyntaxError - if unrecognized token type found

addIDToken

protected boolean addIDToken()
Adds a non-terminal grammar ID token type to the RHS of the current rule. If the token is already present in the current grammar, its reference is fetched from there; otherwise a new entry is created.

Returns:
true if the current token type is really of type GrammarTokenType.GRAMMAR_ID
See Also:
GrammarTokenType.GRAMMAR_ID

addTerminalToken

protected void addTerminalToken()
Adds a terminal token type to the RHS of the current rule. If the token is already present in the current grammar, its reference is fetched from there; otherwise a new entry is created.
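Both addIDToken() and addTerminalToken() follow a fetch-or-create pattern. A minimal sketch of that pattern, assuming the grammar keeps its terminals and non-terminals in name-keyed maps; SymbolTableSketch and its Symbol class are hypothetical stand-ins for MARF's Grammar and GrammarElement types.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical symbol table illustrating "fetch if present, else create". */
class SymbolTableSketch {
    static final class Symbol {
        final String name;
        final boolean terminal;

        Symbol(String name, boolean terminal) {
            this.name = name;
            this.terminal = terminal;
        }
    }

    // Insertion-ordered so symbols keep the order in which they first appeared
    final Map<String, Symbol> terminals = new LinkedHashMap<>();
    final Map<String, Symbol> nonTerminals = new LinkedHashMap<>();

    /** Returns the existing terminal with this lexeme, or registers a new one. */
    Symbol internTerminal(String lexeme) {
        return terminals.computeIfAbsent(lexeme, k -> new Symbol(k, true));
    }

    /** Returns the existing non-terminal with this lexeme, or registers a new one. */
    Symbol internNonTerminal(String lexeme) {
        return nonTerminals.computeIfAbsent(lexeme, k -> new Symbol(k, false));
    }
}
```

Sharing one object per lexeme means every rule that mentions the same symbol refers to the same element, which is what later FIRST/FOLLOW computations rely on.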


getBusted

protected void getBusted()
                  throws SyntaxError
Dies on unexpected grammar token type.

Throws:
SyntaxError - indicates an unexpected token type.

createEOFTerminal

protected void createEOFTerminal()
Creates an end-of-file indicator terminal. By default it is denoted by the `$' sign.


checkUndefinedNonTerminals

protected void checkUndefinedNonTerminals()
                                   throws SemanticError
Checks for undefined non-terminals in the grammar.

Throws:
SemanticError - if there were undefined non-terminals
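A sketch of the undefined non-terminal check. It assumes rules are stored as a map from each LHS to its alternatives, and that a non-terminal counts as defined only if it is the LHS of some rule. UndefinedNonTerminalCheckSketch is hypothetical. The real method throws SemanticError, whereas this sketch returns the offending names so the caller can decide how to report them.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical check for non-terminals that are used but never defined. */
class UndefinedNonTerminalCheckSketch {
    /**
     * rules maps each defined LHS to its alternatives (each a list of RHS symbols);
     * nonTerminals holds every symbol known to be a non-terminal.
     * Returns, in sorted order, the non-terminals used on some RHS without a rule of their own.
     */
    static Set<String> findUndefined(Map<String, List<List<String>>> rules, Set<String> nonTerminals) {
        Set<String> undefined = new TreeSet<>();
        for (List<List<String>> alternatives : rules.values()) {
            for (List<String> rhs : alternatives) {
                for (String symbol : rhs) {
                    if (nonTerminals.contains(symbol) && !rules.containsKey(symbol)) {
                        undefined.add(symbol);
                    }
                }
            }
        }
        return undefined;
    }
}
```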

parseGrammar

protected void parseGrammar()
                     throws CompilerError
Parses grammar and outputs stats at the end. The end result of this method is the grammar data structure with a collection of rules in it.

Throws:
CompilerError - in case of lexical, syntax, or semantic errors
See Also:
Grammar, Rule

getGrammarElement

protected GrammarElement getGrammarElement(java.lang.String pstrName)
Returns a grammar element object by its name (lexeme) if it exists; null otherwise. The terminals list is checked first, then the non-terminals list.

Parameters:
pstrName - the name of the element
Returns:
corresponding GrammarElement object or null if not found
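The lookup order can be sketched as follows. ElementLookupSketch is hypothetical, and it uses maps keyed by lexeme where the original searches lists. A name present in both collections resolves to the terminal.

```java
import java.util.Map;

/** Hypothetical lookup: terminals first, then non-terminals, else null. */
class ElementLookupSketch {
    static <E> E lookup(String name, Map<String, E> terminals, Map<String, E> nonTerminals) {
        E element = terminals.get(name);
        if (element != null) {
            return element;            // found among the terminals
        }
        return nonTerminals.get(name); // null when the name is in neither collection
    }
}
```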

fillInTransitionTable

private void fillInTransitionTable()
                            throws CompilerError
Fills in the TransitionTable data structure. Having parsed all terminals and non-terminals from the grammar and computed all rules and the first and follow sets, we can finally fill in the table to be used by the main parser.

Throws:
CompilerError - if the list of terminals, non-terminals, or rules is empty
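The table filling described above matches the textbook LL(1) construction. For each production A ::= α, the production goes into table[A][a] for every terminal a in FIRST(α). If α can derive epsilon, it also goes in for every a in FOLLOW(A). The sketch below computes FIRST and FOLLOW by fixed-point iteration and then fills the table. It is a self-contained illustration, not MARF's TransitionTable code: LL1TableSketch, its Production class, and the map-of-maps table are hypothetical. It reuses the & (epsilon) and $ (EOF) conventions documented for createEpsilonToken() and createEOFTerminal(), and represents an epsilon rule as an empty RHS.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical LL(1) table builder: FIRST/FOLLOW by fixed-point iteration. */
class LL1TableSketch {
    static final String EPS = "&"; // epsilon marker
    static final String EOF = "$"; // end-of-input marker

    static final class Production {
        final String lhs;
        final List<String> rhs; // an empty RHS denotes an epsilon rule
        Production(String lhs, List<String> rhs) { this.lhs = lhs; this.rhs = rhs; }
    }

    final List<Production> productions;
    final Set<String> nonTerminals = new LinkedHashSet<>();
    final Map<String, Set<String>> first = new HashMap<>();
    final Map<String, Set<String>> follow = new HashMap<>();

    LL1TableSketch(List<Production> productions, String start) {
        this.productions = productions;
        for (Production p : productions) nonTerminals.add(p.lhs); // every LHS is a non-terminal
        for (String nt : nonTerminals) {
            first.put(nt, new HashSet<>());
            follow.put(nt, new HashSet<>());
        }
        follow.get(start).add(EOF); // the start symbol may be followed by end of input
        computeFirst();
        computeFollow();
    }

    /** FIRST of a symbol sequence; contains EPS when the whole sequence can vanish. */
    Set<String> firstOf(List<String> seq) {
        Set<String> result = new HashSet<>();
        for (String sym : seq) {
            if (!nonTerminals.contains(sym)) { // a terminal starts the sequence
                result.add(sym);
                return result;
            }
            Set<String> f = first.get(sym);
            for (String t : f) if (!t.equals(EPS)) result.add(t);
            if (!f.contains(EPS)) return result; // sym is not nullable: stop here
        }
        result.add(EPS); // every symbol was nullable, or the sequence is empty
        return result;
    }

    private void computeFirst() {
        boolean changed = true;
        while (changed) { // repeat until no FIRST set grows
            changed = false;
            for (Production p : productions) changed |= first.get(p.lhs).addAll(firstOf(p.rhs));
        }
    }

    private void computeFollow() {
        boolean changed = true;
        while (changed) { // repeat until no FOLLOW set grows
            changed = false;
            for (Production p : productions) {
                for (int i = 0; i < p.rhs.size(); i++) {
                    String b = p.rhs.get(i);
                    if (!nonTerminals.contains(b)) continue;
                    Set<String> rest = firstOf(p.rhs.subList(i + 1, p.rhs.size()));
                    Set<String> target = follow.get(b);
                    for (String t : rest) if (!t.equals(EPS)) changed |= target.add(t);
                    if (rest.contains(EPS) && !b.equals(p.lhs)) changed |= target.addAll(follow.get(p.lhs));
                }
            }
        }
    }

    /** Maps (non-terminal, lookahead terminal) to a production index; throws on a conflict. */
    Map<String, Map<String, Integer>> buildTable() {
        Map<String, Map<String, Integer>> table = new HashMap<>();
        for (int i = 0; i < productions.size(); i++) {
            Production p = productions.get(i);
            Set<String> f = firstOf(p.rhs);
            Set<String> lookaheads = new HashSet<>();
            for (String t : f) if (!t.equals(EPS)) lookaheads.add(t);
            if (f.contains(EPS)) lookaheads.addAll(follow.get(p.lhs)); // nullable RHS: use FOLLOW(A)
            Map<String, Integer> row = table.computeIfAbsent(p.lhs, k -> new HashMap<>());
            for (String a : lookaheads) {
                if (row.put(a, i) != null) {
                    throw new IllegalStateException("not LL(1): conflict at [" + p.lhs + ", " + a + "]");
                }
            }
        }
        return table;
    }
}
```

Consider the grammar E ::= T E', E' ::= + T E' | &, T ::= id. This yields FOLLOW(T) = {+, $} and places the epsilon alternative of E' under the $ column. An IllegalStateException here corresponds to a grammar that is not LL(1), for example one with left recursion.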

loadTT

public static TransitionTable loadTT(java.lang.String pstrTTFileName)
                              throws StorageException
Loads the (previously serialized) state of the TT (transition table). The method is declared static, so it can be called without an instance of GrammarCompiler.

Parameters:
pstrTTFileName - filename of a file with previously stored transition table
Returns:
reference to the newly loaded instance of the TransitionTable data structure
Throws:
StorageException - if there was any problem loading the table from file
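A minimal sketch of persisting and reloading a table with standard Java serialization. Whether loadTT() uses gzip-compressed binary serialization is an assumption (StorageManager does offer a DUMP_GZIP_BINARY mode). TableStoreSketch and its methods are hypothetical, and any Serializable object stands in for TransitionTable. Like loadTT(), load() is static, so no compiler instance is required. Unlike loadTT(), it surfaces IOException and ClassNotFoundException directly instead of wrapping them in a StorageException.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical save/load pair for a serialized table. */
class TableStoreSketch {
    /** Writes the object as gzip-compressed Java serialization; closing the stream finishes the gzip trailer. */
    static void save(Serializable table, File file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(
                new GZIPOutputStream(new FileOutputStream(file)))) {
            out.writeObject(table);
        }
    }

    /** Reads back an object written by save() and checks it against the expected type. */
    static <T> T load(File file, Class<T> type) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new GZIPInputStream(new FileInputStream(file)))) {
            return type.cast(in.readObject());
        }
    }
}
```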

serialize

public boolean serialize(int piOperation)
Text serialization routine for grammar compilation. Loading is not implemented. TODO: migrate to MARF's standard serialization mechanism.

Parameters:
piOperation - 0 for load and 1 for save
Returns:
true if the I/O operation was successful

getGrammar

public final Grammar getGrammar()
Allows querying for inner instance of the grammar.

Returns:
the contained Grammar object

getGrammarFileName

public final java.lang.String getGrammarFileName()
Allows querying for the filename of the grammar.

Returns:
the current grammar file name

getTransitionTable

public static final TransitionTable getTransitionTable()
Allows querying for the inner transition table data structure.

Returns:
the reference to the transition table

getMARFSourceCodeRevision

public static java.lang.String getMARFSourceCodeRevision()
Retrieves the class's revision.

Returns:
revision string

