marf.nlp.Parsing
Class GenericLexicalAnalyzer

java.lang.Object
  extended by marf.nlp.Parsing.GenericLexicalAnalyzer
Direct Known Subclasses:
GrammarAnalyzer, LexicalAnalyzer

public abstract class GenericLexicalAnalyzer
extends java.lang.Object

Generic Lexical Analyzer.

(C) 2001 Serguei A. Mokhov

(C) 2002 - 2008 The MARF Research and Development Group

$Id: GenericLexicalAnalyzer.java,v 1.18 2008/01/03 03:21:57 mokhov Exp $

Since:
0.3.0.2
Version:
$Revision: 1.18 $
Author:
Serguei Mokhov

Field Summary
protected  boolean bErrorsPresent
          An indicator of presence of lexical errors.
static java.lang.String DEFAULT_ERROR_FILE
          Default filename for the error log.
static java.lang.String DEFAULT_OUTPUT_FILE
          Default filename for the output.
protected  java.io.FileReader oFileReader
          Internal reference to the reader of the file on which lexical analysis is performed.
protected  java.util.Vector oLexicalErrors
          A collection of lexical errors (if any).
protected  java.io.StreamTokenizer oStreamTokenizer
          A tokenizer used to split the stream of characters into a stream of tokens.
protected  SymbolTable oSymTab
          A reference to local symbol table.
protected  Token oToken
          Current token being processed.
protected  java.util.Vector oTokenList
          A list of tokens extracted so far.
protected  java.lang.String strErrorLogFilename
          Name of the file that serves as the lexical error log.
protected  java.lang.String strOutputFilename
          Name of the file that serves as the output of the Lexical Analyzer.
protected  java.lang.String strSourceFilename
          Name of the file that serves as the input of the Lexical Analyzer.
 
Constructor Summary
GenericLexicalAnalyzer(SymbolTable poSymTab)
          Constructor with symbol table.
 
Method Summary
 Token createToken(java.lang.String pstrLexeme, TokenSubType poTokenSubType)
          Creates an instance of the Token data structure from its lexeme and type; the location is calculated dynamically by the StreamTokenizer.
 java.lang.String getErrorLogFilename()
          Access method for the ErrorLogFilename property.
 boolean getErrorsPresent()
          Determines if the ErrorsPresent property is true.
 java.util.Vector getLexicalErrors()
          Allows querying for the actual lexical errors that occurred during scanning.
static java.lang.String getMARFSourceCodeRevision()
          Retrieves the class's revision.
abstract  Token getNextToken()
          Core method of the LexicalAnalyzer.
 java.lang.String getOutputFilename()
          Access method for the OutputFilename property.
 java.lang.String getSourceFilename()
          Access method for the SourceFilename property.
 SymbolTable getSymTab()
          Access method for the SymTab property.
 java.util.Vector getTokenList()
          Access method for the TokenList property.
 boolean init()
          Default initialization routine.
 void scan()
          Scan for tokens through the input stream.
abstract  boolean serialize(int piOperation)
          Load/Save the contents of lists such as Token list and Error list.
 void setErrorLogFilename(java.lang.String pstrErrorLogFilename)
          Sets the value of the ErrorLogFilename property.
 void setOutputFilename(java.lang.String pstrOutputFilename)
          Sets the value of the OutputFilename property.
 void setSourceFilename(java.lang.String pstrSourceFilename)
          Sets the value of the SourceFilename property.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

DEFAULT_OUTPUT_FILE

public static final java.lang.String DEFAULT_OUTPUT_FILE
Default filename for the output.

See Also:
Constant Field Values

DEFAULT_ERROR_FILE

public static final java.lang.String DEFAULT_ERROR_FILE
Default filename for the error log.

See Also:
Constant Field Values

oFileReader

protected java.io.FileReader oFileReader
Internal reference to the reader of the file on which lexical analysis is performed.


oStreamTokenizer

protected java.io.StreamTokenizer oStreamTokenizer
A tokenizer used to split the stream of characters into a stream of tokens.
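
For illustration, the following sketch shows how a java.io.StreamTokenizer splits a stream of characters into a stream of tokens. The class StreamTokenizerSketch and its tokenize helper are hypothetical and not part of MARF; only the java.io calls are standard JDK API, and the tokenizer is left in its default configuration.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; not part of MARF.
class StreamTokenizerSketch {
    // Splits the text into tokens and tags each one with its line number.
    static List<String> tokenize(String text) {
        StreamTokenizer tokenizer = new StreamTokenizer(new StringReader(text));
        List<String> tokens = new ArrayList<>();
        try {
            while (tokenizer.nextToken() != StreamTokenizer.TT_EOF) {
                switch (tokenizer.ttype) {
                    case StreamTokenizer.TT_WORD:
                        tokens.add(tokenizer.lineno() + ":" + tokenizer.sval);
                        break;
                    case StreamTokenizer.TT_NUMBER:
                        // Numbers are always parsed as doubles.
                        tokens.add(tokenizer.lineno() + ":" + tokenizer.nval);
                        break;
                    default:
                        // An ordinary single character such as '=' or ';'.
                        tokens.add(tokenizer.lineno() + ":" + (char) tokenizer.ttype);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String token : tokenize("x = 42;\ny = x;")) {
            System.out.println(token);
        }
    }
}
```

Note that lineno() reports only the line, not the column, which is consistent with the TODO in createToken about character positions within a line.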


strSourceFilename

protected java.lang.String strSourceFilename
Name of the file that serves as the input of the Lexical Analyzer.


strOutputFilename

protected java.lang.String strOutputFilename
Name of the file that serves as the output of the Lexical Analyzer.


strErrorLogFilename

protected java.lang.String strErrorLogFilename
Name of the file that serves as the lexical error log.

Since:
September 2001

bErrorsPresent

protected boolean bErrorsPresent
An indicator of presence of lexical errors.

Since:
October 2, 2001

oTokenList

protected java.util.Vector oTokenList
A list of tokens extracted so far.


oSymTab

protected SymbolTable oSymTab
A reference to local symbol table.


oLexicalErrors

protected java.util.Vector oLexicalErrors
A collection of lexical errors (if any).


oToken

protected Token oToken
Current token being processed.

Constructor Detail

GenericLexicalAnalyzer

public GenericLexicalAnalyzer(SymbolTable poSymTab)
Constructor with symbol table.

Parameters:
poSymTab - symbol table to use.
Method Detail

init

public boolean init()
Default initialization routine. Should be overridden by derivatives because it is language-specific, and default initialization will not always suffice.

Returns:
true if initialization is successful

scan

public void scan()
          throws LexicalError
Scan for tokens through the input stream.

Throws:
LexicalError - as a notification that there were one or more lexical errors; the actual error messages can be queried via getLexicalErrors().
See Also:
getLexicalErrors()
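
The contract above (collect individual errors during scanning, then signal once with a LexicalError) could be sketched roughly as follows. This is an assumption about the general shape, not the actual MARF implementation: ScanLoopSketch, its nested LexicalError stand-in, the use of plain strings as tokens, and the toy rule that a lexeme containing '@' is invalid are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical demo class; not part of MARF.
class ScanLoopSketch {
    // Stand-in for marf.nlp.Parsing.LexicalError (assumed to be a checked exception).
    static class LexicalError extends Exception {
        LexicalError(String message) {
            super(message);
        }
    }

    final List<String> tokenList = new ArrayList<>();
    final List<LexicalError> lexicalErrors = new ArrayList<>();
    boolean errorsPresent = false;
    private final Iterator<String> input;

    ScanLoopSketch(List<String> lexemes) {
        this.input = lexemes.iterator();
    }

    // Toy rule (assumption): a lexeme containing '@' is invalid.
    // Returns null at end of input.
    String getNextToken() throws LexicalError {
        if (!input.hasNext()) {
            return null;
        }
        String lexeme = input.next();
        if (lexeme.contains("@")) {
            throw new LexicalError("invalid lexeme: " + lexeme);
        }
        return lexeme;
    }

    // Keeps scanning after an error so that all errors are collected,
    // then reports once at the end.
    void scan() throws LexicalError {
        while (true) {
            try {
                String token = getNextToken();
                if (token == null) {
                    break;
                }
                tokenList.add(token);
            } catch (LexicalError e) {
                lexicalErrors.add(e);
                errorsPresent = true;
            }
        }
        if (errorsPresent) {
            throw new LexicalError(lexicalErrors.size() + " lexical error(s) found");
        }
    }

    public static void main(String[] args) {
        ScanLoopSketch analyzer = new ScanLoopSketch(Arrays.asList("a", "b@", "c"));
        try {
            analyzer.scan();
        } catch (LexicalError e) {
            System.out.println(e.getMessage() + ": " + analyzer.lexicalErrors);
        }
    }
}
```

A caller would typically catch the summary exception and then inspect the collected errors, as main does here.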

serialize

public abstract boolean serialize(int piOperation)
Load/Save the contents of lists such as Token list and Error list. Must be overridden by the derivatives.

Parameters:
piOperation - 0 means load, 1 means save
Returns:
true if the serialization was successful
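
As a rough illustration of the load/save convention (0 means load, 1 means save), a derivative might dispatch on piOperation along these lines. The sketch is hypothetical: SerializeSketch, the LOAD and SAVE constant names, the use of string lists, and the choice of Java object serialization are assumptions; the source does not state how subclasses actually persist their lists.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; not part of MARF.
class SerializeSketch {
    // Assumed names for the documented operation codes.
    static final int LOAD = 0;
    static final int SAVE = 1;

    List<String> tokenList = new ArrayList<>();
    List<String> errorList = new ArrayList<>();
    private final File file;

    SerializeSketch(File file) {
        this.file = file;
    }

    // Returns true if the requested operation succeeded, false otherwise
    // (including when the operation code is unknown).
    @SuppressWarnings("unchecked")
    boolean serialize(int operation) {
        try {
            switch (operation) {
                case LOAD:
                    try (ObjectInputStream in =
                             new ObjectInputStream(new FileInputStream(file))) {
                        tokenList = (List<String>) in.readObject();
                        errorList = (List<String>) in.readObject();
                    }
                    return true;
                case SAVE:
                    try (ObjectOutputStream out =
                             new ObjectOutputStream(new FileOutputStream(file))) {
                        out.writeObject(tokenList);
                        out.writeObject(errorList);
                    }
                    return true;
                default:
                    return false;
            }
        } catch (IOException | ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        File file = new File(System.getProperty("java.io.tmpdir"), "serialize-sketch.bin");
        SerializeSketch saver = new SerializeSketch(file);
        saver.tokenList.add("id");
        System.out.println("saved: " + saver.serialize(SAVE));
        SerializeSketch loader = new SerializeSketch(file);
        System.out.println("loaded: " + loader.serialize(LOAD) + " " + loader.tokenList);
        file.delete();
    }
}
```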

getNextToken

public abstract Token getNextToken()
                            throws LexicalError
Core method of the LexicalAnalyzer. Should know how to return the next token according to the language specification. Must be overridden by the derivatives.

Returns:
newly recognized lexical token
Throws:
LexicalError - in case of invalid character stream (alphabet) entries found

createToken

public Token createToken(java.lang.String pstrLexeme,
                         TokenSubType poTokenSubType)
Creates an instance of the Token data structure from its lexeme and type; the location is calculated dynamically by the StreamTokenizer. TODO: reliably get a character position within a line.

Parameters:
pstrLexeme - token's spelling
poTokenSubType - token's data type
Returns:
the Token data structure instance; null if either of the parameters is empty
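
A minimal sketch of the documented behavior follows: null for an empty parameter, with the location taken from the tokenizer at the time of the call. The Token class here is a hypothetical stand-in with assumed field names, the sub-type is simplified to a string rather than a TokenSubType, and the advance helper exists only to move the demo tokenizer forward.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical demo class; not part of MARF.
class CreateTokenSketch {
    // Stand-in for marf.nlp.Parsing.Token; field names are assumptions.
    static class Token {
        final String lexeme;
        final String subType;
        final int line;

        Token(String lexeme, String subType, int line) {
            this.lexeme = lexeme;
            this.subType = subType;
            this.line = line;
        }
    }

    private final StreamTokenizer tokenizer;

    CreateTokenSketch(String source) {
        this.tokenizer = new StreamTokenizer(new StringReader(source));
    }

    // Advances the underlying tokenizer by one token; false at end of input.
    boolean advance() {
        try {
            return tokenizer.nextToken() != StreamTokenizer.TT_EOF;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns null if either parameter is null or empty; otherwise records
    // the tokenizer's current line number as the token's location.
    Token createToken(String lexeme, String subType) {
        if (lexeme == null || lexeme.isEmpty() || subType == null || subType.isEmpty()) {
            return null;
        }
        return new Token(lexeme, subType, tokenizer.lineno());
    }

    public static void main(String[] args) {
        CreateTokenSketch sketch = new CreateTokenSketch("a\nb");
        while (sketch.advance()) {
            Token token = sketch.createToken(sketch.tokenizer.sval, "ID");
            System.out.println(token.lexeme + " at line " + token.line);
        }
    }
}
```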

getSourceFilename

public java.lang.String getSourceFilename()
Access method for the SourceFilename property.

Returns:
the current value of the SourceFilename property

setSourceFilename

public void setSourceFilename(java.lang.String pstrSourceFilename)
Sets the value of the SourceFilename property.

Parameters:
pstrSourceFilename - the new value of the SourceFilename property

getOutputFilename

public java.lang.String getOutputFilename()
Access method for the OutputFilename property.

Returns:
the current value of the OutputFilename property

setOutputFilename

public void setOutputFilename(java.lang.String pstrOutputFilename)
Sets the value of the OutputFilename property.

Parameters:
pstrOutputFilename - the new value of the OutputFilename property

getErrorLogFilename

public java.lang.String getErrorLogFilename()
Access method for the ErrorLogFilename property.

Returns:
the current value of the ErrorLogFilename property

setErrorLogFilename

public void setErrorLogFilename(java.lang.String pstrErrorLogFilename)
Sets the value of the ErrorLogFilename property.

Parameters:
pstrErrorLogFilename - the new value of the ErrorLogFilename property

getErrorsPresent

public boolean getErrorsPresent()
Determines if the ErrorsPresent property is true.

Returns:
true if the ErrorsPresent property is true

getTokenList

public java.util.Vector getTokenList()
Access method for the TokenList property.

Returns:
the current value of the TokenList property

getSymTab

public SymbolTable getSymTab()
Access method for the SymTab property.

Returns:
the current value of the SymTab property

getLexicalErrors

public java.util.Vector getLexicalErrors()
Allows querying for the actual lexical errors that occurred during scanning.

Returns:
a collection of caught lexical errors
Since:
0.3.0.5

getMARFSourceCodeRevision

public static java.lang.String getMARFSourceCodeRevision()
Retrieves the class's revision.

Returns:
revision string

