Implementing Stack Oriented Languages – Part 4

This entry is part 4 of 4 in the series Implementing Stack Oriented Languages

Strings and Variables

So far we’ve added lots of stack operations and a couple of I/O routines with the KEY and EMIT keywords. At the moment, if we wanted to write a simple “Hello World” application, we would need to place each character on the stack and pop them off one at a time using the EMIT keyword. This is messy and a bit difficult to manage in a larger program. What we need is a way to store strings somewhere and then a way to tell our language where to find the string we wish to manipulate. So how can we do this?

Well, a simple, straightforward solution would be to place all strings in a string table. A string table is an area of memory where we store the strings. Then we need to tell our language how to find the string. In a compiled version of a Stack-Oriented language, we would use a pointer to the string, passing its address via the stack to any routine that needed to manipulate the string. In an interpreted version of the language, we only need to pass an index into our list of strings. Sounds pretty easy, right? Well, it’s a bit more complicated. When the parser encounters a string, we will need to store it in the string table and then place the string’s address (index) onto the stack. Since managing the string table is a bit more work than we want the parser to do, and in keeping with the Single Responsibility Principle, we will create a StringTable class to keep track of the strings and their addresses.

Our string table class will need a list to store strings in, a save method that stores a string and returns its index in the table, a get method to fetch the stored entry by index, and a length method to return the length of the string at a given index.

Our strings will need to store two pieces of data: the actual string (value) and its length. So we have to determine whether we will return the object containing this information or just the printable string contents. I opted to provide methods for both. The get(idx) method will return the string data object, while the get_str(idx) method will extract the string contents and return only the printable string.

Ok, it’s time to see what this StringTable class looks like:

# string_table.py

class StringTable:

    def __init__(self):
        self.data = []

    # Insert string and meta data into string list
    def save(self, string):
        entry = {'len': len(string), 'val': string}
        self.data.append(entry)
        return (len(self.data) - 1)

    # Return string storage object (including length)
    def get(self, idx):
        return self.data[idx]

    # Return length of string
    def length(self, idx):
        return int(self.data[idx]['len'])

    # Return printable string only (no meta data).
    def get_str(self, idx):
        return str(self.data[idx]['val'])

    def __str__(self):
        return str(self.data)
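
A quick way to get a feel for the class is to exercise it directly. The sketch below repeats a condensed copy of StringTable so that it runs on its own; the sample strings and the variable names first and second are only illustrations, not part of the project code.

```python
# Condensed copy of the StringTable class shown above, repeated here
# only so that this snippet is self-contained.
class StringTable:
    def __init__(self):
        self.data = []

    def save(self, string):
        self.data.append({'len': len(string), 'val': string})
        return len(self.data) - 1

    def length(self, idx):
        return int(self.data[idx]['len'])

    def get_str(self, idx):
        return str(self.data[idx]['val'])


table = StringTable()
first = table.save("Hello")            # first string saved gets index 0
second = table.save("Stack machines")  # the next one gets index 1

print(first, table.length(first), table.get_str(first))
print(second, table.length(second), table.get_str(second))
```

Each call to save() returns the index that later serves as the string's "address" on the data stack.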

As you can see, the string table couldn’t be much simpler. Now we need to handle inserting strings in the parser. Our parser will need to take the string from the token passed back from the lexer, save it in the string table, and push its address (index) onto the stack. Then it will need to push the string length onto the stack. Once the string has been pushed, it is up to the next operation to consume the address-length pair.

Place the following code in the parser after the arithmetic and logic operations and before the section of code to handle keywords.

# parser.py
            ...
            # Handle Strings and Characters
            elif tok.type == TT_STRING:
                string = self.current_tok.value
                addr = self.string_table.save(string)
                self.data.push(addr)         # address (index) goes on first
                self.data.push(len(string))  # length ends up on top

            # Handle KEYWORDS
            elif tok.type == TT_KEYWORD:
            ...

Now, before we can run this code, we need to import the StringTable class and attach an instance of it to our parser object at the bottom of the __init__() method:

    def __init__(self, lexer: Lexer, data_stack: Stack = None):
        ...
        # Setup string table
        if string_table is not None:
            self.string_table = string_table
        else:
            self.string_table = StringTable()

Next, we need to change the signature of the __init__() method to allow us to pass an existing string table into the method when needed. You will see why we would want to do this later. For now, just know it’s something we may need to do.

class Parser:

    def __init__(self, lexer: Lexer, data_stack: Stack = None, string_table=None):
        self.lexer = lexer

        # Setup data stack
        ...

With that change, any errors you were getting in the __init__() method should have disappeared. If not, then make sure you’ve imported the StringTable class at the top of the file.

We now need to change our source code in the “if” statement at the bottom of the parser file to:

if __name__ == '__main__':
    source = """ 16 2 >> 
    "This is a string" 
    dump
    """
    lexer = Lexer(source)
    parser = Parser(lexer)
    parser.parse()

If you run this code now, you should see the following:

[ '4', '0', '16' ]

These values come from the dump operation, which displays the contents of the stack after the parser has processed the string. The 4 is the result of 16 2 >> (16 shifted right by two bits). Since we haven’t put any other strings in the string table, the index for this string is 0 and its length is 16.
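
To make the stack contents concrete, here is a minimal sketch of what happens when a string literal is processed. It uses a plain Python list for the data stack and another for the string table; both are hypothetical stand-ins for the Stack and StringTable objects the real parser uses, and push_string is a name made up for this illustration.

```python
# Hypothetical stand-ins: plain lists instead of the Stack and
# StringTable classes used by the real parser.
stack = [4]      # value left behind by "16 2 >>"
strings = []     # our stand-in string table


def push_string(literal):
    """Store the literal, then push its index followed by its length."""
    strings.append(literal)
    addr = len(strings) - 1
    stack.append(addr)          # address (index) first
    stack.append(len(literal))  # length on top


push_string("This is a string")
print(stack)
```

The list ends up holding 4, 0, and 16, in the same order the dump output shows.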

Now that we can get the string stowed away in the string table and its address on the stack, what can we do with it? Well, at the moment not a lot. So let’s fix that. The primary thing we want to do with strings is display them. So let’s add some code to handle a new keyword, “TYPE”. TYPE will pop the length and address of a string off the stack and print the string out to the console.

Add the following code to the parser between the code to handle the “emit” and the code to handle the “key” keywords:

# parser.py
                ...

                elif tok.value == 'type':
                    """<addr> <length> type
                    Sends a string to standard out. The string is located
                    in the string table at location <addr> and has a length
                    of <length>.
                    """
                    length = int(self.data.pop())  # length is on top of the stack
                    addr = int(self.data.pop())    # address (index) sits below it

                    string = self.string_table.get_str(addr)
                    if length > len(string):
                        raise IndexError(f'Error: String length of {length} was given but actual length is {len(string)}.')

                    # Print the first <length> characters, then a newline
                    idx = 0
                    while idx < length:
                        print(string[idx], end='')
                        idx += 1
                    print()

                ...

Now change the code for the runner to:

if __name__ == '__main__':
    source = """ 16 2 >> 
    "This is a string" 
    dump
    type dump
    """
    lexer = Lexer(source)
    parser = Parser(lexer)
    parser.parse()

When you run this you should get the following output:

[ '4', '0', '16' ]
This is a string
[ '4' ]

When the parser encounters the string, it already has 4 on the stack from the previous operation. Then it places the string’s address (index) on the stack, followed by the string’s length. We then call “type”, which takes the length and address of the string off the stack and uses this information to display the string. Then we call “dump” again to show that only the string values were consumed, leaving 4 still on the stack when the program completes.
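
The pop order is the part that is easiest to get backwards, so here is a small stand-alone sketch of it. The function type_word and the list-based stack and string table are assumptions made for illustration; the real parser works on its own Stack and StringTable objects and prints rather than returns.

```python
def type_word(stack, strings):
    """Pop <length> then <addr> and return the text TYPE would print."""
    length = int(stack.pop())  # length was pushed last, so it comes off first
    addr = int(stack.pop())    # the address (index) sits just below it
    string = strings[addr]
    if length > len(string):
        raise IndexError(
            f'length {length} exceeds actual length {len(string)}')
    return string[:length]


strings = ["This is a string"]
stack = [4, 0, 16]

shown = type_word(stack, strings)
print(shown)
print(stack)   # only the address-length pair was consumed
```

Because the length travels separately from the string, passing a smaller length (for example 4 instead of 16) would type only the first few characters.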

With this, we can now write that “Hello World” app. Give it a try!

Now that we have strings working, let’s get characters working. Characters are single-quoted ASCII characters, and when encountered by the parser, they are converted to their ASCII values and placed on the stack. I bet by now you can add the code for this without my help. So, I’ll just show you my implementation below:

            ...
            elif tok.type == TT_CHAR:
                char = self.current_tok.value
                ascii_code = ord(char)
                self.data.push(ascii_code)

            # Handle Keywords
            elif tok.type == TT_KEYWORD:
            ...

Here we simply get the character from the token value and convert it to its code point using Python’s built-in ord() function. Then we push that value onto the stack.
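
As a quick illustration of the round trip, the sketch below pushes a character's code point and then turns it back into a character. The helper names push_char and emit and the list-based stack are hypothetical, and the emit() shown here only assumes that EMIT converts a code point back with chr(); the real EMIT implementation is not reproduced in this post.

```python
stack = []  # plain list standing in for the parser's data stack


def push_char(char):
    """Push the code point of a single character, as the TT_CHAR branch does."""
    stack.append(ord(char))


def emit():
    """Pop a code point and return the matching character (assumed EMIT behaviour)."""
    return chr(stack.pop())


push_char('A')
print(stack)    # [65]
print(emit())   # A
```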

I had wanted to include variable handling in this post. However, I feel that it would make the post exceedingly long, so I will postpone variables until next time.

As always, I have included the code as it now stands, below:

# keywords.py

KEYWORDS = [
    # Stack Operators
    'SWAP', 'SWAP2', 'DUMP', 'DROP', 'ROL', 'ROR', 'OVER',
    'DUP', 'DUP2', 'DUP3', 'DUP4', 'CLEAR',


    # Conditional Operators
    'IF', 'ELSE', 'THEN', 'NOT',

    # Iterative
    'DO', 'LOOP',
    'WHILE', 'WEND',

    # I/O Operators
    'TYPE',  # Outputs a string to stdout
    'EMIT',  # Outputs a character to stdout
    'KEY',   # Takes input character from stdin

    # Special Character Values
    # See ASCII Character Table for values
    'CR', 'LF', 'NULL', 'BELL', 'BS', 'TAB',
    'VT', 'FF', 'ESC',

    # Logical Operators
    'NEQ', 'AND', 'OR',

]

# lexer.py

from keywords import KEYWORDS
from tokens import *

class Lexer:
    def __init__(self, text, line=None, col=None):
        self.text = text
        self.pos = 0
        self.line = line if line else 1
        self.col = col if col else 1
        self.current_char = self.text[0]


    #############################################
    # Utility Methods                           #
    #-------------------------------------------#
    # These methods help manage the internal    #
    # workings of the lexer.                    #
    #############################################
    def advance(self):
        """advance():
        Advances the current position in the character stream.
        """
        if not self.is_eoi():
            self.pos += 1
            self.current_char = self.text[self.pos]
            self.update_position()
        else:
            self.current_char = ''

    def update_position(self):
        """update_position
        Maintains the line and column positions for the current token scan.
        """
        if self.current_char == '\n':
            self.line += 1
            self.col = 0
        else:
            self.col += 1

    def is_eoi(self):
        """is_eoi():
        :return:
            True if current position is End Of Input stream.
            Returns False otherwise.
        """
        return self.pos >= (len(self.text) - 1)

    def expected(self, expected, found):
        """expected
        Raises error if expected and found are different.
        """
        if expected == found:
            return
        else:
            raise ValueError(f'[Error] Expected {expected} but found {found} in line: {self.line} position: {self.col}')

    def peek(self):
        """peek
        Returns next character in input stream after the current character.
        Note: If we are at EOI (End Of Input) then a space character is returned.
        """
        if not self.is_eoi():
            return self.text[self.pos + 1]
        else:
            return ' '  # space

    #############################################
    # Recognizer Methods                        #
    #-------------------------------------------#
    # These methods collect the characters that #
    # belong to a single multi-character token  #
    # into a single unit, and return that unit  #
    # to the caller.                            #
    #############################################
    def char(self):
        if self.current_char != "'":
            self.expected("'", self.current_char)
        self.advance()  # Consume opening single quote

        char = self.current_char
        self.advance()  # Consume character

        # Now we must have a closing single quote mark
        if self.current_char != "'":
            self.expected("'", self.current_char)
        self.advance()  # Consume closing single quote
        return char

    def string(self):
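        if self.current_char != '"':
            self.expected('"', self.current_char)
        self.advance()  # Consume leading '"'

        string = ''
        while self.current_char != '"':
            if self.current_char == '':
                # Assumes advance() leaves current_char empty at end of input
                raise ValueError(f'[Error] Unterminated string at line: {self.line} position: {self.col}')
            string += self.current_char
            self.advance()
        self.advance()  # Consume trailing '"'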
        return string

    def number(self):
        number = ''
        while self.current_char.isdigit():
            number += self.current_char
            self.advance()
        return number

    def is_ident(self):
        return self.current_char.isalpha() or self.current_char == '_'

    def _ident(self):
        _id = ''
        # identifiers begin with alpha or underscores
        if self.current_char.isalpha() or self.current_char == '_':
            while self.current_char.isalnum() or self.current_char == '_':
                _id += self.current_char
                self.advance()
        return _id

    def is_keyword(self, kw):
        return kw.upper() in KEYWORDS

    def eat_comment(self):
        if self.current_char == '(':
            while self.current_char != ')' and not self.is_eoi():
                self.advance()
            self.advance()  # skip trailing )
        elif self.current_char == '\\':
            while self.current_char != '\n' and not self.is_eoi():
                self.advance()
            self.advance()  # skip new line character

    #############################################
    # Tokenization                              #
    #-------------------------------------------#
    # The method next_token() is called by the  #
    # client to obtain the next token in the    #
    # input stream.                             #
    #############################################
    def next_token(self) -> Token:
        if not self.is_eoi():
            if self.current_char.isspace():
                while self.current_char.isspace():
                    self.advance()

            if self.current_char.isdigit():
                return Token(TT_NUMBER, self.number(), self.line, self.col)

            elif self.current_char == '+':
                val = self.current_char
                self.advance()
                return Token(TT_PLUS, val, self.line, self.col)

            elif self.current_char == '-':
                val = self.current_char
                self.advance()
                return Token(TT_MINUS, val, self.line, self.col)

            elif self.current_char == '*':
                val = self.current_char
                if self.peek() == '*':
                    val += '*'
                    self.advance()
                    self.advance()
                    return Token(TT_POW, val, self.line, self.col)
                self.advance()
                return Token(TT_MUL, val, self.line, self.col)

            elif self.current_char == '/':
                val = self.current_char
                self.advance()
                return Token(TT_DIV, val, self.line, self.col)

            elif self.current_char == '&':
                val = self.current_char
                self.advance()
                return Token(TT_AND, val, self.line, self.col)

            elif self.current_char == '|':
                val = self.current_char
                self.advance()
                return Token(TT_OR, val, self.line, self.col)

            elif self.current_char == '^':
                val = self.current_char
                self.advance()
                return Token(TT_XOR, val, self.line, self.col)

            elif self.current_char == '~':
                val = self.current_char
                self.advance()
                return Token(TT_INV, val, self.line, self.col)

            elif self.is_ident():
                _id = self._ident()
                if self.is_keyword(_id):
                    return Token(TT_KEYWORD, _id, self.line, self.col)
                else:
                    return Token(TT_ID, _id, self.line, self.col)

            elif self.current_char == '<':
                val = self.current_char
                if self.peek() == '<':
                    val += '<'
                    self.advance()
                    self.advance()
                    return Token(TT_LSL, val, self.line, self.col)
                self.advance()
                return Token(TT_LESS, val, self.line, self.col)

            elif self.current_char == '>':
                val = self.current_char
                if self.peek() == '>':
                    val += '>'
                    self.advance()
                    self.advance()
                    return Token(TT_LSR, val, self.line, self.col)
                self.advance()
                return Token(TT_GREATER, val, self.line, self.col)

            elif self.current_char == '=':
                val = self.current_char
                if self.peek() == '=':
                    val += '='
                    self.advance()
                    self.advance()
                    return Token(TT_EQ, val, self.line, self.col)
                self.advance()
                return Token(TT_ASSIGN, val, self.line, self.col)




            elif self.current_char == ':':
                val = self.current_char
                self.advance()
                return Token(TT_STARTDEF, val, self.line, self.col)

            elif self.current_char == ';':
                val = self.current_char
                self.advance()
                return Token(TT_ENDDEF, val, self.line, self.col)

            elif self.current_char == '(' or self.current_char == '\\':
                self.eat_comment()
                return self.next_token()


            elif self.current_char == "'":
                return Token(TT_CHAR, self.char(), self.line, self.col)

            elif self.current_char == '"':
                string = self.string()
                return Token(TT_STRING, string, self.line, self.col)

            elif self.is_eoi():
                return Token(TT_EOF, None, self.line, self.col)

            else:
                raise ValueError(f'Unexpected character: {self.current_char} in input stream at position: {str(self.pos)}.')

        else:
            return Token(TT_EOF, None, self.line, self.col)


def main(text: str):
    lexer = Lexer(text)
    tok = lexer.next_token()
    while tok.type != TT_EOF:
        print(tok)
        tok = lexer.next_token()

if __name__ == "__main__":
    text = """
    ( this is a multi-line
       comment. )
    \\ This is a line comment.
    _myIdent """

    main(text)
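The one-character lookahead provided by peek() is what lets next_token() tell `*` from `**` and `<` from `<<`. The following is a minimal, stand-alone sketch of that idea only. The function name `scan_operators`, the two lookup dictionaries, and the tuple-based tokens are hypothetical and are not part of the lexer above.

```python
# Stand-alone sketch of one-character lookahead (hypothetical helper,
# not part of lexer.py). Token names mirror the TT_* constants as strings.
TWO_CHAR = {'**': 'POW', '<<': 'LSL', '>>': 'LSR', '==': 'EQ'}
ONE_CHAR = {'*': 'MUL', '<': 'LESS', '>': 'GREATER', '=': 'ASSIGN'}


def scan_operators(text):
    """Return a list of (type, lexeme) pairs for the operators in text."""
    tokens = []
    pos = 0
    while pos < len(text):
        ch = text[pos]
        if ch.isspace():
            pos += 1
            continue
        # Peek at the next character; treat end of input as a space,
        # just as Lexer.peek() does.
        nxt = text[pos + 1] if pos + 1 < len(text) else ' '
        pair = ch + nxt
        if pair in TWO_CHAR:
            tokens.append((TWO_CHAR[pair], pair))
            pos += 2  # consume both characters
        elif ch in ONE_CHAR:
            tokens.append((ONE_CHAR[ch], ch))
            pos += 1
        else:
            raise ValueError(f'Unexpected character: {ch!r} at position {pos}')
    return tokens


print(scan_operators('* ** < << > >> = =='))
```

Checking the two-character form first is the important part: if the single-character case were tested first, `**` would be reported as two multiplication tokens.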
# parser.py
import sys

from lexer import Lexer
from posts.post4.string_table import StringTable
from stack import Stack
from tokens import *

#######################################
# Parser                              #
#######################################

class Parser:

    def __init__(self, lexer: Lexer, data_stack: Stack = None, string_table=None):
        self.lexer = lexer

        # Setup data stack
        if data_stack is not None:
            self.data = data_stack
        else:
            self.data = Stack()

        self.current_tok = Token(TT_START, None)

        # Setup string table
        if string_table is not None:
            self.string_table = string_table
        else:
            self.string_table = StringTable()

    def next_token(self):
        if self.current_tok.type != TT_EOF:
            self.current_tok = self.lexer.next_token()

    def parse(self):
        # get token and decide what to do with it
        tok = self.current_tok

        while tok.type != TT_EOF:
            self.next_token()
            tok = self.current_tok

            if tok.type == TT_START:
                continue

            if tok.type == TT_NUMBER:
                self.data.push(int(tok.value))
                continue

            # Arithmetic
            elif tok.type == TT_PLUS:
                right = self.data.pop()
                left = self.data.pop()
                result = left + right
                self.data.push(result)

            elif tok.type == TT_MINUS:
                right = self.data.pop()
                left = self.data.pop()
                result = left - right
                self.data.push(result)

            elif tok.type == TT_MUL:
                right = self.data.pop()
                left = self.data.pop()
                result = left * right
                self.data.push(result)

            elif tok.type == TT_DIV:
                right = self.data.pop()
                left = self.data.pop()
                result = left // right
                self.data.push(result)

            elif tok.type == TT_POW:
                right = self.data.pop()
                left = self.data.pop()
                result = left ** right
                self.data.push(result)

            # Logical Operations
            elif tok.type == TT_INV:
                left = self.data.pop()
                left = ~left
                self.data.push(left)

            elif tok.type == TT_XOR:
                right = self.data.pop()
                left = self.data.pop()
                self.data.push((left ^ right))

            elif tok.type == TT_AND:
                right = self.data.pop()
                left = self.data.pop()
                self.data.push((left & right))

            elif tok.type == TT_OR:
                right = self.data.pop()
                left = self.data.pop()
                self.data.push((left | right))

            elif tok.type == TT_LSL:
                right = self.data.pop()
                left = self.data.pop()
                self.data.push(int(left) << int(right))

            elif tok.type == TT_LSR:
                right = self.data.pop()
                left = self.data.pop()
                self.data.push((left >> right))

            elif tok.type == TT_STRING:
                string = self.current_tok.value
                addr = self.string_table.save(string)
                self.data.push(addr)
                self.data.push(len(string))

            elif tok.type == TT_CHAR:
                char = self.current_tok.value
                ascii_code = ord(char)
                self.data.push(ascii_code)

            # Handle Keywords
            elif tok.type == TT_KEYWORD:
                # Process keyword
                if tok.value == 'dump':
                    """dump
                    Display the data stack contents in the console
                    """
                    print(str(self.data))

                elif tok.value == 'drop':
                    """drop
                    Remove the top value from the stack and discard it.
                    """
                    self.data.pop()

                elif tok.value == 'clear':
                    """clear
                    Empty the stack.
                    """
                    self.data.clear()

                elif tok.value.lower() == 'swap':
                    """swap
                    Swaps the position of the top two items on the stack.
                    Ex: x y z --> x z y
                    """
                    if self.data.count() < 2:
                        raise IndexError(
                            f'Attempt to SWAP on stack of size {self.data.count()}!\n\t\t\tStack must contain at least 2 elements to SWAP.')
                    n1 = self.data.pop()
                    n2 = self.data.pop()
                    self.data.push(n1)
                    self.data.push(n2)

                elif tok.value == 'swap2':
                    """swap2
                    Swaps the position of the top two pairs of items on the stack.
                    Ex: w x y z --> y z w x
                    """
                    if self.data.count() < 4:
                        raise IndexError(
                            f'Attempt to SWAP2 on stack of size {self.data.count()}!\n\t\t\tStack must contain at least 4 elements to SWAP2.')
                    n3 = self.data.pop()
                    n2 = self.data.pop()
                    n1 = self.data.pop()
                    n0 = self.data.pop()
                    self.data.push(n2)
                    self.data.push(n3)
                    self.data.push(n0)
                    self.data.push(n1)

                elif tok.value == 'over':
                    """over
                    Copies the second item on the stack to the head position.
                    Ex: x y z --> x y z y
                    """
                    top = self.data.pop()
                    next = self.data.pop()
                    self.data.push(next)
                    self.data.push(top)
                    self.data.push(next)

                elif tok.value == 'over2':
                    """over2
                    Copies the third item on the stack to the head position.
                    Ex: x y z --> x y z x
                    """
                    top = self.data.pop()
                    sec = self.data.pop()
                    next = self.data.pop()
                    self.data.push(next)
                    self.data.push(sec)  # restore the second item, which was previously dropped
                    self.data.push(top)
                    self.data.push(next)

                elif tok.value == 'dup2':
                    """dup2
                    Duplicates the top two items on the stack.
                    Ex: x y z --> x y z y z
                    """
                    n1 = self.data.pop()
                    n2 = self.data.pop()
                    self.data.push(n2)
                    self.data.push(n1)
                    self.data.push(n2)
                    self.data.push(n1)

                elif tok.value == 'dup3':
                    """dup3
                    Duplicates the top three items on the stack.
                    Ex: x y z --> x y z x y z
                    """
                    n1 = self.data.pop()
                    n2 = self.data.pop()
                    n3 = self.data.pop()
                    self.data.push(n3)
                    self.data.push(n2)
                    self.data.push(n1)
                    self.data.push(n3)
                    self.data.push(n2)
                    self.data.push(n1)

                elif tok.value == 'dup4':
                    """dup4
                    Duplicates the top four items on the stack.
                    Ex: w x y z --> w x y z w x y z
                    """
                    n1 = self.data.pop()
                    n2 = self.data.pop()
                    n3 = self.data.pop()
                    n4 = self.data.pop()
                    self.data.push(n4)
                    self.data.push(n3)
                    self.data.push(n2)
                    self.data.push(n1)
                    self.data.push(n4)
                    self.data.push(n3)
                    self.data.push(n2)
                    self.data.push(n1)

                elif tok.value == 'rol':
                    """rol
                    Rotates the values in the stack by moving the top
                    element to the bottom and shifting everything else up
                    one position.
                    Ex: x y z --> z x y
                    """
                    item = self.data.pop()
                    self.data.que(item)

                elif tok.value == 'ror':
                    """ror
                    Rotates the values in the stack by moving the bottom
                    element to the top and pushing everything else down
                    one position.
                    Ex: x y z --> y z x
                    """
                    last = self.data.tail()
                    self.data.push(last)

                # Handle I/O Routines
                elif tok.value == 'emit':
                    """emit
                    The word EMIT pops a single character code from the
                    stack and sends the corresponding character to stdout.
                    For example, in decimal:

                        65 EMIT↵ A
                        66 EMIT↵ B
                        67 EMIT↵ C
                    """
                    ch = self.data.pop()
                    # chr() accepts code points 0 through 0x10FFFF inclusive
                    if ch > 0x10FFFF or ch < 0:
                        raise ValueError(f'[Error] character popped by emit: {ch} is out of range!')
                    char = chr(ch)
                    print(char, end='')

                elif tok.value == 'type':
                    """<addr> <length> type
                    Sends a string to standard out. The string is located
                    in the string table at location <addr> and has a length
                    of <length>.
                    """
                    length = int(self.data.pop())
                    addr = int(self.data.pop())

                    string = self.string_table.get_str(addr)
                    if length > len(string):
                        raise IndexError(f'Error: String length of {length} was given but actual length is {len(string)}.')

                    # Print the first <length> characters followed by a newline
                    print(string[:length])

                elif tok.value == 'key':
                    """key
                    Pauses execution until a key is pressed. Once a key 
                    is pressed, the key code is pushed onto the stack.
                    Ex: <A> --> 65
                    """
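                    # read(1) blocks until a character arrives; an empty
                    # result means end of input, so stop instead of looping.
                    key = sys.stdin.read(1)
                    if not key:
                        raise EOFError('[Error] key: end of input reached while waiting for a key.')
                    self.data.push(ord(key))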

                # Helpful Constants
                elif tok.value == 'cr':  # Carriage Return
                    self.data.push(13)

                elif tok.value == 'lf':  # Line Feed
                    self.data.push(10)

                elif tok.value == 'sp':  # Space
                    self.data.push(32)

                elif tok.value == 'null':  # Null
                    self.data.push(0)

                elif tok.value == 'bell':  # System Bell
                    self.data.push(7)

                elif tok.value == 'bs':  # Backspace
                    self.data.push(8)

                elif tok.value == 'tab':  # TAB
                    self.data.push(9)

                elif tok.value == 'vt':  # Vertical TAB
                    self.data.push(11)

                elif tok.value == 'ff':  # Form Feed
                    self.data.push(12)

                elif tok.value == 'esc':  # Escape
                    self.data.push(27)


if __name__ == '__main__':
    source = """ 16 2 >> 
    "This is a string" 
    dump
    type dump
    'C' 'B' 'A'
    dump
    """
    lexer = Lexer(source)
    parser = Parser(lexer)
    parser.parse()
    #print(parser.data)
#!/usr/bin/env python3

# sol.py

from lexer import Lexer
from parser import Parser
import argparse


def main(src):
    _lexer = Lexer(src)
    _parser = Parser(_lexer)
    _parser.parse()



if __name__ == '__main__':
    # Use a separate name so the argparse module is not shadowed
    arg_parser = argparse.ArgumentParser()
    arg_parser.add_argument("-i", "--infile",
                            help="Input source file", required=True)
    arg_parser.add_argument("-v", "--verbose", help="Echo source file", action="store_true")
    args = arg_parser.parse_args()

    infile = args.infile
    source = ''
    with open(infile, 'r') as ifh:
        source = ifh.read()
        if args.verbose:
            print(source)

    main(source)
# stack.py

class Stack:
    def __init__(self, report_errors=False):
        self.store = []
        self.report_error = report_errors

    def push(self, item):
        self.store.append(item)

    def pop(self):
        if self.store:
            return self.store.pop()
        else:
            if self.report_error:
                raise IndexError('Attempt to pop a value from empty stack!')
            else:
                return None

    def tail(self):
        if self.store:
            return self.store.pop(0)
        else:
            return None

    def clear(self):
        self.store.clear()

    def count(self):
        return len(self.store)

    def is_empty(self):
        return not bool(self.store)

    def __str__(self):
        # Join the items so an empty stack prints as '[  ]' rather than ' ]'
        items = ', '.join(f"'{v}'" for v in self.store)
        return f'[ {items} ]'

    def __repr__(self):
        return self.__str__()
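
To see how the parser leans on this class, here is a small sketch of the rot word built from tail() and push(), the same two calls the parser listing uses. MiniStack is a hypothetical, cut-down stand-in for Stack, included only so the snippet runs on its own; it is not part of the project files.

```python
# MiniStack is a hypothetical, cut-down stand-in for the Stack class above.
class MiniStack:
    def __init__(self):
        self.store = []

    def push(self, item):
        self.store.append(item)

    def pop(self):
        # The top of the stack is the end of the list.
        return self.store.pop() if self.store else None

    def tail(self):
        # The bottom of the stack is the start of the list.
        return self.store.pop(0) if self.store else None


def rot(stack):
    """Move the bottom element to the top, as the parser's rot word does."""
    stack.push(stack.tail())


data = MiniStack()
for value in (1, 2, 3):   # stack is now 1 2 3, with 3 on top
    data.push(value)

rot(data)                 # x y z -> y z x
print(data.store)
```

Because tail() always removes the bottom item, rot written this way rotates the whole stack. It matches the x y z example only when exactly three items are on the stack.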
class StringTable:

    def __init__(self):
        self.data = []

    # Insert string and meta data into string list
    def save(self, string):
        entry = {'len': len(string), 'val': string}
        self.data.append(entry)
        return (len(self.data) - 1)

    # Return string storage object (including length)
    def get(self, idx):
        return self.data[idx]

    # Return length of string
    def length(self, idx):
        return int(self.data[idx]['len'])

    # Return printable string only (no meta data).
    def get_str(self, idx):
        return str(self.data[idx]['val'])

    def __str__(self):
        return str(self.data)
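
As a rough sketch of how the string table and the type word fit together, the snippet below saves a string, places its address and length on a stack, and then prints it the way type does. The class is repeated in compact form so the snippet runs on its own, and a plain list stands in for the data stack. The assumption that the parser pushes the address first and the length second comes from the `<addr> <length> type` docstring, not from the parser's string handling code.

```python
class StringTable:
    """Compact copy of the class above, repeated so the snippet runs alone."""

    def __init__(self):
        self.data = []

    def save(self, string):
        self.data.append({'len': len(string), 'val': string})
        return len(self.data) - 1

    def length(self, idx):
        return int(self.data[idx]['len'])

    def get_str(self, idx):
        return str(self.data[idx]['val'])


table = StringTable()
stack = []                          # a plain list stands in for the data stack

# Assumed parser behaviour on meeting a string literal:
addr = table.save("Hello World")
stack.append(addr)                  # <addr>
stack.append(table.length(addr))    # <length>

# What the type word does: pop <length>, then <addr>, then print.
length = stack.pop()
addr = stack.pop()
text = table.get_str(addr)[:length]
print(text)
```

Passing the length separately means a program can print just a prefix of a stored string by pushing a smaller length before calling type.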
# tokens.py


# Data
TT_CHAR     = 'CHAR'    # 'c'
TT_STRING   = 'STRING'  # "string"
TT_NUMBER   = 'NUMBER'  # 1234 (integers only)
TT_ID       = 'ID'

# Arithmetic Operators
TT_ASSIGN   = 'ASSIGN'  #   '='
TT_PLUS     = 'PLUS'    #   '+'
TT_MINUS    = 'MINUS'   #   '-'
TT_MUL      = 'MUL'     #   '*'
TT_DIV      = 'DIV'     #   '/'
TT_POW      = 'POW'     #   '**'

# Bitwise Operators
TT_LSR      = 'LSR'     #   '>>'
TT_LSL      = 'LSL'     #   '<<'
TT_AND      = 'AND'     #   '&' Bitwise AND
TT_OR       = 'OR'      #   '|' Bitwise OR
TT_XOR      = 'XOR'     #   '^' Bitwise XOR
TT_INV      = 'INV'     #   '~' Bitwise NOT/Invert

# Relational Operators
TT_EQ       = 'EQ'      #   '==' Is Equal?
TT_LESS     = 'LESS'    #   '<'  Is Less?
TT_GREATER  = 'GREATER' #   '>'  Is Greater?

# Logical Operators
TT_LAND     = 'LAND'    #   AND Logical
TT_LOR      = 'LOR'     #   OR Logical
TT_NEQ      = 'NEQ'     #   NEQ Logical

# DEFINE Functions
TT_STARTDEF = 'STARTDEFINE' #   ':'
TT_ENDDEF   = 'ENDDEFINE'   #   ';'

# INTERNAL USE
TT_KEYWORD  = 'KEYWORD'
TT_START    = "START"
TT_EOF      = 'EOF'


class Token:
    def __init__(self, _type, _value, line=None, col=None):
        self.type = _type
        self.value = _value
        self.col = col
        self.line = line

    def __str__(self):
        # Adjacent f-strings keep the message on one line with no stray indentation
        return (f'type: {self.type}, value: {self.value}, '
                f'at line {self.line} in column {self.col}')

    def __repr__(self):
        return self.__str__()
\ file: test.sol
\
\ Expected Output
\ -----------------------
\ [ '2', '3' ]
\ [ '5' ]
\ [ '5', '5' ]
\ [ '10' ]
\ [ '10', '6' ]
\ [ '60' ]
\

2 3 dump + dump
5 dump + dump
6 dump * dump
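
If you want to check the expected output without the full lexer and parser, the following is a minimal sketch of an evaluator for just the words this test uses: integers, +, *, dump, and backslash comments. The run function is hypothetical and is not part of sol.py. It assumes dump formats the stack the same way Stack.__str__ does.

```python
def run(source):
    """Evaluate integers, +, * and dump; return one output line per dump."""
    stack = []    # data stack; the top of the stack is the end of the list
    output = []   # formatted stack contents, one entry per dump
    for line in source.splitlines():
        line = line.strip()
        if not line or line.startswith('\\'):
            continue                      # skip blank lines and comments
        for word in line.split():
            if word == 'dump':
                items = ', '.join(f"'{v}'" for v in stack)
                output.append(f'[ {items} ]')
            elif word == '+':
                right, left = stack.pop(), stack.pop()
                stack.append(left + right)
            elif word == '*':
                right, left = stack.pop(), stack.pop()
                stack.append(left * right)
            else:
                stack.append(int(word))   # anything else is taken as an integer
    return output


program = """
\\ file: test.sol
2 3 dump + dump
5 dump + dump
6 dump * dump
"""

for line in run(program):
    print(line)
```

The program contains six dump words, so the sketch prints six lines, ending with the result of the multiplication.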

Until next time, Happy Coding!
