Need help on line parsing, making an assembler.

Question

notuserfriendly 1 Junior Poster

15 Years Ago

Hello, I am currently writing an assembler for the theoretical SIC/XE machine and I decided to do it in Python.
I am very new to Python, but I thought it would be fun to do it in and I am willing to learn. So far I know what to do and have the idea, but I cannot implement it in Python code yet. I could do it in Java or C++.
Basically I want to parse through the source.txt file, and since the source is laid out in 3 columns I decided to split by tab and match there.
Here is my sample code:

def main():
	comment = "."
	SYMTAB = {"START":' ', "BYTE":" ","WORD":" ", "RESB":" ","RESW":" ","END":" ","BASE":" ","NOBASE":" "}
	OPTAB = {"ADDR":" ", "COMPR":" ", "SUBR":" ", "ADD":" ", "SUB":" ", "MUL":" ", "DIV":" ", "COMP":" ", "J":" ", "JEQ":" ", "JGT":" ", 
"JLT":" ", "JSUB":" ", "LDA":" ", "LDB":" ", "LDCH":" ", "LDL":" ", "LDT":" ", "LDX":" ", "RSUB":" ", "TIX":" ", "TIXR":" ", "RD":" ", "TD":" ", "WD":" ", 
"STA":" ", "STB":" ", "STCH":" ", "STL":" ", "STX":" ", "CLEAR":" "}
	inf = open("source.txt", 'rU')
	outf = open ("temp.txt", 'w')
	LOCCTR = 0
	lines = inf.readlines()
	for line in lines:
		if line.find("START") > -1:
			about = line.split()
			outf.write(about[2])
			LOCCTR = about[2]
	
	for line in lines:
		# stop printing once the END directive is reached
		if line.find("END") > -1:
			break
		print line

	outf.close()	
	inf.close()
	
	
	return 0

if __name__ == '__main__':
	main()

I would like to know how to iterate better through lines with Python.
Please ask me more questions if necessary.
I have attached a sample source.txt file as well so you can see how it is laid out. As you can see, it is separated into 3 columns, but sometimes the first column is empty.

python

sic.txt (0.43 KB)

PGRM1        START   0 
FIRST        LDB     NUMB1
             STL     RETAD
             LDA     NUMB1
LOOP         ADD     NUMB2
             STA     ARRAY,X
             TIX     LIMIT
             ADD     #1
             STA     NUMB3
             J       @RETAD
RETAD        RESW    1
LIMIT        WORD    10
ARRAY        RESB    1024
NUMB1        WORD    5
NUMB2        WORD    10
NUMB3        RESW    1
             END     FIRST


TrustyTony 888 ex-Moderator

15 Years Ago

I would start with something like this. In the end I would produce a simulator for the code at the same time.

L == LOCCTR ??

generate=False ## code generation on?


## adr parameter is the first column value, op is the third column (list?)
def start(adr,op):
    global generate
    generate=True
    pass

def byte(adr,op):
    pass

def word(adr,op):
    pass

def resb(adr,op):
    pass

def resw(adr,op):
    pass

def end(adr,op):
    global generate
    generate=False
    pass

def base(adr,op):
    pass

def nobase(adr,op):
    pass

## INSTRUCTIONS
def addr(adr,op):
    pass

def compr(adr,op):
    pass

def subr(adr,op):
    pass

def add(adr,op):
    pass

def sub(adr,op):
    pass

def mul(adr,op):
    pass

def div(adr,op):
    pass

def comp(adr,op):
    pass

def j(adr,op):
    pass

def jeq(adr,op):
    pass

def jgt(adr,op):
    pass

def jlt(adr,op):
    pass

def jsub(adr,op):
    pass

def lda(adr,op):
    pass

def ldb(adr,op):
    pass

def ldch(adr,op):
    pass

def ldl(adr,op):
    pass

def ldt(adr,op):
    pass

def ldx(adr,op):
    pass

def rsub(adr,op):
    pass

def tix(adr,op):
    pass

def rd(adr,op):
    pass

def td(adr,op):
    pass

def wd(adr,op):
    pass

def sta(adr,op):
    pass

def stb(adr,op):
    pass

def stch(adr,op):
    pass

def stl(adr,op):
    pass

def stx(adr,op):
    pass

def clear(adr,op):
    pass
## DICTS
SYMTAB = {
    "START":start,
    "BYTE":byte,
    "WORD":word,
    "RESB":resb,
    "RESW":resw,
    "END":end,
    "BASE":base,
    "NOBASE":nobase
    }

OPTAB = {
    "ADDR":addr,
    "COMPR":compr,
    "SUBR":subr,
    "ADD":add,
    "SUB":sub,
    "MUL":mul,
    "DIV":div,
    "COMP":comp,
    "J":j,
    "JEQ":jeq,
    "JGT":jgt,
    "JLT":jlt,
    "JSUB":jsub,
    "LDA":lda,
    "LDB":ldb,
    "LDCH":ldch,
    "LDL":ldl,
    "LDT":ldt,
    "LDX":ldx,
    "RSUB":rsub,
    "TIX":tix,
    "TIXR":tix,
    "RD":rd,
    "TD":td,
    "WD":wd,
    "STA":sta,
    "STB":stb,
    "STCH":stch,
    "STL":stl,
    "STX":stx,
    "CLEAR":clear
    }

VALUES={
    "LOCCTR": 0,
    "A":0x0,
    "B":0x0,
    "CH":0x0,
    "L":0x0,
    "X":0x0,
    "FLAGS": 0x0
    }

if __name__ == '__main__':
    
    inf = open("sic.txt", 'rU')

    lines = [(code[:13].strip(), code[13:21].strip(), code[21:].strip()) for code in inf]

    for i,line in enumerate(lines):  ## generate index,value pairs from lines
        if line[0]:
            VALUES[line[0]]=i

    print '** SYMBOLS **'
    
    for k in sorted(VALUES.keys()): print k,' = ',VALUES[k]

    print '-'*40
    print '** AS LIST **'
    for i in lines: print i
    print '-'*40
    print '** THE PROGRAM WITH LINE NUMBERS **'
    for li,line in enumerate(lines):
        print li,':\t','\t'.join(line)

## process the replacement of symbols by value, spliting of third field parameters
## run the functions in loop for each line for assembler
## run as program updating program counter and registers in functions
## according of the meaning of instructions
## to do simulation run of the program
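
The closing comments describe running the handler functions in a loop over the parsed lines. A minimal Python 3 sketch of that dispatch follows. It assumes each handler takes the current address and the operand string and returns the next address; the handler bodies, the `HANDLERS` name, the hexadecimal START operand and the 3-byte sizes are illustrative assumptions, not part of the code above.

```python
# Hypothetical handlers: each takes (adr, op) and returns the next address.
def start(adr, op):
    return int(op, 16)          # assumption: START operand is a hex address

def lda(adr, op):
    return adr + 3              # assumption: a 3-byte instruction

def resw(adr, op):
    return adr + 3 * int(op)    # assumption: a word is 3 bytes

HANDLERS = {"START": start, "LDA": lda, "RESW": resw}

def run(lines):
    """Call the handler for each (label, mnemonic, operand) tuple."""
    adr = 0
    trace = []
    for label, mnemonic, operand in lines:
        handler = HANDLERS.get(mnemonic)
        if handler is None:
            raise ValueError("unknown mnemonic: %s" % mnemonic)
        trace.append((adr, mnemonic))   # address the line was assembled at
        adr = handler(adr, operand)
    return adr, trace

program = [("PGRM1", "START", "0"), ("", "LDA", "NUMB1"), ("RETAD", "RESW", "1")]
final_adr, trace = run(program)
```

Looking the handler up with `dict.get` keeps the loop free of long if/elif chains, and an unknown mnemonic fails loudly instead of being skipped.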


jice 53 Posting Whiz in Training

15 Years Ago

Just one thing. For line iteration, I find it is a bad idea to load the whole file into a list (using readlines()) unless you can't process it sequentially. It consumes memory for no real benefit.
Python allows iterating directly over the lines of a file:

for line in open(myfile,'r'):
    print line

Or

with open(myfile,'r') as infile:
    for line in infile:
        print line

The second form seems to be the preferred way, for reasons I don't know (if someone can explain, I would be glad to learn).
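
On why the `with` form is preferred: the file is closed as soon as the block exits, even if an exception is raised inside it, whereas `for line in open(...)` leaves closing to the garbage collector. A small Python 3 sketch of that behaviour (the temporary file and its contents are made up for the demonstration):

```python
import os
import tempfile

# Create a throwaway file to read back (hypothetical sample data).
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as out:
    out.write("FIRST LDB NUMB1\n     STL RETAD\n")

try:
    with open(path, "r") as infile:
        for line in infile:
            if "STL" in line:
                raise RuntimeError("stop early")   # simulate a failure mid-file
except RuntimeError:
    pass

# The with block closed the file on the way out, despite the exception.
closed_after_error = infile.closed
os.remove(path)
```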

woooee 814 Nearly a Posting Maven

15 Years Ago

You are on the right track with the dictionary, but I am not sure exactly what you want to do. Some input and output data might help. This code is an example of what I think you want.

SYMTAB = {"START":' ', "BYTE":" ","WORD":" ", "RESB":" ","RESW":" ","END":" ","BASE":" ","NOBASE":" "}
test_list = [ "START 1000\n", "JUNK 2000\n", "END 3000\n" ]
for rec in test_list:
    rec = rec.strip()
    substrs = rec.split()
    ## assume the keyword is always the first word
    print substrs[0],
    if substrs[0] in SYMTAB:
        print "FOUND  = xx %s" % ( " ".join(substrs[1:]) ) 
    else:
        print "NOT FOUND"
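
The example above assumes the keyword is always the first word, which does not hold for the posted source, where the label column is sometimes empty. One way to handle that with `split()`, sketched here in Python 3, is to treat a line that starts with whitespace as having no label. The `parse_line` helper is a hypothetical name, and the rule that a label always starts in the first column is an assumption based on the attached sic.txt.

```python
def parse_line(line):
    """Split one source line into (label, mnemonic, operand).

    Assumes a label, when present, starts in the first column, and that
    lines starting with "." are comments (as in the original post).
    """
    if not line.strip() or line.lstrip().startswith("."):
        return None                      # blank line or comment
    fields = line.split()
    if line[0].isspace():
        fields.insert(0, "")             # no label on this line
    while len(fields) < 3:
        fields.append("")                # pad a missing operand
    return tuple(fields[:3])

with_label = parse_line("LOOP         ADD     NUMB2\n")
no_label = parse_line("             STA     ARRAY,X\n")
```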


tonymuilenburg -2 Light Poster · Answer 1 · 2010-04-25T14:18:09+00:00

I like to use readlines to get the data. Some other useful commands include strip and append. Here is some ugly code to demonstrate these:

# FileHandle is assumed to be a file that was opened earlier
ParameterArray = []
batch_data = FileHandle.readlines()
FileHandle.close()
for line in batch_data:
    # remove the line ending and surrounding white space
    CleanLine = line.strip()
    # split a "name = value" line into its parts
    SplitArray = CleanLine.split("=")
    # remove white space around the first part
    SplitArray[0] = SplitArray[0].strip()
    # append the first part to the array
    ParameterArray.append(SplitArray[0])
    if len(SplitArray) == 2:
        # remove white space around the second part
        SplitArray[1] = SplitArray[1].strip()
        # append the second part to the array
        ParameterArray.append(SplitArray[1])

Hope that helps.
Tony

notuserfriendly 1 Junior Poster · Answer 2 · 2010-04-25T14:30:23+00:00

OK, I see, that is useful. But how do I go about matching strings in that split array against strings in a database or another array?

tonymuilenburg -2 Light Poster · Answer 3 · 2010-04-25T17:32:46+00:00

In Python, you can compare strings easily. Just loop through the array and do a comparison like this:

match = False
i = 0
while i < len(ParameterArray):
    if ParameterArray[i] == databaseItem:
        match = True
    i += 1

You can also use search to find partial text in strings. Here is an example:

import re
#...
re.compile("some text").search(ParameterArray[j])
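
For exact matches against a table of known names, Python's `in` operator avoids the explicit loop, and on a dict or set it is a hash lookup rather than a scan. A short Python 3 sketch with made-up, deliberately incomplete tables (the `classify` helper is a hypothetical name, and the opcode values should be checked against your SIC/XE reference):

```python
# Hypothetical tables; the real ones would hold every directive and mnemonic.
DIRECTIVES = {"START", "END", "WORD", "RESW", "RESB", "BYTE"}
OPCODES = {"LDA": 0x00, "STA": 0x0C, "ADD": 0x18}

def classify(word):
    """Say which table, if any, a token belongs to."""
    if word in DIRECTIVES:
        return "directive"
    if word in OPCODES:
        return "opcode"
    return "unknown"

kinds = [classify(w) for w in "START LDA JUNK".split()]
```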

notuserfriendly 1 Junior Poster · Answer 4 · 2010-04-26T12:57:36+00:00

I would start with something like this. I would same time produce simulator for the code in the end.

Hey tonyjv, I understand that those are functions for each symbol, but what parameters do I give them? Can you give me an example?

 def subr(2,94):
    pass

or

def subr(adr,op):
    adr = 2
    op = 94

Also, how can I access the opcode stored there?
Like subr.value()?

TrustyTony 888 ex-Moderator Team Colleague Featured Poster · Answer 5 · 2010-04-26T17:21:53+00:00

I thought that you would iterate over the lines of code and call the subroutines found through the dict. Maybe having both SYMTAB and OPTAB as one table would make life easier, if they cannot have the same values (like an address called SUBR).
This is the kind of use I had in mind. My example loop for symbols must be replaced with a loop that is aware of the length of each instruction and uses byte addressing (as most processors do).

## example: the opcode lives in the high byte of a 32-bit number
## and adr is in the remaining 24 bits
## op I thought to use for the value calculated from the third column
## = operand
def subr(adr,op):
    ## subr = 0x94
    ## in case we have op as a plain number
    VALUES["LOCCTR"] = op
    ## return the 4 bytes and increase location adr by 4 for code scan
    ## I mean that the next instruction must be written 4 bytes from the old value
    ## (the shift needs its own parentheses, because + binds tighter than <<)
    return ((0x94<<24) + op, adr + 4)

In the loop we would do (this would work in reality for variable length instructions, would need some function for multibyte saving to memory, instead of list assignment).

memory[adr], adr =  this_instructions_operation(adr,op)

Using double assignment

>>> a = [0,1,2]
>>> i=1
>>> a[i],i = (99,i+1)
>>> i
2
>>> a
[0, 99, 2]
>>>

Maybe a better alternative to returning ready bytes of assembly is to return a list of values and do a second pass over these values to write the real values. This is maybe better because we have to make a second pass anyway if we have variable-length instructions (we do not know the length of each instruction in advance, and so not the offset of an address from the START address).

However, this direct assembly was easier for me to make up as an example.
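
For the direct approach, the "function for multibyte saving to memory" mentioned earlier in this post could look like the following Python 3 sketch. It assumes memory is a `bytearray`, uses the 4-byte layout from the `subr` example (opcode in the high byte, stored big-endian), and `store_word` is a hypothetical helper name.

```python
def store_word(memory, adr, opcode, operand):
    """Write a 4-byte instruction at adr and return the next address."""
    word = (opcode << 24) | (operand & 0xFFFFFF)    # opcode in the high byte
    memory[adr:adr + 4] = word.to_bytes(4, "big")   # overwrite 4 bytes in place
    return adr + 4

memory = bytearray(8)
next_adr = store_word(memory, 0, 0x94, 0x000102)
```

Returning the next address keeps the same calling pattern as the double assignment above, without relying on a list of whole instructions.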

TrustyTony 888 ex-Moderator Team Colleague Featured Poster · Answer 6 · 2010-04-26T18:27:33+00:00

Could you explain the meaning of the code? Here is my attempt, as Python-style comments.

PGRM1        START   0      ## PGRM1==?
FIRST        LDB     NUMB1  ## this will put word address of NUMB1 to B? or contents of that location?
             STL     RETAD  ## STore LOCCTR? The value of next LDA instructions location
             LDA     NUMB1
LOOP         ADD     NUMB2
             STA     ARRAY,X ## put current value of A to memory location starting from ARRAY?
             TIX     LIMIT  ## ??
             ADD     #1     ## add to A constant 1?
             STA     NUMB3
             J       @RETAD  ## jump to location stored in location RETAD?
RETAD        RESW    1       ## reserve word=2 bytes for value (uninitialized) and call location RETAD
LIMIT        WORD    10      ## put two bytes word value for constant 10 or reserve 10 words array?
ARRAY        RESB    1024    ## reserve 1024B=kByte for array?
NUMB1        WORD    5
NUMB2        WORD    10
NUMB3        RESW    1
             END     FIRST  ## NOT PGRM1

notuserfriendly 1 Junior Poster · Answer 7 · 2010-04-26T21:33:55+00:00

I didn't explain it clearly. This is not a linker, just an assembler. I search the file for symbols; if a symbol is not in the symbol table, I add it to SYMTAB. If it is, I continue. Basically I just assemble the opcodes with their values and LOCCTR and produce object code.
Here is some pseudocode: http://pastebin.com/Af21Dk7Y . I do not want or need to have someone write that out for me in Python. I just want to know the syntax for dealing with the strings: after finding them, assign them a LOCCTR value and write out their value. So basically, if I find START in a line, I initialize LOCCTR to whatever value comes after START. So "COPY START 1000" would give a LOCCTR value of 1000. Then I add the opcode in front of it, so it's xx1000,
and so on for the rest of the code, for each label in there. So I don't think all those functions are necessary, unless I make them return the opcode value, but that would still be too much.
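
What is described here is essentially the first pass of a two-pass assembler, and it can be done with plain dictionaries rather than one function per mnemonic. The following is a minimal Python 3 sketch under several assumptions: every instruction is 3 bytes (format 3 only), a word is 3 bytes, the START operand is hexadecimal, and the opcode values are quoted from memory of the SIC/XE instruction set and should be checked against your reference. The names `pass_one` and `SOURCE` are illustrative.

```python
# Mnemonic -> opcode (assumed values; verify against the SIC/XE reference).
OPTAB = {"LDA": 0x00, "STA": 0x0C, "STL": 0x14, "ADD": 0x18,
         "TIX": 0x2C, "J": 0x3C, "LDB": 0x68}

SOURCE = """\
PGRM1        START   0
FIRST        LDB     NUMB1
             STL     RETAD
             LDA     NUMB1
LOOP         ADD     NUMB2
             STA     ARRAY,X
             TIX     LIMIT
             ADD     #1
             STA     NUMB3
             J       @RETAD
RETAD        RESW    1
LIMIT        WORD    10
ARRAY        RESB    1024
NUMB1        WORD    5
NUMB2        WORD    10
NUMB3        RESW    1
             END     FIRST
"""

def pass_one(text):
    """Assign an address to every label; return (symtab, listing)."""
    symtab, listing, locctr = {}, [], 0
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("."):
            continue                          # skip blanks and comments
        fields = line.split()
        if line[0].isspace():
            fields.insert(0, "")              # label column is empty
        label, mnemonic = fields[0], fields[1]
        operand = fields[2] if len(fields) > 2 else ""
        if mnemonic == "START":
            locctr = int(operand, 16)         # assumption: hex start address
            continue
        if mnemonic == "END":
            break
        if label:
            if label in symtab:
                raise ValueError("duplicate symbol: " + label)
            symtab[label] = locctr
        listing.append((locctr, mnemonic, operand))
        if mnemonic in OPTAB or mnemonic == "WORD":
            locctr += 3                       # assumption: 3 bytes each
        elif mnemonic == "RESW":
            locctr += 3 * int(operand)
        elif mnemonic == "RESB":
            locctr += int(operand)
        else:
            raise ValueError("unknown mnemonic: " + mnemonic)
    return symtab, listing

symtab, listing = pass_one(SOURCE)
```

A second pass could then walk `listing`, look each operand up in `symtab`, and combine it with `OPTAB[mnemonic]` to write the object code.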