File Parsing

Question

tcl76 0 Light Poster

13 Years Ago

hi,

I would like to parse an input file (input.txt) into an output file (output.txt), which I'll then import into Excel. I'd appreciate any advice on how to turn input.txt into output.txt.

thanks
johnny

file-system python

input.txt (0.84 KB)

-- num cell port function safe [ccell disval rslt]
   "17 (BC_1, CLK, input, X)," &
   "16 (BC_1, OC_NEG, input, X), " &-- Merged input/
   "16 (BC_1, *, control, 1), " &-- control
   "15 (BC_1, D(1), input, X)," &
   "14 (BC_1, D(2), input, X)," &
   "13 (BC_1, D(3), input, X)," &
   "12 (BC_1, D(4), input, X)," &
   "11 (BC_1, D(5), input, X)," &
   "10 (BC_1, D(6), input, X)," &
   " 9 (BC_1, D(7), input, X)," &
   " 8 (BC_1, D(8), input, X)," &  -- cell 16 @ 1 -> Hi-Z
   " 7 (BC_1, Q(1), output3, X, 16, 1, Z)," &
   " 6 (BC_1, Q(2), output3, X, 16, 1, Z)," &
   " 5 (BC_1, Q(3), output3, X, 16, 1, Z)," &
   " 4 (BC_1, Q(4), output3, X, 16, 1, Z)," &
   " 3 (BC_1, Q(5), output3, X, 16, 1, Z)," &
   " 2 (BC_1, Q(6), output3, X, 16, 1, Z)," &
   " 1 (BC_1, Q(7), output3, X, 16, 1, Z)," &
   " 0 (BC_1, Q(8), output3, X, 16, 1, Z)";

output.txt (0.52 KB)

17 BC_1 CLK input X 
   16 BC_1 OC_NEG input X  
   16 BC_1 * control 1  
   15 BC_1 D1 input X 
   14 BC_1 D2 input X 
   13 BC_1 D3 input X 
   12 BC_1 D4 input X 
   11 BC_1 D5 input X 
   10 BC_1 D6 input X 
    9 BC_1 D7 input X 
    8 BC_1 D8 input X   
    7 BC_1 Q1 output3 X 16 1  
    6 BC_1 Q2 output3 X 16 1  
    5 BC_1 Q3 output3 X 16 1  
    4 BC_1 Q4 output3 X 16 1 
    3 BC_1 Q5 output3 X 16 1  
    2 BC_1 Q6 output3 X 16 1  
    1 BC_1 Q7 output3 X 16 1  
    0 BC_1 Q8 output3 X 16 1
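The goal is to get the data into Excel, so one option is to go one step further and write comma-separated values, which Excel opens directly. The sketch below is minimal Python 3 and uses the standard csv module. It assumes output.txt already holds whitespace-separated fields, as shown above. The function name to_csv is hypothetical.

```python
import csv
import io

def to_csv(lines, out):
    """Write whitespace-separated text lines to `out` as CSV rows (hypothetical helper)."""
    writer = csv.writer(out)
    for line in lines:
        fields = line.split()      # split on any run of whitespace
        if fields:                 # skip blank lines
            writer.writerow(fields)

# Demonstration with an in-memory buffer instead of real files
buf = io.StringIO()
to_csv(["17 BC_1 CLK input X", "   7 BC_1 Q1 output3 X 16 1  ", ""], buf)
print(repr(buf.getvalue()))
```

For real files, open the input normally and the CSV file with `open("output.csv", "w", newline="")`, as the csv documentation recommends, then pass both to `to_csv`. The file name output.csv is only an example.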

Gribouillis 1,391 Programming Explorer

13 Years Ago

About the end of line issue, try to open the output file in mode "w" instead of "wb" to see if it changes something.

Gribouillis 1,391 Programming Explorer

13 Years Ago

See the doc of os.linesep for more information.
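The difference comes down to line endings. In binary mode ("wb"), each '\n' is written as a bare line feed. In text mode ("w"), Python translates '\n' into the platform's line separator, os.linesep, which is '\r\n' on Windows. Older versions of Notepad only break lines at '\r\n', while WordPad also accepts a bare '\n'. The Python 3 sketch below makes the difference visible on any platform. It forces the Windows convention through the newline parameter of open(), and the temporary file exists only for the demonstration.

```python
import os
import tempfile

print(repr(os.linesep))  # '\r\n' on Windows, '\n' on Linux/macOS

path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# Binary mode: bytes are written exactly as given, so only LF ('\n')
with open(path, "wb") as f:
    f.write(b"17 BC_1 CLK input X\n")
with open(path, "rb") as f:
    binary_bytes = f.read()

# Text mode with newline="\r\n": each '\n' written becomes CRLF,
# which is what plain mode "w" does by default on Windows
with open(path, "w", newline="\r\n") as f:
    f.write("17 BC_1 CLK input X\n")
with open(path, "rb") as f:
    text_bytes = f.read()

print(binary_bytes)  # b'17 BC_1 CLK input X\n'
print(text_bytes)    # b'17 BC_1 CLK input X\r\n'
```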

Also a[1]+' '+a[2]+' '+a[3]+' '+a[4]+' '+a[5]+' '+a[6] is best written ' '.join(a[1:7])
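Both forms build the same string. The slice a[1:7] covers indexes 1 through 6, because the end index is excluded. The quick check below uses the list that re.split(r"\W+", ...) produces for the CLK line. The empty last element explains the trailing space. For the output3 lines, the same idea would be ' '.join(a[1:9]).

```python
a = ['', '17', 'BC_1', 'CLK', 'input', 'X', '']

concatenated = a[1]+' '+a[2]+' '+a[3]+' '+a[4]+' '+a[5]+' '+a[6]
joined = ' '.join(a[1:7])      # a[1]..a[6], separated by single spaces

print(joined == concatenated)  # True
print(repr(joined))            # '17 BC_1 CLK input X '
```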

thines01 401 Postaholic Team Colleague Featured Poster · Answer 1 · 2012-01-04T23:37:56+00:00

What are the rules on the data you want kept?
From what I see it's:
1) Delete the header
2) Ignore ALL punctuation (except asterisk * )
3) Ignore all text after the last parenthesis
4) If it contains "output" also ignore the last field

Is that correct?

tcl76 0 Light Poster · Answer 2 · 2012-01-05T06:59:18+00:00

hi, the rules are:
1) either remove the header or keep it in the format below
2) Ignore ALL punctuation (except the asterisk *)
3) Ignore all text after the last parenthesis
The end result should be:
num cell port function safe ccell
17 BC_1 CLK input X
16 BC_1 OC_NEG input X
16 BC_1 * control 1
15 BC_1 D1 input X
14 BC_1 D2 input X
13 BC_1 D3 input X
12 BC_1 D4 input X
11 BC_1 D5 input X
10 BC_1 D6 input X
9 BC_1 D7 input X
8 BC_1 D8 input X
7 BC_1 Q1 output3 X 16 1
6 BC_1 Q2 output3 X 16 1
5 BC_1 Q3 output3 X 16 1
4 BC_1 Q4 output3 X 16 1
3 BC_1 Q5 output3 X 16 1
2 BC_1 Q6 output3 X 16 1
1 BC_1 Q7 output3 X 16 1
0 BC_1 Q8 output3 X 16 1
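A minimal Python 3 sketch of these rules follows. It assumes each data line has the form `"num (field, field, ...)"` as in input.txt and treats anything after "--" as a comment. The names CELL_RE, parse_line and convert_lines are hypothetical. The sketch drops the header (rule 1). To match the end result above, it keeps at most six fields inside the parentheses, which removes the trailing Z on the output3 lines. That cut-off is an assumption inferred from the example, not one of the stated rules.

```python
import re

# Hypothetical pattern: cell number after the opening quote, then everything
# up to the LAST ')' on the line (greedy .*), so D(1) stays inside
CELL_RE = re.compile(r'"\s*(\d+)\s*\((.*)\)')

def parse_line(line):
    """Return the cleaned fields of one boundary-register line, or None."""
    code = line.split("--")[0]       # drop "-- comment" text (and the header)
    m = CELL_RE.search(code)
    if m is None:
        return None                  # header or blank line: skip it
    num, inner = m.groups()
    # Remove punctuation inside each field but keep '*': "D(1)" -> "D1"
    fields = [re.sub(r"[^\w*]", "", f) for f in inner.split(",")]
    # Assumption: keep at most six fields, dropping the trailing "Z"
    return [num] + fields[:6]

def convert_lines(lines):
    """Yield one space-separated output line per parsed input line."""
    for line in lines:
        fields = parse_line(line)
        if fields is not None:
            yield " ".join(fields)

sample = [
    '-- num cell port function safe [ccell disval rslt]',
    '   "16 (BC_1, *, control, 1), " &-- control',
    '   "15 (BC_1, D(1), input, X)," &',
    '   " 7 (BC_1, Q(1), output3, X, 16, 1, Z)," &',
]
for out_line in convert_lines(sample):
    print(out_line)
```

To process the real files, open input.txt for reading and output.txt with mode "w". Then write each string from `convert_lines(fin)` followed by "\n".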

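I tried the code below, but it gives an IndexError because the lines have different numbers of fields (the last lines are longer than the first ones). Not sure how to proceed; please help. Thanks

import re
lines=open("input.txt",'r').readlines()

for line in lines:
    # a holds every run of word characters, so its length varies per line
    a=re.findall(r'\w+',line)
    print re.findall(r'\w+',line)
    # raises IndexError on any line with fewer than 7 tokens
    print a[0],a[1],a[2],a[3],a[4],a[5],a[6]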
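The IndexError comes from the fixed indexes. re.findall returns only as many tokens as a line contains, and a line such as "17 (BC_1, CLK, input, X)," yields just five, so a[5] is out of range. The minimal Python 3 sketch below avoids indexing altogether by joining whatever tokens were found. The helper name tokens_line is hypothetical.

```python
import re

def tokens_line(line):
    """Join all word tokens of `line` with single spaces (hypothetical helper)."""
    a = re.findall(r"\w+", line)   # list length varies from line to line
    return " ".join(a)             # no fixed indexes, so no IndexError

line = '   "17 (BC_1, CLK, input, X)," &'
print(len(re.findall(r"\w+", line)))   # 5 tokens, so a[5] would fail
print(tokens_line(line))               # 17 BC_1 CLK input X
```

This only fixes the crash. Because \w does not match '*', the control line loses its asterisk. D(1) still becomes "D 1", and the header, trailing comments and Z are all kept.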

thines01 401 Postaholic Team Colleague Featured Poster · Answer 3 · 2012-01-05T11:24:07+00:00

Well, this is not very elegant, but it's functional:

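import re

fileIn = open("input.txt", "rb")
fileOut = open("output.txt", "wb")

for strData in fileIn:
    # Keep only the text before the first '-', dropping "-- comment" parts
    strData = strData.split('-')[0]
    # Input-cell lines: write tokens a[1]..a[6]
    if("input" in strData):
        # Split on every run of non-word characters (spaces, quotes, parens, commas)
        a=re.split("\W+", strData)
        fileOut.write(a[1]+' '+a[2]+' '+a[3]+' '+a[4]+' '+a[5]+' '+a[6]+'\n')

    # Output-cell lines carry two more tokens (ccell and disval): a[1]..a[8]
    if("output" in strData):
        a=re.split("\W+", strData)
        fileOut.write(a[1]+' '+a[2]+' '+a[3]+' '+a[4]+' '+a[5]+' '+a[6]+' '+a[7]+' '+a[8]+'\n')

fileOut.close()
fileIn.close()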
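The sketch below ports the same two-branch approach to Python 3. It reads and writes in text mode and uses ' '.join plus strip() to avoid the trailing space. The names convert_line and convert_file are hypothetical. It mirrors the posted logic, so it has the same behavior. The control line "16 (BC_1, *, control, 1)" matches neither branch and is skipped, and D(1) comes out as "D 1".

```python
import re

def convert_line(line):
    """Python 3 sketch of the two-branch approach; returns None to skip a line."""
    data = line.split("-")[0]             # drop everything from the first '-'
    a = re.split(r"\W+", data)            # a[0] is '' because the line starts with spaces/quote
    if "input" in data:
        return " ".join(a[1:7]).strip()   # tokens a[1]..a[6]
    if "output" in data:
        return " ".join(a[1:9]).strip()   # tokens a[1]..a[8]
    return None                           # header, control line, blank line

def convert_file(in_path, out_path):
    # Text mode: '\n' is written as the platform's line ending
    with open(in_path) as fin, open(out_path, "w") as fout:
        for line in fin:
            result = convert_line(line)
            if result is not None:
                fout.write(result + "\n")
```

Usage: `convert_file("input.txt", "output.txt")`.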

tcl76 0 Light Poster · Answer 4 · 2012-01-05T11:37:32+00:00

Great, the output is right. It's strange, though: when I open output.txt in Notepad, it shows:
17 BC_1 CLK input X 16 BC_1 OC_NEG input X 15 BC_1 D 1

but when I use another editor such as WordPad, it shows (which is what I want):
17 BC_1 CLK input X
16 BC_1 OC_NEG input X
15 BC_1 D 1

I also noticed that the code already writes \n, so each record should start on a new line; not sure why Notepad shows it differently.

Anyway, this certainly works. Would you mind explaining the logic of your code? I'm trying to understand why there are two ifs, and the meaning of a=re.split("\W+", ...).

thanks a lot
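To illustrate re.split: the pattern "\W+" matches any run of characters that are not letters, digits or underscores. re.split therefore cuts the line at every run of spaces, quotes, parentheses and commas. The leading spaces and quote produce an empty first element, which is why the code starts at a[1]. Input lines yield fewer pieces than output3 lines, hence one if per line type with a different number of indexes. D(1) is also split into 'D' and '1', which is why the output shows "D 1".

```python
import re

line = '   " 7 (BC_1, Q(1), output3, X, 16, 1, Z)," &'
parts = re.split(r"\W+", line)
print(parts)
# ['', '7', 'BC_1', 'Q', '1', 'output3', 'X', '16', '1', 'Z', '']

line = '   "15 (BC_1, D(1), input, X)," &'
print(re.split(r"\W+", line))
# ['', '15', 'BC_1', 'D', '1', 'input', 'X', '']
```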

tcl76 0 Light Poster · Answer 5 · 2012-01-05T11:53:01+00:00

You are right, it works after I use the 'w' option instead of 'wb'. Any comment on why Notepad displays it differently? Thanks

tcl76 0 Light Poster · Answer 6 · 2012-01-05T13:01:10+00:00

tq
