Extracting certain parts of a textfile - thanks in advance

Question

whypython 0 Newbie Poster

12 Years Ago

Suppose I have this textfile:

Generation #1:
trash_jsdbjsabnf
trash_nsdjklfndsnf
trash_jlsndfknsf
...
trash_akjsdlkjasdasd
Game_List:
game = 111
game = 222
game = 333

Generation #2:
trash_jsdbjsabnf
trash_nsdjklfndsnf
trash_jlsndfknsf
...
trash_slajdlaskjdlas
Game_List:
game = 119
game = 262
game = 323

...
...

Generation #500:
trash_jsdbjsabnf
trash_nsdjklfndsnf
trash_jlsndfknsf
...
trash_jkansdklnalsda
Game_List:
game = 323
game = 213
game = 211

I was wondering if there is a simple way to extract only the Generation # lines and the corresponding list of games under Game_List, that is, ignoring all the trash lines, of which there are literally about 50,000.

Thank you so much in advance.

python

TrustyTony 888 ex-Moderator

12 Years Ago

Check the start of each line with the startswith method. It can take a tuple of line starts, not only a single string.
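
As a quick illustration of that hint, here is a minimal sketch. The sample lines are copied from the question, and the names `wanted`, `sample_lines` and `kept` are made up for this example. `str.startswith` returns True if the line starts with any entry of the tuple:

```python
# Prefixes we want to keep; everything else counts as trash.
wanted = ("Generation #", "game =")

# A few lines copied from the question, used instead of a real file.
sample_lines = [
    "Generation #1:",
    "trash_jsdbjsabnf",
    "Game_List:",
    "game = 111",
]

# Keep only the lines that start with one of the wanted prefixes.
kept = [line for line in sample_lines if line.startswith(wanted)]
print(kept)
```

Note that the argument has to be a tuple; passing a list of prefixes raises a TypeError.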

dilbert_here00 0 Light Poster

12 Years Ago

try something like this:

data = open("file.txt", 'r')
while True:
    line = data.readline()
    if line.startswith("Generation"):
        while True:
            list = []
            line2 = data.readline()
            if line2.startswith("game"):
                dict[line1] = list.append(line2)
            elif line.startswith("generation"):
                break
            else: break

TrustyTony 888 ex-Moderator Team Colleague Featured Poster · Answer 1 · 2012-07-09T21:15:41+00:00

So my version, spelled out explicitly, would be:

with open("file.txt") as infile:
    interesting = [line for line in infile if line.lower().startswith(('game =', 'generation #'))]
print(interesting)

whypython 0 Newbie Poster · Answer 2 · 2012-07-10T03:39:51+00:00

Thank you for all of the prompt responses. I have tried both methods (and several more I found) but none worked. Since I'm a one-week-old Pythoner, I'm not sure what the mistakes are. Dilbert's code didn't run although it makes near perfect logical sense to me; and when I removed the last "else:break" part, it reported "TypeError: 'type' object does not support item assignment." And Tony, this might be some stupid mistake that I couldn't see, but the command "print" on the 3rd line reported a syntax error... Any ideas?

whypython 0 Newbie Poster · Answer 3 · 2012-07-10T05:53:49+00:00

ok, thanks Tony, i got it finally! I took your pseudocode too literally. This is cool stuff. Thanks all for the support!

TrustyTony 888 ex-Moderator Team Colleague Featured Poster · Answer 4 · 2012-07-10T06:44:06+00:00

Mine was working code, I would guess (without trying it out), but this line

dict[line1] = list.append(line2)

was a small glitch from dilbert: dict is a type, and he must have meant an instance of it, which has to be initialized before the loop, like

d = dict()
#.... inside loop
d.setdefault(line1, []).append(line2)
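
Putting that fix into the earlier loop might look roughly like the sketch below. It is one possible rewrite, not dilbert's original: the function name `group_games` and the in-memory sample are made up for illustration, and it assumes every `game =` line belongs to the most recent `Generation #` line.

```python
import io

def group_games(infile):
    """Collect the 'game = ...' lines under each 'Generation #...' line."""
    d = dict()        # an instance of dict, created before the loop
    current = None    # the most recent Generation line seen so far
    for line in infile:
        line = line.strip()
        if line.startswith("Generation #"):
            current = line
        elif line.startswith("game =") and current is not None:
            # setdefault creates the list on first use, then we append to it
            d.setdefault(current, []).append(line)
    return d

# Hypothetical stand-in for open("file.txt"), so the sketch runs on its own.
sample = io.StringIO(
    "Generation #1:\ntrash_jsdbjsabnf\nGame_List:\ngame = 111\ngame = 222\n"
    "\nGeneration #2:\ntrash_nsdjklfndsnf\nGame_List:\ngame = 119\n"
)
grouped = group_games(sample)
for generation, games in grouped.items():
    print(generation, games)
```

A second thing to watch for in the quoted line: `list.append` returns None, so assigning its result stores None rather than the list.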

whypython 0 Newbie Poster · Answer 5 · 2012-07-10T14:41:16+00:00

Ok, that's why. Thanks.

The textfile I obtained from this code is:

1 game = 111
1 game = 222
1 game = 333
2 game = 121
2 game = 231
2 game = 432
…
500 game = 321
500 game = 311
500 game = 121
Where the first number is the Generation #.

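To count the games in each generation, I use this code:

text = open("file.txt")  # the filename must be a quoted string
gen = 1
count = 0
for line in text:
    fields = line.split()
    if int(fields[0]) > gen:
        # a new generation has started: report the one just finished
        print(gen, count)
        gen = int(fields[0])
        count = 1
    else:
        # counts every game line; duplicates are not removed
        count += 1
print(gen, count)  # report the last generation too
text.close()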

Which works fine. But I couldn't find a way to incorporate this code into the original code. It would save me a ton of time if I knew how to do it, as I wouldn't have to wait for the computer to print out 500 generations' worth of 'games'. Do you have any suggestions?

TrustyTony 888 ex-Moderator Team Colleague Featured Poster · Answer 6 · 2012-07-10T16:33:52+00:00

If you cannot assume that the number after the last # is the generation count, you could count the generation lines and print the total in my code:

print('%i generations' % sum('#' in line for line in interesting))
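
For the per-generation counts asked about above, one way to skip the intermediate text file is to count while reading the raw file. The following is only a sketch under stated assumptions: the function name `unique_games` and the sample text are hypothetical, and it assumes each header looks exactly like `Generation #N:` with an integer N.

```python
import io

def unique_games(infile):
    """Map each generation number to the set of game values listed under it."""
    games = {}
    gen = None  # generation currently being read
    for line in infile:
        line = line.strip()
        if line.startswith("Generation #"):
            # "Generation #12:" -> 12 (assumes this exact header format)
            gen = int(line[len("Generation #"):].rstrip(":"))
            games[gen] = set()
        elif line.startswith("game =") and gen is not None:
            # "game = 111" -> "111"; a set drops repeated values
            games[gen].add(line.split("=", 1)[1].strip())
    return games

# Hypothetical sample in the question's format, with one repeated game.
sample = io.StringIO(
    "Generation #1:\ntrash_jsdbjsabnf\nGame_List:\n"
    "game = 111\ngame = 222\ngame = 111\n"
    "\nGeneration #2:\ntrash_jlsndfknsf\nGame_List:\ngame = 119\n"
)
unique = unique_games(sample)
for gen, values in unique.items():
    print(gen, len(values))
```

With a real file, `unique_games(open("file.txt"))` (ideally inside a `with` block) would replace the sample. If duplicates should be counted as well, a list with `append` could be used in place of the set.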