how to take blocks of lines?

Question

eikonal 0 Newbie Poster

13 Years Ago

Hi guys! Pretty easy question below

In the code further down, 'data' is a column of numbers, let's say these 12:

56.71739
56.67950
56.65762
56.63320
56.61648
56.60323
56.63215
56.74365
56.98378
57.34681
57.78903
58.27959

def block_generator():
        with open ('test', 'r') as lines:
            data = lines.readlines ()[5::]
            for line in data: # this actually gives me the
                if not line.startswith ("          "):  # column of 12 numbers
                    block = # how to get blocks of 4 lines???
                    yield block

How can I create blocks of four numbers? For example:

56.71739
56.67950
56.65762
56.63320

56.61648
56.60323
56.63215
56.74365

and so on... because I need to process all the blocks.

Thanks for reading

python


All 16 Replies

woooee 814 Nearly a Posting Maven

13 Years Ago

There is more than one way, but you could use a for() loop with step=4. We don't have the file so you will have to test and correct this yourself.

def block_generator():
        with open ('test', 'r') as lines:
            data = lines.readlines ()[5::]

            for ctr in range(0, len(data), 4):
                block = []
                for x in range(4):  # assumes len(data) is a multiple of 4
                    block.append(data[ctr+x])
                print block


Gribouillis commented: I like the " with ... as lines" +13

Gribouillis 1,391 Programming Explorer

13 Years Ago

Here is a pure "itertool style" solution

from itertools import islice, repeat, takewhile

def blocks(fileobj, size=4):
    blank = " " * 10
    lines = (line for line in fileobj if not line.startswith(blank))
    return takewhile(len, (list(islice(lines, 0, size)) for x in repeat(None)))

with open("blocks.txt") as ifh:
    for b in blocks(ifh):
        print b

""" my output -->
['56.71739\n', '56.67950\n', '56.65762\n', '56.63320\n']
['56.61648\n', '56.60323\n', '56.63215\n', '56.74365\n']
['56.98378\n', '57.34681\n', '57.78903\n', '58.27959\n']
"""

Notice that the file is not loaded in memory.
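The same lazy idea can be sketched in Python 3 as a plain generator instead of the takewhile/repeat combination. This is only an illustration: the name iter_blocks and the StringIO sample data are made up, while the ten-space filter is the one used in the thread.

```python
from io import StringIO
from itertools import islice

def iter_blocks(fileobj, size=4, blank=" " * 10):
    """Yield lists of up to `size` lines, reading the file lazily."""
    # Skip lines that start with ten spaces, as in the thread's filter.
    lines = (line for line in fileobj if not line.startswith(blank))
    while True:
        block = list(islice(lines, size))
        if not block:      # the underlying iterator is exhausted
            return
        yield block

# Stand-in for the real file (hypothetical data taken from the question).
sample = StringIO("56.71739\n56.67950\n56.65762\n56.63320\n56.61648\n56.60323\n")
for block in iter_blocks(sample):
    print(block)
```

Only one block is held in memory at a time. A final block shorter than size is yielded as is rather than padded.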


Gribouillis 1,391 Programming Explorer

13 Years Ago

As I said, your algorithm is wrong. There is no reason to call blocks_generator() for each output file. It only needs to be called once. Try this first

def tgrid_files ():
    """Creates the t.grid files"""
    times = open ('test', 'r') # what is this ?
    header = header_tgrid_files()
    file_names = open ('receiver_names', 'r')
    blocks = blocks_generator('test')
    for (element, block) in zip(file_names, blocks):
        names = element.split()[0] # the plural is misleading. I guess it's only one name
        print names, block

It should print each output file name with the block that goes in this file. If this works, you only need to open the file and write the header and the block (use output_files.writelines(block))
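Once the pairing prints correctly, the writing step might look roughly like the Python 3 sketch below. The function name write_tgrid_files, the sample names and the temporary directory are assumptions for illustration only, and the header is written as given rather than with the "\t1" suffix used elsewhere in the thread.

```python
import os
import tempfile

def write_tgrid_files(names, blocks, header, outdir):
    """Write one .tgrid file per (name, block) pair and return the paths."""
    paths = []
    # zip() pairs the first name with the first block, and so on.
    for name, block in zip(names, blocks):
        path = os.path.join(outdir, "%s.tgrid" % name)
        with open(path, "w") as out:
            out.write(header)        # same header in every file
            out.writelines(block)    # one number per line
        paths.append(path)
    return paths

# Hypothetical inputs, only to exercise the function.
names = ["rec1", "rec2"]
blocks = [["56.71739\n", "56.67950\n"], ["56.61648\n", "56.60323\n"]]
paths = write_tgrid_files(names, blocks, "header line\n", tempfile.mkdtemp())
with open(paths[1]) as check:
    print(check.read())
```

Because blocks is computed once and consumed in step with the names, each file receives a different block.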


Gribouillis 1,391 Programming Explorer

13 Years Ago

Oh man I can't believe it!!!! It's working!!!! I sweat blood over it!
One last thing if you can: why I'm getting this output in the file:

rather than, you know just a column without list and string symbols like:
54.22953
54.17732
54.13724
and so on...

Instead of write("%s" % block), use writelines(block); it should write the block as a column.

>>> import sys
>>> sys.stdout.writelines([' 54.22953\n', ' 54.17732\n', ' 54.13724\n', ' 54.10664\n', ' 54.08554\n'])
 54.22953
 54.17732
 54.13724
 54.10664
 54.08554
>>>
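The difference between the two calls can also be checked without touching a file. A small Python 3 sketch, using in-memory StringIO buffers as stand-ins for the output file (the variable names are made up):

```python
from io import StringIO

block = [" 54.22953\n", " 54.17732\n", " 54.13724\n"]

as_repr = StringIO()
as_repr.write("%s" % block)   # formats the list itself: brackets, quotes, escaped \n

as_lines = StringIO()
as_lines.writelines(block)    # writes each string in turn, no separators added

print(as_repr.getvalue())
print(as_lines.getvalue())
```

The first buffer holds the list's printed form on a single line, and the second holds the plain column.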



eikonal 0 Newbie Poster · Answer 1 · 2011-11-04T07:54:23+00:00

Great, it does work! thanks
However, by using print I can see all the blocks, but using return shows me only one block...why?
I thought it was a question of indentation but it's not.
Any idea???

eikonal 0 Newbie Poster · Answer 2 · 2011-11-04T10:54:35+00:00

It works but again the same problem with 'return', exactly as I said above. It returns only one block instead of three as 'print' correctly does.... Why?????

Gribouillis 1,391 Programming Explorer Team Colleague · Answer 3 · 2011-11-04T10:57:24+00:00

It works but again the same problem with 'return', exactly as I said above. It returns only one block instead of three as 'print' correctly does.... Why?????

I don't see any problem. blocks() returns an iterable which yields the blocks one after the other. Please post your problematic code

eikonal 0 Newbie Poster · Answer 4 · 2011-11-04T11:04:12+00:00

I don't see any problem. blocks() returns an iterable which yields the blocks one after the other. Please post your problematic code

def blocks(fileobj, size=20):
    infile = fileobj.readlines()[5::]
    blank = " " * 10
    lines = (line for line in infile if not line.startswith(blank))
    return takewhile(len, (list(islice(lines, 0, size)) for x in repeat(None)))
 
def prova ():
    with open('test', 'r') as ifh:
        for b in blocks(ifh):
            return b

Consider that I'm using blocks made by 20 numbers instead of 4.
The result I get is:

which is just one block. It is like in the above example you posted you would get only

why is this happening?
I need the three different blocks cos I'm going to write every block in a different file.
Hope you can help with that!

Gribouillis 1,391 Programming Explorer Team Colleague · Answer 5 · 2011-11-04T11:08:01+00:00

Then use

def prova():
    return blocks(open("test"))

If you write 'return' in a for loop, the loop exits at first iteration.

Edit: also, why did you write infile = fileobj.readlines()[5::]? You're breaking the code by using readlines(), which loads the whole file. Use infile = islice(fileobj, 5, None) instead.
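Both points can be illustrated with a short Python 3 sketch. The function names first_only, all_items and skip_header are invented for the example, and StringIO stands in for the real file.

```python
from io import StringIO
from itertools import islice

def first_only(items):
    for item in items:
        return item          # leaves the function on the first pass

def all_items(items):
    for item in items:
        yield item           # hands back one item, then resumes the loop

def skip_header(fileobj, n=5):
    # Lazily drop the first n lines instead of calling readlines()[n:].
    return islice(fileobj, n, None)

print(first_only(["a", "b", "c"]))
print(list(all_items(["a", "b", "c"])))
print(list(skip_header(StringIO("h1\nh2\nd1\nd2\n"), n=2)))
```

With return, the caller only ever sees the first block. With yield, or by returning the iterable itself, the caller can loop over all of them.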

snippsat 661 Master Poster · Answer 6 · 2011-11-04T14:12:49+00:00

Another way; nice use of itertools, Gribouillis.

def blocks(filename, chunks):
    with open(filename) as f:
        numb_list = [i for i in f]
    return [numb_list[i:i+chunks] for i in range(0, len(numb_list), chunks)]

filename = 'numb.txt'
chunks = 4
print blocks(filename, chunks)

""" Out-->
[['56.71739\n', '56.67950\n', '56.65762\n', '56.63320\n'], ['56.61648\n', '56.60323\n', '56.63215\n', '56.74365\n'], ['56.98378\n', '57.34681\n', '57.78903\n', '58.27959\n']]"""

Gribouillis 1,391 Programming Explorer Team Colleague · Answer 7 · 2011-11-04T14:33:37+00:00

Actually, at the end of the itertools module documentation, there is a list of recipes containing the following grouper() function which could be used

from itertools import izip_longest

def grouper(n, iterable, fillvalue=None):
    "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
    args = [iter(iterable)] * n
    return izip_longest(fillvalue=fillvalue, *args)

for b in grouper(4, open("blocks.txt")):
    print b

""" my output -->
('56.71739\n', '56.67950\n', '56.65762\n', '56.63320\n')
('56.61648\n', '56.60323\n', '56.63215\n', '56.74365\n')
('56.98378\n', '57.34681\n', '57.78903\n', '58.27959\n')
"""

In python 3, izip_longest() becomes zip_longest(). I recommend saving the whole list of recipes in a module itertoolsrecipes.py. New python 3 recipes can also be added to the list.
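For reference, a Python 3 version of the same recipe might look like this, keeping the argument order used above. Newer versions of the documentation list the recipe's arguments in a different order, so check before copying it from there.

```python
from itertools import zip_longest

def grouper(n, iterable, fillvalue=None):
    "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
    args = [iter(iterable)] * n      # n references to the same iterator
    return zip_longest(*args, fillvalue=fillvalue)

for group in grouper(3, "ABCDEFG", "x"):
    print("".join(group))
```

Unlike the islice-based versions, a short final group is padded with fillvalue, which matters if the number of lines is not a multiple of the block size.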

eikonal 0 Newbie Poster · Answer 8 · 2011-11-05T06:11:37+00:00

Ok guys, all your code works very well, thanks very much for that. However, as I said before, I can't write the blocks to different files. My piece of code is:

with open ("%s" % names, 'w') as output_files: # names are defined elsewhere, it's working no issue here
            blocks = blocks_generator('test') # Gives me three blocks as you guys suggested (nested list)
            for block in blocks:
                output_files.write ("%s" % block)

What I get is three files with the first block printed as a list in them:
file1:

file2:

file3:

BUT what I'd like is:
file1:
56.71739
56.67950
56.65762
56.63320

file2:
56.61648
56.60323
56.63215
56.74365

and so on......
just to be precise and I think it's important, 'blocks generator' returns:

[[' 56.71739\n', ' 56.67950\n', ' 56.65762\n', ' 56.63320\n'], [' 56.61648\n', ' 56.60323\n', ' 56.63215\n', ' 56.74365\n'], ['56.98378\n', '57.34681\n', '57.78903\n', '58.27959\n']]

Hope you can help!!!

Thanks everyone for helping me out!

Gribouillis 1,391 Programming Explorer Team Colleague · Answer 9 · 2011-11-05T11:42:48+00:00

with open ("%s" % names, 'w') as output_files: # this can't work
# a call to open() can't open more than one file.

Your problem is that you don't think about your code's logic. WRITE PSEUDO CODE to describe the sequence of operations that your program should do. You want to write one block per file ? I see 2 algorithms

for each block:
    open a file # how do we pick the file's name ?
    write the block in the file

and

open 3 files
for each pair (file, block):
    write the block in the file

both can be implemented. I'd choose the first.
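The first algorithm could be sketched in Python 3 as follows. How the file name is picked is left open above, so the numbered pattern here is purely an assumption, as are the function name write_blocks and the sample data.

```python
import os
import tempfile

def write_blocks(blocks, outdir, pattern="block_%d.txt"):
    """For each block: open a file, write the block, close the file."""
    paths = []
    for number, block in enumerate(blocks, 1):
        path = os.path.join(outdir, pattern % number)   # made-up naming rule
        with open(path, "w") as out:
            out.writelines(block)
        paths.append(path)
    return paths

# Hypothetical blocks, only to exercise the function.
blocks = [["56.71739\n", "56.67950\n"], ["56.61648\n", "56.60323\n"]]
paths = write_blocks(blocks, tempfile.mkdtemp())
print([os.path.basename(p) for p in paths])
```

The loop runs over the blocks, and each file is opened inside it, so only one file is open at any moment.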

eikonal 0 Newbie Poster · Answer 10 · 2011-11-05T12:04:37+00:00

I didn't want to do this but I have to post all the code otherwise, obviously, we can't understand each other.

from itertools import islice

def header_tgrid_files():
    """Creates the header that goes in .tgrid files needed by loc3d"""
    f = open ('test', 'r') 
    while True:
        line1 = f.readline()
        line2 = f.readline()
        line3 = f.readline()
        header = line1 + line2 + line3
        return header  # This gives me the header that will be written in all files

def blocks_generator(filename): # This gives me the blocks
    with open(filename) as f:
        infile = islice(f, 5, None)
        blank = " " * 10
        lines = (line for line in infile if not line.startswith(blank))
        numb_list = [i for i in lines]
        chunks = 20
        return [numb_list[i:i+chunks] for i in range(0, len(numb_list), chunks)]

def tgrid_files ():
    """Creates the t.grid files"""
    times = open ('test', 'r')
    header = header_tgrid_files()
    file_names = open ('receiver_names', 'r')
    for element in file_names:
        names = element.split()[0] 
        with open ("%s.tgrid" % names, 'w') as output_files: # Opens the files, it obviously works here cos of the for loop
            output_files.write ("%s\t1\n" % header)
            blocks = blocks_generator('test')
            for block in blocks: # HERE is the problem. I've got the header in every file but the same
                for item in block:# block in every file, printed as a list. I want every different block inside
                    output_files.write ("%s" % item) # the different files

header_tgrid_files and blocks_generator are alright, and actually tgrid_files as well, except for the last lines, as I have commented. My blocks are made of 20 numbers in this case.
Tried what you said but it's not working....

eikonal 0 Newbie Poster · Answer 11 · 2011-11-05T13:00:20+00:00

Oh man I can't believe it!!!! It's working!!!! I sweat blood over it!
One last thing if you can: why I'm getting this output in the file:

rather than, you know just a column without list and string symbols like:
54.22953
54.17732
54.13724

and so on...

eikonal 0 Newbie Poster · Answer 12 · 2011-11-05T13:15:55+00:00

I just put these two lines:

output_files.write ("%s\t1\n" % header)
output_files.writelines (block)

and it's working.
Thanks very, very much for your help and for being patient with me. I have learnt a lot from your hints.
Cheers!
