Can we treat the contents of a file as a 2D array?

Question

ssdeep 9 Light Poster

13 Years Ago

I want to be able to treat the contents of a file as an array and traverse it that way without having to store them in an array. Can I do that? If so, how?

perl

d5e5 109 Master Poster

13 Years Ago

Tie::File treats the contents of a file as an array, but not as a 2D array. You might try using Tie::File with the recsep option set to your field delimiter, so that each field becomes one element, and then calculate the position. For example, if you want the first (really the zeroth) element in the third (really the second) row and there are 5 columns in every row, you can refer to element [10] of your one-dimensional array (row * columns + column = 2 * 5 + 0 = 10). It will take some thinking to work out how to calculate your row and column indices, but once you have it figured out the program should run fairly fast, because Tie::File doesn't have to load the entire file into memory.
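A minimal sketch of that index arithmetic, assuming (hypothetically) a file in which every cell, including the last one in each row, is terminated by a comma and every row has 5 columns. With recsep => ',' each cell becomes one element of the tied array, and (row, col) maps to row * ncols + col. The sample data and the helper name cell_index are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Tie::File;
use File::Temp qw(tempfile);

# Hypothetical data: 3 rows x 5 columns, every cell terminated by ','
my $ncols = 5;
my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} map { "$_," } 0 .. 14;
close $fh or die "cannot close $path: $!";

# recsep => ',' turns every comma-terminated cell into one record
tie my @cells, 'Tie::File', $path, recsep => ','
    or die "cannot tie $path: $!";

# Hypothetical helper: map a zero-based (row, col) pair to a flat index
sub cell_index {
    my ($row, $col) = @_;
    return $row * $ncols + $col;
}

# Third row (index 2), first column (index 0) -> record 10
print "$cells[cell_index(2, 0)]\n";    # prints 10
untie @cells;
```

autochomp is on by default, so each fetched cell comes back without its trailing comma.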

Dandello 8 Posting Whiz in Training

13 Years Ago

Within the file you need some sort of end-of-line marker to mark the end of a record (usually a '\n', but not always) and a delimiter between fields.
As an example, this:

open my $DAT1, '<', 'archive/title.txt' or die 'cannot open file';
my @data1  = <$DAT1>;
close $DAT1 or die 'cannot close file';

opens the file and reads all of its lines into @data1. Since in this case it's a pipe-delimited ('|') file, each line is then split into fields:

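foreach my $line (@data1) {
    chomp $line;    # strip the trailing "\n" (modifies @data1 in place)
    my (
        $alph,  $title, $new1,   $URL,  $author,
        $adult, $blurb, $series, $size, $win
    ) = split /[|]/xsm, $line;
    # ... process the fields of this record here ...
}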

But nearly anything can be used to delimit the fields.
Now I could, if I needed to, access a single line and break it up:

my (
    $alph,  $title, $new1,   $URL,  $author,
    $adult, $blurb, $series, $size, $win
) = split /[|]/xsm, $data1[0];    # array element, not $data1{0}

If I were using Tie::File, opening the file would look like this:

use Tie::File;
tie my @data1, 'Tie::File', 'archive/title.txt', recsep => "\n"
    or die "cannot tie archive/title.txt: $!";

Note the recsep => "\n": that's the record separator (the line marker).
I would still have to break the data on each line into discrete fields using split.
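Putting the two pieces together, a minimal sketch that ties a pipe-delimited file and splits one record into fields. The temporary file and its records are made up to stand in for archive/title.txt, and only three fields are used to keep it short.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Tie::File;
use File::Temp qw(tempfile);

# Made-up pipe-delimited records standing in for archive/title.txt
my ($fh, $path) = tempfile(UNLINK => 1);
binmode $fh;
print {$fh} "a|First Title|Smith\n", "b|Second Title|Jones\n";
close $fh or die "cannot close $path: $!";

# Each line is one record; autochomp (on by default) strips the "\n"
tie my @data1, 'Tie::File', $path, recsep => "\n"
    or die "cannot tie $path: $!";

# Tie::File fetches the record on demand; split breaks it into fields
my ($alph, $title, $author) = split /[|]/xsm, $data1[1];
print "$title by $author\n";    # prints "Second Title by Jones"

untie @data1;
```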

Naturally, there are other ways to do this, but I happened to have these examples at hand.

Note: unless the file you're tying is so huge that you cannot hold the entire file in memory as an array, you're better off, resource- and time-wise, simply loading the file into memory. Tied files run very, very slowly.

Dandello 8 Posting Whiz in Training

13 Years Ago

Since you're actually working with a string made up of 1s and 0s, simply treat it like a string. That should take care of some of the complications. And you might try writing the changes to a different file - write each character to the new file, counting as the string is altered. Just remember to end the line with a '\n'.

I've been able to process 200 MB files that way.
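A minimal sketch of that approach, assuming (for illustration only) that the alteration is turning every 0 into a 1: the input is read line by line, each character is written to a second file as it is examined, altered characters are counted, and every output line is ended with "\n". The helper name copy_flipping_zeros and the sample data are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: copy $in_path to $out_path one character at a time,
# turning every '0' into '1' and returning how many characters changed.
sub copy_flipping_zeros {
    my ($in_path, $out_path) = @_;
    open my $in,  '<', $in_path  or die "cannot open $in_path: $!";
    open my $out, '>', $out_path or die "cannot open $out_path: $!";
    my $changed = 0;
    while (my $line = <$in>) {
        chomp $line;
        for my $i (0 .. length($line) - 1) {
            my $ch = substr $line, $i, 1;
            if ($ch eq '0') {
                $ch = '1';
                $changed++;
            }
            print {$out} $ch;
        }
        print {$out} "\n";    # end every output line with "\n"
    }
    close $in;
    close $out or die "cannot close $out_path: $!";
    return $changed;
}

# Usage with throwaway files (made-up data)
my ($tmp_in, $in_path) = tempfile(UNLINK => 1);
print {$tmp_in} "0101\n1100\n";
close $tmp_in;
my (undef, $out_path) = tempfile(UNLINK => 1);
my $n = copy_flipping_zeros($in_path, $out_path);
print "$n characters changed\n";    # prints "4 characters changed"
```

Only one line is held in memory at a time, which is what makes this workable for very large files.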

Dandello 8 Posting Whiz in Training

13 Years Ago

I've been able to use Tie::File on a Windows machine with recsep => "\n" without any problems. (I don't have Notepad++, but there should be a setting somewhere to use Unix-style line breaks.)

The problem is probably with line 54 of the posted code ($array[$i1]="@array2\n";). Try removing the '\n' from it. Tie::File already knows that the record separator is "\n" and appends it automatically when you store a record.
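A small sketch of that behaviour with a throwaway file of made-up 0/1 rows: with recsep => "\n", a value stored into a tied record gets the separator appended if it does not already end with one, so the line can be assigned without a trailing "\n".

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Tie::File;
use File::Temp qw(tempfile);

# Throwaway three-line file of 0s and 1s (made-up data)
my ($fh, $path) = tempfile(UNLINK => 1);
binmode $fh;
print {$fh} "0011\n0101\n1100\n";
close $fh or die "cannot close $path: $!";

tie my @rows, 'Tie::File', $path, recsep => "\n"
    or die "cannot tie $path: $!";

# Store the new line WITHOUT "\n"; Tie::File appends the record separator
$rows[1] = '1111';
untie @rows;

# Read the raw file back to check that the line layout survived
open my $in, '<:raw', $path or die "cannot open $path: $!";
my $contents = do { local $/; <$in> };
close $in;
print $contents;    # 0011, 1111, 1100 on separate lines
```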

You can also try trimming the white space from the line before writing it to the file:

sub trim {
    my $string = shift;
    $string =~ s/^\s+//;    # strip leading whitespace
    $string =~ s/\s+$//;    # strip trailing whitespace (inner spaces are kept)
    return $string;
}

But assuming your file is supposed to contain only 0s and 1s, you really should look at string functions. In this case, count the 0s and globally replace them with 1s in a single pass over each line, and then remove the blanks before writing the line back to the file.
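A minimal sketch of the single-pass idea using tr///, which both replaces and returns how many characters it matched. The helper name fill_line and the sample line are hypothetical. Note that this replaces every 0 on the line, as described above; it is not a region-by-region flood fill.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: in one tr/// pass, turn every '0' into '1' and
# return the number replaced; then strip spaces and tabs from the line.
sub fill_line {
    my ($line) = @_;
    my $zeros = ($line =~ tr/0/1/);    # tr returns how many chars it changed
    $line =~ tr/ \t//d;                # /d deletes the listed characters
    return ($line, $zeros);
}

my ($filled, $count) = fill_line('0 1 0 0 1');
print "$filled ($count zeros replaced)\n";    # prints "11111 (3 zeros replaced)"
```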

ssdeep 9 Light Poster · Answer 1 · 2011-11-29T22:15:49+00:00

OK, if a single line were to contain two words, how would the array hold it?
For example, if $array[1] holds the second line, will $array[1] be "hello world\n", or will it be
$array[1]="hello" and $array[2]="world"?
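With Tie::File, each element is one whole record (one line, with recsep => "\n"), so $array[1] holds "hello world" (without the "\n", because autochomp is on by default); splitting it into words is a separate step. A sketch with a made-up two-line file:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Tie::File;
use File::Temp qw(tempfile);

# Made-up two-line file; the second line holds two words
my ($fh, $path) = tempfile(UNLINK => 1);
binmode $fh;
print {$fh} "first line\nhello world\n";
close $fh or die "cannot close $path: $!";

tie my @array, 'Tie::File', $path, recsep => "\n"
    or die "cannot tie $path: $!";

# One record per line: $array[1] is the whole line, "\n" removed by autochomp
my $line  = $array[1];           # "hello world"
my @words = split ' ', $line;    # ("hello", "world")
print scalar(@array), " records; line 2 has ", scalar(@words), " words\n";
untie @array;
```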

ssdeep 9 Light Poster · Answer 2 · 2011-11-30T00:26:45+00:00

I tried Tie::File, and it is roughly what I am looking for, but I have one big problem: I am getting the entire file in $array[0] itself instead of one record per element. I am using Windows.

ssdeep 9 Light Poster · Answer 3 · 2011-11-30T11:45:14+00:00

Thanks for the replies.

My problem:
Implement a sort of flood-fill algorithm on a text file containing 0s and 1s, where 0 = black pixel and 1 = white pixel. Basically, I have to convert the 0s into 1s and display the number of iterations it takes to do so (imagine the number of clicks you would need in MS Paint to completely fill the whole picture with one color).
The files are up to 2-10 MB in size, so I have no option but to use Tie::File so as not to time out.
The algorithm I used does work when I load the entire thing into an array, but only for small inputs.
So far, following your advice, I was able to read the lines and store each line in a one-dimensional array. I can even modify that array, but the problem arises when I try to copy it back into the tied array after modifying it.
It is very difficult, as the array seems to pick up a lot of whitespace, which scrambles my entire file, and I have to regenerate the input files each time I run the program, making the whole thing quite cumbersome.
Can anyone give an example in which you take the file into an @array, split one line into a temporary array (say @array2 = split(//, $array[1])), make a modification to that array (say, change one character), and, most importantly, assign this array back to the original element $array[1] so as to keep everything intact?
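One way to do that round trip, sketched on a plain array of made-up rows (the same statements work on an array tied with Tie::File): split // gives one element per character, and join '' reassembles the row with nothing inserted between the characters.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up rows standing in for a tied array of 0/1 lines
my @array = ('0000', '0110', '1001');

# Split one line into characters, change one, and join it back
my @array2 = split //, $array[1];    # ('0', '1', '1', '0')
$array2[0] = '1';
$array[1] = join '', @array2;        # "1110", no separators added

print "$array[1]\n";                 # prints 1110
```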

ssdeep 9 Light Poster · Answer 4 · 2011-11-30T12:44:05+00:00

Dandello wrote:
"Since you're actually working with a string made up of 1s and 0s, simply treat it like a string. That should take care of some of the complications. And you might try writing the changes to a different file - write each character to the new file, counting as the string is altered. Just remember to end the line with a '\n'. I've been able to process 200 MB files that way."

I may have to traverse the file upward and downward iteratively, so writing to another new file would lead to even more complications. I'll post the code. It reads the filename from an argument, takes the number of rows and columns from the first row, and the matrix starts from the second row. There are a couple of useless variables that I declared in order to debug the program, so just ignore them. The gist is: @array is tied to the input file named in the arguments, and @array2 stores each element of @array as another array (thereby taking care of the '2D' aspect of it). $X and $Y are the number of rows and the number of columns respectively.

#!/usr/bin/perl
#use warnings;
use Tie::File;
$name=$ARGV[0];
open(FH,$ARGV[0]);
$len=<FH>;
@lent=split(//,$len);
@XY=split(' ',$len);

$offset=$#lent;
close(FH);
$X=$XY[0];
$Y=$XY[1];

tie @array,'Tie::File',$name or die,recsep=>"\n",autochomp=>0;
@array=split('\n',$array[0]);
$ref=\@array;
print "$array[0]\n";
@array2={};
$num_clicks=0;
for($i=1;$i<=$X;$i++)
{
	 @array2=split(//,$array[$i]);
	 chomp(@array2);
	print "@array2**\n";
	for($j=0;$j<$Y;$j++)
	{
		
		
		if($array2[$j] eq 0)
		{
			
			$num_clicks++;
			push(@stack,($i,$j));
			while($#stack>1)
			{
				
				$j1=pop(@stack);
				$i1=pop(@stack);
				
				if(($i1==0)||($i1==$X+1)||($j1==-1)||($j1==$Y))
			{
				next;
				}
			@array2=split(//,$array[$i1]);
			print "@array2\n";
			if($array2[$j1] eq 0)
		{
			
			
			$array2[$j1]=1;
			print "@array2 PP\n";
			chomp(@array2);
			$array[$i1]="@array2\n";
			push(@stack,($i1,$j1-1,$i1,$j1+1,$i1-1,$j1-1,$i1-1,$j1,$i1-1,$j1+1,$i1+1,$j1-1,$i1+1,$j1,$i1+1,$j1+1));
			
			
			
			
			}
				
				}
			
			
			
			}
		
	
		
		
		}
	
	}

print "$num_clicks\n";
untie @array;
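For comparison, a hedged sketch of the same flood-fill loop with the fixes this thread arrives at; it is not the poster's final program. Each row is handled as a string with substr, and the modified row is stored back as a plain string with no extra spaces or "\n". It also differs from the posted code in two places: tie's arguments are parenthesised (in the posted `tie @array,'Tie::File',$name or die,recsep=>"\n",autochomp=>0;` the low-precedence `or` means recsep and autochomp never reach tie), and the stack loop runs `while (@stack)`, because the posted `while($#stack>1)` is false when the stack holds a single (row, column) pair. The name count_clicks is hypothetical; the eight-neighbour rule is taken from the posted push().

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Tie::File;

# Count how many "clicks" (flood fills) turn every 0 into 1.
# Assumed layout, as in the posted program: line 0 holds "X Y",
# lines 1..X hold Y characters of 0/1. Neighbours are the 8
# surrounding cells, matching the push() in the posted code.
sub count_clicks {
    my ($rows) = @_;                       # array ref: plain or tied
    my ($X, $Y) = split ' ', $rows->[0];
    my $clicks = 0;
    for my $i (1 .. $X) {
        for my $j (0 .. $Y - 1) {
            next unless substr($rows->[$i], $j, 1) eq '0';
            $clicks++;
            my @stack = ([$i, $j]);
            while (my $cell = pop @stack) {
                my ($r, $c) = @$cell;
                next if $r < 1 || $r > $X || $c < 0 || $c >= $Y;
                my $row = $rows->[$r];     # fetch once, edit as a string
                next unless substr($row, $c, 1) eq '0';
                substr($row, $c, 1) = '1';
                $rows->[$r] = $row;        # store a plain string, no "\n"
                for my $dr (-1 .. 1) {
                    for my $dc (-1 .. 1) {
                        push @stack, [$r + $dr, $c + $dc] if $dr || $dc;
                    }
                }
            }
        }
    }
    return $clicks;
}

# Usage: perl fill.pl input.txt  (the file is rewritten in place)
if (@ARGV) {
    my $name = $ARGV[0];
    # Parenthesise tie's arguments so "or die" cannot cut off the options
    tie(my @array, 'Tie::File', $name, recsep => "\n")
        or die "cannot tie $name: $!";
    print count_clicks(\@array), "\n";
    untie @array;
}
```

A four-neighbour variant would push only the orthogonal neighbours, that is, only when exactly one of $dr and $dc is non-zero.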

Dandello 8 Posting Whiz in Training · Answer 5 · 2011-11-30T13:16:45+00:00

Okay - if all the original file contains is 0s, 1s, and line endings, then there's no need to split the fields: each line is really just a string, and strings take fewer resources than arrays of fields.

So iterate over the characters of the string instead - you can count and change them without splitting, by using substring functions such as substr.

Maybe if you posted a little of the data, it would make sense.
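A minimal sketch of the substr suggestion on made-up rows: substr reads one character by column, four-argument substr replaces it in place, and tr/0// counts the remaining 0s without changing the string. Nothing is split into an array.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up grid: each row is a plain string of 0s and 1s
my @rows = ('0110', '1001');

# Read row 1, column 2 without splitting the row into an array
my $cell = substr $rows[1], 2, 1;          # '0'

# Change it in place with four-argument substr
substr $rows[1], 2, 1, '1';               # row 1 becomes '1011'

# Count the 0s left in the row; tr/0// counts without changing anything
my $zeros_left = ($rows[1] =~ tr/0//);    # 1

print "$cell $rows[1] $zeros_left\n";     # prints "0 1011 1"
```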

Dandello 8 Posting Whiz in Training · Answer 6 · 2011-11-30T21:39:32+00:00

Sorry if I sounded snarky - it was bedtime.
A couple things to remember:
Once you've properly tied the file, you should be able to treat it exactly like a standard array, which means that:

@array=split('\n',$array[0]);

is trying to assign whatever is in the first row of @array back to @array, thereby trashing @array (and, since the array is tied, the file itself).

Also, none of my references mention 'autochomp' as a parameter for Tie::File. Just chomp the lines when you start processing them.
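For what it's worth, the Tie::File documentation does describe an autochomp option: it is on by default, so fetched records come back without the record separator, and autochomp => 0 keeps the separator attached. A small sketch with a made-up file:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Tie::File;
use File::Temp qw(tempfile);

# Made-up two-line file
my ($fh, $path) = tempfile(UNLINK => 1);
binmode $fh;
print {$fh} "0101\n1100\n";
close $fh or die "cannot close $path: $!";

# Default (autochomp on): records come back without "\n"
tie my @chomped, 'Tie::File', $path, recsep => "\n"
    or die "cannot tie $path: $!";
my $first = $chomped[0];
untie @chomped;

# autochomp => 0: records keep their trailing "\n"
tie my @raw, 'Tie::File', $path, recsep => "\n", autochomp => 0
    or die "cannot tie $path: $!";
my $first_raw = $raw[0];
untie @raw;

printf "%d vs %d characters\n", length($first), length($first_raw);    # 4 vs 5
```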

ssdeep 9 Light Poster · Answer 7 · 2011-11-30T21:57:44+00:00

Hey, no problem :-)
I tried autochomp to see whether the file stays the same after modification. The original problem I had, which I suspect is Windows-specific, is that the tied file puts the entire file into the first element of the array ($array[0]) rather than just the first line. I tried delimiting with "\r\n", as suggested for Windows users, but had no luck with it, so I had to take this detour. The files are set to open with Notepad++.

Currently the program as I have posted it does do something with the lines, which wasn't happening before, but it adds more whitespace after each read, messing up the data in the process.

In the for loop, the first execution of each statement happens as I intended, but with each successive iteration it gets worse. I think the problem is in how I am assigning the modified lines back to the file array.

ssdeep 9 Light Poster · Answer 8 · 2011-11-30T22:01:53+00:00

I was referring to the statement in line 54 ($array[$i1]="@array2\n";).

ssdeep 9 Light Poster · Answer 9 · 2011-12-01T21:53:26+00:00

Guess what, I didn't have to do any of that! The problem was, as I suspected, line 54.
When @array2 is interpolated into a string, its elements come out separated by whitespace; all I had to do was assign @array2 to a scalar using join. That's it: a one-line change, and the whole thing works like a charm.
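The exact one-line fix isn't shown; presumably it replaced "@array2\n" with something like join('', @array2). A sketch of why the original version got worse on every pass: interpolating an array separates its elements with $" (a single space by default), and the next split // treats those spaces as characters, so each round trip adds more of them.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $row = '0110';

# Buggy round trip: "@array2" joins elements with $" (a space by default),
# and the next split // treats those spaces as extra characters.
my $buggy = $row;
for (1 .. 2) {
    my @array2 = split //, $buggy;
    $buggy = "@array2";
}

# Fixed round trip: join '' puts the characters back with nothing in between.
my $fixed = $row;
for (1 .. 2) {
    my @array2 = split //, $fixed;
    $fixed = join '', @array2;
}

print length($buggy), " vs ", length($fixed), "\n";    # prints "13 vs 4"
```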

Twist in the tale: I solved this problem first in C++ and got a fantastic execution time of 0.000 secs, but it was taking up too much memory. I thought shifting to Perl would solve my problem, as it has good text-processing capabilities. Perl was taking even more memory and executing in 0.035 secs, because I was loading the whole file into an array. Then I wanted to try modifying the file itself instead of loading the whole file into an array, so the Tie::File option looked perfect. After a lot of struggle, one line of code solved all my problems. But guess what: this is the worst option yet! It takes 14.53 secs and uses way too much memory, about twice that of the C++ code, haha. Anyway, thanks to Dandello for all the help and patience.

Dandello 8 Posting Whiz in Training · Answer 10 · 2011-12-01T22:16:00+00:00

Tie::File is NOT fast. It's a last-resort option for when you have to process freaking huge files and don't want to break them up first. You might still want to look at processing your lines as strings instead of arrays - strings and regexes take a lot less memory to process, and that makes them faster.

But since you already have a solution, it's just an idea to play with.

How about marking this as solved?