Need help with my script.

Question

vesnushka 0 Newbie Poster

14 Years Ago

Hi everyone!!
Please help me with my script. I'm using Perl Express for scripting.
This is my script:

use warnings;
use strict;


open (file,'C:\Documents and Settings\soea\Desktop\Test.docx') || die "Can not open: $!\n";
@file = <file>;

for  (my $i = 0;
	 $i<=scalar(@file)-1;
         $i++;)
         	{
                	if ($f[@file] =~ /^Exec.+\n/)
                        {
                        splice(@file, $f, 1);
                        }
                }

close (file);

This is the output:
Unrecognized \D passed through at ... line 6
Unrecognized \s passed through at ... line 6
Unrecognized \D passed through at ... line 6
Unrecognized \T passed through at ... line 6
Global symbol "@file" requires explicit package name at ... line 7
.....


sinnerFA 9 Junior Poster in Training

14 Years Ago

Change your full filepath to:

open (file,'C:\\Documents and Settings\\soea\\Desktop\\Test.docx') ||

and give it a go.

HTH's
sinnerFA
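
A small sketch of why the quoting matters, using the path from the question (the variable names are only for illustration). Inside single quotes a lone backslash before a letter is kept literally and \\ collapses to one backslash, so both spellings name the same file. The "Unrecognized \D passed through" warnings are what Perl reports when such a path sits in double quotes with single backslashes. Forward slashes generally work in Perl on Windows as well.

```perl
use strict;
use warnings;

# Hypothetical variable names; the path is the one from the question.
my $single  = 'C:\Documents and Settings\soea\Desktop\Test.docx';
my $escaped = 'C:\\Documents and Settings\\soea\\Desktop\\Test.docx';
my $double  = "C:\\Documents and Settings\\soea\\Desktop\\Test.docx";

# All three strings hold the same path with single backslashes.
print "same\n" if $single eq $escaped && $escaped eq $double;
```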

d5e5 109 Master Poster

14 Years Ago

See comments in the following where I modified your statements to avoid the errors and warnings you were getting. I still get a runtime error because the data file I use doesn't contain the same data as your file.

#!/usr/bin/perl -w
use strict;
my ( @file, @f, $f, $i );    #Declare variables before using them

#Use capital letters for filehandle. IN is a better filehandle name than 'file'.
open( IN, 'C:\Documents and Settings\soea\Desktop\Test.docx' )
  || die "Can not open: $!\n";

@file = <IN>;

for (
    $i = 0;                  #separate the three parts with semicolons
    $i <= scalar(@file) - 1; $i++    #no semicolon after $i++
  )
{
    if ( $f[@file] =~ /^Exec.+\n/ ) {
        splice( @file, $f, 1 );
    }
}

close(IN);

If you still encounter problems, can you attach your data file, or show us some test data?

d5e5 109 Master Poster

14 Years Ago

In this line: if ( $f[@file] =~ /^Exec.+\n/ ) { the expression $f[@file] refers to an element of the array @f, and @f has not been given any values. If you want to look at the first line in your input file, you would refer to it as $file[0] . The next record from the file is found in $file[1] and so on.
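
To illustrate the indexing, here is a minimal sketch using a few made-up lines in place of the real file. It also shows grep as one possible alternative to calling splice inside a counting loop, which is easy to get wrong because the indexes shift after each removal. This is only a sketch, not necessarily what your final script should look like.

```perl
use strict;
use warnings;

# Made-up sample lines standing in for the contents of the file.
my @file = ("Status : Open\n", "Exec Date : 12/18/2009\n", "Tester : soea\n");

print $file[0];    # first line: a single element uses the $ sigil
print $file[1];    # second line

# Keep only the lines that do not start with "Exec".
my @kept = grep { !/^Exec/ } @file;
print scalar(@kept), " lines kept\n";
```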

d5e5 109 Master Poster

14 Years Ago

Test.docx is a large binary file. The following script opens it, loads it into the @file array and prints the first element of the array. Since this is not a text file, printing it is pointless except to show that you can open it and it contains binary data that means nothing to me.

#!/usr/bin/perl -w
use strict;
my ( @file, @f, $f, $i );    #Declare variables before using them

#Use capital letters for filehandle. IN is a better filehandle name than 'file'.
open( IN, 'C:\Users\David\Programming\Perl\Test.docx' )#Changed path to my folder location
  || die "Can not open: $!\n";

@file = <IN>;

for ($i = 0; $i <= scalar(@file) - 1; $i++)
    {
        #if ( $f[@file] =~ /^Exec.+\n/ ) { # $f has no value. What are you looking for?
        #    splice( @file, $f, 1 );
        #}
        print $file[$i]; #Prints lots of garbage and beeps.
    }

close(IN);

Does this solve your question?

d5e5 109 Master Poster

14 Years Ago

It is easy to read text files successfully in Perl. Reading any other type of file is more difficult because you have to know how many bytes you want to read each time you read the file and what you want to do with them. If you want to translate some of the bytes in a binary file into characters you have to know where in the file these bytes can be found and how to interpret them as characters.

Did the test.docx file that you attached look like text in your program? What program created it? My Windows platform had no program associated with the file type, or couldn't guess what the file type was. I don't have MS-Word so tried to open it with Open Office Writer, unsuccessfully. Also tried to open it with a text editor, unsuccessfully. Is it a music file? Video? Executable program?
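
A minimal sketch of reading a fixed number of bytes from a binary file, assuming you only want to inspect the start of the file. first_bytes_hex is a made-up helper name; open, binmode, read and unpack are Perl built-ins.

```perl
use strict;
use warnings;

# Return the first $count bytes of a file as a hex string.
# first_bytes_hex is a hypothetical helper, not part of any library.
sub first_bytes_hex {
    my ($path, $count) = @_;
    open(my $fh, '<', $path) or die "Can not open $path: $!\n";
    binmode($fh);                      # no newline translation on Windows
    my $got = read($fh, my $buffer, $count);
    die "Can not read $path: $!\n" unless defined $got;
    close($fh);
    return unpack('H*', $buffer);      # each byte becomes two hex digits
}
```

For example, print first_bytes_hex('Test.docx', 4), "\n"; would show the first four bytes of the file in hex, which can hint at the real file type.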


vesnushka 0 Newbie Poster · Answer 1 · 2010-01-18T17:39:23+00:00

Thanks for your comments!!
I did so, but a new problem appeared:

Can not open: No such file or directory

But I'm using the correct path... What could the problem be?
I thought it was because of the C:\ dir, so I changed the file location to 'D:\\temp\\Test.doc', but nothing changed.
Help me please...

Here is the file example.

vesnushka 0 Newbie Poster · Answer 2 · 2010-01-18T18:05:47+00:00

This is the example of the file.

Test.docx (17.89 KB)

vesnushka 0 Newbie Poster · Answer 3 · 2010-01-25T19:54:18+00:00

What file/data type should it be in Windows for the Perl script to run successfully?

vesnushka 0 Newbie Poster · Answer 4 · 2010-01-26T17:32:14+00:00

No, test.docx is a Microsoft Word 2007 text document.

vesnushka 0 Newbie Poster · Answer 5 · 2010-01-26T18:30:21+00:00

I copied the data into a .txt document (attached). But there is a problem with variable initialization... (attached)

Use of uninitialized value within @f in pattern match (m//) at t.pl line 13, <IN> line 255.
Use of uninitialized value within @f in pattern match (m//) at t.pl line 13, <IN> line 255.
Use of uninitialized value within @f in pattern match (m//) at t.pl line 13, <IN> line 255.

Is there a Perl library for working with Microsoft Word? As I understand it, Word stores the data in its own internal format.

test.txt (3.82 KB)

Chapter 1.Test Lab
1.1.Root


1.1.1.Root\Deployment


1.1.1.1.Root December Deployment  

1.1.1.1.1.Root\2009 December (Public DocType )
Test Sets: 
1.1.1.1.1.1.Test Set : 465456 (Public DocType B)
Status : Open
Open Date : 12/18/2009

Tests : 
1.1.1.1.1.1.1.Plan: Test Name : Public D
Plan: Type : MANUAL
Exec Date : 12/18/2009
Time : 3:50:45 PM



Runs : 
1.1.1.1.1.1.1.1.Run Name : Fast_Rn_12-18_15-50-43
Status : Passed
Tester : soea
Exec Date : 12/18/2009
Exec Time : 3:50:44 PM
Duration : 0
Steps : 
Step Name : Step 1
Description : Open a web-browser

Status : No Run
Exec Date : 12/18/2009
Exec Time : 3:50:44 PM
Expected : logged in succe



Step Name : Step 2
Description : Go to the Clients -> Documents

Status : No Run
Exec Date : 12/18/2009
Exec Time : 3:50:44 PM
Expected : Documents screen should appear to User1.



Step Name : Step 3
Description : Enter the field.
Ner

Status : No Run
Exec Date : 12/18/2009
Exec Time : 3:50:44 PM
Expected : Only 



Step Name : Step 4
Description : Close the uuuu.

Status : No Run
Exec Date : 12/18/2009
Exec Time : 3:50:44 PM
Expected : The web



 




1.1.1.1.1.1.2.Plan: Test Name : P
Plan: Type : MANUAL
Exec Date : 12/18/2009
Time : 3:50:47 PM



Runs : 
1.1.1.1.1.1.2.1.Run Name : Fast_Run_12-18_15-50-46
Status : Passed
Tester : soea
Exec Date : 12/18/2009
Exec Time : 3:50:46 PM
Duration : 0
Steps : 
Step Name : Step 1
Description : Open a web-
Status : No Run
Exec Date : 12/18/2009
Exec Time : 3:50:46 PM
Expected : should be 


Step Name : Step 2
Description : Go to the Clients

Status : No Run
Exec Date : 12/18/2009
Exec Time : 3:50:46 PM
Expected : Documents  appear to 



Step Name : Step 3
Description : Ente

Status : No Run
Exec Date : 12/18/2009
Exec Time : 3:50:46 PM
Expected : "should be displayed.



Step Name : Step 4
Description : Close the web-browser.

Status : No Run
Exec Date : 12/18/2009
Exec Time : 3:50:46 PM
Expected : The we displayed.



 




1.1.1.1.1.1.3.Plan: Test Name : Public DocType - 001
Status : Passed
Plan: Type : MANUAL
Exec Date : 12/18/2009
Time : 3:50:49 PM



Runs : 
1.1.1.1.1.1.3.1.Run Name : Fast_Run_12-18_15-50-48
Status : Passed
Tester : soea
Exec Date : 12/18/2009
Exec Time : 3:50:48 PM
Duration : 0
Steps : 
Step Name : Step 1
Description : Login to 

Status : No Run
Exec Date : 12/18/2009
Exec Time : 3:50:48 PM
Expected :  should be ln successfully.



Step Name : Step 2
Description : Go to Customer
Status : No Run
Exec Date : 12/18/2009
Exec Time : 3:50:48 PM
Expected : The Customer


Step Name : Step 3
Description : Go to the Documents t

Status : No Run
Exec Date : 12/18/2009
Exec Time : 3:50:48 PM
Expected : Added 



Step Name : Step 4
Description : Choose "Public"

Status : No Run
Exec Date : 12/18/2009
Exec Time : 3:50:48 PM
Expected : Customer s



Step Name : Step 5
Description : Open a web-

Status : No Run
Exec Date : 12/18/2009
Exec Time : 3:50:48 PM
Expected : User


Step Name : Step 6
Description : Go to the C
Status : No Run
Exec Date : 12/18/2009
Exec Time : 3:50:48 PM
Expected : Documents scre



Step Name : Step 7
Description : Enter the 

Status : No Run
Exec Date : 12/18/2009
Exec Time : 3:50:48 PM
Expected : Only documents 



Step Name : Step 8
Description : Close the web-browser.

Status : No Run
Exec Date : 12/18/2009
Exec Time : 3:50:48 PM
Expected : The web-browser should not be displayed.



Step Name : Step 9
Description : Close the VFT application.

Status : No Run
Exec Date : 12/18/2009
Exec Time : 3:50:48 PM
Expected : The application 



 




1.1.1.1.1.1.4.Plan: Test Name : Public DocType - 004
Status : Passed
Plan: Type : MANUAL
Exec Date : 12/18/2009
Time : 3:50:53 PM

d5e5 109 Master Poster · Answer 6 · 2010-01-27T02:58:04+00:00

OK, I'll take a look at your text file. Meanwhile you might want to look at http://www.wellho.net/solutions/perl-using-perl-to-read-microsoft-word-documents.html These modules work only when you have MS-Word installed on your computer, which I don't so I haven't tried them.

d5e5 109 Master Poster · Answer 7 · 2010-01-27T03:37:03+00:00

I made a few changes to get rid of the error and warning messages. I also removed the splice command because I wasn't sure what you wanted it to do. The following reads the test.txt file (you can change the path to make it work on your computer) into an array and prints only the records that start with Exec followed by at least one more character.

use warnings;
use strict;
my $i;
open (F,'C:\Users\David\Programming\Perl\Test.txt') || die "Can not open: $!\n";
my @file = <F>;

foreach (@file) {
    if ($_ =~ m/^Exec.+$/) {
        print; #print what is in $_ (the default variable)
    }
}

close (F);

vesnushka 0 Newbie Poster · Answer 8 · 2010-01-27T23:33:02+00:00

Your script works well, thanks! But I need to remove the matching lines from the file itself... that's why I used "splice".

I've tried another way, but I ran into other problems that I can't solve myself...

The result is an empty test.txt file and no warning messages.

use strict;
use warnings;

my $i;

sub readdata {

    open (F, 'C:\Documents and Settings\Desktop\test.txt') || die "Can not open: $!\n";
    my @data = <F>;
    close (F);
    return (@data);
}


sub writedata {

    open ( F, '>C:\Documents and Settings\Desktop\test.txt' ) or die "Can not open: $!";

    foreach (my @data){
        print F "$_\n";
    }
    close (F);
}


foreach (my @edit = readdata()){
    $_ = /\AExec.+\Z/ ;
    splice (@edit, $_ , 0);
    writedata(@edit);
}

d5e5 109 Master Poster · Answer 9 · 2010-01-28T03:06:04+00:00

I think using splice to remove some lines from an array is more difficult than just testing each member of the array and deciding whether or not to write it into your file.

use strict;
use warnings;

#my $path_and_file = 'C:\Documents and Settings\Desktop\test.txt'; #vesnushka's file location
my $path_and_file = 'C:\Users\David\Programming\Perl\test.txt'; #d5e5's file location
my @edit = readdata();
writedata(@edit);

sub readdata {

    open( F, '<', $path_and_file ) #open in input mode
      || die "Can not open: $!\n";
    my @data = <F>;
    close(F);
    return (@data);
}

sub writedata {
    my @arrayout = @_; # @_ contains list passed when calling this subroutine
    open( F, '>', $path_and_file ) or die "Can not open: $!"; ##open in output mode
    foreach ( @arrayout ) {
        chomp; #remove trailing newline from $_
        unless ($_ =~ m/^Exec/) { #Do not write lines starting with 'Exec' etc. into your file
            print F "$_\n";
        }
    }
    close(F);
}
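
On why the earlier attempt left an empty file: writing foreach (my @data) inside a subroutine declares a new, empty array instead of reading the arguments, so the loop body never runs, while opening the file with '>' has already emptied it. A minimal sketch of the difference follows; count_fresh and count_args are made-up names used only for illustration.

```perl
use strict;
use warnings;

# count_fresh and count_args are hypothetical names for illustration.
sub count_fresh {
    my @data;             # a new, empty array; the arguments are ignored
    return scalar(@data);
}

sub count_args {
    my @data = @_;        # copy the list that the caller passed in
    return scalar(@data);
}

my @lines = ("a\n", "b\n");
print count_fresh(@lines), "\n";    # nothing was copied from @_
print count_args(@lines), "\n";     # both elements were copied
```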

d5e5 109 Master Poster · Answer 10 · 2010-01-29T02:25:51+00:00

Actually it's simpler to read and write one line at a time in a loop instead of opening, closing and reopening the file and creating arrays. The following is based on KevinADC's code snippet:

#!/usr/bin/perl
use strict;
use warnings;

#my $path_and_file = 'C:\Documents and Settings\Desktop\test.txt'; #vesnushka's file location
my $path_and_file = 'C:\Users\David\Programming\Perl\test.txt'; #d5e5's file location

{
   local @ARGV = ($path_and_file);
   local $^I = '.bac';
   while(<>){
      next if $_ =~ m/^Exec/;
      print;
   }
}
print "finished processing file.";

vesnushka 0 Newbie Poster · Answer 11 · 2010-02-01T14:07:22+00:00

Could you please explain what "$^I = '.bac';" means?

d5e5 109 Master Poster · Answer 12 · 2010-02-02T02:36:06+00:00

Could you explain me please what mean "$^I = '.bac';".

$^I = '.bac'; tells Perl to do an in-place edit on the file being processed by the <> construct. The <> construct opens whatever files are named in the @ARGV array. The @ARGV array gets a list of files from the command line that calls Perl and your script, if you added any filenames to the command line after your script name. But if you didn't put filenames on the command line, you can put a statement in your script to put one or more filenames in a local copy of @ARGV. This allows you to process your files with the <> construct.

One advantage of using the <> construct is that you can use in-place editing on the file(s) processed by <>. In-place editing means that the file being read gets renamed with its original name plus the value you give to the $^I variable (in our case, '.bac'). A new, empty file with your original filename is created and any print statement within the block processing the <> construct will write a line to the file. This enables you to rewrite the file with whatever changes you wish to make. Instead of deleting a line, it's easier to do an in-place edit, write the records you want to keep and don't write the records you don't want. See this example of in-place editing by Tek-tips.
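
To see the renaming and rewriting happen without touching a real file, here is a hedged sketch that builds a throwaway file in a temporary directory first. The file name and sample lines are made up; File::Temp ships with Perl.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Work on a throwaway file so nothing real is modified.
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/test.txt";
open(my $out, '>', $path) or die "Can not open $path: $!\n";
print $out "Status : Passed\nExec Date : 12/18/2009\nTester : soea\n";
close($out);

{
    local @ARGV = ($path);    # files for <> to read
    local $^I   = '.bac';     # keep the original as test.txt.bac
    while (<>) {
        next if /^Exec/;      # skipped lines are not written back
        print;                # goes to the new test.txt, not the screen
    }
}

print "backup exists\n" if -e "$path.bac";
```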

vesnushka 0 Newbie Poster · Answer 13 · 2010-02-03T15:54:51+00:00

Thanks a lot for your help!!
so one more question...
I need to delete the empty line before the word "Status". I'm using the following substitution but it does nothing... I don't understand why.

s/(?=Status)\n//m;

d5e5 109 Master Poster · Answer 14 · 2010-02-04T03:07:45+00:00

One reason your attempt to remove the blank line before the line containing 'Status' doesn't work is because the loop is reading one line at a time into the $_ variable. You want to look ahead within the contents of $_ but the next line has not yet been read into $_ so you can't see it at this time. And by the time you read the next record that contains 'Status' you have already read and rewritten the blank line to the file so it is too late to skip it. Since the in-place edit method reads a file one line at a time it can't skip lines based on what it hasn't read yet.

You can accomplish what you want in a different way by reading the entire file into one string variable, changing the contents of the string as you want, and then writing the output to a new file. Please try the following:

#!/usr/bin/perl
use strict;
use warnings;

#my $path_and_file = 'C:\Documents and Settings\Desktop\test.txt'; #vesnushka's file location
my $path_and_file = 'C:\Users\David\Programming\Perl\test.txt'; #d5e5's file location
my $path_and_file_out = substr($path_and_file, 0, -4) . '_edited.txt';
open (F, '<', $path_and_file) or die "Can not open $path_and_file: $!\n";
undef $/; # $/ usually contains \n to indicate end of record. 
my $string = <F>; # There's no value in $/ so entire file is read as one record.
$string =~ s/^Exec.*\n//gm; #delete all lines starting with Exec (g means global, m means multiline mode)
$string =~ s/^\s*\n(?=Status)//gm; #delete all (g for global) empty lines preceding line starting with Status
close F;

open (FOUT, '>', $path_and_file_out) or die "Can not open $path_and_file_out: $!\n";
print FOUT $string;
print "Finished processing $path_and_file.\n\nLook for output in $path_and_file_out\n\n";
close FOUT;

Note that it doesn't change the original test.txt file but creates a new output file called test_edited.txt. I did it this way so I could test without having to keep replacing the original file. (I make a lot of mistakes while testing.)
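
If you want to try the two substitutions without any file at all, here is a small sketch that applies them to a string holding a few lines shaped like the ones in test.txt (the sample text is shortened and partly made up).

```perl
use strict;
use warnings;

# A few lines shaped like the sample file, held in one string.
my $string = join "\n",
    'Description : Open a web-browser',
    '',
    'Status : No Run',
    'Exec Date : 12/18/2009',
    'Expected : logged in',
    '';

$string =~ s/^Exec.*\n//gm;            # drop lines starting with Exec
$string =~ s/^\s*\n(?=Status)//gm;     # drop blank lines just before Status

print $string;
```

Because the whole text is in one string, the second pattern can look ahead past the newline to the Status line, which is not possible when reading one line at a time.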

hahanottelling 0 Newbie Poster · Answer 15 · 2010-02-04T07:07:39+00:00

.docx files are actually zip files, not sure if you knew that...
Inside of the zip (docx) files are multiple xml files. If you just want to extract the text, it's fairly easy. You just open the main xml file and strip out/replace all the xml tags. I've done it in PHP, I can give you that code if you want it. (Sorry, I'm just beginning to learn Perl and am not sure of how to do it yet.)

I'm sure Perl has a library to handle zip files, or one is available on the net.
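
For what it's worth, a rough Perl sketch of that idea, assuming the main part of the document is stored as word/document.xml inside the archive and that crude tag stripping is good enough. It uses IO::Uncompress::Unzip, which comes with recent Perl versions; xml_to_text and docx_text are made-up helper names, and only a few basic entities are decoded.

```perl
use strict;
use warnings;
use IO::Uncompress::Unzip qw(unzip $UnzipError);

# Turn WordprocessingML into rough plain text. xml_to_text and docx_text
# are hypothetical helper names; only basic entities are decoded.
sub xml_to_text {
    my ($xml) = @_;
    $xml =~ s{</w:p>}{\n}g;     # end of paragraph becomes a newline
    $xml =~ s{<[^>]+>}{}g;      # strip every remaining tag
    $xml =~ s/&lt;/</g;
    $xml =~ s/&gt;/>/g;
    $xml =~ s/&quot;/"/g;
    $xml =~ s/&amp;/&/g;        # decode &amp; last
    return $xml;
}

# Read word/document.xml out of the .docx (zip) archive.
sub docx_text {
    my ($path) = @_;
    my $xml;
    unzip($path => \$xml, Name => 'word/document.xml')
        or die "unzip failed: $UnzipError\n";
    return xml_to_text($xml);
}
```

Calling print docx_text('Test.docx'); should then print the text, one paragraph per line, with all formatting lost.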

vesnushka 0 Newbie Poster · Answer 16 · 2010-02-04T23:10:06+00:00

Thank you so much for the help!!
Yes, it would be very interesting to see how PHP works... please send me the file... and please tell me what environment I should install to use PHP.

hahanottelling 0 Newbie Poster · Answer 17 · 2010-02-05T06:21:36+00:00

The easiest way that I know of to use PHP is XAMPP (from here: http://www.apachefriends.org/en/xampp.html). It is very easy to set up (pretty much does everything for you).

I can give you a link to the file tomorrow, I need to get it from school. However, you may want to note that a) it is just something I wrote quickly and some time ago, and b) the text loses all but the basic formatting. (You could parse more of the XML and retain more of the formatting if you wanted to)

That said, it does what I needed it to well: extract plaintext from .docx files.

hahanottelling 0 Newbie Poster · Answer 18 · 2010-02-10T10:52:27+00:00

Sorry to take so long to post the file. Here it is:
http://max-land.org/docx.zip

(There are a bunch of things in the file, but they are extracted from the Word document. All you care about are the two PHP files.)

vesnushka 0 Newbie Poster · Answer 19 · 2010-02-12T19:16:59+00:00

Thanks a lot, everyone, for the help!!!