Possible to remove part of a line?

Question

ryan461 0 Junior Poster

14 Years Ago

I'm new to Perl but have some experience in Python. I've been tasked with creating a script to remove comments from files. Basically, I'd like to remove <!-- ... --> from every line that has it, across an assortment of files in a directory tree. Some comments may span multiple lines.

I've done some research, and this script seems to want to remove everything. My test file looks like:

some code <!-- this is a comment-->
more code

After I run the script, the file is empty.

script:

#!/usr/bin/perl

use File::Find;
use strict;

my $directory = "/home/rmcleod/perltest";
find (\&process, $directory);

sub process
{
    my @outlines;
    my $line;
   
    if ($File::Find::name=~/\.xsd$/) {
        open (FILE, $File::Find::name) or
        die "Cannot open file $!";

        print "\n". $File::Find::name. "\n";
        while ($line = <FILE>){
            foreach ($line =~ /<!--.*?(^>]*)-->/is) {
                push(@outlines, $line);
            }

        }

        close FILE;
        open(OUTFILE, ">$File::Find::name") or
        die "Cannot open file:$!";

        print (OUTFILE my @outlines);
        close (OUTFILE);

        undef (@outlines);
    }

}

I've played around with it a bit. The foreach used to be an if statement, and before that it was just the $line =~ regex, which is what the guide I used had. But my limited knowledge of Perl has kind of stopped me from playing around any further.
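For comparison, a minimal sketch of the per-line approach with what look like the three problems in the script above fixed. First, the capture group (^>]*) starts with ^, which without the /m modifier only matches at the very start of the string, so the match after <!-- can never succeed and nothing is pushed. Second, even with a working regex, only lines containing a comment would be pushed, so every other line would be dropped. Third, print (OUTFILE my @outlines) declares a new, empty @outlines, so nothing is written regardless. strip_line_comments is a hypothetical name, and like the original it only handles comments that start and end on the same line.

```perl
use strict;
use warnings;

# Hypothetical helper: remove single-line <!-- ... --> comments,
# keeping every line (with or without a comment) in the output.
sub strip_line_comments {
    my @outlines;                               # declared once, filled below, returned
    for my $line (@_) {
        (my $clean = $line) =~ s/<!--.*?-->//g; # substitute rather than only match
        push @outlines, $clean;                 # push every line, not only matching ones
    }
    return @outlines;
}

my @in  = ("some code <!-- this is a comment-->\n", "more code\n");
my @out = strip_line_comments(@in);
print @out;
```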

Thanks

perl regex


mitchems 12 Posting Whiz in Training

14 Years Ago

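use strict;
use warnings;
open(FILE, "<com.txt") or die "Cannot open com.txt: $!";
open(OUT, ">comout.txt") or die "Cannot open comout.txt: $!";
while (<FILE>) {
	chomp;
	s/<!--[^>]*-->//g;   # remove comments that open and close on this line
	print OUT "$_\n";
}
close FILE;
close OUT;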

mitchems 12 Posting Whiz in Training

14 Years Ago

To make sure that comments that span multiple lines are matched:

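use strict;
use warnings;
undef($/);    # slurp mode: <FILE> returns the whole file as one string
open(FILE,"<com.txt") or die "Cannot open com.txt: $!";
open (OUT,">comout.txt") or die "Cannot open comout.txt: $!";
my $file=<FILE>;
$file=~s/<!--[^>]*-->//g;
print OUT $file;
$/="\n";      # restore the default record separator
close FILE;
close OUT;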

mitchems 12 Posting Whiz in Training

14 Years Ago

You can't read from and write to the same file at the same time like that: opening it with > truncates it before anything has been read. To get that to work, open the file, read it, close it, and then open it again for writing.

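use strict;
use warnings;
undef($/);    # slurp mode: <FILE> returns the whole file as one string
open(FILE,"<com.txt") or die "Cannot open com.txt: $!";
my $file=<FILE>;
close FILE;   # finish reading before the file is truncated below
open (OUT,">com.txt") or die "Cannot open com.txt: $!";
$file=~s/<!--[^>]*-->//g;
print OUT $file;
$/="\n";      # restore the default record separator
close OUT;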
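To see the truncation problem directly, here is a small self-contained sketch using a temporary file (the sample text is the test file from the question). It opens the file for reading and then for writing in the wrong order, and then does it in the read, close, reopen order described above.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my $sample = "some code <!-- this is a comment-->\nmore code\n";

# Create a throwaway copy of the test file.
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} $sample;
close $tmp;

# Wrong order: opening with '>' truncates the file at once,
# so the handle opened for reading finds nothing left to read.
open my $in,  '<', $path or die "Cannot open $path: $!";
open my $out, '>', $path or die "Cannot open $path: $!";
my @lost = <$in>;
close $in;
close $out;

# Put the sample back, then do it in the right order:
# read everything, close, and only then reopen for writing.
open my $reset, '>', $path or die "Cannot open $path: $!";
print {$reset} $sample;
close $reset;

open my $reader, '<', $path or die "Cannot open $path: $!";
my $text = do { local $/; <$reader> };
close $reader;
$text =~ s/<!--[^>]*-->//g;

open my $writer, '>', $path or die "Cannot open $path: $!";
print {$writer} $text;
close $writer;

print "wrong order read ", scalar(@lost), " lines; right order wrote: $text";
```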

d5e5 109 Master Poster

14 Years Ago

Try the following. I made a couple of changes to your script, indicated by comments. I changed the regex slightly because $f=~s/<!--[^>]*-->//g; will not remove a comment if the character '>' occurs anywhere between the comment tags. It's better to use a dot, which matches any character (including newlines, thanks to the s modifier).

#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

#undef($/); Better to take care of $/ in subr as local variable

my $directory = "/home/user/perltest";
find (\&subr, $directory);

sub subr
{
    # the loop body runs once when the name ends in .xsd, otherwise not at all
    foreach ($File::Find::name=~/.*\.xsd$/) {
        open (FILE, "<", $File::Find::name) or die "Cannot open $File::Find::name: $!";
        local $/;
        my $f=<FILE>;
        print $f;
        $f=~s/<!--.*-->//gs; #s option means dot (.) includes newline character
        close FILE;
        open (OUT, ">", $File::Find::name) or die "Cannot write $File::Find::name: $!";
        print OUT $f;
        close OUT;
    }
}
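A small check of the regex point above, on an in-memory string rather than a file: the negated class [^>]* gives up at the first '>' inside the comment, while a dot with the s modifier carries on across newlines to the closing -->. With only one comment in the string, the greedy .* used here behaves the same as a lazy .*? would.

```perl
use strict;
use warnings;

# One comment that contains a '>' and spans two lines.
my $xml = "<a/>\n<!-- uses -> arrows\n     across lines -->\n<b/>\n";

# [^>]* cannot step past the '>' in "->", so this comment is left alone.
(my $neg_class = $xml) =~ s/<!--[^>]*-->//g;

# With /s the dot also matches newlines, so the whole comment goes.
(my $dot_s = $xml) =~ s/<!--.*-->//gs;

print $neg_class, "----\n", $dot_s;
```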


ryan461 0 Junior Poster · Answer 1 · 2010-09-08T23:20:40+00:00

I tried mitchems's line-by-line script (the while (<FILE>) version with chomp), and the only thing I changed was making OUT the same file as the input. That causes all the data to be removed. Is there a way to write the remaining data back into the same file, or to remove only the unwanted data? The only other way I can think of is writing to a new file and renaming it back to the original name with an OS mv.
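Perl can take care of the temporary file and rename step itself. Setting $^I (the variable behind the -i command-line switch) turns on in-place editing: inside a while (<>) loop over the files in @ARGV, whatever is printed replaces the file's contents, and with a non-empty value such as '.bak' the original is kept under that suffix. A minimal sketch, where strip_in_place is a hypothetical helper and the regex is the single-line one from mitchems's script, so multi-line comments are still not handled:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: edit one file in place via $^I (the -i switch),
# keeping the original as "<file>.bak". Single-line comments only.
sub strip_in_place {
    my ($path) = @_;
    local ($^I, @ARGV) = ('.bak', $path);   # in-place mode, scoped to this call
    local $_;                               # don't clobber the caller's $_
    while (<>) {                            # reads $path line by line
        s/<!--[^>]*-->//g;
        print;                              # goes to the rewritten $path
    }
}

my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/com.txt";
open my $fh, '>', $path or die "Cannot write $path: $!";
print {$fh} "some code <!-- this is a comment-->\nmore code\n";
close $fh;

strip_in_place($path);

open my $check, '<', $path or die "Cannot read $path: $!";
my $result = do { local $/; <$check> };
close $check;
print $result;
```

From the shell, the same single-line edit is a one-liner: perl -i.bak -pe 's/<!--[^>]*-->//g' com.txt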

ryan461 0 Junior Poster · Answer 2 · 2010-09-09T00:24:38+00:00

Ah, that works well, and avoids the hassle of having to mv the file. Thanks

ryan461 0 Junior Poster · Answer 3 · 2010-09-09T21:15:25+00:00

The main issue I had, removing part of a line, has been solved. However, I haven't been able to adapt your script to run on multiple files with the same extension.

Here's what I have:

use strict;
use warnings;
use File::Find;

undef($/);
my $directory = "/home/user/perltest";
find (\&process, $directory);
sub process
{
    if ($File::Find::name=~/\.xsd$/) {
        open (FILE, $File::Find::name);
        my $file = <FILE>;
        
        print "\n". $File::Find::name. "\n";
        $file = ~s/<!--[^>]*-->//g;
        close FILE;
        open (OUT, ">", $File::Find::name);
        print OUT $file;
        $/="\n";
        close OUT;
    }
}

What happens here is that it writes a series of numbers to the file and deletes everything else. It basically looks like:

2304985201274924

I imagine I need to change either "sub process" or how find is used. The files themselves are found properly, as shown by the first print statement.

I have a file in each subdirectory of perltest:

1/test1.xsd 2/test1.xsd 3/test1.xsd
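One thing worth checking in the script above: in $file = ~s/<!--[^>]*-->//g; the space between = and ~ changes the meaning. Perl reads it as $file = ~(s/<!--[^>]*-->//g), so the substitution runs against $_ (which File::Find sets to the current file name) instead of $file, and $file is assigned the bitwise complement (~) of the substitution's return value. That would account for a number being written to each file. In the sketch below, $_ is given a comment of its own only so the substitution has something to count; the strings are made up for illustration.

```perl
use strict;
use warnings;

$_ = "test1.xsd <!-- x -->";   # stand-in for whatever $_ holds at this point
my $file = "code <!-- comment --> more";

# With a space, this parses as  $file = ~(s/<!--[^>]*-->//g)  :
# the substitution edits $_, and $file gets the bitwise NOT of its count.
$file = ~s/<!--[^>]*-->//g;
print "$file\n";               # a large number, not the text

my $fixed = "code <!-- comment --> more";
$fixed =~ s/<!--[^>]*-->//g;   # =~ binds the substitution to $fixed
print "$fixed\n";
```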

ryan461 0 Junior Poster · Answer 4 · 2010-09-10T00:46:22+00:00

Here's some updated changes:

#!/usr/bin/perl -w
use strict;
use warnings;
use File::Find;

undef($/);
my $directory = "/home/user/perltest";
find (\&subr, $directory);

sub subr
{
    if ($File::Find::name=~/.*\.xsd$/) {
        open (FILE, "<", $File::Find::name);
        my $f=<FILE>;
        print $f;
        $f=~s/<!--[^>]*-->//g;
        close FILE;
        open (OUT, ">", $File::Find::name);
        print OUT $f;
        $/="";
        close OUT;
    }
}

Well, this works, but it warns about use of an uninitialized value $f on lines 15, 16 and 19. It looks initialized to me. They're just warnings, so it's not a huge deal.
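As far as I can tell, there is also a quieter problem with how $/ is handled in the two versions above. undef($/) runs once at the top, but $/="" inside the sub then switches every file processed after the first one to paragraph mode, so <FILE> returns only the text up to the first blank line and only that part is written back (in the previous version, $/="\n" did the same with a single line). Declaring local $/ inside the sub, as in d5e5's version, keeps slurp mode scoped to each call. A minimal sketch; slurp is a hypothetical helper name:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: return the whole contents of $path as one string.
sub slurp {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    local $/;                  # undef only until this sub returns
    my $text = <$fh>;
    close $fh;
    return $text;
}

# A sample with a blank line in it, as most .xsd files have.
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "<a/>\n\n<!-- comment -->\n<b/>\n";
close $tmp;

my $first  = slurp($path);
my $second = slurp($path);     # still the whole file: nothing leaked out of slurp()
print "record separator is still a newline\n" if $/ eq "\n";
```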

ryan461 0 Junior Poster · Answer 5 · 2010-09-10T01:59:59+00:00

Ah, thanks, no warnings there.

Good call on the regex. I did initially have it like you suggested, but changed it somewhere along the way when I was having problems.

ryan461 0 Junior Poster · Answer 6 · 2010-09-10T19:29:12+00:00

Ah, the regex needs the [^>], otherwise it fails to span multiple lines.

ryan461 0 Junior Poster · Answer 7 · 2010-09-10T20:22:14+00:00

Hmm, it's having trouble spanning multiple lines, which doesn't make sense, since it works in my Rx Toolkit. The way I had it before, with [^>], did span multiple lines, but a couple of comments have a > in them.

ryan461 0 Junior Poster · Answer 8 · 2010-09-10T20:57:18+00:00

Sorry for so many posts, but:

$f=~s/<!--[\w\W]*-->//gi;

works
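[\w\W] is a character class containing every character, newline included, so it behaves like a dot with the s modifier. That might explain the earlier trouble if the s from d5e5's //gs was lost along the way (a guess, since the failing regex isn't shown): without s, the dot stops at a newline, so a multi-line comment is never matched. A quick check on a string:

```perl
use strict;
use warnings;

my $text = "<!-- line one\nline two -->kept\n";

(my $dot_plain = $text) =~ s/<!--.*-->//g;      # . never matches "\n": no match
(my $dot_s     = $text) =~ s/<!--.*-->//gs;     # /s: . matches "\n" too
(my $any_class = $text) =~ s/<!--[\w\W]*-->//g; # [\w\W] includes "\n" already

print $dot_plain, $dot_s, $any_class;
```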

d5e5 109 Master Poster · Answer 9 · 2010-09-10T21:17:07+00:00

Quote from ryan461:
Sorry for so many posts, but:
$f=~s/<!--[\w\W]*-->//gi;
works

#!/usr/bin/perl
#remove_comments.pl
use strict;
use warnings;

#Put sample xsd file contents into a string for purpose of testing
my $f = <<END;
<?xml version="1.0" encoding="ISO-8859-1" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<!-- definition of simple elements -->
<xs:element name="orderperson" type="xs:string"/>
<xs:element name="name" type="xs:string"/>
<xs:element name="address" type="xs:string"/>
<xs:element name="city" type="xs:string"/>
<!--Here is a multi-line
comment to remove
for test purposes -->
<xs:element name="country" type="xs:string"/>
<xs:element name="title" type="xs:string"/>
<xs:element name="note" type="xs:string"/>
<xs:element name="quantity" type="xs:positiveInteger"/>
<xs:element name="price" type="xs:decimal"/>

<!-- definition of attributes -->
<xs:attribute name="orderid" type="xs:string"/>

<!-- definition of complex elements -->
<xs:element name="shipto">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="name"/>
      <xs:element ref="address"/>
      <xs:element ref="city"/>
      <xs:element ref="country"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

<xs:element name="item">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="title"/>
      <xs:element ref="note" minOccurs="0"/>
      <xs:element ref="quantity"/>
      <xs:element ref="price"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

END

$f=~s/<!--[\w\W]*-->//gi;#regex has greedy quantifier *
print $f;

Running the above gives the following output:

<?xml version="1.0" encoding="ISO-8859-1" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">


<xs:element name="shipto">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="name"/>
      <xs:element ref="address"/>
      <xs:element ref="city"/>
      <xs:element ref="country"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

<xs:element name="item">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="title"/>
      <xs:element ref="note" minOccurs="0"/>
      <xs:element ref="quantity"/>
      <xs:element ref="price"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

Looks like it removed way too much code along with the comments! The regex I suggested had the same flaw, which I realised only after further testing today. Let's try adding a ? after the * to make the quantifier lazy so it won't remove so much. Change your regex to this:

$f=~s/<!--[\w\W]*?-->//gi;#regex has lazy quantifier *?

After making this change, running the test program gives the following output:

<?xml version="1.0" encoding="ISO-8859-1" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">


<xs:element name="orderperson" type="xs:string"/>
<xs:element name="name" type="xs:string"/>
<xs:element name="address" type="xs:string"/>
<xs:element name="city" type="xs:string"/>

<xs:element name="country" type="xs:string"/>
<xs:element name="title" type="xs:string"/>
<xs:element name="note" type="xs:string"/>
<xs:element name="quantity" type="xs:positiveInteger"/>
<xs:element name="price" type="xs:decimal"/>


<xs:attribute name="orderid" type="xs:string"/>


<xs:element name="shipto">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="name"/>
      <xs:element ref="address"/>
      <xs:element ref="city"/>
      <xs:element ref="country"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

<xs:element name="item">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="title"/>
      <xs:element ref="note" minOccurs="0"/>
      <xs:element ref="quantity"/>
      <xs:element ref="price"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

The result looks better to me, but before trying this on your directory tree, I hope you have a good backup. :)

ryan461 0 Junior Poster · Answer 10 · 2010-09-10T21:25:06+00:00

Haha, thanks. This is exactly the issue I'm running into: it was removing everything up to the last -->. I'll try this and let you know.

Yep, all appears to be well in the few files I looked at. Thanks for the help, much appreciated :)
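For anyone landing here later, the pieces from this thread can be combined into one sketch: File::Find to walk the tree, local $/ to slurp each file, the lazy regex with the s modifier (written with a dot here rather than [\w\W]; the two are equivalent under /s), and read, close, then rewrite so the file isn't truncated before it is read. strip_xml_comments and strip_tree are hypothetical names. It still uses a regex rather than an XML parser, so something that merely looks like a comment inside a CDATA section would be stripped too; back up the tree first, as d5e5 says.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: strip every <!-- ... --> comment (single- or
# multi-line) from one file, using the lazy /s form from the thread.
sub strip_xml_comments {
    my ($path) = @_;
    open my $in, '<', $path or die "Cannot open $path: $!";
    my $text = do { local $/; <$in> };   # slurp; $/ restored on block exit
    close $in;

    $text =~ s/<!--.*?-->//gs;           # lazy, so each comment is removed separately

    open my $out, '>', $path or die "Cannot write $path: $!";
    print {$out} $text;
    close $out or die "Cannot close $path: $!";
}

# Walk a directory tree and process every .xsd file.
sub strip_tree {
    my ($dir) = @_;
    find(sub {
        return unless -f $_ && /\.xsd\z/;
        strip_xml_comments($_);   # find() chdirs into each directory, so $_ is the basename
    }, $dir);
}

strip_tree($ARGV[0]) if @ARGV;
```

Saved as, say, strip_comments.pl (a hypothetical name), it would be run as perl strip_comments.pl /home/user/perltest.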
