I have a file containing data in this form:
R1 1987 or 1789 and 8585 (7654)
R2 7698 or 8656 or 74746
I want to convert it to this form:
R1 1987
R1 1789
R1 8585
R1 7654
R2 7698
R2 8656
R2 74746

My plan: use a regex to remove the "and", "or" and "()", then split each line and print the pieces.

Hi preeti2,
Since split takes a regular expression as its separator, you can split each line on " or ", " and ", "(", ")" or whitespace in one go, and then print the pieces the way you want, like this:

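use warnings;
use strict;

# Separators, tried left to right: " or ", " and ", whitespace followed by "(",
# ")" followed by whitespace (including the newline), or any run of whitespace.
my $reg = qr/ or | and |\s+\(|\)\s+|\s+/;

while (<DATA>) {
    # $val[0] is the label (R1, R2, ...); the rest are the numbers.
    # split drops the trailing empty field left by ")\n" or "\n".
    my @val = split /$reg/, $_;
    # Print the label next to each number, one pair per line ($/ is "\n").
    print join( ' ' => $val[0], $_ ), $/ for @val[ 1 .. $#val ];
}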

__DATA__
R1 1987 or 1789 and 8585 (7654)
R2 7698 or 8656 or 74746

This produces the following output:

R1 1987
R1 1789
R1 8585
R1 7654
R2 7698
R2 8656
R2 74746
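For comparison, the plan from the question (delete the "and", "or" and parentheses with a regex first, then split and print) also works. The sketch below assumes the same input format; expand_line is a hypothetical helper name, and the sample lines are kept in an array instead of __DATA__ so the snippet is self-contained:

```perl
use strict;
use warnings;

# Hypothetical helper: turn one input line into "label number" strings by
# deleting the separators first and then splitting on whitespace.
sub expand_line {
    my ($line) = @_;
    # \b keeps the substitution from touching "and"/"or" inside longer words.
    ( my $clean = $line ) =~ s/\b(?:and|or)\b|[()]//g;
    # split ' ' is the awk-style special case: it skips leading whitespace
    # and splits on runs of whitespace, so the doubled spaces left behind
    # by the substitution do not produce empty fields.
    my ( $label, @numbers ) = split ' ', $clean;
    return map { "$label $_" } @numbers;
}

my @input = (
    "R1 1987 or 1789 and 8585 (7654)\n",
    "R2 7698 or 8656 or 74746\n",
);

print "$_\n" for map { expand_line($_) } @input;
```

Both versions print the same seven lines for this input; the single split regex simply avoids the extra substitution pass.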

Here is a full explanation of the regex used in the code above:

The regular expression:

(?-imsx: or | and |\s+\(|\)\s+|\s+)

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
   or                      ' or '
----------------------------------------------------------------------
 |                        OR
----------------------------------------------------------------------
   and                     ' and '
----------------------------------------------------------------------
 |                        OR
----------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  \(                       '('
----------------------------------------------------------------------
 |                        OR
----------------------------------------------------------------------
  \)                       ')'
----------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
 |                        OR
----------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
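One point the table does not spell out is that the order of the alternatives matters: Perl takes the first alternative that matches at a given position, not the longest one. The sketch below compares the regex from the answer with a hypothetical reordering ($ws_first, not from the original post) that tries the catch-all \s+ first; fields_with is likewise a hypothetical helper:

```perl
use strict;
use warnings;

# The separator regex from the answer, and a hypothetical variant with the
# catch-all \s+ branch moved to the front.
my $as_posted = qr/ or | and |\s+\(|\)\s+|\s+/;
my $ws_first  = qr/\s+| or | and |\s+\(|\)\s+/;

# Split one line with the given regex and return the fields.
sub fields_with {
    my ( $re, $line ) = @_;
    return split $re, $line;
}

my $line = "R1 1987 or 1789 and 8585 (7654)\n";

# With \s+ first, the single space before "or" already matches, so the
# " or " branch is never reached and "or", "and" and "(7654)" leak through.
print join( '|', fields_with( $as_posted, $line ) ), "\n";   # R1|1987|1789|8585|7654
print join( '|', fields_with( $ws_first,  $line ) ), "\n";   # R1|1987|or|1789|and|8585|(7654)
```

That is why the general \s+ branch is listed last in the posted regex.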