Delete repeating elements in a file.

Question

knan 0 Light Poster

14 Years Ago

I have a file named test.txt, which I read like this:

file = open("test.txt", "r")
obj = file.read()
file.close()
print(obj)

This prints:

a=a

b=b

c=c



d=e
e=d

e=f
f=e

f=g
g=h

What I want to do with obj is write a regular expression such that:

1. If the left-hand symbol matches the right-hand symbol, the pair should become a single symbol, i.e. a=a should become a.

2. d=e and e=d mean the same thing, so one of them must be removed. The same goes for e=f and f=e.

3. Notice the newlines: some entries are separated by \n, some by \n\n and some by \n\n\n. Each such run should become a single \n.
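Rule 3 on its own needs only one substitution: every run of one or more newlines is replaced by exactly one. A minimal sketch, assuming the file contents are already in a string (the sample text below is a stand-in for test.txt, not the real file):

```python
import re

# Hypothetical sample standing in for the text read from test.txt
text = "a=a\n\nb=b\n\nc=c\n\n\n\nd=e\ne=d\n"

# \n+ matches a run of one or more newlines; each run is replaced by a single \n
collapsed = re.sub(r"\n+", "\n", text)
print(collapsed, end="")
```

This handles only the newlines; rules 1 and 2 still need the pair-matching logic discussed below.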

The output should be:

a
b
c
d=e
e=f
f=g
g=h

Could someone please help me write the regular expression? I've been trying to find one for ages, but I couldn't. Please help.

python regex


Gribouillis 1,391 Programming Explorer

14 Years Ago

knan wrote: "Thank you very much. That was very helpful. How can I achieve the first and second conditions? I am still trying, but I couldn't figure out a regular expression..."

You can write pseudocode to build the regular expression. You want to match the following:

pattern:
    either:
        symbol1 equal symbol2
        newline
        symbol2 equal symbol1
    or:
        symbol3 equal symbol3
    or:
        symbol4 equal symbol5
    newlines (0 or more)

Each of these elements has an equivalent regex pattern:

symbol1 ->  (?P<symbol1>[a-z])
symbol2 ->  (?P<symbol2>[a-z])
repeated symbol1  -> (?P=symbol1)
repeated symbol2  -> (?P=symbol2)
equal -> [=]
newline -> \n
zero or more -> *
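To see how the named group and the repeated-symbol backreference from the list above behave, here is a small check in Python's re module. The pattern names same_both_sides and mirrored are illustrative choices, not part of the original hint:

```python
import re

# (?P<symbol1>[a-z]) captures one lowercase letter under the name symbol1;
# (?P=symbol1) later in the pattern must match that same letter again.
same_both_sides = re.compile(r"(?P<symbol1>[a-z])[=](?P=symbol1)")

print(bool(same_both_sides.fullmatch("a=a")))  # True
print(bool(same_both_sides.fullmatch("d=e")))  # False

# Two named groups plus two backreferences recognise a mirrored pair of lines.
mirrored = re.compile(r"(?P<symbol1>[a-z])[=](?P<symbol2>[a-z])\n(?P=symbol2)[=](?P=symbol1)")

print(bool(mirrored.fullmatch("d=e\ne=d")))  # True
print(bool(mirrored.fullmatch("f=g\ng=h")))  # False
```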

This should give you hints to build the regular expression.
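Putting the pseudocode together, one possible assembly is a single verbose pattern with one alternative per case, used with re.sub and a replacement function. This is a sketch, not the poster's own solution: the outer group names mirror, same and other are hypothetical, and like the pseudocode it assumes single lowercase letters and mirrored pairs on consecutive lines.

```python
import re

# One alternative per case in the pseudocode; the outer group names
# (mirror, same, other) are illustrative and let us tell the cases apart.
PATTERN = re.compile(r"""
    (?:
        (?P<mirror>                          # d=e followed directly by e=d
            (?P<symbol1>[a-z])[=](?P<symbol2>[a-z])\n
            (?P=symbol2)[=](?P=symbol1)
        )
      | (?P<same>                            # a=a
            (?P<symbol3>[a-z])[=](?P=symbol3)
        )
      | (?P<other>                           # any other pair, e.g. f=g
            (?P<symbol4>[a-z])[=](?P<symbol5>[a-z])
        )
    )
    \n*                                      # swallow the newlines that follow
""", re.VERBOSE)


def collapse(match):
    """Return the single line that replaces one matched entry."""
    kind = match.lastgroup  # outermost named group that closed last
    if kind == "mirror":
        return f"{match['symbol1']}={match['symbol2']}\n"  # keep d=e, drop e=d
    if kind == "same":
        return match["symbol3"] + "\n"                      # a=a -> a
    return f"{match['symbol4']}={match['symbol5']}\n"       # keep as is


# Stand-in for the contents of test.txt
text = "a=a\n\nb=b\n\nc=c\n\n\n\nd=e\ne=d\n\ne=f\nf=e\n\nf=g\ng=h\n"
result = PATTERN.sub(collapse, text)
print(result, end="")
```

The order of the alternatives matters: the mirrored case is tried first, so d=e is not consumed on its own before e=d can be checked. Blank lines at the very start of the file are not covered, since \n* only follows a matched entry.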


knan commented: Thank you very much! I think I am nearing the answer.


griswolf 304 Veteran Poster · Answer 1 · 2010-10-26T14:38:05+00:00

Let's start easy:

lines = []
with open('test.txt', 'r') as f:
    for x in f:
        if x.strip():  # skip empty lines
            lines.append(x.strip())
for line in lines:
    print(line)

This just eliminates the blank lines, then prints out the remainder. Of course you will want to do some more work. You will probably want to do something like lhs,rhs = line.split('=') at some point.
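Following that suggestion, all three rules can be handled with split() and no regular expression. A minimal sketch; the function name clean_lines and the frozenset-based deduplication are illustrative choices, not from the thread:

```python
def clean_lines(lines):
    """Apply the thread's three rules to 'x=y' lines without a regex (sketch)."""
    seen = set()   # unordered pairs already kept
    result = []
    for raw in lines:
        line = raw.strip()
        if not line:                  # rule 3: drop blank lines
            continue
        lhs, rhs = line.split('=')    # assumes exactly one '=' per line
        key = frozenset((lhs, rhs))   # d=e and e=d give the same key
        if key in seen:               # rule 2: drop the mirrored (or repeated) pair
            continue
        seen.add(key)
        result.append(lhs if lhs == rhs else line)  # rule 1: a=a -> a
    return result


# Stand-in for the lines of test.txt; an open file object can be passed the same way.
sample = "a=a\n\nb=b\n\nc=c\n\n\n\nd=e\ne=d\n\ne=f\nf=e\n\nf=g\ng=h\n".splitlines()
cleaned = clean_lines(sample)
print("\n".join(cleaned))
```

Because the key ignores order, this also catches a mirrored pair that is not on the very next line, and the output keeps the file's original order.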

knan 0 Light Poster · Answer 2 · 2010-10-26T17:16:33+00:00

Thank you very much. That was very helpful. How can I achieve the first and second conditions? I am still trying, but I couldn't figure out a regular expression...

TrustyTony 888 ex-Moderator Team Colleague Featured Poster · Answer 3 · 2010-10-26T20:10:40+00:00


Good advice. It is also good not to give a ready-made solution, as the OP has to solve the problem with a regular expression.

So I feel free to post two of my non-regex solutions:

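# Solution 1: stream the file, remembering the last pair printed
with open("test.txt", "r") as datasource:
    c, d = '', ''  # most recently printed pair (left, right)
    for ab in datasource:
        if '=' in ab:  # skips blank lines
            a, b = ab.rstrip().split('=')
            if a == b:
                print(a)  # a=a collapses to a
            elif (a, b) != (d, c):  # skip the mirror of the last printed pair
                print('='.join((a, b)))
                c, d = a, b

print(60 * '-')

# Solution 2: normalise each pair by sorting, then deduplicate with a set
with open("test.txt", "r") as source:
    datasource = (sorted(d.rstrip().split('='))
                  for d in source if '=' in d)
    print('\n'.join(sorted(set(a if a == b else a + '=' + b
                               for a, b in datasource))))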