Hi,

list = [line.split() for line in open(file) if line is not None]

and

output:

[linux@localhost ~]$
[[], ['text'], ['text', 'text', 'text', 'text'], ['text', 'text', 'text', 'text']]

How can I remove the empty lists ([]) from the result?

So should we guess what the input file looks like?

deb http://archive.canonical.com/ lucid partner
deb http://archive.canonical.com/ lucid partner

deb http://archive.canonical.com/ lucid partnerO

and code:

# -*- coding: utf-8 -*-
 
import os


files = ['/home/timo/' + file for file in os.listdir('/home/timo/') if not('.save' or 'c++' in file)] 

files.append('/home/timo/sources.list')

dublicate = []
for file in files:
  list = [line.split() for line in open(file) if line is not None]
  if list:
   list.sort()
   last = list[-1]
   for i in range(len(list) - 2, - 1, - 1):
     if last == list[i]:
       dublicate.append(list[i])
       print dublicate[0]
     else:
       last = list[i]
  print list

Are you trying to do something like this (I named your file sources.lst)?

# -*- coding: utf-8 -*-
import os

mylist = set(' '.join(word for word in line.split())
             for line in open('sources.lst').readlines()
             if line.strip())

print '\n'.join(sorted(mylist))
commented: strip() -> what I needed, thanks! +0
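
For readers wondering why the original `line is not None` test kept the empty lists: iterating over a file never yields None, and a blank line is the string '\n', whose split() is []. A minimal sketch of the difference, using a made-up in-memory list of lines in place of a real file:

```python
# Hypothetical sample standing in for the lines of a file.
lines = ["\n", "text\n", "text text text text\n", "   \n"]

# A line read from a file is always a string, so this test filters nothing.
unfiltered = [line.split() for line in lines if line is not None]

# strip() returns '' (falsy) for blank or whitespace-only lines.
filtered = [line.split() for line in lines if line.strip()]

print(unfiltered)
print(filtered)
```

The first list still contains the empty lists; the second keeps only lines with actual content.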

OK, another small problem.

list = os.listdir('/home/timo/')

files = ['/home/timo/' + file for file in list if not('.save' or 'c++' in file)] 

print files

Output:

[]

First point:
Do not use list and file as variable names; they shadow Python's built-in list and file.

Second:
Your expression

'.save' or 'c++' in filename

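is always truthy: '.save' is a non-empty string, so the or short-circuits and the part after it is never evaluated. After not(...) the condition is always False, which is why your list is empty. The value is therefore: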

>>> '.save' or False
'.save'
>>> not('.save')
False
>>>
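
The same evaluation can be checked step by step. This is a small sketch with a made-up filename; `in` binds tighter than `or`, so the expression groups as '.save' or ('c++' in filename):

```python
filename = "notes.txt"  # hypothetical name that contains neither pattern

# `in` binds tighter than `or`, so this is '.save' or ('c++' in filename).
value = '.save' or 'c++' in filename
print(repr(value))      # the non-empty string '.save', which is truthy
print(not value)        # so the filter rejects every file

# Testing each pattern separately gives the intended result.
keep = not ('.save' in filename or 'c++' in filename)
print(keep)
```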

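You probably meant to do:

import os

dirname = 'd:/test'
filelist = os.listdir(dirname)

# os.path.join keeps the directory part; os.path.realpath(filename) would
# resolve the bare name against the current working directory instead
files = [os.path.join(dirname, filename) for filename in filelist
         if not any(part in filename for part in ('.save', 'c++'))]

print '\n'.join(files)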
filelist = os.listdir('/home/timo/')

files = [os.path.realpath(filename) for filename in filelist
         if not(any(part in filename for part in ('.save', 'c++')))]

print '\n'.join(files)


dublicate = []
for file in files:
  list = [line for line in open(file).readlines() if line.strip() and os.path.isfile(file)]
  if list:
   last = list[-1]
   for i in range(len(list) - 2, - 1, - 1):
     if last == list[i]:
       dublicate.append(list[i])
       print '\n'.join(sorted(dublicate))
     else:
       last = list[i]
Traceback (most recent call last):
  File "Dup.py", line 23, in <module>
    list = [line for line in open(file).readlines() if line.strip() and os.path.isfile(file)]
IOError: [Errno 21] Is a directory: '/home/timo/.macromedia'

At this point it might be best for you to at least study the basics of the Python language.

What vegaseat means is that you are making a variation of exactly the same mistake I explained. Also, line.strip() is falsy only when the line is empty or contains nothing but whitespace (including '\n'). Try some interactive practice at the Python prompt.

I am familiar with C++.
Python, unfortunately, is too confusing.

Those are just the first steps; it gets easier afterwards. Ask vegaseat.

I analyzed your code, and the situation is actually different because you have an and condition. The problem is that you open the file to read its lines even when it is a directory, in the part

for line in open(file).readlines()

The os.path.isfile(file) test in the if clause only runs after open(file) has already been called.

Let me show you some Python magic. This gets rid of those dups (a cleaner variation of my earlier code). We do not check beforehand; we simply skip over directories:

import os
for fn in ('sources.lst','/'):
    try:
        nodups=set(line for line in open(fn) if line.strip())
    except IOError:
        pass
    else:
        print ''.join(nodups)
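
The same skip-on-error idea can be wrapped in a function so it is easy to try out. This is only a sketch: the function name unique_lines and the temporary sample file are made up here, and print() is called with a single argument so the code runs on Python 2 and 3 alike. Opening a directory raises IOError (an alias of OSError in Python 3), which is caught and ignored:

```python
import os
import tempfile

def unique_lines(paths):
    """Return the set of non-blank lines found in the readable files."""
    found = set()
    for path in paths:
        try:
            with open(path) as handle:
                found.update(line for line in handle if line.strip())
        except IOError:
            # Directories and unreadable files end up here and are skipped.
            continue
    return found

# Demo with a temporary directory and one file inside it.
tmpdir = tempfile.mkdtemp()
sample = os.path.join(tmpdir, 'sources.lst')
with open(sample, 'w') as handle:
    handle.write('deb a b c\n\ndeb a b c\ndeb a b d\n')

result = unique_lines([sample, tmpdir])
print(''.join(sorted(result)))
```

Passing the directory itself in the list shows that it is passed over without a separate os.path.isfile check.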

Ok, it works now!

[timo@localhost Python]$ python Dup.py
Duplicate: deb http://archive.canonical.com/ lucid partner in sources.list
Duplicate: tere tere tere tere in a.list

First file:

tere tere tere tere
tere tere tere tere

Second file:

deb http://archive.canonical.com/ lucid partner
deb http://archive.canonical.com/ lucid partner

deb http://archive.canonical.com/ lucid partnerO
# -*- coding: utf-8 -*-
import os

path = "/home/timo/Python/Proov/"  # insert the path to the directory of interest
dirList = os.listdir(path)
sources = []

dup = False
for filename in dirList:
    fullname = os.path.join(path, filename)
    if os.path.isfile(fullname):
        for line in open(fullname):
            split = line.split()
            # skip the line if it is empty or a comment
            if split and not split[0].startswith('#'):
                for i in split[3:]:
                    src = split[0] + ' ' + split[1] + ' ' + split[2] + ' ' + i
                    if src in sources:
                        print 'Duplicate: ' + src + ' in ' + filename
                        dup = True
                    else:
                        sources.append(src)
if not dup:
    print 'No duplicates found'

OK, but it is also not a great idea to use split as a variable name. It is not as bad as list or file, since split is only a string method rather than a built-in, but it can still make the code less readable and more confusing. Rename it to, say, splitedline?

My problem is that I'd like to use a list comprehension :P

Is it wise to use it here?

My post already does the removal, but reporting the duplicates takes a little more thought, since the set only removes them silently. It also does not keep the original order of the lines. My first solution for dups was forgiving about whitespace because it normalized each line with ' '.join(line.split()). You could use the remove_adjacent function I posted to StackOverflow:

def remove_adjacent(nums):
     return [a for a,b in zip(nums, nums[1:]+[not nums[-1]]) if a != b]

To use this, you would first need to normalize the lines and then sort them.
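
As an alternative to sorting and remove_adjacent, here is a sketch that reports duplicates instead of dropping them silently and keeps the first occurrence of each line in its original order. The function name find_duplicates and the sample lines are assumptions for illustration; it relies on a set for the membership test:

```python
def find_duplicates(lines):
    """Split lines into (unique, duplicates), ignoring whitespace differences."""
    seen = set()
    unique = []
    duplicates = []
    for line in lines:
        normalized = ' '.join(line.split())  # collapse runs of whitespace
        if not normalized:
            continue                          # skip blank lines
        if normalized in seen:
            duplicates.append(normalized)
        else:
            seen.add(normalized)
            unique.append(normalized)
    return unique, duplicates

sample = [
    'deb http://archive.canonical.com/ lucid partner\n',
    'deb  http://archive.canonical.com/   lucid partner\n',
    '\n',
    'deb http://archive.canonical.com/ lucid partnerO\n',
]
unique, duplicates = find_duplicates(sample)
for line in duplicates:
    print('Duplicate: ' + line)
```

A plain loop is used rather than a list comprehension because each step has to update the seen set as it goes.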

The truth is that neither C++ nor Python lets you get away with gibberish code.

Are you being disparaging?

OK, in C++ there is one simple option:

for (int i = 0; i < 100; i++) {}

And Python:

for i, a in enumerate(['a', 'b', 'c']):
for a in ['a', 'b', 'c']:

knights = {'gallahad': 'the pure', 'robin': 'the brave'}
for k, v in knights.iteritems():

questions = ['name', 'quest', 'favorite color']
answers = ['lancelot', 'the holy grail', 'blue']
for q, a in zip(questions, answers):
    print 'What is your {0}?  It is {1}.'.format(q, a)

xrange and range

There are so many, and I do not know when each one is the right choice (dict, list, set, etc.).


Sorry :(
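
As a rough rule of thumb (a sketch, not an exhaustive guide): a list keeps items in order and allows repeats, a set keeps only unique items and is fast for membership tests, and a dict maps keys to values. The sample data below is made up, loosely modelled on the files earlier in the thread:

```python
lines = ['deb a', 'deb b', 'deb a']   # hypothetical sample lines

# list: ordered, duplicates allowed, index by position
first = lines[0]

# set: unique items only, fast `in` tests, no guaranteed order
unique = set(lines)

# dict: look up a value by key, here a count per line
counts = {}
for line in lines:
    counts[line] = counts.get(line, 0) + 1

print(first)
print(sorted(unique))
print(counts['deb a'])
```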
