Hello,

The situation is this: I have a file called target.lst, and each line has an 8-digit number in it (an aorkey). Another file (file.lst) holds tons of paths, and each path contains an aorkey (it's not always in the same place, and sometimes it appears more than once). I am trying to take the paths from file.lst that contain an aorkey found in target.lst and write them to a new file (called test.lst). The files look like:

target.lst

name 1 some folder 12345678
name 2 blah blah 87654321

and file.lst:
/home/machine/blah/00000001/blah_00000001/blah1.fits
/home/machine/blah/00000001/blah_00000001/blah2.fits
/home/machine/blah/00000002/blah_00000002/blah1.fits
/home/machine/blah/00000002/blah_00000002/blah2.fits
/home/machine/blah/12345678/blah_12345678/blah1.fits
/home/machine/blah/12345678/blah_12345678/blah2.fits
/home/machine/blah/00000003/blah_00000003/blah1.fits
/home/machine/blah/00000003/blah_00000003/blah2.fits
/home/machine/blah/87654321/blah_87654321/blah1.fits
/home/machine/blah/87654321/blah_87654321/blah2.fits
/home/machine/blah/00000004/blah_00000003/blah1.fits
/home/machine/blah/00000004/blah_00000003/blah2.fits

Here is my code:

#!/usr/bin/python
#Filename: targetlistredux.py


#Copies entries for objects found in target.lst from file.lst
#to a new .lst file


import pylab   #import necessary packages
import matplotlib
import os
import sys

#define parameters (temporary)
targetname = 'target.lst'
orgFile = 'file.lst'
redFile = 'test.lst'


#initialize the list of desired objects
#identified via aorkeys
aorkeys = []
#extract desired aorkeys from target.lst
f = open(targetname,'r')
for line in f.readlines():
   for i in line.split():	
      if i.isdigit() & (len(i) == 8):
         aorkeys.append(i)
f.close()
#See if the output file already exists
if os.path.isfile(redFile):
   flag = True
   print
   print redFile + " already exists and will be overwritten"
   while flag:
      ans = raw_input("Do you wish to continue? (y/n):")
      if ans == 'y':
         flag = False
      elif ans == 'n':
         sys.exit()
      else:
         print
         print "invalid response..."

#ready the output and data source files
h = open(redFile,'w')
g = open(orgFile,'r')

#Search each entry in file.lst, if an entry is for a desired object
#add it to the output file

for j in range(0,len(aorkeys)): 
   temp = aorkeys[j]
   print 
   for entry in g.readlines():
      mark = True
      print j
      for k in range(0,(len(entry)-7)):
         if (temp[0] == entry[k])&(mark):
            valid = True
            l = 1
            while valid & (l<8):
               q = k + l
               if temp[l] == entry[q]:
                  pass
               else:
                  valid = False
               l = l + 1
            #Write the desired entries into redFile
            if valid & mark:
               post = entry + '\n'
               h.write(post)
               mark = False
               #print post
#close file handles
h.close()
g.close()

The aorkeys extract correctly, but only the first aorkey's paths get copied to the new file. I'm so lost, please help! We are running Python 2.5 on Debian Linux.

I have some random printouts I forgot to delete while I was debugging (well, trying to), please disregard those!

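You must close and reopen the file (or rewind it with g.seek(0)) before rereading it: g.readlines() consumes the whole file on the first pass, so for every later aorkey it returns an empty list. That said, your program is not using the language idiomatically.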
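
To see why only the first aorkey produces output, here is a small sketch of what happens when readlines() is called more than once on the same open file. The temporary file and its two paths are made up for illustration and stand in for file.lst; the code avoids print and with on purpose so that it should run on old and new Python versions alike.

```python
import os
import tempfile

# Hypothetical stand-in for file.lst, written to a temporary file.
fd, path = tempfile.mkstemp()
tmp = os.fdopen(fd, 'w')
tmp.write("/home/machine/blah/12345678/blah1.fits\n"
          "/home/machine/blah/87654321/blah1.fits\n")
tmp.close()

g = open(path, 'r')
first_pass = g.readlines()    # reads every line, leaves the position at end of file
second_pass = g.readlines()   # nothing left to read, so this is an empty list
g.seek(0)                     # rewind to the start of the file
third_pass = g.readlines()    # all lines are available again
g.close()
os.remove(path)
```

In the original loop, the second and later aorkeys hit the second_pass situation, so the inner loop body never runs for them.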

Use the re module to find the aorkeys with a regular expression like this: r'.*(\d{8}).*'

Read each line of the other file once and extract its aorkey with another re.
Then test:
if aorkey in aorkeys:
....
Better still, use a set instead of a list for aorkeys.

This way you scan the file only once.
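
A minimal sketch of that single-pass idea, under a few assumptions: select_paths and AORKEY_RE are names made up here, and instead of the r'.*(\d{8}).*' pattern suggested above it uses re.findall with a stricter pattern (exactly eight digits, not part of a longer number), because a path can contain an aorkey more than once. The sample lines are taken from the question.

```python
import re

# Exactly eight digits, not embedded in a longer run of digits.
AORKEY_RE = re.compile(r'(?<!\d)\d{8}(?!\d)')

def select_paths(target_lines, path_lines):
    """Return the paths that contain an aorkey listed in target_lines."""
    wanted = set()
    for line in target_lines:
        wanted.update(AORKEY_RE.findall(line))
    selected = []
    for path in path_lines:              # single pass over the paths
        for key in AORKEY_RE.findall(path):
            if key in wanted:            # set lookup instead of a list scan
                selected.append(path)
                break                    # keep each path only once
    return selected

targets = ["name 1 some folder 12345678\n", "name 2 blah blah 87654321\n"]
paths = [
    "/home/machine/blah/00000001/blah_00000001/blah1.fits\n",
    "/home/machine/blah/12345678/blah_12345678/blah1.fits\n",
    "/home/machine/blah/87654321/blah_87654321/blah2.fits\n",
]
result = select_paths(targets, paths)
```

With real files you could pass the open file objects for target.lst and file.lst directly, since iterating over a file yields its lines, and then write the result with h.writelines(result).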

The aorkeys extract correctly,

Assuming that this means that the list aorkeys contains all of the numbers that you want to find, and that it is not a huge list, it is easiest to look for each key in the list.

h = open(redFile,'w')
g = open(orgFile,'r')

#Search each entry in file.lst, if an entry is for a desired object
#add it to the output file

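for rec in g:
   for key in aorkeys:
      if key in rec:
         #rec still ends with its newline, so write the path as-is
         h.write(rec)
         #stop at the first matching key so a path is written only once
         break
g.close()
h.close()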

This is simpler, but potentially more time-consuming than finding the number on each individual line and looking it up.

that did it, thank you!!!
