Hi all. My new job involves writing scripts for people in other departments. I'm pretty much on my own with this and I'm still a beginner with Python (I think my brain is still in PHP mode and I'm still struggling with the object-oriented approach).
Here is what I have:
Input file, call it bob.txt for argument's sake:
<tag>
....stuff1
....more stuff1
</tag>
<tag>
....stuff2
....more stuff2
</tag>
<tag>
....stuff3
....more stuff3
</tag>
My code:
#!/usr/local/bin/python2.6
import sys

# Four arguments are required, so argv needs at least 5 entries (script name included).
if len(sys.argv) < 5:
    print "Usage: splitfilebytag option1 option2 option3 option4 [option5]"
    print "Run this application from the input file directory"
    print "Option 1: input filename"
    print "Option 2: output filename"
    print "Option 3: output file extension"
    print "Option 4: tag that indicates split. eg: \"</tag>\". Use inverted commas"
    print "Option 5(optional): Start file number. eg: 172"
    print "Example usage: splitfilebytag test.xml out txt \"</tag>\" 12"
    sys.exit(1)

readfile_ = sys.argv[1]
outputfilename_ = sys.argv[2]
extension_ = sys.argv[3]
tag_ = sys.argv[4]
# Optional start number; fall back to 0 if it is missing or not an integer.
try:
    num_ = int(sys.argv[5])
except (IndexError, ValueError):
    num_ = 0

def split_(readfile_, outputfilename_, extension_, tag_, num_):
    thelist_ = []
    with open(readfile_, 'r') as thefile_:
        for line_ in thefile_:
            thelist_.append(line_)
            if tag_ in line_:
                # Closing tag found: write the collected lines to the next numbered file.
                outfilename_ = '%s%03d.%s' % (outputfilename_, num_, extension_)
                num_ += 1
                with open(outfilename_, 'w') as outfile_:
                    outfile_.writelines(thelist_)
                thelist_ = []
    # Note: any lines after the last tag are not written to a file.

if __name__ == "__main__":
    split_(readfile_, outputfilename_, extension_, tag_, num_)
So in this case, if my "users" run ./splitfilebytag.py bob.txt out txt "</tag>" 12
they will end up with 3 separate files that look like this:
out012.txt
<tag>
....stuff1
....more stuff1
</tag>
out013.txt
<tag>
....stuff2
....more stuff2
</tag>
out014.txt
<tag>
....stuff3
....more stuff3
</tag>
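To make the splitting behaviour easier to check without touching the disk, here is a minimal sketch of the same logic working on a list of lines in memory. The function name split_lines and the sample data are only for illustration and are not part of the script above.

```python
def split_lines(lines, tag):
    """Yield lists of lines, each ending with the line that contains tag."""
    chunk = []
    for line in lines:
        chunk.append(line)
        if tag in line:
            yield chunk
            chunk = []
    # Lines after the last tag are dropped, as in the script above.


# Hypothetical sample input: two <tag> blocks.
sample = [
    "<tag>\n", "....stuff1\n", "....more stuff1\n", "</tag>\n",
    "<tag>\n", "....stuff2\n", "....more stuff2\n", "</tag>\n",
]
chunks = list(split_lines(sample, "</tag>"))

# File names built the same way as in the script, starting at 12.
names = ['%s%03d.%s' % ("out", 12 + i, "txt") for i in range(len(chunks))]
```

Keeping the splitting separate from the file writing would let the same function be reused or tested on its own, but that is a design preference rather than a requirement.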
If anyone could suggest a better approach or improvements to this, I would appreciate it. I think this looks OK (to me at least), and the only thing I might change is to make the number of leading zeros another option.
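A minimal sketch of the leading-zero idea, assuming a hypothetical helper make_name with a width parameter (neither exists in the script above). It relies on the `*` width specifier of %-formatting, which takes the field width from the argument list:

```python
def make_name(base, num, ext, width=3):
    """Build an output file name with num zero-padded to width digits."""
    # '%0*d' reads the padding width from the arguments instead of hard-coding 03.
    return '%s%0*d.%s' % (base, width, num, ext)


print(make_name("out", 12, "txt"))     # out012.txt
print(make_name("out", 12, "txt", 5))  # out00012.txt
```

The width would then come from one more optional command-line argument, defaulting to 3 so existing usage keeps working.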
Anybody?