I am trying to search a directory for sub-folders whose names contain numbers. The names are dates in one of these formats: 1-1, 01-1, 1-01, 01-01. The first number only goes up to 12 and the second goes as high as 31. I'm trying to figure out how to read the date of the files that are in there; once it finds the correctly formatted file name, it kicks off the code to go into that file and do whatever else my code is set to do. If there is a simple way to do this, please let me know, because this has me running in circles. If my code is needed, I will post it.
abaddon2031 0 Junior Poster in Training
vegaseat 1,735 DaniWeb's Hypocrite Team Colleague
Maybe this will help ...
s = "test01-15.dat"
q = s.split('.')
print(q)
print(q[0])
numeric = "".join(n for n in s.split('.')[0] if n in '0123456789-')
print(numeric)
print(numeric.split('-'))
''' result ...
['test01-15', 'dat']
test01-15
01-15
['01', '15']
'''
Gribouillis 1,391 Programming Explorer Team Colleague
Another way, returning a list of integers
>>> import re
>>> def getints(string):
...     return [int(x) for x in re.findall(r'\d+', string)]
...
>>> getints("foo23bar01-14qux2")
[23, 1, 14, 2]
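A minimal sketch of how getints() could be used to check that a name really holds a month and a day. The helper name is_month_day is hypothetical (it does not appear in the thread); the 1-12 and 1-31 limits come from the original question.

```python
import re

def getints(string):
    """Return every run of digits in the string as an integer."""
    return [int(x) for x in re.findall(r'\d+', string)]

def is_month_day(name):
    """Hypothetical helper: True if name holds exactly a month (1-12) and a day (1-31)."""
    numbers = getints(name)
    if len(numbers) != 2:
        return False
    month, day = numbers
    return 1 <= month <= 12 and 1 <= day <= 31

# The first four names are the formats from the question; the last two should be rejected.
for name in ["1-1", "01-1", "1-01", "01-01", "13-40", "foo23bar01-14qux2"]:
    print(name, is_month_day(name))
```

This only checks the numeric ranges; it does not know that, say, February has no 31st.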
abaddon2031 0 Junior Poster in Training
It does for the files, but I just got told they aren't files but subfolders that have the numbers in their names, which makes this so much more confusing for me.
snippsat 661 Master Poster
Use os.walk().
It recursively scans all folders and subfolders.
Example:
import os
import re
search_pattern = r'\d'
target_path = os.path.abspath(".")  # current folder
for path, dirs, files in os.walk(target_path):
    for folder_name in dirs:
        if re.findall(search_pattern, folder_name):
            print(folder_name)  # Folder with numbers
            print(os.path.join(path, folder_name))  # Full path
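The pattern r'\d' matches any folder name with a digit anywhere in it. A stricter variant, sketched here under the assumption that the date folders are named exactly "number-number", could anchor the pattern. The function name find_date_folders and the demo folder names are made up for illustration.

```python
import os
import re
import tempfile

# One or two digits, a dash, one or two digits, and nothing else.
DATE_FOLDER = re.compile(r'^\d{1,2}-\d{1,2}$')

def find_date_folders(base):
    """Return the sorted names of all sub-folders under base that look like M-D."""
    matches = []
    for path, dirs, files in os.walk(base):
        for folder_name in dirs:
            if DATE_FOLDER.match(folder_name):
                matches.append(folder_name)
    return sorted(matches)

# Demo on a throwaway directory tree (the folder names are made up).
base = tempfile.mkdtemp()
for name in ["08-1", "8-02", "beds2014", "notes"]:
    os.makedirs(os.path.join(base, name))
found = find_date_folders(base)
print(found)
```

Here "beds2014" is skipped even though it contains digits, which the looser r'\d' pattern would not do.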
abaddon2031 0 Junior Poster in Training
snippsat, that works great. One last question: how do I get it to return the largest of the numbers? I ran it on the files I have, which are 08-1, 8-02, 8-3, and 08-04, and it returns them all. I just need it to return the largest of the number sets, because at times the subfolders won't get deleted until the end of the month, so there can be as many as 31 subfolders there and we just need the one for a specified date.
Gribouillis 1,391 Programming Explorer Team Colleague
In Python, the largest item of a sequence can be obtained with the max()
function, using a key argument to compute the score of each item.
>>> import re
>>> def getints(string):
...     return tuple(int(x) for x in re.findall(r'\d+', string))
...
>>> L = ['test02-05','test01-15','test03-2','test02-17',]
>>> max(L, key = getints)
'test03-2'
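Applied to the four folder names mentioned earlier in the thread, the same idea could look like the sketch below. It also shows why the key argument matters: without it, max() compares the names as plain strings.

```python
import re

def getints(string):
    """Return the numbers in the string as a tuple of integers."""
    return tuple(int(x) for x in re.findall(r'\d+', string))

folders = ['08-1', '8-02', '8-3', '08-04']

latest = max(folders, key=getints)  # compares (8, 1), (8, 2), (8, 3), (8, 4)
plain = max(folders)                # compares the raw strings character by character

print(latest)  # 08-04
print(plain)   # 8-3
```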
abaddon2031 0 Junior Poster in Training
I just tried that and printed the result, and it doesn't match the way I want. For 3 of the subfolders it just returns the first number, which is the month, when I want it to return the day. I also only need it to return the largest date, whereas what it printed was the largest date out of all the folders that are there.
Gribouillis 1,391 Programming Explorer Team Colleague
Instead of getints(), define your own score function:
def score(dirname):
    """return a value extracted from the dirname"""
    value = ??? # your code here
    return value
thedir = max(L, key=score)
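One possible way to fill in the template, as a sketch only: score each folder by its day number. This assumes all the date folders belong to the same month, which the thread does not confirm; names that do not look like "number-number" get a score of -1 so they never win.

```python
import re

def score(dirname):
    """Return the day part of an M-D folder name, or -1 if the name does not fit."""
    match = re.match(r'^(\d{1,2})-(\d{1,2})$', dirname)
    if match is None:
        return -1
    return int(match.group(2))

# Folder names from the thread, plus one made-up name that is not a date.
L = ['08-1', '8-02', '8-3', '08-04', 'notes']
thedir = max(L, key=score)
print(thedir)  # 08-04
```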
abaddon2031 0 Junior Poster in Training
That did the same thing: on several of them it returned the month and not the day. I need it to check the day to see whether it is formatted like the date provided by an argument, or whether it matches that date in any way. I will post my code to see if that helps, because this has got me really confused right now.
import datetime,glob,os,csv,fnmatch,StringIO,smtplib,argparse,math,re, sys
parser = argparse.ArgumentParser(description='Search art folders.')
parser.add_argument('-b', help='The base path', required=False, dest='basePath', metavar='Base directory path',default='/home/hatterx/Desktop/beds/')
parser.add_argument('-o', help='File Output Location', required=False, dest ='fileOutput', metavar='File Output', default='/home/hatterx/Desktop/bedsused')
parser.add_argument('-d', help='Subfolder Date', required=False, dest ='fileDate', metavar='Subfolder Date', default=datetime.datetime.now().strftime("%m-%d"))
parser.add_argument('--AQT', help='AQT SQFT Factor', required=False, dest ='AQTFactor', metavar='AQT Factor', default=64)
parser.add_argument('--INI', help='INI SQFT Factor', required=False, dest ='INIFactor', metavar='INI Factor', default=50)
parser.add_argument('--N/A', help='Not Available SQFT Factor', required=False, dest ='naFactor', metavar='N/A Factor', default=50)
args = parser.parse_args()
filestart=args.basePath
outputCount= args.fileOutput
DT = datetime.datetime.now().strftime("%Y_%m_%d")
dt = datetime.datetime.now().strftime("%Y/%m/%d %I:%M:%S%p")
fileDate = datetime.datetime.now().strftime("%m-%d")
def fileBreak(pathname):
    filecount = {}
    bedcount = {}
    halfbedcount = {}
    sqftFactor = {"AQT":args.AQTFactor, "INI":args.INIFactor, "n/a":args.naFactor}
    total = {"files":0, "beds":0, "half beds":0, "full bed sqft":0, "half bed sqft":0}
    for filename in os.listdir(pathname):
        fileNameWithNoExtension = re.split('\.', filename)[0]
        printerTypesearch = re.search('[-_]p', fileNameWithNoExtension, flags=re.I)
        if printerTypesearch == None:
            print filename + ' is not formated with _P correctly.'
            continue
        printerRemoval = re.split('[-_]p', fileNameWithNoExtension, flags=re.I)[1]
        bedInfosearch = re.search('[-_]b', printerRemoval, flags=re.I)
        if bedInfosearch == None:
            print filename + ' is not formated with _B correctly.'
            continue
        printerType = re.split('[-_]b', printerRemoval, flags=re.I)[0]
        if printerType not in sqftFactor:
            sqftFactor[printerType]=sqftFactor["n/a"]
        bedInfo = re.split('[-_]b', printerRemoval, flags=re.I)[1]
        halfBedsearch = re.search('h', bedInfo, flags=re.I)
        if halfBedsearch:
            bedNumber = re.split('h', bedInfo, flags=re.I)[0]
        else:
            bedNumber = bedInfo
        if bedNumber == '':
            bedNumber = '1'
        if printerType not in filecount:
            filecount[printerType] = 0
        if printerType not in bedcount:
            bedcount[printerType] = 0
        if printerType not in halfbedcount:
            halfbedcount[printerType] = 0
        filecount[printerType] = filecount[printerType]+1
        total['files'] = total['files'] + 1
        if halfBedsearch:
            halfbedcount[printerType] = halfbedcount[printerType] + int(bedNumber)
            total['half beds'] = total['half beds'] + int(bedNumber)
            total['half bed sqft'] = total['half bed sqft'] + int(bedNumber)* sqftFactor[printerType]*.5
        else:
            bedcount[printerType] = bedcount[printerType] + int(bedNumber)
            total['beds'] = total['beds'] + int(bedNumber)
            total['full bed sqft'] = total['full bed sqft'] + int(bedNumber)* sqftFactor[printerType]
    with open(args.fileOutput+'/Filecount.csv','wb') as f:
        data=['Printer Type', 'File Count']
        writer = csv.writer(f)
        writer.writerow(data)
        for type in filecount:
            data = [type,str(filecount[type])]
            writer = csv.writer(f)
            writer.writerow(data)
    with open(args.fileOutput+'/Bedcount.csv','wb') as f:
        data=['Printer Type','Total Beds','Half Beds','Full Beds']
        writer = csv.writer(f)
        writer.writerow(data)
        for type in filecount:
            data =[type,str(bedcount[type]+halfbedcount[type]*0.5),str(halfbedcount[type]),str(bedcount[type])]
            writer = csv.writer(f)
            writer.writerow(data)
    with open(args.fileOutput+'/SQFTcount.csv','wb') as f:
        data=['Printer Type','Total SQFT','Half Bed SQFT','Full Bed SQFT']
        writer = csv.writer(f)
        writer.writerow(data)
        for type in filecount:
            data =[type,str(sqftFactor[type] * bedcount[type]+(sqftFactor[type]*halfbedcount[type]*.5)),str(sqftFactor[type]*halfbedcount[type]*.5),str(sqftFactor[type] * bedcount[type])]
            writer = csv.writer(f)
            writer.writerow(data)
    with open(args.fileOutput+'/FullInfo.csv','wb') as f:
        # the rows below have five values, so the header needs a 'File Count' column
        data=['Date','Printer Type','File Count','Total Beds','Total SQFT']
        writer = csv.writer(f)
        writer.writerow(data)
        for type in filecount:
            data = [dt,type,str(filecount[type]),str(bedcount[type] + halfbedcount[type]*0.5),str(sqftFactor[type] * bedcount[type]+(sqftFactor[type]*halfbedcount[type]*.5))]
            writer = csv.writer(f)
            writer.writerow(data)
    with open(args.fileOutput+'/TotalInfo.csv','wb') as f:
        data=['Total File Count','Total Beds','Total Full Beds','Total Half Beds','Total SQFT', 'Total Full Bed SQFT', 'Total Half Bed SQFT']
        writer = csv.writer(f)
        writer.writerow(data)
        writer = csv.writer(f)
        writer.writerow([total['files'], total['beds']+total['half beds']*.5,total['beds'], total['half beds'],total['half bed sqft']+total['full bed sqft'],total['full bed sqft'],total['half bed sqft']])
print args.fileDate
search_pattern = r'\d'
target_path = os.path.abspath(filestart)
for path, dirs, files in os.walk(target_path):
    for folder_name in dirs:
        if folder_name == args.fileDate:
            fileBreak(filestart+args.fileDate)
            print args.fileDate + ' was the correct format'
            sys.exit()
        else:
            if re.findall(search_pattern, folder_name):
                fileBreak(filestart+folder_name)
                print folder_name + ' was the found format'
            else:
                print 'Proper Folder Format Not Found'
Gribouillis 1,391 Programming Explorer Team Colleague
"How do I get it to return the largest of the numbers? I ran it on the files I have, which are 08-1, 8-02, 8-3, and 08-04, and it returns them all. I just need it to return the largest of the number sets."
I don't see anywhere in your code where you are looking for the largest of the number sets, whatever that means. There is no call to max(). You must describe the issue more precisely.
abaddon2031 0 Junior Poster in Training
I tried the max() approach and it didn't work, so I reverted to my way of searching, which returns all the files in the subfolder. What I want to do is look for the date provided by args.fileDate and, if that's not there, compare what it finds against that date to see if there is a close match.
snippsat 661 Master Poster
That 100-line-long fileBreak function is really not good at all.
You should split it up; functions should be around 10-15 lines.
Do not try to do too much in a single function.
Do a couple of tasks and have a clear return value.
This makes code much easier to read and test.
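As a sketch of what one such small function might look like, the file-name parsing inside fileBreak could be pulled out on its own. The name parse_print_name and the sample file names are hypothetical; the steps follow the posted code (strip the extension, split on _P or -P, then on _B or -B, then look for an H marking a half bed).

```python
import re

def parse_print_name(filename):
    """Return (printer_type, bed_count, is_half_bed), or None if the name does not fit."""
    stem = filename.split('.')[0]
    printer_split = re.split(r'[-_]p', stem, flags=re.I)
    if len(printer_split) < 2:
        return None  # no _P / -P marker
    bed_split = re.split(r'[-_]b', printer_split[1], flags=re.I)
    if len(bed_split) < 2:
        return None  # no _B / -B marker
    printer_type, bed_info = bed_split[0], bed_split[1]
    is_half_bed = 'h' in bed_info.lower()
    digits = re.split(r'h', bed_info, flags=re.I)[0]
    bed_count = int(digits) if digits else 1  # an empty bed number counts as 1
    return printer_type, bed_count, is_half_bed

# Made-up file names, only to show the three possible outcomes.
print(parse_print_name("job1_PAQT_B2H.pdf"))
print(parse_print_name("job2-pINI-b3.tif"))
print(parse_print_name("readme.txt"))
```

With the parsing isolated like this, the counting and the CSV writing can each go into their own function and be tested separately.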
abaddon2031 0 Junior Poster in Training
OK, I will keep that in mind. I just really could use some help with the subfolder thing, because that's my last hurdle; then this code will be finished and ready to be deployed.
Gribouillis 1,391 Programming Explorer Team Colleague
Perhaps you could describe a small hierarchy of folders, then tell us which one is the good one and why, assuming that you are searching a single folder in the hierarchy.
abaddon2031 0 Junior Poster in Training
Right now it goes: base directory, then the folder containing the beds, then subfolders for the days of the month that contain the print information. What I want to do is search the subfolders for the current date, no matter how it's formatted, or for a date that has been input through the argument, so that it finds the correct subfolder and can then do its magic of breaking the print file names up and writing out the information.
Gribouillis 1,391 Programming Explorer Team Colleague
This does not tell us which one is the correct subfolder and why. We don't have the names of the folders.
abaddon2031 0 Junior Poster in Training
The subfolders could be named in any of these four formats: 08-01, 8-01, 08-1, 8-1. It could be any of those, just for different days of the month. So the correct one would be either the one that is 08-01 or whichever one best matches the current day. So, for example, today's folder is formatted as 8-6, whereas args.fileDate says today's date is 08-06.
Gribouillis 1,391 Programming Explorer Team Colleague
Here is a program that creates an example hierarchy of folders and searches for the correct folder. The idea is to extract a pair of numbers from the names and compare this pair to a target pair.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import datetime
import os
import os.path
import re

def random_folder_names(cnt):
    """this function was used to create random folder names"""
    import random
    def year():
        day = datetime.timedelta(days = 1)
        start = datetime.date(2012, 1, 1)
        for i in range(366):
            yield start
            start += day
    year = list(year())
    def names():
        for day in year:
            m, d = day.month, day.day
            for fmt in ("{}-{}","{:0>2s}-{}","{}-{:0>2s}","{:0>2s}-{:0>2s}"):
                yield fmt.format(str(m), str(d))
    names = sorted(set(names()))
    return random.sample(names, cnt)

def create_example_hierarchy():
    """Create a random hierarchy of directories for testing purposes"""
    import shutil
    base = 'example_base'
    beds = 'beds'
    other = 'other'
    folder_names = [
        '6-5', '6-14', '5-03', '7-8', '5-21',
        '09-02', '03-27', '08-14', '06-30', '4-20',
        '06-13', '07-30', '11-07', '12-01', '10-29',
        '10-03', '12-5', '3-04', '7-26', '10-14',
        '01-14', '3-28', '5-09', '10-21', '6-18'
    ]
    try:
        shutil.rmtree(base)
    except OSError:
        pass
    os.makedirs(os.path.join(base, beds))
    os.makedirs(os.path.join(base, other))
    for name in folder_names:
        os.mkdir(os.path.join(base, beds, name))

def dir_sequence():
    """returns the sequence of subdirs"""
    return next(os.walk('example_base/beds'))[1]

def extract_md(dirname):
    """extracts month and day as a tuple of 2 integers"""
    t = tuple(int(x) for x in re.findall(r'\d+', dirname))
    assert len(t) == 2
    return t

if __name__ == '__main__':
    create_example_hierarchy()
    print "directories:", dir_sequence()
    file_date = '3-4'
    pair = extract_md(file_date)
    correct_dir = [d for d in dir_sequence() if extract_md(d) == pair][0]
    print 'correct_dir:', correct_dir
The output is
directories: ['10-29', '5-03', '12-5', '7-8', '6-14', '5-09', '3-28', '10-03', '06-30', '5-21', '10-14', '09-02', '12-01', '7-26', '07-30', '11-07', '3-04', '10-21', '06-13', '01-14', '4-20', '6-18', '6-5', '03-27', '08-14']
correct_dir: 3-04
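Note that indexing the list with [0] raises an IndexError when no folder matches. A minimal variant that returns None instead is sketched below; find_dir is a hypothetical name, and this extract_md returns None rather than asserting when a name is not a pair of numbers. The list of names is a made-up subset of the output above plus one non-date name.

```python
import re

def extract_md(dirname):
    """Return (month, day) as integers, or None if the name is not two numbers."""
    numbers = re.findall(r'\d+', dirname)
    if len(numbers) != 2:
        return None
    return int(numbers[0]), int(numbers[1])

def find_dir(names, file_date):
    """Return the first name with the same month and day as file_date, else None."""
    target = extract_md(file_date)
    if target is None:
        return None  # the requested date itself is malformed
    return next((d for d in names if extract_md(d) == target), None)

names = ['10-29', '5-03', '3-28', '3-04', '08-14', 'other']
print(find_dir(names, '03-4'))  # 3-04
print(find_dir(names, '2-30'))  # None
```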
abaddon2031 0 Junior Poster in Training
Thank you for all the help. I actually figured out something simpler.
import datetime
import os
# parser is the argparse.ArgumentParser from the script posted earlier
parser.add_argument('-d', help='Subfolder Time', required=False, dest ='fileTime', metavar='Subfolder Time', default=datetime.datetime.now().strftime("%Y-%m-%d"))
args = parser.parse_args()
fileDate = datetime.datetime.strptime(args.fileTime, "%Y-%m-%d")
day = int(fileDate.strftime('%d'))
month = int(fileDate.strftime('%m'))
# the folders are named month-day, so the month goes first; try all four spellings
for dirn in ['{:02d}-{:02d}'.format(month,day), '{:02d}-{:d}'.format(month,day), '{:d}-{:02d}'.format(month,day), '{:d}-{:d}'.format(month,day)]:
    print dirn + ' exists: ' + str(os.path.exists(dirn))
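The four spellings can also be generated rather than typed out by hand, which avoids listing one of them twice by accident. This is only a sketch: candidate_names is a hypothetical helper, the dates are arbitrary examples, and it assumes the month-day order described earlier in the thread.

```python
import datetime
import itertools

def candidate_names(date):
    """Return the distinct M-D spellings of a date, padded and unpadded."""
    months = ['{:02d}'.format(date.month), '{:d}'.format(date.month)]
    days = ['{:02d}'.format(date.day), '{:d}'.format(date.day)]
    names = []
    for m, d in itertools.product(months, days):
        name = '{}-{}'.format(m, d)
        if name not in names:  # two-digit months and days give repeated spellings
            names.append(name)
    return names

print(candidate_names(datetime.date(2014, 8, 6)))    # ['08-06', '08-6', '8-06', '8-6']
print(candidate_names(datetime.date(2014, 12, 15)))  # ['12-15']
```

Each returned name could then be joined onto the base path and checked with os.path.exists(), as in the loop above.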
Gribouillis 1,391 Programming Explorer Team Colleague
Simplicity is in the eye of the beholder ;)