Reverend Jim (DaniWeb moderator)

As I continue my conversion from vbScript to Python I am finding the gotchas. For example...

A lot of my utility scripts take a file name or a file pattern as a parameter. My script, bitrate.vbs, for example allows me to invoke it as

bitrate file
bitrate pattern

Technically file and pattern are identical since specifying a file is really just specifying a pattern that matches up to one file. To get file listings in vbScript I would just spawn a DOS dir command and grab the output from stdout. Python, however, provides what I thought was a great module named glob. So a lot of my code looked like

for file in glob.glob(pattern):

I had rewritten five scripts before I got bitten. Let's take a folder that has videos named like

Piano Recital [1994].mp4
Awards Day [1994].mp4
Betula Lake [1983].mp4

glob.glob("*.mp4") gives the expected result

['Awards Day [1994].mp4', 'Betula Lake [1983].mp4', 'Piano Recital [1994].mp4']

so if I were to run bitrate *.mp4 I would process the expected set of files. But if I want the bitrate of only one file and type bitrate "Awards Day [1994].mp4" I get nothing because the result of glob.glob("Awards Day [1994].mp4") is

[]
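A quick experiment shows what the brackets are doing. This is only a sketch: it uses the standard fnmatch module, which glob relies on for its matching, and the file name "Awards Day 1.mp4" is made up for the test.

```python
import fnmatch

pattern = "Awards Day [1994].mp4"

# The bracketed part matches exactly ONE character out of 1, 9 and 4 ...
matches_single_digit = fnmatch.fnmatchcase("Awards Day 1.mp4", pattern)

# ... so the pattern does not match the file name it was copied from.
matches_itself = fnmatch.fnmatchcase("Awards Day [1994].mp4", pattern)

print(matches_single_digit)  # True
print(matches_itself)        # False
```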

And there is a (semi) valid reason for this. glob allows you to use character classes (borrowed from regular expressions) in file patterns, and [ and ] are used to delimit character classes. The pattern [1994] therefore matches a single character that is 1, 9, or 4, not the literal text [1994]. Fortunately, glob also provides an escape method in case your file name contains these delimiters. Let's see how this works.

>>> glob.escape("Awards Day [1994].mp4")
'Awards Day [[]1994].mp4'

so using glob.glob(glob.escape(filename)) instead of glob.glob(filename) handles the case where filename contains [ or ]. But look what happens with actual DOS wildcard patterns.

>>> glob.escape("*.mp4")
'[*].mp4'

>>> glob.glob(glob.escape("*.mp4"))
[]

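glob.escape escapes the regular (as in Windows) wildcards * and ? as well, so it only helps when you already know the argument is a literal file name and not a pattern. There are other Python pattern matchers like fnmatch, but glob is built on top of fnmatch, so they follow the same convention. So I've gone back to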

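import subprocess
def dosfiles(pattern, recurse=False):
    # Windows only: let dir expand the pattern; /b gives the bare listing, /s recurses into subfolders.
    cmd = 'dir /b ' + ('/s ' if recurse else '') + '"' + pattern + '"'
    try:
        return subprocess.check_output(cmd, shell=True, stderr=subprocess.DEVNULL).decode().splitlines()
    except subprocess.CalledProcessError:
        return []  # dir exits with a non-zero status when nothing matches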
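For anyone who would rather stay inside Python, one possible alternative is to escape only the [ character and leave * and ? alone, so that DOS-style wildcards keep working. This is a minimal sketch, not something I have run against every odd file name; dos_glob is a name I made up, and it assumes you never want a real character class in your patterns.

```python
import glob

def dos_glob(pattern):
    """Expand a DOS-style pattern, treating [ and ] as literal characters."""
    # "[[]" is a character class containing only "[", the same trick
    # glob.escape uses. Unlike glob.escape, * and ? are left untouched.
    escaped = pattern.replace("[", "[[]")
    # glob.glob returns matches in arbitrary order, so sort for stable output.
    return sorted(glob.glob(escaped))
```

With the folder above, dos_glob("Awards Day [1994].mp4") should find the one file, and dos_glob("*[1994].mp4") should find both 1994 videos.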

While I am on the subject of gotchas, here's one that pisses me off. When you create a string you can include special characters like newline (\n). I love being able to do this. vbScript was butt ugly, relying on constructs like "line 1" & vbCrLf & "line 2". But this can be awkward when coding up file names like 'D:\temp\log.txt' because \t is a tab character. In Python you have to do 'D:\\temp\\log.txt'. But the language provides an alternative. You can specify a string as a raw string in which the backslash is to be taken literally. The above filename can then be coded as r'D:\temp\log.txt'.

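But here's the gotcha. In a raw string, a backslash is to be treated just like any other character - nothing special here - UNLESS it comes right before a quote, which bites you when it is the last character. A raw string cannot end in a single backslash, so r'D:\temp\' is a syntax error: the tokenizer reads \' as "the string isn't finished yet". And r'D:\temp\\' is no fix, because it is legal but leaves you with two backslashes on the end. Instead you have to use 'D:\\temp\\' or r'D:\temp' + '\\'. I can't imagine a scenario in which this makes sense.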
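Here is a short check of how these literals actually behave. The variable names are just for illustration, and the bad literal is fed to compile() as text so that the script itself still runs.

```python
# Two ways to get a path that ends in exactly one backslash.
escaped = 'D:\\temp\\'
raw_plus = r'D:\temp' + '\\'

# Doubling the backslash in a raw string keeps BOTH backslashes.
doubled = r'D:\temp\\'

# A raw string ending in a single backslash does not even compile.
try:
    compile("r'D:\\temp\\'", "<demo>", "eval")
    single_backslash_ok = True
except SyntaxError:
    single_backslash_ok = False

print(escaped == raw_plus)          # True
print(len(escaped), len(doubled))   # 8 9
print(single_backslash_ok)          # False
```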