Hi. I don't know anything about Python, but I need to use a script I found on the internet and I don't know where to start. I have installed Python on my Windows XP desktop, and I have also downloaded BeautifulSoup.py (http://www.crummy.com/software/BeautifulSoup/) and put it in the Lib folder, since I know the script needs it. The script is meant to generate an HTML file with a list of links based on the name of a Linux package. In fact, I need it to download all the .deb files of a given package, plus all the .deb files that package depends on, from http://packages.ubuntu.com/. I guess the script is very easy to use, but I don't know how! Help would be really appreciated.

"""ubuntu deb digger"""

from BeautifulSoup import BeautifulSoup
import urllib
import urlparse
import re

_depgif = '../../Pics/dep.gif'
_deps = {}

def get_debs(url, arch="i386", packages=None):
    """grab the deb defined by url from packages.ubuntu.com and all its dependencies"""
    if packages is None:
        packages = {}
    source = urllib.urlopen(url).read()
    soup = BeautifulSoup(source)
    downloadheader = soup('div', {'id': 'pdownload'})[0].h2
    name = downloadheader.string.replace('Download ','')
    if name in packages:
        return {}
    print name

    # update the packages dictionary with the download link for this package
    archlinks = [link for link in soup('a') if link.string in (arch, 'all')]
    archlink = urlparse.urljoin(url,archlinks[0]['href'])
    mirrorpage = urllib.urlopen(archlink).read()
    mirrorsoup = BeautifulSoup(mirrorpage)
    downloadlink = mirrorsoup.firstText('archive.ubuntu.com/ubuntu').parent['href']
    packages.update({name: downloadlink})

    # get dependencies
    deplinks = [dt.a for dt in soup('dt') if dt.img['src'] == _depgif]
    for link in deplinks:
        get_debs(urlparse.urljoin(url,link['href']), packages=packages)
        
    return packages

if __name__ == '__main__':
    import sys
    packages = get_debs(sys.argv[0])
    html = "\n".join(["<a href='%s'>%s</a><br/>" % (value,key) for key,value in packages.iteritems()])
    print 'writing packages.html'
    open('packages.html','w').write(html)

If I try to run it, I get:

Traceback (most recent call last):
  File "C:\Documents and Settings\Andrea\Desktop\UbuntuPackageGrabber.py", line 40, in <module>
    packages = get_debs(sys.argv[0])
  File "C:\Documents and Settings\Andrea\Desktop\UbuntuPackageGrabber.py", line 15, in get_debs
    source = urllib.urlopen(url).read()
  File "C:\Python25\lib\urllib.py", line 82, in urlopen
    return opener.open(url)
  File "C:\Python25\lib\urllib.py", line 187, in open
    return self.open_unknown(fullurl, data)
  File "C:\Python25\lib\urllib.py", line 199, in open_unknown
    raise IOError, ('url error', 'unknown url type', type)
IOError: [Errno url error] unknown url type: 'c'

Hi balance,

How are you running it?

The error occurs because the "downloader" module, urllib, has a urlopen() function which doesn't recognize the URL being passed in. If you step backwards, you see that the error is in line 15 of UbuntuPackageGrabber.py, the main script. That line says:

source = urllib.urlopen(url).read()

where 'url' is an input parameter to the get_debs() function. If we then look for where the get_debs() function is called, we see that it's at line 40 of the same program, which is:

packages = get_debs(sys.argv[0])

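So 'url' is really sys.argv[0]. In case you didn't know, sys.argv is the list of command-line tokens, and sys.argv[0] is the first of those tokens: the program name. The actual arguments start at sys.argv[1]. So the script is telling urllib to open a URL with the same name as the script itself. Presumably that name is the full path shown in your traceback (C:\Documents and Settings\...), and urllib takes the "C" before the colon for the URL type, which would explain "unknown url type: 'c'". That looks like a bug!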
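To make the sys.argv point concrete, here is a small sketch that is not part of the original script. Both helper names, describe_args and naive_scheme, are made up for illustration, and naive_scheme is only a rough stand-in for how a URL type can be guessed from the text before the first colon; it is not urllib's real code.

```python
import sys


def describe_args(argv):
    """Split an argv-style list into (script_name, extra_arguments)."""
    return argv[0], argv[1:]


def naive_scheme(text):
    """Hypothetical helper: guess a URL type as the lowercased text
    before the first colon. An approximation, not urllib's own logic."""
    if ':' in text:
        return text.split(':', 1)[0].lower()
    return None


if __name__ == '__main__':
    script, extra = describe_args(sys.argv)
    print("script name (sys.argv[0]): %s" % script)
    print("arguments (sys.argv[1:]): %s" % extra)
    print("guessed URL type of the script name: %s" % naive_scheme(script))
```

Run with a URL after the script name, the URL shows up in the arguments list, while the script name itself stays in position 0.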

What is the input to this script supposed to be? A package name? Because it seems like it's expecting a URL. Is that URL supposed to be http://packages.ubuntu.com? Is it supposed to be the link for the package? I have no idea what's going on, and I can't find this script through Google.

Do you have a link to the script's documentation?

First of all thank you for your kind reply!
Unfortunately I don't have the documentation of the script!

You're right, I should pass a URL to the script. For example, it should be possible to pass a URL like this:

http://packages.ubuntu.com/gutsy/base/adduser

My problem is that I don't know how to pass it. Maybe sys.argv[0] is used to pass the URL directly from the command line? But even if that were true, I wouldn't know how to do it!

I hope you can tell me how to pass the URL to the script.

Thanks in advance.

Hi balance,

You should open a command prompt, navigate to the directory where the script is, and invoke it by saying:

python UbuntuPackageGrabber.py URL_GOES_HERE

Substitute the package URL you have in mind. But I still think it will misfire unless you change sys.argv[0] to sys.argv[1] on line 40 of the script, since sys.argv[0] is the script's own name rather than the URL you typed. Try it both ways and see what happens.
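If the sys.argv[1] change turns out to be the fix, the bottom of the script could be guarded along these lines. This is only a sketch under that assumption: url_from_args is a made-up helper name, and the call to get_debs is left as a printed placeholder so the snippet runs on its own without BeautifulSoup or a network connection.

```python
import sys


def url_from_args(argv):
    """Return the first real command-line argument (the package URL),
    or None if the script was started without one."""
    if len(argv) < 2:
        return None
    return argv[1]


if __name__ == '__main__':
    url = url_from_args(sys.argv)
    if url is None:
        # No URL given: explain how to call the script instead of crashing.
        print("usage: python UbuntuPackageGrabber.py PACKAGE_URL")
    else:
        # In the real script this is where get_debs(url) would be called.
        print("would call get_debs(%r)" % url)
```

For example, python UbuntuPackageGrabber.py http://packages.ubuntu.com/gutsy/base/adduser would hand that URL on, and running it with no argument would print the usage line.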
