Hi, I am from QA. What we do each morning is open a web browser at http://11.12.13.27:8080/cruisecontrol and click on the build with last night's date. Then we click on a folder, say client. In that folder we click on an exe file, say game.exe, and get a save dialog box to save the exe to our local machine (Windows XP, not a home PC). Then we go to the place where we downloaded the exe file, double-click it to start the installation, and then start testing the application. This has to be done every day.

We want to automate this process so that when I run a script or something like that, it automatically downloads and installs the exe file.

I've tried doing this using PHP, but I don't want to have to install a WAMP server on all the team members' machines.

Can this be done using python? Please help.

Well, if the file is always found at the same address, it shouldn't be too hard to automate the task. A script could look like this:

from urllib2 import urlopen
import subprocess as sp

DOWNLOAD_URL = "http://11.12.13.27:8080/.....game.exe"
DOWNLOAD_DST = "C:/....game.exe"

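def download(url, dst_file):
    # Read the whole file from the server, then save it in binary mode.
    content = urlopen(url).read()
    with open(dst_file, "wb") as outfile:
        outfile.write(content)


def install(prog):
    # Start the installer through the shell and wait until it exits.
    process = sp.Popen(prog, shell=True)
    process.wait()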

def main():
    download(DOWNLOAD_URL, DOWNLOAD_DST)
    install(DOWNLOAD_DST)

if __name__ == "__main__":
    main()
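For anyone running this on Python 3, where urllib2 no longer exists, a minimal sketch of the same two steps might look like the following. It streams the download to disk instead of holding the whole exe in memory, and starts the installer from an argument list instead of shell=True. The extra *args parameter on install is an addition for illustration only; the URL and destination stay elided as in the post above.

```python
import shutil
import subprocess
import urllib.request

# Placeholders, as in the post above: fill in the real paths.
DOWNLOAD_URL = "http://11.12.13.27:8080/.....game.exe"
DOWNLOAD_DST = "C:/....game.exe"


def download(url, dst_file):
    """Copy the resource at url into dst_file, chunk by chunk."""
    with urllib.request.urlopen(url) as response, open(dst_file, "wb") as outfile:
        shutil.copyfileobj(response, outfile)


def install(prog, *args):
    """Run the installer (plus any extra arguments) and wait for it to exit.

    Passing a list rather than a shell string lets subprocess handle
    quoting of paths with spaces. Returns the installer's exit code.
    """
    completed = subprocess.run([prog, *args])
    return completed.returncode
```

Once the placeholders are filled in, the daily job is download(DOWNLOAD_URL, DOWNLOAD_DST) followed by install(DOWNLOAD_DST).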

Hi Gribouillis,
Thanks for the help. I'm so new to Python that I didn't even know where the script should be written :)
By the way, I found that out, wrote your code in IDLE and ran the script from the command prompt. It partly worked: the exe file got downloaded to the destination set in the code, but the installation did not start.

Can you please suggest what could be wrong?



We are going to explore why it didn't work. Can you run this script and post its output? (You must set DOWNLOAD_DST first.)

DOWNLOAD_DST = "C:/....game.exe"

class Command(object):
    """Run a command and capture its output string, error string and exit status"""

    def __init__(self, command):
        self.command = command 

    def run(self, shell=True):
        import subprocess as sp
        process = sp.Popen(self.command, shell = shell, stdout = sp.PIPE, stderr = sp.PIPE)
        self.pid = process.pid
        self.output, self.error = process.communicate()
        self.failed = process.returncode
        return self

    @property
    def returncode(self):
        return self.failed

def install(prog):
    com = Command(prog).run()
    print("OUTPUT")
    print(com.output)
    print("ERROR")
    print(com.error)
    print("RETURN CODE: %s" % str(com.returncode))

if __name__ == "__main__":
    install(DOWNLOAD_DST)
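On Python 3 the same diagnostic can be written more compactly with subprocess.run, which returns an object holding the captured output, the error text and the exit status. This is a sketch rather than a drop-in replacement for the Command class: the function name run_and_report is hypothetical, and it takes an argument list instead of using shell=True.

```python
import subprocess


def run_and_report(command):
    """Run command (a list of arguments) and print its output, error and exit code.

    Returns the CompletedProcess so the caller can inspect it further.
    """
    # capture_output=True collects stdout and stderr (Python 3.7+);
    # text=True decodes them to str.
    result = subprocess.run(command, capture_output=True, text=True)
    print("OUTPUT")
    print(result.stdout)
    print("ERROR")
    print(result.stderr)
    print("RETURN CODE: %s" % result.returncode)
    return result
```

For the installer this would be called as run_and_report([DOWNLOAD_DST]).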

Hi,
I'm so sorry, the code is working fine now. It downloads the exe file to the specified destination and also starts the installation on its own. Thank you very much.

But now there is a problem. CruiseControl is a continuous build system: each night it takes the latest checked-in code and creates a build. So game.exe is never served from the same http address, and I need to build the http path.

This is how CruiseControl works. When I open http://11.12.13.27:8080/cruisecontrol I get a list of branch names displayed as hyperlinks, e.g. A 1.1, B 2.4, C 1.7. I click on B 2.4, and the latest build of that branch is displayed in the browser. There I click on another hyperlink called 'Build Artifacts', which displays another page. On this page I click a hyperlink whose name starts with 'clients_test'. That displays another page, where I click on the exe file, i.e. game.exe, which gets downloaded, and then I install it.
So basically what you have helped me with is a static address, but I need the above. How can this be done?

Thanks a lot for your help.



Is there a way to predict the address of the exe file depending on the day? It's difficult to help you if I can't see the web pages. One possibility would be to download the web pages and parse them, but we need some criteria to automate the choice of the links to follow.

I can see the web pages. Right now I'm clicking on each link by hand to reach the exe file, and the http address is built up as I click.

1. I need a txt file, say branch.txt, where I supply the branch name, say GMS_7.4. The http address formed is then http://11.12.13.27:8080/cruisecontrol/buildresults/ABC_gms_7.4_nightly_build?log=log20100417020551L26902, where 20100417020551L26902 is the latest build name (year, month, day, time of build, build number).
2. On this page I need to find a hyperlink 'Build Artifacts', which when clicked leads to http://11.12.13.27:8080/cruisecontrol/artifacts/ABC_gms_7.4_nightly_build/20100417020551/
3. On that page I need to find a hyperlink 'Clients_test', which when clicked leads to http://11.12.13.27:8080/cruisecontrol/artifacts/ABC_gms_7.4_nightly_build/20100417020551/clients_test/
4. On this page I find game.exe.

So the criterion for choosing the links automatically would be the branch name supplied through the txt file.
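The addresses in steps 1 to 3 follow a regular pattern: once the latest log id is known, the artifacts and clients_test addresses can be derived from its 14-digit timestamp. A sketch assuming that pattern holds for every branch (the ABC_ prefix, the _nightly_build suffix and the clients_test folder come from the examples above; treating the L-plus-build-number part as optional is an assumption, and all function names are hypothetical):

```python
import re

BASE_URL = "http://11.12.13.27:8080/cruisecontrol"

# A build log id looks like log20100417020551L26902: a 14-digit
# timestamp (YYYYMMDDhhmmss), optionally followed by L and a build number.
LOG_RE = re.compile(r"log(\d{14})(?:L(\d+))?")


def parse_log_id(log_id):
    """Return (timestamp, build_number) for a log id; build_number may be None."""
    match = LOG_RE.fullmatch(log_id)
    if match is None:
        raise ValueError("not a build log id: %r" % log_id)
    return match.group(1), match.group(2)


def project_name(branch):
    return "ABC_%s_nightly_build" % branch


def buildresults_url(branch):
    return "%s/buildresults/%s" % (BASE_URL, project_name(branch))


def artifacts_url(branch, timestamp):
    return "%s/artifacts/%s/%s/" % (BASE_URL, project_name(branch), timestamp)


def clients_test_url(branch, timestamp, folder="clients_test"):
    return artifacts_url(branch, timestamp) + folder + "/"
```

Finding the latest log id still requires reading the branch page, which is what the link-parsing approach addresses.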


Here is a piece of code that should display a list of the links available on your page http://11.12.13.27:8080/cruisecontrol:

URL = "http://11.12.13.27:8080/cruisecontrol"


from urllib2 import urlopen
from HTMLParser import HTMLParser

class MyHTMLParser(HTMLParser):

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            attrs = dict(attrs)
            if "href" in attrs:
                print attrs

    def handle_endtag(self, tag):
        pass

if __name__ == "__main__":
    parser = MyHTMLParser()
    parser.feed(urlopen(URL).read())
    parser.close()
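The same link lister on Python 3, where the module is html.parser instead of HTMLParser, could look like this sketch. It collects the links instead of printing them, so later steps can filter them; the names LinkCollector and links_in are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the attribute dict of every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs;
        # tag and attribute names are already lowercased.
        if tag == "a":
            attrs = dict(attrs)
            if "href" in attrs:
                self.links.append(attrs)


def links_in(html):
    """Return the link attribute dicts found in an HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```

With a live page, the HTML string would come from urllib.request.urlopen(URL).read().decode(...); which encoding the CruiseControl pages use is an assumption to check.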

Can you post this list here and tell me whether you see the link you must follow from this page, and how you recognize it? If this works, we may be able to download the next page and simulate the procedure step by step.

Hi, I executed the script. A list similar to the one below appears:

{'href': '/cruisecontrol/?sort=project', 'class': 'sorted'}
{'href': '/cruisecontrol/?sort=status', 'class': 'sorted'}
{'href': '/cruisecontrol/?sort=last failure', 'class': 'sorted'}
{'href': '/cruisecontrol/?sort=last successful', 'class': 'sorted'}
{'href': 'buildresults/ABC_gsm_7.4_nightly_build'}
{'href': 'settings.jsp?projectName=ABC_gsm_7.4_nightly_build'}
{'href': 'buildresults/ABC_ors_3.2_nightly build'}
{'href': 'settings.jsp?projectName=ABC_ors_3.2_nightly_build'}
{'href': 'buildresults/ABC_trunk_nightly_build'}
{'href': 'settings.jsp?projectName=ABC_trunk_nightly_build'}
...........
...........

Now I may need to fetch the exe file from any of the above, i.e. from gsm_7.4, ors_3.2 or trunk. This is what the user needs to supply, through a txt file or something similar.
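Since the branch names appear inside the buildresults/... links, a script could list the valid values for branch.txt itself, which helps catch a misspelled branch before any download starts. A sketch, assuming every branch link follows the buildresults/ABC_&lt;branch&gt;_nightly_build pattern seen in the list above (available_branches is a hypothetical name):

```python
import re

# Links to a branch's build results look like buildresults/ABC_<branch>_nightly_build.
BRANCH_RE = re.compile(r"^buildresults/ABC_(.+)_nightly_build$")


def available_branches(hrefs):
    """Return the branch names found in a list of href strings, in page order."""
    branches = []
    for href in hrefs:
        match = BRANCH_RE.match(href)
        if match:
            branches.append(match.group(1))
    return branches
```

Anchoring the pattern at the start skips the settings.jsp?projectName=... links, which contain the same project name.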

Ok, now create a text file named 'branch.txt' which contains gsm_7.4, for example, and run this modified code. It should list the links on the next page of the procedure. Please post these links again.

URL = "http://11.12.13.27:8080/cruisecontrol"
BRANCH_FILE = "branch.txt"


from urllib2 import urlopen
from HTMLParser import HTMLParser

def read_branch():
    return open(BRANCH_FILE).read().strip()

class MyHTMLParser(HTMLParser):

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            attrs = dict(attrs)
            if "href" in attrs:
                print attrs

    def handle_endtag(self, tag):
        pass

if __name__ == "__main__":
    parser = MyHTMLParser()
    branch = read_branch().lower()
    url = URL + "/buildresults/ABC_%s_nightly_build" % branch
    parser.feed(urlopen(url).read())
    parser.close()

I must go for a few hours, so won't be able to post the next step immediately.

Hi, I was out of town for work. I will execute this tomorrow morning and post the resulting list.

Hi,
below is the generated list:
....
{'href': '/cruisecontrol/buildresults/ABC_gsm_7.4_nightly_build?log=log20100421121825L26982', 'class': 'link'}
{'href': '/cruisecontrol/buildresults/ABC_gsm_7.4_nightly_build?log=log20100421020445L26963', 'class': 'link'}
......(List of other builds)...
......(List of other builds)...
{'href': 'artifacts/ABC_gsm_7.4_nightly_build/20100421121825'}
{'href': 'Deploy.jsp?ArtifactsUrl=artifacts/ABC_gsm_7.4_nightly_build/20100421121825'}
......

Now I need to click on the Build Artifacts hyperlink on this page. It is shown in the above list as {'href': 'artifacts/ABC_gsm_7.4_nightly_build/20100421121825'}
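The artifacts link is relative, and the Deploy.jsp link in the list above contains the same path, so matching the start of the href (rather than searching anywhere in it) avoids picking up the wrong one. A sketch, assuming relative links on this page resolve under /cruisecontrol/, which is what the browser address in step 2 suggests; find_artifacts_url is a hypothetical name:

```python
from urllib.parse import urljoin

BASE_URL = "http://11.12.13.27:8080/cruisecontrol"


def find_artifacts_url(hrefs, base_url=BASE_URL):
    """Return the absolute URL of the first artifacts link, or None."""
    for href in hrefs:
        # Accept both the relative form (artifacts/...) and a
        # server-absolute form (/cruisecontrol/artifacts/...).
        if href.startswith(("artifacts/", "/cruisecontrol/artifacts/")):
            # The trailing slash makes urljoin resolve relative links
            # under /cruisecontrol/ instead of replacing its last segment.
            return urljoin(base_url + "/", href)
    return None
```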


This new version should print the links on the clients_test page (I skipped the artifacts page). If it works, we should find your exe file in the next step. Please post the links again.

URL = "http://11.12.13.27:8080/cruisecontrol"
BRANCH_FILE = "branch.txt"


from urllib2 import urlopen
from HTMLParser import HTMLParser

def read_branch():
    return open(BRANCH_FILE).read().strip()

class MyHTMLParser(HTMLParser):
    def __init__(self, *args, **kwd):
        HTMLParser.__init__(self, *args, **kwd)
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            attrs = dict(attrs)
            if "href" in attrs:
                self.links.append(dict(attrs))

    def handle_endtag(self, tag):
        pass

def get_links(url):
    parser = MyHTMLParser()
    parser.feed(urlopen(URL).read())
    parser.close()
    return parser.links

if __name__ == "__main__":
    branch = read_branch().lower()
    url = URL + "/buildresults/ABC_%s_nightly_build" % branch
    artifacts_url = None
    for link in get_links(url):
        if link["href"].startswith("artifacts/ABC_%s_nightly_build" % branch):
            artifacts_url = "%s/%s"(URL, link["href"])
            break
    L = artifacts_url.split("/")
    number = L[-1]
    clients_url = artifacts_url+"/clients_test/"
    for link in get_links(clients_url):
        print link

What does this error mean?
AttributeError: 'NoneType' object has no attribute 'split'

Sorry, it means that it didn't find the artifacts page. Replace the get_links function with this one. I hope there are no other errors :)

def get_links(url):
    parser = MyHTMLParser()
    parser.feed(urlopen(url).read())
    parser.close()
    return parser.links

Also replace artifacts_url = "%s/%s"(URL, link["href"]) with artifacts_url = "%s/%s" % (URL, link["href"])
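The AttributeError appears far from its cause: when no link matches, artifacts_url silently stays None and the script only fails later at artifacts_url.split("/"). (The unfixed line "%s/%s"(URL, ...) would have failed differently, with a TypeError, because it tries to call a string.) A sketch of a hypothetical helper that fails at the point of the search instead, and reports which links it did see:

```python
def first_link_with_prefix(hrefs, prefix):
    """Return the first href in the list hrefs that starts with prefix.

    Raises LookupError listing the hrefs that were seen, instead of
    returning None and failing later with an unrelated error.
    """
    for href in hrefs:
        if href.startswith(prefix):
            return href
    raise LookupError("no link starting with %r among: %s" % (prefix, ", ".join(hrefs)))
```

With this, an empty or unexpected page shows up immediately with its link list in the error message.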

Same error again. It says the error is in the line L = artifacts_url.split("/").
Another thing: when I click on a particular branch link, the url is http://11.12.13.27:8080/cruisecontrol/buildresults/ABC_gsm_7.4_nightly_build. Then when I click on the 'Build Artifacts' link on this page, the url changes to http://11.12.13.27:8080/cruisecontrol/artifacts/ABC_gsm_7.4_nightly_build/20100421230451/ where 20100421230451 is the latest build.

Can you replace the if __name__ == "__main__" part with this

if __name__ == "__main__":
    branch = read_branch().lower()
    url = URL + "/buildresults/ABC_%s_nightly_build" % branch
    artifacts_url = None
    print "BEFORE LINK LOOP"
    for link in get_links(url):
        print link["href"]
        if link["href"].startswith("artifacts"):
            artifacts_url = "%s/%s" % (URL, link["href"])
            break
    print "AFTER LINK LOOP"
    #L = artifacts_url.split("/")
    #number = L[-1]
    #clients_url = artifacts_url+"/clients_test"
    #for link in get_links(clients_url):
    #   print link

and post the output? It's difficult for me because I can't test the code on your machine...

BEFORE LINK LOOP
http://cruisecontrol.sourceforge.net
AFTER LINK LOOP
index
AFTER LINK LOOP
/cruisecontrol/buildresults/ABC_gsm_7.4_nightly_build
AFTER LINK LOOP
/cruisecontrol/buildresults/ABC_gsm_7.4_nightly_build?log=log20100421121825L26982
AFTER LINK LOOP
/cruisecontrol/buildresults/ABC_gsm_7.4_nightly_build?log=log20100421020445L26963
AFTER LINK LOOP
/cruisecontrol/buildresults/ABC_gsm_7.4_nightly_build?log=log20100420105649L26934
AFTER LINK LOOP
/cruisecontrol/buildresults/ABC_gsm_7.4_nightly_build?log=log20100420091736
AFTER LINK LOOP
/cruisecontrol/buildresults/ABC_gsm_7.4_nightly_build?log=log20100420023854
AFTER LINK LOOP
/cruisecontrol/buildresults/ABC_gsm_7.4_nightly_build?log=log20100417020551L26902
AFTER LINK LOOP
/cruisecontrol/buildresults/ABC_gsm_7.4_nightly_build?log=log20100416033703L26873
AFTER LINK LOOP
/cruisecontrol/buildresults/ABC_gsm_7.4_nightly_build?log=log20100415125237L26857
AFTER LINK LOOP
/cruisecontrol/buildresults/ABC_gsm_7.4_nightly_build?log=log20100415030535L26850
AFTER LINK LOOP
/cruisecontrol/buildresults/ABC_gsm_7.4_nightly_build?log=log20100414025512L26817
AFTER LINK LOOP
rss/ABC_gsm_7.4_nightly_build
AFTER LINK LOOP
/cruisecontrol/buildresults/ABC_gsm_7.4_nightly_build?tab=testResults
AFTER LINK LOOP
/cruisecontrol/buildresults/ABC_gsm_7.4_nightly_build?tab=autotestResults
AFTER LINK LOOP
logs/ABC_gsm_7.4_nightly_build/log20100421121825L26982
AFTER LINK LOOP
/cruisecontrol/buildresults/ABC_gsm_7.4_nightly_build?tab=metrics
AFTER LINK LOOP
/cruisecontrol/buildresults/ABC_gsm_7.4_nightly_build?tab=controlPanel
AFTER LINK LOOP
artifacts/ABC_gsm_7.4_nightly_build/20100421121825
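The repeated AFTER LINK LOOP lines, and the fact that the last line is the artifacts link with no marker after it, suggest that the print "AFTER LINK LOOP" line most likely ended up indented inside the for loop in the copy that was run. The two hypothetical functions below collect lines in a list instead of printing, so the effect of one indentation level is easy to compare:

```python
def loop_with_print_inside(hrefs):
    """Reproduces the pattern above: the marker follows every non-matching link."""
    lines = []
    for href in hrefs:
        lines.append(href)
        if href.startswith("artifacts"):
            break
        lines.append("AFTER LINK LOOP")  # indented inside the loop body
    return lines


def loop_with_print_after(hrefs):
    """Intended version: the marker appears once, after the loop."""
    lines = []
    for href in hrefs:
        lines.append(href)
        if href.startswith("artifacts"):
            break
    lines.append("AFTER LINK LOOP")  # same indentation as the for statement
    return lines
```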

It seems that it finds the artifacts url... Now replace the if __name__ == "__main__" part again with this, and don't change the indentation in any way:

if __name__ == "__main__":
    branch = read_branch().lower()
    url = URL + "/buildresults/ABC_%s_nightly_build" % branch
    artifacts_url = None
    for link in get_links(url):
        if link["href"].startswith("artifacts"):
            artifacts_url = "%s/%s" % (URL, link["href"])
            break
    #L = artifacts_url.split("/")
    #number = L[-1]
    print "ARTIFACTS URL", artifacts_url
    clients_url = artifacts_url+"/clients_test"
    for link in get_links(clients_url):
        print link

I'm getting the error below:
clients_url = artifacts_url + "/clients_test/"
TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'
I noticed that the link is "client test": there is no underscore between the two words. Also, the link only starts with "client test"; it can be client test4, client test2, client test6, etc.

ARTIFACTS URL None
Traceback (most recent call last):
  File "C:\list.py", line 45, in <module>
    clients_url = artifacts_url + "/clients test/"
TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'

Here is a new version that prints more output. I tried to isolate the function that gets the url of the artifacts page, because I don't understand the problem. Please post the output again.

URL = "http://11.12.13.27:8080/cruisecontrol"
BRANCH_FILE = "branch.txt"


from urllib2 import urlopen
from HTMLParser import HTMLParser

def read_branch():
    return open(BRANCH_FILE).read().strip()

class MyHTMLParser(HTMLParser):
    def __init__(self, *args, **kwd):
        HTMLParser.__init__(self, *args, **kwd)
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            attrs = dict(attrs)
            if "href" in attrs:
                self.links.append(dict(attrs))

    def handle_endtag(self, tag):
        pass

def get_links(url):
    parser = MyHTMLParser()
    parser.feed(urlopen(url).read())
    parser.close()
    return parser.links

def get_artifacts_url():
    branch = read_branch().lower()
    url = URL + "/buildresults/ABC_%s_nightly_build" % branch
    for link in get_links(url):
        print link["href"]
        if link["href"].startswith("artifacts/"):
            return "%s/%s" % (URL, link["href"])


if __name__ == "__main__":
    url = get_artifacts_url()
    if url is None:
        print "Could not find artifacts url"
    else:
        print get_links(url)

http://cruisecontrol.sourceforge.net
index
/cruisecontrol/buildresults/ABC_gsm_7.4_nightly_build
/cruisecontrol/buildresults/ABC_gsm_7.4_nightly_build?log=log20100421121825L26982
/cruisecontrol/buildresults/ABC_gsm_7.4_nightly_build?log=log20100421020445L26963
........
.........
rss/ABC_gsm_7.4_nightly_build
/cruisecontrol/buildresults/ABC_gsm_7.4_nightly_build?tab=testResults
/cruisecontrol/buildresults/ABC_gsm_7.4_nightly_build?tab=autotestResults
logs/ABC_gsm_7.4_nightly_build/log20100421121825L26982
/cruisecontrol/buildresults/ABC_gsm_7.4_nightly_build?tab=metrics
/cruisecontrol/buildresults/ABC_gsm_7.4_nightly_build?tab=controlPanel
artifacts/ABC_gsm_7.4_nightly_build/20100421121825
[{'href': '/cruisecontrol/artifacts/ABC_gsm_7.4_nightly_build/20100421121825/ABC_gsm_7.4_nightly_build_26982_GDS-1.zip'}, {'href': '/cruisecontrol/artifacts/ABC_gsm_7.4_nightly_build/20100421121825/ABC_gsm_7.4_nightly_build_26982__managerserver.zip'}, {'href': '/cruisecontrol/artifacts/ABC_gsm_7.4_nightly_build/20100421121825/ABC_gsm_7.4_nightly_build_26982_arch_db_sql_scripts-1.zip'}, ................... , ...... , {'href': '/cruisecontrol/artifacts/ABC_gsm_7.4_nightly_build/20100421121825/clients_test5'}, {'href': '/cruisecontrol/artifacts/ABC_gsm_7.4_nightly_build/20100421121825/clients_test7'}]

It can pick the exe from either of the two folders, i.e. clients_test5 or clients_test7.
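Since the folder name varies (clients_test5, clients_test7, and so on), a regular expression on the end of the href can collect every candidate together with its number. Which folder to use is still an open question in this thread, so the sketch only returns the candidates; taking the highest number, as in the usage below, is purely an example policy. clients_test_links is a hypothetical name.

```python
import re

# Matches a clients_test folder at the end of an href, e.g.
# .../20100421121825/clients_test5, with an optional trailing slash.
CLIENTS_RE = re.compile(r"/clients_test(\d*)/?$")


def clients_test_links(hrefs):
    """Return (number, href) pairs for every clients_test folder link.

    number is the folder's numeric suffix as an int, or -1 when there is none.
    """
    found = []
    for href in hrefs:
        match = CLIENTS_RE.search(href)
        if match:
            suffix = match.group(1)
            found.append((int(suffix) if suffix else -1, href))
    return found
```

For example, max(clients_test_links(hrefs))[1] picks the highest-numbered folder.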

Ok, here is a new version which should read the client url:

URL = "http://11.12.13.27:8080/cruisecontrol"
BRANCH_FILE = "branch.txt"


from urllib2 import urlopen
from HTMLParser import HTMLParser
import re
client_re = re.compile(r"\d+/clients_test\d*")

def read_branch():
    return open(BRANCH_FILE).read().strip()

class MyHTMLParser(HTMLParser):
    def __init__(self, *args, **kwd):
        HTMLParser.__init__(self, *args, **kwd)
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            attrs = dict(attrs)
            if "href" in attrs:
                self.links.append(dict(attrs))

    def handle_endtag(self, tag):
        pass

def get_links(url):
    parser = MyHTMLParser()
    parser.feed(urlopen(url).read())
    parser.close()
    return parser.links

def get_artifacts_url():
    branch = read_branch().lower()
    url = URL + "/buildresults/ABC_%s_nightly_build" % branch
    for link in get_links(url):
        #print link["href"]
        if link["href"].startswith("artifacts/"):
            return "%s/%s" % (URL, link["href"])

def get_client_url():
    url = get_artifacts_url()
    for link in get_links(url):
        if client_re.search(link["href"]):
            return "http://11.12.13.27:8080" + link["href"]

if __name__ == "__main__":
    url = get_client_url()
    if url is None:
        print "Could not find client url"
    else:
        print get_links(url)

Could not find client url

Try replacing the line client_re = re.compile(r"\d+/clients_test\d*") with client_re = re.compile(r"clients_test")

Still the same.

Ok, replace the get_client_url function with this one:

def get_client_url():
    url = get_artifacts_url()
    for link in get_links(url):
        print link["href"]
        if link["href"].find("/clients_test") > 0:
            return "http://11.12.13.27:8080" + link["href"]

/cruisecontrol/artifacts/ABC_gsm_7.4_nightly_build/20100421230451/ABC_gsm_7.4_nightly_build_26998_GDS-1.zip
............
............
/cruisecontrol/artifacts/ABC_gsm_7.4_nightly_build/20100421230451/artifacts_test2
/cruisecontrol/artifacts/ABC_gsm_7.4_nightly_build/20100421230451/clients_test2
/cruisecontrol/artifacts/ABC_gsm_7.4_nightly_build/20100421230451/clients_test4

Could not find client url

Sorry, I had written client_test instead of clients_test. Now I'm getting the error below:

URLError: <urlopen error (11001, 'getaddrinfo failed')>
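Error 11001 is the Windows socket code for "host not found": getaddrinfo could not resolve the host part of the URL handed to urlopen. That often points to a malformed URL string rather than a network problem, so printing the URL and checking its structure before opening it can narrow things down. A hypothetical check; it only catches structurally broken URLs, and a well-formed URL whose host cannot be resolved will still fail:

```python
from urllib.parse import urlsplit


def check_url(url):
    """Return url unchanged, or raise ValueError if it lacks an http(s) scheme or a host."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError("suspicious URL: %r" % url)
    return url
```

Calling urlopen(check_url(url)) makes a badly built URL fail with its own text in the message.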


Did you check that the last printed url was a url to a clients_test directory containing game.exe? If your page contains many links with "clients_test", how do we select the right link? You said we could use clients_test5 or clients_test7, but why not clients_test4? Also, can you post the exact url of the clients_test page?
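Putting the steps together, a sketch of the whole chain from branch name to exe URL might look like this. Several things are assumptions, not settled facts from this thread: the first clients_test folder listed is taken, links after the branch page are server-absolute (as in the listings posted above), and the file is named game.exe. fetch is a parameter so the chain can be tried against saved pages; fetch_url is a hypothetical live version that assumes UTF-8 pages.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

BASE_URL = "http://11.12.13.27:8080/cruisecontrol"


class _Links(HTMLParser):
    """Collect href values of <a> tags in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def hrefs_in(html):
    parser = _Links()
    parser.feed(html)
    parser.close()
    return parser.hrefs


def fetch_url(url):
    # Assumption: pages are UTF-8; undecodable bytes are replaced.
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def find_exe_url(branch, fetch=fetch_url, exe_name="game.exe"):
    """Follow branch page -> artifacts -> clients_test -> exe_name.

    Raises LookupError at the first step whose link cannot be found.
    """
    root = BASE_URL + "/"
    branch_url = urljoin(root, "buildresults/ABC_%s_nightly_build" % branch.strip().lower())

    def follow(url, predicate, what):
        for href in hrefs_in(fetch(url)):
            if predicate(href):
                return urljoin(root, href)
        raise LookupError("no %s link on %s" % (what, url))

    artifacts = follow(branch_url,
                       lambda h: h.startswith(("artifacts/", "/cruisecontrol/artifacts/")),
                       "artifacts")
    # Assumption: the first clients_test folder listed is acceptable.
    clients = follow(artifacts,
                     lambda h: re.search(r"/clients_test\d*/?$", h) is not None,
                     "clients_test")
    return follow(clients, lambda h: h.endswith("/" + exe_name), exe_name)
```

The resulting URL could then be passed to the download and install steps from the start of the thread.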
