extracting data from a txt file

Question

dbphydb 0 Junior Poster in Training

15 Years Ago

Hi

My txt file looks something like this:
Branch: ltm_7.4
Destination: Test 5

lines = open("branch_dest.txt").readlines()
lines=[x.split() for x in lines]
print lines

branch = "%s" % lines[0][1].strip(': ')
print branch
destination = "%s" % lines[1][2].strip(': ')
print destination

I need to extract the branch name and the destination, but I get the output below.

[['Branch:', 'ltm_7.4'], ['Destination:', 'Test', '5']]
ltm_7.4
5

The branch is OK, but the destination needs to be "Test 5". Kindly help, as I am new to Python.

lllllIllIlllI 178 Veteran Poster

15 Years Ago

Just an FYI about what split is for (I am assuming you are new to Python): it automatically splits a string into its separate words. For example:

'Hello i am Paul'.split()
#Would give the result of a list:
['Hello','i','am','Paul']

The argument you can give it is the separator to split on, in case you don't want to split on whitespace. In your case, a colon separates your keyword from your value, so the previous poster is correct that x.split(':') should work well for you: it splits the string on the colon instead of on the spaces.
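A minimal sketch of the difference, using the two lines of branch_dest.txt from the question (hard-coded here as an assumption instead of being read from disk):

```python
# The two lines of branch_dest.txt from the question, hard-coded
# here instead of being read from disk.
lines = ["Branch: ltm_7.4\n", "Destination: Test 5\n"]

# split() with no argument splits on runs of whitespace, so
# "Test 5" becomes two separate items.
by_words = [line.split() for line in lines]
print(by_words)  # [['Branch:', 'ltm_7.4'], ['Destination:', 'Test', '5']]

# split(':') splits on the colon instead, so "Test 5" stays together.
# The value still carries a leading space and the trailing newline.
by_colon = [line.split(':') for line in lines]
print(by_colon)  # [['Branch', ' ltm_7.4\n'], ['Destination', ' Test 5\n']]
```

Calling strip() on the second item of each pair removes that leftover whitespace.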

Hope that helps

griswolf 304 Veteran Poster · Answer 1 · 2010-05-11T13:15:57+00:00

x.split(':') will get you most of the way there.
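The rest of the way is stripping the whitespace around each piece. Here is a minimal sketch, assuming every useful line has the form "Key: value". read_settings is a hypothetical helper name, not something from this thread:

```python
def read_settings(lines):
    """Parse 'Key: value' lines into a dict (hypothetical helper).

    `lines` can be an open file or any iterable of strings.
    """
    settings = {}
    for line in lines:
        if ':' not in line:
            continue  # skip blank or malformed lines
        # Split on the first colon only (maxsplit of 1), so a value
        # that contains its own colons, such as a URL, stays intact.
        key, value = line.split(':', 1)
        # strip() removes the leading space and the trailing newline.
        settings[key.strip()] = value.strip()
    return settings


# Same contents as branch_dest.txt in the question.
sample = ["Branch: ltm_7.4\n", "Destination: Test 5\n"]
settings = read_settings(sample)
print(settings["Branch"])       # ltm_7.4
print(settings["Destination"])  # Test 5
```

With the real file, pass the open file object instead of the list, e.g. read_settings(open("branch_dest.txt")).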

dbphydb 0 Junior Poster in Training · Answer 2 · 2010-05-11T16:54:22+00:00

Hi Paul,
Thanks for the in-depth explanation. I am actually from QA and don't know much about this.

I embedded this in a script that reads the branch name and the destination from a txt file and parses HTML pages, but I'm getting an error.

URL = "http://11.12.13.27:8080/cruisecontrol"

from urllib2 import urlopen
from HTMLParser import HTMLParser

import re

# Fetching links using HTMLParser
def get_links(url):
    parser = MyHTMLParser()
    parser.feed(urlopen(url).read())
    parser.close()
    return parser.links

# Build url for Deploy page
def get_deploy_url():
    lines = [x.split(None, 1) for x in open("branch_dest.txt")]
    print lines
    branch = "%s" % lines[0][1].strip(': ')
    print branch
    destination = "%s" % lines[1][1].strip(': ')
    print destination
    url = URL + "/buildresults/Poker-TTM_%s_nightly_build" % branch
    print url
    for link in get_links(url):
        print "hello1"
        if link["href"].startswith("Deploy"):
            return "%s/%s" % (URL, link["href"])
        print link["href"]

# Build url for Destination page
def get_destination_url():
    url = get_deploy_url()
    print url
    print destination
    destination_re = re.compile(r"%s") % destination
    for link in get_links(url):
        if destination_re.search(link["href"]):
            return "http://11.12.13.27:8080/cruisecontrol/" + link["href"]

# Deploying the build
#def deploy(url):
    

# Parsing HTML pages 
class MyHTMLParser(HTMLParser):
    def __init__(self, *args, **kwd):
        HTMLParser.__init__(self, *args, **kwd)
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            attrs = dict(attrs)
            if "href" in attrs:
                self.links.append(dict(attrs))

    def handle_endtag(self, tag):
        pass

if __name__ == "__main__":
    final_url = get_destination_url()
    if final_url is None:
        print "Could not find a destination to deploy"
    else:
        print final_url
        #deploy(final_url)

This is the error I get:

Traceback (most recent call last):
  File "C:\deploy_input.py", line 71, in <module>
    final_url = get_destination_url()
  File "C:\deploy_input.py", line 35, in get_destination_url
    url = get_deploy_url()
  File "C:\deploy_input.py", line 27, in get_deploy_url
    for link in get_links(url):
  File "C:\deploy_input.py", line 13, in get_links
    parser.feed(urlopen(url).read())
  File "C:\Python26\lib\urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "C:\Python26\lib\urllib2.py", line 389, in open
    req = meth(req)
  File "C:\Python26\lib\urllib2.py", line 1067, in do_request_
    raise URLError('no host given')
URLError: <urlopen error no host given>
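A likely cause, judging from the code and the traceback: unlike the first script, get_deploy_url() uses x.split(None, 1), which leaves the trailing newline on the value, and strip(': ') removes only colons and spaces. The branch therefore ends in "\n", the newline lands in the middle of the deploy URL, and Python 2's urllib2 then appears unable to pick the host out of it, which matches the "no host given" message. The sketch below shows the stray newline and one way around it with a plain strip(). It also touches two problems that will probably surface next in get_destination_url(): destination is a local variable of get_deploy_url() and is not visible there, and re.compile(r"%s") % destination applies % to the compiled pattern rather than to the string. read_branch_dest and the sample hrefs are hypothetical, and the code is written to run under both Python 2 and 3:

```python
import re

URL = "http://11.12.13.27:8080/cruisecontrol"

# Contents of branch_dest.txt from the question, one string per line.
sample = ["Branch: ltm_7.4\n", "Destination: Test 5\n"]

# What the script does now: split(None, 1) keeps the trailing newline
# on the value, and strip(': ') removes only colons and spaces.
old_branch = [x.split(None, 1) for x in sample][0][1].strip(': ')
print(repr(old_branch))  # 'ltm_7.4\n'


def read_branch_dest(lines):
    """Hypothetical helper: return (branch, destination) with all
    surrounding whitespace, including the newline, removed."""
    parts = [x.split(None, 1) for x in lines]
    branch = parts[0][1].strip()
    destination = parts[1][1].strip()
    return branch, destination


branch, destination = read_branch_dest(sample)
deploy_url = URL + "/buildresults/Poker-TTM_%s_nightly_build" % branch
print(deploy_url)
# http://11.12.13.27:8080/cruisecontrol/buildresults/Poker-TTM_ltm_7.4_nightly_build

# get_destination_url() cannot see get_deploy_url()'s local variable,
# so return the value (or pass it in) instead of relying on a global.
# Build the pattern from the string itself; re.escape makes any
# special characters in the destination match literally.
destination_re = re.compile(re.escape(destination))

# Hypothetical link targets, only to show the pattern matching.
hrefs = ["Deploy/Test 4", "Deploy/Test 5"]
print([h for h in hrefs if destination_re.search(h)])  # ['Deploy/Test 5']
```

One way to apply this to the script is to have get_deploy_url() return the destination alongside the URL, so get_destination_url() receives it explicitly.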