How do I read a .txt/.csv file from an internet address? For example: http:\\www.internetaddress.com\file.txt I don't think file() would work for this.
Thanks
Basic example with a for loop (Python 2):
# URL library
from urllib2 import urlopen
ur = urlopen("http://www.daniweb.com/forums/thread161312.html")  # open the URL
contents = ur.readlines()  # read all lines from the response
fo = open("test.txt", "w")  # open test.txt for writing
for line in contents:
    print "writing %s to a file" % (line,)
    fo.write(line)  # write each line from the URL to the text file
fo.close()  # close the text file
Thanks for the help. That solved my problem.
How to remove all the html tags?
urlopen() does not seem to work for me, as in I cannot import it. I am using Python 3.4.3 though.
In Python 3, urlopen() is in the module urllib.request. You can go to https://docs.python.org/3/index.html and type the name of a function in the quick search box to find it in the documentation.
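If the same script has to run on both Python 2 and Python 3, one common pattern is to try the Python 3 import first and fall back to urllib2. The sketch below also saves the download to a file, like the for loop example earlier in the thread. The helper name save_url_to_file is made up for illustration, not part of any library.

```python
import shutil

try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2


def save_url_to_file(url, path):
    """Download the resource at `url` and write its raw bytes to `path`.

    Hypothetical helper for illustration only.
    """
    response = urlopen(url)
    try:
        # Binary mode, so the bytes are written exactly as received.
        with open(path, "wb") as out:
            shutil.copyfileobj(response, out)
    finally:
        response.close()
```

Usage would be something like save_url_to_file("http://python.org", "test.txt"). Copying in binary mode avoids having to guess the encoding when all you want is a local copy of the file.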
Here are the different ways, along with what I would call the preferred way these days: Requests.
Python 2:
from urllib2 import urlopen
page_source = urlopen("http://python.org").read()
print page_source
Python 3:
from urllib.request import urlopen
page_source = urlopen('http://python.org').read().decode('utf_8')
print(page_source)
In Python 3, read() returns bytes, so to get str output we need to decode it, here as UTF-8.
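Since the original question was about a .txt/.csv file, here is a minimal Python 3 sketch of the decode step followed by CSV parsing. The function name read_csv_rows and the UTF-8 fallback are my own assumptions; the charset is taken from the response headers when the server declares one.

```python
import csv
import io
from urllib.request import urlopen


def read_csv_rows(url, default_encoding="utf-8"):
    """Fetch `url`, decode the bytes to str, and return the CSV rows as lists.

    Hypothetical helper; `default_encoding` is an assumed fallback.
    """
    with urlopen(url) as response:
        raw = response.read()  # bytes in Python 3
        # Use the charset the server declares, if any; otherwise fall back.
        encoding = response.headers.get_content_charset() or default_encoding
    text = raw.decode(encoding)
    # newline="" lets the csv module handle line endings itself.
    return list(csv.reader(io.StringIO(text, newline="")))
```

Each row comes back as a list of strings, so numeric columns still need converting with int() or float().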
Here it is with Requests, which works for both Python 2 and 3:
import requests
page_source = requests.get('http://python.org')
print(page_source.text)
For basic web scraping, we read the page in with Requests and parse it with BeautifulSoup.
import requests
from bs4 import BeautifulSoup
page_source = requests.get('http://python.org')
soup = BeautifulSoup(page_source.text, 'html.parser')  # name the parser explicitly to avoid a warning
print(soup.find('title').text) #--> Welcome to Python.org
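On the earlier question about removing all the HTML tags: with BeautifulSoup, soup.get_text() returns the text without the tags. If you would rather not install anything, a rough standard-library sketch for Python 3 could look like the one below. The names TextExtractor and strip_tags are made up for illustration, and skipping the contents of script and style elements is my own choice.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text between tags, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def strip_tags(html):
    """Return the text content of `html` with tags removed (hypothetical helper)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Join the text pieces and collapse runs of whitespace into single spaces.
    return " ".join("".join(parser.parts).split())
```

You would call it as strip_tags(page_source.text). Text from neighbouring block elements is joined without a separator, so for real pages BeautifulSoup is usually the more robust option.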