Hi,
I'm making something like an RSS reader.
I'm using the Universal Feed Parser to do this.
The feed is just a list of TV shows and the dates they air.
I can fetch the feed successfully; what I'm trying to do now is split it up into chunks that I can manipulate.
I'll give you the code bit by bit so as not to confuse things too much (and I do apologize for such messy code):
def episode_info (feed_url): #get episode information, returns a dictionary
    """This is a function that is fed an xml feed and returns a dictionary
    holding episode titles, content, numbers, and dates"""
    d = feedparser.parse(feed_url)
    entry_number = len(d['entries']) - 1
    episode_name = []
    episode_content = []
    while entry_number != -1: #add entries to episode_name and episode_content
        episode_name.append (d.entries[entry_number].title)
        episode_content.append (d.entries[entry_number].description)
        entry_number = entry_number - 1
    episode_name.reverse()
This first part just takes a feed and splits it in two: episode_name holds the name of each entry, and episode_content holds the rest.
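For comparison, here is a minimal sketch of that first step written as a plain forward loop (Python 3). It assumes only that each entry exposes title and description attributes, as the code above uses them. The helper name split_entries and the SimpleNamespace stand-ins are hypothetical; with a real feed, the entries from feedparser.parse(feed_url) would be passed in instead.

```python
from types import SimpleNamespace


def split_entries(entries):
    """Return (names, contents) as two lists in feed order."""
    names = [entry.title for entry in entries]
    contents = [entry.description for entry in entries]
    return names, contents


# Made-up stand-in entries so the sketch runs without fetching a feed.
sample_entries = [
    SimpleNamespace(title="Show A", description="Summary A<br />Season 1, Episode 1"),
    SimpleNamespace(title="Show B", description="Summary B<br />Season 1, Episode 2"),
]
names, contents = split_entries(sample_entries)
```

Walking the entries front to back keeps both lists in the same order, so index i in one list always refers to the same entry as index i in the other, and no reverse() call is needed.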
Here's where it gets tricky:
    content = []
    episode_number = []
    episode_date = []
    entry_number = len(episode_content)-1
    while entry_number != -1: # splits content into a list with three entries, the summary, and the episode number and the date
        if '<br />' in episode_content[entry_number]:
            index1 = episode_content[entry_number].index('<br />')
            content.append (episode_content[entry_number][0:index1])
        else:
            print 'there is no index1'
        if '–' in episode_content[entry_number]:
            index2 = episode_content[entry_number].index('–')
            episode_number.append (episode_content[entry_number][index1+6:index2])
            if '<p><sub><i>' in episode_content[entry_number]:
                index3 = episode_content[entry_number].index('<p><sub><i>')
                epdate = episode_content[entry_number][index2+13:index3].replace('/', '-')
                epdate = epdate.split('-')
                epdate = epdate[2] + '-' + epdate[0] + '-' + epdate[1]
                epdate = epdate.replace(' ', '')
                episode_date.append (epdate)
        else:
            episode_number.append (episode_content[entry_number][index1+6:])
            episode_date.append (0)
        entry_number = entry_number - 1
So the first part to be split off is the actual content, which is separated from the rest by '<br />'. I'm using if/else here because I don't know what else to use. All the feeds I use have <br />, but I put the if in there just in case, and it seems to work fine.
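One possible alternative to the if/else plus index() pair is str.partition, which splits on the first occurrence of a separator and reports whether it was found. The sketch below is only an illustration; the function name split_summary and the choice to return None for the remainder when no '<br />' is present are assumptions, not part of the original code.

```python
def split_summary(description, separator="<br />"):
    """Split a description into (summary, remainder) at the first separator.

    If the separator is missing, the whole text is treated as the summary
    and the remainder is None.
    """
    summary, found, remainder = description.partition(separator)
    if not found:
        # partition returns an empty middle element when nothing matched
        return description, None
    return summary, remainder


summary, remainder = split_summary("A short summary.<br />Season 2, Episode 5")
```

Because partition always returns three pieces, there is no index variable that can be left unset when the separator is absent.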
Then there's the second part, which is separated by '–'. Between the '<br />' and the '–' is a small bit of text, normally something like "Season 2, Episode 5".
What comes after that is the date, in the format ' – Aired: 12/22/2009' or 'Airs: 5/10/2010'. I realise I have to adjust the code depending on whether it's Airs or Aired. I'll do that later.
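A rough sketch of one way to handle both "Airs" and "Aired" without counting characters: search for the label with a regular expression and let datetime do the reordering. It assumes the date always appears as month/day/four-digit year, as in the two examples above. The names AIR_DATE and parse_air_date are hypothetical.

```python
import re
from datetime import datetime

# Matches "Airs: 5/10/2010" or "Aired: 12/22/2009" anywhere in the text.
AIR_DATE = re.compile(r"(?:Airs|Aired):\s*(\d{1,2}/\d{1,2}/\d{4})")


def parse_air_date(text):
    """Return the air date as 'YYYY-MM-DD', or None if no date is present."""
    match = AIR_DATE.search(text)
    if match is None:
        return None
    parsed = datetime.strptime(match.group(1), "%m/%d/%Y")
    return parsed.strftime("%Y-%m-%d")


aired = parse_air_date("Season 2, Episode 5 Aired: 12/22/2009")
upcoming = parse_air_date("Season 3, Episode 1 Airs: 5/10/2010")
```

Unlike the string slicing above, this zero-pads the month and day (2010-05-10 rather than 2010-5-10), which may or may not be what the rest of the program expects.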
Here's the problem: sometimes the "Airs/Aired" date isn't there. If it's there, I want the date added to the dictionary (as it should be doing already); if there's no date, I want to either skip that entry or mark it in the dictionary as 0, so I can test for a date later on in the program. I can't figure out why this part isn't working.
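To make the "skip it" option concrete, here is a small sketch that builds the dictionary from four parallel lists and leaves out any entry without a date. It assumes the lists are already the same length and uses None as the "no date" marker (the code in this post uses 0). The function name build_episodes and the sample values are made up.

```python
def build_episodes(names, summaries, numbers, dates):
    """Map entry index -> [name, summary, number, date], skipping undated entries."""
    episodes = {}
    rows = zip(names, summaries, numbers, dates)
    for index, (name, summary, number, date) in enumerate(rows):
        if date is None:
            # Skip only this entry and keep going with the rest.
            continue
        episodes[index] = [name, summary, number, date]
    return episodes


episodes = build_episodes(
    ["Ep A", "Ep B", "Ep C"],
    ["Summary A", "Summary B", "Summary C"],
    ["Season 1, Episode 1", "Season 1, Episode 2", "Season 1, Episode 3"],
    ["2009-12-22", None, "2010-05-10"],
)
```

With skipping, some keys are simply absent, so a lookup such as episodes.get(1) (which returns None here) is safer than episodes[1], which would raise KeyError.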
The program finishes off by returning all the values gathered in a dictionary:
    entry_number = len(episode_content) - 1
    episodes = {}
    while entry_number != -1: # put it all in a dictionary: episodes
        if episode_date[entry_number] == 0:
            print 'There is no date, therefore the episode cannot be added to the calendar'
            break
        else:
            episodes[entry_number] = [episode_name[entry_number],content[entry_number], episode_number[entry_number], episode_date[entry_number]]
        entry_number = entry_number - 1
    return episodes
The error I get at the moment is:
Traceback (most recent call last):
  File "rss.py", line 137, in <module>
    print fringe[0]
KeyError: 0
I'm not too sure what's happening there. Here's the code as a whole to give you a better idea of what's going on:
#!/usr/bin/env python
import feedparser

def episode_info (feed_url): #get episode information, returns a dictionary
    """This is a function that is fed an xml feed and returns a dictionary
    holding episode titles, content, numbers, and dates"""
    d = feedparser.parse(feed_url)
    entry_number = len(d['entries']) - 1
    episode_name = []
    episode_content = []
    while entry_number != -1: #add entries to episode_name and episode_content
        episode_name.append (d.entries[entry_number].title)
        episode_content.append (d.entries[entry_number].description)
        entry_number = entry_number - 1
    episode_name.reverse()

    content = []
    episode_number = []
    episode_date = []
    entry_number = len(episode_content)-1
    while entry_number != -1: # splits content into a list with three entries, the summary, and the episode number and the date
        if '<br />' in episode_content[entry_number]:
            index1 = episode_content[entry_number].index('<br />')
            content.append (episode_content[entry_number][0:index1])
        else:
            print 'there is no index1'
        if '–' in episode_content[entry_number]:
            index2 = episode_content[entry_number].index('–')
            episode_number.append (episode_content[entry_number][index1+6:index2])
            if '<p><sub><i>' in episode_content[entry_number]:
                index3 = episode_content[entry_number].index('<p><sub><i>')
                epdate = episode_content[entry_number][index2+13:index3].replace('/', '-')
                epdate = epdate.split('-')
                epdate = epdate[2] + '-' + epdate[0] + '-' + epdate[1]
                epdate = epdate.replace(' ', '')
                episode_date.append (epdate)
        else:
            episode_number.append (episode_content[entry_number][index1+6:])
            episode_date.append (0)
        entry_number = entry_number - 1

    entry_number = len(episode_content) - 1
    episodes = {}
    while entry_number != -1: # put it all in a dictionary: episodes
        if episode_date[entry_number] == 0:
            print 'There is no date, therefore the episode cannot be added to the calendar'
            break
        else:
            episodes[entry_number] = [episode_name[entry_number],content[entry_number], episode_number[entry_number], episode_date[entry_number]]
        entry_number = entry_number - 1
    return episodes

fringe = episode_info('http://feed43.com/lietome_timetable.xml')
print fringe[0]
Again, sorry about the messiness of the code. If there are any questions I can answer to help solve the problem, I'll try my best to answer them.
Thank you.