Hello developers,

I have written this code, which walks through the HTML files in a directory, extracts the event title and date from each one, and prints them to the screen. I want to store this data in a database instead. I have installed MongoDB and the pymongo module. How can I insert the output data into the database?

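import re
import os
from bs4 import BeautifulSoup

for subdir, dirs, files in os.walk("/home/himanshu/event/"):
    for fle in files:
        path = os.path.join(subdir, fle)
        # Parse inside a with-block so the file is closed afterwards;
        # naming the parser avoids bs4's "no parser specified" warning
        with open(path) as f:
            soup = BeautifulSoup(f, "html.parser")

        # Print the event title, if the page has an <h1>
        if soup.h1 is not None:
            print(soup.h1.string)

        # Date and time detection
        s = soup.get_text()
        # e.g. "12 - 14 Mar 2015"
        pat = r'\d{1,2} - \d{1,2} \w{3} \d{4}'
        m = re.search(pat, s)
        if m is None:
            # e.g. "12 Mar - 14 Mar 2015"; tried before the single-date
            # pattern, which would otherwise match only "14 Mar 2015"
            pat2 = r'\d{1,2} \w{3} - \d{1,2} \w{3} \d{4}'
            m2 = re.search(pat2, s)
            if m2 is None:
                # e.g. "12 Mar 2015"
                pat1 = r'\d{1,2} \w{3} \d{4}'
                m1 = re.search(pat1, s)
                if m1 is None:
                    # e.g. "Mar 2015"
                    pat3 = r'\w{3} \d{4}'
                    m3 = re.search(pat3, s)
                    if m3 is None:
                        print("No date found")
                    else:
                        print(m3.group(0))
                else:
                    print(m1.group(0))
            else:
                print(m2.group(0))
        else:
            print(m.group(0))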

Output is displayed as follows:

event1
date1
event2
date2
event3
date3
event4
date4
and so on...

Also, is there a way to store all these regular expressions in a list and try them in a loop, instead of nesting an if-else for every regular expression?
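A list of patterns tried in order can replace the nested if/else chain. The sketch below is illustrative: `DATE_PATTERNS` and `find_date` are hypothetical names, and the patterns are the four from the code above. Order matters: the range pattern `\d{1,2} \w{3} - \d{1,2} \w{3} \d{4}` has to come before `\d{1,2} \w{3} \d{4}`, because the shorter pattern would otherwise match only the end date of a range.

```python
import re

# Patterns from the question, most specific first. The hypothetical list name
# is just for illustration.
DATE_PATTERNS = [
    r"\d{1,2} - \d{1,2} \w{3} \d{4}",        # 12 - 14 Mar 2015
    r"\d{1,2} \w{3} - \d{1,2} \w{3} \d{4}",  # 12 Mar - 14 Mar 2015
    r"\d{1,2} \w{3} \d{4}",                  # 12 Mar 2015
    r"\w{3} \d{4}",                          # Mar 2015
]


def find_date(text, patterns=DATE_PATTERNS):
    """Return the text matched by the first pattern that matches, else None."""
    for pat in patterns:
        m = re.search(pat, text)
        if m is not None:
            return m.group(0)
    return None


# Inside the os.walk loop, the whole if/else chain then shrinks to:
#     date = find_date(soup.get_text())
#     print(date if date is not None else "No date found")
```

Adding a new date format then means adding one line to the list, in the right position relative to the patterns it overlaps with.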

I would try a single regular expression, with the whole leading day part optional so that a bare month and year also matches:

pat = r"(?:\d{1,2}(?:(?: \w{3})? - \d{1,2})? )?\w{3} \d{4}"
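A quick way to check a combined pattern against the four date formats from the question. This sketch uses `(?:\d{1,2}(?:(?: \w{3})? - \d{1,2})? )?\w{3} \d{4}`, in which the whole leading day part is optional, so a bare "Mar 2015" matches too; `DATE_RE` is a hypothetical name and the sample strings are invented for illustration.

```python
import re

# Optional "day", "day - day" or "day Mon - day" prefix, then "Mon YYYY"
DATE_RE = re.compile(r"(?:\d{1,2}(?:(?: \w{3})? - \d{1,2})? )?\w{3} \d{4}")

samples = [
    "Conference: 12 - 14 Mar 2015",    # day range within one month
    "Workshop: 12 Mar - 14 Mar 2015",  # range spanning months
    "Meetup: 5 Jun 2014",              # single day
    "Hackathon: Dec 2016",             # month and year only
    "No date in this one",
]

for s in samples:
    m = DATE_RE.search(s)
    print(m.group(0) if m else "No date found")
```

Because `re.search` returns the leftmost match and the optional prefix is tried greedily, a range is returned whole rather than just its last date.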

I don't know about the MongoDB part. Use Kodos to debug Python regexes.
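For the MongoDB part, one common approach is to build a dict per event instead of printing, collect them in a list, and insert them in one call. This is a minimal sketch, assuming pymongo 3 or later (which provides `Collection.insert_many`) and a `mongod` server on localhost; `make_event_doc`, `save_events` and the names `eventdb`/`events` are hypothetical placeholders. The collection is passed in as an argument so the helpers can be tried without a running server.

```python
import re

# Combined date pattern with an optional leading day part
DATE_PATTERN = re.compile(r"(?:\d{1,2}(?:(?: \w{3})? - \d{1,2})? )?\w{3} \d{4}")


def make_event_doc(title, text):
    """Build one MongoDB document (a plain dict) from a title and page text."""
    m = DATE_PATTERN.search(text)
    # Keep the event with date None rather than dropping it when no date is found
    return {"title": title, "date": m.group(0) if m else None}


def save_events(collection, docs):
    """Insert all documents in one call and return how many were sent.

    insert_many raises an error for an empty list, so that case is skipped.
    """
    if not docs:
        return 0
    collection.insert_many(docs)
    return len(docs)


# Example wiring (assumes pymongo is installed and mongod runs locally;
# "eventdb" and "events" are placeholder names):
#
#   from pymongo import MongoClient
#   events = MongoClient("mongodb://localhost:27017/")["eventdb"]["events"]
#   docs = []
#   # inside the os.walk loop, after parsing `soup`:
#   docs.append(make_event_doc(soup.h1.get_text(strip=True), soup.get_text()))
#   # after the loop:
#   save_events(events, docs)
```

Afterwards, `events.find_one()` in Python or `db.events.find()` in the mongo shell should show the stored documents. The date is stored as the matched string; converting it to a `datetime` first would make date-range queries possible, but that is a separate step.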
