Finding a value in an HTTP response and removing it

Question

jex89 0 Newbie Poster

15 Years Ago

Hi, I am undertaking a piece of work and may need a bit of help.

The problem I need to find a solution for is as follows:

I am requesting a text-based document through an HTTP request, and I currently have the document I want from that request. Now I need to look for an element within this text and remove it.

Does anyone have an idea how I could do this? My code is below. Also, I am requesting the document from a Solr database, which basically outputs the results from the query in a text view but

#!/bin/env python2.5
import sys
import pickle
from urllib2 import urlopen


def set_solr_query(docID):
    return '[internal network http]'


def request_doc(url):
    conn = urlopen(url)
    # The response body is evaluated as a Python literal.
    # eval() runs whatever the server returns, so only use it on a trusted source.
    rsp = eval(conn.read())
    conn.close()
    docedit = rsp['response']['docs'][0]
    find_keyword(docedit)
    print "number of matches=", rsp['response']['numFound']
    for doc in rsp['response']['docs']:
        print 'Year =', doc['year']


def find_keyword(x):
    # Despite its name, this only pickles the document; it does not search it yet.
    # Binary mode, and the file is closed even if the dump fails.
    file_pi = open('filename_pi.obj', 'wb')
    try:
        pickle.dump(x, file_pi)
    finally:
        file_pi.close()
    print "opened File object and dumped doc into pickle"


def main(argv):
    print '+++++++++++++++++++++++++++++++++++++++CONFIGURATIONS++++++++++++++++++++++++++++++++++++++++++++++'
    docID = argv[1]
    keyword = argv[2]  # read from the command line but not used yet

    url = set_solr_query(docID)
    request_doc(url)

    print docID
    print keyword
    print url
    print '++++++++++++++++++++++++++++++++++++++++++++END++++++++++++++++++++++++++++++++++++++++++++++++++++'


if __name__ == "__main__":
    main(sys.argv)

The only part of the above code I cannot publish is the internal HTTP request. Apologies for this.

I hope someone can help point a first-time Python user in the right direction; I would be very grateful. I look forward to hearing back from someone.

Thanks

Dan

python


jlm699 320 Veteran Poster

15 Years Ago

I'd look into BeautifulSoup if you're looking to break this thing down into an object-hierarchy type of structure. I haven't used it much, but I know there are plenty of examples on this site of how to use it.
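
Since the posted code already turns the response into a Python dict, removing a value may not need an HTML parser at all. Below is a minimal sketch in Python 3 (not the poster's Python 2.5), assuming the Solr response is a Python literal, as the eval() call in the question suggests. The helper names parse_solr_response and remove_keyword, the sample data, and its field names are all made up for illustration. It is also an assumption that "element" means either a field name or a piece of text inside a field, so the sketch handles both.

```python
import ast


def parse_solr_response(text):
    """Parse a response written as a Python literal (assumed format)."""
    # ast.literal_eval only accepts literals (dicts, lists, strings, numbers),
    # so unlike eval() it cannot run arbitrary code from the response body.
    return ast.literal_eval(text)


def remove_keyword(doc, keyword):
    """Return a copy of doc with keyword removed (hypothetical helper).

    A field whose name equals keyword is dropped entirely; otherwise the
    keyword is stripped from string values, including strings inside lists.
    """
    cleaned = {}
    for field, value in doc.items():
        if field == keyword:
            continue  # drop the whole field
        if isinstance(value, str):
            value = value.replace(keyword, "")
        elif isinstance(value, list):
            value = [v.replace(keyword, "") if isinstance(v, str) else v
                     for v in value]
        cleaned[field] = value
    return cleaned


# Made-up sample standing in for conn.read(); real field names will differ.
sample = ("{'response': {'numFound': 1, 'docs': "
          "[{'id': 'doc1', 'year': 2009, 'text': 'alpha secret beta'}]}}")

rsp = parse_solr_response(sample)
doc = rsp['response']['docs'][0]
print(remove_keyword(doc, 'secret'))  # keyword stripped from the text field
print(remove_keyword(doc, 'year'))    # the year field dropped
```

The helper returns a new dict rather than changing doc in place, so the original document is still available for comparison or for pickling as in the question.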


jex89 0 Newbie Poster · Answer 1 · 2009-11-12T17:19:18+00:00

Thanks