Hi all.
I'm a newbie here, so excuse my question if it's a bit dumb. I'm a C programmer but needed to do some text file stripping, so I was told Python would be good for this.
I have been messing about with this for about a week now and have the following problem.
I want to look at blocks of HTML and leave certain chunks which contain a name. So for example, if my text doc looked like the one below, I want to be able to scan through and take out only the blocks with the word "remove".
I wrote some code which was capturing all the blocks, but I can't figure out how to leave the "leave" blocks and carry on. My code was getting to "leave" and would then start to rescan the doc again, causing it to get stuck in a loop. I have also included my code; don't laugh, I'm a beginner ;)
<tr>
<td><a leave </a></td>
</tr>
<tr>
<td><a remove </a></td>
</tr>
<tr>
<td><a leave </a></td>
</tr>
<tr>
<td><a remove </a></td>
</tr>
import re
TRUE = 1
FALSE = 0
leave_search = re.compile(r'leave')  # need to use this to somehow skip blocks matching this regex
main_search = re.compile(r'<tr>\s.*\s</tr>\s')  # one <tr> block; raw string keeps \s intact
def file_strip(file_name, search_type):
    result = search_type.search(file_name)
    leave = leave_search.search(result.group())
    print(file_name)  # debug, not needed
    print(result)  # debug, not needed
    search = TRUE
    while search:
        if result:
            print('We have a result')
            if leave:
                print('leave text found')  # here I somehow need to search on to the next block
            else:
                print('leave text NOT found')
                file_name = file_name.replace(result.group(), "")
        else:
            print('No result left in file')
            return file_name
def HTML_strip(filename):
    file_to_open = open(filename, 'r')
    file_to_read = file_to_open.read()
    file_to_open.close()  # was ".closed", which only reads an attribute and closes nothing
    file_to_read = file_strip(file_to_read, main_search)
    file_to_open = open(filename, 'w')
    file_to_open.close()
    file_to_open = open(filename, 'r+')
    file_to_open.write(file_to_read)
    file_to_open.close()
    return file_to_read
HTML_strip('webtest.txt')
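
For what it's worth, here is a minimal sketch of one way the skipping could be done, not necessarily the best one. It assumes every block is a plain, non-nested <tr>...</tr> and that a block should stay whenever it contains the word "leave". The names BLOCK_RE, strip_blocks, keep_or_drop and html_strip are made up for illustration. Instead of searching and replacing in a loop, it hands re.sub a function that decides, block by block, whether to keep the text or replace it with nothing, so the document is scanned only once.

```python
import re

# Non-greedy match for one <tr>...</tr> block plus any trailing whitespace.
# re.DOTALL lets '.' cross the newlines inside a block.
BLOCK_RE = re.compile(r'<tr>.*?</tr>\s*', re.DOTALL)


def strip_blocks(text, keep_word='leave'):
    """Return text with every <tr> block removed unless it contains keep_word."""
    def keep_or_drop(match):
        block = match.group()
        # Returning the block unchanged keeps it; returning '' removes it.
        return block if keep_word in block else ''
    return BLOCK_RE.sub(keep_or_drop, text)


def html_strip(filename):
    """Rewrite the file in place with the unwanted blocks removed."""
    # 'with' closes the file automatically, even if an error occurs.
    with open(filename, 'r') as handle:
        text = handle.read()
    stripped = strip_blocks(text)
    with open(filename, 'w') as handle:
        handle.write(stripped)
    return stripped
```

Because re.sub moves left to right and never goes back over text it has already passed, a "leave" block cannot send it round in circles the way a search-then-replace loop can. For real-world HTML with nested tables or attributes, a proper HTML parser would probably be safer than a regex.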