Hi, I am new to Python. I am working on parsing text from an SMI (SAMI subtitle) file. In the first case, I want to extract only the dialogue and ignore the timestamps (lines starting with <SYNC). Below is part of the SMI file:
<SAMI>
<HEAD>
<Title>ÁŠžñÀ» ÀûŸî ÁÖŒŒ¿ä.</Title>
<Style TYPE="text/css">
<!--
P {margin-left:8pt; margin-right:8pt; margin-bottom:2pt; margin-top:2pt;
text-align:center; font-size:22pt; font-family: Arial, Sans-serif;
font-weight:bold; color:white;}
.KRCC {Name:Korean; lang:ko-KR; SAMIType:CC;}
.ENCC {Name:English; lang:en-US; SAMIType:CC;}
#STDPrn {Name:Standard Print;}
#VLargePrn {Name:34pt (VLarge Print); font-size:34pt;}
#LargePrn {Name:28pt (Large Print); font-size:28pt;}
#MediumPrn {Name:24pt (Medium Print); font-size:24pt;}
#BSmallPrn {Name:18pt (BSmall Print); font-size:18pt;}
#SmallPrn {Name:12pt (Small Print); font-size:12pt;}
-->
</Style>
<!--
-->
</HEAD>
<BODY>
<SYNC Start=52><P Class=ENCC>
Subtitles by Korea NSC Subtitle Team <br>
([url]http://club.nate.com/tsm[/url])
<SYNC Start=3989><P Class=ENCC>
<SYNC Start=5047><P Class=ENCC>
Back, back, back, back!
<SYNC Start=7235><P Class=ENCC>
<SYNC Start=10725><P Class=ENCC>
Yeah, Dan!
I want to extract only the lines that contain dialogue text, e.g.
Back, back, back, back!
and write them into a file.
I have written a function, but it doesn't give the required output:
def getText(inFile):
    text = []
    file = open(inFile, "r")
    wholefile = file.readlines()
    file.close()
    for line in wholefile:
        line = line.strip()
        if line.startswith("<"):
            break
        elif line.endswith(">"):
            break
        else:
            continue
        text.append(line)
    text.sort()
    return text
What could be the problem with this code, or is there another way to do it?
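For the first case, here is a rough sketch of one possible approach, not a definitive solution. As far as I can tell, every branch of the if/elif/else above either breaks or continues, so text.append(line) is never reached, and break ends the whole loop at the very first tag line. The sketch also skips everything before <BODY> (the CSS lines in <HEAD> do not start with "<", so a simple prefix check would let them through) and leaves out text.sort() so the dialogue stays in file order. The names get_text and TAG are my own, and the sketch assumes each tag sits on a single line as in the sample.

```python
import re

# Matches any SAMI/HTML-style tag, e.g. <br> or <P Class=ENCC>.
TAG = re.compile(r"<[^>]+>")

def get_text(lines):
    """Return the dialogue lines found after <BODY>, in file order."""
    text = []
    in_body = False
    for line in lines:
        line = line.strip()
        if not in_body:
            # Skip the <HEAD> block: it holds CSS, not dialogue.
            in_body = line.upper().startswith("<BODY")
            continue
        # Drop <SYNC ...>, <P ...>, <br> and similar tags, keep the rest.
        line = TAG.sub("", line).strip()
        if line:
            text.append(line)
    return text
```

It takes any iterable of lines, so an open file object can be passed directly, e.g. get_text(open(inFile)), and the returned list can then be written out one item per line.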
In the second case, I want to keep only the timestamps (the Start values of the <SYNC> lines) and ignore the text while parsing. How could that be achieved?
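For the second case, a minimal sketch along the same lines might pull the number out of each <SYNC Start=...> tag with a regular expression and ignore every other line. The names get_timestamps and SYNC_START are hypothetical, and the sketch assumes at most one <SYNC> tag per line, as in the sample above.

```python
import re

# Captures the digits after Start= in a <SYNC ...> tag (quotes optional).
SYNC_START = re.compile(r"<SYNC\s+Start\s*=\s*\"?(\d+)", re.IGNORECASE)

def get_timestamps(lines):
    """Return the Start value of every <SYNC> line, in file order."""
    stamps = []
    for line in lines:
        match = SYNC_START.search(line)
        if match:
            stamps.append(int(match.group(1)))
    return stamps
```

If only the number of timestamps is needed, len(get_timestamps(open(inFile))) would give it.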