Hello,
I think this is a pretty simple problem, but I just don't know where to start. I have a text file:
1
00:00:34,000 --> 00:00:36,135
Thank you, Detective.
2
00:00:42,714 --> 00:00:45,794
- Any change?
- Nothing since you left.
3
00:00:52,988 --> 00:00:55,585
She seems to be looking for something.
4
00:00:55,588 --> 00:00:59,234
Camera?
5
00:01:23,961 --> 00:01:26,662
She has a nice ass.
6
00:01:27,571 --> 00:01:30,407
Stay focused on the mission.
7
00:01:36,600 --> 00:01:40,336
Keep an eye on her,
but don't get too close.
8
00:01:51,605 --> 00:01:53,832
- Good morning.
- Good morning.
Actually, it's an .srt (subtitle) file, and I need to extract just the text, ignoring the timestamps and index numbers. Ultimately, I need to build a corpus of subtitle files as part of my linguistics course. Is Python the right tool for this job? Any help would be much appreciated :D
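To make the goal concrete, here is a rough sketch of the kind of extraction I mean, using only Python's standard library. The function name `extract_text` and the `TIMESTAMP` pattern are my own placeholders, not part of any subtitle library, and the sketch assumes every timing line looks like the ones in the sample above (`HH:MM:SS,mmm --> HH:MM:SS,mmm`). It treats a digits-only line as an index number only when a timing line follows it directly, so a spoken line that happens to be just a number should survive.

```python
import re

# Matches an SRT timing line such as "00:00:34,000 --> 00:00:36,135".
# Assumption: all timing lines in the files follow this shape.
TIMESTAMP = re.compile(
    r"^\d{2,}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{2,}:\d{2}:\d{2}[,.]\d{3}"
)


def extract_text(srt_content):
    """Return the subtitle text lines, dropping index numbers and timing lines."""
    lines = [line.strip() for line in srt_content.splitlines()]
    kept = []
    for i, line in enumerate(lines):
        # Skip blank separator lines and timing lines.
        if not line or TIMESTAMP.match(line):
            continue
        # An index number is a digits-only line directly followed by a timing line.
        if line.isdigit() and i + 1 < len(lines) and TIMESTAMP.match(lines[i + 1]):
            continue
        kept.append(line)
    return kept


sample = """1
00:00:34,000 --> 00:00:36,135
Thank you, Detective.
2
00:00:42,714 --> 00:00:45,794
- Any change?
- Nothing since you left.
"""

print("\n".join(extract_text(sample)))
```

For a real file, the content could be read with something like `open(path, encoding="utf-8-sig").read()` and passed to `extract_text`; the encoding is a guess on my part, since subtitle files come in many encodings, and `utf-8-sig` is chosen only because it also drops a leading byte-order mark if one is present.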