Python - problem with strings or lines...?

Question

milil 24 Newbie Poster

12 Years Ago

Hello friends, I am new to DaniWeb and new to Python, and I have a few problems I need help with.
I have a text document containing the source code of a web page. How can I extract the string that sits between two different strings in a line and write it to a new file? After that, the program must keep searching for more matches in the same line.
I have 50 lines of about 150 000 characters. Ex: (',...3627s-/a<<*g12-5<d/ajqjh5i1/*//*,.,.,-,')

EX: " (a,b){var c=encodeURIComponent,d=["//www.google.com/gen_204?atyp=i&zx=",(new Da "
How can I extract just '204?atyp=i&zx'?

In this example my string is always between 'google.com/gen_' and '="', so I need help writing code that opens a file, reads it, extracts the exact string between two strings in a line, writes it to a new file, and then continues searching the rest of that line and the remaining lines in the file.
Also, is it possible that my program can't see all 150 000 characters of one line of the web page? When it writes the page to a new file, a single line has only about 72 775 characters.
So is it possible to write a program that opens a file, reads it, takes a first string and a second string as input, and writes the string between those two to a new file?
If you can write any part of the code, I would be very grateful.
Thanks

python

TrustyTony 888 pyMod

12 Years Ago

I have posted code for exactly this case, as it is quite a typical one:
Picking piece of string between separators

These pieces are also simple to get with a regular expression, using the re module. You must then take care that greedy matching does not start at the first beginning separator and match the whole string up to the last end separator. From the re documentation:

{m,n}?
Causes the resulting RE to match from m to n repetitions of the preceding RE, attempting to match as few repetitions as possible. This is the non-greedy version of the previous qualifier. For example, on the 6-character string 'aaaaaa', a{3,5} will match 5 'a' characters, while a{3,5}? will only match 3 characters.
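
The same greedy versus non-greedy difference applies to `.*` and `.*?`, which is the form that matters for separators. A minimal sketch, assuming a made-up line that contains two pieces between the same pair of separators:

```python
import re

# Made-up sample line with two pieces between 'gen_' and '="'.
line = 'x gen_204?a=1=" y gen_305?b=2=" z'

# Greedy: .* runs on to the LAST '="' in the line, so both pieces
# end up inside a single match.
greedy = re.findall(r'gen_(.*)="', line)

# Non-greedy: .*? stops at the FIRST '="' after each 'gen_'.
lazy = re.findall(r'gen_(.*?)="', line)

print(greedy)
print(lazy)
```

With the non-greedy form, findall() returns one entry per piece, which is what is needed when a line holds several matches.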

snippsat 661 Master Poster

12 Years Ago

You may have to give more exact info, for example by posting more of the file.
Here is a regex example using the string you posted; it may need some changes, for example to how greedy the match is.

import re

s = '(a,b){var c=encodeURIComponent,d=["//www.google.com/gen_204?atyp=i&zx=",(new Da "'
# 'gen_' is matched literally; (\d.*) captures from the first digit up to the last '="'
r = re.findall(r'gen_(\d.*)="', s)
print(r)  # ['204?atyp=i&zx']
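
To cover the whole task from the question (open a file, take a first and a second string, write everything found between them to a new file), something along these lines might work. It is only a sketch: the function names and the UTF-8 encoding are assumptions, not anything given in the thread.

```python
import re

def pieces_between(text, left, right):
    """Return every shortest piece of text found between left and right."""
    # re.escape makes characters such as '.', '?' and '"' match literally.
    pattern = re.compile(re.escape(left) + r'(.*?)' + re.escape(right))
    return pattern.findall(text)

def extract_to_file(src_path, dst_path, left, right):
    """Write each piece found in src_path to dst_path, one per line."""
    count = 0
    # Assumption: the file is UTF-8; undecodable bytes are replaced.
    with open(src_path, encoding='utf-8', errors='replace') as src, \
         open(dst_path, 'w', encoding='utf-8') as dst:
        for line in src:  # each line is read exactly once
            for piece in pieces_between(line, left, right):
                dst.write(piece + '\n')
                count += 1
    return count
```

For the file from the question this could be called as extract_to_file('kod1.txt', 'out.txt', 'google.com/gen_', '="'), where out.txt is a made-up output name. Because findall() is used on every line, the search carries on through the rest of the line after each match.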

milil 24 Newbie Poster · Answer 1 · 2011-09-10T01:41:38+00:00

When I use pyTony's method, it looks like this:

def between(left,right,s):
    before,_,a = s.partition(left)
    a,_,after = a.partition(right)
    return a

myFile = open('kod1.txt')
for lines in myFile.readlines():
     
    s = myFile.readline()
    print between('<script type="text/ja','" src="http://s',s)
  
myFile.close()

and then it just prints 50 empty lines. I took those 'tags' from kod1.txt.

But when I use snippsat's method, it looks like this:

import re

myFile = open('kod1.txt')
for lines in myFile.readlines():
    
    s = myFile.readline()
r = re.findall(r'\<script type="text/javascript"(\d.*)src.php/v1/yP/r/jMx', s)
print(r)

and it just prints []. I replaced the 'tags' with something else and it still does not work.

snippsat 661 Master Poster · Answer 2 · 2011-09-10T02:18:07+00:00

Can you post a sample of the file?
You are making some basic mistakes, so what you are doing will not work.
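
The main mistake is most likely the mix of readlines() and readline() in the same loop: readlines() reads the whole file up front, so every readline() inside the loop finds nothing left and returns an empty string. In the regex version the findall() call also sits outside the loop, so it only ever sees the last value of s. A minimal sketch in Python 3 syntax, using io.StringIO with made-up content as a stand-in for kod1.txt:

```python
import io

def between(left, right, s):
    # Same helper as in the post above.
    before, _, a = s.partition(left)
    a, _, after = a.partition(right)
    return a

# Made-up two-line "file"; the real input would be open('kod1.txt').
fake_file = io.StringIO('x<one>y\nx<two>y\n')

# Broken pattern from the post: readlines() consumes everything first.
broken = []
for line in fake_file.readlines():
    broken.append(fake_file.readline())  # always '' because nothing is left

# Fixed: iterate over the file object and use the loop variable itself.
fake_file.seek(0)
fixed = [between('<', '>', line) for line in fake_file]

print(broken)
print(fixed)
```

The broken loop collects one empty string per line, which matches the 50 empty lines reported above.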

milil 24 Newbie Poster · Answer 3 · 2011-09-10T03:01:21+00:00

false;}" title="Search"><span class="hidden_elem">Search</span></button></span></span></div></div></div><input type="hidden" name="init" id="init" value="quick" /><input type="hidden" name="tas" class="search_sid_input" value="search_preload" /><input type="hidden" name="search_first_focus" id="search_first_focus" value="" /><

All of that is from one line, about 1/50 of the line.

So how would I find 'eload" /><i' in this string? If you write code, please show how to open the file properly. Thanks.

snippsat 661 Master Poster · Answer 4 · 2011-09-10T05:51:07+00:00

You are going to struggle with this if your regex and Python skills are not that good.
This file is a mix of JavaScript and HTML.
Regex and HTML are not the best fit; that is why parsers like lxml and BeautifulSoup exist.

So this time you want to find something completely different from the first post.
I use with open(), so you don't need to close the file object.

import re

pattern = re.compile(r'pr(.+i)n')
with open('test.txt') as f:
    for match in pattern.finditer(f.read()):
        print(match.group(1)) #eload" /><i

And why do you want to find only part of a word and some tag delimiter?
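
To show what the parser route looks like without installing anything, here is a rough sketch that uses the standard library's html.parser as a stand-in for lxml or BeautifulSoup (a swap made only for illustration). The class name InputCollector is made up, and the sample is cut down from the line posted above.

```python
from html.parser import HTMLParser

class InputCollector(HTMLParser):
    """Collect (name, value) pairs from every <input> tag."""

    def __init__(self):
        super().__init__()
        self.inputs = []

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing tags such as <input ... />.
        if tag == 'input':
            attrs = dict(attrs)
            self.inputs.append((attrs.get('name'), attrs.get('value')))

sample = ('<input type="hidden" name="init" id="init" value="quick" />'
          '<input type="hidden" name="tas" class="search_sid_input" '
          'value="search_preload" />')

collector = InputCollector()
collector.feed(sample)
print(collector.inputs)
```

This pulls out whole attribute values such as search_preload instead of fragments like 'eload" /><i', which is usually easier to work with when the target is really HTML.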

milil 24 Newbie Poster · Answer 5 · 2011-09-10T14:04:44+00:00

I need a critical piece of information from the code. It is a link, but the link is split into two halves. Ex: 'ss=\"passiveName\" href=\"http:\/\/www.example.com\/profile.php?id=0000000000\" data-hovercard=\"\/ajax\/hoverc'
So I need a program that goes from tag to tag. Ex: tag1 is 'ss=\"passiveName\" href=\"http:\/\/', tag2 is '\', and tag3 in this example is '\" data-hovercard=\"\/ajax\/hoverc'.
It should then write 'www.example.com/profile.php?id=0000000000' to a new file, or just extract the ID, and it must keep looking for new links between the tags.
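
One way this could be approached, sketched under the assumption that every link sits between 'href=\"http:\/\/' and '\" data-hovercard' exactly as in the example: capture everything between those two markers and then turn each '\/' back into '/'. The names LEFT, RIGHT and extract_links are made up for illustration.

```python
import re

# Raw strings keep the backslashes literal, as they appear in the page source.
sample = r'ss=\"passiveName\" href=\"http:\/\/www.example.com\/profile.php?id=0000000000\" data-hovercard=\"\/ajax\/hoverc'

LEFT = r'href=\"http:\/\/'
RIGHT = r'\" data-hovercard'

def extract_links(text, left=LEFT, right=RIGHT):
    """Return every link between left and right with '\\/' turned into '/'."""
    pattern = re.compile(re.escape(left) + r'(.*?)' + re.escape(right))
    return [piece.replace('\\/', '/') for piece in pattern.findall(text)]

links = extract_links(sample)

# Pull the numeric id out of each link, skipping links without one.
id_matches = (re.search(r'id=(\d+)', link) for link in links)
ids = [m.group(1) for m in id_matches if m]

print(links)
print(ids)
```

Because findall() returns every match, the same call keeps going through the rest of the line after the first link.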

And nobody has answered whether this is possible with my program:

filehandle = p.cevapi()
myFile = open('kod1.txt','w')
for lines in filehandle:
    
    myFile.write(lines)
   
myFile.close()

'p.cevapi()' returns the code of the page. My question is: is it possible that my program only gets as far as 72 000 characters, when that line has 150 000 in the source code?
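
Python itself does not limit a line to 72 000 characters, so the shortfall more likely comes from the download or from the program used to view the file, though that is only a guess. A small diagnostic sketch, assuming the page is available as bytes: write it in binary mode and compare the sizes. page_bytes is a made-up stand-in for whatever p.cevapi() returns.

```python
import os
import tempfile

def save_and_check(data, path):
    """Write bytes unchanged and return (bytes_given, bytes_on_disk)."""
    with open(path, 'wb') as out:  # 'wb' avoids any newline translation
        out.write(data)
    return len(data), os.path.getsize(path)

def longest_line(data):
    """Length in bytes of the longest line in data."""
    return max((len(line) for line in data.splitlines()), default=0)

# Made-up stand-in for the downloaded page: one short line, one very long line.
page_bytes = b'short line\n' + b'x' * 150000 + b'\n'

path = os.path.join(tempfile.mkdtemp(), 'kod1.txt')
given, on_disk = save_and_check(page_bytes, path)

print(given, on_disk, longest_line(page_bytes))
```

If both numbers agree and the longest line is as long as expected, the file is complete and the missing characters are probably a display issue; if the downloaded data is already short, the problem lies before the write.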

milil 24 Newbie Poster · Answer 6 · 2011-09-12T03:01:48+00:00

Does anyone know the answer to this last question? I also have another one.
In which text encoding does my program write to the file when I download the code of a web page? Is it ASCII or something else? When I try to find some code from the web page in my file, the text has changed.
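
A downloaded page is a sequence of bytes in whatever encoding the server used, which is often UTF-8 and not plain ASCII. Text embedded in JavaScript can also look different because characters are escaped, for example '\/' for '/'. The sketch below illustrates both points; decode_page and unescape_js are made-up helper names, UTF-8 is an assumption, and the json trick only works on fragments that contain no unescaped double quotes.

```python
import json

def decode_page(raw, charset='utf-8'):
    """Turn downloaded bytes into text; undecodable bytes become U+FFFD."""
    return raw.decode(charset, errors='replace')

def unescape_js(fragment):
    r"""Undo JSON-style escapes such as \/ and \u003c in a quote-free fragment."""
    return json.loads('"' + fragment + '"')

raw = 'caf\u00e9'.encode('utf-8')     # 5 bytes for 4 characters
as_utf8 = decode_page(raw)            # decoded with the right encoding
as_ascii = decode_page(raw, 'ascii')  # wrong encoding: non-ASCII bytes are lost

link = unescape_js(r'http:\/\/www.example.com\/profile.php?id=0000000000')

print(as_utf8, as_ascii, link)
```

Searching the saved file for text copied from the browser can therefore fail either because of the encoding or because the source uses the escaped form.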