I am fairly new to Python and am trying to work on this problem. I want to split the file, which contains two sequences of letters, at the blank line that separates them, and then 'compare' them:
What you need to do: Text file genesequences.txt contains two gene sequences, separated from each other by an empty line. Write a program that will read the gene sequences in (make sure to discard the ‘\n’ characters when you read in the gene sequences), and find the longest region that is shared between the sequences that is also homozygous (has “A” and “B” but no “C”). You may assume that the shared regions will be in the same location in both gene sequences, so you will only have to check regions starting at the same location in both sequences.
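A minimal sketch of the reading step, assuming the file is small enough to read in one go and that the two sequences are separated by a single blank line (which may contain stray spaces). The name read_sequences is just a hypothetical helper, not something from the assignment:

```python
import re


def read_sequences(path):
    """Return the two gene sequences in `path` as plain strings of A/B/C."""
    with open(path) as f:
        text = f.read()
    # Split at the blank line; \s* tolerates spaces left on that "empty" line.
    blocks = re.split(r"\n\s*\n", text.strip())
    if len(blocks) != 2:
        raise ValueError(f"expected 2 sequences, found {len(blocks)}")
    # Keeping only A, B and C drops the '\n' characters (and anything else).
    return tuple("".join(ch for ch in block if ch in "ABC") for block in blocks)
```

Filtering down to A, B and C takes care of the '\n' characters the assignment asks you to discard, as well as any other extraneous characters in the file.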
(i) I think you will want to read and store the two gene sequences in two variables, discarding any extraneous characters other than A, B and C.
(ii) Use a window that starts with a size of 1, increases by 1 on each iteration, and goes up to the length of the entire string. In each iteration, check window-sized regions of the two sequences. If a match is found and it does not contain a ‘C’, note the location and length of the match.
(iii) Note that if you use an increasing window size, any subsequent match will be bigger than previous matches.
Any ideas? Thanks.