Hello, I would like to know if I need to use a regular expression to match the desired substring in order to print out 10 characters of the start codon ATG.
My DNA sequence is "CATAGAGATA".
Thanks for any advice.
You need to provide more information in order for us (or me anyway) to create such a regex. What does your data look like?
I don't think I understand the question. Your dna sequence consists of 10 characters and you want to print out 10 characters starting with the substring 'ATG'? I don't see any occurrence of the substring 'ATG' in your sequence. Can we shuffle the dna sequence until it contains (or starts with?) 'ATG'? Please tell us how you would determine the output without using a program and then maybe we can advise how to write a program that does it.
For example, does the following do what you want?
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(shuffle); # This module provides a function to shuffle lists.
my $str = "CATAGAGATA";
my @arr;
while (1) {
    @arr = $str =~ m/[AGCT]/g;  # Convert string into array of single letters
    @arr = shuffle(@arr);       # Shuffle the letters of the array randomly
    # Exit loop if the first 3 elements spell the start codon
    last if join('', @arr[0,1,2]) eq 'ATG';
}
print "Shuffled sequence is:\n";
print join('', @arr), "\n";
This prints "Shuffled sequence is:" followed by a random rearrangement of the ten bases that begins with ATG. The exact order differs from run to run.
Thank you for your response. I missed some characters: the DNA sequence is "CCCCATAGAG". I am supposed to print out the 10 characters upstream of the start codon ATG, so I think my output should be the 10 bases upstream of ATG. I don't even know where to start because the question confuses me. Thank you.
The question confuses me too. I still see only 10 bases and I don't see any 'ATG' in the sequence. Whoever gave you this question may have made a mistake.
You were right. I checked, and it was a mistake. The DNA sequence is "CCCCATAGAGATAGAGATAGAGAACCCCGCGCGCTCGCATGGGG", and I need to print out the 20 characters upstream of the start codon ATG.
When you say "upstream", you mean BEFORE the ATG, correct?
That is correct
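Here's some code that does that, assuming ATG occurs only once and has at least 20 bases before it:
use strict;
use warnings;
my $seq  = "CCCCATAGAGATAGAGATAGAGAACCCCGCGCGCTCGCATGGGG";
my $term = "ATG";
# Capture the 20 bases immediately before ATG
if ($seq =~ /([ACGT]{20})$term/) {
    print "$1\n";
} else {
    print "No $term with 20 upstream bases found\n";
}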
Output:
AGAGAACCCCGCGCGCTCGC
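To answer the original question of whether a regular expression is needed at all: probably not. As a rough sketch, the same result can be had with index and substr, which also copes with a sequence that contains more than one ATG. The helper name upstream_of and its three arguments are made up for this illustration, and skipping any ATG that has fewer than the requested number of bases in front of it is an assumption on my part, not something the assignment states.

```perl
use strict;
use warnings;

# Hypothetical helper: return the $len bases immediately upstream of every
# occurrence of $codon in $seq. Occurrences with fewer than $len bases
# before them are skipped (an assumption, adjust as needed).
sub upstream_of {
    my ($seq, $codon, $len) = @_;
    my @hits;
    my $pos = index($seq, $codon);           # -1 when there is no match
    while ($pos != -1) {
        push @hits, substr($seq, $pos - $len, $len) if $pos >= $len;
        $pos = index($seq, $codon, $pos + 1); # look for the next occurrence
    }
    return @hits;
}

my $seq = "CCCCATAGAGATAGAGATAGAGAACCCCGCGCGCTCGCATGGGG";
print "$_\n" for upstream_of($seq, "ATG", 20);
```

For the sequence in this thread it prints the same 20 bases as the regex version, since ATG appears only once.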
Thanks for the help. It makes a lot of sense now.