:cheesy: hi all,
I don't know how many people here have worked with Biopython before, but I am so close to this answer I can feel it! I just need a little help again. Basically, this script takes a FASTA file from NCBI and turns it into a dictionary, which is wonderful and easy. However, my FASTA file has several records with the same ID, and each one needs a unique ID. I wanted to append a number to any ID that repeats (the error comes from Bio.Mindy). I wrote this:
if key in index:
    index[key] = index[key] + 1
but it didn't work. Any ideas? Thanks, guys/gals! :mrgreen:
import string
from Bio import Fasta
from Bio.Alphabet import IUPAC
def get_accession_num(fasta_record):
    title_atoms = string.split(fasta_record.title)
    # all of the accession number information is stuck in the first element
    # and separated by '|'s
    accession_atoms = string.split(title_atoms[0], '|')
    # the accession number is the 4th element
    gb_name = accession_atoms[3]
    # strip the version info before returning
    return gb_name[:-2]
# call form: index_file(file_to_index, index_file_to_create, function_to_get_index_key)
# my attempt at numbering duplicate keys (this is the part that doesn't work):
if key in index:
    index[key] = index[key] + 1
Fasta.index_file("ls_orchid.fasta", "orchid_2.idx", get_accession_num)
#dna_parser = Fasta.SequenceParser(IUPAC.protein)
orchid_dict = Fasta.Dictionary("orchid_2.idx")#,dna_parser)
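One possible direction, as a rough sketch only: as pasted, the `if key in index` lines sit at the top level of the script, where neither `key` nor `index` is defined, so the counting probably has to happen inside the key function instead. The example below wraps a key function so that repeated keys get a numeric suffix. `make_unique_key_function` and `FakeRecord` are hypothetical names made up for this illustration, the titles are example data, and nothing here calls Biopython. Whether the old `Fasta.index_file` calls the key function exactly once per record is an assumption that has not been checked.

```python
from collections import defaultdict


def get_accession_num(fasta_record):
    """Same idea as the function in the post, using str methods."""
    title_atoms = fasta_record.title.split()
    accession_atoms = title_atoms[0].split("|")
    gb_name = accession_atoms[3]   # the accession number is the 4th element
    return gb_name[:-2]            # strip the version info, e.g. ".1"


def make_unique_key_function(get_key):
    """Hypothetical helper: wrap get_key so duplicate keys get _2, _3, ..."""
    seen = defaultdict(int)        # how many times each base key has appeared

    def unique_key(record):
        base = get_key(record)
        seen[base] += 1
        if seen[base] == 1:
            return base            # first occurrence keeps the plain key
        return "%s_%d" % (base, seen[base])

    return unique_key


class FakeRecord:
    """Stand-in for a FASTA record; only the .title attribute is used."""
    def __init__(self, title):
        self.title = title


# Example titles; the first two share an accession number on purpose.
records = [
    FakeRecord("gi|2765658|emb|Z78533.1|CIZ78533 C.irapeanum 5.8S rRNA gene"),
    FakeRecord("gi|2765658|emb|Z78533.1|CIZ78533 duplicate entry"),
    FakeRecord("gi|2765657|emb|Z78532.1|CCZ78532 C.californicum 5.8S rRNA gene"),
]

unique_key = make_unique_key_function(get_accession_num)
keys = [unique_key(r) for r in records]
print(keys)
```

If this fits, the wrapped function would be passed wherever `get_accession_num` is passed now. One caveat: a suffixed key such as `Z78533_2` could still clash with a real ID that already looks like that, so the suffix format may need adjusting for a given file.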