Please read the FE.dox to see the 5 questions that need to be done.
The price is negotiable.
Rules: The only modules allowed for this exam are the os and re modules.
Q1) Write a program that asks the user for a file containing a FASTA nucleotide sequence (included is a file called sequence.fasta you can use). Then prompt the user to select from the following menu:
A. Calculate DNA composition: This will print to the screen the numbers of A, G, C and T nucleotides, and any unknowns (N’s).
B. Calculate AT content: Prints to the screen the percentage of AT in the sequence.
C. Calculate GC content: Prints to the screen the percentage of GC in the sequence.
D. Complement: Prints to the screen the complement of the DNA sequence.
E. Reverse complement: Prints to the screen the reverse complement.
Each menu item above should be implemented in its own function. The function should be called when the user selects the respective menu item. Each function should accept the DNA sequence as its argument and then perform the appropriate calculation or algorithm.
Input validation: Check to see that the file name entered by the user exists AND that the sequence is in FASTA format. You can assume that there is only one sequence in the file.
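The Q1 requirements could be sketched roughly as follows, using only the `os` and `re` modules. The function names, the `MENU` dictionary, and the choice to accept only A, C, G, T and N as valid residues are assumptions made for illustration, not part of the exam text. The interactive prompts are left out so the functions can be read on their own.

```python
import os
import re


def read_fasta(path):
    """Return (header, sequence) from a single-record FASTA file.

    Raises ValueError if the file does not exist or is not FASTA.
    """
    if not os.path.isfile(path):
        raise ValueError("file not found: " + path)
    with open(path) as handle:
        lines = [line.strip() for line in handle if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("missing FASTA header line")
    sequence = "".join(lines[1:]).upper()
    # Assumption: only A, C, G, T and N count as valid residues.
    if not re.fullmatch(r"[ACGTN]+", sequence):
        raise ValueError("sequence contains non-nucleotide characters")
    return lines[0][1:], sequence


def composition(seq):
    """Count each of A, G, C, T and the unknown base N."""
    return {base: seq.count(base) for base in "AGCTN"}


def at_content(seq):
    """Percentage of A and T bases in the sequence."""
    return 100.0 * (seq.count("A") + seq.count("T")) / len(seq)


def gc_content(seq):
    """Percentage of G and C bases in the sequence."""
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def complement(seq):
    """Swap A with T and G with C; N stays N."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))


def reverse_complement(seq):
    """Complement the sequence, then reverse it."""
    return complement(seq)[::-1]


# Hypothetical menu dispatch: one function per menu letter.
MENU = {
    "A": composition,
    "B": at_content,
    "C": gc_content,
    "D": complement,
    "E": reverse_complement,
}
```

A driver would read the file name with `input()`, call `read_fasta` inside a `try`/`except ValueError` block, and then run `MENU[choice](sequence)` for the selected letter.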
Q2) Write a program that asks the user for a file containing a FASTA nucleotide sequence (you can use the same sequence.fasta file as above). Then prompt the user to select a frame (number 1 through 6). Your program should then find the translation (protein sequence) of the nucleotide sequence in that frame. Print the translation to the screen.
Input validation: Check to see that the file name entered by the user exists AND that the sequence is in FASTA format. You can assume that there is only one sequence in the file.
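One possible way to approach the frame translation in Q2 is sketched below. It assumes the standard genetic code, and it assumes that frames 1 to 3 read the forward strand at offsets 0 to 2 while frames 4 to 6 read the reverse complement at the same offsets. The exam text does not state its frame convention, so that mapping is a guess. Codons containing N are shown as `X`, which is also an assumption.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(seq):
    """Complement the sequence, then reverse it."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def translate(seq, frame):
    """Translate seq in reading frame 1-6; '*' marks a stop codon."""
    if frame not in (1, 2, 3, 4, 5, 6):
        raise ValueError("frame must be a number from 1 to 6")
    seq = seq.upper()
    if frame > 3:
        # Assumed convention: frames 4-6 are on the reverse strand.
        seq = reverse_complement(seq)
        frame -= 3
    start = frame - 1
    codons = (seq[i:i + 3] for i in range(start, len(seq) - 2, 3))
    # Codons with unknown bases are reported as X.
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)
```

For example, `translate("ATGGCTCCCAAGTAA", 1)` gives `MAPK*`. The trailing one or two bases that do not fill a codon are ignored.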
Q3) Write a program that asks the user for a sequence in GenBank format (included is a file called sequence.gb that you can use). Your program should convert the GenBank formatted sequence into FASTA format. Write the FASTA formatted sequence to a file whose name includes the accession number (e.g. NM_001250672.txt, where NM_001250672 is the accession number).
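A minimal sketch of the GenBank-to-FASTA conversion in Q3, assuming a standard flat-file layout: an `ACCESSION` line, an optional `VERSION` line, a `DEFINITION` field whose continuation lines are indented, and an `ORIGIN` block terminated by `//`. The function names and the 70-column line width (matching the FASTA sample in this document) are illustrative choices.

```python
import os
import re


def genbank_to_fasta(text, width=70):
    """Return (accession, fasta_text) for one GenBank record."""
    accession = re.search(r"^ACCESSION\s+(\S+)", text, re.MULTILINE)
    origin = re.search(r"^ORIGIN[^\n]*\n(.*?)^//", text,
                       re.MULTILINE | re.DOTALL)
    if not accession or not origin:
        raise ValueError("not a GenBank record")

    # Prefer the versioned identifier for the header when present.
    version = re.search(r"^VERSION\s+(\S+)", text, re.MULTILINE)
    identifier = version.group(1) if version else accession.group(1)

    # DEFINITION may continue over several indented lines.
    definition = re.search(r"^DEFINITION\s+(.*?)\n(?=\S)", text,
                           re.MULTILINE | re.DOTALL)
    description = ""
    if definition:
        description = " ".join(definition.group(1).split()).rstrip(".")

    # Drop the position numbers and spaces from the ORIGIN block.
    sequence = re.sub(r"[^A-Za-z]", "", origin.group(1)).upper()
    rows = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    header = ">" + (identifier + " " + description).strip()
    return accession.group(1), "\n".join([header] + rows) + "\n"


def write_fasta(text, directory="."):
    """Write the record to <accession>.txt and return the path."""
    accession, fasta = genbank_to_fasta(text)
    path = os.path.join(directory, accession + ".txt")
    with open(path, "w") as out:
        out.write(fasta)
    return path
```

Applied to the sequence.gb record in this document, the output file would be named NM_001250672.txt, as the question requires.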
Q4) Write a program that asks the user for a file containing a nucleotide sequence AND the name of a restriction enzyme. Your program should return the positions in the sequence where the enzyme cuts. Parse out the enzymes and their cut sites from the attached RestrictionEnzymes.txt file.
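The parsing and search for Q4 might look like the sketch below. It assumes each line of RestrictionEnzymes.txt holds an enzyme name and a recognition site separated by whitespace, that an apostrophe-like character (`'`, `‘` or `’`) marks the cut, and that ambiguity letters follow the IUPAC nucleotide code. Only the given strand is searched. A cut position is reported as the number of bases to the left of the cut, which is one possible convention among several.

```python
import re

# IUPAC nucleotide codes expressed as regular-expression classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
CUT_MARK = "['\u2018\u2019]"  # straight or curly apostrophe


def parse_enzymes(lines):
    """Map enzyme name -> (site without marker, cut offset or None)."""
    enzymes = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        name, site = parts
        mark = re.search(CUT_MARK, site)
        cut = mark.start() if mark else None
        enzymes[name] = (re.sub(CUT_MARK, "", site).upper(), cut)
    return enzymes


def cut_positions(seq, site, cut):
    """Return cut positions as the count of bases left of each cut."""
    if cut is None:
        raise ValueError("no cut marker given for this site")
    # A lookahead lets overlapping recognition sites all be found.
    pattern = "(?=" + "".join(IUPAC[base] for base in site) + ")"
    return [m.start() + cut for m in re.finditer(pattern, seq.upper())]
```

For instance, with the `AatII GACGT’C` entry, `cut_positions("TTGACGTCAA", "GACGTC", 5)` returns `[7]`, meaning the cut falls after the seventh base.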
Q5) Read in a whole genome (in FASTA format; file called genome.txt, see attached) and compute the background codon frequencies. The background frequency of a codon is computed by the formula: background_frq(codon) = 100 * N(codon) / Total_codons, where N(codon) is the number of occurrences of the codon across the entire genome, and Total_codons is the total number of all codons in the whole genome. Print out the background frequency of each codon, from AAA to TTT. Use a dictionary in your solution. Your program should count codons that appear in all reading frames and then calculate and display the average.
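The Q5 formula can be illustrated with a short dictionary-based sketch. The question's wording about "all reading frames" is open to interpretation. This version assumes the three forward frames, computes 100 * N(codon) / Total_codons separately in each frame, and averages the three percentages. Codons containing anything other than A, C, G or T are skipped. All function names are illustrative.

```python
def codon_counts(seq, frame):
    """Count the 64 codons in one forward frame (offset 0, 1 or 2)."""
    counts = {a + b + c: 0
              for a in "ACGT" for b in "ACGT" for c in "ACGT"}
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in counts:  # skips codons with N or other symbols
            counts[codon] += 1
    return counts


def background_frequencies(seq, frames=(0, 1, 2)):
    """Average percentage of each codon over the given frames."""
    seq = seq.upper()
    averages = {a + b + c: 0.0
                for a in "ACGT" for b in "ACGT" for c in "ACGT"}
    for frame in frames:
        counts = codon_counts(seq, frame)
        total = sum(counts.values())
        if total == 0:
            continue  # frame too short to hold a codon
        for codon, n in counts.items():
            averages[codon] += 100.0 * n / total / len(frames)
    return averages


def print_frequencies(averages):
    """Print codons in alphabetical order, from AAA to TTT."""
    for codon in sorted(averages):
        print(codon, round(averages[codon], 3))
```

Because the dictionary keys are sorted before printing, the output runs from AAA to TTT regardless of the order in which codons were encountered.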
>NM_001250672.2 Glycine max cationic peroxidase 2 (PRX2), mRNA
GAGCAAGAGTGAAGAGCGAAGAGAATGGCTCCCAAGGGTTTAATCTTTTTGGCTGTGTTATGCTTCTCAG
CACTGTCACTGAGTCGTTGTCTTGCGGAGGATAATGGACTTGTTATGAACTTCTACAAGGAATCATGCCC
TCAGGCTGAAGACATCATCAAAGAACAAGTCAAGCTTCTCTACAAGCGCCACAAGAACACTGCTTTCTCC
TGGCTCAGAAACATCTTCCATGACTGTGCTGTTCAGAGTTGTGATGCTTCACTGTTGCTGGACTCCACAA
GAAGGAGCTTGTCTGAGAAGGAAACAGATAGAAGCTTTGGGTTGAGAAATTTCAGGTACATTGAGACCAT
CAAAGAAGCTTTGGAAAGGGAATGCCCAGGAGTTGTTTCCTGTGCTGATATCCTCGTTCTCTCTGCCAGA
GATGGCATTGTTTCGCTAGGAGGTCCCCATATCCCTCTTAAAACAGGAAGAAGGGATGGTAGAAGGAGCA
GAGCCGATGTGGTAGAGCAGTTCCTCCCAGACCACAATGAATCCATTTCTGCAGTTCTTGACAAGTTTGG
TGCCATGGGAATTGACACCCCCGGCGTAGTTGCATTGCTTGGAGCACACAGTGTTGGTCGAACCCATTGT
GTGAAGTTGGTGCACCGTTTGTACCCAGAGATTGATCCAGCTCTGAACCCTGACCACGTCCCTCACATTC
TGAAGAAGTGCCCTGATGCCATTCCAGACCCTAAGGCCGTGCAGTACGTGAGAAACGACCGTGGCACCCC
CATGATTCTAGACAACAATTACTACAGAAATATATTGGACAACAAGGGCTTGTTGATAGTGGATCACCAA
CTAGCCAATGACAAGAGGACCAAGCCTTATGTGAAGAAAATGGCCAAGAGCCAGGACTATTTCTTCAAGG
AGTTTTCTAGAGCCATTACTTTGCTCTCTGAGAACAACCCTCTCACTGGCACAAAGGGTGAGATCAGAAA
GCAGTGCAATGCTGCCAACAAGCACCATGAGGAGCCTTAATTGCTTCCCGCTTAATTTGGGCCTTGAATT
TTCTTCCCCTTCTCTATGTGGAAGAAATCTGTAAGATATTATGCAAAAAATAATTAAGGTGTTTTTCTTT
AAATGGGTTGGTTGATTGGTTCAATGAACCGATCAAGACCACAGCAGGTTCATGGGGATGCGAGGATTAA
GACGCTTTGTTTTTTAATCTTCCGATGTCACTCTTGTTTGTTAGTTTGTTTTTTTATTTTTTATTTCAAT
AAGTACTGTGCAAGTAGGTTAGAGTTGGGTAGAAGGGCATGTTCATGGTGTTAATTACTATGTTATGTAT
GCATGTGAGTGCTGCTATCGATGGCAAGATGTCAATGTATGCGTGTAGTGCTGTTATCGATGAGAGTGAA
AATGTTTATGATATCCACACTAATAAAGCTAGCTTGCTCTTGCTACATAATAAATAAATCATGGCCCACG
GTCATTATACAAAAAAAAAAAAAAAAAA
AanI TTA’TAA
AarI CACCTGCNNNN’NNNN
AasI GACNNNN’NNGTC
AatI AGG’CCT
AatII GACGT’C
AbsI CC’TCGAGG
AccI GT’MKAC
AccII CG’CG
AccIII T’CCGGA
Acc16I TGC’GCA
Acc36I ACCTGCNNNN’NNNN
Acc65I G’GTACC
AccB1I G’GYRCC
AccB7I CCANNNN’NTGG
AccBSI CCG’CTC
AceIII CAGCTCNNNNNNN’NNNN
AciI C’CGC
AclI AA’CGTT
AclWI GGATCNNNN’N
AcoI Y’GGCCR
AcsI R’AATTY
AcuI CTGAAGNNNNNNNNNNNNNNNN’
AcvI CAC’GTG
AcyI GR’CGYC
AdeI CACNNN’GTG
AfaI GT’AC
AfeI AGC’GCT
AfiI CCNNNNN’NNGG
AflII C’TTAAG
AflIII A’CRYGT
AgeI A’CCGGT
AgsI TTS’AA
AhdI GACNNN’NNGTC
AhlI A’CTAGT
AjiI CAC’GTC
AjnI ‘CCWGG
AjuI GAANNNNNNNTTGGNNNNNNNNNNN’
AleI CCAANNNNNNNTTCNNNNNNNNNNNN’
AlfI CACNN’NNGTG
AloI GCANNNNNNTGCNNNNNNNNNNNN’
AluI GAACNNNNNNTCCNNNNNNNNNNNN’
AluBI GGANNNNNNGTTCNNNNNNNNNNNN’
AlwI AG’CT
Alw21I AG’CT
Alw26I GGATCNNNN’N
Alw44I GWGCW’C
AlwFI GTCTCN’NNNN
AlwNI G’TGCAC
Ama87I GAAAYNNNNNRTG
Aor13HI CAGNNN’CTG
Aor51HI C’YCGRG
AoxI T’CCGGA
ApaI AGC’GCT
ApaBI ‘GGCC
ApaLI GGGCC’C
ApeKI GCANNNNN’TGC
ApoI G’TGCAC
ApyPI G’CWGC
AquII R’AATTY
AquIII ATCGACNNNNNNNNNNNNNNNNNNNN’
AquIV GCCGNACNNNNNNNNNNNNNNNNNNNN’
ArsI GAGGAGNNNNNNNNNNNNNNNNNNNN’
AscI GRGGAAGNNNNNNNNNNNNNNNNNNN’
AseI GACNNNNNNTTYGNNNNNNNNNNN’
Asi256I CRAANNNNNNGTCNNNNNNNNNNNNN’
AsiGI GG’CGCGCC
AsiSI AT’TAAT
AspI G’ATC
Asp700I A’CCGGT
Asp718I GCGAT’CGC
AspA2I GACN’NNGTC
AspCNI GAANN’NNTTC
AspEI G’GTACC
AspLEI C’CTAGG
AspS9I GCCGC
AssI GACNNN’NNGTC
AsuII GCG’C
AsuC2I G’GNCC
AsuHPI AGT’ACT
AsuNHI TT’CGAA
AvaI CC’SGG
AvaII GGTGANNNNNNNN’
AvaIII G’CTAGC
AviII C’YCGRG
AvrII G’GWCC
AxyI ATGCAT
BaeI TGC’GCA
BaeGI C’CTAGG
BalI CC’TNAGG
BamHI ACNNNNGTAYCNNNNNNNNNNNN’
BanI GRTACNNNNGTNNNNNNNNNNNNNNN’
BanII GKGCM’C
BanIII TGG’CCA
BarI G’GATCC
BasI G’GYRCC
BauI GRGCY’C
BbeI AT’CGAT
Bbr7I GAAGNNNNNNTACNNNNNNNNNNNN’
BbrPI GTANNNNNNCTTCNNNNNNNNNNNN’
BbsI CCANNNN’NTGG
BbuI C’ACGAG
BbvI GGCGC’C
Bbv12I GAAGACNNNNNNN’NNNN
BbvCI CAC’GTG
BccI GAAGACNN’NNNN
BceAI GCATG’C
BcefI GCAGCNNNNNNNN’NNNN
BcgI GWGCW’C
BciVI CC’TCAGC
BclI CCATCNNNN’N
BcnI ACGGCNNNNNNNNNNNN’NN
BcuI ACGGCNNNNNNNNNNNN’N
BdaI CGANNNNNNTGCNNNNNNNNNNNN’
BfaI GCANNNNNNTCGNNNNNNNNNNNN’
BfiI GTATCCNNNNNN’
BfmI T’GATCA
BfoI CC’SGG
BfrI A’CTAGT
BfuI TGANNNNNNTCANNNNNNNNNNNN’
BfuAI C’TAG
BfuCI ACTGGGNNNNN’
BglI C’TRYAG
BglII RGCGC’Y
BisI C’TTAAG
BlnI GTATCCNNNNNN’
BlpI ACCTGCNNNN’NNNN
BlsI ‘GATC
BmcAI GCCNNNN’NGGC
Bme18I A’GATCT
Bme1390I GC’NGC
BmeRI C’CTAGG
BmeT110I GC’TNAGC
BmgI GCN’GC
BmgBI AGT’ACT
BmgT120I G’GWCC
BmiI CC’NGG
BmrI GACNNN’NNGTC
BmrFI C’YCGRG
BmsI GKGCCC
BmtI CAC’GTC
BmuI GG’NCC
BoxI GGN’NCC
BpiI ACTGGGNNNNN’
BplI CC’NGG
BpmI GCATCNNNNN’NNNN
Bpu10I GCTAG’C
Bpu14I ACTGGGNNNNN’
Bpu1102I GACNN’NNGTC
BpuAI GAAGACNN’NNNN
BpuEI GAGNNNNNCTCNNNNNNNNNNNNN’
BpuMI CTGGAGNNNNNNNNNNNNNNNN’
BpvUI CC’TNAGC
BsaI TT’CGAA
Bsa29I GC’TNAGC
BsaAI GAAGACNN’NNNN
BsaBI CTTGAGNNNNNNNNNNNNNNNN’
BsaHI CC’SGG
BsaJI CGAT’CG
BsaMI GGTCTCN’NNNN
BsaWI AT’CGAT
BsaXI YAC’GTR
BsbI GATNN’NNATC
Bsc4I GR’CGYC
BscAI C’CNNGG
BscGI GAATGCN’
Bse1I W’CCGGW
Bse8I ACNNNNNCTCCNNNNNNNNNN’
Bse21I GGAGNNNNNGTNNNNNNNNNNNN’
Bse118I CAACACNNNNNNNNNNNNNNNNNNNNN’
BseAI CCNNNNN’NNGG
BseBI GCATCNNNN’NN
BseCI CCCGT
BseDI ACTGGN’
Bse3DI GATNN’NNATC
BseGI CC’TNAGG
BseJI R’CCGGY
BseLI T’CCGGA
BseMI CC’WGG
BseMII AT’CGAT
BseNI C’CNNGG
BsePI GCAATGNN’
BseRI GGATGNN’
BseSI GATNN’NNATC
BseXI CCNNNNN’NNGG
BseX3I GCAATGNN’
BseYI CTCAGNNNNNNNNNN’
BsgI ACTGGN’
Bsh1236I G’CGCGC
Bsh1285I GAGGAGNNNNNNNNNN’
BshFI GKGCM’C
BshNI GCAGCNNNNNNNN’NNNN
BshTI C’GGCCG
BshVI C’CCAGC
BsiEI GTGCAGNNNNNNNNNNNNNNNN’
BsiHKAI CG’CG
BsiHKCI CGRY’CG
BsiSI GG’CC
BsiWI G’GYRCC
BslI A’CCGGT
BslFI AT’CGAT
BsmI CGRY’CG
BsmAI GWGCW’C
BsmBI C’YCGRG
BsmFI C’CGG
BsnI C’GTACG
Bso31I CCNNNNN’NNGG
BsoBI GGGACNNNNNNNNNN’NNNN
Bsp13I GAATGCN’
Bsp19I GTCTCN’NNNN
Bsp24I CGTCTCN’NNNN
Bsp68I GGGACNNNNNNNNNN’NNNN
Bsp119I GG’CC
Bsp120I GGTCTCN’NNNN
Bsp143I C’YCGRG
Bsp1286I T’CCGGA
Bsp1407I C’CATGG
Bsp1720I GACNNNNNNTGGNNNNNNNNNNNN’
BspACI CCANNNNNNGTCNNNNNNNNNNNNN’
BspCNI TCG’CGA
BspDI TT’CGAA
BspD6I G’GGCCC
BspEI ‘GATC
BspFNI GDGCH’C
BspGI T’GTACA
BspHI GC’TNAGC
BspLI C’CGC
BspMI CTCAGNNNNNNNNN’
BspNCI AT’CGAT
BspOI GACTCNNNN’NN
BspPI T’CCGGA
BspQI CG’CG
BspTI CTGGAC
BspT104I T’CATGA
BspT107I GGN’NCC
BspTNI ACCTGCNNNN’NNNN
BsrI CCAGA
BsrBI GCTAG’C
BsrDI GGATCNNNN’N
BsrFI GCTCTTCN’NNN
BsrGI C’TTAAG
BsrSI TT’CGAA
BssAI G’GYRCC
BssECI GGTCTCN’NNNN
BssHII ACTGGN’
BssKI CCG’CTC
BssMI GCAATGNN’
BssNI R’CCGGY
BssNAI T’GTACA
BssSI ACTGGN’
BssT1I R’CCGGY
Bst6I C’CNNGG
Bst98I G’CGCGC
Bst1107I ‘CCNGG
BstACI ‘GATC
BstAFI GR’CGYC
BstAPI GTA’TAC
BstAUI C’ACGAG
BstBI C’CWWGG
Bst2BI CTCTTCN’NNN
BstBAI C’TTAAG
Bst4CI GTA’TAC
BstC8I GR’CGYC
BstDEI C’TTAAG
BstDSI GCANNNN’NTGC
BstEII T’GTACA
BstENI TT’CGAA
BstF5I C’ACGAG
BstFNI YAC’GTR
BstH2I ACN’GT
BstHHI GCN’NGC
BstKTI C’TNAG
BstMAI C’CRYGG
BstMBI G’GTNACC
BstMCI CCTNN’NNNAGG
BstMWI GGATGNN’
BstNI CG’CG
BstNSI RGCGC’Y
BstOI GCG’C
BstPI GAT’C
BstPAI GTCTCN’NNNN
BstSCI ‘GATC
BstSFI CGRY’CG
BstSLI GCNNNNN’NNGC
BstSNI CC’WGG
BstUI RCATG’Y
Bst2UI CC’WGG
BstV1I G’GTNACC
BstV2I GACNN’NNGTC
BstXI ‘CCNGG
BstX2I C’TRYAG
BstYI GKGCM’C
BstZI TAC’GTA
BstZ17I CG’CG
BsuI CC’WGG
Bsu15I GCAGCNNNNNNNN’NNNN
Bsu36I GAAGACNN’NNNN
BsuRI CCANNNNN’NTGG
BsuTUI R’GATCY
BtgI R’GATCY
BtgZI C’GGCCG
BthCI GTA’TAC
BtrI GTATCCNNNNNN’
BtsI AT’CGAT
BtsCI CC’TNAGG
BtuMI GG’CC
BveI AT’CGAT
Cac8I C’CRYGG
CaiI GCGATGNNNNNNNNNN’NNNN
CciI GCNG’C
CciNI CAC’GTC
CdiI GCAGTGNN’
CdpI GGATGNN’
CelII TCG’CGA
CfoI ACCTGCNNNN’NNNN
CfrI GCN’NGC
Cfr9I CAGNNN’CTG
Cfr10I T’CATGA
Cfr13I GC’GGCCGC
Cfr42I CATC’G
ChaI GCGGAGNNNNNNNNNNNNNNNNNNNN’
CjeI GC’TNAGC
CjeNII GCG’C
CjePI Y’GGCCR
CjeP659IV C’CCGGG
CjuI R’CCGGY
CjuII G’GNCC
ClaI CCGC’GG
CpoI GATC’
CseI CCANNNNNNGTNNNNNNNNNNNNNNN’
CsiI ACNNNNNNTGGNNNNNNNNNNNNNN’
CspI GAGNNNNNGT
Csp6I CCANNNNNNNTCNNNNNNNNNNNNNN’
Csp45I GANNNNNNNTGGNNNNNNNNNNNNN’
CspAI CACNNNNNNNGAA
CspCI CAYNNNNNRTG
CstMI CAYNNNNNCTC
CviAII AT’CGAT
CviJI CG’GWCCG
CviKI-1 GACGCNNNNN’NNNNN
CviQI A’CCWGGT
DdeI CG’GWCCG
DinI G’TAC
DpnI TT’CGAA
DpnII A’CCGGT
DraI CAANNNNNGTGGNNNNNNNNNNNN’
DraII CCACNNNNNTTGNNNNNNNNNNNNN’
DraIII AAGGAGNNNNNNNNNNNNNNNNNNNN’
DraRI C’ATG
DrdI RG’CY
DrdII RG’CY
DrdIV G’TAC
DriI C’TNAG
DseDI GGC’GCC
EaeI GA’TC
EagI ‘GATC
Eam1104I TTT’AAA
Eam1105I RG’GNCCY
EarI CACNNN’GTG
EciI CAAGNACNNNNNNNNNNNNNNNNNNNN’
Ecl136II GACNNNN’NNGTC
EclXI GAACCA
Eco24I TACGACNNNNNNNNNNNNNNNNNNNN’
Eco31I GACNNN’NNGTC
Eco32I GACNNNN’NNGTC
Eco47I Y’GGCCR
Eco47III C’GGCCG
Eco52I CTCTTCN’NNN
Eco57I GACNNN’NNGTC
Eco72I CTCTTCN’NNN
Eco81I GGCGGANNNNNNNNNNN’
Eco88I GAG’CTC
Eco91I C’GGCCG
Eco105I GRGCY’C
Eco130I GGTCTCN’NNNN
Eco147I GAT’ATC
EcoHI G’GWCC
EcoICRI AGC’GCT
Eco57MI C’GGCCG
EcoNI CTGAAGNNNNNNNNNNNNNNNN’
EcoO65I CAC’GTG
EcoO109I CC’TNAGG
EcoRI C’YCGRG
EcoRII G’GTNACC
EcoRV TAC’GTA
EcoT14I C’CWWGG
EcoT22I AGG’CCT
EcoT38I ‘CCSGG
Eco53kI GAG’CTC
EgeI CTGRAGNNNNNNNNNNNNNNNN’
EheI CCTNN’NNNAGG
ErhI G’GTNACC
EsaBC3I RG’GNCCY
EsaSSI G’AATTC
Esp3I ‘CCWGG
FaeI GAT’ATC
FaiI C’CWWGG
FalI ATGCA’T
FaqI GRGCY’C
FatI GAG’CTC
FauI GGC’GCC
FauNDI GGC’GCC
FbaI C’CWWGG
FblI TC’GA
FinI GACCAC
FmuI CGTCTCN’NNNN
Fnu4HI CATG’
FokI YA’TR
FriOI AAGNNNNNCTTNNNNNNNNNNNNN’
FseI GGGACNNNNNNNNNN’NNNN
FspI ‘CATG
FspAI CCCGCNNNN’NN
FspBI CA’TATG
FspEI T’GATCA
Fsp4HI GT’MKAC
GdiII GGGAC
GlaI GGNC’C
GluI GC’NGC
GsaI GGATGNNNNNNNNN’NNNN
GsuI GRGCY’C
HaeI GGCCGG’CC
HaeII TGC’GCA
HaeIII RTGC’GCAY
HaeIV C’TAG
HapII CCNNNNNNNNNNNN’NNNN
HgaI GC’NGC
HgiEII C’GGCCR
HhaI GC’GC
Hin1I GC’NGC
Hin1II CCCAG’C
Hin4I CTGGAGNNNNNNNNNNNNNNNN’
Hin6I WGG’CCW
HinP1I RGCGC’Y
HincII GG’CC
HindII GAYNNNNNRTCNNNNNNNNNNNNNN’
HindIII GAYNNNNNRTCNNNNNNNNNNNNN’
HinfI C’CGG
HpaI GACGCNNNNN’NNNNN
HpaII ACCNNNNNNGGT
HphI GCG’C
Hpy8I GR’CGYC
Hpy99I CATG’
Hpy166II GAYNNNNNVTCNNNNNNNNNNNNN’
Hpy188I GABNNNNNRTCNNNNNNNNNNNNN’
Hpy188III G’CGC
HpyAV G’CGC
HpyCH4III GTY’RAC
HpyCH4IV GTY’RAC
HpyCH4V A’AGCTT
HpyF3I G’ANTC
HpyF10VI GTT’AAC
Hsp92I C’CGG
Hsp92II GGTGANNNNNNNN’
HspAI GTN’NAC
KasI CGWCG’
KflI GTN’NAC
KpnI TCN’GA
Kpn2I TC’NNGA
KspI CCTTCNNNNNN’
Ksp22I ACN’GT
KspAI A’CGT
Kzo9I TG’CA
LguI C’TNAG
LlaGI GCNNNNN’NNGC
LpnI GR’CGYC
LpnPI CATG’
Lsp1109I G’CGC
LweI G’GCGCC
MabI GG’GWCCC
MaeI GGTAC’C
MaeII T’CCGGA
MaeIII CCGC’GG
MalI T’GATCA
MaqI GTT’AAC
MauBI ‘GATC
MbiI GCTCTTCN’NNN
MboI CTNGAYG
MboII RGC’GCY
McaTI CCDGNNNNNNNNNN’NNNN
MfeI GCAGCNNNNNNNN’NNNN
MflI GCATCNNNNN’NNNN
MhlI A’CCWGGT
MjaIV C’TAG
MlsI A’CGT
MluI ‘GTNAC
MluNI GA’TC
MlyI CRTTGACNNNNNNNNNNNNNNNNNNNNN’
Mly113I CG’CGCGCG
MmeI CCG’CTC
MnlI ‘GATC
Mph1103I GAAGANNNNNNNN’
MreI GCGC’GC
MroI C’AATTG
MroNI R’GATCY
MroXI GDGCH’C
MscI GTNNAC
MseI TGG’CCA
MslI A’CGCGT
MspI TGG’CCA
Msp20I GAGTCNNNNN’
MspA1I GG’CGCC
MspCI TCCRACNNNNNNNNNNNNNNNNNNNN’
MspJI CCTCNNNNNNN’
MspR9I ATGCA’T
MssI CG’CCGGCG
MunI T’CCGGA
MvaI G’CCGGC
Mva1269I GAANN’NNTTC
MvnI TGG’CCA
MvrI T’TAA
MwoI CAYNN’NNRTG
NaeI C’CGG
NarI TGG’CCA
NciI CMG’CKG
NcoI C’TTAAG
NdeI CNNRNNNNNNNNN’NNNN
NdeII CC’NGG
NgoAVIII GTTT’AAAC
NgoMIV C’AATTG
NhaXI CC’WGG
NheI GAATGCN’
NlaIII CG’CG
NlaIV CGAT’CG
NlaCI GCNNNNN’NNGC
Nli3877I GCC’GGC
NmeAIII GG’CGCC
NmeDI CC’SGG
NmuCI C’CATGG
NotI CA’TATG
NruI ‘GATC
NsbI GACNNNNNTGANNNNNNNNNNNNN’
NsiI TCANNNNNGTCNNNNNNNNNNNNNN’
NspI G’CCGGC
NspV CAAGRAG
OliI G’CTAGC
PabI CATG’
PacI GGN’NCC
PaeI CATCACNNNNNNNNNNNNNNNNNNN’
PaeR7I CYCGR’G
PagI GCCGAGNNNNNNNNNNNNNNNNNNNNN’
PalAI RCCGGYNNNNNNN’NNNNN
PasI ‘GTSAC
PauI GC’GGCCGC
PceI TCG’CGA
PciI TGC’GCA
PciSI ATGCA’T
PcsI RCATG’Y
PctI TT’CGAA
PdiI CACNN’NNGTG
PdmI GTA’C
PfeI TTAAT’TAA
Pfl23II GCATG’C
Pfl1108I C’TCGAG
PflFI T’CATGA
PflMI GG’CGCGCC
PfoI CC’CWGGG
PhoI G’CGCGC
PinAI AGG’CCT
PlaDI A’CATGT
PleI GCTCTTCN’NNN
Ple19I WCGNNNN’NNNCGW
PmaCI GAATGCN’
PmeI GCC’GGC
PmlI GAANN’NNTTC
PpiI G’AWTC
PpsI C’GTACG
Ppu10I TCGTAG
Ppu21I GACN’NNGTC
PpuMI CCANNNN’NTGG
PscI T’CCNGGA
PshAI GG’CC
PshBI A’CCGGT
PsiI CATCAGNNNNNNNNNNNNNNNNNNNNN’
Psp03I GAGTCNNNN’N
Psp5II CGAT’CG
Psp6I CAC’GTG
Psp1406I GTTT’AAAC
Psp124BI CAC’GTG
PspCI GAACNNNNNCTCNNNNNNNNNNNNN’
PspEI GAGNNNNNGTTCNNNNNNNNNNNN’
PspGI GAGTCNNNN’N
PspLI A’TGCAT
PspN4I YAC’GTR
PspOMI RG’GWCCY
PspOMII A’CATGT
PspPI GACNN’NNGTC
PspPPI AT’TAAT
PspPRI TTA’TAA
PspXI GGWC’C
PsrI RG’GWCCY
PssI ‘CCWGG
PstI AA’CGTT
PstNI GAGCT’C
PsuI CAC’GTG
PsyI G’GTNACC
PteI ‘CCWGG
PvuI C’GTACG
PvuII GGN’NCC
RcaI G’GGCCC
RceI CGCCCARNNNNNNNNNNNNNNNNNNNN’
RgaI G’GNCC
RigI RG’GWCCY
RleAI CCYCAGNNNNNNNNNNNNNNN’
RpaBI VC’TCGAGB
RpaB5I GAACNNNNNNTACNNNNNNNNNNNN’
RruI GTANNNNNNGTTCNNNNNNNNNNNN’
RsaI RGGNC’CY
RsaNI CTGCA’G
RseI CAGNNN’CTG
RsrII R’GATCY
Rsr2I GACN’NNGTC
SacI G’CGCGC
SacII CGAT’CG
SalI CAG’CTG
SapI T’CATGA
SaqAI CATCGACNNNNNNNNNNNNNNNNNNNN’
SatI GCGAT’CGC
Sau96I GGCCGG’CC
Sau3AI CCCACANNNNNNNNNNNN’
SbfI CCCGCAGNNNNNNNNNNNNNNNNNNNN’
ScaI CGRGGACNNNNNNNNNNNNNNNNNNNN’
SchI TCG’CGA
SciI GT’AC
ScrFI G’TAC
SdaI CAYNN’NNRTG
SdeAI CG’GWCCG
SdeOSI CG’GWCCG
SduI GAGCT’C
SelI CCGC’GG
SetI G’TCGAC
SexAI GCTCTTCN’NNN
SfaAI T’TAA
SfaNI GC’NGC
SfcI G’GNCC
SfiI ‘GATC
SfoI CCTGCA’GG
Sfr274I AGT’ACT
Sfr303I GAGTCNNNNN’
SfuI CTC’GAG
SgeI CC’NGG
SgfI CCTGCA’GG
SgrAI CAGRAGNNNNNNNNNNNNNNNNNNNNN’
SgrBI GACNNNNRTGANNNNNNNNNNNN’
SgrDI TCAYNNNNGTCNNNNNNNNNNNNN’
SgsI GDGCH’C
SimI ‘CGCG
SinI ASST’
SlaI A’CCWGGT
SmaI GCGAT’CGC
SmiI GCATCNNNNN’NNNN
SmiMI C’TRYAG
SmlI GGCCNNNN’NGGCC
SmoI GGC’GCC
SmuI C’TCGAG
SnaI CCGC’GG
SnaBI TT’CGAA
SpeI CNNGNNNNNNNNN’NNNN
SphI GCGAT’CGC
SpoDI CR’CCGGYG
SrfI CCGC’GG
Sse9I CG’TCGACG
Sse8387I GG’CGCGCC
Sse8647I GG’GTC
SseBI G’GWCC
SsiI C’TCGAG
SspI CCC’GGG
SspDI ATTT’AAAT
SspD5I CAYNN’NNRTG
SstI C’TYRAG
SstII C’TYRAG
SstE37I CCCGCNNNN’NN
Sth132I GTATAC
Sth302II TAC’GTA
StrI A’CTAGT
StsI GCATG’C
StuI GCGGRAG
StyI GCCC’GGGC
StyD4I ‘AATT
SwaI CCTGCA’GG
TaaI AG’GWCCT
TaiI AGG’CCT
TaqI C’CGC
TaqII AAT’ATT
TasI G’GCGCC
TatI GGTGANNNNNNNN’
TauI GAGCT’C
TfiI CCGC’GG
TliI CGAAGACNNNNNNNNNNNNNNNNNNNN’
Tru1I CCCGNNNN’NNNN
Tru9I CC’GG
TscAI C’TCGAG
TseI GGATGNNNNNNNNNN’NNNN
TsoI AGG’CCT
Tsp45I C’CWWGG
Tsp509I ‘CCNGG
TspDTI ATTT’AAAT
TspEI ACN’GT
TspGWI ACGT’
TspMI T’CGA
TspRI GACCGANNNNNNNNNNN’
TssI CACCCANNNNNNNNNNN’
TstI ‘AATT
TsuI W’GTACW
Tth111I GCSG’C
Tth111II G’AWTC
UbaF9I C’TCGAG
UbaF11I T’TAA
UbaF12I T’TAA
UbaF13I CASTGNN’
UbaF14I G’CWGC
UbaPI TARCCANNNNNNNNNNN’
UnbI ‘GTSAC
Van91I ‘AATT
Vha464I ATGAANNNNNNNNNNN’
VneI ‘AATT
VpaK11AI ACGGANNNNNNNNNNN’
VpaK11BI C’CCGGG
VspI CASTGNN’
XagI GAGNNNCTC
XapI CACNNNNNNTCCNNNNNNNNNNNN’
XbaI GGANNNNNNGTGNNNNNNNNNNNNN’
XceI GCGAC
XcmI GACN’NNGTC
XhoI CAARCANNNNNNNNNNN’
XhoII TACNNNNNRTGT
XmaI TCGTA
XmaJI CTACNNNGTC
XmiI GAGNNNNNNCTGG
XmnI CCANNNNNTCG
XspI CGAACG
ZraI ‘GGNCC
ZrmI CCANNNN’NTGG
Zsp2I C’TTAAG
…
LOCUS NM_001250672 1498 bp mRNA linear PLN 18-OCT-2018
DEFINITION Glycine max cationic peroxidase 2 (PRX2), mRNA.
ACCESSION NM_001250672
VERSION NM_001250672.2
KEYWORDS RefSeq.
SOURCE Glycine max (soybean)
ORGANISM Glycine max
Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
Spermatophyta; Magnoliopsida; eudicotyledons; Gunneridae;
Pentapetalae; rosids; fabids; Fabales; Fabaceae; Papilionoideae; 50
kb inversion clade; NPAAA clade; indigoferoid/millettioid clade;
Phaseoleae; Glycine; Glycine subgen. Soja.
REFERENCE 1 (bases 1 to 1498)
AUTHORS Gijzen M, Miller SS, Bowman LA, Batchelor AK, Boutilier K and Miki
BL.
TITLE Localization of peroxidase mRNAs in soybean seeds by in situ
hybridization
JOURNAL Plant Mol. Biol. 41 (1), 57-63 (1999)
PUBMED 10561068
COMMENT PROVISIONAL REFSEQ: This record has not yet been subject to final
NCBI review. The reference sequence was derived from BT099403.1.
On Sep 4, 2012 this sequence version replaced NM_001250672.1.
##Evidence-Data-START##
Transcript exon combination :: AK244214.1, AK286032.1 [ECO:0000332]
RNAseq introns :: single sample supports all introns
SAMN00264986, SAMN00264988
[ECO:0000348]
##Evidence-Data-END##
PRIMARY REFSEQ_SPAN PRIMARY_IDENTIFIER PRIMARY_SPAN COMP
1-1498 BT099403.1 1-1498 c
FEATURES Location/Qualifiers
source 1..1498
/organism=”Glycine max”
/mol_type=”mRNA”
/db_xref=”taxon:3847″
/chromosome=”17″
/map=”17″
gene 1..1498
/gene=”PRX2″
/note=”cationic peroxidase 2″
/db_xref=”GeneID:547513″
misc_feature 10..12
/gene=”PRX2″
/note=”upstream in-frame stop codon”
CDS 25..1020
/gene=”PRX2″
/EC_number=”1.11.1.7″
/note=”class III plant peroxidase”
/codon_start=1
/product=”cationic peroxidase 2 precursor”
/protein_id=”NP_001237601.1″
/db_xref=”GeneID:547513″
/translation=”MAPKGLIFLAVLCFSALSLSRCLAEDNGLVMNFYKESCPQAEDI
IKEQVKLLYKRHKNTAFSWLRNIFHDCAVQSCDASLLLDSTRRSLSEKETDRSFGLRN
FRYIETIKEALERECPGVVSCADILVLSARDGIVSLGGPHIPLKTGRRDGRRSRADVV
EQFLPDHNESISAVLDKFGAMGIDTPGVVALLGAHSVGRTHCVKLVHRLYPEIDPALN
PDHVPHILKKCPDAIPDPKAVQYVRNDRGTPMILDNNYYRNILDNKGLLIVDHQLAND
KRTKPYVKKMAKSQDYFFKEFSRAITLLSENNPLTGTKGEIRKQCNAANKHHEEP”
sig_peptide 25..96
/gene=”PRX2″
/inference=”COORDINATES: ab initio prediction:SignalP:4.0″
ORIGIN
1 gagcaagagt gaagagcgaa gagaatggct cccaagggtt taatcttttt ggctgtgtta
61 tgcttctcag cactgtcact gagtcgttgt cttgcggagg ataatggact tgttatgaac
121 ttctacaagg aatcatgccc tcaggctgaa gacatcatca aagaacaagt caagcttctc
181 tacaagcgcc acaagaacac tgctttctcc tggctcagaa acatcttcca tgactgtgct
241 gttcagagtt gtgatgcttc actgttgctg gactccacaa gaaggagctt gtctgagaag
301 gaaacagata gaagctttgg gttgagaaat ttcaggtaca ttgagaccat caaagaagct
361 ttggaaaggg aatgcccagg agttgtttcc tgtgctgata tcctcgttct ctctgccaga
421 gatggcattg tttcgctagg aggtccccat atccctctta aaacaggaag aagggatggt
481 agaaggagca gagccgatgt ggtagagcag ttcctcccag accacaatga atccatttct
541 gcagttcttg acaagtttgg tgccatggga attgacaccc ccggcgtagt tgcattgctt
601 ggagcacaca gtgttggtcg aacccattgt gtgaagttgg tgcaccgttt gtacccagag
661 attgatccag ctctgaaccc tgaccacgtc cctcacattc tgaagaagtg ccctgatgcc
721 attccagacc ctaaggccgt gcagtacgtg agaaacgacc gtggcacccc catgattcta
781 gacaacaatt actacagaaa tatattggac aacaagggct tgttgatagt ggatcaccaa
841 ctagccaatg acaagaggac caagccttat gtgaagaaaa tggccaagag ccaggactat
901 ttcttcaagg agttttctag agccattact ttgctctctg agaacaaccc tctcactggc
961 acaaagggtg agatcagaaa gcagtgcaat gctgccaaca agcaccatga ggagccttaa
1021 ttgcttcccg cttaatttgg gccttgaatt ttcttcccct tctctatgtg gaagaaatct
1081 gtaagatatt atgcaaaaaa taattaaggt gtttttcttt aaatgggttg gttgattggt
1141 tcaatgaacc gatcaagacc acagcaggtt catggggatg cgaggattaa gacgctttgt
1201 tttttaatct tccgatgtca ctcttgtttg ttagtttgtt tttttatttt ttatttcaat
1261 aagtactgtg caagtaggtt agagttgggt agaagggcat gttcatggtg ttaattacta
1321 tgttatgtat gcatgtgagt gctgctatcg atggcaagat gtcaatgtat gcgtgtagtg
1381 ctgttatcga tgagagtgaa aatgtttatg atatccacac taataaagct agcttgctct
1441 tgctacataa taaataaatc atggcccacg gtcattatac aaaaaaaaaa aaaaaaaa
//