Chem 432
Biochemistry
Spring 2002
Lecture Notes: 8 April
© R. Paselk 2002
Transcription IV
Exons and Introns
As noted last time, the primary mRNA transcript, hnRNA, is first processed
in the nucleus by capping the 5' end and by cleaving the 3' end and adding a
poly(A) tail to it. This is followed by additional processing
in which the introns are removed before the final mRNA, capped
and with a poly(A) tail, is transported to the cytosol for translation.
Note that for most eukaryotic genes the majority of the transcribed
RNA is never translated. Rather, about 80% on average is excised
as introns and degraded.
The process whereby the introns are excised, leaving the exons
strung together, is referred to as gene splicing (more commonly called
RNA splicing). Gene splicing is very precise (a single base error would
result in an unreadable transcript) and assembles the exons in sequence.
There seems to be a very high degree of sequence homology at
exon-intron junctions, which is necessary and sufficient for proper
excision. In most eukaryotes these conserved sequences include:
- An invariant GU at the intron 5' boundary.
- An invariant AG at the intron 3' boundary.
The actual excision of the intron occurs in two reactions:
- A 2',5'-phosphodiester bond is formed by the attack of the
2'-OH on the ribose of a specific A within the intron on the intron's
5' terminal phosphate, which releases the 3' end of the upstream (5')
exon. As a result the intron gains a lariat structure at its 5' end.
In vertebrates the A is within a highly conserved sequence, CURAY
(R = purine, Y = pyrimidine), located 20-50 residues upstream of the
3' end of the intron.
- The resultant free 3'-OH group of the upstream exon now attacks
the 5' phosphate of the downstream exon, forming a new phosphodiester
bond, releasing the intron, and creating the spliced product.
The intron is released as a lariat structure, which is rapidly
degraded.
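The boundary rules and the exon-joining step above can be sketched in code. This is only a minimal illustration working on plain strings, not a model of the spliceosome: the function names `splice` and `branch_points`, the half-open intron coordinates, and the way distance from the intron's 3' end is counted are all assumptions made for this example.

```python
import re

# CURAY in IUPAC notation: R = purine (A or G), Y = pyrimidine (C or U).
# A lookahead is used so that overlapping matches are also reported.
BRANCH_SITE = re.compile(r"(?=CU[AG]A[CU])")


def splice(pre_mrna, introns):
    """Remove introns, given as half-open (start, end) index pairs, and join the exons.

    Raises ValueError if an intron does not start with GU and end with AG.
    """
    exons = []
    position = 0
    for start, end in sorted(introns):
        intron = pre_mrna[start:end]
        if not (intron.startswith("GU") and intron.endswith("AG")):
            raise ValueError(f"intron {start}-{end} lacks GU...AG boundaries")
        exons.append(pre_mrna[position:start])  # exon upstream of this intron
        position = end
    exons.append(pre_mrna[position:])  # final exon
    return "".join(exons)


def branch_points(intron, min_dist=20, max_dist=50):
    """Return indices of candidate branch-point A residues in an intron.

    A candidate is the A of a CURAY match. Distance is counted here as
    len(intron) - index_of_A, which is an assumption of this sketch.
    """
    hits = []
    for match in BRANCH_SITE.finditer(intron):
        a_index = match.start() + 3  # the A in C-U-R-A-Y
        if min_dist <= len(intron) - a_index <= max_dist:
            hits.append(a_index)
    return hits


# Made-up sequences, for illustration only.
exon1, exon2 = "AUGGCC", "UUUUAA"
intron = "GU" + "A" * 10 + "CUGAC" + "U" * 25 + "AG"
pre_mrna = exon1 + intron + exon2

print(splice(pre_mrna, [(len(exon1), len(exon1) + len(intron))]))  # AUGGCCUUUUAA
print(branch_points(intron))  # [15]
```

The check on GU and AG mirrors the invariant boundaries listed above; a real splice-site predictor would need the wider consensus sequences and could not rely on these dinucleotides alone.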
Splicing is mediated by small nuclear ribonucleoproteins, that is,
small nuclear RNA-containing proteins (snRNPs or "snurps"). The small
nuclear RNAs (60-300 bases) in these snRNPs are highly conserved.
A number of snRNPs have known functions:
- The 5' end of the U1-snRNP's RNA is complementary to a consensus
sequence at the pre-mRNA's 5' splice junctions.
- U2-snRNP recognizes the intron lariat branch point.
- U5-snRNP recognizes the 3' splice junction.
The splicing itself takes place in the spliceosome.
This large particle (50-60S, about the size of the large ribosomal
subunit of E. coli, approx. 1.6 megadaltons) includes
a pre-mRNA, the snRNPs above, the U4-U6-snRNP (held together
by base-pairing), and a variety of pre-mRNA binding proteins.
In addition to the modifications already noted, hnRNA also
gets methylated at about 0.1% of its A residues,
many of which are retained in the final mRNA.
rRNA Processing
As we noted some time ago, rRNA in both E. coli and
eukaryotes is transcribed as a large precursor RNA which must be cleaved
to release the large and small ribosomal RNAs (and, in E. coli, the
5S RNA) of the ribosomes. Though similar, the two systems differ in
significant ways.
E. coli. There are seven polycistronic operons
containing nearly identical rRNA genes and up to four tRNA genes
each. The operon transcripts are cleaved by endonucleases.
- The rRNA primary transcript is first cleaved, as it is transcribed,
by a series of specific endonucleases:
  - RNase III action gives the pre-16S and pre-23S rRNAs.
  - RNase P, RNase E and RNase F action gives the pre-5S rRNA and
  some tRNAs.
- The 5' and 3' ends of the pre-rRNAs are then trimmed, after
they associate with ribosomal proteins, to give the final rRNAs:
  - RNase M16 trims 16S rRNA
  - RNase M23 trims 23S rRNA
  - RNase M5 trims 5S rRNA
Eukaryotes generally have hundreds of tandemly repeated
copies of the rRNA genes for the 5.8S, 18S and 28S RNAs.
- The transcript is arranged (5'-3') in the order: 18S, 5.8S,
28S, with intervening spacers, to give a 45S RNA of about 7500
nucleotides.
- Only a few eukaryotic rRNA genes have introns. Where they do
(as in Tetrahymena), the RNA processing is at least in part done
via self-splicing:
  - The 3'-OH of a free guanosine (or guanine nucleotide) attacks
  the intron's 5' end, forming a phosphodiester bond with it and
  liberating the 5' exon.
  - The 3'-OH of the newly liberated 5' exon forms a phosphodiester
  bond with the 5' terminal P of the 3' exon, thus splicing
  them together.
  - The 3'-OH of the intron forms a phosphodiester bond with the
  phosphate of the nucleotide 15 residues from the 5' end, giving
  a 5' terminal fragment, with the remainder of the intron cyclized.
- The 5S rRNA is processed separately and in similar fashion
to the tRNAs.
Translation
The Genetic Code
Major considerations in understanding the coding required to
translate the four base nucleic acid alphabet to the 20 amino
acid alphabet include:
- How many bases are used to determine each amino acid? Obviously
at least 20 codon "words" are needed. From a simple consideration
of possibilities:
  - A one base codon could code for a maximum of 4^1
  = 4 amino acids, clearly not sufficient.
  - A two base codon could code for a maximum of 4^2
  = 16 amino acids, still not enough.
  - A three base codon could code for a maximum of 4^3
  = 64 amino acids, which is more than adequate. Thus a three base
  codon is needed, but will be highly degenerate if all codons
  are used.
- Is the code punctuated - that is, are there signals between
codons indicating the beginnings and ends of codons? (For example,
one combination of bases could be set aside as a "period"
to separate the codons that are read.)
- Is the code overlapping? Thus we could imagine a triplet
code where every possible triplet is read such that ABCDABCD
might be read as: ABC, BCD, CDA, DAB, etc. instead of ABC, DAB,
etc.
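The counting argument and the two reading schemes in the list above can be checked with a few lines of Python. This is just an illustrative sketch; the function names `codon_capacity`, `minimum_codon_length` and `read_codons` are made up for this example.

```python
def codon_capacity(bases, length):
    """Number of distinct codons of a given length over an alphabet of `bases` letters."""
    return bases ** length


def minimum_codon_length(bases=4, amino_acids=20):
    """Smallest codon length whose capacity covers all the amino acids."""
    length = 1
    while codon_capacity(bases, length) < amino_acids:
        length += 1
    return length


def read_codons(sequence, size=3, overlapping=False):
    """Split a sequence into codons, stepping one base (overlapping) or one codon at a time."""
    step = 1 if overlapping else size
    return [sequence[i:i + size] for i in range(0, len(sequence) - size + 1, step)]


print([codon_capacity(4, n) for n in (1, 2, 3)])   # [4, 16, 64]
print(minimum_codon_length())                      # 3
print(read_codons("ABCDABCD", overlapping=True))   # ['ABC', 'BCD', 'CDA', 'DAB', 'ABC', 'BCD']
print(read_codons("ABCDABCD"))                     # ['ABC', 'DAB']
```

Note that in the overlapping scheme each base takes part in up to three codons, so a single base change could alter up to three adjacent amino acids.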
In fact the code has proven to be a non-overlapping, non-punctuated,
triplet code in which gene sequences are co-linear with peptide
sequences, and where 5' → 3' corresponds
to NH2 → COO-.
The code was originally elucidated in cell-free systems containing
the complete protein synthetic system (ribosomes, GTP, aminoacyl-tRNAs,
etc.) except for a messenger RNA. If poly(U) is then introduced
to the system, poly-Phe is produced, so one codon for Phe is
UUU. Each of the other three homopolynucleotides can be used similarly.
Next, alternating copolymers (e.g. UCUCUCUCUCUC) can be used, in which
case two different amino acids are coded, and so on. Finally,
investigators were able to synthesize and work with defined
triplets to get the entire code.
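The logic of these experiments can be mimicked by translating synthetic messages with the standard code. The sketch below is only an illustration: the names `CODON_TABLE` and `translate` are made up here, amino acids are written in one-letter code, and stop codons are simply shown as "*" rather than ending translation.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... GGG.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(message, frame=0):
    """Translate an RNA string codon by codon, starting at offset `frame`."""
    codons = (message[i:i + 3] for i in range(frame, len(message) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


print(translate("U" * 12))           # FFFF  (poly-Phe)
print(translate("UC" * 6))           # SLSL  (alternating Ser and Leu)
print(translate("UC" * 6, frame=1))  # LSL   (same alternation, other frame)
```

The alternating copolymer gives the same Ser-Leu alternation in either frame, which is why such messages identify a pair of codons (UCU and CUC) but cannot by themselves say which codon belongs to which amino acid.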
The Code. The "Standard" genetic code
is given in Table 26-1 of your text. This is the code used by
all known organisms, the only exceptions being some deviations
in the mitochondrial code and, it is now known, in the ciliated
protozoa.
- Of the 64 possible codons, 61 code for amino acids. The remaining
three are "stop" codons (UAA = ochre, UAG = amber,
and UGA = opal). The name amber is derived from the discoverer of
UAG, Bernstein, whose name is German for amber; the other two are puns
on amber.
- The code is very conservative: many mutations will
have no effect, particularly in the third base.
- The second base determines the character of the amino acid.
Thus:
  - U in the second position gives a hydrophobic
  amino acid.
  - C in the second position gives a neutral hydrophilic
  amino acid or proline.
  - G in the second position gives a basic or
  neutral hydrophilic amino acid.
  - A in the second position gives a hydrophilic
  amino acid.
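The properties listed above can be checked directly against the standard code. The following is a rough sketch, with helper names (`silent_fraction`, `by_second_base`) invented for this example; it counts a change between two stop codons as "silent", which is a choice made here for simplicity.

```python
from collections import defaultdict
from itertools import product

BASES = "UCAG"
# Standard genetic code in one-letter amino acid symbols, "*" = stop.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

STOP_CODONS = sorted(codon for codon, aa in CODON_TABLE.items() if aa == "*")


def silent_fraction(position):
    """Fraction of single-base changes at `position` (0, 1 or 2) that keep the meaning."""
    same = total = 0
    for codon, aa in CODON_TABLE.items():
        for base in BASES:
            if base == codon[position]:
                continue
            mutant = codon[:position] + base + codon[position + 1:]
            total += 1
            same += CODON_TABLE[mutant] == aa
    return same / total


def by_second_base():
    """Group the amino acids by the second base of their codons."""
    groups = defaultdict(set)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            groups[codon[1]].add(aa)
    return {base: "".join(sorted(aas)) for base, aas in groups.items()}


print(STOP_CODONS)                          # ['UAA', 'UAG', 'UGA']
print(len(CODON_TABLE) - len(STOP_CODONS))  # 61
for position in range(3):
    print(position + 1, round(silent_fraction(position), 3))
print(by_second_base())
```

Running this shows the third base tolerating far more changes than the first or second, and groups the amino acids as U: FILMV, C: APST, A: DEHKNQY, G: CGRSW, which can be compared with the generalizations in the list above.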
Last modified 10 April 2002