By the end of this session the student should be able to:
* Use the genetic code to translate coding sequences
* Calculate the number of codons and amino acids from the number of bases
* Name the main sites of non-coding DNA segments
* Name the function and features of the promoter sequence
* Distinguish between introns and exons
* Explain how gene expression is controlled
A gene is a stretch of DNA that carries a coded message for the synthesis of a specific protein. Most of the DNA of an organism does not code for proteins. A stretch of DNA is recognised as a gene coding for a protein if it is preceded by a promoter sequence. The promoter sequence is the site of attachment for RNA polymerase, which is responsible for the process of transcription of the genetic message to m-RNA. The promoter sequence is present only on the template (transcribed) strand. The sequence of bases on m-RNA is complementary to that on the transcribed strand of DNA but corresponds to that on the coding strand except that T is replaced by U (Fig. 4.1).

The genetic code is, by convention, interpreted with reference to the sequence of bases on m-RNA. The m-RNA sequence happens to correspond to the coding strand of DNA, with the exception that U on RNA corresponds to T on DNA.
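The relationship between the two DNA strands and m-RNA can be sketched in a few lines of Python. This is only an illustration: the function names and the example sequence are hypothetical, and the strands are compared position by position, ignoring strand orientation.

```python
# Minimal sketch: strands are aligned position by position (orientation ignored).
DNA_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
DNA_TO_RNA_PAIR = {"A": "U", "T": "A", "C": "G", "G": "C"}


def template_from_coding(coding):
    """Return the template (transcribed) strand paired with a coding strand."""
    return "".join(DNA_COMPLEMENT[base] for base in coding)


def mrna_from_template(template):
    """Build m-RNA by complementary base pairing against the template strand."""
    return "".join(DNA_TO_RNA_PAIR[base] for base in template)


def mrna_from_coding(coding):
    """Shortcut: m-RNA matches the coding strand with T replaced by U."""
    return coding.replace("T", "U")


coding = "ATGGCCTAA"  # hypothetical coding-strand sequence
template = template_from_coding(coding)
mrna = mrna_from_template(template)

print(template)  # TACCGGATT
print(mrna)      # AUGGCCUAA
print(mrna == mrna_from_coding(coding))  # True
```

Both routes give the same m-RNA, which is why the genetic code can be read directly from the coding strand once T is replaced by U.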
The sequence of bases on m-RNA determines the exact sequence of amino acids in the protein. The bases are read in triplets. Each triplet of bases is termed a codon and corresponds to a particular amino acid. The genetic code designates how the codons correspond to the amino acids. There are four bases (A, U, C and G), which can be combined in 4 x 4 x 4 = 64 possible ways to form codons, as shown in Fig. 4.2. Each codon is specific for one amino acid. However, there are only 20 amino acids and so one amino acid may be represented by more than one codon. Because of this the genetic code is described as degenerate.
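The count of 64 codons and the degeneracy of the code can be checked with a short Python sketch. It assumes the standard genetic code, written as one-letter amino acid symbols in the usual U, C, A, G table order ("*" marks a stop codon). The variable names are illustrative only.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code in U, C, A, G order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Every ordered triplet of the four bases is one codon.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]
CODON_TABLE = dict(zip(codons, AMINO_ACIDS))

# Count how many codons specify each amino acid (stop codons excluded).
codons_per_amino_acid = Counter(aa for aa in CODON_TABLE.values() if aa != "*")

print(len(codons))                 # 64
print(len(codons_per_amino_acid))  # 20
print(codons_per_amino_acid["L"])  # 6 (leucine)
print(codons_per_amino_acid["M"])  # 1 (methionine, AUG only)
```

Leucine is specified by six codons while methionine has only one, which illustrates what is meant by a degenerate code.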
The codon AUG codes for methionine but when it occurs after a promoter sequence it also serves as a "start" signal indicating the beginning of the coded message.
The codons UAA (also called "ochre"), UAG (also called "amber") and UGA do not code for any amino acid but act as "stop" signals for the end of a gene message. Note that all the stop codons start with U and include an A.
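Reading a message from the start signal to a stop signal can be sketched as follows. This is a simplified illustration, not a full model of translation: it assumes the standard genetic code, takes the first AUG in the sequence as the start signal, and uses a hypothetical function name and example sequence.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in U, C, A, G order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Translate from the first AUG up to (not including) the first stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start signal found
    protein = []
    # Step through the message three bases at a time, starting at AUG.
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break  # UAA, UAG or UGA: end of the message
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical m-RNA: GG | AUG GCC UUU UAA | GC
print(translate("GGAUGGCCUUUUAAGC"))  # MAF
```

The result uses one-letter amino acid symbols: M for methionine, A for alanine and F for phenylalanine.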

Every three nucleotides correspond to one codon and one amino acid. If a gene contains 1,200 nucleotides, this corresponds to 1,200/3 = 400 codons and the resulting protein contains 400 amino acids.
The size of a protein is often expressed as its molecular mass. The molecular masses of amino acids vary but it can be assumed that amino acids have an average molecular mass of approximately 100 Daltons. Thus a protein consisting of 400 amino acids will have a molecular mass of approximately 40,000 Daltons. Conversely, a protein with a molecular mass of 60,000 Daltons is expected to contain about 60,000/100 = 600 amino acids and the gene coding for this protein will consist of about 600 codons and 600 x 3 = 1,800 nucleotides.
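The calculations above can be written as a short Python sketch. The function names are illustrative, and the figure of 100 Daltons per amino acid is the approximation used in the text, not an exact value.

```python
AVERAGE_AMINO_ACID_MASS_DA = 100  # approximate average, as assumed in the text
BASES_PER_CODON = 3


def amino_acids_from_nucleotides(nucleotides):
    """Number of codons (and amino acids) encoded by a given number of bases."""
    return nucleotides // BASES_PER_CODON


def protein_mass_from_amino_acids(amino_acids):
    """Approximate molecular mass of the protein in Daltons."""
    return amino_acids * AVERAGE_AMINO_ACID_MASS_DA


def nucleotides_from_protein_mass(mass_da):
    """Approximate number of coding nucleotides for a protein of a given mass."""
    amino_acids = round(mass_da / AVERAGE_AMINO_ACID_MASS_DA)
    return amino_acids * BASES_PER_CODON


print(amino_acids_from_nucleotides(1200))   # 400
print(protein_mass_from_amino_acids(400))   # 40000
print(nucleotides_from_protein_mass(60000)) # 1800
```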
The amount of DNA contained in a cell of a particular organism is termed the C-value. Every organism has a specific C-value. More complex organisms are expected to contain more genes than simple organisms and consequently would require more DNA. In fact, however, the C-value or DNA content for a particular organism is not proportional to the number of genes or to the complexity of the organism. Thus the frog has seven times the DNA content of man, and the lily has 100 times the DNA content of man. This is the C-value paradox.
It has been estimated that the human genome contains 3.5 billion base pairs. This amount of DNA could contain about 2 million genes. In fact, however, the total number of genes in man has been estimated to be only about 40,000 to 80,000. This means that only about 3% of the human genome codes for proteins, and about 97% of the human genome is non-coding DNA.
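The 3% figure follows from the estimates quoted above, as this Python sketch shows. The average gene length is not stated in the text. It is derived here from the statement that 3.5 billion base pairs could hold about 2 million genes, so it should be read as an assumption.

```python
GENOME_BASE_PAIRS = 3_500_000_000  # estimate quoted in the text
MAX_POSSIBLE_GENES = 2_000_000     # estimate quoted in the text

# Assumption: average gene length implied by the two figures above.
base_pairs_per_gene = GENOME_BASE_PAIRS / MAX_POSSIBLE_GENES


def coding_fraction(gene_count):
    """Fraction of the genome occupied by the given number of genes."""
    return gene_count * base_pairs_per_gene / GENOME_BASE_PAIRS


print(base_pairs_per_gene)  # 1750.0
for genes in (40_000, 60_000, 80_000):
    print(genes, f"{coding_fraction(genes):.0%}")
# 40000 2%
# 60000 3%
# 80000 4%
```

The range of 40,000 to 80,000 genes therefore corresponds to roughly 2% to 4% of the genome, with about 3% at the midpoint.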
Although over 95% of the total DNA is non-coding, it may have other important functions such as the regulation of gene activity.
The main examples of non-coding DNA are the following:
1. Promoter sequences
2. Intervening sequences (introns)
3. Terminator sequences
4. Sequences related to chromosome structure
5. Pseudogenes
6. Repetitive DNA
These are explained in the following sections.
In order to be transcribed, a gene must be preceded by a promoter sequence. This is the recognition site for the attachment of RNA polymerase, the enzyme responsible for transcription. The promoter sequence is followed by an initiator sequence, which marks the site where transcription to m-RNA begins. The initiator codon or start signal on m-RNA is the sequence AUG, which is also codon 1 and corresponds to the amino acid methionine. It also marks the site where translation begins (Fig. 4.3).

The initiator sequence is not the beginning of the gene itself but the first part to be transcribed to m-RNA. The beginning of the gene itself is indicated by the sequence "AUG" on m-RNA. This corresponds to TAC on the transcribed strand. This is codon 1 of the gene and is translated to the amino acid methionine. Subsequent triplets of bases are read as codons and are translated according to the genetic code until a stop signal is encountered.
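The numbering of codons from the start signal can be illustrated with a short Python sketch. The template sequence and function names are hypothetical, the strands are compared position by position (orientation ignored), and the first AUG on the m-RNA is taken as codon 1.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}
DNA_TO_RNA_PAIR = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe(template):
    """Build m-RNA by complementary base pairing against the transcribed strand."""
    return "".join(DNA_TO_RNA_PAIR[base] for base in template)


def numbered_codons(mrna):
    """Return (codon number, codon) pairs from AUG up to the first stop signal."""
    start = mrna.find("AUG")
    if start == -1:
        return []  # no start signal found
    result = []
    for number, i in enumerate(range(start, len(mrna) - 2, 3), start=1):
        codon = mrna[i:i + 3]
        if codon in STOP_CODONS:
            break
        result.append((number, codon))
    return result


# Hypothetical transcribed strand: a short leader, then TAC (read as AUG on m-RNA).
template = "CCTTACCGGAAAATTGG"
mrna = transcribe(template)

print(mrna)                   # GGAAUGGCCUUUUAACC
print(template.find("TAC"))   # 3
print(mrna.find("AUG"))       # 3 (same position as TAC on the template)
print(numbered_codons(mrna))  # [(1, 'AUG'), (2, 'GCC'), (3, 'UUU')]
```

The bases before AUG are transcribed but are not part of the coded message, and reading stops at UAA.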