4

HOW GENES FUNCTION

 

 

OBJECTIVES:

 

By the end of this session the student should be able to:

 

*  Use the genetic code to translate coding sequences

*  Calculate the number of codons and amino acids from the number of bases

*  Name the main sites of non-coding DNA segments

*  Name the function and features of the promoter sequence

*  Distinguish between introns and exons

*  Explain how gene expression is controlled


THE GENETIC CODE

 

A gene is a stretch of DNA that carries a coded message for the synthesis of a specific protein. Most of the DNA of an organism does not code for proteins. A stretch of DNA is recognised as a gene coding for a protein if it is preceded by a promoter sequence. The promoter sequence is the site of attachment for RNA polymerase, the enzyme responsible for transcribing the genetic message to m-RNA. The promoter directs RNA polymerase to read only one of the two strands, the template (transcribed) strand. The sequence of bases on m-RNA is complementary to that on the transcribed strand of DNA and therefore corresponds to that on the coding strand, except that T is replaced by U (Fig. 4.1).

The genetic code is, by convention, interpreted with reference to the sequence of bases on m-RNA. The m-RNA sequence happens to correspond to the coding strand of DNA, with the exception that U on RNA corresponds to T on DNA.
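The relationship between the two DNA strands and the m-RNA can be illustrated with a short Python sketch. The sequences and function names are hypothetical and chosen only for illustration; the template strand is assumed to be written 3'->5' so that a base-by-base complement gives the m-RNA in its usual 5'->3' direction.

```python
# Base pairing used when DNA is copied into RNA (A on DNA pairs with U on RNA).
DNA_TO_RNA_COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe_from_template(template_3_to_5: str) -> str:
    """Return the m-RNA (5'->3') copied from a template strand written 3'->5'."""
    return "".join(DNA_TO_RNA_COMPLEMENT[base] for base in template_3_to_5.upper())


def mrna_from_coding(coding_5_to_3: str) -> str:
    """Return the m-RNA matching a coding strand: same bases, with T replaced by U."""
    return coding_5_to_3.upper().replace("T", "U")


coding = "ATGGCCTAA"    # hypothetical coding strand, 5'->3'
template = "TACCGGATT"  # its complementary template strand, 3'->5'

print(transcribe_from_template(template))  # AUGGCCUAA
print(mrna_from_coding(coding))            # AUGGCCUAA
```

Both routes give the same m-RNA, which is why the code can be read from the coding strand simply by replacing T with U.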

 

The sequence of bases on m-RNA determines the exact sequence of amino acids in the protein. The bases are read in triplets. Each triplet of bases is termed a codon and corresponds to a particular amino acid. The genetic code designates how the codons correspond to the amino acids. There are four bases (A, U, C and G) and therefore 4 x 4 x 4 = 64 possible ways in which these can be combined to form codons, as shown in Fig. 4.2. Each codon is specific for one amino acid. However, there are only 20 amino acids, and so one amino acid may be represented by more than one codon. Because of this the genetic code is described as degenerate.
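The count of 64 codons and the degeneracy of the code can be checked with a short Python sketch. It assumes the standard genetic code, written here with one-letter amino acid abbreviations (a convention not used elsewhere in this chapter) and with "*" standing for a stop signal; the variable names are illustrative only.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon, listed in U, C, A, G order
# for each of the three positions; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# All possible triplets of the four bases: UUU, UUC, UUA, UUG, UCU, ...
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]
genetic_code = dict(zip(codons, AMINO_ACIDS))

print(len(codons))  # 64

# Count how many codons specify each amino acid (stop codons excluded).
codons_per_amino_acid = Counter(aa for aa in genetic_code.values() if aa != "*")
print(len(codons_per_amino_acid))  # 20
print(codons_per_amino_acid["L"], codons_per_amino_acid["M"])  # 6 1
```

Leucine (L) is represented by six codons whereas methionine (M) has only one, which is what is meant by a degenerate code.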

The codon AUG codes for methionine but when it occurs after a promoter sequence it also serves as a "start" signal indicating the beginning of the coded message.

 

The codons UAA (also called "ochre"), UAG (also called "amber") and UGA (also called "opal") do not code for any amino acid but act as "stop" signals for the end of a gene message. Note that all the "stop" codons start with U and include an A.
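As a minimal sketch of how a coding sequence is translated, the following Python example reads an m-RNA three bases at a time and stops at the first "stop" codon. It assumes the standard genetic code with one-letter amino acid abbreviations, and that the sequence supplied already begins at the AUG start codon; the sequence and function name are hypothetical.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in U, C, A, G order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = dict(
    zip(("".join(triplet) for triplet in product(BASES, repeat=3)), AMINO_ACIDS)
)


def translate(mrna: str) -> str:
    """Translate an m-RNA codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = GENETIC_CODE[mrna[i:i + 3]]
        if amino_acid == "*":  # UAA, UAG or UGA: end of the message
            break
        protein.append(amino_acid)
    return "".join(protein)


# AUG (Met) - GCC (Ala) - UGG (Trp) - UAA (stop)
print(translate("AUGGCCUGGUAA"))  # MAW
```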


SIZES OF GENES AND PROTEINS

Every three nucleotides correspond to one codon and one amino acid. If a gene contains 1,200 nucleotides, this corresponds to 1,200/3 = 400 codons and the resulting protein contains 400 amino acids.

 

The size of a protein is often expressed as its molecular mass.  The molecular masses of amino acids vary but it can be assumed that amino acids have an average molecular mass of approximately 100 Daltons.  Thus a protein consisting of 400 amino acids will have a molecular mass of approximately 40,000 Daltons. Conversely, a protein with a molecular mass of 60,000 Daltons is expected to contain about 60,000/100  = 600 amino acids and the gene coding for this protein will consist of about 600 codons and 600 x 3 = 1,800 nucleotides.
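The calculations above can be written as a short Python sketch. It uses the same approximation as the text (an average of 100 Daltons per amino acid); the function names are illustrative only.

```python
AVERAGE_AMINO_ACID_MASS = 100  # Daltons, the approximation used in the text
BASES_PER_CODON = 3


def amino_acids_from_nucleotides(nucleotides: int) -> int:
    """Number of codons (and amino acids) encoded by a given number of nucleotides."""
    return nucleotides // BASES_PER_CODON


def protein_mass(amino_acids: int) -> int:
    """Approximate molecular mass of a protein in Daltons."""
    return amino_acids * AVERAGE_AMINO_ACID_MASS


def nucleotides_from_mass(mass_daltons: int) -> int:
    """Approximate number of coding nucleotides for a protein of a given mass."""
    amino_acids = round(mass_daltons / AVERAGE_AMINO_ACID_MASS)
    return amino_acids * BASES_PER_CODON


print(amino_acids_from_nucleotides(1200))  # 400
print(protein_mass(400))                   # 40000
print(nucleotides_from_mass(60000))        # 1800
```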

 

THE C-VALUE PARADOX

The amount of DNA contained in a cell of a particular organism is termed the C-value. Every organism has a specific C-value. More complex organisms are expected to contain more genes than simple organisms and consequently would require more DNA. In fact, however, the C-value or DNA content for a particular organism is not proportional to the number of genes or to the complexity of the organism. Thus the frog has seven times the DNA content of man, and the lily has 100 times the DNA content of man. This is the C-value paradox.

 

It has been estimated that the human genome contains 3.5 billion base pairs. This amount of DNA could contain about 2 million genes. In fact, however, the total number of genes in man has been estimated to be only about 40,000 to 80,000, and more recent estimates are lower still. On these figures, only about 3% of the human genome codes for proteins, and about 97% of the human genome is non-coding DNA.
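The arithmetic behind these percentages can be set out as a short Python sketch. It uses only the figures quoted in the text; the variable names are illustrative, and the average gene size is simply what those figures imply rather than a measured value.

```python
GENOME_BASE_PAIRS = 3_500_000_000  # size of the human genome quoted in the text
POTENTIAL_GENES = 2_000_000        # number of genes this DNA could contain
ESTIMATED_GENES = (40_000, 80_000) # range of estimates quoted in the text

# Average gene size implied by the two figures above.
base_pairs_per_gene = GENOME_BASE_PAIRS // POTENTIAL_GENES
print(base_pairs_per_gene)  # 1750

# Percentage of the genome that codes for protein at each end of the range.
low_percent = 100 * ESTIMATED_GENES[0] / POTENTIAL_GENES
high_percent = 100 * ESTIMATED_GENES[1] / POTENTIAL_GENES
print(low_percent, high_percent)  # 2.0 4.0

# The midpoint of the range gives the figure of about 3% coding DNA.
midpoint_genes = sum(ESTIMATED_GENES) // 2
coding_percent = 100 * midpoint_genes / POTENTIAL_GENES
print(coding_percent, 100 - coding_percent)  # 3.0 97.0
```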

 

  

NON-CODING DNA

Although over 95% of the total DNA is non-coding, it may have other important functions such as the regulation of gene activity.  The main examples of non-coding DNA are the following:

1.  Promoter sequences

2.  Intervening sequences (introns)

3.  Terminator sequences

4.  Sequences related to chromosome structure

5.  Pseudogenes

6.  Repetitive DNA

These are explained in the following sections.

1. Promoter sequence

In order to be transcribed, a gene must be preceded by a promoter sequence. This is the recognition site for the attachment of RNA polymerase, the enzyme responsible for transcription. The promoter sequence is followed by an initiator sequence, which marks the site where transcription to m-RNA begins. The initiator codon or start signal on m-RNA is the sequence AUG, which is also codon 1 and corresponds to the amino acid methionine. It marks the site where translation begins (Fig. 4.3).

The initiator sequence is not the beginning of the gene itself but the first part to be transcribed to m-RNA. The beginning of the gene itself is indicated by the sequence "AUG" on m-RNA, which corresponds to "TAC" on the transcribed strand. This is codon 1 of the gene and is translated to the amino acid methionine. Subsequent triplets of bases are read as codons and are translated according to the genetic code until a "stop" signal is encountered.
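The reading of a gene from its transcribed strand can be sketched in Python as follows. The sketch makes several simplifying assumptions: the template strand is written 3'->5', the first AUG found on the m-RNA is taken as codon 1, and the standard genetic code is used with one-letter amino acid abbreviations. The sequence and function name are hypothetical.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in U, C, A, G order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = dict(
    zip(("".join(triplet) for triplet in product(BASES, repeat=3)), AMINO_ACIDS)
)
DNA_TO_RNA_COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}


def read_gene(template_3_to_5: str) -> tuple[str, str]:
    """Transcribe a template strand, then translate from the first AUG to a stop."""
    mrna = "".join(DNA_TO_RNA_COMPLEMENT[base] for base in template_3_to_5)
    start = mrna.find("AUG")  # codon 1; corresponds to TAC on the template
    if start == -1:
        return mrna, ""       # no start signal, so nothing is translated
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = GENETIC_CODE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return mrna, "".join(protein)


# Hypothetical template: leader GGC, then TAC AAA CCG ATT, then trailing CC.
mrna, protein = read_gene("GGCTACAAACCGATTCC")
print(mrna)     # CCGAUGUUUGGCUAAGG
print(protein)  # MFG
```

The leading bases before AUG are transcribed but not translated, which illustrates the difference between the start of transcription and codon 1 of the gene.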