The instructions for the synthesis of proteins and RNA are stored in coded genes made of DNA, as a continuous string of
For protein synthesis, every 3 bases code for an amino acid:
(Iso = Isoleucine, Asg = Asparagine, Gln = Glutamine. Note how the acidic, basic, hydrophilic, and hydrophobic amino acids are grouped together.)
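The three-bases-per-amino-acid rule can be sketched in a few lines of Python. This is only an illustration: it uses the standard genetic code with one-letter amino acid symbols (not the three-letter abbreviations used above), and the names `CODON_TABLE` and `translate` are made up for this example.

```python
# Standard genetic code, packed in the conventional T, C, A, G order.
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each of the 64 possible codons to its amino acid.
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

def translate(coding_dna: str) -> str:
    """Read a coding strand three bases at a time until a stop codon."""
    protein = []
    for start in range(0, len(coding_dna) - 2, 3):
        amino_acid = CODON_TABLE[coding_dna[start:start + 3].upper()]
        if amino_acid == "*":  # stop codon: end of the polypeptide
            break
        protein.append(amino_acid)
    return "".join(protein)

print(translate("ATGGCCAAATAA"))  # prints "MAK" (Met, Ala, Lys, then stop)
```

Since 4 x 4 x 4 = 64 codons cover only 20 amino acids plus the stop signal, most amino acids are coded for by several codons.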
There are about 10 base pairs in each turn of the double helix. The two strands run in opposite directions, so the bases are oriented oppositely on them.
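As a small sketch of what the opposite orientation means in practice: reading the partner strand in its own direction gives the "reverse complement" of the first strand. The pairing rule used here (A with T, C with G) is the standard one, and the function names and the round figure of 10 pairs per turn are assumptions for illustration.

```python
# Standard base pairing in DNA.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(strand: str) -> str:
    """Return the partner strand, read in its own (opposite) direction."""
    return "".join(COMPLEMENT[base] for base in reversed(strand.upper()))

def helix_turns(n_base_pairs: int, pairs_per_turn: float = 10.0) -> float:
    """Approximate number of helix turns spanned by a stretch of DNA."""
    return n_base_pairs / pairs_per_turn

print(reverse_complement("ATGC"))  # prints "GCAT"
print(helix_turns(50))             # prints 5.0
```

Applying `reverse_complement` twice gives back the original strand, which is another way of saying that each strand fully determines the other.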
DNA is in itself protective of its bases, because these occur in joined pairs inside the helix. In addition, it is stored coiled around proteins, which assemble together to form coiled structures called
Positively-charged
When genes need to be used, their information needs to be transcribed faithfully and translated into proteins.
To start the process, the local packaging proteins around a gene are removed to leave a stretch of DNA. Most of the time, genes are switched off by having
To transcribe the gene, two events must happen:
The main copying protein, RNA polymerase, attaches to the transcription factor and then to the DNA.
Powered by ATP, it starts copying DNA (by first unwinding the DNA), outputting RNA. RNA differs from DNA only in having one extra
When the RNA polymerase reaches the end of the gene, it finds special repeating sequences, or special small proteins, which cause the emerging RNA to coil up on itself, halting the polymerase. The RNA is then cut out, while the polymerase detaches from the DNA.
Several RNA polymerase proteins usually copy a single gene at the same time, as seen in the inset photo: the dark line is DNA, with hundreds of RNA strands being copied out. The copied RNA is then "processed" by other proteins, trimming out the tail.
The whole transcription process may be stopped short by certain protein factors.
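The copying step itself can be sketched as base-by-base pairing against the template strand. This is a minimal illustration, not a model of the polymerase: it ignores initiation, termination and processing. It relies on the standard fact that RNA pairs uracil (U) with adenine where DNA would use thymine, and the name `transcribe` is made up here.

```python
# Pairing of each DNA template base with the RNA base placed opposite it.
RNA_PAIRING = {"A": "U", "T": "A", "C": "G", "G": "C"}

def transcribe(template: str) -> str:
    """Build the RNA that pairs with a DNA template strand.

    The template is written in the order the polymerase reads it,
    so the RNA comes out in the order it is synthesized.
    """
    return "".join(RNA_PAIRING[base] for base in template.upper())

print(transcribe("TACCGGTTT"))  # prints "AUGGCCAAA"
```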
The resulting RNA may be of two types:
rRNA and tRNA are continuously transcribed by special polymerases that do not need the initial stages of initiation, because they need to be present all the time to synthesize proteins. In fact, several gene copies code for them, to keep up with demand, and just in case some fail — tRNAs are probably the most plentiful macro-molecules inside the cell.
Several ribosomes in turn attach to the mRNA at the starting point. As they move along, each one produces the polypeptide chain of amino acids coded for by the mRNA. Within of switching on a gene, are being produced. The lifetime of a single mRNA is about 1-15-100mins.
mRNA is eventually cut up by a large
The polypeptide normally folds up into its intended shape. It may be "processed" further by clipping some amino acids, adding cross-linking bonds, or adding some functional groups. For example, small polypeptides tend to wiggle too much to fold properly, so a longer polypeptide is coded for, which folds well, and the extra part is cut off by an enzyme.
When the temperature is too high, the polypeptides take longer to fold, or simply remain unfolded. This is dangerous for the cell, as misfolded polypeptides will often have exposed hydrophobic parts, and these will stick to other misfolded proteins' hydrophobic parts, forming aggregates that clog the cell. To prevent this from happening, special small "chaperone" proteins protect the hydrophobic parts of an unfolded polypeptide until it reaches a large chaperone protein, which allows the polypeptide to enter and fold properly. Other chaperones force a misfolded polypeptide through a pore, giving it a second chance to fold properly.
Some errors occur in translation, at less than 1 error every 100k amino acids:
Even though the error rate is low, the mRNA is quite long, and this results in about 0.1% of polypeptides being seriously faulty. Faulty or old proteins are marked by ubiquitin and are doomed to be cut up into peptides inside a large hollow
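The arithmetic linking a per-residue error rate to the fraction of faulty chains can be sketched as follows. The chain length of 400 amino acids is an assumed, illustrative value (the text gives none), and the function name is hypothetical. Note that this counts chains with any error at all; only some of those would be seriously faulty.

```python
def fraction_with_error(error_rate_per_residue: float, length: int) -> float:
    """Probability that a chain of the given length has at least one error,
    assuming errors at different positions are independent."""
    return 1.0 - (1.0 - error_rate_per_residue) ** length

# 1 error per 100k amino acids, for an assumed chain of 400 amino acids.
print(fraction_with_error(1e-5, 400))
```

For these assumed values the result is about 0.4% of chains carrying at least one wrong amino acid, the same order of magnitude as the figure quoted above for seriously faulty polypeptides.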
Genes work in a self-regulating network. Some are switched on/off by a feedback mechanism, e.g. when the concentration of some molecule in the cell reaches a critical level. Other genes are switched on/off by an external stimulus through receptors and a cascade multiplier effect.
A few external molecules activate receptor proteins on the membrane surface, which then transmit the signal via new messenger molecules inside the cell; these go on to activate other proteins or trigger the gene repressor molecules to detach.
In one common type, the active receptors link up, causing their interior protein part to also join, thus releasing the messengers. These messengers are continually reattaching to their protein.
Another type of receptor breaks up transducer proteins (e.g. "G-proteins"), and these activate the synthesis or pumping of messengers (e.g.
In most cases, it is phosphorylation or
Messenger proteins usually require several triggering molecules before they pass on the signal, for reliability and reduced sensitivity. Some act as AND or OR logic gates on signals.
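The two ideas in this paragraph, a threshold of several triggering molecules and the combination of signals, can be sketched as simple Boolean functions. The threshold of 3 and all the names below are illustrative assumptions, not values from any real pathway.

```python
def messenger_fires(bound_triggers: int, required: int = 3) -> bool:
    """A messenger passes the signal on only once enough triggers are bound."""
    return bound_triggers >= required

def and_gate(signal_a: bool, signal_b: bool) -> bool:
    """Downstream response that needs both incoming signals."""
    return signal_a and signal_b

def or_gate(signal_a: bool, signal_b: bool) -> bool:
    """Downstream response that needs either incoming signal."""
    return signal_a or signal_b

# Two hypothetical pathways: one well above threshold, one below it.
pathway_a = messenger_fires(bound_triggers=4)
pathway_b = messenger_fires(bound_triggers=1)
print(and_gate(pathway_a, pathway_b), or_gate(pathway_a, pathway_b))
```

Requiring several triggers means that a single stray molecule does not set off the pathway, which is the reliability benefit the paragraph describes.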
If a signal persists, the receptor-messenger pathway becomes less sensitive to the signal in one way or another.
Different external stimuli — food, acid, poisons, light, cold, heat shock, ... — trigger different signal pathways.
Signals that activate or repress proteins already present in the cell act in seconds; those that affect genes take half an hour or so. Signals only have a temporary effect.
In this simple model of lactose metabolism, a repressor stops RNA polymerase from transcribing mRNA. When present, lactose binds to the repressor, causing it to detach and allowing transcription to proceed. The gene codes for a protein (lactase) which breaks down the lactose. When no more lactose is present, the repressors are free to attach to the repressor site once again. Both proteins and mRNA are continually removed.
This model is only schematic: actual genes are much longer in proportion, more proteins are produced per mRNA, and the real process takes many minutes.
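The feedback loop of this schematic model can be sketched as a small discrete-time simulation. Every rate, the threshold below which the repressor re-attaches, and the function name are arbitrary assumptions chosen only to show the qualitative behaviour; none of them is a measured value.

```python
def simulate_lactose_switch(lactose=100.0, steps=200,
                            transcription=1.0, translation=0.5,
                            breakdown=0.01, mrna_decay=0.2,
                            protein_decay=0.05):
    """Return a list of (lactose, mrna, enzyme) tuples, one per time step."""
    mrna, enzyme = 0.0, 0.0
    history = []
    for _ in range(steps):
        # Assumption: the repressor re-attaches once lactose is nearly gone.
        repressor_bound = lactose < 1.0
        if not repressor_bound:
            mrna += transcription          # polymerase transcribes the gene
        enzyme += translation * mrna       # ribosomes translate the mRNA
        # The enzyme breaks down lactose in proportion to both amounts.
        lactose = max(0.0, lactose - breakdown * enzyme * lactose)
        # Both mRNA and protein are continually removed.
        mrna -= mrna_decay * mrna
        enzyme -= protein_decay * enzyme
        history.append((lactose, mrna, enzyme))
    return history

history = simulate_lactose_switch()
final_lactose, final_mrna, final_enzyme = history[-1]
print(final_lactose, final_mrna, final_enzyme)
```

In a run with these assumed values, the enzyme level rises while lactose is plentiful, the lactose is consumed, and then both mRNA and enzyme decay back towards zero once the repressor re-attaches.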
Damage to
Averaged out, each base has about chance of being damaged every day. Even so, a whole genome will surely incur hundreds to thousands of lesions daily, and the cell needs to repair them as far as it can in order to survive.
Repair is possible because DNA consists of complementary strands; it is much rarer that damage is done to paired-up bases simultaneously (1 in 10^10).
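Why complementary strands make repair possible can be sketched as follows: a damaged base can be restored by reading its intact partner. This is an illustration of the principle only, not of any actual repair enzyme. The `X` lesion marker and the function name are assumptions, and the two strands are written aligned position by position.

```python
# Standard base pairing in DNA.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

def repair(damaged: str, partner: str, lesion: str = "X") -> str:
    """Restore lesions in one strand using the base opposite each one."""
    if len(damaged) != len(partner):
        raise ValueError("strands must have the same length")
    repaired = []
    for base, opposite in zip(damaged, partner):
        if base != lesion:
            repaired.append(base)          # undamaged base: keep as is
        elif opposite == lesion:
            # Both bases of the pair are lost: the information is gone.
            raise ValueError("both bases of a pair are damaged")
        else:
            repaired.append(PAIR[opposite])
    return "".join(repaired)

print(repair("ATXC", "TACG"))  # prints "ATGC"
```

The error case mirrors the point made above: repair by copying fails only when both bases of a pair are damaged at once, which is very rare.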
A complete (double-strand) DNA break is much harder to repair: 'Ku' proteins try to reconnect the ends, and a ligase joins them together.
Another difficulty encountered by the extremely long strands of DNA is that they are liable to get entangled or supercoiled. Cells use a simple and effective method for dealing with this.
A protein,
Other proteins (
DNA replication starts when special proteins uncoil part of the DNA. Immediately other proteins attach to the single strands to prevent them from rejoining, while a
The unzipped strands have opposite directions, so replication is not a straightforward affair: the main copying protein can only add bases to a pre-existing backbone, and this in one direction only. So a special
The rest of the proteins assemble into the replisome: the
The RNA primase adds a short stretch of RNA (a primer) to the reverse strand, and the clamp-loader passes the clamp with the primed DNA to the DNA polymerase; this can then start copying and proof-reading the strand until it reaches a finished part of the DNA. This causes it to stop and detach its clamp, and tells the primase to start the cycle again. The lagging strand of 2k DNA bases is replete with RNA primers and empty bases (skipped or cut out by the proof-reader); these are converted to DNA by the cell's repair machinery. The ring clamps prevent the polymerases from detaching, thus speeding up the process.
The replisome manages to copy at a rate of up to . Yet the DNA polymerase, with its proof-reader, only makes a mistake about every bases. These can be either:
Thus once every replications on average, a gene changes slightly. Most often, such changes cause the associated protein to fold wrongly or to be too short, increasing the chance of cell death. But there is a small probability (1 in 30?) that the change in the gene is unimportant (a neutral substitution, or a non-synonymous one whose effect depends on where in the gene it falls), and the resulting protein or ribozyme is still functional. It may even be better, if the original one was no longer very good because the environment has changed.
Over thousands of generations, cells therefore end up with different variations of the gene, called its alleles; an equilibrium distribution is established, with each allele occurring with probability where is the mutation rate creating the allele, and is the death rate of the cell removing the allele.
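How a balance between creation by mutation and removal by cell death can settle into an equilibrium is sketched below with a toy model. This is an assumed model for illustration, not necessarily the formula intended above: in each generation a fraction `mutation_rate` of non-carriers gains the allele and a fraction `death_rate` of carriers is lost. All names and rates are hypothetical.

```python
def equilibrium_frequency(mutation_rate: float, death_rate: float,
                          generations: int = 10_000) -> float:
    """Iterate the allele frequency until it settles (toy model)."""
    p = 0.0
    for _ in range(generations):
        # Gain from mutation among non-carriers, loss from death of carriers.
        p += mutation_rate * (1.0 - p) - death_rate * p
    return p

m, d = 1e-4, 1e-2
print(equilibrium_frequency(m, d), m / (m + d))
```

In this toy model the frequency settles at mutation_rate / (mutation_rate + death_rate), which is close to mutation_rate / death_rate when mutations are much rarer than deaths.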
When a whole gene is duplicated, one copy slowly degenerates (over generations) into a non-functional pseudo-gene, then into junk DNA, and is eventually lost.
The smallest viable genome is about genes, half coding for enzymes involved in metabolism and the membrane, and half for transcription, repair and replication of DNA. But such cells would be too slow to react and to reproduce, to compete with other cells in the wild.
The smallest living genomes ( genes) belong to parasites that have a stable environment.
The smallest genome of a free-living cell is about genes. There is no upper limit to how large a genome can be, except that it takes longer to copy.
Genes are typically bases long, and are situated randomly on either strand. The smaller proteins are transcription factors and effectors, while the longer ones are enzymes, structural proteins, ...
RNA viruses (60%) are single stranded RNA ( bases), that is mistaken by the cell for
A minority (<1/3) of RNA viruses have the complementary RNA instead; they need to bring along the replicator protein to start off an infection. Others are double-stranded RNA, and need to bring an RNA polymerase with them to produce the first viral mRNA.
DNA viruses (30%) are double-stranded DNA ( bases), that is mistaken by the cell for
Some small (5k bases) DNA viruses start off as RNA which is "repaired" by the cell into DNA. They are too small to code for activating proteins, so they only replicate with the cell.
Retroviruses (10%) are RNA or DNA (about bases) that bring along a protein with the ability to convert their RNA into DNA (reverse transcription) and a protein that splices this DNA into the cell's genome (integrase); so they replicate automatically when the cell transcribes its DNA into mRNA, and the protein reverse transcribes it into DNA. They may lie dormant for a long time, until switched on by the cell.
Viroids (fewer than 500 bases) replicate using the cell's RNA polymerase; they are too small to code for replicators and protein coats. Satellite viruses (1-2k bases) code only for one capsule protein; they require another virus's replicator to replicate.
Viruses are unable to heal themselves; the protein capsule can only survive outdoors for up to a week. They must therefore be sufficiently "virulent" to infect enough other cells to compensate for their losses.
Viruses employ various space-saving techniques:
The capsule proteins join up together to form
Some viruses (viroids, plasmids, transposons...) do not code for a protein coat. They are carried from cell to cell by other viruses, bacteria etc. or are inherited down generations.
The cell itself has a number of anti-viral genes to defeat viruses: