1B · Biomolecules and the chemistry of life
Transmission of genetic information from the gene to the protein
The central dogma: DNA is copied (replication), read into RNA (transcription), and decoded into protein (translation). This category covers the structure of nucleic acids, each step of that flow and the machinery that runs it, how cells turn genes on and off, and the lab tools (PCR, sequencing, cloning) built on these mechanisms.
Nucleic acid structure
DNA and RNA are polymers of nucleotides (a sugar, a phosphate, and a nitrogenous base). DNA is a double helix of two antiparallel strands held together by complementary base pairing.
A nucleotide = a five-carbon sugar (deoxyribose in DNA, ribose in RNA) + a phosphate group + a nitrogenous base. Nucleotides link through phosphodiester bonds between the 3′ carbon of one sugar and the 5′ phosphate of the next, giving every strand a directional sugar-phosphate backbone read 5′ → 3′. The bases carry the information.
Nucleotides & the bases
Two purines (adenine, guanine — double ring) and three pyrimidines (cytosine, thymine, uracil — single ring). DNA uses A, G, C, T; RNA swaps thymine for uracil.
Remember the rings with "PURe As Gold" (purines = A, G, double ring) vs. the pyrimidines C, U, T (single ring) — "CUT the py(e)". A nucleoside is base + sugar; a nucleotide adds the phosphate(s). Nucleotides do double duty beyond heredity: ATP/GTP carry energy, and NAD⁺/FAD/coenzyme A are nucleotide-derived coenzymes.
Don't confuse
Nucleoside (base + sugar, no phosphate) vs. nucleotide (base + sugar + phosphate). Also deoxyribose (DNA, lacks the 2′-OH) vs. ribose (RNA).
The double helix & base pairing
Two antiparallel strands (one 5′→3′, the other 3′→5′) wind into a right-handed helix. A pairs with T (two H-bonds); G pairs with C (three H-bonds), so GC-rich DNA is harder to separate.
Complementary bases pair through hydrogen bonds — A–T (2 bonds) and G–C (3 bonds) — always a purine with a pyrimidine, which keeps the helix a constant width. The strands run antiparallel, a fact that drives the whole logic of replication and transcription (polymerases only build 5′→3′). Chargaff's rules: in any double-stranded DNA, %A = %T and %G = %C. The backbone is on the outside; the stacked base pairs are inside, with major and minor grooves where proteins read the sequence.
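Base pairing and antiparallel orientation can be made concrete with a short sketch. It writes the partner strand 5′→3′ (complement each base, then reverse) and checks Chargaff's rules on the resulting duplex. The function names and the example sequence are hypothetical, chosen only for illustration.

```python
# Watson-Crick partners for DNA bases (A-T, G-C).
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement_strand(strand_5_to_3: str) -> str:
    """Return the partner strand, written in the conventional 5'->3' direction.

    The partner runs antiparallel, so each base is complemented and the
    result is reversed.
    """
    return "".join(PAIR[base] for base in reversed(strand_5_to_3.upper()))


def base_percentages(*strands: str) -> dict:
    """Percent of each base across all strands given (e.g. both strands of a duplex)."""
    counts = {b: 0 for b in "ATGC"}
    for s in strands:
        for base in s.upper():
            counts[base] += 1
    total = sum(counts.values())
    return {b: 100 * n / total for b, n in counts.items()}


top = "ATGCGGCTA"  # hypothetical example strand, 5'->3'
bottom = complement_strand(top)
print(bottom)  # TAGCCGCAT
pct = base_percentages(top, bottom)
# Chargaff's rules hold for the double-stranded molecule as a whole.
print(pct["A"] == pct["T"], pct["G"] == pct["C"])  # True True
```

Chargaff's rules apply to the duplex. A single strand on its own (here `top` has three G but only two C) need not satisfy them.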
Denaturation & Tm
Heat (or high pH) separates the two strands (denaturation/melting); cooling lets them reanneal. The melting temperature (Tm) rises with GC content (more H-bonds) and with length.
Because G–C pairs have three hydrogen bonds to A–T's two, GC-rich sequences need more energy to separate, so they have a higher Tm. This is the basis of PCR (heat to denature, cool to anneal primers) and of probe hybridization.
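The GC effect on Tm shows up in a classic back-of-the-envelope estimate for short primers, the Wallace rule (about 2 °C per A/T and 4 °C per G/C). That rule is an assumption brought in for illustration and is not part of the text above. It is only a rough guide for short oligos because it ignores salt, concentration, and stacking.

```python
def wallace_tm(oligo: str) -> int:
    """Rough Tm (deg C) of a short DNA oligo by the Wallace rule of thumb:
    2 deg C per A or T plus 4 deg C per G or C. A crude estimate only."""
    s = oligo.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2 * at + 4 * gc


at_rich = "ATATATATATATATATAT"  # 18 nt, 0% GC (hypothetical)
gc_rich = "GCGCGCGCGCGCGCGCGC"  # 18 nt, 100% GC (hypothetical)
print(wallace_tm(at_rich), wallace_tm(gc_rich))  # 36 72
```

Same length, more G–C pairs, higher estimated Tm. In this formula, length also raises Tm because every added base contributes.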
Why DNA is stable and RNA isn't
RNA's extra 2′-OH can nucleophilically attack the adjacent phosphate in the backbone, cleaving the strand. That chemical instability is exactly why DNA (which lacks the 2′-OH) is used for long-term storage while RNA carries short-lived, disposable messages.
In RNA, the 2′-hydroxyl sits one carbon away from the phosphodiester linkage and can act as an internal nucleophile, attacking the phosphate to break the backbone — so RNA is base-labile and hydrolyzes far more readily than DNA, especially at high pH. DNA's sugar is deoxyribose precisely because removing that 2′-OH eliminates the self-cleavage pathway, giving the genome the chemical durability it needs to persist for the life of the cell. This is the structure-explains-function payoff of the deoxyribose-vs-ribose distinction: messenger RNA is meant to be transcribed, translated, and degraded (turnover lets the cell adjust protein output quickly), whereas the DNA archive must not turn over. The trade-off also explains why labs handle RNA with extra care (ubiquitous RNases, cold, RNase-free reagents) but DNA is comparatively rugged.
Don't confuse
Deoxyribose vs. ribose is not just a trivia difference — the missing 2′-OH is the cause of DNA's stability. Don't say RNA is less stable "because it's single-stranded"; the single-stranded form contributes, but the AAMC-tested mechanistic reason is the reactive 2′-OH attacking the backbone.
How AAMC tests it
Passages contrast DNA's role as the permanent information store with RNA's transient role, then ask you to reason from the 2′-OH to relative stability, hydrolysis rate, or why a virus/cell uses one over the other — expect to pick the answer that ties instability to the ribose hydroxyl, not to base composition.
DNA replication
Replication is semiconservative — each new double helix has one old strand and one new one. A team of enzymes unwinds the helix and copies both strands 5′→3′ at the replication fork.
Replication begins at an origin and proceeds bidirectionally. Because the strands are antiparallel and polymerases only extend 5′→3′, the two template strands are copied differently — the heart of the topic below. Eukaryotes have many origins per chromosome (prokaryotes have one) and copy DNA during S phase.
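"Semiconservative" can be checked by bookkeeping. The toy model below tracks parental ("old") and newly made ("new") strands over a few rounds, in the spirit of the Meselson–Stahl density experiment. The strand labels and helper names are hypothetical, and the model ignores everything except which strand serves as template.

```python
from collections import Counter


def replicate(duplexes):
    """One round of semiconservative replication.

    The two strands of each duplex separate, and each one templates a
    newly synthesized ("new") partner.
    """
    daughters = []
    for strand_a, strand_b in duplexes:
        daughters.append((strand_a, "new"))
        daughters.append((strand_b, "new"))
    return daughters


def classify(duplex):
    """Label a duplex by how many parental strands it carries."""
    old = sum(1 for s in duplex if s == "old")
    return {2: "old/old", 1: "hybrid", 0: "new/new"}[old]


population = [("old", "old")]  # one parental double helix
for generation in range(1, 4):
    population = replicate(population)
    print(generation, dict(Counter(classify(d) for d in population)))
# 1 {'hybrid': 2}
# 2 {'hybrid': 2, 'new/new': 2}
# 3 {'hybrid': 2, 'new/new': 6}
```

An intact old/old duplex never reappears, and the two original strands persist in exactly two hybrid molecules. Those are the signatures that distinguish semiconservative from conservative copying.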
The replication machinery
Helicase unwinds; single-strand binding proteins hold strands apart; topoisomerase (gyrase) relieves supercoiling; primase lays an RNA primer; DNA polymerase extends 5′→3′; ligase seals nicks.
In order: helicase breaks the H-bonds to open the fork; single-strand binding proteins keep it open; topoisomerase/DNA gyrase cuts and reseals ahead of the fork to relieve the torsional strain of unwinding. Primase synthesizes a short RNA primer that gives DNA polymerase a 3′-OH to build on. In prokaryotes, DNA polymerase III is the main builder; DNA polymerase I removes the RNA primers and replaces them with DNA; DNA ligase joins the fragments.
Leading vs. lagging strand
Polymerase only builds 5′→3′, so one template is copied continuously (leading strand) and the other in short pieces (Okazaki fragments, the lagging strand) that ligase later joins.
The leading strand is synthesized continuously toward the fork, in the same direction the fork opens. The lagging strand runs the "wrong way," so it's made in short Okazaki fragments, each needing its own primer, synthesized away from the fork and then stitched together by ligase. This is the single most-tested mechanical detail in replication.
Don't confuse
The leading strand needs one primer; the lagging strand needs many (one per Okazaki fragment).
Proofreading & telomeres
DNA polymerase proofreads with a 3′→5′ exonuclease activity, catching most errors. Linear chromosome ends (telomeres) shorten each division unless telomerase extends them.
Proofreading (3′→5′ exonuclease) removes a mis-incorporated base immediately, dropping the error rate enormously; mismatches that slip past are caught later by repair (next topic). Removing the final RNA primer at the very end of the lagging strand leaves a gap with no upstream 3′-OH for polymerase to extend, so linear chromosomes lose a little length each round. Telomeres (repetitive caps) absorb the loss, and telomerase (active in germ cells, stem cells, and many cancers) rebuilds them.
DNA repair & mutation
Cells fix damage with mismatch, nucleotide-excision, and base-excision repair. Damage that isn't fixed becomes a mutation.
Mismatch repair corrects replication errors the polymerase missed; nucleotide-excision repair removes bulky lesions like UV-induced thymine dimers (defective in xeroderma pigmentosum); base-excision repair removes single damaged bases. What survives repair is heritable change.
Types of mutation
Point mutations swap one base: silent (no amino-acid change), missense (one amino acid changes), nonsense (creates a stop codon). Frameshifts (insertions/deletions not in multiples of three) shift the whole reading frame.
A silent mutation is buffered by the genetic code's redundancy. A missense swaps one amino acid (sickle-cell is the classic). A nonsense mutation introduces a premature stop codon, truncating the protein. Insertions or deletions that aren't multiples of three cause a frameshift, garbling every codon downstream — usually far more damaging than a point mutation.
Don't confuse
A point mutation that creates a stop codon = nonsense; one that changes an amino acid = missense; one with no effect on the protein = silent.
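The three point-mutation categories amount to a lookup in the genetic code. The sketch below builds the standard codon table (the 64-letter string is the standard code listed in U, C, A, G order, with `*` for stop) and classifies a single-base substitution within one mRNA codon. The function name is hypothetical. Mutations that hit an existing stop codon fall outside these three categories and are rejected.

```python
from itertools import product

# Standard genetic code; codons enumerated in UCAG order, '*' = stop.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_point_mutation(codon: str, position: int, new_base: str) -> str:
    """Classify a single-base substitution in one mRNA codon.

    position is 0, 1, or 2 (first, second, or third base of the codon).
    """
    if CODON_TABLE[codon] == "*":
        raise ValueError("codon is already a stop; stop-loss is outside this sketch")
    mutant = codon[:position] + new_base + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutant]
    if before == after:
        return "silent"
    if after == "*":
        return "nonsense"
    return "missense"


print(classify_point_mutation("GAG", 2, "A"))  # silent   (GAA: Glu -> Glu)
print(classify_point_mutation("GAG", 1, "U"))  # missense (GUG: Glu -> Val, the sickle-cell change)
print(classify_point_mutation("UAC", 2, "A"))  # nonsense (UAA: Tyr -> stop)
```

The silent example changes the third base, which is where the code's redundancy usually absorbs a substitution.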
Transcription
RNA polymerase reads a DNA template strand 3′→5′ and builds an mRNA 5′→3′, starting at a promoter. In eukaryotes the transcript is then processed before it leaves the nucleus.
RNA polymerase binds the promoter (e.g., the TATA box), unwinds the DNA, and synthesizes RNA complementary to the template (antisense) strand; the coding (sense) strand has the same sequence as the mRNA (with U for T). Unlike DNA polymerase, RNA polymerase needs no primer. Transcription runs in three phases — initiation, elongation, termination.
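The template/coding distinction is easy to check with a short sequence. In this sketch the template strand is written 3′→5′, the direction RNA polymerase reads it, so pairing base by base yields the mRNA 5′→3′. The result matches the coding strand with U in place of T. The sequences and function name are hypothetical.

```python
# RNA base laid down opposite each DNA template base.
TEMPLATE_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5: str) -> str:
    """Build mRNA 5'->3' from a template strand written 3'->5'."""
    return "".join(TEMPLATE_TO_RNA[b] for b in template_3_to_5.upper())


template = "TACGGATTC"  # template (antisense) strand, written 3'->5'
coding = "ATGCCTAAG"    # coding (sense) strand, written 5'->3'
mrna = transcribe(template)
print(mrna)                              # AUGCCUAAG
print(mrna == coding.replace("T", "U"))  # True
```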
mRNA, tRNA, rRNA
mRNA carries the message; tRNA brings amino acids and reads codons via its anticodon; rRNA builds the ribosome and catalyzes peptide-bond formation.
Three RNAs run translation: messenger RNA (mRNA) is the transcript that gets read; transfer RNA (tRNA) is the adaptor that matches a codon (via its anticodon) to the correct amino acid; ribosomal RNA (rRNA) is the structural and catalytic core of the ribosome (a ribozyme). Eukaryotes have three RNA polymerases (I, II, III) for rRNA, mRNA, and tRNA respectively; prokaryotes have one.
Eukaryotic RNA processing
Before a eukaryotic mRNA leaves the nucleus it gets a 5′ cap, a 3′ poly-A tail, and has its introns spliced out (exons joined) by the spliceosome. Alternative splicing lets one gene make many proteins.
Three modifications turn the primary transcript (pre-mRNA) into mature mRNA: a 5′ methylguanosine cap (protects the message and aids ribosome binding), a 3′ poly-A tail (stability and export), and splicing, in which the spliceosome (snRNPs) removes introns (intervening, non-coding) and joins exons (expressed). Alternative splicing (joining different combinations of exons from the same pre-mRNA) is a major reason humans make far more proteins than they have genes. Prokaryotes do none of this (no nucleus; transcription and translation are coupled).
Don't confuse
Introns stay in the nucleus (cut out); exons exit and are expressed.
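Splicing and alternative splicing can be modeled as joining selected intervals of a pre-mRNA. In this toy sketch, exons are uppercase and introns lowercase (starting with GU and ending with AG) for readability. Exon positions are given as half-open intervals, and "exon skipping" means choosing which exons to keep. The sequence, coordinates, and function name are hypothetical. The sketch does not model the spliceosome or the splice-site signals.

```python
def splice(pre_mrna: str, exons, include=None) -> str:
    """Join the selected exons in order; the introns between them are dropped.

    exons: list of (start, end) half-open intervals on the pre-mRNA.
    include: indices of exons to keep (default: all), modelling exon skipping.
    """
    keep = range(len(exons)) if include is None else sorted(include)
    return "".join(pre_mrna[exons[i][0]:exons[i][1]] for i in keep)


# exon 1 | intron 1 | exon 2 | intron 2 | exon 3
pre_mrna = "AUGGCC" + "guaaguag" + "CUGAAA" + "gucccuag" + "GGCUAA"
exons = [(0, 6), (14, 20), (28, 34)]

print(splice(pre_mrna, exons))                  # AUGGCCCUGAAAGGCUAA
print(splice(pre_mrna, exons, include=[0, 2]))  # AUGGCCGGCUAA (exon 2 skipped)
```

Both mature messages come from the same pre-mRNA. Because every exon here is a multiple of three bases long, skipping exon 2 keeps the reading frame intact and yields a shorter protein rather than a frameshift.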
Translation
The ribosome reads mRNA codons 5′→3′ and links amino acids into a polypeptide. tRNAs deliver amino acids matching each codon; synthesis runs N-terminus → C-terminus.
Translation happens on the ribosome (small + large subunits, rRNA + protein), which has three tRNA sites: A (aminoacyl, incoming), P (peptidyl, growing chain), and E (exit). Like transcription, it proceeds in three phases: initiation, elongation, and termination. The ribosome's peptidyl transferase (an rRNA ribozyme) forms each peptide bond.
The genetic code
Three bases = one codon = one amino acid. The code is degenerate (most amino acids have several codons), starts at AUG (Met), and ends at one of three stop codons (UAA, UAG, UGA).
64 codons encode 20 amino acids plus stop signals, so the code is degenerate/redundant — often the third base can vary (wobble), which is why many point mutations are silent. The start codon AUG sets the reading frame and codes for methionine; UAA, UAG, UGA are stops (no amino acid). The code is nearly universal across life, which is what makes recombinant protein expression possible.
Don't confuse
Codon (on mRNA, read by the ribosome) vs. anticodon (on tRNA, complementary to the codon).
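The reading-frame logic (start at AUG, read non-overlapping triplets 5′→3′, stop at UAA/UAG/UGA) fits in a few lines. This sketch reuses the standard codon table and simplifies start-site selection to "the first AUG," which is an assumption, not how real ribosomes choose a start codon. The message is hypothetical.

```python
from itertools import product

# Standard genetic code; codons enumerated in UCAG order, '*' = stop.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(mrna: str) -> str:
    """Translate from the first AUG to the first in-frame stop codon.

    Returns one-letter amino-acid codes, N-terminus first.
    """
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":  # stop codon: release the chain, add no amino acid
            break
        protein.append(aa)
    return "".join(protein)


message = "GGAUGGCCCUGAAAGGCUAAUU"
print(translate(message))                    # MALKG
# Deleting one base just after the start codon shifts the frame (frameshift).
print(translate(message[:5] + message[6:]))  # MP
```

The single-base deletion changes every downstream codon, and here it exposes an early UGA. That is why frameshifts are usually far more damaging than point substitutions.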
tRNA & charging
Each tRNA has an anticodon that base-pairs with a codon and carries the matching amino acid. Aminoacyl-tRNA synthetases "charge" each tRNA with its correct amino acid (using ATP).
Accuracy of translation depends on two recognition steps: the aminoacyl-tRNA synthetase attaching the right amino acid to the right tRNA (charging, ATP-dependent), and the anticodon–codon pairing at the ribosome. There is generally one synthetase per amino acid, and the synthetases proofread. That matters because the ribosome checks only codon–anticodon pairing: a mischarged tRNA would insert the wrong residue even at a correctly read codon.
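The two recognition steps can be separated in a small sketch. `anticodon_for` writes the anticodon 5′→3′ (pairing is antiparallel, so complement and reverse). `residue_inserted` shows that the ribosome checks only the anticodon, so whatever amino acid the synthetase attached is what goes in. Both functions are hypothetical, wobble pairing is ignored, and the mischarged tRNA is an invented example of a synthetase error.

```python
# Watson-Crick pairing between codon and anticodon (wobble ignored).
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon_for(codon_5_to_3: str) -> str:
    """Anticodon that pairs with an mRNA codon, written 5'->3'."""
    return "".join(RNA_PAIR[b] for b in reversed(codon_5_to_3.upper()))


def residue_inserted(codon: str, charged_trna) -> str:
    """The ribosome verifies codon-anticodon pairing only, not the amino acid."""
    anticodon, amino_acid = charged_trna
    if anticodon_for(codon) != anticodon:
        raise ValueError("anticodon does not pair with this codon")
    return amino_acid


print(anticodon_for("AUG"))  # CAU
print(anticodon_for("GAG"))  # CUC

correct = ("CAU", "Met")
mischarged = ("CAU", "Val")  # hypothetical synthetase error
print(residue_inserted("AUG", correct))     # Met
print(residue_inserted("AUG", mischarged))  # Val
```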
Post-translational modification
After the ribosome releases a polypeptide, the protein is finished by folding, chemical tags (glycosylation, phosphorylation, lipidation, ubiquitination), proteolytic cleavage, and targeting via a signal sequence that routes it to the right destination.
The primary sequence is only the start. An N-terminal signal peptide (signal sequence) directs a nascent protein to the rough ER for secretion or membrane insertion and is cleaved off once the protein arrives. Reversible chemical modifications tune function: phosphorylation (a regulatory phosphate added by kinases) switches activity on or off, glycosylation (adding sugars in the ER/Golgi) marks proteins for the cell surface or secretion, and lipidation (attaching a lipid group) anchors proteins to membranes. Proteolytic activation is an irreversible cleavage that switches an inactive zymogen on (e.g., trypsinogen → trypsin). For disposal, ubiquitination tags a protein with ubiquitin, marking it for degradation and recycling by the proteasome.
Don't confuse
A signal peptide is cleaved to route a protein; ubiquitin is added to destroy it. Both are tags, but one targets a destination and the other targets the proteasome.
Ribosome subunits: 70S vs. 80S
Prokaryotes use a 70S ribosome (30S + 50S); eukaryotes use a larger 80S ribosome (40S + 60S). The smaller subunit binds mRNA; the larger one holds the A/P/E sites and catalyzes peptide bonds.
Each ribosome has a small subunit (decodes the mRNA) and a large subunit (forms peptide bonds via its rRNA ribozyme). In prokaryotes these are 30S + 50S → 70S; in eukaryotes they are 40S + 60S → 80S (the cytosolic ribosome — mitochondria carry their own 70S-like ribosomes, a clue to their bacterial origin). The size difference is what makes ribosome-targeting antibiotics selective: drugs like aminoglycosides and tetracyclines bind the bacterial 30S and spare the human 80S.
Don't confuse
Svedberg units (S) reflect sedimentation rate, which depends on shape as well as mass, so they don't add up arithmetically: 30S + 50S yields a 70S ribosome, not 80S.
How AAMC tests it
A pharmacology passage gives an antibiotic that binds the 30S (or 50S) subunit and asks why it harms bacteria but not the patient — the answer hinges on prokaryotic 70S vs. eukaryotic 80S.
Control of gene expression
Cells control which genes are expressed and how much. Prokaryotes use operons; eukaryotes layer on transcription factors, chromatin structure, and epigenetic marks.
Regulation can act at every step (transcription, processing, translation, protein stability), but the biggest lever is transcription. Prokaryotes cluster related genes into operons under one promoter; eukaryotes regulate genes individually with a combination of DNA-level access (chromatin) and protein factors.
Operons: lac vs. trp
The lac operon is inducible (off by default, turned on when lactose is present); the trp operon is repressible (on by default, turned off when tryptophan is abundant).
An operon = promoter + operator + structural genes, with a repressor protein that binds the operator to block transcription. lac (inducible): normally the repressor is bound (off); allolactose (from lactose) inactivates the repressor, turning the operon on. It's also positively controlled: low glucose → high cAMP → CAP binds and boosts transcription, so maximal expression needs lactose present and glucose low. trp (repressible): normally on; when tryptophan is plentiful it acts as a corepressor, activating the repressor to shut synthesis off. The pattern: catabolic pathways tend to be inducible, anabolic ones repressible.
Don't confuse
Inducible (lac — off until an inducer removes the block) vs. repressible (trp — on until a corepressor activates the block).
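Combining the negative (repressor) and positive (CAP–cAMP) controls makes lac regulation a small truth table. The sketch below encodes it qualitatively, with trp added for contrast. The output labels ("off", "low", "high", "on") are a simplification chosen for illustration, not measured expression levels.

```python
def lac_expression(lactose_present: bool, glucose_present: bool) -> str:
    """Qualitative lac operon output.

    Negative control: without lactose (allolactose), the repressor sits on
    the operator and blocks transcription.
    Positive control: low glucose -> high cAMP -> CAP-cAMP boosts transcription.
    """
    if not lactose_present:
        return "off"   # repressor bound to the operator
    if glucose_present:
        return "low"   # repressor released, but little CAP-cAMP
    return "high"      # repressor released and CAP-cAMP bound


def trp_expression(tryptophan_abundant: bool) -> str:
    """Repressible trp operon: tryptophan acts as the corepressor."""
    return "off" if tryptophan_abundant else "on"


for lactose in (False, True):
    for glucose in (True, False):
        print("lactose:", lactose, "glucose:", glucose, "->", lac_expression(lactose, glucose))
```

Only the lactose-present, glucose-absent combination reaches "high," which is the "maximal expression needs lactose present and glucose low" rule in code form.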
Eukaryotic regulation & epigenetics
Transcription factors bind enhancers/silencers; access to the DNA is set by chromatin packing. DNA methylation generally silences; histone acetylation generally activates — heritable "epigenetic" marks that don't change the sequence.
DNA wraps around histones into nucleosomes; tightly packed heterochromatin is silent, loosely packed euchromatin is active. Histone acetylation loosens packing (→ more transcription); DNA methylation (at CpG islands) tightens it (→ silencing). Transcription factors bind promoters and distant enhancers (or silencers) to recruit or block RNA polymerase. These epigenetic modifications are heritable through cell division without altering the DNA sequence, and explain how identical genomes yield different cell types.
Recombinant DNA & biotechnology
The same mechanisms, harnessed in the lab: PCR amplifies DNA, restriction enzymes and ligase cut and paste it, sequencing reads it, and cloning/CRISPR move or edit it.
Biotechnology is applied central dogma. The high-yield tools share a logic of base pairing and the enzymes above — recognize what each one does and the questions are easy points.
PCR
Polymerase chain reaction amplifies a target sequence through cycles of denature (heat) → anneal primers → extend with heat-stable Taq polymerase, doubling the DNA each cycle.
Each cycle: ~95 °C denatures the strands, cooling anneals two primers flanking the target, and Taq polymerase (from a thermophile, so it survives the heat steps) extends from the primers. Because every cycle doubles the product, amplification is exponential (2ⁿ). PCR underlies diagnostics, forensics, and qPCR for quantifying expression.
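The 2ⁿ arithmetic is worth running once. This sketch computes copies after n cycles. The `efficiency` parameter is an added assumption, not from the text: 1.0 reproduces ideal doubling, and lower values approximate real reactions. The model ignores the late-cycle plateau.

```python
def pcr_copies(starting_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Copies of the target after a number of PCR cycles.

    efficiency = 1.0 is perfect doubling each cycle (the idealized 2**n case).
    This simple exponential model does not capture the late-cycle plateau.
    """
    return starting_copies * (1 + efficiency) ** cycles


print(pcr_copies(1, 10))   # 1024.0
print(pcr_copies(1, 30))   # 1073741824.0
print(pcr_copies(100, 3))  # 800.0
```

Thirty ideal cycles turn one template into about a billion copies, which is why PCR can start from trace samples in forensics and diagnostics.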
Cutting, reading & editing DNA
Restriction enzymes cut at specific sequences (often leaving "sticky ends"); gel electrophoresis separates fragments by size; Southern/Northern/Western blots detect DNA/RNA/protein; Sanger sequencing reads the order of bases; CRISPR-Cas9 edits sequences.
Restriction endonucleases cut at palindromic recognition sites, and matching sticky ends plus ligase let you splice DNA into plasmid vectors (recombinant DNA / cloning). Gel electrophoresis drives negatively charged DNA toward the anode, separating fragments by size (smaller travel farther). The blots follow one mnemonic, "SNoW DRoP": Southern = DNA, Northern = RNA, Western = protein. Sanger (chain-termination) sequencing reads a sequence using dideoxynucleotides, which lack the 3′-OH needed to add the next nucleotide; CRISPR-Cas9 uses a guide RNA to target and cut a chosen sequence for editing.
Don't confuse
Southern (DNA) / Northern (RNA) / Western (protein) — and the protein version of electrophoresis, SDS-PAGE, separates by size after SDS masks charge.
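A restriction digest followed by a gel can be sketched as "find every site, cut, sort by length." The example uses EcoRI, which recognizes the palindrome GAATTC and cuts between G and A (G^AATTC). The sketch tracks the top strand only, so it does not represent the 4-base AATT sticky-end overhang. The DNA sequence and function names are hypothetical.

```python
ECORI_SITE = "GAATTC"  # EcoRI recognition site, cut as G^AATTC
CUT_OFFSET = 1         # the top-strand cut falls 1 base into the site


def digest(seq: str, site: str = ECORI_SITE, offset: int = CUT_OFFSET) -> list:
    """Cut linear DNA at every occurrence of a recognition site (top strand
    only) and return the fragments in their original order."""
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + offset)
        start = seq.find(site, start + 1)
    fragments, prev = [], 0
    for cut in cuts:
        fragments.append(seq[prev:cut])
        prev = cut
    fragments.append(seq[prev:])
    return fragments


def gel_order(fragments) -> list:
    """Fragment lengths from the bottom of the gel (travels farthest) to the top."""
    return sorted(len(f) for f in fragments)


seq = "CCC" + "GAATTC" + "AAAAAAAAAA" + "GAATTC" + "CC"  # hypothetical 27-bp DNA
pieces = digest(seq)
print([len(p) for p in pieces])  # [4, 16, 7]
print(gel_order(pieces))         # [4, 7, 16]
```

Two sites give three fragments of a linear molecule. On the gel, the 4-bp piece runs farthest toward the anode and the 16-bp piece stays closest to the wells.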
In vitro vs. in vivo
In vitro = outside a living organism (a test tube, dish, or purified cell-free system — isolated and controlled); in vivo = inside a living organism (the whole, physiological context).
In vitro ("in glass") strips a process down to its parts — a purified enzyme in a tube, cells in a dish — so you can control every variable, but you lose the surrounding physiology. In vivo keeps the process embedded in a living system (whole animal, intact tissue), so it's realistic but full of confounders. The MCAT's recurring move is the transfer question: a clean in vitro result (e.g., a drug inhibits an enzyme in a tube) may not hold in vivo, where absorption, metabolism, compartmentalization, and feedback all intervene.
Don't confuse
In vitro (isolated, controlled, simplified) is not automatically more or less valid than in vivo — they answer different questions. When a passage reports an in vitro finding, be cautious about claiming it proves anything about the living organism, and vice versa. (Note: in vitro is unrelated to in situ, which means "in its original place," e.g., a gene examined within intact tissue.)
How AAMC tests it
Experiments are constantly labeled in vitro or in vivo, and the answer often hinges on which conclusions legitimately transfer between the isolated system and the living one.
Worked question
In E. coli growing with both glucose and lactose available, transcription of the lac operon is kept low. Once glucose is used up but lactose remains, lac transcription rises sharply. The rise on glucose depletion is best explained by: