Enumerating k-mers Lexicographically

When cataloguing a collection of genetic strings, we should have an established system by which to organize them. The standard method is to organize strings as they would appear in a dictionary, so that “APPLE” precedes “APRON”, which in turn comes before “ARMOR”.
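As a minimal sketch of dictionary ordering applied to k-mers, the snippet below lists every string of length k over a given alphabet in lexicographic order. The function name `enumerate_kmers` is hypothetical, and the sketch assumes the alphabet is ordered by Python's default character ordering; a custom symbol order would need to be passed in already sorted as desired.

```python
from itertools import product


def enumerate_kmers(alphabet, k):
    """Return all length-k strings over `alphabet` in dictionary order.

    Assumption: symbols are ranked by Python's default character order.
    """
    symbols = sorted(set(alphabet))
    # product() varies the rightmost position fastest, so with sorted
    # input symbols the tuples come out in lexicographic order.
    return ["".join(letters) for letters in product(symbols, repeat=k)]


if __name__ == "__main__":
    # Plain words follow the same rule: APPLE, then APRON, then ARMOR.
    print(sorted(["ARMOR", "APRON", "APPLE"]))
    print(enumerate_kmers("ACGT", 2)[:5])
```

Sorting the symbols once and letting `product` generate the strings avoids building the list and sorting it afterwards, although both approaches give the same order.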

RNA Splicing

In “Transcribing DNA into RNA”, we mentioned that a strand of DNA is copied into a strand of RNA during transcription, but we neglected to mention how transcription is achieved.
In the nucleus, an enzyme (i.e., a molecule that accelerates a chemical reaction) called RNA polymerase (RNAP) initiates transcription by breaking the bonds joining complementary bases of DNA. It then creates a molecule called precursor mRNA, or pre-mRNA, using one of the two DNA strands as a template strand: as RNAP moves down the template strand, it adds the base complementary to each nucleotide it encounters to the growing RNA strand, except that uracil is used in place of thymine.
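The base-pairing rule described above can be sketched in a few lines. The names `COMPLEMENT_RNA` and `transcribe_template` are hypothetical, and the sketch makes a simplifying assumption: it reads the template in the order given and ignores strand orientation (5' and 3' ends).

```python
# Base added to the RNA strand for each template DNA base.
# Adenine on the template pairs with uracil rather than thymine.
COMPLEMENT_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe_template(template):
    """Build the RNA string complementary to a DNA template strand.

    Assumption: the template is read left to right as given; strand
    directionality is not modelled here.
    """
    return "".join(COMPLEMENT_RNA[base] for base in template)


if __name__ == "__main__":
    print(transcribe_template("TACGGT"))  # AUGCCA
```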

Locating Restriction Sites

The war between viruses and bacteria has been waged for over a billion years. Viruses called bacteriophages (or simply phages) require a bacterial host to propagate, and so they must somehow infiltrate the bacterium; such deception can only be achieved if the phage understands the genetic framework underlying the bacterium’s cellular functions. The phage’s goal is to insert DNA that will be replicated within the bacterium and lead to the reproduction of as many copies of the phage as possible, which sometimes also involves the bacterium’s demise.

Calculating Protein Mass

In “Translating RNA into Protein”, we examined the translation of RNA into an amino acid chain for the construction of a protein. When two amino acids link together, they form a peptide bond, which releases a molecule of water. Thus, after a series of amino acids has been linked together into a polypeptide, every pair of adjacent amino acids has lost one molecule of water, meaning that a polypeptide containing n amino acids has had n − 1 water molecules removed.
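The counting argument can be checked with a short sketch. The function names are hypothetical, and the water mass used (about 18.01056 Da, the monoisotopic mass of H2O) is an assumed constant that does not come from the text above.

```python
# Assumed constant: monoisotopic mass of one water molecule, in daltons.
WATER_MASS = 18.01056


def waters_released(n_amino_acids):
    """Number of water molecules lost when n amino acids form one chain."""
    if n_amino_acids < 1:
        raise ValueError("a polypeptide needs at least one amino acid")
    # One peptide bond, and so one water molecule, per adjacent pair.
    return n_amino_acids - 1


def water_mass_lost(n_amino_acids):
    """Total mass of water released while forming the chain, in daltons."""
    return waters_released(n_amino_acids) * WATER_MASS


if __name__ == "__main__":
    print(waters_released(5))  # 4
```

Under these assumptions, the mass of a polypeptide is the sum of the masses of its free amino acids minus `water_mass_lost(n)`.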

Enumerating Gene Orders

Point mutations can create changes in populations of organisms from the same species, but they lack the power to create and differentiate entire species. This more arduous work is left to larger mutations called genome rearrangements, which move around huge blocks of DNA. Rearrangements cause major genomic change, and most rearrangements are fatal or seriously damaging to the mutated cell and its descendants (many cancers derive from rearrangements). For this reason, rearrangements that come to influence the genome of an entire species are very rare.

Open Reading Frames

In “Transcribing DNA into RNA”, we discussed the transcription of DNA into RNA, and in “Translating RNA into Protein”, we examined the translation of RNA into a chain of amino acids for the construction of proteins. We can view these two processes as a single step in which we directly translate a DNA string into a protein string, thus calling for a DNA codon table.
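One way to picture a DNA codon table is as a dictionary from three-letter DNA strings to amino acid letters. The sketch below builds such a table for the standard genetic code from a compact 64-character string, with codons ordered over the bases T, C, A, G and `*` marking a stop codon. The names `DNA_CODON_TABLE` and `translate_dna` are hypothetical, and the sketch assumes translation starts at the first base and halts at the first stop codon.

```python
from itertools import product

# Standard genetic code, with codons enumerated in the order
# TTT, TTC, TTA, TTG, TCT, ... over the bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

DNA_CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_dna(dna):
    """Translate a DNA string directly into a protein string.

    Assumptions: reading starts at index 0, the string contains only
    A, C, G and T, and translation stops at the first stop codon.
    """
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = DNA_CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    print(translate_dna("ATGGCCTAA"))  # MA
```

Working from a DNA table folds the T-to-U substitution of transcription into the lookup, so no intermediate RNA string is needed.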

Inferring mRNA from Protein

When researchers discover a new protein, they would like to infer the strand of mRNA from which this protein could have been translated, thus allowing them to locate genes associated with this protein on the genome.

Finding a Protein Motif

As mentioned in “Translating RNA into Protein”, proteins perform every practical function in the cell. A structural and functional unit of the protein is a protein domain: in terms of the protein’s primary structure, the domain is an interval of amino acids that can evolve and function independently.

