
 

Common Terms in Genetics

 

Mehmet Tevfik DORAK

 

On Line Biology Book - Glossary    Glossary of Genetic Terms   Talking Glossary (Genetics) 

 Life: The Science of Biology - Glossary   

UCMP Glossary (Evolution)    Population Genetics Glossary

 Molecular Biology Glossary (ASH)    Molecular Biology Glossary (UM)    Genome Glossary    RNAi Glossary

  Genomic Glossaries & Taxonomies   More Human Genetics Glossaries 

 

Genetic Epidemiology Glossary    Real-Time PCR Glossary

[For best results, please use the FIND option by pressing "CTRL + F" to locate the word you are looking for]

α-helix: A common secondary structure of proteins in which the linear sequence of amino acids is folded into a spiral that is stabilized by hydrogen bonds between the carbonyl oxygen of each peptide bond and the amide hydrogen of the peptide bond four residues further along the chain.

Ab initio gene prediction: A computational biology technique that attempts to identify genes without any knowledge of their function or of the genetics of the organism. This is possible because different gene features, such as exons, introns, promoters and polyadenylation signals, are associated with characteristic patterns in the DNA sequence.

Acrocentric chromosome: A chromosome with its centromere towards one end. Human chromosomes 13,14,15,21,22 are acrocentric.

Adaptation: Adjustment to environmental demands through the long-term process of natural selection acting on genotypes.

Additive and non-additive components: In studies of heredity, the portions of the genetic component that are passed and not passed to offspring, respectively.

Allele: A known variation (version) of a particular gene. Formerly called allelomorph.

Allelic association: see linkage disequilibrium.

Allelic exclusion: Expression of only one of the two homologous alleles at a locus in the case of heterozygosity. This usually occurs at loci such as immunoglobulin or T cell receptor (TCR) genes where a functional rearrangement among genes takes place. One of the alleles is either non-functionally or incompletely rearranged and not expressed. This way, each T-cell expresses only one set of TCR genes.

Allelopathy: The influence exerted by a living plant on other plants nearby or microorganisms through production of a chemical.

Allorecognition: Recognition by T cells of the MHC molecules on an allogeneic individual's antigen-presenting cells which results in allograft rejection in vivo and mixed lymphocyte reaction (MLR) in vitro.

Altered self: A term used to describe the MHC molecule associated with a peptide rather than in its native form. Thus, a native MHC molecule does not induce an immune reaction except when it is presenting a peptide.

Alternative splicing: Formation of diverse mRNAs through differential splicing of the same RNA precursor. This may result in proteins with different composition of amino acids or it may involve just the length of 3' UTR. One reason for alternative/differential splicing is base modification during RNA editing causing a change in splice sites.

Amino acids: Building blocks of peptides. Each amino acid is encoded by DNA. See Amino Acids and The Chemistry of Amino Acids.

Amorph (null allele): A mutation that leads to complete loss of function.

Amphipathic: A molecule that has both a hydrophobic and a hydrophilic part.

Analogy: A similarity due to convergent evolution (common function) but not inheritance from a common ancestor (bat's wings and bird's wings). See also homology.

Antagonistic pleiotropy: The effects of a gene, which are beneficial early in life (i.e., increasing fitness) but deleterious later in life (no change in fitness after the reproductive age). Such genes will be maintained by selection, because by the time the gene exerts its damage, its bearers will already have had more offspring than other individuals. Differential effects in two sexes is called sexual antagonism. See a review by Hughes, 2002.

Anthropology: The study of humankind.

Antigen: Any macromolecule that triggers an immune response. Antigenicity depends on the ability of the peptide fragments to be presented by the MHC molecules.

Antisense DNA/RNA: Single stranded nucleic acid that is complementary to the coding/sense strand of a gene. It is then also complementary to the mRNA produced from the same gene.
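
The complementarity described above can be illustrated with a minimal Python sketch. The function names and the example sequence are hypothetical and chosen only for illustration; the sense strand is assumed to be written 5' to 3' using the letters A, C, G and T.

```python
# Map each DNA base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def antisense_dna(sense):
    """Return the antisense strand (reverse complement), written 5' to 3'."""
    return sense.upper().translate(COMPLEMENT)[::-1]

def transcribe(sense):
    """Return the mRNA corresponding to the sense strand (T replaced by U)."""
    return sense.upper().replace("T", "U")

sense = "ATGGCC"                  # hypothetical sense (coding) strand
antisense = antisense_dna(sense)  # complementary to the sense strand and to the mRNA
mrna = transcribe(sense)
print(antisense, mrna)
```

Because the mRNA has the same sequence as the sense strand (with U in place of T), the antisense strand pairs with both.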

Apoptosis: The genetically programmed death of cells at specific times during embryonic morphogenesis and development, metamorphosis, and during cell turnover in adults including the maturation of T and B cells of the immune system. Defects in apoptosis are associated with maintenance of the transformed state and cancer. Anti-apoptotic proteins include Bcl-2 and HSP families (see also caspase). Apoptosis is often induced by activation of death receptors (DR) belonging to the tumor necrosis factor receptor (TNFR) family. Examples are Fas (CD95), TNFR-1 and TNFR-related apoptosis-mediated protein (TRAMP). Death signals are conducted through a cytoplasmic motif (death domain - DD) - death-inducing signaling complex (DISC) and caspase-8 that leads to the activation of caspase cascade and eventual death of the cell.

Arabidopsis thaliana: A small member of the mustard family (thale cress). It has a very small genome (130-140 Mbp), five small chromosomes and almost no repetitive DNA. Its genome has been sequenced. It is a plant model system of choice because of the additional advantages of short generation time (about five weeks), high seed production (up to 40,000 seeds per plant) and natural self-pollination (as opposed to natural cross-pollination in maize). Link to Arabidopsis website.

Archaea: A prokaryote kingdom that has not diverged much from the ancestral prokaryote stock. Contemporary species of Archaebacteria live in extreme conditions. The three major groups are halobacteria, sulphobacteria and methanogens. All other prokaryotes are grouped in Eubacteria.

Archezoa: One of the kingdom-level taxa proposed by Cavalier-Smith which consists of the most ancient unicellular eukaryotes with a nucleus and rod-shaped chromosomes but no mitochondria or plastids, thus believed to be the intermediate stage between prokaryotes and eukaryotes. They are also used as evidence for the evolution of the nucleus before the organelles. The intestinal parasite Giardia lamblia (a protist) is an example.

ARS: Autonomously replicating sequence. ARS is the origin of replication in yeast.

Artificial selection: Selective evolutionary pressure imposed by humans to obtain breeds with certain features (such as breeding cows, dogs, chickens).

Asexual reproduction: Any form of reproduction not depending on a sexual process. It involves a single individual. Reproduction by cell division, fragmentation or budding.

Association (genetic): Association refers to a correlation greater than predicted by chance between a specific allele or genotype and a phenotype/trait (for example, a disease) that may or may not have a genetic basis. Correlation does not mean causation. Evaluation of an association is achieved by the study of unrelated individuals or family members. Association studies may prove useful in identifying a genetic factor modifying the risk for a multifactorial disease. Except when linkage disequilibrium exists, association is not due to genetic linkage and should not be confused with it.

Assortative matings: Reproduction in which mate selection is not random but is based on physical, cultural, or religious grounds (see negative and positive assortative mating). Assortative mating also occurs in animals (Jiang, 2013).

Atomic mass unit (amu or dalton): The basic unit of mass on an atomic scale. One amu or dalton is one-twelfth the mass of a carbon-12 atom (in other words, approximately the mass of a hydrogen atom, 1.66 x 10^-24 g). Therefore, there are about 6.022 x 10^23 amu in one gram (Avogadro's number).
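
The arithmetic behind this entry can be checked with a short Python sketch. The constant name is hypothetical and the value is the rounded figure quoted in the entry, so the result only approximates Avogadro's number.

```python
# Rounded mass of one atomic mass unit in grams, as quoted in the entry.
AMU_IN_GRAMS = 1.66e-24

# Number of atomic mass units in one gram (approximately Avogadro's number).
amu_per_gram = 1 / AMU_IN_GRAMS

# Mass of one carbon-12 atom, which is 12 amu by definition.
carbon12_atom_grams = 12 * AMU_IN_GRAMS

print(f"{amu_per_gram:.3e} amu per gram")
print(f"{carbon12_atom_grams:.3e} g per carbon-12 atom")
```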

Autosome: Any chromosome except a sex chromosome.

β-pleated sheet: A planar secondary structure element of proteins. It is created by hydrogen bonding between the backbone atoms in two different polypeptide chains or segments of a single folded chain.

Bacteriophage: A virus that infects a bacterium.

Balanced lethal: Lethal mutations in different genes on the same pair of chromosomes that remain in repulsion because of close linkage or crossover suppression. In a closed population, only the trans-heterozygotes (l1 + / + l2) for the lethal mutations survive.

Balanced polymorphism: The maintenance of two or more alleles in a population due to a selective advantage of the heterozygote.

Balancing selection: Selection involving opposing forces in which selective advantages and disadvantages cancel each other out. Heterozygote advantage (or overdominant selection) is an example in which an allele selected against in the homozygous state is retained because of the superiority of heterozygotes (for an example and a list of all known examples, see Gemmell & Slate, 2006 and Supplemental Table 1). Other balanced states may occur including when: an allele is favored at one developmental stage and is selected against at another (antagonistic pleiotropy); an allele is favored in one sex and selected against in another (sexual antagonism); an allele is favored when it is rare and selected against when it is common (negative frequency dependent selection).

Barr body: Also called sex-chromatin body, which represents the inactivated X chromosome in the nucleus of somatic mammalian cells. Normally only seen in female cells and not in male cells. It is the result of the process called dosage compensation.

Base: A compound, usually containing nitrogen, that can accept a H+. It is used to describe the non-sugar, non-phosphate components of nucleotides (despite the basic nature of these components, nucleic acids are acidic due to the phosphate groups they contain). The five bases that form the nucleic acids are adenine (A), guanine (G), cytosine (C), thymine (T) and uracil (U).

B cells: A major family of small lymphocytes that are responsible for antigen-specific humoral immunity as part of the adaptive immunity. Their antigen receptors are surface immunoglobulins (antibodies). They recognize peptides directly and secrete antibodies by differentiating into plasma cells. They also exist as long-lived memory cells.

B Factor: A fungal incompatibility factor. It operates in the Basidiomycetes species Schizophyllum commune (not to be confused with Factor B of the immune system).

Bioinformatics: Computerized acquisition, management and analysis of biological information. See Online Lectures on Bioinformatics and Bioinformatics Tools.

Biome: A grouping of plant ecosystems into a large distinct group occupying a major terrestrial region. They are created and maintained by climate. See examples of biomes.

Bottleneck: A drastic reduction in the population size followed by an expansion. This often results in altered gene pool as a result of genetic drift.

CAAT box: A highly conserved DNA sequence found about 75 bp 5' to the transcription start site in eukaryotic genes. Its specific (trans-acting) transcription factor is CTF-1 (NF-1) (see also TATA / Goldberg-Hogness box). See also Gene Expression.

Caenorhabditis elegans: A normally self-fertilizing hermaphrodite soil nematode whose developmental genetics has been extensively studied. It is no more than 1 mm long. Loss of an X chromosome by meiotic nondisjunction leads to the production of males. The genetic basis of apoptosis was first shown in C. elegans in 1986. It has five autosomes and an X chromosome, all of similar size, and it is the first animal whose whole genome has been sequenced (in 1998). The 97 Mbp genome contains 19,000 genes. About 74% of human genes have their homologues in the C. elegans genome. Links to the C. elegans website.

Cap: A methylated guanine residue (GTP), which is added to the 5' end of eukaryotic mRNAs in a post-transcriptional reaction. It protects the mRNA against 5'-exonuclease, stabilises the mRNA and enhances its translation. The cap contains a 7-methyl guanylate residue attached by a triphosphate linkage to the sugar at the 5' end of the mRNA in a rare 5'-5' linkage.

Cap site: The initiation site of transcription in a eukaryotic gene. The initiation of translation of most eukaryotic mRNAs involves recognition of the cap followed by scanning to either the first downstream AUG or a 5'-proximal AUG in a favorable sequence context (the Kozak consensus sequence, which plays a role comparable to that of the bacterial Shine-Dalgarno sequence).

Caretaker gene: A class of genes that when inactivated do not directly promote tumors; instead, their inactivation results in genetic instabilities causing an increased mutation rate affecting all genes. BRCA1 and BRCA2 are examples of caretaker genes. See Cancer Genetics.

Carrier: A healthy person who is a heterozygote for a recessive trait. Also includes persons with balanced chromosomal translocations. The unfortunate use of ‘carrier’ to describe individuals positive for a genetic marker is wrong, and the use of ‘carrier frequency’ in that context should be replaced by ‘marker frequency’.

Carter effect: Higher incidence of a genetically determined condition in relatives when the index case is the less commonly affected sex. This phenomenon was first demonstrated in Dr Cedric Carter's study of pyloric stenosis, where the incidence is highest in the sons of affected women and lowest in daughters of affected men.

Central dogma of molecular biology: DNA is transcribed into RNA and RNA is translated into protein, and information flows only in this direction. This concept was first proposed by Francis Crick in 1957. For more, see Gene Expression.

Centric fusion: Fusion of the long arms of two acrocentric chromosomes [13,14,15,21,22] into a single chromosome having lost the short arms at the same time. Most often occurs as 21/21, 13/14, and 14/21 translocations. Apart from being an important cause of uniparental disomy, it may cause trisomy 21 (Down syndrome) in the offspring. Human chromosome 2 is a result of a centric fusion between two ancestral ape chromosomes (gorillas have 24 pairs of chromosomes).

Centromere: Constricted region where sister chromatids are attached in mitotic chromosomes. The centromere is generally flanked by repetitive DNA sequences and it is late to replicate. The centromere is an A-T region of about 130 bp. It binds several proteins with high affinity to form the kinetochore which is the anchor for the mitotic spindle.

Chaperone: Any cellular protein that binds to an unfolded or partially folded target protein to prevent misfolding, aggregation, and/or degradation of it. Chaperones also facilitate the target protein's proper folding, translocation and assembly within cells, preventing inappropriate interactions with other proteins.

Chargaff's rule: DNA in organisms has a 1:1 ratio (base pair rule) of pyrimidine (C, T) and purine (A, G) bases. Furthermore, the amount of guanine is equal to cytosine and the amount of adenine is equal to thymine.
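
The rule follows directly from base pairing, which the following Python sketch illustrates. The function name and the example strand are hypothetical; the sketch builds the complementary strand and counts bases over both strands of the duplex.

```python
from collections import Counter

# Map each DNA base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def duplex_base_counts(strand):
    """Count bases over both strands of a double-stranded DNA molecule."""
    strand = strand.upper()
    partner = strand.translate(COMPLEMENT)  # base-paired strand (orientation is irrelevant for counting)
    return Counter(strand + partner)

counts = duplex_base_counts("AACGTG")  # hypothetical single strand
purines = counts["A"] + counts["G"]
pyrimidines = counts["C"] + counts["T"]
print(counts, purines, pyrimidines)
```

The equalities hold for the duplex as a whole; a single strand taken alone need not obey them.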

Chiasma (plural chiasmata): The point of physical overlap of non-sister chromatids undergoing crossing-over in meiosis.

Chi-like sequence: An octamer nucleotide sequence (A/G - C/T - A/T - A/G - G - A/T - G - G) that creates a recombinational hotspot in the genome (originally discovered in coliphage lambda). MHC class I transmembrane domain length variation, frequent gene conversions and deletions in the MHC-linked 21-hydroxylase gene (CYP21), gene conversions within the MHC class II genes in mice and humans, many oncogene translocations (BCL2 for example) are attributed to chi-like sequences at the breakpoint region. It acts like a restriction site for recombinase.
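
The degenerate octamer given above can be written as a regular expression. The following is a minimal Python sketch; the function name and the test sequence are hypothetical, and only the given strand is scanned (a fuller search would also scan the reverse complement).

```python
import re

# (A/G - C/T - A/T - A/G - G - A/T - G - G) as a character-class pattern.
# The lookahead lets overlapping occurrences be reported as well.
CHI_LIKE = re.compile(r"(?=[AG][CT][AT][AG]G[AT]GG)")

def find_chi_like(seq):
    """Return the 0-based start positions of chi-like octamers in seq."""
    return [m.start() for m in CHI_LIKE.finditer(seq.upper())]

print(find_chi_like("TTGCTGGTGGAA"))  # hypothetical sequence containing GCTGGTGG
```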

Chlamydomonas: The unicellular green alga that is probably the closest living organism to the ancestor of green plants. It reproduces both asexually and sexually (two mating types). When it reproduces sexually, the mitochondria are inherited from the (-) mating type and chloroplasts from the (+) mating type.

Chloramphenicol acetyl transferase (CAT): The bacterial gene for chloramphenicol acetyl transferase, CAT, is commonly used as a reporter gene for investigating physiological gene regulation. Beta-galactosidase and luciferase genes can also be used for the same purpose.

Chromatid: One of two copies of a replicated chromosome during mitosis. Together they are called sister chromatids. Each one becomes a daughter chromosome at anaphase of mitosis and at the second meiotic division.

Chromatin: The complex of DNA and associated histone and non-histone proteins that represents the normal state of genes in the nucleus. It exists in two forms: less dense euchromatin that can be transcribed, and highly condensed heterochromatin that cannot be transcribed. The inactive X chromosome of female mammals is an example of heterochromatin. See Gene Expression and Rando, 2007; Johnson & Dent, 2013; Voss & Hager, 2014.

Chromosome: Structure in a cell nucleus that carries the genes. Each chromosome consists of one very long strand of DNA, coiled and folded to produce a compact body. They become more compact and visible during metaphase of cell division. In interphase chromosomes, chromatin fibers are organized into 30 to 100 kb loops anchored in a supporting matrix within the nucleus. The length of each DNA molecule must be compressed about 8000-fold to generate the structure of a condensed metaphase chromosome.

Cis-acting gene: A gene acting on or co-operating with another gene on the same chromosome (see trans-acting gene).

Cistron: A DNA segment that codes for a specific polypeptide and includes its own start and stop codons. When an mRNA encodes two or more proteins, it is called polycistronic.

Clade: All descendants of any given species. A single whole branch of a phylogeny. See Cladistics.

Cladogram: The output as a branching diagram from a (cladistic) phylogenetic analysis postulating relationship of different taxa. All cladograms are hypothetical, and none can be proved correct for sure. Among the alternative cladograms the one which is best supported by the character/sequence data is the most representative one. Note that a cladogram is not a phylogenetic tree.

Class: A category of classification (taxon); a subdivision of subphylum. The classes in the Subphylum Vertebrata are: Pisces (Fishes), Amphibia, Reptilia, Aves (Birds) and Mammalia.

Clone: All the cells derived from a single cell by repeated cell division and having the same genetic constitution.

Coalescence: Growing into each other, uniting into one whole.

Coalescence theory: The evolutionary theory that estimates the time for divergence from the last common ancestor.

Coding sequence (cds): The portion of a gene that is transcribed into mRNA and translated into protein.

Codominance: Equal effect on the phenotype of two alleles of the same locus (as opposed to recessive and dominant).

Codon bias: Although several codons code for a single amino acid, an organism may have a preferred codon for each amino acid. This is called codon bias.
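
Codon bias can be made visible by counting how often each synonymous codon is used. The following Python sketch does this for the six leucine codons; the function names and the short coding sequence are hypothetical, and the sequence is assumed to be in frame.

```python
from collections import Counter

# The six codons that encode leucine in the standard genetic code.
LEUCINE_CODONS = ("TTA", "TTG", "CTT", "CTC", "CTA", "CTG")

def codon_counts(cds):
    """Count in-frame codons of a coding sequence (trailing partial codon ignored)."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))

def relative_usage(counts, synonymous):
    """Fraction of use of each codon within one synonymous family."""
    total = sum(counts[c] for c in synonymous)
    return {c: (counts[c] / total if total else 0.0) for c in synonymous}

usage = relative_usage(codon_counts("CTGCTGCTGTTACTG"), LEUCINE_CODONS)
print(usage)
```

In this toy sequence CTG accounts for four of the five leucine codons, which is the kind of skew the term describes.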

Co-evolution: Joint evolution of two unrelated species that have a close ecological relationship resulting in reciprocal adaptations as happens between host and parasite, and plant and insect.

Cognate molecule: A relative descended from a common ancestor. Usually used to describe the corresponding partner in a receptor-ligand complex.

Cohesive end: Also known as sticky end. Overhanging ends of a double-stranded DNA molecule that are capable of hybridizing with complementary ends.

Complementary (copy) DNA (cDNA): Single-stranded DNA produced from an RNA template (usually mRNA) by reverse transcriptase in vitro. It lacks the introns present in corresponding genomic DNA. It is most commonly made to use in PCR to amplify RNA (RT-PCR).

Complementation: The production of a wildtype phenotype when two recessive mutations from different genes are brought together. If recessive mutations represent alleles of the same gene, they will not complement each other to produce a wildtype phenotype because they both represent loss-of-function of the same gene. Deafness in humans can be caused by a recessive mutation at a number of genes, so it is not uncommon for two deaf parents to have children who hear.

Compound heterozygote: An individual in whom each of the two alleles of the same locus carries a different mutation; for an autosomal recessive disorder, such an individual is affected. The C282Y and H63D mutations of HFE frequently occur as compound heterozygosity.

Concerted (coincidental) evolution: The preservation of sequence homology among members of a multigene family within the same species.

Congenic: Animals which have been bred to be genetically identical except for a single gene locus. This is achieved by superimposing the locus of interest on the genetic background of another by first crossing two inbred lines followed by extensive (about 20 generations) backcrossing hybrids to one parental line (the background strain) while selecting for the alleles of the locus of interest of the other. The result is an inbred strain uniquely identified by a difference at a single locus.

Consanguineous matings: Matings between two individuals who share a common ancestor in the preceding two or three generations.

Consensus sequence: The nucleotides or amino acids most commonly found at each position of the sequences of related molecules.

Contiguous gene syndromes: Disorders caused by microdeletions or microduplications in neighboring functional genes. Inheritance is usually sporadic but recurrences are possible. Alport syndrome, DiGeorge syndrome and cri du chat syndrome are some examples.
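
A consensus can be derived from aligned sequences by taking the most common residue in each column. The following is a minimal Python sketch; the function name and the example sequences are hypothetical, the sequences are assumed to be pre-aligned and of equal length, and ties are broken arbitrarily.

```python
from collections import Counter

def consensus(sequences):
    """Return the most common residue at each position of equal-length sequences."""
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("sequences must be non-empty and of equal length")
    # zip(*sequences) yields one alignment column at a time.
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*sequences))

print(consensus(["ATGCA", "ATGTA", "ACGCA"]))
```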

Convergent evolution: Evolution of two or more different lineages towards similar morphology due to similar adaptive pressures. Examples of convergence are: fins or fin-like structures in fish, cuttlefish and whales; extreme similarity in alarm calls by five small birds; endothermy in dogs and ducks, wings of butterflies and birds.

Coefficient of relatedness: r = n x (0.5)^L, where n is the number of alternative routes between the related individuals along which a particular allele can be inherited, and L is the number of meioses or generation links.
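
The formula can be applied to a few familiar relationships with a short Python sketch. The function and parameter names are hypothetical; the values of n and L for each relationship assume no inbreeding.

```python
def relatedness(n_routes, n_links):
    """Coefficient of relatedness: r = n x (0.5)^L."""
    return n_routes * 0.5 ** n_links

# (n, L) for some common relationships, assuming outbred pedigrees.
examples = {
    "parent-offspring": (1, 1),  # one route, one meiosis
    "full siblings": (2, 2),     # routes via mother and via father, two links each
    "half siblings": (1, 2),     # one shared parent
    "first cousins": (2, 4),     # routes via each shared grandparent
}

for name, (n, links) in examples.items():
    print(f"{name}: r = {relatedness(n, links)}")
```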

Copy number variation (CNV): Gains and losses of genomic segments resulting in variation in the number of copies of a genomic region or gene per diploid genome. Most genes show this variation and study of disease associations with CNV is becoming common (see for example AMY1). Reference gene in CNV studies is commonly RNAse P (RPPH1), which invariably exists in two copies in human diploid genome. See Redon, 2006; Sanger Institute CNV Project; and Database of Genomic Variants.

Correspondence analysis: A complementary analysis to genetic distances and dendrograms. It displays a global view of relationships among populations (Greenacre MJ, 1984; Greenacre & Blasius, 1994; Blasius & Greenacre, 1998). This type of analysis tends to give results similar to those of dendrograms, as expected from theory (Cavalli-Sforza & Piazza, 1975), but is more informative and accurate than dendrograms, especially when there is considerable genetic exchange between close geographic neighbors (Cavalli-Sforza et al. 1994). In their extensive effort to work out the genetic relationships among human populations, Cavalli-Sforza et al. concluded that two-dimensional scatter plots obtained by correspondence analysis frequently resemble geographic maps of the populations with some distortions (Cavalli-Sforza et al. 1994). Using the same allele frequencies that are used in phylogenetic tree construction, correspondence analysis can be performed in ViSta (v7.0), VST, Statistica and SAS, but most conveniently in the Multi Variate Statistical Package (MVSP). Link to a tutorial on correspondence analysis; StatSoft Textbook: Correspondence Analysis Chapter.

Coupling (cis-arrangement): The condition in which a double heterozygote has received two linked mutations from one parent and their wild-type alleles from the other parent, e.g., a b / + + (as opposed to a + / + b; see also repulsion).

CpG: A sequence of two consecutive nucleotides on the same strand of DNA is written with a "p" between them to distinguish it from a base pair: the "p" denotes the phosphodiester bond linking the two nucleotides, as opposed to the hydrogen bonds that hold a base pair together in the double helix. CpG is therefore: C-phosphodiester bond-G.

CpG island: Repetitive CpG doublets creating a region of DNA greater than 200 bp in length with a G+C content of more than 0.5 and an observed/expected CpG ratio of more than 0.6. Usually associated with transcription-initiation regions of (housekeeping) genes transcribed at low rates that do not contain a TATA box. The CpG-rich stretch of 20-50 nucleotides occurs within the first 100-200 bases upstream of the start site region (where promoter-proximal elements reside). A trans-acting transcription factor called SP1 recognizes the CpG islands (see also Htf islands). In vertebrates, many of the nontranscribed genes (and the genes on the inactivated X chromosome) have a 5-methyl group on the C residue in CpG dinucleotides in transcription-control regions. On the other hand, many genes with restricted expression patterns have (methylated) CpG islands located downstream of transcription initiation, where methylation does not block elongation of the transcript (see also Methylation paradox).
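
The three thresholds in this definition can be checked for a candidate sequence with a short Python sketch. The function names are hypothetical, and the observed/expected ratio is computed here as (number of CpG x length) / (number of C x number of G), which is a commonly used formulation assumed for illustration rather than one stated in this glossary.

```python
def cpg_stats(seq):
    """Return (G+C fraction, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")  # CpG dinucleotides on the given strand
    gc_fraction = (c + g) / n if n else 0.0
    # Assumed formula: observed CpG count divided by (C count x G count / length).
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp

def looks_like_cpg_island(seq):
    """Apply the thresholds: longer than 200 bp, G+C above 0.5, obs/exp above 0.6."""
    gc_fraction, obs_exp = cpg_stats(seq)
    return len(seq) > 200 and gc_fraction > 0.5 and obs_exp > 0.6

print(looks_like_cpg_island("CG" * 150))  # artificial 300 bp CpG-rich sequence
print(looks_like_cpg_island("AT" * 150))  # artificial 300 bp sequence with no CpG
```

In practice these criteria are usually evaluated in a sliding window along a genomic sequence rather than on a single pre-cut fragment.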

Crossing-over (recombination): The exchange of genetic material between non-sister chromatids of homologous chromosomes (i.e., between maternal and paternal chromosomes) during meiosis. This results in a new and unique combination of genes on the daughter chromosome which will be passed on to the offspring (if that particular gamete is involved in fertilization). For each pair of homologous chromosomes, there is at least one obligatory crossing-over at each meiosis, and there may be up to four. In double crossing-over, it is possible that the previous crossing-over may be reversed. Due to these features of crossing-over, the recombination frequency between any two genes is at most 50%. See a Demonstration of Crossing-Over.

Cryptic female choice: Besides precopulatory female sexual selection, there are also postcopulatory selection processes going on in the female reproductive system. This less appreciated mechanism is the basis for differential fertilization which includes sperm selection as opposed to pollen selection in plants. This should not be confused with sperm competition / pollen tube competition (link to a book by Tim Birkhead on Sperm Selection).

C value: The amount of DNA comprising the haploid genome for a given species (picograms per cell; 2-3 pg in mammals). The C value paradox is the lack of correlation between the C values of species and their evolutionary complexity. For example, some amphibians have 30 times as much DNA as humans but are not more complex than humans.

Cyanobacteria: Unicellular, photosynthetic (photo-autotroph) prokaryotes (in the Kingdom Monera). Formerly known as blue-green algae. They contain chlorophyll a but no chloroplasts. They reproduce by fission and never sexually.

Cytogenetics: The study of the structure, function, and abnormalities of human chromosomes (Basics, Cytogenetics Gallery).

De novo: Literally, 'from new' as opposed to inherited.

Degeneracy: A feature of the genetic code. More than one nucleotide triplet can code for the same amino acid. The same applies to the termination signal, which is encoded by three different stop codons. Only methionine and tryptophan carry unique trinucleotide sequences. See Genetic Code.
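
The degree of degeneracy can be counted directly from the standard genetic code. The following Python sketch builds the codon table from the conventional compact layout (bases in the order T, C, A, G, with * marking a stop codon); the variable names are hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in TTT, TTC, TTA, TTG, TCT, ... order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of codons that encode each amino acid (or stop signal).
degeneracy = Counter(CODON_TABLE.values())

for amino_acid, n_codons in sorted(degeneracy.items(), key=lambda item: item[1]):
    print(amino_acid, n_codons)
```

Methionine (M) and tryptophan (W) each have a single codon, the stop signal has three, and leucine, serine and arginine have six each.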

Denaturation: Reversible disruption of hydrogen bonds between nucleotides converting a double-stranded DNA molecule to single-stranded molecules. Heating or strong alkali treatment results in denaturation of DNA.

Dicentric chromosome: One chromosome having two centromeres.

Dikaryotic: A cell that contains two separate haploid nuclei (n+n) which is different from being haploid (n) or diploid (2n). Naturally seen in fungal heterokaryons. Dikaryosis is a significant genetic peculiarity of the fungi.

Diploblast: A lower invertebrate, such as a jellyfish, that is composed of two tissue layers (ectoderm and endoderm) and lacks the third layer (mesoderm) present in higher invertebrates and vertebrates.

Diploid number (2n): The full complement of chromosomes in a somatic cell (or a sex cell before meiosis). In humans, the diploid number is 46.

Diplotene (diplonema): The stage of prophase of meiosis I in which the homologous chromosomes that have undergone recombination begin to separate and the chiasmata become visible. In females, oocytes are frozen at this stage at birth. Only one proceeds to the completion of meiosis every month during reproductive years.

Disjunction: Separation of homologous chromosomes during anaphase of mitotic or meiotic divisions (see also nondisjunction).

Disposable soma theory: A theory on the evolution of ageing and death suggesting that organisms derive little benefit from investing resources in increasing their lifespan beyond a certain point. It originated from the economic phenomenon that manufacturers invest minimum in durability.

Disruptive selection: Selection against the middle range of variation causing an increase in the frequency of a trait showing the extreme ranges of its variation. Disruptive selection might cause one species to evolve into two.

Disulfide bond (-S-S-): A covalent linkage between two cysteine residues in different parts of a protein or between two different proteins. Insulin (a small protein having two polypeptide chains) and immunoglobulin molecules, for example, have interchain and intrachain disulfide bonds. Endothelin and HLA molecules also have disulfide bonds. The C282Y mutation removes one of the disulfide bonds in the HLA class I-like HFE protein and abolishes its surface expression.

Divergent evolution: A kind of evolutionary change that results in increasing morphological difference between initially more similar lineages.

DNA (deoxyribonucleic acid): The large double-stranded molecule carrying the genetic code. It consists of four bases (adenine, guanine, cytosine and thymine), phosphate and deoxyribose.

DNA binding motif: Common sites on different proteins which facilitate their binding to DNA. Examples are leucine zipper and zinc finger proteins. Any such protein is called a DNA-binding protein.

DNA Polymerase: A group of enzymes mainly involved in copying a single-stranded DNA molecule to make its complementary strand. Eukaryotic DNA polymerases participate in chromosomal replication, repair, crossing-over and mitochondrial replication. To initiate replication, DNA polymerases require a priming RNA molecule. They extend the DNA using deoxyribonucleotide triphosphates (dNTP) as substrates and releasing pyrophosphates. The dNMPs are added to the 3' OH end of the growing strand (thus, DNA replication proceeds from 5' to 3' end).

DNA repair: Restoration of the correct nucleotide sequence of a DNA molecule that has acquired a mutation or modification. It includes proofreading by DNA polymerase (see helicase). See DNA Repair: Dynamic Defenders against Cancer and Aging (WorldHealth.net) and Databases and Bioinformatics Tools for the Study of DNA Repair (Milanowska, 2011).

dn/ds ratio: In molecular phylogenetic studies, the ratio of the number of non-synonymous nucleotide substitutions to the number of synonymous nucleotide substitutions. In the case of functionally important (or otherwise constrained) genes, ds is expected to exceed dn (dn/ds <1), because most amino acid changes will disrupt protein structure and the non-synonymous substitutions (dn) causing them will not be maintained. In a non-functional pseudogene, there will be no discrimination between them and equal numbers of dn and ds are expected (dn/ds=1). When natural selection is acting to favor changes at the amino acid level, it is predicted that dn will exceed ds, hence a high dn/ds ratio. In classical MHC loci, in the peptide binding regions (allele-specific sequences), because of heterozygote advantage/frequency-dependent selection, there is always a high dn/ds ratio (>1), whereas in the remainder of the gene dn/ds <1 (due to functional constraints). This suggests balancing selection is acting on peptide binding regions.
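
The three cases described above can be summarized in a short Python sketch. The function names, the category labels and the tolerance used to decide whether a ratio counts as equal to 1 are all hypothetical choices made for illustration; real analyses rely on statistical tests rather than a fixed cutoff.

```python
def dn_ds_ratio(dn, ds):
    """Ratio of non-synonymous to synonymous substitutions."""
    if ds <= 0:
        raise ValueError("ds must be positive to form the ratio")
    return dn / ds

def interpret(ratio, tolerance=0.05):
    """Label a dn/ds ratio; the tolerance around 1 is an arbitrary illustrative value."""
    if abs(ratio - 1.0) <= tolerance:
        return "neutral"       # e.g. a non-functional pseudogene
    if ratio < 1.0:
        return "purifying"     # functional constraint removes amino acid changes
    return "diversifying"      # selection favors amino acid changes

print(interpret(dn_ds_ratio(0.02, 0.10)))  # hypothetical constrained region
print(interpret(dn_ds_ratio(0.30, 0.10)))  # hypothetical peptide binding region
```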

Domain: Region of a protein with a distinct tertiary structure and characteristic activity (for example, the membrane distal and membrane proximal domains of an MHC molecule).

Dominance: The property possessed by some alleles of determining the phenotype for any particular gene by masking the effects of the other allele (when heterozygous). Thus, homozygosity or heterozygosity for the dominant allele results in the same phenotype in complete dominance (if red is dominant over white, the petals of a flower heterozygous for red and white would be red). Incomplete dominance appears as a blend of the phenotypes corresponding to the two alleles (like pink petals as opposed to red or white). In co-dominance, both alleles equally contribute to the phenotype (red and white petals occur together).

Dominant allele: An allele that masks an alternative allele when both are present (in heterozygous form) in an organism. Most common autosomal dominant diseases are due to mutations in transcription factor genes (Jimenez-Sanchez, 2001). See also recessive.

Dominant-negative mutation: A (heterozygous) dominant mutation on one allele blocking the activity of wild-type protein still encoded by the normal allele (often by dimerising with it) causing a loss-of-function phenotype. The phenotype is indistinguishable from that of homozygous dominant mutation. P53 mutations may act as dominant-negative (see also haploinsufficiency).

Dosage compensation: The phenomenon in women, who have two copies of genes on the X chromosome, of having the same level of the products of those genes as males (who have a single X chromosome). This is due to the process of random inactivation of one of the X chromosomes in females (Lyonisation). See Lyon's hypothesis.

Dot plot: A visual representation of the similarities between two sequences. The two sequences are arranged along the axes of a simple graph and a dot is placed at every point where the two sequences are similar or identical. A diagonal stretch of dots will indicate regions where the two sequences are similar.
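
A minimal text-based sketch of a dot plot, assuming exact character identity as the only similarity criterion (practical tools usually compare windows of residues and apply a score threshold). The function name `dot_plot` is hypothetical.

```python
from typing import List


def dot_plot(seq_x: str, seq_y: str) -> List[str]:
    """Return one text row per residue of seq_y and one column per residue of seq_x.

    A '*' marks positions where the two residues are identical, '.' otherwise.
    """
    return ["".join("*" if a == b else "." for a in seq_x) for b in seq_y]


# Comparing a sequence with itself gives an unbroken main diagonal.
for row in dot_plot("GATTACA", "GATTACA"):
    print(row)
```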

Double heterozygote: An individual who is heterozygous at two loci under investigation.

Double stranded RNA (dsRNA): In eukaryotes, it is an accidental byproduct of transcriptional process. It may occur as the genome of certain viruses (such as reovirus) or may be produced during viral replication as a general marker for viral infection. It is believed that dsRNA is the toxic substance responsible for general symptoms of viral infection as it induces cytokine production. dsRNA is the major activator of the PKR enzyme which is the major agent of anti-viral innate immunity.

Downstream: The direction in which RNA polymerase moves during transcription (5' to 3') and ribosomes move during translation. By convention, the +1 position of a gene is the first transcribed nucleotide; nucleotides downstream from the +1 position are designated +2, +3, etc.

Drift evolution: A high rate of immunologically significant mutations in certain viruses. This results in drifting away from recognition by the immune system by antigenic change. Influenza virus, HIV and HCV constantly change their antigenic structure through drift evolution.

Drosophila melanogaster: Common fruit fly. It has contributed heavily to the study of genetics because it is easy and quick to breed. It has only four pairs of chromosomes. Link to Drosophila website.

Dynamic mutation: Changes in the DNA-repeat copy number of an STR locus. Such changes are responsible for diseases like fragile X syndrome, myotonic dystrophy and Huntington disease as well as genetic anticipation. See Richards & Sutherland, 1997.

Ecogenetics: The branch of genetics that studies how (inherited or acquired) genetic factors influence human susceptibility to environmental health risks. It studies the genetic basis of environmental toxicity to develop methods for the detection, prevention and control of environment-related disease. Ecogenetics interacts with ecology, molecular genetics, toxicology, public health medicine and environmental epidemiology.

Ecological genetics: The analysis of genetics of natural populations and of the adaptations of them to the environment.

Ecology: The study of the interrelationships among living organisms and their environment. Human ecology means the study of human groups as influenced by environmental factors, including social and behavioral ones.

Effective population size (N or Ne): Number of individuals contributing 'unique' chromosomes to the next generation (Nf = number of mothers in a population; relevant in the calculation of number of generations for the fixation of a mitochondrial allele). It is always less than or equal to the actual population size. Inbreeding effectively reduces Ne because of the identity (not unique) of most chromosomes in the population.

ELISA (enzyme-linked immunosorbent assay): An assay for quantifying the presence of an antigen by using an enzyme linked to an antibody to the antigen.

Embryo: A developing offspring during the period when most of its internal organs are forming. It is called a fetus in the next stage of development.

Endonuclease: A nuclease which cuts a nucleic acid molecule by cleaving the phosphodiester bonds between two internal residues. Best known examples are restriction endonucleases.

Enhancer: A cis-acting sequence (on either side of a gene) that enhances promoter function without having any promoter activity of its own. Enhancers are usually located 10 to 50 kb downstream or upstream of a gene, but may also be in the coding regions. They may be tissue-specific. The enhancer effect is mediated through sequence-specific DNA-binding proteins (see also silencer).

ENTREZ: The online search and retrieval system that integrates information from databases at NCBI. See Global ENTREZ search.

Epidemiology: The study of the distribution and determinants of health-related events aiming to trace causes of disease, and subsequently to control and prevent diseases. See Epidemiology Notes.

Epigenesis: The theory that the development of an embryo consists of the gradual production and organization of parts.

Epigenetics: The study of heritable changes in gene expression that occur without a change in DNA sequence. Epigenetic phenomena such as imprinting and paramutation violate Mendelian principles of heredity. Epigenetic studies link genotype to phenotype working out the chain of processes (mainly in developmental biology) (see Epigenetics: Special Issue of Science, 2001).

Epistasis: In population genetics, the nonreciprocal interaction between nonallelic genes affecting the selective values of genes (see Trojak, 1983). This may result in masking of one and in this case, the masked gene is said to be hypostatic. An epistatic-hypostatic relationship between two loci is similar to a dominant-recessive relationship between alleles at a particular locus. See a Commentary on Epistasis by JH Moore.

Epistatic interaction: In genetic epidemiology, an epistatic effect is the modification of the risk conferred by one marker by the presence of a marker from an unrelated gene (unlinked gene-gene interaction). For examples, see Kajiwara, 1994 (retinitis pigmentosa); Olson, 2002; Pastor, 2003; Robson, 2004 (Alzheimer Disease); and Martin, 2002 (KIR3DL in HIV-AIDS).

Epitope: The specific site on an antigen that is recognized by an antibody (also known as the antigenic determinant).

Escherichia coli: A gram-negative bacterium whose genome has been sequenced in its entirety. It is a model organism for the study of prokaryotes. Link to the E.coli genome project website.

EST (expressed sequence tag): A partial sequence of a cDNA molecule. See also STS.

Ethology: Study of animal behavior under normal conditions.

Eugenics: The idea of improving the quality of human species by selective breeding. Encouraging breeding of those with supposedly good genes is positive eugenics, whereas discouraging those with genes for undesirable traits is negative eugenics.

Eukaryotic cell: The DNA lies within a true nucleus (eu-karyon). May be unicellular (protist, some fungi) or multicellular (most fungi, plants, animals). Among eukaryotes, most fungi are haploid.

Excinuclease: The excision nuclease involved in nucleotide excision repair of DNA.

Exon: The coding sequence of a eukaryotic gene (see also open reading frame).

Exon - intron boundary: Introns end with the dinucleotide ApG [3' splice site / acceptor] and start with the dinucleotide GpT [5' splice site / donor].
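
The rule in this entry can be checked mechanically. The sketch below assumes the intron is supplied as a DNA string read 5' to 3'; the function name is hypothetical and only the canonical GT...AG pattern described above is tested.

```python
def has_canonical_splice_sites(intron: str) -> bool:
    """True if the intron starts with GT (5' donor) and ends with AG (3' acceptor)."""
    seq = intron.upper()
    # At least four bases are needed to hold both dinucleotides.
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")


print(has_canonical_splice_sites("GTAAGTCCCTTTCAG"))
print(has_canonical_splice_sites("ATAAGTCCCTTTCAC"))
```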

Exon shuffling: A hypothesis that suggests that new proteins arose in evolution by rearranging exons that encoded discrete structural elements.

Exonuclease: A nuclease that degrades a double-stranded DNA molecule by removing nucleotides from its two ends. Exonucleases can be specific for digestion from the 3' or 5' ends of the nucleic acid.

Expressivity: The range of phenotypes resulting from a given genotype (cystic fibrosis, for example, may have a variable degree of severity). This is different from pleiotropy which refers to a variety of different phenotypes resulting from the same genotype, or from penetrance.

Extended phenotype: All effects of a gene upon the world where the effects influence the survival chance of a gene [Richard Dawkins].

Extra-chromosomal inheritance: Non-Mendelian inheritance due to extra-nuclear DNA (mitochondrial DNA in animals). The transmission of the trait only occurs from mothers.

Evolution: The process that results in heritable changes in a population spread over many generations (change in allele frequencies over time). Biological evolution refers to populations and not to individuals, and the changes must be passed on to the next generations. Genes mutate, individuals are selected, and populations evolve. See Evolution-related Links.

Evolutionarily stable strategy (ESS): A strategy that, if adopted by most members of a population, yields greater reproductive fitness than any alternative strategy.

Ewens-Watterson neutrality test: Also called E-W homozygosity statistics. Described by Ewens (1972) and Watterson (1978). A widely used but not a statistically powerful test in population genetics to estimate the selection acting on a locus. It compares the sum of observed homozygosity for each allele of a given locus (Fo) with the expected Fe value based on the number of alleles in the locus of interest, neutrality expectations and random mating assumption. A test of comparison yields an F value. Values close to zero mean that the locus is evolving under neutrality (genetic drift only) and there is no selection. Values of F significantly different from zero suggest selection. When Fo > Fe, the locus is undergoing purifying selection, and when Fe > Fo, the locus is under balancing selection (very common for HLA loci) (see Nielsen, 2001, Luikart, 2003, Harris & Meyer, 2006 for reviews). Alternative tests for neutrality are Tajima's D (Tajima, 1989) and Slatkin's Exact Test for Neutrality (Slatkin, 1996; Slatkin & Muirhead, 2000). See Tripathy & Reddy, 2007 (Table 1) for a review of signatures of natural selection in the genome (in the context of G6PD deficiency) and also Basic Population Genetics.
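
As a partial illustration, the observed homozygosity Fo can be computed from allele counts as the sum of squared allele frequencies. This sketch covers only Fo; the expected value Fe under neutrality (which depends on the sample size and the number of alleles) is not derived here, and the function name is hypothetical.

```python
from typing import Sequence


def observed_homozygosity(allele_counts: Sequence[int]) -> float:
    """Fo = sum of squared allele frequencies at one locus."""
    total = sum(allele_counts)
    if total <= 0:
        raise ValueError("at least one allele copy is required")
    return sum((count / total) ** 2 for count in allele_counts)


# Many alleles at even frequencies give a low Fo; one common allele gives a high Fo.
print(observed_homozygosity([10, 10, 10, 10]))
print(observed_homozygosity([37, 1, 1, 1]))
```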

F1: First filial (son or daughter) hybrids arising from a first cross. Subsequent generations are denoted by F2, F3 etc. In animal studies of quantitative trait locus (QTL) mapping, two animals with extremes of the phenotype (like lowest and highest blood pressure) are mated to generate F1 and then F1 x F1 matings produce an F2 generation with a wide spectrum of the phenotype which are then used for mapping studies.

F Factor (Fertility Factor): Transmissible plasmid (episome) in bacteria (such as E.coli) that acts as a sex factor. It is a circular DNA about 94 kb long. Conjugation and chromosomal gene transfer occur from F+ (male) to F- (female) bacterium.

F' (F-prime) factor: Normally, the F factor contains genes related to conjugation/mating. The F' factor contains an additional portion of the bacterial genome.

F-statistics: A measure of genetic structure developed by Sewall Wright (1969, 1978) known as Wright's F-statistics. FST is the proportion of the total genetic variance contained in a subpopulation (the S subscript) relative to the total genetic variance (the T subscript). Values can range from 0 to 1. High FST implies a considerable degree of distance/differentiation among populations. FIS (inbreeding coefficient) is the proportion of the variance in the subpopulation contained in an individual. High FIS implies a high degree of inbreeding. Related measures are θ (theta; the coancestry coefficient) of Cockerham & Weir (1984; Weir, 2002) and GST of Nei (1973, 1977). See also Nielsen, 2001, Luikart, 2003, Harris & Meyer, 2006 for reviews and Basic Population Genetics.
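
One common way to compute FST, not spelled out in this entry, is through expected heterozygosities: FST = (HT - HS) / HT. The sketch below assumes a single bi-allelic locus and equally sized subpopulations; the function name and these simplifications are assumptions made for illustration only.

```python
from typing import Sequence


def fst_biallelic(subpop_freqs: Sequence[float]) -> float:
    """FST = (HT - HS) / HT for one bi-allelic locus.

    subpop_freqs holds the frequency of one allele in each subpopulation.
    Assumes equal subpopulation sizes (an unweighted mean is used).
    """
    k = len(subpop_freqs)
    if k == 0:
        raise ValueError("at least one subpopulation is required")
    p_bar = sum(subpop_freqs) / k
    h_t = 2 * p_bar * (1 - p_bar)  # expected heterozygosity of the pooled population
    if h_t == 0:
        return 0.0  # locus is monomorphic overall, no differentiation to measure
    h_s = sum(2 * p * (1 - p) for p in subpop_freqs) / k  # mean within-subpopulation value
    return (h_t - h_s) / h_t


print(fst_biallelic([0.5, 0.5]))  # identical subpopulations
print(fst_biallelic([0.0, 1.0]))  # fixed for different alleles
```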

F+ strain: E.coli strain behaving as donors during conjugation (male). It has the F factor.

F- strain: E.coli strain behaving as recipients during conjugation (female). It lacks the F factor.

Falconer's multifactorial liability threshold model: Originally described and modelled in an analysis of polydactyly in guinea pigs (Wright S, 1934) and applied to human genetics by Douglas Falconer (Falconer DS, 1965 & 1967; Fraser FC 1976 & 1980). Nicely explained in Falconer's polygenic threshold model for dichotomous nonmendelian characters in Human Molecular Genetics. See also a  Lecture Note by Dr R Tissot. For an example see a paper by Wanstrat & Wakeland. See also Introduction to Genetic Epidemiology.

FASTA format: A simple universal text format for DNA and protein sequences. See FASTA and other DNA sequence formats.
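
In a FASTA file each record starts with a header line beginning with ">" followed by one or more lines of sequence. A minimal reader might look like the sketch below; the function name is hypothetical, and established libraries handle edge cases (comments, duplicate names, very large files) that this sketch ignores.

```python
from typing import Dict, List


def parse_fasta(text: str) -> Dict[str, str]:
    """Map each header line (without the leading '>') to its joined sequence."""
    records: Dict[str, List[str]] = {}
    header = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:].strip()
            records[header] = []
        elif header is not None:
            records[header].append(line)  # sequence may span several lines
    return {name: "".join(parts) for name, parts in records.items()}


example = ">seq1 example DNA\nACGT\nTTGA\n>seq2\nMKV\n"
print(parse_fasta(example))
```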

Fertilization: Fusion of female and male haploid gametes to form a diploid zygote from which a new individual develops.

Fetus: Final development stage before birth (following embryo).

Fingerprinting: The use of RFLPs or repeat sequence DNA to establish a unique individual-specific pattern of DNA fragments.

F.I.S.H. (fluorescence in situ hybridization): One of the more modern methods in cytogenetics, which uses fluorescence-labelled chromosome-specific DNA probes to detect translocations, inversions, deletions, amplifications and other structural or numerical chromosomal abnormalities. FISH permits analysis of proliferating (metaphase cells) and non-proliferating (interphase nuclei) cells and is useful in determining the percentage of neoplastic cells before and after therapy (minimal residual disease) (see examples at Washington University FISH Gallery; a review by Mathew & Raimondi, 2003; for a review of the use of FISH in childhood leukemia, see Harrison CJ, 2001).

Fisher’s fundamental theorem: The rate of increase in fitness is equal to the additive genetic variance in fitness. This means that if there is a lot of variation in the population the value of S will be large.

Fisher's theorem of the sex ratio: In a population where individuals mate at random, the rarity of either sex will automatically set up selection pressure favoring production of the rarer sex. Once the rare sex is favored, the sex ratio gradually moves back toward equality.

Fitch-Margoliash method: Algorithm for building phylogenetic trees from genetic distance data without the assumption of equal evolutionary rate (see Basic Population Genetics).

Fitness: Lifetime reproductive success of an individual (i.e., the total number of offspring who themselves survive to reproduce). It can be seen as the extent to which an individual successfully passes on its genes to the next generation. It has two components: survival (viability) and reproductive success (fecundity). Variation in fitness is the major driving force in biological evolution (see also genetic fitness).

Fixed: The establishment of a single allelic variant at a locus as a result of random genetic drift.

Five-prime (5') end: The end of a DNA or RNA strand with a free 5' phosphate group corresponding to the transcription initiation (see also three-prime end).

Footprinting, DNase: DNA with protein bound is resistant to digestion by DNase. When a sequencing reaction is performed using such DNA, a protected area representing the footprint of the bound protein will be detected. This permits identification of the protein binding regions of the DNA. See also Gene Expression.

Founder effect (Sewall Wright effect): A type of genetic drift in which allele frequencies are altered in a small population, which is a nonrandom sample of a larger (main) population.

Frameshift mutations: Mutations, usually deletions or insertions, that change the reading frame of the codon triplets. See Description of Sequence Changes for frameshift mutations.

Fugu: The puffer fish, Fugu rubripes, has essentially the same number of genes as the human genome, but its genome is eight times more compact than human genome (about 400 Mb as opposed to 3 Gb). A project to sequence the whole Fugu genome is underway. Link to Fugu website and the latest release of the draft sequence.

Fu/HC: The fusion/histocompatibility system of the Ascidians. It is involved in self - nonself recognition and regulates the fusion between compatible organisms (De Tomaso, 2005). See also Botryllus and protochordates.

Fungus: A Kingdom made up of a diverse group of unicellular or multicellular, eukaryotic organisms which are not plants or animals. Many are parasitic or saprophytic. Both asexual and sexual reproductions are possible. The Kingdom includes five phyla: Zygomycetes (conjugating fungi, black bread molds), Deuteromycetes (reproduce only asexually, Aspergillus 'brown mold' and Penicillium), Basidiomycetes (incl. mushrooms), Ascomycetes (incl. Neurospora 'bread mold' and Saccharomyces 'baker's yeast') and Mycophycophyta (incl. lichens). Some of them (Basidiomycetes) have one of the most ancient pheromone-based mating-type recognition systems. See Fungi in Kimball's Biology Pages and Fungus in Tree of Life. See also dikaryosis, heterokaryon and mating types.

Gain-of-function mutation: A mutation that results in a protein with a new or enhanced function. An example is the conversion of proto-oncogenes to oncogenes by a gain-of-function mutation. See also loss-of-function mutation.

Galton's regression law: Individuals differing from the average character of the population produce offspring, which, on the average, differ to a lesser degree but in the same direction from the average as their parents.

Gamete: A haploid reproductive cell such as sperm (or pollen) and egg (oocyte).

Gametophyte: The haploid, gamete-forming (sexual) generation in plants with alternation of generations. Typically it is produced from a haploid spore. See also sporophyte.

Gametic association: see linkage disequilibrium.

Gatekeeper gene: A class of genes which directly regulate tumor growth by inhibiting growth or by promoting cell death. TP53 is the prime example. See Cancer Genetics.

GC box: A component of many eukaryotic promoters, especially those from constitutively expressed genes. The consensus sequence for the GC box is 5'-GGGCGG-3'. See also Gene Expression.

Gender: Differences between any two complementary organisms of the same species that render them capable of mating (see mating types). See also Sex Factors.

Gene: Physical and functional unit of heredity that carries information from one generation to the next, which is the entire DNA sequence necessary for the synthesis of a functional polypeptide or RNA molecule. In addition to the coding regions (exons), a gene may have non-coding intervening sequences (introns) and transcription-control regions. See NCBI ENTREZ GENE; Human Genome Statistics and HUGO Human Gene Names.

Gene conversion: Partial sequence transfer from one allele to another (interallelic recombination) converting one gene or allele to another one. It is the most common mechanism, especially for the HLA-B locus, in the generation of new MHC alleles. Less common are conversions between alleles of different MHC loci (intergenic conversion).

Gene expression: The process that converts a gene's coded information into the structures operating in the cell. Expressed genes include those that are transcribed and translated all the way to peptides, and those that are transcribed into RNA but not translated into protein (e.g., noncoding RNAs).

Gene flow: The movement of genes within a population or between two populations following genetic admixture. Gene flow creates new combinations of genes or alleles in individuals that can be tested against the environment. This way it is one of the sources of variation in the process of natural selection.

Genetic anticipation: The progressive shift of the age of onset of a hereditary disease to earlier ages in successive generations. It may occur because a parent is a mosaic, and the child has the full mutation in all cells. Triplet repeat expansion may demonstrate anticipation when the number of repeats increases with each generation.

Genetic determinism: The (incorrect) belief that genes alone form all characteristics of an individual organism. For most traits (even for some single-gene disorders), both genetic and environmental factors have a role in their determination.

Genetic distance: A measurement of genetic relatedness of populations. The estimate is based on the number of allelic substitutions per locus that have occurred during the separate evolution of two populations. Link to a lecture on Estimating Genetic Distance and GeneDist: Online Calculator of Genetic Distance. The software Arlequin v3.01, PHYLIP, GDA, PopGene, Populations and SGS are suitable to calculate population-to-population genetic distance from allele frequencies. GenAlEx can be used to calculate genetic distance on Excel.

Genetic distance estimation by PHYLIP: The most popular (and free) phylogenetics program PHYLIP can be used to estimate genetic distance between populations. Most components of PHYLIP can be run online. One component of the package GENDST estimates genetic distance from allele frequencies using one of the three methods: Nei's, Cavalli-Sforza's or Reynold's (see papers by Nei, 1983, Nei M, 1996 and a lecture note for more information on these methods). GENDST can be run online using the default options (Nei's genetic distance) to obtain genetic distance matrix data. The PHYLIP program CONTML estimates phylogenies from gene frequency data by maximum likelihood under a model in which all divergence is due to genetic drift in the absence of new mutations (Cavalli-Sforza's method) and draws a tree. The program comes as freeware as part of PHYLIP or this program can be run online with default options. If new mutations are contributing to allele frequency changes, Nei's method should be selected on GENDST to estimate genetic distances first. Then a tree can be obtained using one of the following components of PHYLIP: NEIGHBOR also draws a phylogenetic tree using the genetic distance matrix data (from GENDST). It uses either Nei's "Neighbor Joining Method," or the UPGMA (unweighted pair group method with arithmetic mean; average linkage clustering) method. Neighbor Joining is a distance matrix method producing an unrooted tree without the assumption of a clock (UPGMA does assume a clock). NEIGHBOR can be run online. Other components of PHYLIP that draw phylogenetic trees from genetic distance matrix data are FITCH / online (does not assume evolutionary clock) and KITSCH / online (assumes evolutionary clock).

Genetic drift: Evolutionary change over generations due to random events in small populations (not to be confused with sampling error due to a small sample size). It operates unless overcome by strong selective forces. Widely differing HLA allele frequencies among South Amerindian tribes are believed to be the result of genetic drift in each small tribe. Links to simulation-1 & simulation-2.

Genetic fitness: Classic genetic fitness is the average direct reproductive success of an individual possessing a specific genotype in comparison to others in the population. Inclusive fitness is described as the classic fitness plus the probability that an individual's genotype may be passed on through relatives.

Genetic heterogeneity: Presence of several different genotypes contributing to the genetic component of a disease. In clinical settings, genetic heterogeneity refers to the presence of a variety of genetic defects which cause the same disease, which may be the mutations at different positions on the same gene, a finding common to many human diseases (including congenital deafness, cystic fibrosis, lipoprotein lipase deficiency and polycystic kidney disease).

Genetic linkage: The situation referring to segregation of two or more genes together as a unit. Genetic linkage is thought to arise to accommodate genes that function best in each other's company, i.e., to provide a necessary cooperative effect that enhances survival. Genetic linkage reflects a lack of meiotic crossovers between two genes (see exercises on Gametes under Linkage and Linkage Pedigrees).

Genetic load: The average number of lethal equivalents (or any recessive mutant lowering fitness) per individual in a population which are propagated by heterozygotes in a masked state.

Genetic relatedness (r): A quantitative measure of genetic relatedness between individuals. In diploid species, r=1/2 between full siblings, or parent and child; r=1/4 for half siblings or aunt/uncle versus niece/nephew, or for grandparents versus grandchildren; r=1/8 for first cousins; r=0 for non-relatives (see also coefficient of relatedness).
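
The values listed in this entry can be reproduced by path counting: each independent pedigree path of L meioses connecting two individuals contributes (1/2)^L, and the contributions are summed. This is a sketch under the assumption of a non-inbred pedigree; the function name and the way paths are passed in are hypothetical.

```python
from typing import Iterable


def relatedness(path_lengths: Iterable[int]) -> float:
    """Sum (1/2)**L over the pedigree paths linking two individuals.

    Each entry of path_lengths is the number of meioses along one path.
    Assumes no inbreeding among the common ancestors.
    """
    return sum(0.5 ** length for length in path_lengths)


print(relatedness([1]))     # parent and child: one path of one meiosis
print(relatedness([2, 2]))  # full siblings: one path through each parent
print(relatedness([2]))     # half siblings: one shared parent
print(relatedness([4, 4]))  # first cousins: paths through two shared grandparents
```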

Genetic variance: The phenotypic variance in a population that is due to genetic heterogeneity.

Genetics: Study of variation and heredity and their physical basis in DNA.

Genocopy: A gene/genotype causing the same phenotype as another gene/genotype. Genocopies are the basis of genetic heterogeneity and important in genetic diagnosis and counseling.

Genome: Total genetic material in a set of haploid chromosomes as in a germ cell. The human genome contains 3,000 Mbp whereas the E.coli genome has 4.6 Mbp (see also C value). Link to the Genome Catalogue and List of OMEs.

Genome-wide association study (GWAS): Simultaneous investigation of up to five million genetic variants covering the whole genome in complex genetic diseases. See NIH guide to GWAS; a GWAS Presentation by G McVean; the WTCCC GWAS (PDF) and NHGRI GWAS Catalog. For GWAS-related software, see Introduction to Genetic Epidemiology; for GWAS-related bioinformatics tools, see Bioinformatics Tools.

Genomic imprinting: Differing expression of genetic material dependent on the parent-of-origin. This is due to methylation of one of the alleles depending on its origin. A very illustrative example is the inherited neck tumor paraganglioma for which the predisposition gene is only active if inherited from the father. Genomic imprinting must be considered in disorders that appear to have skipped a generation. For more, see Genomic Imprinting website.

Genomic instability: One of the first phenomena in the formation of malignancies. It is due to defects in DNA repair and cell cycle controls. This can happen by gain-of-function mutations in proto-oncogenes or loss-of-function mutations in tumor suppressor genes.

Genotype: The diploid genetic formula at one or more loci.

Genotype relative risk: The risk of disease for one genotype at a locus versus another genotype (referent) at the same locus.

Genotype-environment (GxE) interaction: This term refers both to the modification of genetic risk factors by environmental risk and protective factors and to the role of specific genetic risk factors in determining individual differences in vulnerability to environmental risk factors. For a review, see Heath & Nelson, 2002.

Geological timescale: The period between the origin of earth (4,500 Mya) and the beginning of the Cambrian period (540 Mya) is called the Precambrian Eon. The last 540 million years (Phanerozoic Eon) are divided into three eras: Palaeozoic (540-245 Mya); Mesozoic (245-65 Mya); Cainozoic. The geological periods (included in an era, longer than an epoch) are as follows: Vendian (immediately before the Cambrian; 610-540 Mya); Cambrian (540-510 Mya); Ordovician; Silurian; Devonian; Carboniferous; Permian; Triassic / Jurassic / Cretaceous (altogether the Mesozoic Era); Tertiary (65-1.64 Mya) and Quaternary. An epoch is a subdivision of a period.

Germ line: Genetic material transmitted from one generation to the next through the gametes. A germ line mutation exists in all cells of the offspring formed from that gamete.

Germinal mosaicism: A mixture of gonadal cells with different chromosome numbers or other chromosomal abnormalities. It can lead to aneuploid offspring from phenotypically normal parents with an unpredictable recurrence risk.

Gonadal (germline) mosaicism: If a de novo mutation selectively affects the cells destined to become gonadal cells during early embryogenesis, the affected individual will be phenotypically normal, and the somatic cells will be free of the mutation, but all or some gonadal/germ cells will carry the mutation and can transmit it to the next generation. The end result is transmission of a genetic disorder by a healthy person, causing a sporadic form of a genetic disease in the offspring and a risk higher than that of the general population in subsequent siblings.

Germline mutation: If a parent is positive for a mutation, all their cells including germ cells will be positive for it, and the offspring will have the same germline mutation in all their cells. Alternatively, germline mosaicism in a parent may also result in a germline mutation transmission to the offspring. Germline mutations are inherited and transmitted to the following generations. If they are disease-causing mutations, the disease will show familial aggregation and an inheritance pattern. See also somatic mutation and Cancer Genetics.

GTP (guanosine 5'-triphosphate): A nucleotide that is a precursor in RNA synthesis, which plays a role in protein synthesis (as well as in signal transduction and microtubule assembly). See also cap.

GWAX: GWAS by proxy. A GWAX is a GWAS in which the phenotype is inferred on the basis of parental phenotype. UK Biobank has published a lot of GWAX studies. See an example by de la Fuente, 2022.

Gyrase: One of the bacterial DNA topoisomerases that functions during DNA replication to reduce molecular tension caused by supercoiling (supertwisting). DNA gyrase produces, then seals, double-stranded breaks.

H-2 complex: The major histocompatibility complex (MHC) of the mouse. It is the first MHC discovered in 1937 by Peter Gorer.

Hairpin loop: A loop of nucleic acid formed by duplex formation within a single strand (also called stem loop). If a hairpin forms in a PCR primer, the primer will not function.

Haploid number (n): The number of chromosomes in the gamete after meiosis. In humans, the haploid number is 23.

Hamilton’s rule (theory of kin selection): In an altruistic act, if the donor sustains cost C, and the receiver gains a benefit B as a result of the altruism, then an allele that promotes an altruistic act in the donor will spread in the population if B/C >1/r or rB-C>0 (where r is the coefficient of relatedness).
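
The inequality in this entry translates directly into code. In the sketch below the function and parameter names are hypothetical; the condition tested is rB - C > 0 exactly as stated above.

```python
def altruism_spreads(r: float, benefit: float, cost: float) -> bool:
    """Hamilton's rule: the altruistic allele spreads if r*B - C > 0."""
    return r * benefit - cost > 0


# Helping a full sibling (r = 1/2) at cost 1 pays off only if the benefit exceeds 2.
print(altruism_spreads(0.5, 3.0, 1.0))
# The same act directed at a first cousin (r = 1/8) does not satisfy the rule.
print(altruism_spreads(0.125, 3.0, 1.0))
```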

Haploinsufficiency: Situation where one normal copy of a gene alone is not sufficient to maintain normal function. It is observed as a dominant mutation on one allele (or deletion of it) resulting in loss-of-function in a diploid cell because of the insufficient amount of the wild-type protein encoded by the normal allele on the other haplotype (see also dominant negative). For examples of haploinsufficiency, see Ogata, 2001 (SHOX; short stature homeobox containing gene) and Kurotaki, 2002 (in Sotos syndrome). See also Clinical Genetics.

Haplotype: The particular combination of alleles in a linked group encoded by genes in close vicinity on the same chromosome.

Hardy-Weinberg equilibrium (HWE): In an infinitely large population, gene and genotype frequencies remain stable as long as there is no selection, mutation, or migration. For a bi-allelic locus where the gene frequencies are p and q: p² + 2pq + q² = 1. HWE should be assessed in controls in a case-control study and any deviation from HWE should alert for genotyping errors (Gomes, 1999; Lewis, 2002) but see also Zou & Donner, 2006. Relying only on the HWE test to detect genotyping errors is not recommended as this is a low power test (Leal, 2005). (HWE in EvoTutor; HWE Tutorial in Life, 7th Ed; Online HWE Analysis (OEGE); HWE and Association Testing for SNPs in Case-Control Studies; Excel-based HWE Test (by Michael Court); Basic Population Genetics Notes).
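
As an illustration of how HWE is typically assessed for a bi-allelic locus, the sketch below estimates p from the observed genotype counts, derives the expected counts, and computes a Pearson chi-square statistic (1 degree of freedom). The function name is hypothetical, and the chi-square test is only one of several options; exact tests are often preferred for small samples.

```python
from typing import Tuple


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> Tuple[float, Tuple[float, float, float], float]:
    """Return (p, expected genotype counts, chi-square) for one bi-allelic locus."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A from genotype counts
    q = 1 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    # Skip classes with zero expectation to avoid division by zero.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return p, expected, chi2


p, expected, chi2 = hwe_chi_square(25, 50, 25)
print(p, expected, chi2)
```

A statistic above about 3.84 corresponds to P < 0.05 with 1 degree of freedom.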

Hayflick limit: The number of times a mammalian cell can divide is limited, and this number is called the Hayflick limit (or Hayflick number), as described by Leonard Hayflick (Hayflick L, 1965). The limit is determined by the shortening of telomeres at each cell division. This limit does not apply to cancer cells because they possess telomerase, which maintains telomere length and renders cancer cells immortal.

Heat shock response: The heat shock response is a ubiquitous and highly conserved defense mechanism that protects cells from harmful conditions such as heat shock, UV irradiation, toxic chemicals, infection, transformation and the appearance of mutant and misfolded proteins. Heat Shock Proteins (HSPs) also function as accessory molecules in antigen presentation. HSP70 genes are within the MHC in most vertebrates. High levels of HSP70 prevent stress-induced apoptosis, and may have a transforming potential.

Helicase: An enzyme that unwinds the double DNA helix near the replication fork before DNA polymerase acts on it. Replication fork moves from 3' to 5' of the leading strand. Unwinding is also necessary for DNA repair. Mutations in the helicase genes on chromosome 2q and 19q are one group of causes of the DNA repair defect xeroderma pigmentosum (an autosomal recessive disease). See also primosome.

Hemizygous: As in any X-linked trait in males, absence of a homologous counterpart for an allele. It may also result from deletion. Males are hemizygous for mutations on X chromosome.

Heritability (narrow sense): The proportion of the total phenotypic variance that is attributable to additive genetic variance (h² = additive genetic variance / total phenotypic variance). A high h² does not mean that the trait cannot be influenced by environment. In a different environment, h² may not be that high.
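
The ratio can be illustrated with a small Python sketch. It assumes the usual decomposition of total phenotypic variance into additive genetic, non-additive genetic and environmental components; the function name and the variance values are hypothetical.

```python
def narrow_sense_heritability(v_additive: float,
                              v_nonadditive: float,
                              v_environment: float) -> float:
    """h2 = additive genetic variance / total phenotypic variance.

    Assumes total phenotypic variance is the sum of the three components
    (no genotype-environment covariance or interaction).
    """
    v_total = v_additive + v_nonadditive + v_environment
    return v_additive / v_total


# Same genetic variances, two environments with different environmental variance.
print(narrow_sense_heritability(4.0, 1.0, 3.0))   # 0.5
print(narrow_sense_heritability(4.0, 1.0, 11.0))  # 0.25
```

The second call shows why h² is specific to the environment in which it is measured: the genetic variances are unchanged, but the larger environmental variance lowers the ratio.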

Hermaphroditism: Having both male and female sexual organs in one individual. Most invertebrates and plants are hermaphrodites. Union of the gametes of the same individual (self-fertilization) is the most extreme example of inbreeding.

Heterogametic sex: The sex that has two different sex chromosomes (XY). Human and Drosophila males are the heterogametic sex, whereas in birds, moths, some fish and amphibians, females are the heterogametic sex (ZW).

Heterokaryon: A cell containing more than one genetically different nucleus. Naturally occurs in fungi as long as their fungal (heterokaryon) incompatibility types are identical (see also dikaryotic).

Heterogeneous nuclear RNA (hnRNA): RNA products immediately synthesized from the DNA template in the nucleus (sometimes called DNA-like RNA or dRNA). This RNA species has a short half-life, is very heterogeneous and very large (molecular weight in excess of 10⁷). hnRNA molecules are processed to generate the mRNA molecules (molecular weight generally less than 2 x 10⁶) before leaving the nucleus.

Heteroplasmy: The presence of more than one type of mitochondrial DNA in a cell (wild-type and mutant).

Heterothallic: Organisms (fungi, algae, plants) that can only undergo sexual reproduction with another bearing a different mating/compatibility type (self-incompatible). See also homothallic.

Heterozygosity: Presence of two different alleles at a locus in a diploid organism (see homozygosity). It is the result of inheritance of different alleles from parents. For relevance of heterozygosity in disease states, see Beckman, 1990; Vockley, 2000; Vladutiu, 2001. Rarely, only the heterozygous genotype, and neither of the homozygous genotypes, causes a disease. For a review, see van Heyningen, 2004.

Heterozygote advantage: Also called overdominance (a form of balancing selection). For an example and a list of all known examples, see Gemmell & Slate, 2006 and Supplemental Table 1. Genome-wide heterozygosity has been reported to confer advantage for common diseases (Campbell, 2007).

Heuristics: A term in computer science that refers to guesses made by a program to obtain approximately accurate results. Frequently used in phylogenetics and computational biology. See explanation.

Hfr: A male bacterial cell that has the F factor integrated into its chromosome is an Hfr (high frequency of recombination) cell. Crosses between Hfr cells and F- females produce far more recombinant progeny than do crosses between F+ males and F- females.

High-throughput typing: Simultaneous genotyping of large numbers of samples. Most machines can run 4x96 (384) samples simultaneously (SNP typing, real-time PCR, sequencing) with a queuing system that allows automatic continuation of the typing. The ultimate high-throughput genotyping method is GWAS microarray chips, which can genotype up to five million variants using a small amount of DNA.

Histones: Highly conserved basic proteins that are involved in the packing of DNA. They have a high arginine/lysine content. Histone proteins and the nucleosomes they form with DNA are the fundamental building blocks of eukaryotic chromatin. They bind to the phosphate groups of DNA by their amino termini. There are five major types of histone proteins. Two copies each of H2A, H2B, H3 and H4 bind to about 200 base pairs of DNA to form the repeating structure of chromatin (nucleosome), with H1 binding to the linker sequence. Most histone mRNAs lack a poly-A tail. Possible post-translational modifications of histone molecules include acetylation and deacetylation of lysine, methylation of lysine and arginine, ubiquitination and phosphorylation. While histone acetylation and possibly phosphorylation correlate with gene activity, histone methylation seems to have diverse functions: methylation of lysine 4 of the N-terminal tail of histone H3 (H3-K4) is associated with gene activity but methylation of lysine 9 (H3-K9) is associated with gene silencing (as is deacetylation). See Hu & Hoffman, 2001 and Rando, 2007 for reviews.

Histone Code Hypothesis: A hypothesis proposing that specific constellations of modified histone residues regulate unique biological outcomes through specific interactions with other components of chromatin (Strahl & Allis, 2000; Jenuwein & Allis, 2001).

Histone deacetylase: An enzyme that contributes to transcriptional repression by deacetylation of acetylated lysine residues in histones.

HLA complex: The human major histocompatibility complex (MHC). An HLA haplotype was completely sequenced in 1999 (for more information, see MHC).

Holandric gene: A gene carried on the Y chromosome and therefore transmitted from father to son.

Homeobox: Conserved protein sequence, which forms a DNA-binding domain in a class of transcription factors.

Hominid: A member of the Hominidae family.

Homologous chromosomes: Chromosomes that occur in pairs, one having come from the male parent and the other from the female parent. The pair participates in crossing-over during meiosis. Homologous chromosomes contain the same array of genes but may contain different alleles at those loci.

Homology: A similarity due to inheritance from a common ancestor (see also analogy). An example is mammals' back legs. Homology may be due to orthology (between species) or paralogy (within a species).

Homothallic: Organisms (fungi, algae, plants) that can undergo sexual reproduction with a similar strain including the self (self-compatible) (see also heterothallic).

Homozygosity: Presence of two identical alleles at a locus in a diploid organism (see heterozygosity). It is the result of inheritance of identical alleles from both parents.

Homozygosity mapping: Recessive diseases require two copies of an allele for expression. Because of linkage disequilibrium, loci surrounding the disease locus will tend to be homozygous in affected individuals. Searching for homozygous segments in affected individuals helps to locate the disease gene. This is called homozygosity mapping (Lander & Botstein, 1987).

House-keeping genes: Genes which are constitutively expressed in most cells because they provide basic functions.

Htf island: A Hpa Tiny Fragment island is an unmethylated CpG-rich region in the genome. Eighty percent of these occur at or near genes, particularly housekeeping genes.

Hybrid: The offspring of two distinct species.

Hybridization: The specific reassociation of complementary strands of nucleic acids.

Hybrid vigor (heterosis): Unusual growth, strength, and health of heterozygous offspring from two less vigorous homozygous parents.

Hypothesis: An unproven but testable scientific proposition. A theory is a statement with some confirmation.

Identity by descent (IBD): Alleles that trace back to a shared ancestor. In family studies, IBD refers to inheritance of the same allele from a given parent.

Imprinting: See genomic imprinting.

Inbreeding: Production of offspring by (blood) related parents. Its most extreme form is self-fertilization in hermaphrodites (most invertebrates and plants).

Inbreeding depression: Reduction in offspring fitness resulting from mating between blood relatives.

Incest: Sexual relationships between parents and children, or between brothers and sisters.

Incomplete dominance: One allele is not expressed, but the other allele expresses itself normally so that the phenotype gets half the dose of the effect.

Indel polymorphism: Insertion/deletion polymorphism. See Description of Sequence Changes for indel polymorphisms.

Initiation complex: A multi-protein complex that forms at the site of transcription initiation and is composed of RNA polymerase II, ubiquitous or general transcription or initiation factors (TFII or IF/eIF), promoter elements and gene-specific enhancers/silencers. This complex initiates the RNA synthesis.

Innate immunity: Pre-existing and non-specific defense immunity with a very low memory component if any. As the primitive immune response against bacteria, it is present in invertebrates and vertebrates.

Inosine (I): A modified nucleotide that occurs in tRNA (anticodon) and can pair with A, U(T) or C in the codon.

Insulator: A genomic element located between an enhancer and a gene promoter which prevents the enhancer from acting on a neighboring gene. Insulator activity is characterized by CTCF binding (but paradoxically, CTCF binding may also mediate gene activation through promoting enhancer–promoter looping). See Phillips & Corces, 2009.

Integrase: An enzyme that catalyzes a site-specific recombination (integration or excision) involving a prophage and a bacterial chromosome.

Intron: A non-coding section of DNA within a gene that is not translated to a peptide. Intervening sequences between exons. Introns are featured in the primary transcript (pre-mRNA) but removed by splicing during nuclear RNA processing/editing.

Invertebrate: All animals other than those in the phylum Chordata; lower metazoans. They do not possess a notochord or vertebral column. Examples are worms, corals, sponges, etc. The protochordates are sometimes called higher invertebrates.

in vitro: Literally, 'in glass' meaning in the laboratory.

in vivo: Literally, 'in the living organism'.

Isochromosome: Abnormal chromosome composed of two identical arms due to division transversely through the centromere. Most commonly occurs in chromosomes 13-15, 21-22 and X.

Isolate breaking: As opposed to Wahlund effect, this phenomenon occurs when two isolated populations start interbreeding, which causes a temporary excess of heterozygotes if the two populations have different allele frequencies.

Iterative evolution: Repeated origination of lineages with generally similar morphology at different times in the history of a clade.

Karyotype: A photomicrograph of metaphase chromosomes arranged in standard order. Normal human karyotype consists of 46 chromosomes, of which 44 are somatic (autosomal) and 2 are sex chromosomes.

Kinase: An enzyme that transfers a phosphate group from a donor molecule such as ATP or ADP, to a serine, threonine, or tyrosine residue on an acceptor protein.

Kingdom: The major taxonomic group in the current classification of living organisms with the exception of informal division of prokaryotic and eukaryotic empires. The five Kingdoms are Monera, Protoctista, Fungi, Plants and Animals. In the late 1980s Cavalier-Smith proposed that within the Eukaryota there are six kingdoms: Archezoa, Protozoa, Chromista, Plants, Fungi, and Animals (see also taxonomy). Link to Kingdoms of Life.

Kozak sequence: In eukaryotic (and some viral) mRNAs, the consensus sequence surrounding the initiating AUG: 5' ACCAUGG 3'. It facilitates ribosomal binding and, therefore, protein synthesis. The most conserved position is located three nucleotides before the initiation codon (AUG) and is almost always an adenine nucleotide (see also Shine-Dalgarno sequence).

lac operon: A structural unit in the E.coli genome that consists of three structural genes (encoding different enzymes involved in sugar metabolism) transcribed together and their common promoter and operator genes. Provides a good model for studying the interactions between promoters and repressors.

Last male sperm precedence: A situation that results in fertilization of the ovum by the sperm of the last male in a multiply inseminated female. This is due to sperm incapacitation by the semen and sperm displacement. This well-documented form of sperm competition is best known in Drosophila.

Latency (viral): The state of viral infection in which the virus exists in host cells without reproducing itself. This is slightly different from viral persistence when basal replication continues.

Leader sequence: A sequence at the 5' end of the DNA and mRNA that leads the newly synthesized mRNA to the ribosome (it is not translated). The term is also used to mean the signal sequence at the N-terminus of a protein, which is translated but is subject to post-translational cleavage when the final destination is reached or following secretion.

Ligase: An enzyme which is of vital importance in recombinant DNA technology. It joins nucleotides together by a phosphodiester bond between the 5'-P end of a polynucleotide chain and the 3'-OH end of another one.

LINES (long interspersed elements): One of the abundant intermediate (6 to 7 kb) repeat DNA sequences in mammals (see also SINES).

Linkage: The tendency of 'genes' on the same chromosome to segregate together. This means that linked genes are transmitted to the same gamete more than 50% of the time. Genetic linkage reflects a lack of meiotic crossovers between two genes, one of which is usually a latent/unknown disease locus. Several software packages are available to analyze linkage in pedigree data; the most commonly used are Linkage, Genehunter and Allegro (Genetic Analysis Software List and A Survey of Current (2003) Software for Linkage Analysis by F Dudbridge). See exercises on Gametes under Linkage and Linkage Pedigrees. For a general review, see genetic linkage in Kimball’s Biology and Tutorial by F Clerget-Darpoux. See also Introduction to Genetic Epidemiology.

Linkage disequilibrium (LD): The tendency for two 'alleles' to be present on the same chromosome (positive LD), or not to segregate together (negative LD). As a result, specific alleles at two different loci are found together more or less often than expected by chance. The same situation may exist for more than two alleles. Its magnitude is expressed as the delta (D) value and corresponds to the difference between the expected and the observed haplotype frequency. It can have positive or negative values. LD is decreased by recombination. Thus, it decreases with every generation of random mating unless some process opposes the approach to linkage equilibrium. Permanent LD may result from natural selection if some gametic combinations result in higher fitness than other combinations. See also Basic Population Genetics.
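
The D value can be computed directly from haplotype and allele frequencies. The Python sketch below uses hypothetical function names and frequencies. The decay formula D(t) = D(0) x (1 - c)^t, where c is the recombination fraction, is the standard textbook result for random mating and is an addition here, not something stated in the entry.

```python
def ld_d(p_ab: float, p_a: float, p_b: float) -> float:
    """D = observed haplotype frequency minus the frequency expected
    if the alleles at the two loci were independent (p_a * p_b)."""
    return p_ab - p_a * p_b


def ld_after(d0: float, recombination_fraction: float, generations: int) -> float:
    """Expected D after a number of generations of random mating,
    assuming no selection, drift or admixture maintains LD."""
    return d0 * (1 - recombination_fraction) ** generations


d = ld_d(p_ab=0.40, p_a=0.5, p_b=0.5)   # observed 0.40 vs expected 0.25
print(d)                                 # about 0.15 (positive LD)
print(ld_after(d, 0.5, 1))               # unlinked loci: D halves each generation
print(ld_after(d, 0.01, 10))             # tightly linked loci: D decays slowly
```

A negative D would indicate that the two alleles occur together less often than expected.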

Living fossil: An extant species which is morphologically very similar to a species from the ancient past. Despite apparent lack of change, they seem to have escaped extinction. Coelacanth (a 350-million-year-old lobe-finned fish), Horseshoe Crab (a 510-million-year-old marine arthropod), Amazon River Dolphin, Ginkgo (maidenhair tree, a gymnosperm), and Metasequoia (Metasequoia glyptostroboides, a conifer) are examples.

Locus: The position on a chromosome occupied by a particular gene (plural: loci). For information and official nomenclature of human genetic loci, see Entrez Gene, UniGene, GenAtlas; for mouse genetics nomenclature, see the guideline and tutorial at the Jackson Laboratory.

Locus heterogeneity: Involvement of different genes in the pathogenesis of a genetic disease (same as genetic heterogeneity). The phenotype is usually clinically indistinguishable. For example, congenital adrenal hyperplasia may be caused by mutations in CYP21A2 or CYP11B1. See also Clinical Genetics.

Lod score: Logarithm (base 10) of the odds favoring linkage obtained from the statistical analysis of linkage. A lod score (Z) of +3 means 1000:1 odds of linkage and is considered evidence for linkage. A lod score of -2 means odds of 100:1 against linkage (does not apply to sex-linked diseases).
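
Because the lod score is a base-10 logarithm, converting between a score and the corresponding odds is simple arithmetic. A minimal Python sketch, with hypothetical function names:

```python
import math


def lod_to_odds(z: float) -> float:
    """Odds in favor of linkage for a lod score Z (odds = 10 ** Z)."""
    return 10.0 ** z


def odds_to_lod(odds: float) -> float:
    """Lod score for given odds in favor of linkage (Z = log10(odds))."""
    return math.log10(odds)


print(lod_to_odds(3))     # 1000.0, i.e. 1000:1 in favor of linkage
print(lod_to_odds(-2))    # about 0.01, i.e. 100:1 against linkage
print(odds_to_lod(1000))  # lod score of 3
```

Working on the log scale also means that lod scores from independent families can be summed, which corresponds to multiplying their odds.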

Loss-of-function mutation: A mutation that results in the loss of production and/or function of a protein. For total loss of function homozygosity would be necessary unless haploinsufficiency is operating. An example is the loss of function mutations in tumor suppressor genes. See also gain-of-function mutations.

Loss-of-heterozygosity (LOH): Refers to the disappearance of polymorphic marker alleles when constitutional DNA and tumor DNA from cancer patients are compared. The consequence is usually genomic deletion discarding the normal copies of tumor suppressor genes. Such deletion (or functional deletion through methylation) may uncover existing mutations in the homologue copy.

Lyon hypothesis: The proposition by Mary F Lyon that random inactivation of one X chromosome in the somatic cells of mammalian females is responsible for dosage compensation and mosaicism. When randomization is skewed, an X-linked recessive disease may be seen in a female (see OMIM 300087). Also, discordance in female twins for an X-linked trait may be due to atypical Lyonisation. When a part of the X chromosome is translocated to an autosome, it escapes inactivation. Not all genes on the X chromosome are inactivated; some escape inactivation. These are the genes for which a sex difference in gene dosage is not a problem and is tolerated. See Evolution of Sex Chromosomes / Figure 14.12 in Human Molecular Genetics. See also XIST.

Major histocompatibility complex (MHC): A genetic complex of vertebrates consisting of around 400 genes, including the extremely polymorphic cell surface molecules called HLA in humans and H-2 in mice. These molecules provide an immunological marker for selfness and a genetic self-identity to the individual. This information is used in mate choice, union of gametes, maintenance of pregnancy, and immune response against nonself (including a transplanted graft). These molecules are the most polymorphic expressed proteins in vertebrates. The polymorphism arises from point mutations, which do not occur at an unusually higher rate than in other genes, and mainly from interallelic gene conversion events. The polymorphism is maintained through pathogen and non-pathogen driven mechanisms via heterozygote advantage (overdominant selection) and negative frequency dependent selection (perhaps also by other types of balancing selection like sexual antagonism and antagonistic pleiotropy). A 3.6 Mb long human MHC haplotype and the 92 kb chicken MHC have been completely sequenced (see Nature 1999(Oct 28);401:921-925). See also MHC Haplotype Project; Gene Map of MHC; NCBI dbMHC.

Male disadvantage: The phenomenon that males are biologically inferior to females. This is evident by the huge male loss prenatally (see primary sex ratio), greater vulnerability of males to diseases in childhood (including infections and cancer) and adulthood (Dorak & Karpuzoglu, 2012; Markle, 2014); and subsequently decreased life span in males (Shettles, 1958; Kraemer, 2000; Stevenson, 2000; Vatten, 2004). See also Gender Effect.

Mammals: One of the eight Classes in the Phylum Chordata which contains approximately 4500 species in 15 Orders. In mammals, the fertilization of the egg is internal, the young develops within the body of the mother, and is fed by milk produced by the mammary glands. The mammals are warm-blooded and the body is covered with hair. In mammals, the male is the heterogametic sex (XY) and thus male-to-male competition is the predominant form of sexual selection.

Manifesting heterozygotes: A heterozygote for a recessive autosomal gene mutation or a female heterozygote for a recessive sex-linked gene mutation who may even have the same phenotype as homozygotes. Manifesting heterozygotes usually have a milder form of the phenotype and may only have biochemical signs without clinical phenotype. This situation is an exception rather than a rule but occurs in a proportion of heterozygotes for most major autosomal recessive disease genes: CYP21A2 (Witchel, 1997), HFE (Bulaj, 1996; Burt, 1998), CFTR (Super, 1999), ATM (Fearon, 1997; Scott, 2002) and McArdle disease (Manfredi, 1993) are among the examples. See Medline, OMIM and Google searches for manifesting heterozygotes; see also Clinical Genetics.

Mapping: Determining the physical location of a gene or a genetic marker on a chromosome. Used to be achieved by linkage and association studies besides other methods but genome projects have mapped more or less all genes in respective genomes.

Maternal effect: A component of environmental variance in quantitative genetic studies. It is a combination of prenatal and postnatal, mainly nutritional, influences on the young. An example is that large mice give more milk to their offspring and they become larger than others. This is not a genetic but an environmental effect. Maternal effect also causes resemblance among the offspring of the same mother.

Maternal effect lethals: One form of selfish/parasitic DNA that facilitates its own propagation. They are post-zygotic distorters that kill progeny lacking the factor. Medea in beetles and Scat in mice are the known examples. Progeny of the heterozygous mothers that are homozygous for wild-type are killed. Progeny carrying a copy of the lethal are actually protected.

Maternal inheritance: Diseases due to mutations in mtDNA are transmitted only by mothers because all mitochondria are inherited via the egg. Thus, all offspring of an affected female are at risk of inheriting the abnormality, whereas no offspring of an affected male are at risk. Clinical manifestations are variable and may be due to variable mixtures of mutant and normal mitochondrial genomes (heteroplasmy) within cells and tissues (see Clinical Genetics).

Mating type: Genetically determined characteristics of bacteria, ciliates, fungi and algae, determining their ability to conjugate and undergo sexual reproduction with other members of the species. In yeasts (S.cerevisiae), which have only two types, only cells of opposite types can conjugate. The common mushroom Schizophyllum has more than 50,000 mating types (genders) encoded in separate loci. In species where the organelles are inherited uniparentally and reproduction is by the union of gametes, there are only two mating types. In species reproducing via sexual conjugation (nuclear exchange) so that each cell preserves its own organelles, there can be multiple types.

Meiosis: Cell division in two phases that produces four haploid cells (gametes) from a diploid cell. In meiosis I, the already doubled chromosome complement is halved to create two cells, each containing one set of replicated chromosomes (haploid in chromosome number, although each chromosome still consists of two sister chromatids). Genetic recombination between homologous chromosome pairs occurs during meiosis I. In meiosis II, the sister chromatids separate, so that each of these cells gives rise to two haploid cells, resulting in four gametes from one diploid cell.

Melting temperature (Tm): The temperature at which the two strands of a double-stranded DNA molecule come apart. For a short (<18 nucleotides) oligonucleotide, the Tm value (°C) is estimated by the formula: Tm = [(number of A+T) x 2 + (number of G+C) x 4].
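
The formula above can be applied to a primer sequence as in the following Python sketch. The function name and the example sequence are hypothetical, and the estimate is only meant for short oligonucleotides as stated in the entry.

```python
def estimate_tm(oligo: str) -> int:
    """Estimate Tm (degrees C) as 2 x (A+T) + 4 x (G+C) for a short oligonucleotide."""
    seq = oligo.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError("sequence must contain only A, C, G and T")
    return 2 * at + 4 * gc


# 14-mer with 8 A/T and 6 G/C bases: 8 x 2 + 6 x 4 = 40
print(estimate_tm("ATGCATGCATGCAT"))  # 40
```

Longer primers, or reactions with unusual salt concentrations, generally call for nearest-neighbor thermodynamic methods instead of this rule of thumb.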

Mendelian inheritance: Inheritance of traits mediated by nuclear genes (as opposed to mitochondrial DNA) according to the laws defined by Gregor Mendel.

Mendelian randomization: A natural randomization process that occurs at conception to determine a person's genotype. It is possible to use 'Mendelian randomization' to derive an estimate of the association that is free of the confounding and reverse causation typical of classical epidemiology. According to the second law of Mendel (random assignment of genes), the inheritance of one trait is independent of the inheritance of other traits. The distribution of genetic polymorphisms is largely unrelated to the confounders (socioeconomic or behavioral) that distort interpretations of observational epidemiological studies. The basis of Mendelian randomization is best seen in parent–offspring designs that study the way phenotype and alleles co-segregate during transmission from parents to offspring. This study design is closely analogous to that of randomized clinical trials as by Mendelian principles there should be an equal probability of either allele being randomly transmitted to the offspring. Due to Mendelian randomization, genetic association studies are less prone to confounding than conventional risk-factor epidemiology (pleiotropy and linkage disequilibrium can still produce confounding; see Lee & Ho, 2003). Mendelian randomization concept can be used as a tool for epidemiological inference on environmental risk factors by examining the genetic counterpart of a suspected environmental exposure association free of confounding by conventional confounders (see Davey-Smith & Ebrahim, 2003; Khoury, 2004).

Mendel's first law (law of segregation): The two alleles received one from each parent segregate independently in gamete formation, so that each gamete receives one or the other with equal probability.

Mendel's second law (law of recombination): Two characters determined by two unlinked genes are recombined at random in gametic formation, so that they segregate independently of each other, each according to the first law (note that recombination here is not used to mean crossing-over in meiosis).

Metacentric chromosome: A chromosome with its centromere near the center. If the centromere is slightly off-center, the chromosome is said to be submetacentric (see also acrocentric and telocentric).

Metaphase: Mitotic phase at which replicated chromosomes are fully condensed and become visible under the light microscope.

Metazoa: A major division in the animal kingdom consisting of multicellular animals.

Methylation: The addition of a methyl group (-CH3) to DNA. Methylated DNA is inactivated and not transcribed. Most frequently occurs at CpG doublets (see genomic imprinting and CpG islands).

Methylation paradox: Methylation of the CpG islands in the transcribed region is often correlated with transcription but an inverse correlation is seen at the CpG islands at the transcription initiation site (link to a review on Methylation Paradox by PA Jones).

MHC: Major Histocompatibility Complex. H-2 complex in mice, HLA complex in humans.

microRNAs (miRNAs): Small non-protein-coding RNAs that regulate gene expression at the translational level. Changes in their expression levels and activity that may be due to polymorphisms are frequently found in disease states (see Kulkarni, 2011 for an example and Mishra, 2008; Chen, 2008 for reviews). The microRNA database miRBase lists known human miRNAs.

Microsatellite repeat sequences: Sequences of 2 to 5 bp repeated up to 50 times such as a TA dinucleotide repeat polymorphism. Also called short tandem repeat (STR). The variable number of repeats creates the polymorphism. They may occur at up to 100 thousand locations in the human genome. Microsatellites mutate faster than nonrepeat polymorphisms and can be used to estimate evolutionary relationships over shorter time scales (Goldstein, 1995). As multiallelic markers, they provide higher polymorphism information content (PIC) than SNPs (Schaid, 2004). The average length of LD with microsatellites is 100 kb, which is considerably higher than for SNPs (Bahram & Inoko, 2007).

Mimicry: Resemblance of one kind of organism to another to make the organism difficult to find, to discourage the potential predators, or to attract potential prey. The common kinds of mimicry are Batesian and Mullerian mimicry (see Evolutionary Biology Notes). See also molecular mimicry.

Missense mutation: A mutation that causes the substitution of one amino acid for another (non-synonymous change). An example is the major HFE mutation C282Y, which results in a cysteine-to-tyrosine change at position 282.

Missing link: An absent member needed to complete an evolutionary lineage.

Mitochondrial DNA (mtDNA): The maternally inherited nucleic acid found in cytoplasm whose homologue in plants is chloroplastic DNA. This small circular DNA mostly codes for Mt-tRNAs, rRNAs and ATP synthases (as well as NADH dehydrogenase and cytochrome oxidase subunits). It is more closely related to bacterial DNA than to eukaryotic nuclear DNA. Mitochondrial DNA mutates 10-20 times faster than nuclear DNA. mtDNA is much more abundant than nuclear DNA and this is why most ancient DNA studies use (or can only use) mtDNA. See the map of mitochondrial DNA at NCBI Map Viewer.

Mitochondrial diseases: A set of diseases resulting from mutations in mitochondrial DNA. These diseases affect most frequently tissues that depend heavily on oxidative phosphorylation (the heart and nervous system). See Clinical Genetics.

Mitosis: Cell division into two identical daughter cells with the same chromosome number as the mother cell (see also meiosis). Replicated chromosomes separate and each chromatid goes to a daughter cell.

Mitotic recombination: During mitosis, sister chromatids freely exchange pieces without changing anything in genetic material because they are identical. Very rarely, and by chance, homologous chromosomes come very close to each other and exchange material as in meiosis which results in a recombinant chromosome.

Mixed lymphocyte reaction (MLR): The activation of T cells in vitro by other (allogeneic) lymphocytes due to differences in their MHC molecules.

Molecular mimicry: Resemblance of a DNA sequence or a polypeptide by an unrelated sequence at the nucleotide or amino acid level, respectively. Mimicry of MHC proteins is an immunoevasion mechanism used by pathogens.

Monoallelic expression: Expression of one of the homologous genes due to random allelic inactivation (autosomal genes), parental imprinting or X-inactivation. See Rhoades, 2000 for details.

Morgan or centiMorgan: See recombination.

Mosaicism: Mosaicism is the presence of more than one cell line differing in genotype or karyotype but derived from one zygote. Post-zygotic new mutations result in mosaic individuals who may not be clinically affected themselves, but are at risk of bearing multiple affected offspring. Mosaicism is well recognized in Duchenne muscular dystrophy and in autosomal dominant disorders with high new mutation rates (see Clinical Genetics).

mRNA: Messenger RNA. It is the first product of the DNA transcription by RNA polymerase. mRNA forms 1-5% of the total cellular RNA. Its molecular weight is generally less than 2 x 10⁶. At any time, there are about 10⁵ species of mRNA in a cell.

mRNA expression profile: The identities and absolute or relative levels of mRNAs in a specific cell/tissue type in a given physiological, developmental or pathological state.

Multifactorial liability threshold model: See Falconer's multifactorial liability threshold model.

Multivariate analysis: Methods to deal with more than one related 'outcome/dependent variable' (like two outcome measures from the same individual) simultaneously with adjustment for multiple confounding variables (covariates). Unfortunately, the word 'multivariate' is most frequently used instead of 'multivariable' analysis (which means multiple independent/explanatory variables but one outcome/dependent variable; see also Peter TJ, 2009). See an Online Multivariate Statistics Book, Multivariate Statistical Methods: A Primer by BF Manly and MultiVariate Statistical Package-MVSP. See also Biostatistics Glossary.

Mutation: Any heritable change (not only point mutation) brought about by an alteration in the genetic material. Includes gene conversion, deletion, duplication, insertion and so forth. Mutation is preferred to polymorphism to describe a disease causing gene variation regardless of its frequency. Link to Human Gene Mutation Database (Cardiff, UK) and Description of Sequence Changes for mutations.

Mutation pressure: Evolution by different mutation rates alone.

Mutation rate: The number of mutations at a particular locus, which occur per gene per cell generation. This is the only source of variation in asexual organisms. The mutation rate is the likelihood of parentage when findings suggest otherwise. Beware of the different units in different mutation rates. In humans, the mutation rate is 1 bp per 10^9 bp per cell division. This corresponds to 10^-6 mutations per gene per cell division and, because there are 10^16 divisions in a lifetime, 10^10 mutations per gene per lifetime.
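
The unit conversions in this entry can be checked with a few lines of arithmetic. The following is a minimal sketch only: the average gene length of 1,000 bp is an assumption (it is not stated in the entry) chosen because it reproduces the quoted per-gene figure, and the function names are illustrative.

```python
# Per-base-pair rate quoted in the entry: 1 mutation per 10^9 bp per cell division.
RATE_PER_BP_PER_DIVISION = 1e-9
# ASSUMPTION: average gene length in bp (hypothetical round figure, not from the entry).
ASSUMED_GENE_LENGTH_BP = 1_000
# Number of cell divisions in a lifetime, as quoted in the entry.
DIVISIONS_PER_LIFETIME = 1e16


def per_gene_rate(rate_per_bp: float, gene_length_bp: int) -> float:
    """Mutations per gene per cell division."""
    return rate_per_bp * gene_length_bp


def per_gene_per_lifetime(rate_per_gene: float, divisions: float) -> float:
    """Mutations per gene summed over all cell divisions in a lifetime."""
    return rate_per_gene * divisions


gene_rate = per_gene_rate(RATE_PER_BP_PER_DIVISION, ASSUMED_GENE_LENGTH_BP)
lifetime = per_gene_per_lifetime(gene_rate, DIVISIONS_PER_LIFETIME)
print(f"{gene_rate:.0e} mutations per gene per division")
print(f"{lifetime:.0e} mutations per gene per lifetime")
```

Changing the assumed gene length changes the per-gene figures proportionally, which is one reason to check the units whenever mutation rates are compared.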

Mya: Million years ago.

National Center for Biotechnology Information (NCBI): Established in 1988 as a national resource for molecular biology information as part of NIH, NCBI creates public databases, conducts research in computational biology, develops software tools for analyzing genome data, and disseminates biomedical information. NCBI Global Search and NCBI Tools for Bioinformatics Research.

 

Natural killer (NK) cell: Bone marrow-derived, mononuclear white blood cells (large granular lymphocytes) that are able to kill invading microorganisms without activation by cells of the immune system. They are, therefore, part of the innate immune system. They are specialized in killing virus-infected cells and cells transformed to develop cancer (see NK Cell Receptors; KIR Gene Cluster by Carrington & Norman).

Natural selection (Darwin's definition, 1859): "As many more individuals of each species are born than can possibly survive; and as, consequently, there is a frequent recurrent struggle for existence, it follows that any being, if it varies however slightly in any manner profitable to itself, under the complex and sometimes varying conditions of life, will have a better chance of surviving, and thus be naturally selected." Link to a simulation on Natural Selection.

Nature-nurture debate: The debate on the relative contributions of genetics (nature) and environment (nurture) to the characteristics of an organism. An example is the debate on whether gene(s) and/or environmental factors determine the sexual orientation of an individual. Finding a gene playing a role in the development of a condition does not necessarily mean it is a purely genetic trait.

Negative assortative mating: A type of nonrandom mating in which individuals of unlike phenotype mate more often than predicted under random mating conditions.

Neurospora crassa: Haploid, heterothallic, filamentous Ascomycete fungus (bread mold). It has two mating types (A and a) operating as sexual compatibility system, and 11 het loci operating as heterokaryon compatibility system in vegetative phase. Link to Neurospora website.

Neutral theory: Originally proposed by Kimura (1968), neutral theory suggests that much of the variation at the molecular level is due to the interaction between drift and mutation rather than being actively maintained by selection. Link to a tutorial on Neutral Theory of evolutionary change and a lecture note on The Neutral Theory of Molecular Evolution. For reviews on neutrality tests, see Simonsen, 1995, Otto, 2000, Nielsen, 2001, Luikart, 2003, Harris & Meyer, 2006 and Basic Population Genetics.

Nick translation: A nick is a point in a double-stranded DNA molecule where there is no phosphodiester bond between adjacent nucleotides of one strand only. DNA polymerase replaces a DNA strand beginning from a nick. From this point, it removes the 'nicked' strand by its exonuclease activity while polymerizing a new DNA strand using the undigested strand as template. This is called nick translation, although it is not a translation event in the classical sense.

Nomenclature reports: Any report of a human genetic study should conform to the requirements of HUGO Gene Nomenclature Committee - Guidelines and HGVS - Nomenclature for the Description of Sequence Variations (see also den Dunnen & Antonarakis, 2000). For mouse genetics nomenclature, see the guideline and tutorial at the Jackson Laboratory.

Non-coding region: Parts of a gene that include sequences, which are not translated. Both 5' and 3' untranslated regions (UTRs), upstream promoter region and introns are classified as non-coding regions. Intergenic regions are considered to be non-coding regions but they may code for non-coding RNAs with regulatory functions.

Non-coding RNA: Functional RNAs that are not translated into a protein. It now appears that most of what used to be called junk DNA codes for non-coding RNA that is involved in various regulatory functions. They include small nucleolar (sno) RNAs, antisense riboregulator RNAs and RNAs involved in X-dosage compensation. Classes of ncRNAs include: tRNA (transfer RNA); Mt-tRNA (transfer RNA located in the mitochondrial genome); rRNA (ribosomal RNA); scRNA (small cytoplasmic RNA); snRNA (small nuclear RNA); snoRNA (small nucleolar RNA); miRNA (microRNA precursors); misc_RNA (miscellaneous other RNA); lincRNA (long intergenic non-coding RNAs). See Guttman, 2009; Eddy, 1999, 2001 & 2002; Kelley, 2000; ENCODE Project website; non-coding RNA characterization.

Non-disjunction: Due to failure in separation of homologous chromosomes in meiosis, the two members of one pair migrate to the same pole, giving rise to unbalanced gametes, one of which contains both homologous chromosomes, and the other none (most frequent in sex chromosomes). The non-disjunction event is much more frequent in maternal meiosis I. This may be due to the fact that in a mature woman, oocytes have been held in the ovary for a very long time at prophase I of meiosis, from before her birth to shortly before ovulation of the oocyte in question.

Non-overlapping generation model: This population biology model assumes that all members of a generation (in the cycle of birth, maturation and death) die before the next generation reaches maturity. This assumption is necessary to (mathematically) simplify the models.

Nonsense mutation: A mutation that changes an amino acid specifying codon to one of the three termination (stop) codons.
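
As an illustration of this definition, the sketch below classifies a single codon change as nonsense or not. It assumes the standard genetic code, in which the three stop codons are UAA, UAG and UGA (written here in DNA letters), and the function name is hypothetical.

```python
# Stop codons of the standard genetic code, in DNA (coding-strand) letters.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_nonsense(ref_codon: str, alt_codon: str) -> bool:
    """True if an amino acid-specifying codon is changed into a stop codon."""
    # Accept RNA spelling as well by mapping U to T.
    ref = ref_codon.upper().replace("U", "T")
    alt = alt_codon.upper().replace("U", "T")
    if len(ref) != 3 or len(alt) != 3:
        raise ValueError("codons must be three nucleotides long")
    return ref not in STOP_CODONS and alt in STOP_CODONS


print(is_nonsense("TGG", "TGA"))  # tryptophan codon changed to a stop codon
print(is_nonsense("TGG", "TGT"))  # tryptophan changed to cysteine (missense)
```

A change from one stop codon to another is not counted, because the reference codon must specify an amino acid.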

Northern blotting: Similar to Southern blotting, a technique in which RNA fragments are size separated by electrophoresis, transferred to a nitrocellulose/nylon membrane, hybridized to a biochemically or radioactively labelled DNA probe complementary to the desired sequence, and visualized by autoradiography. The technique can therefore be used to locate and identify an RNA fragment containing a specific sequence. See Gene Expression.

Notochord: A rod that forms in the embryonic mesoderm and which establishes the front-to-back orientation of vertebrate embryos. It also initiates the formation of the nervous system, the skeleton and most muscles.

Nuclease: An enzyme that breaks bonds in nucleic acids. Examples are deoxyribonuclease (DNAase) and ribonuclease (RNAase).

Nucleoid: The loosely tangled clump of DNA within the cytoplasm of a prokaryotic cell.

Nucleolar organizer: A region on a chromosome that is associated with formation of a new nucleolus following cell division. It contains the genes for several species of ribosomal RNA (rRNA), i.e., 18S, 5.8S and 28S in eukaryotes (the 5S rRNA genes are located elsewhere in the genome).

Nucleolus: The most prominent of subnuclear structures, which has a well-established role in ribosomal subunit assembly. See Olson, 2002.

Nucleoside: A small molecule composed of a purine or pyrimidine base linked to a five-carbon sugar (pentose: ribose or deoxyribose). With the addition of a phosphate group, it becomes a nucleotide. Nucleosides in RNA are adenosine, guanosine, cytidine and uridine; in DNA, they are (d)adenosine, (d)guanosine, (d)cytidine and (d)thymidine.

Nucleosome: A bead-like structure of eukaryotic chromosomes. It consists of a core of eight histone molecules and a DNA segment of about 150 base pairs. Each nucleosome is separated from another by a linker DNA sequence of about 50 base pairs. Nucleosome structure helps to fold DNA into a compact form in the interphase nucleus. Otherwise, the length of a chromosome, when linear, is many orders of magnitude greater than the diameter of the nucleus.

Nucleotide: The monomeric unit that makes up the DNA or RNA, formed by a phosphate group, a pentose and one of the nitrogenous bases or nucleobases (A, T/U, C, G). Nucleotides in RNA are adenylate, guanylate, cytidylate and uridylate; in DNA, they are (d)adenylate, (d)guanylate, (d)cytidylate and thymidylate. What is sometimes referred to as the 5th nucleotide is the methylated cytosine (Cm). Nucleotides are joined to each other by the phosphate. Nucleobases are bound to the sugar molecule in each strand and, in the case of DNA, face each other inside the backbone of DNA made up of the phosphate-deoxyribose chain.

Obligate carrier: An individual who must carry a recessive mutation based on analysis of the family history. This definition usually applies to disorders inherited in an autosomal recessive or X-linked recessive manner, such as parents of a child with an autosomal recessive disease or mothers of boys with an X-linked disease. Parents of a child homozygous for a recessive disease gene are not necessarily (both) carriers, as the mutation may have occurred in germ cells (de novo mutation). See Clinical Genetics.

Okazaki fragment: The small (<1 kb), discontinuous strands of DNA produced using the lagging strand as template during DNA synthesis. DNA ligase links the Okazaki fragments to give rise to a continuous strand.

Oligonucleotide: A short, synthetic DNA string used as a probe (as in SSOP or real-time PCR) or primer (as in SSP) in molecular genetic studies.

Oligonucleotide ligation assay (OLA): A PCR-based method for SNP typing. It is a ligase-mediated gene detection system which uses exact 3' matching of a primer to one of the SNP alleles. If this happens, the other labelled oligonucleotide which binds to the nucleotide immediately next to the SNP on the other side would be joined to the primer by ligase. The resulting sample can then be tested for the presence of the label (for example biotin). Unless controls are included, false positives are possible (link to a book chapter explaining OLA: Section 8.2.2.3; OLA Protocol by CSH Protocols).

Oncogene: An activated proto-oncogene that is capable of causing malignant transformation. See Cancer Genetics.

One gene - one (enzyme) polypeptide hypothesis: The hypothesis that each gene controls the synthesis of a single polypeptide which may be a subunit of a complex protein. This is no longer valid because, due to post-transcriptional (mainly alternative splicing) and post-translational modifications, the same gene sequence may produce hundreds of different peptides.

Oocyte: Female sex cell which undergoes meiosis and produces an egg (ovum).

Open reading frame (ORF): A nucleotide sequence encoding a polypeptide starting with a start and ending with a stop codon.
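
The definition above can be turned into a simple scan. The following is a minimal sketch, not a gene finder: it assumes ATG as the only start codon and the three standard stop codons, searches only the strand given (not the reverse complement), and uses a hypothetical function name and test sequence.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 2) -> list[tuple[int, int, str]]:
    """Return (start, end, sequence) for each ORF on the given strand.

    An ORF here runs from an ATG to the first in-frame stop codon.
    `start` is 0-based, `end` is exclusive and includes the stop codon.
    """
    seq = dna.upper()
    orfs = []
    for frame in range(3):  # scan the three forward reading frames
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == START:
                # Walk codon by codon until the first in-frame stop.
                for j in range(i + 3, len(seq) - 2, 3):
                    if seq[j:j + 3] in STOPS:
                        if (j + 3 - i) // 3 >= min_codons:
                            orfs.append((i, j + 3, seq[i:j + 3]))
                        i = j  # resume scanning after this stop codon
                        break
            i += 3
    return sorted(orfs)


# Hypothetical sequence with one ORF: ATG AAA TGA, starting at index 2.
print(find_orfs("CCATGAAATGACC"))
```

An ATG that is never followed by an in-frame stop codon is ignored, since by the definition an ORF must end with a stop codon.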

Operon: A type of genetic unit which consists of one or more transcription units that are transcribed together into a polycistronic mRNA. The transcription of each operon is initiated at a promoter region and controlled by a neighboring regulatory gene (an operator which binds to a repressor or an apoinducer, to repress or induce the transcription, respectively). An example is the lac operon of E.coli.

Ori: The origin of replication in prokaryotes.

Orthology: Being homologous by descent between species. In other words, descendants from a common ancestor. An example is the MHC class II genes in different species that all descended from a common ancestral class I gene. See also homology.

Overdominance: See balancing selection. See also underdominance.

Paralogy: Being homologous due to a recent or past duplication within the same species. An example is the chromosomal regions 1q22-q23, 6p21.31 and 9q33-q34 in humans. These regions contain very similar genes including some of the MHC class III genes. See also homology.

Paramecium: A unicellular Protoctist belonging to the group Ciliates. Although they normally reproduce asexually, they also undergo sexual conjugation in which mating types play a role. Paramecium aurelia has 34 hereditary mating types that form 16 distinct mating groups.

Paramutation: In paramutation, one paramutable allele of a gene is silenced by another paramutagenic allele. This epigenetically silenced state is then transmissible for many generations. See paramutation in Modern Genetic Analysis.

Parental imprinting: see genomic imprinting.

Parsimony: The scientific convention whereby the simplest explanation is preferred over the others. This is usually a phylogenetic tree requiring the fewest evolutionary steps.
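
To show what "fewest evolutionary steps" means in practice, the sketch below counts the minimum number of state changes a tree requires for one character, using Fitch's algorithm (one standard method for this count; the entry itself does not name a method). The taxa, character states and tree shapes are hypothetical.

```python
def fitch(tree, states):
    """Return (possible_states, changes) for a rooted binary tree at one site.

    `tree` is either a leaf name (str) or a pair (left_subtree, right_subtree);
    `states` maps each leaf name to its observed character state.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        # The children can agree on a state: no extra change is needed here.
        return common, left_cost + right_cost
    # The children disagree: one change is required on a branch below this node.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, states):
    """Minimum number of state changes required by the tree."""
    return fitch(tree, states)[1]


# Hypothetical data: one nucleotide site in four taxa.
site = {"human": "A", "chimp": "A", "mouse": "G", "rat": "G"}
tree_1 = (("human", "chimp"), ("mouse", "rat"))
tree_2 = (("human", "mouse"), ("chimp", "rat"))
print(parsimony_score(tree_1, site))  # 1 change
print(parsimony_score(tree_2, site))  # 2 changes
```

Under the parsimony criterion, tree_1 is preferred for this site because it explains the data with fewer changes. In a real analysis the scores are summed over all sites.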

PCR: Polymerase chain reaction. A technique that allows amplification of specific DNA segments in a very short time. See: PCR: Xeroxing DNA, IDT: PCR, Biotechnology, PCR Books and Real-Time PCR.

Penetrance: The proportion of individuals with a given genotype (like heterozygotes for a dominant gene) who express an expected trait, even if mildly. If a disease gene does not cause the disease in all of its carriers, its penetrance is low [not to be confused with variable expressivity]. BRCA1 mutations show both age-dependent penetrance and overall reduced penetrance, the lifetime risk for a female mutation carrier being estimated to be around 70%. Breast cancer is also an example of an autosomal condition where penetrance is sex-dependent. While male mutation carriers can develop breast cancer (particularly with BRCA2 mutations), females are at much greater risk. HFE mutations have very low penetrance, which is age- and sex-dependent.
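
Because penetrance is a proportion, it can be estimated directly from counts of carriers. The sketch below is illustrative only: the function name is hypothetical and the counts are invented to match the "around 70%" figure quoted above, not taken from any study.

```python
def penetrance(affected_carriers: int, total_carriers: int) -> float:
    """Proportion of genotype carriers who express the trait."""
    if total_carriers <= 0:
        raise ValueError("total_carriers must be positive")
    if not 0 <= affected_carriers <= total_carriers:
        raise ValueError("affected_carriers must be between 0 and total_carriers")
    return affected_carriers / total_carriers


# HYPOTHETICAL counts: 70 affected among 100 female mutation carriers.
print(f"{penetrance(70, 100):.0%}")
```

Age- or sex-dependent penetrance can be explored by applying the same calculation separately within each age band or sex.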

Phenocopy: A non-genetic condition resembling a genetically determined one. Such conditions confound the interpretation of pedigrees and, therefore, genetic counseling. Some teratogens may cause congenital anomalies mimicking genetically caused anomalies (thalidomide syndrome vs phocomelia). Deafness is another example of phenocopy, as it may be genetic or non-genetic.

Phenotype: The visible or measurable (i.e., expressed) characteristics of an organism (see genotype).

Phylogenetic footprinting: The use of phylogenetic comparisons to reveal conserved functional elements.

Phylogenetics: The study of reconstructing evolutionary (genealogical) relationships between taxa and the lines of descent of species or higher taxa.

Phylogeny: An evolutionary tree showing the inferred relationships of descent and common ancestry of any given taxa. Link to the Tree of Life, Spectrum of Life, Lecture on Tree Construction, Freeware Phylogenetic Data Analysis Download Pages: Phylogenetics Software, Phylogeny Programs.

Plasmid: A transferable extrachromosomal genetic element found in some bacteria. They are up to 200 kb long, double-stranded, circular DNA molecules. They replicate independently of the bacterial chromosomes and usually confer an advantage to the bacteria (such as antibiotic or heavy metal resistance). Plasmids are popular vectors in recombinant DNA technology. They can carry up to 10 kb foreign DNA.

Pleiotropy: More than one effect of a gene on the phenotype. The effects may occur simultaneously or sequentially. An example is the determination of the color pattern and the shape of the eyes by a single allele in Siamese cats. Another example is the DNA repair enzymes, which have several other functions (transcription, cell cycle regulation, regulation of gene rearrangements).

Point mutation: A single nucleotide change in the DNA sequence. Even if it is in the coding region of a gene, it may or may not change the amino acid sequence. The rates of point mutations for MHC genes are not unusually high. The extensive MHC polymorphism results from their accumulation over many millions of years of transspecies evolution.

Poly (A) tail: A sequence of 20 - 250 adenylic acid residues which is added to the 3' end of most eukaryotic mRNAs. It increases their stability by making them resistant to nuclease digestion. Not all mRNAs have poly-A tails (for example, histone and snRNA genes do not have a signal for poly-A tail).

Polygenic: Traits controlled by two or more genetic loci. They are usually influenced by environmental factors as well (multifactorial).

Polymorphism: Presence of discretely different forms of a gene or a character. It is defined as a variation that exists in the population in at least two versions, neither of which occurs at a frequency of less than 1%. Polymorphism at a genetic locus is due to either balanced polymorphism (heterozygous advantage, frequency-dependent selection) or non-equilibrium states (temporary polymorphism) as occurs during frequency-dependent selection and genetic drift (alleles becoming fixed or extinct).
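
The 1% criterion can be applied directly to allele counts. This is a minimal sketch with a hypothetical function name and invented counts; it treats a locus as polymorphic when at least two alleles each reach the threshold frequency.

```python
def is_polymorphic(allele_counts: dict[str, int], threshold: float = 0.01) -> bool:
    """True if at least two alleles each have a frequency >= threshold."""
    total = sum(allele_counts.values())
    if total == 0:
        raise ValueError("no alleles counted")
    common = [a for a, n in allele_counts.items() if n / total >= threshold]
    return len(common) >= 2


# HYPOTHETICAL allele counts from 500 diploid individuals (1,000 chromosomes).
print(is_polymorphic({"A": 950, "G": 50}))  # minor allele at 5%
print(is_polymorphic({"A": 999, "G": 1}))   # minor allele at 0.1%
```

With small samples, an allele frequency close to 1% is estimated imprecisely, so the classification of borderline loci depends on sample size.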

Polyploidy: The situation in which the organism has more than two sets (2n) of chromosomes. It could be 3n, 4n or more. A common situation in earthworms and plants. About half of angiosperms are polyploid. It arises as a result of meiotic irregularities and gives rise to sterile progeny, which can still reproduce asexually. The original South American potato is a tetraploid (4n). Many of the common food plants are polyploid as it leads to larger flowers and fruits (as well as larger cells, thicker and fleshier leaves). The wheat now grown for bread T. aestivum is hexaploid (6n = 42 chromosomes). Polyploidy is a common mechanism for sympatric speciation which played a role in angiosperm evolution. Link to a mini-essay on polyploidy.

Population biology: The study of the patterns in which organisms are related in time and space. It is a combination of disciplines including population genetics, ecology, taxonomy, ethology and others. Link to a population biology website.

Population genetics: The branch of genetics that deals with frequencies of alleles and genotypes in breeding populations. It also deals with selective influences on the genetic composition of the population (links to freeware population genetic data analysis software: Arlequin v3.01, PopGene, GDA, Genetix, Tools for Population Genetic Analysis, GenePop, GeneStrut, PyPop, SGS, GenAlEx; WinPop, Quanto, features of data analysis software; lectures on population genetics). See also Basic Population Genetics.

Population stratification: An example of 'confounding by ethnicity' in which the co-existence of different disease rates and allele frequencies within population sub-sections leads to an association between the two at the population level. Case-control association studies can still be conducted by using genomic controls (Devlin, 1999; Pritchard, 1999) even when population stratification is present. The software STRUCTURE and STRAT can be used to analyse case-control data with genomic control.

Position effect: A difference in phenotype that is dependent on the position of a gene or a group of genes, often caused by heterochromatin nearby. Thus, the change in a gene's location may cause a change in its expression (a problem that has to be overcome in gene therapy). See Kleinjan, 1998 for a review of position effect in human genetic disease.

Positive assortative mating: A type of nonrandom mating in which individuals of similar phenotype mate more often than predicted under random mating conditions.

Phosphorylation (of proteins): The addition of a phosphate group onto serine, threonine or tyrosine amino-acid residues of proteins to change their activity. Protein kinases are responsible for phosphorylation reactions. Kinase activities, and therefore phosphorylations, have crucial regulatory effects on a variety of biological processes including protein complex formation, cell signaling, cytoskeleton remodeling and cell cycle control. Kinases even phosphorylate each other to modify their functions. The result is a complex web of regulatory relations (biological networks).

Post-translational modifications: Cleavage of amino terminal peptide, hydroxylation and oxidation of amino acids in the polypeptide chain for cross-linking, covalent modifications by acetylation, phosphorylation and glycosylation, citrullination (conversion of arginine (Arg) in a protein into citrulline (Cit)) or carbamylation (conversion of lysine (Lys) to homocitrulline or ε-carbamyl-lysine) are some of the possible post-translational modifications. See Gene Expression.

Predisposition gene: A gene (variant) that is necessary and sufficient to cause a disease. This is different from a 'susceptibility gene' which only increases the risk for a disease but is neither necessary nor sufficient for its development. See also susceptibility gene.

Preimplantation genetic diagnosis (PGD): PGD allows genetic analysis to be performed on early embryos prior to implantation and pregnancy. See an e-Medicine article on PGD.

Pre-mRNA (precursor mRNA): The primary transcript and intermediates in RNA processing that yield functional (mature) mRNA.

Prenatal diagnosis: Diagnosis of single-gene diseases or other genetic abnormalities in high-risk pregnancies using DNA extracted from cells obtained from amniocentesis at 16-18 weeks' gestation or chorionic villus sampling (CVS) at about 10-12 weeks' gestation. Prenatal diagnosis is not the same as preimplantation genetic diagnosis. See an e-Medicine article on Prenatal Testing.

Primary sex ratio: The estimated male-to-female sex ratio at fertilization. Primary sex ratio is generally more than 1 in most mammals; it is estimated to be as high as 110-160 males to 100 females in humans (see for example: Tricomi, 1960; Serr & Ismajovich, 1963; Shettles, 1964; Lee & Takano, 1970; McMillen, 1979; Kellokumpu-Lehtinen & Pelliniemi, 1984; Vatten, 2004).

Primase: An enzyme that makes the RNA primer required by DNA polymerase in DNA replication. See also primosome.

Primer: A short nucleic acid molecule which, when annealed to a complementary template strand, provides a 3' terminus suitable for copying by a DNA polymerase.

Primosome: The mobile complex of helicase and primase that is involved in DNA replication.

Principal component analysis (PCA): In genetic epidemiology, PCA is used to detect population stratification in genome-wide association studies (Price, 2006). It is implemented in a software program called EIGENSOFT.

Prion: An infectious agent, which does not have any nucleic acid (but just protein). Responsible for scrapie in sheep, kuru and Creutzfeldt-Jakob disease in humans.

Processed pseudogene: A pseudogene, derived from a retrotranscript of mRNA of any expressed gene and inserted back into the genome. A processed pseudogene is intronless, usually flanked by the repeat sequence (GC/AGCTCTCC), and rich in multiple genetic lesions including substitution, deletion and/or insertion events that modify the reading frames. Because of the lack of its original promoter and the genetic lesions it has accumulated, a processed pseudogene is not normally expressed. For a comprehensive database of pseudogenes, see pseudogenes.org.

Prokaryotic cell: The cell type in which the DNA is not enclosed in a nucleus. Prokaryotes consist of Eubacteria and Archaebacteria. When the cell has a proper nucleus (eu-karyon), it is eukaryotic. See Prokaryotes Tutorial in The Biology Project.

Promoter: Initial binding site for RNA polymerase in the process of gene expression. First, transcription factors bind to the promoter, which is located 5' to the transcription initiation site in a gene. General and tissue/cell-specific promoters stimulate the expression of a gene under the control of enhancers. See Gene Expression.

Promoter-proximal element: Any regulatory sequence in eukaryotic DNA that is located within 200 bp of a promoter and binds a specific protein to modulate transcription of the associated gene.

Proofreading: In DNA synthesis, the ability of DNA polymerase to recognize mismatched bases. DNA polymerase corrects mistakes with its exonuclease activity. RNA editing is also possible at the mRNA level in some simple organisms.

Proteome: The functional representation of the genome that includes the types, functions and interactions of proteins that are present in a cell. The proteome is not a fixed characteristic of a cell, but variable depending on developmental stage, hormonal status, etc. A full List of OMEs; Human Proteome Organization (HUPO) website.

Proteomics: Proteomics is the study of proteins in aggregate. It applies to the translation from mRNA to primary protein products, and their maturation and modification to yield active proteins as components of a cell, tissue or organism. The collection of proteins in a given cell at a given stage of differentiation is called the proteome. See the websites for the Human Proteome Organization (HUPO) and Human Proteome Project (HPP). See also a 'Review of Proteomics with Applications to Genetic Epidemiology' (Sellers & Yates, 2003) and Ahsan & Rundle, 2003.

Pseudoalleles: Genes that behave like alleles but can be separated by crossing over. The eye color genes on the X chromosome of Drosophila are for example closely adjacent but separable loci and not alleles of a single gene.

Pseudoautosomal inheritance: The X and Y chromosomes share a common ancestor. There is a part of the X chromosome which has its homologous counterpart on the Y chromosome. The pattern of inheritance for a gene located on both the X and Y chromosomes may appear to be autosomal. The genes in these segments escape X inactivation (because no gene dosage adjustment is necessary). The major pseudoautosomal region (PAR1) at the tip of the short arms has a very high recombination frequency (the sex-averaged recombination frequency is 28% which, for a region of only 2.6 Mb, is approximately 10 times the normal recombination frequency). The high figure is due to the obligatory crossover in male meiosis resulting in a crossover frequency approaching 50%. The minor pseudoautosomal region (PAR2) extends over 320 kb at the extreme tips of the long arms of the X and Y; crossover between the X and Y in this region is not so frequent. The part of the Y chromosome between the two PARs is called the nonrecombining portion of the Y chromosome (NRY), which is exclusive to the Y chromosome. The genes within the PARs of the X and Y chromosomes have a unique segregation pattern in that affected sibs tend to be of the same sex. See Flaquer, 2008; Evolution of Sex Chromosomes in Human Molecular Genetics and Map Viewer: Y-chromosome.
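
The "approximately 10 times" figure for PAR1 can be reproduced with simple arithmetic. The sketch below rests on two assumptions that are not in the entry: a genome-wide average of about 1 cM per Mb (a common rule of thumb), and the sex-averaged figure being a simple mean of the male and female values. Names are illustrative.

```python
# Figures quoted in the entry.
PAR1_SEX_AVERAGED_PCT = 28.0   # read here as roughly 28 cM
PAR1_SIZE_MB = 2.6
MALE_RECOMBINATION_PCT = 50.0  # obligatory crossover in male meiosis

# ASSUMPTION (not from the entry): genome-wide average of about 1 cM per Mb.
ASSUMED_GENOME_AVERAGE_CM_PER_MB = 1.0


def fold_over_average(cm: float, size_mb: float, average_cm_per_mb: float) -> float:
    """Recombination rate of a region relative to an average rate."""
    return (cm / size_mb) / average_cm_per_mb


def implied_female_pct(sex_averaged_pct: float, male_pct: float) -> float:
    """Female value implied if the sex-averaged figure is a simple mean."""
    return 2 * sex_averaged_pct - male_pct


fold = fold_over_average(
    PAR1_SEX_AVERAGED_PCT, PAR1_SIZE_MB, ASSUMED_GENOME_AVERAGE_CM_PER_MB
)
female = implied_female_pct(PAR1_SEX_AVERAGED_PCT, MALE_RECOMBINATION_PCT)
print(f"PAR1 recombines about {fold:.1f} times the assumed average rate")
print(f"Implied female recombination frequency: {female:.0f}%")
```

Under these assumptions, most of the elevated sex-averaged value comes from male meiosis, consistent with the obligatory crossover described in the entry.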

Pseudogenes: Generally nonfunctional genes generated by nonsense mutation, frameshift mutation, or partial nucleotide deletion rendering a gene with no transcription ability. Some pseudogenes result from retroposition of processed mRNA and lack introns and the regulatory sequences necessary for expression. HLA class I and II pseudogenes (see HLA-DR53 Fact File) and the pseudogene CYP21A1P adjacent to its functional duplicate CYP21A2 are examples from the HLA complex. Similar to the changing concepts of "junk DNA", it is now becoming accepted that some pseudogenes, if expressed as mRNA, may have activities on transcriptional regulation of other genes. For a comprehensive database of pseudogenes, see pseudogenes.org.

Pseudo-SNP: Ectopic sequence variants (ESVs) and paralogous sequence variants (PSVs) (Estivill, 2002; Cheung, 2003). Pseudo-SNPs are one reason for genotyping errors and the main non-biological reason for violation of HWE (Leal, 2005).

Pulsed-field gel electrophoresis (PFGE): A form of gel electrophoresis that allows extremely long DNA molecules to be separated from one another.

Quantitative character: A character displaying a 'continuous' phenotypic range rather than discrete classes (characters taking any value within a limit; characters measured rather than counted such as metabolic activity, height, length, width, arm span, body fat content, growth rate, milk production, blood pressure). The genetic variation underlying a continuous character distribution may be the result of segregation at a single genetic locus or more frequently, at numerous interacting loci which produce a cumulative effect on the phenotype (with contributions from the environment). A gene affecting a quantitative character is a quantitative trait locus, or QTL (should be seen as a continuous trait locus). See also Introduction to Genetic Epidemiology.

Quantitative genetics: The statistical study of the genetics of quantitative characters (biometrical genetics) as opposed to Mendelian (discrete) characters. Quantitative genetic characters are those that do not assort in a simple way in crosses. Examples include physiological activity, behavior, size and height. A major task of quantitative genetics is to determine the ways in which genes (QTL) interact with the environment to contribute to the formation of a given quantitative trait distribution (and the estimation of genetic and environmental variance). See also Quantitative Genetics Resources; Quantitative Genetics in Modern Genetic Analysis; and Introduction to Genetic Epidemiology.

Quasidominance: Direct transmission, generation to generation, of a recessive trait giving the impression of dominance. It happens if the recessive gene is frequent or inbreeding is intense.

Quasispecies: The whole population of phylogenetically related (virus) variants observed within a single (infected) individual. Viruses with high mutation rates such as human immunodeficiency virus (HIV) and hepatitis C virus (HCV) form quasispecies. As a comparison, HIV variation within a single infected individual can be as great as the variation of influenza throughout the worldwide-infected population in a flu season. A review on quasispecies by DB Smith, 1997.

R: A programming language and environment for statistical computing and graphics. R is an open platform and offers thousands of programs (called libraries) to achieve a wide variety of statistical, graphical, biological tasks. It has a large number of genetics- and phylogenetics-related libraries. See also "Introduction to Phylogenies in R" and a list of "Phylogenetics Libraries in R". To learn how to “Use R” (not how to program R!), see this self-paced R course and the links within.

Race: Described in population genetics as a geographic subdivision of a species distinguished from others by the allele frequencies of a number of genes. A beautiful discussion that there are no genetically defined races within Homo sapiens can be found in Cavalli-Sforza's book Genes, Peoples, and Languages  (2000).

Random mating: Mating without any preference for mates. One of the assumptions of Hardy-Weinberg equilibrium (nonrandom mating may be due to assortative or disassortative mating).
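
Under random mating (together with the other Hardy-Weinberg assumptions), genotype frequencies at a locus with two alleles follow from the allele frequencies as p^2, 2pq and q^2. The sketch below computes these expected frequencies; the allele labels, the function name and the example frequency are hypothetical.

```python
def hardy_weinberg(p: float) -> dict[str, float]:
    """Expected genotype frequencies for alleles A (frequency p) and a (1 - p)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must be between 0 and 1")
    q = 1.0 - p
    return {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}


# HYPOTHETICAL allele frequency p = 0.7.
for genotype, freq in hardy_weinberg(0.7).items():
    print(f"{genotype}: {freq:.2f}")
```

Observed genotype frequencies that depart markedly from these expectations may indicate nonrandom mating, although other violations of the assumptions, or genotyping errors, can produce the same pattern.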

Real-time PCR: Also called kinetic PCR. Most commonly used for quantitative PCR or for allelic discrimination. See real-time PCR glossary.

Recessive: A trait that is not expressed in heterozygotes (i.e., that can only be expressed in the homozygous state). The most common recessive disease genes are those encoding metabolic enzymes (Jimenez-Sanchez, 2001).

Recombinase: A group of enzymes that catalyze the joining of two DNA molecules after recognizing the recombination sites. See also integrase and transposase.

Recombination (crossing-over): The exchange (reshuffling) of genetic material between a homologous pair of chromosomes during meiosis (see also somatic recombination; sister chromatid exchange). Recombination fraction is the proportion of gametes in which recombination is expected to occur between two loci. This (genetic distance) is usually a function of the physical distance between them. The unit of recombination is the Morgan (M), defined as the genetic distance over which exactly one cross-over is expected to occur on average (1 Morgan = 100 cM), meaning a crossover value of 100%; a distance of one cM indicates that two markers are inherited separately (recombination separates them) 1% of the time. Named in honor of Thomas Hunt Morgan. See also Introduction to Genetic Epidemiology.
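
The relation between recombination fraction and centiMorgans can be sketched from gamete counts. The counts and function names below are hypothetical, and the conversion of 1% recombination to 1 cM is used as an approximation that holds only for closely linked loci, where multiple crossovers between the two markers are rare.

```python
def recombination_fraction(recombinant: int, parental: int) -> float:
    """Proportion of gametes that are recombinant for two loci."""
    total = recombinant + parental
    if total == 0:
        raise ValueError("no gametes counted")
    return recombinant / total


def approx_map_distance_cm(r: float) -> float:
    """Approximate map distance in cM (1% recombination taken as 1 cM).

    Only reasonable for small r; an observed r cannot exceed 0.5.
    """
    if not 0.0 <= r <= 0.5:
        raise ValueError("recombination fraction must be between 0 and 0.5")
    return 100.0 * r


# HYPOTHETICAL counts: 12 recombinant and 188 parental gametes.
r = recombination_fraction(12, 188)
print(f"recombination fraction = {r:.2f}")
print(f"approximate distance = {approx_map_distance_cm(r):.0f} cM")
```

For loci that are far apart, the observed recombination fraction levels off at 0.5, so map distances are obtained by adding up the distances between closer intermediate markers.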

Release factor: One of a set of proteins that recognize stop codons on mRNA at the A site on the ribosome, which leads to the release of the completed protein from the tRNA in the P site of the ribosome.

Repetitive DNA: Non-coding DNA sequence blocks repeatedly occurring in chromosomal DNA. They do not normally have any function, but those capping the chromosomes prevent the loss of genetic information after each replication (as this would cause a 3' overhang). In the human genome, at least 20% of the DNA consists of repetitive sequences.

Replicon: A unit of genetic material that behaves autonomously during replication of DNA. In bacteria, a whole chromosome is a replicon. In eukaryotes, chromosomes are divided into hundreds of replicons. Each replicon contains an origin of replication, where DNA replication is initiated.

Repulsion (trans-arrangement): The condition in which a double heterozygote has received a mutant and a wild-type allele from each parent, e.g., a + / + b (see also coupling).

Residue: A compound such as an amino acid or a nucleotide when it is part of a larger molecule.

Reverse transcriptase: RNA-dependent DNA polymerase.

RFLP (restriction fragment length polymorphism): Genetic polymorphism as revealed by the sizes of fragments generated with a particular restriction endonuclease enzyme (such as EcoRI, PstI, BglII).
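
As an illustration of how a single base change can alter a restriction pattern, the following is a minimal Python sketch of an in silico EcoRI digest of a linear DNA molecule. The two allele sequences are made up for this example, and the function name `digest` is hypothetical; EcoRI recognizes GAATTC and cuts after the first G.

```python
from typing import List


def digest(sequence: str, site: str = "GAATTC", cut_offset: int = 1) -> List[int]:
    """Return fragment lengths after cutting linear DNA at every recognition site.

    cut_offset is the number of bases into the site where the enzyme cuts
    (1 for EcoRI, which cuts G^AATTC on the strand shown).
    """
    sequence = sequence.upper()
    cut_positions = []
    start = sequence.find(site)
    while start != -1:
        cut_positions.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    boundaries = [0] + cut_positions + [len(sequence)]
    return [end - begin for begin, end in zip(boundaries, boundaries[1:])]


# Made-up alleles: allele_b carries a substitution that destroys the second site.
allele_a = "ATATAT" + "GAATTC" + "GCGCGCGCGC" + "GAATTC" + "TTTT"
allele_b = "ATATAT" + "GAATTC" + "GCGCGCGCGC" + "GAGTTC" + "TTTT"

if __name__ == "__main__":
    print("allele A fragments:", digest(allele_a))
    print("allele B fragments:", digest(allele_b))
```

The two alleles give different sets of fragment lengths, which is the polymorphism that would be seen as different band patterns after electrophoresis.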

Ribosomal RNA (rRNA): A type of RNA that is a structural and functional component of ribosomes in all cells. It has a very slow mutation rate, which is useful in phylogenetic analysis of kingdoms and phyla.

Ribosome: A small cytoplasmic organelle that is the site of mRNA translation, thus peptide synthesis. The ribosome converts the triplet genetic code in messenger RNA (mRNA) into peptide. It achieves this function with the help of transfer RNAs (tRNAs). Following the decoding of a codon, the mRNA and associated tRNAs must be moved through the ribosome to allow the next codon to be read. New charged tRNAs are taken in at the A (aminoacyl-tRNA) site, and newly extended peptidyl-tRNAs are moved into the P (peptidyl-tRNA) site, while the deacylated tRNAs are released from the exit (E) site of the ribosome.

Ribozyme: An RNA molecule with enzymatic activity. The presence of ribozymes in organelles from plants, yeast, viruses and eukaryotic cells revolutionized the ideas about the origin of life.

Robertsonian translocation: see centric fusion.

RNA (ribonucleic acid): A single-stranded nucleic acid that is found both in the nucleus and the cytoplasm. Other differences from DNA are that it contains uracil instead of thymine and that its sugar molecule is ribose. Total cellular RNA is made up of ribosomal RNA (rRNA, 80-85%), transfer RNA (tRNA, 15-20%) and messenger RNA (mRNA, 1-5%). See also non-coding RNA, small nuclear RNA and heterogeneous nuclear RNA.

RNA interference (RNAi): The use of double stranded RNA to interfere with gene expression. RNAi is usually mediated by approximately 21-nt small interfering RNAs (siRNA). See a review by Zhang & Hua, 2004.

RNA polymerase: An enzyme that transcribes an RNA molecule from the template strand of a DNA molecule. It adds to the 3' end of the growing RNA molecule one nucleotide at a time using ribonucleotide triphosphates (rNTPs) as substrates (this reaction releases pyrophosphates). RNA polymerase I is dedicated to the synthesis of only one type of RNA molecule (pre-rRNA). RNA polymerase II is required for general transcription reactions. RNA polymerase III produces small RNAs such as tRNAs and 5S rRNA.

Saccharomyces cerevisiae: Unicellular Ascomycete yeast known as the baker's or brewer's yeast. Widely used as a simple eukaryotic model, particularly in recombinant DNA and cell cycle studies as well as in mating type and heterokaryon compatibility studies. It has most advantages of a prokaryotic system but is a true eukaryote. It is considered the E. coli of the eukaryotes. S. cerevisiae can reproduce both asexually and sexually, and can be cultured in either the haploid or the diploid state. One major advantage of yeast is the ease with which specific gene disruptions, gene replacements, and gene retrievals can be accomplished. Its complete genome sequence was released in 1996 and contains 12,057,500 bp and about 6,000 genes in 16 chromosomes. It is used in the creation of YACs. Link to S. cerevisiae website.

SAGE (serial analysis of gene expression): A high-throughput method that uses 10-14 bp-long tags from each cDNA expressed in a cell. The concentration of each tag sequence is proportional to the level of its mRNA in the original sample. This method is used to explore gene regulation in cell populations. See Gene Expression and NCBI SAGE website.

Schizophyllum commune: A fungal species of the Basidiomycetes group. It has thousands of mating types in a multiallelic locus, a pheromone receptor system and a pheromone system. Because of this, it has the maximal outbreeding rates (98.8%) in nature.

Sea Urchin: A small spiny marine invertebrate belonging to the phylum Echinodermata. It is a model animal for the study of fertilization and development. Because it is a spawner, its gametes can be obtained in large quantities. Simple mixing of sperm and eggs causes synchronous mitosis and cytokinesis. Furthermore, its eggs are large and clear. Links to Sea Urchin Embryology (Stanford) and Sea Urchin Genome Project (Baylor) websites.

Segregation: The separation of members of a gene pair from each other during gamete formation.

Segregation distortion: Violation of Mendel's first law which results in unequal segregation of a pair of alleles.

Selection differential (S) and response to selection (R): Following a change in the environment, in the parental (first) generation, the mean value for the character among those individuals that survive to reproduce differs from the mean value for the whole population by a value of (S). In the second, offspring generation, the mean value for the character differs from that in the parental population by a value of R which is smaller than S. Thus, strong selection of this kind (directional) leads to reduced variability in the population.
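
The relationship between S and R is commonly summarized by the breeder's equation, R = h² × S. Narrow-sense heritability (h²) is not mentioned in the entry above and is brought in here as an assumption; the numbers in the sketch below are invented for illustration.

```python
def selection_differential(population_mean: float, selected_parents_mean: float) -> float:
    """S: mean of the individuals that reproduce minus the whole-population mean."""
    return selected_parents_mean - population_mean


def response_to_selection(s: float, heritability: float) -> float:
    """R under the breeder's equation, R = h^2 * S (h^2 is an assumed input)."""
    if not 0.0 <= heritability <= 1.0:
        raise ValueError("heritability must lie between 0 and 1")
    return heritability * s


if __name__ == "__main__":
    # Invented example: trait mean 50 in the population, 58 among the parents.
    s = selection_differential(population_mean=50.0, selected_parents_mean=58.0)
    r = response_to_selection(s, heritability=0.4)
    print(f"S = {s:.1f}, R = {r:.1f}, expected offspring mean = {50.0 + r:.1f}")
```

Because h² cannot exceed 1, R is never larger than S under this model, which matches the statement that the response in the offspring generation is smaller than the selection differential.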

Self-fertilization (also selfing, self-pollination): The fusion of male and female gametes produced by the same (hermaphrodite or bisexual) individual. Self-fertilization allows an individual to create a local population, but it fails to provide variability within a population and limits the possibilities for adaptation to environmental change. Some plants reproduce by self-fertilization but most hermaphroditic animals rarely use self-fertilization, since many of them have adaptations encouraging cross-fertilization.

Self-incompatibility system (SI system): The genetic complex of plants that prevents self-fertilization. See also MHC and mating types.

Sense mutation: A mutation that changes a termination (stop) codon into one that codes for an amino acid. Such a mutation results in an elongated protein.

Sequence divergence: Changes in the DNA or protein sequences of homologous genes in different species due to the independent accumulation of mutations and natural selection since these species shared a common ancestor.

Sequence motif: A short conserved amino acid/nucleotide sequence pattern that represents a specific functional site of a molecule. See Motif Structures of Transcription Factors

Sequence tagged site (STS): A short (200-500 bp) genomic DNA sequence with known location and sequence that occurs only once in the human genome. For a catalogue of human STSs, see UniSTS.

Serological typing: Identification of MHC molecules expressed on cells using either naturally occurring antibodies in multiparous women or by alloantiserum raised in animals.

Sex: Formation of new organism containing genetic material from more than a single parent. See Sexual Reproduction.

Sex factors: A National Library of Medicine medical subject heading (MeSH) replacing gender effect. The MeSH definition is: Maleness or femaleness as a constituent element or influence contributing to the production of a result.

Sex-influenced dominance: A dominant expression that depends on the sex of the individual. For example horns in sheep are dominant in males and recessive in females.

Sexual dimorphism: The existence, within a species, of differences in morphology between the sexes. Examples are greater size in males of gorilla, baboon and elephant seals.

Sexual reproduction: Reproduction requiring the union of sex cells (gametes), which are themselves products of meiotic division. Each offspring has a unique genetic composition due to independent assortment of chromosomes during meiosis, recombination and union of gametes.

Sexual selection: Natural selection operating on factors that contribute to an organism's mating success. Described by Darwin as natural selection in relation to sex. Link to a lecture note on sexual selection.

Sexual antagonism: Action of a gene that differs between two sexes in opposite directions. See also antagonistic pleiotropy.

Shine-Dalgarno (S-D) sequence: An eight-nucleotide consensus sequence 5' UAAGGAGG 3' found in bacterial mRNAs five to ten nucleotides before the translation initiation codon (AUG). It is thought to be involved in initiation of translation by helping the mRNA bind to the ribosome (16S rRNA); thus, it can be called the ribosomal binding site (see also Kozak sequence). In eukaryotic mRNA, there is no such sequence. The 5' cap present on all eukaryotic mRNAs seems to be the first signal to start protein synthesis.
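
A simple way to look for candidate ribosomal binding sites is to scan for a Shine-Dalgarno-like motif at the stated distance upstream of each AUG. The sketch below is illustrative only: it assumes an exact match to the core AGGAGG (part of the consensus above) rather than a scored match, and the function name and test sequence are made up.

```python
import re
from typing import List, Tuple

SD_CORE = "AGGAGG"  # assumption: exact match to the core of the UAAGGAGG consensus


def find_sd_start_codons(mrna: str, min_gap: int = 5, max_gap: int = 10) -> List[Tuple[int, int]]:
    """Return (sd_index, aug_index) pairs where the SD core ends
    min_gap..max_gap nucleotides upstream of an AUG (0-based indices)."""
    mrna = mrna.upper().replace("T", "U")  # accept DNA-style input as well
    hits = []
    for match in re.finditer("(?=AUG)", mrna):  # lookahead finds overlapping AUGs
        aug = match.start()
        for gap in range(min_gap, max_gap + 1):
            sd_start = aug - gap - len(SD_CORE)
            if sd_start >= 0 and mrna[sd_start:sd_start + len(SD_CORE)] == SD_CORE:
                hits.append((sd_start, aug))
    return hits


if __name__ == "__main__":
    example = "GCUAAGGAGGUAUCCAAUGGCUAAAUAA"  # made-up sequence
    print(find_sd_start_codons(example))
```

Real Shine-Dalgarno sequences vary in length and in how closely they match the consensus, so a practical search would score partial matches instead of requiring an exact core.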

Sibling relative risk: The disease risk for a sibling of an affected individual compared to the disease risk in the general population. One of the simplest indications of a genetic basis for a disease. Ascertainment bias and similar environment effect reduce the validity of this estimate. See Guo, 1997 and Introduction to Genetic Epidemiology.
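
The calculation is a simple ratio, often written as λs. The following minimal sketch uses invented risk values purely to show the arithmetic; the function name is hypothetical.

```python
def sibling_relative_risk(sibling_risk: float, population_risk: float) -> float:
    """Lambda-s: risk to siblings of affected individuals divided by the
    risk (prevalence) in the general population."""
    if not 0.0 < population_risk <= 1.0:
        raise ValueError("population risk must be in the interval (0, 1]")
    if not 0.0 <= sibling_risk <= 1.0:
        raise ValueError("sibling risk must be in the interval [0, 1]")
    return sibling_risk / population_risk


if __name__ == "__main__":
    # Invented values: 6% risk in siblings versus 0.4% in the population.
    print(f"lambda_s = {sibling_relative_risk(0.06, 0.004):.1f}")
```

A value close to 1 suggests no familial aggregation, whereas larger values are compatible with a genetic contribution, subject to the biases noted in the entry.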

Sibling species: Two species that evolved from a common ancestor and are genetically distinct but morphologically similar.

Signal sequence: A stretch of 13-36 hydrophobic amino acids at the amino-terminal of the nascent polypeptide chain that guides polypeptide translocation through the rough endoplasmic reticulum. It helps the polypeptide to pass through the membrane via interaction with its receptor on the membrane and is usually cleaved off at the other side of the membrane by an endopeptidase. Sometimes used interchangeably with leader sequence. See also Gene Expression.

Signal transduction: A complex multistep pathway by which extracellular signals are transduced from plasma membrane receptors to the transcription machinery in the nucleus and the translation machinery in the cytoplasm, subsequently to regulate cell proliferation and differentiation. The components are growth factors, growth factor receptors, membrane and cytoplasmic tyrosine kinases, GTP-binding (G) proteins, nuclear binding proteins and transcription factors.

Silencer: A DNA sequence which acts in the opposite direction of an enhancer to inhibit the transcription of a gene.

Silent mutation: Base-pair substitution, which alters a codon but does not result in altered phenotype due to the degeneracy of the genetic code (synonymous mutation).

SINES (short interspersed element sequence): An abundant intermediate DNA sequence in mammals about 300 bp long (see also LINES).

Single nucleotide polymorphism (SNP): A single nucleotide change in the DNA code. It is the most common type of stable genetic variation and usually bi-allelic (but can be tri- or quadri-allelic). SNPs may be silent -no change in phenotype- (sSNP), may cause a change in phenotype (cSNP) or may be in a regulatory region (rSNP) with potential to change phenotype. On average, each 1 kb of human genome contains 2-10 SNPs, i.e., one in every 100-500 nucleotides is polymorphic; most frequently a C to T substitution. Functional changes that may be caused by SNPs are gene transcription changes (promoter and intronic enhancer SNPs), truncated protein (nonsense coding region SNPs), structural changes (coding region SNPs), alternative splicing (intronic splice site SNPs), and mRNA stability changes (3’UTR SNPs). Intergenic SNPs can have important regulatory function too (see Bioinformatics Tools).

Single nucleotide variation (SNV): Same as SNP except that this variant does not exist in a population but in an individual (a private mutation).

Sister chromatid exchange: An exchange (crossing-over) of genetic material between the two (identical) chromatids of a chromosome in mitosis (mitotic recombination). Normally, genetic recombination takes place between homologous chromosomes in meiosis I. Sister chromatid exchange may be a sign of chromosomal instability but has no genetic consequences as long as the exchange is the result of an equal crossover.

Site-specific mutagenesis: The use of recombinant DNA technology to create specific deletions, insertions, or substitutions in vitro in a particular gene. This technique allows the production of proteins having any desired amino acid at any position.

Site-specific recombination: The exchange of two specific but not always homologous DNA sequences.

Small nuclear RNA (snRNA): A type of non-coding RNA. Small (90 to 300 nucleotides) RNA molecules that are not directly involved in protein synthesis but may have roles in RNA processing (splicing) and the cellular architecture. There are six types of snRNA: U1 to U6. Their genes do not encode poly(A) tails.

Small nucleolar RNAs (snoRNAs): A type of non-coding RNA. Their functions are modification (methylation and pseudouridylation) of ribosomal RNA, spliceosomal small nuclear RNAs and other cellular RNAs; pre-ribosomal RNA processing reactions; and synthesis of telomeric DNA. Two large families of snoRNAs are called box C/D and H/ACA snoRNAs. Some newly discovered brain-specific small nucleolar RNAs of unknown function are encoded in introns of tandemly repeated units, with paternally imprinted expression. Homologs of snoRNAs exist in the domain Archaea, which suggests an ancient evolutionary origin. See Uliel, 2004; Bachellerie, 2002; Filipowicz, 2002; Kiss, 2002.

Somatic mutation: A mutation that is only present in some somatic cells and not in all body cells or germ cells. These mutations are not inherited, but acquired. Somatic mutations either randomly occur or are induced by environmental assaults (UV light, aflatoxin, benzene etc). All cancer cells contain a lot of somatic mutations not present in adjacent normal cells. See also germline mutation and Cancer Genetics.

Somatic recombination: Rearrangement of genes in cells other than germ cells, which happens to generate the extreme diversity of T-cell receptors and immunoglobulins (see adaptive immunity).

Southern blotting: A technique in which DNA fragments are size separated by electrophoresis, transferred to a nitrocellulose/nylon membrane, hybridized to a biochemically or radioactively labelled DNA probe complementary to the desired sequence, and visualized by autoradiography. The technique can therefore be used to locate and identify a DNA fragment containing a specific sequence. See Gene Expression.

Species: A group of individuals, which can successfully breed with each other to produce offspring that can breed with each other. There are evolutionary, biological, and recognition species concepts.

Speciation: It is now almost universally agreed that the prevailing process of speciation is geographical (allopatric) speciation. (There are also parapatric and sympatric speciation concepts.) According to biological species concept, however, species are defined as aggregation of populations that are reproductively isolated from one another.

Sperm competition: Competition not for access to females but for fertilization of egg. Equivalent to pollen tube competition in plants and a type of sexual selection. See Birkhead's book Promiscuity  (2000) for a detailed study of sperm competition in nature.

Spermatocyte: A male germ cell that undergoes meiosis and produces a haploid spermatid and subsequently a sperm.

Splicing: An event which takes place within the nucleus whereby introns are removed from the precursor mRNA and the exons are joined together as a post-transcriptional modification. See AASsites at HUSAR for analysis of SNP sites for their involvement in splicing (described in Faber, 2011). See Tang, 2013 for a review of alternative splicing and its relevance in disease.

Spore: In plants with alternation of generations, small reproductive bodies capable of giving rise to a new offspring either immediately or after a period of dormancy. It can be produced asexually or sexually. It usually germinates without fusing with another cell. Sexual spores of plants are haploid cells produced by meiosis.

Sporophyte: The diploid (asexual) spore-producing generation in plants with alternation of generations. A sporophyte is typically formed by the union of sexual cells produced by the gametophyte. In higher plants, the sporophyte is the conspicuous plant. In lower plants (such as mosses), the gametophyte is the dominant generation.

SSOP: Sequence-specific oligonucleotide probe. Together with PCR-SSP, commonly used to type classical MHC genes following amplification by PCR reaction (see also serological typing). See Genotyping with PCR by Dorak MT.

SSP: Sequence-specific primer (also known as allele-specific oligonucleotide or ASO). PCR-SSP is a common but not the only method to type classical MHC genes. See Genotyping with PCR by Dorak MT.

Stabilizing selection: Natural selection against extreme deviations from the average (like low and high birth weight).

Stop (termination) codon: Codons that signal the end of a growing polypeptide chain. These are UAA, UGA and UAG. They do not code for any amino acids and are usually shown as (*).

Supercoiled/supertwisted DNA: A closed circular DNA molecule in which the DNA molecule is further twisted on itself to form a more compact molecule. Double-stranded DNA helix (dsDNA) is mostly supercoiled, which may be under- or overwound. RNA polymerase works through this supercoiled DNA during transcription. Left-handed (negative) supercoiling leads to a loosening of the strands of the double helix (underwinding). Positive supercoiling is not seen in vivo.

Susceptibility gene: A gene whose variant is neither necessary nor sufficient to cause a disease, but increases the risk of developing it. These genetic variants are detected by association studies without any evidence for linkage with the disease nor causality. See Greenberg, 1993; Greenberg & Doneshka, 1996. (Weakly penetrant predisposition genes may act as a susceptibility gene.) See also predisposition gene.

Syngamy: The union of the (haploid) nuclei of two gametes following fertilization to form a single (diploid) nucleus for the zygote.

Syngeneic: Genetically identical (isogeneic) members of the same species like monozygotic twins.

Synonymous (silent) base change: A change in the nucleotide sequence that does not cause an amino acid change. Non-synonymous changes replace the amino acid and are called replacement changes.
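
Whether a codon change is synonymous can be checked directly against the standard genetic code. The sketch below builds the standard codon table and compares the amino acids encoded before and after the change; the function name is chosen for illustration, and only the standard (nuclear) code is assumed.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(ref_codon: str, alt_codon: str) -> str:
    """Return 'synonymous' if both codons encode the same amino acid
    (or both are stop codons), otherwise 'non-synonymous'."""
    ref = ref_codon.upper().replace("U", "T")
    alt = alt_codon.upper().replace("U", "T")
    return "synonymous" if CODON_TABLE[ref] == CODON_TABLE[alt] else "non-synonymous"


if __name__ == "__main__":
    print("GAA -> GAG:", classify_codon_change("GAA", "GAG"))  # Glu -> Glu
    print("GAG -> GTG:", classify_codon_change("GAG", "GTG"))  # Glu -> Val
```

Most synonymous changes fall at the third codon position, which reflects the degeneracy of the genetic code.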

Synteny: Refers to two genomes in which certain groups of linked (syntenic) genes are conserved in similar regional maps. Parts of the mouse chromosome 17 and human chromosome 6 are syntenic. Syntenic genes also mean two genes in close location so that they would cosegregate (in the same linkage group) during meiosis (see Genetic Misunderstandings).

Synthetic theory of evolution: Proposed to explain the transformation of a species by natural selection and for the splitting of a species into reproductively isolated subgroups.

Systematics: Classification of living things with regard to their evolutionary relationship. Link to a lecture on Access Excellence: Systematics.

Systems biology: A branch of biological science that aims to model complex biological systems by using computational algorithms and modeling. It integrates quite heterogeneous data like multi-omics data and analyses interactions among DNA, mRNA, proteins and metabolites. See Kirschner, 2005; Tavassoly, 2018; International Systems Biology.

TATA box (Goldberg-Hogness box): A short nucleotide sequence in the promoter 25 to 35 bp upstream to the transcription initiation (cap) site of eukaryotic genes to which RNA polymerase II binds. The consensus sequence is 5’-TATAA/TAA-3’. The TATA box binds the general transcription factor TFIID (see also CAAT box). See also Gene Expression.

Taxon: Any group of organisms to which any rank of taxonomic name (classification) is applied. Plural: taxa.

Taxonomic hierarchy: All taxa are classified within the following groups (starting from the most inclusive): kingdom, phylum ('division' in plants), class, order, family, genus, species, subspecies (race). See h2g2: Taxonomy.

T cells: A subgroup of lymphocytes characterized by having T-cell receptor (TCR) complex and CD3 surface marker. T cells are roughly subdivided into CD4+ helper T cells and CD8+ cytotoxic and suppressor T cells.

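Telocentric: A chromosome with a terminal centromere. Humans have no truly telocentric chromosomes; chromosome 21, sometimes given as an example, is acrocentric (see acrocentric chromosome).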

Telomerase: A reverse transcriptase (hTERT) containing an RNA molecule (hTR) that functions as the template for the tandem repeat at telomere. It synthesizes telomere to maintain its length after each cell division (even after Hayflick number is reached). It is active in embryonic cells and gametes, inactive in differentiated somatic cells, and reactivated in malignant cells. Telomerase can add one base at a time to the telomeric end of a chromosome. This maintenance work is required for cells to escape from replicative senescence. Telomerase activity is the most general molecular marker for identification of human cancer. See review by Chan & Blackburn (2004) on telomeres and telomerase.

Telomere: The end of eukaryotic chromosome consisting of tandemly repeated sequences. Chromosomes lose about 100 bp from telomere every time the cell divides. The enzyme telomerase can add the lost bases.

Testicular feminization (Androgen Insensitivity Syndrome): An X-linked trait that causes XY individuals to develop into phenotypic females. A mutation causes loss of sensitivity to testosterone (see Clinical Genetics).

Tetrapolar: Used for the mating types of Basidiomycete to describe four distinct ways of interactions between haploid mycelia. These fungi have two mating type loci and there are four degrees of matching: fully compatible at both loci, fully incompatible at both loci, semicompatible (compatible only at locus 1), and semicompatible (compatible only at locus 2). In Ascomycete, the mating type locus is biallelic and mating types are bipolar.

Theta (θ): The recombination fraction (in population genetics).

Three-prime (3') end: The end of a DNA or RNA strand with a free 3' hydroxyl group corresponding to the end of transcription (see also five-prime end).

Topoisomerase: A class of enzymes that convert DNA from one topological form to another. During DNA replication, they facilitate the untwisting of supertwisted DNA (see also gyrase and helicase).

Trans-acting gene: A gene acting on or co-operating with another gene on a different chromosome (see also cis-acting gene).

Transcription: As the first step in protein synthesis, transfer of genetic information from the DNA template to RNA molecule mediated by RNA polymerase (followed by translation). A transcription unit is a segment of DNA between the sites of initiation and termination of transcription. It may contain more than one gene.

Transcription factors: Proteins that are directly involved in regulation of transcription initiation by binding to the control elements and allowing RNA polymerase to act. There are ubiquitous transcription factors as well as cell and tissue-specific ones. Several families have been identified including helix-loop-helix proteins, helix-turn-helix proteins, leucine zipper proteins and zinc finger proteins. See also Gene Expression.

Transcription start site: The position in a gene where the RNA synthesis starts. The segment from this point downstream to the translation initiation site is called 5’ untranslated region (5’ UTR). See also Gene Expression.

Transcription unit: The region of DNA that extends between the promoter and the transcription termination site (see also transcription).

Transcriptome: The identity and expression level of all the genes expressed in a cell population at any given time. A full List of OMEs.

Transduction: Transfer of genes from one bacterium to another by means of a bacteriophage.

Transfection: Addition of foreign DNA into a eukaryotic cell by exposing it to naked DNA (i.e., not in a bacteriophage as in transduction). In bacterial genetics, it is also called transformation.

Transformation: In bacterial transformation, it means the transfer of genes from one bacterium to another in the forms of soluble fragments of DNA; in malignant transformation, it means conversion of normal animal cell state to unregulated growth.

Transgenic: An organism (animal or plant) that contains genes from another species. This is achieved by introducing the foreign gene into the germline.

Transition: A nucleotide substitution between two purine nucleotides (A and G) or between pyrimidines (C and T/U) in DNA or RNA. Transition-type substitution is more common than transversion.

Translation: The process of converting the RNA sequence into the linear sequence of amino acids in a protein product. Translation start site contains a codon (AUG) for methionine but not all proteins start with a methionine as most of the time it is cleaved off post-translationally. See also Gene Expression.
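
The decoding step can be sketched in Python using the standard genetic code. This is a deliberate simplification: it assumes translation starts at the first AUG in the sequence and stops at the first in-frame stop codon, and it ignores sequence context and post-translational processing such as removal of the initial methionine. The function name and example sequence are made up.

```python
from itertools import product

# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... GGG.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate from the first AUG to the first in-frame stop codon.

    Returns one-letter amino acid codes; an empty string if there is no AUG.
    """
    mrna = mrna.upper().replace("T", "U")  # accept DNA-style input as well
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # UAA, UAG or UGA
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("GGAUGGCUAAAUGGUAAGC"))  # AUG GCU AAA UGG UAA
```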

Translocation: Transfer of chromosomal material between chromosomes (usually reciprocal).

Transposase: An enzyme that catalyzes the insertion of a transposon.

Transposon: A long mobile DNA element that moves in the genome by a mechanism involving DNA synthesis and transposition.

Transspecies evolution: The favored type of evolution of the MHC allelic diversity. The age of an allele or an allelic lineage is greater than the species. Therefore, common allelic lineages have been inherited from a common ancestor and species-specific mutational diversification occurred within these lineages. For example, no single MHC class I allele is shared between humans and chimpanzees, but numerous similarities in lineage, polymorphic motifs and individual substitutions can be observed. One consequence is that certain alleles will be more similar to their correspondent alleles in another species than to the other alleles of the same locus in the same species. The long-term persistence of families of MHC alleles whose origins predate speciation events is called 'transspecies evolution'.

Transversion: A mutation caused by the substitution of a purine (A and G) for a pyrimidine (C and T/U) or vice versa in DNA or RNA (see also transition).
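
The distinction between transitions and transversions depends only on whether the two bases belong to the same chemical class, which the following minimal Python sketch makes explicit (the function name is chosen for illustration).

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T", "U"}


def classify_substitution(ref: str, alt: str) -> str:
    """Classify a single-base substitution as 'transition' or 'transversion'."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("bases must be one of A, C, G, T or U")
    if ref == alt:
        raise ValueError("reference and alternative bases are identical")
    # Same class (purine <-> purine or pyrimidine <-> pyrimidine) is a transition.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


if __name__ == "__main__":
    for ref, alt in [("A", "G"), ("C", "T"), ("A", "C"), ("G", "T")]:
        print(f"{ref} -> {alt}: {classify_substitution(ref, alt)}")
```

Of the twelve possible substitutions between the four DNA bases, four are transitions and eight are transversions, yet transitions are observed more often.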

Triplet repeat: In this situation, a triplet of nucleotides increases in number within a gene. An increased number of triplet repeats is a mutation occurring especially in central nervous system disorders. Examples include myotonic dystrophy, Huntington disease, Friedreich ataxia and fragile X syndrome. Also in polycystic ovary syndrome, the androgen receptor gene has an increased number of CAG repeats (Hickey, 2002). Expansion may be greater depending on the transmitting parent (e.g., the mother in myotonic dystrophy, the father in Huntington disease); thus, a parent-of-origin effect and genetic anticipation can be observed. An increased number of repeats of a triplet may trigger methylation of the gene that causes the disease (see Mitas M, 1997 for a review). See also Clinical Genetics.

Trisomy: The presence of three copies of a specific chromosome. The most common one is trisomy 21 as the smallest chromosome is 21. Trisomy 18 is also viable. Trisomy 8 can also be seen but only in mosaic form. Trisomies are the most common chromosome abnormalities detected in spontaneous abortions. See also Genes & Chromosomes.

Underdominance: Also called heterozygote disadvantage / heterozygote inferiority / homozygote advantage. This unusual selection process occurs when heterozygotes are less fit than either homozygote. This situation is likely to arise when two adjacent populations are isolated and become homozygous for different alleles, and then come into secondary contact at the borders of their ranges. This is the opposite of overdominance.

Uniparental disomy: Inheritance of both homologues of a chromosome from one parent, with loss of the corresponding homologue from the other parent. Hydatidiform mole is a uniparental disomy disorder.

3' untranslated region (3' UTR): 3'-untranslated regions of genes are the sequences that follow the translation stop codon in the mature mRNA. They are transcribed but not translated. They often contain key regulatory elements involved in posttranscriptional regulation of expression of the message by mediating the stability of mRNA. A high degree of evolutionary conservation in regions of the 3'-UTR suggests the presence of important elements. However, 3' UTRs are subject to weaker purifying selection than coding and promoter sequences and have consequently been found to be less highly conserved in comparisons of orthologous gene regions (Makalowski, 1996). Many of the identified cis-acting elements for translational regulation occur within the 3' UTR, and some occur with regularity within certain protein function classes (a review by Pesole, 2001). For a review of 3' UTR variants and genetic predisposition to disease, see Chen, 2006.

5' untranslated region (5' UTR): The short sequence between the transcription initiation site and the start of translation that is retained in mRNA but not translated. It contains the ribosomal binding site (leader sequence) and signal sequence. It is the beginning of exon 1. The 5’ UTRs of most mRNAs contain a consensus sequence of 5’-CCAGCCAUG-3’ involved in the initiation of protein synthesis. Although untranslated, this region may influence the mRNA secondary structure and stability, efficacy of translation initiation, or binding of sequence-specific mRNA-binding proteins (a review by Pesole et al, 2001).

Upstream: Sequences located to the opposite direction to transcription (which runs from 5' to 3' on the sense strand of the DNA). A nucleotide 25 bp upstream to the first transcribed nucleotide is at position -25.

Variable expression: A variation in phenotype between affected members of the same family (i.e. individuals carrying identical mutations). It occurs in many dominant conditions and may be associated with reduced penetrance (see also penetrance).

Variant: Because of the ambiguity in the definitions of mutation and polymorphism, any genetic change is called a sequence variation and such alleles are called variants (see Nomenclature for the Description of Sequence Variations and Cotton, 2001). For mouse genetics nomenclature, see the guideline and tutorial at the Jackson Laboratory.

Vector: A plasmid, phage, or cosmid into which foreign DNA may be inserted for cloning.

Vertebrates: A subphylum in the Phylum Chordata of the Kingdom Animalia. All members have a notochord and a cranium (skull). Includes the Classes: Fishes, Amphibians, Reptiles, Birds, and Mammals (monotremes, marsupials, placentals). Link to the vertebrates page in the tree of life.

Viroid: A disease-causing agent consisting of only a single-stranded, short (270 to 380 nucleotides long) RNA molecule.

Virus: An entity that is capable of reproducing only by infecting a bacterial or eukaryotic cell. Viruses are incapable of autonomous replication and have to use a host cell's translational system. They consist of a nucleic acid molecule and protein coat. The genetic material of a virus may be DNA or RNA. In some RNA viruses, the RNA genome is first converted to DNA by the reverse transcriptase enzyme encoded by the viral nucleic acid. These viruses are called retroviruses.

Wahlund effect: The finding of excess homozygosity (or heterozygote deficiency) in a large population sample consisting of several subpopulations with different allele frequencies. It is due to differences in gene frequencies in the subpopulations and is purely a mathematical complication. The opposite of Wahlund effect is isolate breaking. Link to lectures on Wahlund effect (1)  &  (2) and a simulation. See also Population Genetics.
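
The arithmetic behind the Wahlund effect can be shown with a small Python sketch. It assumes a biallelic locus and that each subpopulation is itself in Hardy-Weinberg equilibrium; the allele frequencies and the function name are invented for illustration.

```python
from typing import List, Optional, Tuple


def wahlund_heterozygosity(
    allele_freqs: List[float], weights: Optional[List[float]] = None
) -> Tuple[float, float, float]:
    """Compare heterozygosity in a pooled sample with the Hardy-Weinberg expectation.

    allele_freqs: frequency p of one allele in each subpopulation.
    weights: relative subpopulation sizes (equal sizes if omitted).
    Returns (actual pooled heterozygosity, expected from pooled p, deficit).
    """
    n = len(allele_freqs)
    if weights is None:
        weights = [1.0 / n] * n
    total = sum(weights)
    weights = [w / total for w in weights]  # normalise to sum to 1

    pooled_p = sum(w * p for w, p in zip(weights, allele_freqs))
    # Each subpopulation is assumed to be in Hardy-Weinberg equilibrium: H_i = 2pq.
    actual_het = sum(w * 2.0 * p * (1.0 - p) for w, p in zip(weights, allele_freqs))
    expected_het = 2.0 * pooled_p * (1.0 - pooled_p)
    return actual_het, expected_het, expected_het - actual_het


if __name__ == "__main__":
    actual, expected, deficit = wahlund_heterozygosity([0.2, 0.8])
    print(f"actual H = {actual:.2f}, expected H = {expected:.2f}, deficit = {deficit:.2f}")
```

The deficit equals twice the variance of the allele frequency among subpopulations, so it disappears when all subpopulations share the same allele frequency.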

Western blotting: A technique in which protein fragments are size separated by electrophoresis, transferred to a membrane, and visualized by using labelled antibodies and chemiluminescence or fluorescence. See Gene Expression and Overview of Western Blotting.

Whole genome amplification (WGA): Representational amplification of total genomic DNA to increase the quantity and quality of DNA for further studies. WGA improves amplification success with degraded DNA (Holbrook, 2005; Ballantyne, 2006). The reliability, robustness and accuracy of WGA methods in general have been shown in genotyping of highly polymorphic loci such as HLA (Gillespie, 2000; Shao, 2004) and in SNP genotyping and sequencing (Dean, 2002; Lovmar, 2003; Hosono, 2003; Alsmadi, 2003; Tranah, 2003; Shao, 2004; Yan, 2004; Bannai, 2004; Paez, 2004; Barker, 2004; Holbrook, 2005; Thompson, 2005), and WGA is particularly useful in molecular (childhood cancer) epidemiology studies (Zheng, 2001; Yan, 2004). STR genotyping may require a little more attention (Dickson, 2005; Ballantyne, 2006). As long as a minimum of 10 nanograms of genomic DNA is used in WGA, SNP genotyping can be accurately performed on whole genome amplified DNA (Lovmar, 2003; Bergen, 2005a; Bergen, 2005b; Holbrook, 2005) with the possible exception of loci near the ends of chromosomes (Tzvetkov, 2005). Commercially available WGA kits include GenomePlex (OmniPlex PCR-based WGA), REPLI-g (multiple displacement amplification) and GenomiPhi (multiple displacement amplification).

Whole transcriptome amplification (WTA): Representational amplification of total transcriptome. See SIGMA: Transplex whole transcriptome amplification (WTA) kit (manual) and QIAGEN: QuantiTect Whole Transcriptome Kit.

Wild type: The customary phenotype or standard for comparison. Deviants from this type are said to be mutant.

Wobble hypothesis: Hypothesis to explain how one tRNA can recognize two or more codons that differ at the third nucleotide. The first (5') base of the anticodon, which pairs with the third (3') base of the codon, can pair with more than one base. This is consistent with the degeneracy of the genetic code, in which more than one triplet codes for some amino acids.
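The pairing flexibility in the wobble hypothesis can be made concrete with a small lookup table. The sketch below is not from the original glossary. It assumes Crick's standard wobble rules, in which the 5' base of the anticodon pairs with the 3' (third) base of the codon, and it allows inosine (I) only at that wobble position. The names WOBBLE, WATSON_CRICK and codons_read are hypothetical.

```python
# Crick's wobble rules: 5' anticodon base -> codon third bases it can pair with
WOBBLE = {
    "C": {"G"},
    "A": {"U"},
    "U": {"A", "G"},
    "G": {"C", "U"},
    "I": {"U", "C", "A"},  # inosine, a modified base found in some tRNAs
}

# Strict Watson-Crick pairing used at the other two positions
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}


def codons_read(anticodon):
    """Return the codons (5'->3') that an anticodon (5'->3') can read.

    Codon and anticodon pair antiparallel, so the anticodon's 3' base
    pairs with the codon's first base and the anticodon's 5' base
    (the wobble position) pairs with the codon's third base.
    """
    wobble_base, middle_base, last_base = anticodon
    codon_first = WATSON_CRICK[last_base]
    codon_second = WATSON_CRICK[middle_base]
    return sorted(codon_first + codon_second + third for third in WOBBLE[wobble_base])


if __name__ == "__main__":
    # Anticodon 5'-GAA-3' reads both phenylalanine codons
    print(codons_read("GAA"))
    # Anticodon 5'-IGC-3' reads three of the alanine codons
    print(codons_read("IGC"))
```

An anticodon starting with C or A reads a single codon, while one starting with U, G or I reads two or three, which is why fewer tRNAs than codons are needed.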

Wright-Fisher model: The most widely used population genetics model for reproduction. It assumes a finite and constant population size (N), non-overlapping generations and random mating. One of the results is that if a new neutral allele appears in a diploid population, its fixation probability equals its initial frequency (1/2N). See a Lecture Note on Wright-Fisher Reproduction and a Presentation on Wright-Fisher Model.
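The fixation result for the Wright-Fisher model can be checked by simulation. The sketch below is not from the original glossary. It assumes a neutral allele in a diploid population of N individuals (2N gene copies), with each generation formed by binomial sampling from the previous one. The function name simulate_fixation and its parameters are hypothetical.

```python
import random


def simulate_fixation(n_diploid, trials, seed=0):
    """Estimate the fixation probability of a single new neutral allele.

    n_diploid: number of diploid individuals N (so 2N gene copies)
    trials: number of independent replicate populations to simulate
    seed: seed for the random number generator, for repeatable runs
    """
    rng = random.Random(seed)
    total_copies = 2 * n_diploid
    fixed = 0
    for _ in range(trials):
        count = 1  # the new allele starts as one copy among 2N
        # Continue until the allele is lost (0 copies) or fixed (2N copies)
        while 0 < count < total_copies:
            freq = count / total_copies
            # Non-overlapping generations: every gene copy in the next
            # generation is drawn independently from the current one
            count = sum(1 for _ in range(total_copies) if rng.random() < freq)
        if count == total_copies:
            fixed += 1
    return fixed / trials


if __name__ == "__main__":
    n = 10
    estimate = simulate_fixation(n_diploid=n, trials=2000)
    print(f"simulated fixation probability: {estimate:.3f}")
    print(f"expected 1/(2N):                {1 / (2 * n):.3f}")
```

The estimate should approach 1/(2N) as the number of trials grows; with selection or a changing population size this simple expectation no longer holds.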

Wright's coefficient of relationship (RC): The degree to which inbreeding has occurred can be mathematically calculated from pedigree analysis. Wright's coefficient of relationship (RC) is the probability that homologous alleles present in different individuals are identical by descent.

Wright's F-statistics: See F-statistics.

W, Z chromosomes: Sex chromosomes in species (like snakes, birds, moths) where the female is the heterogametic sex (WZ).

X-chromosome: One of the sex chromosomes in humans. Females have two copies and males have one copy, which is invariably maternal in origin. See Map Viewer: X-chromosome.

Xenolog: Xenologs are homologs resulting from the horizontal transfer of a gene between two organisms.

Xenopus: An amphibian (frog) that shared a common ancestor with mammals about 350 million years ago. The oldest species in which all three regions of the MHC are linked. Its eggs are very large and have front-to-back orientation even before they are fertilized. See Xenopus website.

XIST: X (inactive)-specific transcript (Gene ID 7513; maps to Xq13.2), the initiator of X-chromosome inactivation. X inactivation is regulated by several factors, including a region of chromosome X called the X inactivation center (XIC). The XIST gene is expressed exclusively from the XIC of the inactive X chromosome. XIC in the human refers to a region on the X chromosome; XIST refers to a specific gene in that region which is necessary for X inactivation but is not sufficient alone. The XIST transcript is spliced but apparently does not encode a protein; it remains in the nucleus where it coats the inactive X chromosome. Mutations in the XIST promoter cause familial skewed X inactivation (OMIM 314670 & OMIM 300087).

Y-chromosome: The male-specific sex chromosome in humans, which is much smaller than the X-chromosome. About 95% of the Y-chromosome is not involved in recombination with the X-chromosome and is called the male-specific or nonrecombining Y-chromosome. For Y-chromosome polymorphisms, see ISOGG: Y-DNA SNP Index; Wellcome Trust The Human Genome: Y-chromosome-Quick Facts.

Y-chromosome haplogroups: The tips of the Y-chromosome can recombine with corresponding parts of the X-chromosome (pseudoautosomal regions), but around 95% of the Y-chromosome is not involved in any recombination. This non-recombining (male-specific) region is therefore passed intact from one generation to the next as frozen blocks. These haplotypes are the Y-chromosome haplogroups (lineages) that are used in phylogenetic studies.

Yeast: The genus Saccharomyces of the unicellular fungi. See Yeast WiKi.

Yeast artificial chromosomes (YAC): An artificial chromosome created from DNA, centromere and telomere of yeast chromosomes. Heavily used in cloning of very large genomic fragments.

Zebrafish: A model organism (Danio rerio) used to study vertebrate biology, physiology and human disease. Its high fecundity and short generation time make it useful for genetic studies as well. Another useful feature is that its fry are transparent. Hundreds of mutants resembling human diseases have been identified. See Zebrafish Information Network and Zebrafish Information Server website. See also Fugu.

Zinc finger protein: A protein containing a DNA-binding domain that has a characteristic pattern of cysteine and histidine residues that complex with zinc ions. This motif occurs in several types of eukaryotic transcription factors. See Motif Structures of Transcription Factors.

Zygote: A cell formed by the fusion of sperm and egg. It develops to become first an embryo and then a fetus.

 


 

Address for bookmarking: http://www.dorak.info/genetics/glosgen.html

 

Compiled by   Mehmet Tevfik Dorak, MD, PhD

 

Last updated on 28 August 2022

 
