The sequence of the human genome
Celera Genomics and the International Human Genome Sequencing Consortium published the first drafts of the human genome in 2001, revolutionizing genomics. These drafts, and subsequent updates, covered the euchromatic portion of the genome. However, the heterochromatin and many other complex regions were left unfinished or assembled incorrectly. Addressing this remaining 8% of the genome, the Telomere-to-Telomere (T2T) Consortium has completed the first 3.055 billion base pair (bp) sequence of a human genome. This represents the largest improvement to the human reference genome since its initial release. The new T2T reference genome contains gapless assemblies of all 22 autosomes plus Chromosome X. It corrects many errors and introduces 200 million bp of novel sequence containing 2,226 paralogous gene copies, 115 of which are predicted to be protein coding. The newly completed regions include the centromeric arrays and the short arms of all five acrocentric chromosomes, opening these complex regions to variational and functional studies.
The Genome Reference Consortium (GRC) released the latest major update of the human reference genome in 2013, and the most recent patch was issued in 2019 (GRCh38.p13). This assembly traces its origin to the publicly funded Human Genome Project and has been continuously improved over the last 20 years. Unlike the Celera assembly and most modern genome projects, which are based on shotgun sequence assembly, the GRC human assembly is based primarily on Sanger sequence data from bacterial artificial chromosome (BAC) clones. These clones were ordered and oriented along the genome using radiation hybrid, genetic linkage, and fingerprint maps. This laborious process produced one of the most accurate and continuous reference genomes available today. However, reliance on these technologies restricted the assembly to the euchromatic regions of the genome that could be reliably cloned into BACs and mapped.