The complete human genome

Source: Kabir Firaque, The Indian Express

Nearly two decades ago, when scientists published the first map of the human genome, it was hailed as a breakthrough. That map was incomplete, however: about 8% of human DNA was left unsequenced. Now, in a series of papers published in Science, a large team has accounted for that 8%, completing the picture of the human genome for the first time.

Why it matters

A complete human genome makes it easier to study genetic variation between individuals or between populations. A genome is all of the genetic material in an organism. The human genome is mostly the same in all people, but a very small part of the DNA varies from one individual to another. With a complete human genome as a reference, scientists can compare the genomes of individuals against it, which would help them understand which variations, if any, might be responsible for disease.
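To make the idea of a reference concrete, here is a minimal sketch in Python of comparing one individual's sequence against a reference and listing the positions where they differ. The function name `find_variants` and both sequences are invented for illustration, and the sketch assumes the two sequences are already aligned and equal in length; real variant calling must first align sequencing reads to the reference and is far more involved.

```python
def find_variants(reference, sample):
    """Return (position, reference_base, sample_base) for each mismatch.

    Assumption: both sequences are already aligned and the same length.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        (pos, ref_base, sample_base)
        for pos, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]


# Toy sequences, not real genomic data.
reference = "ACGTTGCA"
sample = "ACGTAGCA"
print(find_variants(reference, sample))  # [(4, 'T', 'A')]
```

Positions here are counted from zero. A region that is missing from the reference cannot be compared this way at all, which is one reason a complete reference matters.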

What was missing?

The genetic sequence made available in 2003 by the Human Genome Project, an international collaboration that ran from 1990 to 2003, contained information from a region of the human genome known as the euchromatin. Here, the chromosome is rich in genes, and the DNA codes for protein.

The 8% that was left out was in the area called heterochromatin. This is a smaller portion of the genome, and does not produce protein.

There were at least two key reasons why heterochromatin was given lower priority. This part of the genome was thought to be “junk DNA”, because it had no clear function. Besides, the euchromatin contained more genes and was simpler to sequence with the tools available at the time.

Now, the fully sequenced genome is the result of the efforts of a global collaboration called the Telomere-to-Telomere (T2T) consortium. New methods of DNA sequencing and computational analysis made it possible to read the remaining 8% of the genome.

What’s in the 8%?

The new reference genome, called T2T-CHM13, includes highly repetitive DNA sequences found in and around the telomeres (structures at the ends of chromosomes) and the centromeres (at the middle section of each chromosome). The new sequence also reveals long stretches of DNA that are duplicated in the genome and are known to play important roles in evolution and disease.

The fact that the sequences are repetitive is enlightening, scientists said. The findings have revealed a large number of genetic variations, and these variations appear in large part within these repeated sequences.

“A significant amount of human genetic material turns out to be long, repetitive sections that occur over and over. Although every human has some repeats, not everyone has the same number of them. And the difference in the number of repeats is where most of human genetic variation is found,” the University of Connecticut said in a press release.
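The point about differing repeat counts can be illustrated with a short Python sketch that counts how many back-to-back copies of a short motif appear in a sequence. The function name `max_tandem_copies`, the motif, and both sequences are made up for illustration; the repeats in real genomes are much longer and rarely this clean.

```python
def max_tandem_copies(sequence, motif):
    """Return the largest number of consecutive copies of motif in sequence."""
    size = len(motif)
    if size == 0:
        raise ValueError("motif must not be empty")
    best = 0
    for start in range(len(sequence) - size + 1):
        copies = 0
        pos = start
        # Step through the sequence one motif-length at a time.
        while sequence[pos:pos + size] == motif:
            copies += 1
            pos += size
        best = max(best, copies)
    return best


# Two hypothetical individuals carrying the same repeat, with different copy numbers.
person_a = "GG" + "CAG" * 3 + "TT"
person_b = "GG" + "CAG" * 5 + "TT"
print(max_tandem_copies(person_a, "CAG"))  # 3
print(max_tandem_copies(person_b, "CAG"))  # 5
```

Both toy sequences contain the same repeat, but one carries three copies and the other five, which is the kind of difference the quote describes.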

Many of the newly revealed regions have important functions in the genome even if they do not include active genes.

What next?

The T2T consortium used the now-complete genome sequence as a reference to discover more than 2 million additional variants in the human genome. These studies provide more accurate information about the genomic variants within 622 medically relevant genes, the US National Institutes of Health (NIH) said.

The complete sequence will be valuable for studies that aim to establish comprehensive views of human genomic variation. Many research groups have already started using a pre-release version of the complete human genome sequence for their research, the NIH said.

The new T2T reference genome will complement the standard human reference genome, known as Genome Reference Consortium build 38 (GRCh38), which originated from the Human Genome Project and has been updated since.