By decoding the complete genetic code, or genome, of more than 1,000 people whose homelands stretch from Africa and Asia to Europe and the Americas, scientists have compiled the largest and most detailed catalog yet of human genetic variation. This massive resource will help medical researchers find the genetic roots of rare and common diseases in populations worldwide.
“With this resource, researchers have a roadmap to search for the genetic origins of diseases in populations around the globe,” says Elaine Mardis, PhD, one of the study’s co-principal investigators and co-director of The Genome Institute at Washington University. “We estimate that each person carries up to several hundred rare DNA variants that could potentially contribute to disease. Now, scientists can investigate how detrimental particular rare variants are in different ethnic groups.”
At the genetic level, any two people are more than 99 percent alike. But rare variants—those that occur with a frequency of 1 percent or less in a population—are thought to contribute to rare diseases as well as common conditions like cancer, heart disease and diabetes. Rare variants may also explain why some medications are not effective in certain people or cause adverse side effects.
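The 1 percent cutoff above can be illustrated with a small calculation. The following is only a minimal sketch: the variant names and counts are hypothetical, and it assumes each of 1,000 participants contributes two copies of a given DNA position. It is not the project's actual analysis pipeline.

```python
# Illustrative sketch: flag variants whose frequency is 1 percent or less.
# All names and counts below are hypothetical, not data from the study.

RARE_THRESHOLD = 0.01  # "1 percent or less", as defined in the article


def allele_frequency(variant_copies: int, total_copies: int) -> float:
    """Fraction of sampled chromosome copies that carry the variant."""
    if total_copies <= 0:
        raise ValueError("total_copies must be positive")
    return variant_copies / total_copies


def is_rare(frequency: float, threshold: float = RARE_THRESHOLD) -> bool:
    """A variant counts as rare when its frequency is at or below the threshold."""
    return frequency <= threshold


# Assumption: 1,000 people, two copies of each position per person.
total_copies = 2 * 1000

# Hypothetical number of copies carrying each variant.
observed_copies = {"variant_a": 12, "variant_b": 340, "variant_c": 18}

rare_variants = sorted(
    name
    for name, copies in observed_copies.items()
    if is_rare(allele_frequency(copies, total_copies))
)
print(rare_variants)
```

In this made-up sample, the two variants seen in fewer than 1 percent of copies are flagged as rare, while the one seen in 17 percent of copies is treated as common.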
Identifying rare variants across different populations is a major goal of the project. During the pilot phase of the effort, researchers found that most of the rare variants differed from one population to another. These variants developed recently in human evolutionary history, after populations in Europe, Africa, Asia and the Americas diverged from a single group. The current study supports this initial work.
“This information is crucial and will improve our interpretation of individual genomes,” says another of the study’s co-principal investigators, Richard Wilson, PhD, director of The Genome Institute and a pioneer in cancer genome sequencing. “Now, if we want to study cancer in Mexican Americans or Japanese Americans, for example, we can do so in the context of their diverse geographic or ancestry-based genetic backgrounds.”
All study participants submitted anonymous DNA samples and agreed to have their genetic data included in an online database. To catalog the variants, the researchers first sequenced the entire genome—all the DNA—of each individual in the study multiple times. The process yields the precise order of DNA’s molecular building blocks, called nucleotides. Surveying the genome in this way finds common DNA changes but misses many rare variants.
Then, to find rare variants, researchers repeatedly sequenced the small portion of the genome that contains genes, about 80 times for each participant, to ensure accuracy. They looked closely for changes in the DNA sequence involving a single nucleotide, called SNPs (for single-nucleotide polymorphisms).
Using tools developed to analyze and integrate the data, researchers discovered a total of 38 million SNPs. They also found more than one million structural variations—sections of extra or missing DNA.
SNPs and structural variants can help explain an individual’s susceptibility to disease, response to drugs or reaction to environmental factors such as air pollution or stress. Other studies have found an association between structural variants and diseases such as autism and schizophrenia.
The 1000 Genomes Project has generated massive amounts of genomic data. Simply recording the raw information took up some 180 terabytes of hard-drive space, enough to fill more than 40,000 DVDs. All of the information is freely available on the Internet through public databases.
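The DVD comparison can be checked with rough arithmetic. The sketch below rests on assumptions the article does not state: a single-layer DVD holding 4.7 billion bytes, and a terabyte counted either in binary (2^40 bytes) or decimal (10^12 bytes) units. The result shifts depending on which convention is used.

```python
# Rough check of the "180 terabytes, more than 40,000 DVDs" comparison.
# Assumptions (not from the article): single-layer DVD capacity and
# the two common definitions of a terabyte.

DVD_BYTES = 4.7e9            # assumed single-layer DVD capacity
BINARY_TERABYTE = 2 ** 40    # 1,099,511,627,776 bytes
DECIMAL_TERABYTE = 10 ** 12  # 1,000,000,000,000 bytes

raw_terabytes = 180  # figure quoted in the article

dvds_binary = raw_terabytes * BINARY_TERABYTE / DVD_BYTES
dvds_decimal = raw_terabytes * DECIMAL_TERABYTE / DVD_BYTES

print(f"Binary terabytes:  about {dvds_binary:,.0f} DVDs")
print(f"Decimal terabytes: about {dvds_decimal:,.0f} DVDs")
```

Under the binary convention the total comes to roughly 42,000 discs, in line with the article's figure; under the decimal convention it is closer to 38,000.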
“This tremendous resource builds on the knowledge of the Human Genome Project,” says co-author George Weinstock, PhD, associate director of The Genome Institute. “Scientists and, ultimately, patients worldwide will benefit from the extensive effort to understand the shared features and geographic diversity of the human genome.”
The 1000 Genomes Project involved some 200 scientists at Washington University and other institutions. Results detailing the DNA variations of individuals from 14 ethnic groups were published Oct. 31, 2012, in the journal Nature. Eventually, the initiative will involve 2,500 individuals from 26 populations.
In addition to The Genome Institute at Washington University, the project included these research centers: the Human Genome Sequencing Center at the Baylor College of Medicine, Houston; The Broad Institute of Massachusetts Institute of Technology and Harvard University in Cambridge, Mass.; the Wellcome Trust Sanger Institute in England; BGI Shenzhen in China; the Max Planck Institute for Molecular Genetics in Berlin; and Illumina Inc. in San Diego.
Q&A: The ABCs of DNA
What is DNA?
Deoxyribonucleic acid, or DNA, is the genetic blueprint for life. DNA carries the instructions for an organism—be it a flower, a dog or a person—to develop, survive and reproduce. DNA is passed down from parents to their offspring, and it is what makes each of us unique. Most DNA is located in the nucleus (or brain) of the cell, where it is packed tightly into 23 pairs of chromosomes. If unwound and tied together, the DNA in just one cell would stretch 6 feet.
What are genes?
Genes are the stretches of DNA that code for proteins, the workhorses of cells. Humans have about 20,000 genes, and together they make up only 1 to 2 percent of a person’s DNA. The rest of the DNA is thought to influence the activity of the genes.
What is a genome, and why is it studied?
A genome is the complete DNA sequence of an organism. In humans, that sequence is made up of 3 billion chemical units represented by the letters A, T, G and C. Spelling out the entire DNA sequence of a person would fill an estimated 200 New York City phone books. At the genetic level, any two people are more than 99 percent alike. By studying the genome, scientists can identify variations in the DNA sequence that may contribute to good health or increase the risk of disease.
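The figures in this Q&A can be combined into a few back-of-the-envelope numbers. The sketch below uses only the rounded values quoted above (3 billion letters, 200 phone books, genes as 1 to 2 percent of DNA, any two people more than 99 percent alike), so the results are approximations rather than measurements.

```python
# Back-of-the-envelope arithmetic using the rounded figures quoted in the Q&A.

GENOME_LETTERS = 3_000_000_000  # A, T, G and C units in the human genome
PHONE_BOOKS = 200               # estimated phone books needed to spell it out

# Letters that each phone book would need to hold.
letters_per_book = GENOME_LETTERS // PHONE_BOOKS

# Genes make up about 1 to 2 percent of a person's DNA.
gene_letters_low = GENOME_LETTERS * 1 // 100
gene_letters_high = GENOME_LETTERS * 2 // 100

# "More than 99 percent alike" means fewer than 1 percent of letters differ,
# so this is an upper bound on differences between any two people.
max_differing_letters = GENOME_LETTERS * 1 // 100

print(f"Letters per phone book: {letters_per_book:,}")
print(f"Letters within genes: {gene_letters_low:,} to {gene_letters_high:,}")
print(f"Differences between two people: fewer than {max_differing_letters:,}")
```

Even a difference of well under 1 percent still leaves room for millions of positions where two genomes can vary, which is why cataloging those variations takes so much sequencing.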