The Future of Genomic Research: Embracing Diversity with Pangenomes
The world of genomics is rapidly evolving, and with it, the need to broaden our genetic baseline. As Neil Ward argues, our reliance on Eurocentric datasets is holding back research and limiting our understanding of human health and disease. It's time to embrace diversity and unlock the full potential of genomic medicine.
The Diversity Gap
Genomics has revolutionized our understanding of human health, but our data is not representative of the global population. Over 90% of publicly available genome-wide association studies (GWAS) data comes from European participants, despite Europeans making up less than a fifth of the world's population. This imbalance has far-reaching consequences, from skewed drug development pipelines to missed diagnoses in communities whose genetic profiles are not represented in datasets.
The impact of this diversity gap is already evident. Genetic variants that are common in under-represented populations, such as people of African, Arab, or South Asian ancestry, often go undetected or are misclassified because they do not appear in the reference datasets researchers rely on. The result can be less effective treatments, and entire populations missing out on the benefits of genomic innovation.
The Power of Pangenomes
To address this issue, researchers are turning to pangenomes. A traditional single reference genome is derived largely from a small number of individuals. A pangenome, by contrast, is built from the genomes of many individuals. It captures both the core sequence shared across humanity and the variable sequence, including genes and structural variants, found only in some individuals or populations. This gives a more comprehensive and accurate picture of genetic diversity, especially for communities historically left out of genomic research.
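The core/variable distinction can be made concrete with a toy model. The sketch below is a simplification. It treats each individual's genome as a set of labelled sequence segments and makes no attempt at assembly or graph construction. The individuals, segment names and helper functions are all hypothetical. Core segments are those carried by every individual. Variable segments are carried by some individuals but not all. It also shows what a single reference drawn from one person would miss.

```python
# Toy model only: each (hypothetical) individual is reduced to the set of
# sequence segments present in their genome. Real pangenomes are built from
# assembled haplotypes, not hand-written sets like these.
individuals = {
    "ind_A": {"seg1", "seg2", "seg3", "seg5"},
    "ind_B": {"seg1", "seg2", "seg3", "seg6"},
    "ind_C": {"seg1", "seg2", "seg3", "seg5", "seg7"},
}


def partition_pangenome(genomes):
    """Split all observed segments into core (in every genome) and
    variable (in at least one genome, but not all)."""
    if not genomes:
        return set(), set()
    segment_sets = list(genomes.values())
    pangenome = set().union(*segment_sets)   # everything seen in anyone
    core = set.intersection(*segment_sets)   # shared by all individuals
    variable = pangenome - core              # carried by only some
    return core, variable


def carriers(genomes, segment):
    """Return the sorted names of individuals that carry a given segment."""
    return sorted(name for name, segs in genomes.items() if segment in segs)


def missing_from_reference(genomes, reference_name):
    """Segments seen in the group but absent from a single-individual
    'reference', i.e. what a linear reference built from one person misses."""
    pangenome = set().union(*genomes.values())
    return pangenome - genomes[reference_name]


core, variable = partition_pangenome(individuals)
print("core:", sorted(core))
print("variable:", sorted(variable))
for seg in sorted(variable):
    print(seg, "carried by", carriers(individuals, seg))
print("missed by an ind_A-only reference:",
      sorted(missing_from_reference(individuals, "ind_A")))
```

In this toy example, a reference built only from ind_A has no record of seg6 or seg7. This is the same kind of blind spot, on a tiny scale, that a population-diverse pangenome is meant to close.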
Overcoming Technical Challenges
Building a complete human reference genome has never been easy. Until recently, the most widely used reference genome, GRCh38, was only about 92% complete. The missing 8% consisted of complex, highly repetitive regions of human DNA, often called 'dark' regions, which earlier sequencing technologies struggled to decipher. Long-read sequencing changed this. Because it captures longer, continuous stretches of DNA, scientists could accurately reconstruct these difficult regions, and in 2022 they published the first complete sequence of a human genome.
The Arab Pangenome Example
The Arab human pangenome, recently published in Nature Communications, highlights the importance of population-specific genomic references. Arab populations make up nearly 6% of the global population, yet they have been largely absent from genomic research. Using long-read sequencing, the study generated a haplotype-resolved pangenome from 53 individuals across eight countries in the Middle East and North Africa, reflecting a broad cross-section of Arab ancestries.
The researchers uncovered over 111 million base pairs of DNA sequence missing from existing reference genomes, including 235,000 structural variants unique to Arab individuals. They also identified 883 duplicated genes present in every individual studied, potentially linked to recessive disease. Data at this level of detail bring researchers closer to distinguishing benign from pathogenic variants, paving the way for improved diagnostic outcomes and better-informed care for patients of Arab ancestry.
The Need for Collaboration
The Arab pangenome study is a testament to the power of inclusive pangenomes. However, building pangenomes for every population cannot happen in isolation. It requires sustained investment, international research partnerships, and a commitment to genomic equity. Governments, funders, and industry must work together to ensure that no group is left behind. Only then can genomic medicine truly deliver on its promise, benefiting everyone, everywhere.
References:
- https://gwasdiversitymonitor.com/
- https://genebites.org/2025/02/17/diversity-in-genetic-data-where-is-everyone/
- https://www.frontiersin.org/journals/genetics/articles/10.3389/fgene.2021.660428/full
- https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_000001405.26/
- https://www.nih.gov/news-events/nih-research-matters/first-complete-sequence-human-genome
- https://www.science.org/doi/10.1126/science.abj6987
- https://www.nature.com/articles/s41467-025-61645-w