For the first time, large-scale DNA sequence data on three UK long-term birth cohorts has been released, creating a unique resource to explore the relationship between genetic and environmental factors in child health and development.
The first resource containing high-resolution DNA sequencing data for over 37,000 children and parents collected over multiple decades from across the UK is now available to researchers worldwide.
The data release is led by the Wellcome Sanger Institute, the Children of the 90s study (also known as ALSPAC), the Millennium Cohort Study (MCS), and Born in Bradford (BiB) [1], and supported by the Medical Research Council (MRC) and the Economic and Social Research Council (ESRC).
This work is supported by the ongoing efforts of Population Research UK, a UK-wide initiative led by teams at the University of Bristol and University College London, which aids longitudinal population studies by working to coordinate and connect the current research landscape.
Now available on the European Genome-phenome Archive (EGA), these high-quality genomic data can be used in combination with the existing longitudinal health and survey information provided by participating families. These combined data resources offer the scientific community the opportunity to gain valuable insights in areas ranging from population genetics to the social sciences.
For example, the data could be used to investigate the impact of genetic variation on neurodevelopmental conditions or childhood obesity, and how these are influenced by environmental factors.
Longitudinal research follows large numbers of participants over multiple years, repeatedly examining them at regular time points through, for example, blood tests, body measurements, and health questionnaires, to detect changes over time.
Previously, large DNA sequence datasets have typically focused on children with rare conditions or adult population cohorts. This new data release focuses on sequencing ‘birth cohorts’, which are population-based cohorts of people followed from birth through to adolescence or early adulthood.
To produce this latest data release, researchers at the Sanger Institute sequenced all of the approximately 20,000 genes in the human genome, an approach known as exome sequencing [2], in samples from 8,436 children and 3,215 parents from the Children of the 90s study, 7,667 children and 6,925 parents from the MCS, and 8,784 children and 2,875 parents from BiB.
These three UK longitudinal birth cohort studies are internationally recognised, and data from these cohorts have already been used to study the contribution of common genetic variants to phenotypes ranging from childhood obesity [3,4] to parental nurturing behaviours [5] and anxiety and depression [6].
For example, using Children of the 90s data, researchers found that a genetic variant in a gene called MC4R is associated with increased weight across childhood [4]. Studies like this could help design effective weight management interventions and change the way society views obesity [7]. That specific study used targeted DNA sequencing of the MC4R gene, whereas the new exome sequencing data reported here will allow similar investigations of other genes in the human genome. This will help drive more discoveries and research that could benefit human health.
The team has made the anonymised data as accessible as possible to approved researchers, including drafting a data note (available on Wellcome Open Research [8]) and other materials to help support its use by those who are less familiar with large-scale sequencing data.
In the coming months, this DNA sequence data resource will be expanded to encompass all participants in these cohorts as well as additional cohorts. The value of these data will be enhanced by harmonising them across the different cohorts, providing a more powerful resource than could be achieved by one study in isolation.
Source: Sanger Institute