abstractpubmed· Abstract· item 34758253

100,000 Genomes Pilot on Rare-Disease Diagnosis in Health Care - Preliminary Report. BACKGROUND: The U.K. 100,000 Genomes Project is in the process of investigating the role of genome sequencing in patients with undiagnosed rare diseases after usual care and the alignment of this research with health care implementation in the U.K. National Health Service. Other parts of this project focus on patients with cancer and infection. METHODS: We conducted a pilot study involving 4660 participants from 2183 families, among whom 161 disorders covering a broad spectrum of rare diseases were present. We collected data on clinical features with the use of Human Phenotype Ontology terms, undertook genome sequencing, applied automated variant prioritization on the basis of applied virtual gene panels and phenotypes, and identified novel pathogenic variants through research analysis. RESULTS: Diagnostic yields varied among family structures and were highest in family trios (both parents and a proband) and families with larger pedigrees. Diagnostic yields were much higher for disorders likely to have a monogenic cause (35%) than for disorders likely to have a complex cause (11%). Diagnostic yields for intellectual disability, hearing disorders, and vision disorders ranged from 40 to 55%. We made genetic diagnoses in 25% of the probands. A total of 14% of the diagnoses were made by means of the combination of research and automated approaches, which was critical for cases in which we found etiologic noncoding, structural, and mitochondrial genome variants and coding variants poorly covered by exome sequencing. Cohortwide burden testing across 57,000 genomes enabled the discovery of three new disease genes and 19 new associations. Of the genetic diagnoses that we made, 25% had immediate ramifications for clinical decision making for the patients or their relatives. 
CONCLUSIONS: Our pilot study of genome sequencing in a national health care system showed an increase in diagnostic yield across a range of rare diseases. (Funded by the National Institute for Health Research and others.).

fulltextpubmed· METHODS· item 34758253

After approval from the national research ethics committee was obtained, we recruited participants who had been identified by health care professionals and researchers as having rare diseases (across a broad range of categories) that had not been diagnosed after receipt of usual care in the NHS, which included either no diagnostic tests (because none were available) or approved diagnostic tests that did not include genome sequencing. The participants were recruited at nine English hospitals, and written informed consent was obtained from the participants by the National Institute for Health Research (NIHR) BioResource for Rare Diseases.

fulltextpubmed· METHODS· item 34758253

To test the broad applicability of genome sequencing, we determined that participants were eligible if they had a rare disease (as defined in the United Kingdom as a disorder affecting ≤1 in 2000 persons), were likely to have a single-gene or oligogenic cause, and had not received a genomic diagnosis. Data on previous testing in probands were collected when possible; testing included single-gene tests, karyotyping, single-nucleotide polymorphism arrays, next-generation sequencing panels, and exome sequencing. Probands and, when feasible, parents or other family members were enrolled across multiple clinical specialties in the NHS. Standardized baseline clinical data were recorded with the use of Human Phenotype Ontology (HPO) terms7 guided by disease-specific data models,8 and whole blood samples were obtained for DNA extraction. In the 100,000 Genomes Project, participants are followed over their life course with the use of electronic health records (all hospital episodes, registry entries, and cause of death).

fulltextpubmed· METHODS· item 34758253

This pilot study was undertaken in partnership with the NIHR BioResource and is part of the portfolio of translational research at the NIHR Biomedical Research Centres at Barts, Cambridge University Hospitals NHS Foundation Trust, Great Ormond Street Hospital for Children NHS Foundation Trust, Manchester University NHS Foundation Trust, Moorfields Eye Hospital NHS Foundation Trust, Newcastle upon Tyne Hospitals NHS Foundation Trust, Oxford University Hospitals NHS Foundation Trust, and University College London Hospitals NHS Foundation Trust. Clinical data from the NHS and NHS Digital were used in this work.

fulltextpubmed· METHODS· item 34758253

Genome sequencing9 was performed with the use of the TruSeq DNA polymerase-chain-reaction (PCR)–free sample preparation kit (Illumina) on a HiSeq 2500 sequencer, which generates a mean depth of 32× (range, 27 to 54) and a depth greater than 15× for at least 95% of the reference human genome. Whole-genome sequencing reads were aligned to the Genome Reference Consortium human genome build 37 (GRCh37) with the use of Isaac Genome Alignment Software. Family-based variant calling of single-nucleotide variants (SNVs) and insertion or deletions (indels) for chromosomes 1 to 22, the X chromosome, and the mitochondrial genome (mean coverage, 2814×; range, 142 to 16,581) was performed with the use of the Platypus variant caller.10
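The coverage figures quoted above (a mean depth of about 32× and a depth greater than 15× for at least 95% of the genome) can be illustrated with a short calculation. This is only a sketch under stated assumptions: the function names and the toy list of per-position depths are hypothetical, and the study's actual quality-control tooling is not described in this passage.

```python
from statistics import mean


def coverage_summary(depths, threshold=15):
    """Summarize per-position read depths.

    Returns the mean depth and the fraction of positions whose depth is
    strictly greater than `threshold`.
    """
    if not depths:
        raise ValueError("depths must not be empty")
    above = sum(1 for d in depths if d > threshold)
    return {
        "mean_depth": mean(depths),
        "fraction_above": above / len(depths),
    }


def meets_target(summary, min_fraction=0.95):
    """True when enough positions exceed the depth threshold."""
    return summary["fraction_above"] >= min_fraction


# Hypothetical toy data: 19 well-covered positions and 1 poorly covered one.
toy_depths = [32] * 19 + [10]
toy_summary = coverage_summary(toy_depths)
print(toy_summary, meets_target(toy_summary))
```

In practice the depths would come from an alignment file rather than a Python list; the sketch shows only the arithmetic behind the two reported figures.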

fulltextpubmed· METHODS· item 34758253

We constructed an automated analytic pipeline to filter the genome down to rare, segregating, and predicted damaging candidate variants in coding regions. To limit the possibility of overlooking or inefficiently prioritizing diagnoses, we focused initially on applied virtual gene panels (applied panels) that were based on both the recruited clinical indication or disease and the submitted HPO terms. To address the issue of which genes have sufficient evidence to show causation and be included in these applied panels, we used our PanelApp software to enable expert, crowd-sourced review and curation of genes with diagnostic-grade evidence for each of our disease categories (e.g., evidence in at least three unrelated families).11 Loss-of-function or de novo protein-altering variants affecting genes in the applied panels were classified as tier 1, other variant types such as missense variants affecting these genes were classified as tier 2, and all other filtered variants were classified as tier 3 (Fig. S1 in the Supplementary Appendix, available with the full text of this article at NEJM.org). To further reduce the possibility of missing or inefficiently prioritized diagnoses, we used a phenotype-based approach with the Exomiser application12 to search across all genes in the genome for a diagnosis. Exomiser prioritizes rare, segregating, and predicted pathogenic variants in genes in which the patient phenotypes match previously referenced knowledge from human disease or model organism databases. The ontology-driven phenotype matching can identify patients who have an atypical profile for a disease. Additional details regarding the Exomiser are provided in the Diagnostic Pipeline section in the Supplementary Appendix.
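The tiering rule described above can be sketched in a few lines. This is a minimal illustration and not the project's pipeline: the `Variant` fields, the consequence labels, and the `assign_tier` function are hypothetical names, and the input is assumed to have already passed the rarity, segregation, and predicted-damage filters.

```python
from dataclasses import dataclass

# Hypothetical consequence labels; the pipeline's real vocabulary is not
# given in the text.
LOSS_OF_FUNCTION = {"stop_gained", "frameshift", "splice_donor", "splice_acceptor"}
PROTEIN_ALTERING = LOSS_OF_FUNCTION | {"missense", "inframe_indel"}


@dataclass(frozen=True)
class Variant:
    gene: str
    consequence: str
    de_novo: bool = False


def assign_tier(variant, applied_panel_genes):
    """Return tier 1, 2, or 3 for a variant that already passed filtering."""
    if variant.gene not in applied_panel_genes:
        return 3  # filtered variant outside the applied panels
    if variant.consequence in LOSS_OF_FUNCTION:
        return 1  # loss-of-function in a panel gene
    if variant.de_novo and variant.consequence in PROTEIN_ALTERING:
        return 1  # de novo protein-altering variant in a panel gene
    return 2  # other variant types (e.g., inherited missense) in a panel gene


panel = {"GENE_A", "GENE_B"}  # hypothetical applied panel
for v in (
    Variant("GENE_A", "frameshift"),
    Variant("GENE_B", "missense", de_novo=True),
    Variant("GENE_B", "missense"),
    Variant("GENE_C", "stop_gained"),
):
    print(v.gene, v.consequence, v.de_novo, "-> tier", assign_tier(v, panel))
```

Checking panel membership first mirrors the wording of the rule: tiers 1 and 2 are defined only for genes in the applied panels, and everything else that survives filtering falls into tier 3.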

fulltextpubmed· METHODS· item 34758253

Prioritization of variants and return of candidate variants for presentation to the 13 NHS Genomic Medicine Centres (GMCs) were performed with the use of decision-support systems and with assistance from clinical genetics teams from Congenica and Fabric Genomics.13,14 These variants were reviewed by NHS clinical scientists and clinicians using the guidelines of the American College of Medical Genetics and Genomics, and a diagnostic report was issued for each proband.15 Final clinical outcomes included whether a genetic diagnosis was obtained, identification of the variant or variants involved, whether the variant or variants explained all or some of the phenotypes, and whether an intervention was used. Recruitment of the participants in the pilot study and sequencing were performed during the period from January 2014 through December 2016, while the infrastructure to collect, quality check, process, and return data was being established. Results were returned to the GMCs from May 2016 through April 2019. Now that the information pipeline has been established (post-pilot phase), results are returned to the GMCs within 6 weeks after the sample is obtained.

fulltextpubmed· METHODS· item 34758253

Researchers investigated coding and noncoding regions to detect novel diagnostic variants in genes matching the patients’ phenotypes, including the presence of de novo variants in highly constrained coding regions16 in the 95th percentile. We use the term novel to describe diagnostic variants we have detected that have not previously been described in the literature as causative. This is distinct from de novo variants, which are present for the first time in a family member due to either a new variant in an egg or sperm or a new mutation at conception. The variant may have been previously described. We used a new method described by Wei et al.17 to analyze mitochondrial DNA that accounts for heteroplasmy, the Genomiser to detect noncoding pathogenic variants,18 and the ExpansionHunter software tool to detect simple tandem repeat expansions.19 Finally we used a new random forest method to analyze Canvas20 and Manta21 calls and to identify potentially pathogenic copy-number and structural variants.

fulltextpubmed· METHODS· item 34758253

Gene-based burden testing to detect enrichment of rare, predicted pathogenic, and segregating variants in novel genes in specific disease cohorts relative to controls was performed on the genomes in the pilot study as well as on additional genomes from the rest of the 100,000 Genomes Project to increase power (57,002 genomes; see the Supplementary Methods in the Supplementary Appendix). The genomic and clinical data from the pilot study are freely accessible to members of a Genomics England Clinical Interpretation Partnership domain (https://www.genomicsengland.co.uk/about-gecip/). Testing was performed with the use of the R software, version 3.6.0 (R Foundation for Statistical Computing), and Stata software, version 16 (StataCorp). Further details on the individual methods used in the study are provided in the Supplementary Appendix.

fulltextpubmed· RESULTS· item 34758253

We enrolled 4660 participants (2183 probands and 2477 family members), among whom 161 disorders across a broad spectrum of rare diseases were present (Table 1).22 Neurologic, ophthalmologic, and tumor syndromes were commonly represented (Table 2). Participants were recruited with varying numbers of affected and unaffected family members. We aimed to recruit family trios (both parents and a proband) or larger family structures to facilitate more effective variant prioritization, and our efforts were met with varying degrees of success. Among the recruited probands with multiple bowel polyps, 93% were singletons (i.e., probands for whom no other family member was recruited). In contrast, 12% of the probands with intellectual disability were singletons. Adult probands were more commonly enrolled than pediatric probands (age ≤18 years at recruitment) (74% vs. 26%), which is in line with the percentage of children and adults in the general population in England and Wales (79% vs. 21% [2011 census of England and Wales23]). The preponderance of adults was unusual as compared with previous sequencing projects and reflects the eligibility criterion that probands had to have undergone usual care; in many cases, usual care involved standard genetic testing (mostly single-gene or panel-based). A lower percentage of recruited probands were female than male owing to the difference among pediatric probands (232 girls and female adolescents [11%] vs. 339 boys and male adolescents [16%], P<0.001); the expected percentage of female probands was 51% (on the basis of the 2011 census of England and Wales) across most disease categories. The greater susceptibility of males than of females to recessive X-linked conditions may account for this sex bias: more than 6% of all diagnoses involved variants on the X chromosome (which represents approximately 5% of the genome). 
The inferred ancestry of the probands (see the Supplementary Appendix) was in line with what was expected on the basis of the general population, in which 86% of children and adults were White, 8% Asian, 3% Black, 2% mixed, and 1% other (2011 census of England and Wales).

fulltextpubmed· RESULTS· item 34758253

However, South Asian ancestry was significantly more common among pediatric probands than among adult probands (16% vs. 4%, P<0.001); our results indicated potential consanguinity in 43% of the 93 pediatric South Asian probands and in 1% of the other 478 pediatric probands (Table 1). We collected clinical data with the use of HPO terms for each affected participant (a median of 4 [range, 1 to 61] present terms, and a median of 4 [range, 0 to 144] absent terms [phenotypes that were assessed and confirmed as definitely not observed in the proband]). We then performed genome sequencing, followed by quality assurance to check coverage, sequence quality, presence of repeat sample submissions or sample swaps, and consistency with reported family structures (see the Supplementary Appendix).

fulltextpubmed· RESULTS· item 34758253

We made genetic diagnoses in 25% of the probands and deposited the genotypes into the ClinVar repository (accession numbers, SCV001759972 to SCV001760540). Of these diagnoses, 60% were made on the basis of coding SNVs or indels in the applied panels; 26% were made on the basis of coding SNVs or indels affecting well-established disease genes not included in the applied panels (diagnoses were made through phenotype-based prioritization or expert review by the study clinicians or the clinical genetics teams from Congenica or Fabric Genomics); and 14% were made on the basis of genomewide, phenotype-agnostic research analysis that investigated beyond SNVs and indels, coding regions, and disease genes in the applied panels (Fig. 1). On the basis of international guidelines,15 an additional 10% of the probands were classified as having variants of unknown significance in genes that were considered to be consistent with the phenotype on clinical review at the study site but that required further functional validation. Fewer candidate variants were returned to the GMCs after filtering (i.e., the removal of extremely unlikely candidates) in larger family structures (Table 3), which made it easier to identify causative variants and in turn led to higher diagnostic yields for family trios and quads (proband, sibling, and parents) and more complex family structures (Fig. 2A), even within a disorder (e.g., the diagnostic yield for hereditary ataxia was 21% among singletons and 32% among persons in family trios) (Table S4).

fulltextpubmed· RESULTS· item 34758253

We obtained a higher diagnostic yield for diseases that we considered likely to have a monogenic cause than for those we considered likely to have a complex cause (35% vs. 11%) (Fig. 2A). Diseases were considered likely to have a monogenic cause if they were present in the Online Mendelian Inheritance in Man database, involved genetic testing as part of the standard diagnostic workup, and had a consensus of opinion among three clinical geneticists (who were unaware of each other’s assessments) that they had a monogenic cause. Diagnostic yield varied widely across diseases (Fig. 2B and Table S3); the diagnostic yield ranged from 40 to 55% for intellectual disability and various vision and hearing disorders and was 6% for tumor syndromes.

fulltextpubmed· RESULTS· item 34758253

We obtained data on the presence or absence of previous genetic testing in 1177 participants. The number of tests per proband ranged from 0 to 16, with a median of 1 (interquartile range, 0 to 2), and approximately half the probands in this subgroup had been tested at least once. The overall diagnostic yield with the use of genome sequencing in this subgroup increased by 32%, and there was only a slight difference depending on whether previous testing had been performed (33%) or not (31%). However, many of these previous tests were not recent, dating back to the time of recruitment at the latest (2014 to 2016). The diagnostic yield provided by genome sequencing varied between 28% and 45%, depending on the type of previous testing (Fig. 2C and Table S5), which for the most part involved targeted single-gene and panel-based testing (Table S6). The aim of the automated diagnostic pipeline is to identify a few potentially causative candidate variants, among the millions in a whole genome, through the removal of extremely unlikely candidates (filtering) and the identification of the most likely candidates in the remainder (prioritization). This approach facilitates manual clinical interpretation and diagnostic reporting by clinicians at the GMCs.

fulltextpubmed· RESULTS· item 34758253

A total of 322 (66%) of the 490 diagnoses that were based on SNVs or indels from the genomes were made with the virtual panel–based pipeline, and the positive predictive value was high given the millions of variants in the whole genomes: 291 of 1041 candidate variants (28%) returned to the GMCs proved to be diagnostic. We re-ran this analysis in December 2019 to assess the effects of updated versions of the applied panels with the latest disease gene discoveries, improved selection of the applied panel or panels on the basis of the patient’s phenotype, and advances in variant-filtering strategies (e.g., allowance for incomplete penetrance when suspected). With the use of these updated versions, the number of genetic diagnoses increased from 322 to 377 of the 490 diagnoses (77% sensitivity), and the positive predictive value was 15% (Fig. 2D). This result shows effective filtering and prioritization of the variants, with a median number of only 1 candidate variant (interquartile range, 0 to 2) included in the panels returned to the GMCs per proband (Table 3). Ongoing evolution of the applied panels with new disease genes is expected to continue to increase the diagnostic yield with this approach.
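The percentages in this passage follow directly from the reported counts. The short calculation below reproduces them, taking sensitivity as the share of the 490 SNV- or indel-based diagnoses captured by the panel-based pipeline and positive predictive value as the share of returned candidate variants that proved diagnostic. The function and variable names are illustrative only, and the counts behind the updated 15% positive predictive value are not given in the passage, so that figure is not recomputed.

```python
def proportion(numerator, denominator):
    """Return numerator / denominator, guarding against an empty denominator."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return numerator / denominator


# Counts quoted in the passage above.
initial_sensitivity = proportion(322, 490)   # diagnoses made by the pipeline
initial_ppv = proportion(291, 1041)          # diagnostic candidates / returned candidates
updated_sensitivity = proportion(377, 490)   # after the December 2019 re-run

for label, value in (
    ("initial sensitivity", initial_sensitivity),
    ("initial positive predictive value", initial_ppv),
    ("updated sensitivity", updated_sensitivity),
):
    print(f"{label}: {value:.1%}")
```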

fulltextpubmed· RESULTS· item 34758253

With the use of phenotype-based prioritization with the Exomiser to score and rank the most likely causative variants, diagnoses were detected in 77% of the top-ranked candidate variants, in 86% of the top three candidates, and in 88% of the top five candidates (Fig. 2D). Use of the Exomiser and applied panels was complementary: 92% of the 490 diagnoses were made with the applied panels or the Exomiser top five candidates (last blue bar in Fig. 2D). Precision phenotyping in our participants was essential for both the Exomiser and the selection of additional applied panels; without such phenotyping, only 54% of these diagnoses would have been prioritized in the virtual panel for the recruited disease and presented to the GMCs as a likely candidate (first blue bar in Fig. 2D).
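The top-ranked, top-three, and top-five figures above are top-k recall values: the share of diagnoses whose causative variant appears within the first k ranked candidates. A minimal sketch of that calculation follows. The `top_k_recall` function and the example ranks are hypothetical and are not study data, and the way Exomiser itself scores variants is not reproduced here.

```python
def top_k_recall(ranks, k):
    """Fraction of solved cases whose diagnostic variant is ranked within the top k.

    `ranks` holds the 1-based rank of the diagnostic variant for each solved
    case, or None when the variant was not ranked at all.
    """
    if not ranks:
        raise ValueError("ranks must not be empty")
    hits = sum(1 for r in ranks if r is not None and r <= k)
    return hits / len(ranks)


# Hypothetical ranks for ten solved cases (not study data).
example_ranks = [1, 1, 2, 1, 4, None, 1, 3, 1, 7]

for k in (1, 3, 5):
    print(f"top {k}: {top_k_recall(example_ranks, k):.0%}")
```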

fulltextpubmed· RESULTS· item 34758253

A total of 14% of the genetic diagnoses required further research outside the diagnostic pipeline (Fig. 1). This research involved combined analysis of the genome sequences and clinical data in our research environment and validation with the use of wet-bench orthogonal tests and computational approaches (Table S7). Additional diagnoses were made by screening for the presence of de novo variants in highly constrained coding regions.16 These diagnoses included a de novo EBF3 missense variant in a patient with hereditary ataxia. A mitochondrial genome analysis that accounted for heteroplasmy led to four new diagnoses, as well as the nine that had already been made by means of the main pipeline. Twelve probands had intronic splicing variants that were prioritized by Exomiser owing to the known pathogenic status of these variants in the ClinVar database.24 Nine diagnoses involving novel, previously undescribed noncoding variants required exploration of the whole genome and in vitro functional validation by means of reverse transcriptase–PCR, minigene, or luciferase assays.25-27 For these diagnoses, unsolved cases in probands had been queried for noncoding variants that affect genes, either alone or in compound heterozygosity with loss-of-function variants, included in the applied panels. These cases were identified with the use of Genomiser or, for probands with retinal disorders, systematic analysis of the untranslated regions, promoter, or introns. The cases in 43 additional probands were fully or partially explained by structural variants or simple tandem repeat expansions in the genes HTT or FXN in the probands with hereditary spastic paraplegia.

fulltextpubmed· RESULTS· item 34758253

We performed burden testing to identify new mendelian disease–gene associations and make potential genetic diagnoses in probands with unsolved cases; 828 significant disease–gene associations (Q value of <0.1) were identified, including 249 known and 579 novel genes (novel with respect to their association with disease), with a mean (±SD) number of associations of only 0.03±0.2 (range, 0 to 3) from 10,000 permutations in which the cases and controls were assigned randomly. A total of 22 candidates represent the most probable new, fully penetrant, mendelian disease genes (Table S8; ClinVar accession numbers, SCV001759972 to SCV001760540) with three recently independently confirmed diagnoses: UBAP1 in hereditary spastic paraplegia,28 FOXJ1 in non–cystic fibrosis bronchiectasis,29 and SORD in Charcot–Marie–Tooth disease.30 Diagnostic reports were issued for three probands with these genes (Fig. 1), and we are currently investigating others with the use of the online tool GeneMatcher and with functional validation studies in model organisms.
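The permutation step mentioned above rests on a general idea: if case and control labels are reassigned at random, any apparent enrichment of variant carriers among cases should disappear. The sketch below illustrates that idea for a single gene with a one-sided empirical p-value. It is not the study's burden test, whose statistic and multiple-testing procedure are described only in the Supplementary Appendix; the function name, the carrier counts, and the fixed random seed are all assumptions made for illustration.

```python
import random


def permutation_p_value(carrier_flags, is_case, n_permutations=10_000, seed=0):
    """One-sided empirical p-value for enrichment of carriers among cases.

    `carrier_flags[i]` is True when sample i carries a qualifying variant;
    `is_case[i]` is True when sample i belongs to the disease cohort.
    """
    if len(carrier_flags) != len(is_case):
        raise ValueError("inputs must have the same length")
    rng = random.Random(seed)
    observed = sum(1 for c, k in zip(carrier_flags, is_case) if c and k)
    labels = list(is_case)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(labels)  # reassign case/control status at random
        permuted = sum(1 for c, k in zip(carrier_flags, labels) if c and k)
        if permuted >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical cohort: 20 cases (6 carriers) and 200 controls (2 carriers).
cases = [True] * 20 + [False] * 200
carriers = [True] * 6 + [False] * 14 + [True] * 2 + [False] * 198
print(permutation_p_value(carriers, cases, n_permutations=2_000))
```

The same shuffling can be repeated across all genes to estimate how many associations would pass a significance threshold by chance, which is the kind of null expectation the passage reports.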

fulltextpubmed· RESULTS· item 34758253

The findings from our approach ended long diagnostic odysseys for some participants and their families (the median duration of such an odyssey was 75 months, and the median number of hospital visits was 68) (Table S1), and we speculate that they will mitigate NHS resource costs (the combined cost for 183,273 episodes of hospital care among the affected participants was £87 million [$122 million]) (Table S3). In addition, 134 of the 533 genetic diagnoses (25%) were reported by clinicians to be of immediate clinical actionability; only 11 (0.2%) were described as having no benefit. As of now, the remainder of the diagnoses are of unknown usefulness. The benefits in terms of health care included 4 diagnoses that led to a suggested change in medication, 26 that led to suggested additional surveillance of the proband or relatives, 13 that allowed for clinical trial eligibility, 59 that informed future reproductive choices, and 32 that had other benefits (Table S9).
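The benefit categories listed above account for all 134 actionable diagnoses, which in turn make up the reported 25% of the 533 genetic diagnoses. A brief check of that arithmetic, using only counts from the passage (the dictionary keys are shorthand labels, not terms from the source):

```python
# Counts of diagnoses per benefit category, as quoted in the passage.
benefits = {
    "change in medication": 4,
    "additional surveillance": 26,
    "clinical trial eligibility": 13,
    "reproductive choices": 59,
    "other benefits": 32,
}

total_actionable = sum(benefits.values())
share_of_diagnoses = total_actionable / 533  # 533 genetic diagnoses in total

print(total_actionable, f"{share_of_diagnoses:.1%}")
```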

fulltextpubmed· RESULTS· item 34758253

In several specific probands, diagnoses have had important clinical actionability. In a 36-year-old man with suspected choroideremia, we detected a novel CHM promoter variant causing loss of gene expression,27 a diagnosis that enabled eligibility for a gene-replacement trial. A male neonate proband presented with severe infection and transient neurologic symptoms immediately after birth and died at 4 months of age with no diagnosis but with health care costs of approximately £80,000 ($112,000) (Table S10). A diagnosis of transcobalamin II deficiency due to a homozygous frameshift in TCN2 was made from this study, which enabled predictive testing to be offered to the younger brother within 1 week after birth. The younger child, who received a positive result, received weekly hydroxocobalamin injections to prevent metabolic decompensation.

fulltextpubmed· RESULTS· item 34758253

A 10-year-old girl was admitted to the intensive care unit with life-threatening chicken pox. She had undergone a diagnostic odyssey over a period of 7 years at a total cost of £356,571 ($499,199) across 307 secondary care episodes (Table S11). We were able to diagnose CTPS1 deficiency due to a homozygous, known pathogenic splice acceptor variant. A diagnosis enabled a curative bone marrow transplantation (cost of £70,000 [$98,000]), and predictive testing in her siblings showed no additional family members to be at risk. One proband had waited until his sixth decade of life for a genomic diagnosis of an INF2 mutation causing focal segmental glomerulosclerosis. His father, brother, and uncle had all died from kidney failure. He had received two kidney transplants, had transmitted the condition to his daughter, and was concerned about whether his 15-year-old granddaughter, who was under surveillance, was at risk. After he received his genetic diagnosis, the granddaughter was tested, found to be negative, and discharged from regular medical surveillance.

fulltextpubmed· PARTICIPANTS· item 34758253

[...] which represents approximately 5% of the genome). The inferred ancestry of the probands (see the Supplementary Appendix) was in line with what was expected on the basis of the general population, in which 86% of children and adults were White, 8% Asian, 3% Black, 2% mixed, and 1% other (2011 census of England and Wales). However, South Asian ancestry was significantly more common among pediatric probands than among adult probands (16% vs. 4%, P<0.001); our results indicated potential consanguinity in 43% of the 93 pediatric South Asian probands and in 1% of the other 478 pediatric probands (Table 1).

fulltextpubmed· CLINICAL DATA AND SEQUENCING· item 34758253

We collected clinical data with the use of HPO terms for each affected participant (a median of 4 [range, 1 to 61] present terms, and a median of 4 [range, 0 to 144] absent terms [phenotypes that were assessed and confirmed as definitely not observed in the proband]). We then performed genome sequencing, followed by quality assurance to check coverage, sequence quality, presence of repeat sample submissions or sample swaps, and consistency with reported family structures (see the Supplementary Appendix).

fulltextpubmed· DIAGNOSTIC YIELD· item 34758253

We made genetic diagnoses in 25% of the probands and deposited the genotypes into the ClinVar repository (accession numbers, SCV001759972 to SCV001760540). Of these diagnoses, 60% were made on the basis of coding SNVs or indels in the applied panels; 26% were made on the basis of coding SNVs or indels affecting well-established disease genes not included in the applied panels (diagnoses were made through phenotype-based prioritization or expert review by the study clinicians or the clinical genetics teams from Congenica or Fabric Genomics); and 14% were made on the basis of genomewide, phenotype-agnostic research analysis that investigated beyond SNVs and indels, coding regions, and disease genes in the applied panels (Fig. 1). On the basis of international guidelines,15 an additional 10% of the probands were classified as having variants of unknown significance in genes that were considered to be consistent with the phenotype on clinical review at the study site but that required further functional validation. Fewer candidate variants were returned to the GMCs after filtering (i.e., the removal of extremely unlikely candidates) in larger family structures (Table 3), which made it easier to identify causative variants and in turn led to higher diagnostic yields for family trios and quads (proband, sibling, and parents) and more complex family structures (Fig. 2A), even within a disorder (e.g., the diagnostic yield for hereditary ataxia was 21% among singletons and 32% among persons in family trios) (Table S4).
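As a rough arithmetic check on the three diagnostic routes described above, the following sketch converts the reported shares (60%, 26%, 14%) into approximate counts. The total of 533 diagnoses is taken from the health care outcomes passage; the route labels and the helper name are illustrative only, and because the published percentages are rounded, the derived counts are approximations rather than reported figures.

```python
# Reported shares of genetic diagnoses by route (from the passage above).
shares = {
    "coding SNVs/indels in applied panels": 0.60,
    "coding SNVs/indels outside applied panels": 0.26,
    "research analysis beyond the pipeline": 0.14,
}

# Total number of genetic diagnoses, as reported in the outcomes passage.
TOTAL_DIAGNOSES = 533


def approximate_counts(total, route_shares):
    """Convert rounded percentage shares into approximate whole-number counts."""
    return {route: round(total * share) for route, share in route_shares.items()}


counts = approximate_counts(TOTAL_DIAGNOSES, shares)
for route, n in counts.items():
    print(f"{route}: ~{n}")
```

The approximate counts need not sum exactly to 533, since each share was rounded to a whole percent before publication.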

fulltextpubmed· DIAGNOSTIC YIELD· item 34758253

[...] three clinical geneticists (who were unaware of each other’s assessments) that they had monogenic cause. Diagnostic yield was highly varied across diseases (Fig. 2B and Table S3); the diagnostic yield ranged from 40 to 55% for intellectual disability and various vision and hearing disorders and was 6% for tumor syndromes. We obtained data on the presence or absence of previous genetic testing in 1177 participants. The number of tests per proband ranged from 0 to 16, with a median of 1 (interquartile range, 0 to 2), and approximately half the probands in this subgroup had been tested at least once. The overall diagnostic yield with the use of genome sequencing in this subgroup increased by 32%, and there was only a slight difference depending on whether previous testing had been performed (33%) or not (31%). However, many of these previous tests were not recent, dating back to the time of recruitment at the latest (2014 to 2016). The diagnostic yield provided by genome sequencing varied between 28% and 45%, depending on the type of previous testing (Fig. 2C and Table S5), which for the most part involved targeted single-gene and panel-based testing (Table S6).

fulltextpubmed· DIAGNOSTIC PIPELINE· item 34758253

The aim of the automated diagnostic pipeline is to identify a few potentially causative candidate variants, among the millions in a whole genome, through the removal of extremely unlikely candidates (filtering) and the identification of the most likely candidates in the remainder (prioritization). This approach facilitates manual clinical interpretation and diagnostic reporting by clinicians at the GMCs.
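The two stages named above, filtering and prioritization, can be illustrated with a toy sketch. This is not the pipeline used in the study: the `Variant` fields, the 1% frequency cutoff, the ranking rule, and the gene names are all hypothetical, chosen only to show how a large candidate list might be reduced to a short one for manual review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    gene: str
    population_frequency: float  # hypothetical allele frequency in a reference population
    in_applied_panel: bool       # hypothetical flag: gene is on a virtual panel for this proband
    phenotype_score: float       # hypothetical 0..1 match between the gene and the HPO terms


def filter_variants(variants, max_frequency=0.01):
    """Filtering: remove extremely unlikely candidates (here, common variants)."""
    return [v for v in variants if v.population_frequency <= max_frequency]


def prioritize(variants, top_n=3):
    """Prioritization: rank survivors, panel genes first, then by phenotype score."""
    ranked = sorted(
        variants,
        key=lambda v: (v.in_applied_panel, v.phenotype_score),
        reverse=True,
    )
    return ranked[:top_n]


candidates = [
    Variant("GENE_A", 0.20, True, 0.9),      # too common, removed by the filter
    Variant("GENE_B", 0.0001, False, 0.8),
    Variant("GENE_C", 0.0005, True, 0.4),
    Variant("GENE_D", 0.00001, False, 0.1),
]

shortlist = prioritize(filter_variants(candidates), top_n=2)
print([v.gene for v in shortlist])  # ['GENE_C', 'GENE_B']
```

Keeping the two stages as separate functions mirrors the description in the passage: filtering only discards, while prioritization only orders what remains.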

fulltextpubmed· RESEARCH-BASED DIAGNOSES· item 34758253

A total of 14% of the genetic diagnoses required further research outside the diagnostic pipeline (Fig. 1). This research involved combined analysis of the genome sequences and clinical data in our research environment and validation with the use of wet-bench orthogonal tests and computational approaches (Table S7). Additional diagnoses were made by screening for the presence of de novo variants in highly constrained coding regions.16 These diagnoses included a de novo EBF3 missense variant in a patient with hereditary ataxia. A mitochondrial genome analysis that accounted for heteroplasmy led to four new diagnoses, as well as the nine that had already been made by means of the main pipeline. Twelve probands had intronic splicing variants that were prioritized by Exomiser owing to the known pathogenic status of these variants in the ClinVar database.24 Nine diagnoses involving novel, previously undescribed noncoding variants required exploration of the whole genome and in vitro functional validation by means of reverse transcriptase–PCR, minigene, or luciferase assays.25-27 For these diagnoses, unsolved cases in probands had been queried for noncoding variants that affect genes, either alone or in compound heterozygosity with loss-of-function variants, included in the applied panels. These cases were identified with the use of Genomiser or, for probands with retinal disorders, systematic analysis of the untranslated regions, promoter, or introns. The cases in 43 additional probands were fully or partially explained by structural variants or simple tandem repeat expansions in the genes HTT or FXN in the probands with hereditary spastic paraplegia.

fulltextpubmed· NEW DISEASE–GENE ASSOCIATIONS· item 34758253

We performed burden testing to identify new mendelian disease–gene associations and make potential genetic diagnoses in probands with unsolved cases; 828 significant disease–gene associations (Q value of <0.1) were identified, including 249 known and 579 novel genes (novel with respect to their association with disease), with a mean (±SD) number of associations of only 0.03±0.2 (range, 0 to 3) from 10,000 permutations in which the cases and controls were assigned randomly. A total of 22 candidates represent the most probable new, fully penetrant, mendelian disease genes (Table S8; ClinVar accession numbers, SCV001759972 to SCV001760540) with three recently independently confirmed diagnoses: UBAP1 in hereditary spastic paraplegia,28 FOXJ1 in non–cystic fibrosis bronchiectasis,29 and SORD in Charcot–Marie–Tooth disease.30 Diagnostic reports were issued for three probands with these genes (Fig. 1), and we are currently investigating others with the use of the online tool GeneMatcher and with functional validation studies in model organisms.
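The Q-value threshold mentioned above can be illustrated with a small sketch of false-discovery-rate adjustment. The passage does not state which procedure produced the Q values, so the Benjamini-Hochberg method used here is an assumption, and the p-values in the example are invented for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted values (q-values) in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q


def count_significant(p_values, threshold=0.1):
    """Count tests whose q-value falls below the threshold (0.1 in the passage)."""
    return sum(q < threshold for q in benjamini_hochberg(p_values))


# Invented p-values for four hypothetical disease-gene tests.
example_p = [0.001, 0.02, 0.03, 0.5]
print(benjamini_hochberg(example_p))
print(count_significant(example_p))
```

In the study, the same counting step was also applied to data sets in which case and control labels were assigned randomly, which gives a baseline for how many associations would be expected by chance.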

fulltextpubmed· HEALTH CARE OUTCOMES AFTER DIAGNOSIS· item 34758253

The findings from our approach ended long diagnostic odysseys for some participants and their families (the median duration of such an odyssey was 75 months, and the median number of hospital visits was 68) (Table S1), and we speculate that they will mitigate NHS resource costs (the combined cost for 183,273 episodes of hospital care among the affected participants was £87 million [$122 million]) (Table S3). In addition, 134 of the 533 genetic diagnoses (25%) were reported by clinicians to be of immediate clinical actionability — only 11 (0.2%) were described as having no benefit. As of now, the remainder of the diagnoses are of unknown usefulness. The benefits in terms of health care included 4 diagnoses that led to a suggested change in medication, 26 that led to suggested additional surveillance of the proband or relatives, 13 that allowed for clinical trial eligibility, 59 that informed future reproductive choices, and 32 that had other benefits (Table S9).
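The actionability figures above can be checked with a few lines of arithmetic. The counts come directly from the passage; the dictionary keys are shortened labels chosen for this sketch.

```python
# Benefit categories and counts as reported in the passage (Table S9).
benefits = {
    "suggested change in medication": 4,
    "additional surveillance": 26,
    "clinical trial eligibility": 13,
    "informed reproductive choices": 59,
    "other benefits": 32,
}

TOTAL_DIAGNOSES = 533  # total genetic diagnoses reported in the passage

actionable = sum(benefits.values())
share = actionable / TOTAL_DIAGNOSES

print(actionable)       # 134
print(f"{share:.0%}")   # 25%
```

The five categories sum to the 134 actionable diagnoses, which is about 25% of the 533 genetic diagnoses, matching the figure quoted in the text.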

fulltextpubmed· DISCUSSION· item 34758253

Our findings show a substantial increase in yield of genomic diagnoses made in patients with the use of genome sequencing across a broad spectrum of rare disease. The enhanced diagnostic benefit was observed regardless of whether participants had undergone previous genetic testing (diagnostic yields were 31% among those who had undergone testing and 33% among those who had not). In 25% of those who received a genetic diagnosis, there was immediate clinical actionability. The standardization of procedures — from the enrollment of patients to the return of NHS-validated results to clinicians — was critical to our success. For example, the collection of clinical data with the use of disease-specific data models and HPO terms enabled diagnoses, which confirmed the value of standardization with the use of ontology terms and clinical annotation in precision medicine.31 These additional diagnoses, beyond the 264 (49% of total diagnoses) observed with the use of the single-disease virtual panel, came from the use of Exomiser and additional applied panels. The diagnostic discoveries derived by combining research, decision support, and clinical validation and assessment leveraged an additional 72 diagnoses.

fulltextpubmed· DISCUSSION· item 34758253

Diagnostic yield was influenced by family structure, and for disorders likely to have mendelian inheritance and a single-gene etiologic factor, our yield increased to 35%: ophthalmologic, metabolic, and neurologic disorders yielded the greatest percentage of diagnoses. The scale of our data set enabled cohortwide burden testing, which identified numerous novel disease–gene associations, including three that have now been confirmed and 19 with compelling evidence that are likely to be confirmed in independent data sets. Of the diseases we diagnosed with the use of genome sequencing, 13% were caused by mutations in noncoding sequence or mitochondrial genomes, tandem repeat expansions in persons with Huntington’s disease, and a wide range of structural variants with nucleotide resolution of breakpoints (which were identified with the use of a new random forest method). An additional 2% of the diagnoses involved coding variants in regions of low coverage on exome sequencing. Our results provide new evidence of the value of genome sequencing and mirror the findings in a previous study in which 53% of the participants who received new diagnoses from genome sequencing had previously undergone exome sequencing.5

fulltextpubmed· DISCUSSION· item 34758253

Previous studies have shown how next-generation sequencing can lead to diagnoses, with yields of 25 to 29% with the use of exome sequencing in persons who had received no previous genetic testing.32-34 The Undiagnosed Disease Network reported a diagnostic yield of 26% with the use of a mixture of whole-exome and whole-genome sequence analysis in 382 patients,5 and another study of genome sequencing showed a yield of 42% among 50 probands with intellectual disability who had previously undergone testing.35 Among probands with a broad range of disorders (161 in total) with an unmet diagnostic need, we obtained results that were similar to those in the previous studies. Our approach is limited to diagnoses that are readily made by means of short-read genome sequencing. Fully phased, long-read sequencing better detects structural variation and delivers sequence information from parts of the genome that are poorly captured by short-read sequencing.36

fulltextpubmed· DISCUSSION· item 34758253

The findings from our pilot study support the case for genome sequencing in the diagnosis of certain specific rare diseases in the new NHS National Genomic Test Directory.37 In patients with specific disorders, such as intellectual disability, genome sequencing is now the first-line test in the NHS (Table S12). With a new National Genomic Medicine Service, the NHS in England is in the process of sequencing 500,000 whole genomes in rare disease and cancer in health care. We hope that our findings will assist other health systems in considering the role of genome sequencing in the care of patients with rare diseases.