Preeclampsia risk prediction from prenatal cell-free DNA screening
Preeclampsia is characterized by placental dysfunction and results in significant morbidity, but reliable early prediction remains challenging. We investigated whether whole-genome sequencing (WGS) data from clinically obtained prenatal cell-free DNA (cfDNA) screening (PDNAS) can be leveraged to predict preeclampsia risk early in pregnancy (≤16 weeks). Using 1,854 routinely collected clinical PDNAS samples (median, 12.1 weeks) with low-coverage (0.5×) WGS data, we developed a framework to quantify maternal and fetal tissue signatures using nucleosome accessibility, revealing early placental and endothelial dysfunction. These signatures informed a preeclampsia risk-prediction model that, for preterm phenotypes several months before disease onset, achieved an area under the receiver operating characteristic curve (AUC) of 0.85 (81% sensitivity at 80% specificity) in a separate validation cohort of 831 consecutively collected samples. Its performance was confirmed in an external cohort of 141 samples (AUC 0.84; 79% sensitivity at 80% specificity). We demonstrate that assessment of cfDNA nucleosome accessibility from early-pregnancy cfDNA sequence data enables the detection of early placental and endothelial-tissue aberrations and may aid in the determination of preeclampsia risk.
Preeclampsia, a hypertensive disorder affecting 2–8% of pregnancies, contributes substantial morbidity and mortality to the maternal–fetal dyad, particularly when delivery is preterm1–4. Prediction of preeclampsia before disease onset has remained challenging, with many approaches relying on binary risk classification on the basis of clinical factors alone or in combination with circulating biomarkers5,6. The performance of these methods, however, is often suboptimal7,8. The placenta has a well-defined role in the developmental origins of preeclampsia9–11. Because the placenta is inaccessible during pregnancy, non-invasive approaches are essential to identify molecular features indicative of preeclampsia risk, improve the understanding of the underlying biology, and develop strategies to mitigate the risks of preeclampsia.
Constitutive shedding of placenta-derived material, such as cell-free RNA (cfRNA) and cfDNA, begins early in pregnancy12–14. Recent studies investigated preeclampsia prediction during early pregnancy using circulating cfRNA, long-molecule cfDNA, and cfDNA methylation15–18; however, these methods require specialized assays, increasing costs and complicating clinical implementation. By contrast, sequencing of first-trimester maternal plasma cfDNA, already performed for prenatal screening of aneuploidy, is increasingly part of routine prenatal care19, albeit mostly in high-income countries20. A tool for early preeclampsia prediction that seamlessly integrates into routine practice could substantially reduce the wide-ranging adverse sequelae of preeclampsia.
cfDNA fragments are preferentially protected from degradation by nucleosomes, reflecting the chromatin organization of their tissue of origin21–26. Although clinical prenatal cfDNA screening (PDNAS) assays routinely determine the placental-derived fraction, or fetal fraction (FF)19, the assessment of tissue-specific nucleosome footprints in cfDNA for the early prediction of pregnancy-specific disorders, such as preeclampsia27, has not been explored. Using sequence data generated through PDNAS in singleton gestations, we developed a new computational framework called preeclampsia early assessment of risk from liquid biopsy (PEARL) to accurately quantify the maternal–fetal tissue composition in plasma cfDNA and to classify the risk of severe phenotypes of preeclampsia months before clinical onset.
Using low-coverage WGS data from 1,854 clinical plasma-cfDNA samples for PDNAS (mean, 0.52× coverage; interquartile range (IQR), 0.36–0.64) from the University of Washington (UW), we developed a framework to accurately quantify tissue contributions to maternal plasma and predict preeclampsia risk months before clinical onset (Fig. 1a and Extended Data Table 1). Preeclampsia was categorized into clinically relevant subtypes on the basis of gestational age at onset and delivery: early preeclampsia (EPE) (onset, ≤34 weeks, and delivery, <37 weeks); late preeclampsia (LPE) with preterm birth (LPE-PB) (onset, >34 weeks, and delivery, <37 weeks); and LPE with term birth (LPE-TB) (onset, >34 weeks, and delivery, ≥37 weeks) (Fig. 1b, Extended Data Fig. 1, Extended Data Table 1, Supplementary Table 1, and Methods). We applied the PEARL framework, which uses the Griffin tool22, to assess tissue-specific nucleosome profiles in cfDNA and quantify features of the nucleosome-depleted region (NDR, ±30 and ±75 bp) and the mean coverage around the site (MCV, ±1,000 bp) (Fig. 1c and Methods). These features were used in machine-learning models of the PEARL framework for (1) estimating the maternal–fetal tissue composition and FF and (2) predicting the risk of preeclampsia.
Fig. 1 Study design to predict maternal–fetal tissue composition and preeclampsia risk using cfDNA sequencing data.
a, Flow chart of clinical PDNAS samples consisting of separate and distinct training and validation cohorts. A total of 1,854 PDNAS samples from the UW clinical laboratory were included. The training cohorts were the FF-training cohort, consisting of 375 PDNAS samples with XY fetuses and 20 from non-pregnant healthy females, and the PE-training cohort, including a random sampling of PDNAS samples collected at ≤16 weeks gestation from 1 January 2019 to 31 December 2020 with enrichment for additional preeclampsia (PE) cases from 1 January 2021 to 25 April 2023 (total n = 450).
The validation cohorts were the PDNAS validation cohort (n = 831) and the multicenter external cohort (n = 141).
The PDNAS validation cohort consists of clinical PDNAS samples collected consecutively at ≤16 weeks gestation from 1 May 2017 to 31 December 2018 that met the inclusion criteria (n = 831). The multicenter external cohort comprises samples collected at ≤16 weeks gestation from two biobanks and sequenced at an independent clinical laboratory (n = 141). GA, gestational age (weeks).
b, Schematic of preeclampsia outcome classification. The x axis indicates GA (weeks), and the y axis indicates outcome. Colors show the time from clinical onset of hypertension to delivery. The box (right) indicates the number of samples included in the training and validation cohorts. The NP group in the PDNAS validation cohort includes 44 samples from preterm births that were not complicated by PE.
c, PEARL framework for estimating the tissue contribution in plasma cfDNA from PDNAS data using tissue-specific chromatin profiling from low-coverage WGS (~0.5×) to estimate FF and predict PE risk. Chromatin profiles are generated at tissue-specific accessible sites obtained from publicly available accessibility datasets. Chromatin features include nucleosome and sub-nucleosome profiles generated using fragment sizes restricted to 120–180 bp and 35–80 bp, respectively. To overcome low coverage, sequence reads were aggregated across 10,000 tissue-specific accessible sites. On the basis of the distance from accessible sites, features such as MCV (±1,000 bp) and NDR (±30 or ±75 bp) were extracted from the profiles. These features, which reflect tissue contributions, were then used to train supervised models to estimate the FF and predict PE risk.
b and c were created using BioRender (https://BioRender.com/i89p411). ATAC-seq, Assay for Transposase-Accessible Chromatin using sequencing; DNase-seq, DNase I hypersensitive sites sequencing; ChIP–seq, chromatin immunoprecipitation and sequencing.
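The outcome definitions above reduce to two gestational-age thresholds. The Python sketch below encodes them as stated in the text (onset ≤34 versus >34 weeks; delivery <37 versus ≥37 weeks). The function names are hypothetical, and returning `None` for a combination the text does not define (onset ≤34 weeks with term delivery) is our own assumption, not part of PEARL.

```python
from typing import Optional


def pe_subtype(onset_ga_weeks: float, delivery_ga_weeks: float) -> Optional[str]:
    """Classify a preeclampsia case by gestational age (weeks) at onset and delivery.

    Hypothetical helper following the definitions in the text:
      EPE    : onset <= 34 and delivery < 37
      LPE-PB : onset > 34 and delivery < 37
      LPE-TB : onset > 34 and delivery >= 37
    Returns None for combinations the text does not define.
    """
    preterm = delivery_ga_weeks < 37
    if onset_ga_weeks <= 34:
        return "EPE" if preterm else None
    return "LPE-PB" if preterm else "LPE-TB"


def is_preterm_pe(subtype: Optional[str]) -> bool:
    """Positive class for the preterm-PE prediction task (EPE or LPE-PB combined)."""
    return subtype in ("EPE", "LPE-PB")
```

For example, `pe_subtype(35.0, 36.5)` returns `"LPE-PB"`. That case counts toward the combined preterm-PE class used for the receiver operating characteristic curves in Fig. 3g–i.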
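The sketch below makes the NDR and MCV features concrete. It aggregates fragment-midpoint coverage around a set of accessible sites and summarizes the resulting profile. This is a simplified illustration, not the Griffin implementation: it omits GC-bias correction, mappability filtering, and Griffin's own normalization. It assumes fragments on a single chromosome, given as start/end arrays, and a caller-supplied per-sample background midpoint rate (`background_rate`, hypothetical). The fragment-size windows (120–180 bp and 35–80 bp) and the ±30/±75 bp and ±1,000 bp windows come from the text. Function names are illustrative.

```python
import numpy as np


def site_profile(starts, ends, sites, background_rate,
                 size_range=(120, 180), window=1000):
    """Aggregate fragment-midpoint counts at offsets -window..+window around sites.

    starts, ends: fragment coordinates on one chromosome (end exclusive).
    sites: centers of tissue-specific accessible sites on the same chromosome.
    background_rate: expected midpoints per bp for this sample (assumed to be
        supplied by the caller; no GC or mappability correction is applied).
    size_range: inclusive fragment-length filter, (120, 180) for nucleosome
        profiles or (35, 80) for sub-nucleosome profiles.
    Returns coverage normalized so that 1.0 equals the background rate.
    """
    starts = np.asarray(starts, dtype=np.int64)
    ends = np.asarray(ends, dtype=np.int64)
    lengths = ends - starts
    keep = (lengths >= size_range[0]) & (lengths <= size_range[1])
    mids = np.sort((starts[keep] + ends[keep]) // 2)
    counts = np.zeros(2 * window + 1)
    for c in np.asarray(sites, dtype=np.int64):
        lo = np.searchsorted(mids, c - window, side="left")
        hi = np.searchsorted(mids, c + window, side="right")
        np.add.at(counts, mids[lo:hi] - c + window, 1)  # offset -> array index
    return counts / (len(sites) * background_rate)


def ndr_mcv(profile, ndr_half_width=30):
    """Mean normalized coverage in the central +/-ndr_half_width bp (NDR)
    and across the whole +/-window profile (MCV)."""
    window = (len(profile) - 1) // 2
    ndr = profile[window - ndr_half_width: window + ndr_half_width + 1].mean()
    mcv = profile.mean()
    return ndr, mcv
```

In the study, reads are pooled across 10,000 tissue-specific sites to compensate for ~0.5× coverage, so `sites` would hold those coordinates. Following the interpretation in Fig. 2c,d, higher NDR and MCV values at placental sites indicate lower placental contribution, and lower values at endothelial sites indicate higher endothelial contribution.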
To study early tissue dysfunction in individuals who eventually developed preeclampsia, we used PEARL to analyze PDNAS samples, demonstrating its ability to discern cfDNA derived from distinct tissues, including placenta28 (Fig. 2a, Extended Data Fig. 2, and Methods). We confirmed that the placental-tissue contribution was detectable in samples from pregnant people and increased with each trimester (Pearson’s r = –0.42, P = 2.28 × 10⁻¹⁸), whereas the contribution of immune and endothelial cells decreased by trimester (P < 0.020; Fig. 2b and Extended Data Figs. 2–6). Next, we investigated whether early-pregnancy (≤16 weeks) PDNAS data could reveal tissue-contribution differences that are unique to individuals who developed preeclampsia. We used PDNAS samples from 450 individuals (PE-training cohort), collected at a median of 12.2 weeks (range, 8.1–16.0 weeks); 135 of these individuals eventually developed preeclampsia, and 315 had normal pregnancies (Fig. 1a,b, Extended Data Table 1, and Methods). We observed that the placental-tissue contribution was significantly lower in those who developed preeclampsia than in those with normal pregnancies (Mann–Whitney U, P < 0.037; analysis of covariance (ANCOVA), P ≤ 0.03; Fig. 2c, Extended Data Fig. 7, and Methods). By contrast, the contribution of endothelial tissue was higher in those with preeclampsia than in those with normal pregnancies (Mann–Whitney U, P ≤ 0.039; ANCOVA, P ≤ 0.012; Fig. 2d and Extended Data Fig. 7).
These results identify early placental dysfunction in individuals who develop preeclampsia and may also reflect underlying maternal endothelial dysfunction29–31.
Fig. 2 Nucleosome patterns in cfDNA discern maternal–fetal tissue composition and early tissue dysfunction in those who develop preeclampsia.
a, Nucleosome profile (left) and sub-nucleosome profile (right) of tissue-specific accessible sites derived from single-cell ATAC-seq (scATAC-seq) in 395 PDNAS samples (FF-training cohort). Immune (pink) and placenta (blue) have lower and higher NDR values in nucleosome and sub-nucleosome profiles, respectively, indicating higher tissue contribution.
b, Heatmap of normalized coverage values at NDRs for tissue-specific nucleosome (left) and sub-nucleosome (right) features derived from scATAC-seq for 20 non-pregnancy and 375 pregnancy PDNAS samples, stratified by trimester (FF-training cohort).
c, Placental nucleosome profile (left) and NDR and MCV (right) for samples from the PE-training cohort (n = 450). Higher NDR and MCV values in samples with a PE outcome indicate decreased placental contribution.
d, Endothelial nucleosome profile (left) and NDR and MCV (right) for samples from the PE-training cohort (n = 450). Lower NDR and MCV values in samples with a PE outcome indicate increased endothelial contribution. Boxes represent the IQR (25th–75th percentiles), with the horizontal line indicating the median. The whiskers extend to 1.5 × IQR on either side, with dots representing outliers. Statistical analysis was done using the two-sided Mann–Whitney U test.
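The ANCOVA comparisons reported above test whether a tissue feature differs between outcome groups after adjustment for covariates. The numpy sketch below shows that test. It compares a full linear model (intercept, group indicator, covariates) with a reduced model that lacks the group term and returns the F statistic and the adjusted group coefficient. Using gestational age as the covariate is our assumption for illustration; the covariates actually used are specified in the paper's Methods. A P value would follow from the F distribution with (1, `df_resid`) degrees of freedom, for example via `scipy.stats.f.sf`. The unadjusted comparison corresponds to `scipy.stats.mannwhitneyu(..., alternative="two-sided")`.

```python
import numpy as np


def ancova_group_test(y, group, covariates):
    """F test for a binary group effect on y, adjusting for covariates.

    y: (n,) feature values (e.g. a placental NDR value per sample).
    group: (n,) 0/1 indicator (1 = developed PE).
    covariates: (n,) or (n, k) adjustment variables (assumed, e.g. GA in weeks).
    Returns (F statistic, residual degrees of freedom, adjusted group coefficient).
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    cov = np.asarray(covariates, dtype=float).reshape(n, -1)
    intercept = np.ones((n, 1))
    grp = np.asarray(group, dtype=float).reshape(n, 1)
    X_full = np.hstack([intercept, grp, cov])   # intercept + group + covariates
    X_red = np.hstack([intercept, cov])         # same model without the group term

    def rss(X):
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        return resid @ resid, beta

    rss_full, beta_full = rss(X_full)
    rss_red, _ = rss(X_red)
    df_resid = n - X_full.shape[1]
    # One extra parameter in the full model, so the numerator has 1 df.
    f_stat = (rss_red - rss_full) / (rss_full / df_resid)
    return f_stat, df_resid, beta_full[1]
```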
Differences in maternal–fetal tissue composition in plasma cfDNA are not well understood in the context of preeclampsia. A critical early step in PDNAS is accurate quantification of the FF, which can be influenced by fetal sex, genotype, and fragment length32–34. We sought to overcome these potential biases by relying on placenta-specific signatures to determine the FF. First, we used the PEARL framework to develop a new model to estimate the FF, independent of sex, genotype, or differences in fragment length32,35,36, by integrating 1,239 features across placental and maternal tissue types, as well as 377 transcription factors (Methods). The model was developed using 395 samples: 375 PDNAS samples from pregnant women carrying male (XY) fetuses, whose FF had been determined by Y-chromosome analysis, and 20 samples from non-pregnant women (FF-training cohort; Fig. 1a and Methods). In training, the PEARL-estimated FF correlated strongly with the Y-chromosome-derived FF (Pearson’s r = 0.90), with a limit of detection of 0.039 (CI, 0.039–0.042) (Fig. 3a and Methods). We validated the model’s performance on a publicly available dataset (n = 70) in which the FF was determined by maternal–fetal genotyping for fetuses of both sexes37 (Pearson’s r = 0.97; Fig. 3b, Extended Data Fig. 8, and Supplementary Table 2), and on 238 samples from XY fetuses in the PE-training cohort (Pearson’s r = 0.79; Fig. 3c).
Fig. 3 Nucleosome patterns from PDNAS estimate FF and predict PE risk.
a, PEARL-estimated FF in the FF-training cohort, which includes 375 PDNAS samples from XY fetuses (black dots) and 20 samples from non-pregnant females (blue dots).
Performance was estimated using chromosome-Y-derived FF as the true FF, by bootstrapping with replacement (1,000 iterations) (Pearson’s r = 0.90 (CI, 0.901–0.904); root mean square error (RMSE), 0.034 (CI, 0.033–0.033)). Using the maximum predicted FF in non-pregnant samples, we estimated the limit of detection (LOD) at 0.039 (CI, 0.039–0.042), indicated by the horizontal dashed gray line.
b, The final trained model was validated in the FF-validation cohort, in which the true FF had been determined using maternal–fetal genotyping.
This cohort contains 39 samples from fetuses classified as female (pink dots), 21 as male (black dots), and 10 with unknown fetal sex (gray dots).
c, PEARL-estimated FF on 238 XY samples from the PE-training cohort.
d, PEARL-estimated FF on 462 XY samples from the PDNAS validation cohort.
e, Median PEARL-estimated FF in the PE-training (n = 450), PDNAS validation (n = 831), and external (n = 141) cohorts. Samples from preterm deliveries with PE (purple, EPE; red, LPE-PB) have a significantly lower PEARL-estimated FF than NP samples (green).
f, PEARL-estimated FF in EPE compared to NP cases, stratified by trimester, from consecutively collected PDNAS samples at all available GAs in the PDNAS validation cohort. The PEARL-estimated FF remains lower across trimesters in individuals who eventually developed EPE (purple). Boxes represent the IQR (25th–75th percentiles), with the horizontal line indicating the median. The whiskers extend to 1.5 × IQR on either side, with dots representing outliers. Statistical analysis was done using the two-sided Mann–Whitney U test.
g–i, Receiver operating characteristic curves for predicting the risk of developing PE with preterm delivery, using samples collected at ≤16 weeks GA (combined EPE and LPE-PB, gray dashed line) from the PE-training cohort (n = 450) (g), PDNAS validation cohort (n = 831) (h), and external cohort (n = 141) (i). Samples are further stratified as EPE (purple) and LPE-PB (red). Sensitivity is listed (bottom) for three set specificities. Sensitivity and AUC (95% CI) were estimated by bootstrapping with replacement (1,000 iterations).
The Mann–Whitney U test was used for pairwise comparisons.
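The Fig. 3a legend describes two evaluation steps. The first is bootstrap confidence intervals (1,000 resamples with replacement) for Pearson's r and RMSE against Y-chromosome-derived FF. The second is a limit of detection defined as the maximum FF predicted in non-pregnant samples. The numpy sketch below implements both. Percentile intervals, the fixed seed, and the function names are our assumptions, not the authors' code. The sketch gives only a point estimate for the LOD; the reported LOD interval may have been derived differently (for example, by refitting the model within each bootstrap iteration).

```python
import numpy as np


def bootstrap_ff_metrics(ff_true, ff_pred, n_iter=1000, seed=0, alpha=0.05):
    """Point estimates and percentile bootstrap CIs for Pearson's r and RMSE."""
    ff_true = np.asarray(ff_true, dtype=float)
    ff_pred = np.asarray(ff_pred, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(ff_true)
    rs, rmses = np.empty(n_iter), np.empty(n_iter)
    for k in range(n_iter):
        idx = rng.integers(0, n, size=n)  # resample samples with replacement
        t, p = ff_true[idx], ff_pred[idx]
        rs[k] = np.corrcoef(t, p)[0, 1]
        rmses[k] = np.sqrt(np.mean((t - p) ** 2))
    q = [100 * alpha / 2, 100 * (1 - alpha / 2)]
    r_hat = np.corrcoef(ff_true, ff_pred)[0, 1]
    rmse_hat = np.sqrt(np.mean((ff_true - ff_pred) ** 2))
    return (r_hat, tuple(np.percentile(rs, q))), (rmse_hat, tuple(np.percentile(rmses, q)))


def limit_of_detection(pred_nonpregnant):
    """LOD = maximum FF predicted in non-pregnant samples, whose true FF is 0."""
    return float(np.max(np.asarray(pred_nonpregnant, dtype=float)))
```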
We next assembled a cohort of 956 routine PDNAS samples, collected consecutively from May 2017 to December 2018 (PDNAS validation cohort), approximating an all-comers population that included individuals with pre-existing hypertension, diabetes, preterm birth, and other maternal medical conditions (Fig. 1a and Methods). First, applying the final model previously trained on the FF-training cohort, we confirmed the estimated FF in samples from XY fetuses in this cohort (n = 462; Pearson’s r = 0.85; Fig. 3d). We also observed that, in all three cohorts, the estimated FF was significantly lower in individuals who developed EPE than in those with normal pregnancies (Mann–Whitney U, P ≤ 0.044; Fig. 3e and Extended Data Fig. 7). Furthermore, the estimated FF increased significantly across gestation in normal pregnancies but remained lower throughout pregnancy in those with EPE (Mann–Whitney U; P = 0.0088 versus 0.91; Fig. 3f), indicating persistent, detectable tissue dysfunction throughout pregnancy.
We next investigated whether cfDNA nucleosome features can expand the utility of early-pregnancy PDNAS (≤16 weeks) to predict preeclampsia risk. Using the PEARL framework, we trained a machine-learning binary classifier on 450 PDNAS samples (PE-training cohort) to predict preeclampsia risk (Fig. 1a,b). The model incorporates two feature sets: (1) placental- and endothelial-tissue-specific nucleosome and sub-nucleosome features (NDR and MCV) and the PEARL-estimated FF; and (2) blood pressure and body mass index (BMI), both of which are easily ascertained around the time of PDNAS (Methods). The model achieved an overall training performance of 0.76 AUC (confidence interval (CI), 0.70–0.82) for predicting preeclampsia requiring preterm delivery (Fig. 3g, Extended Data Fig. 9, Supplementary Table 3, and Methods). Performance for the individual preterm subtypes was 0.80 AUC (CI, 0.72–0.87; 71% sensitivity at 80% specificity) for EPE and 0.69 AUC (CI, 0.61–0.77; 50% sensitivity) for LPE-PB.
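This passage does not name the classifier family. The numpy sketch below therefore uses L2-regularized logistic regression, fitted by batch gradient descent, purely as a stand-in. It assumes a feature matrix whose columns combine the two feature sets named above: placental and endothelial NDR/MCV values, PEARL-estimated FF, blood pressure, and BMI. The column order, hyperparameters, and function names are hypothetical. The point it illustrates is that scaling statistics and weights are fixed on the training cohort and then applied unchanged to validation cohorts.

```python
import numpy as np


def fit_logistic(X, y, lr=0.1, n_iter=2000, l2=1e-3):
    """Fit L2-regularized logistic regression by batch gradient descent.

    X: (n, p) features, e.g. hypothetical columns
       [placental NDR, placental MCV, endothelial NDR, endothelial MCV,
        PEARL FF, blood pressure, BMI].
    y: (n,) 0/1 labels (1 = preterm PE, i.e. EPE or LPE-PB).
    Features are z-scored with training-set statistics, which are stored so
    that validation cohorts are scaled identically.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    mu, sd = X.mean(axis=0), X.std(axis=0)
    sd[sd == 0] = 1.0                       # guard against constant columns
    Z = (X - mu) / sd
    w, b = np.zeros(Z.shape[1]), 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Z @ w + b)))
        w -= lr * (Z.T @ (p - y) / len(y) + l2 * w)  # mean log-loss gradient + L2 term
        b -= lr * np.mean(p - y)
    return {"mu": mu, "sd": sd, "w": w, "b": b}


def predict_risk(model, X):
    """Predicted probability of preterm PE for new samples, using the frozen model."""
    Z = (np.asarray(X, dtype=float) - model["mu"]) / model["sd"]
    return 1.0 / (1.0 + np.exp(-(Z @ model["w"] + model["b"])))
```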
We validated the final model for predicting preeclampsia risk in a cohort of 831 routine PDNAS samples collected at a median of 12.2 weeks of gestation (PDNAS validation cohort; Fig. 1a,b). The overall performance for predicting preeclampsia requiring preterm delivery was 0.85 AUC (CI, 0.80–0.90; 81% sensitivity at 80% specificity), with AUCs of 0.85 (CI, 0.77–0.94; 82% sensitivity) and 0.84 (CI, 0.78–0.91; 80% sensitivity) for EPE and LPE-PB, respectively (Fig. 3h, Extended Data Fig. 9, and Supplementary Table 4). We applied the final model to an external cohort of 141 samples (median gestation, 12.1 weeks) collected across multiple biobanks and sequenced at an external clinical laboratory (external validation cohort; Fig. 1a,b, Supplementary Table 5, and Methods). The overall performance was 0.84 AUC (CI, 0.79–0.89; 79% sensitivity at 80% specificity) for predicting preeclampsia requiring preterm delivery, with 0.92 AUC (CI, 0.88–0.96; 94% sensitivity) for EPE and 0.79 AUC (CI, 0.72–0.85; 68% sensitivity) for LPE-PB (Fig. 3i, Extended Data Fig. 9, and Supplementary Table 6). These results demonstrate that using the PEARL framework for cfDNA nucleosome accessibility analysis of early-pregnancy PDNAS data can stratify those at high risk of preeclampsia requiring preterm delivery, months before disease onset.
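The headline metrics above, AUC and sensitivity at a fixed 80% specificity, can be computed directly from risk scores, as in the numpy sketch below. AUC uses the rank (Mann–Whitney) formulation: the probability that a randomly chosen case scores above a randomly chosen control, with ties counted as one half. For the fixed-specificity threshold, we assume the smallest control score at which at least the target fraction of controls score at or below it. The paper's exact threshold rule is given in its Methods. The bootstrap helper mirrors the 1,000-iteration resampling described for Fig. 3g–i. It skips resamples that lack either a case or a control; that handling, the seed, and all function names are our assumptions.

```python
import numpy as np


def auc(scores, labels):
    """AUC as P(case score > control score), with ties counted as 1/2."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    pos, neg = scores[labels == 1], scores[labels == 0]
    diff = pos[:, None] - neg[None, :]      # all case-control pairs
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (len(pos) * len(neg))


def sensitivity_at_specificity(scores, labels, specificity=0.80):
    """Sensitivity at the lowest threshold giving at least the target specificity."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    pos, neg = scores[labels == 1], np.sort(scores[labels == 0])
    k = int(np.ceil(specificity * len(neg) - 1e-9))  # controls that must score <= threshold
    threshold = neg[max(k, 1) - 1]
    return np.mean(pos > threshold), threshold


def bootstrap_auc_ci(scores, labels, n_iter=1000, seed=0, alpha=0.05):
    """Percentile bootstrap CI for AUC (resampling samples with replacement)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    vals = []
    while len(vals) < n_iter:
        idx = rng.integers(0, len(scores), size=len(scores))
        if labels[idx].min() == labels[idx].max():
            continue  # resample lacks a case or a control; AUC undefined
        vals.append(auc(scores[idx], labels[idx]))
    return tuple(np.percentile(vals, [100 * alpha / 2, 100 * (1 - alpha / 2)]))
```

For example, with case scores [0.9, 0.8, 0.7, 0.3] and ten control scores from 0.1 to 0.85, the 80% specificity threshold is the eighth-lowest control score. Sensitivity is the fraction of cases scoring strictly above it.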
To study early tissue dysfunction in individuals who eventually developed preeclampsia, we used PEARL to analyze PDNAS samples, demonstrating its ability to discern cfDNA derived from distinct tissues, including placenta28 (Fig. 2a, Extended Data Fig. 2, and. Methods). We confirmed that the contribution of placental tissue in PDNAS samples was detected in samples from pregnant people and increased in each trimester (Pearson’s r = –0.42, P = 2.28 × 10−18), whereas the contribution of immune and endothelial cells decreased by trimester (P < 0.020; Fig. 2b and Extended Data Figs. 2–6). Next, we investigated whether early-pregnancy (≤16 weeks) PDNAS data could reveal tissue-contribution differences that are unique to individuals who developed preeclampsia. We used PDNAS samples from 450 individuals (PE-training cohort), collected at a median of 12.2 weeks (range, 8.1–16.0 weeks); 135 of these individuals eventually developed preeclampsia, and 315 had normal pregnancies (Fig. 1a,b, Extended Data Table 1, and Methods). We observed that the placental-tissue contribution was significantly lower in those who developed preeclampsia than in those with normal pregnancies (Mann–Whitney U, P < 0.037; analysis of covariance (ANCOVA), P ≤ 0.03; Fig. 2c, Extended Data Fig. 7, and Methods). By contrast, we found that the contribution of endothelial tissue was higher in those with preeclampsia than in those with normal pregnancies (Mann–Whitney U, P ≤ 0.039; ANCOVA, P ≤ 0.012; Fig. 2d and Extended Data Fig. 7). These results identify the presence of early placental dysfunction in individuals who develop preeclampsia, and may also reflect underlying maternal endothelial dysfunction29–31.Fig. 
2Nucleosome patterns in cfDNA discern maternal–fetal tissue composition and early tissue dysfunction in those who develop preeclampsia.a, Nucleosome profile (left) and sub-nucleosome profile (right) of tissue-specific accessible sites derived from single-cell ATAC-seq (scATAC-seq) in 395 PDNAS samples (FF-training cohort). Immune (pink) and placenta (blue) have lower and higher NDR values in nucleosome and sub-nucleosome profiles, respectively, indicating higher tissue contribution.
Differences in maternal–fetal tissue composition in plasma cfDNA are not well understood in the context of preeclampsia. A critical early step in PDNAS is accurate quantification of the FF, which can be influenced by fetal sex, genotype, and fragment length32–34. We sought to overcome these potential biases by relying on placental-specific signatures to determine the FF. First, we used the PEARL framework to develop a new model to estimate the FF, independent of sex, genotype, or differences in fragment length32,35,36, by integrating 1,239 features across placental and maternal tissue types, as well as 377 transcription factors (Methods). The model was developed using 395 PDNAS samples from 375 pregnant women with male (XY) fetuses, whose FF had been determined by Y-chromosome analysis, and 20 non-pregnant women (FF-training cohort; Fig. 1a and Methods). In training, PEARL-estimated FF correlated strongly with chrY-derived FF (Pearson’s r = 0.90), with a limit of detection of 0.039 (CI, 0.039–0.042) (Fig. 3a and Methods). We validated the model’s performance on a publicly available dataset (n = 70) in which the FF was determined by maternal–fetal genotyping for fetuses of both sexes37 (Pearson’s r = 0.97; Fig. 3b, Extended Data Fig. 8, and Supplementary Table 2), and on 238 XY fetuses in the PE-training cohort (Pearson’s r = 0.79; Fig. 3c).
Fig. 3: Nucleosome patterns from PDNAS estimate FF and predict PE risk. a, PEARL-estimated FF in the FF-training cohort, which includes 375 PDNAS samples from XY fetuses (black dots) and 20 samples from non-pregnant females (blue dots). Performance was estimated using chromosome-Y-derived FF as the true FF in a bootstrapping with replacement framework with 1,000 iterations (Pearson’s r = 0.90 (CI, 0.901–0.904); root mean square error (RMSE), 0.034 (CI, 0.033–0.033)). Using the maximum predicted FF in non-pregnancy samples, we estimated the limit of detection (LOD) at 0.039 (CI, 0.039–0.042), indicated by the horizontal dashed gray line.
b, The final trained model was validated in the FF-validation cohort, in which the true FF had been determined using maternal–fetal genotyping.
We next investigated whether cfDNA nucleosome features can expand the utility of early-pregnancy PDNAS (≤16 weeks) to predict preeclampsia risk. Within the PEARL framework, we used 450 PDNAS samples (PE-training cohort) to train a machine-learning binary classifier to predict preeclampsia risk (Fig. 1a,b). The model incorporates two feature sets: (1) placental- and endothelial-tissue-specific nucleosome and subnucleosome features (NDR and MCV) and the PEARL-estimated FF; and (2) blood pressure and body mass index (BMI), both of which are easily ascertained around the time of PDNAS (Methods). The model achieved an overall training performance of 0.76 AUC (confidence interval (CI), 0.70–0.82) for predicting preeclampsia requiring preterm delivery (Fig. 3g, Extended Data Fig. 9, Supplementary Table 3, and Methods). Performance for the specific preterm subtypes was 0.80 AUC (CI, 0.72–0.87; 71% sensitivity at 80% specificity) for EPE and 0.69 AUC (CI, 0.61–0.77; 50% sensitivity) for LPE-PB.
Here we provide evidence that nucleosome profiling can be applied to clinically obtained low-coverage cfDNA sequence data to identify distinctive features among those who develop preeclampsia. We detected evidence of endothelial-tissue damage (placental and/or maternal) in early pregnancy, a major feature of preeclampsia pathogenesis30,31,38, and decreased contributions from placental tissue. These findings informed PEARL for predicting preeclampsia risks months before the onset of clinical symptoms. Currently available approaches, such as the Fetal Medicine Foundation first-trimester algorithm, outperform clinical risk factors alone (AUC 0.5 to 0.9)5,7. This algorithm incorporates several maternal features, laboratory tests, and ultrasound measures (for example, racial categorization, levels of placental growth factor, and uterine artery pulsatility index)5,7, but can be challenging to implement and is not translatable to all populations7,39. Our approach, however, reliably predicts preeclampsia risk using cfDNA nucleosome profiling of placenta and endothelial tissue, with the addition of only two simple clinical parameters: blood pressure and BMI.
Screening for preeclampsia is most useful if effective interventions exist. Although the benefits of aspirin prophylaxis generally outweigh any risks40–43, improved precision in risk stratification can eliminate unnecessary exposure. Seminal clinical trials indicate that stricter blood-pressure control can safely and effectively decrease the incidence of severe hypertension, preeclampsia, and hypertension-associated preterm birth44–46. False positive predictions from a potential screening approach in this clinical setting generally carry low risk, because these interventions have established safety profiles. The favorable performance of our prediction model could facilitate risk stratification during early pregnancy to test these interventions.
Although recent methods that utilize plasma-derived cfRNA and cfDNA can also facilitate preeclampsia prediction15–17, these approaches require specialized sequencing assays that are challenging to implement, even in high-income settings. Given the success of aneuploidy screening in prenatal medicine, our method is advantageous because it uses an existing clinical assay without requiring additional sampling or sequencing. Because this approach still requires transporting samples to sequencing facilities, which are typically available only in high-income settings20, further innovations are needed to facilitate implementation in low- and middle-income countries.
Prior studies using PDNAS cfDNA sequence data either did not investigate adverse pregnancy outcomes or were not generalized to an unselected population47,48. By contrast, we focused on preeclampsia risk prediction, demonstrating favorable performance in EPE and LPE-PB cases, with modest performance in LPE-TB cases. Furthermore, we analyzed consecutively obtained PDNAS samples with broad inclusion criteria, approximating true preeclampsia prevalence and patient variability. Importantly, the PDNAS validation cohort did not exclude patients with pre-existing hypertension, diabetes, or other maternal medical conditions and included pregnancies ending in preterm birth that were not complicated by preeclampsia, closely approximating a real-world cohort and strengthening the validity of our findings of cfDNA tissue signatures specific to preeclampsia. This broad, inclusive approach, coupled with our favorable results, sets the stage for larger studies to validate the clinical utility of this method in a prospective manner in which pregnancy outcomes are unknown.
We demonstrate that nucleosome profiling of cfDNA sequence data derived from PDNAS can accurately estimate FF, overcoming some current limitations, and can be expanded to include risk stratification for the development of preeclampsia requiring preterm delivery, a current unmet need. Importantly, our approach can be applied to low-coverage standard sequencing data, highlighting the potential for scalability, cost-effectiveness, and lower barriers for clinical implementation.
Institutional Review Board approval was obtained from UW to use PDNAS sequence data housed at our institution (STUDY0001636, STUDY00005540, STUDY00014943) and samples obtained from biorepositories for the external validation cohort (STUDY00004878, STUDY00017462). Informed consent was obtained as required by UW Institutional Review Board policies, or per the policies at the external biorepository sites. PDNAS samples for our study were primarily derived from individuals seeking prenatal care and/or genetic screening through our institution, which is a tertiary referral center covering a five-state region. The external validation cohort sequenced elsewhere included samples with patient demographics different from those of our cohort. We recognize the lack of geographic representation of low- and middle-income countries (LMICs) in this proof-of-concept study. We acknowledge that LMIC representation in large prospective studies is crucial to determine whether our findings are widely applicable and can benefit all pregnant people.
We designated a pregnancy as normal if it was a singleton gestation; continued past 24 weeks; was not complicated by preeclampsia (including postpartum preeclampsia), gestational hypertension, maternal transplant, or maternal malignancy; and did not have aneuploidy or a major fetal anomaly. In the PE-training cohort (n = 450), we included only those with pregnancies ending in a term delivery (≥37 weeks) in the normal group. In the PDNAS validation cohort (n = 831), however, we included cases with preterm delivery (<37 weeks) in the normal group. To assemble cohorts more representative of a real-world clinical population, we did not exclude individuals with pre-existing hypertension, diabetes, or other maternal medical conditions (for example, renal disease or systemic lupus erythematosus) from any of the three cohorts (PE-training, PDNAS validation, or external).
Preeclampsia diagnosis was based on the American College of Obstetricians and Gynecologists criteria in pregnancies continuing past 20 weeks gestation1. We excluded non-singleton gestations, cases of chronic hypertension without superimposed preeclampsia (retained in the normal cohort), postpartum preeclampsia, gestational hypertension, maternal transplant, maternal malignancy, and pregnancies complicated by aneuploidy or major fetal anomaly. Preeclampsia was designated as EPE if diagnosis occurred at or before 34 weeks gestation and delivery was preterm (<37 weeks), or as LPE if diagnosis was confirmed after 34 weeks gestation. For clinical significance, LPE was further classified as LPE-PB or LPE-TB, using a gestational age at delivery of <37 weeks to define preterm birth. Medical records were reviewed by trained maternal–fetal medicine providers (T.K. and R.S.) to confirm the preeclampsia diagnosis. For preeclampsia prediction, we primarily focused our results on EPE and LPE-PB, because these cases required preterm deliveries, which are associated with a high risk of neonatal morbidity and mortality49. All samples are from individuals of female sex based on the designation in the electronic health record. No information on gender identity was collected.
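The subtype definitions above reduce to two gestational-age cut-offs. The following Python sketch encodes them as stated; the function and argument names are hypothetical, gestational ages are assumed to be given in weeks as decimals, and the one combination the stated rules do not cover (diagnosis at or before 34 weeks with term delivery) is returned as unclassified rather than guessed.

```python
from typing import Optional


def classify_preeclampsia(ga_diagnosis_wk: float, ga_delivery_wk: float) -> Optional[str]:
    """Assign a preeclampsia case to EPE, LPE-PB, or LPE-TB (hypothetical helper).

    Rules as stated in the text:
      * EPE: diagnosis at or before 34 weeks and preterm delivery (<37 weeks).
      * LPE: diagnosis after 34 weeks; LPE-PB if delivery <37 weeks, else LPE-TB.
    Diagnosis at <=34 weeks with delivery at >=37 weeks is not covered by the
    stated rules, so this sketch returns None (unclassified) for that case.
    """
    preterm = ga_delivery_wk < 37
    if ga_diagnosis_wk <= 34:
        return "EPE" if preterm else None
    return "LPE-PB" if preterm else "LPE-TB"


def requires_preterm_delivery(subtype: Optional[str]) -> bool:
    """EPE and LPE-PB are the preterm phenotypes prioritized for prediction."""
    return subtype in ("EPE", "LPE-PB")
```

Keeping the unclassified case explicit makes any cohort-assembly script fail loudly on records that the written criteria do not resolve.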
Since May 2017, UW has clinically used an internally developed PDNAS assay that employs low-pass WGS of cfDNA extracted from maternal plasma to provide a high-quality screen for autosomal and sex-chromosome aneuploidies. A total of 1,854 clinical PDNAS samples were included in this study; samples were divided as outlined in Fig. 1. The FF-training cohort consists of 375 PDNAS samples with XY fetuses and 20 from healthy non-pregnant females sequenced using the same clinical assay (total n = 395). The FF-training cohort was used to train the PEARL-estimated FF model. The PE-training cohort was assembled by randomly selecting approximately 10% of samples that met inclusion criteria and were clinically obtained from 1 January 2019 to 31 December 2020, with enrichment of preeclampsia cases from samples clinically obtained from 1 January 2021 to 25 April 2023. From these, we excluded 54 samples that were collected after 16 weeks gestation (total n = 450). The PDNAS validation cohort initially consisted of 956 samples and resembles an all-comers population because it includes all consecutively obtained clinical samples from 1 May 2017 to 31 December 2018 that met the broad inclusion criteria as detailed above. Notably, the normal group in this cohort included all term and preterm deliveries not complicated by preeclampsia. After excluding 125 samples that were collected after 16 weeks gestation, the final PDNAS validation cohort used for preeclampsia risk prediction consisted of 831 samples.
Samples used for training (FF-training cohort for FF estimation and PE-training cohort for preeclampsia risk prediction) were distinct and separate from those used for validation of each of these models.
For external validation of the PEARL-estimated FF model, we used published data with single-nucleotide polymorphism (SNP)-based quantification of FF from pregnancies with male and female fetuses (n = 70)37. These samples were sequenced to a median depth of 4×, enabling down-sampling analysis to assess the effect of sequence depth on FF determination. Down-sampling was performed at depths of 0.1×, 0.5×, 1×, and 1.5×. Sequencing read duplicates were first marked on the aligned BAM file using Picard Tools (v2.18.29, http://broadinstitute.github.io/picard) MarkDuplicates, and down-sampling was then performed using DownsampleSam. For external validation of the preeclampsia risk prediction model, we assembled 146 samples (stored plasma or extracted cfDNA) from two biobanks (the Magee Obstetrical Maternal Infant Database and Biobank and the UW Pregnancy Research Biobank) enriched for preeclampsia cases (n = 16 EPE, n = 23 LPE-PB, n = 23 LPE-TB, and n = 79 normal controls). These samples were sequenced at an external clinical laboratory (the Broad Institute), which is accredited by the Clinical Laboratory Improvement Amendments and The College of American Pathologists (CLIA/CAP), at a low-pass sequence depth (mean coverage, 2.3×; IQR, 1.19–3.11).
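The down-sampling described above amounts to keeping each read pair with probability equal to the target depth divided by the current depth. A minimal Python sketch, assuming each sample's current depth is known (the 4× median would only be an approximation for any single sample) and that Picard is invoked through a `picard` wrapper script on the PATH; the helper names and file names are hypothetical, and the command uses Picard's legacy `I=`/`O=`/`P=` argument style for `DownsampleSam`.

```python
from typing import Dict, List


def downsample_probability(current_depth: float, target_depth: float) -> float:
    """Fraction of reads to retain so coverage falls from current_depth to target_depth."""
    if current_depth <= 0 or target_depth <= 0:
        raise ValueError("depths must be positive")
    if target_depth > current_depth:
        raise ValueError("target depth exceeds current depth; cannot down-sample")
    return target_depth / current_depth


def downsample_command(in_bam: str, out_bam: str, probability: float) -> List[str]:
    """Build a Picard DownsampleSam call (legacy I=/O=/P= syntax).

    DownsampleSam keeps each read pair (template) with probability P, so the
    retained fraction, and hence the achieved depth, is approximate.
    Assumes a `picard` wrapper script is available on the PATH.
    """
    return ["picard", "DownsampleSam", f"I={in_bam}", f"O={out_bam}", f"P={probability:.4f}"]


def plan_downsampling(sample: str, current_depth: float, targets: List[float]) -> Dict[float, List[str]]:
    """One command per target depth, starting from a duplicate-marked BAM (hypothetical names)."""
    plan = {}
    for target in targets:
        p = downsample_probability(current_depth, target)
        plan[target] = downsample_command(f"{sample}.markdup.bam", f"{sample}.{target}x.bam", p)
    return plan
```

For a sample at exactly 4×, the targets 0.1×, 0.5×, 1× and 1.5× correspond to retention probabilities of 0.025, 0.125, 0.25 and 0.375.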
For PDNAS samples sequenced through the UW clinical laboratory, libraries were generated using a validated, laboratory-developed method that has been previously reported50,51. In brief, whole blood from Streck (BCT1) tubes was centrifuged, and plasma was isolated following the instructions on the package insert. cfDNA was extracted using the QIAsymphony Circulating DNA Kit (Qiagen). Following DNA quantification, next-generation sequencing library preparation was performed using the KAPA HyperPrep kit for adapter and index ligation. Libraries were purified before and after amplification using the AMPureXP kit (Beckman Coulter). Sample pools were created using an equimolar strategy and diluted to 1 nM. Sequencing was performed using an Illumina NextSeq 500 with a 37-bp paired-end read configuration.
All sequence data (PDNAS samples and external validation samples) were aligned to the reference genome hg38 (GRCh38, https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_000001405.26/). Raw fastq files were adapter-trimmed using the cutadapt tool (v3.3)52. The paired reads were then aligned using the BWA-MEM algorithm (v0.7.17) and further processed using SAMtools (v1.10) to sort and index the resulting BAM file53,54. Picard’s (v2.18.29, http://broadinstitute.github.io/picard) MarkDuplicates function was used to mark duplicates, and Picard’s CollectAlignmentSummaryMetrics, CollectWgsMetrics, and CollectInsertSizeMetrics were used to calculate quality-control (QC) sequencing metrics.
The UW clinical lab uses the following procedures for determining the FF for PDNAS samples from pregnancies with male fetuses. The chromosome-Y-based FF was calculated on the basis of the percentage of reads that map to chromosome Y (chrY) using the formula:
$$\text{ChrY\%} = \frac{\Sigma(\text{Reads mapped to chromosome Y})}{\Sigma(\text{All reads})}$$
To adjust for reads erroneously mapped to chrY, a baseline set of male samples and samples with known female fetuses were used to estimate the average percentage of reads mapping to chromosome Y:
$$\text{Average male ChrY\%} = \frac{\Sigma\{\text{Male ChrY\%}\}}{n}$$
$$\text{Average female ChrY\%} = \frac{\Sigma\{\text{Female ChrY\%}\}}{n}$$
Average male ChrY% is the percentage of reads mapped to chromosome Y from male donors. For average female ChrY%, samples with known female fetuses were used. Finally, the FF is calculated by adjusting for the average erroneously mapped reads:
$$\text{ChrY-FF} = \frac{\text{ChrY\%} - \text{Average female ChrY\%}}{\text{Average male ChrY\%} - \text{Average female ChrY\%}}$$
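The chrY-based FF formulas above can be computed directly from read counts and the two baseline averages. A minimal Python sketch with hypothetical function names; percentages are kept as fractions, which leaves ChrY-FF unchanged because any common scale factor cancels in the ratio.

```python
from statistics import mean
from typing import Sequence


def chry_fraction(reads_chry: int, reads_total: int) -> float:
    """ChrY%: reads mapped to chromosome Y over all reads (kept as a fraction)."""
    if reads_total <= 0:
        raise ValueError("reads_total must be positive")
    return reads_chry / reads_total


def chry_fetal_fraction(
    sample_chry: float,
    male_baseline: Sequence[float],
    female_baseline: Sequence[float],
) -> float:
    """ChrY-FF = (ChrY% - avg female ChrY%) / (avg male ChrY% - avg female ChrY%).

    male_baseline: ChrY% values from male donors.
    female_baseline: ChrY% values from samples with known female fetuses,
    which estimate the background of reads erroneously mapped to chrY.
    """
    avg_male = mean(male_baseline)
    avg_female = mean(female_baseline)
    if avg_male <= avg_female:
        raise ValueError("male baseline must exceed female baseline")
    return (sample_chry - avg_female) / (avg_male - avg_female)
```

A sample whose ChrY% equals the female baseline yields an FF of 0, and one equal to the male baseline yields 1, which is the intended scaling.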
Tissue-specific DNase I hypersensitive sites (DHSs) were obtained from the ENCODE regulatory index55 to construct 16 tissue-specific open chromatin maps (https://www.meuleman.org/research/dhsindex/). The top 10,000 DHS summits from each tissue were selected on the basis of ranking by highest recurrence across experiments (‘numsamples’) and mean DHS signal (‘mean_signal’). Additionally, 19 tissue-specific open chromatin maps were generated from published single-cell assay for transposase-accessible chromatin sequencing (scATAC-seq) data (http://catlas.org/humanenhancer)56. For these open chromatin sites, the top 10,000 peaks were selected by peak scores. Transcription-factor-binding sites (TFBSs) for 377 transcription factors were obtained from the Gene Transcription Regulation Database (GTRD)57, and the top 10,000 TFBSs were filtered on the basis of recurrence across experiments, as described in our previous study22.
The PEARL framework starts with the generation of nucleosome coverage profiles using Griffin v0.2.0 (https://github.com/GavinHaLab/Griffin/releases/tag/v0.2.0) to compute a composite coverage profile, aggregating across sets of sites for each tissue and cell type or sets of TFBSs for each transcription factor22. Coverage profiles were generated for a ±2-kb window flanking the peak summit (for open chromatin) or binding site (for TFBS). Coverage profiles were further restricted to specific cfDNA reads on the basis of fragment size: nucleosome profile (120–180 bp) and subnucleosomal profile (35–80 bp). From the coverage profiles, two features were extracted: the NDR, defined as the mean Griffin-normalized coverage value ±30 bp and ±75 bp from the summit or binding site, and MCV, defined as the mean Griffin-normalized coverage value ±1,000 bp from the summit or binding site. For single nucleosome profiles, lower values correspond to increased accessibility and increased tissue contribution. For sub-nucleosome profiles, higher values correspond to increased accessibility and increased tissue contribution. These features are used for tissue of origin, FF estimation, and PE risk prediction in the PEARL framework.
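The fragment-size split and the NDR and MCV window features can be sketched as follows. This is not Griffin's implementation: it assumes a composite profile that has already been Griffin-normalized and is represented, hypothetically, as a dictionary from offset (bp relative to the summit or binding site) to coverage, with inclusive window and fragment-length boundaries.

```python
from statistics import mean
from typing import Dict, Optional

# Fragment-length ranges (bp) stated in the text; inclusive bounds are assumed.
NUCLEOSOME_BP = (120, 180)
SUBNUCLEOSOME_BP = (35, 80)


def profile_for_fragment(length_bp: int) -> Optional[str]:
    """Route a cfDNA fragment to the nucleosome or sub-nucleosome profile, or neither."""
    if NUCLEOSOME_BP[0] <= length_bp <= NUCLEOSOME_BP[1]:
        return "nucleosome"
    if SUBNUCLEOSOME_BP[0] <= length_bp <= SUBNUCLEOSOME_BP[1]:
        return "subnucleosome"
    return None


def window_mean(profile: Dict[int, float], half_width_bp: int) -> float:
    """Mean normalized coverage at offsets within +/- half_width_bp of the site center."""
    values = [cov for offset, cov in profile.items() if abs(offset) <= half_width_bp]
    if not values:
        raise ValueError(f"no profile points within +/-{half_width_bp} bp")
    return mean(values)


def ndr_mcv(profile: Dict[int, float]) -> Dict[str, float]:
    """NDR at +/-30 and +/-75 bp and MCV at +/-1,000 bp, as defined in the text."""
    return {
        "NDR_30": window_mean(profile, 30),
        "NDR_75": window_mean(profile, 75),
        "MCV": window_mean(profile, 1000),
    }
```

The sketch only computes the numbers; their interpretation depends on the profile type, since a lower nucleosome-profile NDR and a higher sub-nucleosome-profile NDR both indicate greater accessibility.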
We investigated overall tissue contributions in maternal plasma in 375 PDNAS samples from pregnant individuals in the FF-training cohort. We then investigated tissue contributions across gestation in the FF-training cohort, comprising 237 first trimester (≤14 weeks gestation), 122 second trimester (14–28 weeks gestation), and 16 third trimester (>28 weeks gestation) samples, compared to 20 non-pregnant female samples. We also performed Pearson’s r correlation analysis between gestational age at time of PDNAS for each tissue-specific feature (NDR and MCV) from sites derived from both scATAC-seq and DNase-seq. Lastly, we used Pearson’s r correlation analysis to evaluate the effect of BMI on tissue-specific features.
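The trimester binning and the correlation analysis above can be sketched in plain Python. The boundary handling is an assumption: the stated ranges overlap at 14 weeks, and this sketch assigns exactly 14 weeks to the first trimester, following "≤14 weeks". The coefficient is computed from its definition; p-values would come from a library routine such as `scipy.stats.pearsonr`.

```python
import math
from typing import Sequence


def trimester(ga_weeks: float) -> int:
    """First: <=14 weeks; second: >14 to 28 weeks; third: >28 weeks (boundary choice assumed)."""
    if ga_weeks <= 14:
        return 1
    if ga_weeks <= 28:
        return 2
    return 3


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient, e.g. gestational age versus one NDR or MCV feature."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)
```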
To investigate placental and endothelial tissue contributions in early-pregnancy plasma that are associated with preeclampsia, we compared PDNAS samples from the PE-training cohort collected at ≤16 weeks gestation from normal pregnancies (n = 315) with 135 preeclampsia cases (38 EPE, 35 LPE-PB, and 62 LPE-TB). We performed the Mann–Whitney U test (scipy.stats.mannwhitneyu, v1.11.0) with Bonferroni correction (alpha = 0.05, statsmodels.stats.multitest.multipletests, v0.13.2). To account for BMI and FF, which are known to influence PDNAS test characteristics51,58, we also performed analysis of covariance (ANCOVA) using pingouin (v0.5.3) by including these as covariates59.
We used 395 PDNAS samples (FF-training cohort), consisting of 375 samples from pregnancies with XY fetuses and 20 from non-pregnant healthy females, to train a model for FF estimation using our nucleosome profiling method, termed PEARL-estimated FF. Feature selection, training, and testing were performed in a cross-validation framework using bootstrapping with replacement over 1,000 iterations. In each iteration, training samples were drawn with replacement, and the samples not drawn formed the hold-out set for that iteration. The top placental-specific features were identified from the training set by analyzing 16 tissue-specific DHSs, 20 tissue-specific scATAC sites, and 377 TFBS nucleosome and sub-nucleosome features (nucleosome, MCV and NDR; sub-nucleosome, NDR; total, 1,652) between samples from non-pregnant controls (n = 20) and pregnancy samples with a high (>0.25) chromosome Y FF (chrY-FF) using the Mann–Whitney U test (scipy.stats.mannwhitneyu, v1.11.0) with Bonferroni correction (alpha = 0.05, statsmodels.stats.multitest.multipletests, v0.13.2). Statistically significant (P < 0.05) features were used in a supervised Bayesian Ridge regression framework (sklearn BayesianRidge, v1.1.1) to train a linear model for estimating the FF as a continuous value, using the chrY-FF value as the true value. In each iteration, the trained model was applied to the hold-out set, and the RMSE and Pearson’s r were calculated. The model was also applied to the non-pregnant hold-out samples, and the limit of detection (LOD) was calculated as the maximum predicted chrY-FF from any non-pregnancy sample.
The overall training performance was reported as the average for the RMSE, Pearson’s r, and LOD across 1,000 iterations with 95% CIs.
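The out-of-bag bootstrap described above can be sketched as follows. The model-fitting step is abstracted as a hypothetical `fit_predict(train_idx, holdout_idx)` callable (in the study this would encompass feature selection and Bayesian Ridge regression), and the 95% interval is formed as a percentile interval over iterations, which is an assumption about how the reported CIs were computed.

```python
import math
import random
from statistics import mean, quantiles
from typing import Callable, List, Sequence, Tuple


def bootstrap_split(n: int, rng: random.Random) -> Tuple[List[int], List[int]]:
    """Draw n indices with replacement; indices never drawn form the hold-out set."""
    train = [rng.randrange(n) for _ in range(n)]
    drawn = set(train)
    holdout = [i for i in range(n) if i not in drawn]
    return train, holdout


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return math.sqrt(mean((a - b) ** 2 for a, b in zip(y_true, y_pred)))


def limit_of_detection(predicted_ff_nonpregnant: Sequence[float]) -> float:
    """LOD: the maximum FF predicted for any non-pregnant hold-out sample."""
    return max(predicted_ff_nonpregnant)


def run_bootstrap(
    y_true: Sequence[float],
    fit_predict: Callable[[List[int], List[int]], List[float]],
    metric: Callable[[Sequence[float], Sequence[float]], float],
    n_iter: int = 1000,
    seed: int = 0,
) -> Tuple[float, Tuple[float, float]]:
    """Mean hold-out metric across iterations and a percentile 95% interval."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_iter):
        train, holdout = bootstrap_split(len(y_true), rng)
        if not holdout:
            continue  # vanishingly rare for realistic n; skip the iteration
        preds = fit_predict(train, holdout)
        scores.append(metric([y_true[i] for i in holdout], preds))
    cuts = quantiles(scores, n=40)  # cut points at 2.5% steps
    return mean(scores), (cuts[0], cuts[-1])
```

Because each hold-out set contains only samples absent from that iteration's training draw (roughly a third of the cohort on average), every reported metric is computed on data the model did not see.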
The final model for estimating FF using PEARL was then trained on all 395 PDNAS samples. First, using all 1,239 features, 141 features were identified from all non-pregnancy samples (n = 20) and pregnancy samples with a high chrY-FF (>0.25) (n = 25). A supervised Bayesian Ridge regression model was then trained using the 141 features across all 395 samples. The final model was then applied to a separate external validation cohort (FF-validation cohort, n = 70) and all preeclampsia cohorts (PE-training cohort and PDNAS validation cohort). Performance metrics were calculated similarly to above, using bootstrapping with replacement across 1,000 iterations.
We accessed data for 70 pregnancy samples from an external publicly available dataset (Jiang et al.)37, in which the FF was confirmed using maternal–fetal SNP genotyping. Fetal sex was predicted using the chromosome-Y ratio for samples with FF > 0.10 (39 female fetuses, 21 male fetuses, and 10 with unknown sex). These samples were sequenced to a median depth of 4×, allowing for down-sampling analysis to compare the effect of sequence depth on FF determination using PEARL. Nucleosome-based tissue profiles were then generated for all tissue and cell types and TFBSs using Griffin. NDR and MCV scores were generated as described above. The final trained model (trained on the 395 samples in the FF-training cohort) was applied to these externally sequenced samples and performance metrics, such as RMSE and Pearson’s r, were computed.
A total of 450 PDNAS samples (collected ≤16 weeks gestation, with confirmed clinical outcome) were part of the PE-training cohort, which included 135 cases of preeclampsia and 315 normal pregnancies (Fig. 1a,b). We considered three feature sets: (1) placental- and endothelial-tissue-specific chromatin profiles (nucleosome, MCV and NDR; sub-nucleosome, NDR) and FF as estimated by PEARL; (2) systolic blood pressure, diastolic blood pressure and BMI; and (3) the combination of sets (1) and (2). We included only BMI and blood pressure as clinical parameters because they are easily ascertained, which minimizes clinician burden, especially because some conditions (for example, systemic lupus erythematosus) require complex diagnostic confirmation. Furthermore, we aimed to identify markers in cfDNA that demonstrate favorable performance without the need for multiple clinical inputs.
Models were trained and tested using bootstrapping with replacement (1,000 iterations). Three models were trained for each feature set: l1 logistic regression, l2 logistic regression (sklearn LogisticRegression with class weight balanced, v1.1.1), and XGBoost (xgboost XGBClassifier, number of estimators = 10 and max depth = 5, v1.6.1). These models were then combined into a soft ensemble model (mean probability across the three models). In each iteration, the model was trained as a supervised binary classifier using all normal and preeclampsia samples from the respective training cohort. The trained model was applied to hold-out test samples, and the final predicted probability was taken as the mean of the three classifiers for each feature set. In each bootstrap iteration, the following were computed: AUC, false positive rate (FPR), true positive rate (TPR), and sensitivity at a set specificity of 0.7, 0.8, and 0.9. A positive case is defined as an individual with preeclampsia (EPE, LPE-PB, or LPE-TB), and a negative case as an individual with a normal pregnancy.
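The soft ensemble and the evaluation metrics can be sketched in plain Python. Averaging per-sample probabilities across classifiers follows the text; AUC is computed as the probability that a randomly chosen positive outscores a randomly chosen negative (ties count one half); and sensitivity at a fixed specificity uses one conservative thresholding convention (the smallest negative-score threshold giving at least the target specificity), which may differ slightly from interpolating an ROC curve. Function names are hypothetical.

```python
import math
from statistics import mean
from typing import List, Sequence


def soft_ensemble(per_model_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average P(preeclampsia) across classifiers; each inner list is one model's output."""
    return [mean(sample_probs) for sample_probs in zip(*per_model_probs)]


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """P(random positive scores above random negative); ties count one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_at_specificity(
    labels: Sequence[int], scores: Sequence[float], target_specificity: float
) -> float:
    """TPR when the threshold keeps at least target_specificity of negatives at or below it."""
    if not 0.0 <= target_specificity <= 1.0:
        raise ValueError("target_specificity must lie in [0, 1]")
    neg = sorted(s for y, s in zip(labels, scores) if y == 0)
    pos = [s for y, s in zip(labels, scores) if y == 1]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    # Number of negatives that must score at or below the threshold (small
    # epsilon guards against float products such as 0.7 * 10 rounding upward).
    k = math.ceil(target_specificity * len(neg) - 1e-9)
    threshold = neg[k - 1] if k > 0 else -math.inf
    return sum(s > threshold for s in pos) / len(pos)
```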
teration of bootstrapping, the following was computed: AUC, false positive rate (FPR), true positive rate (TPR), and sensitivity for a set specificity of 0.7, 0.8, and 0.9. A positive case is defined as an individual with preeclampsia (EPE, LPE-PB, or LPE-TB), and a negative case is an individual with normal pregnancy. The final model for predicting risk for preeclampsia was then trained using all 450 PDNAS samples in the PE-training cohort. This final trained model was then applied to the PDNAS validation cohort (n = 831 samples) and the multicenter external cohort (n = 141 samples). Performance metrics were calculated similarly to above, using bootstrapping with replacement across 1,000 iterations. Internal validation of the preeclampsia prediction model was performed on 831 samples (PDNAS validation cohort), distinct from those used to train the model (PE-training cohort). The study team (except for T.K. and R.S., who adjudicated cases) was blinded to the clinical outcome (normal versus preeclampsia). External validation involved application of the PEARL preeclampsia risk prediction model on sequencing results from 141 samples assembled from two biobanks and sequenced at an external clinical laboratory. Performance metrics (AUC, FPR, TPR, and sensitivity for a set specificity of 0.7, 0.8, and 0.9) of the PEARL prediction model on the internal and external validation cohorts were calculated.
iction model on sequencing results from 141 samples assembled from two biobanks and sequenced at an external clinical laboratory. Performance metrics (AUC, FPR, TPR, and sensitivity for a set specificity of 0.7, 0.8, and 0.9) of the PEARL prediction model on the internal and external validation cohorts were calculated. Sequenced data with available clinical outcome information were considered for inclusion: PDNAS validation cohort, consecutively obtained clinical samples from 1 May 2017 to 31 December 2018; PE-training cohort, random selection of approximately 10% of samples collected from 1 January 2019 to 31 December 2020. Samples were excluded on the basis of criteria listed in ‘Classification of normal pregnancy’ and ‘Clinical diagnosis of preeclampsia’ sections. No statistical method was used to predetermine sample size.
All two-group comparisons of continuous values were performed using the Mann–Whitney U test; P values were estimated using the default two-sided test, with the statistical significance level set at 0.05. Pearson's correlation was used to compare two continuous variables; the correlation coefficient, r, is reported, and the P value was estimated using the default two-sided test with the CI set at 95%. The RMSE was used to quantify the error as the root-mean-square difference between the model-predicted FF and the actual FF. Receiver operating characteristic curves were used to evaluate classification performance, with the AUC as the performance metric. Confidence intervals (95%) for AUCs were computed using 1,000 iterations of bootstrapping with replacement. True positive (TP) was defined as correctly predicting preeclampsia; false positive (FP) as predicting preeclampsia for a normal pregnancy; true negative (TN) as correctly predicting normal pregnancy; and false negative (FN) as predicting normal pregnancy for a preeclampsia case. Sensitivity was computed as TP / (TP + FN); specificity was computed as TN / (TN + FP).
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
We designated a pregnancy as normal if it was a singleton gestation; continued past 24 weeks; was not complicated by preeclampsia (including postpartum preeclampsia), gestational hypertension, maternal transplant, or maternal malignancy; and did not have aneuploidy or a major fetal anomaly. In the PE-training cohort (n = 450), we included only those with pregnancies ending in a term delivery (≥37 weeks) in the normal group. In the PDNAS validation cohort (n = 831), however, we included cases with preterm delivery (<37 weeks) in the normal group. To assemble cohorts more representative of a real-world clinical population, we did not exclude individuals with pre-existing hypertension, diabetes, or other maternal medical conditions (for example, renal disease or systemic lupus erythematosus) from any of the three cohorts (PE-training, PDNAS validation, or external).
Preeclampsia diagnosis was based on the American College of Obstetricians and Gynecologists criteria in pregnancies continuing past 20 weeks gestation1. We excluded non-singleton gestations, cases of chronic hypertension without superimposed preeclampsia (which were retained in the normal cohort), postpartum preeclampsia, gestational hypertension, maternal transplant, maternal malignancy, and pregnancies complicated by aneuploidy or a major fetal anomaly. Preeclampsia was designated as early-onset preeclampsia (EPE) if diagnosis occurred at or before 34 weeks gestation and delivery was preterm (<37 weeks), or as late-onset preeclampsia (LPE) if diagnosis was confirmed after 34 weeks gestation. For clinical significance, LPE was further classified as LPE with preterm birth (LPE-PB) or LPE with term birth (LPE-TB), using a gestational age at delivery of <37 weeks to define preterm birth. Medical records were reviewed by trained maternal–fetal medicine providers (T.K. and R.S.) to confirm the preeclampsia diagnosis. For preeclampsia prediction, we focused our results primarily on EPE and LPE-PB, because these cases required preterm delivery, which is associated with a high risk of neonatal morbidity and mortality49. All samples are from individuals of female sex based on the designation in the electronic health record. No information on gender identity was collected.
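The timing rules above can be expressed as a small labeling function for a confirmed preeclampsia case. This is an illustrative sketch with a hypothetical function name; it takes gestational ages in weeks and returns None for the one combination the text does not assign (diagnosis at or before 34 weeks followed by term delivery).

```python
def classify_preeclampsia(ga_diagnosis_wk, ga_delivery_wk):
    """Assign a preeclampsia subtype from gestational ages (weeks) at diagnosis and delivery."""
    preterm = ga_delivery_wk < 37  # preterm birth: delivery before 37 weeks
    if ga_diagnosis_wk <= 34:
        # EPE: diagnosis at or before 34 weeks AND preterm delivery
        return "EPE" if preterm else None  # not assigned by the stated rules
    # Diagnosis after 34 weeks: LPE, split by gestational age at delivery
    return "LPE-PB" if preterm else "LPE-TB"
```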
Since May 2017, UW has clinically used an internally developed PDNAS assay that employs low-pass WGS of cfDNA extracted from maternal plasma to provide a high-quality screen for autosomal and sex-chromosome aneuploidies. A total of 1,854 clinical PDNAS samples were included in this study; samples were divided as outlined in Fig. 1. The FF-training cohort consists of 375 PDNAS samples with XY fetuses and 20 samples from healthy non-pregnant females sequenced using the same clinical assay (total n = 395). The FF-training cohort was used to train the PEARL-estimated FF model. The PE-training cohort was assembled by randomly selecting approximately 10% of samples that met inclusion criteria and were clinically obtained from 1 January 2019 to 31 December 2020, with enrichment of preeclampsia cases from samples clinically obtained from 1 January 2021 to 25 April 2023. From this set, we excluded 54 samples that were collected after 16 weeks gestation (final n = 450). The PDNAS validation cohort initially consisted of 956 samples and resembles an all-comers population because it includes all consecutively obtained clinical samples from 1 May 2017 to 31 December 2018 that met the broad inclusion criteria detailed above. Notably, the normal group in this cohort included all term and preterm deliveries not complicated by preeclampsia. After excluding 125 samples that were collected after 16 weeks gestation, the final PDNAS validation cohort used for preeclampsia risk prediction consisted of 831 samples. Samples used for training (FF-training cohort for FF estimation and PE-training cohort for preeclampsia risk prediction) were distinct and separate from those used for validation of each of these models.
For external validation of the PEARL-estimated FF model, we used published data with single-nucleotide polymorphism (SNP)-based quantification of FF from pregnancies with male and female fetuses (n = 70)37. These samples were sequenced to a median depth of 4×, enabling down-sampling analysis to assess the effect of sequence depth on FF determination. Down-sampling was performed to depths of 0.1×, 0.5×, 1×, and 1.5×. Sequencing read duplicates were first marked on the aligned BAM file using Picard Tools MarkDuplicates (v2.18.29, http://broadinstitute.github.io/picard), and down-sampling was then performed using Picard DownsampleSam.
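Each target depth corresponds to a DownsampleSam retention probability of P = target depth / observed depth (Picard keeps or discards all reads of a template together). The sketch below only builds the command lines, under stated assumptions: a `picard` wrapper on the PATH (rather than `java -jar picard.jar`), Picard's legacy `KEY=value` argument syntax, a per-sample observed depth supplied by the user (the 4× figure is a cohort median), and a hypothetical output naming scheme.

```python
def downsample_commands(bam_path, observed_depth, targets=(0.1, 0.5, 1.0, 1.5), seed=42):
    """Build Picard DownsampleSam commands thinning one BAM to several target depths.

    Reads are retained with probability P = target depth / observed depth.
    """
    commands = []
    for target in targets:
        if target >= observed_depth:
            raise ValueError(f"target {target}x must be below observed depth {observed_depth}x")
        p = target / observed_depth
        out_path = bam_path.removesuffix(".bam") + f".ds{target}x.bam"  # hypothetical naming
        commands.append([
            "picard", "DownsampleSam",
            f"I={bam_path}", f"O={out_path}", f"P={p:.4f}", f"RANDOM_SEED={seed}",
        ])
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)`; fixing `RANDOM_SEED` makes the down-sampled files reproducible.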
For external validation of the preeclampsia risk prediction model, we assembled 146 samples (stored plasma or extracted cfDNA) from two biobanks (the Magee Obstetrical Maternal Infant Database and Biobank and the UW Pregnancy Research Biobank) enriched for preeclampsia cases (n = 16 EPE, n = 23 LPE-PB, n = 23 LPE-TB, and n = 79 normal controls). These samples were sequenced at an external clinical laboratory (the Broad Institute), which is certified under the Clinical Laboratory Improvement Amendments and accredited by the College of American Pathologists (CLIA/CAP), at a low-pass sequence depth (mean coverage, 2.3×; IQR, 1.19–3.11×).
For PDNAS samples sequenced through the UW clinical laboratory, libraries were generated using a validated, laboratory-developed method that has been previously reported50,51. In brief, whole blood from Streck (BCT1) tubes was centrifuged, and plasma was isolated following the instructions on the package insert. cfDNA was extracted using the QIAsymphony Circulating DNA Kit (Qiagen). Following DNA quantification, next-generation sequencing library preparation was performed using the KAPA HyperPrep kit for adapter and index ligation. Libraries were purified before and after amplification using the AMPureXP kit (Beckman Coulter). Sample pools were created using an equimolar strategy and diluted to 1 nM. Sequencing was performed using an Illumina NextSeq 500 with a 37-bp paired-end read configuration.
All sequence data (PDNAS samples and external validation samples) were aligned to the reference genome hg38 (GRCh38, https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_000001405.26/). Raw FASTQ files were adapter-trimmed using cutadapt (v3.3)52. The paired reads were then aligned using the BWA-MEM algorithm (v0.7.17), and the resulting BAM file was sorted and indexed using SAMtools (v1.10)53,54. Duplicates were marked using Picard's (v2.18.29, http://broadinstitute.github.io/picard) MarkDuplicates function. Picard's CollectAlignmentSummaryMetrics, CollectWgsMetrics and CollectInsertSizeMetrics were used to calculate quality-control (QC) sequencing metrics.
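The processing steps above can be laid out as an ordered list of shell commands. This is a sketch, not the laboratory's pipeline: the adapter sequence (`AGATCGGAAGAGC`, the common Illumina adapter prefix) is an assumption because the text does not give adapters, `hg38.fa` is a hypothetical reference path that must already be BWA-indexed and have the `.fai`/`.dict` files Picard expects, a `picard` wrapper is assumed on the PATH, and all output names are hypothetical.

```python
def preprocessing_commands(sample, r1, r2, ref="hg38.fa", threads=8,
                           adapter_r1="AGATCGGAAGAGC", adapter_r2="AGATCGGAAGAGC"):
    """Return shell command strings for trimming, alignment, duplicate marking, and QC."""
    trim1, trim2 = f"{sample}.trim.R1.fastq.gz", f"{sample}.trim.R2.fastq.gz"
    sorted_bam, marked_bam = f"{sample}.sorted.bam", f"{sample}.markdup.bam"
    return [
        # 1. Adapter trimming of both mates with cutadapt
        f"cutadapt -a {adapter_r1} -A {adapter_r2} -o {trim1} -p {trim2} {r1} {r2}",
        # 2. Paired-end BWA-MEM alignment, coordinate-sorted and indexed with SAMtools
        f"bwa mem -t {threads} {ref} {trim1} {trim2} | samtools sort -o {sorted_bam} -",
        f"samtools index {sorted_bam}",
        # 3. Mark (not remove) duplicates with Picard
        f"picard MarkDuplicates I={sorted_bam} O={marked_bam} M={sample}.dup_metrics.txt",
        # 4. Sequencing QC metrics
        f"picard CollectAlignmentSummaryMetrics R={ref} I={marked_bam} O={sample}.aln_metrics.txt",
        f"picard CollectWgsMetrics R={ref} I={marked_bam} O={sample}.wgs_metrics.txt",
        f"picard CollectInsertSizeMetrics I={marked_bam} O={sample}.insert_metrics.txt H={sample}.insert_hist.pdf",
    ]
```

Because step 2 uses a pipe, these strings are meant for a shell (for example `subprocess.run(cmd, shell=True, check=True)`).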
The UW clinical lab uses the following procedures for determining the FF for PDNAS samples from pregnancies with male fetuses. The chromosome-Y-based FF was calculated on the basis of the percentage of reads that map to chromosome Y (chrY) using the formula:

$$\mathrm{ChrY}\,\% = \frac{\sum \left(\text{reads mapped to chromosome Y}\right)}{\sum \left(\text{all reads}\right)}$$
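A minimal sketch of the ChrY% calculation from per-chromosome read counts, assuming the counts come from `samtools idxstats` (columns: reference name, sequence length, mapped reads, unmapped reads) and that chromosome Y is named `chrY`, as in hg38. Whether the denominator includes unmapped reads is not specified, so it is exposed as a hypothetical `include_unmapped` option. The function returns the ratio exactly as written in the formula; multiply by 100 for a percentage.

```python
def chry_fraction(idxstats_text, chry_name="chrY", include_unmapped=False):
    """ChrY% as defined above: reads mapped to chrY divided by all reads.

    Parses `samtools idxstats` output: name, length, mapped reads, unmapped reads.
    """
    chry_reads = all_reads = 0
    for line in idxstats_text.strip().splitlines():
        name, _length, mapped, unmapped = line.split("\t")
        all_reads += int(mapped) + (int(unmapped) if include_unmapped else 0)
        if name == chry_name:
            chry_reads += int(mapped)
    return chry_reads / all_reads
```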
Tissue-specific DNase I hypersensitivity sites (DHSs) were obtained from the ENCODE regulatory index55 to construct 16 tissue-specific open chromatin maps (https://www.meuleman.org/research/dhsindex/). The top 10,000 DHS summits from each tissue were selected on the basis of ranking by highest recurrence across experiments ('numsamples') and mean DHS signal ('mean_signal'). Additionally, 19 tissue-specific open chromatin maps were generated from published single-cell transposase-accessible chromatin sequencing (scATAC-seq) data (http://catlas.org/humanenhancer)56. For these open chromatin sites, the top 10,000 peaks were selected by peak score. Transcription-factor-binding sites (TFBSs) for 377 transcription factors were obtained from the Gene Transcription Regulation Database (GTRD)57, and the top 10,000 TFBSs were filtered on the basis of recurrence across experiments, as described in our previous study22.
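Selecting the top 10,000 summits per tissue amounts to a sort followed by a per-group head. The sketch assumes the index has been loaded into a pandas DataFrame containing the `numsamples` and `mean_signal` columns named above plus a tissue label column; the column name `tissue` is hypothetical, and using `mean_signal` only to break ties in `numsamples` is an assumption about how the two rankings were combined.

```python
import pandas as pd


def top_sites_per_tissue(index_df: pd.DataFrame, tissue_col: str = "tissue",
                         n: int = 10_000) -> pd.DataFrame:
    """Keep the n highest-ranked DHS summits per tissue.

    Ranking: recurrence across experiments ('numsamples') first, then mean DHS
    signal ('mean_signal') to break ties, both descending.
    """
    ranked = index_df.sort_values(["numsamples", "mean_signal"], ascending=[False, False])
    return ranked.groupby(tissue_col, sort=False).head(n).reset_index(drop=True)
```

The same pattern, sorting on a single score column, covers the scATAC-seq peak selection by peak score.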
The PEARL framework starts with the generation of nucleosome coverage profiles using Griffin v0.2.0 (https://github.com/GavinHaLab/Griffin/releases/tag/v0.2.0) to compute a composite coverage profile, aggregating across sets of sites for each tissue and cell type or sets of TFBSs for each transcription factor22. Coverage profiles were generated for a ±2-kb window flanking the peak summit (for open chromatin) or binding site (for TFBS). Coverage profiles were further restricted to use specific cfDNA reads on the basis of fragment sizes: nucleosome profile (120–180 bp) and subnucleosomal profile (35–80 bp).
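To make the composite-profile idea concrete, the sketch below size-selects fragments, counts fragment midpoints at each offset within ±flank of every site, averages across sites, and scales the result to a window mean of 1. It is a simplified stand-in and not Griffin's implementation: counting midpoints and the final scaling are assumptions, and Griffin's GC-bias correction and other filtering steps are omitted. The function name and input formats are hypothetical.

```python
from collections import defaultdict

import numpy as np


def composite_profile(fragments, sites, size_range=(120, 180), flank=2000):
    """Mean fragment-midpoint coverage across sites, for offsets -flank..+flank.

    fragments: iterable of (chrom, start, end), 0-based half-open coordinates.
    sites: iterable of (chrom, position) for summits or binding sites.
    """
    lo, hi = size_range
    mids_by_chrom = defaultdict(list)
    for chrom, start, end in fragments:
        if lo <= end - start <= hi:  # size selection, e.g. 120-180 bp nucleosomal fragments
            mids_by_chrom[chrom].append((start + end) // 2)
    mids_by_chrom = {c: np.sort(np.asarray(m)) for c, m in mids_by_chrom.items()}

    profile = np.zeros(2 * flank + 1)
    sites = list(sites)
    for chrom, pos in sites:
        mids = mids_by_chrom.get(chrom)
        if mids is None:
            continue
        left = np.searchsorted(mids, pos - flank, side="left")
        right = np.searchsorted(mids, pos + flank, side="right")
        # Offset of each midpoint relative to the window start
        np.add.at(profile, mids[left:right] - (pos - flank), 1.0)
    profile /= max(len(sites), 1)  # average over all sites, including uncovered ones
    mean = profile.mean()
    # Scale to a window mean of 1 (a simple stand-in, not Griffin's normalization)
    return profile / mean if mean > 0 else profile
```

Passing `size_range=(35, 80)` gives the sub-nucleosomal variant.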
From the coverage profiles, two features were extracted: the NDR, defined as the mean Griffin-normalized coverage value ±30 bp and ±75 bp from the summit or binding site, and MCV, defined as the mean Griffin-normalized coverage value ±1,000 bp from the summit or binding site. For single nucleosome profiles, lower values correspond to increased accessibility and increased tissue contribution. For sub-nucleosome profiles, higher values correspond to increased accessibility and increased tissue contribution. These features are used for tissue of origin, FF estimation, and PE risk prediction in the PEARL framework.
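Given a normalized composite profile spanning ±2 kb, the NDR and MCV features reduce to means over centered windows. This sketch assumes one profile value per base pair with array index 2,000 at the summit or binding site, and it interprets "±30 bp and ±75 bp" as two separate NDR features; the function name is hypothetical.

```python
import numpy as np


def ndr_mcv_features(profile, flank=2000, ndr_halfwidths=(30, 75), mcv_halfwidth=1000):
    """Window means of a normalized composite profile covering offsets -flank..+flank."""
    profile = np.asarray(profile, dtype=float)
    center = flank  # array index of the summit / binding site (offset 0)
    features = {}
    for w in ndr_halfwidths:
        features[f"NDR_{w}bp"] = profile[center - w: center + w + 1].mean()
    features["MCV"] = profile[center - mcv_halfwidth: center + mcv_halfwidth + 1].mean()
    return features
```

For a nucleosome profile, a dip at the center lowers the NDR value, matching the statement that lower values indicate increased accessibility.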
We investigated overall tissue contributions in maternal plasma in 375 PDNAS samples from pregnant individuals in the FF-training cohort. We then investigated tissue contributions across gestation in the FF-training cohort, comprising 237 first-trimester (≤14 weeks gestation), 122 second-trimester (14–28 weeks gestation), and 16 third-trimester (>28 weeks gestation) samples, compared with 20 non-pregnant female samples. We also performed Pearson correlation analysis between gestational age at the time of PDNAS and each tissue-specific feature (NDR and MCV) from sites derived from both scATAC-seq and DNase-seq. Lastly, we used Pearson correlation analysis to evaluate the effect of BMI on tissue-specific features. To investigate placental and endothelial tissue contributions in early-pregnancy plasma that are associated with preeclampsia, we compared PDNAS samples from the PE-training cohort collected at ≤16 weeks gestation from normal pregnancies (n = 315) with those from 135 preeclampsia cases (38 EPE, 35 LPE-PB, and 62 LPE-TB). We performed the Mann–Whitney U test (scipy.stats.mannwhitneyu, v1.11.0) with Bonferroni correction (alpha = 0.05, statsmodels.stats.multitest.multipletests, v0.13.2). To account for BMI and FF, which are known to influence PDNAS test characteristics51,58, we also performed analysis of covariance (ANCOVA) using pingouin (v0.5.3) with these as covariates59.
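The per-feature comparison can be sketched with `scipy.stats.mannwhitneyu`. Here the Bonferroni step is applied by hand (P values multiplied by the number of tests and capped at 1, with rejection when the adjusted value is at most alpha), which is what statsmodels' `multipletests(..., method="bonferroni")` computes. The ANCOVA adjustment for BMI and FF is not shown, and the function name is hypothetical.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def differential_features(cases, controls, alpha=0.05):
    """Per-feature two-sided Mann-Whitney U test with Bonferroni correction.

    cases, controls: arrays of shape (n_samples, n_features).
    Returns Bonferroni-adjusted P values and a boolean mask of significant features.
    """
    cases = np.asarray(cases, dtype=float)
    controls = np.asarray(controls, dtype=float)
    n_tests = cases.shape[1]
    pvals = np.array([
        mannwhitneyu(cases[:, j], controls[:, j], alternative="two-sided").pvalue
        for j in range(n_tests)
    ])
    # Bonferroni: scale by the number of tests and cap at 1
    p_adj = np.minimum(pvals * n_tests, 1.0)
    return p_adj, p_adj <= alpha
```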
We used 395 PDNAS samples (FF-training cohort), consisting of 375 samples with XY fetuses and 20 from non-pregnant healthy females, to train a model for FF estimation using our nucleosome profiling method, termed PEARL-estimated FF. Feature selection, training, and testing were performed in a cross-validation framework using 1,000 iterations of bootstrapping with replacement. In each iteration, samples were drawn with replacement to form the training set, and the samples not drawn formed the hold-out set for that iteration. The top placenta-specific features were identified from the training set by analyzing 16 tissue-specific DHSs, 20 tissue-specific scATAC sites, and 377 TFBS nucleosome and sub-nucleosome features (nucleosome, MCV and NDR; sub-nucleosome, NDR; total, 1,652) between samples from non-pregnant controls (n = 20) and pregnancy samples with a high (>0.25) chromosome-Y FF (chrY-FF) using the Mann–Whitney U test (scipy.stats.mannwhitneyu, v1.11.0) with Bonferroni correction (alpha = 0.05, statsmodels.stats.multitest.multipletests, v0.13.2). Statistically significant (P < 0.05) features were used in a supervised Bayesian Ridge regression framework (sklearn BayesianRidge, v1.1.1) to train a linear model that estimates FF as a continuous value, using the chrY-FF value as the ground truth. In each iteration, the trained model was applied to the hold-out set, and the RMSE and Pearson's r were calculated. The model was also applied to the non-pregnant hold-out samples, and the limit of detection (LOD) was calculated as the maximum predicted chrY-FF for any non-pregnant sample. The overall training performance was reported as the average RMSE, Pearson's r, and LOD across 1,000 iterations, with 95% CIs.
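The bootstrap loop above (feature selection inside each iteration, Bayesian Ridge fitting, and out-of-bag evaluation) can be sketched as follows. Assumptions, none of which the text specifies: non-pregnant samples are given a chrY-FF target of 0, RMSE and r are computed on the pregnant hold-out samples only, the 95% CI is the 2.5th to 97.5th percentile across iterations, and Bonferroni correction is applied by hand. The function names are hypothetical.

```python
import numpy as np
from scipy.stats import mannwhitneyu
from sklearn.linear_model import BayesianRidge


def _summary(values):
    """Mean and 95% percentile interval across bootstrap iterations."""
    lo, hi = np.percentile(values, [2.5, 97.5])
    return float(np.mean(values)), float(lo), float(hi)


def bootstrap_ff_model(X, chry_ff, pregnant, n_iter=1000, high_ff=0.25, alpha=0.05, seed=0):
    """Bootstrap feature selection + Bayesian Ridge FF estimation with out-of-bag testing.

    X: (n_samples, n_features); chry_ff: chrY-based FF (0 for non-pregnant samples);
    pregnant: boolean array marking pregnancy samples.
    """
    X = np.asarray(X, dtype=float)
    chry_ff = np.asarray(chry_ff, dtype=float)
    pregnant = np.asarray(pregnant, dtype=bool)
    rng = np.random.default_rng(seed)
    n, m = X.shape
    rmse, r, lod = [], [], []
    for _ in range(n_iter):
        train = rng.integers(0, n, size=n)           # sample with replacement
        test = np.setdiff1d(np.arange(n), train)     # samples never drawn = hold-out
        Xtr, ytr, ptr = X[train], chry_ff[train], pregnant[train]
        neg, pos = Xtr[~ptr], Xtr[ptr & (ytr > high_ff)]
        if len(neg) == 0 or len(pos) == 0:
            continue
        # Non-pregnant vs high-FF pregnancies: two-sided Mann-Whitney U, Bonferroni
        p = np.array([mannwhitneyu(pos[:, j], neg[:, j], alternative="two-sided").pvalue
                      for j in range(m)])
        keep = p * m <= alpha
        preg_test = test[pregnant[test]]
        if not keep.any() or len(preg_test) < 3:
            continue
        model = BayesianRidge().fit(Xtr[:, keep], ytr)
        pred = model.predict(X[preg_test][:, keep])
        rmse.append(np.sqrt(np.mean((pred - chry_ff[preg_test]) ** 2)))
        r.append(np.corrcoef(pred, chry_ff[preg_test])[0, 1])
        nonpreg_test = test[~pregnant[test]]
        if len(nonpreg_test):
            # LOD: highest FF predicted for any non-pregnant hold-out sample
            lod.append(model.predict(X[nonpreg_test][:, keep]).max())
    return {"rmse": _summary(rmse), "r": _summary(r), "lod": _summary(lod)}
```

The final model described next corresponds to running the same selection and fit once on all samples, without the hold-out step.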
The final model for estimating FF using PEARL was then trained on all 395 PDNAS samples. First, using all 1,239 features, 141 features were identified from all non-pregnancy samples (n = 20) and pregnancy samples with a high chrY-FF (>0.25) (n = 25). A supervised Bayesian Ridge regression model was then trained using the 141 features across all 395 samples. The final model was then applied to a separate external validation cohort (FF-validation cohort, n = 70) and all preeclampsia cohorts (PE-training cohort and PDNAS validation cohort). Performance metrics were calculated similarly to above, using bootstrapping with replacement across 1,000 iterations.
We accessed data for 70 pregnancy samples from an external publicly available dataset (Jiang et al.)37, in which the FF was confirmed using maternal–fetal SNP genotyping. Fetal sex was predicted using the chromosome-Y ratio for samples with FF > 0.10 (39 female fetuses, 21 male fetuses, and 10 with unknown sex). These samples were sequenced to a median depth of 4×, allowing for down-sampling analysis to compare the effect of sequence depth on FF determination using PEARL. Nucleosome-based tissue profiles were then generated for all tissue and cell types and TFBSs using Griffin. NDR and MCV scores were generated as described above. The final trained model (trained on the 395 samples in the FF-training cohort) was applied to these externally sequenced samples and performance metrics, such as RMSE and Pearson’s r, were computed.
A total of 450 PDNAS samples (collected ≤16 weeks gestation, with confirmed clinical outcome) were part of the PE-training cohort, which included 135 cases of preeclampsia and 315 normal pregnancies (Fig. 1a,b). We considered three feature sets defined as follows:
(1) Placental- and endothelial-tissue-specific chromatin profiles (nucleosome, MCV and NDR; sub-nucleosome, NDR) and FF as estimated by PEARL.
(2) Systolic blood pressure, diastolic blood pressure and BMI.
(3) Combination of sets (1) and (2).
We included only BMI and blood pressure as clinical parameters because they are easily ascertained, which minimizes clinician burden, especially because some conditions (for example, systemic lupus erythematosus) require complex diagnostic confirmation. Furthermore, we aimed to identify markers in cfDNA that demonstrate favorable performance without the need for multiple clinical inputs.
Models were trained and tested using bootstrapping with replacement (n = 1,000 iterations). Three classifiers were trained for each feature set: l1-regularized logistic regression, l2-regularized logistic regression (sklearn LogisticRegression with balanced class weights, v1.1.1), and XGBoost (xgboost XGBClassifier, number of estimators = 10 and max depth = 5, v1.6.1). These classifiers were then combined into a soft ensemble model (mean probability across the three models). In each iteration, the model was trained as a supervised binary classifier using all normal and preeclampsia samples from the respective training cohort.
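The three-classifier soft ensemble can be sketched with scikit-learn's API. Assumptions: the l1 model uses the `liblinear` solver (scikit-learn's default `lbfgs` solver does not support an l1 penalty), hold-out samples are the out-of-bag samples of each bootstrap draw, and feature scaling is not shown. If `xgboost` is not installed, the sketch falls back to scikit-learn's `GradientBoostingClassifier` as a stand-in, which is a different boosted-tree implementation from the one named in the text. Function names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

try:
    from xgboost import XGBClassifier as BoostedTrees
except ImportError:  # stand-in only, NOT the model used in the text
    from sklearn.ensemble import GradientBoostingClassifier as BoostedTrees


def make_classifiers():
    return [
        # liblinear is one of the sklearn solvers that supports an l1 penalty
        LogisticRegression(penalty="l1", solver="liblinear", class_weight="balanced"),
        LogisticRegression(penalty="l2", class_weight="balanced", max_iter=1000),
        BoostedTrees(n_estimators=10, max_depth=5),
    ]


def soft_ensemble_proba(X_train, y_train, X_test):
    """Fit all three classifiers and average their predicted P(preeclampsia)."""
    probs = [clf.fit(X_train, y_train).predict_proba(X_test)[:, 1]
             for clf in make_classifiers()]
    return np.mean(probs, axis=0)


def bootstrap_predictions(X, y, n_iter=1000, seed=0):
    """Yield (hold-out indices, ensemble probabilities) for each bootstrap iteration."""
    X, y = np.asarray(X), np.asarray(y)
    rng = np.random.default_rng(seed)
    n = len(y)
    for _ in range(n_iter):
        train = rng.integers(0, n, size=n)
        test = np.setdiff1d(np.arange(n), train)  # out-of-bag samples are held out
        if np.unique(y[train]).size < 2 or test.size == 0:
            continue
        yield test, soft_ensemble_proba(X[train], y[train], X[test])
```

Averaging probabilities (a soft vote) rather than class labels keeps a continuous risk score, which is what the ROC-based metrics require.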
The trained model was applied to hold-out test samples, and the final predicted probability was taken as the mean of the probabilities from the three classifiers for each feature set. In each bootstrap iteration, the following were computed: AUC, false positive rate (FPR), true positive rate (TPR), and sensitivity at set specificities of 0.7, 0.8, and 0.9. A positive case is defined as an individual with preeclampsia (EPE, LPE-PB, or LPE-TB), and a negative case is an individual with a normal pregnancy.
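Sensitivity at a set specificity can be read off the ROC curve. This sketch assumes the rule "highest TPR among thresholds whose specificity (1 − FPR) meets the target"; the text does not say whether points were interpolated. `drop_intermediate=False` keeps every threshold, because scikit-learn's default can drop collinear ROC points that may be the exact operating point of interest. The function name is hypothetical.

```python
import numpy as np
from sklearn.metrics import roc_curve


def sensitivity_at_specificity(y_true, scores, specificities=(0.7, 0.8, 0.9), tol=1e-9):
    """Highest sensitivity (TPR) on the ROC curve whose specificity (1 - FPR) meets each target."""
    # Keep every threshold so no operating point is dropped from the curve
    fpr, tpr, _ = roc_curve(y_true, scores, drop_intermediate=False)
    specificity = 1.0 - fpr
    return {
        # tol guards against floating-point error, e.g. 1 - 0.8 != 0.2 exactly
        target: float(tpr[specificity >= target - tol].max())
        for target in specificities
    }
```

The curve always contains the point (FPR = 0, TPR = 0), so each target has at least one qualifying threshold.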
The final model for predicting risk for preeclampsia was then trained using all 450 PDNAS samples in the PE-training cohort. This final trained model was then applied to the PDNAS validation cohort (n = 831 samples) and the multicenter external cohort (n = 141 samples). Performance metrics were calculated similarly to above, using bootstrapping with replacement across 1,000 iterations.
Internal validation of the preeclampsia prediction model was performed on 831 samples (PDNAS validation cohort), distinct from those used to train the model (PE-training cohort). The study team (except for T.K. and R.S., who adjudicated cases) was blinded to the clinical outcome (normal versus preeclampsia). External validation involved application of the PEARL preeclampsia risk prediction model on sequencing results from 141 samples assembled from two biobanks and sequenced at an external clinical laboratory. Performance metrics (AUC, FPR, TPR, and sensitivity for a set specificity of 0.7, 0.8, and 0.9) of the PEARL prediction model on the internal and external validation cohorts were calculated.
Sequenced data with available clinical outcome information were considered for inclusion: PDNAS validation cohort, consecutively obtained clinical samples from 1 May 2017 to 31 December 2018; PE-training cohort, random selection of approximately 10% of samples collected from 1 January 2019 to 31 December 2020. Samples were excluded on the basis of criteria listed in ‘Classification of normal pregnancy’ and ‘Clinical diagnosis of preeclampsia’ sections. No statistical method was used to predetermine sample size.
Any methods, additional references, Nature Portfolio reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at 10.1038/s41591-025-03509-w.
Source Data Figs. 2 and 3, Source Data Extended Data Table 1 and Source Data Extended Data Figs. 1–9