Browse the corpus

Walk the Even Hospital Database by book and chapter — the raw source passages that ground Ask, DDx, and the rest.

24 passages

fulltextpubmed· Breadth: more variables, less coherence· item 41130627

As large scale cohorts increasingly integrate diverse data sources—including multi-omics data, biological samples, electronic health records, wearable devices, and extensive self-reported questionnaires—the complexity of data management has significantly intensified. Scientifically, despite the potential of large scale cohort studies, the lack of standardised data collection and harmonisation threatens credibility. The China Health and Retirement Longitudinal Study (CHARLS), for example, highlights how variations in cultural contexts, recall biases, and differing literacy levels amplify data heterogeneity.3 4 Integrating these heterogeneous data streams demands robust IT frameworks, sophisticated statistical methods, and stringent data security measures. Without consistent protocols and meticulous data management, such complexity may undermine the interpretability and reproducibility of findings, diminishing the overall scientific value of cohort studies.

Clinically, the rapid expansion of the breadth of variables in large scale cohort studies has generated an illusion of comprehensive health insights. However, without organic integration, this complexity can dilute clinical relevance and public health utility. One major concern is the underuse of existing data on clinical procedures and health examinations. Despite their high utility and relevance, these data are often neglected in favour of collecting new biological, multi-omics, or digital health data, placing additional burdens on healthcare systems and leading to inefficiencies and wasted efforts. Very few cohort studies have achieved meaningful integration between medical informatics and population based research.5 Structural barriers between hospital information systems and research databases remain a core challenge, although scalable solutions, including open source frameworks and context specific informatics infrastructure, are emerging.6 Together, these advances may help to align cohort based data collection more closely with real world healthcare systems.

From a scientific perspective, standardised data protocols must be systematically implemented to manage variability across the diverse settings of large scale cohorts and to ensure the consistency and quality of data. Whereas many cohorts, such as TZL (Taizhou Longitudinal Study) with 200 000 participants,7 adopted audio recordings or numerous bespoke software systems to reduce interviewer bias, our own cohort, WHOLISM (West-China Hospital Overall Life Initiative: Strategies for a Million), relies on hospital driven standardisation, including unified training, rigorous quality control, and harmonised laboratory procedures, to minimise inter-hospital variability and measurement errors. Anchored in the hospital alliance’s IT system, it first consolidated repeated examination populations, then expanded to affiliated community residents, and further incorporated high altitude cohorts to capture environmental diversity (box 1; fig 1). Additionally, advanced federated learning technologies have been used for decentralised data integration, safeguarding privacy while linking electronic health records, clinical interventions, and longitudinal biosamples.8

The WHOLISM (West-China Hospital Overall Life Initiative: Strategies for a Million) cohort study comprises three interlinked sub-cohorts that reflect a tiered structure of population depth and data breadth. The design begins with the consolidation of repeated health check-up populations, expands to community dwelling residents through hospital network outreach, and incorporates people living at high altitude on the basis of geographical exposure. The high altitude sub-cohort is not a standalone population but represents an environmental subset drawn from the broader health check-up and community cohorts. Together, these layers enable comprehensive assessment across clinical, social, and environmental dimensions.

Health check-up cohort (target: 600 000 participants): This foundational sub-cohort includes patients undergoing routine health check-ups (≥3 times), primarily through the West China Hospital (WCH) Health Management Centre and its affiliated subcentres. It provides longitudinal data on biomarkers, imaging, and clinical measurements. It also incorporates data from 11 public hospitals and focuses on real world health trajectories, within the West China Hospital Alliance Longitudinal Epidemiology Wellness (WHALE) Study, supporting early disease detection and preventive intervention tracking.

Community dwelling cohort (target: 400 000 participants): Building on the health check-up infrastructure, this sub-cohort captures urban and rural residents across Sichuan province and six additional provinces including Chongqing, Fujian, and Hainan. It focuses on examining the effects of lifestyle, diet, and social factors on long term outcomes and helping to characterise health disparities across socioeconomic strata.

High altitude plateau cohort (target: 50 000 participants): This geographically unique sub-cohort includes populations living above 2500 m in the Qinghai-Tibet Plateau region, covering parts of Sichuan, Tibet, and Qinghai. It is drawn from participants within the health check-up and community cohorts who meet high altitude residence criteria. It is designed to study the health effects of chronic hypoxia and other environmental stressors, offering insights into respiratory, cardiovascular, and genetic adaptations to high altitude living.

To date, WHOLISM has made preliminary progress, with 318 909 participants from the three sub-cohorts. Of these participants, 3404 are from high altitude regions, 42 926 are from non-high altitude areas, and 272 579 are from health check-up centres with ≥3 repeated measures. Across these sub-cohorts, approximately 50 000 participants will be selected for a “deep cohort” with intensified multi-omics profiling, extended phenotyping, and detailed follow-up. Additionally, a total of 206 875 blood samples have been stored in the WCH biobank. By May 2025, the Precision Medicine Center at WCH had completed 30X whole genome sequencing for 38 345 participants.
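The enrolment figures above can be cross-checked with simple arithmetic. The following is a minimal sketch, not part of the original analysis: the variable names are illustrative, and the overall target of one million is an assumption obtained by adding the health check-up and community targets only, since the high altitude sub-cohort is described as drawn from those two.

```python
# Enrolment figures quoted in the passage above (names are illustrative).
enrolled = {
    "high_altitude_regions": 3_404,
    "non_high_altitude_areas": 42_926,
    "health_check_up_centres": 272_579,
}
total_enrolled = sum(enrolled.values())

# Stated sub-cohort targets. The high altitude target (50 000) is left out
# because that sub-cohort is drawn from the other two (assumption: adding it
# would double count participants).
targets = {"health_check_up": 600_000, "community_dwelling": 400_000}
overall_target = sum(targets.values())

# Share of the assumed overall target reached so far, in percent.
progress_pct = round(100 * total_enrolled / overall_target, 1)

print(f"Total enrolled: {total_enrolled:,}")
print(f"Overall target: {overall_target:,}")
print(f"Progress towards target: {progress_pct}%")
```

The three reported groups sum to the stated total of 318 909, which under the assumed target corresponds to roughly a third of planned enrolment.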

Timeline illustrating clinician initiated, large scale cohort studies in China since 2000. The figure highlights representative Chinese cohorts characterised by significant hospital involvement and clinician leadership, emphasising varied breadth (diversity of collected variables) and depth (sample size). Cohorts such as the Taizhou Longitudinal Study (environmental and chronic diseases), REACTION Study (metabolic disorders and cancer risk), ChinaHEART (cardiovascular and chronic disease prevention), and National Birth Cohort (early life health determinants) exemplify balanced expansions in both breadth and depth, facilitating integrated clinical and public health research. The WHOLISM Study, initiated in 2010, illustrates a structured, clinician driven approach that systematically integrates hospital based health check-up participants, community dwelling residents recruited via hospital networks, and high altitude populations. It exemplifies comprehensive hospital participation and clinician leadership aimed at enhancing scientific quality, clinical applicability, and societal responsibility. COPD=chronic obstructive pulmonary disease; CVD=cardiovascular disease; EHR=electronic health records; NCD=non-communicable disease; WCH=West China Hospital.

From a health perspective, the value of cohort studies lies not only in the biological insights they generate but also in their clinical relevance. Unlike traditional community based cohort studies, some clinician initiated cohorts can leverage rich clinical data, advanced diagnostic tools, and sustained relationships with patients, enabling more practical translational research for health value. With growing hospital participation, many cohort studies now include long term, multi-age, and multi-ethnic participants, along with multi-fluid biosample collection, all driven by clinical demands.9 10 For instance, birth cohorts, such as China’s National Birth Cohort involving more than 30 hospitals with 60 000 families, are pivotal in revealing how early life exposures shape lifelong disease risk and health trajectories.11 WHOLISM, another clinician initiated cohort established within a standardised framework of unified baseline assessments and inclusion criteria, encompasses more than 50 clinician initiated sub-cohorts targeting specific high risk populations, with tailored screening and follow-up protocols. The integration of clinical and public health frameworks showed resilience and continuity even during disruptions such as the covid-19 pandemic.12 13 14

fulltextpubmed· Depth: bigger samples, narrower reach· item 41130627

The expansion of population size in large scale cohort studies offers opportunities for broader coverage for scientific discovery, but larger numbers alone do not necessarily enhance scientific value. In fact, when scale outpaces targeted design, cohort studies may still have poor representativeness and limited generalisability. For instance, many established cohorts in China have primarily focused on Han populations living in urban or central regions, offering valuable insights into common disease patterns.15 However, their generalisability to minority groups and residents in western or high altitude areas remains limited. Moreover, participation rates in some large scale studies have failed to exceed 50%, especially in rural areas.16 17 They have also struggled to reach marginalised or medically underserved groups,18 reducing the inclusiveness and utility of their findings.

Additionally, expanding cohort size does not necessarily enhance health value. Although larger cohorts may collect increasing volumes of information across heterogeneous populations, these data often lack continuity within individuals and design integration across time. Without sustained follow-up and coherent frameworks, tracing disease trajectories or interpreting intervention outcomes in a clinically meaningful way becomes difficult. For instance, studies on hormone replacement therapy initially produced promising observational findings, yet subsequent randomised trials yielded contradictory results.19 These discrepancies stemmed less from biological inconsistency than from fragmented designs, weak longitudinal linkage, and insufficient population tracking. Moreover, when statistical results are interpreted without specific clinical context or consideration of individual variability, confusion grows rather than resolves.20 These design and follow-up challenges are well documented internationally and are no less relevant to large scale cohorts in China. Rather than adding noise, cohort studies should aim to generate clarity by aligning data in a coherent framework.

To restore scientific validity, future cohort designs must move beyond simple expansion and focus on purposeful inclusion and data consistency. Scientific representativeness can be improved by deliberately recruiting participants from geographically and ethnically diverse regions, ensuring broader applicability of research findings. The ChinaHEART cohort demonstrates this approach by strategically recruiting more than four million participants from diverse socioeconomic backgrounds across all 31 provinces in China. This design effectively identified regional differences in cardiovascular burden and improved the generalisability of findings.21 Similarly, the WHOLISM cohort is based in Sichuan province (a region connecting western and eastern China) and covers broad populations, including those from high altitude plateaus and plains, through its hospital alliance network system (box 1; fig 1). By linking electronic health records, electronic medical records, community health surveys, and government datasets, such cohorts enhance coverage and help to reduce selection bias,22 thereby offering complementary insights to traditional disease specific or general population cohort studies.

To enhance clinical precision, large scale cohort studies should not only build on participant expansion but also track individuals’ trajectories and potential interventions over time. Designs that incorporate sustained observation allow researchers to capture transitions from low risk or subclinical states to disease onset, progression, and recovery. Innovative epidemiological frameworks, such as target trial emulation and trials within cohorts,23 24 additionally enable causal inference and risk stratification within existing populations, reducing the need for continual expansion while improving the specificity of clinical insights. Complementing these design strategies, the application of wearable technologies, digital health platforms, and point-of-care diagnostics can support real time monitoring and ongoing risk assessment, facilitating more responsive and proactive health management. In this way, health value is derived not from volume or novelty alone but also from designs that bridge population science and patient care.

fulltextpubmed· Quality: greater complexity, lower fidelity· item 41130627

Scientific quality in large scale cohorts is compromised by missing data, incomplete follow-up, and fragmented study structures. The rapid increase in both sample size (depth) and variable (breadth) dimensions has introduced substantial operational burdens, particularly in terms of follow-up and data completeness. Large cohorts are inherently more vulnerable to participant attrition and missing data, especially in long term or invasive study settings.25 These logistical challenges can compromise the quality, continuity, and interpretability of the data collected. By contrast, smaller but well structured cohort studies with clearly defined hypotheses have shown lasting scientific value. For example, the Dongfeng-Tongji cohort, comprising around 27 000 retired workers, has informed chronic disease prevention among middle aged and older adults in urban China.26 Similarly, several medium or small scale, region specific cohorts, such as the Fuqing Cohort,27 REACTION study,28 and WHALE study,12 have improved the targeted prevention and control of specific diseases in specific populations. These examples underscore that the quality of cohort studies is a function not of scale alone but of design coherence, follow-up quality, and data fidelity.

Beyond scientific concerns, the quality of clinical insight can also be compromised, as uncontrolled expansion may lead to increasing reliance on exploratory analyses and agnostic discovery approaches. This trend resembles the rise of “black box epidemiology,”29 prioritising quantity of data over biological mechanism and clinical demands, and often resulting in findings disconnected from actionable health priorities. The emergence of multi-omics technologies has amplified this tendency, widening the gap between research output and clinical application. Methods such as mendelian randomisation, for instance, are conceptually powerful but frequently misapplied, often without satisfying core assumptions or examining relevant health outcomes.30 Therefore, the true value of multi-omics should lie in integration with rigorous epidemiological designs that serve real world health goals. Without this alignment, cohort studies risk becoming analytically complex but practically disconnected, losing sight of the “hippocratic epidemiology” for health improvement.31

Enhancing quality in large scale cohort studies requires both infrastructural and organisational capacity to support continuity, consistency, and public trust. Hospital based platforms such as the National Birth Cohort and WHOLISM benefit from well established alliance networks that enable unified data protocols, reduce duplication, and support both passive and active follow-up across geographical and socioeconomic settings.32 These structural strengths help to capture health trajectories more comprehensively, and the institutional credibility of hospitals fosters trust and long term engagement of participants,33 particularly in contrast to cohorts sponsored by the private sector, which may face concerns about data use and independence.34 35

These infrastructural advantages are complemented by a strong clinical orientation in clinician initiated designs. Cohort studies including ChinaHEART, the National Birth Cohort, and WHOLISM integrate multidisciplinary clinical input from the outset to define outcomes relevant to health, design appropriate follow-up intervals, and align with care pathways.36 During the covid-19 pandemic, the WHALE study maintained operational continuity through the hospital’s alliance network, preserving data integrity and cohort activity across both urban and rural sites.12 Furthermore, clinical feedback mechanisms in WHOLISM have enabled the validation of lung imaging data from community screening against health check-up records, facilitating earlier detection of pulmonary nodules and minimising unnecessary procedures.37 In this context, clinician initiated cohort studies provide more than just a platform for data collection; they enable clinically responsive research that is grounded in real world needs and built for long term public health value.

Despite these advantages, hospital based cohorts in China face challenges to representativeness owing to recruitment biases favouring urban or clinically engaged populations. To overcome this problem, cohort designers should systematically leverage hospital alliance networks by consolidating participants from routine health check-ups, extending enrolment to community dwelling residents, and purposefully integrating geographically distinct populations, such as those living in high altitude regions. This structured, multi-tiered approach ensures demographic diversity and geographical balance, enhancing both representativeness and societal relevance.

fulltextpubmed· Moving forward: value oriented cohort studies in China· item 41130627

The challenges discussed—ranging from non-standardised variable diversity to uncontrolled participant expansion and declining research quality—are not isolated. Rather, they are deeply interdependent. Tackling the complexity of breadth, depth, and quality in Chinese cohort studies requires thoughtful trade-offs, strategic prioritisation, and purposeful design. More variables and more participants do not necessarily lead to higher quality, let alone deliver greater value. To move beyond this scale driven trend, future cohort initiatives must adopt a value oriented framework that balances scientific rigour, clinical relevance, and societal responsibility.

Scientific value hinges on representative sampling, rigorous quality control, and harmonised protocols. Health value depends on alignment with clinical pathways and preventive strategies. Societal value, meanwhile, demands public trust, ethical governance, and equitable access. These priorities are increasingly embedded in China’s policy agenda, including the Healthy China 2030 initiative, which calls for integration of care delivery with public health to improve population wellbeing. In response, clinician initiated, hospital integrated cohort models have emerged as a promising direction. By embedding research within diagnostic workflows and treatment pathways, these models combine clinical depth with operational coherence. For example, polygenic risk scores derived from population-scale genomic data are now informing personalised prevention strategies: of the 31 clinical trials based on polygenic risk scores registered by April 2025, 29 were led by hospital or academic centres. International efforts also reflect this shift: institutions such as Mass General Brigham in the US and the UK’s Our Future Health project illustrate how large scale cohorts can bridge observational research with health system interventions, supporting earlier detection, targeted prevention, and population level impact.38 39

These reflections reaffirm that in Chinese cohort studies, bigger is not always better. Instead of pursuing ever greater scale, future studies must move beyond breadth and depth to embrace a value oriented approach, grounded in scientific, health, and societal value. What matters more than scale is a holistic framework, one that prioritises strategic design, clinical integration, and long term responsibility. Initiatives such as ChinaHEART, the National Birth Cohort, and WHOLISM offer exploratory examples towards this shift by embedding prevention, screening, diagnosis, and treatment within a unified system. Although still evolving, such initiatives invite broader dialogue on how future cohorts can move beyond data accumulation towards meaningful health and social impact.

Bigger is not always better: cohort studies must move beyond pursuing sheer volume to embrace a holistic, clinician initiated, and value oriented design that prioritises scientific credibility, health relevance, and societal responsibility.

Scientific value is often compromised by fragmented representativeness and poor data standardisation; this can be corrected through purposeful population inclusion and clinician initiated design across diverse healthcare settings.

Health value is hard to achieve through data complexity alone; it requires embedding of cohorts into clinical practice, leveraging tools such as target trial emulation, and developing real world interventions aligned with patients’ needs.

In this analysis, “breadth” is defined as the variety of collected variables (eg, omics, imaging, behavioural data) and “depth” as the number of participants.