China has initiated prospective cohort studies since the 1970s, and approximately 350 cohorts had been reported by August 2024 (fig 1). These cohorts are geographically decentralised but concentrated in economically developed eastern coastal regions, including Shanghai, Beijing, Jiangsu, and Guangdong. However, most of these cohort studies are relatively small in scale, with sample sizes typically under 10 000 participants (fig 2). Moreover, most large scale cohorts have been established in the past decade with relatively short follow-up periods (table 1).
Fig 1: Temporal changes of cohort studies established in China. PubMed was searched for cohorts with relevant publications that could be identified as designed, prospective cohort studies. According to inclusion and exclusion criteria, 17 090 articles were identified as of September 2024. After exclusion of studies based on cross sectional designs, retrospective designs, case-control designs involving hospital specific patient populations, surveillance data, and hospital electronic health records, 514 articles met criteria. Among these, 403 studies had clearly defined baseline data, follow-up methods, and follow-up records. After de-duplication, 345 cohorts were included.
Fig 2: Size distribution of cohort studies established in China. Numbers marked on pie chart represent number of prospective cohort studies of corresponding sample size.
Table 1: Typical large scale cohort studies in China (sorted by baseline time).
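The screening flow described in the fig 1 caption can be tallied with a few lines of Python. This is only an illustrative sketch: the four counts come from the caption, while the stage labels and the `attrition` helper are hypothetical names introduced here, not part of the original analysis.

```python
# Counts reported in the fig 1 caption (PubMed search as of September 2024).
# Stage labels are paraphrased for illustration.
stages = [
    ("articles identified by search", 17090),
    ("articles meeting design criteria", 514),
    ("studies with defined baseline and follow-up", 403),
    ("cohorts after de-duplication", 345),
]


def attrition(stages):
    """Return (label, remaining, dropped_from_previous_stage) for each stage."""
    rows = []
    previous = None
    for label, count in stages:
        dropped = 0 if previous is None else previous - count
        rows.append((label, count, dropped))
        previous = count
    return rows


for label, count, dropped in attrition(stages):
    print(f"{label}: {count} remaining, {dropped} dropped at this step")
```

Read this way, the largest reduction occurs at the design screen (17 090 to 514 articles), with far smaller reductions at the data completeness and de-duplication steps.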
We have divided the history of cohort research in China into three development stages (fig 3) and listed the representative cohorts of each stage: early stage (1970-90), primarily focused on occupational cohorts, with representative studies including leukaemia in benzene workers and a cohort study among female workers in textile factories in Shanghai; development stage (1990-2000), dominated by general population cohorts, with notable cohorts such as China-PAR and Shanghai Women’s and Men’s Health Study; and rapid development stage (2000-20), during which large scale population cohorts emerged, including China Kadoorie Biobank (CKB) and China PEACE Million Persons Project, along with birth cohorts such as Jiangsu Birth Cohort and Wuhan Healthy Baby Cohort.
Fig 3: Timeline of developments in cohort studies in China.
The CKB study is a pioneer in large scale cohorts in China. The CKB covers five urban and five rural regions across China, with a sample size of more than 510 000 and follow-up for more than 20 years (table 1).9 By studying risk factors for all common diseases, especially major chronic diseases such as cardiovascular disease, cancer, chronic respiratory diseases, and diabetes, CKB has provided extensive evidence for developing and refining strategies and measures for disease prevention and control in China.16 For example, using East Asian specific genetic variants that strongly affect alcohol metabolism, CKB has, for the first time, reliably refuted the protective effects of moderate drinking on stroke, a finding that may prompt alcohol control policy to shift from restricting alcohol to abstaining from it.
In recent years, under the promotion of the National Key R&D Program of the Ministry of Science and Technology of China, several large scale cohorts have gradually been established, such as China PEACE Million Persons Project (China PEACE-MPP), which was initiated on the basis of the Early Screening and Comprehensive Intervention Project for High Risk Groups of Cardiovascular Diseases program, as well as Jiangsu Birth Cohort (JBC) (table 1). The JBC aims to examine the relations between early life exposures and long term health outcomes. Since 2016, the JBC has enrolled more than 7000 families undergoing assisted reproductive technology and nearly 15 000 families of natural conception (table 1). The JBC has constructed a family based molecular atlas leveraging omics data and identified novel risk factors for a series of adverse outcomes, such as birth defects and neurodevelopmental problems, particularly by comparing assisted reproductive technology and natural conception, as well as different techniques in assisted reproductive technology processes.17 18 The China PEACE-MPP is a government funded public health project that involved 1.7 million participants from all 31 provinces of mainland China.19 It provides evidence on risk factors for cardiovascular disease in China; for example, familial hypercholesterolaemia was associated with elevated risk of incident coronary artery disease, so early screening for familial hypercholesterolaemia has public health significance for the prevention of coronary artery disease.20
Despite ongoing efforts, significant barriers remain to the establishment and sustainability of large scale cohorts in China. One major challenge is the difficulty in integrating cohort data with national registry or surveillance systems or diverse medical datasets. Furthermore, in the era of precision medicine, omics level analysis has become increasingly important, as understanding the underlying biological mechanisms is key to advancing personalised prevention, diagnosis, and treatment.21 However, the sustainable development of omics digitisation of cohort samples is still constrained by factors such as insufficient funding and incomplete follow-up data. In addition, the collection, preservation, and utilisation of population resources within the existing cohorts lack standardisation and systematisation, which impedes the long term advancement of cohort studies in China. In this analysis, we call for multi-stakeholder collaboration for sustainable development of large scale cohorts in China.
We recommend that the government undertakes top level scientific planning and long term, forward looking strategies, including the development of a national roadmap for cohort research, the establishment of expert advisory mechanisms, coordinated integration of resources, and a comprehensive monitoring system. Such efforts, alongside sustained support and the promotion of multi-stakeholder collaboration, will facilitate the development of large scale, high quality, standardised, and open national cohorts, providing robust population based evidence to drive original innovation and translational applications in China’s health sciences sector.
The diverse objectives of cohort studies, as well as their long term nature and high costs, mean that China’s existing cohort studies lack systematic government level guidance and long term sustainable mechanisms; they are characterised as small scale, fragmented, disorganised, and unsustainable. In light of this, government guidance and policy prioritisation are essential for advancing national large scale cohorts. The cohort construction efforts of high income countries offer valuable lessons. For instance, the “All of Us Research Program” (AoURP), led by the US government and centrally managed by the National Institutes of Health,22 offers a useful reference. It involves more than 100 institutions across the country, with centralised management of biological samples and data, standardised procedures, and a strong legal and technological foundation to support data security and open sharing. However, directly replicating this model in China poses challenges, given the country’s administrative complexity, regional disparities, and differences in research governance and public engagement. Key features of AoURP, such as centralised coordination, standardisation, and multi-sector collaboration, can be adapted to China’s context through phased implementation, pilot programmes, and stronger inter-agency mechanisms. Tailoring these elements to China’s unique political and healthcare system will be critical.
To promote the sustainable development of large scale cohorts in our context, policy efforts should focus on three key areas. Firstly, the government should strengthen sustained policy support for cohort development, including the formulation of national strategies and long term plans, the establishment of standardised data sharing frameworks, enhanced ethical review systems, stable and long term funding mechanisms, and cross sector coordination platforms. Priority should be given to cohorts that have already achieved significant scale and demonstrated clear scientific and public health value.
Secondly, in terms of organisation and management, an integrated mechanism should be established to streamline cohort construction and operation. This mechanism could include centralised oversight by health departments, close collaboration with scientists as key leaders of cohort design and execution, and coordination among various organisations and institutions nationwide. Scientists play a pivotal role in large scale cohort studies, providing the expertise and leadership necessary to ensure scientific rigour and successful implementation. Such a system would not only improve the efficiency, feasibility, standardisation, and regulation of cohort construction but would also facilitate the integration and sharing of data across multiple systems, breaking down existing silos and enhancing interoperability.
Thirdly, the government should encourage data sharing both within and across cohort studies to support in-depth research on region specific disease burdens. In alignment with the Healthy China 2030 initiative, which emphasises data driven health decision making, a multilevel data management system should be established to enable efficient data storage, integration, and security. Existing platforms and policy mechanisms, such as the National Population Health Data Center and the National Natural Science Foundation of China, can be leveraged to provide the necessary infrastructure, institutional frameworks, and performance based incentives for data sharing. Furthermore, the development of inter-agency coordination mechanisms and unified technical standards will be essential to ensuring secure, standardised, and ethical data sharing. This will promote cross regional collaboration and high quality scientific research.
As shown in table 1, the diverse characteristics of existing cohorts in China have resulted in inconsistencies in methods for exposure and outcome assessment, measurement techniques, reference standards, and implementation criteria. This fragmentation leads to independent data structures across cohorts, which makes it challenging to merge and use data efficiently and thereby reduces collaboration and overall research efficiency. To overcome these obstacles, the construction of large scale cohorts in China should draw on the experience of international cohort consortiums, focusing on establishing standards and data governance frameworks.
A relevant example is the US National Cancer Institute Cohort Consortium, an extramural-intramural partnership designed to facilitate large scale cohort collaborations that pool the extensive data and biospecimens needed for diverse cancer studies.23 Key strategies include sharing linkage algorithms for various exposures and outcomes from multiple sources (for example, electronic medical records, registries, geospatial databases) and developing algorithms to harmonise commonly used data elements. The consortium also focuses on establishing procedures for validating measurement instruments and standardising calibration in pooled analyses. Similarly, cohort consortiums in China should focus on specific research areas, bringing together public health researchers, clinical experts, data scientists, bioinformaticians, information technology specialists, and legal experts, while fostering collaboration with international cohorts. These partnerships would enable the creation of comprehensive standards for cohort development, covering data collection and storage, database management, sample management, quality control, and follow-up procedures. Implementing these standards through collaborative consortiums would be a crucial step in advancing the standardisation of cohort studies in China.
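To make the idea of harmonising commonly used data elements concrete, the sketch below maps records from two cohorts with different variable names, codings, and units onto one shared schema before pooling. It is a minimal illustration under stated assumptions: the cohorts "A" and "B", their field names and codings, and the `FieldRule` and `harmonise` helpers are all hypothetical and do not describe any consortium's actual algorithms.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class FieldRule:
    """How one cohort stores a shared data element (hypothetical structure)."""
    source_name: str               # variable name used in the cohort's own data
    convert: Callable[[Any], Any]  # maps the cohort's coding or units to the shared form


# Hypothetical codebooks: each maps shared element names to cohort-specific rules.
COHORT_A: Dict[str, FieldRule] = {
    "sex": FieldRule("gender", lambda v: {1: "male", 2: "female"}[v]),
    "height_cm": FieldRule("height_cm", float),
    "current_smoker": FieldRule("smoke", lambda v: v == "yes"),
}
COHORT_B: Dict[str, FieldRule] = {
    "sex": FieldRule("sex", lambda v: {"M": "male", "F": "female"}[v]),
    # Cohort B records height in metres, so convert to centimetres.
    "height_cm": FieldRule("height_m", lambda v: round(float(v) * 100, 1)),
    "current_smoker": FieldRule("smoking_status", lambda v: v == 2),
}


def harmonise(record: Dict[str, Any], rules: Dict[str, FieldRule], cohort_id: str) -> Dict[str, Any]:
    """Translate one raw record into the shared schema; missing values stay None."""
    out: Dict[str, Any] = {"cohort": cohort_id}
    for target, rule in rules.items():
        raw = record.get(rule.source_name)
        out[target] = None if raw is None else rule.convert(raw)
    return out


# Pool one illustrative record from each cohort.
pooled = [
    harmonise({"gender": 2, "height_cm": "162.5", "smoke": "no"}, COHORT_A, "A"),
    harmonise({"sex": "M", "height_m": 1.74, "smoking_status": 2}, COHORT_B, "B"),
]
for row in pooled:
    print(row)
```

Keeping the mapping rules in a per-cohort codebook, rather than scattered through analysis scripts, is one way the conversions could be reviewed and validated by a consortium before any pooled analysis.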
The construction of large scale cohorts undoubtedly requires substantial financial investment, especially for developing omics databases. Drawing on international examples, the UK Biobank has adopted a diversified funding structure, with major contributions from government agencies such as the UK Medical Research Council and charitable organisations such as the Wellcome Trust. Additionally, several drug companies in the UK offer support through both financial contributions and technical expertise for specific enhancements to the study, such as the generation of full genome sequencing and proteomics data; companies are given a short exclusivity period to use the data (generally nine months) before the data are released to the wider research community.
In China, the development of existing cohorts relies heavily on government funding, which is often limited in scope and scale, making it difficult to maintain consistent and stable financial support over the long term. To overcome this, the government should provide infrastructure funding for large national cohorts, ensuring their public service role, openness, and long term sustainability. However, relying solely on government support is insufficient. Exploring diversified funding models is essential. For instance, charitable funds and commercial investments could be allocated to support research on rare diseases and generate multi-omics data within large cohorts. Access fees collected through data sharing systems can serve as an important source of funding to support ongoing cohorts. This diversified approach would gradually create a self-sustaining mechanism for large scale cohorts, whereby the government leads but multiple sectors participate. Moreover, funding mechanisms should focus on clarifying cohort data ownership (that is, confirming ownership rights and intellectual property), realising data value (that is, extracting and leveraging the potential of the data), and enabling data flow (that is, establishing systems for data sharing and circulation). Enhancing these areas would attract greater industry participation and investment, fostering the development of both research and the associated industries.
Integrating cohort data with medical big data is essential for the sustainable development of large cohort studies. However, data collection faces significant challenges owing to the fragmentation of information when patients visit different hospitals, as well as the siloed management of various systems. To overcome this, high quality, large scale cohorts should be prioritised as pilot projects, and advanced artificial intelligence methods and technologies should be leveraged to facilitate seamless data integration between disease control and healthcare systems. This would promote the establishment of an open and shared medical big data framework. Moreover, incorporating openness and sharing into the evaluation criteria for registry system development would further drive the comprehensive openness and integration of medical big data in China.
At the same time, the storage, sharing, analysis, and mining of data from large cohorts carry potential risks to personal information rights, privacy, and data security. Further strengthening regulatory mechanisms, refining legal frameworks for classified data protection, and clearly defining the responsibilities and rights across data mining, storage, transmission, publication, and reuse are crucial. This will ensure data security and regulated use while unlocking the full potential of large cohort datasets.
China’s cohort research has undergone a notable transformation over the past few decades, evolving from small scale, single site investigations to a growing number of large scale, multicentre studies supported by advances in technology and increasing policy attention. Despite these achievements, current cohort efforts remain fragmented, under-resourced, and lacking in unified standards, limiting their potential to generate robust and generalisable evidence. China is uniquely positioned to leverage its vast population base, expanding health data infrastructure, and growing scientific capacity to build next generation, high quality national cohorts. By learning from international best practices while tailoring approaches to its unique governance and healthcare context, China can establish a sustainable, standardised, and inclusive cohort ecosystem. This will not only enhance domestic health research and precision medicine development but also contribute valuable models and experience to other developing countries embarking on similar cohort building journeys.
Evidence from large scale cohorts in high income countries often does not apply to China’s unique health challenges, necessitating local cohort studies to guide effective health policies.
In China, a large developing country, many cohorts have small sample sizes and short follow-up periods and are characterised by fragmented and disorganised efforts; thus, a sustainable strategy for large scale cohort development is urgently needed.
Government policy guidance, the development of multidisciplinary consortia, innovative funding strategies, and a robust framework for data integration and sharing are essential for promoting the sustainable development of large scale cohorts in China.