Browse the corpus
Walk the Even Hospital Database by book and chapter — the raw source passages that ground Ask, DDx, and the rest.
71 passages
The value of standards for health datasets in artificial intelligence-based applications. Artificial intelligence as a medical device is increasingly being applied to healthcare for diagnosis, risk stratification and resource allocation. However, a growing body of evidence has highlighted the risk of algorithmic bias, which may perpetuate existing health inequity. This problem arises in part because of systemic inequalities in dataset curation, unequal opportunity to participate in research and inequalities of access. This study aims to explore existing standards, frameworks and best practices for ensuring adequate data diversity in health datasets. Exploring the body of existing literature and expert views is an important step towards the development of consensus-based guidelines. The study comprises two parts: a systematic review of existing standards, frameworks and best practices for healthcare datasets; and a survey and thematic analysis of stakeholder views of bias, health equity and best practices for artificial intelligence as a medical device. We found that the need for dataset diversity was well described in literature, and experts generally favored the development of a robust set of guidelines, but there were mixed views about how these could be implemented practically. The outputs of this study will be used to inform the development of standards for transparency of data diversity in health datasets (the STANDING Together initiative).
Recent years have seen a rapid rise in the development of artificial intelligence (AI) systems for use in healthcare, including those that qualify as a medical device (known as AI as a medical device, AIaMD). This has been enabled by increasing use of electronic health records, accompanied by curation of large-scale health datasets1. However, there are credible concerns that many datasets inadequately reflect the diversity of the individuals or groups contained in the population they are intended to represent. This has previously been described as ‘Health Data Poverty’: a phenomenon where individuals or groups who are underrepresented in health datasets are less able to benefit from data-driven innovations developed using these datasets, including AIaMD2. There is a growing concern that non-diverse and non-representative data contribute to the creation of biased algorithms, resulting in less accurate performance in certain patient groups. Therefore, it is well-recognized that an essential component of ensuring algorithmic safety is to guarantee that datasets are appropriately diverse and representative of their intended use population3.
ata contribute to the creation of biased algorithms, resulting in less accurate performance in certain patient groups. Therefore, it is well-recognized that an essential component of ensuring algorithmic safety is to guarantee that datasets are appropriately diverse and representative of their intended use population3. Data diversity, as measured by equal or relative representation alone, is not enough to achieve equitable outcomes. Even when individuals are represented proportionally in datasets, other forms of bias may be inherently embedded in the representation of those individuals’ data. For example, a dataset may include a proportion of individuals from an ethnic group that is in keeping with national census data (adequate numerical representation), but included individuals from this ethnic group could have a systematically higher likelihood of being misdiagnosed than in the sampled population, meaning the insights derived from this data can remain biased. Although a principal focus of our work is data diversity, we also advocate for a broader view of representativeness in health data, including awareness of the limitations of data collection, data accuracy and ethical concerns around the use of data in minoritized and underserved groups.
rom this data can remain biased. Although a principal focus of our work is data diversity, we also advocate for a broader view of representativeness in health data, including awareness of the limitations of data collection, data accuracy and ethical concerns around the use of data in minoritized and underserved groups. Reasons for underrepresentation in datasets broadly fall into two categories: factors that cause individuals or groups to be absent from datasets and factors that cause individuals to be incorrectly or inappropriately categorized into groups despite being present (for example, categories of ‘mixed ethnicity’ or ‘other’). Root causes may include structural barriers to receiving healthcare; barriers to the capture or digitization of relevant health data; individual and structural barriers reducing consent for data sharing; data aggregation, redaction or recoding; collecting data with insufficient granularity; and legal or ethical restrictions on data sharing preventing data accessibility (Fig. 1)2,4,5. The composition and diversity of teams involved in AIaMD development is also critical—teams should include people from different backgrounds as well as those with lived experience of the use case (for example, patients and the public).Fig. 1Individuals may be underrepresented in datasets for many reasons.Barriers may be present that prevent data about entire groups of people from being included in the dataset (A). They may include barriers to accessing health or social care (meaning data are not generated), inadvertent or deliberate exclusion by the dataset curators or absence of electronic health records (meaning data are not digitized). Certain individuals may be less likely to enter datasets (B); for example, when individuals choose not to allow their data to be included, when methods for data collection are exclusionary (for instance, forcing a binary choice of ‘male’ or ‘female’ for gender), or when redaction occurs after data collection because of legal or ethical restrictions on data sharing. Data may not be collected in sufficient detail, leading to data loss (C; for instance, capturing age in categorical bands such as 20–29 rather than as a continuous variable).
hoice of ‘male’ or ‘female’ for gender), or when redaction occurs after data collection because of legal or ethical restrictions on data sharing. Data may not be collected in sufficient detail, leading to data loss (C; for instance, capturing age in categorical bands such as 20–29 rather than as a continuous variable). Groups of individuals with distinct personal attributes may be merged into a different group either at the point of data collection or by preprocessing after collection (D; for instance, requiring ethnicity or race to be selected from a small list of choices during data capture or combining ethnicity or race groupings into a larger, aggregate group after data capture).
utes may be merged into a different group either at the point of data collection or by preprocessing after collection (D; for instance, requiring ethnicity or race to be selected from a small list of choices during data capture or combining ethnicity or race groupings into a larger, aggregate group after data capture). Barriers may be present that prevent data about entire groups of people from being included in the dataset (A). They may include barriers to accessing health or social care (meaning data are not generated), inadvertent or deliberate exclusion by the dataset curators or absence of electronic health records (meaning data are not digitized). Certain individuals may be less likely to enter datasets (B); for example, when individuals choose not to allow their data to be included, when methods for data collection are exclusionary (for instance, forcing a binary choice of ‘male’ or ‘female’ for gender), or when redaction occurs after data collection because of legal or ethical restrictions on data sharing. Data may not be collected in sufficient detail, leading to data loss (C; for instance, capturing age in categorical bands such as 20–29 rather than as a continuous variable). Groups of individuals with distinct personal attributes may be merged into a different group either at the point of data collection or by preprocessing after collection (D; for instance, requiring ethnicity or race to be selected from a small list of choices during data capture or combining ethnicity or race groupings into a larger, aggregate group after data capture).
utes may be merged into a different group either at the point of data collection or by preprocessing after collection (D; for instance, requiring ethnicity or race to be selected from a small list of choices during data capture or combining ethnicity or race groupings into a larger, aggregate group after data capture). Examples of this lack of diversity in datasets have been previously highlighted in several health areas, including radiology, ophthalmology and dermatology6–8. There are further concerns that models may encode biases relating to demographic characteristics even when they are not explicitly trained to do so. This leads to the potential for ‘unknown’ biases reflected in health datasets to become unknowingly and unintentionally embedded in models derived from them9,10.
ology6–8. There are further concerns that models may encode biases relating to demographic characteristics even when they are not explicitly trained to do so. This leads to the potential for ‘unknown’ biases reflected in health datasets to become unknowingly and unintentionally embedded in models derived from them9,10. Despite widespread acknowledgement that inclusiveness is a core tenet of ethical AI in healthcare11–13, there remains a shortage of guidance on how to apply such principles in the curation, aggregation and use of health data. The issue of producing data relating to healthcare disparities has previously been explored, with recommendations for data-collection practices, but there are novel challenges in the specific context of AI research14,15. Generic guidelines exist for the improvement of datasets, including a view to reduce healthcare inequalities by promoting patient voice, accurate variables and data linkage14. A commonly raised concern is the reporting of race/ethnicity data, which is variably collected with fragmented and diverse data-collection practices16. The US Food and Drug Administration (FDA) asks for demographic information and inclusion criteria for data collection to be provided if available but does not mandate certain levels of representation across demographic groups in datasets used in AIaMD17. This is a concern particularly pertinent to the field of AIaMD because of the risk of systemic algorithmic bias if models are trained on biased training datasets. AIaMD algorithms learn patterns in the training data and use this to generate predictions when applied to new data. If the data used for training an algorithm are biased against particular demographic groups, the algorithm is likely to underperform when applied to those groups in the real world. Beyond algorithmic bias, diversity in datasets has wider benefits in improving algorithmic performance. A diverse dataset helps AIaMD models generalize their learnings to new and unseen cases. Without diversity, models may perform well on common cases but struggle with unusual or underrepresented ones.
the real world. Beyond algorithmic bias, diversity in datasets has wider benefits in improving algorithmic performance. A diverse dataset helps AIaMD models generalize their learnings to new and unseen cases. Without diversity, models may perform well on common cases but struggle with unusual or underrepresented ones. This Analysis aims to explore existing standards, frameworks and best practices that improve data diversity in health datasets in the context of AIaMD. It comprises two parts: (1) a systematic review of the published literature for existing standards, frameworks and best practices; and (2) a survey of stakeholder views to understand how issues of bias and health equity are tackled at present for AIaMD and how best practices can be promoted in the future. This work is part of the STANDING Together initiative (standards for data diversity, inclusivity and generalizability), a program that seeks to develop consensus-driven standards for health data to promote health equity; further information is available at www.datadiversity.org (ref. 18).
Database searches yielded 10,646 unique records, of which 100 remained after title and abstract screening (Fig. 2). Most of the 10,646 records that were screened did not meet the inclusion criteria, addressing neither health equity nor AIaMD. A further 35 records were screened after identification through reference lists. We identified seven arXiv preprints for the analysis through this method. After full-text screening, 30 relevant records were included.Fig. 2PRISMA flow diagram for systematic review.The breadth of the search strategy meant most of the 10,646 records that were screened were irrelevant and did not meet any of the inclusion criteria, addressing neither health equity nor AIaMD. The breadth of the search strategy meant most of the 10,646 records that were screened were irrelevant and did not meet any of the inclusion criteria, addressing neither health equity nor AIaMD. Of these 30 records, 17 were identified from bibliographic databases, 9 from searches of reference lists and 4 from searches of unindexed conference proceedings (Table 1). All 30 were published between July 2015 and February 2022. Of the 30 records, 1 was published in 2015, 1 in 2017 and 28 since 2018, showing acceleration in the rate of academic discussion around this topic. Most records were authored by individuals from the same department in one institution (14 of 30; 47%), with only 4 of 30 (13%) authored by interdisciplinary, international teams19–22. Twenty of 30 records (67%) were found in journals, 7 of 30 (23%) were found as preprints and 4 of 30 (13%) were found in conference proceedings.
ords were authored by individuals from the same department in one institution (14 of 30; 47%), with only 4 of 30 (13%) authored by interdisciplinary, international teams19–22. Twenty of 30 records (67%) were found in journals, 7 of 30 (23%) were found as preprints and 4 of 30 (13%) were found in conference proceedings. Twenty-seven of 30 (90%) were available open access.Table 1Characteristics of included articlesStudy IDTitle (year)Article typeDescription of recordAccessType of collaborationSR1Heterogeneity/granularity in ethnicity classifications project: the need for refining assessment of health status (2018)46Journal articleDescription of how ethnicity is recorded across different EU countries; some collect highly granular data, some allow free text expression, others allow only limited categoriesOpen accessInternational collaborationSR2Bringing the people back In: contesting benchmark machine learning datasets (2020)47PreprintOutlines the concept of benchmark datasets as a form of research infrastructure and key factors that may influence a dataset’s value and utilityOpen accessNational collaborationSR3A framework for understanding sources of harm throughout the machine learning life cycle (2019)24PreprintMaps where biases may cause harm during a ML development pipelineOpen accessSingle institutionSR4Datasheets for datasets (2018)32PreprintIntroduces a ‘Datasheet’ artifact, allowing dataset curators to provide a comprehensive, structured and standardized description of a dataset’s composition and the context in which it has been curatedOpen accessNational collaborationSR5The dataset nutrition label: a framework to drive higher data quality standards (2018)48PreprintIntroduces a ‘Nutrition label’ artifact, allowing dataset curators to provide a structured, standardized summary of a dataset’s compositionOpen accessNational collaborationSR6Ensuring that biomedical AI benefits diverse populations (2021)11Journal articleHighlights how AI development can cause biases and health disparity. Also indicates both short-term and longer-term solutions to mitigate some of these factorsOpen accessSingle institutionSR7How to design AI for social good: seven essential factors (2020)49Journal articleIdentifies and explains seven essential ethical factors to consider when developing AI for social good.
th disparity. Also indicates both short-term and longer-term solutions to mitigate some of these factorsOpen accessSingle institutionSR7How to design AI for social good: seven essential factors (2020)49Journal articleIdentifies and explains seven essential ethical factors to consider when developing AI for social good. Each factor is followed by a recommendation for developers who are seeking to develop AI that promotes social goodOpen accessNational collaborationSR8Identifying ethical considerations for machine learning healthcare applications (2020)50Journal articleFramework linking the ML development pipeline to evaluation and oversight of these technologies, highlighting where along this joint pathway ethical considerations and value-based issues may ariseClosed accessNational collaborationSR9Indigenous and tribal peoples data governance in health research: a systematic review (2021)51Journal articleSystematic review of data governance frameworks, processes, policies and practices for indigenous and tribal peoplesOpen accessSingle institutionSR10MINIMAR (MINimum Information for Medical AI Reporting): developing reporting standards for artificial intelligence in health care (2020)52Journal articleMinimum reporting standards for studies of medical AI, relating to the study population and setting, patient demographic characteristics, model architecture and model evaluationOpen accessSingle institutionSR11Predictably unequal: understanding and addressing concerns that algorithmic clinical prediction may increase health disparities (2020)53Journal articleEthical discussion about the differences between algorithmic fairness and bias and a summary of different definitions of fairnessOpen accessSingle institutionSR12The reporting of race and ethnicity in medical and science journals: comments invited (2021)26Journal articleGuidance for reporting ethnicity and race in research articles specifically for JAMA Network journalsOpen accessSingle institutionSR13Ethical limitations of algorithmic fairness solutions in health care machine learning (2020)54Journal articleCommentary on how framing algorithmic fairness as entirely a technical problem can contribute to or cause health inequity unless social factors are also consideredOpen accessNational collaborationSR14Missed policy opportunities to advance health equity by recording demographic data in electronic health records (2015)55Journal articleDescription of how different US bodies and organizations take different approaches to collec
ealth inequity unless social factors are also consideredOpen accessNational collaborationSR14Missed policy opportunities to advance health equity by recording demographic data in electronic health records (2015)55Journal articleDescription of how different US bodies and organizations take different approaches to collec ting demographic data, including using different categories, which limits crosslinking between data sourcesClosed accessSingle institutionSR15Clinical collabsheets: 53 questions to guide a clinical collaboration (2020)22Conference proceedingsA guide to collaborating between clinicians and computer scientists to develop models in interdisciplinary teams across eight development stagesOpen accessMultidisciplinary international collaborationSR16Ethical machine learning in healthcare (2021)5Journal articleOverview of the five key stages in the healthcare ML model development pipeline, overlaying points at which ethical issues may ariseOpen accessInternational collaborationSR17Addressing health disparities in the Food and Drug Administration’s artificial intelligence and machine learning regulatory framework (2020)23Journal articleCommentary about how health disparities might be considered by the FDA software as a medical-device regulatory framework, through integration of premarket review and good ML practices and postmarket real-world performance monitoringOpen accessSingle institutionSR18Model cards for model reporting (2018)56PreprintIntroduces a ‘Model card’ artifact, encouraging transparent reporting of ML model performance characteristicsOpen accessNational collaborationSR19Canada protocol: an ethical checklist for the use of artificial intelligence in suicide prevention and mental health (2019)57PreprintAn ethical checklist for the use of AI in mental health and suicide prevention, validated by two-round Delphi consultation.
performance characteristicsOpen accessNational collaborationSR19Canada protocol: an ethical checklist for the use of artificial intelligence in suicide prevention and mental health (2019)57PreprintAn ethical checklist for the use of AI in mental health and suicide prevention, validated by two-round Delphi consultation. Note that a version of this record was subsequently published closed access in a journal57Open accessSingle institutionSR20Aequitas: a bias and fairness audit toolkit. (2018)58PreprintAn open-source bias audit toolkit to allow ML developers, analysts and policymakers to assess AI systems for biased outputsOpen accessSingle institutionSR21AI-assisted decision-making in healthcare: the application of an ethics framework for big data in health and research (2019)25Journal articleA discussion of key ethical issues involved with AI implementation in healthcare, with specific case study examplesOpen accessNational collaborationSR22An ethics framework for big data in health and research (2019)59Journal articleA framework of values underpinning ethical design of AI in healthcare, developed by a working group with expert feedbackOpen accessInternational collaborationSR23Artificial intelligence for genomic medicine—a policy analysis (2020)60Conference proceedingsPractical recommendations for policymakers in the field of AI and genomic medicine, exploring the drivers behind the use of AI in genomics, current applications and limitations and challengesOpen accessSingle institutionSR24Big data science: opportunities and challenges to address minority health and health disparities in the 21st century (2017)61Journal articleA discussion of how big data science can be used to address minority health issues and actively reduce health disparities by changing the types and mechanisms of electronic health-data capture and enabling studies into health disparities. Also provides a series of recommendations to achieve these aimsOpen accessNational collaborationSR25Ensuring fairness in machine learning to advance health equity (2018)62Journal articleDescribes how health disparities can be worsened by model design, data biases and interpretation by patients and clinicians.
parities. Also provides a series of recommendations to achieve these aimsOpen accessNational collaborationSR25Ensuring fairness in machine learning to advance health equity (2018)62Journal articleDescribes how health disparities can be worsened by model design, data biases and interpretation by patients and clinicians. Recommends that proactive distributive justice be incorporated into models to ensure equality in patient outcomes, resource allocation and model performanceOpen accessSingle institutionSR26Machine learning and artificial intelligence research for patient benefit: 20 critical questions on transparency, replicability, ethics and effectiveness (2020)19Journal articleFramework for interdisciplinary groups researching, generating or implementing ML models to determine a model’s potential to benefit patients. Focuses on transparency, replicability, ethics and effectivenessOpen accessMultidisciplinary international collaborationSR27Do no harm: a roadmap for responsible machine learning for health care (2019)21Journal articleA set of principles promoting practices that enable acceleration of translation of ethical and effective ML models in healthcare, spanning problem selection, development, ethical considerations, evaluation and reporting, deployment and postmarket considerationsOpen accessMultidisciplinary international collaborationSR28Addressing fairness, bias and appropriate use of artificial intelligence and machine learning in global health (2021)63Journal articleA framework for those deploying ML algorithms in low- and middle-income countries, focusing on determining whether a model is appropriately matched to the local context and target population, identifying biased performance and considering implications for fairnessClosed accessSingle institutionSR29Artificial intelligence, bias and clinical safety (2019)20Conference proceedingsDiscussion of potential medical AI errors and biases and presentation of quality-control questions enabling critical appraisal of medical AI research and highlighting potential pitfalls for future researchersOpen accessMultidisciplinary international collaborationSR30Healthsheet: development of a transparency artifact for health datasets (2022)33Journal articleIntroduces a ‘Healthsheet’ artifact, allowing healthcare dataset curators to provide a comprehensive, structured and standardized description of a dataset’s composition and the context in which it has been curated.
orationSR30Healthsheet: development of a transparency artifact for health datasets (2022)33Journal articleIntroduces a ‘Healthsheet’ artifact, allowing healthcare dataset curators to provide a comprehensive, structured and standardized description of a dataset’s composition and the context in which it has been curated. Related to the ‘Datasheet’ artifact, but adapted for healthcare datasetsOpen accessSingle institutionResults of literature search, including sources found through journal database searches, preprint servers and reference lists. Characteristics of included articles Results of literature search, including sources found through journal database searches, preprint servers and reference lists.
Related to the ‘Datasheet’ artifact, but adapted for healthcare datasetsOpen accessSingle institutionResults of literature search, including sources found through journal database searches, preprint servers and reference lists. Characteristics of included articles Results of literature search, including sources found through journal database searches, preprint servers and reference lists. Data were extracted from all 30 records to derive key themes (Extended Data Table 1). Transparency around data-collection practices was a major theme (n = 13), with particular focus on the need for clarity about how data were sampled and for what purpose, how demographic categories were assigned and details of any preprocessing of data. Some articles highlighted the importance of reporting existing health inequalities affecting those included in a dataset, allowing data users to take steps to avoid exacerbating these23,24. The motivation for collecting the data and whether informed consent was obtained from participants was discussed by 11 articles. Furthermore, as legal requirements vary across jurisdictions, there was little consensus about whether or how consent should be obtained from subjects in datasets. Finally, two records discussed how data quality can be improved by involving clinical experts when developing the data-collection strategy5,25.
11 articles. Furthermore, as legal requirements vary across jurisdictions, there was little consensus about whether or how consent should be obtained from subjects in datasets. Finally, two records discussed how data quality can be improved by involving clinical experts when developing the data-collection strategy5,25. Missing data were discussed in 10 of 30 records (Extended Data Table 2). Transparency about the causes, extent and consequences of missing data is encouraged, as is transparency about any steps taken to address them. It is understood that increased information about the amount of missing data in a dataset will promote transparency and considerations about appropriateness of use, given that usefulness and generalizability of AI and machine learning (ML) models are impaired by missing data19. Research into missing data also revealed a recommendation that aggregation of demographic groups or variables into a smaller number of groups should be reported, which would similarly help researchers understand the limitations of the dataset26.
ity of AI and machine learning (ML) models are impaired by missing data19. Research into missing data also revealed a recommendation that aggregation of demographic groups or variables into a smaller number of groups should be reported, which would similarly help researchers understand the limitations of the dataset26. Data labeling was addressed by 8 of 30 records (Extended Data Table 3). Transparency was again an overarching theme: articles discuss the need to report how labeling was performed—particularly whether labels represent ground truths and whether there are known or potential biases in labels, such as if they were reported by humans (and therefore subject to inter-reporter variability). One article encouraged analysis of statistical relationships between labels and demographic factors so that potential confounders can be identified and controlled for during model development5.
or potential biases in labels, such as if they were reported by humans (and therefore subject to inter-reporter variability). One article encouraged analysis of statistical relationships between labels and demographic factors so that potential confounders can be identified and controlled for during model development5. A recurring theme was the identification of groups at risk of harm. These groups were variably described by authors as “vulnerable”, “minority”, “minoritized”, “underserved”, “marginalized” and “protected”. The cross-cutting theme was that these are groups considered to be more susceptible to physical, social or economic harm. The issue of certain groups being at greater risk of vulnerability was discussed in 24 of 30 records (Extended Data Table 4). However, there was little consensus about how biases should be addressed or which groups are most at risk. Suggested approaches to identify and reduce bias and harms for demographic subgroups included predefining groups suspected to be at risk, targeting data collection and model development to benefit these groups in particular, ensuring that representatives from at-risk groups are involved with model development (including as experts in a development team, such as developers, programmers and analysts), testing data for confounders rather than automatically including all features in training data and testing model performance in minoritized subgroups. Attributes specified as being particularly at risk of harm with underrepresentation include ethnicity, race, pregnancy status, age, nationality, gender, sex, socioeconomic status, religion, indigenous and tribal community membership, disability status, sexual orientation, preferred language, Fitzpatrick skin type, health status, education, employment status, geographical location and marital status.
n include ethnicity, race, pregnancy status, age, nationality, gender, sex, socioeconomic status, religion, indigenous and tribal community membership, disability status, sexual orientation, preferred language, Fitzpatrick skin type, health status, education, employment status, geographical location and marital status. Whereas the systematic review provided an oversight of current best-practice principles for health datasets in AI, the stakeholder survey provided insights into how principles could be operationalized and by whom. Twenty participants completed the scoping survey. Of these participants, ten (50%) reported their sex as female, nine (45%) reported their sex as male, and one (5%) did not provide this information. Eighteen participants (90%) reported that their gender identity was the same as the sex registered at birth, one participant (5%) reported that their gender identity was different than their sex registered at birth and one participant (5%) did not provide this information. Four main themes and 17 subthemes were identified (Table 2).Table 2Themes arising from the stakeholder surveyThemesSubthemesTheme 1—The role of demographic data1.1 Current use of demographic data1.2 Representativeness of the data1.3 Determining importance of race/ethnicity data1.4 Challenges of ensuring diversity in datasets1.5 Solutions and aspirations to overcome challengesTheme 2—Data diversity2.1 Conceptual definition2.2 Components of diversity2.3 Operationalizing diversityTheme 3—The use of metrics3.1 Externally validated or self-report3.2 Types of measures3.3 Rating diversityTheme 4—Standards4.1 Are standards important and needed?4.2 Existing standards4.3 Recommendations for adoption of standards4.4 Recommendations and barriers to adherence4.5 Responsibility of adherence4.6 Consequences of mandating standards
cs3.1 Externally validated or self-report3.2 Types of measures3.3 Rating diversityTheme 4—Standards4.1 Are standards important and needed?4.2 Existing standards4.3 Recommendations for adoption of standards4.4 Recommendations and barriers to adherence4.5 Responsibility of adherence4.6 Consequences of mandating standards Themes arising from the stakeholder survey The first theme was ‘the role of demographic data’. Stakeholders used demographic data in several ways to assess the safety and efficacy of AIaMDs across different subgroup populations. Ensuring that representative data are used to train and validate AIaMDs for the population in which they are to be deployed was felt to be most important. Ways of demonstrating representativeness included describing the intended use and users of the AIaMD, identifying subgroups of interest up front and being transparent about poor performance. Race and ethnicity data were seen as an important means to explore known and unknown biases potentially leading to health inequalities.“Models must be able to ‘work’ for those belonging to racialised minority groups, and clinicians/researchers/developers must go through stringent governance measures to ensure inequalities, racism and other forms of discrimination are not exacerbated by use of medical AI models.” “Models must be able to ‘work’ for those belonging to racialised minority groups, and clinicians/researchers/developers must go through stringent governance measures to ensure inequalities, racism and other forms of discrimination are not exacerbated by use of medical AI models.”
The first theme was ‘the role of demographic data’. Stakeholders used demographic data in several ways to assess the safety and efficacy of AIaMDs across different subgroup populations. Ensuring that representative data are used to train and validate AIaMDs for the population in which they are to be deployed was felt to be most important. Ways of demonstrating representativeness included describing the intended use and users of the AIaMD, identifying subgroups of interest up front and being transparent about poor performance. Race and ethnicity data were seen as an important means to explore known and unknown biases potentially leading to health inequalities.“Models must be able to ‘work’ for those belonging to racialised minority groups, and clinicians/researchers/developers must go through stringent governance measures to ensure inequalities, racism and other forms of discrimination are not exacerbated by use of medical AI models.” “Models must be able to ‘work’ for those belonging to racialised minority groups, and clinicians/researchers/developers must go through stringent governance measures to ensure inequalities, racism and other forms of discrimination are not exacerbated by use of medical AI models.” Challenges of ensuring diversity in datasets included issues of lack of health data in certain populations (health data poverty), lack of standardization across attribute categories, difficulty in harmonizing several methods of data capture and data-governance restrictions. Other factors relating to the development pathway included poorly defined use cases, a lack of relevant stakeholder input, difficulty in accessing suitable datasets and existing gaps in current evidence about underserved populations at risk of harm linked to health outcomes. Stakeholders used various solutions to derive missing demographic data, including statistical techniques such as imputation, Bayesian geocoding and linking across several datasets. Stakeholders suggested they would like to see policy changes, standards of best practice with a statement of scope up front describing data diversity and regulatory authorities providing clarity about specific requirements according to the intended use, as well as toolkits to tackle health data poverty issues.“Assist various bodies in addressing data poverty (I see this as the core problem that needs solving).”“Using linked data records and collating ethnicity information from different parts of the healthcare record can improve ethnicity data capture.”“Multiple imputations can be helpful for some variables but is sub-optimal. There are weak surrogates/indirect methods of increasing populating ethnicity variables.”
solving).”“Using linked data records and collating ethnicity information from different parts of the healthcare record can improve ethnicity data capture.”“Multiple imputations can be helpful for some variables but is sub-optimal. There are weak surrogates/indirect methods of increasing populating ethnicity variables.” “Assist various bodies in addressing data poverty (I see this as the core problem that needs solving).” “Using linked data records and collating ethnicity information from different parts of the healthcare record can improve ethnicity data capture.” “Multiple imputations can be helpful for some variables but is sub-optimal. There are weak surrogates/indirect methods of increasing populating ethnicity variables.”
“Assist various bodies in addressing data poverty (I see this as the core problem that needs solving).” “Using linked data records and collating ethnicity information from different parts of the healthcare record can improve ethnicity data capture.” “Multiple imputations can be helpful for some variables but is sub-optimal. There are weak surrogates/indirect methods of increasing populating ethnicity variables.” The second theme was ‘data diversity’. The definition of dataset diversity was generally considered a universal concept whose application should be context specific depending on the research question or intended use population. The scope of data diversity should be broad and include race, ethnicity, age, gender, sex, socioeconomic status, clinically relevant disease populations, neurodiversity, disability, language barriers and educational level. It was felt that an ideal dataset should represent a global population that is diverse enough to enable a range of problems to be explored with adequate statistical power. Race and ethnicity were seen as dataset attributes that ‘must be included’. Barriers identified included lack of standardization across the globe, low statistical power in underrepresented groups/rare events and a lack of knowledge about intended use populations/subgroups and their health outcomes. Although race and ethnicity are nearly universally acknowledged as important, there is considerable lack of precision and understanding of the terms in science more broadly27.“Diversity as a concept is universal, but its application is contextual.”“Diversity in health datasets should be used more broadly than simply protected characteristics such as sex and ethnicity.”
acknowledged as important, there is considerable lack of precision and understanding of the terms in science more broadly27.“Diversity as a concept is universal, but its application is contextual.”“Diversity in health datasets should be used more broadly than simply protected characteristics such as sex and ethnicity.” “Diversity as a concept is universal, but its application is contextual.” “Diversity in health datasets should be used more broadly than simply protected characteristics such as sex and ethnicity.”
acknowledged as important, there is considerable lack of precision and understanding of the terms in science more broadly27.“Diversity as a concept is universal, but its application is contextual.”“Diversity in health datasets should be used more broadly than simply protected characteristics such as sex and ethnicity.” “Diversity as a concept is universal, but its application is contextual.” “Diversity in health datasets should be used more broadly than simply protected characteristics such as sex and ethnicity.” The third theme was ‘the use of metrics’. In principle, most respondents were in favor of some type of metric to measure diversity but unsure about how this could be operationalized. There was a definite lack of consensus about the concept of introducing a rating for the diversity of a dataset, with considerable concerns about how a rating of diversity could be implemented. When metrics were considered, stakeholders felt they would have greater value if externally validated rather than solely reported by the dataset’s curators. An alternative approach was that dataset curators could be validated as ‘safe providers’, meaning curators and/or their organizations would demonstrate adherence to standards across all datasets they produce. Other metrics put forward included level of inclusion, completeness and/or missingness of demographic data and distribution of the data. Interestingly, participants also proposed metrics that were related indirectly to the data itself but rather to the model derived. For example, participants discussed measures of model performance (such as systematic error rates across subgroups) that could result from biases in the data. However, respondents anticipated many challenges, such as knowing up front for which subpopulations poor performance should be specifically tested, a lack of established methods for evaluating performance and comparability of variables across different datasets.“I am not sure this could be distilled into a simple set of metrics but rather see a minimum requirement of descriptive information re: a dataset as an option that is proportion of different ethnicities, age groups, gender.”“I don’t think datasets should be ‘rated’ on demographic diversity personally. To me, this borders on saying some are ‘better’ than others but it is context-specific and depends on the planned setting of deployment.”
information re: a dataset as an option that is proportion of different ethnicities, age groups, gender.”“I don’t think datasets should be ‘rated’ on demographic diversity personally. To me, this borders on saying some are ‘better’ than others but it is context-specific and depends on the planned setting of deployment.” “I am not sure this could be distilled into a simple set of metrics but rather see a minimum requirement of descriptive information re: a dataset as an option that is proportion of different ethnicities, age groups, gender.” “I don’t think datasets should be ‘rated’ on demographic diversity personally. To me, this borders on saying some are ‘better’ than others but it is context-specific and depends on the planned setting of deployment.”
“I am not sure this could be distilled into a simple set of metrics but rather see a minimum requirement of descriptive information re: a dataset as an option that is proportion of different ethnicities, age groups, gender.” “I don’t think datasets should be ‘rated’ on demographic diversity personally. To me, this borders on saying some are ‘better’ than others but it is context-specific and depends on the planned setting of deployment.” The fourth theme was ‘standards’. There was consensus about the importance of standards to enable risks and harms to be identified, improve the quality of the datasets, address bias and provide accountability. However, respondents cautioned against imposing strict diversity requirements, as doing so might risk products being withdrawn for already-marginalized populations, as well as unethical data-gathering practices28. Recommendations for adoption included making the standards part of the procurement, funding or product-approval process; making them part of the publishing pathway; or implementing an accreditation pathway for organizations to demonstrate compliance. However, there was no consensus about which organizations should be responsible for mandating compliance.“It’s great that the issue has been recognised, but more needs to be done to change things, for example by having a standard for researchers to consider and apply when designing their studies or curating new datasets, and developing methods to ensure people are held accountable.”
be responsible for mandating compliance.“It’s great that the issue has been recognised, but more needs to be done to change things, for example by having a standard for researchers to consider and apply when designing their studies or curating new datasets, and developing methods to ensure people are held accountable.” “It’s great that the issue has been recognised, but more needs to be done to change things, for example by having a standard for researchers to consider and apply when designing their studies or curating new datasets, and developing methods to ensure people are held accountable.”
be responsible for mandating compliance.“It’s great that the issue has been recognised, but more needs to be done to change things, for example by having a standard for researchers to consider and apply when designing their studies or curating new datasets, and developing methods to ensure people are held accountable.” “It’s great that the issue has been recognised, but more needs to be done to change things, for example by having a standard for researchers to consider and apply when designing their studies or curating new datasets, and developing methods to ensure people are held accountable.” Regarding whether such standards should be mandated, some respondents suggested that mandating the standards through regulators, journal editors or commissioners was an acceptable approach, provided diversity of the populations in the dataset was the requirement. If diversity across populations related to performance of the model was to be mandated for compliance, this approach risked setting a level that could not be achieved by creating a problem with enforcement and increasing costs. The consequences might include slowing or even stifling research and innovation because of data poverty issues. Some felt that a softer approach was needed, such as defining best practice with a set of data diversity standards and encouraging adoption and self-reporting. Whichever enforcement mechanism is implemented, practical tools can be developed to improve uptake, such as providing guidance on how to adhere to the standards, making validation datasets available, engagement with stakeholders, providing incentives and removing barriers. The greatest barriers to adherence identified were resources such as time and cost.“Funders at the proposal stage, journals at the publication stage and in parallel the regulatory system.”“If you propose a body for checking compliance more formally, you will need to fund them as well. However, if the stakeholders who are required to comply with the standards feel there is just one more thing being added to their increasing requirements, you may risk facing a rebellion unless this work is adequately compensated. I suggest soft standards rather than formal checks.”
you will need to fund them as well. However, if the stakeholders who are required to comply with the standards feel there is just one more thing being added to their increasing requirements, you may risk facing a rebellion unless this work is adequately compensated. I suggest soft standards rather than formal checks.” “Funders at the proposal stage, journals at the publication stage and in parallel the regulatory system.” “If you propose a body for checking compliance more formally, you will need to fund them as well. However, if the stakeholders who are required to comply with the standards feel there is just one more thing being added to their increasing requirements, you may risk facing a rebellion unless this work is adequately compensated. I suggest soft standards rather than formal checks.”
Data were extracted from all 30 records to derive key themes (Extended Data Table 1). Transparency around data-collection practices was a major theme (n = 13), with particular focus on the need for clarity about how data were sampled and for what purpose, how demographic categories were assigned and details of any preprocessing of data. Some articles highlighted the importance of reporting existing health inequalities affecting those included in a dataset, allowing data users to take steps to avoid exacerbating these23,24. The motivation for collecting the data and whether informed consent was obtained from participants was discussed by 11 articles. Furthermore, as legal requirements vary across jurisdictions, there was little consensus about whether or how consent should be obtained from subjects in datasets. Finally, two records discussed how data quality can be improved by involving clinical experts when developing the data-collection strategy5,25.
Whereas the systematic review provided an oversight of current best-practice principles for health datasets in AI, the stakeholder survey provided insights into how principles could be operationalized and by whom. Twenty participants completed the scoping survey. Of these participants, ten (50%) reported their sex as female, nine (45%) reported their sex as male, and one (5%) did not provide this information. Eighteen participants (90%) reported that their gender identity was the same as the sex registered at birth, one participant (5%) reported that their gender identity was different than their sex registered at birth and one participant (5%) did not provide this information. Four main themes and 17 subthemes were identified (Table 2).Table 2Themes arising from the stakeholder surveyThemesSubthemesTheme 1—The role of demographic data1.1 Current use of demographic data1.2 Representativeness of the data1.3 Determining importance of race/ethnicity data1.4 Challenges of ensuring diversity in datasets1.5 Solutions and aspirations to overcome challengesTheme 2—Data diversity2.1 Conceptual definition2.2 Components of diversity2.3 Operationalizing diversityTheme 3—The use of metrics3.1 Externally validated or self-report3.2 Types of measures3.3 Rating diversityTheme 4—Standards4.1 Are standards important and needed?4.2 Existing standards4.3 Recommendations for adoption of standards4.4 Recommendations and barriers to adherence4.5 Responsibility of adherence4.6 Consequences of mandating standards Themes arising from the stakeholder survey
Whereas the systematic review provided an oversight of current best-practice principles for health datasets in AI, the stakeholder survey provided insights into how principles could be operationalized and by whom. Twenty participants completed the scoping survey. Of these participants, ten (50%) reported their sex as female, nine (45%) reported their sex as male, and one (5%) did not provide this information. Eighteen participants (90%) reported that their gender identity was the same as the sex registered at birth, one participant (5%) reported that their gender identity was different than their sex registered at birth and one participant (5%) did not provide this information. Four main themes and 17 subthemes were identified (Table 2).Table 2Themes arising from the stakeholder surveyThemesSubthemesTheme 1—The role of demographic data1.1 Current use of demographic data1.2 Representativeness of the data1.3 Determining importance of race/ethnicity data1.4 Challenges of ensuring diversity in datasets1.5 Solutions and aspirations to overcome challengesTheme 2—Data diversity2.1 Conceptual definition2.2 Components of diversity2.3 Operationalizing diversityTheme 3—The use of metrics3.1 Externally validated or self-report3.2 Types of measures3.3 Rating diversityTheme 4—Standards4.1 Are standards important and needed?4.2 Existing standards4.3 Recommendations for adoption of standards4.4 Recommendations and barriers to adherence4.5 Responsibility of adherence4.6 Consequences of mandating standards Themes arising from the stakeholder survey The first theme was ‘the role of demographic data’. Stakeholders used demographic data in several ways to assess the safety and efficacy of AIaMDs across different subgroup populations. Ensuring that representative data are used to train and validate AIaMDs for the population in which they are to be deployed was felt to be most important. Ways of demonstrating representativeness included describing the intended use and users of the AIaMD, identifying subgroups of interest up front and being transparent about poor performance. Race and ethnicity data were seen as an important means to explore known and unknown biases potentially leading to health inequalities.“Models must be able to ‘work’ for those belonging to racialised minority groups, and clinicians/researchers/developers must go through stringent governance measures to ensure inequalities, racism and other forms of discrimination are not exacerbated by use of medical AI models.”
entially leading to health inequalities.“Models must be able to ‘work’ for those belonging to racialised minority groups, and clinicians/researchers/developers must go through stringent governance measures to ensure inequalities, racism and other forms of discrimination are not exacerbated by use of medical AI models.” “Models must be able to ‘work’ for those belonging to racialised minority groups, and clinicians/researchers/developers must go through stringent governance measures to ensure inequalities, racism and other forms of discrimination are not exacerbated by use of medical AI models.”
entially leading to health inequalities.“Models must be able to ‘work’ for those belonging to racialised minority groups, and clinicians/researchers/developers must go through stringent governance measures to ensure inequalities, racism and other forms of discrimination are not exacerbated by use of medical AI models.” “Models must be able to ‘work’ for those belonging to racialised minority groups, and clinicians/researchers/developers must go through stringent governance measures to ensure inequalities, racism and other forms of discrimination are not exacerbated by use of medical AI models.” Challenges of ensuring diversity in datasets included issues of lack of health data in certain populations (health data poverty), lack of standardization across attribute categories, difficulty in harmonizing several methods of data capture and data-governance restrictions. Other factors relating to the development pathway included poorly defined use cases, a lack of relevant stakeholder input, difficulty in accessing suitable datasets and existing gaps in current evidence about underserved populations at risk of harm linked to health outcomes. Stakeholders used various solutions to derive missing demographic data, including statistical techniques such as imputation, Bayesian geocoding and linking across several datasets. Stakeholders suggested they would like to see policy changes, standards of best practice with a statement of scope up front describing data diversity and regulatory authorities providing clarity about specific requirements according to the intended use, as well as toolkits to tackle health data poverty issues.“Assist various bodies in addressing data poverty (I see this as the core problem that needs solving).”“Using linked data records and collating ethnicity information from different parts of the healthcare record can improve ethnicity data capture.”“Multiple imputations can be helpful for some variables but is sub-optimal. There are weak surrogates/indirect methods of increasing populating ethnicity variables.”
This study used two methods: a systematic review and a scoping survey of expert stakeholders. The systematic review identified a range of recommendations related to data diversity, with the scoping survey supplementing it by exploring practical considerations relating to their implementation. To our knowledge, this is the first systematic synthesis of existing standards, frameworks, guidelines and best practices for the use of health datasets in the context of AIaMD. We found a clear consensus that there is a need for diversity in datasets and that issues of algorithmic bias may prevail where this is absent. The increasing resources directed towards creating large-scale health datasets has not been accompanied by equivalent efforts to ensure that they are adequately representative and diverse. Even when guidance recognizes the importance of data diversity, it is high-level and not easily operationalized.
re this is absent. The increasing resources directed towards creating large-scale health datasets has not been accompanied by equivalent efforts to ensure that they are adequately representative and diverse. Even when guidance recognizes the importance of data diversity, it is high-level and not easily operationalized. The systematic review provided insight into existing guidelines for data collection, handling missing data and labeling data. A key theme found through the systematic review was a need for transparency in how datasets are prepared, including who is included or excluded from the dataset, how missing data are handled and how data are labeled. Greater transparency in these areas allows better understanding of the context and limitations of a dataset, which in turn provides a guide to the potential limitations of any inferences or innovations derived from that dataset. For example, if a dataset excludes certain groups, this information should be evident to potential users of the dataset and should be reported alongside insights derived from that dataset to provide context and the likely scope of application.
tial limitations of any inferences or innovations derived from that dataset. For example, if a dataset excludes certain groups, this information should be evident to potential users of the dataset and should be reported alongside insights derived from that dataset to provide context and the likely scope of application. With regard to personal attributes, among all included articles, sex, age, race and ethnicity were most commonly cited as attributes that could associate a group with risk of harm or disadvantage; however, there was a notable lack of literature addressing how these concerns could be addressed for these groups. Less commonly, records cited pregnancy, income, marital status and a range of other attributes that may require further consideration. There was notable overlap between attributes identified as often lacking diversity and those that are specifically protected by various jurisdictions, such as in the UK Equality Act 2010, the European Union Charter of Fundamental Rights and the United Nations29–31. The importance of collecting demographic attribute information is not necessarily for it to be embedded as an input variable or predictor in ML models (which may perpetuate harm along these axes) but importantly, to ensure that it is collected for auditing model performance across disaggregated subgroups as a method of bias discovery. In our literature review, ‘Healthsheet’ and ‘Datasheets for Datasets’ were the most comprehensive guides to data documentation, with elements in both relating to diversity32,33.
) but importantly, to ensure that it is collected for auditing model performance across disaggregated subgroups as a method of bias discovery. In our literature review, ‘Healthsheet’ and ‘Datasheets for Datasets’ were the most comprehensive guides to data documentation, with elements in both relating to diversity32,33. There was a high degree of concordance between the recommendations gathered from the systematic review and those obtained through our survey. However, the scoping survey highlighted the potential difficulties and lack of pragmatic guidance about how such guidelines can be implemented practically as well as who should be responsible for overseeing their implementation. This indicates that the issues of interest are well-recognized and conceptually understood; however, there is a clear need to focus on operationalizing existing knowledge. The scoping survey additionally identified barriers and potential enablers to creating standards and best practices that are ready for translation into the real world. It highlighted that although standards would be beneficial, there must be some means of implementing them, which could include embedding them into a product-approval process or making them prerequisites for eligibility for funding, health technology appraisal or procurement. Standards could also be implemented by externally provided accreditation, which may be seen favorably by funders, regulatory bodies and research boards. Another option includes voluntary self-reporting of standards with minimal external oversight.
for eligibility for funding, health technology appraisal or procurement. Standards could also be implemented by externally provided accreditation, which may be seen favorably by funders, regulatory bodies and research boards. Another option includes voluntary self-reporting of standards with minimal external oversight. It was also highlighted that it is not clear which agencies could mandate or suggest completion of such standards. Journal editors, regulators, health policy organizations and funders were among the suggested agencies. In health research, the Enhancing the Quality and Transparency of Health Research (EQUATOR) network is well-recognized as a resource for gold-standard guidance for study reporting, endorsing widely used guidelines including Consolidated Standards of Reporting Trials 2010 guidelines for randomized controlled trials. Although the network focuses on reporting and transparency of research studies, a similar structure may be considered for datasets. The role of regulatory agencies in mandating the data diversity used in AIaMD is unclear, as their remit is at present limited to ensuring that datasets reflect the intended use population. Although respondents acknowledged the lack of evidence on this topic, a counterargument to heavy-handed enforcement was also raised from the stakeholder survey, with the perception that enforcing higher expectations for datasets may stifle innovation, impede health improvement and possibly exacerbate inequalities as a result. Overall, the findings of the survey indicated strong support for the development of standards but ambiguity as to their implementation.
akeholder survey, with the perception that enforcing higher expectations for datasets may stifle innovation, impede health improvement and possibly exacerbate inequalities as a result. Overall, the findings of the survey indicated strong support for the development of standards but ambiguity as to their implementation. This study involved a comprehensive review of academic literature reporting recommendations for dataset use, but it did not extend to reviewing gray literature, including governmental reports. However, we have noted that the issue of algorithmic bias has been formally recognized by the UK Medicines and Healthcare products Regulatory Agency, UK government, the UK Information Commissioner’s Office, the US FDA and the European Parliament34–39. “Ensuring inclusiveness and equity” is one of six principles for AI development prescribed by the World Health Organization, referring to “age, sex, gender, income, race, ethnicity, sexual orientation, ability or other characteristics protected under human rights codes”13. The same WHO report states, “No technology, AI or otherwise, should sustain or worsen existing forms of bias and discrimination”13.
ribed by the World Health Organization, referring to “age, sex, gender, income, race, ethnicity, sexual orientation, ability or other characteristics protected under human rights codes”13. The same WHO report states, “No technology, AI or otherwise, should sustain or worsen existing forms of bias and discrimination”13. The search was limited to records published since 2015. This is a limitation to the study, but it is noted that the results exhibited an exponential increase in the number of relevant records published since 2018. Only one relevant record was published in 2015. It would be expected that any notable pre-2015 papers should have been revealed in reference-list screening, but none were. Accordingly, setting an earlier start date for screening would have been unlikely to identify any more relevant articles and would have substantially increased the number of papers it was necessary to screen. Data collection was performed by two independent reviewers for 11 of the records. Data for the remaining articles were extracted by a single reviewer, who extracted data from all papers included in the analysis for consistency of voice in the extracted summaries. Although data collection by a single author is a limitation, use of a standardized data-extraction sheet partially mitigates it.
records. Data for the remaining articles were extracted by a single reviewer, who extracted data from all papers included in the analysis for consistency of voice in the extracted summaries. Although data collection by a single author is a limitation, use of a standardized data-extraction sheet partially mitigates it. This study has focused on factors that encourage dataset diversification as a lever to address health data poverty. Underrepresentation of minority groups in datasets is well-recognized as an important driver of algorithmic bias, but other mechanisms can be applied downstream to mitigate its effects. This includes practicing model diversity: for example, producing several models and combining the outputs with ensemble learning to diversify the parameters considered by the model and reduce the risk of overfitting to an unbalanced data sample. Synthetic data have also been recently proposed as a method of selectively generating data for marginalized populations to rebalance datasets; however, this approach has limitations in that it effectively oversamples from a small group without truly gaining diversity40–42. It should be emphasized that these (and other) methods are an active area of research with a need for empirical evidence to prove their applicability. Similarly, this review does not extend into other ethical implications of the use of ML in healthcare, although this topic has been extensively studied previously and includes other issues of privacy, trust and accountability43.
an active area of research with a need for empirical evidence to prove their applicability. Similarly, this review does not extend into other ethical implications of the use of ML in healthcare, although this topic has been extensively studied previously and includes other issues of privacy, trust and accountability43. We took a broad approach to the recruitment of different types of stakeholders in the scoping survey and therefore did not apply a formal sampling framework. As a result, we cannot be sure the voices represented are consistent across all stakeholders. This survey was intended as a scoping exercise and not meant to be an exhaustive qualitative study. Future planned work as part of the STANDING Together initiative involves input from a wider gamut of stakeholders including patient and public partners (two of whom are coauthors of this study).
across all stakeholders. This survey was intended as a scoping exercise and not meant to be an exhaustive qualitative study. Future planned work as part of the STANDING Together initiative involves input from a wider gamut of stakeholders including patient and public partners (two of whom are coauthors of this study). The issue of algorithmic bias is well described in medical literature, with dataset insufficiency a key driver. Although principles such as Findable, Accessible, Interoperable, and Reusable (FAIR) seek to improve data availability and use more generally, there is a paucity of data for certain groups and lack of diversity in existing datasets44. This Analysis has highlighted the importance of curating and aggregating health data to promote diversity, inclusivity and equity as well as the lack of guidelines available to facilitate doing so. Although reporting guidelines exist for randomized controlled trials using AI, they focus on the reporting of study results rather than the design and use of datasets45. Future avenues of research may seek to produce clear guidelines for the development and use of datasets, revolving around the need for diversity and inclusion of marginalized populations and improving data interoperability by means of common data models and standards such as Observational Medical Outcomes Partnership (OMOP) and Fast Healthcare Interoperability Resources (FHIR). Specific guidance about new development practices (including use of synthetic data, federated learning and foundation models) is also needed.
ata interoperability by means of common data models and standards such as Observational Medical Outcomes Partnership (OMOP) and Fast Healthcare Interoperability Resources (FHIR). Specific guidance about new development practices (including use of synthetic data, federated learning and foundation models) is also needed. Transparent documentation around diversity and appropriateness of datasets used in AIaMD development will help commissioners, clinicians and health systems determine the risk of bias so they can make informed decisions around whether to deploy corresponding algorithms for their population. The STANDING Together project (https://www.datadiversity.org/) is one such endeavor. Building on the outputs of this systematic review and stakeholder survey, STANDING Together is developing consensus-derived standards coauthored by an international, interdisciplinary team that reinforce ethics and inclusivity in the documentation and use of healthcare datasets, allowing developers to ensure that AIaMD works for everyone18. The findings described in this literature review and stakeholder survey will directly inform the proposed items for STANDING Together, and the methods describing their translation into specific, actionable recommendations will be outlined in a subsequent paper. The recommendations will undergo a multistakeholder, three-staged Delphi study, and the resulting standards will be available in late 2023.
This research was conducted in compliance with all relevant ethical regulations, including informed consent from all participants. Ethical approval was granted by the University of Birmingham’s Science, Technology, Engineering and Mathematics Ethical Review Committee (ERN_21-1831).
This research was conducted in compliance with all relevant ethical regulations, including informed consent from all participants. Ethical approval was granted by the University of Birmingham’s Science, Technology, Engineering and Mathematics Ethical Review Committee (ERN_21-1831). We searched for records describing existing standards, frameworks and best practices for ensuring data diversity in health datasets in the context of AIaMD. An informatician was consulted for the development of the search strategy (Systematic Review Search Strategy), and the searches were conducted on 10 October 2021 on MEDLINE (Ovid), Embase (Ovid) and Web of Science. Results were limited to publications since 2015 in the English language. E-publications ahead of print, in-process publications, in-data-review and other non-indexed citations were included through the MEDLINE search. Deduplication was carried out in EndNote 20 (Clarivate, 2013) and screening in Covidence (https://www.covidence.org/; Veritas Health Innovation, 2022). It was recognized that relevant results may exist as preprints that may not be covered by traditional systematic review searches. As preprint databases are not typically covered by searchable databases, nor are their websites conducive to systematic searches, the burden of the workload to replicate our strategy on the medRxiv and arXiv engines would have been unmanageable. MedRxiv is indexed by Embase, but arXiv is not indexed by any of the databases used. To mitigate this limitation, we conducted reference-list screening for all 100 articles included for full-text screening. Reference lists of included records were searched to identify relevant preprints (including arXiv and medRxiv) or other potentially relevant records. Further searches were conducted of the past five years’ worth of archives of relevant conference proceedings for ML and AI in healthcare: ‘Machine Learning for Health’ (https://ml4health.github.io/2022/), ‘Machine Learning for Healthcare’ (https://www.mlforhc.org) and ‘Conference on Health, Inference, and Learning’ (https://www.chilconference.org/). This scoping review was conducted following the recommendations of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement, and a PRISMA flow chart was also created (Fig. 2)64. Abstract screening was performed by two authors independently (A.A., J.A. or X.L.).
www.chilconference.org/). This scoping review was conducted following the recommendations of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement, and a PRISMA flow chart was also created (Fig. 2)64. Abstract screening was performed by two authors independently (A.A., J.A. or X.L.). Non-consensus was resolved by discussion and involvement of a third reviewer if necessary. Before full-text screening, 10% of the records were full-text screened independently by two reviewers as part of a pilot. Once a high degree of concordance was reached between two independent reviewers, the remaining records were assessed by a single reviewer. Studies were included if they presented standards, frameworks or guidance for AI or health data about issues intersecting AI and bias, fairness, health equity and representation, and coding/categorization of minoritized and marginalized groups. Exclusion criteria were guidance not related to health data, guidance relating to technical and infrastructural aspects of health dataset curation only, guidance relating to privacy and governance and cybersecurity only.
health equity and representation, and coding/categorization of minoritized and marginalized groups. Exclusion criteria were guidance not related to health data, guidance relating to technical and infrastructural aspects of health dataset curation only, guidance relating to privacy and governance and cybersecurity only. For each record, a single reviewer extracted data using a predefined data-extraction sheet. Bibliometric information about each record was extracted, including publication date, number of citations and details of the authorship team (in terms of single/several institutions, geographies and disciplinary background). If specific vulnerable or minority patient groups were discussed in the record, they were also recorded separately. When it was reported, we extracted information about the methodology that led to the construction of the recommendations, including descriptions of any literature review and stakeholder involvement. We approached 45 participants representing individuals who work in health-data research and/or AIaMD, including dataset curators, academics, clinicians and medical-device regulators. Participants were identified as authors of relevant publications and through consultation with the STANDING Together working group on the basis of expertise and previous work in healthy inequity, medical datasets and AIaMD. Respondents were invited by email to participate in the survey using Qualtrics (Qualtrics XM, 2018; https://www.qualtrics.com).
authors of relevant publications and through consultation with the STANDING Together working group on the basis of expertise and previous work in healthy inequity, medical datasets and AIaMD. Respondents were invited by email to participate in the survey using Qualtrics (Qualtrics XM, 2018; https://www.qualtrics.com). The survey consisted of 14 free-text questions (Supplementary File 5) exploring how issues of bias and risk of health inequity are at present tackled for AIaMD and how best-practice recommendations could be operationalized. For data analysis, we took an inductive thematic analysis approach that was exploratory and descriptive in nature65. One author (J.P.) conducted the analysis and used several iterations of the responses to the survey questions to refine the initial list of codes and create a codebook using NVivo (NVivo release 1.0, March 2020, QSR International; https://lumivero.com/products/nvivo/). Two authors (E.L. and X.L.) reviewed the data extracts and relevant codes independently, and then all three of the above authors discussed and agreed on the final codebook. Data relating to the effect of articles included in our systematic review (including journal impact factor, citation count and altmetric data) were obtained but not included in the analysis because these data were not necessary to extract themes from the included articles. No statistical method was used to predetermine the sample size for the stakeholder survey. Blinding and randomization are not applicable to non-interventional studies.
t and altmetric data) were obtained but not included in the analysis because these data were not necessary to extract themes from the included articles. No statistical method was used to predetermine the sample size for the stakeholder survey. Blinding and randomization are not applicable to non-interventional studies. Ovid MEDLINE(R) and Epub Ahead of Print, In-Process, In-Data-Review & Other Non-Indexed Citations, Daily and Versions(R) <1946 to October 08, 2021>cultural diversity/ 12175(ethic* or divers* or fairness or fair or bias or biased or pluralism* or multicultural*).mp. [mp=title, abstract, original title, name of substance word, subject heading word, floating subheading word, keyword heading word, organism supplementary concept word, protocol supplementary concept word, rare disease supplementary concept word, unique identifier, synonyms] 11245211 or 2 1124521exp Artificial Intelligence/ 124778(Artificial intelligence or AI or Natural Language processing or NLP or Machine learning or Support Vector Machine* or neural network* or deep learning).mp. [mp=title, abstract, original title, name of substance word, subject heading word, floating sub-heading word, keyword heading word, organism supplementary concept word, protocol supplementary concept word, rare disease supplementary concept word, unique identifier, synonyms] 196657((health or patient or medical) adj2 (data* or record*)).mp. [mp=title, abstract, original title, name of substance word, subject heading word, floating sub-heading word, keyword heading word, organism supplementary concept word, protocol supplementary concept word, rare disease supplementary concept word, unique identifier, synonyms] 324576exp Medical Records/ 1528364 or 5 or 6 or 7 587250guideline/ 16447(standard* or guidance or guideline* or framework* or policy or policies or governance).mp. [mp=title, abstract, original title, name of substance word, subject heading word, floating sub-heading word, keyword heading word, organism supplementary concept word, protocol supplementary concept word, rare disease supplementary concept word, unique identifier, synonyms] 30638589 or 10 30638583 and 8 and 11 9897limit 12 to yr=“2015 -Current” 6089limit 13 to lg=“english” 5995
subject heading word, floating sub-heading word, keyword heading word, organism supplementary concept word, protocol supplementary concept word, rare disease supplementary concept word, unique identifier, synonyms] 30638589 or 10 30638583 and 8 and 11 9897limit 12 to yr=“2015 -Current” 6089limit 13 to lg=“english” 5995 cultural diversity/ 12175 (ethic* or divers* or fairness or fair or bias or biased or pluralism* or multicultural*).mp. [mp=title, abstract, original title, name of substance word, subject heading word, floating subheading word, keyword heading word, organism supplementary concept word, protocol supplementary concept word, rare disease supplementary concept word, unique identifier, synonyms] 1124521 1 or 2 1124521 exp Artificial Intelligence/ 124778 (Artificial intelligence or AI or Natural Language processing or NLP or Machine learning or Support Vector Machine* or neural network* or deep learning).mp. [mp=title, abstract, original title, name of substance word, subject heading word, floating sub-heading word, keyword heading word, organism supplementary concept word, protocol supplementary concept word, rare disease supplementary concept word, unique identifier, synonyms] 196657 ((health or patient or medical) adj2 (data* or record*)).mp. [mp=title, abstract, original title, name of substance word, subject heading word, floating sub-heading word, keyword heading word, organism supplementary concept word, protocol supplementary concept word, rare disease supplementary concept word, unique identifier, synonyms] 324576 exp Medical Records/ 152836 4 or 5 or 6 or 7 587250 guideline/ 16447
((health or patient or medical) adj2 (data* or record*)).mp. [mp=title, abstract, original title, name of substance word, subject heading word, floating sub-heading word, keyword heading word, organism supplementary concept word, protocol supplementary concept word, rare disease supplementary concept word, unique identifier, synonyms] 324576 exp Medical Records/ 152836 4 or 5 or 6 or 7 587250 guideline/ 16447 (standard* or guidance or guideline* or framework* or policy or policies or governance).mp. [mp=title, abstract, original title, name of substance word, subject heading word, floating sub-heading word, keyword heading word, organism supplementary concept word, protocol supplementary concept word, rare disease supplementary concept word, unique identifier, synonyms] 3063858 9 or 10 3063858 3 and 8 and 11 9897 limit 12 to yr=“2015 -Current” 6089 limit 13 to lg=“english” 5995 https://ovidsp.ovid.com/athens/ovidweb.cgi?T=JS&NEWS=N&PAGE=main&SHAREDSEARCHID=5AewcQNdzUJq8nhWGWvzW9upsUXpJwx4ATn6wMF7oYIff436XcLpXSvsrKDfq9BH1
(standard* or guidance or guideline* or framework* or policy or policies or governance).mp. [mp=title, abstract, original title, name of substance word, subject heading word, floating sub-heading word, keyword heading word, organism supplementary concept word, protocol supplementary concept word, rare disease supplementary concept word, unique identifier, synonyms] 3063858 9 or 10 3063858 3 and 8 and 11 9897 limit 12 to yr=“2015 -Current” 6089 limit 13 to lg=“english” 5995 https://ovidsp.ovid.com/athens/ovidweb.cgi?T=JS&NEWS=N&PAGE=main&SHAREDSEARCHID=5AewcQNdzUJq8nhWGWvzW9upsUXpJwx4ATn6wMF7oYIff436XcLpXSvsrKDfq9BH1 Embase <1996 to 2021 Week 40>cultural diversity/ 2415(ethic* or divers* or fairness or fair or bias or biased or pluralism* or multicultural*).mp. [mp=title, abstract, heading word, drug trade name, original title, device manufacturer, drug manufacturer, device trade name, keyword heading word, floating subheading word, candidate term word] 12350791 or 2 1235079exp Artificial Intelligence/ 51842(Artificial intelligence or AI or Natural Language processing or NLP or Machine Learning or Support Vector Machine* or neural network* or deep learning).mp. [mp=title, abstract, heading word, drug trade name, original title, device manufacturer, drug manufacturer, device trade name, keyword heading word, floating subheading word, candidate term word] 237951((health or patient or medical) adj2 (data* or record*)).mp. [mp=title, abstract, heading word, drug trade name, original title, device manufacturer, drug manufacturer, device trade name, keyword heading word, floating subheading word, candidate term word] 582323exp medical record/ 2553214 or 5 or 6 or 7 822925(standard* or guidance or guideline* or framework* or policy or policies or governance).mp. [mp=title, abstract, heading word, drug trade name, original title, device manufacturer, drug manufacturer, device trade name, keyword heading word, floating subheading word, candidate term word] 35355733 and 8 and 9 13849limit 10 to yr=“2015 -Current” 8994limit 11 to lg=“english” 8925
cy or policies or governance).mp. [mp=title, abstract, heading word, drug trade name, original title, device manufacturer, drug manufacturer, device trade name, keyword heading word, floating subheading word, candidate term word] 35355733 and 8 and 9 13849limit 10 to yr=“2015 -Current” 8994limit 11 to lg=“english” 8925 cultural diversity/ 2415 (ethic* or divers* or fairness or fair or bias or biased or pluralism* or multicultural*).mp. [mp=title, abstract, heading word, drug trade name, original title, device manufacturer, drug manufacturer, device trade name, keyword heading word, floating subheading word, candidate term word] 1235079 1 or 2 1235079 exp Artificial Intelligence/ 51842 (Artificial intelligence or AI or Natural Language processing or NLP or Machine Learning or Support Vector Machine* or neural network* or deep learning).mp. [mp=title, abstract, heading word, drug trade name, original title, device manufacturer, drug manufacturer, device trade name, keyword heading word, floating subheading word, candidate term word] 237951 ((health or patient or medical) adj2 (data* or record*)).mp. [mp=title, abstract, heading word, drug trade name, original title, device manufacturer, drug manufacturer, device trade name, keyword heading word, floating subheading word, candidate term word] 582323 exp medical record/ 255321 4 or 5 or 6 or 7 822925
(Artificial intelligence or AI or Natural Language processing or NLP or Machine Learning or Support Vector Machine* or neural network* or deep learning).mp. [mp=title, abstract, heading word, drug trade name, original title, device manufacturer, drug manufacturer, device trade name, keyword heading word, floating subheading word, candidate term word] 237951 ((health or patient or medical) adj2 (data* or record*)).mp. [mp=title, abstract, heading word, drug trade name, original title, device manufacturer, drug manufacturer, device trade name, keyword heading word, floating subheading word, candidate term word] 582323 exp medical record/ 255321 4 or 5 or 6 or 7 822925 (standard* or guidance or guideline* or framework* or policy or policies or governance).mp. [mp=title, abstract, heading word, drug trade name, original title, device manufacturer, drug manufacturer, device trade name, keyword heading word, floating subheading word, candidate term word] 3535573 3 and 8 and 9 13849 limit 10 to yr=“2015 -Current” 8994 limit 11 to lg=“english” 8925 https://ovidsp.ovid.com/athens/ovidweb.cgi?T=JS&NEWS=N&PAGE=main&SHAREDSEARCHID=5R3d9uJan3qknMl4p9Tr8MZjsl3nZURHqnXj2n1bjSF8UFTCBrrkGOdGy8x5CCWQ0
(standard* or guidance or guideline* or framework* or policy or policies or governance).mp. [mp=title, abstract, heading word, drug trade name, original title, device manufacturer, drug manufacturer, device trade name, keyword heading word, floating subheading word, candidate term word] 3535573 3 and 8 and 9 13849 limit 10 to yr=“2015 -Current” 8994 limit 11 to lg=“english” 8925 https://ovidsp.ovid.com/athens/ovidweb.cgi?T=JS&NEWS=N&PAGE=main&SHAREDSEARCHID=5R3d9uJan3qknMl4p9Tr8MZjsl3nZURHqnXj2n1bjSF8UFTCBrrkGOdGy8x5CCWQ0 ((((((((TS=(“ethic*”)) OR TS=(“divers*”)) OR TS=(“fairness”)) OR TS=(“fair”)) OR TS=(“bias”)) OR TS=(“biased”)) OR TS=(“pluralism*”)) OR TS=(“multicultural*”)) AND ((((((((TS=(“Artificial Intelligence”)) OR TS=(“AI”)) OR TS=(“Natural Language processing”)) OR TS=(“NLP*”)) OR TS=(“Machine Learning”)) OR TS=(“Support Vector Machine*”)) OR TS=(“neural network*”)) OR TS=(“deep learning”)) AND TS=((“health” OR “patient” OR “medical”) NEAR/2 (“data*” OR “record*”)) AND (((((((TS=(“standard*”)) OR TS=(“guidance”)) OR TS=(“guideline*”)) OR TS=(“framework*”)) OR TS=(“policy”)) OR TS=(“policies”)) OR TS=(“governance”)) Refined to results from 2015 Refined to results in English language 307 results https://www.webofscience.com/wos/woscc/summary/35bc8104-7d84-4a19-a5a8-9fb366bea050-0c3ff05a/relevance/1 Further information on the research design is available in the Nature Portfolio Reporting Summary linked to this article.
We searched for records describing existing standards, frameworks and best practices for ensuring data diversity in health datasets in the context of AIaMD. An informatician was consulted for the development of the search strategy (Systematic Review Search Strategy), and the searches were conducted on 10 October 2021 on MEDLINE (Ovid), Embase (Ovid) and Web of Science. Results were limited to publications since 2015 in the English language. E-publications ahead of print, in-process publications, in-data-review and other non-indexed citations were included through the MEDLINE search. Deduplication was carried out in EndNote 20 (Clarivate, 2013) and screening in Covidence (https://www.covidence.org/; Veritas Health Innovation, 2022). It was recognized that relevant results may exist as preprints that may not be covered by traditional systematic review searches. As preprint databases are not typically covered by searchable databases, nor are their websites conducive to systematic searches, the burden of the workload to replicate our strategy on the medRxiv and arXiv engines would have been unmanageable. MedRxiv is indexed by Embase, but arXiv is not indexed by any of the databases used. To mitigate this limitation, we conducted reference-list screening for all 100 articles included for full-text screening. Reference lists of included records were searched to identify relevant preprints (including arXiv and medRxiv) or other potentially relevant records. Further searches were conducted of the past five years’ worth of archives of relevant conference proceedings for ML and AI in healthcare: ‘Machine Learning for Health’ (https://ml4health.github.io/2022/), ‘Machine Learning for Healthcare’ (https://www.mlforhc.org) and ‘Conference on Health, Inference, and Learning’ (https://www.chilconference.org/). This scoping review was conducted following the recommendations of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement, and a PRISMA flow chart was also created (Fig. 2)64. Abstract screening was performed by two authors independently (A.A., J.A. or X.L.).
health equity and representation, and coding/categorization of minoritized and marginalized groups. Exclusion criteria were guidance not related to health data, guidance relating to technical and infrastructural aspects of health dataset curation only, guidance relating to privacy and governance and cybersecurity only. For each record, a single reviewer extracted data using a predefined data-extraction sheet. Bibliometric information about each record was extracted, including publication date, number of citations and details of the authorship team (in terms of single/several institutions, geographies and disciplinary background). If specific vulnerable or minority patient groups were discussed in the record, they were also recorded separately. When it was reported, we extracted information about the methodology that led to the construction of the recommendations, including descriptions of any literature review and stakeholder involvement.
We approached 45 participants representing individuals who work in health-data research and/or AIaMD, including dataset curators, academics, clinicians and medical-device regulators. Participants were identified as authors of relevant publications and through consultation with the STANDING Together working group on the basis of expertise and previous work in healthy inequity, medical datasets and AIaMD. Respondents were invited by email to participate in the survey using Qualtrics (Qualtrics XM, 2018; https://www.qualtrics.com). The survey consisted of 14 free-text questions (Supplementary File 5) exploring how issues of bias and risk of health inequity are at present tackled for AIaMD and how best-practice recommendations could be operationalized. For data analysis, we took an inductive thematic analysis approach that was exploratory and descriptive in nature65. One author (J.P.) conducted the analysis and used several iterations of the responses to the survey questions to refine the initial list of codes and create a codebook using NVivo (NVivo release 1.0, March 2020, QSR International; https://lumivero.com/products/nvivo/). Two authors (E.L. and X.L.) reviewed the data extracts and relevant codes independently, and then all three of the above authors discussed and agreed on the final codebook.
Data relating to the effect of articles included in our systematic review (including journal impact factor, citation count and altmetric data) were obtained but not included in the analysis because these data were not necessary to extract themes from the included articles. No statistical method was used to predetermine the sample size for the stakeholder survey. Blinding and randomization are not applicable to non-interventional studies.
Ovid MEDLINE(R) and Epub Ahead of Print, In-Process, In-Data-Review & Other Non-Indexed Citations, Daily and Versions(R) <1946 to October 08, 2021>cultural diversity/ 12175(ethic* or divers* or fairness or fair or bias or biased or pluralism* or multicultural*).mp. [mp=title, abstract, original title, name of substance word, subject heading word, floating subheading word, keyword heading word, organism supplementary concept word, protocol supplementary concept word, rare disease supplementary concept word, unique identifier, synonyms] 11245211 or 2 1124521exp Artificial Intelligence/ 124778(Artificial intelligence or AI or Natural Language processing or NLP or Machine learning or Support Vector Machine* or neural network* or deep learning).mp. [mp=title, abstract, original title, name of substance word, subject heading word, floating sub-heading word, keyword heading word, organism supplementary concept word, protocol supplementary concept word, rare disease supplementary concept word, unique identifier, synonyms] 196657((health or patient or medical) adj2 (data* or record*)).mp. [mp=title, abstract, original title, name of substance word, subject heading word, floating sub-heading word, keyword heading word, organism supplementary concept word, protocol supplementary concept word, rare disease supplementary concept word, unique identifier, synonyms] 324576exp Medical Records/ 1528364 or 5 or 6 or 7 587250guideline/ 16447(standard* or guidance or guideline* or framework* or policy or policies or governance).mp. [mp=title, abstract, original title, name of substance word, subject heading word, floating sub-heading word, keyword heading word, organism supplementary concept word, protocol supplementary concept word, rare disease supplementary concept word, unique identifier, synonyms] 30638589 or 10 30638583 and 8 and 11 9897limit 12 to yr=“2015 -Current” 6089limit 13 to lg=“english” 5995
(standard* or guidance or guideline* or framework* or policy or policies or governance).mp. [mp=title, abstract, heading word, drug trade name, original title, device manufacturer, drug manufacturer, device trade name, keyword heading word, floating subheading word, candidate term word] 3535573 3 and 8 and 9 13849 limit 10 to yr=“2015 -Current” 8994 limit 11 to lg=“english” 8925 https://ovidsp.ovid.com/athens/ovidweb.cgi?T=JS&NEWS=N&PAGE=main&SHAREDSEARCHID=5R3d9uJan3qknMl4p9Tr8MZjsl3nZURHqnXj2n1bjSF8UFTCBrrkGOdGy8x5CCWQ0 ((((((((TS=(“ethic*”)) OR TS=(“divers*”)) OR TS=(“fairness”)) OR TS=(“fair”)) OR TS=(“bias”)) OR TS=(“biased”)) OR TS=(“pluralism*”)) OR TS=(“multicultural*”)) AND ((((((((TS=(“Artificial Intelligence”)) OR TS=(“AI”)) OR TS=(“Natural Language processing”)) OR TS=(“NLP*”)) OR TS=(“Machine Learning”)) OR TS=(“Support Vector Machine*”)) OR TS=(“neural network*”)) OR TS=(“deep learning”)) AND TS=((“health” OR “patient” OR “medical”) NEAR/2 (“data*” OR “record*”)) AND (((((((TS=(“standard*”)) OR TS=(“guidance”)) OR TS=(“guideline*”)) OR TS=(“framework*”)) OR TS=(“policy”)) OR TS=(“policies”)) OR TS=(“governance”)) Refined to results from 2015 Refined to results in English language 307 results https://www.webofscience.com/wos/woscc/summary/35bc8104-7d84-4a19-a5a8-9fb366bea050-0c3ff05a/relevance/1
Embase <1996 to 2021 Week 40>cultural diversity/ 2415(ethic* or divers* or fairness or fair or bias or biased or pluralism* or multicultural*).mp. [mp=title, abstract, heading word, drug trade name, original title, device manufacturer, drug manufacturer, device trade name, keyword heading word, floating subheading word, candidate term word] 12350791 or 2 1235079exp Artificial Intelligence/ 51842(Artificial intelligence or AI or Natural Language processing or NLP or Machine Learning or Support Vector Machine* or neural network* or deep learning).mp. [mp=title, abstract, heading word, drug trade name, original title, device manufacturer, drug manufacturer, device trade name, keyword heading word, floating subheading word, candidate term word] 237951((health or patient or medical) adj2 (data* or record*)).mp. [mp=title, abstract, heading word, drug trade name, original title, device manufacturer, drug manufacturer, device trade name, keyword heading word, floating subheading word, candidate term word] 582323exp medical record/ 2553214 or 5 or 6 or 7 822925(standard* or guidance or guideline* or framework* or policy or policies or governance).mp. [mp=title, abstract, heading word, drug trade name, original title, device manufacturer, drug manufacturer, device trade name, keyword heading word, floating subheading word, candidate term word] 35355733 and 8 and 9 13849limit 10 to yr=“2015 -Current” 8994limit 11 to lg=“english” 8925 cultural diversity/ 2415
Embase <1996 to 2021 Week 40>cultural diversity/ 2415(ethic* or divers* or fairness or fair or bias or biased or pluralism* or multicultural*).mp. [mp=title, abstract, heading word, drug trade name, original title, device manufacturer, drug manufacturer, device trade name, keyword heading word, floating subheading word, candidate term word] 12350791 or 2 1235079exp Artificial Intelligence/ 51842(Artificial intelligence or AI or Natural Language processing or NLP or Machine Learning or Support Vector Machine* or neural network* or deep learning).mp. [mp=title, abstract, heading word, drug trade name, original title, device manufacturer, drug manufacturer, device trade name, keyword heading word, floating subheading word, candidate term word] 237951((health or patient or medical) adj2 (data* or record*)).mp. [mp=title, abstract, heading word, drug trade name, original title, device manufacturer, drug manufacturer, device trade name, keyword heading word, floating subheading word, candidate term word] 582323exp medical record/ 2553214 or 5 or 6 or 7 822925(standard* or guidance or guideline* or framework* or policy or policies or governance).mp. [mp=title, abstract, heading word, drug trade name, original title, device manufacturer, drug manufacturer, device trade name, keyword heading word, floating subheading word, candidate term word] 35355733 and 8 and 9 13849limit 10 to yr=“2015 -Current” 8994limit 11 to lg=“english” 8925 cultural diversity/ 2415 (ethic* or divers* or fairness or fair or bias or biased or pluralism* or multicultural*).mp. [mp=title, abstract, heading word, drug trade name, original title, device manufacturer, drug manufacturer, device trade name, keyword heading word, floating subheading word, candidate term word] 1235079
cultural diversity/ 2415 (ethic* or divers* or fairness or fair or bias or biased or pluralism* or multicultural*).mp. [mp=title, abstract, heading word, drug trade name, original title, device manufacturer, drug manufacturer, device trade name, keyword heading word, floating subheading word, candidate term word] 1235079 1 or 2 1235079 exp Artificial Intelligence/ 51842 (Artificial intelligence or AI or Natural Language processing or NLP or Machine Learning or Support Vector Machine* or neural network* or deep learning).mp. [mp=title, abstract, heading word, drug trade name, original title, device manufacturer, drug manufacturer, device trade name, keyword heading word, floating subheading word, candidate term word] 237951 ((health or patient or medical) adj2 (data* or record*)).mp. [mp=title, abstract, heading word, drug trade name, original title, device manufacturer, drug manufacturer, device trade name, keyword heading word, floating subheading word, candidate term word] 582323 exp medical record/ 255321 4 or 5 or 6 or 7 822925 (standard* or guidance or guideline* or framework* or policy or policies or governance).mp. [mp=title, abstract, heading word, drug trade name, original title, device manufacturer, drug manufacturer, device trade name, keyword heading word, floating subheading word, candidate term word] 3535573 3 and 8 and 9 13849 limit 10 to yr=“2015 -Current” 8994 limit 11 to lg=“english” 8925 https://ovidsp.ovid.com/athens/ovidweb.cgi?T=JS&NEWS=N&PAGE=main&SHAREDSEARCHID=5R3d9uJan3qknMl4p9Tr8MZjsl3nZURHqnXj2n1bjSF8UFTCBrrkGOdGy8x5CCWQ0
((((((((TS=(“ethic*”)) OR TS=(“divers*”)) OR TS=(“fairness”)) OR TS=(“fair”)) OR TS=(“bias”)) OR TS=(“biased”)) OR TS=(“pluralism*”)) OR TS=(“multicultural*”)) AND ((((((((TS=(“Artificial Intelligence”)) OR TS=(“AI”)) OR TS=(“Natural Language processing”)) OR TS=(“NLP*”)) OR TS=(“Machine Learning”)) OR TS=(“Support Vector Machine*”)) OR TS=(“neural network*”)) OR TS=(“deep learning”)) AND TS=((“health” OR “patient” OR “medical”) NEAR/2 (“data*” OR “record*”)) AND (((((((TS=(“standard*”)) OR TS=(“guidance”)) OR TS=(“guideline*”)) OR TS=(“framework*”)) OR TS=(“policy”)) OR TS=(“policies”)) OR TS=(“governance”)) Refined to results from 2015 Refined to results in English language 307 results https://www.webofscience.com/wos/woscc/summary/35bc8104-7d84-4a19-a5a8-9fb366bea050-0c3ff05a/relevance/1
Any methods, additional references, Nature Portfolio reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at 10.1038/s41591-023-02608-w.