Global genomic surveillance of monkeypox virus. Monkeypox virus (MPXV) is endemic in western and Central Africa, and in May 2022, a clade IIb lineage (B.1) caused a global outbreak outside Africa, resulting in its detection in 116 countries and territories. To understand the global phylogenetics of MPXV, we analyzed all available MPXV sequences, including 10,670 sequences from 65 countries collected between 1958 and 2024. Our analysis reveals high mobility of clade I viruses within Central Africa, sustained human-to-human transmission of clade IIb lineage A viruses within the Eastern Mediterranean region and distinct mutational signatures that can distinguish sustained human-to-human from animal-to-animal transmission. Moreover, distinct clade I sequences from Sudan suggest local MPXV circulation in areas of eastern Africa over the past four decades. Our study underscores the importance of genomic surveillance in tracking spatiotemporal dynamics of MPXV clades and the need to strengthen such surveillance, including in some parts of eastern Africa.
Mpox, formerly known as monkeypox, is a disease that is caused by the monkeypox virus (MPXV). MPXV is a member of the Orthopoxvirus genus, which also includes the variola virus, the causative agent of smallpox (ref. 1). In humans, mpox can be associated with a range of clinical symptoms, but classically presents with a short febrile prodromal phase, which lasts 1–5 days, followed by the appearance of a skin and/or mucosal rash, which might include single or multiple lesions (refs. 2–4). The incubation period of mpox has historically ranged from 4 to 14 days (ref. 5). MPXV is divided genetically into two clades: clade I (formerly known as Congo Basin clade) and clade II (formerly known as West African clade); clade II is further classified into subclades IIa and IIb (ref. 6). Clade I and subclade IIa circulate endemically within as yet unknown animal reservoirs, potentially including rodents and nonhuman primates, and human cases are mostly the result of spillover from these reservoirs (refs. 1, 7, 8). Historical surveillance has not been sufficient to identify the frequency of spillover. In 2022, mpox epidemiology shifted with the emergence of a new lineage, clade IIb, that spread worldwide through human-to-human transmission, and based on the vast number of sequences from this outbreak, it was inferred that clade IIb has circulated continually within humans since at least 2016 (ref. 9). Mpox human-to-human transmission primarily occurs through direct contact with infected lesions or bodily fluids, which includes sexual contact, but transmission can also occur through contact with fomites (ref. 10).
In May 2022, a novel lineage of clade IIb, termed B.1, emerged and spread globally, establishing efficient local transmission within many countries with no previous history of mpox transmission. As of 12 August 2024, the multicountry outbreak has been associated with 99,176 cases and 208 fatalities from 116 countries, areas and territories, representing a case fatality ratio (CFR) of 0.21%. The outbreak is primarily driven by sexual transmission among males who self-identify as men who have sex with men, with 7% of cases requiring hospitalization. Other groups at higher risk of hospitalization include female cases, those younger than 5 years or older than 65 years of age, and the immunosuppressed (either due to being HIV positive or from other immunocompromising conditions) (ref. 11). In response to the global outbreak, a number of countries have started to establish mpox surveillance programs.
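The case fatality ratio quoted above follows directly from the reported case and fatality counts. As a minimal arithmetic check (the function name `case_fatality_ratio` is an illustrative choice, not something from the source):

```python
def case_fatality_ratio(deaths: int, cases: int) -> float:
    """Return the case fatality ratio as a percentage of reported cases."""
    if cases <= 0:
        raise ValueError("cases must be a positive integer")
    return 100.0 * deaths / cases


# Counts reported for the multicountry outbreak as of 12 August 2024.
cfr = case_fatality_ratio(deaths=208, cases=99_176)
print(f"CFR = {cfr:.2f}%")  # prints "CFR = 0.21%"
```

The ratio is computed over reported cases only, so it inherits any under-ascertainment in the surveillance data.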
In addition to the global outbreak, mpox is being detected in new African countries such as Burundi, Rwanda, Uganda and Kenya, which led the World Health Organization (WHO) to declare a second Public Health Emergency of International Concern in August 2024 (ref. 12). In 2024, as of 14 September, a total of 21,835 suspected mpox cases were reported by the Democratic Republic of the Congo, including 5,160 laboratory-confirmed cases (sample testing rate: 46%), with a positivity rate of 51.3% (ref. 12). This is substantially higher than in previous years. This recent increase, along with the newly documented sexual transmission recorded in March 2023 in Kwango province and then in September 2023 in South Kivu province, which later drove the spread to Burundi, Rwanda, Uganda and Kenya, confirms the growing importance of human-to-human transmission, including through sexual contact, in Central and eastern Africa (refs. 13–15).
The global mpox surveillance that was quickly established in 2022 provided a platform for the generation of genome sequencing data. These data have been useful in characterizing MPXV evolution, understanding the origins of emerging lineages and monitoring local and global spread. Previous studies have shown that clade IIb exhibits a higher substitution rate than other Orthopoxvirus variants (refs. 9, 16–18). This appears to be due to elevated TC>TT mutations (which represent C mutating to T with an upstream T nucleotide and also include the reverse-complement GA>AA mutations) driven by human apolipoprotein B mRNA-editing enzyme, catalytic polypeptide-like 3 (APOBEC-3) proteins causing cytosine deamination in the viral genome (refs. 9, 16–18). This mutational signature has enabled inference that clade IIb is transmitting from human to human. In addition to characterizing mutations and mutational processes, global genomic surveillance can also enable monitoring of the integrity and stability of the MPXV genomic termini, which can rearrange, driving gene duplication or gene loss (ref. 19). These rearrangements are also drivers of poxvirus evolution and host adaptation, and are therefore important to monitor (refs. 20, 21).
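The TC>TT definition above (C mutating to T with an upstream T, plus the equivalent GA>AA change on the opposite strand) can be expressed as a small classifier. The sketch below is illustrative only: the function names and the four toy mutations are assumptions, not data or code from the study.

```python
from typing import Iterable, Tuple

# A mutation is described as (upstream base, reference base, alternate base,
# downstream base), all read on the forward strand.
Mutation = Tuple[str, str, str, str]


def is_apobec3_type(upstream: str, ref: str, alt: str, downstream: str) -> bool:
    """True for TC>TT, or for GA>AA (the same change seen on the other strand)."""
    upstream, ref, alt, downstream = (b.upper() for b in (upstream, ref, alt, downstream))
    if ref == "C" and alt == "T":
        return upstream == "T"      # TC>TT
    if ref == "G" and alt == "A":
        return downstream == "A"    # GA>AA
    return False


def apobec3_fraction(mutations: Iterable[Mutation]) -> float:
    """Fraction of mutations that match the APOBEC-3-type dinucleotide context."""
    muts = list(mutations)
    if not muts:
        return 0.0
    return sum(is_apobec3_type(*m) for m in muts) / len(muts)


# Hypothetical toy mutations, for illustration only.
toy = [
    ("T", "C", "T", "A"),  # TC>TT: counted
    ("A", "G", "A", "A"),  # GA>AA: counted
    ("G", "C", "T", "A"),  # C>T with upstream G: not counted
    ("A", "A", "G", "T"),  # A>G: not counted
]
print(apobec3_fraction(toy))  # 0.5
```

A lineage whose mutations are dominated by this context would show a fraction close to 1, which is the signal used to infer sustained circulation in humans.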
Mpox surveillance programs and their associated genomic strategies globally remain critical to the understanding of the disease and the characterization of the virus evolutionary trajectory and genetic diversity, and provide potential insights into the associated phenotype. Ultimately, these data support the deployment of suitable countermeasures (diagnostics, therapeutics and vaccines), leveraging efforts garnered from smallpox interventions before eradication and preparedness in the years since, as well as advance research and development for countermeasures. Here we report a global analysis of the publicly available MPXV genomic sequence data, which provide key insights into the spatiotemporal spread, host species range and evolution of this ongoing threat.
We collected all available MPXV sequences and filtered them to retain 10,546 high-quality sequences from 64 countries (Methods). Of these sequences, 6,585 were extracted from GenBank (62%) and 3,914 sequences from the Global Initiative for Sharing of All Influenza Data (GISAID; 37%). Owing to its distinct epidemiology and sequence diversity, we divided clade IIb into the A sublineages and the B.1 lineage for the analyses below, and refer to these groupings as clade IIb A and lineage B.1, respectively. As expected, the majority (97.7%) of the available MPXV sequences cluster within lineage B.1 (Extended Data Fig. 1a), representing intensive genomic sequencing efforts during the global outbreak. Clades I, IIa and IIb A have been sequenced far less often (Extended Data Fig. 1a). Correspondingly, the majority (98.6%) of MPXV sequences were collected from 2022 to 2024, with limited historical surveillance resulting in many years between 1958 and 2015 with no sequences in global databases (Extended Data Fig. 1b). Nonetheless, we observe differences in the temporal, spatial and host species distributions of the major MPXV clades and discuss them below.
Subclade IIa and clade I were the first to be detected, in 1958 and 1970, respectively (Fig. 1 and Extended Data Fig. 1c) (refs. 22, 23). Sporadic detection of both clades has continued through to recent years (Fig. 1 and Extended Data Fig. 1c), showing continued circulation within the animal reservoir. Clade I continued to be detected in the Democratic Republic of the Congo and Sudan during the lineage B.1 outbreak between 2022 and 2024 (Fig. 1), with the Democratic Republic of the Congo cases in 2024 being associated with a novel divergent lineage showing signatures of human-to-human transmission within South Kivu (refs. 14, 15). Clade IIa has not been observed since 2018 (Fig. 1). Except for a sample from 1971, all clade IIb genomes were sampled from 2017 to 2023. The 1971 sample probably reflects the fact that clade IIb also originated in an animal reservoir. Clade IIb A was first detected in Nigeria in 2017 and has continued circulating through human-to-human transmission to at least 2023 (ref. 8), while the descendant lineage B.1 was first detected in 2022 (Fig. 1 and Extended Data Fig. 1c).
Fig. 1. Spatiotemporal and host species distributions of MPXV sequences (worldwide, 1958–2024). a, Maximum likelihood phylogenetic tree highlighting the major clades of MPXV. The branches are colored by clade. Lineage B.1 clusters within clade IIb and caused the 2022 global MPXV outbreak; this lineage is therefore separated from the remainder of clade IIb. The scale bar shows the expected number of nucleotide substitutions per site. A subset of clades are collapsed for clarity. b, The temporal and regional distribution of MPXV sequences is shown for each clade. The n numbers show the total number of sequences from the clade. c, Distributions of the number of sequences from each host species.
Although it probably originated from an animal reservoir, clade IIb circulates via human-to-human transmission (ref. 24), and correspondingly, both clade IIb A and B.1 lineages have, to date, been sampled exclusively in humans (Fig. 1). While clades I and IIa both circulate within poorly understood animal reservoirs, the hosts from which they have been sampled are markedly different. The majority of clade I sequences (96%) have been sampled from humans, with single samples from an outbreak in captive chimpanzees (ref. 25), and from wild shrew (Crocidura littoralis) and rope squirrel (Funisciurus anerythrus) (ref. 26). Conversely, only 12% (3 of 25) of clade IIa samples with recorded host species were collected in humans (Fig. 1), two in 1970 and one in 2003. Clade IIa has been isolated most often in chimpanzees (60%, Fig. 1), although these samples are mostly from a single study within Taï National Park in the Ivory Coast (ref. 7), and chimpanzees are probably a spillover host rather than a reservoir host. Clade IIa has additionally been isolated from a wild sooty mangabey, imported cynomolgus monkeys in the USA and Denmark, and a prairie dog during the 2003 USA outbreak (Fig. 1) (ref. 27).
Clade I has mostly been isolated from the Congo Basin area, where it has been sampled in the Democratic Republic of the Congo, the Republic of the Congo, Central African Republic, Cameroon and Gabon (Fig. 2 and Extended Data Fig. 2). Sampling locations are broadly spread around these countries. Sequences from individual countries and provinces often do not cluster within the phylogenetic tree (Fig. 2), showing multiple introductions of clade I into local geographical regions.
Fig. 2. Regular international and inter-province transmission of clade I. a,b, Maximum likelihood phylogenetic tree of 113 high-quality clade I sequences. a, The tips are colored by country to match the map, and the shapes show the host species from which the sequence was isolated. The Sudan sequence cluster is highlighted. The 2024 South Kivu outbreak clade has been collapsed for clarity. The asterisks show phylogenetic nodes with bootstrap support of 70 or higher. The map shows sampling locations with points proportional to the number of sequences from the location. b, The tips are colored by province within the Democratic Republic of the Congo to match the map. Tips collected outside of the Democratic Republic of the Congo are colored gray and tips sampled within the Democratic Republic of the Congo but without a recorded province are colored black. c, We calculated the mutation distance between all possible pairs of clade I sequences (that is, each sequence was compared against all other sequences); the number of mutations is plotted stratified by whether the pairs are from the same (purple) or different (green) provinces.
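The pairwise comparison described in the Fig. 2c caption, in which every sequence is compared against every other and the distances are split by whether the pair shares a province, can be sketched as follows. This is a simplified illustration assuming pre-aligned sequences of equal length; the function names, the masking of `N` and gap characters, and the toy records are assumptions rather than the procedure in the paper's Methods.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def mutation_distance(a: str, b: str) -> int:
    """Count differing aligned sites, skipping sites where either sequence is N or a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    masked = {"N", "-"}
    return sum(1 for x, y in zip(a, b) if x != y and x not in masked and y not in masked)


def stratified_distances(records: Sequence[Tuple[str, str]]) -> Dict[str, List[int]]:
    """records holds (aligned sequence, province) pairs; distances are split by province match."""
    out: Dict[str, List[int]] = {"same": [], "different": []}
    for (seq1, prov1), (seq2, prov2) in combinations(records, 2):
        key = "same" if prov1 == prov2 else "different"
        out[key].append(mutation_distance(seq1, seq2))
    return out


# Hypothetical toy alignment, for illustration only.
toy_records = [
    ("ACGTACGT", "Equateur"),
    ("ACGTACGA", "Equateur"),
    ("ACGAACGA", "Kinshasa"),
]
print(stratified_distances(toy_records))  # {'same': [1], 'different': [2, 1]}
```

If viruses rarely moved between provinces, the "same" distances would be markedly smaller than the "different" ones; similar distributions point to regular movement between locations.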
Having first confirmed the presence of a temporal signal (Methods), we sought to identify the timescale of these virus movements by reconstructing a temporal phylogenetic tree (Fig. 3). We found that the most recent common ancestor of clade I (excluding the 2024 South Kivu outbreak that has a different substitution rate; Methods; ref. 9) occurred in approximately 1917 (95% highest probability density (HPD) 1880–1949). We observe frequent international and inter-province transmission over the past several decades (Figs. 2 and 3). For example, a clade sampled in Sud-Ubangi, Equateur and Kinshasa in 2023–2024 coalesces to a common ancestor in 2007 (95% HPD 2003–2011), supporting recent virus movement within the animal reservoir (Figs. 2b and 3). Furthermore, we observe highly similar distributions of genetic relatedness between clade I samples from the same and different provinces within the Democratic Republic of the Congo (Fig. 2c), further highlighting the regular movement of viruses between geographical locations.
Fig. 3. Temporal evolutionary history of clade I. The temporal maximum clade credibility phylogenetic tree is shown. The tips are colored by country of isolation. The red bars show the 95% HPD on the date of the corresponding node. The asterisks show nodes with posterior support of 70 or higher.
Clade I has been isolated outside of the Congo Basin in Sudan in 2005 (in a region that is now part of South Sudan) and 2022 (Fig. 2a). The two Sudan sequences cluster in the phylogenetic tree and share an ~10.5 kb duplication (Extended Data Fig. 3). We estimate that the Sudan sequences diverged from their closest sampled relative (a Democratic Republic of the Congo sample from 1985) in 1978 (95% HPD 1969–1984); this lineage has therefore been sampled only in Sudan over roughly 46 years. It is likely that this lineage has circulated in the animal reservoir during this period as it shows 8.5% TC>TT mutations, highly similar to that expected in animals (8%) but far lower than expected from evolution in humans (85%) (ref. 9). This is confirmed by the geographical spread (6 states in western, southern and eastern Sudan) of the 18 mpox cases that were laboratory confirmed in Sudan in 2022 (Extended Data Fig. 4).
Clades IIa and IIb A both circulate in West Africa and have been exported to other regions (Figs. 1 and 4). Clade IIa has not been observed outside West Africa since an outbreak in the USA in 2003 (Fig. 1). Samples of clade IIa from West Africa remain sparse with single sequences from Liberia and Sierra Leone from human cases in 1970 and two closely related clusters of sequences from chimpanzees in the Ivory Coast collected from 2017 to 2018 (Fig. 4a) (ref. 7). We therefore currently lack the resolution to examine spatial transmission patterns in more detail for clade IIa.
Fig. 4. Spatial distributions of clades IIa and IIb A. a, Maximum likelihood phylogenetic tree of 25 high-quality clade IIa isolates. The tips are colored by host species and the shapes show sampling locations. The asterisks show nodes with bootstrap support of 70 or higher, and the scale bar shows the expected number of nucleotide substitutions per site. b, Maximum likelihood phylogenetic tree of 101 clade IIb A sequences. The tips are colored by country of collection. The red, blue and orange asterisks show samples with travel history. The potential Eastern Mediterranean clade containing samples with travel history to the United Arab Emirates and Saudi Arabia is highlighted. The black asterisks show nodes with bootstrap support of 70 or above, and the scale bar shows the expected number of nucleotide substitutions per site.
While clade IIb A initially spread in Nigeria, it was then exported to other countries in Europe, Asia, North America and, more recently, North Africa (Figs. 1 and 4b). Genetic sequences from multiple individuals infected with clade IIb A from the UK, Israel, Singapore and India have travel history to Nigeria (Fig. 4b), supporting infection in endemic regions of Nigeria and subsequent export. In addition, eight sequences from India, South Korea, Vietnam and Thailand were isolated from travellers returning from the United Arab Emirates or Saudi Arabia; these sequences cluster within a single phylogenetic lineage that also includes sequences from the USA, UK, Slovenia and Egypt for which no information regarding recent travel is recorded (Fig. 4b). This is consistent with sustained circulation of a lineage of clade IIb A in the Eastern Mediterranean region; no sequences are currently available from the Eastern Mediterranean to allow further examination.
Previous studies have shown that mutational spectra of MPXV lineages that are transmitted from human to human show a high proportion of TC>TT mutations, implicating APOBEC-3 as a major driver of mutagenesis in human MPXV infections (ref. 9). We calculated complete single-base substitution (SBS) mutational spectra (incorporating all possible nucleotide contexts for each mutation type) for the dominant clades of MPXV and corrected these for genomic composition (Fig. 5 and Methods). We find similar mutational patterns to previous studies (refs. 9, 18); for example, the spectrum of clade IIb (both A and B.1 lineages) is dominated by C>T mutations in which the C is preceded by a T, with preference for A or G following the substitution site (Fig. 5).
Fig. 5. Mutational spectra differ between major MPXV clades. a, SBS mutational spectra for clade I, clade IIb A and lineage B.1 (clade IIa was not included because of insufficient mutations; Methods). SBS spectra show the proportion of mutations of each mutation type within each surrounding nucleotide context; contexts for an example mutation type are shown in the right-hand panel. The two most prevalent contextual mutations are highlighted for clade I and lineage B.1. Mutational spectra are corrected for genome composition (Methods). Symmetrical mutations (for example, C>T and G>A) are combined as MPXV is a DNA pathogen (ref. 29). b, The proportion of C>T mutations with each nucleotide upstream is shown for each clade. Each bar and dot show the proportion of C>T mutations that occur in the corresponding context within the clade (that is, n = 1 in each case). The error bars represent the Wilson score interval calculated using the corresponding proportion and number of sampled C>T mutations. c, To examine the potential for the clade IIa mutational spectrum to have been generated by the mutational spectra of clade I and lineage B.1, we compared the proportion of each mutation type in the clade IIa spectrum with that in 1,000 subsamplings of the other clade spectrum to the number of mutations in the clade IIa spectrum (Methods). Each gray point represents the mutation type proportion in one subsample of the respective mutational spectrum while each red point shows the mutation type proportion in clade IIa. The clade IIa mutation type proportions are within that expected from clade I but often outside that expected from lineage B.1. The boxplot center lines show median values; the lower and upper bounds show the 25th and 75th percentiles, respectively; the upper and lower whiskers show the largest and smallest values within 1.5 times the interquartile range above the 75th percentile and below the 25th percentile, respectively.
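The error bars in Fig. 5b are described as Wilson score intervals computed from a proportion and the number of sampled C>T mutations. A minimal sketch of that interval is given below; the 95% level (`z = 1.96`) and the example counts are assumptions, since the passage does not state the confidence level or the underlying counts.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must lie between 0 and n")
    p = successes / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2.0 * n)) / denom
    half_width = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return centre - half_width, centre + half_width


# Hypothetical counts: 57 of 100 C>T mutations preceded by G.
low, high = wilson_interval(57, 100)
print(f"{low:.3f} to {high:.3f}")
```

Unlike the simple normal approximation, the Wilson interval stays within 0 and 1 and behaves sensibly for small counts, which matters for sparsely sampled clades.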
The clade I mutational spectrum shows that mutational processes within the animal reservoir are also dominated by C>T mutations (66% of the mutational burden accounting for genome composition), but with different contextual preferences to evolution within humans (Fig. 5). In clade I, C>T mutations occur most commonly where the C is preceded by G (57% of the C>T mutational burden, Fig. 5b) and are most frequent in G[C>T]A and G[C>T]G contexts (Fig. 5a). C>A (which includes G>T mutations) is the second most common mutation in clade I (17% of the total mutational burden, Fig. 5). We observe contextual preferences within C>A mutations, with mutations in AC and GC contexts being less common (Fig. 5). As reactive oxygen species cause G>T (ref. 28), they are a potential driver of these mutations.
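Context labels such as G[C>T]A, and the convention that C>A "includes" G>T, come from collapsing each mutation onto the strand where the reference base is a pyrimidine. The sketch below illustrates that collapsing and a raw (uncorrected) spectrum; it deliberately omits the genome-composition correction described in the Methods, and the function names and toy mutations are assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def _complement(base: str) -> str:
    return base.translate(_COMPLEMENT)


def sbs_label(upstream: str, ref: str, alt: str, downstream: str) -> str:
    """Label a substitution as 'X[R>A]Y' with a pyrimidine (C or T) reference base."""
    if ref in "AG":
        # Read the mutation on the opposite strand: bases are complemented and
        # the upstream and downstream neighbours swap places.
        upstream, ref, alt, downstream = (
            _complement(downstream),
            _complement(ref),
            _complement(alt),
            _complement(upstream),
        )
    return f"{upstream}[{ref}>{alt}]{downstream}"


def raw_spectrum(mutations: Iterable[Tuple[str, str, str, str]]) -> Dict[str, float]:
    """Proportion of mutations per collapsed context (no genome-composition correction)."""
    counts = Counter(sbs_label(*m) for m in mutations)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


# Hypothetical toy mutations, for illustration only.
toy_mutations = [
    ("G", "C", "T", "A"),  # G[C>T]A
    ("T", "G", "A", "C"),  # G>A, collapses to G[C>T]A
    ("G", "C", "T", "G"),  # G[C>T]G
    ("A", "G", "T", "T"),  # G>T, collapses to A[C>A]T
]
print(raw_spectrum(toy_mutations))
```

With this convention there are 6 mutation types and 16 flanking contexts each, giving the 96-channel SBS spectrum commonly used for mutational signature analysis.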
The clade IIa mutation spectrum contains 222 mutations, which may be too few to examine mutational patterns in detail (ref. 29). We therefore used the clade I and lineage B.1 mutational spectra as references and investigated whether either could explain the clade IIa mutational spectrum. We found that the clade IIa mutational patterns could have been generated by the clade I spectrum but not by the lineage B.1 spectrum (Fig. 5c). This suggests that clade IIa shows similar mutational processes to clade I, consistent with a lack of human APOBEC-3 activity and therefore both clades consistently circulating outside of humans.
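The comparison above, in which a reference spectrum is subsampled 1,000 times down to the number of clade IIa mutations and the observed clade IIa proportions are checked against the subsampled range, can be sketched as follows. This is a simplified illustration: it samples with replacement (a multinomial draw), which may differ from the subsampling scheme in the Methods, and the reference counts and observed proportions are hypothetical numbers chosen only to show the mechanics.

```python
import random
from collections import Counter
from typing import Dict, List


def subsample_proportions(
    reference_counts: Dict[str, int],
    n_mutations: int,
    n_subsamples: int = 1000,
    seed: int = 0,
) -> List[Dict[str, float]]:
    """Draw subsamples of n_mutations from a reference spectrum; return type proportions."""
    rng = random.Random(seed)
    types = list(reference_counts)
    weights = [reference_counts[t] for t in types]
    results = []
    for _ in range(n_subsamples):
        draw = Counter(rng.choices(types, weights=weights, k=n_mutations))
        results.append({t: draw[t] / n_mutations for t in types})
    return results


def within_subsampled_range(
    observed: Dict[str, float], subsamples: List[Dict[str, float]]
) -> Dict[str, bool]:
    """For each mutation type, is the observed proportion inside the subsampled min-max range?"""
    verdict = {}
    for mutation_type, proportion in observed.items():
        values = [s.get(mutation_type, 0.0) for s in subsamples]
        verdict[mutation_type] = min(values) <= proportion <= max(values)
    return verdict


# Hypothetical reference spectra and observed proportions, for illustration only.
animal_like = {"C>T": 660, "C>A": 170, "other": 170}
human_like = {"C>T": 950, "C>A": 20, "other": 30}
observed_small_clade = {"C>T": 0.65, "C>A": 0.18, "other": 0.17}

for name, reference in (("animal-like", animal_like), ("human-like", human_like)):
    subs = subsample_proportions(reference, n_mutations=222)
    print(name, within_subsampled_range(observed_small_clade, subs))
```

Subsampling to the smaller clade's mutation count matters because a spectrum built from few mutations is noisy; the check asks whether the observed proportions fall within that expected noise.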
Subclade IIa and clade I were the first to be detected, in 1958 and 1970, respectively (Fig. 1 and Extended Data Fig. 1c)22,23. Sporadic detection of both clades has continued through to recent years (Fig. 1 and Extended Data Fig. 1c), showing continued circulation within the animal reservoir. Clade I continued to be detected in the Democratic Republic of the Congo and Sudan during the lineage B.1 outbreak between 2022 and 2024 (Fig. 1), with the Democratic Republic of the Congo cases in 2024 being associated with a novel divergent lineage showing signatures of human-to-human transmission within South Kivu14,15. Clade IIa has not been observed since 2018 (Fig. 1). Except for a sample from 1971, all clade IIb genomes were sampled from 2017 to 2023. The 1971 sample probably reflects the fact that also clade IIb originated in an animal reservoir. Clade IIb A was first detected in Nigeria in 2017 and has continued circulating through human-to-human transmission to at least 20238, while the descendent lineage B.1 was first detected in 2022 (Fig. 1 and Extended Data Fig. 1c).Fig. 1Spatiotemporal and host species distributions of MPXV sequences (worldwide, 1958–2024).a, Maximum likelihood phylogenetic tree highlighting the major clades of MPXV. The branches are colored by clade. Lineage B.1 clusters within clade IIb and caused the 2022 global MPXV outbreak; this lineage is therefore separated from the remainder of clade IIb. The scale bar shows the expected number of nucleotide substitutions per site. A subset of clades are collapsed for clarity. b, The temporal and regional distribution of MPXV sequences is shown for each clade. The n numbers show the total number of sequences from the clade. c, Distributions of the number of sequences from each host species.
Although it probably originated from an animal reservoir, clade IIb circulates via human-to-human transmission24, and correspondingly, both clade IIb A and B.1 lineages have, to date, been sampled exclusively in humans (Fig. 1). While clades I and IIa both circulate within poorly understood animal reservoirs, the hosts from which they have been sampled are markedly different. The majority of clade I sequences (96%) have been sampled from humans, with single samples from an outbreak in captive chimpanzees25, and from wild shrew (Crocidura littoralis) and rope squirrel (Funisciurus anerythrus)26. Conversely, only 12% (3 of 25) of clade IIa samples with recorded host species were collected in humans (Fig. 1), two in 1970 and one in 2003. Clade IIa has been isolated most often in chimpanzees (60%, Fig. 1), although these samples are mostly from a single study within Taï National Park in the Ivory Coast7, and chimpanzees are probably a spillover host rather than a reservoir host. Clade IIa has additionally been isolated from a wild sooty mangabey, imported cynomolgus monkeys in the USA and Denmark, and a prairie dog during the 2003 USA outbreak (Fig. 1)27.
Clade I has mostly been isolated from the Congo Basin area, where it has been sampled in the Democratic Republic of the Congo, the Republic of the Congo, Central African Republic, Cameroon and Gabon (Fig. 2 and Extended Data Fig. 2). Sampling locations are broadly spread around these countries. Sequences from individual countries and provinces often do not cluster within the phylogenetic tree (Fig. 2), showing multiple introductions of clade I into local geographical regions.
Fig. 2: Regular international and inter-province transmission of clade I. a,b, Maximum likelihood phylogenetic tree of 113 high-quality clade I sequences. a, The tips are colored by country to match the map, and the shapes show the host species from which the sequence was isolated. The Sudan sequence cluster is highlighted. The 2024 South Kivu outbreak clade has been collapsed for clarity. The asterisks show phylogenetic nodes with bootstrap support of 70 or higher. The map shows sampling locations with points proportional to the number of sequences from the location. b, The tips are colored by province within the Democratic Republic of the Congo to match the map. Tips collected outside of the Democratic Republic of the Congo are colored gray and tips sampled within the Democratic Republic of the Congo but without a recorded province are colored black. c, We calculated the mutation distance between all possible pairs of clade I sequences (that is, each sequence was compared against all other sequences); the number of mutations is plotted stratified by whether the pairs are from the same (purple) or different (green) provinces.
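The pairwise comparison summarized in Fig. 2c can be illustrated with a minimal sketch. It assumes the sequences are already aligned to equal length, and it counts only positions where both sequences carry an unambiguous base (A, C, G or T). That handling of ambiguous bases and gaps, the function names and the placeholder province labels are illustrative assumptions, not the authors' implementation.

```python
from itertools import combinations

VALID_BASES = set("ACGT")


def mutation_distance(seq_a: str, seq_b: str) -> int:
    """Count differing positions between two aligned sequences.

    Assumption: positions where either sequence has a gap or an
    ambiguous base (for example N) are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in VALID_BASES and b in VALID_BASES and a != b
    )


def stratified_distances(samples):
    """Split all pairwise distances into same-province and different-province lists.

    `samples` is a list of (province, aligned_sequence) tuples.
    """
    same, different = [], []
    for (prov_a, seq_a), (prov_b, seq_b) in combinations(samples, 2):
        distance = mutation_distance(seq_a, seq_b)
        (same if prov_a == prov_b else different).append(distance)
    return same, different


# Toy alignment with placeholder province labels (not real data)
toy = [
    ("province_A", "ACGTAC"),
    ("province_A", "ACGTAT"),
    ("province_B", "TCGNAT"),
]
same_province, different_province = stratified_distances(toy)
```

If same-province pairs are often as distant as different-province pairs, that is consistent with multiple introductions into a province rather than a single local lineage.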
Previous studies have shown that mutational spectra of MPXV lineages that are transmitted from human to human show a high proportion of TC>TT mutations, implicating APOBEC-3 as a major driver of mutagenesis in human MPXV infections9. We calculated complete single-base substitution (SBS) mutational spectra (incorporating all possible nucleotide contexts for each mutation type) for the dominant clades of MPXV and corrected these for genomic composition (Fig. 5 and Methods). We find similar mutational patterns to previous studies9,18; for example, the spectrum of clade IIb (both A and B.1 lineages) is dominated by C>T mutations in which the C is preceded by a T, with preference for A or G following the substitution site (Fig. 5).
Fig. 5: Mutational spectra differ between major MPXV clades. a, SBS mutational spectra for clade I, clade IIb A and lineage B.1 (clade IIa was not included because of insufficient mutations; Methods). SBS spectra show the proportion of mutations of each mutation type within each surrounding nucleotide context; contexts for an example mutation type are shown in the right-hand panel. The two most prevalent contextual mutations are highlighted for clade I and lineage B.1. Mutational spectra are corrected for genome composition (Methods). Symmetrical mutations (for example, C>T and G>A) are combined as MPXV is a DNA pathogen29. b, The proportion of C>T mutations with each nucleotide upstream is shown for each clade. Each bar and dot show the proportion of C>T mutations that occur in the corresponding context within the clade (that is, n = 1 in each case). The error bars represent the Wilson score interval calculated using the corresponding proportion and number of sampled C>T mutations.
c, To examine the potential for the clade IIa mutational spectrum to have been generated by the mutational spectra of clade I and lineage B.1, we compared the proportion of each mutation type in the clade IIa spectrum with that in 1,000 subsamplings of the other clade spectrum to the number of mutations in the clade IIa spectrum (Methods).
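The Wilson score interval named in the Fig. 5b legend can be computed directly from a count and a total. The sketch below uses the standard formula and assumes a 95% confidence level (z ≈ 1.96), which the legend does not state. The function name and example counts are illustrative only.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.959963984540054):
    """Wilson score interval for a binomial proportion.

    `successes` could be the number of C>T mutations with a given upstream
    base and `n` the total number of C>T mutations in the clade.
    Assumption: z defaults to the two-sided 95% normal quantile.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denominator = 1 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denominator
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denominator
    )
    return centre - half_width, centre + half_width


# Illustrative counts only (not values from the study)
lower, upper = wilson_interval(50, 100)
```

Unlike the simple normal approximation, the Wilson interval stays within [0, 1] and remains usable when the observed proportion is close to 0 or 1, which matters for rare upstream contexts.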
Implementing measures to control mpox outbreaks will require an in-depth understanding of how the virus is transmitting within human populations and among animals across different spatial scales. Here we carried out an in-depth analysis of spatiotemporal and host species patterns across all major MPXV clades. Our analysis revealed large differences in the spatiotemporal and host species patterns of the major clades. We identified regular international and inter-province transmission of clade I, including its likely circulation in animal reservoirs and/or endemicity in parts of eastern Africa; inferred transmission of clade IIb A in the Eastern Mediterranean; and showed that MPXV mutational patterns are associated with transmission route. Our analysis provides evidence of regular recent transmission of clade I among countries and provinces within Central Africa (Figs. 2 and 3). We also found co-circulation of multiple clade I lineages within individual countries and provinces (Figs. 2 and 3). This shows that high MPXV diversity is maintained within the clade I animal reservoir(s), suggesting high MPXV prevalence and thereby highlighting the potential for frequent spillover when humans interact with reservoir species. Our results also show that the animal reservoir is mobile. Understanding the speed and dynamics of virus movements may help to pinpoint possible reservoir species, as well as source locations, but this will require increased sequencing of epidemiologically representative samples in future.
While clades I and IIa both circulate within animals and spill over into humans, we observe highly different distributions of host species among sampled genetic sequences, with clade I mostly being sampled from humans and clade IIa from animals (Fig. 1c). This difference in host sampling may be driven by (1) distinct abilities of the clades to infect humans, (2) differential disease severity in humans and/or animals altering the likelihood of case detection, (3) different contacts between humans and sampled animals and animal reservoirs (which may differ for the two clades) and/or (4) different surveillance and sampling strategies in humans and animals within the affected countries. Identifying the driver(s) of differential host sampling will require stronger surveillance in humans and animals, and linkage with sample metadata to determine likely routes of infection.
While clade I has mostly been detected within the Congo Basin, our data highlight the potential for widespread endemic and/or enzootic circulation of this clade within Sudan (former Sudan, now divided into Sudan and South Sudan). We identified a clade I lineage that has been sampled only from Sudan over roughly 45 years (Figs. 2 and 3). This lineage has probably circulated within the animal reservoir (owing to a lack of APOBEC-3-like mutations) and may have circulated continually in Sudan following introduction at any point during this time period. Alternatively, this lineage could have been introduced multiple times from an unsampled region either shortly before being sampled in Sudan or with some level of local transmission. Distinguishing between these possibilities will require additional sequencing data from Sudan. However, our results, combined with recent epidemiological data, highlight the potential for large numbers of MPXV cases in Sudan30. Following the declaration of the first Public Health Emergency of International Concern in July 2022, the Federal Ministry of Health of Sudan, with support from the WHO, started mpox surveillance. As a consequence, suspected mpox cases were reported from 17 states with 42 affected localities, of which some states host refugees and internally displaced persons. Over 40% of the suspected cases were children under the age of five30. Of the suspected cases, only a portion was tested and a total of 18 cases were laboratory confirmed from 6 states and 9 localities, including 1 death (CFR 5.8%)31.
Our data, combined with the recent emergence of a novel human transmissible clade I lineage in South Kivu14,15, where mpox cases have not previously been detected (except for a few cases in 2011 and 2012), highlight the potential for an increased risk of international spread of clade I MPXV. Recent studies highlighted that the affected population in South Kivu was composed of young adults, of which half were females and 30% were sex workers, with over 80% of patients reporting recent visits to bars for (professional) sexual interactions14,15,32. This represents a significant shift in the historical epidemiology of mpox in the Democratic Republic of the Congo, which involves children <15 years of age as the main affected age group14,33. As a comparison, in the years preceding the eradication of smallpox (1956–1971), the maximum number of smallpox cases reported by the Democratic Republic of the Congo (at that time Zaire) Ministry of Health to the WHO was 5,523 cases including 710 deaths in 1963 (ref. 34), which is a quarter of the suspected mpox cases reported by the Democratic Republic of the Congo in 2024 (ref. 33). Concerningly, a recent study has reported that 8 of 14 pregnant women with mpox had fetal loss32. Considering the high mobility of the population in South Kivu, cross-border transmission is likely to occur, as highlighted by the introductions of clade Ib to North Kivu35 and then Burundi, Uganda, Rwanda and Kenya. Further studies to better understand transmission patterns in these settings (including whether enzooticity has been established) and the geographical distribution of mpox in eastern Africa are urgently needed.
The co-circulation of multiple clade I lineages in individual provinces combined with regular geographical movements and the potential for an East Africa clade I lineage suggests there is high prevalence and diversity of clade I within the animal reservoir(s). It is currently unclear whether this genotypic diversity is associated with phenotypic diversity. However, this high prevalence is likely to make control of clade I within the animal reservoir highly challenging. Prevention of human cases is therefore likely to require interventions at the human–animal interface and rapid detection and cessation of human-to-human transmission chains. Our results therefore underpin the importance of further studies to understand how humans become infected with clade I viruses and studies carrying out functional characterization of diverse clade I viruses.
Local human-to-human transmission chains were established in many countries across all six WHO regions during the 2022 MPXV global outbreak36. Local transmission of the ancestral clade IIb A lineages has also been identified in some cases outside of West Africa37. Here, we were able to infer local transmission of clade IIb A MPXV in the Eastern Mediterranean through travel data associated with sequences from other countries, despite sequences from that region being unavailable for analysis. This highlights the importance of associating detailed metadata with genetic sequences where possible.
Mutational signatures have provided major insights into MPXV and can be used to identify lineages that are transmitting from human to human and to infer outbreak origin dates9,18. Here we calculated and compared in-depth mutational spectra of the major MPXV clades, showing that C>T mutations are most common within both human and animal hosts, but differences in contextual preferences exist between species (Fig. 5). The C>T mutations observed within clade I occur most commonly where the C is preceded by G (Fig. 5b) and are most frequent in G[C>T]A and G[C>T]G contexts (Fig. 5a). These mutations are unlikely to be the result of spontaneous deamination of cytosine as this has a strong preference for CG>TG contexts in human DNA38. The C>T mutations may therefore be the result of polymerase errors during genome replication, the action of alternative APOBEC enzymes within the animal reservoir and/or additional mutagens. The clade IIb spectra show similar enrichment of G over A and C as the nucleotide preceding C>T mutations (Fig. 5b), which may suggest that the GC>GT mutations are driven by a non-host species factor, but this will require additional future work to untangle. Our analyses suggest that the ratio between GC>GT mutations and TC>TT mutations is a reliable marker to distinguish human-to-human transmission from transmission in animals and is therefore potentially able to distinguish sustained human outbreaks from transmission from the reservoir.
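As a minimal illustration of the marker described above, the ratio of GC>GT to TC>TT mutations can be computed from the base immediately upstream of each C>T mutation. The sketch assumes the mutations have already been oriented so that each is expressed as a C>T change (with the symmetrical G>A changes folded in beforehand). The function name and input format are hypothetical, and no classification threshold is implied because none is stated here.

```python
from collections import Counter
from typing import Iterable, Optional


def gc_to_tc_ratio(upstream_bases: Iterable[str]) -> Optional[float]:
    """Ratio of C>T mutations preceded by G to those preceded by T.

    `upstream_bases` holds one base per C>T mutation: the nucleotide
    immediately 5' of the mutated C. Returns None when no TC>TT mutation
    is present, because the ratio is then undefined.
    """
    counts = Counter(base.upper() for base in upstream_bases)
    if counts["T"] == 0:
        return None
    return counts["G"] / counts["T"]


# Toy input: upstream bases of six hypothetical C>T mutations
example_ratio = gc_to_tc_ratio(["G", "G", "T", "A", "G", "T"])
```

A ratio well above 1 would point towards the GC>GT-dominated pattern seen in clade I, and a ratio well below 1 towards the TC>TT-dominated, APOBEC-3-like pattern seen in clade IIb.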
The global outbreak has provided critical insights into the epidemiology of mpox in humans. However, it is unknown whether the characteristics of lineage B.1 were acquired following adaptation in humans, or whether they may be generalizable across MPXV clades. This can only be revealed with stronger surveillance, sharing of sequences and continued genotypic and functional characterization of the differences between clades and lineages39. Such characterization would shed light on the drivers of epidemiological differences between clades, but this work is technically challenging and, currently, few laboratories worldwide have such capacity. However, a small number of studies have found that the apparent difference in morbidity and mortality between clades I and II is probably driven by multiple proteins present in clade I but absent in clade II40,41. One particular area of interest is gene duplication, such as that found in clade I sequences from Sudan. Such duplications have been described for clade IIb sequences from the 2022 outbreak, when they were assumed to be involved in immune evasion and host range20,42,43. Gene duplication and loss in the MPXV terminal regions are also considered drivers of poxvirus evolution and adaptation to the host20,21. Analyses comparing strains with and without deletions will be essential to uncover their functional consequences and understand and forecast the epidemiology of lineages showing such changes.
In addition, deletions may lead to diagnostic failure, especially for nucleic acid amplification tests that target less conserved genes, such as some of the clade-specific PCRs, which are designed to distinguish clades. To date, there have been two reported examples of such diagnostic failure episodes, one for a variant of MPXV clade IIb detected in the USA that did not spread widely44, and one for the clade I lineage currently circulating in South Kivu14. This clearly highlights the critical importance of a strategic genomic surveillance system where mpox circulates.
As highlighted in the standing recommendations for mpox issued by the director general of the WHO, it is critical that countries have national mpox strategic plans integrated into broader health systems, and that capacities that have been built in resource-limited settings and among marginalized groups should be sustained45. Without surveillance, no genomic sequence data can be generated and no virological characterization of circulating clades and lineages can be done. In this regard, more laboratories should engage in virological characterization of MPXV clades and lineages. Furthermore, countries are strongly encouraged to continue documenting and making sequences publicly available, prioritizing specimens for both targeted sequencing (for example, of imported cases, the first few cases of local emergence and cases with divergent demographic or clinical profiles) and representative sequencing. This will enable the tracking of virus circulation and evolution over time. If we want to prevent the next mpox global outbreak, it is now time to strengthen mpox surveillance, including in Africa, focusing on populations at highest risk and ensuring integration with existing systems to ensure comprehensive and seamless delivery of care.
We aimed to collate a dataset containing all available high-quality MPXV genetic sequences. To do this, we initially downloaded all MPXV nucleotide sequences from GenBank (identified as sequences containing at least one of the search terms ‘monkeypox’, ‘mpox’, ‘MPXV’ and ‘MPV’, n = 7,252) and the GISAID EpiPox database (n = 8,843) as of 17 February 2024. We then combined these datasets and filtered out duplicate and lower-quality sequences. To do this, we initially removed sequences containing fewer than 30,000 nucleotides (nt), with this cutoff chosen to retain historical MPXV sequences that were ~32,000 nt in length. We next discarded sequences with >20% indeterminate bases (Ns) and assigned clades and (for clade IIb) Pango lineages using Nextclade46. Sequences that could not be assigned a clade were assumed to be low quality and were excluded from further analysis. Based on the clade and lineage assignments from Nextclade, we divided the sequence dataset into four groups: clade I, clade IIa, clade IIb A (containing sequences from the A sublineages within clade IIb but not those within lineage B.1 and its descendent lineages) and lineage B.1 (containing sequences from lineage B.1 and its descendent lineages).
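The length and ambiguity filters described above can be expressed as a short sketch. The thresholds (at least 30,000 nt and at most 20% Ns) come from the text. The function names and the choice to measure the N fraction against the full sequence length are assumptions made for illustration.

```python
def passes_quality_filters(
    sequence: str,
    min_length: int = 30_000,
    max_n_fraction: float = 0.20,
) -> bool:
    """Return True if a sequence meets the length and ambiguity thresholds.

    Assumption: the fraction of indeterminate bases is the number of N
    characters divided by the total sequence length.
    """
    if len(sequence) < min_length:
        return False
    n_fraction = sequence.upper().count("N") / len(sequence)
    return n_fraction <= max_n_fraction


def filter_sequences(records: dict) -> dict:
    """Keep only the {name: sequence} entries that pass both filters."""
    return {
        name: seq for name, seq in records.items() if passes_quality_filters(seq)
    }


# Toy records (synthetic strings, not real genomes)
toy_records = {
    "too_short": "A" * 29_999,
    "too_many_ns": "A" * 23_999 + "N" * 6_001,
    "kept": "ACGT" * 8_000,
}
kept_records = filter_sequences(toy_records)
```

Clade and lineage assignment with Nextclade would follow this step and is not reproduced here.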
We identified sequences that were duplicated between GenBank and GISAID initially by identifying sequences with the same sample name, country and collection date. After removing one of each of these sequence pairs, we carried out an additional phylogenetic screen for duplicate sequences in the clade I, clade IIa and clade IIb A datasets. Sequences within each of the four datasets were aligned using squirrel v0.1 (https://github.com/aineniamh/squirrel)9, specifying the clade to which the sequences belong. We then reconstructed a phylogenetic tree for each clade using IQ-TREE v2.1.3 (ref. 47), using a Jukes-Cantor (JC) model of nucleotide substitution. We identified closely related pairs of sequences in the resulting phylogenetic trees and checked their sequence names, countries and collection dates to determine whether they might be duplicates. Where the sequence names and collection dates were similar (that is, the sequence names contained shared elements and the collection dates were the same to the most accurate level possible), we retained one of the sequences.
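The first, metadata-based duplicate screen can be sketched as a keyed de-duplication. The record field names (`name`, `country`, `date`) are hypothetical, and keeping the first record seen for each key is an assumption, since the text does not state which member of a pair was retained. The later phylogenetic screen involved manual inspection and is not reproduced here.

```python
def drop_metadata_duplicates(records):
    """Remove records sharing sample name, country and collection date.

    `records` is a list of dicts with hypothetical keys "name", "country"
    and "date". Assumption: the first record seen for each key is kept.
    """
    seen = set()
    kept = []
    for record in records:
        key = (record["name"], record["country"], record["date"])
        if key in seen:
            continue
        seen.add(key)
        kept.append(record)
    return kept


# Toy metadata (placeholder values, not real accessions)
toy_metadata = [
    {"name": "sample_1", "country": "X", "date": "2022-06-01", "source": "GenBank"},
    {"name": "sample_1", "country": "X", "date": "2022-06-01", "source": "GISAID"},
    {"name": "sample_2", "country": "X", "date": "2022-06-01", "source": "GISAID"},
]
unique_records = drop_metadata_duplicates(toy_metadata)
```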
We carried out a further quality control check by analyzing the root-to-tip distances of sequences compared with their collection date48. For clade I, clade IIa and clade IIb A, we aligned sequences with squirrel v0.1 and reconstructed maximum likelihood phylogenetic trees using IQ-TREE v2.1.3 as above but including an outgroup sequence (outgroup accession numbers KJ642617.1 for clade I, KJ642616.1 for clade IIa and clade IIb A) that was used to root the tree. We then identified sequences that were clear outliers in a root-to-tip plot of the rooted tree in TempEst v1.5.3 (ref. 48) and removed these from further analyses. Owing to the large size of the lineage B.1 dataset (n = 10,369 sequences before filtering), instead of reconstructing a phylogenetic tree, we carried out a root-to-tip-like analysis by comparing the collection date with the mutation distance to sample ON676708.1, which clusters immediately upstream of lineage B.1. We aligned the lineage B.1 dataset and ON676708.1 using squirrel v0.1 as above and calculated the number of mutations between each sequence and ON676708.1. This showed a strong correlation (Extended Data Fig. 5), so we generated a linear model between sample collection date and this mutation distance and removed samples whose residual within the linear model was more than five times the median absolute deviation away from the median residual (Extended Data Fig. 5). These filtering steps resulted in final datasets of 113 clade I sequences, 25 clade IIa sequences, 101 clade IIb A sequences and 10,307 lineage B.1 sequences. The majority of these sequences contain close to the complete genome (Extended Data Fig. 6 and Supplementary Tables 1 and 2). In addition, we have added 47 sequences from a GitHub directory of a recent paper that describes highly divergent viruses15.
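The residual-based filter applied to the lineage B.1 dataset can be sketched as follows. The sketch assumes an ordinary least-squares fit of mutation distance on collection date (expressed as a number such as a decimal year) and a single filtering pass. Both are assumptions, as the text specifies only a linear model and the five-times-MAD rule. Function names are illustrative.

```python
from statistics import median


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y ~ x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mad_outlier_flags(dates, distances, threshold=5.0):
    """Flag samples whose residual lies more than `threshold` MADs from the median residual."""
    slope, intercept = fit_line(dates, distances)
    residuals = [d - (slope * t + intercept) for t, d in zip(dates, distances)]
    median_residual = median(residuals)
    mad = median(abs(r - median_residual) for r in residuals)
    return [abs(r - median_residual) > threshold * mad for r in residuals]


# Toy data: a steady accumulation of mutations with one aberrant sample
toy_dates = [float(i) for i in range(10)]
toy_distances = [0.1, 1.9, 4.1, 5.9, 8.1, 60.0, 12.1, 13.9, 16.1, 17.9]
flags = mad_outlier_flags(toy_dates, toy_distances)
```

Using the median and MAD rather than the mean and standard deviation keeps the cutoff stable even when the outliers themselves distort the fitted line, as the aberrant sample does in this toy example.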
We identified the spatiotemporal and host species distributions of sequences within each dataset using location, collection date and host species metadata associated with the sequence accession on either GenBank or GISAID. Where these metadata were missing, we attempted to identify them within original publications. To visualize the phylogenetic relationships between the clades, we reconstructed a tree containing all sequences from clades I, IIa and IIb A and the sublineage references for each of the sublineages within B.1 (accession numbers obtained from https://github.com/mpxv-lineages/lineage-designation/blob/master/auto-generated/lineages.md). We aligned these sequences using squirrel v0.1 with clade set to clade II and reconstructed a maximum likelihood phylogenetic tree as above.
To examine phylogenetic relationships within each clade, we aligned sequences within the respective dataset using squirrel v0.1 with clade set to clade I for the clade I dataset and set to clade II for the remaining datasets. Final maximum likelihood phylogenetic trees were reconstructed for each dataset using IQ-TREE v2.1.3 as above. Topological robustness was assessed using 1,000 bootstrap replicates. Travel histories for clade IIb A were identified in GISAID metadata and from examination of original publications. Phylogenetic trees were visualized using FigTree v1.4.4 and ggtree v3.0.2 (ref. 49). We aimed to reconstruct the temporal history of clade I. The recently identified outbreak in South Kivu contains evidence of APOBEC-3 mutagenesis15, which increases the substitution rate9 and may make the application of a single clock model unreliable. We therefore did not include the sequences from this outbreak in the temporal reconstruction.
Methods to infer temporal history are only valid if there is a temporal signal within the dataset48. We assessed temporal signal using root-to-tip randomization in which we compared the R² correlation between the sample collection date and root-to-tip distance with that in 1,000 randomizations of collection dates. This supported the presence of a temporal signal (P < 0.001). We therefore reconstructed the temporal history of clade I using BEAST v2.6.6 using the JC69 model of nucleotide substitution. We used a relaxed log-normal clock model with a log-normal prior on the substitution rate with mean 1.9 × 10⁻⁶ (chosen to match the estimated slope in TempEst) and standard deviation 0.5. Population history was modeled using a coalescent constant population prior. Four independent runs were carried out for 150,000,000 total MCMC steps. Convergence was assessed using Tracer v1.7 (refs. 49,50) and all effective sample size values were above 550. From each run, 10% burn-in was removed before the runs were combined and the maximum clade credibility tree was identified and annotated using TreeAnnotator. Genome rearrangements in the Sudan clade I sequences were identified as described previously19.
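The root-to-tip randomization test described above can be sketched as a permutation test. The sketch shuffles collection dates across tips and recomputes R² each time. Reporting the P value as the fraction of randomizations with R² at least as large as the observed value is an assumption, since the exact definition is not given in the text. Function names and the toy data are illustrative.

```python
import random


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def date_randomization_test(dates, distances, n_randomizations=1000, seed=0):
    """Return the observed R² and the fraction of date shuffles that match or exceed it."""
    rng = random.Random(seed)
    observed = r_squared(dates, distances)
    shuffled = list(dates)
    at_least_as_large = 0
    for _ in range(n_randomizations):
        rng.shuffle(shuffled)
        if r_squared(shuffled, distances) >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / n_randomizations


# Toy data: divergence that grows linearly with sampling year
toy_years = [float(year) for year in range(1970, 2000)]
toy_divergence = [1.9e-6 * (year - 1970.0) for year in toy_years]
observed_r2, p_value = date_randomization_test(toy_years, toy_divergence)
```

If the real dates explain divergence no better than shuffled dates, the dataset carries little temporal signal and a molecular clock analysis would not be justified.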
We calculated an SBS mutational spectrum for each of the major MPXV clades using the sequence alignments and maximum likelihood phylogenetic trees generated above and containing all high-quality sequences. Phylogenetic trees were outgroup rooted using the outgroups described above to enable the direction of each mutation (that is, the ancestral and mutated nucleotides) to be robustly identified. The outgroup was removed before spectrum calculation. We reconstructed mutational spectra using MutTui v2.0.2 (https://github.com/chrisruis/MutTui)29. We rescaled the resulting mutational spectra using MutTui v2.0.2 to account for the number of A, C, G and T nucleotides and the distribution of nucleotide triplets across the genome29.
It has previously been suggested that a dataset requires at least 300–600 mutations for the mutational spectrum to be accurately estimated29. The clade I, clade IIb A and lineage B.1 datasets each contain more than 600 mutations. However, the clade IIa spectrum contains 222 mutations so we did not attempt to examine the detailed contextual patterns in these mutations. To estimate whether the clade IIa mutational spectrum could have been generated by the spectrum of clade I or clade IIb, we compared the mutation type proportions in the clade IIa spectrum with those in 1,000 random downsamplings of the clade I and lineage B.1 mutational spectra to 222 mutations. Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
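The downsampling comparison used for clade IIa can be sketched as repeated random draws of 222 mutations from a reference spectrum. The sketch draws with replacement in proportion to the reference counts and judges agreement by whether each observed proportion falls within the central 95% of the downsampled values. Both choices are assumptions, as the text does not specify the sampling scheme or the decision rule. All counts shown are toy values, not data from the study.

```python
import random
from collections import Counter


def downsample_proportions(reference_counts, n_mutations, n_samples=1000, seed=0):
    """Draw repeated subsamples from a reference spectrum.

    Returns one {mutation_type: proportion} dict per subsample.
    Assumption: mutations are drawn with replacement, weighted by count.
    """
    rng = random.Random(seed)
    types = list(reference_counts)
    weights = [reference_counts[t] for t in types]
    samples = []
    for _ in range(n_samples):
        draw = Counter(rng.choices(types, weights=weights, k=n_mutations))
        samples.append({t: draw[t] / n_mutations for t in types})
    return samples


def within_sampled_range(observed, samples, lower=0.025, upper=0.975):
    """Check, per mutation type, whether the observed proportion lies in the sampled range."""
    verdict = {}
    for mutation_type, value in observed.items():
        ordered = sorted(sample[mutation_type] for sample in samples)
        low = ordered[int(lower * (len(ordered) - 1))]
        high = ordered[int(upper * (len(ordered) - 1))]
        verdict[mutation_type] = low <= value <= high
    return verdict


# Toy reference spectrum with three mutation types only
toy_reference = {"C>T": 600, "C>A": 200, "T>C": 200}
toy_samples = downsample_proportions(toy_reference, n_mutations=222, seed=1)
compatible = within_sampled_range({"C>T": 0.6, "C>A": 0.2, "T>C": 0.2}, toy_samples)
```

Downsampling the larger spectrum to the size of the smaller one puts both on the same footing, so that differences in proportions are not simply the sampling noise expected from only 222 mutations.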
equences, 101 clade IIb A sequences and 10,307 lineage B.1 sequences. The majority of these sequences contain close to the complete genome (Extended Data Fig. 6 and Supplementary Tables 1 and 2). In addition, we have added 47 sequences from a GitHub directory of a recent paper that describes highly divergent viruses15. We identified the spatiotemporal and host species distributions of sequences within each dataset using location, collection data and host species metadata associated with the sequence accession on either GenBank or GISAID. Where these metadata were missing, we attempted to identify them within original publications.
To visualize the phylogenetic relationships between the clades, we reconstructed a tree containing all sequences from clades I, IIa and IIb A and the sublineage references for each of the sublineages within B.1 (accession numbers obtained from https://github.com/mpxv-lineages/lineage-designation/blob/master/auto-generated/lineages.md). We aligned these sequences using squirrel v0.1 with clade set to clade II and reconstructed a maximum likelihood phylogenetic tree as above. To examine phylogenetic relationships within each clade, we aligned sequences within the respective dataset using squirrel v0.1 with clade set to clade I for the clade I dataset and set to clade II for the remaining datasets. Final maximum likelihood phylogenetic trees were reconstructed for each dataset using IQ-TREE v2.1.3 as above. Topological robustness was assessed using 1,000 bootstrap replicates. Travel histories for clade IIb A were identified in GISAID metadata and from examination of original publications. Phylogenetic trees were visualized using FigTree v1.4.4 and ggtree v3.0.2 (ref. 49).
We aimed to reconstruct the temporal history of clade I. The recently identified outbreak in South Kivu contains evidence of APOBEC-3 mutagenesis15, which increases the substitution rate9 and may make the application of a single clock model unreliable. We therefore did not include the sequences from this outbreak in the temporal reconstruction. Methods to infer temporal history are only valid if there is a temporal signal within the dataset48. We assessed temporal signal using root-to-tip randomization, in which we compared the R² correlation between the sample collection date and root-to-tip distance with that in 1,000 randomizations of collection dates. This supported the presence of a temporal signal (P < 0.001). We therefore reconstructed the temporal history of clade I using BEAST v2.6.6 with the JC69 model of nucleotide substitution. We used a relaxed log-normal clock model with a log-normal prior on the substitution rate with mean 1.9 × 10⁻⁶ (chosen to match the estimated slope in TempEst) and standard deviation 0.5. Population history was modeled using a coalescent constant population prior. Four independent runs were carried out for 150,000,000 total MCMC steps. Convergence was assessed using Tracer v1.7 (refs. 49,50) and all effective sample size values were above 550. From each run, 10% burn-in was removed before the runs were combined and the maximum clade credibility tree was identified and annotated using TreeAnnotator.
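The root-to-tip randomization test described above can be illustrated with a minimal sketch. This is not the code used in the study: the function names, the use of plain least-squares R², and the add-one correction in the P value are assumptions made for illustration, and the inputs are taken to be decimal collection dates with matching root-to-tip distances extracted from a rooted tree.

```python
import random


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0  # no variation, so no measurable correlation
    return (sxy * sxy) / (sxx * syy)


def date_randomization_test(dates, distances, n_perm=1000, seed=0):
    """Compare the observed R² with R² values from shuffled collection dates.

    Returns (observed R², permutation P value). The P value uses the
    add-one correction (an assumption of this sketch), so it is never 0.
    """
    rng = random.Random(seed)
    observed = r_squared(dates, distances)
    shuffled = list(dates)
    at_least_as_high = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the link between date and tip
        if r_squared(shuffled, distances) >= observed:
            at_least_as_high += 1
    return observed, (at_least_as_high + 1) / (n_perm + 1)
```

A small P value indicates that the observed correlation between collection date and divergence is stronger than expected if dates were assigned to tips at random, which is the condition required before fitting a molecular clock.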
We calculated an SBS mutational spectrum for each of the major MPXV clades using the sequence alignments and maximum likelihood phylogenetic trees generated above and containing all high-quality sequences. Phylogenetic trees were outgroup rooted using the outgroups described above to enable the direction of each mutation (that is, the ancestral and mutated nucleotides) to be robustly identified. The outgroup was removed before spectrum calculation. We reconstructed mutational spectra using MutTui v2.0.2 (https://github.com/chrisruis/MutTui)29. We rescaled the resulting mutational spectra using MutTui v2.0.2 to account for the number of A, C, G and T nucleotides and the distribution of nucleotide triplets across the genome29. It has previously been suggested that a dataset requires at least 300–600 mutations for the mutational spectrum to be accurately estimated29. The clade I, clade IIb A and lineage B.1 datasets each contain more than 600 mutations. However, the clade IIa spectrum contains 222 mutations so we did not attempt to examine the detailed contextual patterns in these mutations. To estimate whether the clade IIa mutational spectrum could have been generated by the spectrum of clade I or clade IIb, we compared the mutation type proportions in the clade IIa spectrum with those in 1,000 random downsamplings of the clade I and lineage B.1 mutational spectra to 222 mutations.
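The downsampling comparison can be sketched as follows. This is an illustrative reconstruction rather than the MutTui implementation: the function names are hypothetical, sampling is assumed to be without replacement from the observed mutations, and the summary statistic (the fraction of downsampled replicates with a proportion at least as high as the observed one) is one of several reasonable choices.

```python
import random
from collections import Counter


def downsample_proportions(counts, n, n_rep=1000, seed=0):
    """Downsample a spectrum to n mutations, n_rep times.

    counts maps mutation type (for example "C>T") to its count in the
    larger spectrum. Returns one dict of type proportions per replicate.
    Sampling without replacement is an assumption of this sketch.
    """
    pool = [mut for mut, c in counts.items() for _ in range(c)]
    if n > len(pool):
        raise ValueError("cannot downsample to more mutations than observed")
    rng = random.Random(seed)
    replicates = []
    for _ in range(n_rep):
        sampled = Counter(rng.sample(pool, n))
        replicates.append({mut: sampled[mut] / n for mut in counts})
    return replicates


def fraction_at_least(replicates, mut_type, observed_proportion):
    """Fraction of replicates whose proportion of mut_type is >= observed."""
    hits = sum(rep[mut_type] >= observed_proportion for rep in replicates)
    return hits / len(replicates)
```

For example, calling `downsample_proportions(reference_counts, 222)` on the larger spectrum and then `fraction_at_least(replicates, "C>T", observed)` for each mutation type shows whether the proportion seen in the 222-mutation spectrum falls inside the range expected from sampling noise alone. Here `reference_counts` and `observed` are placeholders for values taken from the actual spectra.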
Any methods, additional references, Nature Portfolio reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at 10.1038/s41591-024-03370-3.
Supplementary Information: Supplementary table titles and Supplementary Table 2. Reporting Summary. Supplementary Table 1: GISAID sample accessions and acknowledgements.