Main Page

A Guide to SARS-CoV-2 and the COVID-19 Pandemic

Nexus COVID-19 Response Team
Cambridge, MA
June 1, 2023

Background Information

“An ounce of prevention is worth a pound of cure.”

―Benjamin Franklin, Statesman (1706-1790)


Declared a Public Health Emergency of International Concern (PHEIC) by the World Health Organization (WHO) on January 30, 2020 and a Pandemic (Phase 6) on March 11, 2020, the Coronavirus Disease of 2019 or COVID-19 is caused by SARS-CoV-2, formerly 2019-nCoV, an emergent and zoonotic RNA-virus and the second-known novel Severe Acute Respiratory Syndrome Coronavirus affecting the human species. At the time of writing, the scale, scope, and reach of the COVID-19 Pandemic have yet to be fully realized: the global counts of infections and deaths continue to increase, pushing healthcare systems and medical infrastructures worldwide to or near their breaking points, while the long-term societal and global economic consequences have yet to be determined. There is currently no universally accepted treatment for COVID-19 or widely available vaccine against SARS-CoV-2, which, owing to a general lack of natural immunity, can cause an overwhelming immune response and other systemic harm, even death, in some individuals (with currently unclear pathophysiology). The combination of human-to-human transmission through airborne aerosolized respiratory droplets, a relatively high rate of transmissibility, including by asymptomatic and pre-symptomatic but contagious individuals (due to a relatively lengthy incubation period) through currently unknown modes of communicability, a relatively high case fatality rate compared to the most common influenza viruses, and a significant potential for global endemicity has made this pandemic particularly swift, lethal, and of immediate global concern.

As of October 20, 2020, exactly 287 days since the reported identification (January 7, 2020) of the aforementioned novel coronavirus responsible for the viral pneumonia of unknown etiology in the outbreak in Wuhan, Hubei Province, China, the global infection count exceeds 40,634,575 reported COVID-19 cases as confirmed by laboratory testing from 744,948,090 reported samples across 216 countries and territories and 2 international conveyances (i.e., ocean liners). These consist of 9,651,107 (23.75%) active cases (9,578,333 mild/moderate cases and 72,774 serious/critical cases) and 30,983,468 (76.25%) closed cases (29,860,725 recoveries/discharges and 1,122,743 related deaths directly and indirectly attributed to COVID-19), yielding a global Testing Rate (TR) of 9.57%, Positivity Rate (PR) of 5.45%, Provisional Infection Rate (PIR) of 5.22‰ (permille), Provisional Mortality Rate (PMR) of 144 dpm (deaths per million), Provisional Case Fatality Rate[1] (PCFR) of 2.76%, and a 14-Day Adjusted PCFR of 3.16%. The global distribution of reported COVID-19 cases by country is illustrated in Figure 1.1a. The time series of cumulative reported COVID-19 cases of several leading countries and Europe since first reporting 10,000 COVID-19 cases is illustrated in Figure 1.1b.
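The ratio-type indicators quoted above can be reproduced from the raw counts. The following Python sketch assumes the conventional definitions (Positivity Rate = reported cases / reported samples; Provisional Case Fatality Rate = reported deaths / reported cases), which are inferred here rather than stated in this section. The population-based rates (TR, PIR, PMR) are omitted because the reference population is not given. The helper name `percent` is illustrative.

```python
# Global counts as reported above for October 20, 2020.
reported_cases = 40_634_575
reported_samples = 744_948_090
active_cases = 9_651_107
closed_cases = 30_983_468
reported_deaths = 1_122_743


def percent(numerator: int, denominator: int) -> float:
    """Return numerator/denominator as a percentage rounded to two decimals."""
    return round(100.0 * numerator / denominator, 2)


# Assumed definitions (inferred, not spelled out in the text):
positivity_rate = percent(reported_cases, reported_samples)    # PR
case_fatality_rate = percent(reported_deaths, reported_cases)  # PCFR
active_fraction = percent(active_cases, reported_cases)
closed_fraction = percent(closed_cases, reported_cases)

print(f"PR = {positivity_rate}%, PCFR = {case_fatality_rate}%")
print(f"active = {active_fraction}%, closed = {closed_fraction}%")
```

Under these assumed definitions the computed values agree with the PR, PCFR, and active/closed fractions quoted in the paragraph.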

Figure 1.1a: Global Distribution of Reported COVID-19 Cases by Country and Fraction (October 20, 2020, Adapted from

The United States of America (U.S.), constituting 50 states, four territories (namely, Puerto Rico, Guam, the U.S. Virgin Islands, and the Commonwealth of the Northern Mariana Islands), and the District of Columbia, is currently the global epicenter of the COVID-19 Pandemic in terms of total reported cases, daily reported case rate, total reported deaths, and daily reported death rate with 8,453,185 reported COVID-19 cases from 127,054,590 laboratory tests and 225,191 reported COVID-19-related deaths, which accounts for 20.80% (case fraction), 17.06% (test fraction), and 20.06% (death fraction) of the corresponding worldwide counts, respectively, yielding a country-wide TR, PR, PIR, PMR, and PCFR of 38.32%, 6.65%, 25.50‰, 679 dpm, and 2.66%, respectively.

Figure 1.1b Time Series of Cumulative Reported COVID-19

Figure 1.1c Time Series of Cumulative Reported COVID-19 Deaths for Several Leading Countries since first reporting 100 COVID-19 Deaths (October 20, 2020)

The State of New York leads the other 49 states and territories in terms of total U.S. COVID-19-related deaths, as the former domestic epicenter of the COVID-19 pandemic, with 521,215 reported cases from 12,982,175 tests and 33,497 related deaths, which is 6.17%, 10.22%, and 14.87% of the corresponding U.S. counts, respectively, with a state-wide TR, PR, PIR, PMR, and PCFR of 66.73%, 4.01%, 26.79‰, 1,722 dpm, and 6.43%, respectively. New York City (NYC), in particular, previously the epicenter within the global epicenter in terms of total reported cases, has 245,086 reported cases from 4,428,340 reported laboratory tests with 23,782 related deaths[2], which is 50.59% (cases fraction), 44.37% (test fraction), and 71.66% (death fraction) of the corresponding state-wide counts, respectively, yielding a city-wide TR, PR, PIR, PMR, and PCFR of 52.45%, 5.53%, 29.03‰, 2,821 dpm, and 9.70%, respectively. Detailed global, country, and U.S. data may be found in the sequel in Chronology, Data, and Observations.

Among the five boroughs comprising NYC, and indeed among all counties comprising the State of New York, Queens County leads in terms of cases (and is second only to Kings County in deaths) with 72,979 reported cases from 1,123,958 reported laboratory tests with 7,244 related deaths, which is 29.78% (case fraction), 25.38% (test fraction), and 30.46% (death fraction) of the corresponding city-wide counts, respectively, yielding a borough/county-wide TR, PR, PIR, PMR, and PCFR of 48.90%, 6.49%, 31.75‰, 3,152 dpm, and 9.93%, respectively. On the level of neighborhoods within the State of New York, Corona and North Corona (ZIP Code: 11368) lead all others in terms of both cases and deaths with 5,154 reported cases and 447 related deaths, representing 7.06% and 6.17% of the Queens County counts, respectively, yielding a ZIP code-wide PIR, PMR, and PCFR of 46.88‰, 4,066 dpm, and 8.67%, respectively.

Across the U.S., newly reported COVID-19 cases initially peaked on April 24, 2020 (39,130), then subsided to a low on June 7, 2020 (18,934). In early July 2020, however, due to premature lifting of Stay-at-Home orders and advisories as well as poor adoption of on-going safety measures, a widespread resurgence of COVID-19 outbreaks took place in almost all contiguous U.S. states, most notably Florida, California, and Texas, all of which have now overtaken New York in terms of total reported cases. Together, these four states comprise 3,035,890 reported cases (35.91% of U.S. cases), 43,849,970 tests (34.51% of U.S. tests), and 84,122 reported deaths (37.36% of U.S. deaths). In particular, the State of California now leads the other 49 states, districts, and territories in terms of total reported COVID-19 cases with 880,871 cases (10.42% case fraction) from 17,042,408 tests (13.41% test fraction) and 17,001 related deaths (7.55% death fraction). Los Angeles County currently leads all U.S. counties in terms of reported COVID-19 cases with 261,446 reported cases (33.07% of CA), 2,552,055 tests (18.67% of CA), and 6,366 related deaths (42.24% of CA). Similar resurgences of COVID-19 outbreaks have occurred in several other countries.

To add further uncertainty, however, all of the aforementioned counts may be significant underestimates that ignore or incorrectly discount many thousands, even millions, of tested, infected, and/or deceased individuals. These include pre-symptomatic, asymptomatic, and/or mildly symptomatic carriers who do not seek laboratory testing, individuals improperly documented due to errors inherent in the testing procedure (i.e., faulty reagents and false negatives), and individuals whose cause(s) of death were inaccurately documented. As of writing, the CDC estimates that the reported COVID-19 cases in the U.S. are too low by a factor of 6-24, and that the reported COVID-19-related deaths are approximately 50% too low. Other studies involving random testing and RNA detection in sewage samples estimate that reported infections are too low by a factor of 50-85, approaching two orders of magnitude. It should also be noted that the time from initial infection to onset of symptoms to laboratory testing to reporting of test results to local and federal officials is approximately 7-28 days, which may pose further difficulty for response efforts.

Despite questions of data accuracy, the rapid growth of cases in locales once hit cannot be disputed. To illustrate the rapid acceleration of the COVID-19 Pandemic for the U.S., in particular, we consider two important early dates, February 29, 2020 and March 13, 2020, as well as the week of March 20-26, 2020, all of which were pivotal for the four countries, namely, China, Italy, Spain, and the U.S., which accounted for 57.45-61.35% and 70.82-75.84% of the global share of COVID-19 reported cases and related deaths, respectively, during this time frame. On February 29, 2020, the U.S. had only 68 (0.08%) reported cases and just 1 (0.03%) death, the first reported[3] U.S. death due to COVID-19, compared to the significantly advanced situation in China, the first country to sustain an epidemic from the initial outbreak in Wuhan, Hubei Province, which had 79,824 (92.17%) reported cases and 2,870 (96.41%) related deaths, the majority share of the corresponding worldwide counts of 86,604 and 2,977, respectively. Two weeks later, on March 13, 2020, the U.S. declared a State of Emergency due to the rapidly increasing number of confirmed COVID-19 cases both domestically and globally, namely, 2,183 (1.50%) reported cases and 48 (0.88%) related deaths stateside among the worldwide counts of 145,417 reported cases and 5,427 related deaths, respectively, compared to the apparently stabilized situation in China, which had grown marginally to 80,824 (55.58%) reported cases and 3,189 (58.76%) related deaths.

Table 1.1: COVID-19 Reported Counts for China, Italy, Spain, and U.S. (March 20 / March 26, 2020)
Country | Reported Cases (RC) | RC / Global Total | Reported Deaths (RD) | RD / Global Total
China | 81,008 / 81,340 | 29.38% / 15.27% | 3,255 / 3,292 | 28.41% / 13.33%
Italy | 47,021 / 80,589 | 17.05% / 15.13% | 4,032 / 8,215 | 35.19% / 33.27%
Spain | 21,571 / 57,786 | 7.82% / 10.85% | 1,093 / 4,365 | 9.54% / 17.68%
U.S. | 19,551 / 86,379 | 7.09% / 16.21% | 309 / 1,614 | 2.70% / 6.54%
Subtotal | 169,151 / 306,094 | 61.35% / 57.45% | 8,689 / 17,486 | 75.84% / 70.82%
World | 275,733 / 532,807 | | 11,457 / 24,691 |

One week later, on March 20, 2020, the U.S. counts had grown by nearly an order of magnitude to 19,551 (7.09%) reported cases and 309 (2.70%) related deaths, consistent with exponential growth, among the 275,733 reported cases and 11,457 related deaths worldwide, which now involved every continent except Antarctica. At the time, the U.S. was fourth behind China (81,008, 29.38%), Italy (47,021, 17.05%), and Spain (21,571, 7.82%) in terms of reported cases. By the end of the week, however, the U.S. surpassed all three countries and took the global lead in reported cases and later deaths, quadrupling its counts and becoming the new epicenter of the global pandemic. This rapid development was due in part to a combination of delayed governmental and statewide responses, several potentially large seeding events (e.g., Mardi Gras parades, St. Patrick’s Day festivities, the Biogen Conference, etc.), outbreaks in locations with dense populations (e.g., long term care facilities, correctional institutions, and food packing factories, etc.), and the increasing availability of laboratory testing. During the same time frame, however, Italy and Spain had approximately doubled or trebled their reported cases from 47,021 to 80,589 and 21,571 to 57,786, respectively, causing a veritable crisis for their healthcare systems, while China remained anchored with a negligible increase from 81,008 to 81,340 reported cases, having added just 332 during the week in question. A comparison of the simultaneous daily growth of the cumulative COVID-19 reported cases for China, Italy, Spain, and the U.S. is shown in Figure 1.2. Corresponding counts of cumulative reported cases (RC), reported deaths (RD), and their global fractions for these four countries on the extremal dates March 20, 2020 and March 26, 2020 are given in Table 1.1.
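The phrase "consistent with exponential growth" can be made concrete with a doubling-time estimate. The sketch below assumes pure exponential growth between two reporting dates, so that the doubling time is T = Δt · ln 2 / ln(N₂/N₁); this is a simplification, since reported counts also reflect testing availability and reporting delays. The function name `doubling_time` is illustrative.

```python
import math


def doubling_time(count_start: int, count_end: int, days: float) -> float:
    """Doubling time in days, assuming pure exponential growth between two counts."""
    growth_factor = count_end / count_start
    return days * math.log(2) / math.log(growth_factor)


# U.S. cumulative reported cases quoted in the text and in Table 1.1.
us_mar13, us_mar20, us_mar26 = 2_183, 19_551, 86_379

print(round(us_mar20 / us_mar13, 2))                    # growth factor, March 13 -> March 20
print(round(doubling_time(us_mar13, us_mar20, 7), 2))   # doubling time in days
print(round(us_mar26 / us_mar20, 2))                    # growth factor, March 20 -> March 26
print(round(doubling_time(us_mar20, us_mar26, 6), 2))   # doubling time in days
```

With these inputs the estimated doubling time is a little over two days for March 13-20 and just under three days for March 20-26, which matches the roughly nine-fold and four-fold increases described above.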

Figure 1.2a: Cumulative Reported COVID-19 Cases of Four Leading Countries (March 20-26, 2020)

In an attempt to slow the progress of the escalating situation, as early as January 21, 2020, the WHO recommended a worldwide strategic response through ten key points including 1.) implementing measures to interrupt human-to-human transmission to reduce secondary infections, 2.) preventing amplification events such as social gatherings, 3.) reducing global spread through seeding events such as international travel, 4.) isolating and optimally caring for the infected, 5.) identifying and reducing transmission from the animal source, 6.) addressing crucial scientific and medical unknowns including clinical severity of COVID-19, 7.) accelerating development of diagnostics, therapeutics, and potential vaccines against SARS-CoV-2, 8.) communicating risk factors and event data, 9.) countering the spread of misinformation and disinformation, and 10.) mitigating societal and economic impacts. Unfortunately, while such directives are prudent and necessary, they require global cooperation, and they have not yet been fully adopted by all affected nations including, in particular, the U.S.

Table 1.2: Stage Distribution of Current Global COVID-19
Phase I
Phase II
Phase III

Figure 1.2 Global Clinical Trials

Despite this grave predicament, there remains hope. Due to the rapid mobilization of first responders, acceleration of collaborative scientific research, growing availability of diagnostic tests and experimental medications, including commencement of several clinical trials, fast-tracked governmental grant funding, and expedited Food and Drug Administration (FDA) approvals, several treatments for COVID-19 appear to be on the horizon. By October 19, 2020, there were 46 candidate vaccines in testing through various human trials (Table 1.2), with an additional 91 preclinical candidate vaccines in animal studies, and 17 drug treatments demonstrating some clinical promise. In support of these efforts, our pandemic response team of over 100 researchers, medical professionals, and content experts has mobilized across disciplines, international borders, and institutional affiliations in a large-scale and collaborative effort to develop a rapidly evolving work of scientific facts, data analysis, medical observations, and professional recommendations concerning SARS-CoV-2, the COVID-19 infection, and the resulting pandemic to aid the community at large with the understanding and development of potential therapeutics and vaccines. It is our shared belief that such a large and detailed resource is not only timely but also necessary as a valuable contribution to the existing literature during this on-going calamity.

Molecular Characteristics

Characteristics of Coronaviruses

RNA-viruses (or riboviruses) are viruses that use single or double stranded ribonucleic acid (RNA), but not deoxyribonucleic acid (DNA), as their constituent genomes. Such viruses are abundant in nature, vary markedly in virulence (the ability to cause harm to the host), and include the agents of the common cold, influenza, rabies, polio, measles, hepatitis C/E, West Nile fever, and Ebola virus disease. RNA-viruses are distinguished by the polarity of their strands and include negative-sense, positive-sense, and ambi-sense genomes. Only positive-sense RNA can be immediately translated by the host cell, whereas the negative-sense and ambi-sense polarities must first be converted to positive-sense by an RNA-dependent RNA polymerase (RdRp), that is, by transcription. Coronaviruses are positive-sense, single-stranded RNA-viruses.
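As a toy illustration of strand polarity, the sketch below derives the positive-sense strand that is complementary to a negative-sense RNA fragment, which is conceptually what an RdRp synthesizes. The function name and the example sequence are hypothetical, and real viral transcription involves far more than base pairing.

```python
# Watson-Crick base pairing for RNA.
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def complementary_strand(rna_5_to_3: str) -> str:
    """Return the complementary RNA strand, also written 5' -> 3'."""
    # Complement each base, reading the input backwards because the
    # two strands of a duplex are antiparallel.
    return "".join(RNA_COMPLEMENT[base] for base in reversed(rna_5_to_3.upper()))


negative_sense = "UUACCCGGCCAU"   # hypothetical negative-sense fragment
positive_sense = complementary_strand(negative_sense)
print(positive_sense)             # AUGGCCGGGUAA
```

The derived positive-sense strand begins with a start codon (AUG) and can be read directly by a ribosome, whereas the negative-sense input cannot. Applying the function twice returns the original strand.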

RNA-viruses are highly infective, often virulent, and possess very high mutation rates relative to other viruses, up to a million times that of their hosts, which may present difficulty in discovering effective treatments such as specific antiviral drugs and vaccines. In contrast, however, infections by retroviruses like the Human Immunodeficiency Viruses, HIV-1 and HIV-2, that cause Acquired Immune Deficiency Syndrome (AIDS), and by several adenoviruses that cause gastroenteritis, conjunctivitis, and cystitis, are effectively treatable, despite the fact that some of these viruses evade the human immune system.

Figure 1.3a Digital Illustration of a SARS-CoV-2 Virion (CDC PHIL)

Coronaviruses comprise the Orthocoronavirinae subfamily in the Coronaviridae family within the Nidovirales order of the Riboviria realm, which consists of all RNA viruses and viroids that replicate by RNA-dependent RNA polymerases. These viruses are coronal, meaning they are enveloped in spiked, crown-like outer proteins, which allows them to remain intact in mucosal droplets in air and on surfaces for several days. They are some of the largest RNA-viruses, with genomes of 26.4 to 31.7 kilobases (kb) in length. Four genera distinguish the known coronaviruses that infect humans (H), non-human mammals (M), and birds (B), namely, Alpha-CoV (H/M), Beta-CoV (H/M), Gamma-CoV (M/B), and Delta-CoV (B), and include the seven known human coronaviruses: HCoV-229E, HCoV-NL63, HCoV-OC43, HCoV-HKU1, Middle East Respiratory Syndrome Coronavirus or MERS-CoV, SARS-CoV-1, and now SARS-CoV-2 (Figure 1.3a, Table 1.3). In particular, the coronaviruses HCoV-229E and HCoV-OC43 are responsible for 10-15% of infections of the common cold. The remainder are caused by rhinoviruses (10-40%), influenza viruses (10-15%), parainfluenza viruses (20%), adenoviruses (5%), respiratory syncytial virus (RSV), ortho- and metapneumovirus, certain enteroviruses, and unknown sources, more than 200 types in total. Of the most lethal human coronaviruses causing severe respiratory illnesses, SARS-CoV-1 and MERS-CoV are zoonotic Betacoronaviruses.

Other coronaviruses are known to infect only non-human mammals, in particular bats and pigs, including Swine Acute Diarrhea Syndrome Coronavirus or SADS-CoV, which caused the death of approximately 24,000 piglets in Guangdong Province, China. SADS-CoV is thought to have been transmitted through the fecal matter of several species of horseshoe bats including Rhinolophus sinicus, Rhinolophus pusillus, Rhinolophus rex, and Rhinolophus affinis. However, despite the genetic proximity of pigs to humans, it is believed that SADS-CoV cannot infect humans.

Table 1.3: Human Orthocoronavirinae
Common Cold
Bats, Camels
Respiratory Illness
Bats, Civets
Severe Respiratory Illness
Bats, Camels
N-acetyl-9-O-acetylneuraminic acid
Common Cold
Mice, Cattle
N-acetyl-9-O-acetylneuraminic acid
Respiratory Illness
Mouse Hepatic Virus
Bats, Civets
Bats, Pangolins

Coronaviridae is unique among the families of enveloped viruses in causing gastroenteritis; all other families of viruses that cause gastroenteritis are non-enveloped. SARS-CoV-2 is similar to SARS-CoV-1, the coronavirus responsible for the SARS Pandemic of 2002-2004 or SARS-02, in structure, in modality, and possibly in origin. Electron micrographs of samples taken from infected patients in Wuhan, China reveal that the SARS-CoV-2 virion is spherical and approximately 60-140 nm in diameter (Zhu, N. et al., 2020), with an envelope roughly 82-94 nm in diameter, which is roughly ¼-⅓ of the shortest wavelength of visible light, so the virion is effectively invisible to conventional light microscopy. The spiked crown-like outer proteins are 9-20 nm in length. The genome of the SARS-CoV-2 virion is 29.8-29.9 kb (Zhou et al., 2020). In general, coronavirus genomes consist of 7 genes organized in non-structural, structural, and non-essential accessory protein coding regions. The non-structural protein coding region comprises the replicase gene (~⅔ of the genome), whereas the structural and non-essential accessory protein coding regions comprise the other 6 genes. After translation and post-translational modifications of proteins encoded in the replicase gene, 16 non-structural proteins form the Double-Membrane Vesicles (DMV), which form part of the Replicase-Transcriptase Complex (RTC) (Fehr and Perlman, 2015). The RTC contains the viral RdRp, which is an attractive drug target for some antivirals. Four structural proteins are recognized and illustrated in Figure 1.3b: S (spike, red), M (membrane, orange), E (envelope, yellow), and N (nucleocapsid protein or nucleoprotein, indigo). The S, M, and E proteins comprise the viral envelope, which surrounds the nucleocapsid that houses the viral RNA (violet). The M protein is a transmembrane protein that connects the viral membrane to the nucleocapsid, and its C-terminal domain makes contact with the N protein.

Figure 1.3b: Cross-Sectional Illustration of SARS-CoV-2 (Adapted from Encyclopedia Britannica)

Characteristics of SARS-CoV-2 Genome

The SARS-CoV-2 genome is 79.6% identical to the SARS-CoV-1 genome and 96.2% identical to the genome of the Bat coronavirus BatCoV RaTG13 found in the species Rhinolophus affinis in Yunnan Province in China (Zhou et al., 2020). Bats are purported to serve as a reservoir for many different types of coronaviruses, but an intermediate species is often needed for transmission to humans, such as civets with SARS-CoV-1 and camels with MERS-CoV. While the pangolin coronavirus (Pangolin-CoV) identified in Malayan pangolins shares a lower percentage of identical genomic sequence (91.2%) with the SARS-CoV-2 genome (Zhang et al., 2020), its receptor binding domain shares 99% of its sequence with that of SARS-CoV-2 (Xiao, T., et al., 2020). In particular, the receptor binding site of the spike protein of Pangolin-CoV is identical to the analogous site in SARS-CoV-2, with the exception of only one amino acid residue (ibid.). Furthermore, the receptor to which the virus binds in pangolin species is closer in structure to human ACE2 receptors, the entry point for SARS-CoV-2 in humans. Taken together, this evidence strongly suggests that bat and pangolin coronaviruses are ancestral to SARS-CoV-2. Genomic sequences that code for the spike protein present in pangolin coronaviruses may have been instrumental in enabling the mutated or recombined version of the virus to infect human cells.

The SARS-CoV-2 genome consists of a total of 11 expressed genes, listed here as 12 Open Reading Frames (ORFs) because ORF1a and ORF1ab overlap, namely, ORF1a, ORF1ab, ORF2, ORF3a, ORF4, ORF5, ORF6, ORF7a, ORF7b, ORF8, ORF9, and ORF10. The four main structural proteins, namely, the spike protein, the membrane protein, the nucleocapsid protein, and the envelope protein, are encoded by the ORF2, ORF5, ORF9, and ORF4 genes, respectively. A table of these genes, the proteins that they encode, the number of amino acids in each protein, and the gene locations as found in the SARS-CoV-2 Wuhan-Hu-1 isolate (NCBI reference sequence: NC_045512.2) is given in Table 1.4.

The SARS-CoV-2 genome is notable because it contains one of the lowest proportions of CpG sites of any Betacoronavirus genome yet identified (Xia, X. et al., 2020). CpG sites serve as binding sites for Zinc finger antiviral protein (ZAP), a protein that activates other proteins to degrade viral RNA. Many RNA viruses have adapted over time by reducing the percentage of CpG sites they contain, thereby increasing their overall virulence. However, because ZAP expression is highly tissue specific, only viruses that target particular tissue layers where ZAP is more robustly expressed will show CpG deficiencies. Of 56 Betacoronavirus genomes tested from Rhinolophus bats, only BatCoV RaTG13, which shares 96% homology with SARS-CoV-2, demonstrated an extreme CpG deficiency (Xia et al., 2020). The authors suggest that SARS-CoV-2 and BatCoV RaTG13 evolved in mammalian tissues with high ZAP expression. The species in which the viruses evolved was not likely a bat, since other bat Betacoronaviruses do not show this same deficiency. In fact, of the 927 Betacoronavirus genomes the authors searched, no other genomes demonstrated CpG percentages as low as those found in SARS-CoV-2 and BatCoV RaTG13. However, the authors identified some highly infectious canine Alphacoronaviruses, which infect the intestinal and respiratory tracts, that share a similar level of CpG deficiency.
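CpG deficiency is commonly quantified as an observed-to-expected ratio. The sketch below assumes the index I_CpG = f_CG / (f_C · f_G), where f_CG is the frequency of the CG dinucleotide and f_C, f_G are the single-nucleotide frequencies; whether this matches the exact index used by the cited authors should be checked against their paper. The function name and the toy sequences are hypothetical.

```python
def cpg_index(sequence: str) -> float:
    """Observed/expected CpG ratio: f_CG / (f_C * f_G).

    Values well below 1 indicate that CG dinucleotides are rarer than
    expected from the C and G content alone (CpG deficiency).
    """
    seq = sequence.upper().replace("U", "T")  # accept RNA or DNA notation
    n = len(seq)
    if n < 2:
        raise ValueError("sequence must contain at least two nucleotides")
    freq_c = seq.count("C") / n
    freq_g = seq.count("G") / n
    if freq_c == 0 or freq_g == 0:
        return 0.0  # no CG dinucleotide is possible
    # "CG" cannot overlap itself, so str.count finds every occurrence.
    freq_cg = seq.count("CG") / (n - 1)
    return freq_cg / (freq_c * freq_g)


print(cpg_index("ACGTACGT"))   # toy sequence rich in CpG
print(cpg_index("ACTGACTG"))   # toy sequence with the same composition but no CpG
```

The two toy sequences have identical base composition, so the difference in the index comes only from how often C is immediately followed by G.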

One of the major distinguishing features between the SARS-CoV-1 and SARS-CoV-2 genomes is their ORF3b genes (open reading frame 3b). In particular, the SARS-CoV-2 ORF3b gene contains a premature stop codon, which results in a truncated protein product when compared to its SARS-CoV-1 counterpart (22 amino acids in length versus 153 residues in SARS-CoV-1). Kopecky-Bromberg et al. (2007) report that the ORF3b protein of SARS-CoV-1 acts as a potent type 1 interferon (IFN-I) antagonist, and so it is natural to ask how the truncated version of the ORF3b protein of SARS-CoV-2 affects type 1 interferon inhibition. Konno et al. (2020) report that the truncated SARS-CoV-2 ORF3b protein is more effective at suppressing the activation of type 1 interferon than the analogous SARS-CoV-1 protein. It is important to note that enhancement of type 1 interferon suppression is associated with a more severe COVID-19 clinical course. The authors also find that SARS-CoV-2 related viruses from bats and pangolins encode a similarly shortened version of the protein with enhanced type 1 interferon inhibition.

Characteristics of SARS-CoV-2 Proteins

The SARS-CoV-2 proteome is made up of several different types of proteins. The gene ORF1a encodes a large replicase polyprotein known as pp1a. During the translation of the ORF1a gene, a ribosomal frameshift to the adjacent ORF1b can occur, which will produce the larger of the two replicase polyproteins, pp1ab. Together, pp1a and pp1ab can undergo proteolysis to form 16 different non-structural proteins. Many of these non-structural proteins can assemble to form important replication machinery such as helicase and the RNA-replicase-transcriptase complex. The SARS-CoV-2 proteome also contains the four structural proteins characteristic of all coronaviruses: the Spike protein (S), the Envelope protein (E), the Membrane protein (M), and the Nucleocapsid protein (N) (see Characteristics of Coronaviruses). These four proteins are encoded by the ORF2, ORF4, ORF5, and ORF9 genes, respectively. Finally, there are also six known accessory proteins, namely, the ORF3a, ORF6, ORF7a, ORF7b, ORF8, and ORF10 proteins listed in Table 1.4. Table 1.4 lists the 11 expressed genes of the SARS-CoV-2 genome as found in the SARS-CoV-2 Wuhan-Hu-1 isolate, as well as their locations, the proteins they encode, and the size of these proteins. Table 1.5 lists the 16 non-structural proteins that are proteolytic products of the ORF1a and ORF1ab polyproteins. Figure 1.4 illustrates the portions of the SARS-CoV-2 genome that encode each of these proteins.
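The effect of a ribosomal frameshift can be illustrated with a toy reading-frame example. The sketch below assumes a -1 shift (the ribosome steps back one nucleotide), which is the direction usually described for coronaviruses but is not specified in the text above. The sequence, the slip position, and the function name are hypothetical and chosen only so that the original frame hits a stop codon that the shifted frame reads through.

```python
from typing import List, Optional

STOP_CODONS = {"UAA", "UAG", "UGA"}


def read_codons(rna: str, slip_at: Optional[int] = None) -> List[str]:
    """Read codons 5' -> 3' until a stop codon or the end of the sequence.

    If slip_at is given, the reader steps back one nucleotide (a -1 shift)
    the first time it reaches that position, then continues in the new frame.
    """
    codons = []
    position = 0
    slipped = False
    while position + 3 <= len(rna):
        if slip_at is not None and not slipped and position == slip_at:
            position -= 1          # -1 frameshift
            slipped = True
        codon = rna[position:position + 3]
        codons.append(codon)
        if codon in STOP_CODONS:
            break
        position += 3
    return codons


toy_rna = "AUGGCUUUAAACGGGUAAGCUGA"      # hypothetical sequence, not viral
print(read_codons(toy_rna))              # original frame: stops at UAA
print(read_codons(toy_rna, slip_at=9))   # shifted frame: reads past that stop
```

In the unshifted frame the toy message ends after six codons, by analogy with the shorter pp1a product; after the shift the same nucleotides yield a longer read, by analogy with pp1ab.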

Figure 1.4 SARS-CoV-2 Genome and Protein Map (Adapted from Gordon et al., 2020)

Table 1.4: Expressed Genes and Protein Products of SARS-CoV-2 Wuhan-Hu-1 Isolate

Nucleotide Location (from 5’ UTR)
Protein Expressed
Molecular Weight (Daltons)
Number of Amino Acids
ORF1ab Polyprotein (pp1ab)
ORF1a Polyprotein (pp1a)
Spike Protein (S)
ORF3a Protein
Envelope Protein (E)
Membrane Protein (M)
ORF6 Protein
ORF7a Protein
ORF7b Protein
ORF8 Protein
Nucleocapsid Protein (N)
ORF10 Protein

Table 1.5: Non-Structural Proteins (NSPs) of SARS-CoV-2 Wuhan-Hu-1 Isolate (Adapted from Yoshimoto et al., 2020)
Molecular Weight (Daltons)
Number of Amino Acids
Possible Function
NSP1 / leader protein
Ribosomal protein leader
Binds to host PHB1 and PHB2 proteins
NSP3 / Papain like protease
Release NSPs 1,2,3
Membrane rearrangement
NSP5 / 3C-like proteinase
Cleaves at 11 sites of NSP polyprotein
Generates autophagosomes
Dimerizes with NSP8
Stimulates NSP12
May bind to helicase (NSP14)
May stimulate NSP16
NSP12 / RNA Dependent RNA Polymerase
Copies viral RNA; guanine methylation
NSP13 / helicase
Unwinds duplex RNA
NSP14 / 3’-5’ exonuclease
5’cap RNA
NSP15 / EndoRNAse
Degrade RNA to evade host defense
NSP16 / 2’-O-ribose methyltransferase
5’-cap RNA; adenine methylation

SARS-CoV-2 Spike (S) Protein

The SARS-CoV-2 spike protein shares 76% amino acid sequence homology with the SARS-CoV-1 spike protein (Grifoni et al., 2020), demonstrating less amino acid conservation than that found in other portions of the SARS-CoV-2 proteome. All coronaviruses by definition encode the spike protein, which is responsible for recognizing and binding to the cell surface receptor of the host. The SARS-CoV-2 spike protein binds to the ACE2 (angiotensin-converting enzyme 2) receptor, which is expressed in the human heart, kidney, esophagus, bladder, and lungs (Zou et al., 2020). Within the lungs, ACE2 is primarily expressed in the type 1 and type 2 alveolar epithelial cells, which are more common in the lower respiratory tract, and some reports suggest ACE2 is also expressed in ciliated bronchial epithelial cells. The SARS-CoV-2 spike protein binds with at least ten times greater affinity to the ACE2 receptor than the SARS-CoV-1 spike protein (Wrapp et al., 2020). The spike protein may also bind with high affinity to the CD147 receptor (with a dissociation constant of 185 nM), also known as basigin, which is an immunoglobulin that determines the antigen expressed in the Ok blood group system (Wang, K. et al., 2020). Using this receptor to mediate viral invasion, SARS-CoV-2 was able to successfully infect Vero E6 cells in vitro. The use of Meplazumab, an anti-CD147 antibody, significantly inhibited viral infection of the same cell line.
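To give the quoted dissociation constant some intuition, the sketch below evaluates the standard single-site binding relation, fraction bound = [L] / ([L] + K_d). It assumes simple 1:1 equilibrium binding, which the cited measurements may not strictly follow, and the ligand concentrations are illustrative rather than physiological. The function name is hypothetical.

```python
def fraction_bound(ligand_nm: float, kd_nm: float) -> float:
    """Fraction of receptor occupied at equilibrium for simple 1:1 binding."""
    if ligand_nm < 0 or kd_nm <= 0:
        raise ValueError("ligand concentration must be >= 0 and K_d must be > 0")
    return ligand_nm / (ligand_nm + kd_nm)


KD_CD147_NM = 185.0  # dissociation constant quoted above, in nM

# Illustrative ligand concentrations (nM): one tenth of K_d, K_d, ten times K_d.
for concentration in (18.5, 185.0, 1850.0):
    print(concentration, round(fraction_bound(concentration, KD_CD147_NM), 3))
```

When the ligand concentration equals K_d, half of the receptor is occupied; a smaller K_d therefore corresponds to tighter binding, which is the sense in which a ten-fold difference in affinity is meaningful.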

Figure 1.5a: SARS-CoV-2 Spike (S) Protein (Adapted from Walls et al., 2020)

Identifying the structure of a spike protein can provide key insights for drug and vaccine design. Cryo-electron microscopy has revealed the structure of the SARS-CoV-2 spike protein (Figure 1.5a) and its chemical binding affinities (Walls et al., 2020). The spike protein is composed of three heavily glycosylated protomers that have identical amino acid sequences but different three-dimensional conformations. Each protomer contains an N-terminal S1 domain, where many of the beta sheets of the protein are located, and a C-terminal S2 domain (ibid.). The S1 domain contains the receptor binding domain (RBD) of the spike protein, while the S2 domain contains a fusion peptide that allows the membrane of the virion to fuse with the host cell membrane. For the fusion peptide to be exposed, the spike protein must be cleaved by cell proteases, specifically the serine protease TMPRSS2. Each protomer contains a receptor binding domain, two of which are in a lower energy and more stable “down” conformation and one of which is in an active “up” conformation. The receptor binding domain in the “up” conformation is less energetically stable, which primes it for binding to the ACE2 protein. Hydrogen bonding between key amino acids of ACE2 and the RBD creates strong attractive intermolecular forces between the molecules. Once a virion fuses to the host cell membrane, the virion can release its RNA into the intracellular space, thereby infecting the cell. Inhibitors of TMPRSS2 (such as Camostat Mesylate) provide possible future treatment options, and such protease inhibitors have shown promising results in cell culture and animal models (Hoffmann et al., 2020).

Coutard et al. (2020) identify a feature of SARS-CoV-2 that may distinguish it from other betacoronaviruses with which it shares extensive homology: the presence of a unique furin-like cleavage site in the spike protein located at the S1/S2 boundary. Furin is an enzyme, expressed abundantly in the lungs, that catalyzes the proteolysis of a precursor protein at a particular cleavage site, activating the protein functionally. In the SARS-CoV-2 spike protein, a furin-like protease cleaves the spike protein at the S1/S2 site, thereby enhancing the protein’s ability to bind to the cell membrane fusion site for intracellular entry. This site is notably absent in the SARS-CoV-1 spike protein (as well as the RaTG13 spike protein), and the feature is speculated to play a role in the enhanced infectivity of SARS-CoV-2. Figure 1.5b illustrates a single protein subunit of the homotrimer of the SARS-CoV-2 spike protein, highlighting the C-terminus (violet), N-terminus (blue), central helix (orange), and the ACE2 binding domain (magenta).

SARS-CoV-2 Nucleocapsid (N) Protein

In contrast to the spike protein, the nucleocapsid (N) protein is highly conserved across all human beta-coronaviruses. In an effort to identify T and B cell epitopes of SARS-CoV-2, Grifoni et al. (2020) looked at areas of high amino acid sequence homology between sequences identified as part of the SARS-CoV-1 epitope and corresponding sequences from the SARS-CoV-2 proteome. Of the 10 nucleocapsid amino acid sequences compared, 8 had sequence homology over 85%, making the N protein the most conserved epitope identified, followed by the membrane protein epitope. When comparing genomic sequences using the NCBI databank, Kang et al. (2020) determined that the SARS-CoV-2 N protein encoding region shared 89.74% homology with the corresponding region in the SARS-CoV-1 genome.
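The homology percentages quoted above are, in essence, counts of matching positions between aligned sequences. The following is a minimal sketch of that calculation, assuming the two sequences have already been aligned to equal length; the function name `percent_identity` and the toy fragments are hypothetical and are not taken from Grifoni et al. or Kang et al.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of aligned positions that hold the same residue.

    Both inputs must already be aligned to equal length; '-' marks a gap.
    Positions where both sequences have a gap are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" and b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


# Hypothetical aligned fragments (placeholders, not real viral sequences).
toy_a = "ACDEFGHIKL"
toy_b = "ACDEFGHIKV"
identity = percent_identity(toy_a, toy_b)
print(f"{identity:.1f}% identity")
```

Published figures depend on the alignment method and on how gaps are scored, so a simple count like this one would not necessarily reproduce them.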

The N protein serves several important functions in the life cycle of the SARS-CoV-2 virion. In addition to housing the viral RNA, it is known to play various pivotal roles during viral self-assembly (Chang et al., 2014). There is also evidence to suggest it may have the ability to modify the host cell metabolism and may also modify host-pathogen interaction, enabling the virus to evade host cell recognition. Perhaps most importantly, the N protein binds to the viral RNA at multiple sites, packaging the RNA into a helical nucleocapsid structure known as the ribonucleoprotein complex.

Figure 1.5b: ACE2 Binding Domain and Homotrimer with highlighted protein subunit of SARS-CoV-2 Spike (S) Protein. C-terminus (violet), N-terminus (blue), central helix (orange), and the ACE2 binding domain (magenta) (Adapted from Wikipedia)

Understanding the structural features of the SARS-CoV-2 N protein may provide insight into the design of therapeutics, as well as vaccines that can target known regions of the protein’s epitope. Nucleocapsid proteins of coronaviruses commonly include three conserved regions: an N-terminal RNA-binding domain, a C-terminal dimerization domain whose primary purpose is oligomerization, and a central Serine-Arginine rich linker used for phosphorylation (Kang et al., 2020). Kang et al. (2020) determined the crystal structure of the N-terminal domain of the N protein using X-ray crystallography methods. They found that the N-terminal domain crystals pack in an orthorhombic form, with each crystal unit made up of four monomers. The monomers contain two loop regions adjacent to a β-sheet core that is composed of five antiparallel β-strands. The N-terminal domain contains regions enriched with basic and aromatic residues, forming what the authors characterize as the shape of a right hand, complete with an acidic wrist, basic palms, and basic fingers. Figure 1.6 illustrates these three areas on the N-terminal domain of the N protein. The figure also reveals the electrostatic surface potential of the protein: blue designates a positive charge potential and red a negative charge potential. The authors go on to compare the structure of the N-terminal domain to those found in SARS-CoV-1, MERS-CoV, and HCoV-OC43, finding that the basic palm region contains the highest frequency of conserved residues. While many regions are well conserved, the surface charge distributions on the respective N protein N-terminal domains differ dramatically, largely due to the relative positioning of the beta sheets located in the core. In particular, a protruding hairpin between two β-strands is less extended in SARS-CoV-2, creating a loosened N-terminal tail.

Figure 1.6: N-Terminal Domain of the SARS-CoV-2 Nucleocapsid Protein (Adapted from Kang et al., 2020)

SARS-CoV-2 Envelope (E) and Membrane (M) Proteins

The envelope (E) protein of SARS-CoV-2 is a highly conserved, small protein made up of 75 amino acids, one fewer than the number found in the SARS-CoV-1 envelope protein. Overall, the two primary amino acid sequences share 94.7% homology. The envelope protein is found in all coronaviruses and has been shown to have the potential to oligomerize and form ion channels in the lipid membrane surrounding the nucleocapsid. It may also play roles in several stages of the viral replication cycle, specifically in viral assembly and virion release (Yoshimoto, 2020). The SARS-CoV-2 membrane (M) protein is made up of 222 amino acids, one more than the number found in the corresponding SARS-CoV-1 protein. Overall, the SARS-CoV-1 and SARS-CoV-2 M proteins share 90.5% sequence homology. Like the envelope protein, the membrane protein is an integral membrane protein that may play a role in viral assembly (Yoshimoto, 2020). Tsoi et al. (2014) have previously shown that the SARS-CoV-1 M protein can induce host cell apoptosis, so the SARS-CoV-2 M protein may have a similar action. The M protein is also significant because it interacts with the nucleocapsid protein, together forming a capsule around the viral RNA.

SARS-CoV-2 Non-Structural Proteins

There is some striking structural similarity between the SARS-CoV-1 and SARS-CoV-2 proteomes, particularly in amino acid sequences encoded from highly conserved Open Reading Frames (ORFs) found in their respective genomes. Of particular note are amino acid sequences from seven conserved replicase domains from ORF1ab polyprotein (pp1ab), which have been shown to be 94.4% identical (Zhou, P. et al., 2020).

The 16 non-structural proteins are all proteolytic products of either the pp1a or pp1ab polyproteins. They are believed to serve a variety of functions, which are briefly outlined in Table 1.5. Some of the possible functions can be inferred from the known activity of the corresponding protein found in SARS-CoV-1, particularly when there is substantial homology between the sequences. For example, the NSP2 protein of SARS-CoV-1, which shares 68.3% of the same amino acid sequence as the SARS-CoV-2 NSP2 protein, is known to bind to two host proteins: prohibitin 1 (PHB1) and prohibitin 2 (PHB2) (Cornillez-Ty et al., 2009). Since PHB1 and PHB2 are involved in host mitochondrial biogenesis and intracellular signalling, it is believed that NSP2 may dysregulate host cell function. For this reason, it is speculated that the corresponding protein in SARS-CoV-2 shares a similar function.

The SARS-CoV-2 NSP3 protein, or Papain-like protease, is approximately 217 kDa, making it the largest protein encoded by SARS-CoV-2, and shares 76.0% amino acid sequence identity with the NSP3 protein of SARS-CoV-1. The protein contains many highly conserved regions, which include the ssRNA binding domain, the ADPr binding domain, the G-quadruplex binding domain, the protease domain, the NSP4 binding domain, and a transmembrane domain (Yoshimoto, 2020). The protease domain of the Papain-like proteases of other coronaviruses is known to release NSP1, NSP2, and NSP3 from the N-terminal regions of the two precursor polyproteins, making the inhibition of NSP3’s protease activity a target for future antiviral therapies.

ACE2 Receptor

The Angiotensin-converting Enzyme 2 (ACE2) receptor has been identified as the cellular entry point for SARS-CoV-2 (Zhou et al., 2020), as well as for other human coronaviruses such as SARS-CoV-1 and HCoV-NL63. After cleavage by a serine protease, the SARS-CoV-2 spike protein can bind with high affinity to ACE2 and enter the cell. The receptor binding domain (RBD), a short sequence of amino acids located in the S1 subunit of the spike protein, is the region of the protein that binds to the ACE2 receptor through strong attractive intermolecular forces (see Figure 1.7). After binding to the receptor, the spike protein purportedly can take ACE2 with it into the cell, as it does with SARS-CoV-1 and the SARS-02 infection (Wang, H. et al., 2008), where ACE2 can be degraded by lysosomes. Note that SARS-CoV-1 is thought to cause severe lung failure by binding to ACE2 and causing its downregulation (Kuba et al., 2005).

ACE2 is expressed abundantly in the cell membranes of type I and type II pneumocytes, the epithelial cells that line the alveolus, a sac-like structure of the lung where gas exchange with the blood capillaries occurs. ACE2 and TMPRSS2, the serine protease that cleaves the spike protein to expose the fusion peptide required for membrane fusion, are co-expressed most abundantly in type II pneumocytes, absorptive intestinal cells, and nasal goblet secretory cells (Ziegler et al., 2020). Furthermore, Ziegler et al. report that interferon may stimulate the upregulation of ACE2 in human epithelial cells. Interferon is typically expressed to enhance antiviral activity during the initial stages of viral infection. However, the interferon-driven enrichment of ACE2 expression may provide coronaviruses more opportunities for successful intracellular transport and infection.

ACE2 is perhaps most well-known for the role it plays in the Renin-Angiotensin system (RAS). This system regulates many physiological processes, such as blood pressure and fluid regulation in the human body. The system is initiated by the hormone angiotensinogen, which is secreted by the liver. The hormone can be converted to angiotensin I by renin, an enzyme secreted by the kidneys when blood flow to the kidneys is reduced. In the lung, angiotensin I is converted to angiotensin II by ACE (note that ACE is different from ACE2). Angiotensin II is a powerful vasoconstrictor (a substance that constricts blood vessels), and it also stimulates the adrenal glands to produce aldosterone, which in turn reduces potassium and increases sodium concentration in the body. Furthermore, angiotensin II stimulates the central nervous system to produce vasopressin. Combined, these effects increase blood pressure and water retention.

The concentration of angiotensin II determines the action of ACE2 in RAS. When angiotensin II levels are low, ACE2 will bind to angiotensin receptor 1 and will cleave angiotensin II to produce angiotensin (1-7). Angiotensin (1-7) has both a vasodilative and anti-inflammatory effect. When angiotensin II levels are high, angiotensin receptor 1 separates from ACE2. Angiotensin receptor 1 then interacts with angiotensin II, leading to vasoconstriction, increased blood pressure, and increased pulmonary permeability, which can contribute to Acute Respiratory Distress Syndrome (ARDS).

Figure 1.7: SARS-CoV-2 Receptor Binding Domain (crimson) bound to ACE2 (green) (Adapted from Lan, J. et al., 2020)

ACE2 may have a powerful pulmonary protective effect in humans. Imai et al. (2005) bred ACE2 knockout mice (mice that are deficient for the genes that code for ACE2) and compared them to wildtype mice (regular mice expressing ACE2) after subjecting both groups to acute pulmonary injury by acid aspiration and viral pneumonia induced by SARS-CoV-1. Acid aspiration in mice models acute lung injury in humans, leading to pulmonary edema, increased inflammation, and lowered blood oxygenation. The knockouts showed markedly increased stiffness in the lung tissue when compared to their wildtype counterparts. The knockout mice also showed higher levels of angiotensin II and angiotensin receptor 1 (since there was no ACE2 to which it could bind), which was associated with increased lung edema and worsened outcomes with viral pneumonia. When the knockouts were injected with a recombinant human ACE2 protein, lung stiffness was reduced and overall condition significantly improved. Penninger’s group later used this research to develop APN01, a soluble ACE2 drug produced by APEIRON and originally intended to treat SARS-02; it is now in a clinical trial for use in treating COVID-19. The proposed mechanism of action is simple: by overwhelming the spike protein of the virus with soluble ACE2, the virus’s ability to bind to the ACE2 receptors in the cellular membrane is drastically reduced.

In 2005, another team also led by Penninger was able to show that SARS-CoV-1 infection and the virus’s spike protein significantly reduced levels of ACE2 in wildtype mice infected with the virus. Furthermore, ACE2 knockout mice showed markedly worsened conditions and decreased recovery rates from infection. Lung injury caused by the virus was also mitigated when the RAS pathway was blocked (Kuba et al., 2005). These results point to the important protective role that ACE2 may play during lung injury. ACE2 may be protective against the inflammatory mechanisms in COVID-19 that lead to ARDS. A mouse model from the Baric group at UNC holds promise for being able to utilize a human ACE2 gene insertion transgenic animal to screen for new treatments (Dinnon et al., 2020).

Angiotensin receptor blockers (ARBs), which include drugs like Losartan and Telmisartan, are used clinically to treat high blood pressure. They do so by keeping angiotensin receptor 1 bound to ACE2, leaving ACE2 to continue to catalyze the conversion of angiotensin II into angiotensin (1-7). The use of ARBs has been tied to the upregulation of ACE2 (Li XC et al., 2017), that is, to increased production of ACE2 in the body. This has led some researchers to hypothesize that the use of ARBs may contribute to worsened COVID-19 outcomes, since ACE2 is the receptor for the virus that causes the disease (Fang et al., 2020). However, Kuba et al. (2005) show that ACE2 upregulation may lead to improved conditions in both SARS-type infections and in acute lung injury. Furthermore, increased soluble ACE2 is already an effective therapy in the treatment of ARDS.

Replication Cycle

It is believed that SARS-CoV-2 can enter the cell in one of at least two ways, either through an endocytic pathway or by fusing directly with the plasma membrane. Both pathways are mediated by the ACE2 receptor, although Wang, K., et al. report that infection may also be mediated by the CD-147 receptor, which the spike protein must bind to first. In order to be shuttled into the cell via endosomes, the virus’s spike protein must be activated by cathepsin L (Ou et al., 2020). Cathepsin L is a cysteine protease that catalyzes the cleavage of the spike protein, a process that begins with the deprotonation of a thiol group by an adjacent basic side chain, such as the imidazole group of a histidine residue. Ou et al. (2020) were able to decrease intracellular entry of SARS-CoV-2 into 293/hACE2 kidney cells by 99% by incubating the cells in 20 mM ammonium chloride or 100 nM bafilomycin A, two known inhibitors of cathepsin L. This result shows that in this particular kidney cell line, endocytosis was the primary means of cell entry for SARS-CoV-2. Endocytosis occurs through the invagination of the cell membrane, which surrounds the virus. A portion of the membrane will pinch off from the rest, bringing its fully encapsulated contents into the intracellular space.

The other pathway by which SARS-CoV-2 gains cellular entry is direct fusion of the viral membrane with the cell membrane. In order to do this, the spike protein must be activated and cleaved by the TMPRSS2 serine protease (Hoffmann et al., 2020). Previous studies have shown that other human coronaviruses, such as HCoV-229E, HCoV-OC43, and HCoV-HKU1, prefer direct fusion with the cell membrane over endocytosis when infecting human airway epithelial cells (Shirato et al., 2017). Furthermore, direct fusion with the membrane enables the virus to evade host cell antiviral immunity, thereby allowing for enhanced SARS-CoV-2 replication (ibid.), whereas endocytic entry activates host cell immunity more strongly, diminishing viral replication. SARS-CoV-2 likely has a similar preference for direct fusion in the pneumocytes of the respiratory tract, possibly explaining its enhanced replication in these cells. Camostat mesylate is a serine protease inhibitor that acts to inhibit TMPRSS2, making it a potential candidate for treatment of COVID-19. Hoffmann et al. (2020) have reported that this protease inhibitor successfully reduces SARS-CoV-2 infection in Calu-3 lung cells.

Once SARS-CoV-2 translocates into the cell, its envelope and nucleocapsid proteins are uncoated, which releases the positive-sense viral RNA into the cytoplasm. The RNA can be immediately translated by the host cell, resulting in the production of two replicase polyproteins known as pp1a and pp1ab. The ORF1a domain of the SARS-CoV-2 genome translates directly into the pp1a polyprotein. However, because of a ribosomal frameshift site between ORF1a and ORF1b, during translation of ORF1a, a frameshift to the adjacent open reading frame (ORF1b) can occur, resulting in the production of pp1ab. These polyproteins undergo proteolysis in the cell, resulting in smaller proteins that can recombine to form helicase or the RNA replicase-transcriptase complex (RdRp). The RdRp (RNA-dependent RNA polymerase) is especially essential, as it allows the virus to replicate. Proteases such as Papain-like protease and 3C-like protease catalyze the proteolytic process that leads to the production of the RNA replicase-transcriptase complex and may be potential targets for inhibition in future drug therapies (Zhavoronkov et al., 2020).
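The effect of a ribosomal frameshift on how an RNA is read can be illustrated with a small sketch. It assumes a -1 frameshift (the ribosome steps back one nucleotide and continues in the new frame); the function `read_codons`, the toy RNA string, and the slip position are all hypothetical and do not correspond to the actual ORF1a/ORF1b junction.

```python
from typing import List, Optional


def read_codons(rna: str, slip_at: Optional[int] = None) -> List[str]:
    """Split an RNA string into codons, optionally with one -1 frameshift.

    slip_at is the 0-based index at which the next in-frame codon would
    start; with a -1 frameshift the ribosome instead starts that codon
    one nucleotide earlier and keeps reading in the shifted frame.
    """
    codons = []
    pos = 0
    shifted = False
    while pos + 3 <= len(rna):
        if slip_at is not None and not shifted and pos == slip_at:
            pos -= 1  # step back one nucleotide: the -1 frameshift
            shifted = True
        codons.append(rna[pos:pos + 3])
        pos += 3
    return codons


# Toy RNA (hypothetical, for illustration only).
toy_rna = "AUGGCUUUAAACGGGUUUGCG"
in_frame = read_codons(toy_rna)
frameshifted = read_codons(toy_rna, slip_at=12)
print(in_frame)
print(frameshifted)
```

Both readings share the same codons up to the slip position and diverge completely afterward, which is how a single RNA can yield two different polyproteins such as pp1a and pp1ab.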

The viral RNA can then be replicated using the newly formed RdRp, forming new antisense RNA strands in the host cell cytoplasm. Due to its polarity, the antisense RNA cannot be translated in the same way as the positive-sense RNA that first entered the cell. However, the antisense RNA can be replicated back into positive-sense RNA, which can be repackaged into the viral offspring. The antisense RNA can also be transcribed via discontinuous transcription, which can result in mRNA transcripts of various lengths (known as subgenomic mRNAs), all of which can be translated into different proteins, such as the structural protein components that make up the viral envelope. These structural proteins, which include new spike, envelope, membrane, and nucleocapsid proteins, are synthesized on the surface of the rough endoplasmic reticulum of the host cell. These proteins, along with the newly synthesized positive-sense RNA, are shuttled to the Golgi apparatus, where they are put into vesicles and packaged as new, fully functional viral offspring. The vesicles can fuse with the cell membrane, allowing the virus to be released extracellularly. In turn, the newly formed viral offspring can attach to new cells and initiate the replication cycle again. In infected Vero E6 cells, SARS-CoV-1 and SARS-CoV-2 replicate at similar rates, peaking at 48 hours post infection (Lokugamage et al., 2020). Figure 1.8 illustrates the sequence of events that take place during SARS-CoV-2 replication, from its initial transport into the cell, to its subsequent replication, translation, assembly, and exocytosis from the cell.
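The relationship between positive-sense and antisense RNA described above is one of base-pair complementarity. The sketch below, which uses a hypothetical helper `complementary_strand` and a toy five-nucleotide sequence, shows that copying a strand twice recovers the original sequence.

```python
# Watson-Crick pairing for RNA: A<->U and G<->C.
_COMPLEMENT = str.maketrans("AUGC", "UACG")


def complementary_strand(rna: str) -> str:
    """Return the complementary RNA strand, written 5'->3'."""
    return rna.translate(_COMPLEMENT)[::-1]


# Toy positive-sense fragment (hypothetical, for illustration only).
positive_sense = "AUGGC"
antisense = complementary_strand(positive_sense)
copied_back = complementary_strand(antisense)
print(antisense, copied_back)
```

This captures only the base-pairing logic; it does not model the polymerase, discontinuous transcription, or subgenomic mRNA production.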

A combination platform that includes structure-assisted drug design, as well as virtual and high throughput drug screening (using FRET assays and a drug library of over 10,000 compounds), has revealed the crystal structure of the main protease (Mpro), which is implicated in viral replication and transcription, and has identified several candidate drugs that have the potential to deactivate or target this protease (Jin et al., 2020). Based on the structure of the protease, two drugs, Ebselen and N3, showed great promise at inhibiting the protease and thereby the virus’s replication. Furthermore, studies of its action in other diseases indicate that the organoselenium compound Ebselen has extremely low cytotoxicity as well as anti-inflammatory, antioxidant, and cytoprotective properties.

Two papers from May 2020 have raised additional issues to the matter of proteolytic activation of the spike protein and entry of SARS-CoV-2 into target cells. Jaimes et al. (2020) have explored the cleavage of the S1/S2 site by furin as well as a variety of other proteolytic enzymes. They used a biochemical peptide cleavage assay in which relevant peptides from the S1/S2 region are coupled to a Mca/Dnp FRET pair so that fluorescence is observed only after peptide cleavage. The peptide sequences studied were HTVSLLR|STSQ corresponding to SARS-CoV S1/S2 region and TNSPRRAR|SVA corresponding to the S1/S2 region of SARS-CoV-2. They investigated the ability of a variety of proteases including furin and PC1, trypsin, the type II transmembrane serine protease (TTSP) matriptase, and cathepsins B and L to cleave these peptides at the RS site. As hypothesized, furin cleaved the SARS-CoV-2 peptide but not the SARS-CoV peptide. Other proteases with the exception of cathepsin L were also more active at cleaving the SARS-CoV-2 peptide. The authors speculated that the insertion of the highly basic peptide PRRA prior to the RS cleavage site as well as the leading proline may have increased the accessibility of this sequence for proteolytic cleavage. They proposed that while furin cleavage may be the dominant mode of spike activation, other enzymes such as TMPRSS2 and matriptase may also be involved. The authors acknowledge a key limitation of the study, that it was conducted on model peptides mimicking the S1/S2 cleavage site and that the conformation of this, in addition to its susceptibility to cleavage by a variety of the studied proteases, may be quite different in the intact full-length protein. These results are suggestive and await further studies of cleavage of the S1/S2 site in intact full-length proteins.
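The difference between the two model peptides can be checked against a simple sequence pattern. The sketch below assumes the minimal furin-like consensus R-X-X-R (cleavage after the second arginine); this consensus and the helper name `furin_like_sites` are assumptions introduced for illustration, and the code does not model the assay or the other proteases that Jaimes et al. tested.

```python
import re
from typing import List

# Assumed minimal furin-like consensus: R-X-X-R, cleaved after the second R.
# The lookahead lets overlapping matches be reported as well.
_MOTIF = re.compile(r"(?=(R..R))")


def furin_like_sites(peptide: str) -> List[int]:
    """Return the 0-based index of the residue just after each R-X-X-R match."""
    return [m.start() + 4 for m in _MOTIF.finditer(peptide)]


# Model peptides quoted in the text, with the cleavage mark removed.
sars_cov_1 = "HTVSLLRSTSQ"
sars_cov_2 = "TNSPRRARSVA"

sites_1 = furin_like_sites(sars_cov_1)
sites_2 = furin_like_sites(sars_cov_2)
print(sites_1, sites_2)
```

Under this assumption only the SARS-CoV-2 peptide contains the motif (RRAR), with the predicted cut falling between R and S, in line with the reported furin result. A pattern match says nothing about accessibility in the folded, full-length protein, which is the limitation the authors themselves note.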

Figure 1.8: SARS-CoV-2 Replication Cycle (Adapted from Shereen et al., 2020)

A different approach has been taken by Anand et al. (2020), who published a study in eLife on May 26, 2020 in which Anand and researchers at nference, Inc., an artificial intelligence company, investigated whether the cleavage site in SARS-CoV-2 (RRAR|SVAS) can be identified in other proteins. The sequence of PRRA upstream from the cleavage site is not seen in other coronaviruses but mimics a site found exclusively in the human epithelial sodium channel α-subunit (ENaC-α). A review of single-cell RNA-seq data from 65 studies also showed marked overlap between expression of ENaC-α and the viral receptor ACE2, particularly in tissues such as heart, lung, and kidney that bear much of the brunt of damage in severe COVID-19. The authors postulate that the spike protein may, by its mimicry of the ENaC-α furin-cleavable site, reduce tissue levels of ENaC-α, leading to dysregulation of fluid balance in these tissues. This is a mechanism quite distinct from the inflammatory changes typically implicated in pulmonary damage due to COVID-19. This theoretical study provides no data showing that ENaC-α levels or activity are in fact altered, or that airway surface liquid homeostasis is abnormal, in tissues undergoing such damage (Olena, 2020).

The aforementioned studies both require validation of their predictions in more relevant cell systems before they can be explored as candidates for antiviral therapy. The use of peptidomimetic inhibitors of furin and other proteases involved in spike activation is appealing, but may be limited by effects on other relevant cellular processes (Hoffmann, 2020).

As we continue to define the cell biology of coronaviruses like SARS-CoV-2 using higher resolution methods including electron microscopy, we determine additional vulnerabilities in the viral replication machinery. One such study has found a molecular pore complex in the membrane of the virus through which the transfer of viral RNA into the host cytosol may occur, allowing viral RNA to be eventually packaged into new exosome-like vesicles (Candelario and Steindler, 2014) and virions (Wolff G et al., 2020). This mechanism of replication, now better understood because of deep cell and molecular biological analysis, represents another target for replication-mitigating drug design.

Interaction with the Immune System

Since SARS-CoV-1 has previously been shown to downregulate type I interferon production and response in its host (Kopecky-Bromberg et al., 2007), thereby diminishing the innate immune response of the host, questions surrounding the ability of SARS-CoV-2 to do the same naturally arise. The innate immune system initiates its attack on a viral pathogen when PRRs (pattern recognition receptors) in a sentinel cell, usually a macrophage, detect characteristic viral motifs known as pathogen-associated molecular patterns (PAMPs). This detection leads to changes in gene expression in the sentinel cell, which initiates a release of cytokines and chemokines that recruit other leukocytes (white blood cells) to the site of infection. These leukocytes include neutrophils that can attack the infected cells directly, without the use of antibodies. The innate immune response is also initiated when the infected host cell begins to secrete Type I interferons (e.g. IFN-α and IFN-β). Type I interferons are proteins that interfere with viral replication and recruit natural killer (NK) cells to the site of infection. The NK cells kill infected cells by releasing granules that contain apoptosis-inducing granzymes, which initiate apoptosis (cell death), and perforin, which forms pores that perforate the cell membrane. Type I interferons can initiate a cascade of signalling proteins through their receptor, which in turn activates STAT1 (Signal Transducer and Activator of Transcription 1), STAT2, and other transcription factors through phosphorylation. These activated transcription factors can then upregulate IFN-stimulated genes (ISGs) by increasing the rate of their transcription and subsequent translation, producing proteins that will enhance the host’s antiviral response.

Lokugamage et al. (2020) reported that when Vero E6 cells were pretreated with IFN-α 18 hours before infection with either SARS-CoV-1 or SARS-CoV-2, the amount of virus detected 48 hours after infection differed substantially between the two groups. While pre-treated SARS-CoV-1 infected cells had almost the same viral titer as SARS-CoV-1 infected cells that were not pre-treated with interferon, the pre-treated SARS-CoV-2 infected cells showed a significant reduction in viral titer (a 3-log to 4-log drop) compared to their untreated counterparts. Furthermore, the authors reported that production of the nucleocapsid protein of the virus was significantly reduced for SARS-CoV-2 infected cells that received IFN-α pre-treatment. These results suggest that SARS-CoV-2 is much more vulnerable to the antiviral effects of IFN-α. SARS-CoV-2 infected cells that received IFN-α pre-treatment were also the only cells that showed any STAT1 phosphorylation and that demonstrated enhanced production of ISG proteins. These data suggest that SARS-CoV-2 is not as effective as SARS-CoV-1 at modulating the type I interferon response. On the other hand, it should be noted that SARS-CoV-1 is remarkably adept at regulating the type I interferon response, as pre-treatment with interferon led to no detectable STAT1 phosphorylation, just as in the infected cells that were not treated with interferon.
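A "3-log to 4-log drop" is a statement on a base-10 logarithmic scale, so it corresponds to a 1,000-fold to 10,000-fold reduction in titer. The short sketch below makes that conversion explicit; the function names and the example titers are hypothetical and are not values reported by Lokugamage et al.

```python
import math


def log10_reduction(untreated_titer: float, treated_titer: float) -> float:
    """Log10 drop in titer between untreated and treated samples."""
    return math.log10(untreated_titer / treated_titer)


def fold_change(log_drop: float) -> float:
    """Convert a log10 drop into a fold reduction."""
    return 10 ** log_drop


# Hypothetical titers (arbitrary units), chosen only to illustrate the scale.
example_drop = log10_reduction(1e7, 1e4)
print(example_drop, fold_change(3), fold_change(4))
```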

Among the protein products that result from the cleavage of precursor proteins 1a and 1ab is a set of functional proteins that includes nonstructural protein 1 (Nsp1). Nsp1 proteins isolated from various alpha and beta coronaviruses have consistently demonstrated the ability to effectively suppress host gene expression by inhibiting protein translation in the host cell. They do so by binding to the small ribosomal subunit (40S), thereby impeding mRNA translation. When Nsp1 binds to the ribosome, the host mRNA transcript that would have been translated is cleaved and later degraded. Thus, Nsp1 has the potential to inhibit any antiviral response that is dependent on the expression of host cell factors, such as the interferon response. Thoms et al. (2020) demonstrate that Nsp1 from SARS-CoV-2 can bind to the 40S but not to the 60S ribosomal subunit. The binding resulted in the loss of capped mRNA translation both in cells and in vitro. The researchers used cryo-electron microscopy to show that the Nsp1 C-terminus is the region that tightly binds to and thereby obstructs the mRNA entry channel. By doing so, it effectively shuts down the type I interferon pathway by inhibiting the translation of IL-8 and IFN-β mRNA transcripts. These transcripts are upregulated when RIG-I, a cytosolic PRR of the innate immune system, initiates signaling upon recognition of a coronavirus. The authors note that Nsp1 may be an important therapeutic target, since eliminating its binding to the 40S subunit may enhance host cell immunity, which could result in the reduction of viral load in the earliest rounds of viral replication.

SARS-CoV-1 and SARS-CoV-2 share a high degree of homology in their genomic sequences. In an effort to identify specific protein products that may differentiate the viruses’ sensitivity to IFN, Lokugamage et al. (2020) identified gene domains that encode protein products related to IFN antagonism activity in SARS-CoV-1. They prioritized regions that differed more substantially in nucleotide sequence in the respective portions of the SARS-CoV-2 genome. Of particular note is SARS-CoV-1 ORF3b, which encodes a 154-amino-acid protein that Kopecky-Bromberg et al. (2007) previously found could antagonize the IFN response by reducing the phosphorylation and activation of IRF-3 (a transcription factor that can upregulate the transcription of interferon). In SARS-CoV-2, the corresponding ORF3b region contains a premature stop codon, which truncates the resulting protein into a 24-amino-acid fragment. Konno et al. (2020) report that the truncated SARS-CoV-2 ORF3b protein is more effective at suppressing the activation of type I interferon than the analogous SARS-CoV-1 protein. Lokugamage et al. (2020) also note that SARS-CoV-1 ORF6 encodes a protein that Kopecky-Bromberg et al. (2007) previously showed is a powerful IFN antagonist, acting by inhibiting the translocation of transcription factors like STAT1 through the nuclear membrane. The SARS-CoV-2 ORF6 only shares 69% of the same nucleotide sequence with SARS-CoV-1, resulting in a truncated protein product, which may also dampen the IFN antagonistic effect in SARS-CoV-2.
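How a premature stop codon truncates a protein product can be shown with a toy reading-frame example. The sketch below assumes the three standard stop codons (UAA, UAG, UGA); the helper `codons_before_stop` and both toy sequences are hypothetical and are not the actual ORF3b sequences.

```python
from typing import List

STOP_CODONS = {"UAA", "UAG", "UGA"}


def codons_before_stop(rna: str) -> List[str]:
    """Read codons from the first nucleotide until the first stop codon."""
    codons = []
    for i in range(0, len(rna) - 2, 3):
        codon = rna[i:i + 3]
        if codon in STOP_CODONS:
            break
        codons.append(codon)
    return codons


# Toy open reading frame and a variant with one C->U change (CAA -> UAA)
# that introduces a premature stop codon. Both are hypothetical.
full_length = "AUGGCUCAAGGGUAA"
truncated = "AUGGCUUAAGGGUAA"

full_codons = codons_before_stop(full_length)
short_codons = codons_before_stop(truncated)
print(len(full_codons), len(short_codons))
```

A single nucleotide change is enough to halve the toy product here; in the same way, an early stop codon in an open reading frame yields a much shorter protein than the full-length frame would.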

In vitro studies have demonstrated that SARS-CoV-2 can infect two T lymphocyte cell lines (MT-2 and A3.01) with very low ACE2 expression, and that these cells were more vulnerable to SARS-CoV-2 infection than they were to SARS-CoV-1 infection (Wang et al., 2020). The results demonstrate that the SARS-CoV-2 spike protein can mediate potent infectivity even in cells with low ACE2 expression. These data suggest that the virus can infect cells through mediation by a different receptor, such as the CD-147 receptor. Furthermore, the researchers showed that SARS-CoV-2 was able to infect MT-2 cells both through direct fusion with the cell membrane and through an endocytic pathway. In contrast, SARS-CoV-1 showed no evidence of being able to enter the cells through direct fusion with the membrane, which may explain its diminished infectivity. Despite being able to infect T lymphocyte cells, SARS-CoV-2 was not able to replicate efficiently in these cells.

Blanco-Melo et al. (2020) arrive at similar conclusions and propose that SARS-CoV-2 regulates the host immune response in an imbalanced fashion. Firstly, infection with the virus induces low IFN-I and IFN-III levels, accompanied by a suppressed interferon-stimulated gene (ISG) response. This characteristic was confirmed in post-mortem lung samples from COVID-19 patients. On the other hand, the authors show that the virus induces a strong proinflammatory response through the induction of cytokines and chemokines, such as IL-6 (interleukin-6), CCL5, CCL8, and CCL11, which were detected at robust levels in the tissue samples. More specifically, SARS-CoV-2 induces high levels of CCL20, CXCL1, IL-1B, IL-6, CXCL3, CXCL5, CXCL6, CXCL2, CXCL16, and the cytokine TNF (tumor necrosis factor). When ferrets were infected with SARS-CoV-2, the authors observed the induction of the chemokine response by Day 3 of infection, the time when viral load peaked. By Day 7, while virus levels were diminishing, cytokine levels continued to grow, including levels of CCL2, CCL8, and CXCL9, among others. This same pattern was not observed when the ferrets were infected with influenza A virus.

There is a great deal of interest in how SARS-CoV-2 affects immune responses and function in the disease, especially with regard to normal activation and control of potential cytokine storms that result from over-activation. New insight has been gained from studying the cellular and molecular biology of clonogenic cells, including B cell genesis that takes place in the germinal centers of lymph nodes (Mesin et al., 2020) and that contributes to an effective antibody response to particular coronavirus antigens. Studies of postmortem lymph nodes and spleens from acute SARS-CoV-2-infected specimens (Kaneko et al., 2020) suggest that the fine line between an effective immune response to the virus and the generation of a cytokine storm is at risk in COVID-19. Kaneko et al. (2020) find that germinal centers are absent and that Bcl-6+ germinal center B cells are profoundly reduced. This finding offers insight into the dysregulated immune function characteristic of COVID-19, which includes both cytokine storms and the potential for limited resilience in antibody responses.

Deep immune profiling in a SARS-CoV-2-infected patient population has revealed three immunotypes related to different patterns of lymphocyte responses (Mathew et al., 2020). These different responses may relate to recently described differences in patient myeloid signatures (Mann et al., 2020) that should aid in the stratification of patients and their immune responses to the coronavirus.


Natural Reservoirs

Preliminary genetic evidence suggests that SARS-CoV-2 is zoonotic and indigenous to the horseshoe bat Rhinolophus affinis. It is currently suspected that SARS-CoV-2 migrated from horseshoe bats to an intermediate host, the Chinese pangolin (Manis pentadactyla), by blood-borne means, and then migrated again, this time to a human host, at the Huanan Seafood Market in Wuhan, Hubei Province, China. In similar wet markets across China, live exotic animals as well as their uncooked flesh, meats, organs, bones, and pelts are legally sold for trade and consumption. Cages of such animals are often stacked vertically, so the lowest cages are subject to overhead biofluids including saliva, vomitus, urine, feces, blood, and other potentially infectious excretions.

Bats (order Chiroptera) are widely reported as a natural reservoir for a wide range of viruses that can also infect humans, including hantaviruses, rabies virus, henipaviruses, ebolaviruses, and coronaviruses, such as SARS-CoV-1 and SARS-CoV-2. An analysis of 754 mammalian species and 586 virus species, which included every virus then known to infect mammals, revealed that Chiroptera (followed by rodents and primates) harbor a significantly larger share of zoonotic viruses than all other animal orders (Olival et al., 2017). The greatest viral diversity in bats is found in the Flavivirus, Bunyavirus, and Rhabdovirus families, and RNA viruses are far more common than DNA viruses (ibid.). The authors of the analysis report that use of the animal in medical research and mammal sympatry (when two closely related species or populations live in close proximity to one another) were the two best predictors of viral richness, that is, a measure of the average number of viruses found in a reservoir species. However, despite the identification of various predictors for viral richness, Chiroptera in particular demonstrated a significantly higher viral richness than could be predicted by mammal sympatry, proximity to human population, geographic range, and body size. These findings suggest that other features not captured by the analysis, such as immunological function, social behavior, or other characteristics, may be driving the large viral diversity represented in bats.

Bats’ unique capacity for flight, as well as their feeding and social behaviors, makes them exceptionally suited as natural reservoirs for viruses. It has been theorized that the evolution of flight in bats is closely tied to unique selective pressures on the evolution of their immune systems. Firstly, the capacity for flight is energetically costly, which burdens mitochondria that release reactive oxygen species (ROS) in turn. ROS can damage DNA, which can lead to a host of pathologies not observed in bat populations. At least two distantly related species of bats (Pteropus alecto and Myotis davidii) have genes that code for protein products that enhance DNA repair (Zhang, G. et al., 2013). Foley et al. (2018) identified 14 genes in Myotis myotis that enhance DNA repair, and many other studies corroborate these results with similar findings. However, DNA damage is often indicative of viral infection, which in other species would result in an inflammatory response to the virus. In Chiroptera, however, the inflammatory response to viruses is largely diminished. At least two bat species have no trace of PYHIN genes, which code for proteins that activate immunomodulators and enhance inflammation when they come in contact with foreign nucleic acids, such as those of viruses; this absence is unique among mammals. Furthermore, bats have hollow bones without bone marrow and hence cannot produce B lymphocytes, further diminishing their immune response. The combination of enhanced DNA repair and downregulation of proteins associated with an inflammatory response has allowed viruses to persist in bats for greater periods of time. In order to fight viral replication, bats have also evolved a higher baseline expression of interferons, which inhibit intracellular viral replication as soon as infection occurs, making them less vulnerable to expressing symptoms of viral infection (Zhou, P. et al., 2016).
Since bats live in large communities, roosting closely together in large groups, it is possible that viral transmission is rampant among bat populations, enhancing their ability to act as a natural reservoir for many viruses.

Bats are known reservoirs for coronaviruses that infect humans, including SARS-CoV-1. More recently, BatCoV RaTG13, a bat coronavirus identified in Rhinolophus affinis, has been shown to share 96.2% sequence homology with SARS-CoV-2, although a clear common ancestor to both has not yet been identified (Zhou et al., 2020). However, due to the viral richness of Chiroptera as an order, it is highly likely that a more direct ancestor to SARS-CoV-2 will be identified in a bat species. Furthermore, bats are the natural reservoir for future emerging zoonotic coronaviruses, which warrant further study. Valitutto et al. published results on April 9, 2020, reporting the discovery of six novel coronaviruses identified in bats captured in Myanmar. Bats were swabbed orally and rectally, and guano samples were collected and tested by RT-PCR. The researchers identified three novel alphacoronaviruses and three novel betacoronaviruses. The authors note that land use practices in Myanmar have put human populations in increased contact with local bat populations, which may increase risk of newly emergent zoonotic threats.

Early Human Cases

The first cluster of patients with viral pneumonia of unknown etiology was reported to the WHO China Country Office on December 31, 2019, and the Huanan Seafood Market was closed for immediate disinfection on January 1, 2020. Of the first 41-44 cases, at least 27 were traced to the aforementioned market, 11 were severely ill, and 33 were stable. All of the patients presented some difficulty breathing and several had invasive lesions in both lungs as seen in chest radiographs. According to the South China Morning Post, there were 9 patients aged 39 to 79 years old presenting flu-like symptoms detected even earlier in November 2019. In particular, a 55-year-old resident of Hubei Province, reported on November 17, 2019, may be the earliest known infected individual, but “patient zero” remains unknown.

Serological evidence by Cohen et al. (2020) suggests that the first COVID-19 infection in Europe may have occurred in France as early as middle to late December 2019, at least a month before the initial reports of three cases of COVID-19 in France on January 24, 2020 or the first suspected human-to-human transmission in Germany, which occurred sometime during January 19-22, 2020 between a pair of Chinese and German colleagues. A 43-year-old man from Bobigny, a town northeast of Paris, with symptoms consistent with a COVID-19 infection but no recent international travel was admitted to hospital on December 27, 2019, suggesting an infection possibly as early as December 14, 2019. The patient’s spouse, likely an asymptomatic carrier, worked at a market near Charles de Gaulle International Airport, which travelers often frequent immediately upon arrival.

Ghinai et al. (2020) report on an early COVID-19 patient, an Illinois resident in her 60s who had traveled to China in mid-January 2020, and on the first suspected human-to-human transmission, to her husband, who was hospitalized only eight days after her return. Postmortem studies of patients in Santa Clara, CA who died at home on February 6, 2020 and February 17, 2020 have confirmed the presence of SARS-CoV-2 and are consistent with infections in the U.S. as early as middle to late January 2020.

The first suspected COVID-19 infection in the United States was reported on January 12, 2020. Despite the ongoing epidemic in China, the WHO did not officially recognize COVID-19 as a pandemic until March 11, 2020. By February 27, 2020, the number of new cases outside of China first exceeded those inside China, thus marking its unofficial designation as a pandemic. This came only one day after COVID-19 first touched South America, with the first reported case appearing in Brazil, which gave COVID-19 worldwide coverage on all continents except Antarctica.

The Latham-Wilson Hypothesis

The Latham-Wilson Hypothesis on the origin of SARS-CoV-2 purports that the first documented animal-to-human and human-to-human transmissions originated in a Chinese coal mine outbreak in 2012 involving a previously unknown viral pneumonia, which was documented in a Chinese Master’s thesis. Latham and Wilson, who translated and studied the thesis, claim that the symptoms described in the work are consistent with those of COVID-19 and, therefore, postulate that COVID-19 may have preceded the Wuhan outbreak in November/December 2019 by several years. Without further evidence, this claim remains unsubstantiated.

Climate Factors of Disease Distribution

Certain regions have been more heavily impacted than others, and local environmental factors, such as temperature and humidity, may explain some of the observed variation. In a longitudinal study done across hundreds of counties in the U.S., researchers observed that lower humidity, specifically water density below 6 g of water vapor per 1 kg of air, was tied to higher influenza mortality even after controlling for temperature (Barreca and Shimshack, 2012). Seasonal variations in transmission rates of influenza A during the 2009 H1N1 epidemic in the U.S. were also closely correlated with regional variation in absolute humidity (Lipsitch et al., 2011). This is likely because respiratory droplets travel further in drier environments, thereby increasing the chance of viral transmission, while higher humidity may decrease the reach of such droplets.

The regional variation in basic reproduction numbers for COVID-19 across Chinese provinces in the current outbreak appears more complicated, however, as preliminary research has not been able to establish as strong a relationship between humidity and regional transmission rates (Santillana et al., 2020). Regardless, the authors recommend further research into the relationship between humidity and transmission rates of COVID-19. A more recent study, which has not yet been peer-reviewed, concluded that higher regional average temperature (at temperatures over 1°C) is negatively associated with COVID-19 incidence (Cameron et al., 2020). Moreover, the survival time of SARS-CoV-1 on various surfaces has been shown to decline as temperature increases over 25°C and as relative humidity increases over 50% (Seto et al., 2011). Given that SARS-CoV-1 is a closely related virus transmitted through respiratory droplets, establishing further trends between temperature, humidity, and SARS-CoV-2 stability and transmission should be prioritized.

Undocumented Cases

Using a statistical analysis of reported infections of COVID-19 within China, Li et al. (2020) estimated that 86% of all infections in China were undocumented before the travel restrictions on the Wuhan Tianhe International Airport were implemented by the Chinese government on January 23, 2020. Furthermore, they estimate that while transmission rates were likely lower for these undocumented individuals (estimated at 55% of the transmission rate of documented cases), undocumented cases were the likely source of infection for 79% of all documented cases during this early stage of the COVID-19 epidemic (Li R. et al., 2020). These results indicate that asymptomatic individuals with the infection or those individuals experiencing mild symptoms may contribute substantially to the transmission of the virus, further highlighting the importance of ubiquitous testing and self-isolation. Sewage samples taken from a Massachusetts wastewater treatment facility provide evidence that underreporting of true case count is prevalent in the region (Wu, F.Q. et al., 2020). Based on SARS-CoV-2 viral titers detected in sewage samples tested March 18-25, 2020, the authors estimate that roughly 5% of the fecal samples in the facility tested positive, which is substantially higher than the 0.026% estimated case rate for that time in the state. Asymptomatic cases as well as lack of ubiquitous testing may have contributed substantially to this problem. Because individuals who recover from the infection without having been tested will no longer test positive by PCR once the virus has cleared their system, the number of infections may be widely underreported. Therefore, the widespread use of serological assays testing for SARS-CoV-2 antibodies, which COVID-19 positive individuals will produce after the infection or in late stages of the infection, will be fundamental in tracking the spread of the disease.
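
The relationship between the three figures attributed to Li et al. (2020) can be checked with a back-of-envelope calculation. The sketch below is not the authors' model, which is a fitted transmission model; it simply assumes that each undocumented case transmits at 55% of the rate of a documented case and ignores everything else. The function name and the wastewater ratio variable are illustrative assumptions.

```python
def undocumented_transmission_share(frac_undocumented, relative_transmissibility):
    """Crude share of onward transmission attributable to undocumented cases.

    Assumes documented cases transmit at a relative rate of 1.0 and
    undocumented cases at `relative_transmissibility`; no other structure.
    """
    if not 0.0 <= frac_undocumented <= 1.0:
        raise ValueError("frac_undocumented must lie between 0 and 1")
    undocumented = frac_undocumented * relative_transmissibility
    documented = (1.0 - frac_undocumented) * 1.0
    return undocumented / (undocumented + documented)


# Figures quoted in the text: 86% undocumented, 55% relative transmission rate.
share = undocumented_transmission_share(0.86, 0.55)
print(f"Crude share of transmission from undocumented cases: {share:.1%}")

# Wastewater comparison quoted in the text: about 5% versus 0.026%.
wastewater_ratio = 5.0 / 0.026
print(f"Wastewater estimate relative to reported case rate: {wastewater_ratio:.0f}x")
```

Under these simplifying assumptions the crude share comes out near 77%, in the same range as the 79% reported from the full model, and the wastewater-based estimate is roughly 190 times the reported case rate.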

Bendavid et al. (2020) sought to estimate COVID-19 seroprevalence (the level of pathogen in a population measured through blood serum) in Santa Clara County, CA from April 3-4, 2020. Using targeted Facebook advertising, the team recruited 3,300 subjects from the county and tested each individual with a lateral flow immunoassay. The unadjusted estimate for the proportion of the sample that tested positive for SARS-CoV-2 antibodies was 1.5%. However, the authors themselves report with 95% confidence that the false positive rate could be 0-1.2%, which casts substantial doubt concerning the validity of the sample estimate. Adjusting for a wide variety of factors, their analysis resulted in an estimate of between 48,000 and 81,000 people who had been infected with SARS-CoV-2 in Santa Clara County, CA. This estimate was 50-85 times the official number of reported cases by that time, which was about 1,000 cases. Streeck et al. published a report on April 9, 2020 where 500 residents of Gangelt, a small town in rural Germany, were tested by a combination of RT-PCR and immunoassay testing for seroprevalence. The authors found that 14% of the town had antibodies for the virus and 2% were actively infected with SARS-CoV-2. Based on the results of the study, the researchers estimated a case fatality rate of 0.37%, which was considerably lower than the reported value of 2% for Germany at the time. Because of the increasing number of suspected asymptomatic and undocumented cases, the current case fatality rates estimated by geographic location are likely widely overestimated.
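
To see why a false positive rate of up to 1.2% matters so much for a raw positive rate of 1.5%, the sketch below applies the Rogan-Gladen estimator, a standard correction of apparent prevalence for imperfect test sensitivity and specificity. It is not the adjustment procedure used by Bendavid et al. (2020). The sensitivity of 80% is a hypothetical value chosen only for illustration, and the function name is an assumption.

```python
def rogan_gladen(apparent_prevalence, sensitivity, specificity):
    """Rogan-Gladen estimate of true prevalence, clamped to [0, 1]."""
    denominator = sensitivity + specificity - 1.0
    if denominator <= 0.0:
        raise ValueError("test must perform better than chance")
    adjusted = (apparent_prevalence + specificity - 1.0) / denominator
    return min(1.0, max(0.0, adjusted))


APPARENT_PREVALENCE = 0.015   # unadjusted positive rate quoted in the text
ASSUMED_SENSITIVITY = 0.80    # hypothetical value, not taken from the study

# False positive rates spanning the 0-1.2% range quoted in the text.
adjusted_by_fpr = {}
for false_positive_rate in (0.0, 0.005, 0.012):
    specificity = 1.0 - false_positive_rate
    adjusted_by_fpr[false_positive_rate] = rogan_gladen(
        APPARENT_PREVALENCE, ASSUMED_SENSITIVITY, specificity
    )

for fpr, prevalence in adjusted_by_fpr.items():
    print(f"false positive rate {fpr:.1%} -> adjusted prevalence {prevalence:.2%}")
```

With the assumed sensitivity, the adjusted prevalence falls from about 1.9% when there are no false positives to about 0.4% at a false positive rate of 1.2%, which illustrates how sensitive a low raw estimate is to test specificity.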

Superspreaders and Seeding Events

SARS-CoV-2 superspreaders are individuals who possess a high degree of viral transmissibility and infect at least ten others. Certain clinical characteristics may make someone more prone to becoming a superspreader, including a higher degree of viral shedding or a longer contagious period. However, certain behaviors, such as attending crowded indoor events, are particularly impactful for increasing exposure risk to the susceptible population. Therefore, “seeding events” has emerged as a term used to define single events where clusters of cases originate, and “super-seeding events” are seeding events with clusters originating from exposure to one or more superspreaders.

Superspreaders contribute to a high degree of variability in the individual-level distribution (i.e., overdispersion) of secondary infections. While the basic reproduction number (R0) has been reported to be in the 1.9-8.9 range, a consistent finding is that 80-90% of infected individuals do not spread the disease, and so superspreaders may be to blame for the vast majority of cases. Using data from the number of reported COVID-19 cases from a WHO report published on February 27, 2020 and an R0 of 2-3, Endo et al. (2020) estimated the overdispersion parameter, k, to be approximately 0.1 (95% credible interval or CrI of 0.05-0.2 for R0 = 2.5). The authors interpret this estimate by stating that potentially 80% of secondary infections were caused by 10% of infected individuals.
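
As a rough illustration of how an overdispersion parameter of this size translates into concentrated transmission, the sketch below assumes that the number of secondary cases per infected individual follows a negative binomial distribution with mean R0 and dispersion k, a common modelling choice for this kind of analysis. The values R0 = 2.5 and k = 0.1 are taken from the paragraph above; the function names and the truncation point max_x are illustrative assumptions.

```python
import math


def nb_pmf(x, mean, k):
    """Negative binomial probability of x secondary cases (mean R0, dispersion k)."""
    log_p = (
        math.lgamma(k + x) - math.lgamma(k) - math.lgamma(x + 1)
        + k * math.log(k / (k + mean))
        + x * math.log(mean / (k + mean))
    )
    return math.exp(log_p)


def infector_share_for(transmission_fraction, mean, k, max_x=5000):
    """Smallest share of infected individuals that accounts for the given
    fraction of all secondary cases, counting the most infectious first.
    The distribution is truncated at max_x secondary cases."""
    target = transmission_fraction * mean
    cases = 0.0   # expected secondary cases accumulated so far
    people = 0.0  # share of infected individuals accumulated so far
    for x in range(max_x, 0, -1):
        p = nb_pmf(x, mean, k)
        if cases + x * p >= target:
            # Only part of this group is needed to reach the target.
            people += (target - cases) / x
            return people
        cases += x * p
        people += p
    return people


R0, K = 2.5, 0.1
p_zero = nb_pmf(0, R0, K)
top_share = infector_share_for(0.80, R0, K)
print(f"Share infecting no one: {p_zero:.1%}")
print(f"Share of individuals behind 80% of secondary cases: {top_share:.1%}")
```

Under these assumptions, roughly seven in ten infected individuals would infect no one, and about a tenth would account for 80% of secondary infections, consistent with the interpretation quoted above.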

Several studies have pointed to increased risk of outbreaks (i.e., large clusters) from seeding events of viral transmission in indoor environments. Qian et al. (2020) identified 318 outbreaks that gave rise to three or more COVID-19 cases tied to transmission from a single individual in China outside of Hubei Province. These 318 outbreaks gave rise to 1,245 confirmed cases in 120 cities between January 4, 2020 and February 11, 2020. Of these events, all 318 outbreaks occurred indoors, with the vast majority occurring at residences (79.9% of cases). At the time, residences became the primary location for quarantine, and severe lockdown restrictions were beginning to be implemented, so the increased incidence of transmission in residences was expected. Only 1.9% of the outbreaks involved 10 or more individuals, and the majority of these occurred in commercial venues including shops and outdoor food venues. The largest outbreak noted occurred in a Tianjin shopping mall and involved 21 cases. In a related study, Nishiura et al. (2020) identified 110 cases that were associated with 11 different clusters or sporadic cases in Japan and used contact tracing to identify transmission events. All traced transmission was tied to closed indoor environments, including fitness centers, hospitals, and shared eating environments. The authors estimate that transmission in a closed environment was 18.7 times more likely than in an open air environment.

Leclerc et al. (2020) performed a meta-analysis of the existing literature and media reports to find settings linked to SARS-CoV-2 transmission clusters or seeding events. Settings were defined as locations that resulted in secondary transmission of the virus.[4] The authors found evidence for clusters of cases linked to 152 events, of which 11 had more than 100 reported cases. These outbreaks were linked to transmission that occurred in hospitals, elderly care facilities, work dormitories, and ships. Four religious venues were also linked to clusters of over 100 cases. Five clusters of 50-100 cases were also identified in schools, sporting events, bars, shopping centers, and conferences. The authors report that worker dormitories had a notably higher rate of secondary transmission, with one worker dormitory in Singapore tied to 797 cases. The most common setting for a cluster of cases was the residence, but all residential clusters involved fewer than ten individuals. Furthermore, the vast majority of clusters occurred in indoor venues.

Table 1.6a: Leading U.S. Outbreak Clusters of over 1,000 Confirmed COVID-19 Cases (September 27, 2020, Adapted from N.Y. Times)
Institution | City, State
Avenal State Prison | Avenal, Calif.
San Quentin State Prison | San Quentin, Calif.
Marion Correctional Institution | Marion, Ohio
Pickaway Correctional Institution | Scioto Township, Ohio
Columbia Correctional Institution | Lake City, Fla.
North County Jail | Castaic, Calif.
California Institution for Men | Chino, Calif.
Seagoville Federal Prison | Seagoville, Texas
Trousdale Turner Correctional Center | Hartsville, Tenn.
Ouachita River Unit Prison | Malvern, Ark.
Chuckawalla Valley State Prison | Blythe, Calif.
Folsom Prison | Represa, Calif.
South Central Correctional Facility | Clifton, Tenn.
Cook County Jail | Chicago, Ill.
California Rehabilitation Center Prison | Norco, Calif.
Cummins Unit Prison | Grady, Ark.
Bexar County Jail | San Antonio, Texas

Of particular concern for outbreaks in the U.S. are slaughterhouses and meat processing plants. On April 23, 2020, it was reported that 783 employees of the Smithfield pork processing plant in South Dakota had tested positive for SARS-CoV-2, which contributed to more than half the infections reported for the state at the time. When the first 600 cases were reported on April 16, 2020, the plant became the single largest coronavirus hotspot in the U.S., surpassing the USS Theodore Roosevelt and Cook County Jail in Chicago. Other meat processing plants in the U.S. have been associated with outbreaks, including Tyson Foods’ largest pork processing plant in Waterloo, Iowa, where 1,031 of 2,800 employees had tested positive for the virus by May 8, 2020 (and more since), contributing to over 90% of the cases reported in Black Hawk County. On May 22, 2020, it was reported that a quarter of workers at Tyson Foods’ Wilkesboro, North Carolina poultry facility had tested positive for the virus (570 of 2,200 employees).
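
The proportions behind these workplace figures are simple attack rates (confirmed cases divided by the workforce). The short sketch below recomputes them from the counts quoted above; the function and dictionary names are illustrative assumptions.

```python
def attack_rate(cases, population):
    """Proportion of a population with a confirmed infection."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases / population


# (confirmed cases, employees) as quoted in the text
plants = {
    "Tyson Foods, Waterloo, Iowa": (1_031, 2_800),
    "Tyson Foods, Wilkesboro, N.C.": (570, 2_200),
}

rates = {name: attack_rate(cases, staff) for name, (cases, staff) in plants.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.1%} of employees tested positive")
```

The Wilkesboro figure works out to about 26%, matching the "quarter of workers" description, while the Waterloo figure corresponds to roughly 37% of employees.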

Table 1.6b: Leading States by Number of Long-Term Care Facilities and their Reported COVID-19 Cases and Related Deaths (September 27, 2020, Adapted from N.Y. Times)
Columns: % State; % State
States: New Jersey; New York; North Carolina; South Carolina

In late April 2020, the CDC published a report on the prevalence of COVID-19 in meat and poultry processing facilities in 19 states in the U.S. Among 130,000 workers at 115 facilities reporting cases of COVID-19 in the CDC report, which is not exhaustive, 4,913 cases had been confirmed, just around 3.0% of all employees at these facilities (Dyal et al., 2020). The report also notes that the percentage of employees with confirmed cases at any one of these facilities ranged from 0.6% to 18.2%. The facilities that reported more than 5% of staff as COVID-19 positive were exclusively pork and/or beef processing facilities rather than poultry processing plants, which may be related to the possibility that SARS-CoV-2 is transmitted by the fecal-oral route with pigs as a vector. Meat processing plants have served as the setting for multiple COVID-19 outbreaks abroad as well, including notable ones in Canada, Brazil, Germany, Australia, Ireland, Spain, the U.K., and France. Canada’s largest outbreak of COVID-19 was tied to the Cargill Meat Processing Plant in High River, Alberta, where 949 of 2,000 employees tested positive for the disease.

Table 1.6c: Leading States by Number of Colleges and Universities and their Reported COVID-19 Cases (October 1, 2020, Adapted from N.Y. Times)
Columns: Reported Cases; % State Cases; Colleges / Universities; % State Institutions
States: South Carolina; North Carolina

Prison inmates, who represent 2.3 million individuals in the U.S. as of 2020, have also been widely reported as a population vulnerable to infection due to overcrowding, confinement in small spaces, and limited access to healthcare. Indeed, multiple prisons have served as COVID-19 hotspots in the U.S. The Marion Correctional Institution in Ohio is especially notable: it became the epicenter of the largest cluster of cases reported in the U.S. when, on April 18, 2020, it reported that nearly 2,000 inmates and staff had tested positive for COVID-19. Over 70% of the prisoner population had tested positive. Other notable clusters include Cook County Jail in Illinois, where 812 COVID-19 cases had been reported by April 22, 2020. The Pickaway Correctional Institution in Ohio became the second largest hotspot in the U.S. when it reported 1,555 cases of COVID-19 among approximately 2,000 inmates. Table 1.6a lists the largest known institutional outbreaks of COVID-19 in the U.S. Table 1.6b lists the number of long-term care facilities by state along with COVID-19 case and death counts and their corresponding provisional case fatality rates. Tables 1.6c and 1.6d list the leading outbreaks among state colleges and universities and the aforementioned long-term care facilities, respectively.

Table 1.6d: Leading U.S. Outbreaks at Long-Term Care Facilities with over 250 Confirmed COVID-19 Cases (September 27, 2020, Adapted from N.Y. Times)
Long-Term Care Facility | City, State
Brighton Rehabilitation & Wellness Center | Beaver, PA
Bergen New Bridge Medical Center Nursing Home | Paramus, NJ
Fair Acres Geriatric Center | Lima, PA
Charlotte Hall Veterans Home | Charlotte Hall, MD
Gracedale Nursing Home | Nazareth, PA
Paramus Veterans Memorial Home | Paramus, NJ
Conestoga View Nursing and Rehabilitation | Lancaster, PA
New Jersey Veterans Memorial Home at Menlo Park | Edison, NJ
Lincoln Park Care Center Rehabilitation Facility | Lincoln Park, NJ
Spring Creek Rehabilitation & Health Care Center | Harrisburg, PA
FutureCare Lochearn Nursing Home | Baltimore, MD
Hammonton Center for Rehabilitation and Nursing | Hammonton, NJ
Deptford Center for Rehabilitation and Healthcare | Deptford, NJ
City View Multicare Center Nursing Home | Cicero, IL

By May 26, 2020, of the 403 known locations with a documented outbreak of at least 100 confirmed COVID-19 cases, 278 (68.98%) locations involving 33,181 (41.64%) cases were homes, care facilities, and rehabilitation clinics, 98 (24.32%) l institutions, and 27 (6.70%) involving 8,460 (10.62%) cases were factories, plants, and farms. While long term care facilities, in particular, accounted for only 11% of U.S. reported COVID-19 cases, they represent over 35% of U.S. reported COVID-19 deaths. By July 7, 2020, some 14,000 long-term care facilities accounted for over 296,000 reported COVID-19 cases and over 55,000 related COVID-19 deaths, which represented approximately 10% and 42% of the corresponding cases and deaths in the U.S. In particular, as of July 12, 2020, 1,716 locations nationwide (1,312 long-term care facilities) with outbreaks of at least 50 confirmed COVID-19 cases, totaled 221,226 (122,907) COVID-19 infections in the U.S.

Table 1.7: Prevalence of SARS-CoV-2 in 19 U.S. Homeless Shelters in Four Cities (Adapted from Mosites et al., 2020)
City | Testing Dates | Residents Positive, n (%) | Staff Positive, n (%)
Shelters Reporting >1 Case
Seattle | 3/30 - 4/8 | 31 (17%) | 6 (17%)
Boston | 4/2 - 4/3 | 147 (36%) | 15 (30%)
San Francisco | 4/4 - 4/15 | 95 (66%) | 10 (16%)
Shelters Reporting 1 Case
Seattle | 3/27 - 4/15 | 10 (5%) | 1 (1%)
Shelters Reporting 0 Cases
Atlanta | 4/8 - 4/9 | 10 (4%) | 1 (1%)
Total | 3/27 - 4/15 | 293 (25%) | 33 (11%)

Another potential setting for outbreaks in the U.S. is homeless shelters, due to overcrowding and conditions that make social distancing difficult to implement. The CDC responded to clusters of SARS-CoV-2 reported in homeless shelters in Boston, MA, San Francisco, CA, and Seattle, WA, from March 27, 2020 to April 15, 2020. It also tracked the prevalence of SARS-CoV-2 in 12 other emergency housing facilities in Seattle where one case had been identified, as well as at two shelters in Atlanta, GA, where no cases had yet been reported. The results are summarized in Table 1.7.

Mutations and Divergent Strains

Early Lineages

Early studies on the molecular divergence of the virus showed that there were two primary virus types: L-type and S-type. The S-type virus is the older version, less prevalent in Wuhan, China, and patients with this variant reportedly had less severe symptoms. The S-type was more common in cases outside of Wuhan, China and became increasingly prevalent worldwide. In Wuhan, China, as many as 96% of cases were related to L-type (Tang, X. et al., 2020). These conclusions may be overstated, however. The SARS-CoV-2 L- and S-types differ by only 1 nonsynonymous mutation, but as of March 2, 2020, 111 nonsynonymous mutations had been identified in SARS-CoV-2 since the beginning of the outbreak (MacLean et al., 2020). It is likely that the differences in severity seen between the S- and L-types are due to epidemiology, not the small difference in genetic sequence (ibid.).

By March 28, 2020, there were 8 strains infecting humans worldwide. However, the most divergent of these strains differed by at most 3 nucleotides. The rate of individual nucleotide substitution for SARS-CoV-2 is therefore quite low, as it is for other coronaviruses, estimated at 8 × 10⁻⁴ substitutions per site per year, which is 2-4 times slower than that of influenza (Rambaut, 2020). For a virus with as many as 30,000 nucleotides, this would correspond to an estimated 24 nucleotide substitutions per year. Because there is redundancy in the possible mRNA codons that code for a specific amino acid, not all mutations change the protein coded by the virus; those that do are known as nonsynonymous mutations. The radial and rectangular phylogenetic trees of SARS-CoV-2 from early December 2019 to June/September 2020 are given in Figures 1.9a and 1.9b, the latter showing the currently estimated mutation rate. A table of several viruses by type, genome size, and mutation rate (as substitutions per nucleotide site per cell infected and substitutions per nucleotide site per year) is given in Table 1.8.
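
The conversion from a per-site rate to a genome-wide rate is a single multiplication, sketched below with the two figures quoted above. The variable names are illustrative, and the genome length of 30,000 nucleotides is the rounded value used in the text rather than an exact reference length.

```python
RATE_PER_SITE_PER_YEAR = 8e-4   # substitutions per site per year (from the text)
GENOME_LENGTH = 30_000          # nucleotides, rounded as in the text

substitutions_per_year = RATE_PER_SITE_PER_YEAR * GENOME_LENGTH
substitutions_per_month = substitutions_per_year / 12
days_per_substitution = 365 / substitutions_per_year

print(f"{substitutions_per_year:.0f} substitutions per genome per year")
print(f"{substitutions_per_month:.0f} substitutions per genome per month")
print(f"about one substitution every {days_per_substitution:.0f} days")
```

This reproduces the estimate of 24 substitutions per year, or about two per month along a single lineage.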

Figure 1.9a.1: Radial Phylogenetic Tree of SARS-CoV-2 by Older Clade (A2a-A7, color-coded) as of mid-June 2020 (Adapted from

Figure 1.9a.2: Radial Phylogenetic Tree of SARS-CoV-2 by New Clade (19A-20C, color-coded) as of mid-September 2020 (Adapted from

Table 1.8: Several Viruses by Genome Size and Mutation Rates (September 2020, Adapted from Sanjuan et al., 2010)
Genome Size [kb]
Rate [s/n/c]
Rate [s/n/y]
Bacteriophage Qβ
1.1 × 10⁻³
Human rhinovirus-14 (HRV-14)
0.1-4.8 × 10⁻⁴
8 × 10⁻⁴
0.8-2.38 × 10⁻³
Poliovirus-1 (PV-1)
0.22-3.0 × 10⁻⁴
Hepatitis C virus (HCV)
1.2 × 10⁻⁴
0.82 × 10⁻³
Human immunodeficiency virus (HIV)
1.0-4.9 × 10⁻⁵
1.7 × 10⁻³
Influenza A
0.71-4.5 × 10⁻⁵
2.28 × 10⁻³
Human T-cell leukemia virus-1
1.6 × 10⁻⁵
Tobacco Mosaic Virus (TMV)
8.7 × 10⁻⁶
Influenza B
1.7 × 10⁻⁶
Herpes simplex virus-1 (HSV-1)
5.9 × 10⁻⁸

By April 8, 2020, Forster et al. had identified three central clusters of lineages of SARS-CoV-2 (A, B, and C), each carrying distinct nonsynonymous mutations, for the purposes of tracking global distribution of the virus. Their analysis of 160 SARS-CoV-2 genomes identified a central node A as most ancestral to other human strains. Two subclusters of A lineages differ by one synonymous mutation (T29095C), which changes a thymine to a cytosine at nucleotide site 29095. Both T-allele and C-allele subclusters were found predominantly in viral genomes isolated from Chinese and East Asian patients. However, about half of patients (15 of 33) with C-allele subcluster viral genomes were from the U.S. or Australia, and the U.S. patients carrying virus in the T-allele subcluster all had a history of living in Wuhan, China, suggesting that this strain is most common to this region. A secondary cluster of lineages, B, differs from A by two mutations, a synonymous T8782C mutation and a nonsynonymous C28144T mutation, which converted a leucine residue into a serine. 74 of 93 Type B genomes were isolated from patients in Wuhan, eastern China, or other parts of East Asia. Curiously, the Type B strains outside of East Asia showed a higher degree of mutations, perhaps suggesting that Type B was well adapted to East Asian populations immunologically but may have had to adapt to overcome some resistance in populations outside of this region. Type C differentiates itself from Type B by the nonsynonymous G26144T mutation, which switches a glycine residue for a valine. At the time, the authors reported that this was the most common type isolated from patients in Europe, particularly France, Sweden, Italy, and England. It had also been isolated from patients in California and Brazil.
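
Labels such as T8782C follow a compact convention: reference nucleotide, 1-based genome position, then the replacement nucleotide. The sketch below parses the four labels mentioned above and applies one substitution to a toy sequence. The class and function names are illustrative assumptions, and the six-letter toy sequence stands in for a real genome, which is not reproduced here.

```python
import re
from typing import NamedTuple


class Substitution(NamedTuple):
    ref: str       # nucleotide in the reference sequence
    position: int  # 1-based position in the genome
    alt: str       # nucleotide after the substitution


_LABEL = re.compile(r"^([ACGTU])(\d+)([ACGTU])$")


def parse_substitution(label):
    """Parse a label such as 'T8782C' into its three components."""
    match = _LABEL.match(label.strip().upper())
    if match is None:
        raise ValueError(f"not a single-nucleotide substitution label: {label!r}")
    ref, position, alt = match.groups()
    return Substitution(ref, int(position), alt)


def apply_substitution(sequence, sub):
    """Return a copy of sequence with the substitution applied."""
    index = sub.position - 1
    if not 0 <= index < len(sequence):
        raise ValueError("position lies outside the sequence")
    if sequence[index] != sub.ref:
        raise ValueError("reference nucleotide does not match the sequence")
    return sequence[:index] + sub.alt + sequence[index + 1:]


labels = ["T29095C", "T8782C", "C28144T", "G26144T"]
parsed = [parse_substitution(label) for label in labels]
for sub in parsed:
    print(f"site {sub.position}: {sub.ref} -> {sub.alt}")

toy_sequence = "ACGTAC"  # toy example only, not viral sequence
mutated = apply_substitution(toy_sequence, parse_substitution("T4C"))
print(mutated)
```

Checking the reference nucleotide before substituting is a deliberate design choice here: it catches labels applied to the wrong coordinate system or the wrong reference sequence.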

Figure 1.9b: Time Series of the Phylogenetic Tree of SARS-CoV-2 by New Clade (19A-20C, color-coded) as of mid-September 2020 (Adapted from

A genetic analysis by Mt. Sinai Hospital established that most reported COVID-19 cases in New York originated from European SARS-CoV-2 strains, not Asian ones, presumably through seeding events involving international travel from Europe. By mid-July 2020, the predominant global clade was A2a (teal), which has been further subdivided into a new global clade (19A-20C). Following the analysis of many hundreds of thousands of samples, it is estimated that SARS-CoV-2 (mainly A2a, 20A/B) mutates at a rate of approximately 22.3 nucleotide substitutions per year (Figure 1.9b).

By the time of its publication on April 23, 2020, data from Yao et al. provided direct evidence that SARS-CoV-2 had acquired at least 30 different genetic variations capable of substantially changing its pathogenicity. The researchers characterized 11 patient-derived viral isolates from Hangzhou, where 1,264 cases had been reported, and studied how efficiently the different viral strains could infect and kill Vero-E6 cells (a cell line derived from kidney epithelial cells of an African green monkey). The isolates exhibited significant variation in cytopathic effects and viral load, with differences of up to 270-fold when infecting Vero-E6 cells. Moreover, Yao et al. (2020) observed intrapersonal variation and found 6 different mutations in the spike glycoprotein (S protein), including 2 different SNVs that lead to the same missense mutation (ibid.).

Mutations in the regions of the SARS-CoV-2 genome that encode the spike protein are of particular interest, especially those that code for the receptor binding domain, as these mutations have the potential to alter the transmissibility and virulence of the virus. From April through early December 2020, the COVID-19 Genomics UK Consortium identified at least 4,000 mutations in the spike protein alone. Mutations that affect the epitopes of the virus are also crucial for study as they drive antigenic drift, which in turn affects the efficacy of vaccines and protective antibodies against the virus.

D614G Mutation

On May 5, 2020, Korber et al. (2020) identified specific variants of the spike protein that showed signs of positive selection, as indicated by their increasing prevalence at the time. The D614G mutation, arising from an adenine-to-guanine substitution at site 23,403 of the genome that changes an aspartic acid (D) to a glycine (G) residue in the spike protein, is of particular note: the authors identify it as the defining mutation of a SARS-CoV-2 clade that arose in Europe and grew in frequency. The mutation is often accompanied by two other mutations, one that is synonymous and another that changes one amino acid residue in the RNA-dependent RNA polymerase of the virus, which is crucial for viral replication. The authors note that in regions where the D614 form was present early in the pandemic (such as in Europe, Australia, Canada, and the U.S.), once G614 entered the population, it quickly increased in frequency and in many cases became the predominant form within weeks. This result suggests that the increasing frequency of the mutation originated from a selective advantage bestowed by the mutation rather than from a founder effect.
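The link between genome site 23,403 and residue 614 of the spike protein can be checked with a little arithmetic. The sketch below rests on two assumptions that are not stated in this guide: that the spike open reading frame begins at nucleotide 21,563 of the reference genome, and that the reference codon for D614 is GAT. The helper names are illustrative.

```python
# Map a genome coordinate to a codon in an ORF and apply a single-base substitution.
SPIKE_ORF_START = 21563  # assumption: first nucleotide of the spike ORF in the reference genome

# Only the two codons needed for this illustration (standard genetic code).
CODON_TABLE = {"GAT": "D", "GGT": "G"}


def locate_in_orf(genome_pos, orf_start):
    """Return (codon_number, base_within_codon), both 1-based."""
    offset = genome_pos - orf_start
    if offset < 0:
        raise ValueError("position lies upstream of the ORF start")
    return offset // 3 + 1, offset % 3 + 1


def apply_substitution(codon, base_within_codon, new_base):
    """Replace one base (1-based index) of a three-letter codon."""
    i = base_within_codon - 1
    return codon[:i] + new_base + codon[i + 1:]


codon_number, base_index = locate_in_orf(23403, SPIKE_ORF_START)
mutant_codon = apply_substitution("GAT", base_index, "G")  # assumed reference codon GAT
print(codon_number, base_index, CODON_TABLE["GAT"], "->", CODON_TABLE[mutant_codon])
# prints: 614 2 D -> G
```

Under these assumptions, site 23,403 is the second base of codon 614, and an A-to-G change there turns GAT (aspartic acid) into GGT (glycine).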

Korber et al. suggest two possible mechanisms for the enhanced fitness of the mutation, one of which concerns the intermolecular forces between the protomers of the spike protein that may enhance binding to ACE2. It is also possible that the mutation helps the virus evade neutralizing antibody activity, as the affected amino acid is located within a region that can bind to various SARS-CoV-2 antibodies (i.e., an epitope). This second mechanism, if at work, could give rise to an increasing number of secondary infections. The authors also found that patients infected with virus carrying the D614G mutation had higher viral loads, as samples from these patients needed fewer PCR cycles for viral detection. However, the authors did not find any association between the mutation and an increase in disease severity.
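The inference from PCR cycles to viral load follows from the way PCR amplifies its target: each cycle multiplies the template by a roughly constant factor, so a sample that crosses the detection threshold earlier started with more template. The sketch below is a rough illustration that assumes ideal doubling per cycle by default; the function name and the example cycle counts are hypothetical and are not taken from Korber et al.

```python
def fold_difference_from_ct(ct_reference, ct_sample, efficiency=1.0):
    """Estimate how many times more template the sample holds than the reference.

    Assumes the amount of product multiplies by (1 + efficiency) every cycle,
    where efficiency = 1.0 corresponds to perfect doubling.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in the interval (0, 1]")
    return (1 + efficiency) ** (ct_reference - ct_sample)


# Hypothetical cycle thresholds: detection three cycles earlier than the reference.
print(fold_difference_from_ct(ct_reference=25, ct_sample=22))  # 8.0
```

Under ideal doubling, a difference of three cycles corresponds to about an eight-fold difference in starting material; real assays are less efficient, so such figures are best read as relative estimates.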

Mink Variants

On November 6, 2020, the World Health Organization reported that 214 human cases of COVID-19 traced to SARS-CoV-2 variants associated with mink farms had been reported in Denmark. Twelve cases, all of which were reported in September 2020 in Jutland, Denmark, were associated with a specific variant known as Cluster 5, which was found on five mink farms in the local region. Of these twelve cases, eight individuals had direct links to the mink farming industry.

The Cluster 5 variant contains five different mutations, which cause changes to three amino acids and two deletions in the spike protein. Preliminary evidence seems to suggest that the variant is more resistant to neutralizing antibodies, but further studies will be necessary to verify these findings.

By November 13, 2020, Danish researchers had identified a total of 170 variants of SARS-CoV-2 across 40 mink farms. A total of 300 COVID-19 cases in Denmark could be traced back to variants found on mink farms. While there is limited evidence to suggest that these variants lead to higher rates of transmission or to a higher likelihood of serious disease, some variants may lead to strains that are more resistant to current COVID-19 vaccine candidates.


VUI-202012/01 (Variant Under Investigation in December 2020), also known as lineage B.1.1.7, is a variant of SARS-CoV-2 that was first identified by the COVID-19 Genomics UK Consortium in October 2020. The first sample identified with this variant was from an individual in the United Kingdom who was sampled on September 20, 2020 in Kent. In a news briefing from the BMJ published on December 16, 2020, Jacqui Wise elaborates on the variant's characteristics. It is specifically defined by a set of 17 non-synonymous mutations, one of which is the N501Y mutation, which results in the replacement of the amino acid asparagine (N) with tyrosine (Y) at the 501st amino acid position. According to a report released by the COVID-19 Genomics UK Consortium on December 19, 2020, other mutations that affect the spike protein are a deletion of amino acids 69-70, a deletion of amino acid 144, A570D, the previously described D614G mutation, P681H, T716I, S982A, and D1118H. The authors state that the 69-70 deletion may be tied to enhanced evasion of the human immune response. They also note that the P681H mutation is immediately adjacent to the furin cleavage site, which has biological importance for entry into respiratory epithelial cells. They note that both of these mutations have been previously observed but not in combination. Table 1.9 shows the affected genes, specific nucleotide sequences, and amino acids that the authors report gave rise to the B.1.1.7 lineage. Of particular interest is one mutation outside of the spike gene that results in a premature stop codon, leading to a truncated version of ORF8 and causing it to become inactive. There are also six synonymous mutations; five of them are in the ORF1ab gene and one is in the M gene.

Perhaps more troubling, the N501Y mutation occurs in the spike protein's receptor binding domain (RBD) and may be responsible for making the virus more infectious. Rambaut et al. (2020) state that the mutation shows increased binding affinity of the virus's RBD to murine ACE2. The rapid spread of SARS-CoV-2 infection in Great Britain from October through December 2020 is thought to be partially attributable to the increasing prevalence of the new variant, which as of December 13, 2020, had been identified in 1,108 cases from 60 different local authorities in the U.K., a number believed to be substantially lower than the true number of cases. As of December 15, 2020, Rambaut et al. report that 1,623 genomes belonging to the B.1.1.7 lineage had been sequenced: 519 were sampled in Greater London, 555 were sampled in Kent, 545 were sampled in other UK regions such as Scotland and Wales, and four were sampled outside of the UK. As of December 20, 2020, a small set of cases had also been identified in Denmark, Belgium, Italy, the Netherlands, and Austria. Furthermore, a distinct strain carrying the N501Y mutation, known as the 501.V2 variant, was also identified in South Africa on December 18, 2020. In mid-December 2020, the New and Emerging Respiratory Virus Threats Advisory Group also raised concerns over the possibility that this variant may be resistant to current vaccine candidates and to antibodies more generally, as the full set of mutations may make it antigenically distinct from previous variants. They specifically cited four likely SARS-CoV-2 reinfections that had been identified within a set of 915 new cases of the VUI-202012/01 variant.

Rambaut et al. speculate that the appearance of a large number of mutations in B.1.1.7 suggests that it may have arisen in an immunocompromised patient chronically infected with SARS-CoV-2, possibly one who had received convalescent plasma as part of their therapy. Such treatment has previously been noted to drive genetic diversity (Kemp et al., 2020).

Table 1.9: Mutations that gave rise to B.1.1.7 Lineage
(Adapted from Rambaut et al., 2020)
Gene | Nucleotide | Amino Acid
 | 11288-11296 deletion | SGF 3675-3677 deletion
Spike (ORF2) | 21765-21770 deletion | HV 69-70 deletion
Spike (ORF2) | 21991-21993 deletion | Y144 deletion
Spike (ORF2) | |
Spike (ORF2) | |
Spike (ORF2) | |
Spike (ORF2) | |
Spike (ORF2) | |
Spike (ORF2) | |
Nucleocapsid (ORF9) | 28280 GAT → CTA |
Nucleocapsid (ORF9) | |

Other Notable Mutations

Korber et al. (2020) also identified an S943P (Serine (S) to Proline (P)) mutation, which was found only in Belgium. The mutation is notable because it is found in disparate lineages of the phylogeny of the virus, which suggests that it did not originate from a single founder but arose from a recombination event. For this to occur, the authors note that a host must have been simultaneously infected with two distinct viral genomes. The authors also identify eleven other notable mutations that affect the spike protein: L5F and L8V (two signal peptide mutations), V367F, G476S, and V483A (these three are found in the RBD region), H49Y, Y145H/del, and Q239K (these three are located in the N-terminal S1 domain), A831V and D839Y/N/E (both located near the fusion peptide of the S2 domain), and P1263L.

The ORF3b protein product of SARS-CoV-2 is dramatically shorter than its SARS-CoV-1 counterpart due to the presence of four premature stop codons. The SARS-CoV-1 protein product is associated with potent type 1 interferon inhibition (Kopecky-Bromberg et al., 2007), but Konno et al. (2020) report that type 1 interferon inhibition is further enhanced in the truncated SARS-CoV-2 version of the protein. Konno et al. also identify and describe two SARS-CoV-2 sequences isolated from Ecuadorian patients in which the ORF3b genomic sequence was extended due to the loss of the first premature stop codon. The two ORF3b genomic sequences were more than 99.6% identical, and the proteins they encoded were identical. IFNβ reporter assays revealed that the elongated variant of the SARS-CoV-2 ORF3b protein demonstrated significantly higher anti-IFN-I activity. Since the inhibition of type 1 interferon is associated with a worsened clinical outcome, the new variants isolated may be associated with more deleterious SARS-CoV-2 strains.

Infections in Non-human Species

While SARS-CoV-2 shares substantial homology with the coronaviruses found in reservoir species, the true intermediate species (e.g., civets for SARS-CoV-1) that may have transferred the virus to humans has not yet been identified. As yet, no virus sampled in non-human species shares enough homology to be considered a direct phylogenetic ancestor of SARS-CoV-2. Data concerning which animal species SARS-CoV-2 can infect will provide further insight into potential natural reservoirs of the disease, particularly in domesticated animals that live in close contact with humans, such as pets and livestock.

On April 5, 2020, several news outlets began reporting a SARS-CoV-2 infection (confirmed by RT-PCR) in a four-year-old Malayan tiger residing in the Bronx Zoo in New York City. The tiger was speculated to have become infected by a possibly asymptomatic zoo employee. Six other felines in the zoo were reported to have a dry cough, loss of appetite, and some wheezing, leading to the suspicion of a SARS-CoV-2 infection in another Malayan tiger, two Amur tigers, and three African lions.

Researchers from the Harbin Veterinary Research Institute of the Chinese Academy of Agricultural Sciences sought to identify animal species that could potentially become infected with SARS-CoV-2. Two distinct SARS-CoV-2 viral samples (one isolated from an environmental sample in Huanan Seafood Market in Wuhan, China and the other isolated from a human patient) were used to study the viral dynamics of animal infections in various species. Four ferrets were inoculated with the virus intranasally, two ferrets per sample source. Four days after inoculation, the ferrets were sacrificed. SARS-CoV-2 RNA was detected in the nasal turbinate, soft palate, and tonsils of all four animals tested (Shi, J. et al., 2020). No viral RNA was detected in the lung, heart, spleen, kidney, pancreas, small intestine, or brain samples also extracted from the subjects. These results suggest that the virus can only replicate effectively in the upper respiratory tract of ferrets. Six other ferrets were also infected with the virus samples (three per sample), and these animals were monitored and tested by the researchers to observe symptoms and viral load. Viral RNA was detected in the nasal washes of all ferrets on Days 2, 4, 6, 8, and 10 after infection and also in the rectal swabs of the ferrets, but viral load was considerably lower in the rectal samples. Moreover, infectious virus was only detected in the nasal washes of the animals. One animal in each group of three developed a fever and loss of appetite (one on Day 10 and the other on Day 12). These two animals were sacrificed on Day 13; their lungs showed signs of vasculitis (severe inflammation of blood vessels), increased counts of type II pneumocytes, neutrophils, and macrophages around the alveoli, and mild peribronchitis. On Day 20, the remaining four ferrets were euthanized; all six animals had neutralizing antibodies, and those sacrificed on Day 20 had higher antibody counts than the two sacrificed on Day 13.

Shi, J. et al. (2020) conducted similar studies to test for viral susceptibility in domesticated cats, dogs, and livestock. They confirmed that the virus was transmissible by respiratory droplets between domestic cats. Viral RNA was detected in the nasal turbinates, soft palates, tonsils, trachea, and small intestine in at least one of the adult cats later sacrificed. No viral RNA was detected in lung tissue, however. Neutralizing antibodies for SARS-CoV-2 were detected in all infected adult cats tested. Juvenile cats showed a worsened clinical course. Histological samples revealed large lesions in the nose, trachea, and lungs of the infected juvenile cats. Rectal samples taken from beagles infected with the virus tested positive for viral RNA, but no infectious virus was present. Furthermore, the virus was undetectable in any tissue samples taken in the infected dogs. Two of the four dogs tested showed antibodies for the virus, and two did not. Taken together, the evidence suggests that SARS-CoV-2 cannot replicate as efficiently in dogs as it can in cats and ferrets. The researchers were not able to detect viral RNA in samples collected from pigs, chickens, and ducks inoculated with the viral samples, suggesting that these species are not vulnerable to a SARS-CoV-2 infection.

Nevertheless, the susceptibility of pigs warrants further investigation, as Zhou, P. et al. (2020) were able to demonstrate that SARS-CoV-2 can infect HeLa cells that expressed ACE2 receptors from humans, horseshoe bats, civets, and pigs, but not mice. Moreover, Chen, W. et al. (2005) identified two pigs that had developed SARS-CoV-1 antibodies (a total of 242 domestic animals living in close contact with humans were surveyed in the study), one of which tested positive for the virus by RT-PCR performed on fecal samples. Viral isolates were obtained from this animal in both blood and fecal samples. The study performed by Shi, J. et al., which concluded that pigs were not vulnerable to SARS-CoV-2, used only five pigs, all of which were juveniles under 40 days of age, which may have contributed to a more robust immune response to the virus.

Sit et al. (2020) studied dogs from 15 different households with confirmed human SARS-CoV-2 infection. Of these, two dogs were confirmed to test positive for the virus from nasal and oral swabs collected for testing by RT-PCR. Their respective viral genomes were sequenced and confirmed to be identical to the virus detected in the humans from these homes. These dogs also showed antibody responses in plaque reduction SARS-CoV-2 neutralization assays. The two dogs were a 17-year-old Pomeranian and a 2.5-year-old German Shepherd, and both were clinically asymptomatic during their period of quarantine. The results suggest that in some cases, humans can transmit the virus to dogs, but it remains unclear whether animal-to-human transmissions can occur.

Rhesus macaques infected with SARS-CoV-2 in a laboratory setting have shown symptoms of infection (Bao et al., 2020), although as of April 19, 2020, no other non-human primates have been confirmed with SARS-CoV-2 infection. Nevertheless, all apes (including bonobos, chimpanzees, gorillas, and orangutans) and all African and Asian monkeys may show enhanced susceptibility to SARS-CoV-2 infection because their ACE2 receptors contain the key amino acids that have been identified to interact with the RBD of SARS-CoV-2 (Melin et al., 2020). American monkeys and some tarsiers and lemurs, however, show increased deviation at these key residues. Protein modeling reveals that these deviations limit the ability of the virus to efficiently bind to ACE2 receptors, which likely decreases susceptibility to infection in these species.

A July 2020 study conducted on household pets in northern Italy found rates of prior SARS-CoV-2 infection in both cats and dogs to be comparable to human infection rates (Patterson et al., 2020). The researchers sampled 817 pets from northern Italy during the COVID-19 outbreak in the region. Of these animals, 540 were dogs and 277 were cats, and all tested negative by RT-PCR methods using oropharyngeal and/or rectal samples, indicating no active SARS-CoV-2 infection in these animals. However, for those animals where blood sera were collected, SARS-CoV-2 neutralizing antibodies were detected in 3.35% of dogs and 3.95% of cats, indicative of previous infection. Not all pets that were seropositive came from households with known previous COVID-19 infection, but the animals all came from regions with high rates of human infection. While previous studies have identified cats as being particularly susceptible to SARS-CoV-2, evidence that dogs may be equally susceptible has not been as clear. These results, from the SARS-CoV-2 study that sampled the largest number of animals at the time of its publication, strongly indicate that susceptibility in dogs should be reevaluated and further investigated.

Previous Pandemics and Global Epidemics

Several pathogens have become a pestilence upon the human species during recorded history, and not all have been viral in origin. The most notable of these is the Bubonic Plague (or Black Death) of 1347-1351, caused by Yersinia pestis, a rod-shaped coccobacillus bacterium, which was responsible for an estimated 75-200 million deaths worldwide including 30-60% of the population of Europe. The non-human animal reservoir for such devastation was later determined to be a flea species enzootic to rodents, mainly the black rat (Rattus rattus), which transferred Yersinia pestis to humans through flea bites. Nearly a millennium earlier, from 541 to 751, it is believed that the same pathogen caused 25 million deaths in the Plague of Justinian, amounting to the loss of about 50% of the population of Europe at the time. The global population was approximately 200 million during the Plague of Justinian and 475 million during the Black Death.

More recently, the Flu Pandemic of 1918-1920 (formerly Spanish Flu[5]), caused by the influenza A virus (subtype H1N1), infected an estimated 500 million individuals worldwide and is believed to have caused 50-100 million deaths through three infection waves from the spring of 1918 to the winter of 1919/1920 and a significantly smaller fourth wave in the spring of 1920. The second wave was the most deadly of the four, due in part to the increased virulence of the virus in late 1918, a result of natural mutations through millions of hosts. Premature governmental responses to the first wave through the quick reopening of schools and businesses to counter the consequences of World War I (WWI) are also to blame. The case fatality rate was further exacerbated by weakened supply chains that led to rampant malnutrition in addition to overcrowding of hospitals and clinics, which contributed to poor hygiene and lethal bacterial superinfections. Many survivors of the second wave had been victims of the first and, therefore, had developed a protective immunity to the virus, sparing them from further danger. However, the curious age pattern of mortality leads to more questions, many unanswered, with a peak age for mortality occurring at 28 years in the U.S., Canada, and several locations in Europe. Gagnon et al. (2013) have proposed that early life exposure to the Russian Flu Pandemic of 1889-1894 may have resulted in an immunological memory that later contributed to a dysregulated immune response (leading to cytokine storm, for example) to the antigenically novel influenza strains of 1918-1920. In contrast, much younger and much older individuals were spared, likely due to early exposures to more antigenically similar flu viruses.
Starko (2007) suggests differently, however, that the high mortality rate may have resulted from the overuse of acetylsalicylic acid (i.e., Aspirin) in quantities as large as 8-31 grams per day following recommendations made by the Surgeon General of the U.S. Army and the Journal of the American Medical Association as part of an experimental treatment protocol with an abundant drug that had recently expired from patent protection. Such doses are now known to be responsible for hyperventilation (33%) and also pulmonary edema (3%) in patients receiving them. No matter the actual causes, adjusting for population growth, a similar pandemic would result in 200-425 million deaths today.
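The population adjustment behind the final estimate can be sketched as a simple proportional scaling. The world population figures used here (about 1.8 billion in 1918 and about 7.8 billion in 2020) are assumptions introduced for illustration and do not come from this guide, so the result only approximately matches the 200-425 million range quoted above.

```python
WORLD_POPULATION_1918 = 1.8e9  # assumption: approximate world population in 1918
WORLD_POPULATION_2020 = 7.8e9  # assumption: approximate world population in 2020


def scale_deaths(deaths_then, population_then=WORLD_POPULATION_1918,
                 population_now=WORLD_POPULATION_2020):
    """Scale a historical death toll to a modern population, holding the death fraction fixed."""
    return deaths_then * population_now / population_then


low = scale_deaths(50e6)    # lower estimate of 1918-1920 deaths
high = scale_deaths(100e6)  # upper estimate of 1918-1920 deaths
print(f"{low / 1e6:.0f}-{high / 1e6:.0f} million deaths")
```

The scaling holds the fraction of the population killed constant and ignores changes in medicine, demographics, and travel, so it is only a first-order comparison.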

Figure 1.10: Time Series of U.K. Deaths due to the Influenza Pandemic of 1918-1920 indicating the initial three Waves (Adapted from Wikipedia)

Several other influenza-A-subtype pandemics have occurred in the last few centuries including, for example, most recently, the Flu Pandemic of 2009 (H1N1), the (Hong Kong) Flu Pandemic of 1968-1970 (H3N2), the (Asian) Flu Pandemic of 1957-1958 (H2N2), and the (Russian) Flu Pandemic of 1889-1890 (possibly H2N2, H3N8, or Betacoronavirus HCoV-OC43).

HIV-1 (groups M, N, O, and P) and HIV-2 (groups A, B, C, D, E, F, G, and H), the family of sexually transmitted zoonotic viruses responsible for the current Global AIDS Epidemic (not officially deemed a pandemic by the WHO), caused approximately 58.3-98.1 million infections worldwide and approximately 23.6-43.8 million related deaths from 1981 to 2018. HIV-1 was identified in 1976 in Zaire but can be traced as far back as 1910 to Kinshasa, Belgian Congo (now the Democratic Republic of the Congo) and is genetically similar to Simian Immunodeficiency Virus (SIV). The earliest known human case dates to 1959 and is believed to have been transmitted by consumption of bushmeat. The specific animal reservoir of HIV-1, the more virulent and highly infective species of the two, is believed to be the common chimpanzee (Pan troglodytes troglodytes). The reservoir of HIV-2, which appears to be isolated to West Africa, is 5-30 times less transmissible, and is less responsive to treatments than HIV-1, is believed to be the sooty mangabey (Cercocebus atys atys).

Table 1.10: Significant Epidemics in the Common Era (October 19, 2020)
Cases [M]
Deaths [M]
Antonine Plague
Near East
Eurasia, Africa
Justinian Plague
Eurasia, Africa
Black Death
Eurasia, Africa
First Cocoliztli Epidemic
Second Cocoliztli Epidemic
Third Cholera Pandemic
Third Plague Pandemic
(Russian) Flu Pandemic
(Spanish) Flu Pandemic
(Asian) Flu Pandemic
AIDS Pandemic
(Hong Kong) Flu Pandemic
Flu Pandemic of 2009
COVID-19 Pandemic

Due to the advent of antiretroviral medications, the mortality rate of HIV/AIDS has decreased by 56% since its height in 2004 and by 40% since 2010. Currently, the epicenter of the Global AIDS Epidemic is in Southern Africa, with an estimated 7.1 million infections and approximately 110,000 annual deaths, where the adult prevalence exceeds 27% in Eswatini (Swaziland), 25% in Lesotho, 21.9% in Botswana, and 3.1-18.9% in several smaller surrounding countries. It is followed by Nigeria, with an estimated 3.2 million infections (2.9%) and approximately 160,000 annual deaths, and India, with an estimated 2.1 million infections (0.22%) and approximately 67,000 annual deaths. At the time of writing, there is no known vaccine for either HIV-1 or HIV-2.

The suspected origin, geographic reach, and estimated number of deaths of recent pandemics and global epidemics are given in Table 1.10. In nearly all of the aforementioned examples, Africans and North Asians, predominantly the economically disadvantaged and otherwise underserved, were most impacted in terms of both infections and fatalities compared to all other demographics.

With regard to the present situation, since SARS-CoV-2 is currently mutating as it propagates across the globe, the likelihood of a sustained COVID-19 Pandemic, possibly in several waves, cannot be ruled out at this time. However, because SARS-CoV-2 has about twice the genomic length of seasonal influenza and accumulates about half as many nucleotide substitutions per year (24 vs 50), it mutates at about one-quarter of the rate of influenza in terms of substitutions per site per year (See Mutations and Divergent Strains). The seasonal frequency and mutation rate of influenza are precisely why new vaccinations are required on a yearly basis. At present, it is unclear whether a similar or delayed vaccination rate would arise for SARS-CoV-2, assuming a vaccine is found at all.
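The per-site comparison can be reproduced by dividing each yearly substitution count by the corresponding genome length. The genome lengths used below (about 29,903 nucleotides for SARS-CoV-2 and about 13,500 for influenza A) are approximate values assumed for illustration and are not given in this guide.

```python
SARS_COV_2_GENOME_NT = 29_903   # assumption: approximate reference genome length
INFLUENZA_A_GENOME_NT = 13_500  # assumption: approximate total across the eight segments


def per_site_rate(substitutions_per_year, genome_length_nt):
    """Convert whole-genome substitutions per year to substitutions per site per year."""
    return substitutions_per_year / genome_length_nt


sars_cov_2_rate = per_site_rate(24, SARS_COV_2_GENOME_NT)
influenza_rate = per_site_rate(50, INFLUENZA_A_GENOME_NT)
ratio = sars_cov_2_rate / influenza_rate

print(f"SARS-CoV-2: {sars_cov_2_rate:.2e} substitutions/site/year")
print(f"Influenza:  {influenza_rate:.2e} substitutions/site/year")
print(f"Ratio:      {ratio:.2f}")
```

With these assumed lengths, the ratio comes out somewhat below one-quarter, which is consistent with the rounded "twice the length, half the substitutions" reasoning in the text.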

Viral Inactivation

While it is paramount to understand the underlying structure and primary action of any pathogen in the human host for the development of treatments for the infected and potential vaccines for the healthy, it is equally valuable to understand vulnerabilities that may contribute to mitigating or evading infection in the first place. In particular, there are at least three ways to prevent a pathogen from effectuating its purpose, that is, to inhibit its ability to infect cells and/or multiply. For viruses, these are 1.) mechanical removal or physical distancing from the host prior to infection (See Preparations and Recommendations), 2.) inactivation, which involves altering the lipid/protein coat and effectively weakening or disabling the virus from its normal action, and 3.) denaturing, which is an extreme form of inactivation and involves dismantling the virus into its constituent parts. Viral inactivation may involve several concurrent processes including, but not limited to, the following:

  1. Chemical inactivation (such as acids/bases and reactive species)
  2. Pasteurization inactivation (such as heat)
  3. Radiation inactivation (such as far-UVC light)
  4. Solvent/detergent/surfactant inactivation (such as soap)

Oxygen, Heat, pH, and Radiation

The composition of air is primarily Nitrogen (78.084%), Oxygen (20.9476%), Argon (0.934%), and Carbon Dioxide (0.0314%), and the remainder is composed of several other gases in trace quantities. Oxygen, while vital to an animal host, is a known corrosive, causing oxidation in several metals, for example, and can inactivate several enveloped viruses given sufficient time, at a rate that depends on the chemical composition of the envelope proteins. Other reactive oxygen species, such as vaporized hydrogen peroxide, have been used for viral decontamination of personal protective equipment (PPE) currently used in hospitals and clinics.

SARS-CoV-2 becomes increasingly unstable as temperature rises. Chin et al. (2020) demonstrated that SARS-CoV-2 in transport medium is most stable at 4°C, experiencing only a 0.7 log-unit reduction in infectious viral titer over 14 days (approximately a 5-fold reduction to 20% of the original viral titer). In contrast, at 70°C under otherwise identical conditions, the viral sample was completely inactivated in 5 minutes and showed a 1.47 log-unit reduction (almost a 30-fold reduction) in infectious titer by one minute. When tested at 56°C, the sample was inactivated in 30 minutes (but still present at 10 minutes, albeit with a 2.97 log-unit reduction in infectious viral titer), and when tested at 37°C, the sample was inactivated in 2 days (at 1 day, the viral titer had been reduced by 3.58 log-units, roughly a 3800-fold decrease). Finally, at 22°C, the sample was still viable at 7 days (but showed a significant 3.4 log-unit reduction by this point) and completely inactive by 14 days. Figure 1.11 illustrates all of the data reported for the stability of SARS-CoV-2 in transport medium at the five temperatures tested. At room temperature (22°C), Chin et al. (2020) demonstrated that SARS-CoV-2 remained stable over a wide range of pH values (3-10), with similar log-unit reductions reported after incubation for 1 hour at each pH tested.
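The fold reductions quoted alongside the log-unit values follow from the definition of a base-10 logarithm: a reduction of x log-units means the titer fell by a factor of 10 to the power x. A minimal sketch of the conversion, with illustrative function names, is:

```python
def fold_reduction(log10_reduction):
    """Convert a log10-unit reduction in titer to a fold reduction."""
    return 10 ** log10_reduction


def percent_remaining(log10_reduction):
    """Percentage of the original titer that remains after the reduction."""
    return 100 / fold_reduction(log10_reduction)


# Log-unit reductions quoted in the text from Chin et al. (2020)
for label, log_units in [("4°C, 14 days", 0.7), ("70°C, 1 minute", 1.47),
                         ("56°C, 10 minutes", 2.97), ("37°C, 1 day", 3.58)]:
    print(f"{label}: {fold_reduction(log_units):.0f}-fold, "
          f"{percent_remaining(log_units):.3f}% remaining")
```

For example, 0.7 log-units corresponds to about a 5-fold reduction (about 20% remaining), and 3.58 log-units to roughly a 3800-fold reduction.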

Far-UVC light (specifically 222 nm light at 2 mJ/cm²) has been shown to efficiently inactivate airborne aerosolized viruses. At this dosage and wavelength, for example, far-UVC light was found to inactivate over 95% of aerosolized H1N1 influenza virus (Welch et al., 2018). At the same time, the study found that this treatment was relatively safe even when applied to biological surfaces. Radiation with a wavelength of 222 nm did not damage mammalian cells because of its strong absorbance in the outer layers of the skin and eyes, whereas germicidal radiation at 254 nm, more commonly used in some sterilization protocols, is associated with potential damage in tissues. The study authors recommend the use of this particular wavelength in overhead lamps in public spaces as a means of limiting the transmission of a wide range of microbial infections of both bacterial and viral origin. [NB: Light of this wavelength should be tested for its potential efficacy in inactivating SARS-CoV-2.] Buonanno et al. (2020) report that dosages of 1.7 and 1.2 mJ/cm² of 222 nm light inactivated 99.9% of aerosolized alpha coronavirus 229E and beta coronavirus OC43, respectively. The authors go on to suggest that the use of low intensity far-UVC light may have efficacy in inactivating a broad range of coronaviruses, including SARS-CoV-2.
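The reported doses can be turned into a rough inactivation rate constant if one assumes a simple single-exponential dose-response, in which the surviving fraction after a dose D is exp(-k·D). That model, the function names, and the pairing of 1.7 mJ/cm² with 229E and 1.2 mJ/cm² with OC43 are assumptions made for this sketch, which should not be read as a reproduction of any published analysis.

```python
import math


def rate_constant(dose_mj_cm2, fraction_inactivated):
    """Rate constant k (cm^2/mJ), assuming surviving fraction S = exp(-k * dose)."""
    if not 0 < fraction_inactivated < 1:
        raise ValueError("fraction_inactivated must be between 0 and 1")
    return -math.log(1 - fraction_inactivated) / dose_mj_cm2


def surviving_fraction(k, dose_mj_cm2):
    """Fraction of infectious virus remaining after the given dose."""
    return math.exp(-k * dose_mj_cm2)


def dose_for(k, fraction_inactivated):
    """Dose (mJ/cm^2) needed to inactivate the given fraction under the same model."""
    return -math.log(1 - fraction_inactivated) / k


k_229e = rate_constant(1.7, 0.999)  # assumed pairing: 1.7 mJ/cm^2 for 229E
k_oc43 = rate_constant(1.2, 0.999)  # assumed pairing: 1.2 mJ/cm^2 for OC43
print(f"k(229E) = {k_229e:.2f} cm^2/mJ, k(OC43) = {k_oc43:.2f} cm^2/mJ")
print(f"Dose for 95% inactivation of 229E: {dose_for(k_229e, 0.95):.2f} mJ/cm^2")
```

Under this model, each additional increment of dose removes a constant fraction of the remaining virus, which is why inactivation is usually reported in percentages or log-units rather than absolute counts.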

Figure 1.11: Percentage of Original SARS-CoV-2 Titer Remaining over Time at 5 Temperatures (Adapted from Chin et al., 2020)

It follows that SARS-CoV-2 likely cannot persist outdoors for more than a handful of days, although this has yet to be quantified by any studies. In contrast, protection from far-UVC light, cold temperatures, and isolation from oxygen can extend its persistence, possibly even up to years under the right conditions, which is how samples are stored in certain laboratories.

Soaps, Surfactants, and Detergents

Good personal hygiene, such as washing with soap, mild surfactants, and/or detergents, is effective in inactivating several dangerous pathogens or mechanically removing them from the body or contaminated items. These compounds consist of amphipathic (both hydrophilic and hydrophobic) pin-shaped molecules, a property that is key to their protective action: they rupture the membranes of several types of bacteria and viruses, including coronaviruses like SARS-CoV-2. The charged, polar, hydrophilic (water-loving) head is attracted to water molecules, while the non-polar, hydrophobic (water-fearing) tail mixes with oils and fats. When these molecules are suspended in water, they create a solvation shell through hydrogen bonding with water molecules. This water cage has an ice-like crystal structure and can be characterized according to the hydrophobic effect. The lipid membrane of SARS-CoV-2 resembles such double-layered cages or micelles and is studded with important proteins that allow the virus to infect cells. Pathogens wrapped in lipid membranes include several coronaviruses, hepatitis B/C viruses, herpes viruses, ebolaviruses, the Zika virus, and numerous bacteria that attack the intestines and respiratory tract. Soap physically disrupts the virion when the water-shunning tails of the molecules wedge themselves into the lipid membrane (Figure 1.12).

Figure 1.12: Soap-Enveloped Virion Inactivation

Electrokinetic Inactivation

The stability of enveloped RNA viruses, such as SARS-CoV-2, can be influenced by electrostatic interactions (Forrey et al., 2009). Because positively charged capsid proteins package the negatively charged RNA, for instance, the overall positive charge of the capsid is known to limit the length of the viral genome (Belyi et al., 2006). Furthermore, the envelope protein of coronaviruses can generate transmembrane voltage-dependent ion conductive pores, a vulnerability which can be targeted through electrokinetic disruption (Verdia-Baguena et al., 2012). Many other electrical properties that affect the structural integrity of the virion make it vulnerable to inactivation through electrical mechanisms.

Sen et al. (2020) tested the ability of an FDA-approved wireless electroceutical wound dressing to disrupt the infectivity of SARS-CoV-2 through electrokinetic destabilization of the virion. Specifically, the authors tested how the fabric could affect the zeta potential of the virion, a parameter that affects the adsorption and stability of the virus in a colloidal dispersion. The fabric tested was made from polyester that had been printed with alternating dots of Zn and Ag metal. Using the metals as a redox couple, the fabric could generate a weak 0.5 V potential difference in the presence of an aqueous ionized environment, such as a bodily fluid. The fabric was tested against a control polyester fabric with no metallic deposition. The authors found that upon contact with the electroceutical fabric, the zeta potential was significantly attenuated, and this effect was augmented with longer contact time. Moreover, upon contact with the electroceutical fabric, the virus lost its ability to infect cells, a protective effect which was not demonstrated by the control fabric. The authors suggest that the reduction in zeta potential may have led to defects in the structural integrity of the virus, which resulted in the loss of infectivity. These results suggest that further research into the potential use of such electroceutical materials in the inactivation of SARS-CoV-2 is warranted.

  1. The case fatality rate (CFR) of an epidemic is the ratio of total fatalities to total confirmed infected cases (by laboratory testing, for example), which is a meaningful measurement of the severity of the epidemic only after fatalities have ceased. A provisional CFR or PCFR during an on-going epidemic should be used with caution and for relative comparison, as it may differ from the CFR by a wide margin. However, while a PCFR introduces an inherent uncertainty into the denominator of the ratio, it may be partially corrected by taking the total infection count from a date earlier than that of the total fatalities, offset by the average time to death from initial infection for the corresponding disease, which is approximately 14 days for COVID-19. The CFR may be contrasted with the infection fatality rate (IFR), which is the fraction of fatalities among all of the infected, not just among those confirmed to be infected, and the mortality rate (MR), which is the fraction of fatalities of a disease among the general population.
  2. Reported death counts of New York City include probable deaths attributed to COVID-19.
  3. Evidence is emerging that the first COVID-19 death in the U.S. may have occurred as early as February 6, 2020.
  4. For instance, for those individuals infected on an international conveyance who would then disembark, any subsequent secondary infections would not be considered as part of the cluster tied to the ship.
  5. The Flu pandemic of 1918-1920, or Spanish Flu, gets its colloquial name, a misnomer, from the fact that in Spain, a neutral country during WWI, print and broadcast media were not censored in their reporting of the ravaging illness, whereas elsewhere such reporting was censored for reasons likely concerning wartime morale. The confounding effect of censorship in the U.S., U.K., France, Germany and Austria-Hungary, Italy, and Russia, combined with the public dissemination of reports of the flu in Spain, gave the false impression that the pandemic originated and was centered in Spain.
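The rates defined in note 1 can be sketched in a few lines of Python. The example below computes a provisional CFR from the global counts reported earlier in this guide for October 20, 2020, and shows the 14-day lag adjustment on a short cumulative time series. The time series values are hypothetical placeholders chosen only to make the example run, and the function names are illustrative, not an established API.

```python
from typing import Sequence


def case_fatality_rate(deaths: int, confirmed_cases: int) -> float:
    """(Provisional) CFR: fatalities divided by laboratory-confirmed cases."""
    return deaths / confirmed_cases


def mortality_rate_per_million(deaths: int, population: int) -> float:
    """Mortality rate expressed in deaths per million (dpm) of population."""
    return deaths / population * 1_000_000


def lag_adjusted_pcfr(
    cumulative_deaths: Sequence[int],
    cumulative_cases: Sequence[int],
    lag_days: int = 14,
) -> float:
    """Deaths to date divided by the cases confirmed `lag_days` earlier.

    Both sequences hold one cumulative value per day, oldest first.
    """
    if len(cumulative_deaths) != len(cumulative_cases):
        raise ValueError("series must have the same length")
    if len(cumulative_cases) <= lag_days:
        raise ValueError("series must cover more than lag_days days")
    return cumulative_deaths[-1] / cumulative_cases[-1 - lag_days]


# Global counts quoted earlier in this guide (as of October 20, 2020).
pcfr = case_fatality_rate(deaths=1_122_743, confirmed_cases=40_634_575)
print(f"Provisional CFR: {pcfr:.2%}")

# Hypothetical 15-day cumulative series, for illustration only.
cases = list(range(1000, 2500, 100))      # 1000, 1100, ..., 2400
deaths = [20 + 2 * day for day in range(15)]  # 20, 22, ..., 48
print(f"Unadjusted PCFR: {case_fatality_rate(deaths[-1], cases[-1]):.2%}")
print(f"14-day adjusted PCFR: {lag_adjusted_pcfr(deaths, cases):.2%}")
```

Because cumulative case counts grow over time, dividing by the earlier, smaller count always yields an adjusted PCFR at least as large as the unadjusted one.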

Chronology, Data, and Observations

“Measure what is measurable, and make measurable what is not so.”

―Galileo Galilei, Astronomer (1564-1642)

Observations and calculations from mathematical, probabilistic, and statistical points of view are essential for understanding and modeling the progression of COVID-19 infections, for establishing proper responses, and for realizing the actual reach, scope, and range of the corresponding pandemic. These include, but are not limited to, modeling daily and cumulative time series of infections, deaths, and testing rates, computing times to doubling and order-of-magnitude increases for forecasting, detailing the particular sequence of most impacted locales by case fatality rates or other changing variables, studying the efficacy of proposed experimental treatments and potential vaccines, and comparing governmental responses by locale to determine the safest and most effective actions. It remains difficult, however, to separate the effect of testing rates from that of actual infection rates (likely much greater than the reported cases), as both are increasing simultaneously.
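As one illustration of the calculations mentioned above, the time to doubling (or to an order-of-magnitude increase) can be estimated from two cumulative counts. The sketch below assumes purely exponential growth between the two observations, which real case counts only approximate. The function name and the example counts are hypothetical.

```python
import math


def time_to_multiply(
    count_start: float,
    count_end: float,
    days_elapsed: float,
    factor: float = 2.0,
) -> float:
    """Days needed for the count to grow by `factor`, assuming exponential growth.

    With N(t) = N0 * exp(r * t), the rate is r = ln(N_end / N_start) / days,
    and the time to multiply by `factor` is ln(factor) / r.
    """
    if count_start <= 0 or count_end <= count_start:
        raise ValueError("counts must be positive and increasing")
    growth_rate = math.log(count_end / count_start) / days_elapsed
    return math.log(factor) / growth_rate


# Hypothetical counts: 1,000 cases growing to 4,000 cases over 10 days.
print(f"Doubling time: {time_to_multiply(1_000, 4_000, 10):.1f} days")
print(f"Tenfold time: {time_to_multiply(1_000, 4_000, 10, factor=10):.1f} days")
```

Since reported counts also depend on testing rates, as the paragraph above cautions, such estimates describe the growth of reported cases rather than of actual infections.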


The varying timelines of the course of COVID-19, mutations of SARS-CoV-2, and practical considerations such as the robustness of healthcare systems and reporting infrastructure also confound accurate estimation of infection rates, case fatality rates, and mortality rates[1], especially if the virus is virulent. Despite these inherent difficulties, we summarize the generally accepted chronology of events:

  1. On December 29, 2019, four cases of viral pneumonia of unknown etiology were linked to the Huanan Seafood Market.
  2. On December 30-31, 2019, the Chinese Center for Disease Control (CDC) and the WHO were informed of an outbreak of a viral pneumonia in Wuhan, Hubei Province, China.
  3. On January 6, 2020, Chinese CDC activated a Level 2 Emergency Response.
  4. On January 7-10, 2020, Chinese scientists identified the second novel coronavirus responsible for the aforementioned outbreak, called 2019-nCoV, and released the genome sequence.
  5. On January 11, 2020, China reported its first death, a 61-year-old male who had visited the Huanan Seafood Market.
  6. On January 21, 2020, China reported the first confirmed human-to-human transmitted infection due to 2019-nCoV (now SARS-CoV-2).
  7. On January 23, 2020, the Chinese government imposed a strict Stay-in-Place order for the residents of Wuhan, China.
  8. On January 30, 2020, the WHO declared a Global Health Emergency for the growing number of 2019-nCoV-related clusters and outbreaks in several locations in and outside of China.
  9. On February 11, 2020, the WHO defined the new name of the disease, COVID-19.
  10. On February 26, 2020, the first case of human-to-human transmission in the U.S. was documented involving an individual from California who had not traveled recently. Similar cases were reported in the states of Oregon, Washington, and New York.
  11. On February 29, 2020, the first U.S. COVID-19 death was reported