The outbreak of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has caused an unprecedented pandemic. Since the first whole genome of SARS-CoV-2 was sequenced in January 2020, identifying its genetic variants has become crucial for tracking and evaluating their spread across the globe.
In this study, we compared 15,259 SARS-CoV-2 genomes isolated from 60 countries since the outbreak of this novel coronavirus with the first genome sequenced in Wuhan to quantify the evolutionary divergence of SARS-CoV-2. We then compared, at two-week intervals, the codon usage patterns of 13 SARS-CoV-2 genes encoding the membrane protein (M), envelope protein (E), spike surface glycoprotein (S), nucleoprotein (N), non-structural 3C-like proteinase (3CLpro), ssRNA-binding protein (RBP), 2'-O-ribose methyltransferase (OMT), endoRNase (RNase), helicase, RNA-dependent RNA polymerase (RdRp), Nsp7, Nsp8, and exonuclease (ExoN).
As a general rule, we find that the SARS-CoV-2 genome tends to diverge over time by accumulating mutations, particularly in the coding sequences for proteins N and S. Interestingly, different patterns of codon usage were observed among these genes. Genes S, Nsp7, and Nsp8 tend to use a narrower set of synonymous codons that are better optimized to the human host. Conversely, genes E and M consistently use a broader set of synonymous codons, which does not vary with respect to the reference genome. We identified key SARS-CoV-2 genes (S, N, ExoN, RNase, RdRp, Nsp7, and Nsp8) suggested to be causally implicated in the virus's adaptation to the human host.
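
To make the notion of a "narrower" or "broader" set of synonymous codons concrete, the following minimal sketch computes relative synonymous codon usage (RSCU) for a single coding sequence. RSCU is a commonly used measure, but this abstract does not state which codon usage metric the study applied, so the choice of RSCU, the function name `rscu`, and the toy input sequence are all assumptions made for illustration only.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds):
    """Return RSCU values for one in-frame coding sequence (hypothetical helper).

    RSCU of a codon = observed count * family size / total count of all
    synonymous codons for that amino acid. A value of 1.0 means no bias;
    a family dominated by a few codons indicates a narrower codon set.
    """
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3          # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    # Group codons into synonymous families by encoded amino acid.
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    values = {}
    for aa, family in families.items():
        total = sum(counts[c] for c in family)
        # Skip stop codons, single-codon amino acids (Met, Trp),
        # and amino acids that do not occur in the sequence.
        if aa == "*" or len(family) < 2 or total == 0:
            continue
        for codon in family:
            values[codon] = counts[codon] * len(family) / total
    return values


if __name__ == "__main__":
    # Toy sequence (not real SARS-CoV-2 data): four alanine codons.
    for codon, value in sorted(rscu("GCTGCTGCTGCC").items()):
        print(codon, round(value, 2))
```

Under these assumptions, applying such a function to the same gene in genomes sampled at successive time points, and comparing the results with those for the reference genome, would give one simple way to see whether a gene's codon usage is shifting over time.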