The outbreak of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has caused an unprecedented pandemic. Since the first whole genome of SARS-CoV-2 was sequenced in January 2020, the identification of its genetic variants has become crucial for tracking and evaluating their spread across the globe.
In this study, we compared 15,259 SARS-CoV-2 genomes, isolated from 60 countries since the outbreak of this novel coronavirus, with the first genome sequenced in Wuhan to quantify the evolutionary divergence of SARS-CoV-2. We also compared, at two-week intervals, the codon usage patterns of 13 SARS-CoV-2 genes encoding the membrane protein (M), envelope protein (E), spike surface glycoprotein (S), nucleoprotein (N), non-structural 3C-like proteinase (3CLpro), ssRNA-binding protein (RBP), 2'-O-ribose methyltransferase (OMT), endoRNase (RNase), helicase, RNA-dependent RNA polymerase (RdRp), Nsp7, Nsp8, and exonuclease (ExoN).
As a general rule, we find that the SARS-CoV-2 genome tends to diverge over time by accumulating mutations and, in particular, mutations in the coding sequences for proteins N and S. Interestingly, different patterns of codon usage were observed among these genes. Genes S, Nsp7, and Nsp8 tend to use a narrower set of synonymous codons that are better optimized to the human host. Conversely, genes E and M consistently use a broader set of synonymous codons, which does not vary with respect to the reference genome. We identified key SARS-CoV-2 genes (S, N, ExoN, RNase, RdRp, Nsp7, and Nsp8) suggested to be causally implicated in the adaptation of the virus to the human host.
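
To make the notion of a narrower or broader set of synonymous codons concrete, the following minimal sketch computes relative synonymous codon usage (RSCU) for a single coding sequence. This abstract does not state which codon usage metric was used, so RSCU is an assumption chosen for illustration, and the function names (`count_codons`, `rscu`) and the toy sequence are hypothetical. An RSCU of 1 means a codon is used as often as expected under uniform synonymous usage; values far from 1 within an amino acid family indicate a biased, narrower usage.

```python
from collections import Counter

# Standard genetic code (NCBI translation table 1), bases in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO))


def count_codons(cds):
    """Count in-frame codons of a coding sequence (DNA or RNA alphabet)."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    triplets = (cds[i:i + 3] for i in range(0, usable, 3))
    # Triplets with ambiguous bases (e.g. N) are skipped.
    return Counter(t for t in triplets if t in CODON_TABLE)


def rscu(cds):
    """Return {codon: RSCU} for every amino acid observed in the sequence.

    RSCU(codon) = count(codon) * family_size / total count of the family,
    where the family is the set of codons encoding the same amino acid.
    Stop codons and amino acids absent from the sequence are omitted.
    """
    counts = count_codons(cds)
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    values = {}
    for aa, family in families.items():
        total = sum(counts[c] for c in family)
        if aa == "*" or total == 0:
            continue
        for codon in family:
            values[codon] = counts[codon] * len(family) / total
    return values


if __name__ == "__main__":
    # Hypothetical toy sequence: Ala-Ala-Ala-Ala-Lys-Lys.
    toy = "GCTGCTGCTGCCAAAAAG"
    for codon, value in sorted(rscu(toy).items()):
        print(codon, round(value, 2))
```

In the toy sequence, alanine is encoded three times by GCT and once by GCC, so GCT receives an RSCU of 3.0 while the unused GCA and GCG receive 0, which is the kind of skew a narrow codon set would produce. Applying the same calculation per gene and per sampling window, and comparing against the reference genome, would be one way to track the temporal patterns described above.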