During the first months of 2016, my research team and I started tracking the genetics of the Zika virus’ spread across the world using Nvector, a tool developed in, and currently used only in, our laboratory at the Department of Bioinformatics and Computational Biology at the University of North Carolina, Charlotte. Combining traditional phylogenetic tools with Nvector allowed us to rapidly analyze the genomic differences and relationships among the Zika virus sequences generated by research groups around the world, and to project the resulting phylogenetic trees onto a global map. This approach was pioneered by Daniel Janies, PhD, a Carol Grotnes Belk Distinguished Professor of Bioinformatics and Genomics at the University of North Carolina at Charlotte, who had performed similar analyses during prior outbreaks of Middle East Respiratory Syndrome (MERS) and influenza A viruses.
With these analyses, we were able to demonstrate phylogenetically that the viral sequences obtained from Zika as it crossed the Pacific Ocean (and subsequently radiated across northern Latin America and the Caribbean) were unlikely to be derived from the African strain of the virus, which was first described in 1947. The sequences from Brazilian and Pacific Island Zika isolates clearly clustered as a different strain, or clade, and were found to be more closely related to the historic Asian isolates than to the historic African isolates. This cluster is now usually referred to as the Asian-Pacific-American strain (also known as the Asian strain). Our review of the historic literature, as well as subsequent genetic analyses, suggests that these African and Asian genetic clusters (or clades) had circulated largely independently for many years prior to the first detection of the virus in the Zika forest of Uganda in 1947.
When analyzing the metadata associated with these various sequences, we realized that most of the Zika sequences isolated in Africa were either from different species of mosquitoes or from non-human primates, rather than from patients, whereas the available Asian sequence accessions were exactly the opposite: most Asian Zika sequences were sourced from virus isolated from human serum. This observation raised concerns about intrinsic selection bias and skewing within the available data, which we highlighted in our study published in Cladistics in December 2016, as the consistency and predictive value of our analyses were dependent on the data available at the time.
In early 2016, one of the most pressing questions surrounding Zika was why the virus appeared to be causing a previously unreported birth defect syndrome in infants, as well as Guillain-Barré syndrome (GBS) in adults. We hypothesized that both syndromes might reflect a type of autoimmunity, which could have been triggered by changes in the Zika genome, and that we could quickly gain insight into this possibility using computational modeling tools. As we were already investigating the phylogenetics and evolution of the Zika virus, it was fairly straightforward to search these evolving viral sequences for changes in predicted Zika virus protein B cell epitopes, and then compare these evolving Zika epitopes to computationally predicted human protein epitope sequences.
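For readers who want a concrete picture of what comparing viral and human epitopes can look like, here is a minimal Python sketch. It is not the pipeline we used. It assumes the epitope predictions already exist as plain amino acid strings, and it simply reports short stretches (k-mers) that a viral epitope and a human epitope have in common. The function names, the window length of five residues, and the toy sequences are all hypothetical and chosen only for illustration.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmers(viral_epitopes, human_epitopes, k=5):
    """Map each (viral_id, human_id) pair to the sorted k-mers the two sequences share.

    Pairs with no k-mer in common are left out of the result.
    """
    hits = {}
    for v_id, v_seq in viral_epitopes.items():
        v_set = kmers(v_seq, k)
        for h_id, h_seq in human_epitopes.items():
            common = v_set & kmers(h_seq, k)
            if common:
                hits[(v_id, h_id)] = sorted(common)
    return hits


# Toy placeholder strings: these are NOT real Zika or human sequences.
viral = {"toy_viral_1": "ACDEFGHIKL", "toy_viral_2": "MNPQRSTVWY"}
human = {"toy_human_1": "WWDEFGHAAA", "toy_human_2": "KKKKKKKKKK"}

result = shared_kmers(viral, human, k=5)
for (v_id, h_id), common in result.items():
    print(v_id, h_id, common)
```

Exact k-mer matching is the simplest possible comparison. A real analysis would more plausibly rely on alignment-based similarity scores and dedicated epitope prediction tools, so a hit from a sketch like this is at most a prompt for closer inspection.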