When I hear the term “global pandemic”, I tend to associate it, not so fondly, with the unrelenting COVID-19. Thanks to the ability of COVID-19 to take over both our respiratory systems and all media outlets, the everyday person is now an expert. Terms such as RNA virus, genome sequencing, PCR testing and viral evolution are now household names. Yet SARS-CoV-2, the virus responsible for COVID-19, is not the first RNA virus to inflict an enormous burden of disease on humanity. The original RNA viral pandemic belongs to the persevering Human Immunodeficiency Virus (HIV), which has been on the clinical radar since 1981. While the manifestation of disease and method of infection vary between these two RNA viruses, many comparisons can be drawn between their basic genetics, mutation rate, and the way in which we can track their evolution to guide clinical prevention and treatment.

HIV is a virus that interferes with the human immune system by impairing its ability to fight infection and disease. It does so by infecting T-cells, which are the mini fighters in our body. HIV leads to the development of Acquired Immunodeficiency Syndrome (AIDS), which currently has no cure and in 2020 caused approximately 680,000 deaths (Joint United Nations Project on HIV AIDS, 2018). Not only is HIV highly prevalent and almost always fatal, but it also has an incredible talent for evolving rapidly. This is thanks to the combined activity of three key biological factors:
- HIV is a retrovirus
HIV has two single strands of RNA which encode the entire HIV genome of only nine genes! (German Advisory Committee Blood, 2016). As a retrovirus, HIV uses its own enzyme, reverse transcriptase, to convert its RNA into DNA (the stuff humans have) so that it can infect human host cells. This process is called reverse transcription and it lacks a proofreading ability. As a result, errors, known as mutations, are introduced into the DNA sequence (Andrews & Rowland-Jones, 2017).
- HIV has a short generation time
HIV has a generation time of 1-2 days. A shorter generation time results in quicker evolution as mutations can accumulate in a population faster.
- HIV undergoes recombination
Recombination is where HIV’s two RNA strands exchange genetic material with each other. This reshuffling of genetic material can provide unique combinations of different versions of genes (alleles), increasing diversity.
These three factors contribute to HIV’s extensive viral diversity, which allows it to successfully evade the human immune system, evolve drug resistance, and bypass vaccinations (Andrews & Rowland-Jones, 2017). HIV’s rapid evolutionary rate has gained attention from various researchers such as Bertels et al. (2020), who performed a long-term evolution experiment on HIV. In a long-term evolution experiment, organisms are transferred for many generations in defined and reproducible conditions. This allows us to understand basic evolutionary principles or to see how an organism’s evolutionary history might help us infer its evolutionary future.
Tracing HIV’s evolutionary history is a bit like looking at an enemy’s past tactics to better prepare ourselves for future battle. If we can work out where they came from and how well they did, then we can develop an airtight war plan against them. However, it becomes harder to prepare when the intel isn’t clear. Did they come from the north, or the north west? Do they have 100 or 1000 men? Should we bring guns or knives? Trying to determine the evolutionary history of a virus is much like trying to get a message across a radio with a weak signal. We know parts, but there’s interference and certain factors we can’t quite seem to account for. One of these factors is parallel evolution. Parallel evolution is where two geographically distinct groups of organisms independently develop the same mutations or traits (Westram & Johannesson, 2016). As a result, these two groups can look more closely related than they really are, interfering with our ability to trace how they evolved. Having an inaccurate evolutionary history presents challenges when using it to inform our battle plan of clinical management and development of treatment.

Bertels et al. (2020) wanted to investigate the extent of parallel evolution in HIV in a constant environment. Over the course of 315 days, they serially transferred HIV to human T-cell leukemia cell lines called MT-2 and MT-4. The HIV was left to replicate for a few days and then transferred to fresh MT-2 and MT-4 cell cultures, with transfers taking place twice a week. At the end of 315 days, 90 transfers had taken place (approximately 180 viral generations). At every 10th transfer, Illumina genome sequencing took place. Genome sequencing: there’s a term all of us COVID-19 armchair experts should be familiar with by now! If you haven’t been keeping up with the 1pm news conferences, genome sequencing is where the entire RNA sequence is essentially read out by a computer. To measure whether a mutation has occurred, the researchers look at nucleotides. These are structural components of RNA and can exist in one of four forms: Adenine (A), Uracil (U), Cytosine (C) and Guanine (G). Looking at a certain point along the genome, the researchers compare the original nucleotide with the current nucleotide. If they find that a new nucleotide is present at a higher frequency than the original one, this is determined to be a “majority mutation”.
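To make the “majority mutation” rule concrete, here is a minimal Python sketch of the comparison described above. The function name and the toy reads are my own inventions for illustration; this is not the authors’ pipeline, which works on real Illumina data.

```python
from collections import Counter

def majority_mutation(ancestral_base, reads_at_site):
    """Return the mutant base if it outnumbers the ancestral base, else None."""
    counts = Counter(reads_at_site)
    # Keep only the bases that differ from the ancestral (original) one.
    mutants = {base: n for base, n in counts.items() if base != ancestral_base}
    if not mutants:
        return None
    top_base, top_count = max(mutants.items(), key=lambda item: item[1])
    # A Counter returns 0 for a base that was never observed.
    return top_base if top_count > counts[ancestral_base] else None

# Hypothetical reads covering a single genome position (3 x A, 7 x G).
reads = list("AAGGGGGGAG")
print(majority_mutation("A", reads))  # G
```

With three reads showing the original A and seven showing G, the G is more frequent than the original nucleotide and so counts as a majority mutation at this site.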
Two replicates of each T-cell line were used to see if the same mutations appear even though they are separated from one another, i.e. to observe parallel evolution. To increase the power of this experiment, more replicates of each T-cell line could have been tested. This would have boosted confidence that mutations occurring in multiple cell lines were the result of evolutionary advantage rather than chance.

Over the course of 315 days, 92 majority mutations appeared (Figure 2A). Of these mutations, several were found to exist in more than one of the strains, and one mutation existed in all four (Figure 2C). So what does this mean? The same mutations occurring in physically separated cultures is indicative of parallel evolution. Interestingly, the authors expected the accumulation of mutations in HIV to decelerate towards the end of the experiment as HIV reaches an “adaptive peak.” By adaptive peak, they mean that the HIV strains have evolved to do really well in their new environments! So well, that there are almost no more mutations that could occur to make them better.
I found this statement a bit strange considering that HIV can live for years within a host, continuing to mutate and evolve. The expectation that mutations begin to cease after only 315 days (180 viral generations) seems arbitrary and assumes that the end of the experiment coincides with a natural evolutionary peak. There is nothing cited in this paper to support this assumption either. The HIV genome consists of approximately 9800 nucleotides. With 3 possible mutations at every base, there are a total of 29,400 mutations that are able to occur (German Advisory Committee Blood, 2016). A very similar long-term evolution experiment conducted by Bons et al. (2020) found that, over the course of up to 600 generations, only 3% of all possible mutations reached majority. Additionally, mutations were still accumulating towards the end of their experiment. Given this is over 3 times as long as the experiment Bertels et al. (2020) performed, we might say that 180 generations really isn’t long enough to see an adaptive peak!
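The back-of-the-envelope numbers above are easy to check. The short Python sketch below simply redoes the arithmetic using the figures quoted in this post (9800 nucleotides, 3 alternatives per site, 92 majority mutations, 3%, 180 and 600 generations); the variable names are mine.

```python
genome_length = 9800        # approximate HIV genome length in nucleotides
alternatives_per_site = 3   # each base can change to one of the three others

possible_mutations = genome_length * alternatives_per_site
print(possible_mutations)   # 29400

# Share of all possible mutations that the 92 majority mutations represent.
observed_majority = 92
percent_observed = round(100 * observed_majority / possible_mutations, 2)
print(percent_observed)     # 0.31

# The 3% figure quoted for the longer experiment, as an absolute count.
print(round(0.03 * possible_mutations))  # 882

# How much longer the 600-generation experiment ran than the 180-generation one.
length_ratio = 600 / 180
print(round(length_ratio, 1))  # 3.3
```

Seen this way, the 92 majority mutations cover only about a third of a percent of the mutational possibilities, which is part of why an adaptive peak after 180 generations seems optimistic to me.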
To further investigate why the mutation accumulation rate did not decrease, they had a look at the fitness of each majority mutation (Figure 2B). Fitness is how well a mutation survives in a particular environment and persists into the next generation. The more the mutation helps HIV, the better the fitness. How big an impact the mutation has depends on whether it changes the amino acid sequence. Amino acids are VERY important molecules that are the building blocks of proteins. Each set of three nucleotides encodes one amino acid. If there is a change in the nucleotide, this has the potential to change the amino acid and thus change the protein. Here, each mutation is plotted as Synonymous (there is no change in the amino acid, so it is considered neutral or silent), Non-synonymous (there is a change in the amino acid, so the mutation may impact the function of the protein) or Untranslated (the mutation occurs in a region that does not code for a protein).
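For the curious, the synonymous versus non-synonymous distinction can be sketched in a few lines of Python. The codon table below is the standard genetic code written in RNA letters; the `classify` function and the example codons are my own illustration and are not taken from the paper.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, listed in UCAG order for each codon position ("*" = stop).
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino
    for codon, amino in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def classify(codon, position, new_base):
    """Classify a single-nucleotide change within one codon (position is 0, 1 or 2)."""
    mutated = codon[:position] + new_base + codon[position + 1:]
    same_amino_acid = CODON_TABLE[codon] == CODON_TABLE[mutated]
    return "synonymous" if same_amino_acid else "non-synonymous"

print(classify("GGU", 2, "C"))  # synonymous: GGU and GGC both encode glycine
print(classify("GAU", 1, "U"))  # non-synonymous: aspartate becomes valine
```

Untranslated mutations fall outside this scheme altogether, because they sit in regions where no codons are being read.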
The lines on the graph of Figure 2B are linear regressions which show how the fitness of mutations changes over the course of the experiment. In short, the downwards sloping line tells us that as the experiment proceeds, the fitness gains of the mutations decrease. This is expected to occur for populations that adapt to new environments. As mutations occur, those that help the virus in its environment are selected to stick around. As this continues, the virus becomes better suited to the environment and so the fitness gained from each new beneficial mutation starts to decline.
So taking these two figures together: the mutations continue to accumulate towards the end of the experiment, but the fitness gains decrease. What does this tell us? The authors suggest that the continued increase in mutations is due to neutral mutations occurring as opposed to beneficial majority mutations. Therefore, no fitness gains are expected.
The (very pretty) Venn diagram in Figure 2C gives a great visual demonstration of the mutations. We can see here how several mutations arise in more than one cell line. This is parallel evolution occurring. There is more overlap in replicates of the same cell line, i.e. MT2-1 with MT2-2 and MT4-1 with MT4-2, which we would expect as mutations help HIV in one particular environment. The more similar the environment, the more likely similar mutations will arise and persist.
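Under the hood, a Venn diagram like this is just set arithmetic. Here is a minimal Python sketch using made-up mutation labels (m1, m2 and so on are hypothetical placeholders, not mutations reported in the paper) to show how shared mutations between lines could be counted.

```python
from itertools import combinations

# Hypothetical majority mutations per evolution line, invented for illustration.
mutations = {
    "MT2-1": {"m1", "m2", "m3", "m7"},
    "MT2-2": {"m1", "m2", "m4"},
    "MT4-1": {"m1", "m5", "m6"},
    "MT4-2": {"m1", "m5", "m8"},
}

# Mutations present in every line (the centre of the Venn diagram).
shared_by_all = set.intersection(*mutations.values())
print(sorted(shared_by_all))  # ['m1']

# Mutations shared by each pair of lines.
pairwise = {
    (a, b): mutations[a] & mutations[b]
    for a, b in combinations(mutations, 2)
}
for pair, shared in pairwise.items():
    print(pair, sorted(shared))
```

In this toy data the replicates of the same cell line share two mutations each, while lines from different cell types share only one, mirroring the pattern described above.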

As previously mentioned, recombination is another one of HIV’s special immune-evading tactics. Bertels et al. (2020) concluded that recombination occurred frequently throughout their experiment. While Figure 3 may look a little confusing, take notice of the yellow and black lines in the first box for MT2-1. We can see that initially, their frequencies increase at the same rate. However, around transfer 20, they decouple and increase at different rates. The likely explanation is recombination. As RNA strands exchange genetic material, some combinations of mutations perform better than others. It’s like having a really great tennis partner that helps you win a lot of games, but at half time you get swapped out. Your new partner (or new set of mutations) isn’t as great and so the frequency at which you win games decreases. Therefore recombination can both improve and impede HIV’s evolution by creating combinations of alleles that aid or hinder HIV.
So we want to know if parallel evolution influences evolutionary history. How do we figure this out? Bertels et al. (2020) used phylogenetic trees which are diagrams that depict the evolutionary relationship of one organism to another.
They constructed three different trees to show how evolution can look different for the exact same organisms.

- Phylogeny A
This phylogenetic tree is constructed from the experimental sequence data. At every 10th transfer, Illumina sequencing is performed for each strain. A consensus sequence is determined which represents the most common sequence among all the individual HIV viruses for a particular T-cell line. Using a computer program called PhyML, the phylogenetic tree is constructed based on maximum likelihood, a statistical approach that searches for the tree under which the observed sequences are most probable.
- Phylogeny B
This phylogeny is considered to be the correct evolutionary history constructed from simulated sequence data. This tree represents the true set up of the experiment.
When comparing phylogenies A and B, we can see that there are some obvious differences between the inferred history (A) and the correct history (B). In A, the MT-4 sequences cluster together according to environment, as opposed to forming the two distinct MT4-1 and MT4-2 lines as in B. Why is this? Parallel evolution! The emergence of the same majority mutations in different cell lines creates sequences that appear more closely related than they are in reality. This results in the inferred tree clustering the MT4 sequences together. On the other hand, the inferred tree looks similar to what we would expect if the MT4-1 and MT4-2 lines were not separate in the lab or had experienced some kind of cross contamination. Oops!
- Phylogeny C
Phylogeny C includes minority mutations. Minority mutations are still changes in nucleotides, but these changes don’t become more frequent than the original nucleotide. The method by which this tree was constructed is again entirely different from that of either of the other trees.
Surprisingly, there is no commonly used software that can consider both majority and minority mutations. This meant that the authors had to come up with their own method of constructing a phylogeny to include these minority mutations. If you are anything like me, then understanding the intricacies of phylogenetic construction is not your thing! But bear with me as I try to explain this one. They started by calculating the difference in frequency between the original nucleotide and all three possible mutations (remember a nucleotide can be one of four things – A, U, C, G!). This gave them the genetic distance between each of the different cell lines at each transfer. From there they used “neighbor joining” which is an algorithm that constructs a tree based on these genetic differences. Confused? Me too!
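If it helps, two of the building blocks described for these trees can be sketched in Python: the consensus sequence used for Phylogeny A, and a frequency-based genetic distance of the kind used for Phylogeny C. Both functions, the choice of summed absolute differences as the distance, and all of the toy data are my own assumptions for illustration; the paper’s exact formula may differ, and the tree building itself (PhyML or neighbor joining) is not shown.

```python
from collections import Counter

BASES = "AUCG"

def consensus(sequences):
    """Most common base at each position of aligned, equal-length sequences."""
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*sequences)
    )

def genetic_distance(sample_a, sample_b):
    """Sum over sites of absolute frequency differences for all four bases.

    Assumption: this Manhattan-style distance is chosen for illustration only.
    Each sample is a list of per-site dicts mapping base -> frequency.
    """
    return sum(
        abs(site_a.get(base, 0.0) - site_b.get(base, 0.0))
        for site_a, site_b in zip(sample_a, sample_b)
        for base in BASES
    )

# Hypothetical aligned reads from one evolution line.
reads = ["AUGC", "AUGG", "ACGC"]
print(consensus(reads))  # AUGC

# Hypothetical nucleotide frequencies at two genome sites per evolution line.
samples = {
    "MT2-1": [{"A": 1.0}, {"A": 0.5, "G": 0.5}],
    "MT2-2": [{"A": 1.0}, {"A": 0.75, "G": 0.25}],
    "MT4-1": [{"U": 1.0}, {"A": 1.0}],
}
distances = {
    (a, b): genetic_distance(samples[a], samples[b])
    for a in samples
    for b in samples
}
print(distances[("MT2-1", "MT2-2")])  # 0.5
print(distances[("MT2-1", "MT4-1")])  # 3.0
```

A table of pairwise distances like `distances` is the kind of input a neighbor joining algorithm works from. Notice that the two MT2 lines differ only in the frequency of a G that is not yet in the majority in either of them, which is exactly the sort of signal a consensus sequence throws away.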
The take home message from these three trees is much simpler to understand. The presence of parallel evolution and shared majority mutations causes the MT-4 lines to look more closely related than they are (Tree A). Tree C shows that when we include minority mutations, we get a tree much more similar to the correct history of Tree B. This indicates that the cell lines diverge early at the level of minority mutations, not at the level of majority mutations. Tree C’s similarity to Tree B helps to somewhat absolve our Tree A contamination theory. Phew!
This brings us to a nice conclusion that parallel evolution does indeed influence evolutionary reconstruction. But if anything, the chopping and changing between reconstruction methods, due to the unavailability of appropriate software for all situations, reinforces the opinion that we should be careful in how much weight we place on a phylogenetic tree. Like most things in life, using a different method is likely to give you a slightly different result.
Additionally, as discovered above, there is a certain amount of recombination occurring in this experiment which does influence phylogenetic reconstruction (Posada, 2000; Wang & Liu, 2016). While the authors discuss the occurrence of recombination in their evolutionary lines, they do not discuss how this may impact the phylogenetic inference.
The comparison of the three trees teaches us a lesson about the reliability of phylogenetic trees. While all three are constructed from the exact same strains, they tell a different story. The presence of parallel evolution in these strains depicts the limitations of phylogenetic trees when using them to infer evolutionary history. This is critical to acknowledge when deciphering our clinical battle plan. When treatment of deadly viruses is reliant on the knowledge that we gain from phylogenies, it is important to remember that in each instance there is shrouded information.

Bertels et al. (2020) didn’t stop investigating evolutionary predictability there! They suspected that looking at nucleotide diversity at different time points could forecast the accumulation of mutations. Nucleotide diversity is the difference in nucleotides at the same site across one or many sequences. In this case, it is the diversity that exists between different sequences of the same evolutionary T-cell line. They correlated the nucleotide diversity at each time point with the number of majority mutations at transfer 90. They found a correlation between the two, in that a higher nucleotide diversity indicates a higher number of parallel majority mutations. How great does this sound!? It means we have a bit of a leg up on being able to predict how HIV might mutate and therefore send our army in to stop it in its tracks! Right? Well maybe not so right…and here’s why.
They found that the strongest correlation occurs at transfer 30 in the MT2-1 line, with a p-value of 0.0003 and an R-squared of 0.79. While the stats at this particular time point seem convincing, it is not a pattern followed by any other cell line. At the very least, we would expect the MT2-2 cell line to show a similar correlation at this point. Interestingly, it is at transfer 30 that this line has the lowest correlation between nucleotide diversity and mutation. Additionally, all evolution lines have peaks and troughs in the R-squared value at different transfer points. So while the authors argue that earlier nucleotide diversity can be predictive of the number of majority mutations, we must ask: what classifies as an early predictive time point? If this changes from strain to strain, then it can hardly be a good predictor of evolution in the wider HIV population. How do we know when and where to look? Additionally, the length of the experiment is very short when compared to the typical occupancy of HIV within a host, which can last decades. Therefore, the ability to predict the accumulation of mutations at transfer 90 (180 generations) does not necessarily provide a clinically relevant prediction mechanism. The authors’ conclusion that a prediction can be performed based on nucleotide diversity doesn’t appear to be a widely applicable concept.
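To give a feel for what an R-squared between diversity and mutation counts actually measures, here is a minimal Python sketch. The diversity formula (the chance that two randomly drawn reads differ at a site) is one common definition that I am assuming here, and the numbers are invented; neither is taken from the paper.

```python
def site_diversity(freqs):
    """Chance that two randomly drawn reads show different bases at one site."""
    return 1.0 - sum(p * p for p in freqs.values())

def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)

print(site_diversity({"A": 0.5, "G": 0.5}))  # 0.5
print(site_diversity({"A": 1.0}))            # 0.0

# Hypothetical early diversity values and later majority mutation counts.
early_diversity = [0.01, 0.02, 0.05, 0.04]
later_mutations = [1, 2, 5, 3]
print(r_squared(early_diversity, later_mutations))
```

An R-squared close to 1 means early diversity lines up almost perfectly with later mutation counts, while a value near 0 means it tells us very little. The catch raised above is that the value swings between transfers and between lines.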
While the paper is centered around HIV, it is only too easy to draw comparisons to the trendier RNA virus, SARS-CoV-2. The COVID-19 pandemic is fast approaching its second birthday, yet its origin and evolutionary history remain to be fully elucidated. Sequencing of genomes has been essential in tracing the pathway of infection as well as in the design of prevention strategies. However, this research by Bertels et al. (2020) demonstrates that, because of parallel evolution, a phylogenetic tree can fail to correctly identify the evolutionary path that an RNA virus has taken. Additionally, the genetics of an RNA virus are complex and act to interfere with our ability to form a battle plan. While the authors are confident in coming to conclusions surrounding the predictability of mutations, I would argue that this paper does more for the argument against prediction. If anything, it demonstrates the need to exercise caution when using an evolutionary history to predict the evolutionary future of an RNA virus.
So next time you are developing your viral pandemic expertise by listening to the 1pm news conference, stop and reflect on this. How might parallel evolution be influencing our own efforts in tracing and preventing COVID-19?
This blog post and included figures were based on the research article by Bertels, F., Leemann, C., Metzner, K. J., & Regoes, R. R. (2019). Parallel Evolution of HIV-1 in a Long-Term Experiment. Molecular Biology and Evolution, 36(11), 2400-2414. doi:10.1093/molbev/msz155
Joint United Nations Project on HIV AIDS. (2018). Global report: UNAIDS report on the global AIDS epidemic 2018. Retrieved from https://www.unaids.org/sites/default/files/media_
Andrews, S. M., & Rowland-Jones, S. (2017). Recent advances in understanding HIV evolution. F1000Research, 6, 597-597. doi:10.12688/f1000research.10876.1
Baum, D. (2008). Reading a Phylogenetic Tree: The Meaning of Monophyletic Groups. Nature Education, 1(1), 190.
Bertels, F., Leemann, C., Metzner, K. J., & Regoes, R. R. (2019). Parallel Evolution of HIV-1 in a Long-Term Experiment. Molecular Biology and Evolution, 36(11), 2400-2414. doi:10.1093/molbev/msz155
Bons, E., Leemann, C., Metzner, K. J., & Regoes, R. R. (2021). Long-term experimental evolution of HIV-1 reveals effects of environment and mutational history. PLOS Biology, 18(12), e3001010. doi:10.1371/journal.pbio.3001010
Britannica, T. Editors of Encyclopaedia (2020, January 31). T cell. Encyclopedia Britannica. https://www.britannica.com/science/T-cell
Britannica, T. Editors of Encyclopaedia (2019, March 1). Retrovirus. Encyclopedia Britannica. https://www.britannica.com/science/retrovirus
German Advisory Committee Blood, S. A. o. P. T. b. B. (2016). Human Immunodeficiency Virus (HIV). Transfusion medicine and hemotherapy : offizielles Organ der Deutschen Gesellschaft fur Transfusionsmedizin und Immunhamatologie, 43(3), 203-222. doi:10.1159/000445852
Guindon, S., Delsuc, F., Dufayard, J. F., & Gascuel, O. (2009). Estimating maximum likelihood phylogenies with PhyML. Methods in Molecular Biology, 537, 113-137. doi:10.1007/978-1-59745-251-9_6
Posada, D. (2000). How does recombination affect phylogeny estimation? Trends in Ecology & Evolution, 15(12), 489-490. doi:10.1016/S0169-5347(00)02027-9
Wang, Z., & Liu, K. J. (2016). A performance study of the impact of recombination on species tree analysis. BMC Genomics, 17(10), 785. doi:10.1186/s12864-016-3104-5
Westram, A. M., & Johannesson, K. (2016). Parallel Speciation. In R. M. Kliman (Ed.), Encyclopedia of Evolutionary Biology (pp. 212-219). Oxford: Academic Press.