Abstract: Introduction
It has long been known that non-parametric bootstrap values significantly underestimate the probability that a topological element exists, particularly when the lineages analysed differ in rates of character substitution [1, 2]. Conventional bootstrap analysis often indicates low support for the deeper nodes of a tree or suggests unresolved phylogenies. True topological elements present in some of the equally parsimonious trees are thus lost when a consensus is built. Here we present a simple method for estimating the reliability of individual nodes of a tree and for screening the space of equally parsimonious trees for reliable topological elements.
Some Premises
Parsimony is a fast and widely used approach to analysing character sets. However, it is also known to perform inconsistently under certain circumstances [3]. In particular, sharp disparities in rates of character substitution generate a high level of homoplasy in the data at the level of individual characters. If the amount of homoplasy is high enough, parsimony may join taxa erroneously on the basis of homoplastic characters while searching for the most parsimonious trees during addition-sequence replicates. Ultimately, a heuristic search may stall within a local tree island that is isolated from the global optimum by branch swaps that cannot be performed given the current combination of taxa [4, 5]. Thus, reconstruction of the true topology depends strongly on the signal-to-noise ratio in the data.
The Method
We analysed an alignment containing complete SSU rDNA sequences from 36 nematode taxa and 6 metazoan groups. In nematodes sampled across the entire phylum, the gene evolves at clearly different rates. Variable positions were removed from the alignment stepwise, according to the amount of change assigned to each position on the most parsimonious tree, which produced a series of 24 subalignments. Each subalignment was analysed with parsimony (PAUP* 4.0beta10 package). Screening the equally parsimonious trees over the series of reconstructions revealed many topological elements that also occur in the bootstrap consensus obtained from the initial alignment. However, the composition of other nodes changed depending on the number of variable positions removed. To assess the reliability of an individual element, we computed the homoplasy index (HI) for each character assigned synapomorphic status by the algorithm and screened for the least homoplastic ones. The analysis showed that HI values for characters supporting robust elements retained in the initial consensus are close or equal to zero. A search within the subalignment topologies detected nodes that were not retained in the consensus and yet were reconstructed on the basis of characters with low homoplasy (HI close to zero). Some of them correspond to higher nematode taxa whose monophyly has already been substantiated by independent evidence (morphology or SSU rRNA secondary structure); others were novel. However, some alternative topologies were supported by characters with almost equally high HI values. To estimate the total amount of homoplasy generated by these topologies, pairwise homoplasy distances were calculated for each tree inferred in the series of successive reconstructions. The homoplasy matrices thus obtained, visualised as “trees”, give an idea of how much homoplasy a particular topology generates between taxa.
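The per-character quantities used above can be illustrated with a minimal Python sketch. It is not the PAUP* implementation. It assumes unordered, equally weighted characters, a rooted binary tree written as nested tuples, and no gaps or ambiguity codes. It takes HI as 1 - CI, where CI is the minimum possible number of steps (number of states minus one) divided by the number of steps observed on the tree. The function names, the toy tree and the toy alignment are hypothetical.

```python
from typing import Dict, List


def fitch_steps(tree, states: Dict[str, str]) -> int:
    """Fitch parsimony: minimum number of changes of one character on a rooted binary tree.

    `tree` is a leaf name (str) or a pair (left, right) of subtrees.
    """
    steps = 0

    def visit(node):
        nonlocal steps
        if isinstance(node, str):          # leaf: state set is the observed state
            return {states[node]}
        left, right = node
        a, b = visit(left), visit(right)
        common = a & b
        if common:                         # children agree: no change needed here
            return common
        steps += 1                         # children disagree: one change
        return a | b

    visit(tree)
    return steps


def homoplasy_index(tree, states: Dict[str, str]) -> float:
    """HI = 1 - CI, with CI = (number of states - 1) / observed steps."""
    observed = fitch_steps(tree, states)
    if observed == 0:                      # constant character: no homoplasy
        return 0.0
    min_steps = len(set(states.values())) - 1
    return 1.0 - min_steps / observed


def steps_per_site(tree, alignment: Dict[str, str]) -> List[int]:
    """Number of changes assigned to each alignment position on the given tree."""
    length = len(next(iter(alignment.values())))
    return [fitch_steps(tree, {t: s[i] for t, s in alignment.items()})
            for i in range(length)]


def drop_sites(alignment: Dict[str, str], steps: List[int], max_steps: int) -> Dict[str, str]:
    """Subalignment keeping only positions that change at most `max_steps` times."""
    keep = [i for i, s in enumerate(steps) if s <= max_steps]
    return {t: "".join(seq[i] for i in keep) for t, seq in alignment.items()}


# Toy data (hypothetical): four taxa, four positions.
toy_tree = (("A", "B"), ("C", "D"))
toy_alignment = {"A": "AAGA", "B": "AAGC", "C": "ATCA", "D": "ATCC"}

site_steps = steps_per_site(toy_tree, toy_alignment)
site_hi = [homoplasy_index(toy_tree, {t: s[i] for t, s in toy_alignment.items()})
           for i in range(4)]
subalignment = drop_sites(toy_alignment, site_steps, max_steps=1)
```

In the toy data the last position changes twice on the tree although one change would be the minimum, so its HI is 0.5, and it is the only position removed at a threshold of one step. Lowering the threshold in successive rounds yields a series of subalignments analogous to the one described above.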
Detailed analysis of the homoplasy “trees” showed that lineages constituting robust clades in the initial consensus show less homoplasy with respect to each other and are thus situated close together on tree representations of the matrices. This is by no means always the case for taxa in nodes reconstructed on the basis of highly homoplastic characters. Comparison of the actual phylogenies with the corresponding homoplasy “trees” showed that the composition of highly homoplastic nodes in most cases conflicts with the distribution pattern of pairwise homoplasy distances, i.e. taxa joined by the algorithm generate a high level of homoplasy in the node and thus do not cluster together in the homoplasy “trees”. By contrast, some local combinations of taxa did correspond to the pattern of homoplasy distance distribution and are thus likely to represent natural monophyletic clades.
We used the approach described to screen the tree space pooled over the series of reconstructions for topological elements that meet the minimum homoplasy requirement, and compiled a phylogeny that was not found in any of the heuristic searches. To test this phylogenetic hypothesis, we constrained the combinations of taxa that either were supported by characters with low homoplasy or agreed with the corresponding homoplasy distance distribution patterns, and used these constraints in a parsimony analysis of the initial alignment. The heuristic search failed to find a tree fully coinciding with the compiled phylogeny (a total of 23 equally parsimonious trees were found). The compiled phylogeny was compared with the trees found using the one-tailed Shimodaira-Hasegawa test with RELL bootstrap and received the highest likelihood score.
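For readers unfamiliar with the final comparison, the logic of a Shimodaira-Hasegawa test with RELL resampling can be sketched as follows. This is an illustrative outline, not the procedure as run in PAUP*. It assumes that per-site log-likelihoods have already been computed for every candidate tree. The function name `sh_test_rell`, its parameters and the toy numbers are hypothetical.

```python
import random
from typing import List


def sh_test_rell(site_lnl: List[List[float]], n_boot: int = 1000, seed: int = 0) -> List[float]:
    """One-tailed SH test p-values, one per tree.

    site_lnl[t][s] is the log-likelihood of site s under tree t.
    """
    rng = random.Random(seed)
    n_trees = len(site_lnl)
    n_sites = len(site_lnl[0])

    # Observed statistic: difference between the best total lnL and each tree's total.
    totals = [sum(row) for row in site_lnl]
    best = max(totals)
    observed = [best - t for t in totals]

    # RELL: resample sites with replacement and re-sum the fixed per-site values
    # (no re-estimation of parameters on the resampled data).
    boot = [[0.0] * n_boot for _ in range(n_trees)]
    for b in range(n_boot):
        sample = [rng.randrange(n_sites) for _ in range(n_sites)]
        for t in range(n_trees):
            row = site_lnl[t]
            boot[t][b] = sum(row[s] for s in sample)

    # Centre each tree's replicates on their own mean (null: all trees equally good).
    centred = []
    for t in range(n_trees):
        mean = sum(boot[t]) / n_boot
        centred.append([x - mean for x in boot[t]])

    # p-value: share of replicates whose centred difference is at least the observed one.
    p_values = []
    for t in range(n_trees):
        hits = 0
        for b in range(n_boot):
            delta = max(centred[u][b] for u in range(n_trees)) - centred[t][b]
            if delta >= observed[t]:
                hits += 1
        p_values.append(hits / n_boot)
    return p_values


# Toy per-site log-likelihoods for two trees over three sites (hypothetical values).
toy_lnl = [[-1.0, -2.0, -1.5],
           [-1.2, -2.5, -1.4]]
toy_p = sh_test_rell(toy_lnl, n_boot=200, seed=1)
```

The tree with the highest total log-likelihood always receives a p-value of 1, since its observed difference is zero. Trees with small p-values are rejected as significantly worse than the best one.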