Trichosanthes anguina L. is a variety of Trichosanthes cucumerina L.: evidence based on molecular phylogenetic analysis of internal transcribed spacer (ITS) sequences of nuclear ribosomal DNA

Phylogenetic relationships among some species of Trichosanthes L. (Cucurbitaceae) were assessed using internal transcribed spacer (ITS) sequences of nuclear ribosomal DNA to infer the taxonomic status of Trichosanthes anguina. The parsimony analysis of the entire ITS region resulted in 85 maximally parsimonious trees (MPTs) with a total length of 100 steps, a consistency index (CI) of 0.8800 (0.8378 excluding uninformative characters), a homoplasy index (HI) of 0.1200 (0.1622 excluding uninformative characters), a rescaled consistency index (RC) of 0.7733 and a retention index (RI) of 0.8788. Our findings support the recognition of T. cucumerina var. anguina (L.) Haines as a variety of T. cucumerina.


Introduction
The genus Trichosanthes L., of the tribe Trichosantheae, subtribe Trichosanthinae, family Cucurbitaceae, includes c. 100 species [1][2][3][4][5]. It is principally an Asiatic genus, and its geographic distribution indicates either an Indo-Malayan or a Chinese centre of origin [6]. Trichosanthes anguina (snake gourd or serpent gourd) is an unusual cucurbit with long, white-speckled fruits that resemble a snake in shape; it is widely grown as a vegetable in India and in the Orient. Its roots and seeds are used to expel worms and to treat diarrhea and syphilis [7][8]. Haines (1921-1924) recognized T. cucumerina L. var. cucumerina (L.) Haines as a wild variant with short fruits and T. cucumerina var. anguina (L.) Haines as a cultivated variant with elongated, snake-like fruits. Jeffrey (1980) [6] followed the treatment of Haines (1921-1924) in regarding T. anguina as a variety of T. cucumerina. Chakravarty (1982) [10], however, treated these two varieties as two different species, and this treatment has been followed in most of the Indian floras, monographs and research papers published so far. A perusal of the literature reveals that the taxonomic status of T. anguina is controversial [11][12][13][14][15][16][17]. Hence, this study was undertaken to compare sequences of the internal transcribed spacer regions of nrDNA in some species of Trichosanthes in order to infer the taxonomic status of T. anguina.

Materials and Methods
The present study sampled 10 […] [17] and pollen morphology [18], a close relationship has been suggested between the tribes Trichosantheae and Luffeae, to which the genera Trichosanthes and Luffa belong. Therefore, sequences of Luffa were used as the outgroup. Total DNA was extracted using the DNeasy Plant Mini Kit (QIAGEN, Amsterdam, Netherlands). ITS sequences of nuclear ribosomal DNA were amplified using the primers ITS1 (forward, 5'-GTCCACTGAACCTTATCATTTAG-3') and ITS4 (reverse, 5'-TCCTCCGCTTATTGATATGC-3') [19] by polymerase chain reaction (PCR) with the AccuPower HF PCR PreMix (Bioneer, Daejeon, South Korea) in 20 µL volumes containing 2 µL of 10X buffer, 300 µM dNTPs, 1 µL of a 10 pM solution of each primer, and 1 unit of HF DNA polymerase. The thermal program consisted of an initial denaturation at 94°C for 5 min, followed by 40 cycles of 94°C for 1 min, 49°C for 1 min, and 72°C for 1 min, with a final extension step at 72°C for 5 min. The PCR products were ligated into the pT7Blue cloning vector using the Perfectly Blunt Cloning Kit (Novagen, Inc.) according to the manufacturer's instructions. The resulting recombinant plasmids were used to transform the competent cells included in the kit. The transformation mix was incubated in 250 µL SOC medium for 1 hour at 37°C on a rotary shaker, then plated on LB agar with 50 µg/mL ampicillin. Colonies were randomly selected and transferred into PCR buffer. The PCR products were purified with the SolGent PCR Purification Kit-Ultra (SolGent, Daejeon, South Korea) prior to sequencing.
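As a rough cross-check of the amplification set-up described above, the following minimal sketch computes the length and GC content of the two primers and the nominal hold time of the thermal program. The function names are illustrative and not part of the original protocol, and ramp times between temperature steps are ignored, which is an assumption.

```python
# Primer sequences as quoted in the text (5'->3').
PRIMERS = {
    "ITS1_forward": "GTCCACTGAACCTTATCATTTAG",
    "ITS4_reverse": "TCCTCCGCTTATTGATATGC",
}


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def program_minutes(initial: float, cycles: int, cycle_steps, final: float) -> float:
    """Nominal hold time of a PCR program in minutes (ramp times ignored)."""
    return initial + cycles * sum(cycle_steps) + final


for name, seq in PRIMERS.items():
    print(f"{name}: {len(seq)} nt, {gc_percent(seq):.1f}% GC")

# 94°C 5 min; 40 x (94°C 1 min, 49°C 1 min, 72°C 1 min); 72°C 5 min
pcr_total = program_minutes(5, 40, (1, 1, 1), 5)
print(f"Nominal PCR hold time: {pcr_total} min")
```

The real run time on a thermocycler would be longer than this nominal figure because heating and cooling between steps are not counted.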
The purified fragments were directly sequenced using dye terminator chemistry following the manufacturer's protocol. Cycle sequencing was conducted using the same primers as for amplification, BigDye vers. 3 reagents and an ABI PRISM 3730XL DNA Analyzer (Perkin-Elmer, Applied Biosystems), following the manufacturer's instructions. Cycling conditions included an initial denaturing step at 94°C for 5 min, followed by 30 cycles of 96°C for 10 s, 50°C for 5 s, and 60°C for 4 min. Each sample was sequenced in the sense and antisense directions. The sequences were analyzed with ABI Sequence Analysis and ABI Sequence Navigator software (Perkin-Elmer/Applied Biosystems). Nucleotide sequences of both DNA strands were obtained and compared to ensure accuracy. Sequences were initially aligned using ClustalX version 1.81 [20] with gap opening penalty = 10 and gap extension penalty = 3.0. The alignments were subsequently adjusted manually using BioEdit [21] and SeaView [22]. Insertion-deletions (indels) were scored as single characters when we had confidence in positional homology (Annexure). The boundaries between ITS1, 5.8S, and ITS2 were determined by comparison with previously published sequences available at the National Center for Biotechnology Information (NCBI) GenBank (www.ncbi.nlm.nih.gov). Gaps were treated as missing data in phylogenetic analyses. All sequences generated in the present study were deposited in GenBank, and the accession numbers are given in Table 1. Parsimony analyses were performed with PAUP* 4.0b10 [23]. Heuristic searches were conducted using 10,000 random addition sequence replicates, holding 10 trees at each step, with tree-bisection-reconnection (TBR) branch swapping, characters equally weighted, and gaps treated as missing data. Support for internal nodes was assessed using bootstrap analysis [24] of 1000 replicates with 100 random additions per replicate and holding 10 trees at each step.
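To illustrate the optimality criterion behind the parsimony searches described above, the sketch below scores a fixed rooted binary tree with Fitch's small-parsimony algorithm, using equal weights and treating gaps as missing data. It is not the PAUP* heuristic search itself, which also explores tree topologies. The taxon names, the toy alignment and the function names are hypothetical.

```python
from typing import Dict, Set, Tuple, Union

# A tree is either a leaf name or a pair (left subtree, right subtree).
Tree = Union[str, Tuple["Tree", "Tree"]]
DNA = set("ACGT")


def fitch_site(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Return (possible ancestral states, minimum changes) for one column."""
    if isinstance(tree, str):
        base = states[tree].upper()
        # Gaps and ambiguity codes are treated as missing: any base is allowed.
        return (set(base) if base in DNA else set(DNA)), 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # No shared state: one extra change is required at this node.
    return left_set | right_set, left_cost + right_cost + 1


def tree_length(tree: Tree, alignment: Dict[str, str]) -> int:
    """Sum the Fitch cost over all equally weighted alignment columns."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        column = {taxon: seq[i] for taxon, seq in alignment.items()}
        total += fitch_site(tree, column)[1]
    return total


# Hypothetical four-taxon alignment with one gap.
toy_alignment = {"A": "ACGT-", "B": "ACGTA", "C": "ATGCA", "D": "ATGCG"}
tree_ab_cd = (("A", "B"), ("C", "D"))
tree_ac_bd = (("A", "C"), ("B", "D"))
print(tree_length(tree_ab_cd, toy_alignment))
print(tree_length(tree_ac_bd, toy_alignment))
```

Under the parsimony criterion, the topology with the smaller length is preferred; a heuristic search repeats this scoring over many rearranged trees.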
Phylogenetic and molecular evolutionary analyses (evolutionary divergence between sequences, the number of base substitutions per site averaged over all sequence pairs, homogeneity test of substitution patterns between sequences, base composition bias difference between sequences, maximum composite likelihood estimate of the pattern of nucleotide substitution, codon-based test of neutrality for analysis between sequences, and Fisher's exact test of neutrality for sequence pairs) were conducted using MEGA version 4 [25][26][27][28]. The results were verified with BioNJ and parsimony analyses (using SeaView) and with Bayesian analysis (MrBayes). For the Bayesian analysis, the best-fit model of nucleotide evolution was found using jModelTest v1.0.1 [29]. Bayesian posterior probabilities for the clades were obtained using Metropolis-coupled Markov chain Monte Carlo analysis as implemented in MrBayes. Two simultaneous independent runs with four Markov chains each were carried out for 5 million generations, and trees were sampled every 100th generation, resulting in 50,000 trees. The first 10,000 trees were considered the burn-in phase and discarded.
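The sampling figures quoted above follow from simple bookkeeping, sketched below. The variable names are illustrative. The sketch assumes that the stated 50,000 trees refers to a single run and that the starting tree at generation 0 is not counted.

```python
generations = 5_000_000   # MCMC generations per run, as stated
sample_every = 100        # sampling interval, as stated
burn_in_trees = 10_000    # trees discarded as burn-in, as stated

sampled_trees = generations // sample_every
retained_trees = sampled_trees - burn_in_trees
burn_in_fraction = burn_in_trees / sampled_trees

print(sampled_trees)      # trees sampled per run
print(retained_trees)     # trees left for estimating posterior probabilities
print(burn_in_fraction)   # share of the sample discarded
```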

Sequence Characteristics-
The combined length of the entire ITS region (ITS1, 5.8S and ITS2) in the taxa sampled in the present study ranged from 608 to 616 bp. The ITS1 region ranged from 191 to 201 bp in length with 61-62% GC, the 5.8S gene was 163 bp long, and the ITS2 region ranged from 235 to 260 bp in length with 65-67% GC (Table 2). The data matrix has a total of 632 characters, of which 554 characters are constant, 25 characters are variable but parsimony-uninformative and 55 characters are parsimony-informative. Insertions and deletions (indels) were necessary to align the sequences. Indels ranged from 1 to 11 bp.
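The three character classes used above can be made concrete with a short sketch. It assumes the usual definitions: a column is constant if it shows at most one base, and parsimony-informative if at least two bases each occur in at least two taxa. Gaps are ignored as missing data. The toy alignment and function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def classify_column(column: str) -> str:
    """Classify one alignment column; gaps and ambiguity codes are ignored."""
    counts = Counter(base for base in column.upper() if base in "ACGT")
    if len(counts) <= 1:
        return "constant"
    # Count the bases that are shared by at least two taxa.
    shared = sum(1 for n in counts.values() if n >= 2)
    return "informative" if shared >= 2 else "uninformative"


def summarize(alignment: List[str]) -> Dict[str, int]:
    """Count constant, uninformative and informative columns."""
    summary = {"constant": 0, "uninformative": 0, "informative": 0}
    for column in zip(*alignment):
        summary[classify_column("".join(column))] += 1
    return summary


# Hypothetical aligned sequences of equal length.
toy = ["ACGTA-", "ACGTAC", "ATGCAC", "ATGCGC"]
print(summarize(toy))
```

By construction, the three counts add up to the number of columns in the alignment.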

Phylogenetic analyses-
The parsimony analysis (using PAUP*) of the entire ITS region resulted in 85 maximally parsimonious trees (MPTs) with a total length of 100 steps, a consistency index (CI) of 0.88 (0.8378 excluding uninformative characters), a homoplasy index (HI) of 0.12 (0.1622 excluding uninformative characters), a rescaled consistency index (RC) of 0.7733 and a retention index (RI) of 0.8788. The bootstrap values above the branches in the bootstrap strict consensus tree (Fig. 1) show the relative support for each clade. The number of base substitutions per site between sequences (evolutionary divergence) is shown in Table 3. The number of base substitutions per site averaged over all sequence pairs was 0.048. Homogeneity test of substitution patterns between sequences: The probability of rejecting the null hypothesis that sequences have evolved with the same pattern of substitution was judged from the extent of differences in base composition biases between sequences. A Monte Carlo test (1000 replicates) was used to estimate the P-values, which are shown below the diagonal in Table 4. P-values smaller than 0.05 are considered significant. The estimates of the disparity index per site are shown for each sequence pair above the diagonal.
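The tree statistics reported above are linked by standard definitions: CI is the minimum possible number of steps divided by the observed tree length, HI = 1 - CI, and RC = CI × RI. The following sketch checks the reported values against these relations; the function names are illustrative.

```python
def homoplasy_index(ci: float) -> float:
    """HI is the complement of the consistency index."""
    return 1.0 - ci


def rescaled_consistency_index(ci: float, ri: float) -> float:
    """RC is the product of the consistency and retention indices."""
    return ci * ri


def minimum_steps(ci: float, tree_length: int) -> float:
    """Minimum possible steps implied by CI = minimum steps / tree length."""
    return ci * tree_length


# Values as reported for the most parsimonious trees.
length, ci, ci_informative, ri = 100, 0.88, 0.8378, 0.8788

print(round(homoplasy_index(ci), 4))                 # reported HI: 0.12
print(round(homoplasy_index(ci_informative), 4))     # reported HI: 0.1622
print(round(rescaled_consistency_index(ci, ri), 4))  # reported RC: 0.7733
print(round(minimum_steps(ci, length)))              # implied minimum steps
```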

Base composition bias difference between sequences:
The difference in base composition bias per site is shown in Table 5. Even when the substitution patterns are homogeneous among lineages, the compositional distance correlates with the number of differences between sequences.  (Table 6).
Codon-based test of neutrality for analysis between sequences: The probability of rejecting the null hypothesis of strict neutrality (dN = dS) is shown in Table 7 (below the diagonal). Values of P less than 0.05 are considered significant at the 5% level. The test statistic (dN - dS, where dN and dS are the numbers of nonsynonymous and synonymous substitutions per site, respectively) is shown above the diagonal.
Fisher's exact test of neutrality for sequence pairs: The probability (P) of rejecting the null hypothesis of strict-neutrality in favor of the alternative hypothesis of positive selection is shown for each sequence pair (Table 8). P values smaller than 0.05 are considered significant at the 5% level.
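For readers unfamiliar with the test, a one-sided Fisher's exact test on a 2 × 2 table can be sketched as follows. This is a generic illustration and not MEGA's implementation. The layout of the table (for example, nonsynonymous versus synonymous sites in the rows and changed versus unchanged sites in the columns) and the counts used are assumptions made for the example.

```python
from math import comb


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided P-value for the table [[a, b], [c, d]].

    Returns the probability, with all margins fixed, of observing a count
    of at least `a` in the top-left cell (hypergeometric upper tail).
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)
    upper = min(row1, col1)
    tail = 0
    for k in range(a, upper + 1):
        # Tables with top-left cell k that keep the same margins.
        tail += comb(row1, k) * comb(row2, col1 - k)
    return tail / total


# Hypothetical counts, for illustration only.
p_value = fisher_exact_greater(3, 1, 1, 3)
print(round(p_value, 4))
```

A P-value below 0.05 from such a table would be read as significant at the 5% level, matching the threshold used in the text.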
University, Vietnam, for his kind help during DNA sequencing. We are grateful to anonymous reviewers for constructive comments.