INVESTIGATING THE SEX SPECIFIC RATES OF REPLICATION DRIVEN MUTATIONS IN HUMANS USING GENOME-WIDE INDEL MUTATIONS IN HUMAN ALU REPEATS

Background: To understand the tempo and mode of evolution at the nucleotide level, it is important to estimate the spontaneous rate of each mutation type. Many molecular evolutionary studies have concluded that, because the male germline undergoes more cell divisions than the female germline, replication-based nucleotide substitutions in primates occur more frequently in males than in females. However, a potential sex bias in mutations other than nucleotide substitutions has not been extensively investigated. The human Alu repeats provide an ideal substrate for investigating the degree of replication-based indel (insertion and deletion) mutations in the human chromosomes.
Results: We analyze patterns of small (1 bp) indel mutations in the middle poly (A) track of Alu repeats across the entire human genome in order to elucidate the processes of mutation and fixation. This analysis adds further support for the accumulation of more mutations in the Y chromosome than in the X chromosome. We report the male-to-female mutation ratio α in humans as ~1.5.
Conclusion: Our results suggest that although small indel mutations may be primarily replication driven (as previous studies suggest), the observed value of α does not exceed the threshold necessary to conclude that contributions of replication-independent factors are negligible. We also report that, among small (1 bp) indels, deletions outnumber insertions. This relative excess of deletions may be an important parameter in the long-term evolution of genome size.
Keywords– Alu repeats, male-to-female mutation ratio, insertions, deletions, indels


Background
In humans, men have more germ cell divisions than women. The germline is maintained separately from the somatic cells; therefore, mutations in the gametes can arise only within the germ cells. If mutations arise primarily from DNA replication errors during germ cell divisions, the mutation rate should be higher in males than in females. Assuming mutations to be the source of genetic variation, a male bias in mutation rates would suggest that evolution is 'male biased'. Even though a number of studies have detected male-driven evolution in mammals, birds and plants, a precise value of the male-to-female mutation ratio (α) in humans is still lacking. Knowing the accurate value of human α is critical to understanding whether germline mutations are caused primarily by imperfectly copied DNA during replication or by environmental factors. With many more rounds of cell division per generation, males accumulate more mutations. In primates, males undergo two to six times more germline cell divisions than females [3]. If mutations originate primarily from errors in replication, then the male-to-female mutation rate ratio (α) should be similar to the male-to-female ratio of germline cell divisions (c). If the observed value of α is smaller than c, then the role of replication-independent factors in generating mutations is not negligible. Published molecular evolutionary studies have concluded that nucleotide substitution rates are higher in males than in females [9,17]. The Y chromosome is transmitted only through the male germline because it is carried only by males; the X chromosome is transmitted more often through the female germline (because X spends 1/3 of its evolutionary time in males and 2/3 of its time in females), while the autosomes are transmitted equally through the male and female germlines. 
Thus the male-to-female mutation rate ratio, α, can be determined by comparing the mutation rates among the X chromosome, the Y chromosome, and the autosomes [21]. A value of α less than one provides evidence that the mutations under study are selectively neutral (with respect to errors due to replication). A value of α between one and the ratio of germline cell divisions (c) would provide evidence of a possible male bias and also of the presence of replication-independent factors for the mutations under study. The reported male-to-female ratio of germline cell divisions in humans is 6 (c = 6) [12]. A value of α greater than c provides evidence confirming the important role of replication errors in the generation of mutations. A value of α much greater than c might imply that errors in DNA replication during germ-cell division are the primary source of mutation and that replication-independent mutagenic factors such as methylation and oxygen radicals play lesser roles [33]. A wide range of values has been reported for human α in the current literature. Studies that compare nucleotide substitution rates at homologous regions in primate genes between the sex chromosomes and the autosomes have reported α as ~5 [11,33]. When a large region (38.6 kb) with no known genes was compared between the X and Y chromosomes, the value of α reported for primates was 1.7 (95% confidence interval 1.15-2.87) [2]. A genome-wide analysis of Long Interspersed Nuclear Elements (LINEs) from the initial sequence of the human genome reported α as ~2 [16]. All possible homologous comparisons between chimpanzee and human chromosomes reported α as ~3 [7]. When a noncoding fragment of about 10.4 kilobases (kb) on the Y chromosome and a homologous region on chromosome 3 were compared in humans, greater apes, and lesser apes, the estimated α was ~5 [18]. 
International Journal of Bioinformatics Research ISSN: 0975-3087, E-ISSN: 0975-9115, Vol. 3, Issue 1, 2011
Hence, there is compelling evidence that the mutation rate for nucleotide substitution is higher in males than in females; however, the precise extent of the male bias in point mutations remains an issue of debate. Several reasons can account for the variation in the reported α. Many investigations use homologous genes or strictly sex-linked sequences to calculate α [3,11,33]. Selection could have skewed sequence evolution in the introns and exons, thus biasing such investigations. When sequences across species are compared to calculate α, the pairs under study might lie within chromosomal regions with substantially divergent nucleotide sequences, which might skew the result. Also, when closely related sequences are compared, the reported α could be underestimated due to preexisting polymorphisms. The variation in the reported values of α may be attributed in part to the small sample sizes used in the various studies. Interestingly, most studies investigating male bias have analyzed point mutations only. While nucleotide substitution models have been studied extensively, other mutations such as indels have largely been treated as uninformative events. Thus, investigating whether insertions and deletions (indels) occur predominantly in males compared to females can provide new insights into the widely accepted male-driven evolution hypothesis. The extent of male bias in humans is of particular interest to evolutionary biologists. A commonly observed replication error is replication slippage, which occurs at repetitive sequences when the new strand mispairs with the template strand. Mononucleotide runs are well-known hot spots for frameshift mutations, with DNA polymerase slippage typically resulting in the loss or gain of one or a few nucleotides. Several studies have reported that replication slippage is responsible for many small (1 bp) indels [24,34]. 
Deletions are generated when the replication complex skips across a number of nucleotides and fails to replicate them, whereas insertions are formed when the same region is mistakenly re-replicated. The replication-driven origin of small indels in humans is supported by the study of potential indel mutation mechanisms, including misalignment of short direct repeats during DNA replication and excision repair-mediated resolution of short inverted repeats [4]. The formation of indels is related to features of the nucleotide sequence in which they occur, such as the occurrence of repetitive motifs. Hence, it is necessary to investigate the male-to-female mutation rate using repeat sequences that harbor repetitive motifs and are ancestrally related (that is, that have accumulated indel mutations over time).

Fig. 1-A Typical Alu element structure
A major category of non-coding repetitive DNA within all mammalian genomes studied to date is the Short Interspersed Nuclear Elements (SINEs), which account for as much as 10% of all genomic sequence. Within the human genome, there are approximately one million copies of the Alu family of SINEs alone. Alus are approximately 280 bp long sequences with no known function [25]. Alu amplification requires the formation of an RNA transcript that must then be reverse transcribed and inserted into a new location in the genome [6]. Thus Alus are believed to have colonized the genome by a 'copy and paste' mechanism [10] and have actively copied and pasted themselves in the genome at different time periods. Interestingly, there are no known mechanisms that specifically remove Alu elements from the genome [29], and hence Alus can be used as effective fossil records. Alus have bypassed mutational inactivation, negative selection and/or putative host defense mechanisms that could have limited their expansion [26]. Alu elements are therefore a rich source of inter- and intra-species primate genomic variation [1,27,31,32]. As shown in Figure 1, the Alu element is a fusion of two free Alu monomers, the free left Alu monomer (FLAM) and the free right Alu monomer (FRAM) [26]. The two monomers are linked by a ~16 base pair (bp) poly (A) region. This middle poly (A) track in Alus provides an ideal substrate for investigating the degree of replication-based indel (insertion and deletion) mutations in the human chromosomes. In a recent study on indels across the human genome, the majority of single base pair indels were reported as A:T and T:A base pairs, and these two classes together accounted for 84% of the single base pair indels recorded [20]. Also, the middle poly (A) rich region is free from CpG dinucleotides, so its phylogenetic analysis reduces the chance of spurious variation.
Copyright © 2011, Bioinfo Publications
In this study we provide a large-scale genetic analysis of Alu elements found in the human genome. Analysis of indel patterns in the poly (A) track of the Alu elements found in the autosomes and the sex chromosomes provides an unbiased basis for calculating α for humans. The Alu family allows analysis of large numbers of sequences throughout the genome, since it is found on all chromosomes in numbers sufficient for a rigorous statistical analysis. In nonfunctional sequences the rate at which small indels (replication-driven mutations) accumulate should equal the rate of mutation; hence the indels accumulated in Alu elements found on the Y chromosome represent mutations of paternal origin. Likewise, the indels accumulated on the X chromosome represent mutations of predominantly maternal origin. The indels in the Alu elements found on the 22 autosomes (non-sex chromosomes) provide a statistical baseline. These data are used to calculate the male-to-female mutation rate ratio (α).
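The logic of this chromosome comparison can be written out explicitly. The following is a sketch of the standard expectations under the simple transmission model cited above [21], using the transmission fractions stated in the Background (Y always in males; X one third of the time in males and two thirds in females; autosomes half and half). The symbols \mu_m and \mu_f (male and female germline mutation rates) are notation introduced here for illustration and do not appear in the original text.

```latex
% Assumed notation: \mu_m, \mu_f = male and female germline mutation rates,
% \alpha = \mu_m / \mu_f. Y, X, A = expected mutation rates on each chromosome class.
\begin{align}
  Y &= \mu_m, &
  X &= \tfrac{1}{3}\mu_m + \tfrac{2}{3}\mu_f, &
  A &= \tfrac{1}{2}\mu_m + \tfrac{1}{2}\mu_f .
\end{align}
% Dividing pairs of rates and substituting \mu_m = \alpha \mu_f:
\begin{align}
  \frac{Y}{A} &= \frac{2\alpha}{1+\alpha}, &
  \frac{A}{X} &= \frac{3(1+\alpha)}{2(2+\alpha)}, &
  \frac{Y}{X} &= \frac{3\alpha}{2+\alpha}.
\end{align}
```

Each observed rate ratio can therefore be inverted to give its own estimate of α, which is why agreement among the three estimates serves as an internal consistency check.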

Results
Number of Alu elements found in the human genome
Table I shows the result of searching the entire human genome for Alu elements. We recorded 436562 Alu elements in the 22 non-sex chromosomes (Autosomes), 6624 Alu elements in the X-chromosome and 3628 Alu elements in the Y-chromosome for analysis. Alus that were imperfectly copied during recombination were excluded from the search. Only the Alu elements with the middle poly (A) track were recorded and analyzed. A total of 7099741 nucleotides in the Autosomes, 107425 nucleotides in the X-chromosome and 59320 nucleotides in the Y-chromosome (all constituting the middle poly (A) regions of the detected Alus) were reported, as shown in Table I.
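As a quick consistency check on these totals, the mean length of the middle poly (A) region per Alu can be computed from the counts quoted above. The sketch below uses only the numbers given in the text; the dictionary and function names are hypothetical and chosen for illustration.

```python
# Counts quoted in the Results text (number of Alu elements analyzed).
ALU_COUNTS = {"Autosomes": 436562, "X": 6624, "Y": 3628}

# Total nucleotides in the middle poly (A) regions, as quoted in the text.
POLY_A_NUCLEOTIDES = {"Autosomes": 7099741, "X": 107425, "Y": 59320}


def mean_poly_a_length(counts, nucleotides):
    """Return the mean middle poly (A) length per Alu for each chromosome class."""
    return {name: nucleotides[name] / counts[name] for name in counts}


means = mean_poly_a_length(ALU_COUNTS, POLY_A_NUCLEOTIDES)
for name, value in means.items():
    print(f"{name}: {value:.2f} nt per Alu")
```

All three classes come out at roughly 16 nucleotides per element, in line with the ~16 bp poly (A) linker described in the Background, so the three data sets are comparable in target size per Alu.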

Insertion and Deletion events
After extracting the number of insertions, the number of deletions, and the length of the middle poly (A) of each Alu element in the data set, the rate ratios were calculated in three ways: using only insertion events, only deletion events, and both insertion and deletion (indel) events. The rate ratio Y/X was calculated from the percentage indel events obtained from Table I. Similarly, rate ratios were calculated for Y/A and A/X, as shown in Table II.
The male-to-female mutation rate ratio (α)
Having estimated the rate ratios in the Autosomes (A), X chromosome (X) and the Y chromosome (Y), the male-to-female mutation rate ratios are calculated using the simple model of mutation frequencies proposed by Miyata T [21].
Under this model, each rate ratio gives an estimate of α:
$$\alpha_{A/X} = \frac{4R_{A/X} - 3}{3 - 2R_{A/X}}$$
where $R_{A/X}$ is the rate ratio of the mutations in Autosomes and the X-chromosome;
$$\alpha_{Y/A} = \frac{R_{Y/A}}{2 - R_{Y/A}}$$
where $R_{Y/A}$ is the rate ratio of the mutations in Y-chromosome and the Autosomes;
$$\alpha_{Y/X} = \frac{2R_{Y/X}}{3 - R_{Y/X}}$$
where $R_{Y/X}$ is the rate ratio of the mutations in Y-chromosome and the X-chromosome.
The calculated values for the male-to-female mutation rate ratio (α) are shown in Table II. We report the αY/X using combined (both insertion and deletion) indel events (shown in bold in Table II) as our analyzed male-to-female mutation ratio α in humans.
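The conversion from an observed rate ratio to α can be sketched in a few lines. The code below is an illustration of the standard relations implied by the simple transmission model [21] (Y/A = 2α/(1+α), A/X = 3(1+α)/(2(2+α)), Y/X = 3α/(2+α)), not the authors' own implementation; all function names are hypothetical.

```python
def expected_ratios(alpha):
    """Expected rate ratios for a given male-to-female mutation ratio alpha."""
    return {
        "Y/A": 2 * alpha / (1 + alpha),
        "A/X": 3 * (1 + alpha) / (2 * (2 + alpha)),
        "Y/X": 3 * alpha / (2 + alpha),
    }


def alpha_from_ya(r_ya):
    """Invert Y/A = 2a/(1+a); the ratio must lie strictly between 0 and 2."""
    if not 0 < r_ya < 2:
        raise ValueError("Y/A ratio must be in (0, 2)")
    return r_ya / (2 - r_ya)


def alpha_from_ax(r_ax):
    """Invert A/X = 3(1+a)/(2(2+a)); the ratio must lie strictly between 3/4 and 3/2."""
    if not 0.75 < r_ax < 1.5:
        raise ValueError("A/X ratio must be in (0.75, 1.5)")
    return (4 * r_ax - 3) / (3 - 2 * r_ax)


def alpha_from_yx(r_yx):
    """Invert Y/X = 3a/(2+a); the ratio must lie strictly between 0 and 3."""
    if not 0 < r_yx < 3:
        raise ValueError("Y/X ratio must be in (0, 3)")
    return 2 * r_yx / (3 - r_yx)


# Example: the Y/X ratio expected if alpha equalled the cell-division ratio c = 6.
print(expected_ratios(6.0)["Y/X"])
```

For comparison, a purely replication-driven process with c = 6 would predict a Y/X rate ratio of 2.25, whereas α of about 1.5 corresponds to a Y/X ratio of about 1.29. The range checks reflect the fact that, under this model, each ratio is bounded no matter how large α becomes.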

Discussion
The magnitude of the sex ratio of mutation rate has been a controversial issue, particularly in humans. The observations presented here result from investigating only deletion and insertion mutations, as point mutations have a different mechanism of mutagenesis. Because mutations in general and indels in particular are very rare, they are often difficult to measure with precision in a laboratory setting. A common alternative approach is to study substitutions in non-coding DNA. Given their evolutionary history and dearth of functionality, Alus offer a nearly ideal substrate for estimating mutation rates in humans. Additionally, results based on Alu repeats use information gathered over a large number of sites and from the accumulation of mutations over long evolutionary times. Since the α values estimated for indel events from the three chromosomal comparisons (αA/X, αY/A and αY/X) are similar (as shown in Table II), it can be inferred that differences between indel rates in the male and female germlines may be the dominant factor influencing the rate of DNA sequence evolution in humans. Thus, the time DNA sequences spend in the male and female germlines determines their overall evolutionary rate. Our estimate of α ~ 1.5 is based on the complete, diverse set of germline indel mutations that accumulated within the large set of selectively neutral genomic Alu sequences. Our findings suggest that indel rates in human males are only mildly higher than in females. Moreover, our findings suggest that sex differences in indel rates are far less pronounced than the striking asymmetry in the number of cell divisions reported in humans. From the estimated value of α, it can be inferred that errors in mitotic DNA replication and repair account for only a minority of germline indels in the human genome. As noted by Bohossian HB et al. 
[2], perhaps DNA replication and repair are unusually accurate in spermatogonial stem cells, which account for most of the excess cell divisions in the male germline. Our findings reflect a difference in the numbers of genomic replications coupled to cell divisions per generation in males and females. Our results thus suggest a re-investigation of the model that human mutation rates are directly proportional to the number of cell divisions (c). The value of α in humans can be much smaller than c because the generation time in humans is much longer than the 25 years that was used in estimating the value of c for humans [12]. Also, the data for calculating the number of germ-cell divisions in humans are insufficient to provide a reliable estimate of c [17]. If recombination is mutagenic, then the value of α can be underestimated from a comparison of Alu elements in the autosomes and the sex chromosomes, because recombination is absent in the Y chromosome and the recombination rate is lower in the X chromosome than in the autosomes. Another possible reason for the low value of α could be a specially reduced mutation rate in the X chromosome that may have been selected to compensate for its hemizygous state in males [19]. Substantial variation in mutation rates between chromosomes due to regional differences in GC content, DNA repair, nuclear localization and metabolism may also have skewed our results. Finally, it can also be hypothesized that the difference in mutational bias observed arises simply from DNA repair errors in the sperm (because of the higher levels of DNA damage), assuming that the errors in replication are similar for both sex chromosomes. It therefore remains to be demonstrated whether other mechanisms play a role in the observed differences in mutation rates between the sex chromosomes.
Many studies have indicated that indel mutations are related to recombination [5,34]. Also, small indels causing some human genetic diseases were found to originate with the same frequency in males and females [28]. If recombination were the main source of small indel mutations, we would expect to see a lower X / Autosomes indel rate ratio. Thus our study supports the view that small and large indels originate by different molecular mechanisms. Sequence comparison between ~6 kb on the X chromosome and ~5 kb on the Y chromosome in primates indicated similar indel frequencies, suggesting no sex bias for large (> 1bp) indels in primates [34]. Interestingly, the most parsimonious explanation for our results is that most 1 bp indels occur during DNA replication and/or during DNA repair after DNA replication. This is consistent with the hypothesis that DNA replication errors are the major source of small indels.
The reason for the substantial variation in primate genome sizes is currently unknown. Indel polymorphisms are of great interest because they can alter human phenotypes. It has been suggested that DNA loss caused by biases in small insertions and deletions (indels) can be a determinant of genome size [24]. Our findings add further support to the mutational equilibrium model shown in Fig. 2 (proposed by Petrov DA [24]).
Fig. 2-The Mutational Equilibrium model [24].
The model hypothesizes that for small genome sizes the rate of genome size increase is higher than that of DNA loss, resulting in genome size growth. However, since the rate of DNA loss through small deletions is shown to grow linearly, and thus faster than the rate of DNA gain, for very large genome sizes DNA loss is faster than DNA growth. Therefore, there exists a stable equilibrium at a finite value of genome size (shown as G in Fig. 2). In our analysis, a higher prevalence of indels on the Y chromosome compared with the X chromosome and the autosomes is observed for both insertions and deletions. Interestingly, the male-to-female ratio is higher for insertions (αY/X = 1.9262) than for deletions (αY/X = 1.4912). Although we cannot rule out coincidence, deletions seem to be a major phenomenon in the generation of sequence diversity. Our results indicate that the mutational pressure at the level of small indels is biased toward DNA loss. If the preferential fixation of small deletions over small insertions is not prevented by selection, then all genomes are constantly losing DNA through small indels.
We conclude that although small 1 bp indel mutations may be primarily replication driven (as previous studies suggest), the observed value of α does not exceed the threshold necessary to conclude that contributions of replication-independent factors are negligible. We also report that, among small (1 bp) indels, deletions outnumber insertions. This relative excess of deletions may be an important parameter in the long-term evolution of genome size.