Using Drosophila natural variation to study the role of positive selection in cis-regulatory evolution and the genetic basis of a complex disease trait

He BZ  2012  The University of Chicago. 145; 3548238 

Abstract

In the first part of this thesis, I examined the role of positive selection in cis-regulatory evolution. In coding regions, the importance of positive selection in shaping patterns of natural variation has been established by both theoretical and empirical work; in cis-regulatory regions, by contrast, the role of natural selection has been more controversial. On one hand, genome-wide scans of noncoding DNA pointed to strong signals of positive selection, particularly within 5’ and 3’ UTRs, where regulatory elements are enriched. On the other hand, empirical observations of fast turnover (lineage-specific gain and loss) of transcription factor binding sites (TFBS) contrast with the striking functional conservation of other regulatory sequences, which has prompted many researchers to propose neutral evolution under functional constraint. However, a rigorous population genetics approach has not been applied to formally evaluate these and alternative hypotheses. In this study, I specifically tested the alternative hypothesis that natural selection drives the turnover of TFBS, using Drosophila enhancers as an example. By combining a population genetic approach with a high-quality dataset of TFBS and a state-of-the-art microfluidics technology, I found that the patterns of divergence and polymorphism are not consistent with the neutral hypotheses. Instead, they strongly suggested the action of positive selection both in the gain of new binding sites and in their loss. This finding is consistent with a nuanced, two-timescale view of regulatory evolution: frequent and subtle changes in function can occur on a short timescale and drive adaptive changes, while constraints fundamental to developmental processes and genetic network interactions act as a centripetal force and ensure the functional stability of regulatory components and interactions across a longer timescale. This view is also supported by empirical findings of subtle yet significant differences in the expression patterns driven by orthologous enhancers whose functions were previously considered unchanged.

The second part of my thesis explores a novel approach of using Drosophila natural variation to study the genetic architecture of human complex diseases. The problem of identifying the polygenic basis of common human disorders has gained increasing attention, due both to advances in technology that made genome-wide association studies (GWAS) in humans possible and to the rising incidence of common diseases that increasingly burden our societies. Hampering this effort, however, is the inability to resolve more basic questions about the types of mutations that produce complex traits, their mechanisms of action (and interaction), their frequencies in populations, and the magnitudes of their effects. To overcome some of the limitations faced by human studies, such as low mapping resolution and the difficulty of performing functional analysis, we developed a fly model approach. We first constructed a model for a Mendelian disease trait, then turned it into a genetically complex trait by crossing the mutant line into a diverse genetic background (178 inbred lines derived from a wild Drosophila melanogaster population). Employing both traditional GWAS approaches and a novel extreme selection scheme, we aimed to identify both common and rare variants underlying the continuously variable disease trait and to dissect their genetic and molecular effects. The fast decay of linkage disequilibrium (LD), combined with complete genome sequences, enabled us to narrow the association peak to a 400 bp block containing an insertion/deletion (indel) polymorphism in an intron of the gene sfl. Experimental analysis established a functional link between sfl and the neurodegeneration phenotype induced by mutant human proinsulin. RNAi analysis of additional genes in the same pathway strongly suggested a previously unknown link between Heparan Sulfate Proteoglycan (HSPG) and cellular responses to misfolded proteins. Finally, allele-specific expression analysis revealed a potential mechanism for the effect of the intronic variation, suggesting that changes in the expression level of sfl may be the cause of the phenotypic variation. (Abstract shortened by UMI.)