Recent work from our group has shown the value of analysing paralogous gene families to boost the functional signal in population-level DNA sequencing data from human exomes and genomes (MacGowan et al., 2017). The signal is especially strong in families of protein repeats such as TPRs, Ankyrins, HEAT and Armadillo, and our recent collaborative research has shown how population genetics data can be used to select variants least likely to perturb function (Llabrés et al., 2019). The project will extend these principles to a wider range of protein families important in disease.
This project will train the student in software development and advanced bioinformatics research techniques, including machine learning, NoSQL technology and statistics. They will also gain experience of contemporary molecular dynamics simulation methods. On completion of the Ph.D., the student will be well prepared for a research career in bioinformatics and will also have excellent transferable skills appropriate to careers in Big Data analytics or software engineering.
MacGowan, S. A., Madeira, F., Britto Borges, T., Schmittner, M. S., Cole, C. and Barton, G. J. (2017), “Human Missense Variation is Constrained by Domain Structure and Highlights Functional and Pathogenic Residues”, bioRxiv preprint, https://doi.org/10.1101/127050
Llabrés, S., Tsenkov, M. I., MacGowan, S. A., Barton, G. J. and Zachariae, U. (2019), “Disease related single point mutations alter the global dynamics of a tetratricopeptide (TPR) alpha-solenoid domain”, Journal of Structural Biology, https://doi.org/10.1016/j.jsb.2019.107405