University of Dundee

EASTBIO: Prediction of specificity determining sites in proteins from human population genetic variation by deep learning

This 4-year PhD project is part of a competition funded by the EASTBIO BBSRC Doctoral Training Partnership. This opportunity is open to UK and EU nationals.

Applicants should apply by completing the EASTBIO application form (downloadable from the EASTBIO website) and returning it by e-mail. Candidates should also include their academic transcripts and ensure that they ask their referees to send completed references. Applicants may wish to explain their motivation for joining the EASTBIO training programme.

Our research has focused for more than 20 years on developing effective computational methods to predict the function, structure and specificity of proteins from the amino acid sequence. This experience is encapsulated in widely used BBSRC-funded software tools. These include Jalview, a sequence analysis workbench that has over 70,000 regular users worldwide, and JPred, which performs up to 250,000 predictions per month of secondary structure and other features from the amino acid sequence for scientists in laboratories in the UK and internationally. Together, Jalview and JPred have accumulated over 7,000 citations to the papers that describe them.

The rapid advances in DNA sequencing technology over recent years have stimulated the large-scale sequencing of populations of single species. There are now publicly available data on variation in over 200,000 human individuals, human cancers, bacterial strains, major food crops (e.g. wheat and barley) and animals (e.g. cow). While most effort to date has focussed on exploiting these data to identify variants involved in genetic disease, the variation data provide a completely new resource to inform details of protein structure, function and interactions within a species. Recent work from our group (MacGowan et al., 2017) has demonstrated that variation data can identify key residues important in protein-ligand specificity and protein-protein interaction specificity in over 200 protein domain families.

This PhD project will build on these findings to characterise the identified sites using a variety of techniques, including molecular dynamics (MD) simulations, to identify which are most likely to affect molecular function. The project will focus initially on repeat families, since these provide the most statistically significant signals for specificity sites and are a focus of the research of our co-supervisor Dr Ulrich Zachariae, who is also an expert in MD simulation techniques. This project will train the student in software development and advanced bioinformatics research techniques, including machine learning, NoSQL technology and statistics. On completion of the PhD, the student will be well prepared for a research career in bioinformatics and will also have excellent transferable skills appropriate to careers in Big Data analytics or software engineering.