Chasing poly(A) tails to define where genes end

Genes are encoded within DNA, but when genes are switched on, they are copied into a related molecule called RNA. For messenger RNAs (mRNAs), the end point of the RNA is defined in two steps: first by a cleavage reaction directed by what we call a poly(A) signal, then by the addition of a long stretch of adenines, which we call the poly(A) tail. The poly(A) tail helps to protect the mRNA from decay, helps it find its way around the cell and helps it to be translated into protein.

Knowing where a gene ends is hugely important. First, it defines what a gene codes for, affecting the function of that gene and how it can be controlled. Second, if poly(A) site selection doesn’t happen properly, it can cause disease or be a feature of disease (recent work with human cells revealed global changes in poly(A) site choice in cancer tissue). Third, knowing where a gene ends is central to understanding genomes. Sequencing genomes has become relatively routine, but defining and annotating what they encode remains a huge challenge. Cleavage and polyadenylation effectively partitions the genome, which helps to maintain the expression of neighbouring genes. Knowing where cleavage and polyadenylation happens helps us tell where genes start and stop.

We became interested in this area almost by accident. We were studying how plants control the time at which they flower, a fundamental developmental transition that is carefully controlled to ensure that flowering takes place in conditions favourable for reproductive success. Underpinning the quantitative control of flowering time is a complex network of gene regulation. Working with late-flowering mutants of the model plant Arabidopsis thaliana, we discovered that the genes disrupted in these mutants encoded proteins that control the site of cleavage and polyadenylation of RNA. This showed us that one biological consequence of controlling where genes end is the time at which plants flower.

Now that the regulation of alternative poly(A) site choice is known to be widespread, there is intense interest in understanding how it is controlled.

To uncover this control, we have used third-generation direct RNA sequencing to define where 3’ end formation occurs genome-wide in a range of Arabidopsis mutants. This first direct sequencing of RNA from any plant species has helped us to better understand how the genome is organised and how it should be annotated. It has also helped us to discover RNAs that aren’t translated into protein: so-called non-coding RNAs, which have previously been missed or disregarded because they are difficult to identify or because it is difficult to predict what they do. Thus, our RNA sequencing work has led us into several exciting new areas.

We are passionate about discovering things that have never been found before. Working with Arabidopsis has helped us do that. As our experience in this area has grown, we have become interested in translating our expertise and our understanding of how poly(A) site choice affects gene function and genome organisation to crop plants essential for food security, and to biomedical contexts as well.