RNA-RNA interactions are key mechanisms through which noncoding RNA (ncRNA) regions exert biological functions. Computational prediction of RNA-RNA interactions is an essential method for detecting novel RNA-RNA interactions because their comprehensive detection by biological experimentation is still quite difficult. Many RNA-RNA interaction prediction tools have been developed, but they tend to produce many false positives. Accordingly, assessment of the statistical significance of computationally predicted interactions is an important task. However, there is no method to evaluate the statistical significance of RNA-RNA interactions that is applicable to interactions between two long RNA sequences. We developed a method to calculate the p-value for the minimal interaction energy between two long RNA sequences. The developed method depends on the fact that minimum interaction energies of RNA-RNA interactions between long RNAs follow a Gumbel distribution when repeat sequences in RNAs are masked. To show the usefulness of the developed method, we applied it to whole human 5′-untranslated region (UTR) and 3′-UTR sequences to detect novel 5′-UTR-3′-UTR interactions. We thus identified two significant 5′-UTR-3′-UTR interactions. Specifically, the human small proline-rich repeat protein 3 shows conserved 5′-UTR-3′-UTR interactions with some nucleotide variations preserving base pairings among primates. Our developed method enables us to detect statistically significant RNA-RNA interactions between long RNAs such as long ncRNAs. Statistical significance estimates help in identification of interactions for experimental validation and provide novel insights into the function of ncRNA regions.
ASJC Scopus subject areas