Table of Contents

1. Computational analyses of high-throughput sequencing datasets

1.1 tRNA sequences used to find tRFs
1.2 Analysis of CLASH data
1.3 Analysis of PAR-CLIP data

2. Naming convention of tRFs

2.1 tRF-5p and tRF-5i
2.2 tRF-3p and tRF-3i
2.3 tRF-3t
2.4 tRF-i
2.5 tRF ID
2.6 Intron excised (X)

3. Other terms

3.1 Unique hybrids
3.2 Read count
3.3 Motif
3.4 Direction of pairs
3.5 Minimum Free Energy (MFE)

4. Search

4.1 Search by tRFs
4.2 Search by targets
4.3 Other filters
4.4 Default filters

5. Database Interoperability



1. Computational analyses of high-throughput sequencing datasets

Figure 1. Workflow for Data Collection/Analysis
1.1 tRNA sequences used to find tRFs
We downloaded tRNA sequences from GtRNAdb [1], tRNAdb and mitotRNAdb [2]. We first ran Bowtie 1.0.1 (-v 0 -a) to align all tRNA genes to the human genome (hg38) and excluded sequences that could not be mapped to the nuclear or mitochondrial genome. The remaining tRNA genes were collapsed and reindexed within each isodecoder. Every tRNA isoform was then assigned a unique ID in the form Amino Acid (AA)_Anticodon, followed by a three-digit index of the isoform and N or M to indicate whether it is encoded in the nuclear or the mitochondrial genome. tRNAs found in both the nuclear and mitochondrial genomes have -NM- in their IDs.
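The collapsing and indexing step can be sketched as follows. This is a minimal illustration, not the script used to build tatDB: the function name, the input tuple layout, the hyphen separators and the ordering of isoform indexes (by first appearance) are all assumptions.

```python
from collections import defaultdict

def assign_trna_ids(genes):
    """genes: iterable of (amino_acid, anticodon, sequence, genome) tuples,
    where genome is "N" (nuclear) or "M" (mitochondrial).
    Returns {(amino_acid, anticodon, sequence): tRNA ID}."""
    # Collapse identical sequences within each isodecoder and remember
    # which genome(s) every distinct sequence was mapped to.
    isodecoders = defaultdict(dict)
    for aa, anticodon, seq, genome in genes:
        isodecoders[(aa, anticodon)].setdefault(seq, set()).add(genome)

    ids = {}
    for (aa, anticodon), isoforms in isodecoders.items():
        # Dicts keep insertion order, so indexes follow first appearance
        # (an assumption; the real ordering rule is not stated).
        for index, (seq, genomes) in enumerate(isoforms.items(), start=1):
            tag = "NM" if genomes == {"N", "M"} else next(iter(genomes))
            # Hyphen separators are assumed for illustration.
            ids[(aa, anticodon, seq)] = f"{aa}_{anticodon}-{index:03d}-{tag}"
    return ids
```

With this sketch, two identical Gly_GCC genes collapse into one isoform with index 001, and a sequence seen in both genomes receives the NM tag.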
1.2 Analysis of CLASH data
CLASH data for Ago1 in HEK293 cells [3] were obtained from the SRA database (SRR959751 to SRR959759). We used fastx_toolkit 0.0.13 [4] to remove barcode and adapter sequences and to collapse identical reads. We used an in-house aligner script to identify tRFs covering the 5' or 3' end of the reads, allowing no mismatches and giving preference to longer tRF isoforms [5, 6]. The tRF isoform (≥16 nt) was identified as the guide sequence, and the remainder of the hybrid read was considered the target sequence. The same script was used to align reads to miRNAs (miRBase [7]) and rRNAs (RefSeq [8], Ensembl [9] and U13369) to identify miRNAs and rRNAs as targets of tRFs. Target sequences were searched against the human transcriptome (Ensembl 91) and genome (hg38) using BLAST (blastn, word size = 7, e-value < 0.1 and default scoring matrix). Targets annotated as Ensembl transcripts or as introns of transcripts were kept. Apart from mRNA, rRNA, miRNA and lincRNA, which are the most abundant target types, all targets are classified in the "Other" group in the search panel. See Table 1 for the target type annotations.
Given the small size of the mitochondrial genome, we observed a few cases in which the target sequence was close to a tRNA gene. For completeness, we kept all such mitochondrial pairs.
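The splitting of a hybrid read into guide and target can be illustrated with the sketch below. It is not the in-house aligner: it checks a single tRNA sequence by plain substring search, the function name is hypothetical, and preferring the 5' end when both ends match at the same length is an assumption. The "forward" and "reverse" labels follow Section 3.4.

```python
MIN_TRF_LENGTH = 16  # tRFs shorter than this are not accepted as guides

def split_hybrid(read, trna_seq, min_len=MIN_TRF_LENGTH):
    """Return (direction, tRF, target) for the longest exact tRF found at
    either end of the read, or None if neither end carries one."""
    # Try the longest candidate first so that longer tRF isoforms win.
    # At least one nucleotide is always left over for the target.
    for length in range(len(read) - 1, min_len - 1, -1):
        prefix, suffix = read[:length], read[-length:]
        if prefix in trna_seq:   # tRF covers the 5' end of the read
            return ("forward", prefix, read[length:])
        if suffix in trna_seq:   # tRF covers the 3' end of the read
            return ("reverse", suffix, read[:-length])
    return None
```

A production aligner would index all tRNA references rather than scan them one by one, but the exact-match and longest-first rules are the same.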
Table 1. Annotation of Target Types
Abbreviation    Target Type
mRNA            Messenger RNA
miRNA           MicroRNA
rRNA            Ribosomal RNA
lincRNA         Long Intergenic Non-coding RNA
snRNA           Small Nuclear RNA
snoRNA          Small Nucleolar RNA
scRNA           Small Cytoplasmic RNA
misc_RNA        Miscellaneous Other RNA
PT              Processed Transcript
PG              Pseudogene
Figure 2. Frequency of tRF Targets
1.3 Analysis of PAR-CLIP data
We downloaded PAR-CLIP datasets for Ago1 to Ago4 in HEK293 cells (SRR048973 to SRR048979) [10] from the SRA database. We used fastx_toolkit [4] to remove adapters and Bowtie 1.0.1 [11] to align the reads to the tRNA references in end-to-end mode, allowing one T-to-C mismatch and giving preference to perfect matches, as in the earlier tRF analysis. tRFs shorter than 16 nt were excluded, and tRF abundances were normalized to reads per million reads mapped to the genome (RPM). T>C conversion sites were first aligned to tRNAs and then mapped to the CLASH tRFs.
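The length filter and RPM normalization amount to the following calculation. This is a minimal sketch; the function and argument names are hypothetical, and the raw counts are assumed to be keyed by tRF sequence.

```python
def reads_per_million(trf_counts, genome_mapped_reads, min_len=16):
    """trf_counts: {tRF sequence: raw read count}.
    genome_mapped_reads: total number of reads mapped to the genome.
    Returns {tRF sequence: RPM} for tRFs of at least min_len nt."""
    if genome_mapped_reads <= 0:
        raise ValueError("genome_mapped_reads must be positive")
    scale = 1_000_000 / genome_mapped_reads
    return {
        seq: count * scale
        for seq, count in trf_counts.items()
        if len(seq) >= min_len  # drop tRFs shorter than 16 nt
    }
```

For example, a tRF with 50 reads in a library of 2,000,000 genome-mapped reads has an abundance of 25 RPM.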

2. Naming convention of tRFs

2.1 tRF-5p and tRF-5i
tRFs whose 5' border lies within the first five nucleotides of a tRNA are classified as tRF-5. If the 3' border of a tRF-5 is located in the anticodon loop of the tRNA molecule, it is considered a 5' tRNA half and is called tRF-5i in our dataset. A tRF-5 whose 3' border lies upstream of the anticodon loop is called tRF-5p.
2.2 tRF-3p and tRF-3i
tRFs whose 3' border lies within the last five nucleotides of a mature tRNA (including the CCA addition) are classified as tRF-3. If the 5' border of a tRF-3 is located in the anticodon loop of the tRNA molecule, it is considered a 3' tRNA half and is called tRF-3i in our dataset. A tRF-3 whose 5' border lies downstream of the anticodon loop is called tRF-3p.
Figure 3. Naming convention of tRFs. (A) tRF-5p (green), tRF-3p (pink) and tRF-i (white). (B) tRF-5i (green) and tRF-3i (pink). (C) tRF-3t (pink).
2.3 tRF-3t
A tRF whose 3' end is cleaved in the 3' trailer sequence of a precursor tRNA (pre-tRNA) is called tRF-3t. The 5' border of a tRF-3t can be either within the tRNA gene or in the trailer. The 3' trailer sequence is defined as a 40-nt extension downstream of a tRNA gene on the genome. tRNA genes with the same body sequence can have different flanking sequences on the genome; we therefore considered all possible 3' trailers when searching for tRF-3t. Alignments of tRF-3t to different 3' trailers of a given tRNA isoform are shown separately on the same page.
2.4 tRF-i
tRFs are called tRF-i if both their 5' and 3' ends are in the internal region of the tRNA (excluding the first and last 5 nt).
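The rules for tRF-5p, tRF-5i, tRF-3p, tRF-3i and tRF-i can be summarized in a small classifier. This sketch rests on several assumptions: coordinates are 1-based and inclusive on the mature tRNA (with CCA), the anticodon loop boundaries are supplied by the caller, and the 5' rule is checked before the 3' rule. Fragments that meet the tRF-5 or tRF-3 definition but fall into neither subtype are returned as plain "tRF-5" or "tRF-3", since the text does not name them. tRF-3t is not covered because it requires the genomic trailer sequences.

```python
def classify_trf(start, end, trna_length, loop_start, loop_end, window=5):
    """Classify a tRF from its borders on a mature tRNA.
    All positions are 1-based and inclusive (an assumption)."""
    def in_loop(pos):
        return loop_start <= pos <= loop_end

    if start <= window:                  # 5' border in the first five nt
        if in_loop(end):
            return "tRF-5i"              # 5' tRNA half
        if end < loop_start:
            return "tRF-5p"
        return "tRF-5"                   # subtype not defined in the text
    if end > trna_length - window:       # 3' border in the last five nt
        if in_loop(start):
            return "tRF-3i"              # 3' tRNA half
        if start > loop_end:
            return "tRF-3p"
        return "tRF-3"                   # subtype not defined in the text
    return "tRF-i"                       # both borders are internal
```

For a hypothetical 76-nt tRNA with its anticodon loop at positions 32 to 38, a fragment spanning 1 to 18 is a tRF-5p and one spanning 35 to 75 is a tRF-3i.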
2.5 tRF ID
Each tRF isoform is assigned a human-readable ID in the form AA_anticodon-Isoform-Genome-Type-Start-End.
2.6 Intron excised (X)
If the tRF is derived from a mature tRNA whose intron has been excised, an "X" is appended to the type field of its ID: AA_anticodon-Isoform-Genome-TypeX-Start-End.
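As an illustration of the ID layout, the sketch below assembles an ID from its fields. The function name and the type labels used in the example (such as "5p") are hypothetical, and unpadded Start and End values are an assumption; the three-digit isoform index follows Section 1.1.

```python
def make_trf_id(amino_acid, anticodon, isoform, genome, trf_type,
                start, end, intron_excised=False):
    """Build an ID of the form AA_anticodon-Isoform-Genome-Type-Start-End.
    An "X" is appended to the type when the intron has been excised."""
    type_field = trf_type + ("X" if intron_excised else "")
    return (f"{amino_acid}_{anticodon}-{isoform:03d}-{genome}-"
            f"{type_field}-{start}-{end}")

# Example with hypothetical values:
# make_trf_id("Gly", "GCC", 1, "N", "5p", 1, 18)
```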

3. Other terms

3.1 Unique hybrids
A unique hybrid is a hybrid pair of a tRF isoform and a target sequence. Pairs in which the tRF isoforms differ, or in which the target sequences differ by at least 1 nt, are considered different unique hybrids. A target gene can therefore have multiple unique hybrids with the same tRF.
3.2 Read count
All sequenced reads in CLASH are used for calculating read count. 
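The difference between unique hybrids and read counts can be made concrete with a short sketch. It assumes each sequenced read has already been resolved into a (tRF sequence, target sequence, gene) tuple; the function name and tuple layout are hypothetical.

```python
from collections import Counter

def summarize_hybrids(reads):
    """reads: list of (trf_seq, target_seq, gene) tuples, one per sequenced
    CLASH read (before collapsing identical reads).
    Returns (read_counts, unique_hybrids) where
      read_counts[(trf, target, gene)] = reads supporting that unique hybrid
      unique_hybrids[(trf, gene)]      = distinct hybrids for that tRF-gene pair
    """
    # Every distinct (tRF, target) combination is one unique hybrid.
    read_counts = Counter(reads)
    # Count how many unique hybrids each tRF forms with each gene.
    unique_hybrids = Counter((trf, gene) for trf, _target, gene in read_counts)
    return read_counts, unique_hybrids
```

Two reads with the same tRF and target sequence add to the read count of one unique hybrid, whereas a target sequence that differs by a single nucleotide starts a new unique hybrid for the same gene.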
3.3 Motif
We first combined the tRF isoforms of the same tRF type for every tRNA isoform. Their target sequences were pooled, and only the longest sequence per target gene was kept. tRFs with fewer than 5 target genes were ignored. We used MEME [12] to search for enriched motifs in the targets (-mod zoops -minw 5 -minsites 5 -evt 0.05 -maxw 12, e-value < 0.01) and FIMO [13] to match the motifs back to the tRF sequences (p-value < 0.001).
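The preparation of target sequences before motif discovery can be sketched as below. The function name and input layout are hypothetical, and the MEME and FIMO runs themselves are not shown.

```python
def motif_input(targets, min_genes=5):
    """targets: iterable of (gene, target_seq) pairs pooled for one group of
    tRF isoforms (same tRF type, same tRNA isoform).
    Returns {gene: longest target sequence}, or None when the group has
    fewer than min_genes target genes and should be skipped."""
    longest = {}
    for gene, seq in targets:
        # Keep only the longest target sequence seen for each gene.
        if len(seq) > len(longest.get(gene, "")):
            longest[gene] = seq
    return longest if len(longest) >= min_genes else None
```

The returned sequences would then be written to a FASTA file and passed to MEME with the options listed above.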
3.4 Direction of pairs
If the tRF covers the 5' end of the CLASH read and is followed by the target sequence at the 3' end, the pair is called a "forward pair". If the tRF ends at the last nucleotide of the read and the target sequence is at the 5' end, it is called a "reverse pair" (Fig. 1).
3.5 Minimum Free Energy (MFE)
We used RNAhybrid 2.1.2 [14] to predict the secondary structure of each tRF-target interaction and calculate the minimum free energy (MFE) of hybridization between the tRF and target. 
Figure 4. Minimum Free Energy (MFE) of Interactions of tRFs and Targets

4. Search

We provide multiple filters to query tRFs and their targets.
4.1 Search by tRFs
The amino acid must be entered as its standard 3-letter abbreviation, and T must be used instead of U in the anticodon. Genome indicates whether the tRNA gene is encoded in the nuclear (N) or mitochondrial (M) genome; tRNAs found in both genomes are denoted NM. If "Exact S/E" is checked, only tRFs with exactly the given start and end positions on the tRNA gene are returned. Otherwise, all tRFs that fall within the range from Start to End are shown. "tRF ID" is defined in Section 2, and partial IDs are accepted. If you search tRFs by sequence, tatDB reports all hits that cover the entire input sequence without mismatches.
4.2 Search by targets
These filters can be used together with the filters for tRFs. When "Exact name" is checked, only interactions with exactly the input gene name are returned; otherwise, a partial gene name is accepted. As with searching tRFs by sequence, mismatches are not allowed in the target sequence input. If you are not sure of the exact target sequence, enter a shorter, partial sequence and tatDB will return all target sequences that cover it.
4.3 Other filters
Additional criteria can be specified in the Filters section to find pairs with specific levels of support, based on interaction energy or frequency. A range can be specified for the MFE of the tRF-target interaction; the values must be negative. The direction of a tRF-target pair, defined in Section 3.4, can be either "Forward" or "Reverse". When "Both" is selected, tatDB reports tRF-target interactions that are found as both forward and reverse pairs. A minimum number of CLASH reads supporting each unique hybrid can be specified. One can also look for genes targeted in multiple regions by setting a high minimum number of unique hybrids. By default, only tRFs with motifs are returned; this can be disabled by deselecting "Motif".
4.4 Default filters
Based on the data distribution we observed, and in order to show high-confidence interactions, default filters (target type = mRNA, MFE ≤ -20, read count ≥ 10 and motif = True) are preselected on the search page; they can be removed by clicking the "Clear" button. Please note that if tatDB is accessed through a direct URL, these default filters are applied automatically and shown on the page.
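The effect of the default filters on a single interaction record can be sketched as follows. The record layout and all key names are hypothetical and do not reflect the tatDB schema; only the threshold values come from the text above.

```python
# Default thresholds taken from the text; key names are illustrative only.
DEFAULT_FILTERS = {
    "target_type": "mRNA",
    "max_mfe": -20.0,
    "min_reads": 10,
    "motif": True,
}

def passes_filters(record, filters=DEFAULT_FILTERS):
    """record: dict with keys target_type, mfe, read_count and has_motif."""
    return (
        record["target_type"] == filters["target_type"]
        and record["mfe"] <= filters["max_mfe"]          # more negative = stronger
        and record["read_count"] >= filters["min_reads"]
        # With the motif filter off, records without a motif are kept too.
        and (not filters["motif"] or record["has_motif"])
    )
```

An mRNA target with an MFE of -25.3 supported by 12 reads and a motif passes, while the same record with an MFE of -15 does not.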

5. Database Interoperability

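We provide a mechanism for linking a tRF or a tRF-gene pair directly to its tatDB page when the tRF sequence and gene name are known. The variable parts of each URL below (tRF_Sequence, Target_Gene_Name, Formatted_tRNA_ID and tRF_Type) are placeholders to be replaced with actual values.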

All target genes and other details of a given tRF can be found at: https://grigoriev-lab.camden.rutgers.edu/tatdb/trf_isoform.php?trf_seq=tRF_Sequence.

All target sites of a tRF on a given gene can be found at: https://grigoriev-lab.camden.rutgers.edu/tatdb/trf_gene.php?trf_seq=tRF_Sequence&gene_name=Target_Gene_Name.

Alignments of a specific type of tRFs to a given tRNA gene can be found at: https://grigoriev-lab.camden.rutgers.edu/tatdb/trf_type.php?trna_id=Formatted_tRNA_ID&trna_region=tRF_Type.

Default filters will be selected for such links if no filter is specified in the URL. These filter settings may be changed after landing on the tatDB page.
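Links of the three kinds listed above can be generated programmatically. The sketch below uses only the page names and query parameters shown in this section; the helper function names are hypothetical, and any argument values used with them are illustrative.

```python
from urllib.parse import urlencode

BASE_URL = "https://grigoriev-lab.camden.rutgers.edu/tatdb"

def trf_isoform_url(trf_seq):
    """Page listing all target genes and other details of a tRF."""
    return f"{BASE_URL}/trf_isoform.php?{urlencode({'trf_seq': trf_seq})}"

def trf_gene_url(trf_seq, gene_name):
    """Page listing all target sites of a tRF on a given gene."""
    query = urlencode({"trf_seq": trf_seq, "gene_name": gene_name})
    return f"{BASE_URL}/trf_gene.php?{query}"

def trf_type_url(trna_id, trf_type):
    """Page showing alignments of one tRF type to a given tRNA gene."""
    query = urlencode({"trna_id": trna_id, "trna_region": trf_type})
    return f"{BASE_URL}/trf_type.php?{query}"
```

Using urlencode keeps the query string valid if a gene name contains characters that need escaping.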


References

1.	Chan, P.P. and T.M. Lowe, GtRNAdb: a database of transfer RNA genes detected in genomic sequence. Nucleic Acids Res, 2009. 37(Database issue): p. D93-7.
2.	Juhling, F., et al., tRNAdb 2009: compilation of tRNA sequences and tRNA genes. Nucleic Acids Res, 2009. 37(Database issue): p. D159-62.
3.	Helwak, A., et al., Mapping the human miRNA interactome by CLASH reveals frequent noncanonical binding. Cell, 2013. 153(3): p. 654-65.
4.	http://hannonlab.cshl.edu/fastx_toolkit
5.	Guan, L., S. Karaiskos, and A. Grigoriev, Inferring targeting modes of Argonaute-loaded tRNA fragments. RNA Biol, 2020. 17(8): p. 1070-1080.
6.	Guan, L., V. Lam, and A. Grigoriev, Large-Scale Computational Discovery of Binding Motifs in tRNA Fragments. Front Mol Biosci, 2021. 8:647449.
7.	http://www.mirbase.org
8.	https://www.ncbi.nlm.nih.gov/refseq/
9.	http://dec2017.archive.ensembl.org/index.html
10.	Hafner, M., et al., Transcriptome-wide identification of RNA-binding protein and microRNA target sites by PAR-CLIP. Cell, 2010. 141(1): p. 129-41.
11.	Langmead, B., et al., Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol, 2009. 10(3): p. R25.
12.	Bailey, T.L., et al., MEME SUITE: tools for motif discovery and searching. Nucleic Acids Res, 2009. 37(Web Server issue): p. W202-8.
13.	Grant, C.E., T.L. Bailey, and W.S. Noble, FIMO: scanning for occurrences of a given motif. Bioinformatics, 2011. 27(7): p. 1017-8.
14.	Kruger, J. and M. Rehmsmeier, RNAhybrid: microRNA target prediction easy, fast and flexible. Nucleic Acids Res, 2006. 34(Web Server issue): p. W451-4.