tatDB Page

We downloaded tRNA sequences from GtRNAdb [1], tRNAdb and mitoRNAdb [2]. We first ran Bowtie 1.0.1 (-v 0 -a) to align all tRNA genes to the human genome (hg38) and excluded sequences that could not be mapped to the nuclear or mitochondrial genome. The remaining tRNA genes were collapsed and reindexed for every isodecoder. Each tRNA isoform was then assigned a unique ID of the form AA_Anticodon (AA being the amino acid), followed by a three-digit isoform index and N or M, indicating whether the isoform is encoded in the nuclear or the mitochondrial genome. tRNAs found in both the nuclear and mitochondrial genomes have -NM- in their IDs.
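The collapsing and ID assignment described above can be sketched as follows. This is a minimal illustration, not tatDB's code: it assumes each mapped locus is given as an (amino acid, anticodon, sequence, genome) tuple, groups isoforms by amino acid and anticodon, numbers them in order of first appearance, and uses a hypothetical separator layout (`AA_Anticodon-001-N`) that may not match tatDB's exact punctuation.

```python
def assign_trna_ids(loci):
    """Collapse identical tRNA sequences and assign isoform IDs.

    loci: iterable of (amino_acid, anticodon, sequence, genome) tuples, one per
    mapped locus, where genome is "N" (nuclear) or "M" (mitochondrial).
    Returns {(amino_acid, anticodon, sequence): tRNA_ID}.
    """
    # Group by amino acid/anticodon; identical sequences collapse into one key
    # whose value records every genome the sequence was mapped to.
    groups = {}
    for aa, anticodon, seq, genome in loci:
        seqs = groups.setdefault((aa, anticodon), {})
        seqs.setdefault(seq, set()).add(genome)

    ids = {}
    for (aa, anticodon), seqs in groups.items():
        # Reindex each group from 001 in order of first appearance (assumption).
        for index, (seq, genomes) in enumerate(seqs.items(), start=1):
            label = "NM" if genomes == {"N", "M"} else next(iter(genomes))
            # Hypothetical separator layout; tatDB's exact punctuation may differ.
            ids[(aa, anticodon, seq)] = f"{aa}_{anticodon}-{index:03d}-{label}"
    return ids
```

A sequence mapped to both genomes receives the NM label, mirroring the -NM- convention in the text.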

CLASH data for Ago1 in HEK293 cells [3] were obtained from the SRA database (SRR959751 to SRR959759). We used fastx_toolkit 0.0.13 [4] to remove barcode and adapter sequences and to collapse identical reads. We used an in-house aligner script to identify tRFs covering the 5' or 3' end of the reads, allowing no mismatches and giving preference to longer tRF isoforms [5, 6]. The tRF isoform (≥16 nt) was taken as the guide sequence, and the remainder of the hybrid read was considered the target sequence. The same script was used to align reads to miRNAs (miRBase [7]) and rRNAs (RefSeq [8], Ensembl [9] and U13369) to identify miRNAs and rRNAs as tRF targets. Target sequences were searched against the human transcriptome (Ensembl 91) and genome (hg38) using BLAST (blastn, word size = 7, e-value < 0.1, default scoring matrix). Targets annotated as Ensembl transcripts or as introns of transcripts were kept. mRNA, rRNA, miRNA and lincRNA are the most abundant target types; all other targets are grouped as "Other" in the search panel. See Table 1 for the annotation of target types.
Given the small size of the mitochondrial genome, we observed a few cases in which the target sequence was close to a tRNA gene. For completeness, we kept all such mitochondrial pairs.
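The guide/target split of a CLASH hybrid read can be sketched as below. This is not the in-house aligner: it assumes tRF isoforms are exact substrings of the tRNA reference sequences, requires at least 1 nt to remain as target (an assumption), prefers the longer tRF, and breaks ties in favor of the forward orientation (also an assumption). "Forward" and "reverse" follow the pair-direction definitions on this page.

```python
MIN_TRF_LEN = 16  # minimum guide length used in the CLASH analysis


def _in_any(fragment, trnas):
    # Exact (mismatch-free) containment in any tRNA reference sequence.
    return any(fragment in trna for trna in trnas)


def longest_prefix(read, trnas, min_len=MIN_TRF_LEN):
    # Try the longest prefix first; stop before the full read so >= 1 nt is left as target.
    for length in range(len(read) - 1, min_len - 1, -1):
        if _in_any(read[:length], trnas):
            return length
    return 0


def longest_suffix(read, trnas, min_len=MIN_TRF_LEN):
    # Same search from the 3' end of the read.
    for length in range(len(read) - 1, min_len - 1, -1):
        if _in_any(read[len(read) - length:], trnas):
            return length
    return 0


def split_hybrid(read, trnas, min_len=MIN_TRF_LEN):
    """Return (direction, tRF, target), or None if no tRF of >= min_len is found."""
    fwd = longest_prefix(read, trnas, min_len)
    rev = longest_suffix(read, trnas, min_len)
    if fwd == 0 and rev == 0:
        return None
    if fwd >= rev:  # prefer the longer tRF; ties go to forward (assumption)
        return "forward", read[:fwd], read[fwd:]
    cut = len(read) - rev
    return "reverse", read[cut:], read[:cut]
```

The brute-force substring scan keeps the logic visible; a real aligner over millions of reads would use an index instead.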

Table 1. Annotation of Target Types

Abbreviation	Target Type
mRNA	Messenger RNA
miRNA	MicroRNA
rRNA	Ribosomal RNA
lincRNA	Long Intergenic Non-coding RNA
snRNA	Small Nuclear RNA
snoRNA	Small Nucleolar RNA
scRNA	Small Cytoplasmic RNA
misc_RNA	Miscellaneous Other RNA
PT	Processed Transcript
PG	Pseudogene

Figure 2. Frequency of tRF Targets

We downloaded PAR-CLIP datasets for Ago1 to Ago4 in HEK293 cells (SRR048973 to SRR048979) [10] from the SRA database. We used fastx_toolkit [4] to remove adapters and Bowtie 1.0.1 [11] to align the reads to tRNA references in end-to-end mode, allowing one T-to-C mismatch and giving preference to perfect matches, as in the earlier tRF analysis. tRFs shorter than 16 nt were excluded, and their abundance was normalized to reads per million reads mapped to the genome (RPM). T>C conversion sites were first aligned to tRNAs and then mapped to CLASH tRFs.
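The two rules above, a single permitted T-to-C mismatch and RPM normalization of tRFs of at least 16 nt, can be sketched like this. It is a simplified stand-in for the Bowtie step, assuming the read is compared against an equal-length reference window; the function names are hypothetical.

```python
MIN_TRF_LEN = 16


def allowed_alignment(read, ref):
    """True for a perfect match or exactly one T (reference) -> C (read) mismatch."""
    if len(read) != len(ref):
        return False
    mismatches = [(r, q) for r, q in zip(ref, read) if r != q]
    return len(mismatches) == 0 or mismatches == [("T", "C")]


def to_rpm(trf_counts, genome_mapped_reads, min_len=MIN_TRF_LEN):
    """Drop tRFs shorter than min_len and convert raw counts to reads per million."""
    scale = 1_000_000 / genome_mapped_reads
    return {seq: n * scale for seq, n in trf_counts.items() if len(seq) >= min_len}
```

For example, 10 reads in a library with 2,000,000 genome-mapped reads give 5 RPM.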

tRFs whose 5' border lies within the first five nucleotides of the tRNA are classified as tRF-5. If the 3' border of a tRF-5 lies in the anticodon loop of the tRNA, the fragment is considered a 5' tRNA half and is called tRF-5i in our dataset. A tRF-5 whose 3' border lies upstream of the anticodon loop is called tRF-5p.

tRFs whose 3' border lies within the last five nucleotides of the mature tRNA (including the CCA addition) are classified as tRF-3. If the 5' border of a tRF-3 lies in the anticodon loop of the tRNA, the fragment is considered a 3' tRNA half and is called tRF-3i in our dataset. A tRF-3 whose 5' border lies downstream of the anticodon loop is called tRF-3p.
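The tRF-5 and tRF-3 rules above can be expressed as a small classifier. This sketch assumes 1-based inclusive coordinates on the mature tRNA (CCA included) and that the anticodon-loop boundaries are supplied per tRNA; the 76-nt tRNA with a loop at positions 32-38 used in the checks is a hypothetical example. Fragments matching both ends, or neither rule, return None, and tRF-i and tRF-3t are not handled here.

```python
def classify_trf(start, end, trna_len, loop_start, loop_end):
    """Return 'tRF-5p', 'tRF-5i', 'tRF-3p', 'tRF-3i' or None.

    Coordinates are 1-based and inclusive on the mature tRNA (CCA included).
    """
    five_prime = start <= 5             # 5' border in the first five nucleotides
    three_prime = end >= trna_len - 4   # 3' border in the last five nucleotides
    if five_prime and not three_prime:
        if loop_start <= end <= loop_end:
            return "tRF-5i"             # 5' tRNA half
        if end < loop_start:
            return "tRF-5p"
    elif three_prime and not five_prime:
        if loop_start <= start <= loop_end:
            return "tRF-3i"             # 3' tRNA half
        if start > loop_end:
            return "tRF-3p"
    return None
```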

Figure 3. Naming convention of tRFs. (A) tRF-5p (green), tRF-3p (pink) and tRF-i (white). (B) tRF-5i (green) and tRF-3i (pink). (C) tRF-3t (pink).

A tRF whose 3' end lies within the 3' trailer sequence of a precursor tRNA (pre-tRNA) is called tRF-3t. The 5' border of a tRF-3t can lie either within the tRNA gene or in the trailer. The 3' trailer sequence is defined as the 40-nt extension downstream of a tRNA gene in the genome. tRNA genes with the same body sequence can have different flanking sequences in the genome, so we considered all possible 3' trailers when searching for tRF-3t. Alignments of a tRF-3t to different 3' trailers of a given tRNA isoform are shown separately on the same page.
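Collecting every distinct 3' trailer of a tRNA isoform, and testing whether a fragment ends inside one, might look like the sketch below. It assumes the genome is held as a dict of chromosome sequences, gene loci use 0-based half-open coordinates with a strand, and the precursor is the genomic gene body followed by the trailer; the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]


def trailers(genome, loci, length=40):
    """Distinct 3' trailers of one tRNA isoform across all of its genomic loci.

    genome: {chrom: sequence}; loci: (chrom, start, end, strand), 0-based half-open.
    """
    found = []
    for chrom, start, end, strand in loci:
        chrom_seq = genome[chrom]
        if strand == "+":
            trailer = chrom_seq[end:end + length]
        else:
            # Downstream on the minus strand is upstream in plus-strand coordinates.
            trailer = revcomp(chrom_seq[max(0, start - length):start])
        if trailer not in found:
            found.append(trailer)
    return found


def is_trf_3t(trf, gene_body, trailer):
    """True if trf occurs in gene_body + trailer with its 3' end inside the trailer."""
    precursor = gene_body + trailer
    pos = precursor.find(trf)
    while pos != -1:
        if pos + len(trf) > len(gene_body):
            return True
        pos = precursor.find(trf, pos + 1)
    return False
```

Each distinct trailer would then be aligned separately, matching the per-trailer display described above.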

A unique hybrid is a hybrid pair between a tRF isoform and a target sequence. Pairs in which the tRF isoforms differ, or in which the target sequences differ by at least 1 nt, are considered different unique hybrids. A target gene can therefore have multiple unique hybrids with the same tRF.
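The unique-hybrid definition translates directly into counting keys. In this sketch, assumed to take one (tRF sequence, target sequence, gene) tuple per CLASH read, read support is counted per unique hybrid and the number of unique hybrids is counted per tRF and gene.

```python
from collections import Counter, defaultdict


def summarize_hybrids(reads):
    """reads: (trf_seq, target_seq, gene) for every CLASH read.

    Returns read counts per unique hybrid and unique hybrids per (tRF, gene).
    """
    reads = list(reads)
    # A unique hybrid is keyed by the exact tRF isoform and exact target sequence,
    # so targets differing by even one nucleotide count separately.
    read_counts = Counter((trf, target) for trf, target, _ in reads)
    hybrids_per_gene = defaultdict(set)
    for trf, target, gene in reads:
        hybrids_per_gene[(trf, gene)].add(target)
    n_unique = {key: len(targets) for key, targets in hybrids_per_gene.items()}
    return read_counts, n_unique
```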

We first combined tRF isoforms of the same tRF type for every tRNA isoform. Their target sequences were pooled, and only the longest sequence per target gene was kept. tRFs with fewer than 5 target genes were ignored. We used MEME [12] to search for enriched motifs in the targets (-mod zoops -minw 5 -minsites 5 -evt 0.05 -maxw 12, e-value < 0.01) and FIMO [13] to match the motifs back to the tRF sequences (p-value < 0.001).
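The input preparation for MEME can be sketched as follows, assuming each pair is given as (tRNA ID, tRF type, gene, target sequence). The MEME and FIMO option values are those listed above; `-dna`, `-oc` and `--oc` are added assumptions for a DNA alphabet and output directories, and the command lists are only built, not executed.

```python
from collections import defaultdict


def meme_inputs(pairs, min_genes=5):
    """Longest target per gene for each (tRNA isoform, tRF type) with >= min_genes genes."""
    groups = defaultdict(dict)
    for trna_id, trf_type, gene, target in pairs:
        best = groups[(trna_id, trf_type)]
        if len(target) > len(best.get(gene, "")):
            best[gene] = target
    return {key: genes for key, genes in groups.items() if len(genes) >= min_genes}


def to_fasta(gene_to_seq):
    return "".join(f">{gene}\n{seq}\n" for gene, seq in gene_to_seq.items())


def meme_command(fasta_path, out_dir):
    # Options from the text; "-dna" and "-oc" are assumptions.
    return ["meme", fasta_path, "-dna", "-oc", out_dir, "-mod", "zoops",
            "-minw", "5", "-maxw", "12", "-minsites", "5", "-evt", "0.05"]


def fimo_command(motif_file, trf_fasta, out_dir):
    # p-value threshold from the text; "--oc" is an assumption.
    return ["fimo", "--thresh", "0.001", "--oc", out_dir, motif_file, trf_fasta]
```

The e-value < 0.01 cut on reported motifs would be applied afterwards to the MEME output.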

If the tRF covers the 5' end of the CLASH read and the target sequence lies at the 3' end, the pair is called a "forward pair". If the tRF ends at the last nucleotide of the read and the target sequence lies at the 5' end, the pair is called a "reverse pair" (Fig. 1).

Figure 4. Minimum Free Energy (MFE) of Interactions of tRFs and Targets

The amino acid must be entered as its standard three-letter abbreviation, and T must be used instead of U when entering the anticodon. Genome indicates whether the tRNA gene is encoded in the nuclear (N) or mitochondrial (M) genome; tRNAs found in both genomes are denoted NM. If "Exact S/E" is checked, only tRFs with exactly the given start and end positions on the tRNA genes are returned; otherwise, all tRFs lying within the range from Start to End are shown. "tRF ID" is defined in Section 2, and partial IDs are accepted. When tRFs are searched by sequence, tatDB reports all hits that cover the entire input sequence without mismatches.
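The Start/End and sequence semantics described above can be mimicked with a simple filter. This is an illustrative sketch, not tatDB's query code; the record fields `seq`, `start` and `end` are hypothetical names for a tRF's sequence and positions on its tRNA gene.

```python
def search_trfs(trfs, start=None, end=None, exact_se=False, query_seq=None):
    """Filter tRF records (dicts with 'seq', 'start', 'end') like the search form."""
    hits = []
    for trf in trfs:
        if start is not None and end is not None:
            if exact_se:
                # "Exact S/E": both borders must match exactly.
                if (trf["start"], trf["end"]) != (start, end):
                    continue
            elif not (start <= trf["start"] and trf["end"] <= end):
                # Otherwise the tRF must lie entirely within Start..End.
                continue
        # Sequence search: the tRF must contain the whole query, no mismatches.
        if query_seq is not None and query_seq not in trf["seq"]:
            continue
        hits.append(trf)
    return hits
```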

These filters can be combined with the tRF filters. When "Exact name" is checked, only interactions whose gene name exactly matches the input are returned; otherwise, partial gene names are accepted. As with searching tRFs by sequence, mismatches are not allowed in the input target sequence. If you are not certain of the full target sequence, enter a shorter partial sequence, and tatDB will return all target sequences that cover it.
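The target-side matching rules can be sketched as a predicate over one interaction record. The field names `gene` and `target_seq` are hypothetical, and case-sensitive matching is an assumption.

```python
def match_target(hit, gene_name=None, exact_name=False, target_query=None):
    """hit: dict with 'gene' and 'target_seq'. Matching is case-sensitive (assumption)."""
    if gene_name is not None:
        if exact_name and hit["gene"] != gene_name:
            return False
        if not exact_name and gene_name not in hit["gene"]:
            return False
    # Target sequence search: exact substring, no mismatches allowed.
    if target_query is not None and target_query not in hit["target_seq"]:
        return False
    return True
```

With "Exact name" unchecked, a query such as PTEN also matches longer names that contain it.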

Additional criteria can be specified in the Filters section to find pairs with specific levels of support, based on interaction energy or frequency. A range can be specified for the MFE of the interaction between the tRF and the target sequence; the input values must be negative. The direction of a tRF-target pair, defined in Section 3.4, can be either "Forward" or "Reverse". When "Both" is selected, tatDB reports tRF-target interactions that are found in both forward and reverse pairs. A minimum number of CLASH reads supporting each unique hybrid can be specified. Genes targeted in multiple regions can be found by specifying a high minimum number of unique hybrids. By default, only tRFs with motifs are returned; this can be disabled by deselecting "Motif".

Based on the observed data distribution, and to show high-confidence interactions, default filters (target type = mRNA, MFE ≤ -20, read count ≥ 10 and motif = True) are preselected on the search page; they can be removed by clicking the "Clear" button. Note that if tatDB is accessed through a direct URL link, these default filters are applied automatically and shown on the page.
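The filters and their defaults can be sketched as below, using toy records with hypothetical field and gene names. Two assumptions are built in: the per-hybrid filters are applied before the "Both" check, and "Both" keeps a tRF-gene combination only if both directions survive.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Hybrid:
    trf: str
    gene: str
    target_type: str
    mfe: float         # interaction MFE, negative
    direction: str     # "Forward" or "Reverse"
    read_count: int    # CLASH reads supporting this unique hybrid
    has_motif: bool


DEFAULTS = {"target_type": "mRNA", "mfe_max": -20.0, "min_reads": 10, "motif": True}


def apply_filters(hybrids, target_type=None, mfe_min=None, mfe_max=None,
                  direction=None, min_reads=None, motif=None):
    kept = [h for h in hybrids
            if (target_type is None or h.target_type == target_type)
            and (mfe_min is None or h.mfe >= mfe_min)
            and (mfe_max is None or h.mfe <= mfe_max)
            and (min_reads is None or h.read_count >= min_reads)
            and (motif is None or h.has_motif == motif)
            and (direction in (None, "Both") or h.direction == direction)]
    if direction == "Both":
        # Keep tRF-gene combinations observed in both forward and reverse pairs.
        seen = defaultdict(set)
        for h in kept:
            seen[(h.trf, h.gene)].add(h.direction)
        kept = [h for h in kept if seen[(h.trf, h.gene)] == {"Forward", "Reverse"}]
    return kept
```

Calling `apply_filters(hybrids, **DEFAULTS)` reproduces the default search settings in this model.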

We provide a mechanism for linking a tRF, or a tRF-gene pair, directly to its tatDB page when the tRF sequence and gene name are known. The variables in the URLs below are tRF_Sequence, Target_Gene_Name, Formatted_tRNA_ID and tRF_Type. All target genes and other details of a given tRF can be found at: https://grigoriev-lab.camden.rutgers.edu/tatdb/trf_isoform.php?trf_seq=tRF_Sequence. All target sites of a tRF on a given gene can be found at: https://grigoriev-lab.camden.rutgers.edu/tatdb/trf_gene.php?trf_seq=tRF_Sequence&gene_name=Target_Gene_Name. Alignments of a specific type of tRF to a given tRNA gene can be found at: https://grigoriev-lab.camden.rutgers.edu/tatdb/trf_type.php?trna_id=Formatted_tRNA_ID&trna_region=tRF_Type. If no filter is specified in the URL, the default filters are selected for such links; these filter settings can be changed after landing on the tatDB page.
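The three link patterns above can be generated programmatically with the standard library. This sketch only fills in the documented query parameters; the sample tRF sequence, gene name, tRNA ID and tRF type in the checks are placeholders, and no filter parameters are added because their names are not documented here.

```python
from urllib.parse import urlencode

BASE = "https://grigoriev-lab.camden.rutgers.edu/tatdb/"


def trf_isoform_url(trf_seq):
    # All target genes and details of one tRF.
    return BASE + "trf_isoform.php?" + urlencode({"trf_seq": trf_seq})


def trf_gene_url(trf_seq, gene_name):
    # All target sites of one tRF on one gene.
    return BASE + "trf_gene.php?" + urlencode({"trf_seq": trf_seq, "gene_name": gene_name})


def trf_type_url(trna_id, trf_type):
    # Alignments of one tRF type to one tRNA gene.
    return BASE + "trf_type.php?" + urlencode({"trna_id": trna_id, "trna_region": trf_type})
```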

1. Chan, P.P. and T.M. Lowe, GtRNAdb: a database of transfer RNA genes detected in genomic sequence. Nucleic Acids Res, 2009. 37(Database issue): p. D93-7.
2. Juhling, F., et al., tRNAdb 2009: compilation of tRNA sequences and tRNA genes. Nucleic Acids Res, 2009. 37(Database issue): p. D159-62.
3. Helwak, A., et al., Mapping the human miRNA interactome by CLASH reveals frequent noncanonical binding. Cell, 2013. 153(3): p. 654-65.
4. http://hannonlab.cshl.edu/fastx_toolkit
5. Guan, L., S. Karaiskos, and A. Grigoriev, Inferring targeting modes of Argonaute-loaded tRNA fragments. RNA Biol, 2020. 17(8): p. 1070-1080.
6. Guan, L., V. Lam, and A. Grigoriev, Large-Scale Computational Discovery of Binding Motifs in tRNA Fragments. Front Mol Biosci, 2021. 8: 647449.
7. http://www.mirbase.org
8. https://www.ncbi.nlm.nih.gov/refseq/
9. http://dec2017.archive.ensembl.org/index.html
10. Hafner, M., et al., Transcriptome-wide identification of RNA-binding protein and microRNA target sites by PAR-CLIP. Cell, 2010. 141(1): p. 129-41.
11. Langmead, B., et al., Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol, 2009. 10(3): p. R25.
12. Bailey, T.L., et al., MEME SUITE: tools for motif discovery and searching. Nucleic Acids Res, 2009. 37(Web Server issue): p. W202-8.
13. Grant, C.E., T.L. Bailey, and W.S. Noble, FIMO: scanning for occurrences of a given motif. Bioinformatics, 2011. 27(7): p. 1017-8.
14. Kruger, J. and M. Rehmsmeier, RNAhybrid: microRNA target prediction easy, fast and flexible. Nucleic Acids Res, 2006. 34(Web Server issue): p. W451-4.

Table of Contents

1. Computational analyses of high-throughput sequencing datasets

1.1 tRNA sequences used to find tRFs

1.2 Analysis of CLASH data

1.3 Analysis of PAR-CLIP data

2. Naming convention of tRFs

2.1 tRF-5p and tRF-5i

2.2 tRF-3p and tRF-3i

2.3 tRF-3t

2.4 tRF-i

2.5 tRF ID

2.6 Intron excised (X)

3. Other terms

3.1 Unique hybrids

3.2 Read count

3.3 Motif

3.4 Direction of pairs

3.5 Minimum Free Energy (MFE)

4. Search

4.1 Search by tRFs

4.2 Search by targets

4.3 Other filters

4.4 Default filters

5. Database Interoperability