Misincorporation tRNA sequencing

Details for misincorporation tRNA sequencing
Full name	misincorporation tRNA sequencing
Summary	Uniform sequence coverage of tRNA pools, while retaining modification signatures, by identifying conditions that enable efficient RT readthrough together with a comprehensive and user-friendly computational toolkit.
Key findings	dramatic heterogeneity of tRNA isodecoder pools across cell lines and species
Sequencing strategy	RNA-Seq
Raw data	https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE152621
Software/data repository	https://github.com/nedialkova-lab/mim-tRNAseq; https://mim-trnaseq.readthedocs.io/en/latest/intro.html
Experimental protocol	RNA from Drosophila BG3-c2, HEK293T, and human iPS cells was isolated with Trizol (Sigma Aldrich) according to the manufacturer’s instructions. For total RNA isolation from yeast, frozen cells were resuspended in 100 mM sodium acetate pH = 4.5, 10 mM EDTA pH = 8.0, 1% SDS (1 mL per 50 OD600 units). An equal volume of hot acid phenol (pH = 4.3) was added, and the cell suspension was vortexed vigorously followed by incubation at 65°C for 5 min (S. cerevisiae) or 45 min (S. pombe) with intermittent mixing. After addition of 1/10 volume 1-Bromo-3-chloropropane (BCP, Sigma Aldrich), samples were centrifuged at 10,000 x g for 5 min and the aqueous phase was transferred to a new tube. Following an additional round of hot acid phenol/BCP and a round of BCP only extraction, RNA was precipitated from the aqueous phase by the addition of 3 volumes of 100% ethanol. Pellets were washed in 80% ethanol, briefly air-dried, and resuspended in RNase-free water. For RNA isolation from yeast under conditions that preserve tRNA charging, frozen cells were resuspended in ice-cold 100 mM sodium acetate pH = 4.5, 10 mM EDTA pH = 8.0. One volume of cold acid phenol (pH = 4.3) was added and cells were lysed with 500 μm-diameter glass beads by three rounds of vortexing for 45 s with a 1-min incubation on ice in between. One-tenth volume of BCP was then added and the samples were centrifuged at 10,000 x g/4°C for 5 min, followed by a second round of cold phenol-BCP and one round of BCP-only extraction. RNA was ethanol-precipitated from the aqueous phase and pellets were washed in 80% ethanol containing 50 mM sodium acetate, pH = 4.5, briefly air-dried, and resuspended in 50 mM sodium acetate pH = 4.5, 1 mM EDTA pH = 8.0. 25 μg of total RNA were resuspended in 10 mM sodium acetate pH 4.5 and oxidized by the addition of freshly prepared NaIO4 to a final concentration of 50 mM in a 58 μL volume for 30 min at 22°C. The reaction was quenched by addition of 6 μL 1 M glucose for 5 min at 22°C. RNA was purified with Micro Bio Spin P30 columns (BioRad) followed by two rounds of ethanol precipitation in the presence of 0.3M sodium acetate pH = 4.5. Pellets were resuspended in 20 μL RNase-free water and β-elimination was performed by addition of 30 μl 100 mM sodium borate pH = 9.5 (freshly prepared) for 90 min at 45°C. RNA was recovered with Micro Bio Spin P30 columns followed by ethanol precipitation, resuspended in RNase-free water, quantified on a NanoDrop, and stored at −80°C in single-use aliquots. Two synthetic RNA standards corresponding to E.coli tRNA-Lys-UUU with intact 3′-CCA (5′-GGGUCGUUAGCUCAGUUGGUAGAGCAGUUGACUUUUAAUCAAUUGGUCGCAGGUUCGAAUCCUGCACGACCCACCA-3′) or a 3′-CC (5′-GGGUCGUUAGCUCAGUUGGUAGAGCAGUUGACUUUUAAUCAAUUGGUCGCAGGUUCGAAUCCUGCACGACCCACC-3′) were added to total RNA in a 3:1 molar ratio at 0.06 pmol/μg, followed by incubation at 37°C in 50 mM Tris-HCl pH = 9.0 to deacylate tRNAs. Deacylation was omitted for samples subjected to oxidation and β-elimination. Total RNA was subsequently dephosphorylated with 10 U of T4 PNK (NEB) at 37°C for 30 min and purified by ethanol precipitation in 0.3M sodium acetate pH = 4.5 with 25 μg glycogen (Ambion) as a carrier. RNA was resolved on a denaturing 10% polyacrylamide/7M urea/1XTBE gel alongside Low Range ssRNA marker (NEB) and visualized with SYBR Gold. Species migrating at the size range of mature tRNAs (60 – 100 nt) were excised and gel slices were crushed with disposable pestles. Low-retention tubes and tips (Biotix, Axygen) were used for all subsequent steps of sequencing library construction to maximize nucleic acid recovery. Following addition of 400 μl gel elution buffer (0.3M sodium acetate pH = 4.5, 0.25% SDS, 1mM EDTA pH = 8.0), the gel slurry was incubated at 65°C for 10 min, snap-frozen on dry ice, and thawed at 65°C for 5 min. RNA was eluted overnight at room temperature with continuous mixing. Gel pieces were removed with Costar Spin-X centrifuge tube filters and RNA was recovered from the flow-through by ethanol precipitation in the presence of 25 μg of glycogen. 50 to 200 ng of gel-purified tRNA was ligated to one of four adapters with distinct barcodes: I1: 5′-pGATATCGTCAAGATCGGAAGAGCACACGTCTGAA/ddC/-3′; I2: 5′-pGATAGCTACAAGATCGGAAGAGCACACGTCTGAA/ddC/-3′; I3: 5′-pGATGCATACAAGATCGGAAGAGCACACGTCTGAA/ddC/-3′; I4: 5′-pGATTCTAGCAAGATCGGAAGAGCACACGTCTGAA/ddC/-3′ (barcodes italicised; underlined sequence complementary to RT primer). The adapters are blocked by the 3′ chain terminator dideoxycytidine to prevent concatemer formation, and 5′- phosphorylated to enable pre-adenylation by Mth RNA ligase prior to ligation (McGlincy and Ingolia, 2017). Ligation was performed for 3 hours at 25°C in a 20-μl reaction volume containing pre-adenylated adapter and RNA substrate in a 4:1 molar ratio, 1x T4 RNA Ligase Reaction Buffer, 200 U of T4 RNA ligase 2 (truncated KQ; NEB), 25% PEG 8000, and 10 U SUPERase In (Ambion). Ligation products were separated from excess adapter on denaturing 10% polyacrylamide/7M urea/1XTBE gels. Bands migrating at 95-125 nt were excised and ligation products were recovered from crushed gel slices. All reactions contained 125 nM primer, 125 nM template and 500 nM TGIRT (InGex) or 200 U Superscript III (Invitrogen). To prime reverse transcription in template-switching reactions, a synthetic RNA/DNA duplex with a single-nucleotide 3′ overhang was generated by annealing an RNA oligonucleotide (5′-GAGCACACGUCUGAACUCCACUCUUUCCCUACACGACGCUCUUCCGAUCU-3′) to a DNA oligonucleotide (5′-pRAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGGAGTTCAGACGTGTGCTCN-3′). The DNA oligonucleotide contained a phosphorylated A/G at its 5′ end, which is a preferred substrate for CircLigase used in subsequent cDNA circularization. For primer-dependent reverse transcription reactions, adapter-ligated tRNA and RT primer (5′-pRNAGATCGGAAGAGCGTCGTGTAGGGAAAGAG/iSp18/GTGACTGGAGTTCAGACGTGTGCTC-3′; underlined sequence complementary to 3′ adapter, 5′-RN to ameliorate potential biases during circularization) were mixed in MAXYMum Recovery PCR Tubes (Axygen), denatured at 82°C for 2 min and annealed at 25°C for 5 min in a Thermocycler. TGIRT reactions were assembled in a 20-μl final volume by combining template and primer with 10 U SUPERase In, 5 mM DTT (from a freshly made 100 mM stock) and manufacturer-recommended TGIRT buffer (20 mM Tris-HCl pH = 7.6, 450 mM NaCl, 5 mM MgCl2) or low salt buffer (50 mM Tris-HCl pH = 8.3, 75 mM KCl, 3 mM MgCl2). After TGIRT addition, samples were pre-incubated at reaction temperature for 10 min (primer-dependent reactions) or 22°C for 30 min (template-switching reactions), initiated by addition of dNTPs to a final concentration of 1.25 mM, and incubated in a Thermocycler for 1 hour or 16 hours. For Superscript III RT, template and primer were denatured at 75°C for 5 min and chilled on ice, and reverse transcription was performed in the presence of 1X First-Strand Buffer, 5 mM DTT, 0.5 mM dNTPs, 10 U SUPERase In, and 200 U Superscript III (Invitrogen) at 57°C for 60 min. Template RNA was subsequently hydrolyzed by the addition of 1 μl 5M NaOH and incubation at 95°C for 3 min and reaction products were separated from unextended primer on denaturing 10% polyacrylamide/7M urea/1XTBE gels. Gels were stained with SYBR Gold, the region between 60 and 150 nt was excised and cDNA was eluted from crushed gel slices in 400 μl 10 mM Tris-HCl pH = 8.0, 1 mM EDTA at 70°C/2000 rpm for 1 hour in a Thermoblock, followed by ethanol precipitation in 0.3M sodium acetate pH = 5.5 in the presence of 25 μg glycogen. Purified cDNA was circularized with CircLigase ssDNA ligase (Lucigen) in 1X reaction buffer supplemented with 1 mM ATP, 50 mM MgCl2, and 1M betaine for 3 hours at 60°C, followed by enzyme inactivation for 10 min at 80°C. One-fifth of circularized cDNA was directly used for library construction PCR with a common forward (5′-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCT∗C-3′) and unique indexed reverse primers (5′-CAAGCAGAAGACGGCATACGAGATNNNNNNGTGACTGGAGTTCAGACGTGT∗G-3′, asterisks denote a phosphorothioate bond and NNNNNN corresponds to the reverse complement of an Illumina index sequence). Amplification was performed with KAPA HiFi DNA Polymerase (Roche) in 1X GC buffer with initial denaturation at 95°C for 3 min, followed by five to six cycles of 98°C for 20 s, 62°C for 30 s, 72°C for 30 s at a ramp rate of 3°C/sec. PCR products were purified with DNA Clean&Concentrator 5 (Zymo Research) and resolved on 8% polyacrylamide/1XTBE gels alongside pBR322 DNA-MspI Digest (NEB). The 130-220 bp region of each lane was excised and DNA was eluted from crushed gel slices in 400 μl water with continuous mixing at room temperature overnight. After ethanol precipitation in 0.3M sodium acetate pH = 5.5 and 25 μg glycogen, libraries were dissolved in 10 μl 10 mM Tris-HCl pH = 8.0, quantified with the Qubit dsDNA HS kit, and sequenced for 150 cycles on an Illumina NextSeq platform.
Analysis protocol	Sequencing libraries were demultiplexed using cutadapt v2.5 (Martin, 2011) and a fasta file (barcodes.fa) of the first 10 nt for the four different 3′ adapters (see 3′ adapter ligation above). Indels in the alignment to the adapter sequence were disabled with --no-indels. Following demultiplexing, reads were further trimmed to remove the two 5′-RN nucleotides introduced by circularization from the RT primer with -u 2. In both processing steps, reads shorter than 10 nt were discarded using -m 10. Example commands for demultiplexing and 5′ nucleotide trimming: cutadapt --no-indels -a file:barcodes.fa -m 10 -o mix1_{name}_trim.fastq.gz mix.fastq.gz ; cutadapt -j 40 -m 10 -u 2 -o mix1_barcode1_trimFinal.fastq.gz mix1_barcode1_trim.fastq.gz. mim-tRNAseq uses modification data from MODOMICS to guide accurate alignment of short reads from tRNAs. A prepackaged set of data is available for S. cerevisiae, S. pombe, C. elegans, D. melanogaster, M. musculus, H. sapiens and E. coli, and can be specified with the --species parameter. For other organisms, mim-tRNAseq requires a fasta file of predicted genomic tRNA sequences (-t) and a tRNAscan-SE “out” file containing information about tRNA introns (-o), both of which should be obtained from GtRNAdb (Chan and Lowe, 2016) or from running tRNAscan-SE (Lowe and Chan, 2016) on the genome of interest. Lastly, a user-generated sample input file is required which contains two tab-separated columns specifying the path to trimmed tRNA-seq reads in fastq format, and the experimental condition of each fastq file. Additionally, a mitochondrial tRNA fasta reference is supplied with the prepackaged data inputs listed above, or may be supplied (-m) for custom genomes as a fasta file obtained from mitotRNAdb. mim-tRNAseq automatically removes nuclear-encoded mitochondrial tRNAs (nmt-tRNAs) and tRNA species with undetermined anticodons (where applicable), generates mature, processed tRNA sequences (with appended 3′-CCA if necessary, 5′-G for tRNA-His, and spliced introns), and fetches species-matched MODOMICS entries accordingly. Transcript sequences are then matched to MODOMICS entries using BLAST in order to index all known instances of residues modified at the Watson-Crick face within each tRNA. An additional modifications file for modifications reported in the literature but not yet added to MODOMICS may be supplied and is automatically processed by the pipeline. tRNA clustering is enabled with the --cluster parameter, which utilizes the usearch --cluster_fast algorithm (Edgar, 2010) to cluster tRNA sequences by a user-defined sequence identity threshold (customizable with --cluster-id). Regardless of the chosen threshold, only tRNAs sharing an anticodon are clustered to maintain isoacceptor resolution in cases where tRNA transcripts differ by a single nucleotide in the anticodon. The clusters are re-centered based on the number of identical sequences, and this is used to re-cluster and improve the selection of a representative centroid/parent sequence for each cluster (https://www.drive5.com/usearch/manual7/recenter.html). Polymorphisms between cluster members are recorded, and mismatches at these sites during alignment are tolerated, but they are not included in misincorporation analysis for modified sites. Since inosine is interpreted as a G during reverse transcription, annotated inosines are changed to G in tRNA reference sequences. After clustering, reads are aligned using GSNAP to the representative centroid cluster sequences of mature tRNA transcripts. By enabling SNP-tolerant alignment with --snp-tolerance, the indexed modified sites are treated as pseudo-SNPs to allow modification-induced mismatches at these sites in a sequence- and position-specific manner. Soft-clipping during alignment in combination with the GSNAP parameter --ignore-trim-in-filtering = 1 ensures that non-templated nucleotide extensions are not counted as mismatches during alignment. Mismatch tolerance outside of indexed SNPs is controlled using the --max-mismatches parameter, where an integer of allowed mismatches per read can be provided, or a relative mismatch fraction of read length between 0.0 and 0.1 can be supplied (default 0.1). If --remap is specified, then misincorporation analysis is performed and new, unannotated modifications are called where --misinc-thresh (total misincorporation proportion at a residue; default is 0.1 or 10%) and --min-cov (minimum total coverage for a cluster) regulate the calling of new modifications, which exclude mismatch sites between cluster members appearing as misincorporations in this analysis. The existing SNP index is then updated with these new sites, and realignment of all reads is performed with a mismatch tolerance set using --remap-mismatches. New potential inosine sites are classified for position 34 where a reference A nucleotide is misincorporated with a G in 95% or more total misincorporation events. Both --remap and --max-mismatches are extremely useful for detecting unknown modifications in poorly annotated tRNAs, subsequently allowing more accurate and efficient read alignment, which improves the results of all downstream analyses. Users should consider a low mismatch tolerance during remap to avoid inaccuracy resulting from lenient alignment parameters. We recommend a relative mismatch fraction of 0.075 during remapping (--remap-mismatches 0.075). Only uniquely mapped reads are retained for post-alignment analyses. The deconvolution algorithm first searches each cluster of tRNA reference sequences for single-nucleotide differences that distinguish among cluster members. For this, each nucleotide in a reference sequence is assessed for uniqueness at that position when compared to all other reference sequences in the cluster. If a nucleotide is unique in position and identity for a specific tRNA reference in the cluster, it is catalogued. Then, after alignment, each read is assessed for mismatches to the cluster parent to which it was aligned. These are then scanned individually to find potential matches to the previously catalogued set for the cluster which can distinguish unique tRNA references. Based on the presence and identity of a unique distinguishing mismatch, a read is then be assigned to a specific tRNA reference within a cluster. Depending on the organism and/or cluster ID threshold, unique distinguishing mismatches may not always be present for all tRNA references in a cluster. Reads without distinguishing mismatches remain assigned to the cluster parent, which is then marked as not fully deconvolved. Using this algorithm, uniquely aligned reads are assigned to individual tRNA sequences in the reference (where possible) before any of the downstream analyses detailed below. For differential expression analyses of reads summed per tRNA anticodon, read deconvolution is not necessary and therefore not performed. Following read deconvolution, all other mismatched positions for the read are extracted from alignment records in bam files, and converted into positions relative to the unique transcript to which the read was assigned (or the cluster parent if definitive assignment is not possible). The identity of the misincoporated nucleotide is recorded to enable signature analysis, and the counts of mismatches for each of the four nucleotides for all reads with the misincorporation are normalized relative to total read coverage at that position. Stops during reverse transcription are extracted from the alignment start position of each read relative to the reference (5′ read ends correspond to cDNA 3′ ends during RT) and normalized to total read counts for the unique tRNA. Similarly, readthrough for each position is calculated as the fraction of reads that stop at a position relative to read coverage at each position (as opposed to stop proportions which are normalized to total tRNA read coverage). This value is then subtracted from one to estimate the proportion of reads per position that extend beyond that site, and the minimum value in a 3-nucleotide window centered around the modification is recorded. Using a 3-nucleotide window ensures that potential variance in the position at which the RT stalls due to the modification is accounted for. Taking the minimum value of readthrough for these 3 nucleotides reduces the likelihood of readthrough overestimation. Misincorporation, stop data, and readthrough per unique tRNA sequence, per position are output as tab-separated files, and global heatmaps showing misincorporation and stop proportions across all unique tRNA sequences are plotted per experimental condition. Misincorporation signatures are also plotted for well-known conserved modified tRNA sites (9, 20, 26, 32, 34, 37 and 58) separated by upstream and downstream sequence context to assess potential factors influencing misincorporation signatures. Lastly, the dinucleotide at the 3′ ends of reads is quantified, so long as the read aligns to the conserved 3′-CCA tail of the reference. Proportions of transcripts with absent 3′ tails, 3′-C, 3′-CC and 3′-CCA are calculated per unique tRNA sequence and plotted pairwise between conditions for quantitation and comparison of functional tRNA pools, or tRNA charging fractions in periodate oxidation experiments. The cluster deconvolution algorithm allows coverage analysis, novel modification discovery and read counting for tRNA quantitation to be done at the level of unique tRNA sequences. Coverage is calculated as the depth of reads at all positions across a tRNA sequence and plotted using custom R scripts. Cytosolic tRNAs with low read coverage can be filtered at the coverage analysis step by supplying a minimum coverage threshold to --min-cov. Unique tRNA sequences filtered out here are excluded from all downstream analyses, except differential expression analysis by DESeq2 where all unique tRNA sequences are included. Normalized coverage (read fraction relative to library size) is plotted per sample in 25 bins across gene length in a metagene analysis. Normalized coverage is also scaled relative to the second last bin to account for potential differences in 3′ CCA intactness. Read counts per unique tRNA sequence are summed to calculate read counts per isoacceptor family (all tRNAs sharing an anticodon). These counts are subsequently used by a DESeq2 pipeline for count transformations, sample distance analysis using distance matrix heatmaps, PCA plots, and differential expression analysis at the level of isoacceptor families and unique tRNA transcripts (only for completely resolved clusters). In the case that only one experimental condition is supplied, or if there are no replicates for one or more conditions, differential expression analysis is not performed on these samples, but a normalized counts table is still produced for investigations into tRNA abundance.
Organism/cell line	Schizosaccharomyces pombe; Saccharomyces cerevisiae; Drosophila melanogaster; Homo sapiens (HEK293T, K562)
Approximate experimental time	4-5 days
Starting RNA amount	25 µg of total RNA
tRNA expression	++
Base modifications	+
Charging status	+
tRNA processing and fragmentation	-
Citation	https://doi.org/10.1016/j.molcel.2021.01.028