1 Reading the data
The input files are:
- the output tables made by featureCounts (run in rule ‘count_on_TE’ of the TE_RNASeq.Snakefile), which contain read counts of all repetitive elements (i.e. all repNames) for all samples
- the repetitive elements annotation ../../data/annotations/RepeatMasker_RepeatLibrary20140131_mm10.noGenes.noSimple.bed used in the featureCounts command, from which I retrieve the repFamily and repClass of each repName. I chose the most up-to-date RepeatMasker library as of Nov 2020; ‘Simple repeats’ and ‘Low complexity regions’ are removed from it in rule filter_TE_annotation of TE_RNASeq.Snakefile
- a table with total number of reads per sample in the raw fastq files
- the samples’ table, containing info on experimental design:
2 Analysis at family level
2.1 FPKM for each family of Repetitive Elements
For each family of Repetitive Elements I compute FPKM values. For elements with no repFamily name, or whose repFamily belongs to more than one repClass, I use the repClass instead. For each sample:
- I compute the sum of counts for all elements belonging to that repFamily
- I divide this sum by the total number of reads for that sample and multiply by 10⁶
- I divide this number by the total length (in Kb) of the elements belonging to that repFamily, which gives the FPKM
- When specified, I subtract from each FPKM the total FPKM of all transposons belonging to the DNA repClass
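The steps above can be sketched as follows. This is a minimal illustration in plain Python, not the code used for this report; the function names, the dictionary layout and the toy repNames (`repA`, `repB`, `repC`) are hypothetical. The DNA background is passed in as a precomputed number, because the text does not specify exactly how the total FPKM of DNA transposons is aggregated.

```python
from collections import defaultdict


def family_fpkm(counts, lengths_bp, family_of, total_reads):
    """Return {family: FPKM} for one sample.

    counts      -- {repName: read count in this sample}
    lengths_bp  -- {repName: summed annotated length in bp}
    family_of   -- {repName: repFamily, or repClass as fallback}
    total_reads -- total number of reads in the sample's raw fastq
    """
    fam_counts = defaultdict(float)
    fam_len_kb = defaultdict(float)
    for rep, fam in family_of.items():
        fam_counts[fam] += counts.get(rep, 0)       # step 1: sum counts per family
        fam_len_kb[fam] += lengths_bp[rep] / 1000.0  # family length in Kb
    return {
        # step 2: per million reads; step 3: per Kb of family length
        fam: fam_counts[fam] / total_reads * 1e6 / fam_len_kb[fam]
        for fam in fam_counts
    }


def subtract_background(fpkm, dna_fpkm):
    """Optional step 4: subtract the DNA-transposon FPKM from every family.

    dna_fpkm is supplied by the caller (assumption: how it is aggregated
    is not stated in the text).
    """
    return {fam: value - dna_fpkm for fam, value in fpkm.items()}


# Toy data, hypothetical names and numbers
counts = {"repA": 300, "repB": 200, "repC": 50}
lengths_bp = {"repA": 2000, "repB": 3000, "repC": 1000}
family_of = {"repA": "L1", "repB": "L1", "repC": "hAT"}

fpkm = family_fpkm(counts, lengths_bp, family_of, total_reads=1_000_000)
corrected = subtract_background(fpkm, dna_fpkm=fpkm["hAT"])
```

In the toy data the L1 family has 500 reads over 5 Kb in a sample of one million reads, so its FPKM is 500 / 5 = 100.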
2.2 Heatmaps
The heatmaps are scaled by rows.
- I exclude samples mA9, mA20 and mC6 from the rest of the TE analysis because they are more contaminated with DNA transposons.
- No RNA TE family shows deregulation in any of the three experimental groups.
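To make "scaled by rows" concrete, here is a small sketch of row-wise z-scoring, assuming the conventional definition (subtract the row mean, divide by the row sample standard deviation), which is what the `scale = "row"` option of pheatmap does. The function name and the toy matrix are hypothetical.

```python
from statistics import mean, stdev


def scale_rows(matrix):
    """Z-score each row: (value - row mean) / row sample standard deviation."""
    scaled = []
    for row in matrix:
        m, s = mean(row), stdev(row)
        if s == 0:
            # Constant rows carry no pattern; mark them as undefined
            scaled.append([float("nan")] * len(row))
        else:
            scaled.append([(x - m) / s for x in row])
    return scaled


# Two toy families with very different FPKM magnitudes (hypothetical values)
fpkm_matrix = [
    [1.0, 2.0, 3.0],
    [10.0, 20.0, 30.0],
]
z = scale_rows(fpkm_matrix)
```

After scaling both rows become `[-1.0, 0.0, 1.0]`, so the colours show each family's pattern across samples rather than its absolute expression level.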
3 DESeq2 analysis of RNA transposons
I include the FPKM of DNA repetitive elements as a confounding factor in the DESeq2 design formula.
Before running the Differential Expression analysis, the data are pre-filtered to remove all repetitive elements with fewer than 10 reads across all samples.
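The pre-filter can be sketched as below. This assumes that "reads across all samples" means the sum of an element's counts over every sample (the usual DESeq2 pre-filtering convention); the function name and the toy counts are hypothetical.

```python
def prefilter(counts, min_total=10):
    """Keep only elements whose counts summed over all samples reach min_total.

    counts -- {repName: [count in sample 1, count in sample 2, ...]}
    """
    return {rep: row for rep, row in counts.items() if sum(row) >= min_total}


# Toy count table with three samples (hypothetical names and values)
raw_counts = {
    "repA": [120, 98, 143],  # kept: 361 reads in total
    "repB": [3, 2, 4],       # removed: 9 reads in total
    "repC": [0, 10, 0],      # kept: exactly 10 reads in total
}
kept = prefilter(raw_counts)
```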
3.1 MA-plots
- A dot is coloured in the MA-plots when its adjusted p-value (padj) is < 0.05.
- Transposable elements with mean expression (baseMean) > 10 and |log2FoldChange| > 0.2 are labeled.
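The two rules above can be written as a small helper. This is a sketch with a hypothetical function name; it applies the colouring and labelling criteria independently, exactly as they are worded, which is an assumption (the plotting code may additionally require a labelled element to be significant). The two example rows are taken from the result tables in this section.

```python
def ma_point_style(base_mean, log2fc, padj, alpha=0.05, min_mean=10, min_lfc=0.2):
    """Return (coloured, labelled) for one element in the MA-plot."""
    # padj can be missing (NA) in DESeq2 results; treat that as not significant
    coloured = padj is not None and padj < alpha
    labelled = base_mean > min_mean and abs(log2fc) > min_lfc
    return coloured, labelled


# MMETn-int: significant, well expressed, |log2FoldChange| above 0.2
mmetn = ma_point_style(806.49139, -0.3778477, 0.0116156)
# MLTR11B: significant, but baseMean is below 10, so it is not labelled
mltr11b = ma_point_style(9.499054, -0.3721853, 0.0262123)
```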
baseMean | log2FoldChange | lfcSE | pvalue | padj | repName |
---|---|---|---|---|---|
806.49139 | -0.3778477 | 0.1193406 | 0.0000683 | 0.0116156 | MMETn-int |
206.40109 | -0.3818225 | 0.1369757 | 0.0001269 | 0.0116156 | ETnERV-int |
3573.42804 | -0.3363603 | 0.1266031 | 0.0004848 | 0.0295735 | B2_Mm1a |
498.50741 | -0.3352599 | 0.1495905 | 0.0007787 | 0.0332760 | RLTR10-int |
2198.03897 | -0.3218701 | 0.1305130 | 0.0009092 | 0.0332760 | B2_Mm1t |
70.57467 | 0.3044471 | 0.1306259 | 0.0016102 | 0.0453632 | L1MdFanc_II |
16.97384 | -0.2951958 | 0.1710063 | 0.0019182 | 0.0453632 | RLTR26 |
389.54918 | -0.2847695 | 0.1180779 | 0.0019831 | 0.0453632 | L1MdTf_III |

baseMean | log2FoldChange | lfcSE | pvalue | padj | repName |
---|---|---|---|---|---|
52.239767 | 0.5380230 | 0.1350304 | 0.0000007 | 0.0001573 | L1_Mur3 |
31.009136 | -0.4685576 | 0.1483968 | 0.0000304 | 0.0034974 | RLTR17 |
198.441948 | 0.4520685 | 0.2067209 | 0.0000526 | 0.0040305 | MER89 |
57.391712 | 0.3619053 | 0.1223337 | 0.0003116 | 0.0179177 | L1Lx_IV |
82.818382 | 0.3463769 | 0.1222770 | 0.0005484 | 0.0228044 | ORR1A3-int |
120.961862 | -0.3601924 | 0.1321217 | 0.0005949 | 0.0228044 | RLTR13B2 |
1010.357509 | 0.3837509 | 0.1613170 | 0.0007645 | 0.0251195 | MTA_Mm-int |
389.549176 | -0.3187987 | 0.1148767 | 0.0009187 | 0.0262123 | L1MdTf_III |
9.499054 | -0.3721853 | 0.1869677 | 0.0011180 | 0.0262123 | MLTR11B |
103.868560 | 0.3480707 | 0.1376616 | 0.0011397 | 0.0262123 | RLTR6-int |
14.749991 | 0.3584921 | 0.1790419 | 0.0017506 | 0.0360284 | L1Lx_II |
37.170810 | 0.3497443 | 0.1597735 | 0.0019886 | 0.0360284 | MT-int |
7.922593 | 0.3294815 | 0.2051498 | 0.0020364 | 0.0360284 | L1_Rod |
67.889418 | 0.3314359 | 0.1521968 | 0.0028899 | 0.0464899 | MT2B |
61.251019 | 0.2822648 | 0.1134049 | 0.0030320 | 0.0464899 | LTRIS_Mus |

baseMean | log2FoldChange | lfcSE | pvalue | padj | repName |
---|---|---|---|---|---|

sessionInfo()
## R version 4.1.3 (2022-03-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.6 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=it_IT.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=it_IT.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=it_IT.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=it_IT.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] ggpubr_0.5.0 ggrepel_0.9.2
## [3] gridExtra_2.3 DESeq2_1.34.0
## [5] SummarizedExperiment_1.24.0 Biobase_2.54.0
## [7] MatrixGenerics_1.6.0 matrixStats_0.62.0
## [9] GenomicRanges_1.46.1 GenomeInfoDb_1.30.1
## [11] IRanges_2.28.0 S4Vectors_0.32.4
## [13] BiocGenerics_0.40.0 pheatmap_1.0.12
## [15] data.table_1.14.6 ggplot2_3.4.0
##
## loaded via a namespace (and not attached):
## [1] bitops_1.0-7 bit64_4.0.5 RColorBrewer_1.1-3
## [4] httr_1.4.4 tools_4.1.3 backports_1.4.1
## [7] bslib_0.4.1 irlba_2.3.5.1 utf8_1.2.2
## [10] R6_2.5.1 DBI_1.1.3 colorspace_2.0-3
## [13] withr_2.5.0 tidyselect_1.2.0 bit_4.0.5
## [16] compiler_4.1.3 cli_3.4.1 DelayedArray_0.20.0
## [19] labeling_0.4.2 sass_0.4.2 scales_1.2.1
## [22] SQUAREM_2021.1 genefilter_1.76.0 mixsqp_0.3-48
## [25] stringr_1.4.1 digest_0.6.30 rmarkdown_2.18
## [28] XVector_0.34.0 pkgconfig_2.0.3 htmltools_0.5.3
## [31] invgamma_1.1 highr_0.9 fastmap_1.1.0
## [34] rlang_1.0.6 rstudioapi_0.14 RSQLite_2.2.18
## [37] prettydoc_0.4.1 farver_2.1.1 jquerylib_0.1.4
## [40] generics_0.1.3 jsonlite_1.8.3 BiocParallel_1.28.3
## [43] car_3.1-1 dplyr_1.0.10 RCurl_1.98-1.9
## [46] magrittr_2.0.3 GenomeInfoDbData_1.2.7 Matrix_1.5-3
## [49] Rcpp_1.0.9 munsell_0.5.0 fansi_1.0.3
## [52] abind_1.4-5 lifecycle_1.0.3 stringi_1.7.8
## [55] yaml_2.3.6 carData_3.0-5 zlibbioc_1.40.0
## [58] grid_4.1.3 blob_1.2.3 parallel_4.1.3
## [61] crayon_1.5.2 lattice_0.20-45 Biostrings_2.62.0
## [64] splines_4.1.3 annotate_1.72.0 KEGGREST_1.34.0
## [67] locfit_1.5-9.6 knitr_1.40 pillar_1.8.1
## [70] ggsignif_0.6.4 codetools_0.2-18 geneplotter_1.72.0
## [73] XML_3.99-0.12 glue_1.6.2 evaluate_0.18
## [76] png_0.1-7 vctrs_0.5.1 gtable_0.3.1
## [79] purrr_0.3.5 tidyr_1.2.1 assertthat_0.2.1
## [82] ashr_2.2-54 cachem_1.0.6 xfun_0.35
## [85] xtable_1.8-4 broom_1.0.1 rstatix_0.7.1
## [88] survival_3.2-13 truncnorm_1.0-8 tibble_3.1.8
## [91] AnnotationDbi_1.56.2 memoise_2.0.1