1 Reading the data
The input files are:
- the output tables made by featureCounts (run in rule ‘count_on_TE’ of the Snakefile), which contain read counts of all repetitive elements (i.e. all repNames) for all samples
- the repetitive elements annotation ../../data/annotations/RepeatMasker_RepeatLibrary20140131_mm10.noGenes.noSimple.bed (converted to SAF format in rule of Snakefile) used for featureCounts command, that is used to retrieve repFamily and repClass for each repName. The annotation I chose is the most updated library to Nov2020 of rmsk, from which I removed ‘Simple repeats’ and ‘Low complexity regions’
- a table with total number of reads per sample in raw fastq files
- the samples’ table, containing infos on experimental design:
2 Analysis at family level
2.1 FPKM for each family of Repetitive Elements
For each family of Repetitive Elements (in case of elements with no repFamily name or repFamilies belonging to more than one repClass I use repClass) I compute FPKM values, as follows: for each sample:
- I compute the sum of counts for all elements belonging to that repFamily
- I divide this sum by the total number of reads sequenced for that sample and multiply by 10⁶
- I divide this number by the total sum of lengths (in Kb) of the elements belonging to that repFamily –> FPKM
- I subtract from each FPKM the total FPKM of all transposons belonging to the DNA repClass
2.2 Heatmaps
The heatmaps are scaled by rows.
- I remove sample het_4D, wt_9F, het_5A, het_3B from the next heatmaps and from the DESeq analysis below because more contaminated with DNA.
- I use the values not corrected for DNA transposons because - with these single low-input single-embryo samples - the error affecting the correction hides the subtle differences in the amount of RNA transposons. This is compensated by having many single embryos: a similar trend observed in multiple embryos with the same genotype - affected by a different amount of DNA contamination and, in the case of the wt males, even coming from two different IVF drops - would be indicative of a phenotype.
- I plot separately male and female samples and cluster the main repFamilies of retrotransposons using euclidean distance.
A not-fully-penetrant trend of upregulation of all families of TEs is visible for male but not female samples.
2.3 PCA of male samples
This plot is useful because it shows the contribution of the different families to sample clustering i.e. LTR and L1 contribute more than SINE.
2.4 Testing difference in FPKM of retrotransposons’ families
I use the values not corrected for DNA transposons.
Wilcoxon test on male samples:
[[1]] [1] “ERVL - T931del vs wt pval = 0.0892095520578493 - T931A vs wt pval = 0.909090909090909”
[[2]] [1] “L1 - T931del vs wt pval = 0.314999242243824 - T931A vs wt pval = 0.757575757575758”
[[3]] [1] “ERVK - T931del vs wt pval = 0.0185433761285154 - T931A vs wt pval = 0.484848484848485”
[[4]] [1] “ERV1 - T931del vs wt pval = 0.00388620667258438 - T931A vs wt pval = 0.121212121212121”
[[5]] [1] “ERVL-MaLR - T931del vs wt pval = 0.0232306393297105 - T931A vs wt pval = 0.909090909090909”
[[6]] [1] “B2 - T931del vs wt pval = 0.279861005867198 - T931A vs wt pval = 0.484848484848485”
[[7]] [1] “Alu - T931del vs wt pval = 0.630528913810648 - T931A vs wt pval = 0.484848484848485”
3 DE-Seq analysis of RNA transposons - female WT blastocysts to test the batch effect
In this and all other DESeq comparisons:
- I exclude female samples wt_9F, het_5A, het_3B and male samples het_4D.
- I include the FPKM of DNA repetitive elements as confounding factors in DESeq2 formula.
- The threshold used for a dot to be coloured in the MA-plots is p-value adjusted < 0.1.
- Transposable elements whose mean expression > 10 and log2FoldChange > 0.2 (or < -0.2) are labeled.
4 DE-Seq analysis of RNA transposons - male blastocysts only
baseMean | log2FoldChange | lfcSE | pvalue | padj | repName |
---|---|---|---|---|---|
528.90386 | 0.9786905 | 0.1447289 | 0.0000000 | 0.0000000 | MuLV-int |
1119.31449 | 0.8158621 | 0.1287900 | 0.0000000 | 0.0000000 | RLTR4_MM-int |
135.06702 | 0.5213699 | 0.1728559 | 0.0000162 | 0.0009644 | RLTR4_Mm |
206.29170 | 0.4607629 | 0.1579520 | 0.0000361 | 0.0016157 | RLTR6-int |
1300.93099 | -0.6613354 | 0.2690325 | 0.0000604 | 0.0021628 | MMERGLN-int |
678.99933 | 0.3609694 | 0.1271392 | 0.0001577 | 0.0047044 | MMVL30-int |
27.55500 | -0.6409836 | 0.3666934 | 0.0003380 | 0.0081098 | RodERV21-int |
187.68044 | -0.2907434 | 0.0997517 | 0.0003624 | 0.0081098 | MTA_Mm |
31.75663 | -0.5258848 | 0.2832002 | 0.0005226 | 0.0103932 | MMERGLN_LTR |
42.13382 | -0.4011300 | 0.1948019 | 0.0008109 | 0.0142718 | RLTR17 |
1286.57219 | 0.2912919 | 0.1128029 | 0.0008770 | 0.0142718 | ERVB4_2-I_MM |
11.38854 | -0.4721387 | 0.3096455 | 0.0014410 | 0.0199878 | ORR1C1 |
221.73857 | -0.2271770 | 0.0843115 | 0.0014516 | 0.0199878 | L1MdV_III |
32.75616 | -0.3299012 | 0.1855960 | 0.0031362 | 0.0400989 | RLTR19-int |
22.56685 | -0.3604645 | 0.2434361 | 0.0039036 | 0.0465828 | RMER19C |
162.20863 | -0.2460331 | 0.1173378 | 0.0046624 | 0.0521609 | Lx |
38.64215 | -0.2956434 | 0.1708622 | 0.0050930 | 0.0536259 | B1_Mus2 |
26.09248 | 0.3173226 | 0.2100425 | 0.0057694 | 0.0573733 | IAPEY4_I-int |
32.21240 | -0.2753766 | 0.1688235 | 0.0077095 | 0.0697200 | ID_B1 |
18.45116 | -0.3113425 | 0.2301053 | 0.0077899 | 0.0697200 | MTB_Mm |
118.79695 | 0.2240525 | 0.1160159 | 0.0083204 | 0.0709218 | RMER19B |
258.79774 | 0.2120458 | 0.1158948 | 0.0115058 | 0.0933189 | ORR1A1-int |
64.63403 | 0.2293131 | 0.1347057 | 0.0119907 | 0.0933189 | ORR1A2-int |
672.68207 | 0.2423683 | 0.1612692 | 0.0141694 | 0.0995731 | L1MdA_I |
1118.11786 | 0.2347604 | 0.1520234 | 0.0146043 | 0.0995731 | ETnERV3-int |
24.91461 | -0.2602543 | 0.1990863 | 0.0148912 | 0.0995731 | B3A |
39.13013 | -0.2325337 | 0.1506258 | 0.0150194 | 0.0995731 | B4A |
4.1 FPKM of DE TEs
4.2 Comparing DE TEs with developmentally dynamic TEs
- TE dynamics across preimplantation development was analyzed using the preimplantation WT samples from GSE66582 (MII oocyte to inner cell mass) and GSE76505 (E3.5 to E7.5).
- I focus on the 8-cell to E3.5 ICM and 8-cell to E3.5 trophectoderm transition as the relevant one for the comparison with the mutant blastocysts.
4.2.1 Conclusion
TEs which are upregulated in the mutant are downregulated between the 8c and ICM in WT embryos, hence their deregulation could potentially partly be due to a developmental delay. However, the developmental delay cannot completely explain TE upregulation, because the negative correlation that we would expect between developmental changes and changes due to the mutation is very low.
sessionInfo()
## R version 4.1.3 (2022-03-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.6 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=it_IT.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=it_IT.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=it_IT.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=it_IT.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] factoextra_1.0.7 ade4_1.7-20
## [3] RColorBrewer_1.1-3 ggpubr_0.5.0
## [5] ggrepel_0.9.2 gridExtra_2.3
## [7] DESeq2_1.34.0 SummarizedExperiment_1.24.0
## [9] Biobase_2.54.0 MatrixGenerics_1.6.0
## [11] matrixStats_0.62.0 GenomicRanges_1.46.1
## [13] GenomeInfoDb_1.30.1 IRanges_2.28.0
## [15] S4Vectors_0.32.4 BiocGenerics_0.40.0
## [17] pheatmap_1.0.12 data.table_1.14.6
## [19] ggplot2_3.4.0
##
## loaded via a namespace (and not attached):
## [1] bitops_1.0-7 bit64_4.0.5 httr_1.4.4
## [4] tools_4.1.3 backports_1.4.1 bslib_0.4.1
## [7] irlba_2.3.5.1 utf8_1.2.2 R6_2.5.1
## [10] DBI_1.1.3 colorspace_2.0-3 withr_2.5.0
## [13] tidyselect_1.2.0 bit_4.0.5 compiler_4.1.3
## [16] cli_3.4.1 DelayedArray_0.20.0 labeling_0.4.2
## [19] sass_0.4.2 scales_1.2.1 SQUAREM_2021.1
## [22] genefilter_1.76.0 mixsqp_0.3-48 stringr_1.4.1
## [25] digest_0.6.30 rmarkdown_2.18 XVector_0.34.0
## [28] pkgconfig_2.0.3 htmltools_0.5.3 invgamma_1.1
## [31] highr_0.9 fastmap_1.1.0 rlang_1.0.6
## [34] rstudioapi_0.14 RSQLite_2.2.18 prettydoc_0.4.1
## [37] farver_2.1.1 jquerylib_0.1.4 generics_0.1.3
## [40] jsonlite_1.8.3 BiocParallel_1.28.3 car_3.1-1
## [43] dplyr_1.0.10 RCurl_1.98-1.9 magrittr_2.0.3
## [46] GenomeInfoDbData_1.2.7 Matrix_1.5-3 Rcpp_1.0.9
## [49] munsell_0.5.0 fansi_1.0.3 abind_1.4-5
## [52] lifecycle_1.0.3 stringi_1.7.8 yaml_2.3.6
## [55] carData_3.0-5 MASS_7.3-55 zlibbioc_1.40.0
## [58] grid_4.1.3 blob_1.2.3 parallel_4.1.3
## [61] crayon_1.5.2 lattice_0.20-45 Biostrings_2.62.0
## [64] splines_4.1.3 annotate_1.72.0 KEGGREST_1.34.0
## [67] locfit_1.5-9.6 knitr_1.40 pillar_1.8.1
## [70] ggsignif_0.6.4 codetools_0.2-18 geneplotter_1.72.0
## [73] XML_3.99-0.12 glue_1.6.2 evaluate_0.18
## [76] png_0.1-7 vctrs_0.5.1 gtable_0.3.1
## [79] purrr_0.3.5 tidyr_1.2.1 assertthat_0.2.1
## [82] ashr_2.2-54 cachem_1.0.6 xfun_0.35
## [85] xtable_1.8-4 broom_1.0.1 rstatix_0.7.1
## [88] survival_3.2-13 truncnorm_1.0-8 tibble_3.1.8
## [91] AnnotationDbi_1.56.2 memoise_2.0.1