1 Reading the data

The input files are:

2 Analysis at family level

2.1 FPKM for each family of Repetitive Elements

For each family of Repetitive Elements (in case of elements with no repFamily name or repFamilies belonging to more than one repClass I use repClass) I compute FPKM values, as follows: for each sample:

  • I compute the sum of counts for all elements belonging to that repFamily
  • I divide this sum by the total number of reads sequenced for that sample and multiply by 10⁶
  • I divide this number by the total sum of lengths (in Kb) of the elements belonging to that repFamily –> FPKM
  • I subtract from each FPKM the total FPKM of all transposons belonging to the DNA repClass

2.2 Heatmaps

The heatmaps are scaled by rows.

  • I remove sample het_4D, wt_9F, het_5A, het_3B from the next heatmaps and from the DESeq analysis below because more contaminated with DNA.
  • I use the values not corrected for DNA transposons because - with these single low-input single-embryo samples - the error affecting the correction hides the subtle differences in the amount of RNA transposons. This is compensated by having many single embryos: a similar trend observed in multiple embryos with the same genotype - affected by a different amount of DNA contamination and, in the case of the wt males, even coming from two different IVF drops - would be indicative of a phenotype.
  • I plot separately male and female samples and cluster the main repFamilies of retrotransposons using euclidean distance.

A not-fully-penetrant trend of upregulation of all families of TEs is visible for male but not female samples.

2.3 PCA of male samples

This plot is useful because it shows the contribution of the different families to sample clustering i.e. LTR and L1 contribute more than SINE.

2.4 Testing difference in FPKM of retrotransposons’ families

I use the values not corrected for DNA transposons.

Wilcoxon test on male samples:

[[1]] [1] “ERVL - T931del vs wt pval = 0.0892095520578493 - T931A vs wt pval = 0.909090909090909”

[[2]] [1] “L1 - T931del vs wt pval = 0.314999242243824 - T931A vs wt pval = 0.757575757575758”

[[3]] [1] “ERVK - T931del vs wt pval = 0.0185433761285154 - T931A vs wt pval = 0.484848484848485”

[[4]] [1] “ERV1 - T931del vs wt pval = 0.00388620667258438 - T931A vs wt pval = 0.121212121212121”

[[5]] [1] “ERVL-MaLR - T931del vs wt pval = 0.0232306393297105 - T931A vs wt pval = 0.909090909090909”

[[6]] [1] “B2 - T931del vs wt pval = 0.279861005867198 - T931A vs wt pval = 0.484848484848485”

[[7]] [1] “Alu - T931del vs wt pval = 0.630528913810648 - T931A vs wt pval = 0.484848484848485”

3 DE-Seq analysis of RNA transposons - female WT blastocysts to test the batch effect

In this and all other DESeq comparisons:

4 DE-Seq analysis of RNA transposons - male blastocysts only

Males T931del vs wt
baseMean log2FoldChange lfcSE pvalue padj repName
528.90386 0.9786905 0.1447289 0.0000000 0.0000000 MuLV-int
1119.31449 0.8158621 0.1287900 0.0000000 0.0000000 RLTR4_MM-int
135.06702 0.5213699 0.1728559 0.0000162 0.0009644 RLTR4_Mm
206.29170 0.4607629 0.1579520 0.0000361 0.0016157 RLTR6-int
1300.93099 -0.6613354 0.2690325 0.0000604 0.0021628 MMERGLN-int
678.99933 0.3609694 0.1271392 0.0001577 0.0047044 MMVL30-int
27.55500 -0.6409836 0.3666934 0.0003380 0.0081098 RodERV21-int
187.68044 -0.2907434 0.0997517 0.0003624 0.0081098 MTA_Mm
31.75663 -0.5258848 0.2832002 0.0005226 0.0103932 MMERGLN_LTR
42.13382 -0.4011300 0.1948019 0.0008109 0.0142718 RLTR17
1286.57219 0.2912919 0.1128029 0.0008770 0.0142718 ERVB4_2-I_MM
11.38854 -0.4721387 0.3096455 0.0014410 0.0199878 ORR1C1
221.73857 -0.2271770 0.0843115 0.0014516 0.0199878 L1MdV_III
32.75616 -0.3299012 0.1855960 0.0031362 0.0400989 RLTR19-int
22.56685 -0.3604645 0.2434361 0.0039036 0.0465828 RMER19C
162.20863 -0.2460331 0.1173378 0.0046624 0.0521609 Lx
38.64215 -0.2956434 0.1708622 0.0050930 0.0536259 B1_Mus2
26.09248 0.3173226 0.2100425 0.0057694 0.0573733 IAPEY4_I-int
32.21240 -0.2753766 0.1688235 0.0077095 0.0697200 ID_B1
18.45116 -0.3113425 0.2301053 0.0077899 0.0697200 MTB_Mm
118.79695 0.2240525 0.1160159 0.0083204 0.0709218 RMER19B
258.79774 0.2120458 0.1158948 0.0115058 0.0933189 ORR1A1-int
64.63403 0.2293131 0.1347057 0.0119907 0.0933189 ORR1A2-int
672.68207 0.2423683 0.1612692 0.0141694 0.0995731 L1MdA_I
1118.11786 0.2347604 0.1520234 0.0146043 0.0995731 ETnERV3-int
24.91461 -0.2602543 0.1990863 0.0148912 0.0995731 B3A
39.13013 -0.2325337 0.1506258 0.0150194 0.0995731 B4A

4.1 FPKM of DE TEs

4.2 Comparing DE TEs with developmentally dynamic TEs

  • TE dynamics across preimplantation development was analyzed using the preimplantation WT samples from GSE66582 (MII oocyte to inner cell mass) and GSE76505 (E3.5 to E7.5).
  • I focus on the 8-cell to E3.5 ICM and 8-cell to E3.5 trophectoderm transition as the relevant one for the comparison with the mutant blastocysts.

4.2.1 Conclusion

TEs which are upregulated in the mutant are downregulated between the 8c and ICM in WT embryos, hence their deregulation could potentially partly be due to a developmental delay. However, the developmental delay cannot completely explain TE upregulation, because the negative correlation that we would expect between developmental changes and changes due to the mutation is very low.

sessionInfo()
## R version 4.1.3 (2022-03-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.6 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=it_IT.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=it_IT.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=it_IT.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=it_IT.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] factoextra_1.0.7            ade4_1.7-20                
##  [3] RColorBrewer_1.1-3          ggpubr_0.5.0               
##  [5] ggrepel_0.9.2               gridExtra_2.3              
##  [7] DESeq2_1.34.0               SummarizedExperiment_1.24.0
##  [9] Biobase_2.54.0              MatrixGenerics_1.6.0       
## [11] matrixStats_0.62.0          GenomicRanges_1.46.1       
## [13] GenomeInfoDb_1.30.1         IRanges_2.28.0             
## [15] S4Vectors_0.32.4            BiocGenerics_0.40.0        
## [17] pheatmap_1.0.12             data.table_1.14.6          
## [19] ggplot2_3.4.0              
## 
## loaded via a namespace (and not attached):
##  [1] bitops_1.0-7           bit64_4.0.5            httr_1.4.4            
##  [4] tools_4.1.3            backports_1.4.1        bslib_0.4.1           
##  [7] irlba_2.3.5.1          utf8_1.2.2             R6_2.5.1              
## [10] DBI_1.1.3              colorspace_2.0-3       withr_2.5.0           
## [13] tidyselect_1.2.0       bit_4.0.5              compiler_4.1.3        
## [16] cli_3.4.1              DelayedArray_0.20.0    labeling_0.4.2        
## [19] sass_0.4.2             scales_1.2.1           SQUAREM_2021.1        
## [22] genefilter_1.76.0      mixsqp_0.3-48          stringr_1.4.1         
## [25] digest_0.6.30          rmarkdown_2.18         XVector_0.34.0        
## [28] pkgconfig_2.0.3        htmltools_0.5.3        invgamma_1.1          
## [31] highr_0.9              fastmap_1.1.0          rlang_1.0.6           
## [34] rstudioapi_0.14        RSQLite_2.2.18         prettydoc_0.4.1       
## [37] farver_2.1.1           jquerylib_0.1.4        generics_0.1.3        
## [40] jsonlite_1.8.3         BiocParallel_1.28.3    car_3.1-1             
## [43] dplyr_1.0.10           RCurl_1.98-1.9         magrittr_2.0.3        
## [46] GenomeInfoDbData_1.2.7 Matrix_1.5-3           Rcpp_1.0.9            
## [49] munsell_0.5.0          fansi_1.0.3            abind_1.4-5           
## [52] lifecycle_1.0.3        stringi_1.7.8          yaml_2.3.6            
## [55] carData_3.0-5          MASS_7.3-55            zlibbioc_1.40.0       
## [58] grid_4.1.3             blob_1.2.3             parallel_4.1.3        
## [61] crayon_1.5.2           lattice_0.20-45        Biostrings_2.62.0     
## [64] splines_4.1.3          annotate_1.72.0        KEGGREST_1.34.0       
## [67] locfit_1.5-9.6         knitr_1.40             pillar_1.8.1          
## [70] ggsignif_0.6.4         codetools_0.2-18       geneplotter_1.72.0    
## [73] XML_3.99-0.12          glue_1.6.2             evaluate_0.18         
## [76] png_0.1-7              vctrs_0.5.1            gtable_0.3.1          
## [79] purrr_0.3.5            tidyr_1.2.1            assertthat_0.2.1      
## [82] ashr_2.2-54            cachem_1.0.6           xfun_0.35             
## [85] xtable_1.8-4           broom_1.0.1            rstatix_0.7.1         
## [88] survival_3.2-13        truncnorm_1.0-8        tibble_3.1.8          
## [91] AnnotationDbi_1.56.2   memoise_2.0.1