1 Reading the data

The input files are:

the output tables made by featureCounts (run in rule ‘count_on_TE’ of the Snakefile), which contain read counts of all repetitive elements (i.e. all repNames) for all samples
the repetitive elements annotation ../../data/annotations/RepeatMasker_RepeatLibrary20140131_mm10.noGenes.noSimple.bed (converted to SAF format in rule of Snakefile) used for featureCounts command, that is used to retrieve repFamily and repClass for each repName. The annotation I chose is the most updated library to Nov2020 of rmsk, from which I removed ‘Simple repeats’ and ‘Low complexity regions’
a table with STAR input reads per sample
the samples’ table, containing info on experimental design:

2 Analysis at family level

2.1 FPKM for each family of Repetitive Elements

For each family of Repetitive Elements (in case of elements with no repFamily name or repFamilies belonging to more than one repClass I use repClass) I compute FPKM values as follows: for each sample:

I compute the sum of counts for all elements belonging to that repFamily
I divide this sum by the total number of STAR input reads (/2 because with featureCounts I quantify fragments) for that sample and multiply by 10⁶
I divide this number by the total sum of lengths (in Kb) of the elements belonging to that repFamily –> FPKM
I subtract from each FPKM the total FPKM of all transposons belonging to the DNA repClass.

2.2 FPKM of specific families per condition

The values are the FPKM corrected for DNA transposons.

2.2.1 testing the difference in the mean FPKM per condition

I use t-test on the FPKM values corrected for DNA transposons.

[[1]] [1] “L1 - F pval = 0.125541125541126 - M pval = 0.536796536796537”

[[2]] [1] “ERVK - F pval = 0.662337662337662 - M pval = 0.125541125541126”

[[3]] [1] “ERVL - F pval = 0.051948051948052 - M pval = 0.0303030303030303”

[[4]] [1] “Satellite - F pval = 0.536796536796537 - M pval = 0.00432900432900433”

3 DE-Seq analysis of RNA transposons

I include the FPKM of DNA transposons as confounding factor in DESeq2 formula.

Before running the Differential Expression analysis, the data are pre-filtered to remove all repetitive elements with < 10 reads among all samples.

3.1 Differentially expressed TEs

## log2 fold change (MMSE): condition M_HEMI vs M_WT 
## Wald test p-value: condition M_HEMI vs M_WT 
## DataFrame with 4 rows and 5 columns
##             baseMean log2FoldChange     lfcSE      pvalue      padj
##            <numeric>      <numeric> <numeric>   <numeric> <numeric>
## IAPLTR2_Mm   179.523       0.316178 0.1519884 0.000104618 0.0101654
## MERVL-int    527.511       0.258159 0.0914886 0.000145426 0.0110614
## L1MdTf_II    393.994       0.286710 0.1144614 0.000151786 0.0110614
## L1MdTf_I     401.114       0.280728 0.1140062 0.000176763 0.0114503

3.2 MA-plots

The threshold used for a dot to be coloured in the MA-plots is p-value adjusted < 0.05.
Transposable elements whose mean expression > 20 and abs(log2FoldChange) > 0.2 are labeled.

3.3 FPKM of differentially expressed TEs

I also include two non-DE TEs, one expressed and one non-expressed in murine trophoblast cells (Ext.Fig3g in Weigert et al. 2023 ).

sessionInfo()

## R version 4.1.3 (2022-03-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.6 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=it_IT.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=it_IT.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=it_IT.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=it_IT.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] ggpubr_0.5.0                ggrepel_0.9.2              
##  [3] gridExtra_2.3               DESeq2_1.34.0              
##  [5] SummarizedExperiment_1.24.0 Biobase_2.54.0             
##  [7] MatrixGenerics_1.6.0        matrixStats_0.62.0         
##  [9] GenomicRanges_1.46.1        GenomeInfoDb_1.30.1        
## [11] IRanges_2.28.0              S4Vectors_0.32.4           
## [13] BiocGenerics_0.40.0         pheatmap_1.0.12            
## [15] data.table_1.14.6           ggplot2_3.4.0              
## 
## loaded via a namespace (and not attached):
##  [1] bitops_1.0-7           bit64_4.0.5            RColorBrewer_1.1-3    
##  [4] httr_1.4.4             tools_4.1.3            backports_1.4.1       
##  [7] bslib_0.4.1            irlba_2.3.5.1          utf8_1.2.2            
## [10] R6_2.5.1               DBI_1.1.3              colorspace_2.0-3      
## [13] withr_2.5.0            tidyselect_1.2.0       bit_4.0.5             
## [16] compiler_4.1.3         cli_3.4.1              DelayedArray_0.20.0   
## [19] labeling_0.4.2         sass_0.4.2             scales_1.2.1          
## [22] SQUAREM_2021.1         genefilter_1.76.0      mixsqp_0.3-48         
## [25] stringr_1.4.1          digest_0.6.30          rmarkdown_2.18        
## [28] XVector_0.34.0         pkgconfig_2.0.3        htmltools_0.5.3       
## [31] invgamma_1.1           highr_0.9              fastmap_1.1.0         
## [34] rlang_1.0.6            rstudioapi_0.14        RSQLite_2.2.18        
## [37] prettydoc_0.4.1        farver_2.1.1           jquerylib_0.1.4       
## [40] generics_0.1.3         jsonlite_1.8.3         BiocParallel_1.28.3   
## [43] car_3.1-1              dplyr_1.0.10           RCurl_1.98-1.9        
## [46] magrittr_2.0.3         GenomeInfoDbData_1.2.7 Matrix_1.5-3          
## [49] Rcpp_1.0.9             munsell_0.5.0          fansi_1.0.3           
## [52] abind_1.4-5            lifecycle_1.0.3        stringi_1.7.8         
## [55] yaml_2.3.6             carData_3.0-5          zlibbioc_1.40.0       
## [58] grid_4.1.3             blob_1.2.3             parallel_4.1.3        
## [61] crayon_1.5.2           lattice_0.20-45        Biostrings_2.62.0     
## [64] splines_4.1.3          annotate_1.72.0        KEGGREST_1.34.0       
## [67] locfit_1.5-9.6         knitr_1.40             pillar_1.8.1          
## [70] ggsignif_0.6.4         codetools_0.2-18       geneplotter_1.72.0    
## [73] XML_3.99-0.12          glue_1.6.2             evaluate_0.18         
## [76] png_0.1-7              vctrs_0.5.1            gtable_0.3.1          
## [79] purrr_0.3.5            tidyr_1.2.1            assertthat_0.2.1      
## [82] ashr_2.2-54            cachem_1.0.6           xfun_0.35             
## [85] xtable_1.8-4           broom_1.0.1            rstatix_0.7.1         
## [88] survival_3.2-13        truncnorm_1.0-8        tibble_3.1.8          
## [91] AnnotationDbi_1.56.2   memoise_2.0.1

Transposable Elements (TEs) analysis in E12.5 placentae from OgtY841A-het Females x OgtY841A-homo males

Sara Formichetti

June 29th, 2022