For this PCA:
To decide how many genes to consider, I plot the cumulative variance they account for in the data, with genes ordered by their contribution to each PC.
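A minimal sketch of one way to build such a cumulative curve is given here. It is not the code used for the figures: the expression matrix is simulated, the object names (`expr`, `pca`, `contrib`, `cum_pc1`) are hypothetical, and the "contribution" of a gene to a PC is assumed to be its squared loading expressed as a percentage of that PC. I believe this matches the `contrib` values that factoextra reports for variables, but that is worth checking against the real analysis.

```r
# Toy data (assumption): 6 samples x 200 genes of simulated expression values
set.seed(1)
expr <- matrix(rnorm(6 * 200), nrow = 6,
               dimnames = list(paste0("sample", 1:6), paste0("gene", 1:200)))

# PCA with samples as rows and genes as variables
pca <- prcomp(expr, center = TRUE, scale. = FALSE)

# Percentage contribution of each gene to each PC: squared loading,
# normalised so that contributions sum to 100 within a PC
contrib <- sweep(pca$rotation^2, 2, colSums(pca$rotation^2), "/") * 100

# Cumulative contribution to PC1, genes ranked by decreasing contribution
cum_pc1 <- cumsum(sort(contrib[, "PC1"], decreasing = TRUE))

plot(cum_pc1, type = "l",
     xlab = "Number of genes (ranked by contribution)",
     ylab = "Cumulative contribution to PC1 (%)")
```

The point at which the curve flattens gives a rough indication of how many top-ranked genes are worth keeping for that PC; the same can be repeated with `"PC2"`.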
–> I store the 50 genes most strongly associated with PC1 and the 30 genes most strongly associated with PC2 (i.e. those with the highest contribution), together with their coordinates on PC1 and PC2. Their expression dynamics across mouse preimplantation stages will be plotted using publicly available mRNA-Seq data spanning mouse early development.
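The selection step could look roughly like the following sketch. Again this runs on simulated data with hypothetical names (`top_genes()`, `top_pc1`, `top_pc2`), and it assumes that the gene "coordinates" are the PCA loadings; if the real analysis uses scaled coordinates (loadings multiplied by the PC standard deviations), the `PC1` and `PC2` columns would need to be adjusted accordingly.

```r
# Toy data (assumption): 6 samples x 200 genes of simulated expression values
set.seed(1)
expr <- matrix(rnorm(6 * 200), nrow = 6,
               dimnames = list(paste0("sample", 1:6), paste0("gene", 1:200)))
pca <- prcomp(expr, center = TRUE, scale. = FALSE)

# Loadings have unit length, so squared loadings x 100 sum to 100 per PC
contrib <- pca$rotation^2 * 100

# Hypothetical helper: the n genes contributing most to one PC,
# returned with their loadings on PC1 and PC2
top_genes <- function(pc, n) {
  genes <- names(sort(contrib[, pc], decreasing = TRUE))[seq_len(n)]
  data.frame(gene    = genes,
             contrib = contrib[genes, pc],
             PC1     = pca$rotation[genes, "PC1"],
             PC2     = pca$rotation[genes, "PC2"],
             row.names = NULL)
}

top_pc1 <- top_genes("PC1", 50)  # 50 genes with the highest contribution to PC1
top_pc2 <- top_genes("PC2", 30)  # 30 genes with the highest contribution to PC2
```

Each table can then be written to disk (for example with `write.table()`) and reused when plotting the expression of these genes across developmental stages.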
sessionInfo()
## R version 4.1.3 (2022-03-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.6 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=it_IT.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=it_IT.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=it_IT.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=it_IT.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] factoextra_1.0.7 gridExtra_2.3
## [3] ggpubr_0.5.0 data.table_1.14.6
## [5] ggrepel_0.9.2 ggplot2_3.4.0
## [7] DESeq2_1.34.0 SummarizedExperiment_1.24.0
## [9] Biobase_2.54.0 MatrixGenerics_1.6.0
## [11] matrixStats_0.62.0 GenomicRanges_1.46.1
## [13] GenomeInfoDb_1.30.1 IRanges_2.28.0
## [15] S4Vectors_0.32.4 BiocGenerics_0.40.0
##
## loaded via a namespace (and not attached):
## [1] bitops_1.0-7 bit64_4.0.5 RColorBrewer_1.1-3
## [4] httr_1.4.4 tools_4.1.3 backports_1.4.1
## [7] bslib_0.4.1 utf8_1.2.2 R6_2.5.1
## [10] DBI_1.1.3 colorspace_2.0-3 withr_2.5.0
## [13] tidyselect_1.2.0 bit_4.0.5 compiler_4.1.3
## [16] cli_3.4.1 DelayedArray_0.20.0 labeling_0.4.2
## [19] sass_0.4.2 scales_1.2.1 genefilter_1.76.0
## [22] stringr_1.4.1 digest_0.6.30 rmarkdown_2.18
## [25] XVector_0.34.0 pkgconfig_2.0.3 htmltools_0.5.3
## [28] highr_0.9 fastmap_1.1.0 rlang_1.0.6
## [31] rstudioapi_0.14 RSQLite_2.2.18 farver_2.1.1
## [34] jquerylib_0.1.4 generics_0.1.3 jsonlite_1.8.3
## [37] BiocParallel_1.28.3 dplyr_1.0.10 car_3.1-1
## [40] RCurl_1.98-1.9 magrittr_2.0.3 GenomeInfoDbData_1.2.7
## [43] Matrix_1.5-3 Rcpp_1.0.9 munsell_0.5.0
## [46] fansi_1.0.3 abind_1.4-5 lifecycle_1.0.3
## [49] stringi_1.7.8 yaml_2.3.6 carData_3.0-5
## [52] zlibbioc_1.40.0 grid_4.1.3 blob_1.2.3
## [55] parallel_4.1.3 crayon_1.5.2 lattice_0.20-45
## [58] Biostrings_2.62.0 splines_4.1.3 annotate_1.72.0
## [61] KEGGREST_1.34.0 locfit_1.5-9.6 knitr_1.40
## [64] pillar_1.8.1 ggsignif_0.6.4 codetools_0.2-18
## [67] geneplotter_1.72.0 XML_3.99-0.12 glue_1.6.2
## [70] evaluate_0.18 png_0.1-7 vctrs_0.5.1
## [73] gtable_0.3.1 purrr_0.3.5 tidyr_1.2.1
## [76] assertthat_0.2.1 cachem_1.0.6 xfun_0.35
## [79] xtable_1.8-4 broom_1.0.1 rstatix_0.7.1
## [82] survival_3.2-13 tibble_3.1.8 AnnotationDbi_1.56.2
## [85] memoise_2.0.1