Last updated: 2024-07-26
Checks: 7 passed, 0 failed
Knit directory: Cinquina_2024/
This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.
The command set.seed(20240320) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version 373fd42. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .Rhistory
Ignored: .Rproj.user/
Ignored: data/SCP1290/
Ignored: data/azimuth_integrated.rds
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were made to the R Markdown (analysis/methods.Rmd) and HTML (docs/methods.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | 373fd42 | Evgenii O. Tretiakov | 2024-07-26 | added MTT analysis figure legends |
Rmd | 285dc11 | Evgenii O. Tretiakov | 2024-07-26 | update theme in Rmds |
html | 5fe5043 | Evgenii O. Tretiakov | 2024-07-26 | fix typos |
Rmd | 25c1972 | Evgenii O. Tretiakov | 2024-07-25 | fix typos |
html | c7313e9 | Evgenii O. Tretiakov | 2024-07-25 | Build site. |
Rmd | 70b65cf | Evgenii O. Tretiakov | 2024-07-25 | workflowr::wflow_publish(here::here("analysis/methods.Rmd"), |
Rmd | 3b3abfe | EugOT | 2024-07-25 | add bibliography files |
Rmd | 78f7fe4 | Evgenii O. Tretiakov | 2024-07-25 | add description |
# Presentation
library(glue)
library(knitr)
# JSON
library(jsonlite)
# Tidyverse
library(tidyverse)
library(Seurat)
library(dabestr)
library(here)
library(RColorBrewer)
library(scCustomize)
library(SeuratData)
library(SeuratWrappers)
library(Azimuth)
library(magrittr)
library(cowplot)
library(patchwork)
dir.create(here::here("output", DOCNAME), showWarnings = FALSE)
write_bib(c("base", "Seurat", "SeuratWrappers", "SeuratData", "sctransform",
"patchwork", "scCustomize", "cowplot", "UpSetR", "gridExtra",
"tidyverse", "dplyr", "tidyr", "magrittr", "stringr", "purrr",
"here", "workflowr", "knitr", "rmarkdown", "dabestr",
"RColorBrewer", "Azimuth", "DESeq2", "ggplot2",
"viridis", "jsonlite", "glue"),
file = here::here("output", DOCNAME, "packages.bib"))
This methods section details the analytical approaches used in our study of astrocyte modulation of neuronal development through S100A6 signaling. We used single-cell RNA sequencing analysis to explore gene expression patterns in the developing mouse cortex, and estimation statistics to analyze MTT assay data measuring astrocyte viability under various conditions. Our approach emphasizes reproducibility, statistical rigor, and comprehensive data visualization.
versions <- list(
Seurat = packageVersion("Seurat"),
dabestr = packageVersion("dabestr"),
tidyverse = packageVersion("tidyverse"),
RColorBrewer = packageVersion("RColorBrewer"),
scCustomize = packageVersion("scCustomize"),
SeuratData = packageVersion("SeuratData"),
SeuratWrappers = packageVersion("SeuratWrappers"),
Azimuth = packageVersion("Azimuth"),
cowplot = packageVersion("cowplot"),
patchwork = packageVersion("patchwork"),
R = R.version.string
)
We analyzed single-cell RNA sequencing data from developing mouse cortex spanning embryonic day (E) 10 to postnatal day (P) 4. The dataset was obtained from Di Bella et al. (2021) and accessed through the Single Cell Portal (SCP1290; Tarhan et al., n.d.). Raw count data and metadata were downloaded and processed using Seurat (v5.1.0; Satija et al. 2015; Stuart and Satija 2019) in R (version 4.4.0, 2024-04-24). We chose Seurat for its comprehensive toolset for quality control, analysis, and exploration of single-cell RNA-seq data, as well as its wide adoption in the field.
The raw count matrix was loaded using the Read10X() function from Seurat. We performed the following preprocessing steps:
1. The log1p normalized matrix was converted back to raw counts by applying expm1().
2. Scaling factors were calculated based on the total UMI counts per cell.
3. The count matrix was scaled by multiplying each cell’s counts by its scaling factor.
4. A new Seurat object was created using the scaled count matrix.
5. Cells annotated as doublets, low quality, or red blood cells were removed using the subset() function.
6. The data were then normalized using the NormalizeData() function, and 5000 highly variable features were identified using FindVariableFeatures().
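The back-conversion described in the first three steps can be illustrated on a toy matrix. This is a minimal sketch in base R, assuming the normalized values were computed as log1p(counts / total UMI × 10,000); the scale target of 10,000, the toy values, and all object names are illustrative assumptions and are not taken from the analysis code.

```r
# Toy genes x cells count matrix (hypothetical values)
counts <- matrix(
  c(3, 0, 7, 1, 4, 5),
  nrow = 3,
  dimnames = list(paste0("gene", 1:3), paste0("cell", 1:2))
)

n_umi <- colSums(counts)  # total UMI counts per cell
scale_to <- 1e4           # assumed normalization target

# Forward direction: log1p normalization of per-cell proportions
norm <- log1p(sweep(counts, 2, n_umi, "/") * scale_to)

# Reverse direction: undo log1p with expm1(), then multiply each
# cell (column) by its scaling factor to recover the counts
scaling_factor <- n_umi / scale_to
recovered <- round(sweep(expm1(norm), 2, scaling_factor, "*"))

print(recovered)
```

Rounding removes the small floating-point error introduced by the log1p/expm1 round trip, so the recovered matrix can be treated as integer counts.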
We performed principal component analysis (PCA) on the variable features using RunPCA(). Based on the elbow plot, which shows the variability explained by each principal component, we selected the first 30 of the 50 computed PCs for downstream analysis. This choice helps reduce noise in the data while retaining the main sources of biological variability.
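The quantity behind an elbow plot can be sketched in base R. The example below uses prcomp() on simulated data rather than Seurat, so the matrix, its dimensions, and the object names are assumptions made for illustration only. Seurat's ElbowPlot() displays the standard deviation of each component; the proportion of variance computed here is derived from those same standard deviations.

```r
set.seed(1)

# Simulated data: 100 cells x 10 features, with extra variance
# injected into the first two features (hypothetical)
x <- matrix(rnorm(100 * 10), nrow = 100)
x[, 1] <- x[, 1] * 5
x[, 2] <- x[, 2] * 3

pca <- prcomp(x, center = TRUE, scale. = FALSE)

# Proportion of variance explained by each principal component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
cum_var <- cumsum(var_explained)

# The "elbow" is where the curve flattens; draw it only in
# interactive sessions
if (interactive()) {
  plot(var_explained, type = "b",
       xlab = "Principal component",
       ylab = "Proportion of variance explained")
}

print(round(cum_var, 3))
```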
Uniform Manifold Approximation and Projection (UMAP; McInnes et al. 2018) and t-distributed Stochastic Neighbor Embedding (t-SNE; Maaten and Hinton 2008; Kobak and Linderman 2021) were used for dimensionality reduction, with embeddings stored in the Seurat object. Both techniques used the selected 30 PCs as input.
Cells were clustered using the FindNeighbors() and FindClusters() functions. For community detection, we employed the Leiden algorithm (resolution = 0.7) instead of the commonly used Louvain algorithm or alternatives such as walktrap, multilevel, or infomap. The Leiden algorithm was chosen because it converges to well-connected partitions more efficiently, which is particularly beneficial for large-scale single-cell datasets (Traag, Waltman, and Eck 2019).
We analyzed the expression of S100 family genes and a curated list of genes of interest across different developmental stages and cell types. Feature plots, violin plots, and dot plots were generated using Seurat’s visualization functions (FeaturePlot(), VlnPlot(), DotPlot()) and custom functions from the scCustomize package (v2.1.2).
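A dot plot summarizes two quantities per gene and group: the fraction of cells with detectable expression and the average expression level. The sketch below computes both in base R for a single hypothetical gene; the values, group labels, and object names are illustrative assumptions. By default, Seurat's DotPlot() additionally scales the averages before plotting, which this sketch does not do.

```r
# Expression of one gene across six cells (hypothetical values)
expr <- c(0, 2, 3, 0, 0, 1)
cell_type <- c("astro", "astro", "astro", "neuron", "neuron", "neuron")

# Percentage of cells per group with non-zero expression (dot size)
pct_expressing <- tapply(expr > 0, cell_type, mean) * 100

# Average expression per group (dot colour)
avg_expression <- tapply(expr, cell_type, mean)

print(data.frame(pct_expressing, avg_expression))
```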
Cells annotated as astrocytes, apical progenitors, and cycling glial cells were subset for focused analysis of the astrocyte lineage. This subset was re-clustered using the same approach as described above. We performed differential expression analysis between astrocyte clusters using both the FindAllMarkers() function in Seurat (using a logistic regression test; Ntranos et al. 2019) and DESeq2 (v1.44.0; Love, Huber, and Anders 2014) on pseudobulk data aggregated by cluster and developmental stage. The combination of these two approaches allows us to leverage the strengths of both single-cell and bulk RNA-seq differential expression methods.
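Pseudobulk aggregation by cluster and developmental stage amounts to summing the counts of all cells that share the same combination of labels. The following base R sketch shows the idea on a toy matrix; the counts, cluster names, stage labels, and object names are hypothetical, and the actual analysis may have aggregated the data differently.

```r
# Toy genes x cells count matrix (hypothetical values)
counts <- matrix(
  1:24,
  nrow = 4,
  dimnames = list(paste0("gene", 1:4), paste0("cell", 1:6))
)

# Per-cell metadata (hypothetical labels)
meta <- data.frame(
  cluster = c("a", "a", "b", "b", "a", "b"),
  stage   = c("E14", "E14", "E14", "P1", "P1", "P1")
)

# One pseudobulk sample per cluster x stage combination
group <- paste(meta$cluster, meta$stage, sep = "_")

# rowsum() sums rows by group, so transpose to cells x genes first
pseudobulk <- t(rowsum(t(counts), group))

print(pseudobulk)
```

The resulting genes × samples matrix has the shape that bulk RNA-seq tools such as DESeq2 expect as count input.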
Two-dimensional UMAP plots were generated using FeaturePlot() with the blend = TRUE option to examine co-expression patterns of key genes. We used custom color palettes and the patchwork package to create composite figures.
MTT assay data measuring astrocyte viability after treatment with eicosapentaenoic acid (EPA) at 5 μM, 10 μM, and 30 μM concentrations were analyzed using the DABEST (Data Analysis using Bootstrap-Coupled ESTimation) package v2023.9.12 in R. The analysis was performed to calculate effect sizes and their confidence intervals using estimation statistics (Ho et al. 2019).
The analysis workflow was as follows:
1. Data were loaded from a TSV file using read_tsv() and reshaped into long format using tidyr::gather().
2. For each EPA concentration (5 μM, 10 μM, and 30 μM), control and treatment groups were compared using the load() function from DABEST. The data were loaded with the minimeta = TRUE argument to enable mini-meta analysis across multiple experimental replicates.
3. Mean differences between EPA-treated and control samples were calculated using the mean_diff() function.
4. 5000 bootstrap resamples were used to generate effect size estimates with 95% confidence intervals. The confidence intervals are bias-corrected and accelerated.
5. Results were visualized using the dabest_plot() function to create Cumming estimation plots.
Additional statistical information, including p-values from permutation t-tests, was also calculated and reported, although the focus of the analysis was on effect sizes and their confidence intervals rather than null hypothesis significance testing.
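The bootstrap logic behind these effect size estimates can be sketched in base R. This is a simplified illustration under stated assumptions: the viability values are invented, the object names are hypothetical, and the interval is a plain percentile interval rather than the bias-corrected and accelerated interval reported in the analysis.

```r
set.seed(1)

# Hypothetical normalized viability readings for one replicate
control <- c(1.00, 0.95, 1.05, 0.98, 1.02, 0.97)
treated <- c(0.80, 0.85, 0.78, 0.90, 0.82, 0.88)

# Observed effect size: mean difference (treated minus control)
observed_diff <- mean(treated) - mean(control)

# Resample each group with replacement and recompute the difference
n_boot <- 5000
boot_diffs <- replicate(n_boot, {
  mean(sample(treated, replace = TRUE)) -
    mean(sample(control, replace = TRUE))
})

# 95% percentile interval (a simpler stand-in for the BCa interval)
ci <- quantile(boot_diffs, probs = c(0.025, 0.975))

print(observed_diff)
print(ci)
```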
MTT assay data of 100 μM glutamate treatment were analyzed the same way.
This approach allows for a comprehensive view of the treatment effects across multiple replicates, taking into account both the magnitude of the effects and the uncertainty in their estimation. The mini-meta analysis provides a summary measure of the overall treatment effect while still preserving information about individual replicates.
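The summary measure of a mini-meta analysis is a weighted average of the per-replicate mean differences. The sketch below assumes that each replicate is weighted by the inverse of its pooled variance; the numbers and object names are hypothetical, and the exact weighting applied by the DABEST package should be checked against its documentation.

```r
# Per-replicate mean differences (hypothetical values)
mean_diffs <- c(-0.15, -0.20, -0.10)

# Pooled variance of each replicate (hypothetical values)
pooled_var <- c(0.010, 0.020, 0.005)

# Assumed inverse-variance weights: more precise replicates count more
weights <- 1 / pooled_var

# Weighted average of the mean differences
weighted_delta <- sum(weights * mean_diffs) / sum(weights)

print(weighted_delta)
```

With these weights, the replicate with the smallest variance pulls the summary estimate most strongly toward its own mean difference.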
Visualisations and figures were primarily created using the ggplot2 (v3.5.1), cowplot (v1.1.3) (Wilke 2024) and patchwork (v1.2.0.9000) packages, using the viridis colour palette for continuous data. UpSet plots (Conway, Lex, and Gehlenborg 2017) were produced using the UpSetR package (Gehlenborg 2019) with help from the gridExtra package (v2.3) (Auguie 2017).
Data manipulation was performed using packages in the tidyverse (v2.0.0.9000) (Wickham 2024), particularly dplyr (v1.1.4) (Wickham et al. 2023), tidyr (v1.3.1) (Wickham, Vaughan, and Girlich 2024) and purrr (v1.0.2) (Wickham and Henry 2023).
The analysis project was managed using the workflowr (v1.7.1) (Blischak, Carbonetto, and Stephens 2023) package, which was also used to produce the publicly available website displaying the analysis code, results and output. Reproducible reports were produced using knitr (v1.47) (Xie 2024) and R Markdown (v2.27) (Allaire et al. 2024).
Our methodological approach combines single-cell RNA sequencing analysis with estimation statistics for experimental data. For the scRNA-seq analysis, we used Seurat with two complementary frameworks for differential gene expression: a logistic regression test tailored to single-cell data, and DESeq2 on pseudobulk data. For the MTT assay analysis, we used DABEST. Estimation statistics and mini-meta analysis allow for a nuanced interpretation of experimental results, while our focus on reproducibility and open science practices allows our findings to be validated and built upon by the scientific community.
All analyses were performed using R version 4.4.0 (2024-04-24). Key packages used include Seurat v5.1.0, patchwork v1.2.0.9000, ggplot2 v3.5.1, dplyr v1.1.4, and DABEST v2023.9.12. Code for the full analysis is available at https://github.com/harkany-lab/Cinquina_2024.
versions <- purrr::map(versions, as.character)
versions <- jsonlite::toJSON(versions, pretty = TRUE)
readr::write_lines(versions,
here::here("output", DOCNAME, "package-versions.json"))
sessionInfo()
R version 4.4.0 (2024-04-24)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 22.04.4 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: Etc/UTC
tzcode source: system (glibc)
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] patchwork_1.2.0.9000 cowplot_1.1.3 magrittr_2.0.3
[4] Azimuth_0.5.0 shinyBS_0.61.1 SeuratWrappers_0.3.5
[7] SeuratData_0.2.2.9001 scCustomize_2.1.2 RColorBrewer_1.1-3
[10] here_1.0.1 dabestr_2023.9.12 Seurat_5.1.0
[13] SeuratObject_5.0.2 sp_2.1-4 lubridate_1.9.3
[16] forcats_1.0.0 stringr_1.5.1 dplyr_1.1.4
[19] purrr_1.0.2 readr_2.1.5 tidyr_1.3.1
[22] tibble_3.2.1 ggplot2_3.5.1 tidyverse_2.0.0.9000
[25] jsonlite_1.8.8 knitr_1.47 glue_1.7.0
[28] workflowr_1.7.1
loaded via a namespace (and not attached):
[1] IRanges_2.38.0 R.methodsS3_1.8.2
[3] vroom_1.6.5 poweRlaw_0.80.0
[5] goftest_1.2-3 DT_0.33
[7] Biostrings_2.72.1 vctrs_0.6.5
[9] spatstat.random_3.2-3 digest_0.6.35
[11] png_0.1-8 shape_1.4.6.1
[13] git2r_0.33.0 ggrepel_0.9.5.9999
[15] deldir_2.0-4 parallelly_1.37.1
[17] MASS_7.3-61 Signac_1.13.9003
[19] reshape2_1.4.4 httpuv_1.6.15
[21] BiocGenerics_0.50.0 withr_3.0.0
[23] ggrastr_1.0.2 xfun_0.45
[25] survival_3.7-0 EnsDb.Hsapiens.v86_2.99.0
[27] memoise_2.0.1 ggbeeswarm_0.7.2
[29] janitor_2.2.0.9000 zoo_1.8-12
[31] GlobalOptions_0.1.2 gtools_3.9.5
[33] pbapply_1.7-2 R.oo_1.26.0
[35] rematch2_2.1.2 KEGGREST_1.44.0
[37] promises_1.3.0 httr_1.4.7
[39] restfulr_0.0.15 globals_0.16.3
[41] fitdistrplus_1.1-11 rhdf5filters_1.16.0
[43] ps_1.7.6 rhdf5_2.48.0
[45] rstudioapi_0.16.0 UCSC.utils_1.0.0
[47] miniUI_0.1.1.1 generics_0.1.3
[49] processx_3.8.4 curl_5.2.1
[51] S4Vectors_0.42.0 zlibbioc_1.50.0
[53] polyclip_1.10-6 GenomeInfoDbData_1.2.12
[55] SparseArray_1.4.8 xtable_1.8-4
[57] pracma_2.4.4 evaluate_0.24.0
[59] S4Arrays_1.4.1 hms_1.1.3
[61] GenomicRanges_1.56.1 irlba_2.3.5.1
[63] colorspace_2.1-0 hdf5r_1.3.10
[65] ROCR_1.0-11 reticulate_1.37.0
[67] spatstat.data_3.0-4 lmtest_0.9-40
[69] snakecase_0.11.1 later_1.3.2
[71] lattice_0.22-6 spatstat.geom_3.2-9
[73] future.apply_1.11.2 getPass_0.2-4
[75] scattermore_1.2 XML_3.99-0.16.1
[77] matrixStats_1.3.0 RcppAnnoy_0.0.22
[79] pillar_1.9.0 nlme_3.1-165
[81] pwalign_1.0.0 caTools_1.18.2
[83] compiler_4.4.0 RSpectra_0.16-1
[85] stringi_1.8.4 tensor_1.5
[87] SummarizedExperiment_1.34.0 GenomicAlignments_1.40.0
[89] plyr_1.8.9 crayon_1.5.2
[91] abind_1.4-5 BiocIO_1.14.0
[93] googledrive_2.1.1 bit_4.0.5
[95] fastmatch_1.1-4 whisker_0.4.1
[97] codetools_0.2-20 bslib_0.7.0
[99] paletteer_1.6.0 plotly_4.10.4
[101] mime_0.12 splines_4.4.0
[103] circlize_0.4.16 Rcpp_1.0.12
[105] fastDummies_1.7.3 cellranger_1.1.0
[107] blob_1.2.4 utf8_1.2.4
[109] seqLogo_1.70.0 AnnotationFilter_1.28.0
[111] fs_1.6.4 listenv_0.9.1
[113] Matrix_1.7-0 callr_3.7.6
[115] tzdb_0.4.0 pkgconfig_2.0.3
[117] tools_4.4.0 cachem_1.1.0
[119] RSQLite_2.3.7 viridisLite_0.4.2
[121] DBI_1.2.3 fastmap_1.2.0
[123] rmarkdown_2.27 scales_1.3.0
[125] grid_4.4.0 ica_1.0-3
[127] shinydashboard_0.7.2 Rsamtools_2.20.0
[129] sass_0.4.9 ggprism_1.0.5
[131] BiocManager_1.30.23 dotCall64_1.1-1
[133] RANN_2.6.1 yaml_2.3.8
[135] MatrixGenerics_1.16.0 rtracklayer_1.64.0
[137] cli_3.6.2 stats4_4.4.0
[139] leiden_0.4.3.1 lifecycle_1.0.4
[141] uwot_0.2.2 Biobase_2.64.0
[143] presto_1.0.0 BSgenome.Hsapiens.UCSC.hg38_1.4.5
[145] BiocParallel_1.38.0 annotate_1.82.0
[147] timechange_0.3.0 gtable_0.3.5
[149] rjson_0.2.21 ggridges_0.5.6
[151] progressr_0.14.0 parallel_4.4.0
[153] RcppHNSW_0.6.0 TFBSTools_1.42.0
[155] bitops_1.0-7 bit64_4.0.5
[157] Rtsne_0.17 spatstat.utils_3.0-5
[159] CNEr_1.40.0 jquerylib_0.1.4
[161] shinyjs_2.1.0 SeuratDisk_0.0.0.9021
[163] R.utils_2.12.3 lazyeval_0.2.2
[165] shiny_1.8.1.1 htmltools_0.5.8.1
[167] GO.db_3.19.1 sctransform_0.4.1
[169] rappdirs_0.3.3 ensembldb_2.28.0
[171] TFMPvalue_0.0.9 spam_2.10-0
[173] googlesheets4_1.1.1 XVector_0.44.0
[175] RCurl_1.98-1.14 rprojroot_2.0.4
[177] BSgenome_1.72.0 gridExtra_2.3
[179] JASPAR2020_0.99.10 igraph_2.0.3
[181] R6_2.5.1 RcppRoll_0.3.0
[183] GenomicFeatures_1.56.0 cluster_2.1.6
[185] Rhdf5lib_1.26.0 gargle_1.5.2
[187] GenomeInfoDb_1.40.1 DirichletMultinomial_1.46.0
[189] DelayedArray_0.30.1 tidyselect_1.2.1
[191] vipor_0.4.7 ProtGenerics_1.36.0
[193] AnnotationDbi_1.66.0 future_1.33.2
[195] rsvd_1.0.5 munsell_0.5.1
[197] KernSmooth_2.23-24 data.table_1.15.4
[199] htmlwidgets_1.6.4 rlang_1.1.4
[201] spatstat.sparse_3.0-3 spatstat.explore_3.2-7
[203] remotes_2.5.0 fansi_1.0.6
[205] beeswarm_0.4.0