There are a number of statistical tests/R packages one can use to perform differential abundance testing for proteomics data. The list below is by no means complete:

Load dependencies

Load the required libraries.

library(camprotR)
library(ggplot2)
library(MSnbase)
library(DEqMS)
library(limma)
library(dplyr)
library(tidyr)

Here, we will use the SILAC data processed in Processing and QC of SILAC data. Please see the previous notebook for details of the experimental design and aim and data processing.

In brief, we wish to determine the proteins which are significantly enriched by UV crosslinking (CL) vs non-CL control (NC).

pep_res <- readRDS('results/pep_res.rds')
prot_res <- readRDS('results/prot_res.rds')

Differential abundance testing

To identify proteins with significantly increased abundance in CL vs NC, we will use DEqMS (Zhu et al. 2020), which you can think of as an extension of limma (Ritchie et al. 2015) specifically for proteomics.

The analysis steps are taken from the DEqMS vignette. The idea is that we want to create an MArrayLM object as per normal limma analysis and then add a $count column to the MArrayLM object and then use the spectraCounteBayes function to perform the Bayesian shrinkage using the count column (which describes the number of pepitdes per protein) rather than the $Amean column (which describes the mean CL:NC ratio).

First, we create the MArrayLM object. We filter out proteins with too few quantification values across the 4 replicates.

dat <- prot_res %>% 
  filterNA(pNA = 0.5) %>% # select proteins in min. 2/4 reps
  exprs()

# Perfoming the equivalent of a one-sample t-test, so the design matrix is just an intercept
design <- cbind(Intercept = rep(1, ncol(dat))) 

fit <- lmFit(dat, design)
efit <- eBayes(fit)

Next, we define the $count column, which DEqMS will use. In the DEqMS paper, they suggest that the best summarisation metric to use is the minimum value across the samples, so our count columns is the minimum number of peptides per protein.

protein_ratio_long <- prot_res %>%
  exprs() %>%
  data.frame() %>%
  tibble::rownames_to_column('Master.Protein.Accessions') %>%
  pivot_longer(cols=-Master.Protein.Accessions, values_to='protein_ratio', names_to='sample')

# Obtain the min peptide count across the samples and determine the minimum value across 
# samples
min_pep_count <- camprotR::count_features_per_protein(pep_res) %>%
  merge(protein_ratio_long, by=c('Master.Protein.Accessions', 'sample')) %>%
  filter(is.finite(protein_ratio)) %>%  # We only want to consider samples with a ratio quantified
  group_by(Master.Protein.Accessions) %>%
  summarise(min_pep_count = min(n))

# add the min peptide count
efit$count <- min_pep_count %>% 
  filter(Master.Protein.Accessions %in% rownames(efit$coefficients)) %>% 
  pull(min_pep_count)

And now we run spectraCounteBayes from DEqMS to perform the statistical test.

# run DEqMS
efit_deqms <- suppressWarnings(spectraCounteBayes(efit))

Below, we inspect the peptide count vs variance relationship which DEqMS is using in the statistical test.

In this case the relationship between peptide count and variance is only really apparent when the minimum number of peptides from which a protein-level ratio is obtained is very low. Typically, one might remove quantifications where there is just a single peptide for a protein. Here, have keep them, and we can refer to the count column in the final results object if we want to check the minimum number of peptides observed per sample.

# Diagnostic plots
VarianceBoxplot(efit_deqms, n = 30, xlab = "Peptides")

Below, we summarise the number of proteins with statistically different abundance in CL vs NC and plot a ‘volcano’ plot to visualise this. Note that all proteins with a statistically significant change are increased in CL, so the plot looks more like a fire-hose than a volcano!

deqms_results <- outputResult(efit_deqms, coef_col = 1)

table(ifelse(deqms_results$sca.adj.pval < 0.01, 'sig.', 'not sig.'),
      ifelse(deqms_results$logFC > 0, 'Higher in CL', 'Lower in CL'))
#>           
#>            Higher in CL Lower in CL
#>   not sig.          409          77
#>   sig.              246           0

deqms_results %>%
  ggplot(aes(x = logFC, y = -log10(sca.P.Value), colour = sca.adj.pval < 0.01)) +
  geom_point() +
  theme_camprot() +
  scale_colour_manual(values = c('grey', get_cat_palette(2)[2]), name = 'CL vs NC Sig.') +
  labs(x = 'CL vs NC (Log2)', y = '-log10(p-value)')

Finally, we can explore and, if we wish, export our results. Importantly, the $t, $P.Value and $adj.P.Val columnts are from limma. The columns prefixed with sca are the from DEqMS.

head(deqms_results)
#>           logFC  AveExpr        t      P.Value    adj.P.Val        B   gene
#> P15880 5.843093 5.843093 28.13353 2.300892e-07 0.0001590016 7.528938 P15880
#> P22626 6.561859 6.561859 25.16140 4.344306e-07 0.0001590016 7.084582 P22626
#> Q13148 4.920545 4.920545 22.69799 7.802209e-07 0.0001607499 6.640967 Q13148
#> P43243 4.424089 4.424089 18.70247 2.336893e-06 0.0001607499 5.728685 P43243
#> P23588 5.250793 5.250793 20.13712 1.538278e-06 0.0001607499 6.088209 P23588
#> P84103 5.857123 5.857123 20.66152 1.329806e-06 0.0001607499 6.210104 P84103
#>        count    sca.t  sca.P.Value sca.adj.pval
#> P15880     4 35.89893 4.910590e-08 3.594552e-05
#> P22626     7 29.58960 1.498581e-07 5.484808e-05
#> Q13148     1 24.37400 4.578182e-07 7.341608e-05
#> P43243     7 22.91838 6.522450e-07 7.341608e-05
#> P23588     3 22.86445 6.611324e-07 7.341608e-05
#> P84103     1 21.57023 9.237939e-07 7.341608e-05
Ritchie, Matthew E., Belinda Phipson, Di Wu, Yifang Hu, Charity W. Law, Wei Shi, and Gordon K. Smyth. 2015. “Limma Powers Differential Expression Analyses for RNA-Sequencing and Microarray Studies.” Nucleic Acids Research 43 (7): e47. https://doi.org/10.1093/nar/gkv007.
Zhu, Yafeng, Lukas M. Orre, Yan Zhou Tran, Georgios Mermelekas, Henrik J. Johansson, Alina Malyutina, Simon Anders, and Janne Lehtiö. 2020. “DEqMS: A Method for Accurate Variance Estimation in Differential Protein Expression Analysis.” Molecular & Cellular Proteomics : MCP 19 (6): 1047–57. https://doi.org/10.1074/mcp.TIR119.001646.