Label-free Quantification Proteomics

Preamble
Input data
- Parse PeptideGroups.txt file
Convert to MSnSet
QC peptides
Missing values
Normalise peptide intensities
Summarising to protein-level abundance
Summarising to protein-level abundances
- Optional extra sections on normalisation and summarisation

Preamble

Label-Free Quantification (LFQ) is the simplest form of quantitative proteomics, in which different samples are quantified in separate MS runs.

Since each sample is run separately, different peptides will be quantified in each sample, and the peptide intensities may not be directly comparable between samples. This leads to a higher burden of missing values. One solution is ‘match-between-runs’ (Cox et al. 2014), or the functionally equivalent ‘Minora’ algorithm employed by Proteome Discoverer (PD). These algorithms use the observed retention times of MS1 ions that were successfully spectrum matched in one sample to identify the likely peptide sequence of MS1 ions that could not be spectrum matched in another sample.
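
To make the idea concrete, here is a minimal sketch of retention-time-based matching in base R. It is not how MaxQuant or Minora are implemented: the peptide labels, m/z values, retention times, tolerances and the helper name match_between_runs are all invented for illustration, and real implementations typically also align retention times between runs before matching.

```r
# Toy MS1 features; every value here is invented for illustration.
# Features in run A that were identified by an MS2 spectrum match
run_a <- data.frame(
  sequence = c("PEPTIDER", "SAMPLEK"),
  mz = c(823.930, 643.340),
  rt = c(65.7, 57.8),
  stringsAsFactors = FALSE
)

# Features in run B that were detected at MS1 but not identified
run_b <- data.frame(
  mz = c(823.931, 500.250, 643.341),
  rt = c(65.9, 60.0, 61.5)
)

# Hypothetical helper: transfer an identification when exactly one
# identified feature lies within both the m/z and retention time tolerance
match_between_runs <- function(identified, unidentified,
                               ppm_tol = 10, rt_tol = 0.5) {
  vapply(seq_len(nrow(unidentified)), function(i) {
    ppm_error <- abs(identified$mz - unidentified$mz[i]) / identified$mz * 1e6
    rt_error <- abs(identified$rt - unidentified$rt[i])
    hit <- which(ppm_error <= ppm_tol & rt_error <= rt_tol)
    if (length(hit) == 1) identified$sequence[hit] else NA_character_
  }, character(1))
}

run_b$sequence <- match_between_runs(run_a, run_b)
run_b
```

In this toy example, only the first run B feature receives a sequence. The third feature has a matching m/z but elutes too far from the identified peptide, which shows why the retention time is needed in addition to the mass.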

Despite the pitfalls of LFQ, the data analysis is still relatively straightforward, though there are steps that need some careful consideration and quality control assessment.

Load dependencies

Load the required libraries.

library(ggplot2)
library(MSnbase)
library(biobroom)
library(camprotR)
library(Proteomics.analysis.data)
library(dplyr)
library(tidyr)

Input data

We start with the peptide-level output from PD. We recommend starting from peptide-level PD output for LFQ data, as this will allow you to perform QC at the peptide level and then summarise to protein-level abundance in a more appropriate manner than PD does by default, which is to simply sum all PSMs passing filters.

The data we are using are from an experiment designed to identify RNA-binding proteins (RBPs) in the U-2 OS cell line using the OOPS method (Queiroz et al. 2019). A comparison of RNase +/- is used to separate RBPs from background non-specific proteins. Four replicate experiments were performed, with the RNase +/- experiments performed from the same OOPS interface. For each LFQ run, approximately the same quantity of peptides was injected, based on quantification of peptide concentration post trypsin digestion. These data have not been published, but the aim of the experiment is equivalent to Figure 2e in the original OOPS paper.

The data we will use is available through the Proteomics.analysis.data package.

pep_data <- read.delim(
  system.file("extdata", 'OOPS_RNase_LFQ', 'LFQ_OOPS_RNase_PeptideGroups.txt',
              package = "Proteomics.analysis.data"))

We can explore the structure of the input data using str. We see that we have a data.frame with 7808 rows and 41 columns. The most important columns to us are:

Sequence: The sequence of the peptide
Modifications: The detected peptide modifications, including variable modifications, e.g. induced modifications such as oxidation
Master.Protein.Accessions: The assigned master protein(s)
Abundance.F*.Sample: Columns with the peptide intensities

str(pep_data)
#> 'data.frame':    7808 obs. of  41 variables:
#>  $ Peptide.Groups.Peptide.Group.ID                      : int  3 103128 103062 103059 103056 103049 103048 103045 102936 102916 ...
#>  $ Checked                                              : chr  "False" "False" "False" "False" ...
#>  $ Confidence                                           : chr  "High" "High" "High" "High" ...
#>  $ Sequence                                             : chr  "AAAAAAAAAAAAAAAGAGAGAK" "LVKPGNQNTQVTEAWNK" "LVGSVNLFSDENVPR" "LVGSQEELASWGHEYVR" ...
#>  $ Modifications                                        : chr  "" "" "" "" ...
#>  $ Qvality.PEP                                          : num  4.27e-07 8.31e-04 1.36e-03 7.92e-04 1.87e-06 ...
#>  $ Qvality.q.value                                      : num  0.00026 0.00026 0.00026 0.00026 0.00026 ...
#>  $ Number.of.Protein.Groups                             : int  1 1 1 1 1 1 1 1 1 1 ...
#>  $ Number.of.Proteins                                   : int  1 1 1 1 1 1 1 1 1 1 ...
#>  $ Number.of.PSMs                                       : int  8 8 4 8 16 7 26 12 10 11 ...
#>  $ Master.Protein.Accessions                            : chr  "P55011" "Q9UQ80" "Q9BQG0" "Q13200" ...
#>  $ Protein.Accessions                                   : chr  "P55011" "Q9UQ80" "Q9BQG0" "Q13200" ...
#>  $ Number.of.Missed.Cleavages                           : int  0 0 0 0 0 1 0 0 0 1 ...
#>  $ Theo.MHplus.in.Da                                    : num  1597 1927 1646 1960 1515 ...
#>  $ Abundance.F17.Sample                                 : num  1927597 3768782 NA NA 6090638 ...
#>  $ Abundance.F18.Sample                                 : num  3540084 6682988 NA NA 12042683 ...
#>  $ Abundance.F19.Sample                                 : num  3466165 14879391 NA NA 19815360 ...
#>  $ Abundance.F20.Sample                                 : num  789214 1826370 NA NA 3435918 ...
#>  $ Abundance.F21.Sample                                 : num  1425166 NA NA NA 1736376 ...
#>  $ Abundance.F22.Sample                                 : num  2515468 1258701 NA NA 2838919 ...
#>  $ Abundance.F23.Sample                                 : num  2426854 1511282 NA NA 3298768 ...
#>  $ Abundance.F24.Sample                                 : num  1370120 819806 NA NA 1711976 ...
#>  $ Quan.Info                                            : chr  "" "" "NoQuanValues" "NoQuanValues" ...
#>  $ Found.in.Sample.in.S17.F17.Sample                    : chr  "High" "High" "High" "High" ...
#>  $ Found.in.Sample.in.S18.F18.Sample                    : chr  "High" "High" "High" "High" ...
#>  $ Found.in.Sample.in.S19.F19.Sample                    : chr  "High" "High" "Not Found" "High" ...
#>  $ Found.in.Sample.in.S20.F20.Sample                    : chr  "High" "High" "Not Found" "Not Found" ...
#>  $ Found.in.Sample.in.S21.F21.Sample                    : chr  "High" "Not Found" "Not Found" "Not Found" ...
#>  $ Found.in.Sample.in.S22.F22.Sample                    : chr  "High" "High" "Not Found" "Not Found" ...
#>  $ Found.in.Sample.in.S23.F23.Sample                    : chr  "High" "Peak Found" "Not Found" "High" ...
#>  $ Found.in.Sample.in.S24.F24.Sample                    : chr  "High" "Peak Found" "Not Found" "Not Found" ...
#>  $ Confidence.by.Search.Engine.MS.Amanda.20             : chr  "n/a" "High" "High" "High" ...
#>  $ Confidence.by.Search.Engine.Sequest.HT               : chr  "High" "High" "High" "High" ...
#>  $ Percolator.q.Value.by.Search.Engine.MS.Amanda.20     : num  NA 0.000358 0.000434 0.000347 0.000434 ...
#>  $ Percolator.q.Value.by.Search.Engine.Sequest.HT       : num  0.00041 0.000469 0.000457 0.00041 0.000469 ...
#>  $ Percolator.PEP.by.Search.Engine.MS.Amanda.20         : num  NA 3.76e-04 6.54e-04 3.47e-04 1.06e-06 ...
#>  $ Percolator.PEP.by.Search.Engine.Sequest.HT           : num  3.16e-07 3.91e-04 3.71e-04 1.21e-03 1.69e-06 ...
#>  $ Amanda.Score.by.Search.Engine.MS.Amanda.20           : num  NA 45.4 81.1 65.3 132.2 ...
#>  $ CharmeRT.Combined.Score.by.Search.Engine.MS.Amanda.20: num  NA 45.4 81.1 65.3 132.2 ...
#>  $ XCorr.by.Search.Engine.Sequest.HT                    : num  5.77 3 4.34 4.37 3.85 5.92 3.09 4.3 2.92 3.58 ...
#>  $ Top.Apex.RT.in.min                                   : num  65.7 57.8 NA NA 66.6 ...

Discussion 1

Examine the column names and answer the following:

How do the numerical values in the Abundance.F*.Sample columns relate to the conditions (RNase +/-)?

Solution

# It's not possible to tell!

Solution end

The InputFiles.txt file from PD can be used to determine the relationship between the Abundance.F*.Sample columns and the samples. If you are in any doubt, you can also check with the Proteomics Facility.

The accompanying LFQ_OOPS_RNase_InputFiles.txt file is also available through the Proteomics.analysis.data package.

inputfiles <- read.delim(
  system.file("extdata", 'OOPS_RNase_LFQ', 'LFQ_OOPS_RNase_InputFiles.txt',
              package = "Proteomics.analysis.data"))
print(inputfiles %>% filter(Study.File.ID!=''))
#>   Input.Files.Workflow.ID Input.Files.Workflow.Level Input.Files. Study.File.ID
#> 1                    -177                          1           55           F17
#> 2                    -180                          1           40           F18
#> 3                    -183                          1           42           F19
#> 4                    -186                          1           44           F20
#> 5                    -189                          1           46           F21
#> 6                    -192                          1           48           F22
#> 7                    -195                          1           50           F23
#> 8                    -198                          1           52           F24
#>                                                                File.Name
#> 1      Z:\\RAW\\rq214\\LFQ_RNase_test_Jan2020\\R_neg_MetChloroform_1.raw
#> 2      Z:\\RAW\\rq214\\LFQ_RNase_test_Jan2020\\R_neg_MetChloroform_2.raw
#> 3      Z:\\RAW\\rq214\\LFQ_RNase_test_Jan2020\\R_neg_MetChloroform_3.raw
#> 4      Z:\\RAW\\rq214\\LFQ_RNase_test_Jan2020\\R_neg_MetChloroform_4.raw
#> 5 Z:\\RAW\\rq214\\LFQ_RNase_test_Jan2020\\R_positive_MetChloroform_1.raw
#> 6 Z:\\RAW\\rq214\\LFQ_RNase_test_Jan2020\\R_positive_MetChloroform_2.raw
#> 7 Z:\\RAW\\rq214\\LFQ_RNase_test_Jan2020\\R_positive_MetChloroform_3.raw
#> 8 Z:\\RAW\\rq214\\LFQ_RNase_test_Jan2020\\R_positive_MetChloroform_4.raw
#>           Creation.Date RT.Range.in.min  Instrument.Name Software.Revision
#> 1  2/17/2020 2:02:44 AM   40.00 - 90.00 Orbitrap Eclipse       3.3.2782.34
#> 2  2/17/2020 3:34:26 AM   40.00 - 90.00 Orbitrap Eclipse       3.3.2782.34
#> 3  2/17/2020 5:06:10 AM   40.00 - 90.00 Orbitrap Eclipse       3.3.2782.34
#> 4  2/17/2020 6:37:53 AM   40.00 - 90.00 Orbitrap Eclipse       3.3.2782.34
#> 5 2/17/2020 11:42:55 AM   40.00 - 90.00 Orbitrap Eclipse       3.3.2782.34
#> 6  2/17/2020 1:14:33 PM   40.00 - 90.00 Orbitrap Eclipse       3.3.2782.34
#> 7  2/17/2020 2:46:16 PM   40.00 - 90.00 Orbitrap Eclipse       3.3.2782.34
#> 8  2/17/2020 4:17:54 PM   40.00 - 90.01 Orbitrap Eclipse       3.3.2782.34

Here, we see that F17-20 are ‘R_neg’, i.e. RNase -, and F21-24 are ‘R_positive’, i.e. RNase +.

We could parse the InputFiles.txt file to generate the sample information, but for this experiment, it’s small enough that it can be easily generated manually.

sample_data <- data.frame(
  File = paste0("F", 17:24),
  Sample = paste0(rep(c("RNase_neg", "RNase_pos"), each = 4), ".", 1:4),
  Condition = rep(c("RNase_neg", "RNase_pos"), each = 4),
  Replicate = rep(1:4, 2)
)

# Displaying the table in a nicer format
knitr::kable(sample_data,
             align = "cccc",
             format = "html",
             table.attr = "style='width:30%;'")

File	Sample	Condition	Replicate
F17	RNase_neg.1	RNase_neg	1
F18	RNase_neg.2	RNase_neg	2
F19	RNase_neg.3	RNase_neg	3
F20	RNase_neg.4	RNase_neg	4
F21	RNase_pos.1	RNase_pos	1
F22	RNase_pos.2	RNase_pos	2
F23	RNase_pos.3	RNase_pos	3
F24	RNase_pos.4	RNase_pos	4
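
For larger experiments, typing the sample table by hand becomes error-prone. The following is a minimal sketch of deriving it from the file names instead, using base R only. It assumes the file names follow the ‘R_neg’/‘R_positive’ pattern shown in the InputFiles.txt output above; only four of the eight files are included to keep the example short, and the object names are illustrative.

```r
# A subset of the Study.File.ID / File.Name pairs shown in InputFiles.txt
file_ids <- c("F17", "F18", "F21", "F22")
file_names <- c(
  "Z:\\RAW\\rq214\\LFQ_RNase_test_Jan2020\\R_neg_MetChloroform_1.raw",
  "Z:\\RAW\\rq214\\LFQ_RNase_test_Jan2020\\R_neg_MetChloroform_2.raw",
  "Z:\\RAW\\rq214\\LFQ_RNase_test_Jan2020\\R_positive_MetChloroform_1.raw",
  "Z:\\RAW\\rq214\\LFQ_RNase_test_Jan2020\\R_positive_MetChloroform_2.raw"
)

# Capture the RNase status and the replicate number from each file name
pattern <- "^.*R_(neg|positive)_MetChloroform_([0-9]+)\\.raw$"
rnase <- sub(pattern, "\\1", file_names)
replicate <- as.integer(sub(pattern, "\\2", file_names))

# Recode to the condition labels used in the manually built table
condition <- ifelse(rnase == "neg", "RNase_neg", "RNase_pos")

sample_data_parsed <- data.frame(
  File = file_ids,
  Sample = paste0(condition, ".", replicate),
  Condition = condition,
  Replicate = replicate,
  stringsAsFactors = FALSE
)
sample_data_parsed
```

Whichever route is taken, it is worth checking the resulting table against the InputFiles.txt output by eye before proceeding.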

Parse PeptideGroups.txt file

To simplify the initial filtering, we will use camprotR::parse_features. This function takes the data we have read in and removes contaminant proteins and features without quantification data. Contaminant proteins were defined using the cRAP database and provided to PD. We need to obtain their accessions and provide these to camprotR::parse_features. Below, we parse the cRAP FASTA to extract the IDs for the cRAP proteins, both in ‘cRAP’ format and as UniProt accessions.

The file we will use, cRAP_20190401.fasta.gz, is again available through the Proteomics.analysis.data package.


crap_fasta_inf <- system.file(
  "extdata", "cRAP_20190401.fasta.gz",
  package = "Proteomics.analysis.data"
)

# Load the cRAP FASTA used for the PD search
crap_fasta <- Biostrings::fasta.index(crap_fasta_inf, seqtype = "AA")

# Extract the UniProt accessions associated with each cRAP protein
crap_accessions <- crap_fasta %>%
  pull(desc) %>%
  stringr::str_extract_all(pattern="(?<=\\|).*?(?=\\|)") %>%
  unlist()
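
To see what the regular expression does, the sketch below applies the same pattern to a single header using base R. The header is a hypothetical example of a pipe-delimited FASTA description line, assumed here to hold a cRAP ID followed by a UniProt accession; check the headers in your own FASTA file, since the layout may differ.

```r
# Hypothetical header: database | cRAP ID | UniProt accession | entry name
header <- "sp|cRAP001|P00330|ADH1_YEAST Alcohol dehydrogenase 1"

# Same pattern as above: the shortest run of characters that is both
# preceded and followed by a pipe. The lookarounds do not consume the
# pipes, so consecutive pipe-delimited fields are all returned.
ids <- regmatches(
  header,
  gregexpr("(?<=\\|).*?(?=\\|)", header, perl = TRUE)
)[[1]]
ids
```

Both fields between pipes are captured, which is why the resulting vector of accessions contains the ‘cRAP’ format IDs as well as the UniProt accessions.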

We can then supply these cRAP protein IDs to camprotR::parse_features() which will remove features (i.e. peptides in this case) which may originate from contaminants, as well as features which don’t have a unique master protein.

See ?parse_features for further details, including the removal of ‘associated cRAP’ proteins for more conservative contaminant removal.

pep_data_flt <- camprotR::parse_features(
  pep_data,
  level = 'peptide',
  crap_proteins = crap_accessions
)
#> Parsing features...
#> 7808 features found from 1463 master proteins => Input
#> 242 cRAP proteins supplied
#> 364 proteins identified as 'cRAP associated'
#> 7509 features found from 1414 master proteins => cRAP features removed
#> 7476 features found from 1396 master proteins => associated cRAP features removed
#> 7471 features found from 1395 master proteins => features without a master protein removed
#> 7250 features found from 1307 master proteins => features with non-unique master proteins removed

From the above, we can see that we have started with 7808 ‘features’ (peptides) from 1463 master proteins across all samples. After removal of contaminants and peptides that can’t be assigned to a unique master protein, we have 7250 peptides remaining from 1307 master proteins.

Convert to MSnSet

We now store the filtered peptide data in an MSnSet, the standard data object for proteomics in R.

This object contains 3 elements:

A quantification data matrix (rows=features, e.g. peptides/proteins, columns=samples)
Feature data (rows=features, columns=feature annotations, e.g. peptide master protein assignment)
Experimental details (rows=samples, columns=experimental details, e.g. treatment)

See the vignette("msnset", package="camprotR") for more details.
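
The three elements must line up with one another: the row names of the quantification matrix must match the row names of the feature data, and its column names must match the row names of the experimental details. The base R sketch below illustrates this with toy tables. The peptide names, protein labels and intensities are invented, and no MSnSet is created here.

```r
# Toy quantification matrix: 3 peptides (rows) x 2 samples (columns)
quant <- matrix(
  c(1.2e6, NA, 3.4e6,
    2.2e6, 5.0e5, 3.1e6),
  nrow = 3,
  dimnames = list(c("pep1", "pep2", "pep3"),
                  c("RNase_neg.1", "RNase_pos.1"))
)

# Feature data: one row per peptide, in the same order as the matrix rows
feature_info <- data.frame(
  Master.Protein.Accessions = c("protA", "protA", "protB"),
  row.names = rownames(quant)
)

# Experimental details: one row per sample, in the same order as the columns
sample_info <- data.frame(
  Condition = c("RNase_neg", "RNase_pos"),
  row.names = colnames(quant)
)

# The alignment that the three tables need to satisfy
rows_aligned <- identical(rownames(quant), rownames(feature_info))
cols_aligned <- identical(colnames(quant), rownames(sample_info))
c(rows_aligned = rows_aligned, cols_aligned = cols_aligned)
```

Keeping the three tables in one object means that subsetting peptides or samples keeps the quantification values and their annotations in sync.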

# Create expression matrix with peptide abundances (exprs) and
# human readable column names
exprs_data <- pep_data_flt %>%
  select(matches("Abundance")) %>%
  setNames(sample_data$Sample) %>%
  as.matrix()

# Create data.frame with sample metadata (pData)
pheno_data <- sample_data %>%
  select(-File) %>%
  tibble::column_to_rownames(var = "Sample")

# Create data.frame with peptide metadata (fData)
feature_data <- pep_data_flt %>%
  select(-matches("Abundance"))

# Create MSnSet
pep <- MSnbase::MSnSet(exprs = exprs_data,
                       fData = feature_data,
                       pData = pheno_data)

QC peptides

First of all, we want to inspect the peptide intensity distributions.

Exercise 1

Plot the distributions of intensities for each sample. What do you conclude?

Hints:

You can access the assay data using exprs(pep)

You will need to log-transform the abundances to make them interpretable

Solution

log(pep, base=2) %>% exprs() %>% boxplot()

Solution end

The above code gives us a crude representation. We could make this prettier, but camprotR already has a function to explore the quantification distributions, plot_quant, which we can use instead. Below, we plot boxplots and density plots to inspect the abundance distributions.

pep %>%
  log(base = 2) %>%
  camprotR::plot_quant(method = 'box')
#> Warning: Removed 19289 rows containing non-finite values (stat_boxplot).

Peptide intensities


pep %>%
  log(base = 2) %>%
  camprotR::plot_quant(method = 'density')
#> Warning: Removed 19289 rows containing non-finite values (stat_density).

Peptide intensities

We expect these to be approximately equal and any very low intensity sample would be a concern that would need to be further explored. Here, we can see that there is some clear variability, but no sample with very low intensity.

Missing values

Next, we consider the missing values, using MSnbase::plotNA to give us a quick graphical overview. This function shows the number of features with each level of data completeness (‘Individual features’) and the overall proportion of missing values in the dataset (Full dataset). The number of features with an acceptable level of missingness (specified by pNA) is also highlighted on the plot.

Note that MSnbase::plotNA assumes the object contains protein-level data and names the x-axis accordingly. Here, we update the plot aesthetics and rename the x-axis.

p <- MSnbase::plotNA(pep, pNA = 0) +
  camprotR::theme_camprot(border = FALSE, base_family = 'sans', base_size = 10) +
  labs(x = 'Peptide index')

print(p)

Peptide-level data completeness

Exercise 2

We have used MSnbase::plotNA to assess the missing values, but it’s straightforward to do this ourselves directly from the pep object.

How many values are missing in total?

What fraction of values are missing?

How many missing values are there in each sample?

How many peptides have no missing values?

Hint: You can use is.na directly on the MSnSet and it is equivalent to calling is.na(exprs(obj))

Solution


sum(is.na(pep)) #1
#> [1] 19289
mean(is.na(pep)) #2
#> [1] 0.332569
colSums(is.na(pep)) #3
#> RNase_neg.1 RNase_neg.2 RNase_neg.3 RNase_neg.4 RNase_pos.1 RNase_pos.2 
#>        1901        2001        1897        3542        2580        2504 
#> RNase_pos.3 RNase_pos.4 
#>        2098        2766
sum(rowSums(is.na(pep))==0) #4
#> [1] 2085

Solution end

So, from the 7250 peptides, just 2085 have quantification values in all 8 samples. This is not a surprise for LFQ, since each sample is prepared and run separately.

We can also explore the structure of the missing values further using an ‘upset’ plot. Here, we use the naniar package for this.

missing_data <- pep %>%
  exprs() %>%
  data.frame()

naniar::gg_miss_upset(missing_data,
                      sets = paste0(colnames(pep), '_NA'),
                      keep.order = TRUE,
                      nsets = 10)

Missing values upset plot

So in this case, we can see that the most common missing value patterns are:

Missing in just RNase negative replicate 4
Missing in all samples
Missing in all the other samples, except RNase negative replicate 4

RNase negative replicate 4 had slightly lower overall peptide intensities and appears to be somewhat of an outlier. In this case, we will retain the sample but in other cases, this may warrant further exploration and potentially removal of a sample.
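
The same pattern counts can be obtained without a plot. The sketch below tabulates the missing value patterns for a toy matrix in base R; the values and names are invented for illustration. With the real data, the matrix would be exprs(pep).

```r
# Toy log2 intensities: 5 peptides (rows) x 3 samples (columns)
m <- matrix(
  c( 1,  2, NA,
     1, NA, NA,
     3,  4,  5,
     2,  2, NA,
    NA, NA, NA),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("pep", 1:5), c("S1", "S2", "S3"))
)

# For each peptide, record which samples are missing
missing_in <- apply(is.na(m), 1, function(x) {
  paste(colnames(m)[x], collapse = "+")
})
missing_in[missing_in == ""] <- "none"

# Count how often each pattern occurs, most common first
pattern_counts <- sort(table(missing_in), decreasing = TRUE)
pattern_counts
```

Here the most common pattern is a value missing only in S3, which is the kind of signal that would point to a single problematic sample.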

Normalise peptide intensities

We don’t have internal benchmark proteins we can normalise against, so we will only be able to assess protein abundances relative to the total protein present in each sample. Since we injected the same quantity of peptides for each sample, your intuition may be that there should be no reason to normalise and that doing so risks removing true biological variance.

Discussion 2

What technical explanations can you think of which would explain the differences in intensity?

Solution

# 1. Incorrect total peptide quantification leading to under/over-injection of peptides
# 2. Differences between separate MS runs (especially when lots of samples are being processed)

Solution end

The technical explanations are compelling. Furthermore, we can only assess relative quantification since we have unknown losses of material in the sample processing. Thus, it’s reasonable to normalise the abundances.

Here we will apply median normalisation such that all column (sample) medians match the grand median. In MSnbase::normalise, this is called diff.median. Since the peptide intensities are log-Gaussian distributed, we log₂-transform them before performing the normalisation.
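
As a worked illustration of what matching the column medians to the grand median means, here is a minimal base R sketch on a toy matrix of log₂ intensities. It follows the description above rather than the MSnbase source code, and the values are invented.

```r
# Toy log2 intensities: 3 peptides (rows) x 3 samples (columns)
log_int <- matrix(
  c(20, 22, 24,   # S1
    21, 23, NA,   # S2
    18, 19, 20),  # S3
  nrow = 3,
  dimnames = list(paste0("pep", 1:3), paste0("S", 1:3))
)

# Median over all values, and the median of each sample
grand_median <- median(log_int, na.rm = TRUE)
col_medians <- apply(log_int, 2, median, na.rm = TRUE)

# Shift each sample by the difference between its median and the grand median
log_int_norm <- sweep(log_int, 2, col_medians - grand_median)

# After normalisation, every sample median equals the grand median
norm_medians <- apply(log_int_norm, 2, median, na.rm = TRUE)
norm_medians
```

Each sample is shifted by a single constant on the log scale, so the differences between peptides within a sample are unchanged.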

Median normalisation is a relatively naive form of normalisation, since we are only applying a transformation using a single correction factor for each sample. This is most likely to be appropriate when the samples being compared are similar to one another. Arguably, in this case our samples are more distinct, since we are comparing OOPS samples +/- RNase, and we could explore using a more sophisticated normalisation such as Variance Stabilising Normalisation (VSN). For a more complete discussion of proteomics normalisation, see Välikangas, Suomi, and Elo (2018). However, bear in mind that this paper applies the normalisations to protein-level abundances.

pep_norm <- pep %>%
  log(base = 2) %>%
  MSnbase::normalise('diff.median')  

pep_norm %>%
  camprotR::plot_quant(method = 'density')
#> Warning: Removed 19289 rows containing non-finite values (stat_density).

Peptide intensities post-normalisation

Summarising to protein-level abundance

Before we can summarise to protein-level abundances, we need to exclude peptides with too many missing values. Here, peptides with more than 4/8 missing values are discarded, using MSnbase::filterNA(). We also need to remove proteins without at least three peptides. We will use camprotR::restrict_features_per_protein(), which will replace quantification values with NA if the sample does not have three quantified peptides for a given protein. Note that this means we have to repeat the filtering process since we are adding missing values.

pep_restricted <- pep_norm %>%
  # Maximum 4/8 missing values
  MSnbase::filterNA(pNA = 4/8) %>%

  # At least three quantified peptides per protein in each sample
  camprotR::restrict_features_per_protein(min_features = 3, plot = FALSE) %>%

  # Repeat the filtering since restrict_features_per_protein will replace some values with NA
  MSnbase::filterNA(pNA = 4/8) %>%

  camprotR::restrict_features_per_protein(min_features = 3, plot = FALSE)
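
To illustrate why the filtering has to be repeated, here is a minimal base R sketch of the restriction step as described above. The helper restrict_sketch is hypothetical and is not the camprotR implementation. The data are invented, and min_features is set to 2 only to keep the toy example small.

```r
# Toy log2 intensities: 5 peptides (rows) x 3 samples (columns)
log_int <- matrix(
  c(20, 21, 22, 18, 19,   # S1
    20, NA, 22, 18, NA,   # S2
    NA, NA, 22, 18, 19),  # S3
  nrow = 5,
  dimnames = list(paste0("pep", 1:5), paste0("S", 1:3))
)
protein <- c("protA", "protA", "protA", "protB", "protB")

# Hypothetical helper: within each protein, blank out any sample that has
# fewer than min_features quantified peptides
restrict_sketch <- function(mat, protein, min_features) {
  for (p in unique(protein)) {
    rows <- protein == p
    n_quant <- colSums(!is.na(mat[rows, , drop = FALSE]))
    mat[rows, n_quant < min_features] <- NA
  }
  mat
}

restricted <- restrict_sketch(log_int, protein, min_features = 2)

# Missing values per peptide before and after the restriction
na_before <- rowSums(is.na(log_int))
na_after <- rowSums(is.na(restricted))
rbind(na_before, na_after)
```

In this example, pep3 and pep4 had no missing values before the restriction but gain one afterwards, so a peptide that passed the first missing value filter may no longer pass it.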

We can then re-inspect the missing values. Note that we have reduced the overall number of peptides to 4570.

p <- MSnbase::plotNA(pep_restricted, pNA = 0) +
  camprotR::theme_camprot(border = FALSE, base_family = 'sans', base_size = 15) +
  labs(x = 'Peptide index')

print(p)

Peptide-level data completeness for retained peptides

Summarising to protein-level abundances

We can now summarise to protein-level abundance. Below, we use ‘robust’ summarisation (Sticker et al. 2020) with MSnbase::combineFeatures(). This returns a warning about missing values that we can ignore here since the robust method is inherently designed to handle missing values. See MsCoreUtils::robustSummary() and Sticker et al. (2020) for further details about the robust method.

prot_robust <- pep_restricted %>%
  MSnbase::combineFeatures(
    # group the peptides by their master protein id
    groupBy = fData(pep_restricted)$Master.Protein.Accessions,
    method = 'robust',
    maxit = 1000  # Ensures convergence for MASS::rlm
  )
#> Your data contains missing values. Please read the relevant section in
#> the combineFeatures manual page for details on the effects of missing
#> values on data aggregation.
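
To give an intuition for the robust method, the sketch below fits a robust linear model with a sample effect and a peptide effect to the peptides of one toy protein, and takes the sample effects as the protein-level abundances. This is a simplified illustration under stated assumptions and not the MsCoreUtils implementation: the data are invented, and the sum-to-zero coding of the peptide effect is an assumption made here so that the sample effects sit at the average peptide level.

```r
# Toy log2 intensities for one protein: 4 peptides (rows) x 3 samples (columns)
log_int <- matrix(
  c(20.0, 18.0, 22.1, 19.0,   # S1
    20.5, 18.6, 22.4,   NA,   # S2, with one missing value
    21.0, 19.1, 25.0, 20.1),  # S3, where pep3 is an outlier
  nrow = 4,
  dimnames = list(paste0("pep", 1:4), paste0("S", 1:3))
)

# Long format, dropping the missing values
long <- data.frame(
  y = as.vector(log_int),
  peptide = factor(rep(rownames(log_int), times = ncol(log_int))),
  sample = factor(rep(colnames(log_int), each = nrow(log_int)))
)
long <- long[!is.na(long$y), ]

# One column per sample, plus sum-to-zero coded peptide effects
X <- model.matrix(~ -1 + sample + peptide, data = long,
                  contrasts.arg = list(peptide = "contr.sum"))

# Robust regression down-weights observations with large residuals
fit <- MASS::rlm(X, long$y, maxit = 1000)

# The sample coefficients are the summarised protein abundances
protein_abundance <- coef(fit)[grepl("^sample", names(coef(fit)))]
protein_abundance
```

Because missing observations are simply left out of the model, a peptide that is missing in some samples still contributes to the estimates for the samples where it was quantified.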

We can then re-inspect the missing values at the protein level. So, we have quantification for 505 proteins, of which 288 are fully quantified across all 8 samples. The most common missing values pattern remains missing in just RNase negative replicate 4.

p <- MSnbase::plotNA(prot_robust, pNA = 0) +
  camprotR::theme_camprot(border = FALSE, base_family = 'sans', base_size = 15)

print(p)

naniar::gg_miss_upset(data.frame(exprs(prot_robust)),
                      sets = paste0(colnames(prot_robust), '_NA'),
                      keep.order = TRUE,
                      nsets = 10)

We have now processed our peptide-level LFQ abundances and obtained protein-level abundances, from which we can perform our downstream analyses.

Below, we save the protein-level object to disk, so we can read it back into memory in downstream analyses. We use saveRDS to save it in compressed R binary format.

# Create the output directory if it does not already exist
dir.create('results', showWarnings = FALSE)
saveRDS(prot_robust, 'results/lfq_prot_robust.rds')

Optional extra sections on normalisation and summarisation

For a comparison between robust and maxLFQ for summarisation to protein-level abundances, see the notebook Alternatives for summarising to protein-level abundance - MaxLFQ

For a demonstration of an alternative normalisation approach where one has a strong prior belief, see Normalisation to a prior expectation

We will also save the pep_restricted object, since this is used in Alternatives for summarising to protein-level abundance - MaxLFQ

saveRDS(pep_restricted, 'results/lfq_pep_restricted.rds')

Session info

#> R version 4.1.2 (2021-11-01)
#> Platform: x86_64-apple-darwin17.0 (64-bit)
#> Running under: macOS Big Sur 10.16
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
#> 
#> locale:
#> [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#>  [1] tidyr_1.2.0                    dplyr_1.0.8                   
#>  [3] Proteomics.analysis.data_0.1.0 camprotR_0.0.0.9000           
#>  [5] biobroom_1.26.0                broom_0.7.12                  
#>  [7] MSnbase_2.20.4                 ProtGenerics_1.26.0           
#>  [9] S4Vectors_0.32.3               mzR_2.28.0                    
#> [11] Rcpp_1.0.8.3                   Biobase_2.54.0                
#> [13] BiocGenerics_0.40.0            ggplot2_3.3.5                 
#> 
#> loaded via a namespace (and not attached):
#>  [1] bitops_1.0-7           naniar_0.6.1           doParallel_1.0.17     
#>  [4] UpSetR_1.4.0           GenomeInfoDb_1.30.1    tools_4.1.2           
#>  [7] backports_1.4.1        bslib_0.3.1            utf8_1.2.2            
#> [10] R6_2.5.1               affyio_1.64.0          DBI_1.1.2             
#> [13] colorspace_2.0-3       withr_2.5.0            gridExtra_2.3         
#> [16] tidyselect_1.1.2       compiler_4.1.2         preprocessCore_1.56.0 
#> [19] cli_3.2.0              labeling_0.4.2         sass_0.4.0            
#> [22] scales_1.1.1           DEoptimR_1.0-10        robustbase_0.93-9     
#> [25] affy_1.72.0            stringr_1.4.0          digest_0.6.29         
#> [28] rmarkdown_2.12         XVector_0.34.0         pkgconfig_2.0.3       
#> [31] htmltools_0.5.2        fastmap_1.1.0          limma_3.50.1          
#> [34] highr_0.9              rlang_1.0.2            impute_1.68.0         
#> [37] farver_2.1.0           jquerylib_0.1.4        generics_0.1.2        
#> [40] jsonlite_1.8.0         mzID_1.32.0            BiocParallel_1.28.3   
#> [43] RCurl_1.98-1.6         magrittr_2.0.2         GenomeInfoDbData_1.2.7
#> [46] MALDIquant_1.21        munsell_0.5.0          fansi_1.0.2           
#> [49] MsCoreUtils_1.6.2      visdat_0.5.3           lifecycle_1.0.1       
#> [52] vsn_3.62.0             stringi_1.7.6          yaml_2.3.5            
#> [55] MASS_7.3-55            zlibbioc_1.40.0        plyr_1.8.6            
#> [58] grid_4.1.2             parallel_4.1.2         crayon_1.5.0          
#> [61] lattice_0.20-45        Biostrings_2.62.0      knitr_1.37            
#> [64] pillar_1.7.0           codetools_0.2-18       XML_3.99-0.9          
#> [67] glue_1.6.2             evaluate_0.15          pcaMethods_1.86.0     
#> [70] BiocManager_1.30.16    vctrs_0.3.8            foreach_1.5.2         
#> [73] gtable_0.3.0           purrr_0.3.4            clue_0.3-60           
#> [76] assertthat_0.2.1       xfun_0.30              ncdf4_1.19            
#> [79] tibble_3.1.6           iterators_1.0.14       IRanges_2.28.0        
#> [82] cluster_2.1.2          ellipsis_0.3.2

References

Cox, Jürgen, Marco Y. Hein, Christian A. Luber, Igor Paron, Nagarjuna Nagaraj, and Matthias Mann. 2014. “Accurate Proteome-Wide Label-Free Quantification by Delayed Normalization and Maximal Peptide Ratio Extraction, Termed MaxLFQ.” Molecular & Cellular Proteomics 13 (9): 2513–26. https://doi.org/10.1074/mcp.M113.031591.

Queiroz, Rayner M. L., Tom Smith, Eneko Villanueva, Maria Marti-Solano, Mie Monti, Mariavittoria Pizzinga, Dan-Mircea Mirea, et al. 2019. “Comprehensive Identification of RNA–Protein Interactions in Any Organism Using Orthogonal Organic Phase Separation (OOPS).” Nature Biotechnology 37 (2): 169. https://doi.org/10.1038/s41587-018-0001-2.

Sticker, Adriaan, Ludger Goeminne, Lennart Martens, and Lieven Clement. 2020. “Robust Summarization and Inference in Proteome-Wide Label-Free Quantification.” Molecular & Cellular Proteomics 19 (7): 1209–19. https://doi.org/10.1074/mcp.RA119.001624.

Välikangas, Tommi, Tomi Suomi, and Laura L Elo. 2018. “A Systematic Evaluation of Normalization Methods in Quantitative Label-Free Proteomics.” Briefings in Bioinformatics 19 (1): 1–11. https://doi.org/10.1093/bib/bbw095.

Label-free Quantification Proteomics

Data processing and QC

Tom Smith

2022-09-16
