Proteomics Data Analysis
The Proteomics Data Analysis material was prepared by the MRC Toxicology Unit Bioinformatics and mass4tox Proteomics facilities to provide training in the basics of proteomics analyses.
It assumes the user’s data has been processed by Proteome Discoverer, as per standard Proteomics facility workflows.
Tutorials take the form of Rmarkdown notebooks (see links below). If you would like to contribute or suggest modifications to the material, please see the GitHub page.
Prerequisites
- R
You should be comfortable using R. We will be using base R functions like lapply, gsub and file.path, alongside tidyverse functions like group_by, mutate and ggplot. If these are not familiar, we recommend undertaking training in R and the tidyverse beforehand. We recommend using R >= 4.1.2, since the material has not been tested on earlier versions.
The Bioinformatics facility provide separate training covering basic R, data carpentry (using the tidyverse) and plotting (using ggplot2). If there is not a course scheduled, you can get recordings by emailing bioinfo@mrc-tox.cam.ac.uk.
The Cambridge Bioinformatics Training centre also offer a regular course on R for Biologists.
- RStudio
The material will be taught in live coding sessions through RStudio, and we recommend using this environment whenever you use R. Installation instructions can be found here.
- Proteomics
The materials herein assume you have attended Cat Franco’s introduction to the principles of bottom-up proteomics by Mass-Spectrometry.
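As a quick self-check for the R prerequisite above, the following minimal sketch exercises the functions named there. The file names and the toy data frame are hypothetical and exist only for illustration; they are not part of the course data. The tidyverse part is skipped if dplyr is not installed.

```r
# Compare the running R version with the recommended minimum (4.1.2).
r_version_ok <- getRversion() >= "4.1.2"
message("R ", getRversion(),
        if (r_version_ok) " meets" else " is below",
        " the recommended minimum of 4.1.2")

# Hypothetical file names, for illustration only.
files <- c("sample_A_PSMs.txt", "sample_B_PSMs.txt")

# file.path: join a directory and file names into paths.
paths <- file.path("raw_data", files)

# gsub: strip the suffix to recover the sample names.
sample_names <- gsub("_PSMs\\.txt$", "", files)

# lapply: apply a function to each element, returning a list.
name_lengths <- lapply(sample_names, nchar)

# tidyverse-style equivalent on a toy table (only if dplyr is available).
if (requireNamespace("dplyr", quietly = TRUE)) {
  library(dplyr)
  toy <- data.frame(protein = c("P1", "P1", "P2"),
                    abundance = c(10, 20, 30))
  toy_summary <- toy %>%
    group_by(protein) %>%                           # one group per protein
    mutate(mean_abundance = mean(abundance)) %>%    # per-group mean, kept on every row
    ungroup()
  print(toy_summary)
}
```

If every line here reads naturally to you, you are likely comfortable enough with R for the live coding sessions.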
Course dependencies and data
To ensure all the necessary R packages are installed for you to run the code, you can install the Proteomics.data.analysis package like so:
remotes::install_github("MRCToxBioinformatics/Proteomics_data_analysis", dependencies='Suggests')
This will also install the Proteomics.analysis.data package, which contains the data we will use.
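After installing, it can be useful to confirm that both packages are visible to R. The sketch below only checks availability and installs nothing. It assumes the installed package names match the names used in the text above (Proteomics.data.analysis and Proteomics.analysis.data); if the names differ on your system, adjust the vector accordingly.

```r
# Package names as given in the text above (assumed to match the installed names).
pkgs <- c("Proteomics.data.analysis", "Proteomics.analysis.data")

# requireNamespace() returns TRUE if a package can be loaded, FALSE otherwise,
# without attaching it to the search path.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

print(installed)

# Report anything that still needs installing.
missing_pkgs <- pkgs[!installed]
if (length(missing_pkgs) > 0) {
  message("Not yet installed: ", paste(missing_pkgs, collapse = ", "))
}
```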
Course materials
The first part of the course is broken into sections for different ‘flavours’ of quantitative bottom-up proteomics by Mass-Spectrometry. Each section contains subsections covering:
- Data processing and QC, which starts from the Proteome Discoverer (PD) output files and performs filtering, quality control and data processing to obtain the quantification data
- Statistical testing for differential abundance
Additional subsections are included to cover further topics for each flavour.
In addition to the core part of the course, there are extended materials to cover:
- Phosphoproteomics using Tandem Mass Tags