
Proteomics Data Analysis
The Proteomics Data Analysis material was prepared by the MRC Toxicology Unit Bioinformatics and mass4tox Proteomics facilities to provide training in the basics of proteomics analyses.
It assumes the user’s data has been processed by Proteome Discoverer, as per standard Proteomics facility workflows.
Tutorials take the form of R Markdown notebooks (see links below). If you would like to contribute or suggest modifications to the material, please see the GitHub page.
Prerequisites
-
R
You should be comfortable using R. We will be using base R functions like lapply, gsub and file.path, alongside tidyverse functions like group_by, mutate and ggplot. If these are not familiar, we recommend undertaking training in R and the tidyverse beforehand. We recommend using R >= 4.1.2, since the material has not been tested on earlier versions.
The Bioinformatics facility provide separate training covering basic R, data carpentry (using the tidyverse) and plotting (using ggplot2). If there is not a course scheduled, you can get recordings by emailing bioinfo@mrc-tox.cam.ac.uk.
The Cambridge Bioinformatics Training Centre also offer a regular course on R for Biologists.
-
RStudio
The material will be taught in live coding sessions through RStudio and we recommend using this environment whenever you use R. Installation instructions can be found here.
-
Proteomics
The materials herein assume you have attended Cat Franco’s introduction to the principles of bottom-up proteomics by Mass-Spectrometry.
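As a quick self-check for the R prerequisite, the sketch below uses each of the base R and tidyverse functions named in the prerequisites on a made-up toy table. The file names, column names and values are hypothetical and are not taken from the course data. If every line reads naturally to you, you are likely ready for the material.

```r
library(dplyr)
library(ggplot2)

# Warn if the running R version is older than the tested minimum
if (getRversion() < "4.1.2") {
  warning("The course material has not been tested on R < 4.1.2")
}

# Hypothetical file names, for illustration only
files <- file.path("data", c("sample_A.txt", "sample_B.txt"))

# base R: drop the directory and the extension to get sample names
sample_names <- gsub("\\.txt$", "", basename(files))

# base R: lapply returns a list with one element per input
name_lengths <- lapply(sample_names, nchar)

# Toy abundance table (made-up values)
toy <- data.frame(
  protein = c("P1", "P1", "P2", "P2"),
  sample = rep(sample_names, 2),
  abundance = c(10, 20, 30, 50)
)

# tidyverse: each sample's fraction of the total abundance per protein
toy_fraction <- toy %>%
  group_by(protein) %>%
  mutate(fraction = abundance / sum(abundance)) %>%
  ungroup()

# ggplot2: one bar per sample, one panel per protein
p <- ggplot(toy_fraction, aes(x = sample, y = fraction)) +
  geom_col() +
  facet_wrap(~protein)
```

Printing p in an interactive session draws the plot.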
Course dependencies and data
To ensure all the necessary R packages are installed for you to run the code,
you can install the Proteomics.data.analysis package like so:
remotes::install_github("MRCToxBioinformatics/Proteomics_data_analysis", dependencies='Suggests')
This will also install the Proteomics.analysis.data package which contains
the data we will use.
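The install command above relies on the remotes package already being present. A minimal sketch of a more defensive version follows. The helper names missing_packages and install_course_packages are hypothetical, introduced here for illustration only, and the data package name is taken from the text above.

```r
# Return the subset of `pkgs` that cannot currently be loaded
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Hypothetical wrapper: install 'remotes' if needed, then the course package
install_course_packages <- function() {
  if (length(missing_packages("remotes")) > 0) {
    install.packages("remotes")
  }
  # Same command as given above
  remotes::install_github(
    "MRCToxBioinformatics/Proteomics_data_analysis",
    dependencies = "Suggests"
  )
  # Report whether the data package is still missing after installation
  missing_packages("Proteomics.analysis.data")
}
```

Calling install_course_packages() once in a fresh session should return an empty character vector if the data package was installed successfully.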
Course materials
The first part of the course is broken into sections for different ‘flavours’ of quantitative bottom-up proteomics by Mass-spectrometry. Each section contains subsections covering:
- Data processing and QC, which starts from the Proteome Discoverer (PD) output files and performs filtering, quality control and data processing to obtain the quantification data
- Statistical testing for differential abundance
Additional subsections are included to cover further topics for each flavour.
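To give a feel for the two core subsections, here is a rough sketch on toy data. It is not the course's actual workflow. The table stands in for a PD export, all column names and values are made up, and a Welch t-test is used as a simple stand-in for whichever statistical method the notebooks apply.

```r
library(dplyr)

# Toy stand-in for a PD export: 3 proteins x 2 peptides x 4 samples.
# Column names and values are hypothetical.
psm <- data.frame(
  protein = rep(c("P1", "P2", "cont_1"), each = 8),
  peptide = rep(rep(c("a", "b"), each = 4), times = 3),
  sample = rep(paste0("S", 1:4), times = 6),
  condition = rep(rep(c("control", "treated"), each = 2), times = 6),
  abundance = c(100, 110, 200, 220, 50, 55, 100, NA,
                80, 82, 81, 79, 40, 41, 39, 40,
                10, 10, 10, 10, 5, 5, 5, 5)
)

# Data processing and QC: drop contaminants and missing values,
# then sum peptide abundances to protein level and log2-transform
protein_quant <- psm %>%
  filter(!grepl("^cont_", protein), !is.na(abundance)) %>%
  group_by(protein, sample, condition) %>%
  summarise(log2_abundance = log2(sum(abundance)), .groups = "drop")

# Differential abundance: per-protein Welch t-test, treated vs control,
# followed by Benjamini-Hochberg adjustment of the p-values
results <- protein_quant %>%
  group_by(protein) %>%
  summarise(
    log2_fc = mean(log2_abundance[condition == "treated"]) -
      mean(log2_abundance[condition == "control"]),
    p_value = t.test(
      log2_abundance[condition == "treated"],
      log2_abundance[condition == "control"]
    )$p.value,
    .groups = "drop"
  ) %>%
  mutate(p_adj = p.adjust(p_value, method = "BH"))
```

The results table has one row per remaining protein, with the log2 fold change and the raw and adjusted p-values. With only two replicates per condition, as in this toy table, the p-values are illustrative only.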
In addition to the core part of the course, there are extended materials to cover:
- Phosphoproteomics using Tandem Mass Tags