Introduction to ampir

Legana Fingerhut

2019-11-26

Background

The ampir (short for antimicrobial peptide prediction in R) package was designed to be a fast and user-friendly method to predict antimicrobial peptides (AMPs) from protein datasets of any size. ampir uses a supervised statistical machine learning approach to predict AMPs: it incorporates a support vector machine classification model that has been trained on publicly available antimicrobial peptide data.

Usage

The standard input to ampir is a data.frame with sequence names in the first column and protein sequences in the second column.

Read a FASTA-formatted file into a data.frame with read_faa()

seq_name seq_aa
G1P6H5_MYOLU MALTVRIQAACLLLLLLASLTSYSLLLSQTTQLADLQTQDTAGAT…
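To make the expected two-column layout concrete, here is a minimal Python sketch of turning FASTA text into (seq_name, seq_aa) pairs. It is an illustration only, not ampir's R implementation; the function name parse_fasta is hypothetical, and taking the whole header text after ">" as the name is an assumption (read_faa() may treat header whitespace differently).

```python
from typing import List, Optional, Tuple


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Parse FASTA text into (seq_name, seq_aa) pairs.

    Hypothetical helper: the name is the full header text after '>'
    (an assumption), and wrapped sequence lines are joined together.
    """
    records: List[Tuple[str, str]] = []
    name: Optional[str] = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))  # close previous record
            name = line[1:].strip()
            chunks = []
        elif name is not None:
            chunks.append(line)  # sequences may span several lines
    if name is not None:
        records.append((name, "".join(chunks)))  # close the final record
    return records


example = ">seq1\nMALTVRIQAA\nCLLLLL\n>seq2\nGIGKFLHSAK\n"
print(parse_fasta(example))
# [('seq1', 'MALTVRIQAACLLLLL'), ('seq2', 'GIGKFLHSAK')]
```

Each tuple corresponds to one row of the data.frame: name first, sequence second.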

Calculate the probability that each protein is an antimicrobial peptide with predict_amps()

Note that sequences shorter than five amino acids, or containing anything other than the 20 standard amino acids, are not evaluated and receive NA as their prob_AMP value.
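The rule above amounts to a simple eligibility check. The Python sketch below restates it under stated assumptions: is_evaluable is a hypothetical name, input is assumed to use uppercase one-letter codes, and this sketch does not normalise case, so lowercase letters and ambiguity codes such as X are treated as non-standard. ampir's own handling may differ in such details.

```python
# The 20 standard amino acids as uppercase one-letter codes.
STANDARD_AA = frozenset("ACDEFGHIKLMNPQRSTVWY")
MIN_LENGTH = 5  # sequences shorter than this are not evaluated


def is_evaluable(seq: str) -> bool:
    """Return True if a sequence would be scored rather than given NA.

    Hypothetical helper: requires at least MIN_LENGTH residues and only
    characters from STANDARD_AA (no case normalisation is applied).
    """
    return len(seq) >= MIN_LENGTH and set(seq) <= STANDARD_AA


for seq in ["MKV", "MALTVRIQAA", "MALTXRIQAA"]:
    print(seq, is_evaluable(seq))
# MKV False
# MALTVRIQAA True
# MALTXRIQAA False
```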

seq_name seq_aa prob_AMP
G1P6H5_MYOLU MALTVRIQAACLLLLLLASLTSYSLLLSQTTQLADLQTQDTAGAT… 0.934

Proteins whose predicted probability meets a chosen threshold can then be extracted and written to a FASTA file:

seq_name seq_aa
G1P6H5_MYOLU MALTVRIQAACLLLLLLASLTSYSLLLSQTTQLADLQTQDTAGAT…
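The extraction step is a row filter on prob_AMP. In this minimal Python sketch, select_predicted_amps is a hypothetical name, None stands in for R's NA, and the threshold of 0.8 in the usage line is an arbitrary example value rather than a recommendation from ampir. The probabilities are invented toy values, not ampir output.

```python
from typing import Iterable, List, Optional, Tuple

# (seq_name, seq_aa, prob_AMP); None plays the role of NA
Row = Tuple[str, str, Optional[float]]


def select_predicted_amps(rows: Iterable[Row], threshold: float) -> List[Tuple[str, str]]:
    """Keep (seq_name, seq_aa) for rows with prob_AMP >= threshold.

    Rows with a missing probability (None) are dropped, mirroring how
    NA values would fail a threshold comparison.
    """
    return [
        (name, seq)
        for name, seq, prob in rows
        if prob is not None and prob >= threshold
    ]


predictions = [
    ("seqA", "MALTVRIQAA", 0.93),
    ("seqB", "GIGKFLHSAK", 0.41),
    ("seqC", "MKV", None),  # too short to evaluate, so NA
]
print(select_predicted_amps(predictions, threshold=0.8))
# [('seqA', 'MALTVRIQAA')]
```

Dropping the probability column leaves the same name/sequence layout used as input, which is what a FASTA writer needs.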

Write a data.frame with sequence names in the first column and protein sequences in the second column to a FASTA-formatted file with df_to_faa()
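Writing FASTA is the reverse of reading it: one ">" header line per name, followed by its sequence. The Python sketch below is illustrative only; to_fasta and write_fasta are hypothetical names, and writing each sequence on a single unwrapped line is an assumption of this sketch (df_to_faa() may format output differently, and many tools wrap sequences at 60 or 80 characters).

```python
from typing import Iterable, Tuple


def to_fasta(records: Iterable[Tuple[str, str]]) -> str:
    """Render (seq_name, seq_aa) pairs as FASTA text, one line per sequence."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)


def write_fasta(records: Iterable[Tuple[str, str]], path: str) -> None:
    """Write (seq_name, seq_aa) pairs to a FASTA file at `path`."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(to_fasta(records))


print(to_fasta([("seqA", "MALTVRIQAA")]), end="")
# >seqA
# MALTVRIQAA
```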