(Mo)tif (D)escription (L)ength (MoDL) Algorithm
Last edited Nov. 10, 2008
--------------------------------------------------------------------------------
From "Discovery of Phosphorylation Motif Mixtures in Phosphoproteomics Data"
by Anna Ritz, Gregory Shakhnarovich, Arthur R. Salomon, and Benjamin J. Raphael
Bioinformatics 2008

Code written by Anna Ritz and Gregory Shakhnarovich.
--------------------------------------------------------------------------------
The MoDL algorithm discovers novel motifs in a set of aligned sequences
centered on the same letter, given a much larger set of background sequences
aligned on the same residue.

Matlab R2007a is required to run this program.
--------------------------------------------------------------------------------
TO RUN:

If you have not done so, download and extract the tarball. Open Matlab and
move to the MoDL_code directory.

This program takes two arguments, a foreground file name and a background file
name. There are 6 background files provided with the code: human S-, T-, or
Y-centered peptides of length 13, and mouse S-, T-, or Y-centered peptides of
length 13. These datasets were constructed by taking all 13-mers in the entire
proteome.

The input file "example.input" contains 30 peptides (25 unique) from the
following publication:

    Quantitative time-resolved phosphoproteomic analysis of mast cell signaling.
    Cao L, Yu K, Banh C, Nguyen V, Ritz A, Raphael BJ, Kawakami Y, Kawakami T,
    Salomon AR.
    J Immunol. 2007 Nov 1;179(9):5864-76.

To run the example, at the Matlab prompt (>>) type:

>> [model MDL foreground background bg] = ...
       MoDL('example.input','background_sequences/mouse_Y_13.txt');
--------------------------------------------------------------------------------
OUTPUT:

The output for the example file is below.

>> [model MDL foreground background bg] = ...
       MoDL('example.input','background_sequences/mouse_Y_13.txt');
Using all single-letter motifs to initialize...
There are 138 motifs present in the foreground; these comprise the candidate set.
There are 25 foreground sequences and 532413 background sequences
Step 0: Null DL = 1268.0698
Step 1: add, DL = 1267.8198 (0.2500 bits saved)
    xxxDxxYxxxxxx
Step 2: collapse 1 and replace 1 with new motif, DL = 1273.1195 (-5.2997 bits saved)
    xxx[DL]xxYxxxxxx
Step 3: collapse 1 and replace 1 with new motif, DL = 1280.8474 (-7.7279 bits saved)
    xxx[DLV]xxYxxxxxx
Step 4: join 1 and replace 1 with new motif, DL = 1285.0626 (-4.2152 bits saved)
    Exx[DLV]xxYxxxxxx
Step 5: merge 1 and replace 1 with new motif, DL = 1285.6483 (-0.5857 bits saved)
    [EY]xx[DLV]xxYxxxxxx
Step 6: merge 1 and replace 1 with new motif, DL = 1287.7885 (-2.1402 bits saved)
    [ELY]xx[DLV]xxYxxxxxx
Step 7: merge 1 and replace 1 with new motif, DL = 1285.5246 (2.2639 bits saved)
    [ELTY]xx[DLV]xxYxxxxxx
Step 8: merge 1 and replace 1 with new motif, DL = 1285.6246 (-0.1001 bits saved)
    [ELTVY]xx[DLV]xxYxxxxxx
Step 9: merge 1 and replace 1 with new motif, DL = 1285.7510 (-0.1264 bits saved)
    [ELPTVY]xx[DLV]xxYxxxxxx
Step 10: merge 1 and replace 1 with new motif, DL = 1286.0393 (-0.2883 bits saved)
    [ELPSTVY]xx[DLV]xxYxxxxxx
Step 11: merge 1 and replace 1 with new motif, DL = 1286.3480 (-0.3087 bits saved)
    [EGLPSTVY]xx[DLV]xxYxxxxxx
Final Ranked Model: DL = 1267.8198
    xxxDxxYxxxxxx
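--------------------------------------------------------------------------------
The motif strings in the example output can be read as fixed-width patterns
over the 13 aligned positions. The snippet below is a minimal sketch of that
reading and is not part of the MoDL code. It assumes that 'x' stands for any
residue and that a bracketed group such as [DL] stands for any one of the
listed residues; this interpretation of the notation is an assumption, not
something stated in this README. The three peptides are made up for
illustration, and the variable names are hypothetical.

```matlab
% Illustrative only: not part of the MoDL distribution.
% Assumption: 'x' is a wildcard position and [ABC] lists the allowed residues.
motif   = 'xxx[DL]xxYxxxxxx';
pattern = ['^' strrep(motif, 'x', '.') '$'];   % wildcard 'x' becomes regex '.'

% Made-up 13-mers centered on Y (position 7), for illustration only.
peptides = {'AAADAAYAAAAAA', 'AAALAAYAAAAAA', 'AAAGAAYAAAAAA'};

% With 'once', regexp returns the start index of the match, or [] if none.
starts = regexp(peptides, pattern, 'once');
hits   = ~cellfun(@isempty, starts);

fprintf('%d of %d peptides match %s\n', sum(hits), numel(peptides), motif);
```

The same pattern could be applied to the lines of a foreground or background
file to count how often a reported motif occurs in each set, assuming those
files hold one peptide per line.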