
Large-scale analysis of post-translational modifications in E. coli under glucose-limiting conditions


We took advantage of an existing LC-MS/MS proteomics dataset [39] collected from three biological replicate cultures of E. coli B REL606 sampled across nine timepoints, from early exponential phase (3h post-inoculation) to extended late stationary phase (336h, or 2 weeks post-inoculation). The raw spectra from this dataset were used for simultaneous spectrum-sequence matching and PTM identification using the hybrid fragment matching/spectral alignment software MODa [38]. To reduce computation time and limit the occurrence of false positives, we restricted the MODa search to single-peptide mass shifts of ±200 Da, with one PTM allowed per peptide spectral match (PSM). To further limit false positive matches, we used the MODa “correct match” probability [38] to calculate the false discovery rate (FDR) and construct subsets of the highest-probability PSMs with 5% and 1% FDR (hereafter referred to as FDR5 and FDR1, respectively). The samples in our analysis were treated with iodoacetamide (IAA) to modify cysteines with a 57 Da carbamidomethyl group; during MODa analysis, this was treated as a static modification to cysteine (i.e. all modifications were relative to the molecular weight of Cys + 57 Da). However, this results in an incorrect mass shift for any Cys PTM that prevents carbamidomethylation (e.g. oxidation), so we added 57 Da to all Cys modifications to ensure that mass shifts for these modifications matched those for non-Cys residues. Note that this also places a small number of mass shifts outside the ±200 Da window specified in the initial MODa analysis, e.g. the +209 Da mass shift due to combined dithiothreitol (DTT) and carbamidomethyl modification of cysteine [40].
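The FDR filtering and cysteine correction described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that the FDR of a PSM set is estimated as the mean of (1 − p) over its PSMs, where p is the per-PSM "correct match" probability, and the function names `fdr_subset` and `corrected_shift` are hypothetical.

```python
from typing import List, Tuple

CARBAMIDOMETHYL_DA = 57  # static Cys modification applied during the search


def fdr_subset(probs: List[float], max_fdr: float) -> Tuple[List[float], float]:
    """Return the largest set of top-scoring PSMs whose estimated FDR <= max_fdr.

    Assumption: the FDR of a set is the mean of (1 - p) over its PSMs, where p
    is the "correct match" probability reported for each PSM.
    """
    ranked = sorted(probs, reverse=True)
    expected_false = 0.0
    best_n, best_fdr = 0, 0.0
    for n, p in enumerate(ranked, start=1):
        expected_false += 1.0 - p      # running sum of error probabilities
        fdr = expected_false / n       # estimated FDR of the top-n PSMs
        if fdr <= max_fdr:
            best_n, best_fdr = n, fdr
    return ranked[:best_n], best_fdr


def corrected_shift(residue: str, shift_da: int) -> int:
    """Express a Cys mass shift relative to unmodified Cys rather than Cys + 57 Da."""
    return shift_da + CARBAMIDOMETHYL_DA if residue == "C" else shift_da
```

Under this correction, a shift reported as +152 Da on cysteine becomes the +209 Da shift mentioned above, while shifts on other residues are left unchanged.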

Localization of modifications within peptides was performed by MODa during the spectral alignment phase [38]. To most effectively combine modifications among overlapping peptides, we transformed these MODa peptide position calls into protein coordinates and used them to generate vectors of counts for all observed mass shifts at every amino acid position across the proteome. A matching set of unmodified counts was generated for all amino acid positions by counting all observations of an unmodified residue across all peptides overlapping a given amino acid position.
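The per-position counting described above can be illustrated with a short sketch for a single protein. The PSM tuple layout and the function name `count_positions` are assumptions made for this example; MODa's actual output format is not reproduced here.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Tuple

# One PSM (hypothetical layout): (peptide start in protein, 1-based; peptide length;
# modified offset within peptide, 0-based, or None; mass shift in Da, or None)
PSM = Tuple[int, int, Optional[int], Optional[int]]


def count_positions(psms: Iterable[PSM]):
    """Tally mass shifts and unmodified observations per protein position."""
    modified: Dict[int, Counter] = defaultdict(Counter)  # position -> {shift: count}
    unmodified: Counter = Counter()                       # position -> count
    for start, length, mod_offset, shift in psms:
        for offset in range(length):
            pos = start + offset  # peptide coordinate -> protein coordinate
            if mod_offset is not None and offset == mod_offset:
                modified[pos][shift] += 1
            else:
                unmodified[pos] += 1
    return modified, unmodified
```

Because every residue covered by a peptide contributes either a modified or an unmodified observation, overlapping peptides are combined naturally at each position.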

We identified a total of 2,527,135 PSMs across all 27 samples, corresponding to 32,755 peptides that occur in at least one sample; these peptides represent 3,544 individual proteins when all timepoints are considered (Table 1). FDR filtering lowers these numbers substantially, yielding 1,980,884 PSMs and 22,776 unique peptides across 2,445 proteins in the FDR5 set, and 1,473,636 PSMs and 19,265 unique peptides across 2,121 proteins in the FDR1 set (Table 1). These filtered numbers are in agreement with previous proteomic experiments in E. coli [31, 39, 41]; the slightly lower number of proteins in our analysis is likely a result of the reduced sensitivity inherent in the larger search space used by MODa.

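Table 1

Counts of PSMs, unique peptides, and proteins for unfiltered, 1% FDR, and 5% FDR datasets

| Dataset | Subset | PSMs | Estimated false PSMs | FDR | Unique peptides | Proteins |
|---|---|---|---|---|---|---|
| Unfiltered | All | 2,527,135 | 444,225.241 | 0.176 | 32,755 | 3544 |
| | Modified | 608,357 | 190,915.030 | 0.314 | 25,362 | 3478 |
| | Unmodified | 1,918,778 | 253,310.212 | 0.132 | 7393 | 66 |
| 1% FDR | All | 1,473,636 | 14,736.377 | 0.010 | 19,265 | 2121 |
| | Modified | 198,277 | 3,224.088 | 0.016 | 8369 | 1690 |
| | Unmodified | 1,275,359 | 11,512.288 | 0.009 | 10,896 | 431 |
| 5% FDR | All | 1,980,884 | 99,044.212 | 0.050 | 22,776 | 2445 |
| | Modified | 362,291 | 31,912.966 | 0.088 | 13,299 | 2188 |
| | Unmodified | 1,618,593 | 67,131.246 | 0.041 | 9477 | 257 |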

We chose to focus on the FDR1 dataset for all subsequent analyses for two primary reasons. First, this was the more conservative cutoff, and by our analysis it did not exclude an excessive number of PSMs. Second, because PSM error rates can differ significantly between modified and unmodified peptides [42, 43], we wanted to select the dataset that minimized the difference in error rates between modified and unmodified PSMs. We did observe that effective error rates were in general higher for modified than for unmodified PSMs (Table 1, column “FDR”), but this difference was minimal in the FDR1 dataset (effective error rates of 1.6% and 0.9% for modified and unmodified PSMs, respectively).

A large fraction of the E. coli proteome undergoes PTM during growth and starvation in glucose

Of the 1,473,636 PSMs identified across all timepoints in the FDR1 dataset, a remarkably large fraction, 198,277 (13.5%), are predicted by MODa to carry a putative PTM. These modified PSMs correspond to 8,369 of 19,265 unique peptides (43%) having at least one modification in any sample, and 1,690 of 2,121 proteins (79.7%) having at least one modification on any constituent peptide. Interestingly, the proportion of the proteome predicted to have at least one PTM remains relatively constant across timepoints and biological replicates: PSMs, unique peptides, and proteins all show very little change in the overall proportion modified across all nine timepoints (Fig. 1).

Fig. 1

Global abundance of all modifications across growth timepoints. Shown are the fraction of total counts of PSMs (a), unique peptides (b), and proteins (c) containing at least one mass shift passing the 1% FDR threshold at the timepoint indicated on the x-axis for biological replicates 1, 2 and 3 (solid, dashed, and dotted lines, respectively)

Composition of the E. coli PTM spectrum

A unique feature of our analysis strategy is the ability to conduct an unbiased search for spectra matching post-translationally modified peptides across a wide range of possible mass shifts. We used MODa to search our raw spectral data for 400 potential peptide mass shifts, ranging from −200 Da to +200 Da; counts of PSMs for this range of mass shifts are shown in Fig. 2 and Additional file 1.

Fig. 2

Mass-shift counts across all timepoints and datasets. Shown are the PSM counts for all mass shifts identified by MODa [38] between −200 Da and +200 Da, summed across all nine timepoints and all three biological replicates; labeled peaks are the top 5 most abundant (by raw count) mass shifts in the dataset

The overall abundance of individual mass shifts varies widely, with the most abundant mass shifts corresponding to small functional group modifications. The most abundant mass shift is a neutral gain of 1 Da (84,357 PSMs, 45% of all modified PSMs). In addition to simple protonation, this mass change can result from a number of more complicated modifications and MS artifacts; see Discussion. Other abundant mass shifts include oxidations (+16 Da, 46,244 PSMs, 24% of all modified PSMs; +32 Da, 1,563 PSMs, 0.8% of all modified PSMs), metal ion adducts such as sodium (+22 Da, 8,882 PSMs, 4.7% of all modified PSMs) and potassium (+38 Da, 1,490 PSMs, 0.79% of all modified PSMs), and neutral losses such as deamidation (−17 Da, 9,780 PSMs, 5.2% of all modified PSMs) and dehydration (−18 Da, 7,169 PSMs, 3.8% of all modified PSMs).

Commonly studied regulatory PTMs are relatively rare in our data, most likely due to their low abundance in the proteome and the fact that our samples did not undergo enrichment for specific modifications prior to analysis. Although a large number of apparent acetylations (+42 Da) were identified, only a handful of these map to known acetylated lysine residues [30, 32, 44]. A small number of phosphorylations (+80 Da) were identified, although the majority of these are modifications to an active-site serine that acts as a phosphoryl group donor during catalysis in the metabolic enzyme phosphoglucomutase (see Table 2). We also recovered a number of previously identified lysine methylation modifications for the ribosomal proteins L7/L12 (encoded by the rplL gene), L11 (encoded by rplK), and Elongation Factor Tu (encoded by tufB), although the last two were only observed in the first and third biological replicates, respectively. A table of counts for all mass shifts recovered by MODa is included in Additional file 2.

Table 2

Previously identified post-translational modifications recovered in our analysis

| Gene | Position | Residue | Mass shift | Modification | Replicate 1 (modified / total) | Replicate 2 (modified / total) | Replicate 3 (modified / total) | Reference |
|---|---|---|---|---|---|---|---|---|
| rplL | 82 | K | +14 Da | Monomethylation | 38 / 56 (71%) | 13 / 18 (72%) | 71 / 87 (81%) | [96] |
| tufB | 57 | K | +14 Da | Monomethylation | 0 / 3 (0%) | 0 / 6 (0%) | 97 / 192 (50%) | [54, 97] |
| | | | +28 Da | Dimethylation | 0 / 3 (0%) | 0 / 6 (0%) | 41 / 95 (43%) | |
| pgm | 146 | S | +81 Da | Phosphorylation | 1 / 25 (4%) | 0 / 11 (0%) | 0 / 34 (0%) | UniProt version 2015–08 released on 2015-07-22 (UniProt consortium) |
| gapA | 124 | K | +42 Da | Acetylation | 0 / 39 (0%) | 1 / 71 (1.4%) | 0 / 239 (0%) | [98] |
| | 213 | K | +42 Da | Acetylation | 0 / 0 (0%) | 0 / 3 (0%) | 7 / 25 (28%) | [98] |
| icdA | 242 | K | +42 Da | Acetylation | 1 / 198 (0.5%) | 2 / 164 (1.2%) | 0 / 184 (0%) | [98] |
| glyA | 346 | K | +42 Da | Acetylation | 0 / 0 (0%) | 1 / 1 (100%) | 0 / 17 (0%) | [98] |
| fbaA | 326 | K | +42 Da | Acetylation | 0 / 9 (0%) | 1 / 17 (5.9%) | 0 / 40 (0%) | [98] |
| rplK | 40 | K | +42 Da | Trimethylation | 1 / 16 (6.3%) | 0 / 1 (0%) | 0 / 16 (0%) | [99] |
| rpsF | 131 | E | +129 Da | Glutamylation | 122 / 219 (55.7%) | 152 / 415 (37%) | 64 / 169 (38%) | [47] |
| secB | 2 | S | +42 Da | Acetylation | 184 / 221 (83%) | 169 / 215 (78%) | 115 / 135 (85%) | [55] |
| rpsE | 2 | A | +42 Da | Acetylation | 0 / 0 (0%) | 0 / 0 (0%) | 5 / 5 (100%) | [53] |

Distribution of target amino-acid residues varies widely among mass shifts

The most commonly modified amino acid across all timepoints is methionine (nearly all of these modifications are a +16 Da shift corresponding to oxidation; see Discussion), followed by the hydrophobic amino acids Ala, Val, Leu, and Ile; the amide-containing amino acids Asn and Gln; and their carboxyl counterparts Asp and Glu (Table 3). The observation of a large number of modifications on amino acids with hydrocarbon side-chains, which are generally not expected to undergo PTM, can likely be explained by a combination of incorrect assignment of a mass shift to the amino acid (AA) by MODa, modification of the backbone NH or CO groups, or selection of peaks with isotopically shifted masses during MS2. The bulk of modifications to Ala, Val, Leu, and Ile are +1 Da modifications, consistent with most of these modifications being due to selection of 13C-containing peaks (see file “DATA_TABLE_2_mass_shifts_by_AA.tsv” provided as part of the data tables in Additional file 2). This effect is expected to occur randomly across the proteome, so the higher numbers for these particular amino acids are most likely due to their higher abundance relative to other AAs in E. coli proteins [45].

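Table 3

Most commonly modified amino-acid residues

| Amino acid | Modification count | Percentage of total (%) |
|---|---|---|
| M | 42457 | 21.41 |
| Q | 13990 | 7.06 |
| A | 13682 | 6.90 |
| N | 13500 | 6.81 |
| L | 12581 | 6.35 |
| V | 11309 | 5.70 |
| E | 10872 | 5.48 |
| G | 10623 | 5.36 |
| S | 9528 | 4.81 |
| T | 8396 | 4.23 |
| I | 8267 | 4.17 |
| P | 8139 | 4.10 |
| D | 7859 | 3.96 |
| K | 5325 | 2.69 |
| W | 4964 | 2.50 |
| Y | 4274 | 2.16 |
| C | 4211 | 2.12 |
| H | 4084 | 2.06 |
| F | 3261 | 1.64 |
| R | 955 | 0.48 |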

We constructed the distribution of targeted amino acids for each mass shift by counting occurrences of each mass shift–AA pair across all nine timepoints. We observed significant differences among mass shifts in preference for a single type (or, in some cases, group) of amino acid residues; the +22 Da and +38 Da modifications, for example, show a broad distribution across AA types, while +16 Da and −2 Da show strong (though not exclusive) preference for methionine. To quantify these differences in AA distribution, we ranked mass shifts by the ratio: PSMs for most common AA / mean(PSMs for all other AAs). AA distributions for the top-ranked mass shifts (those most biased towards one AA across multiple biological replicates) are shown in Fig. 3.
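As a worked illustration of the single-AA bias ratio defined above, the following sketch computes the score for one mass shift from a dictionary of per-residue PSM counts. The function name `single_aa_bias` is hypothetical, and treating amino acids with no observations as zero counts in the mean is an assumption of this sketch rather than a detail stated in the text.

```python
from typing import Dict, Tuple


def single_aa_bias(counts: Dict[str, int], n_amino_acids: int = 20) -> Tuple[str, float]:
    """Return (most common AA, its count / mean count over all other AAs)."""
    top_aa = max(counts, key=counts.get)
    others_total = sum(c for aa, c in counts.items() if aa != top_aa)
    # Assumption: amino acids absent from `counts` contribute zero to the mean.
    mean_others = others_total / (n_amino_acids - 1)
    if mean_others == 0:
        return top_aa, float("inf")
    return top_aa, counts[top_aa] / mean_others


# Toy counts for a methionine-biased mass shift (illustrative values only).
print(single_aa_bias({"M": 190, "W": 19, "C": 19}))  # ('M', 95.0)
```

A mass shift spread evenly over all residues scores close to 1, while a shift concentrated on one residue scores far higher.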

Fig. 3

Distribution of selected mass shifts across amino acids. The height of bars within each row represents the fraction of total AA positions for the mass shift (indicated on the y axis) that were identified on each amino acid residue type (columns). Individual bars within each column represent fractions for each biological replicate (replicates 1, 2 and 3 from left to right within each column). Mass shifts are ordered by the single-AA bias score (the ratio of counts for the most commonly modified AA type to the mean of the counts for all other types; see Methods) with the highest score (most biased for a single AA) at the top; only the top 20 mass shifts are shown. Note that a constant mass shift of +57 Da was added to all cysteine modifications to correct for the presence of carbamidomethylation, meaning that a small number of cysteine modifications (e.g. +209 Da) fall outside of the mass range scanned by MODa (+/−200 Da)

A large number of modifications with a strong preference for cysteine residues were identified in all three biological replicates; most of these are likely artifacts of IAA treatment during sample preparation, and correspond to common modifications co-occurring with carbamidomethylation (+57 Da), e.g. +58 Da (57 + 1 Da), +59 Da (57 + 2 Da), and +40 Da (57 − 17 Da). The +209 Da mass shift, corresponding to a carbamidomethylated dithiothreitol modification of the cysteine thiol group, is a minor artifact of the reduction and alkylation of cysteine during sample preparation [40]. The +48 Da mass shift was found almost exclusively at catalytic cysteine residues in a handful of proteins, and corresponds to the hyperoxidation of the cysteine thiol group (Cys-SH) into cysteic acid (Cys-SO3H). This modification is likely to be inactivating and irreversible, resulting in increased accumulation of the modified form throughout stationary phase. Among modifications targeting non-Cys residues, putative oxidative modifications show the strongest bias towards a single AA, with the +32 Da and +16/+17 Da modifications showing strong preferences for tryptophan and methionine, respectively. The eighth-ranked −48 Da modification is likely also a result of oxidation, via dethiomethylation of methionine residues [46]. The strong preference of the acetylation mass shift (+42 Da) for serine is largely due to modifications on protein N-termini (see Section “N-terminal and C-terminal modifications”). A table of counts for each mass shift–amino acid pair is included in Additional file 2.

N-terminal and C-terminal modifications

To search for modifications that preferentially occur at protein N and C termini, we used Fisher’s exact test (FET) to compare the ratio of modified : unmodified counts of each mass shift occurring at the N or C terminus of a protein to the same ratio for mass shifts occurring at all other positions. FET p-values for N-terminal and C-terminal enrichment were calculated for all mass shifts within each biological replicate and filtered for consistency by requiring all three replicates to have a significant p-value (Tables 4 and 5). We also examined the distribution of unique modified positions for these Nt- and Ct-biased mass shifts as a function of normalized protein length, to determine whether the observed positional bias was a general feature of the mass shift or due to a small number of highly abundant modified positions (Figs. 4 and 5).
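The terminal-enrichment comparison described above amounts to a Fisher's exact test on a 2×2 table of modified and unmodified counts at terminal versus all other positions. The following is a minimal standard-library sketch of a two-sided test, not the authors' implementation; the function name is hypothetical, and the example counts are the replicate 2 values for the +64 Da shift in Table 4, read as modified / total.

```python
from math import exp, lgamma


def _log_comb(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d
    log_denom = _log_comb(n, col1)

    def log_prob(x: int) -> float:
        # Hypergeometric log-probability of seeing x in the top-left cell.
        return _log_comb(row1, x) + _log_comb(row2, col1 - x) - log_denom

    observed = log_prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    p = sum(exp(lp) for lp in map(log_prob, range(lo, hi + 1)) if lp <= observed + 1e-7)
    return min(1.0, p)


# +64 Da, replicate 2: 4 modified of 48 counts at N termini, 6 of 1340 elsewhere.
p_value = fisher_exact_two_sided(4, 48 - 4, 6, 1340 - 6)
print(f"{p_value:.2e}")
```

The loop runs over every possible value of the top-left cell, which is fast for small tables like this one; for the largest count tables a library routine would be the more practical choice.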

Fig. 4

Distribution of Nt-biased mass shifts across positions in protein sequence. The widths of traces within each row represent the density of unique positions identified for the mass shift indicated to the left along target proteins, normalized by protein length (x-axis). Traces are plotted symmetrically about the x-axis. Mass shifts are ranked from top to bottom by combined p-value from the Fisher’s exact test for N-terminal modification enrichment across all three replicates (see section “N-terminal and C-terminal Modifications” and Table 4), with mass shifts having the strongest N-terminal enrichment at the top

Fig. 5

Distribution of Ct-biased mass shifts across positions in protein sequence. The widths of traces within each row represent the density of unique modified positions (i.e. positions with more than one modified PSM; each position is counted once per protein) identified for each mass shift (indicated to the left) along target proteins, normalized by protein length (x-axis). Traces are plotted symmetrically about the x-axis. Mass shifts are ranked from top to bottom by combined p-value from the Fisher’s exact test for C-terminal modification enrichment across all three replicates (see section “N-terminal and C-terminal Modifications” and Table 5), with mass shifts having the strongest C-terminal enrichment at the top

Table 4

Mass shifts occurring more frequently on N-terminal ends of proteins

| Mass shift | Replicate | N-terminal positions (modified / total) | All other positions (modified / total) | p-value |
|---|---|---|---|---|
| +42 Da | 1 | 759 / 1869 (40.61%) | 132 / 7860 (1.68%) | 0.0 |
| | 2 | 657 / 1561 (42.09%) | 111 / 6814 (1.63%) | 0.0 |
| | 3 | 682 / 1930 (35.34%) | 147 / 7681 (1.91%) | 0.0 |
| +16 Da | 1 | 1187 / 8016 (14.81%) | 13633 / 510869 (2.67%) | 0.0 |
| | 2 | 972 / 6971 (13.94%) | 9350 / 385682 (2.42%) | 0.0 |
| | 3 | 531 / 5216 (10.18%) | 20571 / 423856 (4.85%) | 7.36×10^−55 |
| +28 Da | 1 | 16 / 128 (12.50%) | 54 / 31261 (0.17%) | 4.56×10^−24 |
| | 2 | 27 / 136 (19.85%) | 11 / 28221 (0.04%) | 1.78×10^−55 |
| | 3 | 45 / 140 (32.14%) | 305 / 33420 (0.91%) | 1.65×10^−54 |
| +156 Da | 1 | 20 / 1051 (1.90%) | 277 / 54832 (0.51%) | 1.17×10^−6 |
| | 2 | 14 / 832 (1.68%) | 154 / 58376 (0.26%) | 1.34×10^−7 |
| | 3 | 26 / 674 (3.86%) | 313 / 43231 (0.72%) | 2.73×10^−11 |
| +17 Da | 1 | 60 / 3107 (1.93%) | 585 / 178417 (0.33%) | 8.94×10^−26 |
| | 2 | 45 / 2730 (1.65%) | 512 / 116280 (0.44%) | 6.38×10^−13 |
| | 3 | 54 / 2070 (2.61%) | 1960 / 147203 (1.33%) | 7.64×10^−6 |
| +1 Da | 1 | 138 / 6868 (2.01%) | 27945 / 2042671 (1.37%) | 1.88×10^−5 |
| | 2 | 151 / 5934 (2.54%) | 25111 / 1455121 (1.73%) | 5.28×10^−6 |
| | 3 | 204 / 6226 (3.28%) | 30808 / 1810086 (1.70%) | 1.52×10^−17 |
| +172 Da | 1 | 15 / 15 (100.00%) | 0 / 9431 (0.00%) | 3.11×10^−48 |
| | 2 | 1 / 1 (100.00%) | 1 / 11114 (0.01%) | 1.8×10^−4 |
| | 3 | 14 / 14 (100.00%) | 11 / 7365 (0.15%) | 2.77×10^−37 |
| +64 Da | 1 | 13 / 58 (22.41%) | 13 / 1620 (0.80%) | 1.84×10^−13 |
| | 2 | 4 / 48 (8.33%) | 6 / 1340 (0.45%) | 2.27×10^−4 |
| | 3 | 13 / 103 (12.62%) | 19 / 2119 (0.90%) | 3.58×10^−10 |

Table 5

Mass shifts occurring more frequently on C-terminal ends of proteins

| Mass shift | Replicate | C-terminal positions (modified / total) | All other positions (modified / total) | p-value |
|---|---|---|---|---|
| +129 Da | 1 | 165 / 703 (23.47%) | 78 / 10189 (0.77%) | 2.00×10^−142 |
| | 2 | 212 / 886 (23.93%) | 49 / 8541 (0.57%) | 1.98×10^−177 |
| | 3 | 82 / 502 (16.33%) | 59 / 8478 (0.70%) | 4.36×10^−67 |
| +16 Da | 1 | 39 / 205 (19.02%) | 14781 / 518680 (2.85%) | 7.88×10^−21 |
| | 2 | 42 / 245 (17.14%) | 10280 / 392408 (2.62%) | 7.76×10^−22 |
| | 3 | 22 / 103 (21.36%) | 21080 / 428969 (4.91%) | 5.13×10^−09 |
| +130 Da | 1 | 56 / 427 (13.11%) | 2 / 366 (0.55%) | 1.18×10^−14 |
| | 2 | 75 / 542 (13.84%) | 0 / 549 (0.00%) | 1.02×10^−24 |
| | 3 | 18 / 276 (6.52%) | 0 / 196 (0.00%) | 8.29×10^−05 |

Eight mass shifts were identified as Nt-biased after filtering (Table 4 and Fig. 4). The strongest Nt preference is displayed by the +42 Da mass shift, corresponding to N-terminal acetylation, with modified N termini representing 35–42% of total observed counts for positions with at least one +42 Da count. The remaining Nt-biased mass shifts fall into two broad categories. The first consists of rare modifications that occur at a small number of positions at high frequency, such as the +28 Da mass shift (possible retention of formylation on an N-terminal fMet, 12–32%), the +64 Da mass shift (possible modification by acetate, 8–22%), and the +172 Da mass shift (100% in all replicates). The second category consists of common modifications that occur at low frequency across a larger number of positions; this includes oxidation (+16 Da, 10–14%), most commonly of a retained Nt methionine, and protonation (+1 Da, 2–3%). A beneficial feature of our analysis is the ability of MODa to identify modified N-terminal residues even in the presence of un-annotated N-terminal methionine cleavages. For the protein SecB, for example, we recovered abundant N-terminal peptides that had undergone both N-terminal Met cleavage and putative acetylation at the penultimate N-terminal Ser residue (see Additional file 3), even though this protein is not annotated as having its N-terminal Met cleaved in the UniProt database.

Only three mass shifts were identified as Ct-biased after consistency filtering (Table 5 and Fig. 5). Two of these, +129 Da (16–24% of counts at C-terminal positions across the three replicates modified, compared to < 1% of counts at all other positions) and +130 Da (6.5–14% of counts at C-terminal positions across the three replicates modified, compared to < 1% of counts at all other positions), most likely correspond to the same modification, C-terminal addition of a glutamate residue. Interestingly, the third C-terminal mass shift is oxidation (+16 Da), which is observed to occur at high frequency (17–20% modified counts across replicates at C-terminal residues with at least one +16 Da modification, compared to 2.6–5% at all other modified positions) on C-terminal residues as well as N-terminal residues, although the C-terminal modification is observed for a smaller set of proteins.

The C-terminal glutamylation modification is especially interesting. The most frequent target for this modification is the C terminus of the 30S ribosomal protein S6 (RpsF), which is known to undergo post-translational modification with 1–4 glutamate residues (mass = 129 Da) [47]. The enzymatic addition of these Glu residues to S6 proceeds in a stepwise fashion, and any modification of two or more Glu residues would fall outside the range of mass shifts that were considered in our analysis, so it is likely that the mono-glutamylated S6 we observed only represents a subset of the total modified S6 present in our samples.
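The mass arithmetic behind this limitation is simple enough to check directly. The sketch below uses only the values quoted in the text (129 Da per Glu residue, up to four residues, and a ±200 Da search window); the function name is hypothetical.

```python
GLU_RESIDUE_DA = 129    # mass added per glutamate residue, as quoted in the text
SEARCH_WINDOW_DA = 200  # absolute mass-shift limit of the MODa search


def detectable_glu_additions(max_glu: int = 4) -> list:
    """Return the numbers of added Glu residues whose total shift fits the window."""
    return [n for n in range(1, max_glu + 1) if n * GLU_RESIDUE_DA <= SEARCH_WINDOW_DA]


print(detectable_glu_additions())  # [1]
```

Two, three, or four added residues correspond to shifts of +258, +387, and +516 Da, all beyond the search window, so only the mono-glutamylated form can be observed.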

We also identified a previously unreported C-terminal +129 Da modification of the stationary phase ribosomal stability factor RaiA / YfiA [22]. YfiA binds within the mRNA tunnel of the 30S subunit [48, 49], where it inhibits translation [48, 50] and prevents subunit dissociation and 100S dimer formation for a subset of ribosomes in stationary phase [51]. YfiA and S6 lie near one another within the 30S subunit, and both proteins’ C termini extend towards the same region of the 16S rRNA on the subunit surface (Additional file 4), although the modified C-terminal tails themselves are not resolved in the crystal structure. The temporal modification patterns of S6 and YfiA differ dramatically (Additional file 5). S6 levels of both total PSM counts and Ct +129-Da modified counts peak in mid-exponential phase, followed by a steep drop to a lower number of counts that is maintained through late stationary phase; the relative proportion of +129 Da modified counts remains nearly unchanged across all time points. In contrast, YfiA shows low or no counts of either modified or unmodified PSMs until the onset of stationary phase, when overall counts increase dramatically, accompanied by a low but constant level of C-terminal +129 Da modification through late stationary phase. The exponential phase enrichment we observed for the +129 Da mass shift is therefore due largely to changes in overall expression of its target proteins rather than differential modification.

Temporal patterns

The glucose starvation dataset used in our analysis is unique in the wide range of timepoints (3h–336h) that were sampled. Changes in abundance during different phases of the growth cycle in liquid culture have been observed for individual PTMs, but an unbiased examination of temporal variation in the global PTM profile has not been performed in E. coli. To identify mass shifts with significant frequency changes over the growth cycle, we first pooled four of our nine timepoint samples into exponential-phase samples (3h, 4h, 5h, and 6h; EXP) and four into stationary-phase samples (24h, 48h, 168h, and 336h; STA). (We did not include the 8h sample in this analysis.) We then grouped counts across modified amino-acid positions by mass shift–AA pairs and compared the ratio of modified:unmodified counts at all modified positions in the EXP and STA pools using Fisher’s exact test (FET) [52]. Mass shift–AA pairs were called as significant if their FET p-values passed a false-discovery rate filter (< 5% FDR by the Benjamini-Hochberg step-up procedure) in all three biological replicates. Because the two-tailed test does not indicate the direction of enrichment (i.e., EXP > STA or EXP < STA), we subsequently divided significant mass shift–AA pairs into EXP > STA or EXP < STA groups using the FET log-odds score.
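The multiple-testing filter and the direction call described above can be sketched as follows. This is an illustrative sketch rather than the authors' code: `benjamini_hochberg` returns adjusted p-values to compare against 0.05, and `log_odds` adds 0.5 to every cell (the Haldane-Anscombe correction) so that zero counts remain finite, which is an assumption of this example. The example counts are the replicate 1 rows of Tables 6 and 7, read as modified / total.

```python
from math import log
from typing import List


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def log_odds(mod_exp: int, tot_exp: int, mod_sta: int, tot_sta: int) -> float:
    """Log odds ratio of modification, EXP vs STA; a value > 0 means EXP > STA."""
    a, b = mod_exp + 0.5, tot_exp - mod_exp + 0.5
    c, d = mod_sta + 0.5, tot_sta - mod_sta + 0.5
    return log((a * d) / (b * c))


trp_oxidation = log_odds(285, 6836, 248, 8775)   # +16 Da on W, replicate 1
cys_hyperox = log_odds(15, 1604, 51, 1217)       # +48 Da on C, replicate 1
print(trp_oxidation > 0, cys_hyperox < 0)  # True True
```

A pair would be retained only if its adjusted p-value is below 0.05 in all three replicates, after which the sign of the log-odds score assigns it to the EXP > STA or EXP < STA group.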

We identified only a single mass shift that consistently shows significantly higher levels of modification in exponential phase across all three biological replicates, a +16 Da modification of tryptophan (3.78–4.33% of total counts at modified positions across the three biological replicates have the mass shift in exponential phase, 1.60–1.69% in stationary phase; Table 6). The behavior of this mass shift differs slightly across the three biological replicates: in biological replicates 1 and 2, the +16 Da Trp modification shows a spike in abundance near the exponential-stationary phase transition (8h), followed by a drop to near zero by mid-stationary phase (48h), while replicate 3 shows a spike of enrichment earlier in exponential phase (4h) followed by a steep drop-off at the 5h timepoint (Fig. 6).

Fig. 6

Abundance across all growth timepoints of tryptophan monooxidation, the sole mass shift with stronger modification in exponential phase. The plot shows the fractional modification N mod/(N mod+N unmod) across all nine time points, for positions having at least one W +16 Da modification at any time point. Individual traces show results for individual biological replicates 1 (solid lines), 2 (dashed lines), and 3 (dotted lines)

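Table 6

Mass-shift–amino-acid pairs with elevated frequency in exponential phase

| Mass shift | AA | Replicate | Exponential phase (modified / total) | Stationary phase (modified / total) | p-value |
|---|---|---|---|---|---|
| +16 Da | W | 1 | 285/6836 (4.17%) | 248/8775 (2.83%) | 5.55×10^−06 |
| | | 2 | 238/6182 (3.85%) | 141/7242 (1.95%) | 4.31×10^−11 |
| | | 3 | 282/6182 (4.56%) | 113/6414 (1.76%) | 9.13×10^−20 |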

We identified five mass shifts that consistently show significantly higher levels of modification in stationary phase across all three biological replicates: a +1 Da modification of asparagine (1.90–3.06% of total counts at modified positions have the mass shift in exponential phase, 2.95–4.60% in stationary phase); +42 Da modifications of serine, alanine, and threonine (29.78–31.71% EXP, 46.30–60.07% STA; 18.33–22.45% EXP, 34.81–46.46% STA; and 0.0–3.37% EXP, 9.46–15.70% STA, respectively); and a +48 Da modification of cysteine (0.94–1.09% EXP, 3.11–4.19% STA) (Table 7). As with the exponential-phase-biased mass shift, we observed different temporal patterns when timepoints are considered individually (Fig. 7). For example, the +1 Da asparagine modification and the +48 Da cysteine modification show steady increases across stationary phase, reaching their highest value at the latest stationary phase timepoint (336h), while the +42 Da modification to serine shows a more step-like increase in abundance near the onset of stationary phase, with abundance remaining fairly constant through the latest timepoints.

Fig. 7

Abundance across all growth timepoints of mass shifts with stronger modification in stationary phase. Each plot shows the fractional modification N mod/(N mod+N unmod) across all nine time points, for positions having at least one modification of the indicated type at any time point. Individual traces within each plot show results for individual biological replicates 1 (solid lines), 2 (dashed lines), and 3 (dotted lines). Mass shifts are ranked from top to bottom by p-value from the Fisher’s exact test for modification enrichment in stationary phase (STA>EXP; see Section “Temporal patterns” and Table 7), averaged across all three replicates, with the most stationary-phase-enriched (lowest p-values) at the top

Table 7

Mass-shift–amino-acid pairs with elevated frequency in stationary phase

| Mass shift | AA | Replicate | Exponential phase (modified / total) | Stationary phase (modified / total) | p-value |
|---|---|---|---|---|---|
| +1 Da | N | 1 | 1273/45738 (2.78%) | 2260/52371 (4.32%) | 2.49×10^−38 |
| | | 2 | 1133/31975 (3.54%) | 1663/36568 (4.55%) | 3.02×10^−11 |
| | | 3 | 988/40658 (2.43%) | 1313/43285 (3.03%) | 8.55×10^−8 |
| +42 Da | S | 1 | 176/585 (30.09%) | 322/534 (60.30%) | 2.08×10^−24 |
| | | 2 | 143/461 (31.02%) | 287/509 (56.39%) | 2.02×10^−15 |
| | | 3 | 195/597 (32.66%) | 269/570 (47.19%) | 4.77×10^−07 |
| +42 Da | A | 1 | 55/261 (21.07%) | 59/135 (43.70%) | 4.15×10^−06 |
| | | 2 | 55/202 (27.23%) | 68/125 (54.40%) | 1.22×10^−06 |
| | | 3 | 38/210 (18.10%) | 63/165 (38.18%) | 2.08×10^−05 |
| +48 Da | C | 1 | 15/1604 (0.94%) | 51/1217 (4.19%) | 1.18×10^−08 |
| | | 2 | 14/1285 (1.09%) | 30/770 (3.90%) | 4.80×10^−05 |
| | | 3 | 1/964 (0.10%) | 22/707 (3.11%) | 6.93×10^−08 |
| +42 Da | T | 1 | 0/244 (0.00%) | 57/363 (15.70%) | 3.54×10^−14 |
| | | 2 | 5/263 (1.90%) | 51/321 (15.89%) | 1.31×10^−9 |
| | | 3 | 11/326 (3.37%) | 40/423 (9.46%) | 1.11×10^−3 |

Preferential persistence of N-terminally acetylated proteins in stationary phase

The N-terminal bias and preference for serine, alanine, and threonine residues observed for the +42 Da mass shift strongly suggest that this modification corresponds to N-terminal Nα-acetylation. Although cotranslational N-terminal Nα-acetylation (NtAc) is widespread in eukaryotic proteins, the prevalence and physiological significance of this modification in prokaryotes are poorly understood. In E. coli, only five native proteins are known to possess an NtAc modification: the ribosomal proteins S5 (encoded by the rpsE gene), S18 (encoded by the rpsR gene), and L7/L12 (encoded by the rplL gene) [53]; elongation factor Tu (EFTu, encoded by the tufB gene) [54]; and the chaperone SecB [55]. In addition, a number of heterologous eukaryotic proteins are modified with an NtAc when overexpressed in E. coli [56–59].

We identified 44 Nt-acetylated proteins (Additional files 2, 3, 6, and 7) and were able to recover modified peptides from the known Nt-acetylation target SecB (Additional file 3) and a small number of peptides matching Nt-acetylated ribosomal protein S5 (Additional file 7) in our initial MODa dataset. The low peptide counts for S5, and the absence of modified PSMs for the other known (and highly abundant) targets, ribosomal proteins S18 and L7/L12 and EFTu, are likely due to the presence of tryptic cleavage sites within a few residues of the N-terminus in all four of these proteins (Nt-AHIEKQAGE for S5, Nt-ARYFRRRKF for S18, Nt-SITKDQIEE for L7/L12, and Nt-SKEKFERTK for EFTu). This means that most copies of these proteins present in our samples will produce N-terminal peptides too short to recover during subsequent liquid chromatography and MS/MS steps. Consistent with this interpretation, we were able to recover abundant peptides from non-N-terminal regions of all four of these proteins, and the small number of S5 N-terminal peptides that were recovered all resulted from missed cleavage events at the N-terminal-most cleavage site. Among the NtAc peptides that were recovered in our MODa dataset, the Nt fragment from SecB is by far the most frequently observed, representing 15–41% of the total Nt-acetylated peptides across the nine timepoints. In addition, six other proteins from our dataset were previously identified as Nt-acetylation targets in an enrichment-based analysis of N-terminal modifications in Pseudomonas aeruginosa [13] (see Table 8).
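The effect of these early cleavage sites can be illustrated with an in silico tryptic digest of the four N-terminal sequences quoted above. This sketch assumes the common rule that trypsin cleaves after K or R except before P; the function name `tryptic_peptides` is hypothetical, and the minimum peptide length recoverable by LC-MS/MS is not specified here.

```python
from typing import List


def tryptic_peptides(sequence: str) -> List[str]:
    """Cleave after K or R, except when the next residue is P (a common trypsin rule)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_is_pro = i + 1 < len(sequence) and sequence[i + 1] == "P"
        if residue in "KR" and not next_is_pro:
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # trailing fragment without a cleavage site
    return peptides


# N-terminal sequences as quoted in the text (after removal of the initiator Met).
n_termini = {"S5": "AHIEKQAGE", "S18": "ARYFRRRKF", "L7/12": "SITKDQIEE", "EFTu": "SKEKFERTK"}
first_peptides = {name: tryptic_peptides(seq)[0] for name, seq in n_termini.items()}
print(first_peptides)  # {'S5': 'AHIEK', 'S18': 'AR', 'L7/12': 'SITK', 'EFTu': 'SK'}
```

With complete digestion, every one of these N-terminal peptides is five residues or shorter, which is consistent with their poor recovery unless a cleavage site is missed.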

Table 8

Overlapping N-terminal Nα-acetylation targets between current data and P. aeruginosa [13]

| E. coli locus | Gene | E. coli N-terminal peptide | P. aeruginosa locus | P. aeruginosa N-terminal peptide | Protein |
|---|---|---|---|---|---|
| ECB_00686 | sucB | SSVDILVPDLPESVADATVATWHKK | PA14_44000 | MAIEIK | Dihydrolipoamide Succinyltransferase |
| ECB_03391 | dppF | STQEATLQQPLLQAIDLKK | PA14_58490 | METVLTAR | Dipeptide transporter ATP-binding subunit |
| ECB_00915 | rpsA | TESFAQLFEESLKE | PA14_23330 | SESFAELFEESLK | 30S ribosomal protein S1 |
| ECB_00155 | yadR | SDDVALPLEFTDAAANKV | PA14_08510 | SIETFTPTPLLFTPGAANK | Iron-sulfur cluster insertion protein ErpA |
| ECB_00183 | accA | SLNFLDFEQPIAELEAKI | PA14_23860 | SNWLVDKLIPSIMR | Acetyl-CoA carboxylase carboxyltransferase subunit alpha |
| ECB_03467 | secB | SEQNNTEMTFQIQRI | PA14_67720 | TEQATNGAADEQQPQFSLQR | Preprotein translocase subunit SecB |

We observed that NtAc-modified proteins are proportionally more heavily modified in stationary phase (Fig. 7). This pattern could be explained by (i) an increase in acetylation activity in stationary phase and/or (ii) a proportionally larger decrease in non-acetylated copies of a protein relative to acetylated copies in stationary phase. To differentiate between these scenarios, we plotted total PSM counts and NtAc-modified PSM counts for pooled NtAc-targeted proteins across all nine time points (Fig. 8). When all NtAc-targeted proteins are considered (Fig. 8, top left panel), the total number of PSMs stays approximately constant, while the number of NtAc-modified PSMs increases by nearly twofold in early stationary phase, consistent with scenario (i). However, NtAc-targeted proteins pooled by penultimate amino acid (Fig. 8) or individual NtAc-targeted proteins (Additional files 3, 6 and 7) show a mixture of both scenarios. NtAc-targeted proteins with a penultimate serine or threonine residue, for example, exhibit a pattern consistent with scenario (i), similar to the pattern for all targets (Fig. 8, top right and bottom left panels). Proteins with a penultimate alanine, however, show a slight increase in modified peptides at the onset of stationary phase, accompanied by a large drop in unmodified peptides (Fig. 8, bottom right panel). Many of the most heavily NtAc-modified proteins also show this pattern, such as LysS, SpeA, PdxH, and SecB (Additional file 3), and IlvA and KdgR (Additional file 7). This preferential retention of NtAc-modified peptides in stationary phase suggests that NtAc may play a role in protein stability by acting as an anti-degradation signal (see Discussion). A table of all Nt-acetylation sites recovered by MODa is included in Additional file 2.
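The distinction between scenarios (i) and (ii) can be made concrete with a small sketch that separates the change in the modified fraction into its two components. The PSM counts below are invented purely for illustration and are not values from this dataset; the function name `summarize` is likewise hypothetical.

```python
def summarize(total_before, modified_before, total_after, modified_after):
    """Compare modified and unmodified PSM counts between two time points."""
    unmod_before = total_before - modified_before
    unmod_after = total_after - modified_after
    return {
        "fraction_before": modified_before / total_before,
        "fraction_after": modified_after / total_after,
        "modified_fold_change": modified_after / modified_before,
        "unmodified_fold_change": unmod_after / unmod_before,
    }


# Hypothetical counts resembling scenario (i): total roughly constant,
# modified PSMs approximately double.
scenario_i = summarize(1000, 100, 1000, 200)

# Hypothetical counts resembling scenario (ii): modified PSMs nearly flat,
# unmodified PSMs drop sharply.
scenario_ii = summarize(1000, 100, 400, 110)

for label, s in (("(i)", scenario_i), ("(ii)", scenario_ii)):
    print(label, {k: round(v, 3) for k, v in s.items()})
```

In both toy cases the modified fraction rises, but only the fold changes of the modified and unmodified counts reveal which component drives the increase, which is why total and modified counts are examined separately.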

Fig. 8

N-terminal +42 Da modified proteins are preferentially retained in stationary phase. Plots show unmodified (green) and +42 Da modified (blue) PSM counts for all N-terminal positions possessing at least one +42 Da modification at any time point, averaged across the three biological replicates. Shown are total counts (a), counts for peptides with a penultimate (i.e. following a cleaved N-terminal methionine) serine residue (b), a penultimate threonine residue (c), and a penultimate alanine residue (d)

Asparagine deamidation is strongly enriched in very late stationary phase

An interesting temporal pattern was also identified for the +1 Da modification of asparagine residues, which increases in frequency throughout stationary phase and peaks at the last timepoint (336 h) (Fig. 7 and Additional file 8). A +1 Da modification occurring on an asparagine residue is known to be a signature of nonenzymatic asparagine deamidation, in which a backbone nitrogen initiates a nucleophilic attack on the amide carbon of the asparagine side chain (or the asparagine amide nitrogen on the backbone carbonyl carbon) to form a cyclic succinimide intermediate [60–62]. This intermediate can then resolve by hydrolysis to either convert the original asparagine to an aspartate residue, or rearrange to form an isopeptide linkage through isoaspartate; both of these events result in a +1 Da mass shift.
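As a worked check of the nominal +1 Da shift, the sketch below computes the monoisotopic mass change for converting an asparagine side-chain amide to a carboxylic acid (loss of N and H, gain of O) and compares it with the spacing of a 13C isotopic peak. The atomic masses are standard monoisotopic values; the 2000 Da peptide mass is a hypothetical example, and nothing here is meant to describe the mass accuracy of the instrument used for this dataset.

```python
# Monoisotopic atomic masses in Da (standard values).
H = 1.00782503
N = 14.00307401
O = 15.99491462
C12 = 12.0
C13 = 13.00335484

# Asn (-CONH2) -> Asp or isoAsp (-COOH): lose N and H, gain O.
deamidation_shift = O - N - H

# Replacing one 12C with one 13C in a peptide.
isotope_shift = C13 - C12


def ppm_separation(peptide_mass):
    """Separation between the two shifts, in ppm of the peptide mass."""
    return abs(isotope_shift - deamidation_shift) / peptide_mass * 1e6


print(f"deamidation: +{deamidation_shift:.5f} Da")
print(f"13C isotope: +{isotope_shift:.5f} Da")
print(f"separation at 2000 Da (hypothetical): {ppm_separation(2000.0):.2f} ppm")
```

The two shifts differ by only about 0.019 Da, which is why a search reporting integer-like mass shifts can group 13C peak selection and genuine deamidation under the same +1 Da label.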

We found that +1 Da modifications were the most frequently observed modification in our dataset. They likely result from a variety of sources, most importantly isotopic mass shifts from 13C-containing peptides. While our dataset certainly contains peptides with +1 Da modifications resulting from isotopic peak shifts, two observations support our hypothesis that a significant fraction of +1 Da modifications to Asn are due to deamidation. First, +1 Da modifications from isotopic mass shifts are expected to be more prevalent on peptides with higher m/z values. There is indeed a general correlation between peptides with high m/z values and peptides called by MODa as having a +1 Da mass shift; while the median m/z value for all peptides is 808.04, that for +1 Da modified peptides is 1001. However, while median m/z values for peptides with +1 Da modifications to all amino acids except cysteine vary between 980 and 1065, the Asn +1 Da peptides stand out as having a lower median of 931.4, and have an m/z distribution that is significantly different from the overall +1 Da m/z distribution (Kolmogorov-Smirnov test, p = 2.48 × 10^−251). This finding is consistent with Asn +1 Da peptides being a mixture of 13C-peak selection artifacts and genuine Asn deamidation modifications. Second, high-PSM-count Asn +1 Da modifications, but not +1 Da modifications to other amino acid types, are enriched for glycine, serine, and asparagine residues at the amino acid position following the modified Asn (Additional file 9), a pattern that is consistent with known sequence preferences for Asn deamidation [63].
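For readers unfamiliar with the Kolmogorov-Smirnov comparison, the following standard-library sketch computes the two-sample KS statistic, the largest vertical gap between two empirical cumulative distributions. The text does not state which software was used for the test, and the m/z lists here are invented toy values, so this illustrates the statistic only and does not reproduce the reported p-value.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: max |ECDF_a(x) - ECDF_b(x)| over observed x."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in set(a) | set(b):
        ecdf_a = bisect_right(a, x) / len(a)  # fraction of a that is <= x
        ecdf_b = bisect_right(b, x) / len(b)  # fraction of b that is <= x
        d = max(d, abs(ecdf_a - ecdf_b))
    return d


# Toy m/z values (hypothetical, for illustration only).
asn_plus1_mz = [700.2, 850.4, 910.1, 931.4, 1010.7]
all_plus1_mz = [880.3, 960.5, 1001.0, 1040.2, 1120.9]

print(f"KS statistic: {ks_statistic(asn_plus1_mz, all_plus1_mz):.2f}")
```

A statistic near 0 means the two distributions are nearly indistinguishable, while a value near 1 means they barely overlap; converting the statistic to a p-value additionally depends on the two sample sizes.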

Although asparagine deamidation can occur spontaneously as an experimental artifact during preparation of proteomic samples [64], a number of lines of reasoning suggest that at least a subset of the modifications we observe were present in the samples prior to processing. First, we observe a nearly identical pattern of increasing Asn +1 Da modification across all three of our biological replicates (Fig. 7). For each replicate, all timepoints were collected from a single set of cultures started on the same day; each biological replicate was grown independently (on a different day) from the others; and all timepoints from a single replicate were processed for proteomic analysis in parallel. The bulk of nonenzymatic deamidation during proteomic sample prep has been shown to occur during tryptic digest [64], with both longer incubation time and basic pH increasing the occurrence of deamidated peptides. The samples used to generate our data were treated with a short tryptic digest (5 h) in the presence of near-neutral buffer (50 mM Tris, pH 8.0), conditions that should limit spontaneous asparagine deamidation. In addition, both pH and incubation time should be identical across all timepoints (because samples were processed together), and any nonenzymatic deamidation should therefore also be constant across timepoints. The pattern does not appear to be explained simply by increased expression of modified proteins during stationary phase, as it is observed even for individual modifications that have high abundance in both exponential and stationary phases (Additional file 8). A table of all putative asparagine deamidation sites recovered by MODa is included in Additional file 2.

Oxidative modifications of methionine and tryptophan are variable across biological replicates

Oxidation (+16 Da) modifications, particularly of methionine, are very common in our data, but with the exception of +16 Da modification of tryptophan residues (Fig. 6), oxidations in general are not identified as having a significant bias for either exponential or stationary phase. Both Met +16 Da and Trp +16 Da show significant variability among the three biological replicates, with replicates one and two showing a similar pattern of relative modification enrichment over time, while replicate three has a different pattern (Additional files 10 and 11). In addition, for both modifications replicates one and two show a peak of modified peptide counts centered at or near the 8 h time point (the exponential-stationary phase transition; this timepoint was excluded from our initial comparisons of stationary vs. exponential enrichment), with the proportion of modified PSMs then decreasing to early-exponential-phase levels or below by 24 h.

The reason for the discrepancy between the third replicate and the two others is unclear; the third replicate was prepared at a later date than the first two replicates, so it is likely that much of this variability is due to batch effects. This observation in combination with the common occurrence of oxidative modifications as experimental artifacts [65] makes it difficult to draw any biological conclusions from the temporal patterns of oxidative modifications. Because all samples within each replicate were prepared in parallel, we would expect any artifactual modifications to covary across samples in a replicate; our observation of within-replicate variance correlated across at least two samples is therefore difficult to explain unless some samples have a higher intrinsic rate of artifactual oxidation, or some amount of genuine biological variation is present.

We observe the discrepancy among replicates only for oxidative modifications and not for other modified peptide counts or overall peptide levels, so one possibility is that a difference in redox conditions in sample processing influenced the number of oxidized peptides that were recovered. Differential modification in the third replicate is apparent in the temporal modification patterns of individual target sites (Additional files 12 and 13), but does not display a consistent pattern across sites. Tables of all methionine and tryptophan oxidation sites are included in Additional file 2.
