Wednesday, 25 March 2015

Tutorial: estimating the stability effect of a mutation with FoldX

Introduction:

Here is a brief tutorial on how to use FoldX to estimate the stability effect of a mutation in a 3D structure. The stability (ΔG) of a protein is defined by the free energy, which is express in kcal/mol. The lower it is, the more stable it is. G is difference of free energy between a wild-type and mutant. A mutation that brings energy (ΔΔG>kcal/mol) will destabilise the structure, while a mutation that remove energy (ΔΔG<kcal/mol) will stabilise the structure. A common threshold is to say that a mutation has a significant effect if ΔΔG is >1 kcal/mol, which roughly corresponds to a single hydrogen bond.

A way to compute the free energy is to use molecular dynamics. Main problem: it can be very time-consuming.

FoldX uses an empirical method to estimate the stability effect of a mutation. The executable is available here: http://foldx.crg.es/

NB: I strongly encourage to read the manual (before or in parallel of this tutorial).

Foldx was used in many studies, i.e.:
Tokuriki N, Stricher F, Serrano L, Tawfik DS. How protein stability and new functions trade off. PLoS Comput Biol. 2008 Feb 29;4(2):e1000002 http://dx.doi.org/10.1371/journal.pcbi.1000002

Dasmeh P, Serohijos AW, Kepp KP, Shakhnovich EI. Positively selected sites in cetacean myoglobins contribute to protein stability. PLoS Comput Biol. 2013;9(3):e1002929. http://dx.doi.org/10.1371/journal.pcbi.1002929
And I personally used it in three of my studies:
Studer RA, Christin PA, Williams MA, Orengo CA. Stability-activity tradeoffs constrain the adaptive evolution of RubisCO. Proc Natl Acad Sci U S A. 2014 Feb 11;111(6):2223-8. http://dx.doi.org/10.1073/pnas.1310811111
Studer RA, Opperdoes FR, Nicolaes GA, Mulder AB, Mulder R. Understanding the functional difference between growth arrest-specific protein 6 and protein S: an evolutionary approach. Open Biol. 2014 Oct;4(10). pii: 140121. http://dx.doi.org/10.1098/rsob.140121

Rallapalli PM, Orengo CA, Studer RA, Perkins SJ. Positive selection during the evolution of the blood coagulation factors in the context of their disease-causing mutations. Mol Biol Evol. 2014 Nov;31(11):3040-56. http://dx.doi.org/10.1093/molbev/msu248

Example:

The structure is a bacterial cytochrome P450 (PDB:4TVF). You can download it PDB file (4TVF.pdb) from here: http://www.rcsb.org/pdb/explore.do?structureId=4TVF

We would to test the stability of mutatinf the leucine (L) at position 280 to an aspartic acid (D).

Here is the original structure, with Leu280 in green, and residues around 6Å in yellow:



FoldX has different modes to run it, but I use the mode "runfile", which contains COMMANDS and OPTIONS in one single file.


FoldX works in two steps:


1) Repair the structure.

There are frequently problems in PDB structures, like steric clashes. FoldX will try to fix them and lower the global energy (ΔG). The command "RepairPDB" is better than the "Optimize" command. Here is the command file "foldx_repair.txt":
<TITLE>FOLDX_runscript;
<JOBSTART>#;
<PDBS>4TVF.pdb;
<BATCH>#;
<COMMANDS>FOLDX_commandfile;
<RepairPDB>#;
<END>#;
<OPTIONS>FOLDX_optionfile;
<Temperature>298;
<R>#;
<pH>7;
<IonStrength>0.050;
<water>-CRYSTAL;
<metal>-CRYSTAL;
<VdWDesign>2;
<OutPDB>true;
<pdb_hydrogens>false;
<END>#;
<JOBEND>#;
<ENDFILE>#;
We indicate which PDB file it needs to use, that we want to repair it (<RepairPDB>), that it will use water and metal bonds from the PDB file (<water>-CRYSTAL; <metal>-CRYSTAL;) and that we want a PDB as output (<OutPDB>true).

You have to run the command file "foldx_repair.txt" like this:
foldx3b6 -runfile foldx_repair.txt
This process is quite long (around 10 minutes). Here is the result (the original structure is now in white, while the repaired structure is in yellow/green):

 We can see that some side chains have slightly move (in particular Phe16).

The starting free energy ΔG was 64.99 kcal/mol, and it was lowered to -48.15 kcal/mol, which is now stable (remember that a "+" sign means unstable while a "-" sign means stable).

Once it's finished, it will produce a file named "RepairPDB_4TVF.pdb", which you will use in the next step.


2) Perform the mutation

The mutation itself is perform by the BuildModel function. There are other methods, but the BuildModel is the most robust. You need also to specify the mutation in a separate file "individual_list.txt".

In the command file, you will see that is RepairPDB_4K33.pdb and not 4K33.pdb that is mutated. You will also notice "<numberOfRuns>3;". This is because some residues can have many rotamers and could have some convergence problems. You may to increase this values to 5 or 10, in case you are mutated long residues (i.e. Arginine) that have many rotamers.

Here the command file "foldx_build.txt":
<TITLE>FOLDX_runscript;
<JOBSTART>#;
<PDBS>RepairPDB_4TVF.pdb;
<BATCH>#;
<COMMANDS>FOLDX_commandfile;
<BuildModel>#,individual_list.txt;
<END>#;
<OPTIONS>FOLDX_optionfile;
<Temperature>298;
<R>#;
<pH>7;
<IonStrength>0.050;
<water>-CRYSTAL;
<metal>-CRYSTAL;
<VdWDesign>2;
<OutPDB>true;
<numberOfRuns>3;
<END>#;
<JOBEND>#;
<ENDFILE>#;
and the "individual_list.txt" (just one line):
LA280D;
It contains the starting amino acid (L), the chain (A), the position (280) and the amino acid you want at the end (D). One line correspond to one mutant. It means you can mutate many residues at the same per line (mutant) and also  produce different mutants by different numbers of lines.

You can run it by:
foldx3b6 -runfile foldx_build.txt
It is much faster this time (i.e. a few seconds) and will produce many files.

FoldX will first mutate the target residue (L) to itself (L) and move it as well as all neighbouring side chains multiple times. We can see that Leu280 (green) was rotated:

=> This is will give the free energy of the wild-type (let's call it ΔGwt).

Then, it will mutate the target residue (L) to the desired mutant (D) and move it as well as all neighbouring side chains multiple times. We can see that Leu280 is mutated to Asp280 (see the two oxygen atoms in red):


=> This is will give the free energy of the mutant (let's call it ΔGmut).



The difference in free energy (ΔΔG) is given by ΔGmut-ΔGwt.

In the file "Raw_BuildModel_RepairPDB_4TVF.fxout", you can retrieve the energy of the three runs for both WT and Mutant.

Run1:
  • ΔGmut = RepairPDB_4TVF_1_0.pdb = -41.1377 kcal/mol
  • ΔGwt = WT_RepairPDB_4TVF_1_0.pdb = -46.0464 kcal/mol
  • => ΔΔG = ΔGmut-ΔGwt = (-41.1377)-(-46.0464) = +4.9087 kcal/mol

One file contains the average difference over all runs: "Average_BuildModel_RepairPDB_4K33.fxout".
You will notice that the difference in free energy ΔΔG is +4.84 kcal/mol (+- 0.06 kcal/mol).

=> It means the mutation L280D is highly destabilising (positive value, and much above 1.0 kcal/mol). Here is the final mutant:


PS: Another way to define the threshold is to use the SD deviation multiple times:

The reported accuracy of FoldX is 0.46 kcal/mol (i.e., the SD of the difference
between ΔΔGs calculated by FoldX and the experimental values). We can bin the ΔΔG values into seven categories:
  1. highly stabilising (ΔΔG < −1.84 kcal/mol); 
  2. stabilising (−1.84 kcal/mol ≤ ΔΔG < −0.92 kcal/mol); 
  3. slightly stabilising (−0.92 kcal/mol ≤ ΔΔG < −0.46 kcal/mol); 
  4. neutral (−0.46 kcal/mol < ΔΔG ≤ +0.46 kcal/mol);
  5. slightly destabilising (+0.46 kcal/mol < ΔΔG ≤ +0.92 kcal/mol);
  6. destabilising (+0.92 kcal/mol < ΔΔG ≤ +1.84 kcal/mol);
  7. highly destabilising (ΔΔG > +1.84 kcal/mol).



35 comments:

  1. Hi, I have saved the PDB file and the foldx_repair.txt file in the FoldX folder. When I tried to execute the runfile command, I keep getting a return message that says enter a valid runfile. Do you know why this is so? Am I missing a step?

    ReplyDelete
    Replies
    1. Hi,

      It should work. I just tried it and it works on my computer (under MacOSX).

      Please send me back your files, I could see if something is wrong.

      Delete
    2. Dear Himani,


      Thank you for your message.

      A quick question before I can answer you: which version of FoldX do you use?

      Because this tutorial is made for FoldX3, and they recently released FoldX4: http://foldxsuite.crg.eu/

      And apparently, the input format is totally different.

      Best regards,
      Romain

      Delete
    3. Dear Romain

      Thanks for your reply. I am using FOLDX 3 ....I also downloaded FOLDX 4 but its not working on my Mac.

      I just figured out the reason of the error, I did not enter full path to the pdb. And now I did, FOLDX is able to read the pdb. But now it cannot find rotabase.txt. Now I am not sure where in the code can I enter the path of rotabase.txt?

      Thank you very much again for your prompt reply and your help.

      Delete
    4. The file "rotabase.txt" has to be in the same folder as your other files.

      Which is not explained in this tutorial. I will correct that, thanks!


      Romain

      Delete
    5. Dear Romain

      The file is in the same folder as other files. But I still need to tell FOLDX about its absolute path, otherwise it cannot find it and shows the following error:

      Make a choice by entering a number.
      1 - Single Run
      2 - Batch Run
      3 - Run File
      4 - Default Single
      5 - Default Batch
      9 - end
      : 3

      enter a valid runfile > /Users/eccroot/Dropbox/Myostatin/FOLDX_example/foldx_repair.txt

      Start time of FoldX: Fri Oct 23 11:24:15 2015

      /Users/eccroot/Dropbox/Myostatin/FOLDX_example/4TVF.pdb
      1 models read.
      Error, can't open rotabase.txt
      logout

      [Process completed]

      Could you please suggest how to rectify this?

      Delete
    6. Could you contact me by email ? rstuder [at] ebi.ac.uk

      Delete
    7. Yes Sure. I have sent you an email.

      Delete
  2. Hi,

    Thanks for your tutorial. Do you know if there are other softwares like foldx to mutate a protein in all positions by all amino acids. I think that foldx is great for this, at least in the aspect of time it takes to do it (around 7-8 hours, if the protein has around 100 positions).

    Im looking to do it with another software to compare the results with some experimental results. Would you say that foldx is the best sofware to use? Do you know SCWRL4? Is it better?

    Thank you

    ReplyDelete
    Replies
    1. Hi,

      FoldX was the best tool for this task, as it is fast and scriptable. Still, it is not perfect, as it is based on empirical model.

      I didn't look recently if other tools improved over FoldX.

      But Rosetta could be a tool to try, I heard it is quite good too (however I don't how speed it is).

      I don't know SCWRL4, sorry.
      Romain

      Delete
    2. Thank you for your answer. I do need a scriptable prediction software. I took a look at some homology based predictors like swiss model but my impression is that they were not created for highthroughput mutational studies (not scriptable).
      I will try Rosetta, thank you for the advice.

      Rodrigo

      Delete
  3. Hello- Thank you for such use full tutorial. It was very helpfull. My doubt is how to find change in fold of mutant over wild type. I basically want to find the structural change with mutation which in my case mutation should cause complete change in fold of the protein.

    ReplyDelete
  4. Hello- Thank you for such use full tutorial. It was very helpfull. My doubt is how to find change in fold of mutant over wild type. I basically want to find the structural change with mutation which in my case mutation should cause complete change in fold of the protein.

    ReplyDelete
  5. Hey,
    Nice tutorial. I'm using foldx4 and mutated several aminoacids in my protein of interest. However, I just receive an empty output file after each run - I know, they state in the foldx manual that DDG's below 0,05 will be excluded (but analyzing 10 aa for all possibilities and not receiving anything?). However, when I run the stability command and calculate the DDG by myown I get many values > 0.05.
    Did I miss a flag for the positionscan command? Does anyone have an idea whats happening?

    ReplyDelete
    Replies
    1. Hi,

      My tutorial is based on FoldX3. They totally redesigned the interface and parameter files for FoldX4. Did you converting my command lines to the new parameter file system?

      (It is in my todo list, update the tutorial to FoldX4).

      Delete
    2. Hey,
      Yes I did (I also tried several other things to force some kind of output - like mutations which should make a mess of the protein)
      A nice control would be to test whether I get the same results with FoldX3 - But it's not online anymore ...

      Delete
  6. Can we run these scripts for Multiple Mutation in a single PDB file?

    ReplyDelete
    Replies
    1. Yes. Just put them in the individual_file.txt on the same line separated by a comma. It is well described in the documentation.

      Delete
    2. Sir, can you please help me out. After giving 1 protein as an input, followed by the command to the screen just disappears so can you please help me solving this problem.

      Delete
    3. Dear Mahendra Varma,

      I am sorry I cannot help there, your question is too vague. This tutorial is for FoldX3, FoldX4 uses a different user interface. Have a look in the manual to see how to use the new interface. And for your technical question, please contact FoldX authors.

      Romain

      Delete
  7. Hi Romain;
    May i ask if you know how to make the input of AnalyseComplex in foldx4 a list of pdbs instead of just one. I have a list of the pdbs from BuildModel ( WT and mutants) and i do not want to input each file at a time for AnalyzeComplex??

    ReplyDelete
    Replies
    1. Hi Ahmed,

      You can create a file "pdblist" which will contains the list of all your PDBs. In the In-Out parameters:
      http://foldxsuite.crg.eu/command/AnalyseComplex
      http://foldxsuite.crg.eu/parameter/pdb-list

      Alas, the manual for FoldX4 is really poor in example.

      Delete
  8. Update: I just updated the tutorial for FoldX4: http://evosite3d.blogspot.co.uk/2016/04/tutorial-estimating-stability-effect-of.html

    ReplyDelete
  9. I am facing some problems as follows

    """Starting BuildModel
    I did not find the wild-type DNA or PROTEIN sequence in the X-tal structure
    There was a problem in BuildModel
    in mutant_file.txtt found: H40A,H87A;""""

    Can you help please guide me in solving them...Thanks in Advance

    ReplyDelete
    Replies
    1. It looks like your forgot to specify the chain in the mutation file.
      http://foldxsuite.crg.eu/parameter/mutant-file
      "In this mode the file individual_list.txt contains the mutations we want to make separated by a comma, in the classical FoldX way (WT residue, chain, residue number, mutant residue). It is required than the name of the file containing the mutations starts by individual_list."


      Romain

      Delete
    2. I have deleted all the extra chains from pdb file....and tried to execute the effect of mutations on on single chain stability....can you please mention...How to mention chain in mutant file....Please provide me with some examples....

      Delete
    3. There are examples provided there:
      http://foldxsuite.crg.eu/parameter/mutant-file

      In this mode the file individual_list.txt contains the mutations we want to make separated by a comma, in the classical FoldX way (WT residue, chain, residue number, mutant residue).

      Delete
    4. I got it...thanks for your kind help....But unfortunately i am facing another issue....i am error as follows
      Residue to Mutate ASNA30 has residue index 1
      terminate called after throwing an instance of 'std::out_of_range'
      what(): basic_string::basic_string: __pos (which is 2) > this->size() (which is 0)
      Aborted (core dumped)

      can you please tell me where i am doing mistakes...how to overcome this issues..

      Delete
    5. It is probably a problem the numbering of residue in your PDB file, versus the numbering of residues in the sequence. Please check that first.

      If it doesn't work, I would suggest you directly contact the FOLDX team, as it seems more a technical problem: http://foldxsuite.crg.eu/about


      Romain

      Delete
  10. Hi, Romain

    Do you know any way to put a residue with an "insertion code" into an individual_list mutation file?

    --Ted

    ReplyDelete
    Replies
    1. Hi Ted,

      No, I don't know, sorry. Do you have an example about this insertion code?

      Romain

      Delete
    2. Hi, Romain -

      this is a small section of a pdb file (allegedly from 5FYK, but actually from another researcher)

      ATOM 6813 N GLY G 321 54.901 13.289 -37.867 1.00100.21
      ATOM 6814 CA GLY G 321 55.409 12.916 -39.174 1.00 88.91
      ATOM 6815 C GLY G 321 54.374 12.159 -39.984 1.00119.62
      ATOM 6816 O GLY G 321 54.163 12.446 -41.163 1.00 90.92
      ATOM 6817 N GLU G 321A 53.721 11.193 -39.341 1.00119.98
      ATOM 6818 CA GLU G 321A 52.680 10.403 -39.988 1.00116.31
      ATOM 6819 C GLU G 321A 51.783 9.720 -38.963 1.00112.32
      ATOM 6820 O GLU G 321A 51.960 9.890 -37.757 1.00111.80
      ATOM 6821 CB GLU G 321A 53.294 9.358 -40.916 1.00117.91
      ATOM 6822 CG GLU G 321A 52.736 9.385 -42.325 1.00119.77
      ATOM 6823 CD GLU G 321A 51.759 8.257 -42.579 1.00124.80
      ATOM 6824 OE1 GLU G 321A 50.567 8.416 -42.242 1.00124.21
      ATOM 6825 OE2 GLU G 321A 52.180 7.217 -43.128 1.00129.59
      ATOM 6826 N ILE G 322 50.814 8.956 -39.463 1.00116.63
      ATOM 6827 CA ILE G 322 49.865 8.226 -38.625 1.00115.96

      I would like to mutate the residue at 321A but a substitution token like "QG321AP" (GLU to PRO) is rejected as an error by the foldx BuildModel command.

      I noticed this section of a pdb file syntax document:

      22 Character chainID Chain identifier.
      23 - 26 Integer resSeq Residue sequence number.
      27 AChar iCode Code for insertion of residues.

      Which appears to claim a letter in column 27 is legal.

      Thanks very much for your attention.

      --Ted

      Delete
    3. Hi Ted,

      I finally have a look at your problem, and this is one of the oddities of the PDB format. To be honest, I am not sure how to handle that with FoldX. I would recommand to directly contact the authors of this software. Sorry for not be more helpful.

      Best regards,
      Romain

      Delete
  11. Dear Sir,
    I am trying to give trajectory of pdbs as an input for computing the stabilty using foldx. But I dont know how to provide the multiple pdbs with your script. I would be much thankful if you do help me out.

    with regards
    kesavan

    ReplyDelete
    Replies
    1. Dear Kesavan,

      This tutorial is FoldX3. FoldX4, the current version, has a totally different interface. I updated this tutorial here:
      https://evosite3d.blogspot.co.uk/2016/04/tutorial-estimating-stability-effect-of.html


      The interface of FoldX is now as a command line interface, with various parameters.

      For example, to compute the global stability of one structure, you could run it like:
      ./foldx --command=RepairPDB --pdb=1bsx.pdb --ionStrength=0.05 --pH=7 --water=CRYSTAL --vdwDesign=2 --pdbHydrogens=false

      ./foldx --command=RepairPDB --pdb=1nq7.pdb --ionStrength=0.05 --pH=7 --water=CRYSTAL --vdwDesign=2 --pdbHydrogens=false;


      As it is in command line, it is easy to incorporate it in a script, or in a for-loop.

      Just to check:
      for i in *.pdb; do ls $i; done
      1bsx.pdb
      1nq7.pdb

      The core command:
      for i in *.pdb; do ./foldx --command=Stability --pdb=$i --ionStrength=0.05 --pH=7 --water=CRYSTAL --vdwDesign=2 --pdbHydrogens=false; done

      Which output the result file:
      ls *.fxout
      1bsx_0_ST.fxout
      1nq7_0_ST.fxout


      You have to tweak the command with your files based on how you based your numbering to record the files from the trajectory.

      Best regards,
      Romain


      Best regards,
      Romain

      Delete