HOW TOs

Please read the disclaimer before you move on.

How is undergraduate research defined?

According to the Council on Undergraduate Research, "Undergraduate research is an inquiry or investigation conducted by an undergraduate that makes an original intellectual or creative contribution to the discipline."

What does an undergraduate researcher do (in this research group)?

Your research in this group will emphasize computational physics. Depending on your particular project, your experience may also include research in chemistry, biology, and computer science. Your work may include any of the following:

  • reading about and understanding ethics in research
  • searching the literature for published results on a topic
  • reading scientific literature
  • writing code
  • preparing and performing calculations
  • analyzing results
  • discussing results and their implications
  • designing some aspect of the project
  • communicating orally (informal and formal presentations)
  • communicating in writing (a formal end-of-semester report)
As part of your research experience you will:
  • work independently
  • use careful and reproducible techniques
  • strive to produce a significant finding
  • own the project

Advice for being successful in undergraduate research

Engaging in a research project as an undergraduate student can be a highly rewarding (and demanding!) activity. Research has shown that undergraduate students benefit significantly from such an experience. However, research is a complex activity that requires behaviors you may not have practiced before, such as extreme levels of organization, perseverance, and attention to detail. The following pieces of advice are based on experience and on other references, including "How to mentor undergraduate researchers," published by the Council on Undergraduate Research.

  • Always engage in ethical behaviors in research
  • Keep communication channels open with your advisor and honestly inform her/him about your progress
  • Communicate in a professional manner with your advisor
  • Be extremely careful and pay attention to every detail, no matter how time-consuming that is
  • Remember that quality is a lot more important than quantity
  • Try to find blocks of several hours for doing research rather than isolated one-hour periods
  • Have a good work ethic and work consistently on your project
  • Be organized, keep good and detailed records
  • Be persistent!
You may also find some helpful information here:

How to behave ethically in research

This is the most important requirement while undertaking a research project. Unethical behavior in research significantly undermines the confidence our society has in science. The physical and mental well-being of human beings depends on the advancement of science; if our society loses its confidence in science, the well-being of society itself is at stake. In addition, unethical behavior can directly affect human health (see, for example, "Tampered data cast shadow on drug trial"). The following are some resources where you can learn about ethics in research, along with some other interesting articles:

How to keep good records of your work

During your research experience, you will likely create a lot of data. To be able to repeat your research (either experiments or simulations), communicate it to others, and analyze your data, it is paramount that you keep an organized record of your work. These are some resources that may be helpful and/or interesting:

To search for published articles and books

To find a specific article (assuming you have the full reference), go to your institution's library website. Find out how to search for a specific journal and type the name of the journal in which the article was published. If your institution has electronic access to that journal, go to the journal's website by clicking the appropriate link, enter the volume and page numbers in the appropriate fields, and you should be able to access the PDF of the article. If your institution does not have electronic access to that journal, it may still have physical copies (this should show up as an option in your search results). In that case, you will have to go to the library (yes, the actual building) and make photocopies of the article. If your institution has no access to the journal at all, use the Interlibrary Loan process to obtain the article.

Computational Sciences

Computational sciences now contribute to the advancement of science on an equal footing with experiment and theory. Computations complement experiments and theory in ways that let us understand even more about the natural world. For example, simulations can give scientists insight into natural phenomena that experiments cannot provide. When experiments are too time-consuming or expensive, simulations may provide the required results faster and at lower cost.

Keep in mind, though, that computational models need to be validated before they are used to predict any physical quantity. This can be done by comparing the results of the new computational method with those of an already validated computational method or with good-quality experimental data.

You can read more about computational sciences here:

To write a manuscript/report for your research (including an abstract)

To prepare for a research presentation

Giving a presentation about your research is a wonderful experience for undergraduate students. These are some things to keep in mind:

  • Dress appropriately and comfortably (you will be standing for a few hours)
  • Stand next to your poster for the whole time period (have a bottle of water with you)
  • If you see someone looking at your poster, feel free to greet them and ask whether they would like you to explain your work
  • Alcoholic drinks are sometimes offered during poster sessions. If you are a presenting author, do not drink alcohol during your presentation; it is not professional behavior
These are some links with helpful information:

To benefit from a scientific conference/meeting

Attending a scientific conference is a great experience for undergraduate students. You can maximize your gains from attending a conference by keeping a few things in mind.

  • Conferences may include sessions designed specifically for undergraduate students. These sessions may include talks/workshops about careers, graduate schools, working in industry or academia etc.
  • Find interesting invited talks to attend. Invited talks are sometimes longer than regular talks and are almost always given by experts in their field.
  • Look at the topics of each session and if you find something interesting look further into the titles and abstracts of talks from that session. Make a schedule and attend the talks you find most interesting.
  • Attend symposia/workshops/plenary sessions.
  • Attend the poster sessions (including the Sci-Mix ACS poster session), walk around, find interesting posters, and talk with the presenting authors.

To prepare a document with Latex

Latex is a typesetting program that you can use to create professional documents. It works very differently from Microsoft Word: what you see in your Latex file is not what you get. You create a Latex input file that contains the text as well as your instructions for formatting it, and then you compile the input file to obtain first a DVI file and then a PDF (Portable Document Format) file. If there are typos or mistakes in the formatting instructions of your input file, Latex will not be able to compile it, and you will have to fix those errors to obtain the PDF file. Even though there is a learning curve, Latex lets you create professional documents of much higher typographic quality than word processors typically produce.

First you need to create the Latex input file (for example, main.tex). An example of a simple Latex input file can be found here. You will also need to download a figure. Make a folder, for example "REPORT", and then make a folder "FIGURES" inside the REPORT folder. Save the figure in the FIGURES folder and the main.tex file in the REPORT folder. This example contains instructions for creating a simple document with a table and a figure. Read it carefully (you can open it with gedit) and try to understand what each line/command does.

Once your input file is ready, you will need to compile your document. To do this, open a terminal and go to the directory that contains the input file. We suggest keeping the input file in a separate folder, because compiling it with Latex creates several other files. To compile your input file, run the command "latex main.tex". If the input file compiled successfully, you are returned to the terminal prompt. Run the same command a second time (the second pass lets Latex resolve cross-references such as figure and table numbers). If the program finished cleanly, you now have a "main.dvi" file. To convert the DVI file to a PS (PostScript) file, run the command "dvips main.dvi". If everything goes well, a main.ps file is created in this directory. Then convert the PS file to a PDF file by running the command "ps2pdf main.ps". If everything goes well, a main.pdf file is created in your directory.
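If you compile often, the four commands above can be scripted. The following is a minimal Python sketch, not a tool installed on our computers: compile_document is a hypothetical helper that runs the same steps with subprocess.run(check=True), so it stops at the first command that fails. Two small additions are assumptions of this sketch: "-interaction=nonstopmode" keeps latex from waiting at its error prompt, and "dvips -o main.ps" names the output file explicitly.

```python
import subprocess
from pathlib import Path


def build_commands(tex_file):
    """Return the latex -> latex -> dvips -> ps2pdf command sequence for a .tex file."""
    stem = Path(tex_file).stem  # "main.tex" -> "main"
    return [
        ["latex", "-interaction=nonstopmode", f"{stem}.tex"],  # first pass
        ["latex", "-interaction=nonstopmode", f"{stem}.tex"],  # second pass resolves cross-references
        ["dvips", "-o", f"{stem}.ps", f"{stem}.dvi"],          # DVI -> PostScript
        ["ps2pdf", f"{stem}.ps"],                              # PostScript -> PDF
    ]


def compile_document(tex_file, workdir="."):
    """Run every step inside workdir; raises CalledProcessError at the first failure."""
    for cmd in build_commands(tex_file):
        print("running:", " ".join(cmd))
        subprocess.run(cmd, cwd=workdir, check=True)
```

For example, compile_document("main.tex", "REPORT") compiles the report from the folder that contains REPORT. If a step fails, look at main.log in that folder for the error.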

If you get error messages when compiling your input file, identify the line of the input file that each error refers to. This is typically shown as "l.54", which indicates that the error is on line 54 of your input file. Latex will usually also indicate what is wrong (for example, that a closing brace is missing). Find and correct the error before compiling your Latex input file again. If Latex stops and waits for input while compiling, type "x" and press Enter, or press Control + C to stop Latex and return to the terminal (if Control + C does not work, try Control + Z, which suspends Latex rather than terminating it).
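The same "l.54" markers are written to the main.log file that Latex creates next to your input file, so you can also list all errors at once. A minimal Python sketch, assuming the usual log layout in which each error message starts with "! " and is followed a few lines later by an "l.<number>" line; find_latex_errors is a hypothetical helper.

```python
import re


def find_latex_errors(log_text):
    """Return (message, line_number) pairs for errors recorded in a LaTeX .log file."""
    errors = []
    message = None
    for line in log_text.splitlines():
        if line.startswith("! "):
            message = line[2:].strip()          # e.g. "Undefined control sequence."
        elif message is not None:
            match = re.match(r"l\.(\d+)", line)  # e.g. "l.54 \foo"
            if match:
                errors.append((message, int(match.group(1))))
                message = None
    return errors
```

Usage: find_latex_errors(Path("main.log").read_text(errors="replace")) after "from pathlib import Path"; each pair tells you which line of main.tex to look at first.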

This is a list with some helpful introductory tutorials on Latex:

Latex is available for free. MiKTeX is a free Latex distribution for Windows that you can install on your Windows PC. If you are running Linux or macOS, you can also install Latex for free.

If you are having issues with the margins, you may have to instruct dvips to use letter-size pages with the command "texconfig dvips paper letter". See here for more information.

To include references in your Latex document

You can use bibtex to include references in your Latex document. You will need one more file in addition to the "main.tex" file: a "references.bib" file. Download the example files and place them in the same directory as your "main.tex" file. Then, in "main.tex", add "\bibliographystyle{plain}" and "\bibliography{references}" at the very end of the document, just before "\end{document}". In "references.bib" you list the information about all your references, and in your document you use the "\cite{}" command to refer to a particular article, book, etc. When compiling, run "latex main" once, then "bibtex main", then "latex main" twice more, and finally convert your DVI file to PS and then to PDF. This is a list with some helpful introductory tutorials on Bibtex:
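A common bibtex problem is a \cite{} key that has no matching entry in references.bib, which leaves a "?" in the PDF. The Python sketch below is a hypothetical helper (not part of Latex or bibtex) that compares the keys used in main.tex with the entry keys in references.bib. It assumes simple entries of the form @article{Key, ...} and cite commands without nested braces.

```python
import re


def cited_keys(tex_text):
    r"""Keys used in \cite{...} commands, including lists such as \cite{a,b}."""
    keys = set()
    # \cite, \citep, ... with an optional [..] argument before the {keys}
    for group in re.findall(r"\\cite\w*(?:\[[^\]]*\])?\{([^}]*)\}", tex_text):
        keys.update(key.strip() for key in group.split(",") if key.strip())
    return keys


def bib_keys(bib_text):
    """Entry keys such as Deligkaris2014 in lines like '@article{Deligkaris2014,'."""
    return set(re.findall(r"@\w+\s*\{\s*([^,\s]+)\s*,", bib_text))


def missing_references(tex_text, bib_text):
    """Sorted list of cited keys that have no entry in the .bib text."""
    return sorted(cited_keys(tex_text) - bib_keys(bib_text))
```

An empty list from missing_references means every \cite{} key has an entry; any key it returns must be added to references.bib (or its spelling fixed) before running bibtex again.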

To make a research presentation with Latex

You can use Latex to make a very high-quality presentation. You can find a Latex presentation template here and use it to make your own presentation. The template includes an itemized list, a table, a figure, and slide columns. Download the figure from here and save it in a directory named "figures". To compile this template you will need the Beamer package. If you compile the Latex file and then convert it to a PDF, you should get this. For more information on Beamer:

To make a research poster with Latex

You can use Latex to make a very high-quality research poster. You can find a Latex poster template here and use it to make your own research poster. If you compile this Latex poster, you should get this PDF poster. You can choose any theme you prefer: just comment out the current choice and uncomment the one you want to use. You can make tables and equations and insert pictures in the same way you do in a Latex document.

To create graphics (including flow-charts) with Latex

To make a plot with GNUPlot

UNDER CONSTRUCTION

How to use the Linux OS (basic commands)

Our computers use the CentOS Linux operating system. When you log in to a computer, you can check your home directory to see what files and folders you have. For your research you will have to use the terminal, which provides a command line. In the command line you can type a command, and when you press "Enter" the command is executed. For example, in the command line type "ll" and press "Enter": you will see a list of the contents of the current folder ("ll" is a shortcut for "ls -l"). Type "pwd", which stands for "print working directory", and you will see the current "working" directory. To change directory, type "cd" followed by the name of the directory you want to go to. If you type "cd .." you will go up one directory.

You can read more about Linux and how to use a terminal here:

How to run programs installed in our computers

Program           Command
AutoDock Tools    adt
Chimera           chimera
Latex             latex

Introduction to the PDB file format

Three-dimensional structures of small molecules as well as macromolecules can be deduced from experimental data. Typically, these structures are deposited in data banks so that other researchers can use them (for example, for structure-based drug design). The structure of a molecule is deposited as a file that lists the atoms of the molecule, each followed by its X, Y, and Z coordinates. In addition to the XYZ coordinates, the file may also contain information about the experimental method used to obtain the structure. The Protein Data Bank (PDB) is the data bank that contains structures of biological macromolecules such as proteins, RNA, and DNA. The PDB uses a specific format for the structures deposited there (the PDB format). Another popular file format is the XYZ format.
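To see what the coordinates in a PDB file look like to a program, here is a minimal Python sketch that reads ATOM/HETATM records using the fixed column positions of the PDB format (x in columns 31-38, y in 39-46, z in 47-54) and writes the same atoms in XYZ format (number of atoms, a comment line, then one "symbol x y z" line per atom). The function names are hypothetical, and guessing the element from the atom name when the element column is empty is a crude fallback that fails for names such as CA in a calcium ion.

```python
def read_pdb_atoms(pdb_text):
    """Parse ATOM/HETATM records using the fixed column positions of the PDB format."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            atoms.append({
                "name": line[12:16].strip(),      # columns 13-16: atom name
                "res_name": line[17:20].strip(),  # columns 18-20: residue name
                "x": float(line[30:38]),          # columns 31-38
                "y": float(line[38:46]),          # columns 39-46
                "z": float(line[46:54]),          # columns 47-54
                "element": line[76:78].strip(),   # columns 77-78 (may be missing)
            })
    return atoms


def to_xyz(atoms, comment="converted from PDB"):
    """Return XYZ text: atom count, a comment line, then 'symbol x y z' per atom."""
    lines = [str(len(atoms)), comment]
    for atom in atoms:
        # Crude fallback when the element column is empty: first letter of the atom name.
        symbol = atom["element"] or atom["name"][:1]
        lines.append(f"{symbol:<2} {atom['x']:10.4f} {atom['y']:10.4f} {atom['z']:10.4f}")
    return "\n".join(lines) + "\n"
```

Slicing by column rather than splitting on spaces matters because neighboring PDB fields can run together (for example, large negative coordinates).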

You can read more about the PDB format and other file formats here:

Using CIF data

Data obtained from X-ray crystallography are most often given in fractional coordinates, a coordinate system that describes the positions of atomic nuclei as fractions of the unit-cell edges (a, b, c). To create a PDB file, fractional coordinates must be converted to Cartesian coordinates, either manually with a conversion matrix or with a program such as Chimera. First, create a .cif file from the data; the IUCr guide for authors is a helpful resource for this. If the crystal system is monoclinic (alpha = gamma = 90 degrees, beta not equal to 90 degrees), the coordinates may need to be in standard orientation (a parallel to the x-axis, b in the xy plane) for accurate conversion. The matrix for this can be generated by a program such as OpenBabel. Then use Chimera (or a similar program) to open the .cif file and save the molecule as a PDB file from the "File" menu. Alternatively, generate the fractional-to-Cartesian matrix with a program or other resource and apply it to the data manually.
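For the manual route, the conversion is a single matrix multiplication, r = M f. The Python sketch below builds M for the standard orientation described above (a along the x-axis, b in the xy plane) from the cell lengths a, b, c and the angles alpha, beta, gamma, using the unit-cell volume V = abc sqrt(1 - cos²alpha - cos²beta - cos²gamma + 2 cos alpha cos beta cos gamma). The function names are hypothetical; for real data, cross-check the result against Chimera or OpenBabel.

```python
import math


def frac_to_cart_matrix(a, b, c, alpha, beta, gamma):
    """3x3 fractional-to-Cartesian matrix (a along x, b in the xy plane); angles in degrees."""
    cos_a, cos_b, cos_g = (math.cos(math.radians(t)) for t in (alpha, beta, gamma))
    sin_g = math.sin(math.radians(gamma))
    # Unit-cell volume
    volume = a * b * c * math.sqrt(
        1 - cos_a**2 - cos_b**2 - cos_g**2 + 2 * cos_a * cos_b * cos_g)
    return [
        [a, b * cos_g, c * cos_b],
        [0.0, b * sin_g, c * (cos_a - cos_b * cos_g) / sin_g],
        [0.0, 0.0, volume / (a * b * sin_g)],
    ]


def frac_to_cart(frac, matrix):
    """Convert one fractional triple (u, v, w) to Cartesian (x, y, z), in the units of a, b, c."""
    return tuple(sum(matrix[i][j] * frac[j] for j in range(3)) for i in range(3))
```

As a sanity check, in a monoclinic cell with beta = 120 degrees the point (0, 0, 1) lands at x = c cos(beta), y = 0, and its distance from the origin is exactly c.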

What are the structural characteristics of biological macromolecules?

Proteins, RNA, and DNA are all biological macromolecules with different three-dimensional structures. The structure of a biological macromolecule ultimately affects its function; therefore, it is important that the structure of these molecules remains intact.

You can read more about the structure of biological molecules here:

Experimental methods to determine structural characteristics of molecules

General information:

Experimentally determined molecular structures (what you see when you visualize a PDB or XYZ file) are the result of interpreting experimental data. Thus, each molecular structure has limitations that depend on the quality of that interpretation. These are some guidelines for judging crystallographic structures deposited in databases: the resolution should be better than 2.5 Angstroms (that is, a value smaller than 2.5 Angstroms); R and R-free values should be reported and lie within an accepted range; occupancies should be exactly 1.0; and B-factors, which indicate the mobility of the atoms, should not be extraordinarily large (for example, larger than 50 Angstroms²). For more specific guidelines, please read the articles below, which discuss the limits of experimental methods in determining molecular structures.
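Some of these guidelines can be checked directly in a PDB file. The Python sketch below uses two hypothetical helpers: resolution_angstrom reads the resolution from the "REMARK   2 RESOLUTION." record (assuming the usual wording of that record), and flag_atoms lists atoms whose occupancy (columns 55-60) is not 1.0 or whose B-factor (columns 61-66) is above 50 Angstroms². The sketch does not check R and R-free values.

```python
import re


def resolution_angstrom(pdb_text):
    """Return the resolution from the 'REMARK   2 RESOLUTION.' record, or None."""
    match = re.search(r"^REMARK\s+2\s+RESOLUTION\.\s+([0-9.]+)\s+ANGSTROMS",
                      pdb_text, re.MULTILINE)
    return float(match.group(1)) if match else None


def flag_atoms(pdb_text, max_b=50.0):
    """List (residue:atom, occupancy, B) for atoms with occupancy != 1.0 or B > max_b."""
    flagged = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            occupancy = float(line[54:60])  # columns 55-60
            b_factor = float(line[60:66])   # columns 61-66, in Angstroms^2
            if occupancy != 1.0 or b_factor > max_b:
                label = f"{line[17:20].strip()}{line[22:26].strip()}:{line[12:16].strip()}"
                flagged.append((label, occupancy, b_factor))
    return flagged
```

A resolution of None means the record was not found (for example, in NMR structures), so check the header by hand in that case.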

Limitations of experimental methods:

To search for the structure of a small molecule

When searching for the structure of a small molecule, you are essentially searching for two things: a publication reporting the experimental 3D structure, and the coordinates of all atoms. Publications typically do not include all XYZ coordinates, so you will have to obtain them from a database. You can either find the publication first and then the coordinates (typically in PDB or XYZ format) or vice versa. To search for publications that report the structure, follow the usual steps for searching the literature. To find the coordinates, you will have to search a database; some databases are free and some are not.

These are some databases that include information on molecules:

To find publications that report the structure, try searching on:

If you do find a published article that describes the experimental structure, you can try requesting the coordinates from the CCDC (if they are deposited there):

If none of the above works, you could try to find the structure of the molecule bound to a macromolecule (protein or DNA) by searching the PDB or the NDB.

To do homology modelling

To build a DNA model

To build a GAG model

To choose the most representative conformer from a set of NMR-based structures

To validate structural data

To find binding affinity data

To find the net formal charge of a protein

We have a script on our computers that calculates the net formal charge of an amino acid sequence (protein). First, create a simple text file that contains the amino acid sequence of the protein, using the 3-letter abbreviations for the amino acids. See here for an example of this text file. To create it, you can use (if available) the sequence in the PDB file of that protein; for example, to make the example file I copied the sequence from the PDB file of the protein (search for the SEQRES keyword). When the text file is ready, run the command "net-charge-protein.pl your-amino-acid-sequence-file" and the program will print the net formal charge of that amino acid sequence on the screen. The calculation counts two negatively charged amino acids (ASP and GLU, -1 each) and two positively charged amino acids (LYS and ARG, +1 each); histidine is assumed to be neutral.
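For illustration, the counting rule described above can be sketched in Python. This is not the net-charge-protein.pl script itself: it is a hypothetical re-implementation that uses the same rule (ASP and GLU -1, LYS and ARG +1, HIS and all other residues 0, termini ignored) and also shows how residue names can be taken from SEQRES records, assuming the usual layout "SEQRES serial chain count residue residue ...".

```python
# Formal charges used by the rule described above; HIS and all other residues count as 0.
CHARGES = {"ASP": -1, "GLU": -1, "LYS": 1, "ARG": 1}


def sequence_from_seqres(pdb_text, chain="A"):
    """Collect three-letter residue names from the SEQRES records of one chain."""
    residues = []
    for line in pdb_text.splitlines():
        fields = line.split()
        # Assumed layout: SEQRES <serial> <chain> <number of residues> <res> <res> ...
        if len(fields) > 4 and fields[0] == "SEQRES" and fields[2] == chain:
            residues.extend(fields[4:])
    return residues


def net_formal_charge(residues):
    """Sum the formal charges of a list of three-letter residue names (termini ignored)."""
    return sum(CHARGES.get(res.upper(), 0) for res in residues)
```

For a plain sequence file like the one described above, net_formal_charge(Path("sequence.txt").read_text().split()) gives the same kind of count (after "from pathlib import Path"; the file name is only an example).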

How to submit an AutoGrid calculation

When you have all input files ready, then you can submit the autogrid job that will calculate the free energy grids. To submit the autogrid calculation you can type in the command line: "subag4 FILE.gpf -q veryshort". Replace FILE.gpf with the name of your GPF file.

How to check the results of an AutoGrid calculation

When the free energy grids have been calculated, check the grid log file (FILE.glg) for warnings, errors, and successful completion. Open the grid log file and look for any warnings or errors. Near the end of the log file, a line should read "Successful Completion".
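The check described above can be automated. A minimal Python sketch, assuming only what is described here: problem lines contain the word WARNING or ERROR, and a line containing "Successful Completion" appears near the end of a finished log (the sketch looks at the last 10 lines). check_log_text and check_log are hypothetical helpers; they work the same way on a .glg file and on a .dlg file.

```python
from pathlib import Path


def check_log_text(text, tail_lines=10):
    """Return (completed, problem_lines) for the text of a .glg or .dlg log."""
    lines = text.splitlines()
    problems = [line for line in lines
                if "WARNING" in line.upper() or "ERROR" in line.upper()]
    # A finished run prints "Successful Completion" near the end of the log.
    completed = any("successful completion" in line.lower()
                    for line in lines[-tail_lines:])
    return completed, problems


def check_log(path):
    """Read a log file from disk, print a short report, and return True if it completed."""
    completed, problems = check_log_text(Path(path).read_text(errors="replace"))
    print(f"{path}: {'completed' if completed else 'NOT completed'}")
    for line in problems:
        print("  ", line)
    return completed
```

For example, check_log("FILE.glg") prints whether the grid calculation finished and lists every line that mentions a warning or an error.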

How to submit an AutoDock calculation

When you have all input files ready and the energy grids calculated, you can submit the AutoDock job, which will find the probable physical binding sites. To submit the AutoDock calculation, type in the command line: "subad4 FILE.dpf -q QUEUE", where QUEUE is one of veryshort, short, medium, long, or verylong. Replace FILE.dpf with the name of your own DPF file and choose the most appropriate queue for your calculation.

How to check the results of an AutoDock calculation

When the docking calculations are complete, check the docking log file (FILE.dlg) for warnings, errors, and successful completion. Open the docking log file and look for any warnings or errors. Near the end of the log file, a line should read "Successful Completion".

Queueing system information

We use a queueing system that manages our calculations efficiently. Our cluster has five (5) queues with different time limits. When you submit a calculation, make sure you use the most appropriate queue so that your calculation finishes as soon as possible. We currently have the following queues and time limits:

  • veryshort: 1 HOUR
  • short: 24 HOURS
  • medium: 7 DAYS
  • long: 28 DAYS
  • verylong: 84 DAYS
Once you submit a calculation, you can check its status by typing "qstat" on the command line. To get detailed information about a single calculation, type "qstat -j JOBID", where JOBID is the ID of your calculation (you can find it by typing "qstat"). To see everyone's calculations, type "qstat -u '*'" in the command line.
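Choosing a queue amounts to picking the shortest time limit that your calculation fits into. A small Python sketch of that rule, using the limits listed above; the 1.5x safety margin is an assumption of this sketch (run-time estimates are rarely exact), not a group policy, and choose_queue is a hypothetical helper.

```python
# Time limits of our queues, in hours, from shortest to longest.
QUEUE_LIMITS_HOURS = [
    ("veryshort", 1),
    ("short", 24),
    ("medium", 7 * 24),
    ("long", 28 * 24),
    ("verylong", 84 * 24),
]


def choose_queue(expected_hours, safety_factor=1.5):
    """Return the shortest queue whose limit covers expected_hours * safety_factor."""
    needed = expected_hours * safety_factor
    for name, limit_hours in QUEUE_LIMITS_HOURS:
        if needed <= limit_hours:
            return name
    longest = QUEUE_LIMITS_HOURS[-1][1]
    raise ValueError(f"{needed:.0f} h needed, but the longest queue allows {longest} h")
```

For example, a docking expected to take 30 hours needs 45 hours with the margin, so choose_queue(30) returns "medium" and the submission would be "subad4 FILE.dpf -q medium".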

Which files to keep from an AutoGrid and AutoDock calculation?

In general, we need to keep all input and output files. The input files are necessary because they contain all the information we need to redo a calculation if necessary and to know how a particular calculation was done. The output files are necessary because they contain the results of our calculations as well as logs of all calculations. The input files you need to keep are: MACRO.PDBQT, LIGAND.PDBQT, FILE.GPF, FILE.DPF. The output files you need to keep are: FILE.GLG, FILE.DLG, FILE.*.FLD. You can delete all files that end in .map: they use a lot of disk space and are not needed, because they are intermediate files that can be re-created from the input files if we need them.
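Before deleting .map files it is worth seeing how much space they take. The Python sketch below uses two hypothetical helpers: plan_cleanup selects the files whose names end in .map (leaving the .fld file and all other inputs and outputs alone) and totals their size, and clean_maps applies it to a directory tree, deleting files only when dry_run=False.

```python
from pathlib import Path


def plan_cleanup(files):
    """files: iterable of (path, size_in_bytes). Return (maps_to_delete, bytes_freed)."""
    files = list(files)
    maps = [path for path, _ in files if str(path).endswith(".map")]
    freed = sum(size for path, size in files if str(path).endswith(".map"))
    return maps, freed


def clean_maps(directory, dry_run=True):
    """Report (and, if dry_run is False, delete) all .map files below directory."""
    files = [(p, p.stat().st_size) for p in Path(directory).rglob("*") if p.is_file()]
    maps, freed = plan_cleanup(files)
    for path in maps:
        print(("would delete " if dry_run else "deleting ") + str(path))
        if not dry_run:
            path.unlink()
    print(f"{len(maps)} .map files, {freed / 1e6:.1f} MB")
    return maps
```

Running clean_maps(".") first only prints what would be removed; call clean_maps(".", dry_run=False) once the list looks right.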

Reclustering AutoDock results with and without a reference ligand geometry

Depending on your project, you may have to recluster your AutoDock results. To do this, open a terminal and go to the directory that contains your results. Then run the command "pythonsh /share/apps/MGLTOOLS/mgltools_x86_64Linux2_1.5.4/MGLToolsPckgs/AutoDockTools/Utilities24/summarize_results4.py -d . -t 2.0". If you have a reference file and want to recluster your AutoDock results against that ligand geometry, run instead "pythonsh /share/apps/MGLTOOLS/mgltools_x86_64Linux2_1.5.4/MGLToolsPckgs/AutoDockTools/Utilities24/summarize_results4.py -d . -t 2.0 -f YOUR-RMSD-REFERENCE", where YOUR-RMSD-REFERENCE is the ligand file with the reference geometry. This creates a file called "summary_of_results_2.0", which contains the reclustered results. The 2nd column is the "number in cluster", the 3rd column is the "lowest binding free energy", the 4th column is the "RMSD", the 5th column is the "number of ligand atoms", and the 6th column is the "number of ligand rotatable bonds". Usually, the 5th and 6th columns are not useful.
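If you have many result folders, it can help to read the summary file programmatically. The Python sketch below relies only on the column order described above; the exact separators and header lines of summary_of_results_2.0 are assumptions (it accepts commas and/or spaces and skips any line whose columns cannot be read as numbers), and parse_summary is a hypothetical helper.

```python
import re


def parse_summary(text):
    """Return one dict per result line, using columns 1-4 as described in the text."""
    rows = []
    for line in text.splitlines():
        fields = [f for f in re.split(r"[,\s]+", line.strip()) if f]
        try:
            rows.append({
                "label": fields[0],                 # 1st column
                "n_in_cluster": int(fields[1]),     # 2nd column: number in cluster
                "lowest_energy": float(fields[2]),  # 3rd column: kcal/mol
                "rmsd": float(fields[3]),           # 4th column: Angstroms
            })
        except (IndexError, ValueError):
            continue  # blank, header, or otherwise non-numeric line
    return rows
```

For example, min(parse_summary(text), key=lambda r: r["lowest_energy"]) picks the result with the lowest binding free energy.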

Making high quality figures with the molecular visualization program Chimera

Once you have AutoDock results, you may want to make high-quality figures for reports and presentations. Use AutoDock Tools to open the DLG file with the results and visualize the conformations. Find the conformation you want to use for your picture. Click on FILE, then WRITE PDB, and save the coordinates in PDB format. Use Chimera to read the macromolecule PDB file and the ligand PDB file (the one you just created). Click on PRESETS, then PUBLICATION 1. Make any other changes that will make your figure clear. Click on FILE, then SAVE IMAGE, change SUPERSAMPLE to 4X4, and then save the image in EPS format.

If you have a molecular surface, you need to increase the quality of that surface. To do this, open the Model Panel (under Favorites), select the MSMS surface model and open its attributes, and then increase the vertex density ten-fold. You will also have to increase the subdivision quality ten-fold in the viewing effects.

Find the binding free energy at a specific geometry

If you want to find the binding free energy of a ligand-macromolecule system at a specific geometry (for example, the geometry of the crystal structure model), you will have to use the "epdb" keyword in your DPF file. In particular, write "epdb" on a separate line in your DPF file and comment out all keywords that invoke a search ("ga_run" and "analysis"). For an example, see here.
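If you prepare many such files, the edit can be scripted. A minimal Python sketch: make_epdb_dpf is a hypothetical helper (not part of AutoDock Tools) that comments out the "ga_run" and "analysis" lines with "#" and adds "epdb" on its own line at the end of the file. Placing "epdb" at the end is an assumption of this sketch, so compare the result with the example DPF before using it.

```python
def make_epdb_dpf(dpf_text):
    """Return a copy of the DPF that scores the input geometry instead of searching."""
    out = []
    for line in dpf_text.splitlines():
        fields = line.split()
        keyword = fields[0] if fields else ""
        if keyword in ("ga_run", "analysis"):
            out.append("# " + line)  # disable the GA search and the clustering analysis
        else:
            out.append(line)
    out.append("epdb")  # evaluate the energy of the ligand geometry given in the input
    return "\n".join(out) + "\n"
```

Usage: write make_epdb_dpf(Path("FILE.dpf").read_text()) to a new DPF (for example FILE-epdb.dpf, a hypothetical name) and submit that file with subad4.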

Find the local minimum of the binding free energy around a specific geometry

If you want to find a local minimum of the binding free energy around a geometry that you specify (for example around the geometry of the crystal structure model), then you will have to use the "do_local_only" keyword. In particular, you will have to:

  • To keep the ligand's input position, use the same "tran0" x,y,z values as those specified in the "about" line of the DPF (copy and paste the x,y,z values)
  • To keep the ligand's input orientation, use "axisangle0 1. 0. 0. 0." (instead of the default)
  • To keep the ligand's input conformation, use "dihe0" followed by a 0. (zero) for every torsional degree of freedom (for example, you should have "dihe0 0. 0." if there are two torsional degrees of freedom).
  • Comment out the "ga_run" variable since you will not be doing any LGA calculations.
  • Include the "do_local_only" keyword followed by the number of local minimization calculations (for example "do_local_only 100") before the "analysis" keyword.
You can take a look here at a DPF file that requests the binding free energy at exactly the specified geometry of the ligand (with the "epdb" keyword) as well as a local minimization calculation.
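The five changes in the list above can also be applied with a short script. A Python sketch, assuming a DPF that already contains "about", "tran0", "axisangle0", "dihe0", "ga_run", and "analysis" lines; make_local_only_dpf is a hypothetical helper, and n_torsions must be the number of torsional degrees of freedom of your ligand.

```python
def make_local_only_dpf(dpf_text, n_torsions, n_local=100):
    """Return a DPF that starts from the input geometry and runs only local searches."""
    lines = dpf_text.splitlines()
    about = None
    for line in lines:
        fields = line.split()
        if fields and fields[0] == "about":
            about = fields[1:4]  # x, y, z values of the "about" line
    if about is None:
        raise ValueError("the DPF has no 'about' line")

    out = []
    for line in lines:
        fields = line.split()
        keyword = fields[0] if fields else ""
        if keyword == "tran0":
            out.append("tran0 " + " ".join(about))                # keep input position
        elif keyword == "axisangle0":
            out.append("axisangle0 1. 0. 0. 0.")                 # keep input orientation
        elif keyword == "dihe0":
            out.append("dihe0 " + " ".join(["0."] * n_torsions))  # keep input conformation
        elif keyword == "ga_run":
            out.append("# " + line)                              # no LGA runs
        elif keyword == "analysis":
            out.append(f"do_local_only {n_local}")               # local minimizations first
            out.append(line)
        else:
            out.append(line)
    return "\n".join(out) + "\n"
```

Older DPF files that use a different orientation keyword than "axisangle0" are not handled by this sketch and need to be edited by hand.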

Deligkaris Group Member Responsibilities

Having the privilege of working in a research group means that you also have responsibilities. These are your responsibilities in our research group:

  • Always logout from our computers. Never switch user, never shut down.
  • Clear your Firefox history/cache/etc often.
  • Empty your trash box often.
  • Do not use USB sticks in our computers. If you have to transfer files, you can do it using Google Drive, Dropbox etc.
  • Food and drinks are not allowed in our computer lab (TSC152). If you have a water bottle with you, leave it on the floor, away from the computers.
  • Make sure you always lock the door when you leave our computer lab.
  • You will need a security pass to access our computer lab. Make sure you carry it with you at all times; Drury Security will ask you to leave if you do not have the pass with you.
  • If you need to get into our computer lab, you can call security and they will come and unlock the door for you.
  • All files, scripts, programs, etc. related to your research in this group must be saved on our computers. Data on our computers are backed up frequently, and we have hard drive redundancy.
  • Naming: use all upper-case for folders and lower-case for files. Use dashes to separate words in file and folder names; never leave a space in any name.
  • Naming of articles: use the last name of the first author followed by the year the article was published. If there is more than one paper with the same name and year, add "a", "b", etc. For example, Deligkaris2014.pdf, or Deligkaris2014a.pdf and Deligkaris2014b.pdf.
  • Check the amount of disk space you are using often. Open a terminal, go to your home directory (you may already be there when you open the terminal), and type the command "du -h". After a few seconds you should see the amount of data you have (the last line shows the total; "du -sh" prints only the total). It is important to keep the amount of data under control: if you run out of disk space, your calculations will not finish, and you may not even be able to log in. Your disk quota is 10GB.
  • Organize your folders according to this scheme.
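The naming rules above are easy to check with a script. The Python sketch below contains two hypothetical helpers: check_name tests a single folder or file name against the upper-case/lower-case and no-space rules (article PDFs, which follow their own rule, are not covered), and article_filenames builds article file names, adding "a", "b", ... only when the same first-author last name and year occur more than once.

```python
import string
from collections import Counter


def check_name(name, is_folder):
    """Return a list of naming problems (an empty list means the name is fine)."""
    problems = []
    if " " in name:
        problems.append("contains a space")
    if is_folder and name != name.upper():
        problems.append("folder names should be upper-case")
    if not is_folder and name != name.lower():
        problems.append("file names should be lower-case")
    return problems


def article_filenames(papers):
    """papers: list of (first-author last name, year) pairs, in the order you file them."""
    totals = Counter(papers)
    used = Counter()
    names = []
    for last_name, year in papers:
        suffix = ""
        if totals[(last_name, year)] > 1:
            # Same author and year more than once: add a, b, c, ...
            suffix = string.ascii_lowercase[used[(last_name, year)]]
            used[(last_name, year)] += 1
        names.append(f"{last_name.replace(' ', '')}{year}{suffix}.pdf")
    return names
```

For example, check_name("my results", True) reports both the space and the lower-case letters, while check_name("DOCKING-RUNS", True) returns an empty list.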