Conversion
Use of the conversion file
This file defines the data input to the ARIA project:
- the molecule or the molecular assembly on which the ARIA run is performed
- the NMR data (chemical shift, distance and angle restraints)
Definition of attributes
The elements and attributes encountered in an XML conversion file are:
- 'filename' is an attribute of input or output, giving the name of the corresponding input or output file. For the cross-peak file, 'filename' is input only.
- element molecule:
- 'molecule_type' can be "PROTEIN", "DNA" or "RNA". This attribute determines the topology and parameter files used in the structure calculation. "PROTEIN" corresponds to the files parallhdg5.2.pro, topallhdg.pep and topallhdg5.2.pro; "DNA" and "RNA" correspond to the files dna-rna-allatom.param, dna-rna.link and dna-rna-allatom.top.
- 'molecule_name' is used to define the root name of structure files produced by ARIA. For example, if 'molecule_name' is 'prot', the file names will be: prot_xx.pdb, where xx is the generation rank.
- 'molecule_segid' defines the chain name. By default there is one chain (for a dimer, see the tutorial Symmetric homodimers). The chain ids are generally named ' A', ' B', ' C', etc.
- 'first_residue_number' specifies where the residue numbering starts (in case of SEQ format).
- 'format' defines the format of the sequence file. Allowed formats are "seq" (a sequence of three-letter codes for the residues; see an example of it) or "pdb" (a PDB file). If the input data are in PDB format, a naming convention for the residues and atoms has to be specified (see below), and the 'molecule_segid' entry will be ignored (the segids will be read from the PDB file).
- 'naming_convention' is the nomenclature of the atom names. Possible choices are "iupac", "dyana" and "cns". The default is the IUPAC nomenclature.
- Spectrum
- 'spectrum_name' is used in the xml project file to handle the corresponding spectrum (peak list).
- 'spectrum_type' describes the type of the spectrum and has to be one of the following strings:
for 2D homonuclear noesy: "noesy.hh"
for 3D noesy (carbon): "noesy_hsqc_HCH.hhc"
for 3D noesy (nitrogen): "noesy_hsqc_HNH.hhn"
for 4D noesy (carbon): "noesy_hsqc_HCCH.hhcc"
for 4D noesy (nitrogen): "noesy_hsqc_HNNH.hhnn"
for 4D noesy (carb/nitr): "noesy_hsqc_HCNH.hhcn"
for 4D noesy (nitr/carb): "noesy_hsqc_HNCH.hhnc"
- 'segids' denotes the id of the chain on which the spectrum was recorded. If the spectrum contains peaks from two chains ('A' and 'B'), the value is 'A/B'.
- 'format' is an attribute of chemical_shifts and of cross_peaks describing the format of the chemical shift and peak list files. The supported formats are: "Ansig", "NmrView", "Pronto", "Sparky" and "XEasy".
- If the attribute filename in chemical_shifts is equal to "", the chemical shift values will be calculated from the assignments in the peak list (notably for Ansig format this is necessary). This will only work for fully assigned peak lists.
- The element 'assignments' can only be used with XEasy, where cross-peak assignments may be stored in a separate ".assign" file. In that case, give the name of that file in the attribute 'filename' of the element 'assignments'. Set 'filename' to "" in all other cases.
- 'proton1', 'proton2', 'hetero1', 'hetero2' define the spectral dimension number for the homonuclei ('proton') and heteronuclei. The entry for the 1st and the 2nd proton should always be present and should be a number (1, 2, 3, or 4) that corresponds to the relevant dimension in the original peak list file. For multi-dimensional experiments the corresponding hetero dimensions have to be set in the same way (note that hetero1 corresponds to the hetero dimension linked to proton1).
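The dimension attributes described above can be sanity-checked before running the conversion. The Python sketch below is not part of ARIA: the function name check_dimensions is hypothetical, and it assumes that the suffix after the dot in 'spectrum_type' (for example "hhc") contains one letter per spectral dimension. That assumption is inferred from the list of spectrum types above and is not documented ARIA behaviour. For a 3D spectrum, the sketch accepts either hetero attribute as the one that is set.

```python
def check_dimensions(spectrum_type, proton1, proton2, hetero1="", hetero2=""):
    """Return a list of problems in the dimension attributes (empty if none).

    All attribute values are strings, as they appear in the XML file.
    Assumption: the suffix after the dot in spectrum_type has one letter
    per spectral dimension ("hh" -> 2D, "hhc" -> 3D, "hhcc" -> 4D).
    """
    n_dims = len(spectrum_type.rsplit(".", 1)[-1])
    attrs = {"proton1": proton1, "proton2": proton2,
             "hetero1": hetero1, "hetero2": hetero2}
    problems = []

    # The entries for the first and second proton must always be present.
    for name in ("proton1", "proton2"):
        if attrs[name] == "":
            problems.append(f"{name} must always be set")

    # Every attribute that is set must be a dimension number in 1..n_dims.
    used = {}
    for name, value in attrs.items():
        if value == "":
            continue
        if not value.isdigit() or not 1 <= int(value) <= n_dims:
            problems.append(
                f"{name}={value!r} is not a dimension between 1 and {n_dims}")
        else:
            used[name] = int(value)

    # No two attributes may point to the same dimension of the peak list.
    if len(set(used.values())) != len(used):
        problems.append("two attributes refer to the same dimension")

    # The number of attributes that are set must match the dimensionality.
    n_set = sum(1 for value in attrs.values() if value != "")
    if n_set != n_dims:
        problems.append(
            f"{n_set} attributes are set, but the spectrum has {n_dims} dimensions")

    return problems
```

For instance, calling check_dimensions("noesy_hsqc_HCH.hhc", proton1="2", proton2="3", hetero1="1") returns an empty list, whereas setting proton1 and proton2 to the same number for "noesy.hh" reports a duplicate dimension.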
Example of file
The format of conversion.xml for a monomer is given below.
The file 'protein_proj1.xml' is the XML project file, output of the conversion (see page Project for an explanation of the format of the project file).
The name of the sequence file is "data/sequence/B2_72_mod.seq". The molecule xml file is "xml/satt_strucA.xml" (see page Molecule for a more detailed explanation of the format of this file).
The input file containing chemical shifts is: "data/hnhsqcnoe/hnhsqcnoe.ppm1". The output xml file for the chemical shifts is: "xml/shhsqcnoe.xml" (see page Chemical shifts for a more detailed explanation of its format).
The input cross-peak file is: "data/hnhsqcnoe/hsqcnoe.xpk", and the corresponding output is "./xml/sp_hsqcnoe.xml" (format described in the page Spectrum).
<!DOCTYPE conversion SYSTEM "conversion1.0.dtd">
<conversion>
  <project>
    <output filename="protein_proj1.xml"/>
  </project>
  <molecule molecule_type="PROTEIN"
            molecule_name="prot"
            molecule_segid=" A"
            first_residue_number="1">
    <input
      filename="data/sequence/B2_72_mod.seq"
      format="seq"
      naming_convention=""/>
    <output filename="xml/satt_strucA.xml"/>
  </molecule>
  <spectrum
    spectrum_name="hsqcnoe"
    spectrum_type="noesy_hsqc_HCH.hhc"
    spectrum_ambiguity="intra"
    segids=" A">
    <chemical_shifts>
      <input
        filename="data/hnhsqcnoe/hnhsqcnoe.ppm1"
        format="nmrview"/>
      <output
        filename="xml/shhsqcnoe.xml"/>
    </chemical_shifts>
    <cross_peaks>
      <input
        filename="data/hnhsqcnoe/hsqcnoe.xpk"
        format="xeasy"
        proton1="2"
        hetero1="1"
        proton2="3"
        hetero2=""/>
      <output
        filename="./xml/sp_hsqcnoe.xml"/>
      <assignments
        filename=""/>
    </cross_peaks>
  </spectrum>