Special Cases When Working with Protein Structures

Alternate Locations

Some PDB entries contain alternate conformations for part of a structure, such as a single group. BioJava represents such a group as one main group and attaches the other conformations to it as alternate groups, which are available through getAltLocs().

			
			Structure s = StructureIO.getStructure("1AAC");

			Chain a = s.getChainByPDB("A");

			Group g = a.getGroupByPDB( ResidueNumber.fromString("27"));

			System.out.println(g);
			for (Atom atom : g.getAtoms()) {
				System.out.print(atom.toPDB());
			}
			
			
			int pos = 0;
			for (Group alt: g.getAltLocs()) {
				pos++;
				System.out.println("altLoc: " + pos + " " + alt);
				for (Atom atom : alt.getAtoms()) {
					System.out.print(atom.toPDB());
				}
			} 
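For background, the PDB file format stores the alternate location indicator as a single character in column 17 of each ATOM or HETATM record. The following is a minimal sketch of that idea in plain Java, not BioJava's own parsing code: the class `AltLocSketch`, its method, and the sample records (with made-up coordinates, not copied from 1AAC) are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of BioJava: groups atom names by altLoc character.
final class AltLocSketch {

    // Illustrative fixed-column records; the coordinates are invented.
    static final List<String> SAMPLE = Arrays.asList(
        "ATOM      1  N   GLU A  27      10.000  20.000  30.000  1.00 15.00           N",
        "ATOM      2  CA AGLU A  27      11.000  20.500  30.200  0.60 15.00           C",
        "ATOM      3  CA BGLU A  27      11.200  20.300  30.400  0.40 15.00           C",
        "ATOM      4  CB AGLU A  27      12.000  21.000  31.000  0.60 15.00           C",
        "ATOM      5  CB BGLU A  27      12.300  20.800  31.300  0.40 15.00           C");

    // Returns atom names keyed by altLoc; a blank key means "no alternate location".
    static Map<Character, List<String>> atomNamesByAltLoc(List<String> lines) {
        Map<Character, List<String>> byAltLoc = new LinkedHashMap<>();
        for (String line : lines) {
            boolean isAtomRecord = line.startsWith("ATOM") || line.startsWith("HETATM");
            if (!isAtomRecord || line.length() < 17) {
                continue; // skip other record types and truncated lines
            }
            char altLoc = line.charAt(16);                   // column 17
            String atomName = line.substring(12, 16).trim(); // columns 13-16
            byAltLoc.computeIfAbsent(altLoc, k -> new ArrayList<>()).add(atomName);
        }
        return byAltLoc;
    }

    public static void main(String[] args) {
        System.out.println(atomNamesByAltLoc(SAMPLE));
    }
}
```

In this sample the backbone nitrogen has a blank indicator and is shared, while CA and CB each appear once under `A` and once under `B`.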

Insertion Codes

Insertion codes were introduced in the PDB when people wanted to compare the "same" protein across different species. As it turned out, the "same" protein did not have exactly the same sequence in every species, and in some cases insertions were found, resulting in a longer sequence. For such comparisons it was considered important to preserve the numbering, so that one could say, for example, that "HIS 75" is an important residue. Insertion codes were introduced to make up for the mismatch in sequence lengths. As a consequence, a particular residue in the PDB is identified uniquely by three data items: chain identifier, residue number, and insertion code.
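To make the three-part identity concrete, here is a minimal sketch of a value class that treats chain identifier, residue number, and insertion code as one key. `ResidueKey` is a hypothetical class written for this illustration; it is not BioJava's ResidueNumber, and the residue names are simply the ones around position 82 of chain B in 1IGY.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical value class for illustration; not a BioJava type.
final class ResidueKey {
    final String chainId;
    final int seqNum;
    final Character insCode; // null when the residue has no insertion code

    ResidueKey(String chainId, int seqNum, Character insCode) {
        this.chainId = Objects.requireNonNull(chainId);
        this.seqNum = seqNum;
        this.insCode = insCode;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ResidueKey)) return false;
        ResidueKey other = (ResidueKey) o;
        // All three items must match for two keys to denote the same residue.
        return seqNum == other.seqNum
                && chainId.equals(other.chainId)
                && Objects.equals(insCode, other.insCode);
    }

    @Override
    public int hashCode() {
        return Objects.hash(chainId, seqNum, insCode);
    }

    @Override
    public String toString() {
        return chainId + ":" + seqNum + (insCode == null ? "" : insCode);
    }

    public static void main(String[] args) {
        Map<ResidueKey, String> names = new HashMap<>();
        names.put(new ResidueKey("B", 82, null), "LEU");
        names.put(new ResidueKey("B", 82, 'A'), "SER");
        names.put(new ResidueKey("B", 82, 'B'), "SER");
        // Three distinct residues share the residue number 82.
        System.out.println(names.size());
        System.out.println(names.get(new ResidueKey("B", 82, 'A')));
    }
}
```

Leaving the insertion code out of `equals` and `hashCode` would collapse these three residues into a single map entry, which is why the residue number alone is not a safe lookup key.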

BioJava contains the ResidueNumber object to help with characterizing each group in a file. PDB ID 1IGY contains some extra residues around chain B position 82. BioJava can represent these like this:

			Structure s1 = StructureIO.getStructure("1IGY");
			
			Chain b = s1.getChainByPDB("B");
			
			for (Group g : b.getAtomGroups()){
				System.out.println(g.getResidueNumber() + " " + g.getPDBName() + " " + g.getResidueNumber().getInsCode());
			}
			

This will display the following table (residue number, name, insertion code):

		...
			81 HIS null
			82 LEU null
			82A SER A
			82B SER B
			82C LEU C
			83 THR null
			84 SER null
		...	
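The first column of the table is the residue number with the insertion code appended, for example "82A". As a rough sketch of how such a label splits into its two parts, the plain-Java helper below assumes that a trailing letter, if present, is the insertion code. `ResidueLabel` and its `parse` method are hypothetical names; in BioJava itself, ResidueNumber.fromString is the call to use.

```java
// Hypothetical parser for illustration; not a BioJava type.
final class ResidueLabel {
    final int seqNum;
    final Character insCode; // null when the label carries no insertion code

    private ResidueLabel(int seqNum, Character insCode) {
        this.seqNum = seqNum;
        this.insCode = insCode;
    }

    // Splits labels such as "82A" into (82, 'A') and "83" into (83, null).
    static ResidueLabel parse(String label) {
        String s = label.trim();
        if (s.isEmpty()) {
            throw new IllegalArgumentException("empty residue label");
        }
        char last = s.charAt(s.length() - 1);
        if (Character.isLetter(last)) {
            // Assumption: a trailing letter is the insertion code.
            int number = Integer.parseInt(s.substring(0, s.length() - 1));
            return new ResidueLabel(number, last);
        }
        return new ResidueLabel(Integer.parseInt(s), null);
    }

    public static void main(String[] args) {
        for (String label : new String[] {"81", "82", "82A", "82B", "82C", "83"}) {
            ResidueLabel r = parse(label);
            System.out.println(r.seqNum + " " + r.insCode);
        }
    }
}
```

Because the numeric part is handed to `Integer.parseInt`, negative residue numbers such as "-5" also parse.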

Chromophores

A chromophore is the part of a molecule responsible for its color. Some proteins, such as GFP, contain a chromophore that consists of three modified residues. BioJava represents this as a single group in terms of atoms, but as three amino acids when creating the amino acid sequence.

			
						
			// make sure we download chemical component definitions
			// which is required for correctly representing the chromophore
			FileParsingParameters params = new FileParsingParameters();			
			params.setLoadChemCompInfo(true);						
			
			// now register the parameters in the cache
			AtomCache cache = new AtomCache();			
			cache.setFileParsingParams(params);						
			StructureIO.setAtomCache(cache);
			
			
			// request a GFP protein
			Structure s1 = StructureIO.getStructure("2pxw");
			
			// and print out the internals
			System.out.println(s1.getPDBHeader().toPDB());
						
			// chromophore is at PDB residue number 66
			for ( Chain c : s1.getChains()) {
			
				System.out.println("Chain " + c.getChainID() + 
						" internal " + c.getInternalChainID() +
						" ligands " + c.getAtomLigands().size());
				System.out.println("         10        20        30        40        50        60");
				System.out.println("1234567890123456789012345678901234567890123456789012345678901234567890");
				System.out.println(c.getAtomSequence());
				
				int pos = 0 ;
				for (Group g: c.getAtomGroups()) {
					pos++;					
					System.out.println(pos + " " + g.getResidueNumber() + " " + g.getPDBName()
							+ " " + g.getType()
							+ " " + g.getChemComp().getOne_letter_code()
							+ " " + g.getChemComp().getType());
				}				
			}

This produces the following output. Note 'DYG' at position 63:

           60
		...01234567890
		...AAFDYGNRVFTEY...

DYG is an unusual group: .getOne_letter_code() returns three characters for it.

	...
		62 65 PHE amino F L-PEPTIDE LINKING
		63 66 DYG amino DYG L-PEPTIDE LINKING
		64 69 ASN amino N L-PEPTIDE LINKING
	...
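One practical consequence is that positions in the sequence string and positions in the group list stop lining up after the chromophore, because one group contributes three letters. The sketch below illustrates this with plain Java; it is not how BioJava builds the sequence internally, and `SequenceIndexSketch` and its methods are hypothetical names. The sample codes are the three groups listed above (PHE, DYG, ASN).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not BioJava code.
final class SequenceIndexSketch {

    // Letter codes of three consecutive groups: PHE, DYG, ASN.
    static final List<String> SAMPLE = Arrays.asList("F", "DYG", "N");

    // Concatenates the per-group letter codes into one sequence string.
    static String buildSequence(List<String> codes) {
        StringBuilder sb = new StringBuilder();
        for (String code : codes) {
            sb.append(code);
        }
        return sb.toString();
    }

    // For each 0-based sequence position, the index of the group that produced it.
    static List<Integer> groupIndexPerLetter(List<String> codes) {
        List<Integer> owner = new ArrayList<>();
        for (int g = 0; g < codes.size(); g++) {
            for (int k = 0; k < codes.get(g).length(); k++) {
                owner.add(g);
            }
        }
        return owner;
    }

    public static void main(String[] args) {
        System.out.println(buildSequence(SAMPLE));       // FDYGN
        System.out.println(groupIndexPerLetter(SAMPLE)); // [0, 1, 1, 1, 2]
    }
}
```

A lookup table like the one returned by `groupIndexPerLetter` is one way to translate a sequence position back to the group it came from when a group contributes more than one letter.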

Microheterogeneity

