BioJava provides access to a number of external structural databases. These often use caching to reduce the amount of data that must be downloaded from the database.
Figure (top): The structure 1dan contains four chains.
The Structural Classification of Proteins (SCOP) is a manually curated classification of protein structural domains. It provides two pieces of data:
- The breakdown of a protein into structural domains
- A classification of domains according to their structure.
The structure for a known SCOP domain can be fetched by its 7-character domain ID (e.g. 'd2bq6a1') using StructureIO.getStructure(), as described in [Local PDB Installations](caching.md#Caching of other SCOP, CATH).
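To make the layout of such an ID concrete, here is a minimal sketch that splits one into its parts without using BioJava. It assumes the conventional layout (the letter d, a 4-character PDB ID, one chain character, one domain character); the class ScopIdParts and its fields are hypothetical names chosen for illustration.

```java
import java.util.Locale;

/** Hypothetical helper, not part of BioJava: splits a 7-character SCOP domain ID. */
final class ScopIdParts {
    final String pdbId;  // 4-character PDB ID, upper-cased
    final char chain;    // chain character exactly as written in the ID
    final char domain;   // domain character exactly as written in the ID

    private ScopIdParts(String pdbId, char chain, char domain) {
        this.pdbId = pdbId;
        this.chain = chain;
        this.domain = domain;
    }

    /** Assumes the layout 'd' + PDB ID (4 chars) + chain (1 char) + domain (1 char). */
    static ScopIdParts parse(String scopId) {
        if (scopId == null || scopId.length() != 7 || scopId.charAt(0) != 'd') {
            throw new IllegalArgumentException("Not a 7-character SCOP domain ID: " + scopId);
        }
        String pdbId = scopId.substring(1, 5).toUpperCase(Locale.ROOT);
        return new ScopIdParts(pdbId, scopId.charAt(5), scopId.charAt(6));
    }

    public static void main(String[] args) {
        ScopIdParts parts = parse("d2bq6a1");
        // prints "2BQ6 chain=a domain=1"
        System.out.println(parts.pdbId + " chain=" + parts.chain + " domain=" + parts.domain);
    }
}
```

The sketch only inspects the string; fetching the actual coordinates still goes through StructureIO.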
The SCOP classification can be accessed through the ScopDatabase class.
ScopDatabase scop = ScopFactory.getSCOP();
A list of domains can be retrieved for a given protein.
List<ScopDomain> domains = scop.getDomainsForPDB("4HHB");
You can get lots of useful information from the ScopDomain object.
ScopDomain domain = domains.get(0);
String scopID = domain.getScopId(); // d4hhba_
String classification = domain.getClassificationId(); // a.1.1.2
int sunId = domain.getSunId(); // 15251
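The dotted classification ID can be read level by level. The sketch below is plain Java, independent of BioJava, and assumes the four dotted parts stand for class, fold, superfamily and family in that order; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: expands a dotted classification ID into one prefix per level. */
final class ClassificationLineage {

    /** For "a.1.1.2" this returns the prefixes "a", "a.1", "a.1.1", "a.1.1.2". */
    static List<String> lineage(String classificationId) {
        List<String> levels = new ArrayList<>();
        StringBuilder prefix = new StringBuilder();
        for (String part : classificationId.split("\\.")) {
            if (prefix.length() > 0) {
                prefix.append('.');
            }
            prefix.append(part);
            levels.add(prefix.toString());
        }
        return levels;
    }

    public static void main(String[] args) {
        // prints "[a, a.1, a.1.1, a.1.1.2]"
        System.out.println(lineage("a.1.1.2"));
    }
}
```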
The full hierarchy is available as a tree of ScopNodes, which can be easily traversed using their getParentSunid() and getChildren() methods.
ScopNode node = scop.getScopNode(sunId);
while (node != null){
    System.out.println(scop.getScopDescriptionBySunid(node.getSunid()));
    node = scop.getScopNode(node.getParentSunid());
}
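The loop above follows parent IDs until a lookup returns null. The same pattern can be sketched without BioJava using a plain map; the Node class, the IDs 1 to 3 and the labels are made up for illustration and are not real SCOP data.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for walking a hierarchy by parent ID. */
final class ParentWalk {

    /** Minimal node: its own ID, its parent's ID, and a label. */
    static final class Node {
        final int id;
        final int parentId;
        final String label;

        Node(int id, int parentId, String label) {
            this.id = id;
            this.parentId = parentId;
            this.label = label;
        }
    }

    /** Toy data: 3 -> 2 -> 1, where the parent of 1 (ID 0) is not in the map. */
    static Map<Integer, Node> toyHierarchy() {
        Map<Integer, Node> nodes = new HashMap<>();
        nodes.put(1, new Node(1, 0, "top"));
        nodes.put(2, new Node(2, 1, "middle"));
        nodes.put(3, new Node(3, 2, "leaf"));
        return nodes;
    }

    /** Collects labels from the start node upward; stops when a parent is missing. */
    static List<String> pathToRoot(Map<Integer, Node> nodes, int startId) {
        List<String> path = new ArrayList<>();
        Node node = nodes.get(startId);
        while (node != null) {
            path.add(node.label);
            node = nodes.get(node.parentId); // null once the parent is unknown
        }
        return path;
    }

    public static void main(String[] args) {
        // prints "[leaf, middle, top]"
        System.out.println(pathToRoot(toyHierarchy(), 3));
    }
}
```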
ScopDatabase also provides access to all nodes at a particular level.
List<ScopDescription> superfams = scop.getByCategory(ScopCategory.Superfamily);
System.out.println("Total nr. of superfamilies:" + superfams.size());
Several types of ScopDatabase are available. These can be instantiated manually when more control is needed.
- RemoteScopInstallation (default): Fetches data one node at a time from the internet. Useful when performing a small number of operations.
- ScopInstallation: Downloads all SCOP data as a batch and caches it for later use. Much faster when performing many operations.
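The trade-off between the two strategies can be illustrated with a small stand-alone sketch. It does not use BioJava or the network; the Lookup interface, both factory methods and the sample data are hypothetical, and the counters simply show how often the data source is contacted.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical illustration of per-item fetching versus batch loading with a cache. */
final class LookupStrategies {

    interface Lookup {
        String describe(int id);
    }

    /** Asks the source for each ID individually, on every call. */
    static Lookup perItem(Function<Integer, String> fetchOne) {
        return id -> fetchOne.apply(id);
    }

    /** Loads everything once on first use, then answers from memory. */
    static Lookup batch(Supplier<Map<Integer, String>> fetchAll) {
        return new Lookup() {
            private Map<Integer, String> cache; // filled lazily on the first call

            @Override
            public String describe(int id) {
                if (cache == null) {
                    cache = fetchAll.get();
                }
                return cache.get(id);
            }
        };
    }

    public static void main(String[] args) {
        Map<Integer, String> source = new HashMap<>();
        source.put(1, "first");
        source.put(2, "second");
        int[] calls = {0, 0}; // [per-item requests, batch downloads]

        Lookup remote = perItem(id -> { calls[0]++; return source.get(id); });
        Lookup local = batch(() -> { calls[1]++; return new HashMap<>(source); });

        for (int i = 0; i < 3; i++) {
            remote.describe(1);
            local.describe(1);
        }
        // prints "per-item requests: 3, batch downloads: 1"
        System.out.println("per-item requests: " + calls[0] + ", batch downloads: " + calls[1]);
    }
}
```

With only a handful of lookups the single large download may cost more than it saves, which is why the per-item strategy suits small jobs.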
Several internal BioJava classes use ScopFactory.getSCOP() when they encounter references to SCOP domains, so it is always a good idea to notify the ScopFactory when using a custom ScopDatabase instance.
ScopDatabase scop = new ScopInstallation();
ScopFactory.setScopDatabase(scop);
Several versions of SCOP are available.
// Use Steven Brenner's updated version of SCOP
scop = ScopFactory.getSCOP(ScopFactory.VERSION_1_75C);
// Use an old version globally, perhaps for an older benchmark
ScopFactory.setScopDatabase(ScopFactory.VERSION_1_69);
CATH can be accessed in much the same way as SCOP. Parallel to the ScopInstallation class, there is a CathInstallation. The StructureIO class also lets you request structures by CATH ID.
// Jmol script: show cartoons for everything, and ligands as wireframe plus spacefill
private static final String DEFAULT_SCRIPT ="select * ; cartoon on; spacefill off; wireframe off; select ligands; wireframe on; spacefill on;";
// One color per domain; reused in order if there are more domains than colors
private static final String[] colors = new String[]{"red","green","blue","yellow"};

public static void main(String args[]){
    UserConfiguration config = new UserConfiguration();
    config.setPdbFilePath("/tmp/");
    String pdbID = "1DAN";
    CathDatabase cath = new CathInstallation(config.getPdbFilePath());
    List<CathDomain> domains = cath.getDomainsForPdb(pdbID);
    try {
        // show the structure in 3D
        BiojavaJmol jmol = new BiojavaJmol();
        jmol.setStructure(StructureIO.getStructure(pdbID));
        jmol.evalString(DEFAULT_SCRIPT);
        System.out.println("got " + domains.size() + " domains");
        // now color the domains on the structure
        int colorpos = -1;
        for ( CathDomain domain : domains){
            colorpos++;
            showDomain(jmol, domain,colorpos);
        }
    } catch (Exception e) {
        // loading the structure can fail (e.g. download or parsing problems)
        e.printStackTrace();
    }
}

private static void showDomain(BiojavaJmol jmol, CathDomain domain, int colorpos) {
    List<CathSegment> segments = domain.getSegments();
    // the chain ID is derived from the domain name
    StructureName key = new StructureName(domain.getDomainName());
    String chainId = key.getChainId();
    // wrap around so that more than four domains do not overrun the array
    String color = colors[colorpos % colors.length];
    System.out.println(" * domain " + domain.getDomainName() + " has # segments: " + domain.getSegments().size() + " color: " + color);
    for ( CathSegment segment : segments){
        System.out.println(" * " + segment);
        String start = segment.getStart();
        String stop = segment.getStop();
        // select the residue range of this segment on the chain and color it
        String script = "select " + start + "-" + stop+":"+chainId + "; color " + color +";";
        jmol.evalString(script );
    }
}
<table>
<tr>
<td>
This will show the following:
</td>
<td>
and the text:
</td>
</tr>
<tr>
<td>
<img src="img/cath_1dan.png" width=300 />
</td>
<td>
<pre>
got 4 domains
 * domain 1danH01 has # segments: 2 color: red
 * CathSegment [segmentId=1, start=16, stop=27, length=12, sequenceHeader=null, sequence=null]
 * CathSegment [segmentId=2, start=121, stop=232, length=112, sequenceHeader=null, sequence=null]
 * domain 1danH02 has # segments: 2 color: green
 * CathSegment [segmentId=1, start=28, stop=120, length=93, sequenceHeader=null, sequence=null]
 * CathSegment [segmentId=2, start=233, stop=246, length=14, sequenceHeader=null, sequence=null]
 * domain 1danU00 has # segments: 1 color: blue
 * CathSegment [segmentId=1, start=91, stop=210, length=120, sequenceHeader=null, sequence=null]
 * domain 1danT00 has # segments: 1 color: yellow
 * CathSegment [segmentId=1, start=6, stop=80, length=75, sequenceHeader=null, sequence=null]
</pre>
</td>
</tr>
</table>
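The script strings that showDomain sends to Jmol can be tried out without a viewer. The following is a minimal sketch in plain Java; the class DomainColorScript and its methods are hypothetical, and the wrap-around color choice is an assumption added for structures with more domains than colors.

```java
/** Hypothetical helper: builds the select-and-color commands used for domain highlighting. */
final class DomainColorScript {
    private static final String[] COLORS = {"red", "green", "blue", "yellow"};

    /** Picks a color for the n-th domain, wrapping around when domains outnumber colors. */
    static String colorFor(int domainIndex) {
        return COLORS[domainIndex % COLORS.length];
    }

    /** Builds one command that selects residues start..stop on a chain and colors them. */
    static String selectAndColor(String start, String stop, String chainId, String color) {
        return "select " + start + "-" + stop + ":" + chainId + "; color " + color + ";";
    }

    public static void main(String[] args) {
        // The two segments of domain 1danH01 from the output above, on chain H
        String color = colorFor(0);
        System.out.println(selectAndColor("16", "27", "H", color));   // select 16-27:H; color red;
        System.out.println(selectAndColor("121", "232", "H", color)); // select 121-232:H; color red;
    }
}
```

Each returned string could then be passed to the viewer's evalString method, as in the example above.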