The AFPChain data structure was designed to store pairwise structural
alignments. The class functions as a bean and contains many variables
used internally by the alignment algorithms implemented in BioJava.
Some of the important stored variables are:
- Algorithm Name
- Optimal Alignment: described later.
- Optimal RMSD: final and total RMSD value of the alignment.
- TM-score
- BlockRotationMatrix: rotation component of the superposition transformation.
- BlockShiftVector: translation component of the superposition transformation.
BioJava class: org.biojava.nbio.structure.align.model.AFPChain
The residue equivalencies of the alignment (EQRs) are described in the optimal alignment variable, a three-dimensional array of integers, where the indices stand for:
int[][][] optAln = afpChain.getOptAln();
int residue = optAln[block][chain][eqr];
- block: the blocks divide the alignment into different parts. The division can be due to non-topological rearrangements (e.g. circular permutations) or due to flexible parts (e.g. domain switch). There can be any number of blocks in a structural alignment, defined by the structure alignment algorithm.
- chain: in a pairwise alignment there are only two chains, or structures.
- eqr: EQR stands for equivalent residue position, i.e. the alignment position. There are as many positions (EQRs) in a block as the length of the alignment block, and their number is equal for any of the two chains in the same block.
Each entry (combination of the three indices described above) stores an integer that corresponds to the residue index in the specified chain, i.e. the index in the Atom array of the chain. Within the same block, the stored integers (residues) are always in increasing order.
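To make the indexing concrete, the following sketch walks a hand-built array laid out like the optimal alignment variable and lists the aligned residue pairs of every block. The class name, the method name and the array contents are invented for illustration; in real code the array would come from afpChain.getOptAln().

```java
import java.util.ArrayList;
import java.util.List;

final class OptAlnSketch {

    /** Lists the aligned pairs as "block:residueInChain0-residueInChain1". */
    static List<String> alignedPairs(int[][][] optAln) {
        List<String> pairs = new ArrayList<>();
        for (int block = 0; block < optAln.length; block++) {
            // Both chains hold the same number of positions in a block
            int length = optAln[block][0].length;
            for (int eqr = 0; eqr < length; eqr++) {
                int res0 = optAln[block][0][eqr];
                int res1 = optAln[block][1][eqr];
                pairs.add(block + ":" + res0 + "-" + res1);
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Hypothetical alignment with two blocks (values are Atom array indices)
        int[][][] optAln = {
            { {0, 1, 2}, {10, 11, 13} },
            { {5, 6}, {2, 3} }
        };
        for (String pair : alignedPairs(optAln)) {
            System.out.println(pair);
        }
    }
}
```

The sketch takes the block length from the array itself. The arrays returned by the library may be allocated longer than the aligned block, so with a real AFPChain it is safer to take the length of each block from afpChain.getOptLen().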
Some examples of how to get the basic properties of an AFPChain:
String algorithm = afpChain.getAlgorithmName(); // Name of the algorithm that generated the alignment
int blockNum = afpChain.getBlockNum(); // Number of blocks
double tmScore = afpChain.getTMScore(); // TM-score
double rmsd = afpChain.getTotalRmsdOpt(); // Optimal RMSD
Matrix rotation = afpChain.getBlockRotationMatrix()[0]; // Rotation matrix of the first block
Atom shift = afpChain.getBlockShiftVector()[0]; // Translation vector of the first block
As an overview, the AFPChain data model:
- Only supports pairwise alignments, i.e. two chains or structures aligned.
- Can support flexible alignments and non-topological alignments. However, their combination (a flexible alignment with topological rearrangements) cannot be represented, because the blocks mean either one or the other.
- Cannot support non-sequential alignments, because the residues are assumed to be sequential inside each block; such an alignment would require a new block for each EQR.
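The sequentiality constraint can be illustrated with a small sketch that splits a list of aligned residue pairs into blocks whenever the residue index of either chain stops increasing. This is not how the BioJava algorithms build their blocks; the class and method names are hypothetical and the inputs are invented.

```java
import java.util.ArrayList;
import java.util.List;

final class BlockSplitSketch {

    /** Splits pairs {res0, res1} into blocks; a new block starts when either index fails to increase. */
    static List<List<int[]>> splitIntoBlocks(int[][] pairs) {
        List<List<int[]>> blocks = new ArrayList<>();
        List<int[]> current = new ArrayList<>();
        for (int[] pair : pairs) {
            if (!current.isEmpty()) {
                int[] last = current.get(current.size() - 1);
                boolean sequential = pair[0] > last[0] && pair[1] > last[1];
                if (!sequential) {
                    // Order is broken: close the current block and open a new one
                    blocks.add(current);
                    current = new ArrayList<>();
                }
            }
            current.add(pair);
        }
        if (!current.isEmpty()) {
            blocks.add(current);
        }
        return blocks;
    }

    public static void main(String[] args) {
        // Circular permutation: the second chain wraps around after residue 52
        int[][] circular = { {0, 50}, {1, 51}, {2, 52}, {3, 0}, {4, 1} };
        System.out.println(splitIntoBlocks(circular).size() + " blocks"); // 2 blocks

        // Non-sequential alignment: every pair breaks the order of the second chain
        int[][] scrambled = { {0, 9}, {1, 7}, {2, 5}, {3, 3} };
        System.out.println(splitIntoBlocks(scrambled).size() + " blocks"); // 4 blocks
    }
}
```

A circular permutation fits in two blocks, while the scrambled input needs one block per aligned position, which is the degenerate case the last point describes.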
Since BioJava 4.1.0, a new data model is available to store structure alignments.
The MultipleAlignment data structure is a general model that supports any of the
following properties, and any combination of them:
- Multiple structures: the model is no longer restricted to pairwise alignments.
- Non-topological alignments: such as circular permutations or domain rearrangements.
- Flexible alignments: parts of the alignment with different superposition transformation.
In addition, the data structure is not limited in the number and types of scores it can store, because the scores are stored in a key:value fashion, as described later.
BioJava class: org.biojava.nbio.structure.align.multiple.MultipleAlignment
The biggest difference with AFPChain is that the MultipleAlignment data
structure is object oriented.
The hierarchy of sub-objects is represented below:
MultipleAlignmentEnsemble
  |
  MultipleAlignment(s)
    |
    BlockSet(s)
      |
      Block(s)
- MultipleAlignmentEnsemble: the ensemble is the top level of the hierarchy. As a top level, it stores information regarding creation properties (algorithm, version, creation time, etc.), the structures involved in the alignment (Atoms, structure identifiers, etc.) and cached variables (atomic distance matrices). It contains a collection of MultipleAlignment objects that share the same properties stored in the ensemble. This construction allows the storage of alternative alignments inside the same data structure.
- MultipleAlignment: the MultipleAlignment stores the core information of a multiple structure alignment. It is designed to be the return type of the multiple structure alignment algorithms. The object contains a collection of BlockSet objects and it is linked to its parent MultipleAlignmentEnsemble.
- BlockSet: the BlockSet stores a flexible part of a multiple structure alignment. A flexible part needs the residue equivalencies involved, contained in a collection of Block objects, and a transformation matrix for every structure that describes the 3D superposition of all structures. It is linked to its parent MultipleAlignment.
- Block: the Block stores the aligned positions (equivalent residues) of a BlockSet that are in sequentially increasing order. If more than one Block is present, each Block represents a sequential part of a non-topological alignment. It is linked to its parent BlockSet.
In the MultipleAlignment data structure the aligned residues are stored in a
nested List (a List of Lists) for every Block. The indices of the nested List are the following:
List<List<Integer>> optAln = block.getAlnRes();
Integer residue = optAln.get(chain).get(eqr);
The indices have the same meaning as in the optimal alignment of the AFPChain.
As a reminder:
- chain: chain or structure index.
- eqr: EQR stands for equivalent residue position, i.e. the alignment position. There are as many positions (EQRs) in a block as the length of the alignment block, and their number is equal for any of the chains in the same block.
As in AFPChain, each entry (combination of the two indices described above)
is an Integer that corresponds to the residue index in the specified chain, i.e.
the index in the Atom array of the chain. Care has to be taken in the code,
because a MultipleAlignment can contain gaps, which are represented as null
in the List entries.
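The null entries matter in practice: assigning an entry directly to an int unboxes it and throws a NullPointerException at a gap. The sketch below uses a hand-built nested List in place of block.getAlnRes() and collects the alignment positions where no structure has a gap. The class name, the method name and the data are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class GapSketch {

    /** Returns the alignment positions (EQR indices) where no chain has a gap. */
    static List<Integer> ungappedColumns(List<List<Integer>> alnRes) {
        List<Integer> columns = new ArrayList<>();
        if (alnRes.isEmpty()) {
            return columns;
        }
        int length = alnRes.get(0).size(); // same length for every chain
        for (int eqr = 0; eqr < length; eqr++) {
            boolean gap = false;
            for (List<Integer> chain : alnRes) {
                // Compare against null before unboxing the Integer
                if (chain.get(eqr) == null) {
                    gap = true;
                    break;
                }
            }
            if (!gap) {
                columns.add(eqr);
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        // Three structures, four alignment positions, null marks a gap
        List<List<Integer>> alnRes = Arrays.asList(
            Arrays.asList(0, 1, 2, 3),
            Arrays.asList(5, null, 6, 7),
            Arrays.asList(2, 3, null, 4));
        System.out.println(ungappedColumns(alnRes)); // [0, 3]
    }
}
```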
All the objects in the hierarchy implement the ScoresCache interface.
This interface allows the storage of any number of scores as key:value pairs.
The key is a String that describes the score and is used to retrieve it later,
and the value is a double with the calculated score. The interface has only
two methods: putScore and getScore.
The following lines of code are an example of how to manipulate scores
on a MultipleAlignment:
//Put a score into the alignment and get it back
alignment.putScore("myRMSD", 1.234);
double myRMSD = alignment.getScore("myRMSD");
//The same can be done for BlockSets
BlockSet bs = alignment.getBlockSets().get(0);
bs.putScore("bsRMSD", 1.234);
double bsRMSD = bs.getScore("bsRMSD");
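The key:value behaviour can be pictured with a map-backed sketch. The class below is hypothetical and is not the BioJava implementation; it only mirrors the two method names mentioned above, and returning null for an unknown key is an assumption of the sketch.

```java
import java.util.HashMap;
import java.util.Map;

final class ScoreStoreSketch {

    // Scores are kept by name, so any number and type of score can be stored
    private final Map<String, Double> scores = new HashMap<>();

    /** Stores a score under the given name, replacing any previous value. */
    void putScore(String name, Double value) {
        scores.put(name, value);
    }

    /** Returns the score stored under the name, or null if there is none (assumption). */
    Double getScore(String name) {
        return scores.get(name);
    }

    public static void main(String[] args) {
        ScoreStoreSketch store = new ScoreStoreSketch();
        store.putScore("myRMSD", 1.234);
        System.out.println(store.getScore("myRMSD"));
        System.out.println(store.getScore("missing"));
    }
}
```

Because every level of the hierarchy keeps its own scores, a score stored on a BlockSet is independent from one stored under the same name on its parent alignment.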
Some classes are designed to contain utility methods for manipulating a MultipleAlignment object.
The most important ones are enumerated and briefly described below:
- MultipleAlignmentScorer: contains frequent names for scores and methods to calculate them.
- MultipleAlignmentTools: contains helper methods, such as calculating the sequence alignment, transforming the atom arrays of the structures or calculating aligned residue distances between all structures.
- MultipleAlignmentWriter: contains methods to generate different types of String outputs of the alignment, e.g. FASTA, XML, FatCat.
- MultipleSuperimposer: interface for implementations that calculate the structure superpositions of the alignment. Some examples of implementations are the ReferenceSuperimposer (superimposes all the structures to a reference) and the CoreSuperimposer (only uses EQRs present in all structures, without gaps, to superimpose them).
- MultipleAlignmentXMLParser: contains a method to create a MultipleAlignment object from an XML file representation.
As an overview, the MultipleAlignment data model:
- Supports any number of aligned structures.
- Can support flexible alignments and non-topological alignments, and any of their combinations (e.g. a flexible alignment with topological rearrangements).
- Cannot support non-sequential alignments, because sequentiality of the residues is a requirement for each Block; such an alignment would require a new Block for each EQR.
- Can store any score at any of the four levels of the object hierarchy, making it easy to adapt to new requirements and algorithms.
For more examples and information about the MultipleAlignment data structure,
go to the Demo package of the biojava-structure module or look through the interface
files, where the Javadoc explanations can be found.
The conversion from an AFPChain to a MultipleAlignment is possible through the
ensemble constructor. An example of how to do it programmatically is below:
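// The pairwise alignment and the Atom arrays of both structures are obtained elsewhere
AFPChain afpChain;
Atom[] chain1;
Atom[] chain2;
boolean flexible = false;
// MultipleAlignmentEnsemble is an interface; the constructor belongs to its implementation
MultipleAlignmentEnsemble ensemble = new MultipleAlignmentEnsembleImpl(afpChain, chain1, chain2, flexible);
MultipleAlignment converted = ensemble.getMultipleAlignment(0);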
There is no method to convert from a MultipleAlignment to an AFPChain, because
the former supports any number of structures, while the latter only
supports pairwise alignments. However, the conversion can be done with a few
lines of code if needed (instantiate a new AFPChain and copy, one by one, the
properties that can be represented from the MultipleAlignment).
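As a rough sketch of the central step of such a manual conversion, the code below turns the nested Lists of a pairwise alignment into an array indexed by block, chain and EQR, like the optimal alignment of an AFPChain. Plain collections stand in for the BioJava objects, the class and method names are hypothetical, and dropping the positions that contain a gap is an assumption of the sketch. A real conversion would also have to copy the names, scores and transformations through the AFPChain setters, which are not shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PairwiseConversionSketch {

    /** Converts pairwise blocks (two chains each) to a [block][chain][eqr] array, dropping gapped positions. */
    static int[][][] toOptAln(List<List<List<Integer>>> blocks) {
        int[][][] optAln = new int[blocks.size()][2][];
        for (int b = 0; b < blocks.size(); b++) {
            List<List<Integer>> alnRes = blocks.get(b);
            if (alnRes.size() != 2) {
                throw new IllegalArgumentException("Only pairwise alignments can be converted");
            }
            // Keep only the positions where both chains have a residue
            List<int[]> kept = new ArrayList<>();
            for (int eqr = 0; eqr < alnRes.get(0).size(); eqr++) {
                Integer res0 = alnRes.get(0).get(eqr);
                Integer res1 = alnRes.get(1).get(eqr);
                if (res0 != null && res1 != null) {
                    kept.add(new int[] {res0, res1});
                }
            }
            optAln[b][0] = new int[kept.size()];
            optAln[b][1] = new int[kept.size()];
            for (int i = 0; i < kept.size(); i++) {
                optAln[b][0][i] = kept.get(i)[0];
                optAln[b][1][i] = kept.get(i)[1];
            }
        }
        return optAln;
    }

    public static void main(String[] args) {
        // One block of a hypothetical pairwise alignment with a gap in the second chain
        List<List<Integer>> block = Arrays.asList(
            Arrays.asList(0, 1, 2),
            Arrays.asList(4, null, 5));
        int[][][] optAln = toOptAln(Arrays.asList(block));
        System.out.println(Arrays.deepToString(optAln)); // [[[0, 2], [4, 5]]]
    }
}
```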