BioJava supports a number of basic biological sequence types: DNA, RNA, and protein sequences.
Create a DNA sequence
```java
import org.biojava.nbio.core.sequence.DNASequence;

DNASequence seq = new DNASequence("GTAC");
```
In addition to the basic DNA sequence class, there are specialized classes that extend DNASequence: ChromosomeSequence, GeneSequence, IntronSequence, ExonSequence, and TranscriptSequence.
Create an RNA sequence
```java
import org.biojava.nbio.core.sequence.RNASequence;

RNASequence seq = new RNASequence("GUAC");
```
Create a protein sequence
```java
import org.biojava.nbio.core.sequence.ProteinSequence;

ProteinSequence seq = new ProteinSequence("MSTNPKPQRKTKRNTNRRPQDVKFPGG");
```
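Conceptually, each of these sequence types accepts only symbols from its own alphabet (BioJava calls this a compound set). The following plain-Java sketch does not use BioJava; the class `SequenceAlphabets` and its members are hypothetical and use deliberately minimal alphabets (A, C, G, T for DNA; A, C, G, U for RNA; the 20 standard amino acids for protein), whereas BioJava's real compound sets accept more symbols.

```java
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch, not BioJava code: each sequence type is modelled
// as an alphabet, i.e. a set of allowed one-letter symbols.
public class SequenceAlphabets {

    // Deliberately minimal alphabets; BioJava's compound sets are larger.
    static final Set<Character> DNA = toSet("ACGT");
    static final Set<Character> RNA = toSet("ACGU");
    static final Set<Character> PROTEIN = toSet("ACDEFGHIKLMNPQRSTVWY");

    // Turns a string of symbols into a set of characters.
    static Set<Character> toSet(String symbols) {
        return symbols.chars().mapToObj(c -> (char) c).collect(Collectors.toSet());
    }

    // True if every symbol of seq (case-insensitive) belongs to the alphabet.
    static boolean isValid(String seq, Set<Character> alphabet) {
        return seq.toUpperCase().chars().allMatch(c -> alphabet.contains((char) c));
    }

    public static void main(String[] args) {
        System.out.println(isValid("GTAC", DNA)); // true
        System.out.println(isValid("GUAC", DNA)); // false: U is not a DNA symbol
        System.out.println(isValid("GUAC", RNA)); // true
        System.out.println(isValid("MSTNPKPQRKTKRNTNRRPQDVKFPGG", PROTEIN)); // true
    }
}
```

The same string can be valid for one type and invalid for another, which is why the choice of sequence class (and its compound set) matters.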
Particularly with nucleotide sequences, the exact nucleotides are sometimes not known. BioJava supports the standard (IUPAC) conventions for representing such ambiguity; for example, "W" is commonly used to represent "A or T". By default, the set of compounds a sequence may contain is strict, but switching to a compound set that supports ambiguity codes takes only one line of code.
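```java
import org.biojava.nbio.core.sequence.DNASequence;
import org.biojava.nbio.core.sequence.compound.AmbiguityDNACompoundSet;

// this throws a CompoundNotFoundException: "W" is not in the default DNA compound set
DNASequence dna1 = new DNASequence("WWW");

// however, this works:
AmbiguityDNACompoundSet ambiguityDNACompoundSet = AmbiguityDNACompoundSet.getDNACompoundSet();
DNASequence dna2 = new DNASequence("WWW", ambiguityDNACompoundSet);
```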
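To make concrete what a code such as W stands for, here is a plain-Java sketch (Java 9+ for `Map.ofEntries`) that maps the standard IUPAC nucleotide codes to the bases they represent and checks whether an ambiguous pattern is compatible with a concrete DNA sequence. The class `IupacDna` and its `matches` method are hypothetical; this only illustrates the convention and is not how `AmbiguityDNACompoundSet` is implemented.

```java
import java.util.Map;

// Hypothetical sketch, not BioJava's implementation: IUPAC nucleotide
// codes mapped to the concrete DNA bases each one stands for.
public class IupacDna {

    static final Map<Character, String> CODES = Map.ofEntries(
        Map.entry('A', "A"), Map.entry('C', "C"), Map.entry('G', "G"), Map.entry('T', "T"),
        Map.entry('R', "AG"), Map.entry('Y', "CT"), Map.entry('S', "CG"), Map.entry('W', "AT"),
        Map.entry('K', "GT"), Map.entry('M', "AC"),
        Map.entry('B', "CGT"), Map.entry('D', "AGT"), Map.entry('H', "ACT"), Map.entry('V', "ACG"),
        Map.entry('N', "ACGT"));

    // True if the unambiguous sequence seq is one of the sequences the
    // (possibly ambiguous) pattern can stand for. Unknown codes throw,
    // mirroring the idea of a strict compound set.
    static boolean matches(String pattern, String seq) {
        if (pattern.length() != seq.length()) {
            return false;
        }
        for (int i = 0; i < pattern.length(); i++) {
            String bases = CODES.get(pattern.charAt(i));
            if (bases == null) {
                throw new IllegalArgumentException("Unknown code: " + pattern.charAt(i));
            }
            if (bases.indexOf(seq.charAt(i)) < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(matches("WWW", "ATA")); // true
        System.out.println(matches("WWW", "ACA")); // false: C is neither A nor T
    }
}
```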
The default AminoAcidCompoundSet already supports ambiguity codes such as B ("Asparagine or Aspartic acid") and related ambiguities. It also supports Selenocysteine (U) and Pyrrolysine (O).
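On the protein side, B and Z are the standard IUPAC ambiguity codes for "Asparagine or Aspartic acid" and "Glutamine or Glutamic acid", while U and O denote real amino acids rather than ambiguities. The plain-Java sketch below (Java 9+ for `Map.of`; the class `ProteinAmbiguity` is hypothetical and not part of BioJava) counts how many unambiguous protein sequences an ambiguous one-letter string can stand for, which is the product of the number of options at each position.

```java
import java.util.Map;

// Hypothetical sketch, not BioJava's implementation: how many unambiguous
// protein sequences an ambiguous one-letter string can stand for.
public class ProteinAmbiguity {

    // 20 standard amino acids plus U (Selenocysteine) and O (Pyrrolysine).
    static final String UNAMBIGUOUS = "ACDEFGHIKLMNPQRSTVWYUO";

    // IUPAC ambiguity codes: B = Aspartic acid (D) or Asparagine (N),
    //                        Z = Glutamic acid (E) or Glutamine (Q).
    static final Map<Character, String> AMBIGUOUS = Map.of('B', "DN", 'Z', "EQ");

    static long countConcreteSequences(String protein) {
        long count = 1;
        for (char c : protein.toCharArray()) {
            String options = AMBIGUOUS.get(c);
            if (options != null) {
                count *= options.length(); // each ambiguous position multiplies the options
            } else if (UNAMBIGUOUS.indexOf(c) < 0) {
                throw new IllegalArgumentException("Unknown residue: " + c);
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countConcreteSequences("MSTNPK")); // 1
        System.out.println(countConcreteSequences("MBZU"));   // 4
    }
}
```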
See the Cookbook for [more details on dealing with sequences](http://biojava.org/wiki/BioJava:CookBook:Core:Overview).