Skip to content

Latest commit

 

History

History
68 lines (44 loc) · 1.97 KB

sequences.md

File metadata and controls

68 lines (44 loc) · 1.97 KB

Sequences in BioJava

BioJava supports a number of basic biological sequence types: DNA, RNA, and protein sequences.

Create a basic sequence object

Create a DNA sequence

    DNASequence seq = new DNASequence("GTAC"); 

In addition to the basic DNA sequence class there are specialized classes that extend DNASequence: ChromosomeSequence, GeneSequence, IntronSequence, ExonSequence, TranscriptSequence

Create a RNA sequence

    RNASequence seq = new RNASequence("GUAC"); 

Create a protein sequence

    ProteinSequence seq = new ProteinSequence("MSTNPKPQRKTKRNTNRRPQDVKFPGG"); 

Ambiguity codes

In particular when dealing with nucleotide sequences, sometimes the exact nucleotides are not known. BioJava supports standard conventions for dealing with such ambiguity. For example to represent the nucleotides "A or T" often "W" is getting used. The expected set of compounds in a sequence by default is strict, however it takes only one line of code to switch to supporting ambiguity codes.

        // this throws an error
        DNASequence dna2 = new DNASequence("WWW");

        // however this works:
        AmbiguityDNACompoundSet ambiguityDNACompoundSet = AmbiguityDNACompoundSet.getDNACompoundSet();
        DNASequence dna2 = new DNASequence("WWW",ambiguityDNACompoundSet);

Protein sequences and ambiguity

The default AminoAcidCompoundSet already supports "Asparagine or Aspartic acid" and related ambiguities. It also contains support for Selenocysteine and Pyrrolysine

More details

See the Cookbook for [more details on dealing with sequences] (http://biojava.org/wiki/BioJava:CookBook:Core:Overview)


Navigation: Home | Book 1: The Core Module | Chapter 2 : Basic Sequence types

Prev: Chapter 1 : Installation

Next: Chapter 3 : Reading and Writing sequences