Output Files
bj8th edited this page Jul 27, 2021 · 16 revisions
Output file names are customized with the "name" parameter provided as input. Throughout this page, "name" in a directory or file name stands for the value of that parameter.
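As a rough illustration of the naming convention, the sketch below builds the expected location of an output file from an output directory, a run name, and a module directory. The helper `expected_output` and the `{name}` placeholder syntax are assumptions made for this example; they are not part of the pipeline.

```python
from pathlib import Path


def expected_output(outdir: str, name: str, module: str, template: str) -> Path:
    """Return the expected path of an output file.

    `template` is a file name from the tables on this page with the run
    name written as an explicit `{name}` placeholder (hypothetical syntax),
    e.g. "{name}.collapsed.fasta". Using an explicit placeholder avoids
    touching fixed file names such as "gene_isoname.tsv".
    """
    return Path(outdir) / name / module / template.format(name=name)


# Example with made-up values for outdir and name.
path = expected_output("results", "sample1", "isoseq3", "{name}.collapsed.fasta")
print(path.as_posix())
```

File names that do not depend on the run name, such as `ensg_gene.tsv`, can be passed as-is because they contain no placeholder.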
reference_tables
Output directory: outdir/name/reference_tables
filename | description |
---|---|
ensg_gene.tsv | GENCODE gene_id to gene_name mapping |
enst_isoname.tsv | GENCODE transcript_id to transcript_name mapping |
gene_ensp.tsv | gene_name to GENCODE protein_id mapping |
gene_isoname.tsv | gene_name and transcript_name mapping |
gene_lens.tsv | Gene nucleotide length statistics |
isoname_lens.tsv | Gene isoform length information |
protein_coding_genes.txt | list of protein coding genes determined by GENCODE |
gencode_db
Output directory: outdir/name/gencode_db
filename | description |
---|---|
gencode_isoname_cluster.tsv | Listing of GENCODE transcript_names (i.e., isonames) that produce the same protein. Gives the reference transcript_name (arbitrarily selected) and the transcript_names clustered with it |
gencode_protein.fasta | GENCODE protein sequences, reference isonames only |
isoseq3
Output directory: outdir/name/isoseq3
filename | description |
---|---|
name.collapsed.abundance.txt | Collapsed isoform abundances |
name.collapsed.fasta | Representative transcript sequences of collapsed isoforms |
name.collapsed.gff | Collapsed transcript alignment |
name.collapsed.report.json | Collapsed isoform statistics |
name.demult.lima.summary | Statistics after lima command |
name.flnc.bam | Full-length non-concatemer reads |
name.flnc.bam.pbi | PacBio index (.pbi) for the full-length non-concatemer reads BAM |
name.flnc.filter_summary.json | Full-length non-concatemer reads summary |
star_index
Output directory: outdir/name/star_index
star
Output directory: outdir/name/star
filename | description |
---|---|
nameSJ.out.tab | STAR splice junctions in tab-delimited format |
nameLog.final.out | Log file and summary statistics |
sqanti3
Output directory: outdir/name/sqanti3
filename | description |
---|---|
name_classification.txt | SQANTI transcript classification of isoforms |
name_corrected.fasta | Transcript sequences after correction using genome sequence |
name_corrected.gtf | Alignment of corrected sequences |
name_junctions.txt | File with attribute information at splice junction level (table explaining feature meaning inside output_info). |
name_sqanti_report.pdf | PDF file showing different quality control and descriptive plots |
name.params.txt | SQANTI parameters used |
sqanti3-filtered
Output directory: outdir/name/sqanti3-filtered
filename | description |
---|---|
filtered_name_classification.tsv | SQANTI classification filtered based on protein coding, percent polyA downstream, RTS stage |
filtered_name_corrected.fasta | SQANTI fasta filtered based on protein coding, percent polyA downstream, RTS stage |
filtered_name_corrected.gtf | SQANTI gtf filtered based on protein coding, percent polyA downstream, RTS stage |
name_classification.5degfilter.txt | SQANTI classification filtered by the filtered_ criteria and additionally for 5' degradation |
name_corrected.5degfilter.fasta | SQANTI fasta filtered by the filtered_ criteria and additionally for 5' degradation |
name_corrected.5degfilter.gtf | SQANTI gtf filtered by the filtered_ criteria and additionally for 5' degradation |
pacbio_6frm_gene_grouped
Output directory: outdir/name/pacbio_6frm_gene_grouped
filename | description |
---|---|
name.6frame.fasta | PacBio transcripts translated in all six reading frames (3 forward, 3 reverse) |
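To make the six-frame idea concrete, here is a minimal sketch of translating one nucleotide sequence in three forward and three reverse-complement frames using the standard genetic code. It is an illustration only and is not the pipeline's implementation; the function names and the `+1` to `-3` frame labels are assumptions, and details such as how the pipeline handles stop codons or names FASTA entries are not covered.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons; unknown codons (e.g. containing N) become X."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame(seq: str) -> dict:
    """Translate a sequence in frames +1, +2, +3, -1, -2, -3."""
    seq = seq.upper()
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames


frames = six_frame("ATGGCCTAA")
print(frames["+1"])  # MA*
print(frames["-1"])  # LGH
```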
transcriptome_summary
Output directory: outdir/name/transcriptome_summary
filename | description |
---|---|
gene_level_tab.tsv | CPM (long-read) and TPM (short-read) info provided on gene level |
pb_gene.tsv | PacBio to gene mapping |
sqanti_isoform_info.tsv | simplified SQANTI classification info |
cpat
Output directory: outdir/name/cpat
filename | description |
---|---|
CPAT_run_info.log | CPAT logging file |
name_cpat.error | CPAT error / logging info |
name_cpat.output | CPAT output info |
name.no_ORF.txt | list of PacBio isoforms that did not produce a valid ORF |
name.ORF_prob.best.tsv | ORF file, best defined by CPAT |
name.ORF_prob.tsv | All ORFs found and scored by CPAT |
name.ORF_seqs.fa | All ORF nucleotide sequences found by CPAT |
name.r | code run by CPAT to produce ORFs |
orf_calling
Output directory: outdir/name/orf_calling
filename | description |
---|---|
name_best_orf.tsv | Best ORF for each PacBio accession, as determined by algorithm |
refined_database
Output directory: outdir/name/refined_database
filename | description |
---|---|
name_orf_refined.fasta | protein sequence of ORFs after collapsing based on transcripts producing same protein sequence |
name_orf_refined.tsv | ORF info, transcripts producing same protein collapsed |
pacbio_cds
Output directory: outdir/name/pacbio_cds
filename | description |
---|---|
name_no_transcript_with_cds.gtf | PacBio gtf with CDS info added, transcript line not included |
name_with_cds.gtf | PacBio gtf with CDS info added |
make_pacbio_cds.log | logging file |
rename_cds
Output directory: outdir/name/rename_cds
filename | description |
---|---|
gencode.cds_renamed_exon.gtf | GENCODE gtf file, exons removed and CDS renamed to exon |
gencode.transcript_exons_only.gtf | GENCODE gtf file, exons and transcript only |
name.cds_renamed_exon.gtf | PacBio gtf file, exons removed and CDS renamed to exon |
name.transcript_exons_only.gtf | PacBio gtf file, exons and transcript only |
sqanti_protein
Output directory: outdir/name/sqanti_protein
filename | description |
---|---|
name_sqanti_protein_classification.tsv | splice classification data for proteins generated by PacBio |
protein_classification
Output directory: outdir/name/protein_classification
filename | description |
---|---|
name_genes.tsv | Mapping of PacBio accession to transcript gene and protein gene. These can differ if the transcript read spans multiple genes |
name_unfiltered.protien_classification.tsv | protein classification of all PacBio proteins |
protein_gene_rename
Output directory: outdir/name/protein_gene_rename
filename | description |
---|---|
name_orf_refined_gene_update.tsv | refined database with gene name updated to reflect protein gene |
name_with_cds_refined.gtf | PacBio gtf file that includes CDS information with gene name updated to reflect protein gene |
name_protein_refined.fasta | protein fasta with gene name updated to reflect protein gene |
protein_filter
Output directory: outdir/name/protein_filter
filename | description |
---|---|
name_with_cds_filtered.gtf | GTF, filtered to remove intergenic and truncations |
name_classification_filtered.tsv | Protein classification, filtered to remove intergenic and truncations |
name.filtered_protein.fasta | protein sequences, filtered to remove intergenic and truncations |
hybrid_protein_database
Output directory: outdir/name/hybrid_protein_database
High confidence: at least 3 CPM per gene and an average gene nucleotide length of 1-4 kb
filename | description |
---|---|
name_cds_high_confidence.gtf | GTF of high confidence genes |
name_high_confidence_genes.tsv | list of high confidence genes |
name_hybrid.fasta | sequence information of high confidence PacBio and GENCODE genes |
name_refined_high_confidence.tsv | high confidence ORF metadata |
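The high confidence rule stated for this directory can be sketched as a simple per-gene filter. This is an illustration under stated assumptions, not the pipeline's code: the function name, the input layout (gene mapped to CPM and average length), and the choice to treat both length bounds as inclusive are all hypothetical.

```python
from typing import Dict, List, Tuple


def high_confidence_genes(
    genes: Dict[str, Tuple[float, float]],
    min_cpm: float = 3.0,
    min_len_nt: float = 1000,
    max_len_nt: float = 4000,
) -> List[str]:
    """Return genes with CPM >= min_cpm and average length within 1-4 kb.

    `genes` maps a gene name to (CPM, average nucleotide length).
    Inclusive length bounds are an assumption of this sketch.
    """
    return [
        gene
        for gene, (cpm, avg_len_nt) in genes.items()
        if cpm >= min_cpm and min_len_nt <= avg_len_nt <= max_len_nt
    ]


# Made-up values for illustration only.
example = {
    "GENE_A": (5.2, 2500),   # passes both criteria
    "GENE_B": (1.0, 2500),   # CPM too low
    "GENE_C": (10.0, 6000),  # average length above 4 kb
}
print(high_confidence_genes(example))  # ['GENE_A']
```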
metamorpheus
Database Information
database | directory | database name in files |
---|---|---|
GENCODE | gencode | Gencode |
UniProt | uniprot | UniProt |
PacBio Filtered | pacbio/filtered | filtered |
PacBio Refined | pacbio/refined | refined |
PacBio Hybrid | pacbio/hybrid | hybrid |
PacBio Rescue & Resolve | pacbio/rescue_resolve | rescue_resolve |
toml files
directory database/toml
filename | description |
---|---|
CalibrationTask.toml | not used |
GlycoSearchTask.toml | not used |
GptmdTask.toml | not used |
SearchTask.toml | Metamorpheus run parameters |
XLSearchTask.toml | not used |
Search Results Files In search_results/Task1SearchTask
filename | description |
---|---|
AllPSMs.psmtsv | PSMs found |
AllQuantifiedPeaks.tsv | quantified peaks found |
prose.txt | Run information |
AllPSMs_FormattedForPercolator.tab | PSMs found, in Percolator format |
AllQuantifiedPeptides.tsv | quantified peptides |
results.txt | summary statistics |
AllPeptides.database.psmtsv | peptides found |
AllQuantifiedProteinGroups.database.tsv | protein groups found |
peptide_analysis
Output directory: outdir/name/peptide_analysis
filename | description |
---|---|
gc_pb_overlap_peptides.tsv | overlap of GENCODE peptides with theoretical peptides that could be found in PacBio databases |
track_visualization
Output directory: outdir/name/track_visualization/reference
filename | description |
---|---|
gencode_shaded.bed12 | GENCODE bed alignment colored |
gencode.filtered.gtf | GENCODE alignment |
Output directory: outdir/name/track_visualization/database
database | database name |
---|---|
PacBio Refined | refined |
PacBio Filtered | filtered |
PacBio Hybrid | hybrid |
peptide
filename | description |
---|---|
name_database_peptides.bed12 | peptide bed alignment |
name_database_peptides.gtf | peptide gtf alignment |
name_database_shaded_peptides.bed12 | peptide bed alignment, shaded green |
protein
filename | description |
---|---|
name_hybrid_shaded_cpm.bed12 | protein alignment, shaded by transcript abundance (CPM) |
name_hybrid_shaded_protein_class.bed12 | protein alignment, shaded by protein classification |
accession_mapping
Output directory: outdir/name/accession_mapping
filename | description |
---|---|
accession_map_gencode_uniprot_pacbio.tsv | accession mapping between GENCODE, UniProt and PacBio |
accession_map_stats.tsv | frequency of overlap between databases |
protein_group_compare
Output directory: outdir/name/protein_group_compare
filename | description |
---|---|
ProteinInference_GENCODE_PacBio_comparisons.xlsx | protein inference overlap between GENCODE and PacBio Hybrid |
ProteinInference_UniProt_PacBio_comparisons.xlsx | protein inference overlap between UniProt and PacBio Hybrid |
ProteinInference_GENCODE_UniProt_comparisons.xlsx | protein inference overlap between GENCODE and UniProt |
novel_peptides
Output directory: outdir/name/novel_peptides
filename | description |
---|---|
name_database.pacbio_novel_peptides_to_gencode.tsv | novel peptides found in PacBio compared to GENCODE database |
name_database.pacbio_novel_peptides_to_uniprot.tsv | novel peptides found in PacBio compared to UniProt database |
name_database.pacbio_novel_peptides.tsv | novel peptides found in PacBio compared to GENCODE and UniProt databases |
Sheynkman-Lab