Welcome to pathos’s documentation!

Contents:

Overview

[Overview diagram: _images/diagram.png]

Installation

With Docker:

$ docker build -t pathos .
$ docker run  <...>

Without Docker:

curl -sSL https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -o /tmp/miniconda.sh && \
bash /tmp/miniconda.sh -b -p "$HOME/miniconda" && \
export PATH="$HOME/miniconda/bin:$PATH" && \
conda install -y python=3.6 && \
git clone https://github.com/averagehat/pathos && \
cd pathos/install && bash assume-conda-install.sh /usr/local/bin/

Configuration

The configuration file is passed with the -c option when running pathos_sheet or pathos_single. An example configuration file is included below. Notable options:

assembler: must be either ray2 or abyss.

The assembly options are the command-line options passed to the chosen assembler.

param_file is the pathdiscov param.txt file used for the ray2/CAP assembly.

The bowtieDB and starDB file paths point to host index databases, which must already have been built with bowtie2-build and STAR (--runMode genomeGenerate), respectively.

The minCompressionScore of lzwfilter is the minimum “read complexity” a read must have to be kept for further processing. A read's compression score is the length of the compressed read divided by the length of the original read, so low-complexity reads (for example, long homopolymer runs) compress well and receive low scores.
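To make the score concrete, here is a minimal sketch of an LZW-based complexity filter. It assumes that "length of the compressed read" means the number of LZW output codes and that reads scoring at or above minCompressionScore are kept; the real lzwfilter may measure compressed size differently (for example in bytes), and lzw_codes, compression_score and keep_read are hypothetical names.

```python
from typing import List


def lzw_codes(seq: str) -> List[int]:
    """LZW-encode seq and return the list of output codes."""
    # Seed the dictionary with single characters (A, C, G, T, N plus anything else in the read).
    table = {ch: i for i, ch in enumerate(sorted(set("ACGTN") | set(seq)))}
    current = ""
    codes: List[int] = []
    for ch in seq:
        candidate = current + ch
        if candidate in table:
            current = candidate              # keep extending the current phrase
        else:
            codes.append(table[current])     # emit the longest phrase already known
            table[candidate] = len(table)    # learn the new, longer phrase
            current = ch
    if current:
        codes.append(table[current])
    return codes


def compression_score(seq: str) -> float:
    """Compressed length / original length; repetitive reads score lower."""
    if not seq:
        return 0.0
    return len(lzw_codes(seq)) / len(seq)


def keep_read(seq: str, min_compression_score: float) -> bool:
    # Assumption: reads at or above the threshold pass the filter.
    return compression_score(seq) >= min_compression_score


print(compression_score("A" * 10))      # 0.4
print(compression_score("ACGTTGCAGT"))  # 0.9
```

With minCompressionScore: 0, as in the example configuration, every read passes under this sketch.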

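The max_target_seqs option of blastn limits how many matching database sequences BLAST keeps for each query. The limit is applied during the search rather than to a final ranked list, so it does not guarantee the overall best-scoring hits. Higher values can greatly lengthen BLAST runtimes.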
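As a rough illustration of where max_target_seqs ends up, the hypothetical helper below builds a blastn command line from the parsed configuration. It assumes blastn searches ncbi.ntDB and that threads is reused for -num_threads; how pathos actually invokes BLAST may differ. The flags themselves (-query, -db, -max_target_seqs, -num_threads) are standard BLAST+ options.

```python
import shlex
from typing import Any, Dict, List


def blastn_command(query_fasta: str, cfg: Dict[str, Any]) -> List[str]:
    """Hypothetical helper, not part of pathos: map config values onto BLAST+ flags."""
    return [
        "blastn",
        "-query", query_fasta,
        "-db", cfg["ncbi"]["ntDB"],  # assumption: blastn searches the nt database
        "-max_target_seqs", str(cfg["blastn"]["max_target_seqs"]),
        "-num_threads", str(cfg["threads"]),  # assumption: threads is reused here
    ]


cfg = {"ncbi": {"ntDB": "/path/to/nt"}, "blastn": {"max_target_seqs": 3}, "threads": 16}
print(shlex.join(blastn_command("contigs.fasta", cfg)))
```

Returning an argument list (rather than a single string) lets it be passed straight to subprocess.run without shell quoting issues.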

pricefilter:
  highQualPercent: 85
  calledPercent: 90
  highQualMin: 0.98
bowtie2:
  bowtieDB: /Users/happyuser/pathos/databases/test/human-bowtie/bowtie-index
star:
  starDB: /Users/happyuser/pathos/databases/test/star-index-dir/
  skip: true
threads: 16
assembly:
  assembler: ray2
  options: " "
  kmer: 25
  minimum_contig_length: 200
cdhitdup:
  minDifference: 5
ncbi:
  ntDB: /Users/happyuser/pathos/databases/test/nt/nt
  nrDB: /Users/happyuser/pathos/databases/test/nr/nr
  ktTaxonomy: /Users/happyuser/pathos/databases/test/krona
lzwfilter:
  minCompressionScore: 0
blastn:
  max_target_seqs: 3
param_file: /Users/happyuser/pathos/param.txt
ete2_db:
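Before a long run, it can help to sanity-check a few of the constraints described above. The sketch below is a hypothetical checker, not part of pathos; it assumes the file has already been parsed into a dict (for example with PyYAML's yaml.safe_load) and assumes that star.skip: true disables the STAR step.

```python
from typing import Any, Dict, List

VALID_ASSEMBLERS = {"ray2", "abyss"}


def check_config(cfg: Dict[str, Any]) -> List[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems: List[str] = []

    assembler = (cfg.get("assembly") or {}).get("assembler")
    if assembler not in VALID_ASSEMBLERS:
        problems.append(f"assembly.assembler must be ray2 or abyss, got {assembler!r}")

    if not (cfg.get("bowtie2") or {}).get("bowtieDB"):
        problems.append("bowtie2.bowtieDB is not set")

    # Assumption: star.skip: true disables the STAR step, so starDB may be empty.
    star = cfg.get("star") or {}
    if not star.get("skip", False) and not star.get("starDB"):
        problems.append("star.starDB is not set and star.skip is not true")

    threads = cfg.get("threads")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(threads, bool) or not isinstance(threads, int) or threads < 1:
        problems.append(f"threads must be a positive integer, got {threads!r}")

    return problems


example = {
    "assembly": {"assembler": "ray2", "kmer": 25},
    "bowtie2": {"bowtieDB": "/path/to/bowtie-index"},
    "star": {"starDB": None, "skip": True},
    "threads": 16,
}
print(check_config(example))  # []
```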

Single Sample

Usage:
    pipeline.py (--fastq <FASTQS>...) --config <config> [-o <outdir>] [--log <log>] [--control <CONTROLS>...]

Options:
    -f <FASTQS>, --fastq <FASTQS>       
    -r <CONTROLS>, --control <CONTROLS>
    -c <config>, --config <config>
    -o <outdir>, --outdir <outdir>
    --log <log>, -l <log>
    
Notes: 
   For the --fastq argument, provide any number of paired read files, ordered as follows:
   --fastq   <Samp0_R1> <Samp0_R2> <Samp1_R1> <Samp1_R2> . . .
   The --control argument should be provided in the same way.

   --log is the path of the file where log output is written.
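Under the ordering described in the notes, consecutive entries form (R1, R2) pairs. A minimal sketch of that grouping, using a hypothetical helper pair_fastqs (pathos's own argument handling may differ):

```python
from typing import List, Tuple


def pair_fastqs(paths: List[str]) -> List[Tuple[str, str]]:
    """Group a flat --fastq (or --control) list into (R1, R2) pairs."""
    if len(paths) % 2 != 0:
        raise ValueError(f"expected an even number of FASTQ files, got {len(paths)}")
    # Even positions hold R1 files, odd positions hold the matching R2 files.
    return list(zip(paths[0::2], paths[1::2]))


print(pair_fastqs(["Samp0_R1.fastq", "Samp0_R2.fastq", "Samp1_R1.fastq", "Samp1_R2.fastq"]))
```

An odd number of files means a mate is missing, so the sketch fails early instead of mispairing reads.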

Pathos Sheet

Usage:
    pathos_sheet.py <samplesheet> --config <config> --sampledir <sampledir> [--log <log>] [--outdir <outdir>] [-p] [--qsub]

Options:
    -c <config>, --config <config>
    -o <outdir>, --outdir <outdir>
    --log <log>
    --sampledir <sampledir>
    --qsub, -q

The input samplesheet must be a TSV file with the sample names in the first column and the control sample names in the second column. If a cell holds more than one sample, separate the names with semicolons, as in the example below (\t marks a tab).

191\t132;133
192\t132;133
193;194\t144
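A minimal sketch of reading such a sheet into (samples, controls) rows. parse_samplesheet is a hypothetical helper, and treating a missing second column as "no controls" is an assumption, not documented pathos behavior.

```python
import csv
import io
from typing import List, Tuple


def parse_samplesheet(text: str) -> List[Tuple[List[str], List[str]]]:
    """Split each TSV row into its sample names and control names."""
    rows = []
    for record in csv.reader(io.StringIO(text), delimiter="\t"):
        if not record:
            continue  # skip blank lines
        samples = [s for s in record[0].split(";") if s]
        # Assumption: a row without a second column has no controls.
        controls = [c for c in record[1].split(";") if c] if len(record) > 1 else []
        rows.append((samples, controls))
    return rows


sheet = "191\t132;133\n192\t132;133\n193;194\t144\n"
for samples, controls in parse_samplesheet(sheet):
    print(samples, controls)
```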

Summary Creation

Usage:
    summary.py <indir> <outdir>
    
Run after `pathos_single` or `pathos_sheet` to generate summary information for that run.
<indir> is the directory containing the `pathos` output; <outdir> is where the summary is written.
