Welcome to pathos’s documentation!
Overview

Installation
With Docker:
$ docker build -t pathos .
$ docker run <...>
Without Docker:
curl -sSL https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -o /tmp/miniconda.sh && \
conda install -y python=3.6 && \
git clone https://github.com/averagehat/pathos && \
cd install && bash assume-conda-install.sh /usr/local/bin/
Configuration
The configuration file is specified with the -c option when running pathos_sheet or pathos_single. An example file is shown after the option notes below. Notes on particular options:
assembler: should be ray2 or abyss
The assembly options are the command-line options passed to the chosen assembler.
param_file is the pathdiscov param.txt file used for the ray2/CAP assembly.
The bowtieDB and starDB filepaths refer to host index databases (they must have already been indexed by bowtie-build and STAR respectively).
The minCompressionScore of lzwfilter is the minimum “read complexity” a read must have to be kept for further processing. The compression score of a read is the length of the compressed read divided by the length of the original read.
The max_target_seqs flag of blastn tells BLAST to stop searching after finding that many sequence matches. A high number results in long BLAST runtimes.
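As a rough illustration of the compression score described for lzwfilter, the sketch below runs a textbook LZW pass over a read and divides the number of emitted codes by the read length. The function names are hypothetical, and counting LZW codes as the "compressed length" is an assumption; the filter pathos actually uses may measure it differently.

```python
def lzw_code_count(seq):
    """Return how many codes a textbook LZW pass emits for seq."""
    if not seq:
        return 0
    # Start with one dictionary entry per distinct symbol in the read.
    table = {ch: i for i, ch in enumerate(sorted(set(seq)))}
    current = ""
    codes = 0
    for ch in seq:
        candidate = current + ch
        if candidate in table:
            # Keep extending the phrase while it is already known.
            current = candidate
        else:
            # Emit a code for the known phrase and learn the longer one.
            codes += 1
            table[candidate] = len(table)
            current = ch
    # One more code flushes the final phrase.
    return codes + 1


def compression_score(seq):
    """Compressed length divided by original length (assumed definition)."""
    if not seq:
        return 0.0
    return lzw_code_count(seq) / len(seq)


def keep_read(seq, min_compression_score):
    """Assumed rule: keep reads whose score is at least the minimum."""
    return compression_score(seq) >= min_compression_score
```

Under this sketch, low-complexity reads such as homopolymer runs compress well and so score low; raising minCompressionScore would discard more of them, while a value of 0 keeps every read.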
pricefilter:
  highQualPercent: 85
  calledPercent: 90
  highQualMin: 0.98
bowtie2:
  bowtieDB: /Users/happyuser/pathos/databses/test/human-bowtie/bowtie-index
star:
  starDB: /Users/happyuser/pathos/databses/test/star-index-dir/
  skip: true
  threads: 16
assembly:
  assembler: ray2
  options: " "
  kmer: 25
  minimum_contig_length: 200
cdhitdup:
  minDifference: 5
ncbi:
  ntDB: /Users/happyuser/pathos/databses/test/nt/nt
  nrDB: /Users/happyuser/pathos/databses/test/nr/nr
  ktTaxonomy: /Users/happyuser/pathos/databses/test/krona
lzwfilter:
  minCompressionScore: 0
blastn:
  max_target_seqs: 3
param_file: /Users/happyuser/pathos/param.txt
ete2_db:
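The option notes above can be turned into a small sanity check before a run. The sketch below assumes the configuration has already been parsed into nested dictionaries (for example by a YAML loader), with each tool's options nested under its section name. Both check_config and that layout are assumptions made for illustration; they are not part of pathos.

```python
VALID_ASSEMBLERS = ("ray2", "abyss")


def check_config(cfg):
    """Return a list of problems found in an already-parsed config dict."""
    problems = []

    # Sections may be missing or empty (parsed as None), so fall back to {}.
    assembly = cfg.get("assembly") or {}
    blastn = cfg.get("blastn") or {}
    lzwfilter = cfg.get("lzwfilter") or {}

    assembler = assembly.get("assembler")
    if assembler not in VALID_ASSEMBLERS:
        problems.append(
            "assembly.assembler should be ray2 or abyss, got %r" % (assembler,)
        )

    max_hits = blastn.get("max_target_seqs")
    if not isinstance(max_hits, int) or max_hits < 1:
        problems.append(
            "blastn.max_target_seqs should be a positive integer, got %r"
            % (max_hits,)
        )

    score = lzwfilter.get("minCompressionScore")
    if not isinstance(score, (int, float)):
        problems.append(
            "lzwfilter.minCompressionScore should be a number, got %r" % (score,)
        )

    return problems
```

An empty list means none of the checked options looks wrong; the function does not verify that the database paths exist or that they have been indexed.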
Single Sample
Usage:
pipeline.py (--fastq <FASTQS>...) --config <config> [-o <outdir>] [--log <log>] [--control <CONTROLS>...]
Options:
-f <FASTQS>, --fastq <FASTQS>
-r <CONTROLS>, --control <CONTROLS>
-c <config>, --config <config>
-o <outdir>, --outdir <outdir>
--log <log>, -l <log>
Notes:
For the --fastq argument, provide any number of paired read files ordered like:
--fastq <Samp0_R1> <Samp0_R2> <Samp1_R1> <Samp1_R2> . . .
The --control argument should be provided in the same way.
--log is the file path to store log output in.
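The ordering rule for --fastq and --control can be pictured as splitting a flat list into (R1, R2) pairs. The helper below is a hypothetical sketch of that idea, not the code pipeline.py uses; rejecting an odd number of files is an assumption.

```python
def pair_reads(fastqs):
    """Group a flat list [S0_R1, S0_R2, S1_R1, S1_R2, ...] into (R1, R2) pairs."""
    if len(fastqs) % 2 != 0:
        raise ValueError(
            "expected an even number of paired read files, got %d" % len(fastqs)
        )
    # Even positions are R1 files, odd positions are the matching R2 files.
    return list(zip(fastqs[0::2], fastqs[1::2]))
```

Passing the files in any other order would silently pair the wrong mates, so it can be worth checking the pairs before starting a long run.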
Pathos Sheet
Usage:
pathos_sheet.py <samplesheet> --config <config> --sampledir <sampledir> [--log <log>] [--outdir <outdir>] [-p] [--qsub]
Options:
-c <config>, --config <config>
--o <outdir>, --outdir <outdir>
--log <log>
--sampledir <sampledir>
--qsub, -q
The input sample sheet must be a TSV file with the input sample names in the first column and the control samples in the right-hand column. If there are multiple control samples, separate them with semicolons. For example (\t stands for a tab character):
191\t132;133
192\t132;133
193;194\t144
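To make the layout concrete, here is a minimal sketch that reads a sheet in this form into lists of sample and control names. parse_samplesheet is a hypothetical helper, not part of pathos. It assumes real tab characters between columns, and it treats a missing control column as "no controls", which the text above does not specify.

```python
import csv
import io


def parse_samplesheet(text):
    """Parse TSV text into a list of (samples, controls) tuples of name lists."""
    rows = []
    for fields in csv.reader(io.StringIO(text), delimiter="\t"):
        # Skip blank lines and rows without a sample column.
        if not fields or not fields[0].strip():
            continue
        samples = [s.strip() for s in fields[0].split(";") if s.strip()]
        controls = []
        if len(fields) > 1:
            controls = [c.strip() for c in fields[1].split(";") if c.strip()]
        rows.append((samples, controls))
    return rows
```

Applied to the three rows above, this yields samples 191 and 192 each with controls 132 and 133, and samples 193 and 194 sharing control 144.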
Summary Creation
Usage:
summary.py <indir> <outdir>
Run after `pathos_single` or `pathos_sheet` to get summary information for that run.
<indir> is the directory with `pathos` output.