Commit 25178981 authored by Raphael Müller

rewrite README and add pdfs

parent 1c51bdad
# (new) Nextflow pipeline
# Backmap Nextflow pipeline
## Install
```
conda env create -f environment.yml
conda activate backmap
```
......@@ -58,7 +57,7 @@ Optional Options: [default]
Will be created if not existing
--prefix STR Prefix of output files
--keep-temporary Keep temporary, intermediate files [false]
--threads INT Maximum number of threads per process [1]
--threads INT Maximum number of threads per process [1]
This is not the total number of available threads!
--skip-quality-control Skip qualimap bamqc run [false]
......@@ -79,55 +78,79 @@ Optional Options: [default]
```
## Test
## Changes to the original perl pipeline
### Install dev/test environments
### Parameters
#### Development
- {+ all options now have a long version; short versions are currently not available +}
- {+ added options `--illumina-bam-files`, `--pacbio-bam-files`, and `--nanopore-bam-files`, which replace `-b` +}
- {+ `--debug` prints internal variables for faster development and bug investigations +}
- {+ `--threads` sets the maximum number of threads per process (not maximum number of threads of the whole workflow) +}
- {- removed `-b`: split into three different options -}
- {- removed `-sort`: every bam is now sorted -}
- {- removed `-v`: this is a nextflow feature -}
- {- removed `-dry-run` -}
- {- removed `-t`: handled by nextflow -}
### Multiple files of one kind
Multiple files of one kind, e.g., three different PacBio runs, are now given as one string with each run comma separated instead of giving one parameter multiple times.
```
conda env create -f environment-dev.yml
conda activate backmap-dev
$ nextflow run backmap.nf --assembly assembly.fasta --pacbio pacbio1.fq,pacbio2.fq,pacbio3.fq
```
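The comma-separated form described above can be assembled in the shell when the runs are available as a list of files. This is a minimal sketch, not part of the pipeline: the file names are hypothetical placeholders, and only `--assembly` and `--pacbio` are taken from the example call above. The command is printed rather than executed.

```shell
# Hypothetical read files; replace with your own runs.
set -- pacbio1.fq pacbio2.fq pacbio3.fq

# Join all file names with commas, then strip the trailing comma.
pacbio_list=$(printf '%s,' "$@")
pacbio_list=${pacbio_list%,}

# Print the resulting call instead of running it.
echo "nextflow run backmap.nf --assembly assembly.fasta --pacbio $pacbio_list"
```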
#### Snakemake
### Genome size estimation with bam files
Genome size can now be estimated from bam files if an assembly is given.
```
conda env create -f environment-test.yml
conda activate backmap-test
snakemake --profile sm_profile/ --conda-create-envs-only -F
snakemake --profile sm_profile/ -f install_perl_packages
snakemake --profile sm_profile/ -F
$ nextflow run backmap.nf --assembly assembly.fasta --pacbio-bam-files pacbio1.bam,pacbio2.bam,pacbio3.bam
```
## What's new? What did change?
### Improved R scripts
### new options
R scripts have been improved and rewritten.
1. all options now have a long version, short version is currently not available
1. `--illumina-bam-files`, `--pacbio-bam-files`, `--nanopore-bam-files` instead of `-b`
1. `--debug` prints internal variables for faster development and bug investigations
1. `--threads` sets the maximum number of threads per process (not maximum number of threads of the whole workflow)
![R All plot from the original perl pipeline][oldRall]
![R All plot from the new nextflow pipeline][newRall]
### options not available anymore
### Resolved bugs of the perl pipeline
1. `-b` -> split into three different options
1. `-sort` -> every bam is now sorted
1. `-v` -> removed, because this is a nextflow feature
1. `-dry-run` -> removed
1. `-t` -> removed, because this is handled by nextflow
Running the original pipeline with different parameter combinations exposed several bugs:
| Issue | Solved? | PR |
| --------------------------------------------------------------- | ------- | ------------------------------------------|
| keep temporary option did not work properly | yes | https://github.com/schellt/backmap/pull/1 |
| plot all did not work if Illumina reads were not present | yes | https://github.com/schellt/backmap/pull/5 |
| pipeline crashed on rerun due to not enforcing symlink creation | yes | https://github.com/schellt/backmap/pull/4 |
### multifile call
## Test
Multi files of one kind, e.g., three different pacbio runs, are now given as one string with each run comma separated instead of giving one parameter multiple times.
### Install dev/test environments
### Genome size estimation with bam files
#### Development
Genome size estimation with bam files can now be done if assembly is given
The development environment includes everything needed to run the original perl pipeline, the newly created nextflow pipeline, and the snakemake test pipeline.
### Improved R scripts
```
#
```
```
conda env create -f environment-dev.yml
conda activate backmap-dev
```
#### Test
```
conda env create -f environment-test.yml
conda activate backmap-test
snakemake --profile sm_profile/ --conda-create-envs-only -F
snakemake --profile sm_profile/ -f install_perl_packages
snakemake --profile sm_profile/ -F
```
R scripts have been improved and rewritten
# original perl script
......@@ -153,7 +176,7 @@ Quinlan AR, Hall IM (2010). BEDTools: a flexible suite of utilities for comparin
- Rscript:
R Core Team (2019). R: A Language and Environment for Statistical Computing. <http://www.R-project.org/>
# testdata
# Internal notes
There are no Nanopore reads.
......@@ -166,3 +189,6 @@ There are no Nanopore reads.
| PacBio reads | pacbio_gal.target-and-other.blobfilter.fastq.gz |
| Illumina mapping | dgal_ra_pb-target-and-other_ill-confilter.blobfilter.rmmt_sspace-lr3_lrgc3_pg3_pilon_3.fasta.sort.bam |
| PacBio mapping | dgal_ra_pb-target-and-other_ill-confilter.blobfilter.rmmt_sspace-lr3_lrgc3_pg3_pilon_3.fasta.pb.sort.bam |
[oldRall]: img/RallOld.pdf
[newRall]: img/RallNew.pdf
File added
File added