Commit c1e8b234 authored by Raphael Müller

added more information for internal usage

parent 3f815dc4
......@@ -2,6 +2,21 @@
## Install
### Dependencies
- nextflow
- samtools
- bedtools
- bwa
- minimap2
- qualimap
- multiqc
- r-base
- r-ggplot2
- r-dplyr
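However the dependencies listed above are installed, a quick sanity check is to confirm that each command-line tool is on `PATH`. The following is a minimal sketch. It assumes the executable names `nextflow`, `samtools`, `bedtools`, `bwa`, `minimap2`, `qualimap`, `multiqc` and `Rscript` (provided by r-base). The function `check_tools` is hypothetical and not part of backmap.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of backmap): report every command from the
# argument list that cannot be found on PATH.
# Returns 0 if all commands are present, 1 if at least one is missing.
check_tools() {
    local missing=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, run `check_tools nextflow samtools bedtools bwa minimap2 qualimap multiqc Rscript` inside the activated environment. The two R packages can be checked with `Rscript -e 'library(ggplot2); library(dplyr)'`, which exits with a non-zero status if either package fails to load.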
### Installation with conda
```
conda env create -f environment.yml
conda activate backmap
```
......@@ -111,7 +126,12 @@ $ nextflow run backmap.nf --assembly assembly.fasta --pacbio-bam-files-- pacbio1
R scripts have been improved and rewritten.
#### R All plot perl
![R All plot from the original perl pipeline][oldRall]
#### R All plot nextflow
![R All plot from the new nextflow pipeline][newRall]
### Resolved bugs of the perl pipeline
......@@ -123,7 +143,7 @@ While running the original pipeline with different kinds of parameters, some bug
| plot all did not work if Illumina reads were not present | yes | https://github.com/schellt/backmap/pull/5 |
| pipeline crashed on rerun due to not enforcing symlink creation | yes | https://github.com/schellt/backmap/pull/4 |
## Development/Test
### Install dev/test environments
......@@ -131,28 +151,89 @@ While running the original pipeline with different kinds of parameters, some bug
The development environment includes everything needed to run the original perl pipeline, the new nextflow pipeline, and the snakemake test pipeline:
```yaml
---
name: backmap-development
channels:
- conda-forge
- bioconda
- defaults
dependencies:
- snakemake
- mamba
- nextflow
- perl
- perl-app-cpanminus
- samtools
- bedtools
- bwa
- minimap2
- qualimap
- multiqc
- r-base
- r-ggplot2
- r-dplyr
```
Perl packages needed for the original workflow:
- Cwd
- IPC::Cmd
- Number::FormatEng
- Parallel::Loops
```
# create and activate the conda environment
conda env create -f environment-dev.yml
conda activate backmap-development
# install the perl packages, then check that each one can be loaded
env PERL5LIB="" PERL_LOCAL_LIB_ROOT="" PERL_MM_OPT="" PERL_MB_OPT="" cpanm Cwd IPC::Cmd Number::FormatEng Parallel::Loops
for i in Cwd IPC::Cmd Number::FormatEng Parallel::Loops; do
    perl -M"$i" -e 1 && echo "Module $i exists"
done
```
#### Testing with Snakemake pipeline
##### Install
```yaml
name: backmap-test
channels:
- conda-forge
- bioconda
- defaults
dependencies:
- snakemake
- mamba
```
##### Prepare Test environments
```
# create and activate conda environment
conda env create -f environment-test.yml
conda activate backmap-test
# create environments for all jobs
snakemake --profile sm_profile/ --conda-create-envs-only -F
# install perl packages
snakemake --profile sm_profile/ -f install_perl_packages
```
##### Run all tests
```
snakemake --profile sm_profile/ -F
```
#### Linter
TODO
----------------------------------------
# Original perl script
https://github.com/schellt/backmap
......@@ -176,13 +257,46 @@ Quinlan AR, Hall IM (2010). BEDTools: a flexible suite of utilities for comparin
- Rscript:
R Core Team (2019). R: A Language and Environment for Statistical Computing. <http://www.R-project.org/>
--------------------------------------------------------------------------------------
# Internal notes
## Directory
`/vol/cb/projects/112020_tbg_backmap`
## Directory structure
## Test files
The test dataset was provided to us via JLUBox and is now stored at
`/vol/cb/projects/112020_tbg_backmap/dataset1/Test_datasets.zip`
```
-r--r--r-- 1 rmueller cb 16G Dec 1 15:13 Test_datasets.zip
```
### Content
```
Archive: Test_datasets.zip
Length Date Time Name
--------- ---------- ----- ----
2209018648 2020-08-26 14:16 dgal_1.paired.confilter.fq.gz
2375279937 2020-08-26 14:14 dgal_2.paired.confilter.fq.gz
38668802 2020-08-26 14:14 dgal_ra_pb-target-and-other_ill-confilter.blobfilter.rmmt_sspace-lr3_lrgc3_pg3_pilon_3.fasta.gz
1659448324 2020-08-26 14:10 pacbio_gal.target-and-other.blobfilter.fastq.gz
473770619 2020-08-26 14:11 dgal.confilter.fq.gz
2760385779 2020-08-26 14:09 dgal_ra_pb-target-and-other_ill-confilter.blobfilter.rmmt_sspace-lr3_lrgc3_pg3_pilon_3.fasta.pb.sort.bam
6667551038 2020-08-26 14:13 dgal_ra_pb-target-and-other_ill-confilter.blobfilter.rmmt_sspace-lr3_lrgc3_pg3_pilon_3.fasta.sort.bam
--------- -------
16184123147 7 files
```
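The archive is 16 GB and was transferred via JLUBox, so it may be worth checking the compressed files for corruption after unpacking. The following is a minimal sketch that runs `gzip -t` on every `*.gz` file in a directory. The function name `check_gz_integrity` and the extraction directory `Test_datasets` are assumptions, not part of the pipeline. The sketch only covers the gzipped reads and assembly. The two BAM files can be checked with `samtools quickcheck`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of backmap): test every *.gz file in the
# given directory (default: current directory) with `gzip -t` and list the
# files that fail. Returns 1 if any file is corrupt, 0 otherwise.
check_gz_integrity() {
    local dir="${1:-.}" failed=0 f
    for f in "$dir"/*.gz; do
        # If nothing matches, the glob stays literal; skip it.
        [ -e "$f" ] || continue
        if ! gzip -t "$f" 2>/dev/null; then
            echo "corrupt: $f" >&2
            failed=1
        fi
    done
    return "$failed"
}
```

For example: `unzip Test_datasets.zip -d Test_datasets && check_gz_integrity Test_datasets`, followed by `samtools quickcheck Test_datasets/*.bam` for the alignments.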
There are no Nanopore reads in this dataset; the snakemake pipeline simulates them.
| Purpose | File |
| ---- | ---- |
| Genome assembly | dgal_ra_pb-target-and-other_ill-confilter.blobfilter.rmmt_sspace-lr3_lrgc3_pg3_pilon_3.fasta.gz |
| Illumina forward reads | dgal_1.paired.confilter.fq.gz |
| Illumina reverse reads | dgal_2.paired.confilter.fq.gz |
| Illumina unpaired reads | dgal.confilter.fq.gz |
......