Commit d5daad7f authored by Raphael Müller's avatar Raphael Müller
Browse files

Merge branch 'feature_css_reads' into 'master'

Feature: PacBio CCS Reads

See merge request !3
parents 2874ec5a dc5bbee2
......@@ -25,71 +25,74 @@ conda activate backmap
## Usage
```
backmap v0.4
backmap v0.5
Description:
Automatic mapping of paired, unpaired, PacBio and Nanopore reads to an
assembly, execution of qualimap bamqc, multiqc and estimation of genome size
from mapped nucleotides and peak coverage.
With the provided conda environment configuration file (environment.yml),
a conda environment with all necessary tools can be created.
Automatic mapping of paired, unpaired, PacBio and Nanopore reads to an
assembly, execution of qualimap bamqc, multiqc and estimation of genome size
from mapped nucleotides and peak coverage.
With the provided conda environment configuration file (environment.yml),
a conda environment with all necessary tools can be created.
Be aware that this pipeline was transferred to a nextflow pipeline with major
changes to version v0.3.
Be aware that this pipeline was transferred to a nextflow pipeline with major
changes from version v0.3 to v.0.4.
Usage:
With mapping:
nextflow run backmap.nf --assembly assembly.fa
[--single-end <unpaired.fq>[,unpaired2.fq,...]]
[--paired-end <paired1.1.fq>,<paired1.2.fq>[,<paired2.1.fq>,<paired2.2.fq>,...]]
[--pacbio <pacbio.fq>[,<pacbio2.fq>,...]]
[--nanopore <nanopore.fq>[,<nanopore2.fq>,...]]
Without mapping:
nextflow run backmap.nf [--assembly assembly.fa]
[--illumina-bam-files illumina.bam[,illumina2.bam,...]]
[--pacbio-bam-files pacbio.bam[,pacbio2.bam,...]]
[--nanopore-bam-files nanopore.bam[,nanopore2.bam,...]]
With mapping:
nextflow run backmap.nf --assembly assembly.fa
[--single-end <unpaired.fq>[,unpaired2.fq,...]]
[--paired-end <paired1.1.fq>,<paired1.2.fq>[,<paired2.1.fq>,<paired2.2.fq>,...]]
[--pacbio <pacbio.fq>[,<pacbio2.fq>,...]]
[--pacbio-ccs <pacbio_ccs.fq>[,<pacbio_ccs2.fq>,...]]
[--nanopore <nanopore.fq>[,<nanopore2.fq>,...]]
Without mapping:
nextflow run backmap.nf [--assembly assembly.fa]
[--illumina-bam-files illumina.bam[,illumina2.bam,...]]
[--pacbio-bam-files pacbio.bam[,pacbio2.bam,...]]
[--pacbio-ccs-bam-files pacbio_ccs.bam[,pacbio_ccs2.bam,...]]
[--nanopore-bam-files nanopore.bam[,nanopore2.bam,...]]
Mandatory options:
--assembly STR Assembly were reads should mapped to in fasta format
AND AT LEAST ONE OF
--single-end STR Fastq files with unpaired Illumina reads, comma separated
--paired-end STR Two fastq files with paired Illumina reads, comma separated
--pacbio STR Fasta or fastq file with PacBio reads, comma separated
--nanopore STR Fasta or fastq file with Nanopore reads, comma separated
OR
--illumina-bam-files STR Premapped Illumina Reads in bam format, comma separated
--pacbio-bam-files STR Premapped Pacbio Reads in bam format, comma separated
--nanopore-bam-files STR Premapped Nanopore Reads in bam format, comma separated
Skips read mapping
Genome size estimation only possible if assembly is given
--assembly STR Assembly were reads should mapped to in fasta format
AND AT LEAST ONE OF
--single-end STR Fastq files with unpaired Illumina reads, comma separated
--paired-end STR Two fastq files with paired Illumina reads, comma separated
--pacbio STR Fasta or fastq file with PacBio reads (CLR), comma separated
--pacbio-ccs STR Fasta or fastq file with PacBio CCS reads, comma separated
--nanopore STR Fasta or fastq file with Nanopore reads, comma separated
OR
--illumina-bam-files STR Premapped Illumina Reads in bam format, comma separated
--pacbio-bam-files STR Premapped Pacbio Reads in bam format, comma separated
--pacbio-ccs-bam-files STR Premapped Pacbio CCS Reads in bam formac, comma separated
--nanopore-bam-files STR Premapped Nanopore Reads in bam format, comma separated
Skips read mapping
Genome size estimation only possible if assembly is given
Optional Options: [default]
--file-separator STR Overrides file separator [,]
--output STR Output directory [./Results/]
Will be created if not existing
--prefix STR Prefix of output files [tbg_backmap]
--keep-temporary Keep temporary, intermediate files [false]
--threads INT Maximum number of threads per process [1]
This is not the total number of available threads!
--skip-quality-control Skip qualimap bamqc run [false]
--skip-coverage-histogram Skip creation of coverage histograms [false]
--skip-genome-size-estimation Skip genome size estimation [false]
genome size estimation is only possible, if assembly is given
--bwa-options STR Options passed to bwa [-a -c 10000]
--minimap-pacbio-options STR Options passed to minimap ["-H"]
--minimap-nanopore-options STR Options passes to nanopore minimap runs [""]
--qualimap-options STR Options passed to qualimap [""]
Pass options with quotes e.g. --bwa-options "<options>"
--help Print this help and exit
--version Print version number and exit
--debug Print inner variables
--file-separator STR Overrides file separator [,]
--output STR Output directory [./Results/]
Will be created if not existing
--prefix STR Prefix of output files [tbg_backmap]
--keep-temporary Keep temporary, intermediate files [false]
--threads INT Maximum number of threads per process [1]
This is not the total number of available threads!
--skip-quality-control Skip qualimap bamqc run [false]
--skip-coverage-histogram Skip creation of coverage histograms [false]
--skip-genome-size-estimation Skip genome size estimation [false]
genome size estimation is only possible, if assembly is given
--bwa-options STR Options passed to bwa [-a -c 10000]
--minimap-pacbio-options STR Options passed to pacbio CLR minimap runs ["-H"]
--minimap-pacbio-ccs-options STR Options passed to pacbio CCS minimap runs [""]
--minimap-nanopore-options STR Options passes to nanopore minimap runs [""]
--qualimap-options STR Options passed to qualimap [""]
Pass options with quotes e.g. --bwa-options "<options>"
--help Print this help and exit
--version Print version number and exit
--debug Print inner variables
```
......@@ -99,6 +102,7 @@ Optional Options: [default]
graph TD;
Assembly[[Assembly]] --> BWAIndex{bwa index};
Assembly --> MinimapP{minimap2};
Assembly --> MinimapPCCS{minimap2};
Assembly --> MinimapN{minimap2};
BWAI --> BWAMemSE{bwa mem};
BWAI --> BWAMemPE{bwa mem};
......@@ -114,10 +118,14 @@ graph TD;
end
subgraph PacBio
PB[[PacBio Reads]] --> MinimapP;
PB[[PacBio CLR Reads]] --> MinimapP;
PBCCS[[PacBio CCS Reads]] --> MinimapPCCS;
MinimapP --> MBam[[Pacbio Mapped Reads]] --> SMP{samtools merge};
MinimapPCCS --> MPCCSBam[[Pacbio CCS Mapped Reads]] --> SMPCCS{samtools merge};
SMP --> PMer[Pacbio Merged Bam Files] --> SSP{samtools sort};
SMPCCS --> PCCSMer[Pacbio CCS Merged Bam Files] --> SSPCCS{samtools sort};
SSP --> PSor[Pacbio Sorted Bam Files];
SSPCCS --> PCCSSor[Pacbio CCS Sorted Bam Files];
end
subgraph Nanopore
......@@ -129,17 +137,20 @@ graph TD;
ISor --> QMI{qualimap} --> MultiQC{{multiqc}}
PSor --> QMI
PCCSSor --> QMI
NSor --> QMI
ISor --> bedI{bedtools genomecov} --> RCovSingleI{{"R Coverage Plot (single)"}}
PSor --> bedI
PCCSSor --> bedI
NSor --> bedI
bedI --> RCovMulti{{"R Coverage Plot (multi)"}}
ISor --> SSTI{{samtools stat}}
PSor --> SSTI
PCCSSor --> SSTI
NSor --> SSTI
bedI --> GSEI{{Genome Size Estimation}}
......@@ -153,6 +164,7 @@ graph TD;
- {+ all options now have a long version, short version is currently not available +}
- {+ added option \`--illumina-bam-files\`, \`--pacbio-bam-files\`, \`--nanopore-bam-files\`, which substitute \`-b\` +}
- {+ added option \`--pacbio-ccs-bam-files\`, \`--pacbio-ccs\` +}
- {+ \`--debug\` prints internal variables for faster development and bug investigations +}
- {+ \`--threads\` sets the maximum number of threads per process (not maximum number of threads of the whole workflow) +}
- {- removed \`-b\`:splitted into three different options -}
......
#!/usr/bin/env groovy
final String VERSION = 'v0.4'
final String VERSION = 'v0.5'
final String NO_FILE = 'NO_FILE'
final int PROCESSMAXTHREADS = 8
final String HELP = """
......@@ -15,7 +15,7 @@ Description:
a conda environment with all necessary tools can be created.
Be aware that this pipeline was transferred to a nextflow pipeline with major
changes to version v0.3.
changes from version v0.3 to v.0.4.
Usage:
With mapping:
......@@ -23,12 +23,14 @@ Usage:
[--single-end <unpaired.fq>[,unpaired2.fq,...]]
[--paired-end <paired1.1.fq>,<paired1.2.fq>[,<paired2.1.fq>,<paired2.2.fq>,...]]
[--pacbio <pacbio.fq>[,<pacbio2.fq>,...]]
[--pacbio-ccs <pacbio_ccs.fq>[,<pacbio_ccs2.fq>,...]]
[--nanopore <nanopore.fq>[,<nanopore2.fq>,...]]
Without mapping:
nextflow run backmap.nf [--assembly assembly.fa]
[--illumina-bam-files illumina.bam[,illumina2.bam,...]]
[--pacbio-bam-files pacbio.bam[,pacbio2.bam,...]]
[--pacbio-ccs-bam-files pacbio_ccs.bam[,pacbio_ccs2.bam,...]]
[--nanopore-bam-files nanopore.bam[,nanopore2.bam,...]]
Mandatory options:
......@@ -36,11 +38,13 @@ Mandatory options:
AND AT LEAST ONE OF
--single-end STR Fastq files with unpaired Illumina reads, comma separated
--paired-end STR Two fastq files with paired Illumina reads, comma separated
--pacbio STR Fasta or fastq file with PacBio reads, comma separated
--pacbio STR Fasta or fastq file with PacBio reads (CLR), comma separated
--pacbio-ccs STR Fasta or fastq file with PacBio CCS reads, comma separated
--nanopore STR Fasta or fastq file with Nanopore reads, comma separated
OR
--illumina-bam-files STR Premapped Illumina Reads in bam format, comma separated
--pacbio-bam-files STR Premapped Pacbio Reads in bam format, comma separated
--pacbio-ccs-bam-files STR Premapped Pacbio CCS Reads in bam formac, comma separated
--nanopore-bam-files STR Premapped Nanopore Reads in bam format, comma separated
Skips read mapping
Genome size estimation only possible if assembly is given
......@@ -60,7 +64,8 @@ Optional Options: [default]
genome size estimation is only possible, if assembly is given
--bwa-options STR Options passed to bwa [-a -c 10000]
--minimap-pacbio-options STR Options passed to minimap ["-H"]
--minimap-pacbio-options STR Options passed to pacbio CLR minimap runs ["-H"]
--minimap-pacbio-ccs-options STR Options passed to pacbio CCS minimap runs [""]
--minimap-nanopore-options STR Options passes to nanopore minimap runs [""]
--qualimap-options STR Options passed to qualimap [""]
Pass options with quotes e.g. --bwa-options "<options>"
......@@ -75,10 +80,12 @@ params['assembly'] = NO_FILE
params['single-end'] = NO_FILE
params['paired-end'] = NO_FILE
params['pacbio'] = NO_FILE
params['pacbio-ccs'] = NO_FILE
params['nanopore'] = NO_FILE
params['illumina-bam-files'] = NO_FILE
params['pacbio-bam-files'] = NO_FILE
params['pacbio-ccs-bam-files'] = NO_FILE
params['nanopore-bam-files'] = NO_FILE
params['file-separator'] = ','
......@@ -98,6 +105,7 @@ params['skip-genome-size-estimation'] = false
params['bwa-options'] = '-a -c 10000'
params['minimap-pacbio-options'] = '-H'
params['minimap-pacbio-ccs-options'] = ''
params['minimap-nanopore-options'] = ''
params['qualimap-options'] = ''
......@@ -105,11 +113,13 @@ assembly = params['assembly']
singleEnd = params['single-end']
pairedEnd = params['paired-end']
pacbio = params['pacbio']
pacbioCCS = params['pacbio-ccs']
nanopore = params['nanopore']
illuminaBamFiles = params['illumina-bam-files']
pacbioBamFiles = params['pacbio-bam-files']
nanoporeBamFiles = params['nanopore-bam-files']
illuminaBamFiles = params['illumina-bam-files']
pacbioBamFiles = params['pacbio-bam-files']
pacbioCCSBamFiles = params['pacbio-ccs-bam-files']
nanoporeBamFiles = params['nanopore-bam-files']
fileSeparator = params['file-separator']
......@@ -120,6 +130,7 @@ threads = params['threads']
bwaOptions = params['bwa-options']
minimapPacbioOptions = params['minimap-pacbio-options']
minimapPacbioCCSOptions = params['minimap-pacbio-ccs-options']
minimapNanoporeOptions = params['minimap-nanopore-options']
qualimapOptions = params['qualimap-options']
......@@ -132,16 +143,19 @@ skipQualityControl = params['skip-quality-control']
skipCoverageHistogram = params['skip-coverage-histogram']
skipGenomeSizeEstimation = params['skip-genome-size-estimation']
boolean assemblyAvailable = assembly != NO_FILE
boolean unpairedAvailable = singleEnd != NO_FILE
boolean pairedAvailable = pairedEnd != NO_FILE
boolean pacbioAvailable = pacbio != NO_FILE
boolean nanoporeAvailable = nanopore != NO_FILE
boolean illuminaBamsAvailable = illuminaBamFiles != NO_FILE
boolean pacbioBamsAvailable = pacbioBamFiles != NO_FILE
boolean nanoporeBamsAvailable = nanoporeBamFiles != NO_FILE
boolean assemblyAvailable = assembly != NO_FILE
boolean unpairedAvailable = singleEnd != NO_FILE
boolean pairedAvailable = pairedEnd != NO_FILE
boolean pacbioAvailable = pacbio != NO_FILE
boolean pacbioCCSAvailable = pacbioCCS != NO_FILE
boolean nanoporeAvailable = nanopore != NO_FILE
boolean bamAvailable = (illuminaBamsAvailable || pacbioBamsAvailable || nanoporeBamsAvailable)
boolean illuminaBamsAvailable = illuminaBamFiles != NO_FILE
boolean pacbioBamsAvailable = pacbioBamFiles != NO_FILE
boolean pacbioCCSBamsAvailable = pacbioCCSBamFiles != NO_FILE
boolean nanoporeBamsAvailable = nanoporeBamFiles != NO_FILE
boolean bamAvailable = (illuminaBamsAvailable || pacbioBamsAvailable || pacbioCCSBamsAvailable || nanoporeBamsAvailable)
boolean qualityControl = !skipQualityControl
boolean coverageHistogram = !skipCoverageHistogram
......@@ -156,11 +170,13 @@ if (printDebug) {
println " Nanopore $nanopore"
println " IlluminaBamFiles $illuminaBamFiles"
println " PacbioBamFiles $pacbioBamFiles"
println " PacbioCCSBamFiles $pacbioCCSBamFiles"
println " NanoporeBamFiles $nanoporeBamFiles"
println ''
println 'Additional Tool Options:'
println " bwaOptions $bwaOptions"
println " minimapPacbioOptions $minimapPacbioOptions"
println " minimapPacbioCCSOptions $minimapPacbioCCSOptions"
println " minimapNanoporeOptions $minimapNanoporeOptions"
println " qualimap options $qualimapOptions"
println ''
......@@ -168,6 +184,7 @@ if (printDebug) {
println " QualityControl $qualityControl"
println " CoverageHistogram $coverageHistogram"
println " GenomeSizeEstimation $genomeSizeEstimation"
println ''
println 'Misc:'
println " Prefix $prefix"
println " Output folder $output"
......@@ -183,6 +200,7 @@ if (printDebug) {
println " unpairedAvailable $unpairedAvailable"
println " pairedAvailable $pairedAvailable"
println " pacbioAvailable $pacbioAvailable"
println " pacbioCCSAvailable $pacbioCCSAvailable"
println " nanoporeAvailable $nanoporeAvailable"
println " bamAvailable $bamAvailable"
println " illuminaBamsAvailable $illuminaBamsAvailable"
......@@ -210,23 +228,27 @@ final Map<String, String> LOGFILES = [path: output, mode: 'copy', pattern: '*.{l
*/
if (bamAvailable) {
chIlluminaBamMerge = (illuminaBamsAvailable) ? Channel.from(illuminaBamFiles)
chIlluminaBamMerge = (illuminaBamsAvailable ) ? Channel.from(illuminaBamFiles )
.map { files -> files.split(fileSeparator) }.flatten().map { f -> file(f) } : Channel.empty()
chPacbioBamMerge = (pacbioBamsAvailable ) ? Channel.from(pacbioBamFiles )
.map { files -> files.split(fileSeparator) }.flatten().map { f -> file(f) } : Channel.empty()
chPacbioBamMerge = (pacbioBamsAvailable) ? Channel.from(pacbioBamFiles )
chPacbioCCSBamMerge = (pacbioCCSBamsAvailable) ? Channel.from(pacbioCCSBamFiles)
.map { files -> files.split(fileSeparator) }.flatten().map { f -> file(f) } : Channel.empty()
chNanoporeBamMerge = (nanoporeBamsAvailable) ? Channel.from(nanoporeBamFiles)
chNanoporeBamMerge = (nanoporeBamsAvailable ) ? Channel.from(nanoporeBamFiles )
.map { files -> files.split(fileSeparator) }.flatten().map { f -> file(f) } : Channel.empty()
} else if (assemblyAvailable) { //fi: bamAvailable
chAssembly = Channel.fromPath(assembly).first()
chBwaIndexGenome = (pairedAvailable || unpairedAvailable) ? Channel.fromPath(assembly) : Channel.empty()
chUnpairedReadMapping = (unpairedAvailable) ? Channel.from(singleEnd)
chUnpairedReadMapping = (unpairedAvailable ) ? Channel.from(singleEnd)
.map { files -> files.split(fileSeparator) }.flatten().map { f -> file(f) }.collate(1) : Channel.empty()
chPairedReadMapping = (pairedAvailable ) ? Channel.from(pairedEnd)
chPairedReadMapping = (pairedAvailable ) ? Channel.from(pairedEnd)
.map { files -> files.split(fileSeparator) }.flatten().map { f -> file(f) }.collate(2) : Channel.empty()
chPacbioReadMapping = (pacbioAvailable ) ? Channel.from(pacbio )
chPacbioReadMapping = (pacbioAvailable ) ? Channel.from(pacbio )
.map { files -> files.split(fileSeparator) }.flatten().map { f -> file(f) }.collate(1) : Channel.empty()
chNanoporeReadMapping = (nanoporeAvailable) ? Channel.from(nanopore )
chPacbioCCSReadMapping = (pacbioCCSAvailable) ? Channel.from(pacbioCCS)
.map { files -> files.split(fileSeparator) }.flatten().map { f -> file(f) }.collate(1) : Channel.empty()
chNanoporeReadMapping = (nanoporeAvailable ) ? Channel.from(nanopore )
.map { files -> files.split(fileSeparator) }.flatten().map { f -> file(f) }.collate(1) : Channel.empty()
// Create index files for bwa mapping
......@@ -297,6 +319,23 @@ if (bamAvailable) {
"""
} // process pacbioMapping
// Minimap2 mapping with PacBio CCS reads
process pacbioCCSMapping {
publishDir COPYMODETMP
cpus { PROCESSMAXTHREADS > threads ? threads : PROCESSMAXTHREADS }
input:
file pb from chPacbioCCSReadMapping
file gen from chAssembly
output:
file "${prefix}.pb_ccs${task.index}.bam" into chPacbioCCSBamMerge
file '*.err' into chPacbioCCSMappingLog
script:
"""
minimap2 $minimapPacbioCCSOptions -x asm20 -a -t ${task.cpus} $gen $pb 2> \
${prefix}_minimap_pb${task.index}.err | samtools view -1 -b - > ${prefix}.pb_ccs${task.index}.bam
"""
} // process pacbioCCSMapping
// Minimap2 mapping with Nanopore reads
process nanoporeMapping {
publishDir COPYMODETMP
......@@ -347,6 +386,20 @@ process mergePacbioBamFiles {
"""
} //process: mergePacbioBamFiles
// Merging of PacBio CCS mapped reads
process mergePacbioCCSBamFiles {
publishDir COPYMODETMP
cpus { PROCESSMAXTHREADS > threads ? threads : PROCESSMAXTHREADS }
input:
file pbs from chPacbioCCSBamMerge.collect()
output:
tuple val('Pacbio CCS'), file("${prefix}.pb_ccs.bam") into chSortPacbioCCSBamFile
script:
"""
samtools merge -@ ${task.cpus} ${prefix}.pb_ccs.bam $pbs
"""
} //process: mergePacbioCCSBamFiles
// Merging of Nanopore mapped reads
process mergeNanoporeBamFiles {
publishDir COPYMODETMP
......@@ -366,7 +419,7 @@ process sortAllBamFiles {
publishDir COPYMODE
cpus { PROCESSMAXTHREADS > threads ? threads : PROCESSMAXTHREADS }
input:
tuple val(tech), file(bam) from chSortIlluminaBamFile.mix(chSortPacbioBamFile).mix(chSortNanoporeBamFile)
tuple val(tech), file(bam) from chSortIlluminaBamFile.mix(chSortPacbioBamFile).mix(chSortPacbioCCSBamFile).mix(chSortNanoporeBamFile)
output:
tuple val("$tech"), file("${bam.baseName}.sort.bam") into chQualimapSortedBams,
chComputeCoverageSortedBams,
......
......@@ -11,6 +11,7 @@ ASSEMBLY_BIG = config["assembly_big"]
WD = config["workdir"]+"/"
include: "snakemake_workflows/bam_prep.smk"
include: "snakemake_workflows/read_simulations.smk"
include: "snakemake_workflows/bams_illumina_pacbio_nanopore.smk"
include: "snakemake_workflows/nanopore.smk"
include: "snakemake_workflows/nanopore_multi.smk"
......@@ -31,6 +32,7 @@ include: "snakemake_workflows/single_paired.smk"
include: "snakemake_workflows/single_paired_nanopore.smk"
include: "snakemake_workflows/single_paired_pacbio.smk"
include: "snakemake_workflows/single_paired_pacbio_nanopore.smk"
include: "snakemake_workflows/pacbio_clr_and_ccs.smk"
rule all:
input:
......@@ -80,6 +82,7 @@ rule all:
rules.bams_illumina_paired_pacbio_nanopore_nextflow.output,
rules.bams_with_assembly_illumina_paired_pacbio_nanopore_nextflow.output,
rules.assembly_big_single_pacbio_nextflow.output,
rules.assembly_pacbio_clr_ccs.output,
rule data_prep:
......@@ -106,36 +109,6 @@ rule install_perl_packages:
done
"""
rule nanosim:
input:
genome = ASSEMBLY,
training = multiext("no_backup/nanosim/", #rules.nanosim_training.output
"training_aligned_reads.pkl",
"training_aligned_region.pkl",
"training_chimeric_info",
"training_error_markov_model",
"training_error_rate.tsv",
"training_first_match.hist",
"training_gap_length.pkl",
"training_ht_length.pkl",
"training_ht_ratio.pkl",
"training_match_markov_model",
"training_model_profile",
"training_reads_alignment_rate",
"training_strandness_rate",
"training_unaligned_length.pkl"
)
output:
"data_prep/nanopore.sim.{number}.fastq"
conda: "envs/nanosim.yml"
shadow: "minimal"
threads: 8
shell:
"""
simulator.py genome --ref_g {input.genome} --output {output} --number {wildcards.number} --fastq --num_threads {threads} --basecaller guppy -c no_backup/nanosim/training;
cat {output}_aligned_reads.fastq {output}_unaligned_reads.fastq > {output}
"""
#rule perl_backmap_with_assembly:
# input:
# infiles = lambda wildcards: [os.path.join(DATA,infile) for infile in config["cases"][wildcards.run]["input_files"]],
......
channels:
- conda-forge
- bioconda
- defaults
dependencies:
- pbsim=1.0.3
......@@ -6,4 +6,4 @@ keep-going: True
cores: 1
restart-times: 2
max-jobs-per-second: 1
shadow-prefix: "/tmp/rmueller/"
shadow-prefix: "/var/scratch/backmap_pipeline_tbg/"
OF = WD+"assembly_pacbio_clr_ccs/nextflow/"
NAME = "clrccs"
rule assembly_pacbio_clr_ccs:
input:
assembly = ASSEMBLY,
clr = "data_prep/pbsim_clr_5.fastq",
ccs = "data_prep/pbsim_ccs_5.fastq",
output:
sort_stats_pb = multiext(OF+NAME+".pb.sort_stats/", "genome_results.txt", "qualimapReport.html"),
pacbio_clr = multiext(OF+NAME+".pb","1.bam",".bam",*multiext(".sort.bam","",*multiext(".cov-hist","",".pdf"), ".stats",".stats.err"),*multiext(".sort_stats_bamqc.","err","log")) + [OF+NAME+"_minimap_pb1.err"],
params:
prefix=NAME,
outdir=OF,
prgm=config["nextflow_script"]
threads: 16
conda: "../envs/nextflow.yml"
benchmark: repeat("benchmarks/nextflow/{}.tsv".format(NAME), config["benchmark_repeats"])
log:
stdout = "{folder}{name}.stdout.log".format(folder=OF, name=NAME),
stderr = "{folder}{name}.stderr.log".format(folder=OF, name=NAME)
shell:
"""
rm -rf {params.outdir};
mkdir -p {params.outdir};
{params.prgm} --keep-temporary --assembly {input.assembly} --output {params.outdir} --prefix {params.prefix} --pacbio {input.clr} --pacbio-ccs {input.ccs} 1> {log.stdout} 2> {log.stderr}
"""
rule nanosim:
input:
genome = ASSEMBLY,
training = multiext("no_backup/nanosim/", #rules.nanosim_training.output
"training_aligned_reads.pkl",
"training_aligned_region.pkl",
"training_chimeric_info",
"training_error_markov_model",
"training_error_rate.tsv",
"training_first_match.hist",
"training_gap_length.pkl",
"training_ht_length.pkl",
"training_ht_ratio.pkl",
"training_match_markov_model",
"training_model_profile",
"training_reads_alignment_rate",
"training_strandness_rate",
"training_unaligned_length.pkl"
)
output:
"data_prep/nanopore.sim.{number}.fastq"
conda: "../envs/nanosim.yml"
shadow: "minimal"
threads: 8
shell:
"""
simulator.py genome --ref_g {input.genome} --output {output} --number {wildcards.number} --fastq --num_threads {threads} --basecaller guppy -c no_backup/nanosim/training;
cat {output}_aligned_reads.fastq {output}_unaligned_reads.fastq > {output}
"""
rule pbsim:
input:
genome = ASSEMBLY,
model = "no_backup/pbsim/model_qc_{datatype}",
output:
"data_prep/pbsim_{datatype}_{depth}.fastq"
conda: "../envs/pbsim.yml"
params:
datatype = lambda wildcards: wildcards.datatype.upper(),
prefix = lambda wildcards: "pbsim_{}_{}".format(wildcards.datatype, wildcards.depth)
shadow: "minimal"
shell:
"""
awk '/^>/{{a++}} a>10{{exit}} 1' {input.genome} > genome.fasta
pbsim --data-type {params.datatype} --depth {wildcards.depth} --prefix {params.prefix} --model_qc {input.model} genome.fasta;
cat {params.prefix}*.fastq > {output}
"""
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment