Concepts
========

PSOT is a system that runs bioinformatics tools on a file of protein
sequences and converts their results into easy-to-process JSON documents. It
offers a live mode that writes the results of tools that have already finished
into a directory. This directory can be polled and processed further, e.g. by a
website that displays results as soon as they become ready.
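
A consumer of the live directory, such as the website mentioned above, could
poll it roughly as sketched below. This is a minimal illustration, not part of
PSOT: it assumes only that finished results appear as ``*.json`` files
somewhere below the live directory, and the helper name
``poll_live_directory`` is hypothetical.

.. code-block:: python

   import json
   from pathlib import Path


   def poll_live_directory(live_dir, seen):
       """Return the JSON documents that appeared since the previous call.

       Hypothetical helper: assumes finished results are *.json files anywhere
       below live_dir. ``seen`` is a set of already processed paths and is
       updated in place.
       """
       new_docs = {}
       for path in sorted(Path(live_dir).rglob("*.json")):
           if path in seen:
               continue  # already handled in an earlier poll
           seen.add(path)
           with path.open() as handle:
               new_docs[str(path)] = json.load(handle)
       return new_docs

Calling it periodically with the same ``seen`` set returns only the documents
that are new since the previous call.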

Vocabulary
----------

Module
    A module integrates a bioinformatics tool and the corresponding JSON
    converter. It is defined in a module manifest.

Profile
    A profile is a set of modules that are executed together in a single PSOT
    run. Profiles can override the default parameters of modules.

Repository
    A collection of profiles, modules, scripts and configurations.
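
The relationship between these concepts, in particular how a profile
overrides module defaults, can be pictured with a small data model. The
classes ``Module``, ``Profile`` and ``Repository`` and their fields are
hypothetical illustrations, not PSOT's actual manifest format or internal API.

.. code-block:: python

   from dataclasses import dataclass, field


   @dataclass
   class Module:
       """A bioinformatics tool plus its JSON converter (hypothetical model)."""
       name: str
       default_parameters: dict = field(default_factory=dict)


   @dataclass
   class Profile:
       """A set of modules run together, with optional parameter overrides."""
       name: str
       modules: list
       # Maps module name -> parameters that replace that module's defaults.
       overrides: dict = field(default_factory=dict)

       def parameters_for(self, module):
           """Return the module's defaults with this profile's overrides applied."""
           params = dict(module.default_parameters)  # copy, keep defaults intact
           params.update(self.overrides.get(module.name, {}))
           return params


   @dataclass
   class Repository:
       """A collection of profiles, modules, scripts and configuration."""
       modules: dict = field(default_factory=dict)
       profiles: dict = field(default_factory=dict)
       scripts: dict = field(default_factory=dict)
       config: dict = field(default_factory=dict)

In this model, a profile that overrides only one parameter of a module keeps
the module's remaining defaults.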

Workflow
--------

1. Load all module manifests and profiles from all available repositories
2. Create an execution directory
3. Generate a Nextflow script for the chosen profile in the execution directory
4. Run the Nextflow script
5. Remove the execution directory
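
The workflow can be sketched in Python as follows. This is an illustration
under stated assumptions, not PSOT's implementation: ``generate_script`` is a
hypothetical callback that stands in for loading the repositories (step 1) and
producing the script text (step 3), and the script name ``main.nf`` is made
up. Only the ``nextflow run <script>`` invocation is the regular Nextflow
command line.

.. code-block:: python

   import shutil
   import subprocess
   import tempfile
   from pathlib import Path


   def run_profile(profile_name, generate_script, runner=subprocess.run):
       """Execute one profile following the five workflow steps (sketch)."""
       # Steps 1 and 3 (hypothetical callback): load repositories, build script text.
       script_text = generate_script(profile_name)
       # Step 2: create a fresh execution directory.
       exec_dir = Path(tempfile.mkdtemp(prefix="psot-"))
       try:
           # Step 3: write the script into it ("main.nf" is a made-up name).
           script = exec_dir / "main.nf"
           script.write_text(script_text)
           # Step 4: run it with the Nextflow CLI, using exec_dir as working dir.
           runner(["nextflow", "run", str(script)], cwd=exec_dir, check=True)
       finally:
           # Step 5: remove the execution directory, even if the run failed.
           shutil.rmtree(exec_dir)

The ``runner`` parameter defaults to ``subprocess.run`` but can be replaced,
e.g. by a stub in tests. Removing the directory in a ``finally`` clause is a
choice of this sketch, so that failed runs do not leave directories behind.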

Structure of the Nextflow Script
--------------------------------

1. Run all analyses in parallel
2. Convert the results of all analyses to JSON in parallel
3. In live mode: generate a JSON document for each module and each sequence in the live directory
4. In retrieve mode: retrieve all information from the referenced databases
5. Join all JSON files into a single file containing all information
6. Split the combined JSON file into separate files, one per sequence
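
Steps 5 and 6 amount to regrouping the results by sequence. The sketch below
assumes, hypothetically, that each module's converted output maps sequence
identifiers to JSON-compatible values; the function names and the
``<sequence id>.json`` file naming are illustrative only.

.. code-block:: python

   import json
   from pathlib import Path


   def join_results(per_module):
       """Step 5: merge {module: {sequence_id: result}} into one document."""
       joined = {}
       for module_name, results in per_module.items():
           for seq_id, result in results.items():
               # Group by sequence first, then by module.
               joined.setdefault(seq_id, {})[module_name] = result
       return joined


   def split_results(joined, out_dir):
       """Step 6: write one JSON file per sequence into out_dir."""
       out_dir = Path(out_dir)
       out_dir.mkdir(parents=True, exist_ok=True)
       for seq_id, doc in joined.items():
           with (out_dir / f"{seq_id}.json").open("w") as handle:
               json.dump(doc, handle, indent=2)

Sequence identifiers containing characters that are not valid in file names
(such as ``/``) would need to be escaped before being used this way.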


Loading of configuration artifacts
----------------------------------

Profiles, modules and configurations are organized in repositories. A
repository can contain the following elements:

* ``config.yaml`` (file)
* ``modules/`` (directory with YAML files)
* ``profiles/`` (directory with YAML files)
* ``scripts/`` (directory with scripts for modules)
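
Since each element of a repository is optional, a loader has to check which of
them are present. A minimal sketch with a hypothetical ``scan_repository``
helper, assuming that YAML files use the ``.yaml`` extension:

.. code-block:: python

   from pathlib import Path


   def scan_repository(repo_dir):
       """Report which optional elements a repository directory provides.

       Hypothetical helper: a missing config is reported as None and missing
       directories as empty lists; YAML files are assumed to end in ".yaml".
       """
       repo = Path(repo_dir)

       def files_in(subdir, pattern):
           folder = repo / subdir
           return sorted(folder.glob(pattern)) if folder.is_dir() else []

       config = repo / "config.yaml"
       return {
           "config": config if config.is_file() else None,
           "modules": files_in("modules", "*.yaml"),
           "profiles": files_in("profiles", "*.yaml"),
           "scripts": files_in("scripts", "*"),
       }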

PSOT uses a repository search path to find both bundled and user-defined
repositories. The search path can be extended by setting the environment
variable ``PSOT_REPOSITORIES`` to a colon-separated (``:``) list of paths.
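
Building the search path amounts to splitting the variable at ``:``. In this
sketch, the function name, the way the default repository is passed in and
the skipping of empty entries are assumptions; only the variable name, the
separator and the default repository coming first are taken from this page.

.. code-block:: python

   import os


   def repository_search_path(default_repository, environ=None):
       """Return the default repository followed by PSOT_REPOSITORIES entries.

       Skipping empty entries (e.g. from a trailing ':') is an assumption.
       """
       if environ is None:
           environ = os.environ
       extra = environ.get("PSOT_REPOSITORIES", "")
       return [default_repository] + [p for p in extra.split(":") if p]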

The repositories are loaded in the following order; values from later
repositories overwrite values from earlier ones.

1. The default repository
2. The repositories listed in ``PSOT_REPOSITORIES``
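
The override rule can be illustrated by merging the configurations in load
order, so that later repositories win. The sketch below merges nested mappings
recursively; whether PSOT merges nested values or replaces whole top-level
entries is not specified on this page, so treat the deep merge as an
assumption. ``merge_repositories`` is a hypothetical name.

.. code-block:: python

   import copy


   def merge_repositories(configs):
       """Merge configuration mappings given in load order; later ones win.

       The recursive (deep) merge of nested mappings is an assumption.
       """
       merged = {}
       for config in configs:
           _merge_into(merged, config)
       return merged


   def _merge_into(target, source):
       for key, value in source.items():
           if isinstance(value, dict) and isinstance(target.get(key), dict):
               _merge_into(target[key], value)  # merge nested mappings
           else:
               # Copy so later merges never mutate the caller's input.
               target[key] = copy.deepcopy(value)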