CHAP Pipeline
To run a CHESS Analysis Pipeline (CHAP), you will need:
A CHAP configuration file in YAML format
A CHAP command line interface (CLI) executable
Run a CHAP pipeline by executing:
$ CHAP pipeline.yaml
How to run CHAP on the CHESS Linux system with centrally maintained workflow executables is discussed below.
Constructing a CHAP configuration file
CHAP configuration files must be in YAML format. Each file contains a single YAML document whose top level is a single mapping with at least two keys: one key must be config, and every other key is a pipeline name.
Example of a complete CHAP pipeline configuration file:
config:
  root: .
pipeline:
  - common.YAMLReader:
      filename: data.yaml
  - common.PrintProcessor
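The structural rules above can be checked mechanically once the file has been parsed. The following is a minimal sketch, not part of CHAP: `check_chap_config` is a hypothetical helper, and it assumes the YAML has already been loaded into a Python object (for example with PyYAML's `yaml.safe_load`). It follows the stricter reading in which the config key is required; relax that check if you rely on config being optional.

```python
from typing import Any


def check_chap_config(doc: Any) -> list[str]:
    """Return the pipeline names found in a parsed CHAP configuration.

    `doc` is the object obtained by parsing the YAML file; parsing itself is
    left out so that the sketch only needs the standard library.
    """
    if not isinstance(doc, dict):
        raise ValueError("top level must be a single mapping")
    if "config" not in doc:
        raise ValueError("missing the 'config' key")
    # Every key other than 'config' is treated as a pipeline name.
    pipelines = [key for key in doc if key != "config"]
    if not pipelines:
        raise ValueError("at least one pipeline key is required")
    for name in pipelines:
        if not isinstance(doc[name], list):
            raise ValueError(f"pipeline {name!r} must be a list of items")
    return pipelines


# The example configuration above, written as the equivalent Python object.
example = {
    "config": {"root": "."},
    "pipeline": [
        {"common.YAMLReader": {"filename": "data.yaml"}},
        "common.PrintProcessor",
    ],
}
print(check_chap_config(example))  # ['pipeline']
```

Note that an item with parameters parses to a one-key mapping, while an item without parameters (such as common.PrintProcessor) parses to a plain string.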
The config section
The config section contains the values of the instance variables for an instance of CHAP.models.RunConfig. It is technically optional, but it should be included in every pipeline file for reproducibility / provenance. It can also be helpful for applying the same pipeline to many datasets, depending on how your dataset files are organized. The keys you can use in this section are:
| Key | Description | Default value |
|---|---|---|
| root | Path to the working directory | Directory from which [...] |
| inputdir | Path to a directory where all [...] | Same value as root |
| outputdir | Path to a directory where all [...] | Same value as root |
| interactive | Flag to allow certain optional data / parameter checks that [...] | false |
| log_level | Name of a python logging level (not case sensitive) | info |
Example config section containing all default values:
config:
  root: .
  inputdir: .
  outputdir: .
  interactive: false
  log_level: info
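To make the defaults concrete, here is a rough Python sketch of how such a configuration object might fill in missing values. `RunConfigSketch` is a hypothetical stand-in, not the real CHAP.models.RunConfig; in particular, the fallback of inputdir and outputdir to root and the validation of log_level through the standard `logging` module are assumptions made for illustration.

```python
import logging
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunConfigSketch:
    """Hypothetical stand-in for CHAP.models.RunConfig (not the real class)."""

    root: str = "."
    inputdir: Optional[str] = None   # assumed to fall back to root
    outputdir: Optional[str] = None  # assumed to fall back to root
    interactive: bool = False
    log_level: str = "info"

    def __post_init__(self) -> None:
        if self.inputdir is None:
            self.inputdir = self.root
        if self.outputdir is None:
            self.outputdir = self.root
        # Level names are not case sensitive, so normalise before the lookup.
        level = logging.getLevelName(self.log_level.upper())
        if not isinstance(level, int):
            raise ValueError(f"unknown logging level: {self.log_level!r}")
        self.log_level = self.log_level.lower()


# With no arguments, every field takes the default shown in the example above.
print(RunConfigSketch())
# Overriding only root (a made-up path) moves inputdir and outputdir with it.
print(RunConfigSketch(root="/data/run_1", log_level="DEBUG"))
```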
Pipeline sections
Sections whose names are not config are the actual pipelines. A single CHAP configuration file may contain more than one pipeline. Each pipeline must be a list of Readers, Processors, and Writers (PipelineItems) to execute consecutively, together with the instance variables and other parameters that configure each one. To assemble your own pipeline configuration:
Decide which PipelineItems to use and in what order.
For each PipelineItem, refer to the Reference Guide (API documentation) to find out what instance variables it has. The Reference Guide also contains a description of every variable, its expected type, and its default value (for optional variables). Remember to include the instance variables of any object from which the relevant PipelineItem inherits. For example, YAMLReader lists no instance variables, but it does inherit from Reader, which has filename, so YAMLReader also has the filename instance variable.
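The point about inherited instance variables can be illustrated in a few lines of Python. The classes below are hypothetical stand-ins built with standard-library dataclasses; they are not the actual CHAP Reader and YAMLReader implementations, and `configurable_names` is a helper invented for this sketch.

```python
from dataclasses import dataclass, fields


@dataclass
class Reader:
    """Hypothetical stand-in for the Reader base class."""

    filename: str


@dataclass
class YAMLReader(Reader):
    """Hypothetical stand-in that declares no variables of its own."""


def configurable_names(item_class: type) -> list[str]:
    """List every field of a dataclass, including inherited ones."""
    return [f.name for f in fields(item_class)]


print(configurable_names(YAMLReader))  # ['filename']
```

Even though the subclass body is empty, filename is still a valid (and here required) key when configuring it.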
Example: MapProcessor
Suppose you want to configure a pipeline that collects all raw data from a CHESS dataset in a NeXus file, and that you already have a valid CHAP.common.models.map.MapConfig object for the dataset saved to a file named map_config.yaml. To create a suitable pipeline file:
Decide on the required PipelineItems. The pipeline will need a Reader that supports YAML files, a Processor that collects MapConfig data in a NeXus structure, and a Writer that supports NeXus files. So, to start, the pipeline configuration looks like this:
pipeline:
  - common.YAMLReader: TBD
  - common.MapProcessor: TBD
  - common.NexusWriter: TBD
Now, fill in all the TBDs by referring to the Reference Guide for each PipelineItem to specify the instance variables:
pipeline:
  - common.YAMLReader:
      filename: map_config.yaml
  - common.MapProcessor:
      detector_config:
        detectors:
          - id: detector_id
            shape: [0, 0]
            attrs:
              foo: bar
  - common.NexusWriter:
      filename: map_data.nxs
CHAP CLI usage
To display a description of how to use CHAP from the command line, execute:
$ CHAP --help
to get:
usage: PROG [-h] [-p [PIPELINE ...]] [--regex [{match,search,fullmatch}]]
            [--batch] [--batch-logdir LOGDIR]
            config

positional arguments:
  config                Input configuration file

options:
  -h, --help            show this help message and exit
  -p [PIPELINE ...], --pipeline [PIPELINE ...]
                        Pipeline name(s)
  --regex [{match,search,fullmatch}]
                        Name of Python RegEx function
                        (https://docs.python.org/3/howto/regex.html) to use
                        for matching configured pipeline names against the
                        string provided with the -p / --pipeline option.
  --batch               Enables "batch mode" operation where every sub-
                        pipeline is run in separate parallel processes. Log
                        files for each pipeline process will be created in the
                        directory specified with the `--batch-logdir` option.
  --batch-logdir LOGDIR
                        Destination directory for individual pipeline log
                        files when running multiple pipelines in batch mode.
| Option | Description |
|---|---|
| -p, --pipeline | When more than one named pipeline configuration is present in a [...] |
| [...] | This option augments the behavior of [...] |
| [...] | This option augments the behavior of [...] |
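For readers who want to experiment with the option syntax, the help text above can be approximated with a small `argparse` parser. This is a reconstruction from the printed usage only, not CHAP's actual source; `build_parser` is a hypothetical name, the help strings are abbreviated, and any defaults CHAP applies beyond argparse's own are unknown here.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Rebuild a parser resembling the help text (a reconstruction only)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("config", help="Input configuration file")
    parser.add_argument("-p", "--pipeline", nargs="*", metavar="PIPELINE",
                        help="Pipeline name(s)")
    parser.add_argument("--regex", nargs="?",
                        choices=["match", "search", "fullmatch"],
                        help="Name of Python RegEx function")
    parser.add_argument("--batch", action="store_true",
                        help="Run every sub-pipeline in a separate process")
    parser.add_argument("--batch-logdir", metavar="LOGDIR",
                        help="Directory for per-pipeline log files")
    return parser


# The configuration file is given first so that -p does not swallow it.
args = build_parser().parse_args(
    ["pipeline.yaml", "-p", "pipeline_1", "--regex", "fullmatch"])
print(args.config, args.pipeline, args.regex, args.batch, args.batch_logdir)
```

Because -p accepts any number of names, placing the configuration file before -p (or after another option) avoids it being read as a pipeline name.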
Example commands
Suppose pipeline.yaml contains:
config:
  root: .
pipeline_1:
  - common.YAMLReader:
      filename: data_1.yaml
  - common.PrintProcessor
pipeline_2:
  - common.YAMLReader:
      filename: data_2.yaml
  - common.PrintProcessor
| Command | Behavior |
|---|---|
| [...] | Concatenate [...] |
| [...] | Execute [...] |
| [...] | Execute [...] |
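The way a regular-expression function can be used to match configured pipeline names against the strings given with -p can be sketched as follows. `select_pipelines` is a hypothetical helper; the exact-match behavior when no regex function is named is an assumption, and CHAP's real selection logic may differ.

```python
import re


def select_pipelines(configured, patterns, regex=None):
    """Pick configured pipeline names that match the requested ones.

    Without `regex`, a name must equal one of the requested strings (an
    assumption); otherwise the named function from the `re` module is used.
    """
    if regex is None:
        return [name for name in configured if name in patterns]
    if regex not in ("match", "search", "fullmatch"):
        raise ValueError(f"unsupported regex function: {regex!r}")
    matcher = getattr(re, regex)
    return [name for name in configured
            if any(matcher(pattern, name) for pattern in patterns)]


names = ["pipeline_1", "pipeline_2"]
print(select_pipelines(names, ["pipeline_1"]))                      # ['pipeline_1']
print(select_pipelines(names, [r"pipeline_\d"], regex="fullmatch"))  # both names
print(select_pipelines(names, ["_2"], regex="search"))              # ['pipeline_2']
```

The three functions differ in how much of the name the pattern must cover: match anchors at the start, fullmatch requires the whole name, and search accepts a hit anywhere.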
Python executables for CHAP on the CHESS Linux system
Running CHAP on the CHESS Linux system does not require users to create their own Conda environment or CHAP executables. Instead, CHESS maintains regularly updated CHAP executables for each of the maintained workflows in the shared software releases directory for CHESS: /nfs/chess/sw/CHESS-software-releases. Specifically, production and development versions of the CHAP executables can be found in /nfs/chess/sw/CHESS-software-releases/prod and /nfs/chess/sw/CHESS-software-releases/dev, respectively.
Production version executables are updated each time a new tagged release is created for the main branch of the CHAP GitHub repository. Links to executables for the latest production version can be found in /nfs/chess/sw/CHESS-software-releases/prod; links to older releases can be found in subdirectories named after their release version numbers. Release notes can be found here. The CHAP Reference Guide (API documentation) is also updated automatically with each new tagged release.
Development version executables are updated each time a new commit is pushed to the dev branch of the CHAP GitHub repository. Links to executables for the latest development version can be found in /nfs/chess/sw/CHESS-software-releases/dev.
For example, to run the Tomo workflow using the latest production release version, execute:
$ /nfs/chess/sw/CHESS-software-releases/prod/CHAP_tomo pipeline.yaml
or to run the EDD workflow using the latest development release version, execute:
$ /nfs/chess/sw/CHESS-software-releases/dev/CHAP_edd pipeline.yaml
You may find it convenient to add an alias to your ~/.bashrc or ~/.bash_aliases, for example for the production release of the CHAP Tomography workflow:
alias CHAP_tomo_prod='/nfs/chess/sw/CHESS-software-releases/prod/CHAP_tomo'
after which you can run the Tomo workflow using the latest production release version by simply executing:
$ CHAP_tomo_prod pipeline.yaml
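If you launch workflows from a script rather than a shell, the same paths can be assembled in Python. This is only a sketch: `chap_command` is a hypothetical helper, and it assumes that every workflow executable follows the CHAP_<workflow> naming seen in the tomo and edd examples above.

```python
from pathlib import PurePosixPath

RELEASE_ROOT = PurePosixPath("/nfs/chess/sw/CHESS-software-releases")


def chap_command(workflow: str, config: str, release: str = "prod") -> list[str]:
    """Build the argument list for a centrally maintained CHAP executable."""
    if release not in ("prod", "dev"):
        raise ValueError("release must be 'prod' or 'dev'")
    # Assumed naming pattern: CHAP_<workflow>, as in CHAP_tomo and CHAP_edd.
    executable = RELEASE_ROOT / release / f"CHAP_{workflow}"
    return [str(executable), config]


print(chap_command("tomo", "pipeline.yaml"))
print(chap_command("edd", "pipeline.yaml", release="dev"))
# On the CHESS Linux system the list could be passed to subprocess.run(...).
```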
Python environments for CHAP on any Linux system
Developing a user PipelineItem for CHAP or running CHAP on a Linux system other than the CHESS farm does require users to create their own Conda environment by taking the following steps:
Create a base Conda environment and clone the CHAP repository according to steps 1 and 2 of the Conda installation instructions.
Create a Conda environment suitable for your own PipelineItem, or create a Conda environment for each workflow that you want to run.
For example, to create the SAXSWAXS Conda environment and run a SAXSWAXS workflow:
Activate your base Conda environment:
$ source <path_to_CHAP_clone_dir>/bin/activate
Create a Conda environment inside your base environment with:
(base) $ mamba env create -f <path_to_CHAP_clone_dir>/CHAP/saxswaxs/environment.yml
Activate the CHAP_saxswaxs environment:
(base) $ conda activate CHAP_saxswaxs
Try running:
(CHAP_saxswaxs) $ CHAP --help
to confirm that the package and the environment were installed correctly.
Navigate to your work directory.
Create the required CHAP pipeline file for the workflow (see above) and any additional workflow-specific input files.
Run the workflow using your own CHAP_saxswaxs executable:
(CHAP_saxswaxs) $ CHAP pipeline.yaml