Introduction

The CHESS Analysis Pipeline (CHAP) is an object-oriented Python framework for refactoring monolithic data analysis programs into modular pipelines composed of interchangeable, reusable code components. The basic blueprint for a Pipeline consists of a Reader, a Processor, and a Writer, as shown in the diagram below:

_images/chap-base.png

The basic blueprint of a CHAP pipeline component.

The Reader and Writer handle data input and output for the Pipeline. These base classes encapsulate CHESS-specific logistics, such as file operations and data format conversions. Subclasses are defined for specific file types, e.g. H5Reader and NexusWriter. A Pipeline can be constructed with multiple Readers or Writers.
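The division of labor can be illustrated with minimal stand-in classes. The names and method signatures below are illustrative only and do not reproduce CHAP's actual base-class API; a format-specific JSON reader/writer pair stands in for subclasses like H5Reader or NexusWriter.

```python
# Illustrative stand-ins only -- not CHAP's actual base classes or API.
import json


class Reader:
    """Base class: handles data input for a pipeline."""

    def read(self, filename):
        raise NotImplementedError


class Writer:
    """Base class: handles data output for a pipeline."""

    def write(self, data, filename):
        raise NotImplementedError


class JSONReader(Reader):
    """Hypothetical subclass bound to one specific file format."""

    def read(self, filename):
        with open(filename) as f:
            return json.load(f)


class JSONWriter(Writer):
    """Hypothetical subclass bound to one specific file format."""

    def write(self, data, filename):
        with open(filename, 'w') as f:
            json.dump(data, f)
        return data
```

Because the format-specific details live entirely in the subclasses, swapping one file type for another means swapping one Reader or Writer without touching the analysis code in between.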

The Processor base class encapsulates the data analysis algorithm. Because Processors are independent of CHESS infrastructure, researchers who are unfamiliar with CHESS can easily contribute Processor subclasses with custom code for data analysis. Data is passed into and out of a Processor via PipelineData containers. Multiple Processors can be chained together within a single Pipeline.
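The Processor contract and chaining behavior can be sketched as follows. PipelineData is modeled here as a simple container and the `process` signature is an assumption for illustration, not CHAP's exact interface; the two concrete processors are hypothetical.

```python
# Illustrative sketch -- container and method names are assumptions,
# not CHAP's exact API.
from dataclasses import dataclass
from typing import Any


@dataclass
class PipelineData:
    """Minimal stand-in for CHAP's PipelineData container."""
    name: str
    data: Any


class Processor:
    """Base class: encapsulates one data-analysis step."""

    def process(self, data: PipelineData) -> PipelineData:
        raise NotImplementedError


class ScaleProcessor(Processor):
    """Hypothetical step: scale every value by a constant factor."""

    def __init__(self, factor):
        self.factor = factor

    def process(self, data):
        return PipelineData(data.name, [v * self.factor for v in data.data])


class OffsetProcessor(Processor):
    """Hypothetical step: add a constant offset to every value."""

    def __init__(self, offset):
        self.offset = offset

    def process(self, data):
        return PipelineData(data.name, [v + self.offset for v in data.data])


def run_pipeline(processors, data):
    """Chain Processors: each step's output feeds the next step's input."""
    for processor in processors:
        data = processor.process(data)
    return data
```

Note that neither processor knows anything about files or CHESS infrastructure; they only transform PipelineData, which is what makes them easy to contribute and to rearrange within a Pipeline.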

Workflows are defined by CHAP configuration files written in YAML. Each file may contain one or more Pipelines, which can be executed individually or all at once (sequentially). The diagram below shows a schematic workflow with a series of Pipelines linked together in a single configuration file.
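A configuration file of this kind might look schematically like the fragment below. The key names and layout are illustrative only, not CHAP's exact schema; the point is that one YAML file lists several named Pipelines, each composed of Reader, Processor, and Writer entries.

```yaml
# Schematic sketch -- key names are illustrative, not CHAP's exact schema.
pipeline_1:
  - MyReader:
      filename: raw_data.h5
  - MyProcessor:
      parameter: value
  - MyWriter:
      filename: intermediate.nxs

pipeline_2:
  - AnotherReader:
      filename: intermediate.nxs
  - AnotherWriter:
      filename: results.nxs
```

Linking Pipelines through shared files, as sketched here, is one way a series of Pipelines in a single configuration file can form a larger workflow.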

_images/chap-schematic.png

The basic blueprint of a CHAP workflow.

Workflows for specific X-ray techniques are constructed from concrete implementations of the CHAP base classes. For example, in the EDD workflow example pipeline file, the “energy” pipeline performs energy calibration with “Processor 0” being edd.MCAEnergyCalibrationProcessor, the “twotheta” pipeline performs detector two-theta calibration with “Processor 1” being edd.MCATthCalibrationProcessor, and so on.
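Using the pipeline and processor names from the example above, the structure of such a file can be sketched as follows. The layout is schematic and omits all processor parameters; only the pipeline names and processor class names come from the example, and the YAML shape is an assumption rather than the exact EDD schema.

```yaml
# Schematic sketch -- layout is illustrative, not the exact EDD schema.
energy:
  - edd.MCAEnergyCalibrationProcessor
twotheta:
  - edd.MCATthCalibrationProcessor
```

Each named pipeline maps one step of the technique-specific workflow onto a concrete Processor subclass from the edd module.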