CHAP package

Subpackages

Submodules

CHAP.TaskManager module

Python thread pool, see http://code.activestate.com/recipes/577187-python-thread-pool/ Author: Valentin Kuznetsov <vkuznet [AT] gmail [DOT] com>

class StoppableThread(target, name, args)

Bases: Thread

Thread class with a stop() method. The thread itself has to check regularly for the stopped() condition.

running(): Return running status of the thread.

stop(): Set event to stop the thread.

stopped(): Return stopped status of the thread.

class TaskManager(nworkers=10, name='TaskManager')

Bases: object

Task manager class, based on the thread module, which executes assigned tasks concurrently. It uses a pool of thread workers, a queue of tasks, and a pid set to monitor job execution.

Use case:
mgr  = TaskManager()
jobs = []
jobs.append(mgr.spawn(func, args))
mgr.joinall(jobs)

clear(tasks): Clear all tasks in a queue. It allows current jobs to run, but will block all new requests till workers event flag is set again.

is_alive(pid): Check worker queue if given pid of the process is still running.

joinall(tasks): Join all tasks in a queue and quit.

nworkers(): Return number of workers associated with this manager.

quit(): Put None task to all workers and let them quit.

remove(pid): Remove pid and associative process from the queue.

spawn(func, *args, **kwargs): Spawn new process for given function.

status(): Return status of task manager queue.

class UidSet

Bases: object

UID holder keeps track of uid frequency.

add(uid): Add given uid or increment uid occurence in a set.

discard(uid): Either discard or downgrade uid occurence in a set.

get(uid): Get value for given uid.

class Worker(name, taskq, pidq, uidq, logger=None)

Bases: Thread

Thread executing worker from a given tasks queue.

force_exit(): Force run loop to exit in a hard way.

run(): Run thread loop.

genkey(query): Generate a new key-hash for a given query. We use md5 hash for the query and key is just hex representation of this hash.

set_thread_name(ident, name): Set the thread name for the given identifier.

start_new_thread(name, func, args, unique=False): Wrapper around the standard thread.start_new_thread call.

CHAP.models module

Common Pydantic model classes.

class CHAPBaseModel

Bases: BaseModel

Base CHAP configuration class implementing robust serialization tools.

dict(*args, **kwargs)

model_config: ClassVar[ConfigDict] = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_dump(*args, **kwargs)

Dump the class implementation to a dictionary.

Returns:: Class implementation.
Return type:: dict

model_dump_json(*args, **kwargs)

Dump the class implementation to a JSON string.

Returns:: Class implementation.
Return type:: str

class RunConfig(*, root: Annotated[Path, PathType(path_type=dir)] | None = '/home/runner/work/ChessAnalysisPipeline/ChessAnalysisPipeline/docs', inputdir: Annotated[Path, PathType(path_type=dir)] | None = None, outputdir: Annotated[Path, PathType(path_type=dir)] | None = None, interactive: bool | None = False, log_level: Literal['DEBUG', 'INFO', 'WARNING', 'ERROR', 'CRITICAL'] | None = 'INFO')

Bases: CHAPBaseModel

Pipeline run configuration class.

Variables:

root – Default work directory, defaults to the current run directory.
inputdir – Input directory, used only if any input file in the pipeline is not an absolute path, defaults to ‘root’.
outputdir – Output directory, used only if any output file in the pipeline is not an absolute path, defaults to ‘root’.
interactive – Allows for user interactions, defaults to False.
log_level – Logger level (not case sensitive), defaults to ‘INFO’.
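The defaulting rules listed above can be sketched with a plain dataclass. This is not the Pydantic model: `RunConfigSketch` is a hypothetical name, and resolving a relative `inputdir` or `outputdir` against `root` is an assumption about how the defaults interact.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class RunConfigSketch:
    """Illustrative stand-in for the documented defaults."""

    root: Optional[Path] = None
    inputdir: Optional[Path] = None
    outputdir: Optional[Path] = None
    interactive: bool = False
    log_level: str = "INFO"

    def __post_init__(self):
        # root falls back to the current run directory.
        self.root = Path(self.root) if self.root else Path.cwd()
        self.inputdir = self._resolve(self.inputdir)
        self.outputdir = self._resolve(self.outputdir)
        # The logger level is not case sensitive.
        self.log_level = self.log_level.upper()

    def _resolve(self, directory):
        if directory is None:
            return self.root
        directory = Path(directory)
        # Assumption: relative directories are taken relative to root.
        return directory if directory.is_absolute() else self.root / directory


cfg = RunConfigSketch(root="/data/run1", outputdir="out", log_level="debug")
```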

inputdir: Annotated[Path, PathType(path_type=dir)] | None

interactive: bool | None

log_level: Literal['DEBUG', 'INFO', 'WARNING', 'ERROR', 'CRITICAL'] | None

model_config: ClassVar[ConfigDict] = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_post_init(context: Any, /) → None

This function is meant to behave like a BaseModel method to initialise private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Args:: self: The BaseModel instance. context: The context.

outputdir: Annotated[Path, PathType(path_type=dir)] | None

property profile: Return the profiling flag.

root: Annotated[Path, PathType(path_type=dir)] | None

property spawn: Return the spawned worker flag.

classmethod validate_log_level(log_level): Capitalize log_level.

classmethod validate_runconfig_before(data)

Ensure that valid directory paths are provided.

Parameters:: data (RunConfig, pydantic_core._pydantic_core.ValidationInfo) – Pydantic validator data object.
Returns:: The currently validated list of class properties.
Return type:: dict

CHAP.pipeline module

File : pipeline.py Author : Valentin Kuznetsov <vkuznet AT gmail dot com> Description:

class Pipeline(*, args: Annotated[list[dict], Len(min_length=1, max_length=None)], logger: Logger | None = None, mmcs: Annotated[list[ModelMetaclass], Len(min_length=1, max_length=None)])

Bases: CHAPBaseModel

Class representing a full Pipeline object.

args: Annotated[list[dict], Len(min_length=1, max_length=None)]

execute(): Executes the pipeline.

logger: Logger | None

mmcs: Annotated[list[ModelMetaclass], Len(min_length=1, max_length=None)]

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_post_init(context: Any, /) → None

This function is meant to behave like a BaseModel method to initialise private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Args:: self: The BaseModel instance. context: The context.

validate_pipeline_after()

Validate the Pipeline configuration and initialize and validate the private attributes.

Returns:: The validated configuration.
Return type:: Pipeline

class PipelineData(name=None, data=None, schema=None)

Bases: dict

Wrapper for all results of PipelineItem.execute.

class PipelineItem(*, root: Annotated[Path, PathType(path_type=dir)] | None = '/home/runner/work/ChessAnalysisPipeline/ChessAnalysisPipeline/docs', inputdir: Annotated[Path, PathType(path_type=dir)] | None = None, outputdir: Annotated[Path, PathType(path_type=dir)] | None = None, interactive: bool | None = False, log_level: Literal['DEBUG', 'INFO', 'WARNING', 'ERROR', 'CRITICAL'] | None = 'INFO', logger: Logger | None = None, name: Annotated[str, StringConstraints(strip_whitespace=True, to_upper=None, to_lower=None, strict=None, min_length=1, max_length=None, pattern=None)] | None = None, schema: Annotated[str, StringConstraints(strip_whitespace=True, to_upper=None, to_lower=None, strict=None, min_length=1, max_length=None, pattern=None)] | None = None)

Bases: RunConfig

Class representing a single item in a Pipeline object.

execute(data)

Run the appropriate method of the object and return the result.

Parameters:: data (list[PipelineData]) – Input data.
Returns:: The wrapped result of running read, process, or write.
Return type:: Union[PipelineData, tuple[PipelineData]]

get_args()

get_config(data=None, config=None, schema=None, remove=True)

Look through data for the last item whose value for the ‘schema’ key matches schema. Convert the value for that item’s ‘data’ key into the configuration’s Pydantic model identified by schema and return it. If no item is found and both config and schema are specified, validate config against the configuration’s Pydantic model identified by schema and return it. If no item is found and config is specified but schema is not, return config.

Parameters:

data (list[PipelineData], optional) – Input data from a previous PipelineItem.
config (dict, optional) – Initialization parameters for an instance of the Pydantic model identified by schema, required if data is unspecified, invalid, or does not contain an item that matches the schema; supersedes any equal parameters contained in data.
schema (str, optional) – Name of the PipelineItem class to match in data & return, defaults to the internal PipelineItem schema attribute.
remove (bool, optional) – If there is a matching entry in data, remove it from the list, defaults to True.

Raises:

ValueError – If there’s no match for schema in data.

Returns:

The last matching validated configuration model.

Return type:

PipelineItem
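The lookup order described for get_config can be sketched with plain dictionaries. This is only an outline of the search and fallback logic: it skips Pydantic validation entirely, `find_config` is a hypothetical name, and merging `config` over the matched values is an assumption based on the statement that config supersedes equal parameters in data.

```python
def find_config(data, config=None, schema=None, remove=True):
    """Return settings from the last item whose 'schema' matches."""
    # Walk backwards so the last matching item wins.
    for index in range(len(data) - 1, -1, -1):
        if data[index].get("schema") == schema:
            item = data.pop(index) if remove else data[index]
            matched = dict(item["data"])
            # Explicit config values take precedence over those in data.
            matched.update(config or {})
            return matched
    if config is not None:
        # No match in data: fall back to the supplied config.
        return dict(config)
    raise ValueError(f"No match for schema {schema!r} in data")


data = [
    {"name": "a", "schema": "s1", "data": {"x": 1}},
    {"name": "b", "schema": "s2", "data": {"y": 2}},
    {"name": "c", "schema": "s1", "data": {"x": 3}},
]
result = find_config(data, config={"z": 9}, schema="s1")
```

With `remove=True` the matched entry is taken out of `data`, so a later item in the same pipeline does not pick it up again.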

static get_data(data, name=None, schema=None, remove=True)

Look through data for the last item whose ‘data’ value is a nexusformat.nexus.NXobject object or matches a given name or schema. Pick the last item for which the ‘name’ key matches name if set, or the ‘schema’ key matches schema if set; otherwise pick the last match for a nexusformat.nexus.NXobject object. Return the data object.

Parameters:

data (list[PipelineData]) – Input data from a previous PipelineItem.
name (str, optional) – Name of the data item to match in data & return.
schema (Union[str, list[str]], optional) – Name of the PipelineItem class to match in data & return.
remove (bool, optional) – If there is a matching entry in data, remove it from the list, defaults to True.

Raises:

ValueError – If there’s no match for ‘name’ or ‘schema’ in data, or if there is no object of type nexusformat.nexus.NXobject.

Returns:

The last matching data item.

Return type:

obj

static get_default_nxentry(nxobject)

Given a nexusformat.nexus.NXroot or nexusformat.nexus.NXentry object, return the default or first nexusformat.nexus.NXentry match.

Parameters:: nxobject (nexusformat.nexus.NXroot, nexusformat.nexus.NXentry) – Input data.
Raises:: ValueError – If unable to retrieve a nexusformat.nexus.NXentry object.
Returns:: The input data if a nexusformat.nexus.NXentry object or the default or first nexusformat.nexus.NXentry object if a nexusformat.nexus.NXroot object.
Return type:: nexusformat.nexus.NXentry

get_schema()

has_filename()

logger: Logger | None

property method

property method_type

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_post_init(context: Any, /) → None

This function is meant to behave like a BaseModel method to initialise private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Args:: self: The BaseModel instance. context: The context.

name: Annotated[str, StringConstraints(strip_whitespace=True, to_upper=None, to_lower=None, strict=None, min_length=1, max_length=None, pattern=None)] | None

property run_config

schema_: Annotated[str, StringConstraints(strip_whitespace=True, to_upper=None, to_lower=None, strict=None, min_length=1, max_length=None, pattern=None)] | None

set_args(**args)

property status

static unwrap_pipelinedata(data)

Given a list of PipelineData objects, return a list of their data values.

Parameters:: data (list[PipelineData]) – Input data to read, write, or process that needs to be unwrapped from PipelineData before use.
Returns:: The ‘data’ values of the items in the input data.
Return type:: list[object]

validate_pipelineitem_after()

Validate the PipelineItem configuration.

Returns:: The validated configuration.
Return type:: PipelineItem

CHAP.processor module

File : processor.py Author : Valentin Kuznetsov <vkuznet AT gmail dot com> Description: Processor module

Define a generic Processor object.

class OptionParser

Bases: object

User based option parser.

class Processor(*, root: Annotated[Path, PathType(path_type=dir)] | None = '/home/runner/work/ChessAnalysisPipeline/ChessAnalysisPipeline/docs', inputdir: Annotated[Path, PathType(path_type=dir)] | None = None, outputdir: Annotated[Path, PathType(path_type=dir)] | None = None, interactive: bool | None = False, log_level: Literal['DEBUG', 'INFO', 'WARNING', 'ERROR', 'CRITICAL'] | None = 'INFO', logger: Logger | None = None, name: Annotated[str, StringConstraints(strip_whitespace=True, to_upper=None, to_lower=None, strict=None, min_length=1, max_length=None, pattern=None)] | None = None, schema: Annotated[str, StringConstraints(strip_whitespace=True, to_upper=None, to_lower=None, strict=None, min_length=1, max_length=None, pattern=None)] | None = None)

Bases: PipelineItem

Generic data processor.

The job of any Processor in a Pipeline is to receive data returned by the previous PipelineItem, process it in some way, and return the result for the next PipelineItem to use as input.

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_post_init(context: Any, /) → None

This function is meant to behave like a BaseModel method to initialise private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Args:: self: The BaseModel instance. context: The context.

process(data)

Extract the contents of the input data, add a string to it, and return the amended value.

Parameters:: data – Input data.
Returns:: Processed data.

classmethod validate_processor_before(data)

main(opt_parser=<class 'CHAP.processor.OptionParser'>): Main function.

CHAP.reader module

File : reader.py Author : Valentin Kuznetsov <vkuznet AT gmail dot com> Description: generic Reader module

Define a generic Reader object.

class OptionParser

Bases: object

User based option parser.

class Reader(*, root: Annotated[Path, PathType(path_type=dir)] | None = '/home/runner/work/ChessAnalysisPipeline/ChessAnalysisPipeline/docs', inputdir: Annotated[Path, PathType(path_type=dir)] | None = None, outputdir: Annotated[Path, PathType(path_type=dir)] | None = None, interactive: bool | None = False, log_level: Literal['DEBUG', 'INFO', 'WARNING', 'ERROR', 'CRITICAL'] | None = 'INFO', logger: Logger | None = None, name: Annotated[str, StringConstraints(strip_whitespace=True, to_upper=None, to_lower=None, strict=None, min_length=1, max_length=None, pattern=None)] | None = None, schema: Annotated[str, StringConstraints(strip_whitespace=True, to_upper=None, to_lower=None, strict=None, min_length=1, max_length=None, pattern=None)] | None = None, filename: Annotated[str, StringConstraints(strip_whitespace=True, to_upper=None, to_lower=None, strict=None, min_length=1, max_length=None, pattern=None)])

Bases: PipelineItem

Generic file reader.

The job of any Reader in a Pipeline is to provide data stored in a file to the next PipelineItem. Note that a Reader used on its own disrupts the flow of data in a Pipeline – it does not receive or pass along any data returned by the previous PipelineItem.

Variables:: filename – Name of file to read from.

filename: Annotated[str, StringConstraints(strip_whitespace=True, to_upper=None, to_lower=None, strict=None, min_length=1, max_length=None, pattern=None)]

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_post_init(context: Any, /) → None

This function is meant to behave like a BaseModel method to initialise private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Args:: self: The BaseModel instance. context: The context.

read()

Read and return the contents of filename as text.

Returns:: The file content.
Return type:: str

main(opt_parser=<class 'CHAP.reader.OptionParser'>): Main function.

validate_reader_model(reader)

CHAP.runner module

File : runner.py Author : Valentin Kuznetsov <vkuznet AT gmail dot com> Description:

main(): Main function.

parser(): Return an argument parser for the CHAP CLI. This parser has one argument: the input CHAP configuration file.

run(run_config, pipeline_config, logger=None, log_handler=None, comm=None)

Run a given pipeline_config.

Parameters:

run_config (CHAP.runner.RunConfig) – CHAP run configuration.
pipeline_config (dict) – CHAP Pipeline configuration.
logger (logging.Logger, optional) – CHAP logger.
log_handler (logging.StreamHandler, optional) – Logging handler.
comm (mpi4py.MPI.Comm, optional) – MPI communicator.

Returns:

The data field of the first item in the returned list of pipeline items.

runner(run_config, pipeline_config, comm=None)

Main runner function.

Parameters:

run_config (CHAP.runner.RunConfig) – CHAP run configuration.
pipeline_config (dict) – CHAP Pipeline configuration.
comm (mpi4py.MPI.Comm, optional) – MPI communicator.

Returns:

The pipeline’s returned data field.

set_logger(log_level='INFO')

Helper function to set CHAP logger.

Parameters:: log_level (str) – Logger level, defaults to “INFO”.
Returns:: The CHAP logger and logging handler.
Return type:: logging.Logger, logging.StreamHandler
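A helper with this signature and return type might look like the sketch below. The logger name `CHAP_sketch` and the format string are assumptions; only the parameter, its default, and the returned pair come from the description above.

```python
import logging


def set_logger_sketch(log_level="INFO"):
    """Return a logger and its stream handler (illustrative)."""
    logger = logging.getLogger("CHAP_sketch")
    # Accept the level in any case, for example "debug" or "DEBUG".
    logger.setLevel(log_level.upper())
    handler = logging.StreamHandler()
    # Assumption: a fixed-width name column followed by the message.
    handler.setFormatter(logging.Formatter("%(name)-20s: %(message)s"))
    logger.addHandler(handler)
    return logger, handler


logger, handler = set_logger_sketch("debug")
```

Returning the handler alongside the logger lets the caller remove or reconfigure it later without searching `logger.handlers`.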

CHAP.server module

File : server.py Author : Valentin Kuznetsov <vkuznet AT gmail dot com> Description: Python server with thread pool and CHAP pipeline

Client side

cat /tmp/chap.json
{"pipeline": [{"common.PrintProcessor": {}}], "input": 1}

Curl call to the server with CHAP pipeline

curl -X POST -H "Content-type: application/json" -d@/tmp/chap.json http://localhost:5000/pipeline
{"pipeline": [{"common.PrintProcessor":{}}], "status":"ok"}
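The same request can be prepared from Python with the standard library. This sketch reuses the URL and payload from the curl call; the helper name `send` is hypothetical, and it is only defined here, not called, because it needs a running server.

```python
import json
import urllib.request

payload = {"pipeline": [{"common.PrintProcessor": {}}], "input": 1}

# Build the POST request without sending it yet.
request = urllib.request.Request(
    "http://localhost:5000/pipeline",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-type": "application/json"},
    method="POST",
)


def send(req):
    """Send the request and decode the JSON reply (needs a live server)."""
    with urllib.request.urlopen(req) as response:
        return json.load(response)
```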

Server side

flask --app server run

Serving Flask app ‘server’
Debug mode: off

WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.

Running on http://127.0.0.1:5000

Press CTRL+C to quit

CHAP output:

CHAP.server         : Call pipeline args=()
    kwds={'pipeline': [{'common.PrintProcessor': {}}]}
CHAP.server         : pipeline [{'common.PrintProcessor': {}}]
CHAP.server         : Loaded
    <CHAP.common.processor.PrintProcessor object at 0x10e0f1ed0>
CHAP.server         : Loaded
    <CHAP.pipeline.Pipeline object at 0x10e0f1f10> with 1 items
CHAP.server         : Calling "execute" on <CHAP.pipeline.Pipeline
    object at 0x10e0f1f10>
Pipeline            : Executing "execute"
Pipeline            : Calling "process" on
    <CHAP.common.processor.PrintProcessor object at 0x10e0f1ed0>
PrintProcessor      : Executing "process" with
    type(data)=<class 'NoneType'>
PrintProcessor data : None
PrintProcessor      : Finished "process" in 0.000 seconds
Pipeline            : Executed "execute" in 0.000 seconds

daemon(name, queue, interval): Daemon example based on Queue.

index_route(): Server main end-point.

pipeline_route(): Server /pipeline end-point.

run_route(): Server main end-point.

task(*args, **kwds): Helper function to execute CHAP pipeline.

CHAP.writer module

File : writer.py Author : Valentin Kuznetsov <vkuznet AT gmail dot com> Description: generic Writer module

Define a generic Writer object.

class OptionParser

Bases: object

User based option parser.

class Writer(*, root: Annotated[Path, PathType(path_type=dir)] | None = '/home/runner/work/ChessAnalysisPipeline/ChessAnalysisPipeline/docs', inputdir: Annotated[Path, PathType(path_type=dir)] | None = None, outputdir: Annotated[Path, PathType(path_type=dir)] | None = None, interactive: bool | None = False, log_level: Literal['DEBUG', 'INFO', 'WARNING', 'ERROR', 'CRITICAL'] | None = 'INFO', logger: Logger | None = None, name: Annotated[str, StringConstraints(strip_whitespace=True, to_upper=None, to_lower=None, strict=None, min_length=1, max_length=None, pattern=None)] | None = None, schema: Annotated[str, StringConstraints(strip_whitespace=True, to_upper=None, to_lower=None, strict=None, min_length=1, max_length=None, pattern=None)] | None = None, filename: str, force_overwrite: bool | None = False, remove: bool | None = False)

Bases: PipelineItem

Generic file writer.

The job of any Writer in a Pipeline is to receive input returned by a previous PipelineItem, write its data to a particular file format, then return the same data unaltered so it can be used by a successive PipelineItem.

Variables:

filename – Name of file to write to.
force_overwrite – Flag to allow data in filename to be overwritten if it already exists, defaults to False.
remove – Flag to remove the dictionary from data, defaults to False.
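The write-then-pass-through contract and the `force_overwrite` flag can be sketched as a plain function. This is not the CHAP Writer: `write_text` is a hypothetical name, the items are plain dictionaries standing in for PipelineData, and raising `FileExistsError` when the file exists is an assumption about how a refused overwrite is reported.

```python
import os
import tempfile


def write_text(data, filename, force_overwrite=False):
    """Write the last item's 'data' value as text and return data unchanged."""
    if os.path.exists(filename) and not force_overwrite:
        # Assumption: an existing file is protected unless the flag is set.
        raise FileExistsError(f"{filename} already exists")
    with open(filename, "w", encoding="utf-8") as handle:
        handle.write(str(data[-1]["data"]))
    return data


with tempfile.TemporaryDirectory() as tmpdir:
    target = os.path.join(tmpdir, "out.txt")
    items = [{"name": "demo", "data": "hello", "schema": None}]
    returned = write_text(items, target)
    with open(target, encoding="utf-8") as handle:
        written = handle.read()
    try:
        write_text(items, target)   # second write without the flag
        refused = False
    except FileExistsError:
        refused = True
```

Returning the input list unchanged is what allows a writer to sit in the middle of a pipeline without breaking the flow of data to later items.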

filename: str

force_overwrite: bool | None

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_post_init(context: Any, /) → None

This function is meant to behave like a BaseModel method to initialise private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Args:: self: The BaseModel instance. context: The context.

remove: bool | None

write(data)

Write the last CHAP.pipeline.PipelineData item in data as text to a file.

Parameters:: data (list[CHAP.pipeline.PipelineData]) – Input data.
Returns:: Contents of the input data.
Return type:: list[PipelineData]

main(opt_parser=<class 'CHAP.writer.OptionParser'>): Main function.

validate_writer_model(writer)

Module contents

The ChessAnalysisPipeline (CHAP) provides infrastructure to construct and run X-ray data processing / analysis workflows using a set of modular components. We call these components `PipelineItem`s (subclassed into `Reader`s, `Processor`s, and `Writer`s). A `Pipeline` uses a sequence of `PipelineItem`s to execute a data processing workflow where the data returned by one `PipelineItem` becomes input for the next one.
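The chaining idea, where each item receives what the previous one returned, can be sketched with plain functions. None of these names come from CHAP: `read_step`, `process_step`, `write_step`, and `execute_pipeline` are hypothetical stand-ins for a reader, a processor, a writer, and the pipeline's execute method.

```python
sink = []


def read_step(_previous):
    # A reader ignores upstream data and supplies new data.
    return "raw text"


def process_step(data):
    # A processor transforms what it receives.
    return data.upper()


def write_step(data):
    # A writer stores the data and passes it on unaltered.
    sink.append(data)
    return data


def execute_pipeline(steps):
    """Feed each step the value returned by the one before it."""
    data = None
    for step in steps:
        data = step(data)
    return data


result = execute_pipeline([read_step, process_step, write_step])
```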

Many `PipelineItem`s can be shared by data processing workflows for multiple different X-ray techniques, while others may be unique to just a single technique. The `PipelineItem`s that are shared by many techniques are organized in the `CHAP.common` subpackage. `PipelineItem`s unique to a tomography workflow, for instance, are organized in the `CHAP.tomo` subpackage.

[CHAP.utils](CHAP.utils.md) contains a broad selection of utilities to assist in some common tasks that appear in specific Processor implementations.