SAXS/WAXS module (CHAP.saxswaxs)

The CHAP.saxswaxs module contains processing tools unique to SAXS/WAXS processing. This document describes how to use them in a pipeline configuration YAML file so that SAXS/WAXS workflows can be run from the terminal on the CHESS Linux system.

Workflow description

General overview

  1. Setup a Zarr container for the complete raw and processed dataset.

    First, we set up a container to hold the raw and processed datasets before any processing begins. This step needs the following information to allocate an appropriate amount of space:

    1. Dataset size

    2. Detector size(s)

    3. Integrated data size(s)

    In practice, you should prepare two supplementary configuration files in addition to the parameters for the tools involved in this step: one for a MapConfig, object, another for a PyFaiIntegrationProcessorConfig object. This step also sets the number of chunks for each data array in the container. Selecting the right number of chunks is important for optimizing performance during the next step.

    Example pipeline configuration
    config:
      root: .
      log_level: debug
    setup:
    # Step 1: read in configuration files
    - common.YAMLReader:
        filename: map_config.yaml
        schema: common.models.map.MapConfig
    - common.YAMLReader:
        filename: pyfai_integration_processor_config.yaml
        schema: common.models.integration.PyfaiIntegrationConfig
    # Step 2: set up Zarr container for processed data based on
    #         configurations provided from Step 1
    - saxswaxs.SetupProcessor:
        dataset_shape:
        - 50
        dataset_chunks:
        - 10
        detectors:
        - id: PIL9
      shape:
      - 407
      - 487
        - id: PIL11
      shape:
      - 407
      - 487
    # Step 3: Write the Zarr container to file
    - common.ZarrWriter:
        filename: data.zarr
        force_overwrite: true   
    
  2. Fill container with processed data

    Next, we read in the raw data and perform the configured integration(s). To get the best performance for large datasets, this step should be performed across multiple pipeline processes runnning in parallel. Each process will handle reading, processing, and writing just one part of the whole dataset – exactly one chunk of each array in the container set up previously, to be precise. To fill the container from the previous step with processed data, each parallel process will need to know the following information:

    1. The spec file for the scan whose data will be processed

    2. The scan number for the scan whose data will be processed

    3. Parameters indicating a specific slice of the scan to process

    Example pipeline configuration
    config:
      root: .
      log_level: debug
    update:
    # Step 1: read in configuration files
    - common.YAMLReader:
        filename: map_config.yaml
        schema: common.models.map.MapConfig
    - common.YAMLReader:
        filename: pyfai_integration_processor_config.yaml
        schema: common.models.integration.PyfaiIntegrationConfig
    # Step 2: read and process the data
    - saxswaxs.UpdateValuesProcessor:
        spec_file: spec_file
        scan_number: 1
        detectors:
        - id: PIL9
          shape:
          - 407
          - 487
        - id: PIL11
          shape:
          - 407
          - 487
        idx_slice:
          start: 0
          stop: 10
          step: 1
    # Step 3: write processed data to the container set up earlier
    - common.ZarrValuesWriter:
        filename: data.zarr
    
  3. Perform final user adjustments

    These “final adjustments” are all optional, but they can include the following:

    • converting from .zarr format to NeXus format

    • performing flux, absorption, and / or background corrections on integrated data

    • inserting links to coordinate axes next to processed data arrays

    • reshaping the dataset from an “unstructured” to a “structured” representation

    Example format conversion pipeline configuration
    config:
      root: .
      log_level: debug
    convert:
    # One tool only: ZarrToNexusProcessor takes care of reading old
    # zarr file and writing new NeXus file, too
    - common.ZarrToNexusProcessor:
        zarr_filename: data.zarr
        nexus_filename: data.nxs
    
    Example flux correction pipeline configuration
    config:
      root: .
      log_level: debug
    flux_correct:
    # Step 1: Read in uncorrected data and flux data to perform flux correction
    - common.NexusReader:
        filename: data.nxs
        nxpath: waxs_azimuthal/data/I
        nxmemory: 100000
        name: intensity
    - common.NexusReader:
        filename: data.nxs
        nxpath: spec_file_012/scalar_data/presample_intensity
        nxmemory: 100000
        name: presample_intensity
    # Step 2: Perform flux correction calculations
    - saxswaxs.FluxCorrectionProcessor:
        nxprocess: true
        presample_intensity_reference_rate: 50000
    # Write flux corrected results back to original file as their own
    # new NXprocess group
    - common.NexusWriter:
        filename: data.nxs
        nxpath: /waxs_azimuthal_flux_corrected
        force_overwrite: true
    
    Example coordinate linking pipeline configuration
    config:
      root: .
      log_level: debug
    linkdims:
    # Step 1: Read in the Nexus file in which we're adding linked data fields
    - common.NexusReader:
        filename: data.nxs
        mode: rw
    # Step 2: Create new linked data arrays from all the groups listed
    # in `link_from` to all the existing data arrays listed in
    # `link_to`
    - common.NexusMakeLinkProcessor:
        link_from:
        - waxs_azimuthal/data
        - waxs_cake/data
        - waxs_radial/data
        - waxs_azimuthal_flux_abs_bg_corrected/data
        - waxs_azimuthal_flux_abs_corrected/data
        - waxs_azimuthal_flux_corrected/data
        - waxs_cake_flux_abs_bg_corrected/data
        - waxs_cake_flux_abs_corrected/data
        - waxs_cake_flux_corrected/data
        - waxs_radial_flux_abs_bg_corrected/data
        - waxs_radial_flux_abs_corrected/data
        - waxs_radial_flux_corrected/data
        link_to:
        - spec_file_012/independent_dimensions/samx
        - spec_file_012/independent_dimensions/samzmakelink
    # No writing tool necessary, `common.NexusMakeLinkProcessor` will
    # modilfy the file in place as long as it's read in with `mode:
    # rw`.
    
    Example unstructured-to-structured pipeline configuration
    config:
      root: .
      log_level: debug
    struct:
    # Step 1: Read in the data arrays to structure, and all their
    # coordinate axes arrays too
    - common.NexusReader:
        filename: data.nxs
        nxpath: spec_file_012/independent_dimensions/samx
        name: samx
    - common.NexusReader:
        filename: data.nxs
        nxpath: spec_file_012/independent_dimensions/samz
        name: samz
    - common.NexusReader:
        filename: data.nxs
        nxpath: waxs_azimuthal/data/q_A^-1
        name: waxs_azimuthal_q
    - common.NexusReader:
        filename: data.nxs
        nxpath: waxs_radial/data/chi_deg
        name: waxs_radial_chi
    - common.NexusReader:
        filename: data.nxs
        nxpath: waxs_azimuthal/data/I
        name: waxs_azimuthal_I
    - common.NexusReader:
        filename: data.nxs
        nxpath: waxs_radial/data/I
        name: waxs_radial_I
    # Step 2: restructure signal arrays in a single shared NXdata object
    - saxswaxs.UnstructuredToStructuredProcessor:
        name: structured_data
        fields:
        - name: samx
          type: axis
        - name: samz
          type: axis
        - name: waxs_azimuthal_q
          type: axis
        - name: waxs_radial_chi
          type: axis
        - name: waxs_azimuthal_I
          axes: [samx, samz, waxs_azimuthal_q]
          type: signal
        - name: waxs_radial_I
          axes: [samx, samz, waxs_radial_chi]
          type: signal
    # Step 3: Write the structured data to a new group in the original NeXus file
    - common.NexusWriter:
        filename: data.nxs
        nxpath: /structured
        force_overwrite: true
    

Optimizing performance

Guide on selecting appropriate values for dataset_chunks and running multiple update jobs in parallel

Data in / output formats

CHESS data in, .zarr data out

Notes on corrections calculations

There are currently three convenience tools available for performing corrections: saxswaxs.FluxCorrectionProcessor, saxswaxs.FluxAbsorptionCorrectionProcessor, and saxswaxs.FluxAbsorptionBackrgroundCorrectionProcessor. The exact calculations that each ones performs are detailed below.

Definitions

  • \(I_{uncorrected}\) is the uncorrected, integrated dataset. When reading this data as input for a saxswaxs.*CorrectionProcessor, use name: intensity.

  • \(I_{incident}\) is the presample intensity dataset. When reading this data as input for a saxswaxs.*CorrectionProcessor, use name: presample_intensity.

  • \(I_{transmitted}\) is the postsample intensity dataset. When reading this data as input for a saxswaxs.*CorrectionProcessor, use name: postsample_intensity.

  • \(t_{dwell}\) is the scan’s dwell time at each point in the map configuration to which the correction tool is applied. When reading this data as input for a saxswaxs.*CorrectionProcessor, use name: dwell_time_actual.

  • \(I_{background}\) is the background integrated detector data. When reading this data as input for a saxswaxs.*CorrectionProcessor, use name: background_intensity.

  • \(I_{incident, background}\) is the background presample intensity. When reading this data as input for a saxswaxs.*CorrectionProcessor, use name: background_presample_intensity.

  • \(I_{transmitted, background}\) is the background postsample intensity. When reading this data as input for a saxswaxs.*CorrectionProcessor, use name: background_postsample_intensity.

  • \(\phi_{reference}\) is presample_intensity_reference_rate in the correction tool’s own parameters. However, if presample_intensity_reference_rate was not given in the tool’s configuration, a value for this quantity will be calculated with:

\[ \phi_{reference} = \frac{\overline{I_{incident}}}{\overline{t_{dwell}}} \]
  • \(T = \frac{I_{transmitted}}{I_{incident}} / \frac{I_{transmitted, background}}{I_{incident, background}}\)

  • \(t\) is the sample thickness in \(cm\). This quantity only appears in saxswaxs.FluxAbsorptionBackgroundCorrectionProcessor. The value of this parameter can be set in one of two ways:

    1. Use the sample_thickness_cm parameter of saxswaxs.FluxAbsorptionBackgroundCorrectionProcessor, OR

    2. Use the sample_mu_inv_cm (”\(\mu\)”) parameter of saxswaxs.FluxAbsorptionBackgroundCorrectionProcessor. \(\mu\) is known as the the linear attenuation coefficient of the sample, and is related to the mass attenuation coefficient, \((\mu/\rho)\) [cm\(^2\)/g], by the sample density. Tabulated values of the \((\mu/\rho)(E)\) for each element are available here: https://physics.nist.gov/PhysRefData/XrayMassCoef/tab3.html. For our purposes:

\[ t = \frac{-\ln{T}}{\mu} \]
  • NB: When using saxswaxs.FluxAbsorptionBackgroundCorrectionProcessor, do not use both the sample_thickness_cm and sample_mu_inv_cm parameters at the same time. Specifying both parameters makes the definition of the sample thickness ambiguous. The Processor will raise an error if both parameters are supplied.

  • \(C_f\) is the scalar factor for putting flux, absorption, background, and thickness corrected data into absolute intensity units. Taken directly from the absolute_intensity_scalar parameter from the correction tool config file.

saxswaxs.FluxCorrectionProcessor

\[ I_{corrected} = I_{uncorrected} * \frac{\phi_{reference}}{I_{incident}} \]

saxswaxs.FluxAbsorptionCorrectionProcessor

\[ I_{corrected} = \frac{1}{T} * I_{uncorrected} * \frac{\phi_{reference}}{I_{incident}} \]

saxswaxs.FluxAbsoprtionBackgroundCorrectionProcessor

This tool functions differently depending on what parameters are provided:

  • If neither sample_thickness_cm or sample_mu_inv_cm are provided:

\[ I_{corrected} = (I_{uncorrected} * \frac{1}{T} * \frac{\phi_{reference}}{I_{incident}}) - (I_{background} * \frac{\phi_{reference}}{I_{incident, background}}) \]
  • If either sample_thickness_cm or sample_mu_inv_cm are provided:

\[ I_{corrected} = \frac{1}{t} * [(I_{uncorrected} * \frac{1}{T} * \frac{\phi_{reference}}{I_{incident}}) - (I_{background} * \frac{\phi_{reference}}{I_{incident, background}})] \]

Running a workflow

At CHESS

Use environment: source /nfs/chess/sw/miniforge3_chap/bin/activate; conda activate CHAP_saxswaxs And/or add this to your ~/.bashrc: alias CHAP_saxswaxs='/nfs/chess/sw/miniforge3_chap/envs/CHAP_saxswaxs/bin/CHAP'

Batch jobs

Drop in & replace: PyfaiIntegrationProcessorConfig idx_slice on update jobs will need adjustments to drop in & replace: MapConfig Script for constructuing and / or sending off multiple update jobs in parallel