src.toolbox.pipeline_manager#

Classes#

PipelineManager

A class enabling the execution of multiple pipelines in sequence.

Module Contents#

class src.toolbox.pipeline_manager.PipelineManager[source]#

Bases: toolbox.utils.config_mirror.ConfigMirrorMixin

A class enabling the execution of multiple pipelines in sequence.

pipelines[source]#
alignment_map[source]#
settings[source]#
load_mission_control(config_path, mirror_keys=None)[source]#

Load MissionControl YAML into private self._parameters. This:

  • Builds pipelines

  • Builds alignment_map

  • Mirrors selected keys as attributes (e.g., 'settings')
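The "mirrors selected keys as attributes" behaviour can be pictured with a minimal, hypothetical sketch; the real ConfigMirrorMixin may work differently, and the class and method names below are illustrative only:

```python
# Hypothetical sketch: mirror selected config keys as attributes while the
# full config stays in a private dict. Not the actual ConfigMirrorMixin code.
class MirrorSketch:
    def load(self, params, mirror_keys=None):
        self._parameters = dict(params)      # private config store
        for key in (mirror_keys or []):
            # expose e.g. params["settings"] as self.settings
            setattr(self, key, self._parameters[key])

obj = MirrorSketch()
obj.load({"settings": {"validation": {}}, "other": 1}, mirror_keys=["settings"])
# obj.settings is now {"validation": {}}; "other" stays private
```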

add_pipeline(name, config_path)[source]#

Add a single pipeline with a unique name.

save_manager_config(path: str)[source]#

Save MissionControl/Manager config from self._parameters.

save_pipeline_configs(out_dir: str, filename='{name}.yaml')[source]#

Ask each Pipeline to write its private config to YAML. The pipeline file content comes from pipeline._parameters (including its steps).
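The filename='{name}.yaml' default suggests str.format-style substitution of each pipeline's unique name. A small sketch of that assumed expansion (the helper name is illustrative, not part of the API):

```python
# Sketch of how a '{name}.yaml' template presumably expands per pipeline;
# the actual substitution inside save_pipeline_configs may differ.
import os

def pipeline_config_path(out_dir: str, name: str, filename: str = "{name}.yaml") -> str:
    # str.format fills the {name} placeholder with the pipeline's unique name
    return os.path.join(out_dir, filename.format(name=name))

pipeline_config_path("configs", "glider_a")  # e.g. configs/glider_a.yaml on POSIX
```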

save_all_configs(manager_path: str, pipelines_dir: str, pipeline_filename='{name}.yaml')[source]#

Convenience: save manager config and all pipeline configs.

run_all()[source]#

Run all registered pipelines and cache the resulting contexts.

get_contexts()[source]#

Retrieve the context dictionary from each pipeline.

load_data(filepath, platform_name)[source]#
summarise_all_profiles() → pandas.DataFrame[source]#

For all pipelines, summarise profiles and plot glider-to-glider distance time series. This includes:

  • Computing median TIME, LATITUDE, LONGITUDE per profile

  • Matching each profile to the closest-in-time profile from another source

  • Plotting a distance grid comparing all gliders

Returns:

Concatenated summary of all glider profiles, with closest match info appended.

Return type:

pd.DataFrame
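The per-profile median and nearest-in-time matching steps can be sketched in self-contained pandas; the column names (PROFILE_NUMBER, OTHER_PROFILE) and data below are illustrative, not taken from the library:

```python
# Self-contained sketch of the two computations described above.
import pandas as pd

obs = pd.DataFrame({
    "PROFILE_NUMBER": [1, 1, 2, 2],
    "TIME": pd.to_datetime(["2024-01-01 00:00", "2024-01-01 00:10",
                            "2024-01-01 06:00", "2024-01-01 06:20"]),
    "LATITUDE": [50.0, 50.1, 50.4, 50.5],
    "LONGITUDE": [-4.0, -4.0, -4.2, -4.2],
})

# 1) median TIME, LATITUDE, LONGITUDE per profile
summary = (obs.groupby("PROFILE_NUMBER")[["TIME", "LATITUDE", "LONGITUDE"]]
              .median().reset_index().sort_values("TIME"))

# 2) nearest-in-time match against another platform's profile summary
other = pd.DataFrame({
    "TIME": pd.to_datetime(["2024-01-01 00:03", "2024-01-01 06:12"]),
    "OTHER_PROFILE": [101, 102],
}).sort_values("TIME")

matched = pd.merge_asof(summary, other, on="TIME", direction="nearest")
```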

preview_alignment(target='None')[source]#

Align all datasets to a target dataset and compute R² against ancillary sources.

This version:

  • Renames each pipeline's variables to the standard names (from alignment_map)

  • Runs interpolate + aggregate ONCE per pipeline and caches the results

  • Uses the cached medians for pairing/merging/R²

  • Populates exportable handles for raw/processed/lite data
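The run-once-and-cache behaviour is an ordinary memoization pattern. A minimal sketch (function and variable names are illustrative, not the library's internals):

```python
# Minimal sketch of the run-once-and-cache pattern described above: the
# expensive interpolate/aggregate step runs only once per pipeline name.
calls = []

def interpolate_and_aggregate(name):
    calls.append(name)          # stand-in for the expensive computation
    return f"medians for {name}"

cache = {}

def cached_medians(name):
    if name not in cache:
        cache[name] = interpolate_and_aggregate(name)
    return cache[name]

for _ in range(3):
    cached_medians("glider_a")  # computed once, then served from the cache
```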

fit_and_save_to_target(target, out_dir=None, variable_r2_criteria=None, max_time_hr=None, max_dist_km=None, ancillaries=None, overwrite=False, show_plots=True)[source]#

Fit ancillary variables to target datasets using profile-pair medians and per-variable R² criteria.
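Per-variable R² criteria presumably act as a gate: a fitted variable is kept only when its R² meets that variable's threshold. A hypothetical sketch of that gating (names and thresholds are examples, not the library's code):

```python
# Illustrative sketch of per-variable R² gating; not the actual implementation.
def passes_criteria(fits: dict, variable_r2_criteria: dict) -> dict:
    # fits: {var: {"slope": ..., "intercept": ..., "r2": ...}}
    return {var: fit for var, fit in fits.items()
            if fit["r2"] >= variable_r2_criteria.get(var, 0.0)}

fits = {"TEMP": {"slope": 1.01, "intercept": -0.02, "r2": 0.97},
        "CNDC": {"slope": 0.98, "intercept": 0.01, "r2": 0.88}}
kept = passes_criteria(fits, {"TEMP": 0.9, "CNDC": 0.95})
# only TEMP survives: its R² (0.97) meets 0.9, while CNDC's 0.88 < 0.95
```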

validate_with_device(target='None', **overrides)[source]#

Run the validation workflow using settings['validation']. Optionally pass keyword overrides (e.g., show_plots=False) for this call only.

Examples:

mngr.validate_with_device("Doombar")
mngr.validate_with_device("Doombar", show_plots=False, apply_and_save=True)
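The "for this call only" semantics suggest a dict-merge where per-call keyword overrides shadow the stored options without mutating them. A sketch under that assumption:

```python
# Assumed merge semantics: per-call keyword overrides shadow the stored
# settings['validation'] options for this call only; the stored dict is untouched.
settings = {"validation": {"device_name": "Doombar", "show_plots": True}}

def effective_options(settings: dict, **overrides) -> dict:
    return {**settings["validation"], **overrides}

opts = effective_options(settings, show_plots=False)
# opts["show_plots"] is False, while settings["validation"]["show_plots"] stays True
```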

fit_to_device(target='None')[source]#

Fit TARGET variables to a validation device using profile-pair medians and per-variable R² criteria. The mapping is fit as: device = slope * target + intercept, then applied to the FULL target dataset to create new variables {VAR}_ALIGNED_TO_{DEVICE}.

Reads options from self.settings['validation']:

validation:
  device_name: "<device label>"
  variable_names: ["CNDC", "TEMP", …]  # optional; defaults to alignment_map keys
  variable_r2_criteria: {CNDC: 0.95, TEMP: 0.9, …}
  max_time_threshold: <float>
  max_distance_threshold: <float>
  save_plots: <bool>
  show_plots: <bool>
  plot_output_path: "<file or dir>"
  apply_and_save: <bool>
  output_path: "<dir or empty for timestamped dir>"

Returns:

dict with:

  • "path": output NetCDF path (if saved)

  • "fits": {var: {slope, intercept, r2, n}, …}

  • "device_name": device label used
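The linear mapping described above (device = slope * target + intercept, fit on paired profile medians, then applied to the full target series) can be sketched with numpy; the data and the helper name are illustrative, and the real implementation's fitting details may differ:

```python
# Sketch of the fit-then-apply mapping described above; numpy-only.
import numpy as np

def fit_linear(target_medians, device_medians):
    # least-squares fit: device = slope * target + intercept
    slope, intercept = np.polyfit(target_medians, device_medians, 1)
    pred = slope * np.asarray(target_medians) + intercept
    resid = np.asarray(device_medians) - pred
    ss_tot = np.sum((device_medians - np.mean(device_medians)) ** 2)
    r2 = 1.0 - np.sum(resid ** 2) / ss_tot
    return {"slope": slope, "intercept": intercept, "r2": r2, "n": len(target_medians)}

# paired profile medians (hypothetical numbers)
target = np.array([10.0, 11.0, 12.0, 13.0])
device = np.array([10.2, 11.2, 12.2, 13.2])

fit = fit_linear(target, device)
# apply to the FULL target variable, e.g. TEMP -> TEMP_ALIGNED_TO_DOOMBAR
full_target = np.array([9.5, 10.7, 12.9])
aligned = fit["slope"] * full_target + fit["intercept"]
```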

apply_adjustment(target, fit_params)[source]#
save(dir, raw=True, processed=True)[source]#