Developer Guide
This page explains how to add a new step to the Toolbox (e.g., a new processing stage, validation routine, or export).
What is a “step”?
A step is a stage in the pipeline that is defined in Python and configured via the Pipelines Config file. Examples of steps include:

- I/O Steps (e.g., load_data.py, containing Load OG1, or export.py, containing Data Export)
- Variable Processing Steps (e.g., salinity.py, containing QC: Salinity and ADJ: Salinity)
- Data Processing Steps (e.g., derive_ctd.py, containing Find Profiles)
Steps are not limited to one per file: a single file can contain multiple steps. For example, salinity.py contains both a QC and an ADJ step.
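The sketch below illustrates several steps sharing one file, in the spirit of salinity.py. BaseStep, register_step and STEP_REGISTRY here are simplified, hypothetical stand-ins written so the example runs on its own (the real classes are imported from toolbox.steps.base and may behave differently). The dict-based context, the 2 to 42 salinity range and the 0.01 offset are invented purely for illustration.

```python
# Hypothetical stand-ins for toolbox.steps.base; NOT the real Toolbox code.
STEP_REGISTRY = {}


def register_step(cls):
    """Record a step class under its step_name so it can be looked up later."""
    STEP_REGISTRY[cls.step_name] = cls
    return cls


class BaseStep:
    step_name = None

    def __init__(self, context):
        # Assumption: the context is a plain dict shared between steps.
        self.context = context

    def run(self):
        raise NotImplementedError


class _SalinityHelper:
    """Helper class in the same file; not decorated, so never registered."""

    @staticmethod
    def flag_out_of_range(values, low=2.0, high=42.0):
        # Illustrative thresholds only: True marks a suspicious value.
        return [not (low <= v <= high) for v in values]


@register_step
class SalinityQC(BaseStep):
    step_name = "QC: Salinity"

    def run(self):
        salinity = self.context["salinity"]
        self.context["salinity_flags"] = _SalinityHelper.flag_out_of_range(salinity)
        return self.context


@register_step
class SalinityADJ(BaseStep):
    step_name = "ADJ: Salinity"

    def run(self):
        # Hypothetical fixed offset, purely for illustration.
        self.context["salinity_adj"] = [v + 0.01 for v in self.context["salinity"]]
        return self.context
```

Only the two decorated classes end up in STEP_REGISTRY, so the helper class stays invisible to step lookup even though it lives in the same file. For example, running STEP_REGISTRY["QC: Salinity"]({"salinity": [35.0, 50.0]}).run() yields salinity_flags of [False, True].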
How to add a new step
1. Create a new Python file in the appropriate directory under src/toolbox/steps/custom/.

NOTE: If you are creating a step for specific variables, such as a salinity QC step, it should go in the variables subdirectory.

2. Define a new class for your step, inheriting from BaseStep (or another appropriate base class, such as VariableStep or DataStep).

```python
from toolbox.steps.base import BaseStep

class MyNewStep(BaseStep):
    ...
```
3. Define the step_name attribute, which is the name used in the Pipelines Config file to refer to this step.

```python
from toolbox.steps.base import BaseStep

class MyNewStep(BaseStep):
    step_name = "My New Step"
    ...
```
4. Register the step using the @register_step decorator.

```python
from toolbox.steps import BaseStep, register_step

@register_step
class MyNewStep(BaseStep):
    step_name = "My New Step"
    ...
```

This makes the step discoverable by the Pipeline Manager, and also lets you define other classes in the same file without registering them.
5. Implement the run method, which contains the logic for your step. This method should take no arguments other than self, and should return the self.context object.

```python
from toolbox.steps.base import BaseStep, register_step

@register_step
class MyNewStep(BaseStep):
    step_name = "My New Step"

    def run(self):
        # Your processing logic here
        return self.context
```
6. Optionally, implement the generate_diagnostics method if your step produces any diagnostic plots or outputs.

```python
from toolbox.steps.base import BaseStep, register_step

@register_step
class MyNewStep(BaseStep):
    step_name = "My New Step"

    def run(self):
        # Your processing logic here
        return self.context

    def generate_diagnostics(self):
        # Your diagnostics logic here
        pass
```
There are already default methods for generating common diagnostics, such as time series plots and scatter plots. See the utils.diagnostics documentation for more information.
7. Add the step to your Pipelines Config file, using the step_name you defined in step 3.

```yaml
# Pipeline Configuration
pipeline:
  name: "My Pipeline"
  description: "A pipeline for demonstration purposes"

# Steps in the pipeline
steps:
  - name: "My New Step"
    parameters:
      param1: value1
      param2: value2
```
Any parameters defined in the parameters section of the config file will be passed to your step as attributes. You can access them in your run method using self.param1, self.param2, etc.

NOTE: This is handled automatically by the BaseStep class. More information can be found in the BaseStep documentation.
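To make the parameter handling more concrete, here is a minimal sketch of one way a base class could turn the parameters mapping into attributes, and how a runner might look steps up by step_name and pass the context from one step to the next. Everything in it (the setattr-based BaseStep, STEP_REGISTRY, run_pipeline, the dict-shaped context, and the ScaleVariable step with its variable and factor parameters) is a hypothetical stand-in, not the Toolbox's actual BaseStep or Pipeline Manager. The config is written as the Python dict that loading a YAML file like the one above would produce, to avoid a YAML dependency.

```python
from typing import Any, Dict, Optional

# Hypothetical stand-ins; the real BaseStep and Pipeline Manager may differ.
STEP_REGISTRY: Dict[str, type] = {}


def register_step(cls):
    """Make a step class discoverable by its step_name."""
    STEP_REGISTRY[cls.step_name] = cls
    return cls


class BaseStep:
    step_name: Optional[str] = None

    def __init__(self, context: Dict[str, Any],
                 parameters: Optional[Dict[str, Any]] = None):
        self.context = context
        # Expose each config parameter as an attribute, e.g. self.param1.
        for key, value in (parameters or {}).items():
            setattr(self, key, value)

    def run(self) -> Dict[str, Any]:
        raise NotImplementedError


@register_step
class ScaleVariable(BaseStep):
    step_name = "Scale Variable"

    def run(self) -> Dict[str, Any]:
        # self.variable and self.factor come from the "parameters" block.
        data = self.context[self.variable]
        self.context[self.variable] = [x * self.factor for x in data]
        return self.context


def run_pipeline(config: Dict[str, Any], context: Dict[str, Any]) -> Dict[str, Any]:
    """Run each configured step in order, handing the context along."""
    for entry in config["steps"]:
        step_cls = STEP_REGISTRY[entry["name"]]
        step = step_cls(context, entry.get("parameters"))
        context = step.run()
    return context
```

With config = {"steps": [{"name": "Scale Variable", "parameters": {"variable": "pressure", "factor": 2}}]}, calling run_pipeline(config, {"pressure": [1, 2, 3]}) returns {"pressure": [2, 4, 6]}. Using setattr keeps step classes free of boilerplate, at the cost that a parameter sharing a name with an existing method (such as run) would overwrite it in this sketch.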