src.toolbox.utils.alignment
Functions
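- interpolate_DEPTH: Interpolate missing DEPTH values per PROFILE_NUMBER, normalize depth sign, build regular depth bins, and (optionally) visualize the bin coverage.
- aggregate_vars: Compute medians per (PROFILE_NUMBER, DEPTH_bin) and return a NEW dataset with variables named ‘median_{var}’, each shaped (PROFILE_NUMBER, DEPTH_bin).
- filter_xarray_by_profile_ids: Filter an xarray.Dataset to keep only rows / entries with profile IDs in valid_ids.
- find_profile_pair_metadata: Identify profile pairs between a target and ancillary glider within time/distance thresholds.
- merge_pairs_from_filtered_aggregates: Build a dataset with one row per (target_profile, ancillary_profile, depth_bin).
- major_axis_r2_xr: Compute R² (coefficient of determination) for Major Axis (Type II) regression using xarray.
- compute_r2_for_merged_profiles_xr: Compute R² for each profile-pair in an xarray.Dataset, and append the results directly to the dataset.
- plot_r2_heatmaps_per_pair: Create ONE heatmap per ancillary pairing showing counts of unique ancillary profiles that meet R² thresholds for each variable.
- collect_xy_from_r2_ds: From an R² dataset (one ancillary), collect all (x=ancillary, y=target) binned points for a given variable across selected profile pairs, flattening PAIR_INDEX×DEPTH_bin.
- fit_linear_map: Fit y = slope * x + intercept using sklearn LinearRegression.
- plot_pair_scatter_grid: Grid of scatter plots: rows = ancillary, cols = variables.
- Apply y = slope * da + intercept to an xarray.DataArray.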
Module Contents
- src.toolbox.utils.alignment.interpolate_DEPTH(ds: xarray.Dataset, bin_size: int = 5, depth_positive: str = 'down', plot: bool = False, plot_kind: str = 'hist', max_profiles_heatmap: int = 50, figsize=(9, 4), save_path: str | None = None, show: bool = True) -> xarray.Dataset
Interpolate missing DEPTH values per PROFILE_NUMBER, normalize depth sign, build regular depth bins, and (optionally) visualize the bin coverage.
- Parameters:
ds (xr.Dataset) – Must contain ‘DEPTH’, ‘TIME’, ‘PROFILE_NUMBER’ on the ‘N_MEASUREMENTS’ axis.
bin_size (int) – Depth bin size in the same units as DEPTH (e.g., meters).
depth_positive ({"down","up"}) – Normalize depth sign before binning. “down” => positive increasing with depth.
plot (bool) – If True, draw a quick visualization of the resulting depth bins.
plot_kind ({"hist","heatmap"}) –
“hist”: overall histogram of measurement counts per bin (barh).
“heatmap”: per-profile vs bin coverage (first max_profiles_heatmap profiles).
max_profiles_heatmap (int) – Max number of profiles to include in the heatmap (to avoid huge plots).
figsize (tuple) – Matplotlib figure size for the plot.
save_path (str or None) – If provided, save the plot to this path.
show (bool) – If True, plt.show() the figure; otherwise close it.
- Returns:
Dataset with the following variables:
DEPTH_interpolated (float64, N_MEASUREMENTS)
DEPTH_bin (float32, N_MEASUREMENTS)
DEPTH_range (str, N_MEASUREMENTS), e.g. “50–55”
- Return type:
xr.Dataset
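The interpolate, normalize, and bin steps described above can be sketched with plain NumPy. This is a minimal illustration, not the module's implementation: the helper name `interpolate_and_bin_depth` is hypothetical, gaps are filled by linear interpolation over time within each profile, sign normalization is assumed to be a simple absolute value, and each bin is labelled by its lower edge.

```python
import numpy as np

def interpolate_and_bin_depth(time, depth, profile, bin_size=5, depth_positive="down"):
    """Hypothetical helper: fill NaN depths per profile, fix the sign, assign bins."""
    time = np.asarray(time, dtype=float)
    depth = np.asarray(depth, dtype=float)
    profile = np.asarray(profile)
    filled = depth.copy()

    for pid in np.unique(profile):
        idx = np.flatnonzero(profile == pid)
        good = idx[~np.isnan(depth[idx])]
        if good.size < 2:
            continue  # not enough valid points to interpolate; leave as-is
        order = np.argsort(time[good])  # np.interp needs increasing x
        filled[idx] = np.interp(time[idx], time[good][order], depth[good][order])

    # Assumption: "down" means non-negative depths, "up" means non-positive.
    filled = np.abs(filled) if depth_positive == "down" else -np.abs(filled)

    # Assumption: a bin is labelled by its lower edge (e.g. 50 for 50-55).
    depth_bin = (np.floor(filled / bin_size) * bin_size).astype("float32")
    return filled, depth_bin
```

Note that `np.interp` holds the edge values constant outside the range of valid samples, so leading or trailing gaps in a profile are filled rather than extrapolated.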
- src.toolbox.utils.alignment.aggregate_vars(ds: xarray.Dataset, vars_to_aggregate, profile_dim='PROFILE_NUMBER', bin_dim='DEPTH_bin') -> xarray.Dataset
Compute medians per (PROFILE_NUMBER, DEPTH_bin) and return a NEW dataset with variables named ‘median_{var}’, each shaped (PROFILE_NUMBER, DEPTH_bin).
Notes:
- This does NOT attach medians back to the raw dataset (which would broadcast onto N_MEASUREMENTS). Use the returned dataset for downstream steps.
- Requires that both PROFILE_NUMBER and DEPTH_bin are coordinates aligned to the measurement dimension (typically ‘N_MEASUREMENTS’).
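To illustrate the shape of the result, the per-(profile, bin) median can be sketched in NumPy as a 2-D grid. The helper name `median_grid` is hypothetical, and skipping NaN measurements is an assumption; the real function works on an xarray.Dataset and may differ in both respects.

```python
import numpy as np

def median_grid(values, profile, depth_bin):
    """Hypothetical helper: median of `values` for every (profile, bin) cell."""
    values = np.asarray(values, dtype=float)
    profile = np.asarray(profile)
    depth_bin = np.asarray(depth_bin, dtype=float)

    profiles = np.unique(profile)
    bins = np.unique(depth_bin[~np.isnan(depth_bin)])
    grid = np.full((profiles.size, bins.size), np.nan)

    for i, p in enumerate(profiles):
        for j, b in enumerate(bins):
            cell = values[(profile == p) & (depth_bin == b)]
            cell = cell[~np.isnan(cell)]  # assumption: ignore missing samples
            if cell.size:
                grid[i, j] = np.median(cell)
    # Cells with no data stay NaN, so the grid is always (n_profiles, n_bins).
    return profiles, bins, grid
```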
- src.toolbox.utils.alignment.filter_xarray_by_profile_ids(ds: xarray.Dataset, profile_id_var: str, valid_ids: numpy.ndarray | list) -> xarray.Dataset
Filter an xarray.Dataset to keep only rows / entries with profile IDs in valid_ids. Works for both raw (with N_MEASUREMENTS) and aggregated (PROFILE_NUMBER as a dim) datasets.
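One way to handle both layouts with the same code is to look up which dimension the profile-ID variable spans and index along it. The sketch below uses a hypothetical helper, `keep_profiles`, and assumes the ID variable is one-dimensional; it is not necessarily how the module does it.

```python
import numpy as np
import xarray as xr

def keep_profiles(ds, profile_id_var, valid_ids):
    """Hypothetical helper: keep entries whose profile ID is in `valid_ids`."""
    ids = ds[profile_id_var]
    (dim,) = ids.dims  # assumption: the ID variable is 1-D
    mask = ids.isin(valid_ids)
    # Integer positions work whether `dim` is N_MEASUREMENTS or PROFILE_NUMBER.
    return ds.isel({dim: np.flatnonzero(mask.values)})

# Raw layout: PROFILE_NUMBER is a coordinate on N_MEASUREMENTS.
raw = xr.Dataset(
    {"TEMP": ("N_MEASUREMENTS", [10.0, 11.0, 12.0, 13.0])},
    coords={"PROFILE_NUMBER": ("N_MEASUREMENTS", [1, 1, 2, 3])},
)
raw_kept = keep_profiles(raw, "PROFILE_NUMBER", [1, 3])

# Aggregated layout: PROFILE_NUMBER is itself a dimension.
agg = xr.Dataset(
    {"median_TEMP": (("PROFILE_NUMBER", "DEPTH_bin"), np.arange(6.0).reshape(3, 2))},
    coords={"PROFILE_NUMBER": [1, 2, 3], "DEPTH_bin": [0.0, 5.0]},
)
agg_kept = keep_profiles(agg, "PROFILE_NUMBER", [2])
```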
- src.toolbox.utils.alignment.find_profile_pair_metadata(df_target: pandas.DataFrame, df_ancillary: pandas.DataFrame, target_name: str, ancillary_name: str, time_thresh_hr: float = 2.0, dist_thresh_km: float = 5.0) -> pandas.DataFrame
Identify profile pairs between a target and ancillary glider within time/distance thresholds.
- Parameters:
df_target (pd.DataFrame) – Summary dataframe for the target glider (from summarising_profiles()).
df_ancillary (pd.DataFrame) – Summary dataframe for the ancillary glider.
target_name (str) – Name of the target glider (used in output column names).
ancillary_name (str) – Name of the ancillary glider (used in output column names).
time_thresh_hr (float) – Maximum time difference between paired profiles, in hours (default 2.0).
dist_thresh_km (float) – Maximum distance between paired profiles, in kilometres (default 5.0).
- Returns:
Matched profile pairs with columns:
[target_name]_PROFILE_NUMBER
[ancillary_name]_PROFILE_NUMBER
time_diff_hr
dist_km
- Return type:
pd.DataFrame
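The pairing criterion can be sketched as an all-pairs comparison of time and great-circle distance. Everything here is an assumption for illustration: the helper names, the use of the haversine formula with a 6371 km Earth radius, times given as hours on a common clock, and inclusive thresholds. The column layout of the summary dataframes is not documented here, so the sketch takes plain arrays.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in degrees."""
    lat1, lon1, lat2, lon2 = (np.radians(np.asarray(v, dtype=float))
                              for v in (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

def match_profiles(t_a, lat_a, lon_a, t_b, lat_b, lon_b,
                   time_thresh_hr=2.0, dist_thresh_km=5.0):
    """Hypothetical helper: index pairs (i, j) within both thresholds."""
    t_a, lat_a, lon_a = (np.asarray(v, dtype=float)[:, None] for v in (t_a, lat_a, lon_a))
    t_b, lat_b, lon_b = (np.asarray(v, dtype=float)[None, :] for v in (t_b, lat_b, lon_b))
    dt = np.abs(t_a - t_b)                           # shape (n_a, n_b), hours
    dist = haversine_km(lat_a, lon_a, lat_b, lon_b)  # shape (n_a, n_b), km
    i, j = np.nonzero((dt <= time_thresh_hr) & (dist <= dist_thresh_km))
    return i, j, dt[i, j], dist[i, j]
```

The all-pairs matrices cost O(n_a × n_b) memory, which is usually acceptable for per-profile summaries but not for raw measurements.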
- src.toolbox.utils.alignment.merge_pairs_from_filtered_aggregates(paired_df, agg_target: xarray.Dataset, agg_anc: xarray.Dataset, target_name: str, ancillary_name: str, variables, bin_dim: str = 'DEPTH_bin', pair_dim: str = 'PAIR_INDEX') -> xarray.Dataset
Build a dataset with one row per (target_profile, ancillary_profile, depth_bin). Each row has: target/ancillary profile IDs, time/distance diffs, and median_{var} for both sides.
- src.toolbox.utils.alignment.major_axis_r2_xr(x: xarray.DataArray, y: xarray.DataArray) -> float
Compute R² (coefficient of determination) for Major Axis (Type II) regression using xarray.
- Parameters:
x (xr.DataArray) – First variable (e.g., target glider).
y (xr.DataArray) – Second variable (e.g., ancillary glider).
- Returns:
R² value, or NaN if fewer than 2 valid observations.
- Return type:
float
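For a straight-line fit between two variables, R² is commonly taken as the squared Pearson correlation, which is symmetric in x and y. The sketch below assumes that definition and assumes that non-finite values are dropped pairwise; the helper name `pearson_r2` is hypothetical, and the module's function may compute R² differently.

```python
import numpy as np

def pearson_r2(x, y):
    """Hypothetical helper: squared Pearson correlation over finite pairs."""
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    ok = np.isfinite(x) & np.isfinite(y)  # assumption: drop pairs with any NaN
    if ok.sum() < 2:
        return float("nan")  # matches the documented "fewer than 2" behaviour
    r = np.corrcoef(x[ok], y[ok])[0, 1]
    return float(r ** 2)
```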
- src.toolbox.utils.alignment.compute_r2_for_merged_profiles_xr(ds: xarray.Dataset, variables: list[str], target_name: str, ancillary_name: str) -> xarray.Dataset
Compute R² for each profile-pair in an xarray.Dataset, and append the results directly to the dataset.
- Parameters:
ds (xr.Dataset) – Dataset with one row per (PAIR_INDEX, DEPTH_bin), and aligned variables for target and ancillary gliders.
variables (list of str) – List of variable base names (e.g., ["salinity", "temperature"]).
target_name (str) – Name of the target glider (used in suffix: _TARGET_{name}).
ancillary_name (str) – Name of the ancillary glider (used in suffix: _{name}).
- Returns:
Same dataset with new variables: r2_{var}_{ancillary_name}, one per profile pair. These will be aligned with the “PAIR_INDEX” dimension only.
- Return type:
xr.Dataset
- src.toolbox.utils.alignment.plot_r2_heatmaps_per_pair(r2_datasets, variables, target_name=None, r2_thresholds=[0.99, 0.95, 0.9, 0.85, 0.8, 0.75, 0.7], time_thresh_hr=None, dist_thresh_km=None, figsize=(9, 6), save_plots=False, output_path=None, show_plots=True)
Create ONE heatmap per ancillary pairing showing counts of unique ancillary profiles that meet R² thresholds for each variable. Optionally filter by time/dist thresholds.
- src.toolbox.utils.alignment.collect_xy_from_r2_ds(r2_ds, var, target_name, ancillary_name, r2_min=None, time_max=None, dist_max=None)
From an R² dataset (one ancillary), collect all (x=ancillary, y=target) binned points for a given variable across selected profile pairs, flattening PAIR_INDEX×DEPTH_bin.
Returns: x, y (1D np.ndarrays)
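The select-then-flatten step can be sketched on plain arrays shaped (PAIR_INDEX, DEPTH_bin). The helper name `flatten_selected_pairs` and its array arguments are hypothetical, and dropping points where either side is non-finite is an assumption rather than documented behaviour.

```python
import numpy as np

def flatten_selected_pairs(x2d, y2d, r2=None, time_diff=None, dist=None,
                           r2_min=None, time_max=None, dist_max=None):
    """Hypothetical helper: filter rows (pairs), then flatten pairs x bins."""
    x2d = np.asarray(x2d, dtype=float)
    y2d = np.asarray(y2d, dtype=float)
    keep = np.ones(x2d.shape[0], dtype=bool)  # one flag per profile pair

    # Each filter is optional and applies to per-pair values.
    if r2_min is not None:
        keep &= np.asarray(r2, dtype=float) >= r2_min
    if time_max is not None:
        keep &= np.asarray(time_diff, dtype=float) <= time_max
    if dist_max is not None:
        keep &= np.asarray(dist, dtype=float) <= dist_max

    x = x2d[keep].ravel()
    y = y2d[keep].ravel()
    ok = np.isfinite(x) & np.isfinite(y)  # assumption: drop incomplete points
    return x[ok], y[ok]
```

A pair whose R² is NaN fails the `>=` comparison, so it is excluded whenever `r2_min` is set.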
- src.toolbox.utils.alignment.fit_linear_map(x, y)
Fit y = slope * x + intercept using sklearn LinearRegression. Returns dict with slope, intercept, R² (model score), and n.
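As a rough sketch of the same fit without the scikit-learn dependency, `np.polyfit` with degree 1 gives the ordinary least-squares slope and intercept, and R² is computed as 1 − SS_res/SS_tot, which is the quantity `LinearRegression.score` reports. The helper names and the dictionary keys below are assumptions; the module's own function uses sklearn.

```python
import numpy as np

def fit_line(x, y):
    """Hypothetical helper: least-squares fit of y = slope * x + intercept."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)  # stand-in for LinearRegression
    resid = y - (slope * x + intercept)
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    # Key names are illustrative only.
    return {"slope": float(slope), "intercept": float(intercept),
            "r2": float(r2), "n": int(x.size)}

def apply_line(values, slope, intercept):
    """Apply the fitted map elementwise (also works on an xarray.DataArray)."""
    return slope * values + intercept

fit = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
corrected = apply_line(np.array([10.0, 20.0]), fit["slope"], fit["intercept"])
```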
- src.toolbox.utils.alignment.plot_pair_scatter_grid(r2_datasets, variables, target_name, variable_r2_criteria=None, max_time_hr=None, max_dist_km=None, ancillaries_order=None, figsize=None, point_alpha=0.6, point_size=8, equal_axes=True)
Grid of scatter plots: rows = ancillary, cols = variables. Each panel: ancillary median_{var} vs target median_{var}, all depth bins from pairs that pass R²/time/distance filters. Plots 1:1 and fitted line.