src.toolbox.utils.diagnostics

A module for diagnostic plotting and data summarization.

Functions

plot_time_series(data, x_var, y_var[, title, xlabel, ...])

Generates a time series plot for xarray data.

plot_histogram(data, var[, bins, title, xlabel])

Generates a histogram for a given variable in xarray data.

plot_boxplot(data, var[, title, xlabel])

Generates a box plot for a given variable in xarray data.

plot_correlation_matrix(data[, variables, title])

Generates a heatmap of the correlation matrix for xarray data.

generate_info(data)

Generate info for a given dataset.

check_missing_values(data)

Check for missing values in the dataset.

summarising_profiles(→ pandas.DataFrame)

Summarise profiles from an xarray Dataset by computing medians of TIME, LATITUDE, LONGITUDE grouped by PROFILE_NUMBER.

find_closest_prof(→ pandas.DataFrame)

For each profile in df_a, find the closest profile in df_b based on time, and calculate spatial distance to it.

plot_distance_time_grid(summaries[, output_path, ...])

Plot a grid of distance-over-time plots for all glider pair combinations.

find_candidate_glider_pairs(→ pandas.DataFrame)

Vectorised version: match glider A profiles to glider B profiles within time and space thresholds.

plot_heatmap_glider_df(ax, matchup_df, time_bins, ...)

Plot cumulative 2D histogram of time/distance matchups for a glider pair on a given axis.

plot_glider_pair_heatmap_grid(summaries, time_bins, ...)

Generate an NxN grid of cumulative heatmaps for all glider pair combinations.

Module Contents

src.toolbox.utils.diagnostics.plot_time_series(data, x_var, y_var, title='Time Series Plot', xlabel=None, ylabel=None, **kwargs)

Generates a time series plot for xarray data.

src.toolbox.utils.diagnostics.plot_histogram(data, var, bins=30, title='Histogram', xlabel=None, **kwargs)

Generates a histogram for a given variable in xarray data.

src.toolbox.utils.diagnostics.plot_boxplot(data, var, title='Box Plot', xlabel=None, **kwargs)

Generates a box plot for a given variable in xarray data.

src.toolbox.utils.diagnostics.plot_correlation_matrix(data, variables=None, title='Correlation Matrix', **kwargs)

Generates a heatmap of the correlation matrix for xarray data.

src.toolbox.utils.diagnostics.generate_info(data)

Generate info for a given dataset.

src.toolbox.utils.diagnostics.check_missing_values(data)

Check for missing values in the dataset.

src.toolbox.utils.diagnostics.summarising_profiles(ds: xarray.Dataset, source_name: str) → pandas.DataFrame

Summarise profiles from an xarray Dataset by computing medians of TIME, LATITUDE, LONGITUDE grouped by PROFILE_NUMBER. Handles datetime median safely using pandas.

Parameters:
  • ds (xr.Dataset) – Input dataset with PROFILE_NUMBER as a coordinate.

  • source_name (str) – Name of the glider/source to include in output.

Returns:

Profile-level summary DataFrame.

Return type:

pd.DataFrame
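The per-profile median described above can be sketched as follows. This is a minimal illustration, not the module's implementation: it starts from a flat pandas DataFrame rather than an xarray Dataset, and the function name `summarise_profiles_sketch` and the output column names (`glider_name`, `median_time`, `median_latitude`, `median_longitude`) are assumptions made for the example. Times are converted to elapsed seconds before taking the median, which is one way to avoid taking a median directly on datetime values.

```python
import pandas as pd


def summarise_profiles_sketch(df: pd.DataFrame, source_name: str) -> pd.DataFrame:
    """Median TIME, LATITUDE and LONGITUDE per PROFILE_NUMBER (illustrative only)."""
    work = df[["PROFILE_NUMBER", "TIME", "LATITUDE", "LONGITUDE"]].copy()

    # Express times as seconds since the earliest sample, so the median is
    # computed on plain floats instead of datetime64 values.
    t0 = work["TIME"].min()
    work["elapsed_s"] = (work["TIME"] - t0).dt.total_seconds()

    med = work.groupby("PROFILE_NUMBER")[["elapsed_s", "LATITUDE", "LONGITUDE"]].median()

    # Convert the median offset back to an absolute timestamp.
    med["median_time"] = t0 + pd.to_timedelta(med["elapsed_s"], unit="s")

    out = (
        med.drop(columns="elapsed_s")
        .rename(columns={"LATITUDE": "median_latitude", "LONGITUDE": "median_longitude"})
        .reset_index()
    )
    # Hypothetical column name for the source label.
    out.insert(0, "glider_name", source_name)
    return out


samples = pd.DataFrame(
    {
        "PROFILE_NUMBER": [1, 1, 1, 2, 2, 2],
        "TIME": pd.to_datetime(
            [
                "2024-01-01 00:00", "2024-01-01 00:10", "2024-01-01 00:20",
                "2024-01-01 01:00", "2024-01-01 01:10", "2024-01-01 01:20",
            ]
        ),
        "LATITUDE": [60.0, 60.1, 60.2, 61.0, 61.1, 61.2],
        "LONGITUDE": [5.0, 5.1, 5.2, 6.0, 6.1, 6.2],
    }
)
summary = summarise_profiles_sketch(samples, "glider_a")
```

With an xarray Dataset, a similar table could be obtained by first calling `ds[["TIME", "LATITUDE", "LONGITUDE"]].to_dataframe()` and resetting the index, assuming PROFILE_NUMBER is available as a coordinate.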

src.toolbox.utils.diagnostics.find_closest_prof(df_a: pandas.DataFrame, df_b: pandas.DataFrame) → pandas.DataFrame

For each profile in df_a, find the closest profile in df_b based on time, and calculate spatial distance to it.

Parameters:
  • df_a (pd.DataFrame) – Summary dataframe for glider A (reference).

  • df_b (pd.DataFrame) – Summary dataframe for glider B (comparison).

Returns:

df_a with additional columns:
  • closest_glider_b_profile

  • glider_b_time_diff

  • glider_b_distance_km

Return type:

pd.DataFrame
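A nearest-in-time match followed by a distance calculation might look like the sketch below. It is an assumption-laden illustration rather than the module's code: the input column names (`profile_id`, `median_time`, `median_latitude`, `median_longitude`) are hypothetical, the time difference is reported as an absolute value, and distance is computed with the haversine formula on a spherical Earth. Only the three output column names are taken from the documentation above.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in degrees."""
    lat1, lon1, lat2, lon2 = (np.radians(v) for v in (lat1, lon1, lat2, lon2))
    a = (
        np.sin((lat2 - lat1) / 2) ** 2
        + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def closest_profile_sketch(df_a: pd.DataFrame, df_b: pd.DataFrame) -> pd.DataFrame:
    """Attach the time-nearest profile of B to every profile of A."""
    # merge_asof requires both frames to be sorted by the join key.
    a = df_a.sort_values("median_time").reset_index(drop=True)
    b = (
        df_b[["median_time", "profile_id", "median_latitude", "median_longitude"]]
        .sort_values("median_time")
        .rename(
            columns={
                "median_time": "b_time",
                "profile_id": "closest_glider_b_profile",
                "median_latitude": "b_lat",
                "median_longitude": "b_lon",
            }
        )
        .reset_index(drop=True)
    )

    merged = pd.merge_asof(
        a, b, left_on="median_time", right_on="b_time", direction="nearest"
    )
    merged["glider_b_time_diff"] = (merged["median_time"] - merged["b_time"]).abs()
    merged["glider_b_distance_km"] = haversine_km(
        merged["median_latitude"], merged["median_longitude"],
        merged["b_lat"], merged["b_lon"],
    )
    return merged.drop(columns=["b_time", "b_lat", "b_lon"])


df_a = pd.DataFrame(
    {
        "profile_id": [1, 2],
        "median_time": pd.to_datetime(["2024-01-01 00:00", "2024-01-01 02:00"]),
        "median_latitude": [60.0, 60.0],
        "median_longitude": [5.0, 5.0],
    }
)
df_b = pd.DataFrame(
    {
        "profile_id": [10, 11],
        "median_time": pd.to_datetime(["2024-01-01 00:30", "2024-01-01 01:50"]),
        "median_latitude": [60.0, 61.0],
        "median_longitude": [5.0, 5.0],
    }
)
matched = closest_profile_sketch(df_a, df_b)
```

Matching on time first and measuring distance afterwards means the returned profile is the closest in time, which is not necessarily the closest in space.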

src.toolbox.utils.diagnostics.plot_distance_time_grid(summaries: Dict[str, pandas.DataFrame], output_path: str = None, show: bool = True, figsize: tuple = (16, 16))

Plot a grid of distance-over-time plots for all glider pair combinations.

Parameters:
  • summaries (dict) – Dictionary of {glider_name: pd.DataFrame} from summarising_profiles().

  • output_path (str, optional) – If provided, the grid will be saved to this path.

  • show (bool) – If True, plt.show() will be called.

  • figsize (tuple) – Size of the full figure.

src.toolbox.utils.diagnostics.find_candidate_glider_pairs(df_a: pandas.DataFrame, df_b: pandas.DataFrame, glider_a_name: str, glider_b_name: str, time_thresh_hr: float = 2.0, dist_thresh_km: float = 5.0) → pandas.DataFrame

Vectorised version: match glider A profiles to glider B profiles within time and space thresholds. Returns one match per glider A profile (closest B match within threshold).
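One way to vectorise this kind of thresholded matching is to build full A-by-B matrices of time difference and distance with NumPy broadcasting, as in the sketch below. This is an illustration only: the column names are hypothetical, the glider name arguments are omitted, and "closest" is assumed to mean smallest spatial distance among the B profiles that pass both thresholds.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km; inputs in degrees, broadcastable arrays."""
    lat1, lon1, lat2, lon2 = (np.radians(v) for v in (lat1, lon1, lat2, lon2))
    a = (
        np.sin((lat2 - lat1) / 2) ** 2
        + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def candidate_pairs_sketch(df_a, df_b, time_thresh_hr=2.0, dist_thresh_km=5.0):
    """Return at most one B match per A profile, within both thresholds."""
    # Column vectors for A and row vectors for B broadcast to (n_a, n_b).
    t_a = df_a["median_time"].to_numpy()[:, None]
    t_b = df_b["median_time"].to_numpy()[None, :]
    dt_hr = np.abs(t_a - t_b) / np.timedelta64(1, "h")

    dist_km = haversine_km(
        df_a["median_latitude"].to_numpy()[:, None],
        df_a["median_longitude"].to_numpy()[:, None],
        df_b["median_latitude"].to_numpy()[None, :],
        df_b["median_longitude"].to_numpy()[None, :],
    )

    ok = (dt_hr <= time_thresh_hr) & (dist_km <= dist_thresh_km)
    # Pairs outside the thresholds get infinite distance so argmin ignores them.
    best = np.where(ok, dist_km, np.inf).argmin(axis=1)
    rows = np.flatnonzero(ok.any(axis=1))  # A profiles with at least one match

    return pd.DataFrame(
        {
            "glider_a_profile": df_a["profile_id"].to_numpy()[rows],
            "glider_b_profile": df_b["profile_id"].to_numpy()[best[rows]],
            "time_diff_hr": dt_hr[rows, best[rows]],
            "dist_km": dist_km[rows, best[rows]],
        }
    )


glider_a = pd.DataFrame(
    {
        "profile_id": [1, 2],
        "median_time": pd.to_datetime(["2024-01-01 00:00", "2024-01-01 06:00"]),
        "median_latitude": [60.0, 60.0],
        "median_longitude": [5.0, 5.0],
    }
)
glider_b = pd.DataFrame(
    {
        "profile_id": [7, 8],
        "median_time": pd.to_datetime(["2024-01-01 00:30", "2024-01-01 01:00"]),
        "median_latitude": [60.01, 60.5],
        "median_longitude": [5.0, 5.0],
    }
)
pairs = candidate_pairs_sketch(glider_a, glider_b)
```

The full matrices cost memory proportional to the product of the two profile counts, so this approach suits summaries with up to a few thousand profiles per glider.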

src.toolbox.utils.diagnostics.plot_heatmap_glider_df(ax, matchup_df: pandas.DataFrame, time_bins: numpy.ndarray, dist_bins: numpy.ndarray, glider_a_name: str, glider_b_name: str, i: int, j: int, grid_size: int)

Plot cumulative 2D histogram of time/distance matchups for a glider pair on a given axis.
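The counts behind a cumulative 2D histogram can be sketched with NumPy alone. The example assumes "cumulative" means that each cell counts the matchups falling at or below that cell's time and distance bins; the function name and the plain-array inputs are hypothetical, and plotting is left out.

```python
import numpy as np


def cumulative_matchup_counts(time_diff_hr, dist_km, time_bins, dist_bins):
    """Cumulative 2D counts over (time difference, distance) bins."""
    counts, _, _ = np.histogram2d(time_diff_hr, dist_km, bins=[time_bins, dist_bins])
    # Accumulate along the time axis, then along the distance axis.
    return counts.cumsum(axis=0).cumsum(axis=1)


time_diff_hr = np.array([0.5, 1.5, 2.5])
dist_km = np.array([1.0, 3.0, 9.0])
time_bins = np.array([0.0, 1.0, 2.0, 3.0])
dist_bins = np.array([0.0, 2.0, 5.0, 10.0])

cumulative = cumulative_matchup_counts(time_diff_hr, dist_km, time_bins, dist_bins)
```

An array like this could then be passed to a Matplotlib call such as `ax.pcolormesh(time_bins, dist_bins, cumulative.T)` to draw the heatmap on a given axis.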

src.toolbox.utils.diagnostics.plot_glider_pair_heatmap_grid(summaries: Dict[str, pandas.DataFrame], time_bins: numpy.ndarray, dist_bins: numpy.ndarray, output_path: str | None = None, show: bool = True, figsize: tuple = (16, 16))

Generate an NxN grid of cumulative heatmaps for all glider pair combinations.
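The mapping from glider names to positions in an NxN grid can be illustrated with the standard library. The helper below is hypothetical and only enumerates cells; whether the actual function draws, blanks, or labels the diagonal (a glider paired with itself) is not stated in the documentation, so the sketch simply flags those cells.

```python
from itertools import product


def grid_cells(glider_names):
    """Yield (row, col, glider_a, glider_b, is_diagonal) for every grid cell."""
    indexed = list(enumerate(glider_names))
    for (i, name_a), (j, name_b) in product(indexed, repeat=2):
        yield i, j, name_a, name_b, i == j


names = ["glider_a", "glider_b", "glider_c"]
cells = list(grid_cells(names))
off_diagonal = [cell for cell in cells if not cell[4]]
```

Each tuple carries the row and column indices that a plotting loop would use to select an axis from an N-by-N array of subplots.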