casys.computation

Main diagnostic specification and computation classes.

Functions

check_periodogram_params(*, nbr_periods[, ...])

Check the periodogram diagnostic parameters.

compute_individual_psd(sp_type, sp_kwargs, ...)

Individual PSD computation.

normalize_pixels(*[, pixels])

Check and normalize the provided pixels set.

normalize_spectral_params(*[, spectral_conf])

Normalize the spectral kwargs dictionary.

normalize_stats(stats[, default])

Normalize the provided statistics list.

psd_segments_reduction(*, wn, ...)

Reduce computed PSD across segments.

Classes

CommonData(*, source[, source_type, ...])

Common data container providing access to a data source and the ability to define and compute diagnostics.

NadirData(source, *[, source_type, ...])

NadirData is a data container providing access to a data source and the ability to define and compute diagnostics.

SwathData(source, *[, source_type, ...])

SwathData is a data container providing access to a data source and the ability to define and compute diagnostics.

class casys.computation.NadirData(source: CasysReader, *, source_type: str | None = None, diag_overwrite: bool = False, time_extension: bool = False, date_start: DateType | None = None, date_end: DateType | None = None, select_clip: str | None = None, select_shape: str | gpd.GeoDataFrame | shg.Polygon | None = None, data_cleaner: DataCleaner | None = None, orf: PassIndexer | str | None = None, reference_track: ReferenceTrackType | None = None, time: str | None = 'time', longitude: str | None = 'LONGITUDE', latitude: str | None = 'LATITUDE', longitude_nadir: str | None = 'longitude_nadir', latitude_nadir: str | None = 'latitude_nadir', cycle_number: str | None = 'CYCLE_NUMBER', pass_number: str | None = 'PASS_NUMBER', cross_track_distance: str | None = 'cross_track_distance', swath_lines: str | None = 'num_lines', swath_pixels: str | None = 'num_pixels')

Bases: CommonData

NadirData is a data container providing access to a data source and the ability to define and compute diagnostics.

Parameters:
  • source – Input source (name of the table if using OCTANT storage).

  • date_start – Starting date of the period of interest.

  • date_end – Ending date of the period of interest.

  • select_clip – Selection clip allowing to work on a subset of the source’s data.

  • select_shape – Shape file, GeoDataFrame or Geometry on which to limit source’s data.

  • orf – Path or name of the orf.

  • reference_track

    Setting this parameter enables source’s data interpolation on this reference track. Every diagnostic is then computed using these interpolated data.

    File path or data of the reference track on which to interpolate read data. A list of existing theoretical reference tracks can be shown using the show_theoretical_tracks method:

    >>> CommonData.show_theoretical_tracks()
    

    Standard along track data (orbits) can be provided as well.

    This parameter can be provided as a dictionary containing ‘data’, ‘path’ and ‘coordinates’ keys.

  • time – The time field. (if not provided, default is “time” field)

  • latitude – The latitude field. (if not provided, default is “LATITUDE” field)

  • longitude – The longitude field. (if not provided, default is “LONGITUDE” field)

  • cycle_number – Cycle number’s field. (if not provided, default is “CYCLE_NUMBER” field)

  • pass_number – Pass number’s field. (if not provided, default is “PASS_NUMBER” field)

  • diag_overwrite

    Define the behavior when adding a diagnostic with an already used name:

    • [default] False: raise an error

    • True: remove the old diagnostic and add the new one

  • time_extension – Whether to allow the extension of user defined time interval for specific diagnostic requirements or not.

  • source_type – Input source type.

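A minimal construction sketch (assuming reader is an existing CasysReader and that dates may be passed as ISO date strings; the ORF name is hypothetical):

>>> from casys.computation import NadirData
>>> ad = NadirData(
...     reader,
...     date_start="2020-01-01",
...     date_end="2020-01-11",
...     orf="MY_ORF",  # hypothetical ORF name
... )
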
KNOWN_PARAMETERS: dict[str, tuple[Any, Any]] = {'diag_overwrite': ('bool', False, _ParameterKind.KEYWORD_ONLY), 'source': ('CasysReader', <class 'inspect._empty'>, _ParameterKind.POSITIONAL_OR_KEYWORD), 'source_type': ('Optional[str]', None, _ParameterKind.KEYWORD_ONLY), 'time_extension': ('bool', False, _ParameterKind.KEYWORD_ONLY)}
add_binned_stat(name, field, x, res_x, stats=None, stat_selection=None)

Add a binned diagnostic computing requested statistics inside bands defined by values of the x parameter according to its resolution.

Parameters:
  • name (str) – Name of the diagnostic.

  • field (Field) – Field on which to compute statistics.

  • x (Field) – Field to use for the x-axis.

  • stats (list[StatType | str] | str | None) – List of statistics to compute (count, max, mean, median, min, std, var, mad)

  • res_x (Union[tuple[float, float, float], DataResolution, str]) – Min, max and width for the x-axis.

  • stat_selection (str | None) –

    Selection clip used to invalidate (set to NaN) some bins. Valid conditions are:

    • count

    • min

    • max

    • mean

    • median

    • std

    • var

    • mad

    These clips are Python vector clips. Examples:

    • count :>= 10 && max :< 100

    • min :> 3

    • median :> 10 && mean :> 9
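
A usage sketch (field names are illustrative; Field objects are taken from the container's fields property):

>>> swh = ad.fields["SWH"]  # hypothetical field names
>>> lat = ad.fields["LATITUDE"]
>>> ad.add_binned_stat(
...     name="swh_by_lat",
...     field=swh,
...     x=lat,
...     res_x=(-90, 90, 2),
...     stats=["count", "mean", "std"],
...     stat_selection="count :>= 10",
... )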

add_binned_stat_2d(name, field, x, res_x, y, res_y, stats=None, stat_selection=None)

Add a 2D binned diagnostic computing requested statistics inside boxes defined by values of the x and y parameters according to their respective resolutions.

Binned 2d data and plots can be accessed or created using special keywords:

  • plot=”box” (default): color mesh representation, on an x-axis/y-axis grid.

  • plot=”curve”:

    • axis=”x”: along x-axis representation of each y-field bin

    • axis=”y”: along y-axis representation of each x-field bin

  • plot=”3d”: 3d color mesh representation, on an x-axis/y-axis/z-axis 3d grid.

  • plot=”box3d”: 3d bins surfaces representation, on an x-axis/y-axis/z-axis 3d grid.

Parameters:
  • name (str) – Name of the diagnostic.

  • field (Field) – Field on which to compute statistics.

  • x (Field) – Field to use for the x-axis.

  • y (Field) – Field to use for the y-axis.

  • res_x (Union[tuple[float, float, float], DataResolution, str]) – Min, max and width for the x-axis.

  • res_y (Union[tuple[float, float, float], DataResolution, str]) – Min, max and width for the y-axis.

  • stats (list[StatType | str] | None) – List of statistics to compute (count, max, mean, median, min, std, var, mad)

  • stat_selection (str | None) –

    Selection clip used to invalidate some bins. Valid conditions are:

    • count

    • min

    • max

    • mean

    • median

    • std

    • var

    • mad

    These clips are Python vector clips. Examples:

    • count :>= 10 && max :< 100

    • min :> 3

    • median :> 10 && mean :> 9
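
A usage sketch (hypothetical field names):

>>> ad.add_binned_stat_2d(
...     name="swh_lon_lat",
...     field=ad.fields["SWH"],  # hypothetical field names
...     x=ad.fields["LONGITUDE"],
...     res_x=(-180, 180, 4),
...     y=ad.fields["LATITUDE"],
...     res_y=(-90, 90, 4),
...     stats=["mean", "count"],
... )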

add_crossover_stat(name, field, max_time_difference, data=None, interp_mode='linear', spline_half_window_size=3, jump_threshold=2.0, stats=None, res_lon=(-180, 180, 4), res_lat=(-90, 90, 4), box_selection=None, geobox_stats=None, temporal_stats_freq=None, temporal_stats=None, temporal_freq_kwargs=None, computation_split=None, **kwargs)

Add the computation of the difference between the ascending and descending arc values at crossover points. Temporal statistics (by cycle or day) can be added to the computation.

Values and time deltas are computed at each crossover point, and the requested statistics are computed for each geographical box. These data are accessible using the requested statistic’s name, or the ‘crossover’ and ‘value’ keywords for the time delta and the values at each crossover point. The time delta (accessible using the ‘crossover’ keyword) might contain more points than the actual field statistics if the field is not defined at some crossover points.

Crossovers data and plots can be accessed or created using special keywords:

  • delta parameter: cartographic representation of the difference between the two arcs

    • delta=”field”: difference of the field values

    • delta=”time”: difference of the time values

  • stat parameter: geographical box or temporal statistic representation.

    • stat=”…”: requested statistic

  • freq parameter: temporal statistic representation.

    • freq=”…”: frequency of the requested statistic

Parameters:
  • name (str) – Name of the diagnostic.

  • field (Field) – Field for which to compute the statistic.

  • data (NadirData) – External data (NadirData) to compute crossovers with. This option is used to compute multi-missions crossovers.

  • max_time_difference (str) – Maximum delta of time between the two arcs as a string with its unit. Any string accepted by pandas.Timedelta is valid, e.g. ‘10 days’, ‘6 hours’, ‘1 min’, …

  • interp_mode (str) – Interpolation mode used to compute the field value at the crossover position. Any value accepted by the ‘kind’ option of scipy.interpolate.interp1d is valid, i.e. ‘linear’, ‘nearest’, ‘previous’, ‘next’, ‘zero’, ‘slinear’, ‘quadratic’ or ‘cubic’ (‘zero’, ‘slinear’, ‘quadratic’ and ‘cubic’ refer to interpolation splines). ‘smooth’ is also valid and uses a smoothing spline from scipy.interpolate.UnivariateSpline. A noise level in the signal may be specified in the form ‘smooth[0.05]’; the smoothing factor (the s parameter of UnivariateSpline) is then computed as smoothing_factor = noise_level^2 * number_of_points. The s parameter roughly represents the distance between the spline and the points on the window; in particular, with s=0 we have an interpolation spline, which is not suitable for a noisy signal. ‘smooth’ alone uses the default value of the s parameter (not recommended). The ‘smooth’ interpolation mode requires at least three valid values on both sides of the intersection point.

  • spline_half_window_size (int) – Half window size of the spline.

  • jump_threshold (float) – This parameter sets the tolerance level of the jumps (holes) in the input data. By definition, a jump is detected between (consecutive) P1 and P2 if dist(P1, P2) > jump_threshold * MEDIAN where MEDIAN is the median of the distance between all consecutive points. For example, to avoid having crossovers inside holes of one measure or more, 1.9 is a suitable value.

  • stats (list[StatType | str] | None) – List of statistics to compute (count, max, mean, median, min, std, var, mad) for temporal and geobox statistics.

  • res_lon (Union[tuple[float, float, float], DataResolution, str, None]) – Minimum, maximum and box size over the longitude (Default: -180, 180, 4).

  • res_lat (Union[tuple[float, float, float], DataResolution, str, None]) – Minimum, maximum and box size over the latitude (Default: -90, 90, 4).

  • box_selection (Field | None) – Field used as selection for computation of the count statistic. Boxes in which the box_selection field does not contain any data are set to NaN instead of 0.

  • geobox_stats (list[StatType | str] | None) – Statistics included in the geobox diagnostic.

  • temporal_stats_freq (list[str | FreqType | FrequencyHandler] | None) – List of temporal statistics frequencies to compute.

  • temporal_stats (list[StatType | str] | None) – Statistics included in the temporal diagnostic.

  • temporal_freq_kwargs (dict[str, Any]) – Additional parameters to pass to the underlying pandas.date_range function.

  • computation_split (str | FreqType | FrequencyHandler | None) – Split frequency (day, pass, cycle or any pandas offset aliases) inside which crossovers will be computed. Providing None (default) will compute crossovers over the whole data.

Raises:

AltiDataError – If a data already exists with the provided name.
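
A usage sketch (the field name is hypothetical; max_time_difference uses the pandas.Timedelta string format):

>>> ad.add_crossover_stat(
...     name="ssh_xover",
...     field=ad.fields["SSH"],  # hypothetical field name
...     max_time_difference="10 days",
...     interp_mode="linear",
...     stats=["count", "mean", "std"],
...     temporal_stats_freq=["cycle"],
... )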

add_editing(name, field, components)

Add the computation of a new editing field. Editing components are applied one after the other in the given order.

After computation, the generated field will be available as raw data and can be used in other statistics.

Editing data and plots can be accessed or created using special keywords:

  • plot parameter:

    • plot=”time” (default): graphical along time data

    • plot=”map”: along track data

    • plot=”editing”: data invalidated by each editing component

  • group parameter:

    • group=”all” (default): data from all components

    • group=”valid”: data limited to unedited values

    • group=”name”: data invalidated by the provided component

    • group=[“name1”, “name2”]: data invalidated by the provided components

Parameters:
  • name (str) – Name of the diagnostic.

  • field (Field) – Resulting editing field.

  • components (list[EditingComponent]) – Ordered list of editing components.

add_geobox_stat(name, field, stats=None, res_lon=(-180, 180, 4), res_lat=(-90, 90, 4), box_selection=None, stat_selection=None, projection=None)

Add a geographical box diagnostic computing requested statistics for the provided field in each box defined by the res_lon and res_lat parameters.

Geobox stat data and plots can be accessed or created using special keywords:

  • plot=”box” (default): color mesh representation, on an x-axis/y-axis grid.

  • plot=”3d”: 3d color mesh representation, on an x-axis/y-axis/z-axis 3d grid.

Parameters:
  • name (str) – Name of the diagnostic.

  • field (Field) – Field on which to compute.

  • stats (list[StatType | str] | None) – List of statistics to compute (count, max, mean, median, min, std, var, mad).

  • res_lon (Union[tuple[float, float, float], DataResolution, str, None]) – Minimum, maximum and box size over the longitude (Default: -180, 180, 4).

  • res_lat (Union[tuple[float, float, float], DataResolution, str, None]) – Minimum, maximum and box size over the latitude (Default: -90, 90, 4).

  • projection (Proj | str | None) – Projection in which to project longitude and latitude values before binning data.

  • box_selection (None | str | Field) – Field used as selection for computation of the count statistic. Boxes in which the box_selection field does not contain any data are set to NaN instead of 0.

  • stat_selection (str | None) –

    Selection clip used to invalidate (set to NaN) some bins. Valid conditions are:

    • count

    • min

    • max

    • mean

    • median

    • std

    • var

    • mad

    These clips are Python vector clips. Examples:

    • count :>= 10 && max :< 100

    • min :> 3

    • median :> 10 && mean :> 9
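
A usage sketch (the field name is hypothetical):

>>> ad.add_geobox_stat(
...     name="ssh_boxes",
...     field=ad.fields["SSH"],  # hypothetical field name
...     stats=["mean", "std"],
...     res_lon=(-180, 180, 2),
...     res_lat=(-90, 90, 2),
...     stat_selection="count :>= 5",
... )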

add_histogram(name, x, res_x='auto')

Add a histogram diagnostic for the provided field computing its values’ distribution according to the res_x parameter.

Parameters:
  • name (str) – Name of the diagnostic.

  • x (Field) – Field used for the x-axis.

  • res_x (Union[tuple[float, float, float], DataResolution, str]) – Min, max and width for the x-axis, or ‘auto’ (Default: ‘auto’). ‘auto’ uses the 2.5th percentile of values as the minimum, the 97.5th percentile as the maximum, and 40 bins in between.
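
A usage sketch (the field name is hypothetical; an explicit resolution is given instead of ‘auto’):

>>> ad.add_histogram("swh_hist", x=ad.fields["SWH"], res_x=(0, 12, 0.25))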

add_missing_points_stat(name, reference_track, theoretical_orf=None, method='real', distance_threshold=4.0, time_gap_reference=None, time_gap_threshold=1.9, geobox_stats=False, res_lon=(-180, 180, 4), res_lat=(-90, 90, 4), temporal_stats_freq=None, temporal_freq_kwargs=None, section_min_lengths=None, group_names=None, group_grid=False, group_converter=None, **kwargs)

Add the computation of missing points.

Missing points data and plots can be accessed or created using special keywords:

  • plot parameter:

    • plot=”map” (default): along track data

    • plot=”temporal”: temporal statistics data

    • plot=”geobox”: geobox statistics data

    • plot=”section_analyses”: section analyses data

  • freq parameter (required for temporal):

    • freq=”…”: frequency of the requested statistic

  • dtype parameter:

    • dtype=”all”: data from missing and available points

    • dtype=”missing” (default): data limited to missing points

    • dtype=”available”: data limited to available points

  • group parameter:

    • group=”global” (default): data from all groups

    • group=”…”: data from the group defined in the group_names parameter.

  • sections parameter (optional for the section analyses plot):

    • sections=”all” (default): data from all sections

    • sections=1: data limited to section 1

    • sections=[1, 2, 10, 14]: data limited to sections 1, 2, 10 and 14

Parameters:
  • name (str) – Name of the diagnostic.

  • reference_track (Union[str, dict[str, Any], Dataset, ReferenceTrack]) –

    File path or data of the reference track. A list of existing theoretical reference tracks can be shown using the show_theoretical_tracks method:

    >>> CommonData.show_theoretical_tracks()
    

    Standard along track data (orbits) can be provided as well.

    This parameter can be provided as a dictionary containing ‘data’, ‘path’ and ‘coordinates’ keys.

  • theoretical_orf (PassIndexer | str | None) – ORF to use to determine the real starting and ending dates of the passes. Using the table’s ORF might not show missing points at the beginning and end of a track, nor points for completely missing tracks. This parameter might not be necessary if using fully defined tracks (not a theoretical track) as reference.

  • method (MissingPointsMethod | str) – Real: the real method uses the time difference between two real measurements to determine the missing points. Theoretical: the theoretical method tries to match each theoretical point to a real point and uses it to determine missing points.

  • distance_threshold (float) – Distance threshold between real and theoretical points expressed as a factor of the distance between two consecutive theoretical points. An exception is raised if the threshold is exceeded.

  • time_gap_reference (str | timedelta64) – Standard time gap between two consecutive real points. Defaults to the reference track’s time gap. If provided as a string, include the unit; otherwise the value is interpreted as nanoseconds.

  • time_gap_threshold (float) – [real method parameter] Time threshold between two consecutive real points expressed as a factor of time_gap_reference to determine that a point is missing. The standard value is 2.0: if two consecutive real points have a time gap of 2.1 * time_gap_reference then a point is considered to be missing.

  • geobox_stats (bool) – Whether to compute geographical box statistics or not.

  • res_lon (Union[tuple[float, float, float], DataResolution, str, None]) – Minimum, maximum and box size over the longitude (Default: -180, 180, 4)

  • res_lat (Union[tuple[float, float, float], DataResolution, str, None]) – Minimum, maximum and box size over the latitude (Default: -90, 90, 4)

  • temporal_stats_freq (list[str | FreqType | FrequencyHandler]) – List of temporal statistics frequencies to compute.

  • temporal_freq_kwargs (dict[str, Any]) – Additional parameters to pass to the underlying pandas.date_range function.

  • section_min_lengths (int | list[int] | None) – List of missing sections minimum length values. Setting this parameter enables the section analyses computation.

  • group_names (dict[str, int]) – Dictionary containing the name of the group associated to its flag value(s). Multiple values can be associated to a name. Example: {“land”: 0, “ocean”: 1, “ice”: 2, “iced_ocean”: [1, 2]}.

  • group_grid (bool | InGridParameters) –

    • If True:

      • Define two groups {“land”: 0, “ocean”: 1} using a 1/30 of a degree bathymetry grid file. This method assumes bathymetry values less than 0 are ocean surface, which is not 100% correct.

    • If an InGridParameters:

      • Define groups using the provided parameters in association with the group_names parameter.

    • If False or None:

      • Does not define groups.

  • group_converter (Callable[[ndarray], ndarray]) – Callable to apply to grid values to transform them into group values. This callable has to take a numpy array as single parameter and return a new one. If set to None, no conversion is made.
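
A usage sketch (the reference track name is hypothetical; available names can be listed with show_theoretical_tracks):

>>> ad.add_missing_points_stat(
...     name="missing",
...     reference_track="MY_THEORETICAL_TRACK",  # hypothetical track name
...     method="real",
...     temporal_stats_freq=["pass"],
...     section_min_lengths=[5, 20],
... )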

add_periodogram(name, base_diag, nbr_periods, stats=None, first_period=None, last_period=None, ref_period=None, diag_kwargs=None)

Add the computation of a periodogram on the results of a temporal diagnostic. This diagnostic must include the computation of the MEAN statistic.

The period parameters can be combined in three ways:

  • (nbr_periods, first_period, last_period): basic use without ‘ref_period’; requires 0 < first_period < last_period.

  • (nbr_periods, first_period, ref_period): the optimal ‘last_period’ is calculated; requires 0 < first_period < ref_period < last_period.

  • (nbr_periods, ref_period, last_period): the optimal ‘first_period’ is calculated; requires 0 < first_period < ref_period < last_period.

Parameters:
  • name (str) – Name of the periodogram statistic.

  • base_diag (str) – Name of the base temporal diagnostic.

  • stats (list[StatType | str] | None) – Statistics of the base_diag from which to compute the periodogram. Defaults to the statistics available in the provided base diagnostic.

  • nbr_periods (int) – Number of periods.

  • first_period (timedelta64 | str | None) – First period interval (to be used with ‘last_period’ or ‘ref_period’).

  • last_period (timedelta64 | str | None) – Last period interval (to be used with ‘first_period’ or ‘ref_period’).

  • ref_period (timedelta64 | str | None) – Reference period interval (to be used with ‘first_period’ or ‘last_period’).

  • diag_kwargs (dict[str, Any] | None) –

    Dictionary containing the necessary information to identify a temporal sub-diagnostic (similar to the get_data method parameters).

    • Crossovers: “freq”, the frequency of the sub-diagnostic to process.

    • Missing points:

      • ”freq”: the frequency of the sub-diagnostic to process,

      • ”group”: one of the missing points groups, defined in “mp_groups”, one of the add_missing_points_stat method parameters (ex: “GLOBAL”),

      • ”dtype”: “missing” or “available”.
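
A usage sketch: the base diagnostic must be a temporal diagnostic including the MEAN statistic (names, field and periods are illustrative):

>>> ad.add_time_stat("ssh_daily", ad.fields["SSH"], freq="day", stats=["mean"])
>>> ad.add_periodogram(
...     name="ssh_periodogram",
...     base_diag="ssh_daily",
...     nbr_periods=100,
...     first_period="2 days",
...     last_period="60 days",
... )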

add_ratio(name, stat_numerator, stat_denominator, total=True)

Add the computation of a ratio between two diagnostics. These two diagnostics must include the computation of the COUNT statistic.

Parameters:
  • name (str) – Name of the ratio statistic.

  • stat_numerator (str) – Name of statistic to compute the ratio of, in comparison to another statistic.

  • stat_denominator (str) – Name of the reference statistic.

  • total (bool | None) – Whether the denominator statistic is the total on which to compute the ratio or the complementary part.
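
A usage sketch, assuming two previously defined diagnostics that both include the COUNT statistic (names are illustrative):

>>> ad.add_ratio(
...     name="edited_ratio",
...     stat_numerator="swh_edited",  # hypothetical diagnostic names
...     stat_denominator="swh_all",
...     total=True,
... )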

add_raw_comparison(name, x, y, z=None)

Add a raw data comparison.

In the case of a 3d Raw comparison (z parameter provided), data and plots can be accessed or created using special keywords:

  • plot=”2d” (default): 2d scatter representation.

  • plot=”3d”: 3d scatter representation.

Parameters:
  • name (str) – Name of the diagnostic.

  • x (Field) – Field used for the x-axis.

  • y (Field) – Field used for the y-axis.

  • z (Field) – Field used for the z-axis (optional).

Raises:

AltiDataError – If a diagnostic already exists with the provided name.

add_raw_data(name, field)

Add a raw data diagnostic (used for along track plotting).

Raw data and plots can be accessed or created using special keywords:

  • plot=”time” (default): along time representation.

  • plot=”map”: Cartographic representation.

  • plot=”3d”: 3d scatter representation.

Parameters:
  • name (str) – Name of the diagnostic.

  • field (Field) – Field for which to get raw data.

Raises:

AltiDataError – If a diagnostic already exists with the provided name.
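
A usage sketch (the field name is hypothetical):

>>> ad.add_raw_data("ssh_raw", ad.fields["SSH"])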

add_scatter(name, x, res_x, y, res_y)

Add a scatter diagnostic computing the distribution of one field against the other according to their respective resolutions.

Parameters:
  • name (str) – Name of the diagnostic.

  • x (Field) – Field to use for the x-axis.

  • res_x (Union[tuple[float, float, float], DataResolution, str]) – Min, max and width for the x-axis.

  • y (Field) – Field to use for the y-axis.

  • res_y (Union[tuple[float, float, float], DataResolution, str]) – Min, max and width for the y-axis.

add_section_analysis(name, condition, min_length, fill_missing=False, max_percent_of_false=0)

Add the computation of a section analysis. A section analysis searches for portions of data matching the provided condition.

Section analysis plots can be created using special keywords:

  • sections parameter:

    • sections=”all” (default): data from all sections

    • sections=1: data limited to section 1

    • sections=[1, 2, 10, 14]: data limited to sections 1, 2, 10 and 14

Parameters:
  • name (str) – Name of the analysis.

  • condition (str) – Clip condition determining the section.

  • min_length (int) – Minimum number of points required to accept the section.

  • fill_missing (bool) – Whether to fill missing values with False values (requires max_percent_of_false to be greater than 0 to have an effect on the result).

  • max_percent_of_false (float) – Maximum percentage of False values accepted in the section.
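
A usage sketch (the clip condition is hypothetical):

>>> ad.add_section_analysis(
...     name="valid_sections",
...     condition="FLAG_VALID :== 0",  # hypothetical clip condition
...     min_length=100,
...     fill_missing=True,
...     max_percent_of_false=5,
... )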

add_spectral_analysis(name, field, segment_length, holes_max_length=None, global_resampling=False, delta_t=None, noise_amplitude=None, insulation_level=0.75, last_segment_overlapping=0, max_time_dispersion=5, max_resampling=0.25, segments_nb_delta_t=1, segments_nb_delta_x=1, spectral_conf=None, segments_reduction=None, res_segments=False, res_individual_psd=False)

Add the computation of a spectral analysis diagnostic.

Spectral analysis data and plots can be accessed or created using special keywords:

  • plot=”psd” (default): Power spectral density along the wave number,

  • plot=”segments”: Cartographic representation of the selected segments.

In the plot=”psd” case, a “spectral_name” parameter must be provided to specify the required spectral analysis. If not, data of the first spectral analysis will be returned.

The segments_reduction parameter also needs to be provided if more than one reduction was requested or if computed using dask:

  • segments_reduction=”mean”

  • segments_reduction=”median”

Additional plotting options are available for the “psd” plot type:

  • individual: setting it to True (Default: False) displays the set of PSDs on each segment instead of the average PSD,

  • n_bins_psd: integer determining the number of bins along the PSD values axis for the individual=True case (Default: 100),

  • second_axis: flag enabling the display of a second x-axis showing the segment length values equivalent to the wave numbers.

Parameters:
  • name (str) – Name of the diagnostic.

  • field (Field) – Field on which to compute the analysis.

  • segment_length (int) – Length of a segment (section) in number of points. It should be something like a few hundred points. (Example: 500 units)

  • holes_max_length (int) – Maximum length of a hole. It should be something like a few points. (Example: 5, Default: 1% of the segment_length parameter value)

  • global_resampling (bool) – Resampling flag (Default: False). True: if one section requires resampling, all sections are resampled. False: only sections requiring it are resampled.

  • delta_t (timedelta64 | str) – Time gap between two measurements.

  • noise_amplitude (float) – Noise amplitude in data. Default to half of the data standard deviation.

  • insulation_level (float) – Minimum valid values percentage on both sides of the hole (Default: 0.75). Left and right sides are equal to hole length.

  • last_segment_overlapping (float) – Percentage of overlap for the second-to-last segment (Default: 0). When the section is divided into equal segments, the last segment might be too short, so it will take some part of the data (amount depending on this parameter) from the previous segment.

  • max_time_dispersion (int) – Maximum allowed percentage of dispersion for delta_t (Default: 5). If the delta_t dispersion exceeds this threshold, a warning will be displayed.

  • max_resampling (float) – Maximum resampled data percentage (Default: 0.25). A warning will be displayed if this threshold is exceeded. The resampling of a large amount of data can have a great impact on the final result.

  • segments_nb_delta_t (int) – Number of segments used to compute the average time gap between two measurements, during the segments extraction process (Default: 1).

  • segments_nb_delta_x (int) – Number of segments used to compute the average distance between two measurements, during the segments extraction process (Default: 1).

  • spectral_conf (dict[str, dict[str | SpectralType, Any]]) – Dictionary of the spectral parameters to use for the spectral curve types. Each key is a spectral analysis name associated with a dictionary containing the parameters. This dictionary must contain at least the “spectral_type” key and value. (Default: dictionary containing the default “periodogram” parameters: {“periodogram”: {“spectral_type”: “periodogram”, “window”: “hann”, “detrend”: “linear”, …}}).

  • segments_reduction (list[StatType | str] | StatType | str | None) – List of statistic types used to reduce the spectral data across segments (Default: mean).

  • res_segments (bool) – Flag indicating whether to save the segments data in the spectral analysis result (Default: False).

  • res_individual_psd (bool) – Flag indicating whether to save the individual power spectrum data of each segment in the spectral analysis result (Default: False).
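
A usage sketch (the field name is hypothetical):

>>> ad.add_spectral_analysis(
...     name="ssh_spectrum",
...     field=ad.fields["SSH"],  # hypothetical field name
...     segment_length=500,
...     holes_max_length=5,
...     segments_reduction=["mean", "median"],
... )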

add_time_stat(name, field, freq, stats=None, stat_selection=None, freq_kwargs=None, **kwargs)

Add an along time diagnostic computing requested statistics at the provided frequency.

Parameters:
  • name (str) – Name of the diagnostic.

  • field (Field) – Field on which to compute statistics.

  • freq (str | FreqType | FrequencyHandler) – Frequency (day, pass, cycle or any pandas offset alias).

  • stats (list[StatType | str] | str | None) – List of statistics to compute (count, max, mean, median, min, std, var, mad)

  • stat_selection (str | None) –

    Selection clip used to invalidate (set to NaN) some bins. Valid conditions are:

    • count

    • min

    • max

    • mean

    • median

    • std

    • var

    • mad

    These clips are Python vector clips. Examples:

    • count :>= 10 && max :< 100

    • min :> 3

    • median :> 10 && mean :> 9

  • freq_kwargs (dict[str, Any]) – Additional parameters to pass to the underlying pandas.date_range function.
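
A usage sketch (the field name is hypothetical):

>>> ad.add_time_stat(
...     name="swh_by_pass",
...     field=ad.fields["SWH"],  # hypothetical field name
...     freq="pass",
...     stats=["count", "mean", "median"],
... )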


property analyse_date: datetime64

Date at which diagnostics will be stored.

clear(name=None)

Clear all or the specified diagnostic from this AltiData.

Parameters:

name (str | None) – Name of the data to remove (Not providing a name will remove all data).

compute(stats=None)

Read data and compute statistics. Limited to the provided list of data if specified.

Parameters:

stats (str | list[str] | None) – Name of the data to limit the computation to.

compute_dask(stats=None, freq=None, jobs_number=None, bar=None, dask_client=None, **kwargs)

Read data and compute statistics. Limited to the provided list of data if specified.

Parameters:
  • stats (str | list[str] | None) – Name of the data to limit the computation to.

  • freq (str | FreqType | FrequencyHandler | None) – Minimal split frequency (day, pass, cycle or any pandas offset alias) to respect.

  • jobs_number (int) – Number of jobs to create (Default to the maximum possible number of periods the data can be split into according to the provided frequency).

  • bar (bool | None) – [Does not work on xarray datasets] Whether to display a progress bar or not. If None, will display if logging level <= INFO.

  • dask_client (Client | str | None) –

    Dask client on which to submit jobs.

    • if a client, use this client

    • if a scheduler_file, connect a client to it

    • try to get an existing dask client in the environment

    • create a local cluster and connect to it

  • kwargs – Additional parameters determining the time splitting characteristics. These parameters work with pandas.date_range frequencies.
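
A usage sketch splitting the computation by pass and letting the method find or create a dask client:

>>> ad.compute_dask(freq="pass", jobs_number=8)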


compute_from_store(diags=None, delayed=True)

Compute diagnostics using stored data.

Parameters:
  • diags (str | list[str] | None) – List of the diagnostics names to compute.

  • delayed (bool) – Flag indicating whether to load all data from store at once or when the data is actually requested (plot/get_data).

property data: Dataset

Raw data used to compute diagnostics. Setting this property changes the data.

Parameters:

data – New data to use.

property date_end: DateHandler | None

Ending date of the period of interest.

property date_start: DateHandler | None

Starting date of the period of interest.

static disable_loginfo()

Set the logging level to WARNING.

static enable_loginfo()

Set the logging level to INFO.

property fields: dict[str, Field]

Returns the dictionary of existing fields in the source.

Returns:

Dictionary of existing fields as Field objects.

get_data(name, **kwargs)

Returns a computed statistic or raw data (with its time, latitude and longitude) as a Dataset.

Parameters:
  • name (str) – Name of the data to get.

  • kwargs

    Additional parameters required to get the data. Those parameters are described in the corresponding add_* method documentation. Some frequent parameters are “stat” and “plot”. Other parameters are more specific to some diagnostics, like:

    • “segments_reduction”, “individual” or “spectral_name” for the SpectralAnalysis diagnostic,

    • “freq”, “group”, “dtype” for the MissingPoints diagnostic,

    • “delta” and “freq” for the Crossover diagnostic,

    • “pixel_split” for Raw and RawComparison Swath diagnostics.

Return type:

Dataset | Analysis

Returns:

Dataset containing the requested data (for raw data and statistics), or an Analysis object for analyses.

Raises:

AltiDataError – If data with the provided name were not defined or computed. If the stat parameter is invalid.
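
A usage sketch retrieving a computed statistic (assuming a temporal diagnostic named "swh_by_pass" was previously defined, as in the add_time_stat sketch above):

>>> ad.compute()
>>> ds = ad.get_data("swh_by_pass", stat="mean")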

get_data_group(name, stat=None)

Returns data related to a group of diagnostics.

Parameters:
  • name (str) – Name of a diagnostic belonging to the group to get.

  • stat (StatType | str | None) – If the data is a computed diagnostic, which statistic to get.

Return type:

Dataset | dict[StatType | DiagnosticType, Dataset]

Returns:

Dataset or dictionary of datasets containing the requested data.

Raises:

AltiDataError – If data with the provided name were not defined or computed. If the stat parameter is invalid.

get_diagnostic(name, field=None, x=None, y=None, z=None, kwargs=None)

Get the requested diagnostic data container.

If name is not an existing diagnostic, a new raw or raw_comparison diagnostic will be generated with the provided parameters.

Parameters:
  • name (str) – Name of the diagnostic.

  • field (Field | None) – Field for which to create a raw_data diagnostic.

  • x (Field | None) – Field to use as x-axis (raw comparison diagnostic).

  • y (Field | None) – Field to use as y-axis (raw comparison diagnostic).

  • z (Field | None) – Field to use as z-axis (raw comparison 3d diagnostic).

  • kwargs (dict[Any, Any]) – Additional read_data parameters.

Return type:

Diagnostic

Returns:

Diagnostic data container.

get_diagnostics(dtype=None, containing=None, computed=None, freq=None)

List the subset of diagnostics respecting the provided criteria.

Parameters:
  • dtype (str | DiagnosticType) –

    Limit diagnostics list to provided type. Data types are:

    • RAW

    • RAW_COMPARISON

    • EDITING

    • GEOBOX

    • TEMPORAL

    • BINNED

    • BINNED_2D

    • HISTOGRAM

    • SCATTER

    • RATIO

    • CROSSOVER

    • MISSING_POINTS

    • SECTION_ANALYSES

  • containing (str) – Limit to diagnostic names containing this element.

  • computed (bool | None) –

    Limit the diagnostics list based on their computation status:

    • [Default] None: All diagnostics

    • False: Non-computed diagnostics

    • True: Computed diagnostics

  • freq (FrequencyHandler) – Limit to diagnostics compatible with a dask computation at the provided frequency.

Return type:

list[Diagnostic]

Returns:

List of Diagnostic objects.

property latitude: Field

Name of the latitude coordinate.

list_diagnostics(dtype=None, containing=None, computed=None, freq=None)

List the subset of diagnostics respecting the provided criteria.

Parameters:
  • dtype (str | DiagnosticType) –

    Limit diagnostics list to provided type. Data types are:

    • RAW

    • RAW_COMPARISON

    • EDITING

    • GEOBOX

    • TEMPORAL

    • BINNED

    • BINNED_2D

    • HISTOGRAM

    • SCATTER

    • RATIO

    • CROSSOVER

    • MISSING_POINTS

    • SECTION_ANALYSES

  • containing (str) – Limit to diagnostic names containing this element.

  • computed (bool | None) –

    Limit the diagnostics list based on their computation status:

    • [Default] None: All diagnostics

    • False: Non-computed diagnostics

    • True: Computed diagnostics

  • freq (FrequencyHandler) – Limit to diagnostics compatible with a dask computation at the provided frequency.

Return type:

list[str]

Returns:

List of diagnostic names.

classmethod load(name)

Load a previously stored AltiData object.

Parameters:

name (str | Path) – Name and path of the file.

Return type:

CommonData

Returns:

The loaded object.

property longitude: Field

Name of the longitude coordinate.

merge_data(data, interp=False, method=None, **kwargs)

Merge the provided data container's raw data into the current one.

  • If the provided data and the current data include the INTERPOLATED_INDEX field, the data are considered already aligned; otherwise the provided data will be interpolated or re-indexed along the time dimension using the provided method.

  • Longitudes from the provided data will be replaced by the current ones.

  • Latitudes from the provided data will be replaced by the current ones.

Interpolation uses xarray's interp_like method; reindexing uses xarray's reindex_like method.

Parameters:
  • data (CommonData) – Data container object containing computed raw data to merge.

  • interp (bool) – Whether to interpolate (True) or just reindex the data (False)

  • method (str) –

    • Interpolation methods:

      • {“linear”, “nearest”} for multidimensional array

      • {“linear”, “nearest”, “zero”, “slinear”, “quadratic”, “cubic”} for 1-dimensional array.

      • linear is used by default

    • Reindexing methods:

      • None (default): don’t fill gaps

      • pad / ffill: propagate last valid index value forward

      • backfill / bfill: propagate next valid index value backward

      • nearest: use the nearest valid index value

  • kwargs

    Additional parameters passed to the underlying xarray function.

    • Interpolation options:

      • Additional keyword passed to scipy’s interpolator.

    • Reindexing options:

      • tolerance: Maximum distance between original and new labels for inexact matches. The values of the index at the matching locations must satisfy the equation abs(index[indexer] - target) <= tolerance.

      • fill_value: Value to use for newly missing values
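
A usage sketch (other_data is assumed to be another data container holding computed raw data):

>>> ad.merge_data(other_data, interp=True, method="linear")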

property orf: PassIndexer | None

ORF pass indexer.

plot_container(name, template, stat=None, field=None, x=None, y=None, **kwargs)

Returns a plot container for the requested data.

Parameters:
  • name (str) – Name of the data to get.

  • template (PlotTemplate) – Parameters template to use.

  • stat (StatType | str | None) – If the data is a computed statistic, which statistic to get.

  • field (Field | None) – Field for which to create a raw_data diagnostic.

  • x (Field | None) – Field to use as x-axis (raw comparison diagnostic).

  • y (Field | None) – Field to use as y-axis (raw comparison diagnostic).

  • kwargs – Additional parameters required to generate the container.

Return type:

PlotContainer

Returns:

PlotContainer containing the requested data.

Raises:

AltiDataError – If data with the provided name were not defined or computed. If the stat parameter is invalid.

read_data(fields, start=None, end=None, include_end=True, alt_source=False)

Read the requested fields and rename them according to the provided dictionary.

Parameters:
Return type:

Dataset

Returns:

Fields values as a Dataset

property reader: CasysReader

Data source reader.

classmethod set_signature()

Fix the class initialization signature.

classmethod show_bathymetry_grids(containing=None)

Display available bathymetry grids (named according to their resolution).

Parameters:

containing (str) – Limit grids to those containing this element.

Return type:

list[str]

Returns:

List of available bathymetry grids.

show_fields(containing=None)

Display existing fields with their description.

Parameters:

containing (str) – Only show fields whose name or description contains this element.

classmethod show_theoretical_tracks(containing=None)

Display available theoretical tracks.

Parameters:

containing (str) – Limit tracks to those containing this element.

Return type:

list[str]

Returns:

List of available theoretical passes.

property source: CasysReader

Data source reader alias.

property source_type: str

Source’s type.

store(name, overwrite=False)

Store this AltiData object in pickle format.

Parameters:
  • name (str | Path) – Name and path of the file.

  • overwrite (bool) – Whether to overwrite any existing file or not.

store_configurations(diagnostics, analyse_type=CUSTOM, analyse_date=None)

Return the required storage groups and their configuration for the requested diagnostics.

Parameters:
  • diagnostics (list[str]) – Diagnostics to include.

  • analyse_type (FreqType | str) – Type of period covered by this analysis (cycle, pass or custom). It is used to determine the type of storage group to create.

  • analyse_date (datetime64 | None) – Date representing the set of data used in this analysis. It is used to determine at which timestamp to store non-temporal diagnostics.

Return type:

dict[str, StorageGroupParams]

Returns:

Dictionary containing the storage groups and their configuration.

store_diagnostic(name, store, mode=StorageMode.OVERWRITE, analyse_type=CUSTOM, analyse_date=None, lock=None)

Write the requested diagnostic results to the specified store.

Parameters:
  • name (str) – Name of the diagnostic to store.

  • store (DiagnosticStore | str) – Store to write the diagnostic results to.

  • mode (StorageMode | str) – Storage mode to use when writing data.

  • analyse_type (FreqType | str) – Type of period covered by this analysis (cycle, pass or custom). It is used to determine the type of storage group to create.

  • analyse_date (Union[datetime64, Timestamp, datetime, str, DateHandler]) – Date representing the set of data used in this analysis. It is used to determine at which timestamp to store non-temporal diagnostics.

  • lock (str | None) – Dask lock to use when writing data.

store_diagnostics(store, mode=StorageMode.OVERWRITE, diags=None, analyse_type=CUSTOM, analyse_date=None, lock=None)

Write the diagnostics' results to the specified store.

Parameters:
  • store (DiagnosticStore | str) – Store or path of the store.

  • mode (StorageMode | str) – Storage mode to use when writing data.

  • diags (str | list[str] | None) – List of the diagnostics names to store.

  • analyse_type (FreqType | str) – Type of period covered by this analysis (cycle, pass or custom). It is used to determine the type of storage group to create.

  • analyse_date (Union[datetime64, Timestamp, datetime, str, DateHandler]) – Date representing the set of data used in this analysis. It is used to determine at which timestamp to store non-temporal diagnostics.

  • lock (str | None) – Dask lock to use when writing data.
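
A usage sketch (the store path and diagnostic name are hypothetical):

>>> ad.store_diagnostics(
...     store="/path/to/diag_store",  # hypothetical store path
...     diags=["swh_by_pass"],
...     analyse_type="cycle",
... )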

property time: Field

Name of the time coordinate.

class casys.computation.SwathData(source: CasysReader, *, source_type: str | None = None, diag_overwrite: bool = False, time_extension: bool = False, date_start: DateType | None = None, date_end: DateType | None = None, select_clip: str | None = None, select_shape: str | gpd.GeoDataFrame | shg.Polygon | None = None, data_cleaner: DataCleaner | None = None, orf: PassIndexer | str | None = None, reference_track: ReferenceTrackType | None = None, time: str | None = 'time', longitude: str | None = 'LONGITUDE', latitude: str | None = 'LATITUDE', longitude_nadir: str | None = 'longitude_nadir', latitude_nadir: str | None = 'latitude_nadir', cycle_number: str | None = 'CYCLE_NUMBER', pass_number: str | None = 'PASS_NUMBER', cross_track_distance: str | None = 'cross_track_distance', swath_lines: str | None = 'num_lines', swath_pixels: str | None = 'num_pixels')

Bases: CommonData

SwathData is a data container providing access to a data source and the ability to define and compute diagnostics.

Parameters:
  • source – Input source (name of the table if using OCTANT storage).

  • date_start – Starting date of the period of interest.

  • date_end – Ending date of the period of interest.

  • select_clip – Selection clip allowing to work on a subset of the source’s data.

  • select_shape – Shape file, GeoDataFrame or Geometry on which to limit source’s data.

  • orf – Path or name of the orf.

  • reference_track

    Setting this parameter enables source’s data interpolation on this reference track. Every diagnostic is then computed using these interpolated data.

    File path or data of the reference track on which to interpolate read data. A list of existing theoretical reference tracks can be shown using the show_theoretical_tracks method:

    >>> CommonData.show_theoretical_tracks()
    

    Standard along track data (orbits) can be provided as well.

    This parameter can be provided as a dictionary containing ‘data’, ‘path’ and ‘coordinates’ keys.

  • time – The time field. (if not provided, default is “time” field)

  • latitude – The latitude field. (if not provided, default is “LATITUDE” field)

  • longitude – The longitude field. (if not provided, default is “LONGITUDE” field)

  • cycle_number – Cycle number’s field. (if not provided, default is “CYCLE_NUMBER” field)

  • pass_number – Pass number’s field. (if not provided, default is “PASS_NUMBER” field)

  • diag_overwrite

    Define the behavior when adding a diagnostic with an already used name:

    • [default] False: raise an error

    • True: remove the old diagnostic and add the new one

  • time_extension – Whether to allow the extension of user defined time interval for specific diagnostic requirements or not.

  • source_type – Input source type.

  • latitude_nadir – Nadir’s latitude field.

  • longitude_nadir – Nadir’s longitude field.

  • cross_track_distance – Cross track distance field.

KNOWN_PARAMETERS: dict[str, tuple[Any, Any]] = {'diag_overwrite': ('bool', False, _ParameterKind.KEYWORD_ONLY), 'source': ('CasysReader', <class 'inspect._empty'>, _ParameterKind.POSITIONAL_OR_KEYWORD), 'source_type': ('Optional[str]', None, _ParameterKind.KEYWORD_ONLY), 'time_extension': ('bool', False, _ParameterKind.KEYWORD_ONLY)}
add_binned_stat(name, field, x, res_x, stats=None, stat_selection=None)

Add a binned diagnostic computing requested statistics inside bands defined by values of the x parameter according to its resolution.

Parameters:
  • name (str) – Name of the diagnostic.

  • field (Field) – Field on which to compute statistics.

  • x (Field) – Field to use for the x-axis.

  • stats (list[StatType | str] | str | None) – List of statistics to compute (count, max, mean, median, min, std, var, mad)

  • res_x (Union[tuple[float, float, float], DataResolution, str]) – Min, max and width for the x-axis.

  • stat_selection (str | None) –

    Selection clip used to invalidate (set to NaN) some bins. Valid conditions are:

    • count

    • min

    • max

    • mean

    • median

    • std

    • var

    • mad

    These clips are Python vector clips. Examples:

    • count :>= 10 && max :< 100

    • min :> 3

    • median :> 10 && mean :> 9

add_binned_stat_2d(name, field, x, res_x, y, res_y, stats=None, stat_selection=None)

Add a 2D binned diagnostic computing requested statistics inside boxes defined by values of the x and y parameters according to their respective resolutions.

Binned 2d data and plots can be accessed or created using special keywords:

  • plot=”box” (default): color mesh representation, on an x-axis/y-axis grid.

  • plot=”curve”:

    • axis=”x”: along x-axis representation of each y-field bin

    • axis=”y”: along y-axis representation of each x-field bin

  • plot=”3d”: 3d color mesh representation, on an x-axis/y-axis/z-axis 3d grid.

  • plot=”box3d”: 3d bins surfaces representation, on an x-axis/y-axis/z-axis 3d grid.

Parameters:
  • name (str) – Name of the diagnostic.

  • field (Field) – Field on which to compute statistics.

  • x (Field) – Field to use for the x-axis.

  • y (Field) – Field to use for the y-axis.

  • res_x (Union[tuple[float, float, float], DataResolution, str]) – Min, max and width for the x-axis.

  • res_y (Union[tuple[float, float, float], DataResolution, str]) – Min, max and width for the y-axis.

  • stats (list[StatType | str] | None) – List of statistics to compute (count, max, mean, median, min, std, var, mad)

  • stat_selection (str | None) –

    Selection clip used to invalidate some bins. Valid conditions are:

    • count

    • min

    • max

    • mean

    • median

    • std

    • var

    • mad

    These clips are Python vector clips. Examples:

    • count :>= 10 && max :< 100

    • min :> 3

    • median :> 10 && mean :> 9

add_crossover_stat(name, field, max_time_difference, data=None, stats=None, res_lon=(-180, 180, 4), res_lat=(-90, 90, 4), box_selection=None, geobox_stats=None, temporal_stats_freq=None, temporal_stats=None, temporal_freq_kwargs=None, pass_multi_intersect=False, cartesian_plane=True, crossover_table=None, diamond_relocation=False, diamond_reduction=MEAN, **kwargs)

Add the computation of the difference between the ascending and descending arc values in the crossover diamonds or at the crossovers' equivalent nadir points (see the diamond_reduction parameter for a statistical reduction of the diamond data). Temporal statistics (by cycle or day) can be added to the computation.

Values and time deltas are computed at each crossover point, and the requested statistics are computed for each geographical box. These data are accessible using the requested statistic’s name, or the ‘crossover’ and ‘value’ keywords for the time delta and the values at each crossover point.

Crossovers data and plots can be accessed or created using special keywords:

  • delta parameter: cartographic representation of the difference between the two arcs

    • delta=”field”: difference of the field values

    • delta=”time”: difference of the time values

  • stat parameter: geographical box or temporal statistic representation.

    • stat=”…”: requested statistic

  • freq parameter: temporal statistic representation.

    • freq=”…”: frequency of the requested statistic

Parameters:
  • name (str) – Name of the diagnostic.

  • field (Field) – Field for which to compute the statistic.

  • data (CommonData) – External data (NadirData) to compute crossovers with. This option is used to compute multi-missions crossovers.

  • max_time_difference (str) – Maximum delta of time between the two arcs as a string with its unit. Any string accepted by pandas.Timedelta is valid, e.g. ‘10 days’, ‘6 hours’, ‘1 min’, …

  • stats (list[StatType | str] | None) – List of statistics to compute (count, max, mean, median, min, std, var, mad) for temporal and geobox statistics.

  • res_lon (Union[tuple[float, float, float], DataResolution, str, None]) – Minimum, maximum and box size over the longitude (Default: -180, 180, 4).

  • res_lat (Union[tuple[float, float, float], DataResolution, str, None]) – Minimum, maximum and box size over the latitude (Default: -90, 90, 4).

  • box_selection (Field | None) – Field used as selection for computation of the count statistic. Boxes in which the box_selection field does not contain any data are set to NaN instead of 0.

  • geobox_stats (list[StatType | str] | None) – Statistics included in the geobox diagnostic.

  • temporal_stats_freq (list[str | FreqType | FrequencyHandler]) – List of temporal statistics frequencies to compute.

  • temporal_stats (list[StatType | str] | None) – Statistics included in the temporal diagnostic.

  • temporal_freq_kwargs (dict[str, Any]) – Additional parameters to pass to the underlying pandas.date_range function.

  • pass_multi_intersect (bool) – Whether to look for multiple intersections between a set of 2 passes or not.

  • cartesian_plane (bool) – Flag determining the plane used for crossovers computation. If True, the crossover is calculated in the cartesian plane. If False, the crossover is calculated in the spherical plane. Defaults to True.

  • crossover_table (set[tuple[int, int]] | None) – The table of possible combinations of crossovers between the two passes. If this table is not defined, all crossovers between odd and even passes will be calculated.

  • diamond_relocation (bool) – Flag determining data relocation to nadir crossover coordinates for the statistics computation. If True, data relocation is performed. Default value is False.

  • diamond_reduction (str | StatType | None) – Statistic type used to reduce the data on the crossover diamond. Reduction is disabled with the “none” value. (Default value is “mean”)

Raises:

AltiDataError – If a data already exists with the provided name.
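
A usage sketch on a SwathData container sd (the field name is hypothetical):

>>> sd.add_crossover_stat(
...     name="ssha_xover",
...     field=sd.fields["SSHA"],  # hypothetical field name
...     max_time_difference="5 days",
...     diamond_relocation=True,
...     diamond_reduction="mean",
... )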

add_geobox_stat(name, field, stats=None, res_lon=(-180, 180, 4), res_lat=(-90, 90, 4), box_selection=None, stat_selection=None, projection=None)

Add a geographical box diagnostic computing requested statistics for the provided field in each box defined by the res_lon and res_lat parameters.

Geobox stat data and plots can be accessed or created using special keywords:

  • plot=”box” (default): color mesh representation, on an x-axis/y-axis grid.

  • plot=”3d”: 3d color mesh representation, on an x-axis/y-axis/z-axis 3d grid.

Parameters:
  • name (str) – Name of the diagnostic.

  • field (Field) – Field on which to compute.

  • stats (list[StatType | str] | None) – List of statistics to compute (count, max, mean, median, min, std, var, mad).

  • res_lon (Union[tuple[float, float, float], DataResolution, str, None]) – Minimum, maximum and box size over the longitude (Default: -180, 180, 4).

  • res_lat (Union[tuple[float, float, float], DataResolution, str, None]) – Minimum, maximum and box size over the latitude (Default: -90, 90, 4).

  • projection (Proj | str | None) – Projection in which to project longitude and latitude values before binning data.

  • box_selection (None | str | Field) – Field used as selection for computation of the count statistic. Boxes in which the box_selection field does not contain any data are set to NaN instead of 0.

  • stat_selection (str | None) –

    Selection clip used to invalidate (set to NaN) some bins. Valid conditions are:

    • count

    • min

    • max

    • mean

    • median

    • std

    • var

    • mad

    These clips are Python vector clips. Examples:

    • count :>= 10 && max :< 100

    • min :> 3

    • median :> 10 && mean :> 9

add_histogram(name, x, res_x='auto')

Add a histogram diagnostic for the provided field computing its values’ distribution according to the res_x parameter.

Parameters:
  • name (str) – Name of the diagnostic.

  • x (Field) – Field used for the x-axis.

  • res_x (Union[tuple[float, float, float], DataResolution, str]) – Min, max and width for the x-axis, or ‘auto’ (Default: ‘auto’). ‘auto’ uses the 2.5th percentile of values as the minimum, the 97.5th percentile as the maximum, and 40 bins in between.

add_periodogram(name, base_diag, nbr_periods, stats=None, first_period=None, last_period=None, ref_period=None, diag_kwargs=None)

Add the computation of a periodogram on the results of a temporal diagnostic. This diagnostic must include the computation of the MEAN statistic.

The period parameters can be combined in three ways:

  • (nbr_periods, first_period, last_period): basic use without ‘ref_period’; requires 0 < first_period < last_period.

  • (nbr_periods, first_period, ref_period): the optimal ‘last_period’ is calculated; requires 0 < first_period < ref_period < last_period.

  • (nbr_periods, ref_period, last_period): the optimal ‘first_period’ is calculated; requires 0 < first_period < ref_period < last_period.

Parameters:
  • name (str) – Name of the periodogram statistic.

  • base_diag (str) – Name of the base temporal diagnostic.

  • stats (list[StatType | str] | None) – Statistics of the base_diag from which to compute the periodogram. Defaults to the statistics available in the provided base diagnostic.

  • nbr_periods (int) – Number of periods.

  • first_period (timedelta64 | str | None) – First period interval (to be used with ‘last_period’ or ‘ref_period’).

  • last_period (timedelta64 | str | None) – Last period interval (to be used with ‘first_period’ or ‘ref_period’).

  • ref_period (timedelta64 | str | None) – Reference period interval (to be used with ‘first_period’ or ‘last_period’).

  • diag_kwargs (dict[str, Any] | None) –

    Dictionary containing the necessary information to identify a temporal sub-diagnostic (similar to the get_data method parameters).

    • Crossovers: “freq”, the frequency of the sub-diagnostic to process.

    • Missing points:

      • ”freq”: the frequency of the sub-diagnostic to process,

      • ”group”: one of the missing points groups, defined in “mp_groups”, one of the add_missing_points_stat method parameters (ex: “GLOBAL”),

      • ”dtype”: “missing” or “available”.

add_ratio(name, stat_numerator, stat_denominator, total=True)

Add the computation of a ratio between two diagnostics. These two diagnostics must include the computation of the COUNT statistic.

Parameters:
  • name (str) – Name of the ratio statistic.

  • stat_numerator (str) – Name of the statistic whose ratio to compute, relative to the reference statistic.

  • stat_denominator (str) – Name of the reference statistic.

  • total (bool | None) – Whether the denominator statistic is the total on which to compute the ratio or the complementary part.

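A usage sketch, assuming two hypothetical fields (sla_valid, sla_total) whose diagnostics both compute the COUNT statistic:

>>> ad.add_time_stat("sla_valid", field=sla_valid, freq="pass", stats=["count"])
>>> ad.add_time_stat("sla_total", field=sla_total, freq="pass", stats=["count"])
>>> ad.add_ratio(
...     name="sla_availability",
...     stat_numerator="sla_valid",
...     stat_denominator="sla_total",
...     total=True,
... )
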
add_raw_comparison(name, x, y, z=None)

Add a raw data comparison.

In the case of a 3d Raw comparison (z parameter provided), data and plots can be accessed or created using special keywords:

  • plot=”2d” (default): 2d scatter representation.

  • plot=”3d”: 3d scatter representation.

To split the data in the 3d plots (separate the 2 swaths), the following format can be used: plot=”3d:pixel_split”, where “pixel_split” is the pixel index value to split the data at.

Parameters:
  • name (str) – Name of the diagnostic.

  • x (Field) – Field used for the x-axis.

  • y (Field) – Field used for the y-axis.

  • z (Field) – Field used for the z-axis (optional).

Raises:

AltiDataError – If a diagnostic already exists with the provided name.

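A usage sketch comparing two hypothetical fields (the swh field from the earlier example and an assumed "WIND_SPEED" field):

>>> wind = ad.fields["WIND_SPEED"]  # hypothetical field name
>>> ad.add_raw_comparison("swh_vs_wind", x=swh, y=wind)
>>> ad.compute()
>>> ds = ad.get_data("swh_vs_wind", plot="2d")
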
add_raw_data(name, field)

Add a raw data diagnostic (used for along track plotting).

Raw data and plots can be accessed or created using special keywords:

  • plot=”time” (default): along time representation.

  • plot=”map”: Cartographic representation.

  • plot=”3d”: 3d surface representation.

To split the data in the 3d plots (separate the 2 swaths), the following format can be used: plot=”3d:pixel_split”, where “pixel_split” is the pixel index value to split the data at.

Parameters:
  • name (str) – Name of the diagnostic.

  • field (Field) – Field for which to get raw data.

Raises:

AltiDataError – If a diagnostic already exists with the provided name.

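A usage sketch with the hypothetical sla field, retrieving the cartographic representation through the plot keyword:

>>> ad.add_raw_data("sla_raw", field=sla)
>>> ad.compute()
>>> ds = ad.get_data("sla_raw", plot="map")
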
add_scatter(name, x, res_x, y, res_y)

Add a scatter diagnostic computing the distribution of one field against the other according to their respective resolutions.

Parameters:
  • name (str) – Name of the diagnostic.

  • x (Field) – Field used for the x-axis.

  • res_x (Union[tuple[float, float, float], DataResolution, str]) – Min, max and width for the x-axis.

  • y (Field) – Field used for the y-axis.

  • res_y (Union[tuple[float, float, float], DataResolution, str]) – Min, max and width for the y-axis.
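
A usage sketch with the hypothetical swh and wind fields from the earlier examples:

>>> ad.add_scatter(
...     name="swh_wind_scatter",
...     x=swh, res_x=(0.0, 8.0, 0.25),
...     y=wind, res_y=(0.0, 20.0, 0.5),
... )
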
add_spectral_analysis(name, field, segment_length, holes_max_length=None, global_resampling=False, delta_t=None, noise_amplitude=None, insulation_level=0.75, last_segment_overlapping=0, max_time_dispersion=5, max_resampling=0.25, segments_nb_delta_t=1, segments_nb_delta_x=1, spectral_conf=None, segments_reduction=None, pixels=None, pixels_selection=RANGE, pixels_reduction=MEAN, res_segments=False, res_individual_psd=False, **kwargs)

Add the computation of a spectral analysis diagnostic.

Spectral analysis data and plots can be accessed or created using special keywords:

  • plot=”psd” (default): Power spectral density along the wave number,

  • plot=”segments”: Cartographic representation of the selected segments.

The segments_reduction parameter needs to be provided if more than one reduction was requested or if the computation was performed using dask (the stat keyword might be used instead):

  • segments_reduction=”mean”

  • segments_reduction=”median”

The pixel parameter needs to be provided to indicate which pixel to use:

  • pixel=20

  • pixel=(55, 60)

Additional plotting options are available for the “psd” plot type:

  • individual: setting it to True (Default: False) displays the set of PSDs computed on each segment instead of the average PSD,

  • n_bins_psd: integer setting the number of bins along the PSD values axis when individual=True (Default: 100),

  • second_axis: flag enabling a second x-axis showing the segment lengths equivalent to the wave numbers.

Parameters:
  • name (str) – Name of the diagnostic.

  • field (Field) – Field on which to compute the analysis.

  • segment_length (int) – Length of a segment (section) in number of points; typically a few hundred points (Example: 500).

  • holes_max_length (int) – Maximum length of a hole; typically a few points (Example: 5).

  • global_resampling (bool) – Resampling flag (Default: False). True: if any section requires resampling, all sections are resampled. False: only the sections requiring a resampling are resampled.

  • delta_t (timedelta64 | str) – Time gap between two measurements.

  • noise_amplitude (float) – Noise amplitude in the data. Defaults to half of the data’s standard deviation.

  • insulation_level (float) – Minimum percentage of valid values on both sides of a hole (Default: 0.75). The left and right sides each span a length equal to the hole’s length.

  • last_segment_overlapping (float) – Percentage of overlap for the second-to-last segment (Default: 0). When the section is divided into equal segments, the last segment might be too short, so it takes part of its data (an amount depending on this parameter) from the previous segment.

  • max_time_dispersion (int) – Maximum allowed percentage of dispersion for delta_t (Default: 5). If the delta_t dispersion exceeds this threshold, a warning is displayed.

  • max_resampling (float) – Maximum percentage of resampled data (Default: 0.25). A warning is displayed if this threshold is exceeded; resampling a large amount of data can have a great impact on the final result.

  • segments_nb_delta_t (int) – Number of segments used to compute the average time gap between two measurements, during the segments extraction process (Default: 1).

  • segments_nb_delta_x (int) – Number of segments used to compute the average distance between two measurements, during the segments extraction process (Default: 1).

  • spectral_conf (dict[str, dict[str | SpectralType, Any]]) – Dictionary of the spectral parameters to use for the spectral curve types. Each key represents a spectral analysis name and is associated with a dictionary containing the parameters. This dictionary must contain at least the “spectral_type” key and value. (Default: dictionary containing the default “periodogram” parameters: {“periodogram”: {“spectral_type”: “periodogram”, “window”: “hann”, “detrend”: “linear”, …}}).

  • segments_reduction (list[StatType | str] | StatType | str | None) – List of statistic types used to reduce the spectral data across segments (Default: mean).

  • pixels (int | list[int | tuple[int, int]]) – Pixel indexes along the cross track distance dimension for which to compute the diagnostic. Either a single integer (for a single pixel) or a list/tuple of:

    • integers: pix

    • lists/tuples of two integers describing a range: (pix_min, pix_max)

  • pixels_selection (str | SelectionType) – Mode of reduction of the spectral curves computed for the provided pixel values (Default: “range”):

    • “all”: reduce all the spectral curves together.

    • “range”: reduce the spectral curves within a cross track distance range.

    • “none”: no reduction performed.

  • pixels_reduction (str | StatType) – Statistic type used to reduce the spectral data across pixels (Default: mean).

  • res_segments (bool) – Whether to save the segments data in the spectral analysis result (Default: False).

  • res_individual_psd (bool) – Whether to save the individual power spectrum data of each segment in the spectral analysis result (Default: False).

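A usage sketch keeping to the documented orders of magnitude (a few hundred points per segment, a few points per hole), with the hypothetical sla field:

>>> ad.add_spectral_analysis(
...     name="sla_spectrum",
...     field=sla,
...     segment_length=500,
...     holes_max_length=5,
... )
>>> ad.compute()
>>> ds = ad.get_data("sla_spectrum", plot="psd")
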
add_time_stat(name, field, freq, stats=None, stat_selection=None, freq_kwargs=None, **kwargs)

Add an along time diagnostic computing requested statistics at the provided frequency.

Parameters:
  • name (str) – Name of the diagnostic.

  • field (Field) – Field on which to compute statistics.

  • freq (str | FreqType | FrequencyHandler) – Frequency (day, pass, cycle or any pandas offset alias)

  • stats (list[StatType | str] | str | None) – List of statistics to compute (count, max, mean, median, min, std, var, mad)

  • stat_selection (str | None) –

    Selection clip used to invalidate (set to NaN) some bins. Valid conditions are:

    • count

    • min

    • max

    • mean

    • median

    • std

    • var

    • mad

    These clips are Python vector clips. Examples:

    • count :>= 10 && max :< 100

    • min :> 3

    • median :> 10 && mean :> 9

  • freq_kwargs (dict[str, Any]) – Additional parameters passed to the underlying pandas.date_range function.

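A usage sketch computing several statistics per cycle and invalidating sparse bins with a stat_selection clip (the sla field is hypothetical):

>>> ad.add_time_stat(
...     name="sla_cycle",
...     field=sla,
...     freq="cycle",
...     stats=["count", "mean", "std"],
...     stat_selection="count :>= 10",
... )
>>> ad.compute()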

property analyse_date: datetime64

Date at which diagnostics will be stored.

clear(name=None)

Clear all diagnostics, or only the specified one, from this AltiData.

Parameters:

name (str | None) – Name of the data to remove (Not providing a name will remove all data).

compute(stats=None)

Read data and compute statistics, limited to the provided list of diagnostics if specified.

Parameters:

stats (str | list[str] | None) – Name of the data to limit the computation to.

compute_dask(stats=None, freq=None, jobs_number=None, bar=None, dask_client=None, **kwargs)

Read data and compute statistics, limited to the provided list of diagnostics if specified.

Parameters:
  • stats (str | list[str] | None) – Name of the data to limit the computation to.

  • freq (str | FreqType | FrequencyHandler | None) – Minimal split frequency (day, pass, cycle or any pandas offset alias) to respect.

  • jobs_number (int) – Number of jobs to create. Defaults to the maximum possible number of periods the data can be split into according to the provided frequency.

  • bar (bool | None) – [Does not work on xarray datasets] Whether to display a progress bar or not. If None, will display if logging level <= INFO.

  • dask_client (Client | str | None) –

    Dask client on which to submit jobs.

    • if a client is provided, use this client

    • if a scheduler file is provided, connect a client to it

    • otherwise, try to get an existing dask client from the environment

    • as a last resort, create a local cluster and connect to it

  • kwargs – Additional parameters determining the time-splitting characteristics. These parameters work with pandas.date_range frequencies.

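A usage sketch splitting the computation per pass on a local dask cluster (Client() simply starts a local cluster; an existing client or a scheduler file can be passed instead):

>>> from dask.distributed import Client
>>> client = Client()  # local cluster
>>> ad.compute_dask(freq="pass", dask_client=client)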

compute_from_store(diags=None, delayed=True)

Compute diagnostics using stored data.

Parameters:
  • diags (str | list[str] | None) – List of the diagnostics names to compute.

  • delayed (bool) – Flag indicating whether to load data from the store only when actually requested (plot/get_data) instead of all at once.

property cross_track_distance: Field

Name of the cross track distance coordinate.

property cycle_number: Field

Name of the cycle number field.

property data: Dataset

Raw data used to compute diagnostics. Assigning to this property replaces the current data.

Parameters:

data – New data to use.

property date_end: DateHandler | None

Ending date of the period of interest.

property date_start: DateHandler | None

Starting date of the period of interest.

static disable_loginfo()

Set the logging level to WARNING.

static enable_loginfo()

Set the logging level to INFO.

property fields: dict[str, Field]

Returns the dictionary of existing fields in the source.

Returns:

Dictionary of the existing fields as Field objects, keyed by field name.

get_data(name, **kwargs)

Returns a computed statistic or raw data (with its time, latitude and longitude) as a Dataset.

Parameters:
  • name (str) – Name of the data to get.

  • kwargs

    Additional parameters required to get the data. Those parameters are described in the add_diagnostic method documentation. Some frequent parameters are “stat” and “plot”. Other parameters are more specific to some diagnostic, like:

    • ”segments_reduction”, “individual” or “spectral_name” for the SpectralAnalysis diagnostic

    • ”freq”, “group”, “dtype” for the MissingPoints diagnostic,

    • ”delta” and “freq” for the Crossover diagnostic,

    • ”pixel_split” for Raw and RawComparison Swath diagnostics.

Return type:

Dataset | Analysis

Returns:

Dataset containing the requested data (for raw data and statistics). Analysis for analyses.

Raises:

AltiDataError – If data with the provided name were not defined or computed. If the stat parameter is invalid.

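A usage sketch retrieving results from the hypothetical diagnostics defined in the earlier examples:

>>> mean_per_cycle = ad.get_data("sla_cycle", stat="mean")
>>> raw_map = ad.get_data("sla_raw", plot="map")
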
get_data_group(name, stat=None)

Returns data related to a group of diagnostics.

Parameters:
  • name (str) – Name of a diagnostic belonging to the group to get.

  • stat (StatType | str | None) – If the data is a computed diagnostic, which statistic to get.

Return type:

Dataset | dict[StatType | DiagnosticType, Dataset]

Returns:

Dataset or dictionary of datasets containing the requested data.

Raises:

AltiDataError – If data with the provided name were not defined or computed. If the stat parameter is invalid.

get_diagnostic(name, field=None, x=None, y=None, z=None, kwargs=None)

Get the requested diagnostic data container.

If name is not an existing diagnostic, a new raw or raw_comparison diagnostic will be generated with the provided parameters.

Parameters:
  • name (str) – Name of the diagnostic.

  • field (Field | None) – Field for which to create a raw_data diagnostic.

  • x (Field | None) – Field to use as x-axis (raw comparison diagnostic).

  • y (Field | None) – Field to use as y-axis (raw comparison diagnostic).

  • z (Field | None) – Field to use as z-axis (raw comparison 3d diagnostic).

  • kwargs (dict[Any, Any]) – Additional read_data parameters.

Return type:

Diagnostic

Returns:

Diagnostic data container.

get_diagnostics(dtype=None, containing=None, computed=None, freq=None)

List the subset of diagnostics respecting the provided criteria.

Parameters:
  • dtype (str | DiagnosticType) –

    Limit diagnostics list to provided type. Data types are:

    • RAW

    • RAW_COMPARISON

    • EDITING

    • GEOBOX

    • TEMPORAL

    • BINNED

    • BINNED_2D

    • HISTOGRAM

    • SCATTER

    • RATIO

    • CROSSOVER

    • MISSING_POINTS

    • SECTION_ANALYSES

  • containing (str) – Limit to diagnostic names containing this element.

  • computed (bool | None) –

    Limit the diagnostics list based on their computation status:

    • [Default] None: All diagnostics

    • False: Non-computed diagnostics

    • True: Computed diagnostics

  • freq (FrequencyHandler) – Limit to diagnostics compatible with a dask computation at the provided frequency.

Return type:

list[Diagnostic]

Returns:

List of Diagnostic objects.

property latitude: Field

Name of the latitude coordinate.

property latitude_nadir: Field

Name of the nadir latitude coordinate.

list_diagnostics(dtype=None, containing=None, computed=None, freq=None)

List the subset of diagnostics respecting the provided criteria.

Parameters:
  • dtype (str | DiagnosticType) –

    Limit diagnostics list to provided type. Data types are:

    • RAW

    • RAW_COMPARISON

    • EDITING

    • GEOBOX

    • TEMPORAL

    • BINNED

    • BINNED_2D

    • HISTOGRAM

    • SCATTER

    • RATIO

    • CROSSOVER

    • MISSING_POINTS

    • SECTION_ANALYSES

  • containing (str) – Limit to diagnostic names containing this element.

  • computed (bool | None) –

    Limit the diagnostics list based on their computation status:

    • [Default] None: All diagnostics

    • False: Non-computed diagnostics

    • True: Computed diagnostics

  • freq (FrequencyHandler) – Limit to diagnostics compatible with a dask computation at the provided frequency.

Return type:

list[str]

Returns:

List of diagnostic names.

classmethod load(name)

Load a previously stored AltiData object.

Parameters:

name (str | Path) – Name and path of the file.

Return type:

CommonData

Returns:

The loaded object.

property longitude: Field

Name of the longitude coordinate.

property longitude_nadir: Field

Name of the nadir longitude coordinate.

merge_data(data, interp=False, method=None, **kwargs)

Merge the provided data container raw data into the current one.

  • If both the provided data and the current data include the INTERPOLATED_INDEX field, the data are considered already aligned; otherwise, the provided data are interpolated or re-indexed along the time dimension using the provided method

  • Longitudes from the provided data will be replaced by the current ones

  • Latitudes from the provided data will be replaced by the current ones

Interpolation uses the interp_like method from xarray. Reindexing uses the reindex_like method from xarray.

Parameters:
  • data (CommonData) – Data container object containing computed raw data to merge.

  • interp (bool) – Whether to interpolate (True) or just reindex the data (False)

  • method (str) –

    • Interpolation methods:

      • {“linear”, “nearest”} for multidimensional array

      • {“linear”, “nearest”, “zero”, “slinear”, “quadratic”, “cubic”} for 1-dimensional array.

      • linear is used by default

    • Reindexing methods:

      • None (default): don’t fill gaps

      • pad / ffill: propagate last valid index value forward

      • backfill / bfill: propagate next valid index value backward

      • nearest: use the nearest valid index value

  • kwargs

    Additional parameters passed to the underlying xarray function.

    • Interpolation options:

      • Additional keywords passed to scipy’s interpolator.

    • Reindexing options:

      • tolerance: Maximum distance between original and new labels for inexact matches. The values of the index at the matching locations must satisfy the equation abs(index[indexer] - target) <= tolerance.

      • fill_value: Value to use for newly missing values.

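A usage sketch merging raw data from a second container (other_source is a hypothetical reader; "SWH" a hypothetical field), using linear interpolation along time:

>>> other = NadirData(source=other_source)
>>> other.add_raw_data("swh_raw", field=other.fields["SWH"])
>>> other.compute()
>>> ad.merge_data(other, interp=True, method="linear")
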
property orf: PassIndexer | None

The orf pass indexer.

property pass_number: Field

Name of the pass number field.

plot_container(name, template, stat=None, field=None, x=None, y=None, **kwargs)

Returns a plot container for the requested data.

Parameters:
  • name (str) – Name of the data to get.

  • template (PlotTemplate) – Parameters template to use.

  • stat (StatType | str | None) – If the data is a computed statistic, which statistic to get.

  • field (Field | None) – Field for which to create a raw_data diagnostic.

  • x (Field | None) – Field to use as x-axis (raw comparison diagnostic).

  • y (Field | None) – Field to use as y-axis (raw comparison diagnostic).

  • kwargs – Additional parameters required to generate the container.

Return type:

PlotContainer

Returns:

PlotContainer containing the requested data.

Raises:

AltiDataError – If data with the provided name were not defined or computed. If the stat parameter is invalid.

read_data(fields, start=None, end=None, include_end=True, alt_source=False)

Read the requested fields and rename them according to the provided dictionary.

Return type:

Dataset

Returns:

Fields values as a Dataset

property reader: CasysReader

Data source reader.

classmethod set_signature()

Fix the class initialization signature.

classmethod show_bathymetry_grids(containing=None)

Display available bathymetry grids (named according to their resolution).

Parameters:

containing (str) – Limit grids to the ones containing this element.

Return type:

list[str]

Returns:

List of available bathymetry grids.

show_fields(containing=None)

Display existing fields with their description.

Parameters:

containing (str) – Only show fields whose name or description contains this element.

classmethod show_theoretical_tracks(containing=None)

Display available theoretical tracks.

Parameters:

containing (str) – Limit tracks to the ones containing this element.

Return type:

list[str]

Returns:

List of available theoretical tracks.

property source: CasysReader

Data source reader alias.

property source_type: str

Source’s type.

store(name, overwrite=False)

Store this AltiData object in pickle format.

Parameters:
  • name (str | Path) – Name and path of the file.

  • overwrite (bool) – Whether to overwrite any existing file or not.

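A usage sketch of a store/load round trip (the file name is arbitrary):

>>> ad.store("sla_analysis.pkl", overwrite=True)
>>> restored = NadirData.load("sla_analysis.pkl")
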
store_configurations(diagnostics, analyse_type=CUSTOM, analyse_date=None)

Return the required storage groups and their configurations for the requested diagnostics.

Parameters:
  • diagnostics (list[str]) – Diagnostics to include.

  • analyse_type (FreqType | str) – Type of period covered by this analysis (cycle, pass or custom). It is used to determine the type of storage group to create.

  • analyse_date (datetime64 | None) – Date representing the set of data used in this analysis. It is used to determine at which timestamp to store non-temporal diagnostics.

Return type:

dict[str, StorageGroupParams]

Returns:

Dictionary containing the storage groups and their configurations.

store_diagnostic(name, store, mode=StorageMode.OVERWRITE, analyse_type=CUSTOM, analyse_date=None, lock=None)

Write the requested diagnostic results to the specified store.

Parameters:
  • name (str) – Name of the diagnostic to store.

  • store (DiagnosticStore | str) – Store to write the diagnostic results to.

  • mode (StorageMode | str) – Storage mode to use when writing data.

  • analyse_type (FreqType | str) – Type of period covered by this analysis (cycle, pass or custom). It is used to determine the type of storage group to create.

  • analyse_date (Union[datetime64, Timestamp, datetime, str, DateHandler]) – Date representing the set of data used in this analysis. It is used to determine at which timestamp to store non-temporal diagnostics.

  • lock (str | None) – Dask lock to use when writing data.

store_diagnostics(store, mode=StorageMode.OVERWRITE, diags=None, analyse_type=CUSTOM, analyse_date=None, lock=None)

Write the diagnostics’ results to the specified store (a DiagnosticStore object or its path).

Parameters:
  • store (DiagnosticStore | str) – Store or path of the store.

  • mode (StorageMode | str) – Storage mode to use when writing data.

  • diags (str | list[str] | None) – List of the diagnostics names to store.

  • analyse_type (FreqType | str) – Type of period covered by this analysis (cycle, pass or custom). It is used to determine the type of storage group to create.

  • analyse_date (Union[datetime64, Timestamp, datetime, str, DateHandler]) – Date representing the set of data used in this analysis. It is used to determine at which timestamp to store non-temporal diagnostics.

  • lock (str | None) – Dask lock to use when writing data.

property time: Field

Name of the time coordinate.

casys.computation.check_periodogram_params(*, nbr_periods, first_period=None, last_period=None, ref_period=None)

Check the periodogram diagnostic parameters.

Parameters:
  • nbr_periods (int) – Number of periods.

  • first_period (timedelta64 | None) – First period interval (to be used with ‘last_period’ or ‘ref_period’).

  • last_period (timedelta64 | None) – Last period interval (to be used with ‘first_period’ or ‘ref_period’).

  • ref_period (timedelta64 | None) – Reference period interval (to be used with ‘first_period’ or ‘last_period’).

casys.computation.compute_individual_psd(sp_type, sp_kwargs, field_name, segments_data)

Individual PSD computation.

Parameters:
  • sp_type (SpectralType) – Type of psd to compute.

  • sp_kwargs (dict[str, Any]) – Parameters to use to compute the psd.

  • field_name (str) – Name of the field to compute the psd from.

  • segments_data (Dataset) – Segments’ data.

Returns:

Wave number and individual psd data.

casys.computation.normalize_pixels(*, pixels=None)

Check and normalize provided pixels set.

Parameters:

pixels (int | Sequence[int | Sequence[int]]) – Set of pixels to check.

Return type:

list[int | list[int]]

Returns:

Normalized pixels set.

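A usage sketch; the displayed result is an assumption deduced from the documented return type:

>>> from casys.computation import normalize_pixels
>>> normalize_pixels(pixels=[5, (10, 12)])  # expected: [5, [10, 12]]
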
casys.computation.normalize_spectral_params(*, spectral_conf=None)

Normalize the spectral kwargs dictionary.

Parameters:

spectral_conf (dict[str, dict[str | SpectralType, str]]) – Dictionary of the spectral parameters to use for the spectral curve types. Each key represents a spectral analysis name and is associated with a dictionary containing the parameters. This dictionary must contain at least the “spectral_type” key and value.

Return type:

tuple[list[str], list[SpectralType], list[dict[str, Any]]]

Returns:

Tuple containing:

  • spectral_names: List of the spectral analyses names.

  • spectral_types: List of normalized spectral curve types.

  • spectral_kwargs: Normalized dictionary of the spectral parameters.

casys.computation.normalize_stats(stats, default=MEAN)

Normalize the provided statistics list.

Parameters:
  • stats – Statistics to normalize.

  • default – Default statistic used when stats is not provided (Default: MEAN).
Return type:

list[StatType]

Returns:

Normalized list of statistics.

casys.computation.psd_segments_reduction(*, wn, individual_psd, reduction)

Reduce computed PSD across segments.

Parameters:
  • wn (ndarray) – Wave number values.

  • individual_psd (ndarray) – PSD data on all segments.

  • reduction (StatType) – Reduction to apply on the spectral data across the segments.

Return type:

ndarray

Returns:

Reduced psd values array.