Data source
NadirData allows you to:
* specify the source of the data and its characteristics
* apply selection criteria (using time intervals, CLIPs or shapes)
* apply transformations (interpolation to a reference track)
* set and compute diagnostics
NadirData is a data container providing access to a data source and the definition,
then computation, of diagnostics.
Parameters
----------
source
Input source (name of the table if using OCTANT storage).
date_start
Starting date of the period of interest.
date_end
Ending date of the period of interest.
select_clip
Selection clip allowing to work on a subset of the source's data.
select_shape
Shape file, GeoDataFrame or Geometry on which to limit source's data.
orf
Path or name of the orf.
reference_track
Setting this parameter enables the interpolation of the source's data on this
reference track; every diagnostic is then computed using the interpolated data.
File path or data of the reference track on which to interpolate the read data.
A list of existing theoretical reference tracks can be shown using the
show_theoretical_tracks method:
>>> CommonData.show_theoretical_tracks()
Standard along track data (orbits) can be provided as well.
This parameter can be provided as a dictionary containing 'data', 'path'
and 'coordinates' keys.
time
The time field. (if not provided, default is "time" field)
latitude
The latitude field. (if not provided, default is "LATITUDE" field)
longitude
The longitude field. (if not provided, default is "LONGITUDE" field)
cycle_number
Cycle number's field. (if not provided, default is "CYCLE_NUMBER" field)
pass_number
Pass number's field. (if not provided, default is "PASS_NUMBER" field)
diag_overwrite
Define the behavior when adding a diagnostic with an already used name:
* [default] False: raise an error
* True: remove the old diagnostic and add the new one
time_extension
Whether to allow extending the user-defined time interval to meet specific
diagnostic requirements.
source_type
Input source type.
SwathData
SwathData allows you to:
* specify the source of the data and its characteristics
* apply selection criteria (using time intervals, CLIPs or shapes)
* set and compute diagnostics
SwathData is a data container providing access to a data source and the definition,
then computation, of diagnostics.
Parameters
----------
source
Input source (name of the table if using OCTANT storage).
date_start
Starting date of the period of interest.
date_end
Ending date of the period of interest.
select_clip
Selection clip allowing to work on a subset of the source's data.
select_shape
Shape file, GeoDataFrame or Geometry on which to limit source's data.
orf
Path or name of the orf.
reference_track
Setting this parameter enables the interpolation of the source's data on this
reference track; every diagnostic is then computed using the interpolated data.
File path or data of the reference track on which to interpolate the read data.
A list of existing theoretical reference tracks can be shown using the
show_theoretical_tracks method:
>>> CommonData.show_theoretical_tracks()
Standard along track data (orbits) can be provided as well.
This parameter can be provided as a dictionary containing 'data', 'path'
and 'coordinates' keys.
time
The time field. (if not provided, default is "time" field)
latitude
The latitude field. (if not provided, default is "LATITUDE" field)
longitude
The longitude field. (if not provided, default is "LONGITUDE" field)
cycle_number
Cycle number's field. (if not provided, default is "CYCLE_NUMBER" field)
pass_number
Pass number's field. (if not provided, default is "PASS_NUMBER" field)
diag_overwrite
Define the behavior when adding a diagnostic with an already used name:
* [default] False: raise an error
* True: remove the old diagnostic and add the new one
time_extension
Whether to allow extending the user-defined time interval to meet specific
diagnostic requirements.
source_type
Input source type.
Note
The date_start and date_end parameters are optional for some source types: when
they are omitted, the time field is used to define them. For other kinds of
sources, date_start and date_end are mandatory.
Data readers
CasysReader classes are designed to interact with and read from different kinds of
sources. Each source type is associated with a data reader:
* xarray.Dataset -> DatasetReader
* zcollection.Dataset -> ZDatasetReader
* string -> CLSTableReader
* dictionary -> CLSTableInSituReader
* zcollection.Collection -> ZCollectionReader
* swot_calval.io.Collection -> ScCollectionReader
MultiReader requires explicitly instantiating the reader.
xarray datasets
from casys import NadirData, DateHandler
from casys.readers import ZarrDatasetReader
import os
# Instantiate your zarr compatible xarray dataset reader
reader = ZarrDatasetReader(
data_path=os.path.join(os.environ["RESOURCES_DIR"], "data_C_J3_B.zarr"),
date_start=DateHandler("2019/06/01"),
date_end=DateHandler("2019/06/20"),
time="time",
longitude="LONGITUDE",
latitude="LATITUDE",
)
# Create your NadirData object
ad_ds = NadirData(source=reader)
For more details, see the DatasetReader and ZarrDatasetReader classes.
xarray Dataset reader.
Parameters
----------
data
Dataset.
data_path
Dataset's file(s) path.
backend_fields
List of fields (variables) to read.
backend_kwargs
kwargs to provide to the backend when using a data_path to load the data.
date_start
Starting date of the interval we're working on.
date_end
Ending date of the interval we're working on.
select_clip
Selection clip allowing to work on a subset of the source's data.
select_shape
Shape file, GeoDataFrame or Geometry to select.
data_cleaner
Data cleaning applied just after the reader.
This cleaning might consist of sorting, duplication removal or removing indexes
in order to keep them increasing.
orf
Source's indexer.
reference_track
Reference track.
time
Time field.
longitude
Longitude field.
latitude
Latitude field.
swath_lines
Swath main dimension.
swath_pixels
Swath cross_track dimension.
cycle_number
The cycle number field.
pass_number
The pass number field.
longitude_nadir
The nadir's longitude field.
latitude_nadir
The nadir's latitude field.
cross_track_distance
Cross track distance field.
CLS Tables
from casys import NadirData, DateHandler
from casys.readers import CLSTableReader
reader = CLSTableReader(
name="TABLE_C_J3_B_GDRD",
date_start=DateHandler.from_orf("C_J3_GDRD", 122, 1, pos="first"),
date_end=DateHandler.from_orf("C_J3_GDRD", 122, 154, pos="last"),
orf="C_J3_GDRD",
time="time",
longitude="LONGITUDE",
latitude="LATITUDE",
)
ad = NadirData(source=reader)
For more details, see the CLSTableReader class.
OCTANT CLS TableMeasure data reader.
Parameters
----------
name
Table's name.
ges_table_dir
Path of the GES_TABLE_DIR to use.
date_start
Starting date of the interval we're working on.
date_end
Ending date of the interval we're working on.
select_clip
Selection clip allowing to work on a subset of the source's data.
select_shape
Shape file, GeoDataFrame or Geometry to select.
data_cleaner
Data cleaning applied just after the reader.
This cleaning might consist of sorting, duplication removal or removing indexes
in order to keep them increasing.
orf
Source's indexer.
reference_track
Reference track.
time
Time field.
longitude
Longitude field.
latitude
Latitude field.
CLS in-situ Tables
For more details, see the CLSTableInSituReader class.
OCTANT TableInSitu data reader.
Parameters
----------
sensor_type
Type of the in situ sensor.
sensor_name
Name of the in-situ sensor.
ges_table_dir
Path of the GES_TABLE_DIR to use.
date_start
Starting date of the interval we're working on.
date_end
Ending date of the interval we're working on.
select_clip
Selection clip allowing to work on a subset of the source's data.
select_shape
Shape file, GeoDataFrame or Geometry to select.
data_cleaner
Data cleaning applied just after the reader.
This cleaning might consist of sorting, duplication removal or removing indexes
in order to keep them increasing.
orf
Source's indexer.
reference_track
Reference track.
time
Time field.
longitude
Longitude field.
latitude
Latitude field.
In-situ tables can also be read by providing a dictionary containing the
sensor_type and sensor_name keys to the source parameter.
ZCollection datasets
The ZDatasetReader class allows working with ZCollection datasets.
from casys import SwathData
from casys.readers import ZDatasetReader
import os
# Instantiate your Zdataset reader
reader = ZDatasetReader(
data=zds,
time="time",
longitude="LONGITUDE",
latitude="LATITUDE",
)
# Create your SwathData object
ad_ds = SwathData(source=reader)
For more details, see the ZDatasetReader class.
Reader for the zcollection.Dataset format.
Parameters
----------
data
ZCollection Dataset.
data_path
Zcollection path.
backend_fields
List of fields (variables) to read.
backend_kwargs
kwargs to provide to the backend when using a data_path to load the data.
date_start
Starting date of the interval we're working on.
date_end
Ending date of the interval we're working on.
select_clip
Selection clip allowing to work on a subset of the source's data.
select_shape
Shape file, GeoDataFrame or Geometry to select.
data_cleaner
Data cleaning applied just after the reader.
This cleaning might consist of sorting, duplication removal or removing indexes
in order to keep them increasing.
orf
Source's indexer.
reference_track
Reference track.
time
Time field.
longitude
Longitude field.
latitude
Latitude field.
swath_lines
Swath main dimension.
swath_pixels
Swath cross_track dimension.
cycle_number
The cycle number field.
pass_number
The pass number field.
longitude_nadir
The nadir's longitude field.
latitude_nadir
The nadir's latitude field.
cross_track_distance
Cross track distance field.
ZCollection collections
The ZCollectionReader class allows working with ZCollection collections.
from casys import SwathData
from casys.readers import ZCollectionReader
import os
# Instantiate your Zcollection reader
reader = ZCollectionReader(
data_path=os.path.join(os.environ["RESOURCES_DIR"], "my_collection"),
time="time",
longitude="longitude",
latitude="latitude",
)
# Create your SwathData object
ad_ds = SwathData(source=reader)
For more details, see the ZCollectionReader class.
Reader for a ZCollection Collection.
Parameters
----------
collection
Collection.
data_path
Collection path.
backend_fields
List of fields (variables) to read.
backend_kwargs
Kwargs dictionary to pass to the underlying collection.
date_start
Starting date of the interval we're working on.
date_end
Ending date of the interval we're working on.
select_clip
Selection clip allowing to work on a subset of the source's data.
select_shape
Shape file, GeoDataFrame or Geometry to select.
data_cleaner
Data cleaning applied just after the reader.
This cleaning might consist of sorting, duplication removal or removing indexes
in order to keep them increasing.
orf
Source's indexer.
reference_track
Reference track.
time
Time field.
longitude
Longitude field.
latitude
Latitude field.
swath_lines
Swath main dimension.
swath_pixels
Swath cross_track dimension.
cycle_number
The cycle number field.
pass_number
The pass number field.
longitude_nadir
The nadir's longitude field.
latitude_nadir
The nadir's latitude field.
cross_track_distance
Cross track distance field.
Swot_calval collections
The ScCollectionReader class allows working with swot_calval collections.
from casys import SwathData
from casys.readers import ScCollectionReader
import os
# Instantiate your Sc Collection reader
reader = ScCollectionReader(
data_path=os.path.join(os.environ["RESOURCES_DIR"], "my_collection"),
time="time",
longitude="longitude",
latitude="latitude",
)
# Create your SwathData object
ad_ds = SwathData(source=reader)
For more details, see the ScCollectionReader class.
Reader for a swot_calval Collection.
Parameters
----------
collection
Collection.
data_path
Collection path.
backend_fields
List of fields (variables) to read.
backend_kwargs
Kwargs dictionary to pass to the underlying collection.
date_start
Starting date of the interval we're working on.
date_end
Ending date of the interval we're working on.
select_clip
Selection clip allowing to work on a subset of the source's data.
select_shape
Shape file, GeoDataFrame or Geometry to select.
data_cleaner
Data cleaning applied just after the reader.
This cleaning might consist of sorting, duplication removal or removing indexes
in order to keep them increasing.
orf
Source's indexer.
reference_track
Reference track.
time
Time field.
longitude
Longitude field.
latitude
Latitude field.
swath_lines
Swath main dimension.
swath_pixels
Swath cross_track dimension.
cycle_number
The cycle number field.
pass_number
The pass number field.
longitude_nadir
The nadir's longitude field.
latitude_nadir
The nadir's latitude field.
cross_track_distance
Cross track distance field.
Multi-readers
The MultiReader class allows working on a set of readers as if they were a single
data source. Multi-readers' fields can use fields from any of their readers as
their source.
Warning
* Any index not present in the reference's reader will be ignored.
* Reference indexes missing from a reader's data are filled with numpy.nan.
* Reader indexes missing from the reference's data are ignored.
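These alignment rules can be sketched in plain Python; align_on_reference below is a hypothetical helper written for illustration, not part of casys:

```python
import math

def align_on_reference(reference_index, reader_data):
    """Align a reader's {index: value} mapping on the reference index.

    Reference indexes missing from the reader are filled with NaN; reader
    indexes absent from the reference are dropped (illustrative helper).
    """
    return {idx: reader_data.get(idx, math.nan) for idx in reference_index}

reference = [0, 1, 2, 3]
reader = {1: 10.0, 2: 20.0, 5: 50.0}  # index 5 is not in the reference

aligned = align_on_reference(reference, reader)
```

Indexes 0 and 3 end up as NaN, while the reader's extra index 5 disappears from the result, mirroring the warning above.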
Reader allowing to read from a set of readers.
The first reader is used as reference (for time and coordinates).
Fields from all readers are available and prefixed by the provided markers.
Parameters
----------
readers
List of readers.
markers
List of field's prefixes for each reader.
Defaults to ``Sx_`` with x being the reader's number.
tolerance
Gap tolerance used to fill missing indexes from a reader when aligning it
on the reference reader's index (defaults to 0).
date_start
Starting date of the interval we're working on.
date_end
Ending date of the interval we're working on.
select_clip
Selection clip allowing to work on a subset of the source's data.
select_shape
Shape file, GeoDataFrame or Geometry to select.
data_cleaner
Data cleaning applied just after the reader.
This cleaning might consist of sorting, duplication removal or removing indexes
in order to keep them increasing.
orf
Source's indexer.
reference_track
Reference track.
time
Time field.
longitude
Longitude field.
latitude
Latitude field.
A MultiReader can be used:
* when working with multiple sources having the same (or close enough) indexes
* when working with multiple sources interpolated on the same reference track
Example: Sentinel 6 HR vs LR
This example uses a MultiReader in order to work with fields coming from the
Sentinel 6 HR and LR storage.
from casys import NadirData, DateHandler, Field
from casys.readers import MultiReader, CLSTableReader
start = DateHandler.from_orf(orf="C_S6A_LR", cycle_nb=41, pass_nb=1, pos="first")
end = DateHandler.from_orf(orf="C_S6A_LR", cycle_nb=44, pass_nb=254, pos="last")
r1 = CLSTableReader(name="TABLE_C_S6A_LR_B")
r2 = CLSTableReader(name="TABLE_C_S6A_HR_B")
ad = NadirData(
source=MultiReader(
readers=[r1, r2],
markers=["LR_", "HR_"],
date_start=start,
date_end=end,
orf="C_S6A_LR",
time="time",
longitude="LONGITUDE",
latitude="LATITUDE",
)
)
ad.show_fields(containing="RANGE.ALTI")
Name | Description | Unit |
---|---|---|
LR_RANGE.ALTI | All instrumental corrections included, i.e. distance antenna-COG, USO drift correction, internal path correction, Doppler corre | m |
LR_RANGE.ALTI.B2 | All instrumental corrections included, i.e. distance antenna-COG, USO drift correction, internal path correction, Doppler corre | m |
LR_RANGE.ALTI.CORR_GEO | The geographical correction parameter provides the range correction for the acrosstrack shift induced geographical variations. | m |
LR_RANGE.ALTI.CORR_GEO_MLE3 | The geographical correction parameter provides the range correction for the acrosstrack shift induced geographical variations. | m |
LR_RANGE.ALTI.CORR_GEO_NR | The geographical correction parameter provides the range correction for the acrosstrack shift induced geographical variations. | m |
LR_RANGE.ALTI.MLE3 | All instrumental corrections included, i.e. distance antenna-COG, USO drift correction, internal path correction, Doppler corre | m |
LR_RANGE.ALTI.NR | All instrumental corrections included, i.e. distance antenna-COG, USO drift correction, internal path correction, Doppler corre | m |
HR_RANGE.ALTI | All instrumental corrections included, i.e. distance antenna-COG, USO drift correction, internal path correction, Doppler corre | m |
HR_RANGE.ALTI.CORR_GEO | The geographical correction parameter provides the range correction for the acrosstrack shift induced geographical variations. | m |
HR_RANGE.ALTI.CORR_GEO_NR | The geographical correction parameter provides the range correction for the acrosstrack shift induced geographical variations. | m |
HR_RANGE.ALTI.NR | All instrumental corrections included, i.e. distance antenna-COG, USO drift correction, internal path correction, Doppler corre | m |
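The LR_/HR_ prefixes above replace the default markers (Sx_ with x the reader's number). The prefixing scheme itself is simple and can be sketched as follows; default_markers and prefix_fields are illustrative helpers, not casys functions:

```python
def default_markers(n_readers):
    """Default marker for reader number x is 'Sx_' (1-based)."""
    return [f"S{i}_" for i in range(1, n_readers + 1)]

def prefix_fields(fields, marker):
    """Prefix every field name of a reader with its marker."""
    return [marker + name for name in fields]

markers = default_markers(2)                     # default markers for 2 readers
lr = prefix_fields(["RANGE.ALTI"], "LR_")        # custom marker, as in the example
```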
Example: Sentinel 6 vs Jason 3
This example uses a MultiReader in order to work with fields coming from Sentinel 6
and Jason 3, interpolated on a common reference track.
from casys import NadirData, DateHandler, Field
from casys.readers import MultiReader, CLSTableReader
start_s6 = DateHandler.from_orf(orf="C_S6A_LR", cycle_nb=44, pass_nb=1, pos="first")
end_s6 = DateHandler.from_orf(orf="C_S6A_LR", cycle_nb=44, pass_nb=254, pos="last")
start_j3 = DateHandler.from_orf(orf="C_J3", cycle_nb=219, pass_nb=1, pos="first")
end_j3 = DateHandler.from_orf(orf="C_J3", cycle_nb=219, pass_nb=254, pos="last")
# Each reader has its own ORF (important for the along track interpolation)
r1 = CLSTableReader(
name="TABLE_C_S6A_LR_B", orf="C_S6A_LR", date_start=start_s6, date_end=end_s6
)
r2 = CLSTableReader(
name="TABLE_C_J3_B", orf="C_J3", date_start=start_j3, date_end=end_j3
)
ad = NadirData(
source=MultiReader(
readers=[r1, r2],
markers=["S6_", "J3_"],
time="time",
longitude="LONGITUDE",
latitude="LATITUDE",
reference_track="J3",
)
)
ad.show_fields(containing="RANGE.ALTI")
Name | Description | Unit |
---|---|---|
S6_RANGE.ALTI | All instrumental corrections included, i.e. distance antenna-COG, USO drift correction, internal path correction, Doppler corre | m |
S6_RANGE.ALTI.B2 | All instrumental corrections included, i.e. distance antenna-COG, USO drift correction, internal path correction, Doppler corre | m |
S6_RANGE.ALTI.CORR_GEO | The geographical correction parameter provides the range correction for the acrosstrack shift induced geographical variations. | m |
S6_RANGE.ALTI.CORR_GEO_MLE3 | The geographical correction parameter provides the range correction for the acrosstrack shift induced geographical variations. | m |
S6_RANGE.ALTI.CORR_GEO_NR | The geographical correction parameter provides the range correction for the acrosstrack shift induced geographical variations. | m |
S6_RANGE.ALTI.MLE3 | All instrumental corrections included, i.e. distance antenna-COG, USO drift correction, internal path correction, Doppler corre | m |
S6_RANGE.ALTI.NR | All instrumental corrections included, i.e. distance antenna-COG, USO drift correction, internal path correction, Doppler corre | m |
J3_RANGE.ALTI | All instrumental corrections included, i.e. distance antenna-COG, USO drift correction, internal path correction, Doppler corre | m |
J3_RANGE.ALTI.ADAPTIVE | All instrumental corrections included, i.e. distance antenna-COG (cog_corr), USO drift correction (uso_corr), internal path cor | m |
J3_RANGE.ALTI.B2 | All instrumental corrections included, i.e. distance antenna-COG, USO drift correction, internal path correction, Doppler corre | m |
J3_RANGE.ALTI.MLE3 | All instrumental corrections included, i.e. distance antenna-COG, USO drift correction, internal path correction, Doppler corre | m |
Data selection
Using clip conditions
Data can be selected using a clip condition through the select_clip parameter.
from casys import NadirData, DateHandler
# Selecting data having:
# 27 <= LATITUDE <= 46 and -10 <= LONGITUDE <= 40
# with LONGITUDE being normalized between -180, 180
selection = """is_bounded(27, LATITUDE, 46) &&
is_bounded(-10, deg_normalize(-180, LONGITUDE), 40)"""
ad_sel = NadirData(
date_start=DateHandler.from_orf("C_J3_GDRD", 125, 1, pos="first"),
date_end=DateHandler.from_orf("C_J3_GDRD", 125, 254, pos="last"),
source="TABLE_C_J3_B_GDRD",
orf="C_J3_GDRD",
time="time",
longitude="LONGITUDE",
latitude="LATITUDE",
select_clip=selection
)
ad_sel
AltiData | |
---|---|
Source | TABLE_C_J3_B_GDRD |
Source type | CLSTableReader |
ORF | C_J3_GDRD |
Time name | time |
Longitude | LONGITUDE |
Latitude | LATITUDE |
Period start | 2019-06-30 23:26:04.865308 (Cycle: 125 - Pass: 1) |
Period end | 2019-07-10 21:24:35.317009 (Cycle: 125 - Pass: 254) |
Selection clip | is_bounded(27, LATITUDE, 46) && is_bounded(-10, deg_normalize(-180, LONGITUDE), 40) |
Selection shape | False |
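The semantics of the is_bounded and deg_normalize clip functions used above can be reproduced in plain Python. These are illustrative re-implementations of the documented behavior, not the CLIP engine itself:

```python
def deg_normalize(origin, lon):
    """Normalize a longitude into [origin, origin + 360)."""
    return (lon - origin) % 360 + origin

def is_bounded(low, value, high):
    """True when low <= value <= high."""
    return low <= value <= high

# Same selection as the clip above, applied to a single point:
lat, lon = 35.0, 350.0  # 350 deg East is -10 deg once normalized to [-180, 180)
selected = is_bounded(27, lat, 46) and is_bounded(
    -10, deg_normalize(-180, lon), 40
)
```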
Using shape files
Data can be selected on a geographical area using the select_shape parameter.
from casys import NadirData, DateHandler
# Shape file selection
import os
import geopandas as gpd
shape = gpd.read_file(os.path.join(os.environ["RESOURCES_DIR"], "Med", "Med.shp"))
shape = shape.set_crs(crs="EPSG:4326")
ad_sel = NadirData(
date_start=DateHandler.from_orf("C_J3_GDRD", 125, 1, pos="first"),
date_end=DateHandler.from_orf("C_J3_GDRD", 125, 254, pos="last"),
source="TABLE_C_J3_B_GDRD",
orf="C_J3_GDRD",
time="time",
longitude="LONGITUDE",
latitude="LATITUDE",
select_shape=shape
)
ad_sel
AltiData | |
---|---|
Source | TABLE_C_J3_B_GDRD |
Source type | CLSTableReader |
ORF | C_J3_GDRD |
Time name | time |
Longitude | LONGITUDE |
Latitude | LATITUDE |
Period start | 2019-06-30 23:26:04.865308 (Cycle: 125 - Pass: 1) |
Period end | 2019-07-10 21:24:35.317009 (Cycle: 125 - Pass: 254) |
Selection clip | is_bounded(29.133297406790106, LATITUDE, 46.17221385443534) && is_bounded(-5.632114355511645, deg_normalize(-180, LONGITUDE), 37.53314064518961) |
Selection shape | True |
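Note that in the summary above, the shape selection also produced a bounding-box clip derived from the shape's extent. That idea can be sketched as follows; build_bbox_clip is a hypothetical helper written for illustration, not casys code:

```python
def build_bbox_clip(lon_min, lat_min, lon_max, lat_max):
    """Build a CLIP string bounding a shape's extent, as in the summary above."""
    return (
        f"is_bounded({lat_min}, LATITUDE, {lat_max}) && "
        f"is_bounded({lon_min}, deg_normalize(-180, LONGITUDE), {lon_max})"
    )

# Bounding box of a small polygon given as (lon, lat) pairs
polygon = [(-5.6, 29.1), (37.5, 29.1), (37.5, 46.2), (-5.6, 46.2)]
lons = [p[0] for p in polygon]
lats = [p[1] for p in polygon]
clip = build_bbox_clip(min(lons), min(lats), max(lons), max(lats))
```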
Along track data interpolation
The reference_track parameter enables every piece of data read by NadirData to be
interpolated on the provided track. It can be set in several ways:
* by giving the name of an existing reference track (these tracks can be listed
using the show_theoretical_tracks() method)
* by giving the path of a reference track netCDF file
* by providing a TheoreticalTrack
* by providing a StandardAlongTrack
from casys import NadirData, DateHandler
ad_interp = NadirData(
date_start=DateHandler.from_orf("C_J3_GDRD", 122, 1, pos="first"),
date_end=DateHandler.from_orf("C_J3_GDRD", 122, 1, pos="last"),
source="TABLE_C_J3_B_GDRD",
orf="C_J3_GDRD",
reference_track="J3",
time="time",
longitude="LONGITUDE",
latitude="LATITUDE",
)
The interpolation method can be chosen for each field through its interpolation
parameter.
from casys import Field
var_sla_linear = Field(
name="SLA_linear",
source="ORBIT.ALTI - RANGE.ALTI - MEAN_SEA_SURFACE.MODEL.CNESCLS15",
unit="m",
interpolation="linear",
)
var_sla_nearest = Field(
name="SLA_nearest",
source="ORBIT.ALTI - RANGE.ALTI - MEAN_SEA_SURFACE.MODEL.CNESCLS15",
unit="m",
interpolation="nearest",
)
var_sla_spline = Field(
name="SLA_spline",
source="ORBIT.ALTI - RANGE.ALTI - MEAN_SEA_SURFACE.MODEL.CNESCLS15",
unit="m",
interpolation={"mode": "smoothing_spline", "noise_level": 0.1},
)
ad_interp.add_raw_data(name="SLA linear", field=var_sla_linear)
ad_interp.add_histogram(name="SLA nearest hist", x=var_sla_nearest, res_x="auto")
ad_interp.add_time_stat(
name="SLA spline (10 minutes)", field=var_sla_spline, freq="10min"
)
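The difference between the linear and nearest interpolation modes used above can be illustrated in plain Python; interp_linear and interp_nearest are illustrative helpers, not the casys implementation:

```python
import bisect

def interp_nearest(x, xs, ys):
    """Nearest-neighbour interpolation on sorted abscissas xs."""
    i = bisect.bisect_left(xs, x)
    if i == 0:
        return ys[0]
    if i == len(xs):
        return ys[-1]
    # Pick whichever neighbour is closer to x
    return ys[i] if (xs[i] - x) < (x - xs[i - 1]) else ys[i - 1]

def interp_linear(x, xs, ys):
    """Linear interpolation on sorted abscissas xs."""
    i = bisect.bisect_left(xs, x)
    if i == 0:
        return ys[0]
    if i == len(xs):
        return ys[-1]
    t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
    return ys[i - 1] + t * (ys[i] - ys[i - 1])

xs, ys = [0.0, 1.0, 2.0], [0.0, 10.0, 20.0]
```

With this track, interp_linear(0.5, xs, ys) blends the neighbouring values while interp_nearest(0.4, xs, ys) snaps to the closest sample.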
Merging NadirData
It is possible to create several NadirData objects, add raw data, compute them,
and finally merge them into a single object able to use both sets of fields.
Note
Alignment can only be made on the "time" dimension. Latitudes and longitudes are
considered to be shared.
If that's not the case, diagnostics making use of latitude and longitude must be
considered invalid.
Merge the provided data container's raw data into the current one.
* If both the provided data and the current data include the INTERPOLATED_INDEX
field, the data are considered already aligned; otherwise the provided data are
interpolated or re-indexed along the time dimension using the provided method.
* Longitudes from the provided data are replaced by the current ones.
* Latitudes from the provided data are replaced by the current ones.
Interpolation uses xarray's interp_like method.
Reindexing uses xarray's reindex_like method.
Parameters
----------
data
Data container object containing computed raw data to merge.
interp
Whether to interpolate (True) or just reindex (False) the data.
method
* Interpolation methods:
* {"linear", "nearest"} for multidimensional arrays
* {"linear", "nearest", "zero", "slinear", "quadratic", "cubic"} for
1-dimensional arrays
* "linear" is used by default
* Reindexing methods:
* None (default): don't fill gaps
* pad / ffill: propagate last valid index value forward
* backfill / bfill: propagate next valid index value backward
* nearest: use the nearest valid index value
kwargs
Additional parameters passed to the underlying xarray function.
* Interpolation options:
* Additional keyword passed to scipy’s interpolator.
* Reindexing options:
* tolerance: Maximum distance between original and new labels for
inexact matches. The values of the index at the matching locations
must satisfy the equation abs(index[indexer] - target) <= tolerance.
* fill_value: Value to use for newly missing values.
See the example presented in this notebook.
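The reindexing methods listed above behave like the following plain-Python sketch; reindex is an illustrative helper mirroring the described semantics, not xarray's implementation:

```python
import math

def reindex(index, data, target, method=None, tolerance=math.inf):
    """Reindex {index: value} data on a new target index.

    method: None (don't fill gaps), 'ffill', 'bfill' or 'nearest'.
    """
    out = []
    for t in target:
        if t in data:
            out.append(data[t])
            continue
        if method is None:
            out.append(math.nan)  # no gap filling
            continue
        if method == "ffill":
            candidates = [i for i in index if i <= t]
            pick = max(candidates) if candidates else None
        elif method == "bfill":
            candidates = [i for i in index if i >= t]
            pick = min(candidates) if candidates else None
        else:  # nearest
            pick = min(index, key=lambda i: abs(i - t))
        if pick is None or abs(pick - t) > tolerance:
            out.append(math.nan)  # outside the allowed tolerance
        else:
            out.append(data[pick])
    return out

index = [0, 2, 4]
data = {0: 1.0, 2: 2.0, 4: 3.0}
values = reindex(index, data, [1, 2, 3], method="ffill")
```

With ffill, targets 1 and 3 take the last valid values (from indexes 0 and 2), while a tight tolerance turns inexact matches into NaN.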