casys.editing

OCTANT-NG editing component classes.

Classes

Clip(expression, *[, has_fields, ...])

Class for managing CLIPs:

ClipCondition(clip[, name])

Basic clip condition: no parameter is required.

ClipDataset(dset)

Class allowing the use of clips on an xarray Dataset.

Editing([parameters])

Editing algorithm.

EditingComponent(name, ...)

An EditingComponent contains a list of invalidity conditions of the same type and an invalidity indicator's value.

EditingComponentSchema(*[, only, exclude, ...])

EditingParameters(editing_sequence[, ...])

Parameters for Editing algorithm.

EditingParametersSchema(*[, only, exclude, ...])

EditingResults(invalidity_indicator, components)

Editing algorithms results.

InvalidityCondition(clip[, name])

InvalidityCondition base class.

InvalidityConditionBaseSchema(*[, only, ...])

InvalidityCondition base schema.

InvalidityConditionGenericSchema

alias of RegistryGenericSchema

IterativeFilter(clip, nbr_iter, filter, ...)

Iterative processing, invalidating at each step the values "too far" from the filtered values.

RobustMeanStd(clip, nbr_iter, threshold[, name])

Iterative processing, invalidating at each step the values "too far" from the global mean.

StatisticsByPass(clip, nbr_min_pts, ...[, name])

Invalidate passes where mean or std statistics exceed the thresholds.
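
The RobustMeanStd summary above describes an iterative rejection of values lying "too far" from the global mean. A minimal sketch of that idea follows, using only the standard library. The function name, the use of the population standard deviation, and the rejection rule `|x - mean| > threshold * std` are assumptions made for illustration; the actual casys implementation is not shown in this reference and may differ.

```python
import statistics


def robust_mean_std_mask(values, nbr_iter, threshold):
    """Return an invalidity mask (True = invalid) after nbr_iter rejection steps.

    Hypothetical helper: at each step the mean and standard deviation are
    recomputed on the values still considered valid.
    """
    invalid = [False] * len(values)
    for _ in range(nbr_iter):
        kept = [v for v, bad in zip(values, invalid) if not bad]
        if len(kept) < 2:
            break  # not enough points left to compute statistics
        mean = statistics.fmean(kept)
        std = statistics.pstdev(kept)
        for i, v in enumerate(values):
            # Assumed rejection rule: distance to the mean above threshold * std
            if not invalid[i] and abs(v - mean) > threshold * std:
                invalid[i] = True
    return invalid


values = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 50.0]
mask = robust_mean_std_mask(values, nbr_iter=2, threshold=2.0)
```

Here the outlier 50.0 is rejected during the first pass, and the second pass, run on the remaining values, rejects nothing more.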

class casys.editing.Clip(expression, *, has_fields=True, full_case_insensitive=False, allow_similar=False, default_expression='DV', name=None)

Bases: object

Class for managing CLIPs:

  • parsing

  • executing

  • converting result

  • dumping code

The expression is compiled immediately.

Parameters:
  • expression (str) – The expression to compile into a runnable (by a ClipRuntime class) AST tree.

  • has_fields (bool) – If True, fields are allowed during parsing. If False, they are not, and the result of the CLIP is then a constant.

  • full_case_insensitive (bool) –

    CLIPs are case-insensitive, but, by default, field names are case-sensitive.

    If this parameter is True, field names are also case-insensitive (take this into account at run time: they are all converted to uppercase).

  • allow_similar (bool) –

    If full_case_insensitive is False (the default), setting this parameter to True allows two field names that differ only by case to be accepted. Otherwise, such names raise an error.

    Not used if full_case_insensitive is True.

  • default_expression (str) – If expression is an empty string or is None, this value is taken. If this value is also an empty string or None, an exception is raised.

  • name (Optional[str]) – Optional. The name of the CLIP. It appears in error messages, helping to identify it.
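
The interaction between full_case_insensitive and allow_similar can be pictured with the sketch below. It is not the casys parser: the function name and the use of ValueError are assumptions, and only the rules stated in the parameter descriptions above are modelled.

```python
def normalize_field_names(names, *, full_case_insensitive=False, allow_similar=False):
    """Return the set of field names kept under the documented case rules.

    Hypothetical helper, for illustration only.
    """
    if full_case_insensitive:
        # All names are folded to uppercase; allow_similar is not used.
        return frozenset(name.upper() for name in names)
    seen = {}  # uppercase form -> first spelling encountered
    for name in names:
        first = seen.setdefault(name.upper(), name)
        if first != name and not allow_similar:
            # Two spellings that differ only by case are rejected.
            raise ValueError(f"field names {first!r} and {name!r} differ only by case")
    return frozenset(names)
```

With `full_case_insensitive=True`, `"swh"` and `"SWH"` collapse into a single field; with the defaults, the same pair is rejected unless `allow_similar=True`.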

static as_boolean(value)
Parameters:

value (Union[str, int, float, ndarray]) – Result of execution of a CLIP.

Return type:

bool

Returns:

False if:
  • value is an empty string

  • The numeric array has a zero size

  • The numeric array contains at least one default value or a zero.

True in all other cases, i.e.:
  • value is a non-empty string

  • value has at least one element and all elements are different from the default value and from zero.
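
The truth rules listed above can be restated as a small pure-Python sketch. It works on plain lists rather than numpy arrays, and it assumes the default value (DV) is NaN, as the as_string description suggests; the function name is hypothetical.

```python
import math


def as_boolean_sketch(value):
    """Apply the documented truth rules to a string, a number or a list of numbers."""
    if isinstance(value, str):
        return value != ""  # only the empty string is False
    values = list(value) if isinstance(value, (list, tuple)) else [value]
    if not values:
        return False  # zero-size array
    # False as soon as one element is zero or the assumed default value (NaN)
    return all(v != 0 and not math.isnan(v) for v in values)
```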

static as_date(value, *, reference=np.datetime64('2001-01-01T00:00:00.000000'), unit='us')
Parameters:
  • value (Union[str, int, float, ndarray]) – Result of execution of a CLIP.

  • reference (Union[datetime, datetime64]) –

    For non-absolute dates (timedelta, numbers), this is the reference taken.

    If None, the reference taken is the current time.

  • unit (str) – The unit for the datetime (as for numpy.datetime64: “D”, “s”, “us”…)

Return type:

ndarray

Returns:

If value is a string, it is converted into a date. If value is numerical, it is considered as a timedelta from the given reference, expressed in unit. If the result is a timedelta, it is converted by adding the given reference. If the result is already a datetime, nothing is done.
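
The conversion rules above can be sketched on scalar values with the standard datetime module. This is an illustration, not the casys code: the real method works on numpy arrays, and the function name, the unit table and the restriction to ISO-formatted strings are assumptions.

```python
from datetime import datetime, timedelta

# Assumed mapping from numpy-style unit codes to timedelta keyword arguments
_UNIT_KWARG = {"D": "days", "s": "seconds", "ms": "milliseconds", "us": "microseconds"}


def as_date_sketch(value, *, reference=datetime(2001, 1, 1), unit="us"):
    """Convert a scalar to a datetime following the documented rules."""
    if reference is None:
        reference = datetime.now()  # documented fallback: current time
    if isinstance(value, datetime):
        return value  # already a date: nothing to do
    if isinstance(value, timedelta):
        return reference + value  # time delta: add the reference
    if isinstance(value, str):
        return datetime.fromisoformat(value)  # string: parsed as a date
    # number: a count of `unit` elapsed since the reference
    return reference + timedelta(**{_UNIT_KWARG[unit]: value})
```

For example, `as_date_sketch(86400, unit="s")` yields 2 January 2001, one day after the default reference.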

static as_numerical(value, *, reference=np.datetime64('2001-01-01T00:00:00.000000'), unit='us')

Converts a value into a numeric vector.

Parameters:
  • value (Union[str, int, float, ndarray]) – Result of execution of a CLIP.

  • reference (datetime64) –

    The reference date to convert dates to timedelta and then to numbers.

    If None, the reference taken is the current time.

  • unit (str) – The unit of the dates to convert into numbers.

Return type:

ndarray

Returns:

value converted into a numerical array.

If value is already a numeric vector (i.e. is not a string), it is returned as is. If value is a string, it is evaluated as a number.

static as_string(value, *, reference=np.datetime64('2001-01-01T00:00:00.000000'), unit='us')

Converts a value into a string.

Parameters:
  • value (Union[str, int, float, ndarray]) – Result of execution of a CLIP.

  • reference (datetime64) –

    The reference date to convert time delta values into dates and then convert into string.

    If None, the reference taken is the current time.

  • unit (str) – The unit of the dates to convert into string.

Return type:

Union[str_, ndarray]

Returns:

If value is already a string, it is returned as is. If the value is a numerical array, each value is converted into a number. If that is not possible, the value is DV (np.nan).

If values are dates or time deltas, they are printed in ISO format.

static as_time_delta(value, *, reference=np.datetime64('2001-01-01T00:00:00.000000'), unit='us')
Parameters:
  • value (Union[str, int, float, ndarray]) – Result of execution of a CLIP.

  • reference (Union[datetime, datetime64]) –

    The reference date to convert date values into timedelta.

    If None, the reference taken is the current time.

  • unit (str) – The unit of the numbers used as time delta. It also is the unit of the result (as for numpy.timedelta64: “D”, “s”, “us”…).

Return type:

ndarray

Returns:

If value is a date or array of dates, the result is obtained by subtracting the reference date.

If value is numerical, it is considered as an already built time delta from the reference date, expressed in unit.

If values are already time deltas, they are returned as is (except if unit is different, in this case time deltas are converted).

If values are strings, they are parsed to build dates. If the string describes a time delta, it is returned; otherwise the reference is taken as the base and subtracted.
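
The reverse conversion can be sketched the same way on scalars. The function name and unit table are assumptions, unit conversion of existing time deltas is not modelled, and strings are only handled when they are ISO-formatted dates (the documented parsing of strings that describe a time delta is left out).

```python
from datetime import datetime, timedelta

# Assumed mapping from numpy-style unit codes to timedelta keyword arguments
_UNIT_KWARG = {"D": "days", "s": "seconds", "ms": "milliseconds", "us": "microseconds"}


def as_time_delta_sketch(value, *, reference=datetime(2001, 1, 1), unit="us"):
    """Convert a scalar to a timedelta following the documented rules."""
    if reference is None:
        reference = datetime.now()  # documented fallback: current time
    if isinstance(value, timedelta):
        return value  # already a time delta
    if isinstance(value, datetime):
        return value - reference  # date: subtract the reference
    if isinstance(value, str):
        return datetime.fromisoformat(value) - reference  # ISO date strings only
    # number: already a count of `unit` from the reference
    return timedelta(**{_UNIT_KWARG[unit]: value})
```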

property clip_type: ClipType

Returns: The type of the clip (see ClipType).

execute(fields=None)

Execute the clip returning its result.

Depending on the CLIP, the result can be a string, a datetime64, a timedelta64, a number, or an array of those types.

Parameters:

fields (Optional[ClipFields]) – Object instance used to provide the field values for the clip. If None, each reference to a field in the clip will raise an exception.

Return type:

Union[str, int, float, ndarray]

Returns:

Result of the CLIP execution

Raises:

ClipFieldError – Only if fields were authorized in the CLIP and:
  • the ClipFields object is None and a field reference is in the CLIP.

  • a field is used in the CLIP and the ClipFields object does not know this field and is not able (or does not want) to provide a default value.

property expression: str

The current expression used for the CLIP.

Returns:

String as given to constructor or set_expression()

property fields: frozenset[str]

Set of fields used in the CLIP.

Returns:

Set of field names. May be empty.

property formatted: str

Dumps the compiled expression.

It shows the compiled expression structure in a YAML-like format (the compiled structure cannot be rebuilt from the result of this property).

Returns:

Formatted version of the compiled (code) expression.

class casys.editing.ClipCondition(clip, name=None)

Bases: InvalidityCondition

Basic clip condition: no parameter is required.

clip

Clip definition

Examples

clip: ice_flag :== 1

check_fields_validity(data, dim_name, comp_name)

Check that all the fields only depend on the expected dimension.

Parameters:
  • data (Dataset) – Data to edit

  • dim_name (str) – Dataset dimension involved in the editing

  • comp_name (str) – Name of the EditingComponent

property clip
compute_invalidity_mask(data, dim_name)

Compute invalidity mask.

Parameters:
  • data (Dataset) – Dataset

  • dim_name (str) – Dataset dimension involved in the editing

Return type:

ndarray

Returns:

Invalidity mask (True or False array)

has_pass_requirement()

Whether this condition has pass requirements.

Return type:

bool

Returns:

True if the condition requires full passes data, False otherwise.

property name
required_fields()

Returns the list of required data fields to apply this condition.

Return type:

list[str]

Returns:

List of required data fields to apply this condition.

class casys.editing.ClipDataset(dset)

Bases: ClipFields

Class allowing the use of clips on an xarray Dataset.

Parameters:

dset (Dataset)

default_value(name)

Should be called if the field to be returned does not exist.

Parameters:

name (str) –

Name of the field.

Warning

As CLIPs are case-insensitive the name is always uppercase.

Return type:

Union[ndarray, str]

Returns:

Default value for this field

Raises:

ClipFieldError – If field cannot have a default value

field_list()

Returns the list of known fields.

Return type:

list[str]

Returns:

list of field names

get_value(name)

The main method of the class. Called by clip runtime to provide the named field value.

Parameters:

name

Name of the field to retrieve.

Warning

CLIPs are case-insensitive except for field names, although this can be changed to make them fully case-insensitive. In that case, the field names are all UPPERCASE.

Returns:

Field value.

Raises:

ClipFieldError – If there is a problem with retrieving the field. In the base class, it is when the field is unknown.

normalize_field(value)

Utility method that should be used in all derived classes.

Takes a value and returns a “normalized” value. i.e. a numpy string or a numpy array (of strings, datetime, timedelta, float).

Parameters:

value (Any) – The field value to check. If None an empty numpy array is returned.

Return type:

Union[ndarray, str_]

Returns:

Value unchanged or normalized.

Raises:

ClipFieldError – If the value is not a string and cannot be coerced into a numpy array.

update(**kwargs)

This method updates the ClipFields dict-like structure.

Parameters:

kwargs – Used as a dict-like structure to provide field values, as in the constructor: each key is a CLIP field name and each value is the corresponding field value.
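
To make the field-provider contract described above more concrete, here is a minimal dict-backed stand-in. It does not subclass the real ClipFields and does not use the real ClipFieldError; both `DictFields` and `FieldLookupError` are hypothetical names, and only the documented method names (get_value, default_value, field_list, update) are mirrored.

```python
class FieldLookupError(KeyError):
    """Stand-in for ClipFieldError, whose definition is not shown here."""


class DictFields:
    """Hypothetical dict-backed provider mirroring the documented methods."""

    def __init__(self, **kwargs):
        self._values = {}
        self.update(**kwargs)

    def update(self, **kwargs):
        # key is the field name, value is the field value
        self._values.update(kwargs)

    def field_list(self):
        return list(self._values)

    def default_value(self, name):
        # This sketch never provides a default value for a missing field.
        raise FieldLookupError(f"no default value for field {name!r}")

    def get_value(self, name):
        if name in self._values:
            return self._values[name]
        return self.default_value(name)  # called when the field does not exist


fields = DictFields(ice_flag=[0, 1])
fields.update(swh=[1.5, 2.0])
```

A derived class wrapping a dataset would presumably implement get_value by looking the name up in the dataset variables instead of a plain dict.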

class casys.editing.Editing(parameters=None)

Bases: BaseAlgorithm[EditingParameters, EditingResults]

Editing algorithm.

Parameters:

parameters (Optional[TypeVar(AlgoParamType, bound= BaseParameters)])

ALGORITHM_NAME: ClassVar[str] = 'editing'
PARAMETER_CLASS: ClassVar[type[BaseParameters]] = None
SCHEMA_CLASS

alias of EditingParametersSchema

check_required_fields(data)

Check whether the provided data contains all required fields.

Parameters:

data (Dataset) – Data as a dataset.

Raises:

AlgorithmError – if provided data do not contain a required field.

classmethod get_class(name)

Access registered algorithm classes by their name.

Parameters:

name (str)

Return type:

type[BaseAlgorithm]

property parameters: AlgoParamType

Algorithm parameters accessor.

classmethod register()

Registering mechanism allowing the algorithm to be used in programs such as ong-the-one.

required_fields()

Returns the list of required data fields to run this algorithm.

Return type:

list[str]

Returns:

List of required data fields to run this algorithm.

run(data)

Main function to run algorithm.

Parameters:

data (Dataset) – Data to edit

Return type:

EditingResults

Returns:

Invalidity indicator

class casys.editing.EditingComponent(name, invalidity_conditions, value)

Bases: object

An EditingComponent contains a list of invalidity conditions of the same type and an invalidity indicator’s value. Each component tags the data it invalidates with its value.

Parameters:
  • name (str) – Name of Component

  • invalidity_conditions (list[InvalidityCondition]) – Conditions where the values will be considered as invalid

  • value (int) – Invalidity indicator’s value.

check_fields_validity(data, dim_name)

Check that all the fields only depend on the expected dimension.

Parameters:
  • data (Dataset) – Data to edit

  • dim_name (str) – Dataset dimension involved in the editing

has_pass_requirement()

Whether this component has pass requirements.

If any condition requires full-pass data, the whole component must be given full-pass data.

Return type:

bool

Returns:

True if the component requires full passes data, False otherwise.

property invalidity_conditions: list[InvalidityCondition]
property name: str
required_fields()

Returns the list of required data fields to apply this editing.

Return type:

list[str]

Returns:

List of required data fields to apply this editing.

update_indicator(data, invalidity_indicator, dim_name)

Update invalidity indicator.

Parameters:
  • data (Dataset) – Data to edit

  • invalidity_indicator (ndarray) – Invalidity indicator

  • dim_name (str) – Dataset dimension involved in the editing

Return type:

EditingComponentResult
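
As a rough illustration of how a sequence of components might tag data, the sketch below applies two components in turn to a plain list. Several points are assumptions, not documented behaviour: an indicator of 0 means "valid", a point already tagged by an earlier component keeps its value, and the masks of one component are combined with a logical OR. The function name is hypothetical.

```python
def update_indicator_sketch(indicator, masks, value):
    """Tag with `value` every still-valid point flagged by at least one mask."""
    updated = list(indicator)
    for i, current in enumerate(indicator):
        # Assumption: only points not yet invalidated (0) can be tagged
        if current == 0 and any(mask[i] for mask in masks):
            updated[i] = value
    return updated


ice_flag = [0, 1, 1, 0]
swh = [1.2, 0.8, 35.0, 40.0]

indicator = [0, 0, 0, 0]
# First component (value 1): invalidate points where ice_flag equals 1
indicator = update_indicator_sketch(indicator, [[f == 1 for f in ice_flag]], value=1)
# Second component (value 2): invalidate points where swh exceeds 30
indicator = update_indicator_sketch(indicator, [[h > 30.0 for h in swh]], value=2)
```

Under these assumptions, the third point keeps the value 1 from the first component even though the second component also flags it.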

property value: int
class casys.editing.EditingComponentSchema(*, only=None, exclude=(), many=None, context=None, load_only=(), dump_only=(), partial=None, unknown=None)

Bases: BaseSchema

class Meta

Bases: object

Options object for a Schema.

Example usage:

from marshmallow import Schema


class MySchema(Schema):
    class Meta:
        fields = ("id", "email", "date_created")
        exclude = ("password", "secret_attribute")

A note on type checking

Type checkers will only check the attributes of the Meta class if you explicitly subclass marshmallow.Schema.Meta.

from marshmallow import Schema


class MySchema(Schema):
    # Not checked by type checkers
    class Meta:
        additional = True


class MySchema2(Schema):
    # Type checkers will check attributes
    class Meta(Schema.Meta):
        additional = True  # Incompatible types in assignment

Removed in version 3.0.0b7: Remove strict.

Added in version 3.0.0b12: Add unknown.

Changed in version 3.0.0b17: Rename dateformat to datetimeformat.

Added in version 3.9.0: Add timeformat.

Changed in version 3.26.0: Deprecate ordered. Field order is preserved by default.

additional: ClassVar[tuple[str, ...] | list[str]]

Fields to include in addition to the explicitly declared fields. additional and fields are mutually-exclusive options.

dateformat: ClassVar[str]

Default format for Date fields.

datetimeformat: ClassVar[str]

Default format for DateTime fields.

dump_only: ClassVar[tuple[str, ...] | list[str]]

Fields to skip during deserialization (read-only fields).

exclude: ClassVar[tuple[str, ...] | list[str]]

Fields to exclude in the serialized result. Nested fields can be represented with dot delimiters.

fields: ClassVar[tuple[str, ...] | list[str]]

Fields to include in the (de)serialized result

include: ClassVar[dict[str, Field]]

Dictionary of additional fields to include in the schema. It is usually better to define fields as class variables, but you may need to use this option, e.g., if your fields are Python keywords.

index_errors: ClassVar[bool]

If True, errors dictionaries will include the index of invalid items in a collection.

load_only: ClassVar[tuple[str, ...] | list[str]]

Fields to exclude from serialized results

many: ClassVar[bool]

Whether data should be (de)serialized as a collection by default.

ordered: ClassVar[bool]

If True, the result of Schema.dump is a collections.OrderedDict.

register: ClassVar[bool]

Whether to register the Schema with marshmallow’s internal class registry. Must be True if you intend to refer to this Schema by class name in Nested fields. Only set this to False when memory usage is critical. Defaults to True.

render_module: Any

Module to use for loads and dumps. Defaults to json from the standard library.

timeformat: ClassVar[str]

Default format for Time fields.

unknown: ClassVar[str]

Whether to exclude, include, or raise an error for unknown fields in the data. Use EXCLUDE, INCLUDE or RAISE.

OPTIONS_CLASS

alias of SchemaOpts

TYPE_MAPPING: dict[type, type[Field]] = {<class 'bool'>: <class 'marshmallow.fields.Boolean'>, <class 'bytes'>: <class 'marshmallow.fields.String'>, <class 'datetime.date'>: <class 'marshmallow.fields.Date'>, <class 'datetime.datetime'>: <class 'marshmallow.fields.DateTime'>, <class 'datetime.time'>: <class 'marshmallow.fields.Time'>, <class 'datetime.timedelta'>: <class 'marshmallow.fields.TimeDelta'>, <class 'decimal.Decimal'>: <class 'marshmallow.fields.Decimal'>, <class 'float'>: <class 'marshmallow.fields.Float'>, <class 'int'>: <class 'marshmallow.fields.Integer'>, <class 'list'>: <class 'marshmallow.fields.Raw'>, <class 'set'>: <class 'marshmallow.fields.Raw'>, <class 'str'>: <class 'marshmallow.fields.String'>, <class 'tuple'>: <class 'marshmallow.fields.Raw'>, <class 'uuid.UUID'>: <class 'marshmallow.fields.UUID'>}
property dict_class: type[dict]

dict type to return when serializing.

dump(obj, *, many=None)

Serialize an object to native Python data types according to this Schema’s fields.

Parameters:
  • obj (Any) – The object to serialize.

  • many (bool | None) – Whether to serialize obj as a collection. If None, the value for self.many is used.

Returns:

Serialized data

Added in version 1.0.0.

Changed in version 3.0.0b7: This method returns the serialized data rather than a (data, errors) duple. A ValidationError is raised if obj is invalid.

Changed in version 3.0.0rc9: Validation no longer occurs upon serialization.

dumps(obj, *args, many=None, **kwargs)

Same as dump(), except return a JSON-encoded string.

Parameters:
  • obj (Any) – The object to serialize.

  • many (bool | None) – Whether to serialize obj as a collection. If None, the value for self.many is used.

Returns:

A json string

Added in version 1.0.0.

Changed in version 3.0.0b7: This method returns the serialized data rather than a (data, errors) duple. A ValidationError is raised if obj is invalid.

error_messages: dict[str, str] = {}

Overrides for default schema-level error messages

fields: dict[str, Field]

Dictionary mapping field_names -> Field objects

classmethod from_dict(fields, *, name='GeneratedSchema')

Generate a Schema class given a dictionary of fields.

from marshmallow import Schema, fields

PersonSchema = Schema.from_dict({"name": fields.Str()})
print(PersonSchema().load({"name": "David"}))  # => {'name': 'David'}

Generated schemas are not added to the class registry and therefore cannot be referred to by name in Nested fields.

Parameters:
  • fields (dict[str, Field]) – Dictionary mapping field names to field instances.

  • name (str) – Optional name for the class, which will appear in the repr for the class.

Return type:

type[Schema]

Added in version 3.0.0.

get_attribute(obj, attr, default)

Defines how to pull values from an object to serialize.

Changed in version 3.0.0a1: Changed position of obj and attr.

classmethod get_model()

Return the model associated to this schema.

Return type:

type[TypeVar(T)] | None

Returns:

Model associated to this schema.

handle_error(error, data, *, many, **kwargs)

Custom error handler function for the schema.

Parameters:
  • error (ValidationError) – The ValidationError raised during (de)serialization.

  • data (Any) – The original input data.

  • many (bool) – Value of many on dump or load.

  • partial – Value of partial on load.

Changed in version 3.0.0rc9: Receives many and partial (on deserialization) as keyword arguments.

load(data, *, many=None, partial=None, unknown=None)

Deserialize a data structure to an object defined by this Schema’s fields.

Parameters:
  • data (Union[Mapping[str, Any], Iterable[Mapping[str, Any]]]) – The data to deserialize.

  • many (bool | None) – Whether to deserialize data as a collection. If None, the value for self.many is used.

  • partial (Union[bool, Sequence[str], AbstractSet[str], None]) – Whether to ignore missing fields and not require any fields declared. Propagates down to Nested fields as well. If its value is an iterable, only missing fields listed in that iterable will be ignored. Use dot delimiters to specify nested fields.

  • unknown (str | None) – Whether to exclude, include, or raise an error for unknown fields in the data. Use EXCLUDE, INCLUDE or RAISE. If None, the value for self.unknown is used.

Returns:

Deserialized data

Added in version 1.0.0.

Changed in version 3.0.0b7: This method returns the deserialized data rather than a (data, errors) duple. A ValidationError is raised if invalid data are passed.

loads(json_data, *, many=None, partial=None, unknown=None, **kwargs)

Same as load(), except it uses marshmallow.Schema.Meta.render_module to deserialize the passed string before passing data to load().

Parameters:
  • json_data (str | bytes | bytearray) – A string of the data to deserialize.

  • many (bool | None) – Whether to deserialize obj as a collection. If None, the value for self.many is used.

  • partial (Union[bool, Sequence[str], AbstractSet[str], None]) – Whether to ignore missing fields and not require any fields declared. Propagates down to Nested fields as well. If its value is an iterable, only missing fields listed in that iterable will be ignored. Use dot delimiters to specify nested fields.

  • unknown (str | None) – Whether to exclude, include, or raise an error for unknown fields in the data. Use EXCLUDE, INCLUDE or RAISE. If None, the value for self.unknown is used.

Returns:

Deserialized data

Added in version 1.0.0.

Changed in version 3.0.0b7: This method returns the deserialized data rather than a (data, errors) duple. A ValidationError is raised if invalid data are passed.

make_object(data, **_)
on_bind_field(field_name, field_obj)

Hook to modify a field when it is bound to the Schema.

No-op by default.

Parameters:
  • field_name (str)

  • field_obj (Field)

Return type:

None

opts: typing.Any = <marshmallow.schema.SchemaOpts object>
post_dump(data, original_data, **_)
pre_dump(data, **_)
pre_load(data, **_)
set_class

alias of OrderedSet

validate(data, *, many=None, partial=None)

Validate data against the schema, returning a dictionary of validation errors.

Parameters:
  • data (Union[Mapping[str, Any], Iterable[Mapping[str, Any]]]) – The data to validate.

  • many (bool | None) – Whether to validate data as a collection. If None, the value for self.many is used.

  • partial (Union[bool, Sequence[str], AbstractSet[str], None]) – Whether to ignore missing fields and not require any fields declared. Propagates down to Nested fields as well. If its value is an iterable, only missing fields listed in that iterable will be ignored. Use dot delimiters to specify nested fields.

Return type:

dict[str, list[str]]

Returns:

A dictionary of validation errors.

Added in version 1.1.0.

class casys.editing.EditingParameters(editing_sequence, log_file=None, dim_name=None)

Bases: BaseParameters

Parameters for Editing algorithm.

Parameters:
  • editing_sequence (list[EditingComponent]) – The sequence of components (list of algorithms) defining the invalidity parameters.

  • log_file (LogFile | None) – Logging file.

  • dim_name (str | None) – Dimension name.

dim_name: str | None = None
editing_sequence: list[EditingComponent]
log_file: LogFile | None = None
required_fields()

Returns the list of required data fields to run this algorithm.

Return type:

list[str]

Returns:

List of required data fields to run this algorithm.

class casys.editing.EditingParametersSchema(*, only=None, exclude=(), many=None, context=None, load_only=(), dump_only=(), partial=None, unknown=None)

Bases: BaseSchema

class Meta

Bases: object

Options object for a Schema.

Example usage:

from marshmallow import Schema


class MySchema(Schema):
    class Meta:
        fields = ("id", "email", "date_created")
        exclude = ("password", "secret_attribute")

A note on type checking

Type checkers will only check the attributes of the Meta class if you explicitly subclass marshmallow.Schema.Meta.

from marshmallow import Schema


class MySchema(Schema):
    # Not checked by type checkers
    class Meta:
        additional = True


class MySchema2(Schema):
    # Type checkers will check attributes
    class Meta(Schema.Meta):
        additional = True  # Incompatible types in assignment

Removed in version 3.0.0b7: Remove strict.

Added in version 3.0.0b12: Add unknown.

Changed in version 3.0.0b17: Rename dateformat to datetimeformat.

Added in version 3.9.0: Add timeformat.

Changed in version 3.26.0: Deprecate ordered. Field order is preserved by default.

additional: ClassVar[tuple[str, ...] | list[str]]

Fields to include in addition to the explicitly declared fields. additional and fields are mutually-exclusive options.

dateformat: ClassVar[str]

Default format for Date fields.

datetimeformat: ClassVar[str]

Default format for DateTime fields.

dump_only: ClassVar[tuple[str, ...] | list[str]]

Fields to skip during deserialization (read-only fields).

exclude: ClassVar[tuple[str, ...] | list[str]]

Fields to exclude in the serialized result. Nested fields can be represented with dot delimiters.

fields: ClassVar[tuple[str, ...] | list[str]]

Fields to include in the (de)serialized result

include: ClassVar[dict[str, Field]]

Dictionary of additional fields to include in the schema. It is usually better to define fields as class variables, but you may need to use this option, e.g., if your fields are Python keywords.

index_errors: ClassVar[bool]

If True, errors dictionaries will include the index of invalid items in a collection.

load_only: ClassVar[tuple[str, ...] | list[str]]

Fields to exclude from serialized results

many: ClassVar[bool]

Whether data should be (de)serialized as a collection by default.

ordered: ClassVar[bool]

If True, the result of Schema.dump is a collections.OrderedDict.

register: ClassVar[bool]

Whether to register the Schema with marshmallow’s internal class registry. Must be True if you intend to refer to this Schema by class name in Nested fields. Only set this to False when memory usage is critical. Defaults to True.

render_module: Any

Module to use for loads and dumps. Defaults to json from the standard library.

timeformat: ClassVar[str]

Default format for Time fields.

unknown: ClassVar[str]

Whether to exclude, include, or raise an error for unknown fields in the data. Use EXCLUDE, INCLUDE or RAISE.

OPTIONS_CLASS

alias of SchemaOpts

TYPE_MAPPING: dict[type, type[Field]] = {<class 'bool'>: <class 'marshmallow.fields.Boolean'>, <class 'bytes'>: <class 'marshmallow.fields.String'>, <class 'datetime.date'>: <class 'marshmallow.fields.Date'>, <class 'datetime.datetime'>: <class 'marshmallow.fields.DateTime'>, <class 'datetime.time'>: <class 'marshmallow.fields.Time'>, <class 'datetime.timedelta'>: <class 'marshmallow.fields.TimeDelta'>, <class 'decimal.Decimal'>: <class 'marshmallow.fields.Decimal'>, <class 'float'>: <class 'marshmallow.fields.Float'>, <class 'int'>: <class 'marshmallow.fields.Integer'>, <class 'list'>: <class 'marshmallow.fields.Raw'>, <class 'set'>: <class 'marshmallow.fields.Raw'>, <class 'str'>: <class 'marshmallow.fields.String'>, <class 'tuple'>: <class 'marshmallow.fields.Raw'>, <class 'uuid.UUID'>: <class 'marshmallow.fields.UUID'>}
property dict_class: type[dict]

dict type to return when serializing.

dump(obj, *, many=None)

Serialize an object to native Python data types according to this Schema’s fields.

Parameters:
  • obj (Any) – The object to serialize.

  • many (bool | None) – Whether to serialize obj as a collection. If None, the value for self.many is used.

Returns:

Serialized data

Added in version 1.0.0.

Changed in version 3.0.0b7: This method returns the serialized data rather than a (data, errors) duple. A ValidationError is raised if obj is invalid.

Changed in version 3.0.0rc9: Validation no longer occurs upon serialization.

dumps(obj, *args, many=None, **kwargs)

Same as dump(), except return a JSON-encoded string.

Parameters:
  • obj (Any) – The object to serialize.

  • many (bool | None) – Whether to serialize obj as a collection. If None, the value for self.many is used.

Returns:

A json string

Added in version 1.0.0.

Changed in version 3.0.0b7: This method returns the serialized data rather than a (data, errors) duple. A ValidationError is raised if obj is invalid.

error_messages: dict[str, str] = {}

Overrides for default schema-level error messages

fields: dict[str, Field]

Dictionary mapping field_names -> Field objects

classmethod from_dict(fields, *, name='GeneratedSchema')

Generate a Schema class given a dictionary of fields.

from marshmallow import Schema, fields

PersonSchema = Schema.from_dict({"name": fields.Str()})
print(PersonSchema().load({"name": "David"}))  # => {'name': 'David'}

Generated schemas are not added to the class registry and therefore cannot be referred to by name in Nested fields.

Parameters:
  • fields (dict[str, Field]) – Dictionary mapping field names to field instances.

  • name (str) – Optional name for the class, which will appear in the repr for the class.

Return type:

type[Schema]

Added in version 3.0.0.

get_attribute(obj, attr, default)

Defines how to pull values from an object to serialize.

Changed in version 3.0.0a1: Changed position of obj and attr.

classmethod get_model()

Return the model associated to this schema.

Return type:

type[TypeVar(T)] | None

Returns:

Model associated to this schema.

handle_error(error, data, *, many, **kwargs)

Custom error handler function for the schema.

Parameters:
  • error (ValidationError) – The ValidationError raised during (de)serialization.

  • data (Any) – The original input data.

  • many (bool) – Value of many on dump or load.

  • partial – Value of partial on load.

Changed in version 3.0.0rc9: Receives many and partial (on deserialization) as keyword arguments.

load(data, *, many=None, partial=None, unknown=None)

Deserialize a data structure to an object defined by this Schema’s fields.

Parameters:
  • data (Union[Mapping[str, Any], Iterable[Mapping[str, Any]]]) – The data to deserialize.

  • many (bool | None) – Whether to deserialize data as a collection. If None, the value for self.many is used.

  • partial (Union[bool, Sequence[str], AbstractSet[str], None]) – Whether to ignore missing fields and not require any fields declared. Propagates down to Nested fields as well. If its value is an iterable, only missing fields listed in that iterable will be ignored. Use dot delimiters to specify nested fields.

  • unknown (str | None) – Whether to exclude, include, or raise an error for unknown fields in the data. Use EXCLUDE, INCLUDE or RAISE. If None, the value for self.unknown is used.

Returns:

Deserialized data

Added in version 1.0.0.

Changed in version 3.0.0b7: This method returns the deserialized data rather than a (data, errors) duple. A ValidationError is raised if invalid data are passed.

loads(json_data, *, many=None, partial=None, unknown=None, **kwargs)

Same as load(), except it uses marshmallow.Schema.Meta.render_module to deserialize the passed string before passing data to load().

Parameters:
  • json_data (str | bytes | bytearray) – A string of the data to deserialize.

  • many (bool | None) – Whether to deserialize the data as a collection. If None, the value for self.many is used.

  • partial (Union[bool, Sequence[str], AbstractSet[str], None]) – Whether to ignore missing fields and not require any fields declared. Propagates down to Nested fields as well. If its value is an iterable, only missing fields listed in that iterable will be ignored. Use dot delimiters to specify nested fields.

  • unknown (str | None) – Whether to exclude, include, or raise an error for unknown fields in the data. Use EXCLUDE, INCLUDE or RAISE. If None, the value for self.unknown is used.

Returns:

Deserialized data

Added in version 1.0.0.

Changed in version 3.0.0b7: This method returns the deserialized data rather than a (data, errors) tuple. A ValidationError is raised if invalid data are passed.

make_object(data, **_)
on_bind_field(field_name, field_obj)

Hook to modify a field when it is bound to the Schema.

No-op by default.

Parameters:
  • field_name (str)

  • field_obj (Field)

Return type:

None

opts: typing.Any = <marshmallow.schema.SchemaOpts object>
post_dump(data, original_data, **_)
pre_dump(data, **_)
pre_load(data, **_)
set_class

alias of OrderedSet

validate(data, *, many=None, partial=None)

Validate data against the schema, returning a dictionary of validation errors.

Parameters:
  • data (Union[Mapping[str, Any], Iterable[Mapping[str, Any]]]) – The data to validate.

  • many (bool | None) – Whether to validate data as a collection. If None, the value for self.many is used.

  • partial (Union[bool, Sequence[str], AbstractSet[str], None]) – Whether to ignore missing fields and not require any fields declared. Propagates down to Nested fields as well. If its value is an iterable, only missing fields listed in that iterable will be ignored. Use dot delimiters to specify nested fields.

Return type:

dict[str, list[str]]

Returns:

A dictionary of validation errors.

Added in version 1.1.0.

class casys.editing.EditingResults(invalidity_indicator, components)

Bases: BaseResults

Editing algorithms results.

Parameters:
  • invalidity_indicator (ndarray)

  • components (list[EditingComponentResult])

components: list[EditingComponentResult]
get_value(field)

Returns the value of the field specified by its name.

Parameters:

field (str)

Return type:

Any

invalidity_indicator: ndarray
class casys.editing.InvalidityCondition(clip, name=None)

Bases: ABC

InvalidityCondition base class.

Parameters:
  • clip (Clip | str)

  • name (str | None)

check_fields_validity(data, dim_name, comp_name)

Check that all the fields only depend on the expected dimension.

Parameters:
  • data (Dataset) – Data to edit

  • dim_name (str) – Dataset dimension involved in the editing

  • comp_name (str) – Name of the EditingComponent
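
The kind of check described here can be pictured with a small sketch. It is an illustration under stated assumptions, not the casys implementation: a plain dict mapping field names to dimension tuples stands in for the xarray Dataset, and fields_with_extra_dims is a hypothetical helper.

```python
def fields_with_extra_dims(field_dims, required, dim_name):
    """Return the required fields that depend on a dimension other than dim_name."""
    return [
        name
        for name in required
        if set(field_dims[name]) - {dim_name}  # any dimension besides dim_name
    ]


# Hypothetical layout: two 1-D fields along "time" and one 2-D field.
field_dims = {"sla": ("time",), "swh": ("time",), "waveform": ("time", "gate")}

offending = fields_with_extra_dims(field_dims, ["sla", "waveform"], "time")
```

A real check would presumably raise an error naming the component (comp_name) when the list is not empty; that behaviour is an assumption.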

property clip
abstract compute_invalidity_mask(data, dim_name)

Compute invalidity mask.

Parameters:
  • data (Dataset) – Dataset

  • dim_name (str) – Dataset dimension involved in the editing

Return type:

ndarray

Returns:

Invalidity mask (True or False array)

has_pass_requirement()

Whether this condition has pass requirements.

Return type:

bool

Returns:

True if the condition requires data covering full passes, False otherwise.

property name
required_fields()

Returns the list of required data fields to apply this condition.

Return type:

list[str]

Returns:

List of required data fields to apply this condition.

class casys.editing.InvalidityConditionBaseSchema(*, only=None, exclude=(), many=None, context=None, load_only=(), dump_only=(), partial=None, unknown=None)

Bases: RegistryBaseSchema

InvalidityCondition base schema.

class Meta

Bases: object

Options object for a Schema.

Example usage:

from marshmallow import Schema


class MySchema(Schema):
    class Meta:
        fields = ("id", "email", "date_created")
        exclude = ("password", "secret_attribute")

A note on type checking

Type checkers will only check the attributes of the Meta class if you explicitly subclass marshmallow.Schema.Meta.

from marshmallow import Schema


class MySchema(Schema):
    # Not checked by type checkers
    class Meta:
        additional = True


class MySchema2(Schema):
    # Type checkers will check attributes
    class Meta(Schema.Meta):
        additional = True  # Incompatible types in assignment

Removed in version 3.0.0b7: Remove strict.

Added in version 3.0.0b12: Add unknown.

Changed in version 3.0.0b17: Rename dateformat to datetimeformat.

Added in version 3.9.0: Add timeformat.

Changed in version 3.26.0: Deprecate ordered. Field order is preserved by default.

additional: ClassVar[tuple[str, ...] | list[str]]

Fields to include in addition to the explicitly declared fields. The additional and fields options are mutually exclusive.

dateformat: ClassVar[str]

Default format for Date fields.

datetimeformat: ClassVar[str]

Default format for DateTime fields.

dump_only: ClassVar[tuple[str, ...] | list[str]]

Fields to skip during deserialization (read-only fields).

exclude: ClassVar[tuple[str, ...] | list[str]]

Fields to exclude in the serialized result. Nested fields can be represented with dot delimiters.

fields: ClassVar[tuple[str, ...] | list[str]]

Fields to include in the (de)serialized result

include: ClassVar[dict[str, Field]]

Dictionary of additional fields to include in the schema. It is usually better to define fields as class variables, but you may need to use this option, e.g., if your fields are Python keywords.

index_errors: ClassVar[bool]

If True, errors dictionaries will include the index of invalid items in a collection.

load_only: ClassVar[tuple[str, ...] | list[str]]

Fields to skip during serialization (write-only fields).

many: ClassVar[bool]

Whether data should be (de)serialized as a collection by default.

ordered: ClassVar[bool]

If True, Schema.dump returns a collections.OrderedDict.

register: ClassVar[bool]

Whether to register the Schema with marshmallow’s internal class registry. Must be True if you intend to refer to this Schema by class name in Nested fields. Only set this to False when memory usage is critical. Defaults to True.

render_module: Any

Module to use for loads and dumps. Defaults to json from the standard library.

timeformat: ClassVar[str]

Default format for Time fields.

unknown: ClassVar[str]

Whether to exclude, include, or raise an error for unknown fields in the data. Use EXCLUDE, INCLUDE or RAISE.

OPTIONS_CLASS

alias of SchemaOpts

TYPE_MAPPING: dict[type, type[Field]] = {<class 'bool'>: <class 'marshmallow.fields.Boolean'>, <class 'bytes'>: <class 'marshmallow.fields.String'>, <class 'datetime.date'>: <class 'marshmallow.fields.Date'>, <class 'datetime.datetime'>: <class 'marshmallow.fields.DateTime'>, <class 'datetime.time'>: <class 'marshmallow.fields.Time'>, <class 'datetime.timedelta'>: <class 'marshmallow.fields.TimeDelta'>, <class 'decimal.Decimal'>: <class 'marshmallow.fields.Decimal'>, <class 'float'>: <class 'marshmallow.fields.Float'>, <class 'int'>: <class 'marshmallow.fields.Integer'>, <class 'list'>: <class 'marshmallow.fields.Raw'>, <class 'set'>: <class 'marshmallow.fields.Raw'>, <class 'str'>: <class 'marshmallow.fields.String'>, <class 'tuple'>: <class 'marshmallow.fields.Raw'>, <class 'uuid.UUID'>: <class 'marshmallow.fields.UUID'>}
classmethod clear_registry()

Clear everything from this schema’s registry.

property dict_class: type[dict]

dict type to return when serializing.

dump(obj, *, many=None)

Serialize an object to native Python data types according to this Schema’s fields.

Parameters:
  • obj (Any) – The object to serialize.

  • many (bool | None) – Whether to serialize obj as a collection. If None, the value for self.many is used.

Returns:

Serialized data

Added in version 1.0.0.

Changed in version 3.0.0b7: This method returns the serialized data rather than a (data, errors) tuple. A ValidationError is raised if obj is invalid.

Changed in version 3.0.0rc9: Validation no longer occurs upon serialization.

dumps(obj, *args, many=None, **kwargs)

Same as dump(), except return a JSON-encoded string.

Parameters:
  • obj (Any) – The object to serialize.

  • many (bool | None) – Whether to serialize obj as a collection. If None, the value for self.many is used.

Returns:

A JSON string

Added in version 1.0.0.

Changed in version 3.0.0b7: This method returns the serialized data rather than a (data, errors) tuple. A ValidationError is raised if obj is invalid.

error_messages: dict[str, str] = {}

Overrides for default schema-level error messages

fields: dict[str, Field]

Dictionary mapping field_names -> Field objects

classmethod from_dict(fields, *, name='GeneratedSchema')

Generate a Schema class given a dictionary of fields.

from marshmallow import Schema, fields

PersonSchema = Schema.from_dict({"name": fields.Str()})
print(PersonSchema().load({"name": "David"}))  # => {'name': 'David'}

Generated schemas are not added to the class registry and therefore cannot be referred to by name in Nested fields.

Parameters:
  • fields (dict[str, Field]) – Dictionary mapping field names to field instances.

  • name (str) – Optional name for the class, which will appear in the repr for the class.

Return type:

type[Schema]

Added in version 3.0.0.

get_attribute(obj, attr, default)

Defines how to pull values from an object to serialize.

Changed in version 3.0.0a1: Changed position of obj and attr.

classmethod get_class(name: str) → type[RegistryBaseSchema]

Return the registered class associated with the provided name.

Parameters:

name (str) – Identifier of the schema.

Return type:

type[RegistryBaseSchema]

Returns:

Corresponding schema class.

classmethod get_model()

Return the model associated to this schema.

Return type:

type[TypeVar(T)] | None

Returns:

Model associated to this schema.

classmethod get_model_schema(model: type) → type[RegistryBaseSchema]

Return the registered class associated with the provided model.

Parameters:

model (type) – Identifier of the model.

Return type:

type[RegistryBaseSchema]

Returns:

Corresponding schema class.

classmethod get_type()

Schema’s ID.

Return type:

str

handle_error(error, data, *, many, **kwargs)

Custom error handler function for the schema.

Parameters:
  • error (ValidationError) – The ValidationError raised during (de)serialization.

  • data (Any) – The original input data.

  • many (bool) – Value of many on dump or load.

  • partial – Value of partial on load.

Changed in version 3.0.0rc9: Receives many and partial (on deserialization) as keyword arguments.

classmethod has_class(name: str) → bool

Test if the provided name is registered.

Parameters:

name (str) – Name of the class.

Return type:

bool

Returns:

True if a class with this name is registered, False otherwise.

load(data, *, many=None, partial=None, unknown=None)

Deserialize a data structure to an object defined by this Schema’s fields.

Parameters:
  • data (Union[Mapping[str, Any], Iterable[Mapping[str, Any]]]) – The data to deserialize.

  • many (bool | None) – Whether to deserialize data as a collection. If None, the value for self.many is used.

  • partial (Union[bool, Sequence[str], AbstractSet[str], None]) – Whether to ignore missing fields and not require any fields declared. Propagates down to Nested fields as well. If its value is an iterable, only missing fields listed in that iterable will be ignored. Use dot delimiters to specify nested fields.

  • unknown (str | None) – Whether to exclude, include, or raise an error for unknown fields in the data. Use EXCLUDE, INCLUDE or RAISE. If None, the value for self.unknown is used.

Returns:

Deserialized data

Added in version 1.0.0.

Changed in version 3.0.0b7: This method returns the deserialized data rather than a (data, errors) tuple. A ValidationError is raised if invalid data are passed.

loads(json_data, *, many=None, partial=None, unknown=None, **kwargs)

Same as load(), except it uses marshmallow.Schema.Meta.render_module to deserialize the passed string before passing data to load().

Parameters:
  • json_data (str | bytes | bytearray) – A string of the data to deserialize.

  • many (bool | None) – Whether to deserialize the data as a collection. If None, the value for self.many is used.

  • partial (Union[bool, Sequence[str], AbstractSet[str], None]) – Whether to ignore missing fields and not require any fields declared. Propagates down to Nested fields as well. If its value is an iterable, only missing fields listed in that iterable will be ignored. Use dot delimiters to specify nested fields.

  • unknown (str | None) – Whether to exclude, include, or raise an error for unknown fields in the data. Use EXCLUDE, INCLUDE or RAISE. If None, the value for self.unknown is used.

Returns:

Deserialized data

Added in version 1.0.0.

Changed in version 3.0.0b7: This method returns the deserialized data rather than a (data, errors) tuple. A ValidationError is raised if invalid data are passed.

make_object(data, **_)
on_bind_field(field_name, field_obj)

Hook to modify a field when it is bound to the Schema.

No-op by default.

Parameters:
  • field_name (str)

  • field_obj (Field)

Return type:

None

opts: typing.Any = <marshmallow.schema.SchemaOpts object>
post_dump(data, original_data, **_)
pre_dump(data, **_)
pre_load(data, **_)
classmethod register()

Register the current class.

classmethod register_schema(schema, exception)

Register the provided schema.

classmethod registry()

Returns a copy of the registry.

Return type:

dict[str, type[RegistryAbstractSchema]]

classmethod remove_registry(name)
Parameters:

name (str)

set_class

alias of OrderedSet

classmethod update_registry(schema)

Update current registry with the provided one. An error is raised if the same name is used twice.

Parameters:

schema (type[RegistryAbstractSchema]) – Schemas to register.
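
The registry behaviour described by has_class(), get_class(), registry() and update_registry() can be pictured with a toy name-to-class registry. This is a sketch of the general pattern only: the Registry class, its instance-based design and the ValueError it raises are assumptions, not the casys implementation.

```python
class Registry:
    """Toy name -> class registry (hypothetical stand-in)."""

    def __init__(self):
        self._classes = {}

    def register(self, name, cls):
        if name in self._classes:
            raise ValueError(f"name already registered: {name!r}")
        self._classes[name] = cls

    def has_class(self, name):
        return name in self._classes

    def get_class(self, name):
        return self._classes[name]

    def registry(self):
        # Return a copy so that callers cannot mutate the internal mapping.
        return dict(self._classes)

    def update_registry(self, other):
        # Refuse the whole update if any name would be used twice.
        clashes = sorted(set(self._classes) & set(other.registry()))
        if clashes:
            raise ValueError(f"names used twice: {clashes}")
        self._classes.update(other.registry())


class RobustSchema:  # placeholder classes to register
    pass


class FilterSchema:
    pass


main, extra = Registry(), Registry()
main.register("robust", RobustSchema)
extra.register("filter", FilterSchema)
main.update_registry(extra)

snapshot = main.registry()
snapshot.clear()  # only the copy is emptied, the registry itself is intact
```

Checking for clashes before updating keeps the registry unchanged when the update is rejected.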

validate(data, *, many=None, partial=None)

Validate data against the schema, returning a dictionary of validation errors.

Parameters:
  • data (Union[Mapping[str, Any], Iterable[Mapping[str, Any]]]) – The data to validate.

  • many (bool | None) – Whether to validate data as a collection. If None, the value for self.many is used.

  • partial (Union[bool, Sequence[str], AbstractSet[str], None]) – Whether to ignore missing fields and not require any fields declared. Propagates down to Nested fields as well. If its value is an iterable, only missing fields listed in that iterable will be ignored. Use dot delimiters to specify nested fields.

Return type:

dict[str, list[str]]

Returns:

A dictionary of validation errors.

Added in version 1.1.0.

validate_threshold(data, **_)
casys.editing.InvalidityConditionGenericSchema

alias of RegistryGenericSchema

class casys.editing.IterativeFilter(clip, nbr_iter, filter, threshold, std_coeff=1.0, const_coeff=0.0, name=None)

Bases: InvalidityCondition

Iterative processing, invalidating at each step the values “too far” from the filtered values. The filtered_values property gives the final filtered values.

Parameters:
  • clip (Clip | str) – Clip definition

  • nbr_iter (int) – Number of iterations. The effective number may be lower if no new outliers can be obtained.

  • filter (Filter) – Filter to apply.

  • threshold (int | float | str) – Outlier threshold. At each iteration, a value is invalidated as an outlier when |values - filter(values)| > (std_coeff * std + const_coeff) * threshold; each outlier is then replaced by its filtered value, filter(values). May be a numeric value or a clip.

  • std_coeff (int | float | str) – Coefficient attached to the standard deviation (std). May be a numeric value or a clip.

  • const_coeff (int | float | str) – Constant coefficient. May be a numeric value or a clip.

  • name (str | None)
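
The iteration described by these parameters can be sketched in plain Python. This is an illustration under stated assumptions, not the casys implementation: moving_median is a hypothetical stand-in for a Filter, std is assumed to be the standard deviation of the residuals values - filter(values), and threshold, std_coeff and const_coeff are assumed to be plain numbers rather than clips.

```python
from statistics import median, pstdev


def moving_median(values, half_width=1):
    """Hypothetical filter: centred moving median, window truncated at the edges."""
    n = len(values)
    return [
        median(values[max(0, i - half_width): min(n, i + half_width + 1)])
        for i in range(n)
    ]


def iterative_filter(values, nbr_iter, filt, threshold, std_coeff=1.0, const_coeff=0.0):
    """Return (invalidity mask, final filtered values) for a list of floats."""
    work = list(values)
    invalid = [False] * len(work)
    for _ in range(nbr_iter):
        filtered = filt(work)
        residuals = [v - f for v, f in zip(work, filtered)]
        std = pstdev(residuals)  # assumption: std of the residuals
        limit = (std_coeff * std + const_coeff) * threshold
        new = [i for i, r in enumerate(residuals) if abs(r) > limit and not invalid[i]]
        if not new:
            break  # no new outlier: fewer than nbr_iter iterations are run
        for i in new:
            invalid[i] = True
            work[i] = filtered[i]  # replace the outlier by its filtered value
    return invalid, filt(work)


values = [1.0, 1.1, 0.9, 10.0, 1.0, 1.2, 0.8]
mask, filtered = iterative_filter(values, nbr_iter=3, filt=moving_median, threshold=2.0)
```

In this run the spike at index 3 is flagged during the first iteration, and the second iteration finds no new outlier, so the loop stops early.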

check_fields_validity(data, dim_name, comp_name)

Check that all the fields only depend on the expected dimension.

Parameters:
  • data (Dataset) – Data to edit

  • dim_name (str) – Dataset dimension involved in the editing

  • comp_name (str) – Name of the EditingComponent

property clip
compute_invalidity_mask(data, dim_name)

Compute invalidity mask.

Parameters:
  • data (Dataset) – Dataset

  • dim_name (str) – Dataset dimension involved in the editing

Return type:

ndarray

Returns:

Invalidity mask (True or False array)

property const_coeff: int | float | str
property filter: Filter
property filtered_values
has_pass_requirement()

Whether this condition has pass requirements.

Return type:

bool

Returns:

True if the condition requires data covering full passes, False otherwise.

property name
property nbr_iter: int
required_fields()

Returns the list of required data fields to apply this condition.

Return type:

list[str]

Returns:

List of required data fields to apply this condition.

property std_coeff: int | float | str
property threshold: int | float | str
class casys.editing.RobustMeanStd(clip, nbr_iter, threshold, name=None)

Bases: InvalidityCondition

Iterative processing, invalidating at each step the values “too far” from the global mean.

Parameters:
  • clip (Clip | str) – Clip definition

  • nbr_iter (int) – Number of iterations

  • threshold (Union[int, float, str]) – At each iteration, a value is invalidated as an outlier when |values - mean| > threshold * std; the mean and std are then computed again.

  • name (Optional[str])
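
The iteration described by these parameters amounts to a form of iterative sigma clipping, which can be sketched in plain Python. This is an illustration under stated assumptions, not the casys implementation: the population standard deviation is used, statistics are recomputed on the values that are still valid, the loop stops early once no new outlier is found, and threshold is a plain number.

```python
from statistics import fmean, pstdev


def robust_mean_std(values, nbr_iter, threshold):
    """Return an invalidity mask (list of bool) for a list of floats."""
    invalid = [False] * len(values)
    for _ in range(nbr_iter):
        kept = [v for v, bad in zip(values, invalid) if not bad]
        if len(kept) < 2:
            break  # not enough points left to estimate a spread
        mean, std = fmean(kept), pstdev(kept)
        new = [
            i
            for i, v in enumerate(values)
            if not invalid[i] and abs(v - mean) > threshold * std
        ]
        if not new:
            break  # nothing left to reject
        for i in new:
            invalid[i] = True
    return invalid


values = [10.0, 10.2, 9.8, 10.1, 9.9, 50.0]
mask = robust_mean_std(values, nbr_iter=3, threshold=2.0)
```

Recomputing the statistics after each rejection matters: the outlier inflates the first std estimate, and only the second pass sees the true spread of the remaining values.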

check_fields_validity(data, dim_name, comp_name)

Check that all the fields only depend on the expected dimension.

Parameters:
  • data (Dataset) – Data to edit

  • dim_name (str) – Dataset dimension involved in the editing

  • comp_name (str) – Name of the EditingComponent

property clip
compute_invalidity_mask(data, dim_name)

Compute invalidity mask.

Parameters:
  • data (Dataset) – Dataset

  • dim_name (str) – Dataset dimension involved in the editing

Return type:

ndarray

Returns:

Invalidity mask (True or False array)

has_pass_requirement()

Whether this condition has pass requirements.

Return type:

bool

Returns:

True if the condition requires data covering full passes, False otherwise.

property name
property nbr_iter: int
required_fields()

Returns the list of required data fields to apply this condition.

Return type:

list[str]

Returns:

List of required data fields to apply this condition.

property threshold: int | float | str
class casys.editing.StatisticsByPass(clip, nbr_min_pts, threshold, orf, name=None)

Bases: InvalidityCondition

Invalidate passes where mean or std statistics exceed the thresholds.

Parameters:
  • clip (Clip | str) – Clip definition

  • nbr_min_pts (int) – Minimum number of points for a pass to be invalidated

  • threshold (dict[str, Any]) – Mean and std thresholds

  • orf (PassIndexer) – Orf description

  • name (Optional[str])
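
A per-pass rejection of this kind can be sketched in plain Python. This is an illustration under stated assumptions, not the casys implementation: a list of pass numbers stands in for what the PassIndexer provides, the threshold dictionary is assumed to use the keys "mean" and "std", the mean is compared in absolute value, and passes with fewer than nbr_min_pts points are left untouched.

```python
from collections import defaultdict
from statistics import fmean, pstdev


def statistics_by_pass(values, pass_numbers, nbr_min_pts, threshold):
    """Return an invalidity mask flagging every point of each rejected pass."""
    by_pass = defaultdict(list)
    for index, pass_number in enumerate(pass_numbers):
        by_pass[pass_number].append(index)

    invalid = [False] * len(values)
    for indices in by_pass.values():
        if len(indices) < nbr_min_pts:
            continue  # assumption: too few points, the pass is not invalidated
        sample = [values[i] for i in indices]
        mean, std = fmean(sample), pstdev(sample)
        # Assumption: the threshold dictionary uses the keys "mean" and "std".
        if abs(mean) > threshold["mean"] or std > threshold["std"]:
            for i in indices:
                invalid[i] = True
    return invalid


values = [0.1, -0.1, 0.0, 5.0, 5.2, 4.8, 9.0]
pass_numbers = [1, 1, 1, 2, 2, 2, 3]
mask = statistics_by_pass(
    values, pass_numbers, nbr_min_pts=2, threshold={"mean": 1.0, "std": 1.0}
)
```

Here pass 2 is rejected because of its mean, while pass 3 holds a single point and is therefore skipped.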

check_fields_validity(data, dim_name, comp_name)

Check that all the fields only depend on the expected dimension.

Parameters:
  • data (Dataset) – Data to edit

  • dim_name (str) – Dataset dimension involved in the editing

  • comp_name (str) – Name of the EditingComponent

property clip
compute_invalidity_mask(data, dim_name)

Compute invalidity mask.

Parameters:
  • data (Dataset) – Dataset

  • dim_name (str) – Dataset dimension involved in the editing

Return type:

ndarray

Returns:

Invalidity mask (True or False array)

has_pass_requirement()

Whether this condition has pass requirements.

Return type:

bool

Returns:

True if the condition requires data covering full passes, False otherwise.

property name
property nbr_min_pts: int
property orf: PassIndexer
required_fields()

Returns the list of required data fields to apply this condition.

Return type:

list[str]

Returns:

List of required data fields to apply this condition.

property threshold: dict[str, Any]