casys.editing

OCTANT-NG editing component classes.

Classes

`Clip`(expression, *[, has_fields, ...])	Class for managing CLIPs:
`ClipCondition`(clip[, name])	Basic clip condition: no parameter is required.
`ClipDataset`(dset)	Class allowing the use of clips on a xarray Dataset.
`Editing`([parameters])	Editing algorithm.
`EditingComponent`(name, ...)	An EditingComponent contains a list of invalidity conditions of the same type and an invalidity indicator's value.
`EditingComponentSchema`(*[, only, exclude, ...])
`EditingParameters`(editing_sequence[, ...])	Parameters for Editing algorithm.
`EditingParametersSchema`(*[, only, exclude, ...])
`EditingResults`(invalidity_indicator, components)	Editing algorithms results.
`InvalidityCondition`(clip[, name])	InvalidityCondition base class.
`InvalidityConditionBaseSchema`(*[, only, ...])	InvalidityCondition base schema.
`InvalidityConditionGenericSchema`	alias of `RegistryGenericSchema`
`IterativeFilter`(clip, nbr_iter, filter, ...)	Iterative processing, invalidating at each step the values "too far" from the filtered values.
`RobustMeanStd`(clip, nbr_iter, threshold[, name])	Iterative processing, invalidating at each step the values "too far" from the global mean.
`StatisticsByPass`(clip, nbr_min_pts, ...[, name])	Invalidate passes where mean or std statistics exceed the thresholds.

class casys.editing.Clip(expression, *, has_fields=True, full_case_insensitive=False, allow_similar=False, default_expression='DV', name=None)

Bases: object

Class for managing CLIPs:

parsing

executing

converting result

dumping code

The expression is compiled immediately.

Parameters:

expression (str) – The expression to compile into an AST that a ClipRuntime class can run.
has_fields (bool) – If True, fields are allowed during parsing. If False, they are not, and the result of the CLIP is then a constant.
full_case_insensitive (bool) –
CLIPs are case-insensitive but, by default, field names are case-sensitive.

If this parameter is True, field names are also case-insensitive. Take this into account at run time: all field names are converted to uppercase.
allow_similar (bool) –
If full_case_insensitive is False (the default), setting this parameter to True allows two field names that differ only by case. Otherwise, such names raise an error.

Not used if full_case_insensitive is True.
default_expression (str) – Used if expression is an empty string or None. If default_expression is also an empty string or None, an exception is raised.
name (Optional[str]) – Optional. The name of the CLIP. It appears in error messages to help identify the CLIP.
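
The fallback and field-name rules described for these parameters can be sketched in plain Python. This is only an illustration, not the casys implementation: `resolve_expression` and `check_field_names` are hypothetical helpers, and `ValueError` is an assumed exception type, since the one actually raised is not documented here.

```python
def resolve_expression(expression, default_expression="DV"):
    """Return the expression to compile, falling back to the default."""
    if expression:  # a non-empty string
        return expression
    if default_expression:
        return default_expression
    # Assumption: the real exception type is not documented in this section.
    raise ValueError("both expression and default_expression are empty")


def check_field_names(names, full_case_insensitive=False, allow_similar=False):
    """Return the field names as they would be seen at run time."""
    if full_case_insensitive:
        # All names are folded to uppercase; allow_similar is not used.
        return {name.upper() for name in names}
    seen = {}
    for name in names:
        other = seen.setdefault(name.upper(), name)
        if other != name and not allow_similar:
            raise ValueError(f"similar field names: {other!r} and {name!r}")
    return set(names)
```

With `full_case_insensitive=True`, `swh` and `SWH` collapse into the single name `SWH`. With the defaults, the same pair is rejected unless `allow_similar=True`.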

static as_boolean(value)

Parameters:

value (Union[str, int, float, ndarray]) – Result of execution of a CLIP.

Return type:

bool

Returns:

False if:

value is an empty string
the numeric array has zero size
the numeric array contains at least one default value or zero.

True in all other cases, i.e.:

value is a non-empty string
value has at least one element and all elements differ from both the default value and zero.
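
The truth rule above can be sketched as follows. This is a minimal illustration rather than the library code: `as_boolean_sketch` is a hypothetical name, plain Python lists stand in for numpy arrays, and the default value (DV) is assumed to be NaN for numeric data.

```python
import math


def as_boolean_sketch(value):
    """Mimic the documented truth rule for a CLIP result."""
    if isinstance(value, str):
        return value != ""
    # Treat a scalar as a one-element array.
    values = list(value) if isinstance(value, (list, tuple)) else [value]
    if not values:
        return False
    # Assumption: the default value (DV) is NaN for numeric data.
    return all(v != 0 and not math.isnan(v) for v in values)
```

Under these assumptions, `as_boolean_sketch([1, 2.5])` is true, while `[1, 0]`, `[1, float("nan")]`, `[]` and `""` all give false.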

static as_date(value, *, reference=np.datetime64('2001-01-01T00:00:00.000000'), unit='us')

Parameters:

value (Union[str, int, float, ndarray]) – Result of execution of a CLIP.
reference (Union[datetime, datetime64]) –
For values that are not absolute dates (time deltas, numbers), this is the reference used.

If None, the reference taken is the current time.
unit (str) – The unit for the datetime (as for numpy.datetime64: “D”, “s”, “us”…)

Return type:

ndarray

Returns:

If value is a string, it is converted into date. If value is numerical, it is considered as a timedelta from the given reference expressed in unit. If result is a timedelta, it is converted by adding the given reference. If result is already a datetime, nothing is done.
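
The conversion rules above can be approximated with the standard library. This is a sketch under stated assumptions, not the casys code: `as_date_sketch` and `UNIT_TO_TIMEDELTA` are hypothetical names, `datetime` objects stand in for `numpy.datetime64`, only three unit codes are mapped, and strings are assumed to be ISO 8601 dates.

```python
from datetime import datetime, timedelta

REFERENCE = datetime(2001, 1, 1)  # same instant as the documented default

# Assumption: only a few numpy-style unit codes are mapped in this sketch.
UNIT_TO_TIMEDELTA = {
    "D": timedelta(days=1),
    "s": timedelta(seconds=1),
    "us": timedelta(microseconds=1),
}


def as_date_sketch(value, reference=REFERENCE, unit="us"):
    """Convert a string, number, time delta or date into a date."""
    if isinstance(value, datetime):
        return value  # already a date: nothing to do
    if isinstance(value, str):
        return datetime.fromisoformat(value)  # assumption: ISO 8601 strings
    if isinstance(value, timedelta):
        return reference + value
    # Numbers are offsets from the reference, expressed in `unit`.
    return reference + value * UNIT_TO_TIMEDELTA[unit]
```

For example, `as_date_sketch(86400, unit="s")` lands one day after the reference, on 2001-01-02.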

static as_numerical(value, *, reference=np.datetime64('2001-01-01T00:00:00.000000'), unit='us')

Converts a value to a numeric vector.

Parameters:

value (Union[str, int, float, ndarray]) – Result of execution of a CLIP.
reference (datetime64) –
The reference date to convert dates to timedelta and then to numbers.

If None, the reference taken is the current time.
unit (str) – The unit of the dates to convert into string.

Return type:

ndarray

Returns:

value converted into a numerical array.

If value is already a numeric vector (i.e. is not a string), it is returned as is. If value is a string, it is evaluated as a number.

static as_string(value, *, reference=np.datetime64('2001-01-01T00:00:00.000000'), unit='us')

Converts a value to a string.

Parameters:

value (Union[str, int, float, ndarray]) – Result of execution of a CLIP.
reference (datetime64) –
The reference date to convert time delta values into dates and then convert into string.

If None, the reference taken is the current time.
unit (str) – The unit of the dates to convert into string.

Return type:

Union[str_, ndarray]

Returns:

If value is already a string, it is returned as is. If value is a numerical array, each element is converted into a number. If this is not possible, the element is DV (np.nan).

If values are dates or time deltas, they are printed in ISO format.

static as_time_delta(value, *, reference=np.datetime64('2001-01-01T00:00:00.000000'), unit='us')

Parameters:

value (Union[str, int, float, ndarray]) – Result of execution of a CLIP.
reference (Union[datetime, datetime64]) –
The reference date to convert date values into timedelta.

If None, the reference taken is the current time.
unit (str) – The unit of the numbers used as time delta. It also is the unit of the result (as for numpy.timedelta64: “D”, “s”, “us”…).

Return type:

ndarray

Returns:

If value is a date or array of dates, the result is obtained by subtracting the reference date.

If value is numerical, it is considered an already built time delta from the reference date, expressed in unit.

If values are already time deltas, they are returned as is (except if unit is different, in this case time deltas are converted).

If values are strings they are parsed to build dates. If the string describes a time delta it is returned, else the reference is taken as a base and removed.
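
The date-to-offset rule can be sketched with the standard library. This is an illustration only: `as_time_delta_sketch` is a hypothetical name, strings are assumed to be ISO 8601 dates (time-delta strings are not handled), and because `datetime.timedelta` carries no unit, the result is returned as a count of `unit` steps rather than as a `timedelta64`.

```python
from datetime import datetime, timedelta

REFERENCE = datetime(2001, 1, 1)  # same instant as the documented default

# Assumption: only a few numpy-style unit codes are mapped in this sketch.
UNIT_TO_TIMEDELTA = {
    "D": timedelta(days=1),
    "s": timedelta(seconds=1),
    "us": timedelta(microseconds=1),
}


def as_time_delta_sketch(value, reference=REFERENCE, unit="us"):
    """Return the offset from `reference` as a count of `unit` steps."""
    if isinstance(value, str):
        value = datetime.fromisoformat(value)  # assumption: date strings only
    if isinstance(value, datetime):
        value = value - reference  # subtract the reference date
    if isinstance(value, timedelta):
        return value / UNIT_TO_TIMEDELTA[unit]  # express in the requested unit
    return value  # numbers are taken as already expressed in `unit`
```

A date one day after the reference therefore gives `1.0` with `unit="D"`, and a plain number is passed through unchanged.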

property clip_type: ClipType

Returns:: The type of the clip (see ClipType)

execute(fields=None)

Execute the clip returning its result.

Depending on the CLIP, the result can be a string, a datetime64, a timedelta64, a number, or an array of these types.

Parameters:: fields (Optional[ClipFields]) – Object instance used to provide the field values for the clip. If None, each reference to a field in the clip raises an exception.
Return type:: Union[str, int, float, ndarray]
Returns:: Result of the CLIP execution
Raises:: ClipFieldError – Only if fields were authorized in the CLIP and either (1) fields is None and the CLIP contains a field reference, or (2) the CLIP uses a field that the ClipFields object does not know and for which it cannot (or will not) provide a default value.

property expression: str

The current expression used for the CLIP.

Returns:: String as given to constructor or set_expression()

property fields: frozenset[str]

Set of fields used in the CLIP.

Returns:: Set of field names. May be empty.

property formatted: str

Dumps the compiled expression.

It shows the structure of the compiled expression in a YAML-like format. The compiled structure cannot be rebuilt from the result of this property.

Returns:: Formatted version of the compiled (code) expression.

class casys.editing.ClipCondition(clip, name=None)

Bases: InvalidityCondition

Basic clip condition: no parameter is required.

clip: Clip definition

Examples

clip: ice_flag :== 1

Parameters:

clip (Clip | str)
name (Optional[str])

check_fields_validity(data, dim_name, comp_name)

Check that all the fields only depend on the expected dimension.

Parameters:

data (Dataset) – Data to edit
dim_name (str) – Dataset dimension involved in the editing
comp_name (str) – Name of the EditingComponent

property clip

compute_invalidity_mask(data, dim_name)

Compute invalidity mask.

Parameters:

data (Dataset) – Dataset
dim_name (str) – Dataset dimension involved in the editing

Return type:

ndarray

Returns:

Invalidity mask (True or False array)
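
For the example clip `ice_flag :== 1`, the resulting mask can be pictured as below. This is a hedged sketch: it assumes the `:==` operator behaves like a plain equality test (CLIP operator semantics are not described in this section), uses a dict of lists in place of an xarray Dataset, and `invalidity_mask` is a hypothetical helper.

```python
def invalidity_mask(data, field, flagged_value):
    """Return one boolean per point: True where the point is invalid."""
    return [value == flagged_value for value in data[field]]


# Hypothetical data set: one list per field, all on the same dimension.
data = {"ice_flag": [0, 1, 0, 1], "swh": [1.2, 3.4, 0.8, 2.1]}
mask = invalidity_mask(data, "ice_flag", 1)
```

Here the second and fourth points are flagged as invalid, and the mask has one entry per point along the edited dimension.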

has_pass_requirement()

Whether this condition has pass requirements.

Return type:: bool
Returns:: True if the condition requires full passes data, False otherwise.

property name

required_fields()

Returns the list of required data fields to apply this condition.

Return type:: list[str]
Returns:: List of required data fields to apply this condition.

class casys.editing.ClipDataset(dset)

Bases: ClipFields

Class allowing the use of clips on a xarray Dataset.

Parameters:: dset (Dataset)

default_value(name)

Should be called if the field to be returned does not exist.

Parameters:

name (str) –

Name of the field.

Warning

As CLIPs are case-insensitive the name is always uppercase.

Return type:

Union[ndarray, str]

Returns:

Default value for this field

Raises:

ClipFieldError – If field cannot have a default value

field_list()

Returns the list of known fields.

Return type:: list[str]
Returns:: list of field names

get_value(name)

The main method of the class. Called by clip runtime to provide the named field value.

Parameters:

name –

Name of the field to retrieve.

Warning

CLIPs are case-insensitive except for field names, unless the CLIP is configured to be fully case-insensitive. In that case, the field names are all UPPERCASE.

Returns:

Field value.

Raises:

ClipFieldError – If there is a problem with retrieving the field. In the base class, it is when the field is unknown.

See also

normalize_field()

normalize_field(value)

Utility method that should be used in all derived classes.

Takes a value and returns a “normalized” value. i.e. a numpy string or a numpy array (of strings, datetime, timedelta, float).

Parameters:: value (Any) – The field value to check. If None an empty numpy array is returned.
Return type:: Union[ndarray, str_]
Returns:: Value unchanged or normalized.
Raises:: ClipFieldError – If the value is not a string and cannot be coerced into a numpy array.

update(**kwargs)

This method updates the ClipFields dict-like structure.

Parameters:: kwargs – Used as a dict-like structure to provide field values, as in the constructor: each key is a clip field name and each value is the field value.
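
The provider protocol described by `get_value()`, `default_value()`, `field_list()` and `update()` can be sketched with a small dict-backed class. This is not the ClipFields implementation: `DictFields` and `FieldError` are hypothetical stand-ins, and field names are folded to uppercase here, which corresponds to the fully case-insensitive mode.

```python
class FieldError(KeyError):
    """Hypothetical stand-in for ClipFieldError."""


class DictFields:
    """Dict-backed field provider with uppercase field names."""

    def __init__(self, **kwargs):
        self._fields = {}
        self.update(**kwargs)

    def update(self, **kwargs):
        # Each key is a field name, each value is the field value.
        for name, value in kwargs.items():
            self._fields[name.upper()] = value

    def field_list(self):
        return sorted(self._fields)

    def default_value(self, name):
        # This sketch provides no defaults, so it always raises.
        raise FieldError(f"no default value for field {name}")

    def get_value(self, name):
        try:
            return self._fields[name.upper()]
        except KeyError:
            # Unknown field: fall back to the default-value hook.
            return self.default_value(name)
```

A runtime would call `get_value()` for each field reference. Unknown fields go through `default_value()`, which a subclass could override to supply a fill value instead of raising.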

class casys.editing.Editing(parameters=None)

Bases: BaseAlgorithm[EditingParameters, EditingResults]

Editing algorithm.

Parameters:: parameters (Optional[TypeVar(AlgoParamType, bound= BaseParameters)])

ALGORITHM_NAME: ClassVar[str] = 'editing'

PARAMETER_CLASS: ClassVar[type[BaseParameters]] = None

SCHEMA_CLASS: alias of EditingParametersSchema

check_required_fields(data)

Check whether the provided data contains all required fields.

Parameters:: data (Dataset) – Data as a dataset.
Raises:: AlgorithmError – if provided data do not contain a required field.
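
A check of this kind can be sketched as follows. It is only an illustration: `MissingFieldError` is a hypothetical stand-in for AlgorithmError, the data set is a plain dict rather than an xarray Dataset, and the list of required names is passed in explicitly instead of being taken from `required_fields()`.

```python
class MissingFieldError(Exception):
    """Hypothetical stand-in for AlgorithmError."""


def check_required_fields(data, required):
    """Raise if any required field is absent from the data set."""
    missing = [name for name in required if name not in data]
    if missing:
        raise MissingFieldError(f"missing required fields: {missing}")
```

Running the check before the algorithm gives one clear error up front instead of a failure in the middle of the editing sequence.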

classmethod get_class(name)

Access registered algorithms classes by their name.

Parameters:: name (str)
Return type:: type[BaseAlgorithm]

property parameters: AlgoParamType: Algorithm parameters accessor.

classmethod register(): Registering mechanism allowing the algorithm to be used in programs such as ong-the-one.

required_fields()

Returns the list of required data fields to run this algorithm.

Return type:: list[str]
Returns:: List of required data fields to run this algorithm.

run(data)

Main function to run algorithm.

Parameters:: data (Dataset) – Data to edit
Return type:: EditingResults
Returns:: Invalidity indicator

class casys.editing.EditingComponent(name, invalidity_conditions, value)

Bases: object

An EditingComponent contains a list of invalidity conditions of the same type and an invalidity indicator’s value. Each component tags the data it invalidates with its value.

Parameters:

name (str) – Name of Component
invalidity_conditions (list[InvalidityCondition]) – Conditions where the values will be considered as invalid
value (int) – Invalidity indicator’s value.

check_fields_validity(data, dim_name)

Check that all the fields only depend on the expected dimension.

Parameters:

data (Dataset) – Data to edit
dim_name (str) – Dataset dimension involved in the editing

has_pass_requirement()

Whether this component has pass requirements.

If any condition requires full passes data, the full component requires to be given full passes data.

Return type:: bool
Returns:: True if the component requires full passes data, False otherwise.

property invalidity_conditions: list[InvalidityCondition]

property name: str

required_fields()

Returns the list of required data fields to apply this editing.

Return type:: list[str]
Returns:: List of required data fields to apply this editing.

update_indicator(data, invalidity_indicator, dim_name)

Update invalidity indicator.

Parameters:

data (Dataset) – Data to edit
invalidity_indicator (ndarray) – Invalidity indicator
dim_name (str) – Dataset dimension involved in the editing

Return type:

EditingComponentResult
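
How a component might tag the indicator can be sketched in plain Python. The details are assumptions, not documented behaviour: 0 is taken to mean "still valid", points already tagged by an earlier component are left untouched, lists stand in for numpy arrays, and this hypothetical `update_indicator` returns the new indicator rather than an EditingComponentResult.

```python
def update_indicator(masks, invalidity_indicator, value):
    """Tag points flagged by any condition with the component value."""
    updated = list(invalidity_indicator)
    for i in range(len(updated)):
        flagged = any(mask[i] for mask in masks)
        # Assumption: 0 means "still valid"; tagged points are left alone.
        if flagged and updated[i] == 0:
            updated[i] = value
    return updated


indicator = [0, 0, 0, 0]
# First component (value 1) with a single condition mask.
indicator = update_indicator([[True, False, False, False]], indicator, 1)
# Second component (value 2) with two condition masks.
indicator = update_indicator(
    [[True, True, False, False], [False, False, False, True]], indicator, 2
)
```

Under these assumptions the first point keeps the value of the first component, and the second component tags only the points that were still valid.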

property value: int

class casys.editing.EditingComponentSchema(*, only=None, exclude=(), many=None, context=None, load_only=(), dump_only=(), partial=None, unknown=None)

Bases: BaseSchema

Parameters:

only (Union[Sequence[str], AbstractSet[str], None])
exclude (Union[Sequence[str], AbstractSet[str]])
many (bool | None)
context (dict | None)
load_only (Union[Sequence[str], AbstractSet[str]])
dump_only (Union[Sequence[str], AbstractSet[str]])
partial (Union[bool, Sequence[str], AbstractSet[str], None])
unknown (str | None)

class Meta

Bases: object

Options object for a Schema.

Example usage:

from marshmallow import Schema


class MySchema(Schema):
    class Meta:
        fields = ("id", "email", "date_created")
        exclude = ("password", "secret_attribute")

A note on type checking

Type checkers will only check the attributes of the Meta <marshmallow.Schema.Meta> class if you explicitly subclass marshmallow.Schema.Meta.

from marshmallow import Schema


class MySchema(Schema):
    # Not checked by type checkers
    class Meta:
        additional = True


class MySchema2(Schema):
    # Type checkers will check attributes
    class Meta(Schema.Meta):
        additional = True  # Incompatible types in assignment

Removed in version 3.0.0b7: Remove strict.

Added in version 3.0.0b12: Add unknown.

Changed in version 3.0.0b17: Rename dateformat to datetimeformat.

Added in version 3.9.0: Add timeformat.

Changed in version 3.26.0: Deprecate ordered. Field order is preserved by default.

additional: ClassVar[tuple[str, ...] | list[str]]: Fields to include in addition to the explicitly declared fields. additional <marshmallow.Schema.Meta.additional> and fields <marshmallow.Schema.Meta.fields> are mutually-exclusive options.

dateformat: ClassVar[str]: Default format for Date <marshmallow.fields.Date> fields.

datetimeformat: ClassVar[str]: Default format for DateTime <marshmallow.fields.DateTime> fields.

dump_only: ClassVar[tuple[str, ...] | list[str]]: Fields to exclude from deserialization (i.e. read-only fields)

exclude: ClassVar[tuple[str, ...] | list[str]]: Fields to exclude in the serialized result. Nested fields can be represented with dot delimiters.

fields: ClassVar[tuple[str, ...] | list[str]]: Fields to include in the (de)serialized result

include: ClassVar[dict[str, Field]]: Dictionary of additional fields to include in the schema. It is usually better to define fields as class variables, but you may need to use this option, e.g., if your fields are Python keywords.

index_errors: ClassVar[bool]: If True, errors dictionaries will include the index of invalid items in a collection.

load_only: ClassVar[tuple[str, ...] | list[str]]: Fields to exclude from serialized results

many: ClassVar[bool]: Whether data should be (de)serialized as a collection by default.

ordered: ClassVar[bool]: If True, the output of Schema.dump <marshmallow.Schema.dump> is a collections.OrderedDict.

register: ClassVar[bool]: Whether to register the Schema <marshmallow.Schema> with marshmallow’s internal class registry. Must be True if you intend to refer to this Schema <marshmallow.Schema> by class name in Nested fields. Only set this to False when memory usage is critical. Defaults to True.

render_module: Any: Module to use for loads <marshmallow.Schema.loads> and dumps <marshmallow.Schema.dumps>. Defaults to json from the standard library.

timeformat: ClassVar[str]: Default format for Time <marshmallow.fields.Time> fields.

unknown: ClassVar[str]: Whether to exclude, include, or raise an error for unknown fields in the data. Use EXCLUDE, INCLUDE or RAISE.

OPTIONS_CLASS: alias of SchemaOpts

TYPE_MAPPING: dict[type, type[Field]] = {<class 'bool'>: <class 'marshmallow.fields.Boolean'>, <class 'bytes'>: <class 'marshmallow.fields.String'>, <class 'datetime.date'>: <class 'marshmallow.fields.Date'>, <class 'datetime.datetime'>: <class 'marshmallow.fields.DateTime'>, <class 'datetime.time'>: <class 'marshmallow.fields.Time'>, <class 'datetime.timedelta'>: <class 'marshmallow.fields.TimeDelta'>, <class 'decimal.Decimal'>: <class 'marshmallow.fields.Decimal'>, <class 'float'>: <class 'marshmallow.fields.Float'>, <class 'int'>: <class 'marshmallow.fields.Integer'>, <class 'list'>: <class 'marshmallow.fields.Raw'>, <class 'set'>: <class 'marshmallow.fields.Raw'>, <class 'str'>: <class 'marshmallow.fields.String'>, <class 'tuple'>: <class 'marshmallow.fields.Raw'>, <class 'uuid.UUID'>: <class 'marshmallow.fields.UUID'>}

property dict_class: type[dict]: dict type to return when serializing.

dump(obj, *, many=None)

Serialize an object to native Python data types according to this Schema’s fields.

Parameters:

obj (Any) – The object to serialize.
many (bool | None) – Whether to serialize obj as a collection. If None, the value for self.many is used.

Returns:

Serialized data

Added in version 1.0.0.

Changed in version 3.0.0b7: This method returns the serialized data rather than a (data, errors) duple. A ValidationError is raised if obj is invalid.

Changed in version 3.0.0rc9: Validation no longer occurs upon serialization.

dumps(obj, *args, many=None, **kwargs)

Same as dump(), except return a JSON-encoded string.

Parameters:

obj (Any) – The object to serialize.
many (bool | None) – Whether to serialize obj as a collection. If None, the value for self.many is used.

Returns:

A JSON string

Added in version 1.0.0.

Changed in version 3.0.0b7: This method returns the serialized data rather than a (data, errors) duple. A ValidationError is raised if obj is invalid.

error_messages: dict[str, str] = {}: Overrides for default schema-level error messages

fields: dict[str, Field]: Dictionary mapping field_names -> Field objects

classmethod from_dict(fields, *, name='GeneratedSchema')

Generate a Schema <marshmallow.Schema> class given a dictionary of fields.

from marshmallow import Schema, fields

PersonSchema = Schema.from_dict({"name": fields.Str()})
print(PersonSchema().load({"name": "David"}))  # => {'name': 'David'}

Generated schemas are not added to the class registry and therefore cannot be referred to by name in Nested fields.

Parameters:

fields (dict[str, Field]) – Dictionary mapping field names to field instances.
name (str) – Optional name for the class, which will appear in the repr for the class.

Return type:

type[Schema]

Added in version 3.0.0.

get_attribute(obj, attr, default)

Defines how to pull values from an object to serialize.

Changed in version 3.0.0a1: Changed position of obj and attr.

Parameters:

obj (Any)
attr (str)
default (Any)

classmethod get_model()

Return the model associated to this schema.

Return type:: type[TypeVar(T)] | None
Returns:: Model associated to this schema.

handle_error(error, data, *, many, **kwargs)

Custom error handler function for the schema.

Parameters:

error (ValidationError) – The ValidationError raised during (de)serialization.
data (Any) – The original input data.
many (bool) – Value of many on dump or load.
partial – Value of partial on load.

Changed in version 3.0.0rc9: Receives many and partial (on deserialization) as keyword arguments.

load(data, *, many=None, partial=None, unknown=None)

Deserialize a data structure to an object defined by this Schema’s fields.

Parameters:

data (Union[Mapping[str, Any], Iterable[Mapping[str, Any]]]) – The data to deserialize.
many (bool | None) – Whether to deserialize data as a collection. If None, the value for self.many is used.
partial (Union[bool, Sequence[str], AbstractSet[str], None]) – Whether to ignore missing fields and not require any fields declared. Propagates down to Nested fields as well. If its value is an iterable, only missing fields listed in that iterable will be ignored. Use dot delimiters to specify nested fields.
unknown (str | None) – Whether to exclude, include, or raise an error for unknown fields in the data. Use EXCLUDE, INCLUDE or RAISE. If None, the value for self.unknown is used.

Returns:

Deserialized data

Added in version 1.0.0.

Changed in version 3.0.0b7: This method returns the deserialized data rather than a (data, errors) duple. A ValidationError is raised if invalid data are passed.

loads(json_data, *, many=None, partial=None, unknown=None, **kwargs)

Same as load(), except it uses marshmallow.Schema.Meta.render_module to deserialize the passed string before passing data to load().

Parameters:

json_data (str | bytes | bytearray) – A string of the data to deserialize.
many (bool | None) – Whether to deserialize obj as a collection. If None, the value for self.many is used.
partial (Union[bool, Sequence[str], AbstractSet[str], None]) – Whether to ignore missing fields and not require any fields declared. Propagates down to Nested fields as well. If its value is an iterable, only missing fields listed in that iterable will be ignored. Use dot delimiters to specify nested fields.
unknown (str | None) – Whether to exclude, include, or raise an error for unknown fields in the data. Use EXCLUDE, INCLUDE or RAISE. If None, the value for self.unknown is used.

Returns:

Deserialized data

Added in version 1.0.0.

Changed in version 3.0.0b7: This method returns the deserialized data rather than a (data, errors) duple. A ValidationError is raised if invalid data are passed.

make_object(data, **_)

on_bind_field(field_name, field_obj)

Hook to modify a field when it is bound to the Schema <marshmallow.Schema>.

No-op by default.

Parameters:

field_name (str)
field_obj (Field)

Return type:

None

opts: typing.Any = <marshmallow.schema.SchemaOpts object>

post_dump(data, original_data, **_)

pre_dump(data, **_)

pre_load(data, **_)

set_class: alias of OrderedSet

validate(data, *, many=None, partial=None)

Validate data against the schema, returning a dictionary of validation errors.

Parameters:

data (Union[Mapping[str, Any], Iterable[Mapping[str, Any]]]) – The data to validate.
many (bool | None) – Whether to validate data as a collection. If None, the value for self.many is used.
partial (Union[bool, Sequence[str], AbstractSet[str], None]) – Whether to ignore missing fields and not require any fields declared. Propagates down to Nested fields as well. If its value is an iterable, only missing fields listed in that iterable will be ignored. Use dot delimiters to specify nested fields.

Return type:

dict[str, list[str]]

Returns:

A dictionary of validation errors.

Added in version 1.1.0.

class casys.editing.EditingParameters(editing_sequence, log_file=None, dim_name=None)

Bases: BaseParameters

Parameters for Editing algorithm.

Parameters:

editing_sequence (list[EditingComponent]) – The sequence of components (list of algorithms) defining the invalidity parameters.
log_file (LogFile | None) – Logging file.
dim_name (str | None) – Dimension name.

dim_name: str | None = None

editing_sequence: list[EditingComponent]

log_file: LogFile | None = None

required_fields()

Returns the list of required data fields to run this algorithm.

Return type:: list[str]
Returns:: List of required data fields to run this algorithm.

class casys.editing.EditingParametersSchema(*, only=None, exclude=(), many=None, context=None, load_only=(), dump_only=(), partial=None, unknown=None)

Bases: BaseSchema

Parameters:

only (Union[Sequence[str], AbstractSet[str], None])
exclude (Union[Sequence[str], AbstractSet[str]])
many (bool | None)
context (dict | None)
load_only (Union[Sequence[str], AbstractSet[str]])
dump_only (Union[Sequence[str], AbstractSet[str]])
partial (Union[bool, Sequence[str], AbstractSet[str], None])
unknown (str | None)

class Meta

Bases: object

Options object for a Schema.

Example usage:

from marshmallow import Schema


class MySchema(Schema):
    class Meta:
        fields = ("id", "email", "date_created")
        exclude = ("password", "secret_attribute")

A note on type checking

Type checkers will only check the attributes of the Meta <marshmallow.Schema.Meta> class if you explicitly subclass marshmallow.Schema.Meta.

from marshmallow import Schema


class MySchema(Schema):
    # Not checked by type checkers
    class Meta:
        additional = True


class MySchema2(Schema):
    # Type checkers will check attributes
    class Meta(Schema.Meta):
        additional = True  # Incompatible types in assignment

Removed in version 3.0.0b7: Remove strict.

Added in version 3.0.0b12: Add unknown.

Changed in version 3.0.0b17: Rename dateformat to datetimeformat.

Added in version 3.9.0: Add timeformat.

Changed in version 3.26.0: Deprecate ordered. Field order is preserved by default.

additional: ClassVar[tuple[str, ...] | list[str]]: Fields to include in addition to the explicitly declared fields. additional <marshmallow.Schema.Meta.additional> and fields <marshmallow.Schema.Meta.fields> are mutually-exclusive options.

dateformat: ClassVar[str]: Default format for Date <marshmallow.fields.Date> fields.

datetimeformat: ClassVar[str]: Default format for DateTime <marshmallow.fields.DateTime> fields.

dump_only: ClassVar[tuple[str, ...] | list[str]]: Fields to exclude from deserialization (i.e. read-only fields)

exclude: ClassVar[tuple[str, ...] | list[str]]: Fields to exclude in the serialized result. Nested fields can be represented with dot delimiters.

fields: ClassVar[tuple[str, ...] | list[str]]: Fields to include in the (de)serialized result

include: ClassVar[dict[str, Field]]: Dictionary of additional fields to include in the schema. It is usually better to define fields as class variables, but you may need to use this option, e.g., if your fields are Python keywords.

index_errors: ClassVar[bool]: If True, errors dictionaries will include the index of invalid items in a collection.

load_only: ClassVar[tuple[str, ...] | list[str]]: Fields to exclude from serialized results

many: ClassVar[bool]: Whether data should be (de)serialized as a collection by default.

ordered: ClassVar[bool]: If True, the output of Schema.dump <marshmallow.Schema.dump> is a collections.OrderedDict.

register: ClassVar[bool]: Whether to register the Schema <marshmallow.Schema> with marshmallow’s internal class registry. Must be True if you intend to refer to this Schema <marshmallow.Schema> by class name in Nested fields. Only set this to False when memory usage is critical. Defaults to True.

render_module: Any: Module to use for loads <marshmallow.Schema.loads> and dumps <marshmallow.Schema.dumps>. Defaults to json from the standard library.

timeformat: ClassVar[str]: Default format for Time <marshmallow.fields.Time> fields.

unknown: ClassVar[str]: Whether to exclude, include, or raise an error for unknown fields in the data. Use EXCLUDE, INCLUDE or RAISE.

OPTIONS_CLASS: alias of SchemaOpts

TYPE_MAPPING: dict[type, type[Field]] = {<class 'bool'>: <class 'marshmallow.fields.Boolean'>, <class 'bytes'>: <class 'marshmallow.fields.String'>, <class 'datetime.date'>: <class 'marshmallow.fields.Date'>, <class 'datetime.datetime'>: <class 'marshmallow.fields.DateTime'>, <class 'datetime.time'>: <class 'marshmallow.fields.Time'>, <class 'datetime.timedelta'>: <class 'marshmallow.fields.TimeDelta'>, <class 'decimal.Decimal'>: <class 'marshmallow.fields.Decimal'>, <class 'float'>: <class 'marshmallow.fields.Float'>, <class 'int'>: <class 'marshmallow.fields.Integer'>, <class 'list'>: <class 'marshmallow.fields.Raw'>, <class 'set'>: <class 'marshmallow.fields.Raw'>, <class 'str'>: <class 'marshmallow.fields.String'>, <class 'tuple'>: <class 'marshmallow.fields.Raw'>, <class 'uuid.UUID'>: <class 'marshmallow.fields.UUID'>}

property dict_class: type[dict]: dict type to return when serializing.

dump(obj, *, many=None)

Serialize an object to native Python data types according to this Schema’s fields.

Parameters:

obj (Any) – The object to serialize.
many (bool | None) – Whether to serialize obj as a collection. If None, the value for self.many is used.

Returns:

Serialized data

Added in version 1.0.0.

Changed in version 3.0.0b7: This method returns the serialized data rather than a (data, errors) duple. A ValidationError is raised if obj is invalid.

Changed in version 3.0.0rc9: Validation no longer occurs upon serialization.

dumps(obj, *args, many=None, **kwargs)

Same as dump(), except return a JSON-encoded string.

Parameters:

obj (Any) – The object to serialize.
many (bool | None) – Whether to serialize obj as a collection. If None, the value for self.many is used.

Returns:

A JSON string

Added in version 1.0.0.

Changed in version 3.0.0b7: This method returns the serialized data rather than a (data, errors) duple. A ValidationError is raised if obj is invalid.

error_messages: dict[str, str] = {}: Overrides for default schema-level error messages

fields: dict[str, Field]: Dictionary mapping field_names -> Field objects

classmethod from_dict(fields, *, name='GeneratedSchema')

Generate a Schema <marshmallow.Schema> class given a dictionary of fields.

from marshmallow import Schema, fields

PersonSchema = Schema.from_dict({"name": fields.Str()})
print(PersonSchema().load({"name": "David"}))  # => {'name': 'David'}

Generated schemas are not added to the class registry and therefore cannot be referred to by name in Nested fields.

Parameters:

fields (dict[str, Field]) – Dictionary mapping field names to field instances.
name (str) – Optional name for the class, which will appear in the repr for the class.

Return type:

type[Schema]

Added in version 3.0.0.

get_attribute(obj, attr, default)

Defines how to pull values from an object to serialize.

Changed in version 3.0.0a1: Changed position of obj and attr.

Parameters:

obj (Any)
attr (str)
default (Any)

classmethod get_model()

Return the model associated to this schema.

Return type:: type[TypeVar(T)] | None
Returns:: Model associated to this schema.

handle_error(error, data, *, many, **kwargs)

Custom error handler function for the schema.

Parameters:

error (ValidationError) – The ValidationError raised during (de)serialization.
data (Any) – The original input data.
many (bool) – Value of many on dump or load.
partial – Value of partial on load.

Changed in version 3.0.0rc9: Receives many and partial (on deserialization) as keyword arguments.

load(data, *, many=None, partial=None, unknown=None)

Deserialize a data structure to an object defined by this Schema’s fields.

Parameters:

data (Union[Mapping[str, Any], Iterable[Mapping[str, Any]]]) – The data to deserialize.
many (bool | None) – Whether to deserialize data as a collection. If None, the value for self.many is used.
partial (Union[bool, Sequence[str], AbstractSet[str], None]) – Whether to ignore missing fields and not require any fields declared. Propagates down to Nested fields as well. If its value is an iterable, only missing fields listed in that iterable will be ignored. Use dot delimiters to specify nested fields.
unknown (str | None) – Whether to exclude, include, or raise an error for unknown fields in the data. Use EXCLUDE, INCLUDE or RAISE. If None, the value for self.unknown is used.

Returns:

Deserialized data

Added in version 1.0.0.

Changed in version 3.0.0b7: This method returns the deserialized data rather than a (data, errors) duple. A ValidationError is raised if invalid data are passed.

loads(json_data, *, many=None, partial=None, unknown=None, **kwargs)

Same as load(), except it uses marshmallow.Schema.Meta.render_module to deserialize the passed string before passing data to load().

Parameters:

json_data (str | bytes | bytearray) – A string of the data to deserialize.
many (bool | None) – Whether to deserialize json_data as a collection. If None, the value for self.many is used.
partial (Union[bool, Sequence[str], AbstractSet[str], None]) – Whether to ignore missing fields and not require any fields declared. Propagates down to Nested fields as well. If its value is an iterable, only missing fields listed in that iterable will be ignored. Use dot delimiters to specify nested fields.
unknown (str | None) – Whether to exclude, include, or raise an error for unknown fields in the data. Use EXCLUDE, INCLUDE or RAISE. If None, the value for self.unknown is used.

Returns:

Deserialized data

Added in version 1.0.0.

Changed in version 3.0.0b7: This method returns the deserialized data rather than a (data, errors) duple. A ValidationError is raised if invalid data are passed.

make_object(data, **_)

on_bind_field(field_name, field_obj)

Hook to modify a field when it is bound to the Schema.

No-op by default.

Parameters:

field_name (str)
field_obj (Field)

Return type:

None

opts: typing.Any = <marshmallow.schema.SchemaOpts object>

post_dump(data, original_data, **_)

pre_dump(data, **_)

pre_load(data, **_)

set_class: alias of OrderedSet

validate(data, *, many=None, partial=None)

Validate data against the schema, returning a dictionary of validation errors.

Parameters:

data (Union[Mapping[str, Any], Iterable[Mapping[str, Any]]]) – The data to validate.
many (bool | None) – Whether to validate data as a collection. If None, the value for self.many is used.
partial (Union[bool, Sequence[str], AbstractSet[str], None]) – Whether to ignore missing fields and not require any fields declared. Propagates down to Nested fields as well. If its value is an iterable, only missing fields listed in that iterable will be ignored. Use dot delimiters to specify nested fields.

Return type:

dict[str, list[str]]

Returns:

A dictionary of validation errors.

Added in version 1.1.0.

class casys.editing.EditingResults(invalidity_indicator, components)

Bases: BaseResults

Editing algorithms results.

Parameters:

invalidity_indicator (ndarray) – Invalidity indicator
components (list[EditingComponentResult]) – List of algorithms

components: list[EditingComponentResult]

get_value(field)

Returns the value of the field specified by its name.

Parameters: field (str)
Return type: Any

invalidity_indicator: ndarray

class casys.editing.InvalidityCondition(clip, name=None)

Bases: ABC

InvalidityCondition base class.

Parameters:

clip (Clip | str)
name (Optional[str])

check_fields_validity(data, dim_name, comp_name)

Check that all the fields only depend on the expected dimension.

Parameters:

data (Dataset) – Data to edit
dim_name (str) – Dataset dimension involved in the editing
comp_name (str) – Name of the EditingComponent

property clip

abstract compute_invalidity_mask(data, dim_name)

Compute invalidity mask.

Parameters:

data (Dataset) – Dataset
dim_name (str) – Dataset dimension involved in the editing

Return type:

ndarray

Returns:

Invalidity mask (True or False array)

has_pass_requirement()

Whether this condition has pass requirements.

Return type: bool
Returns: True if the condition requires data covering full passes, False otherwise.

property name

required_fields()

Returns the list of required data fields to apply this condition.

Return type: list[str]
Returns: List of required data fields to apply this condition.
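The shape of a concrete condition can be sketched with a stand-in base class. ConditionSketch and OutOfRange below are hypothetical and do not use the casys classes; a plain dict of NumPy arrays stands in for the xarray Dataset, and only the method names mirror the interface documented above.

```python
from abc import ABC, abstractmethod

import numpy as np


class ConditionSketch(ABC):
    """Hypothetical stand-in mirroring the documented interface."""

    def __init__(self, field, name=None):
        self.field = field
        self.name = name or type(self).__name__

    @abstractmethod
    def compute_invalidity_mask(self, data, dim_name):
        """Return a boolean array, True where the data is invalid."""

    def has_pass_requirement(self):
        # This sketch works point by point, so no full pass is needed.
        return False

    def required_fields(self):
        return [self.field]


class OutOfRange(ConditionSketch):
    """Invalidate values outside the closed interval [low, high]."""

    def __init__(self, field, low, high, name=None):
        super().__init__(field, name)
        self.low, self.high = low, high

    def compute_invalidity_mask(self, data, dim_name):
        values = np.asarray(data[self.field])
        return (values < self.low) | (values > self.high)


# A dict stands in for the Dataset; "SLA" is an arbitrary field name.
data = {"SLA": np.array([0.1, 3.5, -0.2, -4.0])}
condition = OutOfRange("SLA", low=-2.0, high=2.0)
mask = condition.compute_invalidity_mask(data, dim_name="time")
```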

class casys.editing.InvalidityConditionBaseSchema(*, only=None, exclude=(), many=None, context=None, load_only=(), dump_only=(), partial=None, unknown=None)

Bases: RegistryBaseSchema

InvalidityCondition base schema.

Parameters:

only (Union[Sequence[str], AbstractSet[str], None])
exclude (Union[Sequence[str], AbstractSet[str]])
many (bool | None)
context (dict | None)
load_only (Union[Sequence[str], AbstractSet[str]])
dump_only (Union[Sequence[str], AbstractSet[str]])
partial (Union[bool, Sequence[str], AbstractSet[str], None])
unknown (str | None)

class Meta

Bases: object

Options object for a Schema.

Example usage:

from marshmallow import Schema


class MySchema(Schema):
    class Meta:
        fields = ("id", "email", "date_created")
        exclude = ("password", "secret_attribute")

A note on type checking

Type checkers will only check the attributes of the Meta class if you explicitly subclass marshmallow.Schema.Meta.

from marshmallow import Schema


class MySchema(Schema):
    # Not checked by type checkers
    class Meta:
        additional = True


class MySchema2(Schema):
    # Type checkers will check attributes
    class Meta(Schema.Meta):
        additional = True  # Incompatible types in assignment

Removed in version 3.0.0b7: Remove strict.

Added in version 3.0.0b12: Add unknown.

Changed in version 3.0.0b17: Rename dateformat to datetimeformat.

Added in version 3.9.0: Add timeformat.

Changed in version 3.26.0: Deprecate ordered. Field order is preserved by default.

additional: ClassVar[tuple[str, ...] | list[str]]: Fields to include in addition to the explicitly declared fields. additional <marshmallow.Schema.Meta.additional> and fields <marshmallow.Schema.Meta.fields> are mutually-exclusive options.

dateformat: ClassVar[str]: Default format for Date <marshmallow.fields.Date> fields.

datetimeformat: ClassVar[str]: Default format for DateTime <marshmallow.fields.DateTime> fields.

dump_only: ClassVar[tuple[str, ...] | list[str]]: Fields to exclude from serialized results

exclude: ClassVar[tuple[str, ...] | list[str]]: Fields to exclude in the serialized result. Nested fields can be represented with dot delimiters.

fields: ClassVar[tuple[str, ...] | list[str]]: Fields to include in the (de)serialized result

include: ClassVar[dict[str, Field]]: Dictionary of additional fields to include in the schema. It is usually better to define fields as class variables, but you may need to use this option, e.g., if your fields are Python keywords.

index_errors: ClassVar[bool]: If True, errors dictionaries will include the index of invalid items in a collection.

load_only: ClassVar[tuple[str, ...] | list[str]]: Fields to exclude from serialized results

many: ClassVar[bool]: Whether data should be (de)serialized as a collection by default.

ordered: ClassVar[bool]: If True, Schema.dump <marshmallow.Schema.dump> is a collections.OrderedDict.

register: ClassVar[bool]: Whether to register the Schema <marshmallow.Schema> with marshmallow’s internal class registry. Must be True if you intend to refer to this Schema <marshmallow.Schema> by class name in Nested fields. Only set this to False when memory usage is critical. Defaults to True.

render_module: Any: Module to use for loads <marshmallow.Schema.loads> and dumps <marshmallow.Schema.dumps>. Defaults to json from the standard library.

timeformat: ClassVar[str]: Default format for Time <marshmallow.fields.Time> fields.

unknown: ClassVar[str]: Whether to exclude, include, or raise an error for unknown fields in the data. Use EXCLUDE, INCLUDE or RAISE.

OPTIONS_CLASS: alias of SchemaOpts

TYPE_MAPPING: dict[type, type[Field]] = {<class 'bool'>: <class 'marshmallow.fields.Boolean'>, <class 'bytes'>: <class 'marshmallow.fields.String'>, <class 'datetime.date'>: <class 'marshmallow.fields.Date'>, <class 'datetime.datetime'>: <class 'marshmallow.fields.DateTime'>, <class 'datetime.time'>: <class 'marshmallow.fields.Time'>, <class 'datetime.timedelta'>: <class 'marshmallow.fields.TimeDelta'>, <class 'decimal.Decimal'>: <class 'marshmallow.fields.Decimal'>, <class 'float'>: <class 'marshmallow.fields.Float'>, <class 'int'>: <class 'marshmallow.fields.Integer'>, <class 'list'>: <class 'marshmallow.fields.Raw'>, <class 'set'>: <class 'marshmallow.fields.Raw'>, <class 'str'>: <class 'marshmallow.fields.String'>, <class 'tuple'>: <class 'marshmallow.fields.Raw'>, <class 'uuid.UUID'>: <class 'marshmallow.fields.UUID'>}

classmethod clear_registry(): Clear everything from this schema’s registry.

property dict_class: type[dict]: dict type to return when serializing.

dump(obj, *, many=None)

Serialize an object to native Python data types according to this Schema’s fields.

Parameters:

obj (Any) – The object to serialize.
many (bool | None) – Whether to serialize obj as a collection. If None, the value for self.many is used.

Returns:

Serialized data

Added in version 1.0.0.

Changed in version 3.0.0b7: This method returns the serialized data rather than a (data, errors) duple. A ValidationError is raised if obj is invalid.

Changed in version 3.0.0rc9: Validation no longer occurs upon serialization.

dumps(obj, *args, many=None, **kwargs)

Same as dump(), except return a JSON-encoded string.

Parameters:

obj (Any) – The object to serialize.
many (bool | None) – Whether to serialize obj as a collection. If None, the value for self.many is used.

Returns:

A JSON-encoded string

Added in version 1.0.0.

Changed in version 3.0.0b7: This method returns the serialized data rather than a (data, errors) duple. A ValidationError is raised if obj is invalid.

error_messages: dict[str, str] = {}: Overrides for default schema-level error messages

fields: dict[str, Field]: Dictionary mapping field_names -> Field objects

classmethod from_dict(fields, *, name='GeneratedSchema')

Generate a Schema class given a dictionary of fields.

from marshmallow import Schema, fields

PersonSchema = Schema.from_dict({"name": fields.Str()})
print(PersonSchema().load({"name": "David"}))  # => {'name': 'David'}

Generated schemas are not added to the class registry and therefore cannot be referred to by name in Nested fields.

Parameters:

fields (dict[str, Field]) – Dictionary mapping field names to field instances.
name (str) – Optional name for the class, which will appear in the repr for the class.

Return type:

type[Schema]

Added in version 3.0.0.

get_attribute(obj, attr, default)

Defines how to pull values from an object to serialize.

Changed in version 3.0.0a1: Changed position of obj and attr.

Parameters:

obj (Any)
attr (str)
default (Any)

classmethod get_class(name: str) → type[RegistryBaseSchema]

Return the registered class associated with the provided name.

Parameters: name (str) – Identifier of the schema.
Return type: type[RegistryBaseSchema]
Returns: Corresponding schema class.

classmethod get_model()

Return the model associated with this schema.

Return type: type[TypeVar(T)] | None
Returns: Model associated with this schema.

classmethod get_model_schema(model: type) → type[RegistryBaseSchema]

Return the registered class associated with the provided model.

Parameters: model (type) – Identifier of the model.
Return type: type[RegistryBaseSchema]
Returns: Corresponding schema class.

classmethod get_type()

Schema’s ID.

Return type: str

handle_error(error, data, *, many, **kwargs)

Custom error handler function for the schema.

Parameters:

error (ValidationError) – The ValidationError raised during (de)serialization.
data (Any) – The original input data.
many (bool) – Value of many on dump or load.
partial – Value of partial on load.

Changed in version 3.0.0rc9: Receives many and partial (on deserialization) as keyword arguments.

classmethod has_class(name: str) → bool

Test if the provided name is registered.

Parameters: name (str) – Name of the class.
Return type: bool
Returns: True if a class with this name is registered, False otherwise.

load(data, *, many=None, partial=None, unknown=None)

Deserialize a data structure to an object defined by this Schema’s fields.

Parameters:

data (Union[Mapping[str, Any], Iterable[Mapping[str, Any]]]) – The data to deserialize.
many (bool | None) – Whether to deserialize data as a collection. If None, the value for self.many is used.
partial (Union[bool, Sequence[str], AbstractSet[str], None]) – Whether to ignore missing fields and not require any fields declared. Propagates down to Nested fields as well. If its value is an iterable, only missing fields listed in that iterable will be ignored. Use dot delimiters to specify nested fields.
unknown (str | None) – Whether to exclude, include, or raise an error for unknown fields in the data. Use EXCLUDE, INCLUDE or RAISE. If None, the value for self.unknown is used.

Returns:

Deserialized data

Added in version 1.0.0.

Changed in version 3.0.0b7: This method returns the deserialized data rather than a (data, errors) duple. A ValidationError is raised if invalid data are passed.

loads(json_data, *, many=None, partial=None, unknown=None, **kwargs)

Same as load(), except it uses marshmallow.Schema.Meta.render_module to deserialize the passed string before passing data to load().

Parameters:

json_data (str | bytes | bytearray) – A string of the data to deserialize.
many (bool | None) – Whether to deserialize json_data as a collection. If None, the value for self.many is used.
partial (Union[bool, Sequence[str], AbstractSet[str], None]) – Whether to ignore missing fields and not require any fields declared. Propagates down to Nested fields as well. If its value is an iterable, only missing fields listed in that iterable will be ignored. Use dot delimiters to specify nested fields.
unknown (str | None) – Whether to exclude, include, or raise an error for unknown fields in the data. Use EXCLUDE, INCLUDE or RAISE. If None, the value for self.unknown is used.

Returns:

Deserialized data

Added in version 1.0.0.

Changed in version 3.0.0b7: This method returns the deserialized data rather than a (data, errors) duple. A ValidationError is raised if invalid data are passed.

make_object(data, **_)

on_bind_field(field_name, field_obj)

Hook to modify a field when it is bound to the Schema.

No-op by default.

Parameters:

field_name (str)
field_obj (Field)

Return type:

None

opts: typing.Any = <marshmallow.schema.SchemaOpts object>

post_dump(data, original_data, **_)

pre_dump(data, **_)

pre_load(data, **_)

classmethod register(): Register the current class.

classmethod register_schema(schema, exception)

Register the provided schema.

Parameters:

schema (type[RegistryAbstractSchema]) – Schema to register.
exception (type[Exception]) – Exception raised when an invalid operation is made.

classmethod registry()

Returns a copy of the registry.

Return type: dict[str, type[RegistryAbstractSchema]]

classmethod remove_registry(name)

Parameters: name (str)

set_class: alias of OrderedSet

classmethod update_registry(schema)

Update current registry with the provided one. An error is raised if the same name is used twice.

Parameters: schema (type[RegistryAbstractSchema]) – Schemas to register.
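The registry behaviour described by these class methods (lookup by name, a copy on read, an error on duplicate names) can be sketched with a small stand-alone class. SchemaRegistry is hypothetical and is not the casys implementation; in particular, the choice of ValueError for duplicates is an assumption.

```python
class SchemaRegistry:
    """Hypothetical name-to-class registry (not the casys implementation)."""

    def __init__(self):
        self._classes = {}

    def register(self, name, cls):
        # Assumption: a duplicate name is an error, as for update_registry.
        if name in self._classes:
            raise ValueError(f"'{name}' is already registered")
        self._classes[name] = cls

    def has_class(self, name):
        return name in self._classes

    def get_class(self, name):
        return self._classes[name]

    def registry(self):
        # Return a copy so callers cannot mutate the internal state.
        return dict(self._classes)

    def update_registry(self, other):
        for name, cls in other.registry().items():
            self.register(name, cls)


first = SchemaRegistry()
first.register("robust_mean_std", dict)  # dict is a placeholder class

second = SchemaRegistry()
second.register("iterative_filter", list)  # list is a placeholder class

first.update_registry(second)
names = sorted(first.registry())
```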

validate(data, *, many=None, partial=None)

Validate data against the schema, returning a dictionary of validation errors.

Parameters:

data (Union[Mapping[str, Any], Iterable[Mapping[str, Any]]]) – The data to validate.
many (bool | None) – Whether to validate data as a collection. If None, the value for self.many is used.
partial (Union[bool, Sequence[str], AbstractSet[str], None]) – Whether to ignore missing fields and not require any fields declared. Propagates down to Nested fields as well. If its value is an iterable, only missing fields listed in that iterable will be ignored. Use dot delimiters to specify nested fields.

Return type:

dict[str, list[str]]

Returns:

A dictionary of validation errors.

Added in version 1.1.0.

validate_threshold(data, **_)

casys.editing.InvalidityConditionGenericSchema: alias of RegistryGenericSchema

class casys.editing.IterativeFilter(clip, nbr_iter, filter, threshold, std_coeff=1.0, const_coeff=0.0, name=None)

Bases: InvalidityCondition

Iterative processing, invalidating at each step the values “too far” from the filtered values. The filtered_values property yields the final filtered values.

Parameters:

clip (Clip | str) – Clip definition
nbr_iter (int) – Number of iterations. The effective number may be lower if no new outliers can be obtained.
filter (Filter) – Filter to apply.
threshold (int | float | str) – Outlier threshold. At each iteration, values where |values - filter(values)| > (std_coeff * std + const_coeff) * threshold are invalidated as outliers, then replaced by filter(values). May be a numeric value or a clip.
std_coeff (int | float | str) – Coefficient attached to the standard deviation (std). May be a numeric value or a clip.
const_coeff (int | float | str) – Constant coefficient. May be a numeric value or a clip.
name (str | None)
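The iteration described by these parameters can be sketched in NumPy. This is an illustration under stated assumptions, not the casys implementation: running_median is a hypothetical stand-in for the Filter object, std is assumed to be the standard deviation of the residuals values - filter(values), and only numeric thresholds and coefficients are handled (no clips).

```python
import numpy as np


def running_median(values, half_width=1):
    """Hypothetical stand-in for the Filter: a centred running median."""
    padded = np.pad(values, half_width, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(
        padded, 2 * half_width + 1
    )
    return np.median(windows, axis=-1)


def iterative_filter_mask(values, nbr_iter, threshold,
                          std_coeff=1.0, const_coeff=0.0):
    """Return (invalidity mask, final filtered values)."""
    values = np.asarray(values, dtype=float).copy()
    invalid = np.zeros(values.shape, dtype=bool)
    for _ in range(nbr_iter):
        filtered = running_median(values)
        residual = values - filtered
        # Assumption: std is the standard deviation of the residuals.
        limit = (std_coeff * residual.std() + const_coeff) * threshold
        outliers = np.abs(residual) > limit
        if not outliers.any():
            break  # no new outliers: stop before nbr_iter is reached
        invalid |= outliers
        # Outliers are replaced by the filtered values for the next pass.
        values[outliers] = filtered[outliers]
    return invalid, running_median(values)


data = [1.0, 1.2, 0.9, 8.0, 1.1, 1.0, 0.8]
mask, filtered = iterative_filter_mask(data, nbr_iter=5, threshold=2.0)
```

With this input only the spike at index 3 is flagged, and the loop stops on the second iteration because no further outlier appears.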

check_fields_validity(data, dim_name, comp_name)

Check that all the fields only depend on the expected dimension.

Parameters:

data (Dataset) – Data to edit
dim_name (str) – Dataset dimension involved in the editing
comp_name (str) – Name of the EditingComponent

property clip

compute_invalidity_mask(data, dim_name)

Compute invalidity mask.

Parameters:

data (Dataset) – Dataset
dim_name (str) – Dataset dimension involved in the editing

Return type:

ndarray

Returns:

Invalidity mask (True or False array)

property const_coeff: int | float | str

property filter: Filter

property filtered_values

has_pass_requirement()

Whether this condition has pass requirements.

Return type: bool
Returns: True if the condition requires data covering full passes, False otherwise.

property name

property nbr_iter: int

required_fields()

Returns the list of required data fields to apply this condition.

Return type: list[str]
Returns: List of required data fields to apply this condition.

property std_coeff: int | float | str

property threshold: int | float | str

class casys.editing.RobustMeanStd(clip, nbr_iter, threshold, name=None)

Bases: InvalidityCondition

Iterative processing, invalidating at each step the values “too far” from the global mean.

Parameters:

clip (Clip | str) – Clip definition
nbr_iter (int) – Number of iterations
threshold (Union[int, float, str]) – Outlier threshold. At each iteration, values where |values - mean| > threshold * std are invalidated as outliers, then the mean and std are computed again.
name (Optional[str])
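The rule stated for threshold amounts to iterative sigma clipping. A minimal NumPy sketch follows, assuming a numeric threshold, a population standard deviation, and NaN values treated as already invalid; the function name is hypothetical and the code is not the casys implementation.

```python
import numpy as np


def robust_mean_std_mask(values, nbr_iter, threshold):
    """Return a boolean mask, True where a value is flagged as invalid."""
    values = np.asarray(values, dtype=float)
    # Assumption: NaN values are treated as already invalid.
    invalid = np.isnan(values)
    for _ in range(nbr_iter):
        valid = values[~invalid]
        if valid.size == 0:
            break
        # Statistics are recomputed on the values still considered valid.
        mean, std = valid.mean(), valid.std()
        new_invalid = ~invalid & (np.abs(values - mean) > threshold * std)
        if not new_invalid.any():
            break
        invalid |= new_invalid
    return invalid


data = np.array([1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 10.0])
mask = robust_mean_std_mask(data, nbr_iter=3, threshold=2.0)
```

The first iteration removes the value 10.0; once it is excluded, the recomputed mean and std no longer flag anything.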

check_fields_validity(data, dim_name, comp_name)

Check that all the fields only depend on the expected dimension.

Parameters:

data (Dataset) – Data to edit
dim_name (str) – Dataset dimension involved in the editing
comp_name (str) – Name of the EditingComponent

property clip

compute_invalidity_mask(data, dim_name)

Compute invalidity mask.

Parameters:

data (Dataset) – Dataset
dim_name (str) – Dataset dimension involved in the editing

Return type:

ndarray

Returns:

Invalidity mask (True or False array)

has_pass_requirement()

Whether this condition has pass requirements.

Return type: bool
Returns: True if the condition requires data covering full passes, False otherwise.

property name

property nbr_iter: int

required_fields()

Returns the list of required data fields to apply this condition.

Return type: list[str]
Returns: List of required data fields to apply this condition.

property threshold: int | float | str

class casys.editing.StatisticsByPass(clip, nbr_min_pts, threshold, orf, name=None)

Bases: InvalidityCondition

Invalidate passes where mean or std statistics exceed the thresholds.

Parameters:

clip (Clip | str) – Clip definition
nbr_min_pts (int) – Minimum number of points for a pass to be invalidated
threshold (dict[str, Any]) – Mean and std thresholds
orf (PassIndexer) – Orf description
name (Optional[str])
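A per-pass check of this kind can be sketched as follows. Several details are assumptions, since they are not specified above: the threshold dictionary is given hypothetical keys "mean" and "std", the mean is compared in absolute value, passes with fewer than nbr_min_pts points are left untouched, and an array of pass numbers replaces the PassIndexer.

```python
import numpy as np


def statistics_by_pass_mask(values, pass_numbers, nbr_min_pts, threshold):
    """Return a boolean mask, True for every point of an invalidated pass."""
    values = np.asarray(values, dtype=float)
    pass_numbers = np.asarray(pass_numbers)
    invalid = np.zeros(values.shape, dtype=bool)
    for pass_id in np.unique(pass_numbers):
        in_pass = pass_numbers == pass_id
        sample = values[in_pass]
        sample = sample[~np.isnan(sample)]
        if sample.size < nbr_min_pts:
            continue  # assumption: too few points, the pass is not judged
        # Hypothetical keys: "mean" and "std".
        too_biased = abs(sample.mean()) > threshold["mean"]
        too_noisy = sample.std() > threshold["std"]
        if too_biased or too_noisy:
            invalid |= in_pass
    return invalid


values = [0.1, -0.1, 0.0, 2.0, 2.2, 1.8, 5.0]
passes = [1, 1, 1, 2, 2, 2, 3]
mask = statistics_by_pass_mask(
    values, passes, nbr_min_pts=2, threshold={"mean": 1.0, "std": 0.5}
)
```

Pass 2 is invalidated as a whole because its mean exceeds the limit, while pass 3 holds a single point and is skipped.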

check_fields_validity(data, dim_name, comp_name)

Check that all the fields only depend on the expected dimension.

Parameters:

data (Dataset) – Data to edit
dim_name (str) – Dataset dimension involved in the editing
comp_name (str) – Name of the EditingComponent

property clip

compute_invalidity_mask(data, dim_name)

Compute invalidity mask.

Parameters:

data (Dataset) – Dataset
dim_name (str) – Dataset dimension involved in the editing

Return type:

ndarray

Returns:

Invalidity mask (True or False array)

has_pass_requirement()

Whether this condition has pass requirements.

Return type: bool
Returns: True if the condition requires data covering full passes, False otherwise.

property name

property nbr_min_pts: int

property orf: PassIndexer

required_fields()

Returns the list of required data fields to apply this condition.

Return type: list[str]
Returns: List of required data fields to apply this condition.

property threshold: dict[str, Any]