(2022-09-19 18:33)

Cell Checks

ASCII Value

If your data should not contain non-ASCII characters, this check reports any that it finds during validation. Here is how to use it.

Example

from pprint import pprint
from frictionless import validate, checks

source=[["s.no","code"],[1,"ssµ"]]
report = validate(source, checks=[checks.ascii_value()])
pprint(report.flatten(["type", "message"]))
[['ascii-value',
  'The cell ssµ in row at position 2 and field code at position 2 has an '
  'error: the cell contains non-ascii characters']]
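
To illustrate the idea behind this check, here is a minimal standard-library sketch that scans string cells for non-ASCII characters. The function name `find_non_ascii_cells` is hypothetical, and this is not the library's implementation, only an approximation of the behaviour shown in the example above.

```python
def find_non_ascii_cells(rows):
    """Yield (row_position, field_position, cell) for string cells with non-ASCII characters."""
    _header, *data = rows
    # The header counts as row 1, so data rows start at position 2
    for row_position, row in enumerate(data, start=2):
        for field_position, cell in enumerate(row, start=1):
            # Only strings are inspected; numbers and other types are skipped
            if isinstance(cell, str) and not cell.isascii():
                yield row_position, field_position, cell


source = [["s.no", "code"], [1, "ssµ"]]
print(list(find_non_ascii_cells(source)))
```

The sketch reports the same cell, row position and field position as the `ascii-value` error above.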

Reference

checks.ascii_value (class)

Check whether all the string characters in the data are ASCII. This check can be enabled using the `checks` parameter for the `validate` function.

Signature

(*, title: Optional[str] = None, description: Optional[str] = None) -> None

Parameters
  • title (Optional[str])
  • description (Optional[str])

checks.ascii_value.type (property)

Signature

ClassVar[str]

checks.ascii_value.Errors (property)

Signature

ClassVar[List[Type[Error]]]

checks.ascii_value.title (property)

Signature

Optional[str]

checks.ascii_value.description (property)

Signature

Optional[str]

Deviated Cell

This check identifies cells that deviate from the normal ones. To flag a deviated cell, the check compares the number of characters in each cell with a threshold value. The threshold is either 5000 or a value calculated with Python's built-in statistics module: the average plus three standard deviations. The exact algorithm can be found here. For example:

Example

Download issue-1066.csv to reproduce the examples (right-click and "Save link as").

from pprint import pprint
from frictionless import validate, checks

report = validate("issue-1066.csv", checks=[checks.deviated_cell()])
pprint(report.flatten(["type", "message"]))
[['deviated-cell',
  'There is a possible error because the cell is deviated: cell at row "35" '
  'and field "Gestore" has deviated size']]
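
The threshold described above can be sketched with the standard library alone. This is a rough approximation under stated assumptions, not the library's algorithm: it uses `statistics.mean` as the average, and it assumes a cell is flagged only when its length exceeds both the statistical threshold and the fixed value of 5000. The names `find_deviated_cells` and `floor` are hypothetical.

```python
import statistics


def find_deviated_cells(cells, interval=3, floor=5000):
    """Return the indexes of cells whose length exceeds the threshold."""
    sizes = [len(cell) for cell in cells]
    if len(sizes) < 2:
        return []  # a standard deviation needs at least two data points
    # Average cell length plus `interval` standard deviations
    threshold = statistics.mean(sizes) + interval * statistics.stdev(sizes)
    # Assumption: the larger of the fixed floor and the statistical threshold applies
    limit = max(threshold, floor)
    return [index for index, size in enumerate(sizes) if size > limit]


cells = ["short"] * 20 + ["x" * 6000]
print(find_deviated_cells(cells))
```

With twenty 5-character cells and one 6000-character cell, only the last cell (index 20) is reported.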

Reference

checks.deviated_cell (class)

Check if the cell size is deviated

Signature

(*, title: Optional[str] = None, description: Optional[str] = None, interval: int = 3, ignore_fields: List[str] = NOTHING) -> None

Parameters
  • title (Optional[str])
  • description (Optional[str])
  • interval (int)
  • ignore_fields (List[str])

checks.deviated_cell.interval (property)

Number of standard deviations added to the average to compute the threshold (default: 3).

Signature

int

checks.deviated_cell.ignore_fields (property)

Names of fields to exclude from this check.

Signature

List[str]

Deviated Value

This check uses Python's built-in statistics module to check a field's data for deviations. By default, deviated values are those outside the average ± three standard deviations. Take a look at the API Reference for more details about available options and default values. The exact algorithm can be found here. For example:

Example

from pprint import pprint
from frictionless import validate, checks

source = [["temperature"], [1], [-2], [7], [0], [1], [2], [5], [-4], [1000], [8], [3]]
report = validate(source, checks=[checks.deviated_value(field_name="temperature")])
pprint(report.flatten(["type", "message"]))
[['deviated-value',
  'There is a possible error because the value is deviated: value "1000" in '
  'row at position "10" and field "temperature" is deviated "[-809.88, '
  '995.52]"']]
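
The interval reported in the message above can be reproduced with the standard library. The following is a minimal sketch, assuming the default settings correspond to `statistics.mean` and the sample standard deviation `statistics.stdev`; the helper name `deviation_interval` is hypothetical and the code is not taken from the library.

```python
import statistics


def deviation_interval(values, interval=3):
    """Return (low, high): the average minus/plus `interval` standard deviations."""
    center = statistics.mean(values)
    spread = statistics.stdev(values) * interval
    return center - spread, center + spread


temperatures = [1, -2, 7, 0, 1, 2, 5, -4, 1000, 8, 3]
low, high = deviation_interval(temperatures)

# The header counts as row 1, so data rows start at position 2
deviated = [
    (position, value)
    for position, value in enumerate(temperatures, start=2)
    if not low <= value <= high
]
print(round(low, 2), round(high, 2))
print(deviated)
```

Rounded to two decimals, the bounds are -809.88 and 995.52, matching the interval in the error message, and the only value outside them is 1000 at row position 10.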

Reference

checks.deviated_value (class)

Check for deviated values in a field

Signature

(*, title: Optional[str] = None, description: Optional[str] = None, field_name: str, interval: int = 3, average: str = mean) -> None

Parameters
  • title (Optional[str])
  • description (Optional[str])
  • field_name (str)
  • interval (int)
  • average (str)

checks.deviated_value.field_name (property)

Name of the field to check for deviated values.

Signature

str

checks.deviated_value.interval (property)

Number of standard deviations on either side of the average that defines the accepted range (default: 3).

Signature

int

checks.deviated_value.average (property)

Name of the averaging function used as the center of the range (default: mean).

Signature

str

Forbidden Value

This check ensures that a field does not contain any forbidden (denylisted) values.

Example

from pprint import pprint
from frictionless import validate, checks

source = b'header\nvalue1\nvalue2'
# Use a separate name so the imported `checks` module is not shadowed
forbidden_checks = [checks.forbidden_value(field_name='header', values=['value2'])]
report = validate(source, format='csv', checks=forbidden_checks)
pprint(report.flatten(['type', 'message']))
[['forbidden-value',
  'The cell value2 in row at position 3 and field header at position 1 has an '
  'error: forbidden values are "[\'value2\']"']]
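
Conceptually, the check is a membership test on one field. The sketch below shows that idea using only the standard library `csv` module; the function name `find_forbidden_values` is hypothetical, and the sketch ignores the type handling the framework may apply to cells.

```python
import csv
import io


def find_forbidden_values(source, field_name, values):
    """Return (row_position, cell) pairs for cells whose value is in `values`."""
    reader = csv.DictReader(io.StringIO(source.decode("utf-8")))
    # The header counts as row 1, so data rows start at position 2
    return [
        (row_position, row[field_name])
        for row_position, row in enumerate(reader, start=2)
        if row[field_name] in values
    ]


source = b"header\nvalue1\nvalue2"
print(find_forbidden_values(source, field_name="header", values=["value2"]))
```

As in the example above, the forbidden cell `value2` is found at row position 3.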

Reference

checks.forbidden_value (class)

Check for forbidden values in a field

Signature

(*, title: Optional[str] = None, description: Optional[str] = None, field_name: str, values: List[Any]) -> None

Parameters
  • title (Optional[str])
  • description (Optional[str])
  • field_name (str)
  • values (List[Any])

checks.forbidden_value.field_name (property)

Name of the field to check for forbidden values.

Signature

str

checks.forbidden_value.values (property)

List of values that must not appear in the field.

Signature

List[Any]

Sequential Value

This check validates sequential fields such as primary keys or other similar data. The sequence doesn't need to start from 0 or 1. You specify which field to check by providing a field name.

Example

from pprint import pprint
from frictionless import validate, checks

source = b'header\n2\n3\n5'
report = validate(source, format='csv', checks=[checks.sequential_value(field_name='header')])
pprint(report.flatten(['type', 'message']))
[['sequential-value',
  'The cell 5 in row at position 4 and field header at position 1 has an '
  'error: the value is not sequential']]
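
The behaviour in the example can be approximated as follows: take the first value as the starting point, then expect each later value to be exactly one greater. This is a hedged sketch with a hypothetical function name (`first_non_sequential`), not the library's code, and it stops at the first break in the sequence.

```python
def first_non_sequential(values):
    """Return the row position of the first value that breaks the sequence, or None."""
    expected = None
    # The header counts as row 1, so data rows start at position 2
    for row_position, value in enumerate(values, start=2):
        if expected is None:
            expected = value  # the sequence may start at any number
        if value != expected:
            return row_position
        expected += 1
    return None


print(first_non_sequential([2, 3, 5]))
```

For the values 2, 3, 5 the sketch returns row position 4, the same row reported by the `sequential-value` error above.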

Reference

checks.sequential_value (class)

Check that a column has sequential values

Signature

(*, title: Optional[str] = None, description: Optional[str] = None, field_name: str) -> None

Parameters
  • title (Optional[str])
  • description (Optional[str])
  • field_name (str)

checks.sequential_value.field_name (property)

Name of the field whose values must be sequential.

Signature

str

Truncated Value

Sometimes, during data export from a database or other storage, data values are truncated. This check tries to detect such truncation. The example below shows some truncation indicators.

Example

from pprint import pprint
from frictionless import validate, checks

source = [["int", "str"], ["a" * 255, 32767], ["good", 2147483647]]
report = validate(source, checks=[checks.truncated_value()])
pprint(report.flatten(["type", "message"]))
[['truncated-value',
  'The cell '
  'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa '
  'in row at position 2 and field int at position 1 has an error: value  is '
  'probably truncated'],
 ['truncated-value',
  'The cell 32767 in row at position 2 and field str at position 2 has an '
  'error: value  is probably truncated'],
 ['truncated-value',
  'The cell 2147483647 in row at position 3 and field str at position 2 has an '
  'error: value  is probably truncated']]
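
The three flagged cells suggest the kind of indicators involved: a string of exactly 255 characters, and the integers 32767 and 2147483647, which are the largest signed 16-bit and 32-bit values. The sketch below encodes only those indicators, inferred from the example; the constant and function names are hypothetical, and the library's actual list of indicators may differ.

```python
# Indicators inferred from the example above (assumption, not the library's list)
TRUNCATED_STRING_LENGTHS = {255}
TRUNCATED_INTEGER_VALUES = {32767, 2147483647}  # 2**15 - 1 and 2**31 - 1


def looks_truncated(cell):
    """Return True if the cell matches one of the truncation indicators."""
    if isinstance(cell, str):
        return len(cell) in TRUNCATED_STRING_LENGTHS
    # bool is a subclass of int in Python, so exclude it explicitly
    if isinstance(cell, int) and not isinstance(cell, bool):
        return cell in TRUNCATED_INTEGER_VALUES
    return False


rows = [["a" * 255, 32767], ["good", 2147483647]]
print([[looks_truncated(cell) for cell in row] for row in rows])
```

Only the cell `good` is left unflagged, which mirrors the three `truncated-value` errors in the report above.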

Reference

checks.truncated_value (class)

Check for possible truncated values. This check can be enabled using the `checks` parameter for the `validate` function.

Signature

(*, title: Optional[str] = None, description: Optional[str] = None) -> None

Parameters
  • title (Optional[str])
  • description (Optional[str])

checks.truncated_value.type (property)

Signature

ClassVar[str]

checks.truncated_value.Errors (property)

Signature

ClassVar[List[Type[Error]]]

checks.truncated_value.title (property)

Signature

Optional[str]

checks.truncated_value.description (property)

Signature

Optional[str]

This is the documentation for a beta version of Frictionless Framework (v5). For the version that pip currently installs by default, read the Frictionless Framework (v4) docs.