If your data should not contain non-ASCII characters, this check reports any that are found during validation. Here is how to use it:
from pprint import pprint
from frictionless import validate, checks
source=[["s.no","code"],[1,"ssµ"]]
report = validate(source, checks=[checks.ascii_value()])
pprint(report.flatten(["type", "message"]))
[['ascii-value',
'The cell ssµ in row at position 2 and field code at position 2 has an '
'error: the cell contains non-ascii characters']]
Check whether all the string characters in the data are ASCII. This check can be enabled using the `checks` parameter for the `validate` function.
(*, name: Optional[str] = None, title: Optional[str] = None, description: Optional[str] = None) -> None
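As a rough illustration of what the check looks for, the sketch below scans a table for string cells that contain non-ASCII characters, using Python's built-in `str.isascii()`. It is not the library's implementation: the helper name `find_non_ascii_cells` is hypothetical, and the 1-based positions (with the header counted as row 1) are chosen to mirror the message in the example above.

```python
from typing import Any, List, Tuple


def find_non_ascii_cells(table: List[List[Any]]) -> List[Tuple[int, int, str]]:
    """Return (row_position, field_position, cell) for non-ASCII string cells.

    Hypothetical helper for illustration only. Positions are 1-based and the
    header counts as row 1, so the first data row is at position 2.
    """
    found = []
    for row_position, row in enumerate(table[1:], start=2):
        for field_position, cell in enumerate(row, start=1):
            # Only string cells can carry non-ASCII characters
            if isinstance(cell, str) and not cell.isascii():
                found.append((row_position, field_position, cell))
    return found


source = [["s.no", "code"], [1, "ssµ"]]
print(find_non_ascii_cells(source))  # [(2, 2, 'ssµ')]
```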
This check identifies cells whose size deviates from the normal ones. To flag a deviated cell, the check compares the number of characters in each cell with a threshold value. The threshold is either 5000 or a value calculated using Python's built-in statistics module: the average plus three standard deviations. The exact algorithm can be found here. For example:
Download issue-1066.csv to reproduce the examples (right-click and "Save link as").
from pprint import pprint
from frictionless import validate, checks
report = validate("issue-1066.csv", checks=[checks.deviated_cell()])
pprint(report.flatten(["type", "message"]))
[['deviated-cell',
'There is a possible error because the cell is deviated: cell at row "35" '
'and field "Gestore" has deviated size']]
Check if the cell size is deviated
(*, name: Optional[str] = None, title: Optional[str] = None, description: Optional[str] = None, interval: int = 3, ignore_fields: List[str] = NOTHING) -> None
The interval specifies the number of standard deviations away from the center. The median is used to find the center of the data. The default value is 3.
int
List of data columns to be skipped by the check. The check will not be applied to any of the columns listed here. The default value is [].
List[str]
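The threshold rule described above can be sketched with the standard library. This is an approximation under stated assumptions, not the library's code. It assumes that the larger of 5000 and center + interval × standard deviation is used as the threshold, that the center is the median (as the interval description states), and that cell sizes are collected per column. The function name `deviated_cell_positions` is hypothetical.

```python
import statistics
from typing import List

FIXED_THRESHOLD = 5000  # the fixed threshold mentioned in the description


def deviated_cell_positions(cell_sizes: List[int], interval: int = 3) -> List[int]:
    """Return 0-based indexes of cells whose size exceeds the threshold.

    Assumption: the threshold is the larger of FIXED_THRESHOLD and
    median + interval * sample standard deviation of the column's cell sizes.
    """
    if len(cell_sizes) < 2:
        # A sample standard deviation needs at least two data points
        return []
    center = statistics.median(cell_sizes)
    stdev = statistics.stdev(cell_sizes)
    threshold = max(FIXED_THRESHOLD, center + interval * stdev)
    return [index for index, size in enumerate(cell_sizes) if size > threshold]


# Twenty short cells and one very long cell in the same column
sizes = [10] * 20 + [100_000]
print(deviated_cell_positions(sizes))  # [20]
```

Under these assumptions a cell of 3000 characters among short cells would not be flagged, because the threshold never drops below 5000.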
This check uses Python's built-in statistics module to check a field's data for deviations. By default, deviated values are those outside of the average ± three standard deviations. Take a look at the API Reference for more details about available options and default values. The exact algorithm can be found here. For example:
from pprint import pprint
from frictionless import validate, checks
source = [["temperature"], [1], [-2], [7], [0], [1], [2], [5], [-4], [1000], [8], [3]]
report = validate(source, checks=[checks.deviated_value(field_name="temperature")])
pprint(report.flatten(["type", "message"]))
[['deviated-value',
'There is a possible error because the value is deviated: value "1000" in '
'row at position "10" and field "temperature" is deviated "[-809.88, '
'995.52]"']]
Check for deviated values in a field.
(*, name: Optional[str] = None, title: Optional[str] = None, description: Optional[str] = None, field_name: str, interval: int = 3, average: str = mean) -> None
Name of the field to which the check will be applied. The check will not be applied to any other field.
str
The interval specifies the number of standard deviations away from the mean. The default value is 3.
int
Specifies the preferred method for calculating the average of the data. The default value is "mean". Supported methods are "mean", "median", and "mode".
str
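The interval reported in the example above can be reproduced with the `statistics` module. The sketch below is an illustration, not the library's implementation. It assumes the sample standard deviation (`statistics.stdev`) is used, which matches the interval shown in the example output. The helper name `deviation_interval` and the `AVERAGE_FUNCTIONS` mapping are hypothetical.

```python
import statistics
from typing import List, Tuple

# Maps the documented "average" options to standard-library functions
AVERAGE_FUNCTIONS = {
    "mean": statistics.mean,
    "median": statistics.median,
    "mode": statistics.mode,
}


def deviation_interval(
    values: List[float], interval: int = 3, average: str = "mean"
) -> Tuple[float, float]:
    """Return (minimum, maximum) as average -/+ interval * standard deviation."""
    center = AVERAGE_FUNCTIONS[average](values)
    spread = statistics.stdev(values) * interval
    return center - spread, center + spread


temperatures = [1, -2, 7, 0, 1, 2, 5, -4, 1000, 8, 3]
minimum, maximum = deviation_interval(temperatures)
print(round(minimum, 2), round(maximum, 2))  # -809.88 995.52

# Values outside the interval are the deviated ones
deviated = [value for value in temperatures if not minimum <= value <= maximum]
print(deviated)  # [1000]
```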
This check ensures that a field doesn't contain any forbidden (denylisted) values.
from pprint import pprint
from frictionless import validate, checks
source = b'header\nvalue1\nvalue2'
# Use a separate name so the imported `checks` module is not shadowed
forbidden_checks = [checks.forbidden_value(field_name='header', values=['value2'])]
report = validate(source, format='csv', checks=forbidden_checks)
pprint(report.flatten(['type', 'message']))
[['forbidden-value',
'The cell value2 in row at position 3 and field header at position 1 has an '
'error: forbidden values are "[\'value2\']"']]
Check for forbidden values in a field.
(*, name: Optional[str] = None, title: Optional[str] = None, description: Optional[str] = None, field_name: str, values: List[Any]) -> None
The name of the field to which the check is applied. The check will not be applied to other fields.
str
Specify the forbidden values to check for, in the field specified by "field_name".
List[Any]
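Conceptually, the check is a membership test on one field. The sketch below shows that idea using only the standard library. It is not how the library implements the check, and the helper name `find_forbidden_cells` is hypothetical. Row positions are 1-based with the header as row 1, mirroring the message in the example above.

```python
import csv
import io
from typing import Any, List, Tuple


def find_forbidden_cells(
    source: bytes, field_name: str, values: List[Any]
) -> List[Tuple[int, str]]:
    """Return (row_position, cell) for cells of `field_name` found in `values`.

    Hypothetical helper for illustration. Cells read by the csv module are
    strings, so only string entries in `values` can match here.
    """
    reader = csv.DictReader(io.StringIO(source.decode("utf-8")))
    found = []
    # The header is row 1, so data rows start at position 2
    for row_position, row in enumerate(reader, start=2):
        cell = row[field_name]
        if cell in values:
            found.append((row_position, cell))
    return found


source = b"header\nvalue1\nvalue2"
print(find_forbidden_cells(source, "header", ["value2"]))  # [(3, 'value2')]
```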
This check lets us validate sequential fields such as primary keys or similar data. The sequence doesn't need to start from 0 or 1. The field to check is specified by name.
from pprint import pprint
from frictionless import validate, checks
source = b'header\n2\n3\n5'
report = validate(source, format='csv', checks=[checks.sequential_value(field_name='header')])
pprint(report.flatten(['type', 'message']))
[['sequential-value',
'The cell 5 in row at position 4 and field header at position 1 has an '
'error: the value is not sequential']]
Check that a column has sequential values.
(*, name: Optional[str] = None, title: Optional[str] = None, description: Optional[str] = None, field_name: str) -> None
The name of the field to which the check is applied. The check will not be applied to other fields.
str
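The idea of a sequence that may start anywhere can be sketched as follows. This is a minimal illustration and not the library's implementation. It assumes that "sequential" means each value is exactly one greater than the previous one, which is consistent with the example above where 5 follows 3. The helper name `first_non_sequential_row` is hypothetical.

```python
from typing import List, Optional


def first_non_sequential_row(values: List[int]) -> Optional[int]:
    """Return the row position of the first break in the sequence, or None.

    Assumption: values must increase by exactly 1. Positions are 1-based and
    the header counts as row 1, so the first value is at position 2.
    """
    expected = None
    for row_position, value in enumerate(values, start=2):
        if expected is None:
            # The first value sets the starting point of the sequence
            expected = value
        if value != expected:
            return row_position
        expected += 1
    return None


print(first_non_sequential_row([2, 3, 5]))  # 4
```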
Sometimes during data export from a database or other storage, data values can be truncated. This check tries to detect such truncation. Let's explore some truncation indicators:
from pprint import pprint
from frictionless import validate, checks
source = [["int", "str"], ["a" * 255, 32767], ["good", 2147483647]]
report = validate(source, checks=[checks.truncated_value()])
pprint(report.flatten(["type", "message"]))
[['truncated-value',
'The cell '
'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa '
'in row at position 2 and field int at position 1 has an error: value is '
'probably truncated'],
['truncated-value',
'The cell 32767 in row at position 2 and field str at position 2 has an '
'error: value is probably truncated'],
['truncated-value',
'The cell 2147483647 in row at position 3 and field str at position 2 has an '
'error: value is probably truncated']]
Check for possible truncated values. This check can be enabled using the `checks` parameter for the `validate` function.
(*, name: Optional[str] = None, title: Optional[str] = None, description: Optional[str] = None) -> None
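The indicators in the example above correspond to common storage limits: 255 is a frequent maximum length for string columns, 32767 is the largest signed 16-bit integer, and 2147483647 is the largest signed 32-bit integer. The sketch below tests for exactly these three indicators. The set of indicators is an assumption inferred from the example, not taken from the library's source, and the helper name `looks_truncated` is hypothetical.

```python
from typing import Any

# Assumed indicators, inferred from the example above
SUSPICIOUS_STRING_LENGTHS = {255}
SUSPICIOUS_INTEGERS = {32767, 2147483647}


def looks_truncated(cell: Any) -> bool:
    """Return True if the cell matches one of the assumed truncation indicators."""
    if isinstance(cell, bool):
        # bool is a subclass of int in Python; never treat it as a number here
        return False
    if isinstance(cell, str):
        return len(cell) in SUSPICIOUS_STRING_LENGTHS
    if isinstance(cell, int):
        return cell in SUSPICIOUS_INTEGERS
    return False


cells = ["a" * 255, 32767, "good", 2147483647]
print([looks_truncated(cell) for cell in cells])  # [True, True, False, True]
```

A match is only a hint: a legitimate value can happen to be 255 characters long or equal to 32767, which is why the message says the value is "probably truncated".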