(2024-11-07 15:17)

Row Steps

These steps operate on rows: filtering, searching, slicing, sorting, and more.

Filter Rows

This step filters rows based on a provided formula or function.

Example

from pprint import pprint
from frictionless import Package, Resource, transform, steps

source = Resource(path="transform.csv")
target = transform(
    source,
    steps=[
        steps.table_normalize(),
        steps.row_filter(formula="id > 1"),
    ]
)
print(target.schema)
print(target.to_view())
{'fields': [{'name': 'id', 'type': 'integer'},
            {'name': 'name', 'type': 'string'},
            {'name': 'population', 'type': 'integer'}]}
+----+----------+------------+
| id | name     | population |
+====+==========+============+
|  2 | 'france' |         66 |
+----+----------+------------+
|  3 | 'spain'  |         47 |
+----+----------+------------+

Reference

steps.row_filter (class)

Filter rows. This step can be added using the `steps` parameter for the `transform` function.

Signature

(*, name: Optional[str] = None, title: Optional[str] = None, description: Optional[str] = None, formula: Optional[Any] = None, function: Optional[Any] = None) -> None

Parameters
  • name (Optional[str])
  • title (Optional[str])
  • description (Optional[str])
  • formula (Optional[Any])
  • function (Optional[Any])

steps.row_filter.formula (property)

Evaluable expression used to filter the rows. Rows that match the formula are returned; all others are ignored. The expression is processed using the simpleeval library.

Signature

Optional[Any]

steps.row_filter.function (property)

Python function used to filter the rows.

Signature

Optional[Any]
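
As a rough illustration of the `function` option, the stdlib-only sketch below keeps the rows for which a predicate returns a truthy value. It does not call frictionless: `filter_rows` is a hypothetical helper, the inline rows simply mirror the table used in the examples on this page, and the assumption is that the function is called once per row.

```python
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]

# Inline copy of the sample table used on this page (illustration only).
ROWS: List[Row] = [
    {"id": 1, "name": "germany", "population": 83},
    {"id": 2, "name": "france", "population": 66},
    {"id": 3, "name": "spain", "population": 47},
]


def filter_rows(rows: Iterable[Row], function: Callable[[Row], Any]) -> List[Row]:
    """Hypothetical helper: keep rows for which `function` is truthy."""
    return [row for row in rows if function(row)]


# The same condition as formula="id > 1", written as a Python function.
kept = filter_rows(ROWS, lambda row: row["id"] > 1)
print([row["name"] for row in kept])
```

A function is the natural choice when the condition is awkward to express as a one-line formula, for example when it needs several statements or a lookup in another data structure.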

Search Rows

Example

from pprint import pprint
from frictionless import Package, Resource, transform, steps

source = Resource(path="transform.csv")
target = transform(
    source,
    steps=[
        steps.row_search(regex=r"^f.*"),
    ]
)
print(target.schema)
print(target.to_view())
{'fields': [{'name': 'id', 'type': 'integer'},
            {'name': 'name', 'type': 'string'},
            {'name': 'population', 'type': 'integer'}]}
+----+----------+------------+
| id | name     | population |
+====+==========+============+
|  2 | 'france' |         66 |
+----+----------+------------+

Reference

steps.row_search (class)

Search rows. This step can be added using the `steps` parameter for the `transform` function.

Signature

(*, name: Optional[str] = None, title: Optional[str] = None, description: Optional[str] = None, regex: str, field_name: Optional[str] = None, negate: bool = False) -> None

Parameters
  • name (Optional[str])
  • title (Optional[str])
  • description (Optional[str])
  • regex (str)
  • field_name (Optional[str])
  • negate (bool)

steps.row_search.regex (property)

Regex pattern used to search the rows. If field_name is set, the pattern is applied only to the specified field. For example, regex=r"^e.*".

Signature

str

steps.row_search.field_name (property)

Name of the field in which to search.

Signature

Optional[str]

steps.row_search.negate (property)

Whether to invert the result. If True, all rows that do not match the pattern are returned.

Signature

bool
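
The interplay of `regex`, `field_name`, and `negate` can be sketched in plain Python. This is not the library's implementation: `search_rows` is a hypothetical helper, and matching each cell with `re.search` on its string form is an assumption made for the illustration.

```python
import re
from typing import Any, Dict, Iterable, List, Optional

Row = Dict[str, Any]

# Inline copy of the sample table used on this page (illustration only).
ROWS: List[Row] = [
    {"id": 1, "name": "germany", "population": 83},
    {"id": 2, "name": "france", "population": 66},
    {"id": 3, "name": "spain", "population": 47},
]


def search_rows(
    rows: Iterable[Row],
    regex: str,
    field_name: Optional[str] = None,
    negate: bool = False,
) -> List[Row]:
    """Hypothetical helper: keep rows with a cell matching `regex`."""
    pattern = re.compile(regex)
    result = []
    for row in rows:
        # Look at one field when field_name is given, otherwise at every cell.
        cells = [row[field_name]] if field_name is not None else list(row.values())
        matched = any(pattern.search(str(cell)) for cell in cells)
        # negate=True flips the decision, keeping the non-matching rows.
        if matched != negate:
            result.append(row)
    return result


print([row["name"] for row in search_rows(ROWS, r"^f.*")])
print([row["name"] for row in search_rows(ROWS, r"^f.*", negate=True)])
```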

Slice Rows

Example

from pprint import pprint
from frictionless import Package, Resource, transform, steps

source = Resource(path="transform.csv")
target = transform(
    source,
    steps=[
        steps.row_slice(head=2),
    ]
)
print(target.schema)
print(target.to_view())
{'fields': [{'name': 'id', 'type': 'integer'},
            {'name': 'name', 'type': 'string'},
            {'name': 'population', 'type': 'integer'}]}
+----+-----------+------------+
| id | name      | population |
+====+===========+============+
|  1 | 'germany' |         83 |
+----+-----------+------------+
|  2 | 'france'  |         66 |
+----+-----------+------------+

Reference

steps.row_slice (class)

Slice rows. This step can be added using the `steps` parameter for the `transform` function.

Signature

(*, name: Optional[str] = None, title: Optional[str] = None, description: Optional[str] = None, start: Optional[int] = None, stop: Optional[int] = None, step: Optional[int] = None, head: Optional[int] = None, tail: Optional[int] = None) -> None

Parameters
  • name (Optional[str])
  • title (Optional[str])
  • description (Optional[str])
  • start (Optional[int])
  • stop (Optional[int])
  • step (Optional[int])
  • head (Optional[int])
  • tail (Optional[int])

steps.row_slice.start (property)

Starting point from where to read the rows. If None, defaults to the beginning.

Signature

Optional[int]

steps.row_slice.stop (property)

Stopping point for reading rows. If None, defaults to the end.

Signature

Optional[int]

steps.row_slice.step (property)

Step size between the rows that are read. If None, defaults to 1.

Signature

Optional[int]

steps.row_slice.head (property)

Number of rows to read from the top.

Signature

Optional[int]

steps.row_slice.tail (property)

Number of rows to read from the bottom.

Signature

Optional[int]
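
The five slicing options map closely onto ordinary Python list slicing. The sketch below is an illustration under stated assumptions, not the library's code: `slice_rows` is a hypothetical helper, and giving `head` and `tail` priority over `start`, `stop`, and `step` is a choice made here for simplicity.

```python
from typing import Any, Dict, Iterable, List, Optional

Row = Dict[str, Any]

# Inline copy of the sample table used on this page (illustration only).
ROWS: List[Row] = [
    {"id": 1, "name": "germany", "population": 83},
    {"id": 2, "name": "france", "population": 66},
    {"id": 3, "name": "spain", "population": 47},
]


def slice_rows(
    rows: Iterable[Row],
    start: Optional[int] = None,
    stop: Optional[int] = None,
    step: Optional[int] = None,
    head: Optional[int] = None,
    tail: Optional[int] = None,
) -> List[Row]:
    """Hypothetical helper mimicking the slicing options with a list."""
    items = list(rows)
    if head is not None:
        return items[:head]  # first `head` rows
    if tail is not None:
        return items[max(len(items) - tail, 0):]  # last `tail` rows
    # None for start, stop, or step falls back to the usual slice defaults.
    return items[slice(start, stop, step)]


print([row["name"] for row in slice_rows(ROWS, head=2)])
print([row["name"] for row in slice_rows(ROWS, start=1)])
```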

Sort Rows

Example

from pprint import pprint
from frictionless import Package, Resource, transform, steps

source = Resource(path="transform.csv")
target = transform(
    source,
    steps=[
        steps.row_sort(field_names=["name"]),
    ]
)
print(target.schema)
print(target.to_view())
{'fields': [{'name': 'id', 'type': 'integer'},
            {'name': 'name', 'type': 'string'},
            {'name': 'population', 'type': 'integer'}]}
+----+-----------+------------+
| id | name      | population |
+====+===========+============+
|  2 | 'france'  |         66 |
+----+-----------+------------+
|  1 | 'germany' |         83 |
+----+-----------+------------+
|  3 | 'spain'   |         47 |
+----+-----------+------------+

Reference

steps.row_sort (class)

Sort rows. This step can be added using the `steps` parameter for the `transform` function.

Signature

(*, name: Optional[str] = None, title: Optional[str] = None, description: Optional[str] = None, field_names: List[str], reverse: bool = False) -> None

Parameters
  • name (Optional[str])
  • title (Optional[str])
  • description (Optional[str])
  • field_names (List[str])
  • reverse (bool)

steps.row_sort.field_names (property)

List of field names by which the rows are sorted. If more than one field is given, the sort is applied from left to right.

Signature

List[str]

steps.row_sort.reverse (property)

If True, the sort order is reversed.

Signature

bool
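
One way to picture the "left to right" rule is a tuple sort key, where the first field name is the primary key and later names only break ties. The sketch below is a plain-Python illustration; `sort_rows` is a hypothetical helper and not part of frictionless.

```python
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]

# Inline copy of the sample table used on this page (illustration only).
ROWS: List[Row] = [
    {"id": 1, "name": "germany", "population": 83},
    {"id": 2, "name": "france", "population": 66},
    {"id": 3, "name": "spain", "population": 47},
]


def sort_rows(rows: Iterable[Row], field_names: List[str], reverse: bool = False) -> List[Row]:
    """Hypothetical helper: sort by several fields, leftmost field first."""
    return sorted(
        rows,
        # Tuples compare element by element, so earlier fields take priority.
        key=lambda row: tuple(row[name] for name in field_names),
        reverse=reverse,
    )


print([row["name"] for row in sort_rows(ROWS, ["name"])])
print([row["name"] for row in sort_rows(ROWS, ["name"], reverse=True)])
```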

Split Rows

Example

from pprint import pprint
from frictionless import Package, Resource, transform, steps

source = Resource(path="transform.csv")
target = transform(
    source,
    steps=[
        steps.row_split(field_name="name", pattern="a"),
    ]
)
print(target.schema)
print(target.to_view())
{'fields': [{'name': 'id', 'type': 'integer'},
            {'name': 'name', 'type': 'string'},
            {'name': 'population', 'type': 'integer'}]}
+----+--------+------------+
| id | name   | population |
+====+========+============+
|  1 | 'germ' |         83 |
+----+--------+------------+
|  1 | 'ny'   |         83 |
+----+--------+------------+
|  2 | 'fr'   |         66 |
+----+--------+------------+
|  2 | 'nce'  |         66 |
+----+--------+------------+
|  3 | 'sp'   |         47 |
+----+--------+------------+
...

Reference

steps.row_split (class)

Split rows. This step can be added using the `steps` parameter for the `transform` function.

Signature

(*, name: Optional[str] = None, title: Optional[str] = None, description: Optional[str] = None, pattern: str, field_name: str) -> None

Parameters
  • name (Optional[str])
  • title (Optional[str])
  • description (Optional[str])
  • pattern (str)
  • field_name (str)

steps.row_split.pattern (property)

Pattern on which the cell value of the given field is split.

Signature

str

steps.row_split.field_name (property)

Field name whose cell value will be split.

Signature

str
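
The effect shown in the example, where each row is repeated once per piece of the split cell, can be sketched as follows. `split_rows` is a hypothetical helper, and treating `pattern` as a regular expression passed to `re.split` is an assumption of this sketch rather than something the reference above states.

```python
import re
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]

# Inline copy of the sample table used on this page (illustration only).
ROWS: List[Row] = [
    {"id": 1, "name": "germany", "population": 83},
    {"id": 2, "name": "france", "population": 66},
    {"id": 3, "name": "spain", "population": 47},
]


def split_rows(rows: Iterable[Row], pattern: str, field_name: str) -> List[Row]:
    """Hypothetical helper: emit one row per piece of the split cell."""
    result = []
    for row in rows:
        for part in re.split(pattern, row[field_name]):
            # Copy the other cells unchanged and replace only the split field.
            result.append({**row, field_name: part})
    return result


for row in split_rows(ROWS, pattern="a", field_name="name"):
    print(row["id"], row["name"])
```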

Subset Rows

Example

from pprint import pprint
from frictionless import Package, Resource, transform, steps

source = Resource(path="transform.csv")
target = transform(
    source,
    steps=[
        steps.field_update(name="id", value=1),
        steps.row_subset(subset="conflicts", field_name="id"),
    ]
)
print(target.schema)
print(target.to_view())
{'fields': [{'name': 'id', 'type': 'integer'},
            {'name': 'name', 'type': 'string'},
            {'name': 'population', 'type': 'integer'}]}
+----+-----------+------------+
| id | name      | population |
+====+===========+============+
|  1 | 'germany' |         83 |
+----+-----------+------------+
|  1 | 'france'  |         66 |
+----+-----------+------------+
|  1 | 'spain'   |         47 |
+----+-----------+------------+

Reference

steps.row_subset (class)

Subset rows. This step can be added using the `steps` parameter for the `transform` function.

Signature

(*, name: Optional[str] = None, title: Optional[str] = None, description: Optional[str] = None, subset: str, field_name: Optional[str] = None) -> None

Parameters
  • name (Optional[str])
  • title (Optional[str])
  • description (Optional[str])
  • subset (str)
  • field_name (Optional[str])

steps.row_subset.subset (property)

The kind of subset to select. Accepted values include "conflicts", "distinct", "duplicates", and "unique".

Signature

str

steps.row_subset.field_name (property)

Name of the field to which the subset function is applied.

Signature

Optional[str]
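
The reference above does not spell out what each subset mode selects, so the sketch below should be read as one plausible interpretation based on the mode names, not as the library's definition. `subset_rows` is a hypothetical helper, and the order of the returned rows may differ from what frictionless produces.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]

# The sample table after field_update(name="id", value=1): every id is 1.
ROWS: List[Row] = [
    {"id": 1, "name": "germany", "population": 83},
    {"id": 1, "name": "france", "population": 66},
    {"id": 1, "name": "spain", "population": 47},
]


def subset_rows(rows: Iterable[Row], subset: str, field_name: str) -> List[Row]:
    """Hypothetical helper with assumed meanings for the four modes."""
    items = list(rows)
    groups: Dict[Any, List[Row]] = defaultdict(list)
    for row in items:
        groups[row[field_name]].append(row)
    if subset == "distinct":  # assumed: first row seen for each key value
        return [group[0] for group in groups.values()]
    if subset == "unique":  # assumed: key value occurs exactly once
        return [row for row in items if len(groups[row[field_name]]) == 1]
    if subset == "duplicates":  # assumed: key value occurs more than once
        return [row for row in items if len(groups[row[field_name]]) > 1]
    if subset == "conflicts":  # assumed: same key value, but other cells differ
        return [
            row for row in items
            if any(other != row for other in groups[row[field_name]])
        ]
    raise ValueError(f"unknown subset: {subset!r}")


print([row["name"] for row in subset_rows(ROWS, "conflicts", "id")])
```

Under these assumptions all three rows conflict on `id`, which is consistent with the example output above.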

Ungroup Rows

Example

from pprint import pprint
from frictionless import Package, Resource, transform, steps

source = Resource(path="transform-groups.csv")
target = transform(
    source,
    steps=[
        steps.row_ungroup(group_name="name", selection="first"),
    ]
)
print(target.schema)
print(target.to_view())
{'fields': [{'name': 'id', 'type': 'integer'},
            {'name': 'name', 'type': 'string'},
            {'name': 'population', 'type': 'integer'},
            {'name': 'year', 'type': 'integer'}]}
+----+-----------+------------+------+
| id | name      | population | year |
+====+===========+============+======+
|  3 | 'france'  |         66 | 2020 |
+----+-----------+------------+------+
|  1 | 'germany' |         83 | 2020 |
+----+-----------+------------+------+
|  5 | 'spain'   |         47 | 2020 |
+----+-----------+------------+------+

Reference

steps.row_ungroup (class)

Ungroup rows. This step can be added using the `steps` parameter for the `transform` function.

Signature

(*, name: Optional[str] = None, title: Optional[str] = None, description: Optional[str] = None, selection: str, group_name: str, value_name: Optional[str] = None) -> None

Parameters
  • name (Optional[str])
  • title (Optional[str])
  • description (Optional[str])
  • selection (str)
  • group_name (str)
  • value_name (Optional[str])

steps.row_ungroup.selection (property)

Specifies which row to return from each group. The value can be "first", "last", "min", or "max".

Signature

str

steps.row_ungroup.group_name (property)

Name of the field used to group the rows. The first or last row of each group is returned, depending on 'selection'.

Signature

str

steps.row_ungroup.value_name (property)

If the selection is "min" or "max", the rows are grouped by the "group_name" field and the minimum or maximum is then selected from the "value_name" field within each group.

Signature

Optional[str]
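
The four selection modes can be pictured with a small grouping routine. Everything here is illustrative: `ungroup_rows` is a hypothetical helper, the input rows are made-up stand-ins (only the 2020 rows appear in the example output above; the 1920 rows are invented), and returning the groups in sorted key order and the whole row for "min" and "max" are assumptions.

```python
from typing import Any, Dict, Iterable, List, Optional

Row = Dict[str, Any]

# Hypothetical input: the 2020 rows match the example output, the rest are invented.
ROWS: List[Row] = [
    {"id": 1, "name": "germany", "population": 83, "year": 2020},
    {"id": 2, "name": "germany", "population": 77, "year": 1920},
    {"id": 3, "name": "france", "population": 66, "year": 2020},
    {"id": 4, "name": "france", "population": 54, "year": 1920},
    {"id": 5, "name": "spain", "population": 47, "year": 2020},
    {"id": 6, "name": "spain", "population": 33, "year": 1920},
]


def ungroup_rows(
    rows: Iterable[Row],
    selection: str,
    group_name: str,
    value_name: Optional[str] = None,
) -> List[Row]:
    """Hypothetical helper: pick one row per group."""
    groups: Dict[Any, List[Row]] = {}
    for row in rows:
        groups.setdefault(row[group_name], []).append(row)
    result = []
    for key in sorted(groups):  # assumed: groups come back in key order
        group = groups[key]
        if selection == "first":
            result.append(group[0])
        elif selection == "last":
            result.append(group[-1])
        elif selection in ("min", "max"):
            if value_name is None:
                raise ValueError("value_name is required for min and max")
            pick = min if selection == "min" else max
            result.append(pick(group, key=lambda r: r[value_name]))
        else:
            raise ValueError(f"unknown selection: {selection!r}")
    return result


print([row["id"] for row in ungroup_rows(ROWS, "first", "name")])
print([row["id"] for row in ungroup_rows(ROWS, "min", "name", value_name="population")])
```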