(2025-01-27 10:49)

Row Steps

These steps operate on rows and include filtering, searching, slicing, sorting, splitting, subsetting, and ungrouping.

Filter Rows

This step filters rows based on a provided formula or function.

Example

Python

from frictionless import Resource, transform, steps

source = Resource(path="transform.csv")
target = transform(
    source,
    steps=[
        steps.table_normalize(),
        steps.row_filter(formula="id > 1"),
    ]
)
print(target.schema)
print(target.to_view())

{'fields': [{'name': 'id', 'type': 'integer'},
            {'name': 'name', 'type': 'string'},
            {'name': 'population', 'type': 'integer'}]}
+----+----------+------------+
| id | name     | population |
+====+==========+============+
|  2 | 'france' |         66 |
+----+----------+------------+
|  3 | 'spain'  |         47 |
+----+----------+------------+

Reference

steps.row_filter (class)

Filter rows. This step can be added using the `steps` parameter for the `transform` function.

Signature

(*, name: Optional[str] = None, title: Optional[str] = None, description: Optional[str] = None, formula: Optional[Any] = None, function: Optional[Any] = None) -> None

Parameters

name (Optional[str])
title (Optional[str])
description (Optional[str])
formula (Optional[Any])
function (Optional[Any])

steps.row_filter.formula (property)

Expression evaluated against each row to filter the rows. Rows that match the formula are returned and the others are ignored. The expression is processed using the simpleeval library.

Signature

Optional[Any]

steps.row_filter.function (property)

Python function used to filter the rows.

Signature

Optional[Any]
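
As an illustration of the `function` option, here is a minimal plain-Python sketch of predicate-based filtering. It is not the library's implementation: `filter_rows` and `ROWS` are hypothetical names, and rows are modelled as dicts holding the data from the example above.

```python
# Rows modelled as plain dicts; the values mirror the example output above.
ROWS = [
    {"id": 1, "name": "germany", "population": 83},
    {"id": 2, "name": "france", "population": 66},
    {"id": 3, "name": "spain", "population": 47},
]


def filter_rows(rows, function):
    """Keep only the rows for which `function(row)` is truthy (hypothetical helper)."""
    return [row for row in rows if function(row)]


# Same condition as the formula "id > 1", expressed as a Python function.
kept = filter_rows(ROWS, lambda row: row["id"] > 1)
print([row["name"] for row in kept])
```

With the library itself, the corresponding step would presumably be written as steps.row_filter(function=lambda row: row["id"] > 1).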

Search Rows

Example

Python

from frictionless import Resource, transform, steps

source = Resource(path="transform.csv")
target = transform(
    source,
    steps=[
        steps.row_search(regex=r"^f.*"),
    ]
)
print(target.schema)
print(target.to_view())

{'fields': [{'name': 'id', 'type': 'integer'},
            {'name': 'name', 'type': 'string'},
            {'name': 'population', 'type': 'integer'}]}
+----+----------+------------+
| id | name     | population |
+====+==========+============+
|  2 | 'france' |         66 |
+----+----------+------------+

Reference

steps.row_search (class)

Search rows. This step can be added using the `steps` parameter for the `transform` function.

Signature

(*, name: Optional[str] = None, title: Optional[str] = None, description: Optional[str] = None, regex: str, field_name: Optional[str] = None, negate: bool = False) -> None

Parameters

name (Optional[str])
title (Optional[str])
description (Optional[str])
regex (str)
field_name (Optional[str])
negate (bool)

steps.row_search.regex (property)

Regex pattern to search for in the rows. If field_name is set, the pattern is applied only to the specified field. For example, regex=r"^e.*".

Signature

str
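
To make the interaction between `regex` and `field_name` concrete, the following plain-Python sketch searches either every cell of a row or a single field. It is an approximation under stated assumptions, not the library's code: `search_rows` and `ROWS` are hypothetical names, and converting cell values to text before matching is an assumption.

```python
import re

# Rows modelled as plain dicts; the values mirror the example output above.
ROWS = [
    {"id": 1, "name": "germany", "population": 83},
    {"id": 2, "name": "france", "population": 66},
    {"id": 3, "name": "spain", "population": 47},
]


def search_rows(rows, regex, field_name=None):
    """Keep rows where `regex` matches the given field, or any field if none is given."""
    pattern = re.compile(regex)
    kept = []
    for row in rows:
        # Assumption: cells are converted to text so numeric cells can be searched too.
        values = [row[field_name]] if field_name else list(row.values())
        if any(pattern.search(str(value)) for value in values):
            kept.append(row)
    return kept


any_field = search_rows(ROWS, r"^f.*")                 # every cell is tested
id_only = search_rows(ROWS, r"^f.*", field_name="id")  # only the "id" cell is tested
print([row["name"] for row in any_field], id_only)
```

Restricting the search to "id" returns nothing here, because no id starts with "f".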

steps.row_search.field_name (property)

Name of the field to search in.

Signature

Optional[str]

steps.row_search.negate (property)

Whether to invert the result. If True, all rows that do not match the pattern are returned.

Signature

bool
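
The effect of `negate` can be sketched in plain Python as flipping the match test. This is only an illustration, not the library's implementation: `search_rows` and `ROWS` are hypothetical names, and matching against the text form of every cell is an assumption.

```python
import re

# Rows modelled as plain dicts; the values mirror the example output above.
ROWS = [
    {"id": 1, "name": "germany", "population": 83},
    {"id": 2, "name": "france", "population": 66},
    {"id": 3, "name": "spain", "population": 47},
]


def search_rows(rows, regex, negate=False):
    """Return matching rows, or the non-matching ones when `negate` is True."""
    pattern = re.compile(regex)
    result = []
    for row in rows:
        matched = any(pattern.search(str(value)) for value in row.values())
        # With negate=True the test is flipped, so matching rows are dropped.
        if matched != negate:
            result.append(row)
    return result


matching = search_rows(ROWS, r"^f.*")
complement = search_rows(ROWS, r"^f.*", negate=True)
print([row["name"] for row in matching], [row["name"] for row in complement])
```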

Slice Rows

Example

Python

from frictionless import Resource, transform, steps

source = Resource(path="transform.csv")
target = transform(
    source,
    steps=[
        steps.row_slice(head=2),
    ]
)
print(target.schema)
print(target.to_view())

{'fields': [{'name': 'id', 'type': 'integer'},
            {'name': 'name', 'type': 'string'},
            {'name': 'population', 'type': 'integer'}]}
+----+-----------+------------+
| id | name      | population |
+====+===========+============+
|  1 | 'germany' |         83 |
+----+-----------+------------+
|  2 | 'france'  |         66 |
+----+-----------+------------+

Reference

steps.row_slice (class)

Slice rows. This step can be added using the `steps` parameter for the `transform` function.

Signature

(*, name: Optional[str] = None, title: Optional[str] = None, description: Optional[str] = None, start: Optional[int] = None, stop: Optional[int] = None, step: Optional[int] = None, head: Optional[int] = None, tail: Optional[int] = None) -> None

Parameters

name (Optional[str])
title (Optional[str])
description (Optional[str])
start (Optional[int])
stop (Optional[int])
step (Optional[int])
head (Optional[int])
tail (Optional[int])

steps.row_slice.start (property)

Starting point from which to read the rows. If None, defaults to the beginning.

Signature

Optional[int]

steps.row_slice.stop (property)

Stopping point for reading rows. If None, defaults to the end.

Signature

Optional[int]

steps.row_slice.step (property)

Step size between the rows that are read. If None, defaults to 1.

Signature

Optional[int]
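
The `start`, `stop`, and `step` options read like the arguments of a Python slice. The sketch below shows that reading with `itertools.islice`. It assumes zero-based row positions (header excluded) and an exclusive `stop`, which the text above does not state explicitly; `slice_rows` and `ROW_IDS` are hypothetical names.

```python
from itertools import islice

# Hypothetical table reduced to its row ids, to keep the output easy to read.
ROW_IDS = [1, 2, 3, 4, 5, 6]


def slice_rows(rows, start=None, stop=None, step=None):
    """Select rows by position; None falls back to beginning, end, and a step of 1."""
    return list(islice(rows, start, stop, step))


every_other = slice_rows(ROW_IDS, start=1, stop=5, step=2)  # positions 1 and 3
first_two = slice_rows(ROW_IDS, stop=2)                     # positions 0 and 1
from_fifth = slice_rows(ROW_IDS, start=4)                   # position 4 to the end
print(every_other, first_two, from_fifth)
```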

steps.row_slice.head (property)

Number of rows to read from the top.

Signature

Optional[int]

steps.row_slice.tail (property)

Number of rows to read from the bottom.

Signature

Optional[int]
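
As a rough picture of `head` and `tail`, the following plain-Python sketch takes rows from the top or the bottom of a sequence. The helper names `head_rows` and `tail_rows` are hypothetical, and the code is not taken from the library.

```python
from collections import deque
from itertools import islice

# Row names from the example table above.
NAMES = ["germany", "france", "spain"]


def head_rows(rows, count):
    """Return the first `count` rows."""
    return list(islice(rows, count))


def tail_rows(rows, count):
    """Return the last `count` rows; the bounded deque keeps only the newest items."""
    return list(deque(rows, maxlen=count))


top = head_rows(NAMES, 2)
bottom = tail_rows(NAMES, 2)
print(top, bottom)
```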

Sort Rows

Example

Python

from frictionless import Resource, transform, steps

source = Resource(path="transform.csv")
target = transform(
    source,
    steps=[
        steps.row_sort(field_names=["name"]),
    ]
)
print(target.schema)
print(target.to_view())

{'fields': [{'name': 'id', 'type': 'integer'},
            {'name': 'name', 'type': 'string'},
            {'name': 'population', 'type': 'integer'}]}
+----+-----------+------------+
| id | name      | population |
+====+===========+============+
|  2 | 'france'  |         66 |
+----+-----------+------------+
|  1 | 'germany' |         83 |
+----+-----------+------------+
|  3 | 'spain'   |         47 |
+----+-----------+------------+

Reference

steps.row_sort (class)

Sort rows. This step can be added using the `steps` parameter for the `transform` function.

Signature

(*, name: Optional[str] = None, title: Optional[str] = None, description: Optional[str] = None, field_names: List[str], reverse: bool = False) -> None

Parameters

name (Optional[str])
title (Optional[str])
description (Optional[str])
field_names (List[str])
reverse (bool)

steps.row_sort.field_names (property)

List of field names by which the rows will be sorted. If more than one field is given, the sort applies from left to right.

Signature

List[str]

steps.row_sort.reverse (property)

If set to True, the sort order is reversed.

Signature

bool
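
The left-to-right rule for several field names, and the effect of `reverse`, can be sketched with Python's built-in `sorted`. This is an illustration only: `sort_rows` is a hypothetical helper, and the fourth row is invented so that the second sort field has something to decide.

```python
ROWS = [
    {"id": 1, "name": "germany", "population": 83},
    {"id": 2, "name": "france", "population": 66},
    {"id": 3, "name": "spain", "population": 47},
    {"id": 4, "name": "france", "population": 12},  # hypothetical extra row
]


def sort_rows(rows, field_names, reverse=False):
    """Sort by the first field, breaking ties with the following fields in order."""
    def sort_key(row):
        return tuple(row[name] for name in field_names)

    return sorted(rows, key=sort_key, reverse=reverse)


ascending = sort_rows(ROWS, ["name", "population"])
descending = sort_rows(ROWS, ["name", "population"], reverse=True)
print([row["id"] for row in ascending], [row["id"] for row in descending])
```

The two "france" rows are ordered by population because "name" alone cannot separate them.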

Split Rows

Example

Python

from frictionless import Resource, transform, steps

source = Resource(path="transform.csv")
target = transform(
    source,
    steps=[
        steps.row_split(field_name="name", pattern="a"),
    ]
)
print(target.schema)
print(target.to_view())

{'fields': [{'name': 'id', 'type': 'integer'},
            {'name': 'name', 'type': 'string'},
            {'name': 'population', 'type': 'integer'}]}
+----+--------+------------+
| id | name   | population |
+====+========+============+
|  1 | 'germ' |         83 |
+----+--------+------------+
|  1 | 'ny'   |         83 |
+----+--------+------------+
|  2 | 'fr'   |         66 |
+----+--------+------------+
|  2 | 'nce'  |         66 |
+----+--------+------------+
|  3 | 'sp'   |         47 |
+----+--------+------------+
...

Reference

steps.row_split (class)

Split rows. This step can be added using the `steps` parameter for the `transform` function.

Signature

(*, name: Optional[str] = None, title: Optional[str] = None, description: Optional[str] = None, pattern: str, field_name: str) -> None

Parameters

name (Optional[str])
title (Optional[str])
description (Optional[str])
pattern (str)
field_name (str)

steps.row_split.pattern (property)

Pattern on which the cell value of the given field is split.

Signature

str

steps.row_split.field_name (property)

Field name whose cell value will be split.

Signature

str
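
The example output above is consistent with splitting each cell on the pattern and emitting one row per piece. The plain-Python sketch below reproduces that behaviour with `re.split`. Treating the pattern as a regular expression is an assumption, and `split_rows` is a hypothetical helper rather than the library's code.

```python
import re

# Rows modelled as plain dicts; the values mirror the example table above.
ROWS = [
    {"id": 1, "name": "germany", "population": 83},
    {"id": 2, "name": "france", "population": 66},
    {"id": 3, "name": "spain", "population": 47},
]


def split_rows(rows, field_name, pattern):
    """Emit one copy of the row for every piece of the split cell value."""
    result = []
    for row in rows:
        for part in re.split(pattern, row[field_name]):
            # All other cells are repeated unchanged in each new row.
            result.append({**row, field_name: part})
    return result


pieces = split_rows(ROWS, field_name="name", pattern="a")
print([(row["id"], row["name"]) for row in pieces])
```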

Subset Rows

Example

Python

from frictionless import Resource, transform, steps

source = Resource(path="transform.csv")
target = transform(
    source,
    steps=[
        steps.field_update(name="id", value=1),
        steps.row_subset(subset="conflicts", field_name="id"),
    ]
)
print(target.schema)
print(target.to_view())

{'fields': [{'name': 'id', 'type': 'integer'},
            {'name': 'name', 'type': 'string'},
            {'name': 'population', 'type': 'integer'}]}
+----+-----------+------------+
| id | name      | population |
+====+===========+============+
|  1 | 'germany' |         83 |
+----+-----------+------------+
|  1 | 'france'  |         66 |
+----+-----------+------------+
|  1 | 'spain'   |         47 |
+----+-----------+------------+

Reference

steps.row_subset (class)

Subset rows. This step can be added using the `steps` parameter for the `transform` function.

Signature

(*, name: Optional[str] = None, title: Optional[str] = None, description: Optional[str] = None, subset: str, field_name: Optional[str] = None) -> None

Parameters

name (Optional[str])
title (Optional[str])
description (Optional[str])
subset (str)
field_name (Optional[str])

steps.row_subset.subset (property)

It can take one of the following values: "conflicts", "distinct", "duplicates", or "unique".

Signature

str

steps.row_subset.field_name (property)

Name of the field to which the subset function will be applied.

Signature

Optional[str]
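
The four `subset` values are not defined above, so the following plain-Python sketch spells out one plausible reading of them. These definitions are assumptions, not taken from the library: "duplicates" keeps rows whose key occurs more than once, "unique" keeps rows whose key occurs once, "distinct" keeps the first row per key, and "conflicts" keeps rows that share a key but differ in another field. The data and `subset_rows` are hypothetical.

```python
from collections import defaultdict

# Hypothetical data chosen so that each subset gives a different result.
ROWS = [
    {"id": 1, "name": "germany"},
    {"id": 1, "name": "germany"},  # exact duplicate of the previous row
    {"id": 2, "name": "france"},
    {"id": 2, "name": "spain"},    # same id, different name
    {"id": 3, "name": "italy"},
]


def subset_rows(rows, subset, field_name):
    """Select rows by how their key value repeats (assumed semantics, see lead-in)."""
    if subset not in {"conflicts", "distinct", "duplicates", "unique"}:
        raise ValueError(f"unknown subset: {subset}")
    groups = defaultdict(list)
    for row in rows:
        groups[row[field_name]].append(row)
    result = []
    for group in groups.values():
        if subset == "duplicates" and len(group) > 1:
            result.extend(group)
        elif subset == "unique" and len(group) == 1:
            result.extend(group)
        elif subset == "distinct":
            result.append(group[0])
        elif subset == "conflicts" and any(row != group[0] for row in group):
            result.extend(group)
    return result


for mode in ("conflicts", "distinct", "duplicates", "unique"):
    print(mode, [(row["id"], row["name"]) for row in subset_rows(ROWS, mode, "id")])
```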

Ungroup Rows

Example

Python

from frictionless import Resource, transform, steps

source = Resource(path="transform-groups.csv")
target = transform(
    source,
    steps=[
        steps.row_ungroup(group_name="name", selection="first"),
    ]
)
print(target.schema)
print(target.to_view())

{'fields': [{'name': 'id', 'type': 'integer'},
            {'name': 'name', 'type': 'string'},
            {'name': 'population', 'type': 'integer'},
            {'name': 'year', 'type': 'integer'}]}
+----+-----------+------------+------+
| id | name      | population | year |
+====+===========+============+======+
|  3 | 'france'  |         66 | 2020 |
+----+-----------+------------+------+
|  1 | 'germany' |         83 | 2020 |
+----+-----------+------------+------+
|  5 | 'spain'   |         47 | 2020 |
+----+-----------+------------+------+

Reference

steps.row_ungroup (class)

Ungroup rows. This step can be added using the `steps` parameter for the `transform` function.

Signature

(*, name: Optional[str] = None, title: Optional[str] = None, description: Optional[str] = None, selection: str, group_name: str, value_name: Optional[str] = None) -> None

Parameters

name (Optional[str])
title (Optional[str])
description (Optional[str])
selection (str)
group_name (str)
value_name (Optional[str])

steps.row_ungroup.selection (property)

Specifies which row to return from each group. The value can be "first", "last", "min", or "max".

Signature

str

steps.row_ungroup.group_name (property)

Name of the field used to group the rows. One row is returned from each group, as determined by `selection`.

Signature

str

steps.row_ungroup.value_name (property)

If the selection is set to "min" or "max", the rows are grouped by the "group_name" field and the row with the minimum or maximum value in the "value_name" field is selected.

Signature

Optional[str]
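
How `selection`, `group_name`, and `value_name` work together can be sketched in plain Python as "group, then pick one row per group". The code below is an illustration under assumptions, not the library's implementation: `ungroup_rows` is a hypothetical helper, the 1920 rows are invented, and returning groups in sorted key order is inferred from the example output above.

```python
# The 2020 rows mirror the example output above; the 1920 rows are hypothetical.
ROWS = [
    {"id": 1, "name": "germany", "population": 83, "year": 2020},
    {"id": 2, "name": "germany", "population": 77, "year": 1920},
    {"id": 3, "name": "france", "population": 66, "year": 2020},
    {"id": 4, "name": "france", "population": 54, "year": 1920},
    {"id": 5, "name": "spain", "population": 47, "year": 2020},
    {"id": 6, "name": "spain", "population": 33, "year": 1920},
]


def ungroup_rows(rows, group_name, selection, value_name=None):
    """Group rows by `group_name` and keep a single row from each group."""
    groups = {}
    for row in rows:
        groups.setdefault(row[group_name], []).append(row)
    result = []
    for key in sorted(groups):  # assumption: groups come back in sorted key order
        group = groups[key]
        if selection == "first":
            result.append(group[0])
        elif selection == "last":
            result.append(group[-1])
        elif selection == "min":
            result.append(min(group, key=lambda row: row[value_name]))
        elif selection == "max":
            result.append(max(group, key=lambda row: row[value_name]))
        else:
            raise ValueError(f"unknown selection: {selection}")
    return result


first = ungroup_rows(ROWS, group_name="name", selection="first")
smallest = ungroup_rows(ROWS, group_name="name", selection="min", value_name="population")
print([row["id"] for row in first], [row["population"] for row in smallest])
```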
