These steps operate on rows: filtering, searching, slicing, sorting, splitting, and more.
This step filters rows based on a provided formula or function.
from pprint import pprint
from frictionless import Package, Resource, transform, steps
source = Resource(path="transform.csv")
target = transform(
    source,
    steps=[
        steps.table_normalize(),
        steps.row_filter(formula="id > 1"),
    ],
)
print(target.schema)
print(target.to_view())
{'fields': [{'name': 'id', 'type': 'integer'},
            {'name': 'name', 'type': 'string'},
            {'name': 'population', 'type': 'integer'}]}
+----+----------+------------+
| id | name     | population |
+====+==========+============+
|  2 | 'france' |         66 |
+----+----------+------------+
|  3 | 'spain'  |         47 |
+----+----------+------------+
Filter rows. This step can be added using the `steps` parameter for the `transform` function.
(*, name: Optional[str] = None, title: Optional[str] = None, description: Optional[str] = None, formula: Optional[Any] = None, function: Optional[Any] = None) -> None
Expression used to filter the rows. Rows that match the formula are returned and all others are dropped. The expression is evaluated with the simpleeval library.
Optional[Any]
Python function to filter the row.
Optional[Any]
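To make the filtering rule concrete, here is a minimal stdlib-only sketch of what a row filter amounts to. It is an illustration, not the library's implementation. The rows mirror the sample data shown in the output above, and `keep_row` is a hypothetical name; a callable of this shape (a row goes in, a truthy or falsy value comes out) is assumed to be the kind of value the `function` parameter expects.

```python
# Rows as dictionaries, mirroring the sample data used on this page.
rows = [
    {"id": 1, "name": "germany", "population": 83},
    {"id": 2, "name": "france", "population": 66},
    {"id": 3, "name": "spain", "population": 47},
]


def keep_row(row):
    """Hypothetical predicate with the same meaning as formula="id > 1"."""
    return row["id"] > 1


# Keep the rows for which the predicate is true and drop the rest.
filtered = [row for row in rows if keep_row(row)]
print([row["name"] for row in filtered])
```

A formula is the more compact choice for simple comparisons, while a function is useful when the condition needs ordinary Python logic.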
from pprint import pprint
from frictionless import Package, Resource, transform, steps
source = Resource(path="transform.csv")
target = transform(
    source,
    steps=[
        steps.row_search(regex=r"^f.*"),
    ],
)
print(target.schema)
print(target.to_view())
{'fields': [{'name': 'id', 'type': 'integer'},
            {'name': 'name', 'type': 'string'},
            {'name': 'population', 'type': 'integer'}]}
+----+----------+------------+
| id | name     | population |
+====+==========+============+
|  2 | 'france' |         66 |
+----+----------+------------+
Search rows. This step can be added using the `steps` parameter for the `transform` function.
(*, name: Optional[str] = None, title: Optional[str] = None, description: Optional[str] = None, regex: str, field_name: Optional[str] = None, negate: bool = False) -> None
Regex pattern to search for in the rows. If `field_name` is set, the pattern is applied only to that field. For example, `regex=r"^e.*"`.
str
Name of the field to search in.
Optional[str]
Whether to invert the result. If True, all rows that do not match the pattern are returned.
bool
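The interplay of `regex`, `field_name`, and `negate` can be sketched with the standard `re` module. This is a simplified illustration under the assumption that a row matches when the pattern is found in at least one of the inspected cells; `search_rows` is a hypothetical helper, not part of the library.

```python
import re

rows = [
    {"id": 1, "name": "germany", "population": 83},
    {"id": 2, "name": "france", "population": 66},
    {"id": 3, "name": "spain", "population": 47},
]


def search_rows(rows, regex, field_name=None, negate=False):
    """Hypothetical helper: keep rows in which the pattern matches a cell."""
    pattern = re.compile(regex)
    result = []
    for row in rows:
        # Inspect one field if field_name is given, otherwise every field.
        cells = [row[field_name]] if field_name else list(row.values())
        matched = any(pattern.search(str(cell)) for cell in cells)
        # negate flips the decision, so non-matching rows are kept instead.
        if matched != negate:
            result.append(row)
    return result


print([row["name"] for row in search_rows(rows, r"^f.*")])
print([row["name"] for row in search_rows(rows, r"^f.*", negate=True)])
```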
from pprint import pprint
from frictionless import Package, Resource, transform, steps
source = Resource(path="transform.csv")
target = transform(
    source,
    steps=[
        steps.row_slice(head=2),
    ],
)
print(target.schema)
print(target.to_view())
{'fields': [{'name': 'id', 'type': 'integer'},
            {'name': 'name', 'type': 'string'},
            {'name': 'population', 'type': 'integer'}]}
+----+-----------+------------+
| id | name      | population |
+====+===========+============+
|  1 | 'germany' |         83 |
+----+-----------+------------+
|  2 | 'france'  |         66 |
+----+-----------+------------+
Slice rows. This step can be added using the `steps` parameter for the `transform` function.
(*, name: Optional[str] = None, title: Optional[str] = None, description: Optional[str] = None, start: Optional[int] = None, stop: Optional[int] = None, step: Optional[int] = None, head: Optional[int] = None, tail: Optional[int] = None) -> None
Starting point from which to read the rows. If None, reading starts at the beginning.
Optional[int]
Stopping point for reading rows. If None, reading continues to the end.
Optional[int]
Step size between the rows that are read. If None, it defaults to 1.
Optional[int]
Number of rows to read from the top.
Optional[int]
Number of rows to read from the bottom.
Optional[int]
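These parameters read much like ordinary Python slicing. The sketch below uses plain list slices on the sample names to show the assumed correspondence; it illustrates the idea only and does not call the library.

```python
# Row names in file order, taken from the sample data on this page.
names = ["germany", "france", "spain"]

# Assumed analogue of head=2: the first two rows.
head_two = names[:2]

# Assumed analogue of tail=1: the last row.
tail_one = names[-1:]

# Assumed analogue of start=0, stop=3, step=2: every second row.
every_other = names[0:3:2]

print(head_two)
print(tail_one)
print(every_other)
```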
from pprint import pprint
from frictionless import Package, Resource, transform, steps
source = Resource(path="transform.csv")
target = transform(
    source,
    steps=[
        steps.row_sort(field_names=["name"]),
    ],
)
print(target.schema)
print(target.to_view())
{'fields': [{'name': 'id', 'type': 'integer'},
            {'name': 'name', 'type': 'string'},
            {'name': 'population', 'type': 'integer'}]}
+----+-----------+------------+
| id | name      | population |
+====+===========+============+
|  2 | 'france'  |         66 |
+----+-----------+------------+
|  1 | 'germany' |         83 |
+----+-----------+------------+
|  3 | 'spain'   |         47 |
+----+-----------+------------+
Sort rows. This step can be added using the `steps` parameter for the `transform` function.
(*, name: Optional[str] = None, title: Optional[str] = None, description: Optional[str] = None, field_names: List[str], reverse: bool = False) -> None
List of field names by which the rows are sorted. If more than one field is given, the sort is applied from left to right.
List[str]
If True, the sort order is reversed.
bool
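Left-to-right sorting over several fields corresponds to sorting on a tuple key. The following sketch shows this with the built-in `sorted`; `sort_rows` is a hypothetical helper, and the fourth row is invented here purely to create a tie on `name`.

```python
rows = [
    {"id": 1, "name": "germany", "population": 83},
    {"id": 2, "name": "france", "population": 66},
    {"id": 3, "name": "spain", "population": 47},
    # Hypothetical extra row, added only to show tie-breaking.
    {"id": 4, "name": "france", "population": 60},
]


def sort_rows(rows, field_names, reverse=False):
    """Hypothetical helper: sort by the listed fields, left to right."""
    # Tuples compare element by element, so earlier fields take priority.
    return sorted(
        rows,
        key=lambda row: tuple(row[name] for name in field_names),
        reverse=reverse,
    )


print([row["id"] for row in sort_rows(rows, ["name", "population"])])
print([row["id"] for row in sort_rows(rows, ["name"], reverse=True)])
```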
from pprint import pprint
from frictionless import Package, Resource, transform, steps
source = Resource(path="transform.csv")
target = transform(
    source,
    steps=[
        steps.row_split(field_name="name", pattern="a"),
    ],
)
print(target.schema)
print(target.to_view())
{'fields': [{'name': 'id', 'type': 'integer'},
            {'name': 'name', 'type': 'string'},
            {'name': 'population', 'type': 'integer'}]}
+----+--------+------------+
| id | name   | population |
+====+========+============+
|  1 | 'germ' |         83 |
+----+--------+------------+
|  1 | 'ny'   |         83 |
+----+--------+------------+
|  2 | 'fr'   |         66 |
+----+--------+------------+
|  2 | 'nce'  |         66 |
+----+--------+------------+
|  3 | 'sp'   |         47 |
+----+--------+------------+
...
Split rows. This step can be added using the `steps` parameter for the `transform` function.
(*, name: Optional[str] = None, title: Optional[str] = None, description: Optional[str] = None, pattern: str, field_name: str) -> None
Pattern on which the cell value is split.
str
Field name whose cell value will be split.
str
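Splitting turns one row into several: the cell is cut at every match of the pattern, and each piece gets its own copy of the row. A minimal sketch of that idea with `re.split` follows; `split_rows` is a hypothetical helper, and treating the pattern as a regular expression is an assumption.

```python
import re

rows = [
    {"id": 1, "name": "germany", "population": 83},
    {"id": 2, "name": "france", "population": 66},
    {"id": 3, "name": "spain", "population": 47},
]


def split_rows(rows, field_name, pattern):
    """Hypothetical helper: emit one row per piece of the split cell."""
    result = []
    for row in rows:
        for piece in re.split(pattern, row[field_name]):
            # Copy the row and replace only the split field.
            result.append({**row, field_name: piece})
    return result


for row in split_rows(rows, field_name="name", pattern="a"):
    print(row["id"], row["name"])
```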
from pprint import pprint
from frictionless import Package, Resource, transform, steps
source = Resource(path="transform.csv")
target = transform(
    source,
    steps=[
        steps.field_update(name="id", value=1),
        steps.row_subset(subset="conflicts", field_name="id"),
    ],
)
print(target.schema)
print(target.to_view())
{'fields': [{'name': 'id', 'type': 'integer'},
            {'name': 'name', 'type': 'string'},
            {'name': 'population', 'type': 'integer'}]}
+----+-----------+------------+
| id | name      | population |
+====+===========+============+
|  1 | 'germany' |         83 |
+----+-----------+------------+
|  1 | 'france'  |         66 |
+----+-----------+------------+
|  1 | 'spain'   |         47 |
+----+-----------+------------+
Subset rows. This step can be added using the `steps` parameter for the `transform` function.
(*, name: Optional[str] = None, title: Optional[str] = None, description: Optional[str] = None, subset: str, field_name: Optional[str] = None) -> None
The kind of subset to select. It can take values such as "conflicts", "distinct", "duplicates", and "unique".
str
Name of the field to which the subset operation is applied.
Optional[str]
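The four subset values can be read as different questions about how often a key value occurs. The sketch below spells out one plausible interpretation in plain Python. `subset_rows` is a hypothetical helper, the exact semantics and row ordering of the real step are assumptions, and the input mirrors the example above after every `id` has been set to 1.

```python
from collections import Counter

# Sample rows after every id has been overwritten with 1.
rows = [
    {"id": 1, "name": "germany", "population": 83},
    {"id": 1, "name": "france", "population": 66},
    {"id": 1, "name": "spain", "population": 47},
]


def subset_rows(rows, subset, field_name):
    """Hypothetical helper illustrating the assumed meaning of each subset."""
    counts = Counter(row[field_name] for row in rows)
    if subset == "duplicates":
        # Rows whose key value occurs more than once.
        return [row for row in rows if counts[row[field_name]] > 1]
    if subset == "unique":
        # Rows whose key value occurs exactly once.
        return [row for row in rows if counts[row[field_name]] == 1]
    if subset == "distinct":
        # The first row seen for each key value.
        seen, result = set(), []
        for row in rows:
            if row[field_name] not in seen:
                seen.add(row[field_name])
                result.append(row)
        return result
    if subset == "conflicts":
        # Rows that share a key with a row differing in some other field.
        groups = {}
        for row in rows:
            groups.setdefault(row[field_name], []).append(row)
        return [
            row for row in rows
            if any(other != row for other in groups[row[field_name]])
        ]
    raise ValueError(f"unknown subset: {subset}")


for kind in ("conflicts", "distinct", "duplicates", "unique"):
    print(kind, [row["name"] for row in subset_rows(rows, kind, "id")])
```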
from pprint import pprint
from frictionless import Package, Resource, transform, steps
source = Resource(path="transform-groups.csv")
target = transform(
    source,
    steps=[
        steps.row_ungroup(group_name="name", selection="first"),
    ],
)
print(target.schema)
print(target.to_view())
{'fields': [{'name': 'id', 'type': 'integer'},
            {'name': 'name', 'type': 'string'},
            {'name': 'population', 'type': 'integer'},
            {'name': 'year', 'type': 'integer'}]}
+----+-----------+------------+------+
| id | name      | population | year |
+====+===========+============+======+
|  3 | 'france'  |         66 | 2020 |
+----+-----------+------------+------+
|  1 | 'germany' |         83 | 2020 |
+----+-----------+------------+------+
|  5 | 'spain'   |         47 | 2020 |
+----+-----------+------------+------+
Ungroup rows. This step can be added using the `steps` parameter for the `transform` function.
(*, name: Optional[str] = None, title: Optional[str] = None, description: Optional[str] = None, selection: str, group_name: str, value_name: Optional[str] = None) -> None
Specifies which row to return from each group. The value can be "first", "last", "min", or "max".
str
Name of the field used to group the rows. One row from each group is returned, chosen according to `selection`.
str
If `selection` is "min" or "max", the rows are grouped by the `group_name` field and the row with the minimum or maximum value in the `value_name` field is selected from each group.
Optional[str]
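The relationship between `selection`, `group_name`, and `value_name` can be sketched as a group-then-pick operation. This is an illustration in plain Python, not the library's code. `ungroup_rows` is a hypothetical helper, returning groups in sorted key order is an assumption based on the output above, and the rows with ids 2, 4, and 6 are invented here so that each group has more than one member.

```python
rows = [
    {"id": 1, "name": "germany", "population": 83, "year": 2020},
    {"id": 2, "name": "germany", "population": 77, "year": 1920},  # hypothetical
    {"id": 3, "name": "france", "population": 66, "year": 2020},
    {"id": 4, "name": "france", "population": 54, "year": 1920},  # hypothetical
    {"id": 5, "name": "spain", "population": 47, "year": 2020},
    {"id": 6, "name": "spain", "population": 33, "year": 1920},  # hypothetical
]


def ungroup_rows(rows, group_name, selection, value_name=None):
    """Hypothetical helper: pick one row per group."""
    groups = {}
    for row in rows:
        groups.setdefault(row[group_name], []).append(row)

    result = []
    for key in sorted(groups):  # assumed: groups come out in key order
        group = groups[key]
        if selection == "first":
            chosen = group[0]
        elif selection == "last":
            chosen = group[-1]
        elif selection == "min":
            chosen = min(group, key=lambda row: row[value_name])
        elif selection == "max":
            chosen = max(group, key=lambda row: row[value_name])
        else:
            raise ValueError(f"unknown selection: {selection}")
        result.append(chosen)
    return result


print([row["id"] for row in ungroup_rows(rows, "name", "first")])
print([row["id"] for row in ungroup_rows(rows, "name", "min", "population")])
```

Note that `value_name` only matters for "min" and "max"; "first" and "last" depend solely on the order of rows within each group.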