Edit page in Livemark
(2024-11-07 15:17)

Table Steps

These steps are meant to be used on a table level of a resource. This includes various different operations from simple validation or writing to the disc to complex re-shaping like pivoting or melting.

Aggregate Table

Group rows under the given group_name then apply aggregation functions provided as aggregation dictionary (see example)

Example

from pprint import pprint
from frictionless import Package, Resource, transform, steps

source = Resource(path="transform-groups.csv")
target = transform(
    source,
    steps=[
        steps.table_normalize(),
        steps.table_aggregate(
            group_name="name", aggregation={"sum": ("population", sum)}
        ),
    ],
)
print(target.schema)
print(target.to_view())
{'fields': [{'name': 'name', 'type': 'string'}, {'name': 'sum', 'type': 'any'}]}
+-----------+-----+
| name      | sum |
+===========+=====+
| 'france'  | 120 |
+-----------+-----+
| 'germany' | 160 |
+-----------+-----+
| 'spain'   |  80 |
+-----------+-----+

Reference

steps.table_aggregate (class)

steps.table_aggregate (class)

Aggregate table. This step can be added using the `steps` parameter for the `transform` function.

Signature

(*, name: Optional[str] = None, title: Optional[str] = None, description: Optional[str] = None, aggregation: Dict[str, Any], group_name: str) -> None

Parameters
  • name (Optional[str])
  • title (Optional[str])
  • description (Optional[str])
  • aggregation (Dict[str, Any])
  • group_name (str)

steps.table_aggregate.aggregation (property)

A dictionary with aggregation function. The values could be max, min, len and sum.

Signature

Dict[str, Any]

steps.table_aggregate.group_name (property)

Field by which the rows will be grouped.

Signature

str

Attach Table

Example

from pprint import pprint
from frictionless import Package, Resource, transform, steps

source = Resource(path="transform.csv")
target = transform(
    source,
    steps=[
      steps.table_attach(resource=Resource(data=[["note"], ["large"], ["mid"]])),
    ],
)
print(target.schema)
print(target.to_view())
{'fields': [{'name': 'id', 'type': 'integer'},
            {'name': 'name', 'type': 'string'},
            {'name': 'population', 'type': 'integer'},
            {'name': 'note', 'type': 'string'}]}
+----+-----------+------------+---------+
| id | name      | population | note    |
+====+===========+============+=========+
|  1 | 'germany' |         83 | 'large' |
+----+-----------+------------+---------+
|  2 | 'france'  |         66 | 'mid'   |
+----+-----------+------------+---------+
|  3 | 'spain'   |         47 | None    |
+----+-----------+------------+---------+

Reference

steps.table_attach (class)

steps.table_attach (class)

Attach table. This step can be added using the `steps` parameter for the `transform` function.

Signature

(*, name: Optional[str] = None, title: Optional[str] = None, description: Optional[str] = None, resource: Union[Resource, str]) -> None

Parameters
  • name (Optional[str])
  • title (Optional[str])
  • description (Optional[str])
  • resource (Union[Resource, str])

steps.table_attach.resource (property)

Data Resource to attach to the existing table.

Signature

Union[Resource, str]

Debug Table

Example

from pprint import pprint
from frictionless import Package, Resource, transform, steps

source = Resource(path="transform.csv")
target = transform(
    source,
    steps=[
      steps.table_debug(function=print),
    ],
)
print(target.to_view())
{'id': 1, 'name': 'germany', 'population': 83}
{'id': 2, 'name': 'france', 'population': 66}
{'id': 3, 'name': 'spain', 'population': 47}
+----+-----------+------------+
| id | name      | population |
+====+===========+============+
|  1 | 'germany' |         83 |
+----+-----------+------------+
|  2 | 'france'  |         66 |
+----+-----------+------------+
|  3 | 'spain'   |         47 |
+----+-----------+------------+

Reference

steps.table_debug (class)

steps.table_debug (class)

Debug table. This step can be added using the `steps` parameter for the `transform` function.

Signature

(*, name: Optional[str] = None, title: Optional[str] = None, description: Optional[str] = None, function: Any) -> None

Parameters
  • name (Optional[str])
  • title (Optional[str])
  • description (Optional[str])
  • function (Any)

steps.table_debug.function (property)

Debug function to apply to the table row.

Signature

Any

Diff Tables

Example

from pprint import pprint
from frictionless import Package, Resource, transform, steps

source = Resource(path="transform.csv")
target = transform(
    source,
    steps=[
        steps.table_normalize(),
        steps.table_diff(
            resource=Resource(
                data=[
                    ["id", "name", "population"],
                    [1, "germany", 83],
                    [2, "france", 50],
                    [3, "spain", 47],
                ]
            )
        ),
    ]
)
print(target.schema)
print(target.to_view())
{'fields': [{'name': 'id', 'type': 'integer'},
            {'name': 'name', 'type': 'string'},
            {'name': 'population', 'type': 'integer'}]}
+----+----------+------------+
| id | name     | population |
+====+==========+============+
|  2 | 'france' |         66 |
+----+----------+------------+

Reference

steps.table_diff (class)

steps.table_diff (class)

Diff tables. This step can be added using the `steps` parameter for the `transform` function.

Signature

(*, name: Optional[str] = None, title: Optional[str] = None, description: Optional[str] = None, resource: Union[Resource, str], ignore_order: bool = False, use_hash: bool = False) -> None

Parameters
  • name (Optional[str])
  • title (Optional[str])
  • description (Optional[str])
  • resource (Union[Resource, str])
  • ignore_order (bool)
  • use_hash (bool)

steps.table_diff.resource (property)

Resource with which to compare.

Signature

Union[Resource, str]

steps.table_diff.ignore_order (property)

Specifies whether to ignore the order of the rows.

Signature

bool

steps.table_diff.use_hash (property)

Specifies whether to use hash or not. If yes, alternative implementation will be used where the complement is executed by constructing an in-memory set for all rows found in the right hand table. For more information please see the link below: https://petl.readthedocs.io/en/stable/transform.html#petl.transform.setops.hashcomplement

Signature

bool

Intersect Tables

Example

from pprint import pprint
from frictionless import Package, Resource, transform, steps

source = Resource(path="transform.csv")
target = transform(
    source,
    steps=[
        steps.table_normalize(),
        steps.table_intersect(
            resource=Resource(
                data=[
                    ["id", "name", "population"],
                    [1, "germany", 83],
                    [2, "france", 50],
                    [3, "spain", 47],
                ]
            ),
        ),
    ]
)
print(target.schema)
print(target.to_view())
{'fields': [{'name': 'id', 'type': 'integer'},
            {'name': 'name', 'type': 'string'},
            {'name': 'population', 'type': 'integer'}]}
+----+-----------+------------+
| id | name      | population |
+====+===========+============+
|  1 | 'germany' |         83 |
+----+-----------+------------+
|  3 | 'spain'   |         47 |
+----+-----------+------------+

Reference

steps.table_intersect (class)

steps.table_intersect (class)

Intersect tables. This step can be added using the `steps` parameter for the `transform` function.

Signature

(*, name: Optional[str] = None, title: Optional[str] = None, description: Optional[str] = None, resource: Union[Resource, str], use_hash: bool = False) -> None

Parameters
  • name (Optional[str])
  • title (Optional[str])
  • description (Optional[str])
  • resource (Union[Resource, str])
  • use_hash (bool)

steps.table_intersect.resource (property)

Resource with which to apply intersection.

Signature

Union[Resource, str]

steps.table_intersect.use_hash (property)

Specifies whether to use hash or not. If yes, an alternative implementation will be used. For more information please see the link below: https://petl.readthedocs.io/en/stable/transform.html#petl.transform.setops.hashintersection

Signature

bool

Join Tables

Example

from pprint import pprint
from frictionless import Package, Resource, transform, steps

source = Resource(path="transform.csv")
target = transform(
    source,
    steps=[
        steps.table_normalize(),
        steps.table_join(
            resource=Resource(data=[["id", "note"], [1, "beer"], [2, "vine"]]),
            field_name="id",
        ),
    ]
)
print(target.schema)
print(target.to_view())
{'fields': [{'name': 'id', 'type': 'integer'},
            {'name': 'name', 'type': 'string'},
            {'name': 'population', 'type': 'integer'},
            {'name': 'note', 'type': 'string'}]}
+----+-----------+------------+--------+
| id | name      | population | note   |
+====+===========+============+========+
|  1 | 'germany' |         83 | 'beer' |
+----+-----------+------------+--------+
|  2 | 'france'  |         66 | 'vine' |
+----+-----------+------------+--------+

Reference

steps.table_join (class)

steps.table_join (class)

Join tables. This step can be added using the `steps` parameter for the `transform` function.

Signature

(*, name: Optional[str] = None, title: Optional[str] = None, description: Optional[str] = None, resource: Union[Resource, str], field_name: Optional[str] = None, use_hash: bool = False, mode: str = inner) -> None

Parameters
  • name (Optional[str])
  • title (Optional[str])
  • description (Optional[str])
  • resource (Union[Resource, str])
  • field_name (Optional[str])
  • use_hash (bool)
  • mode (str)

steps.table_join.resource (property)

Resource with which to apply join.

Signature

Union[Resource, str]

steps.table_join.field_name (property)

Field name with which the join will be performed comparing it's value between two tables. If not provided natural join is tried. For more information, please see the following document: https://petl.readthedocs.io/en/stable/_modules/petl/transform/joins.html

Signature

Optional[str]

steps.table_join.use_hash (property)

Specify whether to use hash or not. If True, an alternative implementation of join will be used.

Signature

bool

steps.table_join.mode (property)

Specifies which mode to use. The available modes are: "inner", "left", "right", "outer", "cross" and "negate". The default mode is "inner".

Signature

str

Melt Table

Example

from pprint import pprint
from frictionless import Package, Resource, transform, steps

source = Resource(path="transform.csv")
target = transform(
    source,
    steps=[
        steps.table_normalize(),
        steps.table_melt(field_name="name"),
    ]
)
print(target.schema)
print(target.to_view())
{'fields': [{'name': 'name', 'type': 'string'},
            {'name': 'variable', 'type': 'string'},
            {'name': 'value', 'type': 'any'}]}
+-----------+--------------+-------+
| name      | variable     | value |
+===========+==============+=======+
| 'germany' | 'id'         |     1 |
+-----------+--------------+-------+
| 'germany' | 'population' |    83 |
+-----------+--------------+-------+
| 'france'  | 'id'         |     2 |
+-----------+--------------+-------+
| 'france'  | 'population' |    66 |
+-----------+--------------+-------+
| 'spain'   | 'id'         |     3 |
+-----------+--------------+-------+
...

Reference

steps.table_melt (class)

steps.table_melt (class)

Melt tables. This step can be added using the `steps` parameter for the `transform` function.

Signature

(*, name: Optional[str] = None, title: Optional[str] = None, description: Optional[str] = None, field_name: str, variables: Optional[str] = None, to_field_names: List[str] = NOTHING) -> None

Parameters
  • name (Optional[str])
  • title (Optional[str])
  • description (Optional[str])
  • field_name (str)
  • variables (Optional[str])
  • to_field_names (List[str])

steps.table_melt.field_name (property)

Field name which will be use to melt table. It will keep the field 'field_name' as it is but melt other fields into data.

Signature

str

steps.table_melt.variables (property)

List of name of fields which will be melted into data.

Signature

Optional[str]

steps.table_melt.to_field_names (property)

Labels for new fields that will be created "variable" and "value".

Signature

List[str]

Merge Tables

Example

from pprint import pprint
from frictionless import Package, Resource, transform, steps

source = Resource(path="transform.csv")
target = transform(
    source,
    steps=[
        steps.table_merge(
            resource=Resource(data=[["id", "name", "note"], [4, "malta", "island"]])
        ),
    ]
)
print(target.schema)
print(target.to_view())

Reference

steps.table_merge (class)

steps.table_merge (class)

Merge tables. This step can be added using the `steps` parameter for the `transform` function.

Signature

(*, name: Optional[str] = None, title: Optional[str] = None, description: Optional[str] = None, resource: Union[Resource, str], field_names: List[str] = NOTHING, sort_by_field: Optional[str] = None, ignore_fields: bool = False) -> None

Parameters
  • name (Optional[str])
  • title (Optional[str])
  • description (Optional[str])
  • resource (Union[Resource, str])
  • field_names (List[str])
  • sort_by_field (Optional[str])
  • ignore_fields (bool)

steps.table_merge.resource (property)

Resource to merge with.

Signature

Union[Resource, str]

steps.table_merge.field_names (property)

Specifies fixed headers for output table.

Signature

List[str]

steps.table_merge.sort_by_field (property)

Field name by which to sort the record after merging.

Signature

Optional[str]

steps.table_merge.ignore_fields (property)

If ignore_fields is set to True, it will merge two resource without matching headers.

Signature

bool

Normalize Table

The table_normalize step normalizes an underlaying tabular stream (cast types and fix dimensions) according to a provided or inferred schema. If your data is not really big it's recommended to normalize a table before any others steps.

Example

from pprint import pprint
from frictionless import Package, Resource, transform, steps

source = Resource("table.csv")
print(source.read_cells())
target = transform(
    source,
    steps=[
        steps.table_normalize(),
    ]
)
print(target.read_cells())
[['id', 'name'], ['1', 'english'], ['2', '中国人']]
[['id', 'name'], [1, 'english'], [2, '中国人']]

Reference

steps.table_normalize (class)

steps.table_normalize (class)

Normalize table. This step can be added using the `steps` parameter for the `transform` function.

Signature

(*, name: Optional[str] = None, title: Optional[str] = None, description: Optional[str] = None) -> None

Parameters
  • name (Optional[str])
  • title (Optional[str])
  • description (Optional[str])

Pivot Table

Example

from pprint import pprint
from frictionless import Package, Resource, transform, steps

source = Resource(path="transform-pivot.csv")
target = transform(
    source,
    steps=[
        steps.table_normalize(),
        steps.table_pivot(f1="region", f2="gender", f3="units", aggfun=sum),
    ]
)
print(target.schema)
print(target.to_view())

Reference

steps.table_pivot (class)

steps.table_pivot (class)

Pivot table. This step can be added using the `steps` parameter for the `transform` function.

Signature

(*, name: Optional[str] = None, title: Optional[str] = None, description: Optional[str] = None, f1: str, f2: str, f3: str, aggfun: Any) -> None

Parameters
  • name (Optional[str])
  • title (Optional[str])
  • description (Optional[str])
  • f1 (str)
  • f2 (str)
  • f3 (str)
  • aggfun (Any)

steps.table_pivot.f1 (property)

Field that makes the rows in the output pivot table.

Signature

str

steps.table_pivot.f2 (property)

Field that makes the columns in the output pivot table.

Signature

str

steps.table_pivot.f3 (property)

Field that forms the data in the output pivot table.

Signature

str

steps.table_pivot.aggfun (property)

Function to process and create data in the output pivot table. The function can be "sum", "max", "min", "len" etc.

Signature

Any

Print Table

Example

from pprint import pprint
from frictionless import Package, Resource, transform, steps

source = Resource(path="transform.csv")
target = transform(
    source,
    steps=[
        steps.table_normalize(),
        steps.table_print(),
    ]
)
==  =======  ==========
id  name     population
==  =======  ==========
 1  germany          83
 2  france           66
 3  spain            47
==  =======  ==========

Reference

steps.table_print (class)

steps.table_print (class)

Print table. This step can be added using the `steps` parameter for the `transform` function.

Signature

(*, name: Optional[str] = None, title: Optional[str] = None, description: Optional[str] = None) -> None

Parameters
  • name (Optional[str])
  • title (Optional[str])
  • description (Optional[str])

Recast Table

Example

from pprint import pprint
from frictionless import Package, Resource, transform, steps

source = Resource(path="transform.csv")
target = transform(
    source,
    steps=[
        steps.table_normalize(),
        steps.table_melt(field_name="id"),
        steps.table_recast(field_name="id"),
    ]
)
print(target.schema)
print(target.to_view())
{'fields': [{'name': 'id', 'type': 'integer'},
            {'name': 'name', 'type': 'string'},
            {'name': 'population', 'type': 'integer'}]}
+----+-----------+------------+
| id | name      | population |
+====+===========+============+
|  1 | 'germany' |         83 |
+----+-----------+------------+
|  2 | 'france'  |         66 |
+----+-----------+------------+
|  3 | 'spain'   |         47 |
+----+-----------+------------+

Reference

steps.table_recast (class)

steps.table_recast (class)

Recast table. This step can be added using the `steps` parameter for the `transform` function.

Signature

(*, name: Optional[str] = None, title: Optional[str] = None, description: Optional[str] = None, field_name: str, from_field_names: List[str] = NOTHING) -> None

Parameters
  • name (Optional[str])
  • title (Optional[str])
  • description (Optional[str])
  • field_name (str)
  • from_field_names (List[str])

steps.table_recast.field_name (property)

Recast table by the field 'field_name'.

Signature

str

steps.table_recast.from_field_names (property)

List of field names for the output table.

Signature

List[str]

Transpose Table

Example

from pprint import pprint
from frictionless import Package, Resource, transform, steps

source = Resource(path="transform.csv")
target = transform(
    source,
    steps=[
        steps.table_normalize(),
        steps.table_transpose(),
    ]
)
print(target.schema)
print(target.to_view())
{'fields': [{'name': 'id', 'type': 'string'},
            {'name': '1', 'type': 'any'},
            {'name': '2', 'type': 'any'},
            {'name': '3', 'type': 'any'}]}
+--------------+-----------+----------+---------+
| id           | 1         | 2        | 3       |
+==============+===========+==========+=========+
| 'name'       | 'germany' | 'france' | 'spain' |
+--------------+-----------+----------+---------+
| 'population' |        83 |       66 |      47 |
+--------------+-----------+----------+---------+

Reference

steps.table_transpose (class)

steps.table_transpose (class)

Transpose table. This step can be added using the `steps` parameter for the `transform` function.

Signature

(*, name: Optional[str] = None, title: Optional[str] = None, description: Optional[str] = None) -> None

Parameters
  • name (Optional[str])
  • title (Optional[str])
  • description (Optional[str])

Validate Table

Example

from pprint import pprint
from frictionless import Package, Resource, transform, steps

source = Resource(path="transform.csv")
target = transform(
    source,
    steps=[
        steps.cell_set(field_name="population", value="bad"),
        steps.table_validate(),
    ]
)
pprint(target.schema)
try:
  pprint(target.to_view())
except Exception as exception:
  pprint(exception)
{'fields': [{'name': 'id', 'type': 'integer'},
            {'name': 'name', 'type': 'string'},
            {'name': 'population', 'type': 'integer'}]}
FrictionlessException('[step-error] Step is not valid: "table_validate" raises "[type-error] Type error in the cell "bad" in row "2" and field "population" at position "3": type is "integer/default" " ')

Reference

steps.table_validate (class)

steps.table_validate (class)

Validate table. This step can be added using the `steps` parameter for the `transform` function.

Signature

(*, name: Optional[str] = None, title: Optional[str] = None, description: Optional[str] = None) -> None

Parameters
  • name (Optional[str])
  • title (Optional[str])
  • description (Optional[str])

Write Table

Example

from pprint import pprint
from frictionless import Package, Resource, transform, steps

source = Resource(path="transform.csv")
target = transform(
    source,
    steps=[
        steps.table_write(path='transform.json'),
    ]
)

Let's read the output:

cat transform.json
[
  [
    "id",
    "name",
    "population"
  ],
  [
    1,
    "germany",
    83
  ],
  [
    2,
    "france",
    66
  ],
  [
    3,
    "spain",
    47
  ]
]
with open('transform.json') as file:
    print(file.read())
[
  [
    "id",
    "name",
    "population"
  ],
  [
    1,
    "germany",
    83
  ],
  [
    2,
    "france",
    66
  ],
  [
    3,
    "spain",
    47
  ]
]

Reference

steps.table_write (class)

steps.table_write (class)

Write table. This step can be added using the `steps` parameter for the `transform` function.

Signature

(*, name: Optional[str] = None, title: Optional[str] = None, description: Optional[str] = None, path: str) -> None

Parameters
  • name (Optional[str])
  • title (Optional[str])
  • description (Optional[str])
  • path (str)

steps.table_write.path (property)

Path of the file to write the table content.

Signature

str