(2024-01-29 13:37)

Table Classes

Table Header

After opening a resource, you get access to a resource.header object that describes the resource in more detail. It behaves like a list of normalized labels and also provides some additional functionality. Let's take a look:

from frictionless import Resource

with Resource('capital-3.csv') as resource:
    print(f'Header: {resource.header}')
    print(f'Labels: {resource.header.labels}')
    print(f'Fields: {resource.header.fields}')
    print(f'Field Names: {resource.header.field_names}')
    print(f'Field Numbers: {resource.header.field_numbers}')
    print(f'Errors: {resource.header.errors}')
    print(f'Valid: {resource.header.valid}')
    print(f'As List: {resource.header.to_list()}')
Header: ['id', 'name']
Labels: ['id', 'name']
Fields: [{'name': 'id', 'type': 'integer'}, {'name': 'name', 'type': 'string'}]
Field Names: ['id', 'name']
Field Numbers: [1, 2]
Errors: []
Valid: True
As List: ['id', 'name']

The example above shows a valid header. When a header contains errors in its tabular structure, resource.header.errors becomes very useful, revealing discrepancies, duplicate labels, or missing cells:

from pprint import pprint
from frictionless import Resource

with Resource([['name', 'name'], ['value', 'value']]) as resource:
    pprint(resource.header.errors)
[{'type': 'duplicate-label',
 'title': 'Duplicate Label',
 'description': 'Two columns in the header row have the same value. Column '
                'names should be unique.',
 'message': 'Label "name" in the header at position "2" is duplicated to a '
            'label: at position "1"',
 'tags': ['#table', '#header', '#label'],
 'note': 'at position "1"',
 'labels': ['name', 'name'],
 'rowNumbers': [1],
 'label': 'name',
 'fieldName': 'name2',
 'fieldNumber': 2}]

Table Row

The extract function, resource.read_rows(), and other functions return or yield row objects. In Python, a row is a dictionary that also carries the information shown below. Note: this example uses the Detector object, which tweaks how different aspects of metadata are detected.

from frictionless import Resource, Detector

detector = Detector(schema_patch={'missingValues': ['1']})
with Resource('capital-3.csv', detector=detector) as resource:
    for row in resource.row_stream:
        print(f'Row: {row}')
        print(f'Cells: {row.cells}')
        print(f'Fields: {row.fields}')
        print(f'Field Names: {row.field_names}')
        print(f'Value of field "name": {row["name"]}')  # accessed as a dict
        print(f'Row Number: {row.row_number}')  # counted row number starting from 1
        print(f'Blank Cells: {row.blank_cells}')
        print(f'Error Cells: {row.error_cells}')
        print(f'Errors: {row.errors}')
        print(f'Valid: {row.valid}')
        print(f'As Dict: {row.to_dict(json=False)}')
        print(f'As List: {row.to_list(json=True)}')  # JSON compatible data types
        break
Row: {'id': None, 'name': 'London'}
Cells: ['1', 'London']
Fields: [{'name': 'id', 'type': 'integer'}, {'name': 'name', 'type': 'string'}]
Field Names: ['id', 'name']
Value of field "name": London
Row Number: 2
Blank Cells: {'id': '1'}
Error Cells: {}
Errors: []
Valid: True
As Dict: {'id': None, 'name': 'London'}
As List: [None, 'London']

As we can see, a row carries a lot of information, which is especially useful when the row is not valid. Our row is valid, but the example shows how it preserves the original value of a cell that was treated as missing. A row also preserves data about all cells that contain errors:

from pprint import pprint
from frictionless import Resource

with Resource([['name'], ['value', 'value']]) as resource:
    for row in resource.row_stream:
        pprint(row.errors)
[{'type': 'extra-cell',
 'title': 'Extra Cell',
 'description': 'This row has more values compared to the header row (the '
                'first row in the data source). A key concept is that all the '
                'rows in tabular data must have the same number of columns.',
 'message': 'Row at position "2" has an extra value in field at position "2"',
 'tags': ['#table', '#row', '#cell'],
 'note': '',
 'cells': ['value', 'value'],
 'rowNumber': 2,
 'cell': 'value',
 'fieldName': '',
 'fieldNumber': 2}]

Reference

Header (class)

Header representation.

Note: the constructor of this object is not part of the public API.

Signature

(labels: List[str], *, fields: List[Field], row_numbers: List[int], ignore_case: bool = False)

Parameters

  • labels (List[str]): header row labels
  • fields (List[Field]): table fields
  • row_numbers (List[int]): row numbers
  • ignore_case (bool): ignore case

header.to_list (method)

Convert to a list

header.to_str (method)

Row (class)

Row representation.

Note: the constructor of this object is not part of the public API.

This object is returned by `extract`, `resource.read_rows`, and other functions.

```python
rows = extract("data/table.csv")
for row in rows:
    # work with the Row
    ...
```

Signature

(cells: List[Any], *, field_info: Dict[str, Any], row_number: int)

Parameters

  • cells (List[Any]): array of cells
  • field_info (Dict[str, Any]): special field info structure
  • row_number (int): row number, counted from 1

row.to_dict (method)

Signature

(*, csv: bool = False, json: bool = False, types: Optional[List[str]] = None) -> Dict[str, Any]

Parameters

  • csv (bool)
  • json (bool): make data types compatible with JSON format
  • types (Optional[List[str]])

row.to_list (method)

Signature

(*, json: bool = False, types: Optional[List[str]] = None)

Parameters

  • json (bool): make data types compatible with JSON format
  • types (Optional[List[str]]): list of supported types

row.to_str (method)

Signature

(**options: Any)

Parameters

  • options (Any)