After opening a resource you get access to a `resource.header` object, which describes the resource in more detail. It is a list of normalized labels that also provides some additional functionality. Let's take a look:
from frictionless import Resource
with Resource('capital-3.csv') as resource:
    print(f'Header: {resource.header}')
    print(f'Labels: {resource.header.labels}')
    print(f'Fields: {resource.header.fields}')
    print(f'Field Names: {resource.header.field_names}')
    print(f'Field Numbers: {resource.header.field_numbers}')
    print(f'Errors: {resource.header.errors}')
    print(f'Valid: {resource.header.valid}')
    print(f'As List: {resource.header.to_list()}')
Header: ['id', 'name']
Labels: ['id', 'name']
Fields: [{'name': 'id', 'type': 'integer'}, {'name': 'name', 'type': 'string'}]
Field Names: ['id', 'name']
Field Numbers: [1, 2]
Errors: []
Valid: True
As List: ['id', 'name']
The example above shows a valid header. When a header contains errors in its tabular structure, this information becomes very useful, revealing discrepancies, duplicates or missing cell information:
from pprint import pprint
from frictionless import Resource
with Resource([['name', 'name'], ['value', 'value']]) as resource:
    pprint(resource.header.errors)
[{'type': 'duplicate-label',
  'title': 'Duplicate Label',
  'description': 'Two columns in the header row have the same value. Column '
                 'names should be unique.',
  'message': 'Label "name" in the header at position "2" is duplicated to a '
             'label: at position "1"',
  'tags': ['#table', '#header', '#label'],
  'note': 'at position "1"',
  'labels': ['name', 'name'],
  'rowNumbers': [1],
  'label': 'name',
  'fieldName': 'name2',
  'fieldNumber': 2}]
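To make the duplicate-label check more concrete, here is a minimal standard-library sketch of how such a check could work. It is not the frictionless implementation: the function name `find_duplicate_labels` and the `duplicateOf` key are hypothetical and exist only for illustration, while `label` and `fieldNumber` mirror the keys in the error shown above.

```python
from typing import Dict, List


def find_duplicate_labels(labels: List[str]) -> List[Dict[str, object]]:
    """Report each label that repeats an earlier one (positions are 1-based)."""
    first_seen: Dict[str, int] = {}  # label -> position of its first occurrence
    duplicates: List[Dict[str, object]] = []
    for position, label in enumerate(labels, start=1):
        if label in first_seen:
            # 'duplicateOf' is a hypothetical key, not part of frictionless errors
            duplicates.append(
                {"label": label, "fieldNumber": position, "duplicateOf": first_seen[label]}
            )
        else:
            first_seen[label] = position
    return duplicates


print(find_duplicate_labels(["name", "name"]))
# [{'label': 'name', 'fieldNumber': 2, 'duplicateOf': 1}]
```

As in the error above, the second `name` label is reported at position 2 and points back to position 1.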
The `extract`, `resource.read_rows()` and other functions return or yield row objects. In Python, a row object is a dictionary that carries the following additional information. Note: this example uses the Detector object, which tweaks how different aspects of metadata are detected; here it declares the string '1' to be a missing value.
from frictionless import Resource, Detector
detector = Detector(schema_patch={'missingValues': ['1']})
with Resource('capital-3.csv', detector=detector) as resource:
    for row in resource.row_stream:
        print(f'Row: {row}')
        print(f'Cells: {row.cells}')
        print(f'Fields: {row.fields}')
        print(f'Field Names: {row.field_names}')
        print(f'Value of field "name": {row["name"]}')  # accessed as a dict
        print(f'Row Number: {row.row_number}')  # counted row number starting from 1
        print(f'Blank Cells: {row.blank_cells}')
        print(f'Error Cells: {row.error_cells}')
        print(f'Errors: {row.errors}')
        print(f'Valid: {row.valid}')
        print(f'As Dict: {row.to_dict(json=False)}')
        print(f'As List: {row.to_list(json=True)}')  # JSON compatible data types
        break
Row: {'id': None, 'name': 'London'}
Cells: ['1', 'London']
Fields: [{'name': 'id', 'type': 'integer'}, {'name': 'name', 'type': 'string'}]
Field Names: ['id', 'name']
Value of field "name": London
Row Number: 2
Blank Cells: {'id': '1'}
Error Cells: {}
Errors: []
Valid: True
As Dict: {'id': None, 'name': 'London'}
As List: [None, 'London']
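The way the row above maps the cell '1' to `None` while still keeping the original text in its blank cells can be sketched with the standard library alone. This is an illustrative assumption about the behavior, not the frictionless implementation: `apply_missing_values` is a hypothetical helper, and it skips type casting entirely.

```python
from typing import Dict, List, Optional, Tuple


def apply_missing_values(
    field_names: List[str], cells: List[str], missing_values: List[str]
) -> Tuple[Dict[str, Optional[str]], Dict[str, str]]:
    """Return (row, blank_cells); cells matching a missing value become None."""
    row: Dict[str, Optional[str]] = {}
    blank_cells: Dict[str, str] = {}
    for name, cell in zip(field_names, cells):
        if cell in missing_values:
            row[name] = None
            blank_cells[name] = cell  # keep the original text for inspection
        else:
            row[name] = cell  # no type casting in this sketch
    return row, blank_cells


row, blank_cells = apply_missing_values(["id", "name"], ["1", "London"], ["1"])
print(row)          # {'id': None, 'name': 'London'}
print(blank_cells)  # {'id': '1'}
```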
As the row example shows, a row object provides a lot of information, which is especially useful when a row is not valid. The row in that example is valid, but it demonstrates how a row preserves data about missing values: the original cell '1' is kept in its blank cells. A row also preserves data about all cells that contain errors:
from pprint import pprint
from frictionless import Resource
with Resource([['name'], ['value', 'value']]) as resource:
    for row in resource.row_stream:
        pprint(row.errors)
[{'type': 'extra-cell',
  'title': 'Extra Cell',
  'description': 'This row has more values compared to the header row (the '
                 'first row in the data source). A key concept is that all the '
                 'rows in tabular data must have the same number of columns.',
  'message': 'Row at position "2" has an extra value in field at position "2"',
  'tags': ['#table', '#row', '#cell'],
  'note': '',
  'cells': ['value', 'value'],
  'rowNumber': 2,
  'cell': 'value',
  'fieldName': '',
  'fieldNumber': 2}]
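The extra-cell error above comes down to comparing the number of cells in a row with the number of header labels. A minimal standard-library sketch of that comparison follows; `find_extra_cells` is a hypothetical helper written for illustration, not the frictionless implementation, and it returns only a subset of the keys shown in the real error.

```python
from typing import Dict, List


def find_extra_cells(
    labels: List[str], cells: List[str], row_number: int
) -> List[Dict[str, object]]:
    """Report every cell that lies beyond the last header label (1-based positions)."""
    errors: List[Dict[str, object]] = []
    # Positions len(labels) + 1 .. len(cells) have no matching header label
    for field_number in range(len(labels) + 1, len(cells) + 1):
        errors.append(
            {
                "type": "extra-cell",
                "rowNumber": row_number,
                "fieldNumber": field_number,
                "cell": cells[field_number - 1],
            }
        )
    return errors


print(find_extra_cells(["name"], ["value", "value"], row_number=2))
# [{'type': 'extra-cell', 'rowNumber': 2, 'fieldNumber': 2, 'cell': 'value'}]
```

A row with as many cells as labels, or fewer, yields an empty list in this sketch.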
Header representation

> Constructor of this object is not Public API
(labels: List[str], *, fields: List[Field], row_numbers: List[int], ignore_case: bool = False)
Convert to a list
Row representation

> Constructor of this object is not Public API

This object is returned by `extract`, `resource.read_rows`, and other functions.

```python
from frictionless import extract

rows = extract("data/table.csv")
for row in rows:
    ...  # work with the Row
```
(cells: List[Any], *, field_info: Dict[str, Any], row_number: int)
(*, csv: bool = False, json: bool = False, types: Optional[List[str]] = None) -> Dict[str, Any]
(*, json: bool = False, types: Optional[List[str]] = None)
(**options: Any)