The Table Dialect is a core Frictionless Data concept: metadata describing how a tabular data source should be read. It lets us manage the table header and any details related to specific formats.
Dialect instances are accepted by many classes and functions, such as Resource, extract, and validate.
You just need to create a Dialect instance with the desired options and pass it to one of these classes or functions. We will demonstrate on this example table:
cat capital-3.csv
id,name
1,London
2,Berlin
3,Paris
4,Madrid
5,Rome
The header option is a boolean flag, defaulting to True, that indicates whether the data has a header row. In the following example, the header row is treated as a data row:
from frictionless import Resource, Dialect
dialect = Dialect(header=False)
with Resource('capital-3.csv', dialect=dialect) as resource:
    print(resource.header.labels)
    print(resource.to_view())
[]
+--------+----------+
| field1 | field2 |
+========+==========+
| 'id' | 'name' |
+--------+----------+
| '1' | 'London' |
+--------+----------+
| '2' | 'Berlin' |
+--------+----------+
| '3' | 'Paris' |
+--------+----------+
| '4' | 'Madrid' |
+--------+----------+
...
When header is True (the default), the header_rows parameter indicates where to find the header row, or the header rows for a multiline header. The following example treats the first two data rows as part of the header:
from frictionless import Resource, Dialect
dialect = Dialect(header_rows=[1, 2, 3])
with Resource('capital-3.csv', dialect=dialect) as resource:
    print(resource.header)
    print(resource.to_view())
['id 1 2', 'name London Berlin']
+--------+--------------------+
| id 1 2 | name London Berlin |
+========+====================+
| 3 | 'Paris' |
+--------+--------------------+
| 4 | 'Madrid' |
+--------+--------------------+
| 5 | 'Rome' |
+--------+--------------------+
If there are multiple header rows, as set by the header_rows parameter, the header_join parameter sets the string used as a separator when joining header cells. This is handy for some "fancy" Excel files. For simplicity, we show it on a CSV file:
from frictionless import Resource, Dialect
dialect = Dialect(header_rows=[1, 2, 3], header_join='/')
with Resource('capital-3.csv', dialect=dialect) as resource:
    print(resource.header)
    print(resource.to_view())
['id/1/2', 'name/London/Berlin']
+--------+--------------------+
| id/1/2 | name/London/Berlin |
+========+====================+
| 3 | 'Paris' |
+--------+--------------------+
| 4 | 'Madrid' |
+--------+--------------------+
| 5 | 'Rome' |
+--------+--------------------+
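To make the joining behaviour concrete, here is a minimal plain-Python sketch of how multi-row header labels could be assembled. It is not the library's implementation: the function name join_header_rows is hypothetical, and the sketch assumes 1-based row numbers (as in the examples above) and a simple per-column join with no special handling of empty or repeated cells.

```python
import csv
import io


def join_header_rows(text, header_rows=(1,), header_join=" "):
    """Hypothetical helper: build header labels from several rows of CSV text."""
    rows = list(csv.reader(io.StringIO(text)))
    # Row numbers are 1-based, so row n lives at index n - 1.
    header_cells = [rows[n - 1] for n in header_rows]
    # Join the cells of each column, top to bottom, with the separator.
    labels = [header_join.join(column) for column in zip(*header_cells)]
    # Assumption: everything after the last header row is data.
    data = rows[max(header_rows):]
    return labels, data


sample = "id,name\n1,London\n2,Berlin\n3,Paris\n4,Madrid\n5,Rome\n"
labels, data = join_header_rows(sample, header_rows=[1, 2, 3], header_join="/")
print(labels)  # ['id/1/2', 'name/London/Berlin']
print(data)    # [['3', 'Paris'], ['4', 'Madrid'], ['5', 'Rome']]
```

Calling the same helper with the default header_join=" " gives the labels 'id 1 2' and 'name London Berlin', matching the header_rows example.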
By default, a header is validated in case-sensitive mode. To disable this behaviour, set the header_case parameter to False. This option is accepted by any Dialect, and a dialect can be passed to extract, validate, and other functions. Note that it doesn't affect the resulting header; it only affects how the header is validated:
from frictionless import Resource, Schema, Dialect, fields
dialect = Dialect(header_case=False)
schema = Schema(fields=[fields.StringField(name="ID"), fields.StringField(name="NAME")])
with Resource('capital-3.csv', dialect=dialect, schema=schema) as resource:
    print(f'Header: {resource.header}')
    print(f'Valid: {resource.header.valid}')  # without "header_case" it will have 2 errors
Header: ['ID', 'NAME']
Valid: True
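The comparison that header_case switches between can be sketched in plain Python. This is only an illustration of the idea, not the library's validation code; the function mismatched_labels and its return shape are assumptions made for this example.

```python
def mismatched_labels(labels, field_names, header_case=True):
    """Hypothetical helper: list header labels that do not match schema field names."""
    errors = []
    for position, (label, name) in enumerate(zip(labels, field_names), start=1):
        if header_case:
            same = label == name  # case-sensitive comparison
        else:
            same = label.lower() == name.lower()  # case-insensitive comparison
        if not same:
            errors.append((position, label, name))
    return errors


labels = ["id", "name"]       # labels found in the file
field_names = ["ID", "NAME"]  # field names declared in the schema

strict = mismatched_labels(labels, field_names)
relaxed = mismatched_labels(labels, field_names, header_case=False)
print(strict)   # [(1, 'id', 'ID'), (2, 'name', 'NAME')]
print(relaxed)  # []
```

The two mismatches reported in strict mode correspond to the two errors mentioned in the example's comment.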
The comment_char option specifies the character that marks a row as a comment; such rows are skipped:
from frictionless import Resource, Dialect
dialect = Dialect(comment_char="#")
with Resource(b'name\n#row1\nrow2', format="csv", dialect=dialect) as resource:
    print(resource.read_rows())
[{'name': 'row2'}]
The comment_rows option is a list of row numbers to ignore:
from frictionless import Resource, Dialect
dialect = Dialect(comment_rows=[2])
with Resource(b'name\nrow1\nrow2', format="csv", dialect=dialect) as resource:
    print(resource.read_rows())
[{'name': 'row2'}]
The skip_blank_rows option ignores rows that are completely blank:
from frictionless import Resource, Dialect
dialect = Dialect(skip_blank_rows=True)
with Resource(b'name\n\nrow2', format="csv", dialect=dialect) as resource:
    print(resource.read_rows())
[{'name': 'row2'}]
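The three row-skipping options can be pictured as filters applied before the header is read. The sketch below uses only the standard library and is not how Frictionless is implemented. The read_rows function here is a hypothetical stand-in, and it assumes that row numbers are 1-based and count the header row (as the comment_rows example suggests) and that a comment row is one whose first cell starts with comment_char.

```python
import csv
import io


def read_rows(text, comment_char=None, comment_rows=(), skip_blank_rows=False):
    """Hypothetical helper: filter raw CSV rows, then treat the first kept row as the header."""
    kept = []
    for row_number, cells in enumerate(csv.reader(io.StringIO(text)), start=1):
        if row_number in comment_rows:
            continue  # explicitly ignored row number
        if comment_char and cells and cells[0].startswith(comment_char):
            continue  # assumption: the comment marker is checked on the first cell
        if skip_blank_rows and not any(cell.strip() for cell in cells):
            continue  # completely blank row
        kept.append(cells)
    if not kept:
        return []
    header, *data = kept
    return [dict(zip(header, cells)) for cells in data]


print(read_rows("name\n#row1\nrow2", comment_char="#"))  # [{'name': 'row2'}]
print(read_rows("name\nrow1\nrow2", comment_rows=[2]))   # [{'name': 'row2'}]
print(read_rows("name\n\nrow2", skip_blank_rows=True))   # [{'name': 'row2'}]
```

Each call reproduces the output of the corresponding Dialect example above.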
Dialect representation
(*, descriptor: Optional[Union[types.IDescriptor, str]] = None, name: Optional[str] = None, title: Optional[str] = None, description: Optional[str] = None, header: bool = True, header_rows: List[int] = NOTHING, header_join: str = " ", header_case: bool = True, comment_char: Optional[str] = None, comment_rows: List[int] = NOTHING, skip_blank_rows: bool = False, controls: List[Control] = NOTHING) -> None
# TODO: add docs
Optional[Union[types.IDescriptor, str]]
A short url-usable (and preferably human-readable) name. This MUST be lower-case and contain only alphanumeric characters along with “_” or “-” characters.
Optional[str]
Type of the object
ClassVar[Union[str, None]]
A human-oriented title for the Dialect.
Optional[str]
A brief description of the Dialect.
Optional[str]
If true, the header is read from the data; otherwise no header is read and every row is treated as a data row.
bool
Specifies the row numbers for the header. Default is [1].
List[int]
Separator used to join the text of header cells when there are multiple header rows. The default value is " " and other values could be ":", "-" etc.
str
If set to false, header matching is case-insensitive. The default value is True.
bool
Specifies the character used to comment out rows. The default value is None. For example: "#".
Optional[str]
A list of rows to ignore. For example: [1, 2]
List[int]
Ignores rows if they are completely blank
bool
A list of controls which defines different aspects of reading data.
List[Control]
Add a new control to the dialect
(control: Control) -> None
Describe the given source as a dialect
(source: Optional[Any] = None, **options: Any) -> Dialect
Get control by type
(type: str) -> Control
Check if control is present
(type: str)
Set control by type
(control: Control) -> Optional[Control]
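The control methods above all look a control up by its type. The following sketch shows that idea with stand-in classes. SketchControl and SketchDialect are hypothetical and are not the library's Control and Dialect. In particular, returning the replaced control from set_control and raising KeyError from get_control for a missing type are assumptions inferred from the signatures, and the real classes may behave differently.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SketchControl:
    """Hypothetical stand-in for a control: a type plus some options."""
    type: str
    options: dict = field(default_factory=dict)


@dataclass
class SketchDialect:
    """Hypothetical stand-in for a dialect holding a list of controls."""
    controls: List[SketchControl] = field(default_factory=list)

    def add_control(self, control: SketchControl) -> None:
        self.controls.append(control)

    def has_control(self, type: str) -> bool:
        return any(control.type == type for control in self.controls)

    def get_control(self, type: str) -> SketchControl:
        for control in self.controls:
            if control.type == type:
                return control
        raise KeyError(type)  # assumption: a missing control is an error here

    def set_control(self, control: SketchControl) -> Optional[SketchControl]:
        # Assumption: replace a control of the same type and return the old one.
        for index, existing in enumerate(self.controls):
            if existing.type == control.type:
                self.controls[index] = control
                return existing
        self.controls.append(control)
        return None


dialect = SketchDialect()
dialect.add_control(SketchControl("csv", {"delimiter": ";"}))
print(dialect.has_control("csv"))          # True
previous = dialect.set_control(SketchControl("csv", {"delimiter": "|"}))
print(previous.options)                    # {'delimiter': ';'}
print(dialect.get_control("csv").options)  # {'delimiter': '|'}
```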
Control representation. This class is the base class for all the control classes that are used to set the states of various different components.
(*, name: Optional[str] = None, title: Optional[str] = None, description: Optional[str] = None) -> None
A short url-usable (and preferably human-readable) name. This MUST be lower-case and contain only alphanumeric characters along with “_” or “-” characters.
Optional[str]
Type of the control. It could be a zenodo plugin control, csv control etc. For example: "csv", "zenodo" etc
ClassVar[str]
A human-oriented title for the control.
Optional[str]
A brief description of the control.
Optional[str]