(2022-09-19 18:33)

Dialect Class

The Dialect concept gives us the ability to manage table headers and any details related to specific formats.

Dialect

Instances of the Dialect class are accepted by many classes and functions, such as Resource, extract, and validate.

Create a Dialect instance with the desired options and pass it to one of these classes or functions. We will demonstrate this on the following example table:

cat capital-3.csv
id,name
1,London
2,Berlin
3,Paris
4,Madrid
5,Rome

Header

This is a boolean flag, True by default, that indicates whether the data has a header row. In the following example the header row is treated as a data row:

from frictionless import Resource, Dialect

dialect = Dialect(header=False)
with Resource('capital-3.csv', dialect=dialect) as resource:
    print(resource.header.labels)
    print(resource.to_view())
[]
+--------+----------+
| field1 | field2   |
+========+==========+
| 'id'   | 'name'   |
+--------+----------+
| '1'    | 'London' |
+--------+----------+
| '2'    | 'Berlin' |
+--------+----------+
| '3'    | 'Paris'  |
+--------+----------+
| '4'    | 'Madrid' |
+--------+----------+
...
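
To make the effect of header=False concrete, here is a plain-Python sketch that mimics it with the standard csv module. It is not Frictionless code: the helper read_table is hypothetical, and the generated names simply follow the field1, field2 pattern visible in the output above.

```python
import csv
import io

# Same content as capital-3.csv, inlined so the sketch is self-contained.
CSV_TEXT = "id,name\n1,London\n2,Berlin\n3,Paris\n4,Madrid\n5,Rome\n"


def read_table(text, header=True):
    """Return (names, rows); hypothetical helper, not part of Frictionless."""
    rows = list(csv.reader(io.StringIO(text)))
    if header:
        # The first row supplies the names; the rest is data.
        return rows[0], rows[1:]
    # No header row: every row is data and the names are generated.
    width = len(rows[0]) if rows else 0
    return [f"field{i}" for i in range(1, width + 1)], rows


names, rows = read_table(CSV_TEXT, header=False)
print(names)    # ['field1', 'field2']
print(rows[0])  # ['id', 'name']
```

With header=True the same helper returns ['id', 'name'] as names and starts the data at the London row.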

Header Rows

If header is True (the default), this parameter indicates where to find the header row, or the header rows for a multiline header. The following example shows how the first two data rows can be treated as part of the header:

from frictionless import Resource, Dialect

dialect = Dialect(header_rows=[1, 2, 3])
with Resource('capital-3.csv', dialect=dialect) as resource:
    print(resource.header)
    print(resource.to_view())
['id 1 2', 'name London Berlin']
+--------+--------------------+
| id 1 2 | name London Berlin |
+========+====================+
|      3 | 'Paris'            |
+--------+--------------------+
|      4 | 'Madrid'           |
+--------+--------------------+
|      5 | 'Rome'             |
+--------+--------------------+

Header Join

If there are multiple header rows, as configured by the header_rows parameter, we can set the string used as a separator when joining header cells. This is usually very handy for some "fancy" Excel files. For the sake of simplicity, we will demonstrate it on a CSV file:

from frictionless import Resource, Dialect

dialect = Dialect(header_rows=[1, 2, 3], header_join='/')
with Resource('capital-3.csv', dialect=dialect) as resource:
    print(resource.header)
    print(resource.to_view())
['id/1/2', 'name/London/Berlin']
+--------+--------------------+
| id/1/2 | name/London/Berlin |
+========+====================+
|      3 | 'Paris'            |
+--------+--------------------+
|      4 | 'Madrid'           |
+--------+--------------------+
|      5 | 'Rome'             |
+--------+--------------------+
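
The way header_rows and header_join combine can be sketched in plain Python. This is an illustration of the idea only, not the library's implementation: the helper join_header is hypothetical, and it assumes that data starts right after the last header row.

```python
import csv
import io

# Same content as capital-3.csv, inlined so the sketch is self-contained.
CSV_TEXT = "id,name\n1,London\n2,Berlin\n3,Paris\n4,Madrid\n5,Rome\n"


def join_header(rows, header_rows=(1,), header_join=" "):
    """Build labels from 1-based header row numbers; hypothetical helper."""
    picked = [rows[number - 1] for number in header_rows]
    # zip(*picked) regroups the chosen rows column by column.
    labels = [header_join.join(cells) for cells in zip(*picked)]
    # Assumption: everything after the last header row is data.
    data = rows[max(header_rows):]
    return labels, data


rows = list(csv.reader(io.StringIO(CSV_TEXT)))

labels, data = join_header(rows, header_rows=[1, 2, 3])
print(labels)  # ['id 1 2', 'name London Berlin']
print(data)    # [['3', 'Paris'], ['4', 'Madrid'], ['5', 'Rome']]

labels, _ = join_header(rows, header_rows=[1, 2, 3], header_join="/")
print(labels)  # ['id/1/2', 'name/London/Berlin']
```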

Header Case

By default, a header is validated in case-sensitive mode. To disable this behaviour, set the header_case parameter to False. This option is accepted by any Dialect, and a dialect can be passed to extract, validate, and other functions. Note that it doesn't affect the resulting header; it only affects how the header is validated:

from frictionless import Resource, Schema, Dialect, fields

dialect = Dialect(header_case=False)
schema = Schema(fields=[fields.StringField(name="ID"), fields.StringField(name="NAME")])
with Resource('capital-3.csv', dialect=dialect, schema=schema) as resource:
    print(f'Header: {resource.header}')
    print(f'Valid: {resource.header.valid}')  # without "header_case" it will have 2 errors
Header: ['ID', 'NAME']
Valid: True
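
The comparison that header_case controls can be pictured with a small plain-Python sketch. The helper labels_match is hypothetical and only illustrates the case-sensitive versus case-insensitive check; the real validation in Frictionless reports individual errors rather than a single boolean.

```python
def labels_match(labels, field_names, header_case=True):
    """Compare header labels with schema field names; hypothetical helper."""
    if not header_case:
        # Case-insensitive mode: normalize both sides before comparing.
        labels = [label.lower() for label in labels]
        field_names = [name.lower() for name in field_names]
    return labels == field_names


labels = ["id", "name"]       # labels found in capital-3.csv
field_names = ["ID", "NAME"]  # names declared in the schema

print(labels_match(labels, field_names))                     # False
print(labels_match(labels, field_names, header_case=False))  # True
```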

Reference

Dialect (class)

Control (class)

Dialect (class)

Dialect representation

Signature

(*, name: Optional[str] = None, title: Optional[str] = None, description: Optional[str] = None, header: bool = True, header_rows: List[int] = NOTHING, header_join: str = ' ', header_case: bool = True, comment_char: Optional[str] = None, comment_rows: List[int] = NOTHING, controls: List[Control] = NOTHING) -> None

Parameters

  • name (Optional[str])
  • title (Optional[str])
  • description (Optional[str])
  • header (bool)
  • header_rows (List[int])
  • header_join (str)
  • header_case (bool)
  • comment_char (Optional[str])
  • comment_rows (List[int])
  • controls (List[Control])

dialect.name (property)

NOTE: add docs

Signature

Optional[str]

dialect.title (property)

NOTE: add docs

Signature

Optional[str]

dialect.description (property)

NOTE: add docs

Signature

Optional[str]

dialect.header (property)

NOTE: add docs

Signature

bool

dialect.header_rows (property)

NOTE: add docs

Signature

List[int]

dialect.header_join (property)

NOTE: add docs

Signature

str

dialect.header_case (property)

NOTE: add docs

Signature

bool

dialect.comment_char (property)

NOTE: add docs

Signature

Optional[str]

dialect.comment_rows (property)

NOTE: add docs

Signature

List[int]

dialect.controls (property)

NOTE: add docs

Signature

List[Control]

dialect.add_control (method)

Add a new control to the dialect

Signature

(control: Control) -> None

Parameters

  • control (Control)

Dialect.describe (method) (static)

Describe the given source as a dialect

Signature

(source: Optional[Any] = None, **options)

Parameters

  • source (Optional[Any]): data source
  • options

dialect.get_control (method)

Get control by type

Signature

(type: str) -> Control

Parameters

  • type (str)

dialect.has_control (method)

Check if control is present

Signature

(type: str)

Parameters

  • type (str)

dialect.set_control (method)

Set control by type

Signature

(control: Control) -> Optional[Control]

Parameters

  • control (Control)
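
The four control methods listed above can be read as a small registry keyed by control type. The toy model below is a sketch under stated assumptions, not the Frictionless implementation: ToyControl, CsvLikeControl, and ToyDialect are hypothetical stand-ins, the KeyError for a missing type is an assumption, and the idea that set_control returns the control it replaced is only inferred from its Optional[Control] return type.

```python
from typing import ClassVar, List, Optional


class ToyControl:
    """Stand-in for Control: each subclass declares its own type."""

    type: ClassVar[str] = "toy"


class CsvLikeControl(ToyControl):
    """Hypothetical format-specific control with a single option."""

    type: ClassVar[str] = "csv"

    def __init__(self, delimiter: str = ",") -> None:
        self.delimiter = delimiter


class ToyDialect:
    """Toy model of the control-related methods of a dialect."""

    def __init__(self) -> None:
        self.controls: List[ToyControl] = []

    def add_control(self, control: ToyControl) -> None:
        self.controls.append(control)

    def has_control(self, type: str) -> bool:
        return any(control.type == type for control in self.controls)

    def get_control(self, type: str) -> ToyControl:
        for control in self.controls:
            if control.type == type:
                return control
        # Assumption: the real library may handle a missing type differently.
        raise KeyError(type)

    def set_control(self, control: ToyControl) -> Optional[ToyControl]:
        # Assumption: replace a control of the same type and return the old one.
        for index, current in enumerate(self.controls):
            if current.type == control.type:
                self.controls[index] = control
                return current
        self.add_control(control)
        return None


dialect = ToyDialect()
dialect.add_control(CsvLikeControl(delimiter=";"))
print(dialect.has_control("csv"))            # True
print(dialect.get_control("csv").delimiter)  # ;

previous = dialect.set_control(CsvLikeControl(delimiter="|"))
print(previous.delimiter)                    # ;
print(dialect.get_control("csv").delimiter)  # |
```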

Control (class)

Control representation

Signature

(*, title: Optional[str] = None, description: Optional[str] = None) -> None

Parameters

  • title (Optional[str])
  • description (Optional[str])

control.type (property)

NOTE: add docs

Signature

ClassVar[str]

control.title (property)

NOTE: add docs

Signature

Optional[str]

control.description (property)

NOTE: add docs

Signature

Optional[str]

This is a beta version of Frictionless Framework (v5). For the version that is currently installed by default by pip, read the Frictionless Framework (v4) docs.