
Dialect Class

The Table Dialect is a core Frictionless Data concept: it is metadata that describes a tabular data source. A Table Dialect lets us control how the table header is read, along with any details specific to a particular format.

Dialect

Dialect instances are accepted by many classes and functions, including Resource, extract, and validate.

To use one, create a Dialect instance with the desired options and pass it to any of these classes or functions. The examples below use this sample table:

cat capital-3.csv
id,name
1,London
2,Berlin
3,Paris
4,Madrid
5,Rome

Header

Header is a boolean flag, True by default, that indicates whether the data has a header row. In the following example it is set to False, so the header row is treated as a data row:

from frictionless import Resource, Dialect

dialect = Dialect(header=False)
with Resource('capital-3.csv', dialect=dialect) as resource:
    print(resource.header.labels)
    print(resource.to_view())
[]
+--------+----------+
| field1 | field2   |
+========+==========+
| 'id'   | 'name'   |
+--------+----------+
| '1'    | 'London' |
+--------+----------+
| '2'    | 'Berlin' |
+--------+----------+
| '3'    | 'Paris'  |
+--------+----------+
| '4'    | 'Madrid' |
+--------+----------+
...
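To make the effect of header=False concrete, here is a minimal plain-Python sketch built on the standard csv module. It does not call frictionless and is not how the library is implemented; the helper name read_without_header is hypothetical, and the field1, field2 naming simply mirrors the output shown above.

```python
import csv
import io


def read_without_header(text):
    """Treat every row as data and generate labels field1, field2, ..."""
    rows = list(csv.reader(io.StringIO(text)))
    width = max((len(row) for row in rows), default=0)
    labels = [f"field{index}" for index in range(1, width + 1)]
    return labels, [dict(zip(labels, row)) for row in rows]


labels, records = read_without_header("id,name\n1,London\n2,Berlin\n")
print(labels)      # ['field1', 'field2']
print(records[0])  # {'field1': 'id', 'field2': 'name'}
```

Note that the original header row ('id', 'name') ends up as the first record, just as in the table view above.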

Header Rows

If header is True (the default), this parameter indicates where to find the header row, or the header rows for a multiline header. The following example treats the first two data rows as part of the header:

from frictionless import Resource, Dialect

dialect = Dialect(header_rows=[1, 2, 3])
with Resource('capital-3.csv', dialect=dialect) as resource:
    print(resource.header)
    print(resource.to_view())
['id 1 2', 'name London Berlin']
+--------+--------------------+
| id 1 2 | name London Berlin |
+========+====================+
|      3 | 'Paris'            |
+--------+--------------------+
|      4 | 'Madrid'           |
+--------+--------------------+
|      5 | 'Rome'             |
+--------+--------------------+

Header Join

If there are multiple header rows, as set by the header_rows parameter, we can choose the string used as a separator when the header cells of each column are joined. This is especially handy for some "fancy" Excel files. For the sake of simplicity, we show it on a CSV file:

from frictionless import Resource, Dialect

dialect = Dialect(header_rows=[1, 2, 3], header_join='/')
with Resource('capital-3.csv', dialect=dialect) as resource:
    print(resource.header)
    print(resource.to_view())
['id/1/2', 'name/London/Berlin']
+--------+--------------------+
| id/1/2 | name/London/Berlin |
+========+====================+
|      3 | 'Paris'            |
+--------+--------------------+
|      4 | 'Madrid'           |
+--------+--------------------+
|      5 | 'Rome'             |
+--------+--------------------+
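The way header_rows and header_join combine can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: build_header is a hypothetical helper, row numbers are taken to be 1-based, and data is assumed to start right after the last header row, which matches the outputs shown above.

```python
import csv
import io

CSV_TEXT = "id,name\n1,London\n2,Berlin\n3,Paris\n4,Madrid\n5,Rome\n"


def build_header(text, header_rows=(1,), header_join=" "):
    """Join the cells of the given 1-based rows, column by column."""
    rows = list(csv.reader(io.StringIO(text)))
    picked = [rows[number - 1] for number in header_rows]
    labels = [header_join.join(cells) for cells in zip(*picked)]
    data = rows[max(header_rows):]  # assumption: data follows the last header row
    return labels, data


labels, data = build_header(CSV_TEXT, header_rows=(1, 2, 3), header_join="/")
print(labels)  # ['id/1/2', 'name/London/Berlin']
print(data)    # [['3', 'Paris'], ['4', 'Madrid'], ['5', 'Rome']]
```

With the default separator, the same call produces the labels 'id 1 2' and 'name London Berlin' from the Header Rows example.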

Header Case

By default, a header is validated in case-sensitive mode. To disable this behaviour, set the header_case parameter to False. This option is accepted by any Dialect, and a dialect can be passed to extract, validate, and other functions. Note that it does not change the resulting header; it only affects how the header is validated:

from frictionless import Resource, Schema, Dialect, fields

dialect = Dialect(header_case=False)
schema = Schema(fields=[fields.StringField(name="ID"), fields.StringField(name="NAME")])
with Resource('capital-3.csv', dialect=dialect, schema=schema) as resource:
    print(f'Header: {resource.header}')
    print(f'Valid: {resource.header.valid}')  # without "header_case" it will have 2 errors
Header: ['ID', 'NAME']
Valid: True
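The comparison that header_case switches between can be illustrated with a small plain-Python sketch. The function header_matches is hypothetical and only models the idea of comparing file labels with schema field names; the real validation in frictionless reports detailed errors rather than a single boolean.

```python
def header_matches(labels, field_names, header_case=True):
    """Compare labels from the file with field names from the schema."""
    if len(labels) != len(field_names):
        return False
    if not header_case:
        # Case-insensitive mode: normalize both sides before comparing.
        labels = [label.lower() for label in labels]
        field_names = [name.lower() for name in field_names]
    return labels == field_names


file_labels = ["id", "name"]
schema_names = ["ID", "NAME"]
print(header_matches(file_labels, schema_names))                     # False
print(header_matches(file_labels, schema_names, header_case=False))  # True
```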

Comment Char

Specifies the character that marks a row as a comment. Rows starting with this character are ignored:

from frictionless import Resource, Dialect

dialect = Dialect(comment_char="#")
with Resource(b'name\n#row1\nrow2', format="csv", dialect=dialect) as resource:
    print(resource.read_rows())
[{'name': 'row2'}]

Comment Rows

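A list of row numbers to ignore. Row numbers start at 1 and include the header row, so in this example row 2 is the first data row: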

from frictionless import Resource, Dialect

dialect = Dialect(comment_rows=[2])
with Resource(b'name\nrow1\nrow2', format="csv", dialect=dialect) as resource:
    print(resource.read_rows())
[{'name': 'row2'}]

Skip Blank Rows

Ignores rows that are completely blank:

from frictionless import Resource, Dialect

dialect = Dialect(skip_blank_rows=True)
with Resource(b'name\n\nrow2', format="csv", dialect=dialect) as resource:
    print(resource.read_rows())
[{'name': 'row2'}]
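The three row filters described above (comment_char, comment_rows, and skip_blank_rows) can be modelled together in a short plain-Python sketch. The helper filter_rows is hypothetical and works on raw rows including the header; it assumes 1-based row numbers and that a comment is detected on the first cell, which is consistent with the examples but not confirmed as the library's exact logic.

```python
import csv
import io


def filter_rows(text, comment_char=None, comment_rows=(), skip_blank_rows=False):
    """Return the rows that survive the three dialect-style filters."""
    kept = []
    for number, cells in enumerate(csv.reader(io.StringIO(text)), start=1):
        if number in comment_rows:
            continue  # row explicitly listed as a comment row
        if comment_char and cells and cells[0].startswith(comment_char):
            continue  # row starts with the comment character
        if skip_blank_rows and not any(cell.strip() for cell in cells):
            continue  # row has no non-empty cells
        kept.append(cells)
    return kept


print(filter_rows("name\n#row1\nrow2", comment_char="#"))   # [['name'], ['row2']]
print(filter_rows("name\nrow1\nrow2", comment_rows=[2]))    # [['name'], ['row2']]
print(filter_rows("name\n\nrow2", skip_blank_rows=True))    # [['name'], ['row2']]
```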

Reference

Dialect (class)

Control (class)

Dialect (class)

Dialect representation

Signature

(*, name: Optional[str] = None, title: Optional[str] = None, description: Optional[str] = None, header: bool = True, header_rows: List[int] = NOTHING, header_join: str = ' ', header_case: bool = True, comment_char: Optional[str] = None, comment_rows: List[int] = NOTHING, skip_blank_rows: bool = False, controls: List[Control] = NOTHING) -> None

Parameters

  • name (Optional[str])
  • title (Optional[str])
  • description (Optional[str])
  • header (bool)
  • header_rows (List[int])
  • header_join (str)
  • header_case (bool)
  • comment_char (Optional[str])
  • comment_rows (List[int])
  • skip_blank_rows (bool)
  • controls (List[Control])

dialect.name (property)

A short url-usable (and preferably human-readable) name. This MUST be lower-case and contain only alphanumeric characters along with “_” or “-” characters.

Signature

Optional[str]
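The naming rule above can be checked with a regular expression. This is a minimal sketch of the rule exactly as worded here, not the validation frictionless itself performs; is_valid_name and NAME_PATTERN are hypothetical names.

```python
import re

# Lower-case letters, digits, "_" and "-" only, at least one character.
NAME_PATTERN = re.compile(r"[a-z0-9_-]+")


def is_valid_name(name):
    """Return True if the name follows the rule described above."""
    return NAME_PATTERN.fullmatch(name) is not None


print(is_valid_name("capital-3"))  # True
print(is_valid_name("Capital 3"))  # False
```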

dialect.title (property)

A human-oriented title for the Dialect.

Signature

Optional[str]

dialect.description (property)

A brief description of the Dialect.

Signature

Optional[str]

dialect.header (property)

If true, the header row is read from the data; if false, the data is treated as having no header row, so the first row is read as data.

Signature

bool

dialect.header_rows (property)

Specifies the row numbers for the header. Default is [1].

Signature

List[int]

dialect.header_join (property)

Separator used to join the cell text of multiple header rows. The default value is " "; other values could be ":", "-", etc.

Signature

str

dialect.header_case (property)

If set to false, the header is matched case-insensitively. The default value is True.

Signature

bool

dialect.comment_char (property)

Specifies the character used to mark a row as a comment, for example "#". The default value is None.

Signature

Optional[str]

dialect.comment_rows (property)

A list of row numbers to ignore. For example: [1, 2].

Signature

List[int]

dialect.skip_blank_rows (property)

Ignores rows that are completely blank.

Signature

bool

dialect.controls (property)

A list of controls which defines different aspects of reading data.

Signature

List[Control]

dialect.add_control (method)

Add a new control to the dialect

Signature

(control: Control) -> None

Parameters

  • control (Control)

Dialect.describe (method) (static)

Describe the given source as a dialect

Signature

(source: Optional[Any] = None, **options)

Parameters

  • source (Optional[Any]): data source
  • options

dialect.get_control (method)

Get control by type

Signature

(type: str) -> Control

Parameters

  • type (str)

dialect.has_control (method)

Check if control is present

Signature

(type: str)

Parameters

  • type (str)

dialect.set_control (method)

Set control by type

Signature

(control: Control) -> Optional[Control]

Parameters

  • control (Control)
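The four control methods (add_control, get_control, has_control, set_control) read like a small registry keyed by control type. The sketch below models that idea with stand-in classes; MiniControl and MiniDialect are hypothetical and are not frictionless classes. Two behaviours are assumptions inferred from the signatures rather than stated in this reference: set_control replaces an existing control of the same type and returns the previous one, and get_control raises an error when no control of that type exists.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class MiniControl:
    type: str
    options: Dict[str, str] = field(default_factory=dict)


@dataclass
class MiniDialect:
    controls: List[MiniControl] = field(default_factory=list)

    def has_control(self, type: str) -> bool:
        return any(control.type == type for control in self.controls)

    def get_control(self, type: str) -> MiniControl:
        for control in self.controls:
            if control.type == type:
                return control
        raise KeyError(type)  # assumption: a missing control is an error

    def add_control(self, control: MiniControl) -> None:
        self.controls.append(control)

    def set_control(self, control: MiniControl) -> Optional[MiniControl]:
        # Assumption: replace a control of the same type and return the old one.
        if self.has_control(control.type):
            previous = self.get_control(control.type)
            self.controls[self.controls.index(previous)] = control
            return previous
        self.add_control(control)
        return None


dialect = MiniDialect()
dialect.add_control(MiniControl("csv", {"delimiter": ","}))
previous = dialect.set_control(MiniControl("csv", {"delimiter": ";"}))
print(previous.options["delimiter"])                    # ,
print(dialect.get_control("csv").options["delimiter"])  # ;
print(dialect.has_control("excel"))                     # False
```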

Control (class)

Control representation. This is the base class for all control classes, which are used to configure individual components.

Signature

(*, title: Optional[str] = None, description: Optional[str] = None) -> None

Parameters

  • title (Optional[str])
  • description (Optional[str])

control.type (property)

Type of the control, identifying what it configures, such as a format or a plugin. For example: "csv" or "zenodo".

Signature

ClassVar[str]

control.title (property)

A human-oriented title for the control.

Signature

Optional[str]

control.description (property)

A brief description of the control.

Signature

Optional[str]