Edit page in Livemark
(2024-11-07 15:17)

Index

Indexing resource in Frictionless terms means loading a data table into a database. Let's explore how this feature works in different modes.

Installation

pip install frictionless[sql]

Normal Mode

This mode is supported for any database that is supported by sqlalchemy. Under the hood, Frictionless will infer Table Schema and populate the data table as it normally reads data. It means that type errors will be replaced by null values and in-general it guarantees to finish successfully for any data even very invalid.

frictionless index table.csv --database sqlite:///index/project.db
frictionless extract sqlite:///index/project.db --table table --json
──────────────────────────────────── Index ─────────────────────────────────────

[table] Indexed 3 rows in 0.21 seconds
──────────────────────────────────── Result ────────────────────────────────────
Succesefully indexed 1 tables
{
  "project": [
    {
      "id": 1,
      "name": "english"
    },
    {
      "id": 2,
      "name": "中国人"
    }
  ]
}

Fast Mode

Fast mode is supported for SQLite and Postgresql databases. It will infer Table Schema using a data sample and index data using COPY in Potgresql and .import in SQLite. For big data files this mode will be 10-30x faster than normal indexing but the speed comes with the price -- if there is invalid data the indexing will fail.

frictionless index table.csv --database sqlite:///index/project.db --fast
frictionless extract sqlite:///index/project.db --table table --json
──────────────────────────────────── Index ─────────────────────────────────────

[table] Indexed 30 bytes in 0.368 seconds
──────────────────────────────────── Result ────────────────────────────────────
Succesefully indexed 1 tables
{
  "project": [
    {
      "id": 1,
      "name": "english"
    },
    {
      "id": 2,
      "name": "中国人"
    }
  ]
}

Solution 1: Fallback

To ensure that the data will be successfully indexed it's possible to use fallback option. If the fast indexing fails Frictionless will start over in normal mode and finish the process successfully.

frictionless index table.csv --database sqlite:///index/project.db --name table --fast --fallback

Solution 2: QSV

Another option is to provide a path to QSV binary. In this case, initial schema inferring will be done based on the whole data file and will guarantee that the table is valid type-wise:

frictionless index table.csv --database sqlite:///index/project.db --name table --fast --qsv qsv_path