validate
if needed.
[email protected]
as a feature preview and request for comments. The implementation is raw and doesn't cover many edge cases.
Indexing resource in Frictionless terms means loading a data table into a database. Let's explore how this feature works in different modes.
pip install frictionless[sql]
This mode is supported for any database that is supported by sqlalchemy
. Under the hood, Frictionless will infer Table Schema and populate the data table as it normally reads data. It means that type errors will be replaced by null
values and in-general it guarantees to finish successfully for any data even very invalid.
frictionless index table.csv --database sqlite:///index/project.db
frictionless extract sqlite:///index/project.db --table table --json
──────────────────────────────────── Index ─────────────────────────────────────
[table] Indexed 3 rows in 0.219 seconds
──────────────────────────────────── Result ────────────────────────────────────
Succesefully indexed 1 tables
{
"project": [
{
"id": 1,
"name": "english"
},
{
"id": 2,
"name": "中国人"
}
]
}
[email protected]+
command to be available.
Fast mode is supported for SQLite and Postgresql databases. It will infer Table Schema using a data sample and index data using COPY
in Potgresql and .import
in SQLite. For big data files this mode will be 10-30x faster than normal indexing but the speed comes with the price -- if there is invalid data the indexing will fail.
frictionless index table.csv --database sqlite:///index/project.db --fast
frictionless extract sqlite:///index/project.db --table table --json
──────────────────────────────────── Index ─────────────────────────────────────
[table] Indexed 30 bytes in 0.362 seconds
──────────────────────────────────── Result ────────────────────────────────────
Succesefully indexed 1 tables
{
"project": [
{
"id": 1,
"name": "english"
},
{
"id": 2,
"name": "中国人"
}
]
}
To ensure that the data will be successfully indexed it's possible to use fallback
option. If the fast indexing fails Frictionless will start over in normal mode and finish the process successfully.
frictionless index table.csv --database sqlite:///index/project.db --name table --fast --fallback
Another option is to provide a path to QSV binary. In this case, initial schema inferring will be done based on the whole data file and will guarantee that the table is valid type-wise:
frictionless index table.csv --database sqlite:///index/project.db --name table --fast --qsv qsv_path