Tobias Müller DuckDB's read_csv does not fix the column order, right? So if agency.txt listed its columns as agency_name, agency_id this would fail, and likewise when the source GTFS file contains an extra column that is not specified in the table.
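A minimal sketch of the order-independence problem, using only Python's stdlib csv module (the file content, column list, and helper name are hypothetical, not from the badger repo): reading records by header name instead of by position tolerates both a swapped column order and an extra column.

```python
import csv
import io

# Hypothetical agency.txt with columns in a non-canonical order
# (agency_name before agency_id) plus an extra column not in our table.
agency_txt = (
    "agency_name,agency_id,agency_url,agency_timezone,agency_custom\n"
    "Example Transit,EX,https://example.org,Europe/Amsterdam,foo\n"
)

# Columns the target table expects, in its fixed order.
TABLE_COLUMNS = ["agency_id", "agency_name", "agency_url", "agency_timezone"]

def rows_in_table_order(fileobj, table_columns):
    """Yield rows re-ordered to match the table schema, keyed by header name.

    Extra source columns are silently dropped; a missing column raises
    KeyError so the mismatch is caught early instead of shifting values.
    """
    reader = csv.DictReader(fileobj)
    for record in reader:
        yield [record[col] for col in table_columns]

rows = list(rows_in_table_order(io.StringIO(agency_txt), TABLE_COLUMNS))
print(rows[0])  # ['EX', 'Example Transit', 'https://example.org', 'Europe/Amsterdam']
```

A purely positional loader would have put "Example Transit" into agency_id here; matching on header names sidesteps that entirely.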
For us GTFS is one of the file formats we handle (import and export), but since NeTEx is the standard for data exchange between the national access points within Europe, it makes sense to be able to process NeTEx as well. I have done so in different forms within DuckDB: creating a relational database based on the XML Schema, and using DuckDB as an 'advanced' key-value store with extra attributes per key, in various incarnations. For now my conclusion is that DuckDB has a set of significant and known issues when going beyond main memory. github.com/duckdb/duckdb/issues
Stefan de Konink
Support each other, copy Dutch wares!
The problem with doing it this way is that the order of fields is, or may be, different between datasets. We have created some Python glue that takes care of that. You can find our GitHub repo at MMTIS/badger; the file is gtfs_import_to_db.py.
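A sketch of what such glue might look like, assuming a canonical per-file column order (the schema dict, function name, and sample data are illustrative, not taken from gtfs_import_to_db.py): rewrite each GTFS CSV so its columns follow a fixed order before handing it to a positional loader.

```python
import csv
import io

# Hypothetical canonical schema per GTFS file; the real badger code may differ.
CANONICAL = {
    "agency.txt": ["agency_id", "agency_name", "agency_url", "agency_timezone"],
}

def normalize(filename, src, dst):
    """Rewrite a GTFS CSV so its columns follow the canonical order.

    Missing optional columns are emitted as empty strings; columns not in
    the canonical schema are dropped.
    """
    columns = CANONICAL[filename]
    reader = csv.DictReader(src)
    writer = csv.writer(dst)
    writer.writerow(columns)
    for record in reader:
        writer.writerow([record.get(col, "") for col in columns])

# Source file has the columns in a different order than the canonical one.
src = io.StringIO(
    "agency_name,agency_id,agency_url,agency_timezone\n"
    "Example Transit,EX,https://example.org,Europe/Amsterdam\n"
)
dst = io.StringIO()
normalize("agency.txt", src, dst)
print(dst.getvalue().splitlines()[0])  # agency_id,agency_name,agency_url,agency_timezone
```

After this pass every dataset presents the same column order, so the downstream import no longer depends on how the producer happened to arrange the file.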
While we initially did all our processing in DuckDB, the issues we found and reported when processing huge non-CSV files made us look at alternatives. For CSV processing, DuckDB still makes sense.