Hacker News

Scanning through a CSV can be quite close in performance to querying a SQL database that has no indices. The primary benefits of using a SQL database for querying are (1) indices and (2) a declarative query language. Using DuckDB or SQLite's CSV/JSON support gets you the best of both worlds (minus indices): you get the declarative query language and query planner, but your data is still just CSV/JSON files.

For a dataset that size, I'd probably use SQLite to avoid having to manage a persistent MySQL process, especially when it's being used as an alternative to CSV files. That is, unless there's a MySQL/Postgres server already running that I can just create a new database on.
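A minimal sketch of the SQLite route, using only Python's standard library and hypothetical inline CSV data standing in for a file on disk; the database is in-memory, so there's no server process to manage:

```python
import csv
import io
import sqlite3

# Hypothetical CSV data standing in for a file on disk.
csv_text = """city,temp
Oslo,3
Lagos,31
Lima,18
"""

conn = sqlite3.connect(":memory:")  # no persistent server to manage
conn.execute("CREATE TABLE readings (city TEXT, temp REAL)")

# Load the CSV rows; DictReader yields dicts keyed by the header row,
# which sqlite3 binds to the :city / :temp named placeholders.
reader = csv.DictReader(io.StringIO(csv_text))
conn.executemany(
    "INSERT INTO readings (city, temp) VALUES (:city, :temp)",
    reader,
)

# A declarative query instead of a hand-rolled scan over the CSV.
rows = conn.execute(
    "SELECT city FROM readings WHERE temp > 10 ORDER BY city"
).fetchall()
print(rows)  # [('Lagos',), ('Lima',)]
```

The column affinity on `temp REAL` coerces the CSV's string values to numbers on insert, so the comparison in the WHERE clause behaves numerically.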



> Using DuckDB or SQLite's CSV/JSON support gets you the best of both worlds (minus indices)

DuckDB automatically creates zonemaps (min-max indexes) for columns of all general-purpose data types. However, they're not persisted.

https://duckdb.org/docs/sql/indexes.html
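Where an explicit index is wanted (SQLite, for instance, doesn't build one automatically on non-key columns), it's one statement away. A sketch with a hypothetical `readings` table, using `EXPLAIN QUERY PLAN` to confirm the planner switches from a full scan to the index:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (city TEXT, temp REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("Oslo", 3.0), ("Lagos", 31.0), ("Lima", 18.0)],
)

# Without an index, this query is a full table scan.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM readings WHERE city = 'Lima'"
).fetchall()
print(plan[0][-1])  # e.g. "SCAN readings" (wording varies by SQLite version)

# One statement to add an explicit index on the filtered column.
conn.execute("CREATE INDEX idx_readings_city ON readings (city)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM readings WHERE city = 'Lima'"
).fetchall()
print(plan[0][-1])  # now mentions "USING INDEX idx_readings_city"
```

The last column of each `EXPLAIN QUERY PLAN` row is a human-readable description of the chosen access path, which is an easy way to verify an index is actually being used.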



