Thanks for the detailed reply.
In the RDBMS world, SQL joins can be very slow on big data, and denormalization is sometimes preferred to avoid expensive joins. I was wondering how a columnar structure deals with big joins.
I am looking forward to reading about hashing-based indexes. It may answer my question regarding optimization of join queries :)
Agshin Guliyev
engineering
Thanks, Taras, for the article.
I have a few questions:
How does indexing work in columnar databases? What if I need to query the 'Hire_date' info for Bob and Jim in a huge table? In an RDBMS, I could use a column index (B-tree, etc.) to fetch the data very quickly.
You mentioned "array of structs vs. a struct of arrays" for column-based data. IIUC, the customer has to do the proper mapping between columns on the client side. If you use incorrect indexes to map columns for a specific row, it will introduce hidden bugs that are very difficult to debug. I wonder what the best practice is for fetching column-based data.
How are join queries performed in a columnar database? What would be the joining criteria?
Thanks!