Thanks for the detailed reply. In the RDBMS world, SQL joins can be very slow on large tables, and denormalization is sometimes preferred to avoid expensive joins. I was wondering how a columnar structure deals with big joins. I am looking forward to reading about hashing-based indexes. It may answer my question about optimizing join queries :)