Use case: I have a table storing file information (path, size, etc.). I need to read this table, grab each file path, read those CSV files in parallel, and ingest them into MySQL in parallel.
My design:
This is the design I came up with.
I'd appreciate your suggestions and inputs; please correct me if I'm doing it wrong!
Thanks in advance.
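A minimal sketch of the parallel read-and-ingest step, assuming the file paths have already been fetched from the metadata table. It uses a thread pool to parse the CSVs concurrently, and SQLite as a stand-in for MySQL so it runs anywhere (for the real target you would swap in `mysql-connector-python` and `%s` placeholders); the `records` table and its two-column schema are purely illustrative:

```python
import csv
import sqlite3  # stand-in for MySQL; use mysql.connector in production
from concurrent.futures import ThreadPoolExecutor

def read_csv(path):
    """Parse one CSV file into a list of row tuples."""
    with open(path, newline="") as f:
        return [tuple(row) for row in csv.reader(f)]

def ingest(db_path, csv_paths, workers=4):
    """Read CSVs in parallel, then bulk-insert each file's rows."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS records (name TEXT, value TEXT)")
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # workers parse files concurrently; inserts stay on this thread
        for rows in pool.map(read_csv, csv_paths):
            conn.executemany("INSERT INTO records VALUES (?, ?)", rows)
    conn.commit()
    count = conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]
    conn.close()
    return count
```

Keeping the database writes on one thread (only the parsing is parallel) avoids connection contention; with MySQL you could also give each worker its own connection and parallelize the inserts too.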
The idea sounds good in general, but I don't think it will process billions of rows any quicker. What I would suggest: when you upload the files, store them in MongoDB, and while storing, split each file into chunks. That will make processing much faster, since it's easier to read many small files than one large file!
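The chunk-splitting step above can be sketched in plain Python; the chunk size and helper names here are illustrative, and the resulting chunks could then be stored in MongoDB (GridFS does this kind of splitting automatically):

```python
def split_into_chunks(path, lines_per_chunk=100_000):
    """Split a large CSV into smaller chunk files that are cheap to process in parallel."""
    chunk_paths, buf, chunk_no = [], [], 0
    with open(path) as f:
        header = f.readline()  # repeat the header on every chunk
        for line in f:
            buf.append(line)
            if len(buf) >= lines_per_chunk:
                chunk_paths.append(_write_chunk(path, chunk_no, header, buf))
                buf, chunk_no = [], chunk_no + 1
        if buf:  # flush the final partial chunk
            chunk_paths.append(_write_chunk(path, chunk_no, header, buf))
    return chunk_paths

def _write_chunk(path, n, header, lines):
    """Write one chunk file next to the original, e.g. data.csv.chunk0."""
    out = f"{path}.chunk{n}"
    with open(out, "w") as f:
        f.write(header)
        f.writelines(lines)
    return out
```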
Rajkumar
Fullstack Developer.
Jay Gandhi
Knows JS, Python & Ruby. Learning Go.
Use a message queue. One script publishes to a topic and a second subscribes to it.
Put your CSV-reading logic in the first script and the MySQL-storing logic in the second. You can then scale either one up or down independently as needed.
I've used this approach to ingest 75 million rows into Elasticsearch.
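A sketch of that producer/consumer split, using Python's in-process `queue.Queue` as a stand-in for a real broker such as Kafka or RabbitMQ (the function names and the `None` sentinel are illustrative; in production each side would be a separate process talking to the broker):

```python
import csv
import queue
import threading

SENTINEL = None  # tells the consumer the producer is done

def producer(csv_paths, q):
    """Script 1: read CSV files and publish each row to the queue."""
    for path in csv_paths:
        with open(path, newline="") as f:
            for row in csv.reader(f):
                q.put(row)
    q.put(SENTINEL)

def consumer(q, sink):
    """Script 2: subscribe to the queue and store rows (e.g. INSERT into MySQL)."""
    while True:
        row = q.get()
        if row is SENTINEL:
            break
        sink.append(row)  # replace with a bulk write to the database

def run_pipeline(csv_paths):
    q = queue.Queue(maxsize=1000)  # bounded queue applies backpressure
    rows = []
    t = threading.Thread(target=consumer, args=(q, rows))
    t.start()
    producer(csv_paths, q)
    t.join()
    return rows
```

Because the two sides only share the queue, you can run several consumers against one topic (or several producers) to scale whichever stage is the bottleneck.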