Use case: I have a table storing file information (path, size, etc.). I need to read this table, grab each file path, read those CSV files in parallel, and ingest them into MySQL in parallel.
My design:
This is the design I came up with.
I'd appreciate your suggestions and inputs; please correct me if I'm doing it wrong!
Thanks in advance.
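A minimal sketch of the parallel read-and-ingest step, assuming the file paths have already been fetched from the metadata table. It uses a thread pool to parse the CSVs concurrently, and SQLite as a stand-in for MySQL so it runs anywhere (for the real target you would swap in `mysql-connector-python` and `%s` placeholders); the `records` table and its two-column schema are purely illustrative:

```python
import csv
import sqlite3  # stand-in for MySQL; use mysql.connector in production
from concurrent.futures import ThreadPoolExecutor

def read_csv(path):
    """Parse one CSV file into a list of row tuples."""
    with open(path, newline="") as f:
        return [tuple(row) for row in csv.reader(f)]

def ingest(db_path, csv_paths, workers=4):
    """Read CSVs in parallel, then bulk-insert each file's rows."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS records (name TEXT, value TEXT)")
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # workers parse files concurrently; inserts stay on this thread
        for rows in pool.map(read_csv, csv_paths):
            conn.executemany("INSERT INTO records VALUES (?, ?)", rows)
    conn.commit()
    count = conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]
    conn.close()
    return count
```

Keeping the database writes on one thread (only the parsing is parallel) avoids connection contention; with MySQL you could also give each worker its own connection and parallelize the inserts too.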
The idea sounds good in general, but I don't think it will process billions of rows any quicker. What I would suggest: when you upload the files, store them in MongoDB, and while storing, split each file into chunks. That will make processing much faster, since it's easier to read many small files than one large file!
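The chunk-splitting step above can be sketched in plain Python; the chunk size and helper names here are illustrative, and the resulting chunks could then be stored in MongoDB (GridFS does this kind of splitting automatically):

```python
def split_into_chunks(path, lines_per_chunk=100_000):
    """Split a large CSV into smaller chunk files that are cheap to process in parallel."""
    chunk_paths, buf, chunk_no = [], [], 0
    with open(path) as f:
        header = f.readline()  # repeat the header on every chunk
        for line in f:
            buf.append(line)
            if len(buf) >= lines_per_chunk:
                chunk_paths.append(_write_chunk(path, chunk_no, header, buf))
                buf, chunk_no = [], chunk_no + 1
        if buf:  # flush the final partial chunk
            chunk_paths.append(_write_chunk(path, chunk_no, header, buf))
    return chunk_paths

def _write_chunk(path, n, header, lines):
    """Write one chunk file next to the original, e.g. data.csv.chunk0."""
    out = f"{path}.chunk{n}"
    with open(out, "w") as f:
        f.write(header)
        f.writelines(lines)
    return out
```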
Rajkumar
Fullstack Developer.
Jay Gandhi
Knows JS, Python & Ruby. Learning Go.
Use a message queue. One script publishes to a topic and a second subscribes to it.
Put your CSV-reading logic in the first script and the MySQL-storing logic in the second. You can then scale either one up or down independently as needed.
I've used this approach to ingest 75 million rows into Elasticsearch.
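A sketch of that producer/consumer split, using Python's in-process `queue.Queue` as a stand-in for a real broker such as Kafka or RabbitMQ (the function names and the `None` sentinel are illustrative; in production each side would be a separate process talking to the broker):

```python
import csv
import queue
import threading

SENTINEL = None  # tells the consumer the producer is done

def producer(csv_paths, q):
    """Script 1: read CSV files and publish each row to the queue."""
    for path in csv_paths:
        with open(path, newline="") as f:
            for row in csv.reader(f):
                q.put(row)
    q.put(SENTINEL)

def consumer(q, sink):
    """Script 2: subscribe to the queue and store rows (e.g. INSERT into MySQL)."""
    while True:
        row = q.get()
        if row is SENTINEL:
            break
        sink.append(row)  # replace with a bulk write to the database

def run_pipeline(csv_paths):
    q = queue.Queue(maxsize=1000)  # bounded queue applies backpressure
    rows = []
    t = threading.Thread(target=consumer, args=(q, rows))
    t.start()
    producer(csv_paths, q)
    t.join()
    return rows
```

Because the two sides only share the queue, you can run several consumers against one topic (or several producers) to scale whichever stage is the bottleneck.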