The idea sounds good in general, but I don't think it will process data on the order of billions of records any quicker. What I would suggest instead is to store the documents in MongoDB at upload time. While storing, you can split each file into chunks and insert those chunks into MongoDB. That should make processing much faster, since reading many small chunks is easier than reading one large file.
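To illustrate the chunking idea, here is a minimal sketch in Python. It assumes a PyMongo-style collection object (anything with an `insert_many` method); the names `store_file_in_chunks`, `iter_chunks`, and the field names `file_id`, `n`, and `data` are made up for this example, and the 1 MiB chunk size is just a guess chosen to stay well under MongoDB's 16 MB document limit.

```python
from typing import BinaryIO, Iterator, Tuple

# Hypothetical chunk size: 1 MiB, well below MongoDB's 16 MB document limit.
CHUNK_SIZE = 1024 * 1024


def iter_chunks(stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Iterator[Tuple[int, bytes]]:
    """Yield (index, data) pairs by reading the stream chunk_size bytes at a time."""
    index = 0
    while True:
        data = stream.read(chunk_size)
        if not data:
            break
        yield index, data
        index += 1


def store_file_in_chunks(collection, file_id, stream: BinaryIO,
                         chunk_size: int = CHUNK_SIZE, batch_size: int = 100) -> int:
    """Split an uploaded file into chunks and insert them in batches.

    `collection` is assumed to behave like a pymongo Collection
    (only `insert_many` is used). Returns the number of chunks stored.
    """
    batch = []
    total = 0
    for index, data in iter_chunks(stream, chunk_size):
        # One document per chunk; `n` keeps the order so the file can be rebuilt.
        batch.append({"file_id": file_id, "n": index, "data": data})
        total += 1
        if len(batch) >= batch_size:
            collection.insert_many(batch)
            batch = []
    if batch:  # flush the last partial batch (insert_many rejects an empty list)
        collection.insert_many(batch)
    return total


# Usage sketch (assumes a local MongoDB and the pymongo package; names are placeholders):
#   from pymongo import MongoClient
#   chunks = MongoClient()["uploads"]["chunks"]
#   with open("big_file.dat", "rb") as f:
#       store_file_in_chunks(chunks, "big_file.dat", f)
```

Workers can then fetch chunks by `file_id` and `n` independently instead of opening the whole file. MongoDB also ships GridFS, which does this kind of file chunking for you, so that may be worth checking before rolling your own.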