I think we should start with some clarifying questions: How are the machines connected to our machine (the one with the file)? Do they have Internet access, or are they only on the same network? Do we have any sort of access to the 10k machines?

I do not remember the exact details of the talk we had back then, but it was a very insightful conversation. The interviewer was a senior engineer at LinkedIn who had worked in the industry for a long time, and he told me that we need to clarify these questions first. Peer-to-peer was one of the approaches I suggested too. I later found out that distributed file systems exist and can be used in such a scenario, though I do not understand exactly how that would work. You can read about them here - https://www.unf.edu/~sahuja/cis6302/filesystems.html.

Some of the approaches I think can be used:

- Peer-to-peer transfer, similar to how BitTorrent handles transfers.
- Provide access to the file by uploading it to something like an S3 bucket, then run a script that loops over the machines and issues remote commands to download it. This leaves the whole tracking and network-failure handling to the S3 tooling. Assumption: the machines are able to connect to AWS S3 infrastructure.
- Divide the file into chunks and transfer these chunks to the machines, appending a checksum to each chunk. The client machine verifies the checksum and sends an acknowledgment back to the server, which keeps track of failures and retries a chunk a few more times. If a machine cannot receive its chunks, it can be flagged and retried later, or a report can be generated for such machines. This would require establishing a protocol and writing some code.

What are your views?
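To make the S3 idea concrete, here is a minimal sketch of the "loop over the machines and issue commands" part. Everything specific here is an assumption for illustration: the bucket URL, the destination path, the `machine-NNNNN` host names, and the choice of ssh + curl as the remote command. The actual download retries are delegated to curl, matching the idea of leaving failure handling to the transfer tool.

```python
import shlex

# Assumed values -- in reality this would be a presigned S3 URL and a real inventory.
FILE_URL = "https://example-bucket.s3.amazonaws.com/big-file.bin"
DEST_PATH = "/tmp/big-file.bin"

def download_command(host: str, url: str = FILE_URL, dest: str = DEST_PATH) -> list:
    """Build the ssh invocation that tells one machine to pull the file itself.

    Each machine downloads directly from S3, so our machine never streams
    the file 10k times; it only dispatches commands.
    """
    remote = f"curl --fail --retry 3 -o {shlex.quote(dest)} {shlex.quote(url)}"
    return ["ssh", host, remote]

# Placeholder inventory of the 10k machines.
hosts = [f"machine-{i:05d}" for i in range(10_000)]
commands = [download_command(h) for h in hosts]
```

In practice you would run these commands through `subprocess` with a bounded worker pool (or a tool like pssh/Ansible) rather than sequentially, and record which hosts returned a non-zero exit code for a later retry pass.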
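The chunk-and-checksum approach can also be sketched briefly. This is not a real wire protocol, just the core logic under assumed names: the server splits the file into fixed-size chunks with a SHA-256 digest each, the client verifies each digest (a mismatch would trigger a NACK and a resend), and verified chunks are reassembled in index order.

```python
import hashlib

CHUNK_SIZE = 4  # tiny for demonstration; a real transfer would use megabytes

def make_chunks(data: bytes, size: int = CHUNK_SIZE):
    """Server side: split data into (index, chunk, sha256-hex) triples."""
    return [
        (i // size, data[i:i + size], hashlib.sha256(data[i:i + size]).hexdigest())
        for i in range(0, len(data), size)
    ]

def verify_chunk(chunk: bytes, checksum: str) -> bool:
    """Client side: on mismatch, the client would NACK so the server retries."""
    return hashlib.sha256(chunk).hexdigest() == checksum

def reassemble(chunks) -> bytes:
    """Join verified chunks back into the original file, ordered by index."""
    return b"".join(chunk for _, chunk, _ in sorted(chunks))

data = b"hello distributed world"
chunks = make_chunks(data)
assert all(verify_chunk(c, h) for _, c, h in chunks)
assert reassemble(chunks) == data
```

The per-chunk index is what lets the server track acknowledgments and resend only the missing pieces to a flaky machine, instead of restarting the whole file.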