Hi guys. I have a project (a startup) that could end up with petabytes of data in the long term if it is successful, so I am concerned about which technology I should build on from the start.
My main concern is cost (machines, OS, license prices) given the volume of data.
I would prefer the .NET stack with SQL Server, but the license prices seem rather high (or are they not?). I am open to LAMP or MEAN too.
In-house knowledge is not a problem at the moment, except that I have no experience with costs and prices at big-data scale. This is where I need your help. Thanks.
Thank you all for your help :) These answers helped me a lot.
let s = "small";
let m = "very much";
let cs = "client side";
let ss = "server side";
Startup = s.Team & m.todo;
.Net = cs != ss & m.$ & !trend;
LAMP = cs != ss & !m.$ & !trend;
MEAN = cs == ss & !m.$ & !trend;
MERN = cs == ss & !m.$ & trend; /* #node is using */
Choose your side....
Your question is a little hard to answer because it leaves so much to the imagination, so allow me to state it as I understand it:
"I'm starting a new project with lots of growth potential. I don't know exactly what we're building yet, so I need to make flexible choices. What do you recommend?"
This is an extremely common problem. The simple answer is "It depends on what you're building", but that's often an unknown until after you get started. This makes it really important to ask the question you're asking, but the actual answer is surprisingly simple.
Choose what you know.
Any tech stack you know today will have support for the tools you need for growth, when the time comes.
PHP scales, Ruby scales, Python scales, .NET scales. It's not about the stack you choose; it's about knowing how to architect for the scale you need.
Really big platforms run on multiple data stores, multiple stacks, multiple services and multiple projects. But you're not writing a big platform. And you don't have Big Data ™ yet. You have an idea, and you need to get it running. You need speed of development and nothing else. The only thing that gives you that is what you are already comfortable in.
Someone wrote a really smart thing about how to write apps that have the flexibility you're looking for: http://12factor.net. Open that link, bookmark it, then come back.
That link is awesome, but here are a couple of off-the-cuff pointers to get you off the ground:
STAY AWAY FROM WRITING TO DISK. Writing to disk ties your app to the local system's filesystem. You want to be able to run many instances of your app behind a load balancer.
Use a CRUD framework. Ideally one that makes it easy to model your data and start playing with it very quickly (Django (Python), Ruby on Rails (Ruby), Laravel (PHP), ASP.NET MVC (.NET), Play Framework (Java), Nodal (JavaScript), Meteor (JavaScript)).
The database you choose doesn't matter until it starts to become a problem. This will never happen before you have actual users generating actual traffic and ... hopefully, actual revenue.
It's not essential, but keep an eye out for ease of building a REST API. Your framework of choice should have your back on this too (Django REST Framework is a pretty good example). All the frameworks listed will help you in some way or another. You will need this when you realise your stakeholders want a mobile app.
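To make the "stay away from writing to disk" pointer concrete, here is a minimal Python sketch. Everything in it is hypothetical and only for illustration: the `BlobStore` interface, the `InMemoryStore` stand-in, the `save_avatar` helper and the `DATABASE_URL` variable name are assumptions, not part of any framework mentioned above. The idea is that app code talks to a storage interface and reads its settings from the environment, so any instance behind the load balancer behaves the same.

```python
import os
from typing import Dict, Mapping, Protocol


class BlobStore(Protocol):
    """Hypothetical interface: anything that can save and load bytes by key."""

    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...


class InMemoryStore:
    """Stand-in used here so the sketch runs on its own.

    It is per-process, so a real deployment would swap in a store that
    all app instances share over the network.
    """

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]


def save_avatar(store: BlobStore, user_id: int, image: bytes) -> str:
    """Store an upload through the interface instead of open(path, 'wb')."""
    key = f"avatars/{user_id}.png"
    store.put(key, image)
    return key


def load_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Read configuration from the environment (DATABASE_URL is an assumed name)."""
    return {"database_url": env.get("DATABASE_URL", "sqlite:///:memory:")}
```

Because the handler only knows about `BlobStore`, moving from the stand-in to a shared store later should not require touching the code that calls `save_avatar`.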
You may be thinking "That's not exactly sticking to what I know, you just gave me a laundry list of stuff to go learn!". You're right. I did. That's because you'll need to learn each of these to grow your platform, no matter what language / stack you are coming from.
Knowing these classes of tools, even just how they work and the problems they solve, is what will get you through all the stages of growth of your company.
I sincerely hope that is helpful, and I hope you have a great time with your new startup :)
Best of Luck!
PS: I chose to ignore your point about costs, because nobody can have any idea what those will be until you yourself know more about what you're building and storing. It's a non-issue until you can actually articulate the use case to yourself, and at that point you'll know how to do the math yourself.
I know a startup that is struggling to replace SQL Server with MySQL. They are making this switch because the license costs became prohibitively expensive once they started scaling. It was not a problem in the early stages when the traffic was not that high. I think that's because the license cost is connected to the number of cores in the machine you are running it on.
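If licensing is indeed tied to core count, as suggested above, the way the bill grows with scale can be sketched in a few lines of Python. The per-core price below is a made-up placeholder, not an actual SQL Server price, and the function name is hypothetical; check the vendor's current price list before relying on any figure.

```python
def license_cost(servers: int, cores_per_server: int, price_per_core: float) -> float:
    """Total license cost when every core on every server must be licensed."""
    return servers * cores_per_server * price_per_core


# Placeholder price per core, for illustration only.
PRICE_PER_CORE = 1_000.0

early_stage = license_cost(servers=1, cores_per_server=4, price_per_core=PRICE_PER_CORE)
after_scaling = license_cost(servers=10, cores_per_server=16, price_per_core=PRICE_PER_CORE)

print(early_stage)    # 4000.0
print(after_scaling)  # 160000.0
```

With these assumed numbers, going from one small machine to ten larger ones multiplies the license bill by 40, which matches the pattern described: negligible early on, painful once traffic forces more cores.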
Completely agree with @JanVladimirMostert on starting small and the need to re-architect sometime down the line.
Regarding ASP.NET: the framework is free, but Visual Studio is what will cost you some money. Depending on your position, you might have some options. If you're a startup or a new company you can apply for BizSpark, which gets you free tools and Azure for 2 or 3 years. Microsoft will essentially support you as you build out your product with their tools, but after that you'll have to start paying for them. If you don't mind being on the bleeding edge, there is an open source version of ASP.NET that is pretty fast and runs cross-platform (Linux, OS X, Windows). Bundle that with Visual Studio Code, their cross-platform editor (not an IDE), and you can start building your solution.
Regarding performance of the platform, you can check out the benchmarks repo on their GitHub and you'll see some fairly impressive perf stats in comparison to some other frameworks. At the time I'm writing this, though, the framework is still beta/RC quality, so it's a work in progress. However, for an example of some folks doing real work with the framework, check out Age of Ascent. They use the open source version of ASP.NET to power their multiplayer game. Check out their Dev Blog too for some more interesting stats.
I'm not a SQL Server expert, so I won't comment on that. However, you can use any database you want with .NET and not just SQL Server.
SQL Server is not the right tool for doing petabytes of data unless you plan to do some serious complex sharding. Terabytes it can still do, but even then you need some serious tech / skills and infrastructure behind it.
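For readers unfamiliar with the term, the simplest form of sharding routes each record to one of several database servers based on a hash of its key. The Python sketch below is a minimal illustration under that assumption; the `shard_for` name and the example key are hypothetical, and real setups usually add more machinery on top.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Map a record key to a shard index in the range [0, shard_count)."""
    # Use a stable hash so every app instance picks the same shard for a key.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


# The same key always lands on the same shard.
print(shard_for("customer-42", 8) == shard_for("customer-42", 8))  # True
```

The catch is that changing `shard_count` remaps most keys to different shards, so adding servers means moving data around. Rebalancing, cross-shard queries and backups are where the "serious complex" part comes in.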
With regards to programming stack ...
With PHP, you'll need to seriously consider what you use to run the scripts. Apache usually won't cut it; PHP-FPM performs a lot better, but overall PHP is not that performant at scale unless you go with HHVM. Compiled languages will run circles around PHP, so it might push up your hardware costs significantly under heavy load unless you know what you're doing.
Node.js should do fine, depending on what you plan to do and on your architecture; you'll eventually hit bottlenecks and probably won't utilize the hardware to its fullest, but by that time you can throw in more hardware.
I have limited experience with ASP.NET, so I can't comment on that.
With regards to your startup, start small and work with a few clients before opening the floodgates. Many startups I've worked with claimed they would handle petabytes of data in their first few years. The biggest of those hardly made it past the 200 GB mark, and by that time the business requirements had changed, which meant re-architecting a lot of things in any case.
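A quick back-of-envelope estimate helps test a petabyte claim before designing for it. The Python sketch below uses invented traffic numbers purely as an illustration; the function name and every input are assumptions to be replaced with your own figures.

```python
def yearly_bytes(users: int, events_per_user_per_day: int, bytes_per_event: int) -> int:
    """Rough raw data volume generated in one year, ignoring indexes and replicas."""
    return users * events_per_user_per_day * bytes_per_event * 365


PETABYTE = 10**15  # decimal petabyte

# Invented example: 10,000 users, 50 events each per day, 2 KB per event.
per_year = yearly_bytes(users=10_000, events_per_user_per_day=50, bytes_per_event=2_000)

print(per_year)             # 365000000000 bytes, i.e. 365 GB per year
print(PETABYTE / per_year)  # years needed to reach one petabyte at this rate
```

With these made-up inputs it would take thousands of years to reach a petabyte, so the traffic assumptions would have to be several orders of magnitude larger before petabyte-scale tooling matters.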
Jan, I don't disagree, but I don't see it as a big issue. As with the data size, using a PaaS defers some problems to a later date, buying time to worry about building the actual platform and not its infrastructure.
Botond Bertalan
Software Information Technologist