Well, 20 images at 4.2 megabytes is pretty heavy. Are those thumbnails? If so I'd try to scale them down smaller and/or play with the formats and encoding more. Assuming they're all thumbnails (the only way I'd allow that many images on a single page at once) you should really be at a megabyte or less. What formats are you using? What type of images are they? Are you sure the format type matches the image content? Can colour depth reduction or heavier encoding be leveraged?
THOUGH, if you have 3MB of images and 4.2MB in total, I would be as worried about the code as the images. If you've actually got 1.2 megabytes of JS, HTML, and CSS, the bottlenecks are more likely to be there than in the images.
How many separate files are there in total? That is a huge bottleneck, since every file past the first 8 averages 200ms and can reach a second or more apiece in "handshaking" overhead. That's why combining your scripts and styles down to single files can have huge payoffs in page-load time. More so if you're doing this via HTTPS, given its increased overhead.
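To put a rough number on that rule of thumb, here's a back-of-the-envelope sketch. It takes the "first 8 are free, 200ms apiece after that" figures at face value and treats the overhead as simply additive, which real browsers don't do since they overlap requests, so read the result as a pessimistic ballpark and not a measurement. The function name and its defaults are just illustrative.

```typescript
// Ballpark per-request overhead, using the rule of thumb from the post.
// ASSUMPTION: overhead is additive; real browsers overlap requests.
function estimateOverheadMs(
  fileCount: number,
  freeRequests: number = 8, // requests assumed to cost nothing extra
  perFileMs: number = 200   // assumed average overhead for each file after that
): number {
  const extraFiles = Math.max(0, fileCount - freeRequests);
  return extraFiles * perFileMs;
}

console.log(estimateOverheadMs(48)); // 40 extra files * 200ms = 8000
console.log(estimateOverheadMs(10)); // 2 extra files * 200ms = 400
console.log(estimateOverheadMs(5));  // under the threshold = 0
```

Even as a crude estimate it shows why cutting 48 files down to 10 matters more than shaving a few kilobytes off any one of them.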
Is there any way you could use the inaccurately named "CSS Sprite" technique to reduce the overall number of images? Could some of the images be reduced to monochrome vectors and stored in a webfont as a single smaller (and more scalable) file?
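For anyone who hasn't used sprites: you pack the icons into one image and slide it around with background-position, so twenty requests become one. Here's a minimal sketch that spits out the CSS, assuming square icons of equal size stacked vertically in the sheet. The helper name, the class names, and the file name are all made up for illustration.

```typescript
// Hypothetical helper: emit CSS for equal-sized square icons
// stacked top to bottom in a single sprite sheet.
function spriteCss(sheetUrl: string, iconNames: string[], iconSize: number): string {
  // Shared rule: every icon uses the same sheet and the same box size.
  const base =
    `.icon { display: inline-block; width: ${iconSize}px; ` +
    `height: ${iconSize}px; background: url(${sheetUrl}) no-repeat; }`;

  // One rule per icon: shift the sheet up by the icon's row offset.
  const rules = iconNames.map((name, i) => {
    const y = i === 0 ? "0" : `-${i * iconSize}px`;
    return `.icon-${name} { background-position: 0 ${y}; }`;
  });

  return [base, ...rules].join("\n");
}

console.log(spriteCss("icons.png", ["home", "search", "cart"], 16));
```

The output belongs in your external stylesheet as static CSS; there's no reason to generate it per request.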
Are you storing those images as static in one form or another so they can be served by your http server and not your back-end language? That's pretty much a must-have.
How many elements are on the page? How big is the markup? Excessive / unnecessary markup can not only delay the render, it can cause slowdowns in your scripting and make the server work harder. This is one of the many reasons I dislike (actually, more like rabidly hate) frameworks as they tend to just slop in endless pointless DIVs and classes for no good reason.
What's the scripting breakdown? How much of it is off-site stuff like social media or discussion services, and how much of it is on-site template stuff?
I mean, 1.2 megs of code -- unless you've got something like disqus being loaded -- is pretty heavy. Painfully so in fact.
That you say "SSR for first load" is kind of a warning sign too, as it implies later loads are CSR only with no fallbacks? Calls into question your semantics and possibly even your mechanism of action. There's a reason I consider methodologies like those used by React and Angular to be trash that is more likely to get a developer into trouble than it is a good way of building websites.
Is there any way you could lazy-load some of this on-demand? I'm not a fan of the method, but there are times where it's the most viable option. (though if so, it should be done as an enhancement with fallback)
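If it's the images you end up lazy-loading, the least invasive version I know of is the native loading="lazy" attribute on IMG. Browsers that don't understand it just ignore it and load the image normally, so the fallback comes for free. A minimal sketch of a template helper follows; the function name is hypothetical, and it assumes the values passed in are already safe to drop into attributes.

```typescript
// Hypothetical template helper: a plain IMG tag with native lazy loading.
// ASSUMPTION: src and alt are already attribute-safe (no escaping done here).
function lazyImg(src: string, alt: string, width: number, height: number): string {
  // Explicit width/height let the browser reserve space before the image
  // arrives, so the layout doesn't jump when it finally loads.
  return `<img src="${src}" alt="${alt}" width="${width}" height="${height}" loading="lazy">`;
}

console.log(lazyImg("thumbs/cat.jpg", "A cat", 160, 120));
```

No script is involved, so there's nothing to break when scripting is blocked or fails.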
I'd have to see a sample of the markup and probably the content too, but dimes to dollars you've probably got two to ten times the HTML needed, ten times the CSS needed, and 50 times the JavaScript needed. That's a wild (and partly unfounded) guess, but typical of what I've been seeing people do with all the hot and trendy framework nonsense that's so popular right now. There ARE reasons to get things as big as you're saying (such as social media and discussion plugins), but they're not usually something talked about at the stage of development you seem to be at.
But that really hinges on what the site is actually doing.
I'm not fully clear on what it is you're trying to do, but I suspect you're overthinking this.
As a rule you shouldn't be generating a whole lot of images 'on the fly' server side... nor should you worry that much about making a specific image for each and every perfect size.
I generally only make two sizes, one for legacy browsers / low resolution, and one for HiDPI/retina -- at least if we're talking static content images on the page.
Use IMG with SRC for the legacy one, SRCSET for the HiDPI one... and stop worrying about it.
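In markup terms that's about as simple as it gets. A minimal sketch, with a made-up helper name and file names, again assuming the values are already safe for attributes:

```typescript
// Hypothetical helper: one IMG with a 1x fallback in SRC and a 2x file in SRCSET.
// ASSUMPTION: inputs are already attribute-safe (no escaping done here).
function responsiveImg(src1x: string, src2x: string, alt: string): string {
  // Browsers without SRCSET support use SRC; the rest pick the 2x file
  // on high-density displays and fall back to SRC otherwise.
  return `<img src="${src1x}" srcset="${src2x} 2x" alt="${alt}">`;
}

console.log(responsiveImg("thumb.jpg", "thumb@2x.jpg", "Product photo"));
// <img src="thumb.jpg" srcset="thumb@2x.jpg 2x" alt="Product photo">
```

Two files per image, one tag, and the browser does the choosing.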
Though it almost sounds like you're in a thumbnail situation, in which case you SHOULD have separate thumbnails from the master. Again I'd really only make a 1x and 2x of each image type, in addition to the master-on-view full size original.
... and sometimes 2x is all you really need; but it depends on the usage scenario.
Whilst if you're talking template images, aka presentation, that really should be none of the server's business and should be controlled from the stylesheet.
Basic question you should always ask: Is this image content, or part of the layout?
If it's content, it goes in an IMG tag.
If it's layout/presentation, it goes in the external stylesheet.
It's rarely a good idea to cross those lines, particularly as the handful of cases where using IMG was "the only way" have gone the way of the dodo thanks to background-size:cover; and its ilk.
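For the presentational side, the rule usually looks something like what this sketch emits. The selector and file name are placeholders I've invented, and the output is meant to live in your external stylesheet as plain static CSS; the helper exists only to show the shape of the rule.

```typescript
// Hypothetical helper: emit a stylesheet rule for a presentational image
// that fills its box, cropping as needed.
function coverBackgroundCss(selector: string, imageUrl: string): string {
  return [
    `${selector} {`,
    `  background-image: url("${imageUrl}");`,
    `  background-position: center;`,
    `  background-repeat: no-repeat;`,
    `  background-size: cover;`,
    `}`,
  ].join("\n");
}

console.log(coverBackgroundCss(".hero", "images/hero.jpg"));
```

The markup stays a plain container with no IMG in it, which keeps the presentation out of the content.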
I'd really need to see the content and what you're trying to do with it to say much more -- most of the above is just me guessing wildly.