The use of EM as recommended by the WCAG (and by extension REM, which starts from the same value) has little to do with what you're using it for.
The point is that the USER, in their OS and/or their browser, can set their own default size. Whilst this is slowly being phased out of the defaults in favor of zoom-type scaling, many users with accessibility needs are pushing back and reverting to the 'older' method where only the default text size changes, leaving images alone given how some images (like the abuse of images for text content) go to hell when scaled. Though with the switch to 2x or higher resolution rasters and/or vector graphics, those image scaling problems are evaporating too.
But either way EM (and again, REM is just a crappy EM for people who can't do math) is supposed to start out as whatever the USER has chosen, or the browser default. In MOST cases this is 16px, but Windows users have had the 'large fonts' setting of 20px available pretty much since the IBM 8514 days; even before we had vector fonts, Windows 286/386/3.x had the option.
Which is why your page is uselessly tiny on both my workstation and media center, where the default is set to 20px and 24px respectively.
This is also why, when you set "font-size:10px" on BODY, you basically wholesale told users with accessibility needs to go plow themselves. Not intentional on your part, but that's what someone who expects a scalable page is going to think.
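A minimal sketch of the alternative, assuming the goal is simply to let the user's chosen size survive: leave the base size alone (or state it as a percentage) and size everything else relative to it. The selectors and values are illustrative, not taken from your page.

```css
/* Respect whatever default the user or browser has chosen
   (16px, 20px, 24px...). 100% means "the size I was handed". */
body {
	font-size: 100%;
}

/* Sized relative to the parent, so it scales with the user's setting */
h1 {
	font-size: 2em;
}

/* What NOT to do, since it throws the user's preference away:
body {
	font-size: 10px;
}
*/
```

With this, a user at 20px gets a 40px H1 and a user at 16px gets a 32px one, without either of them having to zoom.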
Generally px measurements should be restricted to raster images, and even that's changing as retina and HDX displays change how raster images are drawn. The higher PPI (pixels per inch) of new machines vs. old ones blurs the line of what a pixel even means for a developer. What's more, users are re-embracing the font size option as 4k and even 8k resolutions are crammed into smaller and smaller form factors. I've seen people setting it as high as 48px now.
MIXING PX and EM (or REM)? That's just asking for it to break for such users.
Also I generally don't consider REM to be 'ready for prime time', just because in current builds of Firefox it doesn't obey the system metric and/or browser setting like EM does. Hence I prefer EM, since as I said above REM really strikes me as a sloppy tool for people who can't handle simple math, or who just want to change font sizes willy-nilly because "reasons". Aka, without an actual grammatical / structural / legitimate reason.
Just to give you an idea what I mean, let's use some screenshots from one of my older websites (it's creeping up on a decade now).
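The "simple math" in question is just division, since EM compounds through nesting. A rough sketch, with made-up selector names and sizes purely for illustration:

```css
/* Assumed structure: an h2 inside .sidebar (hypothetical names) */
.sidebar {
	font-size: 0.875em; /* 14px at a 16px default, 17.5px at 20px */
}

/* em compounds: to land the h2 at 1.25x the page's base size
   inside the 0.875em sidebar, divide by the parent's factor:
   1.25 / 0.875 = 1.4286 (rounded) */
.sidebar h2 {
	font-size: 1.4286em;
}
```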
This is the site's previous design (not that different from the current one) at the default / VGA / 16px / 100% / 96dpi / Windows Small setting.
cutcodedown.com/images/ewiUSB/ewiUsbComNormal96.j…
Whilst this is the exact same site WITHOUT ZOOMING at the 8514 / 20px / 125% / 120dpi / Windows Large / Windows 7+ Medium / "Pick a honking name already" setting.
cutcodedown.com/images/ewiUSB/ewiUsbComNormal120.…
That is not zoomed, that's the effect of changing JUST the font size. Notice the image elements of the page are the same size.
Now as to mixing and matching, again if you set PX on body you just broke the entire reason the WCAG suggests using EM. As I often joke the WCAG says to use EM, so I say use 'em!
There is a section people misread about how px can be acceptable, but it requires that you have control over the user-agent, something you don't have on WEBSITES!
Likewise setting things like widths, padding, or media queries based on pixels becomes broken trash because your text is a different size. You should want the padding to auto-increase with the font. You should want the max-width of your semi-fluid layout to scale with the larger font. You basically NEED your media query breakpoints to match same!
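As a hedged sketch of what that looks like, assuming a single wrapper element (the .page class and the numbers are hypothetical):

```css
/* Hypothetical wrapper class; values are illustrative.
   Assumes .page inherits the user's default font size unchanged. */
.page {
	max-width: 60em; /* 960px at a 16px default, 1200px at 20px */
	margin: 0 auto;
	padding: 1em;    /* 16px at a 16px default, 20px at 20px */
}
```

The line length stays at roughly the same number of characters whatever the font size, which is the whole point of setting a max-width in the first place.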
There's a checklist for building accessible pages.
1) Semantic markup, so the HTML says what things are structurally and grammatically. If you are choosing your HTML based on what it looks like, you screwed up! Hence this is the same as "separation of presentation from content" -- remember HTML is for MORE than just the perfectly sighted user sitting in front of a screen. Which is why for all intents and purposes semantic markup is just a sick euphemism for "using HTML properly!"
Also why I say that 100% of the time you see a STYLE tag and 99% of the time you see a STYLE attribute, it's just developer ignorance and ineptitude.
2) Progressively enhanced so it gracefully degrades. Layer technologies like CSS and JS atop the already working and usable semantics. How you apply them should keep the target's capabilities in mind, which is why if you omit the MEDIA attribute on your stylesheet LINK tag, or if you send media="all", you're doing it all wrong -- since your "screen,projection,tv" media layout is likely not all that useful for print, speech, tty, etc.
3) Elastic design. Aka uses EM so that the layout -- ALL of IT -- scales to the user preference.
4) Semi-fluid design. Instead of screwing around with fixed widths, let flow do its job, setting only a max-width so that long lines aren't hard to follow.
5) Responsive. The new kid, by now I probably don't need to explain what that is to you.
... and really only once ALL of that is done would I even CONSIDER adding JavaScript to a page, and even then I make as much as possible work FIRST without the scripting and then use JS to enhance the experience. This is where CSR (client side rendering) is nothing more than flipping the bird at large swaths of users. The millions of downloads of browser extensions like Ghostery, ScriptSafe, etc. are all the proof you need of that.
Which is why whatever scripting you have going on there? LOSE IT!
But back to the subject of fonts, to do #3 you're using EM or nothing site-wide. (You can sneak in a few px borders or shadows, but care must be taken). If you do that elastic design step it means that the semi-fluid max-width has to scale with the fonts, and your media query max/min-widths ALSO need to be in EM.
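A minimal sketch of an EM breakpoint alongside one of those tolerated px borders; the .column class, the 48em breakpoint and the colors are assumptions for illustration only:

```css
/* Hypothetical two-column layout */
.column {
	float: left;
	width: 50%;
	box-sizing: border-box;
	padding: 1em;
	border: 1px solid #888; /* a hairline is one of the few safe uses of px */
}

/* em in a media query is measured against the browser's initial font
   size, so this triggers below 768px at a 16px default and below
   960px at 20px. The layout collapses when the TEXT stops fitting. */
@media (max-width: 48em) {
	.column {
		float: none;
		width: auto;
	}
}
```

A px breakpoint would leave the 20px user stuck with two cramped columns between 768px and 960px, which is exactly the breakage described above.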
Though this is all old-school for me; I was doing responsive design with JS enhancement of semi-fluid elastic layout before Yahoo stole the concept and called it "McSwitchy". LONG before media queries were a twinkle in the WhatWG's eye. I should have patented that s.... Of course, my implementation was typically 10 lines of JS that did a class swap on BODY, while "McSwitchy" was hundreds of lines of dicking around rewriting the CSS from the scripting. Well, there's a reason switching to media queries on my existing sites took five minutes.