The biggest difference is that the ACTUAL processing unit has no direct control over the cache, and most software written for the processor is completely unaware of its existence.
Cache is just a buffer, one you have no real control over. Assuming you mean CPU cache, that's typically just static RAM (SRAM) sitting between the slower dynamic RAM (DRAM) and the CPU, though these days there's also a memory controller involved.
Even that high speed SRAM operates at a fraction of the speed of registers, which are what ALL code ACTUALLY has to do its data processing on. Registers are what the majority of assembly language operates on/with, often many, MANY times faster than even the cache can operate, much less system memory. Register storage is expensive to create and has to be hard-wired into every operation, which is why most early processors limited you to six to eight registers, and even modern 64 bit implementations allow only 16 to 32 general-purpose registers. They are hard-wired into the PROCESSING part of the CPU.
In that way, even though CPU cache is on-die these days, it is NOT actually part of the PROCESSING UNIT! Registers are.
Some registers are even optimized for, or integral to, certain operations. In x86 machine language for example, AX (the accumulator) and its kin (EAX, AL) get shorter encodings for certain bitwise and integer math operations, which on the older chips meant faster. CX (the counter) is the implied count for the looping and repeat instructions (LOOP, the REP prefixes) and, as CL, for variable shifts.
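To make the accumulator point concrete, here is a small sketch showing the 16 bit machine code for adding the same immediate to AX and to BX. The byte values are the standard encodings (05 iw for the accumulator form, 81 /0 iw for the general form); the array and function names are just mine for illustration. On a chip that needs several clocks per fetched byte, one byte less per instruction adds up.

```c
#include <stddef.h>

/* 16-bit x86 machine code for "add <reg>, 1234h".
   The accumulator has its own one-byte opcode (05), so it needs no
   ModR/M byte; every other register goes through opcode 81 plus a
   ModR/M byte (C3 selects BX with the /0 "add" extension). */
static const unsigned char add_ax_imm16[] = { 0x05, 0x34, 0x12 };       /* add ax, 1234h */
static const unsigned char add_bx_imm16[] = { 0x81, 0xC3, 0x34, 0x12 }; /* add bx, 1234h */

/* Number of bytes that have to be fetched for each form. */
size_t encoded_size_ax(void) { return sizeof add_ax_imm16; }
size_t encoded_size_bx(void) { return sizeof add_bx_imm16; }
```

The AX form is 3 bytes and the BX form is 4. This only illustrates the size difference; it says nothing about timing on any particular processor.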
It's a bit confusing these days, but MOST of what's sucking up the die space on a modern CPU isn't actually the CPU, it's stuff that USED to go on the motherboard separate from the processor: external SRAM cache, peripheral control, and so on. A modern Intel processor has more in common with a "system on a chip" than it does with its predecessors from the original Pentium and earlier days.
THOUGH -- even the original 8086/8088 processors DID have a small on-board buffer of this kind, handled by the BIU -- bus interface unit. On the 8088 (original PC) the BIU had a 4 byte instruction prefetch queue (6 bytes on the 8086), which is about the closest thing that era had to what today we'd call a Level 1 instruction cache. The BIU operated in parallel to the EU -- execution unit; what traditionally is the ACTUAL processor.
Because RAM speeds were so low, and due to the nature of bus operations, it usually took four clock cycles to read or write ONE BYTE from memory (though some systems like the PCjr had extra wait states involved, making them even slower). Thing was, many CPU operations took more than one clock cycle, so the BIU would fetch up to a whole whopping 4 bytes ahead of time from the bus while the EU was running the code. Operations like jump, call, or ret would invalidate whatever had been prefetched, meaning a flush and start over, but on the whole it let the processor get through instructions at roughly three quarters of the rated clock speed, instead of one-quarter.
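If it helps to see the overlap, here is a rough toy model of the idea, NOT a cycle-accurate 8088. It assumes 4 clocks per fetched byte and a 4 byte queue as described above, and it ignores things the real chip had to deal with, such as the EU's own memory reads and writes competing for the bus. The Insn struct, the function names and the sample programs are all made up for the illustration.

```c
#include <stddef.h>

enum { BUS_CLOCKS_PER_BYTE = 4, QUEUE_BYTES = 4 };

typedef struct {
    int length;       /* instruction size in bytes */
    int exec_clocks;  /* clocks the EU spends executing it */
    int is_jump;      /* nonzero: prefetched bytes are thrown away afterwards */
} Insn;

/* Hypothetical sample programs: four 2-byte, 8-clock instructions. */
const Insn straight_line[4] = { {2, 8, 0}, {2, 8, 0}, {2, 8, 0}, {2, 8, 0} };
const Insn with_one_jump[4] = { {2, 8, 0}, {2, 8, 1}, {2, 8, 0}, {2, 8, 0} };

/* No queue: fetch every byte of an instruction, then execute it. */
long clocks_without_prefetch(const Insn *code, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += (long)code[i].length * BUS_CLOCKS_PER_BYTE + code[i].exec_clocks;
    return total;
}

/* With a queue: the BIU keeps fetching while the EU is executing. */
long clocks_with_prefetch(const Insn *code, size_t n)
{
    long clock = 0;
    int queued = 0;        /* bytes waiting in the prefetch queue */
    int bus_progress = 0;  /* clocks spent so far on the byte being fetched */

    for (size_t i = 0; i < n; i++) {
        int need = code[i].length;
        int exec_left = code[i].exec_clocks;

        /* Take whatever was prefetched during the previous instruction. */
        while (need > 0 && queued > 0) { need--; queued--; }

        while (need > 0 || exec_left > 0) {
            clock++;
            /* EU: executes only once it holds the whole instruction. */
            if (need == 0)
                exec_left--;
            /* BIU: fetches in parallel as long as the queue has room. */
            if (queued < QUEUE_BYTES && ++bus_progress == BUS_CLOCKS_PER_BYTE) {
                queued++;
                bus_progress = 0;
            }
            /* EU pulls freshly arrived bytes if it is still waiting on them. */
            while (need > 0 && queued > 0) { need--; queued--; }
        }

        /* A jump makes everything fetched ahead of it useless. */
        if (code[i].is_jump) { queued = 0; bus_progress = 0; }
    }
    return clock;
}
```

In this model the straight_line sample costs 64 clocks without prefetch and 40 with it, and with_one_jump costs 48 with prefetch because of the flush. The exact numbers mean little; the point is only that fetch time hides behind execute time until a jump throws the queue away.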
But even there, what the BIU was doing was invisible to the EU, and nothing outside the EU had direct access to the dozen or so internal registers (AX, BX, CX, DX, BP, IP, SP, DI, SI, CS, DS, ES, SS, and Flags). As part of the EU, those registers could be operated on at many MANY times the speed that the BIU, any board caching, or system memory could manage. That's why in really tight loops the mantra was "use as MANY registers as possible", even if it meant more code setting up or preserving values outside the loop.
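The same mantra still shows up in higher level code. As a rough sketch in C (function names are mine, and what actually lands in a register is up to the compiler), compare a running total that is forced out to memory on every pass with one the compiler is free to keep in a register for the whole loop:

```c
#include <stddef.h>

/* Running total lives in memory: 'volatile' forces a load and a store
   of 'total' on every pass through the loop. */
long sum_through_memory(const int *data, size_t n)
{
    volatile long total = 0;
    for (size_t i = 0; i < n; i++)
        total += data[i];
    return total;
}

/* Running total is an ordinary local, so the compiler can keep it in a
   register until the loop is done. */
long sum_in_register(const int *data, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += data[i];
    return total;
}
```

Both return the same answer; the difference is only in how often the loop has to go back out to memory, which you can see by comparing the generated assembly. With optimization turned off a compiler may well keep both totals in memory, so treat this as an illustration of the idea rather than a benchmark.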