Hey, first post, so please be gentle! ;-)
Just a general comp sci question. Say for example I have built a computer (just the physical wires etc) - what enables this electrical maze to then operate machine code? How does the computer know that, for example, 01011111 (or whatever) means fetch data from 01000111 (or whatever)?
Am I right in thinking that assembly gets converted to machine code, and then the CPU follows this code?
Cheers everyone!
Well, have you read Claude Shannon's seminal works on information theory and "A Mathematical Theory of Communication"?
Now this is not a coffee table book, but it was Claude Shannon who first propounded the idea that information and logic could be handled by switching circuits.
Here is a video that is a good introduction to the subject.
Understanding the dance of transistors that ultimately results in the practical applications of information theory is crucial for the next generation of aspiring geeks working at the bleeding edge of technological transformation (read: quantum).
Some great responses already. I would also highly recommend 'Code' by Charles Petzold and after that the Nand2Tetris companion book where you actually build a computer from gates up (on the computer using a Hardware Description Language). Building a computer yourself will probably be the best way to really know what is going on, especially after reading 'Code' and the other answers/sources here.
@RichardGL If you really want to explore this more, go purchase a Gigatron TTL computer kit. Transistors and capacitors make up the gates and registers, and there's no primary microprocessor, so you really understand how the circuit works. I helped build one of these with my nephew for a school project and it was a blast. Check it out... https://gigatron.io/ Also a video from a guy who built one. https://www.youtube.com/watch?v=_2uXqTi42LI @cutcodedown what a great description of how a processor works at the simplest level. Well done!
Rich Glascodine
Self Taught Hacker who loves the details
Jason Knight
The less code you use, the less there is to break
Sad part is 30 years ago what you're asking about would have been the first two chapters of every major book on the subject of computers. Nowadays most of the people programming them don't even understand it. Lemme try to give you a short... uhm, well... shorter version of it starting from the bottom up.
Deep down internally, all modern computers are just endless sets of miniaturized transistors and capacitors. A capacitor's role is easy: it can either hold a charge, storing a 1, or be discharged, storing a 0. Transistors, on the other hand, are nothing more than switches... there's a common pin and two other pins, control and signal. Run power across common and control and it either opens the path between signal and common, or does the reverse, closing the connection. (This is a gross oversimplification that would piss off anyone who knows electronics, but I'm trying to keep this simple.)
If you put a bunch of transistors together in the correct pattern you can create "or", "and", "xor" and "not" gates: you put two signals in and their binary relationship comes out the other side. You can also route signal pathways so that the switch sends them to a set of parallel lines that are "offset" by one, resulting in the binary operations we know as shift and rotate. Put them in an even more complex order and you can create a simple addition function; subtraction then comes for free, because you just "not" the value you want to subtract, add one, and feed it to the adder. (Sounds weird, but two's complement math is... well... weird if you're not used to dealing with it.)
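If you want to see that gates-all-the-way-down idea in action, here's a toy sketch in Python. The gate functions stand in for little transistor circuits, and the names (`full_adder`, `add8`, `sub8`) are just made up for the example; this is a sketch of the idea, not how any real chip is described.

```python
# Toy sketch: build an adder out of nothing but gate-level operations,
# then get subtraction "for free" via two's complement.
# Each tiny function here stands in for a handful of transistors.

def AND(a, b): return a & b
def OR(a, b):  return a | b
def XOR(a, b): return a ^ b

def full_adder(a, b, carry_in):
    """One bit of addition: sum bit and carry-out from three input bits."""
    s = XOR(XOR(a, b), carry_in)
    carry_out = OR(AND(a, b), AND(carry_in, XOR(a, b)))
    return s, carry_out

def add8(x, y, carry_in=0):
    """Ripple-carry addition of two 8-bit values, one full adder per bit."""
    result = 0
    carry = carry_in
    for i in range(8):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result  # wraps at 8 bits, just like the real hardware

def sub8(x, y):
    """Subtraction = add the two's complement: NOT every bit of y, add 1."""
    return add8(x, y ^ 0xFF, carry_in=1)

print(add8(40, 2))  # 42
print(sub8(42, 2))  # 40
print(sub8(0, 1))   # 255, which is -1 in 8-bit two's complement
```

Note that `sub8` contains no new circuitry at all; it reuses the adder, which is exactly why two's complement won out in real hardware.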
... and that's ALL the processor in your computer really amounts to, a slew of transistors performing all those operations based on commands sent to them in binary, performed on data in binary. That binary code regardless of bit-width is what we call machine language. It is the native language of the hardware.
Depending on the architecture, commands could be 4 bit, 8 bit, 16 bit, 32 bit, 64 bit, or even larger in length. On many CISC (complex instruction set computer; the x86 and AMD64/EM64T/x64 are CISC) systems the bit-width of the commands varies in length so as to reduce memory bus usage, letting you squeeze more code into memory, whilst also providing internal commands that do a lot of the heavy lifting for you.
On the other side of the coin you have RISC (reduced instruction set computer; ARM and MIPS being examples of this design philosophy) systems, where the commands are typically all a uniform size, as is the data. This often allows for more 'organized' silicon that uses less power and simplifies creating assemblers, but it does so at a cost in performance and memory footprint, and it makes the hardware harder to code for directly; you end up brute-forcing, with multiple 4- or 8-byte commands, things that are single-byte commands on x86.
So where does assembly play into this? As I said, native machine language is binary, and humans... get confused looking at binary really quickly. Whilst most programmers state their binary code in hexadecimal, so they can use one character to represent every 4 bits, even a blind stream of hex is really illegible.
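A quick Python aside shows why hex earns its keep: each hex digit packs exactly 4 bits, so one byte is always two characters (the byte 0xB8 here is just an example value).

```python
byte = 0xB8

# The same byte, written out both ways:
print(f"{byte:08b}")  # 10111000  -- eight binary digits
print(f"{byte:02X}")  # B8        -- two hex digits, 4 bits each
```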
All assembly language amounts to is 'mnemonics' -- short text that humans can understand that directly corresponds to one of those binary commands. An "assembler" is just a program that turns those mnemonics into binary.
mov ax, 0x3509
Is an example of 8086 assembly language. MOV literally means move; in Intel syntax the target/destination comes first, followed by the value/source/what. In this case the target is the "ax" register.
Registers are a bit like variables, apart from the fact that there's a fixed number of them, they're built into the CPU, and ALL major operations inside the CPU have to be done using registers. On the x86 hardware (16 bit and 32 bit operation) there are 4 "general purpose" registers AX, BX, CX, and DX, 2 pointer registers SP and BP, 2 index registers SI and DI, and 4 segment registers CS, DS, ES, and SS... bottom line, there's a limited number, so you often end up moving values into the registers from RAM to operate on them, and then writing them back to RAM.
In this case the above code just 'moves' the 16 bit hexadecimal value into the AX register. An assembler would look at that code and turn it into this series of bytes (stated in hex, since binary sucks):
B8 09 35
B8 is a single byte command on x86 hardware to move an 'immediate' value (a number inline in the code) into AX. The value is backwards because Intel is 'little endian'. If the target processor were Motorola legacy, the command would be a different byte (depending on family) and the value would be the other way around, because they're 'big endian'. Endianness is a long and old subject for debate, but for now just know that different processors from different manufacturers don't always agree on whether the least significant byte should come first or last.
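To make that mnemonic-to-bytes step concrete, here's a toy "assembler" sketch in Python. The two opcode bytes in the table are real 8086 encodings (MOV AX/BX with a 16-bit immediate), but everything else about this is a deliberately stripped-down assumption; a real assembler knows hundreds of encoding forms.

```python
import struct

# Toy lookup from (mnemonic, register) to the x86 opcode byte.
# Only two real 8086 encodings are listed here.
OPCODES = {
    ("mov", "ax"): 0xB8,  # MOV AX, imm16
    ("mov", "bx"): 0xBB,  # MOV BX, imm16
}

def assemble(line):
    """Turn e.g. 'mov ax, 0x3509' into its machine-code bytes."""
    mnemonic, rest = line.split(None, 1)
    reg, value = [part.strip() for part in rest.split(",")]
    opcode = OPCODES[(mnemonic.lower(), reg.lower())]
    # "<H" = little-endian unsigned 16 bit: low byte first, Intel style.
    return bytes([opcode]) + struct.pack("<H", int(value, 0))

print(assemble("mov ax, 0x3509").hex(" "))  # b8 09 35
```

Swapping `"<H"` for `">H"` in the `struct.pack` call would pack the immediate big-endian, giving b8 35 09; that one format character is the whole little/big endian difference from Python's point of view.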
Which is ALSO why processors from different manufacturers and 'families' have different instruction sets. Code written for modern x64 (AMD/Intel) isn't going to run on the PowerPC Cell architecture. Neither of those is compatible with ARM, or MIPS. They have different numbers of registers, different purposes for those registers, and wildly varying sets of instruction codes. Even within the same company there can be different instruction sets and architectures, as code written for x64 (the 64 bit standard invented by AMD as an extension to Intel's 32 bit) won't run on Intel IA-64 (the Itanium processors).
Which was fun back when Win XP went 64 bit (it was just a tarted-up copy of Server 2k3) and people mistakenly bought the IA-64 version instead of the x64 version for their Intel Core and AMD64 processors.
A processor 'family' is only compatible with itself, and only by design... and that is why high level languages like C and Pascal came into being. They are an intermediate abstraction that can be turned into machine language by the compiler. This often results in code nowhere near as optimized for speed or code size as writing assembler directly, but it gives you 'portable' code: instead of having to rewrite every major software package from scratch every time you want to run it on a different processor, you only need to write a new compiler, and then ALL the software can be rebuilt for the new target.
Hope that clears it up, or at least gets you started on the path.