The most important thing you can do is write good, clean, simple, easy-to-understand code that solves the problem cleanly.
Relatedly, you need to understand the problem you are solving and write code that reflects that understanding.
Squeezing the last 2% of performance out of the code happens very late in the game, and often never happens at all.
The primary reason anyone has to put effort into optimizing code is that the code was bad to begin with.
Further, any real need to optimize well-written code is becoming increasingly nonexistent. Forty years ago, it might have been necessary to squeeze out another 5% just to make something work at all on a 2 MHz Z80.
If your code is performance-constrained on a quad-core 3 GHz 64-bit processor, either you are managing plasma containment in a fusion reactor or you do not know how to code.