VERY complex question, though I often say it's a mix of experience and common sense. It does vary from programming language to programming language but on the whole once you learn basic syntax and methodologies it SHOULD be fairly apparent.
Sadly "should" is the correct word as some people just never see it.
It's often not as simple as saying "that's more code than it needs to be", as "more code" can often be more efficient. Assembly language programmers deal with this all the time. Take something as "simple" as calculating the memory offset of a scanline (y) in a 320x200 8 bit (1 byte per pixel) colour mode. The easy answer, and the smallest code, is to just multiply: offset = y * 320.
MOV AX, [ BP + 6 ] ; Y
MOV BX, 320
MUL BX ; beware DX is corrupted
That's 8 bytes of code that on an 8088 takes around (depending on BIU contents and value of Y at start) 140 clock cycles to execute... most would say it isn't bloated, some would say it is. Why? Because at the Machine language level 140 clock cycles is an eternity for three operations.
But because the input Y is going to be 0..199, we can leverage shifts and an exchange (assuming all the upper bits of Y are 0) to do it this way instead of multiply.
MOV AX, [ BP + 6 ] ; Y
XCHG AH, AL ; trading low for high byte is same as * 256
MOV BX, AX ; save * 256 in result
SHR AX, 1 ; same as divided by 2, 8088 can't shift by an immediate count of more than 1.
SHR AX, 1 ; again divide by two, AX is now Y * 64
ADD AX, BX ; Y * 256 + Y * 64 == Y * 320
You might think "well that's bloated" -- and sure, it's 13 bytes of code, nearly double... but get this, the above will execute in roughly 30 clock cycles. It's more than four times faster!
ANOTHER technique would be to simply use a lookup table and assign AX off of that.
MOV BX, [ BP + 6 ] ; Y
SHL BX, 1 ; table entries are words, so index by Y * 2
MOV AX, [ lookupTable + BX ]
Looks small, right? But it requires a 400 byte lookup table (200 entries of two bytes each), so it's actually going to cost around 410 bytes in total (I'm ballparking here)... yet the code that actually runs when we need to figure out that offset is going to take around 24 clocks.
Of the above three techniques, which one is "bloated"? All of them? None of them? It's just not that simple, it depends on if you need to save memory, or need it to run as fast as possible, or need a balance twixt the two.
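For anyone who doesn't read 8088 assembly, here is a rough JavaScript sketch of the same three approaches. The function names are made up for illustration, and JavaScript obviously won't show the clock-cycle differences; it only shows that all three produce the same offsets and what the table costs in memory. Note the sketch shifts left twice where the assembly swaps bytes and shifts right, but the arithmetic (y * 256 + y * 64) is the same.

```javascript
// Three ways to compute the byte offset of scanline y in a 320-pixel-wide,
// one-byte-per-pixel mode. All names here are illustrative assumptions.

const WIDTH = 320;
const HEIGHT = 200;

// 1. Plain multiply: the smallest code.
function offsetMultiply(y) {
  return y * WIDTH;
}

// 2. Shift and add: y * 256 + y * 64 == y * 320.
function offsetShiftAdd(y) {
  return (y << 8) + (y << 6);
}

// 3. Lookup table: 200 16-bit entries = 400 bytes, built once up front.
const lookupTable = new Uint16Array(HEIGHT);
for (let y = 0; y < HEIGHT; y++) {
  lookupTable[y] = y * WIDTH;
}

function offsetLookup(y) {
  return lookupTable[y];
}

console.log(offsetMultiply(199), offsetShiftAdd(199), offsetLookup(199));
// 63680 63680 63680
console.log(lookupTable.byteLength); // 400
```

Same answer three ways; the only things that differ are how much code and data you carry around and how long it takes to run on the target hardware.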
Sometimes code 'bloat' is just a failure on the part of the developer to THINK about their logic flow. You'll often see greenhorns and greybeards alike vomit up code like this:
if (start && start < 0) {
  return length - start;
} else if (start && start >= 0) {
  return start;
} else {
  return 0; // just in case start is null or void
}
That is bloated, but do you know WHY?
First off, it checks the same condition -- loose true -- on "start" TWICE. Nesting the sign test inside ONE truthiness check would run faster, since you never test "start" more than once. Next, >= 0 is simply the opposite of < 0, so that second comparison is pointlessly redundant -- you don't even need to run it. Finally, both of the first two branches RETURN, so there is NO reason to 'else'.
A ternary operator could simplify the code... I would write the same thing thus:
return start ? (start < 0 ? length - start : start) : 0;
Less code, less logic, simpler, faster, easier. STILL you'll see some people actually advocate the former, claiming it's "easier to follow" (I don't find it such, YMMV), while others never even think "can I do this in less code?"
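If you want to convince yourself the short form really is a drop-in replacement, a quick check like the following does the job. This is only a sketch: both snippets are wrapped in functions with made-up names (resolveStartVerbose, resolveStartTernary), and the sample inputs and the length of 10 are arbitrary.

```javascript
// The if/else chain, wrapped in a function so it can be compared.
function resolveStartVerbose(start, length) {
  if (start && start < 0) {
    return length - start;
  } else if (start && start >= 0) {
    return start;
  } else {
    return 0; // start is 0, null, or undefined
  }
}

// The nested-ternary version of the same logic.
function resolveStartTernary(start, length) {
  return start ? (start < 0 ? length - start : start) : 0;
}

// Negative, zero, positive, and "not there" inputs.
const samples = [-5, -1, 0, 1, 7, null, undefined];

const mismatches = samples.filter(
  (s) => resolveStartVerbose(s, 10) !== resolveStartTernary(s, 10)
);

console.log(mismatches.length); // 0
```

Same results for every input, with one truthiness test and one comparison instead of two of each.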
Something you'll see ALL the time for code bloat is "variables for nothing" -- people making copies of existing variables because they can... or using variables where you don't need a variable. Take something as 'simple' as a user login check. You'll often see (even in examples by alleged experts) nonsense like this:
$username = $_POST['username'];
$password = $_POST['password'];
$query = "
SELECT id
FROM users
WHERE username = '" . $username . "'
AND password = '" . md5($password) . "'";
$stmt = $db->query($query);
That is a LAUNDRY list of how not to write PHP from both a security and a bloat standpoint. There is NO reason to copy those POST values to variables if you're not cleaning them, there is NO reason to be creating that query string as a variable, and of course this is 2018 not 2003, STOP slopping variables into query strings! Do I even have to mention the stupidity of still using MD5?
Modernized and removing the bloat:
$stmt = $db->prepare('
SELECT id
FROM users
WHERE username = ?
AND password = ?
');
$stmt->execute([
$_POST['username'],
hash('sha256', $_POST['password'])
]);
ZERO extra variables needed other than the anonymous array being passed on the stack. If you byte count the CODE it will appear to be about the same, but the latter uses less memory and will run faster because it's not doing "things for nothing" like creating variables you don't need and wasting time copying values into them to just copy them AGAIN somewhere else.
NOTE, this is not a time to argue the merits and flaws of hash() vs. password_verify()
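To see why slopping variables into the query string is a security problem and not just a style one, here is a toy JavaScript sketch. There is no database involved and both helper functions are hypothetical; it only prints the text that would be handed to the SQL engine in each case.

```javascript
// Toy illustration only: no database, just the strings that would be sent.
// buildConcatenated and buildParameterized are hypothetical helpers.

// Value pasted straight into the SQL text.
function buildConcatenated(username) {
  return "SELECT id FROM users WHERE username = '" + username + "'";
}

// SQL text stays fixed; the value travels separately as a parameter.
function buildParameterized(username) {
  return {
    sql: "SELECT id FROM users WHERE username = ?",
    params: [username],
  };
}

const hostile = "x' OR '1'='1";

console.log(buildConcatenated(hostile));
// SELECT id FROM users WHERE username = 'x' OR '1'='1'

console.log(buildParameterized(hostile).sql);
// SELECT id FROM users WHERE username = ?
```

In the first case the input has rewritten the query itself. In the second the query text never changes no matter what the user types, which is the whole point of prepare/execute.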
HTML and CSS are a GREAT place to easily learn to recognize bloat -- stuff in the HTML that could be in an existing external file -- like the stupid STYLE tag and style="" attribute slopped all over the place. EVEN when it's the same amount of code regardless of where you place it, it can be considered 'bloat' because code sitting in the document itself isn't cached when the page is revisited after a content update. Put it in the external stylesheet and it is.
Presentational classes like "w3-red" or "text-center" or "xm-col-5" are a great example of this: saying in the HTML what you want things to look like and creating pointless classes for every single minuscule property assignment results in fat bloated HTML, and in the process flips the bird at why HTML and CSS are even separate languages in the first bloody place.
Yet that's the cornerstone of how every single one of these idiotic dumbass front-end "frameworks" even functions. ALL because the people creating and using them don't understand enough about HTML or CSS to even be using either! AT BEST all it does is shuffle code from the CSS where it belongs into the HTML where it can lead to missed caching opportunities -- at WORST it results in larger than need be codebases that are harder to manage.
... and EVEN when it MIGHT be less code (It usually is not) it can still be bloated if it consumes more bandwidth than needed. A missed caching opportunity increases your bandwidth consumption -- typically if you revisit a page the content (the important part) will have updated but the template/skin/layout will not. If you visit multiple pages on the same site the content is different, but again the overall appearance is unchanged. This means that the more you can move into the CSS and out of the markup, the more CSS being cached can save you in bandwidth.
So again we have a scenario where by trying to save effort in one place (CSS) you end up making more work someplace else (HTML) that in fact leads to bloat in bandwidth use REGARDLESS of how large the total code size is.
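A back-of-envelope sketch of that bandwidth argument, in JavaScript. Every number here is a made-up assumption for illustration, not a measurement, and the model is deliberately crude: it ignores compression and revalidation requests and assumes the stylesheet is downloaded once and then served from cache.

```javascript
// Crude bandwidth model. All sizes and the view count are hypothetical.

function totalBytes({ htmlBytes, cssBytes, pageViews }) {
  // HTML is fetched on every view; the external stylesheet is assumed
  // to be fetched once and come out of the browser cache afterwards.
  return htmlBytes * pageViews + cssBytes;
}

const pageViews = 10;

// Presentation stuffed into the markup: fat HTML, small stylesheet.
const classHeavy = totalBytes({ htmlBytes: 60000, cssBytes: 10000, pageViews });

// Presentation kept in the stylesheet: lean HTML, bigger stylesheet.
const cssDriven = totalBytes({ htmlBytes: 10000, cssBytes: 30000, pageViews });

console.log(classHeavy); // 610000
console.log(cssDriven);  // 130000
```

Under these assumptions the second site ships a stylesheet three times the size and still moves far fewer bytes over ten page views, because the part that repeats on every request is the markup.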
That's why I'm always railing against developers who see nothing wrong with vomiting up 60k or more HTML to do 10k's huffing job... the hallmark of numbskullery like bootcrap, w3.css, yui, and every other dipshit front-end framework. Well, on top of how that methodology with a lack of focus on semantic markup leads to specificity hell and tells users with accessibility needs and search engines to go plow themselves.
Basically bloat is a bit like what Justice Potter Stewart said about pornography in the Supreme Court ruling on the movie "The Lovers":
I shall not today attempt further to define the kinds of material I understand to be embraced within that shorthand description, and perhaps I could never succeed in intelligibly doing so. But I know it when I see it, and the motion picture involved in this case is not that.
It comes in all shapes and sizes, and is just not as plain, ordinary, or obvious as "Wow that's a lot of code".
It's when you find that something was introduced unnecessarily, what people call overkill, and you know it could have been done in an easier way. Usually the culprits are design patterns applied in the wrong place.
Jason Knight
The less code you use, the less there is to break