Reading someone else's code can be quite confusing. Hours can go on issues that should have been fixed in minutes. In this article, I would like to share some advice on how to write code that will be easier to understand and maintain.
Before we get started, please take notice this is not a guide about writing "clean code". People tend to understand different things by this term, some like it to be easily extendable and generic, some prefer to abstract the implementation and provide just configuration and there are some who just like to see a subjectively beautiful code. This guide focuses on readable code, by that I mean a piece of code that communicates the necessary information to other programmers as efficiently as possible.
Below are 23 guides to help you write more readable code, this is a lengthy article so feel free to jump to parts that interests you:
- Identify that you have a problem before creating the solution
- Pick the right tool for the job.
- Simplicity is king.
- Your functions, classes, and components should have a well-defined purpose.
- Naming is hard, but it's important.
- Do not duplicate code.
- Remove dead code, do not leave it commented.
- Constant values should be in static constants or enums.
- Prefer internal functions over custom solutions.
- Use language specific guidelines.
- Avoid creating multiple blocks of code nested in one another.
- It’s not about the least number of lines.
- Learn design patterns and when not to use them.
- Split your classes to data holders and data manipulators.
- Fix issues at their roots.
- Hidden trap of abstractions.
- Rules of the world are not the rules of your application.
- Type your variables if you can, even if you don’t have to.
- Write tests.
- Use static code analysis tools.
- Human code reviews.
- Comments.
- Documentation. Conclusion.
1. Identify that you have a problem before creating the solution.
No matter if you are fixing a bug, adding a new feature or designing an application, you are essentially solving a problem for someone. Ideally, you want to do that leaving the least amount of issues behind. You should be clear on what problems you are solving with your design pattern choices, refactorings, external dependencies, databases and everything else you spend your valuable time on.
Your piece of code is a potential problem. Even the beautiful one. The only time any piece of code is no longer a problem is when a project is finished and dead - no longer supported. Why? Because someone will have to read it during the project lifetime, understand it, fix it, extend it, or even remove the feature it provides entirely.
Maintaining the codebase takes a lot of time and not many programmers like to do it because it lacks creativity. Write your code simpler so a junior developer can fix it when needed and you can be free to tackle bigger issues.
Lost time is a problem. A perfect solution to your task can be available, but it may be hard to see for a developer sometimes. There are tasks, where the best solution is to convince the client that what he wants is not really what he needs. It takes a deeper understanding of the application and its purpose. Your client might want a whole new module that will end up becoming thousands of extra lines of code when he just needs some more customization to his existing options. It might turn out you only need to change the existing codebase just a little bit, saving time and money.
There are other types of problems. Let's say you need to implement a filterable list of records. You hold your data in the database, but the connections between different records are complex. After analyzing how the client wants the data to be filtered, you discover that because of the database design you will have to spend about 20 hours building complex SQL queries with multiple joins and inner queries. Why not explain there is a different solution that will take 1 hour but will miss a part of the feature? It might turn out that the extra feature is not worth so much of your time, which translates to monetary cost.
2. Pick the right tool for the job.
This shiny silver bullet language, that one framework that you love unconditionally or a new database engine can turn out to be not the right tool for the problem you are facing. Don't pick tools that you just heard are awesome for everything in a serious project. It's a recipe for disaster. If your data need relations, picking MongoDB just to learn will end badly. You know you can do it, but often you will require a workaround that will produce extra code providing suboptimal solutions. And sure, you can hit the nail even with a wooden board, but a quick google search might point you to a hammer. Maybe since you last checked there is an AI that can automatically do it for you.
3. Simplicity is king.
You might have heard the phrase " premature optimization is the root of all evil ". It holds partial truth. You should prefer simple solutions unless you are confident it will not work. Not just believing it will not work, but already tried it or calculated it beforehand and being sure it won't work. Choosing more complex solution for whatever reason, be it the speed of execution, lack of RAM, extensibility, lack of dependencies or other reasons can heavily impact code readability. Don't complicate things unless you have to. The exception to that would be if you know a more efficient solution and you know it's implementation won't impact readability or your time requirements.
On the same note, you don't need to use all the new features of your language if it doesn't profit you and your team. New doesn't mean better. If you are unsure, go back to the first point and consider what problem are you trying to solve before refactoring. Just because Javascript has way too many new ways of writing a for loop statement doesn't make for loops obsolete if you need that index variable.
4. Your functions, classes, and components should have a well-defined purpose.
Do you know SOLID principles? I found these to be pretty good for designing generic libraries, but even though I used it a couple of times and I've seen a few implementations in working projects, I think the rules are a bit too confusing and complicated.
Split your code into functions that each does one thing. For example, let's consider how we would go about implementing a button. Button could be a class that groups all functionalities of a button. You might implement the button with one function for drawing it on the screen, another function to highlight it on mouseover, yet another one to be called on clicking the button and one more to animate the button on click. You can split it even further. If you need to calculate the rectangle position of a button based on screen resolution, don't do it in the draw function. Implement it in a different class since it is usable by other GUI elements and just use it when drawing the button.
It is a simple thing to follow, whenever you think "this doesn't have to be here" you can move it to another function, providing more information to fellow developers by wrapping a block of code with the name of the function and comments.
Consider examples below that do the same thing, which one informs what it does quicker?
// C++
if (currentDistance < radius2) {
// This is the sight of a player
if (!isLight) {
// If lighting of the tile is about 30% (so sight in darkness is worse) or distance from the player is 1, tile should be visible.
if (hasInfravision || map.getLight(mapPosition) > 0.29f || ASEngine::vmath::distance(center, mapPosition) == 1) {
map.toggleVisible(true, mapPosition);
}
}
// This is for light calculations
else {
ASEngine::ivec3 region = World::inst().map.currentPosition;
ASEngine::ivec2 pos = mapPosition;
if (mapPosition.x > 63) {
pos.x -= 64;
region.x += 1;
}
else if (mapPosition.x < 0) {
pos.x += 64;
region.x -= 1;
}
if (mapPosition.y > 63) {
pos.y -= 64;
region.y += 1;
}
else if (mapPosition.y < 0) {
pos.y += 64;
region.y -= 1;
}
map.changeLight(pos, region, 1.0f - static_cast(currentDistance) / static_cast(radius2));
}
}
// C++
if (currentDistance < radius2) {
// This is sight of a player
if (!isLight) {
this->markVisibleTile(hasInfravision, map, center, mapPosition);
}
// This is for light calculations
else {
ASEngine::ivec3 region = World::inst().map.currentPosition;
ASEngine::ivec2 pos = map.getRelativePosition(mapPosition, region);
map.changeLight(pos, region, 1.0f - static_cast(currentDistance) / static_cast(radius2));
}
}
5. Naming is hard, but it's important.
Names of variables and functions should be distinct and provide a general idea of what they do. The important thing about naming is that it should describe what it does to your team, so it should conform to the conventions chosen in the project. Even if you don't agree with them. If every request for a record in the database starts with "find" word, like "findUser", then your team might get confused if you come to the project and name your database function "getUserProfile" because this is what you are used to. Try to group naming when possible, for example, if you have many classes for input validation, putting "Validator" as the suffix for the name may quickly provide information what the purpose of the class is.
Choose and stick to a case type according to the standards. It gets really confusing to read camelCase, snake_case, kebab-case and beer🍻case used in different files of the same project.
6. Do not duplicate code.
We already established that the code is a problem, so why duplicate your problems to save a few minutes? It really doesn't make sense. You might be thinking you are solving something quickly by just copying and pasting, but if you think you have to copy more than 2 lines of code, try to entertain the idea that you might be missing an opportunity for a better solution. Maybe a generic function, or a loop?
7. Remove dead code, do not leave it commented.
Commented code is confusing. Did someone remove it temporarily? Is it important? When was it commented? It's dead, take it out of its misery. Just remove it. I get it that you are hesitant to remove the code because things can go bad and you want to just uncomment it. You might even be very attached to it since you spent your time and energy to come up with it. Or maybe you think it might be needed "soon". The solution to all of these issues is version control software. Just use git history to retrieve the code if you ever need it. Clean up after yourself!
8. Constant values should be in static constants or enums.
Do you use strings or integers to define types of objects? For example, a user can have a role "admin" or "guest". How would you go about checking if the user has a role "admin"?
if ($user->role == "admin") {
// user is an admin
}
This is not great. First of all, if the name "admin" changes you will have to change it in your whole application. You might say this rarely happens and modern IDEs make it not that hard to replace it. That's true. The other reason is lack of autocomplete and because of that misspelling issues come up. These can be pretty nasty to debug.
By defining global constants or enums depending on the language you can profit from autocomplete and change the value in a single place if you ever need to. You don't even have to remember what kind of value is hidden behind the constant, you just let your IDE autocomplete magic help.
// PHP
const ROLE_ADMIN = "admin";
if ($user->role == ROLE_ADMIN) {
// user is admin
}
// C++
enum class Role { GUEST, ADMIN }; // It's possible to map these enums to strings, but it's not needed.
if (user.role == Role.ADMIN) {
// user is admin
}
It's not just types of your objects. In PHP you can define arrays with strings as names of the fields. With complex structures, it can be hard to not make a typo and for that reason, it is preferable to use objects instead. Try to avoid coding with strings and you will profit from fewer typos and speed increase from autocomplete feature.
9. Prefer internal functions over custom solutions.
If your language or framework that you picked for your project provides you with a solution to your problem, use it. Everyone can quickly google what does a function do even if it's not used often. It will probably take more time to figure out your custom solution. If you find a piece of code that does the same thing as an internal function, just refactor quickly, don't leave it be. Removed code is no longer an issue, so deleting code is great!
10. Use language specific guidelines.
If you write in PHP, you should get to know PSRs. For Javascript, there is a decent guideline from Airbnb. For C++ there is guideline from Google or core guidelines from Bjarne Stroustrup, the creator of C++. Other languages might have their own guidelines for quality code or you can even come up with your own standards for your team. The important part is to enforce the use of the chosen guideline for the project, so there is a unified vision of how it should be developed. It prevents many issues that come from different people with their own unique experiences doing what they are used to.
11. Avoid creating multiple blocks of code nested in one another.
Just compare these two blocks of code:
void ProgressEffects::progressPoison(Entity entity, std::shared_ptr<Effects> effects)
{
float currentTime = DayNightCycle::inst().getCurrentTime();
if (effects->lastPoisonTick > 0.0f && currentTime > effects->lastPoisonTick + 1.0f) {
if (effects->poison.second > currentTime) {
std::shared_ptr<Equipment> eq = nullptr;
int poisonResitance = 0;
if (this->manager.entityHasComponent(entity, ComponentType::EQUIPMENT)) {
eq = this->manager.getComponent<Equipment>(entity);
for (size_t i = 0; i < EQUIP_SLOT_NUM; i++) {
if (eq->wearing[i] != invalidEntity && this->manager.entityHasComponent(eq->wearing[i], ComponentType::ARMOR)) {
std::shared_ptr<Armor> armor = this->manager.getComponent<Armor>(eq->wearing[i]);
poisonResitance += armor->poison;
}
}
}
int damage = effects->poison.first - poisonResitance;
if (damage < 1) damage = 1;
std::shared_ptr<Health> health = this->manager.getComponent<Health>(entity);
health->health -= damage;
} else {
effects->poison.second = -1.0f;
}
}
}
void ProgressEffects::progressPoison(Entity entity, std::shared_ptr effects)
{
float currentTime = DayNightCycle::inst().getCurrentTime();
if (effects->lastPoisonTick < 0.0f || currentTime < effects->lastPoisonTick + 1.0f) return;
if (effects->poison.second <= currentTime) {
effects->poison.second = -1.0f;
return;
}
int poisonResitance = this->calculatePoisonResistance(entity);
int damage = effects->poison.first - poisonResitance;
if (damage < 1) damage = 1;
std::shared_ptr health = this->manager.getComponent(entity);
health->health -= damage;
}
The second one is much easier to read, is it not? If such solution is available, try to avoid nesting blocks of ifs and loops in one another. A common trick is to reverse the if statement and return from the function before moving to the block of code like in the example above.
12. It’s not about the least number of lines.
We often say that a piece of code that takes fewer lines of code to accomplish the task is better. Some of us even get obsessed about how many lines of code we are adding or removing, measuring our productivity with the count. We do that for simplification, but it's not a rule that should be followed without considering readability. You can squeeze everything in a single line but chances are it will be much harder to understand what is going on than splitting it into few, simple lines with one command per line.
Some languages offer the possibility of writing short if statements, like so:
$variable == $x ? $y : $z; // if ($variable == x) { $result = $y; } else { $result = $z; }
It can be a great choice, but can also be easily overdone:
$variable == $x ? ($x == $y ? array_merge($x, $y, $z) : $x) : $y; // What is this heresy?!
It should be easier to grasp after the split.
$result = $y;
if ($variable == $x && $x == $y) $result = array_merge($x, $y, $z);
else if ($variable == $x) $result = $x;
These 3 lines take more space on your screen, but it takes less time to analyze what is going on with the data.
13. Learn design patterns and when not to use them.
There are many different design patterns that are often chosen to solve coding issues. What you should keep in mind is that although these patterns solve specific issues in the application, their usefulness can be affected by many different factors like the size of the project, the number of people working on it, time (cost) constraints or required complexity of the solution. Some patterns have been named antipatterns, like the Singleton pattern because even though they provide some solutions they also introduce issues in certain cases.
Just make sure you understand the cost of implementation in terms of complexity that will be introduced before choosing a design pattern for your particular solution. You might not need the Observer pattern to communicate between components in a simple system, maybe a few booleans will produce an easy to follow solution? It is more justified to spend the time to implement a chosen design pattern in a bigger, more complex applications.
14. Split your classes to data holders and data manipulators.
A data holder class is a class that keeps some data in its internal data structures. It allows access to the data through getters and setters as required, but does not manipulate the data unless it's always changed when keeping it in the system or always has to be mutated on access.
A very good example is in the Entity Component System architectural pattern, where Components only hold the data and Systems manipulate and process it. Another use case would be the Repository design pattern implemented for communication with an external database, where a "Model" class represents data from the database in language-specific structures and "Repository" class synchronizes the data with a database, either persisting changes on the Model or fetching them.
This separation makes it much easier to understand different parts of your application. Consider the Repository example above. If you would want to display a list of data held in a collection of Models, do you need to know where this data came from? Do you need to know how in the database it's being stored and how it needs to be mapped to language-specific structures? The answer to both is no. You pull the models through existing repository methods and focus on just what you need to do in your task, which is displaying the data.
How about Entity Component System example? If you need to implement systems that will process the use of a skill, playing animation, sound, dealing damage and so on. You don't need to know how the skill was triggered. It doesn't matter if AI scripts initiated the skill on some conditions or player used a hotkey to activate it. The only thing you need is to recognize that the data in the Component was changed, indicating which ability needs to be processed.
15. Fix issues at their roots.
You need to implement a new feature, adding it to an existing codebase. You get into the part of the code you need to change and you come across an issue. The structure of your function input doesn't work well with what you need, so you have to write quite a bit of extra code to reorganize the data and pull some more of it before you can implement your solution.
Before you do that, try to move a few steps back in your codebase. Where does this data come from and how is it used? Maybe you can get it in an easier to process format from an external source or change it immediately as you acquire it? By fixing this issue in its root, you might be fixing the same issue in multiple places and in the future features or changes. Always try to simplify how you keep your data for ease of access as soon as you get it. This is especially important when you receive the data from an external source. If you need data from users of the application or external API, you should weed out unnecessary things and reorganize the rest immediately.
16. Hidden trap of abstractions.
Why do we write general purpose, abstract solutions to our problems? To easily extend our applications, make it easier to adapt to new requirements and to reuse our piece of code so we don't have to write it ever again.
There is often a heavy cost to abstraction in terms of code readability. The highest level of abstraction is when everything is solved for you while the implementation is hidden. You are given the ability to configure how your data should be processed given input, but you have no control over the details, like for example how it's going to be stored in your database, how efficiently it's going to be processed, what information is being logged and many more. The argument to such solution is that if a new source of data has to be processed the same way as the current one, it's easy to just throw it into the library and point where it should be stored. You are essentially trading control for speed of implementation.
When something goes wrong and it's not a well-documented issue, someone will have a very hard time understanding all the general purpose ideas that try to solve way more than necessary. If we can afford it, we really shouldn't want to hide implementation details. Keeping control of the codebase allows for more flexibility. Don't write a general solution to simple problems just because you think it "might" be extended in the future. It rarely is and it can be rewritten when needed.
Let's consider an example. If you can create a class that can import data from CSV file and pack it into a database in 10 - 15 lines of readable code, why bother making 2 classes and generalize the solution so it can be potentially expanded into importing from XLS or XML in the future when you don't even have a hint that this will be needed for your application? Why pull an external library of 5k lines of code you don't need to solve this issue?
There is rarely a necessity to generalize the storage place of your data. How many times in your career did you change database engine? In the last 10 years, I have come across an issue that was resolved in such a way once. Creating abstract solutions are costly and very often unnecessary unless you are creating a library that has to cater to a huge variety of projects at once.
In contrast, when you know for sure that you have to allow importing from XLS and CSV out of the box, then the general solution might be a perfectly viable choice. It's also really not a big deal if you will write a general solution later when requirements of your application change. It will be a lot easier for someone to just have a plain and simple solution when they will need to replace it.
17. Rules of the world are not the rules of your application.
I had an interesting argument about modeling "the real world" when implementing OOP paradigm in an application. Let's say we have to process big data for an advertisement system. We have 2 types of log messages. First is information about emission of an advertisement which holds some data. The second log informs of some person clicking the ad, which holds exactly the same data as emission and a few extra fields. Emission log has no data that a click does not contain.
In the real world, we might consider both actions, viewing and clicking an ad to be separate but similar. So by modeling the real world, we could create a base "Log" class that we would extend to "ClickLog" and "EmissionLog" classes, like below:
struct Log {
int x;
int y;
int z;
}
struct EmissionLog : public Log {}
struct ClickLog : public Log {
float q;
}
Above example maps how the system works in the real world quite well. Emitting an advertisement is completely different than someone clicking it. However, such a choice doesn't convey an important information. In our application, everything that can process emission logs can work on clicks. We can use the same classes to process both, but only some click processors cannot work on emissions, because of the difference in data.
In our application, different than in the real world, our ClickLog is an extension of EmissionLog. It can be processed in the same way, using the same classes that operate on EmissionLogs. If you extend clicks from emission logs, you inform your colleagues that everything that can happen to emission can happen to clicks without them needing to know about all the possible log processors in the application.
struct EmissionLog {
int x;
int y;
int z;
}
struct ClickLog : public EmissionLog {
float q;
}
18. Type your variables if you can, even if you don’t have to.
You can skip this one if you only write in statically typed languages. In dynamically typed languages like PHP or Javascript it can be very hard to understand what a piece of code is supposed to do without dumping the contents of the variables. For the same reason, the code can be very unpredictable when a single variable can be an object, an array or a null depending on some conditions. Allow the least amount of types of your variables as possible in your function parameters. The solutions are available. PHP can have typed arguments and return types since version 7 and you can pick up Typescript instead of clean Javascript. It helps with code readability and prevents dumb mistakes.
If you don't have to, don't allow nulls either. Null is an abomination. It has to be explicitly checked for existence to avoid fatal errors which require unnecessary code. Things are even more dreadful in Javascript with it's null and undefined. Mark variables that can be null, so you inform your peers:
// PHP >= 7.1
function get(?int count): array {
//...
}
// Typescript
interface IUser = {
name?: string; // name field might not be available
type: number;
}
- Write tests.
As the years go by and we manage to avoid burnout, we improve to the point when we can map even complex features in our mind and implement it without checking if our code works until the first draft is fully implemented. At that point, it can feel like a bit of a waste of time writing in TDD cycles, as it's a bit slower to check every single thing before writing it. It is a good practice to write integration tests that make sure your whole feature works as required, simply because you will likely leave some small errors behind and you can check that in milliseconds.
If you are not experienced yet with your language or library and you try a different angle often when looking for ways to solve your problem, you can benefit greatly from writing tests. It encourages splitting your work into more manageable chunks. Integration tests explain what kind of issues does your code solve very quickly, which may provide the information quicker than a generic implementation. A simple "this input expects this output" can speed up the process of understanding the application.
20. Use static code analysis tools.
There are many open source static code analysis tools available. Quite a lot is also provided real-time by advanced IDEs. They help to keep your projects on track. You can automate some in your repository pipelines to be run on every commit in a docker environment.
Solid choices for PHP:
- Copy / Paste detector
- PHP Mess Detector - checks potential bugs and complexity
- PHP Code Sniffer - checks coding standards.
- PHPMetrics - static analysis tool with dashboard and charts.
Javascript:
- JsHint / JsLint - discover errors and potential issues, can be integrated with an IDE for real-time analysis.
- Plato - source code visualization and complexity tool.
C++:
Multilanguage support:
- pmd - mess detector.
21. Human code reviews.
Code review is just another programmer looking through your code to find mistakes and help improve the quality of software. As much as they can help the overall quality of the application and allow for a flow of knowledge in the team, they are only useful if everyone is open to constructive criticism. Sometimes people reviewing want to enforce their vision and experience and won't accept a different view, which can also be hard to swallow.
It can be hard to pull off depending on the team environment, but the gains can be incredible. The biggest and cleanest application I have ever participated in coding was done with very thorough code reviews.
22. Comments.
You may have already noticed that I like to keep the rules of coding simple, so they are easy to follow by everyone on the team. The same is with comments.
I believe comments should be added to every function, including constructors, every class property, every static constant and every class. It is a matter of discipline. When you allow laziness by allowing exceptions to commenting when "something doesn't require comments because it's self-explanatory" then laziness is often what you will get.
Whatever you are thinking about when implementing the feature (relevant to work!) is a good thing to write in the comments. Especially how everything works, how a class is used, what is the purpose of this enum and so on. The purpose is very important as it's hard to explain just through proper naming unless someone already knows the conventions beforehand.
I understand that "InjectorToken" makes complete sense to you and you might consider it "self-explanatory". Quite frankly it's a great name. But what I want to know when I view this class is - what is this token for, what does it do, how can I use it and what is this Injector thingy. It would be perfect to see that in the comments so nobody will have to look all over the application, right?
23. Documentation.
I know, I know, I hate writing documentation too. If you write everything you know in your comments, then this point could potentially be generated automatically by some tool. Still, documentation can give you a quick way to look up important information about how your application should work.
You can use Doxygen for automatic documentation generation.
Conclusion
The reason why this article is a set of guidelines and not rules is that I believe that there are many ways to be right. If you are convinced beyond doubt that everything should be abstracted, have at it. If you believe SOLID principles are to be used in every application, or that if a solution is not done through well-known design pattern then it's immediately bad, it's fine.
Choose the path that is right for you and your team and stick with it. And if you are ever in an experimental mood, try some of the things I mentioned in this article. Hopefully, it will improve the quality of your work. Thanks for reading and don't forget to share if you found it interesting!