Sign in
Log inSign up

DRY, Where Did We Get It Wrong?

Sébastien Portebois's photo
Sébastien Portebois
·Jan 26, 2022·

4 min read

"Don’t Repeat Yourself". Every developer has already heard this principle.

Yet, in my experience, most of us didn’t get it. At least not how it was intended.

I won’t ask you to trust me. Let’s trust the authors. What did they say when they came up with this principle?

In 1999, Andrew Hunt and David Thomas wrote in The Pragmatic Programmer:

Every piece of knowledge must have a single, unambiguous, authoritative representation within a system.

Most software engineering teams translated and understood this as: « duplicate code is bad. »
And that’s true most of the time.
But what’s the subtle difference many people miss?
The Pragmatic Programmers’ section about this is titled "The Evils of Duplication".
So clearly we’re close.

The authors explain this much better than I’d do, so let’s look at their words, then we’ll look at an example to illustrate one of my pet peeves of bad DRY misapplied in tests.
In the 20th anniversary edition of the book, Andrew Hunt and David Thomas add this:

Let’s get something out of the way up-front. In the first edition of this book, we did a poor job of explaining just what we meant by Don’t Repeat Yourself. Many people took it to refer to code only: they thought that DRY means "don’t copy-and-paste lines of source".
This is part of DRY, but it’s a tiny and fairly trivial part.
DRY is about the duplication of knowledge, of intent. It’s about expressing the same thing in two different places, possibly in two totally different ways.

Later they have a section about "Not All Code Duplication Is Knowledge Duplication".

If that sparks some curiosity, go read the book, it details the duplication in data, in documentation, in representation, across APIs and much more.

A very common occurrence I see during code review is the false duplication of some pre-condition knowledge.

Let’s look at a very simple function, used to validate some string:

import re

# The urn pattern describe a workspace, and a resource id
# Examples:
#   drn:23f742:bc4b57dd0bec433198a58cd36b6c7ee8
drn_pattern = r"^drn:[a-z0-9]{6}:[a-f0-9]{32}$"


def is_valid_project_brn(urn: str) -> bool:
    return re.match(drn_pattern, urn, re.ASCII) is not None

We want to add basic tests for this function. We even want to go further and get good coverage of the various states the input can take, so we use Hypothesis to do some property-based testing on this:

import re
from hypothesis import assume, example, given, strategies

from app.dry_demo_01 import brn_pattern, is_valid_project_brn

valid_drn = strategies.from_regex(brn_pattern)


@strategies.composite
def invalid_drn(draw):
    """Generate random invalid DRN identifiers"""
    s = draw(
        strategies.one_of(
            # Pretty much any random string
            strategies.text(alphabet=strategies.characters(blacklist_categories=("Cc", "Cs")), min_size=0),
            # Some obvious bounds
            strategies.sampled_from(
                ["", "zzz:a29157:a3a0632b8a10427ebcd2ca1d51097cf8"]
            ),
        )
    )
    # If the random string ends up being a valid value, we reject it
    assume(re.match(brn_pattern, s) is None)
    return s


@given(brn=valid_drn)
@example(brn="brn:ac0a09:a99b52bd10614c29adda8131f7a523f1")
def test_valid_patterns_are_validated(brn):
    assert is_valid_project_brn(brn) is True


@given(brn=invalid_drn())
@example(brn="arn:121f23:4c324bd260974d5f94cab825a644958e")  # Invalid prefix
@example(brn="brn::4c324bd260974d5f94cab825a644958e")  # missing workspace
@example(brn="arn:121f23:")  # missing resource id
@example(brn="arn:1234567:4c324bd260974d5f94cab825a644958e")  # Workspace too long
def test_invalid_patterns_are_not_validated(brn):
    assert is_valid_project_brn(brn) is False

We submit this for code review, pretty confident, we do cover many positive and negative cases. What could be wrong?

Which DRY violation could be happening here?
Stop reading for a second, read these snippets again until you have a guess.

My main concern with such code is that the duplication we avoid makes our tests brittle and much less valuable.
Our tests rely on the code’s pattern to generate test data. We have a single code representation for 2 pieces of knowledge: what is applied and what is expected.
That means that is the pattern is updated, maybe by mistake, then the behavior of the function will change, yet our tests will continue to pass, accepting the new values.
So what would be a good way to do it here? Obviously, in this case, we want to duplicate the pattern, or rebuild it, but basically, to ensure that any change of the knowledge in the code requires us to update the test to let them pass.

This is a toy example to fit in a post, so it might look obvious here.
But when the code gets bigger, I see similar code constants re-use in the tests very often, and it becomes less obvious for more complex examples.
Does it look familiar?
During your next code-writing and code-review sessions, try to keep this in mind, and spot this misuse of DRY. It's surprising how common it is, and how easy we overlook it.

The key take-away?

Don't forget that DRY is about intents and knowledge.

Don't reject all code duplication without a thought, spend more time thinking about the other kinds of duplication.

We all know it's better, yet we all fail to do it often enough.