When you write software, there are many principles and best practices to follow so that your code stays readable, maintainable, and easy to optimize, and it is not always clear to beginners how to apply them. One of these principles is DRY, which stands for Don't Repeat Yourself. It aims to reduce duplication, which makes code harder to refactor and maintain. At its core, the DRY principle is about the duplication of knowledge, of intent: expressing the same thing in two different places, possibly in two totally different ways.
The DRY Principle
The DRY principle states that every piece of knowledge must have a single, unambiguous, authoritative representation within a system.
It is always tempting to copy and paste some code when you need it somewhere else in your application, especially when you are working on a hotfix under pressure and need to ship as quickly as possible. You tell yourself: it is OK for now, I will come back later and refactor this. We all know that in most cases this never happens, until something breaks again, this time because of the duplicated piece of code.
Every piece of duplicated logic is a potential time bomb. Duplication often leads to a maintenance nightmare. When the same logic lives in multiple locations and needs to change, chances are high that the change will not be applied correctly everywhere that logic occurs. Once those locations fall out of sync with one another, bugs start to appear. To avoid this class of bugs, make sure the relevant piece of code exists in a single location:
- Give this source of truth a clear name.
- Choose the right location for it.
- Reuse it, don't duplicate it.
The next time we need to change it, we know exactly where to look. Avoiding duplication also improves readability: a small, simple function or method is much easier to read and understand than a huge, complex one.
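As a minimal sketch of the three steps above, assume a hypothetical shop where orders above a certain total ship for free. The names (`ShippingPolicy`, `qualifiesForFreeShipping()`, `cartBanner()`, `shippingCost()`) and the amounts are invented for illustration; the only point is that the rule has one named home and both callers reuse it.

```php
<?php

// Hypothetical example: the free-shipping rule lives in exactly one place.
final class ShippingPolicy
{
    // The single source of truth for the rule (amounts in cents).
    public const FREE_SHIPPING_THRESHOLD = 5000;
    public const FLAT_RATE = 499;

    public static function qualifiesForFreeShipping(int $orderTotal): bool
    {
        return $orderTotal >= self::FREE_SHIPPING_THRESHOLD;
    }
}

// Both callers reuse the rule instead of repeating "total >= 5000".
function cartBanner(int $orderTotal): string
{
    return ShippingPolicy::qualifiesForFreeShipping($orderTotal)
        ? 'You get free shipping!'
        : 'Add more items to get free shipping.';
}

function shippingCost(int $orderTotal): int
{
    return ShippingPolicy::qualifiesForFreeShipping($orderTotal)
        ? 0
        : ShippingPolicy::FLAT_RATE;
}
```

If the threshold changes, only `ShippingPolicy` needs to be edited, and the banner and the cost calculation cannot drift apart.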
Think before you DRY
We often reduce DRY to "duplicate code is bad, code reuse is good". But that is the wrong approach. The principle is not about code, it is about knowledge. The next time you see duplicated code, do not immediately extract it into a method or an abstraction.
Maybe in the current context the duplication is not evil but perfectly normal.
Consider two classes, Student and Teacher. Both have the same method with exactly the same logic. Should we extract a base Person class? Are we sure that the logic in these methods represents the same knowledge, or did it end up identical by accident?
<?php
class Student
{
    public $firstname;
    public $lastname;

    public function getFullName()
    {
        return $this->lastname . ' ' . $this->firstname;
    }
}

class Teacher
{
    public $firstname;
    public $lastname;

    public function getFullName()
    {
        return $this->lastname . ' ' . $this->firstname;
    }
}
We have two classes, Student and Teacher. Both of them return a full name, and the implementations are identical. Should we extract? Let's try and see what happens. We can extract a parent class Person, put the getFullName() method in it, and then make both classes extend Person.
<?php
class Person
{
    public $firstname;
    public $lastname;

    public function getFullName()
    {
        return $this->lastname . ' ' . $this->firstname;
    }
}

// Both classes inherit getFullName() from Person, so they need no method of their own.
class Student extends Person
{
}

class Teacher extends Person
{
}
This really looks nice, but a few days later we are told that a teacher's full name should contain the middle name. Now what? We have to override the parent's implementation in Teacher.
As a result of our extraction, we achieved nothing but complexity. We have a reusable method that is used only once, which is not very clever. We should have kept the duplication in these classes.
In this case, the duplication is not duplication of knowledge, only of code. The code represents different knowledge, even though it looks the same. When we change Teacher, it is unlikely that the same change will be needed in Student.
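To make the cost concrete, here is a rough sketch of where the extraction ends up once the middle-name requirement arrives. The constructors, the `$middlename` property, and the position of the middle name in the result are assumptions added so the example runs on its own.

```php
<?php

class Person
{
    protected $firstname;
    protected $lastname;

    public function __construct(string $firstname, string $lastname)
    {
        $this->firstname = $firstname;
        $this->lastname = $lastname;
    }

    public function getFullName(): string
    {
        return $this->lastname . ' ' . $this->firstname;
    }
}

// Student is now the only class that still uses the "shared" method.
class Student extends Person
{
}

class Teacher extends Person
{
    // Assumed property, introduced only for this illustration.
    private $middlename;

    public function __construct(string $firstname, string $middlename, string $lastname)
    {
        parent::__construct($firstname, $lastname);
        $this->middlename = $middlename;
    }

    // The new requirement forces an override, so nothing is shared any more.
    public function getFullName(): string
    {
        return $this->lastname . ' ' . $this->firstname . ' ' . $this->middlename;
    }
}

$student = new Student('Ada', 'Lovelace');
$teacher = new Teacher('Grace', 'Brewster', 'Hopper');

echo $student->getFullName(), PHP_EOL; // Lovelace Ada
echo $teacher->getFullName(), PHP_EOL; // Hopper Grace Brewster
```

The hierarchy now holds three classes and two implementations of getFullName(), where two independent classes would have done the same job with less coupling.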
Premature Refactoring
You shouldn't apply the DRY principle if your business logic doesn't have any duplication yet. Again, it depends on the context, but as a rule of thumb, trying to apply DRY to something that is used in only one place leads to premature generalization.
Another case where we don't need to remove duplication is when the code will never change. When you are building a prototype or a script that will run once and then be thrown away, it is OK to copy and paste some code. Code duplication is costly, but only if the code changes.
Rule Of Three
This rule helps us decide when to extract duplicated code. It says: extract duplication only when you see it for the third time.
The first time you do something, you just write the code.
The second time you do a similar thing, you duplicate your code.
The third time you do something similar, you can extract it and refactor.
But why? Why should we duplicate our code when we have always been told that it is wrong?
These simple steps are designed to keep you from refactoring prematurely and incorrectly, while the problem you are trying to solve is not yet clear. Refactoring a piece of code before you really understand the problem behind it can cause tremendous problems in the future.
With only two examples of duplication, it can be dangerous to abstract, because some details of the specific scenario can bleed into the abstraction.
But once we have at least three examples of duplicated code, we can define patterns and abstractions for it more precisely. The rule also keeps you from wasting time making something reusable that ends up, at the end of the day, with only one use.
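As an illustration of the third step, suppose (hypothetically) that three lookup functions had each grown their own copy of `strtolower(trim($value))` because lookup keys are meant to be case-insensitive and whitespace-tolerant. With three call sites, the shared rule is clear enough to extract. All function names below are invented for this sketch.

```php
<?php

// Hypothetical example. Before the refactoring, each of the three functions
// below contained its own copy of: strtolower(trim($value)).
// Seeing the pattern a third time is the signal to extract it.
function normalizeKey(string $value): string
{
    return strtolower(trim($value));
}

function findUserByEmail(array $usersByEmail, string $email): ?string
{
    return $usersByEmail[normalizeKey($email)] ?? null;
}

function findProductBySku(array $productsBySku, string $sku): ?string
{
    return $productsBySku[normalizeKey($sku)] ?? null;
}

function findCouponByCode(array $couponsByCode, string $code): ?string
{
    return $couponsByCode[normalizeKey($code)] ?? null;
}
```

Had the helper been extracted after the first or second use, it might have baked in details that only applied to emails, such as validating the address format, and the later callers would have had to work around them.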
DRY and Code Duplication
Code reuse and code duplication are two different things. DRY states that you shouldn't duplicate knowledge, not that you should use abstractions to reuse everything, everywhere.
So DRY is all about knowledge? All about business logic? Let's look at some code snippets to understand why:
<?php
class CsvValidation
{
    public function validateProduct(array $product)
    {
        if (!isset($product['color'])) {
            throw new \Exception('Import fail: the product attribute color is missing');
        }
        if (!isset($product['size'])) {
            throw new \Exception('Import fail: the product attribute size is missing');
        }
        if (!isset($product['type'])) {
            throw new \Exception('Import fail: the product attribute type is missing');
        }
    }
}
This code doesn't look that bad, does it? The method validates some CSV output in only one place (validateProduct()). This is the knowledge, and it's not repeated. But you might notice that there are nearly identical if statements everywhere. Isn't that an obvious violation of the DRY principle?
Well… no. It's not. I would call that unnecessary code duplication, but not a violation of the DRY principle.
We can restructure this code snippet in a more readable and maintainable way, for example:
<?php
class CsvValidation
{
    private $productAttributes = [
        'color',
        'size',
        'type',
    ];

    public function validateProduct(array $product)
    {
        foreach ($this->productAttributes as $attribute) {
            if (!isset($product[$attribute])) {
                throw new \Exception(sprintf('Import fail: the product attribute %s is missing', $attribute));
            }
        }
    }
}
It looks better, doesn't it? There's no code duplication anymore!
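By contrast, an actual DRY violation can hide in code that does not look duplicated at all. As a sketch, imagine a second, hypothetical class `CsvTemplate` that writes the CSV header line. If it hard-coded the string 'color;size;type', the knowledge "which attributes a product must have" would exist twice, in two different shapes. One possible fix, assuming a shared `ProductSchema` class (also invented here), is to derive both from a single list:

```php
<?php

// Hypothetical single home for the knowledge "required product attributes".
final class ProductSchema
{
    public const REQUIRED_ATTRIBUTES = ['color', 'size', 'type'];
}

class CsvValidation
{
    public function validateProduct(array $product): void
    {
        foreach (ProductSchema::REQUIRED_ATTRIBUTES as $attribute) {
            if (!isset($product[$attribute])) {
                throw new \Exception(sprintf('Import fail: the product attribute %s is missing', $attribute));
            }
        }
    }
}

class CsvTemplate
{
    public function headerLine(): string
    {
        // The violation would be: return 'color;size;type';
        // Same knowledge as the validator, just written in another form.
        return implode(';', ProductSchema::REQUIRED_ATTRIBUTES);
    }
}
```

Adding a required attribute is now a one-line change, and the validator and the template cannot disagree about it.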
Conclusion
- Knowledge duplication is always a DRY principle violation.
- Code duplication doesn't necessarily violate the DRY principle.
The idea of DRY is simple in theory: one change should not require you to update multiple things in parallel. If a piece of knowledge is repeated twice in your code and you need to change it, you might forget to change it everywhere. If it is also repeated in your documentation, an outdated copy can lead to misconceptions, confusion, and ultimately wrong implementations.