Anna Zubova

Posted on Sep 21, 2019

SOLID Programming (Part 1): Single Responsibility Principle

#solidprogramming #oop #python #drycode

SOLID principles are among the most valuable in software engineering. They help us write code that is clean, scalable, and easy to extend. In this series of posts I will explain what each of the principles is and why it is important to apply it.

Some people believe that SOLID is only applicable to OOP, while in reality most of its principles can be used in any paradigm.

‘S’ in SOLID stands for single responsibility. A common mistake among novice programmers is writing complex functions and classes that do many things. According to the Single Responsibility Principle, however, a module, a class, or a function should do only one thing; in other words, it should have only one responsibility. This makes the code more robust and easier to debug, read, and reuse.

Let’s look at this function, which takes a word and a file path as parameters and returns the ratio of the number of the word's occurrences in the text to the total number of words.

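```python
def percentage_of_word(search, file):
    search = search.lower()
    # Read the whole file; the with-block closes it afterwards
    with open(file, "r") as f:
        content = f.read()
    # Count all words, then the ones matching the search word (case-insensitive)
    words = content.split()
    number_of_words = len(words)
    occurrences = 0
    for word in words:
        if word.lower() == search:
            occurrences += 1
    return occurrences / number_of_words
```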

The code does many things in one function: it reads the file, counts the total number of words, counts the occurrences of the word, and then returns the ratio.

If we want to follow the Single Responsibility Principle, we can replace it with this code:

```python
def read_localfile(file):
    '''Read a local text file and return its content'''

    with open(file, "r") as f:
        return f.read()


def number_of_words(content):
    '''Count number of words in a text'''

    return len(content.split())


def count_word_occurrences(word, content):
    '''Count number of occurrences of a word in a text (case-insensitive)'''

    counter = 0
    for e in content.split():
        if word.lower() == e.lower():
            counter += 1
    return counter


def percentage_of_word(word, content):
    '''Calculate ratio of number of word occurrences to number of all words in a text'''

    total_words = number_of_words(content)
    word_occurrences = count_word_occurrences(word, content)
    return word_occurrences / total_words


def percentage_of_word_in_localfile(word, file):
    '''Calculate ratio of number of word occurrences to number
       of all words in a text file'''

    content = read_localfile(file)
    return percentage_of_word(word, content)
```

Now each function does only one thing. The first one reads the file. The second one counts the total number of words. The third one counts the occurrences of a word in a text. The fourth one calculates the ratio of the word's occurrences to the total number of words. Finally, if we prefer to pass a file path rather than the text itself to get this ratio, there is a dedicated function for that.

So what do we gain by restructuring the code this way?

- The functions are easy to reuse and can be combined depending on the task, which makes the code easy to extend. For example, if we wanted to calculate the frequency of a word in a text stored in an AWS S3 bucket instead of a local file, we would only need to write a new function, read_s3; the rest of the code would work without modification.
- The code is DRY. No code is repeated, so if we need to modify one of the functions, we only need to do it in one place.
- The code is clean, organized, and easy to read and understand.
- We can write tests for each function separately, which makes the code easier to debug. The tests for these functions are in the GitHub repository linked below.
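To make the S3 example from the list above concrete, here is a minimal sketch of `read_s3`. It assumes the `boto3` library's S3 client, whose `get_object(Bucket=..., Key=...)` returns a dict with a `"Body"` stream that yields bytes, and it assumes the object is UTF-8 text. Passing the client in as a parameter is a design choice of this sketch, not something the article specifies, and `percentage_of_word_in_s3file` is a hypothetical name that mirrors `percentage_of_word_in_localfile`. The counting functions are condensed from the refactored code so the block runs on its own.

```python
def number_of_words(content):
    '''Count number of words in a text'''
    return len(content.split())


def count_word_occurrences(word, content):
    '''Count case-insensitive occurrences of a word in a text'''
    target = word.lower()
    return sum(1 for e in content.split() if e.lower() == target)


def percentage_of_word(word, content):
    '''Ratio of word occurrences to all words in a text (unchanged logic)'''
    return count_word_occurrences(word, content) / number_of_words(content)


def read_s3(client, bucket, key):
    '''Read a text object from S3 (the only new responsibility).

    Assumes `client` behaves like boto3.client("s3"): get_object() returns a
    dict whose "Body" has a read() method returning bytes; UTF-8 is assumed.
    '''
    response = client.get_object(Bucket=bucket, Key=key)
    return response["Body"].read().decode("utf-8")


def percentage_of_word_in_s3file(word, client, bucket, key):
    '''Hypothetical S3 counterpart of percentage_of_word_in_localfile'''
    content = read_s3(client, bucket, key)
    return percentage_of_word(word, content)
```

With AWS credentials configured, a call might look like `percentage_of_word_in_s3file("python", boto3.client("s3"), "my-bucket", "article.txt")`, where the bucket and key are placeholders. Only the reader and a two-line wrapper are new; the counting functions are untouched. Taking the client as a parameter also lets a test pass in a fake object instead of contacting AWS.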

Code on GitHub

The code and tests from this article are available on GitHub:
https://github.com/AnnaLara/SOLID_blogposts
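The repository's own tests are not reproduced here. As a hypothetical sketch of what testing each function in isolation can look like with the standard library's `unittest`, the refactored functions are repeated below so the block runs on its own; the test names and sample texts are illustrative only, not taken from the repository.

```python
import os
import tempfile
import unittest


def read_localfile(file):
    '''Read a local text file and return its content'''
    with open(file, "r") as f:
        return f.read()


def number_of_words(content):
    '''Count number of words in a text'''
    return len(content.split())


def count_word_occurrences(word, content):
    '''Count number of occurrences of a word in a text (case-insensitive)'''
    counter = 0
    for e in content.split():
        if word.lower() == e.lower():
            counter += 1
    return counter


def percentage_of_word(word, content):
    '''Ratio of word occurrences to all words in a text'''
    return count_word_occurrences(word, content) / number_of_words(content)


class TestWordFunctions(unittest.TestCase):
    # The pure functions are tested on plain strings: no file is needed.
    def test_number_of_words(self):
        self.assertEqual(number_of_words("a b  c\nd"), 4)

    def test_count_word_occurrences_ignores_case(self):
        self.assertEqual(count_word_occurrences("the", "The cat saw the dog"), 2)

    def test_percentage_of_word(self):
        self.assertAlmostEqual(percentage_of_word("cat", "cat dog cat bird"), 0.5)

    # Only the reader touches the file system, so only it needs a temp file.
    def test_read_localfile(self):
        with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
            f.write("hello world")
            path = f.name
        try:
            self.assertEqual(read_localfile(path), "hello world")
        finally:
            os.remove(path)
```

Saved as a module, these tests can be run with `python -m unittest <module_name>`. Because only `read_localfile` depends on the file system, a failing count or ratio test points straight at the counting logic rather than at file handling.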

Top comments (9)

ThinkDigitalSoftware • Sep 22 '19

Thanks for this write-up! It's short and sweet.
Can you help me to see why this isn't overkill? As you can probably tell, I currently write my functions like the first example and only refactor if I start to need to reuse code. It's a "Refactor when it's needed" approach. The reason I ask is, I don't feel like I'm presented with enough information to decide when it's broken down too much. Anything can be defined as a single responsibility. This function is supposed to get the percentage of word occurrences, that's a single responsibility. But so is grabbing a file from disk and so is converting the file to a string. How does one know where to draw the line?

Matt Shirlaw • Sep 23 '19 • Edited

The reality is there is no golden rule that is going to be correct in all cases.

I think the example in this article is something that is manufactured for the purposes of illustration & possibly a bit artificial.

Ultimately, if you can ask the question "what does this function/class do?" and there is one obvious answer, you are probably doing ok. It is probably easier to spot a function that is doing too many things than one that just does one thing.

Some good indicators of a function that is trying to do too much could be:

- when a large number of parameters are being passed to the function, or
- when the function receives a Boolean "flag" as an argument which is designed to modify its behaviour.

There's a great chapter in "Clean Code" with lots more information on this topic 👍

Anna Zubova • Sep 23 '19

That's actually a very good question! Breaking down code into functions/classes with a single responsibility makes it easy to extend the code when needed, as well as to debug and modify it. So it is a good practice to write smaller functions right from the start so you don't have to suffer when your code gets bigger. The line is drawn by the question: what does my function/class do? If the answer is one thing, you are good to go.

Imagine that you want to calculate the percentage of a word's occurrences in a file stored in an AWS S3 bucket instead of a local file. Since the functions are broken down, you just need to write a function that retrieves the content of the S3 file (which you can also reuse in other tasks). Then you just combine it with the percentage_of_word() function.

If you had it all in one function, you would need to modify the code inside the function. But what if you wanted to be able to use both a local file and S3? Now you would need an if statement, which makes the code bulky and slows it down. What if you needed even more content sources? That adds even more complexity to your function.

The Single Responsibility Principle is actually closely connected with the Open/Closed Principle, which I am going to write about next. The principle says that code should be open for extension and closed for modification. That is, instead of modifying the existing code, we should extend it by reusing existing components and adding what is necessary.

Hope it answers your question!

ThinkDigitalSoftware • Sep 23 '19 • Edited

Yes, it's a great point. I would have some if statements in there. My code tends to have a lot of those. I appreciate you writing these in such a short format, because it's difficult to understand SOLID as a whole in large chunks.

Krzysztof Wasielewski • Sep 28 '19

Hey, as Uncle Bob (the creator of the SOLID principles) said, the Single Responsibility Principle is not only about splitting the code into the smallest possible chunks, as in this article, but about splitting it by business domain. There should be only one business/technical/finance etc. reason to change that class.
Here is the link with a good example from the SOLID author: blog.cleancoder.com/uncle-bob/2014...

So, in my opinion, the example from this article is not about SRP, but about code splitting.

Vlastimil Pospichal • Sep 21 '19

This code is not DRY. content.split() is repeated and makes this function slower.

Matt Shirlaw • Sep 22 '19

Since the article is about the single responsibility principle I think it is overly harsh to attack it based on one function being repeated. It would be easily refactored if performance became an issue. The article does a good job at getting the main point across. There's no place for that sort of attack on dev.to

Anna Zubova • Sep 22 '19

Hi Vlastimil,

Calling a built-in method in different functions doesn't mean that the code is not DRY. You could argue that the code could be optimized to avoid repeating the split() call, but as we know, premature optimization is the root of all evil. I would not try to optimize it further without having a big picture of the whole code in mind. Besides, the point of the article was to show an example of single responsibility, not optimizing for speed.
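For readers curious about the repeated split() call discussed in this thread, one hypothetical way to split the text only once while still giving each function a single job is to split up front and pass the list of words around. The name `split_words` is illustrative, not from the article's repository, and the sketch keeps the original case-insensitive comparison.

```python
def split_words(content):
    '''Split a text into words; the only place split() is called'''
    return content.split()


def number_of_words(words):
    '''Count the words in an already-split list'''
    return len(words)


def count_word_occurrences(word, words):
    '''Count case-insensitive occurrences of a word in a list of words'''
    target = word.lower()
    return sum(1 for w in words if w.lower() == target)


def percentage_of_word(word, content):
    '''Ratio of word occurrences to all words; the text is split only once'''
    words = split_words(content)
    return count_word_occurrences(word, words) / number_of_words(words)
```

The trade-off is that the counting functions now take a list instead of a string, so callers holding raw text must split it first. Whether that is worth it depends on the rest of the code, which is the point made in the reply above.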