I really love clean code and I always want to code my code in the best possible way. But there was always one thing, I didn't really understand:
When is it too much of "separation of concerns" regarding methods?
Let's say we have the following method:
def get_last_appearance_of_keyword(file, keyword):
with open(file, 'r') as file:
line_number = 0
for line in file:
if keyword in line:
line_number = line
return line_number
I think this method is just fine as it is. It's simple, easy to read and it clearly does, what it's name says. But: It's not really doing "just one thing". It actually opens the file, and then finds it. That means I could split it even further (Also considering the "Single Responsibility Principle"):
Variation B (Well, this make somehow sense. In this way we can easily reuse the algorithm of finding the last appearance of a keyword in a text, yet it seems like "too much". I cant explain why, but I just "feel" it that way):
def get_last_appearance_of_keyword(file, keyword):
with open(file, 'r') as text_from_file:
line_number = find_last_appearance_of_keyword(text_from_file, keyword)
return line_number
def find_last_appearance_of_keyword(text, keyword):
line_number = 0
for line in text:
if keyword in line:
line_number = line
return line_number
Variation C (This is just absurd in my opinion. We are basically encapsulating a one-liner into another method with just one line twice. But one could argue, that the way of opening something may change in the future, because of some feature requests, and as we don't want to change it many times, but just once, we just encapsulate it and separate our main function even further):
def get_last_appearance_of_keyword(file, keyword):
text_from_file = get_text_from_file(file)
line_number = find_keyword_in_text(text_from_file, keyword)
return line_number
def get_text_from_file(file):
with open(file, 'r') as text:
return text
def find_last_appearance_of_keyword(text, keyword):
line_number = 0
for line in text:
if check_if_keyword_in_string(line, keyword):
line_number = line
return line_number
def check_if_keyword_in_string(text, keyword):
if keyword in string:
return true
return false
So my question now: What is the correct way of writing this code and why are the other approaches right or wrong? I always learned: Separation, but never when it is simply too much. And how can I be sure in the future, that it is "just right" and that it doesn't need more separation when I'm coding again?
Best Answer
Your various examples of splitting out concerns into separate functions all suffer from the same issue: you are still hard-coding the file dependency into
get_last_appearance_of_keyword
. This makes that function hard to test as it now has to reply on a file existing in the file system when the test is run. This leads to brittle tests.So I'd simply change your original function to:
Now you have a function that has just one responsibility: find the last occurrence of a keyword in some text. If that text is to come from a file, that becomes the caller's responsibility to deal with. When testing, you can then just pass in a block of text. When using it with runtime code, first the file is read, then this function is called. That is real separation of concerns.