Divide your program into short methods that perform one identifiable task. The main function should call well-named sub-functions which call sub-sub-functions (and so on, and so on...), so a developer can easily "travel" up and down between different levels of abstraction (each time concentrating only on the current one) through hierarchies of any depth.
Those subprograms have different names in different programming languages (methods, procedures, subroutines, scripts etc.) but I will call them all "functions". The discussed idea is named
Pay attention: the problem of too long functions usually goes together with the already discussed problem of extra indentation!
Let’s read what Jorn Olmheim wrote in the book "97 Things Every Programmer Should Know":
"There is one quote, from Plato, that I think is particularly good for all software developers to know and keep close to their hearts:
"Beauty of style and harmony and grace and good rhythm depends on simplicity."
In one sentence, this sums up the values that we as software developers should aspire to. There are a number of things we strive for in our code:
Plato is telling us that the enabling factor for all of these qualities is simplicity.
I have found that code that resonates with me, and that I consider beautiful, has a number of properties in common. Chief among these is simplicity. I find that no matter how complex the total application or system is, the individual parts have to be kept simple: simple objects with a single responsibility containing similarly simple, focused methods with descriptive names.
The bottom line is that beautiful code is simple code. Each individual part is kept simple with simple responsibilities and simple relationships with the other parts of the system. This is the way we can keep our systems maintainable over time, with clean, simple, testable code, ensuring a high speed of development throughout the lifetime of the system.
Beauty is born of and found in simplicity."
Steve McConnell writes in "Code Complete":
"A large percentage of routines in object-oriented programs will be accessor routines, which will be very short. From time to time, a complex algorithm will lead to a longer routine, and in those circumstances, the routine should be allowed to grow organically up to 100-200 lines.
Let issues such as the routine's cohesion, depth of nesting, number of variables, number of decision points, number of comments needed to explain the routine, and other complexity-related considerations dictate the length of the routine rather than imposing a length restriction per se.
That said, if you want to write routines longer than about 200 lines, be careful. None of the studies that reported decreased cost, decreased error rates, or both with larger routines distinguished among sizes larger than 200 lines, and you're bound to run into an upper limit of understandability as you pass 200 lines of code."
And now - ideas of different developers, found by me in the Internet:
"When reading code for a single function, you should be able to remember (mostly) what it is doing from beginning to the end. If you get partway through a function and start thinking "what was this function supposed to be doing again?" then that's a sure sign that it's too long..."
"Usually if it can't fit on my screen, it's a candidate for refactoring. But, screensize does vary, so I usually look for under 25-30 lines."
"IMO you should worry about keeping your methods short and having them do one "thing" equally. I have seen a lot of cases where a method does "one" thing that requires extremely verbose code - generating an XML document, for example, and it's not an excuse for letting the method grow extremely long."
"...you should make functions as small as you can make them, as long as they remain discrete sensible operations in your domain. If you break a function ab() up into a() and b() and it would NEVER make sense to call a() without immediately calling b() afterwards, you haven't gained much.
Perhaps it's easier to test or refactor, you might argue, but if it really never makes sense to call a() without b(), then the more valuable test is a() followed by b().
Make them as simple and short as they can be, but no simpler!"
"As a rule of thumb, I'd say that any method that does not fit on your screen is in dire need of refactoring (you should be able to grasp what a method is doing without having to scroll. Remember that you spend much more time reading code than writing it).
~20 lines is a reasonable maximum, though.
Aside from method length, you should watch out for
"During my first years of study, we had to write methods/functions with no more than 25 lines (and 80 columns max for each line). Nowadays I'm free to code the way I want but I think being forced to code that way was a good thing ... By writing small methods/functions, you more easily divide your code into reusable components, it's easier to test, and it's easier to maintain."
"I often end up writing C# methods with 10 - 30 lines. Sometimes I find longer methods suitable, when it’s easier to read/test/maintain."
"My problem with big functions is that I want to hold everything that's happening in the code, in my head all at once. That's really the key. It also means that we're talking about a moving target.
Dense perl should appear in shorter chunks than padded VB. And, worse, at only 38, I'm finding that while my abilities have matured in nice ways, by core ability to grok a bunch of code is diminishing and thus the size of code blocks I want to handle is shrinking.
Because the goal is usability, the one screen rule really does make sense even though you can point to seeming flaws like varying screen resolutions. If you can see it all at once without paging around the editor, you are very much more likely to handle it all as a block.
What if you're working on a team? I suppose the best thing for the team would be to determine the lowest common denominator and target that size. If you have someone with a short attention-span or an IDE set up displaying around 20 lines, that's probably a good target. Another team might be good with 50.
And yeah, whatever your target is, sometimes you'll go over. 40 lines instead of 25 when the function can't really be broken up reasonably is fine here and there. You and your partners will deal with it. But the 3K line single-function VB6 modules that exist in some of my employer's legacy suites are insane!"
"I prefer to try to keep everything in a function on the same level of abstraction and as short as possibly. Programming is all about managing complexity and short one purpose functions make it easier to do that."
Kent Beck wrote in his book "Smalltalk Best Practice Patterns": "Keep all of the operations in a method at the same level of abstraction. This will naturally result in programs with many small methods, each a few lines long."
Don't allow code blocks to overgrow.
A code block is a fragment, placed between opening and closing operators. These operators are (in different languages):
{
and }
) in C-like languages and, for example, BEGIN
/END
in PL/SQL.IF
... ELSE
... END IF
).FOR
... NEXT
, LOOP
... END LOOP
, DO
... WHILE
etc.)
It's a great idea to keep the opening and closing operators on one screen. If you see that it's impossible then think about extracting the block's code into a new function. That solution will also decrease the indenting: for example, the fragment
PB | if [condition] then [very long code fragment with its own indents] end if |
C# | if ([condition]) { [very long code fragment with its own indents] } |
will be looking in the new function as
PB | if not [condition] then return [very long code fragment with its own indents] |
C# | if (![condition]) return; [very long code fragment with its own indents] |
But that is already the subject of the next paragraph:
If a large code fragment is executed after a few validations (and is placed inside a few if
-s), take them all (the fragment and the validations) into a new function and exit that function just after one of the validations has failed.
It will not only save your code from extra indenting but also convey the following information: the
whole algorithm (not its part!) is executed after ALL the preliminary validations have been passed successfully. For
example, if the first line in a function is if (Page.IsPostBack)
return;
and the code is longer than one screen then the function readers
immediately know that all the executed stuff is done only if it's the
first load of the page (not a refresh), while if there is an if
block like if (!Page.IsPostBack) { [many-screens-code-fragment]
}
then the function readers are forced to scroll down to see if there is any
code after the if
block (i.e. a code which runs always, even on refresh). See how you can convert a code from monstrous to
elegant:
PB | if this.uf_data_ok_1() then if this.uf_data_ok_2() then if this.uf_data_ok_3() then [code fragment with its own indents] end if end if end if |
C# | if (this.DataOk1()) { if (this.DataOk2()) { if (this.DataOk3()) { [code fragment with its own indents] } } } |
PB | if not this.uf_data_ok_1() then return if not this.uf_data_ok_2() then return if not this.uf_data_ok_3() then return [code fragment with its own indents] |
C# | if (!this.DataOk1()) return; if (!this.DataOk2()) return; if (!this.DataOk3()) return; [code fragment with its own indents] |
Here you can ask: and what about
If a code fragment after many validations is not very long and you
don't want to extract it into a special function then use the exceptions mechanism: put the whole fragment between try
and catch
, throwing an exception on failure of any of the sub-validations and processing it locally (without propagating outwards).
If your programming language doesn't support exceptions then use one of the following tricks: the "flag method" or the "fake loop method" - but not the "multi-indents method"! Both PB and C# have exceptions (PB - beginning from version 7), but anyway I will use them to demonstrate the idea:
PB | boolean lb_ok lb_ok = this.uf_data_ok_1() if lb_ok then lb_ok = this.uf_data_ok_2() end if if lb_ok then lb_ok = this.uf_data_ok_3() end if if lb_ok then [code fragment with its own indenting levels] end if |
C# | bool ok; ok = this.DataOk1(); if (ok) { ok = this.DataOk2(); } if (ok) { ok = this.DataOk3(); } if (ok) { [code fragment with its own indenting levels] } |
PB | boolean lb_ok do while true lb_ok = false if not this.uf_data_ok_1() then break if not this.uf_data_ok_() then break if not this.uf_data_ok_3() then break lb_ok = true break loop if lb_ok then [code fragment with its own indenting levels] end if |
C# | bool ok; while (true) // or "for(;;)", if you wish { ok = false; if (!this.DataOk1()) break; if (!this.DataOk2()) break; if (!this.DataOk3()) break; ok = true; break; } if (ok) { [code fragment with its own indenting levels] } |
As you see, the fake loop is an eternal loop with unconditional break
in the end of the first iteration, so the second iteration will never happen. This solution is looking strange (a loop construction which never loops!), but it works!
Merge functions with duplicating functionality into one generic function.
If such a functions appear in classes inherited from a same ancestor, create the generic function in the ancestor and call it from the descendants. If the classes are not inherited from one ancestor then create the generic function in a third class (even if you have to create that class only for one function!). If you find yourself thinking whether to duplicate code using copy-paste (10 minutes of work) or take it to a third place (2 hours including testing) - stop thinking immediately! NEVER THINK ABOUT THAT, even in last days of your contract - SIMPLY TAKE THE CODE TO A THIRD PLACE and call it from where needed. If you are still in doubt about spending your time (which really belongs to the company) - ask your manager, but it's better to do the work well and after that to explain the manager why it has taken longer time: if the manager understands what the quality programming is then your effort will be appreciated!
Merge functions with similar functionality into one generic function.
If the functions, being taken by you into a third place to prevent duplication, are very similar but not exactly identical, then you need to exploit your brain a little bit more. Do the following:
For example, we have two original functions in different classes that are like these (fragments 1 and 2 of the second object are exactly as fragments 1 and 2 accordingly of the first class, but the DataObjects are different):
SomeFunction() of the first class:
[fragment 1] _entityDescription = "Car"; [fragment 2]
SomeFunction() of the second class:
[fragment 1] // exactly like [fragment 1] in the first class _entityDescription = "Bike"; // oops, this line is different... [fragment 2] // exactly like [fragment 2] in the first class
SomeFunction() taken into the ancestor class:
[fragment 1] _entityDescription = this.GetEntityDescription(); [fragment 2]
So, we use the function GetEntityDescription
to overcome the problem of
difference between two the discussed functions. GetEntityDescription
is created in the
ancestor class. Create it as an abstract function if your programming language supports them. If doesn't support (so you are forced to implement the function in the ancestor level) then display an error message or throw an exception letting know that this code must not be ever called. The function must be implemented in the descendant classes to supply circumstances-specific information, i.e. entities descriptions in our example: in the first descendant the function should be coded as
return "Car";
, in the second one - as return "Bike";
.
If the function is taken into a third class (which doesn't belong to the
inheritance hierarchy) then the SPECIFIC (different) data can be
supplied as argument(s) of the new merged function, so the fragment
_entityDescription = this.GetEntityDescription();
will become _entityDescription = aEntityDescription;
where
aEntityDescription
is the function's argument.
Finally, there is one
more method to achieve the goal - we can populate the _entityDescription
instance
variable while initializing the instance (for example, in its
constructor), but this approach is not always applicable.
So, that is the rule to obey:
DIFFERENT PARTS OF THE APPLICATION MUST SUPPLY SPECIFIC (UNCOMMON) DATA TO A GENERIC (UNIVERSAL) ALGORITHM IMPLEMENTED ONLY ONCE.
In one of my projects the old problem of duplicated "almost same" code was so serious that I suggested to perform refactoring of entire application sub-system, and my suggestion was accepted - I spent a week doing that, and it was one of the most interesting tasks in my career! As a result, my resume was decorated with the following beautiful fragment:
Proposed and performed code refactoring of existing corporative reporting system, moving duplicating functionality from descendant to ancestor classes. Facilitated maintenance and troubleshooting and made adding of new reports much easier and quicker.
But, of course, it's better to spend some time before development and think how to organize classes instead of thoughtless straightforward coding which forces the keys Ctrl+C and Ctrl+V to work hard.
A given object should assume as little as possible about the structure or properties of anything else (including its subcomponents) (
In simpler words: if you want to get data from an object, referenced by a variable, then use a public function, declared in the referenced object itself, but not variables and functions of its nested object(s).
You can find an example in the page for PowerBuilder developers.
If a function of a class will be accessed only from inside the class then declare that function as private
(if also from descendants then as protected
).
If you do so, then developers, using the class, will immediately see its "outer world" interface: may be, they want only to learn how to use the class as a "black box" - in that case the implementation details (like private
functions) are not needed. If a function, which is practically used privately, is mistakenly declared as public
, then the class consumers can think they can (or even should!) use it to work with the class. And oppositely: if a developer is working on the class itself (changing or increasing its functionality), then he/she cannot be 100% sure that a public
function is not called from somewhere outside, and is forced to apply the global search if that function's logic or signature must be changed.
<< prev | CONTENTS | next >> |