Designing for Maintainability and Extensibility

There are a few principles by which one can design and implement code to be maintainable and extensible, which is tantamount to saying that it can be easily modified, tested, and understood.

    He has your giant program. He has no time to read it all, much less understand it. He wants to rapidly find the place to make his change, make it and get out and have no unexpected side effects from the change.

        -- 'How to Write Unmaintainable Code'

Use Cases

The maintainer and the extender of your software are similar in their approach, which can be summarized as 'pinpoint, change, test'. Each stage of this approach can be made simpler.

Pinpoint

To pinpoint the relevant part of the code, your code should have a clear functional decomposition. That is, there is usually some observable effect which the end-user wants to change, and there should be a clear procedure or algorithm which the maintainer may follow to locate the correct part of the code. For single-directory projects, it may be sufficient to have the user grep the files for appropriate keywords or for fixed strings taken from the program's output. However, this becomes difficult as the size of the system increases. A typical sequence for a larger system might be: read the README to determine the overall structure of the directory hierarchy, including the documentation area; read the high-level design documentation to become familiar with the topmost level of design objects you've chosen to model your application domain; and finally, change into the subdirectory of the source code area named after the relevant top-level design object.

I believe that separation of documentation from the source code is a fundamentally bad idea, for the following reasons. First, it makes it difficult to divide the source code along functional boundaries (for example, making part of the source code into a library) and still retain the supporting documentation. Second, it breaks what library scientists call the 'rule of application': you catalog documentation about foo with foo, not under documentation. File extensions are a convention that serves nicely to separate the formats or types of files.

Change

The code itself should be well-structured and documented. Since most search facilities work over the entire file, hierarchical structuring is less important inside a file than it is across multiple levels of subdirectories. Good (clear) coding practices have been expounded upon elsewhere, and I won't rehash second-hand material.

Test

Flexibility

Perhaps least understood is structuring code for simplicity of testing. The most important part of testability is to decouple the subsystem to be tested from the other subsystems. The facility for doing this must be accessible without writing or changing code! That is, a maintainer or extender should be able to test this part of the code and then proceed directly to distributing the modified code, without turning debugging statements 'on' and 'off'. Making such code changes not only burdens the maintainer and extender, it also introduces the possibility that they will forget to remove the changes before committing the code; don't laugh, this has _already happened_ here.

Placing this control in configuration files is to be eschewed for the same reason as placing it in code: it costs an extra editor session (at minimum) per change, and the increased difficulty of automation leads to the virtual exclusion of automated tests, since it is difficult to automate 'editing a file' in a way that is not brittle with respect to changes in that file's contents and structure.

For complete executables and scripts under Unix, I think the way to provide this control is via command-line switches or (less desirably) environment variables. Command-line switches are preferred because environment variables are usually set on a semi-permanent basis (e.g. in dot-files, or on the command line per session), whereas command-line arguments are generally understood to pertain to a single command. A maintainer or extender will likely wish to switch back and forth between 'standard' and 'debugging' modes, which again suggests command-line arguments (including 'switches' and 'options'). There are ways to set environment variables for a single command, but these are somewhat less convenient than command-line arguments.
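As a minimal sketch of this approach (the switch name and the behavior behind it are hypothetical, not taken from any particular program), a Unix program might expose its test mode through a single command-line switch parsed with getopt(3):

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    /* Hypothetical: -t selects test mode, e.g. reading events from
     * standard input instead of the live source. No source edits are
     * needed to toggle it, so nothing needs backing out before commit. */
    int test_mode = 0;

    int
    main(int argc, char *argv[])
    {
        int c;

        while ((c = getopt(argc, argv, "t")) != -1) {
            switch (c) {
            case 't':
                test_mode = 1;
                break;
            default:
                fprintf(stderr, "usage: %s [-t]\n", argv[0]);
                exit(1);
            }
        }

        if (test_mode)
            printf("running in test mode\n");
        /* ... normal processing ... */
        return 0;
    }

The maintainer can then flip between modes per invocation ('prog' versus 'prog -t'), with no editor session and no stray debugging changes left in the source.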
Often the difference between a 'black box' piece of code and a 'white box' code fragment designed for testability is relatively small and easy to write. Frequently, in fact, it is a smaller and more robust change than writing a test through some kind of interactive interface, and it is almost always faster to write and easier to maintain (i.e. changes in the UI don't break your tests). Here's a justification, quoting from :

    I don't like user interface-based tests. In my experience, tests based on user interface scripts are too brittle to be useful. When I was on a project where we used user interface testing, it was common to arrive in the morning to a test report with twenty or thirty failed tests. A quick examination would show that most or all of the failures were actually the program running as expected. Some cosmetic change in the interface had caused the actual output to no longer match the expected output. Our testers spent more time keeping the tests up to date and tracking down false failures and false successes than they did writing new tests.

Similarly, one can write code which 'wraps' existing code, perhaps shadowing (overloading) methods (functions, symbols), to make it more testable. However, this frequently results in code duplication that could be eliminated by making changes to the source instead.

Fragility

Fragile systems are systems which require a great deal of coaxing to run. Such systems typically require executables to be in particular places (instead of using the system PATH), require various environment variables to be set, and require several configuration files to be edited before they will execute. By contrast, a robust system tries to figure out the most common running configuration based on what it can discern about its environment.

Test Harness Code

For subsystems which are not modeled as a complete script or executable, it may be handy to have a piece of 'test harness' code which wraps the subsystem into a complete executable for testing. Such a piece of code must map from a common non-programmatic idiom (such as a standard input stream on a Unix system) to the application-domain object used as input to the subsystem (for example, an 'event' of some kind). A similar mapping should be done for output. This allows the maintainer and extender to use common (command-line) tools for testing.

For example, the ipfilter project has a tool, ipftest(1), which maps a Unix stream in one of several formats to simulated IP packets and tests them against a given ruleset. This enables people to store and manipulate input/output combinations as files, and means that the testing can be automated into a single command (e.g. 'make test'). Similarly, if you wrote a multiple-precision arithmetic class, you might write some test harness code which read integers and operations from standard input and wrote its results to standard output (a la dc(1)), and then testers could script together known-value tests which exercised most of the functionality.
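Here is a minimal sketch of such a harness. The multiple-precision class itself is assumed; to keep the sketch self-contained, a native C long stands in for it, but the shape of the harness is the point: one test vector per input line, one result per output line.

    #include <stdio.h>

    /* Stand-in for the multiple-precision type under test; a real
     * harness would parse each operand into the class and call its
     * methods instead of using native arithmetic. */
    typedef long mp_t;

    int
    main(void)
    {
        mp_t a, b, r;
        char op;

        /* One test vector per line: "<a> <op> <b>". */
        while (scanf("%ld %c %ld", &a, &op, &b) == 3) {
            switch (op) {
            case '+': r = a + b; break;
            case '-': r = a - b; break;
            case '*': r = a * b; break;
            default:
                fprintf(stderr, "unknown op '%c'\n", op);
                return 1;
            }
            printf("%ld\n", r);
        }
        return 0;
    }

Known-value tests then become pairs of input and expected-output files, compared with diff(1) from a 'make test' target.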
Drivers and Stubs, Oh My!

Basically, if you have a piece of code to be tested, B, which is used by class A and which in turn uses class C, then to isolate and test B you will probably need a 'driver' to replace A, and possibly some 'stubs' to replace C. If class C has side effects (e.g. modifying the file system), you will almost certainly want stubs. These stubs can sometimes be empty, but in cases where the class keeps local state you may have to model that state; for example, if class C deals with files and class B manipulates and tests their offsets, you will probably have to track and simulate file offsets.
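A minimal sketch of this arrangement, with hypothetical names throughout: main() below is the driver standing in for A, and the c_* functions are stubs standing in for a file-handling C, modeling only the offset state that B depends on.

    #include <assert.h>
    #include <stdio.h>

    /* --- Stubs replacing class C: a fake file that models only the
     * current offset; no real I/O happens. --- */
    static long fake_offset = 0;

    long c_seek(long off) { fake_offset = off; return fake_offset; }
    long c_read(long n)   { fake_offset += n; return n; }
    long c_tell(void)     { return fake_offset; }

    /* --- Code under test, B (hypothetical): walks through a 'file'
     * in fixed-size records and reports the offset of record i. --- */
    long
    b_record_offset(int i, long recsize)
    {
        c_seek(0);
        while (i-- > 0)
            c_read(recsize);
        return c_tell();
    }

    /* --- Driver replacing class A: exercises B against known values. --- */
    int
    main(void)
    {
        assert(b_record_offset(0, 512) == 0);
        assert(b_record_offset(3, 512) == 1536);
        printf("ok\n");
        return 0;
    }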
Direct Internal Access

Typically your unit tests must have direct access to, and knowledge of, the internals of the code under test. For example, suppose you have a state machine whose transition tables contain no circuits; then you must 'reset' the state machine each time in order to test multiple paths. If the number of paths through the state machine (or through your code) is high, and the paths share a large number of segments, it may be inefficient (of the code writer's time, or of system resources) to traverse the common segments repeatedly just to reach the parts where the paths differ.

Finally, although it is tempting to write tests entirely in terms of externally-viewable attributes, it may be insufficient to do so. For example, if your class is a state machine which models a vending machine accepting coins for a $0.50 soda, it may only have methods such as accept_coin(value) and dispense_product(), the latter of which returns true if there is enough money and returns change to the customer as a side effect. However, this does not tell us whether, after inserting two quarters, there is too much credit in the machine. XXX This is a weak example.

Quoting from :

    One final bit of philosophy. It is tempting to set up a bunch of test data, then run a bunch of tests, then clean up. In my experience, this always causes more problems than it is worth. Tests end up interacting with one another, and a failure in one test can prevent subsequent tests from running. The testing framework makes it easy to set up a common set of test data, but the data will be created and thrown away for each test. The potential performance problems with this approach shouldn't be a big deal because suites of tests can run unobserved.

Flexible I/O

Subsystems which process input from other (sub)systems should have a simple mechanism for hooking that input up to another source. Similarly, subsystems which produce output meant for other subsystems should have a mechanism for redirecting that output as well.

Time/Event Simulation

For those systems which have time dependencies, it is best to eliminate the dependencies while testing. A common method for this is to feed the subsystem a file of events with a simulated time value attached to each event; the system then steps through virtual time, processing the requests. This is a technique taken from simulation (e.g. civil engineering applications, like traffic simulation). To deal with callouts (i.e. delayed callbacks into your code), you need a heap of pending callouts. When you are done processing the current event, you advance the clock to the minimum of the time of the next event in the input file and the time of the top item on the heap. Repeat this process in your driver loop.
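Here is a minimal sketch of such a driver loop. The event format and handler names are hypothetical, and a real implementation would keep a proper heap rather than the single pending callout used here:

    #include <stdio.h>

    static long now = 0;           /* virtual time; the real clock is never read */
    static long callout_time = -1; /* -1: no callout pending */
    static void (*callout_fn)(void);

    /* Code under test schedules delayed work through this. */
    void
    schedule_callout(long delay, void (*fn)(void))
    {
        callout_time = now + delay;
        callout_fn = fn;
    }

    static void beep(void) { printf("%ld: beep\n", now); }

    static void
    handle_event(long t, const char *name)
    {
        printf("%ld: event %s\n", t, name);
        schedule_callout(5, beep);  /* pretend the handler arms a timer */
    }

    int
    main(void)
    {
        long t;
        char name[32];

        /* Input file: one "<time> <name>" pair per line, times ascending. */
        while (scanf("%ld %31s", &t, name) == 2) {
            /* Fire any callout due before the next input event. */
            while (callout_time >= 0 && callout_time <= t) {
                now = callout_time;
                callout_time = -1;
                callout_fn();
            }
            now = t;                /* advance the virtual clock */
            handle_event(t, name);
        }
        while (callout_time >= 0) { /* drain remaining callouts */
            now = callout_time;
            callout_time = -1;
            callout_fn();
        }
        return 0;
    }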
System Tests Aren't Enough

System tests typically require more setup and more effort to run. They may require installation of a completed program into a standard directory, which makes system testing dangerous (in that it can break a working system). System tests also depend on the proper functioning of the entire system: if developer A breaks his or her code, and developer B checks that code out, then developer B will be unable to test his or her own code. Also, a change in one subsystem may affect the subsystem under test in a way that is difficult to discern immediately. Error reporting may be absent, or less precise, at the system level. Control of the system (simple startup and takedown, needed for automated testing) is typically more complicated. Furthermore, we have the direct internal state problem (described above) as well as the UI issues (described above under Flexibility).

Interactive Tests Are a Waste of Your Time

Interactive tests, ones which require human interaction, require a fixed amount of developer time for each run. This is an inefficient use of your resources, particularly if you intend to do regression testing, defined here as testing that recent changes haven't broken earlier changes. Testing something interactively tests it right then, and never again; the same amount of effort could be put into a test which is run automatically by anyone who can type 'make test' (that is, a non-expert). Since the number of tests you have run increases with the square of the number of components in your system, interactive testing will quickly get out of control. Far better to write tests once and run them automatically. There are some cases where this may be difficult or impossible, for example when testing the user interface.

By Extension

Where we expect end-users or other programmers to significantly extend our code, it behooves us (the collective 'us') to make such additions simple, easy, and free of errors. This is true of all code, to be sure, but extending code is normally done along fairly few dimensions (that is, it is usually somewhat linear). This means that by investing a little time in an easy-to-use test framework which will be re-used over and over, future extenders need not understand our code completely, and every extension helps offset the up-front cost of writing the framework.

Test Tap

In some cases, as in the electronics manufacturing industry, it may be difficult to synthetically generate the signals (data) travelling from one section of the chip (code) to another. In such cases it is desirable to create a 'test tap' which records all the data crossing a given interface at runtime, for later playback to drive the receiving side of that interface.

Positive and Negative Tests

One common testing methodology for recognizers (that is, tools that trigger on or recognize some inputs and not others) is to include both positive and negative tests. In both cases, more commonly the negative one, it may be impossible to completely characterize the space, but a few representative samples (some close to the border, some far from it) can generally catch gross errors, which is about as good as you're going to get. Some psychology, and some knowledge of common coding mistakes in the source language, can assist you here (e.g. off-by-one errors, strict versus loose inequalities).

An End to Time

Obviously it is difficult to run a program that never exits within a test framework. One solution is to provide an exit condition that only occurs while testing (i.e. 'if (test_flag) break;'). Another is to count down from a number and exit the Nth time through the loop (again, only if relevant).

Network Testing and Interoperability

A network program, or a program that primarily operates with another program via IPC of some kind, presents unusual challenges. To drive the test, one method is to fork a process which plays the 'other half' (a minimal sketch appears at the end of this document). To prevent problems in implementation from cancelling each other out, it is desirable for the two cooperating programs to share as little code as possible. In fact, optimally, the other half should run on a system with a different byte order and word size (to flush out machine dependencies). In the RFC process, a draft is only accepted once another person has created an interoperable implementation.

File Portability

The most common problem in file portability is unspecified byte ordering within a machine word. Typically, applications write out an integer without going through a macro to convert it to an architecture-neutral form. This applies to timestamps and to any object which is not an array of bytes. Structure padding and alignment are similarly typical problems. To flush these out, it may be necessary to test the byte ordering explicitly, i.e. to write out a file containing, for example, 0x12345678, and then access it as bytes, reading a particular offset and checking that the stored value is what you expect from the specification.
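A minimal sketch of the byte-ordering check just described; the file name and the big-endian expectation are assumptions for illustration:

    #include <stdio.h>
    #include <stdint.h>

    int
    main(void)
    {
        uint32_t word = 0x12345678;
        unsigned char buf[4];
        FILE *fp;
        int i;

        /* Write the word exactly as it sits in memory (the bug under
         * discussion: no conversion to an architecture-neutral form). */
        fp = fopen("order.dat", "wb");
        fwrite(&word, sizeof word, 1, fp);
        fclose(fp);

        /* Read it back as raw bytes and compare against the on-disk
         * layout the file format specifies (here assumed big-endian). */
        fp = fopen("order.dat", "rb");
        fread(buf, 1, 4, fp);
        fclose(fp);

        for (i = 0; i < 4; i++)
            printf("byte %d: 0x%02x\n", i, buf[i]);
        if (buf[0] != 0x12)
            printf("stored format is not big-endian as specified\n");
        return 0;
    }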
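Returning to the network-testing discussion above, here is the promised minimal sketch of forking a process to play the 'other half', connected over a socketpair; the one-line PING/PONG exchange stands in for a real protocol:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int
    main(void)
    {
        int sv[2];
        char buf[16];
        pid_t pid;

        if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1) {
            perror("socketpair");
            exit(1);
        }
        pid = fork();
        if (pid == 0) {
            /* Child: the 'other half', which should share as little
             * code as possible with the implementation under test. */
            close(sv[0]);
            read(sv[1], buf, sizeof(buf));
            write(sv[1], "PONG\n", 5);
            _exit(0);
        }
        /* Parent: drive the code under test; here the 'protocol' is
         * reduced to a single request/response exchange. */
        close(sv[1]);
        write(sv[0], "PING\n", 5);
        memset(buf, 0, sizeof(buf));
        read(sv[0], buf, sizeof(buf) - 1);
        printf("peer said: %s", buf);
        waitpid(pid, NULL, 0);
        return 0;
    }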