Designing for Maintainability and Extensibility

There are a few principles by which one can design and implement code to be maintainable and extensible, which is tantamount to saying that it can be easily modified, tested, and understood.

    He has your giant program. He has no time to read it all, much less understand it. He wants to rapidly find the place to make his change, make it and get out and have no unexpected side effects from the change.

        -- 'How to Write Unmaintainable Code'

Use Cases

The maintainer and the extender of your software are similar in their approach, which can be summarized as 'pinpoint, change, test'. Each stage of this approach can be made simpler.

Pinpoint

To pinpoint the relevant part of the code, your code should have a clear functional decomposition. That is, there is usually some observable effect which the end-user wants to change, and there should be a clear procedure or algorithm which the maintainer may follow to locate the correct part of the code. For single-directory projects, it may be sufficient to have the user grep the files for appropriate keywords or for fixed strings taken from the program's output. However, this becomes difficult as the size of the system increases. A typical sequence for a larger system might be: read the README to determine the overall structure of the directory hierarchy, including the documentation area; read the high-level design documentation to become familiar with the topmost level of design objects you've chosen to model your application domain; and finally, change into the subdirectory of the source code area named after the relevant top-level design object.

I believe that separation of documentation from the source code is a fundamentally bad idea, for the following reasons. First, it makes it difficult to divide the source code along functional boundaries (for example, making part of the source code into a library) and still retain the supporting documentation. Second, it breaks what library scientists call the 'rule of application': you catalog documentation about foo with foo, not under documentation. File extensions are a convention that serves nicely to separate the formats or types of files.

Change

The code itself should be well-structured and documented. Since most search facilities work over the entire file, hierarchical structuring is less important inside a file than it is across multiple levels of subdirectories. Good (clear) coding practices have been expounded upon elsewhere, and I won't rehash second-hand material.

Test

Flexibility

Perhaps least understood is structuring code for simplicity of testing. The most important part of testability is to decouple the subsystem to be tested from the other subsystems. The facility for doing this must be accessible without writing or changing code! That is, a maintainer or extender should be able to test this part of the code and then proceed directly to distributing the modified code, without turning debugging statements 'on' and 'off'. Making such code changes not only burdens the maintainer and extender, it also introduces the possibility that they will forget to remove the changes before committing the code; don't laugh, this has _already happened_ here.

Placing this control in configuration files is to be eschewed for the same reason as placing it in code: it costs an extra editor session (at minimum) per change, and the increased difficulty of automation leads to the virtual exclusion of automated tests, since it is difficult to automate 'editing a file' in a way that is not brittle with respect to changes in that file's contents and structure.

For complete executables and scripts under Unix, I think the way to provide this control is via command-line switches or (less desirably) environment variables. Command-line switches are preferred because environment variables are usually set on a semi-permanent basis (e.g. in dot-files, or on the command line per session), whereas command-line arguments are generally understood to pertain to a single command. A maintainer or extender will likely wish to switch back and forth between 'standard' and 'debugging' modes, which again suggests command-line arguments (including 'switches' and 'options'). There are ways to set environment variables for a single command, but these are somewhat less convenient than command-line arguments.
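As a minimal sketch of this approach (the switch name and the behavior behind it are hypothetical, not taken from any particular program), a Unix program might expose its test mode through a single command-line switch parsed with getopt(3):

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    /* Hypothetical: -t selects test mode, e.g. reading events from
     * standard input instead of the live source. No source edits are
     * needed to toggle it, so nothing needs backing out before commit. */
    int test_mode = 0;

    int
    main(int argc, char *argv[])
    {
        int c;

        while ((c = getopt(argc, argv, "t")) != -1) {
            switch (c) {
            case 't':
                test_mode = 1;
                break;
            default:
                fprintf(stderr, "usage: %s [-t]\n", argv[0]);
                exit(1);
            }
        }

        if (test_mode)
            printf("running in test mode\n");
        /* ... normal processing ... */
        return 0;
    }

The maintainer can then flip between modes per invocation ('prog' versus 'prog -t'), with no editor session and no stray debugging changes left in the source.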
Often the difference between a 'black box' piece of code and a 'white box' code fragment designed for testability is relatively small and easy to write. Frequently, in fact, it is a smaller and more robust change than writing a test through some kind of interactive interface, and it is almost always faster to write and easier to maintain (i.e. changes in the UI don't break your tests). Here's a justification, quoting from :

    I don't like user interface-based tests. In my experience, tests based on user interface scripts are too brittle to be useful. When I was on a project where we used user interface testing, it was common to arrive in the morning to a test report with twenty or thirty failed tests. A quick examination would show that most or all of the failures were actually the program running as expected. Some cosmetic change in the interface had caused the actual output to no longer match the expected output. Our testers spent more time keeping the tests up to date and tracking down false failures and false successes than they did writing new tests.

Similarly, one can write code which 'wraps' existing code, perhaps shadowing (overloading) methods (functions, symbols), to make it more testable. However, this frequently results in code duplication that could be eliminated by making changes to the source instead.

Fragility

Fragile systems are systems which require a great deal of coaxing to run. Such systems typically require executables to be in particular places (instead of using the system PATH), require various environment variables to be set, and require several configuration files to be edited before they will execute. By contrast, a robust system tries to figure out the most common running configuration based on what it can discern about its environment.

Test Harness Code

For subsystems which are not modeled as a complete script or executable, it may be handy to have a piece of 'test harness' code which wraps the subsystem into a complete executable for testing. Such a piece of code must map from a common non-programmatic idiom (such as a standard input stream on a Unix system) to the application-domain object used as input to the subsystem (for example, an 'event' of some kind). A similar mapping should be done for output. This allows the maintainer and extender to use common (command-line) tools for testing.

For example, the ipfilter project has a tool, ipftest(1), which maps a Unix stream in one of several formats to simulated IP packets and tests them against a given ruleset. This enables people to store and manipulate input/output combinations as files, and means that the testing can be automated into a single command (e.g. 'make test'). Similarly, if you wrote a multiple-precision arithmetic class, you might write some test harness code which read integers and operations from standard input and wrote its results to standard output (a la dc(1)), and then testers could script together known-value tests which exercised most of the functionality.
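Here is a minimal sketch of such a harness. The multiple-precision class itself is assumed; to keep the sketch self-contained, a native C long stands in for it, but the shape of the harness is the point: one test vector per input line, one result per output line.

    #include <stdio.h>

    /* Stand-in for the multiple-precision type under test; a real
     * harness would parse each operand into the class and call its
     * methods instead of using native arithmetic. */
    typedef long mp_t;

    int
    main(void)
    {
        mp_t a, b, r;
        char op;

        /* One test vector per line: "<a> <op> <b>". */
        while (scanf("%ld %c %ld", &a, &op, &b) == 3) {
            switch (op) {
            case '+': r = a + b; break;
            case '-': r = a - b; break;
            case '*': r = a * b; break;
            default:
                fprintf(stderr, "unknown op '%c'\n", op);
                return 1;
            }
            printf("%ld\n", r);
        }
        return 0;
    }

Known-value tests then become pairs of input and expected-output files, compared with diff(1) from a 'make test' target.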
Drivers and Stubs, Oh My!

Basically, if you have a piece of code to be tested, B, which is used by class A and which in turn uses class C, then to isolate and test B you will probably need a 'driver' to replace A, and possibly some 'stubs' to replace C. If class C has side effects (e.g. modifying the file system), you will almost certainly want stubs. These stubs can sometimes be empty, but in cases where the class keeps local state you may have to model that state; for example, if class C deals with files and class B manipulates and tests their offsets, you will probably have to track and simulate file offsets.
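A minimal sketch of this arrangement, with hypothetical names throughout: main() below is the driver standing in for A, and the c_* functions are stubs standing in for a file-handling C, modeling only the offset state that B depends on.

    #include <assert.h>
    #include <stdio.h>

    /* --- Stubs replacing class C: a fake file that models only the
     * current offset; no real I/O happens. --- */
    static long fake_offset = 0;

    long c_seek(long off) { fake_offset = off; return fake_offset; }
    long c_read(long n)   { fake_offset += n; return n; }
    long c_tell(void)     { return fake_offset; }

    /* --- Code under test, B (hypothetical): walks through a 'file'
     * in fixed-size records and reports the offset of record i. --- */
    long
    b_record_offset(int i, long recsize)
    {
        c_seek(0);
        while (i-- > 0)
            c_read(recsize);
        return c_tell();
    }

    /* --- Driver replacing class A: exercises B against known values. --- */
    int
    main(void)
    {
        assert(b_record_offset(0, 512) == 0);
        assert(b_record_offset(3, 512) == 1536);
        printf("ok\n");
        return 0;
    }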
Direct Internal Access

Typically your unit tests must have direct access to, and knowledge of, the internals of the code under test. For example, suppose you have a state machine whose transition tables contain no circuits; then you must 'reset' the state machine each time in order to test multiple paths. If the number of paths through the state machine (or through your code) is high, and the paths share a large number of segments, it may be inefficient (of the code writer's time, or of system resources) to traverse the common segments repeatedly just to reach the parts where the paths differ.

Finally, although it is tempting to write tests entirely in terms of externally-viewable attributes, it may be insufficient to do so. For example, if your class is a state machine which models a vending machine accepting coins for a $0.50 soda, it may only have methods such as accept_coin(value) and dispense_product(), the latter of which returns true if there is enough money and returns change to the customer as a side effect. However, this does not tell us whether, after inserting two quarters, there is too much credit in the machine. XXX This is a weak example.

Quoting from :

    One final bit of philosophy. It is tempting to set up a bunch of test data, then run a bunch of tests, then clean up. In my experience, this always causes more problems than it is worth. Tests end up interacting with one another, and a failure in one test can prevent subsequent tests from running. The testing framework makes it easy to set up a common set of test data, but the data will be created and thrown away for each test. The potential performance problems with this approach shouldn't be a big deal because suites of tests can run unobserved.

Flexible I/O

Subsystems which process input from other (sub)systems should have a simple mechanism for hooking that input up to another source. Similarly, subsystems which produce output meant for other subsystems should have a mechanism for redirecting that output as well.

Time/Event Simulation

For those systems which have time dependencies, it is best to eliminate the dependencies while testing. A common method for this is to feed the subsystem a file of events with a simulated time value attached to each event; the system then steps through virtual time, processing the requests. This is a technique taken from simulation (e.g. civil engineering applications, like traffic simulation). To deal with callouts (i.e. delayed callbacks into your code), you need a heap of pending callouts. When you are done processing the current event, you advance the clock to the minimum of the time of the next event in the input file and the time of the top item on the heap. Repeat this process in your driver loop.
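Here is a minimal sketch of such a driver loop. The event format and handler names are hypothetical, and a real implementation would keep a proper heap rather than the single pending callout used here:

    #include <stdio.h>

    static long now = 0;           /* virtual time; the real clock is never read */
    static long callout_time = -1; /* -1: no callout pending */
    static void (*callout_fn)(void);

    /* Code under test schedules delayed work through this. */
    void
    schedule_callout(long delay, void (*fn)(void))
    {
        callout_time = now + delay;
        callout_fn = fn;
    }

    static void beep(void) { printf("%ld: beep\n", now); }

    static void
    handle_event(long t, const char *name)
    {
        printf("%ld: event %s\n", t, name);
        schedule_callout(5, beep);  /* pretend the handler arms a timer */
    }

    int
    main(void)
    {
        long t;
        char name[32];

        /* Input file: one "<time> <name>" pair per line, times ascending. */
        while (scanf("%ld %31s", &t, name) == 2) {
            /* Fire any callout due before the next input event. */
            while (callout_time >= 0 && callout_time <= t) {
                now = callout_time;
                callout_time = -1;
                callout_fn();
            }
            now = t;                /* advance the virtual clock */
            handle_event(t, name);
        }
        while (callout_time >= 0) { /* drain remaining callouts */
            now = callout_time;
            callout_time = -1;
            callout_fn();
        }
        return 0;
    }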
System Tests Aren't Enough

System tests typically require more setup and more effort to run. They may require installation of a completed program into a standard directory, which makes system testing dangerous (in that it can break a working system). System tests also depend on the proper functioning of the entire system: if developer A breaks his or her code, and developer B checks that code out, then developer B will be unable to test his or her own code. Also, a change in one subsystem may affect the subsystem under test in a way that is difficult to discern immediately. Error reporting may be absent, or less precise, at the system level. Control of the system (simple startup and takedown, needed for automated testing) is typically more complicated. Furthermore, we have the direct internal state problem (described above) as well as the UI issues (described above under Flexibility).

Interactive Tests Are a Waste of Your Time

Interactive tests, ones which require human interaction, require a fixed amount of developer time for each run. This is an inefficient use of your resources, particularly if you intend to do regression testing, defined here as testing that recent changes haven't broken earlier changes. Testing something interactively tests it right then, and never again; the same amount of effort could be put into a test which is run automatically by anyone who can type 'make test' (that is, a non-expert). Since the number of tests you have run increases with the square of the number of components in your system, interactive testing will quickly get out of control. Far better to write tests once and run them automatically. There are some cases where this may be difficult or impossible, for example when testing the user interface.

By Extension

Where we expect end-users or other programmers to significantly extend our code, it behooves us (the collective 'us') to make such additions simple, easy, and free of errors. This is true of all code, to be sure, but extending code is normally done along fairly few dimensions (that is, it is usually somewhat linear). This means that by investing a little time in an easy-to-use test framework which will be re-used over and over, future extenders need not understand our code completely, and every extension helps offset the up-front cost of writing the framework.

Test Tap

In some cases, as in the electronics manufacturing industry, it may be difficult to synthetically generate the signals (data) travelling from one section of the chip (code) to another. In such cases it is desirable to create a 'test tap' which records all the data crossing a given interface at runtime, for later playback to drive the receiving side of that interface.

Positive and Negative Tests

One common testing methodology for recognizers (that is, tools that trigger on or recognize some inputs and not others) is to include both positive and negative tests. In both cases, more commonly the negative one, it may be impossible to completely characterize the space, but a few representative samples (some close to the border, some far from it) can generally catch gross errors, which is about as good as you're going to get. Some psychology, and some knowledge of common coding mistakes in the source language, can assist you here (e.g. off-by-one errors, strict versus loose inequalities).

An End to Time

Obviously it is difficult to run a program that never exits within a test framework. One solution is to provide an exit condition that only occurs while testing (i.e. 'if (test_flag) break;'). Another is to count down from a number and exit the Nth time through the loop (again, only if relevant).

Network Testing and Interoperability

A network program, or a program that primarily operates with another program via IPC of some kind, presents unusual challenges. To drive the test, one method is to fork a process which plays the 'other half' (a minimal sketch appears at the end of this document). To prevent problems in implementation from cancelling each other out, it is desirable for the two cooperating programs to share as little code as possible. In fact, optimally, the other half should run on a system with a different byte order and word size (to flush out machine dependencies). In the RFC process, a draft is only accepted once another person has created an interoperable implementation.

File Portability

The most common problem in file portability is unspecified byte ordering within a machine word. Typically, applications write out an integer without going through a macro to convert it to an architecture-neutral form. This applies to timestamps and to any object which is not an array of bytes. Structure padding and alignment are similarly typical problems. To flush these out, it may be necessary to test the byte ordering explicitly, i.e. to write out a file containing, for example, 0x12345678, and then access it as bytes, reading a particular offset and checking that the stored value is what you expect from the specification.
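A minimal sketch of the byte-ordering check just described; the file name and the big-endian expectation are assumptions for illustration:

    #include <stdio.h>
    #include <stdint.h>

    int
    main(void)
    {
        uint32_t word = 0x12345678;
        unsigned char buf[4];
        FILE *fp;
        int i;

        /* Write the word exactly as it sits in memory (the bug under
         * discussion: no conversion to an architecture-neutral form). */
        fp = fopen("order.dat", "wb");
        fwrite(&word, sizeof word, 1, fp);
        fclose(fp);

        /* Read it back as raw bytes and compare against the on-disk
         * layout the file format specifies (here assumed big-endian). */
        fp = fopen("order.dat", "rb");
        fread(buf, 1, 4, fp);
        fclose(fp);

        for (i = 0; i < 4; i++)
            printf("byte %d: 0x%02x\n", i, buf[i]);
        if (buf[0] != 0x12)
            printf("stored format is not big-endian as specified\n");
        return 0;
    }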
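Returning to the network-testing discussion above, here is the promised minimal sketch of forking a process to play the 'other half', connected over a socketpair; the one-line PING/PONG exchange stands in for a real protocol:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int
    main(void)
    {
        int sv[2];
        char buf[16];
        pid_t pid;

        if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1) {
            perror("socketpair");
            exit(1);
        }
        pid = fork();
        if (pid == 0) {
            /* Child: the 'other half', which should share as little
             * code as possible with the implementation under test. */
            close(sv[0]);
            read(sv[1], buf, sizeof(buf));
            write(sv[1], "PONG\n", 5);
            _exit(0);
        }
        /* Parent: drive the code under test; here the 'protocol' is
         * reduced to a single request/response exchange. */
        close(sv[1]);
        write(sv[0], "PING\n", 5);
        memset(buf, 0, sizeof(buf));
        read(sv[0], buf, sizeof(buf) - 1);
        printf("peer said: %s", buf);
        waitpid(pid, NULL, 0);
        return 0;
    }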