Greg Sabo

How Much Testing is Enough?

Testing is definitely not the only way to improve the quality of software, but because it yields clear artifacts, it is the most quantifiable. This measurability has made software testing very popular in business environments. The formula seems simple enough: More Tests yield Better Software.

However, unless tests can be written which enumerate every possible combination of inputs, it is impossible to use tests to prove that a project is bug-free. This uncertainty matters most when a developer has to decide when to stop testing. How much is enough?

Providing a measurement of test development progress helps change testing from a perfectionist fantasy into part of an everyday, deadline-meeting software development workflow.

Code Coverage Percentage

Code coverage tools trace the execution of a program and indicate the percentage of the codebase touched by the tests. They can effectively direct test development towards blocks of code which have been completely neglected. However, simply because a line of code is ‘touched’ by a test doesn’t mean that it is bug-free.

For example, consider the following Python code:

import logging

logger = logging.getLogger(__name__)


def validate_dict(in_dict):
    """
    Recursively check that in_dict contains values
    which are only strings or dicts.
    Returns False if validation fails.
    """
    for key, value in in_dict.items():
        if key == 'userName':
            # the following line is not covered by tests
            logger.warning("Noticed deprecated key userName.")

        if not (isinstance(value, str) or
                isinstance(value, dict)):
            return False

        if isinstance(value, dict):
            validate_dict(value)
    # no False was returned, we're good
    return True

Focusing on code coverage will steer you toward writing a test for the case where the input dict includes a ‘userName’ key. That time would probably be better spent discovering the bug in the recursive call to validate_dict: its return value is discarded, so an invalid value nested inside a sub-dict never makes validation fail. A test that passes a valid nested dict executes that line and passes, so the coverage report shows nothing wrong. Code coverage tools can locate untested code, but systematically enforcing target coverage percentages is likely to draw attention away from the code that needs testing the most.
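To make this concrete, here is a minimal sketch, assuming Python 3 and only the standard library. It re-declares a trimmed copy of validate_dict (logging branch dropped, bug kept) and runs it under a toy line tracer built on sys.settrace, the interpreter hook that tracing-based coverage tools can build on. It is not how any particular coverage tool works internally, and the names lines_executed and RECURSIVE_CALL_OFFSET are hypothetical.

```python
import sys


def validate_dict(in_dict):
    """Trimmed Python 3 copy of the example, bug deliberately kept."""
    for _key, value in in_dict.items():
        if not isinstance(value, (str, dict)):
            return False
        if isinstance(value, dict):
            validate_dict(value)  # return value is discarded
    return True


# Offset of the recursive-call line from the "def validate_dict" line
# (hypothetical constant; update it if the function above is edited).
RECURSIVE_CALL_OFFSET = 6


def lines_executed(func, *args):
    """Call func(*args) and return (result, set of line offsets run in func).

    Offsets are relative to func's "def" line. This toy tracer is far
    simpler than a real coverage tool.
    """
    target = func.__code__
    hit = set()

    def tracer(frame, event, arg):
        # Only trace frames running func's code, including recursive calls.
        if frame.f_code is not target:
            return None
        if event == "line":
            hit.add(frame.f_lineno - target.co_firstlineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)  # restore any tracer that was active before
    return result, hit


# A "happy path" test executes the recursive line and passes...
result, hit = lines_executed(validate_dict, {"a": {"b": "c"}})
assert result is True
assert RECURSIVE_CALL_OFFSET in hit

# ...yet a nested non-string value is still accepted.
result, _ = lines_executed(validate_dict, {"a": {"b": 42}})
print(result)  # True, although the input is invalid
```

The first call reports the recursive line as covered and gives the right answer; the second returns True for input that should fail. The branch is fully covered, and the bug is still there.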

Use Cases

Many bugs are irrelevant: they are never triggered during normal use. Developers can focus on the relevant bugs by working from clearly defined use cases, perhaps written by a product manager. Once the software correctly fulfils each use case, it can be considered “free of relevant bugs.” For example, bugs that only appear when the software runs on unsupported platforms can easily be ruled out of scope.
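One way to make use cases checkable, sketched here with Python's built-in unittest module, is to write one acceptance test per use case and quote the use case in the test's docstring. The cart_total_cents function, the use cases themselves, and SUPPORTED_PLATFORMS are all hypothetical. The class-level skip shows how unsupported platforms can be declared out of scope in the suite itself.

```python
import sys
import unittest

# Hypothetical platforms the product officially supports
# (prefixes of sys.platform values).
SUPPORTED_PLATFORMS = ("linux", "darwin")


def cart_total_cents(prices_cents):
    """Hypothetical feature under test: sum item prices given in cents."""
    return sum(prices_cents)


@unittest.skipUnless(sys.platform.startswith(SUPPORTED_PLATFORMS),
                     "unsupported platform: out of scope")
class ShopperUseCaseTests(unittest.TestCase):
    """One test per use case; each docstring quotes the use case it checks."""

    def test_shopper_sees_cart_total(self):
        """As a shopper, I want to see the total price of my cart."""
        self.assertEqual(cart_total_cents([1250, 399]), 1649)

    def test_empty_cart_totals_zero(self):
        """As a shopper, I want an empty cart to show a total of zero."""
        self.assertEqual(cart_total_cents([]), 0)
```

Running the module with python -m unittest reports each use case as passed, failed, or skipped, which gives a direct answer to "are all use cases fulfilled?"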

Working from use cases fits naturally with Test-Driven Development, which encourages developers to write tests before the code they exercise. Use cases fall short, however, if the organization writes only customer-oriented use cases and omits internal stories such as “As a system administrator, I want the tornado server to use less than 500 GB of RAM.”
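Internal stories can be turned into tests the same way. The sketch below, again standard-library only, checks a hypothetical handle_requests workload against a hypothetical MEMORY_BUDGET_BYTES. It uses tracemalloc, which only counts memory allocated by Python code in the current process. Verifying the story as written would mean measuring the server process's total memory from outside, which this sketch does not attempt.

```python
import tracemalloc
import unittest

# Hypothetical budget for this toy workload; a real test would derive it
# from the story's stated requirement.
MEMORY_BUDGET_BYTES = 50 * 1024 * 1024


def handle_requests(count):
    """Hypothetical stand-in for the server's request handling."""
    responses = [{"id": i, "body": "ok"} for i in range(count)]
    return len(responses)


def peak_traced_memory(func, *args):
    """Call func(*args); return (result, peak bytes seen by tracemalloc)."""
    tracemalloc.start()
    try:
        result = func(*args)
        _, peak = tracemalloc.get_traced_memory()  # (current, peak)
    finally:
        tracemalloc.stop()
    return result, peak


class InternalStoryTests(unittest.TestCase):
    def test_request_handling_stays_within_memory_budget(self):
        """As a system administrator, I want request handling to stay
        within its memory budget."""
        handled, peak = peak_traced_memory(handle_requests, 10_000)
        self.assertEqual(handled, 10_000)
        self.assertLess(peak, MEMORY_BUDGET_BYTES)
```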

A Test for Each Type

Unit, Integration, Functional, Fuzz, and Performance tests each point out different kinds of software deficiencies, and having a small number of tests from each category trumps comprehensive coverage within any one of them. Similarly, one could adopt the following categories:

Codifying these categories and identifying tests with their respective category makes it more obvious when corners are being cut, and progress can be measured with software tools.
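As a minimal sketch of that idea in plain Python, the hypothetical category decorator below tags each test with one of the five kinds of tests named above and reports which kinds still have no tests. In a pytest-based project, custom markers (for example @pytest.mark.fuzz, selected with pytest -m fuzz) can play the same role. The test names here are invented and their bodies omitted.

```python
from collections import Counter

# The kinds of tests named in this section.
REQUIRED_CATEGORIES = ("unit", "integration", "functional", "fuzz", "performance")

# Hypothetical registry mapping each test's name to its category.
_test_categories = {}


def category(name):
    """Hypothetical decorator that tags a test function with one category."""
    if name not in REQUIRED_CATEGORIES:
        raise ValueError(f"unknown test category: {name!r}")

    def decorate(test_func):
        _test_categories[test_func.__name__] = name
        return test_func

    return decorate


def category_report():
    """Return (tests per category, categories that have no tests yet)."""
    counts = Counter(_test_categories.values())
    missing = [c for c in REQUIRED_CATEGORIES if counts[c] == 0]
    return counts, missing


@category("unit")
def test_parses_empty_config():
    pass  # body omitted in this sketch


@category("integration")
def test_reads_config_from_disk():
    pass  # body omitted in this sketch


@category("performance")
def test_starts_within_time_limit():
    pass  # body omitted in this sketch


counts, missing = category_report()
print("missing categories:", missing)
# prints: missing categories: ['functional', 'fuzz']
```

A report like this can run in continuous integration, so a release with an empty category is flagged instead of silently shipped.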