Cygnus - CSK collaboration: Testing Framework Definition

Cygnus Support, January 31, 1996
(Gordon Irlam, gordoni@cygnus.com)


We are required to define a testing framework for Guile as part of the Cygnus-CSK joint development project. This document specifies that testing framework.

Motivation

Testing is intended to help ensure that Guile is a reliable, high-quality product.

We need to test that the Guile implementation is consistent with the documentation describing how Guile is intended to behave. When testing, the correct result of a particular test, and hence whether Guile passes or fails it, must therefore be determined by reference to the documentation alone, never to the Guile implementation itself.

Testing framework

We have decided to use DejaGNU to drive our tests.

DejaGNU provides a simple front end from which to run our tests. It assists in selecting and running tests, logging and summarizing the results of test runs, and identifying and tracking test failures.

DejaGNU is built on top of expect and Tcl. expect is a Tcl package developed by Don Libes for controlling interactive applications. Tcl is the Tool Command Language developed by John Ousterhout. Cygnus already has experience with DejaGNU, expect, and Tcl. The author of DejaGNU, Rob Savoye, works for Cygnus.

The current release of DejaGNU appears to provide sufficient functionality for us to use it effectively to test Guile.

The testsuite will be run on each of the supported platforms:

    SGI    IRIX 5.3
    Sun    Solaris 2.5
    PC     Linux

Within the Guile sources there will be a "testsuite" directory under the top-level directory, containing test cases for each of the supported components of the core Guile software.

Test cases

DejaGNU provides a framework for running Guile tests. On top of this we will develop a suite of Guile tests which can then be run by DejaGNU.

The test cases will be based both on our knowledge of the Scheme language and on our knowledge of the internals of the Guile implementation.

Some of the initial test cases can be derived from Aubrey Jaffer's IEEE/R4RS compliance testsuite, which is available on the net. Depending on the coverage provided by these Scheme language tests, additional Scheme language tests might need to be developed and incorporated into our testsuite.
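
For illustration, the sketch below shows the kind of self-checking Scheme test such a suite might contain. The test helper and the particular expressions are invented for illustration and do not reproduce Jaffer's actual test code.

    ;; Hypothetical self-checking test: compare an evaluated expression
    ;; against the result documented for it and report PASS or FAIL.
    ;; The `test' helper is illustrative, not part of Jaffer's suite.
    (define (test name result expected)
      (display (if (equal? result expected) "PASS: " "FAIL: "))
      (display name)
      (newline))

    (test "addition" (+ 1 2) 3)
    (test "list access" (car '(a b c)) 'a)
    (test "string append" (string-append "foo" "bar") "foobar")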

Custom tests for Guile-specific features will also need to be developed. In particular, tests are required for the slib and rx libraries. These tests will be derived from the documentation specifying the intended functionality of those libraries.
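
As a rough sketch only, a test of the slib interface might load a documented slib module and check that a representative procedure behaves as the slib documentation specifies. The example below assumes slib's require mechanism is accessible from within Guile; the module and procedure names are taken from the slib documentation, but the exact loading steps are an assumption.

    ;; Hypothetical slib check (assumes slib's `require' is available
    ;; from Guile).  slib documents `position' as returning the
    ;; zero-based index of the first matching element, or #f.
    (require 'common-list-functions)
    (display (if (equal? (position 'c '(a b c d)) 2) "PASS" "FAIL"))
    (newline)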

We plan to develop tests that cover each of the supported software components.

During the continued development of Guile, bugs will inevitably be encountered and fixed. Test cases that reproduce these bugs will be created and added to the Guile testsuite.
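
For example, a regression test for a fixed bug might simply re-run the expression that used to misbehave and check that the documented result is now produced. The bug and the expression below are invented purely for illustration.

    ;; Hypothetical regression test: suppose a (fictitious) bug once
    ;; caused `string->number' to mishandle negative integers.  Keeping
    ;; this check in the testsuite stops the bug silently reappearing.
    (display (if (equal? (string->number "-17") -17) "PASS" "FAIL"))
    (newline)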

Test coverage

We will measure test coverage to determine how well the test cases we are developing exercise Guile. The coverage data we generate will indicate what proportion of the Guile sources the test cases exercise.

There is a range of possible test coverage tools we could use: gprof, gcov, and tcov were considered. Tentatively, we concluded that gprof should initially be used on Solaris to gather test coverage data, which we will then analyze. One of the main advantages of using gprof is that we have access to its sources, so we will be able to fix, extend, or improve it should the need arise.

gprof will give us sufficient data to analyze our test cases and measure test coverage.

gprof works by having the compiler instrument the generated code (for example via the -pg option to gcc) so that a counter is incremented each time a routine is invoked. When the program is run these counts are gathered and written out to a file. gprof is then run to display the results in a human-readable format.

Getting gprof working on all supported platforms would require some effort (for instance, gprof might need porting to SGI, and we would have to add a bb_exit_func() to each of our libraries). gprof is already known to run on Solaris, and running it on Solaris alone will be sufficient for us to gauge test coverage accurately, so we will not need to port gprof to the other supported platforms.

The GNU version of gprof also includes support for counting the invocation of basic blocks, which can then be mapped back onto particular lines of source code. Unfortunately this code does not currently work on any of the supported Guile platforms. We may decide to fix gprof so that it does, or to use another tool such as tcov, if we conclude this information is required.

We will need to develop tools to execute gprof, combine the resulting data, and analyze the results produced. These tools will be created on an ad hoc basis.
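
As a minimal sketch of the analysis step, and assuming the routine names and call counts have already been extracted from gprof's output into an association list (the extraction itself is not shown, and the routine names are invented), such a tool might simply report the routines the testsuite never exercised:

    ;; Hypothetical ad hoc analysis: given ("routine" . call-count)
    ;; pairs taken from gprof output, return the routines with a zero
    ;; call count, i.e. code the testsuite never exercised.
    (define (unexercised-routines counts)
      (cond ((null? counts) '())
            ((zero? (cdr (car counts)))
             (cons (car (car counts))
                   (unexercised-routines (cdr counts))))
            (else (unexercised-routines (cdr counts)))))

    (unexercised-routines
     '(("scm_sum" . 4200) ("scm_gc_mark" . 57) ("scm_never_called" . 0)))
    ;; => ("scm_never_called")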

One uncertainty regarding gprof is whether it will work reliably in the presence of multiple threads. If this proves to be a problem, we may need to exclude the multi-threaded Guile test cases from our measurements of test coverage. We will still generate multi-threading test cases; we simply will not use them when measuring test coverage.

gprof will be able to provide test coverage information for the basic Scheme interpreter. It will not, however, be able to provide coverage information for the gls/slib sources written in Scheme. There is currently no easy way for us to gather such information; doing so would require modifications to the Guile core, and right now we do not feel the work involved is warranted.

Automated testing

Cygnus has developed an automated testing system for testing gcc. This system checks out the current gcc sources from the source tree, does a build on a particular platform, runs a series of tests, analyzes the results, sends out email reporting any possible problems, then tries to build gcc on the next platform.

Once the Guile testing framework has been created, it should be integrated with Cygnus's existing automated testing framework to help identify any regressions.

Performance testing

Cygnus does not currently plan to formally test the performance of Guile.

Some informal performance tests may occur as part of the byte code compiler work.

Test audit

After the test cases have been created, and the test coverage has been measured, we will perform a test audit. The purpose of the test audit is to evaluate the effectiveness of our testing approach.

Some of the issues to be covered by the test audit include:

Enhancements to testing

In conjunction with CSK we will review the test audit and reach agreement on a set of enhancements to testing that need to be performed. Cygnus will then be responsible for implementing these enhancements.