Jeff Miller
Department of Psychology
University of Otago
Dunedin, New Zealand

RegGen: A Program for \ Generating Regression Data Sets \[2ex] Version 1.2

RegGen: A Program for
Generating Regression Data Sets
Version 1.2

November, 2004



Copyright 1998, 2004, Jeff Miller.
DISCLAIMER: THIS SOFTWARE IS PROVIDED BY THE AUTHOR "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DAMAGES ARISING IN ANY WAY FROM THE USE OF THIS SOFTWARE.
This program and documentation may be duplicated and used without charge for any educational or noncommercial purposes. If you do use this program, I would appreciate it if you would send me an acknowledgement email or letter saying so. I would also welcome bug reports and suggestions for improvement, although I can't promise fast action on those.
For commercial use, please contact the author.
Here are my contact details:
Prof Jeff Miller
Department of Psychology
Univ of Otago
Dunedin, New Zealand
email address: miller@psy.otago.ac.nz

Contents

1  Introduction
2  Installation
3  Running RegGen Interactively
4  Running RegGen With A Parameter File
5  Getting Output from RegGen
6  Fully Batch Operation

1  Introduction

This program was designed for use in teaching the statistical procedure known as Regression Analysis . It generates data sets for use as examples or practice problems.
The user specifies the number of cases (i.e., sample size), the number of variables per case, the mean and standard deviation of each variable, and the matrix of correlations between variables. The program then generates set of data satisfying these conditions exactly (up to some rounding error). The data are then written to a file for subsequent analysis by a statistical package.
A critical feature of RegGen is that the generated data satisfy the specified conditions almost exactly. For example, if you specify that a certain variable should have a mean of 100 and an SD of 10, the sample will have exactly that mean and SD, up to some small rounding error associated with the number of decimal places in the output. Thus, you specify the sample characteristics directly rather than specifying the underlying population values from which random samples are taken.

Back to table of contents

2  Installation

Copy the program RegGen.exe to any directory in your path.

Back to table of contents

3  Running RegGen Interactively

To start RegGen for an interactive run, simply type "reggen" at a command prompt.
You will see a brief warning message suggesting that it is better to use a parameter file. Ignore that message for now. The reason for it is explained in the next section.
The program will now ask you to specify the parameters of the desired data set, as in the following example.
Enter the desired number of cases : 20
Enter the desired number of variables : 2
Enter the desired correlation of var 1 with var 2 : 0.3
Variable 1: Desired mean : 100
Variable 1: Desired std dev : 10
Variable 1: Desired number of decimal places : 0
Variable 2: Desired mean : 100
Variable 2: Desired std dev : 10
Variable 2: Desired number of decimal places : 0

The user types in only the numbers at the far right of each line. I hope that most of this is self-explanatory. The only point that may deserve comment is the "number of decimal places." This option allows the user to specify whether the generated data values are to be whole numbers (0 decimal places), or to contain fractional parts measured to 1, 2, 3, etc decimal places.
After you have specified the parameters, RegGen will try to generate the data set. It will fail if the requested correlation matrix is impossible (as discussed further in the next section). In that case it halts with this message:
Requested correlation matrix is impossible.

If RegGen succeeds in generating the data set, it next allows you to display and write out the data set is has generated, as described in section 5.

Back to table of contents

4  Running RegGen With A Parameter File

The parameters of the data set can be specified in an ASCII file instead of interactively. This is useful when you want to create several data sets that differ in only a few parameters: For example, you might want to generate several 5-variable data sets varying only the correlation of variables 3 and 4. It can also be useful even if you only want one data set: Sometimes, it is difficult to be sure whether the correlation matrix you want is possible or impossible. If you specify an impossible data set interactively, you have to start over from the beginning. If your parameters are specified in a file, you can just change one or two correlations in the file and rerun reggen.
The file RegGen.Smp is a sample parameters file. The comments to the right of each line are optional, but are included to enable you to figure out the format. The file can be created with any ASCII editor. Be sure to use spaces rather than tabs between numbers.
To use a parameters file, invoke RegGen with a command line parameter giving the name of the parameters file, like this:
C:> reggen reggen.smp

There is an additional option which allows you to specify the names of the variables for an output file in the MTab format, mentioned below. Within the input parameter file, you are allowed to put the name of each variable at the beginning of the line on which you specify its mean, sd, and number of decimal places, like this:
 
Height 100 10 0  { mean, sd, & number of decimal places for var 1 }
Weight 500 10 0  { mean, sd, & number of decimal places for var 2 }
Time   100 20 0  { mean, sd, & number of decimal places for var 3 }
Age      0  1 3  { mean, sd, & number of decimal places for var 4 }

Variable names can contain any nonblank characters, and they are limited to 20 characters.

Back to table of contents

5  Getting Output from RegGen

After RegGen has generated the desired data set, it will produce a display more or less like this:
RegGen: Program to generate regression data sets  Version 1.1:  March 1998
@Copyright 1998..2000    Jeff Miller              Dept of Psychology
                         Univ of Otago            Dunedin, New Zealand
N of cases = 10.  N of variables = 4

ACTION MENU:
 S = list data to Screen
 F = list data to File
 M = write Mtab data file
 N = make New data

 action or Quit :

To select one of these options, simply type the single capitalized letter. Here are descriptions of the individual options:
S
List the data to the screen so that you can look at it.
F
List the data to an output file, in just the same format as to the screen.
M
Create an output file in the proper format to be read by the MTab program, a free DOS program for doing regression analysis and descriptive statistics.
N
Generate a new data set. The parameters of the data set will be the same as the current one, you just get another example of a data set with these parameters.
Q
Quit to DOS.

Back to table of contents

6  Fully Batch Operation

It is also possible to run RegGen in fully batch mode, without operator intervention. This may be desirable, for example, if you want to interface RegGen to a simulation program of some sort. To do this, you must of course first prepare the input parameters file as discussed above. Then, you can invoke RegGen with an additional parameter that is the name of the output file. For example,
C:> reggen reggen.smp reggen.out

would automatically write an output file called reggen.out without any user input at all. Several optional switches are available when you run RegGen. These are listed after the output file name in any order, like this:
C:> reggen reggen.smp reggen.out -m

The switches can be any of the following:
-m
Indicates that the output file should have the MTab format.
-g
Generate a random correlation matrix rather than using the one specified in the input file. In this case it does not matter what is on the correlation lines in the input file, but they must still be present. For example, the correct number of blank lines will suffice.
A possibly useful trick: By default, RegGen always asks for user confirmation before writing over an existing file. To avoid that confirmation check, use an exclamation point as the first character of the file name. If you do that, RegGen will (a) delete the exclamation point from the output file name, and (b) write the output to a file with the indicated name, automatically overwriting the file if it already exists.

Back to table of contents




File translated from TEX by TTH, version 3.59.
On 06 Nov 2004, 16:39.