DCDFLIB Library of C Routines for Cumulative Distribution Functions, Inverses, and Other Parameters Version 1.1 (November, 1997) Summary Documentation of Each Routine Compiled and Written by: Barry W. Brown James Lovato Kathy Russell Department of Biomathematics, Box 237 The University of Texas, M.D. Anderson Cancer Center 1515 Holcombe Boulevard Houston, TX 77030 This work was supported by grant CA-16672 from the National Cancer Institute. SUMMARY OF DCDFLIB This library contains routines to compute cumulative distribution functions, inverses, and parameters of the distribution for the following set of statistical distributions: (1) Beta (2) Binomial (3) Chi-square (4) Noncentral Chi-square (5) F (6) Noncentral F (7) Gamma (8) Negative Binomial (9) Normal (10) Poisson (11) Student's t (12) Noncentral t Given values of all but one parameter of a distribution, the other is computed. These calculations are done with C pointers to Doubles. -------------------- WARNINGS -------------------- The F and Noncentral F distribution are not necessarily monotone in either degree of freedom argument. Consequently, there may be two degree of freedom arguments that satisfy the specified condition. An arbitrary one of these will be found by the cdf routines. The amount of computation required for the noncentral chisquare and noncentral F distribution is proportional to the value of the noncentrality parameter. Very large values of this parameter can require immense numbers of computation. Consequently, when the noncentrality parameter is to be calculated, the upper limit searched is 10,000. For the noncentral t, the computation time is proportional to the noncentrality parameter so the upper limit searched is 10000. -------------------- END WARNINGS -------------------- COMMENTS ON THE C VERSION OF DCDFLIB The C version was obtained by converting the original Fortran DCDFLIB to C using PROMULA.FORTRAN and performing some hand crafting of the result. Information on PROMULA.FORTRAN can be obtained from PROMULA Development Corporation 3620 N. High Street, Suite 301 Columbus, Ohio 43214 (614) 263-5454 DCDFLIB.C was tested using the xlc compiler under AIX 3.1 on an IBM RS/6000. The code was also examined with lint on the same system. DCDFLIB was also successfully tested run using the gcc compiler (see below) on a Solbourne. DCDFLIB.C can be obtained by anonymous ftp to odin.mda.uth.tmc.edu (129.106.3.17) where it is available as /pub/unix/dcdflib.c.tar.Z The Fortran version of DCDFLIB is available as /pub/unix/dcdflib.f.tar.Z on the same machine. ^L CAVEAT DCDFLIB.C is written in ANSI C and makes heavy use of prototypes. It will not compile under old style (KR) C compilers (such as the default Sun cc compiler). I don't recommend conversion to an obsolete C dialect. Instead, get the Free Software Foundation's excellent ANSI C compiler, gcc. It compiles KR C as well as ANSI C. A version of gcc that runs on many varieties of Unix is available by anonymous ftp as /pub/gnu/gcc-1.40.tar.Z at prep.ai.mit.edu (18.71.0.38). A Vax version is also present on /pub/gnu. The compilers are also available on tape. Write the Free Software Foundation at: Free Software Foundation, Inc. 675 Massachusetts Avenue Cambridge, MA 02139 Phone: (617) 876-3296 A MSDOS port of gcc, performed by DJ Delorie is also available by ftp. File location: host: grape.ecs.clarkson.edu login: ftp password: send your e-mail address directory: ~ftp/pub/msdos/djgcc File in .ZIP format - djgpp.zip - one 2.2M file, contains everything. A version of DCDFLIB which compiles under old style C can be obtained by anonymous ftp to odin.mda.uth.tmc.edu (129.106.3.17) where it is available as /pub/unix/dcdflib.kr.c.tar.Z DOCUMENTATION This file contains an overview of the library and is the primary documentation. Other documentation is in directory 'doc' on the distribution as character (ASCII) files. A summary of all of the available routines is contained in dcdflib.chs (chs is an abbreviation of 'cheat sheet'). The 'chs' file will probably be the primary reference. The file, dcdflib.fdoc, contains the comments for each routine intended for direct use. The file, dcdflib.h, contains prototypes for each routine intended for direct use. INSTALLATION Directory src contains the C source. The files ipmpar.c and dcdflib.c constitute DCDFLIB. The file cdflib.h is included in dcdflib.c. A few routines use machine dependent constants. Lists of such constants for different machines are found in ipmpar.c. Uncomment the ones appropriate to your machine. The distributed version uses the IEEE arithmetic that is used by the IBM PC, Macintosh, and most Unix workstations. If you need to change the distribution version you must comment out the definitions for IEEE arithmetic as well as uncomment the ones appropriate to your machine. NOTE: dcdflib should be linked to the C math library. NOTE: Ignore compiler warnings of the type "statement not reached". SOURCES The following routines, written by others, are incorporated into DCDFLIB. Beta Distribution DiDinato, A. R. and Morris, A. H. Algorithm 708: Significant Digit Computation of the Incomplete Beta Function Ratios. ACM Trans. Math. Softw. 18 (1993), 360-373. Gamma Distribution and It's Inverse DiDinato, A. R. and Morris, A. H. Computation of the Incomplete Gamma Function Ratios and their Inverse. ACM Trans. Math. Softw. 12 (1986), 377-393. Normal Distribution Kennedy and Gentle, Statistical Computing, Marcel Dekker, NY, 1980. The rational function approximations from pages 90-95 are used during the calculation of the inverse normal. Cody, W.D. (1993). "ALGORITHM 715: SPECFUN - A Portabel FORTRAN Package of Special Function Routines and Test Drivers", acm Transactions on Mathematical Software. 19, 22-32. A slightly modified version of Cody's function anorm is used for the cumultive normal. Zero Finder J. C. P. Bus and T. J. Dekker. Two Efficient Algorithms with Guaranteed Convergence for Finding a Zero of a Function. ACM Trans. Math. Softw. 4 (1975), 330. We transliterated Algoritm R of this paper from Algol to Fortran. General Reference Abramowitz, M. and Stegun, I. A. Handbook of Mathematical Functions With Formulas, Graphs, and Mathematical Tables. (1964) National Bureau of Standards. This book has been reprinted by Dover and others. LEGALITIES Code that appeared in an ACM publication is subject to their algorithms policy: Submittal of an algorithm for publication in one of the ACM Transactions implies that unrestricted use of the algorithm within a computer is permissible. General permission to copy and distribute the algorithm without fee is granted provided that the copies are not made or distributed for direct commercial advantage. The ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission. Krogh, F. Algorithms Policy. ACM Tran. Math. Softw. 13(1987), 183-186. We place the DCDFLIB code that we have written in the public domain. NO WARRANTY WE PROVIDE ABSOLUTELY NO WARRANTY OF ANY KIND EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THIS PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. IN NO EVENT SHALL THE UNIVERSITY OF TEXAS OR ANY OF ITS COMPONENT INSTITUTIONS INCLUDING M. D. ANDERSON HOSPITAL BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY LOST PROFITS, LOST MONIES, OR OTHER SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA OR ITS ANALYSIS BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY THIRD PARTIES) THE PROGRAM. (Above NO WARRANTY modified from the GNU NO WARRANTY statement.) HOW TO USE THE ROUTINES The calling sequence for each routine is of the form: void cdf(int *which,double *p,double *q,double *x, double *,int *status,double *bound) WHICH and STATUS are pointers to int , all other arguments are pointers to double. is a one to three character name identifying the distribution. which is an input integer value that identifies what parameter value is to be calculated from the values of the other parameters. P is always the cdf evaluated at X, Q is always the compliment of the cdf evaluated at X, i.e. 1-P, and X is always the value at which the cdf is evaluated. The auxiliary parameters, , of the distribution differ by distribution. If WHICH is 1, P and Q are to be calculated, i.e., the cdf; if WHICH is 2, X is to be calculated, i.e., the inverse cdf. The value of one auxiliary parameter in can also be the value calculated. STATUS returns 0 if the calculation completes correctly. --------------------WARNING-------------------- If STATUS is not 0, no meaningful answer is returned. -------------------- END WARNING -------------------- STATUS returns -I if the I'th input parameter was not in the legal range (see below). Parameters are counted with which being the first in these return values. A STATUS value of 1 indicates that the desired answer was apparently lower than the lower bound on the search interval. A return code of 2 indicates that the answer was apparently higher than the upper bound on the search interval. A return code of 3 indicates that P and Q did not sum to 1. Other positive codes are routine specific. BOUND is not set if status is returned as 0. If STATUS is -I then BOUND is the bound illegally exceeded by input parameter I, where WHICH is counted as 1, P as 2, Q as 3, X as 4, etc. If STATUS is returned as 1 or 2 then bound is returned as the lower or upper bound on the search interval respectively. BOUNDS Below are the rules that we used in determining bounds on quantities to be calculated. Those who don't care can find a summary of the bounds in dcdflib.chs. Input bounds are checked for legality of input. The search range is the range of values searched for an answer. Input Bounds Bounds on input parameters are checked by the cdf* routines. These bounds were set according to the following rules. P: If the domain of the cdf (X) extends to -infinity then P must be greater than 0 otherwise P must be greater than or equal to 0. P must always be less than or equal to 1. Q: If the domain of the cdf (X) extends to +infinity then Q must be greater than 0 otherwise Q must be greater than or equal to 0. Q must always be less than or equal to 1. Further, P and Q must sum to 1. The smaller of the two P and Q will be used in calculations to increase accuracy X: If the domain is infinite in either the positive or negative direction, no check is performed in that direction. If the left end of the domain is 0, then X is checked to assure non-negativity. DF, SD, etc.: Some auxiliary parameters must be positive. The lowest input values accepted for these parameters is 1E-100. Search Bounds These are the ranges searched for an answer. If the domain of the parameter in the cdf is closed at some finite value, e.g., 0, then this value is the same endpoint of the search range. If the domain is open at some finite endpoint (which only occurs for 0 -- some parameters must be strictly positive) then the endpoint is 1E-100. If the domain is infinite in either direction then +/- 1E100 is used as the endpoint of the search range. HOW THE ROUTINES WORK The cumulative distribution functions are computed directly. The normal, gamma, and beta functions use the code from the references cited. Other cdfs are calculated by relating them to one of these distributions. For example, the binomial and negative binomial cdfs can be converted to a beta cdf. This is how fractional observations are handled. The formula from Abramowitz and Stegun for converting the cdfs is cited in the fdoc file. (We think the formula for the negative binomial in A&S is wrong, but there is a correct one which we used.) The inverse normal and gamma are also taken from the references. For all other parameters, a search is made for the value that provides the desired P. Initial values are chosen crudely for the search (e.g., 5). If the domain of the cdf for the parameter being calculated is infinite, a step doubling strategy is used to bound the desired value then the zero finder is employed to refine the answer. The zero finder attempts to obtain the answer accurately to about eight decimal places.