*******************************************************************************

                 README.txt FOR Testing huge-combine.pl

                             Version 0.01
                       Copyright (C) 2002-2004
                    Ted Pedersen, tpederse@umn.edu
                 Amruta Purandare pura0010@d.umn.edu
                   University of Minnesota, Duluth

               http://www.d.umn.edu/~tpederse/nsp.html

*******************************************************************************

Testing for huge-combine.pl
------------------------------
AMRUTA PURANDARE
pura0010@d.umn.edu
03/03/2004

1. Introduction:
----------------
This program is a component of the N-gram Statistics Package that combines
two bigram count files. The scripts and files provided here can be used to
test the correct behavior of the program and its backward compatibility.

2. Tests:
----------

2.1 Normal conditions:
----------------------
Tests written in testA*.sh test huge-combine.pl under normal conditions.
Run normal-op.sh to run all test cases testA*.sh

Test A1:
Test A2:  Tests on general bigram files
Test A3:  Tests when the tokens include punctuation
Test A4:  Tests when the two count files share no bigrams
Test A5:  Tests when the two count files share all bigrams
Test A6:  Tests when tokens include embedded spaces and upper case letters
Test A7:  Tests when the count files are generated with the default token
          scheme from the README file contents

2.2 Error conditions:
----------------------
Tests written in testB*.sh test huge-combine.pl under error conditions.
Run error-op.sh to run all test cases testB*.sh

Test B1:  Tests when a bigram is repeated in the same file
Test B2:  Tests when a word has different marginals in the same file

3. Conclusions:
---------------
We have tested huge-combine.pl under the normal and error conditions listed
above and conclude that it runs correctly. We have also provided the test
scripts so that future versions of huge-combine.pl can be compared to the
current version against these scripts.