This is a master list of comments as used in the example configuration files found in this directory. This is not intended to be used as a configuration file but rather as a plain text summary of possible options and their values. In fact, the measures will not accept this as a configuration file. All of these options have default values that are described below. The only exception to this is vectordb, which has no default. If an option is listed without a value (as in trace:: or cache:: ), then the default value is used. Note that in the configuration files anything following a # is treated as a comment, so the following text can be used directly in a configuration file. You will want to make sure to change the value of an option as fits your needs however! # ---------------------------------------------------------------------- # The following options are supported for all measures trace::0 # Turns off (0) tracing. Turn on tracing by setting # to 1 or 2. The effect of these different levels will # depend on the measure being used. The default value # is off (0). If the value is omitted, then the default # is used. 0, 1, and 2 are the only valid settings. cache::1 # Turns on (1) caching. Turn off caching by setting # to 0. The default is on (1). If the value is omitted, # then the default is used. 0 and 1 are the only valid # settings. maxCacheSize::1000 # Limit the cache size to 1000 pairs of query words. # The default is 5000. If the value is omitted, then # the default is used. The value of this option # must be a non-negative integer or "unlimited" (without # the quotes). # ---------------------------------------------------------------------- # The following option is supported by : # path, lch, wup, res, lin, jcn rootNode::1 # Turns on (1) a (hypothetical) top-level root node for # the nouns, and another for the verbs. Turn off the # root nodes by setting to 0. The default is to use (1) # a unique top-level root node. If the value is omitted, # then the default is used. 0 and 1 are the only valid # settings. # ---------------------------------------------------------------------- # The following option is supported by : # res, lin, jcn infocontent::lib/WordNet/infocontent.dat # Specifies an information content file. The value of # this option must be the name of a file, or a relative # or absolute path name. The default value of this option # is $INSTALLDIR/WordNet/ic-semcor.dat, where $INSTALLDIR # is the directory in which the WordNet::Similarity modules # are installed. # ---------------------------------------------------------------------- # The following options are supported by vector and lesk stem::1 # Turns on (1) stemming. Turn off stemming by setting # this value to 0. The default value is on (1). If the # value is omitted, then the default is used. When # stemming is on (1), all the words in a gloss are stemmed # by the WordNet stemmer before overlaps are identified. stop::samples/stoplist.txt # Specifies the name of a stop list, which consists of # words that are to be ignored in a gloss overlap. The # value of this must be a file name, or an absolute or # relative path name. The default is to not use a stop # list. If the value is omitted, then the default is used. # ---------------------------------------------------------------------- # The following options are supported by the lesk measure relation::samples/lesk-relation.dat # Specifies a lesk relation file. This value can be a file # name, or an absolute or relative path name. The default # is to use the file $INSTALLDIR/WordNet/lesk-relation.dat, # where $INSTALLDIR is the directory in which the # WordNet::Similarity modules are installed. If the value is # ommited, then the default is used. Please note that the # format of the lesk relation file is not the same as # that of the vector relation file. The lesk relation file # consists of relation pairs that specify glosses that # are to be compared for overlaps. normalize::1 # Turns on (1) normalization of lesk scoring. Turn off # by setting this value to 0. The default value is off # (0). If the value is omitted, then the default is used. # When normalization is enabled, the gloss overlap score # is normalized by the size of the glosses. The details # are described in Banerjee and Pedersen (2002). # ---------------------------------------------------------------------- # The following options are supported by the vector measure vectordb::lib/WordNet/wordvectors.dat # Specifies a database file containing word vectors. # The value of this option must be a file name, or an # absolute or relative path name. utils/wordVectors.pl # must be used to generate this file. This option is # required, and there is no default value. If the # option is not specified, or if the option is specified # without a value, the vector measure will fail. relation::samples/vector-relation.dat # Specifies a vector relation file. This value can be a file # name, or an absolute or relative path name. The default # is to use the glos-example relation. If the value is # ommited, then the default is used. Please note that the # format of the vector relation file is not the same as # that of the lesk relation file. The vector relation file # consists of single relations that specify which glossess # of a word will be used in constructing the gloss vector. compounds::samples/wn30compounds.txt # Specifies a file of WordNet compounds. The value of # this option must be a file name, or an absolute or # relative path. The program utils /compounds.pl can # be used to generate this file. When compounds are # specified, compound words that occur in glosses are # identified prior to creating word vectors. The default # is to ignore compound words. If the value of this # option is omitted, then the default is used. # ---------------------------------------------------------------------- # The following option is supported by the random measure maxrand::1 # The random measure will generate measures between 0 # and this value. The value of this option may be an # integer or a real number. The default value is 1. # If the value of this option is omitted, then the # default is used. # ----------------------------------------------------------------------