combine (4.003) karmic; urgency=low

  * Added extra space when concatenating extracted text in HTMLExtractor.pm
  * Fixed bug with certain special characters in regexp when optimizing
    meta/title in generated XML
  * Added extra field, score, in table urldb to support new URL scheduling
    algorithms
  * Added support for new URL scheduling algorithms

 -- Anders Ardo  Mon, 15 Jun 2009 10:22:38 +0200

combine (4.002) intrepid; urgency=low

  * Fixed tests 05MySQL and 90Installationtest
  * Added support for exceptions to GeoIP, config-file server2country
  * Now handles special characters when using files for external parsers
  * Disabled warnings for unknown config variables

 -- Anders Ardo  Mon, 30 Mar 2009 09:09:35 +0200

combine (4.001) intrepid; urgency=low

  * New version numbering compatible with CPAN
  * Integration with Solr enterprise search server
    (http://lucene.apache.org/solr/) similar to Zebra: new module Solr.pm,
    configuration variable SolrHost, switch SolrIndexing in combineExport

 -- Anders Ardo  Mon, 8 Dec 2008 14:55:21 +0100

combine (3.12) intrepid; urgency=low

  * Added code for simple Lucene integration to templates directory.
    Contributed by Xianghang Liu
  * Changed documentation HTML-generator to ht4tex

 -- Anders Ardo  Sun, 16 Nov 2008 18:19:21 +0100

combine (3.11) intrepid; urgency=low

  * Added switches 'collapseinlinks' and 'nooutlinks' to combineExport
  * Moved tmp-files to /tmp/$$
  * Decoded output from extconverter to Perl internal utf8
  * Added -nodrm to pdf2html switches
  * Fixed bug in processing of pure text documents
  * Added switch ZebraIndexing to combineExport.
    Enables updating of the configured Zebra server with exported records
  * Improved indexing of PDF documents
  * Handled case when $md5 empty in DeleteKey
  * Fixed bug in Zebra recordId handling

 -- Anders Ardo  Sun, 09 Nov 2008 17:18:10 +0100

combine (3.10) sarge; urgency=low

  * Added a fulltext-index in MySQL table search plus configuration var to
    enable/disable
  * Added Zebra deleteRecord

 -- Anders Ardo  Sun, 12 Oct 2008 20:56:51 +0200

combine (3.9) sarge; urgency=low

  * Changed default for useTidy to 0 (Not using Tidy)
  * Added dependency on libclass-factory-util-perl
  * Changed test Web-server to combine.it.lth.se
  * Added Combine/utilPlugIn.pm Combine/classifySVM.pm - support for SVM
    classifiers (depends SVMLight) plus documentation
  * Changed *Check_record.pm to use XWI text extraction from utilPlugIn.pm
  * Added country determination (plus dependency on GeoIp)
  * Updated extensions of binary files
  * Added analysePlugin and relTextPlugin to default config
  * Cleaned FromHTML.pm code
  * Added call to external relTextPlugin
  * Added call to Plugin for extra analysis in utilPlugin.pm
  * Improved title extraction in XWI2XML
  * Fixed in rules: shell IO redirection
  * Added table 'localtags' - for storing local name/value pairs to be
    added by analysePlugin

 -- Anders Ardo  Fri, 26 Sep 2008 15:22:43 +0200

combine (3.8) sarge; urgency=low

  * Perl ABSTRACT now taken from bin/combine
  * Disabled old behaviour to save html as text if extracted text is very
    short
  * PosCheckRecord now only uses meta fields that map to dc.subject and
    dc.description to match against topic definition
  * PosCheckRecord now also uses the URLpath for topic checking
  * Added new script: combineReClassify, that reclassifies all records
  * Enabled gzip HTTP content compression
  * Moved

 Wed, 23 Apr 2008 11:14:31 +0200

combine (3.7) sarge; urgency=low

  * Fixed bug with required but non-existent modifiedDate
  * Added fix from Sean Dreilinger to import SaveConfigString in
    Combine::Config
  * Fixed tests: make sure modfiedtime is set,
    and take into account new MD5 calculation
  * Now requires v 1.06 of HTML::Tidy; newer versions handle UTF8
    differently
  * Applied patch from Sean Dreilinger: make Combine work with newer
    Config::General (SaveConfigString)
  * Added doc/Installationtest.pl as new test t/90Installationtest.t

 -- Anders Ardo  Thu, 27 Sep 2007 08:52:10 +0200

combine (3.6) sarge; urgency=low

  * Bugfixes in DataBase.pm MD5-sum calculation

 -- Anders Ardo  Mon, 26 Mar 2007 10:36:04 +0200

combine (3.5) sarge; urgency=low

  * Added bin/combineRank - utility for calculating ranking values like
    PageRank
  * Changed package numbering convention
  * Documentation updates

 -- Anders Ardo  Wed, 13 Dec 2006 16:09:43 +0100

combine (3.4-5) sarge; urgency=low

  * Disabled parsing of really short files
  * Changed syntax of output redirection: ' >& /dev/null' to
    '> /dev/null 2> /dev/null'
  * Added test in 50FromHTML.t to avoid using HTML::Tidy when not available
  * Fixed bug with improper SQL escaping in MySQLhdb.pm
  * Fixed bug with expire time giving error-messages '...
    too small'
  * Added more tests
  * Changed combineINIT to disable Tidy in configuration when HTML::Tidy
    not available
  * Made ZOOM optional

 -- Anders Ardo  Tue, 5 Dec 2006 12:57:18 +0100

combine (3.4-4) sarge; urgency=low

  * New switch 'myconf' for combineINIT to append a local configuration
    file to the standard job-specific configuration
  * Improved error checking for MySQL connection
  * Workaround for mysterious MySQL user access bug
  * Moved AutoRecycleLinks from job_default.cfg to default.cfg

 -- Anders Ardo  Wed, 8 Nov 2006 12:33:15 +0100

combine (3.4-3) sarge; urgency=low

  * Removed obsolete flag xwi->check_nofollow
  * Documentation updates
  * Removed obsolete files: DTDs and ValidateWP7
  * Added omit-xml-declaration="yes" to combine2dc.xsl

 -- Anders Ardo  Fri, 3 Nov 2006 10:31:14 +0100

combine (3.4-2) sarge; urgency=low

  * Added external parsers for Excel, PowerPoint and RTF
  * Fixed bug that prevented RobotRules cache from being used
  * Moved setting of agent and from for UserAgent into
    UA::TruncatingUserAgent

 -- Anders Ardo  Fri, 20 Oct 2006 09:08:08 +0200

combine (3.4-1) sarge; urgency=low

  * Added possibility to connect directly to a Zebra server

 -- Anders Ardo  Sat, 14 Oct 2006 12:01:55 +0200

combine (3.3-2) sarge; urgency=low

  * Removed the (empty) combine-doc package. Documentation is included in
    the package combine.
  * Restructured XML export: moved basic XML generation from combineExport
    to XWI2XML.pm.
  * Implemented possibility to apply XSLT script on exported records
  * Added standard export XSLT scripts for alvis and dc
  * Now builds and includes doc in the source dist
  * Added module (Zebra.pm) and config var for direct connection to the
    Zebra indexer
  * Fixed bug in UTF-8 handling for exported XML

 -- Anders Ardo  Fri, 6 Oct 2006 12:02:45 +0200

combine (3.3-1) sarge; urgency=low

  * Implemented plugin modules for document classifications in focused mode
  * Added template for classify plugin

 -- Anders Ardo  Sun, 1 Oct 2006 11:23:07 +0200

combine (3.2-2) sarge; urgency=low

  * Fixes and updates in PODs

 -- Anders Ardo  Thu, 27 Sep 2006 15:31:00 +0200

combine (3.2-1) sarge; urgency=low

  * Fixes to make it work with MySQL 5
  * Now suggests pdftohtml untex
  * Disabled 'word-2000' in tidy.cfg since it conflicts with 'bare'

 -- Anders Ardo  Tue, 22 Aug 2006 10:07:00 +0200

combine (3.1-4) sarge; urgency=low

  * Added possibility to use HTTP proxy (conf variable httpProxy)
    (thanks to Joost Zuurbier)
  * Added conf variable UserAgentFollowRedirects controlling how redirects
    are handled (thanks to Joost Zuurbier)
  * Changed combineExport to use Alvis::Pipeline directly
  * Moved conf-files from libcombine to combine package
  * Added conffiles file to debian/
  * Improved documentation

 -- Anders Ardo  Thu, 15 Jun 2006 09:38:00 +0200

combine (3.1-3) sarge; urgency=low

  * All binaries now without '-w' switch
  * Perl testsuite added in directory t
  * New configuration variable WaitIntervalHost
  * Some bug fixes
  * combineExport supports the action attribute to XML element
    documentRecord

 -- Anders Ardo  Fri, 2 Jun 2006 15:38:00 +0200

combine (3.1-2) sarge; urgency=low

  * combineExport now supports enhanced ALVIS XML format
  * New configuration var 'AutoRecycleLinks', default=1 enables automatic
    recycling of new links
  * combineINIT takes new parameter --topic which enables focused mode
    using the given file as topicdefinition

 -- Anders Ardo  Fri, 24 Mar 2006 12:11:00 +0200

combine (3.1-1) sarge; urgency=low

  * New SQL
    doc
  * Bumped version to comply with conventions

 -- Anders Ardo  Wed, 07 Mar 2006 15:37:00 +0200

combine (3.0.0-14) stable; urgency=low

  * Bug fixes in export

 -- Anders Ardo  Wed, 01 Mar 2006 10:37:00 +0200

combine (3.0.0-13) stable; urgency=low

  * Efficiency improvement

 -- Anders Ardo  Fri, 13 Jan 2006 13:37:00 +0200

combine (3.0.0-12) stable; urgency=low

  * Added delete functionality to combineUtil
  * Added a few extensions for binary files to config-files

 -- Anders Ardo  Thu, 12 Jan 2006 10:37:00 +0200

combine (3.0.0-11) stable; urgency=low

  * Added combineUtil for database statistics, sanity and cleaning
    functionality
  * Changed Config to use include for additional configuration serveralias
    and exclude

 -- Anders Ardo  Tue, 03 Jan 2006 09:50:00 +0200

combine (3.0.0-10) stable; urgency=low

  * Added dependency on libalvis-cannonical
  * Changed to use Alvis::Canonical

 -- Anders Ardo  Tue, 19 Dec 2005 15:24:00 +0200

combine (3.0.0-9) stable; urgency=low

  * Removed erroneous quoting of '"' in exported XML-records

 -- Anders Ardo  Tue, 13 Dec 2005 13:38:00 +0200

combine (3.0.0-8) stable; urgency=low

  * Added handling of Alvis Pipeline exports
  * Added modifications for OAI including incremental exports

 -- Anders Ardo  Sat, 10 Dec 2005 14:38:00 +0200

combine (3.0.0-7) stable; urgency=low

  * UTF-8 fixes
  * Updated Tidy.cfg

 -- Anders Ardo  Tue, 01 Dec 2005 08:58:00 +0200

combine (3.0.0-6) stable; urgency=low

  * Updates in documentation
  * Added DTD files to conf
  * Updated combineExport new switches - recordid, md5, validate

 -- Anders Ardo  Tue, 08 Nov 2005 14:39:00 +0200

combine (3.0.0-5) stable; urgency=low

  * Updates in documentation
  * Added some Topic definition files
  * combineINIT now creates empty files 'stopwords.txt'
  * combineINIT now sets permission on /var/run/combine/xxx

 -- Anders Ardo  Mon, 07 Nov 2005 16:16:00 +0200

combine (3.0.0-4) stable; urgency=low

  * Added man pages

 -- Anders Ardo  Thu, 03 Nov 2005 13:11:00 +0200

combine (3.0.0-3) stable; urgency=low

  * Bug fixes in combine, combineCtrl,
    SD_SQL.pm
  * Added combineINIT

 -- Anders Ardo  Wed, 02 Nov 2005 15:15:35 +0200

combine (3.0.0-2) stable; urgency=low

  * Bug fixes in Check_record.pm, Config.pm, FromHTML.pm,
    PosCheck_record.pm, combine, combineCtrl, combineExport

 -- Anders Ardo  Wed, 02 Nov 2005 14:10:35 +0200

combine (3.0.0-1) stable; urgency=low

  * Initial Release.
  * Code taken from AlvisCombine cvs project

 -- Anders Ardo  Mon, 25 Apr 2005 14:10:35 +0200