Part I
Overview

1 Introduction
2 Open source distribution, installation
 2.1 Installation
 2.2 Getting started
 2.3 Online documentation
 2.4 Use scenarios
3 Configuration
 3.1 Configuration files
4 Crawler internal operation
 4.1 URL selection criteria
 4.2 Document parsing and information extraction
 4.3 URL filtering
 4.4 Crawling strategy
 4.5 Built-in topic filter – automated subject classification using string matching
 4.6 Built-in topic filter – automated subject classification using SVM
 4.7 Topic filter Plug-In API
 4.8 Analysis
 4.9 Duplicate detection
 4.10 URL recycling
 4.11 Database cleaning
 4.12 Complete application – SearchEngine in a Box
5 Evaluation of automated subject classification
 5.1 Approaches to automated classification
 5.2 Evaluation methodology
 5.3 Results
6 Performance and scalability
 6.1 Speed
 6.2 Space
 6.3 Crawling strategy
7 System components
 7.1 combineINIT
 7.2 combineCtrl
 7.3 combineUtil
 7.4 combineExport
 7.5 Internal executables and Library modules
References