Circa is a search engine for your Web site, or for a list of sites. It
indexes pages the way AltaVista does. It can read, parse and add all URLs
found in a page, if the page is on the same server.
Circa is free software, released under a GNU license.
You can try a search on AlianWebServer.
- Full text indexing
- Different weights for the title, keywords, description and the rest of
the HTML page can be set in the configuration
- Boolean query language support: or (default), and ("+"),
not ("-"). Ex: perl + faq -cgi : documents containing faq, possibly
perl, and not cgi.
- HTTP and FTP protocol support
- Index stored in MySQL
- Perl or PHP client
- Reads HTML and plain text
- Can index a filesystem directly, without going through the Web server
- Sites can be browsed by directory / category.
- Several kinds of indexing: full, incremental, or restricted to a
particular server. Documents that have not been updated are not
reindexed. Every request for a file is preceded by an HTTP HEAD request,
to obtain information such as validity, last update, size, etc.
- The size of documents read can be restricted (e.g. skip all documents
larger than 5 MB). Useful with low-bandwidth connections, or on computers
which do not have much memory.
- The HTML template can easily be customized to your needs.
- Search by different criteria: news, last modified date, language,
URL / site.
- Admin functions are available through a browser interface or the
command line.
- Full support of the robots exclusion standard (robots.txt). The robot
identifies itself as CircaIndexer/0.1, mail firstname.lastname@example.org.
- Requests to the same server are spaced 8 seconds apart. "It's not a
bug, it's a feature!" This is a basic rule for limiting HTTP server load.
- Indexing of the different links found in a CGI (everything after
name_of_file?)
- HTTP proxy support
- NNTP support
- Support for different character sets
- Support for other databases
- Modules: DBI, DBD::mysql, LWP::RobotUA, HTML::LinkExtor
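
To illustrate the Boolean query language listed among the features above,
here is a minimal Python sketch. It is not Circa's Perl code: it assumes
that "+" and "-" are written directly against the word they apply to
(for example "perl +faq -cgi"), and the function names parse_query and
matches are hypothetical.

```python
def parse_query(query):
    """Split a query into optional, required ("+") and excluded ("-") words."""
    optional, required, excluded = [], [], []
    for token in query.lower().split():
        if token.startswith("+") and len(token) > 1:
            required.append(token[1:])
        elif token.startswith("-") and len(token) > 1:
            excluded.append(token[1:])
        elif token not in ("+", "-"):
            optional.append(token)
    return optional, required, excluded


def matches(document_words, query):
    """Return True if a document (given as a list of words) satisfies the query."""
    optional, required, excluded = parse_query(query)
    words = {w.lower() for w in document_words}
    # Any excluded word rejects the document.
    if any(w in words for w in excluded):
        return False
    # Required words must all be present; optional words then only affect ranking.
    if required:
        return all(w in words for w in required)
    # Without required words, "or" semantics: one optional word is enough.
    return any(w in words for w in optional)
```

With this sketch, a document containing only "faq" matches the query
"perl +faq -cgi", while a document containing "faq" and "cgi" does not.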
Memory: indexing: 5.5 MB
Processor: on a Sun SPARCstation 4: (5 seconds at 2%, 2 s at
20%, 1 s at 30%) per URL indexed.
Size in MySQL: 2-5 KB per URL.
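
As a rough worked example of these figures, the following Python sketch
estimates the index size and a lower bound on crawl time for one server.
The per-URL size and the 8-second delay come from this document; the
10,000-URL site, the assumption of one delayed request per URL, and the
function name estimate are illustrative only.

```python
def estimate(url_count, kb_per_url=(2, 5), delay_seconds=8):
    """Return (min_kb, max_kb, min_hours) for indexing url_count URLs on one server."""
    min_kb = url_count * kb_per_url[0]
    max_kb = url_count * kb_per_url[1]
    # The delay alone bounds the crawl time from below; processing time is extra.
    min_hours = url_count * delay_seconds / 3600
    return min_kb, max_kb, min_hours


min_kb, max_kb, hours = estimate(10_000)
print(f"{min_kb}-{max_kb} KB in MySQL, at least {hours:.1f} hours of crawling")
# prints "20000-50000 KB in MySQL, at least 22.2 hours of crawling"
```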
Building the index is heavy work, so it is not suited to CGI. Try to use
admin.pl to update the index; if you don't have telnet access, try
launching the process in the background from another CGI. Or install MySQL
on a local disk, build your index, and export it to your search machine.
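
The idea of starting a long job in the background from a short-lived CGI
can be sketched as follows. This is an illustration in Python under
stated assumptions, not part of Circa: the helper name
launch_in_background and the log file are hypothetical, and
start_new_session only takes effect on POSIX systems.

```python
import subprocess


def launch_in_background(command, log_path):
    """Start command detached from the caller and return immediately."""
    log = open(log_path, "ab")
    try:
        return subprocess.Popen(
            command,
            stdin=subprocess.DEVNULL,
            stdout=log,                # keep output out of the CGI response
            stderr=subprocess.STDOUT,
            start_new_session=True,    # POSIX: outlive the CGI request
        )
    finally:
        log.close()  # the child keeps its own copy of the file descriptor
```

A CGI script could call launch_in_background(["perl", "admin.pl"],
"index.log") and return its page at once; any arguments admin.pl needs
are not shown here because this document does not list them.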
- Download one of the archive files and uncompress it.
- Update search.cgi and search.pl (search scripts) and admin.cgi and
admin.pl (admin scripts) with your MySQL parameters: user, password,
database, and IP address if different from 'localhost'.
- Run admin.cgi (CGI interface) or admin.pl (command line) to add your
URL, drop or create tables, ... I suggest using admin.pl on the command
line, because indexing can take a long time and is not suited to CGI.
- Run search.cgi. You can use the default form in your own page.
Only the 'words' field is required.
- For customized HTML results, see the file circa.htm
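
Since only the 'words' field is required, a search request can be as
simple as one query parameter. The Python sketch below builds such a URL;
it assumes that search.cgi accepts the field through a GET query string,
which this document does not state, and the host and path are
placeholders.

```python
from urllib.parse import urlencode


def search_url(base_url, words):
    """Build a GET URL for search.cgi carrying the required 'words' field."""
    return f"{base_url}?{urlencode({'words': words})}"


print(search_url("http://www.example.com/cgi-bin/search.cgi", "perl +faq -cgi"))
# prints "http://www.example.com/cgi-bin/search.cgi?words=perl+%2Bfaq+-cgi"
```

Note that a literal "+" in the query must be percent-encoded (%2B),
because an unencoded "+" in a query string stands for a space.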
POD documentation is available; run
pod2html name_of_file.pm > name_of_file.html
to read it.
If you have root privileges and can install Perl modules, you can install
these two modules: Circa::Search
and Circa::Indexer. See the demo directory
for how to use these modules. Install Circa::Indexer first.
Otherwise, you can use this distribution:
or Format tar.gz
Alain BARBET email@example.com
Rules and security with :
I read of this need,
I needed one for AlianWebServer, and I think other people need it too.