package File::Locate::Harder; use base qw( Class::Base ); =head1 NAME File::Locate::Harder - when you're determined to use a locate db =head1 SYNOPSIS use File::Locate::Harder; my $flh = File::Locate::Harder->new(); my $results_aref = $flh->locate( $search_term ); # using a defined db location, plus some locate options my $flh = File::Locate::Harder->new( db => $db_file ); my $results_aref = $flh->locate( $search_pattern, { case_insensitive => 1, regexp => 1, } ); # creating your own locate db, (in this example for doing tests) use Test::More; SKIP: { my $flh = File::Locate::Harder->new( db => undef ); $flh->create_database( $path_to_tree_to_index, $db_file ); if( $flh->check_locate ) { my $reason = "Can't get File::Locate::Harder to work"; skip "Can't run 'locate'", $test_count; } my $results_aref = $flh->locate( $search_term ); is_deeply( $results_aref, $expected_aref, "Found expected files"); } # introspection (is it reading db directly, or shelling out to locate?) my $report = $flh->how_works; print "This is how File::Locate::Harder is doing locates: $report\n"; =head1 DESCRIPTION File::Locate::Harder provides a generalized "locate" method to access the file system indexes used by the "locate" command-line utility. It is intended to be a relatively portable way for perl code to quickly ascertain what files are present on the current system. This code is essentially a wrapper around multiple different techniques of accessing a locate database: it makes an effort to use the fastest method it can find that works. The "locate" command is a well-established utility to find files quickly by using a special index database (typically updated via a cron-job). This module is an attempt at providing a perl front-end to "locate" which should be portable across most unix-like systems. Behind the scenes, File::Locate::Harder silently tries many ways of doing the requested "locate" operation. If it can't establish contact with the file system's locate database, it will error out, otherwise you can be reasonably sure that a "locate" will return a valid result (including an empty set if the search matches nothing). If possible, File::Locate::Harder will use the perl/XS module L to access the locate db directly, otherwise, it will attempt to shell out to a command line version of "locate". If not told explicitly what locate db file to use, this module will try to find the file system's standard locate db using a number of reasonable guesses. If those all fail -- and it's possible for it to fail simply because file permissions make the db file effectively invisible -- as a last ditch effort, it will try shelling out to the command line "locate" without specifying a db for it (because it usually knows where to look). Efficiency may be improved in some circumstances if you help File::Locate::Harder find the locate database, either by explicitly saying where it is (using the "db" attribute), or by setting the LOCATE_PATH environment variable. Also see the L method. =head2 METHODS =over =cut use 5.006; use strict; use warnings; use Carp; use Data::Dumper; use Hash::Util qw( lock_keys unlock_keys ); use File::Path qw(mkpath); use File::Basename qw(fileparse basename dirname); # Note: File::Locate is now "require"ed during init instead of "use"ed. our $VERSION = '0.05'; # for autoload generated accessors our $AUTOLOAD; my %ATTRIBUTES = (); =item new Creates a new File::Locate::Harder object. With no arguments, the newly created object (largely) has attributes that are undefined. All may be set later using accessors named according to the "set_*" convention. Inputs: An optional hashref, with named fields identical to the names of the object attributes. The attributes, in order of likely utility: =over =item Settings for ways to run "locate" =over =item case_insensitive Like the usual command-line "-i". =item regexp The search term will be interpeted as a POSIX regexp =item posix_extended The search term is a regexp with the standard POSIX extensions. =back =item Overall settings (for "locate", "create_database", etc) =over =item db Locate database file, with full path. Use this to work with a non-standard location, or set it to "undef" if you don't want this module to waste time looking for it (e.g. you might be planning to generate your own db via L). =back =item For internal use, testing, and so on: The following items are lists used in the probing process which determines what works on the current system. These lists are defined with hardcoded defaults that will normally remain untouched, though are sometimes over-ridden for testing purposes. =over =item locate_db_location_candidates Likely places for a locate db. See L. =item test_search_terms Common terms in unix file paths. See L. =back The following are status fields where the results of system probing are stored. The user not will normally be uninterested in these, though see L for a hint about performance improvements in repeated runs. =over =item system_db_not_found Could not find where the standard locate db is. =item use_shell_locate Shell out to locate and forget about using File::Locate =item shell_locate_failed So don't try probe_db_via_shell_locate again =item shell_locate_cmd_idx Integer: controls the choice of syntax of the locate shell cmd =back =back =cut # Note: "new" is inherited from Class::Base, and # calls the following "init" routine automatically. =item init Method that initializes object attributes and then locks them down to prevent accidental creation of new ones. Not of interest to client coders, though inheriting code should have an init of it's own that calls this one. =cut sub init { my $self = shift; my $args = shift; unlock_keys( %{ $self } ); # all object attributes, including arguments that become attributes my @attributes = ( 'db', # locate database file, with full path # results of system introspection on how 'locate' works here 'system_db_not_found', 'use_shell_locate', # shell out to locate, forget File::Locate 'shell_locate_failed', # so don't try probe_db_via_shell_locate again 'shell_locate_cmd_idx', # integer, controls syntax of locate shell cmd # lists to try in sequence until one works 'test_search_terms', # common terms on perl/unix systems 'locate_db_location_candidates', # likely places for a locate db # options settings for different styles of "locate" 'case_insensitive', 'regexp', 'posix_extended', ); # transform args into attributes, if on the approved list foreach my $field (@attributes) { $ATTRIBUTES{ $field } = 1; $self->{ $field } = $args->{ $field }; } # (all accessors are now fair game to use here) $self->define_probe_parameters; # that is, the test_search_terms and locate_db_location_candidates # Try to load module "File::Locate", if it fails we'll try shell locate eval "require File::Locate;"; if ($@) { $self->set_use_shell_locate( 1 ); } my $probe_db = 1; # check for defined db field, but undef value if ( grep{ m/^db$/ } (keys %{ $args } ) || ( not( defined( $args->{db} ) ) ) ) { # if found we should *not* probe for a file-system db $probe_db = 0; } # two issues: determining which db to use # and how to use it db (i.e. via module or shell) my $db; if ( $probe_db ) { if ( $db = $args->{ db } || $ENV{ LOCATE_PATH } ) { $self->set_db( $db ); # even if we're told which db to use, must still determine how # But: we can't probe it if it's not created yet, # And: there's no point if we already know we're going via shell if ( -e $db && ( not ( $self->use_shell_locate ) ) ) { # will we use db fast way (module) or slow but sure (shell) $self->probe_db( $db ); # note: sets use_shell_locate # TODO check return for failure? } } elsif ( $db = $self->determine_system_db( ) ) { # using the standard file system locate db } elsif ( $self->probe_db_via_shell_locate ) { # the db is unknown to us, but locate may still know $self->set_use_shell_locate( 1 ); } else { croak "File::Locate::Harder is not working. " . "Problem with 'locate' installation?"; } } # end if don't probe db lock_keys( %{ $self } ); return $self; } =item locate Simple interface to performs the actual "locate" operation in a robust, reliable way. Uses the locate db file indicated by the object's "db" attribute (which is set automatically if not manually overridden). Input: A term to search for in the file name or path. Return: An array reference of matching files with full paths. =cut sub locate { my $self = shift; my $search_term = shift; my $locate_options = shift; # apply the current locate options but preserve object settings my $original_settings = { case_insensitive => $self->case_insensitive, regexp => $self->regexp, posix_extended => $self->posix_extended, }; foreach my $field (keys (%{ $locate_options })){ my $setter = "set_$field"; $self->$setter( $locate_options->{ $field } ); } # farm out the locate operation to "via_shell" or "via_module" my $result = []; if ( $self->use_shell_locate ) { $result = $self->locate_via_shell( $search_term ); } else { $result = $self->locate_via_module( $search_term ); } # restore the original object settings of locate options foreach my $field (keys (%{ $original_settings })){ my $setter = "set_$field"; $self->$setter( $original_settings->{ $field } ); } return $result; } =item create_database Tries to create the locate database file indicated in the object data, indexing the tree indicated by a path given as an argument. A required second argument specifys the db file: the "db" field in the object is ignored by this method, though if the database is successfully created, the object's "db" field will be set to the newly created database. Inputs: (1) full path of tree of files to index (2) full path of db file to create Return: false (undef) on failure. =cut sub create_database { my $self = shift; my $location = shift; my $db = shift; mkpath( dirname( $db )); my @cmd = ( "slocate -U $location -o $db", "updatedb --output=$db --localpaths='$location'", ); my $status = undef; TRY_AGAIN: foreach my $cmd (@cmd) { $self->debug("Trying cmd:\n$cmd\n"); my $ret; $ret = system( "$cmd 2>/dev/null" ); if ( $ret != 0 ) { $self->debug( "Failed locate db create command:\n $cmd\n" ); $self->debug( "\$\?: $?\n" ) if $?; next TRY_AGAIN; } else { $status = 1; $self->set_db( $db ); last; } } if ( not( $status ) ) { carp "Could not create db: $db to index $location"; } if ( -e $db ) { my $mtime = (stat $db)[9]; my $timestamp = ( localtime($mtime) ); $self->debug("Looks like database has been created: $db at $timestamp\n"); } return $status; } =back =head2 introspection =over =item check_locate Returns true (1) if this module's 'locate' method is capable of working. This is very similar to the L method, except that with no arguments *and* an undefined object's db setting, this will initiate a L run to try to find the standard system locate db. Example usage: my $flh = File::Locate::Harder->new( { db => undef } ); $flh->create_database( $tree_location, $db_file ); if ( $flh->probe_db ) { my @files = $flh->locate( "want_this" ); # checks the newly created db, # just indexing $tree_location # ... } # Then later, if you want to search the whole file system... $flh->set_db( undef ); if ( $flh->check_locate ) { my @hits = $flh->locate( "search_for_this" ); * ... } # But even more convenient would be: if ( $flh->determine_system_db ) { my @hits = $flh->locate( "search_for_this" ); * ... } (Thus I suspect that this is a redundant, useless method.) Rule of thumb: if you want to search the whole system, you can use check_locate to verify that L will (most likely) work, but if you're using your own custom db (e.g. created via L), you might as well just use . (Another rule of thumb: if this seems confusing, just ignore the issue for as long as you can.) =cut sub check_locate { my $self = shift; my $db = shift || $self->db || $self->determine_system_db; my $ret = $self->probe_db( $db ); return $ret; } =item how_works Returns a report on how this module has been doing "locate" operations (e.g. via the shell or the File::Locate module, and using which db). =cut sub how_works { my $self = shift; my $db = $self->db | 'unknown'; my $report = ''; if ( $self->use_shell_locate ) { my $version = $self->shell_locate_version || ''; $report = "We shell out to locate version: $version\n using the locate db: $db\n"; } else { $report = "Using File::Locate with the locate db: $db\n"; } return $report; } =item introspection_results Returns a hashref of the results of File::Locate::Harder's probing of the system's "locate" setup, so that it can be easily used again without re-doing that work. Example: my $settings_href = $flh1->introspection_results; # save $settings_href somehow (e.g. dump to yaml file) # restore $settings_href somehow my $flh2 = File::Locate::Harder->new( $settings_href ); =cut sub introspection_results { my $self = shift; my $settings = { db => $self->db, system_db_not_found => $self->system_db_not_found, use_shell_locate => $self->use_shell_locate, shell_locate_failed => $self->shell_locate_failed, shell_locate_cmd_idx => $self->shell_locate_cmd_idx, }; return $settings; } =item shell_locate_version Tries to determine the version of the shell's "locate" command. This will work only with the GNU locate and Secure Locate variants, not the Free BSD. Returns the version string on success, otherwise 0 for failure. =cut sub shell_locate_version { my $self = shift; my @cmd = ( 'locate --version', # gnu & slocate 'locate -V', # slocate ); # note: freebsd has no version option my $ret = 0; CMD: foreach my $cmd (@cmd) { $self->debug("Trying cmd:\n$cmd\n"); chomp( $ret = `$cmd` ); if ($ret) { last CMD; } else { $self->debug( "Failed locate version request of form:\n $cmd\n" ); $self->debug( "\$\?: $?\n" ) if $?; next CMD; } } if ( not( $ret ) ) { carp "Could not get version of locate shell command"; } return $ret; } =back =head2 special purpose methods (usually, though not exclusively, for internal use) =over =item locate_via_module Uses the perl/XS module L to perform a locate operation on the given search term, using the db file indicated by the object's db attribute. An optional second argument allows passing in a coderef, an anonymous routine that operates on each match (the match value is set to $_): this makes it possible to work with a large result without storing the entire set in memory. Uses the three object attribute toggles (L, , ) to control the way locate is performed. =cut sub locate_via_module { my $self = shift; my $search_term = shift; my $coderef = shift; my $db = $self->db; my @opts = $self->build_opts_for_locate_via_module; if( not( $coderef ) ) { my @results = File::Locate::locate( $search_term, @opts, $db ); return \@results; } else { my $ret = File::Locate::locate( $search_term, @opts, $db, $coderef ); return $ret; } } =item locate_via_shell Given a search term returns an array reference of matches found from a "locate" search. An optional second argument containing the locate command's "options string" (e.g. "-i", "-r", "-re", etc) may be passed in (otherwise it is generated from object data). This method uses object data settings: L, L And indirectly (via L): L, L, L =cut sub locate_via_shell { my $self = shift; my $search_term = shift; my $opt_str_override = shift; unless( defined( $self->shell_locate_cmd_idx ) ) { $self->probe_db_via_shell_locate; # side effect: determine cmd_idx } my $cmd_idx = $self->shell_locate_cmd_idx; my $db = $self->db; my $opt_str = $opt_str_override || $self->build_opts_for_locate_via_shell; my ($locate_cmd, @results); $locate_cmd = $self->generate_locate_cmd( $cmd_idx, $search_term, $db, $opt_str ); chomp( @results = `$locate_cmd` ); return \@results; } =back =head2 methods largely for internal use =over =item determine_system_db Internally used routine: looks for a useable system-wide locate db. Returns the path to the db if found, and as a side effect sets the object attribute "db". =cut # Note: for efficiency reasons, this trys to access all # candidates via module before falling back on via shell. That's # the reason this routine does not use the probe_db method sub determine_system_db { my $self = shift; if ( $self->system_db_not_found ) { return undef; # might as well bail if we've failed before } my $candidates = $self->locate_db_location_candidates; my @exist = grep { -e $_ } @{ $candidates }; foreach my $db (@exist) { if( $self->probe_db_via_module_locate( $db ) ) { $self->set_db( $db ); return $db; } } foreach my $db (@exist) { if( $self->probe_db_via_shell_locate( $db ) ) { # $self->set_use_shell_locate(1); ### TODO -- why not do this here $self->set_db( $db ); return $db; } } $self->set_system_db_not_found( 1 ); return undef; } =item probe_db For when the locate db file you're interested in is known, and you want to initialize access for it (and as a side-effect, find out if it works). Input: db file name with full path (optional, defaults to object's setting). Return: for success, the db file name, on failure undef. Side-effect: set's use_shell_locate if the access via module didn't work. =cut sub probe_db { my $self = shift; my $db = shift || $self->db; # will we use db fast way (module) or slow but sure (shell) if ( $self->probe_db_via_module_locate ) { # File::Locate module works, so use it return $db; # success } elsif ( $self->probe_db_via_shell_locate ) { $self->set_use_shell_locate( 1 ); return $db; # success } else { return; # failed } } =item probe_db_via_module_locate Looks to see if it can find anything in the given db by using the File::Locate module. =cut sub probe_db_via_module_locate { my $self = shift; my $db = shift || $self->db; # bail immediately if we've already know via_module doesn't work if ( $self->use_shell_locate ) { return undef; } my @test_search_terms = @{ $self->test_search_terms }; foreach my $search_term (@test_search_terms) { my $found_one; eval { # in scalar context, this 'locate' returns a boolean $found_one = File::Locate::locate( $search_term, $db ); }; if ($@) { # traps errors reported by the File::Locate module $self->debug("File::Locate::locate had a problem with $db:\n$@"); return undef; } if ( $found_one ) { return $db; } else { $self->debug("File::Locate::locate found no $search_term via $db:\n$?"); } } return undef; } =item probe_db_via_shell_locate Tries the series of standard test searches by shelling out to the command-line form of locate to make sure that it can be used. Tries to use the locate db file indicated by the objects "db" attribute, but this can be over-ridden with an optional argument. Under some circumstances, the db may remain undefined, but this method will return "1" for success if it appears that command-line locate works in any case. As a side-effect, saves the L that indicates a form of the locate command that has been observed to work. Returns: undef for failure, and for success either the db or 1 (because locate can work even if this code can't figure out what db file it's using). =cut sub probe_db_via_shell_locate { my $self = shift; if ( $self->shell_locate_failed ) { return undef; # bail now if we've failed before } my $default_db; if ( $self->system_db_not_found ) { $default_db = undef; # locate may find the system db even if we can't } else { $default_db = $self->db; } my $db = shift || $default_db; my $true = $db || 1; my $opt_str = $self->build_opts_for_locate_via_shell; # Nested loops of trials # The outer loop: different syntax variations of the locate cmd # The inner loop: a series of terms to try searching for. my $test_search_terms_aref = $self->test_search_terms; my @test_search_terms = @{ $test_search_terms_aref } if $test_search_terms_aref; my $lim = $self->generate_locate_cmd; for (my $cmd_idx = 0; $cmd_idx <= $lim; $cmd_idx++) { foreach my $search_term (@test_search_terms) { my $locate_cmd = $self->generate_locate_cmd( $cmd_idx, $search_term, $db, $opt_str ); chomp( my @hits = `$locate_cmd 2>/dev/null` ); if ( scalar( @hits ) > 0 ) { $self->set_shell_locate_cmd_idx( $cmd_idx ); return $true; } } } $self->set_shell_locate_failed( 1 ); return undef; } =item generate_locate_cmd Given an ordered list of four required parameters, returns a form of the locate command which can (in theory) be fed to the shell. In practice these different forms are expected to fail (some harder than others) on various different platforms, so some experimentation may be needed to find a form that works (which is the job of L). Inputs: $cmd_idx: integer index (beginning with 0) that chooses the form of a command to return. $search_term: string (or possibly regexp) to search for. $db: full path to the locate db to search. $opt_str: options string, defaults to values generated by build_opts_for_locate_via_shell Example usage: for ($i=0; $i<=$self->generate_locate_cmd; $i++) { my $locate_cmd = $self->generate_locate_cmd( $cmd_idx, $search_term, $db, $opt_str ); my @result = `$locate_cmd 2 > /dev/null `; if ( scalar(@result) > 0 ) { return $i; } } Note: the various forms of locate are discussed below in L Special case: with no arguments (specifically, with $cmd_idx undefined) returns the count of avaliable command forms minus 1 ($#cmd_forms); =cut sub generate_locate_cmd { my $self = shift; my $cmd_idx = shift; my $search_term = shift || ''; # suppressing warnings about subbing undefs my $db = shift || ''; my $opt_str = shift || $self->build_opts_for_locate_via_shell || ''; $self->debug("cmd_idx: $cmd_idx\n") if defined( $cmd_idx ); $self->debug("generate_locate_cmd: " . "db: $db search_term: $search_term\n"); my @shell_locate_cmds; if( $db ) { @shell_locate_cmds = ( "locate -q -d '$db' $opt_str $search_term", "locate -d '$db' $opt_str $search_term", "locate -q --database='$db' $opt_str $search_term", "locate --database='$db' $opt_str $search_term", ); } else { @shell_locate_cmds = ( "locate -q $opt_str $search_term", "locate $opt_str $search_term", "locate -q $opt_str $search_term", "locate $opt_str $search_term", ); } my $limit = $#shell_locate_cmds; if ( not( defined( $cmd_idx ) ) ) { return $limit; } if ( $cmd_idx > $limit ) { return; # undef } my $cmd = $shell_locate_cmds[ $cmd_idx ]; $self->debug("generate_locate_cmd: returned cmd:\n$cmd\n"); return $cmd; } =item build_opts_for_locate_via_shell Converts the three object attribute toggles (L, , ) into the command-line options string for locate. The "posix_extended" feature is not supported for locates via the shell, and if used will issue a warning. =cut sub build_opts_for_locate_via_shell { my $self = shift; my $opt_str = ''; if ( $self->case_insensitive ) { $opt_str .= 'i'; }; if ( $self->regexp ) { $opt_str .= 'r'; } if ( $self->posix_extended ) { carp("Can't use posix extended regexps with locate via the shell"); }; $opt_str = "-$opt_str" if $opt_str; return $opt_str; } =item build_opts_for_locate_via_module Converting three object attribute toggles (L, , ) into the form that the File::Locate::locate requires: returns an array. =cut sub build_opts_for_locate_via_module { my $self = shift; my $rexopt_str = ''; my @opts = (); if ( $self->case_insensitive ) { $rexopt_str .= 'i'; }; if ( $self->posix_extended ) { $rexopt_str .= 'e'; }; if ( $self->regexp || $rexopt_str ) { # any -rexopt (even 'i') implies # a need for -rex @opts = (-rex => 1); } push @opts, (-rexopt => $rexopt_str) if $rexopt_str; return @opts; } =back =head2 initialization utilities =over =item define_probe_parameters An internal method, used during the object init process. Defines two arrays that are used to control the locate db "probe" process: the test_search_terms and the locate_db_location_candidates. The locate_db_location_candidates are likely places for a system's locate db. See L below. The test_search_terms are common terms in unix file paths, which we can check to see if what looks like the locate database really is one. See L below. =cut sub define_probe_parameters { my $self = shift; # common strings in file paths on perl/unix systems, # in roughly increasing likelihood of size of search result my @test_search_terms = qw( MakeMaker SelfStubber DynaLoader README tmp bin the htm txt home e / ); $self->set_test_search_terms( \@test_search_terms ); # some places one might look for the system's # locate db, in roughly increasing order of likelihood my @candidates = qw( /var/lib/slocate/slocate.db /var/cache/locate/locatedb /var/db/locate.database /usr/var/locatedb /var/lib/locatedb /usr/local/var/locatedb /var/lib/locate/locatedb /var/spool/locate/locatedb /var/cache/locate/slocate.db /var/db/slocate.db /usr/var/slocate.db /usr/local/var/slocate.db /var/lib/locate/slocate.db /var/spool/locate/slocate.db /var/lib/slocate/locate.database /var/cache/locate/locate.database /usr/var/locate.database /usr/local/var/locate.database /var/lib/locate/locate.database /var/spool/locate/locate.database /var/lib/slocate/locatedb /var/db/locatedb ); $self->set_locate_db_location_candidates( \@candidates ); return $self; } =back =head2 basic setters and getters =over =item db Getter for object attribute system_db =item set_db Setter for object attribute set_db As a side-effect, unsets the shell_locate_failed flag (what if the last db file was bad, and this current setting will work?). =cut sub set_db { my $self = shift; my $db = shift; $self->{ db } = $db; $self->set_shell_locate_failed( undef ); return $db; } =back =head2 EXPERIMENTAL Having some trouble straightening out the above code as-written. Going to work on some experimental routines here, that might have a use somewhere. =over =item work_via Try the db various ways, make a recommendation on how to access it. Return string: 'module' or 'shell'. Q: how to handle the shell-but-undef-db case? A1: could be a third how-type 'shell_unknown' A2: could be some sort of meta-field, a "system_db_indeterminate" flag =cut sub work_via { my $self = shift; my $db = shift || $self->db; my $how; if ( $self->probe_db_via_module_locate( $db ) ) { $how = 'module'; } else { if( $self->probe_db_via_shell_locate( $db ) ) { $how = 'shell'; } else { $how = 'shell_unknown_db'; } } return $how; } =back =head2 automatic accessor generation =over =item AUTOLOAD =cut sub AUTOLOAD { return if $AUTOLOAD =~ /DESTROY$/; # skip calls to DESTROY () my ($name) = $AUTOLOAD =~ /([^:]+)$/; # extract method name (my $field = $name) =~ s/^set_//; # check that this is a valid accessor call croak("Unknown method '$AUTOLOAD' called") unless defined( $ATTRIBUTES{ $field } ); { no strict 'refs'; # create the setter and getter and install them in the symbol table if ( $name =~ /^set_/ ) { *$name = sub { my $self = shift; $self->{ $field } = shift; return $self->{ $field }; }; goto &$name; # jump to the new method. } elsif ( $name =~ /^get_/ ) { carp("Apparent attempt at using a getter with unneeded 'get_' prefix."); } *$name = sub { my $self = shift; return $self->{ $field }; }; goto &$name; # jump to the new method. } } 1; =back =head1 Platforms It's likely that this package will work on any unix-like system (including cygwin), though on some there might be a need for additional installation and setup (e.g. a "findutils" package). Development was done on two varieties of linux (aka GNU/linux): Knoppix (32bit) on a Turion and Kubuntu on an Opteron machine. This covered two major varieties of the "locate" command: GNU locate and Secure Locate. A serious attempt was made to support BSD locate on Freebsd, but the testing has not been completed. Note: at present the File::Locate module appears to fail silently on 64bit platforms, so there the command-line shell locate will always be used. =head1 MOTIVATION This module uses L, which is a a perl XS interface to read locate (or slocate) dbs without shellling out to the command-line "locate" program. File::Locate has one great limitation: it must be told which locate db to use (by explicit parameter, or by environment variable), it has no notion of a default location. Further, as of this writing, it appears to be limited to 32bit systems. This module then is a wrapper around File::Locate that tries a number of common locations for the locate database, and instead of just giving up, it also tries the command-line locate, which has it's own ways of knowing where the database can be (configuration file, compiled-in default, or command-line parameter). The intention here is to make this module as portable as possible... it might, for example, be useful to use in portable CPAN modules that need to look for things in the filesystem. (As a case in point: the job of File::Locate::Harder would be a lot easier if it could use "locate" to find the locate db...). =head1 Additional Examples =head2 forcing locate via File::Locate module or via shell command my $flh = File::Locate::Harder->new(); $result_via_module = $flh->locate_via_module( $term ); $result_via_shell = $flh->locate_via_shell( $term ); =head2 using the coderef feature of the File::Locate module my $count = 0; $flh->locate_via_module( $term, sub { $count++ } ); print "There are $count matches of $term\n"; $flh->locate_via_module( $term, sub { $count++ if $_ =~ m{ ^ /home }x } ); print "There are $count matches of $term located in /home\n"; =head2 speeding up multiple searches if you know you're using shell locate This reduces the number of calls to build_opts_for_locate_via_shell: my @searches = qw( .bashrc .bash_profile .emacs default.el ); my $flh = File::Locate::Harder->new(); my $opt_str = $self->build_opts_for_locate_via_shell; foreach my $term (@searches) { $result_via_shell = $flh->locate_via_shell( $term, $opt_str ); } =head1 SEE ALSO L Manual pages: L, L, and/or L. =head1 NOTES =head2 architecture The general philosophy in use here is to just try things that are likely to work and then just try something else if they fail. This is probably better than attempting to guess which form of locate to use based on the current platform, because (a) no one (to my knowledge) has a capabilities database that specifies which locate is found on which platform (b) different variants may be installed at the whim of a sysadmin (c) there may after all be variants of locate I've never encountered. So checking ^O is of limited utility, and similarly, some of the existing forms of locate lack introspection features (e.g. you can't get freebsd's locate to tell you what version it is). =head2 details The object creation process "new" and "init" determines how to do system-wide locates, and saves it's conclusions for use by future calls of the locate method on this object. Some of this elaborate initialization process can be short-circuited if it's told which db file to use, or even just giving it an "db" option with an undefined value. That's convenient for cases where you want to use this module to create a locate db of your own (there's no point in scoping for a system-wide db if we're going to use a specialized one). If the db location is not known, the search process begins with making guesses about likely locations it might be found. It goes through this list: /var/lib/slocate/slocate.db -- Secure Locate under Kubuntu /var/cache/locate/locatedb -- GNU locate, under Knoppix /var/db/locate.database -- BSD locate, under FreeBSD /usr/var/locatedb -- mentioned: File::Locate docs and cygwin lists /var/lib/locatedb -- mentioned on insecure.org /usr/local/var/locatedb -- Solaris with findutils installed /var/lib/locate/locatedb -- mentioned on a Debian list in 2000 /var/spool/locate/locatedb -- speculative mention on a cygwin list So that's three names, in 8 locations. It also tries other permutations on speculation: /var/cache/locate/slocate.db /var/db/slocate.db /usr/var/slocate.db /usr/local/var/slocate.db /var/lib/locate/slocate.db /var/spool/locate/slocate.db /var/lib/slocate/locate.database /var/cache/locate/locate.database /usr/var/locate.database /usr/local/var/locate.database /var/lib/locate/locate.database /var/spool/locate/locate.database /var/lib/slocate/locatedb /var/db/locatedb Each of these possibilites is checked for simple file-existance, and then checked to see if one works. (See L below.) =head2 locate shell command If attempts at using L fails, the system falls back to shelling out to the locate command (it really should already know how to find the system-wide db, either from a compiled-in default or a config file setting). But the locate shell command has it's own problems. There are at least three variants, with some slight differences between GNU locate, slocate and freebsd locate. The current architecture of locate_via_shell tries all of them in a certain order, and remembers the one that worked last time. Briefly, here are the variations we need to account for: =over =item -d or --database -d is essentially more general, because freebsd has it but does not have --database. So, we try "-d" first, but also try "--database" just in case. =item -q for quiet As of this writing, with slocate, if you tell it explicitly which db to use, that works, but you also get an ignorable error about how you don't have permissions to mess with the system wide database. You can get this warning to go away with the "-q" option, but neither Gnu locate or freebsd has it, and if you use it with them it's a fatal error. So here we try to use "-q" first, and if that dies, we run without it. =back And still other variations exist in requesting version information. The FreeBSD form does not understand "--version", and in fact doesn't seem to have any sort of version option. (Ah, Cross-platform programming is such a joy.) =head2 checking if a form of locate works In order to check that a system-wide locate is working, we probe for files we know (or strongly suspect) will be there on the system. This module tries a series of guesses of decreasing specificity (there's no point in getting a huge number of hits if they're not needed), then bails out on the list if a result is recieved. The list in use here begins with files in the standard perl library (which should accompany almost any installation of perl, unless they were removed for some reason): MakeMaker SelfStubber DynaLoader It then begins looking for strings that should be relatively common on most systems: README tmp bin the htm txt home The presumption is that if there are no hits on those searchs on a system-wide database, something is very wrong, and that particular form of "locate" just isn't working. =head2 File::Locate By using File::Locate with () to supress import, we need to call 'locate' like so: File::Locate::locate which makes it easy for us to define a new 'locate' method of our own. The proceedural syntax of File::Locate::locate has it's ugly aspects, but the documentation is usually clear: my @mp3s = File::Locate::locate "mp3", "/usr/var/locatedb"; # do regex search @hits = File::Locate::locate "^/usr", -rex => 1, "/usr/var/locatedb"; @hits = File::Locate::locate "^/usr", -rexopt => 'ie', "/usr/var/locatedb"; # i - case insensitive # e - POSIX extended regexps (say what?) Note: it isn't abundantly clear from the documentation if -rexopt has to be used with -rex, but it appears that this is the case. (And there is a syntax diagram that indicates this). Another oddity, though: there doesn't seem to be a way to do a case-insensitive search without using regexps. (Oddly enough, none of the tests use the "-rexopt" feature.) A very cool touch is that you can hand it a coderef, and avoid building up a big result set: File::Locate::locate "*.mp3", sub { print "MP3 found: $_n" }; Note: the order of arguments to File::Locate::locate is supposed to be irrelevant. =head2 system status fields The system status fields (the one's that can be saved or inspected via L) no doubt seem redundant: db system_db_not_found use_shell_locate shell_locate_failed shell_locate_cmd_idx It's possible that they *are* somewhat redundant: they were invented on-the-fly during development on an ad hoc basis. However, despite the way it looks, this set is resistant to being reduced in size. Two-valued logic has it's limitations: for our immediate purpose, there has to be ways to distinguish between "I don't know what this value is, and you should try to find out" and "I don't know what this value is, and it isn't worth trying to find it." For example, the "db" field alone isn't good enough, it needs to be supplemented with information about what we've done to try to determine the "db". As for "use_shell_locate" and "shell_locate_failed": "shell_locate_failed" is used largely to skip doing a probe via shell if it's failed before (possibly it's name should be expanded to "shell_locate_probe_failed"). Even if the system has been explicitly told to work via the shell, it's still necessary to do a probe to find out which form of the shell locate command will work ("shell_locate_cmd_idx"). =head1 AUTHOR Joseph Brenner, Edoom@kzsu.stanford.eduE, 29 May 2007 =head1 COPYRIGHT AND LICENSE Copyright (C) 2007 by Joseph Brenner This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.2 or, at your option, any later version of Perl 5 you may have available. =head1 BUGS None reported... yet. =cut