package CPAN::Metrics;

=pod

=head1 NAME

CPAN::Metrics - Create and maintain a Perl::Metrics database for all of CPAN

=head1 SYNOPSIS

  # Do a CPAN::Metrics run
  my $metrics = CPAN::Metrics->new(
      remote  => 'http://mirrors.kernel.org/cpan/',
      local   => '/home/adam/.minicpan',
      extract => '/home/adam/.cpanmetrics',
      metrics => '/home/adam/.cpanmetrics/metrics.sqlite',
  );
  $metrics->run;

=head1 DESCRIPTION

C is a combination of L and L.

It lets you pull out all of CPAN (for various definitions of "all") and
run L on it to generate massive amounts of metrics data on the
16,000,000 or so lines of code on CPAN.

=head2 Resource Usage

While this module might make it relatively easy to write the B to
"process all of CPAN", make no mistake: it is going to take a LOT of
computing resources to do it, especially the first time.

A single run should require 1-10 gigabytes of disk space, up to several
hundred megabytes of memory, and hours (or days) of CPU time.

The result will be a SQLite database containing somewhere between several
hundred thousand and several million rows of metrics data.

What you do with the metrics after B is up to you.

=head1 METHODS

=cut

use 5.005;
use strict;
use base 'CPAN::Mini::Extract';
use Carp 'croak';
use Perl::Metrics ();

use vars qw{$VERSION};
BEGIN {
	$VERSION = '0.08';
}





#####################################################################
# Constructor

=pod

=head2 new

The C constructor creates a new CPAN metrics processor.

Although it is created as an object, due to L you can only create a
single object within a single process (I think).

It takes the following parameters.

=over

=item minicpan arguments

Any parameters accepted by the L<CPAN::Mini::Extract> constructor (such
as C<remote>, C<local> and C<extract>) are passed through to it, and
override the defaults set by this class.

=item metrics

The location of the SQLite metrics database, which is passed on to
L<Perl::Metrics>. This parameter is required.

=back

Returns a new C object, or dies on error.

=cut

sub new {
	my $class = ref $_[0] ?
		ref shift : shift;

	# Call up to get the base object
	my $self = $class->SUPER::new(
		force         => 1,
		skip_perl     => 1,
		extract_check => 1,
		path_filters  => [
			qr/\bAcme\b/i,
			qr/\bPDF\-API2\b/i,
			qr/\bPerl6\b/i,
		],

		# Remove some known troublemakers
		module_filters => [
			qr/^Acme::/i,
			qr/^Meta::/i,
			qr/\bPerl6\b/i,
		],

		# Decide which files to extract; the file path is tested via $_
		extract_filter => sub {
			# Skip paths containing a colon
			return 0 if /\:/;

			# Skip anything under (or named) "inc", such as bundled inc/ dirs
			return 0 if /\binc\b/;

			# Always keep Perl scripts
			return 1 if /\.pl$/;

			# Skip example and examples directories
			return 0 if /\bexamples?\b/;

			# Inside a t/ directory keep only test scripts,
			# everywhere else keep only modules
			if ( /\bt\b/ ) {
				return 1 if /\.t$/;
			} else {
				return 1 if /\.pm$/;
			}
			return 0;
		},

		# User-supplied parameters override the defaults above
		@_,
	);

	# Check and set the metrics database
	unless ( $self->{metrics} ) {
		croak("Metrics database param 'metrics' was not provided");
	}
	Perl::Metrics->import( $self->{metrics} );

	return $self;
}

=pod

=head2 run

The C method launches the CPAN metrics processor.

It will synchronize its L mirror from the remote server, expanding any
new archives and removing old ones.

Once updated, the extraction directory will be reindexed and updated in
the metrics database, and any processing required to generate the
resulting metrics will be done.

And then (a C long time later) it will stop. :)

Oh, and it returns true.

Any errors will cause an exception (i.e. C<die>).

=cut

sub run {
	my $self = shift;

	# Synchronize the minicpan mirror and expand any new archives
	$self->SUPER::run( @_ );

	# Index the extracted files and generate their metrics
	$self->process_index;
}

# Index and process the extraction directory with Perl::Metrics.
# Called automatically by run; returns true.
sub process_index {
	my $self = shift;

	# Process the extraction directory
	local $Perl::Metrics::TRACE = 1;
	$self->trace("Indexing and processing documents in $self->{extract}...\n");
	Perl::Metrics->process_index( $self->{extract} );

	return 1;
}

1;

=pod

=head1 TO DO

- Improve Perl::Metrics to add needed things

- Improve CPAN::Metrics::Extract to add needed things

- Improve CPAN::Metrics to add needed things

- Get all three of the above to use accessors

- Possibly consider intentionally B caching so that we don't end up
with a multi-multi-gigabyte parse cache.

=head1 SUPPORT

Bugs should be reported via the CPAN bug tracker at

L

For other issues, contact the author.

=head1 AUTHOR

Adam Kennedy E<lt>adamk@cpan.orgE<gt>, L

=head1 COPYRIGHT

Copyright 2005 - 2008 Adam Kennedy.
This program is free software; you can redistribute
it and/or modify it under the same terms as Perl itself.

The full text of the license can be found in the
LICENSE file included with this module.

=cut