package WWW::Lipsum;
use strict;
use Carp qw(croak);
use vars qw( $VERSION );
use LWP::UserAgent;
use HTTP::Request::Common;
use HTML::TokeParser::Simple;
$VERSION = 0.2;
sub new {
my $class = shift;
my $self = {};
bless $self, $class;
return $self;
}
sub generate {
my ($self, %args) = @_;
# Arguments. If none are given, generate 1 paragraph by default.
$self->{amount} = $args{amount} || 1; # How much text is wanted?
$self->{what} = $args{what} || 'paras'; # What type of text?
$self->{start} = $args{start} || 1; # Begin with "Lorem ipsum..."?
$self->{html} = $args{html} || undef; # Wrap text in HTML tags?
my $ua = LWP::UserAgent->new;
# Ask for the traditional "Lorem ipsum..." opening unless start => 'no'.
my @form = ( amount => $self->{amount}, what => $self->{what} );
push @form, ( start => 1 ) unless $self->{start} eq 'no';
my $response = $ua->request(POST 'http://www.lipsum.com/feed/http', \@form);
# $! is not set by a failed HTTP request; report the HTTP status line instead.
return 'Error: ' . $response->status_line unless $response->is_success;
my $raw = $response->content;
die "Fatal error: lipsum page wasn't as expected." unless defined $raw && $raw =~ /(.*)Generated(.*)/s;
my $stream = HTML::TokeParser::Simple->new( \$raw ) or die $!;
while (my $token = $stream->get_token)
{
# We're looking for '<div id="lipsum">'.
next unless $token->is_tag('div');
next unless defined $token->return_attr('id') && $token->return_attr('id') eq 'lipsum';
my $text = $stream->get_text('/div');
$text =~ s/ //; # remove leading space
if ($self->{what} =~ /^words$/i or $self->{what} =~ /^bytes$/i)
{
$text =~ s/\n//g;
return $text;
}
elsif ($self->{what} =~ /^lists$/i)
{
return $self->lists($text);
}
else
{
return $self->paras($text);
}
}
}
# The splitting and pushing and what-have-you that you're about to
# encounter would almost certainly be unnecessary if I really
# understood HTML::TokeParser.
sub paras
{
my ($self, $text) = @_;
my @paras;
foreach (split("\n", $text))
{
next unless $_ !~ /^ /; # drop empty chunks
push (@paras, $_);
}
for (0..1) { shift @paras; } # drop empty items
if (defined $self->{html})
{
$_ = "<p>\n" . $_ . "\n</p>" foreach @paras;
}
@paras;
}
sub lists
{
my ($self, $text) = @_;
my (@items, @list);
foreach (split( "\n", $text ))
{
next unless $_ =~ / /; # drop empty chunks
$_ =~ s/.//; # remove leading space
chop $_; # remove trailing space
foreach my $line (split ' ', $_)
{
$line = '<li>' . $line . '</li>' if $self->{html};
$line .= "\n";
push (@list, $line);
}
my $tidied = join ('', @list);
$tidied = "<ul>\n" . $tidied . '</ul>' if $self->{html};
push (@items, $tidied);
}
@items;
}
# Old method name for backwards compatibility.
sub lipsum {
my ($self, %args) = @_;
$self->generate(%args);
}
1;
__END__

=head1 NAME

WWW::Lipsum - get autogenerated text from lipsum.com

=head1 DESCRIPTION

C<WWW::Lipsum> is a module that will retrieve "lorem ipsum" placeholder text
from lipsum.com.

What is "lorem ipsum"?

"Lorem Ipsum, or Lipsum for short, is simply dummy text of the
printing and typesetting industry. Lipsum has been the industry's
standard dummy text ever since the 1500s, when an unknown printer
took a galley of type and scrambled it to make a type specimen book.
It has survived not only four centuries, but now the leap into
electronic typesetting, remaining essentially unchanged. It was
popularised in the 1960s with the release of Letraset sheets
containing Lipsum passages, and more recently with desktop
publishing software like Aldus PageMaker including versions of
Lipsum." -- lipsum.com

lipsum.com is a useful resource on the Web that will generate passages of
lorem ipsum text for you in sizes of your choice. This module lets you
retrieve them through an object-oriented interface and use them for whatever
purpose you wish.

=head1 SYNOPSIS

    use WWW::Lipsum;

    my $lipsum = WWW::Lipsum->new;

    print $lipsum->generate;

    my @paragraphs = $lipsum->generate(
        amount => 5,
        what   => 'paras',
        start  => 'no',
        html   => 1
    );

=head1 METHODS

Apart from the constructor, there is just one method, C<generate>, which
returns lorem ipsum text. It has several options that correspond to those
offered by lipsum.com.

    print $lipsum->generate; # default usage, no options

This will give you one paragraph of lorem ipsum, beginning with the phrase
"Lorem ipsum dolor sit amet", as is traditional. The size of a "paragraph" is
randomly determined by the lipsum.com text generator, but is generally between
70 and 120 words.

    my @paragraphs = $lipsum->generate(
        amount => 5,
        what   => 'paras',
        start  => 'no',
        html   => 0
    );

This will give you five paragraphs of lorem ipsum, the first of which will
be without the starting phrase. Setting 'html' to 1 will cause each paragraph
to be wrapped in HTML's C<< <p> >> tags.

    print $lipsum->generate(
        amount => 100,
        what   => 'words'
    );

This will give you a hundred words of lorem ipsum with the starting
phrase. The 'html' setting has no effect if you ask for words. When
used to fill a variable, this will give you a list with one item.

    print $lipsum->generate(
        amount => 1024,
        what   => 'bytes'
    );

This will give you 1024 bytes of lorem ipsum with the starting phrase.
Again, the 'html' setting will have no effect, and you will get a
one-item list.

    my @lists = $lipsum->generate(
        amount => 10,
        what   => 'lists',
        html   => 1
    );

The lipsum.com text generator's 'lists' setting produces HTML lists of
random size. Using this setting with this module will give you small chunks
of text, generally of the order of a couple of sentences. Using the 'html'
setting will cause these chunks to be wrapped in HTML list tags for you to
use as you see fit. If 'html' is off, you will get blocks of single lines of
text.
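
Because 'paras' and 'lists' results come back as a Perl list, calling
C<generate> in scalar context for those types yields the number of items
rather than the text. The following is a minimal sketch of list-context
usage, assuming lipsum.com is reachable and still responds in the form this
module expects; the variable names are only illustrative.

    use WWW::Lipsum;

    my $lipsum = WWW::Lipsum->new;

    # The parentheses force list context, so $first holds the
    # first paragraph itself rather than a count of paragraphs.
    my ($first) = $lipsum->generate;

    # Alternatively, keep the whole list and join it for printing.
    my @paras = $lipsum->generate( amount => 3, what => 'paras' );
    print join( "\n\n", @paras ), "\n";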

=head1 OLD METHOD NAMES

The C<generate> method used to be called C<lipsum>; this is retained as an
alias to C<generate> if you really want it.

=head1 AUTHOR

Earle Martin wrote this scraper, but see THANKS for details
of who really did the work.

=head1 LICENSE

This work is licensed under the Creative Commons Attribution-ShareAlike
License. To view a copy of this license, visit the Creative Commons website
or send a letter to Creative Commons, 559 Nathan Abbott Way, Stanford,
California 94305, USA.

=head1 THANKS

All kudos must go to James Wilson for creating lipsum.com and thus providing
inspiration for this module, and also for kindly modifying the way his site
works to make it easier for me to parse.

=cut