NAME
    URI::Fetch - Smart URI fetching/caching

SYNOPSIS
        use URI::Fetch;

        ## Simple fetch.
        my $res = URI::Fetch->fetch('http://example.com/atom.xml')
            or die URI::Fetch->errstr;

        ## Fetch using specified ETag and Last-Modified headers.
        $res = URI::Fetch->fetch('http://example.com/atom.xml',
                ETag => '123-ABC',
                LastModified => time - 3600,
        )
            or die URI::Fetch->errstr;

        ## Fetch using an on-disk cache that URI::Fetch manages for you.
        use Cache::File;
        my $cache = Cache::File->new( cache_root => '/tmp/cache' );
        $res = URI::Fetch->fetch('http://example.com/atom.xml',
                Cache => $cache
        )
            or die URI::Fetch->errstr;

DESCRIPTION
    *URI::Fetch* is a smart client for fetching HTTP pages, notably
    syndication feeds (RSS, Atom, and others), in an intelligent, bandwidth-
    and time-saving way. That means:

    *   GZIP support

        If you have *Compress::Zlib* installed, *URI::Fetch* will
        automatically try to download a compressed version of the content,
        saving bandwidth (and time).

    *   *Last-Modified* and *ETag* support

        If you use a local cache (see the *Cache* parameter to *fetch*),
        *URI::Fetch* will keep track of the *Last-Modified* and *ETag*
        headers from the server, allowing you to only download pages that
        have been modified since the last time you checked.

    *   Proper understanding of HTTP error codes

        Certain HTTP error codes are special, particularly when fetching
        syndication feeds, and well-written clients should pay special
        attention to them. *URI::Fetch* can only do so much for you in this
        regard, but it gives you the tools to be a well-written client.

        The response from *fetch* gives you the raw HTTP response code,
        along with special handling of four codes:

        *   200 (OK)

            Signals that the content of a page/feed was retrieved
            successfully.

        *   301 (Moved Permanently)

            Signals that a page/feed has moved permanently, and that your
            database of feeds should be updated to reflect the new URI.

        *   304 (Not Modified)

            Signals that a page/feed has not changed since it was last
            fetched.

        *   410 (Gone)

            Signals that a page/feed is gone and will never be coming back,
            so you should stop trying to fetch it.
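
    The four codes above map naturally onto four client actions. The
    following is a minimal sketch of that mapping, not part of
    *URI::Fetch* itself: the helper "action_for_status" and the action
    names are hypothetical, and the numeric codes are used directly so the
    snippet runs without any network access.

```perl
use strict;
use warnings;

# Hypothetical helper: map an HTTP status code to the action a feed
# client might take. The action names are illustrative only.
sub action_for_status {
    my ($code) = @_;
    my %action = (
        200 => 'process',      # content retrieved; parse it
        301 => 'update_uri',   # moved permanently; store the new URI
        304 => 'use_cached',   # not modified; keep what you already have
        410 => 'unsubscribe',  # gone; stop fetching this URI
    );
    # Any other code gets a generic fallback action.
    return $action{$code} // 'retry_later';
}

for my $code (200, 301, 304, 410, 503) {
    printf "%d => %s\n", $code, action_for_status($code);
}
```

    In a real client the code would come from the response object returned
    by *fetch*; this section does not name the accessor, so check the
    *URI::Fetch::Response* documentation before relying on one.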

USAGE
  URI::Fetch->fetch($uri, %param)
    Fetches a page identified by the URI *$uri*.

    On success, returns a *URI::Fetch::Response* object; on failure, returns
    "undef".

    *%param* can contain:

    *   LastModified

    *   ETag

        *LastModified* and *ETag* can be supplied to force the server to
        only return the full page if it's changed since the last request. If
        you're writing your own feed client, this is recommended practice,
        because it limits both your bandwidth use and the server's.

        If you'd rather not have to store the *LastModified* time and *ETag*
        yourself, see the *Cache* parameter below (and the SYNOPSIS above).

    *   Cache

        If you'd like *URI::Fetch* to cache responses between requests,
        provide the *Cache* parameter with an object supporting the Cache
        API (e.g. *Cache::File*, *Cache::Memory*). Specifically, an object
        that supports "$cache->get($key)" and "$cache->set($key, $value,
        $expires)".

        If supplied, *URI::Fetch* will store the page content, ETag, and
        last-modified time of the response in the cache, and will pull the
        content from the cache on subsequent requests if the server
        answers with a 304 (Not Modified) response.

    *   UserAgent

        Optional. You may provide your own LWP::UserAgent instance. Look
        into LWPx::ParanoidUserAgent if you're fetching URLs given to you by
        possibly malicious parties.

    *   NoNetwork

        Optional. Controls the interaction between the cache and HTTP
        requests with If-Modified-Since/If-None-Match headers. Possible
        behaviors are:

        false (default)
            If a page is in the cache, the origin HTTP server is always
            checked for a fresher copy with an If-Modified-Since and/or
            If-None-Match header.

        1   If set to 1, the origin HTTP server is never contacted, whether
            or not the page is in the cache. If the page is missing from
            the cache, the fetch method returns undef. If the page is in
            the cache, it is returned no matter how old it is. Note that
            with this setting the URI::Fetch::Response object will never
            have its http_response member set.

        "N", where N > 1
            The origin HTTP server is not contacted if the page is in cache
            and the cached page was inserted in the last N seconds. If the
            cached copy is older than N seconds, a normal HTTP request (full
            or cache check) is done.

    *   ContentAlterHook

        Optional. A subref that gets called with a scalar reference to your
        content so you can modify the content before it's returned and
        before it's put in cache.

        For instance, you may want to only cache the <head> section of an
        HTML document, or you may want to take a feed URL and cache only a
        pre-parsed version of it. If your hook replaces the value behind
        the scalarref with a hashref, another scalarref, or a blessed
        object, that same value will be returned to you later on
        not-modified responses.

    *   CacheEntryGrep

        Optional. A subref that gets called with the *URI::Fetch::Response*
        object about to be cached (with the contents already possibly
        transformed by your "ContentAlterHook"). If your subref returns
        true, the page goes into the cache. If false, it doesn't.

    *   Freeze

    *   Thaw

        Optional. Subrefs that get called to serialize and deserialize,
        respectively, the data that will be cached. The cached data should
        be assumed to be an arbitrary Perl data structure, containing
        (potentially) references to arrays, hashes, etc.

        Freeze should serialize the structure into a scalar; Thaw should
        deserialize the scalar into a data structure.

        By default, *Storable* will be used for freezing and thawing the
        cached data structure.

    *   ForceResponse

        Optional. A boolean that indicates a *URI::Fetch::Response* should
        be returned regardless of the HTTP status. By default, "undef" is
        returned when a response is neither a "success" (a 2xx code) nor
        one of the recognized HTTP status codes listed above. When "undef"
        is returned, the HTTP status message can be retrieved using the
        "errstr" method on the class.
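
    The *Cache*, *Freeze*, and *Thaw* parameters describe a simple storage
    path: serialize a data structure to a scalar, store it under a key,
    read it back, and deserialize it. The sketch below walks that path
    with a hypothetical in-memory class "My::TinyCache" and JSON-based
    callbacks built on the core *JSON::PP* module. Several things here are
    assumptions rather than documented behavior: the hash keys in the
    sample entry are invented, and $expires is treated as a lifetime in
    seconds.

```perl
use strict;
use warnings;
use JSON::PP ();

# Hypothetical in-memory cache exposing only get($key) and
# set($key, $value, $expires).
package My::TinyCache;

sub new { return bless { store => {} }, shift }

sub set {
    my ($self, $key, $value, $expires) = @_;
    # Assumption: $expires is a lifetime in seconds; undef means the
    # entry never expires.
    my $deadline = defined $expires ? time + $expires : undef;
    $self->{store}{$key} = [ $value, $deadline ];
    return 1;
}

sub get {
    my ($self, $key) = @_;
    my $entry = $self->{store}{$key} or return;
    my ($value, $deadline) = @$entry;
    if (defined $deadline && time >= $deadline) {
        delete $self->{store}{$key};   # drop the expired entry
        return;
    }
    return $value;
}

package main;

# Freeze turns a data structure into a scalar; Thaw reverses it.
my $json   = JSON::PP->new->canonical;
my $freeze = sub { $json->encode($_[0]) };
my $thaw   = sub { $json->decode($_[0]) };

# Round trip: freeze, store, fetch, thaw. The entry layout is made up
# for illustration and is not the structure URI::Fetch stores.
my $cache = My::TinyCache->new;
my $entry = { ETag => '123-ABC', Content => '<feed/>' };
$cache->set('http://example.com/atom.xml', $freeze->($entry), 3600);
my $copy = $thaw->($cache->get('http://example.com/atom.xml'));
print "ETag from cache: $copy->{ETag}\n";
```

    Objects like these could then be passed to *fetch* as the *Cache*,
    *Freeze*, and *Thaw* parameters. JSON only handles plain hashes,
    arrays, and scalars, so if a "ContentAlterHook" stores blessed objects
    the default *Storable* serialization is likely the safer choice.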

LICENSE
    *URI::Fetch* is free software; you may redistribute it and/or modify it
    under the same terms as Perl itself.

AUTHOR & COPYRIGHT
    Except where otherwise noted, *URI::Fetch* is Copyright 2004 Benjamin
    Trott, ben+cpan@stupidfool.org. All rights reserved.