New in version 1.32:
- Fixed more installation problems (Thanks to Craig Brockmeier)
- Dropped DBI requirement during installation because it's really optional.
- Changed commercial version so it can run on any operating system with the same registration key. (Old keys will continue to work if on the same operating system, but new ones will need to be issued if a new operating system is needed.)
- Made handler loading routines more robust to illegal characters in the handler name.

New in version 1.31:
- Fixed a couple of installation problems (Server files weren't being installed.)

New in version 1.30:
- Fixed a bug in detection of incompatible configuration file.
- Removed cacheimages option processing code from NewsClipper.pl, and put it in the cacheimages handler.
- Added DBI and DBD::mysql module requirements to Makefile.PL. (Forgot to add them in version 1.28)
- "Server is down" warning is now shown only once for each run.
- Fixed bugs in handler server connect timeouts and retries.
- -C now also gives the option to clear the debug and run logs.
- Direct database connections are now used only if DBI and DBD::mysql are available. Otherwise, the older (and slower) web-based connections are used.
- Fixed unhelpful warning when invalid handler names are used.
- Added support for emailing output files (Thanks to Lee Adler for the suggestion)
- Mail::Sendmail is now required if you use the email feature.

New in version 1.29:
- The Log::Agent and Log::Agent::Rotate packages are now required
- Debug information is now sent to a log file determined by the debug_log_file configuration parameter.
- Information about the execution of News Clipper is stored in the file determined by the run_log_file configuration parameter.
- Errors are now saved in the run log, which prevents private information about the News Clipper commands from being printed in the HTML comments. (An HTML message tells you to look in the log file.)
- New "lprint" function can be used within handlers to print to the run log.
- The new max_number_of_log_files and max_log_file_size configuration parameters control the rotation of log files.
- "tagText" configuration value allows users to set the comment keyword that indicates News Clipper commands.
- "makeOutputFilesExecutable" configuration value allows users to set whether or not the output files should be made executable.
- ConvertConfig.pl now saves updated config files even if a .bak file already exists. New backups are saved as .bak2, .bak3, etc.
- The amount of time to hold a lock and the amount of time to wait for a lock are now based on the scriptTimeout value.
- Updated Makefile.PL to use Win32::TieRegistry instead of older Win32::Registry.
- Makefile.PL now determines old installation directory for single-user installations.
- Fixed a "use of uninitialized value" warning in HandlerFactory.pm.
- Standardized config options to use_this_style instead of thisStyle or thisstyle.

New in version 1.28:
- New handler check now uses direct database connection instead of CGI scripts. (Five times faster)
- The lockfile is now explicitly cleaned up. (May help prevent stale locks.)
- Added -H flag to override home directory location
- Modified -C flag to remove handler-specific state and News Clipper state, and to exit immediately afterwards
- Fixed a URL handling bug in ConvertHandler
- Added check in ConvertHandler to suggest that RunHandler be used instead of NewsClipper::HandlerFactory::Create
- News Clipper should now only remove the old output file if the new one was created successfully.
- modulepath can now contain multiple paths separated by whitespace
- Fixed a bug where a newly downloaded handler update would not reload correctly.
- StripAttributes bug fixed: now recognizes attributes with empty quotes like alt=""
- Arguments to News Clipper are now treated as -e commands. This allows "NewsClipper slashdot" to be run to fetch and print slashdot headlines.
- Debug output now correctly says the script name is "NewsClipper" instead of "NewsClipper2" (on compiled versions)
- Compiled versions now have the default @INC set in modulepath during installation, allowing NC to find externally installed Perl modules.

New in version 1.27:
- Modified NewsClipper.cfg to use a local URL for the imagecache to help News Clipper work "out of the box"
- Cause of "undefined value in File::Cache" bug found and repaired

New in version 1.26:
- Added -P flag to pause after completion.

New in version 1.25:
- Added LOCAL_TIME_ZONE option to timezone specifications, which causes the local time zone to be used. (Thanks to Gerhard Wiesinger for the suggestion and reminder to implement this feature.)
- Re-implemented caching to use a different persistence mechanism. (This should finally squash the dreaded "undefined value in File::Cache" error.)

New in version 1.24:
- Added a lockfile (~/.NewsClipper/lock) that prevents more than one copy of News Clipper from running at the same time. (LockFile::Simple module now required.)
- Faster verification of handler type during counting of number of installed acquisition handlers (affects commercial versions only)
- Decreased dependence on handler server during counting of number of installed acquisition handlers (affects commercial versions only)
- Installation no longer requires "make NewsClipper_Cleanup" step
- Installation now automatically converts older configuration files.
- Added ExtractTables function to HTMLTools. (Thanks to Chris Pimlott for the initial implementation and idea.)
- Non-fatal errors while contacting the handler server are now printed in the output as HTML comments.
- A few speed improvements.

New in version 1.23:
- Fixed MakeHandler.pl so that it doesn't insert comment characters ("#") into the syntax information. (bugfix)

New in version 1.22:
- Added a check to make sure that an incompatible handler will not be loaded and used.
- Improved error reporting during replacement of old output file with new one.

New in version 1.21:
- Fixed a bug in error message reporting
- Expanded the arguments to the handler subroutines ProcessAttributes, FilterType, OutputType, GetUpdateTimes, and GetDefaultHandlers, providing attribute information and data. [This should be backwards compatible with existing handlers which do not expect or use the new data.]
- Fixed a bug where a newly downloaded update to a handler would not be used until the next execution of News Clipper.
- Added use of socketTries through more of the code (not just when fetching remote data).
- Added "forNewsClipperVersion" attribute to NewsClipper.cfg
- Updated ConvertHandler and ConvertConfig to detect the version of the handler or NewsClipper.cfg, and only apply the changes that are necessary to bring it up to the current version.
- Added FTP code which somehow didn't get into version 1.20.
- Added support for type names containing spaces.

New in version 1.20:
- Support for FTPing output files (Thanks to Rob Saunders for the suggestion.) This allows users who don't have shell access to their web server to run News Clipper on a local machine, and send the output files to the web server.
- News Clipper will check last modified dates before downloading content from the remote server.
- Fixed bug where directives were stripped out erroneously. (Thanks to Kwon Soon Son for the bug report.)
- Improved image caching.
- Fixed a bug in HTMLTools::StripAttributes. (Thanks to Paco Hope for finding the bug.)
- New "lastupdated" handler gets the time that data was last updated for a News Clipper input command.
- New "dumpdata" output handler will dump the type signature and values for the data.
- ... can now be used to mark blocks of HTML that will be removed from the output during processing. This is useful for putting "dummy" content right after a News Clipper command during input file design to take up space.
- News Clipper now checks that remote content has changed before downloading information that has expired from the cache.
- Added -C flag to clear the News Clipper cache.
- Reworked cache to use File::Cache module. (This should fix problems related to missing cache files.)
- Restructured the code to make it more understandable. Commented every function.
- Added "socketTries" option to configuration, which specifies the number of times News Clipper should try to get data from the server before giving up. This defaults to 3. (Implemented by Ludovic Drolez)
- Modified GetHtml, GetLinks, and GetImages to generate correct links when the source web page uses (Thanks to Dominique ROUSSEAU for the initial implementation.)
- Fixed a couple (more) bugs in the "check if the cached data needs to be updated" function.
- Improved error message reporting.
- Added versioning information to prevent accidental download of handlers that can break the input files.
- Handler-specific configuration information is now separated in the NewsClipper.cfg file.
- ConvertConfig-1.20.pl and ConvertHandler-1.20.pl help to convert config files and handlers to the new style.
- Fixed a bug in HTMLsubstr
- Significantly improved API for handler writers:
  - Each handler has its own configuration information in the user's NewsClipper.cfg file. This information can be accessed from within the handler through the %handlerconfig hash.
  - The comment block has been replaced with a hash, which will help with the automated processing of the handler.
  - "Maintainer_Name", "Maintainer_Email", and "Language" fields have been added.
  - A new "CATEGORY" field says what category the handler belongs to.
  - A new "FOR NEWSCLIPPER VERSION" field tells what version of News Clipper the handler was built for. This helps prevent accidental download of incompatible handlers.
  - Version numbers have been updated so that you can indicate when a change will break people's input files, when a change adds functionality, and when a change fixes a bug.
  - You can save values between runs using $self->{'state'}, which is just a File::Cache object. ($self->{'state'}->set('key','value'), then $self->{'state'}->get('key'))
  - Types have changed substantially:
    - FilterType and OutputType return a "type signature", like "@($ | %)", which means "an array of either scalars or hashes". @($ & %Slashdot) means an array of at least one scalar and at least one Slashdot hash.
    - A hunk of data matches a type signature if the structure matches, and any subtype relationships hold. For example, a type signature of @($ | %) would be matched by a data structure whose type signature is @($URL), %@Myarray(%), and @Myarray($URL & %Slashdot), but not by @(@).
    - Everything in the data structure is a reference -- even scalars.
    - Each new data value is blessed, not the whole data structure.
    - Data values don't have to be blessed as "String", "Array", or "Hash".
    - However, any new types introduced should be related to the others with the MakeSubtype function: MakeSubtype('Slashdot','HASH');
    - There's a new API function called TypesMatch(), which takes a data reference and a type signature, and returns 1 if the data matches the signature.
  - dprint() can be used to send output as messages in debug mode.
  - error() can be used to output errors about invalid arguments, etc.
  - Default handlers are now specified as real News Clipper commands, like this:
  - FilterType and OutputType now provide the $data and $attributes as arguments, so you can return a different type signature depending on parameters such as "style".
  - ComputeURL should now be used to compute the URL from the attributes.
  - You should never have to check the type of $grabbedData -- News Clipper will do that automatically.
  - A RunHandler API function was added, which takes the handler name, type of handler, data, and attributes. It will automatically check for correct types.
  - Some "use" statements are no longer needed.

New in version 1.17:
- A parsing bug in links was found (and fixed!) by Ludovic Drolez
- The way the configuration information is loaded has changed:
  - On non-Windows machines, News Clipper always loads the system-wide configuration file NEWSCLIPPER/NewsClipper.cfg, where NEWSCLIPPER is the path specified by the NEWSCLIPPER environment variable.
  - Per-user configuration information is then loaded from $home/.NewsClipper/NewsClipper.cfg, or the file specified with the -c option.
  - Personal config files can now override system-wide configuration information on an item-by-item basis. This means that the user's config file doesn't have to have "$ENV{TZ}", imgcache settings, proxy settings, etc. if all they want to do is override the timeout values.
- Fixed a bug where cached data would not be used if an HTTP request failed. Thanks to Vadim Strizhevsky for finding the bug.
- Fixed a bug in checking version information for update of handlers.
- Consolidated common code into NewsClipper::Globals (dprint, DEBUG, etc.) Added a few useful functions to improve code and output message readability (dequote, reformat).
- Improved timeout handling, so that errors get propagated correctly, and the script exits with the correct value. (Thanks to Ludovic Drolez for finding the bug.) Hopefully News Clipper works correctly in makefiles now.
- Fixed a potential bug in NewsClipper::HTMLTools::HTMLsubstr

New in version 1.16:
- Updated compiler-related code
- Changed licensing code.
- Updated README instructions
- Improved installation instructions
- Improved installation script
- Changed numbering scheme to X.YZ, where X is the major version number, Y is increased with each functionality enhancement, and Z is increased with each bugfix release for a given Y.
- NOTE: The open source version will lead the commercial version in releases. The "official" version number will be the commercial version.

New in version 1.0104:
- Fixed a performance bug in the Trial version code (HandlerFactory.pm).
- -h now prints version and registration information

New in version 1.0102:
- Fixed the brain-dead non-monotonically increasing version numbering :)
- Fixed a bug in input file validation, when "STDIN" is used as the file. (Thanks to Jamie Wall for finding it)
- Fixed a bug with -c. (Thanks to morbus@disobey.com for helping to find it)
- Fixed a bug that occurs when checking for new versions and the handler server is down. (Thanks to tg@linuxmafia.org for helping to find it.)
- Fixed a typo in the NewsClipper.cfg file: maxcachesize is in MB, not bytes.
- Updated the template to work with the new yahootopstories handler.
- Made Makefile.PL for unix installations interactive instead of specifying everything on the command line.

New in version 1.0001:
- Code modified to point to the new handler server at handlers.newsclipper.com (instead of newsclipper.binaryresearch.net).
- Fixed typo in MakeHandler.pl (\\&main::dprint, not \&main::dprint)
- Added licensing code:
  - email and regKey in NewsClipper.cfg,
  - key verification in NewsClipper.pl
  - handler counting code in HandlerFactory.pm
  - Nag in output for trial version
- Clarified a few error messages.

New in version 1.00:
- Significantly faster update of out-of-date handlers.
- Anti-abuse measures implemented to help reduce unnecessary load on the handler server.
- Added types to the handlers and News Clipper commands, which helps catch errors.
- A new name, new web page, fancy compiled versions, a nice user's manual, a dedicated server, and improved technical support. (Well, most of that's for the commercial version.
Non-commercial users only get the name, web page, and server :)

-------------- History before Daily Update became News Clipper --------------

New in version 7.1:
- You can now specify "STDIN" and "STDOUT" as the input and output files in DailyUpdate.cfg. This allows you to use Daily Update as a filter, piping the data to and from it. You can also use a file name and STDOUT to force DU to output to standard output.
- Debugging mode is now activated using the -d switch. It outputs lots of useful information that can help you nail down problems you might have.
- Some speed improvements.
- Date::Manip has been replaced by Time::CTime and Time::ParseDate, which are a lot faster.

New in version 7.0:
- This is a major rewrite, with a new syntax and improved support for web developers.
- See the file MIGRATING for hints on how to move to the new version from older versions, and how to upgrade handlers you've written.
- None of the old handlers work. You'll have to download the new ones and migrate the ones you've written yourself but haven't submitted to me.
- Improved caching
- New, separated input, filtering, and output commands. This means you'll have to update your input files to reflect the new syntax.
- New and improved MakeHandler.pl
- -v for verbose output
- Added StripAttributes, HTMLsubstr, and TrimOpenTags functions to HTMLTools.
- Added day of the week and timezone support to handler update times.
- New web support for browsing handlers
- Daily Update mailing list
- Now Daily Update prints a message if it times out.
- When a handler fails, a comment is inserted in the output instead of printing it to visible HTML.
- Miscellaneous bug fixes
- Miscellaneous bugs introduced. :) (I reworked a lot of the code.)

New in version 6.1:
- Miscellaneous bug fixes and improvements to GetText, GetLinks, and GetImages.
- Added -v (verbose output) support.
- Setup changed so -I isn't necessary when running as a CGI program.
- Message now displayed when Daily Update times out.
- Output file now automatically chmod'd to 755. (Executable in case you call server-side scripts like I do.)

New in version 6.02:
- Added return values for those of you who like to use makefiles to build web pages.
- Added documentation for MakeHandler.pl.
- Bug fixed in output columns function. (Thanks to Craig Brockmeier for finding it.)
- I've revamped the configuration and install procedure to make it more amenable to site-wide installation. DailyUpdate.cfg is looked for in ~/.DailyUpdate, and handlers are installed in the first directory specified by handlerlocations in DailyUpdate.cfg. (~/.DailyUpdate by default.) You'll need to fix up your handlerLocations variable if you're upgrading from a previous version. (Thanks to John Goerzen for his invaluable input.)
- Added support for proxies requiring passwords. (Thanks to Kevin D. Clark.)
- Fixed a bug in HTMLTools.pm. (Thanks to Mark Harburn.)

New in version 6.01:
- Added -r switch to force proxies to reload cached data during data acquisition. (Thanks to Gerhard Wiesinger)
- Fixed a couple of minor bugs.
- Improved installation to automatically set #! line and set INSTALLDIRECTORY in DailyUpdate.pl.
- Configuration doesn't have to be in the same directory as the DailyUpdate.pl file. (You can leave the DailyUpdate.cfg file in the install directory, and move the DailyUpdate.pl file wherever you want, like cgi-bin.)
- During automatic installation, handlers are now put in the install directory, not the directory of the currently running DailyUpdate.pl script.

New in version 6.0:
- Sorry about the big version jump. I had to do it because I switched over to the standard Perl numbering scheme. (5.2 becomes 5.02, which is less than 5.1). Maybe some of these features will justify the leap. :)
- Now the installation mimics the usual "perl Makefile.PL;make;make install" of perl modules.
- The script now supports multiple input and output files in the configuration file.
- Changed the syntax from to so that WYSIWYG editors won't croak on the 'unknown' tag.
- Moved HTML-related functionality from AcquisitionFunctions.pm to HTMLTools.pm, and added function StripTags.
- Fixed 2 bugs in MakeLinksAbsolute. (Thanks to Kazuo Moriwaka and Phillip Gersekowski)
- Enhanced Handler to work better in the face of bad data acquisition and filtering.
- Added OutputList to OutputFunctions.pm, which allows you to set the format and number of columns when printing out lists.

New in version 5.1:
- Update times are now set by the handlers by overriding GetUpdateTimes. (The default is 2,5,8,11,14,17,20,23.) Users can also customize the update times in the configuration file, if they don't like the ones given by the handlers. Modified MakeHandler.pl to support this.
- Fixed a caching bug that would cause (a) multiple copies of some data when a tag is used more than once on the same page, and (b) reuse of cached data even when the style has been changed.
- Added -a flag for automatic download of handlers.
- Added -n flag to check for new versions of handlers.
- Fixed prerequisites.
- Removed use of the deprecated HTML::Parse in AcquisitionFunctions.pm

New in version 5.0.2:
- Fixed data caching.
- MakeHandler.pl now stores the URL in the comment block.
- Created POD documentation in DailyUpdate.pl
- Fixed a problem with the loading of configuration information

New in version 5.0.1:
- Fixed a bug in MakeHandler.pl
- Improved error checking for missing Perl modules when installing new handlers.

New in version 5.0:
This is a major rewrite.
- Handlers, implemented as Perl classes, are now used instead of the schemas of previous versions. The GetHtml, GetText, OutputTwoColumn, OutputUnorderedList, etc. functions are now part of two APIs that can be used by handler writers: DailyUpdate::AcquisitionFunctions, and DailyUpdate::OutputFunctions.
- No handlers are distributed with Daily Update.
Instead, the user is prompted for permission to install them as necessary. This makes using a new tag easy -- just put it in your input file, run Daily Update once manually, and tell it to download the handler.
- The handlers are real Perl, instead of the pseudo Perl used in the schemas. This increases the difficulty of adding new tags, but gives people a lot more flexibility. To help things, I've included a script "MakeHandler.pl" which will create a prototype handler based on user inputs.
- GetText now uses HTML::FormatText.
- Tags are now of the syntax

New in version 4.6:
- Added notification support for new versions.
- Added -c flag for user-specified configuration file. Dropped use of POSIX (for NT compatibility.)
- Fixed a bug in relative to absolute link conversion. (Thanks to C. Harald Koch)
- URLs can now be of the format "file://...".
- Added GetImages acquisition function (thanks to Tanner Lovelace).
- Now the %attributes hash can be referenced in the URL or tag name of a schema--added tag to illustrate its use.
- Made changes to the config file to support better schema design.
- Added "How to Write Schemas" page at http://www.cs.virginia.edu/~dwc3q/code/writeschemas.html.
- Moved to URI from URI::URL, which is now deprecated.

New in version 4.5:
- Added acquisition function GetHtml.
- Added -i and -o flags for input and output files.
- Now reads old html file only once (faster).
- Added "always" schema time option.
- Now checks script directory for configuration file.
- Miscellaneous bug fixes, new data sources, and minor enhancements.

New in version 4.4:
- Separated configuration from main script (yea!).
- Fixed "uninitialized variable" warnings.
- Created user-submitted schemas webpage at http://www.cs.virginia.edu/~dwc3q/code/schemas.html.
- Added more data acquisition schemas.

New in version 4.3:
- Added Infoworld, and Adam@Home.
- Changed date & time handlers to take any valid strftime format string, and made url attribute to weather mandatory.
- Changed CoolSite to use their logo.
- Script only dumps to screen if called as a cgi script.
- Added default values to date, time, and OutputListOrColumns.

New in version 4.2:
- Added "linuxtoday", and "time".
- Modified GetText & GetLinks to take "^" and "\$" signifying start and end of file.
- Added proxy support.
- Fork is now disabled on Windows platforms.
- Moved from HTML::Parse to HTML::Parser.

New in version 4.1:
- Added "User Friendly" and "Freshmeat" support.
- Fixed Slashdot so that it takes a style.
- Changed it to use LWP instead of my own HTTP sockets stuff.

New in version 4.0:
- This is a total rewrite, resulting in a modular, extensible script that is half the size of the previous version.

New in version 3.5.4:
- Added Peanuts (Thanks to Alvaro Herrera)
- Also fixed the news from the AP.

New in version 3.5.2:
- Fixed a bug in the routine to detect if we need to update the data.

New in version 3.5.1:
- Improved calculation of when cached data should be used. Hopefully it's less buggy.

New in version 3.5:
- Added support for Slashdot. Did a major cleanup of the code.

New in version 3.4.6:
- Fixed script to work with new Calvin and Hobbes webpage.
- Improved algorithms for changing relative URLs to absolute in headlines.

New in version 3.4.5:
- More enhanced parsing of weather.

New in version 3.4.4:
- Fixed Dr. Science and Sports.
- Enhanced parsing of weather.

New in version 3.4.2:
- Fixed a bug in the weather processing code.

New in version 3.4.1:
- Fixed the extra line in the headlines

New in version 3.4:
- Updated DailyUpdate to work with the new CNN/SI homepage.

New in version 3.3.3:
- Updated DailyUpdate to work with the new Yahoo headlines format

New in version 3.3.1:
- Fixed dilbert to handle absolute & relative URLs for the image.

New in version 3.3.1:
- Fixed dilbert to work with the new Dilbert Zone format

New in version 3.3:
- Added Calvin and Hobbes
- Fixed a bug that caused nothing to be output for the first use of the tags.

New in version 3.2.1:
- Finally fixed the timeouts to work correctly.

New in version 3.2:
- Thanks to Christian Blair for helping to make the script more robust.

New in version 2.9:
- Thanks to Bob Anzlovar for the addition of the Dr. Science question of the day.
- He also changed the script to "use Socket" instead of "require socket.ph". I agree with his thinking on this, however users of older versions of Perl will have to do some comment work on the geturl procedure. In a couple of versions I'm going to stop backward support, since Perl 5 is the way to go...

New in version 2.8:
- In this version I've commented out the telnet for weather and replaced it with an html grab, which seems much faster. For those people who need the weather thrice daily (and have a fast connection) or who can't get the forecast they need from http://cirrus.sprl.umich.edu/wxnet/states/states.html, put the telnet command in line 173 back in and remove the wwwgrab line. Then comment out the weather site and remove the comments for a telnet weathersite.