pp_addhdr(' #include "ricecomp.c" '); pp_addpm({At=>'Top'},<<'EOD'); =head1 NAME PDL::Compression - compression utilities =head1 DESCRIPTION These routines generally accept some data as a PDL and compress it into a smaller PDL. Algorithms typically work on a single dimension and thread over other dimensions, producing a threaded table of compressed values if more than one dimension is fed in. The Rice algorithm, in particular, is designed to be identical to the RICE_1 algorithm used in internal FITS-file compression (see PDL::IO::FITS). =head1 SYNOPSIS use PDL::Compression ($b,$asize) = $a->rice_compress(); $c = $b->rice_expand($asize); =cut EOD pp_addpm({At=>'Bot'},<<'EOD'); =head1 AUTHORS Copyright (C) 2010 Craig DeForest. All rights reserved. There is no warranty. You are allowed to redistribute this software / documentation under certain conditions. For details, see the file COPYING in the PDL distribution. If this file is separated from the PDL distribution, the copyright notice should be included in the file. The Rice compression library is derived from the similar library in the CFITSIO 3.24 release, and is licensed under yet more more lenient terms than PDL itself; that notice is present in the file "ricecomp.c". =head1 BUGS =over 3 =item * Currently headers are ignored. =item * Currently there is only one compression algorithm. =back =head1 TODO =over 3 =item * Add object encapsulation =item * Add test suite =back =cut EOD pp_addpm(<<'EOD'); =head1 METHODS =cut EOD pp_def( "rice_compress", HandleBad => 0, Pars => 'in(n); [o]out(m); int[o]len(); lbuf(n)', OtherPars => "int blocksize", # in OtherPars to avoid autopromotion GenericTypes =>['B','S','US','L'], Doc => <<'EOD', =for ref Squishes an input PDL along the 0 dimension by Rice compression. In scalar context, you get back only the compressed PDL; in list context, you also get back ancillary information that is required to uncompress the data with rice_uncompress. Multidimensional data are threaded over - each row is compressed separately, and the returned PDL is squished to the maximum compressed size of any row. If any of the streams could not be compressed (the algorithm produced longer output), the corresponding length is set to -1 and the row is treated as if it had length 0. Rice compression only works on integer data types -- if you have floating point data you must first quantize them. The underlying algorithm is identical to the Rice compressor used in CFITSIO (and is used by PDL::IO::FITS to load and save compressed FITS images). The optional blocksize indicates how many samples are to be compressed as a unit; it defaults to 32. How it works: Rice compression is a subset of Golomb compression, and works on data sets where variation between adjacent samples is typically small compared to the dynamic range of each sample. In this implementation (originally written by Richard White and contributed to CFITSIO in 1999), the data are divided into blocks of samples (by default 32 samples per block). Each block has a running difference applied, and the difference is bit-folded to make it positive definite. High order bits of the difference stream are discarded, and replaced with a unary representation; low order bits are preserved. Unary representation is very efficient for small numbers, but large jumps could give rise to ludicrously large bins in a plain Golomb code; such large jumps ("high entropy" samples) are simply recorded directly in the output stream. Working on astronomical or solar image data, typical compression ratios of 2-3 are achieved. =for usage $out = $pdl->rice_compress($blocksize); ($out, $len, $blocksize, $dim0) = $pdl->rice_compress; $new = $out->rice_expand; =cut EOD PMCode => <<'EOD', sub PDL::rice_compress { my $in = shift; my $blocksize = shift || 32; ## Reject floating-point inputs if( $in->type != byte && $in->type != short && $in->type != ushort && $in->type != long ) { die("rice_compress: input needs to have type byte, short, ushort, or long, not ".($in->type)."\n"); } # output buffer starts the same size; truncate at the end. my ($out) = zeroes($in); # line buffer is here to make sure we don't get fouled up by transpositions my ($lbuf) = zeroes($in->type, $in->dim(0)); # lengths go here my ($len) = zeroes(long, $in->slice("(0)")->dims); &PDL::_rice_compress_int( $in, $out, $len, $lbuf, $blocksize ); $l = $len->max; $out = $out->slice("0:".($l-1))->sever; if(wantarray) { return ($out, $in->dim(0), $blocksize, $len); } else { return $out; } } EOD Code => <<'EOD', // Copy current row into the row buffer (to ensure // contiguous location in memory) loop(n) %{ $lbuf() = $in(); %} $len() = 1 + ( rcomp( &($lbuf(n=>0)), sizeof($lbuf(n=>0)), $SIZE(n), (unsigned char *)(&($out(m=>0))), $SIZE(m) * sizeof($out(m=>0)), $COMP(blocksize) ) - 1 ) / sizeof($out(m=>0)); EOD ); pp_def( "rice_expand", HandleBad=>0, Pars=>'in(n); [o]out(m); lbuf(n)', OtherPars=>'int blocksize', GenericTypes=>['B','S','US','L'], Doc=><<'EOD', =for ref Unsquishes a PDL that has been squished by rice_expand. =for usage ($out, $len, $blocksize, $dim0) = $pdl->rice_compress; $copy = $out->rice_expand($dim0, $blocksize); =cut EOD PMCode => <<'EOD', sub PDL::rice_expand { my $squished = shift; my $dim0 =shift; my $blocksize = shift || 32; # Allocate output array my $out = zeroes( $squished->slice("(0),*$dim0") ); # Allocate row buffer to avoid weird memory edge case my $lbuf = zeroes($squished->type, $squished->dim(0)); &PDL::_rice_expand_int( $squished, $out, $lbuf, $blocksize ); return $out; } EOD Code=><<'EOD', loop(n) %{ $lbuf() = $in(); %} rdecomp( (unsigned char *)(&($lbuf(n=>0))), $SIZE(n) * sizeof($lbuf(n=>0)), &($out(m=>0)), sizeof($out(m=>0)), $SIZE(m), $COMP(blocksize) ); EOD ); pp_done();