#
# $Id: ASCX12.pm,v 1.17 2004/09/28 14:59:34 brian.kaney Exp $
#
# XML::ASCX12
#
# Copyright (c) Vermonster LLC
#
# This library is free software; you can redistribute it and/or
# modify it under the terms of the GNU Lesser General Public
# License as published by the Free Software Foundation; either
# version 2.1 of the License, or (at your option) any later version.
#
# This library is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
# Lesser General Public License for more details.
#
# For questions, comments, contributions and/or commercial support
# please contact:
#
#    Vermonster LLC
#    312 Stuart St.  2nd Floor
#    Boston, MA 02116  US
#
# vim: set expandtab tabstop=4 shiftwidth=4
#

=head1 NAME

XML::ASCX12 - ASCX12 EDI to XML Module

=head1 SYNOPSIS

    use XML::ASCX12;

    my $ascx12 = new XML::ASCX12();
    $ascx12->convertfile("/path/to/edi_input", "/path/to/xml_output");

=head1 INFORMATION

=head2 Module Description

XML::ASCX12 started as a project to process X12 EDI files from shipping
vendors (i.e. transaction sets 110, 820 and 997). However, this module can
be extended to support any valid transaction set (catalog).

=head2 Why are you doing this?

If you've ever taken a look at an ASCX12 document you'll see why. The EDI
format is very compact, which makes it great for transmission. However, this
compactness comes at a cost.

The main challenge when dealing with EDI data is parsing through the
structure. Here we find loops within loops within loops. In this
non-extensible, flat format, human parsing is nearly impossible and machine
parsing is a chore at best.

A quick background of how a typical EDI is formed:

    +----> ISA - Interchange Control Header
    |          GS - Functional Group Header       <--------+
    |              ST - Transaction Set Header             |
 Envelope          [transaction set specific]        Functional Group
    |              SE - Transaction Set Trailer            |
    |          GE - Functional Group Trailer      <--------+
    +----> IEA - Interchange Control Trailer

The Transmission Envelope can have one or more Functional Groups. A
Functional Group can have one or more Transaction Sets. Each specific
catalog (transaction set) can then have its own hierarchical rules.

This sort of structure really lends itself to XML. So, using the power of
Perl, this module was created to make accessing EDI information easier.

To learn more, see the website of the official ASC X12 group.

=head2 Module Limitations

This is a new module and has a few limitations.

=over 4

=item * EDI -> XML

This module converts from EDI to XML. If you want to go in the other
direction, we suggest creating an XSL stylesheet and using an XSLT processor
to perform the transformation.

=item * Adding Transaction Sets

Adding new catalogs is a manual process. The L<XML::ASCX12::Catalogs> and
L<XML::ASCX12::Segments> modules need to be manually updated. A future
development effort could store this information in dbm files with an import
script if demand exists.

=back

=head2 Style Guide

You will (hopefully) find a consistent coding style throughout this module.
Any private variable or method is prefixed with an underscore (C<_>). Any
static method or variable is named in C<ALLCAPS>.

The tabs are set at 4 spaces and the POD is kept physically close to the
code it describes, so that the documentation stays up to date.

=cut

package XML::ASCX12;

use 5.008004;
use strict;
use warnings;
no warnings 'utf8';
use bytes;

our $VERSION = '0.03';

=head1 REQUIREMENTS

We use the L<Carp> module to handle errors. Some day there may be a better
error handler and maybe an error object to reference, but for now the module
croaks when there is a problem.

The L<XML::ASCX12::Segments> module is required and is probably part of this
package, as is the L<XML::ASCX12::Catalogs> module.


=cut

use Carp qw(croak);
use XML::ASCX12::Catalogs qw($LOOPNEST load_catalog);
use XML::ASCX12::Segments qw($SEGMENTS $ELEMENTS);

=head1 VARIABLES AND METHODS

=head2 Private Variables

These variables are not exported and are not intended to be accessed
externally. They are listed here for documentation purposes only.

=over 4

=item C<@_LOOPS>

Dynamic; keeps track of which loop we are in.

=item C<%_XMLREP>

Static variable used to look up replacements for characters that are not
allowed in XML.

=item C<$_XMLHEAD>

Static variable containing the XML header for the output.

=back

=cut

use vars qw(@_LOOPS %_XMLREP $_XMLHEAD);

# Characters that must be escaped in XML, mapped to their entities
%_XMLREP = (
        '&' => '&amp;'
       ,'<' => '&lt;'
       ,'>' => '&gt;'
       ,'"' => '&quot;'
    );
$_XMLHEAD = qq||;

=head2 Public Methods

=over 4

=item object = new([$segment_terminator], [$data_element_separator], [$subelement_separator])

The new method is the OO constructor. The default segment terminator is
C<85> hex, the default data element separator is C<1D> hex and the default
sub-element separator is C<1F> hex.

    my $xmlrpc = new XML::ASCX12();

The defaults can be overridden by passing them into the constructor. Pass
each value as a single-quoted hex escape such as C<'\x0D'>: the module uses
the string both as a regular expression and to derive the terminator byte.

    my $xmlrpc = new XML::ASCX12('\x0D', '\x2A', '\x3A');

The object that is returned is ready to transform EDI files.

=cut

sub new {
    my ($name, $st, $des, $sbs) = @_;

    # Single quotes are deliberate: the escapes are kept as text
    $st  = '\x85' unless $st;
    $des = '\x1D' unless $des;
    $sbs = '\x1F' unless $sbs;

    my $class = ref($name) || $name;
    my $self = { ST=>$st, DES=>$des, SBS=>$sbs };
    bless ($self, $class);
    return $self;
}

=item boolean = $obj->convertfile($input, $output)

This method will transform an EDI file to XML using the configuration
information passed in to the constructor.


    my $xmlrpc = new XML::ASCX12();
    $xmlrpc->convertfile('/path/to/EDI.dat', '/path/to/EDI.xml');

You may also pass filehandles (or references to filehandles):

    $xmlrpc->convertfile(\*INFILE, \*OUTFILE);

=cut

sub convertfile {
    my ($self, $in, $out) = @_;
    my ($inhandle, $outhandle);
    my ($bisinfile, $bisoutfile);

    $self->_unload_catalog();

    # Use the output handle as given, or open the named file ourselves
    if (ref($out) eq "GLOB" or ref(\$out) eq "GLOB" or ref($out) eq 'FileHandle' or ref($out) eq 'IO::Handle') {
        $outhandle = $out;
    }
    else {
        local(*XMLOUT);
        open (XMLOUT, '>', $out) || croak "Cannot open file \"$out\" for writing: $!";
        $outhandle = *XMLOUT;
        $bisoutfile = 1;
    }

    my $st_check = 0;
    my $des_check = 0;
    print {$outhandle} $XML::ASCX12::_XMLHEAD;
    {
        # Use the input handle as given, or open the named file ourselves
        if (ref($in) eq "GLOB" or ref(\$in) eq "GLOB" or ref($in) eq 'FileHandle' or ref($in) eq 'IO::Handle') {
            $inhandle = $in;
        }
        else {
            local(*EDIIN);
            open (EDIIN, '<', $in) || croak "Cannot open file \"$in\" for reading: $!";
            $inhandle = *EDIIN;
            $bisinfile = 1;
        }
        binmode($inhandle);

        # Turn the '\xNN' escape into the byte used as input record separator
        (my $eos = $self->{ST}) =~ s/^\\/0/;
        local $/ = pack("C*", oct($eos));

        # Looping per-segment for processing
        while (<$inhandle>) {
            if (!$st_check)  { $st_check = 1 if m/$self->{ST}/; }
            if (!$des_check) { $des_check = 1 if m/$self->{DES}/; }
            chomp;
            print {$outhandle} $self->_proc_segment($_);
        }

        # This is done to close any open loops
        # XXX Is there a better way to "run one more time"?
        print {$outhandle} $self->_proc_segment('');
    }
    print {$outhandle} '';

    (close($inhandle)  || croak "Cannot close input file \"$in\": $!") if $bisinfile;
    (close($outhandle) || croak "Cannot close output file \"$out\": $!") if $bisoutfile;

    croak "EDI Parsing Error: Segment Terminator \"$self->{ST}\" not found" unless $st_check;
    croak "EDI Parsing Error: Data Element Separator \"$self->{DES}\" not found" unless $des_check;
    return 1;
}

=item string = $obj->convertdata($input)

This method will transform an EDI data stream, returning well-formed XML.


    my $xmlrpc = new XML::ASCX12();
    my $xml = $xmlrpc->convertdata($binary_edi_data);

=cut

sub convertdata {
    my ($self, $in) = @_;

    croak "EDI Parsing Error: Segment Terminator \"$self->{ST}\" not found" unless ($in =~ m/$self->{ST}/);
    croak "EDI Parsing Error: Data Element Separator \"$self->{DES}\" not found" unless ($in =~ m/$self->{DES}/);

    my $out = $XML::ASCX12::_XMLHEAD;

    # Turn the '\xNN' escape into the byte to split segments on
    (my $eos = $self->{ST}) =~ s/^\\/0/;
    my @data = split(pack("C*", oct($eos)), $in);
    foreach (@data) {
        $out .= $self->_proc_segment($_);
    }

    # One extra call with an empty segment closes any open loops
    $out .= $self->_proc_segment('');
    return $out;
}

=item string = XMLENC($string)

Static public method used to encode and return data suitable for use as
ASCII XML character data.

    $xml_ready_string = XML::ASCX12::XMLENC($raw_data);

=cut

sub XMLENC {
    my $str = $_[0];
    if ($str) {
        $str =~ s/([&<>"])/$_XMLREP{$1}/g;    # replace any &<>" characters
        $str =~ s/[\x80-\xff]//g;             # get rid of any non-ASCII characters
        $str =~ s/[\x01-\x1f]//g;             # get rid of any control characters
    }
    return $str;
}

=back

=head2 Private Methods

=over 4

=item string = _proc_segment($segment_data);

This is an internal private method that processes a segment. It is called by
C<convertfile()> and C<convertdata()> while looping per-segment.

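
As a rough illustration of the per-segment step, here is a standalone sketch
(not the module's own code). It assumes C<*> as the data element separator,
and C<split_segment> is a hypothetical helper that does not exist in
XML::ASCX12. It splits a segment into its code and its positional element
names, following the two-digit naming used by C<_proc_element()>.

```perl
    use strict;
    use warnings;

    # Hypothetical helper (not part of XML::ASCX12): split one segment into
    # its segment code and a list of [element_name, value] pairs.
    sub split_segment {
        my ($segment, $des) = @_;

        # \Q...\E quotes the separator so characters like '*' stay literal
        my ($segcode, @elements) = split(/\Q$des\E/, $segment);

        my @pairs;
        my $i = 1;
        foreach my $value (@elements) {
            # Element names are the segment code plus a two-digit position;
            # empty elements are skipped but still count as a position
            push @pairs, [sprintf('%s%02d', $segcode, $i), $value]
                if $value =~ /[0-9A-Za-z]/;
            $i++;
        }
        return ($segcode, @pairs);
    }

    my ($code, @pairs) = split_segment('GS*FA*SENDER*RECEIVER', '*');
    print "$code: ", join(', ', map { "$_->[0]=$_->[1]" } @pairs), "\n";
    # prints "GS: GS01=FA, GS02=SENDER, GS03=RECEIVER"
```

Unlike this sketch, the module interpolates the separator into the regular
expression unquoted, which is why its separators are given as hex escapes.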

=cut

sub _proc_segment {
    my ($self, $segment) = @_;
    $segment =~ s/\n//g;
    if ($segment =~ m/[0-9A-Za-z]*/) {
        my ($segcode, @elements) = split(/$self->{DES}/, $segment);

        # A new transaction set starts: load the matching catalog
        if ($segcode and $segcode eq "ST") {
            $self->_unload_catalog();
            $self->load_catalog($elements[0]);
        }

        # check to see if we need to close a loop
        # (declared separately: "my ... if" has unreliable behaviour in Perl)
        my $curloop;
        $curloop = $XML::ASCX12::Segments::SEGMENTS->{$segcode}[3] if $segcode;
        my $xml = '';
        if (my $tmp = $self->_closeloop($curloop, $self->{lastloop}, $segcode)) { $xml .= $tmp; }

        if (@elements) {
            # check to see if we need to open a loop
            if (my $tmp = $self->_openloop($curloop, $self->{lastloop})) { $xml .= $tmp; }

            # now the standard segment (and elements)
            $xml .= '{$segcode}[0]).'"' if $XML::ASCX12::Segments::SEGMENTS->{$segcode};
            $xml .= '>';

            # make our elements
            $xml .= $self->_proc_element($segcode, @elements);

            # close the segment
            $xml .= '';

            # keep track
            $self->{lastloop} = $curloop;
        }
        return $xml;
    }
}

=item string = _proc_element($segment_code, @elements)

This is a private method called by C<_proc_segment()>. Each segment consists
of elements; this is where they are processed.

=cut

sub _proc_element {
    my ($self, $segcode, @elements) = @_;
    my $i = 1;
    my $xml = '';
    foreach (@elements) {
        if ($_ =~ /[0-9A-Za-z]/) {
            # Element name is the segment code plus a two-digit position
            my $elename;
            $elename = $segcode.$i if $i >= 10;
            $elename = $segcode.'0'.$i if $i < 10;
            $xml .= '{$elename}[0]).'"' if $XML::ASCX12::Segments::ELEMENTS->{$elename};
            $xml .= '>'.XML::ASCX12::XMLENC($_).'';
        }
        $i++;
    }
    return $xml;
}

=item string = _openloop($loop_to_open, $last_opened_loop)

This is an internal private method. It opens the loop if the nesting rules
allow it; otherwise it returns nothing.

=cut

sub _openloop {
    my ($self, $newloop, $lastloop) = @_;
    if (XML::ASCX12::_CANHAVE($lastloop, $newloop)) {
        push (@_LOOPS, $newloop);
        return '';
    }
    return;
}

=item string = _closeloop($loop_to_close, $last_opened_loop, $current_segment, $once)

This routine is a private method. It recurses to close any open loops and
returns the XML for the loops it closed, or nothing if none were closed.

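
The closing behaviour can be pictured in isolation. The snippet below is not
the module's implementation: it is an iterative sketch that assumes open
loops are tracked on a simple stack, and C<close_until> is a hypothetical
helper that pops and returns every loop opened after a given target loop.

```perl
    use strict;
    use warnings;

    # Hypothetical helper: pop loops off the stack until $target is on top,
    # returning the names of the loops that were closed, innermost first.
    sub close_until {
        my ($stack, $target) = @_;
        my @closed;
        while (@$stack and $stack->[-1] ne $target) {
            push @closed, pop @$stack;
        }
        return @closed;
    }

    my @open = qw(ISA GS ST N1);
    my @closed = close_until(\@open, 'GS');
    print "closed: @closed; still open: @open\n";
    # prints "closed: N1 ST; still open: ISA GS"
```

Closing innermost loops first is what keeps the resulting XML properly
nested.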

=cut

sub _closeloop {
    my ($self, $newloop, $lastloop, $currentseg, $once) = @_;
    $once = 0 unless $once;
    my $xml;

    # Case when there are two consecutive loops
    if ($newloop and $lastloop and $currentseg eq $lastloop and ($currentseg ne "")) {
        $xml = $self->_execclose($lastloop);
        return $xml;
    }
    # "Standard Case"
    elsif (XML::ASCX12::_CANHAVE($newloop, $lastloop)) {
        $xml = $self->_execclose($lastloop);
        return $xml;
    }
    # Recursively close loops
    else {
        my @parent_loops_to_close = ();
        if (@_LOOPS) {
            foreach my $testloop (reverse @_LOOPS) {    # Close in reverse order
                # found a loop, see which ones we ought to close
                if ($testloop eq $newloop) {
                    if (@parent_loops_to_close) {
                        foreach my $closeme (@parent_loops_to_close) {
                            $xml .= $self->_execclose($closeme) if $closeme;
                        }
                        # See if the current loop ought to be closed
                        if ($once != 1) {
                            if (my $tmp = $self->_closeloop($newloop, $self->{lastloop}, $currentseg, 1)) {
                                $xml .= $tmp;
                            }
                        }
                        return $xml;
                    }
                }
                # Push into the loops to close
                else {
                    if ($testloop) {
                        push (@parent_loops_to_close, $testloop);
                    }
                }
            }
        }
    }
    return;
}

=item string = _execclose($loop_to_close)

Private internal method that returns the XML signifying a closed loop. It is
called by C<_closeloop()>.

=cut

sub _execclose {
    my ($self, $loop) = @_;
    return unless $loop;
    if ($loop =~ /[A-Za-z0-9]*/) {
        pop @_LOOPS;
        $self->{lastloop} = $_LOOPS[-1];
        return '' if XML::ASCX12::XMLENC($loop);
    }
}

=item void = _unload_catalog()

Private method that clears out catalog data and loads the standard ASCX12
structure.

=cut

sub _unload_catalog {
    my $self = shift;
    $XML::ASCX12::Catalogs::LOOPNEST = ();
    $self->load_catalog(0);
}

=item boolean = _CANHAVE($parent_loop, $child_loop)

This is a private static method. It uses the rules in
L<XML::ASCX12::Catalogs> to determine whether a parent loop is allowed to
contain the child loop. Returns C<0> or C<1>.

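
The lookup can be pictured with a small standalone sketch. The nesting table
below is made up for illustration (the real rules live in C<$LOOPNEST> in
the catalogs module), and C<can_have> is a hypothetical stand-in for
C<_CANHAVE()>, not the method itself.

```perl
    use strict;
    use warnings;

    # Illustrative nesting table: parent loop => list of allowed child loops
    my %nest = (
        ISA => ['GS'],
        GS  => ['ST'],
    );

    sub can_have {
        my ($parent, $child) = @_;
        return 1 unless $parent;    # root level can have anything
        return 0 unless $child;
        # Count matching entries in the parent's list of allowed children
        return (grep { $_ eq $child } @{ $nest{$parent} || [] }) ? 1 : 0;
    }

    print can_have('ISA', 'GS'), can_have('GS', 'ISA'), can_have(undef, 'ST'), "\n";
    # prints "101"
```

A parent that has no entry in the table simply allows no children, which
matches the fall-through C<return 0> in the method.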

=cut

sub _CANHAVE {
    my ($parent, $child) = @_;
    if (!$parent) { return 1; }    # root-level can have anything
    return 0 unless $child;
    foreach (@{$XML::ASCX12::Catalogs::LOOPNEST->{$parent}}) {
        if ($_ eq $child) { return 1; }
    }
    return 0;
}

=back

=head1 TODO

Here are some things that would make this module even better. They are in no
particular order:

=over 4

=item * Error Handling

Maybe throw in an error object to keep track of things.

=item * Encoding Support

It would help to have someone review the code to make sure we are using the
correct encodings. We basically read in the EDI file in binary and use the
ASCII hex equivalents for the separators. Many EDI-producing systems use
EBCDIC and not UTF-8, so be careful when specifying the values.

=item * Transaction Set (Catalog) Library

Make a live repository of transaction set data (catalogs). I'd really like
to use XML to describe each catalog and import them to local dbm files or
tied hashes during install and via an update script.

This project will go ahead if there is adequate demand. According to the
ASC X12 website, there are 315 transaction sets. This module has 3, so there
are 312 that could be added.

=item * XML Documentation

Create a DTD and maybe even an XML Schema for the XML output. There ought to
be better documentation here.

=back

=head1 AUTHORS

Brian Kaney, Jay Powers

Copyright (c) 2004 Vermonster LLC. All rights reserved.

This library is free software. You can redistribute it and/or modify it
under the terms of the GNU Lesser General Public License as published by the
Free Software Foundation; either version 2 of the License, or (at your
option) any later version.

Basically you may use this library in commercial or non-commercial
applications. However, if you make any changes directly to any files in this
library, you are obligated to submit your modifications back to the authors
and/or copyright holder. If the modification is suitable, it will be added
to the library and released back to the public.
This way we can all benefit from each other's hard work!

If you have any questions, comments or suggestions, please contact the
authors.

=head1 SEE ALSO

L<Carp>, L<XML::ASCX12::Segments> and L<XML::ASCX12::Catalogs>

=cut

1;