=for comment This file was automatically generated from src/xsh_grammar.xml on Wed Sep 10 17:53:04 2003

=head1 NAME

XSH - scripting language for XPath-based editing of XML

=head1 FILES/DOCUMENTS

XSH is intended to query and manipulate XML and HTML documents. Use one of the B<open/open-*/create> commands to load an XML or HTML document from a local file, external URL (such as http:// or ftp://), string or pipe. While loading, XSH parses and optionally validates (see B and B) the document. Parsed documents are stored in memory as DOM trees, which can be B and B quite similarly to a local filesystem. Every opened document is associated with an identifier (B), which is a symbolic name for the document in XSH and can be used, for example, as a prefix of B.

In the current version, XSH is only able to save documents locally. To store a document in any other location, use the B<ls> command and pipe redirection to feed the XML representation of the document to any external program that is able to store it in a remote location.

Example: Store XSH document DOC on a remote machine using Secure Shell

  xsh> ls DOC:/ | ssh my.remote.org 'cat > test.xml'

=head2 RELATED COMMANDS

backups, catalog, clone, close, create, documents, nobackups, open, process-xinclude, save, select, stream, switch-to-new-documents

=head1 TREE NAVIGATION

With XSH, it is possible to browse B as if they were a local filesystem, except that B expressions are used instead of ordinary UNIX paths.

The current position in the document tree is called the current node. The current node's XPath may be queried with the B<pwd> command. In the interactive shell, the current node is also displayed in the command line prompt. Remember that besides the B<cd> command, the current node (and document) is silently changed by all variants of the B command, the B command and temporarily also by the node-list variant of the B statement. Documents are specified in a similar way as hard drives on DOS/Windows(TM) systems (except that their names are not limited to one letter in XSH), i.e.
by a prefix of the form doc: where doc is the B associated with the document.

To mimic filesystem navigation as closely as possible, XSH contains several commands named by analogy with UNIX filesystem commands, such as B<cd>, B<ls> and B<pwd>.

Example:

  xsh scratch:/> open docA="testA.xml"
  xsh docB:/> open docB="testB.xml"
  xsh> pwd
  docB:/
  xsh docB:/> cd docA:/article/chapter[title='Conclusion']
  xsh docA:/article/chapter[5]> pwd
  docA:/article/chapter[5]
  xsh docA:/article/chapter[5]> cd previous-sibling::chapter
  xsh docA:/article/chapter[4]> cd ..
  xsh docA:/article> select docB
  xsh docB:/>

=head2 RELATED COMMANDS

cd, fold, locate, ls, pwd, register-function, register-namespace, register-xhtml-namespace, register-xsh-namespace, select, unfold, unregister-function, unregister-namespace

=head1 TREE MODIFICATION

XSH provides mechanisms not only to browse and inspect the DOM tree but also to modify its content, with commands for copying, moving, and deleting its nodes as well as adding completely new nodes or XML fragments to it. It is quite easy to learn these commands since their names or aliases mimic their well-known filesystem analogies. On the other hand, many of these commands have two versions, one of which is prefixed with the letter "x". This "x" stands for "cross", thus e.g. B<xcopy> should be read as "cross copy". Let's explain the difference using the example of B<copy>. When you copy, you have to specify what you are copying and where you are copying it to, so you have to specify the source and the target. XSH is very much XPath-based, so XPath is used here to specify both of them. However, there might be more than one node that satisfies an XPath expression.
So, the rule of thumb is that the "cross" variant of a command places each of the source nodes at the location of each destination node, while the plain variant works one-by-one, placing the first source node at the first destination, the second source node at the second destination, and so on (as long as there are both source nodes and destinations left).

Example:

  xsh> create a "";
  xsh> create b "";
  xsh> xcopy a://A replace b://B;
  xsh> copy b://C before a://A;
  xsh> ls a:/;
  xsh> ls b:/;

As already indicated by the example, another issue of tree modification is the way in which the destination node determines the target location. Should the source node be placed before, after, or into the resulting node? Should it replace it completely? This information has to be given in the B argument that usually precedes the destination XPath.

Now, what happens if source and destination nodes are of incompatible types? XSH tries to avoid this by implicitly converting between node types when necessary. For example, if a text, comment, or attribute node is copied into, before or after an attribute node, the original value of the attribute is replaced, prepended or appended respectively with the textual content of the source node. Note, however, that element nodes are never converted into text, attribute or any other textual node. There are many combinations here, so try them yourself and see the results.

You may even use a more sophisticated way to convert between node types, as shown in the following example, where an element is first commented out and then uncommented again. Note that the particular approach used for resurrecting the commented XML material works only for well-balanced chunks of XML.

Example: Using string variables to convert between different types of nodes

  create doc < Intro Rest EOF
  # comment out the first chapter
  ls //chapter[1] |> $chapter_xml;
  add comment $chapter_xml replace //chapter[1];
  ls / 0;
  # OUTPUT: Rest
  # un-comment the chapter
  $comment = string(//comment()[1]);
  add chunk $comment replace //comment()[1];
  ls / 0;
  # OUTPUT: Intro Rest

=head2 RELATED COMMANDS

clone, copy, insert, map, move, normalize, process-xinclude, remove, rename, set-enc, set-standalone, strip-whitespace, xcopy, xinsert, xmove, xslt, xupdate

=head1 FLOW CONTROL

What kind of scripting language would XSH be if it had no conditional statements, loops and other constructs that influence the way in which XSH commands are processed? The most notable XSH feature in this area is that some of the basic flow control statements, namely B, B, B and B, have two variants, an XPath-based one and a Perl-based one. The XPath-based variant uses B expressions to specify the condition or node-lists to iterate, while the other one utilizes B for this purpose. See descriptions of the individual statements for more detail.

=head2 RELATED COMMANDS

call, def, exit, foreach, if, ifinclude, include, iterate, last, next, prev, redo, return, run-mode, stream, test-mode, throw, try, undef, unless, while

=head1 RETRIEVING MORE INFORMATION

Besides the possibility to browse the DOM tree and list some parts of it (as described in B), XSH provides commands to obtain other information related to open documents as well as the XSH interpreter itself. These commands are listed below.

=head2 RELATED COMMANDS

count, defs, doc-info, documents, dtd, enc, help, locate, ls, namespaces, options, print, pwd, valid, validate, variables, version

=head1 ARGUMENT TYPES

XSH commands accept different types of arguments, such as usual strings (B) or B. Notably, these two types and types based on them support string variable interpolation. See documentation of the individual types for more information.
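
The idea of string variable interpolation can be pictured with a small Python sketch. It is only an illustration, not XSH's actual implementation: the function name C<interpolate> and the choice to leave unknown variables untouched are assumptions made for this example. It substitutes C<$name> and C<${name}> occurrences in an argument before the argument would be evaluated.

```python
import re

# Matches ${name} (group 1) or $name (group 2).
VARIABLE = re.compile(r"\$(?:\{(\w+)\}|(\w+))")


def interpolate(expression, variables):
    """Return `expression` with $name and ${name} replaced by their values.

    `variables` is a plain dict standing in for the interpreter's string
    variables (a hypothetical stand-in, not an XSH data structure).
    """
    def substitute(match):
        name = match.group(1) or match.group(2)
        # Assumption of this sketch: unknown variables are left as written.
        return str(variables.get(name, match.group(0)))

    return VARIABLE.sub(substitute, expression)


variables = {"b": "chapter"}
print(interpolate("${b}s.xml", variables))
print(interpolate("//$b[count(descendant::para)>10]", variables))
```

The braces in C<${b}s.xml> matter: without them the name would be read as C<bs>, which is why both forms are handled.
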

=head1 VARIABLES

In the current version, XSH supports two types of variables: string (scalar) variables and node-list variables. Perl programmers who miss other kinds of variables (arrays or hashes) may use the support for B to access these types (see some examples below).

These two kinds of variables differ syntactically in the prefix: string variables are prefixed with a dollar sign (B<$>) while node-list variables are prefixed with a percent sign (B<%>).

=head2 String Variables

Every string variable name consists of a dollar sign (B<$>) prefix and an B, that has to be unique among other scalar variables, e.g. B<$variable>. Values are assigned to variables either by simple B of the form B<$variable = B> or by capturing the output of some command with a variable redirection of the form B<command |E<gt> $variable>.

String variables may be used in B, B, or even in perl-code as $B or ${B}. In the first two cases, variables act as macros in the sense that all variable occurrences are replaced by the corresponding values before the expression itself is evaluated.

To display the current value of a variable, use the B command, the B command or simply the variable name:

Example:

  xsh> $b="chapter";
  xsh> $file="${b}s.xml";
  xsh> open f=$file;
  xsh> ls //$b[count(descendant::para)>10]
  xsh> print $b
  chapter
  xsh> $b
  $b='chapter';
  xsh> variables
  $a='chapters.xml';
  $b='chapter';

=head2 Node-list Variables

Every node-list variable name consists of a percent sign (B<%>) prefix and an B, that has to be unique among other node-list variables, e.g. B<%variable>. Node-list variables can be used to store lists of nodes that result from evaluating an XPath. This is especially useful when several changes are performed on some set of nodes and evaluating the XPath expression repeatedly would take too long.
Another important use is to remember a node that would otherwise be extremely hard or even impossible to locate by XPath expressions after some changes to the tree structure are made, since such an XPath cannot be predicted in advance.

Although node-list variables act just like XPath expressions that would result in the same node-list, for implementation reasons it is not possible to use node-list variables as parts of complex XPath expressions except for one case. They may only be used at the very beginning of an XPath expression. So while constructions such as B<%creatures[4]>, B<%creatures[@race='elf']>, or B<%creatures/parents/father> do work as expected, B<string(%creatures[2]/@name)>, B<//creature[%creatures[2]/@name=@name]>, or B<%creatures[@race='elf'][2]> do not. In the first two cases it is because node-list variables cannot be evaluated in the middle of an XPath expression. The third case fails because this construction actually translates into a sequence of evaluations of B for each node in the B<%creatures> node-list, which is not equivalent to the intended expression, as the B<[2]> filter does not apply to the whole result of B<%creatures[@race='elf']> at once but rather to the partial results.

Fortunately, it is usually possible to work around these unsupported constructions quite easily. This is typically done by introducing some more variables as well as using the B<foreach> statement. The following example should provide some idea on how to do this:

Example:

  # work around for $name=string(%creatures[2]/@name)
  xsh> foreach %creatures[2] $name=string(@name)
  # work around for ls //creature[%creatures[2]/@name=@name]
  xsh> ls //creature[$name=@name]
  # work around for ls %creatures[@race='elf'][2]
  xsh> %elves = %creatures[@race='elf']
  xsh> ls %elves[2]

Remember that when a node is deleted from a tree it is at the same time removed from all node-lists it occurs in.
Note also that unlike string variables, node-list variables cannot be (and are not intended to be) directly accessed from Perl code.

=head2 Accessing Perl Variables

All XSH string variables are usual Perl scalar variables from the B namespace, which is the default namespace for any Perl code evaluated from XSH. Thus it is possible to arbitrarily intermix XSH and Perl assignments:

Example:

  xsh> ls //chapter[1]/title
  Introduction
  xsh> $a=string(//chapter[1]/title)
  xsh> eval { $b="CHAPTER 1: ".uc($a); }
  xsh> print $b
  CHAPTER 1: INTRODUCTION

If needed, it is, however, possible to use any other type of Perl variable by evaluating the corresponding perl code. The following example demonstrates using Perl hashes to collect and print some simple racial statistics about the population of Middle-Earth:

Example:

  foreach a:/middle-earth/creature {
    $race=string(@race);
    eval { $races{$race}++ };
  }
  print "Middle-Earth Population (race/number of creatures)"
  eval {
    print map "$_/$races{$_}\n", sort { $a cmp $b } keys(%races);
  };

=head2 RELATED COMMANDS

assign, local

=head1 OPTIONS

The following commands are used to modify the default behaviour of the XML parser or XSH itself. Some of the commands switch between two different modes according to a given expression (which is expected to result either in a zero or a non-zero value). Other commands, also working as a flip-flop, have their own explicit counterpart (e.g. B and B or B and B). This inconsistency is due to historical reasons.

The B and B options allow you to specify the character encoding that should be expected from the user as well as the encoding to be used by XSH on output. This is particularly useful when you work with UTF-8 encoded documents on a console which supports only 8-bit characters.

The B command displays current settings by means of XSH commands. Thus it can be used not only to review current values, but also to store them for future use, e.g. in the ~/.xshrc file.

Example:

  xsh> options | cat > ~/.xshrc

=head2 RELATED COMMANDS

backups, debug, empty-tags, encoding, indent, keep-blanks, load-ext-dtd, nobackups, nodebug, options, parser-completes-attributes, parser-expands-entities, parser-expands-xinclude, pedantic-parser, query-encoding, quiet, recovering, register-function, register-namespace, register-xhtml-namespace, register-xsh-namespace, run-mode, skip-dtd, switch-to-new-documents, test-mode, unregister-function, unregister-namespace, validation, verbose, xpath-axis-completion, xpath-completion

=head1 INTERACTING WITH PERL AND SHELL

To allow more complex tasks to be achieved, XSH provides ways for interaction with the Perl programming language and the system shell.

=head2 Calling Perl

Perl is a language optimized for scanning arbitrary text files, extracting information from those text files, and printing reports based on that information. It's also a good language for many system management tasks. The language is intended to be practical (easy to use, efficient, and complete). XSH itself is written in Perl, so it is extremely easy to support this language as an extension to XSH.

Perl B can either be simply evaluated with the B command, used to do quick changes to nodes of the DOM tree (see the B command), used to provide a list of strings to iterate over in a B loop, or used to specify more complex conditions for B, B, and B statements.

To prevent conflicts between XSH internals and the evaluated perl code, XSH runs such code in the context of a special namespace B. As described in the section B, XSH string variables may be accessed and possibly assigned from Perl code in the most obvious way, since they actually are Perl variables defined in the B namespace.

The interaction between XSH and Perl also works the other way round, so that you may call back XSH from the evaluated Perl code. For this, the Perl function B<xsh> is defined in the B namespace. All parameters passed to this function are interpreted as XSH commands.

To simplify evaluation of XPath expressions, there are another three functions. The first one, named B, returns the same value as would be printed by the B command in XSH on the same XPath expression. The second function, named B, returns the result of XPath evaluation as if the whole expression was wrapped with the XPath B function. In other words, B returns the same value as B. The third function, named B, returns the result of the XPath search as an XML string which is equivalent to the output of a B on the same XPath expression (without indentation and without folding or any other limitation on the depth of the listing).

In the following examples we use Perl to populate Middle-Earth with Hobbits whose names are read from a text file called B, unless there are some Hobbits in Middle-Earth already.

Example: Use Perl to read text files

  unless (//creature[@race='hobbit']) {
    perl 'open $file, "hobbits.txt"';
    perl '@hobbits=<$file>';
    perl 'close $file';
    foreach { @hobbits } {
      insert element "" into m:/middle-earth/creatures;
    }
  }

Example: The same code as a single Perl block

  perl {
    unless (count(//creature[@race='hobbit'])) {
      open my $file, "hobbits.txt";
      foreach (<$file>) {
        xsh(qq{insert element "" into m:/middle-earth/creatures});
      }
      close $file;
    }
  };

=head2 Writing your own XPath extension functions in Perl

XSH allows the user to extend the set of XPath functions by providing an extension function written in Perl. This can be achieved using the B<register-function> command. The perl code implementing an extension function works as a usual perl routine accepting its arguments in B<@_> and returning the result. The following conventions are used:

The arguments passed to the perl implementation by the XPath engine are either simple scalars or B objects, depending on the types of the XPath arguments. The implementation is responsible for checking the argument number and types. The implementation may use arbitrary B methods to process the arguments and return the result.
(B perl module documentation can be found for example at http://search.cpan.org/author/PHISH/XML-LibXML-1.54/LibXML.pm). The implementation SHOULD NOT, however, MODIFY the document. Doing so could not only confuse the XPath engine but also result in a critical error (such as a segmentation fault). Calling XSH commands from extension function implementations is not currently allowed.

The perl code must return a single value, which can be of one of the following types: a simple scalar (a number or string), B object reference (result is a boolean value), B object reference (result is a string), B object reference (result is a float), B (or derived) object reference (result is a nodeset consisting of a single node), or B (result is a nodeset). For convenience, simple (non-blessed) array references consisting of B objects can also be used for a nodeset result instead of a B.

=head2 Calling the System Shell

In the interactive mode, XSH interprets all lines starting with an exclamation mark (B<!>) as shell commands and invokes the system shell to interpret them (this is to mimic FTP command-line interpreters).

Example:

  xsh> !ls -l
  -rw-rw-r-- 1 pajas pajas 6355 Mar 14 17:08 Artistic
  drwxrwxr-x 2 pajas users 128 Sep 1 10:09 CVS
  -rw-r--r-- 1 pajas pajas 14859 Aug 26 15:19 ChangeLog
  -rw-r--r-- 1 pajas pajas 2220 Mar 14 17:03 INSTALL
  -rw-r--r-- 1 pajas pajas 18009 Jul 15 17:35 LICENSE
  -rw-rw-r-- 1 pajas pajas 417 May 9 15:16 MANIFEST
  -rw-rw-r-- 1 pajas pajas 126 May 9 15:16 MANIFEST.SKIP
  -rw-r--r-- 1 pajas pajas 20424 Sep 1 11:04 Makefile
  -rw-r--r-- 1 pajas pajas 914 Aug 26 14:32 Makefile.PL
  -rw-r--r-- 1 pajas pajas 1910 Mar 14 17:17 README
  -rw-r--r-- 1 pajas pajas 438 Aug 27 13:51 TODO
  drwxrwxr-x 5 pajas users 120 Jun 15 10:35 blib
  drwxrwxr-x 3 pajas users 1160 Sep 1 10:09 examples
  drwxrwxr-x 4 pajas users 96 Jun 15 10:35 lib
  -rw-rw-r-- 1 pajas pajas 0 Sep 1 16:23 pm_to_blib
  drwxrwxr-x 4 pajas users 584 Sep 1 21:18 src
  drwxrwxr-x 3 pajas users 136 Sep 1 10:09 t
  -rw-rw-r-- 1 pajas pajas 50 Jun 16 00:06 test
  drwxrwxr-x 3 pajas users 496 Sep 1 20:18 tools
  -rwxr-xr-x 1 pajas pajas 5104 Aug 30 17:08 xsh

To invoke a system shell command or program from the non-interactive mode or from a complex XSH construction, use the B<exec> command.

Since UNIX shell commands are a very powerful tool for processing textual data, XSH supports direct redirection of XSH command output to a system shell command. This is very similar to the redirection known from UNIX shells, except that here, of course, the first command in the pipeline is an XSH command. Since the semicolon (B<;>) is used in XSH to separate commands, it has to be prefixed with a backslash if it should be used for other purposes.

Example: Use grep and less to display context of `funny'

  xsh> ls //chapter[5]/para | grep funny | less

Example: The same on Windows 2000/XP systems

  xsh> ls //chapter[5]/para | find "funny" | more

=head2 RELATED COMMANDS

exec, lcd, map, perl, rename

=head1 COMMAND REFERENCE

=head2 assign

=over 4

=item Usage:

assign $B=B $B=B assign %B=B %B=B

=item Description:

In the first two cases (where a dollar sign appears) store the result of evaluation of the B in a variable named $B. In this case, B is evaluated in a similar way as in the case of the B: if it results in a literal value, this value is used. If it results in a node-list, the number of nodes occurring in that node-list is used. Use the B<string()> XPath function to obtain a literal value in these cases.

Example: String expressions

  xsh> $a=string(chapter/title)
  xsh> $b="hallo world"

Example: Arithmetic expressions

  xsh> $a=5*100
  xsh> $a
  $a=500
  xsh> $a=(($a+5) div 10)
  xsh> $a
  $a=50.5

Example: Counting nodes

  xsh> $a=//chapter
  xsh> $a
  $a=10
  xsh> %chapters=//chapter
  xsh> $a=%chapters
  xsh> $a
  $a=10

Example: Some caveats of counting node-lists

  xsh> ls ./creature
  ## WRONG (@name results in a singleton node-list) !!!
  xsh> $name=@name
  xsh> $name
  $name=1
  ## CORRECT (use string() function)
  xsh> $name=string(@name)
  xsh> $name
  $name=Bilbo

In the other two cases (where a percent sign appears) find all nodes matching a given B and store the resulting node-list in the variable named %B. The variable may later be used instead of an XPath expression.

=item See also:

var_command

=back

=head2 backups

=over 4

=item Usage:

backups

=item Description:

Enable creating backup files on save (default).

This command is equivalent to setting the B<$BACKUPS> variable to 1.

=item See also:

nobackups

=back

=head2 call

=over 4

=item Usage:

call B [B | B]*

=item Description:

Call an XSH subroutine named B previously created using def. If the subroutine requires some parameters, these have to be specified after the B.
Node-list parameters are given by means of B expressions. String parameters have to be string Bs.

=item See also:

def return_command

=back

=head2 catalog

=over 4

=item Usage:

catalog B

=item Description:

Use a given catalog file as a catalog during all parsing processes. Using a catalog significantly speeds up parsing if many external resources (such as DTDs or XIncludes) are loaded into the parsed documents.

=back

=head2 cd

=over 4

=item Usage:

cd [B]

=item Aliases:

chxpath

=item Description:

Change the current context node (and current document) to the first node matching a given B argument.

=back

=head2 clone

=over 4

=item Usage:

clone B=B

=item Aliases:

dup

=item Description:

Make a copy of the document identified by the B following the equal sign, assigning it the identifier of the first B.

=item See also:

open_command close_command print_enc_command files_command

=back

=head2 close

=over 4

=item Usage:

close [B]

=item Description:

Close the document identified by B, removing its parse-tree from memory (note also that all nodes belonging to the document are removed from all nodelists they appear in). If B is omitted, the command closes the current document.

=back

=head2 copy

=over 4

=item Usage:

copy B B B

=item Aliases:

cp

=item Description:

Copies nodes matching the first B to the destinations determined by the B directive relative to the second B. If more than one node matches the first B, then each is copied to the position relative to the corresponding node matched by the second B, according to the order in which the nodes are matched. Thus, the n'th node matching the first B is copied to the location relative to the n'th node matching the second B. The possible values for B are: after, before, into, replace. The first three cause the source nodes to be copied after, before, or into (as the last child-node) the destination nodes. If replace B is used, the source node is copied before the destination node and the destination node is removed.
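
The pairing rule above, and its contrast with the "cross" variants described under TREE MODIFICATION, can be pictured with a short Python sketch. This is only an illustration of which source goes with which destination; the function names are hypothetical, plain strings stand in for nodes, and the order in which a cross variant visits its pairs is an assumption of the sketch.

```python
from itertools import product


def plain_pairs(sources, destinations):
    """Pair the n'th source with the n'th destination (plain copy/move/insert).

    zip() stops as soon as either list runs out, which mirrors
    "as long as there are both source nodes and destinations left".
    """
    return list(zip(sources, destinations))


def cross_pairs(sources, destinations):
    """Pair every source with every destination ("x" variants such as xcopy)."""
    return list(product(sources, destinations))


sources = ["a1", "a2", "a3"]
destinations = ["b1", "b2"]

print(plain_pairs(sources, destinations))  # two pairs; a3 has no partner
print(len(cross_pairs(sources, destinations)))  # 3 * 2 pairs
```

With three sources and two destinations, the plain variant performs two copies while the cross variant performs six.
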

Some kind of type conversion is used when the types of the source and destination nodes are not equal. Thus, text, cdata, comment or processing instruction node data prepend, append or replace the value of a destination attribute when copied before, after/into or instead of (replace) an attribute, and vice versa.

Attributes may be copied after, before or into some other attribute to append, prepend or replace the destination attribute value. They may also replace the destination attribute completely (both its name and value).

To copy an attribute from one element to another, simply copy the attribute node into the destination element.

Elements may be copied into other elements (which results in appending to the child-list of the destination element), or before, after or instead of (replace) other nodes of any type except attributes.

Example: Replace living-thing elements in the document b with the corresponding creature elements of the document a.

  xsh> copy a://creature replace b://living-thing

=back

=head2 count

=over 4

=item Usage:

count B

=item Aliases:

print_value get

=item Description:

Calculate a given B expression. If the result is a node-list, return the number of nodes in the node-list. If the B results in a boolean, numeric or literal value, return the value.

=back

=head2 create

=over 4

=item Usage:

create B B

=item Aliases:

new

=item Description:

Create a new document using B to form the root element and associate it with a given identifier.

Example:

  xsh> create t1 root
  xsh> ls /
  xsh> create t2 "Just a test"
  xsh> ls /
  Just a test
  xsh> files
  scratch = new_document.xml
  t1 = new_document1.xml
  t2 = new_document2.xml

=item See also:

open_command clone_command

=back

=head2 debug

=over 4

=item Usage:

debug

=item Description:

Turn on debugging messages.

This is equivalent to setting the B<$DEBUG> variable to 1.

=item See also:

nodebug

=back

=head2 def

=over 4

=item Usage:

def B [$B | %B]* B or def B [$B | %B]*;

=item Aliases:

define

=item Description:

Define a new XSH subroutine named B. The subroutine may require zero or more parameters of nodelist or string type. These are declared as a whitespace-separated list of (so called) parametric variables (of nodelist or string type). The body of the subroutine is specified as a B. Note that all subroutine declarations are processed during parsing and not at run-time, so it does not matter where the subroutine is defined.

The routine can later be invoked using the B command followed by the routine name and parameters. Nodelist parameters must be given as XPath expressions, and are evaluated just before the subroutine's body is executed. String parameters must be given as (string) Bs. Resulting node-lists/strings are stored into the parametric variables before the body is executed. These variables are local to the subroutine's call tree (see also the B command). If there is a global variable using the same name as some parametric variable, the original value of the global variable is replaced with the value of the parametric variable for the duration of the subroutine's run.

Note that a subroutine has to be declared before it is called with B. If you cannot do so, e.g. if you want to call a subroutine recursively, you have to pre-declare the subroutine using a B with no B. There may be only one full declaration (and possibly one pre-declaration) of a subroutine for one B, and the declaration and pre-declaration have to define the same number of arguments and their types must match.

Example:

  def l3 %v {
    ls %v 3; # list given nodes up to depth 3
  }
  call l3 //chapter;

Example: Commenting and un-commenting pieces of document

  def comment
      %n      # nodes to move to comments
      $mark   # maybe some handy mark to recognize such comments
  {
    foreach %n {
      if ( .
      = ../@* ) {
        echo "Warning: attribute nodes are not supported!";
      } else {
        echo "Commenting out:";
        ls .;
        local $node = "";
        ls . |> $node;
        add comment "$mark$node" replace .;
      }
    }
  }

  def uncomment %n $mark {
    foreach %n {
      if (. = ../comment()) { # is this node a comment node
        local $string = substring-after(.,"$mark");
        add chunk $string replace .;
      } else {
        echo "Warning: Ignoring non-comment node:";
        ls . 0;
      }
    }
  }

  # comment out all chapters with no paragraphs
  call comment //chapter[not(para)] "COMMENT-NOPARA";

  # uncomment all comments (may not always be valid!)
  $mark="COMMENT-NOPARA";
  call uncomment //comment()[starts-with(.,"$mark")] $mark;

=item See also:

call_command return_command local_command

=back

=head2 defs

=over 4

=item Usage:

defs

=item Description:

List names and parametric variables for all defined XSH routines.

=item See also:

def var_command

=back

=head2 doc-info

=over 4

=item Usage:

doc-info [B]

=item Aliases:

doc_info

=item Description:

In the present implementation, this command displays information provided in the B<E<lt>?xml ...?E<gt>> declaration of a document: B, B, B, plus information about the level of B compression of the original XML file.

=item See also:

set_enc_command set_standalone_command

=back

=head2 documents

=over 4

=item Usage:

files

=item Aliases:

files docs

=item Description:

List open files and their identifiers.

=item See also:

open_command close_command

=back

=head2 dtd

=over 4

=item Usage:

dtd [B]

=item Description:

Print the external or internal DTD for a given document. If no document identifier is given, the current document is used.

=item See also:

valid_command validate_command

=back

=head2 empty-tags

=over 4

=item Usage:

empty-tags B

=item Aliases:

empty_tags

=item Description:

If the value of B is 1 (non-zero), empty tags are serialized as a start-tag/end-tag pair (B<E<lt>fooE<gt>E<lt>/fooE<gt>>). Otherwise, they are compacted into a short-tag form (B<E<lt>foo/E<gt>>). This option affects both B and B and possibly other commands. Default value is B<0>.

This command is equivalent to setting the B<$EMPTY_TAGS> variable.

=back

=head2 enc

=over 4

=item Usage:

enc [B]

=item Description:

Print the original document encoding string. If no document identifier is given, the current document is used.

=item See also:

set_enc_command

=back

=head2 encoding

=over 4

=item Usage:

encoding B

=item Description:

Set the default output character encoding.

This command is equivalent to setting the B<$ENCODING> variable.

=back

=head2 exec

=over 4

=item Usage:

exec B [B ...]

=item Aliases:

system

=item Description:

Execute the system command(s) in Bs.

Example: Count words in the "hallo world" string, then print the name of your machine's operating system.

  exec echo hallo world;        # prints hallo world
  exec "echo hallo world" | wc; # counts words in hallo world
  exec uname;                   # prints operating system name

=back

=head2 exit

=over 4

=item Usage:

exit [B]

=item Aliases:

quit

=item Description:

Exit xsh immediately, optionally with the exit-code resulting from a given expression.

WARNING: No files are saved on exit.

=back

=head2 fold

=over 4

=item Usage:

fold B [B]

=item Description:

This feature is still EXPERIMENTAL! The fold command may be used to mark elements matching the B with a B attribute from the B<http://xsh.sourceforge.net/xsh/> namespace. When listing the DOM tree using B B fold, elements marked in this way are folded to the depth given by the B (default depth is 0 = fold immediately).

Example:

  xsh> fold //chapter 1
  xsh> ls //chapter[1] fold
  ... ... ...

=item See also:

unfold_command list_command

=back

=head2 foreach

=over 4

=item Usage:

foreach B|B B|B

=item Aliases:

for

=item Description:

If the first argument is an B expression, execute the command-block for each node matching the expression, making it temporarily the current node, so that all relative XPath expressions are evaluated in its context.

If the first argument is a B, it is evaluated and the resulting perl-list is iterated, setting the variable $__ (note that there are two underscores!)
to be each element of the list in turn. It works much like perl's foreach, except that the variable used consists of two underscores.

Example: Move all employee elements in a company element into a staff subelement of the same company

  xsh> foreach //company xmove ./employee into ./staff;

Example: List content of all XML files in current directory

  xsh> foreach { glob('*.xml') } { open f=$__; list f:/; }

=back

=head2 help

=over 4

=item Usage:

help B|argument-type

=item Aliases:

?

=item Description:

Print help on a given command or argument type.

=back

=head2 if

=over 4

=item Usage:

if B|B B if B|B B [ elsif B ]* [ else B ]

=item Description:

Execute B if a given B or B expression evaluates to a non-empty node-list, true boolean-value, non-zero number or non-empty literal. If the first test fails, check all possibly following B conditions and execute the corresponding B for the first one of them which is true. If none of them succeeds, execute the B B (if any).

Example: Display node type

  def node_type %n {
    foreach (%n) {
      if ( . = self::* ) { # XPath trick to check if . is an element
        echo 'element';
      } elsif ( . = ../@* ) { # XPath trick to check if . is an attribute
        echo 'attribute';
      } elsif ( . = ../processing-instruction() ) {
        echo 'pi';
      } elsif ( . = ../text() ) {
        echo 'text';
      } elsif ( . = ../comment() ) {
        echo 'comment';
      } else { # well, this should not happen, but anyway, ...
        echo 'unknown-type';
      }
    }
  }

=back

=head2 ifinclude

=over 4

=item Usage:

ifinclude B

=item Description:

Include a file named B and execute all XSH commands therein unless the file was already included using either B or B.

=item See also:

include_command

=back

=head2 include

=over 4

=item Usage:

include B

=item Aliases:

.

=item Description:

Include a file named B and execute all XSH commands therein.
=item See also: ifinclude_command =back =head2 indent =over 4 =item Usage: indent B =item Description: If the value of B is 1, format the XML output while saving a document by adding some nice ignorable whitespace. If the value is 2 (or higher), XSH will act as in case of 1, plus it will add a leading and a trailing linebreak to each text node. Note that, since the underlying C library (libxml2) uses a hardcoded indentation of 2 space characters per indentation level, the amount of whitespace used for indentation cannot be altered at runtime. This command is equivalent to setting the B<$INDENT> variable. =back =head2 insert =over 4 =item Usage: insert B B [namespace B] B B =item Aliases: add =item Description: Works just like xadd, except that the new node is attached only to the first node matched. =item See also: xinsert_command move_command xmove_command =back =head2 iterate =over 4 =item Usage: iterate B B =item Description: Iterate works very much like the XPath variant of B, except that B evaluates the B as soon as a new node matching a given B is found. As a limitation, the B expression used with B may only consist of one XPath step, i.e. it cannot contain an XPath step separator B</>. What are the benefits of B over a B loop, then? Well, under some circumstances it is efficiency; under others there are none. To clarify this, we have to dive a bit deeper into the details of XPath implementation. By definition, the node-list resulting from evaluation of an XPath has to be ordered in the canonical document order. That means that an XPath implementation must contain some kind of a sorting algorithm. This would not itself be much trouble if a relative document order of two nodes of a DOM tree could be determined in a constant time. Unfortunately, the libxml2 library, used behind XSH, does not implement mechanisms that would allow this complexity restriction (which is, however, a quite natural and reasonable approach if all the consequences are considered).
Thus, when comparing two nodes, libxml2 traverses the tree to find their nearest common ancestor and at that point determines the relative order of the two subtrees by trying to seek one of them in a list of right siblings of the other. This of course cannot be handled in a constant time. As a result, the sorting algorithm, reasonably efficient for a constant time comparison (polynomial of a degree < 1.5) or small node-lists, becomes rather unusable for huge node-lists with linear time comparison (still polynomial but of a degree > 2). The B command provides a way to avoid sorting the resulting nodelist by limiting allowed XPath expression to one step (and thus one axis) at a time. On the other hand, since B is implemented in Perl, a proxy object gluing the C and Perl layers has to be created for every node the iterator passes by. This (plus some extra subroutine calls) makes it about two to three times slower compared to a similar tree-traversing algorithm used by libxml2 itself during XPath evaluation. Our experience shows that B beats B in performance on large node-lists (>=1500 nodes, but your mileage may vary) while B wins on smaller node-lists. The following two examples give equivalent results. However, the one using iterate may be faster, especially if the number of nodes being counted is very large. Example: Count inhabitants of the kingdom of Rohan in productive age cd rohan/inhabitants; iterate child::*[@age>=18 and @age<60] { perl $productive++ }; echo "$productive inhabitants in productive age"; Example: Using XPath $productive=count(rohan/inhabitants/*[@age>=18 and @age<60]); echo "$productive inhabitants in productive age"; Use e.g. B<| time cut> pipe-line redirection to benchmark an XSH command on a UNIX system.
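The trade-off described above can be illustrated outside XSH. The following Python sketch is only a toy model under stated assumptions: the C<Node> class and the functions C<document_order>, C<sorted_node_list> and C<iterate_children> are hypothetical names invented for this illustration and are not part of XSH, XML::LibXML or libxml2. It shows a document-order comparison that has to walk up to the common ancestor and scan a sibling list (so it is not constant time), and a single-axis traversal that yields the same nodes already in document order without any sorting.

```python
from functools import cmp_to_key


class Node:
    """Hypothetical minimal tree node (not an XSH or libxml2 type)."""

    def __init__(self, name, parent=None, age=None):
        self.name = name
        self.parent = parent
        self.age = age
        self.children = []
        if parent is not None:
            parent.children.append(self)


def ancestors_or_self(node):
    """Return the path from the root down to the given node."""
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    return path[::-1]


def document_order(a, b):
    """Compare two nodes of the SAME tree by document order.

    No cached position is used: the paths to the root are walked and the
    sibling list under the nearest common ancestor is scanned, so the cost
    grows with tree depth and sibling count.
    """
    if a is b:
        return 0
    pa, pb = ancestors_or_self(a), ancestors_or_self(b)
    i = 0
    while i < len(pa) and i < len(pb) and pa[i] is pb[i]:
        i += 1
    if i == len(pa):  # a is an ancestor of b, so a comes first
        return -1
    if i == len(pb):  # b is an ancestor of a, so b comes first
        return 1
    siblings = pa[i - 1].children
    return -1 if siblings.index(pa[i]) < siblings.index(pb[i]) else 1


def sorted_node_list(nodes):
    """Order an arbitrary node collection using the costly comparison."""
    return sorted(nodes, key=cmp_to_key(document_order))


def iterate_children(node, predicate):
    """Single-axis traversal: children come out in document order, unsorted."""
    for child in node.children:
        if predicate(child):
            yield child


# Toy data loosely modelled on the Rohan example above.
root = Node("inhabitants")
people = [Node("person", root, age) for age in (12, 18, 45, 60, 33)]


def productive(n):
    return 18 <= n.age < 60


via_sort = sorted_node_list([p for p in reversed(people) if productive(p)])
via_iterate = list(iterate_children(root, productive))
```

Both C<via_sort> and C<via_iterate> hold the same three nodes in the same order; the difference is only in how much work was needed to establish that order, which is the point the paragraphs above make about one-step expressions.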
=item See also: foreach next_command prev_command last_command =back =head2 keep-blanks =over 4 =item Usage: keep_blanks B =item Aliases: keep_blanks =item Description: Allows you to turn off XML::LibXML's default behaviour of maintaining whitespace in the document. A non-zero expression forces the XML parser to preserve all whitespace. This command is equivalent to setting the B<$KEEP_BLANKS> variable. =back =head2 last =over 4 =item Usage: last [B] =item Description: The last command is like the break statement in C (as used in loops); it immediately exits an enclosing loop. The optional B argument may evaluate to a positive integer number that indicates which level of the nested loops to quit. If this argument is omitted, it defaults to 1, i.e. the innermost loop. Using this command outside a loop causes an immediate run-time error. =item See also: foreach while iterate next_command last_command =back =head2 lcd =over 4 =item Usage: lcd B =item Aliases: chdir =item Description: Changes the filesystem working directory to B, if possible. If B is omitted, changes to the directory specified in HOME environment variable, if set; if not, changes to the directory specified by LOGDIR environment variable. =back =head2 load-ext-dtd =over 4 =item Usage: load_ext_dtd B =item Aliases: load_ext_dtd =item Description: If the expression is non-zero, XML parser loads external DTD subsets while parsing. By default, this option is enabled. This command is equivalent to setting the B<$LOAD_EXT_DTD> variable. =back =head2 local =over 4 =item Usage: local $B = B local %B = B local $B|%B [ $B|%B ... ] =item Description: This command acts in a very similar way as B does, except that the variable assignment is done temporarily and lasts only for the rest of the nearest enclosing B. At the end of the enclosing block or subroutine the original value is restored. This command may also be used without the assignment part and assignments may be done later using the usual B command.
Note that the variable itself is not lexically scoped; it is still global in the sense that it is still visible to any subroutine called subsequently from within the same block. A local just gives temporary values to global (meaning package) variables. Unlike Perl's B declarations it does not create a local variable. This is known as dynamic scoping. Lexical scoping is not implemented in XSH. To sum up for Perl programmers: B in XSH works exactly the same as B in Perl. =item See also: assign_command def =back =head2 locate =over 4 =item Usage: locate B =item Description: Print canonical XPaths leading to nodes matched by a given B. =item See also: pwd_command =back =head2 ls =over 4 =item Usage: ls B [B] =item Aliases: list =item Description: List the XML representation of all nodes matching B. The optional B argument may be provided to specify the depth of XML tree listing. If negative, the tree will be listed to unlimited depth. If the B results in the word B, elements marked with the B command are folded, i.e. listed only to a certain depth (this feature is still EXPERIMENTAL!). Unless in quiet mode, this command also prints the number of nodes matched on stderr. If the B parameter is omitted, current context node is listed to the depth of 1. =item See also: count_command fold_command unfold_command =back =head2 map =over 4 =item Usage: map B B =item Aliases: sed =item Description: This command provides an easy way to modify node's data (content) using arbitrary Perl code. Each of the nodes matching B passes its data to the B via the B<$_> variable and receives the (possibly) modified data using the same variable. Since element nodes do not really have any proper content (they are only a storage for other nodes), node's name (tag) is used in case of elements. Note, however, that recent versions of XSH provide a special command B with a very similar syntax to B, that should be used for renaming element, attribute, and processing instruction nodes.
Example: Capitalises all hobbit names xsh> map { $_=ucfirst($_) } //hobbit/@name Example: Changes goblins to orcs in all hobbit tales. xsh> map { s/goblin/orc/gi } //hobbit/tale/text() =back =head2 move =over 4 =item Usage: move B B B =item Aliases: mv =item Description: B command acts exactly like B, except that it removes the source nodes after a successful copy. Remember that the moved nodes are actually different nodes from the original ones (which may not be obvious when moving nodes within a single document into locations that do not require type conversion). So, after the move, the original nodes exist neither in the document itself nor in any nodelist variable. See B for more details on how the copies of the moved nodes are created. =item See also: copy_command xmove_command insert_command xinsert_command =back =head2 namespaces =over 4 =item Usage: namespaces [B] =item Description: For each node matching given B lists all namespaces that are valid in its scope in the form of B declarations. If no B is given, lists namespaces in the scope of the current node. =back =head2 next =over 4 =item Usage: next [B] =item Description: The next command is like the continue statement in C; it starts the next iteration of an enclosing loop. The optional B argument may evaluate to a positive integer number that indicates which level of the nested loops should be restarted. If omitted, it defaults to 1, i.e. the innermost loop. Using this command outside a loop causes an immediate run-time error. =item See also: foreach while iterate redo_command last_command prev_command =back =head2 nobackups =over 4 =item Usage: nobackups =item Description: Disable creating backup files on save. This command is equivalent to setting the B<$BACKUPS> variable to 0. =item See also: nobackups =back =head2 nodebug =over 4 =item Usage: nodebug =item Description: Turn off debugging messages. This is equivalent to setting B<$DEBUG> variable to 0.
=item See also: debug =back =head2 normalize =over 4 =item Usage: normalize B =item Description: B puts all text nodes in the full depth of the sub-tree underneath each node selected by a given B, into a "normal" form where only structure (e.g., elements, comments, processing instructions, CDATA sections, and entity references) separates text nodes, i.e., there are neither adjacent Text nodes nor empty Text nodes. =back =head2 open =over 4 =item Usage: [open [HTML|XML|DOCBOOK] [FILE|PIPE|STRING]] B=B =item Description: Load a new XML, HTML or SGML DOCBOOK document from the file (actually arbitrary URL), command output or string provided by the B. In XSH the document is given a symbolic name B. To identify the document in commands like close, save, validate, dtd or enc simply use B. In commands which work on document nodes, give B: prefix to XPath expressions to point the XPath to the document. Example: xsh> open x=mydoc.xml # open a document # open an HTML document from the Internet xsh> open HTML h="http://www.google.com/?q=xsh" # quote file name if it contains whitespace xsh> open y="document with a long name with spaces.xml" # you may omit the word open when loading an XML file/URI. xsh> z=mybook.xml # use HTML or DOCBOOK keywords to load these types xsh> open HTML z=index.htm # use PIPE keyword to read output of a command xsh> open HTML PIPE z='wget -O - xsh.sourceforge.net/index.html' # use z: prefix to identify the document opened with the # previous command in an XPath expression. xsh> ls z://chapter/title =back =head2 options =over 4 =item Usage: options =item Aliases: flags =item Description: List current values of all XSH flags and options (such as validation flag or query-encoding).
Example: Store current settings in your .xshrc xsh> options | cat > ~/.xshrc =back =head2 parser-completes-attributes =over 4 =item Usage: parser-completes-attributes B =item Aliases: complete_attributes complete-attributes parser_completes_attributes =item Description: If the expression is non-zero, this command allows the XML parser to complete the elements' attribute lists with the ones defaulted from the DTDs. By default, this option is enabled. This command is equivalent to setting the B<$PARSER_COMPLETES_ATTRIBUTES> variable. =back =head2 parser-expands-entities =over 4 =item Usage: parser_expands_entities B =item Aliases: parser_expands_entities =item Description: Enable the entity expansion during the parse process if the B is non-zero, disable it otherwise. If entity expansion is off, any external parsed entities in the document are left as entities. Defaults to on. This command is equivalent to setting the B<$PARSER_EXPANDS_ENTITIES> variable. =back =head2 parser-expands-xinclude =over 4 =item Usage: parser_expands_xinclude B =item Aliases: parser_expands_xinclude =item Description: If the B is non-zero, the parser is allowed to expand XInclude tags immediately while parsing the document. This command is equivalent to setting the B<$PARSER_EXPANDS_XINCLUDE> variable. =item See also: process_xinclude_command =back =head2 pedantic-parser =over 4 =item Usage: pedantic_parser B =item Aliases: pedantic_parser =item Description: If you wish, you can make XML::LibXML more pedantic by passing a non-zero B to this command. This command is equivalent to setting the B<$PEDANTIC_PARSER> variable. =back =head2 perl =over 4 =item Usage: eval B =item Aliases: eval =item Description: Evaluate a given perl expression. =item See also: count_command =back =head2 prev =over 4 =item Usage: prev [B] =item Description: This command is only allowed inside an B loop. It returns the iteration one step back, to the previous node on the iterated axis.
The optional B argument may be used to indicate which level of nested loops the command applies to. =item See also: iterate redo_command last_command next_command =back =head2 print =over 4 =item Usage: print B [B ...] =item Aliases: echo =item Description: Interpolate and print the given expression(s). =back =head2 process-xinclude =over 4 =item Usage: process_xinclude [B] =item Aliases: process_xinclude process-xincludes process_xincludes xinclude xincludes load_xincludes load-xincludes load_xinclude load-xinclude =item Description: Process any xinclude tags in the document B. =item See also: parser_expands_xinclude =back =head2 pwd =over 4 =item Usage: pwd =item Description: Print XPath leading to the current context node. This is equivalent to B. =item See also: locate_command =back =head2 query-encoding =over 4 =item Usage: query-encoding B =item Aliases: query_encoding =item Description: Set the default query character encoding. This command is equivalent to setting the B<$QUERY_ENCODING> variable. =back =head2 quiet =over 4 =item Usage: quiet =item Description: Turn off verbose messages. This command is equivalent to setting the B<$QUIET> variable. =item See also: verbose =back =head2 recovering =over 4 =item Usage: recovering B =item Description: Turn on recovering parser mode if the B is non-zero or off otherwise. Defaults to off. Note that in the recovering mode, validation is not performed by the parser even if the validation flag is on, and that the recovering mode flag only influences parsing of XML documents (not HTML). The recover mode helps to efficiently recover documents that are almost well-formed. This for example includes documents without a close tag for the document element (or any other element inside the document). This command is equivalent to setting the B<$RECOVERING> variable. =back =head2 redo =over 4 =item Usage: redo [B] =item Description: The redo command restarts a loop block without evaluating the conditional again.
The optional B argument may evaluate to a positive integer number that indicates which level of the nested loops should be restarted. If omitted, it defaults to 1, i.e. the innermost loop. Using this command outside a loop causes an immediate run-time error. Example: Restart a higher level loop from an inner one while ($i<100) { # ... foreach //para { # some code if $param { redo; # redo foreach loop } else { redo 2; # redo while loop } } } =item See also: foreach while iterate next_command last_command =back =head2 register-function =over 4 =item Usage: register-function B B =item Aliases: regfunc =item Description: EXPERIMENTAL! Register given perl code as a new XPath extension function under a name provided in the first argument (B). XML::LibXML DOM API may be used in the perl code for object processing. If the name contains a colon, then the first part before the colon must be a registered namespace prefix (see B) and the function is registered within the corresponding namespace. =back =head2 register-namespace =over 4 =item Usage: register-namespace B B =item Aliases: regns =item Description: Registers the first argument as a prefix for the namespace given in the second argument. The prefix can later be used in XPath expressions. =back =head2 register-xhtml-namespace =over 4 =item Usage: register-xhtml-namespace B =item Aliases: regns-xhtml =item Description: Registers a prefix for the XHTML namespace. The prefix can later be used in XPath expressions. =back =head2 register-xsh-namespace =over 4 =item Usage: register-xsh-namespace B =item Aliases: regns-xsh =item Description: Registers a new prefix for the XSH namespace. The prefix can later be used in XPath expressions. Note, that XSH namespace is by default registered with B prefix. This command is thus, in general, useful only when some document uses B prefix for a different namespace. 
=back =head2 remove =over 4 =item Usage: remove B =item Aliases: rm prune delete del =item Description: Remove all nodes matching B. Example: Get rid of all evil creatures. xsh> del //creature[@manner='evil'] =back =head2 rename =over 4 =item Usage: rename B B =item Description: This command is very similar to the B command, except that it operates on nodes' names rather than their data/values. For every element, attribute or processing-instruction matched by the B expression the following procedure is used: 1) the name of the node is stored into Perl's B<$_> variable, 2) the B is evaluated, and 3) the (possibly changed) content of the B<$_> variable is used as a new name for the node. Example: Renames all hobbits to halflings xsh> rename $_='halfling' //hobbit Example: Make all elements and attributes uppercase xsh> rename { $_=uc($_) } (//*|//@*) =item See also: map_command =back =head2 return =over 4 =item Usage: return =item Description: This command immediately stops the execution of a procedure it occurs in and returns the execution to the place of the script from which the subroutine was called. Using this command outside a subroutine causes an immediate run-time error. =item See also: def call_command =back =head2 run-mode =over 4 =item Usage: run-mode =item Aliases: run_mode =item Description: Switch into normal XSH mode in which all commands are executed. This is equivalent to setting B<$TEST_MODE> variable to 0. =item See also: test_mode =back =head2 save =over 4 =item Usage: save [HTML|XML|XInclude] [FILE|PIPE|STRING] B B [encoding B] or save B or save =item Description: Save the document identified by B. Using one of the B, B, B keywords the user may choose to save the document to a file, send it to a given command's input via a pipe, or simply return its content as a string. If none of the keywords is used, it defaults to FILE. If saving to a PIPE, the B argument must provide the corresponding command and all its parameters.
If saving to a FILE, the B argument may provide a filename; if omitted, it defaults to the original filename of the document. If saving to a STRING, the B argument is ignored and may freely be omitted. The output format is controlled using one of the XML, HTML, XInclude keywords (see below). If the format keyword is omitted, it defaults to XML. Note that a document should be saved as HTML only if it actually is an HTML document. Note also that the optional encoding parameter forces character conversion only; it is up to the user to declare the document encoding in the appropriate HTML <META> tag. The XInclude keyword automatically implies XML format and can be used to force XSH to save all already expanded XInclude sections back to their original files while replacing them with <xi:include> tags in the main XML file. Moreover, all material included within <include> elements from the http://www.w3.org/2001/XInclude namespace is saved to separate files too according to the B attribute, leaving only an empty <include> element in the root file. This feature may be used to split the document to new XInclude fragments. The encoding keyword followed by a B can be used to convert the document from its original encoding to a different encoding. In case of XML output, the <?xml?> declaration is changed accordingly. The new encoding is also set as the document encoding for the particular document. Example: Use save to preview an HTML document in Lynx save HTML PIPE mydoc 'lynx -stdin' =item See also: open_command close_command print_enc_command files_command =back =head2 select =over 4 =item Usage: select B =item Description: Make B the document identifier to be used in the next xpath evaluation without identifier prefix.
Example: xsh> a=mydoc1.xml # opens and selects a xsh> ls / # lists a xsh> b=mydoc2.xml # opens and selects b xsh> ls / # lists b xsh> ls a:/ # lists and selects a xsh> select b # does nothing except selecting b xsh> ls / # lists b =back =head2 set-enc =over 4 =item Usage: set-enc B [B] =item Description: Changes character encoding of a given document. If no document B is given, the command applies to the current document. This has two effects: changing the XMLDecl encoding declaration in the document prolog to display the new encoding and making all future B operations on the document default to the given charset. Example: xsh> ls ... xsh> set-enc "utf-8" xsh> ls ... xsh> save # saves the file in UTF-8 encoding =item See also: print_enc_command doc_info_command =back =head2 set-standalone =over 4 =item Usage: set-standalone B [B] =item Description: Changes the value of B declaration in the XMLDecl prolog of a document. The B should evaluate to either 1 or 0 or B<'yes'> or B<'no'>. The result of applying the command on other values is not specified. If no document B is given, the command applies to the current document. =item See also: doc_info_command =back =head2 skip-dtd =over 4 =item Usage: skip-dtd B =item Aliases: skip_dtd =item Description: If the value of B is 1 (non-zero), DTD DOCTYPE declaration is omitted from any serialization of XML documents (including B and B). Default value is B<0>. This command is equivalent to setting the B<$SKIP_DTD> variable. =back =head2 sort =over 4 =item Usage: sort B|B B %B =item Description: EXPERIMENTAL! This command is not yet guaranteed to remain in the future releases. DOCUMENTATION OBSOLETE! Syntax changed! This command may be used to sort the node-list stored in the node-list variable B. First, for each node in the node-list %B, the first argument (either a B or B expression), which serves as a sorting criterion, is evaluated in the context of the node and the obtained value is stored together with the node.
(In case of B the result of whatever type is cast to a string). Then perl's sorting algorithm is used to sort the nodelist, consulting the second, B, argument to compare nodes. Before the B is evaluated, the values obtained from the previous evaluation of the sorting criterion argument on the two nodes being compared are stored into B<$a> and B<$b> variables in the respective order. The B being consulted is supposed to return either -1 (the first node should come first), 0 (no order precedence), or 1 (the second node should come first). Note that Perl provides very convenient operators B<cmp> and B<E<lt>=E<gt>> for string and numeric comparison of this kind as shown in the examples below. Remember that B (unlike B, B, or B) evaluates the first B argument (the sorting criterion) in a way to obtain a string. Thus you need not bother with wrapping node-queries with a B function but you must remember to explicitly wrap the expression in B if the number of the nodes is to be the sorting criterion. Example: Sort creatures by name (XPath-based sort) in ascending order using current locale settings xsh> local %c=/middle-earth[1]/creatures xsh> sort @name { use locale; lc($a) cmp lc($b) } %c xsh> xmove %c into /middle-earth[1] # replaces the creatures Example: Sort (descending order) a node-list by score (Perl-based sort) xsh> sort { $scores{ literal('@name') } } { $b <=> $a } %players =back =head2 stream =over 4 =item Usage: stream input [FILE|PIPE|STRING] B output [FILE|PIPE|STRING] B select B B [ select B B ... ] =item Description: EXPERIMENTAL! This command provides a memory efficient (though slower) way to process selected parts of an XML document with XSH. A streaming XML parser (SAX parser) is used to parse the input. The parser has two states which will be referred to as A and B below. The initial state of the parser is A. In the state A, only a limited vertical portion of the DOM tree is built.
All XML data coming from the input stream other than start-tags are immediately copied to the output stream. If a new start-tag of an element arrives, a new node is created in the tree. All siblings of the newly created node are removed. Thus, in the state A, there is exactly one node on every level of the tree. After a node is added to the tree, all the B expressions following the B