=pod =encoding utf8 =head1 NAME Muldis::D::Dialect::PTMD_STD - How to format Plain Text Muldis D =head1 VERSION This document is Muldis::D::Dialect::PTMD_STD version 0.79.1. =head1 PREFACE This document is part of the Muldis D language specification, whose root document is L; you should read that root document before you read this one, which provides subservient details. =head1 DESCRIPTION This document outlines the grammar of the I dialect named C. The fully-qualified name of this Muldis D dialect, in combination with the base language spec it is bundled with, is C (when the bundled base language version is substituted for the C). This dialect is designed to exactly match the Muldis D system catalog (the possible representation of Muldis D code that is visible to or updateable by Muldis D programs at runtime) as to what non-critical meta-data it explicitly stores; so code in the C dialect should be round-trippable with the system catalog with the result maintaining all the details that were started with. Since it matches the system catalog, this dialect should be able to exactly represent all possible Muldis D base language code (and probably all extensions too), rather than a subset of it. That said, the C dialect does provide a choice of multiple syntax options for writing Muldis D value literals and DBMS entity (eg type and routine) declarations, so several very distinct C code artifacts may parse into the same system catalog entries. There is even a considerable level of abstraction in some cases, so that it is easier for programmers to write and understand typical C code, and so that this code isn't absurdly verbose. This dialect is designed to be as small as possible while meeting the above criteria, and is designed such that a parser that handles all of this dialect can be fairly small and simple. Likewise, a code generator for this dialect from the system catalog can be fairly small and simple. 
A significant quality of the C dialect is that it is designed to work easily for a single-pass parser, or at least a single-pass lexer; all the context that one needs to know for how to parse or lex any arbitrary substring of code is provided by prior code, or any required lookahead is just by a few characters in general. Therefore, a C parser can easily work on a streaming input like a file-handle where you can't go back earlier in the stream. Often this means a parser can work with little RAM. Also the dialect is designed that any amount of whitespace can be added or omitted next to most non-alphanumeric characters (which happen to be next to alphanumeric tokens) without that affecting the meaning of the code at all, except obviously for within character string literals. And long binary or character or numeric or identifier strings can be split into arbitrary-size substrings, without affecting the meaning. And many elements are identified by name rather than ordinal position, so to some degree the order they appear has no effect on the meaning. So programmers can easily format (separate, indent, linewrap, order) code how they like, and making an automated code reformatter shouldn't be difficult. Often, named elements can also be omitted entirely for brevity, in which case the parser would use context to supply default values for those elements. Given that plain text is (more or less) universally unambiguously portable between all general purpose languages that could be used to implement a DBMS, it is expected that every single Muldis D implementation will natively accept input in the C dialect, which isn't dependent on any specific host language and should be easy enough to process, so it should be considered the safest official Muldis D dialect to write in by default, when you don't have a specific reason to use some other dialect. 
See also the dialects L and L, which are derived directly from C, and represent possible Perl 6 and 5 concrete syntax trees for it; in fact, most of the details in common with those other dialects are described just in the current file, for all 3 dialects. =head1 GENERAL STRUCTURE A C Muldis D code file consists just of a full or partial Muldis D C routine definition, which begins with a language name declaration, and otherwise is simply an ordered sequence of imperative routine calls, where earlier routine calls are to system-defined data-definition routines (their arguments are values to put in the system catalog), and later ones are then to user-defined routines that the earlier statements either loaded or defined. This is conceptually what a C file is, and it can even be that literally, but C provides a canonical further abstraction which should be used when doing data-definition. And so you typically use syntax resembling routine and type declarations in a general purpose programming language, where simply declaring such an entity will cause it to be written into the system catalog for subsequent use. The grammar in this file is formatted as a hybrid between various EBNF flavors and Perl 6 rules (see L for details on the latter) with further changes. It is only meant to be illustrative and human readable, and would need significant changes to actually be a functional parser, which are different for each parser toolkit. The grammar consists mainly of named I which define matching rules. Loosely speaking, each parser match of a token corresponds to a capture I or node element in the concrete syntax tree resulting from the parse; in practice, the parser may make various alterations to the match when generating a node, such as adding guide keywords corresponding to the token name, or by merging series of trivial tokens or doing escaped character substitutions. No explicit capture syntax such as parenthesis is used in the grammar. 
To help understand the grammar in this file, here are a few guidelines: 1. The grammar is exactly the same as that of a Perl 6 rule except where these guidelines state otherwise; this includes that square brackets mean grouping not optionality, and that when multiple sub-pattern alternatives match, the one that is the longest wins. 2. The grammar portion that actually declares a token, that is what associates a token name with its definition body, is formatted like EBNF, as C<< ::= ... >> rather than the Perl 6 way like C or C. 3. All non-quoted whitespace is not significant and just is formatting the grammar itself; rather, whitespace rules in the grammar are spelled out explicitly such as with C<\s*> (optional whitespace) and C<\s+> (mandatory whitespace). The root grammar token for the entire dialect is C. =head1 BOOTLOADER Grammar: ::= [\s+ [ | | ** \s+ ]]? A C node has 1..N ordered elements where the first element is a C node and then either: 1. there is exactly one (second) element that is a C node or a C node; 2. there are 1..N ordered elements where each is a C node; 3. there are no other elements, making the bootloader a no-op. See the pod sections in this file named L, L, L, and L for more details about the aforementioned tokens/nodes. When Muldis D is being compiled and invoked piecemeal, such as because the Muldis D implementing virtual machine (VM) is attached to an interactive user terminal, or the VM is embedded in a host language where code in the host language invokes Muldis D code at various times, the conceptual C is usually split up, and so not every Muldis D code fragment would then have its own C. Usually a C would be supplied to the Muldis D VM just once as a VM configuration step, which provides a context for further interaction with the VM that just involves Muldis D code that isn't itself qualified with a C. =head1 LANGUAGE NAME Grammar: ::= \s* ':' \s* \s* ':' \s* \s* ':' \s* [\s* ':' \s* ]? 
::= Muldis_D ::= ::= ::= PTMD_STD ::= As per the VERSIONING pod section of L, code written in Muldis D must start by declaring the fully-qualified Muldis D language name it is written in. The C dialect formats this name as a C node having 4-5 ordered elements: =over =item C This is the Muldis D language base name; it is simply the bareword character string C. =item C This is the base authority; it is a character string formatted as per a specific-context C value literal; it is typically the delimited character string C. =item C This is the base version number; it is a character string formatted as per C; it is typically a character string like C<1.2.3>. =item C This is the dialect name; it is simply the bareword character string C. =item C Optional; this is a set of chosen pragma/parser-config options as per a C SCVL; see the L pod section for more details. =back Examples: Muldis_D:"http://muldis.com":"1.2.3":PTMD_STD Muldis_D:"http://muldis.com":"1.2.3":PTMD_STD:{ ... } =head1 VALUE LITERALS AND SELECTORS Grammar: ::= | ::= | | | | | | | | | | | | | | | | | ::= | | | | | | | A C node is a Muldis D value literal, which is a common special case of a Muldis D value selector. Unlike value selectors in general, which must be composed beneath a C or C because they actually represent a Muldis D value expression tree of a function or updater or type definition, a C node does I represent an expression tree, but rather a value constant; by definition, a C can be completely evaluated at compile time. A C consisting directly of a C is hence just a serialized Muldis D value. The PTMD_STD grammar subsection for value literals (having the root grammar token C) is completely self-defined and can be used in isolation from the wider grammar as a Muldis D sub-language; for example, a hosted-data Muldis D implementation may have an object representing a Muldis D value, which is initialized using code written in that sub-language. 
Every grammar token, and corresponding capture node, representing a Muldis D value literal is similarly formatted and has 1-3 elements; the following pod section L describes the similarities once for all of them, in terms of an alternate C token definition which is called C. And then the other pod sections specific to each kind of value literal then just focus on describing their unique aspects, namely their I. An C node represents a conceptually opaque Muldis D value, such that every one of these values is defined with its own literal syntax that is compact and doesn't look like a collection of other nodes; this includes the basic numeric and string literals. A C node represents a conceptually transparent Muldis D value, such that every one of these values is defined visibly in terms of a collection of other nodes; this includes the basic tuple and relation selectors. =head2 Value Literal Common Elements A I (or I) is a value literal that can be properly interpreted in a context that is expecting I value but has no expectation that said value belongs to a specific data type; in the general case, a GCVL includes explicit I meta-data (such as, "this is an C" or "this is a C"); but with a few specific data types (see the C node description for details) that meta-data may be omitted for brevity because the main literal has mutually uniquely identifying characteristics. For example, each element of a generic Muldis D collection value, such as a member of an array or tuple, could potentially have any type at all. In contrast, a I (or I) is a value literal that does not include explicit value kind meta-data, even when the main literal doesn't have uniquely identifying characteristics, because the context of its use supplies said meta-data. For example, in a tuple value literal it is assumed that a value literal in an attribute name position must denote a C. 
The grammar token C|C denotes a GCVL, as do most short-named grammar tokens, like C or C; in contrast, a grammar token containing C denotes a SCVL, like C or C. Every GCVL has 1-3 elements, illustrated by this grammar: ::= [ \s* ':' \s* [ \s* ':' \s*]? ]? ::= Bool | Order | RatRoundMeth | Int | NNInt | PInt | Rat | NNRat | PRat | Blob | OctetBlob | Text | Name | NameChain | DeclNameChain | Comment | Instant | Duration | UTC [Instant | DateTime | Date | Time] | Float [Instant | DateTime | Date | Time] | UTCDuration | RatRoundRule | String | BString | OString | UCPString | DH? Scalar | DH? Tuple | Database | DH? Relation | DH? Set | DH? [Maybe | Single] | DH? Array | DH? Bag ::= ::= | | | | | | | | | | | | | | | | | | | | | | | | | So a C|C node has 1-3 elements in general: =over =item C This is a character string of the format C<< <[A..Z]> <[ a..z A..Z ]>+ >>; it identifies the data type of the value literal in broad terms and is the only external meta-data of C generally necessary to interpret the latter; what grammars are valid for C depend just on C. For all values of just the 7 data types [C, C, C, C, C, C, C], the C portion of a GCVL may be omitted for brevity, but the code parser should still be able to infer it easily by examining the first few characters of the C, which for each of said 7 data types has a mutually uniquely identifying format, which is also distinct from all possible C. Note that omission of C is only allowed when the GCVL doesn't include a C element. Note that, by the above criteria, the C type could easily have been an 8th member of the list; however, it is excluded on purpose because in practice Muldis D code can contain SCVL C literals almost anywhere as meta-data for that code, and so having an explicit C of C means that that particular C literal is to be treated as ordinary data like with any GCVL. For just these certain special values of other data types, the same option of omitting the C (and C) applies: C, C, C, C. 
=item C This is a Muldis D data type name, for example C; it identifies a specific subtype of the generic type denoted by C, and serves as an assertion that the Muldis D value denoted by C is a member of the named subtype. Iff C is C<[|DH]Scalar> then C is mandatory; otherwise, C is optional for all C, except that C must be omitted when C is one of the 2 [C, C]; this isn't because those 2 types can't be subtyped, but because in practice doing so isn't useful. How a Muldis D parser treats a C node with a C element depends on the wider context. In the general case where the C is an C beneath the context of a C or C node, the C is treated as if it had an extra parent C node that invokes the C function and whose 2 argument nodes are as follows: C gets the C without the C element, and C gets the C element. This means that in general the C assertion is done at runtime. In the common special case where both C is an C and C refers to a system-defined type, then the C assertion is done at compile time, and then the C element is simply eliminated, so the C ends up simply as itself with no new C parent. =item C This is mandatory for all C. =back For GCVL and SCVL examples, see the subsequent documentation sections. =head1 OPAQUE VALUE LITERALS =head2 Boolean Literals Grammar: ::= [Bool \s* ':' \s*]? ::= false | ⊥ | true | ⊤ A C node represents a logical boolean value. It is interpreted as a Muldis D C value as follows: The C is a bareword character string formatted as per a C SCVL, and it maps directly to the C possrep of the C type. Examples: Bool:true false ⊤ ⊥ =head2 Order-Determination Literals Grammar: ::= [Order \s* ':' \s*]? ::= increase | same | decrease An C node represents an order-determination. It is interpreted as a Muldis D C value as follows: The C is a bareword character string formatted as per a C SCVL, and it maps directly to the C possrep of the C type. 
Examples: Order:same decrease =head2 Rational Rounding Method Literals Grammar: ::= [ RatRoundMeth \s* ':' \s* [ \s* ':' \s*]? ]? ::= half_down | half_up | half_even | to_floor | to_ceiling | to_zero | to_inf A C node represents a rational rounding method. It is interpreted as a Muldis D C value as follows: The C is a bareword character string formatted as per a C SCVL, and it maps directly to the only possrep of the C type. Examples: RatRoundMeth:half_up to_zero =head2 General Purpose Integer Numeric Literals Grammar: ::= [ [Int | NNInt | PInt] \s* ':' \s* [ \s* ':' \s*]? ]? ::= \s* ';' \s* | ::= ::= [0 | \-?] ::= ? ::= <[ 1..9 A..Z ]> ::= [[_?<[ 0..9 A..Z ]>+]+] ** [\s* '~' \s*] ::= [0 | \-?] ::= ? ::= <[ 1..9 ]> ::= [[_?<[ 0..9 ]>+]+] ** [\s* '~' \s*] An C node represents an integer numeric value. It is interpreted as a Muldis D C value as follows: If the C is composed of a C plus C, then the C is interpreted as a base-I integer where I might be between 2 and 36, and the C says which possible value of I to use. Assuming all C column values are between zero and I-minus-one, the C contains that I-minus-one. So to specify, eg, bases [2,8,10,16], use C of [1,7,9,F]. If the C is a C, then it is interpreted as a base 10 integer. Fundamentally the I part of an C node consists of a string of digits and uppercased (but not lowercased) letters, where each digit (C<0..9>) represents its own number and each letter (C) represents a number in [10..35]. A I may optionally contain underscore characters (C<_>), which exist just to help with visual formatting, such as for C<10_000_000>, and these are ignored/stripped by the parser. 
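The base and formatting rules above can be sketched in a few lines of Python. This is only an illustration, not part of the specification: the names C<parse_int_payload> and C<DIGITS> are hypothetical, the leading-zero rule is an assumption read from the grammar above, and the sketch is more lenient than the grammar about where underscores may appear.

```python
import re

# Column symbols in ascending value order: 0..9 then A..Z (uppercase only).
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def parse_int_payload(payload: str) -> int:
    """Interpret an integer payload such as 'F;DEAD_BEEF', '7;644' or '-34'."""
    if ";" in payload:
        max_col, body = (part.strip() for part in payload.split(";", 1))
        if len(max_col) != 1 or max_col not in DIGITS[1:]:
            raise ValueError(f"bad max-col-val: {max_col!r}")
    else:
        # No explicit max-col-val: the body is read as base 10.
        max_col, body = "9", payload.strip()
    # The max-col-val holds base-minus-one, so F means base 16.
    base = DIGITS.index(max_col) + 1

    # Tilde tokens (with any surrounding whitespace) and underscores are
    # formatting only, so they are stripped before interpretation.
    body = re.sub(r"\s*~\s*", "", body).replace("_", "")

    negative = body.startswith("-")
    digits = body[1:] if negative else body
    if not digits or any(ch not in DIGITS[:base] for ch in digits):
        raise ValueError(f"invalid digits for base {base}: {body!r}")
    # Assumption from the grammar: zero is a lone '0', with no sign and
    # no leading zeros on other numbers.
    if digits[0] == "0" and (negative or len(digits) > 1):
        raise ValueError(f"unexpected leading zero or signed zero: {body!r}")
    value = int(digits, base)
    return -value if negative else value
```

For instance, C<parse_int_payload("7;644")> reads C<644> as octal, while C<parse_int_payload("10_000_000")> ignores the underscores and reads the body as decimal.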
A I may optionally be split into 1..N segments where each segment is separated by a tilde token (C<~>); this segmenting ability is provided to support code that contains very long numeric literals while still being well formatted (no extra long lines); the tilde tokens are also ignored/stripped by the parser, and the I is interpreted as if all its alphanumeric characters were contiguous. If the C of a C node is C or C rather than C, then the C node is interpreted simply as an C node whose C is C or C, and the allowed I is appropriately further restricted. Examples: Int:1;11001001 # binary # 7;0 # octal # 7;644 # octal # -34 # decimal # 42 # decimal # F;DEADBEEF # hexadecimal # Z;-HELLOWORLD # base-36 # 3;301 # base-4 # B;A09B # base-12 # =head2 General Purpose Rational Numeric Literals Grammar: ::= [ [Rat | NNRat | PRat] \s* ':' \s* [ \s* ':' \s*]? ]? ::= \s* ';' \s* | ::= \. | \s* \/ \s* | \s* \* \s* \s* \^ \s* ::= \. | \s* \/ \s* | \s* \* \s* \s* \^ \s* A C node represents a rational numeric value. It is interpreted as a Muldis D C value as follows: Fundamentally a C node is formatted and interpreted like an C node, and any similarities won't be repeated here. The differences of interpreting a C being composed of a C plus C versus the C being a C are as per the corresponding differences of interpreting an C. Also interpreting a C or C is as per a C or C. If the I part of a C node contains a radix point (C<.>), then it is interpreted as is usual for a programming language with such a literal. If the I part of a C node contains a solidus (C), then the rational's value is interpreted as the leading integer (a numerator) divided by the trailing positive integer (a denominator); that is, the two integers collectively map to the C possrep of the C type. 
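The radix-point and numerator/denominator forms described so far can be sketched in Python with exact fractions. This is a hedged illustration rather than a reference implementation: C<parse_rat_payload> and its helper are hypothetical names, the sketch does not enforce the grammar's leading-zero rules, and it deliberately rejects any other payload form.

```python
import re
from fractions import Fraction

# Column symbols in ascending value order: 0..9 then A..Z (uppercase only).
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def _digits_to_int(text: str, base: int) -> int:
    """Convert an unsigned digit string in the given base, validating each column."""
    if not text or any(ch not in DIGITS[:base] for ch in text):
        raise ValueError(f"invalid digits for base {base}: {text!r}")
    return int(text, base)


def parse_rat_payload(payload: str) -> Fraction:
    """Interpret payloads like '1;-1.1', '3.14159' or 'B;A09B/A' exactly."""
    if ";" in payload:
        max_col, body = (part.strip() for part in payload.split(";", 1))
        if len(max_col) != 1 or max_col not in DIGITS[1:]:
            raise ValueError(f"bad max-col-val: {max_col!r}")
    else:
        max_col, body = "9", payload.strip()
    base = DIGITS.index(max_col) + 1

    # Tildes and underscores are formatting only, as with integers.
    body = re.sub(r"\s*~\s*", "", body).replace("_", "")
    negative = body.startswith("-")
    if negative:
        body = body[1:]

    if "/" in body:
        # Numerator divided by a positive denominator.
        num, den = (part.strip() for part in body.split("/", 1))
        denominator = _digits_to_int(den, base)
        if denominator == 0:
            raise ValueError("denominator must be positive")
        value = Fraction(_digits_to_int(num, base), denominator)
    elif "." in body:
        # Radix point: each fractional column is worth a further power of the base.
        whole, frac = body.split(".", 1)
        scale = base ** len(frac)
        value = Fraction(
            _digits_to_int(whole, base) * scale + _digits_to_int(frac, base), scale
        )
    else:
        raise ValueError(f"unsupported payload form in this sketch: {payload!r}")
    return -value if negative else value
```

Under these assumptions C<1;-1.1> and C<-1.5> both yield the same value, negative three halves, which matches the "same val as prev" remark in the examples of this section.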
If the I part of a C node contains an asterisk (C<*>) plus a circumflex accent (C<^>), then the rational's value is interpreted as the leading integer (a mantissa) multiplied by the result of the middle positive integer (a radix) taken to the power of the trailing integer (an exponent); that is, the three integers collectively map to the C possrep of the C type. Examples: Rat:1;-1.1 -1.5 # same val as prev # 3.14159 A;0.0 F;DEADBEEF.FACE Z;0.000AZE Rat:6;500001/1000 B;A09B/A Rat:1;1011101101*10^-11011 45207196*10^37 1/43 314159*10^-5 =head2 General Purpose Binary String Literals Grammar: ::= [ [Blob | OctetBlob] \s* ':' \s* [ \s* ':' \s*]? ]? ::= \s* ';' \s* ::= <[137F]> ::= [ <[']> <[ 0..9 A..F ]>* <[']> ] ** [\s* '~' \s*] A C node represents a general purpose bit string. It is interpreted as a Muldis D C value as follows: Fundamentally the I part of a C node consists of a delimited string of digits and uppercased (but not lowercased) letters, where each digit (C<0..9>) represents its own number and each letter (C<A..F>) represents a number in [10..15]; this string is qualified with a C character (C<[137F]>), similarly to how an C is qualified by a C. Each character of the delimited string specifies a sequence of one of [1,2,3,4] bits, depending on whether C is [1,3,7,F]. If the C of a C node is C rather than C, then the C node is interpreted simply as an C node whose C is C, and the delimited string is appropriately further restricted. Examples: Blob:1;'00101110100010' # binary # 3;'' F;'A705E' # hexadecimal # 7;'523504376' =head2 General Purpose Character String Literals Grammar: ::= [ Text \s* ':' \s* [ \s* ':' \s*]? ]? ::= [ <[']> [<-[\\\'\t\n\f\r]> | ]* <[']> ] ** [\s* '~' \s*] ::= '\b' | '\a' | '\q' | '\h' | '\s' | '\t' | '\n' | '\f' | '\r' | '\c<' [ [<[ A..Z ]>+] ** ' ' | [0 | <[ 1..9 ]> <[ 0..9 ]>*] | <[ 1..9 A..Z ]> ';' [0 | <[ 1..9 A..Z ]> <[ 0..9 A..Z ]>*] ] '>' A C node represents a general purpose character string.
It is interpreted as a Muldis D C value as follows: The C is interpreted generally as is usual for a programming language with such a delimited character string literal. A C may optionally be split into 1..N segments where each segment is delimited by apostrophes/single-quotes (C<'>) and separated by a tilde token (C<~>); this segmenting ability is provided to support code that contains long string literals while still being well formatted (no extra long lines); the tilde tokens and adjoining string delimiters are ignored/stripped by the parser, and the C is interpreted as if it just consisted of a single delimited string. All Muldis D delimited character string literals (generally the 3 C<Text>, C<Name>, C<Comment>) may contain some characters denoted with escape sequences rather than literally. The Muldis D parser substitutes the escape sequences with the characters they represent, so the resulting character string values don't contain those escape sequences. Currently there are 2 classes of escape sequences, called I<simple> and I<complex>. The meanings of the simple escape sequences are:

    Esc | Unicode   | Unicode         | Chr | Literal character used
    Seq | Codepoint | Character Name  | Lit | for when not escaped
    ----+-----------+-----------------+-----+------------------------------
    \b  | F;5C      | REVERSE SOLIDUS | \   | esc seq lead (aka backslash)
    \a  | F;27      | APOSTROPHE      | '   | delim Text literals
    \q  | F;22      | QUOTATION MARK  | "   | delim quoted Name literals
    \h  | F;23      | NUMBER SIGN     | #   | delim Comment lit (aka hash)
    \s  | F;20      | SPACE           |     | space char
    \t  | F;9       | CHAR... TAB...  |     | control char horizontal tab
    \n  | F;A       | LINE FEED (LF)  |     | ctrl char line feed / newline
    \f  | F;C       | FORM FEED (FF)  |     | control char form feed
    \r  | F;D       | CARR. RET. (CR) |     | control char carriage return

One design decision of PTMD_STD that is distinct from typical other languages is that an escape sequence for any character used as a delimiter I<never> contains that literal character.
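The segment and simple-escape rules above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: C<decode_text_literal> and C<SIMPLE_ESCAPES> are hypothetical names, only the nine simple escape sequences from the table are handled, and any other backslash sequence (including the complex C<\c> form) is rejected rather than decoded.

```python
import re

# Simple escape sequences from the table: letter after the backslash -> character.
SIMPLE_ESCAPES = {
    "b": "\\", "a": "'", "q": '"', "h": "#", "s": " ",
    "t": "\t", "n": "\n", "f": "\f", "r": "\r",
}

# A segment is everything between two apostrophes. Because an apostrophe is
# always written as \a inside a literal, the next apostrophe ends the segment.
SEGMENT = re.compile(r"'([^'\t\n\f\r]*)'")
TILDE = re.compile(r"\s*~\s*")


def _unescape(segment: str) -> str:
    """Replace each simple escape sequence with the character it denotes."""
    out = []
    i = 0
    while i < len(segment):
        ch = segment[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        code = segment[i + 1:i + 2]
        if code not in SIMPLE_ESCAPES:
            raise ValueError(f"unsupported escape sequence: \\{code}")
        out.append(SIMPLE_ESCAPES[code])
        i += 2
    return "".join(out)


def decode_text_literal(source: str) -> str:
    """Decode a payload like 'abc' ~ 'def' into the string it selects."""
    parts = []
    pos = 0
    while True:
        seg = SEGMENT.match(source, pos)
        if seg is None:
            raise ValueError(f"expected a quoted segment at offset {pos}")
        parts.append(_unescape(seg.group(1)))
        pos = seg.end()
        sep = TILDE.match(source, pos)
        if sep is None:
            break
        pos = sep.end()
    if pos != len(source):
        raise ValueError(f"unexpected text at offset {pos}")
    return "".join(parts)
```

Note that the segment pattern never has to look inside the escape sequences to find where a segment ends; it only scans for the next apostrophe.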
In SQL or Perl, for example, a literal apostrophe/single-quote inside a character string delimited by C<'> is typically escaped as C<''> or C<\'>; while this is unambiguous, the task of parsing such code is considerably more difficult than it could be. In contrast, in PTMD_STD character strings delimited by C<'>, a literal of the same is escaped with C<\a>; so parsing such code is an order of magnitude easier because the parser doesn't have to understand the internals of any character string literal in order to separate out the character string from its surrounding code. Another design decision of PTMD_STD that is distinct at least from Perl is that non-"space" whitespace characters in character string literals must never appear literally, but must instead be denoted with escape sequences. The main reason for this is to ensure that the actual values being selected by the string literals do not vary with the kind of linebreaks used to format the Muldis D source code itself. There is currently just one complex escape sequence, of the format C<< \c<...> >>, that supports specifying characters in terms of their Unicode abstract codepoint name or number. If the C<...> consists of just uppercased (not lowercased) letters and the space character, then the C<...> is interpreted as a Unicode character name. If the C<...> looks like an C, except that underscores and tilde segmentation aren't allowed here, then the C<...> is interpreted as a Unicode abstract codepoint number. One reason for this feature is to empower more elegant passing of Unicode-savvy PTMD_STD source code through a communications channel that is more limited, such as to 7-bit ASCII. Examples: Text:'Ceres' 'サンプル' '' 'Perl' '\c\c\c<65>' =head2 DBMS Entity Name Literals Grammar: ::= Name \s* ':' \s* [ \s* ':' \s*]? ::= | ::= <[ a..z A..Z _ ]><[ a..z A..Z 0..9 _ - ]>* ::= [ <["]> [<-[\\\"\t\n\f\r]> | ]* <["]> ] ** [\s* '~' \s*] ::= NameChain \s* ':' \s* [ \s* ':' \s*]? ::= ['.' \s*]? ** [\s* '.'
\s*] ::= DeclNameChain \s* ':' \s* [ \s* ':' \s*]? ::= | '[]' A C node represents a canonical short name for any kind of DBMS entity when declaring it; it is a character string type, that is disjoint from C. It is interpreted as a Muldis D C value as follows: Fundamentally a C node is formatted and interpreted like a C node, and any similarities won't be repeated here. Unlike a C literal which must always be delimited, a C has 2 variants, one delimited (C) and one not (C). The delimited C form differs from C only in that each string segment is delimited by double-quotes rather than apostrophes/single-quotes. A C is composed of a single alphabetic or underscore character followed by zero or more characters that are each alphanumeric or underscore or hyphen. It can not be segmented, so you will have to use the C equivalent if you want a segmented string. I A C node represents a canonical long name for invoking a DBMS entity in some contexts; it is conceptually a sequence of entity short names. This node is interpreted as a Muldis D C value as follows: A C consists of a sequence of 1 or more C where the elements of the sequence are separated by period (C<.>) tokens; each element of the sequence, in order, defines an element of the C possrep's attribute of the result C value. Now, strictly speaking, a Muldis D C value is supposed to have at least 2 elements in its sequence, and the first element of any sequence must be one of these 9 C values, which is a top-level namespace: C, C, C, C, C, C, C, C, C. (Actually, C is a 10th option, but that will be treated separately in this discussion.) In the general case, a C must be written out in full, so it is completely unambiguous (and is clearly self-documenting), and it is always the case that a C value in the system catalog is written out in full. 
But the PTMD_STD grammar also has a few commonly used special cases where a C may be a much shorter substring of its complete version, such that a simple parser, with no knowledge of any user-defined entities besides said shorter C in isolation, can still unambiguously resolve it to its complete version; exploiting these typically makes for code that is a lot less verbose, and much easier to write or read. The first special case involves any context where a type or routine is being referenced by name. In such a context, when the referenced entity is a standard system-defined type or routine, programmers may omit any number of consecutive leading chain elements from such a C, so long as the remaining unqualified chain is distinct among all standard system-defined (C-prefix) DBMS entities (but that as an exception, a non-distinct abbreviation is allowed iff exactly 1 of the candidate entities is in the language core, C-prefix, in which case that 1 is unambiguously the entity that is resolved to). This feature has no effect on the namespace prefixes like C or C or C; one still writes those as normal prepended to the otherwise shortened chains. When a C, whose context indicates it is a type or routine invocation, is encountered by the parser, and its existing first chain element isn't one of the other 8 top-level namespaces, then the parser will assume it is an unqualified chain in the C namespace and lookup the best / only match from the known C DBMS entities, to resolve to. So for example, one can just write C rather than C, C rather than C, C rather than C, C rather than C, C rather than C, and so on. In fact, the Muldis D spec itself uses such abbreviations frequently. The second special case involves any context where a value expression (including a parameter) or a variable is being referenced by name, such as with an C node. 
In such a context, any leading C element may be omitted; when a C, whose context indicates it is a value expression / etc reference, is encountered by the parser, and its existing first chain element isn't one of the other 8 top-level namespaces, then the parser will assume it is an unqualified chain in the C namespace and will prepend a C element to it. So for example a C<$foo> is treated as being C<$lex.foo> while a C<$dep.data.foo> is treated as itself. The third special case is an extension to the second special case that involves any context where a referenced-by-name value expression / etc has the declaration name C. In only such a context, a C may be prefixed with a chain-element-separator / period token instead of having a leading (post C omission) C element; a parser encountering a chain with a leading period will assume the chain sans that period is unqualified and will prepend both a C and C element to it. So for example a C<$.foo> is treated as being C<$lex.topic.foo>. Note that PTMD_STD doesn't confuse this with the use of an empty string chain element because those must always be delimited, so C<$"".foo> is still treated as C<$lex."".foo>. Note that the third special case may only be used to reference attributes of C<$lex.topic> (or attributes of those, etc), not C<$lex.topic> itself; you still have to use C<$topic> for that. Note that if C<$topic> is a scalar value, you still have to write the possrep name as normal, such as C<$.possrep.attr>. The fourth special case involves any context where a type is being referenced using the C namespace prefix feature described in L. In such a context, when the namespace prefix contains either of the optional chain elements C<[|dh_][tuple|relation]_from> or C<[|dh_][set|maybe|single|array|bag]_of>, programmers may omit the single prefix-leading C chain element. So for example, one can just write C rather than C, or C rather than C. 
This fourth special case is completely orthogonal to which of the 9 normal top-level namespaces is in use (implicitly or explicitly) by the chain being prefixed, and works for all 9 of them. A C node represents a canonical long name for declaring a DBMS entity in N-depth contexts; the format and interpretation of a C (but as a C value) is the same as a C but that the chain may have as few as zero parts rather than as few as 1 or 2, and a zero-element chain is represented by the special C syntax of C<[]>. Examples: Name:login_pass Name:"First Name" NameChain:fed.data.the_db.gene.sorted_person_names NameChain:fed.data.the_db.stats."samples by order" NameChain:.attr # same as NameChain:lex.topic.attr # DeclNameChain:gene.sorted_person_name DeclNameChain:stats."samples by order" DeclNameChain:[] =head2 Code Comment Literals Grammar: ::= Comment \s* ':' \s* [ \s* ':' \s*]? ::= [ '#' ** 2..* | '#' ' '* [<-[\\\#\t\n\f\r]> | ]* ' '* '#' ] ** \s+ A C node represents the text of a Muldis D code comment; it is a character string type, that is disjoint from both C and C. It is interpreted as a Muldis D C value as follows: Fundamentally a C node is formatted and interpreted like a C node, and any similarities won't be repeated here. The C differs from C only in that each string segment is delimited by number-signs/hash-marks rather than apostrophes/single-quotes, and also that: Note that any leading or trailing space (F;20) characters inside the C<#> delimiters of a C are also part of the delimiters, and are not part of the selected C value; if you want to denote a C value with leading or trailing space chars, you must write those space chars in an escaped form such as with C<\s>. Note that a run of 3+ C<#> is equivalent to exactly 2 adjacent ones, which denotes an empty comment segment. This feature exists to empower things like making visual dividing lines in the code just out of hash-marks. 
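The two notes above, on spaces inside the delimiters and on runs of hash-marks, can be sketched in Python. This is an illustration under assumptions, not a reference implementation: C<comment_text> is a hypothetical name, segments are assumed to be concatenated as for character string literals, escape sequences are left undecoded, and surrounding whitespace must be trimmed by the caller.

```python
import re

# One segment is either a run of 2+ hash-marks (an empty segment), or
# '#', optional spaces, content, optional spaces, '#'. The spaces next to
# the delimiters belong to the delimiters, not to the content.
SEGMENT = re.compile(r"#{2,}|# *((?:[^\\#\t\n\f\r]|\\[a-z])*?) *#")
GAP = re.compile(r"\s+")


def comment_text(source: str) -> str:
    """Return the concatenated segment contents, escapes left undecoded."""
    parts = []
    pos = 0
    while True:
        seg = SEGMENT.match(source, pos)
        if seg is None:
            raise ValueError(f"expected a comment segment at offset {pos}")
        # group(1) is None for a run of hash-marks, i.e. an empty segment.
        parts.append(seg.group(1) or "")
        pos = seg.end()
        gap = GAP.match(source, pos)
        if gap is None:
            break
        pos = gap.end()
    if pos != len(source):
        raise ValueError(f"unexpected text at offset {pos}")
    return "".join(parts)
```

With this sketch a dividing line made only of hash-marks yields an empty string, and a leading space that should survive has to be written in escaped form such as C<\s>.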
Note that the hash-mark does have other uses in PTMD_STD code besides delimiting comments, so, because C may conceptually be placed almost anywhere in code, the other parts of the grammar that specifically enable this need to ensure appropriate measures are taken to avoid ambiguity, for example mandating that the comments are bounded by whitespace. Examples (the first is a GCVL, the second is a SCVL): Comment:# This does something. # # So does this. # =head2 TAI Temporal Literals Grammar: ::= Instant \s* ':' \s* [ \s* ':' \s*]? ::= ::= Duration \s* ':' \s* [ \s* ':' \s*]? ::= An C node represents a single point in time which is specified in terms of atomic seconds; it is a rational numeric type, that is disjoint from both C and C. This node is interpreted as a Muldis D C value as follows: An C is formatted and interpreted in the same way as a C. A C node represents a single amount of time (the difference between two instants) which is specified in terms of atomic seconds; it is a rational numeric type, that is disjoint from both C and C. This node is interpreted as a Muldis D C value as follows: A C is formatted and interpreted in the same way as a C. Examples: Instant:1235556432.0 Instant:854309115.0 Duration:3600.0 Duration:-50.0 Duration:3.14159 Duration:1;1011101101*10^-11011 Duration:1/43 =head2 UTC and Float Temporal Literals Grammar: ::= UTC [Instant | DateTime | Date | Time] \s* ':' \s* [ \s* ':' \s*]? ::= ::= Float [Instant | DateTime | Date | Time] \s* ':' \s* [ \s* ':' \s*]? ::= ::= UTCDuration \s* ':' \s* [ \s* ':' \s*]? ::= \s* ';' \s* | ::= '[' \s* [? [\s* ',' \s*]] ** 5 \s* ',' \s* ? \s* ']' ::= '[' \s* [? [\s* ',' \s*]] ** 5 \s* ',' \s* ? \s* ']' A C node represents an "instant"/"datetime" value that is affiliated with the UTC time-zone.
This node is interpreted as a Muldis D C value whose C possrep attribute values are defined as follows: A C consists mainly of a bracket-delimited sequence of 6 comma-separated elements, where each element is either a valid numeric literal or is completely absent. The 6 elements correspond in order to the 6 attributes: C, C, C, C, C, C. For each element that is absent or defined, the corresponding attribute has the C or a C value, respectively. For each of the first 5 elements, when it is defined, it must qualify as a valid I part of an C node; for the 6th element, when it is defined, it must qualify as a valid I part of a C node. Fundamentally each C node element is formatted and interpreted like an C or C node, and any similarities won't be repeated here. A defined C may be any integer, each of [C, C] must be a positive integer, each of [C, C] must be a non-negative integer, and C must be a non-negative rational number. If all 6 attributes are defined, then the new C value is also a C; if just the first 3 or last 3 are defined, then the value is not a C but rather a C or C, respectively; if any other combination of attributes are defined, then the value is just a C and not of any of the other 3 subtypes. If the C of a C node is C or C or C rather than C, then the C node is interpreted simply as a C node whose C is C or C or C, and the allowed I is appropriately further restricted. A C node represents an "instant"/"datetime" value that is "floating" / not affiliated with any time-zone. This node is interpreted as a Muldis D C value in an identical fashion to how a C node is interpreted, whose format it completely shares. Likewise regarding C. A C node represents a duration value, an amount of time, which is not fixed to any instant in time. 
This node is interpreted as a Muldis D C value whose C possrep attribute values are defined as follows: A C consists mainly of a bracket-delimited sequence of 6 comma-separated elements, where each element is either a valid numeric literal or is completely absent. The 6 elements correspond in order to the 6 attributes: C, C, C, C, C, C. For each element that is absent or defined, the corresponding attribute has the C or a C value, respectively. For each of the first 5 elements, when it is defined, it must qualify as a valid I part of an C node; for the 6th element, when it is defined, it must qualify as a valid I part of a C node. Mostly a C is formatted and interpreted like a C node, and any similarities won't be repeated here. A defined [C, C, C, C, C] may be any integer, and C may be any rational number. I has no system-defined subtypes, but that may change later.> Examples: UTCInstant:[1964,10,16,16,12,47.5] # a UTCDateTime # UTCInstant:[2002,12,16,,,] # a UTCDate # UTCInstant:[,,,14,2,29.0] # a UTCTime # FloatInstant:[2003,4,5,2,,] # min,sec unknown or N/A # FloatInstant:[1407,,,,,] # just know its sometime in 1407 # UTCDuration:[3,5,1,6,15,45.000012] =head2 Rational Rounding Rule Literals Grammar: ::= RatRoundRule \s* ':' \s* [ \s* ':' \s*]? ::= '[' \s* \s* ',' \s* \s* ',' \s* \s* ']' ::= ::= ::= A C node represents a rational rounding rule. It is interpreted as a Muldis D C value whose attributes are defined by the C. A C consists mainly of a bracket-delimited sequence of 3 comma-separated elements, which correspond in order to the 3 attributes: C (a C), C (an C), and C (a C). Each of C and C must qualify as a valid C, and C must qualify as a valid C. Examples: RatRoundRule:[10,-2,half_even] RatRoundRule:[2,-7,to_zero] =head2 Low Level Integer String Literals Grammar: ::= [String | BString | OString | UCPString] \s* ':' \s* [ \s* ':' \s*]? ::= \s* ';' \s* | ::= '[' \s* [ ** [\s* ',' \s*]]? \s* ']' ::= '[' \s* [ ** [\s* ',' \s*]]? 
\s* ']' A C node represents an integer string value. It is interpreted as a Muldis D C value as follows: A C consists mainly of a bracket-delimited sequence of 0..N elements, where each element must qualify as a valid I part of a C node, and the new C is conceptually that sequence of integers. Fundamentally each C node element is formatted and interpreted like an C node, and any similarities won't be repeated here. Examples: String:[80,101,114,109] # Unicode abstract codepoints = 'Perl' # String:F;[50,65,72,6C] # same thing # =head1 COLLECTION VALUE SELECTORS Note that, with each of the main value selector nodes documented in this main POD section (members of C etc), any occurrences of child C nodes should be read as being C nodes instead in contexts where instances of the main nodes are being composed beneath C nodes. That is, any C node options beyond what C options exist are only valid within a C node or C node. =head2 Scalar Selectors Grammar: ::= DH? Scalar \s* ':' \s* \s* ':' \s* ::= \s* ';' \s* ::= ::= A C node represents a literal or selector invocation for a scalar subtype value. It is interpreted as a Muldis D C subtype value whose declared type is specified by the node's (mandatory for C) C and whose attributes are defined by the C. The C is interpreted specifically as attributes of the declared type's possrep which is specified by the C. Each name+expr pair of the C defines a named possrep attribute of the new scalar; the pair's name and expr specify, respectively, the possrep attribute name, and the possrep attribute value. If the C of a C node is C rather than C, then the C node is interpreted simply as an C node that is appropriately further restricted; the C must name a C subtype, and the C must specify only deeply homogeneous typed attribute values. See also the definition of the catalog data type C, a tuple of which is what in general a C node distills to when it is beneath the context of a C or C node, as it describes some semantics. 
Examples: Scalar:sys.std.Core.Type.Rat:float;{ mantissa => 45207196, radix => 10, exponent => 37 } Scalar:sys.std.Temporal.Type.UTCDateTime:datetime;{ year => 2003, month => 10, day => 26, hour => 1, minute => 30, second => 0.0 } Scalar:fed.lib.the_db.WeekDay:name;{ "" => "monday" } Scalar:fed.lib.the_db.WeekDay:number;{ "" => 5 } =head2 Tuple Selectors Grammar: ::= DH? Tuple \s* ':' \s* [ \s* ':' \s*]? ::= | ::= '{' \s* [[ \s* '=>' \s* ] ** [\s* ',' \s*]]? \s* '}' ::= d0 A C node represents a literal or selector invocation for a tuple value. It is interpreted as a Muldis D C value whose attributes are defined by the C. Iff the C is a C then each name+expr pair of the C defines a named attribute of the new tuple; the pair's name and expr specify, respectively, the attribute name, and the attribute value. If the C of a C node is C rather than C, then the C node is interpreted simply as an C node that is appropriately further restricted; the C must specify only deeply homogeneous typed attribute values. Iff the C is a C then the C node is interpreted as the special value C aka C, which is the only C value with exactly zero attributes. Note that this is just an alternative syntax, as C can select that value too. See also the definition of the catalog data type C, a tuple of which is what in general a C node distills to when it is beneath the context of a C or C node, as it describes some semantics. Examples: Tuple:{} Tuple:d0 # same as previous # d0 # same as previous # Tuple:type.tuple_from.var.fed.data.the_db.account.users:{ login_name => 'hartmark', login_pass => 'letmein', is_special => true } Tuple:{ name => 'Michelle', age => 17 } =head2 Database Selectors Grammar: ::= Database \s* ':' \s* [ \s* ':' \s*]? ::= A C node represents a literal or selector invocation for a 'database' value. It is interpreted as a Muldis D C value whose attributes are defined by the C. 
Each name+relation pair of the C defines a named attribute of the new 'database'; the pair's name and relation specify, respectively, the attribute name, and the attribute value. While this grammar mentions that C is a C, it is in fact significantly further restricted, such that every attribute value of the C can only be a C. See also the definition of the catalog data type C, a tuple of which is what in general a C node distills to same as when C does. =head2 Relation Selectors Grammar: ::= DH? Relation \s* ':' \s* [ \s* ':' \s*]? ::= | | | ::= '{' \s* [ ** [\s* ',' \s*]]? \s* '}' ::= '{' \s* [ ** [\s* ',' \s*]]? \s* '}' ::= '[' \s* [ ** [\s* ',' \s*]]? \s* ']' \s* ';' \s* '{' \s* [ ** [\s* ',' \s*]]? \s* '}' ::= '[' \s* [ ** [\s* ',' \s*]]? \s* ']' ::= d0c0 | d0c1 A C node represents a literal or selector invocation for a relation value. It is interpreted as a Muldis D C value whose attributes and tuples are defined by the C, which is interpreted as follows: Iff the C is composed of just a C pair with zero elements between them, then it defines the only relation value having zero attributes and zero tuples. Iff the C is a C with at least one C element, then it defines the attribute names of a relation having zero tuples. Iff the C is a C with at least one element, then each element defines a tuple of the new relation; every must define a tuple of the same degree and have the same attribute names as its sibling ; these are the degree and attribute names of the relation as a whole, which is its heading for the current purposes. Iff the C is a C, then: The new relation value's attribute names are defined by the C elements, and the relation body's tuples' attribute values are defined by the C elements. This format is meant to be the most compact of the generic relation selector formats, as the attribute names only appear once for the relation rather than repeating for each tuple. 
As a trade-off, the attribute values per tuple from all of the C elements must appear in the same order as their corresponding attribute names appear in the collection of C elements, as the names and values in the relation literal are matched up by ordinal position here. Iff the C is a C then the C node is interpreted as one of the 2 special values C aka C, which are the only C values with exactly zero attributes. Note that this is just an alternative syntax, as other C formats can select those values too. If the C of a C node is C rather than C, then the C node is interpreted simply as an C node that is appropriately further restricted; the C specify only deeply homogeneous typed attribute values. See also the definition of the catalog data type C, a tuple of which is what in general a C node distills to when it is beneath the context of a C or C node, as it describes some semantics. Examples: Relation:{} # zero attrs + zero tuples # Relation:d0c0 # same as previous # Relation:{ x, y, z } # 3 attrs + zero tuples # Relation:{ {} } # zero attrs + 1 tuple # d0c1 # same as previous # Relation:{ { login_name => 'hartmark', login_pass => 'letmein', is_special => true } } # 3 attrs + 1 tuple # Relation:fed.lib.the_db.gene.Person:[ name, age ];{ [ 'Michelle', 17 ] } # 2 attrs + 1 tuple # =head2 Set Selectors Grammar: ::= DH? Set \s* ':' \s* [ \s* ':' \s*]? ::= '{' \s* [ ** [\s* ',' \s*]]? \s* '}' A C node represents a literal or selector invocation for a set value. It is interpreted as a Muldis D C value whose elements are defined by the C. Each C of the C defines a unary tuple of the new set; each C defines the C attribute of the tuple. If the C of a C node is C rather than C, then the C node is further restricted. See also the definition of the catalog data type C, a tuple of which is what in general a C node distills to when it is beneath the context of a C or C node, as it describes some semantics. 
Examples: Set:fed.lib.the_db.account.Country_Names:{ 'Canada', 'Spain', 'Jordan', 'Thailand' } Set:{ 3, 16, 85 } =head2 Maybe Selectors Grammar: ::= DH? [Maybe | Single] \s* ':' \s* [ \s* ':' \s*]? ::= | ::= '{' \s* \s* '}' ::= nothing | '∅' A C node represents a literal or selector invocation for a maybe value. It is interpreted as a Muldis D C value whose elements are defined by the C. Iff the C is a C then it defines either zero or one C; in the case of one, the C defines the unary tuple of the new maybe, which is a 'single'; the C defines the C attribute of the tuple. If the C of a C node is C or C<[|DH]Single> rather than C, then the C node is further restricted, either to having only deeply homogeneous resulting C or to having exactly one C, as appropriate. Iff the C is a C then the C node is interpreted as the special value C, aka C, aka I, aka C<∅>, which is the only C value with zero elements. Note that this is just an alternative syntax, as C can select that value too. As a further restriction, the C must be just one of C<[|DH]Maybe> when the C is a C. See also the definition of the catalog data type C, a tuple of which is what in general a C node distills to, the same as when C does. Examples: Maybe:{ 'I know this one!' } Maybe:nothing Maybe:∅ nothing ∅ =head2 Array Selectors Grammar: ::= DH? Array \s* ':' \s* [ \s* ':' \s*]? ::= '[' \s* [ ** [\s* ',' \s*]]? \s* ']' A C node represents a literal or selector invocation for an array value. It is interpreted as a Muldis D C value whose elements are defined by the C. Each C of the C defines a binary tuple of the new array; the C defines the C attribute of the tuple, and the C attribute of the tuple is generated such that the first C gets an C of zero and subsequent ones get consecutively higher integer values. If the C of a C node is C rather than C, then the C node is further restricted.
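The index generation described above for array selectors can be sketched in Python as follows. This is only an illustration: the function name `array_tuples` is hypothetical, and the attribute names `index` and `value` are assumptions, since the attribute names are not legible in this copy of the text.

```python
def array_tuples(elements):
    """Pair each payload element with a generated zero-based index.

    Returns one binary tuple (modelled as a dict) per element; the
    attribute names 'index' and 'value' are assumed for illustration.
    """
    return [{"index": i, "value": v} for i, v in enumerate(elements)]


# Models the selector  Array:[ 'Alphonse', 'Edward', 'Winry' ]
rows = array_tuples(["Alphonse", "Edward", "Winry"])
```

Because the index is derived purely from position, reordering the elements inside the brackets selects a different array value, unlike with a set or bag selector.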
See also the definition of the catalog data type C, a tuple of which is what in general a C node distills to when it is beneath the context of a C or C node, as it describes some semantics. Examples: Array:[ 'Alphonse', 'Edward', 'Winry' ] Array:fed.lib.the_db.stats.Samples_By_Order:[ 57, 45, 63, 61 ] =head2 Bag Selectors Grammar: ::= DH? Bag \s* ':' \s* [ \s* ':' \s*]? ::= | ::= '{' \s* [[ \s* '=>' \s* ] ** [\s* ',' \s*]]? \s* '}' ::= \s* ';' \s* | ::= '{' \s* [ ** [\s* ',' \s*]]? \s* '}' A C node represents a literal or selector invocation for a bag value. It is interpreted as a Muldis D C value whose elements are defined by the C, which is interpreted as follows: Iff the C is composed of just a C pair with zero elements between them, then it defines the only bag value having zero elements. Iff the C is a C with at least one C/C-pair element, then each pair defines a binary tuple of the new bag; the C defines the C attribute of the tuple, and the C defines the C attribute. Iff the C is a C with at least one element, then each C contributes to a binary tuple of the new bag; the C defines the C attribute of the tuple. The bag has 1 tuple for every distinct (after normalization or evaluation) C and C-derived value in the C, and the C attribute of that tuple says how many instances of said C there were. See also the definition of the catalog data type C, a tuple of which is what in general a C node distills to when it is beneath the context of a C or C node, as it describes some semantics. Further concerning C, because of how C is defined, a C has to be a compile time constant, since an integer is stored in the system catalog rather than the name of an expression node like with C; if you actually want the bag value being selected at runtime to have runtime-determined C values, then you must use a C node rather than a C node. 
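To make the two bag payload formats concrete, here is a small Python sketch of how each could distill to the same kind of binary tuples. The function names are hypothetical, the attribute names `value` and `count` are assumptions, and values are modelled as already-normalized Python objects.

```python
from collections import Counter


def bag_from_counted(pairs):
    """Counted-values format: each value => count pair is one tuple."""
    return [{"value": v, "count": n} for v, n in pairs.items()]


def bag_from_repeated(values):
    """Repeated-values format: one tuple per distinct value, whose
    count says how many times that value was written."""
    return [{"value": v, "count": n} for v, n in Counter(values).items()]


# Models  Bag:{ 'Foo', 'Quux', 'Foo', 'Bar', 'Baz', 'Baz' }
repeated = bag_from_repeated(["Foo", "Quux", "Foo", "Bar", "Baz", "Baz"])
# Models the same bag written with explicit counts
counted = bag_from_counted({"Foo": 2, "Quux": 1, "Bar": 1, "Baz": 2})
```

In this sketch both calls produce the same list, which mirrors the point that the two formats are alternative spellings of one bag value; only the counted format fixes the counts as literal integers in the source text.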
Examples: Bag:fed.lib.the_db.inventory.Fruit:{ 'Apple' => 500, 'Orange' => 300, 'Banana' => 400 } Bag:{ 'Foo', 'Quux', 'Foo', 'Bar', 'Baz', 'Baz' } =head1 GENERIC VALUE EXPRESSIONS Grammar: ::= | | | | | | | | ::= ::= '$' ::= \s* \s* ::= ::= '::=' An C node is the general case of a Muldis D value expression tree (which normally denotes a Muldis D value selector), which must be composed beneath a C or C, or specifically into a function or updater or type or constraint (etc) definition, because in the general case an C can I be completely evaluated at compile time. An C node is a proper superset of a C node, and any occurrences of C nodes in this document may optionally be substituted with C nodes on a per-instance basis. An C node in the PTMD_STD grammar corresponds directly to a tuple of an attribute of a value of the catalog data type C, which is how a value expression node is actually represented in Muldis D's nonsugared form, which is as a component of the system catalog. Or more specifically, an entire tree of PTMD_STD C nodes corresponds to a set of said attribute values, one attribute value per C node. In the nonsugared form, every C node has an explicitly designated name, as per a PTMD_STD C node, and all child nodes are not declared inline with their parent nodes but rather are declared in parallel with them, and the parents refer to their children by their names. A feature of the PTMD_STD grammar is that expression nodes may be declared without explicit names, such that the parser would generate names for them when deriving system catalog entries, and that is why PTMD_STD supports, and encourages the use of for code brevity/readability, the use of inline-declared expression nodes, especially so when the C in question is an C. 
Iff an C is an C, then this typically means that the parent C has at least one of its children declared with an explicit name rather than inline, same as the corresponding system catalog entry would do, and then the C is the invocation name of that child. Alternatively, the C may be the invocation name of one of the expression-containing routine's parameters, in which case the C in question represents the current argument to that parameter; this also is exactly the same as a corresponding catalog entry for using an argument. Iff an C is a C, then the C element of the C is being declared with an explicit name, and the C element of the C is that name. Note that a C node may not have a C node as its direct C child element. Examples: # an expr_name node # $foo_expr # a named_expr node # $bar_expr ::= factorial( $foo_expr ) =head2 Generic Function Invocation Expressions Grammar: ::= \s* ::= ::= '(' \s* [[ | ] ** [\s* ',' \s*]]? \s* ')' ::= \s* '=>' \s* ::= C ::= A C node represents the result of invoking a named inner/function with specific arguments. It is interpreted as a tuple of a Muldis D C value. The C element specifies the C attribute of the new C, which is the name of the function being invoked, and the C element specifies the C attribute. In the general case of a function invocation, all of the arguments are named, as per C, and formatting a C node that way is always allowed. In some (common) special cases, some (which might be all) arguments may be anonymous, as per C. With just functions in the top-level namespaces C and C, these 4 special cases apply: If a function has exactly one parameter, then it may be invoked with a single anonymous argument and the latter will bind to that parameter. Or, if a function has multiple parameters but exactly one of those is mandatory, then it may be invoked with just one anonymous argument, which is assumed to bind to the single mandatory parameter, and all optional arguments must be named.
Or, if a function has multiple mandatory parameters and one of them is named C, then it may be invoked with a single anonymous argument and the latter will bind to that parameter. Or, if a function has multiple mandatory parameters and two of them are named C and C, then it may be invoked with two anonymous arguments and the latter will bind to those parameters in sequential order, the first one to C and the second one to C. With just functions in all top-level namespaces I C and C, these 2 special cases apply (similar to the prior-mentioned second 2): If a function invocation has either 1 or 2 anonymous arguments, then they will be treated as if they were named arguments for the C and C parameters; the only or sequentially first argument will bind to C, and any sequentially second argument will bind to C. One reason for this difference between treatment of top-level namespaces is it allows the Muldis D parser to convert all the anonymous arguments to named ones (all arguments in the system catalog are named) when parsing the expression-containing outer routine/etc in isolation from any other user-defined entities. The other reason for this limitation is that it helps with self-documentation; programmers wanting to know an anonymous argument's parameter name won't have to look outside the current outer routine/etc or the language spec to find the answer. 
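The namespace-independent rule above (one or two anonymous arguments become `topic` and `other`) needs no knowledge of the invoked function's signature, which a short Python sketch can show. The function name `bind_anonymous` and the error handling are illustrative assumptions, not something the specification defines.

```python
def bind_anonymous(anon_args, named_args=None):
    """Convert 1 or 2 anonymous arguments into named ones.

    The first anonymous argument binds to 'topic' and the second, if
    present, to 'other'; explicitly named arguments pass through.
    """
    named = dict(named_args or {})
    if len(anon_args) > 2:
        raise ValueError("at most 2 anonymous arguments can be bound")
    for name, value in zip(("topic", "other"), anon_args):
        if name in named:
            raise ValueError(f"{name!r} given both anonymously and by name")
        named[name] = value
    return named


# Models  is_identical( $foo, $bar )
args = bind_anonymous(["$foo", "$bar"])
```

Since the mapping depends only on argument position, a parser following this sketch could name every argument while looking at a single routine in isolation, which is the first reason given above for treating the namespaces differently.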
I Examples: # zero params # nothing() # single mandatory param # Integer.median( Bag:{ 22, 20, 21, 20, 21, 21, 23 } ) # single mandatory param # factorial( topic => 5 ) # two mandatory params # Rational.quotient( dividend => 43.7, divisor => 16.9 ) # same as previous # Rational.quotient( divisor => 16.9, dividend => 43.7 ) # one mandatory param, two optional # inn.barfunc( $mand_arg, oa1 => $opt_arg1, oa2 => $opt_arg2 ) # same as previous # inn.barfunc( oa2 => $opt_arg2, $mand_arg, oa1 => $opt_arg1 ) # a user-defined non-inner function # dep.lib.foodb.bazfunc( a1 => 52, a2 => 'hello world' ) # two params named 'topic' and 'other' # is_identical( $foo, $bar ) =head2 Generic If-Else Expressions Grammar: ::= '(' \s* [ [if \s+ \s+ then \s+ \s+ else \s+]* | [ \s+ '??' \s+ \s+ '!!' \s+]* ] \s* ')' ::= ::= ::= An C node represents an N-way if-else control flow expression. It is interpreted as a tuple of a Muldis D C value. The whole collection of sequential 0..N C + C elements specifies the C attribute of the new C, which is a sequence of arbitrary but C-resulting I expressions, and for just the first one of those in the sequence that at runtime evaluates to C, its associated I result value is the result of the C. The C element specifies the C attribute, which determines the result value of the C at runtime if either is an empty sequence or all of its conditionals evaluate to C. Examples: (if ($foo > 5) then $bar else $baz) (if ($ary is empty) then $empty_result else ($ary[0])) (if (($x = ∅) or ($y = ∅)) then ∅ else (s ((v $x) I+ ((v $y) I^ 3))) (if ($val isa T->Int) then ($val I^ 3) else if ($val isa T->Text) then ($val Tx 5) else true) ('My answer is: ' T~ ($maybe ?? 'yes' !! 
'no')) =head2 Generic Given-When-Default Expressions Grammar: ::= '(' \s* given \s+ \s+ [when \s+ \s+ then \s+ \s+]* default \s+ \s* ')' ::= ::= ::= ::= A C node represents an N-way given-when-default switch control flow expression that dispatches based on matching a single value with several options. It is interpreted as a tuple of a Muldis D C value. The C element specifies the C attribute of the new C, which is the control value for the expression. The whole collection of nonordered 0..N C + C elements specifies the C attribute, which is a set of I comparands; if any of these I values matches the value of C, its associated I result value is the result of the C. The C element specifies the C attribute, which determines the result value of the C at runtime if either is an empty set or none of its comparands match C. Examples: (given $digit when 'T' then 10 when 'E' then 11 default $digit ) =head2 Library Entity Reference Selector Grammar: ::= | | | ::= 'F->' ::= 'P->' ::= 'T->' ::= 'ODF->' A C<[func|proc|type|ord_det_func]_ref> node represents a literal or selector invocation for a DBMS routine or type reference value. It is interpreted as a tuple of a Muldis D C value, which when evaluated at runtime would result in a C value. The C<[routine|type]_name> element specifies the C attribute of the new C, which is the name of the routine or type being invoked by way of the new reference value. Examples: F->inn.filter P->inn.try_block T->inn.foo_type ODF->inn.order_bars =head1 FUNCTION INVOCATION ALTERNATE SYNTAX EXPRESSIONS Grammar: ::= | | | | | | | | ... A C node represents the result of invoking a named system-defined function with specific arguments. It is interpreted as a tuple of a Muldis D C value. A C node is a lot like a C node in purpose and interpretation but it differs in several significant ways. While a C node can be used to invoke any inner/function at all, a C node can only invoke a fraction of them, and only standard system-defined functions. 
While a C node uses a simple common format with all functions, written in prefix notation with generally named arguments, a C node uses potentially unique syntax for each function, often written in infix notation, although inter-function format consistency is still applied as much as is reasonably possible. One basic format commonality with all C tokens is that the entire token is delimited by a pair of parentheses, whose conceptual purpose is to group together all the parts of the token, so that no precedence issues make it hard to tell the token from its parent, no matter how complicated it is; this is like the common grouping parentheses in mathematical expressions. A practical purpose of this is that the delimiting parentheses are a feature mostly unique to a C token, which almost all other grammar tokens don't have, C and C being the main exceptions, and so a parser encountering an opening parenthesis at the start of a context where a generic C is expected can be reasonably sure it is dealing with a C (or if-else or given-when) and not something else. Broadly speaking, a C node has 2-3 kinds of payload elements: The first is the determinant of what function to invoke, hereafter referred to as a I or I. The second is an ordered list of 1-N mandatory function inputs, hereafter referred to as I
, whose elements typically have generic names like C or C or C. The (optional) third is a named list of optional function inputs, hereafter referred to as I, whose elements tend to have more purpose-specific names such as C, though note that things like C can be either mandatory or optional depending on the op they are being used with. The decision of which system-defined functions get the special alternate syntax treatment partly comes down to respecting common good practices in programming languages, letting people write code in the way they are comfortable with. Most programming languages only have special syntax for a handful of their operators, such as common comparison and boolean and mathematical and string and element extraction operators, and so Muldis D mainly does likewise. Functions get special alternate syntax if they would be frequently used and the syntax would significantly aid programmers in quickly writing understandable code. =head2 Simple Commutative N-adic Infix Reduction Operators Grammar: ::= '(' \s* ** [\s+ \s+] \s* ')' A C node is for using infix notation to invoke a (homogeneous) commutative N-adic reduction operator function. Such a function takes exactly 1 actual argument, which is unordered-collection typed (set or bag), and the elements of that collection are the inputs of the operation; the inputs are all of the same type as each other and of the result. A single C node is equivalent to a single C node whose C element defines a single argument, whose value is a C or C node, which has a payload C element for each C element of the C, and the relative sequence of the C elements isn't significant. A C node requires at least 2 input value providing child nodes (C must match at least twice), which are its 2-N main op args; if you already have your inputs in a single collection-valued node then use C to invoke the function instead. If C matches more than once in the same C, then all of the C matches must be identical / the same operator.
Some of the keywords are aliases for each other:

    keyword | aliases
    --------+--------
    and     | ∧
    or      | ∨
    xor     | ⊻ ↮
    xnor    | ↔ iff
    ∪       | R+ union
    ∩       | R* intersect
    ∆       | R% exclude symdiff
    ⋈       | join
    ×       | times 'cross join'

This table indicates which function is invoked by each keyword:

    and  -> Core.Bool.and( Set:{ $expr[0], ..., $expr[n] } )
    or   -> Core.Bool.or( Set:{ $expr[0], ..., $expr[n] } )
    xor  -> Core.Bool.xor( Bag:{ $expr[0], ..., $expr[n] } )
    xnor -> Bool.xnor( Bag:{ $expr[0], ..., $expr[n] } )
    I+   -> Integer.sum( Bag:{ $expr[0], ..., $expr[n] } )
    I*   -> Integer.product( Bag:{ $expr[0], ..., $expr[n] } )
    N+   -> Rational.sum( Bag:{ $expr[0], ..., $expr[n] } )
    N*   -> Rational.product( Bag:{ $expr[0], ..., $expr[n] } )
    ∪    -> Core.Relation.union( Set:{ $expr[0], ..., $expr[n] } )
    ∩    -> Core.Relation.intersection( Set:{ $expr[0], ..., $expr[n] } )
    ∆    -> Relation.exclusion( Bag:{ $expr[0], ..., $expr[n] } )
    ⋈    -> Core.Relation.join( Set:{ $expr[0], ..., $expr[n] } )
    ×    -> Core.Relation.product( Set:{ $expr[0], ..., $expr[n] } )

Examples:

    (true and false and true)

    (true or false or true)

    (true xor false xor true)

    (14 I+ 3 I+ -5)

    (-6 I* 2 I* 25)

    (4.25 N+ -0.002 N+ 1.0)

    (69.3 N* 15*2^6 N* 49/23)

    (Set:{ 1, 3, 5 } ∪ Set:{ 4, 5, 6 } ∪ Set:{ 0, 9 })

    (Set:{ 1, 3, 5, 7, 9 } ∩ Set:{ 3, 4, 5, 6, 7, 8 } ∩ Set:{ 2, 5, 9 })

=head2 Simple Non-commutative N-adic Infix Reduction Operators

Grammar:

    ::= '(' \s* ** [\s+ \s+] \s* ')'

A C node is for using infix notation to invoke a (homogeneous) non-commutative N-adic reduction operator function. Such a function takes exactly 1 actual argument, which is ordered-collection typed (array), and the elements of that collection are the inputs of the operation; the inputs are all of the same type as each other and of the result. A single C node is equivalent to a single C node whose C element defines a single argument, whose value is a C node, which has a payload C element for each C element of the C, and the C elements have the same relative sequence.
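The equivalence between an infix reduction and a generic function invocation can be sketched in Python as a plain text rewrite. This is an illustrative assumption about how a desugaring step might look, not the specified parser: the function name `desugar` is hypothetical, operands are treated as opaque source strings, and only the commutative keywords listed in this document plus the three catenation operators are covered.

```python
# Alias keyword -> canonical keyword, from the commutative alias table.
ALIASES = {
    "∧": "and", "∨": "or", "⊻": "xor", "↮": "xor", "↔": "xnor",
    "iff": "xnor", "R+": "∪", "union": "∪", "R*": "∩",
    "intersect": "∩", "R%": "∆", "exclude": "∆", "symdiff": "∆",
    "join": "⋈", "times": "×", "cross join": "×",
}

# Canonical commutative keyword -> (function, unordered collection kind).
COMMUTATIVE = {
    "and": ("Core.Bool.and", "Set"), "or": ("Core.Bool.or", "Set"),
    "xor": ("Core.Bool.xor", "Bag"), "xnor": ("Bool.xnor", "Bag"),
    "I+": ("Integer.sum", "Bag"), "I*": ("Integer.product", "Bag"),
    "N+": ("Rational.sum", "Bag"), "N*": ("Rational.product", "Bag"),
    "∪": ("Core.Relation.union", "Set"),
    "∩": ("Core.Relation.intersection", "Set"),
    "∆": ("Relation.exclusion", "Bag"),
    "⋈": ("Core.Relation.join", "Set"),
    "×": ("Core.Relation.product", "Set"),
}

# Non-commutative catenation operators take an ordered Array instead.
ORDERED = {
    "B~": "Blob.catenation",
    "T~": "Text.catenation",
    "A~": "Array.catenation",
}


def desugar(op, exprs):
    """Rewrite (e0 op e1 op ... en) as a generic function invocation."""
    if len(exprs) < 2:
        raise ValueError("an infix reduction needs at least 2 operands")
    op = ALIASES.get(op, op)
    body = ", ".join(exprs)
    if op in COMMUTATIVE:
        func, kind = COMMUTATIVE[op]
        return f"{func}( {kind}:{{ {body} }} )"
    if op in ORDERED:
        return f"{ORDERED[op]}( Array:[ {body} ] )"
    raise KeyError(f"not a reduction operator in this sketch: {op}")
```

For instance, `desugar("I+", ["14", "3", "-5"])` gives `Integer.sum( Bag:{ 14, 3, -5 } )`. Note how the tables pair a Set with operators where repeated operands cannot change the result and a Bag where they can, such as sums, while the catenation operators keep operand order by using an Array.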
A C node requires at least 2 input value providing child nodes (C must match at least twice), which are its 2-N main op args; if you already have your inputs in a single collection-valued node then use C to invoke the function instead. If C matches more than once in the same C, then all of the C matches must be identical / the same operator. Exception: with some of these, the actual C derived from this has 2 actual arguments, the first a collection and the second, taking a different type of value, from the last op input list element.

This table indicates which function is invoked by each keyword:

    [<=>] -> Core.Cat.Order.reduction( Array:[ $expr[0], ..., $expr[n] ] )
    B~    -> Blob.catenation( Array:[ $expr[0], ..., $expr[n] ] )
    T~    -> Text.catenation( Array:[ $expr[0], ..., $expr[n] ] )
    A~    -> Array.catenation( Array:[ $expr[0], ..., $expr[n] ] )
    //    -> Set.Maybe.attr_or_value( Array:[ $expr[0], ..., $expr[n-1] ], value => $expr[n] )
    //d   -> Set.Maybe.attr_or_default( Array:[ $expr[0], ..., $expr[n-1] ], default => $expr[n] )

Examples:

    (same [<=>] increase [<=>] decrease)

    (F;'DEAD' B~ 1;'10001101' B~ F;'BEEF')

    ('hello' T~ ' ' T~ 'world')

    (Array:[ 24, 52 ] A~ Array:[ -9 ] A~ Array:[ 0, 11, 24, 7 ])

    ($a // $b // 42)

    ($a //d $b //d T->inn.foo_type)

=head2 Simple Symmetric Dyadic Infix Operators

Grammar:

    ::= '(' \s* \s+ \s+ \s* ')'

    ::= '=' | '≠' | '!=' | '<>' | nand | '⊼' | '↑' | nor | '⊽' | '↓' | 'I|-|' | 'N|-|'

A C node is for using infix notation to invoke a symmetric dyadic operator function. Such a function takes exactly 2 arguments, which are the inputs of the operation; the inputs are all of the same type as each other but the result might be of either that type or a different type. A single C node is equivalent to a single C node whose C element defines 2 arguments, and the 2 C elements of the C supply the values of those arguments, and which arguments get which C isn't significant.
Some of the keywords are aliases for each other:

    keyword | aliases
    --------+--------
    ≠       | != <>
    nand    | ⊼ ↑
    nor     | ⊽ ↓

This table indicates which function is invoked by each keyword:

    =     -> Core.Universal.is_identical( $expr[0], $expr[1] )
    ≠     -> Core.Universal.is_not_identical( $expr[0], $expr[1] )
    nand  -> Bool.nand( $expr[0], $expr[1] )
    nor   -> Bool.nor( $expr[0], $expr[1] )
    I|-|  -> Integer.abs_diff( $expr[0], $expr[1] )
    N|-|  -> Rational.abs_diff( $expr[0], $expr[1] )

Examples:

    ($foo = $bar)
    ($foo ≠ $bar)
    (false nand true)
    (15 I|-| 17)
    (7.5 N|-| 9.0)

=head2 Simple Non-symmetric Dyadic Infix Operators

Grammar:

    ::= '(' \s* \s+ \s+ \s* ')'

    ::=

    ::=

    ::= | isa | !isa | as | asserting
        | imp | '→' | implies | nimp | '↛' | if | '←' | nif | '↚'
        | 'I-' | 'I/' | '%' | mod | 'I^'
        | 'N-' | 'N/'
        | Bx | Tx | Ax
        | '∈' | '∉' | '∋' | '∌'
        | 'S∈' | 'S∉' | 'S∋' | 'S∌'
        | 'B∈' | 'B∉' | 'B∋' | 'B∌'
        | '⊆' | '⊈' | '⊇' | '⊉' | '⊂' | '⊄' | '⊃' | '⊅'
        | '∖' | 'R-' | minus | except
        | '⊿' | 'not matching' | antijoin | semiminus
        | '⋉' | matching | semijoin
        | '÷' | 'R/' | divideby
        | like | 'not like'

A C node is for using infix notation to invoke a non-symmetric dyadic operator function. Such a function takes exactly 2 arguments, which are the inputs of the operation; the inputs and the result may possibly be all of the same type, or they might all be of different types. A single C node is equivalent to a single C node whose C element defines 2 arguments, and the 2 C elements of the C supply the values of those arguments, which are associated in the appropriate sequence.
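To show why the operand sequence matters here, the Python sketch below binds the left and right operands to named parameters before calling a stand-in function. The parameter names C<minuend>, C<subtrahend>, C<radix> and C<exponent> follow this section's function table; the C<NONSYM_OPS> table, the C<invoke_nonsymmetric> helper and the Python function bodies are hypothetical and only approximate what the real routines would do.

```python
def integer_diff(minuend, subtrahend):
    """Stand-in for Integer.diff."""
    return minuend - subtrahend


def integer_power(radix, exponent):
    """Stand-in for Integer.power."""
    return radix ** exponent


# keyword -> (stand-in function, parameter bound to lhs, parameter bound to rhs)
NONSYM_OPS = {
    "I-": (integer_diff, "minuend", "subtrahend"),
    "I^": (integer_power, "radix", "exponent"),
}


def invoke_nonsymmetric(lhs, op, rhs):
    """Evaluate (lhs op rhs), binding each operand to its named parameter."""
    func, lhs_param, rhs_param = NONSYM_OPS[op]
    return func(**{lhs_param: lhs, rhs_param: rhs})


print(invoke_nonsymmetric(34, "I-", 21))  # prints 13
print(invoke_nonsymmetric(21, "I-", 34))  # prints -13
print(invoke_nonsymmetric(2, "I^", 63))   # prints 9223372036854775808
```

Swapping the operands of C<I-> changes the sign of the result, which is exactly what the fixed left-to-right association guards against.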
Some of the keywords are aliases for each other:

    keyword | aliases
    --------+--------
    imp     | → implies
    nimp    | ↛
    if      | ←
    nif     | ↚
    %       | mod
    ∖       | R- minus except
    ⊿       | 'not matching' antijoin semiminus
    ⋉       | matching semijoin
    ÷       | R/ divideby

Currently the alternate syntaxes for 20 functions, those testing set membership or sub/superset relationships, only come in versions that use trans-ASCII characters; if you are stuck using plain ASCII then you'll just have to use the generic function invocation syntax to invoke them for now. Plain ASCII infix syntax that is reasonable is yet to be determined.

This table indicates which function is invoked by each keyword:

    isa       -> Core.Universal.is_value_of_type( $lhs, type => $rhs )
    !isa      -> Core.Universal.is_not_value_of_type( $lhs, type => $rhs )
    as        -> Core.Universal.treated( $lhs, as => $rhs )
    asserting -> Core.Universal.assertion( $lhs, is_true => $rhs )
    imp       -> Bool.imp( $lhs, $rhs )
    nimp      -> Bool.nimp( $lhs, $rhs )
    if        -> Bool.if( $lhs, $rhs )
    nif       -> Bool.nif( $lhs, $rhs )
    I-        -> Integer.diff( minuend => $lhs, subtrahend => $rhs )
    I/        -> Integer.quotient( dividend => $lhs, divisor => $rhs )
    %         -> Integer.remainder( dividend => $lhs, divisor => $rhs )
    I^        -> Integer.power( radix => $lhs, exponent => $rhs )
    N-        -> Rational.diff( minuend => $lhs, subtrahend => $rhs )
    N/        -> Rational.quotient( dividend => $lhs, divisor => $rhs )
    Bx        -> Blob.replication( $lhs, count => $rhs )
    Tx        -> Text.replication( $lhs, count => $rhs )
    Ax        -> Array.replication( $lhs, count => $rhs )
    ∈         -> Core.Tuple.is_member( t => $lhs, r => $rhs )
    ∉         -> Core.Tuple.is_not_member( t => $lhs, r => $rhs )
    ∋         -> Core.Relation.has_member( r => $lhs, t => $rhs )
    ∌         -> Core.Relation.has_not_member( r => $lhs, t => $rhs )
    S∈        -> Set.value_is_member( value => $lhs, set => $rhs )
    S∉        -> Set.value_is_not_member( value => $lhs, set => $rhs )
    S∋        -> Set.has_member( set => $lhs, value => $rhs )
    S∌        -> Set.has_not_member( set => $lhs, value => $rhs )
    B∈        -> Bag.value_is_member( value => $lhs, bag => $rhs )
    B∉        -> Bag.value_is_not_member( value => $lhs, bag => $rhs )
    B∋        -> Bag.has_member( bag => $lhs, value => $rhs )
    B∌        -> Bag.has_not_member( bag => $lhs, value => $rhs )
    ⊆         -> Core.Relation.is_subset( $lhs, $rhs )
    ⊈         -> Core.Relation.is_not_subset( $lhs, $rhs )
    ⊇         -> Core.Relation.is_superset( $lhs, $rhs )
    ⊉         -> Core.Relation.is_not_superset( $lhs, $rhs )
    ⊂         -> Relation.is_proper_subset( $lhs, $rhs )
    ⊄         -> Relation.is_not_proper_subset( $lhs, $rhs )
    ⊃         -> Relation.is_proper_superset( $lhs, $rhs )
    ⊅         -> Relation.is_not_proper_superset( $lhs, $rhs )
    ∖         -> Core.Relation.diff( source => $lhs, filter => $rhs )
    ⊿         -> Core.Relation.semidiff( source => $lhs, filter => $rhs )
    ⋉         -> Core.Relation.semijoin( source => $lhs, filter => $rhs )
    ÷         -> Core.Relation.quotient( dividend => $lhs, divisor => $rhs )
    like      -> sys.std.Text.is_like( look_in => $lhs, look_for => $rhs )
    not like  -> sys.std.Text.is_not_like( look_in => $lhs, look_for => $rhs )

Note that while the C functions also have an optional third parameter C, you will have to use a C node to exploit it; for simplicity, the infix C and C don't support that customization; but most actual uses of like/etc don't use C anyway.

Examples:

    ($bar isa T->inn.foo_type)
    ($bar !isa T->inn.foo_type)
    ($scalar as T->Int)
    ($int asserting ($int ≠ 0))
    (true implies false)
    (34 I- 21)
    (5 I/ 3)
    (5 % 3)
    (2 I^ 63)
    (9.2 N- 0.1)
    (1;101.01 N/ 1;11.0)
    ('-' Tx 80)
    (Set:{ 8, 4, 6, 7 } ∖ Set:{ 9, 0, 7 })
    (Relation:[ x, y ];{ [ 5, 6 ], [ 3, 6 ] } ÷ Relation:{ { y => 6 } })

=head2 Simple Monadic Prefix Operators

Grammar:

    ::= '(' \s* \s+ \s* ')'

    ::= d | not | ¬ | 'I||' | 'N||' | 'R#' | t | r | s | v

A C node is for using prefix notation to invoke a monadic operator function. Such a function takes exactly 1 argument, which is the input of the operation. A single C node is equivalent to a single C node whose C element defines 1 argument, and the 1 C element of the C supplies the value of that argument.
Some of the keywords are aliases for each other:

    keyword | aliases
    --------+--------
    not     | ¬

This table indicates which function is invoked by each keyword:

    d   -> Core.Universal.default( of => $expr )
    not -> Core.Bool.not( $expr )
    I|| -> Integer.abs( $expr )
    N|| -> Rational.abs( $expr )
    R#  -> Core.Relation.cardinality( $expr )
    t   -> Core.Relation.Tuple_from_Relation( $expr )
    r   -> Core.Relation.Relation_from_Tuple( $expr )
    s   -> Set.Maybe.single( value => $expr )
    v   -> Set.Maybe.attr( $expr )

Examples:

    (d T->inn.foo_type)
    (not true)
    (I|| -23)
    (N|| -4.59)
    (R# Set:{ 5, -1, 2 })
    (t $relvar)
    (r $tupvar)
    (s ((v $a) N+ (v $b)))

=head2 Simple Monadic Postfix Operators

Grammar:

    ::= '(' \s* \s+ \s* ')'

    ::= '++' | '--' | '!'

A C node is for using postfix notation to invoke a monadic operator function. Such a function takes exactly 1 argument, which is the input of the operation. A single C node is equivalent to a single C node whose C element defines 1 argument, and the 1 C element of the C supplies the value of that argument.

This table indicates which function is invoked by each keyword:

    ++ -> Integer.inc( $expr )
    -- -> Integer.dec( $expr )
    !  -> Integer.factorial( $expr )

Examples:

    (13 ++)
    (4 --)
    (5 !)

=head2 Rational Operators That Do Rounding

Grammar:

    ::= '(' \s* \s+ \s* ')'

    ::= | | |

    ::= \s+ \s+

    ::= 'N^' | log

    ::= \s+ 'e^'

    ::= \s+ loge

    ::= round \s+

    ::=

A C node is for using infix or prefix or postfix notation to invoke a rational numeric operator function whose operation involves rounding a number to one with less precision. Such a function takes exactly 1 (C) or 2 (C and C) primary arguments, which are the inputs of the operation, plus a special C argument which specifies explicitly the semantics of the numeric rounding in a declarative way (all 3 of these are I
). A single C node is equivalent to a single C node whose C element defines 2-3 arguments, and the C elements of the C supply the values of those arguments, which are associated in the appropriate sequence.

This table indicates which function is invoked by each keyword:

         -> Rational.round( $expr, round_rule => $round_rule )
    N^   -> Rational.power( radix => $lhs,
                exponent => $rhs, round_rule => $round_rule )
    log  -> Rational.log( $lhs, radix => $rhs, round_rule => $round_rule )
    e^   -> Rational.natural_power( $expr, round_rule => $round_rule )
    loge -> Rational.natural_log( $expr, round_rule => $round_rule )

Examples:

    ($foo round RatRoundRule:[10,-2,half_even])
    (2.0 N^ 0.5 round RatRoundRule:[2,-7,to_zero])
    (309.1 log 5.4 round RatRoundRule:[10,-4,half_up])
    (e^ 6.3 round RatRoundRule:[10,-6,to_ceiling])
    (17.0 loge round RatRoundRule:[3,-5,to_floor])

=head2 Order Comparison Operators

Grammar:

    ::= '(' \s* [ | ] \s* ')'

    ::= \s+ '<=>' \s+ [\s+ ]? [\s+ ]?

    ::= [ | | ] [\s+ ]?

    ::= ** [\s+ \s+]

    ::= \s+ \s+

    ::= |

    ::= '<' | '≤' | '<='

    ::= '>' | '≥' | '>='

    ::= [not \s+]? \s+ \s+ \s+ \s+

    ::=

    ::=

    ::= ordered \s+ |

An C node is for using infix notation to invoke an order comparison operator function. Such a function takes exactly 2 (C and C) or 3 (C and C and C) or N/2+ (C) main op args, which are the inputs of the operation, plus 2 extra op args (C and C for the C<< <=> >> op, or C and C for any other op) which let you customize the semantics of the operation. A single C node is equivalent to a single C node whose C element defines 2-N arguments, and the C elements of the C supply the values of those arguments, which are associated in the appropriate sequence, except for the N-adic operators which are commutative (and associative and idempotent) so the relative order of the main op args isn't significant. Details on the extra op args are pending.
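The three-operand range tests can be read as a single call with two boolean flags that say whether each endpoint is excluded. The Python sketch below assumes that reading: C<min_is_outside> and C<max_is_outside> are the parameter names used in this section's function table, while the Python bodies, the trailing underscores on C<min_> and C<max_>, and the assumption that C<is_outside_range> is simply the negation of C<is_inside_range> are illustrative guesses rather than the specified semantics.

```python
def is_inside_range(value, min_, max_, min_is_outside=False, max_is_outside=False):
    """True when value lies between min_ and max_; a flag excludes that endpoint."""
    above_min = value > min_ if min_is_outside else value >= min_
    below_max = value < max_ if max_is_outside else value <= max_
    return above_min and below_max


def is_outside_range(value, min_, max_, min_is_outside=False, max_is_outside=False):
    """Assumed here to be the logical negation of is_inside_range."""
    return not is_inside_range(value, min_, max_, min_is_outside, max_is_outside)


# ($min ≤ $foo ≤ $max): both endpoints belong to the range.
print(is_inside_range(10, 1, 10))                       # prints True
# ($min ≤ $foo < $max): the maximum itself is outside.
print(is_inside_range(10, 1, 10, max_is_outside=True))  # prints False
# (not $min < $foo ≤ $max): the minimum itself is outside.
print(is_outside_range(1, 1, 10, min_is_outside=True))  # prints True
```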
Some of the keywords are aliases for each other:

    keyword | aliases
    --------+--------
    ≤       | <=
    ≥       | >=
    ≤≤      | <=<=
    ≤<      | <=<
    <≤      | <<=
    !≤≤     | !<=<=
    !≤<     | !<=<
    !<≤     | !<<=

This table indicates which function is invoked by each keyword:

    <=> -> Core.Scalar.order( $lhs, $rhs )
    min -> Ordered.min( Set:{ $expr[0], ..., $expr[n] } )
    max -> Ordered.max( Set:{ $expr[0], ..., $expr[n] } )
    <   -> Ordered.is_before( $lhs, $rhs )
    >   -> Ordered.is_after( $lhs, $rhs )
    ≤   -> Ordered.is_before_or_same( $lhs, $rhs )
    ≥   -> Ordered.is_after_or_same( $lhs, $rhs )
    ≤≤  -> Ordered.is_inside_range( $expr, min => $min, max => $max )
    ≤<  -> Ordered.is_inside_range( $expr, min => $min, max => $max,
               max_is_outside => true )
    <≤  -> Ordered.is_inside_range( $expr, min => $min, max => $max,
               min_is_outside => true )
    <<  -> Ordered.is_inside_range( $expr, min => $min, max => $max,
               min_is_outside => true, max_is_outside => true )
    !≤≤ -> Ordered.is_outside_range( $expr, min => $min, max => $max )
    !≤< -> Ordered.is_outside_range( $expr, min => $min, max => $max,
               max_is_outside => true )
    !<≤ -> Ordered.is_outside_range( $expr, min => $min, max => $max,
               min_is_outside => true )
    !<< -> Ordered.is_outside_range( $expr, min => $min, max => $max,
               min_is_outside => true, max_is_outside => true )

Details regarding the extra op args are pending. But most of the time you wouldn't be using them, so just the main args represent typical usage.

Examples (for now sans any use of extra op args, which are atypical):

    ($foo <=> $bar)
    ($a min $b min $c)
    ($a max $b max $c)
    ($foo < $bar)
    ($foo > $bar)
    ($foo ≤ $bar)
    ($foo ≥ $bar)
    ($min ≤ $foo ≤ $max)
    ($min ≤ $foo < $max)
    (not $min < $foo ≤ $max)
    (not $min < $foo < $max)

=head1 DEPOT DECLARATION

B

Grammar:

    ::= depot \s+ '{' \s* ? \s* '}'

=head1 BOOTLOADER STATEMENT

B

Grammar:

    ::= | ...

    ::= boot_stmt \s* ':' \s* \s* ':' \s* \s* ':' \s*

    ::=

    ::= '{' \s* [[ \s* '=>' \s* ] ** [\s* ',' \s*]]? \s* '}'

    ::=

Examples:

    boot_stmt:sys.std.Core.Cat.create_depot_material:{}:{ ...
    }

=head1 MULDIS D TINY DIALECT PRAGMAS

B

=head1 SEE ALSO

Go to L for the majority of distribution-internal references, and L for the majority of distribution-external references.

=head1 AUTHOR

Darren Duncan (C)

=head1 LICENSE AND COPYRIGHT

This file is part of the formal specification of the Muldis D language.

Muldis D is Copyright © 2002-2009, Muldis Data Systems, Inc.

See the LICENSE AND COPYRIGHT of L for details.

=head1 TRADEMARK POLICY

The TRADEMARK POLICY in L applies to this file too.

=head1 ACKNOWLEDGEMENTS

The ACKNOWLEDGEMENTS in L apply to this file too.

=cut