=pod =encoding utf8 =head1 NAME Muldis::D::Core::Types - Muldis D general purpose data types =head1 VERSION This document is Muldis::D::Core::Types version 0.99.0. =head1 PREFACE This document is part of the Muldis D language specification, whose root document is L; you should read that root document before you read this one, which provides subservient details. Moreover, you should read the L document before this current document, as that forms its own tree beneath a root document branch. =head1 DESCRIPTION This document contains one or more sections that were moved here from L so that said other document would not be too large. =head1 TYPE SUMMARY Following are all the data types and data type factories described in this document, arranged in a type graph according to their proper sub|supertype relationships: sys.std.Core.Type.Universal sys.std.Core.Type.Empty sys.std.Core.Type.Scalar sys.std.Core.Type.DHScalar sys.std.Core.Type.Cat.OVLScalar # The following are all regular ordered scalar types. sys.std.Core.Type.Bool sys.std.Core.Type.Int sys.std.Core.Type.NNInt sys.std.Core.Type.PInt sys.std.Core.Type.PInt2_N sys.std.Core.Type.Blob sys.std.Core.Type.OctetBlob sys.std.Core.Type.Text sys.std.Core.Type.Rat sys.std.Core.Type.NNRat sys.std.Core.Type.PRat sys.std.Core.Type.Instant sys.std.Core.Type.Duration # The following are mostly nonscalar type factories. sys.std.Core.Type.Tuple sys.std.Core.Type.DHTuple sys.std.Core.Type.Database sys.std.Core.Type.Interval sys.std.Core.Type.DHInterval sys.std.Core.Type.Relation sys.std.Core.Type.DHRelation sys.std.Core.Type.Set sys.std.Core.Type.DHSet sys.std.Core.Type.Maybe sys.std.Core.Type.DHMaybe sys.std.Core.Type.Single sys.std.Core.Type.DHSingle sys.std.Core.Type.Array sys.std.Core.Type.DHArray sys.std.Core.Type.Bag sys.std.Core.Type.DHBag # The following are all reference types. 
sys.std.Core.Type.Reference sys.std.Core.Type.External Note that C is a proper subtype of all of the other types in this graph, but every other type has only one immediate supertype shown, and hence the graph of them is a simple hierarchy. Similarly, most C subtypes have at least 2 parent types; the above graph shows one view of their relationships, and here is another view of those: sys.std.Core.Type.Universal sys.std.Core.Type.Empty # The following are mostly nonscalar type factories. sys.std.Core.Type.Relation sys.std.Core.Type.DHRelation sys.std.Core.Type.DHSet sys.std.Core.Type.DHMaybe sys.std.Core.Type.DHSingle sys.std.Core.Type.DHArray sys.std.Core.Type.DHBag sys.std.Core.Type.Set sys.std.Core.Type.Maybe sys.std.Core.Type.Single sys.std.Core.Type.Array sys.std.Core.Type.Bag =head1 SYSTEM-DEFINED CORE MAXIMAL AND MINIMAL DATA TYPES These core data types are special and are the only Muldis D types that are neither scalar nor nonscalar nor reference types. They are all system-defined and it is impossible for users to define more types of this nature. =head2 sys.std.Core.Type.Universal This is an enumeration data type. The C type is the maximal type of the entire Muldis D type system, and contains every value that can possibly exist. Every other (non-aliased) type is implicitly a proper subtype of C, and C is implicitly a union type over all other types. Its default value is C. The cardinality of this type is infinity. =head2 sys.std.Core.Type.Empty This is an enumeration data type. The C type is the minimal type of the entire Muldis D type system, and is the only type that contains exactly zero values. Every other (non-aliased) type is implicitly a proper supertype of C and C is implicitly an intersection type over all other types. It has no default value. The cardinality of this type is zero. =head1 SYSTEM-DEFINED CORE SCALAR DATA TYPES These core scalar data types are the most fundamental Muldis D types. 
Plain Text Muldis D provides a specific syntax per type to select a value of every one of these types (or of their super/subtypes), which does not look like a routine invocation, but rather like a scalar literal in a typical programming language; details of that syntax are not given here, but in L. Hosted Data Muldis D as hosted in another language will essentially use literals of corresponding host language types, whatever they use for e.g. booleans and integers and character strings, but tagged with extra meta-data if the host language is more weakly typed or lacks one-to-one type correspondence; see L or L for a Perl 6|5-based example. These types, except for C and C, are all ordered. =head2 sys.std.Core.Type.Scalar This is an enumeration data type. The C type is the maximal type of all Muldis D scalar types, and contains every scalar value that can possibly exist. Every other (non-aliased) scalar type is implicitly a proper subtype of C, and C is implicitly a union type over all other scalar types. Its default value is C. The cardinality of this type is infinity. =head2 sys.std.Core.Type.DHScalar This is an enumeration data type. C is a proper subtype of C where every one of its possreps' attributes is restricted to be of just certain categories of data types, rather than allowing any data types at all; related to this restriction, any dh-scalar value is allowed to be stored in a global/persisting relational database but any other scalar value may only be used for transient data. The C type is the maximal type of all Muldis D dh-scalar types, and contains every dh-scalar value that can possibly exist. Every other (non-aliased) dh-scalar type is implicitly a proper subtype of C, and C is implicitly a union type over all other dh-scalar types. Its default value is C. The cardinality of this type is infinity. =head2 sys.std.Core.Type.Bool This is a structure data type. C consists of just the 2 values C and C. 
A C represents a truth value, and is the result type of any C or C routine; it is the only essential general-purpose scalar data type of a generic B language, although not the only essential one in Muldis D. A C has 2 system-defined possreps, named C and C. The C possrep directly matches the conception of the type as consisting of 2 character string values; it consists of 1 C-typed attribute whose name is the empty string. The C possrep consists of 1 C-typed attribute whose name is the empty string and whose value must be one of [C<0>, C<1>]; the 2 values of each possrep correspond in the same order as they are documented here. The default and minimum value of C is C; its maximum value is C. The cardinality of this type is 2. The C type has a default ordering algorithm that corresponds directly to that of its C possrep attribute; C is ordered before C. The C type has an implementation hint for less intelligent Muldis D implementations, that suggests using the C possrep as the basis for the physical representation. The value C is also known as C and I and C<⊥>. The value C is also known as C and I and C<⊤>. =head2 sys.std.Core.Type.Int This is a structure data type. An C is a single exact integral number of any magnitude. An C has 1 system-defined possrep whose name is the empty string, which consists of 1 C-typed attribute whose name is the empty string. Its default value is zero; its minimum and maximum values are conceptually infinities and practically impossible. The cardinality of this type is infinity; to define a most-generalized finite C subtype, you must specify the 2 integer end-points of the inclusive range that all its values are in. The C type has a default ordering algorithm; for 2 distinct C values, the value closer to negative infinity is ordered before the value closer to positive infinity. =head2 sys.std.Core.Type.NNInt This is an enumeration data type. 
C (non-negative integer) is a proper subtype of C where all member values are greater than or equal to zero. Its minimum value is zero. =head2 sys.std.Core.Type.PInt This is an enumeration data type. C (positive integer) is a proper subtype of C where all member values are greater than zero. Its default and minimum value is 1. =head2 sys.std.Core.Type.PInt2_N This is an enumeration data type. C is a proper subtype of C where all member values are greater than 1. Its default and minimum value is 2. =head2 sys.std.Core.Type.Blob This is a structure data type. A C is an undifferentiated string of bits. A C has 1 system-defined possrep named C which consists of 1 C-typed attribute whose name is the empty string; each element of C is either C<0> to represent a low bit or C<1> to represent a high bit. A C is a simple wrapper for a C and all of its other details such as default and minimum and maximum values and cardinality and default ordering algorithm all correspond directly. But C is explicitly disjoint from C due to having a different intended interpretation. =head2 sys.std.Core.Type.OctetBlob This is an enumeration data type. C is a proper subtype of C where all member values have a length in bits that is an even multiple of 8 (or is zero). An C adds 1 system-defined possrep named C which consists of 1 C-typed attribute whose name is the empty string. The C and C possreps correspond as you might expect, such that each element of the sole attribute of C maps to 8 consecutive elements of the sole attribute of C; with each 8 bits corresponding to an octet, the lowest-element-indexed bit corresponds to the highest bit of the octet when the latter is encoded as a standard two's complement binary unsigned integer, and the highest-element-indexed bit corresponds to the lowest bit of the octet. 
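The correspondence between the bit-level and octet-level possreps described above can be illustrated with a minimal Python sketch. The function names C<bits_to_octets> and C<octets_to_bits> are hypothetical and not part of Muldis D; the sketch only assumes that bits and octets are held as plain lists of integers.

```python
def bits_to_octets(bits):
    """Pack a list of 0/1 bits into octets.

    The lowest-indexed bit of each group of 8 becomes the highest
    (most significant) bit of the octet, as described in the text.
    """
    if len(bits) % 8 != 0:
        raise ValueError("length in bits must be a multiple of 8")
    if any(bit not in (0, 1) for bit in bits):
        raise ValueError("every element must be 0 or 1")
    octets = []
    for start in range(0, len(bits), 8):
        value = 0
        for bit in bits[start:start + 8]:
            value = (value << 1) | bit  # earlier bits end up more significant
        octets.append(value)
    return octets


def octets_to_bits(octets):
    """Unpack octets (0..255) into bits, highest bit of each octet first."""
    bits = []
    for octet in octets:
        if not 0 <= octet <= 255:
            raise ValueError("every octet must be in the range 0..255")
        for shift in range(7, -1, -1):
            bits.append((octet >> shift) & 1)
    return bits
```

Under these assumptions the two functions are inverses of each other for any input whose bit length is a multiple of 8, including the empty string of bits.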
The reason the C type is system-defined as distinct from C is for convenience of users since it is likely the vast majority of C values consist of whole octets and users would want to work with them in those terms. =head2 sys.std.Core.Type.Text This is a structure data type. A C is a string of Unicode abstract characters which is formatted as a sequence of Unicode abstract codepoints in canonical decomposed normal form (NFD). Two C will generally match at the grapheme abstraction level. Of course, a Muldis D implementation doesn't actually have to store character data in NFD; but default matching semantics need to be as if it did. A C has 1 system-defined possrep named C which consists of 1 C-typed attribute whose name is the empty string; each element of C represents a Unicode standard version 5.1.0 character abstract codepoint number. A C is a simple wrapper for a C and all of its other details such as default and minimum and maximum values and cardinality and default ordering algorithm (sorting is numeric by abstract codepoint number) all correspond directly. But C is explicitly disjoint from C due to having a different intended interpretation. In regards to ordering, possibly the standard Unicode Collation Algorithm (UCA) also works this way, assuming it is totally ordered, but that's unsure. I =head2 sys.std.Core.Type.Rat This is a structure data type. A C (scalar) is a single exact rational number of any magnitude and precision. It is conceptually a composite type with 2 main system-defined possreps, called C and C, both of which are defined over several C. The C possrep consists of 2 attributes: C (an C), C (a C); the conceptual value of a C is the result of rational-dividing its C by its C. Because in the general case there are an infinite set of [C,C] integer pairs that denote the same rational value, the C possrep carries the normalization constraint that C and C must be coprime, that is, they have no common integer factors other than 1. 
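The coprime normalization constraint described above can be sketched in Python as follows. The function name C<normalize_ratio> is hypothetical, and the sketch assumes that the normalized denominator is kept positive with the sign carried by the numerator; it illustrates the constraint only and is not how any Muldis D implementation is required to work.

```python
from math import gcd


def normalize_ratio(numerator, denominator):
    """Return the unique (numerator, denominator) pair with a positive
    denominator and no common integer factor other than 1."""
    if denominator == 0:
        raise ZeroDivisionError("denominator must not be zero")
    if denominator < 0:
        # Assumption: the sign lives on the numerator.
        numerator, denominator = -numerator, -denominator
    divisor = gcd(numerator, denominator)  # always positive here
    return numerator // divisor, denominator // divisor
```

For example, the pairs (6, 4) and (3, 2) denote the same rational value, and only (3, 2) satisfies the constraint.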
The C possrep consists of 3 attributes: C (an C), C (a C), C (an C); the conceptual value of a C is the result of multiplying its C by the result of taking its C to the power of its C. The C possrep carries the normalization constraint that among all the [C,C,C] triples which would denote the same rational value, the only allowed triple is the one having both the C with the lowest value (that is closest to or equal to 2) and the C with the highest value (that is closest to positive infinity). I The default value of C is zero; its minimum and maximum values are conceptually infinities and practically impossible. The cardinality of this type is infinity; to define a most-generalized finite C subtype, you must specify the greatest magnitude value denominator, plus the 2 integer end-points of the inclusive range of the value numerator; or alternately you must specify the greatest magnitude value mantissa (the I of the number), and specify the greatest magnitude value radix, plus the 2 integer end-points of the inclusive range of the value exponent (the I of the number). Common subtypes specify that the normalized radixes of all their values are either 2 or 10; types such as these will easily map exactly to common human or physical numeric representations, so they tend to perform better. The C type has a default ordering algorithm which is conceptually the same as for C; for 2 distinct C values, the value closer to negative infinity is ordered before the value closer to positive infinity. The C type has an implementation hint for less intelligent Muldis D implementations, that suggests using the C possrep as the basis for the physical representation. =head2 sys.std.Core.Type.NNRat This is an enumeration data type. C (non-negative rational) is a proper subtype of C where all member values are greater than or equal to zero (that is, the C|C is greater than or equal to zero). Its minimum value is zero. =head2 sys.std.Core.Type.PRat This is an enumeration data type. 
C (positive rational) is a proper subtype of C where all member values are greater than zero (that is, the C|C is greater than zero). Its default and minimum value is 1. =head2 sys.std.Core.Type.Instant This is a structure data type. An C is a single point in time which is specified with arbitrary precision in terms of atomic seconds with fractions. That is, an C is defined as a point on the canonical continuous timeline of International Atomic Time (TAI; this is a perfectly linear scale with no discontinuities), specified by a scalar number of TAI seconds since the TAI epoch, which is exactly midnight at the start of January 1st of the year 1958 CE. Put another way, the C type is intended to have exactly the same meaning as the same-named type of Perl 6 (see L for details). An C has 1 system-defined possrep named C which consists of 1 C-typed attribute named C. An C is a simple wrapper for a C and all of its other details such as default and minimum and maximum values and cardinality and default ordering algorithm all correspond directly. But C is explicitly disjoint from C due to having a different intended interpretation. The C type is intended more for use with system event time-stamps or sensitive scientific applications and is not necessarily the best choice for common human-specified temporal artifacts according to various calendars, since there is no fixed conversion rate between them in the general case that includes future dates, and also calendar-based artifacts may be very non-specific; see also the L for a selection of other temporal data types defined in terms of calendars. =head2 sys.std.Core.Type.Duration This is a structure data type. A C is a single amount of time, which is specified with arbitrary precision in terms of the same units as an C is structured with. A C is not fixed to any point in time. A C is the result type of taking the difference between two C values, but it is not defined in terms of said two values. 
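The relationship between the two temporal types described so far can be sketched in Python. The class layout and the attribute name C<seconds> are assumptions made for illustration; the sketch shows only that both types wrap an exact rational count of seconds, that they are disjoint from each other, and that the difference of two instants is a duration.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Duration:
    seconds: Fraction  # an amount of time, not tied to any point in time


@dataclass(frozen=True)
class Instant:
    seconds: Fraction  # TAI seconds since the TAI epoch (1958-01-01)

    def __sub__(self, other):
        # Instant - Instant yields a Duration.
        if not isinstance(other, Instant):
            return NotImplemented
        return Duration(self.seconds - other.seconds)

    def __add__(self, other):
        # Instant + Duration yields another Instant.
        if not isinstance(other, Duration):
            return NotImplemented
        return Instant(self.seconds + other.seconds)
```

Because the two classes are distinct, an instant and a duration holding the same number never compare equal, which mirrors the stated disjointness.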
The C type is intended to have exactly the same meaning as the same-named type of Perl 6, as per C. A C has 1 system-defined possrep named C which consists of 1 C-typed attribute named C. A C is a simple wrapper for a C in all ways as per C. But C is explicitly disjoint from C due to having a different intended interpretation. The C type is intended more for benchmarking or scientific applications and is not intended for human-specified calendar based artifacts; again see the I for alternatives. =head1 SYSTEM-DEFINED CORE NONSCALAR DATA TYPES These core nonscalar data types permit transparent/user-visible compositions of multiple values into other conceptual values. For all nonscalar types, their cardinality is mainly or wholly dependent on the data types they are composed of. =head2 sys.std.Core.Type.Tuple This is a primitive data type. The C type is the maximal type of all Muldis D tuple (nonscalar) types, and contains every tuple value that could possibly exist. A C is an unordered heterogeneous collection of 0..N named attributes (the count of attributes being its I), where all attribute names are mutually distinct, and each attribute may be of distinct types; the mapping of a tuple's attribute names and their declared data types is called the tuple's I. Its default value is the sole tuple value that has zero attributes. The cardinality of a I C type (if it has no type constraints other than those of its constituent attribute types) is equal to the product of the N-adic multiplication where there is an input to that multiplication for each attribute of the tuple and the value of the input is the cardinality of the declared type of the attribute; for a C subtype to be finite, all of its attribute types must be. =head2 sys.std.Core.Type.DHTuple This is an enumeration data type. 
C is a proper subtype of C where every one of its attributes is restricted to be of just certain categories of data types, rather than allowing any data types at all; related to this restriction, any dh-tuple value is allowed to be stored in a global/persisting relational database but any other tuple value may only be used for transient data. The C type is the maximal type of all Muldis D dh-tuple (dh-nonscalar) types, and contains every dh-tuple value that could possibly exist. Its default value is the same as that of C and matters of its cardinality are determined likewise. The only member value of C that has exactly zero attributes is also known by the special name C aka C, which serves as the default value of the 3 types C<[|DH]Tuple> and C. =head2 sys.std.Core.Type.Database This is an enumeration data type. C is a proper subtype of C where all of its attributes are each of dh-relation types or of database types (the leaves of this recursion are all dh-relation types); it is otherwise the same. The 3 system-defined user-data variables named C<[fed|dep|sdp].data> are all of "just" the C type, or are of its proper subtypes. =head2 sys.std.Core.Type.Interval This is an enumeration data type. An C is a C. It defines a single I or I in terms of 2 I values plus an indicator of whether either, both, or none of the endpoint values are included in the interval. An C has these 4 attributes: =over =item C - C These are the interval endpoint values; C defines the I endpoint and C defines the I endpoint. The endpoint values conceptually must be of the same, totally-ordered type (typically one of C, C, C, C, etc), although strictly speaking they may be of any types at all; in the latter case, to actually make practical use of such intervals, an C function must explicitly be employed. =item C - C If C or C are C, then C or C I considered to be included within the interval, respectively; otherwise, it I considered to be included within the interval. 
If both endpoints are within the interval (the use case which Muldis D optimizes its syntax for), the interval is I; otherwise if both endpoints are not in the interval, the interval is I. =back The C type supports empty intervals (which include no values at all) at least as a matter of simplicity in that it doesn't place any restrictions on the combination of attribute values an C value may have, such as that C can't be before C. This liberal design is also necessary to support the general case where the relative order of the C and C values is situation-dependent on what C function is used with the interval; that function also determines what type's concept of order is being applied, and so it also determines whether a given interval is considered empty. With respect to each compatible C function, an C is considered empty iff at least one of the following is true: 1. Its C is greater than its C. 2. Its C is equal to its C I at least one of C or C is true. 3. Both C and C are true I C and C are consecutive values. And so, there are many distinct C values that are conceptually empty intervals, and the C function should not be used to test an C for being empty or not. The default value of C represents an empty interval where its C attribute is C and its other 3 attributes are C. The C type has no support for I/I or I intervals that is orthogonal to data type. However, if there are any types with their own special values to represent infinities, then those special values can be used for endpoints of intervals over those types. I The C type only represents a continuous interval, but a discontinuous interval may be effectively represented by a set of C values, either a C or a C. See also the L extension. =head2 sys.std.Core.Type.DHInterval This is an enumeration data type. C is a proper subtype of C where every one of its values is also a C. In general practice, all C values are C values, because their endpoints would all be C values. 
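The three emptiness conditions listed earlier for intervals can be sketched in Python. The attribute names C<min>, C<max>, C<excludes_min> and C<excludes_max> are assumptions, as is the reading of the connective in conditions 2 and 3 as "and". The sketch also assumes integer endpoints, so that "consecutive" means the upper endpoint is exactly one more than the lower; for other types the ordering function in use would decide this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    min: int
    max: int
    excludes_min: bool = False
    excludes_max: bool = False


def is_empty(interval):
    """Apply the three emptiness conditions to an integer interval."""
    # 1. The lower endpoint is greater than the upper endpoint.
    if interval.min > interval.max:
        return True
    # 2. The endpoints are equal and at least one of them is excluded.
    if interval.min == interval.max and (
        interval.excludes_min or interval.excludes_max
    ):
        return True
    # 3. Both endpoints are excluded and they are consecutive integers.
    if (
        interval.excludes_min
        and interval.excludes_max
        and interval.max == interval.min + 1
    ):
        return True
    return False
```

Note that C<Interval(3, 1)> and C<Interval(5, 1)> are both empty under this test yet are distinct values, which is why emptiness is checked with a predicate rather than by comparing against one particular empty interval.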
The default value of C is the same as that of C. =head2 sys.std.Core.Type.Relation This is a primitive data type. The C type is the maximal type of all Muldis D relation (nonscalar) types, and contains every relation value that could possibly exist. A C is analogous to a set of 0..N tuples where all tuples have the same heading (the degrees match and all attribute names, and typically corresponding declared data types, match), but that a C data type still has its own corresponding heading (attribute names and declared data types) even when it consists of zero tuples. Its default value is the sole relation value that has zero tuples and zero attributes. The cardinality of a I C type (if it has no type constraints other than those of its constituent attribute types) is equal to 2 raised to the power of the cardinality of the I C type with the same heading. A relation data type can also have (unique) keys each defined over a subset of its attributes, which constrain its set of values relative to there being no explicit keys, but having the keys won't turn an infinite relation type into a finite one. =head2 sys.std.Core.Type.DHRelation This is an enumeration data type. C is a proper subtype of C where every one of its attributes is restricted to be of just certain categories of data types, rather than allowing any data types at all; related to this restriction, any dh-relation value is allowed to be stored in a global/persisting relational database but any other relation value may only be used for transient data. The main difference from its supertype is that a dh-relation's dh-tuples' headings all have matching declared data types for corresponding attributes, while with relations they don't have to. The C type is the maximal type of all Muldis D dh-relation (dh-nonscalar) types, and contains every dh-relation value that could possibly exist. Its default value is the same as that of C and matters of its cardinality are determined likewise. 
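The cardinality rules stated for tuple and relation types can be worked through with a small Python sketch. The function names are hypothetical; each takes the cardinalities of the declared attribute types of a heading and assumes there are no further type constraints and no keys.

```python
from math import prod


def tuple_type_cardinality(attribute_cardinalities):
    """Product of the attribute type cardinalities (1 for zero attributes)."""
    return prod(attribute_cardinalities)


def relation_type_cardinality(attribute_cardinalities):
    """2 raised to the cardinality of the tuple type with the same heading,
    since a relation value is any subset of the possible tuples."""
    return 2 ** tuple_type_cardinality(attribute_cardinalities)
```

For a heading with one 2-valued attribute and one 3-valued attribute there are 6 possible tuples and therefore 64 possible relation values; for a heading with zero attributes there is 1 possible tuple and 2 possible relation values.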
The only member value of C that has exactly zero attributes and exactly zero tuples is also known by the special name C aka C, which serves as the default value of the 2 types C<[|DH]Relation>. The only member value of C that has exactly zero attributes and exactly one tuple is also known by the special name C aka C. Note that I also refers to these 2 values by the special shorthand names I and I, respectively. =head2 sys.std.Core.Type.Set This is an enumeration data type. C is a proper subtype of C that has 1 attribute, and its name is C; it can be of any declared type. A C subtype is normally used by any system-defined N-adic operators where the order of their argument elements or result is not significant, and that duplicate values are not significant. Its default value has zero tuples. Note that, for any given C subtype, C, where its C attribute has a declared type of C, the type C can be considered the I of the type C. =head2 sys.std.Core.Type.DHSet This is an enumeration data type. C is the intersection type of C and C. The cardinality of this type is infinite. =head2 sys.std.Core.Type.Maybe This is an enumeration data type. C is a proper subtype of C where all member values may have at most one element; that is, it is a unary C with a nullary key. Operators that work specifically with C subtypes can provide a syntactic shorthand for working with sparse data; so Muldis D has something which is conceptually close to SQL's nullable types without actually having 3-valued logic; it would probably be convenient for code that round-trips SQL by way of Muldis D to use the C type. Its default value has zero tuples. =head2 sys.std.Core.Type.DHMaybe This is an enumeration data type. C is the intersection type of C and C. The cardinality of this type is infinite. The only member value of C that has exactly zero elements is also known by the special name C, aka C, aka I, aka C<∅>, which serves as the default value of the 4 types C<[|DH]Maybe> and C<[|DH]Set>. 
The single C value, which is a relation with zero tuples and a single attribute named C, is Muldis D's answer to the SQL NULL and is intended to be used for the same purposes; that is, a special marker for missing or inapplicable information, that does not typically equal any normal/scalar value; however, in Muldis D, C I, and it I equal to itself. To be more specific, the SQL NULL is very limited in what it actually can do, and can not be used to say anything other than "this isn't a normal value", similar to what Perl's "undef" says; if you want to actually indicate a reason why we don't have a normal value when more than one reason could possibly apply in the context, then using simply C or SQL's NULL can't do it, and instead you'll have to use other normal values such as status flags to keep the appropriate metadata. =head2 sys.std.Core.Type.Single This is an enumeration data type. C is a proper subtype of C where all member values have exactly 1 element. Its default value's only tuple's only attribute has the value C. The C type consists of all of C's values except C. =head2 sys.std.Core.Type.DHSingle This is an enumeration data type. C is the intersection type of C and C. Subtypes of C are also used to implement data-carrying database objects that are conceptually scalars rather than relations; for example, the current state of a sequence generator might typically be one. The cardinality of this type is infinite. =head2 sys.std.Core.Type.Array This is an enumeration data type. C is a proper subtype of C that has 2 attributes, and their names are C and C, where C is a unary primary key and its declared type is a C subtype (C can be non-unique and of any declared type). An C is considered dense, and all C values in one are numbered consecutively from 0 to 1 less than the count of tuples, like array indices in typical programming languages. 
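The view of an array as a binary relation with a dense, zero-based key can be sketched in Python. The attribute names C<index> and C<value> and both function names are assumptions for illustration; the relation is modelled as a set of (index, value) pairs.

```python
def array_from_list(values):
    """Build the relation form of an array: a set of (index, value) pairs."""
    return {(index, value) for index, value in enumerate(values)}


def is_dense_array(relation):
    """Check that indices are unique and run from 0 to len(relation) - 1."""
    indices = sorted(index for index, _ in relation)
    return indices == list(range(len(relation)))
```

Duplicate values are allowed because only the index acts as a key, while a gap or a repeated index makes the relation fail the density check.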
An C subtype is normally used by any system-defined N-adic operators where the order of their argument elements or result is significant (and duplicate values are significant); specifically, C defines an explicit ordering for C. Its default value has zero tuples. =head2 sys.std.Core.Type.DHArray This is an enumeration data type. C is the intersection type of C and C. The cardinality of this type is infinite. =head2 sys.std.Core.Type.Bag This is an enumeration data type. C (or I) is a proper subtype of C that has 2 attributes, and their names are C and C, where C is a unary primary key (that can have any declared type) and C is a C subtype. A C subtype is normally used by any system-defined N-adic operators where the order of their argument elements or result is not significant, but that duplicate values are significant; specifically, C defines an explicit count of occurrences for C, also known as that value's I. Its default value has zero tuples. =head2 sys.std.Core.Type.DHBag This is an enumeration data type. C is the intersection type of C and C. The cardinality of this type is infinite. =head1 SYSTEM-DEFINED REFERENCE TYPES These are the core reference data types. =head2 sys.std.Core.Type.Reference This is an enumeration data type. The C type is the maximal type of all Muldis D reference types. Its default value is a reference to the C data type by way of its C subtype. The cardinality of this type is infinity. =head2 sys.std.Core.Type.External This is a reference data type. An C is a reference within the Muldis D virtual machine to a value managed not by the Muldis D implementation but rather by a peer or host language in the wider program that includes the VM. All C values are treated as black boxes by Muldis D itself. The default value of this type is implementation-defined. =head1 SEE ALSO Go to L for the majority of distribution-internal references, and L for the majority of distribution-external references. 
=head1 AUTHOR Darren Duncan (C) =head1 LICENSE AND COPYRIGHT This file is part of the formal specification of the Muldis D language. Muldis D is Copyright © 2002-2009, Muldis Data Systems, Inc. See the LICENSE AND COPYRIGHT of L for details. =head1 TRADEMARK POLICY The TRADEMARK POLICY in L applies to this file too. =head1 ACKNOWLEDGEMENTS The ACKNOWLEDGEMENTS in L apply to this file too. =cut