Muldis::D TODO
---------------------------------------------------------------------------

Following is a summary of things that still need doing. It is specific to the Muldis D specification distribution only, and doesn't talk about things that would go in other distributions, including implementations. (But, look at lib/Muldis/D/SeeAlso.pod for a list of actual or possible implementations.) Alternately, this list deals with possible ideas to explore, which may or may not be good ideas to pursue.

The following list is loosely ordered by priority, and is organized into groups by approximate subject area, but list items may actually be addressed in a different order. There is no specific time table for these items; they are simply to be done "as soon as possible".

* Generally speaking, make a new release to CPAN once every week, assuming the progress is non-trivial, so there are regular public snapshots with nicely rendered documentation.

----------

* Update the development status of the Muldis D language spec to "alpha" from "pre-alpha" (but don't set the version to 1.0.0 yet) only when the Muldis Rosetta Example Engine reference implementation has fully implemented the language core, or a significant and computationally complete working subset thereof, and so the language spec is then considered sufficiently complete with corner cases exposed; Muldis Rosetta would also be updated to "alpha" status et al simultaneously. There are no other preconditions to consider either project "alpha" status. Curr est mid-late 2011 for this.

* Preconditions for considering the Muldis D language spec to be either "beta" or "released" status or "1.0" incl: A significant, computationally complete working core or subset thereof as a Parrot hosted language, a TAP speaking test suite with significant feature coverage, a serious level of post-alpha-status design input solicited of other interested parties, implementations over multiple SQL DBMSs. Curr est late 2011.
----------

* Add references to or adoptions of ISO/IEC 11404:2007(E) "Information technology -- General-Purpose Datatypes (GPD)" which could be very useful.

* ZEROTH PRIORITY... Make the system catalog into something much closer to a *concrete* syntax tree-like-thing. See various following TODO items for details. Mostly do this as its own spec release with minimal dialect/etc/other chgs, that is, dialect changes to fill in new slots not needed, but anything the catalog would break could be altered.

* FIRST PRIORITY... Reduce the general options for concrete value literals to have just the simple ones. For any given "x:y:z", remove all "y" but for Scalar where it is necessary; people can wrap a literal in an explicit TREAT assertion/etc otherwise if they really want to. Also remove all "x" where possible, so just the plain "z" is the only option, in general. So then, to write an integer/rat/text literal, the only option then is to say "42/3.25/'hello'"; you can't say "Int:42/Rat:3.25/Text:'hello'" any more. So cascade these simplifications, and we also free up the ":" mostly for other uses. This particularly applies to PTMD_STD, but we'll also simplify the Perl-STDs where possible, which is easier in Perl 6. Unlike "[$|%|@]:..." for generic value literals, only %|@ without the colon are in use, I believe, as prefix operators, meaning cast-tuple-as-relation and vice-versa, but there is no colon-less $ prefix in use nor does it make sense for any similar purpose. So, use $ for name literals, that is, "$foo" means "Name:foo" and then say "$<>foo" means "NameChain:foo". And then we're a long way towards being able to ditch the postcircumfix syntaxes for eg projection since, say "r keep {$foo,$bar}" is terse enough. Then come up with something for rename, maybe a set of name-pair literals. After this, keep postcircumfixes rare and common, like for array elements. In fact, we could just say foo[x] and bar{x} are then array/dict lookups.
Or the dotted forms are elem lookups and no dots are for slices, like Perl. And then ".attr" is then its own thing rather than being a shorthand for ".{attr}" although it may still be a function shorthand. * ALSO... Get rid of the Set.new() options in Perl 6 and generally update the Perl-STDs to use just Array/Seq/arrayref and mostly not set/bag/hash/etc, partly for code brevity but particularly to preserve the visual order of elements from source to catalog and back again. Also add scm_vis_ord to the catalog or change some catalog types to record order. * Maybe also but probably not yet be concerned w comments in Perl-STD/etc. * NEXT PRIORITY... Reformat all declarations, materials/subdepots particularly, to be of the format "name ::= kind ..." rather than "kind name ...". Similarly, we may be able to just nix the "subdepot" keyword so a subdepot is then declared as just "foo ::= {...}", and a material as "foo ::= kind ...". So the name on the left of the ::= is no longer part of the material node itself but rather is part of the larger thing into which the material node is composed; the material node is now just eg ['function', ]. Material nodes themselves just declare anonymous entities. * Example of function in Perl with context ... The FunctionSet tuple: ['cube','opt comment of FunctionSet',] The Function tuple: ['opt comment of Function',,] ... and so on. The comments are first so they're like leading comments as is common with whole-routine definitions and such, and they tend to be terser besides. * ALSO... Change function bodies from {...} to (...). And make "..." expr/stmt kind. * Likewise make a code comment a stmt/expr kind or otherwise provide for specifying where it goes visually in code in a statement position. Consider reworking scm_vis_ord to be external for some things it describes, eg mapping an order to a declared name. 
In fact, if this is done, then it becomes much easier to add/remove/reorder code pieces because their sequence numbers are stored separately and so code diffs on the system catalog itself may not show much in the way of spurious diffs, especially if the mapping is simply an array_of.Name. Consider pulling out code comments in a similar fashion, just putting them as their own named (as stmts/etc are named) code bits, which are then associated by name with other code bits, and are listed in the vis-ord too. In fact, then where comments are physically visible and what things they are semantically connected to are then not joined at the hip. The comment names are optionally user-specifiable too, like with statements. For example:

    cmt_on_x ::= `This roxors!`;

... or:

    comment cmt_on_x ::= `This roxors!`;

... and other details still to fill in like saying what it applies to. Maybe a new infix-op-bind-like syntax will fit the bill for association. Also make blank code lines or visual dividing lines recordable in syscat.

* ALSO... Use colons to separate any kind of heading/body pairs, both materials and values. Take Relation now "@:[...]:{...}" as example to follow. Also, routines now "function (...): (...)" or "updater (...): {...}" or "procedure (...): [...]"; this for routines is inspired by Python. This then opens the door for routine body bounding chars to be opt sometimes, and makes clearer where a heading ends and a body starts when there are various extra heading clauses such as is-x or implements x. Also consider using ":" in other places where pairs are, maybe freeing up => for something more specific; eg Python uses ":" in dicts rather than =>; or do the opposite; keep "=>" for named param/attr/etc lists and use the ":" for things like Bag literals or generic dicts that are binary relations ... use one for atvl:atvl (bags/dicts), other for atnm:atvl (tuples, arg-lists). Also consider using "::" for something, maybe type conversion, as Pg does.
Keep "::=" as for explicitly associating names with what they are naming. * Have ";" as separator (opt lead or term) for both statements/vars/exprs etc as well as whole materials. This also comes together nicely for the simpler routines that don't need to have bounders because they are just single statements or expressions, for example: cube ::= function (Int <-- topic : Int) : topic ^ 3; ... and that's it. * Remove the "var" and "attr" keywords or make them optional noisewords. Simply having "foo : bar" in a procedure statement position should be enough to know it is a variable declaration. Likewise, "foo : bar" in a sca/tup/rel typedef can be known an attr def. Then, other things can gain optional noisewords, such as "result" before the type in function sigs, or "param" before a param in routine sigs, or "expr" or "stmt" optionally before those things in a routine, etc. * Update system catalog, if necessary, to support specifying where a named expression, or a variable declaration, lives visually in a statement list. * THIRD PRIORITY... Add support for material and parameter synonyms. And change what params any positional arguments implicitly go with from topic|other to 0|1|... But don't actually change any routines/params until later, except adding 0|1 to all topic|other. * Update the array-specific postcircumfix concrete syntaxes to make them more generic such that the array index/es (what's inside the "[]") may be any arbitrary value expression rather than having to be an integer or interval literal. But if nothing else changes, this means the slice will have to be spelled like "ary[{x..y}]" rather than "ary[x..y]", but individual element access like "ary.[x]" will still work. But now you can actually have the x,y variables rather than those having to be constants. * Consider taking a more Perl 6 like approach by turning ".." and its 3 friends into infix dyadic functions that take endpoint values and result in interval values. 
Then the surrounding curly braces are no longer needed, and you can once again say "ary[x..y]". Note that if ppl still want/need delimiters for an interval, they can always use parens, like "(x..y)". If we also redefine an MPInterval to be a set_of.SPInterval, then any {x..y} would unambiguously mean either a set or MPInterval, but we may then lose the shorthand "x" meaning "x..x", but this could be ok tradeoff.

* Consider also making the likes of "," and "=>" into dyadic functions along the lines of Perl 6, though this would have further consequences.

* Demote the numeric operators that are more statistics-oriented from the language core into a new Statistics extension or some such. Specifically this means these 5 in [Numeric|Rational|Integer]: range, frac_mean, median, frac_mean_of_median, mode; and these 2 in Integer: whole_mean, whole_mean_of_median. Also, this "mean" is "arithmetic mean" (division of sum); there is also "geometric mean" (root of product), etc. After the demotion, this set of ops can be changed or expanded to be something more appropriate for statistical applications; some yet-missing SQL-standard functions like pop-etc can then come in also. Now these core-removed functions are just shorthands for not-too-complicated expressions that users can define for themselves with core ops, so they're not really missing anything important if they only get the core. For example, the current (arithmetic) mean is just:

    arith_mean ::= function (Rat <-- topic : bag_of.Rat) ([+]topic / #+topic)

... and geometric mean, being the root of the product, is something like:

    geom_mean ::= function (PRat <-- topic : bag_of.PRat) ([*]topic ** (1 / #+topic))

... but any vers in the dedic Statistics could be impl more efficiently.
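As a cross-check on the two "shorthand" means discussed above, here is a minimal Python sketch, not Muldis D and not part of the spec. It models a bag as a collections.Counter, reads "[+]topic" as the sum over all members with multiplicity and "#+topic" as the member count, and takes the geometric mean as the count-th root of the product. The function names mirror the TODO item; everything else (the Counter representation, the error handling, returning a float for the root) is an assumption made for illustration.

```python
from collections import Counter
from fractions import Fraction
from math import prod


def member_count(bag: Counter) -> int:
    """Count of members including duplicates (the role played by #+topic)."""
    return sum(bag.values())


def arith_mean(bag: Counter) -> Fraction:
    """Arithmetic mean: sum of all members divided by the member count."""
    count = member_count(bag)
    if count == 0:
        raise ValueError("mean of an empty bag is undefined")
    # Each distinct value contributes value * multiplicity to the sum.
    total = sum(value * n for value, n in bag.items())
    return Fraction(total) / count


def geom_mean(bag: Counter) -> float:
    """Geometric mean: count-th root of the product of all members."""
    count = member_count(bag)
    if count == 0:
        raise ValueError("mean of an empty bag is undefined")
    if any(value <= 0 for value in bag):
        raise ValueError("geometric mean here assumes positive members")
    # Each distinct value contributes value ** multiplicity to the product.
    product = prod(value ** n for value, n in bag.items())
    # A root of a rational is generally irrational, so fall back to float.
    return float(product) ** (1.0 / count)
```

For a bag holding 1 twice and 4 once, arith_mean gives 6/3; for a bag holding 2 and 8 once each, geom_mean gives the square root of 16.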
* Drop special entity name embedded support for inline type declarations like "foobag : bag_of.Foo"/etc; instead, this syntax is demoted to a dialect-specific thing that is just sugar for something like "foobag : relation-type Bar { attr value : Foo, attr count : PInt, primary-key { value } }". That way, we can always point to a specific material that actually exists when asked what is the declared type of "foobag", and also we are psychologically more free to just declare things as relation types anyway, and the added flexibility that comes with that, such as in the definition of the system catalog itself, and also then the concept of an entity name chain is no longer overloaded. * Generalize the Set/Array/Bag/Maybe-specific operators so that: 1. the names of the value/index/count attributes can be specified with arguments (that are optional, and default to the current ones if not given); 2. they work with relations of arbitrary degree. For example, merge the Counted extension into Bag and call it Counted, and generalize Array into Ranked ("Ordered" is already taken and best left as is) which also absorbs the ranking and quota functions from Relation.pod, and generalize Set into Relation. The Counted|Bag is then any 1+ degree relation with a positive-integer typed attribute C that has a key (or superkey) on all of the attributes except for C; it is treated as special by the functions, which are analogies to general relational functions that work as normal on all attributes but C and merge C. The Array|Ranked is then any 1+ degree relation with a nonnegative-integer typed attribute I that has a key on I and is further constrained that "max(r{I})+1 = #r"; I is treated as special by the functions. The Maybe is then any relation with a nullary key. With these generalizations, some concrete syntax like .[N] will just compile into special cases such as assuming certain special attribute names, and you can use the foo() syntax when that isn't the case. 
After these generalizations, some Counted|Array|Maybe|etc functions can be core and others can be pushed into extensions, as is appropriate. After these generalizations, we may or may not still have named Array|Bag|etc types, which will probably keep their definitions, as special cases of the generalized where the attribute names match the canonical ones. Also rename "index" to "rank" in Array perhaps. After the generalizations, the distinct usefulness of Set would decrease somewhat. Note: For a generalization of Maybe, consider the Zoo name, inspired by Database Explorations that discusses MD's canonical missing info solution, or alternately call it C01 in the spirit of D0C0/D0C1/D0. Still in question is what if anything to change about [S|M]PInterval/etc. * Add official support for functions/expressions to be able to do some things that they otherwise couldn't, such as have side-effects or be quasi-non-deterministic. To be specific, add support for side-effects that occur external to the current in-DBMS process, such as output via some side-channel like STDERR or a message queue, which can be used for debugging a function. But any such functionality can't directly affect the current process, and in particular it can't affect the function/expression's result value. On the other hand, it is acceptable for something to cause the function/expression to abort with a thrown exception, since this isn't changing the result value. There should be metadata for any function which does or might do something like this, to declare the fact. In addition, we could support a limited form of non-determinism, such as allowing a rand() or now() function that does affect the calling function/expression's result, but that this is constrained to be mutually deterministic within the whole of a single Muldis D multi-update-statement. 
That is, given the same arguments (or none), now() would always return the same value within a multi-update-statement, and might only change between different multi-update-statements, and rand() likewise. This might also give some support for partial-sort functions, as long as they are consistent within multiple calls in the same multi-update-statement. Once again, such things would need to be tagged with metadata. Normal deterministic functions always have the same result no matter how far apart.

* Make autonomous transactions / in-DBMS processes not so much startable directly by a process but rather that the kernel/etc process always does it directly and any other process asks to have such done by sending a message to the kernel. Similarly, DBMS-clients just become message passers, and they start a process the same way as internally, by sending a message to the kernel/etc to please call this procedure for me, and the result to the client is also a message. This also generalizes the stateful/stateless thing and streaming/cursor or not thing. Now also tied into this is stimulus-response-rules, in that all stimuli are messages. The kernel can also initiate messages, such as this depot did mount, or whatever.

* Consider relaxing the restriction of how much of a depot must be defined just in terms of itself. So, for example, only a depot's data types (and dbvar) must be defined wholly internally to the depot. But any routines in a depot may invoke routines outside of the depot if the former aren't used in the definition of a data type or dbvar.

* Consider adding some way of generating a type specification from a value of that type and consider having something like a system catalog which describes the actual database value rather than a prescribed database type, such as to help introspection of a database whose declared type is just 'Database'. The MST thing of TTM may tie into this. See also how the "Pick" DBMS works, or something.
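Returning to the limited non-determinism idea for now() and rand() a few items up: one way to picture "mutually deterministic within a single multi-update-statement" is a per-statement memo table, so that the same function with the same arguments yields the same value for as long as the statement is running. The Python sketch below is only an illustration of that reading; the StatementScope class and its call() method are hypothetical names, and nothing here is taken from the Muldis D spec or any implementation.

```python
import random
import time
from typing import Any, Callable, Dict, Tuple


class StatementScope:
    """Hypothetical stand-in for one multi-update-statement's lifetime."""

    def __init__(self) -> None:
        # Maps (function name, argument tuple) to the value first produced.
        self._memo: Dict[Tuple[str, Tuple[Any, ...]], Any] = {}

    def call(self, name: str, fn: Callable[..., Any], *args: Any) -> Any:
        """Run fn once per distinct (name, args); replay the result after."""
        key = (name, args)
        if key not in self._memo:
            self._memo[key] = fn(*args)
        return self._memo[key]


def now(scope: StatementScope) -> float:
    """Wall-clock time, frozen for the duration of the given scope."""
    return scope.call("now", time.time)


def rand(scope: StatementScope, arg: int = 0) -> float:
    """Pseudo-random value, stable per argument within the given scope."""
    return scope.call("rand", lambda _arg: random.random(), arg)
```

A fresh StatementScope would be created for each multi-update-statement, so values may differ between statements but never within one.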
* Add a scm_foo to the system catalog next to any place that declares a DBMS entity name, particularly an expr/var/material, to indicate whether the declared name is considered explicitly user-specified or parser-gen. There may be more than 2 possible values (making this an enum rather than a Bool) that relate, say, to distinguishing explicitly named but inlined items versus explicitly named and not inlined items. The sys-cat might restrict based on this such that it doesn't allow certain references to entities whose names are marked parser-generated, because any generated source code would have to make the references visible. A related implication is that any entity names marked as generated are not sacred and are free to be automatically renamed by different catalog-updating actions such as source code optimizers. Maybe also have something to distinguish things declared in positional format so "0"=> etc don't appear, maybe. * Numeric updates ... See http://archive.adaic.com/standards/83lrm/html/lrm-02-04.html . Excise the M;N format for bases 17..36 leaving just 2..16, absolutely. Use "#" as separator rather than ";". Write M as a base-10 integer rather than a single character. These are more like Ada "based" literals then, read better, frees up ";". So 16#FF is an integer, 16#'FF' is a blob. Maybe also add Perl-6 inspired commalists, like this: 60#[43,5,12] (integer); no good reason for a blob analogy. * IN PROGRESS ... Rewrite/update anything talking about matters affected by process isolation to both declare that Muldis D is generally orthogonal or agnostic to such matters and makes no guarantees in general that any routine, even a recipe or updater/function invoked by one, will see a consistent view of the database during its execution, and generally remove "atomic" terminology, and rename "nested transaction" to some other terminology. 
Rather, any guarantees of serializability of a recipe/etc will need further work by users such as to explicitly configure their isolation or locks or whatever as appropriate, and of course everything's affected by what DBMS you use and what concurrency models it supports, such as locking or MVCC. Likewise the model being used affects when conflict errors may manifest, eg at commit or earlier, or when/if user tasks will block, or how complicated it is to resolve or avoid a conflict. Matters of the concurrency model or isolation are best not legislated by Muldis D but be left up to the implementations and users. Muldis D just has to require that the database is always in a consistent state on statement boundaries et al. * Define how one can split a PTMD_STD depot into multiple text files since you would conceptually put an entire potentially large program in one. * Tweak the STD dialects to account for defining system modules with them. * Update STDIO.pod and Cast.pod concerning the Text types split. * PACKAGE: - Support variant of "<[ a..z A..Z _ ]><[ a..z A..Z 0..9 _ - ]>*" nonquoted name strs that's more liberal "<[ a..z A..Z 0..9 _ - ]>+" for just atnms, possrep names, param and arg names, so any of "-foo", "3", "-4" can be bw. - Update system catalog and grammars to add lightweight aliasing support for whole materials, as a new "synonym" (name?) material. These have no mutual order but the actual non-synonym target is the "primary" name. Grammar can be "synonym foo of nlx.lib.bar" et al in general form, or "function foo|bar|baz (...) {...}" where original is the first one "foo" and the other synonyms all live in the same subdepot, and in particular the others are "not" inner materials of "foo". - Also add [Integer, Rational, Boolean], make [Int,Rat,Bool] into synonyms. - Likewise (and necessarily), subdepots themselves can have synonyms. 
- Also update tuple (and by extension) database types to add attribute synonyms which semantically are lightweight virtual attribute maps that simply make 2 attributes always-identical so only one ever needs storing and no map function is required. Not the same as material/sdp synonyms.
- Update system catalog and grammars to add support for routine parameter aliases, built-in to the definitions of the routines; all names for a param are defined in an array, that ordering being source-code-metadata, and the first item in the list being the "primary" name. Grammar can be "function foo (Int <-- topic|0 : Int, other|1 : Int) {...}". This is not supported for param names in generic expr context except for the shorthand "=>foo", so "=>1" is allowed.
- Change grammar so any number positionals supported for both s-d and u-d, always map to "0","1".."N" and *not* "topic","other". Also, any ".foo" now is short for "0.foo" rather than "topic.foo".
- Consider changing param names of special routines like value-filter etc, or at least change any "topic" to "0" (other "1") so it works with ".foo".
- Update the documented signatures of all system-defined routines to use the updated grammars reflecting the above additions. Add param aliases of "0" and "1" for every "topic" and "other" respectively, keeping said old names too, and add other aliases as appropriate. Add routine synonyms for every distinct way of spelling a routine that rtn-invo-alt-syn provided, so one can then always use that spelling in "foo(...)" plain-rtn-inv syntax; update all routine docs so that the "also known as" comments no longer mention any declared synonyms, no longer mention anything as "C" but rather just anything as "I".
- Just stick to that, basically, leave anything else such as Unicode or rtn-invo-alt-syn alone/not-removed for this release.
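The parameter-alias part of the PACKAGE above can be pictured with a small Python sketch. It assumes each parameter is declared as an ordered list of names whose first entry is the "primary" name (as in "topic|0"), and that positional arguments are keyed "0", "1", ... before lookup. The bind_arguments function and its error choices are hypothetical and only illustrate the resolution rule, not any actual catalog structure.

```python
from typing import Any, Dict, List, Sequence


def bind_arguments(
    params: List[List[str]],
    positional: Sequence[Any],
    named: Dict[str, Any],
) -> Dict[str, Any]:
    """Resolve arguments to primary parameter names via declared aliases."""
    # Build alias -> primary lookup; the first name in each list is primary.
    alias_to_primary: Dict[str, str] = {}
    for names in params:
        for alias in names:
            if alias in alias_to_primary:
                raise ValueError(f"alias {alias!r} declared twice")
            alias_to_primary[alias] = names[0]

    # Positional arguments are treated as if named "0", "1", ... "N".
    supplied = [(str(i), value) for i, value in enumerate(positional)]
    supplied.extend(named.items())

    bound: Dict[str, Any] = {}
    for name, value in supplied:
        if name not in alias_to_primary:
            raise TypeError(f"no parameter is named {name!r}")
        primary = alias_to_primary[name]
        if primary in bound:
            raise TypeError(f"parameter {primary!r} given more than once")
        bound[primary] = value
    return bound
```

With params declared as [["topic", "0"], ["other", "1"]], the calls foo(3, 4), foo(3, other => 4) and foo("0" => 3, "1" => 4) would all bind topic to 3 and other to 4 under this sketch.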
* Make the fully qualified language names declarable by code to be more flexible than the names declared by the language spec itself, so that the ones in code can specify multiple language versions that they conform to, as if the code is declaring that it only uses the parts of the language spec that are unchanged or that intersect between all the specified versions. For example, let one say: Muldis_D:"http://muldis.com":{0.112..0.125,0.127..0.136}:PTMD_STD ... and then any implementation which takes one of those versions will parse the code according to any of those same that it supports. The idea is to make code more easily compatible with a wider range of interpreters, such as newer ones designed for version 25 who don't explicitly know how to emulate version 23 or know what its differences are, and are just trusting the code to be valid for version 25 even if declaring 23; by the code declaring a range, it is declaring it is willing to take its chances. - A multiplicity of stated authorities is also possible. - A number of consequences still have to be thought out here. - Specifying a closed range is the code saying it *knows* it is compatible with all those versions, specifying an open range says take chances or is not recommended and maybe won't be supported. * Consider adding a midweight version of virtual-attribute-maps which is like the fullweight version but that it expressly maps 1 attr to 1 attr; it still uses a map function but that is no longer Tuple<--Tuple. 
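For the multi-version language name idea at the start of this group, a rough Python sketch of the matching step may help when thinking through the consequences. It assumes version numbers compare component-wise as integers (so 0.112 < 0.125 < 0.127), that ranges are closed and written "low..high" inside braces as in the example name, and that a bare version stands for a one-element range; the last point and all function names are assumptions, not anything the spec currently says.

```python
from typing import Iterable, List, Tuple

Version = Tuple[int, ...]


def parse_version(text: str) -> Version:
    """Turn '0.112' into (0, 112) so versions compare component-wise."""
    return tuple(int(part) for part in text.strip().split("."))


def parse_ranges(text: str) -> List[Tuple[Version, Version]]:
    """Parse '{0.112..0.125,0.127..0.136}' into closed (low, high) pairs."""
    body = text.strip().strip("{}")
    ranges: List[Tuple[Version, Version]] = []
    for item in body.split(","):
        low, sep, high = item.strip().partition("..")
        if not sep:
            # Assumption: a bare version means the degenerate range x..x.
            high = low
        ranges.append((parse_version(low), parse_version(high)))
    return ranges


def usable_versions(declared: str, implemented: Iterable[str]) -> List[str]:
    """Return the implemented versions that fall inside any declared range."""
    ranges = parse_ranges(declared)
    return [
        candidate
        for candidate in implemented
        if any(low <= parse_version(candidate) <= high for low, high in ranges)
    ]
```

An implementation supporting 0.111, 0.125, 0.126 and 0.130 would, for the example declaration, be left with 0.125 and 0.130 to choose from.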
* Demote the "[array|set|etc]_of" types from a special concept knowable by the backend (and explained in Basics.pod), where you can essentially use some data types without them being declared as system catalog materials, so that instead actual s-c materials *are* required; this syntax will remain only as a dialect feature which is a shorthand for inline type definitions; eg, these are now all equivalent: - param : set_of.Foo - param : relation-type { over tuple-type T { value : Foo } } - param : set-type over Foo ... or we might consider more material kinds specif to [set|array|etc]-type so to help preserve the user's syntax and be more compact, maybe, but those 6 or so could probably be repr by single m-k which has an enum type attr. Also thanks to the change about replacing N-adic with dyadic s-d routines, and its precedent, there is less need for "foo_of" shorthands anyway. * Externalize all the details of character string repertoires or encodings from the Muldis D core, such that say all the details of Unicode become part of a Muldis D extension instead, and maybe ASCII likewise. More plans pending. * Considering the following items where non-ASCII chars are much more pervasive (though strictly optional), replace the "op_char_repertoire" pragma with a pragma that affects all non-quoted code in general, including all nonquoted (but not quoted) entity names. The options would be, at least, the 3: ASCII, Unicode_6.0.0_canon, Unicode_6.0.0_compat. There would separately be options for each kind of quoted character string: quoted entity names, texts, comments; see later TODO item about this; as per that, all of these could be part of a single pragma. A simpler implementation could support only ASCII across the board as literal characters, while non-ASCII data could be supported as escape sequences. 
* Enhance the cat-type/syntax for defining tuple types as attr lists (and by extension, relations and scalar possreps) to let users provide an optional hint for the order that tuple/sca-pr/etc attributes should be consulted when doing an equality test between 2 tuples/etc so to direct the DBMS to do the least expensive comparisons first, eg integer attributes, prior to more expensive ones, eg blob attributes; since the test short-circuits, and assuming the vast majority of compares would return false, this should aid performance in a clean way without users resorting to overload operators or something for performance reasons. This is a separate hint from that garnered by marking relation attrs as key attrs, and could work within that eg to suggest order within multi-attr keys. Other areas in the language could probably be assisted by hints also. * Update the PTMD_STD grammar to split up the "Name_payload" or its parts further so that, rather than just the 2 "[|non]quoted_name_str", there is at least the additional "nonquoted_rtn_invo_name_str" which is only allowed to be used in a routine invocation context like (...), with trailing parenthesis, and not in a context lacking trailing parenthesis. A "nonquoted_rtn_invo_name_str" is a nonquoted string containing no whitespace and, in addition to all the chars nonquoted_name_str allows, also many other symbolic chars such that wouldn't confuse the parser, so bracketing chars would likely be disallowed, at least as leading or trailing characters in the string, and trailing colon could be disallowed, and leading comma or leading => etc. The idea here is that people can then write "+(foo,bar)" for addition or "++(foo)" for increment, or "=(foo,bar)" for comparison, "@(t)" or "%(r)", or ":=(target,value)" for assign. In this case, if infix ops are allowed, they'd have to have mandatory surrounding whitespace. 
We also generally have to revisit Unicode for what is allowed in bareword variable/etc names such as non-Latin or accented letters in general. The parser would have to use Unicode character classes in its definitions, then. Look at what Perl 6 does for some guidance. As per another change, also assume that the idea of the internal catalog no longer using Unicode for sys-def entity names is no longer true. So Muldis D would then much more be Polish notation (with parens) by default, and it should be much easier to just use the whole language that way when it is more terse like this. Supporting polish without parens would be up to rtn-inv-alt-syn replacements while above is in plain-rtn-inv. See also the 2nd(+?) next TODO item on splitting rtn-inv-alt-syn. Also add yet another nonquoted...name_str that is just for use with attribute/param/arg names and is only slightly less restrictive than the old nonquoted_name_str in that it also allows strings of just or leading digit chars; this is mainly so one can write positional params wo quotes. Maybe just this last one can be added ASAP, and the other wait longer.

* Consider creating a branch of the Muldis D spec (and of the Muldis D Manual) which retains all of the current spec features, and subsequently strip out the whole rtn_inv_alt_syn catalog abstraction level in trunk so that we can more radically evolve the language design at the more fundamental level which plain_rtn_inv has access to, without worrying about clashes or the complexity of a dozen-plus-precedence-level grammar. Ideally the more fundamental level can evolve to the point that a lot of what rtn_inv_alt_syn offers is no longer necessary in practice with regards to making the code more terse. The branch would merge in the more fundamental changes with the old retained rtn_inv_alt_syn to see how they might look together, or show how the new is absorbing the old; ideally their differences would reduce over time without the branch losing features.
In the interest of marketing, the reduced trunk would retain all or much of the example code using the then-removed features, as well as gain ones using not yet specced features. Each examples section would potentially be split in 2, with the normal "Examples" just using the reduced spec features and a new "Potential Future Examples" having anything not yet specced. Also, the 3 Dialect files wouldn't actually lose the rtn_inv_alt_syn precedence level but rather it would be made impotent as the grammar would just define it as a non-proper superset of plain_rtn_inv for now; mainly the change is that the 2 main pod sections "FUNCTION INVOCATION ALTERNATE SYNTAX EXPRESSIONS" and "IMPERATIVE INVOCATION ALTERNATE SYNTAX STATEMENTS" would be removed, or alternately stripped down to collection of "Potential Future Examples" sections with a bit of commentary to explain if needed. * The new version may be a lot easier to learn, considering that SQL + many other C-like languages actually don't have too many non "f()" format ops. Perhaps the main use of rtn_inv_alt_syn later is for people that want their code to look like math/logic/etc exprs rather than named function calls. IDEA: Split rtn_inv_alt_syn into 2 abstraction levels where the lower one has just 1-2 dozen or so plain prefix/infix ops such as [:=, =,≠,!=, <,>,≤,<=,≥,>=,--,++, not,!,and,or,xor, +,-,|-|,*,/, ~, @,%,#] and few are allowed having modifiers or that aren't in most languages. Likely disallowed in lower level are [<=>,abs,div,mod,exp,^,**,log], the other math ops, all other or Unicode variants of logic ops, all hyper-ops including hypers of := or !, practically all relational/set/array/etc ops including membership or sub/super tests. As a middle-ground, for which we could probably have a middle-third level from the split, are all the postcircumfix ops that do restricted-to-constants shorthands of the likes of array element access, projection, rename, un/group, un/wrap etc. 
Things like the full set of infix logic ops are reserved for the highest level, and likewise for the majority of Unicode ops and their ASCII-symbolic versions. Now assuming we get generic (...) in plain-rtn-inv, and so "+(foo,bar)" etc is an option, then we should reprioritize the above 3 post-split levels so that a level adding just postcircumfix syntax for project/group/ary-acc/rename/etc should be the lowest additional level, so one can say "foo{...}" without also needing support for foo+bar. Maybe call that new lowest "rtn_inv_pcfx_alt_syn". Making postcircumfix the lowest alt syn is also fitting because it is just like some of the lower levels such as code-as-data where using some syntaxes makes certain inputs hard-coded, such as the attr names or interval-endpoint-flags, versus those taking variables in the more verbose generic syntaxes. Presumably all levels higher than rtn_inv_pcfx_alt_syn are plain infix or paren-less prefix with fully-variable arguments like generic functions.
* Maybe this isn't feasible, but ... Consider formally making every function map 1:1 from a tuple input to a tuple output; it declares exactly 1 parameter that is a tuple type and its result declared type is a tuple type. Consider making every updater formally do something analogous, such as having exactly 2 tuple-typed parameters where only 1 is subject-to-update. A recipe is like that but has 4 tuple-typed parameters, 2 like updater and 2 global alias analogies. A virtual attribute map kind of resembles this already. Doing this would require making tuple attribute accessors special, their own expression/etc node kind and not just a function ... though they kind of are already as an alternative; also, variable assignment would have to be a special node kind and not just an updater; in both cases, to save their definitions from being mutually recursive.
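A rough illustration of the tuple-in, tuple-out idea in the item above, written in Python rather than Muldis D and not part of any spec. Dicts stand in for tuples; the names `Tup`, `sum_fn` and `increment_updater`, and the attribute names "0", "1", "result", "counter" and "step", are all hypothetical.

```python
from typing import Any, Dict

# Hypothetical stand-in for a Muldis D tuple: attribute name -> value.
Tup = Dict[str, Any]


def sum_fn(args: Tup) -> Tup:
    """A 'function': exactly one tuple-typed parameter, tuple-typed result."""
    return {"result": args["0"] + args["1"]}


def increment_updater(upd: Tup, ro: Tup) -> Tup:
    """An 'updater': two tuple-typed parameters, only `upd` subject to update.

    Modelled here as returning the new value of the subject-to-update tuple
    instead of mutating it in place.
    """
    return {**upd, "counter": upd["counter"] + ro["step"]}


total = sum_fn({"0": 4, "1": 5})
state = increment_updater({"counter": 10}, {"step": 3})
```

Note that even this sketch reads attributes with `args["0"]`, a built-in accessor rather than a call to another tuple-taking function, which is the mutual-recursion concern the item raises.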
----------
* In all 3 STD.pod, add code examples for each of these 4 material kinds: scalar-type, domain-type, subset-type, mixin-type.
* In all 3 STD.pod, complete the description text, defining interpretation in PTMD_STD and structure in the 2 Perl-STD, for each of these 7 material kinds: scalar-type, tuple-type, relation-type, domain-type, subset-type, mixin-type, subset-constraint.
* In all 3 STD.pod, populate the entire pod sub-section for each of these 2 material kinds, to provide concrete grammar, description text, and code examples: distrib-key-constraint, distrib-subset-constraint.
----------
* Eliminate the simple monadic postfix special syntax category. Convert ++ and -- into simple prefix ops, because an expression with that in it is no longer end-weighted, and it would be less likely to confuse people into thinking the op is variable increment rather than just returning a result. Removing the category also simplifies the parser as there are no longer pre vs post precedence conflicts, and helps open the door to the parser being more generic. Simply eliminate postfix "!" factorial or change it to prefix "fact".
* Update Basics.pod or other places to distinguish between the 2 main ways that a type can be infinite, such as with "outwardly infinite" and "inwardly infinite"; the latter is when any 2 values have an infinite number of others between them, so eg a time-of-day type could be infinite in the inward sense but not in the outward sense; the result type of sin() likewise. Also, the singleton types -Inf, Inf only refer to outwardly infinite types.
* Change the basic exception throwing mechanism from a function/procedure to its own expression/statement node kind. Call the new node kind "fail" or "failure" or "throw" or "raise" or something. The "fail" node has a child expression node or references a variable node which defines an Exception value.
Simply evaluating a "fail" expression node will throw the exception so a "fail" expr node is expected to only be the child of a short-circuit expression like ??!!.
    - Add a "fail" term, which throws a generic/default Exception value, and/or a tight-binding "fail" prefix-keyword which takes an Exception arg; that term/prefix is the concrete syntax for the new fail node.
    - The "assertion" function can then go away; instead of writing [$foo asserting $foo != 0], say [$foo = 0 ?? fail !! $foo].
    - Add a few simple functions that each result in a kind of generic Exception value. At least have a niladic one for the most gen exception. Then one could write [ ?? gen_exception() !! ].
    - The treated() function then is just a wrapper over ??!! + isa.
    - The fail() procedure will go away, replaced with a term/keyword also, which maps to the "fail" statement node.
    - Maybe use 'fail' for niladic term and 'raise' for prefix term?
    - New keyword spellings:
        - failure
        - raised
        - fail
        - raise
    - Maybe alternatively, make an assertion into a lexical entity that is like an expr node but doesn't have its own node name, and so is always used either inline or offside, the main point being that users don't have to come up with another node name when the node represents the same value as another node and should naturally just have the same name. Example: foo ::= ... asserts bar( foo ) baz( foo ) ... here, the assertion only happens when baz() is going to be evaluated; the spelling is "asserts" since it should be an adjective.
    - There also needs to be a version that can assert multiple exprs.
    - Or actually, the ??!! version may still be better?
    - Naming the "duplicate" isn't actually that hard; just use a leading underscore, eg: _foo ::= foo asserting bar _foo ::= bar ?? foo !! failure ... so maybe that's best?
    - A BIG THING TO CONSIDER HERE IS, HOW DO FUNCTIONAL LANGUAGES MAKE ASSERTIONS ON COMBINATIONS OF ARGUMENTS ... OR IS THE ANSWER THAT ALL FUNCTIONS HAVE EXACTLY ONE ARGUMENT?
SEE WHAT HASKELL/ETC DOES. * Change generic assertion mechanism from a function/procedure to its own * Add support for materials to have aliases. But this kind of alias would be simple, just an alternate unqualified name that exists in the same namespace and is for the same material. Aliases would be declared with an "aliases" attribute, typed set-of-Name, held directly in the same catalog types that have "name" attributes; for example, add it to the "FunctionSet" type. So, R.count becomes a simple alias for R.cardinality, and we can add a whole bunch more aliases, so to make it friendlier for people who prefer to call routines with foo(x,y) syntax rather than alternate symbols. A common use could be to provide both "prefix" and "infix" reading names, such as both "product" and "multiply", and especially to give shorthands. Example: "function product|multiply|mul (Int <-- x : Int, y : Int) {...}". The first one in the list is the primary name, remainder are the aliases. Or actually, it would probably be better for FunctionSet et al to *not* internalize aliases, but rather have each alias exist as a separate material which cites what it aliases. And then that version could exist in any public namespace (usually nlx), and not just the same subdepot as what is being aliased. The SYNONYM schema object of Oracle and other dbs corresponds to this, and maybe "synonym" is what I should call mine too, being what the specific material kind is called, leaving "alias" as a more generic term. Even if we have separate synonym materials for routines/etc, one can still declare them bundled into their originals like in the above foo|bar example as that would just be a dialect shorthand but produce separate materials. Also useful in support of users having their own home subdepots which have aliases to the things they use, without them having to know where they are. 
Add alias for every 'op' node 2nd element for a routine, meaning eg add "+" and "⋈" as aliases, and so then a Muldis D parser can then produce calls to those, as if one said `"+"(4,5)` or `"⋈"(foo,bar)`, and so we can better remember the individual syntactic choices that the users made. But then, how do we deal with the idea of making logical-not into a meta-op so that there is no actual is_not_same|"≠" function etc; how do we preserve user's individual syntactic choices then? So think about that. While SQL synonyms can also be used for relvars, mine would probably only be used for materials - types, routines, stim-resp-rules, themselves, etc; perhaps leave relvar aliases to be handled by virtual attributes.
* With the improvements from having aliases or supporting "+"(x,y) etc, and other language improvements, it becomes a lot more feasible for users to be satisfied with "plain_rtn_inv", that being sufficiently terse, and so there is less need for "rtn_inv_alt_syn" to be implemented or available.
* Maybe also treat material names like `function "infix<+>" (...) {...}` as special such that if a parser encounters a random "foo + bar" then it would parse it as if it were `"infix<+>"(foo,bar)` maybe I guess. But if this is going to work in a general sense, including for user-defined things, then general format rules have to be set out for the parser so that if it sees anything like X, without knowing what ops are declared, then it treats it as an operator rather than some other construct. On the other hand, we're sure to run into trouble in trying to support non foo(x,y) syntax for user-defined operators (besides those overloading system-defined virtuals), and so better off just not doing this period; "infix<+>" is not special.
* Add support for routine parameters to have aliases, that is, for a named parameter to be able to bind with a named argument where the argument may have several possible names.
One use for this would be to support parameters where it is desired to refer to them within their routine using one name, but to use a different name in the argument, such as because the latter is shorter or reads better (the Perl 6 spec should have some examples of this). Another use for this is to provide better support for mixtures of arbitrary numbers each of positional and named routine arguments; any parameters that would be reasonable to have a positional argument would have 2 names, where one is an integer and one is text. All Muldis D grammars would be updated to no longer consider 'topic' and 'other' as special, which is a contrived notion, and instead consider '0','1',... special. And so, for all system-defined or user-defined routines, any `op(foo,&bar,baz)` would be parsed into the same thing as `op("0"=>foo,&"1"=>bar,"2"=>baz)`, and `.name` would be `"0".name`. Now it will so happen that "topic","other" will be commonly used in parameter names, typically paired with "0","1" but we can now be a lot freer to name parameters something more descriptive, such as "addends", and not artificially make them topic/other simply so they support positional syntax. An idea for declaration syntax when aliases exist is to use the "|" char; eg `function foo (Int <-- topic|"0" : Int, other|"1" : Int)`. Of course, this complexity is only in param lists; arg lists are unchanged and still are plain tuples with a single name per attr/arg. For simplicity, a single param name will be more important than the others, and only that would be its "expression node name" or "variable name" within its routine, by which it must be referenced; therefore, the current system catalog for declaring parameters can remain unchanged, and new rtn-decl-type rtn-heading-attrs can be added to declare aliases. 
Largely for flexibility, and correctness where they don't make sense, parameters will never automatically have a number alias, but rather only when the routine definer explicitly gives it one. Of course, these aliases only apply to regular params, not global params. One result of this change is that the Muldis D grammars will no longer consider positional ro and rw args in separate spaces such that they can appear in either order; now all positional args must be in the correct mixed relative order, as there is only one "0", not one per ro and rw. * Add special syntax for more ops: - ?#foo - "has 1+ elements" - is_not_empty(foo) - !#foo - "has zero elements" - is_empty(foo) - foo :=!# - assign_empty(&foo) ... and maybe rename underlying routines in the process. * Update the mixins feature to add support for mixins that define attributes that types can compose, whereby we support some approximation of "specialization by extension" while still actually being just "specialization by constraint". Maybe also it could be said ... A primary purpose of mixins is to help with managing software reuse, mainly when multiple types have a number of attributes in common, a mixin can define these and then the multiple types can compose that mixin. A mixin or type that composes a mixin can both add additional attributes of its own to what the mixin defines, and the composer can add extra constraints over the composed attributes like forcing a subtype. Maybe also do ... Support delegation / 'handles'; for example: - Name explic delegate to Text attr - maybe Blob, Text explic delegate to String attr - a ColoredCircle would delegate to both Color and Circle attrs? This will all take some work to get right; not /all/ Rat/etc can be subst. Probably *only* those operators that Rational/etc explicitly declares can be delegated to Rat/etc by TAIInstant/etc. 
* Replace many N-adic routines with dyadic ones, specifically those whose definition is a repetition of a dyadic operation (so, 'sum' or 'join' etc yes but 'mean' no), which users then can invoke by way of a reduction function if they want N-adic syntax. Also let system catalog store more information such as whether or not functions are commutative or associative or idempotent or symmetric etc; likewise, the function def can store what the operation's identity value is, if it has one, as meta-data, useable when comm/assoc; the reduction func can read this using a meta-programming function or something. Reduction will fail if used on a base func that doesn't define an identity if given an empty list. The point of this change is to make the common dyadic case of N-adic operators simpler, and also set a foundation for user-defined operators that provide more information such that a compiler can be more effective in optimizing them, or something. The explicit/normal way, then, to indicate in code whether you want the parser to produce a reduce op wrapper call rather than nested direct invocations in the system catalog, is to just invoke the reduction operator directly and explicitly pass an operand list; but the reduce op would have special syntax, taking normal collection exprs, such as:
    [+] {5,23,5}
    [~] ['hello', 'world']
    [join] {order,inventory}
    [*] {1..5}
... or something. Not using that would parse into nested dyadic calls instead though the compiler can still rearrange. Once we do that, it's also simple to add hyper-operators, though arguably these are redundant with 'map' or 'extension' etc. Or this would be better for simplicity, given it won't be used as often, and any dyadic infix function at all may be used, spelled the same way:
    reducing + {4,23,5}
    reducing ~ [...]
    reducing join {...}
    reducing * {...}
    reducing (a=>3) {...}
... and so the regular operators can be parsed as usual. Or maybe:
    reduced {4,23,5} using +
    reduced {...} using 3)
...
but that might have an end-weight problem? Or, still go symbolic like the first one, but use prefix notation so that it works well with both symbolic and wordy or inline-defined operators:
    []+ {5,23,5}
    []~ ['hello', 'world']
    []join {order,inventory}
    []* {1..5}
    [](a=>3) {...}
Another consideration is that, when combined with routine synonyms that are symbolic, the plain_rtn_inv alone would let you do this:
    reduce( <"+">(), {5,23,5} )
    reduce( <"~">(), ['hello', 'world'] )
    reduce( (), {order,inventory} )
    reduce( <"*">(), {1..5} )
    reduce( (a=>3), {...} )
* Furthering the above, add somewhat generalized support for what Perl 6 calls "meta" operators, at least in that we define and exploit several. The general reducer above would be one of these. Another is the negated relational, whose syntax is putting ! or not- in front of any Bool-resulting op. Another is the assignment, putting := in front of any function. For !, we can then eliminate all the "not" variants of any Bool-resulting functions, so eg "x != y" parses into "not(is_same(x,y))", same as if they had said "!(x = y)". As for the old intended purpose of all the not- variants, which is to preserve the user's intent of how code should look, we could simply have an alias for the not() function which is what is parsed into when != is used, and the old not() is just parsed into when the separate prefix op is used. On the other hand, while lots of not- variants would go away, we'll keep the alias-but-param-order-reversed dualities such as less-than/greater-than and sub-superset; unlike these, what we're eliminating would not result in losing track of which args are lhs/rhs. A related change is infix ops like ≠ or ⊈ would parse into not(foo()) even though they don't have the !; these would be aliases for the combos, same as Perl 6 has != as an alias for !==. For :=, we can eliminate all the updaters that are just shorthands for doing an op and assigning the result to one of the args.
And so a "foo :=union bar" would parse to "assign(&foo,union(foo,bar))". Once again, an alias for assign() can exist which such combos are parsed into, where the regular assign() is used when users write "foo := foo union bar". Of course, despite Muldis D requiring operator combos where singles used to work, we assume that implementations will be smart enough to, say, use a single "!=" or "insert into foo ..." etc when it sees the combination, so there is no performance loss. Probably, any meta'd operator would have the same precedence as the base operator that it is modifying. Adding the hyper-meta may not be useful since we already have map()/etc; or alternately it might be useful in avoiding some uses of map or extend or substitute etc where users are just adding/defining one attr. Or maybe hyper-meta would only be useful with Set/Array/Bag because the general map/extend/etc would require naming the attribute explicitly. As for ASCII vs Unicode etc, that preference is never encoded in the system catalog, so when code would be generated from the system catalog, it would be up to the generator's configuration for which versions are used. * Add hyper-meta in a more general fashion, as per the Ranked general type of which Array is a more specific kind. The hyper-meta is fundamentally associated with the join operator, because it typically involves taking 2 relations, joining them on one set of same-named attrs (exactly 1 usually), and then taking another set of *same-named* attrs and applying the hypered op pairwise and deriving a single replacement set of those attrs with the results. The argument attrs would be renamed distinct first. 
For example, given 2 relations A{key,value,x} and B{key,value,y}, where we assume that "key" is a unary key of each relation, the expression "A >>+<< B" is roughly like this code: with ( a ::= A{%others_a<-!key,value}{value_a<-value} b ::= B{%others_b<-!key,value}{value_b<-value} ab ::= a join b f ::= function (Tuple <-- t : Tuple) { %{ value => t.value_a + t.value_b } } fr ::= extension( ab, ){!value_a,value_b} ) fr{<-%others_a}{<-%others_b} And the result is a relation with heading {key,value,x,y} but of course with the more typical case the inputs and output are just {key,value}, in which case that simplifies to: with ( a ::= A{value_a<-value} b ::= B{value_b<-value} ab ::= a join b f ::= function (Tuple <-- t : Tuple) { %{ value => t.value_a + t.value_b } } fr ::= extension( ab, ){!value_a,value_b} ) fr A variant taking a relation and a tuple would be like the >>+>> /etc form. We might have variants for join vs union etc or generalize this further so that bag/counted variants of relational ops can be defined using this generalized hyper in combination with the regular relational ops, maybe. * About extra metadata in the system catalog for functions/etc, see http://www.postgresql.org/docs/9/static/extend.html for some ideas, such as 35.13.x on Pg's use of COMMUTATOR and NEGATOR where function pairs declare their complement operator. The first pairs up "<" and ">" say (and "+" pairs with itself) while the second pairs up "<", ">=" (dbl-chk that). * Note that Pg exts are like Muldis D system modules in what they do, such as that they add types and routines etc to the language. * Change multi-update to be a sequence of statements rather than a set, and explicitly allow the same target to be used more than once ... this could be the case anyway thanks to virtual relvars etc. * Move or adapt more Text functions into Stringy. 
- Fundamentally all Stringy funcs work on Text in terms of the "maximal_chars" possrep; this will just work correctly for when all func args are of the same Text subtype, such as Canon etc. - The Stringy/Text ops are analogous to Rational ops such that it is like doing fraction math. catenation() is like sum(), replication() is like multiply, a substring test is related to difference/subtract (maybe "?~" and "!~" might work as infix ops for something?). - Move cat_with_sep to Stringy; semantics are clear cut and generalizable. - has_substr ought to work with Stringy no problem from the Text and Array perspectives, but Blob presents an issue purely concerning bit alignment, such as whether we're searching on bits or on octet/etc alignments. ---------- * Update the virtual attributes maps so there is a way to manually specify a reverse function, as meanwhile all the virtuals don't have to be either read-only or updatable due to an automatically generated reverse function, which might vary by implementation, which may be considered broken. Note that the reverse functions might have to be defined as per-tuple operations, separately for insert/substitute/delete. * Add new "material" kinds that define state constraints (address as simple nlx.*.data.*), like type constraints but ref in reverse. * Update the "material" kinds that def stimulus-response rules / triggered routines so that they work for more kinds of stimuli, and maybe change the keywords. The material kind has 2 main attributes, where the "stimulus" defines what to look out for and "response" defines what to do when the former is sighted. Some possible keywords for the first are "stimulus", "cause", "when"; for the latter, "response", "effect", "invoke". 
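To make the two-attribute shape of a stimulus-response rule concrete, here is a minimal Python sketch, not Muldis D and not a proposed catalog design. The class `StimRespRule`, the function `dispatch`, and the event kinds "relvar_changed" and "process_start" are hypothetical names chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]  # hypothetical representation of a sighted stimulus


@dataclass
class StimRespRule:
    """A rule with the 2 main attributes named in the item above."""
    stimulus: Callable[[Event], bool]   # what to look out for
    response: Callable[[Event], None]   # what to do when it is sighted


def dispatch(rules: List[StimRespRule], event: Event) -> int:
    """Invoke the response of every rule whose stimulus matches the event.

    Returns how many rules fired.
    """
    fired = 0
    for rule in rules:
        if rule.stimulus(event):
            rule.response(event)
            fired += 1
    return fired


log: List[str] = []
rules = [
    StimRespRule(
        stimulus=lambda e: e["kind"] == "relvar_changed",
        response=lambda e: log.append(e["relvar"]),
    )
]
fired_first = dispatch(rules, {"kind": "relvar_changed", "relvar": "orders"})
fired_second = dispatch(rules, {"kind": "process_start"})
```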
* Add new "material" kinds that define descriptions of resource locks that one wants to get, starting with basic whole dbvar, relvar locks (address as simple fed.data.foo.*), as well as simple relvar tuple locks (addr as prior plus lists of values to match like with a semijoin); leave out generic predicate locks at first but note they will be added later. Update the system catalog concerning managing shared|exclusive locks or looking for consistent reads between statements, etc.
* Large updates to docs concerning transactions and resource locking. Note: Supposedly PostgreSQL and MySQL use read-committed isolation by default while SQLite provides serializable.
* Rewrite the "Exception" catalog type so it can carry metadata on what kind of exception occurred, not just that an exception occurred.
* Also study SQL concept of conditions and handlers, looks sort of like something between exception handling, signals; or it is their exceptions.
* Also adapt something like Postgres' LISTEN/NOTIFY/UNLISTEN feature, which is an effective way for DB clients to be sent signals, such as when a database relvar has changed.
* Use a conceptual framework for database transactions that is strongly inspired by how distributed source-code version control systems (VCSs) work, in particular drawing on GIT specifically. The fundamental feature of the framework is that the DBMS is managing a depot consisting of 1..N versions of the same database, where every one of these versions is both consistent and durable. Each version is completely defined in isolation, conceptually, and so any versions in a depot may be deleted without compromising each of the other versions' ability to define a version of the entire database. It is implementation-dependent as to how the versions are actually stored, such as each having all of the data versus most of them just having deltas from some other version; what matters is that each version *appears* to be self-contained.
Every version is created as a single atomic action, and it is never modified afterwards, though it may be later deleted (also an atomic action). Every in-DBMS user process, henceforth called "user", has its own concept of the current state of the database, which is one of the depot's versions that is designated a "head". A user's current head is never replaced during the course of the in-DBMS process unless the user explicitly replaces it, such as by either performing an update or requesting to see the latest version (the latter done such as with an explicit "synchronize" control statement). Therefore, each user is highly isolated from all the others, and is guaranteed consistent repeatable reads and no phantoms; they will get repeatable reads until they request otherwise. The framework has no native concept of "nesting transactions" or "savepoints" or explicit "commit" or "rollback" commands. Rather, every single DBMS-performed parent-most multi-update statement (which is the smallest scope where TTM requires the database to be consistent both immediately before and immediately after its execution), is a durable atomic transaction all by itself. The effect of a successful multi-update statement is to both produce a new (durable) version in the depot and to update the executing user's "head" to be that new version (the prior version may then be deleted automatically depending on circumstances); a failed multi-update statement is a no-op for the depot, and the user gets a thrown exception. A depot's versions are arranged in a directed acyclic graph where each version save the oldest cites 1..N other versions as its parents, and conversely each version may have 0..N children. A child version has exactly 1 parent when it was created as the result of executing a multi-update statement in the context of the parent version; the parent version is the pre-update state of the database and the child is the post-update state of the database. 
A child version has multiple parents when it is the result of merging or serializing the changes of multiple users' statements that ran in parallel. One main purpose of tracking parents like this is for reliable merging of parallel changes, so that the intended semantics of each change can be interpreted correctly, and potential conflicts can be easily detected, and effectively resolved. More on how this works follows below. Note that versions simply have unique identifiers to be referenced with and there is no implied ordering between them if they are generated as serial numbers or using date stamps, though versions with earlier date stamps are given priority in the case of a merge conflict. So a multi-update statement is the only native "transaction" concept, and it is ACID all by itself. Now, the multi-statement "transactions" or concepts of nested transactions or savepoints would all be syntactic sugar over the native concept, and basically involve keeping track of versions prior to the head and optionally making an older one the head. This framework uses the VCS concept of "branching" (which is something that GIT strongly encourages the use of, as GIT makes later "merging" relatively painless) as the native way to manage concurrent autonomous database updates by multiple users. By default, when no users have made any changes to the database, a depot just has a "trunk", and its childmost or only version is called "master"; every database user process' "head" starts off as the "master" version when that process starts. Each (autonomous) user process that wants to update the database will start by creating a new branch off of the trunk, and subsequent versions of theirs will go into that, rather than into the trunk or some other branch. The trunk is shared by all users while each user's branch is just for that user, as their private working space. 
Note that, unlike a VCS in general where branches can become long-lived and interact with each other independently of the trunk, the framework instead follows the typical needs of an RDBMS, which espouses a single world view as being dominant over any others, and expects that any branches will be very short-lived, not existing for longer than a conceptual "database transaction" would; only the trunk is expected to be long-lived. (This isn't to say that a DBMS can't maintain them long term, but one that acts like a typical RDBMS of today wouldn't.) Note that the final action on a branch, which involves merging into the trunk, would be perceived by all other DBMS users as a single atomic update comprising all of the changes wrought by the branch, though the user performing it may see several steps.
* Flesh out matters related to starting or communicating between multiple autonomous in-DBMS processes, in general, besides the special case about sequence generators.
----------
* Add to Routines_Catalog.pod and other files definitions of any remaining routines, eg String routines, that would be needed so that for all system-defined types all the necessary system-defined routines would exist that are necessary for defining said types, especially their constraint or mapping etc definitions. So in String.pod we need [catenation, repeat, length, has_substr] etc. Also add "is_coprime" or GCD or LCM or etc which are used either in the constraint definition of Rat or in a normalization function for Rat; see also "the Euclidean algorithm" as an efficient way to do the calculations.
* Consider adding type introspection routines like: is_incomplete() or is_dh() or is_primitive|structure|reference|enumerated etc. Or don't since one could look that up in the system catalog. But more tests on individual values might be useful, or maybe we have enough already.
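Regarding the GCD / LCM / is_coprime routines mentioned in the Routines_Catalog.pod item above, the following Python sketch shows how the Euclidean algorithm could back them and a Rat normalization function. It is an illustration only; the function names and the choice of a positive denominator as the normal form are assumptions, not something the spec states.

```python
from typing import Tuple


def gcd(a: int, b: int) -> int:
    """Greatest common divisor via the Euclidean algorithm (result >= 0)."""
    a, b = abs(a), abs(b)
    while b:
        a, b = b, a % b
    return a


def lcm(a: int, b: int) -> int:
    """Least common multiple; defined here as 0 if either operand is 0."""
    if a == 0 or b == 0:
        return 0
    return abs(a * b) // gcd(a, b)


def is_coprime(a: int, b: int) -> bool:
    """True when the two integers share no common factor other than 1."""
    return gcd(a, b) == 1


def normalize_rat(numerator: int, denominator: int) -> Tuple[int, int]:
    """Reduce a ratio to coprime terms with a positive denominator.

    The positive-denominator convention is an assumption of this sketch.
    """
    if denominator == 0:
        raise ZeroDivisionError("denominator of a rational must be nonzero")
    g = gcd(numerator, denominator)
    if denominator < 0:
        g = -g  # dividing by a negative g flips both signs
    return (numerator // g, denominator // g)
```

A Rat constraint could then be phrased as "denominator > 0 and is_coprime(numerator, denominator)", again assuming that normal form.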
* Add ext/TAP.pod, which is a partial port of Perl 5's Test::More / Perl 6's Test.pm / David Wheeler's pgTAP to Muldis D; assist users in testing their Muldis D code using TAP protocol. The TAP messages have type Text.
----------
* Add concept of shallowly homogeneous / sh- relation types to complement the deeply homogeneous version, and named maximal types like SHRelation, SHSet, SHArray, etc to complement the DH/etc, and sh_set_of/etc to complement dh_set_of/etc; but not sh-scalar or sh-tuple as the concept doesn't make sense there. Then update functions like Relation.union/etc to take sh_set_of.Relation rather than set_of.Relation, which more formally defines some of their input constraints.
* Consider adding an imperative for-each looping statement; the main question here is whether it should work on any (unordered) relation or just on an Array (in which case it iterates through the tuples in sequence by index); the question is what tasks the for-each would be used for; perhaps both versions are useful; presumably the main reason to have for-each at all is when I/O is involved and some derivative needs to be output either where order matters or where order does not matter; but perhaps only a routine is needed here such as a catenate function plus normal I/O output. The question also is what tasks would an imperative for-each be needed for that functional constructs like the list-processing relational functions can't better be used for those tasks instead.
----------
* Add a round-rule param to rat division, I suppose, since in general we'll need it if we want to maintain a rational radix through every op (+,-,* will already do so when all their args are in the desired radix).
* Add explicit support for +/- underflow, +/- overflow, NaNs, etc. I'm inclined to think +/- zero is unnecessary when we have underflow and can be confusing anyway (just a single normal number zero is better). I'm not sure if +/- overflows are useful or if infinities cover them for our purposes.
How this would work is that we define a set of scalar singleton types, one for each of the special values. Then we define extended versions of the Int, Rat, etc types where the extended types are defined in terms of being union types that union the regular numeric types with the special singleton type values. This approach also means just one each of +Underflow, -Overflow, etc is needed and is a member of extended Int or Rat etc. Consider using the existing names "Int"/"Rat" with the versions that include these special values, and make new names for the current simpler versions that don't, such as "IntNS" (int no specials), "RatNS", etc. Either way, it is useful to support the full range of values that a Perl 6 numeric can support, or that an IEEE float can support, without users necessarily having to define it themselves. IDEA: Maybe make all normal math/etc ops work with the extended versions (those with NaNs, infinities, etc) and in situations where users don't want those special values they just use a declared type excluding them, and then the normal type constraints will take care of throwing exceptions when one divides by zero for example. * Flesh out Interval.pod to add a complement of functions for comparing multiple intervals in different ways, such as is-subset, is-overlap, is-consecutive, etc, as well as for deriving intervals from a union/intersect/etc of others, as well as for treating intervals as normal relations in some contexts, such as for joining or filtering etc, as well as a function or 3 to do normalization of Interval values. Maybe the type name 'Range' can be used for something. Maybe the type name 'Span' or 'SpanSet' can be used for something; there are Perl modules with those names concerning date ranges. Input is welcome as to what interval-savvy functions Muldis D should have. * Flesh out some window/partition funcs, which are kind of like a generalization of aggregation/reduction functions. 
A window()/partition() wrapper func is like the summary() wrapper func but it has the same number of output tuples as input ones; when wrapping an agg/reduc func, all output tuples have the same value per tuple in the same group; when wrapping a window/partition-oriented func, such as rank(), each tuple in the group gets or can get a different value. See these: - http://www.postgresql.org/docs/9.0/interactive/tutorial-window.html - http://www.postgresql.org/docs/9.0/interactive/functions-window.html - http://www.postgresql.org/docs/9.0/interactive/sql-expressions.html#SYNTAX-WINDOW-FUNCTIONS * IN PROGRESS ... Add Bool-resulting relational operators EXISTS and FORALL, that provide "existential quantification" and "universal quantification" respectively, these being useful in constraint definitions. See TTM book p168, pp394-5 for some info on those. Also add analogies to Perl 5's List::MoreUtils operators any(), all(), notall(), none(), true(), false(); some of those may be the same as EXISTS/FORALL. Also add an EXACTLY operator like the Tutorial D language has, and a one() op that is between any() and none(). Maybe some pure boolean ops can be added analogous to the above also; eg any() an alias for or() and all() an alias for and(). is_(any|all|one|none|notall|etc)_of_(restr|semijoin|semidiff|etc) source is any|etc matching|where|etc filter|etc ADD RELATIONAL OPERATORS THAT COMBINE BOOL OPS ADDED IN 0.80.0 WITH RELATIONAL MAP/RESTRICTION/ETC AND ... The new functions are modelled after some in Perl 5's List::MoreUtils module. That is, add prefix ops exactly|all|any|one|none|etc which take a relation and result in True or False depending on what that relation's cardinality is. 
In some cases, an extra arg is needed:

  - exactly((s⋉t),n) = (#(s⋉t) = n)
  - none((s⋉t)) = exactly((s⋉t),0) = !#(s⋉t)
  - any((s⋉t)) = !exactly((s⋉t),0) = ?#(s⋉t)
  - all((s⋉t),#s) = exactly((s⋉t),#s) = (#(s⋉t) = #s)
  - notall((s⋉t),#s) = !exactly((s⋉t),#s) = (#(s⋉t) != #s)
  - one((s⋉t)) = exactly((s⋉t),1) = (#(s⋉t) = 1)

OR MAYBE THESE AREN'T ANY MORE USEFUL THAN THEIR EQUIVALENT EXPRS.

* Consider adding sequence generator updaters|procedures in Integer.pod.

* Consider adding random value generators for data types other than integer and rational numerics, such as for character strings or binary strings.

* Consider analogy to SQL's "[UNION|EXCEPT|INTERSECT] CORRESPONDING BY (attr1,attr2,...)", which is a shorthand for combining projection and union, that takes a list of attributes and unions the projections of those attributes from every input relation; so this means, as with join(), that the input relations don't need to have the same headings.

----------

* In PTMD_STD, consider further changes to how character escape sequences in strings/etc are done. For example, whether the simple escape sequence for each string delimiter char may be used in all kinds of strings (as they are now) or just in strings having the same delim char as is being escaped.

* IN PROGRESS ... Update the STD dialects to support inline definition of basic routines (and types?) right in the expressions/etc where they are used, such as filter functions in restriction() invocations, so many common cases look much more like their SQL or Perl counterparts, or for that matter, a functional language's anonymous higher order functions. This syntax would be sugar over an explicit material definition plus a FooRef val selection, which means the inner def effectively is an expression node, and users can choose to name or not name the FooRef selecting node as normal with value expressions.
It is expected that the materials could be decl anonymously and names for them (the inn.foo, not the FooRef's lex.foo) would be generated as per inline expression nodes etc. * Further to the previous item, add some special syntax, similar to how one references a parameter to get its argument's value, which can see into the caller's lexical scope. This would be sugar over declaring parameters with the same name and having the caller explicitly pass arguments to it, without having to explicitly write that. Generally this syntax would only be used with inline-declared routines. But similarly, add some special syntax allowing one to essentially just write the body of a routine without having to explicitly write its heading / parameter list, which is useful for routines invoked directly from a host language, where said parameters are attached to host bind variables. Now one still has to say what the expected data type is for these bind variables, but then the explicit syntax for such Muldis D routines is more like that of a SQL statement you plug into DBI or whatever, without the explicit framing. May not work anywhere, but should help where it does. Maybe use $$foo rather than $foo to indicate that the 'foo' wasn't explicitly declared in the current lexical scope and we are referring to the caller or a bind variable. Or rather than $$foo, have something like "(param foo : Bar)" for an expression-inline parameter definition and use, where the part after the "param" has all the same syntax as an actual param list; this is the one for host language bind parameters. Actually that might be useful by itself. Similarly "(caller foo)" would be the look to parent Muldis D lexical scope, or $$foo would just do that maybe, unless this should have an explicit type declaration still. Note, if same inline-declared host param used more than once, you just need "(param foo : Bar)" form once and other uses can just say foo as per usual; in fact, it must be this way. 
* Consider in all STD adding a new pragma that concerns whether data in delimited character string literals is ASCII or Unicode etc. Example PTMD_STD grammar additions: ? ',' ? str_char_repertoire ? '=>' ? ::= '{' ? [ ? '=>' ? ] ** [? ',' ?] ? '}' ::= all | text | name | cmnt ::= ASCII | Unicode_6.0.0_canon | Unicode_6.0.0_compat Example PTMD_STD code additions: str_char_repertoire => { text => Unicode_6.0.0_canon, name => Unicode_6.0.0_compat, cmnt => Unicode_6.0.0_compat }, str_char_repertoire => { all => ASCII }, Of particular interest is the Unicode canonical vs compatibility, that is NFC|D vs NFKC|D; it is generally recommended such as by the Unicode consortium to use canonical for general data but to use compatibility for things like identifiers or to avoid some kinds of security problems; see http://www.unicode.org/faq/normalization.html. Note that compatibility is a smaller repertoire than canonical, so converting from the latter to the former will lose information. The text|name affect how delimited char strs that are Text|Name are interpreted, and the effects are orthogonal to whether characters are specified literally or in escaped (eg "\c<...>" form); canonical will preserve exactly what is stated (but for normalization to NFD) and compatibility will take what is stated and fold it so semantically same characters become the same codepoints (like as normalizing to NFKD). The suggested usage is compatibility for Name to help avoid security or other problems, and canonical for Text; as for comments, I currently don't know which is better. If ASCII is chosen, the semantics are different; with both Unicode any input is accepted but folded if needed; for ASCII, it is more likely an exception would be raised if there are any codepoints outside the 0..127 range in character strings. The 'all' is a shorthand for giving the same value to all 3 text|name|cmnt and is more likely to occur with ASCII but it might happen otherwise. 
An additional reason to raise this feature is to set up support for other char sets in future, such as Mojikyo, TRON, GB18030, etc which go beyond Unicode eg no Han-unification (see http://www.jbrowse.com/text/unij.html + http://www.ruby-forum.com/topic/165927) but the type system would also need an update.

* Update HDMD_Perl6_STD.pod considering that a 2010.03.03 P6Syn update eliminated the special 1/2 literal syntax for rats and so now one writes <1/2> instead (no whitespace allowed by the '/'); now 1/2 could still work but now it does so using regular constant folding and so having a higher precedence op nearby affects its interpretation.

* Update HDMD_Perl6_STD.pod considering names of Perl collection types, such that "Enum" is the immutable "Pair" and "EnumMap" was renamed from "Mapping", and "FatRat" is now the "Rat" of unlimited size, etc.

* Consider using postcircumfix syntax for extracting single relation attrs into Set or Bag etc, meaning wrap_attr; eg "r.@S{a}", "r.@B{a}". Now that might not work for Array extraction, unless done like "(r.@A{a} ordered ...)" or some such, which isn't pure postcircumfix, but that may be for the best anyway.

* Consider adding concrete syntax that is shorthand for multiple single-attribute extractions where each goes to a separate named expression node (or variable) but the source is a single collection-typed expr/var. Or the source could be a multiplicity as well, or mix and match. The idea here is to replicate some common idioms in Perl such as "(x, y) = @xy[0,1]" or "(x, y) = %xy{'x','y'}", this being more useful when the source is an anonymous arbitrary expression.
Proposed syntax is that, on each side of the "::=" or ":=", the source and target lists are bounded in square brackets, indicating named items assign in order, and syntax for collections supplying/taking multiple items are ident to single-attr accessors (having a ".") but that a list is in the braces/brackets; for example: "[x, y] ::= [3, 4]", "[a, b] ::= t.{a,b}", "[c, d] ::= ary.[3,5]". This syntax would resolve into multiple single-attr accessors when app in system catalog. The assignment variants of the above would naturally fall out the ability to have arbitrary expressions on both sides of the ":=", so what you do is have an array-valued expression on both sides, eg "[x,y] := [y,x]" works because "[...]" is an array literal now. We can overload ".[]" for tuples in general so they extract like projection but return an array rather than a tuple, so we can then say "[a,b] ::= t.[a,b]" or even "t1.[x,y] := t2.[a,b]" to multi-substitute, that being a shorthand for "t1.x := t2.a, t1.y := t2.b". We can't do that for general relations though since the array subtype of rel is using it. This mechanism also provides a general way for a function to have multiple ord retv; eg, "[x,y,z] := foo(...)"; like Perl's "($x,$y,$z) = foo(...)". A variable (or subject-to-update parameter), "bar", may be aliased using "foo ::= bar" such that "foo" is an expr node, but like all named exprs in procedures, "foo" is conceptually reevaluated per mu-statement. Ordered tuples can be used instead of arrays, and in fact might be a better solution for multiple reasons. To do this, just say "%:{x,y,z}" rather than "[x,y,z]"; the former is shorthand for '%:{"0"=>x,"1"=>y,"2"=>z}'. * In PTMD_STD, consider loosening the grammar regarding some of the normal prefix or postfix or infix operators so that rather than mandating whitespace be present between the operators and their arguments, the whitespace is optional where it wouldn't cause a problem. 
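To make the multi-target assignment idea above more concrete, here is a small Python sketch of the intended semantics, not of the proposed PTMD_STD syntax itself. Tuples are modelled as plain dicts, and the helper names multi_substitute and ordered_tuple are hypothetical. The sketch assumes all source values are read before any target attribute is updated, which is what lets something like "[x,y] := [y,x]" behave as a swap.

```python
from typing import Any, Dict, Sequence

# A Muldis D tuple is modelled here as a dict from attribute name to value.
TupleValue = Dict[str, Any]


def multi_substitute(target: TupleValue, target_attrs: Sequence[str],
                     source: TupleValue, source_attrs: Sequence[str]) -> TupleValue:
    """Model "t1.[x,y] := t2.[a,b]" as "t1.x := t2.a, t1.y := t2.b"."""
    if len(target_attrs) != len(source_attrs):
        raise ValueError("target and source attribute lists differ in length")
    if len(set(target_attrs)) != len(target_attrs):
        raise ValueError("a target attribute is named more than once")
    for name in target_attrs:
        if name not in target:
            raise KeyError(name)
    # Read every source value first, so later updates cannot affect earlier reads.
    values = [source[name] for name in source_attrs]
    result = dict(target)  # the input tuple is left unchanged
    for name, value in zip(target_attrs, values):
        result[name] = value
    return result


def ordered_tuple(*values: Any) -> TupleValue:
    """Model '%:{x,y,z}' as shorthand for '%:{"0"=>x,"1"=>y,"2"=>z}'."""
    return {str(index): value for index, value in enumerate(values)}
```

For example, multi_substitute(t, ["x", "y"], t, ["y", "x"]) swaps the x and y attributes of t, and ordered_tuple gives the attribute-per-position form suggested as an alternative to arrays.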
----------

* Restore the concept of public-vs-private entities directly in sub|depots.

* Restore the concept of "topic namespaces" (analogous to SQL DBMS concept of "current database|schema" etc) in some form if not redundant.

* Update the system catalog to deal with database users and privileges etc.

----------

* IN PROGRESS ... A Muldis D host|peer language can probably hold lexical-to-them variables whose Muldis D value object is External typed, and so they could effectively pass around an anonymous closure of their own language. Such a value object would be a black box to the host and can't be dumped to Muldis D source code.

* IN PROGRESS ... Fully support direct interaction with other languages, mainly either peer Parrot-hosted languages or each host language of the Muldis D implementation. Expand the definition of the "reference" main type category (or if we need to, create a 5th similarly themed main category) so that it is home to all foreign-managed values, which to Muldis D are simply black boxes that Muldis D can pass around between routines, store in transient variables, and use as attributes of tuples or relations. These of course can not be stored in a Muldis D depot/database, but they can be kept in transient values of Muldis D collection types which are held in lexical variables by the peer or host language; that language is then really just using Muldis D as a library of relational functions to organize and transform its own data. We also need to add a top level namespace by which we can reference or invoke the opaque-to-us data types and routines of the peer or host language. This can not go under sys.imp or sys.anything because these are supposed to represent user-defined types and routines, which in a dynamic peer language can appear or disappear or change at a moment's notice, same as in Muldis D; on the other hand, types or routines built-in to the peer/host language that we can assume are as static as sys.std, could go under sys.imp or something.
This also doesn't go under fed etc since fed is reserved for data under Muldis D control and only ever contains pure s/t/r types. Presumably this namespace will be subdivided by language analogously to sys.imp or whatever syntax Perl 6 provides for calling out into foreign languages co-hosted on Parrot. Since all foreign values are treated as black boxes by Muldis D, it is assumed that the Muldis D implementation's bindings to the peer/host language will be providing something akin to a simple pointer value, and that it would provide the means to know what foreign values are mutually distinct or implement is_same for them. One thing for certain is that every foreign value is disjoint from every Muldis D value, and by default every foreign value is mutually distinct from every other foreign value too, unless identity is overloaded by the foreign, like how Perl 6's .WHICH works. The foreign-access namespace may have a simple catalog variable representing what types and routines it is exposing, but to Muldis D this would be we_may_update=false.

* IN PROGRESS ... About External type ... update Perl5_STD and Perl6_STD to add a new selector node kind 'External' which takes any Perl value or object as its payload; this is treated completely as a black box in general within the Muldis D implementation. For matters of identity within the Muldis D environment, it works as follows: Under Perl 6, the Perl value's .WHICH result determines its identity. Under Perl 5, if the value is a Perl ref ('ref obj' returns true) then its memory address is used, and this applies to all objects also (since all refs are mutable, this seems to be the safest bet); otherwise ('ref obj' is false) then the value's result in a string context, "obj", is used as the identity; the mem addr and stringification would both be prefixed with some constant to distinguish two values that might otherwise stringify the same. By default, an External supports no operators but is/not_same.
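The Perl 5 identity rule for External values described above could look something like the following. This is a Python analogy only, not Perl and not part of any Muldis D implementation: id() stands in for a Perl ref's memory address, str() for the string-context result, and treating str/int/float as the "non-ref" case is an assumption made just for this sketch. The names external_identity and is_same are hypothetical helpers.

```python
from typing import Any, Tuple

# Assumption for this sketch: these Python types play the role of Perl non-ref scalars.
_PLAIN_SCALARS = (str, int, float)


def external_identity(payload: Any) -> Tuple[str, Any]:
    """Return an identity key for a black-box payload.

    The leading tag is the "constant prefix" from the rule above; it keeps an
    address-based identity from colliding with a string-based one.
    """
    if isinstance(payload, _PLAIN_SCALARS):
        # Non-ref case: identity is the value's string form.
        return ("str", str(payload))
    # Ref/object case: identity is the address, valid only while the object is alive.
    return ("ref", id(payload))


def is_same(left: Any, right: Any) -> bool:
    """The only operator an External supports by default, along with its negation."""
    return external_identity(left) == external_identity(right)
```

With this model two distinct lists with equal contents are not the same External value, while 42 and "42" are, mirroring how the two would stringify identically under Perl 5.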
----------

* Add new "FTS" or "FullTextSearch" extension which provides weighted indexed searching of large Text values in terms of their component tokens, such as what would be considered "words" in human terms. This is what would map to the full text search capabilities that underlying SQL DBMSs may provide, if they are sufficiently similar to each other, or there might be distinct FTS extensions for significantly different ones?

* Add new "Perl5Regex" extension which provides full use of the Perl 5 regular expression engine for pattern matching and transliteration of Text values. Maybe the PCRE library can implement this on other foundations than Perl 5 itself if they are sufficiently alike; otherwise we can also have a separate "PCRE" extension. Or the same extension can provide both?

* Add new "Perl6Rules" extension which provides full use of the Perl 6 rules engine for pattern matching and transliteration of Text values.

* Add new "PGE" or "ParrotGrammarEngine" extension, or whatever an appropriate replacement is, for pattern matching and transliteration of Text values. This and "Perl6Rules" may or may not be sufficiently similar to combine into one extension.

* Add functions for splitting strings on separators or catenating them with such to above extensions or to Text.pod as appropriate. Text has one now.

* Update or supplement the order-determination function for Text so that it compares whole graphemes (1 grapheme = sequence starting with a base codepoint plus N combining codepoints, or something) as single string elements, rather than say comparing a base char against a combining char.

* Add new "Complex" extension which provides the numeric "complex" data types (each expressed as a pair of real numbers with at least 2 possreps like cartesian vs polar) and operators. Note that the SQL standard does not have such data types but many general languages, as well as some hardware CPUs, natively support them.
Probably make "Complex" a mixin type and have the likes of "RatComplex" and "IntComplex" composing it. Note that a complex number over just integers is also called a Gaussian integer. A question to ask is whether a distinct "imaginary" type is useful; some may say it is and Digital Mars' "D" has it, but I don't know if others do. In any event, complex numerics should most likely not be part of the core, even though their candidacy could be considered borderline; for one thing, I would expect that most actual uses of them would work with inexact math. * Add other mathematical extensions, such as ones that add trigonometric functions et al, or ones that deal with hyperreal/hypercomplex/etc types, or ones with variants of the core numeric types that propagate NaNs etc. * Consider adding a sleep() system-service routine, if it would be useful. * Add multiplication and division operators to the Duration types; these would both be dyadic ops where the second op is a Numeric. * Consider adding a Temporal.pod type specific to representing a period in time, maybe simply as an alias for 'interval_of.*Instant' or some such. See also the PGTemporal project and its 'Period' type. * Flesh out "Spatial" extension; provide operators for the spatial data types, maybe overhaul the types. * Consider another dialect that is JSON ... like HDMD in form, but stringy. ---------- * Add one or more files to the distro that are straight PTMD_STD code like for defining a whole depot (as per the above) but instead these files define all the system entities. Or more specifically they define just the interfaces/heads of all the system-defined routines, and they have the complete definitions of all system-defined types, and they declare all the system catalog dbvars/dbcons. In other words these files contain everything that is in the sys.cat dbcon; anything that users can introspect from sys.cat can also be read from these files in the form of PTMD_STD code, more or less. 
The function of these files is analogous to the Perl 6 Setting files described in the Perl 6 Synopsis 32, except that the Muldis D analogy explicitly does not define the bodies of any built-in routines. An idea is that Muldis D implementations could take these files as is and parse them to populate their sys.cat that users see; of course, the implementations can actually implement the routines/types as they want. Note that although this Muldis D code would be bundled with the spec, it is most likely that the PTMD_STD-written standard impl test suite will not. Note that these files will not go in lib/ but in some other new dir. Note that it is likely any implementation will bundle a clone of these files (suitably integrated as desired) rather than having an actual external dependency on the Muldis::D distro. Note that some explicit comment might be necessary to say there are no licensing restrictions on copying this builtins-interfaces-defining code into Muldis D implementations, or maybe no comment is necessary. Probably a good precedent is to look at what legalities concern existing tutorial/etc books that have sample code. * Create another distribution, maybe called Muldis::D::Validator, which consists essentially of just a t/ directory holding a large number of files that are straight PTMD_STD code, and that emit the TAP protocol when executed. The structure and purpose of this collection is essentially identical to the official Perl 6 test suite. A valid Muldis D implementation could conceivably be defined as any interpreter which runs this test suite correctly. This new distro would be a "testing requires" external dependency of both Muldis::Rosetta and any Parrot-hosted language or other implementation, though conceivably either could bundle a clone of Muldis::D::Validator rather than having an actual external dependency. This test suite would be LGPL licensed. 
This new distribution would have a version number that is of X.Y.Z format like Muldis::D itself, where the X.Y part always matches that of the Muldis D spec that it is testing compliance with, while the .Z always starts at zero and increments independently of the Muldis D spec, as often there may be multiple updates to ::Validator for a while between releases of the language spec, and also since .Z updates in the language spec only indicate bug fixes and shouldn't constitute a change to the spec from the point of view of ::Validator.

----------

* Whatever else needs doing, such as, fixing bugs.
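The X.Y.Z version-matching rule proposed above for Muldis::D::Validator could be checked along these lines. This is a minimal Python sketch; the helper names parse_version and validator_targets_spec are hypothetical, and any version strings passed to them are made-up examples rather than real release numbers.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, int, int]:
    """Split an "X.Y.Z" string into three non-negative integers."""
    parts = text.split(".")
    if len(parts) != 3 or not all(part.isdigit() for part in parts):
        raise ValueError("expected a version of the form X.Y.Z, got %r" % text)
    major, minor, patch = (int(part) for part in parts)
    return major, minor, patch


def validator_targets_spec(validator_version: str, spec_version: str) -> bool:
    """True when the X.Y parts agree; the .Z parts are deliberately ignored,
    since each distribution increments its own .Z independently."""
    return parse_version(validator_version)[:2] == parse_version(spec_version)[:2]
```

So a validator at some X.Y.7 would still be considered a test suite for spec X.Y.1, but not for a spec whose X or Y differs.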