field/column tuple -> row relation -> rowset ---- SQL columns are ordered, they don't need names, they can have duplicate names ---- This nature of SQL is a source of many problems ---- Attributes of a tuple/relation must all have distinct names, are not ordered ---- A relational heading represents a predicate, the heading's attributes represent free variables of the predicate ---- The body of a relation represents the set of true propositions that can be formed from the predicate represented by the relation's heading ---- The body of a tuple with the same heading provides attribute values to instantiate the predicate into a proposition by substituting each of its free variables ---- When a tuple appears in a relation body, the proposition it represents is deemed to be true ---- For every compatible tuple that does not appear in the relation body, its proposition is deemed to be false ---- This assumption is known as the "closed world assumption" ---- SQL is bag oriented, allows duplicate rows in a rowset ---- A relation never has duplicate tuples, as they are meaningless ---- Saying the same thing twice doesn't make it more true ---- A "D" relation is, by definition, *always* in at least first normal form ---- SQL's bag nature is one of its fundamental problems ---- When duplicate rows exist or are produced by operators, distinct queries that look equivalent give different answers ---- For example, think of the same conceptual SQL query written with a join versus with an "IN" subquery ---- Duplicates cause a lot of bugs, and are a scourge to optimizers ---- In a set-oriented language, it is safe to make many optimizations you can't with bags ---- Muldis D is always set oriented, so results are more predictable, easier to auto-optimize ---- If duplicate rows in SQL are significant to you, such as to indicate a quantity ... ---- Then you should add a 'quantity' column instead ---- The relational model says data is operated on using a relational calculus or algebra, those being equivalent ---- Relational algebra, an offshoot of first-order logic, is a set of relations closed under operators; each operator takes N relations and results in a relation ---- (The relational operators in Muldis D are detailed later) ---- Tuples, relations of degree zero are interesting and useful, and Muldis D has them ---- Zero-attribute relations correspond to predicates with no free variables ---- One can consist of just zero tuples or one tuple ---- They are identity values for relational operations like the numbers 0 and 1 are for number ops ---- They are also useful in various other ways ---- SQL does not permit a row or rowset to have zero fields/columns ---- That's like trying to do math without using a zero ---- Data types and the relational model are orthogonal ---- An attribute value may be a tuple or a relation; not just a scalar ---- All of the same relational operators work with relations in attributes, not just relvars ---- Consistency is good ---- This is a very powerful feature generally missing from SQL where only scalars allowed ---- An RVA means a single query can return parent, child tuples together, without redundancy ---- Child tuples are in an RVA of parent tuples ---- Similarly, you don't have to use association tables for a many-to-many relationship ---- You don't have to split up a relvar into several for normalization ---- The use of TVAs, RVAs does *not* violate any normalization forms ---- So about a 2VL alternative to SQL nulls? ---- Use RVAs as the conceptually nullable attributes ---- If the RVA has zero tuples, that means missing or N/A, meaning IS NULL ---- If the RVA has a tuple, that means IS NOT NULL; the tuple attr val is the known value ---- This approach brings consistency for outer-joins ---- Result attributes from the possibly missing tuples are RVAs with either zero or 1+ tuples ---- So a test for IS (NOT) NULL becomes a test for is the RVA empty or not ---- Of course, an implementation can recognize such cases and optimize BTS ---- A Muldis D implementation over SQL would probably map nullable to RVA ---- So what is a relational database? ---- It is a set of relation-typed variables, or "relvars" ---- Being composed of relations is what gives a relational DB that name ---- Or at a different abstraction level, it's a tuple of relation-valued attributes ---- A variable of that is called a "dbvar" ---- SQL's analogy to a relvar is a "table" ---- High Level Design ---- Muldis D is a Turing / computationally complete language with fully integrated database functionality ---- It can perform any operation that a typical general purpose language is capable of ---- It has operators, values, variables, conditionals, repetition, and system input/output ---- You *can* write an entire app in just Muldis D ---- But typically you won't ... ---- Typical use of Muldis D is like what is typical with SQL ---- You write DB facing parts of apps in Muldis D, and the rest in something else ---- But Muldis D is based on the concept of a DB embedded in your dev environment ---- The DBMS doesn't have to be some separate program your app talks to ---- AFAIK, that was a mistaken trend that COBOL started ---- Rather it is like a module in your app that provides relational ops for data munging ---- As well as a state managing solution based on relation variables ---- Its up to the implementation whether the DBMS is embedded or a separate process ---- But you write code to it the same way regardless, like local versus remote procedure calls ---- When you write Muldis D queries, you are writing named routines to invoke later ---- (More later about virtual machines and depots) ---- The paradigm of Muldis D is a mixture of ... ---- 1. Declarative ---- Your code says *what* to do rather than *how* to do it ---- Provides a clear distinction between model and implementation ---- A database structure is logical, not physical ---- Don't define abstract machine in terms of what system "really does" BTS ---- 2. Homoiconic ---- Muldis D code is represented as data structures manipulatable in Muldis D ---- (More on that shortly) ---- 3. Functional ---- Most Muldis D code takes the form of pure functional value expressions ---- Also, Muldis D uses closures or higher order functions to an extent ---- 4. Imperative ---- The top levels of a Muldis D program are imperative ---- 5. Object Oriented ---- Sub/super-typing, substitutability, orthogonality, user-defined types ---- System Catalog ---- Muldis D provides a thorough means to both introspect and define all DBMS entities using just data manipulation operators ---- The DBMS catalog is a set of system-defined relvars which reflect the definitions of DBMS entities; update them to create, alter, or drop DBMS entities ---- In fact, updating the catalog is the fundamental way to do data-definition in Muldis D; any other provisions for DD are just abstractions of this ---- Generally speaking, users can do absolutely everything in the DBMS with just data querying and updating operations ---- The SQL analogy to the Muldis D catalog is the "information schema", but you can't update those SQL tables ---- The Muldis D system catalog is fully decomposed like an AST, including stored routines and type defs ---- This nature is the main part of what makes Muldis D homoiconic ---- By contrast, most of a SQL information schema consists of strings of SQL code, not decomposed ---- Muldis D thereby has a huge advantage over SQL ... it is very easy to build or parse Muldis D code ---- You conceptually just manipulate tree nodes ... though with relational operators ---- This is in contrast to SQL where you have to parse or stitch character strings ---- Your individual data points are separate values in Muldis D code, you don't have to stitch into strings ---- SQL injection attacks have no frequently exploitable analogy here! ---- Depots ---- A depot is a local abstraction of a typically external storage system which holds 1 dbvar and 1 associated catalog ---- Plus maybe other details that help mapping of the abstraction to the actuality ---- All user-defined non-lexical code and data lives in 1+ depots; those are generally persisted ---- A depot can also have just code; then it is essentially a dynamically loaded library. ---- A depot is a completely independent unit ---- All types or constraints defining a dbvar or routine are always kept with it ---- A depot can't have external dependencies to understand its contents, implementing DBMS aside ---- There can't be any inter-depot constraints, but depots can call each others routines ---- A DBMS can mount multiple depots at once, under their own namespaces, like a filesystem ---- You can perform cross-database queries or updates on multiple depots by mounting them at once ---- They are like one database during that time, definition isolation aside ---- So cloning a database is essentially just a variable assignment ---- A transaction will subjugate all mounted depots together, so they commit or rollback as a unit ---- You can't unmount a depot while a transaction is active ---- A depot can be mounted read-only ---- A depot can be persistent or transient ---- Database and application vars treated the same, all in depots used the same way ---- Authentication details like DB user/pass apply per-depot mount, not on DBMS ---- A relational DBMS is a program that manages relational dbvars, associated catalogs, and depots in general ---- A Muldis D DBMS is a virtual machine ---- Transactions ---- All "D" languages are ACID compliant and support arbitrary depth nested transactions ---- You can start a transaction within another one ---- This lets code blocks make themselves atomic without worry of how calling code uses transactions ---- Only a parent-most trans commit will cause actual commit ---- Child transactions make it easy to rollback just part of the overall transaction ---- Muldis D transactions are always explicitly tied to lexical scopes ---- They are also tied to the exception control-flow mechanism ---- When you enter a "try" block, that begins a new child transaction ---- If the "try" block executes to its end normally, its transaction commits ---- If the "try" block abnormally ends with an exception (that it catches), its transaction rolls back ---- So transactions are easy to get right even when a block has multiple exit points, or throws exceptions ---- No worry about mis-matched transaction start/end statement calls ---- SQL savepoints can emulate child transactions, but can be more difficult to use ---- In Muldis D, all kinds of changes are subject to transactions, including DD ---- So schema changes due to updating the system catalog can also be rolled back ---- SQL DBMSs mutually differ on this ---- SQLite subjugates everything to transactions, including DD, which is right ---- MySQL will do an implicit commit of prior data changes if you, say, create a table; bad ---- Muldis D requires everything to be explicit ---- An implementation of Muldis D over such as MySQL will have to fail what it can't support ---- There, data definition should only be allowed outside of a transaction ---- All "D" languages support multiple assignment, to simultaneously update multiple variables ---- The whole of this multi-update is implicitly atomic, an implicit child / only transaction ---- For example, you can insert parent and child tuples in diff relvars simultaneously ---- Or debit one bank account and credit another simultaneously ---- Database constraints are tested on multi-update statement boundaries ---- All constraints are immediate, can't defer to end of transaction ---- So database is never seen in an invalid state ---- There is *not* "domain check override" in a "D" language ---- It is the job of a DBMS to ensure its data always obeys its constraints or business rules ---- You can't count on the main app to check everything ---- There might be new applications sharing a DB that have different quality control ---- Much as you should never trust the user ---- Thanks to multi-update, you can be strict without losing flexability ---- A Muldis D DBMS *must* rollback on disconnect; only an explicit success can commit ---- Many SQL DBMSs do that too; some will auto-commit an incomplete transac; AFAIK, so does ODBC ---- Implicit commit on disconnect is bad, making a DBMS unreliable with any unexpected failure ---- Dialects ---- Muldis D code has several layers of abstraction ---- Inner Abstraction ---- You write code by writing data; **** This is homoiconic ---- Conceptually, you make ASTs, but actually tuples and relations ---- Code becomes runnable as side-effect of insertion in sys catalog ---- There is no concrete syntax, only abstract syntax ---- Code is somewhat verbose, as typical for ASTs ---- High determinism extremely important ---- Muldis D is not DWIM, where DWIM means guessing ---- Code has exact same semantics everywhere ---- Traditional implicit semantic diffs made explicit choices ---- Semantics with numeric rounding made explicit ---- Any logical difference is a big difference ---- Strongly typed, is type safe ---- No implicit / automatic sideways type conversion / coercion ---- For example, converting char data to a number must be explicit ---- [less organized notes follow, to flesh out] ---- Entity names are case sensitive, can be anything, even empty str ---- Like most languages, or like quoted SQL identifiers ---- Invoke entities fully qualified with namespaces ---- No unqualified name searches, prevents ambiguity, bugs ---- System-defined and user defined entities in separate namespaces ---- No reserved words; they aren't needed ---- Character data is always case, accent, whitespace sensitive ---- If you want insensitive, explicitly fold case, etc ---- Numbers are big numbers ---- No multiplicity of types like in SQL where essentially the same ---- A selector is a routine that captures an appearance of a value for use in a variable or expression ---- The term constructor is not used because all values in Muldis D are conceptually eternal and immutable ---- So it does not make sense to say that we are "building" one; we are "selecting" one ---- [content goes here] ---- Copyright © 2008, Darren Duncan ---- http://www.muldis.com for email, see above ---- This slideshow text is free documentation for software; you can redistribute it and/or modify it under the terms of the GNU General Public License (GPL) as published by the Free Software Foundation (L); either version 3 of the License, or (at your option) any later version. ---- END ]]>