# Generated by getTODO.pm on Tue Nov 20 00:03:09 2007
# for Genezzo Version 0.72 - Alpha 20070626

Some general TODO categories:

APIs:
  See if can embed genezzo in apache.
  real DBI support (DBI::Genezzo)
  web-based management console
  Need to fix quoting to behave consistently in strings, literals,
    and export functions. (Need test)

Missing SQL features:
  Binds, sorting/aggregation, subquery support, views, explain plan

Multiuser support issues:
  transactions, logging, recovery, shared memory buffer cache
  exclusive table locks first, then read share/write exclusive,
    then row locks
  users/roles, sessions, schemas, tablespaces, authentication

query optimization:
  Rule/Cost-based optimization
  costing by index probe

fancier functionality:
  btrees with overflow blocks for long keys
  block migration
  block-level predicate pushdown, aggregate pushdown
  yaml datatype support
  antlr parser
  file encryption, row compression
  MLS: multi level security
  parallel/distributed operation, replication, scalability,
    fault-tolerance
  user-defined functions, indexes, datatypes
  non-blocking aggregation based upon count estimation
  unicode support
  error messages

space management:
  lehman and yao "efficient locking for concurrent operations on
    b-trees" ACM TODS v6, #4, Dec 81, pp 650-670.
  SCN/LSN block header information
  freelists, extent headers

Per-file TODO breakdown follows:

TODO lib/Genezzo/BasicHelp.pm
  convert hashes back to pod
  hyperlink "SEE ALSO" headings
  TAG search
  hierarchical topic groups and searches

TODO lib/Genezzo/Block/RDBlkA.pm
  HSplice: offset calculation must match offset2hkey in RDBlock.
    Special handling needed if inherited by RDBlk_NN?

TODO lib/Genezzo/Block/RDBlk_NN.pm
  build simple test cases
  build complex test cases
  test thoroughly
  packdeleted: make this work. It's broken!
  integration with bt2 - need to packdelete in bsplit, do null checks
    in leaf blocks (branch blocks should be ok)
  need a validation function to ensure that block maintains invariant:
    small number of leading metadata rows starting at row zero,
    followed by data rows (deletes ok). Easier to support non-split
    rows initially, but should be able to support head rows (need mods
    to splice functions to preserve rowstats for this case).
  need to modify metadata methods so all metadata created in first n
    rows.
  could simply have delete really delete the rows, so no changes
    necessary for rdblock clients (i.e., no "null rows" generated).

TODO lib/Genezzo/Block/RDBlock.pm
  use row directory rowlen vs len/value for row storage
  meta row - should binary search for meta id
  unicode support

TODO lib/Genezzo/Block/Std.pm
  Support for completely variable block headers

TODO lib/Genezzo/Block/Util.pm
  Support for completely variable block headers

TODO lib/Genezzo/BufCa/BCFile.pm
  note that _fileread could just be part of GetContrib
  need to move TSExtendFile functionality here if want to overload
    syswrite with encryption
  read_only database support
  buffer cache block zero should contain description of buffer cache
    layout
  need a way to free blocks associated with a file that is not
    currently in use

TODO lib/Genezzo/BufCa/BufCaElt.pm
  Deprecate GetInfo, convert to GetContrib.
  Switch syshook methods to use _BCE_dirtyhook
  get fileno, blockno info
  deal with multiple pins on same block sanely. We shouldn't be
    maintaining a reference count scheme here. Shouldn't pin be <= 1,
    and the destroy cb should set it to zero when last reference is
    garbage collected?

TODO lib/Genezzo/BufCa/DirtyScalar.pm
  Deprecate SetBCE: can shift responsibility and functionality to
    storeCB which will contain a hook, versus directly overloading
    STORE here.
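
A minimal sketch of the storeCB idea in the DirtyScalar.pm item above,
written in Python purely for illustration (Genezzo itself is Perl, where
this would be a tied scalar). Every name here (HookedScalar, CacheElement,
store_cb) is hypothetical and is not a Genezzo API. The assumption being
illustrated is that the scalar only knows about a callback, so it no longer
needs a SetBCE-style back reference to its buffer cache element.

```python
class HookedScalar:
    """Hypothetical stand-in for a tied scalar holding one block buffer.

    It knows nothing about the buffer cache; it only invokes an optional
    store callback before each write.
    """

    def __init__(self, data=b"", store_cb=None):
        self._data = data
        self._store_cb = store_cb

    def fetch(self):
        return self._data

    def store(self, value):
        # The hook runs first, so the owner can observe old and new values.
        if self._store_cb is not None:
            self._store_cb(self._data, value)
        self._data = value


class CacheElement:
    """Hypothetical buffer cache element that owns the dirty flag."""

    def __init__(self):
        self.dirty = False
        # The element hands its own hook to the scalar at construction.
        self.buffer = HookedScalar(store_cb=self._on_store)

    def _on_store(self, old_value, new_value):
        # Any write marks the element dirty; the scalar stays generic.
        self.dirty = True


elt = CacheElement()
elt.buffer.store(b"row data")
print(elt.dirty, elt.buffer.fetch())
```

The design choice shown is that dirty tracking lives entirely in the
callback's owner, which is one way to read "shift responsibility and
functionality to storeCB".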

TODO lib/Genezzo/Dict.pm
  pref1 - distinguish fixed/mutable parameters
  cons1 - distinguish user constraint names from system-defined names
  IDXTAB indexed tables don't give a constraint error, or primary key
    error. They don't have constraints because they are themselves
    indexes. Need to give better error message. Fix t/Cons1 constraint
    error
  DictTableAllTab: need index on allfileused for delete
  DictTableAllTab: update tsfiles for usefile
  need some combo _get_table/corecolnum/getcol - create a custom
    iterator that returns specified cols
  non-unique index support using bt2 use_keycount. Need to separate
    notion of SQL uniqueness from btree concept of unique, since a
    non-unique SQL index is a unique btree with the rid as
    least-significant key col (vs rid as value col).
  need drop table/drop index linkage, delete constraints for table, etc
  constraints: can fix check constraint in update case -- don't need to
    check insert if check columns aren't modified.
  constraints: need not null/foreign key constraints
  constraints: need to limit one primary key per table, prevent
    creation of duplicate indexes on same ordered key columns
  expose drop index, drop constraint. tie drop index/drop table?
  check usage of HCount for max tid, max fileidx, max consid. This
    won't work if have deletions
  DictTableUseFile: update space management to use this function
    correctly
  DictDefineCoreTabs, tsfiles: need to save file headersize as a tsfile
    column.
  deal with dict->{headersize} attribute in some rational way.
    Currently set via tablespace->TSAddFile...

TODO lib/Genezzo/GenDBI.pm
  SPOOL: options to remove "prompt> " from output files
  Feeble/SQL: fix DESCribe to handle quoted identifiers.
  TABLESPACE: alter, drop, online, offline, more testing...
  This module is a bit of a catch-all, since it contains a DBI-style
    interface, an interactive loop with an interpreter and some
    presentation code, plus some expression evaluation and query
    planning logic. It needs to get split up.
  SQLselprep_Algebra: move to XEval
  SQLAlter: need And purity check
  SQLUpdate: cleanup - avoid generating new SELECT. Allow regexp
    update.
  SQLCreate: need to handle CREATE TABLE AS SELECT, table/column
    constraints, etc.

TODO lib/Genezzo/Havok.pm
  extension to support CPAN install via HavokUse
  use real YAML vs "fake" yaml documents
  Create dictionary initialization havok (vs post-startup havok)
  Need some type of first-time registration function. For example, if
    your extension module needs to install new dictionary tables.
    Probably can add arg to havokinit, and add a flag to havok table to
    track init status.
  Safety/Security: could load modules using Safe package to restrict
    their access (not a perfect solution). May also want to construct a
    dictionary wrapper to restrict dictionary capabilities for certain
    clients, e.g. let a package read, but not update, certain
    dictionary tables.
  Force Init/ReInit when new package is loaded.
  update module flags if necessary, handle cleanup
  use something like Sub::Install, Sub::Installer, or Hook::WrapSub to
    redefine the subroutines in SysHook, etc.

TODO lib/Genezzo/Havok/SysHelp.pm
  unload/reload help as well

TODO lib/Genezzo/Havok/SysHook.pm
  should be able to dynamically create hook vars, versus using existing
    "our" vars.
  should we do something smart on dictionary shutdown, like unload
    hooks? Or have a clever way to re-init and reload a hook?

TODO lib/Genezzo/Havok/UserExtend.pm
  Need to fix "import" mechanism so can load specific functions into
    Genezzo::GenDBI namespace, versus creating stub functions. Use
    "import" and "export_to_level". Could just load Acme::Everything
    and we'd be done...
  Need function "type" information so can validate argument lists,
    determine return type of function

TODO lib/Genezzo/Havok/UserFunctions.pm
  use "sqlname" and "typecheck" attributes in user_functions table
  Need to fix "import" mechanism so can load specific functions into
    Genezzo::GenDBI namespace, versus creating stub functions.
    Use "import" and "export_to_level". Could just load
    Acme::Everything and we'd be done...
  Need function "type" information so can validate argument lists,
    determine return type of function. If pass named args, have
    "TypeCheck" and "Execute" modes for sql_function. Or have typecheck
    function pass back name/ref to execute function, since it may
    change depending on argument types.

TODO lib/Genezzo/Index/bt2.pm
  hkey/offset functions: should be able to convert between different
    "place" formats (Array and Hash prefixes), like the common fetch
    routine, or ASSERT that prefix matches.
  add reverse scan to search/SQLFetch
  support multicol keys, non-unique keys (via combo of key + rid as
    unique)
  support transaction unique constraints -- probably via treat key+rid
    as unique, then turn on true unique key, and scan for duplicates?
  find out why can't do pctfree=0
  Work on RDBlk_NN support.
  search with startkey/stopkey support, vs supplying compare/equal
    methods. restricting the search api to straight "=","<" comparisons
    means can try the estimation function
  need to handle partial startkey/stopkey comparison in
    searchR/SQLFetch for multi-col keys
  semantics of nulls in multi-col keys -- sort low?
  simplify _pack_row with splice and a supplied split position,
    something like -1 for normal indexes (n-1 key cols, 1 val col, so
    pop the val) or "N=?" for index-organized tables (N key cols, M val
    cols, so splice N)
  reorganize along the lines of "GiST" Generalized Search Trees (Paul
    Aoki, J. Hellerstein, UCB)
  ecount support?

TODO lib/Genezzo/Index/bt3.pm
  new: maybe a way to get blocksize from rstab/rsfile and pass to bt2,
    versus passing it to each layer separately
  getMainMeta from first block of tied hash, but no guarantee that
    space management is nice enough to return blocks in allocation
    order. Should store block address of leftmost leaf in index table.
  spacecheck: space cache should simply be free extents allocated to
    the index.
    Need to extend smfile to have multiple free extents in spacelist,
    vs just used extents. Note still an issue for simultaneous inserts
    -- need lots of space for pathological case where each parallel
    insert splits a separate subtree. That's why transactions were
    invented.

TODO lib/Genezzo/Index/btHash.pm
  figure out whether should be a pushhash, hash, or rowsource
  SQLPrepare/Execute/Fetch: clean up. Shouldn't need to manage a
    distinction between using btHash as a row source and the old bt2
    api. bt2 is wrong - should only have one Fetch style. Should be
    able to use the index start/stop key vs filtering.
  NEXTKEY: broken in "dump tsidx" for case where create 2 tables,
    insert some rows, then drop the first table (and don't COMMIT) and
    call dump tsidx. Loops in NEXTKEY - never terminates for
    allfileused index.
  Add ReadOnly mode so can view indexes, but not insert/update/delete.

TODO lib/Genezzo/Parse/SQL.pm
  alter table (elcaro MODIFY column NOT NULL) vs (sql3 ALTER COLUMN)...
  Support for DDL, ANSI Interval, Date, Timestamp, etc.
  fix the extra array deref in join rules
  error messages everywhere
  ECOUNT
  reserved word issues
  TRIM, UPPER, etc in standard function list?
  use of negative lookahead in reserved_word regex?
  table constraint, storage clause
  constraint attributes - deferrable, disable
  delete cascade referential action
  maybe can collapse qualified join with qj_leftop?
  table expr optional column list
  "system" literals like USER, SYSDATE
  better separation of strings and numbers (see concatenate)
  leading NOT
  double colon in function names?

TODO lib/Genezzo/Plan.pm
  SQLWhere2: need to allow rownum in where clause, which means we need
    a rownum rowsource [select * from dual where rownum < 10; ]
  update pod

TODO lib/Genezzo/Plan/MakeAlgebra.pm
  need additional work for non-query operations/special cases

TODO lib/Genezzo/Plan/QueryRewrite.pm
  check for function existence in GenDBI and main namespaces
  update pod
  need to handle FROM clause subqueries -- some tricky column type
    issues.
  check bool_op - AND purity if no OR's.
  check relational operator (comp_op, relop)
  handle ddl/dml (create, insert, delete etc with embedded queries) by
    checking for query_block info -- look for hash with 'query_block'
    before attempting table/col resolution. Need special type checking
    for these functions.
  refactor to common TreeWalker
  _process_name_pieces: quoted string/case-insensitivity
  handle all pseudo cols
  most value expression stuff needs to migrate to XEval

TODO lib/Genezzo/Plan/TypeCheck.pm
  need to generate stages to perform aggregate initialization and
    intermediate aggregation
  check for aggregates in WHERE clause
  check for GROUPing/aggregates
  check for final select list columns vs all projected columns in all
    clauses
  check args for all functions
  check for function existence in GenDBI and main namespaces
  update pod
  need to handle FROM clause subqueries -- some tricky column type
    issues.
  check for duplicate aliases/type mismatch in _FROM_subq_star_fixup ?
  check bool_op - AND purity if no OR's.
  check relational operator (comp_op, relop)
  handle ddl/dml (create, insert, delete etc with embedded queries) by
    checking for query_block info -- look for hash with 'query_block'
    before attempting table/col resolution. Need special type checking
    for these functions.
  refactor to common TreeWalker
  handle all pseudo cols
  most value expression stuff needs to migrate to XEval

TODO lib/Genezzo/PushHash/HPHRowBlk.pm
  fix synopsis

TODO lib/Genezzo/Row/RSExpr.pm
  SQLPrepare/SQLFetch: requires ALIAS argument, which doesn't make
    sense for rowsources like RSDual (see XEval). "Alias" is only
    necessary to disambiguate named columns.
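
To illustrate the RSExpr.pm point above that an alias only matters for
disambiguation, here is a small sketch in Python (not Perl, and not Genezzo
code). The function resolve_column and the (alias, columns) source layout
are assumptions invented for this example. It treats the alias as optional
and demands one only when a bare column name matches more than one row
source.

```python
def resolve_column(name, sources):
    """Resolve `name` (bare "col" or qualified "alias.col").

    `sources` is a hypothetical list of (alias_or_None, [column names]).
    Returns (source_index, column_index).
    """
    # rpartition yields an empty alias when the name has no dot.
    alias, _, col = name.rpartition(".")
    matches = []
    for i, (src_alias, cols) in enumerate(sources):
        if alias and src_alias != alias:
            continue  # qualified name: only the named source may match
        if col in cols:
            matches.append((i, cols.index(col)))
    if not matches:
        raise KeyError(f"unknown column: {name}")
    if len(matches) > 1:
        raise ValueError(f"ambiguous column: {name}; qualify it with an alias")
    return matches[0]


# A single row source with no alias at all, in the spirit of RSDual.
print(resolve_column("dummy", [(None, ["dummy"])]))

# Two sources sharing a column name: only here is the alias needed.
joined = [("e", ["id", "name"]), ("d", ["id", "dname"])]
print(resolve_column("name", joined))
print(resolve_column("d.id", joined))
```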

TODO lib/Genezzo/Row/RSFile.pm
  need error handlers vs "whisper"

TODO lib/Genezzo/Row/RSIdx1.pm
  HSuck: FirstCount/NextCount: do real estimate vs fake
  should pass leftmost blockno explicitly versus rely on RSTab FIRSTKEY
  rectify some overlap between btHash and this module
  could encode multiple column key into single col rid using
    MIME::Base64 encode of a packed row. should check dependency for
    perl 5.6 and add to Makefile.PL.

TODO lib/Genezzo/Row/RSJoinA.pm
  build nested-loop, sort-merge, hash join

TODO lib/Genezzo/Row/RSTab.pm
  rownum filter support to move to separate package
  $href: remove - need a dict function to return allfileused via tso
  HSuck: need a way to specify packing method
  HSuck: fix trailing zero replacement
  NextCount: fix quitloop
  localPush/Store: qualify length packstr as percentage of blocksize
    (1/3?)
  localStore: race condition on rowstat
  localFetchDelete: frag flag info, delete status. Could express this
    function as a generalized "RowSplice" (as distinct from
    RDBlkA::HSplice, which is a block splice operator). Would need to
    be able to splice based upon column number/array offset, as well as
    substring byte offset -- the inverse functionality of
    PackRow2/HSuck
  DBI - support Bind and projection (returning only certain specified
    columns, versus all columns)
  _init: change to use TSTableAFU support versus href->{filesused}
  need support for constraints that "mutate" supplied values, e.g.
    manipulate numeric precision or supply default values for columns.
    Also need support for foreign keys in delete.

TODO lib/Genezzo/SpaceMan/SMExtent.pm
  need to coalesce adjacent free extents
  maintain multiple free lists for performance
  better indexing scheme - maybe a btree

TODO lib/Genezzo/SpaceMan/SMFile.pm
  read_only database support
  support for non-table objects like indexes - done?
  freetable: when last object is freed, need to update _tsfiles as
    UNUSED
  need to coalesce adjacent free extents
  maintain multiple free lists for performance
  better indexing scheme - maybe a btree
  chain the block header if necessary -- allocate a new block to hold
    additional free list information, append extent allocation to
    HEADER row (after 0:1)
  check status everywhere where update rows
  maintain free extents list for each object, so can re-use extents
    (especially important for updates of large multi-block rows)

TODO lib/Genezzo/SpaceMan/SMFreeBlock.pm
  Under Construction

TODO lib/Genezzo/SpaceMan/SMHook.pm
  better error handling
  better error handling

TODO lib/Genezzo/TSHash.pm
  SQLFetch: need to handle get_col_alias for filter?

TODO lib/Genezzo/Tablespace.pm
  filearr, used, unused: should match dict _tsfiles fileidx - done
    3.21?
  notion of buffercache associated with the tablespace object --
    possible multiple active bc's, with different
    characteristics/semantics, e.g. a bc for temp space with different
    blocksize, lacking txn recovery? Need to guarantee that all clients
    of a tso use the same bc for consistency/locking/txn support
  use compatibility matrix to drive automatic upgrade capability

TODO lib/Genezzo/TestSQL.pm
  stuff

TODO lib/Genezzo/TestSetup.pm
  stuff

TODO lib/Genezzo/Util.pm
  Should bundle all data file utility functions, such as
    FileGetHeaderInfo, SetHeaderInfo, etc, under separate
    Util::DataFile module
  FileGetHeaderInfo: need to handle case of header which exceeds a
    single block. Probably should keep increasing the buffer size until
    find null terminator (within reason).
  packrow: store metadata in col0 vs trailing col with next ptr
  packrow: check pack format for a zero len row of zero cols. Does it
    need a nullvec?
  packrow/unpackrow: in Perl 5.8 could use the nifty repeating
    templates to our advantage.
  packrow: could generate skiplists as col zero metadata tracking byte
    position and column numbers to speed lookups

TODO lib/Genezzo/XEval.pm
  Should become more of a dispatch routine, with major guts for each
    function stashed in separate modules under XEval.
  SQLAlter, SQLInsert: move type checking to TypeCheck module.

TODO lib/Genezzo/XEval/Prepare.pm
  sql_where: function name processing -- drive from user_function, use
    type-checking functions.
  update pod
  need to handle FROM clause subqueries -- some tricky column type
    issues.
  explode STARs with column names - need consistent join table position
  check bool_op - AND purity if no OR's.
  check relational operator (comp_op, relop)
  handle ddl/dml (create, insert, delete etc with embedded queries) by
    checking for query_block info -- look for hash with 'query_block'
    before attempting table/col resolution. Need special type checking
    for these functions.
  refactor to common TreeWalker

TODO lib/Genezzo/XEval/SQLAlter.pm
  drop constraint

TODO lib/Genezzo/genexp.pl
  move most methods to separate .pm file
  need to distinguish "dictionary" havok routines vs post-dictionary
    havok tables