=head1 NAME

Base - SMOP basic structures

=head1 REVISION

$Id$

=head1 SMOP__Object

In SMOP, every single value must be binary-compatible with the 
C<SMOP__Object> struct.  This even includes core level constructs such as
the interpreter and the native types.  This idea comes directly from 
how perl5 works, with the SV struct.

Unlike perl5, however, the C<SMOP__Object> struct is absolutely minimalist: it
defines no type, no flags, and no introspection information.  It defines 
only that every C<SMOP__Object> has a "responder interface" (C<.RI>), so the 
structure is merely:

  struct SMOP__Object {
    SMOP__ResponderInterface* RI;
    /* Maybe there is something here, maybe there is nothing here.
     * Only the responder interface knows.
     */
  };

The value in the C<.RI> member is not unique to the object.  For all but 
singleton classes, one responder interface will be used by multiple object 
structs.  As such, the object is identified only by the memory address at 
which the struct C<SMOP__Object> is stored.

This means that you can't really do anything to the object yourself; you
can only talk to its responder interface.  The object serves as both
a way to find the correct responder interface, and a way to tell the 
responder interface which instance data to operate on -- and that is all.

There may be additional data below the C<.RI> member, but if so,
only the responder interface knows how to use it.  The data for the
object instance may, in fact, B<not> be stored in the structure at all --
it could be looked up using the object's address in a completely 
separate data store.

As such, it is incorrect to attempt to copy or move a C<SMOP__Object> struct 
using a simple memory copy like C's C<memcpy()>.  Even if you lucked out and
got all the data in the object, you would have changed its address,
and it would not be the same object anymore.  This point is especially
important to note if an object may exist in multiple address spaces -- only 
one address will be valid without special handling.
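
As a rough illustration of why identity is tied to the address, the 
following sketch keeps instance data in a side table keyed by object 
address.  The side table and the names C<side_store>, C<side_lookup> and 
C<copy_keeps_identity> are made up for this example and are not part of 
SMOP; a real responder interface may store its data quite differently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Minimal stand-ins for the structures described in this document. */
typedef struct SMOP__ResponderInterface SMOP__ResponderInterface;
typedef struct SMOP__Object {
    SMOP__ResponderInterface* RI;
} SMOP__Object;

struct SMOP__ResponderInterface {
    SMOP__ResponderInterface* RI;
};

/* Hypothetical side table: instance data is keyed by object address,
 * so nothing beyond .RI lives inside the object struct itself. */
#define MAX_OBJECTS 8
static struct { SMOP__Object* key; int value; } side_table[MAX_OBJECTS];
static size_t side_count = 0;

static void side_store(SMOP__Object* object, int value) {
    assert(side_count < MAX_OBJECTS);
    side_table[side_count].key = object;
    side_table[side_count].value = value;
    side_count++;
}

/* Returns 1 and writes *out if the address is known, 0 otherwise. */
static int side_lookup(SMOP__Object* object, int* out) {
    size_t n;
    for (n = 0; n < side_count; n++) {
        if (side_table[n].key == object) {
            *out = side_table[n].value;
            return 1;
        }
    }
    return 0;
}

/* Returns 1 if a memcpy'd copy is still recognised as the object. */
int copy_keeps_identity(void) {
    static SMOP__ResponderInterface ri;
    SMOP__Object original = { &ri };
    SMOP__Object copy;
    int value = 0;

    side_store(&original, 42);
    memcpy(&copy, &original, sizeof copy);   /* same bytes, new address */

    assert(copy.RI == original.RI);          /* the RI is shared ... */
    assert(side_lookup(&original, &value) && value == 42);
    return side_lookup(&copy, &value);       /* ... the instance data is not */
}
```

In this sketch the copy carries the same C<.RI> pointer as the original, 
yet the lookup by address fails for it, which is the sense in which the 
copy "would not be the same object anymore".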

=head1 SMOP__ResponderInterface

The responder interface (which, of course, is also binary-compatible 
with C<SMOP__Object>) implements the low-level part of the meta object 
protocol.  It is through the responder interface that you can perform 
actions on the object.

Using the responder interface, arbitrary methods may be invoked on
the object. It's important to realize that this method invocation 
happens at the same level that any high-level language might call.
This means that there's no distinction between native operators and 
high-level operators, nor between native values and high-level values.

The structure of a responder interface is as follows:

  struct SMOP__ResponderInterface {
    SMOP__ResponderInterface* RI;
    SMOP__Object* (*MESSAGE)  (SMOP__Object* interpreter,
                               SMOP__ResponderInterface* self,
                               SMOP__Object* identifier,
                               SMOP__Object* capture);
    SMOP__Object* (*REFERENCE)(SMOP__Object* interpreter,
                               SMOP__ResponderInterface* self,
                               SMOP__Object* object);
    SMOP__Object* (*RELEASE)  (SMOP__Object* interpreter,
                               SMOP__ResponderInterface* self,
                               SMOP__Object* object);
    SMOP__Object* (*WEAKREF)  (SMOP__Object* interpreter,
                               SMOP__ResponderInterface* self,
                               SMOP__Object* object);
    char* id;
    /* Maybe there is something here, maybe there is nothing here.
     * Only the responder interface in member .RI knows.
     */
  };

However, the SMOP base defines a few macros that should be used when
interacting with SMOP Objects.  While in theory, the use of those
macros is optional, it's strongly advised that you stick with them, to
make transitions to newer versions easier.

As such, each of the function hooks defined in the above structure
will be described along with the macros which should be used to access
them.

=over

=item macro SMOP_DISPATCH

    SMOP_DISPATCH(interpreter, object, identifier, capture)

This macro (and all its parameters) corresponds to the C<MESSAGE> function 
hook member.  This is the function that handles method invocation for 
the objects which this responder interface oversees:

  SMOP__Object* (*MESSAGE)  (
      SMOP__Object* interpreter,      /* gets interpreter */
      SMOP__ResponderInterface* self, /* gets (responder) object */
      SMOP__Object* identifier,       /* gets identifier */
      SMOP__Object* capture           /* gets capture (instance object inside) */
  );

As you might have noticed, it receives objects as arguments and returns, of 
course, an object.

C<SMOP_DISPATCH> uses the C<.MESSAGE> function in the responder found at 
C<object> to invoke a method with a name found at C<identifier>.  It invokes 
that method in the context of the interpreter found at C<interpreter> using 
the capture found at C<capture> to pass data to the method's parameters.

Each of these macro arguments is expanded upon in other documentation.  However, 
you may notice that something appears to be missing.  Methods usually have 
an "invocant" -- which would be a C<SMOP__Object> that was used to find the 
responder that is being pointed to in C<object> above.  If there is one, it 
is tucked away inside the capture.


=item macros SMOP_REFERENCE and SMOP_RELEASE

    SMOP_REFERENCE(interpreter, object)
    SMOP_RELEASE(interpreter, object)

C<SMOP_REFERENCE> and C<SMOP_RELEASE> call, respectively, the C<.REFERENCE> and
C<.RELEASE> functions in a responder interface.  The responder interface
used is the one that is pointed to by the C<.RI> member of the object structure 
pointed to by C<object>.  The C<object> pointer itself is also passed to the 
REFERENCE or RELEASE function:

  SMOP__Object* (*REFERENCE)(
      SMOP__Object* interpreter,       /* gets interpreter */
      SMOP__ResponderInterface* self,  /* gets the RI member found at object */
      SMOP__Object* object             /* gets object itself */
  );

These functions increment or decrement the reference count of C<object> in 
the context of C<interpreter>.  The reference count is used to handle 
automatic cleanup of objects when they are no longer needed -- more on this 
subject later.

The macros both return the same value that was passed into C<object>, so you 
can use the macro in most places where you would use an object pointer, much 
like you would use C<i++> to postincrement an integer in-place.  This is handy 
in keeping code terse, but take care: never write anything like 
C<SMOP_RELEASE(interp,current++)> or C<SMOP_RELEASE(interp,current)++> when 
working with arrays of objects.  A macro may evaluate its arguments more 
than once, and its result is not an lvalue.

=item macro SMOP_WEAKREF 

    SMOP_WEAKREF(interpreter, object)

C<SMOP_WEAKREF> calls the C<.WEAKREF> function in a responder interface.  It
works much the same way as the C<SMOP_REFERENCE> macro, above.

  SMOP__Object* (*WEAKREF)  (
      SMOP__Object* interpreter,       /* gets interpreter */
      SMOP__ResponderInterface* self,  /* gets the RI member found at object */
      SMOP__Object* object             /* gets object itself */
  );

C<SMOP_WEAKREF> can be used wherever you would normally use C<SMOP_REFERENCE> 
to obtain a "weak reference" instead.  This call is allowed to return you a
different object than the one you point to with C<object>, and you are supposed 
to use that as a proxy.  Weak references do not count as a reference 
against the original C<object> for the purposes of garbage collection.

This means that the original object may be freed before the weak reference itself
is destroyed.  If this happens, the weak reference will start to refer
to some appropriate constant (like C<False>) instead of the now-dead object.

The implementation of the weak-reference is private to each responder
interface's implementation, so the exact behavior may vary depending on the 
kind of objects you are working with.  In particular, note that if an object 
does not actually need to be reference counted, a weak reference may end 
up returning the original object, so you are not allowed to assume the 
macro will always return a different pointer than the one passed via C<object>.

Note that a weak reference is itself an object.  So you do still need to
call C<SMOP_RELEASE> on it when you are done with it.  (It isn't provided
just to help us be lazy.)  However, all C<SMOP_REFERENCE> and C<SMOP_RELEASE> 
calls on the weak reference object count references to the proxy object, 
not the original object.

That makes weak references a handy way to break circular dependencies 
between objects and code.

=back
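
The following sketch shows how these hooks and macros might fit together 
for a small reference-counted object.  The macro bodies are assumptions 
written to match the descriptions above, not the definitions shipped with 
SMOP.  The C<Counter> type, C<identifier_increment>, the C<destroyed> flag 
(standing in for freeing memory) and C<run_counter_demo> are hypothetical, 
no interpreter is modelled (C<NULL> is passed), and, as a simplification, 
the invocant itself stands in for the capture.

```c
#include <assert.h>
#include <stddef.h>

typedef struct SMOP__Object SMOP__Object;
typedef struct SMOP__ResponderInterface SMOP__ResponderInterface;
struct SMOP__Object { SMOP__ResponderInterface* RI; };
struct SMOP__ResponderInterface {
    SMOP__ResponderInterface* RI;
    SMOP__Object* (*MESSAGE)  (SMOP__Object* interpreter,
                               SMOP__ResponderInterface* self,
                               SMOP__Object* identifier,
                               SMOP__Object* capture);
    SMOP__Object* (*REFERENCE)(SMOP__Object* interpreter,
                               SMOP__ResponderInterface* self,
                               SMOP__Object* object);
    SMOP__Object* (*RELEASE)  (SMOP__Object* interpreter,
                               SMOP__ResponderInterface* self,
                               SMOP__Object* object);
    SMOP__Object* (*WEAKREF)  (SMOP__Object* interpreter,
                               SMOP__ResponderInterface* self,
                               SMOP__Object* object);
    char* id;
};

/* ASSUMED macro bodies, written to match the descriptions above. */
#define SMOP_RI(o) (((SMOP__Object*)(o))->RI)
#define SMOP_DISPATCH(i, o, ident, c) \
    (((SMOP__ResponderInterface*)(o))->MESSAGE((SMOP__Object*)(i), \
        (SMOP__ResponderInterface*)(o), (SMOP__Object*)(ident), \
        (SMOP__Object*)(c)))
#define SMOP_REFERENCE(i, o) \
    (SMOP_RI(o)->REFERENCE((SMOP__Object*)(i), SMOP_RI(o), (SMOP__Object*)(o)))
#define SMOP_RELEASE(i, o) \
    (SMOP_RI(o)->RELEASE((SMOP__Object*)(i), SMOP_RI(o), (SMOP__Object*)(o)))

/* Hypothetical type: only its own responder interface knows these members. */
typedef struct Counter {
    SMOP__ResponderInterface* RI;
    int refs, count, destroyed;
} Counter;
static SMOP__Object identifier_increment;   /* identifiers compare by address */

static SMOP__Object* counter_message(SMOP__Object* interpreter,
        SMOP__ResponderInterface* self, SMOP__Object* identifier,
        SMOP__Object* capture) {
    (void)self;
    /* Simplification: the invocant itself stands in for the capture. */
    if (identifier == &identifier_increment) ((Counter*)capture)->count++;
    SMOP_RELEASE(interpreter, capture);   /* stake handed over by the caller */
    return NULL;                          /* no result object in this sketch */
}

static SMOP__Object* counter_reference(SMOP__Object* interpreter,
        SMOP__ResponderInterface* self, SMOP__Object* object) {
    (void)interpreter; (void)self;
    ((Counter*)object)->refs++;
    return object;
}

static SMOP__Object* counter_release(SMOP__Object* interpreter,
        SMOP__ResponderInterface* self, SMOP__Object* object) {
    Counter* counter = (Counter*)object;
    (void)interpreter; (void)self;
    if (--counter->refs == 0) counter->destroyed = 1;   /* would free here */
    return object;
}

/* WEAKREF is left NULL because this sketch never calls it. */
static SMOP__ResponderInterface counter_ri =
    { NULL, counter_message, counter_reference, counter_release, NULL, "counter" };

/* Returns the count after two dispatched increments. */
int run_counter_demo(void) {
    Counter counter = { &counter_ri, 1, 0, 0 };   /* creator holds one stake */
    SMOP__Object* obj = (SMOP__Object*)&counter;
    int n;

    for (n = 0; n < 2; n++) {
        /* SMOP_REFERENCE returns its object, so the call nests. */
        SMOP_DISPATCH(NULL, SMOP_RI(obj), &identifier_increment,
                      SMOP_REFERENCE(NULL, obj));
    }
    assert(counter.refs == 1 && !counter.destroyed);
    SMOP_RELEASE(NULL, obj);                      /* give up the creator's stake */
    assert(counter.destroyed);
    return counter.count;
}
```

Note that the assumed macro bodies mention their C<object> argument more 
than once, which is one reason an argument with side effects, such as 
C<current++>, would be unsafe with macros written this way.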

=head1 Other Macros

=over

=item macro SMOP__Object__BASE

This macro defines the top members present in every SMOP Object, basically 
defining the members documented in the section above.  Currently that is just
the C<.RI> member, but should members be added in future versions, they 
will appear in this list.  It should be used when declaring new types
of objects.

=item macro SMOP__ResponderInterface__BASE

Like the above macro, except that this defines the members present in
all responder interface objects, as documented further above.  Note
this does not include C<SMOP__Object__BASE>.  It is best not to nest
such macros to keep them reusable for compound types.

=item macro SMOP_RI(value)

Shorthand to dereference the C<.RI> member of a C<SMOP__Object> structure given 
the address of the C<SMOP__Object> structure.

=back
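
A sketch of how these macros might be used to declare a new object type 
and a new responder interface type follows.  The macro expansions shown 
are assumptions that simply list the members documented above; the real 
definitions may differ.  C<Point>, C<PointRI> and C<check_point_layout> 
are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef struct SMOP__Object SMOP__Object;
typedef struct SMOP__ResponderInterface SMOP__ResponderInterface;

/* ASSUMED expansions: just the members documented in this file. */
#define SMOP__Object__BASE \
    SMOP__ResponderInterface* RI;

#define SMOP__ResponderInterface__BASE \
    SMOP__Object* (*MESSAGE)  (SMOP__Object* interpreter, \
                               SMOP__ResponderInterface* self, \
                               SMOP__Object* identifier, \
                               SMOP__Object* capture); \
    SMOP__Object* (*REFERENCE)(SMOP__Object* interpreter, \
                               SMOP__ResponderInterface* self, \
                               SMOP__Object* object); \
    SMOP__Object* (*RELEASE)  (SMOP__Object* interpreter, \
                               SMOP__ResponderInterface* self, \
                               SMOP__Object* object); \
    SMOP__Object* (*WEAKREF)  (SMOP__Object* interpreter, \
                               SMOP__ResponderInterface* self, \
                               SMOP__Object* object); \
    char* id;

#define SMOP_RI(value) (((SMOP__Object*)(value))->RI)

struct SMOP__Object { SMOP__Object__BASE };
struct SMOP__ResponderInterface {
    SMOP__Object__BASE
    SMOP__ResponderInterface__BASE
};

/* Hypothetical object type with private instance data below .RI. */
typedef struct Point {
    SMOP__Object__BASE
    int x, y;
} Point;

/* Hypothetical responder interface type with private data of its own.
 * Both base macros are listed, since the second does not include the first. */
typedef struct PointRI {
    SMOP__Object__BASE
    SMOP__ResponderInterface__BASE
    int instances_created;
} PointRI;

/* Returns 1 if generic code can find the RI of a Point. */
int check_point_layout(void) {
    static PointRI point_ri;   /* zero-initialised; hooks unused here */
    Point p = { (SMOP__ResponderInterface*)&point_ri, 3, 4 };

    /* The shared members sit at the same offsets as in the base structs. */
    assert(offsetof(Point, RI) == offsetof(SMOP__Object, RI));
    assert(offsetof(PointRI, MESSAGE) ==
           offsetof(SMOP__ResponderInterface, MESSAGE));
    assert(offsetof(PointRI, id) == offsetof(SMOP__ResponderInterface, id));

    /* So SMOP_RI works without knowing anything about Point. */
    return SMOP_RI(&p) == (SMOP__ResponderInterface*)&point_ri;
}
```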

=head1 Talking Trash (Garbage Collection)

SMOP uses reference counting garbage collection conventions, as you probably
can tell from the above documentation for C<SMOP_REFERENCE> and C<SMOP_RELEASE>.

In the initial implementation, a reference counting garbage collector was 
selected since this type of garbage collector is considerably simpler to 
implement (even if considerably harder to debug and maintain.)  However, 
when design goals expanded to include interoperability with perl5, it 
became evident that following reference counting conventions would be a 
necessity in making SMOP and perl5 work together.

One thing that might not be obvious from the above technical notes
is that it's up to each responder interface to implement its own garbage 
collector.  This means that we can have several garbage collectors 
coexisting within the same process.  For instance, the SMOP default 
low-level and the perl5 garbage collectors could both manage different
sets of objects.  In addition, objects that do not do any garbage collection
at all may be present.  Even in this case, all objects at least pretend 
to implement the mechanisms that make reference counting possible. 

That is why the C<.REFERENCE>, C<.RELEASE> and C<.WEAKREF> functions 
are included at the base level.  Relatively few objects should be responder 
interfaces, so it is better for them just to carry vestigial members than 
make the code complex by trying to do without them.  This set of functions 
should be sufficient to interact with the majority of reference counting 
garbage collectors.
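
To make the coexistence of collectors concrete, here is a small sketch 
with two responder interfaces: one that counts references and one for 
constants that only pretends to.  The macro bodies are assumptions 
consistent with the descriptions earlier in this document, the 
C<SMOP__Hook> typedef is a shorthand introduced here, and C<Counted>, 
C<constant_noop>, C<borrow_for_a_while> and C<run_two_collectors> are 
hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

typedef struct SMOP__Object SMOP__Object;
typedef struct SMOP__ResponderInterface SMOP__ResponderInterface;
/* Shorthand for the shared signature of REFERENCE, RELEASE and WEAKREF. */
typedef SMOP__Object* (*SMOP__Hook)(SMOP__Object* interpreter,
                                    SMOP__ResponderInterface* self,
                                    SMOP__Object* object);
struct SMOP__Object { SMOP__ResponderInterface* RI; };
struct SMOP__ResponderInterface {
    SMOP__ResponderInterface* RI;
    SMOP__Object* (*MESSAGE)(SMOP__Object* interpreter,
                             SMOP__ResponderInterface* self,
                             SMOP__Object* identifier,
                             SMOP__Object* capture);
    SMOP__Hook REFERENCE, RELEASE, WEAKREF;
    char* id;
};

/* ASSUMED macro bodies, written to match the descriptions above. */
#define SMOP_RI(o) (((SMOP__Object*)(o))->RI)
#define SMOP_REFERENCE(i, o) \
    (SMOP_RI(o)->REFERENCE((SMOP__Object*)(i), SMOP_RI(o), (SMOP__Object*)(o)))
#define SMOP_RELEASE(i, o) \
    (SMOP_RI(o)->RELEASE((SMOP__Object*)(i), SMOP_RI(o), (SMOP__Object*)(o)))
#define SMOP_WEAKREF(i, o) \
    (SMOP_RI(o)->WEAKREF((SMOP__Object*)(i), SMOP_RI(o), (SMOP__Object*)(o)))

/* Collector 1: plain reference counting ("destroyed" stands in for free). */
typedef struct Counted {
    SMOP__ResponderInterface* RI;
    int refs, destroyed;
} Counted;

static SMOP__Object* counted_reference(SMOP__Object* interpreter,
        SMOP__ResponderInterface* self, SMOP__Object* object) {
    (void)interpreter; (void)self;
    ((Counted*)object)->refs++;
    return object;
}

static SMOP__Object* counted_release(SMOP__Object* interpreter,
        SMOP__ResponderInterface* self, SMOP__Object* object) {
    Counted* counted = (Counted*)object;
    (void)interpreter; (void)self;
    if (--counted->refs == 0) counted->destroyed = 1;
    return object;
}

/* Collector 2: constants are never collected; the hooks only pretend. */
static SMOP__Object* constant_noop(SMOP__Object* interpreter,
        SMOP__ResponderInterface* self, SMOP__Object* object) {
    (void)interpreter; (void)self;
    return object;
}

/* WEAKREF for the counted type is not sketched here. */
static SMOP__ResponderInterface counted_ri =
    { NULL, NULL, counted_reference, counted_release, NULL, "counted" };
static SMOP__ResponderInterface constant_ri =
    { NULL, NULL, constant_noop, constant_noop, constant_noop, "constant" };

/* Generic code: it cannot tell which collector manages the object. */
static void borrow_for_a_while(SMOP__Object* object) {
    SMOP_REFERENCE(NULL, object);
    /* ... use the object ... */
    SMOP_RELEASE(NULL, object);
}

/* Returns 1 if the counted object is destroyed by its last release. */
int run_two_collectors(void) {
    Counted counted = { &counted_ri, 1, 0 };
    SMOP__Object constant = { &constant_ri };

    borrow_for_a_while((SMOP__Object*)&counted);
    borrow_for_a_while(&constant);
    assert(counted.refs == 1 && !counted.destroyed);

    /* An uncounted object may hand itself back as its own weak reference. */
    assert(SMOP_WEAKREF(NULL, &constant) == &constant);

    SMOP_RELEASE(NULL, &counted);
    return counted.destroyed;
}
```

The point of the sketch is that C<borrow_for_a_while> follows the same 
conventions for both objects, while each responder interface decides 
what, if anything, those calls actually do.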

=head2 Who owns an object?

This is the most important question: when to call C<SMOP_REFERENCE> and 
when to call C<SMOP_RELEASE>.  The following documents the policy that 
must be followed to correctly garbage collect SMOP objects.

The rules below refer to ownership "stakes" which belong to either
sections of code, or other objects -- an ownership stake is a concept, 
not a solid object residing in memory somewhere.  One stake in an object 
is merely an obligation by the owner to call C<SMOP_RELEASE> once on the 
object, or to transfer the stake by ensuring that some other code will 
call C<SMOP_RELEASE> on the object when appropriate.

There is also an obligation never to call C<SMOP_RELEASE> on an object in 
which you have no ownership stakes.

REFERENCE/RELEASE conventions:

=over

=item *

When an object is created, it becomes owned by that code which called the
method that created it.  The code has one ownership stake in the newly
created object after the creation is complete.

=item *

Code that calls C<SMOP_REFERENCE> assumes an additional ownership stake in 
the object.  Since it is so easy to give away stakes, C<SMOP_REFERENCE> is 
an important tool for keeping objects alive.

As such, code may have more than one stake in a single object, 
even though there is no way to distinguish between the results of object 
creation or the results of any of the calls to C<SMOP_REFERENCE>.  It is 
up to the developer to keep count of the number of stakes.

Code that has more than one stake in an object needs to 
C<SMOP_RELEASE> (or transfer) the reference as many times as it has stakes, 
and B<only> that many times.

=item *

Installing an object in a capture implies transferring one stake
in the object to the capture object (or more than one, if the object is 
installed more than once in the capture.)

As such, the code installing the object in the capture is no longer responsible 
for calling C<SMOP_RELEASE> for this one ownership stake.  If it has other 
ownership stakes, it must still call release for each of those.  

Note that this means to install an object in a capture more than once,
you should have obtained more than one stake in the object, because the
capture will call C<SMOP_RELEASE> more than once.

Also note that, as long as the capture is around to own the object, the 
original code may still use references to the object, without acquiring a 
new one.  However, this may not be advisable for code legibility and
maintainability.

=item *

References owned by capture objects will be automatically C<SMOP_RELEASE>d 
when the capture object itself is destroyed.  Capture objects automatically
fulfill their obligations to ownership stakes (as long as the ownership
stakes to the aggregate capture object itself are correctly fulfilled.)

Again note, if an object is in a capture more than one time, the capture
is going to call C<SMOP_RELEASE> on the object more than one time when it is
destroyed.

=item *

Once an object is installed in a capture, getting a new reference to the
individual object requires the use of a special direct-access API that
bypasses the normal C<.MESSAGE> method calling interface.  This procedure 
will be documented elsewhere -- the important thing to know is that the
capture will automatically call C<SMOP_REFERENCE> on any object extracted
from it.

This stake is owned by the code that extracted the value.

=item *

When a capture is passed to a C<SMOP_DISPATCH>/C<.MESSAGE> as the capture 
parameter, the code receiving the capture assumes one ownership stake in 
the capture object from the caller.  That is, the caller has one less 
ownership stake in the capture after passing it on.  Thus, the receiving 
code should C<SMOP_RELEASE> the capture before returning (or pass it 
on somewhere else.)

In this scenario, the capture is still the owner of the objects inside it.

=item *

TODO: ownership behavior of return.

=item *

A call to C<SMOP_RELEASE> implies that this owner no longer wants B<one> 
of its ownership stakes in the object.  The owner will still retain any other 
ownership stakes.

=item *

Passing an object to the C<interpreter>, C<object>/C<self>, or C<identifier> 
parameters of a C<SMOP_DISPATCH>/C<.MESSAGE> does not transfer the ownership 
stake in that object, unlike the C<capture> parameter.

=item *

If a C<SMOP_RELEASE> or C<SMOP_REFERENCE> happens inside a subroutine, and 
the subroutine returns with a net gain or loss of ownership stakes, then 
the code that called the subroutine will gain or lose that many ownership 
stakes.  There is no requirement to keep all ownership stake manipulation 
within the same block of C code.  

However, from a good coding practice standpoint, it is advisable to balance 
ownership stakes where possible, or otherwise, to fully comment and document 
the behavior.

=item *

C<SMOP_WEAKREF> is used to obtain a weak reference to an object.  It may return
a different pointer, to an entirely new object, owned by the code that 
called it.  Calling C<SMOP_WEAKREF> doesn't change the ownership stake in the 
original object (at least, never when it matters.) 

However, since it may create a new object, the weakref itself should 
still be C<SMOP_RELEASE>d.

=back
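
The conventions above can be traced through a small sketch.  The capture 
shown here is a hypothetical fixed-size array, and C<capture_install> and 
C<capture_extract> are stand-ins for the direct-access API that is 
documented elsewhere; the macro bodies are likewise assumptions, the 
C<destroyed> flag stands in for freeing memory, and no interpreter is 
modelled.

```c
#include <assert.h>
#include <stddef.h>

typedef struct SMOP__Object SMOP__Object;
typedef struct SMOP__ResponderInterface SMOP__ResponderInterface;
typedef SMOP__Object* (*SMOP__Hook)(SMOP__Object* interpreter,
    SMOP__ResponderInterface* self, SMOP__Object* object);
struct SMOP__Object { SMOP__ResponderInterface* RI; };
struct SMOP__ResponderInterface {
    SMOP__ResponderInterface* RI;
    SMOP__Object* (*MESSAGE)(SMOP__Object* interpreter,
        SMOP__ResponderInterface* self, SMOP__Object* identifier,
        SMOP__Object* capture);
    SMOP__Hook REFERENCE, RELEASE, WEAKREF;
    char* id;
};

/* ASSUMED macro bodies, written to match the descriptions above. */
#define SMOP_RI(o) (((SMOP__Object*)(o))->RI)
#define SMOP_DISPATCH(i, o, ident, c) \
    (((SMOP__ResponderInterface*)(o))->MESSAGE((SMOP__Object*)(i), \
        (SMOP__ResponderInterface*)(o), (SMOP__Object*)(ident), \
        (SMOP__Object*)(c)))
#define SMOP_REFERENCE(i, o) \
    (SMOP_RI(o)->REFERENCE((SMOP__Object*)(i), SMOP_RI(o), (SMOP__Object*)(o)))
#define SMOP_RELEASE(i, o) \
    (SMOP_RI(o)->RELEASE((SMOP__Object*)(i), SMOP_RI(o), (SMOP__Object*)(o)))

/* Hypothetical types; "destroyed" stands in for freeing the memory. */
typedef struct Counted { SMOP__ResponderInterface* RI; int refs, destroyed; } Counted;
typedef struct Value   { Counted head; int payload; } Value;
typedef struct Capture { Counted head; size_t used; SMOP__Object* slots[4]; } Capture;

static SMOP__Object* counted_reference(SMOP__Object* in,
        SMOP__ResponderInterface* self, SMOP__Object* object) {
    ((Counted*)object)->refs++;
    return object;
}

static SMOP__Object* value_release(SMOP__Object* in,
        SMOP__ResponderInterface* self, SMOP__Object* object) {
    Counted* c = (Counted*)object;
    if (--c->refs == 0) c->destroyed = 1;
    return object;
}

/* Destroying a capture releases every object installed in it. */
static SMOP__Object* capture_release(SMOP__Object* in,
        SMOP__ResponderInterface* self, SMOP__Object* object) {
    Capture* cap = (Capture*)object;
    if (--cap->head.refs == 0) {
        for (size_t n = 0; n < cap->used; n++)
            SMOP_RELEASE(in, cap->slots[n]);   /* once per installation */
        cap->head.destroyed = 1;
    }
    return object;
}

/* Hypothetical direct-access API; the real one is documented elsewhere. */
static void capture_install(Capture* cap, SMOP__Object* object) {
    assert(cap->used < 4);
    cap->slots[cap->used++] = object;          /* takes over one caller stake */
}
static SMOP__Object* capture_extract(SMOP__Object* in, Capture* cap, size_t n) {
    return SMOP_REFERENCE(in, cap->slots[n]);  /* new stake for the extractor */
}

/* Callee: owns one stake in the capture and must dispose of it. */
static SMOP__Object* value_message(SMOP__Object* in,
        SMOP__ResponderInterface* self, SMOP__Object* identifier,
        SMOP__Object* capture) {
    SMOP__Object* invocant = capture_extract(in, (Capture*)capture, 0);
    ((Value*)invocant)->payload++;
    SMOP_RELEASE(in, invocant);   /* stake gained by extracting */
    SMOP_RELEASE(in, capture);    /* stake handed over by the caller */
    return NULL;                  /* return ownership is still a TODO above */
}

static SMOP__ResponderInterface value_ri =
    { NULL, value_message, counted_reference, value_release, NULL, "value" };
static SMOP__ResponderInterface capture_ri =
    { NULL, NULL, counted_reference, capture_release, NULL, "capture" };

/* Caller: returns the payload seen after the dispatch. */
int run_ownership_demo(void) {
    Value v = { { &value_ri, 1, 0 }, 10 };                 /* created: 1 stake */
    Capture cap = { { &capture_ri, 1, 0 }, 0, { NULL } };  /* created: 1 stake */
    SMOP__Object* obj = (SMOP__Object*)&v;
    int payload;
    SMOP_REFERENCE(NULL, obj);      /* extra stake, to keep using obj later */
    capture_install(&cap, obj);     /* one stake moves into the capture */
    SMOP_DISPATCH(NULL, SMOP_RI(obj), NULL, &cap);   /* capture stake moves */
    assert(cap.head.destroyed && v.head.refs == 1);
    payload = v.payload;
    SMOP_RELEASE(NULL, obj);        /* the caller's remaining stake */
    assert(v.head.destroyed);
    return payload;
}
```

In this trace the value's count goes 1, 2 (extra reference), 3 
(extraction), 2 (callee releases the extracted stake), 1 (capture is 
destroyed) and finally 0 when the caller releases its remaining stake.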

=head2 Summary

Most reference counting will happen around C<SMOP_DISPATCH>/C<.MESSAGE> 
method invocations.

In general, the caller can "fire and forget" and the callee has to clean
up the mess.  From the caller side, the only tricky part is remembering
to take an extra C<SMOP_REFERENCE> when installing one object into a capture
more than once, or if the object is to be used after a capture it is
inside has been destroyed.

The callee, on the other hand, must remember to C<SMOP_RELEASE> any objects it 
extracted from the capture (once for every time that object is extracted)
and after that, to C<SMOP_RELEASE> the capture itself, before returning.
Alternatively it may dispose of the ownership stakes by transferring them
to other code or captures, like, for example, inside its result.

=head1 IMPORTANT SPEC NOTICE

This document describes everything that you can assume about an arbitrary
object. This means that you can only introspect in more detail by
either calling a method, or via special knowledge of the internals of 
the responder interface of the given object (for example, inside the
code of the responder interface itself.)

It is erroneous to assume anything about the internal structure of any 
object, even responder interface objects, beyond what is described in 
this document.

=cut