=head1 NAME
Base - SMOP basic structures
=head1 REVISION
$Id$
=head1 SMOP__Object
In SMOP, every single value must be binary-compatible with the
C<SMOP__Object> struct. This even includes core level constructs such as
the interpreter and the native types. This idea comes directly from
how perl5 works, with the SV struct.
Unlike p5, however, the C<SMOP__Object> struct is absolutely minimalist: it
defines no type, no flags, and no introspection information. It defines
only that every C<SMOP__Object> has a "responder interface" (C<.RI>), so the
structure is merely:
    struct SMOP__Object {
        SMOP__ResponderInterface* RI;
        /* Maybe there is something here, maybe there is nothing here.
         * Only the responder interface knows.
         */
    };
The value in the C<.RI> member is not unique to the object. For all but
singleton classes, one responder interface will be used by multiple object
structs. As such, the object is identified only by the memory address at
which the struct C<SMOP__Object> is stored.
This means that you can't really do anything to the object yourself; you
can only talk to its responder interface. The object serves as both
a way to find the correct responder interface, and a way to tell the
responder interface which instance data to operate on -- and that is all.
There may be additional data below the C<.RI> member, but if so,
only the responder interface knows how to use it. The data for the
object instance may, in fact, B<not> be stored in the structure at all --
it could be looked up using the object's address in a completely
separate data store.
As such, it is incorrect to attempt to copy or move a C<SMOP__Object> struct
using a simple memory copy like C's C<memcpy()>. Even if you lucked out and
got all the data in the object, you would have changed its address,
and it would not be the same object anymore. This point is especially
important to note if an object may exist in multiple address spaces -- only
one address will be valid without special handling.
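The following is a minimal sketch of why a memory copy is not the same object. It assumes a hypothetical responder interface that keeps instance data in a side table keyed by object address; the table and the helpers C<store_value> and C<lookup_value> are invented for illustration and are not part of SMOP.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct SMOP__ResponderInterface SMOP__ResponderInterface;
typedef struct SMOP__Object { SMOP__ResponderInterface* RI; } SMOP__Object;

/* Hypothetical side table: instance data lives here, keyed by address. */
#define SLOTS 4
static const SMOP__Object* slot_key[SLOTS];
static int slot_value[SLOTS];

/* Remember a value for the object stored at this address. */
static void store_value(const SMOP__Object* obj, int value) {
    for (size_t i = 0; i < SLOTS; i++) {
        if (slot_key[i] == NULL || slot_key[i] == obj) {
            slot_key[i] = obj;
            slot_value[i] = value;
            return;
        }
    }
}

/* Returns 1 and fills *out if the address is known, 0 otherwise. */
static int lookup_value(const SMOP__Object* obj, int* out) {
    for (size_t i = 0; i < SLOTS; i++) {
        if (slot_key[i] == obj) {
            *out = slot_value[i];
            return 1;
        }
    }
    return 0;
}

/* Returns 1 if a memcpy'd duplicate is NOT recognised as the original. */
int copy_loses_identity(void) {
    SMOP__Object original = { NULL };
    SMOP__Object copy;
    int value = 0;

    store_value(&original, 42);
    memcpy(&copy, &original, sizeof copy);   /* copies only the RI pointer */

    assert(lookup_value(&original, &value) && value == 42);
    return !lookup_value(&copy, &value);     /* the copy has a new address */
}
```

The copy carries the same C<.RI> pointer, yet the lookup fails because the table only knows the original address.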
=head1 SMOP__ResponderInterface
The responder interface (which, of course, is also binary-compatible
with C<SMOP__Object>) implements the low-level part of the meta object
protocol. It is through the responder interface that you can perform
actions on the object.
Using the responder interface, arbitrary methods may be invoked on
the object. It's important to realize that this method invocation
happens at the same level that any high-level language might call.
This means that there's no distinction between native operators and
high-level operators, nor between native values and high-level values.
The structure of a responder interface is as follows:
    struct SMOP__ResponderInterface {
        SMOP__ResponderInterface* RI;
        SMOP__Object* (*MESSAGE)  (SMOP__Object* interpreter,
                                   SMOP__ResponderInterface* self,
                                   SMOP__Object* identifier,
                                   SMOP__Object* capture);
        SMOP__Object* (*REFERENCE)(SMOP__Object* interpreter,
                                   SMOP__ResponderInterface* self,
                                   SMOP__Object* object);
        SMOP__Object* (*RELEASE)  (SMOP__Object* interpreter,
                                   SMOP__ResponderInterface* self,
                                   SMOP__Object* object);
        SMOP__Object* (*WEAKREF)  (SMOP__Object* interpreter,
                                   SMOP__ResponderInterface* self,
                                   SMOP__Object* object);
        char* id;
        /* Maybe there is something here, maybe there is nothing here.
         * Only the responder interface in member .RI knows.
         */
    };
However, the SMOP base defines a few macros that should be used when
interacting with SMOP Objects. While in theory, the use of those
macros is optional, it's strongly advised that you stick with them, to
make transitions to newer versions easier.
As such, each of the function hooks defined in the above structure
will be described along with the macros which should be used to access
them.
=over
=item macro SMOP_DISPATCH
    SMOP_DISPATCH(interpreter, object, identifier, capture)
This macro (and all its parameters) corresponds to the C<MESSAGE> function
hook member. This is the function that handles method invocation for
the objects which this responder interface oversees:
    SMOP__Object* (*MESSAGE) (
        SMOP__Object* interpreter,      /* gets interpreter */
        SMOP__ResponderInterface* self, /* gets (responder) object */
        SMOP__Object* identifier,       /* gets identifier */
        SMOP__Object* capture           /* gets capture (instance object inside) */
    );
As you might have noticed, it receives objects as arguments and returns, of
course, an object.
C<SMOP_DISPATCH> uses the C<.MESSAGE> function in the responder found at
C<object> to invoke a method with a name found at C<identifier>. It invokes
that method in the context of the interpreter found at C<interpreter> using
the capture found at C<capture> to pass data to the method's parameters.
Each of these macro arguments is expanded upon in other documentation. You may
notice, however, that something appears to be missing. Methods usually have
an "invocant" -- which would be a C<SMOP__Object> that was used to find the
responder that is being pointed to in C<object> above. If there is one, it
is tucked away inside the capture.
=item macros SMOP_REFERENCE and SMOP_RELEASE
    SMOP_REFERENCE(interpreter, object)
    SMOP_RELEASE(interpreter, object)
C<SMOP_REFERENCE> and C<SMOP_RELEASE> call, respectively, the C<.REFERENCE> and
C<.RELEASE> functions in a responder interface. The responder interface
used is the one that is pointed to by the C<.RI> member of the object structure
pointed to by C<object>. The C<object> pointer itself is also passed to the
REFERENCE or RELEASE function:
    SMOP__Object* (*REFERENCE)(
        SMOP__Object* interpreter,      /* gets interpreter */
        SMOP__ResponderInterface* self, /* gets the RI member found at object */
        SMOP__Object* object            /* gets object itself */
    );
These functions increment or decrement the reference count of C<object> in
the context of C<interpreter>. The reference count is used to handle
automatic cleanup of objects when they are no longer needed -- more on this
subject later.
The macros both return the same value that was passed into C<object>, so you
can use the macro in most places where you would use an object pointer, much
like you would use C<i++> to postincrement an integer in-place. This is handy
for keeping code terse, but take care: because these are macros, an argument
with side effects may be evaluated more than once, so never write anything like
C<SMOP_RELEASE(interp,current++)> or C<SMOP_RELEASE(interp,current)++> when
working with arrays of objects.
=item macro SMOP_WEAKREF
    SMOP_WEAKREF(interpreter, object)
C<SMOP_WEAKREF> calls the C<.WEAKREF> function in a responder interface. It
works much the same way as the C<SMOP_REFERENCE> macro, above.
    SMOP__Object* (*WEAKREF) (
        SMOP__Object* interpreter,      /* gets interpreter */
        SMOP__ResponderInterface* self, /* gets the RI member found at object */
        SMOP__Object* object            /* gets object itself */
    );
C<SMOP_WEAKREF> can be used wherever you would normally use C<SMOP_REFERENCE>
to obtain a "weak reference" instead. This call is allowed to return you a
different object than the one you point to with C<object>, and you are supposed
to use that as a proxy. Weak references do not count as a reference
against the original C<object> for the purposes of garbage collection.
This means that the original object may be freed before the weak reference itself
is destroyed. If this happens, the weak reference will start to refer
to some appropriate constant (like C<False>) instead of the now-dead object.
The implementation of the weak-reference is private to each responder
interface's implementation, so the exact behavior may vary depending on the
kind of objects you are working with. In particular, note that if an object
does not actually need to be reference counted, a weak reference may end
up returning the original object, so you are not allowed to assume the
macro will always return a different pointer than the one passed via C<object>.
Note that a weak reference is itself an object. So you do still need to
call C<SMOP_RELEASE> on it when you are done with it. (It isn't provided
just to help us be lazy.) However, all C<SMOP_REFERENCE> and C<SMOP_RELEASE>
calls on the weak reference object count references to the proxy object,
not the original object.
That makes weak references a handy way to break circular dependencies
between objects and code.
=back
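As a rough illustration of how the hooks and macros described above fit together, here is a sketch built around a toy reference-counted object. The macro bodies are assumptions written for this example (the real SMOP headers may define them differently), and the C<Counter> type, its hook functions and the convention that the capture pointer doubles as the invocant are all hypothetical simplifications.

```c
#include <assert.h>
#include <stddef.h>

typedef struct SMOP__Object SMOP__Object;
typedef struct SMOP__ResponderInterface SMOP__ResponderInterface;

struct SMOP__Object { SMOP__ResponderInterface* RI; };
struct SMOP__ResponderInterface {
    SMOP__ResponderInterface* RI;
    SMOP__Object* (*MESSAGE)(SMOP__Object* interpreter, SMOP__ResponderInterface* self,
                             SMOP__Object* identifier, SMOP__Object* capture);
    SMOP__Object* (*REFERENCE)(SMOP__Object* interpreter, SMOP__ResponderInterface* self,
                               SMOP__Object* object);
    SMOP__Object* (*RELEASE)(SMOP__Object* interpreter, SMOP__ResponderInterface* self,
                             SMOP__Object* object);
    SMOP__Object* (*WEAKREF)(SMOP__Object* interpreter, SMOP__ResponderInterface* self,
                             SMOP__Object* object);
    char* id;
};

/* ASSUMED macro bodies, for illustration only. Note that `object` is
 * expanded more than once, so arguments with side effects are unsafe. */
#define SMOP_RI(object) (((SMOP__Object*)(object))->RI)
#define SMOP_DISPATCH(interpreter, object, identifier, capture) \
    (((SMOP__ResponderInterface*)(object))->MESSAGE( \
        (interpreter), (SMOP__ResponderInterface*)(object), (identifier), (capture)))
#define SMOP_REFERENCE(interpreter, object) \
    (SMOP_RI(object)->REFERENCE((interpreter), SMOP_RI(object), (SMOP__Object*)(object)))
#define SMOP_RELEASE(interpreter, object) \
    (SMOP_RI(object)->RELEASE((interpreter), SMOP_RI(object), (SMOP__Object*)(object)))

/* Hypothetical object type: the data after .RI is private to its RI. */
typedef struct Counter { SMOP__ResponderInterface* RI; int refs; int hits; } Counter;

static SMOP__Object* counter_message(SMOP__Object* interpreter, SMOP__ResponderInterface* self,
                                     SMOP__Object* identifier, SMOP__Object* capture) {
    (void)interpreter; (void)self; (void)identifier;
    ((Counter*)capture)->hits++;   /* toy convention: capture is the invocant */
    return capture;
}

static SMOP__Object* counter_reference(SMOP__Object* interpreter, SMOP__ResponderInterface* self,
                                       SMOP__Object* object) {
    (void)interpreter; (void)self;
    ((Counter*)object)->refs++;
    return object;                 /* same pointer that was passed in */
}

static SMOP__Object* counter_release(SMOP__Object* interpreter, SMOP__ResponderInterface* self,
                                     SMOP__Object* object) {
    (void)interpreter; (void)self;
    ((Counter*)object)->refs--;    /* a real RI would clean up at zero */
    return object;
}

static SMOP__ResponderInterface counter_ri = {
    NULL, counter_message, counter_reference, counter_release,
    NULL /* WEAKREF is not used in this sketch */, "counter"
};

int demo_macros(void) {
    Counter c = { &counter_ri, 1, 0 };
    SMOP__Object* obj = (SMOP__Object*)&c;

    SMOP__Object* same = SMOP_REFERENCE(NULL, obj);   /* refs: 1 -> 2 */
    assert(same == obj);
    SMOP_DISPATCH(NULL, SMOP_RI(obj), NULL, obj);     /* hits: 0 -> 1 */
    SMOP_RELEASE(NULL, obj);                          /* refs: 2 -> 1 */
    return c.refs * 10 + c.hits;                      /* 11 */
}
```

The caller never touches C<refs> or C<hits> directly; everything goes through the responder interface found in C<.RI>.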
=head1 Other Macros
=over
=item macro SMOP__Object__BASE
This macro defines the top members present in every SMOP Object, basically
defining the members documented in the section above. Currently that is just
the C<.RI> member, but should members be added in future versions, they
will appear in this list. It should be used when declaring new types
of objects.
=item macro SMOP__ResponderInterface__BASE
Like the above macro, except that this defines the members present in
all responder interface objects, as documented further above. Note that
this does not include C<SMOP__Object__BASE>: such macros are best left
unnested, so that they stay reusable for compound types.
=item macro SMOP_RI(value)
Shorthand to dereference the C<.RI> member of a C<SMOP__Object> structure given
the address of the C<SMOP__Object> structure.
=back
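The sketch below shows one way these macros might be used to declare new types. The macro expansions are assumptions derived from the struct layouts documented above, not copied from the SMOP headers, and the C<Point> and C<PointRI> types are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct SMOP__Object SMOP__Object;
typedef struct SMOP__ResponderInterface SMOP__ResponderInterface;

/* ASSUMED expansions, mirroring the documented struct members. */
#define SMOP__Object__BASE SMOP__ResponderInterface* RI;
#define SMOP__ResponderInterface__BASE \
    SMOP__Object* (*MESSAGE)(SMOP__Object* interpreter, \
                             SMOP__ResponderInterface* self, \
                             SMOP__Object* identifier, SMOP__Object* capture); \
    SMOP__Object* (*REFERENCE)(SMOP__Object* interpreter, \
                               SMOP__ResponderInterface* self, \
                               SMOP__Object* object); \
    SMOP__Object* (*RELEASE)(SMOP__Object* interpreter, \
                             SMOP__ResponderInterface* self, \
                             SMOP__Object* object); \
    SMOP__Object* (*WEAKREF)(SMOP__Object* interpreter, \
                             SMOP__ResponderInterface* self, \
                             SMOP__Object* object); \
    char* id;
#define SMOP_RI(value) (((SMOP__Object*)(value))->RI)

struct SMOP__Object { SMOP__Object__BASE };
struct SMOP__ResponderInterface { SMOP__Object__BASE SMOP__ResponderInterface__BASE };

/* Hypothetical new object type: base members first, private data after. */
typedef struct Point {
    SMOP__Object__BASE
    int x;
    int y;
} Point;

/* Hypothetical responder interface that carries extra private data.
 * Both BASE macros are listed because they are not nested. */
typedef struct PointRI {
    SMOP__Object__BASE
    SMOP__ResponderInterface__BASE
    int instances;
} PointRI;

int demo_base(void) {
    static PointRI point_ri = { NULL, NULL, NULL, NULL, NULL, "point", 0 };
    Point p = { (SMOP__ResponderInterface*)&point_ri, 3, 4 };
    SMOP__Object* generic = (SMOP__Object*)&p;

    /* The shared prefix keeps both types binary-compatible with the base. */
    assert(offsetof(Point, RI) == 0);
    assert(offsetof(PointRI, MESSAGE) == offsetof(SMOP__ResponderInterface, MESSAGE));

    return SMOP_RI(generic) == (SMOP__ResponderInterface*)&point_ri;
}
```

Declaring the base members through the macros means a type like C<Point> would pick up any members added in a later version without being edited.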
=head1 Talking Trash (Garbage Collection)
SMOP uses reference-counting garbage collection conventions, as you probably
can tell from the above documentation for C<SMOP_REFERENCE> and C<SMOP_RELEASE>.
In the initial implementation, a reference counting garbage collector was
selected since this type of garbage collector is considerably simpler to
implement (even if considerably harder to debug and maintain). However,
when design goals expanded to include interoperability with perl5, it
became evident that following reference counting conventions would be a
necessity in making SMOP and perl5 work together.
One thing that might not be obvious from the above technical notes
is that it's up to each responder interface to implement its own garbage
collector. This means that we can have several garbage collectors
coexisting within the same process. For instance, the SMOP default
low-level and the perl5 garbage collectors could both manage different
sets of objects. In addition, objects that do not do any garbage collection
at all may be present. Even in this case, all objects at least pretend
to implement the mechanisms that make reference counting possible.
That is why the C<.REFERENCE>, C<.RELEASE> and C<.WEAKREF> functions
are included at the base level. Relatively few objects should be responder
interfaces, so it is better for them just to carry vestigial members than
make the code complex by trying to do without them. This set of functions
should be sufficient to interact with the majority of reference counting
garbage collectors.
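To make the idea of coexisting collection policies concrete, here is a small sketch with two hypothetical responder interfaces: one counts references, the other manages constants and only pretends to. The type and function names are invented for this example, and the counting policy merely records destruction instead of freeing memory.

```c
#include <assert.h>
#include <stddef.h>

typedef struct SMOP__Object SMOP__Object;
typedef struct SMOP__ResponderInterface SMOP__ResponderInterface;
typedef SMOP__Object* (*Hook)(SMOP__Object* interpreter,
                              SMOP__ResponderInterface* self,
                              SMOP__Object* object);

struct SMOP__Object { SMOP__ResponderInterface* RI; };
struct SMOP__ResponderInterface {
    SMOP__ResponderInterface* RI;
    SMOP__Object* (*MESSAGE)(SMOP__Object* interpreter, SMOP__ResponderInterface* self,
                             SMOP__Object* identifier, SMOP__Object* capture);
    Hook REFERENCE;
    Hook RELEASE;
    Hook WEAKREF;
    char* id;
};

/* Policy 1 (hypothetical): objects carry their own counter. */
typedef struct Counted { SMOP__ResponderInterface* RI; int refs; } Counted;
static int destroyed = 0;

static SMOP__Object* counted_reference(SMOP__Object* interpreter,
                                       SMOP__ResponderInterface* self,
                                       SMOP__Object* object) {
    (void)interpreter; (void)self;
    ((Counted*)object)->refs++;
    return object;
}

static SMOP__Object* counted_release(SMOP__Object* interpreter,
                                     SMOP__ResponderInterface* self,
                                     SMOP__Object* object) {
    (void)interpreter; (void)self;
    if (--((Counted*)object)->refs == 0)
        destroyed++;               /* a real RI would free resources here */
    return object;
}

/* Policy 2 (hypothetical): constants are never collected, so every hook
 * is a no-op that hands the same pointer back, WEAKREF included. */
static SMOP__Object* constant_noop(SMOP__Object* interpreter,
                                   SMOP__ResponderInterface* self,
                                   SMOP__Object* object) {
    (void)interpreter; (void)self;
    return object;
}

static SMOP__ResponderInterface counted_ri = {
    NULL, NULL, counted_reference, counted_release, NULL, "counted"
};
static SMOP__ResponderInterface constant_ri = {
    NULL, NULL, constant_noop, constant_noop, constant_noop, "constant"
};

int demo_two_collectors(void) {
    Counted counted = { &counted_ri, 1 };
    SMOP__Object constant = { &constant_ri };
    SMOP__Object* objects[2];
    objects[0] = (SMOP__Object*)&counted;
    objects[1] = &constant;
    destroyed = 0;

    /* Identical calling code, whichever policy is behind the object. */
    for (int i = 0; i < 2; i++) {
        SMOP__Object* o = objects[i];
        o->RI->REFERENCE(NULL, o->RI, o);
        o->RI->RELEASE(NULL, o->RI, o);
        o->RI->RELEASE(NULL, o->RI, o);
    }

    /* A weak reference to a constant may be the original object itself. */
    assert(constant_ri.WEAKREF(NULL, &constant_ri, &constant) == &constant);
    return destroyed;              /* only the counted object was destroyed */
}
```

The calling loop has no idea which policy it is talking to, which is the point of keeping the hooks at the base level.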
=head2 Who owns an object?
This is the most important question: when to call C<SMOP_REFERENCE> and
when to call C<SMOP_RELEASE>. The following documents the policy that
must be followed to correctly garbage collect SMOP objects.
The rules below refer to ownership "stakes" which belong to either
sections of code, or other objects -- an ownership stake is a concept,
not a solid object residing in memory somewhere. One stake in an object
is merely an obligation by the owner to call C<SMOP_RELEASE> once on the
object, or to transfer the stake by ensuring that some other code will
call C<SMOP_RELEASE> on the object when appropriate.
There is also an obligation never to call C<SMOP_RELEASE> on an object in
which you have no ownership stakes.
REFERENCE/RELEASE conventions:
=over
=item *
When an object is created, it becomes owned by the code that called the
method that created it. The code has one ownership stake in the newly
created object after the creation is complete.
=item *
Code that calls C<SMOP_REFERENCE> assumes an additional ownership stake in
the object. Since it is so easy to give away stakes, C<SMOP_REFERENCE> is
an important tool for keeping objects alive.
As such, code may have more than one stake in a single object,
even though there is no way to distinguish between the results of object
creation or the results of any of the calls to C<SMOP_REFERENCE>. It is
up to the developer to keep count of the number of stakes.
Code that has more than one stake in an object needs to
C<SMOP_RELEASE> (or transfer) the reference as many times as it has stakes,
and B<only> that many times.
=item *
Installing an object in a capture implies transferring one stake
in the object to the capture object (or more than one, if the object is
installed more than once in the capture.)
As such, the code installing the object in the capture is no longer responsible
for calling C<SMOP_RELEASE> for this one ownership stake. If it has other
ownership stakes, it must still call release for each of those.
Note that this means to install an object in a capture more than once,
you should have obtained more than one stake in the object, because the
capture will call C<SMOP_RELEASE> more than once.
Also note that, as long as the capture is around to own the object, the
original code may still use references to the object, without acquiring a
new one. However, this may not be advisable for code legibility and
maintainability.
=item *
References owned by capture objects will be automatically C<SMOP_RELEASE>d
when the capture object itself is destroyed. Capture objects automatically
fulfill their obligations to ownership stakes (as long as the ownership
stakes to the aggregate capture object itself are correctly fulfilled.)
Again note, if an object is in a capture more than one time, the capture
is going to call C<SMOP_RELEASE> on the object more than one time when it is
destroyed.
=item *
Once an object is installed in a capture, getting a new reference to the
individual object requires the use of a special direct-access API that
bypasses the normal C<.MESSAGE> method calling interface. This procedure
will be documented elsewhere -- the important thing to know is that the
capture will automatically call C<SMOP_REFERENCE> on any object extracted
from it.
This stake is owned by the code that extracted the value.
=item *
When a capture is passed to a C<SMOP_DISPATCH>/C<.MESSAGE> as the capture
parameter, the code receiving the capture assumes one ownership stake in
the capture object from the caller. That is, the caller has one less
ownership stake in the capture after passing it on. Thus, the receiving
code should C<SMOP_RELEASE> the capture before returning (or pass it
on somewhere else.)
In this scenario, the capture is still the owner of the objects inside it.
=item *
TODO: ownership behavior of return.
=item *
A call to C<SMOP_RELEASE> implies that this owner no longer wants B<one>
of its ownership stakes in the object. The owner will still retain any other
ownership stakes.
=item *
Passing an object to the C<interpreter>, C<object>/C<self>, or C<identifier>
parameters of a C<SMOP_DISPATCH>/C<.MESSAGE> does not transfer the ownership
stake in that object, unlike the C<capture> parameter.
=item *
If a C<SMOP_RELEASE> or C<SMOP_REFERENCE> happens inside a subroutine, and
the subroutine returns with a net gain or loss of ownership stakes, then
the code that called the subroutine will gain or lose that many ownership
stakes. There is no requirement to keep all ownership stake manipulation
within the same block of C code.
However, from a good coding practice standpoint, it is advisable to balance
ownership stakes where possible, or otherwise, to fully comment and document
the behavior.
=item *
C<SMOP_WEAKREF> is used to return a weak reference to an object. It may return
a different pointer, to an entirely new object, owned by the code that
called it. Calling C<SMOP_WEAKREF> doesn't change the ownership stake in the
original object (at least, never when it matters).
However, since it may create a new object, the weakref itself should
still be C<SMOP_RELEASE>d.
=back
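The conventions above can be traced with a toy model. In this sketch the capture type, its C<capture_install> helper and the macro bodies are all assumptions made for illustration; the real capture API is documented elsewhere and may look quite different. Release only decrements a counter here, so the stake arithmetic stays visible.

```c
#include <assert.h>
#include <stddef.h>

typedef struct SMOP__Object SMOP__Object;
typedef struct SMOP__ResponderInterface SMOP__ResponderInterface;
typedef SMOP__Object* (*Hook)(SMOP__Object* interpreter,
                              SMOP__ResponderInterface* self,
                              SMOP__Object* object);

struct SMOP__Object { SMOP__ResponderInterface* RI; };
struct SMOP__ResponderInterface {
    SMOP__ResponderInterface* RI;
    SMOP__Object* (*MESSAGE)(SMOP__Object* interpreter, SMOP__ResponderInterface* self,
                             SMOP__Object* identifier, SMOP__Object* capture);
    Hook REFERENCE;
    Hook RELEASE;
    Hook WEAKREF;
    char* id;
};

/* ASSUMED macro bodies, for illustration only. */
#define SMOP_RI(object) (((SMOP__Object*)(object))->RI)
#define SMOP_REFERENCE(interpreter, object) \
    (SMOP_RI(object)->REFERENCE((interpreter), SMOP_RI(object), (SMOP__Object*)(object)))
#define SMOP_RELEASE(interpreter, object) \
    (SMOP_RI(object)->RELEASE((interpreter), SMOP_RI(object), (SMOP__Object*)(object)))

/* Hypothetical layouts: both start with RI followed by a counter. */
typedef struct Counted { SMOP__ResponderInterface* RI; int refs; } Counted;
#define CAPTURE_MAX 4
typedef struct Capture {
    SMOP__ResponderInterface* RI;
    int refs;
    int count;
    SMOP__Object* items[CAPTURE_MAX];
} Capture;

static SMOP__Object* counted_reference(SMOP__Object* interpreter,
                                       SMOP__ResponderInterface* self,
                                       SMOP__Object* object) {
    (void)interpreter; (void)self;
    ((Counted*)object)->refs++;
    return object;
}

static SMOP__Object* value_release(SMOP__Object* interpreter,
                                   SMOP__ResponderInterface* self,
                                   SMOP__Object* object) {
    (void)interpreter; (void)self;
    ((Counted*)object)->refs--;
    return object;
}

/* When the capture dies it releases once per installed item. */
static SMOP__Object* capture_release(SMOP__Object* interpreter,
                                     SMOP__ResponderInterface* self,
                                     SMOP__Object* object) {
    Capture* c = (Capture*)object;
    (void)self;
    if (--c->refs == 0)
        for (int i = 0; i < c->count; i++)
            SMOP_RELEASE(interpreter, c->items[i]);
    return object;
}

/* Hypothetical install helper: takes over one stake from the caller. */
static void capture_install(Capture* c, SMOP__Object* item) {
    c->items[c->count++] = item;
}

static SMOP__ResponderInterface value_ri = {
    NULL, NULL, counted_reference, value_release, NULL, "value"
};
static SMOP__ResponderInterface capture_ri = {
    NULL, NULL, counted_reference, capture_release, NULL, "capture"
};

int demo_stakes(void) {
    Counted value = { &value_ri, 1 };          /* creation: one stake, ours */
    Capture cap = { &capture_ri, 1, 0, { NULL } };
    SMOP__Object* v = (SMOP__Object*)&value;

    /* Installing twice needs two stakes to give away; we keep our own. */
    capture_install(&cap, SMOP_REFERENCE(NULL, v));
    capture_install(&cap, SMOP_REFERENCE(NULL, v));
    assert(value.refs == 3);

    SMOP_RELEASE(NULL, &cap);                  /* capture releases v twice */
    assert(value.refs == 1);

    SMOP_RELEASE(NULL, v);                     /* our original stake */
    return value.refs;                         /* 0: every stake settled */
}
```

Every increment is matched by exactly one release, either by the code that holds the stake or by the capture that took it over.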
=head2 Summary
Most reference counting will happen around C<SMOP_DISPATCH>/C<.MESSAGE>
method invocations.
In general, the caller can "fire and forget" and the callee has to clean
up the mess. From the caller side, the only tricky part is remembering
to take an extra C<SMOP_REFERENCE> when installing one object into a capture
more than once, or if the object is to be used after a capture it is
inside has been destroyed.
The callee, on the other hand, must remember to C<SMOP_RELEASE> any objects it
extracted from the capture (once for every time that object is extracted)
and after that, to C<SMOP_RELEASE> the capture itself, before returning.
Alternatively it may dispose of the ownership stakes by transferring them
to other code or captures, for example, inside its result.
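A sketch of the callee's side of this bargain follows. Everything beyond the documented struct members is an assumption: the macro bodies, the single-slot C<Capture> type and the C<capture_invocant> accessor (standing in for the direct-access API documented elsewhere) are hypothetical. The callee returns C<NULL> because the ownership behavior of return values is not yet specified above.

```c
#include <assert.h>
#include <stddef.h>

typedef struct SMOP__Object SMOP__Object;
typedef struct SMOP__ResponderInterface SMOP__ResponderInterface;
typedef SMOP__Object* (*Hook)(SMOP__Object* interpreter,
                              SMOP__ResponderInterface* self,
                              SMOP__Object* object);

struct SMOP__Object { SMOP__ResponderInterface* RI; };
struct SMOP__ResponderInterface {
    SMOP__ResponderInterface* RI;
    SMOP__Object* (*MESSAGE)(SMOP__Object* interpreter, SMOP__ResponderInterface* self,
                             SMOP__Object* identifier, SMOP__Object* capture);
    Hook REFERENCE;
    Hook RELEASE;
    Hook WEAKREF;
    char* id;
};

/* ASSUMED macro bodies, for illustration only. */
#define SMOP_RI(object) (((SMOP__Object*)(object))->RI)
#define SMOP_DISPATCH(interpreter, object, identifier, capture) \
    (((SMOP__ResponderInterface*)(object))->MESSAGE( \
        (interpreter), (SMOP__ResponderInterface*)(object), (identifier), (capture)))
#define SMOP_REFERENCE(interpreter, object) \
    (SMOP_RI(object)->REFERENCE((interpreter), SMOP_RI(object), (SMOP__Object*)(object)))
#define SMOP_RELEASE(interpreter, object) \
    (SMOP_RI(object)->RELEASE((interpreter), SMOP_RI(object), (SMOP__Object*)(object)))

/* Hypothetical layouts sharing the RI + refs prefix. */
typedef struct Counted { SMOP__ResponderInterface* RI; int refs; int uses; } Counted;
typedef struct Capture { SMOP__ResponderInterface* RI; int refs; SMOP__Object* invocant; } Capture;

static SMOP__Object* counted_reference(SMOP__Object* interpreter,
                                       SMOP__ResponderInterface* self,
                                       SMOP__Object* object) {
    (void)interpreter; (void)self;
    ((Counted*)object)->refs++;
    return object;
}

static SMOP__Object* value_release(SMOP__Object* interpreter,
                                   SMOP__ResponderInterface* self,
                                   SMOP__Object* object) {
    (void)interpreter; (void)self;
    ((Counted*)object)->refs--;
    return object;
}

static SMOP__Object* capture_release(SMOP__Object* interpreter,
                                     SMOP__ResponderInterface* self,
                                     SMOP__Object* object) {
    Capture* c = (Capture*)object;
    (void)self;
    if (--c->refs == 0)
        SMOP_RELEASE(interpreter, c->invocant);   /* settle the capture's stake */
    return object;
}

/* Hypothetical accessor: extraction adds a stake owned by the extractor. */
static SMOP__Object* capture_invocant(SMOP__Object* interpreter, Capture* c) {
    return SMOP_REFERENCE(interpreter, c->invocant);
}

/* The callee: extract, use, release what was extracted, release the capture. */
static SMOP__Object* value_message(SMOP__Object* interpreter, SMOP__ResponderInterface* self,
                                   SMOP__Object* identifier, SMOP__Object* capture) {
    SMOP__Object* invocant = capture_invocant(interpreter, (Capture*)capture);
    (void)self; (void)identifier;
    ((Counted*)invocant)->uses++;
    SMOP_RELEASE(interpreter, invocant);          /* stake from the extraction */
    SMOP_RELEASE(interpreter, capture);           /* stake handed over by the caller */
    return NULL;
}

static SMOP__ResponderInterface value_ri = {
    NULL, value_message, counted_reference, value_release, NULL, "value"
};
static SMOP__ResponderInterface capture_ri = {
    NULL, NULL, counted_reference, capture_release, NULL, "capture"
};

int demo_call(void) {
    Counted value = { &value_ri, 1, 0 };
    Capture cap = { &capture_ri, 1, NULL };

    cap.invocant = SMOP_REFERENCE(NULL, &value);  /* extra stake for the capture */
    SMOP_DISPATCH(NULL, SMOP_RI(&value), NULL, (SMOP__Object*)&cap);  /* fire and forget */
    assert(cap.refs == 0);                        /* the callee cleaned up */

    SMOP_RELEASE(NULL, &value);                   /* our own original stake */
    return value.refs * 10 + value.uses;          /* 1: refs 0, used once */
}
```

After the dispatch the caller does not touch the capture again; its only remaining obligation is the stake it kept in the value itself.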
=head1 IMPORTANT SPEC NOTICE
This document describes everything that you can assume about an arbitrary
object. This means that you can only introspect in more detail by
either calling a method, or via special knowledge of the internals of
the responder interface of the given object (for example, inside the
code of the responder interface itself.)
It is erroneous to assume anything about the internal structure of any
object, even responder interface objects, beyond what is described in
this document.
=cut