<title>PerlGuts Illustrated</title>
<BODY bgcolor="#FFFFFF"
text="#000000" link="#000055" vlink="#550000" alink="#000000"
topmargin=0>
<h1 align=center>PerlGuts Illustrated<br><small>Version 0.09</small></h1>
<p>This document is meant to supplement the <i>perlguts(1)</i> manual
page that comes with Perl. It contains commented illustrations of all
(eventually?) major internal Perl data structures. Having this
document handy hopefully makes reading the Perl source code easier.
It might also help you interpret the <i>Devel::Peek</i>
dumps. I'll try to expand it as I learn more.
<p>The first things to look at are the data structures that represent
Perl data; scalars of various kinds, arrays and hashes. Internally
Perl calls a scalar <i>SV</i> (scalar value), an array <i>AV</i>
(array value) and a hash <i>HV</i> (hash value). In addition it uses
<i>IV</i> for integer value, <i>NV</i> for numeric value (aka double),
<i>PV</i> for a pointer value (aka string value (char*), but 'S' was
already taken), and <i>RV</i> for reference value. The <i>IVs</i> are
further guaranteed to be big enough to hold a <code>void*</code>.
<p>The internal relationship between the Perl data types is really
object oriented. Perl relies on using C's structural equivalence to
help emulate something like C++ inheritance of types. The various
data types that Perl implement are illustrated in this class hierarchy
diagram. The arrows indicate inheritance (IS-A relationships).
<p><center><img src="svtypes.png"></center>
<p>As you can see, Perl uses multiple inheritance with C <i>SvNULL</i> (also named just <i>SV</i>)
acting as some kind of virtual base class. All the Perl types are
identified by small numbers, and the internal Perl code often gets away
with testing the ISA-relationship between types with the <= operator.
As you can see from the figure above, this can only work reliably for
some comparisons. All Perl data value objects are tagged with their
type, so you can always ask an object what its type is and act
according to this information.
<p>The symbolic type names (and associated value) are:
<blockquote><dl>
<dt>0) <b>SVt_NULL</b>
<dt>1) <b>SVt_IV</b>
<dt>2) <b>SVt_NV</b>
<dt>3) <b>SVt_RV</b>
<dt>4) <b>SVt_PV</b>
<dt>5) <b>SVt_PVIV</b>
<dt>6) <b>SVt_PVNV</b>
<dt>7) <b>SVt_PVMG</b>
<dt>8) <b>SVt_PVBM</b>
<dt>9) <b>SVt_PVLV</b>
<dt>10) <b>SVt_PVAV</b>
<dt>11) <b>SVt_PVHV</b>
<dt>12) <b>SVt_PVCV</b>
<dt>13) <b>SVt_PVGV</b>
<dt>14) <b>SVt_PVFM</b>
<dt>15) <b>SVt_PVIO</b>
</dl></blockquote>
<p>In addition to the simple type names already mentioned, the
following names are found in the hierarchy figure: An <i>SvPVIV</i> value can
hold a string and an integer value. An <i>SvPVNV</i> value can hold a
string, an integer and a double value. The <i>SvPVMG</i> is used when
magic is attached or the value is blessed. The <i>SvPVBM</i> adds
information for fast searching (Boyer-Moore) on the string value. The
<i>SvPVLV</i> represents a LValue object.
<i>CV</i> is a code value, which represents a perl
function/subroutine/closure or contains a pointer to an XSUB.
<i>GV</i> is a glob value and <i>IO</i> contains pointers to open
files and directories and various state information about these. The
<i>SvPVFM</i> is used to hold information on forms.
<p>A Perl data object can change type as the value is modified. The SV is
said to be upgraded in this case. Type changes only go down the
hierarchy. (See the sv_upgrade() function in sv.c.)
<p>The actual layout in memory does not really match how a typical C++
compiler would implement a hierarchy like the one depicted above.
Let's see how it is done.
<blockquote><small><i>
In the description below we use field names that match the macros that
are used to access the corresponding field. For instance the
<code>xpv_cur</code> field of the <code>xpvXX</code> structs are
accessed with the <code>SvCUR()</code> macro. The field is referred
to as <b>CUR</b> in the description below. This also match the field
names reported by the <i>Devel::Peek</i> module.
</i></small></blockquote>
<a name="svnull"><h2>SvNULL and struct sv</h2></a>
<p>The simplest type is SvNULL. It always represents an
<i>undefined</i> scalar value. It consist of the "struct sv" only,
and looks like this:
<p><center><img src="svnull.png"></center>
<p>It contains a pointer (ANY) to more data, which in this case is
always NULL. All the subtypes are implemented by attaching
additional data to the ANY pointer.
<p>The second field is an integer reference counter (REFCNT) which should
tell us how many pointers reference this object. When Perl data types
are created this value is initialized to 1. The field must be
incremented when a new pointer is made to point to it and decremented
when the pointer is destroyed or assigned a different value. When the
reference count reaches zero the object is freed.
<p>The third word contains a FLAGS field and a TYPE field.
The TYPE field contains a small number that represents one of the types
shown in the type hierarchy figure above.
The FLAGS field has room for 24 flag bits, which encode how various
fields of the object should be interpreted, and
other state information. Some flags are just used as optimizations
in order to avoid having to dereference several levels of pointers
just to find that the information is not there.
<p><center><img src="flags.png" alt=""></center>
<p>The purpose of the flag bits are:
<blockquote>
<dl>
<dt> 0) <b>PADBUSY</b>
<dd> reserved for tmp or my already
<p>
<dt> 1) <b>PADTMP</b>
<dd> in use as tmp
<p>
<dt> 2) <b>PADMY</b>
<dd> in use a "my" variable
<p>
<dt> 3) <b>TEMP</b>
<dd> string is stealable
<p>
<dt> 4) <b>OBJECT</b>
<dd> This flag is set when the object is "blessed". It can only be
set for value type SvPVMG or subtypes of it. This flag also
indicates that the STASH pointer is valid and
points to a namespace HV.
<p>
<dt> 5) <b>GMAGICAL</b> (Get Magic)
<dd> This flag indicates that the object has a magic <i>get</i> or
<i>len</i> method to be invoked.
It can only be set for value type SvPVMG or subtypes
of it. This flag also indicate that the MAGIC pointer is valid.
<p>
<dt> 6) <b>SMAGICAL</b> (Set Magic)
<dd> This flag indicates that the object has a magic <i>set</i> method to
be invoked.
<p>
<dt> 7) <b>RMAGICAL</b> (Random Magic)
<dd> This flag indicates that the object has any other magical methods
(besides get/len/set magic method) or even methodless magic attached.
<p>The RMAGICAL flag is used mainly for tied HV and AV (having 'P'
magic) and SVs which have magic <i>clear</i> method. It is used as an
optimization to avoid setting GMAGICAL and SMAGICAL flags for SVs
which need to be marked as MAGICAL otherwise.
<p>
Any of GMAGICAL, SMAGICAL and RMAGICAL is called MAGICAL
<p>
<dt> 8) <b>IOK</b> (Integer OK)
<dd> This flag indicates that the object has a valid IVX field value.
It can only be set for value type SvIV or subtypes of it.
<p>
<dt> 9) <b>NOK</b> (Numeric OK)
<dd> This flag indicates that the object has a valid NVX field value.
It can only be set for value type SvNV or subtypes of it.
<p>
<dt> 10) <b>POK</b> (Pointer OK)
<dd> This flag indicates that the object has a valid PVX, CUR and LEN
field values (i.e. a valid string value).
It can only be set for value type SvPV or subtypes of it.
<p>
<dt> 11) <b>ROK</b> (Reference OK)
<dd> This flag indicates that the type should be treated as an SvRV
and that the RV field contains a valid reference pointer.
<p>
<dt> 12) <b>FAKE</b>
<dd>glob or lexical is just a copy
<p>
<dt> 13) <b>OOK</b> (Offset OK)
<dd> This flag indicates that the IVX value is to be interpreted as
a string offset. This flag can only be set for value type SvPVIV
or subtypes of it. It also follows that the IOK (and IOKp) flag must
be off when OOK is on. Take a look at the <i>SvOOK</i> figure
below.
<p>
<dt> 14) <b>BREAK</b>
<dd>refcnt is artificially low
<p>
<dt> 15) <b>READONLY</b>
<dd> This flag indicate that the value of the object may not be
modified.
<p>
<dt> 16) <b>IOKp</b> (Integer OK Private)
<dd>has valid non-public integer value.
<p>The private OK flags (IOKp, NOKp, POKp)
are used by the magic system. During
execution of a magic callback, the private flags will be used to set
the public flags. When the callback returns, then the
public flags are cleared. This effectively is used to pass the
value to get/set to/from magic callbacks.
<p>
<dt> 17) <b>NOKp</b> (Numeric OK Private)
<dd>has valid non-public numeric value
<p>
<dt> 18) <b>POKp</b> (Pointer OK Private)
<dd>has valid non-public pointer value
<p>
<dt> 19) <b>SCREAM</b>
<dd>has been studied
<p>
<dt> 20) <b>AMAGIC</b>
<dd>has magical overloaded methods
<p>
<dt> 21) <b>SHAREKEYS</b>
<dd> Only used by HVs. See description of HV below.
<p>
<dt> 22) <b>LAZYDEL</b>
<dd> Only used by HVs. See description of HV below.
<p>
<dt> 22) <b>TAIL</b>
<dd> Only used by SvPVBMs. See description of SvPVBM below.
<p>
<dt> 23) <b>VALID</b>
<dd> Only used by SvPVBMs. See description of SvPVBM below.
<p>
<dt> 23) <b>COMPILED</b>
</dl>
</blockquote>
<p>The <code>struct sv</code> is common for all subtypes of SvNULL in
Perl. In the Perl source code this structure is typedefed to
<i>SV</i>, <i>AV</i>, <i>HV</i>, <i>CV</i>, <i>GV</i> and <i>IO</i>.
Routines that can take any type as parameter will have
<code>SV*</code> as parameter. Routines that only work with arrays or
hashes have <code>AV*</code> or <code>HV*</code> respectively in their
parameter list. Likewise for the rest.
<a name="svpv"><h2>SvPV</h2></a>
<p>A scalar that can hold a string value is called an <i>SvPV</i>. In
addition to the <i>SV</i> struct of SvNULL, an <i>xpv</i> struct is
allocated and it contains 3 fields. PVX is the pointer to an
allocated char array. CUR is an integer giving the current length of
the string. LEN is an integer giving the length of the allocated
string. The byte at (PVX + CUR) should always be '\0' in order
to make sure that the string is NUL-terminated if passed to C library
routines. This requires that LEN is always at least 1 larger than CUR.
<p><center><img src="svpv.png"></center>
<p>The POK flag indicates that the string pointed to by PVX contains
an valid string value. If the POK flag is off and the ROK flag is
turned on, then the PVX field is used as a pointer to an SV (see SvRV
below) and CUR/LEN is unused. An SvPV with both the POK and ROK flags
turned off represents <i>undef</i>. The PVX pointer can also be NULL
when POK is off and no string storage has been allocated.
<h2><a name="svpviv">SvPVIV</a> and <a name="svpvnv">SvPVNV</a></h2>
<p>The <i>SvPVIV</i> type is like <i>SvPV</i> but has an additional
field to hold a single integer value called IVX. The IOK flag
indicates if the IVX value is valid. If both the IOK and POK flag is
on, then the PVX will (usually) be a string representation of the same
number found in IVX.
<p><center><img src="svpviv.png"></center>
<p>The <i>SvPVNV</i> type is like <i>SvPVIV</i> but has an additional
field to hold a single <i>double</i> value called NVX. The
corresponding flag is called NOK.
<p><center><img src="svpvnv.png"></center>
<a name="ook"><h2>SvOOK</h2></a>
As a special hack, in order to improve the speed of removing characters
from the beginning of a string, the OOK flag is used. When this flag
is on, then the IVX value is not interpreted as an integer value, but
is instead used as an <i>offset</i> into the string. The PVX, CUR,
LEN is adjusted to point within the allocated string instead.
<p><center><img src="ook.png"></center>
<h2><a name="sviv">SvIV</a> and <a name="svnv">SvNV</a></h2>
<p>As a special case we also have <i>SvIV</i> and <i>SvNV</i> types
that only have room for a single integer or a single double value.
These are special in that the PVX/CUR/LEN fields are not present even
if the ANY pointer actually points to the ghostual incarnation of
them. This arrangement makes it possible for code to always access
the IVX/NVX fields at a fixed offset from where the SV field ANY
points.
<p><center><img src="sviv.png"></center>
<p><center><img src="svnv.png"></center>
<a name="svrv"><h2>SvRV</h2></a>
<p>The <i>SvRV</i> subtype just lets the SV field ANY point to a
pointer which points to an SV (which can be any of the SvNULL
subtypes). A SvRV object with ROK flag off represents an undefined
value.
<p><center><img src="svrv.png"></center>
<p>Subclasses of SvPV can also be treated as SvRV objects when the ROK
flag is turned on for them. The PVX will then be used as the RV field
of the SvRV. Of course ROK and POK is mutually exclusive and the
CUR/LEN fields have no meaning when ROK is on.
<p><center><img src="svpvrv.png"></center>
<a name="svpvmg"><h2>SvPVMG</h2></a>
<p>The <i>SvPVMG</i> is like <i>SvPVNV</i> above, but has two
additional fields; MAGIC and STASH. MAGIC is a pointer to additional
structures that contains callback functions and other data. If the
MAGIC pointer is non-NULL, then one or more of the MAGICAL flags will
be set.
<p>STASH (<b>s</b>ymbol <b>t</b>able h<b>ash</b>) is a pointer to a HV
that represents some namespace/class.
(That the HV represents some namespace means that the NAME field
of the HV must be non-NULL. See description of HVs and stashes below).
The STASH field is set when the value is blessed into a package
(becomes an object). The OBJECT flag will be set when STASH is.
<small><i>(IMHO, this field should really have been named "CLASS".
The GV and CV subclasses introduce their own unrelated fields called
STASH which might be confusing.)</i></small>
<p><center><img src="svpvmg.png"></center>
<p>The field MAGIC points to an instance of <code>struct magic</code>
(typedefed as <code>MAGIC</code>). This struct has 8 fields:
<ol>
<li>
<i>moremagic</i> is a pointer to another MAGIC and is used to form a
single linked list of the MAGICs attached to an SV.
<li><i>virtual</i> is a pointer to a struct containing 5 function
pointers. The functions (if set) are invoked when the corresponding
action happens to the SV.
<li><i>private</i> is a 16 bit number (U16) not used by Perl.
<li><i>type</i> is a character field and is used to denote which kind
of magic this is. The interpretation of the rest of the fields depend
on the <i>type</i> (actually it is the callbacks attached to
<i>virtual</i> that do any interpretation). There is usually a direct
correspondence between the <i>type</i> field and the <i>virtual</i>
field.
<li><i>flags</i> contains 8 flag bits. Only 2 are generally used.
Bit 1 is the <b>REFCOUNTED</b> flag. It indicates that the <i>obj</i>
is assumed to be an SV and that it's reference count must be
decremented when this magic is freed. The <b>GSKIP</b> flag indicate
that invocation of the magical GET method should be suppressed. Other
flag bits are used depending of the kind of magic.
<li><i>obj</i> is usually a pointer to some SV. How it is used
depends on the kind of magic this is.
<li><i>ptr</i> is usually a pointer to some character string. How it
is used depends on the kind of magic this is. If the <i>len</i> field
is >= 0, then <i>ptr</i> is assumed to point to a malloced buffer and
will be automatically freed when the magic is.
<li><i>len</i> is usually the length of the character string pointed
to by <i>ptr</i>. How it is used depends on the kind of magic this
is.
</ol>
<a name="svpvbm"><h2>SvPVBM</h2></a>
<p>The <i>SvPVBM</i> is like <i>SvPVMG</i> above, but has three
additional fields; BmUSEFUL, BmPREVIOUS, BmRARE. The SvPVBM value
types are used internally to implement very fast lookup of the string
in PVX using the Boyer-Moore algorithm. They are used by the Perl
index() builtin when the search string is a constant, as well as in
the RE engine. The fbm_compile() function turns normal SvPVs into
this value type.
<p>A table of 256 elements is appended to the PVX. This table
contains the distance from the end of string of the last occurrence of
each character in the original string. (In recent Perls, the table is
not built for strings shorter than 3 character.) In addition
fbm_compile() locates the rarest character in the string (using
builtin letter frequency tables) and stores this character in the
BmRARE field. The BmPREVIOUS field is set to the location of the
first occurrence of the rare character. BmUSEFUL is incremented
(decremented) by the RE engine when this constant substring (does not)
help in optimizing RE engine access away. If it goes below 0, then
the corresponding substring is forgotten and freed;
<p><center><img src="svpvbm.png"></center>
<p>The extra SvPVBM information and the character distance table is
only valid when the <b>VALID</b> flag is on. A magic structure with
the sole purpose of turning off the VALID flag on assignment, is always
attached to a <i>valid</i> SvPVBM.
<p>The <b>TAIL</b> flag is used to indicate that the search for the SvPVMG
should be <i>tail anchored</i>, i.e. a match should only be considered
at the end of the string (or before newline at the end of the string).
<a href="svpvlv"><h2>SvPVLV</h2></a>
The <i>SvPVLV</i> is like <i>SvPVMG</i> above, but has four additional
fields; TARGOFF, TARGLEN, TARG, TYPE. The typical use is for Perl
builtins that can be used in the LValue context (substr, vec,...).
They will return an SvPVLV value, which when assigned to use magic to
affect the <i>target</i> object, which they keep a pointer to in the TARG
field.
<p>The TYPE is a character variable. It encodes the kind if LValue
this is. Interpretation of the other LValue fields depend on the TYPE.
The SvPVLVs are (almost) always magical. The magic type will match
the TYPE field of the SvPVLV. The types are:
<blockquote><dl compact>
<dt> <b>'x'</b>
<dd> Type-x LVs are returned by the <code>substr($string,
$offset, $len)</code> builtin.
<dt> <b>'v'</b>
<dd> Type-v LVs are returned by the <code>vec($string,
$offset, $bits)</code> builtin.
<dt> <b>'.'</b>
<dd> Type-. LVs are returned by the <code>pos($scalar)</code> builtin.
<dt> <b>'k'</b>
<dd> Type-k LVs are returned when <code>keys %hash</code> is
used on the left side of the assignment operator.
<dt> <b>'y'</b>
<dd> Type-y LVs are used by auto-vivification (of hash and array
elements) and the foreach array iterator variable.
<dt> <b>'/'</b>
<dd> Used by <i>pp_pushre</i>. <i>(I don't understand this yet.)</i>
</dl></blockquote>
<p>The figure below shows an SvPVLV as returned from the
<code>substr()</code> builtin. The first substr parameter (the
string to be affected) is assigned to the TARG field. The substr
offset value goes in the TARGOFF field and the substr length parameter
goes in the TARGLEN field.
<p><center><img src="svpvlv.png"></center>
<p>When assignment to an SvPVLV type occurs, then the value to be
assigned is first copied into the SvPVLV itself (and affects the PVX,
IVX or NVX). After this the magic SET method is invoked, which will
update the TARG accordingly.
<a name="av"><h2>AV</h2></a>
<p>An array is in many ways represented similar to strings. An AV
contains all the fields of SvPVMG and adds the following tree fields:
ALLOC is a pointer to the allocated array. ARYLEN is a pointer to a
magic SV (which is returned when <code>$#array</code> is requested)
and is only created on demand. FLAGS contains some extra flag bits
that are specific of the array subtype.
<p>The first three fields of xpvav have been renamed even though they
serve nearly the same function as the corresponding SvPV fields. PVX
has become ARRAY, CUR has become FILL and LEN has become MAX. One
difference is that the value of FILL/MAX is always one less than
CUR/LEN would be in the same situation. The IVX/NVX fields are
unused.
<p><center><img src="av.png"></center>
<p>The array pointed to by ARRAY contains pointers to any of the
SvNULL subtypes. Usually ALLOC and ARRAY both point to the start of
the allocated array. The use of two pointers is similar to the OOK
hack described above. The shift operation can be implemented
efficiently by just adjusting the ARRAY pointer (and FILL/MAX).
Similarly, the pop just involves decrementing the FILL count.
<p>The are only 3 array flags defined:
<blockquote><dl>
<dt> 0) <b>REAL</b>
<dd> It basically means that all
SVs contained in this array is owned and must have their
reference counters decremented when the reference is removed
from the array. All normal arrays are REAL. For the
<code>stack</code> the REAL flag is turned off.
For <code>@_</code> the REAL flag is initially turned off.
<dt> 1) <b>REIFY</b>
<dd> The array is <i>not</i> REAL but should be made REAL if modified.
The <code>@_</code> array will have the REIFY flag turned on.
<dt> 2) <b>REUSED</b>
<dd> This flag does not seem to be used for anything currently.
</dl></blockquote>
<a name="hv"><h2>HV</h2></a>
<p>Hashes are the most complex of the Perl data types. In addition to
what we have seen above, HVs use <i>HE</i> structs to represent
key/value pairs and <i>HEK</i> structs to represent keys.
<p>The HV type itself contains all the fields of SvPVMG and then adds
four new fields:
<dl>
<dt><b>RITER, EITER</b>:
<dd>The first two fields are used to implement a single iterator over the
elements in the hash.
RITER which is an integer index into the array referenced by ARRAY and
EITER which is a pointer to an HE. In order find the next hash
element one would first look at EITER->next and if it turns out to be
NULL, RITER is incremented until ARRAY[RITER] is non-NULL. The
iterator starts out with RITER = -1 and EITER = NULL.<p>
<dt><b>RMROOT/NAME</b>:
<dd>The last two fields are only used when the hash represents a name
space (<i>stash</i>). PMROOT points to a node in the Perl syntax
tree. It is used to implement the reset() builtin for REs. NAME is a
NUL-terminated string which denotes the fully qualified name of the
name space (aka <i>package</i>). This is one of the few places where
Perl does not allow strings with embedded NULs.<p>
</dl>
<p>The first few fields of the xpvhv have been renamed in the same way
as for AVs. MAX is the number of elements in ARRAY minus one. (The
size of the ARRAY is required to be a power of 2, since the code that
deals with hashes just mask off the last few bits of the HASH value to
locate the correct HE column for a key: <code>ARRAY[HASH &
MAX]</code>). Also note that ARRAY can be NULL when the hash is empty
(but the MAX value will still be at least 7, which is the minimum
value assigned by Perl.)
The FILL is the number of elements in ARRAY which are not NULL. The
IVX field has been renamed KEYS and is the number of hash elements in
the HASH. The NVX field is unused.
<p>The <b>HE</b>s are simple structs containing 3 pointers. A pointer to the
next HE, a pointer to the key and a pointer to the value of the given hash
element.
<p>The <b>HEK</b>s are special variable sized structures that store the hash
keys. They contain 3 fields. The computed <i>hash</i> value of the string,
the <i>len</i>gth of the string, and <i>len</i>+1 bytes for the
key string itself (including trailing NUL).
As a special case, a <i>len</i> value of <code>HEf_SVKEY</code> (-2)
indicate that a pointer to an SV is stored in the HEK instead of a
string. This hack is used for some magical hashes.
<p><center><img src="hv.png"></center>
<p>In a perfect hash both KEYS and FILL are the same value. This
means than all HEs can be located directly from the pointer in the
ARRAY (and all the he->next pointers are NULL).
<p>The following two hash specific flags are found among the common
SvNULL flags:
<blockquote><dl>
<dt> 21) <b>SHAREKEYS</b>
<dd> When this flag is set, then the hash will share the HEK structures
with a special hash pointed to by the <code>strtab</code> variable.
This reduce the storage occupied by hash keys, especially when we
have lots of hashes with the same keys.
The SHAREKEYS flag is on by default for newly created HVs.
<p>
<center><img src="strtab.png"></center>
<p>
What is special with the <code>strtab</code> hash is that the <i>val</i>
field of the HE structs is used as a reference counter for the
HEK. The counter is incremented when new hashes link up this HEK
and decremented when the key is removed from the hashes.
When the reference count reach 0, the HEK (and corresponding HE)
is removed from <code>strtab</code> and the storage is freed.
<p>
<dt> 22) <b>LAZYDEL</b>
<dd>This flag indicates that the hash element pointed to by EITER is
really deleted. When you delete the current hash element, perl
only marks the HV with the LAZYDEL flag, and when the iterator
is advanced, then the element is zapped. This makes it possible
to delete elements in a hash while iterating over it.
<p>
</dl></blockquote>
<a name="gv"><h2>GV</h2></a>
<The <i>GV</i> ("glob value" aka "symbol") is like <i>SvPVMG</i> above,
but has five additional fields; GP, NAME, NAMELEN, GvSTASH, FLAGS.
<p>The GP is a pointer to structure that holds pointers to data of
various kinds. Perl use a pointer, instead of including the GP fields
in the xpvgv, in order to implement the proper glob aliasing
behavior (i.e. different GVs can share the same GP).
<p>The NAME/NAMELEN denotes the unqualified name of this symbol and
GvSTASH points to the symbol table where this symbol belongs. The
fully qualified symbol name is obtained by taking the NAME of the
GvSTASH (see HV above) and appending "::" and NAME to it. The hash
pointed to by GvSTASH will usually contain an element with NAME as key
and a pointer to this GV as value. See description of stashes below.
<p>The FLAGS field contains space for 8 additional flag bits:
<blockquote><dl>
<dt>0) <b>INTRO</b>
<dt>1) <b>MULTI</b>
<dd> Have we seen more than one occurrence of this glob. Used to
implement the "possibly typo" warning.
<dt>2) <b>ASSUMECV</b>
<dt>4) <b>IMPORTED_SV</b>
<dt>5) <b>IMPORTED_AV</b>
<dt>6) <b>IMPORTED_HV</b>
<dt>7) <b>IMPORTED_CV</b>
</dl></blockquote>
<p>A magic of type '*' is always attached to the GV (not shown in the
figure). The magic GET method is used to stringify the globs (as the fully
qualified name prefixed with '*'). The magic SET method is used to alias
an GLOB based on the name of another glob.
<p><center><img src="gv.png"></center>
<h3>GP</h3>
<p>GPs can be shared between one or more GVs. The data type fields
for the GP are: SV, IO, FORM, AV, HV, CV. These hold a pointer to the
corresponding data type object. (The SV must point to some simple SvNULL
subtype (i.e. with type <= SVt_PVLV). The FORM field must point to a
SvPVFM if non-NULL. The IO field must point to an IO if non-NULL, the AV
to an AV, etc.) The SV is always present (but might point to a
SvNULL object). All the others are initially NULL.
<p>The additional administrative fields in the GP are: REFCNT, EGV,
CVGEN, LASTEXPR, LINE, FILEGV.
<p>REFCNT is a reference counter. It says how many GVs have a pointer
to this GP. It is incremented/decremented as new GVs reference/forget
this GP. When the counter reach 0 the GP is freed.
<p>EGV is a pointer to the GV that originally created this GP (used to
tell the real name of any aliased symbol). If the original GV is freed,
but GP should stay since another GV reference it, then the EGV is
NULLed.
<p>CVGEN is an integer used to validate method cache CV entries in the
GP. If CVGEN is zero, then the CV is real. If CVGEN is non-zero, but
less than the global variable <tt>subgeneration</tt>, then the CV
contains a stale method cache entry. If CVGEN is equal to
<tt>subgeneration</tt> then the CV contains a valid method cache
entry.
(Every time some operation that might invalidate some of the
method caches are performed, then the <tt>subgeneration</tt> variable
is incremented.)
<p>LASTEXPR is an integer I don't understand the purpose of yet.
<p>FILEGV is a pointer to a GV where the SV of the GP is the name of
the file where this symbol was first created. The NAME of this GV is
the filename prefixed with "_<" and GvSTASH is
<code>defstash</code> (see description of stashes below).
<p>LINE is the corresponding line number in the file.
<h3>Stashes</h3>
GVs and stashes work together to implement the name spaces of Perl.
Stashes are named HVs with all the element values being pointers to
GVs. The root of the namespace is pointed to by the global variable
<code>defstash</code>.
<p>In the figure below we have simplified the representation of
stashes to a single box. The text in the blue field is the NAME of
the HV/stash. The hash elements keys are shown as field names and the
element values are shown as a pointers to globs (GV). The GVs are
also simplified to a single box. The text in the green field in the
fully qualified name of the GV. Only the GP data fields are shown (and
FORM has been eliminated because it was not 2 letters long :-).
<p>The figure illustrates how the scalar variables <code>$::foo</code>
and <code>$foo::bar::baz</code> are represented by Perl.
<p><center><img src="stash.png"></center>
<p>All resolution of qualified names starts with the stash pointed to
by the <code>defstash</code> variable. Nested name spaces are
implemented by a stash entry with a key ending in "<code>::</code>".
The entry for "<code>main::</code>" ensures that <code>defstash</code> is also
known as "<code>main</code>" package (and has the side-effect that the
"<code>main::main::main</code>" package is <code>defstash</code> too.)
Unqualified names are resolved starting at <code>curstash</code> or
<code>curcop->cop_stash</code> which are influenced by the
<code>package</code> declaration in Perl.
<p>As you can see from this figure, there are lots of pointers to
dereference in order to look up deeply nested names. Each stash
is at least 4 levels deep and each glob is 3 levels, giving at least
24 pointer dereferences to access the data in the
<code>$foo::bar::baz</code> variable from <code>defstash</code>.
<p>The <code>defstash</code> stash is also a place where globs
representing source files are entered. These entries are prefixed
with "<code>_<</code>". The FILEGV field of the GP points to the
same glob as the corresponding "<code>_<</code>" entry in
<code>defstash</code> does.
<a name="cv"><h2>CV</h2></a>
The <i>CV</i> ("code value") is like <i>SvPVMG</i> above, but has
thirteen additional fields; CvSTASH, START, ROOT, XSUB, XSUBANY, GV, FILEGV,
DEPTH, PADLIST, OUTSIDE, MUTEXP, OWNER, FLAGS. (The fields MUTEXP and
OWNER are only present for Perls that support threading).
<p><center><img src="cv.png"></center>
<p><i>I will need to describe PADs and OPs before detailed description
of CVs makes sense.</i>
<a name="svpmfm"><h2>SvPVFM</h2></a>
The <i>SvPVFM</i> is like <i>CV</i> above, but adds a single field
called LINES.
<p><center><img src="svpvfm.png"></center>
<a name="io"><h2>IO</h2></a>
The <i>IO</i> is like <i>SvPVMG</i> above, but has quite a few
additional fields.
<p><center><img src="io.png"></center>
<a name="pad"><h2>PAD</h2></a>
NOTYET
<a name="op"><h2><a href="op.html">OP</a></h2></a>
<i>I will perhaps make a separate document that describes <a
href="op.html">syntax trees and OP codes</a>.</i>
<h2>Stacks</h2>
During compilation and runtime Perl use several stacks to manage
itself and the program running. The first three described here
implements scopes, including variables and values which are restored (or
actions to be performed) when the scope is left.
<p>The <code>scopestack</code> pushes the <code>savestack_ix</code>
when <code>ENTER</code> is executed. On <code>LEAVE</code> the top
<code>savestack_ix</code> entry is popped and all things saved on the
<code>savestack</code> since this is restored. This means that a
<code>ENTER/LEAVE</code> pairs represents dynamic nestable scopes.
<p>The <code>savestack</code> contains records of things saved in
order to be restored when the scopes are left. Each record consist of
2-4 ANY elements. The first one is a type code, which is used to
decide how long the record is and how to interpret the other elements.
(In the figure the type codes are marked pinkish color.) The
restoring involves updating memory locations of various types as well
as more general callbacks (destructors).
<p>The <code>tmps_stack</code> implement mortal SVs. Each time a new
mortal is made, then <code>tmps_ix</code> is incremented and the
corresponding entry in <code>tmps_stack</code> made to point to it.
When <code>SAVETMPS</code> is executed, then the old
<code>tmps_floor</code> value is saved on the <code>savestack</i> and
then <code>tmps_floor</code> is set equal to <code>tmps_ix</code>.
When <code>FREETMPS</code> is executed, then all SVs pointed to by the
pointers between <code>tmps_floor</code> and <code>tmps_ix</code> will
have their REFCNT decremented. How many this will be depend on how
many scopes has been left. Note that the <code>tmps_floor</code> and
<code>tmps_ix</code> values is the index of the last SV* pushed. They
both start out as -1 when the stack is empty.
<p><center><img src="scope.png"></center>
<p>The next fours stacks manage subroutine calls, evals, and
loops.
<p>The first one is simply denoted as <em>the stack</em> and is really
an AV. The variable <code>curstack</code> points to this AV. To
speed up access Perl also maintain direct pointers to the start
(<code>stack_base</code>) and the end (<code>stack_max</code>) of the
allocated ARRAY of this AV. The AV so special that it is marked as
not REAL and the FILL field is not updated. Instead we use a
dedicated pointed called <code>stack_sp</code>, the stack pointer.
The stack is used to pass arguments to PP operations and subroutines
and is also the place where the result of these operations as well as
subroutine return values are placed.
<p>The <code>markstack</code> is used to indicate the extent of the
stack to be passed as @_ to Perl subroutines. When a subroutine is to
be called, then first the start of the arguments are marked by pushing
the <code>stack_sp</code> offset onto <code>markstack</code>, then the
arguments themselves are calculated and pushed on the stack. Then the
<code>@_</code> array is set up with pointers the SV* on the stack
between the <code>MARK</code> and <code>stack_sp</code> and the
subroutine starts running. For XSUB routines, the creation of
<code>@_</code> is suppressed, and the routine will use the
<code>MARK</code> directly to find it's arguments.
<p>The <code>retstack</code> contains pointers to the operation to go
to after subroutines return. Each time a subroutine is called a new
OP* is pushed on this stack. When a subroutine returns, Perl pop the
top OP* from <code>retstack</code> and continue execution from this
place.
<p>The <code>cxstack</code> (context stack) contains records that
describe the current context. Each time a subroutine, an eval or a
loop is entered, then a new PERL_CONTEXT record is pushed on the
<code>cxstack</code>. When subroutine/eval/loop finished then the
record top record is poped and the corresponding values restored. The
PERL_CONTEXT record will eventually be descibed below.
<p><center><img src="stack.png"></center>
<h3>PERL_CONTEXT</h3>
NOTYET
<h2>REGEXP</h2>
NOTYET
<h2>PerlInterpreter</h2>
NOTYET
<!-- ############################################################ -->
<pre>
</pre>
<hr>
<div align=right>
<small>
<i>© 1998-1999 Gisle Aas</i><br>
<a href="mailto:gisle@aas.no"><gisle@aas.no></a><br>
$Date: 1999/11/02 17:02:43 $
</small>
</div>
</BODY>