Python Inclusion Plan
There is a Google Summer of Code project underway to get a base
array object that exports and consumes this interface into
Python. The PEP has been started. Anybody who would like to help
with the PEP is strongly encouraged to do so. The location of the PEP
draft is http://svn.scipy.org/svn/PEP (use an svn client to check it out).
Description (Version 3)
The homogeneous N-dimensional array interface is a default mechanism
for objects to share N-dimensional array memory and information. The
interface consists of a Python side and a C side, each exposed through
its own attribute. Objects wishing to be considered an N-dimensional
array in application code should support at least one of these
attributes. Objects wishing to support an N-dimensional array in
application code should look for at least one of these attributes and
use the information provided appropriately.
This interface describes homogeneous arrays in the sense that each item
of the array has the same "type". This type can be very simple or
it can be a quite arbitrary and complicated C-like structure.
Array Interface
There are two ways to use the interface: a Python side and a
C side. Each is a separate attribute.
Python side
This approach to the interface consists of the attribute __array_interface__ which returns a
dictionary of items (3 required and 5 optional).
The optional keys in the dictionary have implied defaults if they are
not provided. The keys are:
- shape (required)
- Tuple whose elements are the array size
in each dimension. Each entry is an integer (a Python int or
long). Note that these integers could be larger than the platform
"int" or "long" could hold (a Python int is a C long). It is up to the
code using this attribute to handle this appropriately, either by
raising an error when overflow is possible or by using Py_LONG_LONG as
the C type for the shapes.
- typestr (required)
- A string providing the basic type of the homogeneous array.
The basic string format consists of 3 parts: a character describing
the byteorder of the data ('<': little-endian, '>': big-endian,
'|': not-relevant), a character code giving the basic type of the
array, and an integer providing the number of bytes the type uses.
The basic type character codes are:
- 't' -- Bit field (following integer gives the number of bits in the
bit field).
- 'b' -- Boolean (integer type where all values are only True or False)
- 'i' -- Integer
- 'u' -- Unsigned integer
- 'f' -- Floating point
- 'c' -- Complex floating point
- 'O' -- Object (i.e. the memory contains a pointer to PyObject)
- 'S' -- String (fixed-length sequence of char)
- 'U' -- Unicode (fixed-length sequence of Py_UNICODE)
- 'V' -- Other (void * -- each item is a fixed-size chunk of memory)
Thus, an array of unsigned integers using two bytes as big-endian
would be described as '>u2'. More complicated
memory layouts can be described using the descr key and
using the 'V' type here. A regular expression for this string is
'[<|>][tbiufcOSUV][0-9]+'. Notice that the byteorder code
must be present.
- descr (optional)
- A list of tuples providing a more detailed description of the
memory layout for each item in the homogeneous array. Each tuple in
the list has two or three elements. Normally, this attribute would
be used when typestr is 'V[0-9]+', but this is not a requirement.
The only requirement is that the number of bytes represented in the
typestr key is the same as the total number of bytes represented
here. The idea is to support descriptions of C-like structs (records)
that make up array elements. The elements of each tuple in the list
are:
- A string providing a name associated with this portion of the
record. This could also be a tuple of ('full name', 'basic_name')
where basic name would be a valid Python variable name representing the
full name of the field.
- Either a basic-type description string as in typestr or another list (for nested
records)
- An optional shape tuple providing how many times this part of
the record should be repeated. No repeats are assumed if this is
not given.
Very complicated structures can be described using this
generic interface. Notice, however, that each element of the
array is still of the same data-type. Some examples of using
this interface are given below.
Default: [('',typestr)]
- data (optional)
- A 2-tuple whose first element is an integer (a long integer if
necessary) that points to the data area storing the array
contents. This pointer must point to the first element of data
(in other words, any offset is always ignored in this case). The
second entry in the tuple is a read-only flag (true means the data
area is read-only).
- This attribute can also be an object exposing the buffer
interface which will be used to share the data. If this key is
not present (or returns None), then memory sharing will be done through
the buffer interface of the object itself. In this case, the offset
key can be used to indicate the start of the buffer. A reference to
the object exposing the array interface must be stored by the new
object if the memory area is to be secured.
- Default: None
- strides (optional)
- Either None to indicate a C-style contiguous array or a tuple of
strides which provides the number of bytes needed to
jump to the next array element in the corresponding dimension. Each
entry must be an integer (a Python int or long). As with shape, the
values may be larger than can be represented by a C "int" or "long";
the calling code should handle this appropriately, either by raising
an error or by using Py_LONG_LONG in C. The default is None, which
implies a C-style contiguous memory buffer. In this model, the last
dimension of the array varies the fastest. For example, the default
strides tuple for an object whose array entries are 8 bytes long and
whose shape is (10,20,30) would be (4800, 240, 8).
Default: None (C-style contiguous)
- mask (optional)
- None or an object exposing the array interface. All elements of
the mask array should be interpreted only as true or not true,
indicating which elements of this array are valid. The
shape of this object should be "broadcastable" to the shape of the
original array. "Broadcastable" means that either 1) it is
the same shape, 2) it is the same shape except for possibly a 1 in some
of the shape-tuple elements, or 3) it can be made to satisfy (2) by
pre-pending 1's to the shape.
Default: None (All array values are valid)
- offset (optional)
- An integer offset into the array data region. This can only be
used when data is
None or returns a buffer object.
- Default: 0.
- version (required)
- An integer showing the version of the interface (i.e. 3 for this
version). Be careful not to use this to invalidate objects exposing
future versions of the interface.
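As a rough illustration of the Python side, the sketch below builds a version-3 dictionary for a small wrapper around a bytearray. The class name ByteGrid and the helper c_contiguous_strides are hypothetical names invented for this example; only the keys and their meanings come from the list above. The data key is given as an object exposing the buffer interface, so no raw pointer is needed.

```python
import sys


def c_contiguous_strides(shape, itemsize):
    """Return the byte strides implied by strides=None (C-style contiguous)."""
    strides = []
    step = itemsize
    # Walk the dimensions from last to first: the last one varies fastest.
    for extent in reversed(shape):
        strides.append(step)
        step *= extent
    return tuple(reversed(strides))


class ByteGrid(object):
    """Hypothetical 2-D grid of native-endian 4-byte unsigned integers."""

    def __init__(self, rows, cols):
        self.shape = (rows, cols)
        self.buffer = bytearray(rows * cols * 4)

    @property
    def __array_interface__(self):
        byteorder = '<' if sys.byteorder == 'little' else '>'
        return {
            'shape': self.shape,          # required
            'typestr': byteorder + 'u4',  # required
            'version': 3,                 # required
            'data': self.buffer,          # optional: buffer-interface object
            'strides': None,              # optional: C-style contiguous
            'offset': 0,                  # optional: start of the buffer
        }


grid = ByteGrid(3, 5)
info = grid.__array_interface__
explicit_strides = c_contiguous_strides(info['shape'], 4)
example_strides = c_contiguous_strides((10, 20, 30), 8)
```

Here example_strides reproduces the (4800, 240, 8) case from the strides entry, and explicit_strides is what a consumer would assume for the 3 x 5 grid when strides is None.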
C-Struct Access
This approach to the array interface allows for faster access to an
array using only one attribute lookup and a well-defined C-structure.
__array_struct__
A PyCObject whose "voidptr" member contains a pointer to
a filled PyArrayInterface structure. Memory for the structure is
dynamically created and the PyCObject is also created with an
appropriate destructor so the retriever of this attribute simply has to
apply Py_DECREF() to the object returned by this attribute when it is
finished. Also, either the data needs to be copied out, or a
reference to the object exposing this attribute must be held to ensure
the data is not freed. Objects exposing the __array_struct__
interface must also not reallocate their memory if other objects are
referencing them.
typedef struct {
    int two;              /* contains the integer 2 -- simple sanity check */
    int nd;               /* number of dimensions */
    char typekind;        /* kind in array --- character code of typestr */
    int itemsize;         /* size of each element */
    int flags;            /* flags indicating how the data should be interpreted */
                          /* must set ARR_HAS_DESCR bit to validate descr */
    Py_intptr_t *shape;   /* A length-nd array of shape information */
    Py_intptr_t *strides; /* A length-nd array of stride information */
    void *data;           /* A pointer to the first element of the array */
    PyObject *descr;      /* NULL or data-description (same as descr key
                             of __array_interface__) -- must set ARR_HAS_DESCR
                             flag or this will be ignored. */
} PyArrayInterface;
The flags member may consist of 5 bits showing how the data
should be interpreted and one bit showing how the interface should be
interpreted. The data bits are CONTIGUOUS (0x1), FORTRAN (0x2),
ALIGNED (0x100), NOTSWAPPED (0x200), and WRITEABLE (0x400). A final
flag, ARR_HAS_DESCR (0x800), indicates whether or not this structure
has the descr field. The field should not be accessed unless this flag
is present.
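The flag values can be decoded with plain bit tests. The following is a minimal sketch in Python that mirrors the constants listed above; the helper names describe_flags and may_read_descr are hypothetical and are not part of the interface.

```python
# Flag values as listed in the text above.
CONTIGUOUS = 0x1
FORTRAN = 0x2
ALIGNED = 0x100
NOTSWAPPED = 0x200
WRITEABLE = 0x400
ARR_HAS_DESCR = 0x800

FLAG_NAMES = [
    (CONTIGUOUS, 'CONTIGUOUS'),
    (FORTRAN, 'FORTRAN'),
    (ALIGNED, 'ALIGNED'),
    (NOTSWAPPED, 'NOTSWAPPED'),
    (WRITEABLE, 'WRITEABLE'),
    (ARR_HAS_DESCR, 'ARR_HAS_DESCR'),
]


def describe_flags(flags):
    """Return the names of the bits set in a PyArrayInterface flags value."""
    return [name for bit, name in FLAG_NAMES if flags & bit]


def may_read_descr(flags):
    """The descr member should only be read when ARR_HAS_DESCR is set."""
    return bool(flags & ARR_HAS_DESCR)


# A writeable, aligned, native-byte-order, C-contiguous array without descr.
sample = CONTIGUOUS | ALIGNED | NOTSWAPPED | WRITEABLE
names = describe_flags(sample)
```

A consumer would check may_read_descr(flags) before touching the descr member, since the member is only meaningful when that bit is set.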
New since June 16, 2006:
In the past, most implementations used the "desc" member of the
PyCObject itself (do not confuse this with the "descr" member of the
PyArrayInterface structure above --- they are two separate things)
to hold the pointer to the object exposing the interface. This is
now an explicit part of the interface. Be sure to own a
reference to the object when the PyCObject is created using
PyCObject_FromVoidPtrAndDesc(...).
Type description examples
For clarity it is useful to provide some examples of the type
description and corresponding __array_interface__['descr']
entries.
Thanks to Scott Gilbert for these examples:
In every case, the 'descr' key is optional, but of course
provides more information which may be important for various
applications.
* Float data
    typestr == '>f4'
    descr == [('','>f4')]
* Complex double
    typestr == '>c16'
    descr == [('real','>f8'), ('imag','>f8')]
* RGB Pixel data
    typestr == '|V3'
    descr == [('r','|u1'), ('g','|u1'), ('b','|u1')]
* Mixed endian (weird but could happen).
    typestr == '|V8' (or '>u8')
    descr == [('big','>i4'), ('little','<i4')]
* Nested structure
    struct {
        int ival;
        struct {
            unsigned short sval;
            unsigned char bval;
            unsigned char cval;
        } sub;
    }
    typestr == '|V8' (or '<u8' if you want)
    descr == [('ival','<i4'), ('sub', [('sval','<u2'), ('bval','|u1'),
              ('cval','|u1')])]
* Nested array
    struct {
        int ival;
        double data[16*4];
    }
    typestr == '|V516'
    descr == [('ival','>i4'), ('data','>f8',(16,4))]
* Padded structure
    struct {
        int ival;
        double dval;
    }
    typestr == '|V16'
    descr == [('ival','>i4'),('','|V4'),('dval','>f8')]
It should be clear that any record type could be described using this
interface.
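The one stated requirement on descr is that its total byte count matches the byte count in typestr. The sketch below checks that rule for some of the examples above. The function names are hypothetical. The sketch treats the trailing integer of every non-bit-field typestr as a byte count, as the typestr entry describes, and it rejects 't' because that code counts bits.

```python
import re

# Regular expression for a basic typestr, as given in the typestr entry.
TYPESTR_RE = re.compile(r'^([<|>])([tbiufcOSUV])([0-9]+)$')


def typestr_nbytes(typestr):
    """Return the item size in bytes encoded in a basic typestr."""
    match = TYPESTR_RE.match(typestr)
    if match is None:
        raise ValueError('not a valid typestr: %r' % (typestr,))
    kind, size = match.group(2), int(match.group(3))
    if kind == 't':
        raise ValueError('bit fields are sized in bits, not bytes')
    return size


def descr_nbytes(descr):
    """Return the total number of bytes described by a descr list."""
    total = 0
    for field in descr:
        type_part = field[1]
        if isinstance(type_part, list):
            size = descr_nbytes(type_part)    # nested record
        else:
            size = typestr_nbytes(type_part)  # basic type string
        repeats = 1
        if len(field) == 3:                   # optional shape tuple
            for extent in field[2]:
                repeats *= extent
        total += size * repeats
    return total


def check_consistent(typestr, descr):
    """True when typestr and descr describe the same number of bytes."""
    return typestr_nbytes(typestr) == descr_nbytes(descr)


nested_array_ok = check_consistent(
    '|V516', [('ival', '>i4'), ('data', '>f8', (16, 4))])
padded_ok = check_consistent(
    '|V16', [('ival', '>i4'), ('', '|V4'), ('dval', '>f8')])
```

For the nested array the count is 4 + 8 * 16 * 4 = 516 bytes, which is why its typestr is '|V516'.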
Differences with Array interface (Version 2)
The version 2 interface was very similar. The differences were
largely aesthetic. In particular:
- The PyArrayInterface structure had no descr member at the end
(and therefore no flag ARR_HAS_DESCR)
- The desc member of the PyCObject returned from __array_struct__
was not specified. Usually, it was the object exposing the array
(so that a reference to it could be kept and destroyed when the
C-object was destroyed). Now it must be a tuple whose first
element is a string with "PyArrayInterface Version #" and whose second
element is the object exposing the array.
- The tuple returned from __array_interface__['data'] used to be a
hex-string (now it is an integer or a long integer).
- There was no __array_interface__ attribute. Instead, all of the
keys (except for version) in the __array_interface__ dictionary were
their own attributes. Thus, to obtain the Python-side
information you had to access the following attributes separately:
- __array_data__
- __array_shape__
- __array_strides__
- __array_typestr__
- __array_descr__
- __array_offset__
- __array_mask__
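To make the difference concrete, the sketch below gathers the separate version-2 attributes of an object into a version-3 style dictionary. The function interface_from_v2 and the class OldStyle are hypothetical. The handling of the data entry assumes that an old-style data tuple carried the address as a hex string in its first position, which is one reading of the note above and not something the text confirms.

```python
V2_KEYS = ('data', 'shape', 'strides', 'typestr', 'descr', 'offset', 'mask')


def interface_from_v2(obj):
    """Collect version-2 style attributes into a version-3 style dictionary."""
    interface = {'version': 3}
    for key in V2_KEYS:
        attr = '__array_%s__' % key
        if hasattr(obj, attr):
            interface[key] = getattr(obj, attr)
    data = interface.get('data')
    # Assumption: an old-style data tuple holds the address as a hex string.
    if isinstance(data, tuple) and isinstance(data[0], str):
        interface['data'] = (int(data[0], 16),) + tuple(data[1:])
    return interface


class OldStyle(object):
    """Hypothetical object exposing only version-2 style attributes."""
    __array_shape__ = (2, 3)
    __array_typestr__ = '<i4'
    __array_data__ = ('0x1000', True)  # made-up address, illustration only


converted = interface_from_v2(OldStyle())
```

Optional keys that the old object does not define are left out, so the version-3 defaults apply to them.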
Last updated June 20, 2006 by Travis E. Oliphant