PEP:
Title: Multidimensional Arrays for Python
Version: $Revision: 1.23 $
Last Modified: $Date: 2005/03/17 06:27:45 $
Author: Travis Oliphant
Status: Draft
Type: Standards Track
Created: 02-Feb-2005
Python-Version: 2.5


Abstract

    Multidimensional arrays are often used in scientific and
    engineering programming, but they have uses in other application
    areas as well, as evidenced by the popularity of spreadsheet
    applications. Numeric has for 10 years provided a
    multidimensional array object for Python users. However, it had
    to be installed separately from core Python, which has led others
    who would benefit from the array structure to invent their own
    objects to accomplish similar purposes. More recently, Numarray
    has provided examples of how some of the problems people have
    identified with Numeric over the years could be successfully
    resolved. It would be highly beneficial, not only to the
    scientific users of Python but to the entire Python community, if
    a single arrayobject could be placed in the core.

    Therefore, this PEP proposes the addition of an arrayobject to
    Python along with an associated C-API and compiled module. This
    could be seen as a hybrid of the Numeric and Numarray objects.

    One aspect that this PEP does not address is the addition of a
    basic set of universal functions for the arrayobject, which would
    allow arrays to be easily used in mathematical calculations. The
    addition of a universal function (ufunc) object and that set of
    math functions is handled by another PEP. It is hoped that any
    conflicts that delay acceptance of that PEP will not have to
    delay the acceptance of this PEP.


Rationale

    Numeric and now Numarray are seeing increased adoption among many
    users of Python. These array objects make it possible for Python
    to be used in exactly the same way as other high-level
    interactive-and-programming environments such as MATLAB and IDL,
    while taking advantage of the superior programming language of
    Python.

    The presence of both Numeric and Numarray is somewhat confusing,
    and nobody involved believes that having two arrayobjects is an
    ideal situation. The purpose of this PEP is to propose a hybrid
    arrayobject that merges the two objects (providing the benefits
    of both). By placing this object into the Python core, it will
    have the added benefit of allowing other package developers to
    easily exchange data that naturally makes use of multidimensional
    arrays.

    Though the goal of this PEP is not to replace the arraymodule
    already in Python, it could conceivably do that as well. If that
    is to happen, this PEP proposes phasing out the arraymodule only
    when it becomes obvious that the multidimensional array object is
    a fitting replacement.


Specification

    The bulk of this PEP describes the specification of the
    multi-dimensional array (hereafter referred to as array) object.
    Where conflicts have arisen in the specification, a discussion of
    the issues will be presented along with a description of the
    proposed hybrid solution.

Name

    Numeric named the type ArrayType as it was supposed to replace
    the arraymodule. Trying to mesh the goals of the numerical array
    object with that module at this point is difficult, and so this
    PEP proposes to call the new C-type PyArray_Type (used in the
    C-API) and the Python type name ndarray. This is completely open
    to comment. Other possibilities include marray, numarray, narray,
    or ndarray (universal array).

Basic Types

    The arrayobject will support arrays of *all* the basic C-types
    plus a few C99 extensions that will be supported for all
    platforms (though functionality may be reduced on platforms that
    do not support certain types).
    The array will understand 21 main C-types:

        bool
        signed char (byte)
        unsigned char (ubyte)
        signed short
        unsigned short (ushort)
        signed int
        unsigned int (uint)
        signed long
        unsigned long (ulong)
        PY_LONG_LONG -- signed long long or equivalent on platform
                        (longlong)
        unsigned PY_LONG_LONG -- unsigned long long or equivalent
                        (ulonglong)
        float
        double
        longdouble  -- long double (equivalent to double if not
                       available)
        cfloat      -- complex float        ------|
        cdouble     -- complex double             |-- has .real and
        clongdouble -- complex long double    ----|   .imag members
        PyObject *  -- object arrays
        char *      -- character array
        Py_UNICODE * -- unicode array
        void *      -- arbitrary collection of bits (useful for
                       record-array)

    An intp type will be an alias for a signed integer type such that
    sizeof(intp) == sizeof(void *) for the platform. The uintp type
    will be the unsigned equivalent.

    The char *, Py_UNICODE *, and void * arrays all have flexible
    itemsizes while the other arrays have fixed itemsizes. The void *
    array is intended to hold arbitrary records (equivalent to
    C-structures) as array elements.

    Aliases to these types for a specific bit width will be available
    so that users may refer to the array as a SHORT array or an Int16
    array. The aliases will be mapped to the "largest" C type that
    exactly holds the desired bit width for that C implementation.
    Thus, if both long and int are 32 bit on a platform, Int32 will
    map to long.

    Arrays will not be assumed to always be aligned on word
    boundaries, nor necessarily stored in the same byte-order as the
    platform default, nor even writeable (mutable). Flags in the
    array object will indicate if the array is well-behaved or not.

    There will also be a hierarchical tree of arraytypes defined. The
    leaves of this tree will be actual new Python-scalars (that also
    inherit from the appropriate Python-scalar if it exists). The
    other types in the tree will have no "data".

    The following inheritance hierarchy (partly borrowed from
    Numarray) is suggested.
    Parentheses indicate the Python type that the type will also
    inherit from. (The actual C-names will be PyNAMEArrType_Type
    where NAME is what is given in the outline.)

        GenericArrType
            BoolArrType : BOOL (PyBool_Type)
            NumericArrType
                IntegerArrType
                    SignedIntegerArrType ; (IntXX)
                        ByteArrType : BYTE
                        ShortArrType : SHORT
                        IntArrType : INT
                        LongArrType : LONG (PyInt_Type)
                        LongLongArrType : LONGLONG
                    UnsignedIntegerArrType ; (UIntXX)
                        UByteArrType : UBYTE
                        UShortArrType : USHORT
                        UIntArrType : UINT
                        ULongArrType : ULONG
                        ULongLongArrType : ULONGLONG
                FloatingArrType ; (FloatXX)
                    FloatArrType : FLOAT
                    DoubleArrType : DOUBLE (PyFloat_Type)
                    LongDoubleArrType : LONGDOUBLE
                ComplexArrType ; (ComplexXX)
                    CFloatArrType : CFLOAT
                    CDoubleArrType : CDOUBLE (PyComplex_Type)
                    CLongDoubleArrType : CLONGDOUBLE
            FlexibleArrType
                CharacterArrType
                    StringArrType : CHAR (StringType)
                    UnicodeArrType : UNICODE (UnicodeType)
                VoidArrType : VOID
            ObjectArrType : OBJECT

    This is 30 new type objects: 9 marker-only types serving to make
    it easier to detect certain types of arrays and 21 scalar array
    objects that will inherit much of their behavior from the basic
    arraytype.

    Note that these new types are essentially new Python scalar types
    meant to connect the single array type to the standard Python
    scalars, which can also be interpreted as ndim-0 arrays. They
    serve two purposes:

        1) to smooth the transition between Python objects and a
           0-dim array, which is another kind of scalar.

        2) to provide a nice hierarchical type structure for arrays.

    Along with this fundamental set of objects, aliases will be
    available for creating and specifying arrays using the desired
    bit-width (e.g. Int16, Float32). These aliases will map to the
    largest C-type that exactly holds the requested bit-width.
    Bit-widths will also be the default way that arrays print
    themselves.

    Because of redundancies, some underlying C-types may not be
    accessible through the bit-width approach, or a user may not know
    or care about the bit-width but want a particular C-type.
    If a specific C-type is desired for some reason, it may be
    specified through its C-type name (the names given after the : in
    the above table).

    The type classes and the C-type aliases are all stored in a
    typeDict dictionary and available as names in a separate Python
    module.

Standard sub-classes (or container classes):

    Record Arrays

        Record Arrays will be implemented in Python (either as a
        subclass or a container class of ndarray). They will make use
        of the VOID array to store their data. The getfield method
        will be particularly useful for these arrays.

        The specification of Record Arrays will not be done in this
        PEP. As proof of concept, a record array class adhering to
        the numarray specification will be available.

    Matrices

    Masked Arrays

Sequence and Mapping Behavior

    The ArrayObject will allow sophisticated multidimensional
    indexing for both retrieving and setting items. Much of this
    behavior is familiar to users of Numeric or numarray. Some
    specifics that this PEP proposes:

    1) Basic slicing (using only scalars, slices, NewAxis, or
       Ellipses) will always return a view of the array as currently
       done in Numeric and numarray (data of the returned array is
       shared with the data of the array that is sliced).

    2) Advanced indexing using boolean arrays or integer index arrays
       (or lists, which will always be interpreted as index arrays)
       will return a copy. The behavior of this type of slicing will
       be defined by the rules given below.

       Setting array elements using advanced indexing will be similar
       to getting. The object used for setting will be force-cast to
       the array's type if needed. This type must be "broadcastable"
       to the required shape specified by the indexing, where
       "broadcastable" is more fully explained below. Alternatively,
       the object can be an array iterator. This will repeatedly
       iterate over the object until the desired elements are set.
       The shape of X is never changed.

       Integers in index arrays can be negative to mean selection
       from the end of the array.
       Out of range indexes will raise errors.

       In the following, X is the array to be indexed, B is a boolean
       index array, and T is a tuple of indexing objects or a single
       indexing object (except for a boolean array) (note that X[a,b]
       is equivalent to X[(a,b)]). The rules for indexing are:

       a) X[B]    B.ndim <= X.ndim

          If B.ndim = X.ndim, this selects a 1-d array filled with
          the elements of X corresponding to the non-zero values of
          B. The search order will be C-style (last index varies the
          fastest). If B is larger than X in any dimension, an error
          will be raised. If B is smaller than X in any dimension,
          its values will be assumed 0.

          If ndim(B) < ndim(X), then this is equivalent to
          X[nonzero(B)], where nonzero is a function that returns a
          (B.ndim)-tuple of 1-D index arrays that specify which
          elements of B are True. In fact, the above statement is
          true when ndim(B) = ndim(X) as well, but it is probably
          easier to understand the ndim(B) = ndim(X) case without
          bringing in tuple-indexing.

       b) X[T]    if len(T) > X.ndim an error is raised.

          General-purpose (partial) indexing.

          A single indexing object that is not a boolean array will
          be promoted to a 1-tuple. The length of the tuple object
          must always be less than or equal to the number of
          dimensions of X.

          If any element of the tuple is a list, boolean array, or
          integer index array, then advanced indexing occurs.

          Most things in the tuple are interpreted as integer index
          arrays:

             Lists: interpreted as integer arrays (forced casting to
                    INTP is done).

             Scalars: cast to an INTP type array and used as a
                    broadcastable array.

             Other sequences: any sequence that allows it will be
                    cast to INTP and used.

          The shape of all these objects in the tuple must be
          "broadcastable" to a single global shape. The arrays are
          "broadcastable" if any of the following holds:

             1) The arrays all have exactly the same shape.

             2) The arrays all have the same ndim and the dimensions
                are all the same value or 1.
             3) Arrays with ndim too small can be pre-pended with 1's
                in their shape to satisfy property 2).

          Thus, if a.shape == (5,1), b.shape == (1,6), c.shape ==
          (6,) and d.shape == () (i.e. d is a scalar), then a, b, c,
          and d are all broadcastable to dimension (5,6). The array
          "a" acts like a (5,6) array where a[:,0] is broadcast to
          the other columns, "b" acts like a (5,6) array where b[0,:]
          is broadcast to the other rows, and "c" acts like a (1,6)
          array and thus a (5,6) array where c[:] is broadcast to
          every row. Finally, "d" acts like a (5,6) array where the
          single value is repeated. These arrays are not actually
          constructed, but the behavior is the same.

          Ellipsis and Slice objects in the tuple allow altering
          partial index behavior.

             Ellipsis: represents one or more ":" to fill the
                    dimensions. The first Ellipsis encountered will
                    be expanded. Other Ellipsis objects are
                    equivalent to an additional ":".

             Slices: indicate partial indexing for the given
                    dimension.

          If len(T) < X.ndim, it is equivalent to having an Ellipsis
          object at the end.

          Let Ns be the number of slice objects in the tuple (after
          expanding the Ellipsis). Let Nt be the number of standard
          objects (i.e. converted to integer indexes) in the tuple.
          Notice that Nt + Ns = X.ndim (because of the implied
          Ellipsis).

          If Ns = 0, then the result is the same shape as the unified
          "broadcast" array with elements determined by the
          pseudo-formula:

             result[Nidx] = X[ind1[Nidx], ind2[Nidx], etc., indN[Nidx]]

          where Nidx is an N-dimensional tuple (i1,...,iN) and ind1
          to indN are the "broadcast-equivalent" integer-array
          elements of the tuple. A three-dimensional example is

             result[i,j,k] = X[ind1[i,j,k], ind2[i,j,k], ind3[i,j,k]]

          If Ns > 0, then partial indexing is done. This can be
          confusing, but it is straightforward if you think in terms
          of the shapes of the arrays involved. The rule is that the
          shape of the result (or the desired shape of the setting
          object) is the shape of X with its indexed subspaces
          replaced with the broadcasted indexing subspace.
          If the indexing subspaces are right next to each other,
          then the broadcasted indexing space directly replaces all
          of the indexed subspaces in X. If the indexing subspaces
          are separated (by slices or Ellipsis), then the broadcasted
          indexing space is first, followed by the un-indexed
          portions of X.

          Some examples can help illuminate the rule:

          If there is only one indexing array, then the shape of this
          indexing array replaces the corresponding axis in the
          result (this is so partial indexing can fully replace
          take).

          Thus, if X.shape is (10,20,30) and ind1 is a (2,3,4)
          indexing array, then

             result = X[...,ind1,:]

          gives result.shape = (10,2,3,4,30) because the (20,)-shaped
          subspace has been replaced with a (2,3,4)-shaped subspace.
          Note that if i,j,k loop over the (2,3,4)-shaped subspace,
          then

             result[...,i,j,k,:] = X[...,ind1[i,j,k],:]

          This is the same as take(X, ind1, axis=-2).

          Now, let X.shape be (10,20,30,40,50) and suppose ind1 and
          ind2 are broadcastable to the shape (2,3,4). Then
          X[:,ind1,ind2] has shape (10,2,3,4,40,50) because the
          (20,30)-shaped subspace in X has been replaced by a (2,3,4)
          subspace from the indexes.

             result[:,i,j,k,:,:] = X[:,ind1[i,j,k],ind2[i,j,k],:,:]

          However, X[:,ind1,:,ind2,:] has shape (2,3,4,10,30,50)
          because there is no unambiguous place to drop in the single
          indexing subspace; thus it is tacked on to the beginning.

             result[i,j,k,:,:,:] = X[:,ind1[i,j,k],:,ind2[i,j,k],:]

          Note: It is always possible to use swapaxes to position the
          subspace where you would like (on both getting and
          setting).

          In pseudo-code, the general split-subspace result is:

             result[Midx,...] = X[<>,ind1[Midx],<>,indM[Midx],<>]

          where <> indicates possible partial indexing objects
          (slices or Ellipsis).

          If no slice objects are specified but Nt < X.ndim, then it
          is assumed that an Ellipsis object was inserted at the end
          so that Ns = X.ndim - Nt and

             result[Midx,...] = X[ind1[Midx], etc., indM[Midx], ...]
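
The broadcasting and result-shape rules above can be checked with a small pure-Python sketch. It is only an illustration under stated assumptions, not part of the proposed API: the helper names broadcast_shape and index_result_shape are hypothetical, index arrays are represented by their shapes alone, only full slices (":") are handled, and Ellipsis expansion is left out.

```python
def broadcast_shape(*shapes):
    """Return the single shape that all `shapes` broadcast to, or raise."""
    ndim = max((len(s) for s in shapes), default=0)
    # Rule 3: pre-pend 1's to shapes that have too few dimensions.
    padded = [(1,) * (ndim - len(s)) + tuple(s) for s in shapes]
    result = []
    for dims in zip(*padded):
        # Rule 2: along each axis, all sizes must be the same value or 1.
        sizes = {d for d in dims if d != 1}
        if len(sizes) > 1:
            raise ValueError("shapes are not broadcastable: %r" % (shapes,))
        result.append(sizes.pop() if sizes else 1)
    return tuple(result)


def index_result_shape(x_shape, index):
    """Shape of X[index] for a tuple of full slices and index-array shapes.

    Hypothetical helper: each element of `index` is either slice(None)
    or a tuple giving the shape of an integer index array.
    """
    x_shape = tuple(x_shape)
    index = tuple(index)
    if len(index) > len(x_shape):
        raise IndexError("too many indices")
    # A short tuple behaves as if an Ellipsis were appended at the end.
    index += (slice(None),) * (len(x_shape) - len(index))

    array_axes = [i for i, obj in enumerate(index) if not isinstance(obj, slice)]
    if not array_axes:
        return x_shape
    subspace = broadcast_shape(*(index[i] for i in array_axes))

    first, last = array_axes[0], array_axes[-1]
    if last - first + 1 == len(array_axes):
        # Adjacent index arrays: the subspace replaces the indexed axes.
        return x_shape[:first] + subspace + x_shape[last + 1:]
    # Separated index arrays: the subspace comes first, then the rest of X.
    kept = tuple(n for n, obj in zip(x_shape, index) if isinstance(obj, slice))
    return subspace + kept


full = slice(None)
print(broadcast_shape((5, 1), (1, 6), (6,), ()))
print(index_result_shape((10, 20, 30), (full, (2, 3, 4), full)))
print(index_result_shape((10, 20, 30, 40, 50), (full, (2, 3, 4), (2, 3, 4))))
print(index_result_shape((10, 20, 30, 40, 50),
                         (full, (2, 3, 4), full, (2, 3, 4), full)))
```

The four calls reproduce the shapes quoted in the text: (5, 6), (10, 2, 3, 4, 30), (10, 2, 3, 4, 40, 50) and (2, 3, 4, 10, 30, 50).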
          Remember: when the indexing arrays are right next to each
          other (defining a connected subspace), the shape of the
          result is derived from the shape of X with the indexed
          subspace replaced with the (broadcasted) indexing subspace.

    3) All operations (including slicing) that reduce the array to a
       scalar will return an Array Scalar. These will be very similar
       to standard Python scalars where the equivalent type exists,
       but will have all the methods and attributes of
       higher-dimensional arrays. As indicated above, these new
       Python scalars will inherit from the existing Python scalar
       where an exact match exists.

0-dim arrays and Python Scalars

  Background

    Here scalars are defined as Python integers, floats, or complex
    objects.

    0-dim arrays are equivalent to a scalar mathematically. Their
    shape is arguably () and they represent a single element of the
    corresponding basic type. There can be a 0-dim array for each of
    the supported types.

    These 0-dim arrays can fall out naturally in multi-dimensional
    computation and manipulation. For example: consider X[i] where i
    is an integer and X is an array with N dimensions. Then X[i] is
    an array with N-1 dimensions. So, if N is 1 (X is a vector), then
    X[i] is a 0-dim array which could also be considered a scalar.

    0-dim arrays can behave like ints or floats as long as Python
    asks the object to convert itself to an int or float before
    raising an error.

    Python also defines some scalar types. For exactly 6 of the new
    array types there is a corresponding Python scalar type (bool,
    integer, float, complex, string, unicode). These types
    (particularly the integer type) have special privileges with
    Python. For example, there are places in Python code where an int
    is needed and Python does not ask an arbitrary object to convert
    itself to an int before raising an exception (indexing is the
    only place currently identified).

  Questions

    1) Should sequence behavior (i.e. some combination of slicing,
       indexing, and len) be supported for 0-dim arrays?
       Pros: It means that len(a) always works and returns the size
             of the array. Slicing code and indexing code will work
             for any dimension (the 0-dim array is an identity
             element for the operation of slicing).

       Cons: 0-dim arrays are really scalars. They should behave
             like Python scalars, which do not allow sequence
             behavior.

    2) Should array operations that result in a 0-dim array that is
       the same basic type as one of the Python scalars return the
       Python scalar instead?

       Pros:

         1) In some cases when Python expects an integer (the most
            dramatic is when slicing and indexing a sequence:
            _PyEval_SliceIndex in ceval.c), it will not try to
            convert the object to an integer first before raising an
            error. Therefore it is convenient to have 0-dim arrays
            that are integers converted for you by the array object.

         2) No risk of user confusion by having two types that are
            nearly but not exactly the same and whose separate
            existence can only be explained by the history of Python
            and NumPy development.

         3) No problems with code that does explicit typechecks
            (isinstance(x, float) or type(x) == types.FloatType).
            Although explicit typechecks are considered bad practice
            in general, there are a couple of valid reasons to use
            them.

         4) No creation of a dependency on Numeric in pickle files
            (though this could also be done by a special case in the
            pickling code for arrays).

       Cons: It is difficult to write generic code because scalars
             do not have the same methods and attributes as arrays
             (such as .type or .shape). Python scalars also have
             different numeric behavior.

             This results in special-case checking that is not
             pleasant. Fundamentally, it lets the user believe that
             multidimensional homogeneous arrays are something like
             Python lists (which, except for Object arrays, they are
             not).

  Proposed Solution:

    Create Python scalar types for all of the 21 types and also
    inherit from the three that already exist. Define equivalent
    methods and attributes for these Python scalar types.
    These scalar types may be internally converted to the
    corresponding 0-dim array when needed and to simplify code. Users
    will never see 0-dim arrays on the Python level and it is likely
    that only advanced users will be aware of their existence.

    Currently, only the LongArrType will inherit from the IntType and
    therefore be used in all places that integers are used. An
    __index__ method added to Python and arrayobjects will allow
    other integer-like arrays to be used as indexes.

    When ndarray is imported, it will alter the numeric table for
    Python int, float, and complex to behave the same as array
    objects.

    Thus, in the proposed solution, 0-dim arrays would never be
    returned from calculation, but instead, the equivalent Python
    Array Scalar Type. Internally, these ArrayScalars can be quickly
    converted to 0-dim arrays when needed. Each scalar would also
    have a method to convert to a "standard" Python Type upon request
    (though this shouldn't be needed often).

    The new Python scalars will not have sequence behavior except
    where the underlying scalar allows it (StringObjects, etc.). They
    will also have read-only buffer behavior.

Buffer Behavior

    Arrays can be C-style contiguous, Fortran-style contiguous, or
    discontiguous in memory (with well-defined strides specifying how
    far to jump to get to the next object in that dimension).

    (C or Fortran) contiguous arrays will have a single buffer
    segment, while discontiguous arrays will have a segment for each
    contiguous portion (in the worst case it could be a 1-element
    segment for each element in the array).

    Note that the array will not reallocate the memory block if
    another PyObject object is referencing it. All objects that
    implement the buffer protocol should guarantee this.

Number Behavior

    Number methods will be defined but will refer only to an internal
    dictionary of functions that can be loaded by the user. A
    separate (ufunc) PEP will describe a set of default number
    methods that could be loaded.
    Python Scalars will call these array methods as well for their
    number behavior (internally converted to 0-dim arrays).

Standard Iterator

    iter(a) on an array will return a Python-default iterator object
    that returns an array with ndim N-1 on each iteration, as if a[i]
    had been called.

Flat Iterator

    A 1-d iterator will be defined that will walk through any array,
    returning a Python scalar at each step. The order of the
    iteration is the same for contiguous and discontiguous arrays:
    the last index always varies the fastest.

    These 1-d iterators can also be indexed and set, in which case
    the underlying array will be considered 1-d (but does not have to
    be contiguous in memory).

Mapping Iterator

    An additional mapping iterator is defined that will walk through
    an array according to a specific subscript behavior. This
    iterator can be used as a[iterator] to both get and set specific
    elements of an array.

    A mapping iterator can also be constructed as mapiter(*args),
    where args is a tuple of objects that can be used for "indexing".
    A mapiter object can be used for slicing any array (it must be
    bound to an array before being useful, however).

    Note that a[] is equivalent to a[mapiter()] with one syntactical
    difference:

        (1) in the mapiter() creation function, slicing constructs
            such as 3:10 must be replaced with the appropriate slice
            object

                3:10:2 <==> slice(3,10,2)

            because the slicing syntax does not work in function
            calls.

    The utility of using the mapiter object is that it performs some
    of the overhead for implementing the subscript behavior once.

Attributes

    Attributes are intrinsic parts of an array. Therefore, attributes
    will not produce copies of the data of an array unless the copy
    is read-only.
    The following attributes are proposed (the attributes cannot be
    set unless specified):

    .ndim        -- the number of dimensions (see **Note**)
    .flags       -- a dictionary with flag_name keys and value
                    entries. Flags are set with a similar dictionary
                    (or an integer), but only certain flags can be
                    set: NOTSWAPPED, WRITEABLE, UPDATEIFCOPY. The
                    latter can only be made FALSE.
    .shape       -- tuple showing the shape of the array (can be set
                    if the array is contiguous, provided the total
                    size of the new shape is not different)
    .strides     -- tuple showing the strides of the array
    .data        -- returns a buffer (readonly or writeable
                    depending) if the data is contiguous; otherwise
                    an error is raised
    .itemsize    -- the itemsize of this array
    .size        -- the total number of elements in this array
    .base        -- either the array that this data references, or an
                    array to be updated with copies of this one when
                    this one is deleted, or an object exposing a
                    buffer interface if this array is using the
                    memory of that buffer object, or None
    .type        -- the type of the array (an actual type object)
                    (equivalent to type(X.flat[0]))
    .typenum     -- the C-enum typenumber
    .typechar    -- the typecode character (some of these have
                    changed from Numeric)
    .real        -- get and set the real part of an array
    .imag        -- get and set the imaginary part of an array;
                    returns a read-only array of zeros if the array
                    is not complex
    .flat        -- return a flatiter object (can be indexed) or, if
                    set, fill the array by casting and iterating over
                    the given object
    .op_priority -- return the numeric priority for this array
                    (default 0.0); bigger numbers take precedence in
                    mixed operations

    **Note** The attribute rank is not suggested because the meaning
    of rank used here could conflict with an established but
    different meaning of rank in linear algebra, which should be an
    important domain of use for array objects.

Methods

    More methods are proposed for arrays than in Numeric. Some of
    these are in numarray.
    transpose(permutation=default)
                      "fast" view-based transpose (default
                      permutation of shape is reversed).
    tolist()          construct a "nested list" (of Python Scalars)
                      version of the array
    toscalar()        get an "original" Python Scalar Object
    tofile(fid)       write raw data to a file pointer
    getfield(type, offset=0)
                      get a "field" of an array. A field is a view of
                      the array memory starting at the given offset
                      in each element of the array and interpreting
                      the data as the given type
    byteswap()        byteswap the array in-place (return nothing)
    byteswapped()     return a byteswapped copy of the array
    astype()          cast the array to a new array (returns a copy
                      if already the right type).
    copy()            get a copy of the array
    resize()          "in-place" reshape the array to a new shape.
                      Works quickly using realloc if possible.
                      Otherwise, new data is created and information
                      is copied over.
    __copy__()        same as .copy -- for the copy module
    __deepcopy__()    deepcopy of an array (mainly for
                      PyArray_OBJECT arrays)
    sort(axis=-1)     sort the array in place along the given axis.
    argsort(axis=-1)  return indices giving "how" to sort the array

    swapaxes()    max()         min()         mean()
    argmax()      argmin()      trace()       dump()
    dumps()       clip()        conj()        diagonal()
    new()         nonzero()     ravel()       repeat()
    resize()      sort()        stddev()      sum()
    cumsum()      product()     cumproduct()  alltrue()
    sometrue()    allclose()    concatenate() swapaxes()
    trace()       compress()    choose()      search()

    There will be an easy way for an external library to update these
    methods to make use of ATLAS or other faster codes that may be
    too complicated to distribute with the core Numeric object but
    that others may find useful.

    Both a C-API and a Python API will be available to replace array
    methods with a new function (perhaps only for specific types?).
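
To make the idea of a "fast" view-based transpose concrete, here is a minimal pure-Python sketch. It is not the proposed implementation: the class StridedView and the helper c_strides are hypothetical names, the data is a plain Python list, and strides are counted in elements rather than bytes to keep the example short. The point it illustrates is that transposing only permutes the shape and strides, while the data buffer is shared.

```python
def c_strides(shape):
    """Strides (in elements) of a C-contiguous array with this shape."""
    strides = []
    step = 1
    for n in reversed(shape):
        strides.append(step)
        step *= n
    return tuple(reversed(strides))


class StridedView:
    """Hypothetical stand-in for an array: flat data plus shape and strides."""

    def __init__(self, data, shape, strides=None):
        self.data = data                      # shared, never copied
        self.shape = tuple(shape)
        self.strides = tuple(strides) if strides is not None else c_strides(shape)

    def __getitem__(self, idx):
        # Element lookup: dot product of the index tuple and the strides.
        offset = sum(i * s for i, s in zip(idx, self.strides))
        return self.data[offset]

    def transpose(self, permutation=None):
        # Default permutation reverses the axes, as in the method list above.
        if permutation is None:
            permutation = tuple(reversed(range(len(self.shape))))
        return StridedView(
            self.data,
            [self.shape[p] for p in permutation],
            [self.strides[p] for p in permutation],
        )


a = StridedView(list(range(6)), (2, 3))   # rows [0, 1, 2] and [3, 4, 5]
t = a.transpose()
print(t.shape, t.strides)                 # (3, 2) (1, 3)
print(t[(2, 1)] == a[(1, 2)])             # True
a.data[5] = 99                            # a write through the shared buffer...
print(t[(2, 1)])                          # ...is visible in the view: 99
```

Because no element is moved, the cost of such a transpose does not depend on the size of the array; the price is that the result is generally no longer C-contiguous.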

Module Functions

    (Some of these are also methods or attributes, but they remain as
    function calls also for backward compatibility.)

  C-level (multiarraymodule.c)

    New functionality:

      set_string_function
      set_numeric_ops
      set_typeDict
      set_array_method
      array
      arange
      zeros
      empty
      fromstring
      fromfile
      frombuffer
      concatenate
      _fastCopyAndTranspose
      cross_correlate
      innerproduct

    Also methods or attributes:

      take
      put
      putmask
      reshape
      transpose
      repeat
      choose
      sort (but function call returns a copy)
      argsort
      searchsorted (.search)
      argmax

  Python-level (numeric.py)

    Also methods or attributes (function calls for backwards
    compatibility):

      shape
      rank
      ndim
      size
      sum
      average (mean)
      mean
      cumsum
      product    - Total product over a specified dimension
      cumproduct - Cumulative product over a specified dim
      alltrue    - Logical and over an entire axis
      sometrue   - Logical or over an entire axis
      swapaxes   - Exchange axes
      transpose  - Permute axes
      argmin     - Index of smallest value
      resize     - Return array with arbitrary new shape
      diagonal   - Return diagonal array
      trace      - Trace of array
      dump       - Dump array to file object (pickle)
      dumps      - Return pickled string representing data
      ravel      - Return array as 1-D
      nonzero    - Indices of nonzero elements for 1-D array
      shape      - Shape of array
      compress   - Elements of array where condition is true
      clip       - Clip array between two values

    New functionality:

      ones
      asarray      - Guarantee NumPy array
      convolve     - Convolve two 1-d arrays
      concatenate  - Join arrays together
      dot          - Dot product (matrix multiplication)
      outerproduct - Outerproduct of two arrays
      indices      - Tuple of indices
      fromfunction - Construct array from universal function
      load         - Return array stored in file object
      loads        - Return array from pickled string
      where        - Construct array from binary result
      identity     - 2-D identity array (matrix)
      allclose     - Tests if sequences are essentially equal

Matrix Class

    A default Matrix class will either inherit from or contain the
    ndarray.
    There will also be ways to update the matrix methods with faster
    versions of the default operations.


Implementation Plan

    This PEP proposes the implementation of the hybrid array object
    as a new-style C-type (with the ability to be subclassed), with
    the exception of a few Python-level functions contained in
    numeric.py. Therefore, the C-structure defining the new object is
    described here, followed by a description of the proposed C-API.
    As in the specification section, any potential points of
    disagreement will be discussed.

C-Structures

    The C-structure for the basic arrayobject is very similar to
    Numeric's C-structure. The only addition is the itemsize
    attribute, which provides the size of each element. This must be
    a per-array variable to allow for variable itemsize arrays
    (record (VOID) arrays, character arrays, and unicode arrays).
    Also, the dimensions and strides arrays have been changed to intp
    (which will be a signed integer type that can hold a pointer for
    the platform -- it could in theory be changed to an unsigned
    integer, but there are places where it is convenient to use
    negative strides and index values).

        PyObject_HEAD
        char *data;             /* pointer to raw memory */
        int nd;                 /* number of dimensions, also called
                                   ndim */
        intp *dimensions;       /* size in each dimension */
        intp *strides;          /* bytes to jump to get to the next
                                   element in each dimension */
        PyObject *base;         /* Points to the object that owns the
                                   memory or to an array to update
                                   when this one disappears */
        PyArray_Descr *descr;   /* Pointer to type structure */
        int flags;              /* Flags describing array -- see
                                   below */
        PyObject *weakreflist;  /* For weakreferences */
        int itemsize;           /* needed for CHAR, UNICODE, and VOID
                                   arrays which can have arbitrary
                                   sizes: same as descr->elsize for
                                   other types */

    The flags variable can be the bit-wise OR of any of the defined
    flags:

    CONTIGUOUS   -- set if the array is C-style contiguous in memory
                    with the last dimension varying the fastest.
    FORTRAN      -- set if the array is Fortran-style contiguous in
                    memory with the first dimension varying the
                    fastest.
    OWN_DATA     -- set if this array owns the data buffer and should
                    de-allocate it when the array is deallocated.
    ALIGNED      -- set if the elements of the array are aligned on
                    appropriate boundaries. Typically set unless the
                    array is a view of a VOID array.
    NOTSWAPPED   -- set if values are in system byteorder. The user
                    can change this to support automatic buffered I/O
                    with data buffers in the wrong byte-order.
                    Usually set.
    WRITEABLE    -- set if the array can be written to (default). The
                    user can clear this flag if the array should not
                    be written to. Perhaps it is a read-only
                    memory-mapped file or the user needs to preserve
                    the array contents.
    UPDATEIFCOPY -- when this array is deallocated, update the array
                    pointed to by base with the contents of this
                    array (useful if a copy was made on creation but
                    the user desires automatic updating of the
                    original (perhaps misbehaved) array). When a new
                    array is created with this flag set, the array it
                    will be copying back to is set to readonly for
                    the duration of this current array's existence.
                    An error is raised if the original array is
                    already not-writeable.

    Standard named FLAG combinations:

        #define BEHAVED_FLAGS ALIGNED | NOTSWAPPED
        #define CARRAY_FLAGS CONTIGUOUS | BEHAVED_FLAGS
        #define FARRAY_FLAGS FORTRAN | BEHAVED_FLAGS
        #define DEFAULT_FLAGS CARRAY_FLAGS | WRITEABLE
        #define UPDATE_ALL_FLAGS CONTIGUOUS | FORTRAN | ALIGNED

    New arrays with newly-allocated data get DEFAULT_FLAGS.
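
The way the named combinations are built from the individual flags, and how a check in the spirit of PyArray_CHKFLAGS would use them, can be sketched in a few lines of Python. The bit values below are assumptions chosen only for illustration (this section names the flags but does not assign numbers), and chkflags is a hypothetical helper, not the proposed macro.

```python
# Hypothetical bit assignments; only the flag names come from the text.
CONTIGUOUS   = 0x01
FORTRAN      = 0x02
OWN_DATA     = 0x04
ALIGNED      = 0x08
NOTSWAPPED   = 0x10
WRITEABLE    = 0x20
UPDATEIFCOPY = 0x40

# Named combinations, mirroring the #define lines above.
BEHAVED_FLAGS = ALIGNED | NOTSWAPPED
CARRAY_FLAGS = CONTIGUOUS | BEHAVED_FLAGS
FARRAY_FLAGS = FORTRAN | BEHAVED_FLAGS
DEFAULT_FLAGS = CARRAY_FLAGS | WRITEABLE


def chkflags(flags, required):
    """True only if every bit in `required` is set in `flags`."""
    return (flags & required) == required


flags = DEFAULT_FLAGS                        # a freshly allocated array
print(chkflags(flags, CARRAY_FLAGS))         # True: contiguous and behaved
print(chkflags(flags, FARRAY_FLAGS))         # False: FORTRAN bit is not set

read_only = flags & ~WRITEABLE               # the user clears WRITEABLE
print(chkflags(read_only, WRITEABLE))        # False
print(chkflags(read_only, BEHAVED_FLAGS))    # True: other bits are untouched
```

One design note for the C side: because `&` binds more tightly than `|` in C, combination macros such as these are normally wrapped in parentheses so that an expression like `flags & CARRAY_FLAGS` groups as intended.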
    Descr Structure -- holds type information (essentially static
    attributes for the type objects)

        /* Functions to cast to all other standard types */
        PyArray_VectorUnaryFunc *cast[PyArray_NTYPES];

        /* Functions to get and set items */
        PyArray_GetItemFunc *getitem;
        PyArray_SetItemFunc *setitem;

        /* Function to compare items */
        PyArray_CompareFunc *compare;

        /* Function to select largest */
        PyArray_ArgFunc *argmax;

        /* Function to compute dot product */
        PyArray_DotFunc *dotfunc;

        /* Function to scan an ASCII file and place a single value
           plus possible separator */
        PyArray_ScanFunc *scanfunc;

        PyTypeObject *typeobj;  /* the type object for this type */
        int type_num;           /* number representing this type */
        int elsize;             /* element size for this type,
                                   or 0 if variable */
        int alignment;          /* alignment needed for this type */
        char type;              /* character representing this type */

    The type numbers and the characters representing each type are
    enumerations. The type numbers in C are:

        PyArray_BOOL=0,
        PyArray_SBYTE, PyArray_UBYTE,
        PyArray_SHORT, PyArray_USHORT,
        PyArray_INT, PyArray_UINT,
        PyArray_LONG, PyArray_ULONG,
        PyArray_LONGLONG, PyArray_ULONGLONG,
        PyArray_FLOAT, PyArray_DOUBLE, PyArray_LONGDOUBLE,
        PyArray_CFLOAT, PyArray_CDOUBLE, PyArray_CLONGDOUBLE,
        PyArray_OBJECT=17,
        PyArray_STRING, PyArray_UNICODE,
        PyArray_VOID,
        PyArray_NTYPES,
        PyArray_NOTYPE

    There is also a PyArray_INTP and a PyArray_UINTP, each #defined to
    one of the above types depending on sizeof(void *) for the
    platform. This is the default type for integer arrays.

    For each of the unsigned integer types, MAX_UXXXX is defined. For
    the signed integer types, MAX_XXXX and MIN_XXXX are defined.
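
    The enumeration and the pointer-sized integer selection can be
    mirrored from Python with the standard ctypes and enum modules.
    This is an illustrative sketch, not the reference implementation:
    TypeNum, pick_intp, and signed_limits are hypothetical names, and
    the order in which candidate types are tried is an assumption,
    since the PEP only states that the choice depends on
    sizeof(void *).

```python
import ctypes
from enum import IntEnum

# Type numbers in the order listed in the text, counting from 0.
TypeNum = IntEnum(
    "TypeNum",
    "BOOL SBYTE UBYTE SHORT USHORT INT UINT LONG ULONG LONGLONG ULONGLONG "
    "FLOAT DOUBLE LONGDOUBLE CFLOAT CDOUBLE CLONGDOUBLE "
    "OBJECT STRING UNICODE VOID NTYPES NOTYPE",
    start=0,
)


def pick_intp():
    """Pick the signed integer type whose size equals sizeof(void *).

    Hypothetical helper: the candidate order (int, long, long long)
    is an assumption made for this sketch.
    """
    pointer_size = ctypes.sizeof(ctypes.c_void_p)
    candidates = [
        (TypeNum.INT, ctypes.c_int),
        (TypeNum.LONG, ctypes.c_long),
        (TypeNum.LONGLONG, ctypes.c_longlong),
    ]
    for type_num, ctype in candidates:
        if ctypes.sizeof(ctype) == pointer_size:
            return type_num
    raise RuntimeError("no integer type matches the pointer size")


def signed_limits(ctype):
    """Return (minimum, maximum) for a signed two's-complement ctypes integer."""
    bits = 8 * ctypes.sizeof(ctype)
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


def unsigned_max(ctype):
    """Return the maximum value for an unsigned ctypes integer type."""
    return (1 << (8 * ctypes.sizeof(ctype))) - 1
```

    Counting from PyArray_BOOL=0 places OBJECT at 17, which agrees with
    the explicit value given in the enumeration, and makes NTYPES equal
    to the number of real types that precede it.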
    The type characters are:

        PyArray_BOOLLTR = '?',
        PyArray_SBYTELTR = 'b',
        PyArray_UBYTELTR = 'B',
        PyArray_SHORTLTR = 'h',
        PyArray_USHORTLTR = 'H',
        PyArray_INTLTR = 'i',
        PyArray_UINTLTR = 'I',
        PyArray_LONGLTR = 'l',
        PyArray_ULONGLTR = 'L',
        PyArray_LONGLONGLTR = 'k',
        PyArray_ULONGLONGLTR = 'K',
        PyArray_FLOATLTR = 'f',
        PyArray_DOUBLELTR = 'd',
        PyArray_LONGDOUBLELTR = 'g',
        PyArray_CFLOATLTR = 'F',
        PyArray_CDOUBLELTR = 'D',
        PyArray_CLONGDOUBLELTR = 'G',
        PyArray_OBJECTLTR = 'O',
        PyArray_STRINGLTR = 'S',
        PyArray_UNICODELTR = 'U',
        PyArray_VOIDLTR = 'V',

        /* No Descriptor, just a define -- this lets Python users
           specify an array of integers large enough to hold a
           pointer for the platform */
        PyArray_INTPLTR = 'p',
        PyArray_UINTPLTR = 'P',

    For the most part, the character codes match those of Python's
    PyArg_ParseTuple and the struct module. However, signed integers
    are always lowercase and their unsigned counterparts are always
    uppercase.

C-API

    Macros

        Memory allocation

            Data buffer -- returns char *

                PyDataMem_NEW(size)
                PyDataMem_FREE(ptr)
                PyDataMem_RENEW(ptr, newsize)

            Dimensions (and strides) -- returns intp *

                PyDimMem_NEW(size)
                PyDimMem_FREE(ptr)
                PyDimMem_RENEW(ptr, newsize)

        Flags

            PyArray_CHKFLAGS(obj, FLAGS)
            PyArray_ISCONTIGUOUS(obj)
            PyArray_ISFORTRAN(obj)
            PyArray_ISWRITEABLE(obj)
            PyArray_ISNOTSWAPPED(obj)
            PyArray_ISCARRAY(obj)
            PyArray_ISFARRAY(obj)
            PyArray_ISBEHAVED(obj)

        Array C-structure access

            PyArray_NDIM(obj)
            PyArray_DATA(obj)
            PyArray_DIMS(obj)
            PyArray_STRIDES(obj)
            PyArray_DIM(obj,n)
            PyArray_STRIDE(obj,n)
            PyArray_BASE(obj)
            PyArray_DESCR(obj)
            PyArray_FLAGS(obj)
            PyArray_ITEMSIZE(obj)
            PyArray_TYPE(obj)
            PyArray_GETITEM(obj, itemptr)
            PyArray_SETITEM(obj, itemptr, v)

        TypeTests

            PyArray_Check(obj)
            PyArray_CheckExact(obj)
            PyArray_CheckScalar(obj)

            PyTypeNum_ISUNSIGNED(type)
            PyTypeNum_ISSIGNED(type)
            PyTypeNum_ISINTEGER(type)
            PyTypeNum_ISFLOAT(type)
            PyTypeNum_ISNUMBER(type)
            PyTypeNum_ISSTRING(type)
            PyTypeNum_ISCOMPLEX(type)
            PyTypeNum_ISPYTHON(type)
            PyTypeNum_ISFLEXIBLE(type)

            PyArray_ISUNSIGNED(type)
            PyArray_ISSIGNED(type)
            PyArray_ISINTEGER(type)
            PyArray_ISFLOAT(type)
            PyArray_ISNUMBER(type)
            PyArray_ISSTRING(type)
            PyArray_ISCOMPLEX(type)
            PyArray_ISPYTHON(type)
            PyArray_ISFLEXIBLE(type)

        Iterator

            PyArrayIter_Check(it)
            PyArray_ITER_RESET(it)
            PyArray_ITER_NEXT(it)
            PyArray_ITER_GOTO(it, destination)

        Misc

            MAX(a,b)
            MIN(a,b)
            tMAX(a,b,type) -- avoid repeating evaluation of a and b
            tMIN(a,b,type)
            PyArray_GETCONTIGUOUS(obj)
            PyArray_SIZE(obj)
            PyArray_NBYTES(obj)

    Functions

        Basic API

            PyArray_FromAny()
            PyArray_New()
            PyArray_AsCArray()
            PyArray_CopyArray()
            PyArray_CastToType()
            PyArray_CastTo()
            PyArray_INCREF()
            PyArray_DECREF()
            PyArray_SetStringFunction()
            PyArray_ObjectType()
            PyArray_SetNumericOps()
            PyArray_GetNumericOps()
            PyArray_UpdateFlags()
            PyArray_IterNew()
            PyArray_MultiplyList()
            PyArray_CompareLists()
            PyArray_TypecodeConverter()
            PyArray_Converter()
            PyArray_Resize()
            PyArray_Copy()
            PyArray_Zeros()

        API mimicking multiarraymodule.c calls

Reference Implementation

    The arrayobject of Numeric3 contained in multiarraymodule.c, along
    with the arrayobject.h header and the __multiarray_api.h header,
    is the in-development reference implementation for this PEP.

Copyright

    This document is placed in the public domain.