NumPy Structured Arrays


Introduction

In this lab, you will learn about structured arrays in NumPy. Structured arrays are a powerful feature for working with heterogeneous data, similar to tables in a database or a spreadsheet. Each element of a structured array can be thought of as a row, with named columns called "fields". This makes them ideal for organizing and manipulating tabular data directly within Python.

Throughout this lab, you will write and execute Python code in the structured_arrays.py file provided in the WebIDE.

Creating and Accessing a Structured Array

First, let's create a simple structured array. A structured array's data type (dtype) is defined as a list of tuples. Each tuple specifies a field with its (name, data_type). This allows us to store different data types, like strings and integers, in the same array.

Open the structured_arrays.py file from the file explorer on the left panel. Add the following code to create a structured array representing a list of people with their names, ages, and weights.

import numpy as np

## Create a structured array
data = np.array([('Alice', 25, 55.5), ('Bob', 30, 68.0)],
                dtype=[('name', 'U10'), ('age', 'i4'), ('weight', 'f4')])

print("Original Array:")
print(data)

## Access a specific field by its name
names = data['name']
print("\nNames field:")
print(names)

Code Explanation:

  • import numpy as np: This line imports the NumPy library.
  • np.array([...], dtype=[...]): We create an array. The first argument is a list of tuples, where each tuple ('Alice', 25, 55.5) represents one row of data.
  • dtype=[('name', 'U10'), ('age', 'i4'), ('weight', 'f4')]: This is the crucial part. We define three fields:
    • 'name': A Unicode string with a maximum length of 10 characters (U10).
    • 'age': A 4-byte (32-bit) integer (i4).
    • 'weight': A 4-byte (32-bit) float (f4).
  • data['name']: We can access all values from a specific field (column) by using its name as an index. The result is a NumPy array that is a view of the original data, so changes made through it also appear in data.
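
As a minimal sketch of the points above, the snippet below inspects a dtype and checks two behaviors worth knowing: a field access shares memory with its parent array, and strings longer than the declared width are truncated. The array name people and its sample values are illustrative assumptions, kept separate from the lab's data array so the lab output is unaffected.

```python
import numpy as np

# Hypothetical sample array, separate from the lab's `data`
people = np.array([('Alice', 25, 55.5), ('Bob', 30, 68.0)],
                  dtype=[('name', 'U10'), ('age', 'i4'), ('weight', 'f4')])

# Field names and the size in bytes of one row (40 + 4 + 4)
print(people.dtype.names)     # ('name', 'age', 'weight')
print(people.dtype.itemsize)  # 48

# Accessing a field gives a view: writing to it changes the parent array
ages = people['age']
ages[0] = 99
print(people['age'][0])       # 99

# A value longer than the declared width (U10) is silently truncated
people['name'][1] = 'Bartholomew'
print(people['name'][1])      # Bartholome
```

If a field may hold longer strings, choosing a wider type such as 'U32' up front avoids the silent truncation.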

Now, save the file and run it from the terminal to see the output.

python structured_arrays.py

You should see the following output, which shows the full structured array and the array containing only the names.

Original Array:
[('Alice', 25, 55.5) ('Bob', 30, 68. )]

Names field:
['Alice' 'Bob']

Modifying Fields and Indexing

Structured arrays are mutable, meaning you can change their values. You can modify an entire field at once, or index a single element (row) and work with its fields. You can also select a subset of the original fields by passing a list of field names.

Add the following code to the end of your structured_arrays.py script.

## Modify the 'age' field
data['age'] = [26, 31]
print("\nArray after modifying age:")
print(data)

## Access a single element (the first row)
first_person = data[0]
print("\nFirst person's data:")
print(first_person)

## Select a subset of fields
subset = data[['name', 'weight']]
print("\nSubset of array (name and weight):")
print(subset)

Code Explanation:

  • data['age'] = [26, 31]: This assigns a new list of values to the age field, updating the entire column.
  • data[0]: This accesses the first element (row) of the array. The result is a NumPy void scalar, which holds the data for that single row.
  • data[['name', 'weight']]: Passing a list of field names selects multiple columns. The result is a structured array containing only those fields; in NumPy 1.16 and later it is a view of the original data rather than a copy.
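
The indexing forms above combine with ordinary NumPy indexing. The following sketch uses a hypothetical people array (not the lab's data, so the expected lab output stays the same) to update one field of one row, filter rows with a boolean mask, and sort by a field.

```python
import numpy as np

# Hypothetical sample array, separate from the lab's `data`
people = np.array([('Alice', 26, 55.5), ('Bob', 31, 68.0), ('Cara', 19, 61.2)],
                  dtype=[('name', 'U10'), ('age', 'i4'), ('weight', 'f4')])

# Update a single field of a single row
people['age'][0] = 27

# A structured scalar acts like a view: writing through it changes the array
row = people[1]
row['age'] = 32
print(people['age'])           # [27 32 19]

# Keep only the rows whose age is above 25
over_25 = people[people['age'] > 25]
print(over_25['name'])         # ['Alice' 'Bob']

# Return a sorted copy, ordered by the 'age' field
by_age = np.sort(people, order='age')
print(by_age['name'])          # ['Cara' 'Alice' 'Bob']
```

Boolean masking and np.sort both return new arrays, so the original people array keeps its row order.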

Save the file and run it again from the terminal.

python structured_arrays.py

Your output will now include the new sections, showing the modified array and the subset.

... (previous output) ...

Array after modifying age:
[('Alice', 26, 55.5) ('Bob', 31, 68. )]

First person's data:
('Alice', 26, 55.5)

Subset of array (name and weight):
[('Alice', 55.5) ('Bob', 68. )]

Using Record Arrays for Attribute Access

While indexing by name (e.g., data['name']) is powerful, it can be verbose. NumPy provides a special subclass of ndarray called a record array (np.recarray). Record arrays allow you to access fields as attributes, using dot notation (e.g., record_array.name), which can make your code cleaner and more readable.

You can create a record array directly (for example, with np.rec.array) or convert an existing structured array. Here we convert the array we already have. Add the following code to the end of structured_arrays.py.

## Convert the structured array to a record array using view()
record_array = data.view(np.recarray)

print("\nType of the new view:")
print(type(record_array))

## Access fields using attribute (dot) notation
print("\nAccessing names via attribute:")
print(record_array.name)

print("\nAccessing ages via attribute:")
print(record_array.age)

Code Explanation:

  • data.view(np.recarray): The .view() method creates a new array object that looks at the same data. By specifying np.recarray, we get a record array view of our structured array data. No data is copied; it's just a different way to interact with it.
  • record_array.name: This is the key feature of record arrays. You can access the name field as if it were an attribute of the object. This is equivalent to record_array['name'].
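
For comparison, here is a short sketch of the other route: building a record array directly with np.rec.array, followed by a check that a .view() conversion shares memory with its source. The names rec, plain, and rec_view are illustrative assumptions and are not part of the lab script.

```python
import numpy as np

fields = [('name', 'U10'), ('age', 'i4'), ('weight', 'f4')]
rows = [('Alice', 25, 55.5), ('Bob', 30, 68.0)]

# Build a record array directly from a list of tuples
rec = np.rec.array(rows, dtype=fields)
print(rec.age)                              # [25 30]

# Convert an ordinary structured array with view(); no data is copied
plain = np.array(rows, dtype=fields)
rec_view = plain.view(np.recarray)
print(np.shares_memory(plain, rec_view))    # True

# Writing through the record-array view changes the original array
rec_view.age[0] = 40
print(plain['age'])                         # [40 30]
```

Bracket indexing such as rec['age'] still works on record arrays, which is useful when a field name clashes with an existing ndarray attribute or method.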

Save the file and execute it.

python structured_arrays.py

The output will now show the type of the new array view and demonstrate attribute access.

... (previous output) ...

Type of the new view:
<class 'numpy.recarray'>

Accessing names via attribute:
['Alice' 'Bob']

Accessing ages via attribute:
[26 31]

Summary

In this lab, you have learned the fundamentals of using structured arrays in NumPy. You started by creating a structured array with named fields and multiple data types. You then practiced accessing specific fields (columns) using dictionary-style key indexing, modifying their values, and selecting a subset of fields. Finally, you explored record arrays, a convenient alternative that allows accessing fields as attributes, and learned how to convert a structured array into a record array using the .view() method.

Structured arrays are an essential tool for handling complex, tabular datasets in scientific computing and data analysis with Python.