Advanced Einsum Operations
Now that we're comfortable with basic einsum operations, let's explore some more advanced applications. These operations demonstrate the true power and flexibility of the einsum function.
Extracting the diagonal elements of a matrix is a common operation in linear algebra. For a matrix A, its diagonal elements form a vector d where:
d_i = A_{ii}
Here's how to extract the diagonal using einsum:
```python
import numpy as np

# Create a random square matrix
A = np.random.rand(4, 4)
print("Matrix A:")
print(A)

# Extract the diagonal using einsum
diagonal = np.einsum('ii->i', A)
print("\nDiagonal elements using einsum:")
print(diagonal)

# Verify with NumPy's diagonal function
numpy_diagonal = np.diagonal(A)
print("\nDiagonal elements using np.diagonal():")
print(numpy_diagonal)
```
The notation 'ii->i' means:
- ii represents the repeated index that selects the diagonal elements of A
- ->i means we extract these elements into a 1D array
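To make the index mapping concrete, here is a small sketch using a matrix whose diagonal is known in advance (the 3x3 example matrix below is our own, not part of the tutorial's running example):

```python
import numpy as np

# A 3x3 matrix with easily recognizable entries
M = np.arange(9).reshape(3, 3)
# M = [[0, 1, 2],
#      [3, 4, 5],
#      [6, 7, 8]]

# 'ii->i' walks the positions where both indices agree: M[0,0], M[1,1], M[2,2]
d = np.einsum('ii->i', M)
print(d)  # [0 4 8]
```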
Matrix Trace
The trace of a matrix is the sum of its diagonal elements. For a matrix A, its trace is:
\text{trace}(A) = \sum_i A_{ii}
Here's how to calculate the trace using einsum:
```python
# Using the same matrix A from above
trace = np.einsum('ii->', A)
print("Trace of matrix A using einsum:", trace)

# Verify with NumPy's trace function
numpy_trace = np.trace(A)
print("Trace of matrix A using np.trace():", numpy_trace)
```
The notation 'ii->' means:
- ii represents the repeated index for the diagonal elements
- The empty output after -> means we sum all diagonal elements, producing a scalar
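One advantage of the einsum spelling is that it generalizes directly: adding a batch index gives the trace of every matrix in a stack in a single call. A minimal sketch (the stack T below is our own example):

```python
import numpy as np

# A stack of three 4x4 matrices
T = np.random.rand(3, 4, 4)

# 'bii->b': repeat i within each matrix, keep the batch index b
batch_traces = np.einsum('bii->b', T)

# Same result with np.trace applied over the last two axes
print(np.allclose(batch_traces, np.trace(T, axis1=1, axis2=2)))  # True
```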
Batch Matrix Multiplication
einsum really shines when performing operations on higher-dimensional arrays. For example, batch matrix multiplication involves multiplying pairs of matrices from two batches.
If we have a batch of matrices A with shape (n, m, p) and a batch of matrices B with shape (n, p, q), batch matrix multiplication gives us a result C with shape (n, m, q):
C_{ijk} = \sum_l A_{ijl} \times B_{ilk}
Here's how to perform batch matrix multiplication using einsum:
```python
# Create batches of matrices
n, m, p, q = 5, 3, 4, 2  # Batch size and matrix dimensions
A = np.random.rand(n, m, p)  # Batch of 5 matrices, each 3x4
B = np.random.rand(n, p, q)  # Batch of 5 matrices, each 4x2
print("Shape of batch A:", A.shape)
print("Shape of batch B:", B.shape)

# Batch matrix multiplication using einsum
C = np.einsum('nmp,npq->nmq', A, B)
print("\nShape of result batch C:", C.shape)  # Should be (5, 3, 2)

# Let's check the first matrix multiplication in the batch
print("\nFirst result matrix from batch using einsum:")
print(C[0])

# Verify with NumPy's matmul function
numpy_batch_matmul = np.matmul(A, B)
print("\nFirst result matrix from batch using np.matmul:")
print(numpy_batch_matmul[0])
```
The notation 'nmp,npq->nmq' means:
- nmp represents the indices of batch A (n for batch, m for rows, p for columns)
- npq represents the indices of batch B (n for batch, p for rows, q for columns)
- nmq represents the indices of the output batch C (n for batch, m for rows, q for columns)
- The repeated index p is summed over (the matrix multiplication)
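The summation over p can also be checked by hand: each output slice C[i] is an ordinary matrix product A[i] @ B[i]. A quick sketch of that equivalence:

```python
import numpy as np

n, m, p, q = 5, 3, 4, 2
A = np.random.rand(n, m, p)
B = np.random.rand(n, p, q)

C = np.einsum('nmp,npq->nmq', A, B)

# Each batch element is an independent matrix product
C_loop = np.stack([A[i] @ B[i] for i in range(n)])
print(np.allclose(C, C_loop))  # True
```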
Why Use Einsum?
You might wonder why we should use einsum when NumPy already provides specialized functions for these operations. Here are some advantages:
- Unified Interface: einsum provides a single function for many array operations
- Flexibility: it can express operations that would otherwise require multiple steps
- Readability: once you understand the notation, the code becomes more concise
- Performance: in many cases, einsum operations are optimized and efficient
For complex tensor operations, einsum often provides the clearest and most direct implementation.
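As one closing sketch of that flexibility, a chain of matrix products followed by a trace can be collapsed into a single einsum call; passing optimize=True lets NumPy pick an efficient contraction order (the matrices below are our own example):

```python
import numpy as np

X = np.random.rand(10, 20)
Y = np.random.rand(20, 30)
Z = np.random.rand(30, 10)

# trace(X @ Y @ Z) in one expression: sum over i, j, k of X[i,j] * Y[j,k] * Z[k,i]
t = np.einsum('ij,jk,ki->', X, Y, Z, optimize=True)

# The multi-step equivalent
print(np.isclose(t, np.trace(X @ Y @ Z)))  # True
```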