Importing Data with Genfromtxt

PythonPythonBeginner
Practice Now

This tutorial is from open-source community. Access the source code

Introduction

In this lab, we will learn how to import data using the numpy.genfromtxt function. This function allows us to read tabular data from various sources and convert it into NumPy arrays. We will explore different options for defining the input, splitting the lines into columns, choosing columns, setting the data type, and tweaking the conversion.

Note: You can write code in 03-io-genfromtxt.ipynb. Some printing operations are omitted in the steps, and you can print output as needed.

Importing the Required Libraries

First, let's import the necessary libraries. We will use the numpy library for creating arrays and the io module from the io library for creating file-like objects.

import numpy as np
from io import StringIO

Defining the Input

The numpy.genfromtxt function requires the source of the data as the only mandatory argument. It can be a string, a list of strings, a generator, or an open file-like object with a read method.

data = "1, 2, 3\n4, 5, 6"

Splitting the Lines into Columns

The delimiter argument is used to define how the lines should be split into columns. By default, numpy.genfromtxt assumes delimiter=None, meaning that the line is split along white spaces (including tabs).

np.genfromtxt(StringIO(data), delimiter=",")

Choosing Columns

The usecols argument is used to select which columns to import. It accepts a single integer or a sequence of integers corresponding to the indices of the columns to import.

np.genfromtxt(StringIO(data), usecols=(0, -1))

Setting the Data Type

The dtype argument is used to control how the strings are converted to other types. It can be a single type, a sequence of types, a comma-separated string, a dictionary, a sequence of tuples, an existing numpy.dtype object, or None to determine the type from the data itself.

np.genfromtxt(StringIO(data), dtype=float)

Tweaking the Conversion

The converters argument allows us to define conversion functions to handle more complex conversions. It accepts a dictionary with column indices or column names as keys and conversion functions as values.

convertfunc = lambda x: float(x.strip(b"%"))/100.
np.genfromtxt(StringIO(data), converters={1: convertfunc})

Using Missing and Filling Values

The missing_values and filling_values arguments are used to handle missing data. The missing_values argument is used to recognize missing data, and the filling_values argument is used to provide a value for missing entries.

np.genfromtxt(StringIO(data), missing_values="N/A", filling_values=0)

Using Shortcut Functions

The numpy.lib.npyio module provides shortcut functions derived from numpy.genfromtxt. These functions have different default values and return either a standard NumPy array or a masked array.

from numpy.lib.npyio import recfromtxt

recfromtxt(StringIO(data), delimiter=",")

Summary

In this lab, we learned how to import data using the numpy.genfromtxt function. We explored different options for defining the input, splitting the lines into columns, choosing columns, setting the data type, and tweaking the conversion. We also learned about shortcut functions provided by the numpy.lib.npyio module.

Other Python Tutorials you may like