Using Generators for Stocksim Pipelines

Beginner

This tutorial comes from the open-source community.

Introduction

In this lab, you will learn how to leverage Python generators to build efficient data processing pipelines. Generators are a powerful Python feature that produces values on demand, eliminating the need to store all data in memory at once. You'll discover how to connect generators to create data processing workflows akin to Unix pipes.

The objectives of this lab are to understand the fundamentals of generator-based processing pipelines, create data processing workflows using Python generators, and filter and format real-time data streams. The ticker.py file will be created during this lab. Note that for this exercise, the stocksim.py program should be running in the background, and you'll use the follow() function from a previous exercise.

Basic Generator Pipeline with CSV Data

In this step, we're going to learn how to create a basic processing pipeline using generators. But first, let's understand what generators are. Generators are a special type of iterator in Python. Unlike regular iterators that might load all data into memory at once, generators produce values on demand. This is extremely useful when dealing with large data streams because it saves memory: instead of storing the entire dataset in memory, the generator produces values one by one as you need them.

Understanding Generators

A generator is essentially a function that returns an iterator. When you iterate over this iterator, it produces a sequence of values. The way you write a generator function is similar to a regular function, but there's a key difference. Instead of using the return statement, a generator function uses the yield statement. The yield statement has a unique behavior. It pauses the function and saves its current state. When the next value is requested, the function continues from where it left off. This allows the generator to produce values incrementally without having to start from the beginning every time.
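The pause-and-resume behavior of yield is easiest to see in a tiny example (an illustration only, not part of the lab files):

```python
# A minimal generator: yield pauses the function and saves its state.
def countdown(n):
    while n > 0:
        yield n        # pause here and hand n back to the caller
        n -= 1         # resume from this point on the next request

gen = countdown(3)
print(next(gen))   # 3
print(next(gen))   # 2
print(list(gen))   # [1] -- the remaining values
```

Each call to next() resumes the function body exactly where the previous yield left off, which is what lets a generator produce values incrementally.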

Using the follow() Function

The follow() function you created earlier works in a similar way to the Unix tail -f command. The tail -f command continuously monitors a file for new content, and so does the follow() function. Now, let's use it to create a simple processing pipeline.
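For reference, a follow() along the lines of the earlier exercise might look like the sketch below; your version from that exercise may differ in its details:

```python
import os
import time

def follow(filename):
    '''Yield new lines as they are appended to filename, like `tail -f`.
    A sketch of the earlier exercise, not necessarily your exact version.'''
    with open(filename, 'r') as f:
        f.seek(0, os.SEEK_END)      # start at the end of the file
        while True:
            line = f.readline()
            if line == '':
                time.sleep(0.1)     # no new data yet; wait briefly
                continue
            yield line
```

Because the body contains yield, calling follow() doesn't run any of this code yet; it just creates a generator that produces lines on demand.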

Step 1: Open a new terminal window

First, open a new terminal window in the WebIDE. You can do this by going to Terminal → New Terminal. This new terminal will be where we'll run our Python commands.

Step 2: Start a Python interactive shell

Once the new terminal is open, start a Python interactive shell. You can do this by entering the following command in the terminal:

python3

The Python interactive shell allows you to run Python code line by line and see the results immediately.

Step 3: Import the follow function and set up the pipeline

Now, we'll import the follow function and set up a basic pipeline to read the stock data. In the Python interactive shell, enter the following code:

>>> from follow import follow
>>> import csv
>>> lines = follow('stocklog.csv')
>>> rows = csv.reader(lines)
>>> for row in rows:
...     print(row)
...

Here's what each line does:

  • from follow import follow: This imports the follow function from the follow module.
  • import csv: This imports the csv module, which is used to read and write CSV files in Python.
  • lines = follow('stocklog.csv'): This calls the follow function with the file name stocklog.csv. The follow function returns a generator that yields new lines as they're added to the file.
  • rows = csv.reader(lines): The csv.reader() function takes the lines generated by the follow function and parses them into rows of CSV data.
  • The for loop iterates through these rows and prints each one.

Step 4: Check the output

After running the code, you should see output similar to this (your data will vary):

['BA', '98.35', '6/11/2007', '09:41.07', '0.16', '98.25', '98.35', '98.31', '158148']
['AA', '39.63', '6/11/2007', '09:41.07', '-0.03', '39.67', '39.63', '39.31', '270224']
['XOM', '82.45', '6/11/2007', '09:41.07', '-0.23', '82.68', '82.64', '82.41', '748062']
['PG', '62.95', '6/11/2007', '09:41.08', '-0.12', '62.80', '62.97', '62.61', '454327']
...

This output indicates that you've successfully created a data pipeline. The follow() function generates lines from the file, and these lines are then passed to the csv.reader() function, which parses them into rows of data.

If you've seen enough output, you can stop the execution by pressing Ctrl+C.

What's Happening?

Let's break down what's going on in this pipeline:

  1. follow('stocklog.csv') creates a generator. This generator keeps track of the stocklog.csv file and yields new lines as they're added to the file.
  2. csv.reader(lines) takes the lines generated by the follow function and parses them into CSV row data. It understands the structure of CSV files and splits the lines into individual values.
  3. The for loop then iterates through these rows, printing each one. This allows you to see the data in a readable format.

This is a simple example of a data processing pipeline using generators. In the next steps, we'll build more complex and useful pipelines.
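One detail worth noting: csv.reader() doesn't require a file object; any iterable of strings will do, which is exactly why it can sit downstream of the follow() generator. A self-contained illustration with made-up sample lines:

```python
import csv

# csv.reader accepts any iterable of strings, not just files --
# this is what lets it consume the follow() generator directly.
lines = iter(['"BA",98.35,158148\n', '"AA",39.63,270224\n'])
rows = csv.reader(lines)
for row in rows:
    print(row)
# ['BA', '98.35', '158148']
# ['AA', '39.63', '270224']
```

Swap the hard-coded lines for follow('stocklog.csv') and you have the pipeline from this step.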

Creating the Ticker Class

In data processing, working with raw data can be quite challenging. To make our work with stock data more organized and efficient, we'll define a proper class to represent stock quotes. This class will serve as a blueprint for our stock data, making our data processing pipeline more robust and easier to manage.

Creating the ticker.py File

  1. First, we need to create a new file in the WebIDE. You can do this by clicking on the "New File" icon or right-clicking in the file explorer and selecting "New File". Name this file ticker.py. This file will hold the code for our Ticker class.

  2. Now, let's add the following code to your newly created ticker.py file. This code will define our Ticker class and set up a simple processing pipeline to test it.

## ticker.py

from structure import Structure, String, Float, Integer

class Ticker(Structure):
    name = String()
    price = Float()
    date = String()
    time = String()
    change = Float()
    open = Float()
    high = Float()
    low = Float()
    volume = Integer()

if __name__ == '__main__':
    from follow import follow
    import csv
    lines = follow('stocklog.csv')
    rows = csv.reader(lines)
    records = (Ticker.from_row(row) for row in rows)
    for record in records:
        print(record)

  3. After adding the code, save the file. You can do this by pressing Ctrl+S or selecting "File" → "Save" from the menu. Saving the file ensures that your changes are preserved and can be run later.

Understanding the Code

Let's take a closer look at what this code does step by step:

  1. At the beginning of the code, we're importing Structure and field types from the structure.py module. This module has already been set up for you. These imports are essential because they provide the building blocks for our Ticker class. The Structure class will be the base class for our Ticker class, and the field types like String, Float, and Integer will define the data types of our stock data fields.

  2. Next, we define a Ticker class that inherits from Structure. This class has several fields that represent different aspects of the stock data:

    • name: This field stores the stock symbol, such as "IBM" or "AAPL". It helps us identify which company's stock we're dealing with.
    • price: It holds the current price of the stock. This is a crucial piece of information for investors.
    • date and time: These fields tell us when the stock quote was generated. Knowing the time and date is important for analyzing stock price trends over time.
    • change: This represents the price change of the stock. It shows whether the stock price has gone up or down compared to a previous point.
    • open, high, low: These fields represent the opening price, the highest price, and the lowest price of the stock during a certain period. They give us an idea of the stock's price range.
    • volume: This field stores the number of shares traded. High trading volume can indicate strong market interest in a particular stock.
  3. In the if __name__ == '__main__': block, we set up a processing pipeline. This block of code will be executed when we run the ticker.py file directly.

    • follow('stocklog.csv') is a function that generates lines from the stocklog.csv file. It allows us to read the file line by line.
    • csv.reader(lines) takes these lines and parses them into row data. CSV (Comma-Separated Values) is a common file format for storing tabular data, and this function helps us extract the data from each row.
    • (Ticker.from_row(row) for row in rows) is a generator expression. It takes each row of data and converts it into a Ticker object. This way, we transform the raw CSV data into structured objects that are easier to work with.
    • The for loop iterates over these Ticker objects and prints each one. This allows us to see the structured data in action.
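The structure module is provided for you, so you don't need to write from_row yourself. Conceptually, though, it just converts each CSV string to the field's declared type and builds an instance. A simplified, hypothetical sketch of that idea (not the provided implementation):

```python
# Simplified sketch of what Structure.from_row does conceptually;
# the provided structure.py is more general than this.
class MiniTicker:
    _fields = ('name', 'price', 'date', 'time', 'change',
               'open', 'high', 'low', 'volume')
    _types = (str, float, str, str, float, float, float, float, int)

    def __init__(self, *values):
        for name, value in zip(self._fields, values):
            setattr(self, name, value)

    @classmethod
    def from_row(cls, row):
        # Convert each raw CSV string using the matching type
        return cls(*(ty(val) for ty, val in zip(cls._types, row)))

row = ['IBM', '103.53', '6/11/2007', '09:53.59',
       '0.46', '102.87', '103.53', '102.77', '541633']
t = MiniTicker.from_row(row)
print(t.name, t.price, t.volume)  # IBM 103.53 541633
```

The key point is that the strings coming out of csv.reader() become properly typed attributes, so later stages can compare rec.change < 0 numerically.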

Running the Code

Let's run the code to see how it works:

  1. First, we need to make sure we're in the project directory in the terminal. If you're not already there, use the following command to navigate to it:

    cd /home/labex/project
  2. Once you're in the correct directory, run the ticker.py script using the following command:

    python3 ticker.py
  3. After running the script, you should see output similar to this (your data will vary):

    Ticker(IBM, 103.53, 6/11/2007, 09:53.59, 0.46, 102.87, 103.53, 102.77, 541633)
    Ticker(MSFT, 30.21, 6/11/2007, 09:54.01, 0.16, 30.05, 30.21, 29.95, 7562516)
    Ticker(AA, 40.01, 6/11/2007, 09:54.01, 0.35, 39.67, 40.15, 39.31, 576619)
    Ticker(T, 40.1, 6/11/2007, 09:54.08, -0.16, 40.2, 40.19, 39.87, 1312959)

You can stop the execution of the script by pressing Ctrl+C when you've seen enough output.

Notice how the raw CSV data has been transformed into structured Ticker objects. This transformation makes the data much easier to work with in our processing pipeline, as we can now access and manipulate the stock data using the fields defined in the Ticker class.

Building a More Complex Data Pipeline

Now, we're going to take our data pipeline to the next level by adding filtering and improving the presentation of the data. This will make it easier to analyze and understand the information we're working with. We'll be making changes to our ticker.py script. Filtering the data will help us focus on the specific information we're interested in, and presenting it in a nicely formatted table will make it more readable.

Updating the ticker.py File

  1. First, open your ticker.py file in the WebIDE. The WebIDE is a tool that allows you to write and edit code directly in your browser. It provides a convenient environment for making changes to your Python scripts.

  2. Next, we need to replace the if __name__ == '__main__': block in the ticker.py file with the following code. This block of code is the entry point of our script, and by replacing it, we'll be changing how the script processes and displays the data.

if __name__ == '__main__':
    from follow import follow
    import csv
    from tableformat import create_formatter, print_table

    formatter = create_formatter('text')

    lines = follow('stocklog.csv')
    rows = csv.reader(lines)
    records = (Ticker.from_row(row) for row in rows)
    negative = (rec for rec in records if rec.change < 0)
    print_table(negative, ['name', 'price', 'change'], formatter)

  3. After making these changes, save the file. You can do this by pressing Ctrl+S on your keyboard or by selecting "File" → "Save" from the menu. Saving the file ensures that your changes are preserved and can be run later.

Understanding the Enhanced Pipeline

Let's take a closer look at what this enhanced pipeline does. Understanding each step will help you see how the different parts of the code work together to process and display the data.

  1. We start by importing create_formatter and print_table from the tableformat module. This module is already set up for you, and it provides functions that help us format and print the data in a nice table.

  2. Then, we create a text formatter using create_formatter('text'). This formatter will be used to format the data in a way that's easy to read.

  3. Now, let's break down the pipeline step by step:

    • follow('stocklog.csv') is a function that generates lines from the stocklog.csv file. It continuously monitors the file for new data and provides the lines one by one.
    • csv.reader(lines) takes the lines generated by follow and parses them into row data. This is necessary because the data in the CSV file is in a text format, and we need to convert it into a structured format that we can work with.
    • (Ticker.from_row(row) for row in rows) is a generator expression that converts each row of data into a Ticker object. A Ticker object represents a stock and contains information such as the stock's name, price, and change.
    • (rec for rec in records if rec.change < 0) is another generator expression that filters the Ticker objects. It only keeps the objects where the stock's price change is negative. This allows us to focus on the stocks that have decreased in price.
    • print_table(negative, ['name', 'price', 'change'], formatter) takes the filtered Ticker objects and formats them into a table using the formatter we created earlier. It then prints the table to the console.
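The tableformat module is likewise provided for you, but a minimal sketch of what a text formatter and print_table might look like can make the pipeline's last stage less mysterious (a simplified illustration, not the provided implementation):

```python
class TextTableFormatter:
    '''Minimal sketch of a plain-text table formatter; the provided
    tableformat module is more capable than this.'''
    def headings(self, headers):
        print(' '.join(f'{h:>10}' for h in headers))
        print(' '.join('-' * 10 for _ in headers))

    def row(self, rowdata):
        print(' '.join(f'{d:>10}' for d in rowdata))

def print_table(records, fields, formatter):
    formatter.headings(fields)
    for record in records:
        # Pull each named field off the record object by attribute name
        formatter.row([str(getattr(record, name)) for name in fields])
```

Note that print_table simply iterates over whatever it is given, so handing it the negative generator keeps the whole pipeline lazy.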

This pipeline demonstrates the power of generators. Instead of loading all the data from the file into memory at once, we're chaining together multiple operations (reading, parsing, converting, filtering) and processing the data one item at a time. This saves memory and makes the code more efficient.
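You can observe this one-item-at-a-time behavior directly by instrumenting a toy chain (an illustration, not part of ticker.py):

```python
def source():
    # Announce each value as it is produced
    for n in [1, -2, 3, -4]:
        print(f'producing {n}')
        yield n

# Each stage pulls one item at a time from the stage before it
negatives = (n for n in source() if n < 0)
for n in negatives:
    print(f'consuming {n}')

# producing 1
# producing -2
# consuming -2
# producing 3
# producing -4
# consuming -4
```

The interleaved output shows that nothing is produced until the consumer asks for it, and that no intermediate list of all values ever exists.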

Running the Enhanced Pipeline

Let's run the updated code to see the results.

  1. First, make sure you're in the project directory in the terminal. If you're not already there, you can navigate to it using the following command:

    cd /home/labex/project
  2. Once you're in the project directory, run the ticker.py script using the following command:

    python3 ticker.py
  3. After running the script, you should see a nicely formatted table in the terminal. This table shows only the stocks with negative price changes.

           name      price     change
     ---------- ---------- ----------
              C      53.12      -0.21
            UTX      70.04      -0.19
            AXP      62.86      -0.18
            MMM      85.72      -0.22
            MCD      51.38      -0.03
            WMT      49.85      -0.23
             KO       51.6      -0.07
            AIG      71.39      -0.14
             PG      63.05      -0.02
             HD      37.76      -0.19

If you've seen enough output and want to stop the execution of the script, you can press Ctrl+C on your keyboard.

The Power of Generator Pipelines

What we've created here is a powerful data processing pipeline. Let's summarize what it does:

  1. It continuously monitors the stocklog.csv file for new data. This means that as new data is added to the file, the pipeline will automatically process it.
  2. It parses the CSV data from the file into structured Ticker objects. This makes it easier to work with the data and perform operations on it.
  3. It filters the data based on a specific criterion, in this case, negative price changes. This allows us to focus on the stocks that are losing value.
  4. It formats and presents the filtered data in a readable table. This makes it easy to analyze the data and draw conclusions.

One of the key advantages of using generators in this pipeline is that it uses minimal memory. Generators produce values on-demand, which means they don't store all the data in memory at once. This is similar to Unix pipes, where each component processes the data and passes it on to the next component.

You can think of generators as Lego blocks. Just like you can stack Lego blocks together to create different structures, you can combine generators to create powerful data processing workflows. This modular approach allows you to build complex systems from simple, reusable components.
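That modularity can be made literal by packaging each stage as a small generator function and snapping the stages together (a hypothetical example, not part of the lab files):

```python
def to_upper(words):
    # Stage: normalize every item to uppercase
    for w in words:
        yield w.upper()

def longer_than(words, n):
    # Stage: keep only items longer than n characters
    for w in words:
        if len(w) > n:
            yield w

# Snap the stages together like blocks; order and combination are up to you
words = ['aapl', 'ibm', 'msft', 'ge']
pipeline = longer_than(to_upper(words), 3)
print(list(pipeline))  # ['AAPL', 'MSFT']
```

Each stage is independently testable and reusable, and chaining them costs almost nothing because every stage still processes one item at a time.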

Summary

In this lab, you have learned how to use Python generators to build efficient data processing pipelines. You completed several important tasks, such as using the follow() function to monitor a file for new data, creating a Ticker class to represent stock quotes, and constructing a multi-stage processing pipeline that reads, parses, and filters CSV data, then formats and displays the results.

The generator-based approach offers multiple advantages: memory efficiency, since data is processed on demand; modularity, since pipeline components are easy to combine and reuse; and simplicity in expressing complex data flows. These concepts are commonly applied in real-world data processing, especially for large datasets or streaming data.