Introduction
The collections
module provides a number of useful objects for data handling. This part briefly introduces some of these features.
This tutorial is from open-source community. Access the source code
The collections
module provides a number of useful objects for data handling. This part briefly introduces some of these features.
Let's say you want to tabulate the total shares of each stock.
portfolio = [
('GOOG', 100, 490.1),
('IBM', 50, 91.1),
('CAT', 150, 83.44),
('IBM', 100, 45.23),
('GOOG', 75, 572.45),
('AA', 50, 23.15)
]
There are two IBM
entries and two GOOG
entries in this list. The shares need to be combined together somehow.
Solution: Use a Counter
.
from collections import Counter
total_shares = Counter()
for name, shares, price in portfolio:
total_shares[name] += shares
total_shares['IBM'] ## 150
Problem: You want to map a key to multiple values.
portfolio = [
('GOOG', 100, 490.1),
('IBM', 50, 91.1),
('CAT', 150, 83.44),
('IBM', 100, 45.23),
('GOOG', 75, 572.45),
('AA', 50, 23.15)
]
Like in the previous example, the key IBM
should have two different tuples instead.
Solution: Use a defaultdict
.
from collections import defaultdict
holdings = defaultdict(list)
for name, shares, price in portfolio:
holdings[name].append((shares, price))
holdings['IBM'] ## [ (50, 91.1), (100, 45.23) ]
The defaultdict
ensures that every time you access a key you get a default value.
Problem: We want a history of the last N things. Solution: Use a deque
.
from collections import deque
history = deque(maxlen=N)
with open(filename) as f:
for line in f:
history.append(line)
...
The collections
module might be one of the most useful library modules for dealing with special purpose kinds of data handling problems such as tabulating and indexing.
In this exercise, we'll look at a few simple examples. Start by running your report.py
program so that you have the portfolio of stocks loaded in the interactive mode.
$ python3 -i report.py
Suppose you wanted to tabulate the total number of shares of each stock. This is easy using Counter
objects. Try it:
>>> portfolio = read_portfolio('portfolio.csv')
>>> from collections import Counter
>>> holdings = Counter()
>>> for s in portfolio:
holdings[s['name']] += s['shares']
>>> holdings
Counter({'MSFT': 250, 'IBM': 150, 'CAT': 150, 'AA': 100, 'GE': 95})
>>>
Carefully observe how the multiple entries for MSFT
and IBM
in portfolio
get combined into a single entry here.
You can use a Counter just like a dictionary to retrieve individual values:
>>> holdings['IBM']
150
>>> holdings['MSFT']
250
>>>
If you want to rank the values, do this:
>>> ## Get three most held stocks
>>> holdings.most_common(3)
[('MSFT', 250), ('IBM', 150), ('CAT', 150)]
>>>
Let's grab another portfolio of stocks and make a new Counter:
>>> portfolio2 = read_portfolio('portfolio2.csv')
>>> holdings2 = Counter()
>>> for s in portfolio2:
holdings2[s['name']] += s['shares']
>>> holdings2
Counter({'HPQ': 250, 'GE': 125, 'AA': 50, 'MSFT': 25})
>>>
Finally, let's combine all of the holdings doing one simple operation:
>>> holdings
Counter({'MSFT': 250, 'IBM': 150, 'CAT': 150, 'AA': 100, 'GE': 95})
>>> holdings2
Counter({'HPQ': 250, 'GE': 125, 'AA': 50, 'MSFT': 25})
>>> combined = holdings + holdings2
>>> combined
Counter({'MSFT': 275, 'HPQ': 250, 'GE': 220, 'AA': 150, 'IBM': 150, 'CAT': 150})
>>>
This is only a small taste of what counters provide. However, if you ever find yourself needing to tabulate values, you should consider using one.
The collections
module is one of the most useful library modules in all of Python. In fact, we could do an extended tutorial on just that. However, doing so now would also be a distraction. For now, put collections
on your list of bedtime reading for later.
Congratulations! You have completed the Collections Module lab. You can practice more labs in LabEx to improve your skills.