Source

Dictionaries

Introduction to Dictionaries

Dictionaries are associative data structures, storing key/value pairs. Given a key, the corresponding value can be retrieved. For example, the following dictionary maps English colour names to French ones.

fr_colours = { 'red': 'rouge', 'blue': 'bleu',
       'black': 'noir', 'white': 'blanc',
       'yellow': 'jaune'}

To retrieve the value corresponding to a given key, the notation is the same as when retrieving an element from a list or tuple. When indexing an array, you can only For example, to get the French word for ‘white’, you would write fr_colours[‘white’] in Python code.

A dictionary can use any immutable Python value as the key, and there are no restrictions on what you can store as values. That means that keys can be integers, floating point numbers, strings, or tuples; lists are the most important type that //can’t// be used as keys.

Dictionaries are implemented as resizable hash tables. Hash tables are data structures optimized for quick retrieval; the result of a semi-random hash function is computed for each key, and the value is stored in a bin given by the hash function’s result. Retrieving a key is then a matter of calculating the hash function and looking in the corresponding bin, so retrieval always takes the same amount of time no matter how large the dictionary is. (At least, this is true unless you have a pathological set of keys that all hash to the same value, or to a very small set of values. This is highly unlikely, unless you deliberately engineer your keys to beat the hash function.)

The hash table implementation of dictionaries explains the reason for the restriction to immutable keys. If the key were mutable like a list, it could be changed in-place which would also modify the hash value. The value would then no longer be stored in the right bin, and lookups would incorrectly fail.

Checking Whether a Key is in a Dictionary

Problem

You wish to check whether a given key is stored in a dictionary. For example, you wish to know if the key ‘green’ is in the fr_colours dictionary.

Solution

There are two ways of doing this. The has_key() dictionary method returns true or false depending on whether the key is in the dictionary or not.

>>> fr_colours.has_key('blue')
1
>>> fr_colours.has_key('purple')
0

If a dictionary retrieval fails, Python raises a KeyError exception. You can catch this exception and do something appropriate in the exception handler.

try:
   fcolr = fr_colours[ ecolr ]
except KeyError:
   fcolr = "<Untranslated colour %s>" % (ecolr,)

Discussion

Deciding whether to use has_key() or catch the exception is a question of both style and speed. Either way is fine from a stylistic point of view, though you may have a preference for one or the other, depending on what your programming background is. For example, one of the authors rarely catched the KeyError exception, preferring to write if dict.has_key(key): ... before a block of code which can then assume that key is present. You can do whatever you like.

The comparative speed of the two solutions depends on how often you expect the key to be present. Raising an exception and then catching it is slower than a call to the has_key() method. Consider a program where you’re looping over 100,000 keys, and doing something different depending on whether the key is present or not. If you expect only a few keys to be missing, the exception will almost never be raised, while calling the has_key() method would cost a little bit of time on every pass through the loop. In this case, catching the infrequent exceptions will be faster.

On the other hand, if 50,000 or 90,000 of the keys won’t be present, the overhead of raising exceptions and catching them will be larger because an exception will have to be raised more often, so checking the result of has_key() is probably a better idea. The exact crossover point will vary a bit; in [[a presentation|http://barry.warsaw.us/papers/ecshort.pdf]] ([[PDF]]) at the sixth Python conference, Barry Warsaw found it to be around 5 or 10 percent.

Filtering a dictionary

Suppose you have the following dictionary:

d = {'a':1, 'b':2, 'c':3}

And you want to filter it so that you can only see the keys in:

f = ['a','b']

The easiest way to go about that is to build a set of tuples inside a list comprehension and turn the result into a dictionary like so:

filtered = dict([(k,v) for k,v in d.items() if k in f])

For instance, it may be handy to grab some local variables (from locals()) and put them into a dict:

somelocals = dict([(k,v) for k,v in locals().items() if k in ['author','date','title']])

Iterating Over a Dictionary’s Contents

After creating a dictionary, you want to loop over the dictionary’s contents and do something for each entry, such as printing them.

The keys() method of dictionaries returns a list of the dictionary’s keys, arranged in essentially random order:

>>> for colr in fr_colours.keys(): print colr
...
yellow
red
black
white
blue

If you want to loop over the contents in some sorted order, you’ll have to retrieve the list of keys and sort it before looping through it.

>>> keylist = fr_colours.keys() ; keylist.sort()
>>> for colr in keylist: print colr
...
black
blue
red
white
yellow

Remember that the sort() list method sorts a list in-place and returns None. for colr in fr_colours.keys().sort()**will not work, because the value of **fr_colours.keys().sort() is None.

Discussion

The list returned by keys() is arranged randomly, because the key/value pairs are scattered across random bins in a hash table. Dictionaries therefore don’t impose any ordering on their contents. For example, if you assemble a dictionary element-by-element, and print the dictionary after each new key/value pair is added, there’s no order to how the contents are displayed.

>>> d = {} ; d['red'] = 'rouge' ; print d
{'red': 'rouge'}
>>> d['blue'] = 'bleu' ; print d
{'red': 'rouge', 'blue': 'bleu'}
>>> d['yellow'] = 'jaune' ; print d
{'yellow': 'jaune', 'red': 'rouge', 'blue': 'bleu'}
>>> d['green'] = 'vert' ; print d
{'yellow': 'jaune', 'red': 'rouge', 'green': 'vert', 'blue': 'bleu'}

There are two other methods available for retrieving the contents of a dictionary. If your loop will require each key and its corresponding value, you can use the items() method, which returns a list of (key, value) 2-tuples. Again, the list contents are arranged in a random order.

>>> for english, french in fr_colours.items():
...     print english, '\t', french
...
yellow  jaune
red     rouge
black   noir
white   blanc
blue    bleu

Least commonly used of all, the values() method returns a list containing only the values from a dictionary.

>>> for colr in fr_colours.values(): print colr
...
jaune
rouge
noir
blanc
bleu

The way to keep these method names straight is to remember that dictionaries map from keys to values; the keys() and values() methods return a list of the corresponding elements of the dictionary. That leaves items() to return a list of 2-tuples.

Merging Dictionaries

Problem

You have two dictionaries d1 and d2. You wish to add the key/value pairs from d2 to d1.

Solution

The update method of a dictionary adds the contents of another dictionary. To solve this problem, you can simply do: d1.update(d2).

d1.update(d2) is equivalent to the following code:

for key, value in d2.items():
   d1[ key ] = value

If the same key is associated with a value in both d1 and d2, update() will replace the value in d1 with the value from d2. d2 is never modified by this method.

Multiple-Valued Dictionaries

Problem

You need a dictionary that can store the key and more than one value. For example, after d[key] = value1 and d[key] = value2, you want d[key] to return a list containing [value1, value2]. This differs from an ordinary dictionary, where assigning to the same key twice causes the first value to be deleted and replaced by the second.

Solution

The following class stores lists of values; setting a new value causes it to be appended to the list.

import UserDict
class MultipleDict(UserDict.UserDict):
   def __setitem__(self, key, item):
if self.data.has_key( key ):
    self.data[key].append( item )
else:
    self.data[key] = [item]

Using the class is simple:

d = MultipleDict()
d[1] = 'a'
d[1] = 'b'
d[1] = 'c'

d[1] will return [‘a’, ‘b’, ‘c’]. To remove the key, you’ll have to delete the entire list of values with del d[1].

This class subclasses the UserDict class from the <<PythonModule UserDict>> module, which implements all the methods of dictionaries in a Python class, and stores its contents in the data attribute, which is a real dictionary.

To implement the desired change in behaviour, we only have to override the ____setitem____ method, and store the right result in self.data.

<<<

Note ~~~~ Since Python 2.2, you can derive from the built-in types (“new style classes”), and MultipleDict would be best implemented as inheriting from dict. <<<