Like the articles? Buy the book! Dead Simple Python by Jason C. McDonald is available from No Starch Press.
Remember the last time you lost something?
You probably turned your house upside down looking for it. You go through it room by room, while the people around you ask pointless questions like "where was the last place you had them?" (Seriously, if I knew that, I wouldn't be looking for them!) It'd be great to optimize your search, but your house isn't sorted...or particularly well organized, if you're anything like me. You're stuck with a linear search.
In programming, as in real life, we don't usually get data handed to us in any meaningful order. We start out with a whole mess, and we have to perform tasks on it. Searching through unordered data is probably the first example that springs to mind, but there are hundreds of other things you might want to do: convert all Fahrenheit temperature recordings to Celsius, find the average of all the data points, whatever.
"Yeah, yeah, that's what loops are for!"
But this is Python. Loops here are on a whole different level. They're so good, they're practically criminal.
An Overview of Loops
Let's get the boring stuff out of the way, shall we?
In Python, like in most languages, we have two basic loops: while and for.
while
A while loop is pretty basic.
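clue = None
while clue is None:
    # searchLocation() stands in for your own search logic;
    # it returns None until it finds something.
    clue = searchLocation()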
As long as the loop condition (clue is None, in this case) evaluates to True, the loop's code will be executed.
In Python, we also have a couple of useful keywords: break immediately stops the loop, while continue skips to the next iteration of the loop.
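Here's a minimal sketch showing both keywords in one while loop. The task list and the "stop" sentinel are made up for illustration. Note that the index is advanced before continue runs; otherwise the loop would never get past the empty entry.

```python
# Hypothetical task list; "stop" marks where processing should end.
tasks = ["wash", "", "dry", "stop", "fold"]
done = []

i = 0
while i < len(tasks):
    task = tasks[i]
    i += 1               # advance first, so `continue` can't cause an infinite loop
    if task == "stop":
        break            # leave the loop immediately; "fold" is never reached
    if not task:
        continue         # skip the empty entry and go to the next iteration
    done.append(task)

print(done)  # ['wash', 'dry']
```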
One of the most useful applications of break is running the same code repeatedly until the user provides valid input.
while True:
    try:
        age = int(input("Enter your age: "))
    except ValueError:
        print("Please enter a valid integer!")
    else:
        if age > 0:
            break
As soon as we encounter the break statement, we exit the loop. Granted, that was a fairly convoluted example, but it demonstrates the point. You also often see while True: used in game loops.
Gotcha Alert: If you've ever worked with loops in any language, you're already familiar with the infinite loop. This is most often caused by a while condition which always evaluates to True and no break statement within the loop.
for
Coming from Java, C++, or many similar ALGOL-style languages, you're probably familiar with the tripartite for loop: for i := 1; i < 100; i := i + 1. I don't know about you, but when I first encountered that, it scared the dickens out of me. I'm comfortable with it now, but it just doesn't possess the elegant simplicity of Python, does it?
Python's for loop looks vastly different. The Python equivalent to that pseudocode above is...
for i in range(1, 100):
    print(i)
range() is a special "function" in Python that returns a sequence. (Technically, it's not a function at all, but that's getting pretty deep into pedantics.)
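A quick sketch of how range() behaves: the start is included and the stop is not, which is why range(1, 100) mirrors the i < 100 condition in the pseudocode. An optional third argument sets the step.

```python
# range(stop), range(start, stop), and range(start, stop, step)
print(list(range(5)))          # [0, 1, 2, 3, 4]
print(list(range(1, 5)))       # [1, 2, 3, 4]  (the stop value is never included)
print(list(range(10, 0, -3)))  # [10, 7, 4, 1]

# A range object computes its values on demand, but still acts like a sequence.
r = range(1, 100)
print(len(r), r[0], r[-1])     # 99 1 99
```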
This is the impressive thing about Python's for loop: it iterates over an iterable, a broad category of objects that we'll talk about later.
For now, it's easiest to understand that we can iterate over a sequential data structure, like an array (called a "list" in Python).
Thus, we can do this...
places = ['Nashville', 'Norway', 'Bonaire', 'Zimbabwe', 'Chicago', 'Czechoslovakia']
for place in places:
    print(place)
print("...and back!")
...and we get this...
Nashville
Norway
Bonaire
Zimbabwe
Chicago
Czechoslovakia
...and back!
for...else
Python has another unique little trick in its loops: the else clause! If the loop runs to completion without encountering a break statement, Python will then run the code in else. However, if the loop is broken out of manually, it will skip the else altogether.
places = ['Nashville', 'Norway', 'Bonaire', 'Zimbabwe', 'Chicago', 'Czechoslovakia']
villain_at = 'Mali'
for place in places:
    if place == villain_at:
        print("Villain captured!")
        break
else:
    print("The villain got away again.")
Since 'Mali' wasn't in the list, we see the message "The villain got away again." However, if we change the value of villain_at to 'Norway', we'll see "Villain captured!" instead.
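Wrapping that loop in a function makes it easy to compare both outcomes side by side. The track_villain() helper is hypothetical, just for illustration. The last call shows a subtle point: with an empty list, the loop body never runs, so there's no break and the else clause still fires.

```python
def track_villain(places, villain_at):
    """Hypothetical helper: report whether the loop ever hit `break`."""
    for place in places:
        if place == villain_at:
            message = "Villain captured!"
            break
    else:
        # Only runs when the loop finished without hitting `break`.
        message = "The villain got away again."
    return message


places = ['Nashville', 'Norway', 'Bonaire', 'Zimbabwe', 'Chicago', 'Czechoslovakia']
print(track_villain(places, 'Mali'))    # The villain got away again.
print(track_villain(places, 'Norway'))  # Villain captured!
print(track_villain([], 'Norway'))      # The villain got away again.
```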
Where's the do?
Python does not have a do...while loop. If you're looking for one, the typical Python convention is to use a while True: with an inner break condition, like we demonstrated earlier.
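The defining property of a do...while loop is that its body always runs at least once, even if the condition is false from the start. Here's a minimal sketch of that shape; the counter values are arbitrary.

```python
count = 10
iterations = 0

# Emulated do...while: run the body first, then test the condition.
while True:
    iterations += 1
    count += 1
    if not count < 5:   # the "while (count < 5)" test of a do...while loop
        break

print(iterations)  # 1 (the body ran once, even though count < 5 was never true)
```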
A Few Containers
Python has a number of containers, or data structures, that hold data. We won't go into much depth on any of these, but I want to quickly skim over the most important ones:
list
A list is a mutable sequence (basically, an array). It is defined with square brackets [ ], and you can access its elements via index.
foo = [2, 4, 2, 3]
print(foo[1])
>>> 4
foo[1] = 42
print(foo)
>>> [2, 42, 2, 3]
Although there is no strict technical requirement for it, the typical convention is for lists to only contain items of the same type ("homogeneous").
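Because lists are mutable, you can also grow and shrink them in place. Here's a sketch of a few of the most common list methods, plus slicing:

```python
foo = [2, 4, 2, 3]

foo.append(7)        # add to the end: [2, 4, 2, 3, 7]
foo.insert(0, 1)     # insert at index 0: [1, 2, 4, 2, 3, 7]
foo.remove(2)        # remove the first matching value: [1, 4, 2, 3, 7]
last = foo.pop()     # remove and return the last item

print(foo)           # [1, 4, 2, 3]
print(last)          # 7
print(foo[1:3])      # slicing returns a new list: [4, 2]
```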
tuple
A tuple is an immutable sequence. Once you've defined it, you technically can't change it (recall the meaning of immutability from before). This means you can't add or remove elements from a tuple after it's been defined.
A tuple is defined within parentheses ( ), and you can access its elements via index.
foo = (2, 4, 2, 3)
print(foo[1])
>>> 4
foo[1] = 42
>>> TypeError: 'tuple' object does not support item assignment
Unlike lists, standard convention permits tuples to contain elements of different types ("heterogeneous").
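Two small things are worth knowing here. A one-element tuple needs a trailing comma, since parentheses alone just group an expression. And because you can't modify a tuple, "changing" one really means building a new tuple. The agent data below is just sample data for illustration.

```python
not_a_tuple = (42)      # just the integer 42 in parentheses
one_tuple = (42,)       # the trailing comma makes it a tuple
print(type(not_a_tuple).__name__, type(one_tuple).__name__)  # int tuple

# A heterogeneous tuple: a name, an agent number, and an "active" flag.
agent = ('Ivan Idea', 1324595, True)

# Slicing and concatenation build a brand-new tuple; `agent` is unchanged.
updated = agent[:2] + (False,)
print(agent)    # ('Ivan Idea', 1324595, True)
print(updated)  # ('Ivan Idea', 1324595, False)
```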
set
A set is an unordered mutable collection that is guaranteed not to have duplicates. That "unordered" part is important to remember: the sequence of individual elements cannot be guaranteed!
A set is defined within curly braces { }, although if you want an empty set, you must say foo = set(), as foo = {} creates a dict. You cannot access its elements via index, since it is unordered.
foo = {2, 4, 2, 3}
print(foo)
>>> {2, 3, 4}
print(foo[1])
>>> TypeError: 'set' object is not subscriptable
For an object to be added to a set, it also must be hashable. An object is hashable if:
- It defines the method __hash__(), which returns a hash as an integer. (See below.)
- It defines the method __eq__() for comparing two objects.
A valid hash should always be the same for the same object (value), and objects that compare equal must produce the same hash. It should also be reasonably unique, so that it's uncommon for another object to return the same hash. (Two or more objects having the same hash is called a hash collision, and collisions still happen.)
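Here's a minimal sketch of a hashable class; Location is a hypothetical example. The design choice is to build __eq__() and __hash__() from the same fields, so objects that compare equal always hash the same. (If a class defines __eq__() without __hash__(), Python sets its __hash__ to None, which makes its instances unhashable.)

```python
class Location:
    def __init__(self, city, country):
        # Treat these as read-only once the object is in a set:
        # changing them would change the object's hash.
        self.city = city
        self.country = country

    def __eq__(self, other):
        if not isinstance(other, Location):
            return NotImplemented
        return (self.city, self.country) == (other.city, other.country)

    def __hash__(self):
        # Hash exactly the fields that __eq__() compares.
        return hash((self.city, self.country))


visited = {Location('Nashville', 'USA'), Location('Nashville', 'USA'), Location('Oslo', 'Norway')}
print(len(visited))  # 2, because the two equal Nashville objects count once

# Lists are mutable and unhashable, so they can't be set elements.
try:
    {[1, 2]}
except TypeError as e:
    print(e)  # the message names the unhashable type
```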
Dictionary (dict)
A dict is a key-value data structure.
It is defined within curly braces { }, using : to separate keys and values. You cannot access its elements via index; instead, you indicate the key within square brackets [ ] in much the same way. (Since Python 3.7, dictionaries do remember the order in which keys were inserted, but that still doesn't make them indexable.)
foo = {'a' : 1, 'b' : 2, 'c' : 3, 'd' : 4}
print(foo['b'])
>>> 2
foo['b'] = 42
print(foo)
>>> {'a': 1, 'b': 42, 'c': 3, 'd': 4}
Only hashable objects may be used as dictionary keys. (See the section on set above for more information on hashability.)
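Looking up a key that doesn't exist raises a KeyError. When a missing key is expected, dict.get() lets you supply a fallback instead. This sketch also shows adding and deleting keys:

```python
foo = {'a': 1, 'b': 42, 'c': 3, 'd': 4}

print(foo.get('b'))        # 42
print(foo.get('z'))        # None (no exception)
print(foo.get('z', 0))     # 0, the default we supplied

foo['e'] = 5               # assigning to a new key adds it
del foo['a']               # del removes a key (KeyError if it's missing)
print(foo)                 # {'b': 42, 'c': 3, 'd': 4, 'e': 5}

try:
    foo['z']
except KeyError as e:
    print(f"Missing key: {e}")  # Missing key: 'z'
```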
Other Data Structures
Python offers additional containers besides the basics. You can find many of them in the collections module of the standard library.
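Here's a quick taste of four commonly used ones from collections. This isn't exhaustive, and the place names are just sample data.

```python
from collections import Counter, defaultdict, deque, namedtuple

# Counter: tallies how many times each item appears.
sightings = Counter(['Norway', 'Mali', 'Norway', 'Chicago', 'Norway'])
print(sightings.most_common(1))   # [('Norway', 3)]

# defaultdict: a missing key gets a default value from the factory (here, list).
by_letter = defaultdict(list)
for place in ['Nashville', 'Norway', 'Chicago']:
    by_letter[place[0]].append(place)
print(dict(by_letter))            # {'N': ['Nashville', 'Norway'], 'C': ['Chicago']}

# deque: a double-ended queue with fast appends and pops at both ends.
trail = deque(['Bonaire', 'Zimbabwe'])
trail.appendleft('Nashville')
trail.append('Chicago')
print(trail.popleft())            # Nashville

# namedtuple: a tuple whose fields can also be accessed by name.
Agent = namedtuple('Agent', ['name', 'number'])
ivan = Agent('Ivan Idea', 1324595)
print(ivan.name, ivan[1])         # Ivan Idea 1324595
```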
Unpacking a Container
There's an important piece of Python syntax we haven't talked about yet, but which will be useful shortly. We can assign each of the items in a container to a variable! This is called unpacking.
Of course, we need to know exactly how many items we're unpacking for this to work; otherwise, we'll get a ValueError exception.
Let's look at a basic example, using a tuple.
fullname = ('Carmen', 'Sandiego')
first, last = fullname
print(first)
>>> Carmen
print(last)
>>> Sandiego
The secret sauce is in that second line. We can list multiple variables to assign to, separated by commas. Python will unpack the container on the right side of the equal sign, assigning each value to a variable in order, left-to-right.
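Here's what happens when the counts don't match, along with a neat idiom that falls out of unpacking: swapping two variables without a temporary. On the right side of the swap, Python first builds the tuple (last, first), then unpacks it.

```python
fullname = ('Carmen', 'Sandiego')

try:
    first, middle, last = fullname   # three names, but only two values
except ValueError as e:
    print(e)   # not enough values to unpack (expected 3, got 2)

first, last = fullname
first, last = last, first            # pack into a tuple, then unpack it
print(first, last)   # Sandiego Carmen
```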
Gotcha Alert: Remember, set is unordered! While you can technically do this with a set, you can't be certain what value is assigned to what variable. It isn't guaranteed to be in any order; the fact that sets usually unpack their values in sorted order is incidental, and NOT guaranteed!
The in Thing
Python offers a nifty keyword, in, for checking if a particular element is found within a container.
places = ['Nashville', 'Norway', 'Bonaire', 'Zimbabwe', 'Chicago', 'Czechoslovakia']
if 'Nashville' in places:
    print("Music city!")
This works with many containers, including lists, tuples, sets, and even with dictionary keys (but not dictionary values).
If you want one of your custom classes to support the in operator, you need only define the __contains__(self, item) method, which should return True or False. (See the documentation.)
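Here's a minimal sketch of __contains__() in action. The WatchList class and its case-insensitive matching rule are invented for illustration.

```python
class WatchList:
    """Hypothetical container whose `in` check ignores letter case."""

    def __init__(self, names):
        self._names = list(names)

    def __contains__(self, item):
        # Called by `item in watch_list`; should return True or False.
        for name in self._names:
            if name.lower() == item.lower():
                return True
        return False


watch_list = WatchList(['Vic the Slick', 'Patty Larceny'])
print('vic the slick' in watch_list)   # True
print('Carmen' in watch_list)          # False
print('Carmen' not in watch_list)      # True (`not in` uses __contains__() too)
```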
Iterators
Python's loops are designed to work with iterables, which I mentioned earlier. These are objects that can be iterated over, using an iterator.
Cricket sounds.
Okay, let's take this from the top. A Python container object, such as a list, is also an iterable, because it has an __iter__() method defined, which returns an iterator object.
An iterator has a __next__() method defined, which, in the case of a container iterator, returns the next item. Even unordered containers, like set(), can be traversed using iterators.
When nothing else can be returned by __next__(), it raises a specialized exception called StopIteration. This can be caught and handled using the typical try...except.
Let's look again at a for loop traversing a list, for example...
dossiers = ['The Contessa', 'Double Trouble', 'Eartha Brute', 'Kneemoi', 'Patty Larceny', 'RoboCrook', 'Sarah Nade', 'Top Grunge', 'Vic the Slick', 'Wonder Rat']
for crook in dossiers:
    print(crook)
dossiers is a list object, which is an iterable. When Python reaches the for loop, it does three things:
- Calls iter(dossiers), which in turn executes dossiers.__iter__(). This returns an iterator object that we'll call list_iter. This iterator object will be used by the loop.
- For each iteration of the loop, it calls next(list_iter), which executes list_iter.__next__(), and assigns the returned value to crook.
- If the iterator raises the special exception StopIteration, the loop is finished, and we exit.
It might be easier to understand this if I rewrite that logic in a while True: loop...
list_iter = iter(dossiers)
while True:
    try:
        crook = next(list_iter)
        print(crook)
    except StopIteration:
        break
If you try both loops, you'll see they do the exact same thing!
Now that you understand how __iter__(), __next__(), and the StopIteration exception work, you can make your own classes iterable!
Hack Alert: While it's fairly typical to define your iterator class separately from your iterable class, you don't necessarily have to! As long as both methods are defined in your class, and __next__() behaves appropriately, you can just define __iter__() to return self.
It's worth noting that iterators themselves are iterables: they have an __iter__() method which returns self.
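Here's a minimal sketch of that "hack": a hypothetical Countdown class that serves as its own iterator. Note the trade-off: because the object stores its own position, you can only loop over it once.

```python
class Countdown:
    """Hypothetical example: counts down from `start` to 1, acting as its own iterator."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self          # the object is its own iterator

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


launch = Countdown(3)
print(list(launch))            # [3, 2, 1]
print(list(launch))            # [] because this iterator is already exhausted
print(iter(launch) is launch)  # True
```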
The Curious Case of the Dictionary
Let's say we have a dictionary we want to work with...
locations = {
    'Parade Ground': None,
    'Ste.-Catherine Street': None,
    'Pont Victoria': None,
    'Underground City': None,
    'Mont Royal Park': None,
    'Fine Arts Museum': None,
    'Humor Hall of Fame': 'The Warrant',
    'Lachine Canal': 'The Loot',
    'Montreal Jazz Festival': None,
    'Olympic Stadium': None,
    'St. Lawrence River': 'The Crook',
    'Old Montréal': None,
    'McGill University': None,
    'Chalet Lookout': None,
    'Île Notre-Dame': None
}
If we just wanted to see each of the items in it, we'd just use a for loop. So, this should work, right?
for location in locations:
    print(location)
Oops! That only shows us the keys, not the values. Definitely not what we're wanting, is it? What in the world is going on?
dict.__iter__() returns a dict_keyiterator object, which does what its class name suggests: it iterates over the keys, but not the values.
To get both the key and value, we need to call locations.items(), which returns a dict_items object. dict_items.__iter__() returns a dict_itemiterator, which will return each key-value pair in the dictionary as a tuple.
Legacy Note: If you're using Python 2, you should call locations.iteritems() instead.
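You can confirm this for yourself by asking Python for the type names of those iterators. This sketch uses a small stand-in dictionary rather than the full locations one; .values() is also shown, for when you only need the values.

```python
clues = {'Humor Hall of Fame': 'The Warrant', 'Lachine Canal': 'The Loot'}

print(type(iter(clues)).__name__)          # dict_keyiterator
print(type(clues.items()).__name__)        # dict_items
print(type(iter(clues.items())).__name__)  # dict_itemiterator

print(next(iter(clues.items())))  # ('Humor Hall of Fame', 'The Warrant')
print(list(clues.values()))       # ['The Warrant', 'The Loot']
```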
Remember earlier, when we talked about unpacking? The fact we're dealing with each pair as a tuple means we can unpack those into two variables.
for key, value in locations.items():
    print(f'{key} => {value}')
That prints out the following:
Parade Ground => None
Ste.-Catherine Street => None
Pont Victoria => None
Underground City => None
Mont Royal Park => None
Fine Arts Museum => None
Humor Hall of Fame => The Warrant
Lachine Canal => The Loot
Montreal Jazz Festival => None
Olympic Stadium => None
St. Lawrence River => The Crook
Old Montréal => None
McGill University => None
Chalet Lookout => None
Île Notre-Dame => None
Ahhh, that's more like it! Now we can work with the data. For example, I might want to record the important information in another dictionary.
information = {}
for location, result in locations.items():
    if result is not None:
        information[result] = location
# Win the game!
print(information['The Loot'])
print(information['The Warrant'])
print(information['The Crook'])
print("Vic the Slick....in jaaaaaaaaail!")
That will find the Loot, Warrant, and Crook, and list them in the proper order:
Lachine Canal
Humor Hall of Fame
St. Lawrence River
Vic the Slick....in jaaaaaaaaail!
Behold, the crime fighting power of loops and iterators!
Your Own Iterators
I already mentioned earlier that you can make your own iterables and iterators, but showing is better than telling!
Imagine we want to keep a list of agents handy, so we can always identify them by their agent number. However, there are some agents that we can't talk about. We can accomplish this pretty easily by storing agent id and name in a dictionary, and then maintaining a list of classified agents.
Gotcha Alert: Remember from our discussion of classes, there isn't actually such a thing as a private variable in Python. If you REALLY intend to keep secrets, use industry standard encryption and security practices, or at least don't expose your API to any VILE operatives. ;)
For starters, here's the basic structure of that class:
class AgentRoster:
    def __init__(self):
        self._agents = {}
        self._classified = []

    def add_agent(self, name, number, classified=False):
        self._agents[number] = name
        if classified:
            self._classified.append(name)

    def validate_number(self, number):
        try:
            name = self._agents[number]
        except KeyError:
            return False
        else:
            return True

    def lookup_agent(self, number):
        try:
            name = self._agents[number]
        except KeyError:
            name = "<NO KNOWN AGENT>"
        else:
            if name in self._classified:
                name = "<CLASSIFIED>"
        return name
We can go ahead and test that out, just for posterity:
roster = AgentRoster()
roster.add_agent("Ann Tickwitee", 2539634)
roster.add_agent("Ivan Idea", 1324595)
roster.add_agent("Rock Solid", 1385723)
roster.add_agent("Chase Devineaux", 1495263, True)
print(roster.validate_number(2539634))
>>> True
print(roster.validate_number(9583253))
>>> False
print(roster.lookup_agent(1324595))
>>> Ivan Idea
print(roster.lookup_agent(9583253))
>>> <NO KNOWN AGENT>
print(roster.lookup_agent(1495263))
>>> <CLASSIFIED>
Great, that works exactly as expected! Now, what if we want to be able to loop through the entire dictionary, perhaps as part of some awesome code that shows each agent's name and current location on a snazzy global map?
However, we don't want to just access the roster._agents dictionary directly, because that will disregard the whole "classified" aspect of this class. How do we handle that?
As I mentioned before, we could just have this class also serve as its own iterator, meaning it has a __next__() method. In that case, its __iter__() method would simply return self. However, this is Dead Simple Python, so let's skip the annoyingly simplistic stuff and actually create a separate iterator class.
In this example, I'll turn that dictionary into a list of tuples, which will allow me to use indexing. (Remember, you can't access a dictionary's items by index.) While I'm at it, I'll leave out any classified agents entirely, so the iterator can never expose them, no matter what order the agents were added in. All of that logic belongs in the __init__() method, of course:
class AgentRoster_Iterator:
    def __init__(self, container):
        # Build a list of (number, name) tuples, skipping classified agents.
        self._roster = []
        for number, name in container._agents.items():
            if name not in container._classified:
                self._roster.append((number, name))
        self._max = len(self._roster)
        self._index = 0
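To be an iterator, the class must have a __next__() method; as far as the for loop is concerned, that's the only requirement! (Strictly speaking, the iterator protocol also expects an __iter__() method that returns self, as we saw earlier.) Remember, __next__() needs to raise StopIteration as soon as we have no more data to return.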
I'll define AgentRoster_Iterator's __next__() method as follows:
class AgentRoster_Iterator:
    # ...snip...

    def __next__(self):
        if self._index == self._max:
            raise StopIteration
        else:
            r = self._roster[self._index]
            self._index += 1
            return r
Now we return to the AgentRoster class, where we need to add an __iter__() method that returns an appropriate iterator object.
class AgentRoster:
    # ...snip...

    def __iter__(self):
        return AgentRoster_Iterator(self)
That little bit of magic is all it takes, and now our AgentRoster class behaves exactly as expected with a loop! This code...
roster = AgentRoster()
roster.add_agent("Ann Tickwitee", 2539634)
roster.add_agent("Ivan Idea", 1324595)
roster.add_agent("Rock Solid", 1385723)
roster.add_agent("Chase Devineaux", 1495263, True)
for number, name in roster:
    print(f'{name}, id #{number}')
...produces...
Ann Tickwitee, id #2539634
Ivan Idea, id #1324595
Rock Solid, id #1385723
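If an iterator merely subtracts the number of classified agents from the total, instead of skipping them, a classified agent who isn't added last can slip through. Here's a consolidated, runnable sketch that filters classified agents out when the iterator is created; the test deliberately adds Chase Devineaux in the middle of the roster to check that he stays hidden. Filtering up front is one reasonable approach, not the only one, and validate_number() and lookup_agent() are omitted for brevity.

```python
class AgentRoster:
    def __init__(self):
        self._agents = {}
        self._classified = []

    def add_agent(self, name, number, classified=False):
        self._agents[number] = name
        if classified:
            self._classified.append(name)

    def __iter__(self):
        return AgentRoster_Iterator(self)


class AgentRoster_Iterator:
    def __init__(self, container):
        # Keep only (number, name) pairs for agents who aren't classified.
        self._roster = []
        for number, name in container._agents.items():
            if name not in container._classified:
                self._roster.append((number, name))
        self._index = 0

    def __iter__(self):
        return self  # iterators are iterables too

    def __next__(self):
        if self._index >= len(self._roster):
            raise StopIteration
        agent = self._roster[self._index]
        self._index += 1
        return agent


roster = AgentRoster()
roster.add_agent("Ann Tickwitee", 2539634)
roster.add_agent("Chase Devineaux", 1495263, True)  # classified, added mid-roster
roster.add_agent("Ivan Idea", 1324595)
roster.add_agent("Rock Solid", 1385723)

for number, name in roster:
    print(f'{name}, id #{number}')
```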
Looking Forward
I hear that Pythonista in the back: "Wait, wait, we can't be done yet! You haven't even touched on list comprehensions yet!"
Python indeed adds a whole additional level of magic on top of loops and iterators, with special tools called generators and comprehensions. A comprehension is like a deliciously compact loop for creating a data structure.
I've also deliberately skipped such goodness as zip() and enumerate(), which make loops and iteration even more powerful. I would have included them here, but I didn't want to make the article too long. (It's already pushing it.) I'll be touching on those later as well.
I see some of you are already vibrating with excitement, but alas, you're going to have to wait until the next article to learn more.
Review
Let's review the most important concepts from this section:
- A while loop runs as long as its condition evaluates to True.
- You can break out of a loop with the break keyword, or skip to the next iteration with the continue keyword.
- A for loop iterates over an iterable (an object that can be iterated over), such as a list.
- The range() function returns an iterable sequence of numbers, which can be used in a for loop, e.g. for i in range(1, 100).
- Python does NOT have a do...while loop. Use a while True: loop with an explicit break statement within it.
- Python has four basic data structures, or containers:
- Lists are mutable, ordered, sequential structures...basically, arrays.
- Tuples are immutable, ordered, sequential structures. Think list, but you can't modify the contents.
- Sets are mutable, unordered structures that are guaranteed never to have any duplicate elements. They can only store hashable objects.
- Dictionaries are mutable structures that store key-value pairs. You look up items by key, not by index. Only hashable objects may be used as keys.
- You can unpack the values of a container into multiple variables using the convention a, b, c = someContainer. The number of variables on the left and the number of elements in the container on the right must be the same!
- You can quickly check if an element is in a container with the in keyword. If you want your class to support this, define the __contains__() method.
- Python's containers are examples of iterables: they return iterators that can traverse their contents. An iterable object always returns an iterator object via its __iter__() method.
- An iterator object always has a __next__() method, which returns a value. A container iterator's __next__() method would return the next element in the container. When there is nothing more to return, the iterator raises the StopIteration exception.
Ned Batchelder has a phenomenal talk on iterators and loops entitled "Loop Like A Native". I strongly recommend checking it out!
Also, as usual, be sure to read the documentation. There's plenty more you can do with loops, containers, and iterators.
- Python Wiki: While Loop
- Python Wiki: For Loop
- Python Reference: Compound Statements - The while statement
- StackOverflow: What exactly are iterator, iterable, and iteration?
Thank you to deniska, grym, and ikanobori (Freenode IRC #python) for suggested revisions.
Top comments (14)
Hello, it's a perfect article, the whole series really, but there's something wrong in the code for the agents iterator: as you implemented it, it will return the first n items in the agents list, even if they're classified.
Hm. I just tested the code as implemented again, and it doesn't display the classified agents. If your tests have produced otherwise, would you mind sharing a screenshot? Thanks!
Yes, it'll work right, but try this:
agents.add(not secret)
agents.add(secret)
agents.add(not secret)
agents.add(secret)
Then self.agents = [not secret, secret, not secret, secret], and len(self.agents) - len(self.secret) is 4 - 2 = 2, so what will print is self.agents[0] (not secret) and then self.agents[1] (secret).
Actually, it's not a big deal, not a deal at all, but when I reached that part of the article I read it several times, because I thought maybe I'd missed something. Thanks for this series; I hope you continue it.
Ah, I hadn't quite addressed the [] operator in that example. Good catch.
I noticed this error case while reading through the article as well. For the sake of learning, I just couldn't move on and ignore it.
Here is my fix. I am sure there are better ways to accomplish this, so please critique and let me know how I could better accomplish this.
The only thing I changed was the next method as follows:
Hey, that's pretty good. However, my only concern is that it would delete the internally stored information about the classified agent (which we don't want).
Ah, good point.
Take 2 [move classified to end of _roster]:
Thanks so much for these articles and for being so responsive. They are very well written, engaging, and a great resource.
If I were going to fix this problem (which I may well do soon -- I have to take another pass through this material when writing the book), I would actually define the __getitem__() function instead, as that controls the behavior of the [] operator.
This all comes down to separation of concerns. It shouldn't be the responsibility of __next__() to mutate the internal data to obscure information. Its only job should be to determine whether it exposes that information, and how.
Of course, in all honesty, there's nothing preventing a direct call to agents._roster[1] (Python has no private variables). If we were going to obfuscate or remove classified data, that should really occur in the add_agent() function.
I see another comment thread addressed what I was wondering about the classified agents showing up if they weren't last ;)
For some reason I had the idea that lists and arrays were different in some functional way, but from your article it sounds they are functionally the same things, just with different vocabulary based on language?
I have a hard time grokking hashability. I've tried several times but something eludes me about the logic of why one thing is hashable and another isn't. I'd rather understand it than memorize it/look things up to check when it matters.
(Also, funny thing, OSX has grokking in the dictionary, but not hashable? what.)
Yes, lists and arrays are effectively the same things. (There are some implementation differences internal to the language, mind you.)
Hashing means you perform some function, usually a form of one-way (lossy, as it were) encryption on the data to produce a (usually) unique value, often shorter than the original.
For example, here are the hashes for a few strings, according to Python's hash function. You'll notice they all produce a unique integer value, and all those integers are the same length.
"Hello" → 3483667490880649043
"Me" → 6066670828640188320
"Reverse the polarity of the neutron flow." → 7317767150724217908
"What evil lurks in the hearts of men? The shadow knows!" → -6411620787934941530
When two different input values have the same hash, that's known as a "hash collision", so any container that relies on a hash (such as Python's dictionaries) needs to be able to handle that situation.
For more information, watch this excellent explanation by the legendary @vaidehijoshi :
Hash Tables — BaseCS Video Series
Thanks for the link, that's a great video and the visuals are quite helpful. Will check out the rest of her series, too.
Hello, I reimplemented the __next__ method to keep classified elements from being exposed.
Oh, top notch, thanks for catching that, mate.