Like the articles? Buy the book! Dead Simple Python by Jason C. McDonald is available from No Starch Press.
For me, any project doesn't start feeling "real" until the program starts reading from or writing to external files.
Unfortunately, this is one of those topics that often suffers the most from tutorial oversimplification. If only someone would document all the weirdness with files in one place...
Basic File Opening
Let's start with the most basic file reading example. Assume I have a file in the same directory as my code file, entitled journal1.txt.
The standard way of opening files is using the built-in
open() function, which is imported by default from the io module.
file = open("journal1.txt", 'r')
for line in file:
    print(line)
file.close()
The open() function accepts a number of arguments for interacting with files in advanced ways, but most of the time, you'll only need the first two.
The first argument,
file, accepts a string containing either an absolute or relative path to the file being opened. This is the only strictly required argument.
The second argument,
mode, accepts a string indicating the file mode. If this is not specified,
'rt' will be used, meaning it will read the file as text. The mode
'r' is effectively the same thing, since text-mode (
t) is part of the default behavior.
I could have just used this line instead and gotten the same behavior...
file = open("journal1.txt")
...but I personally prefer to explicitly indicate whether I'm reading (
r), writing (
w), or what have you.
All "file" objects returned from
open() are iterable. In the case of text files, a
TextIOWrapper object is returned. In my example above, I iterate over the lines in the
file, and I print out each line.
Once I'm done working with the file, I need to close it with
file.close(). It's important not to rely on the garbage collector to close files for you, as such behavior is neither guaranteed nor portable across implementations. Additionally, Python isn't guaranteed to finish writing to a file until
.close() is called.
Running that code, at least in my case, prints out the contents of journal1.txt:
Could this be where all missing things are found?
Magic Hair Only for the Pure of Heart
Not naturally occurring?
Could result in minor gravity anomalies!
In practice, always remembering to call
close() can be a royal pain, especially once you factor in possible errors with opening a file. Thankfully, there's an even cleaner way: context managers!
A context manager is defined by the
with statement in Python. I can rewrite my earlier code using this syntax:
with open("journal1.txt", 'r') as file:
    for line in file:
        print(line)
The open() function is called, and if it succeeds, the resulting
TextIOWrapper object is stored in
file, and can be used in the body of the with statement.
file.close() is implicitly called once control leaves the
with statement; you never have to remember to call it!
We'll cover this in more depth in the next chapter.
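To convince yourself that the with statement really does close the file, here's a minimal sketch. It assumes a throwaway file in a temporary directory (the file and its contents are placeholders I made up for the demonstration) and checks the file's closed attribute inside and after the block.

```python
import tempfile
from pathlib import Path

# Work in a temporary directory so no real files are touched.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp, "journal1.txt")  # placeholder file for this sketch
    path.write_text("Could this be where all missing things are found?\n")

    with open(path, 'r') as file:
        first_line = file.readline()
        closed_inside = file.closed   # still open inside the block

    closed_after = file.closed        # closed once the block has exited

print(closed_inside, closed_after)
```

The file object still exists after the block; it's just closed, so any further reads or writes on it would raise a ValueError.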
The documentation mentions several modes which can be used with open():
r opens the file for reading (the default).
w opens or creates the file for writing, deleting (truncating) its contents first.
a opens or creates the file for writing, but appends to the end instead of truncating.
x creates and opens a new file for writing; it cannot open existing files.
+ opens the file for both read and write (see table below).
t works with the file in text mode (the default).
b works with the file in binary mode.
These mode flags can be combined together. For example,
a+b allows writing and reading, where writing appends to the end of the file, in binary mode.
The + flag is always combined with another flag. When combined with
r, it adds the functionality of
a, except it starts at the beginning of the file (without truncating). When combined with
w, a, or x, it allows reading as well.
The behavior of the different flags is best understood with this table, adapted from this Stack Overflow answer by industryworker3595112:
                     | r   r+   w   w+   a   a+   x   x+
---------------------|----------------------------------
allow read           | ✓   ✓        ✓        ✓        ✓
allow write          |     ✓    ✓   ✓    ✓   ✓    ✓   ✓
create new file      |          ✓   ✓    ✓   ✓    ✓   ✓
open existing file   | ✓   ✓    ✓   ✓    ✓   ✓
erase file contents  |          ✓   ✓
allow seek           |     ✓    ✓   ✓             ✓   ✓
position at start    | ✓   ✓    ✓   ✓             ✓   ✓
position at end      |                   ✓   ✓
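A few rows of that table are easy to check for yourself. The following is only a sketch, assuming a scratch file named modes.txt in a temporary directory (a name I made up for illustration); it runs the same file through x, a, r+, and w and records the contents after each step.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp, "modes.txt")  # hypothetical scratch file

    # 'x' creates the file; it would raise FileExistsError if it existed.
    with open(path, 'x') as file:
        file.write("one\n")

    # 'a' appends to the end without truncating.
    with open(path, 'a') as file:
        file.write("two\n")
    after_append = path.read_text()

    # 'r+' starts at the beginning and overwrites in place.
    with open(path, 'r+') as file:
        file.write("ONE")
    after_r_plus = path.read_text()

    # 'w' truncates the file before writing.
    with open(path, 'w') as file:
        file.write("three\n")
    after_w = path.read_text()

print(repr(after_append))  # 'one\ntwo\n'
print(repr(after_r_plus))  # 'ONE\ntwo\n'
print(repr(after_w))       # 'three\n'
```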
We can read from a file in text mode using either the read(), readline(), or
readlines() functions, or by iterating over it directly.
Of course, this requires that the file be opened for reading, using the appropriate file mode flags (see "File Modes" section). If you ever need to check whether an object
file can be read, use the file.readable() function.
Let's contrast the three ways of reading from a file:
The read() function reads the entire contents of the file as one long string.
with open("journal1.txt", 'r') as file:
    contents = file.read()
    print(contents)
# Could this be where all missing things are found?
# Magic Hair Only for the Pure of Heart
# Not naturally occurring?
# Could result in minor gravity anomalies!
Alternatively, you can tell
read() the maximum number of characters to read from the file stream:
with open("journal1.txt", 'r') as file:
    contents = file.read(20)
    print(contents)
# Could this be where
The readline() function behaves exactly like
read(), except it stops reading when it encounters a line break. The line break is included in the returned string.
with open("journal1.txt", 'r') as file:
    contents = file.readline()
    print(contents)
# Could this be where all missing things are found?
As with read(), you can specify the maximum number of characters to be read:
with open("journal1.txt", 'r') as file:
    contents = file.readline(20)
    print(contents)
# Could this be where
The readlines() function returns the entire file as a list of strings, with each string being one line.
with open("journal1.txt", 'r') as file:
    contents = file.readlines()
    for c in contents:
        print(c)
# Could this be where all missing things are found?
#
# Magic Hair Only for the Pure of Heart
#
# Not naturally occurring?
#
# Could result in minor gravity anomalies!
#
You'll notice that the newline character is included with each line. We can remove that by calling the
.strip() function on each string.
with open("journal1.txt", 'r') as file:
    contents = file.readlines()
    for c in contents:
        print(c.strip())
# Could this be where all missing things are found?
# Magic Hair Only for the Pure of Heart
# Not naturally occurring?
# Could result in minor gravity anomalies!
You can also limit how much is read from the file by specifying a maximum number of characters. Unlike before, however, this is not a hard limit. Instead, once the specified limit has been exceeded by the total number of characters read from all lines so far, only the rest of the current line will be read.
This is best understood by comparing read() and
readlines(). First, I'll read to a hard limit of 60 characters:
with open("journal1.txt", 'r') as file:
    contents = file.read(60)
    print(contents)
# Could this be where all missing things are found?
# Magic Hair
Compare that to calling
readlines() with a "hint" of 60 characters:
with open("journal1.txt", 'r') as file:
    contents = file.readlines(60)
    for c in contents:
        print(c.strip())
# Could this be where all missing things are found?
# Magic Hair Only for the Pure of Heart
In that second example, the entirety of the first two lines is read, but no more.
Unlike the other two functions,
readlines() always reads whole lines.
As you saw earlier, we can iterate over a file directly:
with open("journal1.txt", 'r') as file:
    for line in file:
        print(line)
# Could this be where all missing things are found?
# Magic Hair Only for the Pure of Heart
# Not naturally occurring?
# Could result in minor gravity anomalies!
This is functionally the same as:
with open("journal1.txt", 'r') as file:
    for line in file.readlines():
        print(line)
The difference between the two is that the first approach, the direct iteration, is lazy, whereas the second approach reads the whole file first, before iterating over the contents.
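Lazy iteration pays off when you only need to look at each line once. Here's a small sketch of that idea, assuming a stand-in copy of journal1.txt written to a temporary directory; it counts the lines and tracks the longest one without ever holding the whole file in memory.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp, "journal1.txt")  # stand-in for the article's file
    path.write_text(
        "Could this be where all missing things are found?\n"
        "Magic Hair Only for the Pure of Heart\n"
        "Not naturally occurring?\n"
        "Could result in minor gravity anomalies!\n"
    )

    line_count = 0
    longest = ""
    with open(path, 'r') as file:
        for line in file:              # one line at a time, lazily
            line = line.rstrip("\n")   # drop the trailing line break
            line_count += 1
            if len(line) > len(longest):
                longest = line

print(line_count, longest)
```

For a four-line file the difference is academic, but the same loop works unchanged on a file far too large to load with readlines().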
We can write to a file in much the same way, using either the write() or writelines() functions.
This requires that the file be opened for writing (see "File Modes" section). The
file.writable() function can be used to check if the
file object is writable.
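To see how readable() and writable() line up with the file modes, here's a quick sketch. It assumes a scratch file (scratch.txt is just a name I picked) and opens it in several modes, recording what each resulting file object reports.

```python
import tempfile
from pathlib import Path

capabilities = {}
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp, "scratch.txt")  # hypothetical scratch file
    path.write_text("placeholder\n")

    for mode in ('r', 'r+', 'w', 'a', 'a+'):
        with open(path, mode) as file:
            # Record (readable, writable) for this mode.
            capabilities[mode] = (file.readable(), file.writable())

for mode, (can_read, can_write) in capabilities.items():
    print(f"{mode:<2} readable={can_read} writable={can_write}")
```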
In the examples for this section, I'll show the file contents in the comment at the bottom.
The write() function writes the given string to a file.
I can write an entire multiline string to a new file called journal3.txt with
write(), like so:
entry = """If you go on enough road trips
chances are, you've seen a
certain bumper sticker:
WHAT IS THE MYSTERY SHACK?
"""

with open("journal3.txt", 'x') as file:
    file.write(entry)
# If you go on enough road trips
# chances are, you've seen a
# certain bumper sticker:
# WHAT IS THE MYSTERY SHACK?
#
As long as
journal3.txt does not already exist, it will be created with the given contents.
I can overwrite the entire contents of
journal3.txt using the
w file mode:
with open("journal3.txt", 'w') as file:
    file.write("GNOMES\nWEAKNESS?\n")
# GNOMES
# WEAKNESS?
#
GOTCHA ALERT: Watch your file mode!
w and w+ will erase the entire contents of the file. Use either
a or a+ to write to the end of the file.
I can append to the file instead using the
a file mode:
with open("journal3.txt", 'a') as file:
    file.write("leaf blowers\n")
# GNOMES
# WEAKNESS?
# leaf blowers
#
The write() function also returns an integer representing the number of characters written.
The writelines() function writes a list of strings to a file.
lines = [
    "Finally back safe and sound\n",
    "from one of the weirdest days\n",
    "at Gravity Falls.\n"
]

with open("journal3.txt", 'w') as file:
    file.writelines(lines)
# Finally back safe and sound
# from one of the weirdest days
# at Gravity Falls.
#
Unlike with write(), the
writelines() function only ever returns None.
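The two return values are easy to compare side by side. This is just a sketch, assuming a scratch copy of journal3.txt in a temporary directory; it also shows that writelines() adds no line breaks of its own, so each string needs to carry its own.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp, "journal3.txt")  # scratch copy, not the real file

    with open(path, 'w') as file:
        written = file.write("GNOMES\nWEAKNESS?\n")            # characters written
        result = file.writelines(["leaf", " ", "blowers\n"])   # always None

    contents = path.read_text()

print(written)         # 17
print(result)          # None
print(repr(contents))  # 'GNOMES\nWEAKNESS?\nleaf blowers\n'
```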
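The file.seek() function allows you to move back and forth within a file object
file, character-by-character. When working with text streams, it accepts one argument: a non-negative integer representing a new position to move to, represented as the number of characters from the beginning. Strictly speaking, the documentation only guarantees offsets of zero or values returned by file.tell(); for plain ASCII text with UNIX line endings, as in these examples, those match the character count.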
In addition to changing position, the
file.seek() function will also return an integer representing the new absolute position in the file. You can also get the current position by calling the file.tell() function.
The r+ file mode is best used in conjunction with the
seek() function, although it can be used with any other file mode besides a and a+.
I'll first use the
seek() function to read only part of the journal1.txt file:
with open("journal1.txt", 'r') as file:
    file.seek(50)
    contents = file.read(5)
    print(contents)
# Magic
I'll write a new initial version of the journal3.txt file:
with open("journal3.txt", 'w') as file:
    file.write("FLOATING EYEBALLS")
# FLOATING EYEBALLS
I can use the
r+ mode to change part of this file.
GOTCHA ALERT: The
write() command will always write over existing contents of the file, unless you append to the end. To insert text into a file non-destructively, it's usually best to read the entire contents in as a string (or list), edit the string, and then write it back out.
Here, I'll replace the word "EYEBALLS" with "NONSENSE!":
with open("journal3.txt", 'r+') as file:
    file.seek(9)
    file.write("NONSENSE!")
# FLOATING NONSENSE!
After opening the file, I move to the 9th character from the beginning, and then
write() the new data.
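The non-destructive approach from the gotcha above (read everything, edit the string, write it back) might look something like the following. It's one possible sketch rather than the only way to do it, and it assumes a scratch copy of journal3.txt in a temporary directory and an inserted word I made up for the demonstration.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp, "journal3.txt")  # scratch copy for this sketch
    path.write_text("FLOATING EYEBALLS")

    # Read everything, edit the string in memory, then write it back.
    with open(path, 'r+') as file:
        text = file.read()
        text = text.replace("FLOATING ", "FLOATING GIANT ")  # insert a word
        file.seek(0)       # back to the start
        file.write(text)
        file.truncate()    # drop leftovers if the new text is shorter

    result = path.read_text()

print(result)  # FLOATING GIANT EYEBALLS
```

The truncate() call doesn't matter here because the text got longer, but it keeps stale characters from lingering at the end when an edit makes the text shorter.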
Seek with Binary
When you open a file in binary mode (
b), you can move in the file in a more dynamic way, with two arguments instead of one:
offset: the distance in bytes to move (can be negative)
whence: the position from which to calculate the offset:
0 for the start of the file (default),
1 for the current position, or
2 for the end of the file.
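Unfortunately, the
whence argument is of little use with files opened in text mode: there, only seeks relative to the start of the file are allowed, apart from zero-offset cases such as jumping straight to the end with seek(0, 2).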
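Here's a short sketch of the two-argument form in binary mode. It assumes a scratch file holding the bytes of "FLOATING EYEBALLS", and it uses the named constants os.SEEK_CUR and os.SEEK_END, which are the standard library's names for the whence values 1 and 2.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp, "journal3.txt")  # scratch file for this sketch
    path.write_bytes(b"FLOATING EYEBALLS")

    with open(path, 'rb') as file:
        file.seek(-8, os.SEEK_END)      # 8 bytes before the end (whence=2)
        tail = file.read()

        file.seek(0)                    # back to the start (whence=0)
        file.seek(9, os.SEEK_CUR)       # 9 bytes forward from here (whence=1)
        position = file.tell()
        word = file.read(4)

print(tail, position, word)  # b'EYEBALLS' 9 b'EYEB'
```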
The four most common errors relating to working with files are as follows:
The r and r+ modes require that a file exist before opening it. Otherwise, a
FileNotFoundError will be raised:
try:
    with open("notreal.txt", 'r') as file:
        print(file.read())
except FileNotFoundError as e:
    print(e)
The x and x+ file modes are specifically for creating a new file. If the file already exists, a
FileExistsError will be raised:
try:
    with open("journal3.txt", 'x+') as file:
        print(file.read())
except FileExistsError as e:
    print(e)
The io.UnsupportedOperation error is raised whenever you try to read on a file only opened for writing, or write on a file opened only for reading:
import io

try:
    with open("journal3.txt", 'w') as file:
        print(file.read())
except io.UnsupportedOperation as e:
    print(e)

try:
    with open("journal3.txt", 'r') as file:
        file.write('')
except io.UnsupportedOperation as e:
    print(e)
An Issue of Line Separators
Some savvy readers will recall that, while UNIX uses
\n as a line separator, Windows uses
\r\n. Surely this matters when we're reading and writing files, right?
In fact, Python abstracts this out for us behind-the-scenes. Always use
\n as a line separator when writing files in text mode, regardless of operating system!
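You can watch that translation happen on any operating system by forcing the separator yourself. In this sketch, I pass the newline argument of open() to simulate Windows-style line endings on write, then compare the raw bytes on disk with what a normal text-mode read returns. The scratch file name is my own invention.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp, "newlines.txt")  # hypothetical scratch file

    # Force Windows-style separators on any platform for the demonstration.
    with open(path, 'w', newline='\r\n') as file:
        file.write("GNOMES\nWEAKNESS?\n")

    raw = path.read_bytes()           # what is actually on disk
    with open(path, 'r') as file:     # default newline handling on read
        text = file.read()

print(raw)         # b'GNOMES\r\nWEAKNESS?\r\n'
print(repr(text))  # 'GNOMES\nWEAKNESS?\n'
```

In everyday code you'd normally leave newline alone and let Python pick the platform's separator; it only appears here to make the behavior visible.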
Up to this point, I've just been using a file in the same folder as the code, but this is very rarely what we want! We need to be able to build paths to files.
The trouble is, file paths are not the same on all systems. UNIX-style systems, such as macOS and Linux, use the UNIX file path conventions, while Windows uses an entirely different scheme. Our solution has to work for both, meaning that hardcoded paths are not an option.
To address this, Python offers two modules: os and pathlib.
Creating a Path
Python actually offers multiple classes for building paths, depending on your particular needs. In most cases, however, you should just use pathlib.Path.
Let's say I wanted to make a special directory in the current user's home folder called
.dead_simple_python, and then write a file to that location. Here's how I'd do that:
First, I create a
Path() object pointing to just the final desired directory (not yet the file).
Using the Path() constructor, I pass each part of the path as a separate string. I can use the class method
Path.home() to acquire the path to the user directory.
from pathlib import Path
import os

file_path = Path(Path.home(), ".dead_simple_python")
Next, I'll check if the path already exists using
file_path.exists(), and if it doesn't exist, I'll use the
os.makedirs function to create any of the missing directories in the path:
if not file_path.exists():
    os.makedirs(file_path)
Finally, I can add the filename to the path object I already have, and then open that file for writing:
file_path = file_path.joinpath("journal4.txt")

with file_path.open('w') as file:
    lines = [
        "If you've ever taken a road trip \n",
        "through the Pacific Northwest, you've \n",
        "probably seen a bumper sticker for a \n",
        "place called Gravity Falls.\n"
    ]
    file.writelines(lines)
You'll notice that I used
file_path.open('w') instead of
open(file_path, 'w'). Technically, both do the exact same thing, although the member function is preferred.
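For comparison, here's an alternative sketch of the same steps using only pathlib. It isn't the approach taken above; it swaps in Path.mkdir() for os.makedirs() and the / operator for joinpath(). To avoid writing into a real home folder, it assumes a temporary directory standing in for Path.home().

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for Path.home(), so the sketch leaves no files behind.
    fake_home = Path(tmp)

    folder = fake_home / ".dead_simple_python"    # '/' joins path parts
    folder.mkdir(parents=True, exist_ok=True)     # no exists() check needed

    file_path = folder / "journal4.txt"
    with file_path.open('w') as file:
        file.write("place called Gravity Falls.\n")

    created = file_path.exists()
    contents = file_path.read_text()

print(created, repr(contents))
```

The exist_ok=True argument makes mkdir() a no-op when the directory is already there, which is why the explicit exists() check can go away.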
The reason that
open("journal1.txt") works is because it's a relative path, starting in the directory the code is being executed from.
If I have a
journals/ directory in the same directory as my code, I can use this:
from pathlib import Path

file_path = Path("journals", "journal1.txt")

with file_path.open('r') as file:
    print(file.read())
As long as I don't start with an absolute path, such as that produced by
Path.home(), Paths are relative.
But what if I want to move up a directory, instead of down? You might be tempted to use the
.., but as you may guess, this is not guaranteed to be portable across all operating systems. Instead, I can use
os.pardir to move to the parent directory.
Imagine we have a directory structure that looks like this:
example
├── code
│   └── read_file.py
└── journals
    └── journal1.txt
If, from within
example/code, I run
python read_file.py, I can access
journal1.txt with the following:
from pathlib import Path
import os

file_path = Path(os.pardir, "journals", "journal1.txt")

with file_path.open('r') as file:
    print(file.read())
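Since that relative path depends on which directory the code is run from, here's a sketch that doesn't. It assumes the same example tree rebuilt inside a temporary directory (with placeholder journal contents), starts from the code directory explicitly, and uses os.pardir to step up before descending into journals.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Rebuild the example tree inside a temporary directory.
    code_dir = Path(tmp, "example", "code")
    journals_dir = Path(tmp, "example", "journals")
    code_dir.mkdir(parents=True)
    journals_dir.mkdir(parents=True)
    (journals_dir / "journal1.txt").write_text("Magic Hair\n")

    # Start in code/, go up one level, then down into journals/.
    file_path = Path(code_dir, os.pardir, "journals", "journal1.txt")
    contents = file_path.read_text()
    same_file = file_path.resolve() == (journals_dir / "journal1.txt").resolve()

print(os.pardir, repr(contents), same_file)
```

Calling resolve() collapses the parent-directory step, which is handy when you want to display or compare the final path.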
We've only just brushed the surface of dealing with files, but hopefully this has demystified the
open() function and
Path objects. Here's a quick recap:
The open() function has several modes: r for read, w for truncate and write, x for create and write, and a for append. Adding + to any of those adds the missing functionality, either reading or writing.
You must remember to close any file you open. You can do so manually with close().
If you use the with statement (context manager) to open a file, the file will be closed automatically.
myfile.seek() lets you change position in the open file myfile. This doesn't change where writes land in the a or a+ modes, which always append.
A pathlib.Path object lets you create a portable path, built from the strings passed to the Path() constructor.
You can call open() directly on a Path object.
Use Path.home() to get the absolute path to the current user's home folder.
Use os.pardir to access the parent directory (equivalent to ..).
In the next section, I'll dive into context managers in more depth.