Dead Simple Python: Working with Files

Jason C. McDonald ・12 min read

For me, any project doesn't start feeling "real" until the program starts reading from or writing to external files.

Unfortunately, this is one of those topics that often suffers the most from tutorial oversimplification. If only someone would document all the weirdness with files in one place...

Basic File Opening

Let's start with the most basic file reading example. Assume I have a file in the same directory as my code file, entitled journal1.txt.

The standard way of opening files is using the built-in open() function, which is always available without an import (it is the same function as io.open()).

file = open("journal1.txt", 'r')
for line in file:
    print(line)
file.close()

The open() function accepts a number of arguments for interacting with files in advanced ways, but most of the time, you'll only need the first two.

The first argument, file, accepts a string containing either an absolute or relative path to the file being opened. This is the only strictly required argument.

The second argument, mode, accepts a string indicating the file mode. If this is not specified, 'rt' will be used, meaning it will read the file as text. The mode 'r' is effectively the same thing, since text-mode (t) is part of the default behavior.

I could have just used this line instead and gotten the same behavior...

file = open("journal1.txt")

...but I personally prefer to explicitly indicate whether I'm reading (r), writing (w), or what have you.

All "file" objects returned from open() are iterable. In the case of text files, a TextIOWrapper object is returned. In my example above, I iterate over the lines in the TextIOWrapper object file, and I print out each line.

Once I'm done working with the file, I need to close it with file.close(). It's important not to rely on the garbage collector to close files for you, as such behavior is neither guaranteed nor portable across implementations. Additionally, Python isn't guaranteed to finish writing to a file until .close() is called.

Running that code, at least in my case, prints out the contents of journal1.txt:

Could this be where all missing things are found?

Magic Hair Only for the Pure of Heart

Not naturally occurring?

Could result in minor gravity anomalies!

Context Managers

In practice, always remembering to call close() can be a royal pain, especially once you factor in possible errors with opening a file. Thankfully, there's an even cleaner way: context managers!

A context manager is defined by the with statement in Python. I can rewrite my earlier code using this syntax:

with open("journal1.txt", 'r') as file:
    for line in file:
        print(line)

The open() function is called, and if it succeeds, the resulting TextIOWrapper object is stored in file, and can be used in the body of the with statement. file.close() is implicitly called once control leaves the with statement; you never have to remember to call it!
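Roughly speaking, the with statement behaves like a try/finally that always closes the file. Here's a minimal sketch of that equivalence; the scratch directory and the names manual and managed are stand-ins of mine so the code runs anywhere.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for journal1.txt, created in a scratch directory.
    path = Path(tmp, "journal1.txt")
    path.write_text("Not naturally occurring?\n")

    # Manual version: close() runs even if reading raises an error.
    manual = open(path, 'r')
    try:
        manual_line = manual.readline()
    finally:
        manual.close()

    # Context manager version: close() is called for us on exit.
    with open(path, 'r') as managed:
        managed_line = managed.readline()

print(manual.closed, managed.closed)  # True True
```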

We'll cover this in more depth in the next chapter.

File Modes

The documentation mentions several modes which can be used with open():

  • r opens the file for reading (the default).
  • w opens or creates the file for writing, deleting (truncating) its contents first.
  • a opens or creates the file for writing, but appends to the end instead of truncating.
  • x creates and opens a new file for writing; it cannot open existing files.
  • + opens the file for both read and write (see table below).
  • t works with the file in text mode (the default).
  • b works with the file in binary mode.

These mode flags can be combined together. For example, a+b allows writing and reading, where writing appends to the end of the file, in binary mode.

The + flag is always combined with another flag. When combined with r, it adds writing without truncating: writes start at the beginning of the file and overwrite what's already there. When combined with w, a, or x, it allows reading as well.

The behavior of the different flags is best understood with this table, adapted from this Stack Overflow answer by industryworker3595112:

                     | r   r+   w   w+   a   a+   x   x+
---------------------|----------------------------------
allow read           | ✓   ✓        ✓        ✓        ✓
allow write          |     ✓    ✓   ✓    ✓   ✓    ✓   ✓
create new file      |          ✓   ✓    ✓   ✓    ✓   ✓
open existing file   | ✓   ✓    ✓   ✓    ✓   ✓
erase file contents  |          ✓   ✓
allow seek           | ✓   ✓    ✓   ✓             ✓   ✓
position at start    | ✓   ✓    ✓   ✓             ✓   ✓
position at end      |                   ✓   ✓
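To see a few of these modes side by side, here's a small sketch that runs x, a, r+, and w against the same scratch file. The file name modes.txt is made up for illustration, and I use Path.read_text() (from pathlib) just to peek at the contents after each step.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp, "modes.txt")  # hypothetical file name

    with open(path, 'x') as file:   # create; would fail if it existed
        file.write("one\n")
    with open(path, 'a') as file:   # append to the end
        file.write("two\n")
    after_append = path.read_text()

    with open(path, 'r+') as file:  # write from the start, no truncation
        file.write("ONE")
    after_r_plus = path.read_text()

    with open(path, 'w') as file:   # truncate, then write
        file.write("three\n")
    after_w = path.read_text()

print(repr(after_append))  # 'one\ntwo\n'
print(repr(after_r_plus))  # 'ONE\ntwo\n'
print(repr(after_w))       # 'three\n'
```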

Reading

We can read from a file in text mode using either the read(), readline(), or readlines() functions, or by iterating over it directly.

Of course, this requires that the file be opened for reading, using the appropriate file mode flags (see "File Modes" section). If you ever need to check whether a file object file can be read, use the file.readable() function.
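As a quick illustration, this sketch opens one scratch file in several modes and records what readable() and writable() report for each. The file name and the support dictionary are just names I picked for the example.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp, "journal.txt")  # hypothetical scratch file
    path.write_text("GNOMES\n")

    support = {}
    for mode in ('r', 'r+', 'a', 'a+'):
        with open(path, mode) as file:
            # Store (can read?, can write?) for this mode.
            support[mode] = (file.readable(), file.writable())

for mode, (can_read, can_write) in support.items():
    print(mode, can_read, can_write)
```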

Let's contrast the three ways of reading from a file:

read()

The read() function reads the entire contents of the file as one long string.

with open("journal1.txt", 'r') as file:
    contents = file.read()
    print(contents)

# Could this be where all missing things are found?
# Magic Hair Only for the Pure of Heart
# Not naturally occurring?
# Could result in minor gravity anomalies!

Alternatively, you can tell read() the maximum number of characters to read from the file stream:

with open("journal1.txt", 'r') as file:
    contents = file.read(20)
    print(contents)

# Could this be where

readline()

The readline() function behaves exactly like read(), except it stops reading when it encounters a line break. The line break is included in the returned string.

with open("journal1.txt", 'r') as file:
    contents = file.readline()
    print(contents)

# Could this be where all missing things are found?

As with read(), you can specify the maximum number of characters to be read:

with open("journal1.txt", 'r') as file:
    contents = file.readline(20)
    print(contents)

# Could this be where
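If you want to walk through a whole file with readline(), one common pattern is to loop until it returns an empty string, which signals the end of the file. (A blank line in the file comes back as "\n", not as an empty string.) A minimal sketch, using a two-line stand-in for journal1.txt:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp, "journal1.txt")  # stand-in for the real file
    path.write_text(
        "Could this be where all missing things are found?\n"
        "Magic Hair Only for the Pure of Heart\n"
    )

    lines = []
    with open(path, 'r') as file:
        while True:
            line = file.readline()
            if line == "":  # empty string means end of file
                break
            lines.append(line)

print(len(lines))  # 2
```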

readlines()

The readlines() function returns the entire file as a list of strings, with each string being one line.

with open("journal1.txt", 'r') as file:
    contents = file.readlines()
    for c in contents:
        print(c)

# Could this be where all missing things are found?
#
# Magic Hair Only for the Pure of Heart
#
# Not naturally occurring?
#
# Could result in minor gravity anomalies!
#

You'll notice that the newline character is included with each line. We can remove that by calling the .strip() function on each string.

with open("journal1.txt", 'r') as file:
    contents = file.readlines()
    for c in contents:
        print(c.strip())

# Could this be where all missing things are found?
# Magic Hair Only for the Pure of Heart
# Not naturally occurring?
# Could result in minor gravity anomalies!

You can also limit how much is read from the file by specifying a maximum number of characters. Unlike before, however, this is not a hard limit. Instead, once the specified limit has been exceeded by the total number of characters read from all lines so far, only the rest of the current line will be read.

This is best understood by comparing read() and readlines(). First, I'll read to a hard limit of 60 characters:

with open("journal1.txt", 'r') as file:
    contents = file.read(60)
    print(contents)

# Could this be where all missing things are found?
# Magic Hair

Compare that to calling readlines() with a "hint" of 60 characters:

with open("journal1.txt", 'r') as file:
    contents = file.readlines(60)
    for c in contents:
        print(c.strip())

# Could this be where all missing things are found?
# Magic Hair Only for the Pure of Heart

In that second example, the entirety of the first two lines is read, but no more.

Unlike the other two functions, readlines() always reads whole lines.

Iteration

As you saw earlier, we can iterate over a file directly:

with open("journal1.txt", 'r') as file:
    for line in file:
        print(line)

# Could this be where all missing things are found?
#
# Magic Hair Only for the Pure of Heart
#
# Not naturally occurring?
#
# Could result in minor gravity anomalies!
#

This is functionally the same as:

with open("journal1.txt", 'r') as file:
    for line in file.readlines():
        print(line)

The difference between the two is that the first approach, the direct iteration, is lazy, whereas the second approach reads the whole file first, before iterating over the contents.
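The laziness pays off when you don't need the whole file. In this sketch, I stop at the first line containing a search word, so any lines after it are never processed by the loop. The file contents are a stand-in, and found is just a name I chose.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp, "journal1.txt")  # stand-in for the real file
    path.write_text(
        "Could this be where all missing things are found?\n"
        "Magic Hair Only for the Pure of Heart\n"
        "Not naturally occurring?\n"
    )

    found = None
    with open(path, 'r') as file:
        # enumerate() numbers the lines as they are read, starting at 1.
        for number, line in enumerate(file, start=1):
            if "Magic" in line:
                found = (number, line.strip())
                break  # stop early; the third line is never examined

print(found)  # (2, 'Magic Hair Only for the Pure of Heart')
```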

Writing

We can write to a file in much the same way, using either the write() or writelines() functions.

This requires that the file be opened for writing (see "File Modes" section). The file.writable() function can be used to check if the file object is writable.

In the examples for this section, I'll show the file contents in the comment at the bottom.

write()

The write() function writes the given string to a file.

I can write an entire multiline string to a new file called journal3.txt with write(), like so:

entry = """If you go on enough road trips
chances are, you've seen a
certain bumper sticker:
WHAT IS THE MYSTERY SHACK?
"""

with open("journal3.txt", 'x') as file:
    file.write(entry)

# If you go on enough road trips
# chances are, you've seen a
# certain bumper sticker:
# WHAT IS THE MYSTERY SHACK?
#

As long as journal3.txt does not already exist, it will be created with the given contents.

I can overwrite the entire contents of journal3.txt using the w file mode:

with open("journal3.txt", 'w') as file:
    file.write("GNOMES\nWEAKNESS?\n")

# GNOMES
# WEAKNESS?
#

GOTCHA ALERT: Watch your file mode! w and w+ will erase the entire contents of the file. Use either a or a+ to write to the end of the file.

I can append to the file instead using the a file mode:

with open("journal3.txt", 'a') as file:
    file.write("leaf blowers\n")

# GNOMES
# WEAKNESS?
# leaf blowers
#

The write() function also returns an integer representing the number of characters written.
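For instance, here's a small sketch that captures that return value. The scratch directory is only there so the example doesn't touch a real journal3.txt.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp, "journal3.txt")  # scratch copy, not the real file

    with open(path, 'w') as file:
        # 12 characters in "leaf blowers", plus the newline.
        count = file.write("leaf blowers\n")

print(count)  # 13
```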

writelines()

The writelines() function writes a list of strings to a file.

lines = [
    "Finally back safe and sound\n",
    "from one of the weirdest days\n",
    "at Gravity Falls.\n"
]

with open("journal3.txt", 'w') as file:
    file.writelines(lines)

# Finally back safe and sound
# from one of the weirdest days
# at Gravity Falls.
#

Unlike write(), the writelines() function only ever returns None. Note that it does not add line breaks for you, which is why each string in the list above ends with \n.
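Since writelines() writes each string exactly as given, strings without a trailing \n would run together on one line. It accepts any iterable of strings, so one option is to add the line breaks on the fly with a generator expression. A minimal sketch, with made-up entries and a scratch file:

```python
import tempfile
from pathlib import Path

entries = ["GNOMES", "WEAKNESS?", "leaf blowers"]  # no trailing newlines

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp, "journal3.txt")  # scratch copy, not the real file

    with open(path, 'w') as file:
        # Append "\n" to each entry as it is written.
        result = file.writelines(f"{entry}\n" for entry in entries)

    contents = path.read_text()

print(result)          # None
print(repr(contents))  # 'GNOMES\nWEAKNESS?\nleaf blowers\n'
```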

Seeking

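The file.seek() function allows you to move back and forth within a file object file. When working with text streams, it accepts one argument: a non-negative integer representing the new position to move to, measured from the beginning of the file. For plain ASCII text with \n line endings, as in these examples, that position matches the number of characters; in general, the only offsets guaranteed to be valid in text mode are 0 and values returned by file.tell().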

In addition to changing position, the file.seek() function will also return an integer representing the new absolute position in the file. You can also get the current position by calling the file.tell() function.

The r+ file mode is best used in conjunction with the seek() function, although it can be used with any other file mode besides a and a+.

I'll first use the seek() function to read only part of the journal1.txt file:

with open("journal1.txt", 'r') as file:
    file.seek(50)
    contents = file.read(5)
    print(contents)

# Magic
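A handy pairing is to remember a position with tell() and jump back to it later with seek(). Here's a minimal sketch of that idea, using a two-line stand-in file; mark is just a name I picked for the saved position.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp, "journal1.txt")  # stand-in for the real file
    path.write_text(
        "Could this be where all missing things are found?\n"
        "Magic Hair Only for the Pure of Heart\n"
    )

    with open(path, 'r') as file:
        file.readline()             # skip the first line
        mark = file.tell()          # remember where the second line starts
        second = file.readline()
        returned = file.seek(mark)  # jump back; seek() returns the position
        again = file.readline()     # reads the second line a second time

print(second == again)   # True
print(returned == mark)  # True
```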

I'll write a new initial version of the journal3.txt file:

with open("journal3.txt", 'w') as file:
    file.write("FLOATING EYEBALLS")

# FLOATING EYEBALLS

I can use the r+ mode to change part of this file.

GOTCHA ALERT: The write() command will always write over existing contents of the file, unless you append to the end. To insert text into a file non-destructively, it's usually best to read the entire contents in as a string (or list), edit the string, and then write it back out.

Here, I'll replace the word "EYEBALLS" with "NONSENSE!":

with open("journal3.txt", 'r+') as file:
    file.seek(9)
    file.write("NONSENSE!")

# FLOATING NONSENSE!

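After opening the file, I move to position 9 (counting from 0, that's the E just after "FLOATING "), and then write() the new data. Since "NONSENSE!" is one character longer than "EYEBALLS", the file grows by one character.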
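For comparison, here's a rough sketch of the non-destructive approach mentioned in the gotcha: read everything in, edit the string, and write it back out. The inserted word and the position variable are my own choices for illustration.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp, "journal3.txt")  # scratch copy, not the real file
    path.write_text("FLOATING EYEBALLS")

    # 1. Read the entire contents in as a string.
    with open(path, 'r') as file:
        contents = file.read()

    # 2. Edit the string: insert new text at position 9.
    position = 9
    contents = contents[:position] + "GIANT " + contents[position:]

    # 3. Write the edited string back out, replacing the old contents.
    with open(path, 'w') as file:
        file.write(contents)

    result = path.read_text()

print(result)  # FLOATING GIANT EYEBALLS
```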

Seek with Binary

When you open a file in binary mode (b), you can move in the file in a more dynamic way, with two arguments instead of one:

  • offset: the distance in bytes to move (can be negative)
  • whence: the position from which to calculate the offset: 0 for the start of the file (default), 1 for the current position, or 2 for the end of the file.

Unfortunately, a nonzero offset with whence set to 1 or 2 doesn't work with files opened in text mode.
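Here's a short sketch of both whence values in action on a binary file. The contents are borrowed from the earlier example; the variable names are mine.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp, "journal3.bin")  # hypothetical scratch file
    path.write_bytes(b"FLOATING EYEBALLS")  # 17 bytes

    with open(path, 'rb') as file:
        position = file.seek(-8, 2)  # 8 bytes back from the end
        tail = file.read()           # reads to the end of the file
        file.seek(-8, 1)             # 8 bytes back from the current position
        part = file.read(4)

print(position)  # 9
print(tail)      # b'EYEBALLS'
print(part)      # b'EYEB'
```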

File Errors

The three most common errors relating to working with files are as follows:

FileNotFoundError

The r and r+ modes require that a file exist before opening it. Otherwise, a FileNotFoundError will be raised:

try:
    with open("notreal.txt", 'r') as file:
        print(file.read())
except FileNotFoundError as e:
    print(e)

FileExistsError

The x and x+ file modes are specifically for creating a new file. If the file already exists, a FileExistsError will be raised:

try:
    with open("journal3.txt", 'x+') as file:
        print(file.read())
except FileExistsError as e:
    print(e)

UnsupportedOperation

The io.UnsupportedOperation error is raised whenever you try to read on a file only opened for writing, or write on a file opened only for reading:

import io

try:
    with open("journal3.txt", 'w') as file:
        print(file.read())
except io.UnsupportedOperation as e:
    print(e)


try:
    with open("journal3.txt", 'r') as file:
        file.write('')
except io.UnsupportedOperation as e:
    print(e)
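In real code, you'll often want to recover from one of these errors rather than just print it. As one possible approach, here's a sketch of a hypothetical helper, read_or_default(), that falls back to a default string when the file is missing. The helper name and its default parameter are my own invention, not part of Python.

```python
import tempfile
from pathlib import Path


def read_or_default(path, default=""):
    """Hypothetical helper: return the file's text, or default if missing."""
    try:
        with open(path, 'r') as file:
            return file.read()
    except FileNotFoundError:
        return default


with tempfile.TemporaryDirectory() as tmp:
    # This file is never created, so the fallback is used.
    missing = read_or_default(Path(tmp, "notreal.txt"), "(no journal yet)")

    # This one exists, so its contents are returned.
    real_path = Path(tmp, "journal3.txt")
    real_path.write_text("GNOMES\n")
    present = read_or_default(real_path, "(no journal yet)")

print(missing)        # (no journal yet)
print(repr(present))  # 'GNOMES\n'
```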

An Issue of Line Separators

Some savvy readers will recall that, while UNIX uses \n as a line separator, Windows uses \r\n. Surely this matters when we're reading and writing files, right?

In fact, Python abstracts this out for us behind-the-scenes. Always use \n as a line separator when writing files in text mode, regardless of operating system!
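You can watch this translation happen by writing in text mode and then peeking at the raw bytes. In this sketch, I compare the bytes on disk against the same text with \n swapped for os.linesep, which is the platform's own line separator. The file name is a stand-in.

```python
import os
import tempfile
from pathlib import Path

text = "GNOMES\nWEAKNESS?\n"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp, "journal3.txt")  # scratch copy, not the real file

    with open(path, 'w') as file:
        file.write(text)            # always write \n in text mode

    raw = path.read_bytes()         # what actually landed on disk

    with open(path, 'r') as file:
        round_trip = file.read()    # line endings come back as \n

# On disk, each \n became the platform's separator.
expected_raw = text.replace("\n", os.linesep).encode()
print(raw == expected_raw)  # True
print(round_trip == text)   # True
```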

File Paths

Up to this point, I've just been using a file in the same folder as the code, but this is very rarely what we want! We need to be able to build paths to files.

The trouble is, file paths are not the same on all systems. UNIX-style systems, such as macOS and Linux, use the UNIX file path conventions, while Windows uses an entirely different scheme. Our solution has to work for both, meaning that hard-coded paths are not an option.

To address this, Python offers two modules: os and pathlib.

Creating a Path

Python actually offers multiple classes for building paths, depending on your particular needs. In most cases, however, you should just use pathlib.Path.

Let's say I wanted to make a special directory in the current user's home folder called .dead_simple_python, and then write a file to that location. Here's how I'd do that:

First, I create a Path() object pointing to just the final desired directory (not yet the file).

In the Path() constructor, I pass each part of the path as a separate string. I can use the class method Path.home() to acquire the path to the user directory.

from pathlib import Path
import os

file_path = Path(Path.home(), ".dead_simple_python")

Next, I'll check if the path already exists using file_path.exists(), and if it doesn't exist, I'll use the os.makedirs function to create any of the missing directories in the path:

if not file_path.exists():
    os.makedirs(file_path)
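As an alternative, Path objects have their own mkdir() method, which can do the check and the creation in one call. Here's a minimal sketch; I use a scratch directory instead of the real home folder so the example leaves nothing behind, and the nested journals directory is made up.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    dir_path = Path(tmp, ".dead_simple_python", "journals")

    # parents=True creates any missing parent directories;
    # exist_ok=True means no error if the directory is already there.
    dir_path.mkdir(parents=True, exist_ok=True)
    dir_path.mkdir(parents=True, exist_ok=True)  # second call is harmless

    created = dir_path.is_dir()

print(created)  # True
```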

Finally, I can add the filename to the path object I already have, and then open that file for writing:

file_path = file_path.joinpath("journal4.txt")

with file_path.open('w') as file:
    lines = [
        "If you've ever taken a road trip \n",
        "through the Pacific Northwest, you've \n",
        "probably seen a bumper sticker for a \n",
        "place called Gravity Falls.\n"
    ]
    file.writelines(lines)

You'll notice that I used file_path.open('w') instead of open(file_path, 'w'). Technically, both do the exact same thing, although the member function is preferred.
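On the subject of building paths, pathlib also lets you join parts with the / operator, which some people find easier to read than joinpath(). This sketch shows that the three spellings produce equal paths; the directory and file names are placeholders, and nothing is created on disk.

```python
from pathlib import Path

base = Path("journals")  # placeholder relative directory

with_operator = base / "journal1.txt"
with_joinpath = base.joinpath("journal1.txt")
with_constructor = Path("journals", "journal1.txt")

print(with_operator == with_joinpath == with_constructor)  # True
```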

Relative Paths

The reason that open("journal1.txt") works is because it's a relative path, starting in the directory the code is being executed from.

If I have a journals/ directory in the same directory as my code, I can use this:

from pathlib import Path

file_path = Path("journals", "journal1.txt")

with file_path.open('r') as file:
    print(file.read())

As long as I don't start with an absolute path, such as that produced by Path.home(), Paths are relative.

But what if I want to move up a directory, instead of down? You might be tempted to use .. directly, but as you may guess, this is not guaranteed to be portable across all operating systems. Instead, I can use os.pardir to move to the parent directory.

Imagine we have a directory structure that looks like this:

example
├── code
│   └── read_file.py
└── journals
    └── journal1.txt

If, from within example/code, I run python read_file.py, I can access journal1.txt with the following:

from pathlib import Path
import os

file_path = Path(os.pardir, "journals", "journal1.txt")

with file_path.open('r') as file:
    print(file.read())
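If you're ever unsure where a relative path actually points, you can resolve it against the current working directory to get an absolute path. A small sketch of that check follows; it only computes paths, so the file doesn't need to exist.

```python
import os
from pathlib import Path

relative = Path(os.pardir, "journals", "journal1.txt")

# Anchor the relative path at the current working directory,
# then collapse the parent-directory step into a plain absolute path.
absolute = (Path.cwd() / relative).resolve()

print(relative.is_absolute())  # False
print(absolute.is_absolute())  # True
print(absolute)                # depends on where you run it
```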

Review

We've only just brushed the surface of dealing with files, but hopefully this has demystified the open() function and Path objects. Here's a quick recap:

  • The open() function has several modes: r for read, w for truncate and write, x for create and write, and a for append. Adding + to any of those adds the missing functionality, either reading or writing.

  • You must remember to close any file you open. You can do so manually with close(), or...

  • If you use the with statement (context manager) to open a file, the file will be closed automatically.

  • myfile.seek() lets you change position in the open file myfile. It is of little use for writing in the modes a and a+, which append to the end of the file.

  • A pathlib.Path object lets you create a portable path, built from the strings passed to the Path() initializer.

  • You can call open() directly on a Path object.

  • Use Path.home() to get the absolute path to the current user's home folder.

  • Use os.pardir to access the parent directory (equivalent to .. on Unix).

In the next section, I'll dive into context managers in more depth.

Here's the documentation, as usual:

Discussion


Nice and detailed article. I saw a typo that might be worth to fix. When you write about readline(), the section title is readlines(). So there are two sections called readlines().
Thanks for the post!

 

Good catch! Fixing that now. Thanks.

 

Would love to see the rest of the planned series! I'm sure 2020 has thrown your scheduling off. But I am really enjoying your teaching style and humor!

 

Heheh, there's more coming, I promise! 2020 has indeed thrown me off, and I've been catching up on the book itself (which is 3/5 done). The latest plan is on the first post in the series.

 

Excited to hear that! I tried my hand at teaching Python, but took the approach of teaching someone who doesn't know programming at all. So it's much more simple.

Hope all is well! Thank you for continuing to chip away at this series.

 

Please mention operator / in pathlib too.
Also, read_text() which is very convenient.

 

Thanks for the suggestions. The book will go into more depth than I can explore in the article, and I'll be sure to include these in that!

 
 

In your examples of writelines(), you pass a list containing a single string element, but in practice, isn’t the usefulness of writelines() in passing several individual lines in a list?

 

Hmm, I appear to be doing exactly that? I'm not sure what you mean.

lines = [
    "Finally back safe and sound\n",
    "from one of the weirdest days\n",
    "at Gravity Falls.\n"
]
with open("journal3.txt", 'w') as file:
    file.writelines(lines)