DEV Community



Processing Email Contents

Part 4 in a series of articles on implementing a notification system using Gmail and Line Bot

Hi. Back again for another instalment.

Today I will be working through accessing parts of an email. In part 3 we got to the stage where I had created an email.message.EmailMessage object. You may not believe it is ready to use: if you played with it, you might have found that it still contained a lot of encoded, unreadable text wherever the original was non-ASCII. But that's OK. Once we start using the methods provided by the email.message.EmailMessage object, things will become clear.

So a quick review:

We did our search, got our message ids, and boiled the results down to just the message ids, minus the threadIds.

We then used get_message(service, msg_id) to return an email.message.EmailMessage object.

single_email = get_message(service, some_id)
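For anyone who skipped part 3, here is a minimal sketch of how a Gmail message can end up as an EmailMessage. It assumes get_message fetched the message with format='raw', so the Gmail API returned the whole message as a base64url string. The helper name parse_raw_gmail and the sample message are hypothetical illustrations, not the code from part 3.

```python
import base64
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser


def parse_raw_gmail(raw_b64url: str) -> EmailMessage:
    """Hypothetical helper: turn a Gmail 'raw' field into an EmailMessage.

    Assumes the input is the base64url-encoded RFC 5322 message that the
    Gmail API returns when a message is fetched with format='raw'.
    """
    # Restore any stripped '=' padding before decoding.
    padded = raw_b64url + "=" * (-len(raw_b64url) % 4)
    raw_bytes = base64.urlsafe_b64decode(padded)
    # policy.default gives EmailMessage objects with decoded header values,
    # instead of the legacy compat32 behaviour.
    return BytesParser(policy=policy.default).parsebytes(raw_bytes)


# Simulate an API response locally so the sketch runs without credentials.
sample = b"From: system@example.com\r\nSubject: Test\r\n\r\nHello\r\n"
fake_raw = base64.urlsafe_b64encode(sample).decode("ascii")

single_email = parse_raw_gmail(fake_raw)
print(type(single_email).__name__)  # EmailMessage
print(single_email["subject"])      # Test
```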

If you print this with print(single_email), you will get the string representation of the entire email. If the email is not in ASCII, you might see a subject line that looks like this:

Subject: =?ISO-2022-JP?B?GyRCRn5CYDw8Pn//yROJCpDTiRpJDsbKEI=?=


The email body may be just as confusing. But that's OK. We will use the methods provided by email.message.EmailMessage to get these strings back in a readable form.
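The subject line above is an RFC 2047 "encoded word" of the form =?charset?B?payload?=, where B means the payload is Base64-encoded bytes in the named charset. To show what the parser is undoing for us, this sketch builds an encoded word from a sample Japanese string (my own example, not the redacted subject from the real email) and decodes it again with the standard library's email.header helpers.

```python
import base64
from email.header import decode_header, make_header

# Sample text for illustration only; not the real (redacted) subject.
original = "お知らせ"

# Build an RFC 2047 encoded word by hand: =?charset?B?base64-payload?=
payload = base64.b64encode(original.encode("iso-2022-jp")).decode("ascii")
encoded_word = f"=?ISO-2022-JP?B?{payload}?="
print(encoded_word)

# decode_header() splits the header into (bytes, charset) pairs;
# make_header() turns those pairs back into a Header whose str() is readable.
parts = decode_header(encoded_word)
decoded = str(make_header(parts))
print(decoded)  # お知らせ
```

In practice you rarely need this by hand, because an EmailMessage parsed with policy.default does the same decoding when you read a header.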

Here is a list of some of the methods we can use:

single_email.add_alternative()          single_email.get_params()
single_email.add_attachment()           single_email.get_payload()
single_email.add_header()               single_email.get_unixfrom()
single_email.add_related()              single_email.is_attachment()
single_email.as_bytes()                 single_email.is_multipart()
single_email.as_string()                single_email.items()
single_email.attach()                   single_email.iter_attachments()
single_email.clear()                    single_email.iter_parts()
single_email.clear_content()            single_email.keys()
single_email.defects                    single_email.make_alternative()
single_email.del_param()                single_email.make_mixed()
single_email.epilogue                   single_email.make_related()

Headers can simply be accessed using single_email.get('headername'):


sender = single_email.get("from")  # "from" is a reserved word in Python, so it can't be a variable name
subject = single_email.get("subject")
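A slightly fuller sketch of header access, run against a small hand-built message (the addresses and header values are made up). It shows that get() accepts a default for missing headers, that single_email["name"] returns None rather than raising when a header is absent, that get_all() returns every copy of a header that can repeat (such as Received), and that address headers parsed under policy.default expose structured parts.

```python
from email import message_from_string, policy

# A small hand-built message; addresses and values are made up.
raw = (
    "Received: from a.example.com\n"
    "Received: from b.example.com\n"
    "From: Entry System <system@example.com>\n"
    "Subject: Hello\n"
    "\n"
    "Body text\n"
)
single_email = message_from_string(raw, policy=policy.default)

# "from" is a Python keyword, so use another variable name.
sender = single_email.get("from")
subject = single_email.get("subject")

# Missing headers: get() returns the supplied default, [] returns None.
cc = single_email.get("cc", "no cc")
missing = single_email["cc"]

# Headers that may appear several times need get_all().
received = [str(h) for h in single_email.get_all("received")]

# Under policy.default, address headers are parsed into structured objects.
addr = sender.addresses[0]
print(sender, "|", subject, "|", cc, "|", missing)
print(received)
print(addr.display_name, addr.addr_spec)  # Entry System system@example.com
```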

To check whether an email is multipart, call single_email.is_multipart(); it returns True or False.

There are lots of methods for deconstructing an email. Fortunately for me, the emails I am dealing with are system-generated, very simple, plain-text and non-multipart.
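To make the multipart distinction concrete, here is a sketch that builds two messages with EmailMessage itself: one plain-text only, and one with an HTML alternative added via add_alternative(). Both bodies are invented for illustration.

```python
from email.message import EmailMessage

# A simple, single-part text message like the system-generated ones.
plain = EmailMessage()
plain["Subject"] = "Plain only"
plain.set_content("Just text.\n")

# The same text plus an HTML alternative; add_alternative() converts the
# message into multipart/alternative with two sub-parts.
fancy = EmailMessage()
fancy["Subject"] = "Text and HTML"
fancy.set_content("Just text.\n")
fancy.add_alternative("<p>Just text.</p>\n", subtype="html")

print(plain.is_multipart(), plain.get_content_type())  # False text/plain
print(fancy.is_multipart(), fancy.get_content_type())  # True multipart/alternative
print([p.get_content_type() for p in fancy.iter_parts()])  # ['text/plain', 'text/html']
```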

Let's look at the subject of the email.

Subject: =?ISO-2022-JP?B?GyRCRn5CYDw8Pn//yROJCpDTiRpJDsbKEI=?=

Using the get method:

sub = single_email.get('subject')

This returns the subject already decoded into a readable string.

Note that I didn't actually need to know the character encoding. This is because the parser object was set up with the argument policy=policy.default in the previous post.
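The effect of policy=policy.default is easy to see side by side. This sketch parses the same raw header twice: once with policy.default, and once with no policy argument, in which case message_from_string falls back to the legacy compat32 policy. The Japanese subject is a made-up example built inside the snippet.

```python
import base64
from email import message_from_string, policy

# Made-up subject, encoded as an RFC 2047 encoded word in ISO-2022-JP.
payload = base64.b64encode("入退室情報".encode("iso-2022-jp")).decode("ascii")
raw = f"Subject: =?ISO-2022-JP?B?{payload}?=\n\nbody\n"

modern = message_from_string(raw, policy=policy.default)
legacy = message_from_string(raw)  # no policy argument: falls back to compat32

print(modern["subject"])  # 入退室情報 (decoded for us)
print(legacy["subject"])  # still the raw, undecoded encoded word
```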

So as you can see, getting the header details is pretty easy. How about the body of the email? Again, this is pretty simple for a single, non-multipart email: I just use get_content().

body = single_email.get_content()
Redacted Redacted
様の入退室情報をお知らせします。

2021-02-01 19:08:26 に退室しました。

(In English: "This is to notify you of [name]'s entry/exit information." and "Left at 2021-02-01 19:08:26.")
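Under the hood, get_content() undoes both the Content-Transfer-Encoding and the charset. The sketch below uses a hand-built ISO-2022-JP, Base64-encoded text/plain message (the body reuses the second line of the output above; the rest is invented) and compares get_content() with the lower-level get_payload(decode=True), which only undoes the transfer encoding and leaves you with bytes.

```python
import base64
from email import message_from_string, policy

# Invented body reusing the second line of the output above.
body_text = "2021-02-01 19:08:26 に退室しました。\n"
encoded_body = base64.b64encode(body_text.encode("iso-2022-jp")).decode("ascii")
raw = (
    "Subject: test\n"
    "Content-Type: text/plain; charset=ISO-2022-JP\n"
    "Content-Transfer-Encoding: base64\n"
    "\n"
    f"{encoded_body}\n"
)
msg = message_from_string(raw, policy=policy.default)

# get_content(): Base64 and ISO-2022-JP are both undone, giving a str.
text = msg.get_content()
# get_payload(decode=True): only the Base64 layer is undone, giving bytes.
payload_bytes = msg.get_payload(decode=True)

print(text)
print(payload_bytes.decode(msg.get_content_charset()))
```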

If you are dealing with a multipart email, you can use the walk() method, combined with get_content_maintype() and get_content_subtype(), to find the plain-text and HTML parts or any binary attachments.
The Python documentation already covers this well, so I won't go into it here.
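For completeness, a short sketch of the multipart case: walk() visits every part, and get_content_maintype()/get_content_subtype() pick out the plain text and HTML, while is_attachment() flags attachments. The message, its contents and the file name data.bin are made up inside the snippet. get_body(), an EmailMessage method that returns the preferred body part directly, is shown at the end as an alternative.

```python
from email.message import EmailMessage

# Build multipart/mixed -> (multipart/alternative -> plain, html) + attachment.
msg = EmailMessage()
msg["Subject"] = "Multipart example"
msg.set_content("Plain text body\n")
msg.add_alternative("<p>HTML body</p>\n", subtype="html")
msg.add_attachment(b"\x00\x01\x02", maintype="application",
                   subtype="octet-stream", filename="data.bin")

found = []
for part in msg.walk():
    maintype = part.get_content_maintype()
    subtype = part.get_content_subtype()
    if maintype == "multipart":
        continue  # container parts; walk() visits their children next
    if part.is_attachment():
        found.append(("attachment", part.get_filename()))
    elif (maintype, subtype) == ("text", "plain"):
        found.append(("plain", part.get_content().strip()))
    elif (maintype, subtype) == ("text", "html"):
        found.append(("html", part.get_content().strip()))

print(found)

# Alternative: let get_body() choose, preferring plain text over HTML.
body = msg.get_body(preferencelist=("plain", "html"))
print(body.get_content_type())  # text/plain
```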

That's it for this article. Next time I will give some information on using regex with Japanese text, but in the meantime you can also take a look at this earlier post.
