Adam.S

Posted on Feb 3, 2021 • Edited on Jun 28, 2021 • Originally published at bas-man.dev

Processing Email Contents

#google #python #api #email

Part 4 in a series of articles on implementing a notification system using Gmail and Line Bot

Hi. Back again for another instalment.

Today I will be working through accessing parts of an email. In part 3 we got to the stage where I had created an email.message.EmailMessage object. You might not believe it is ready to use: if you played with it, you may have found that it still contained a lot of encoded, non-ASCII content. But that's OK. Once we start using the methods provided by the email.message.EmailMessage object, things will become clear.

So a quick review:

We did our search, got our message ids, and boiled the results down to just the message ids, minus the threadIds.

We then used get_message(service, msg_id) to return an email.message.EmailMessage object.

single_email = get_message(service, some_id)
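
As a reminder of what sits behind that call, here is a minimal sketch of a get_message helper. It is not necessarily identical to the version from part 3: it assumes the Gmail API is asked for format="raw" and that the parser is given policy=policy.default. FakeGmailService is a hypothetical stand-in so the sketch runs without credentials.

```python
import base64
from email import message_from_bytes, policy


def get_message(service, msg_id, user_id="me"):
    """Fetch one Gmail message in raw form and parse it into an EmailMessage."""
    # Assumption: format="raw" makes the API return the whole message
    # as a base64url-encoded string under the "raw" key.
    response = (
        service.users()
        .messages()
        .get(userId=user_id, id=msg_id, format="raw")
        .execute()
    )
    raw_bytes = base64.urlsafe_b64decode(response["raw"])
    # policy.default is what lets headers and bodies decode to readable text.
    return message_from_bytes(raw_bytes, policy=policy.default)


class FakeGmailService:
    """Hypothetical offline stand-in that mimics the call chain used above."""

    def __init__(self, raw_message: bytes):
        self._raw = base64.urlsafe_b64encode(raw_message).decode("ascii")

    def users(self):
        return self

    def messages(self):
        return self

    def get(self, userId, id, format):
        return self

    def execute(self):
        return {"raw": self._raw}


service = FakeGmailService(b"Subject: Test message\r\n\r\nHello from the sketch.\r\n")
single_email = get_message(service, "some_id")
print(type(single_email).__name__, "|", single_email["subject"])
```

With a real service object from the Google API client, only the last three lines would change.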

Calling print(single_email) gives you the string representation of the entire email. If the email is not in ASCII, you might see a subject line that looks like this:

Subject: =?ISO-2022-JP?B?GyRCRn5CYDw8Pn//yROJCpDTiRpJDsbKEI=?=

The email body is just as confusing. But that's OK. We will use the methods provided by email.message.EmailMessage to get these strings returned in a readable form.

Here is a list of some of the methods we can use:

single_email.add_alternative()          single_email.get_params()
single_email.add_attachment()           single_email.get_payload()
single_email.add_header()               single_email.get_unixfrom()
single_email.add_related()              single_email.is_attachment()
single_email.as_bytes()                 single_email.is_multipart()
single_email.as_string()                single_email.items()
single_email.attach()                   single_email.iter_attachments()
single_email.clear()                    single_email.iter_parts()
single_email.clear_content()            single_email.keys()
single_email.defects                    single_email.make_alternative()
single_email.del_param()                single_email.make_mixed()
single_email.epilogue                   single_email.make_related()
...
...
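
If you want to produce a list like this yourself, one option is to ask Python for the public names on an EmailMessage instance. This is just a small sketch using dir(). The variable names are mine, and the exact set of names may vary between Python versions.

```python
from email.message import EmailMessage

# An empty message is enough to inspect the available methods and attributes.
message = EmailMessage()

# Skip private and dunder names so only the public API is listed.
public_names = [name for name in dir(message) if not name.startswith("_")]

for name in public_names:
    print(name)
```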

Headers can simply be accessed using single_email.get('headername')

Examples:

sender = single_email.get("from")  # "from" is a reserved word in Python, so use another variable name
subject = single_email.get("subject")
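
Here is a slightly fuller sketch of header access that you can run on its own. The sample message and addresses are made up for illustration. It assumes the message was parsed with policy=policy.default, which is what makes the addresses attribute available on the From header.

```python
from email import message_from_string, policy

# A made-up message, used only for illustration.
raw = (
    "From: Example Sender <sender@example.com>\n"
    "To: reader@example.com\n"
    "Subject: Hello\n"
    "\n"
    "Body text\n"
)
message = message_from_string(raw, policy=policy.default)

# Header names are matched case-insensitively.
sender = message.get("From")

# get() accepts a fallback value for headers that are not present.
priority = message.get("X-Priority", "not set")

# With policy.default, address headers are parsed into structured objects.
first_address = message["from"].addresses[0]

print(sender)
print(priority)
print(first_address.display_name, first_address.addr_spec)
```

Using get() with a fallback avoids having to check for None when a header is optional.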

To check whether an email is multipart, call single_email.is_multipart(), which returns True or False.

There are lots of methods you can use to deconstruct an email. Fortunately for me, the emails I am dealing with are system generated and very simple: plain text and non-multipart.

Let's look at the subject of the email.

Subject: =?ISO-2022-JP?B?GyRCRn5CYDw8Pn//yROJCpDTiRpJDsbKEI=?=

Using the get method:

subject = single_email.get('subject')
print(subject)

I get:

入退室情報のお知らせ

Note that I didn't actually need to know the character encoding. This is due to the way the parser object was set up with the argument policy=policy.default in the previous post. In English, the decoded subject reads roughly "Notice of entry and exit information".
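
To see the effect of the policy for yourself, here is a small sketch that parses the same bytes twice: once with policy.default and once with the older policy.compat32. The raw message is built locally with email.header.Header purely for the demonstration. It is not the original email.

```python
from email import message_from_bytes, policy
from email.header import Header

subject_text = "入退室情報のお知らせ"

# Build an ISO-2022-JP encoded-word header, similar in shape to the one above.
encoded_subject = Header(subject_text, "iso-2022-jp").encode()
raw = ("Subject: " + encoded_subject + "\r\n\r\nbody\r\n").encode("ascii")

decoded = message_from_bytes(raw, policy=policy.default)
legacy = message_from_bytes(raw, policy=policy.compat32)

print(legacy["subject"])   # still in the =?charset?b?...?= form
print(decoded["subject"])  # readable Japanese text
```

With compat32 you would have to decode the header yourself, for example with the helpers in email.header.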

So as you can see, getting the header details is pretty easy. How about getting the body of the email? Again, this is pretty simple when dealing with a single non-multipart email. I will simply use get_content():

body = single_email.get_content()
print(body)

Redacted Redacted
[redacted] 様の入退室情報をお知らせします。

【セーフティメール情報】
2021-02-01 19:08:26 に退室しました。

※なお、このメールに返信することはできませんのでご注意ください
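
The same thing can be tried without a Gmail account. The sketch below builds a tiny non-multipart message with an ISO-2022-JP body (the headers are my own assumptions about what such a system email looks like) and reads it back with get_content(), guarded by an is_multipart() check.

```python
from email import message_from_bytes, policy

# One line from the sample output above, encoded the way the mail system might send it.
body_text = "2021-02-01 19:08:26 に退室しました。\r\n"
raw = (
    b"Subject: sample\r\n"
    b"Content-Type: text/plain; charset=iso-2022-jp\r\n"
    b"Content-Transfer-Encoding: 7bit\r\n"
    b"\r\n"
) + body_text.encode("iso-2022-jp")

message = message_from_bytes(raw, policy=policy.default)

# get_content() is intended for a single, non-multipart part.
if not message.is_multipart():
    # The charset comes from the Content-Type header, so no manual decoding is needed.
    body = message.get_content()
    print(body)
```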

If you are dealing with a multipart email, you will need to use the walk() method, combined with get_content_maintype() and get_content_subtype(), to find things like plain text, HTML, or binary attachments.
Good Python documentation for this already exists, so I won't go into it here.
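
For a rough idea of what that looks like, here is a minimal sketch. The helper names build_sample_message and extract_text_parts are hypothetical, and the sample message is generated locally rather than fetched from Gmail. It only collects text parts and ignores attachments.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def build_sample_message():
    """Create a multipart/alternative message and parse it back from bytes."""
    msg = EmailMessage()
    msg["Subject"] = "Sample"
    msg.set_content("plain text body")
    msg.add_alternative("<p>html body</p>", subtype="html")
    return message_from_bytes(msg.as_bytes(), policy=policy.default)


def extract_text_parts(message):
    """Return a dict mapping text subtypes (plain, html) to their decoded content."""
    parts = {}
    for part in message.walk():
        # Container parts only hold other parts, so skip them.
        if part.is_multipart():
            continue
        if part.get_content_maintype() == "text":
            parts[part.get_content_subtype()] = part.get_content()
    return parts


sample = build_sample_message()
text_parts = extract_text_parts(sample)
print(sample.is_multipart())
print(sorted(text_parts))
```

If a message carries several parts with the same subtype, this sketch keeps only the last one, so a real implementation would probably collect them in a list.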

That's it for this article. Next I will give some information on regex for dealing with Japanese. But you can also take a look at this earlier post.
