
Marco

Posted on • Originally published at blog.disane.dev

Download Amazon invoices automatically

I order a lot from Amazon throughout the year, from hardware to all sorts of other things. Amazon doesn't send invoices by email, but I need them in my DMS (document management system). Now I've found a solution for this. 🔥


If you're also interested in a good document management system, just take a look at my article about a very good system:

Everyone needs a DMS at home: Your documents are piling up in folders? Just digitize them 📃


The problem 🤔

More and more service providers are moving away from sending invoices by email and instead make them available in customer portals. This has advantages but also disadvantages. Once you receive a lot of invoices, "just" downloading each one by hand becomes tedious. Amazon sets a bad example here when it comes to customer friendliness.

There is no way to download all invoices in one bundle, at least not with a "normal" account. Business accounts may be able to do this, but I don't know.

But if, like me, you do your own annual income tax return and want to deduct some purchases, you don't want to rummage through Amazon at the end of the year. I did that for years; it's tedious and really no fun. So I had to find a solution.

The solution 💡

As you know, I've been a professional software developer for almost 16 years. So it made sense for me to write a tool for this and that's how "Docudigger" came about.

GitHub - Disane87/docudigger: Website scraper for getting invoices automagically as pdf (useful for taxes or DMS)

Background 🛠️

The challenge is that some providers do not offer APIs with which you could download your invoices. Docudigger therefore uses NodeJS and the Puppeteer library in the background. Puppeteer is a browser automation library: it drives a real (usually headless) browser from code, so a script can open websites and interact with them the way a user would. The trick is that everything happens automatically.

This is how Docudigger can log in to websites and navigate them as if it were a "real" user.
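As a rough illustration of such a scripted login, here is a minimal TypeScript sketch. It is not Docudigger's actual code. `PageLike` lists only four methods that Puppeteer's `Page` provides (`goto`, `type`, `click`, `waitForNavigation`), so the sketch compiles without Puppeteer installed. `LoginForm`, `login`, and all selectors are hypothetical names made up for this example.

```typescript
// Subset of the methods offered by a Puppeteer `Page`. Declared locally so
// the sketch is self-contained; a real Puppeteer page should fit this shape.
interface PageLike {
  goto(url: string): Promise<unknown>;
  type(selector: string, text: string): Promise<unknown>;
  click(selector: string): Promise<unknown>;
  waitForNavigation(): Promise<unknown>;
}

// Hypothetical description of a login form. The selectors are placeholders
// and must be looked up on the real page.
interface LoginForm {
  url: string;
  userSelector: string;
  passwordSelector: string;
  submitSelector: string;
}

// Opens the login page, fills in the credentials and submits the form.
async function login(
  page: PageLike,
  form: LoginForm,
  user: string,
  password: string
): Promise<void> {
  await page.goto(form.url);
  await page.type(form.userSelector, user);
  await page.type(form.passwordSelector, password);
  // Start waiting for the navigation before clicking, so the page load
  // triggered by the click is not missed.
  await Promise.all([page.waitForNavigation(), page.click(form.submitSelector)]);
}
```

With Puppeteer installed, the `page` argument would typically come from `puppeteer.launch()` followed by `browser.newPage()`. After the login, the same page object can be used to open the order overview and download the invoice PDFs.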

Installation ⌨️

To use Docudigger as described here, you need a Docker host. This is the easiest way to run applications in a controlled environment, which is why I chose it.

Docudigger can also be run from the command line, but this has the disadvantage that it does not run periodically. You have to decide for yourself which approach suits you best.

If you decide to use Docker, you can simply execute the following command and Docudigger should be ready for use immediately:

docker run \
  -e AMAZON_USERNAME='[YOUR MAIL]' \
  -e AMAZON_PASSWORD='[YOUR PW]' \
  -e AMAZON_TLD='de' \
  -e AMAZON_YEAR_FILTER='2020' \
  -e AMAZON_PAGE_FILTER='1' \
  -e LOG_LEVEL='info' \
  -v "C:/temp/docudigger/:/home/node/docudigger" \
  ghcr.io/disane87/docudigger

For more configuration options, see the documentation:

GitHub - Disane87/docudigger: Website scraper for getting invoices automagically as pdf (useful for taxes or DMS)

💡 Please adjust the values to your system. Above all, you must replace the login credentials and adapt the file system paths.

The nice thing is that Docudigger pulls all PDFs and stores them under /home/node/docudigger (inside the container). If you map this path to the Paperless consume folder, for example, Paperless can process the documents as soon as they are stored there. You should also take a look at the ONLY_NEW setting. If it is switched on, the first run is logged and the next run continues after the last processed item.
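The idea behind ONLY_NEW can be sketched in a few lines of TypeScript. This is only an assumption about how such a resume step might work, not Docudigger's actual implementation. `Invoice`, `invoicesAfter`, and `nextMarker` are hypothetical names, and the sketch assumes the invoices are listed oldest first.

```typescript
// Hypothetical shape of a scraped invoice; the real tool may store more.
interface Invoice {
  orderId: string;
}

// Returns the invoices that come after the last processed one.
// `invoices` is assumed to be sorted oldest first. If there is no marker yet
// (first run) or the marker is not found, everything is returned.
function invoicesAfter(invoices: Invoice[], lastProcessedId: string | null): Invoice[] {
  const index =
    lastProcessedId === null
      ? -1
      : invoices.findIndex((invoice) => invoice.orderId === lastProcessedId);
  return invoices.slice(index + 1);
}

// Returns the marker to store for the next run: the newest invoice handled
// in this run, or the old marker if nothing new was found.
function nextMarker(pending: Invoice[], lastProcessedId: string | null): string | null {
  return pending.length > 0 ? pending[pending.length - 1].orderId : lastProcessedId;
}
```

On the first run the marker is `null`, so all invoices are downloaded and the newest order ID is stored. Later runs only return the invoices that were added after that ID.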

Other cool features 🧙

I designed Docudigger so that it consists of plugins that you can extend yourself. My idea was that you write one plugin per provider, as page layouts etc. will differ significantly from one another.

The plan is, for example, that you will be able to pull invoices from your Internet provider or account statements from your bank, all completely automated.
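To make the "one plugin per provider" idea more concrete, here is a small TypeScript sketch. It does not reflect Docudigger's real plugin API. `InvoicePlugin`, `PluginRegistry`, and the method names are hypothetical and only illustrate the general pattern.

```typescript
// Hypothetical plugin contract: one implementation per provider.
interface InvoicePlugin {
  readonly provider: string;
  // Downloads the invoices and returns the names of the stored PDF files.
  fetchInvoices(): Promise<string[]>;
}

// Keeps track of the available plugins and runs them one after another.
class PluginRegistry {
  private readonly plugins: InvoicePlugin[] = [];

  register(plugin: InvoicePlugin): void {
    if (this.plugins.some((p) => p.provider === plugin.provider)) {
      throw new Error("Plugin already registered: " + plugin.provider);
    }
    this.plugins.push(plugin);
  }

  providers(): string[] {
    return this.plugins.map((p) => p.provider);
  }

  // Runs every plugin sequentially and collects the file names per provider.
  async runAll(): Promise<Record<string, string[]>> {
    const result: Record<string, string[]> = {};
    for (const plugin of this.plugins) {
      result[plugin.provider] = await plugin.fetchInvoices();
    }
    return result;
  }
}
```

Each provider then only has to implement `fetchInvoices` with its own login and page navigation, while the scheduling and storage logic stays in one place.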

All of this is still under construction, and currently only Amazon is supported. More plugins are planned for the future, but as this is a hobby project, it sometimes takes a little longer 😊

Restrictions ⚠️

Of course, there are also a few restrictions. For example, captchas on the page cannot be solved automatically (at least not yet).

The tool is also not yet able to handle two-factor authentication, so you'll have to decide for yourself whether it is right for you. Both features are planned, but when I will be able to implement them is unfortunately still up in the air.

However, my plan is to actively work on the tool 🤞


I hope I have been able to open up a new possibility for you to automate your invoices from Amazon (or other systems in the near future).


If you like my posts, it would be nice if you followed my blog for more tech stuff.

Top comments (2)

Michael Tharrington

Wow! This is super handy. Cool creation, Marco, and appreciate ya sharing! 🙌

Marco

Thank you! It could be that some edge cases are not handled yet, and only the German Amazon site has been tested, but it can be pretty handy indeed :)