I have what was once a million-dollar-grossing Amazon business under Brooke's Books, named after my lovely wife Brooke. It was started in 2009 while I was a senior at the Air Force Academy after my brother suggested I sell a few of my older textbooks on Amazon. Ironically, at the time I had scoffed at the idea and asked him to sell them for me; fortunately he refused and I quickly learned how easy it was to sell on Amazon. What began as a small venture selling textbooks for other students for a minimal profit soon transformed into an all-out effort to buy and sell books on Amazon.
What follows is my business and programming journey from a near worthless startup with Excel to a seasoned expert using Python and the modern tools of the trade.
When I left the Academy and moved to my first Air Force assignment in Ohio, I no longer had access to a huge college textbook market and almost gave up on the business as I started my studies at AFIT (Air Force Institute of Technology). I dabbled in library and yard sales, but there were almost no current textbooks that would sell for more than a few dollars online. After some time, I discovered that eBay was a far superior marketplace to buy from, as some auctions would either be missing key data (like the ISBN) or were selling at the bottom of the market. College textbook prices tend to peak in January and August when students go back to school, so if you sell in December or May through July, you're selling when everyone else is and there's no demand.
My education at the Academy was focused on astronautical engineering, and at the time programming wasn't even remotely emphasized. That meant automating the searching, parsing, and bidding of the eBay auctions using what I knew best: Microsoft Excel. I still had a license left over from school and used Excel VBA to scrape eBay for any auction containing a textbook that was ending in the next 24 hours. This worked out surprisingly well, and served as the foundation for the business over the next year and a half.
Over this time, I was starting to learn about the cyclic behavior of the market (the aforementioned January and August peaks versus the December and May lows) by tracking price history. At a time when free price history resources like CamelCamelCamel or Keepa didn't exist, this meant maintaining my price database entirely in, again, Excel. When buying books on eBay, I needed to know the maximum price they would sell for at peak so I knew how to place my maximum bids. The early price collection system would use an external program to collect a single data point each day for each of the hundred thousand books I was tracking at the time, and save it as a CSV to be read into Excel. While this process was slow and manual, it was the best I had, as I was ignorant of Python and there was no other way of getting the history of a product on Amazon.
By 2011 the business had plateaued at around $50k of profit a year, which was reasonable, but I was always looking for ways to improve it. I still recall the moment when I looked at an Excel plot of the price history of a law textbook and realized that instead of performing market arbitrage (buying from eBay and selling on Amazon), I could focus on market timing. Here's an example of a price history plot you can generate right now using the keepa API:
    import keepa
    api = keepa.Keepa('<your API key from https://keepa.com/#!api>')

    # query and plot
    asin = '1628101326'
    products = api.query(asin)
    # query returns a list of products, so plot the first (and only) one
    keepa.plot_product(products[0])
This textbook has clear periodic peaks at the start of the fall semester, and when I saw a similar plot of this back in 2011, I realized that I could focus entirely on buying and selling just on Amazon. No more sifting through eBay listings! Now that there are resources that track Amazon prices, you don't have to maintain your own database, but for me, still fresh out of college and with almost no business experience, I was floored by this discovery. With this asymmetric information, I knew when to buy, when to sell, and how much I could afford to buy textbooks for given the estimated future price of a textbook.
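As a rough illustration of the market timing idea, the sketch below averages a price history by calendar month and picks out the cheapest and the most expensive month. The price points are made up, and `monthly_averages` is a hypothetical helper written for this example, not code from brookesbooks.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def monthly_averages(history):
    """Return {month: average price} for a list of (date, price) pairs."""
    by_month = defaultdict(list)
    for day, price in history:
        by_month[day.month].append(price)
    return {month: mean(prices) for month, prices in by_month.items()}


# Made-up price points for a single textbook across two years
history = [
    (date(2010, 1, 15), 80.0), (date(2010, 5, 15), 42.0),
    (date(2010, 8, 20), 95.0), (date(2010, 12, 10), 40.0),
    (date(2011, 1, 15), 84.0), (date(2011, 5, 15), 46.0),
    (date(2011, 8, 20), 99.0), (date(2011, 12, 10), 44.0),
]

averages = monthly_averages(history)
buy_month = min(averages, key=averages.get)   # cheapest month on average
sell_month = max(averages, key=averages.get)  # priciest month on average
print(f"buy in month {buy_month}, sell in month {sell_month}")
```

With real data you would want many more points per month and some allowance for fees and new editions, but the spread between the two months is the core of the idea.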
The business profitability doubled the next year to $100k and my Excel database was getting strained. Even using an Excel binary file, the spreadsheet containing my database was over a gigabyte and just opening it up would take around 5 minutes. Querying for a textbook would take another 10 seconds, and it was becoming difficult to maintain a tracking database.
At the time, I was an Air Force officer working at the Air Force Research Laboratory at Wright Patterson AFB, and the Air Force used (and sadly still uses) a lot of MATLAB. I was starting to get into coding and automation there and decided to transition my Excel codebase and database over to MATLAB. This seemed like a logical choice given MATLAB's strong suite of database functionality, and I was fine with shelling out the cost of a license to increase the business profitability. Since I'd never heard of Python, and MATLAB was leaps and bounds ahead of Excel VBA, I was fine with the transition. I even created a GUI within MATLAB that gave me a spreadsheet-like experience, with a list of the books I was tracking on the bottom half of the screen and a plot and entry box on the top half to plot the price history.
Over the next two years from 2012 to 2014, I ran the business solely with the help of MATLAB and my grit. Due to MATLAB's proprietary nature and lack of great third party packages, whenever I wanted to add functionality to the program or database, I had to write it myself. At the time, there was no integration with Amazon's API; when creating shipments or tracking inventory I had to use Excel or web forms. Grabbing data still required external programs to pull data from Amazon, and loading CSV files was done manually. In other words, like Excel, it worked, but it wasn't optimal, and worse, I was stuck entirely within MATLAB's ecosystem of pay-to-use toolboxes.
Three things happened around the same time to force me to transition out of MATLAB to a superior language. First, I was leaving the Air Force and would be starting to work as a subcontractor for them while working remotely. At the time I had to purchase another license of MATLAB, and I really didn't want to have two licenses (one for the book business and one as an Air Force subcontractor). I was getting frustrated with the inflexibility of MATLAB's licensing and toolboxes, and had even considered moving to Octave before discovering Python. I can't remember exactly how I discovered it, but I recall installing the Anaconda Python distribution and immediately feeling comfortable with Spyder given how similar it was to MATLAB's IDE. Learning that there was more to programming than matrices and arrays really took me by surprise coming from a MATLAB-only background. When I realized that I could install someone else's module with pip install <module>, I was sold.
The second reason to transition to a more flexible, modular programming ecosystem was my shift from fulfillment by merchant (FBM) to fulfillment by Amazon (FBA). For years I had been personally buying, sorting, binning, repackaging and shipping thousands of books a year. At one point, the post office even refused to deliver or pick up my books because it was straining their resources! Back in 2013, I was personally contacted by Amazon, which was at the time looking for early adopters, to transition my inventory to FBA. I needed a new method of inventory management, as I needed to be able to interface with Amazon's API. Enter the world of Python modules! The boto module worked out of the box and allowed me to automate my shipments, inventory management, and even pricing of my entire inventory. After a month, I had a collection of personal inventory and pricing scripts all using free software and community-maintained, open-source modules. I could even start writing my own classes:
    # MWSConnection comes from boto; log, connected and the credential
    # constants are defined elsewhere in the module (not shown)
    from boto.mws.connection import MWSConnection


    class MWS(object):
        """Interface with MWS"""

        def __init__(self):
            """ initializes mws api """
            if connected():
                self.open()

        def open(self):
            """ Opens connection to Amazon MWS """
            # Establish global MWS connection
            log.info('Establishing MWS connection...')
            self.api = MWSConnection(ACCESSKEYID, SECRETKEY,
                                     validate_certs=False)
            self.api.SellerId = MERCHANT
            self.api.Merchant = MERCHANT
            self.api.MarketplaceId = MARKETPLACEID
            log.info('Established')

        def get_single_order(self, orderID, callback=None):
            """ Returns text on a single order """
            # check for internet connection before submitting
            connected(allowException=True)

            log.debug('Requesting %s order information' % orderID)
            result = self.api.get_order(AmazonOrderId=[orderID])
            orders = result.GetOrderResult.Orders.Order
            if len(orders):
                order = orders
This enabled me to finally solve my third goal: automated price research and database maintenance. The bottlenose module allows access to Amazon's product advertising API, which can be used for price research. It's fairly limited in the number of requests that you can make with it, but if you limit the number of products you track (I had pared mine down to 50k books), you can update an entire database overnight, all within Python, as a scheduled Python script.
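To give a feel for what working within a request limit looks like, here is a minimal sketch of a batched, throttled update loop. The batch size, the delay, and the `fetch_prices` callable are all assumptions made for the example; `fake_fetch` stands in for a real API call so the code runs without network access, and none of this is the actual bottlenose interface.

```python
import time


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def update_prices(asins, fetch_prices, batch_size=10, delay=1.0,
                  sleep=time.sleep):
    """Fetch prices batch by batch, pausing between requests.

    `fetch_prices` is a placeholder callable that takes a list of ASINs
    and returns a {asin: price} dict.
    """
    prices = {}
    for i, batch in enumerate(chunked(asins, batch_size)):
        if i:  # no need to wait before the first request
            sleep(delay)
        prices.update(fetch_prices(batch))
    return prices


def fake_fetch(batch):
    """Stand-in for a real API call; returns a dummy price per ASIN."""
    return {asin: 10.0 for asin in batch}


asins = [f"ASIN{n:04d}" for n in range(13)]  # hypothetical identifiers
prices = update_prices(asins, fake_fetch, delay=0.0)
print(len(prices))
```

Passing `sleep` in as a parameter keeps the loop easy to test, since a test can record the requested pauses instead of actually waiting.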
These changes were necessary as I was moving on from being an Air Force officer to becoming a subcontractor with, ironically enough, a higher workload. I had also hired a subcontractor of my own to deal with the repackaging and shipment of books from individual sellers to Amazon's warehouses, and I needed precise knowledge of my inventory so that I could track any book from purchase to eventual sale, which wasn't possible within MATLAB without writing everything from scratch. With Python, it generally meant looking for anything from a code snippet on Stack Overflow to a full-blown module on GitHub or PyPI, saving weeks if not months of effort. Want to read Excel files? Try xlrd or openpyxl. Need to automate the tracking of shipments to check if they were delivered? Use tracking-url. Large databases? pandas. Web scraping? beautifulsoup4. Validate ISBNs? pyisbn. Schedule scripts or programs from Python? python-crontab. The list goes on and on and on.
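To show the kind of thing a small package saves you from writing (and debugging) yourself, here is a standalone ISBN-10 check digit validator. It deliberately does not use pyisbn's API; `is_valid_isbn10` is a hypothetical helper written just for this illustration.

```python
def is_valid_isbn10(isbn):
    """Check an ISBN-10: digits weighted 10 down to 1 must sum to 0 mod 11."""
    isbn = isbn.replace("-", "").upper()
    if len(isbn) != 10:
        return False
    total = 0
    for position, char in enumerate(isbn):
        if char == "X" and position == 9:
            value = 10  # 'X' is only allowed as the final check digit
        elif char in "0123456789":
            value = int(char)
        else:
            return False
        total += (10 - position) * value
    return total % 11 == 0


print(is_valid_isbn10("1628101326"))
print(is_valid_isbn10("163043051X"))
print(is_valid_isbn10("1628101327"))
```

It's only about fifteen lines, but multiply that by every file format, API, and edge case a business touches and the appeal of installing someone else's tested module becomes obvious.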
With the magic of pyqt, I created a fully featured GUI that allowed me to track which books could be purchased, query Amazon for the listing, preview the history of the book, and check my own inventory in real-time, all without even opening up a browser. As the database was loaded in advance, I didn't need to manually query for anything, and with the use of a cache, I could instantly have all the data I needed.
Over the next few years, I kept incrementally improving my code to make it easier to maintain and track changes. I converted the collection of scripts to a genuine Python module so I could install it on multiple machines, incorporated GitHub for private code backup, and even added some (very) basic unit tests with pytest to make sure that I didn't break anything between my revisions.
At the peak of the business in 2016, I was bringing in $250k per year and I was only investing about 4 to 8 hours a week maintaining the code and running the software. For a short time, I thought that the business would go on indefinitely.
Like all good things, especially those with a low cost of entry, it didn't last: other people were quick to figure out and enter the business. For some time, I was able to stay competitive without changing the business model or my purchasing algorithm, but I soon found that other sellers were undercutting me by a few cents during peak sales season and winning the buy box. It got so bad that during a two week selling period (Christmas break no less) I had to spend several hours each day manually updating prices to ensure that I remained competitive. Given the release schedule of college textbooks, missing out on a semester meant that your book was no longer the most current edition and was effectively worthless.
While on a ski trip in 2018 with my wife and (at the time) three kids, and unable to manually micromanage book prices, I realized that I needed to create a script/module that could automatically and persistently update my prices for me within certain parameters. Given the inconsistency of the hotel WiFi, this wasn't something I trusted to run on my local computer, so I resorted to creating a script that would run in the cloud (Google Cloud to be specific) on one of their free offerings. That evening, after managing the kids and going on a few runs, I wrote a very basic script that would:
- Grab my current inventory using
- Check the competitive prices through
- Modify my prices based on a certain price threshold
- Submit the prices and repeat.
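The pricing step in the list above can be sketched roughly as follows. This is a simplified illustration rather than the actual brookesbooks logic: the fixed five-cent step, the per-book price floors, the SKUs, and the `undercut_price` helper are all assumptions, and fetching the inventory and submitting the price feed are left out.

```python
from decimal import Decimal


def undercut_price(my_price, competitive_price, floor, step=Decimal("0.05")):
    """Return a new price just under the competition, or None for no change."""
    if competitive_price >= my_price:
        return None  # already at or below the competitive price
    target = competitive_price - step
    if target < floor:
        return None  # refuse to chase the price below the threshold
    return target


# Made-up inventory, competitive prices, and floors keyed by hypothetical SKU
inventory = {"SKU_A": Decimal("26.19"), "SKU_B": Decimal("97.00"),
             "SKU_C": Decimal("15.00")}
competitive = {"SKU_A": Decimal("26.14"), "SKU_B": Decimal("99.50"),
               "SKU_C": Decimal("9.00")}
floors = {"SKU_A": Decimal("20.00"), "SKU_B": Decimal("80.00"),
          "SKU_C": Decimal("12.00")}

# Only books whose price actually changes go into the feed
feed = {}
for sku, price in inventory.items():
    new_price = undercut_price(price, competitive[sku], floors[sku])
    if new_price is not None:
        feed[sku] = new_price

print(feed)
```

Using `Decimal` instead of floats avoids rounding surprises when a difference of a few cents decides who wins the buy box.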
And behold, it actually worked:
    brookesbooks.inventory - DEBUG - Loading inventory file /home/alex/.brookesbooks/databases/inventory/1599180665.tsv
    brookesbooks.inventory - DEBUG - Requesting inventory
    brookesbooks.mws - DEBUG - Requesting _GET_FBA_MYI_UNSUPPRESSED_INVENTORY_DATA_ report
    brookesbooks.mws - DEBUG - Report attempt 1...
    brookesbooks.mws - DEBUG - Getting report 79083018510
    brookesbooks.mws - DEBUG - Acquired report 79083018510
    brookesbooks.inventory - DEBUG - Inventory saved to /home/alex/.brookesbooks/databases/inventory/1599334709.tsv
    brookesbooks.inventory - DEBUG - Loading inventory file /home/alex/.brookesbooks/databases/inventory/1599334709.tsv
    brookesbooks.manager - INFO - Starting persistent undercut
    brookesbooks.inventory - DEBUG - Loading inventory file /home/alex/.brookesbooks/databases/inventory/1599180665.tsv
    brookesbooks.mws - DEBUG - Getting competitive pricing
    brookesbooks.mws - DEBUG - Competitive pricing 0 of 13
    brookesbooks.mws - DEBUG - Competitive pricing 10 of 13
    brookesbooks.mws - INFO - Acquired Competitive pricing
    brookesbooks.manager - DEBUG - Changing 1454868295_VG from $26.19 to $26.05
    brookesbooks.manager - DEBUG - Due to compeditive price $26.14
    brookesbooks.manager - DEBUG - Original price was $26.19
    brookesbooks.manager - DEBUG - Changing 1454868406_LN from $97.00 to $96.71
    brookesbooks.manager - DEBUG - Due to compeditive price $96.75
    brookesbooks.manager - DEBUG - Original price was $97.00
    brookesbooks.manager - DEBUG - Changing 1454881798_A from $146.98 to $146.04
    brookesbooks.manager - DEBUG - Due to compeditive price $146.10
    brookesbooks.manager - DEBUG - Original price was $146.98
    brookesbooks.manager - DEBUG - Changing 1454881798_N from $220.89 to $220.70
    brookesbooks.manager - DEBUG - Due to compeditive price $220.79
    brookesbooks.manager - DEBUG - Original price was $220.89
    brookesbooks.manager - DEBUG - Changing 163043051X_G from $34.08 to $33.82
    brookesbooks.manager - DEBUG - Due to compeditive price $33.85
    brookesbooks.manager - DEBUG - Original price was $34.08
    brookesbooks.manager - DEBUG - Changing 1640208453_A from $124.89 to $124.68
    brookesbooks.manager - DEBUG - Due to compeditive price $124.78
    brookesbooks.manager - DEBUG - Original price was $124.89
    brookesbooks.mws - DEBUG - Sending 6 item price feed
    brookesbooks.mws - DEBUG - 1454868295_VG 26.05
    brookesbooks.mws - DEBUG - 1454868406_LN 96.71
    brookesbooks.mws - DEBUG - 1454881798_A 146.04
    brookesbooks.mws - DEBUG - 1454881798_N 220.70
    brookesbooks.mws - DEBUG - 163043051X_G 33.82
    brookesbooks.mws - DEBUG - 1640208453_A 124.68
    brookesbooks.mws - DEBUG - Waiting on feed result
In fact, it worked so well that the hardest part was thinking about how to deploy my module on a remote VM and have it run consistently. I ended up simply using my private GitHub repository, cloning it on the VM, and installing it from there. Since brookesbooks was now a Python module, I could install it anywhere (even on the cloud). Here's what the setup.py looks like:
    """Setup.py for Brooke's Books"""
    import os
    from setuptools import setup
    from io import open as io_open

    package_name = 'brookesbooks'

    # Read __version__ from brookesbooks/_version.py without importing the package
    __version__ = None
    version_file = os.path.join(os.path.dirname(__file__), package_name, '_version.py')
    with io_open(version_file, mode='r') as fd:
        exec(fd.read())

    setup(
        name=package_name,
        packages=[package_name],

        # Version
        version=__version__,
        description="Brooke's Books",

        # Author details
        author='Alex Kaszynski',
        author_email='email@example.com',

        scripts=['brookesbooks/brookesbooks', 'brookesbooks/undercut'],

        # Run-time dependencies
        install_requires=['numpy', 'bottlenose', 'boto', 'matplotlib',
                          'pandas', 'pyqt5', 'xlrd', 'pyisbn',
                          'python-crontab', 'tqdm', 'pyqtgraph',
                          'beautifulsoup4', 'openpyxl', 'keepa', 'lxml',
                          'tracking_url', 'boto', 'qdarkstyle']
    )
Over the next few days, I could actually log in to the VM on my iPhone using either terminus with SSH keys I'd generated, or Google's app. I was partial to terminus as I really liked their interface, so it was worth it to create SSH keys and upload those. With this, I could query my inventory status, competitive pricing, and the status of my remote application, all on my phone over a terminal and without investing time creating a web application! My biggest challenge was not dropping the phone on the chairlift.
With this and several other tweaks, I had bought some time to consider my options on Amazon. At this time, I was living in Germany and still employed with AFRL as a subcontractor. The business was doing well, but was nowhere near as profitable as in the past due to the number of competitors that had come to Amazon to do (I'm guessing) the exact same thing I was doing. While I was investing remarkably little time in the business, I still had to do all the accounting, taxes (in two countries...), and customer service that go along with an online business. Profit margins, which had started out at an astronomical $35 per book, had fallen to $5 to $10, with many textbooks barely breaking even. FBA storage pricing had gone up, and I was paying my second subcontractor over $15,000 a year for inventory management.
The business was still profitable, but my time investment with AFRL, my family, and (gasp!) recreation meant that time had become more valuable to me and I had less motivation to dedicate to the business. In the middle of 2019, when I started to work for ANSYS, I made the decision to begin exiting the business at the end of the year. I purchased my last book in December and have been selling off the rest of my inventory since. As of this writing, I have 137 books left, down from my peak of 7000.
    $ python3 -c "from brookesbooks import inventory; print(inventory.Inventory().total)"
    137
While I'm still selling books, I'm not too worried about potential competition at this point as I'm nearly out, which is why I'm fine not redacting the ASINs and ISBNs of some of the books that I've used. As with anything, it's more about the algorithm than just the idea, and I should be able to sell off the last bit this winter. I think there's still life in this business, but it's no longer at the level of profitability to make it worthwhile to invest my time in. I'm ready to move on to something new.
One of the most important things I learned over the decade of the business was, without a doubt, the need for delegation. In the beginning, I liked to code everything myself within Excel VBA and MATLAB and rarely used other people's work. With the advent of open-source solutions, Stack Overflow, and emerging languages like Python, it's become much easier to use other people's solutions. I wish I had focused on using what others had created earlier on. When looking at a project or goal, think first of how you can utilize someone else's work, and only as a last resort write your own. I see too many cases where organizations, small and large, choose to write their own private code when they could use something maintained by the community, because of organizational inflexibility or mistrust of open-source. I think the software community is coming around, but I still see a remarkable amount of wheel-reinventing across the public, private, and government sectors.
This same principle of delegation also applies outside of coding. I took too long to hire someone to deal with physical inventory management and should have taken steps to interview, hire, and delegate the responsibility of handling the textbooks to someone I trusted. To those that I did finally hire, I thank you for your over 6 years of loyal and steadfast work taking care of what I couldn't.
There are a variety of general software lessons learned that most will see as common knowledge, but I still think it's important to say:
- Feel free to write simple scripts, but anything complex should be modularized. And don't extend your Python path; install your module with pip install -e . instead.
- Premature optimization is the root of all evil. That week that you optimized a function could have been a week introducing a new feature that saved you a month of work.
- Slow code is better than no code. Get something that works first and then optimize it once you have nothing else better to do.
- Document your code if you think you'll use it more than once. That's basically 99.9% of your code.
- Please use git and GitHub. This seems obvious, but I think new coders are scared of git. Upload your code to GitHub under a private repository and keep it there as a backup. You won't regret it and might even thank yourself if your laptop gets stolen, or worse, you run rm -rf *.
- Get comfortable with the command line regardless of your OS. You'll hate it at first, learn to like it, and eventually wonder why you'd use anything else.
- Don't write the GUI first or concurrent with the core backend. Write the core module first and maybe put a GUI if you think you need a UI. Writing the GUI first is asking for pain. Get something that works as a standalone service/program and then think about the UI. UI development is over half the battle, and you might find you don't even need it.
There's more, but it's been a long post already.
If you're new or have no experience with programming and need to automate something, be sure to check out Python. I feel that universities are doing their students a disservice by not exposing them to Python and just giving them experience with MATLAB, Excel, or various other proprietary products (I'm looking at you Mathematica). Python can just about do it all. For free. And if it can't, someone will probably add it, and that person might just be you.
Like parenting or teaching, what you gain by giving of yourself is usually far and beyond your investment of time. Case in point, I was hired because of my open source module pyansys, and I've had numerous offers for part-time work (which I've had to turn down) because of keepa. Additionally, I have a friend who was hired in part because of his work with pyvista, which used to be vtki, another module I had created because I saw there was a missing feature that could benefit the community in general.
Open source is the future, and as Microsoft, Google, and Amazon have shown, it doesn't mean the end of profitable software. In fact, the opposite is truer now than ever. Most closed source software I've seen contains about 90% duplication, 9% new code, and less than 1% genuinely critical proprietary code. Having someone else maintain that 90% to 99% is a boon, not a bane. It's that 1% that makes all the difference, so focus on that and leave the rest for someone else to maintain. And given what I've seen, they'll do a better job at it than you ever would have, for free.
I might at some point clean up and post my brookesbooks module should there be any requests for it. If you're interested, please let me know. It's not a trivial amount of work to clean it up, so I won't do it unless there's a genuine need for it.
Thanks for reading!