ZhiHong Chua

Posted on Nov 26, 2022

Approach to Develop Internal Tools

#webdev #javascript #career

Want to know how to:
✔️ Create an internal tool that solves more than 10% of the bugs from the past 6 months,
✔️ has incredibly low adoption cost,
✔️ requires no change in workflow / user behavior,
✔️ with potential to impact the entire company?

Read on!

Inspiration
Understanding the Problem
Ideation
Solution

1. Inspiration

I was first inspired by Rahul Pandey's video on the Taro App, titled "Session #2 - Building A Meta Internal Tool To Empower An Entire Org: Staff Promotion Story".

TL;DR: the video was about how he built a simple tool that extracts the most critical 20% of error messages from Android logs (think of the Pareto principle) so the Facebook Portal team could debug problems faster. Version 1 of his project had a, quote-unquote, "embarrassingly old style of UI from the last decade", and fewer than 100 engineers on the team used it. Eventually he made it flexible enough to extend to other teams such as Instagram and WhatsApp, reaching 500 active users.

His story was inspiring because, as an engineer, it often feels daunting to take on internal tooling given the uncertainty involved, especially when the company is focused on building new products, FAST.

The one thing I especially loved about Rahul's story is that he spent 2 years understanding the problem. I am not a fan of promotion-driven development, and the amount of groundwork (talking to other engineers, reflecting on past experiences, etc.) was a testament to his commitment to a product that focused on helping the team rather than on seeking a promotion.

Inspired by him, here was my approach (and hopefully this inspires someone else too!):

2. Understanding the Problem

I first started by filtering JIRA bug tickets, to find those that are either:

most critical
most frequent

Given the limited time available for such exploration, and since there was no single core issue behind the most critical bugs, I decided to work on the most frequent bugs. i18n content management was #4 on that list.
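Ranking bugs by frequency boils down to counting tickets per category. Here is a minimal sketch, assuming the JIRA tickets have been exported as JSON objects with a hypothetical `category` field (a real export will use its own field names, such as components or labels); the ticket data is made up for illustration.

```javascript
// Hypothetical ticket shape: { key: string, category: string }.
// A real JIRA export will name its fields differently.

// Count tickets per category and return [category, count] pairs,
// most frequent first. Ties are broken alphabetically for a stable order.
function rankByFrequency(tickets) {
  const counts = new Map();
  for (const { category } of tickets) {
    counts.set(category, (counts.get(category) ?? 0) + 1);
  }
  return [...counts.entries()].sort(
    ([catA, countA], [catB, countB]) =>
      countB - countA || catA.localeCompare(catB)
  );
}

// Made-up example data.
const tickets = [
  { key: 'BUG-1', category: 'i18n content' },
  { key: 'BUG-2', category: 'layout' },
  { key: 'BUG-3', category: 'i18n content' },
];

console.log(rankByFrequency(tickets)); // → [['i18n content', 2], ['layout', 1]]
```

Picking "most critical" instead would just swap the count for a severity field in the same loop.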

We use a 4-step process for i18n content management. Major problems usually happen in step #2 (see the red text in the diagram above). Worse, the problem spills over into other areas: because so much mental capacity goes into manual validation, there is less focus and time left for other problems.

So the question is: how can we tackle the 4 problems marked in red in the diagram above?

3. Ideation

I wanted to make sure the tool would not change the rest of the workflow, only enhance it. Why? Because I think the PMs have a pretty solid workflow with their slide decks, and the developers have a very solid database upload feature. Making them adjust to a new workflow would throw away established knowledge and might create new problems.

Here were the iterations:

Scrape Google Slides directly into a CSV for upload to the database. This was not possible because Google Slides renders slide content inside <canvas> HTML elements. The text could not be retrieved: while it looks like text on screen, it is just pixels drawn onto the canvas.
Use Cmd + Shift + Option + P to export Google Slides to an HTML file, THEN build an HTML table scraper. While the top search result for a (headless) browser HTML scraper was BeautifulSoup (Python), I opted for Puppeteer (JS), because everyone on our front-end team already has Node.js installed, so the setup cost of using this project would be much lower. However, Google Slides requires a login, and Puppeteer can only run in a new browser window that has no login credentials. Users would therefore have to write their username and password into the HTML scraper script, which I suspect would make some people uncomfortable.
Download the Google Slides HTML to a local file and run a vanilla JS script on it. Perfect! A mind-juggling chore, now done in 5 minutes :)
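The post does not include the script itself. A minimal sketch of what such a vanilla JS scraper could look like, assuming (hypothetically) that the exported HTML lays the strings out in `<table>` elements with one string per row: it reads every table row into an array of cell texts and prints CSV ready for the database upload. It is meant to be pasted into the DevTools console on the downloaded page; the function names are made up.

```javascript
// Quote a cell for CSV: wrap it in double quotes and double any embedded
// quotes, so commas and line breaks inside the text survive.
function toCsvCell(text) {
  return `"${String(text).replace(/"/g, '""')}"`;
}

// Turn a 2-D array of strings into CSV text, one line per row.
function rowsToCsv(rows) {
  return rows.map((row) => row.map(toCsvCell).join(',')).join('\n');
}

// Read every <table> row in the document into a 2-D array of cell texts.
// Assumes (hypothetically) one translation string per table row.
function scrapeTables(doc) {
  const rows = [];
  for (const tr of doc.querySelectorAll('table tr')) {
    const cells = [...tr.querySelectorAll('td, th')].map((cell) =>
      cell.textContent.trim()
    );
    if (cells.length > 0) rows.push(cells);
  }
  return rows;
}

// Only touch the DOM when running in a browser (e.g. the DevTools console).
if (typeof document !== 'undefined') {
  console.log(rowsToCsv(scrapeTables(document)));
}
```

Because it runs on a local file in the user's own browser, no login or stored credentials are involved, which is what made this iteration work.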

4. Solution

That's the 3rd iteration from above and it looks promising!

Of course, following the REACTO coding interview process (I cover more about it here), after [C]ode comes:

[T]est, and
[O]ptimization.

Test

I manually validated the output against 400 slides in 2 Google Slides files 😭, and found some issues, which I fixed:

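Zero-width spaces (U+200B) can appear in the scraped text and mess up the output, so I removed them all.
For sorting strings, localeCompare works better than array.sort((a, b) => a - b), which is a numeric comparator: subtracting two non-numeric strings gives NaN, so it does not order them at all (see StackOverflow answer).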
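Both fixes are short in JavaScript. A sketch, assuming scraped cells arrive as plain strings; the `numeric: true` collation option is my own addition so that keys like `key2` sort before `key10`, and the key names are made up.

```javascript
// String.prototype.trim() does NOT remove U+200B (ZERO WIDTH SPACE):
// ECMAScript treats it as a format character (Unicode category Cf),
// not whitespace. So strip it explicitly, then trim real whitespace.
function cleanCell(text) {
  return text.replace(/\u200B/g, '').trim();
}

// (a, b) => a - b only works for numbers. For non-numeric strings,
// a - b is NaN, which sort() treats as 0 ("equal"), so the array comes
// back in its original order. localeCompare returns a real ordering.
// numeric: true (an assumption about what suits these keys) makes
// embedded digits compare by value.
function sortStrings(strings) {
  return [...strings].sort((a, b) =>
    a.localeCompare(b, 'en', { numeric: true })
  );
}

console.log(cleanCell('\u200BHello\u200B ')); // → 'Hello'
console.log(sortStrings(['key10', 'key2', 'key1'])); // → ['key1', 'key2', 'key10']
```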

Optimization

So far I have shared this with a colleague, who suggested a new feature: scraping only the part of the slides they need instead of the entire deck. It is always interesting to hear new ideas, and I will update this section as more come in.

In the meantime, if you want a skeleton of the script for your own use, or have any other questions, feel free to comment or hit me up on LinkedIn!

DEV Community