A small introduction
You are probably tired of me saying how bad the internet connection is overall in Cuba, so I'll mostly skip that part today :D
A few years ago I was working with a terrible internet connection. I could not get a better one, and loading pages on StackOverflow took minutes. Can you imagine? I was so frustrated, and accomplishing so little at the time, that I decided to fix the problem the way I know how to: by programming a solution.
Here goes the testing
When I was in college I used to use iptables to control my internet traffic. That way I could decide what I wanted to consume and reduce my internet consumption to save the scarce amount of megabytes (100) I was getting every month. Yes, I'm NOT talking about speed. That's the actual amount of megabytes I could use every month.
So my first idea was to reuse my old iptables script with lots and lots of blocked addresses. But this quickly became problematic. I would have to start updating my lists again, adding new addresses and dropping ones that didn't even exist anymore. It was also not portable, so I could not give it to my friends using Windows. I scratched that one off the list really fast.
My second thought was to write a proxy to do the whole thing for me. I could distribute it or share it with my friends without problems, but keeping it updated would again be problematic, so I decided not to go that way either.
At the time, a lot of my friends were also using web proxies to bypass blocked sites, so I had the idea of making something like that, but one that strips web pages of everything except text (and some styling, because reading code in plaintext is a natural cause of headaches).
The product
I'm a lazy person :-) I know I should not be proud of it, but somehow I am. Being lazy makes you work extra to find a way to do things with less effort. In my case, I realized that if I used a library to turn HTML into Markdown and then back into HTML, the result was tremendously clean. Then I just needed to remove the images and rewrite the links so that they open inside the same site. I added a bit of compression and that's it.
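To make the "remove the images and rewrite the links" step a bit more concrete, here is a minimal sketch using only the Python standard library. It is not the code from the project: it skips the HTML to Markdown round trip entirely (the post does not name the library), and the `Textifier` class, the `textify` and `compress` helpers, and the `/?url=` prefix are all names I made up for illustration.

```python
import gzip
from html import escape
from html.parser import HTMLParser
from urllib.parse import quote, urljoin


class Textifier(HTMLParser):
    """Hypothetical helper: drops images, scripts and styles, strips all
    attributes, and points every link back at the text-only site."""

    SKIPPED = {"script", "style"}  # elements whose content is dropped too

    def __init__(self, base_url, proxy_prefix):
        super().__init__(convert_charrefs=True)
        self.base_url = base_url
        self.proxy_prefix = proxy_prefix  # assumed URL scheme, e.g. "/?url="
        self.out = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1
            return
        if self.skip_depth or tag == "img":
            return
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links, then route them through the proxy.
                absolute = urljoin(self.base_url, href)
                target = self.proxy_prefix + quote(absolute, safe="")
                self.out.append('<a href="%s">' % escape(target, quote=True))
                return
        # Every other tag is kept, but without attributes (no inline styles).
        self.out.append("<%s>" % tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/>: emit once, with no closing tag.
        if not self.skip_depth and tag not in self.SKIPPED and tag != "img":
            self.out.append("<%s>" % tag)

    def handle_endtag(self, tag):
        if tag in self.SKIPPED:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if not self.skip_depth and tag != "img":
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if not self.skip_depth:
            # Character references were decoded by the parser; re-escape them.
            self.out.append(escape(data, quote=False))


def textify(html, base_url, proxy_prefix="/?url="):
    """Return a stripped-down copy of `html` with proxied links."""
    parser = Textifier(base_url, proxy_prefix)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


def compress(html):
    """Gzip the response body before sending it over a slow connection."""
    return gzip.compress(html.encode("utf-8"))
```

Calling `textify(page, "https://example.com/")` on a fetched page returns HTML with no images or scripts, where each link points to `/?url=<encoded original address>`, so clicking it stays inside the text-only site. In a Django project the compression would more likely come from a middleware than from a hand-written helper like `compress`.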
No styles, no images, no JavaScript. Just plain text. You can find it hosted on https://txtmdweb.herokuapp.com/
You enter the address that you want to "textify" and hit the "Text it!" button. The result will be something like this:
You can also do Google searches with it, but it is not particularly good at it :')
I used Django (1.11 at the time) to write the project and the sources can be found here
Disadvantages
- I only implemented fetching pages. No further interactions are permitted, so no logins, no cookies, nothing but clear text that is publicly available.
- No JavaScript. Sadly, a lot of websites today are mostly JavaScript. If the website you are trying to convert to text uses JavaScript to load its content, this is not going to work.
Conclusion
I hope you liked the story, and if you need a tool like this (I really hope you don't), well, now you know it exists.
Want to make it better? PRs are open ;-) or just go open an issue.
And that's all. If you liked the project, go hit that star button on GitHub. It makes open-source developers like me very happy.