Thomas Hansen

Create your own ChatGPT by scraping your website

I've just released a new version of Magic. Since a picture says more than a thousand words, let me show you a screenshot of it.

[Screenshot: Generate a ChatGPT bot by scraping your website]

It's able to scrape our documentation website in 2 to 3 minutes, at which point we've got 690 training snippets for fine-tuning our own ChatGPT-style model, which then serves as a "domain expert" on whatever subject our website has knowledge about. For me personally, this increases the value of OpenAI and ChatGPT by somewhere between 10x and 100x, since it allows me to create a ChatGPT-based bot that answers questions exactly how I want it to. It basically gives me my own "expert AI brain" for whatever tasks I happen to need help with. I've listed some example use cases below.

  • AI-based documentation for your software project
  • Indian law expert system based upon legal data from your government
  • AI-based secretary answering questions about, for instance, coal mines in South Africa
  • Medical expert with expertise in cardiology
  • Support chatbot answering questions related to your company
  • Etc., etc., etc. I suspect only your imagination sets the limits here.

The thing is entirely Open Source, and I would appreciate a star on our GitHub profile. If you're too lazy to download the thing, you can also sign up at Aista.com and have your own cloudlet in about 30 seconds. The latter is what I am using in the following YouTube video, where I illustrate how to use it and its features.

You trigger the thing by supplying it with the root domain of some website, at which point it will scrape the website in a few minutes, creating thousands of "training snippets" for you, which you can later submit to OpenAI's API to create your own fine-tuned AI model. Below is an example of how it answered a couple of questions related to my own company after I had trained it on our documentation, which is a website with some 100 pages, fairly good structure, and medium-quality data.
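
As a quick aside before that example: to make the idea of a "training snippet" concrete, here is a minimal Python sketch of how a scraped page could be turned into prompt/completion pairs. This is not Magic's actual implementation. The `SnippetParser` class, the `to_jsonl` function, the "heading becomes the prompt, paragraph becomes the completion" rule, and the sample page are all assumptions made for illustration. The output is JSON Lines with `prompt` and `completion` keys, which is the layout OpenAI's original fine-tuning endpoint accepted; check the current API documentation before submitting anything.

```python
import json
from html.parser import HTMLParser


class SnippetParser(HTMLParser):
    """Hypothetical helper: pairs each paragraph with the heading above it."""

    HEADINGS = {"h1", "h2", "h3"}

    def __init__(self):
        super().__init__()
        self.snippets = []   # collected {"prompt": ..., "completion": ...} dicts
        self._heading = ""   # text of the most recent heading
        self._tag = None     # heading or <p> tag currently being buffered
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        # Start buffering text when a heading or paragraph opens.
        if tag in self.HEADINGS or tag == "p":
            self._tag = tag
            self._buffer = []

    def handle_data(self, data):
        # Text inside nested inline tags (<b>, <a>, ...) is kept as well.
        if self._tag is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._tag:
            return
        # Collapse runs of whitespace into single spaces.
        text = " ".join("".join(self._buffer).split())
        self._tag = None
        if not text:
            return
        if tag in self.HEADINGS:
            self._heading = text
        elif self._heading:
            # Paragraphs that appear before any heading are skipped.
            self.snippets.append({"prompt": self._heading, "completion": text})


def to_jsonl(pages):
    """Turns an iterable of HTML strings into JSONL, one snippet per line."""
    lines = []
    for html in pages:
        parser = SnippetParser()
        parser.feed(html)
        parser.close()
        lines.extend(json.dumps(snippet) for snippet in parser.snippets)
    return "\n".join(lines)


# Made-up sample page, purely for illustration.
page = "<h2>Getting started</h2><p>Sign up, then create a <b>cloudlet</b>.</p>"
print(to_jsonl([page]))
```

A real scraper would also need to follow links within the root domain, strip navigation and footers, and split long pages into smaller pieces. The sketch only covers the last step, going from HTML to snippets.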

[Screenshot: How ChatGPT performs when fine-tuned]

The above questions would of course give me completely irrelevant answers if I asked ChatGPT directly, without training my own model first. Notice that the above HTML chat interface was 99% automatically generated by the generic version of ChatGPT. Maybe somebody out there wants to create a version of it that generates beautiful designs. The generic version of ChatGPT is quite frankly terrible at HTML design, and tends to mess up margins, CSS, etc. :/

All in all, the process took me 2 hours, although I spent 90 of those minutes waiting for my training session to go through. There's an insane amount of interest in ChatGPT these days, and as a result OpenAI's API is sometimes literally down from too much load. To play around with it yourself, the easiest way is to register at our website, create a cloudlet, and click the Manage/Machine Learning menu item once you're inside your cloudlet.

Every single time I have heard somebody say something negative about ChatGPT, it's been a variation of the same thing: "It didn't solve my problem." Well, my reply is that's because you haven't taught it about your problem. With Aista Magic Cloud, you can now teach it how to solve your problem in some 5 minutes, by simply pointing it at a website that has knowledge about your problem domain, and 1 hour later it's an expert in your domain, regardless of what your domain happens to be. And yes, before you ask, scraping publicly available websites is generally legal, though you should check a site's terms of use if it isn't your own :)
