DEV Community

Thomas Hansen


Create your own ChatGPT by scraping your website

I've just released a new version of Magic. Since a picture says more than a thousand words, let me show you a screenshot of it.

Generate a ChatGPT bot by scraping your website

It scrapes our documentation website in roughly 2 to 3 minutes, after which we've got 690 training snippets for creating our own fine-tuned ChatGPT model, one that serves as a "domain expert" on whatever subject our website covers. For me personally, this increases the value of OpenAI and ChatGPT by at least 10x to 100x, since it allows me to create a ChatGPT-based bot that answers questions exactly the way I want them answered. Basically, it gives me my own "expert AI brain" for whatever tasks I need help with. I've listed some example use cases below.

  • AI-based documentation for your software project
  • Indian law expert system based upon legal data from your government
  • AI-based secretary answering questions about, for instance, coal mines in South Africa
  • Medical expert with expertise in cardiology
  • Support chatbot answering questions about your company
  • And so on; I suspect only your imagination sets the limits here.
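The post doesn't show how Magic turns a page into training snippets, so what follows is only a minimal Python sketch of one plausible approach, assuming each h1, h2 or h3 heading becomes a prompt and each paragraph below it becomes a completion. `SnippetExtractor` and `extract_snippets` are hypothetical names, and the sketch relies only on Python's standard `html.parser` module.

```python
from html.parser import HTMLParser


class SnippetExtractor(HTMLParser):
    """Collects (heading, paragraph) pairs from one HTML page (hypothetical helper)."""

    HEADINGS = {"h1", "h2", "h3"}

    def __init__(self):
        super().__init__()
        self.snippets = []    # list of {"prompt": ..., "completion": ...}
        self._heading = None  # text of the most recent heading seen
        self._current = None  # heading or <p> tag whose text we are collecting
        self._buffer = []     # raw text chunks for the current tag

    def handle_starttag(self, tag, attrs):
        # Start collecting text for headings and paragraphs only;
        # inline tags such as <a> or <b> leave the current tag unchanged.
        if tag in self.HEADINGS or tag == "p":
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        if self._current:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._current:
            return
        # Collapse runs of whitespace into single spaces.
        text = " ".join("".join(self._buffer).split())
        if tag in self.HEADINGS:
            self._heading = text
        elif text and self._heading:
            # Each paragraph under a heading becomes one training snippet.
            self.snippets.append({"prompt": self._heading, "completion": text})
        self._current = None


def extract_snippets(html):
    """Return the prompt/completion snippets found in an HTML string."""
    parser = SnippetExtractor()
    parser.feed(html)
    parser.close()
    return parser.snippets
```

For example, `extract_snippets("<p>No heading yet</p><h1>Intro</h1><p>Hello <a href='#'>world</a></p>")` returns `[{"prompt": "Intro", "completion": "Hello world"}]`; paragraphs that appear before any heading are skipped.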

The thing is entirely Open Source, and I would appreciate a star on our GitHub profile. If you're too lazy to download it, you can also sign up at Aista.com and have your own cloudlet in about 30 seconds. The latter is what I am using in the following YouTube video, where I illustrate how to use it and its features.

You trigger it by supplying the root domain of a website, at which point it scrapes the website in a few minutes, creating hundreds or thousands of "training snippets" for you, which you can then submit to OpenAI's API to create your own fine-tuned AI model. Below is an example of how it answered a couple of questions about my own company after I trained it on our documentation, a website of some 100 pages with fairly good structure and medium-quality data.
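The crawl step described above can be sketched in Python as well. This is a minimal sketch under stated assumptions, not Magic's implementation: it follows only `<a href>` links on the same host as the root URL, takes a caller-supplied `fetch(url)` function (hypothetical; in practice something built on `urllib.request.urlopen`) so the logic can be tested offline, and writes snippets as JSONL using the `prompt`/`completion` keys of OpenAI's legacy fine-tuning file format.

```python
import json
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def internal_links(page_url, html):
    """Absolute, fragment-free http(s) links that stay on page_url's host."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    host = urlparse(page_url).netloc
    links = set()
    for href in collector.hrefs:
        # Resolve relative links, then drop "#fragment" parts.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parsed = urlparse(absolute)
        if parsed.scheme in ("http", "https") and parsed.netloc == host:
            links.add(absolute)
    return links


def crawl(root_url, fetch, max_pages=200):
    """Breadth-first crawl from root_url; fetch(url) must return an HTML string."""
    seen = {root_url}
    queue = deque([root_url])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        pages[url] = html
        for link in sorted(internal_links(url, html)):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages


def to_jsonl(snippets):
    """One JSON object per line, e.g. {"prompt": ..., "completion": ...}."""
    return "".join(json.dumps(s) + "\n" for s in snippets)
```

Running each crawled page through a snippet extractor and passing the result to `to_jsonl` gives a file ready for upload. A production crawler should also honor `robots.txt` (Python's `urllib.robotparser` can check it) and throttle its requests.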

How ChatGPT performs when fine-tuned

Asked directly, without a fine-tuned model, ChatGPT would of course give me completely irrelevant answers to the questions above. Notice that the HTML chat interface above was 99% automatically generated by the generic version of ChatGPT. Maybe somebody out there wants to create a version of it that generates beautiful designs. Quite frankly, the generic version of ChatGPT is terrible at HTML design and tends to mess up margins, CSS, etc. :/

All in all, the process took me 2 hours, but 90 of those minutes were spent waiting for my training job to go through. Interest in ChatGPT is insane these days, to the point where OpenAI's API is sometimes literally down due to the load. To play around with it yourself, the easiest way is to register at our website, create a cloudlet, and click the Manage/Machine Learning menu item once you're inside your cloudlet.
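Long training waits and occasional API outages like the ones mentioned above are usually handled by polling the job status with exponential backoff instead of hammering the API. A minimal sketch, assuming a hypothetical caller-supplied `get_status()` that returns a status string and raises `ConnectionError` while the API is unreachable; the terminal status names are also assumptions, not taken from OpenAI's documentation.

```python
import time

# Assumed terminal states; adjust to whatever the real job API reports.
TERMINAL_STATES = ("succeeded", "failed", "cancelled")


def wait_for_job(get_status, initial_delay=5.0, max_delay=300.0, sleep=time.sleep):
    """Poll get_status() until it returns a terminal state, then return that state.

    get_status is a hypothetical callable: it returns a status string, or raises
    ConnectionError while the API is down.
    """
    delay = initial_delay
    while True:
        try:
            status = get_status()
        except ConnectionError:
            status = None  # API unreachable: treat as "try again later"
        if status in TERMINAL_STATES:
            return status
        sleep(delay)
        delay = min(delay * 2, max_delay)  # exponential backoff, capped at max_delay
```

Passing `sleep` as a parameter keeps the function testable without actually waiting.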

Every single time I have heard somebody say something negative about ChatGPT, it's been a variation of the same thing: "It didn't solve my problem." Well, my reply is that it didn't because you haven't taught it about your problem. With Aista Magic Cloud, you can now teach it how to solve your problem in about 5 minutes, simply by pointing it at a website with knowledge about your problem domain, and 1 hour later it's an expert in your domain, regardless of what that domain happens to be. And yes, before you ask, scraping your own website is perfectly legal; for other people's websites, check their terms of service first :)

Comments (2)

Malik Warren

Hey @polterguy, thanks for sharing this product! I'm looking into using it as a solution for a school system and wanted to work up a demo to see whether it would be feasible.

I ran into 2 issues:

1) While following the chatbot wizard, I entered each of the API keys/secrets, but whenever I tried a URL and a flavor, all I got was a grayed-out start button. Please see the screenshot below:

2) The second error came up after I went through the Machine Learning documentation. After plugging in my OpenAI token and clicking "import", I received the error "domain is not valid", which keeps the CRAWL feature from working. I've also tried your domain "ainiro.io" and have attached a screenshot of the error.


Would love to get your assistance on resolving these issues!
Thank you!

Thomas Hansen

Hi Malik, I've parted with AISTA, and as far as I know, they haven't continued development of the Magic platform. I might be wrong though, so don't take my word for it.

We are, however, actively maintaining the platform, and we've added tons of features since we parted. One of them is that you no longer need to add the scheme in front of the domain, in either the wizard or the crawler.

Hint: add "https://" in front of your domain.
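The scheme matters because a bare string like `ainiro.io` isn't an absolute URL: Python's `urllib.parse.urlparse("ainiro.io")`, for instance, puts it in `path` and leaves `netloc` empty. A minimal sketch of the kind of normalization mentioned above, assuming HTTPS as the default scheme; `normalize_root_url` is a hypothetical helper, not Magic's actual code.

```python
from urllib.parse import urlparse


def normalize_root_url(domain):
    """Turn 'ainiro.io' or 'https://ainiro.io' into 'https://ainiro.io/' (hypothetical helper)."""
    domain = domain.strip()
    if "://" not in domain:
        domain = "https://" + domain  # assume HTTPS when no scheme is given
    parsed = urlparse(domain)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not a valid http(s) URL: {domain!r}")
    path = parsed.path or "/"  # a bare host becomes the site root
    return f"{parsed.scheme}://{parsed.netloc}{path}"
```

With this in place, `normalize_root_url("ainiro.io")` and `normalize_root_url("https://ainiro.io")` both yield `https://ainiro.io/`.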