So... I neglected to write last week because, honestly, not much happened. It was just some development on the secret project, and I can't really detail what that was about, so I figured it was a waste of time to write a post. Regardless, it's kind of on pause now until we find a proper API for what we want, so there's that.
Last week's major theme was fine-tuning. The Friday before, I walked the product team through how to start filling out the conversations I'd generated for them, using basically the same script I had from Lawgoat's time. So they were on that for a couple of days.
In the meantime, I tried fiddling around with RAG, because the backend engineer was having issues with the bot not actually using our real prompt. It would just fall back to default GPT-4 behavior and pull up random info from the books. None of it was documented, so I spent a lot of time just figuring out what was going on.
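Since it tripped me up, here's a minimal sketch of the kind of wiring I suspect was missing. The prompt text and the indexed documents are placeholders, not our actual setup; the point is just that in LangChain, if you never hand your own prompt to the chain, it silently falls back to its generic built-in prompt, which looks a lot like what we were seeing:

```python
# Minimal sketch, assuming classic LangChain and an OpenAI key in the env.
# The prompt wording and the indexed texts below are placeholders.
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.prompts import PromptTemplate
from langchain.vectorstores import FAISS

# Stand-in index; ours is built from the books.
vectorstore = FAISS.from_texts(
    ["excerpt from book one...", "excerpt from book two..."],
    OpenAIEmbeddings(),
)

# The custom prompt has to be passed to the chain explicitly.
prompt = PromptTemplate(
    input_variables=["context", "question"],
    template=(
        "<our real persona prompt would go here>\n\n"
        "Use these excerpts to answer:\n{context}\n\n"
        "Question: {question}\nAnswer:"
    ),
)

qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model_name="gpt-4"),
    chain_type="stuff",
    retriever=vectorstore.as_retriever(),
    chain_type_kwargs={"prompt": prompt},  # the easy part to forget
)

print(qa.run("a question about the books"))
```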
At some point I had to give up, because fine-tuning was looking like the priority right now, and we reckoned RAG wasn't really applicable to our use case yet anyway. Someday down the line I'd like to rewrite it all from scratch, because I'm not entirely sure our backend engineer actually understands how LangChain works.
Anyway, we transitioned back to focusing on fine-tuning, and I spent a lot of Thursday and most of Friday grooming the finer details of the data to make sure it was all ready. On Thursday we also had a product team call about v0 and what needs to be included in it. The most notable thing to come out of that was that we could use few-shot prompting to generate the second part of the process.
I noticed that one JSON file of our fine-tuning data didn't have enough examples, so I took that few-shot idea and worked it into my generator script, and it produced significantly higher-quality examples. I still had the team clean those up, and by Friday I was ready to start writing a bunch of post-processing scripts to make things a lot easier.
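For the curious, the tweak was roughly this. The prompt wording and the few-shot examples below are placeholders standing in for our real hand-picked conversations, so treat it as a sketch rather than the actual script:

```python
# Sketch of the few-shot version of my generator; everything quoted
# here is a placeholder, not our real prompt or data.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# A couple of hand-picked, high-quality conversations for the model to imitate.
FEW_SHOT_EXAMPLES = """\
Example 1:
User: <a question in our domain>
Assistant: <an answer in the voice we want>

Example 2:
User: <another question>
Assistant: <another on-voice answer>
"""

def generate_conversation(topic: str) -> str:
    """Generate one synthetic training conversation about `topic`."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "You write training conversations in our product's voice."},
            {"role": "user",
             "content": (
                 f"Here are examples of the style I want:\n\n{FEW_SHOT_EXAMPLES}\n"
                 f"Now write one new conversation about: {topic}"
             )},
        ],
    )
    return response.choices[0].message.content
```

Nothing fancy: the examples just get pasted into the prompt, and the model imitates their style instead of producing generic output.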
Once everything was post-processed and checked for grammar, awkward phrasing, spelling, and punctuation, I finally threw it into the fine-tuning interface on OpenAI's website. Ten minutes later we had our new model (way faster than I expected).
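For anyone who hasn't touched that interface: it takes a JSONL file where every line is one chat-formatted training example. Before uploading, I ran a sanity check along these lines (a sketch; the filename is a placeholder, and system-prompt-first is our own convention rather than a requirement of the format):

```python
# Quick pre-upload sanity check (sketch). In OpenAI's chat fine-tuning
# format, each line is a JSON object with a "messages" list of
# {"role": ..., "content": ...} turns.
import json

with open("training_data.jsonl", encoding="utf-8") as f:  # placeholder name
    for i, line in enumerate(f, 1):
        example = json.loads(line)  # fails loudly on malformed lines
        roles = [m["role"] for m in example["messages"]]
        assert roles[0] == "system", f"line {i}: no system prompt"  # our convention
        assert roles[-1] == "assistant", f"line {i}: nothing for the model to learn"
        assert all(m["content"].strip() for m in example["messages"]), \
            f"line {i}: empty turn"

print("all examples look sane")
```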
The new model was definitely better in terms of voice, though it might still need some work depending on how our testing goes. Either way, it's a step in the right direction.
Hopefully this trend continues. Anyway, until next time, cheers.