Ranjan Dailata

Posted on Apr 14

Conversational Intelligence Miner

#cloudflarechallenge #devchallenge #ai #serverless

This is a submission for the Cloudflare AI Challenge.

What I Built

The Conversational Intelligence Miner is a specialized AI-based solution for performing various data mining tasks on pre-recorded conversations. At the moment, it targets YouTube videos. However, it could easily be tweaked to handle recorded conversations in any format.

The following is the list of actions available in the Conversational Intelligence Miner.

Get Transcript: Retrieve the textual content of a video or audio recording.
Get Summary: Generate a concise overview or synopsis of a document or content.
Get Keywords: Extract important or relevant terms or phrases from a piece of text.
Get Topics: Identify and categorize the main themes or subjects discussed in a text or conversation.
Get Action Items: Extract actionable tasks or to-do items from a document or meeting notes.
Get Sentiments: Analyze the emotional tone or sentiment expressed in text, typically as positive, negative, or neutral.
Get Recommendations: Provide suggestions or advice based on user preferences or past behavior.
Get Trends: Identify patterns or developments over time, often in data or user behavior.
Get Aspects: Extract specific features, attributes, or elements from the transcript.
Get Banner: Generate a cover image or visual representation, typically for promotional or advertising purposes.

With the modules listed above, the Conversational Intelligence Miner gives anyone a deep understanding of a specific piece of content, helping people perform the required operations and get relevant insights in no time.

Architecture

Demo

Conversational Intelligence Miner Demo

First, click "Transcript" to get the YouTube video transcript. After that, all the other product features, such as Keywords, Aspects, Trends, Topics, and Recommendations, are enabled.

Get Transcripts

Get Action Items

Get Aspects

Get Keywords

Get Recommendations

Get Sentiments

Get Summary

Get Topics

Get Trends

Get Banner

My Code

conversation-intelligence-miner-source

Journey

The overall Cloudflare AI journey was amazing. I am really proud of building a "serverless" Workers AI product on Cloudflare. Thanks to the folks who created an easy-to-use platform that exposes a ton of Large Language Models (LLMs). The most interesting thing I learned is how easy it is to interface with open source LLMs. Whether they were Hugging Face models or other hosted open source LLMs, the models Cloudflare AI provides helped me develop an LLM-based product in no time. In particular, the ease of integrating and experimenting with multiple models is what really helps the development community try out a variety of models.

Some Challenges

I tried the summarization model @cf/facebook/bart-large-cnn. However, it was not effective in producing the right result. Moreover, its maximum input length of 1024 is the biggest issue.
A majority of the models have a maximum context window of 4096. However, when dealing with reasonably sized transcripts, we need a really solid LLM that supports a much larger context window, at least 200k.
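
To make the context window limit concrete, here is a minimal sketch of one common workaround: splitting a transcript into pieces that each fit a small window. This is not something the project is stated to do. The names `estimateTokens` and `splitForContext` are hypothetical, and the 4-characters-per-token ratio is only a rough assumption, not a real tokenizer.

```typescript
// Assumption: roughly 4 characters per token. Real tokenizers differ.
const CHARS_PER_TOKEN = 4;

// Rough token estimate for a piece of text (hypothetical helper).
function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Split text on whitespace into chunks whose estimated size stays
// within maxTokens. A single word longer than the limit is kept
// as its own oversized chunk rather than being cut.
function splitForContext(text: string, maxTokens: number): string[] {
  const maxChars = maxTokens * CHARS_PER_TOKEN;
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const chunks: string[] = [];
  let current = "";
  for (const word of words) {
    const candidate = current === "" ? word : `${current} ${word}`;
    if (candidate.length > maxChars && current !== "") {
      chunks.push(current);
      current = word;
    } else {
      current = candidate;
    }
  }
  if (current !== "") chunks.push(current);
  return chunks;
}
```

In practice, `maxTokens` would need to be set well below 4096 to leave room for the prompt and the generated answer. Each chunk could then be summarized separately and the partial summaries combined, at the cost of extra model calls.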

Behind the Scenes

Wondering how to fetch the YouTube Transcript?
This demo uses https://youtubetranscript.com. You can check the repo → youtube-transcript.ts
How are the LLM API calls made?
The LLM model integrations are done via the RESTful API approach. For more info, read here - workers-ai-restapi
What is Get Banner?
Banner is the term used for turning a summary of the transcript into a branding image, which one could use for a cover page or for marketing purposes.
How are Banner images created?
The underlying model used for generating the image is "stable-diffusion-xl-base-1.0". However, the input is not the transcript itself but a summary of the transcript.
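
The REST approach and the two-step banner flow described above can be sketched as follows. This is an illustration under stated assumptions, not the project's actual code: the helper names (`buildRunRequest`, `summaryRequest`, `bannerRequest`), the system prompt, and the account ID and token values are hypothetical. The endpoint shape follows the Workers AI REST API (`/accounts/{account_id}/ai/run/{model}` with a bearer token).

```typescript
// Plain description of one Workers AI REST call. Kept free of DOM
// types so the sketch runs in any TypeScript environment.
interface RunRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

const API_BASE = "https://api.cloudflare.com/client/v4";

// Build the request for running a model over REST.
function buildRunRequest(
  accountId: string,
  apiToken: string,
  model: string,
  input: Record<string, unknown>
): RunRequest {
  return {
    url: `${API_BASE}/accounts/${accountId}/ai/run/${model}`,
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiToken}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(input),
  };
}

// Step 1 (assumed prompt wording): ask the text model for a summary.
function summaryRequest(accountId: string, apiToken: string, transcript: string): RunRequest {
  return buildRunRequest(accountId, apiToken, "@cf/meta/llama-2-7b-chat-fp16", {
    messages: [
      { role: "system", content: "Summarize the following transcript briefly." },
      { role: "user", content: transcript },
    ],
  });
}

// Step 2: feed the summary, not the raw transcript, to the image model.
function bannerRequest(accountId: string, apiToken: string, summary: string): RunRequest {
  return buildRunRequest(
    accountId,
    apiToken,
    "@cf/stabilityai/stable-diffusion-xl-base-1.0",
    { prompt: summary }
  );
}
```

Each request can be sent with `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`. The text model responds with JSON, while the image model responds with binary image data, so the two responses need different handling.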

Here's what I hope to do next:

Build a full-fledged product with registration, login etc.
Give end users the ability to upload media and then run the data mining against it.
Perform more analytics, data analysis, or mining.
Integrate with BI reports.
Provide multiple user role based dashboards targeting various users of the product.
Write unit/integration tests.
Handling reasonably sized or massive transcripts requires an LLM with a context window of 200k or more, at least for summarization and the other key features of this product. More R&D is needed to handle the product features effectively while taking LLM performance and limitations into account.

Multiple Models and/or Triple Task Types

This product is utilizing multiple Cloudflare AI models.

llama-2-7b-chat-fp16 - Full precision (fp16) generative text model with 7 billion parameters from Meta. It's used for summarization.
llama-2-7b-chat-int8 - Quantized (int8) generative text model with 7 billion parameters from Meta. A majority of the data miners, e.g. Keywords, Aspects, Trends, Topics, and Recommendations, use this model.
m2m100-1.2b - Multilingual encoder-decoder (seq-to-seq) model trained for many-to-many multilingual translation. It's used to translate the transcript.
stable-diffusion-xl-base-1.0 - Diffusion-based text-to-image generative model by Stability AI. Generates and modifies images based on text prompts. This model is used for "Banner" generation. The input text is first created by summarizing the transcript; the summary is then fed to this model to generate the banner image.
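
The task-to-model split listed above could be expressed as a small lookup table. This is a sketch, not the project's code: the `Task` type and `modelFor` function are hypothetical names, the full `@cf/...` identifiers are assumed from the Workers AI model catalog, and only the tasks the list explicitly names are included.

```typescript
// Tasks explicitly named in the model list above.
type Task =
  | "summary"
  | "keywords"
  | "aspects"
  | "trends"
  | "topics"
  | "recommendations"
  | "translation"
  | "banner";

// Assumed full model identifiers from the Workers AI catalog.
const MODEL_FOR_TASK: Record<Task, string> = {
  summary: "@cf/meta/llama-2-7b-chat-fp16",
  keywords: "@cf/meta/llama-2-7b-chat-int8",
  aspects: "@cf/meta/llama-2-7b-chat-int8",
  trends: "@cf/meta/llama-2-7b-chat-int8",
  topics: "@cf/meta/llama-2-7b-chat-int8",
  recommendations: "@cf/meta/llama-2-7b-chat-int8",
  translation: "@cf/meta/m2m100-1.2b",
  banner: "@cf/stabilityai/stable-diffusion-xl-base-1.0",
};

// Look up which model handles a given task.
function modelFor(task: Task): string {
  return MODEL_FOR_TASK[task];
}
```

Keeping the mapping in one place makes it easy to swap a model for a single task, which fits the experimentation with multiple models described earlier.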

References

This product couldn't have been completed without referring to publicly available resources.

DEV Community