DEV Community

Victor
MATT AI

This is a submission for the Cloudflare AI Challenge.

What I Built

This open-source project demonstrates the possibilities with Cloudflare Workers AI in a single, seamless conversation. Additionally, for privacy reasons, everything is stored locally in the browser with no server logging or storage.

Demo

https://matt-ai.pages.dev

My Code

demosjarco / matt-ai

Magically All The Things AI

matt-ai

Get Started

Visit live

  1. Go to matt-ai.pages.dev

Run local

Instructions moved to wiki

CI/CD

Dependabot

Automated

Dependabot[bot] will automatically create and merge PRs for the following:

  • typescript-types group
  • code-management group





I will stop pushing to the production branch at the submission deadline; however, work (outside of the competition) will continue in other branches.

Journey

From the start, I wanted a chat solution that is as private as possible without running the inference yourself. That means no server-side storage and no accounts to identify people. To combat spam, bots, and abuse, I implemented Turnstile in invisible mode (checked on every message send) and LlamaGuard for message content.
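As a rough illustration of the per-message Turnstile check described above, the server side could validate each token against Cloudflare's siteverify endpoint before any model is called. This is a minimal sketch, not the project's actual code: `verifyTurnstile`, `FetchLike`, and the way the secret is passed in are assumptions made for the example.

```typescript
// Minimal structural type so the sketch does not depend on DOM or Workers typings.
// (Hypothetical helper type, not part of any Cloudflare SDK.)
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ json(): Promise<unknown> }>;

// Cloudflare's server-side validation endpoint for Turnstile tokens.
const SITEVERIFY_URL = "https://challenges.cloudflare.com/turnstile/v0/siteverify";

interface SiteverifyResult {
  success: boolean;
  "error-codes"?: string[];
}

// Returns true only when Turnstile confirms the token that accompanied a message.
async function verifyTurnstile(
  token: string,
  secret: string,
  fetcher: FetchLike,
): Promise<boolean> {
  if (token.length === 0) return false; // nothing to verify, reject early

  const res = await fetcher(SITEVERIFY_URL, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ secret, response: token }),
  });

  const result = (await res.json()) as SiteverifyResult;
  return result.success === true;
}
```

Inside a Worker, the global `fetch` can be passed as `fetcher`, and a message would only be forwarded to the models when the function resolves to `true`.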

The cornerstone of this project is TypeChat, originally developed by Microsoft's TypeScript team. I patched it to eliminate the node:fs requirement and decoupled it from OpenAI/Azure. My version on npm uses LangChain, supporting virtually any AI provider. However, for this submission, I used a further modified version that calls Workers AI over bindings, because LangChain runs only over HTTP REST (as of this writing) and bindings provide even better performance.
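To make the binding-based call path more concrete, here is a small sketch of a text-generation request going through a Workers AI binding instead of the REST API. The `AiBinding` interface is a structural stand-in written for this example (the real type ships with Cloudflare's Workers typings), and `generateReply` and the system prompt are hypothetical; only the model name comes from this post.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Stand-in for the Workers AI binding (usually exposed as `env.AI`).
// Assumption: only the non-streaming chat shape is modelled here.
interface AiBinding {
  run(model: string, inputs: { messages: ChatMessage[] }): Promise<{ response?: string }>;
}

// One of the text generation models listed in this post.
const TEXT_MODEL = "@hf/thebloke/llama-2-13b-chat-awq";

// Sends a single user prompt through the binding and returns the generated text.
async function generateReply(ai: AiBinding, prompt: string): Promise<string> {
  const result = await ai.run(TEXT_MODEL, {
    messages: [
      { role: "system", content: "You are a helpful assistant." }, // hypothetical prompt
      { role: "user", content: prompt },
    ],
  });
  // The binding may return no text; fall back to an empty string.
  return result.response ?? "";
}
```

In a Worker this would be called as `generateReply(env.AI, userMessage)`, with no HTTP round trip to the REST endpoint and no API token handling in the code.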

Qwik is exceptionally fast (resisting the obvious pun here). Honestly, try loading this project on cellular data with 4G/5G turned off. That said, Vite's bundling quirks caused several issues (such as node:buffer not being externalized despite explicit configuration). As a workaround, I paired the app with a worker that handles those specific tasks. Initially, the worker used service bindings and the hono/yoga/gql HTTP stack. It was fast, albeit cluttered. I later switched to RPC, reducing latency and the bundle size by almost 90%.

I am also developing a Queues callback system using WebSockets and Durable Objects for handling extremely rate-limited services like Browser Rendering. For more details, see the wiki.

A major future goal is to let users select their preferred AI model before dispatch and to regenerate parts of previous messages with the same context and instructions.

There's a secret mode under development that will revolutionize AI interaction... but more on that later. However, I did leave a fun easter egg in the source code...

Multiple Models and/or Triple Task Types

When working with models, the priority is to deliver data with minimal latency, even if some decision-making needs to occur first. To achieve this, LlamaGuard, initial text generation, and TypeChat fire off immediately. The last two are buffered and not displayed until LlamaGuard approves them. Once approved, all buffered chunks display immediately, followed by any remaining content as it arrives. Note: this approach is currently shelved due to buffering and loss-of-context issues. I will return to it at a later date.
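The "fire everything at once, display only after approval" idea can be sketched as an async generator that drains the model output in the background while the moderation verdict is still pending. This is an illustration under assumptions, not the project's code: `gateOnApproval` is a hypothetical name, the verdict is modelled as a plain `Promise<boolean>`, and chunks are plain strings.

```typescript
// Buffers `source` from the moment it is called, but yields nothing until `approval`
// resolves to true. On rejection, the buffered chunks are never exposed.
async function* gateOnApproval(
  source: AsyncIterable<string>,
  approval: Promise<boolean>,
): AsyncGenerator<string> {
  const buffer: string[] = [];
  let finished = false;
  let wake: (() => void) | undefined;

  // Start consuming right away so generation is not delayed by moderation.
  const pump = (async () => {
    for await (const chunk of source) {
      buffer.push(chunk);
      wake?.();
    }
    finished = true;
    wake?.();
  })();

  if (!(await approval)) {
    pump.catch(() => undefined); // ignore late errors from a stream nobody will read
    return;
  }

  let next = 0;
  while (true) {
    // Flush whatever has been buffered so far.
    while (next < buffer.length) yield buffer[next++];
    if (finished) break;
    // Sleep until the pump reports a new chunk or the end of the stream.
    await new Promise<void>((resolve) => {
      wake = resolve;
    });
  }
  await pump; // surface any error raised by the source
}
```

The trade-off is the one mentioned above: everything produced before the verdict sits in memory, and a rejected message still costs a full generation unless the source is cancelled separately.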

TypeChat orchestrates the entire experience, managing everything from previous content lookup to image generation to fully autonomous internet browsing. This provides not just AI-driven responses but a complete AI-controlled experience.

Current capabilities:

  • TypeChat (@hf/mistralai/mistral-7b-instruct-v0.2)
  • Text gen (@cf/meta/llama-2-7b-chat-fp16, @hf/thebloke/llama-2-13b-chat-awq)
  • Previous message searching (not using Vector DBs, but keyword generation AND searching)
  • Web searching (thanks, DuckDuckGo - even if it's a limited version)
  • Image generation (@cf/lykon/dreamshaper-8-lcm, @cf/stabilityai/stable-diffusion-xl-base-1.0, @cf/bytedance/stable-diffusion-xl-lightning)
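
For the "previous message searching" item in the list above, a keyword-based lookup without a vector DB might look roughly like the sketch below. It assumes the keywords have already been generated (for example by a model) and that messages are available as an in-memory array; `StoredMessage`, `searchMessages`, and the ranking rule are all assumptions for illustration, not the project's implementation.

```typescript
// Hypothetical shape of a locally stored chat message.
interface StoredMessage {
  id: number; // assumed to increase with time
  text: string;
}

// Ranks messages by how many distinct keywords they contain (case-insensitive),
// breaking ties in favour of the more recent message.
function searchMessages(
  messages: StoredMessage[],
  keywords: string[],
  limit = 5,
): StoredMessage[] {
  const needles = Array.from(
    new Set(keywords.map((k) => k.trim().toLowerCase()).filter((k) => k.length > 0)),
  );

  return messages
    .map((message) => {
      const haystack = message.text.toLowerCase();
      const hits = needles.filter((needle) => haystack.includes(needle)).length;
      return { message, hits };
    })
    .filter((entry) => entry.hits > 0)
    .sort((a, b) => b.hits - a.hits || b.message.id - a.message.id)
    .slice(0, limit)
    .map((entry) => entry.message);
}
```

Plain substring matching keeps everything in the browser and avoids embeddings, at the cost of missing synonyms and paraphrases, which is where the keyword generation step has to do the heavy lifting.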

Eventually:

  • Web browsing (Browser Rendering API - but get in line/queue)
  • Translation (@cf/meta/m2m100-1.2b)
  • Image detection (@cf/microsoft/resnet-50, @cf/unum/uform-gen2-qwen-500m, @cf/facebook/detr-resnet-50)
  • Audio (Uploading recorded audio or live mic recording)
