I Made an AI-Powered Interactive Storybook Generator with Next.js, Gemini, and ElevenLabs 🔥

#elevenlabs #storybook #nextjs #ai

Hey there, fellow developers! 👋 Today, I'm excited to share how I built Story Wizard Pro, an interactive storybook generator that combines the power of AI for story generation, text-to-speech, and image generation. This project showcases how to create an engaging web application that turns simple prompts into full-fledged illustrated stories with audio narration.

Tech Stack

- Frontend framework: Next.js with React
- UI components: shadcn/ui
- Styling: Tailwind CSS
- AI services:
  - Google's Gemini AI for story generation
  - ElevenLabs API for text-to-speech
  - GetImg.ai for image generation
- Additional libraries:
  - jsPDF for PDF generation
  - Lucide React for icons
  - React's built-in hooks (such as useState) for state management

Key Features
- AI-powered story generation based on user prompts
- Automatic illustration generation for each story page
- Text-to-speech narration
- Interactive page navigation
- PDF and audio download capabilities
- Responsive design with a modern UI

Step-by-Step Implementation Guide

Project Setup

First, create a new Next.js project with Tailwind CSS:

npx create-next-app@latest story-wizard-pro --typescript --tailwind
cd story-wizard-pro

Install the required dependencies:

npm install @google/generative-ai jspdf lucide-react
npm install @radix-ui/react-dialog @radix-ui/react-slot

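UI Components Setup

The application uses shadcn/ui components for a polished look. Initialize shadcn/ui and add the core components:

npx shadcn-ui@latest init
npx shadcn-ui@latest add button card input dialog

(Newer releases of the CLI are published as shadcn rather than shadcn-ui, so the equivalent commands are npx shadcn@latest init and npx shadcn@latest add button card input dialog.)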

Core Functionality Implementation

Story Generation with Gemini AI

Story generation uses Google's gemini-1.5-flash model through the @google/generative-ai SDK. Here's how the chat session is set up:

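import {
  GoogleGenerativeAI,
  HarmBlockThreshold,
  HarmCategory,
} from "@google/generative-ai";

// Example safety settings: block medium-risk content and above in each category.
const safetySettings = [
  HarmCategory.HARM_CATEGORY_HARASSMENT,
  HarmCategory.HARM_CATEGORY_HATE_SPEECH,
  HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT,
  HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
].map((category) => ({
  category,
  threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
}));

const initializeChatSession = async () => {
  // NEXT_PUBLIC_* variables are inlined into the client bundle,
  // so this key is visible to anyone who loads the app.
  const genAI = new GoogleGenerativeAI(process.env.NEXT_PUBLIC_GEMINI_API_KEY);

  const model = genAI.getGenerativeModel({
    model: "gemini-1.5-flash",
  });

  const generationConfig = {
    temperature: 1, // high temperature for more varied, creative stories
    topP: 0.95,
    topK: 64,
    maxOutputTokens: 8192,
  };

  // A chat session keeps history, so follow-up messages can refer to the story.
  const chatSession = model.startChat({
    generationConfig,
    safetySettings,
  });

  return chatSession;
};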
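The snippet above only opens a chat session. The story still has to come back in the shape the rest of the app uses, where storyPages is an array of pages and each page is an array of paragraphs (that is how the image and PDF code treat it). A minimal sketch, assuming the prompt asks Gemini for JSON: buildStoryPrompt, parseStoryPages, generateStoryPages, the prompt wording, and the {"pages": [...]} shape are all hypothetical. The only SDK calls it relies on are chatSession.sendMessage() and result.response.text() from @google/generative-ai.

```javascript
// Hypothetical prompt: ask Gemini for strict JSON so the reply is machine-readable.
function buildStoryPrompt(topic, pageCount = 5) {
  return [
    `Write a children's story about ${topic} in exactly ${pageCount} pages.`,
    'Reply with JSON only, shaped like {"pages": [["paragraph", ...], ...]}.',
    "Each page should have one to three short paragraphs.",
  ].join("\n");
}

// Parse the model's reply into string[][] (pages of paragraphs).
// Models sometimes wrap JSON in Markdown code fences, so strip those first.
function parseStoryPages(text) {
  const cleaned = text
    .trim()
    .replace(/^`{3}(?:json)?\s*/i, "")
    .replace(/\s*`{3}$/, "");
  const data = JSON.parse(cleaned);

  if (!data || !Array.isArray(data.pages)) {
    throw new Error("Expected a top-level 'pages' array");
  }
  return data.pages.map((page, i) => {
    if (!Array.isArray(page) || !page.every((p) => typeof p === "string")) {
      throw new Error(`Page ${i + 1} is not an array of strings`);
    }
    // Drop empty paragraphs so the UI and PDF never render blank blocks
    return page.map((p) => p.trim()).filter((p) => p.length > 0);
  });
}

// Send the prompt through an existing chat session (e.g. from initializeChatSession).
async function generateStoryPages(chatSession, topic, pageCount) {
  const result = await chatSession.sendMessage(buildStoryPrompt(topic, pageCount));
  return parseStoryPages(result.response.text());
}
```

Even when asked for JSON, the model may add fences or stray whitespace, which is why the parser cleans and validates before trusting the shape. Gemini 1.5 models also accept responseMimeType: "application/json" in generationConfig, which should make bare JSON output more reliable.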

Image Generation Integration
The application uses GetImg.ai for generating illustrations:

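// Generates one illustration per page; pageContent is the page's array of paragraphs.
// YOUR_API_KEY is a placeholder; a key used in browser code is visible to users.
const generateImageForPage = async (pageContent) => {
  const response = await fetch('https://api.getimg.ai/v1/flux-schnell/text-to-image', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${YOUR_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      prompt: pageContent.join(' '),
      width: 1200,
      height: 1200,
      steps: 2,
      output_format: 'png',
      response_format: 'url',
    }),
  });

  // Surface API errors (bad key, quota, invalid parameters) instead of returning undefined
  if (!response.ok) {
    throw new Error(`GetImg.ai request failed: ${response.status} ${await response.text()}`);
  }

  const data = await response.json();
  return data.url; // hosted image URL, because response_format is 'url'
};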
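Each call above illustrates a single page. To illustrate a whole story, the calls can run in parallel with Promise.allSettled, so one failed image (for example, a rate-limit error) doesn't discard the others. A minimal sketch: generateAllPageImages is a hypothetical helper, and it takes the per-page generator as a parameter (pass generateImageForPage) so it can be exercised without network access.

```javascript
// Generate an image for every page in parallel.
// Returns an array aligned with storyPages: a URL, or null where generation failed.
async function generateAllPageImages(storyPages, generateImage) {
  const results = await Promise.allSettled(
    storyPages.map((pageContent) => generateImage(pageContent))
  );
  return results.map((result, index) => {
    if (result.status === "fulfilled") return result.value;
    console.warn(`Image for page ${index + 1} failed:`, result.reason);
    return null; // the UI and PDF code already skip pages without an image
  });
}

// Usage (hypothetical state setter):
// setPageImages(await generateAllPageImages(storyPages, generateImageForPage));
```

Returning null keeps the array indices aligned with storyPages, which matters because the PDF code looks images up by page index. If the API enforces tight rate limits, generating the images one after another may be the safer choice.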

Text-to-Speech Implementation
The ElevenLabs API generates natural-sounding narration:

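// Converts text to speech and returns an object URL that an <audio> element can play.
// YOUR_API_KEY is a placeholder; a key used in browser code is visible to users.
const generateAudio = async (text) => {
  // 21m00Tcm4TlvDq8ikWAM is the voice ID of one of ElevenLabs' premade voices
  const response = await fetch("https://api.elevenlabs.io/v1/text-to-speech/21m00Tcm4TlvDq8ikWAM", {
    method: 'POST',
    headers: {
      "Accept": "audio/mpeg",
      "Content-Type": "application/json",
      "xi-api-key": YOUR_API_KEY
    },
    body: JSON.stringify({
      text: text,
      model_id: "eleven_monolingual_v1",
      voice_settings: {
        stability: 0.5,
        similarity_boost: 0.5
      }
    })
  });

  // Fail loudly on API errors instead of wrapping an error payload as "audio"
  if (!response.ok) {
    throw new Error(`ElevenLabs request failed: ${response.status}`);
  }

  const blob = await response.blob(); // MP3 data
  // Call URL.revokeObjectURL(url) once the audio is no longer needed to free memory
  return URL.createObjectURL(blob);
};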
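Calling ElevenLabs (or GetImg.ai) directly from a client component ships the API key to every visitor. A common Next.js pattern is to move the call into a Route Handler so the key stays on the server. The sketch below assumes the App Router; the file path app/api/tts/route.js, the ELEVENLABS_API_KEY variable name, and the synthesizeSpeech helper are hypothetical. The request body mirrors the one shown above, and fetch is injectable only so the function can be tested without network access.

```javascript
const VOICE_ID = "21m00Tcm4TlvDq8ikWAM"; // same voice as the client-side example

// Small helper for JSON error responses
const jsonError = (message, status) =>
  new Response(JSON.stringify({ error: message }), {
    status,
    headers: { "Content-Type": "application/json" },
  });

// Server-side proxy: takes a Request with JSON { text } and returns MP3 audio.
// apiKey would come from process.env.ELEVENLABS_API_KEY (hypothetical name).
async function synthesizeSpeech(request, apiKey, fetchImpl = fetch) {
  const body = await request.json().catch(() => ({}));
  const text = body?.text;
  if (typeof text !== "string" || text.trim() === "") {
    return jsonError("text is required", 400);
  }

  const upstream = await fetchImpl(
    `https://api.elevenlabs.io/v1/text-to-speech/${VOICE_ID}`,
    {
      method: "POST",
      headers: {
        Accept: "audio/mpeg",
        "Content-Type": "application/json",
        "xi-api-key": apiKey,
      },
      body: JSON.stringify({
        text,
        model_id: "eleven_monolingual_v1",
        voice_settings: { stability: 0.5, similarity_boost: 0.5 },
      }),
    }
  );

  if (!upstream.ok) {
    return jsonError("Text-to-speech request failed", 502);
  }
  // Pass the audio stream straight through to the browser
  return new Response(upstream.body, {
    headers: { "Content-Type": "audio/mpeg" },
  });
}

// Hypothetical wiring in app/api/tts/route.js:
// export async function POST(request) {
//   return synthesizeSpeech(request, process.env.ELEVENLABS_API_KEY);
// }
```

The browser would then POST { text } to /api/tts and turn the response blob into an object URL, just as generateAudio does, without ever seeing the key.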

User Interface Design

The UI is built with a combination of Tailwind CSS and shadcn/ui components. Here's the main layout structure:

<div className="min-h-screen bg-gradient-to-b from-slate-900 via-slate-800 to-slate-900">
  <NavigationBar />
  <main className="container mx-auto px-4 py-8">
    {/* Story Input Section */}
    <div className="max-w-2xl mx-auto space-y-4 mb-12">
      <Input
        type="text"
        value={storyType}
        onChange={(e) => setStoryType(e.target.value)}
        placeholder="What's your story about?"
        className="w-full pl-12 pr-4 py-3"
      />
      <Button onClick={generateStory}>
        Generate Story
      </Button>
    </div>

    {/* Story Display Section */}
    <Card className="bg-slate-800/50 border-slate-700">
      {/* Navigation Controls */}
      {/* Story Content */}
      {/* Audio Controls */}
    </Card>
  </main>
</div>
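The layout leaves the navigation controls as placeholders. One way to keep page navigation and loading state consistent is a small reducer for React's useReducer hook. This is a sketch: the state shape, the action names, and the clampPage helper are hypothetical, not taken from the app.

```javascript
// Hypothetical state for the story view, suitable for React's useReducer.
const initialStoryState = { status: "idle", pages: [], currentPage: 0, error: null };

// Keep a page index inside [0, pageCount - 1]; an empty story stays on page 0.
function clampPage(index, pageCount) {
  return Math.min(Math.max(index, 0), Math.max(pageCount - 1, 0));
}

function storyReducer(state, action) {
  switch (action.type) {
    case "GENERATE_START":
      return { ...initialStoryState, status: "loading" };
    case "GENERATE_SUCCESS":
      return { status: "ready", pages: action.pages, currentPage: 0, error: null };
    case "GENERATE_ERROR":
      return { ...state, status: "error", error: action.error };
    case "NEXT_PAGE": // "Next" on the last page is a no-op
      return { ...state, currentPage: clampPage(state.currentPage + 1, state.pages.length) };
    case "PREV_PAGE": // "Previous" on the first page is a no-op
      return { ...state, currentPage: clampPage(state.currentPage - 1, state.pages.length) };
    case "GO_TO_PAGE":
      return { ...state, currentPage: clampPage(action.index, state.pages.length) };
    default:
      return state;
  }
}

// Usage inside a component (hypothetical wiring):
// const [story, dispatch] = useReducer(storyReducer, initialStoryState);
// <Button onClick={() => dispatch({ type: "NEXT_PAGE" })}>Next</Button>
```

Keeping the clamping in the reducer means the buttons can dispatch freely; the current page can never point past the end of the story.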

PDF Generation

The PDF download feature uses jsPDF:

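import { jsPDF } from "jspdf";

// Builds a PDF with the title on the first page and one story page per PDF page.
// addImage is documented to take data URLs, <img> or <canvas> elements, so
// convert remote image URLs to data URLs before calling this.
const downloadPDF = () => {
  const pdf = new jsPDF(); // A4 portrait, units in mm (210 x 297)
  const pageHeight = pdf.internal.pageSize.getHeight();
  const margin = 20;
  let y = margin;

  // Add title
  pdf.setFont("helvetica", "bold");
  pdf.setFontSize(16);
  pdf.text(`A Story About ${storyType}`, 105, y, { align: "center" });
  y += 12;

  // Switch to the body text style
  pdf.setFont("helvetica", "normal");
  pdf.setFontSize(12);

  storyPages.forEach((page, index) => {
    // Start every story page after the first on a fresh PDF page
    if (index > 0) {
      pdf.addPage();
      y = margin;
    }

    if (pageImages[index]) {
      // The GetImg.ai request asks for PNG output
      pdf.addImage(pageImages[index], 'PNG', 20, y, 170, 100);
      y += 110; // move below the 100 mm-tall image
    }

    // Add text content, wrapped to the 170 mm content width
    page.forEach(paragraph => {
      const lines = pdf.splitTextToSize(paragraph, 170);
      lines.forEach(line => {
        // Continue on a new PDF page instead of writing past the bottom margin
        if (y > pageHeight - margin) {
          pdf.addPage();
          y = margin;
        }
        pdf.text(line, 20, y);
        y += 7;
      });
      y += 4; // gap between paragraphs
    });
  });

  pdf.save("storybook.pdf");
};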
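jsPDF's addImage is documented to accept base64 data URLs, image elements, or canvases, while generateImageForPage returns a hosted URL. One way to bridge the two is to download each image and encode it as a data URL before building the PDF. A sketch, assuming the GetImg.ai image URLs can be fetched from the browser (that is, the host allows cross-origin requests); imageUrlToDataUrl is a hypothetical helper name.

```javascript
// Download an image and return it as a base64 data URL that jsPDF can embed.
async function imageUrlToDataUrl(url) {
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error(`Image download failed: ${response.status}`);
  }
  // Drop any parameters such as "; charset=..." from the MIME type
  const mimeType = (response.headers.get("Content-Type") || "image/png").split(";")[0].trim();
  const bytes = new Uint8Array(await response.arrayBuffer());

  // btoa needs a "binary string"; build it in chunks so String.fromCharCode
  // never receives more arguments than the engine allows.
  let binary = "";
  const chunkSize = 0x8000;
  for (let i = 0; i < bytes.length; i += chunkSize) {
    binary += String.fromCharCode(...bytes.subarray(i, i + chunkSize));
  }
  return `data:${mimeType};base64,${btoa(binary)}`;
}

// Usage (hypothetical): convert once, then build the PDF from the data URLs.
// const embeddable = await Promise.all(
//   pageImages.map((u) => (u ? imageUrlToDataUrl(u) : null))
// );
```

Because downloadPDF is synchronous, doing the conversion up front (for example right after the images are generated) keeps the PDF step fast and avoids mixing network errors into PDF layout.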

Conclusion
Building Story Wizard Pro was an exciting exercise in combining three AI services (Gemini for the text, GetImg.ai for the illustrations, and ElevenLabs for the narration) into one cohesive web application. The project shows how a Next.js front end can orchestrate these APIs to turn a one-line prompt into an illustrated, narrated storybook.

DEV Community
