Hey there, fellow developers! Today, I'm excited to share how I built Story Wizard Pro, an interactive storybook generator that combines AI story generation, text-to-speech, and image generation. The app turns a simple prompt into a fully illustrated story with audio narration.
Tech Stack
Frontend Framework: Next.js with React
UI Components: shadcn/ui
Styling: Tailwind CSS
AI Services:
  Google's Gemini AI for story generation
  ElevenLabs API for text-to-speech
  GetImg.ai for image generation
Additional Libraries:
  jsPDF for PDF generation
  Lucide React for icons
  React Hooks for state management
Key Features
AI-powered story generation based on user prompts
Automatic illustration generation for each story page
Text-to-speech narration
Interactive page navigation
PDF and audio download capabilities
Responsive design with a modern UI
Step-by-Step Implementation Guide
- Project Setup
First, create a new Next.js project with Tailwind CSS:
npx create-next-app@latest story-wizard-pro --typescript --tailwind
cd story-wizard-pro
Install required dependencies:
npm install @google/generative-ai jspdf lucide-react
npm install @radix-ui/react-dialog @radix-ui/react-slot
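- UI Components Setup
The application uses shadcn/ui components for a polished look. Initialize shadcn/ui and add the core components (newer releases of the CLI are published under the name shadcn, so npx shadcn@latest is the equivalent if shadcn-ui reports a deprecation):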
npx shadcn-ui@latest init
npx shadcn-ui@latest add button card input dialog
- Core Functionality Implementation
Story Generation with Gemini AI
The story generation uses Google's Gemini AI model. Here's the key implementation:
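import { GoogleGenerativeAI, HarmCategory, HarmBlockThreshold } from "@google/generative-ai";

// safetySettings was not shown in the original snippet; this value is an illustrative assumption.
const safetySettings = [
  {
    category: HarmCategory.HARM_CATEGORY_HARASSMENT,
    threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
  },
];

const initializeChatSession = async () => {
  // NEXT_PUBLIC_ variables are bundled into client-side code, so this key is visible in the browser.
  const genAI = new GoogleGenerativeAI(process.env.NEXT_PUBLIC_GEMINI_API_KEY);
  const model = genAI.getGenerativeModel({
    model: "gemini-1.5-flash",
  });
  // Sampling settings: a high temperature favors more varied, creative stories.
  const generationConfig = {
    temperature: 1,
    topP: 0.95,
    topK: 64,
    maxOutputTokens: 8192,
  };
  const chatSession = model.startChat({
    generationConfig,
    safetySettings,
  });
  return chatSession;
};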
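The snippet above stops once the chat session exists; the prompt and the handling of the reply are not shown. One way to fill that gap is sketched below, assuming the model is asked to put a line containing only --- between pages. The names buildStoryPrompt and parseStoryPages, the separator, and the default page count are all hypothetical and not taken from the project.

```javascript
// Hypothetical helpers: neither function appears in the original project.
const PAGE_SEPARATOR = "---"; // assumed delimiter that the prompt asks the model to use

// Build the prompt that would be sent through chatSession.sendMessage().
function buildStoryPrompt(storyType, pageCount = 5) {
  return [
    `Write a children's story about ${storyType}.`,
    `Split it into ${pageCount} pages.`,
    `Put a line containing only ${PAGE_SEPARATOR} between pages.`,
  ].join(" ");
}

// Turn the raw reply into an array of pages, each an array of paragraphs.
function parseStoryPages(rawText) {
  return rawText
    .split(/^[ \t]*---[ \t]*$/m) // must stay in sync with PAGE_SEPARATOR
    .map((page) =>
      page
        .split(/\n+/)
        .map((paragraph) => paragraph.trim())
        .filter((paragraph) => paragraph.length > 0)
    )
    .filter((page) => page.length > 0);
}

// Possible wiring with the session from initializeChatSession:
//   const result = await chatSession.sendMessage(buildStoryPrompt(storyType));
//   const storyPages = parseStoryPages(result.response.text());
```

The array-of-arrays shape matches how the rest of the code reads a page: joined with spaces for the image prompt and walked paragraph by paragraph for the PDF.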
Image Generation Integration
The application uses GetImg.ai for generating illustrations:
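const generateImageForPage = async (pageContent) => {
  // YOUR_API_KEY is a placeholder for your GetImg.ai key.
  const response = await fetch('https://api.getimg.ai/v1/flux-schnell/text-to-image', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${YOUR_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      // pageContent is an array of paragraphs; join them into a single prompt.
      prompt: pageContent.join(' '),
      width: 1200,
      height: 1200,
      steps: 2,
      output_format: 'png',
      response_format: 'url',
    }),
  });
  // Fail loudly instead of returning undefined when the API rejects the request.
  if (!response.ok) {
    throw new Error(`Image generation failed with status ${response.status}`);
  }
  const data = await response.json();
  return data.url;
};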
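Since every story page gets its own illustration, generateImageForPage has to run once per page. Here is a minimal sketch of one way to do that, assuming the pages can be illustrated independently and that one failed image should not block the rest of the story. The name generateImagesForStory is hypothetical, and the generator is passed in as a parameter so the helper does not depend on any particular API.

```javascript
// Hypothetical helper: runs an image generator over every page in parallel.
// `generateImage` is expected to behave like generateImageForPage:
// it takes one page (an array of paragraphs) and resolves to an image URL.
async function generateImagesForStory(storyPages, generateImage) {
  const results = await Promise.allSettled(
    // The async wrapper turns synchronous throws into rejections as well.
    storyPages.map(async (page) => generateImage(page))
  );
  // Keep array positions aligned with page indexes; failed pages become null.
  return results.map((result) =>
    result.status === "fulfilled" ? result.value : null
  );
}
```

A null entry lines up with the pageImages[index] check in the PDF step, which simply skips pages without an image. If the image API throttles parallel requests, a sequential loop is the safer variant.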
Text-to-Speech Implementation
ElevenLabs API is used for generating natural-sounding narration:
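const generateAudio = async (text) => {
  // The last path segment is the ElevenLabs voice ID; YOUR_API_KEY is a placeholder.
  const response = await fetch("https://api.elevenlabs.io/v1/text-to-speech/21m00Tcm4TlvDq8ikWAM", {
    method: 'POST',
    headers: {
      "Accept": "audio/mpeg",
      "Content-Type": "application/json",
      "xi-api-key": YOUR_API_KEY
    },
    body: JSON.stringify({
      text,
      model_id: "eleven_monolingual_v1",
      voice_settings: {
        stability: 0.5,
        similarity_boost: 0.5
      }
    })
  });
  // Without this check, an error response would be turned into an unplayable audio URL.
  if (!response.ok) {
    throw new Error(`Audio generation failed with status ${response.status}`);
  }
  // Wrap the MP3 bytes in an object URL that an <audio> element can play.
  const blob = await response.blob();
  return URL.createObjectURL(blob);
};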
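Each call to generateAudio uses API quota and creates an object URL that stays in memory until it is revoked. A small per-page cache is one way to handle both, sketched here under the assumption that narration is requested one page at a time. The name createNarrationCache and its shape are hypothetical; the generator and the revoke function are injected so the sketch does not depend on the browser.

```javascript
// Hypothetical helper: caches one narration URL per page.
// `generate` is expected to behave like generateAudio (text in, URL out).
function createNarrationCache(generate, revoke = (url) => URL.revokeObjectURL(url)) {
  const pending = new Map(); // page index -> promise of an audio URL

  return {
    // Return the narration URL for a page, generating it only on first request.
    get(pageIndex, page) {
      if (!pending.has(pageIndex)) {
        const promise = Promise.resolve()
          .then(() => generate(page.join(" ")))
          .catch((error) => {
            // Drop failed entries so a later call can retry.
            if (pending.get(pageIndex) === promise) pending.delete(pageIndex);
            throw error;
          });
        pending.set(pageIndex, promise);
      }
      return pending.get(pageIndex);
    },

    // Release every cached URL, for example when a new story is generated.
    clear() {
      pending.forEach((promise) => promise.then(revoke, () => {}));
      pending.clear();
    },
  };
}
```

Storing the promise rather than the finished URL means two quick clicks on the same page share a single request.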
- User Interface Design
The UI is built with a combination of Tailwind CSS and shadcn/ui components. Here's the main layout structure:
<div className="min-h-screen bg-gradient-to-b from-slate-900 via-slate-800 to-slate-900">
  <NavigationBar />
  <main className="container mx-auto px-4 py-8">
    {/* Story Input Section */}
    <div className="max-w-2xl mx-auto space-y-4 mb-12">
      <Input
        type="text"
        value={storyType}
        onChange={(e) => setStoryType(e.target.value)}
        placeholder="What's your story about?"
        className="w-full pl-12 pr-4 py-3"
      />
      <Button onClick={generateStory}>
        Generate Story
      </Button>
    </div>
    {/* Story Display Section */}
    <Card className="bg-slate-800/50 border-slate-700">
      {/* Navigation Controls */}
      {/* Story Content */}
      {/* Audio Controls */}
    </Card>
  </main>
</div>
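The Navigation Controls placeholder in the layout is left empty. The index logic behind Previous and Next buttons can be sketched as a small pure function, assuming the component keeps the visible page in a currentPage state variable. The names movePage, currentPage, and setCurrentPage are hypothetical.

```javascript
// Hypothetical helper: computes the page index after a Previous/Next click.
// The result is clamped so it never leaves the range 0..pageCount - 1.
function movePage(currentPage, delta, pageCount) {
  if (pageCount <= 0) return 0; // no story generated yet
  return Math.min(Math.max(currentPage + delta, 0), pageCount - 1);
}

// Possible wiring inside the component (setCurrentPage is an assumed useState setter):
//   <Button onClick={() => setCurrentPage((p) => movePage(p, -1, storyPages.length))}>
//     Previous
//   </Button>
//   <Button onClick={() => setCurrentPage((p) => movePage(p, 1, storyPages.length))}>
//     Next
//   </Button>
```

Keeping the clamping outside the JSX makes the boundary cases easy to check without rendering anything.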
- PDF Generation
The PDF download feature uses jsPDF:
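import { jsPDF } from "jspdf";

const downloadPDF = () => {
  const pdf = new jsPDF(); // defaults to A4 portrait, measured in mm (210 x 297)
  let y = 20;
  // Add title
  pdf.setFont("helvetica", "bold");
  pdf.setFontSize(16);
  pdf.text(`A Story About ${storyType}`, 105, y, { align: "center" });
  y += 15; // move below the title so the first image does not overlap it
  // Switch to a body font for the story text
  pdf.setFont("helvetica", "normal");
  pdf.setFontSize(12);
  // Add content
  storyPages.forEach((page, index) => {
    // Start every story page after the first on a fresh PDF page
    if (index > 0) {
      pdf.addPage();
      y = 20;
    }
    if (pageImages[index]) {
      // Assumes pageImages[index] holds image data that jsPDF can read (for example a data URL);
      // 'PNG' matches the output_format requested from GetImg.ai.
      pdf.addImage(pageImages[index], 'PNG', 20, y, 170, 100);
      y += 110; // image height plus a 10 mm gap
    }
    // Add text content
    page.forEach(paragraph => {
      const lines = pdf.splitTextToSize(paragraph, 170);
      lines.forEach(line => {
        // Continue on a new PDF page when the text reaches the bottom margin
        if (y > 280) {
          pdf.addPage();
          y = 20;
        }
        pdf.text(line, 20, y);
        y += 7;
      });
    });
  });
  pdf.save("storybook.pdf");
};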
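One detail worth noting: addImage draws each illustration into a 170 x 100 box, while the images are requested at 1200 x 1200, so a square picture ends up stretched. A minimal sketch of an aspect-ratio fit follows; fitImage is a hypothetical helper, and the numbers come from the snippets in this post.

```javascript
// Hypothetical helper: scales an image to fit inside a box without distortion.
// Source sizes are in pixels; the box and the result are in PDF units.
function fitImage(sourceWidth, sourceHeight, maxWidth, maxHeight) {
  const scale = Math.min(maxWidth / sourceWidth, maxHeight / sourceHeight);
  return { width: sourceWidth * scale, height: sourceHeight * scale };
}

// A 1200 x 1200 illustration placed in the 170 x 100 box used by downloadPDF.
const size = fitImage(1200, 1200, 170, 100);
const x = 20 + (170 - size.width) / 2; // center horizontally inside the box

// The addImage call would then become something like:
//   pdf.addImage(image, format, x, y, size.width, size.height);
```

The smaller of the two scale factors is used so the image fits in both directions; the leftover horizontal space is split evenly on either side.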
Conclusion
Building Story Wizard Pro was an exciting journey into combining multiple AI services into a cohesive web application. The project demonstrates how modern web technologies can be used to create engaging, interactive experiences.