Disclaimer 1: This article was supposed to be a book, but ChatGPT quota is against it. If you didn’t get the joke, it’s probably the API latency.
Disclaimer 2: Johan Guterman has fully reviewed, edited and approved this piece. Any copyright breach claims are preposterous, but will be tolerated.
Our application development journey started with a bit of laziness on Johan’s part and my growing fascination with automation. We both started using Fuji X cameras around the same time, and very shortly after we were fully charmed by the magic of JPEG film simulation recipes.
“Film Simulation modes are part of what make X Series cameras so special” fujifilm-x.com
Over time, our enthusiasm grew, and we found ourselves wanting something more tailored. Say, I watch a series or a movie and enjoy the picture so much that I’d like to take my photos with that same color effect I see on the screen. More importantly, I want to get that result straight out of camera, before any post-production in dedicated software.
Replicating the cinematic quality of film or digital post-processing in a JPEG recipe is an ambitious challenge, often requiring extensive effort. It takes a certain amount of experience, and for beginners it’s a lot of work even in specialized apps like Capture One or Lightroom (although it gets easier each year with new and improved tooling).
Despite the challenge, we can (and will) give it a try.
Sure, the results won’t be exactly the same, but:
- we get a unique JPEG recipe named after the series or movie
- it’s time we put AI to the test
- we can always fine-tune the JPEGs in post-processing with a preset
Don’t Blink
“It’s quite simple, really” — Johan explains — “I get to the website, upload some pictures, set my camera features, wait a couple of seconds and BAM — get a JPEG recipe similar to Fuji X Weekly.”
💭 Hm, so we need a form, a collection of camera parameters and some basic communication with AI.
“That it?” — I ask.
“Well, it’d also be great to share the link with other photo enthusiasts! So when they get to the page…”
“Yup, let’s pause here for a second!”💭 We’ll also need a database to store and fetch recipe params, and image storage, apparently.
“Okay, certainly doable. You were saying?”
“Right, so other people could remix existing recipes!”
“Remix? How does that work?”
“You just click a button…”
“…a-ha and it just works, sure”💭 Raising the AI model temperature and tweaking some other params with the same input data should work.
“Aa-and a nice landing page, of course!”
“Naturally”
“In the Fuji-retro-style, with grids and splashy gradients…”
“We’ll see”
“Oh, and I can write testimonials!”
“Yep… wait, is this all about promoting your Instagram?!”
“Dude… come on”
The Implementation — Snap-Snap
Hey there, ChatGPT
To kick off, let’s prototype using a tool we already have at our disposal: ChatGPT.
The very first attempt was to find out whether the whole affair was feasible at all. To my genuine pleasure, it worked just fine! ChatGPT understood the requirements, correctly processed the images and output a decent starting point for the JPEG recipe… after several back-and-forths.
Let’s visualize this:
If we dig a bit deeper and extrapolate this to the application UX, the process is not exactly that smooth. In our app we’ll be making a single request and getting a single response.
What we’re aiming for is something like this:
To get a better idea, just compare a casual conversation (fast responses, short messages, information building on top of previous exchanges) to an old-school physical letter (slow delivery, long sentences, every detail covered in a single message).
Let’s get to it!
Hello, OpenAI API
We’ve reached the important point of designing the request.
Since this is a huge topic in its own right, I’ll only mention a few crucial details.
We evolved from the ChatGPT dialogue to using the OpenAI API.
The API request for Fuji X Studio is composed of text and image data. Images are passed in base64 format, which is fairly standard.
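As a small illustration (this helper and the file path are hypothetical, not the app’s actual code), turning a local image into that shape could look like this:
import { readFileSync } from 'node:fs';

// Build the { url, type } shape used as imageData later in this article;
// the data URL carries the base64-encoded bytes.
const toImageDataPart = (path: string, type = 'image/jpeg') => ({
  url: `data:${type};base64,${readFileSync(path).toString('base64')}`,
  type,
});

const imageData = [toImageDataPart('./movie-still.jpg')];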
The text, however, is a combination of several pieces of information. Remember, it’s a snail-mail physical letter that should not fail. And when it does, the failure should be predictable.
The text (aka the instructions) consists of the following (see the assembly sketch after this list):
- introduction and the point of request
- camera details from the form data, composed in a certain way
- JPEG recipe templates for different generations of camera sensors
- templates for the output for every parameter
- response template
- error response templates
- instructions for image edge cases
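To make this structure a bit more concrete, here’s a rough, hypothetical sketch of how such an instruction text could be assembled. The template constants and the CameraFormData type are illustrative placeholders, not the project’s actual file:
// Illustrative placeholders only; the real templates live in one heavily edited file.
type CameraFormData = { model: string; sensor: string };

const RECIPE_TEMPLATES: Record<string, string> = {
  'X-Trans V': 'Film Simulation: ..., Grain Effect: ..., White Balance: ...',
};
const OUTPUT_TEMPLATE = 'Output every parameter as "<name>: <value>"';
const RESPONSE_TEMPLATE = 'Respond with the recipe only, no extra prose';
const ERROR_TEMPLATE = 'If the images are unusable, respond with "ERROR: <reason>"';
const IMAGE_EDGE_CASES = 'Ignore letterboxing, watermarks and UI overlays in the stills';

const buildInstructions = (camera: CameraFormData): string =>
  [
    'Generate a JPEG film simulation recipe from the attached stills.', // intro and point of request
    `Camera: ${camera.model}, sensor: ${camera.sensor}`,                // camera details from the form
    RECIPE_TEMPLATES[camera.sensor] ?? '',                              // recipe template per sensor generation
    OUTPUT_TEMPLATE,                                                    // output format per parameter
    RESPONSE_TEMPLATE,
    ERROR_TEMPLATE,
    IMAGE_EDGE_CASES,
  ].join('\n\n');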
It’s not an exaggeration to say that the AI instructions are the most edited file in the project. Fine-tuning the text part of the request took about as much time as building the whole UI for the app.
The full instruction takes around 200 lines and about 7000 characters, resulting in roughly 1800 tokens for every request. Adding image tokens (I admit, we should’ve started with images) to that number gives us around 3000–4000 tokens per request on average. The output token volume is fairly negligible; we limit it to around 250 per recipe.
“Is that much?” — Johan chimes in to check the results.
“Well, it depends. For occasional personal usage it’s affordable enough, but for our free app we don’t want to get hit with a hefty token bill all of a sudden”
“How hefty?”
“Any unpredictable credit is a risk, especially for a public app”
“We can add a «buy me a coffee» button…”
“Or we can try the recently released Gemini AI; it has vision capabilities and a free tier. The only catch is that it’s not available in the EU yet”
“So, about that coffee button…”
“Although, access from the EU is not an issue with cloud functions. But we’ll need another round of testing.”
“I guess I’ll go take some photos then. And a coffee maybe.”
Pivot #1
The search for an alternative to OpenAI’s token costs and the release of Gemini AI aligned perfectly and resulted in the first turning point in development. In retrospect we can compare not only the price but also the output quality, which may be especially handy for other photo/development enthusiasts.
So how and what are we actually comparing?
Same conditions as before: the text request is around 1800 tokens plus two high-res images per average interaction. However, image cost varies between providers; a quick cost sketch follows the two breakdowns below.
🤖 OpenAI
- model: gpt-4-vision-preview (currently several cheaper models are available, e.g. gpt-4o)
- image tokens calculation: “A 2048 x 4096 image in detail: high mode costs 1105 tokens”
- tokens per user session: (1800 + 2 images x 1100) x 3 retries = 12000 tokens
- tokens cost: $10/1M tokens ($2.5/1M for modern models)
- user session cost: $0.12 💰 ($0.03 for modern models)
🤖 Gemini AI
- model: gemini-pro-vision, later gemini-1.5-flash
- image tokens calculation: “During processing, the Gemini API considers images to be a fixed size, so they consume a fixed number of tokens (currently 258 tokens), regardless of their display or file size.”
- tokens per user session: (1800 + 2 images x 258) x 3 retries = 6900
- free tokens — 1M tokens per minute and 1500 requests per day
- tokens cost: $0.075/1M tokens
- user session cost: free, then $0.0005 💰
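Here’s the back-of-the-napkin arithmetic behind those session numbers, under the same assumptions (1800 instruction tokens, two images, three retries); prices are per 1M input tokens:
// Rough session estimate; real numbers vary with image count and retries.
const sessionTokens = (imageTokens: number) => (1800 + 2 * imageTokens) * 3;

const openAiTokens = sessionTokens(1100); // 12000
const geminiTokens = sessionTokens(258);  // ≈ 6900

const openAiCost = (openAiTokens / 1_000_000) * 10;    // ≈ $0.12 with gpt-4-vision-preview
const geminiCost = (geminiTokens / 1_000_000) * 0.075; // ≈ $0.0005 beyond the free tier

console.log({ openAiTokens, geminiTokens, openAiCost, geminiCost });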
And what about the output quality?
OpenAI (gpt-4-vision-preview) seems more proficient in vision capabilities and in the photography-specific domain, and produces more intricate and detailed output.
Gemini AI (gemini-1.5-flash) is more on the conservative side. It takes more effort, instructions and templates with examples to prompt out the desired effect. But it’s almost 60 times (yes, 6000%) cheaper!
It’s also worth mentioning that upgrading from gemini-pro-vision to gemini-1.5-flash resulted in a roughly 50% cut in response latency. A great improvement!
So, you see, it’s tricky. On one hand, you pay for better output while spending fewer tokens. On the other hand, you spend twice as many tokens and more prompt-development time on instructions, but it’s nearly free (depending on the volume, of course). For personal usage I would still pick OpenAI for the same or a comparable task.
Finally, let’s briefly compare the JS clients and the developer experience.
The following snippets contain simplified excerpts of actual app code. Note that imageData is identical for both APIs to accommodate a convenient switch from one to another:
type ImageDataPart = {
url: string;
type: string;
};
type ImageData = ImageDataPart[];
const imageData = [
{ url: '<base64 data>', type: '<image type>' }
]
OpenAI
To work with the API you need an API key and the openai package installed.
import OpenAI from 'openai';
import { ChatCompletionContentPartImage } from 'openai/resources/index.mjs';
const openai = new OpenAI({
apiKey: process.env.OPENAI_API_KEY,
});
const imageMessages: ChatCompletionContentPartImage[] = imageData.map(
({ url }) => ({
type: 'image_url',
image_url: { url, detail: 'high' },
}),
);
async function main() {
let result;
try {
result = await openai.chat.completions.create({
model: 'gpt-4-vision-preview',
messages: [
{
role: 'system',
content: 'Generate a JPEG film simulation recipe ...',
},
{
role: 'user',
content: [...imageMessages],
},
],
temperature: 0.05,
max_tokens: 256,
frequency_penalty: 0,
presence_penalty: 0,
});
} catch (e) {
throw new Error('Recipe cannot be generated, service error');
}
return result?.choices[0].message.content;
}
main();
Gemini AI
Similarly, get an API key and install the @google/generative-ai module to start working.
import {
GoogleGenerativeAI,
InlineDataPart,
Content,
} from '@google/generative-ai';
const genAi = new GoogleGenerativeAI(process.env.GEMINI_API_KEY as string);
const model = genAi.getGenerativeModel({ model: 'gemini-1.5-flash' });
const imageMessages: InlineDataPart[] = imageData.map(({ url, type }) => ({
inlineData: {
mimeType: type,
data: url.split(',')[1], // only the base64-encoded image data
},
}));
let result;
async function main() {
try {
result = await model.generateContent({
contents: [
{
role: 'user',
// it is recommended by Gemini docs to place image data
// before text when it's one image
// make sure to adjust the code accordingly for your needs
parts: [...imageMessages, { text: 'Generate a JPEG film simulation recipe ...' }],
},
],
generationConfig: {
temperature: 0.65,
topK: 32,
topP: 1,
maxOutputTokens: 256, // correct, output is super humble
},
});
} catch (e) {
throw new Error('Recipe cannot be generated, service error');
}
return result?.response.text();
}
main();
Strictly speaking, there are no significant differences between the clients. All nuances can be figured out in the docs and/or GitHub issues. Subjectively, the OpenAI documentation seems more comprehensive and better organized for practical use. On the other hand, the Gemini AI docs are regularly updated and have improved significantly compared to the rather humble release stage.
In addition, it’s worth mentioning one important aspect: moderation. When dealing with user images, especially ones that are meant to be public, and especially when the service is paid, it’s crucial to set up a safety net. Both OpenAI and Gemini AI have dedicated capabilities. They take slightly different approaches to the setup, but are conceptually quite similar. Explore the docs to learn more.
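As an illustration of the Gemini side, safety settings can be passed when creating the model; the categories and thresholds below are just an example, so check the current docs before relying on them:
import {
  GoogleGenerativeAI,
  HarmCategory,
  HarmBlockThreshold,
} from '@google/generative-ai';

const genAi = new GoogleGenerativeAI(process.env.GEMINI_API_KEY as string);

// Block clearly unsafe user uploads before they can end up on a public recipe page
const moderatedModel = genAi.getGenerativeModel({
  model: 'gemini-1.5-flash',
  safetySettings: [
    {
      category: HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT,
      threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
    },
    {
      category: HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
      threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
    },
  ],
});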
The Implementation — Development
It’s Alive!
Following a 180° turnaround with the AI model and meticulous fine-tuning of the instructions, we are able to get consistent recipe outputs. At this point our app looks like this:
The next step was implementing services to save and share recipes effectively. Practically speaking, we need a simple yet efficient database and image storage.
The backbone of our app is built on Next.js. To manage the database we chose Vercel KV, a Redis-based solution that fits our needs just fine. The implementation is not much different from the docs, but just for posterity, let’s have a look at writing and retrieving saved data.
When we get a proper result from the AI, we need to save it along with the form and meta data. Later, when we retrieve the recipe, all we need is its ID. Note that every generation produces a unique ID, hence a new, unique recipe.
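The getId util is proprietary; as an assumption for illustration, a minimal stand-in could be as simple as a UUID:
import { randomUUID } from 'node:crypto';

// Hypothetical stand-in for '@/utils/getId': any collision-resistant ID will do.
export const getId = (): string => randomUUID();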
To write (save) the data:
import { kv } from '@vercel/kv';
import { getId } from '@/utils/getId';
// unique ID is generated
const id = getId();
try {
await kv.hset(id, {
camera: '<camera settings>',
proc: '<sensor settings>',
film: '<film settings>',
light: '<light settings>',
recipe: '<recipe data as returned by AI>',
timestampMs: '<latency, ms>',
});
} catch (e) {
if (e instanceof Error) {
let message = e.message;
return new Response(message, { status: 500 });
}
}
To read the recipe data we need to know its id. We can get it from params and generate a 404 when it’s not found:
import { notFound } from 'next/navigation';
import { kv } from '@vercel/kv';
const Page = async ({ params }) => {
const { id } = params;
const recipeData = await kv.hgetall(id);
if (!recipeData) {
notFound();
}
// return <RecipePage data={recipeData} />
}
export default Page;
And it’s done!
Let’s take care of the images in the next step.
Pivot #2
Vercel Blob was the very first storage service that we utilized, and it lasted around two days of not-that-intense testing. And of course it’s not a question of reliability or anything like that. We are simply back at the question of pricing.
At first glance, the 250MB quota seemed sufficient, right?
Not sure about you, but spoiled by other services’ pricing, I was convinced it was a daily limit. So when, during testing, responses started erring out on the server without any obvious reason, I discovered a huge, glaring red issue in my Vercel storage dashboard. It said the MONTHLY limit was already exhausted.
Time for plan B. After some quick research, AWS S3 emerged as the ideal alternative, despite its more involved setup process, including account and project configuration. And if that’s not enough fun, you’ll also get a chance to get to know the one and only AWS SDK. Who needs SST anyway…
You never asked, but I don’t want to be left alone with this.
Here’s how we can upload the images using @aws-sdk (extra code skipped for brevity). The crucial piece is the id, which we figured out previously:
import { S3Client, S3ClientConfigType, PutObjectCommand } from '@aws-sdk/client-s3';
// proprietary util
import { stringToBlob } from '@/utils/stringToBlob';
const s3Config: S3ClientConfigType = {
region: S3_REGION,
credentials: {
accessKeyId: process.env.AWS_API_KEY as string,
secretAccessKey: process.env.AWS_API_SECRET as string,
},
};
const s3Client = new S3Client(s3Config);
const blobPromises = imageData.map(async ({ url }, idx) => {
const pathname = `${id}/${idx}`;
const [blob, contentType] = stringToBlob(url);
const putObject = new PutObjectCommand({
Bucket: process.env.AWS_S3_BUCKET,
Key: pathname,
Body: blob,
ContentType: contentType,
});
return await s3Client.send(putObject);
});
await Promise.all(blobPromises);
Similarly, to retrieve the uploaded images we only need the id:
import { S3Client, S3ClientConfigType, ListObjectsV2Command } from '@aws-sdk/client-s3';
const s3Config: S3ClientConfigType = {
region: S3_REGION,
credentials: {
accessKeyId: process.env.AWS_API_KEY as string,
secretAccessKey: process.env.AWS_API_SECRET as string,
},
};
const s3Client = new S3Client(s3Config);
const listObjects = new ListObjectsV2Command({
Bucket: S3_BUCKET,
Prefix: `${id}/`,
MaxKeys: 5, // Fuji X Studio images limit
});
const { Contents } = await s3Client.send(listObjects);
let imageUrls: string[] = [];
if (Contents?.length) {
imageUrls = Contents.filter(({ Size }) => !!Size).map(
({ Key }) => `https://${S3_BUCKET}.s3.${S3_REGION}.amazonaws.com/${Key}`
);
}
Now our app starts to get closer to its final state:
In the end, how does S3 compare to Blob?
For our use case, the extended testing time alone made AWS S3 at least 60 times more cost-effective. And we are still counting.
“So after two 180° pivots, does that mean we’re back at the starting point?”
“Only if you haven’t watched «Rick and Morty»”
“Yeah, I don’t really watch cartoons”
“Believe me, you do in a parallel universe, Johan”
The Implementation — Post Processing
At this point, the app is functioning but not quite complete. Let’s have a brief look at a couple of other nice-to-have features.
Kill-Switch
A simple kill-switch mechanism is needed to temporarily disable the service while leaving essential pages and metadata accessible. This can be handy for several reasons, especially when one of our crucial services is no longer available.
The maintenance kill-switch in our case is implemented via an env variable that disables main features on the client and prevents unsolicited requests on the server.
const isMaintenance = process.env.FLAG_MAINTENANCE === SETTING_TRUE;
if (isMaintenance) {
return new Response(
'Sorry, the service is not available at the moment',
{
status: 429,
},
);
}
Server handling is the last resort in this case, so we coupled it with rate-limit checks, hence the 429 status.
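The rate-limit check itself can stay simple. Here’s one possible sketch, assuming the @upstash/ratelimit package on top of the Vercel KV store we already use (not necessarily the app’s exact setup):
import { Ratelimit } from '@upstash/ratelimit';
import { kv } from '@vercel/kv';

// Allow a handful of generations per IP per minute; tune to taste.
const ratelimit = new Ratelimit({
  redis: kv,
  limiter: Ratelimit.slidingWindow(5, '60 s'),
});

export async function checkRateLimit(ip: string): Promise<Response | null> {
  const { success } = await ratelimit.limit(ip);
  return success
    ? null
    : new Response('Too many requests, please try again later', { status: 429 });
}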
There is also a noticeable message in the UI informing users of the ongoing work on the website:
Reporting
After a while Johan comes up with a new idea, a feedback loop for the recipes, to gather user insights and refine the AI outputs. It makes a lot of sense, since we need more field data in order to improve our instructions. Furthermore, it sets the stage for new features, such as a recipe library.
Fortunately, it’s also additive to our codebase.
The reports page will live in the new admin area of the app. The most important piece of information is the recipe output; it needs to be easily accessible along with the input data and report information. We can browse reports and mark them as read or completed.
The reporting backend relies on CRUD operations using Next.js actions. The only tricky task to solve is how to identify and store the reports.
Since we don’t really want to use another DB or overhaul existing code, the simple solution is to create a new KV record based on the recipe ID. It can naturally be scaled further to accommodate more than one report per recipe, yet for our practical debugging needs it’s more than sufficient.
import { kv } from '@vercel/kv';
export const makeKVReportKey = (id: string) => {
return `${id}-report`;
};
const getKVReportKeys = async () => {
return await kv.keys('*report');
};
// create report
export async function addReportAction(newReport: ReportData) {
const reportKey = makeKVReportKey(newReport.id);
try {
const reportData: ReportData = {
id: newReport.id,
comment: newReport.comment ?? '',
// other data
};
// the update action is nearly the same, so we skip it
await kv.hset(reportKey, reportData);
} catch (e) {
if (e instanceof Error) {
throw new Error('Could not send the report');
}
}
}
// get reports
export async function getReportsAction(): Promise<ReportData[]> {
let reports: ReportData[] = [];
const reportKeys = await getKVReportKeys();
for (const key of reportKeys) {
const report = (await kv.hgetall(key)) as ReportData | null;
if (report) reports.push(report);
}
return reports.sort((a, b) => (a.dateAdded > b.dateAdded ? -1 : 1));
}
// delete report
export async function deleteReportAction(id: string) {
const reportKey = makeKVReportKey(id);
await kv.del(reportKey);
}
You may have noticed certain simplifications in some operations.
The reason is that “create report” is the only user-facing action; the others are contained and limited to local usage. However, if you intend to adopt this approach, make sure to catch and log unexpected errors for the various app roles.
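One possible shape for that, as a sketch (the logging destination and naming are up to you):
// Illustrative wrapper for admin-only actions: show a friendly error, keep the real one in logs.
export async function withErrorLogging<T>(
  label: string,
  action: () => Promise<T>,
): Promise<T> {
  try {
    return await action();
  } catch (e) {
    console.error(`[reports] ${label} failed`, e);
    throw new Error(`Could not complete “${label}”`);
  }
}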
Unicornix 🦄
When Fuji X Studio had just started, the color theme was inspired by Fujifilm’s brand green with sporadic red accents. The theme itself looked fine, but the former approach could lead to confusing workarounds for accessible colors, especially once a dark theme was involved.
The design challenges with Fuji X Studio inspired the creation of 🦄 Unicornix, which later became its theming backbone.
This transition resulted in a modern, versatile, and fully accessible UI:
Other design tokens are created with the help of Design Tokens Generator as usual.
If you want to take a deeper look at the token organization, you’re welcome to explore the design system starter project, specifically the relevant package. It’s essentially the one used in Fuji X Studio for theming.
Also, big kudos to Chakra UI for creating a fantastic theming solution. Its system-prop approach is incredibly intuitive, and it’s a struggle to switch back after working with it for a while. Highly recommended.
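If you haven’t tried it, the style-prop approach looks roughly like this (a trivial illustration, not app code):
import { Box, Heading } from '@chakra-ui/react';

// Theme tokens and spacing are applied straight on the component, no separate CSS needed.
export const RecipeCard = () => (
  <Box bg="green.50" p={4} borderRadius="lg" boxShadow="sm">
    <Heading size="md">Classic Chrome, Grain Weak</Heading>
  </Box>
);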
If you’re wondering about the name difference: someone snatched the fujixrecipes.com domain just a couple of days before I was about to commit to a purchase. And you probably guessed what happened next — another pivot led to branding and content changes.
“Frankly, I like the new name better.”
“Johan, you are just happy that the project is complete.”
“No, see, «Studio» implies a professional photography environment”
“And «Recipes» are some sketchy things that you never follow?”
“Replacing ingredients is creative!”
“I wonder why it’s not yet in testimonials…”
Imprint
The Freemited
FUJI X Studio is entirely free — no ads, no signups, and no data sales. It’s designed for hobbyists and developers as an educational and creative tool. And it’s a fun project to work on.
At the same time, it relies on high-quality services like Vercel, Gemini and AWS S3, while respecting the usage limits imposed by those services. Luckily, given the predicted usage and rate limiting, that’s quite enough for hobby experiments.
Specs
FUJI X Studio is based on Next.js and powered by Vercel, featuring Chakra UI and Ark for the interface, Lucide icons, Google Fonts, Design Tokens Generator and Unicornix for theming, AWS S3 for image storage, Gemini AI for image processing, and Vercel KV for the database. While additional services contribute to the project, this lineup offers a clear overview of the core stack.
Next.js 15? Chakra 3?
In late October 2024, both Next.js and Chakra released major updates.
Coincidentally, this happened just a couple of days before the final polish and launch of the app, posing the precarious challenge of yet another tech update. After giving it a b/g/rief overview and estimating the migration effort, we decided to stay somewhat behind the bleeding edge for a while. It’s totally okay. Let it be.
Frame It
Fuji X Studio started as a spark of curiosity, evolved through a proof of concept, research and pivots, and finally emerged as a tool that bridges AI innovation with practical, experimental photography.
We certainly learned a lot through this journey, which resulted in this very article. Of course, some parts were skipped; otherwise the story would never end. The same goes for app development, and it applies to photo editing as well. You need to know when done is done. It’s an essential skill that gets honed with every new project.
If you got some inspiration, had some fun, or found something useful or new, please support the article by sharing it or leaving a 💜. Thank you for reading!
“Hey, Johan, do you know what we are going to do today?”
“Well, try to take over the world?”
“Wha…? Jeez. Let’s go take some photos first, shall we?”