Hey there!👋 After a bit of a hiatus, I'm back and ready to dive into some coding fun. In this article, we'll be building something exciting – an AI Image Captioning App using React Native and the Hugging Face Inference API.
Getting Started
Without any more delays, let the coding begin! We will be using Expo to bootstrap our React Native project. Expo is a set of tools and services for building React Native applications more easily and quickly.
Before we kick things off, make sure you have Node.js installed on your machine.
To initialize a new project, run the following command:
npx create-expo-app ai-image-captioner && cd ai-image-captioner
After navigating to the project directory, install the following dependencies:
npx expo install expo-camera expo-image-picker expo-font expo-splash-screen
npm install axios
- axios: a promise-based HTTP client for the browser and Node.js. We use it to send the image to the Inference API and read the response.
- expo-camera: an Expo package that provides components and APIs for integrating the device camera into React Native applications, simplifying photo and video capture.
- expo-image-picker: an Expo package for opening the device's image and video picker, so users can choose media from their gallery for use within the app.
- expo-font: an Expo module for loading custom fonts and using them to style text elements.
- expo-splash-screen: an Expo module for managing the splash screen (the initial screen displayed while the app is loading), including when it is hidden.
Hugging Face:
Hugging Face is a machine learning and data science platform and community that helps users build, deploy and train machine learning models.
Hugging Face offers a diverse array of tools, models, and datasets that empower developers and researchers in the field of machine learning. Their platform is home to an extensive library of pre-trained models, facilitating easy integration and experimentation for professionals working in artificial intelligence.
Let's head to the Hugging Face official website and create an account.
Once you have created your account, open your account settings, go to Access Tokens, and generate a new token.
Copy it and set it aside. We will use it in a few moments.
Pre-trained models:
A pre-trained model is a machine learning (ML) model that has been trained on a large dataset and can be fine-tuned for a specific task.
The Model Hub allows users to discover, share, and use pre-trained models for various tasks.
There are various approaches to integrate pre-trained models seamlessly. For a quick and straightforward implementation, we are going to use the Inference API. However, if your project demands a higher level of customization and control, you can use the Transformers library.
Inference API
When opting for the Inference API, you have two pathways. You can either add the @huggingface/inference package to your project for a streamlined experience, or call the API directly over HTTP by defining an endpoint variable, such as: ENDPOINT = https://api-inference.huggingface.co/models/<MODEL_ID>
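As a rough sketch of the direct-access route, the endpoint and the authorization header can be assembled with two small helpers. The helper names (buildEndpoint, buildAuthHeaders) and the token value are made up for illustration and are not part of any Hugging Face package; only the URL pattern comes from the text above.

```javascript
// Base URL of the hosted Inference API, as given in the article.
const INFERENCE_BASE_URL = "https://api-inference.huggingface.co/models";

// Hypothetical helper: builds the endpoint for a model ID such as
// "nlpconnect/vit-gpt2-image-captioning".
function buildEndpoint(modelId) {
  if (typeof modelId !== "string" || modelId.trim() === "") {
    throw new Error("modelId must be a non-empty string");
  }
  return `${INFERENCE_BASE_URL}/${modelId.trim()}`;
}

// Hypothetical helper: builds the bearer-token header sent with each request.
function buildAuthHeaders(token) {
  return { Authorization: `Bearer ${token}` };
}

console.log(buildEndpoint("nlpconnect/vit-gpt2-image-captioning"));
console.log(buildAuthHeaders("YOUR_ACCESS_TOKEN"));
```

Keeping the model ID separate from the base URL makes it easy to try a different captioning model later without touching the request code.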
Our app consists of a few components, but the heart of the operation lies in App.js. Copy the code below into the file:
import React, { useState } from "react";
import axios from "axios";
import ImageForm from "./components/ImageForm";

const App = () => {
  const [caption, setCaption] = useState("");

  const handleImageUrl = async (imageUrl) => {
    try {
      // Download the image (remote URL or local file URI) as binary data.
      const image = await (await fetch(imageUrl)).blob();
      // Placeholder: paste the access token you generated earlier.
      const HUGGING_FACE_API_KEY = "HUGGING_FACE_API_KEY";
      const response = await axios.post(
        "https://api-inference.huggingface.co/models/nlpconnect/vit-gpt2-image-captioning",
        image,
        {
          headers: {
            // The body is raw image bytes, not JSON.
            "Content-Type": image.type || "application/octet-stream",
            Authorization: `Bearer ${HUGGING_FACE_API_KEY}`,
          },
          // Send the blob as-is instead of letting axios serialize it.
          transformRequest: [(data) => data],
        }
      );
      setCaption(response.data[0].generated_text);
    } catch (error) {
      console.error(error);
    }
  };

  return <ImageForm onSubmit={handleImageUrl} caption={caption} />;
};

export default App;
Now let's break down some key parts:
Here, we import React and the useState hook for state management, axios for making HTTP requests, and ImageForm, a component we'll create shortly to handle image input. We then declare the App component and use useState to set up a caption state variable and its setter, setCaption.
handleImageUrl is where the real action happens. It takes an imageUrl as an argument, fetches the image, and makes a POST request to the Inference API. We use the nlpconnect/vit-gpt2-image-captioning model for image captioning, and the result is stored in the caption state.
Note: transformRequest: [(data) => data] in the axios request disables automatic data serialization, so the image data is sent in its raw form.
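To make the idea behind that note more concrete, here is a simplified model of a request-transform pipeline. This is not axios's actual implementation; applyTransforms and serializeToJson are hypothetical stand-ins that only illustrate why an identity function leaves binary data untouched.

```javascript
// Simplified model of a request-transform pipeline. This is NOT how axios
// is implemented internally; it only illustrates the concept.
function applyTransforms(data, transforms) {
  return transforms.reduce((current, transform) => transform(current), data);
}

// Stand-in for a "default" transform that serializes plain objects to JSON.
const serializeToJson = (data) =>
  data !== null && typeof data === "object" && !(data instanceof Uint8Array)
    ? JSON.stringify(data)
    : data;

// The identity transform, same shape as the one passed in App.js.
const passThrough = (data) => data;

const imageBytes = new Uint8Array([0xff, 0xd8, 0xff]); // first bytes of a JPEG

console.log(applyTransforms({ a: 1 }, [serializeToJson])); // {"a":1}
console.log(applyTransforms(imageBytes, [passThrough]) === imageBytes); // true
```

With only the identity transform in the list, whatever you hand to the request is exactly what gets sent.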
Finally, in the return statement, we render the ImageForm component, passing down the handleImageUrl function and the caption state as props.
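The line setCaption(response.data[0].generated_text) assumes the API always answers with a non-empty array. Below is a slightly more defensive sketch. extractCaption is a hypothetical helper, and the error-object shape ({ error: "..." }) is an assumption about how failures may be reported, so check it against the responses you actually receive.

```javascript
// Hypothetical helper: pulls the caption out of an Inference API response body.
// Assumes success looks like [{ generated_text: "..." }] (as used in App.js)
// and that failures may arrive as an object with an "error" string.
function extractCaption(data) {
  if (Array.isArray(data) && data.length > 0) {
    const text = data[0] && data[0].generated_text;
    if (typeof text === "string") {
      return text.trim();
    }
  }
  if (data && typeof data.error === "string") {
    throw new Error(`Inference API error: ${data.error}`);
  }
  throw new Error("Unexpected response shape from the Inference API");
}

console.log(extractCaption([{ generated_text: "a cat sitting on a couch " }]));
// prints "a cat sitting on a couch"
```

Inside handleImageUrl, you could then call setCaption(extractCaption(response.data)) and let the existing catch block deal with any thrown error.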
Create a new components/ImageForm.js file (the path App.js imports from) with the following content:
import React, { useState, useEffect, useCallback } from "react";
import {
View,
TextInput,
TouchableOpacity,
Image,
Text,
ActivityIndicator,
StyleSheet,
Alert,
} from "react-native";
import { Camera, CameraType } from "expo-camera";
import { useFonts } from "expo-font";
import * as ImagePicker from "expo-image-picker";
import * as SplashScreen from "expo-splash-screen";
import CameraScreen from "./CameraScreen";
import { Preview } from "../assets";
SplashScreen.preventAutoHideAsync();
const ImageForm = ({ onSubmit, caption }) => {
const [imageUrl, setImageUrl] = useState(null);
const [selectedImage, setSelectedImage] = useState(null);
const [loading, setLoading] = useState(false);
const [toggleCamera, setToggleCamera] = useState(false);
const [capturedImage, setCapturedImage] = useState(null);
const [permission, requestPermission] = Camera.useCameraPermissions();
const [type, setType] = useState(CameraType.back);
const [fontsLoaded, fontError] = useFonts({
"BebasNeue-Regular": require("../assets/fonts/BebasNeue-Regular.ttf"),
});
useEffect(() => {
// Request permission to access the photo library
(async () => {
const { status } =
await ImagePicker.requestMediaLibraryPermissionsAsync();
if (status !== "granted") {
Alert.alert(
"Permission denied",
"You need to grant permission to access the photo library."
);
}
})();
}, []);
const pickImage = async () => {
try {
const result = await ImagePicker.launchImageLibraryAsync({
mediaTypes: ImagePicker.MediaTypeOptions.Images,
allowsEditing: true,
aspect: [4, 3],
quality: 1,
});
// Picker results that include `assets` expose `canceled`; `cancelled` is the deprecated spelling.
if (!result.canceled) {
setSelectedImage(result.assets[0].uri);
setImageUrl(result.assets[0].uri);
}
} catch (error) {
console.error(error);
}
};
const handlePress = async () => {
setLoading(true);
await onSubmit(imageUrl);
setLoading(false);
};
const submitImage = () => {
setToggleCamera(!toggleCamera);
};
const cancelCamera = () => {
setToggleCamera(!toggleCamera);
setImageUrl("");
setCapturedImage(null);
setSelectedImage(null);
};
const onLayoutRootView = useCallback(async () => {
if (fontsLoaded || fontError) {
await SplashScreen.hideAsync();
}
}, [fontsLoaded, fontError]);
if (!fontsLoaded && !fontError) {
return null;
}
return (
<View style={styles.wrapper} onLayout={onLayoutRootView}>
{toggleCamera ? (
<CameraScreen
cancelCamera={cancelCamera}
setImageUrl={setImageUrl}
submitImage={submitImage}
capturedImage={capturedImage}
setCapturedImage={setCapturedImage}
requestPermission={requestPermission}
permission={permission}
type={type}
setSelectedImage={setSelectedImage}
/>
) : (
<View style={styles.container}>
<View>
<Text style={[styles.textStyle, styles.header]}>
Image Captioner! 🤗
</Text>
</View>
<View style={styles.imagePreviewContainer}>
<Image
source={ selectedImage ? { uri: selectedImage } : Preview }
style={[styles.imagePreview, !selectedImage && { width: 100 }]}
resizeMode="contain"
/>
</View>
<View>
<Text style={[styles.textStyle, { fontWeight: "bold" }]}>
{caption ? `🪄 ${caption} 🪄` : null}
</Text>
</View>
<View style={styles.inputContainer}>
<TextInput
style={styles.inputBox}
autoCapitalize="none"
placeholder="Enter Image Link"
value={imageUrl}
onChangeText={(url) => setImageUrl(url)}
/>
</View>
<View>
<Text
style={{
fontSize: 20,
fontWeight: "bold",
textAlign: "center",
color: "#8C94A5",
marginVertical: 10,
}}
>
OR
</Text>
</View>
<View style={styles.buttonArea}>
<TouchableOpacity
style={[
styles.button,
{ backgroundColor: "#0166FF" },
]}
onPress={submitImage}
>
<Text style={styles.ButtonText}>Take Photo</Text>
</TouchableOpacity>
</View>
<View style={styles.buttonArea}>
<TouchableOpacity
style={[styles.button, { backgroundColor: "#0166FF" }]}
onPress={pickImage}
>
<Text style={styles.ButtonText}>Browse Images</Text>
</TouchableOpacity>
</View>
<View style={styles.lineStyle} />
<View style={[styles.buttonArea, styles.submitBtnArea]}>
<TouchableOpacity
style={[styles.button, { backgroundColor: "#212429" }]}
onPress={handlePress}
>
<Text style={styles.ButtonText}>Process</Text>
</TouchableOpacity>
</View>
{loading && (
<ActivityIndicator
size="large"
color="#0000FF"
style={styles.loading}
/>
)}
</View>
)}
</View>
);
};
const styles = StyleSheet.create({
wrapper: {
height: "100%",
paddingHorizontal: 30,
paddingVertical: 80,
},
container: {
height: "100%",
},
header: {
fontWeight: "700",
fontSize: 35,
fontFamily: "BebasNeue-Regular",
color: "#29323B",
},
textStyle: {
fontSize: 14,
marginTop: 8,
marginBottom: 5,
textAlign: "center",
color: "#29323B",
},
inputContainer: {
marginTop: 20,
},
inputBox: {
borderColor: "#E1E4EB",
height: 55,
width: "100%",
borderRadius: 10,
borderWidth: 2,
padding: 10,
textAlign: "left",
},
button: {
backgroundColor: "blue",
height: 40,
width: "100%",
borderRadius: 5,
padding: 10,
},
buttonArea: {
display: "flex",
alignItems: "center",
justifyContent: "center",
marginBottom: 5,
},
submitBtnArea: {
marginVertical: 10,
},
ButtonText: {
color: "white",
textAlign: "center",
},
loading: {
marginTop: 8,
},
imagePreviewContainer: {
alignItems: "center",
marginBottom: 16,
marginTop: 16,
width: "100%",
height: 200,
borderWidth: 2,
borderStyle: "dashed",
borderColor: "#7BA7FF",
borderRadius: 5,
},
imagePreview: {
borderRadius: 5,
width: "100%",
height: "100%",
},
lineStyle: {
borderWidth: 0.5,
borderColor: "#D3D8E3",
marginBottom: 15,
marginTop: 10,
},
});
export default ImageForm;
We first import the modules and components that ImageForm needs.
Then we call SplashScreen.preventAutoHideAsync to keep the splash screen visible until the fonts are loaded.
After that, we initialize the state variables with useState and set up a few hooks:
- toggleCamera tracks whether the camera screen is active.
- permission and requestPermission, returned by Camera.useCameraPermissions, manage camera permissions.
- type holds the camera type (front or back).
- useFonts loads the custom "BebasNeue-Regular" font and reports any loading failure through fontError.
The useEffect hook requests permission to access the photo library when the component mounts and shows an alert if permission is denied.
The onLayoutRootView function, wrapped in useCallback, hides the splash screen once the fonts have loaded or failed to load. Until then, the component returns null so that nothing is rendered.
pickImage uses ImagePicker from Expo to launch the device's image library. It configures the picker with options such as the allowed media types, editing, aspect ratio, and quality. If the user selects an image instead of canceling, it stores the image's URI in both selectedImage and imageUrl.
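The check inside pickImage can be pulled into a small pure function, which makes the "picked or canceled" logic easy to reason about. This is only a sketch: getPickedUri is a hypothetical helper, and it assumes the result object has a canceled flag and an assets array whose items carry a uri, which is the shape the component above relies on.

```javascript
// Hypothetical helper: returns the URI of the first picked asset,
// or null when the user canceled or nothing usable came back.
// Assumes a result shaped like { canceled: boolean, assets: [{ uri }] }.
function getPickedUri(result) {
  if (!result || result.canceled) {
    return null;
  }
  const assets = Array.isArray(result.assets) ? result.assets : [];
  const first = assets[0];
  return first && typeof first.uri === "string" ? first.uri : null;
}

console.log(getPickedUri({ canceled: false, assets: [{ uri: "file:///photo.jpg" }] }));
// prints "file:///photo.jpg"
console.log(getPickedUri({ canceled: true, assets: null }));
// prints null
```

In pickImage, you would then only update the state when the returned value is not null.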
handlePress is triggered when the user presses the Process button to submit the image for processing. It sets the loading state to true to show an activity indicator, awaits the onSubmit function (provided as a prop) with imageUrl as its argument, which triggers the image processing logic, and then sets loading back to false.
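One variation worth considering: if onSubmit ever throws, the loading flag in handlePress would stay true. A try/finally wrapper avoids that. The sketch below is not the article's code; runWithLoading is a hypothetical helper, and a plain array stands in for React state so it can run outside the app.

```javascript
// Hypothetical helper: toggles a loading flag around an async task and
// always resets it, even when the task rejects.
async function runWithLoading(setLoading, task) {
  setLoading(true);
  try {
    return await task();
  } finally {
    setLoading(false);
  }
}

// Usage with a plain array standing in for React state updates.
const loadingStates = [];
runWithLoading(
  (value) => loadingStates.push(value),
  async () => {
    throw new Error("network down");
  }
).catch((error) => {
  // The flag was switched on and then off again despite the failure.
  console.log(error.message, loadingStates);
});
```

In the component, this would look like runWithLoading(setLoading, () => onSubmit(imageUrl)).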
submitImage toggles the camera screen.
cancelCamera runs when the user cancels or exits the camera screen: it toggles the camera screen off and clears imageUrl, capturedImage, and selectedImage.
The UI is conditionally rendered based on the toggleCamera state. If toggleCamera is true, it shows the CameraScreen, passing various props for handling the camera functionality; otherwise, it displays the main ImageForm UI.
Finally, create a new components/CameraScreen.js file (next to ImageForm.js) and copy the following code into it:
import React from "react";
import {
View,
Text,
TouchableOpacity,
Image,
StyleSheet,
Button,
} from "react-native";
import { Camera } from "expo-camera";
import { Capture, Submit, Back, Reset } from "../assets";
const CameraScreen = ({
cancelCamera,
submitImage,
setImageUrl,
capturedImage,
setCapturedImage,
setSelectedImage,
permission,
requestPermission,
type,
}) => {
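// Holds the Camera instance; assigned by the ref callback on <Camera> below.
let cameraRef = null;
const takePicture = async () => {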
if (cameraRef) {
const photo = await cameraRef.takePictureAsync();
setCapturedImage(photo);
setImageUrl(photo.uri);
setSelectedImage(photo.uri);
}
};
const resetImage = () => {
setCapturedImage(null);
setImageUrl(null);
setSelectedImage(null);
};
if (!permission) {
// Camera permissions are still loading
return <View />;
}
if (!permission.granted) {
// Camera permissions are not granted yet
return (
<View style={styles.container}>
<Text style={{ textAlign: "center" }}>
We need your permission to show the camera
</Text>
<Button onPress={requestPermission} title="grant permission" />
</View>
);
}
return (
<View style={styles.container} testID="camera-screen">
<Camera
style={styles.camera}
type={type}
ref={(ref) => (cameraRef = ref)}
>
<View style={styles.buttonContainer}>
<TouchableOpacity
style={styles.button}
onPress={takePicture}
testID="capture-button"
>
<Image
source={Capture}
style={styles.imageIcon}
resizeMode="contain"
/>
</TouchableOpacity>
<TouchableOpacity style={styles.button} onPress={cancelCamera}>
<Image
source={Back}
style={styles.imageIcon}
resizeMode="contain"
/>
</TouchableOpacity>
{capturedImage && (
<TouchableOpacity style={styles.button} onPress={resetImage}>
<Image
source={Reset}
style={styles.imageIcon}
resizeMode="contain"
/>
</TouchableOpacity>
)}
{capturedImage && (
<TouchableOpacity style={styles.button} onPress={submitImage}>
<Image
source={Submit}
style={styles.imageIcon}
resizeMode="contain"
/>
</TouchableOpacity>
)}
</View>
</Camera>
{capturedImage && (
<View style={styles.imagePreviewContainer}>
<Text style={{ textAlign: "center", marginBottom: 15 }}>
Captured Image Preview
</Text>
<Image
source={{ uri: capturedImage.uri }}
style={styles.imagePreview}
/>
</View>
)}
</View>
);
};
const styles = StyleSheet.create({
container: {
flex: 1,
},
camera: {
flex: 1,
},
buttonContainer: {
flex: 1,
flexDirection: "row",
backgroundColor: "transparent",
margin: 64,
},
button: {
flex: 1,
alignSelf: "flex-end",
alignItems: "center",
},
cameraContainer: {
flex: 1,
flexDirection: "column",
justifyContent: "space-between",
margin: 20,
width: "100%",
},
buttonText: {
fontSize: 18,
color: "white",
},
imagePreviewContainer: {
flex: 1,
justifyContent: "center",
alignItems: "center",
},
imagePreview: {
width: "80%",
height: "80%",
resizeMode: "contain",
},
imageIcon: {
width: 30,
height: 30,
},
});
export default CameraScreen;
takePicture captures a photo through cameraRef and stores the result via setCapturedImage, setImageUrl, and setSelectedImage.
resetImage clears the captured image and the associated state variables.
Before rendering the camera, the component checks the permission state: while permissions are still loading it renders an empty view, and if they have not been granted it shows a message with a button to grant permission.
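The permission checks boil down to a three-way decision, sketched below as a pure function. cameraViewState and its return labels are hypothetical names used only for illustration; the permission object is assumed to be either null (still loading) or an object with a granted boolean, as the component above uses it.

```javascript
// Hypothetical helper mirroring the early returns in CameraScreen.
// Assumes permission is null while loading, otherwise { granted: boolean }.
function cameraViewState(permission) {
  if (!permission) {
    return "loading"; // permissions not resolved yet: render an empty view
  }
  if (!permission.granted) {
    return "needs-permission"; // show the message and the grant button
  }
  return "camera"; // safe to render the camera preview
}

console.log(cameraViewState(null)); // loading
console.log(cameraViewState({ granted: false })); // needs-permission
console.log(cameraViewState({ granted: true })); // camera
```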
Congratulations on making it this far! Now, let's perform some unit testing to ensure our code behaves as expected.
Unit Testing:
Before we dive into the testing arena, let's make sure our environment is prepared. Ensure you have the necessary packages in your toolkit:
npx expo install jest-expo jest
npm install -D @testing-library/react-native
- jest: a JavaScript testing framework widely used for testing JavaScript code, including React and React Native applications. It provides a test runner, an assertion library, and mocking capabilities.
- jest-expo: a Jest preset designed for Expo projects. It configures Jest with settings suited to Expo and React Native development, making it easier to write and run tests.
- @testing-library/react-native: part of the Testing Library family. It provides utilities for testing React Native components in a way that simulates user interactions.
Include the Jest configuration in your package.json file like this:
...
"scripts": {
  ...
  "test": "jest"
},
"jest": {
  "preset": "jest-expo",
  "transformIgnorePatterns": [
    "node_modules/(?!((jest-)?react-native|@react-native(-community)?)|expo(nent)?|@expo(nent)?/.*|@expo-google-fonts/.*|react-navigation|@react-navigation/.*|@unimodules/.*|unimodules|sentry-expo|native-base|react-native-svg)"
  ]
},
...
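By default, Jest does not transform files inside node_modules. The transformIgnorePatterns entry narrows that rule: packages listed inside the (?!...) group, such as React Native, Expo, and related dependencies, are still transformed by Babel, which prevents syntax errors from untranspiled code in those packages.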
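If the regular expression looks cryptic, you can poke at it in plain Node. Jest skips transformation for any file path that matches one of the transformIgnorePatterns, so the sketch below treats "does not match" as "gets transformed". The pattern is shortened to a few packages for readability, and isTransformed is a hypothetical helper, not a Jest API.

```javascript
// Shortened version of the pattern from package.json (fewer packages listed).
const ignorePattern = new RegExp(
  "node_modules/(?!((jest-)?react-native|@react-native(-community)?)|expo(nent)?|@expo(nent)?/.*)"
);

// Hypothetical helper: a path that matches the ignore pattern is skipped,
// so a path that does NOT match is transformed.
function isTransformed(filePath) {
  return !ignorePattern.test(filePath);
}

console.log(isTransformed("node_modules/expo-camera/build/Camera.js")); // true
console.log(isTransformed("node_modules/react-native/index.js")); // true
console.log(isTransformed("node_modules/lodash/lodash.js")); // false
```

The negative lookahead is what flips the meaning: everything in node_modules is ignored except the packages named inside it.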
Writing Our Script
Create an App.test.js file next to App.js with the following content:
import React from "react";
import axios from "axios";
import { render, waitFor, fireEvent, act } from "@testing-library/react-native";
import { Camera } from "expo-camera";
import ImageForm from "./components/ImageForm";
import App from "./App";
jest.mock("axios");
jest.mock("expo-image-picker", () => ({
...jest.requireActual("expo-image-picker"),
requestMediaLibraryPermissionsAsync: jest.fn(),
}));
describe("App", () => {
describe("ImageForm Component", () => {
it("renders correctly", async () => {
require("expo-image-picker").requestMediaLibraryPermissionsAsync.mockResolvedValue(
{
status: "granted",
}
);
const { getByText, getByPlaceholderText } = render(<ImageForm />);
await waitFor(() => {
expect(getByText("Image Captioner! 🤗")).toBeTruthy();
});
await waitFor(() => {
expect(getByPlaceholderText("Enter Image Link")).toBeTruthy();
});
});
it("handles submit button press correctly", async () => {
const mockOnSubmit = jest.fn();
const { getByPlaceholderText, getByText, getByTestId } = render(
<ImageForm onSubmit={mockOnSubmit} />
);
const input = getByPlaceholderText("Enter Image Link");
const submitButton = getByText("Process");
fireEvent.changeText(input, "https://example.com/image.jpg");
fireEvent.press(submitButton);
await waitFor(() => {
expect(mockOnSubmit).toHaveBeenCalledWith(
"https://example.com/image.jpg"
);
});
});
it("handles image URL submission and displays the generated caption", async () => {
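const mockCaption = "Mock Caption";
// Stub fetch so the test never touches the network; App only calls blob() on the response.
global.fetch = jest.fn().mockResolvedValue({
blob: async () => ({ type: "image/jpeg" }),
});
axios.post.mockResolvedValue({ data: [{ generated_text: mockCaption }] });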
const { getByPlaceholderText, getByText } = render(<App />);
const input = getByPlaceholderText("Enter Image Link");
await act(() => {
fireEvent.changeText(input, "https://example.com/image.jpg");
});
await act(() => {
const submitButton = getByText("Process");
fireEvent.press(submitButton);
});
await waitFor(() =>
expect(getByText(`🪄 ${mockCaption} 🪄`)).toBeTruthy()
);
});
});
describe("CameraScreen Component", () => {
it("shows camera if permissions are granted", async () => {
jest
.spyOn(Camera, "useCameraPermissions")
.mockReturnValue([{ granted: true }, () => Promise.resolve({})]);
const { getByTestId, getByText } = render(<App />);
await act(() => {
fireEvent.press(getByText("Take Photo"));
});
const camera = getByTestId("camera-screen");
expect(camera).toBeTruthy();
});
it("takes a picture and displays preview", async () => {
jest
.spyOn(Camera, "useCameraPermissions")
.mockReturnValue([{ granted: true }, () => Promise.resolve({})]);
jest
.spyOn(Camera.prototype, "takePictureAsync")
.mockImplementation(() => {
return Promise.resolve({
uri: "file://some-file.jpg",
});
});
const { getByTestId, getByText } = render(<App />);
fireEvent.press(getByText("Take Photo"));
const captureButton = getByTestId("capture-button");
fireEvent.press(captureButton);
await waitFor(() => {
expect(getByText("Captured Image Preview")).toBeTruthy();
});
});
});
});
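At the top of the file, we mock axios and the image picker's requestMediaLibraryPermissionsAsync so that the tests neither call the real Inference API nor trigger a real permission request.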
"renders correctly":
The test sets up a mock for requestMediaLibraryPermissionsAsync to simulate the permission being granted. It then renders the ImageForm component and asserts that it shows the expected text "Image Captioner! 🤗" and an input with the "Enter Image Link" placeholder.
"handles submit button press correctly":
It creates a mock function (mockOnSubmit) to stand in for onSubmit, changes the text in the input field to a sample image link, and presses the Process button. It then asserts that onSubmit was called with the expected image URL.
"handles image URL submission and displays the generated caption":
It mocks the Axios POST request to simulate generating a caption for an image URL, changes the text in the input field to a sample image link, and presses the Process button. It then asserts that the generated caption is displayed in the expected format.
"shows camera if permissions are granted":
It spies on useCameraPermissions to simulate that camera permissions are granted, renders the App component, and presses the Take Photo button. Finally, it asserts that the camera screen is displayed.
"takes a picture and displays preview":
Again, it spies on useCameraPermissions to simulate that camera permissions are granted, and on takePictureAsync to mock taking a picture. It presses the Take Photo button and then the capture button, and asserts that the captured image preview is displayed.
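If mock functions like mockOnSubmit feel a bit magical, the toy version below shows the core idea: record every call, then check the record. This is a deliberately tiny sketch and not how Jest is implemented. createMockFn and wasCalledWith are hypothetical names, and the comparison uses Object.is on each argument, whereas Jest's toHaveBeenCalledWith compares arguments recursively.

```javascript
// Toy mock function: remembers the arguments of every call.
function createMockFn() {
  const mock = (...args) => {
    mock.calls.push(args);
  };
  mock.calls = [];
  return mock;
}

// Toy matcher: true if any recorded call has exactly these arguments.
// Uses Object.is per argument, so it is only suitable for primitives here.
function wasCalledWith(mock, ...expected) {
  return mock.calls.some(
    (call) =>
      call.length === expected.length &&
      call.every((arg, index) => Object.is(arg, expected[index]))
  );
}

const mockOnSubmit = createMockFn();
mockOnSubmit("https://example.com/image.jpg");

console.log(wasCalledWith(mockOnSubmit, "https://example.com/image.jpg")); // true
console.log(wasCalledWith(mockOnSubmit, "https://example.com/other.jpg")); // false
```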
These were just a few testing scenarios to get you started, but the possibilities are endless. Feel free to explore and add more scenarios that cover different aspects of your application. Think about a specific scenario or functionality you want to test. It could be related to user interactions, edge cases, or error handling.
Full code is available for reference on GitHub here
Conclusion:
And there you have it: your very own Image Captioner is ready! But wait, what if we sprinkle in a bit more magic? How about adding a language translation feature for those captions? There's always room for more creativity. You could explore more advanced models, tweak parameters, or even consider deploying your app to the cloud for broader accessibility.
Catch you in the next one. Take care! 🚀✨