How to extract plain text from a PDF in React

#tutorial #react #pdftotext #pdf

Learn how to efficiently extract plain text from PDF documents in your React applications using the "react-pdftotext" npm package. This guide will walk you through the steps, helping you enhance your app's text processing abilities.

Setting Up The Application

Create a new vite-react application

npm create vite@latest pdf-to-text -- --template react
cd pdf-to-text

Install the "react-pdftotext" npm package

npm i react-pdftotext

Now there are two ways in which we can utilize the pdfToText function provided by the "react-pdftotext" package:

Taking a PDF file as input from the local machine.
Fetching the PDF file data from a remote URL.

Let's see each way one by one.

Local File Input

In the App.jsx file, add an input tag of type="file" with an onChange handler function.

<input type="file" accept="application/pdf" onChange={extractText} />

Now import the pdfToText function from the package

import pdfToText from 'react-pdftotext'

Inside the App component, define a state variable to store the extracted text, then define the extractText function that the onChange handler calls.

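// at the top of App.jsx, next to the pdfToText import
import { useState } from 'react'

// inside the App component
const [text, setText] = useState("")

function extractText(event) {
    // the first file chosen in the file input
    const file = event.target.files[0]
    pdfToText(file)
        .then(text => setText(text))
        .catch(error => console.error("Failed to extract text from pdf", error))
}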
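
The handler above assumes that a file was actually selected. As a minimal sketch of a guard that could run before calling pdfToText, the helper below returns the first selected file only if it looks like a PDF. The name pickPdfFile is hypothetical and is not part of the "react-pdftotext" package; the extension fallback is an assumption for the case where the browser reports an empty file type.

```javascript
// Hypothetical helper (not part of react-pdftotext): returns the first
// selected file if it looks like a PDF, otherwise null.
function pickPdfFile(files) {
  // files is the FileList from event.target.files; it is empty when the
  // user closes the file dialog without choosing anything.
  const file = files && files.length > 0 ? files[0] : null;
  if (!file) return null;

  // Accept the standard PDF MIME type. If the type is an empty string
  // (the browser could not determine it), fall back to the file extension.
  const looksLikePdf =
    file.type === "application/pdf" ||
    (file.type === "" && /\.pdf$/i.test(file.name));

  return looksLikePdf ? file : null;
}
```

In extractText, this could replace the direct event.target.files[0] lookup: call pickPdfFile(event.target.files) and return early when the result is null, so pdfToText is never called without a file.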

Remote PDF File Input

To extract text from a remote PDF file, first fetch the file from the remote location and read its contents into a JavaScript Blob object. Then pass this Blob as the input to the pdfToText function.

In this case, the extractText function definition will look like this:

const pdf_url = "REMOTE_PDF_URL"

// async is required because the function uses await
async function extractText() {
    try {
        // fetch the remote file and read the response body as a Blob
        const file = await fetch(pdf_url).then(res => res.blob())
        const text = await pdfToText(file)
        setText(text)
    } catch (error) {
        console.error("Failed to extract text from pdf", error)
    }
}
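
One detail worth handling in the remote case: fetch only rejects on network failures, so a 404 or 500 response would still be read into a Blob and handed to pdfToText. The sketch below shows one way to check the response status first. The name fetchPdfBlob and the fetchImpl parameter are hypothetical; fetchImpl exists only so the function can be tried out without a network.

```javascript
// Hypothetical helper: downloads a remote PDF and resolves to a Blob.
// fetchImpl defaults to the global fetch and can be swapped for a stub.
async function fetchPdfBlob(url, fetchImpl = fetch) {
  const res = await fetchImpl(url);

  // HTTP error statuses do not reject the fetch promise, so check res.ok.
  if (!res.ok) {
    throw new Error(`Failed to fetch PDF: HTTP ${res.status}`);
  }

  // Read the response body as a Blob, the input type pdfToText is given
  // in this guide.
  return res.blob();
}
```

With this helper, extractText could await fetchPdfBlob(pdf_url) inside its try block, and any HTTP error would end up in the same catch as an extraction failure.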

Thank you for reading! Be sure to send your questions and suggestions here.
