Utkarsh212

How to extract plain text from a PDF in React

Learn how to extract plain text from PDF documents in your React applications using the "react-pdftotext" npm package. This guide walks through two cases: a PDF file selected from the local machine and a PDF file fetched from a remote URL.

Setting Up The Application

Create a new React application with Vite:

npm create vite@latest pdf-to-text -- --template react
cd pdf-to-text

Install the "react-pdftotext" npm package

npm i react-pdftotext

There are two ways to use the pdfToText function provided by the "react-pdftotext" package:

  • Taking a PDF file as input from the local machine.
  • Fetching PDF file data from a remote URL.

Let's look at each one in turn.

Local File Input

In the App.jsx file, add an input element of type="file" with an onChange handler:

<input type="file" accept="application/pdf" onChange={extractText} />

Now import the pdfToText function from the package:

import pdfToText from 'react-pdftotext'

Define the extractText handler inside the App component, along with a state variable that stores the extracted text. The useState hook must be imported from react.

import { useState } from 'react'

// Inside the App component:
const [text, setText] = useState("")

function extractText(event) {
    // The first (and only) file selected in the file input
    const file = event.target.files[0]
    pdfToText(file)
        .then(text => setText(text))
        .catch(error => console.error("Failed to extract text from pdf", error))
}
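
The accept attribute only filters what the file picker shows; it does not guarantee that the selected file is a PDF, and event.target.files is empty if the user cancels the dialog. Here is a minimal sketch of a guard that could run before calling pdfToText. The helper name isPdfFile is hypothetical and is not part of the "react-pdftotext" package.

```javascript
// Hypothetical helper (not part of react-pdftotext): returns true when the
// selected file looks like a PDF, judging by its MIME type or file extension.
function isPdfFile(file) {
    if (!file) return false // e.g. the user cancelled the file dialog
    const hasPdfType = file.type === "application/pdf"
    const hasPdfName =
        typeof file.name === "string" && file.name.toLowerCase().endsWith(".pdf")
    return hasPdfType || hasPdfName
}
```

Inside extractText, one option is to return early with if (!isPdfFile(file)) return before calling pdfToText. This checks only the metadata reported by the browser, not the file contents.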

Remote PDF File Input

To extract text from a remote PDF file, first fetch the file from its URL and read the response body into a JavaScript Blob object. Then pass this Blob as the input to the pdfToText function.

In this case, the extractText function definition looks like this:

const pdf_url = "REMOTE_PDF_URL"

// Must be declared async because it uses await
async function extractText() {
    try {
        // Download the file and read the response body as a Blob
        const res = await fetch(pdf_url)
        const file = await res.blob()
        const text = await pdfToText(file)
        setText(text)
    } catch (error) {
        console.error("Failed to extract text from pdf", error)
    }
}
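
Note that fetch rejects only on network failures; an HTTP error such as a 404 still resolves, so the resulting Blob could hold an error page instead of a PDF. Here is a minimal sketch of a wrapper that checks the response status first. The name fetchPdfBlob and the injectable fetchImpl parameter are assumptions made for illustration, not part of the "react-pdftotext" package.

```javascript
// Hypothetical helper: downloads a remote file and returns it as a Blob.
// fetchImpl defaults to the global fetch and can be swapped out in tests.
async function fetchPdfBlob(url, fetchImpl = fetch) {
    const res = await fetchImpl(url)
    // fetch resolves even for 4xx/5xx responses, so check the status explicitly
    if (!res.ok) {
        throw new Error(`Request for ${url} failed with status ${res.status}`)
    }
    return res.blob()
}
```

The returned Blob can then be passed to pdfToText in the same way as in the function above. When this runs in the browser, the remote server also has to allow cross-origin requests, otherwise the fetch call itself will fail.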

Thank you for reading! Be sure to send your questions and suggestions here.