Learn how to extract plain text from PDF documents in your React applications using the "react-pdftotext" npm package. This guide walks through the setup and covers two sources of input: a PDF chosen from the local machine and a PDF fetched from a remote URL.
Setting Up The Application
Create a new vite-react application
npm create vite@latest pdf-to-text -- --template react
cd pdf-to-text
Install the "react-pdftotext" npm package
npm i react-pdftotext
There are two ways to use the pdfToText function provided by the "react-pdftotext" package:
- Taking a PDF file as input from the local machine.
- Fetching PDF file data from a remote URL.
Let's look at each in turn.
Local File Input
In the App.jsx file, add an input element of type="file" with an onChange handler.
<input type="file" accept="application/pdf" onChange={extractText} />
Now import the pdfToText function from the package
import pdfToText from 'react-pdftotext'
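Inside the App component, define a state variable to store the extracted text, then define the extractText function used as the onChange handler. useState must be imported from 'react' if App.jsx does not already import it.
const [text, setText] = useState("")
function extractText(event) {
  // The first file the user selected in the file input
  const file = event.target.files[0]
  pdfToText(file)
    .then(text => setText(text))
    // Log the underlying error as well, so the cause is visible
    .catch(error => console.error("Failed to extract text from pdf", error))
}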
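The accept attribute is only a hint to the file dialog and does not validate what the user picks, and event.target.files may be empty (for example, in some browsers when the user cancels the dialog). It can therefore help to check the selection before calling pdfToText. A minimal sketch, assuming a hypothetical helper named pickPdfFile that is not part of react-pdftotext:

```javascript
// Hypothetical helper, not part of react-pdftotext.
// Returns the first selected file if it reports a PDF MIME type, otherwise null.
function pickPdfFile(files) {
  // files is a FileList (array-like); it may be empty or missing
  if (!files || files.length === 0) {
    return null
  }
  const file = files[0]
  // File.type holds the MIME type the browser reports for the file
  return file.type === "application/pdf" ? file : null
}
```

Inside extractText, pickPdfFile(event.target.files) would take the place of event.target.files[0], with an early return when the result is null.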
Remote PDF File Input
To extract text from a PDF file hosted remotely, first fetch the file from the remote location and read its contents into a JavaScript Blob object. Then pass this Blob as the input to the pdfToText function.
In this case, the extractText function definition will look like this:
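const pdf_url = "REMOTE_PDF_URL"
// async is required because the function body uses await
async function extractText() {
  try {
    // Fetch the remote PDF and read the response body as a Blob
    const res = await fetch(pdf_url)
    const file = await res.blob()
    const text = await pdfToText(file)
    setText(text)
  } catch (error) {
    // Reached when either the fetch or the text extraction fails
    console.error("Failed to extract text from remote pdf", error)
  }
}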
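One detail worth guarding against: fetch rejects only on network failures, so an HTTP error such as a 404 still resolves, and its body would be handed to pdfToText as if it were a PDF. A minimal sketch of a check on the response status, assuming a hypothetical helper named fetchPdfBlob that is not part of react-pdftotext; the fetchImpl parameter is an assumption added so the helper can be tried out without a network:

```javascript
// Hypothetical helper, not part of react-pdftotext.
// Resolves with the response body as a Blob, or rejects on an HTTP error status.
// fetchImpl is an assumed parameter that defaults to the global fetch.
async function fetchPdfBlob(url, fetchImpl = fetch) {
  const res = await fetchImpl(url)
  // res.ok is true only for status codes in the 200-299 range
  if (!res.ok) {
    throw new Error(`Failed to fetch PDF: HTTP ${res.status}`)
  }
  return res.blob()
}
```

The Blob it resolves with can then be passed to pdfToText in the same way as a locally selected file.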
Thank you for reading! Be sure to send your questions and suggestions here.