
Bhushan Rane


.py : Automating PDF Operations (Extracting Text from PDFs)

Description:

This Python script extracts text from PDF files using the PyPDF2 library. It reads each page of the PDF and compiles the extracted text into a single string.

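# Python script to extract text from PDFs
import PyPDF2

def extract_text_from_pdf(file_path):
    # Open in binary mode so PyPDF2 can read the raw bytes
    with open(file_path, 'rb') as f:
        # PdfReader, .pages and extract_text() replace the older
        # PdfFileReader, numPages/getPage and extractText(), which were removed in PyPDF2 3.0
        pdf_reader = PyPDF2.PdfReader(f)
        text = ''
        for page in pdf_reader.pages:
            # Fall back to '' for pages with no extractable text (e.g. scanned images)
            text += page.extract_text() or ''
    return text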
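
To run the extraction over many files at once, a small batch helper is one option. This is only a sketch and not part of the original script: the names `extract_folder`, `folder` and `extractor` are hypothetical, and the extractor is passed in as a parameter so that `extract_text_from_pdf` can be plugged in without changes.

```python
from pathlib import Path
from typing import Callable, Dict


def extract_folder(folder: str, extractor: Callable[[str], str]) -> Dict[str, str]:
    """Apply `extractor` to every .pdf file in `folder`.

    Returns a dict mapping each file name to its extracted text.
    """
    results = {}
    for pdf_path in sorted(Path(folder).glob('*.pdf')):
        try:
            results[pdf_path.name] = extractor(str(pdf_path))
        except Exception as exc:
            # A damaged or encrypted PDF should not stop the whole batch
            print(f'Skipping {pdf_path.name}: {exc}')
    return results
```

Assuming a folder named `invoices` (a placeholder), it could be called as `texts = extract_folder('invoices', extract_text_from_pdf)`. Files that fail to parse are reported and skipped rather than aborting the run.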