Usman Aziz

Originally published at blog.groupdocs.com

Extract Images from PDF Documents using C# .NET

Portable Document Format (PDF) is a widely used document format developed by Adobe. PDF documents can contain a variety of content, including formatted text, images, annotations, and form fields. Parsing PDF documents programmatically is a common use case, and there are multiple ways of extracting the text. Extracting images, however, is more complex. This article demonstrates how to extract images from PDF documents programmatically in C# using the GroupDocs.Parser for .NET API. So let’s begin.

Steps to extract images from a PDF document

1. Create a new project.

2. Download GroupDocs.Parser for .NET or install it using NuGet.

3. Add the following namespaces.

using System;                      // Console
using System.Collections.Generic;  // IEnumerable<T>
using GroupDocs.Parser;
using GroupDocs.Parser.Data;
using System.Drawing;
using System.Drawing.Imaging;

4. Load the PDF document.

// Create an instance of Parser class
using (Parser parser = new Parser("sample.pdf"))
{
  // Your code goes here.
}

5. Extract images from the document.

// Extract images
IEnumerable<PageImageArea> images = parser.GetImages();
// Check if image extraction is supported
if (images == null)
{
  Console.WriteLine("Images extraction isn't supported");
  return;
}
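The null check above tells you after the fact that extraction isn't supported. If you would rather ask up front, the sketch below wraps that question in a small helper. It assumes that the Parser class exposes a Features property with a boolean Images flag. That detail is not shown in this article, so verify it against the API reference for your version. The ImageSupport class and CanExtractImages method are hypothetical names used only for illustration.

```csharp
using System;
using GroupDocs.Parser;

static class ImageSupport
{
    // Hypothetical helper: reports whether image extraction is supported
    // for the given document.
    // Assumption: Parser.Features.Images exists and is a bool.
    public static bool CanExtractImages(string filePath)
    {
        using (Parser parser = new Parser(filePath))
        {
            return parser.Features.Images;
        }
    }
}
```

You could call ImageSupport.CanExtractImages("sample.pdf") before doing any other work and skip the file when it returns false.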

6. Access each image from the collection and save it.

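// Counter used to build the output file names (1.Jpeg, 2.Jpeg, ...)
int counter = 1;
// Iterate over images
foreach (PageImageArea image in images)
{
  // Save each image as a JPEG file
  Image.FromStream(image.GetImageStream()).Save(string.Format("{0}.Jpeg", counter++), ImageFormat.Jpeg);
}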
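The one-line save above never disposes the stream returned by GetImageStream() or the Image created from it, which can keep memory and file handles alive longer than needed when a document contains many images. Here is a minimal sketch of the same loop with explicit disposal. It uses only the GroupDocs.Parser calls already shown in this article. The ImageSaver class, the SaveAsJpeg method, and the outputDir parameter are hypothetical names introduced for illustration.

```csharp
using System.Collections.Generic;
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;
using GroupDocs.Parser.Data;

static class ImageSaver
{
    // Hypothetical helper: saves every extracted image as a JPEG file in
    // outputDir and returns the number of files written.
    public static int SaveAsJpeg(IEnumerable<PageImageArea> images, string outputDir)
    {
        // Make sure the target folder exists before writing to it
        Directory.CreateDirectory(outputDir);

        int counter = 0;
        foreach (PageImageArea image in images)
        {
            // Dispose the source stream and the decoded image after each save
            using (Stream stream = image.GetImageStream())
            using (Image decoded = Image.FromStream(stream))
            {
                counter++;
                string path = Path.Combine(outputDir, string.Format("{0}.Jpeg", counter));
                decoded.Save(path, ImageFormat.Jpeg);
            }
        }
        return counter;
    }
}
```

Inside the using (Parser parser = ...) block, a call such as ImageSaver.SaveAsJpeg(images, "output") would replace the foreach loop. Keep in mind that, as in the original snippet, every image is re-encoded as JPEG regardless of its source format.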

Complete code

// Create an instance of Parser class
using (Parser parser = new Parser("C:\\candy.pdf"))
{
    // Extract images
    IEnumerable<PageImageArea> images = parser.GetImages();
    // Check if image extraction is supported
    if (images == null)
    {
        Console.WriteLine("Images extraction isn't supported");
        return;
    }

    int counter = 1;
    // Iterate over images
    foreach (PageImageArea image in images)
    {
        // Save each image
        Image.FromStream(image.GetImageStream()).Save(string.Format("{0}.Jpeg", counter++), ImageFormat.Jpeg);
    }
}

Results

[Image: PDF Document]
[Image: Extracted Images]

Cheers!
