DEV Community

E-iceblue Product Family
E-iceblue Product Family

Posted on

How to extract text and image from a PowerPoint Document in Java

Texts and images are the main elements on the Presentation slides. In this article, we will show you how to extract text and image from a PowerPoint document using Free Spire.Presentation for Java.


If you use maven, you need to specify the dependencies for Free Spire.Presentation for Java library in your project’s pom.xml file.

Enter fullscreen mode Exit fullscreen mode

For non-maven projects, download Free Spire.Presentation for Java, unzip the package and add Spire.Presentation.jar in the lib folder into your project as a dependency.

import com.spire.presentation.*;

import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;

public class Test {
    public static void main(String[] args) throws Exception {
        //Create a presentation instance.
        Presentation ppt = new Presentation();

        // Load the document from disk.

            StringBuilder buffer = new StringBuilder();
            //Traverse the presentation slides to extract the text.
            for (Object slide : ppt.getSlides()) {
                for (Object shape : ((ISlide) slide).getShapes()) {
                    if (shape instanceof IAutoShape) {
                        for (Object tp : ((IAutoShape) shape).getTextFrame().getParagraphs()) {
                            buffer.append(((ParagraphEx) tp).getText());
            //Save to document to .txt
            FileWriter writer = new FileWriter("ExtractText.txt");

        //Extract all the images from the presentation slides
        for (int i = 0; i < ppt.getImages().getCount(); i++) {
            BufferedImage image = ppt.getImages().get(i).getImage();
            ImageIO.write(image, "PNG", new File(String.format("extractImage-%1$s.png", i)));
Enter fullscreen mode Exit fullscreen mode

The effective screenshot of the extracted images:
Extracted images

Extracted Text

Discussion (0)