DEV Community

Drazan-Jarak


DOCX Comments to PDF Annotations Conversion

Intro

This article explains how to programmatically convert DOCX comments into PDF annotations. You need two tools for this: Aspose.Words and Aspose.PDF. I am not aware of any other tool that makes this possible.
The first part of the conversion is handled by Aspose.Words and consists of a few steps:

  • Extract/read the comments from the DOCX.
  • Mark the beginning and the end of each comment.
  • Save the document, with the marked comments, to a memory stream in PDF format.

The Aspose.PDF API is responsible for the following steps:

  • Iterate through every page and text fragment to find the start and end marks of the comments.
  • When it finds marked text, collect its text segments and add a highlight annotation for the comment, with one highlighted area per segment.
  • If a Word comment had replies, add them as replies to the corresponding annotation.
  • Finally, save the document.

Code explanation

Extract/read comments from the DOCX:

Open the document with comments:

        string filename = @"source.docx";
        Aspose.Words.Document d = new Aspose.Words.Document(filename);

Extract/read comments (get all Comment child nodes):

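            // Collect every comment node in the document, replies included (deep search).
            NodeCollection comments = d.GetChildNodes(NodeType.Comment, true);
            int numOfComments = comments.Count;
            foreach (Aspose.Words.Comment comment in comments)
            {
                Comment comm = readComment(comment);
                wordComments.Add(comm);
                // Only top-level comments get markers; replies are attached to their parent's annotation later.
                if (comment.Ancestor == null)
                    signComment(d, new DocumentBuilder(d), comment);
            }
            // Remove the Word comment nodes from the document; their data is already stored in wordComments.
            comments.Clear();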

Read comments and replies:

        private static Comment readComment(Aspose.Words.Comment comment)
        {
            // Copy the Word comment into the project's own Comment class.
            // GetText() includes the trailing paragraph break, so trim it.
            Comment com = new Comment
            {
                Id = comment.Id,
                Contents = comment.GetText().Trim(),
                Author = comment.Author,
                DateTime = comment.DateTime,
                Replies = new List<Reply>()
            };

            // Replies are Aspose.Words.Comment nodes as well.
            foreach (Aspose.Words.Comment rpl in comment.Replies)
            {
                com.Replies.Add(new Reply
                {
                    Id = rpl.Id,
                    Contents = rpl.GetText().Trim(),
                    Author = rpl.Author,
                    DateTime = rpl.DateTime
                });
            }
            return com;
        }
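The article never shows the Comment and Reply classes that readComment fills in. A minimal sketch, assuming they are plain data classes with only the properties the code above sets; their exact shape in the original project is an assumption. Because the project's class is called Comment, the code above always writes the Word type as Aspose.Words.Comment.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical data classes holding a copy of each Word comment and its replies,
// independent of the Aspose.Words document they were read from.
public class Reply
{
    public int Id { get; set; }
    public string Contents { get; set; } = "";
    public string Author { get; set; } = "";
    public DateTime DateTime { get; set; }
}

public class Comment
{
    public int Id { get; set; }
    public string Contents { get; set; } = "";
    public string Author { get; set; } = "";
    public DateTime DateTime { get; set; }
    public List<Reply> Replies { get; set; } = new List<Reply>();
}
```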

Sign comments/Mark the beginning and the end of the comments:

        private static void signComment(Aspose.Words.Document doc, DocumentBuilder builder, Aspose.Words.Comment comment)
        {
            // A marker run: tempText in tempFont/tempColor at the tiny fontSize.
            Run CreateMarker()
            {
                Run run = new Run(doc, tempText);
                run.Font.Name = tempFont;
                run.Font.Color = tempColor;
                run.Font.Size = fontSize;
                return run;
            }

            // GetChild's second argument is an index, not a comment id,
            // so look up the range nodes by their Id instead.
            CommentRangeStart commentStart = doc.GetChildNodes(NodeType.CommentRangeStart, true)
                .OfType<CommentRangeStart>()
                .FirstOrDefault(s => s.Id == comment.Id);
            CommentRangeEnd commentEnd = doc.GetChildNodes(NodeType.CommentRangeEnd, true)
                .OfType<CommentRangeEnd>()
                .FirstOrDefault(e => e.Id == comment.Id);
            if (commentStart == null || commentEnd == null)
                return; // the comment has no anchored range, so there is nothing to mark

            // Insert one marker before the start and one before the end of the commented text.
            builder.MoveTo(commentStart);
            builder.InsertNode(CreateMarker());
            builder.MoveTo(commentEnd);
            builder.InsertNode(CreateMarker());
        }
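signComment and addMarkupAnnotation both read the fields tempText, tempFont, tempColor and fontSize, whose values the article does not show. The sketch below uses hypothetical values only to illustrate the idea: a short token, at a font size no real text uses, is what lets the PDF side tell a marker apart from ordinary characters. The IsMarker helper is also hypothetical; it mirrors the check in addMarkupAnnotation but uses a small tolerance instead of ==.

```csharp
using System;
using System.Drawing;

// Hypothetical marker settings; the project's real values are not shown in the article.
public static class Markers
{
    public static readonly string tempText = "#";          // short token inserted at each comment boundary
    public static readonly string tempFont = "Arial";      // any font available on the machine
    public static readonly Color tempColor = Color.White;  // invisible on a white page
    public static readonly double fontSize = 1.0;          // far smaller than any body text

    // True when a text segment read back from the PDF is one of the markers.
    public static bool IsMarker(string text, double size) =>
        text == tempText && Math.Abs(size - fontSize) < 0.01;
}
```

An ordinary "#" typed in 11 pt body text does not match, because only the combination of text and size identifies a marker.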

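Save the document to a memory stream (PDF format):

        // Save the marked-up document as PDF into memory, then rewind the stream for Aspose.PDF.
        MemoryStream mms = new MemoryStream();
        d.Save(mms, Aspose.Words.SaveFormat.Pdf);
        mms.Position = 0;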

Add highlight annotations:

Find the marked text fragments:

        private static int addMarkupAnnotation(MemoryStream mms)
        {
            _document = new Aspose.Pdf.Document(mms);
            bool start = false;     // true while inside a commented (marked) span
            bool end = false;       // true once the end marker of the current span is found
            TextFragment tf = null; // collects the segments of the current commented span
            foreach (Page page in _document.Pages)
            {
                int numOfCommentsPerPage = 0; // annotations added to this page so far
                TextFragmentAbsorber tfa = new TextFragmentAbsorber();
                tfa.Visit(page);

                foreach (TextFragment textFragment in tfa.TextFragments)
                {
                    foreach (var ts in textFragment.Segments)
                    {
                        // End marker: a marker segment while a span is open.
                        if (!end && start && ts.TextState.FontSize == fontSize && ts.Text.Equals(tempText))
                        {
                            end = true;
                            ts.Text = ""; // erase the marker from the PDF
                            start = false;
                        }
                        // Start marker (an end marker was already blanked above, so it does not match here).
                        if (ts.TextState.FontSize == fontSize && ts.Text.Equals(tempText))
                        {
                            start = true;
                            ts.Text = "";
                            if (tf == null)
                            {
                                tf = new TextFragment();
                                tf.Position = textFragment.Position;
                            }
                        }
                        // A span continued from the previous page needs a new fragment.
                        if (start && tf == null)
                        {
                            tf = new TextFragment();
                            tf.Position = textFragment.Position;
                        }
                        // Collect the commented text (everything that is not a marker).
                        if (start && ts.TextState.FontSize != fontSize)
                        {
                            tf.Segments.Add(ts);
                        }
                    }
                    if (tf != null && end)
                    {
                        // Assumes comment ids run 0, 1, 2, ... in document order, replies included.
                        Comment comm = wordComments.First(c => c.Id == numOfCommments);
                        var highlightAnnotation = HighLightTextFragment(page, tf, comm.Contents, comm.Author, comm.DateTime);
                        Console.WriteLine("main comment " + numOfCommments);
                        Aspose.Pdf.Rectangle rect = highlightAnnotation.Rect;
                        page.Annotations.Add(highlightAnnotation);
                        numOfCommments++;
                        numOfCommentsPerPage++;
                        foreach (Reply item in comm.Replies)
                        {
                            addReply(page, rect, item.Contents, item.Author, item.DateTime, numOfCommentsPerPage);
                            numOfCommentsPerPage++;
                            numOfCommments++;
                        }
                        tf = null;
                        start = false;
                        end = false;
                    }
                } // end of fragments
                if (start && !end)
                {
                    // The commented text continues on the next page: highlight the part on this page.
                    Console.WriteLine("Comment appears across two pages");
                    Comment comm = wordComments.First(c => c.Id == numOfCommments);
                    var highlightAnnotation = HighLightTextFragment(page, tf, comm.Contents, comm.Author, comm.DateTime);
                    Aspose.Pdf.Rectangle rect = highlightAnnotation.Rect;
                    page.Annotations.Add(highlightAnnotation);
                    numOfCommentsPerPage++;
                    foreach (Reply item in comm.Replies)
                    {
                        addReply(page, rect, item.Contents, item.Author, item.DateTime, numOfCommentsPerPage);
                        numOfCommentsPerPage++;
                    }
                    tf = null;
                    start = true; // keep collecting on the next page
                    end = false;
                }
            }
            return numOfCommments;
        }
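At its core, the segment loop above is a small state machine: a marker opens a span, the next marker closes it, and everything in between is commented text. The sketch below reproduces only that idea over a flat list of segments, so it can be tried without a PDF. The Segment record stands in for Aspose's TextSegment and the isMarker predicate for the font-size check; both are hypothetical, and the sketch returns the text of each span instead of creating annotations.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical stand-in for a PDF text segment: only text and font size matter here.
public record Segment(string Text, double FontSize);

public static class MarkerScanner
{
    // Returns the text between each start/end marker pair, in reading order.
    public static List<string> FindMarkedSpans(IEnumerable<Segment> segments, Func<Segment, bool> isMarker)
    {
        var spans = new List<string>();
        StringBuilder current = null; // null while outside a marked span
        foreach (var segment in segments)
        {
            if (isMarker(segment))
            {
                if (current == null)
                {
                    current = new StringBuilder();  // start marker opens a span
                }
                else
                {
                    spans.Add(current.ToString());  // end marker closes it
                    current = null;
                }
            }
            else
            {
                current?.Append(segment.Text);      // keep text only inside a span
            }
        }
        return spans;
    }
}
```

Like the article's loop, this assumes comment ranges neither overlap nor nest; overlapping comments would interleave their markers and pair them up incorrectly.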

Create a highlight annotation from a text fragment:

        private static HighlightAnnotation HighLightTextFragment(Page page, TextFragment textFragment, string commentText, string author, DateTime dateTime)
        {
            // One quadrilateral (4 points) per text segment, so a highlight that wraps across
            // lines covers only the text. The loop also covers the single-segment case, so every
            // highlight gets the comment's text, author and date.
            var offset = 0;
            var quadPoints = new Aspose.Pdf.Point[textFragment.Segments.Count * 4];
            foreach (var segment in textFragment.Segments)
            {
                quadPoints[offset + 0] = new Aspose.Pdf.Point(segment.Rectangle.LLX, segment.Rectangle.URY); // upper-left
                quadPoints[offset + 1] = new Aspose.Pdf.Point(segment.Rectangle.URX, segment.Rectangle.URY); // upper-right
                quadPoints[offset + 2] = new Aspose.Pdf.Point(segment.Rectangle.LLX, segment.Rectangle.LLY); // lower-left
                quadPoints[offset + 3] = new Aspose.Pdf.Point(segment.Rectangle.URX, segment.Rectangle.LLY); // lower-right
                offset += 4;
            }
            // The annotation rectangle is the bounding box of all quad points.
            var llx = quadPoints.Min(pt => pt.X);
            var lly = quadPoints.Min(pt => pt.Y);
            var urx = quadPoints.Max(pt => pt.X);
            var ury = quadPoints.Max(pt => pt.Y);

            return new HighlightAnnotation(page, new Aspose.Pdf.Rectangle(llx, lly, urx, ury))
            {
                Modified = dateTime,
                Title = author,
                QuadPoints = quadPoints,
                Contents = commentText,
            };
        }
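The quad-point arithmetic in HighLightTextFragment does not depend on Aspose.PDF, so it can be checked on its own. In the sketch below, the records Rect and Pt are hypothetical stand-ins for Aspose.Pdf.Rectangle and Aspose.Pdf.Point. It keeps the article's corner order (upper-left, upper-right, lower-left, lower-right) and computes the bounding rectangle the same way.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for Aspose.Pdf.Rectangle and Aspose.Pdf.Point.
public record Rect(double LLX, double LLY, double URX, double URY);
public record Pt(double X, double Y);

public static class QuadMath
{
    // Four corners per segment rectangle, in the order used by the article.
    public static Pt[] QuadPoints(IReadOnlyList<Rect> segments)
    {
        var pts = new Pt[segments.Count * 4];
        for (int i = 0; i < segments.Count; i++)
        {
            var r = segments[i];
            pts[i * 4 + 0] = new Pt(r.LLX, r.URY); // upper-left
            pts[i * 4 + 1] = new Pt(r.URX, r.URY); // upper-right
            pts[i * 4 + 2] = new Pt(r.LLX, r.LLY); // lower-left
            pts[i * 4 + 3] = new Pt(r.URX, r.LLY); // lower-right
        }
        return pts;
    }

    // Smallest rectangle enclosing every quad point; used as the annotation's Rect.
    public static Rect Bounds(Pt[] pts) =>
        new Rect(pts.Min(p => p.X), pts.Min(p => p.Y), pts.Max(p => p.X), pts.Max(p => p.Y));
}
```

For a highlight spanning two lines, this yields eight points and a single bounding rectangle that covers both lines.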

Add replies to an annotation:

        private static void addReply(Page page, Aspose.Pdf.Rectangle rect, string contents, string author, DateTime dateTime, int numOfComment)
        {
            // The reply reuses the parent highlight's rectangle.
            HighlightAnnotation highlightAnnotation = new HighlightAnnotation(page, rect);
            Console.WriteLine("reply to " + numOfComment);
            highlightAnnotation.Contents = contents;
            highlightAnnotation.Title = author;
            highlightAnnotation.Modified = dateTime;
            // page.Annotations is 1-based, so numOfComment is the index of the most recently
            // added annotation on this page (assuming the page had no annotations beforehand).
            highlightAnnotation.InReplyTo = page.Annotations[numOfComment];
            page.Annotations.Add(highlightAnnotation);
        }

Save the document:

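        // Number of comments and replies transferred to the PDF.
        int rzlt = addMarkupAnnotation(mms);
        if (rzlt == numOfComments)
        {
            _document.Save(filename + ".pdf");
            Console.WriteLine(rzlt + " comments copied from Word to PDF successfully");
        }
        else
        {
            Console.WriteLine("Something went wrong: " + rzlt + " of " + numOfComments + " comments were copied");
        }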

Conclusion

Keep in mind that the code above converts an MS Word document with comments to a PDF document in which the comments become annotations. If you don't need this feature, you can convert DOCX to PDF with just two lines of code:

        // Load the document from disk.
        Document doc = new Document(dataDir + "Rendering.docx");
        // Save the document in PDF format.
        doc.Save(dataDir + "SaveDoc2Pdf.pdf");

You can read more in the documentation.

I hope you find this article helpful.
You can find the whole project at this link.

Best regards.
