DEV Community

Adam Ryczkowski

What is the naming convention of images inside the "Pictures" folder in the odt Writer document?

When Pandoc creates an ODT file from HTML containing SVG images, it embeds the images losslessly, as-is. That is a problem for me, because my pipeline applies some automatic transformations to the document through the LibreOffice UNO API and ultimately saves it as DOCX. When LibreOffice saves an ODT file containing SVG images as DOCX, it rasterizes them at a very poor resolution that, according to folks on the LibreOffice forum, cannot be controlled. But there is a trick: I can use Inkscape to convert the SVG images into EMF. LibreOffice does not rasterize EMF files when it saves the document as DOCX.
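The SVG-to-EMF step could be scripted roughly as follows. This is only a sketch: it assumes Inkscape 1.x is on the PATH (where `--export-filename` selects the output format from the file extension), and the helper names `inkscape_emf_command` and `svg_to_emf` are made up for illustration.

```python
import subprocess
from pathlib import Path


def inkscape_emf_command(svg_path):
    """Build the Inkscape 1.x command line that exports an SVG as EMF.

    Returns the argument list and the path of the EMF file it would create.
    """
    svg = Path(svg_path)
    emf = svg.with_suffix(".emf")
    # Assumption: Inkscape 1.x syntax; older 0.92 releases used --export-emf instead.
    return ["inkscape", str(svg), f"--export-filename={emf}"], emf


def svg_to_emf(svg_path):
    """Run Inkscape and return the path of the generated EMF file."""
    cmd, emf = inkscape_emf_command(svg_path)
    subprocess.run(cmd, check=True)  # raises CalledProcessError if Inkscape fails
    return emf
```

Keeping the command construction separate from the `subprocess.run` call makes it possible to check the arguments without having Inkscape installed.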

The problem is that the EMF files obviously have different binary content from the SVG originals. When I replace them in the "Pictures/" folder inside the ODT, LibreOffice notices that the file names of the EMF pictures do not match their hashes, claims the "image is corrupted", and offers to repair it. Unfortunately, that repair dialog cannot be automated in a headless environment, which means I need to know how to produce a "non-broken" ODT document in the first place. For that, I need to know the hashing scheme.

I tried to read the Pandoc sources to get the answer myself, but my zero knowledge of Haskell is a major obstacle.

My gut feeling says the answer is somewhere in pandoc/src/Text/Pandoc/Writers/OpenDocument.hs.

Top comments (1)

Adam Ryczkowski

I have found the solution:

It turns out I had not updated the reference to the file in META-INF/manifest.xml.

After fixing that, Writer no longer complains.
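For anyone automating the same fix, here is a minimal sketch of swapping a picture inside the ODT while keeping META-INF/manifest.xml in sync. It rests on several assumptions of mine: the manifest entry is a self-closing `manifest:file-entry` element, the picture is referenced by its plain path in the other XML files, and `image/x-emf` is an acceptable media type for EMF. The function names are hypothetical.

```python
import re
import zipfile


def _fix_manifest(xml, old_path, new_path, new_media_type):
    """Rewrite the manifest entry of old_path to point at new_path."""
    # Assumption: the entry is written as a self-closing element.
    pattern = re.compile(
        r'<manifest:file-entry\b[^>]*manifest:full-path="%s"[^>]*/>'
        % re.escape(old_path)
    )
    entry = '<manifest:file-entry manifest:full-path="%s" manifest:media-type="%s"/>' % (
        new_path,
        new_media_type,
    )
    return pattern.sub(lambda match: entry, xml)


def replace_picture(odt_in, odt_out, old_path, new_path, new_bytes, new_media_type):
    """Copy odt_in to odt_out, replacing one embedded picture and its references."""
    old_b, new_b = old_path.encode("utf-8"), new_path.encode("utf-8")
    with zipfile.ZipFile(odt_in) as src, zipfile.ZipFile(odt_out, "w") as dst:
        for info in src.infolist():  # keeps the original order, so mimetype stays first
            name = info.filename
            data = src.read(name)
            if name == old_path:
                # Store the new image bytes under the new name.
                name, data = new_path, new_bytes
            elif name == "META-INF/manifest.xml":
                fixed = _fix_manifest(data.decode("utf-8"), old_path, new_path, new_media_type)
                data = fixed.encode("utf-8")
            elif name.endswith(".xml"):
                # content.xml, styles.xml etc. refer to the picture by its path.
                data = data.replace(old_b, new_b)
            # The ODF package format wants the mimetype entry stored uncompressed.
            method = zipfile.ZIP_STORED if name == "mimetype" else zipfile.ZIP_DEFLATED
            dst.writestr(name, data, compress_type=method)
```

A call might look like `replace_picture("in.odt", "out.odt", "Pictures/fig.svg", "Pictures/fig.emf", emf_bytes, "image/x-emf")`, where the file names and `emf_bytes` are placeholders.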