DEV Community

yaugika-amit

Best way to convert Word docx to AsciiDoc with Pandoc without losing styles

Hello Folks,

I have a large set of content in docx format that I need to move to AsciiDoc (adoc) format. I followed the instructions at https://docs.asciidoctor.org/asciidoctor/latest/migrate/ms-word/ to convert a few docs.
Many custom styles defined in the original Word docx template (such as code blocks, fonts, and other styles) end up broken after conversion (lost, misaligned, or messed up). On top of that, searching for such styled elements and comparing the adoc and docx files side by side for every converted element is far too painful (imagine 300-400 pages of manuals! :( ). I did try modifying the custom styles to match the AsciiDoc format, but that brought little improvement.

What customizations could I use with Pandoc in such a conversion (docx to AsciiDoc) to minimize the manual work of fixing styles (in particular code blocks, inline code, etc.)?

Is there a way to use a custom variable or style (highlighting, maybe?) that could be tagged to the different original Word styles, so that they stay visible after conversion (marked distinctly from other styles and elements, so they can be searched for and fixed manually)?
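
To make the tagging idea concrete, here is a rough, untested-on-real-manuals sketch of one possible approach. It assumes Pandoc's `styles` extension for the docx reader (`-f docx+styles`), which keeps Word styles as Divs and Spans carrying a `custom-style` attribute, and it assumes the document is round-tripped through Pandoc's JSON format. The `STYLEMARK` token, the function names, and the file names are made up for illustration and are not Pandoc features.

```python
import json
import sys

# Hypothetical search token; choose anything that never occurs in the manuals.
MARK = "STYLEMARK"


def custom_style(attr):
    """Return the value of the custom-style attribute, or None if absent."""
    _ident, _classes, keyvals = attr
    for key, value in keyvals:
        if key == "custom-style":
            return value
    return None


def tag_styles(node):
    """Walk a pandoc JSON AST and prepend a visible marker to styled Spans/Divs."""
    if isinstance(node, list):
        return [tag_styles(item) for item in node]
    if isinstance(node, dict):
        # Rebuild the dict so the input is not modified in place.
        node = {key: tag_styles(value) for key, value in node.items()}
        if node.get("t") in ("Span", "Div"):
            attr, content = node["c"]
            style = custom_style(attr)
            if style is not None:
                marker = {"t": "Str", "c": f"{MARK}({style})"}
                if node["t"] == "Div":
                    # A Div holds blocks, so the marker gets its own paragraph.
                    marker = {"t": "Para", "c": [marker]}
                node["c"] = [attr, [marker] + content]
        return node
    return node  # strings, numbers, None: nothing to tag


# Assumed usage: python tag_styles.py manual.json tagged.json
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1], encoding="utf-8") as src:
        doc = json.load(src)
    with open(sys.argv[2], "w", encoding="utf-8") as dst:
        json.dump(tag_styles(doc), dst)
```

Under those assumptions, the workflow would be `pandoc -f docx+styles -t json manual.docx -o manual.json`, then `python tag_styles.py manual.json tagged.json`, then `pandoc -f json -t asciidoc tagged.json -o manual.adoc`. Searching the adoc file for `STYLEMARK(` should then list every place where a Word style was applied, though how each styled element itself is rendered still depends on Pandoc's AsciiDoc writer.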

I dug up the post at https://learnbyexample.github.io/customizing-pandoc/ and found some useful pointers. I am planning to test these and see how the converted adoc file turns out. Is converting Markdown to PDF similar to converting Word docx to AsciiDoc, and do the same customizations hold true in both cases? Is there any particular command or syntax I could use or should look out for?

I would be really thankful if the community could provide useful pointers and thoughts.

PS: Pardon my lack of knowledge, and feel free to ask for any particular details you may need. I am a noob in the tech world and programming.

Appreciate the help!
Thanks
