Multi Word document conversion into PDF with Linux and LibreOffice and Unoconv

01 Apr 2019

Johan van de Merwe

Posted in Tools

The article demonstrates how to efficiently convert a large number of documents into PDF format using LibreOffice, Unoconv, and a simple Python script (provided within the guide).

Efficient PDF Conversion with unoconv

When faced with the task of converting thousands of documents into PDF format for online portal access, tools like unoconv can streamline the process. This article outlines a step-by-step approach to mass conversion using LibreOffice, unoconv, and basic Python scripting on the Linux platform.

Prerequisites

Before proceeding with the conversion process, ensure the following software is installed:

LibreOffice (download from the LibreOffice website)
Unoconv (see its installation instructions)
Python 2.7.12 (see the Python installation guidelines)
py-unoconv-batch-recursive (available on GitHub)
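
Before starting a long batch run, it can help to confirm that the command-line tools are actually reachable. The following is a minimal sketch and not part of py-unoconv-batch-recursive. It assumes the executables are named soffice, unoconv, and python on your PATH, which may differ per distribution, and it is written for Python 3 rather than the 2.7 release listed above.

```python
import shutil

# Executable names are assumptions; "soffice" is the usual LibreOffice launcher.
REQUIRED_TOOLS = ("soffice", "unoconv", "python")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All prerequisites found.")
```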

Installing Py-unoconv-batch-recursive

To set up py-unoconv-batch-recursive, clone the repository to a convenient location on your system, like /<somewhere-easy>/py-unoconv-batch-recursive. Execute the following command in a terminal window:

git clone https://github.com/enovision/py-unoconv-batch-recursive.git

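Navigate to the root folder containing the documents to convert (e.g., /media/somewhere/CD-Data) and run the Python script. The example below assumes the repository was cloned into /tmp; adjust the script path to match the location of your own clone: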

python /tmp/py-unoconv-batch-recursive/recursive-pdf-converter.py --in="/media/somewhere/CD-Data"

If the --in parameter is omitted, the script uses the path where the Python script is located as the root directory for conversion.

By default, the program processes documents in formats like docx, doc, rtf, otf, and txt. To specify alternate file extensions, use the --ext parameter:

--ext="doc docx yyy zzz"
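
To illustrate how the two parameters could be handled, here is a hedged sketch using argparse. It is not taken from the script itself: the function name parse_args, the attribute names root and extensions, and the normalisation to lower-case suffixes are assumptions made for this example. The defaults mirror the behaviour described above.

```python
import argparse
import os
import sys

# Default extensions as listed in the article.
DEFAULT_EXTENSIONS = "docx doc rtf otf txt"


def parse_args(argv=None):
    """Parse --in and --ext; names and defaults here are illustrative."""
    script_dir = os.path.dirname(os.path.abspath(sys.argv[0]))
    parser = argparse.ArgumentParser(description="Recursive PDF conversion")
    # "in" is a Python keyword, so the value is stored under "root".
    parser.add_argument("--in", dest="root", default=script_dir)
    parser.add_argument("--ext", default=DEFAULT_EXTENSIONS)
    args = parser.parse_args(argv)
    # Turn "doc docx" into {".doc", ".docx"} for easy suffix matching.
    args.extensions = {"." + e.lower().lstrip(".") for e in args.ext.split()}
    return args
```

Calling parse_args(["--in=/media/somewhere/CD-Data", "--ext=doc docx yyy zzz"]) yields the given root folder and the four extensions as a set of suffixes.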

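The script traverses all subfolders from the root directory, converting each matching file and appending '.pdf' to the original filename, so report.docx becomes report.docx.pdf. Keeping the original extension in the name prevents clashes when two files in the same folder differ only by extension. The script also accepts an --out option, but it currently has no effect.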
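
The traversal and naming scheme can be sketched in a few lines. This is an illustration of the idea, not the script's actual code. The function name plan_conversions is hypothetical, and it only computes the source and target paths without converting anything.

```python
import os


def plan_conversions(root, extensions):
    """Yield (source, target) path pairs for every matching file under root."""
    wanted = {"." + ext.lower().lstrip(".") for ext in extensions}
    for folder, _subfolders, filenames in os.walk(root):
        for name in sorted(filenames):
            if os.path.splitext(name)[1].lower() in wanted:
                source = os.path.join(folder, name)
                # Appending ".pdf" keeps report.doc and report.docx apart.
                yield source, source + ".pdf"
```

Because the target keeps the full original filename, report.doc and report.docx in the same folder map to report.doc.pdf and report.docx.pdf instead of both becoming report.pdf.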

Unoconv

Unoconv converts files between the formats LibreOffice supports, directly from the command line. Because the conversion runs through LibreOffice, the resulting PDFs are layered documents that preserve both the text and the layout of the originals.

These PDF outputs are ideal for integration with tools like ext-pdf-viewer, an Ext JS package leveraging Mozilla's pdf.js library.
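
For a single file, a unoconv call from Python might look like the sketch below. The helper names build_unoconv_command and convert_to_pdf are hypothetical. The -f (output format) and -o (output name) options are standard unoconv flags, but whether the batch script invokes unoconv in exactly this way is an assumption.

```python
import subprocess


def build_unoconv_command(source, target):
    """Return the argument list for converting one file to PDF."""
    return ["unoconv", "-f", "pdf", "-o", target, source]


def convert_to_pdf(source, target):
    """Run unoconv; raises CalledProcessError if the conversion fails."""
    subprocess.check_call(build_unoconv_command(source, target))
```

Passing the arguments as a list avoids shell quoting problems with filenames that contain spaces, which are common in document archives.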

Conclusion

Although the Python script is simple, it performed well when converting 4500 documents. On average, a run completed within 30 minutes on a standard laptop (by 2019 standards). Before converting anything, the script waits 20 seconds so that the unoconv listener has time to initialize. When all files have been processed, the listener is terminated and a "Done" message marks the end of the program.
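
The listener handling described above can be approximated as follows. This is a sketch under stated assumptions, not the script's real implementation. The function name run_with_listener and its parameters are hypothetical, and it assumes the listener is started with unoconv --listener and can be stopped by terminating that process.

```python
import subprocess
import time


def run_with_listener(job, listener_cmd=("unoconv", "--listener"), startup_delay=20):
    """Start a listener, wait for it to initialize, run job(), then stop it."""
    listener = subprocess.Popen(list(listener_cmd))
    try:
        # Give the listener time to come up before any conversion starts.
        time.sleep(startup_delay)
        return job()
    finally:
        # Stop the listener even if a conversion raised an error.
        listener.terminate()
        listener.wait()
        print("Done")
```

Starting the listener once and reusing it for every file avoids the cost of launching LibreOffice for each document, which matters when thousands of files are involved.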
