A PDF file is meant to look like a printed version of a document. Many programs available today can extract text from a PDF, but they often fail to retain the formatting. If you simply copy the text out of a PDF file, you may find that it no longer has the same formatting, alignment, or layout as the original.
Here I'll share how I extract text from a PDF while retaining its formatting and alignment. The approach that has worked best for me is converting the PDF file to Rich Text Format (RTF). Most software offers conversion to DOC, Excel, PowerPoint, EPUB, HTML, and plain text. Converting to plain text looks like the simplest option, but it may preserve the least of the original layout.
Whenever I want to extract text with the same layout as the original, I use PDF Shaper, a free program that converts PDF files into Rich Text Format (RTF).
The images below compare a conversion to DOC with a conversion to RTF. The RTF output looks exactly like the original, with the same font style and size, while the DOC output uses a different font style and size.
Image: the PDF converted to RTF format
Image: the PDF converted to DOC format
Other Features of the Software
You can use this program to convert multiple files at once. You can also choose to skip images, tables, or shapes during conversion. Best of all, it is completely free and lightweight.
You can download PDF Shaper here.