scanned PDF files converted to Word files Thread poster: Emilia Delibasheva
Hello,
I have a large volume of PDF files that I need to edit. After some research on the Internet, I found that there is software for converting PDF files to Word documents. However, mine are not true PDF files; they are scanned. Is such a conversion possible at all? Thank you.
Natalie Poland Local time: 15:39 Member (2002) English to Russian + ... Moderator of this forum SITE LOCALIZER Of course, it is possible | Feb 2, 2013 |
Please make a search in this forum - you will find a number of previous threads on the same topic.
Imho, the best software for doing this is FineReader, though there exist many other programs that do the same.
You sure can! | Feb 2, 2013 |
Natalie wrote:
Please make a search in this forum - you will find a number of previous threads on the same topic.
Imho, the best software for doing this is FineReader, though there exist many other programs that do the same.
Here's a link to FineReader, since it can be a little tricky to find sometimes: http://www.abbyy.com/
I haven't had that much luck with these programs, since they don't catch accent marks, so I typically just translate directly or use Dragon and read it over first.
If the image is clean and the program is set up right, you shouldn't have too much trouble though.
Emilia Delibasheva Local time: 16:39 Member (2005) English to Bulgarian + ... TOPIC STARTER |
finnword1 United States Local time: 09:39 English to Finnish + ...
I use a separate OCR program. I can then make necessary adjustments, depending on the quality of the scanner image.
FineReader is what I use.
Emilia Delibasheva Local time: 16:39 Member (2005) English to Bulgarian + ... TOPIC STARTER |
Emma Goldsmith Spain Local time: 15:39 Member (2004) Spanish to English Quality of scanned pdf | Feb 3, 2013 |
Triston Goodwin wrote:
I haven't had that much luck with these programs, since they don't catch accent marks
If you set the language correctly before you OCR the document, ABBYY FineReader and other programs should certainly catch accents.
Of course, much depends on the quality of the scanned PDF. If you have a lot of background noise (a vertical line crossing through all pages, stamps placed on top of text, etc.) then no program will be able to decipher what the text says. But real people might not be able to in that case, either!
Emma Goldsmith wrote:
Triston Goodwin wrote:
I haven't had that much luck with these programs, since they don't catch accent marks
If you set the language correctly before you OCR the document, ABBYY FineReader and other programs should certainly catch accents.
Backed. I used to think that OCR was pretty much unusable, esp. with languages with accented characters. This might have been the case a decade ago, but it is definitely not any more. They use very smart algorithms to determine what each character might logically be and do a somewhat decent job of formatting. As an example, ABBYY consistently gets u, ü and ű right. Maybe it prints an ü instead of an ű or an o instead of an ö in one case out of a thousand. When you look it up in the source text, you're likely to find that the image quality was abysmal at that spot.
That said, for translation, it's generally better to use a setting that does not conserve much of the formatting and to format the output text at the end. Otherwise, you end up with text boxes all over the place, mis-recognized headers and so on.
ABBYY Finereader recognizes Hungarian text pretty much perfectly, even if the image quality leaves a lot to be desired. I'm impressed.
[Edited at 2013-02-03 09:57 GMT]
Rolf Keller Germany Local time: 15:39 English to German Catch false/missing accents etc. automatically | Feb 3, 2013 |
FarkasAndras wrote:
As an example, ABBYY consistently gets u, ü and ű right. Maybe it prints an ü instead of an ű or an o instead of an ö in one case out of a thousand.
Just run a good spellchecker for that language over the OCR'ed text.
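For anyone who wants to see how the spellchecker idea catches accent slips, here is a minimal Python sketch. It is not how any particular spellchecker or OCR program works internally; it only illustrates the principle. The word list, the letter groups and the function names (`skeleton`, `build_index`, `accent_fixes`) are all made up for this example, and a real run would need a full word list for the language in question.

```python
import re

# Groups of letters that OCR may confuse with one another.
# Illustrative Hungarian-style groups only; extend for your language.
CONFUSABLE = ["oóöő", "uúüű", "aá", "eé", "ií"]

# Map every letter in a group to the first (unaccented) letter of that group.
BASE = {ch: group[0] for group in CONFUSABLE for ch in group}


def skeleton(word):
    """Reduce a word to its accent-free form, e.g. 'tűz' -> 'tuz'."""
    return "".join(BASE.get(ch, ch) for ch in word)


def build_index(lexicon):
    """Group the known words by their accent-free skeleton."""
    index = {}
    for known in sorted(lexicon):
        index.setdefault(skeleton(known), []).append(known)
    return index


def accent_fixes(text, lexicon):
    """Map each unknown word to known words that differ only in accents."""
    index = build_index(lexicon)
    fixes = {}
    for word in re.findall(r"\w+", text.lower()):
        if word in lexicon:
            continue  # spelled as in the word list, nothing to flag
        candidates = index.get(skeleton(word), [])
        if candidates:
            fixes[word] = candidates
    return fixes


if __name__ == "__main__":
    # Tiny hypothetical word list; a real one would hold the whole language.
    words = {"tűz", "tüzet", "öt", "könyv"}
    print(accent_fixes("A tüz és a konyv", words))
    # prints {'tüz': ['tűz'], 'konyv': ['könyv']}
```

Words that are simply missing from the word list and have no accent-only match are left out of the result, so the output lists only the likely ü/ű or o/ö type mix-ups to check against the scanned image.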