scanned PDF files converted to Word files (Software applications)

Thread poster: Emilia Delibasheva

Emilia Delibasheva

Local time: 08:04
English to Bulgarian

Feb 2, 2013

Hello,

I have a large volume of PDF files that I need to edit. I did some research online and found that there is software that converts PDF files to Word documents. However, mine are not true (text-based) PDF files; they are scanned images. Is it possible at all to perform this kind of conversion? Thank you.

Walter Landesman

Uruguay
Local time: 02:04
English to Spanish

Nope

Feb 2, 2013

No, I don't think so.

Natalie

Poland
Local time: 07:04
Member (2002)
English to Russian

Moderator of this forum

SITE LOCALIZER

Of course, it is possible

Feb 2, 2013

Please search this forum - you will find a number of previous threads on the same topic.
IMHO, the best software for doing this is FineReader, though there exist many other programs that do the same.

Triston Goodwin

United States
Local time: 23:04
Spanish to English

You sure can!

Feb 2, 2013

Natalie wrote:

Please search this forum - you will find a number of previous threads on the same topic.
IMHO, the best software for doing this is FineReader, though there exist many other programs that do the same.

Here's a link to FineReader, since it can be a little tricky to find sometimes: http://www.abbyy.com/

I haven't had that much luck with these programs, since they don't catch accent marks, so I typically just translate directly or use Dragon and read it over first.

If the image is clean and the program is set up right, you shouldn't have too much trouble, though.

Michel de Ruyter

Finland
Local time: 08:04
Member (2011)
English to Dutch

Here, for example:

Feb 2, 2013

http://www.proz.com/forum/wordfast_support/195890-wordfast_anywhere_announces_support_for_scanned_pdfs.html

Emilia Delibasheva

Local time: 08:04
English to Bulgarian

TOPIC STARTER

Thanks

Feb 2, 2013

Thank you all very much!

finnword1
United States
Local time: 01:04
English to Finnish

OCR

Feb 2, 2013

I use a separate OCR program. I can then make the necessary adjustments, depending on the quality of the scanned image.

Angelique Blommaert

Netherlands
Local time: 07:04
German to Dutch

Works for me

Feb 2, 2013

FineReader is what I use.

Emilia Delibasheva

Local time: 08:04
English to Bulgarian

TOPIC STARTER

Thanks

Feb 3, 2013

Thank you all.

Emma Goldsmith

Spain
Local time: 07:04
Member (2004)
Spanish to English

Quality of scanned pdf

Feb 3, 2013

Triston Goodwin wrote:

I haven't had that much luck with these programs, since they don't catch accent marks

If you set the language correctly before you OCR the document, ABBYY FineReader and other programs should certainly catch accents.

Of course, much depends on the quality of the scanned PDF. If you have a lot of background noise (a vertical line crossing through all pages, stamps placed on top of text, etc.) then no program will be able to decipher what the text says. But real people might not be able to in that case, either!
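
To make the "set the language first" advice concrete, here is a minimal sketch using the open-source Tesseract command-line engine rather than FineReader (FineReader is configured through its own interface, which this thread does not describe). It assumes Tesseract and the relevant language data are installed, and that the scanned PDF has already been exported to page images, since Tesseract reads images rather than PDFs. The function names and file names are hypothetical.

```python
import shutil
import subprocess


def build_ocr_command(image_path, output_base, language):
    """Build a Tesseract command line for one page image.

    `-l` selects the recognition language (for example "spa", "hun" or
    "bul"). Without it Tesseract defaults to English, which is the usual
    reason accented characters get lost.
    """
    return ["tesseract", image_path, output_base, "-l", language]


def run_ocr(image_path, output_base, language):
    """Run Tesseract on one page image and return the path of the text file."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract is not installed or not on PATH")
    subprocess.run(build_ocr_command(image_path, output_base, language), check=True)
    # Tesseract appends ".txt" to the output base name by default.
    return output_base + ".txt"


if __name__ == "__main__":
    # Hypothetical file names: a Hungarian page exported from the scanned PDF.
    print(build_ocr_command("page1.png", "page1", "hun"))
```

The resulting plain text can then be pasted into Word and formatted by hand.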

FarkasAndras

Local time: 07:04
English to Hungarian

They work

Feb 3, 2013

Emma Goldsmith wrote:

Triston Goodwin wrote:

I haven't had that much luck with these programs, since they don't catch accent marks

If you set the language correctly before you OCR the document, ABBYY FineReader and other programs should certainly catch accents.

Seconded. I used to think that OCR was pretty much unusable, esp. with languages with accented characters. This might have been the case a decade ago, but it is definitely not any more. These programs use very smart algorithms to determine what each character might logically be and do a somewhat decent job of formatting. As an example, ABBYY consistently gets u, ü and ű right. Maybe it prints an ü instead of an ű or an o instead of an ö in one case out of a thousand. When you look it up in the source text, you're likely to find that the image quality was abysmal at that spot. That said, for translation, it's generally better to use a setting that does not preserve much of the formatting and to format the output text at the end. Otherwise, you end up with text boxes all over the place, mis-recognized headers and so on.
ABBYY FineReader recognizes Hungarian text pretty much perfectly, even if the image quality leaves a lot to be desired. I'm impressed.

[Edited at 2013-02-03 09:57 GMT]

Rolf Keller
Germany
Local time: 07:04
English to German

Catch false/missing accents etc. automatically

Feb 3, 2013

FarkasAndras wrote:

As an example, ABBYY consistently gets u, ü and ű right. Maybe it prints an ü instead of an ű or an o instead of an ö in one case out of a thousand.

Just run a good spellchecker for that language over the OCR'ed text.
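
As a rough illustration of that idea, here is a minimal sketch of a dictionary check that flags words the OCR may have mis-accented and suggests dictionary words that differ only in their accents. It is not how any particular spellchecker works; the tiny word list and the function names are assumptions made for the example, and a real run would load a full word list for the language.

```python
import re
import unicodedata


def strip_accents(word):
    """Return the word with all combining accent marks removed."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def flag_suspects(text, dictionary):
    """List (word, suggestions) pairs for words missing from the dictionary.

    Suggestions are dictionary words that match once accents are ignored,
    which covers slips such as ü for ű or o for ö.
    """
    known = {unicodedata.normalize("NFC", w.lower()) for w in dictionary}
    by_base = {}
    for entry in known:
        by_base.setdefault(strip_accents(entry), set()).add(entry)

    suspects = []
    # Normalize first so accented letters are single characters for the regex.
    for word in re.findall(r"[^\W\d_]+", unicodedata.normalize("NFC", text)):
        lower = word.lower()
        if lower not in known:
            suspects.append((word, sorted(by_base.get(strip_accents(lower), set()))))
    return suspects


if __name__ == "__main__":
    # Hypothetical mini-dictionary; "tüz" and "konyv" stand in for OCR slips.
    words = ["a", "és", "tűz", "könyv"]
    for word, suggestions in flag_suspects("a tüz és a konyv", words):
        print(word, "->", suggestions)
```

Words with an empty suggestion list are either genuine OCR garbage or simply missing from the word list, so they still need a human look.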
