Should I OCR this document? Thread poster: Mark Connolly
I have turned down jobs before because clients told me not to OCR a PDF document. This time I accepted the job before getting the instructions, and work is thin on the ground. It turns out the document is full of tables that OCR beautifully.
I always OCR without a format; could I get away with it?
Kevin Fulton, United States, German to English | I don't see why not | Jun 3, 2018
To be honest, I don't understand why a client might not want you to use OCR on a file. After all, how you produce a usable intermediate (i.e. working) document is your business.
However, using OCR isn't always trouble-free.
One problem with using OCR on PDF files is that all sorts of artifacts including hidden tags can be embedded in the converted file which then interfere with successful formatting in Word, for example. There are various utilities available, such as Code Zapper, or TransTools Suite which "clean up" such artifacts and help regularize fonts and spacing. Another issue is faulty character recognition – although a character or word may appear legible to the human eye, it might be misinterpreted during the OCR process. Again, a careful reading of the output document should help eliminate such errors.
If you are using a CAT tool, you don't have many alternatives to OCR, apart from INFIX, which reproduces a translated PDF file after you have worked on the text in a CAT tool.
Using OCR to reproduce tables makes perfect sense to me, assuming the process doesn't introduce spacing or formatting errors.
You might ask the client about the instruction not to use OCR. It's possible that the client uses DTP and hidden embedded tags interfere with the process. As mentioned above, there are utilities that remedy this issue.
finnword1, United States, English to Finnish + ... | ignorant clients | Jun 3, 2018
Ask them to send you the material in text or Word document or to OCR the material themselves.
Germaine, Canada, English to French + ... | Agree with Kevin | Jun 3, 2018
Using Adobe Acrobat (Standard), you can simply "save as" the PDF in one of the various formats offered, including Word and Excel, and most of the time there's little word processing left to do. OCR (EN+FR) is also included, should the PDF be a scan.
Sure, the software is pricey at first, but upgrades (and you don't have to buy each and every one) are more affordable. See it as an investment. You'll be surprised by all you can do with it (and even more with Adobe Acrobat Pro). I started with version 4 and I am now using version X. I have never regretted buying it. It has been worth every cent!
P.S.: should you buy it, don't forget to install the pdf printer. You'll get better pdfs by "printing" your Word/Excel documents than "saving as".
Tom in London, United Kingdom, Member (2008), Italian to English | I agree with F | Jun 4, 2018
finnword1 wrote:
Ask them to send you the material in text or Word document or to OCR the material themselves.
Finnword's suggestion is the correct one.
LEXpert, United States, Member (2008), Croatian to English + ... | Be careful what you wish for | Jun 4, 2018
Tom in London wrote:
finnword1 wrote:
Ask them to send you the material in text or Word document or to OCR the material themselves.
Finnword's suggestion is the correct one.
That often results in a slipshod effort yielding tag soup and horrible segmentation that costs you more time than it saves, especially since, if the client is going to go through the trouble of OCRing for you, they're going to figure that they might as well run it through their CAT tool and knock your price down a bit. 9 times out of 10, I can do a much better job of OCRing a file than the client can.
Definitely true! | Jun 4, 2018
LEXpert wrote:
9 times out of 10, I can do a much better job of OCRing a file than the client can.
I always wonder why clients - particularly agencies - "lie" about having done (horrible) OCR work.
They send me a table with a sea of typos, I ask them for the original file, and they say it's all they've got.
Later they ask me to proofread a laid-out PDF to check whether they've put all my translations in the right places.
DZiW (X), Ukraine, English to Russian + ... | extra work = extra charge | Jun 4, 2018
Sometimes I use FreeTM.com (free WordFast Anywhere), which can convert PDFs that aren't too complicated or bizarre and send them to my email box; otherwise I have to use FineReader. Either way, I do charge for this, because it takes more time and effort to make the text OK.
Most clients know very little even about the final translation, so many simply aren't aware of what an editable document is, what types of PDF/DJVU exist, or why OCR/DTP is needed at all. In this respect, translators work as mentors and educators, teaching the ABCs.
In short, clients don't understand why they should pay for something they didn't ask for. When I had a similar issue and asked for an editable copy, my client insisted the file must stay intact and very reluctantly sent me a password to unprotect the PDF. I had to explain to him again that a scanned PDF is no different with or without a password, because it is just a set of images with no text. He was surprised and wondered whether translation involves reading from a hard copy or from the screen. I was ready to cancel the deal when he suddenly replied that he understood the problem: he could only view the file as photos, without being able to select words or add comments... Finally he sent me the original DOC, and once more he was dumbfounded when I asked which final format was required: DOC, PDF or some other... Since there were charts and I didn't want to get into explaining ZIP/RAR, I sent him a DOC, an RTF, and a searchable PDF... He was puzzled and asked whether he had to pay threefold :)
Why, I believe it's much better than "a plain DOC file without tables and graphics", which turned out to be a DOC with scanned handwriting.