Poll: Who should be in charge of using an OCR tool to prepare the source text for translation?
Thread poster: ProZ.com Staff
This forum topic is for the discussion of the poll question "Who should be in charge of using an OCR tool to prepare the source text for translation?".
This poll was originally submitted by Amanda DesJardins.
I can't think of anything offensive to say about this poll.
If you have a problem with that, please do have the courtesy to take me to task publicly or privately rather than going behind my back and getting a moderator to do it for you.
And before I get deleted again for breach of rule 645.4(b)(ii): The project manager, of course.
neilmac Spain Local time: 21:03 Spanish to English + ... I can, but won't | May 24, 2016
Chris S wrote:
I can't think of anything offensive to say about this poll.
If you have a problem with that, please do have the courtesy to take me to task publicly or privately rather than going behind my back and getting a moderator to do it for you.
And before I get deleted again for breach of rule 645.4(b)(ii): The project manager, of course.
I invariably find your comments amusing and in general think we need a bit more "offensiveness" and calling out of BS on the site.
My issue with this particular poll is that it assumes too many things. For example, that source texts should need to be OCR'd in the first place.
When working through an agency, the PM.
When working with an end-customer, the translator, so that you have control over how good and translator-friendly the OCR output is. And it's an extra item to charge or include in your fee.
Philippe
Post removed: This post was hidden by a moderator or staff member because it was not in line with site rule
Ideally someone who understands the language...
Preferably the project manager, but if the result is going to be a dog's dinner of garbled Greek and mangled formatting, then I would rather do it myself.
Or simply translate from the PDF and accept that OCR cannot cope with everything.
Danish has three extra letters compared with English, and even some of the ordinary brackets and other things get mangled by OCR, so it may be useless anyway. I dread to think what it makes of languages like Czech or Polish.
Of course, if I do the OCR, I charge for my time!
One client actually re-typed 4000 words for me, and sent me the first three pages for approval (and so I could get started) while she typed the rest. Now THAT was a person who understood the problem. There were very few typos, AND she paid my top rate for the translation. I'm not allowed to name her here, but Hola! I still remember you!
Still chuckling about your comments yesterday, Chris! At least you were in good company - it is not every day that Jack's comments get deleted.
[Edited at 2016-05-24 09:27 GMT]
Theory vs. practice | May 24, 2016
While in theory it's the PM who should perform OCR, in practice I have yet to see a PM (or PM's technician) who would use OCR tools properly. In fact, I've been toying with the idea of offering a course in OCR to translation agencies.
Actually, just a couple of days ago a PM proudly sent me a Cyrillic text OCRed into the Latin alphabet.
[Edited at 2016-05-24 09:45 GMT]
Thomas Pfann United Kingdom Local time: 20:03 Member (2006) English to German + ...
In those rare cases where an OCR tool is needed to prepare the source text for translation, it should be done by whoever gets paid to do it.
Muriel Vasconcellos (X) United States Local time: 12:03 Spanish to English + ... The PM - or just skip it | May 24, 2016
If the OCR tool is sophisticated and the PM knows how to use it and cleans up the text, then it's a blessing to have an electronic copy to work with, but I've been faced with many OCR'd texts that were impossibly chopped up, with every few words clumped into separate text boxes. Plus, the margins and indents are almost always weird. Much as I hate PDFs, there are times when they would be easier than working with OCR output.
Just this month I had a 74-page OCR and my client arranged to have it digitized for me - I assume, using OCR. So, if the client knows how to do it, maybe that's even better, as they will gain some respect for the challenges we have to face.
Other - open for negotiation | May 24, 2016
It depends on who is better equipped to do it. We all want to have the best possible outcome.
One agency I work for has spared no money in getting the best OCR software available, and their PM is skilled at making it work. Now and then I find a couple of typos in, say, ten pages, and she'll admit they got it in hard copy, scanned it, and did OCR. To me, it looked like an original DOC file.
The most typical OCR flag is mixing "rn" (RN) and "m" (M) when a spell checker will accept both as valid words, e.g. gaRNer and gaMer.
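A rough sketch of how that "rn"/"m" check could be automated, since a spell checker alone will not catch it. The function name `rn_m_confusables` and the tiny word list are made up for illustration; a real check would load a full dictionary for the source language.

```python
def rn_m_confusables(words, lexicon):
    """Return (word, alternative) pairs where swapping "rn" and "m"
    at a single position turns the word into another lexicon word."""
    hits = []
    for word in words:
        lower = word.lower()
        candidates = set()
        # Swap one occurrence at a time, in both directions.
        for src, dst in (("rn", "m"), ("m", "rn")):
            start = lower.find(src)
            while start != -1:
                candidates.add(lower[:start] + dst + lower[start + len(src):])
                start = lower.find(src, start + 1)
        for alt in sorted(candidates):
            if alt != lower and alt in lexicon:
                hits.append((word, alt))
    return hits


# Hypothetical mini-lexicon, for illustration only.
LEXICON = {"garner", "gamer", "modern", "modem", "corner", "comer"}

sample = "The gamer bought a modern modem".split()
flagged = rn_m_confusables(sample, LEXICON)
for word, alt in flagged:
    print(f"check against the scan: {word!r} could be {alt!r}")
```

Every flagged word is valid either way, so the only way to settle it is to look at the scanned image.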
Some other clients have worse OCR software than mine, so I do it. No point in charging for it, because it's a relatively quick process that can be done in the background... while my computer (Pentium D - 2.8 GHz) is supposedly NOT a speed demon, and runs under Windows XP. From what I've seen so far, doing it with an i5 under Windows 10 should be an uphill drag.
For direct clients, I know that it will be better for me if I do it, as part of the job.
The MAJOR problem in such cases is not OCR, but scanning!
Now and then I hear a lawyer's secretary rejoicing over the phone, saying "don't know why, but our scanner is sooo much faster today, that I'm sending you a PDF instead of a messenger with that 60-page contract". Whenever I hear something like this, I already know what I'll be getting... a PDF scanned at 72 DPI. On the Acrobat Reader screen I'll see unreadable spots that a Bulgarian woman would call "kukunikas" - no idea whether she made up this word, or if it means anything in her language.
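To put numbers on why a 72 DPI scan is hopeless, here is a back-of-the-envelope sketch comparing it with a 600 dpi scan. The 10-point type size is an assumption (contracts vary); the only fixed facts used are that a typographic point is 1/72 inch and that a US letter page is 8.5 x 11 inches.

```python
POINTS_PER_INCH = 72  # one typographic point is 1/72 inch


def glyph_height_px(type_size_pt, scan_dpi):
    """Approximate pixel height of a line of type at a given scan resolution."""
    return type_size_pt * scan_dpi / POINTS_PER_INCH


def page_pixels(width_in, height_in, scan_dpi):
    """Pixel dimensions of a scanned page."""
    return round(width_in * scan_dpi), round(height_in * scan_dpi)


TYPE_SIZE_PT = 10  # assumed body text size, not taken from any real contract

for dpi in (72, 600):
    width_px, height_px = page_pixels(8.5, 11, dpi)  # US letter page
    print(f"{dpi} dpi: page {width_px} x {height_px} px, "
          f"10 pt type about {glyph_height_px(TYPE_SIZE_PT, dpi):.0f} px tall")
```

Under that assumption, 10-point text is only about 10 pixels tall at 72 DPI, so there is very little detail left for any OCR engine to work with.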
Have you ever seen the reverse side of a generic maritime bill of lading? It contains about 5,000 words within one letter-sized page! I scanned it at 600 dpi, and OCR via OmniPage was 100% perfect. I found each and every mistake also present in the original text.
Mario Chavez (X) Local time: 15:03 English to Spanish + ... Say it ain't so! | May 24, 2016
Chris S wrote:
I can't think of anything offensive to say about this poll.
If you have a problem with that, please do have the courtesy to take me to task publicly or privately rather than going behind my back and getting a moderator to do it for you.
And before I get deleted again for breach of rule 645.4(b)(ii): The project manager, of course.
Welcome to the club, Chris. Telling someone face to face (or by other available means, like the phone, letter, or email) what we think of him/her or his/her statements is not only honest. It's good manners.
Mario Chavez (X) Local time: 15:03 English to Spanish + ... Extra item to charge | May 24, 2016
Philippe Etienne wrote:
When working through an agency, the PM.
When working with an end-customer, the translator, so that you have control over how good and translator-friendly the OCR output is. And it's an extra item to charge or include in your fee.
Philippe
In agreement.
Most times, I do the OCR for free as part of the whole enchilada because I have an established relationship with the client.
Other times, I have to persuade the customer or project manager not to use OCR to provide me with the text because a) I can do it better, b) I can type it faster and better and/or c) I need the native (original) files. This scenario happens with PDF files and some well-meaning clients (or PMs) think they're doing you a favor by OCR'ing it for you.
Mario Chavez (X) Local time: 15:03 English to Spanish + ... OCR technologies are a mixed bag | May 24, 2016
If the document is in PDF format and the layout is very simple (one column, few if any text boxes or lists), then I'll let the PM, client or their grandma send me an OCR'd copy to work with. Otherwise I politely refuse their misguided efforts.
OCR technologies look like magic to the uninitiated or to whoever doesn't know how character recognition works, let alone when it is applied to foreign languages and complex layouts. By complex layout I mean anything more than one column and two typefaces.
Edward Potter Spain Local time: 21:03 Member (2003) Spanish to English + ... Not a big fan of OCRing | May 24, 2016
I'm a touch typist and I go quite fast with a good old hard copy of my PDF.
Many times a CAT slows me down since I have to double check what automatically gets put in my target field. Add the OCR manipulation of the text, and I lose even more time. And then, there are the inevitable defects from the OCR conversion.
OCR has its place, but more often than not it doesn't do me a lot of good.
Katrin Bosse (X) Germany Local time: 21:03 Dutch to German + ...
Thomas Pfann wrote:
In those rare cases where an OCR tool is needed to prepare the source text for translation, it should be done by whoever gets paid to do it.
It's a job and it has to be done correctly so - yes!