Recommendations for creating glossary from websites? Thread poster: Miranda Drew
Miranda Drew | Italy | Local time: 09:40 | Member (2009) | Italian to English
What programs or software or AI (free or for pay) would you recommend for creating a glossary from an existing bilingual website?
I'm going to use memoQ for the translation of a Word document, and the client has a bilingual website. Any suggestions on the easiest way to make a glossary? The client has a lot of technical terms in their business.
Mario Chávez | United States | Local time: 03:40 | Member (Jun 2024) | English to Spanish + ... | Skeptical about terminology extraction but... | Aug 15
Thanks to a phenomenon called polysemy, most languages assign more than one meaning to a single term. I used memoQ for complex technical translations of CAD software intended for architects and others in the construction industry. I had to make several entries for a single term because the meaning and context were different. I've never used terminology extraction tools; instead, I rely on my expertise in the field and a disciplined instinct, reading and comparing bilingual documents to see how a term was rendered in the foreign language.
Your client's bilingual website may be a pain to navigate, and it may have been translated by different people who used different terms for the same thing. Does your client want you to stick closely to the translations on their bilingual website? Do you have leeway to use better industry terms? Is your project's volume large enough to warrant building a client-specific termbase? I love memoQ's features for building and maintaining termbases, by the way, and I exploit them as much as possible. Another question to ponder: do you have enough time to research terms?
Even if you use a terminology extraction tool (free or paid), I bet you'll have a time-consuming task ahead of you. I wouldn't use any AI application for term extraction at all.
Miranda Drew wrote:
What programs or software or AI (free or for pay) would you recommend for creating a glossary from an existing bilingual website?
I'm going to use memoQ for the translation of a Word document, and the client has a bilingual website. Any suggestions on the easiest way to make a glossary? The client has a lot of technical terms in their business.
Miranda Drew | Italy | Local time: 09:40 | Member (2009) | Italian to English | TOPIC STARTER | Thanks for mansplaining that | Aug 15
I've been a translator for 20 years. I know how to check terminology. Your answer is extremely condescending and completely useless.
Jorge Payan | Colombia | Local time: 02:40 | Member (2002) | German to Spanish + ...
Miranda Drew wrote:
What programs or software or AI (free or for pay) would you recommend for creating a glossary from an existing bilingual website?
I'm going to use memoQ for the translation of a Word document, and the client has a bilingual website. Any suggestions on the easiest way to make a glossary? The client has a lot of technical terms in their business.
I would use HTTrack or a similar tool to download the two versions of your customer's website to my PC.
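HTTrack mirrors the whole site. Whichever way the pages are downloaded, the aligner needs their visible text rather than the markup. A minimal Python sketch using only the standard library, assuming the useful text sits in title, heading, paragraph, list-item and table-cell tags; ParagraphExtractor, extract_paragraphs and fetch_paragraphs are hypothetical helper names, not part of HTTrack:

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class ParagraphExtractor(HTMLParser):
    """Collects the text of block-level elements, ignoring scripts and styles."""

    BLOCK_TAGS = {"title", "h1", "h2", "h3", "h4", "p", "li", "td", "th"}
    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True: entities such as &amp; arrive decoded
        self.paragraphs = []
        self._pending_text = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag in self.BLOCK_TAGS:
            self._emit_block()  # a new block starts: close the text collected so far

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1
        elif tag in self.BLOCK_TAGS:
            self._emit_block()

    def handle_data(self, data):
        if not self._skip_depth:
            self._pending_text.append(data)

    def _emit_block(self):
        """Store the buffered text as one paragraph, with whitespace collapsed."""
        text = " ".join("".join(self._pending_text).split())
        if text:
            self.paragraphs.append(text)
        self._pending_text = []


def extract_paragraphs(html):
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    parser._emit_block()  # text left after the last block tag
    return parser.paragraphs


def fetch_paragraphs(url):
    """Download one page and return its paragraphs (no crawling, no JavaScript)."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return extract_paragraphs(response.read().decode(charset, errors="replace"))
```

Pages with scraping protection, or whose text is rendered by JavaScript, will come back empty or incomplete with this approach, which is where a full crawler such as HTTrack or a browser-based tool earns its keep.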
Then, I would align both versions and create a TM in TMX format.
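Dedicated alignment tools handle sentences that were split or merged in translation. Purely to show what the resulting TMX file looks like, here is a naive sketch that pairs paragraphs one-to-one by position, which only holds when both language versions have exactly the same structure. The function names are hypothetical; the header attributes are the ones TMX 1.4 lists as required.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this namespaced name as the xml:lang attribute TMX expects.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def align_by_position(source_paras, target_paras):
    """Pair paragraphs 1:1 by position (naive: breaks if one page has extra blocks)."""
    if len(source_paras) != len(target_paras):
        raise ValueError(
            f"paragraph counts differ ({len(source_paras)} vs {len(target_paras)}); "
            "use a real alignment tool instead"
        )
    return list(zip(source_paras, target_paras))


def build_tmx(pairs, src_lang, tgt_lang):
    """Return an ElementTree with one <tu> per aligned pair."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "glossary-sketch", "creationtoolversion": "0.1",
        "segtype": "paragraph", "o-tmf": "none", "adminlang": "en",
        "srclang": src_lang, "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for src, tgt in pairs:
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, src), (tgt_lang, tgt)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.ElementTree(tmx)


# Usage (paragraph lists and file name are placeholders):
# pairs = align_by_position(italian_paragraphs, english_paragraphs)
# build_tmx(pairs, "it", "en").write("site.tmx", encoding="utf-8", xml_declaration=True)
```

Raising an error on mismatched counts is deliberate: silently pairing shifted paragraphs would poison every term pulled from the TM afterwards.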
Finally, I would use Synchroterm to extract and curate bilingual terms and expressions from that TM, producing the desired glossary.
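Synchroterm's internal method isn't described in this thread, so the following is not how it works. It is a minimal sketch of one classic statistical idea behind bilingual term extraction from a TM: words that keep turning up in the same aligned segment pairs are probably translations of each other, scored here with the Dice coefficient, 2 × (segments containing both) / (segments with the source word + segments with the target word). dice_candidates and its thresholds are hypothetical, and splitting on \w+ assumes space-separated languages; it will not segment Japanese, for example.

```python
import re
from collections import Counter
from itertools import product

WORD = re.compile(r"\w+")  # assumes words are separated by spaces/punctuation


def words(segment):
    """Distinct lowercase words in one segment (presence, not counts)."""
    return set(WORD.findall(segment.lower()))


def dice_candidates(pairs, min_count=2, min_score=0.5):
    """Score source/target word pairs by how consistently they co-occur in aligned TUs."""
    src_freq, tgt_freq, joint = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        s_words, t_words = words(src), words(tgt)
        src_freq.update(s_words)
        tgt_freq.update(t_words)
        joint.update(product(s_words, t_words))  # every cross-language word pair in this TU
    results = []
    for (s, t), both in joint.items():
        if both < min_count:
            continue  # a single co-occurrence proves nothing
        score = 2 * both / (src_freq[s] + tgt_freq[t])
        if score >= min_score:
            results.append((s, t, round(score, 2)))
    return sorted(results, key=lambda r: (-r[2], r[0], r[1]))


if __name__ == "__main__":
    demo = [
        ("La valvola è chiusa", "The valve is closed"),
        ("Aprire la valvola", "Open the valve"),
        ("La pompa è spenta", "The pump is off"),
        ("Controllare la pompa", "Check the pump"),
    ]
    print(dice_candidates(demo, min_score=0.9))
    # [('la', 'the', 1.0), ('pompa', 'pump', 1.0), ('valvola', 'valve', 1.0), ('è', 'is', 1.0)]
```

Note that la/the and è/is score just as well as the real terms: raw statistics surface candidates, and the curation step Jorge mentions (plus a stopword list) is what turns them into a glossary.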
So far, I haven't found any AI-driven tool specifically for this bilingual term extraction process; I'm sure someone is working on it.
[Edited at 2024-08-16 05:06 GMT]
Philippe Locquet | Portugal | Local time: 08:40 | Member (2013) | English to French + ... | Synchroterm or Copilot | Aug 16
Jorge Payan wrote:
Finally, I would use Synchroterm to extract and curate bilingual terms and expressions from that TM, producing the desired glossary.
Synchroterm is very good at terminology extraction. I get the best results when I use it on bitexts produced by Align Factory (from the same developers, Terminotix). Align Factory can also align directly from a URL if the website doesn't have scraping protection.
Jorge Payan wrote:
So far, I haven't found any AI-driven tool specifically for this bilingual term extraction process; I'm sure someone is working on it.
AI: You don't necessarily need a tool to do this. If both source and target are found on the same page, you can write a prompt asking Copilot to extract source and target terms from the page. Open Copilot and the page as I describe in this video: https://youtu.be/u4nROnmnIxI?si=YAMmP5yQtwif-HhV You will need to agree to work in no-memory mode (Copilot shenanigans...), then ask it to extract the terms and format them as a Markdown table, which you can copy and paste into a table. There is a limit, though: it will probably handle only 20 or so term pairs at a time. If you need more, and your prompt works, try "Search GPT" (paid).
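Pasting the chatbot's Markdown table straight into a table works for one batch. If you collect several batches of 20 or so pairs, a small sketch like this turns each pasted table into tab-delimited text you can append to one file for a termbase import. It assumes a simple table with a header row and no escaped | characters inside cells; parse_markdown_table and to_tsv are hypothetical names.

```python
def parse_markdown_table(text):
    """Split a pasted Markdown table into (header, rows), skipping the |---| line."""
    rows = []
    for line in text.strip().splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # ignore chatbot chatter before or after the table
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if all(set(cell) <= set("-: ") for cell in cells):
            continue  # separator row such as |---|:---:|
        rows.append(cells)
    if not rows:
        return [], []
    header, *body = rows
    return header, body


def to_tsv(header, body):
    """Join cells with tabs, one line per row, ready to save as a .txt/.tsv file."""
    return "".join("\t".join(row) + "\n" for row in [header, *body])
```

For later batches you would keep only the body rows, so the header line appears once in the combined file.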
Hope this helps
Mario Chávez | United States | Local time: 03:40 | Member (Jun 2024) | English to Spanish + ... | No need to be unprofessional | Aug 16
Have you considered for a moment that the first sentence in my comment was meant to introduce the topic, not to address you personally?
At no time was my comment addressing your knowledge or lack thereof. I find your reaction unprofessional and uncalled for.
MC
Miranda Drew wrote:
I've been a translator for 20 years. I know how to check terminology. Your answer is extremely condescending and completely useless.
Miranda Drew | Italy | Local time: 09:40 | Member (2009) | Italian to English | TOPIC STARTER | You talk down to me and I'm unprofessional? | Aug 16
Mario Chávez wrote:
Have you considered for a moment that the first sentence in my comment was meant to introduce the topic, not to address you personally?
At no time was my comment addressing your knowledge or lack thereof. I find your reaction unprofessional and uncalled for.
MC
I didn't ask for a general treatise on language and translation. I asked a specific question about specific tools. You decided to mansplain things that I think I learned in kindergarten (wow, words can have more than one meaning?), give me unsolicited advice, and not provide me with anything remotely near an actual answer to my question. You have the right to post whatever you want, but I've dealt with this kind of condescending behavior from men my whole life and I'm not going to be quiet about it anymore, even if that makes me 'unprofessional'.
Dan Lucas | United Kingdom | Local time: 08:40 | Member (2014) | Japanese to English
Philippe Locquet wrote:
If you need more, and your prompt works, try "Search GPT" (paid). Hope this helps
It does, thank you. My prompt engineering is not very good, so this is very useful.
Do you think it would be possible to use an LLM and two PDFs to extract terms?
I haven't found Synchroterm useful for Japanese so far.
Regards,
Dan
Miranda Drew | Italy | Local time: 09:40 | Member (2009) | Italian to English | TOPIC STARTER
That looks useful; I'll check it out, thanks.
Dan Lucas | United Kingdom | Local time: 08:40 | Member (2014) | Japanese to English
This is interesting. Unfortunately, I tried it several times with two one-page PDF documents, one in Japanese and one in English, and each time it was unable to process the job. The error message was not informative. Perhaps it works better with European languages?
But thanks again,
Dan
Philippe Locquet | Portugal | Local time: 08:40 | Member (2013) | English to French + ...
Dan Lucas wrote:
Philippe Locquet wrote:
If you need more, and your prompt works, try "Search GPT" (paid). Hope this helps
It does, thank you. My prompt engineering is not very good, so this is very useful.
Do you think it would be possible to use an LLM and two PDFs to extract terms?
I haven't found Synchroterm useful for Japanese so far.
Regards,
Dan
If you wish to use AI for this task, something like ChatGPT should work. To engineer your prompt, first tell the robot what you want from it and that it will have to wait for you to upload the two files it is to work on. Then pop both files in, and Bob's your uncle!
Hope it works (it should, unless ChatGPT complains about PDFs...).
I said ChatGPT, but Claude is also very good with text. They need slightly different prompt styles, but with some tweaking you should be OK.
Bests,
Philippe
Samuel Murray | Netherlands | Local time: 09:40 | Member (2006) | English to Afrikaans + ...
Miranda Drew wrote:
What programs or software or AI (free or for pay) would you recommend for creating a glossary from an existing bilingual website?
Do you mean a glossary of terms or a translation memory?
I'm not aware of any tool that can reliably create a list of words in the source language that are likely to be "terms" and then find their translations in the target language. In fact, the biggest problem with what you're proposing is how difficult it is to create a list of source-language terms. I've tried some programs that do this in the past, but the results were dismal. These tools assume either that (a) frequently occurring words are "terms" or that (b) highly unique words are "terms". This approach may work in languages with compound nouns, but not in, e.g., English. Do you think it'll work in Italian?
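To make the two heuristics concrete, here is a toy Python sketch of both; it is a reconstruction of the general idea, not the code of any particular tool. Heuristic (a) ranks one- and two-word sequences by raw frequency; heuristic (b) ranks them by how much more often they occur in the client's text than in a general reference text. The sample sentences and the +1 smoothing are arbitrary assumptions.

```python
import re
from collections import Counter


def ngrams(text, n_max=2):
    """Count every 1- to n_max-word sequence in lowercased text."""
    tokens = re.findall(r"\w+", text.lower())
    grams = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams[" ".join(tokens[i:i + n])] += 1
    return grams


def by_frequency(domain_text, top=5):
    """Heuristic (a): the most frequent n-grams are 'terms'."""
    return [g for g, _ in ngrams(domain_text).most_common(top)]


def by_uniqueness(domain_text, reference_text, top=5):
    """Heuristic (b): n-grams much more frequent here than in general text are 'terms'."""
    dom, ref = ngrams(domain_text), ngrams(reference_text)
    dom_total, ref_total = sum(dom.values()), sum(ref.values())

    def ratio(g):
        # relative frequency in the client's text / smoothed relative frequency in the reference
        return (dom[g] / dom_total) / ((ref[g] + 1) / (ref_total + 1))

    return sorted(dom, key=lambda g: (-ratio(g), g))[:top]


if __name__ == "__main__":
    domain = "The ball valve is closed. Open the ball valve. The ball valve is made of steel."
    reference = "The door is closed. Open the window. The table is made of wood."
    print(by_frequency(domain, top=1))              # ['the']
    print(by_uniqueness(domain, reference, top=4))  # ['ball', 'ball valve', 'the ball', 'valve']
```

Frequency alone promotes "the", and uniqueness ranks the fragment "the ball" level with the real term "ball valve", which is the kind of noise that makes these lists so laborious to clean up for English.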
I have found that the best way to extract terminology from a bilingual website is manually. In other words, create a TM from the website, load that TM into the CAT tool, look up terms regularly (especially at the start), and add them to the glossary based on the matches from the TM.
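Once the TM is loaded, the CAT tool's concordance search does this lookup for you. For readers curious about what it amounts to, here is a minimal Python sketch that reads the translation units out of a TMX file and returns those whose source segment contains a term. It assumes a two-language TMX whose language codes match the ones you pass in exactly (a TU tagged it-IT will not match "it"), and it simplifies inline tags by dropping their content; tmx_pairs, seg_text and concordance are hypothetical names.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:lang under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def seg_text(seg):
    """Segment text without the native codes stored inside inline tags such as <ph>."""
    parts = [seg.text or ""]
    for child in seg:
        parts.append(child.tail or "")  # keep the text that follows each inline tag
    return "".join(parts)


def tmx_pairs(root, src_lang, tgt_lang):
    """(source, target) text for every <tu> that has both languages."""
    src_lang, tgt_lang = src_lang.lower(), tgt_lang.lower()
    pairs = []
    for tu in root.iter("tu"):
        texts = {}
        for tuv in tu.findall("tuv"):
            # TMX 1.4 uses xml:lang; older files may carry a plain lang attribute
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            texts[lang] = seg_text(seg) if seg is not None else ""
        if src_lang in texts and tgt_lang in texts:
            pairs.append((texts[src_lang], texts[tgt_lang]))
    return pairs


def concordance(pairs, term):
    """Pairs whose source side contains the term (case-insensitive substring match)."""
    needle = term.lower()
    return [(src, tgt) for src, tgt in pairs if needle in src.lower()]


# Usage (file name is a placeholder):
# pairs = tmx_pairs(ET.parse("site.tmx").getroot(), "it", "en")
# for src, tgt in concordance(pairs, "valvola"):
#     print(src, "|", tgt)
```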
I wonder if it would be possible to ask an AI tool to come up with a list of words in the source language that are "likely to be terms". You can then add that list to your glossary (which would be useful even if the glossary entries have no target text).
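If you do get such a list, from an AI tool or anywhere else, turning it into an importable glossary with empty target fields is straightforward. A minimal sketch, assuming your CAT tool can import a tab-delimited file with one column per language (memoQ and most others can import delimited text, but check the import dialog for how columns map to languages); the "it"/"en" header labels and the glossary_tsv name are placeholders:

```python
import csv
import io


def glossary_tsv(candidates, src_lang="it", tgt_lang="en", existing=()):
    """Tab-delimited glossary: one row per new candidate, target column left empty.

    Candidates already in `existing`, blank entries and case-insensitive
    duplicates are skipped.
    """
    seen = {term.strip().casefold() for term in existing}
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow([src_lang, tgt_lang])
    for term in candidates:
        key = term.strip().casefold()
        if key and key not in seen:
            seen.add(key)
            writer.writerow([term.strip(), ""])
    return out.getvalue()


# Usage: save the result and import it into the termbase.
# with open("candidates.txt", "w", encoding="utf-8") as f:
#     f.write(glossary_tsv(ai_term_list, existing=terms_already_in_termbase))
```

Skipping terms that are already in the termbase keeps repeated imports from creating duplicate entries.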
[Edited at 2024-08-17 16:35 GMT]