Recommendations for creating glossary from websites?
Thread poster: Miranda Drew
Miranda Drew
Miranda Drew  Identity Verified
Italy
Local time: 09:40
Member (2009)
Italian to English
Aug 15

What programs or software or AI (free or for pay) would you recommend for creating a glossary from an existing bilingual website?

I'm going to use memoQ to translate a Word document, and the client has a bilingual website. Any suggestions on the easiest way to make a glossary? The client has a lot of technical terms in their business.


jules fumaz (X)
annabernardi
 
Mario Chávez
Mario Chávez
United States
Local time: 03:40
Member (Jun 2024)
English to Spanish
+ ...
Skeptical about terminology extraction but... Aug 15

Thanks to a phenomenon called polysemy, most languages record more than one meaning for a single term. I used memoQ for complex technical translations of CAD software intended for architects and others in the construction industries. I had to make several entries for a single term because the meaning and context were different. I've never used terminology extraction tools; instead, I apply my expertise in the field and disciplined instinct, reading and comparing bilingual documents to see how a term was rendered in the foreign language.

Your client's bilingual website may be a pain to navigate, and it may have been translated by different people who used different terms for the same thing. Does your client want you to stick closely to the translations in their bilingual website? Do you have leeway to use better industry terms? Is your project's volume large enough to warrant building a client-specific termbase? I love memoQ's features to build and maintain termbases, by the way, and I exploit them as much as possible. Another question to ponder: do you have enough time to research terms?

Even if you use a terminology extraction tool (free or paid), I bet you'll have a time-consuming task ahead of you. I wouldn't touch any AI application to do any term extraction at all.

Miranda Drew wrote:

What programs or software or AI (free or for pay) would you recommend for creating a glossary from an existing bilingual website?

I'm going to use memoQ to translate a Word document, and the client has a bilingual website. Any suggestions on the easiest way to make a glossary? The client has a lot of technical terms in their business.


Herbspro (X)
 
Miranda Drew
Miranda Drew  Identity Verified
Italy
Local time: 09:40
Member (2009)
Italian to English
TOPIC STARTER
Thanks for mansplaining that Aug 15

I've been a translator for 20 years. I know how to check terminology. Your answer is extremely condescending and completely useless.

Jennifer Levey
Grace Anderson
Emanuele Vacca
Ines Radionovas-Lagoutte, PhD
 
Jorge Payan
Jorge Payan  Identity Verified
Colombia
Local time: 02:40
Member (2002)
German to Spanish
+ ...
My suggestion Aug 16

Miranda Drew wrote:

What programs or software or AI (free or for pay) would you recommend for creating a glossary from an existing bilingual website?

I'm going to use memoQ to translate a Word document, and the client has a bilingual website. Any suggestions on the easiest way to make a glossary? The client has a lot of technical terms in their business.


I would use HTTrack or a similar tool to download the two versions of your customer's website to my PC.

Then, I would align both versions and create a TM in TMX format.

Finally, I would use Synchroterm to extract and curate bilingual terms and expressions from that TM, producing the desired glossary.
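For anyone who wants to inspect the aligned TM before feeding it to an extraction tool, here is a minimal Python sketch that reads segment pairs out of a TMX document. It is not part of HTTrack, Synchroterm or memoQ; the function name `read_tmx_pairs` and the sample content are made up for illustration, and it assumes a plain TMX 1.4-style structure (`tu` / `tuv` / `seg`).

```python
import xml.etree.ElementTree as ET

# xml:lang lives in the XML namespace; some older TMX files use a plain "lang" attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_tmx_pairs(tmx_text, src_lang, tgt_lang):
    """Return (source, target) segment pairs from a TMX document string."""
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.findall("tuv"):
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # Keep only the primary language code ("it-IT" -> "it").
                # itertext() joins the text around and inside inline tags.
                segs[lang.split("-")[0]] = "".join(seg.itertext()).strip()
        if src_lang in segs and tgt_lang in segs:
            pairs.append((segs[src_lang], segs[tgt_lang]))
    return pairs


# Hypothetical sample content, for illustration only.
SAMPLE_TMX = """<tmx version="1.4">
  <header srclang="it"/>
  <body>
    <tu>
      <tuv xml:lang="it"><seg>Valvola di sicurezza</seg></tuv>
      <tuv xml:lang="en"><seg>Safety valve</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="it-IT"><seg>Pompa <bpt i="1"/>centrifuga</seg></tuv>
      <tuv xml:lang="en-GB"><seg>Centrifugal pump</seg></tuv>
    </tu>
  </body>
</tmx>"""

pairs = read_tmx_pairs(SAMPLE_TMX, "it", "en")
for src, tgt in pairs:
    print(src, "=>", tgt)
```

A real export would be read from disk instead of a string, and segments with heavy inline formatting may need extra cleaning before term extraction.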

So far, I haven't found any AI-driven tool specifically for this bilingual term extraction process; I'm sure someone is working on it.

[Edited at 2024-08-16 05:06 GMT]


expressisverbis
Philippe Locquet
 
Jorge Payan
Jorge Payan  Identity Verified
Colombia
Local time: 02:40
Member (2002)
German to Spanish
+ ...
You can also use memoQ for the final step Aug 16

Jorge Payan wrote:

Finally, I would use Synchroterm to extract and curate bilingual terms and expressions from that TM, producing the desired glossary.


Alternatively, see https://helpcenter.memoq.com/hc/en-us/articles/360010378139-Setting-up-and-performing-a-term-extraction-in-memoQ

However, I am not sure if that capability is included in memoQ Pro edition; I know it exists in the PM edition.


Miranda Drew
 
Philippe Locquet
Philippe Locquet  Identity Verified
Portugal
Local time: 08:40
Member (2013)
English to French
+ ...
Synchroterm or Copilot Aug 16

Jorge Payan wrote:
Finally, I would use Synchroterm to extract and curate bilingual terms and expressions from that TM, producing the desired glossary.


Synchroterm is very good at terminology extraction. I get the best results when I use it on "Bitext" files produced by Align Factory (from the same developers, Terminotix). Align Factory also lets you align directly from the URL if the website doesn't have scraping protection.

Jorge Payan wrote:
So far, I haven't found any AI-driven tool specifically for this bilingual term extraction process; I'm sure someone is working on it.


AI: You don't necessarily need a dedicated tool to do this. If both source and target are found on the same page, you can write a prompt asking Copilot to extract source and target terms from the page. Open Copilot and the page as I describe in this video: https://youtu.be/u4nROnmnIxI?si=YAMmP5yQtwif-HhV You will need to agree to work in no-memory mode (Copilot shenanigans...), then ask it to extract the terms and format them as a Markdown table, which will allow you to copy and paste the result into a table. There is a limit, though: it will probably do only 20 or so term pairs at a time. If you need more, and your prompt works, try "Search GPT" (paid).
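If copying and pasting the Markdown table gets tedious over many batches of 20 or so pairs, a small script can turn each reply into tab-separated text, a format most termbase import dialogs can be set up to accept. This is only a sketch: the function names and the sample reply are invented for illustration, and it assumes a simple pipe-delimited table with no escaped pipes inside the cells.

```python
import csv
import io


def markdown_table_to_rows(markdown):
    """Parse a simple pipe-delimited Markdown table into a list of rows."""
    rows = []
    for line in markdown.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip any chat text around the table
        cells = [c.strip() for c in line.strip("|").split("|")]
        # Skip the separator row under the header, e.g. |---|:---:|
        if all(c and set(c) <= set("-: ") for c in cells):
            continue
        rows.append(cells)
    return rows


def rows_to_tsv(rows):
    """Serialize rows as tab-separated text, one term pair per line."""
    out = io.StringIO()
    csv.writer(out, delimiter="\t", lineterminator="\n").writerows(rows)
    return out.getvalue()


# Hypothetical chatbot reply, for illustration only.
REPLY = """Here are the term pairs:

| Italian | English |
|---|---|
| valvola di sicurezza | safety valve |
| pompa centrifuga | centrifugal pump |
"""

rows = markdown_table_to_rows(REPLY)
tsv = rows_to_tsv(rows)
print(tsv)
```

The first row is the header, which can serve as the language columns when importing; every extracted pair still needs a human check before it goes into the termbase.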

Hope this helps


Miranda Drew
Jorge Payan
expressisverbis
 
Mario Chávez
Mario Chávez
United States
Local time: 03:40
Member (Jun 2024)
English to Spanish
+ ...
No need to be unprofessional Aug 16

Have you considered for a moment that the first sentence in my comment was introductory to address the topic, not to address you personally?

At no time was my comment addressing your knowledge or lack thereof. I find your reaction unprofessional and uncalled for.

MC

Miranda Drew wrote:

I've been a translator for 20 years. I know how to check terminology. Your answer is extremely condescending and completely useless.


julfiker1
Wolfgang Schoene
 
Miranda Drew
Miranda Drew  Identity Verified
Italy
Local time: 09:40
Member (2009)
Italian to English
TOPIC STARTER
You talk down to me and I'm unprofessional? Aug 16

Mario Chávez wrote:

Have you considered for a moment that the first sentence in my comment was introductory to address the topic, not to address you personally?

At no time was my comment addressing your knowledge or lack thereof. I find your reaction unprofessional and uncalled for.

MC


I didn't ask for a general treatise on language and translation. I asked a specific question about specific tools. You decided to mansplain things that I think I learned in kindergarten (wow, words can have more than one meaning?), give me unsolicited advice, and not provide anything remotely near an actual answer to my question. You have the right to post whatever you want, but I've dealt with this kind of condescending behavior from men my whole life and I'm not going to be quiet about it anymore, even if that makes me 'unprofessional'.


Jennifer Levey
Ines Radionovas-Lagoutte, PhD
Susanne Toito
 
Dan Lucas
Dan Lucas  Identity Verified
United Kingdom
Local time: 08:40
Member (2014)
Japanese to English
And with files Aug 16

Philippe Locquet wrote:
If you need more, and your prompt works, try "Search GPT" (paid). Hope this helps

It does, thank you. My prompt engineering is not very good, so this is very useful.
Do you think it would be possible to use an LLM and two PDFs to extract terms?
I haven't found Synchroterm useful for Japanese so far.

Regards,
Dan


 
Jean Dimitriadis
Jean Dimitriadis  Identity Verified
English to French
+ ...
SketchEngine Aug 16

SketchEngine does offer bilingual term extraction, including for non-aligned data, without relying on AI:

https://www.sketchengine.eu/guide/bilingual-term-extraction/


Miranda Drew
Dan Lucas
Zea_Mays
 
Miranda Drew
Miranda Drew  Identity Verified
Italy
Local time: 09:40
Member (2009)
Italian to English
TOPIC STARTER
Looks interesting Aug 16

Jean Dimitriadis wrote:

SketchEngine does offer bilingual term extraction, including for non-aligned data, without relying on AI:

https://www.sketchengine.eu/guide/bilingual-term-extraction/


That looks useful, I'll check it out, thanks


 
Dan Lucas
Dan Lucas  Identity Verified
United Kingdom
Local time: 08:40
Member (2014)
Japanese to English
Unsuccessful Aug 16

Jean Dimitriadis wrote:
SketchEngine does offer bilingual term extraction, including for non-aligned data, without relying on AI:
https://www.sketchengine.eu/guide/bilingual-term-extraction/

This is interesting. Unfortunately I tried it with two one-page PDF documents in Japanese and English and it was unable to process the job, several times. The error was not informative. Perhaps it works better with European languages?

But thanks again,
Dan


 
Philippe Locquet
Philippe Locquet  Identity Verified
Portugal
Local time: 08:40
Member (2013)
English to French
+ ...
Chat GPT Aug 16

Dan Lucas wrote:

Philippe Locquet wrote:
If you need more, and your prompt works, try "Search GPT" (paid). Hope this helps

It does, thank you. My prompt engineering is not very good, so this is very useful.
Do you think it would be possible to use an LLM and two PDFs to extract terms?
I haven't found Synchroterm useful for Japanese so far.

Regards,
Dan


If you wish to use AI for this task, something like ChatGPT should work. To engineer your prompt, first tell the robot what you want from it and that it will have to wait for you to upload the two files on which the job is to be executed. Then pop both files in, and Bob's your uncle!

Hope it works (it should, unless ChatGPT complains about PDFs...).
I said ChatGPT, but Claude is very good with text too. They need slightly different prompt styles, but with some tweaking you should be OK.

Bests,
Philippe


Dan Lucas
Zea_Mays
 
Cilian O'Tuama
Cilian O'Tuama  Identity Verified
Germany
Local time: 09:40
German to English
+ ...
ChatGPT... Aug 17

... is hardly a solution to anything, or?

https://www.proz.com/forum/machine_translation_mt/366985-automate_commands_to_chatgpt_windows-page2.html

Be well.

Maybe it improves.


 
Samuel Murray
Samuel Murray  Identity Verified
Netherlands
Local time: 09:40
Member (2006)
English to Afrikaans
+ ...
@Miranda Aug 17

Miranda Drew wrote:
What programs or software or AI (free or for pay) would you recommend for creating a glossary from an existing bilingual website?

Do you mean a glossary of terms or a translation memory?

I'm not aware of any tool that can reliably create a list of words in the source language that are likely to be "terms" and then find their translations in the target language. In fact, the biggest problem with what you're proposing is how difficult it is to create a list of source language terms. I've tried some programs that do this in the past, but the results were dismal. These tools either assume that (a) frequently occurring words are "terms" or (b) highly unique words are "terms". This approach may work in languages with compound nouns, but not in e.g. English. Do you think it'll work in Italian?
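To make assumption (a) concrete, here is a minimal Python sketch of frequency-based candidate extraction. It is not how any particular commercial tool works; the stopword list, the thresholds and the sample text are arbitrary choices for illustration. It counts single words and two-word sequences and keeps those that occur at least twice.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "of", "and", "is", "to", "in", "for", "with"}


def candidate_terms(text, min_count=2, max_len=2):
    """Return (candidate, count) pairs for frequent words and word sequences."""
    # [^\W\d_]+ matches runs of letters, including accented ones.
    words = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            # Drop sequences that start or end with a stopword.
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            counts[" ".join(gram)] += 1
    return [(t, c) for t, c in counts.most_common() if c >= min_count]


# Hypothetical sample text, for illustration only.
TEXT = (
    "The safety valve protects the pump. "
    "Check the safety valve before starting the pump. "
    "The pump must be primed."
)

candidates = candidate_terms(TEXT)
for term, count in candidates:
    print(count, term)
```

Even on this tiny sample the list mixes the useful multi-word candidate ("safety valve") with its fragments ("safety", "valve"), which is the kind of noise that makes manual curation unavoidable.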

I have found that the best way to extract terminology from a bilingual website is manually. In other words, create a TM from the website, load that TM into the CAT tool, and then, at least initially, look up terms regularly and add them to the glossary based on matches from the TM.
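Outside a CAT tool, the same lookup-then-add routine can be sketched in a few lines of Python. This is only an illustration of the idea, not a replacement for a CAT tool's concordance feature; the function name `concordance` and the TM content are made up.

```python
def concordance(pairs, term):
    """Return TM pairs whose source segment contains the term (case-insensitive)."""
    needle = term.casefold()
    return [(src, tgt) for src, tgt in pairs if needle in src.casefold()]


# Hypothetical TM content, for illustration only.
tm_pairs = [
    ("Controllare la valvola di sicurezza.", "Check the safety valve."),
    ("La pompa deve essere adescata.", "The pump must be primed."),
    ("Sostituire la Valvola di sicurezza ogni anno.", "Replace the safety valve every year."),
]

glossary = {}
hits = concordance(tm_pairs, "valvola di sicurezza")
for src, tgt in hits:
    print(src, "=>", tgt)

# After reading the matches, the translator decides on the entry by hand.
glossary["valvola di sicurezza"] = "safety valve"
```

The script only surfaces the matching segments; choosing the target term stays a human decision, as in the manual workflow described above.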

I wonder if it would be possible to ask an AI tool to come up with a list of words in the source language that are "likely to be terms". You can then add that list to your glossary (which would be useful even if the glossary entries have no target text).


[Edited at 2024-08-17 16:35 GMT]


 