Recommendations for creating glossary from websites?
Thread poster: Miranda Drew
Zea_Mays
Zea_Mays  Identity Verified
Italy
Local time: 19:10
English to German
+ ...
which bot and which texts Aug 21

I used Gemini because you don't need an account or a login. It can even be used in anonymous (incognito) mode.
Regarding the input: if you don't use a paid version (where they promise that data is not used for training and/or other purposes), avoid entering any data that could be confidential.
The question here was how to create glossaries from websites, so since they are publicly available and accessible there is no confidentiality concern.

Zea_Mays wrote:

To sum things up, the steps for creating a first draft for a glossary with chat bots like Gemini, ChatGPT etc., are the following:
- Copy and paste the content of the two versions of the webpage into a txt file, one after the other. This way you don't need to bother about images, formatting etc. You can put a label such as "English copy" and "German copy" (or whatever the relevant languages are) before each version.
- Tell the bot what it should do (this is called a "prompt") and then paste both versions into the chat window after your prompt. I just used something like "Please create a glossary extracting all technical terms from the x field from the following copy. There is the English followed by the German version." (where x is the relevant industry). You can specify how the bot should delimit the terms and, if required, definitions, using something like "Use pipes to delimit glossary terms and a colon for the definition." If you don't need definitions, just tell the bot so.
- You'll get a first list, and you can now refine the prompt if needed.
- Then you can tell the bot to continue with term extraction. You can also ask it to add any relevant terms from the industry.
- Create an Excel glossary: First write "Source" in cell A1, "Target" in cell B1 and, if you have definitions, "Definition" in cell C1. Once the list is complete, copy and paste it into the Excel file, defining the delimiters so that the entries are split into the correct columns. https://www.google.de/search?q=excel%20define%20delimiter%20for%20copying
- If needed, remove leading/extra spaces in cells: https://www.google.de/search?q=remove%20leading%20spaces%20cells%20excel
- Refine the glossary.
- If you'd like to use the Excel glossary in a CAT tool, create a term base from it: https://www.google.de/search?q=create%20term%20base%20from%20excel
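
If you prefer a script to the manual copy-and-paste into Excel, the delimiter and leading-space steps above could be sketched roughly as follows in Python. This is only a minimal sketch under assumptions: it assumes the bot returned one entry per line in the form "source | target : definition" (as requested in the example prompt), and the function names parse_glossary_lines and write_glossary_csv are made up for illustration. Check the output by hand, since bots do not always stick to the requested format.

```python
import csv


def parse_glossary_lines(raw_text):
    """Turn 'source | target : definition' lines into (source, target, definition) rows.

    Assumed format: a pipe between the two terms and an optional colon
    before the definition. Lines without a pipe are skipped.
    """
    rows = []
    for line in raw_text.splitlines():
        # Drop surrounding spaces and any leading bullet characters the bot added.
        line = line.strip().lstrip("-*\u2022 \t")
        if "|" not in line:
            continue  # blank line or chat text that is not a glossary entry
        source, rest = line.split("|", 1)
        # Split on the first colon only; the definition may be missing.
        target, _, definition = rest.partition(":")
        rows.append((source.strip(), target.strip(), definition.strip()))
    return rows


def write_glossary_csv(rows, handle):
    """Write a tab-delimited glossary with the header row used above."""
    writer = csv.writer(handle, delimiter="\t")
    writer.writerow(["Source", "Target", "Definition"])
    writer.writerows(rows)
```

To save the result for Excel, open the output file with something like open("glossary.txt", "w", encoding="utf-8-sig", newline="") and pass it to write_glossary_csv; the file name is just an example. A tab-delimited file avoids trouble with commas inside definitions.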

The entire task takes little time (the example above took around 15-20 minutes) and is highly customizable.




[Edited at 2024-08-21 16:10 GMT]


 
Philippe Locquet
Philippe Locquet  Identity Verified
Portugal
Local time: 18:10
Member (2013)
English to French
+ ...
So much more Aug 22

Hans Lenting wrote:

Samuel Murray wrote:

PlusTools is a monolingual extractor, and it extracts only frequently occurring terms. It's rather slow since it's a macro that runs in Word (and naturally it works only on files that you can open in Word).


This morning I learned that they (probably Yves) are working on a standalone version of PlusTools. It's already available for download.


Indeed, Yves has been working on a standalone utility that is also called +Tools. It's an exe file and it is in beta. I'll sum up some of what it does:
_Open a TM (.txt, .tmx) for editing and filtering without loading everything into RAM. It works directly on disk, so it's fast and can open immense TMs: I was able to open a 50-million-TU TM and perform search/replace on it, and it didn't crash!!!
_Open and edit a glossary (.txt, .tbx), same as above
_Align text (copy-paste into boxes, then align)
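
To illustrate the general idea of working on disk instead of in RAM (this is not a description of how +Tools is implemented, which is not documented here), a search/replace on a large plain-text TM can be done one line at a time. The sketch below assumes a text-based TM with one TU per line; the function name stream_replace is hypothetical.

```python
import os
import tempfile


def stream_replace(path, old, new, encoding="utf-8"):
    """Replace `old` with `new` in a large text file, one line at a time.

    Only the current line is held in memory. The result is written to a
    temporary file in the same folder, which then replaces the original.
    Returns the number of lines that were changed.
    """
    changed = 0
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        # newline="" keeps the original line endings untouched.
        with os.fdopen(fd, "w", encoding=encoding, newline="") as dst, \
                open(path, "r", encoding=encoding, newline="") as src:
            for line in src:
                if old in line:
                    line = line.replace(old, new)
                    changed += 1
                dst.write(line)
        os.replace(tmp_path, path)  # swap in the edited copy
    except BaseException:
        # Clean up the temporary file if anything went wrong.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return changed
```

Memory use stays flat regardless of file size, at the cost of needing free disk space for a second copy while the edit runs.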

And it is also a no-headache CAT tool:
_Drag-and-drop a document (docx only)
_Automatic TM creation
_Pretranslate with MT or AI BEFORE segmentation (this helps with MT/AI output because it makes the context available)
_Open a TM/glossary in a secondary window
_Dark mode
_Full control mode
_AI with presets to steer the output (formal, short for subtitling etc.)

All this in a package that would fit on a floppy disk!!! Yep, check the size when you download it!!!

As it is in beta, experienced translators are welcome to try it out and send feedback. Just remember that it's not intended to be a Trados, a memoQ or a Wordfast Pro; it's meant to be a utility tool.

I will release videos on it in the coming weeks.

Best regards




[Edited at 2024-08-22 11:10 GMT]


 
Hans Lenting
Hans Lenting
Netherlands
Member (2006)
German to Dutch
Mac version Aug 23

Will he make a Mac version too?

 
Philippe Locquet
Philippe Locquet  Identity Verified
Portugal
Local time: 18:10
Member (2013)
English to French
+ ...
yes Aug 23

Hans Lenting wrote:

Will he make a Mac version too?


That's his plan, yes.
When is the big question... It will be in the works once the app is out of beta. But it's so tiny that you can run it in a VM with no issues. Of course, if you want to work on massive TMs, you will need all the hardware you have.
The CAT function works with barely anything; I'm sure it would run fine on a Windows XP PC with 1 GB RAM.


 