Recommendations for creating glossary from websites? Thread poster: Miranda Drew
| Monolingual term extraction | Aug 19 |
Also, preliminary monolingual term extraction can yield useful results for determining term candidates and then finding how these terms have been translated on the website. It could have its place here, including for AI text analysis.
For example, one very good online extractor that does not solely rely on statistical analysis (but only for French, English, Italian, Spanish and Portuguese) is Termostat: https://termostat.ling.umontreal.ca (simple free registration needed).
It only accepts TXT files, but there are various ways of downloading pages, converting them to raw text and concatenating the files into a single one as needed, as well as for searching corpora.
The resulting TXT can be renamed to CSV/TSV and opened in a spreadsheet program.
The analysis looks for simple terms (nouns, but can also include verbs, adjectives, etc.) and, importantly, nominal compounds.
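As a rough illustration of the "convert to raw text and concatenate" step, here is a minimal Python sketch using only the standard library. The names `html_to_text` and `build_corpus` are made up for this example, and it assumes the pages have already been saved or downloaded as HTML strings; it is not how Termostat itself works.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text and skips the contents of <script>/<style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside a skipped element.
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Hypothetical helper: return the visible text of one HTML page."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def build_corpus(pages):
    """Hypothetical helper: join several pages into one TXT-ready string."""
    return "\n\n".join(html_to_text(page) for page in pages)
```

The returned string can then be written to a single `.txt` file (for instance with `pathlib.Path.write_text`) and uploaded to the extractor.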
[Edited at 2024-08-19 02:41 GMT] | | | Zea_Mays Italy Local time: 15:16 English to German + ...
I've now asked Gemini to extract glossary terms from the EN and DE versions of a turbine manufacturer's website (just copied the pages' content into a TXT file) to see how it handles technical content. The entire task took around 15 minutes.
Here's what it delivered after the first prompt (this time it added term explanations without being asked to; NB: I am not sure how accurate the EN version of the web page is.):
Pelton-Turbine - Pelton turbine: A type of impulse turbine specifically designed for high-head hydropower applications.
Hochdruck-Freistrahlturbine - High-pressure impulse turbine: Another term for a Pelton turbine, emphasizing its operation under high pressure and with a free jet of water.
Freistrahlturbine - Impulse turbine: A turbine where the energy of the fluid is converted into mechanical work by the impulse of a jet.
Fallhöhe - Head: The vertical distance between the water surface of the reservoir and the turbine runner.
Wirkungsgrad - Efficiency: The ratio of the output power to the input power of a device.
Düsen - Nozzles: Orifices that control the flow of water onto the turbine runner.
Laufrad - Runner: The rotating part of a turbine that is struck by the water jet.
Monoblock - Monoblock: A single, solid piece of material from which a part is machined.
Fräsbearbeitung - Milling: A machining process that uses a rotating cutter to remove material.
Rohblock - Raw block: A piece of material before any machining has been performed.
Spülhub - Scavenging stroke: A mechanism that clears obstructions from a nozzle or pipe.
Trinkwassertauglich - Potable water suitable: Designed and constructed to meet the standards for drinking water.
Wasserkraft - Hydropower: The generation of electricity from the energy of flowing water.
Wasserkraftwerk - Hydroelectric power plant: A facility that converts the energy of flowing water into electricity.
Ingenieurskunst - Engineering: The application of scientific and mathematical principles to design and build structures.
Vertikale Ausführung - Vertical design: A turbine with a vertical axis of rotation.
Horizontale Ausführung - Horizontal design: A turbine with a horizontal axis of rotation.
Fremdkörper - Foreign matter: Impurities or contaminants in a fluid.
Innenregulierten Düsengarnituren - Internally regulated nozzle assemblies: Nozzle assemblies where the nozzle opening is controlled from within the turbine casing.
Außenregulierten Düsengarnituren - Externally regulated nozzle assemblies: Nozzle assemblies where the nozzle opening is controlled from outside the turbine casing.
Maßhaltigkeit - Dimensional accuracy: The precision with which a part conforms to its specified dimensions.
Festigkeitswerte - Strength values: Measures of a material's ability to resist deformation or fracture.
Langlebigkeit - Durability: The ability of a product to withstand wear, tear, and damage over time.
Note: Some terms might have slightly different nuances depending on the specific context. For example, "Wirkungsgrad" could also be translated as "efficiency factor" or "performance coefficient."
Then I asked the LLM to expand the glossary with terms that could be relevant in similar contexts. Here's the output:
Drehzahl - Rotational speed: The number of revolutions of a rotating object per unit of time.
Durchfluss - Flow rate: The volume of fluid passing through a given point per unit of time.
Leistungsbereich - Power range: The range of output power a turbine can produce.
Kavitation - Cavitation: The formation and collapse of vapor bubbles in a liquid, which can cause damage to turbine components.
Hydraulischer Wirkungsgrad - Hydraulic efficiency: The efficiency of converting water energy into mechanical energy.
Elektrischer Wirkungsgrad - Electrical efficiency: The efficiency of converting mechanical energy into electrical energy.
Gesamtwirkungsgrad - Overall efficiency: The product of hydraulic and electrical efficiency.
Regelungssystem - Control system: A system that regulates the operation of a turbine.
Lager - Bearing: A machine element that supports a rotating shaft.
Wellen - Shaft: A rotating machine element that transmits power.
Generator - Generator: A device that converts mechanical energy into electrical energy.
Turbinengehäuse - Turbine casing: The housing that contains the turbine components.
Leitapparat - Guide vanes: Movable vanes that direct the flow of water onto the turbine runner.
Laufspalt - Clearance: The gap between moving and stationary parts of a turbine.
Verschleiß - Wear: The gradual deterioration of a material due to friction or other factors.
Erosion - Erosion: The gradual wearing away of a material by the action of wind, water, or other forces.
Korrosion - Corrosion: The deterioration of a material due to a chemical reaction.
Wartung - Maintenance: The process of preserving equipment in good working condition by performing regular inspections and repairs.
Überholung - Overhaul: A complete inspection and repair of a machine or system.
| | | Miranda Drew Italy Local time: 15:16 Italian to English TOPIC STARTER
Zea_Mays wrote:
I've now asked Gemini to extract glossary terms from the EN and DE versions of a turbine manufacturer's website (just copied the pages' content into a TXT file) to see how it handles technical content. The entire task took around 15 minutes.
That looks great, thank you. I'll try it out. | | | Zea_Mays Italy Local time: 15:16 English to German + ... Glossary translation into EN-IT | Aug 19 |
Miranda Drew wrote:
Zea_Mays wrote:
I've now asked Gemini to extract glossary terms from the EN and DE versions of a turbine manufacturer's website (just copied the pages' content into a TXT file) to see how it handles technical content. The entire task took around 15 minutes.
That looks great, thank you. I'll try it out.
Now I've asked it to provide the list in EN-IT, out of curiosity. Of course all the output needs to be double-checked and edited, but to me the raw results look generally good.
Pelton turbine - Turbina Pelton: Turbina ad impulso utilizzata per impianti idroelettrici con elevate cadute d'acqua.
Impulse turbine - Turbina ad impulso: Turbina che sfrutta l'impulso dell'acqua per generare energia meccanica.
High-pressure turbine - Turbina ad alta pressione: Turbina che opera con elevate pressioni dell'acqua.
Head - Caduta d'acqua: Differenza di quota tra il livello dell'acqua nel bacino e la turbina.
Efficiency - Rendimento: Rapporto tra potenza utile e potenza assorbita.
Nozzle - Ugello: Apertura che controlla il flusso dell'acqua verso la turbina.
Runner - Girante: Parte rotante della turbina colpita dal getto d'acqua.
Monoblock - Monoblocco: Pezzo unico di materiale dal quale viene ricavato un componente.
Milling - Fresatura: Processo di lavorazione che rimuove materiale tramite una fresa rotante.
Raw block - Blocco grezzo: Pezzo di materiale prima della lavorazione.
Scavenging stroke - Colpo di lavaggio: Meccanismo che rimuove ostruzioni dall'ugello.
Potable water suitable - Adatto ad acqua potabile: Progettato e costruito per essere utilizzato con acqua potabile.
Turbine Industry Terms
Rotational speed - Velocità di rotazione: Numero di giri di un oggetto rotante per unità di tempo.
Flow rate - Portata: Volume di fluido che passa attraverso un punto in un'unità di tempo.
Power range - Campo di potenza: Intervallo di potenza che una turbina può produrre.
Cavitation - Cavitazione: Fenomeno di formazione e collasso di bolle di vapore in un liquido, che può danneggiare i componenti della turbina.
Hydraulic efficiency - Rendimento idraulico: Efficienza nella conversione dell'energia dell'acqua in energia meccanica.
Electrical efficiency - Rendimento elettrico: Efficienza nella conversione dell'energia meccanica in energia elettrica.
Overall efficiency - Rendimento complessivo: Prodotto del rendimento idraulico e elettrico.
Control system - Sistema di controllo: Sistema che regola il funzionamento di una turbina.
Bearing - Cuscinetto: Elemento meccanico che sostiene un albero rotante.
Shaft - Albero: Elemento rotante che trasmette potenza.
Generator - Generatore: Dispositivo che converte energia meccanica in energia elettrica.
Turbine casing - Corpo turbina: Involucro che contiene i componenti della turbina.
Guide vanes - Direttrici: Pale mobili che dirigono il flusso dell'acqua verso la girante.
Clearance - Gioco: Spazio tra parti mobili e fisse di una turbina.
Wear - Usura: Deterioramento graduale di un materiale dovuto a frizione o altri fattori.
Erosion - Erosione: Deterioramento graduale di un materiale dovuto all'azione di vento, acqua o altre forze.
Corrosion - Corrosione: Deterioramento di un materiale dovuto a una reazione chimica.
Maintenance - Manutenzione: Processo di conservazione di un'attrezzatura in buone condizioni di funzionamento attraverso ispezioni e riparazioni regolari.
Overhaul - Revisione generale: Ispezione e riparazione completa di una macchina o sistema.
Note: Alcuni termini potrebbero avere sfumature di significato diverse a seconda del contesto specifico.
| | | expressisverbis Portugal Local time: 14:16 Member (2015) English to Portuguese + ...
Samuel Murray wrote:
PlusTools is a monolingual extractor, and it extracts only frequently occurring terms. It's rather slow since it's a macro that runs in Word (and naturally it works only on files that you can open in Word).
This morning I learned that they (probably Yves) are working on a standalone version of PlusTools. It's already available for download. | | | Zea_Mays Italy Local time: 15:16 English to German + ... Creating an Excel file from Gemini output | Aug 20 |
Zea_Mays wrote:
Now I've asked to provide the list in EN-IT out of curiosity. Of course all the output needs to be double checked and edited, but to me the raw results look generally good.
So far, the most hassle-free tools for the task seem to be Large Language Models (ChatGPT, Gemini, Claude...).
You can easily create an Excel file by copying the output and pasting it into an Excel sheet, defining (in this case) hyphens and colons as text delimiters. Time required: a few minutes.
You can tell the LLM which characters to use to delimit terms and, where applicable, definitions (pipes, colons, semicolons etc.).
For the example above, only one line in the final Excel file was wrongly segmented, because there was a hyphen in the term too.
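If you prefer a script to Excel's delimiter dialog, the segmentation can be sketched in Python. This is only a sketch: it assumes every glossary line has the shape "source - target: definition" as in the lists posted above, and the names `parse_glossary` and `to_tsv` are invented for the example. Splitting on a hyphen with a space on each side is one possible way to keep hyphens inside a term from breaking the line.

```python
import csv
import io
import re

# " - " (hyphen with spaces around it) separates source and target;
# the first ": " after that separates target and definition.
LINE_RE = re.compile(r"^(?P<source>.+?) - (?P<target>.+?): (?P<definition>.+)$")


def parse_glossary(text):
    """Hypothetical helper: return (source, target, definition) tuples."""
    rows = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:  # lines that do not fit the pattern are skipped
            rows.append(match.group("source", "target", "definition"))
    return rows


def to_tsv(rows):
    """Hypothetical helper: tab-separated text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["Source", "Target", "Definition"])
    writer.writerows(rows)
    return buffer.getvalue()
```

The resulting TSV text opens directly in a spreadsheet program. A term that itself contains " - " or ": " would still be split in the wrong place, which is one reason to ask the LLM for pipes instead.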
| | | expressisverbis Portugal Local time: 14:16 Member (2015) English to Portuguese + ... Thank you for your explanations, Samuel | Aug 20 |
I have to be honest... I've never tried PlusTools, but it was highly recommended to me once by a colleague.
I've only seen a few videos, which made it look like a good, simple tool. That's why I mentioned it when I read this question.
Regarding ProjectTermExtract, it's true that it only works with Trados itself, but it's still useful for those who are heavy users and need it for terminology purposes, like me.
It does indeed perform monolingual extraction, but to achieve bilingual term extraction, we can either manually align terms after extraction or use other tools like MultiTerm Extract.
Okay, aligning bilingual text can be 'painful', but I don't mind.
ProjectTermExtract does have the capability to extract compound terms. It can identify phrases where words frequently co-occur in the text, which suggests they function as a single term.
For example, in an IT text, it identifies "user interface" as a multi-word term because the words "user" and "interface" frequently co-occur in that text, and both form a concept that is more meaningful together than as individual words.
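To make the co-occurrence idea concrete, here is a toy Python sketch that counts adjacent word pairs and keeps the frequent ones. It is an illustration of the general principle only, not ProjectTermExtract's actual algorithm, which I don't know; the stopword list, the `min_count` threshold and the function name `candidate_bigrams` are all assumptions made up for the example.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real tool would use a much larger one.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "on", "for", "with"}


def candidate_bigrams(text, min_count=2):
    """Return (two-word phrase, count) pairs seen at least min_count times."""
    # Crude tokenisation: lowercase letter runs only, ignoring sentence breaks.
    tokens = re.findall(r"[a-z]+", text.lower())
    pairs = Counter(
        (first, second)
        for first, second in zip(tokens, tokens[1:])
        if first not in STOPWORDS and second not in STOPWORDS
    )
    return [
        (" ".join(pair), count)
        for pair, count in pairs.most_common()
        if count >= min_count
    ]
```

On a short IT text that mentions "user interface" several times, that pair would surface as a candidate, while pairs seen only once would be dropped.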
Lastly, SynchroTerm does not explicitly identify or label a term as a "technical term", but it can extract and identify terms that are frequent or contextually significant within specialized or technical texts.
I believe this makes it a good tool for creating technical glossaries or termbases, where many of the extracted terms are likely to be technical in nature due to the nature of the source text.
I have tried it before myself. | | | expressisverbis Portugal Local time: 14:16 Member (2015) English to Portuguese + ...
https://wordcount.com/keyword-extractor
You can extract keywords from texts and websites, fix grammar, change style, and even get them translated.
It's a very simple and easy-to-use online tool, and one that many are probably familiar with.
I don't know if it's useful or not, but I think it's worth mentioning here. | | | Zea_Mays Italy Local time: 15:16 English to German + ... Workflow for glossary creation with LLMs | Aug 21 |
To sum things up, the steps for creating a first draft for a glossary with chat bots like Gemini, ChatGPT etc., are the following:
- Copy and paste the content of the two versions of the webpage into a TXT file, one after the other. This way you don't need to bother about images, formatting etc. You can label each version, e.g. "English copy" and "German copy" (or whatever the relevant languages are).
- Tell the bot what it should do (this is called a "prompt") and then paste both versions into the chat window after your prompt. I just used something like "Please create a glossary extracting all technical terms from the x field from the following copy. There is the English followed by the German version." (x will be the relevant industry). You can specify how the bot should delimit the terms and, if required, definitions, using something like "Use pipes to delimit glossary terms and a colon for the definition." If you don't need definitions, just tell the bot.
- You'll get a first list, and you can now refine the prompt if needed.
- Then you can tell the bot to continue with term extraction. You can also ask it to add any relevant terms from the industry.
- Create an Excel glossary: Once completed, copy & paste the list into an Excel file (first write "Source" in cell A1, "Target" in cell B1 and, if applicable, "Definition" in cell C1), defining the delimiters for correct segmentation. https://www.google.de/search?q=excel%20define%20delimiter%20for%20copying
- If needed, remove leading/extra spaces in cells: https://www.google.de/search?q=remove%20leading%20spaces%20cells%20excel
- Refine the glossary.
- If you'd like to use the Excel glossary in a CAT tool, create a term base from it: https://www.google.de/search?q=create%20term%20base%20from%20excel
The entire task takes little time (the example above took around 15-20 minutes) and is highly customizable.
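For anyone who would rather script the copy-and-paste parts, the steps above could be sketched in Python roughly as follows. The helper names `build_prompt` and `parse_reply` are hypothetical, the prompt wording simply follows the example prompt quoted in the steps, and the parser assumes the bot really answers with one "source | target: definition" entry per line, which should be checked against the actual reply.

```python
def build_prompt(field, source_text, target_text,
                 source_lang="English", target_lang="German"):
    """Hypothetical helper: assemble the instruction plus both labelled copies."""
    return (
        f"Please create a glossary extracting all technical terms from the {field} "
        f"field from the following copy. There is the {source_lang} followed by "
        f"the {target_lang} version. Use pipes to delimit glossary terms and a "
        "colon for the definition.\n\n"
        f"{source_lang} copy\n{source_text}\n\n"
        f"{target_lang} copy\n{target_text}\n"
    )


def parse_reply(reply):
    """Hypothetical helper: turn the bot's reply into spreadsheet-ready rows."""
    rows = [("Source", "Target", "Definition")]  # header cells A1, B1, C1
    for line in reply.splitlines():
        if "|" not in line:
            continue  # skip greetings, notes and blank lines
        source, rest = line.split("|", 1)
        target, _, definition = rest.partition(":")
        # strip() also takes care of leading/extra spaces in each cell
        rows.append((source.strip(), target.strip(), definition.strip()))
    return rows
```

The rows can then be saved as CSV/TSV and opened in Excel, or imported into a CAT tool's term base from there. The output still needs the same manual checking and refining as before.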
[Edited at 2024-08-21 07:50 GMT] | | | Samuel Murray Netherlands Local time: 15:16 Member (2006) English to Afrikaans + ...
Zea_Mays wrote:
- Tell the bot what it should do (this is called a "prompt") and then paste both versions into the chat window after your prompt.
I used a 2-column table in MS Word with ChatGPT, as follows:
From 50 segments, ChatGPT collected 33 terms, and they were all accurate (except that ChatGPT capitalized the initial letters).