How to make this glossary usable?
Thread poster: Hans Lenting
Hans Lenting
Netherlands
Member (2006)
German to Dutch
May 31

There is this glossary on the internet:
Screenshot 2024-05-31 at 08.05.23
It has a silly, "space-saving" paper dictionary layout that cannot be used for automatic searching in CAT tools.

How to convert this glossary to this layout?
Screenshot 2024-05-31 at 08.04.36
One could use a spreadsheet, but that will still require a lot of manual editing.


 
Tony M
France
Local time: 01:23
Member
French to English
SITE LOCALIZER
Start by using "search and replace" May 31

I would start by trying to identify what the actual delimiters used are, and then use S&R to replace them with e.g. TAB; then you can format into a table, if necessary, and then easily delete superfluous columns.
This is more or less what I do a lot of the time to convert miscellaneous customer glossaries into CAT-compatible ones.
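The same search-and-replace idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the delimiter `" - "` and the sample entries are made up for the example, and `delimiters_to_tabs` is a hypothetical helper name, so check what the real file uses first.

```python
def delimiters_to_tabs(text, delimiter=" - "):
    """Replace an assumed delimiter with TAB on every non-empty line."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if line:  # skip blank lines
            rows.append(line.replace(delimiter, "\t"))
    return "\n".join(rows)


# Invented sample data, not taken from the glossary in question.
sample = "abrasion - slijtage\n\n  accelerator pedal - gaspedaal "
print(delimiters_to_tabs(sample))
```

The tab-separated result can be pasted into a spreadsheet or imported into a CAT tool, where any superfluous columns are easy to delete.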


 
Philippe Locquet
Portugal
Local time: 00:23
Member (2013)
English to French
LLM May 31

Hans Lenting wrote:

There is this glossary on the internet:
Screenshot 2024-05-31 at 08.05.23
It has a silly, "space-saving" paper dictionary layout that cannot be used for automatic searching in CAT tools.

How to convert this glossary to this layout?
Screenshot 2024-05-31 at 08.04.36
One could use a spreadsheet, but that will still require a lot of manual editing.


If the glossary is not confidential, I would use an LLM; with the right prompt, it would make short work of this. I've had similar tasks in the past and it worked for me.


 
Stepan Konev
Russian Federation
Local time: 03:23
English to Russian
Use regex May 31

If all the terms are consistently separated with [space hyphen space], you can use two replacements with wildcards checked:

**Before running the replacements, select all in MS Word, press Ctrl+E, Ctrl+L. This would remove all extra spaces at the beginning of each line.**

Then replace as follows:

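1) Find what:
-[^32]{2;}
(The semicolon inside the braces is the list separator of my regional settings; if yours is a comma, type {2,} instead.)

Replace with: [nothing, leave this field blank]

This removes every hyphen that is followed by two or more spaces, i.e. the [hyphen space] combination at the beginning of each line.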


2) Find what: [space hyphen space]
Replace with: [tab]

===
Now select all and press the following keys one after the other: Alt, C, 4, G
This converts the text to a table¹. Click OK.
That's it.

¹The key sequence for converting text to a table is probably different on non-QWERTY keyboards. If that is your case, just use the MS Word menu (Insert - Table - Convert Text to Table).
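For those who prefer a script to MS Word, the two replacements can be sketched in Python roughly as follows. This is a sketch under assumptions: the sample lines are invented, `clean_line` and `clean_text` are hypothetical names, and anchoring the first pattern to the start of the line is my choice rather than part of the Word recipe.

```python
import re

# A hyphen followed by two or more spaces at the start of a line.
LEADING_MARKER = re.compile(r"^- {2,}")


def clean_line(line):
    line = line.lstrip()                 # same effect as the Ctrl+E, Ctrl+L step
    line = LEADING_MARKER.sub("", line)  # replacement 1: drop the leading marker
    return line.replace(" - ", "\t")     # replacement 2: delimiter -> TAB


def clean_text(text):
    """Clean every non-empty line and return tab-separated text."""
    return "\n".join(clean_line(l) for l in text.splitlines() if l.strip())


# Invented sample data, for illustration only.
print(clean_text("abrasion - slijtage\n   -   brake disc - remschijf"))
```

Note that this only removes the leading marker; it does not restore whatever word the marker stood for.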


 
Tony M
France
Local time: 01:23
Member
French to English
SITE LOCALIZER
The real problem here... May 31

...is that the format used replaces the 'headword' in each entry with symbols, and in some cases, a succession of words.
This seems to be Asker's principal headache — and I for one can't think of a solution without, as they point out, major manual editing.


 
Stepan Konev
Russian Federation
Local time: 03:23
English to Russian
Glossary in table format is available too May 31

I̶ s̶u̶p̶p̶o̶s̶e̶ t̶h̶e̶ g̶l̶o̶s̶s̶a̶r̶y̶ i̶s̶ n̶o̶t̶ c̶o̶n̶f̶i̶d̶e̶n̶t̶i̶a̶l̶ b̶e̶c̶a̶u̶s̶e̶ i̶t̶ i̶s̶ a̶v̶a̶i̶l̶a̶b̶l̶e̶ o̶n̶ t̶h̶e̶ i̶n̶t̶e̶r̶n̶e̶t̶, r̶i̶g̶h̶t̶? C̶a̶n̶ y̶o̶u̶ s̶h̶a̶r̶e̶ t̶h̶e̶ f̶i̶l̶e̶ o̶r̶ l̶i̶n̶k̶ t̶o̶ t̶h̶a̶t̶ g̶l̶o̶s̶s̶a̶r̶y̶? I̶ c̶a̶n̶'t̶ s̶e̶e̶ a̶n̶y̶ p̶r̶o̶b̶l̶e̶m̶ h̶e̶r̶e̶ b̶a̶s̶e̶d̶ o̶n̶ y̶o̶u̶r̶ s̶c̶r̶e̶e̶n̶s̶h̶o̶t̶.

Update: ah, ok, I see now. The "-" char stands for the parent entry...
Ok then just use this link
https://www.gerritspeek.nl/auto/autowoordenboek/autowoordenboek-l.html

[Edited at 2024-05-31 21:00 GMT]


 
Hans Lenting
Netherlands
Member (2006)
German to Dutch
TOPIC STARTER
Right May 31

Stepan Konev wrote:

I̶ s̶u̶p̶p̶o̶s̶e̶ t̶h̶e̶ g̶l̶o̶s̶s̶a̶r̶y̶ i̶s̶ n̶o̶t̶ c̶o̶n̶f̶i̶d̶e̶n̶t̶i̶a̶l̶ b̶e̶c̶a̶u̶s̶e̶ i̶t̶ i̶s̶ a̶v̶a̶i̶l̶a̶b̶l̶e̶ o̶n̶ t̶h̶e̶ i̶n̶t̶e̶r̶n̶e̶t̶, r̶i̶g̶h̶t̶? C̶a̶n̶ y̶o̶u̶ s̶h̶a̶r̶e̶ t̶h̶e̶ f̶i̶l̶e̶ o̶r̶ l̶i̶n̶k̶ t̶o̶ t̶h̶a̶t̶ g̶l̶o̶s̶s̶a̶r̶y̶? I̶ c̶a̶n̶'t̶ s̶e̶e̶ a̶n̶y̶ p̶r̶o̶b̶l̶e̶m̶ h̶e̶r̶e̶ b̶a̶s̶e̶d̶ o̶n̶ y̶o̶u̶r̶ s̶c̶r̶e̶e̶n̶s̶h̶o̶t̶.

Update: ah, ok, I see now. The "-" char stands for the parent entry...
Ok then just use this link
https://www.gerritspeek.nl/auto/autowoordenboek/autowoordenboek-l.html

[Edited at 2024-05-31 21:00 GMT]


Correct.

Anyway, here’s the link: https://www.cardiagnostics.be/-now/Engels-Nederlands%20Woordenboek/A.htm

(For any potential geniuses)


 
Jennifer Levey
Chile
Local time: 21:23
Spanish to English
VB.NET-inspired pseudocode - untested! Jun 1

I've been faced with many similar tasks over the years, and if the volume of data justifies the time spent on writing a bit of program code, then I prefer to do that and avoid error-prone and time-consuming manual fiddling with Word tables, spreadsheets, etc.
We all have our 'pet' programming languages, and mine happens to be VB.NET - but I don't doubt that Hans and others will have no problem following the logic.

************
DIM an empty DataTable 'DT' with 10 columns (0 - 9) 'column count >= max. number of header (sub)levels

OPEN plain text file for read-only

WHILE NOT EOF
    strLine = READLINE from file 'read one line at a time
    Replace all 'space hyphen space' in strLine with '|¿' 'pipe avoids confusion with wanted hyphens, and '¿' flags the need to substitute words from the previous line
    SPLIT strLine on '|' --> arrWords() 'variable-length string array

    DIM an empty DataRow 'DR' with 10 columns (0 - 9)

    FOR i = 0 to Ubound(arrWords) - 1 'copy all except last item from arrWords() into DR (Ubound = index of the last item)
        DR(i) = TRIM(arrWords(i)) 'trim leading/trailing spaces
    NEXT
    DR(9) = TRIM(arrWords(Ubound(arrWords))) 'copy NL term (the last item) to last column

    Add DR to DT
END WHILE

FOR EACH Row IN DT
    FOR EACH Column IN Row
        IF thisRow/Column = '¿' THEN
            REPLACE thisRow/Column with PreviousRow/SameColumn
        ELSE
            REPLACE '¿' in thisRow/Column with nothing
        END IF
    NEXT
NEXT

Each row in the table should now contain one or more EN words, zero or more empty columns, and the NL term in the last column.

The required output format can be built in various ways, depending on the final destination, knowing that in each Row:
English term = TRIM(JOIN Row/Columns(0-8) with 'space' separator)
Dutch term = Row/Column(9)

**********************
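For readers who would rather run something than read pseudocode, here is a minimal Python sketch of the same column-filling idea. It is not a translation of the pseudocode above: it uses empty cells instead of the '¿' flag and a plain list instead of a DataTable. It also assumes a line format I have not verified against the real glossary, namely that each line is `EN part - EN part - ... - NL term` and that a lone hyphen stands for the same column of the previous line. The sample entries are invented and `expand_glossary` is a hypothetical name.

```python
import re

# A hyphen counts as a separator only when whitespace surrounds it, so
# hyphenated words such as "anti-lock" stay intact.
SEPARATOR = re.compile(r"\s+-(?=\s)")


def expand_glossary(lines):
    """Return (english, dutch) pairs with placeholder hyphens filled in."""
    pairs = []
    previous = []  # resolved English columns of the preceding row
    for raw in lines:
        if not raw.strip():
            continue
        # Prepending a space lets a line-initial "-" match the separator too.
        cells = [c.strip() for c in SEPARATOR.split(" " + raw.strip())]
        if len(cells) < 2:
            continue  # no separator found; leave the line for manual review
        *english_cells, dutch = cells
        resolved = []
        for i, cell in enumerate(english_cells):
            if cell == "" and i < len(previous):
                cell = previous[i]  # placeholder: copy from the row above
            resolved.append(cell)
        previous = resolved
        english = " ".join(c for c in resolved if c)
        pairs.append((english, dutch))
    return pairs


# Invented sample data in the assumed layout.
sample = [
    "brake disc - remschijf",
    "- wear - remschijfslijtage",
    "- - indicator - slijtage-indicator",
    "anti-lock system - antiblokkeersysteem",
]
for en, nl in expand_glossary(sample):
    print(f"{en}\t{nl}")
```

Each output line is a full English term, a TAB, and the Dutch term, which is the two-column layout most CAT tools can import. If the real file marks parent entries differently, the `SEPARATOR` pattern is the part to adjust.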

HTH
JL

[Edited at 2024-06-01 16:03 GMT]

[Edited at 2024-06-01 20:22 GMT]


 
Hans Lenting
Netherlands
Member (2006)
German to Dutch
TOPIC STARTER
Solved Jun 2

Thank you all for your input. The case is solved. A kind person wrote a script in JavaScript.

 
Hans Lenting
Netherlands
Member (2006)
German to Dutch
TOPIC STARTER
Community Project Jun 3

See here.

 

