How to make this glossary usable? Thread poster: Hans Lenting
There is this glossary on the internet:
It has a silly, "space-saving" paper dictionary layout that cannot be used for automatic searching in CAT tools.
How to convert this glossary to this layout?
One could use a spreadsheet, but that will still require a lot of manual editing. | | | Tony M France Local time: 01:23 Member French to English + ... SITE LOCALIZER Start by using "search and replace" | May 31 |
I would start by trying to identify what the actual delimiters used are, and then use S&R to replace them with e.g. TAB; then you can format into a table, if necessary, and then easily delete superfluous columns.
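For a plain-text copy of the glossary, the same search-and-replace idea could be sketched in Python roughly as follows. This is only an illustration: the function name `delimiters_to_tabs`, the default delimiter pattern (a hyphen with whitespace on both sides) and the two sample entries are assumptions, not taken from the actual glossary.

```python
import re


def delimiters_to_tabs(text: str, delimiter_pattern: str = r"\s+-\s+") -> str:
    """Replace every delimiter match with a single TAB, line by line."""
    converted_lines = []
    for line in text.splitlines():
        # Strip stray leading/trailing spaces first, then swap the delimiter for a TAB.
        converted_lines.append(re.sub(delimiter_pattern, "\t", line.strip()))
    return "\n".join(converted_lines)


# Made-up sample entries, only to show the effect.
sample = "label - etiket\nlamp  -  lamp"
converted = delimiters_to_tabs(sample)
print(converted)
```

The tab-separated result can then be pasted into a spreadsheet or table, where superfluous columns are easy to delete.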
This is more or less what I do a lot of the time to convert miscellaneous customer glossaries into CAT-compatible ones. | | | Philippe Locquet Portugal Local time: 00:23 Member (2013) English to French + ...
Hans Lenting wrote:
There is this glossary on the internet:
It has a silly, "space-saving" paper dictionary layout that cannot be used for automatic searching in CAT tools.
How to convert this glossary to this layout?
One could use a spreadsheet, but that will still require a lot of manual editing.
If that's not confidential, I would use an LLM; with the correct prompt, it would make short work of this. I've had similar tasks in the past and it worked for me. | | | Stepan Konev Russian Federation Local time: 03:23 English to Russian
If all the terms are consistently separated with [space hyphen space], you can use two replacements with wildcards checked:
**Before running the replacements, select all in MS Word, press Ctrl+E, Ctrl+L. This would remove all extra spaces at the beginning of each line.**
Then replace as follows:
1) Find what:
-[^32]{2;} (if your system's list separator is a comma, type {2,} instead)
Replace with: [nothing, leave this field blank]
This would remove the [hyphen space] combination at the beginning of each line.
2) Find what: [space hyphen space]
Replace with: [tab]
===
Now select all and press one button after the other: Alt, C, 4, G
This would convert text to table¹. Click OK.
That's it.
¹The key sequence for converting text to a table is probably different on non-QWERTY keyboards. If that is the case for you, just use the MS Word menu (Insert - Table - Convert to table).
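Outside Word, the same three steps could be approximated in Python as below. Treat it as a rough sketch under assumptions: the function name `word_steps` and the sample lines are made up, and unlike the Word pattern, the first replacement here is anchored to the start of the line so that it cannot touch hyphens further along.

```python
import re


def word_steps(text: str) -> str:
    """Mimic the Word procedure: trim, drop the leading marker, insert tabs."""
    result = []
    for line in text.splitlines():
        line = line.lstrip()                 # like Ctrl+E, Ctrl+L: drop leading spaces
        line = re.sub(r"^- {2,}", "", line)  # step 1: hyphen followed by 2+ spaces
        line = line.replace(" - ", "\t")     # step 2: [space hyphen space] -> TAB
        result.append(line)
    return "\n".join(result)


# Made-up sample: one sub-entry with a leading marker, one plain entry.
sample = "   -   holder - houder\nlabel - etiket"
table_text = word_steps(sample)
print(table_text)
```

Note that this only produces a two-column layout; it does not put the parent term back in front of sub-entries.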
Tony M France Local time: 01:23 Member French to English + ... SITE LOCALIZER The real problem here... | May 31 |
...is that the format used replaces the 'headword' in each entry with symbols, and in some cases, a succession of words.
This seems to be Asker's principal headache — and I for one can't think of a solution without, as they point out, major manual editing. | | | Stepan Konev Russian Federation Local time: 03:23 English to Russian Glossary in table format is available too | May 31 |
I̶ s̶u̶p̶p̶o̶s̶e̶ t̶h̶e̶ g̶l̶o̶s̶s̶a̶r̶y̶ i̶s̶ n̶o̶t̶ c̶o̶n̶f̶i̶d̶e̶n̶t̶i̶a̶l̶ b̶e̶c̶a̶u̶s̶e̶ i̶t̶ i̶s̶ a̶v̶a̶i̶l̶a̶b̶l̶e̶ o̶n̶ t̶h̶e̶ i̶n̶t̶e̶r̶n̶e̶t̶, r̶i̶g̶h̶t̶? C̶a̶n̶ y̶o̶u̶ s̶h̶a̶r̶e̶ t̶h̶e̶ f̶i̶l̶e̶ o̶r̶ l̶i̶n̶k̶ t̶o̶ t̶h̶a̶t̶ g̶l̶o̶s̶s̶a̶r̶y̶? I̶ c̶a̶n̶'t̶ s̶e̶e̶ a̶n̶y̶ p̶r̶o̶b̶l̶e̶m̶ h̶e̶r̶e̶ b̶a̶s̶e̶d̶ o̶n̶ y̶o̶u̶r̶ s̶c̶r̶e̶e̶n̶s̶h̶o̶t̶.
Update: ah, ok, I see now. The "-" char stands for the parent entry...
Ok then just use this link
https://www.gerritspeek.nl/auto/autowoordenboek/autowoordenboek-l.html
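If the page behind that link is saved locally, its table rows could be turned into tab-separated text with Python's standard `html.parser` module, roughly like this. It is a sketch under assumptions: the real page's markup has not been checked here, so plain `<tr>`/`<td>` cells are assumed, and the class name `TableRows`, the function `table_to_tsv` and the sample HTML are invented for illustration.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace inside the cell to single spaces.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None


def table_to_tsv(html: str) -> str:
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return "\n".join("\t".join(row) for row in parser.rows)


# Made-up sample markup, not copied from the real page.
sample_html = (
    "<table><tr><td>label</td><td>etiket</td></tr>"
    "<tr><td>label  holder</td><td>etikethouder</td></tr></table>"
)
tsv = table_to_tsv(sample_html)
print(tsv)
```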
[Edited at 2024-05-31 21:00 GMT] | | | Hans Lenting Netherlands Member (2006) German to Dutch TOPIC STARTER | VB.NET-inspired pseudocode - untested! | Jun 1 |
I've been faced with many similar tasks over the years, and if the volume of data justifies the time spent on writing a bit of program code, then I prefer to do that and avoid error-prone and time-consuming manual fiddling with Word tables, spreadsheets, etc.
We all have our 'pet' programming languages, and mine happens to be VB.NET - but I don't doubt that Hans and others will have no problem following the logic.
************
DIM an empty DataTable 'DT' with 10 columns (0 - 9)   'column count >= max. number of header (sub)levels
OPEN plain text file for read-only
WHILE NOT EOF
    strLine = READLINE from file   'read one line at a time
    REPLACE all 'space hyphen space' in strLine with '|¿'   'pipe avoids confusion with wanted hyphens, and '¿' flags the need to substitute words from the previous line
    SPLIT strLine on '|' --> arrWords()   'variable-length string array
    DIM an empty DataRow 'DR' with 10 columns (0 - 9)
    FOR i = 0 TO UBound(arrWords) - 1   'copy all except last item from arrWords() into DR (UBound = highest index)
        DR(i) = TRIM(arrWords(i))   'trim leading/trailing spaces
    NEXT
    DR(9) = TRIM(arrWords(UBound(arrWords)))   'copy NL term (last item) to last column
    Add DR to DT
END WHILE
FOR EACH Row IN DT
    FOR EACH Column IN Row
        IF thisRow/Column = '¿' THEN
            REPLACE thisRow/Column with PreviousRow/SameColumn
        ELSE
            REPLACE '¿' in thisRow/Column with nothing
        END IF
    NEXT
NEXT
Each row in the table should now contain one or more EN words, zero or more empty columns, and the NL term is in the last column.
The required output format can be built in various ways, depending on the final destination, knowing that in each Row:
English term = TRIM(JOIN Row/Columns(0-8) with 'space' separator)
Dutch term = Row/Column(9)
**********************
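For anyone who would rather try the idea in a scripting language, here is a minimal Python sketch of the same approach. It is not a line-by-line port of the pseudocode above, and it rests on an assumed input format: each line is "English part(s) - Dutch term", and a bare hyphen standing where an English part should be means "inherit that part from the previous line". The names `expand_glossary` and `STANDALONE_HYPHEN` and the three sample lines are made up for the example.

```python
import re

# A hyphen with no non-space character directly before or after it,
# so hyphens inside words such as "air-cooled" are left alone.
STANDALONE_HYPHEN = re.compile(r"(?<!\S)-(?!\S)")


def expand_glossary(lines):
    """Return (english, dutch) pairs with inherited parts filled in."""
    entries = []
    previous = []  # resolved English columns of the previous entry
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        parts = [part.strip() for part in STANDALONE_HYPHEN.split(line)]
        if len(parts) < 2:
            continue  # no separator found, so not a glossary entry
        *english_cols, dutch = parts
        resolved = []
        for i, col in enumerate(english_cols):
            # An empty column is a placeholder: take the word from the row above.
            if col == "" and i < len(previous):
                col = previous[i]
            resolved.append(col)
        previous = resolved
        english = " ".join(col for col in resolved if col)
        entries.append((english, dutch))
    return entries


# Made-up sample in the assumed layout.
sample_lines = [
    "label - etiket",
    "- holder - etikethouder",
    "- - clip - etikethouderklem",
]
pairs = expand_glossary(sample_lines)
for english, dutch in pairs:
    print(f"{english}\t{dutch}")
```

Keeping the resolved columns of the previous row, rather than the raw ones, is what lets a second-level placeholder inherit a word that was itself inherited one line earlier.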
HTH
JL
[Edited at 2024-06-01 16:03 GMT]
[Edited at 2024-06-01 20:22 GMT]
Hans Lenting Netherlands Member (2006) German to Dutch TOPIC STARTER
Thank you all for your input. The case is solved. A kind person wrote a JavaScript.