Reducing the size of a Translation Memory (TMX) Thread poster: Nelson Yemeli
Hello dearest colleagues!
I have a translation memory made up of more than one million translation units, with a total size of almost 5 GB. The TM is very rich, but it is also almost unusable because of its size. When I try to import the TMX or create AutoSuggest dictionaries, the processes are too slow. So I wonder whether there is any way to reduce the size of the TMX file and make it easier to handle.
I am using SDL Trados Studio 2014 on Windows 7.
Thanks in advance.

Nelson Yemeli United States Member (2016) English to French TOPIC STARTER
Thanks a million, Umang! But I think 35 euros is even bigger than my TM! :-)
I hope there is a cheaper solution.
[Edited at 2015-08-01 08:31 GMT]

TM Maintenance | Aug 1, 2015
Nelson Yemeli wrote:
I have got a translation memory which is made up of more than one million translation units, for a total weight of almost 5GB. The TM is very rich, but it is also almost unusable because of its size. When I want to import the TMX or create Autosuggest dictionaries, the processes are too slow. Thus, I wonder if there is any solution for reducing the size of the TMX file, so as to render it more easily manipulable.
I am using SDL TRADOS Studio 2014 on Windows 7
Trados provides TM maintenance functions for filtering on field values, deleting TUs, and so on. Please refer to the page below to reduce the TM size:
http://producthelp.sdl.com/sdl_trados_studio_2014/client_en/tm_view/TM_Data/TM_Overview_Managing_Translation_Memory_Data.htm
Soonthon L.

Nelson Yemeli United States Member (2016) English to French TOPIC STARTER I need to reduce before opening | Aug 1, 2015
Soonthon LUPKITARO(Ph.D.) wrote:
Nelson Yemeli wrote:
I have got a translation memory which is made up of more than one million translation units, for a total weight of almost 5GB. The TM is very rich, but it is also almost unusable because of its size. When I want to import the TMX or create Autosuggest dictionaries, the processes are too slow. Thus, I wonder if there is any solution for reducing the size of the TMX file, so as to render it more easily manipulable.
I am using SDL TRADOS Studio 2014 on Windows 7
Trados provides TM maintenance functions for filtering on field values, deleting TUs, and so on. Please refer to the page below to reduce the TM size:
http://producthelp.sdl.com/sdl_trados_studio_2014/client_en/tm_view/TM_Data/TM_Overview_Managing_Translation_Memory_Data.htm
Soonthon L.
Dear Soonthon, I don't think I can edit a TM in Trados without opening it first, and it takes hours just to open it. Whenever I try, Trados tells me that I must upgrade the TM, and upgrading takes hours too.

5 GB is large... | Aug 2, 2015
Nelson Yemeli wrote:
I need to re[d]uce before opening
... too large for most CAT tools to handle. It's also way too large for "only" a million segments, unless there are heaps of metadata, and more likely, lots of languages.
I suppose you need only one language pair, and (almost) no metadata. In that case, I suggest using either Andras' free TMLookup or the free version of CafeTran to extract those two languages. Both can import the two languages you need into an SQLite database. You would then need a tool that opens the SQLite database and exports the table to a format a CAT tool can import, like CSV or Excel, so that you can get a TMX file again. I use SQLite Browser (again free) for that purpose.
This all looks rather complicated (though it's not that bad, really), so I hope somebody will come up with an easier solution.
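For anyone comfortable with a little scripting, the extraction step described above could also be sketched in Python. This is only a minimal sketch and not what TMLookup or CafeTran do internally. It assumes the TMX marks languages with `xml:lang` (or the older `lang`) codes that match the requested codes exactly, and the function name `tmx_to_sqlite` and the table name `tm` are made up for illustration. The file is streamed, so it never has to fit in memory as a whole.

```python
import sqlite3
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under this qualified name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def seg_text(tuv):
    """Return the text of a <tuv>'s <seg>; inline elements are flattened to their text."""
    seg = tuv.find("seg")
    return "".join(seg.itertext()).strip() if seg is not None else ""


def tmx_to_sqlite(tmx_path, db_path, src_lang, tgt_lang):
    """Stream a TMX file and store one language pair in an SQLite table.

    Returns the number of translation units written.
    """
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS tm (source TEXT, target TEXT)")
    body = None
    count = 0
    for event, elem in ET.iterparse(tmx_path, events=("start", "end")):
        if event == "start":
            if elem.tag == "body":
                body = elem  # remembered so finished units can be released
            continue
        if elem.tag != "tu":
            continue
        texts = {}
        for tuv in elem.iter("tuv"):
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            texts[lang] = seg_text(tuv)
        src = texts.get(src_lang.lower())
        tgt = texts.get(tgt_lang.lower())
        if src and tgt:
            con.execute("INSERT INTO tm VALUES (?, ?)", (src, tgt))
            count += 1
        if body is not None:
            body.clear()  # drop processed units so memory use stays flat
    con.commit()
    con.close()
    return count
```

A call such as `tmx_to_sqlite("big.tmx", "tm.db", "en", "fr")` (both file names are placeholders) would fill the table, which can then be opened and exported with a tool like SQLite Browser.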
Cheers,
Hans
[Edited at 2015-08-02 02:20 GMT]

Large TMX files, AutoSuggest, Studio TM and Splitting the TMX | Aug 2, 2015
Dear Nelson,
I don’t know anything about your TM, but I know a bit about converting large TMX files into Studio TMs and AutoSuggest dictionaries. In the past few weeks I created AutoSuggest files from the DGT TMs (https://ec.europa.eu/jrc/en/language-technologies/dgt-translation-memory) in more than 300 language pairs. They can be found here: https://alexandria-translation-resources.com/resources-for-translation-providers/autosuggest-dictionaries/dgt-tm/
To create the AutoSuggest files, we used the following process:
- Download the DGT files.
- Extract the language pair you want.
This gives you a TMX with about 3 million entries and about 2 GB in size for your language pair.
- Use Xbench or Olifant to remove duplicates from the TMX.
This will result in a TMX with about 2.1 million entries and about 1.2 GB in size.
- Use AutoSuggest Creator to produce an AutoSuggest dictionary directly from your shrunken TMX file.
Creating the AutoSuggest file takes about 3 hours on a decent computer (8 GB RAM, i5 processor).
This is much faster than creating a Studio TM from the shrunken TMX file, which on the same computer might take up to 12 hours.
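If Xbench or Olifant struggle with a file of this size, the duplicate-removal step could in principle be scripted. The following is a rough Python sketch, not a description of how either tool works. It assumes that a "duplicate" means a translation unit whose segment texts are all identical to those of an earlier unit, and the name `dedupe_tmx` is made up for illustration. Only a 16-byte fingerprint per unit is kept in memory.

```python
import hashlib
import xml.etree.ElementTree as ET


def dedupe_tmx(src_path, dst_path):
    """Copy a TMX file, skipping <tu> elements whose segment texts repeat.

    Returns a (kept, dropped) tuple.
    """
    seen = set()
    kept = dropped = 0
    body = None
    with open(dst_path, "w", encoding="utf-8") as out:
        out.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        for event, elem in ET.iterparse(src_path, events=("start", "end")):
            if event == "start":
                if elem.tag == "tmx":
                    out.write('<tmx version="%s">\n' % elem.get("version", "1.4"))
                elif elem.tag == "body":
                    body = elem
                    out.write("<body>\n")
            elif elem.tag == "header":
                # Copy the original header unchanged.
                elem.tail = "\n"
                out.write(ET.tostring(elem, encoding="unicode"))
            elif elem.tag == "tu":
                # Fingerprint the unit by the text of all its segments, in order.
                key = "\x1f".join(
                    "".join(seg.itertext()).strip() for seg in elem.iter("seg")
                )
                digest = hashlib.blake2b(key.encode("utf-8"), digest_size=16).digest()
                if digest in seen:
                    dropped += 1
                else:
                    seen.add(digest)
                    kept += 1
                    elem.tail = "\n"
                    out.write(ET.tostring(elem, encoding="unicode"))
                if body is not None:
                    body.clear()  # release processed units
        out.write("</body>\n</tmx>\n")
    return kept, dropped
```

Units that differ only in metadata (dates, user names) count as duplicates under this assumption, and only the first one is kept.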
To split your TMX into smaller files, you could use Olifant. How to do it is described here:
https://groups.yahoo.com/neo/groups/okapitools/conversations/topics/3678

2nl (X) Netherlands Very nice posting! | Aug 2, 2015
Meta Arkadia wrote:
This all looks rather complicated (though it's not that bad, really), so I hope somebody will come up with an easier solution.
Cheers,
Hans
Very nice posting, Hans!

Nelson Yemeli United States Member (2016) English to French TOPIC STARTER Let's go back to the source! | Aug 2, 2015
Hello!
I thank each and everyone for all these interesting suggestions. I thought it may be interesting or necessary for you to know where I got such a weird TM. Here is the link: http://opus.lingfil.uu.se/MultiUN.php
The original TM was in a zipped file. When I unzipped it, I obtained a TMX file of about 5 GB.

Michael Beijer United Kingdom Member (2009) Dutch to English + ... it's just a big TMX | Aug 2, 2015
Nelson Yemeli wrote:
Hello!
I thank each and everyone for all these interesting suggestions. I thought it may be interesting or necessary for you to know where I got such a weird TM. Here is the link: http://opus.lingfil.uu.se/MultiUN.php
The original TM was in a zipped file. When I unzipped it, I obtained a TMX file of about 5 GB.
Aha, thanks for the additional info. I just downloaded the TMX and had a look at it. It is bilingual and contains zero metadata, so its size is purely due to … its size.
I'm no Studio specialist, but you might need to cut the TMX up into smaller chunks to get it into Studio. I think András Farkas has a nice little tool to cut up big TMXs, which I think might be in his "Grab Bag", or his LF Aligner package on sourceforge.
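In case no ready-made splitter is at hand, cutting a TMX into chunks could be sketched in Python along these lines. This is an illustrative sketch only, not András Farkas's tool or Olifant. The name `split_tmx`, the `{}` placeholder convention for output names and the default of 500,000 units per file are all assumptions. Each part gets a copy of the original header so that it is a complete TMX file on its own.

```python
import xml.etree.ElementTree as ET


def split_tmx(src_path, out_pattern, tus_per_file=500_000):
    """Split a TMX file into parts holding at most tus_per_file units each.

    out_pattern must contain one '{}' placeholder for the part number.
    Returns the list of file paths written.
    """
    header_xml = "<header/>\n"  # fallback if the source has no header
    version = "1.4"
    body = None
    out = None
    paths = []
    count = 0

    def open_part():
        path = out_pattern.format(len(paths) + 1)
        paths.append(path)
        f = open(path, "w", encoding="utf-8")
        f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        f.write('<tmx version="%s">\n' % version)
        f.write(header_xml + "<body>\n")
        return f

    def close_part(f):
        f.write("</body>\n</tmx>\n")
        f.close()

    for event, elem in ET.iterparse(src_path, events=("start", "end")):
        if event == "start":
            if elem.tag == "tmx":
                version = elem.get("version", version)
            elif elem.tag == "body":
                body = elem
        elif elem.tag == "header":
            elem.tail = "\n"
            header_xml = ET.tostring(elem, encoding="unicode")
        elif elem.tag == "tu":
            if out is None:
                out = open_part()  # start a new part lazily
            elem.tail = "\n"
            out.write(ET.tostring(elem, encoding="unicode"))
            count += 1
            if count % tus_per_file == 0:
                close_part(out)
                out = None
            if body is not None:
                body.clear()  # release processed units
    if out is not None:
        close_part(out)
    return paths
```

For example, `split_tmx("big.tmx", "part_{}.tmx")` (placeholder names) would write `part_1.tmx`, `part_2.tmx` and so on, which can then be imported one at a time.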
Michael

One of these monsters | Aug 2, 2015
Ah, one of these monsters. We are working on it. I'll let you know when we have it processed. This might take a bit.

Richard Foulkes (X) United Kingdom German to English + ... 1 million TUs not unmanageable...? | Aug 3, 2015
I've routinely used TMs of that size in Studio in recent years. Is the performance of your computer the issue maybe? Obviously Studio is pretty heavy in terms of memory (RAM) usage. If you can't open a TM of 1m units, I don't think the problem is the TM - unless it's corrupted.
One thing I did do a while ago was to 'prune' one of my big TMs by filtering and deleting all TUs that hadn't been used for over 10 years. It trimmed down the TM and I'm sure I haven't missed them.
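The same kind of pruning could be sketched for a TMX export in Python, as an alternative to filtering inside Studio. This is a sketch under assumptions: it relies on the TMX `lastusagedate`, `changedate` and `creationdate` attributes being present on each `<tu>` in the usual `YYYYMMDDThhmmssZ` form, and the helper names `last_touched` and `is_stale` are made up for illustration. A TMX without any metadata has no dates to filter on, so units without dates are kept.

```python
from datetime import datetime, timedelta, timezone

TMX_DATE = "%Y%m%dT%H%M%SZ"  # date format used by TMX attributes


def last_touched(tu):
    """Return the most recent date attribute of a <tu> as a UTC datetime, or None."""
    stamps = []
    for name in ("lastusagedate", "changedate", "creationdate"):
        value = tu.get(name)
        if not value:
            continue
        try:
            parsed = datetime.strptime(value, TMX_DATE)
        except ValueError:
            continue  # ignore dates in an unexpected format
        stamps.append(parsed.replace(tzinfo=timezone.utc))
    return max(stamps) if stamps else None


def is_stale(tu, now, max_age_years=10):
    """True if the unit was last created, changed or used more than max_age_years ago."""
    when = last_touched(tu)
    if when is None:
        return False  # no dates available, so keep the unit
    return when < now - timedelta(days=365.25 * max_age_years)
```

While streaming through the `<tu>` elements of a TMX, units for which `is_stale(tu, datetime.now(timezone.utc))` is true would simply not be written to the output file.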
Nelson Yemeli United States Member (2016) English to French TOPIC STARTER The TM is actually gigantic!!! | Aug 3, 2015
Richard Foulkes wrote:
I've routinely used TMs of that size in Studio in recent years. Is the performance of your computer the issue maybe? Obviously Studio is pretty heavy in terms of memory (RAM) usage. If you can't open a TM of 1m units, I don't think the problem is the TM - unless it's corrupted.
One thing I did do a while ago was to 'prune' one of my big TMs by filtering and deleting all TUs that hadn't been used for over 10 years. It trimmed down the TM and I'm sure I haven't missed them.
Dear Richard, I made a mistake: the TM actually contains more than 10 million TUs. It's a "monster" indeed, as Siegfried said.

Nelson Yemeli United States Member (2016) English to French TOPIC STARTER I found the solution: it is patience! | Aug 3, 2015
I thank you all for your help. I have tried almost everything, but in the end I think the solution is to start a process and be patient. My PC has been running for three days now. I only hibernate it when going to bed, and I am really hopeful. Tomorrow, my AutoSuggest dictionary may be ready for use. I need it, so I have to wait for it!

Richard Foulkes (X) United Kingdom German to English + ... I'd agree 10m TUs is a bit on the big side :) | Aug 3, 2015
I'm not surprised you can barely open it! I'd delete old TUs a year at a time and break up what's left if need be. Also consider that all the time you spend saving TUs you'll probably never use is hours of your life you'll never get back! I must have wasted a lot of time maintaining TMs down the years.
Good luck.