How to decompile Encyclopaedia Britannica software program? Thread poster: Reed James
| Reed James Chile Local time: 05:05 Member (2005) Spanish to English
Hello. I am the owner of the Encyclopaedia Britannica on CD. It is a great application, but I would like to be able to extract all of the articles from the program in order to view them separately and/or index them. Now, each article is an HTML file, so it seems like it's doable. However, I took a look at the data files, and had no clue as to how to find each individual article.
Any suggestions?
Thanks!
Reed | | | Natalie Poland Local time: 09:05 Member (2002) English to Russian + ... Moderator of this forum SITE LOCALIZER Violation of the copyright and user's license | Jan 9, 2014 |
Please check the license of your software, I am sure you will find something like this:
"You may not decompile, reverse engineer or disassemble any software..." | | | Rolf Keller Germany Local time: 09:05 English to German Please clarify your question | Jan 10, 2014 |
Reed D James wrote:
Now, each article is an HTML file, so it seems like it's doable. However, I took a look at the data files, and had no clue as to how to find each individual article.
"The data files" means one HTML file per article? So, you could have Windows' search function search (and index) a folder containing all these files. | | | Rolf Keller Germany Local time: 09:05 English to German
Natalie wrote:
Please check the license of your software, I am sure you will find something like this:
"You may not decompile, reverse engineer or disassemble any software..."
That's right. But in some cases local law supersedes contractual regulations.
Here in Germany, decompiling etc. is explicitly allowed (see section 69e UrhG), **IF** the decompiling is necessary in order to make something interoperable with the copyrighted product. So, if that "something" is any other lookup software, you may decompile the copyrighted software in order to gain information on how to make that other lookup software usable. | |
Reed James Chile Local time: 05:05 Member (2005) Spanish to English TOPIC STARTER I want to know where all the individual HTML files are stored | Jan 10, 2014 |
Rolf Keller wrote:
"The data files" means one HTML file per article? So, you could have Windows' search function search (and index) a folder containing all these files.
You would think that there would be a folder with each individual HTML file that could just be copied into another folder. However, that's not the way it works. I'm just asking how to extract this data from the program so I can do that.
I'm not looking to profit from this data, I just want to be able to do what I want with it instead of having to search in the Britannica program.
[Edited at 2014-01-10 11:23 GMT] | | | Natalie Poland Local time: 09:05 Member (2002) English to Russian + ... Moderator of this forum SITE LOCALIZER Copyright again | Jan 10, 2014 |
Reed D James wrote:
I'm not looking to profit from this data, I just want to be able to do what I want with it instead of having to search in the Britannica program.
The question is not whether you make a profit; the question is whether decompiling the program is legal in your country and in accordance with the user license you hold. If it is not, this cannot be discussed in this forum. | | |
Natalie wrote:
Reed D James wrote:
I'm not looking to profit from this data, I just want to be able to do what I want with it instead of having to search in the Britannica program.
The question is not whether you make a profit; the question is whether decompiling the program is legal in your country and in accordance with the user license you hold. If it is not, this cannot be discussed in this forum.
Let's assume that it is. Companies put all sorts of ludicrous conditions in the EULA, a lot of which won't stand up to scrutiny or challenge in court in many jurisdictions. Even if there's a "do not decompile" clause in the EULA and it's enforceable in the country in question, decompiling still isn't illegal. It's just an infringement of the contract (the EULA).
It's not our job to police anyone's actions on behalf of Britannica. My approach is that the OP bought the software, it's his to dissect. Of course he is not allowed to distribute the fruits of his labour but he doesn't wish to.
I had a cursory look at the files a couple of years ago, and found that it's not trivial. IIRC I was able to identify the data files but I couldn't extract anything from them. I suspect that it will take some serious hacking skills to extract them. It's not just a big .rar file with HTML or XML files in it or something elementary like that. If you have computer scientist/coder/hacker friends you could ask them to have a look. If you've found the files, you can try and look up the file extension online and try and guess what format it is. You can also post here but I doubt that we can help you.
By the way, Wikipedia provides dumps of its entire database at no cost, and they are in an open format of course. You can do with them as you please. For instance, you can make glossaries out of them.
[Edited at 2014-01-10 12:49 GMT] | | | Samuel Murray Netherlands Local time: 09:05 Member (2006) English to Afrikaans + ... Which version? | Jan 10, 2014 |
Reed D James wrote:
I [have] the Encyclopaedia Britannica on CD. ... I would like to be able to extract all of the articles from the program in order to view them separately and/or index them.
Let's assume for the moment that what you want to know is not illegal in your country (we can only find out when we know more about what it is that you're trying to accomplish). If you hadn't used the word "decompile" in your thread title, the copyright police might not have noticed... since what you seem to be asking has nothing to do with the ordinary meaning of the word "decompile".
[From a strictly linguistic point of view, unzipping a zip file is "decompiling" it, but the computer people usually mean something more specific when they say "decompile", which doesn't seem to be what you're trying to do.]
So, which version of Encyclopaedia Britannica do you have? It may be that the different versions have different ways of getting to the content. | |
Reed James Chile Local time: 05:05 Member (2005) Spanish to English TOPIC STARTER 2013-the latest version I think | Jan 10, 2014 |
Samuel Murray wrote:
Reed D James wrote:
I [have] the Encyclopaedia Britannica on CD. ... I would like to be able to extract all of the articles from the program in order to view them separately and/or index them.
Let's assume for the moment that what you want to know is not illegal in your country (we can only find out when we know more about what it is that you're trying to accomplish). If you hadn't used the word "decompile" in your thread title, the copyright police might not have noticed... since what you seem to be asking has nothing to do with the ordinary meaning of the word "decompile".
[From a strictly linguistic point of view, unzipping a zip file is "decompiling" it, but the computer people usually mean something more specific when they say "decompile", which doesn't seem to be what you're trying to do.]
So, which version of Encyclopaedia Britannica do you have? It may be that the different versions have different ways of getting to the content.
Anyway, since it's such a touchy subject, I think I'll just figure out a slow way of doing it, i.e. copying and pasting. Thanks for your input. | | | Michael Beijer United Kingdom Local time: 08:05 Member Dutch to English + ... Use a macro recorder to automate the copy/pasting | Jan 10, 2014 |
If you are going to try copying/pasting, you could also use something like AutoHotkey and an AHK script recorder to do it automatically. If you can break down the steps needed to copy/paste the contents out of the program, it should be possible to create an AutoHotkey script to do it for you.
For example, if all you need to do is:
1. Press the down arrow on your keyboard (or something similar)
2. Ctrl+A
3. Ctrl+C
4. Save the clipboard contents to a separate file with separate file name (there are AHK scripts for this)
5. Esc,
1. Press the down arrow on your keyboard (or something similar)
2. Ctrl+A
3. Ctrl+C
etc.
... you can quite easily automate this.
Then all you have to do is start the script and wait for a few hours.
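To make the loop concrete, here is a rough sketch of the five steps in Python rather than AutoHotkey. It is only an illustration: `send_keys` and `read_clipboard` are hypothetical placeholders for whatever your macro tool or automation library provides, the `article_000001.txt` naming scheme is an assumption, and nothing here has been tried against the Britannica program itself.

```python
from pathlib import Path
from typing import Callable, List


def run_copy_loop(
    send_keys: Callable[[str], None],    # hypothetical: sends one key or chord to the focused window
    read_clipboard: Callable[[], str],   # hypothetical: returns the current clipboard text
    out_dir: Path,
    count: int,
) -> List[Path]:
    """Repeat the five steps from the post `count` times; return the files written."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for i in range(1, count + 1):
        send_keys("down")      # 1. move to the next item
        send_keys("ctrl+a")    # 2. select everything
        send_keys("ctrl+c")    # 3. copy the selection
        # 4. save the clipboard to its own numbered file (naming scheme is an assumption)
        target = out_dir / f"article_{i:06d}.txt"
        target.write_text(read_clipboard(), encoding="utf-8")
        send_keys("esc")       # 5. clear the selection before the next round
        written.append(target)
    return written
```

Passing the two actions in as functions means the loop can be dry-run with fake functions before it is pointed at a real window. A real run would also need short pauses between steps so the program can keep up.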
Michael
http://www.macrocreator.com/
http://www.autohotkey.com/board/topic/79763-macro-creator-v411-automation-tool-recorder-writer/
[Edited at 2014-01-10 13:25 GMT] | | | Reed James Chile Local time: 05:05 Member (2005) Spanish to English TOPIC STARTER It's a little more complicated than that… | Jan 10, 2014 |
Michael Beijer wrote:
If you are going to try copying/pasting, you could also use something like AutoHotkey and an AHK script recorder to do it automatically. If you can break down the steps needed to copy/paste the contents out of the program, it should be possible to create an AutoHotkey script to do it for you.
For example, if all you need to do is:
1. Press the down arrow on your keyboard (or something similar)
2. Ctrl+A
3. Ctrl+C
4. Save the clipboard contents to a separate file with separate file name (there are AHK scripts for this)
5. Esc,
1. Press the down arrow on your keyboard (or something similar)
2. Ctrl+A
3. Ctrl+C
etc.
... you can quite easily automate this.
Then all you have to do is start the script and wait for a few hours.
Michael
http://www.macrocreator.com/
http://www.autohotkey.com/board/topic/79763-macro-creator-v411-automation-tool-recorder-writer/
[Edited at 2014-01-10 13:25 GMT]
Thanks for the tip, Michael. I'm afraid it's a little more complicated than that. You see, there are two columns or sections to this program. You have the article titles on the left and the article itself on the right. Unfortunately, if you just hit the down arrow in the article title column, the article content in the right column or pane will not refresh; you have to click on the title column for this to happen. So I'm confused as to how to get to each title with its corresponding content without using the mouse. And if I have to program it using mouse clicks, I really don't know how that's going to work.
BTW: I use Macro Express, a very competent and complete application. | | | Rolf Keller Germany Local time: 09:05 English to German I'm confused ... | Jan 10, 2014 |
Reed D James wrote:
I want to know where all the individual HTML files are stored
I'm confused. You haven't seen the HTML files yet? How do you know that there are such files? | |
Samuel Murray Netherlands Local time: 09:05 Member (2006) English to Afrikaans + ... Okay, an answer | Jan 10, 2014 |
Reed D James wrote:
I [have] the Encyclopaedia Britannica on CD. ... I would like to be able to extract all of the articles from the program in order to view them separately and/or index them. Now, each article is an HTML file...
I don't think every article is an HTML file. It is hypertext, yes, but I see no indication that it is HTML. You can, of course, press Ctrl+S at any time and it will save the current article as an HTML file, but that HTML file will not have any links in it and will lack some of the other features as well.
According to EB's web site, the 2013 DVD contains over 100 000 articles, so it's going to take you a very long time to extract it all. Even if you can extract one article every 10 seconds for 3 hours a day, it will still take you roughly 100 days to do it.
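The estimate is easy to check with a few lines of arithmetic. The three inputs below are the figures from the paragraph above (100 000 is taken as a round lower bound for "over 100 000 articles"); everything else is plain division.

```python
ARTICLES = 100_000         # round lower bound for "over 100 000 articles"
SECONDS_PER_ARTICLE = 10   # one article every 10 seconds
HOURS_PER_DAY = 3          # time spent on it per day

total_hours = ARTICLES * SECONDS_PER_ARTICLE / 3600
days = total_hours / HOURS_PER_DAY

print(f"{total_hours:.0f} hours in total, about {days:.0f} days at {HOURS_PER_DAY} h/day")
# prints: 278 hours in total, about 93 days at 3 h/day
```

So the figure of about 100 days is the right order of magnitude: just over three months of daily sessions, before allowing for any failed saves or retries.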
I see your problem with the left column (titles) and the right column (article). There is no keyboard shortcut for loading an article -- you have to click it with the mouse. However, I think I have found a way to ensure that you can click the next article every time: if you make the mouse click the first line of the title column and then click the "down" scrollbar arrow at the bottom of the screen once, the title list moves up by one title, and you can then click the "first" line of the title column again to load the next article, and so on. Fortunately, Ctrl+S works anywhere, so you can press Ctrl+S after you have clicked, and it will save an HTML file. You'll just have to name the HTML files when you save them. And... if you've clicked on an article title and press Ctrl+C, it will copy the name of that article, so you can name the "save as" HTML file exactly after the article.
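One practical snag with naming each file after its article: Windows does not allow the characters \ / : * ? " < > | in file names, and some titles are bound to contain them. A small helper along these lines could clean the copied title before it is typed into the "save as" box. The function name, the underscore replacement and the 120-character limit are my own assumptions, not anything the Britannica program requires.

```python
import re

# Characters Windows rejects in file names, plus control characters.
_ILLEGAL = re.compile(r'[\\/:*?"<>|\x00-\x1f]')


def title_to_filename(title: str, ext: str = ".html", max_len: int = 120) -> str:
    """Turn a copied article title into a file name that Windows will accept."""
    cleaned = _ILLEGAL.sub("_", title)         # replace each illegal character with "_"
    cleaned = re.sub(r"\s+", " ", cleaned)     # collapse runs of whitespace
    cleaned = cleaned.strip(" .")              # no leading/trailing spaces or dots
    if not cleaned:
        cleaned = "untitled"                   # fallback for titles with nothing usable
    return cleaned[:max_len].rstrip(" .") + ext


print(title_to_filename("Who/What?"))   # prints: Who_What_.html
```

Two different titles can still collapse to the same file name (for example "A/B" and "A:B"), so a real script would want to check whether the file already exists before saving over it.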
You'll end up with a bunch of HTML files, though, so you'll have to make sure you have an indexing program that can index HTML files.
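If you don't have an indexing program to hand, even a very basic word index over a folder of saved HTML files can be built with Python's standard library. This is a minimal sketch under a few assumptions: the files end in `.html`, they sit directly in one folder, and they can be read as UTF-8 (undecodable bytes are replaced rather than raising an error). I haven't checked what encoding the saved files actually use.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser
from pathlib import Path
from typing import Dict, List, Set


class _TextExtractor(HTMLParser):
    """Collects the text between tags (this includes script/style content, if any)."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: List[str] = []

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)


def html_to_text(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def build_index(folder: Path) -> Dict[str, Set[str]]:
    """Map each lower-cased word to the set of file names that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for path in sorted(folder.glob("*.html")):
        # Encoding is an assumption; errors="replace" avoids crashing on odd bytes.
        text = html_to_text(path.read_text(encoding="utf-8", errors="replace"))
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(path.name)
    return index
```

Calling `build_index(Path("articles"))["volcano"]` would then return the names of all saved files that mention "volcano" (the folder name here is just an example). A proper desktop search tool will do far more, but this shows that plain HTML files are easy to index once you have them.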
Can your macro language do this, or do you want one of us to write it in AutoIt?
I think you'll find that you may THINK that you'll have great benefit from having all the articles in your own database format, but I think ultimately it would be best to simply use the company's own software, or alternatively try to buy a web-based subscription.
However, I took a look at the data files, and had no clue as to how to find each individual article.
I don't think the individual articles are stored as individual files or chunks of extractable data, either on the installation DVD or on your hard drive's installation folder. I doubt if any encyclopedia publisher would be stupid enough to do that.
Samuel
[Edited at 2014-01-10 17:57 GMT] | | | Alan Halls Germany Local time: 09:05 German to English Legal problem, definitely | Jan 11, 2014 |
For some, possibly a minor point, but if FarkasAndras says:
"My approach is that the OP bought the software, it's his to dissect."
I would have a closer look at the EULA conditions. The general legal situation is that you buy a licence to USE the software. You don't own Microsoft Office, for example, just because you've paid for a licence.
I'm all in favour of open-source software where that is the intention of the people who invent and distribute it. If it is a commercial product, I would tend to leave well alone. I also use EB for my own reference purposes and just leave it running in the background. | | | Reed James Chile Local time: 05:05 Member (2005) Spanish to English TOPIC STARTER That isn't fair | Jan 12, 2014 |
Alan Halls wrote:
For some, possibly a minor point, but if FarkasAndras says:
"My approach is that the OP bought the software, it's his to dissect."
I would have a closer look at the EULA conditions. The general legal situation is that you buy a licence to USE the software. You don't own Microsoft Office, for example, just because you've paid for a licence.
I'm all in favour of open-source software where that is the intention of the people who invent and distribute it. If it is a commercial product, I would tend to leave well alone. I also use EB for my own reference purposes and just leave it running in the background.
The way I see it, if I own a product, then it's mine to do whatever I want with it in the privacy of my own home. If I own a pair of Levi's jeans, and they get old, and I get my scissors out and make a pair of cutoffs out of them, I go ahead and do it. Now, if I were to buy Levi's wholesale, set up a factory and hire people to make cutoffs out of them with industrial machinery, and then sell the cutoffs for my own profit under my own brand, that would be unethical and illegal.
What is the difference between decompiling and/or copying and pasting from a software program and converting a PDF to an editable document to be indexed and searched by the owner? How about buying the set of encyclopedias in print form and making photocopies to take on the road with you? Haven't we all done something like that? What if I had a prodigious memory and I took it upon myself to read and memorize each and every Encyclopaedia Britannica article and then profit immensely from all the knowledge I gained? Isn't that decompiling in a sense?
As for the macros, I found them to be buggy, even when I think the code was legit. Somehow, the computer seized up when the macro told it to save the article, even though the timing was exactly the same as when I did it manually. No matter.
In short, I have closed this discussion, at least on my end. I'm going to go read my Encyclopaedia Britannica instead.