Query on Accurate Word Count for Webpage Translation
Thread poster: Lamine Boukabour
Lamine Boukabour Algeria Local time: 18:17 Member (2022) English to Arabic
Jul 18
Dear Colleagues,
For translation purposes, I had to estimate the word count of some webpages using my CAT tool after downloading them in HTML format. A webpage can be downloaded in three formats, as shown in the screenshot.
To check if the different formats lead to the same word count, I downloaded a webpage in two formats: HTML Only and Complete Webpage. However, upon uploading them to my CAT tool, I found significant differences in the word count. The word count of the Complete Webpage is almost double that of the HTML Only Webpage. Can you tell me which format represents the correct estimated word count that should be translated?
Thank you for your assistance.
[Edited at 2024-07-18 13:24 GMT]
Luca Tutino Italy Member (2002) English to Italian + ...
Just look inside the files with a txt editor
Jul 18
It is possible that the complete page includes a second copy of the text in txt format, or other non-HTML sections. You can check this by opening the file in a text editor such as Notepad or, better, Notepad++.
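As a rough complement to inspecting the file by eye, the sketch below tallies how many whitespace-separated words sit directly inside each kind of HTML element, which can show whether the extra words come from `script`, `style` or similar non-visible sections. It uses only Python's standard `html.parser`; the names `WordsByTag` and `words_by_tag` are made up for this illustration, and the counts will not match any particular CAT tool's rules.

```python
from collections import Counter
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not be pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class WordsByTag(HTMLParser):
    """Hypothetical helper: attributes each run of text to its enclosing tag."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.stack = []          # currently open elements
        self.counts = Counter()  # tag name -> number of words directly inside it

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed elements.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        parent = self.stack[-1] if self.stack else "(top level)"
        self.counts[parent] += len(data.split())


def words_by_tag(html: str) -> Counter:
    parser = WordsByTag()
    parser.feed(html)
    parser.close()
    return parser.counts


if __name__ == "__main__":
    sample = ("<body><p>one two</p><script>var a = 1;</script>"
              "<div><p>three</p></div></body>")
    for tag, n in words_by_tag(sample).most_common():
        print(f"{tag}: {n}")
```

Running `words_by_tag` on both saved files and comparing the two tallies should show which elements account for the difference.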
Lamine Boukabour Algeria Local time: 18:17 Member (2022) English to Arabic
TOPIC STARTER
The Complete Webpage contains an HTML file + a folder
Jul 18
Luca Tutino wrote:
It is possible that the complete page includes a second copy of the text in txt format or other non-html sections. You might be able to verify what is the case simply by examining the file with a text editor like notepad or, better, notepad++.
Each method generates one HTML file; saving as Complete Webpage also creates an accompanying folder. So I ended up with two HTML files with different word counts. I wonder which file reflects the correct estimated word count of the page.
Jennifer Levey Chile Local time: 14:17 Spanish to English + ...
Probably neither of them is a 'correct estimate'
Jul 18
Lamine Boukabour wrote:
For translation purposes, I had to estimate the word count of some webpages using my CAT tool after downloading them in
(...)
two formats: HTML Only and Complete Webpage. However, upon uploading them to my CAT tool, I found significant differences in the word count. The word count of the Complete Webpage is almost double that of the HTML Only Webpage. Can you tell me which format represents the correct estimated word count that should be translated?
(...)
Lamine Boukabour Algeria Local time: 18:17 Member (2022) English to Arabic
TOPIC STARTER
Word Count Estimation Approach
Jul 19
Jennifer Levey wrote:
Lamine Boukabour wrote:
For translation purposes, I had to estimate the word count of some webpages using my CAT tool after downloading them in
(...)
two formats: HTML Only and Complete Webpage. However, upon uploading them to my CAT tool, I found significant differences in the word count. The word count of the Complete Webpage is almost double that of the HTML Only Webpage. Can you tell me which format represents the correct estimated word count that should be translated?
(...)
It seems to me that if I can't get the actual word count from the client, the most straightforward approach is to manually download the intended pages as HTML and use my CAT tool to estimate the word count. I tried using software to automatically download them but was unsuccessful.
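For a quick cross-check of the CAT tool's figure, here is a minimal sketch that counts only the text outside `script`, `style`, `noscript` and `template` elements in a saved HTML file. It relies on Python's standard `html.parser`; the names `VisibleTextCounter` and `count_visible_words` are hypothetical, and "word" here simply means a whitespace-separated token. It ignores translatable attributes such as `alt` text or meta descriptions, which a CAT tool may include, so treat the result as a ballpark figure only.

```python
from html.parser import HTMLParser

# Elements whose content is normally not shown to the reader.
SKIP = {"script", "style", "noscript", "template"}


class VisibleTextCounter(HTMLParser):
    """Hypothetical helper: counts words outside the SKIP elements."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0  # > 0 while inside a skipped element
        self.words = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.words += len(data.split())


def count_visible_words(html: str) -> int:
    parser = VisibleTextCounter()
    parser.feed(html)
    parser.close()
    return parser.words


if __name__ == "__main__":
    import sys

    # Usage: python count_words.py saved_page.html
    # Adjust the encoding if the page was not saved as UTF-8.
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        print(count_visible_words(fh.read()))
```

If the HTML Only and Complete Webpage files give similar totals with this filter applied, the gap seen in the CAT tool probably comes from non-visible content rather than from the page text itself.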