Commit 34c0a24

feat: localize_html method
1 parent 9c5994b commit 34c0a24

5 files changed, +371 -12 lines changed

Gemfile.lock

Lines changed: 5 additions & 11 deletions
@@ -3,6 +3,7 @@ PATH
   specs:
     lingodotdev (0.1.0)
       json (~> 2.0)
+      nokogiri (~> 1.0)
 
 GEM
   remote: https://rubygems.org/
@@ -17,12 +18,15 @@ GEM
       rdoc (>= 4.0.0)
       reline (>= 0.4.2)
     json (2.15.2)
+    nokogiri (1.18.10-arm64-darwin)
+      racc (~> 1.4)
     pp (0.6.3)
       prettyprint
     prettyprint (0.2.0)
     psych (5.2.6)
       date
       stringio
+    racc (1.8.1)
     rake (13.3.1)
     rdoc (6.15.1)
       erb
@@ -47,24 +51,14 @@ GEM
     tsort (0.2.0)
 
 PLATFORMS
-  aarch64-linux-gnu
-  aarch64-linux-musl
-  arm-linux-gnu
-  arm-linux-musl
   arm64-darwin
-  ruby
-  x86-linux-gnu
-  x86-linux-musl
-  x86_64-darwin
-  x86_64-linux-gnu
-  x86_64-linux-musl
 
 DEPENDENCIES
   dotenv (~> 3.0)
   irb
+  lingodotdev!
   rake (~> 13.0)
   rspec (~> 3.13)
-  lingodotdev!
 
 BUNDLED WITH
    2.7.1

README.md

Lines changed: 48 additions & 1 deletion
@@ -116,6 +116,38 @@ result = engine.localize_chat(chat, target_locale: 'ja')
 # ]
 ```
 
+### HTML document localization
+
+Localize HTML documents while preserving structure and formatting:
+
+```ruby
+html = <<~HTML
+  <html>
+    <head>
+      <title>Welcome</title>
+      <meta name="description" content="Page description">
+    </head>
+    <body>
+      <h1>Hello World</h1>
+      <p>This is a paragraph with <a href="/test" title="Link title">a link</a>.</p>
+      <img src="/image.jpg" alt="Test image">
+      <input type="text" placeholder="Enter text">
+    </body>
+  </html>
+HTML
+
+result = engine.localize_html(html, target_locale: 'es')
+# => HTML with localized text content and attributes, lang="es" attribute updated
+```
+
+The method handles:
+
+- Text content in all elements
+- Localizable attributes: `meta` content, `img` alt, `input` placeholder, `a` title
+- Preserves HTML structure and formatting
+- Ignores content inside `script` and `style` tags
+- Updates the `lang` attribute on the `html` element
+
 ### Batch localization to multiple locales
 
 Localize the same content to multiple target locales:
@@ -308,6 +340,21 @@ Localizes chat messages. Each message must have `:name` and `:text` keys.
 - **Parameters:** Same as `localize_text`, with `chat` (Array) instead of `text`
 - **Returns:** Array of localized chat messages
 
+#### `localize_html(html, target_locale:, source_locale: nil, fast: nil, reference: nil, on_progress: nil, concurrent: false, &block)`
+
+Localizes an HTML document while preserving structure and formatting. Handles both text content and localizable attributes (alt, title, placeholder, meta content).
+
+- **Parameters:**
+  - `html` (String): HTML document string to localize
+  - `target_locale` (String): Target locale code (e.g., 'es', 'fr', 'ja')
+  - `source_locale` (String, optional): Source locale code
+  - `fast` (Boolean, optional): Enable fast mode
+  - `reference` (Hash, optional): Additional context for translation
+  - `on_progress` (Proc, optional): Progress callback
+  - `concurrent` (Boolean): Enable concurrent processing
+  - `&block`: Alternative progress callback
+- **Returns:** Localized HTML document string with updated `lang` attribute
+
 #### `batch_localize_text(text, target_locales:, source_locale: nil, fast: nil, reference: nil, concurrent: false)`
 
 Localizes text to multiple target locales.
@@ -426,8 +473,8 @@ bundle exec rake install
 
 ## Dependencies
 
-- `http` ~> 5.0
 - `json` ~> 2.0
+- `nokogiri` ~> 1.0
 
 ## License
 
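
Review note: the README documents both `on_progress` and `&block` as progress callbacks, and the implementation in `lib/lingodotdev.rb` selects one with `callback = block || on_progress`. The sketch below isolates that selection so the precedence is easy to see. `report_progress` and the simulated percentages are hypothetical stand-ins for illustration, not part of the gem, and no localization is performed.

```ruby
# Hypothetical stand-in that mirrors only the callback selection used in
# localize_html (`callback = block || on_progress`). It does not call the API.
def report_progress(on_progress: nil, &block)
  callback = block || on_progress
  # Simulated progress values; the real ones come from the localization run.
  [0, 50, 100].each { |pct| callback&.call(pct) }
end

seen_by_proc = []
seen_by_block = []

# Both forms supplied: the block wins and the proc is not invoked.
report_progress(on_progress: ->(pct) { seen_by_proc << pct }) { |pct| seen_by_block << pct }
proc_calls_when_both_given = seen_by_proc.dup

# Only the keyword supplied: the proc receives the updates.
report_progress(on_progress: ->(pct) { seen_by_proc << pct })
```

Under this reading, passing both a block and `on_progress` silently ignores the proc, which may be worth stating in the README parameter list.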

lib/lingodotdev.rb

Lines changed: 176 additions & 0 deletions
@@ -6,6 +6,7 @@
 require 'json'
 require 'securerandom'
 require 'openssl'
+require 'nokogiri'
 
 # Configure SSL context globally at module load time to work around CRL verification issues
 # This is a production-safe workaround for OpenSSL 3.6+ that disables CRL checking
@@ -323,6 +324,181 @@ def localize_chat(chat, target_locale:, source_locale: nil, fast: nil, reference
       response[:chat] || []
     end
 
+    # Localizes an HTML document while preserving structure and formatting.
+    #
+    # Handles both text content and localizable attributes (alt, title, placeholder, meta content).
+    #
+    # @param html [String] the HTML document string to be localized
+    # @param target_locale [String] the target locale code (e.g., 'es', 'fr', 'ja')
+    # @param source_locale [String, nil] the source locale code (optional, auto-detected if not provided)
+    # @param fast [Boolean, nil] enable fast mode for quicker results (optional)
+    # @param reference [Hash, nil] additional context for translation (optional)
+    # @param on_progress [Proc, nil] callback for progress updates (optional)
+    # @param concurrent [Boolean] enable concurrent processing (default: false)
+    #
+    # @yield [progress] optional block for progress tracking
+    # @yieldparam progress [Integer] completion percentage (0-100)
+    #
+    # @return [String] the localized HTML document as a string, with updated lang attribute
+    #
+    # @raise [ValidationError] if target_locale is missing or html is nil
+    # @raise [APIError] if the API request fails
+    #
+    # @example Basic usage
+    # html = '<html><head><title>Hello</title></head><body><p>World</p></body></html>'
+    # result = engine.localize_html(html, target_locale: 'es')
+    # # => "<html lang=\"es\">..."
+    def localize_html(html, target_locale:, source_locale: nil, fast: nil, reference: nil, on_progress: nil, concurrent: false, &block)
+      raise ValidationError, 'Target locale is required' if target_locale.nil? || target_locale.empty?
+      raise ValidationError, 'HTML cannot be nil' if html.nil?
+
+      callback = block || on_progress
+
+      doc = Nokogiri::HTML::Document.parse(html)
+
+      localizable_attributes = {
+        'meta' => ['content'],
+        'img' => ['alt'],
+        'input' => ['placeholder'],
+        'a' => ['title']
+      }
+
+      unlocalizable_tags = ['script', 'style']
+
+      extracted_content = {}
+
+      get_path = lambda do |node, attribute = nil|
+        indices = []
+        current = node
+        root_parent = nil
+
+        while current
+          parent = current.parent
+          break unless parent
+
+          if parent == doc.root
+            root_parent = current.name.downcase if current.element?
+            break
+          end
+
+          siblings = parent.children.select do |n|
+            (n.element? || (n.text? && n.text.strip != ''))
+          end
+
+          index = siblings.index(current)
+          if index
+            indices.unshift(index)
+          end
+
+          current = parent
+        end
+
+        base_path = root_parent ? "#{root_parent}/#{indices.join('/')}" : indices.join('/')
+        attribute ? "#{base_path}##{attribute}" : base_path
+      end
+
+      process_node = lambda do |node|
+        parent = node.parent
+        while parent && !parent.is_a?(Nokogiri::XML::Document)
+          if parent.element? && unlocalizable_tags.include?(parent.name.downcase)
+            return
+          end
+          parent = parent.parent
+        end
+
+        if node.text?
+          text = node.text.strip
+          if text != ''
+            extracted_content[get_path.call(node)] = text
+          end
+        elsif node.element?
+          element = node
+          tag_name = element.name.downcase
+          attributes = localizable_attributes[tag_name] || []
+          attributes.each do |attr|
+            value = element[attr]
+            if value && value.strip != ''
+              extracted_content[get_path.call(element, attr)] = value
+            end
+          end
+
+          element.children.each do |child|
+            process_node.call(child)
+          end
+        end
+      end
+
+      head = doc.at_css('head')
+      if head
+        head.children.select do |n|
+          n.element? || (n.text? && n.text.strip != '')
+        end.each do |child|
+          process_node.call(child)
+        end
+      end
+
+      body = doc.at_css('body')
+      if body
+        body.children.select do |n|
+          n.element? || (n.text? && n.text.strip != '')
+        end.each do |child|
+          process_node.call(child)
+        end
+      end
+
+      localized_content = localize_raw(
+        extracted_content,
+        target_locale: target_locale,
+        source_locale: source_locale,
+        fast: fast,
+        reference: reference,
+        concurrent: concurrent
+      ) do |progress, chunk, processed_chunk|
+        callback&.call(progress)
+      end
+
+      doc.root['lang'] = target_locale if doc.root
+
+      localized_content.each do |path, value|
+        node_path, attribute = path.split('#')
+        parts = node_path.split('/')
+        root_tag = parts[0]
+        indices = parts[1..-1]
+
+        parent = root_tag == 'head' ? doc.at_css('head') : doc.at_css('body')
+        next unless parent
+        current = parent
+
+        indices.each do |index_str|
+          index = index_str.to_i
+          siblings = parent.children.select do |n|
+            (n.element? || (n.text? && n.text.strip != ''))
+          end
+
+          current = siblings[index]
+          break unless current
+
+          if current.element?
+            parent = current
+          end
+        end
+
+        if current
+          if attribute
+            if current.element?
+              current[attribute] = value
+            end
+          else
+            if current.text?
+              current.content = value
+            end
+          end
+        end
+      end
+
+      doc.to_html
+    end
+
     # Localizes text to multiple target locales.
     #
     # @param text [String] the text to localize
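
Review note: `localize_html` keys each extracted string by a path of the form `root_tag/index/index[#attribute]`, built in `get_path` and split apart again in the write-back loop. The sketch below reproduces only that string format in plain Ruby so the round trip can be checked without Nokogiri. `build_key` and `parse_key` are hypothetical helper names introduced for illustration and do not appear in the commit.

```ruby
# Hypothetical helpers that mirror the key format used by localize_html.
# They work on plain arrays and strings and do not touch a parsed document.

# Builds a key the same way get_path does once root_parent and indices are known.
def build_key(root_tag, indices, attribute = nil)
  base = "#{root_tag}/#{indices.join('/')}"
  attribute ? "#{base}##{attribute}" : base
end

# Splits a key the same way the write-back loop does.
def parse_key(key)
  node_path, attribute = key.split('#')
  root_tag, *indices = node_path.split('/')
  { root_tag: root_tag, indices: indices.map(&:to_i), attribute: attribute }
end

text_key = build_key('body', [1, 0])          # a text node
attr_key = build_key('body', [1, 1], 'title') # an attribute on an element

parsed_text = parse_key(text_key)
parsed_attr = parse_key(attr_key)
```

Because the indices count only elements and non-blank text nodes, extraction and write-back must apply the same sibling filter; the commit does this by repeating the same `select` block in both places.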

sdk-ruby.gemspec

Lines changed: 1 addition & 0 deletions
@@ -29,6 +29,7 @@ Gem::Specification.new do |spec|
   spec.require_paths = ["lib"]
 
   spec.add_dependency "json", "~> 2.0"
+  spec.add_dependency "nokogiri", "~> 1.0"
 
   spec.add_development_dependency "rspec", "~> 3.13"
   spec.add_development_dependency "dotenv", "~> 3.0"
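
Review note: the new runtime dependency uses the pessimistic constraint `~> 1.0`, which accepts any 1.x release of nokogiri and rejects 2.0 and later. A quick way to check a constraint like this is RubyGems' own `Gem::Requirement`. The snippet is a small sketch; the version `1.18.10` is taken from the `Gemfile.lock` hunk above, and `2.0.0` is a hypothetical future release used only for comparison.

```ruby
require 'rubygems'

# The constraint declared in sdk-ruby.gemspec for nokogiri.
requirement = Gem::Requirement.new('~> 1.0')

locked_version = Gem::Version.new('1.18.10') # version recorded in Gemfile.lock
next_major     = Gem::Version.new('2.0.0')   # hypothetical future major release

allows_locked     = requirement.satisfied_by?(locked_version)
allows_next_major = requirement.satisfied_by?(next_major)
```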
