office_translator/scripts/split_i18n.py
sepehr fa637abff0
All checks were successful
Deploy to Production / Build and Deploy (push) Successful in 2m49s
perf+security: fix build, secure downloads, dedupe translations, refactor i18n

Frontend:
- Fix Framer Motion / motion-dom build error by pinning framer-motion to
  11.18.2 (compatible with React 19 and Next.js 16).
- Add cross-env and build:local script to bypass standalone symlink errors
  on Windows without Developer Mode.
- Allow NEXT_OUTPUT=default to disable standalone output for local builds.
- Refactor i18n: split 14,177-line src/lib/i18n.tsx into per-locale,
  per-namespace JSON files under src/lib/i18n/messages/.
- Load English synchronously and other locales on demand via dynamic
  imports (smaller initial bundle, easier maintenance).
- Remove unused next-intl message files src/messages/en.json and fr.json.
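The frontend loader is TypeScript and is not part of this commit view. The Python sketch below only illustrates the strategy described above: it reads the `manifest` field of the `index.json` that `scripts/split_i18n.py` writes (locale mapped to a list of namespaces), loads English up front, and loads any other locale's namespace files the first time they are requested, caching the result. The class name `MessageCatalog` and the fallback to the English message (then to the key itself) are assumptions for illustration, not the frontend's actual API.

```python
import json
from pathlib import Path


class MessageCatalog:
    """Hypothetical loader: eager default locale, lazily loaded and cached others."""

    def __init__(self, messages_dir: Path, default_locale: str = "en") -> None:
        self.messages_dir = messages_dir
        self.default_locale = default_locale
        index = json.loads((messages_dir / "index.json").read_text(encoding="utf-8"))
        # Assumed shape, as written by split_i18n.py: {"manifest": {locale: [namespace, ...]}}
        self.manifest: dict[str, list[str]] = index["manifest"]
        self._cache: dict[str, dict[str, str]] = {}
        # Load the default locale immediately, mirroring the synchronous English import.
        self._cache[default_locale] = self._load(default_locale)

    def _load(self, locale: str) -> dict[str, str]:
        """Merge every namespace file listed for `locale` into one flat dict."""
        merged: dict[str, str] = {}
        for namespace in self.manifest.get(locale, []):
            path = self.messages_dir / locale / f"{namespace}.json"
            merged.update(json.loads(path.read_text(encoding="utf-8")))
        return merged

    def t(self, locale: str, key: str) -> str:
        """Return the message for `key`, falling back to English, then to the key."""
        if locale not in self._cache:
            self._cache[locale] = self._load(locale)  # loaded on first use only
        messages = self._cache[locale]
        if key in messages:
            return messages[key]
        return self._cache[self.default_locale].get(key, key)
```

Constructing `MessageCatalog(Path("frontend/src/lib/i18n/messages"))` reads only the English files; a French file is read the first time `t("fr", ...)` is called, and keys missing from French resolve to their English text.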

Backend:
- Remove insecure legacy /api/v1/download/{filename} and /api/v1/cleanup/{filename}
  endpoints. The job-based /api/v1/download/{job_id} already enforces ownership.
- Deduplicate texts in TranslationService.translate_batch before sending them
  to the provider, reducing API calls for repeated strings.
- Pin httpx to <0.28 to fix TestClient incompatibility with starlette 0.35.1.
- Add pytest-cov and ruff dev dependencies/config.
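A minimal sketch of the deduplication idea behind the `translate_batch` change, assuming a provider callable that takes a list of strings and returns one translation per input, in order. The function name `translate_batch_deduplicated` and the `translate` callable are hypothetical; the real `TranslationService.translate_batch` signature is not shown in this commit.

```python
from typing import Callable


def translate_batch_deduplicated(
    texts: list[str],
    translate: Callable[[list[str]], list[str]],
) -> list[str]:
    """Send each distinct text to the provider once, then expand results back.

    `translate` is assumed to return one translation per input, in order.
    """
    # dict.fromkeys keeps first-occurrence order while dropping repeats.
    unique = list(dict.fromkeys(texts))
    if not unique:
        return []
    translated = translate(unique)
    if len(translated) != len(unique):
        raise ValueError("provider returned a different number of translations")
    lookup = dict(zip(unique, translated))
    # Rebuild the full result so callers still get one output per input text.
    return [lookup[text] for text in texts]
```

With `["Save", "Cancel", "Save"]`, the provider receives only `["Save", "Cancel"]`, and the result still has three entries in the original order.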

DevOps:
- Remove hardcoded Grafana password from docker-compose.yml and
  docker-compose.monitoring.yml; use GRAFANA_PASSWORD env var.
- Change default TRANSLATION_SERVICE from ollama to google in
  docker-compose.yml (Ollama is an optional profile).
- Add GRAFANA_PASSWORD to .env.example.
- Add .coverage and frontend/pnpm-workspace.yaml to .gitignore.

Tests:
- Update API versioning tests for removed legacy endpoints.
- Add tests/test_translation_service.py for deduplication behavior.

Verified:
- pnpm run build:local passes.
- uv run pytest tests/test_providers/* tests/test_translation_service.py
  tests/test_story_3_5_api_versioning.py tests/test_download_endpoint.py
  tests/test_translators/test_excel_translator.py: provider/translator tests
  pass; one pre-existing French error-message test still fails (message is
  returned in English, unrelated to this change).
2026-06-14 16:44:18 +02:00

108 lines
3.4 KiB
Python

"""
Split frontend/src/lib/i18n.tsx into per-locale, per-namespace JSON files.

Generates:
    frontend/src/lib/i18n/messages/<locale>/<namespace>.json
    frontend/src/lib/i18n/messages/index.json (manifest)
"""

import json
import re
from collections import defaultdict
from pathlib import Path

ROOT = Path(__file__).parent.parent
SOURCE = ROOT / "frontend" / "src" / "lib" / "i18n.tsx"
OUT_DIR = ROOT / "frontend" / "src" / "lib" / "i18n" / "messages"


def find_locale_blocks(text: str) -> list[tuple[str, str]]:
    """Find each locale block (e.g. `en: { ... }`) using brace matching.

    Braces inside double-quoted strings are ignored, so a value such as
    "{count} files" does not unbalance the count.
    """
    blocks = []
    pattern = re.compile(r"\n\s*([a-z]{2}):\s*\{")
    for match in pattern.finditer(text):
        locale = match.group(1)
        start = match.end() - 1  # position of the opening '{'
        brace_count = 0
        in_string = False
        escape = False
        i = start
        while i < len(text):
            ch = text[i]
            if escape:
                escape = False
            elif ch == "\\":
                escape = True
            elif ch == '"':
                in_string = not in_string
            elif not in_string:
                if ch == "{":
                    brace_count += 1
                elif ch == "}":
                    brace_count -= 1
                    if brace_count == 0:
                        blocks.append((locale, text[start + 1 : i]))
                        break
            i += 1
    return blocks


def decode_js_string(body: str) -> str:
    """Decode the escape sequences in the body of a double-quoted JS string.

    JSON-compatible escapes are decoded by json.loads; the JS-only escaped
    apostrophe is rewritten first because JSON rejects it.
    """
    # Match escape pairs whole so an escaped backslash before a quote is kept.
    body = re.sub(
        r"\\(.)",
        lambda m: m.group(1) if m.group(1) == "'" else m.group(0),
        body,
        flags=re.DOTALL,
    )
    return json.loads(f'"{body}"')


def parse_block(block: str) -> dict[str, str]:
    """Parse "key": value pairs from a locale block.

    Values may be concatenated string literals ("a" + "b"). Only
    double-quoted literals are recognized; single-quoted and template
    literals are skipped.
    """
    messages = {}
    # Match "key": value, where value is a string literal possibly followed by + "..."
    entry_pattern = re.compile(
        r'"([a-zA-Z0-9_\-\.]+)":\s*((?:"(?:[^"\\]|\\.)*"\s*(?:\+\s*)?)+)',
        re.DOTALL,
    )
    for match in entry_pattern.finditer(block):
        key = match.group(1)
        raw = match.group(2)
        parts = re.findall(r'"((?:[^"\\]|\\.)*)"', raw, flags=re.DOTALL)
        # Decode escapes so json.dumps does not escape the backslashes again.
        messages[key] = "".join(decode_js_string(part) for part in parts)
    return messages


def main() -> None:
    content = SOURCE.read_text(encoding="utf-8")
    OUT_DIR.mkdir(parents=True, exist_ok=True)

    manifest: dict[str, list[str]] = defaultdict(list)
    all_namespaces: set[str] = set()
    key_counts: dict[str, int] = {}

    for locale, block in find_locale_blocks(content):
        messages = parse_block(block)
        key_counts[locale] = len(messages)

        # Group keys by their first dotted segment, e.g. "nav.home" -> "nav".
        by_namespace: dict[str, dict[str, str]] = defaultdict(dict)
        for key, value in messages.items():
            namespace = key.split(".")[0]
            by_namespace[namespace][key] = value
            all_namespaces.add(namespace)

        locale_dir = OUT_DIR / locale
        locale_dir.mkdir(parents=True, exist_ok=True)
        for namespace, msgs in by_namespace.items():
            (locale_dir / f"{namespace}.json").write_text(
                json.dumps(msgs, indent=2, ensure_ascii=False) + "\n",
                encoding="utf-8",
            )
            if namespace not in manifest[locale]:
                manifest[locale].append(namespace)

    # Write the manifest with locales and namespaces in sorted order.
    sorted_manifest = {loc: sorted(manifest[loc]) for loc in sorted(manifest)}
    (OUT_DIR / "index.json").write_text(
        json.dumps(
            {
                "locales": list(sorted_manifest),
                "namespaces": sorted(all_namespaces),
                "manifest": sorted_manifest,
            },
            indent=2,
            ensure_ascii=False,
        )
        + "\n",
        encoding="utf-8",
    )

    print(f"Wrote namespace files for {len(sorted_manifest)} locales")
    print(f"Namespaces: {len(all_namespaces)}")
    print(f"Total keys (en): {key_counts.get('en', 0)}")


if __name__ == "__main__":
    main()
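When string-literal bodies are copied out of TypeScript source with a regex, escape sequences such as `\"`, `\n` or `\u00e9` are still in their source form. Passed to `json.dumps` unchanged, each backslash is escaped a second time, and the frontend would show a literal backslash. This standalone sketch isolates that decoding step; `decode_escapes` is a hypothetical helper name, and it handles only JSON-compatible escapes plus the JS-only escaped apostrophe.

```python
import json
import re


def decode_escapes(body: str) -> str:
    """Decode a double-quoted JS string body into the text it represents.

    JSON-compatible escapes are decoded by json.loads; the JS-only escaped
    apostrophe is rewritten first. Escapes JSON rejects, such as hex
    escapes, raise json.JSONDecodeError.
    """
    # Match escape pairs whole, so an escaped backslash followed by a quote
    # is left alone and only a real escaped apostrophe loses its backslash.
    body = re.sub(
        r"\\(.)",
        lambda m: m.group(1) if m.group(1) == "'" else m.group(0),
        body,
        flags=re.DOTALL,
    )
    return json.loads(f'"{body}"')


raw = r'Click \"Save\" to continue\n'  # body as it appears in the .tsx source
print(json.dumps({"k": raw}))                  # {"k": "Click \\\"Save\\\" to continue\\n"}
print(json.dumps({"k": decode_escapes(raw)}))  # {"k": "Click \"Save\" to continue\n"}
```

The first line of output is what ends up in the JSON file without decoding; the second round-trips to the intended message.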