Scoring function
The API returns a trust_score for each translation. This score combines the
model confidence with terminology evidence from the post-processing lookup.
Formula
The implementation uses three cases:
Exact full terminology match: return 1.0.
No terminology match: return the model confidence.
Partial terminology match: combine the model confidence and the terminology match signal.
The weighting scheme is:
where:
and:
C is the model confidence used internally by the scoring function.
M is the terminology match signal.
R_coverage is the proportion of the translated text covered by matched terminology spans.
R_count = min(number_of_matches / 5, 1.0).
This means exact terminology matches are trusted fully, partial matches raise or lower the score according to coverage and match count, and translations with no terminology support rely only on the model confidence.
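The three cases above can be sketched in Python as follows. This is a minimal illustration, not the deployed implementation: only the case split and the R_count definition come from this page. The function names, the parameter names, the two weights (`conf_weight`, `coverage_weight`) and the way M is mixed from R_coverage and R_count are assumptions made for the example.

```python
def r_count(number_of_matches: int) -> float:
    """R_count as defined above: grows linearly and saturates at 5 matches."""
    return min(number_of_matches / 5, 1.0)


def trust_score(
    confidence: float,             # C, the model confidence
    number_of_matches: int,        # number of matched terminology spans
    coverage: float,               # R_coverage, in [0, 1]
    exact_full_match: bool,
    conf_weight: float = 0.5,      # ASSUMPTION: hypothetical weight, not from the source
    coverage_weight: float = 0.5,  # ASSUMPTION: hypothetical weight, not from the source
) -> float:
    # Case 1: exact full terminology match is trusted fully.
    if exact_full_match:
        return 1.0
    # Case 2: no terminology match, fall back to the model confidence.
    if number_of_matches == 0:
        return confidence
    # Case 3: partial match.
    # ASSUMPTION: M is a convex mix of R_coverage and R_count.
    match_signal = (
        coverage_weight * coverage
        + (1.0 - coverage_weight) * r_count(number_of_matches)
    )
    # ASSUMPTION: the final score is a convex mix of C and M.
    score = conf_weight * confidence + (1.0 - conf_weight) * match_signal
    # Keep the result inside [0, 1].
    return max(0.0, min(1.0, score))
```

With a convex mix like this, a partial match raises the score when M is above C and lowers it when M is below C, which matches the behavior described above. The actual weights would need to be taken from the implementation.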
Remote API evaluation
The scenarios below were evaluated against
https://anstranslation2.ddns.net/translate using the deployed NLLB model
/app/models/checkpoint-16539.
| Scenario | Terms | With highlights | Exact score 1.0 | Average trust_score |
|---|---|---|---|---|
| Exact term with a terminology match | 1 | 1 | 1 | |
| Sentence/label with exact terms matched within terminology | 50 | 50 | 1 | |
| No terminology matches | 6 | 0 | 0 | |
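The summary columns can be derived from per-translation results with a small aggregation step. The sketch below assumes each result is a dictionary with a `trust_score` value and a `highlights` count. These field names and the `summarize` helper are hypothetical and may not match the actual API response, and the sample values are made up for illustration, not taken from the evaluation.

```python
from statistics import mean


def summarize(results):
    """Aggregate per-translation results into the summary columns."""
    scores = [r["trust_score"] for r in results]
    return {
        "terms": len(results),
        # Translations with at least one terminology highlight.
        "with_highlights": sum(1 for r in results if r["highlights"] > 0),
        # Translations that hit the exact-match rule.
        "exact_score_1": sum(1 for s in scores if s == 1.0),
        # Average is undefined for an empty batch.
        "average_trust_score": mean(scores) if scores else None,
    }


# Made-up sample values, NOT results from the remote evaluation.
sample = [
    {"trust_score": 1.0, "highlights": 1},
    {"trust_score": 0.5, "highlights": 0},
]
summary = summarize(sample)
```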
Scenario examples
The table shows representative examples only. The complete 50-label scenario 2 run is stored in scoring_function_remote_api_results.txt.
| Scenario | Input | Output | trust_score | Highlights |
|---|---|---|---|---|
| Scenario 1 | | | | 1 |
| Scenario 2 | | | | 5 |
| Scenario 2 | | | | 3 |
| Scenario 2 | | | | 1 |
| Scenario 2 | | | | 3 |
| Scenario 3 | | | | 0 |
| Scenario 3 | | | | 0 |
| Scenario 3 | | | | 0 |
Key takeaways
Scenario 1 confirms the exact-match rule: when the translated term is found exactly in the terminology, trust_score becomes 1.0.
Scenario 2 confirms the partial-match behavior: sentence-like labels usually receive high scores when several translated spans match known terminology.
Scenario 3 confirms the fallback behavior on realistic hard clinical shorthand: when there are no terminology highlights, the score comes from the model confidence only.
The full scenario 2 batch used 50 labels and is intentionally kept outside the rendered page to avoid making the documentation too long.