# VizSeq
VizSeq is a Python toolkit for visual analysis of text generation tasks such as machine translation, summarization, image captioning, speech translation, and video description. It takes multimodal sources, text references, and text predictions as inputs, and analyzes them visually in a Jupyter Notebook or a built-in Web App (the former has Fairseq integration). VizSeq also provides a collection of multi-process scorers as a normal Python package.
Please also see the paper https://arxiv.org/pdf/1909.05424.pdf for more details.
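For a quick feel of the Jupyter Notebook workflow, here is a minimal sketch. The function names (`vizseq.view_stats`, `vizseq.view_scores`, `vizseq.view_examples`) and their argument order are assumptions here and may differ from the actual API; the data paths are placeholders.

```python
import vizseq

# Placeholder paths to plain-text files with one sentence per line
# (these files are not shipped with VizSeq).
src, ref, hypo = 'data/src.txt', 'data/ref.txt', 'data/pred.txt'

# Dataset statistics, corpus-/sentence-level scores, and per-example views.
# Function names and argument order are assumptions about the notebook API.
vizseq.view_stats(src, ref)
vizseq.view_scores(ref, hypo, ['bleu'])
vizseq.view_examples(src, ref, hypo, metrics=['bleu'])
```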
## Task Coverage
VizSeq accepts various source types, including text, image, audio, video, or any combination of them. This covers a wide range of text generation tasks; examples are listed below:
| Source | Example Tasks |
|---|---|
| Text | Machine translation, text summarization, dialog generation, grammatical error correction, open-domain question answering |
| Image | Image captioning, image question answering, optical character recognition |
| Audio | Speech recognition, speech translation |
| Video | Video description |
| Multimodal | Multimodal machine translation |
## Metric Coverage
All scorers are accelerated with multi-processing/multi-threading.
| Type | Metrics |
|---|---|
| N-gram-based | BLEU, chrF, METEOR, TER, RIBES, GLEU, NIST, ROUGE, CIDEr, WER |
| Embedding-based | LASER, BERTScore |
## Adding a New Metric
VizSeq has an open API for adding user-defined metrics. You are welcome to contribute new scorers to broaden VizSeq's metric coverage!
### Implementing a New Scorer Class
First, add `new_metric.py` to `vizseq/scorers`, defining a new scorer class that inherits from `VizSeqScorer` and implements a `score` method. Then register the new scorer class with an id and a name using `vizseq.scorers.register_scorer`:
```python
from typing import Optional, List

from vizseq.scorers import register_scorer, VizSeqScorer, VizSeqScore


@register_scorer('new_metric_id', 'New Metric Name')
class NewMetricScorer(VizSeqScorer):
    def score(
            self, hypothesis: List[str], references: List[List[str]],
            tags: Optional[List[List[str]]] = None
    ) -> VizSeqScore:
        # calculate the number of workers by the number of examples
        self._update_n_workers(len(hypothesis))

        corpus_score, group_scores, sent_scores = None, None, None
        if self.corpus_level:
            # implement corpus-level scoring here
            corpus_score = 99.9
        if self.sent_level:
            # implement sentence-level scoring here
            sent_scores = [99.9, 99.9]
        if tags is not None:
            tag_set = self._unique(tags)
            # implement group-level (by sentence tags) scoring here
            group_scores = {t: 99.9 for t in tag_set}

        return VizSeqScore.make(
            corpus_score=corpus_score, sent_scores=sent_scores,
            group_scores=group_scores
        )
```
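Once registered, the scorer can also be used directly. The following is a usage sketch only: the constructor keyword arguments (`corpus_level`, `sent_level`, `n_workers`) are assumptions based on the attributes referenced in `score` above, not a confirmed signature, and the nesting of `references`/`tags` (one inner list per reference/tag set) is likewise an assumption.

```python
# Usage sketch: constructor arguments are assumptions (see note above).
scorer = NewMetricScorer(corpus_level=True, sent_level=True, n_workers=2)

result = scorer.score(
    hypothesis=['the cat sat on the mat', 'hello world'],
    # Assumed layout: one inner list per reference set, aligned with hypothesis.
    references=[['a cat sat on the mat', 'hello there world']],
    # Optional sentence tags for group-level scores (same assumed layout).
    tags=[['news', 'chat']],
)
print(result.corpus_score, result.sent_scores, result.group_scores)
```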
### Testing the New Scorer Class
All scorer classes need to be covered by tests. To achieve that, add a unit test `test_new_metric.py` to `tests/scorers` and run:
```bash
python -m unittest tests.scorers.test_new_metric
```
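For illustration, such a test might look like the sketch below; the scorer constructor arguments are assumptions, as noted above.

```python
# tests/scorers/test_new_metric.py -- a minimal sketch; the scorer
# constructor arguments are assumptions, not a confirmed signature.
import unittest

from vizseq.scorers.new_metric import NewMetricScorer


class NewMetricScorerTestCase(unittest.TestCase):
    def test_score(self):
        scorer = NewMetricScorer(corpus_level=True, sent_level=True)
        hypothesis = ['the cat sat on the mat', 'hello world']
        references = [['a cat sat on the mat', 'hello there world']]
        result = scorer.score(hypothesis, references)
        self.assertIsNotNone(result.corpus_score)
        self.assertEqual(len(result.sent_scores), len(hypothesis))


if __name__ == '__main__':
    unittest.main()
```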
## License
VizSeq is licensed under MIT.