VizSeq
======

VizSeq is a Python toolkit for visual analysis on text generation tasks like machine translation, summarization, image captioning, speech translation and video description. It takes multi-modal sources, text references as well as text predictions as inputs, and analyzes them visually in Jupyter Notebook or a built-in Web App (the former has Fairseq integration). VizSeq also provides a collection of multi-process scorers as a normal Python package. Please also see the `paper <https://arxiv.org/pdf/1909.05424.pdf>`_ for more details.

Task Coverage
-------------

VizSeq accepts various source types, including text, image, audio, video or any combination of them. This covers a wide range of text generation tasks, examples of which are listed below:

.. list-table::
   :widths: 25 25
   :header-rows: 1

   * - Source
     - Example Tasks
   * - Text
     - Machine translation, text summarization, dialog generation, grammatical error correction, open-domain question answering
   * - Image
     - Image captioning, image question answering, optical character recognition
   * - Audio
     - Speech recognition, speech translation
   * - Video
     - Video description
   * - Multimodal
     - Multimodal machine translation

Metric Coverage
---------------

**Accelerated with multi-processing/multi-threading.**

.. list-table::
   :widths: 25 25
   :header-rows: 1

   * - Type
     - Metrics
   * - N-gram-based
     - * BLEU (`Papineni et al., 2002 <https://www.aclweb.org/anthology/P02-1040>`_)
       * NIST (`Doddington, 2002 <http://www.mt-archive.info/HLT-2002-Doddington.pdf>`_)
       * METEOR (`Banerjee et al., 2005 <https://www.aclweb.org/anthology/W05-0909>`_)
       * TER (`Snover et al., 2006 <http://mt-archive.info/AMTA-2006-Snover.pdf>`_)
       * RIBES (`Isozaki et al., 2010 <https://www.aclweb.org/anthology/D10-1092>`_)
       * chrF (`Popović et al., 2015 <https://www.aclweb.org/anthology/W15-3049>`_)
       * GLEU (`Wu et al., 2016 <https://arxiv.org/pdf/1609.08144.pdf>`_)
       * ROUGE (`Lin, 2004 <https://www.aclweb.org/anthology/W04-1013>`_)
       * CIDEr (`Vedantam et al., 2015 <https://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Vedantam_CIDEr_Consensus-Based_Image_2015_CVPR_paper.pdf>`_)
       * WER
   * - Embedding-based
     - * LASER (`Artetxe and Schwenk, 2018 <https://arxiv.org/pdf/1812.10464.pdf>`_)
       * BERTScore (`Zhang et al., 2019 <https://arxiv.org/pdf/1904.09675.pdf>`_)
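These scorers can also be used on their own as a normal Python package. The snippet below is a minimal usage sketch rather than official documentation: the module path `vizseq.scorers.bleu.BLEUScorer` and the `corpus_level`/`sent_level` constructor flags are assumptions (they mirror the attributes used in the scorer example further down this page), so check the package for the exact import paths and signatures:

::

    # Import path and constructor flags are assumed, not verified against the package.
    from vizseq.scorers.bleu import BLEUScorer

    hypothesis = ['VizSeq is a visual analysis toolkit .']
    references = [['VizSeq is a toolkit for visual analysis .']]  # one reference set

    # corpus_level / sent_level toggle which scores are computed (assumed flags)
    scorer = BLEUScorer(corpus_level=True, sent_level=True)
    result = scorer.score(hypothesis, references)

    print(result.corpus_score)  # single corpus-level score
    print(result.sent_scores)   # one score per hypothesis sentence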
Add metric
----------

VizSeq has an open API for adding user-defined metrics. You are welcome to contribute new scorers to enlarge VizSeq's metric coverage!

Implementing A New Scorer Class
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To start, add `new_metric.py` to `vizseq/scorers`, in which a new scorer class inherits from `VizSeqScorer` and defines a `score` method. Then register the new scorer class with an id and a name using `vizseq.scorers.register_scorer`:

.. highlight:: python

::

    from typing import Optional, List

    from vizseq.scorers import register_scorer, VizSeqScorer, VizSeqScore


    @register_scorer('new_metric_id', 'New Metric Name')
    class NewMetricScorer(VizSeqScorer):
        def score(
                self,
                hypothesis: List[str],
                references: List[List[str]],
                tags: Optional[List[List[str]]] = None
        ) -> VizSeqScore:
            # set the number of workers based on the number of examples
            self._update_n_workers(len(hypothesis))

            corpus_score, group_scores, sent_scores = None, None, None
            if self.corpus_level:
                # implement the corpus-level score here
                corpus_score = 99.9
            if self.sent_level:
                # implement the sentence-level scores here (one per example)
                sent_scores = [99.9, 99.9]
            if tags is not None:
                tag_set = self._unique(tags)
                # implement the group-level (by sentence tags) scores here
                group_scores = {t: 99.9 for t in tag_set}

            return VizSeqScore.make(
                corpus_score=corpus_score,
                sent_scores=sent_scores,
                group_scores=group_scores
            )

Testing the New Scorer Class
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

All scorer classes need to be covered by tests. To achieve that, add a unit test `test_new_metric.py` to `tests/scorers` and run (a minimal example test is sketched at the end of this page):

::

    python -m unittest tests.scorers.test_new_metric

License
-------

VizSeq is licensed under MIT.
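Example Test Sketch
-------------------

For reference, below is a minimal sketch of what `tests/scorers/test_new_metric.py` could look like for the `NewMetricScorer` example above. It is illustrative only: the constructor flags (`corpus_level`, `sent_level`) and the reference nesting are assumptions based on the attributes and type hints used in the scorer example, not a verified part of the VizSeq test suite:

::

    # tests/scorers/test_new_metric.py (hypothetical example)
    import unittest

    from vizseq.scorers.new_metric import NewMetricScorer


    class NewMetricScorerTestCase(unittest.TestCase):
        def test_corpus_and_sentence_scores(self):
            hypothesis = ['a b c d', 'e f g h']
            # one reference set, parallel to the hypotheses (nesting assumed
            # from the List[List[str]] signature of the score method)
            references = [['a b c d', 'e f g h']]

            # constructor flags are assumed; adjust to the actual VizSeqScorer API
            scorer = NewMetricScorer(corpus_level=True, sent_level=True)
            score = scorer.score(hypothesis, references)

            self.assertIsNotNone(score.corpus_score)
            self.assertEqual(len(score.sent_scores), len(hypothesis))


    if __name__ == '__main__':
        unittest.main()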