NameTag 3.1.0

@strakova released this 03 Mar 13:01

Since NameTag 3.1, flat NER models can optionally be trained with multiple named entity tagsets. At inference time, the trained model can be asked to recognize named entities using a specific tagset; if none is requested, a predefined default tagset is used.

This enables joint training on multiple datasets with different tagsets, which in turn extends the set of covered languages. NameTag 3 achieves state-of-the-art performance on 21 test datasets in 15 languages: Cebuano, Chinese, Croatian, Czech, Danish, English, Norwegian Bokmål, Norwegian Nynorsk, Portuguese, Russian, Serbian, Slovak, Swedish, Tagalog, and Ukrainian. It also delivers competitive results on Arabic, Dutch, German, Maghrebi, and Spanish, as of February 2025.

The currently supported tagsets are the following:

  • conll: The CoNLL-2003 shared task tagset: PER, ORG, LOC, and MISC,
  • uner: The Universal NER v1 tagset: PER, ORG, and LOC,
  • onto: The OntoNotes v5 tagset: PERSON, NORP, FAC, ORG, GPE, etc.

In the NameTag 3 webservice, the tagset variants of one model are served separately, e.g., nametag3-multilingual-conll-250203, nametag3-multilingual-uner-250203, and nametag3-multilingual-onto-250203. These variants share a single multilingual model and apply a tagset mask to its output, so that only tags of the requested tagset can be predicted.
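
The idea of masking the output of one shared model can be illustrated with a minimal Python sketch. This is not the NameTag 3 implementation: the joint label list, the choice of conll as the default tagset, the truncated onto tag list, and the function names `tagset_mask` and `predict` are all assumptions made for illustration, and BIO prefixes are omitted for brevity.

```python
import math

# Hypothetical tag inventories; "onto" is truncated, as in the list above.
TAGSETS = {
    "conll": ["O", "PER", "ORG", "LOC", "MISC"],
    "uner": ["O", "PER", "ORG", "LOC"],
    "onto": ["O", "PERSON", "NORP", "FAC", "ORG", "GPE"],
}
DEFAULT_TAGSET = "conll"  # assumption: the real default is not stated here

# Assumed joint output space of the shared model: the union of all tags.
LABELS = sorted({tag for tags in TAGSETS.values() for tag in tags})


def tagset_mask(tagset):
    """Return one boolean per joint label: True if the tagset allows it."""
    allowed = set(TAGSETS[tagset])
    return [label in allowed for label in LABELS]


def predict(scores, tagset=None):
    """Pick the best-scoring label among those allowed by the tagset.

    `scores` holds one score per entry of LABELS for a single token.
    Labels outside the requested tagset are set to -inf before the argmax.
    """
    mask = tagset_mask(tagset or DEFAULT_TAGSET)
    masked = [s if keep else -math.inf for s, keep in zip(scores, mask)]
    best = max(range(len(LABELS)), key=masked.__getitem__)
    return LABELS[best]


# The same scores yield different tags depending on the requested tagset.
scores = [0.0] * len(LABELS)
scores[LABELS.index("PERSON")] = 5.0
scores[LABELS.index("PER")] = 4.0
print(predict(scores, "onto"))   # PERSON
print(predict(scores, "conll"))  # PER
print(predict(scores))           # PER (falls back to the assumed default)
```

In this sketch a single set of scores serves every tagset, which mirrors how the served variants can differ only in the mask they apply rather than in the underlying model.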