Releases: ufal/nametag3
NameTag 3.1.0
Since NameTag 3.1, flat NER models can optionally be trained with multiple named entity tagsets. At inference time, the trained model can be asked to recognize named entities using a specific tagset; if no tagset is requested, a predefined default tagset is used.
This allows joint multidataset training across tagsets, and in turn allows expansion of covered languages. NameTag 3 achieves state-of-the-art performance on 21 test datasets in 15 languages: Cebuano, Chinese, Croatian, Czech, Danish, English, Norwegian Bokmål, Norwegian Nynorsk, Portuguese, Russian, Serbian, Slovak, Swedish, Tagalog, and Ukrainian. It also delivers competitive results on Arabic, Dutch, German, Maghrebi, and Spanish, as of February 2025.
The currently supported tagsets are the following:
- conll: The CoNLL-2003 shared task tagset: PER, ORG, LOC, and MISC,
- uner: The Universal NER v1 tagset: PER, ORG, LOC,
- onto: The OntoNotes v5 tagset: PERSON, NORP, FAC, ORG, GPE, etc.
In the NameTag 3 webservice, the tagset variants of one model are served separately, e.g., nametag3-multilingual-conll-250203, nametag3-multilingual-uner-250203, and nametag3-multilingual-onto-250203. The model tagset variants share one multilingual model and apply tagset masks on its output to predict tags of the requested tagset.
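The tagset masking described above can be illustrated with a minimal sketch. It is not the NameTag 3 implementation: the joint BIO label inventory, the `DEFAULT_TAGSET` value, and the function names `tagset_mask` and `predict` are assumptions made for illustration, and the OntoNotes type list is left truncated as in the text. The idea shown is that one model scores all labels, and labels outside the requested tagset are excluded before taking the best label per token.

```python
import math
from typing import Dict, List, Optional

# Entity types per tagset, as listed in the release notes. The OntoNotes list
# is truncated here exactly as in the text ("etc.").
TAGSETS: Dict[str, List[str]] = {
    "conll": ["PER", "ORG", "LOC", "MISC"],
    "uner": ["PER", "ORG", "LOC"],
    "onto": ["PERSON", "NORP", "FAC", "ORG", "GPE"],
}

# Hypothetical default, chosen only for this sketch.
DEFAULT_TAGSET = "conll"


def build_joint_labels() -> List[str]:
    """Build one shared BIO label list covering all tagsets (an assumption)."""
    labels = ["O"]
    for types in TAGSETS.values():
        for entity_type in types:
            for prefix in ("B", "I"):
                label = f"{prefix}-{entity_type}"
                if label not in labels:
                    labels.append(label)
    return labels


LABELS = build_joint_labels()


def tagset_mask(tagset: str) -> List[bool]:
    """Return True for every joint label that belongs to the given tagset."""
    allowed = set(TAGSETS[tagset])
    return [label == "O" or label[2:] in allowed for label in LABELS]


def predict(logits: List[List[float]], tagset: Optional[str] = None) -> List[str]:
    """Pick the best label per token among labels allowed by the tagset."""
    mask = tagset_mask(tagset or DEFAULT_TAGSET)
    predictions = []
    for row in logits:
        # Labels outside the tagset get -inf so they can never win the argmax.
        masked = [score if ok else -math.inf for score, ok in zip(row, mask)]
        best = max(range(len(masked)), key=masked.__getitem__)
        predictions.append(LABELS[best])
    return predictions
```

With a token whose highest score is B-MISC and second highest is B-LOC, `predict(..., "conll")` returns B-MISC, while `predict(..., "uner")` falls back to B-LOC because MISC is not part of the Universal NER tagset.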
NameTag 3.0.0
NameTag 3.0 is an open-source tool for both flat and nested named entity recognition (NER). NameTag 3 identifies proper names in text and classifies them into a set of predefined categories, such as names of persons, locations, organizations, etc.
NameTag 3.0 offers state-of-the-art or near state-of-the-art performance in English, German, Spanish, Dutch, Czech and Ukrainian.
NameTag 3.0 is free software licensed under the Mozilla Public License 2.0. The linguistic models are free for non-commercial use and distributed under the CC BY-NC-SA license, although for some models the original data used to create the model may impose additional licensing conditions. NameTag is versioned using Semantic Versioning.
Copyright 2024 Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University, Czech Republic.
Current Release
NameTag 3.0 can be used either as a command-line tool or by sending requests to the NameTag web service:
- LINDAT/CLARIN hosts the NameTag Web Application,
- LINDAT/CLARIN also hosts the NameTag REST Web Service.
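As a rough illustration of querying the REST web service, the sketch below only builds a request URL and does not send it. The base URL, the `recognize` endpoint, and the `data` and `model` parameter names are assumptions based on the public LINDAT/CLARIN service and should be checked against its documentation; `build_recognize_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Assumed base URL of the LINDAT/CLARIN NameTag REST service; verify it
# against the service documentation before use.
BASE_URL = "https://lindat.mff.cuni.cz/services/nametag/api"


def build_recognize_url(text: str, model: str) -> str:
    """Build a GET URL asking the service to recognize entities in `text`.

    The `data` and `model` parameter names are assumptions of this sketch.
    """
    query = urlencode({"data": text, "model": model})
    return f"{BASE_URL}/recognize?{query}"


url = build_recognize_url(
    "Prague is in Czechia.",
    "nametag3-multilingual-conll-250203",
)
```

The resulting URL could then be fetched with any HTTP client; keeping URL construction separate from the network call makes the request easy to inspect and test offline.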
NameTag 3.0 source code can be found at GitHub.
The NameTag website contains download links for both the released packages and the trained models, hosts the documentation, and links to the demo and the online web service.
License
Copyright 2024 Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University, Czech Republic.
NameTag 3.0 is free software licensed under the Mozilla Public License 2.0. The linguistic models are free for non-commercial use and distributed under the CC BY-NC-SA license, although for some models the original data used to create the model may impose additional licensing conditions. NameTag is versioned using Semantic Versioning.
Please Cite as (How to Cite)
If you use this software, please give us credit by referencing Straková et al. (2019):
@inproceedings{strakova-etal-2019-neural,
title = "Neural Architectures for Nested {NER} through Linearization",
author = "Strakov{\'a}, Jana and
Straka, Milan and
Haji{\v{c}}, Jan",
editor = "Korhonen, Anna and
Traum, David and
M{\`a}rquez, Llu{\'\i}s",
booktitle = "Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics",
month = jul,
year = "2019",
address = "Florence, Italy",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/P19-1527",
doi = "10.18653/v1/P19-1527",
pages = "5326--5331",
}
Versions
In contrast to NameTag 2, NameTag 3 is a fine-tuned large language model (LLM) with either a classification head for flat NEs (e.g., the CoNLL-2003 English data) or a seq2seq decoding head for nested NEs (e.g., the CNEC 2.0 Czech data). The seq2seq decoding head is the one proposed by Straková et al. (2019).
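To give an intuition for how nested NEs can be handled by a sequence decoder, the following is a simplified sketch of the linearization idea from Straková et al. (2019): each token receives a multilabel built by concatenating the labels of all entity mentions that cover it, with mentions that start earlier (and, for equal starts, longer mentions) placed first. The BILOU prefixes, the `|` separator, and the function name `linearize` are choices made for this sketch; the exact label format used by NameTag 3 may differ.

```python
from typing import List, Tuple

# (start token index, end token index inclusive, entity type)
Span = Tuple[int, int, str]


def linearize(num_tokens: int, spans: List[Span]) -> List[str]:
    """Turn possibly nested entity spans into one multilabel per token."""
    # Earlier start has priority; for equal starts, the longer span goes first.
    ordered = sorted(spans, key=lambda s: (s[0], -(s[1] - s[0])))
    labels: List[List[str]] = [[] for _ in range(num_tokens)]
    for start, end, entity_type in ordered:
        for i in range(start, end + 1):
            if start == end:
                prefix = "U"  # single-token mention
            elif i == start:
                prefix = "B"  # first token of the mention
            elif i == end:
                prefix = "L"  # last token of the mention
            else:
                prefix = "I"  # token inside the mention
            labels[i].append(f"{prefix}-{entity_type}")
    # Tokens outside any mention get the "O" label.
    return ["|".join(tags) if tags else "O" for tags in labels]


# "Bank of China opened": an ORG over tokens 0-2 with a nested GPE on token 2.
nested = linearize(4, [(0, 2, "ORG"), (2, 2, "GPE")])
```

A seq2seq decoding head can then emit the parts of each multilabel one at a time per token, whereas a flat classification head predicts exactly one label per token.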