DictaSign Corpus
IDGS · Universität Hamburg
BSL
CC BY-NC-SA 3.0
The European DictaSign project produced a comparable parallel corpus across four sign languages (BSL, DGS, GSL, LSF). Kozha's British Sign Language data is drawn directly from the DictaSign BSL portion. Because no dedicated ASL corpus is available yet, the same dataset is also aliased to the ASL translator entry. ASL output uses ASL fingerspelling, but lexical signs fall back to the BSL dataset, so any lexical sign in ASL output is a BSL sign.
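The alias-plus-fallback behaviour described above can be sketched as follows. This is a minimal illustration, not Kozha's loader: the dictionaries, the sign identifiers such as `bsl:HELLO`, and the `translate` function are hypothetical. It assumes lexical lookup is keyed by the lower-cased word and that a word without a lexical sign is fingerspelled in the target language's own alphabet.

```python
from typing import Dict, List

LETTERS = "abcdefghijklmnopqrstuvwxyz"

# Hypothetical lexical dataset: lower-cased gloss -> sign identifier.
LEXICONS: Dict[str, Dict[str, str]] = {
    "bsl": {"hello": "bsl:HELLO", "thank": "bsl:THANK"},
}

# Hypothetical fingerspelling alphabets: letter -> handshape identifier.
ALPHABETS: Dict[str, Dict[str, str]] = {
    "bsl": {c: f"bsl-fs:{c.upper()}" for c in LETTERS},
    "asl": {c: f"asl-fs:{c.upper()}" for c in LETTERS},
}

# ASL has no lexical dataset of its own, so lexical lookups are aliased to BSL.
# Fingerspelling is NOT aliased: ASL keeps its own alphabet.
LEXICON_ALIASES: Dict[str, str] = {"asl": "bsl"}


def translate(words: List[str], lang: str) -> List[str]:
    """Use a lexical sign when one exists; otherwise fingerspell the word."""
    lexicon = LEXICONS.get(LEXICON_ALIASES.get(lang, lang), {})
    alphabet = ALPHABETS[lang]
    signs: List[str] = []
    for word in words:
        key = word.lower()
        if key in lexicon:
            signs.append(lexicon[key])
        else:
            # Spell letter by letter, skipping characters the alphabet lacks.
            signs.extend(alphabet[c] for c in key if c in alphabet)
    return signs


print(translate(["Hello", "Kozha"], "asl"))
# ['bsl:HELLO', 'asl-fs:K', 'asl-fs:O', 'asl-fs:Z', 'asl-fs:H', 'asl-fs:A']
```

Here "Hello" resolves to the BSL lexical sign while "Kozha" is fingerspelled with ASL handshapes, which is the mixed output that the alias produces.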
- Institution
- Institute of German Sign Language and Communication of the Deaf (IDGS), Universität Hamburg
- Entries
- 881 signs (plus 26-letter BSL fingerspelling alphabet)
Efthimiou, E., Fotinea, S.-E., Hanke, T., Glauert, J., Bowden, R., Braffort, A., Collet, C., Maragos, P., & Lefebvre-Albaret, F. (2012). The DictaSign Wiki: Enabling Web Communication for the Deaf. Computers Helping People with Special Needs (ICCHP).
DGS Lexicon (via SignAvatars)
IDGS · DGS-Korpus · SignAvatars
DGS
Per-entry
German Sign Language data is drawn from the DGS Lexicon, curated by IDGS as part of the DGS-Korpus project, and re-packaged for avatar rendering by the SignAvatars aggregation. Kozha ingests the SignAvatars SiGML export; the upstream linguistic record is the DGS-Korpus.
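A minimal sketch of ingesting a SiGML export, assuming the export is a single `<sigml>` document whose entries are `<hns_sign gloss="...">` elements (the HamNoSys-based SiGML form). The function name `load_sigml_lexicon`, the sample glosses and tags, and the keep-first rule for duplicate glosses are illustrative assumptions, not the SignAvatars export schema or Kozha's actual loader.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Illustrative export; glosses and HamNoSys tags are examples only.
SAMPLE_EXPORT = """<sigml>
  <hns_sign gloss="HAUS">
    <hamnosys_nonmanual/>
    <hamnosys_manual><hamflathand/><hamextfingeru/></hamnosys_manual>
  </hns_sign>
  <hns_sign gloss="AUTO">
    <hamnosys_nonmanual/>
    <hamnosys_manual><hamfist/></hamnosys_manual>
  </hns_sign>
  <hns_sign gloss="HAUS">
    <hamnosys_manual><hamfist/></hamnosys_manual>
  </hns_sign>
</sigml>"""


def load_sigml_lexicon(xml_text: str) -> Dict[str, str]:
    """Index each <hns_sign> by its gloss, keeping the serialized element."""
    root = ET.fromstring(xml_text)
    lexicon: Dict[str, str] = {}
    for sign in root.iter("hns_sign"):
        gloss = sign.get("gloss")
        if not gloss or gloss in lexicon:
            # Skip entries without a gloss; keep the first of any duplicates.
            continue
        lexicon[gloss] = ET.tostring(sign, encoding="unicode").strip()
    return lexicon


lexicon = load_sigml_lexicon(SAMPLE_EXPORT)
print(sorted(lexicon))  # ['AUTO', 'HAUS']
```

Keeping the serialized element rather than a parsed structure leaves the SiGML untouched for the avatar renderer; whether duplicates should keep the first or last entry is a policy choice this sketch simply assumes.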
- Institution
- IDGS, Universität Hamburg (upstream) · SignAvatars project (aggregation)
- Entries
- 1,914 signs
- Provenance
- Native-signer review at corpus level (DGS-Korpus recordings).
Per DGS Lexicon terms (varies by entry; academic/research use permitted). SignAvatars code is released under the upstream repository's declared terms.
Prillwitz, S., Hanke, T., König, S., Konrad, R., Langer, G., & Schwarz, A. (2008). DGS Corpus Project. LREC Workshop on the Representation and Processing of Sign Languages. Entries redistributed via Yu, Z. et al. (2024), SignAvatars.
DictaSign LSF (via SignAvatars)
DictaSign consortium · SignAvatars
LSF
CC BY-NC-SA 3.0
French Sign Language data comes from the LSF portion of the DictaSign corpus, re-packaged for avatar rendering by SignAvatars. The LSF branch was recorded in collaboration with French Deaf community signers during the original EU project.
- Institution
- DictaSign consortium (upstream) · SignAvatars project (aggregation)
- Entries
- 381 signs (plus 26-letter LSF fingerspelling alphabet)
- Provenance
- Native-signer review at corpus level.
DictaSign consortium (2012). LSF entries redistributed via Yu, Z. et al. (2024), SignAvatars.
DictaSign GSL (via SignAvatars)
DictaSign consortium · SignAvatars
GSL
CC BY-NC-SA 3.0
Greek Sign Language data comes from the GSL portion of the DictaSign corpus, re-packaged by SignAvatars. GSL was recorded with native signers from the Greek Deaf community during the DictaSign project.
- Institution
- DictaSign consortium (upstream) · SignAvatars project (aggregation)
- Entries
- 889 signs
- Provenance
- Native-signer review at corpus level.
DictaSign consortium (2012). GSL entries redistributed via Yu, Z. et al. (2024), SignAvatars.
PJM Dictionary (via SignAvatars)
University of Warsaw · SignAvatars
PJM
Academic
Polish Sign Language data comes from the PJM Dictionary at the University of Warsaw, re-packaged by SignAvatars. The Warsaw PJM Dictionary is maintained by the Section for Sign Linguistics and serves as the standard reference for PJM lexicography.
- Institution
- Section for Sign Linguistics, University of Warsaw (upstream) · SignAvatars project (aggregation)
- Entries
- 1,932 signs (plus 26-letter PJM fingerspelling alphabet)
- Provenance
- Native-signer review at corpus level — the Warsaw dictionary is built on recordings with native PJM signers.
Per the Warsaw PJM Dictionary terms (academic/research use permitted). See the upstream dictionary site for the authoritative statement.
Łacheta, J., Czajkowska-Kisil, M., Linde-Usiekniewicz, J., & Rutkowski, P. (eds.). Corpus-based Dictionary of Polish Sign Language (słownik PJM). University of Warsaw. Entries redistributed via SignAvatars.
SignLanguageSynthesis
Lyke Esselink · community contributor
NGT
Unclear
Dutch Sign Language (Nederlandse Gebarentaal, NGT) data is drawn from Lyke Esselink's SignLanguageSynthesis project, a publicly available SiGML dataset developed as part of the author's research.
- Maintainer
- Lyke Esselink (community contributor)
- Entries
- 39 signs (plus 26-letter NGT fingerspelling alphabet)
No license declared in the upstream repository. We are reaching out to the author to confirm permissions for redistribution. If clarification is not obtained within a reasonable time, we will remove the NGT dataset from the translator rather than continue to use it silently.
Esselink, L. SignLanguageSynthesis. GitHub repository.
algerianSignLanguage-avatar
Taha Zerrouki · community contributor
Algerian SL
Unclear
Algerian Sign Language data comes from Taha Zerrouki's algerianSignLanguage-avatar GitHub repository. The repository currently contains only a seed SiGML entry, so coverage is minimal pending community contributions.
- Maintainer
- Taha Zerrouki (community contributor)
- Entries
- 1 sign (seed only)
- Provenance
- Community-contributed; no Deaf-native-signer review yet.
No license declared in the upstream repository. We are reaching out to the author to confirm permissions.
Zerrouki, T. algerianSignLanguage-avatar. GitHub repository.
bdsl-3d-animation
Devr Arif Khan · community contributor
Bangla SL
Unclear
Bangla Sign Language data is drawn from Devr Arif Khan's bdsl-3d-animation GitLab repository, a community project producing SiGML for BdSL signs.
- Maintainer
- Devr Arif Khan (community contributor)
- Entries
- 81 signs (the repository declares 94; 13 declared entries are unaccounted for)
- Provenance
- Community-contributed, pending Deaf-native review.
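One way to chase the 13-entry gap is to compare the repository's declared gloss list with the glosses that actually load. The sketch below assumes both lists are available as plain strings; `unaccounted` is a hypothetical helper, and it only reports which glosses are missing, not why (parse failure, duplicate declarations, or absent files).

```python
from typing import Iterable, List


def unaccounted(declared: Iterable[str], loaded: Iterable[str]) -> List[str]:
    """List declared glosses absent from the loaded set, in declared order, once each."""
    loaded_set = set(loaded)
    seen = set()
    missing: List[str] = []
    for gloss in declared:
        if gloss not in loaded_set and gloss not in seen:
            missing.append(gloss)
        seen.add(gloss)
    return missing


# Hypothetical glosses for illustration only.
declared = ["ami", "tumi", "bari", "pani", "tumi"]
loaded = ["ami", "bari"]
print(unaccounted(declared, loaded))  # ['tumi', 'pani']
```

Duplicate declarations are reported once, so a declared count that includes duplicates can overstate the real gap.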
No license declared in the upstream repository. We are reaching out to the author to confirm permissions.
Khan, D. A. bdsl-3d-animation. GitLab repository.
Text-to-Sign-Language & text_to_isl
Divanshu & Shoebham · community contributors
Indian SL
Unclear
Indian Sign Language data is drawn from two community GitHub repositories — Divanshu's Text-to-Sign-Language and Shoebham's text_to_isl — which together contribute the current ISL SiGML set.
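Combining two repositories requires a rule for glosses that appear in both. The sketch below assumes case-insensitive gloss matching and gives precedence to whichever source is listed first; both choices, and the `merge_lexicons` helper itself, are hypothetical and not a description of how Kozha resolves overlaps. Conflicting entries are collected so they can be reviewed rather than silently dropped.

```python
from typing import Dict, List, Tuple

Lexicon = Dict[str, str]  # gloss -> SiGML fragment


def merge_lexicons(
    sources: List[Tuple[str, Lexicon]],
) -> Tuple[Dict[str, Tuple[str, str]], List[str]]:
    """Merge lexicons; earlier sources win, differing duplicates are reported."""
    merged: Dict[str, Tuple[str, str]] = {}
    conflicts: List[str] = []
    for source_name, lexicon in sources:
        for gloss, sigml in lexicon.items():
            key = gloss.lower()  # assumption: glosses match case-insensitively
            if key not in merged:
                merged[key] = (source_name, sigml)
            elif merged[key][1] != sigml and key not in conflicts:
                conflicts.append(key)
    return merged, conflicts


a = {"Hello": "<hns_sign gloss='hello'/>", "Water": "<hns_sign gloss='water'/>"}
b = {"hello": "<hns_sign gloss='HELLO'/>", "book": "<hns_sign gloss='book'/>"}
merged, conflicts = merge_lexicons([("Text-to-Sign-Language", a), ("text_to_isl", b)])
print(sorted(merged), conflicts)  # ['book', 'hello', 'water'] ['hello']
```

Recording the source name alongside each entry keeps per-repository attribution intact after the merge, which matters while the license status of both repositories is unresolved.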
- Maintainer
- Divanshu and Shoebham (community contributors)
- Entries
- 763 signs
- Provenance
- Community-contributed, pending Deaf-native review.
No license declared in either upstream repository. We are reaching out to both authors to confirm permissions.
Divanshu, H. Text-to-Sign-Language; Shoebham. text_to_isl. GitHub repositories.
KurdishSignLanguage
KurdishBLARK organisation
Kurdish SL
Unclear
Kurdish Sign Language data comes from the KurdishBLARK organisation's KurdishSignLanguage repository — a community GitHub project with a SiGML dataset for Kurdish signs.
- Maintainer
- KurdishBLARK organisation (community contributors)
- Entries
- 558 signs
- Provenance
- Community-contributed, pending Deaf-native review.
No license declared in the upstream repository. We are reaching out to the maintainers to confirm permissions.
KurdishBLARK. KurdishSignLanguage. GitHub repository.
VSL
Raian Rido · community contributor · largest community dataset
Vietnamese SL
Unclear
Vietnamese Sign Language data is drawn from Raian Rido's VSL GitHub repository — the largest community-contributed SiGML dataset in Kozha, contributing over three thousand entries.
- Maintainer
- Raian Rido (community contributor)
- Entries
- 3,564 signs
- Provenance
- Community-contributed, pending Deaf-native review.
No license declared in the upstream repository. We are reaching out to the author to confirm permissions. Given the size of this contribution, license clarification is a priority before further integration work.
Rido, R. VSL. GitHub repository.
syntheticfsl & signtyper
Jennie Ablog · community contributor
Filipino SL
Unclear
Filipino Sign Language data comes from Jennie Ablog's two GitHub repositories, syntheticfsl and signtyper. The FSL entries use a gestural SiGML variant (<hamgestural_sign>) that the current loader does not parse, so FSL output today uses fingerspelling only. A future release will add a structural adapter so the lexical signs load.
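The "declared but parsed as zero" situation can be diagnosed by counting sign elements by tag. This sketch assumes a loader that accepts only `<hns_sign>` and treats `<hamgestural_sign>` as a known but unsupported variant; `sign_element_census`, `loadable_count`, and the sample entries are hypothetical. It does not attempt the structural adapter itself.

```python
import xml.etree.ElementTree as ET
from collections import Counter

SUPPORTED_TAGS = {"hns_sign"}                  # what this loader can read
KNOWN_TAGS = {"hns_sign", "hamgestural_sign"}  # sign elements it recognises

# Illustrative gestural-SiGML entries; glosses and contents are examples only.
SAMPLE = """<sigml>
  <hamgestural_sign gloss="salamat"><sign_manual/></hamgestural_sign>
  <hamgestural_sign gloss="kumusta"><sign_manual/></hamgestural_sign>
</sigml>"""


def sign_element_census(xml_text: str) -> Counter:
    """Count top-level sign elements by tag name."""
    root = ET.fromstring(xml_text)
    return Counter(child.tag for child in root if child.tag in KNOWN_TAGS)


def loadable_count(xml_text: str) -> int:
    """Number of entries the loader could actually parse."""
    census = sign_element_census(xml_text)
    return sum(n for tag, n in census.items() if tag in SUPPORTED_TAGS)


census = sign_element_census(SAMPLE)
print(dict(census), loadable_count(SAMPLE))  # {'hamgestural_sign': 2} 0
```

A file whose census shows only `hamgestural_sign` entries yields a loadable count of zero, which is the pattern reported for the FSL data.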
- Maintainer
- Jennie Ablog (community contributor)
- Entries
- 0 lexical signs (14 entries declared; none parse because the loader does not yet support their file format). The FSL fingerspelling alphabet is available.
- Provenance
- Community-contributed, pending Deaf-native review.
No license declared in the upstream repositories. We are reaching out to the author to confirm permissions.
Ablog, J. syntheticfsl; signtyper. GitHub repositories.