Attribution & provenance

Built on the work of Deaf researchers, linguists, and open-source developers.

Every corpus, notation system, font, library, and tool Kozha depends on is listed below with its full citation, license, and current usage. Where an upstream source's license is unclear, we say so plainly and mark it.

  • 17 Sources
  • 12 Sign languages
  • 11k Signs ingested
  • 7 License families
Section 01

Avatar & rendering.

Every animation you see on Kozha is drawn by CWASA, a signing-avatar engine built by the Virtual Humans Group at the University of East Anglia. Without it, the rest of the pipeline has nowhere to send its output.

CWASA: CWA Signing Avatar system

University of East Anglia · Virtual Humans Group

CC BY-ND

CWASA (the CWA Signing Avatar system) is a WebGL-based signing avatar engine that renders SiGML input as a 3D animation in the browser. Maintained by the Virtual Humans Group (VHG) at the University of East Anglia, CWASA is the public successor to the JASigning and Animgen avatars developed under the eSIGN and DictaSign research programmes.

Maintainer
Virtual Humans Group, School of Computing Sciences, University of East Anglia
Used for
All avatar rendering on the landing page, the translator, and the contribute preview.

Creative Commons Attribution-NoDerivatives (CC BY-ND). See CWASA Conditions of Use. You may use and share CWASA with attribution, but you may not distribute modified versions.

Section 02

Notation system.

HamNoSys is the intermediate representation between parsed text and a rendered sign. Every sign in the Kozha corpus is stored as HamNoSys, then serialised into SiGML for CWASA. The notation itself comes from IDGS at the University of Hamburg.
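Here is a minimal Python sketch of the HamNoSys-to-SiGML step. It uses only the standard library to wrap one sign's manual HamNoSys symbols in a SiGML document. It assumes each symbol is already stored under its SiGML element name. The `to_sigml` helper and the example symbol sequence are illustrative, not Kozha's actual serialiser, and the non-manual tier is left empty.

```python
import xml.etree.ElementTree as ET


def to_sigml(gloss: str, manual_symbols: list[str]) -> str:
    """Wrap one sign's HamNoSys symbol names in a minimal SiGML document.

    Hypothetical helper: assumes symbols are stored by their SiGML element
    names (e.g. "hamflathand"); Kozha's real storage format may differ.
    """
    root = ET.Element("sigml")
    sign = ET.SubElement(root, "hns_sign", gloss=gloss)
    # Non-manual features (face, head, body) are left empty in this sketch.
    ET.SubElement(sign, "hamnosys_nonmanual")
    manual = ET.SubElement(sign, "hamnosys_manual")
    for name in manual_symbols:
        # Each HamNoSys symbol becomes one empty child element, in order.
        ET.SubElement(manual, name)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Illustrative symbol sequence: handshape, orientation, location.
    print(to_sigml("HELLO", ["hamflathand", "hamextfingeru", "hampalml", "hamshoulders"]))
```

A production serialiser would also need to emit the non-manual tier, which this sketch leaves empty.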

HamNoSys

IDGS · Hamburg Notation System v4.0

CC BY 4.0

HamNoSys is a phonetic transcription system for sign languages, covering handshape, orientation, location, movement, and non-manual features. The current reference is HamNoSys 4.0 (Hanke et al., 2018). Kozha uses HamNoSys as the canonical internal representation for every sign regardless of language.

Maintainer
Institute of German Sign Language and Communication of the Deaf (IDGS), Universität Hamburg
Used for
Every sign's canonical encoding; the contribute page's HamNoSys editor; the Chrome extension's pre-compiled notation.

CC BY 4.0. You may share and adapt HamNoSys for any purpose, including commercial, provided you credit IDGS.

bgHamNoSysUnicode

IDGS · TrueType font, 4.0 glyph set

Academic

The bgHamNoSysUnicode TrueType font covers the HamNoSys 4.0 glyph set and draws each symbol as designed. Kozha self-hosts the font (no CDN) on the contribute page so that HamNoSys codepoints render correctly in the notation editor.

Authors
Thomas Hanke, Marc Schulder, et al. (IDGS / DGS-Korpus project)
Used for
Rendering HamNoSys notation in the contribute-page preview panel.

IDGS distribution terms. Distributed by IDGS for academic, research, and notation-tool use; redistribution requires keeping the upstream license file intact.

Section 03

Sign-language corpora.

The signs Kozha can render come from twelve sources across twelve sign languages. Where the upstream repository does not declare a license, we say so plainly and mark the source until the author confirms permissions.

All twelve corpora at a glance

Corpus | Language | Signs | License
DictaSign Corpus | BSL | 881 | CC BY-NC-SA 3.0
DGS Lexicon | DGS | 1,914 | Per-entry
DictaSign LSF | LSF | 381 | CC BY-NC-SA 3.0
DictaSign GSL | GSL | 889 | CC BY-NC-SA 3.0
PJM Dictionary | PJM | 1,932 | Academic
SignLanguageSynthesis | NGT | 39 | Unclear
algerianSignLanguage-avatar | Algerian | 1 | Unclear
bdsl-3d-animation | Bangla | 81 | Unclear
Text-to-Sign-Language & text_to_isl | Indian | 763 | Unclear
KurdishSignLanguage | Kurdish | 558 | Unclear
VSL | Vietnamese | 3,564 | Unclear
syntheticfsl & signtyper | Filipino | 0* | Unclear

* 14 entries are declared upstream, but none load yet because they use a SiGML variant the loader does not parse. FSL output currently uses fingerspelling only.
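As a quick arithmetic check, the Signs column sums to the headline "11k Signs ingested" figure at the top of the page. The sketch below copies the column into a dictionary. Fingerspelling alphabets are not counted, because the individual entries list them separately from the sign totals.

```python
# Signs column of the corpus table, keyed by corpus name.
SIGNS_PER_CORPUS = {
    "DictaSign Corpus (BSL)": 881,
    "DGS Lexicon": 1914,
    "DictaSign LSF": 381,
    "DictaSign GSL": 889,
    "PJM Dictionary": 1932,
    "SignLanguageSynthesis": 39,
    "algerianSignLanguage-avatar": 1,
    "bdsl-3d-animation": 81,
    "Text-to-Sign-Language & text_to_isl": 763,
    "KurdishSignLanguage": 558,
    "VSL": 3564,
    "syntheticfsl & signtyper": 0,  # 14 declared, none parsed yet
}

total = sum(SIGNS_PER_CORPUS.values())
print(f"{len(SIGNS_PER_CORPUS)} corpora, {total:,} lexical signs")
# prints: 12 corpora, 11,003 lexical signs
```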

DictaSign Corpus

IDGS · Universität Hamburg

BSL CC BY-NC-SA 3.0

The European DictaSign project produced a comparable parallel corpus across four sign languages (BSL, DGS, GSL, LSF). Kozha's British Sign Language data is drawn directly from the DictaSign BSL portion. The same corpus is aliased to the ASL translator entry, since no dedicated ASL corpus is available yet — ASL output uses ASL fingerspelling, but lexical signs fall through to the BSL dataset.
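The ASL alias works as a two-table lookup. Lexical signs resolve against the aliased BSL corpus, and fingerspelling uses the alphabet of the language the user actually requested. The sketch below is hypothetical. The table contents are placeholders for SiGML entries, and falling back to fingerspelling when a word has no lexical entry is an assumption, not documented Kozha behaviour.

```python
# Hypothetical data: tiny stand-ins for the real corpora and alphabets.
LEXICON = {"bsl": {"hello": "<bsl HELLO>", "thank": "<bsl THANK>"}}
FINGERSPELLING = {
    "bsl": {c: f"<bsl {c}>" for c in "abcdefghijklmnopqrstuvwxyz"},
    "asl": {c: f"<asl {c}>" for c in "abcdefghijklmnopqrstuvwxyz"},
}
# ASL has no corpus of its own, so lexical lookups are aliased to BSL.
LEXICON_ALIAS = {"asl": "bsl"}


def render_word(word: str, lang: str) -> list[str]:
    """Return the entries to render for one word in the requested language."""
    corpus = LEXICON[LEXICON_ALIAS.get(lang, lang)]
    word = word.lower()
    if word in corpus:
        return [corpus[word]]
    # Assumed fallback: spell the word with the *requested* language's alphabet,
    # skipping characters the alphabet does not cover.
    alphabet = FINGERSPELLING[lang]
    return [alphabet[c] for c in word if c in alphabet]
```

The alias affects only the lexical table. An ASL request never picks up BSL fingerspelling.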

Institution
Institute of German Sign Language and Communication of the Deaf (IDGS), Universität Hamburg
Entries
881 signs (plus 26-letter BSL fingerspelling alphabet)
Languages
BSL (primary); ASL (fallback)
Provenance
Native-signer review at corpus level — DictaSign signs were elicited and recorded with native BSL signers during the original corpus build. Per-sign review status is on the progress dashboard.

Efthimiou, E., Fotinea, S.-E., Hanke, T., Glauert, J., Bowden, R., Braffort, A., Collet, C., Maragos, P., & Lefebvre-Albaret, F. (2012). The DictaSign Wiki: Enabling Web Communication for the Deaf. Computers Helping People with Special Needs (ICCHP).

DGS Lexicon (via SignAvatars)

IDGS · DGS-Korpus · SignAvatars

DGS Per-entry

German Sign Language data is drawn from the DGS Lexicon, curated by IDGS as part of the DGS-Korpus project, and re-packaged for avatar rendering by the SignAvatars aggregation. Kozha ingests the SignAvatars SiGML export; the upstream linguistic record is the DGS-Korpus.

Institution
IDGS, Universität Hamburg (upstream) · SignAvatars project (aggregation)
Entries
1,914 signs
Provenance
Native-signer review at corpus level (DGS-Korpus recordings).

Per DGS Lexicon terms (varies by entry; academic/research use permitted). SignAvatars code is released under the upstream repository's declared terms.

Prillwitz, S., Hanke, T., König, S., Konrad, R., Langer, G., & Schwarz, A. (2008). DGS Corpus Project. LREC Workshop on the Representation and Processing of Sign Languages. DGS entries redistributed via Yu, Z. et al. (2024), SignAvatars.

DictaSign LSF (via SignAvatars)

DictaSign consortium · SignAvatars

LSF CC BY-NC-SA 3.0

The French Sign Language portion of the DictaSign corpus, re-packaged for avatar rendering by SignAvatars. The LSF branch was recorded in collaboration with French Deaf community signers during the original EU project.

Institution
DictaSign consortium (upstream) · SignAvatars project (aggregation)
Entries
381 signs (plus 26-letter LSF fingerspelling alphabet)
Provenance
Native-signer review at corpus level.

DictaSign consortium (2012). LSF entries redistributed via Yu, Z. et al. (2024), SignAvatars.

DictaSign GSL (via SignAvatars)

DictaSign consortium · SignAvatars

GSL CC BY-NC-SA 3.0

The Greek Sign Language portion of the DictaSign corpus, re-packaged by SignAvatars. GSL was recorded with native signers from the Greek Deaf community during the DictaSign project.

Institution
DictaSign consortium (upstream) · SignAvatars project (aggregation)
Entries
889 signs
Provenance
Native-signer review at corpus level.

DictaSign consortium (2012). GSL entries redistributed via Yu, Z. et al. (2024), SignAvatars.

PJM Dictionary (via SignAvatars)

University of Warsaw · SignAvatars

PJM Academic

Polish Sign Language data comes from the PJM Dictionary at the University of Warsaw, re-packaged by SignAvatars. The Warsaw PJM dictionary is maintained by the Section for Sign Linguistics and is the reference record for Polish Deaf lexicography.

Institution
Section for Sign Linguistics, University of Warsaw (upstream) · SignAvatars project (aggregation)
Entries
1,932 signs (plus 26-letter PJM fingerspelling alphabet)
Provenance
Native-signer review at corpus level — the Warsaw dictionary is built on recordings with native PJM signers.

Per the Warsaw PJM Dictionary terms — academic / research use permitted. See the upstream dictionary site for the authoritative statement.

Łacheta, J., Czajkowska-Kisil, M., Linde-Usiekniewicz, J., & Rutkowski, P. (eds.). Corpus-based Dictionary of Polish Sign Language (słownik PJM). University of Warsaw. Entries redistributed via SignAvatars.

SignLanguageSynthesis

Lyke Esselink · community contributor

NGT Unclear

Dutch Sign Language (Nederlandse Gebarentaal, NGT) data is drawn from Lyke Esselink's SignLanguageSynthesis project, a publicly available SiGML dataset authored as part of research work.

Maintainer
Lyke Esselink (community contributor)
Entries
39 signs (plus 26-letter NGT fingerspelling alphabet)
Provenance
Review status varies — see the progress dashboard. Community-contributed source, pending Deaf-native-signer review.

No license declared in the upstream repository. We are reaching out to the author to confirm permissions for redistribution. If clarification is not obtained within a reasonable timeline, we will remove the NGT dataset from the translator rather than continue to use it silently.

Esselink, L. SignLanguageSynthesis. GitHub repository.

algerianSignLanguage-avatar

Taha Zerrouki · community contributor

Algerian SL Unclear

Algerian Sign Language data comes from Taha Zerrouki's algerianSignLanguage-avatar GitHub repository. The repository contains a small seed of SiGML entries; coverage is limited pending community contribution.

Maintainer
Taha Zerrouki (community contributor)
Entries
1 sign (seed only)
Provenance
Community-contributed; no Deaf-native-signer review yet.

No license declared in the upstream repository. We are reaching out to the author to confirm permissions.

Zerrouki, T. algerianSignLanguage-avatar. GitHub repository.

bdsl-3d-animation

Devr Arif Khan · community contributor

Bangla SL Unclear

Bangla Sign Language data is drawn from Devr Arif Khan's bdsl-3d-animation GitLab repository, a community project producing SiGML for BdSL signs.

Maintainer
Devr Arif Khan (community contributor)
Entries
81 signs (the repository declares 94; 13 entries are unaccounted for)
Provenance
Community-contributed, pending Deaf-native review.

No license declared in the upstream repository. We are reaching out to the author to confirm permissions.

Khan, D. A. bdsl-3d-animation. GitLab repository.

Text-to-Sign-Language & text_to_isl

Divanshu & Shoebham · community contributors

Indian SL Unclear

Indian Sign Language data is drawn from two community GitHub repositories — Divanshu's Text-to-Sign-Language and Shoebham's text_to_isl — which together contribute the current ISL SiGML set.

Maintainer
Divanshu and Shoebham (community contributors)
Entries
763 signs
Provenance
Community-contributed, pending Deaf-native review.

No license declared in either upstream repository. We are reaching out to both authors to confirm permissions.

Divanshu, H. Text-to-Sign-Language; Shoebham. text_to_isl. GitHub repositories.

KurdishSignLanguage

KurdishBLARK organisation

Kurdish SL Unclear

Kurdish Sign Language data comes from the KurdishBLARK organisation's KurdishSignLanguage repository — a community GitHub project with a SiGML dataset for Kurdish signs.

Maintainer
KurdishBLARK organisation (community contributors)
Entries
558 signs
Provenance
Community-contributed, pending Deaf-native review.

No license declared in the upstream repository. We are reaching out to the maintainers to confirm permissions.

KurdishBLARK. KurdishSignLanguage. GitHub repository.

VSL

Raian Rido · community contributor · largest community dataset

Vietnamese SL Unclear

Vietnamese Sign Language data is drawn from Raian Rido's VSL GitHub repository — the largest community-contributed SiGML dataset in Kozha, contributing over three thousand entries.

Maintainer
Raian Rido (community contributor)
Entries
3,564 signs
Provenance
Community-contributed, pending Deaf-native review.

No license declared in the upstream repository. We are reaching out to the author to confirm permissions. Given the size of this contribution, license clarification is a priority before further integration work.

Rido, R. VSL. GitHub repository.

syntheticfsl & signtyper

Jennie Ablog · community contributor

Filipino SL Unclear

Filipino Sign Language data is drawn from Jennie Ablog's two GitHub repositories, syntheticfsl and signtyper. The FSL entries ship in a SiGML variant (<hamgestural_sign>) that the current loader does not parse, so FSL output today uses fingerspelling only. A future release will add a structural adapter so that the lexical signs load.
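The zero-sign count follows from how a loader selects entries. A loader that looks only for `hns_sign` elements finds nothing in a file made entirely of `<hamgestural_sign>` entries. The sketch below shows the detection half of a structural adapter. The helper names and sample data are hypothetical, and it does not convert the gestural content into the form the rest of the pipeline expects.

```python
import xml.etree.ElementTree as ET

# Element names a loader treats as one sign entry (hypothetical settings).
STRICT_TAGS = ("hns_sign",)                       # HamNoSys-based SiGML only
TOLERANT_TAGS = ("hns_sign", "hamgestural_sign")  # also the FSL variant


def count_signs(sigml_text: str, accepted_tags: tuple[str, ...]) -> int:
    """Count sign entries whose element name is in accepted_tags."""
    root = ET.fromstring(sigml_text)
    # iter() visits the root and every descendant, so nesting depth does not matter.
    return sum(1 for el in root.iter() if el.tag in accepted_tags)


# Hypothetical sample: two entries in the gestural variant.
SAMPLE = """<sigml>
  <hamgestural_sign gloss="salamat"/>
  <hamgestural_sign gloss="kumusta"/>
</sigml>"""

print(count_signs(SAMPLE, STRICT_TAGS))    # 0: the variant is skipped
print(count_signs(SAMPLE, TOLERANT_TAGS))  # 2
```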

Maintainer
Jennie Ablog (community contributor)
Entries
0 lexical signs (14 entries declared; none are parsed because the loader does not yet support their file format). The FSL fingerspelling alphabet is available.
Provenance
Community-contributed, pending Deaf-native review.

No license declared in the upstream repositories. We are reaching out to the author to confirm permissions.

Ablog, J. syntheticfsl; signtyper. GitHub repositories.

Section 04

Translation layer.

Input text arrives in one of several source languages; Kozha routes it through translation and NLP before it ever touches a sign database. Two libraries do almost all of this work.

argostranslate

Argos Open Technologies · offline text translation

MIT

argostranslate is the text-translation layer between Kozha's supported input languages and the sign language's base language. It runs server-side, makes no network calls at translation time, and relies on per-language-pair translation models downloaded at install time. Supported input languages today: English, French, German, Spanish, Polish, Dutch, Greek, Russian, Arabic.
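The routing rule in the paragraph above is to translate only when the input language differs from the sign language's base language. It can be sketched with the translator passed in as a function. The `BASE_LANGUAGE` table is a hypothetical example, not Kozha's configuration. In a deployment, the `translate` argument could be `argostranslate.translate.translate`, which takes the text followed by the source and target language codes. Any function with that (text, from_code, to_code) shape also works, which keeps the sketch runnable without downloaded models.

```python
from typing import Callable

# Hypothetical mapping from sign language to its base language (ISO 639-1 codes).
BASE_LANGUAGE = {"bsl": "en", "dgs": "de", "lsf": "fr", "gsl": "el", "pjm": "pl", "ngt": "nl"}

# Any function shaped like (text, from_code, to_code) -> translated text.
Translator = Callable[[str, str, str], str]


def to_base_language(text: str, input_lang: str, sign_lang: str, translate: Translator) -> str:
    """Translate text into the sign language's base language only when needed."""
    target = BASE_LANGUAGE[sign_lang]
    if input_lang == target:
        return text  # already in the base language: no translation call
    return translate(text, input_lang, target)


if __name__ == "__main__":
    # Stand-in translator with the same argument order as argostranslate's.
    def fake_translate(text: str, src: str, dst: str) -> str:
        return f"[{src}->{dst}] {text}"

    print(to_base_language("Guten Tag", "de", "dgs", fake_translate))  # Guten Tag
    print(to_base_language("Bonjour", "fr", "bsl", fake_translate))   # [fr->en] Bonjour
```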

Maintainer
Argos Open Technologies
Used for
All server-side text-to-text translation when the input language differs from the sign language's base language.

MIT License (library). Translation models are distributed under the terms argosopentech publishes for each language pair.

spaCy

Explosion AI · NLP pipeline

MIT

spaCy is the tokenizer, lemmatizer, POS tagger, and parser that turns input text into a sequence of linguistic tokens before the sign database is consulted. Kozha loads dedicated spaCy models for seven languages (English, German, French, Spanish, Polish, Dutch, Greek) and keeps at most four of them loaded at once, evicting the least recently used model when another is needed.
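A four-slot least-recently-used cache like the one described can be sketched with `collections.OrderedDict`. The `ModelCache` class is hypothetical. In practice the `loader` argument would be `spacy.load`, and names such as `en_core_web_sm` and `de_core_news_sm` are spaCy's standard small pipelines. The sketch is not thread-safe, so concurrent requests would need a lock around `get`.

```python
from collections import OrderedDict
from typing import Any, Callable


class ModelCache:
    """Keep at most `capacity` loaded models, evicting the least recently used."""

    def __init__(self, loader: Callable[[str], Any], capacity: int = 4) -> None:
        self._loader = loader  # e.g. spacy.load in a real deployment
        self._capacity = capacity
        self._models: "OrderedDict[str, Any]" = OrderedDict()

    def get(self, name: str) -> Any:
        if name in self._models:
            # Cache hit: mark this model as the most recently used.
            self._models.move_to_end(name)
            return self._models[name]
        model = self._loader(name)
        self._models[name] = model
        if len(self._models) > self._capacity:
            # popitem(last=False) drops the oldest, i.e. least recently used, entry.
            self._models.popitem(last=False)
        return model

    def loaded(self) -> list[str]:
        """Model names currently in memory, least recently used first."""
        return list(self._models)
```

For a single-process setup, `functools.lru_cache(maxsize=4)` applied to a loader function gives the same eviction policy with less code. An explicit class makes the loaded set easier to inspect.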

Maintainer
Explosion AI
Used for
Tokenisation, lemmatisation, POS tagging, and sentence segmentation in the planner.

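MIT License (library). Licenses for the pretrained pipelines (the core_web_sm and core_news_sm models Kozha loads) are set per model and are not all MIT. Several core_news_sm pipelines carry GPL or Creative Commons terms, so see each model's page on spacy.io for its license.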

Section 05

Contributors.

The people who built Kozha, and the people who reviewed signs into it. Anonymous contribution is respected — contributors who have asked not to be named are counted in the aggregate line below rather than listed individually.

Core team

Zhan (developer). Bogdan (developer). Askhat Zhumabekov (advisor).

Named contributors

None yet. When named individual contributors are onboarded and have consented to public listing, they will be added here.

Anonymous contributors

None yet. When anonymous contributors have submitted reviewed signs, the aggregate count will appear here.

Section 06

Deaf advisory board.

The Deaf advisory board is the project's accountability body. It approves reviewer appointments, arbitrates disputes, and can remove a published sign without further discussion.

Status

The Deaf advisory board is being seated; no candidates have been confirmed yet. Until the board is seated, no signs submitted through Kozha's contribute page are exported to the public translator. See the governance page for the full review policy.

Section 07

Funding.

Full transparency on where operating costs come from.

Status

No external funding to date. Kozha's infrastructure costs (hosting, domain, translation-model storage) are covered by the core team out of pocket. If and when external funding arrives, it will be listed here with the funder, amount bracket, and any conditions attached.

Section 08

Contributor compensation.

How Deaf reviewers and advisory-board members are compensated for their time.

Status

A compensation policy is not yet in place. Drafting the policy — covering hourly rates for reviewer work, flat stipends for advisory-board meetings, and the process for requesting payment — is a prerequisite for seating the Deaf advisory board. Until that is done, no reviewers are being asked to work.

The policy will be published on the governance page as soon as it is drafted, and linked from here. We prefer to say "not yet in place" rather than describe an aspirational arrangement that hasn't been funded.