Credits | Shirabe

Shirabe stands on decades of open-source lexicography. Every entry, reading, and tag here originates from the projects below — please visit them, support them, and check their licences before redistributing.

Source versions

The table below lists every dataset currently loaded, with its origin and the exact upstream build. Re-importing a source updates its row, so the table stays an accurate reference. Licences and fuller attribution follow below.

Source	Origin	Licence	Version	Data date
JMdict — word entries	jmdict-simplified (EDRDG)	EDRDG	3.6.2	2026-06-22
JMnedict — names	jmdict-simplified (EDRDG)	EDRDG	3.6.2	2026-06-22
KANJIDIC2 — kanji	jmdict-simplified (EDRDG)	EDRDG	3.6.2	2026-06-22
KRADFILE — kanji → components	jmdict-simplified (EDRDG)	CC BY-SA 4.0	—	2026-06-29
RADKFILE — radical → kanji	jmdict-simplified (EDRDG)	CC BY-SA 4.0	3.6.2	2026-06-29
Example sentences (Tatoeba)	jmdict-simplified · Tatoeba	CC BY 2.0 FR	3.6.2	2026-06-22
Kanken levels (漢検)	mimneko/kanji-data	CC0 1.0	—	2026-06-26
Jōyō table (常用漢字表本表)	mimneko/kanji-data	CC0 1.0	—	2026-02-10
Jōyō appendix (付表)	mimneko/kanji-data	CC0 1.0	—	2026-02-09
Radical names — Kanji alive	kanjialive	CC BY 4.0	—	2021-02-24
Pitch accents — UniDic	UniDic (NINJAL)	GPL/LGPL/BSD	202512	2025-12-31
Word frequency — jiten-global	jiten.moe	—	—	2026-07-12
Wikipedia abstracts	DBpedia	CC BY-SA 3.0	2016-10	2026-06-28
BabelStone IDS — kanji structure	BabelStone (Andrew West)	Public domain	—	2025-06-27
Stroke order — KanjiVG	KanjiVG (Ulrich Apel)	CC BY-SA 3.0	—	2025-08-16

Dictionary data

JMdict — Japanese↔multilingual word entries: Compiled by the Electronic Dictionary Research and Development Group (EDRDG, James Breen et al.) and distributed under the EDRDG licence. We use the jmdict-simplified JSON conversion by Stanislav Petrov.
JMnedict — proper-noun (names) dictionary: Also from EDRDG, same licence terms; consumed via the jmdict-simplified JMnedict release.
KANJIDIC2 — kanji dictionary: EDRDG, same licence; sourced from jmdict-simplified's KANJIDIC2 all release, so kanji meanings come in English, French, Spanish and Portuguese — surfaced per language alongside the JMdict glosses.
JLPT levels (N5–N1): The JLPT level shown on each kanji and word comes from Jonathan Waller's JLPT Resources (tanos.co.uk), licensed CC BY. Since the post-2010 JLPT publishes no official kanji or vocabulary list, these are the de-facto-standard unofficial N5–N1 lists (kanji via the KANJIDIC snapshot, words via Bluskyo/JLPT_Vocabulary).
KRADFILE — kanji component breakdown: The “parts” a kanji is made of, shown on each kanji page, come from EDRDG's KRADFILE / KRADFILE2 (Michael Raine, James Breen et al.), CC BY-SA 4.0, via the same jmdict-simplified JSON build (the inverse of the RADKFILE below).
BabelStone IDS — kanji structure: The positional Ideographic Description Sequences (⿰⿱⿴…) that show how a kanji is laid out come from Andrew West's BabelStone IDS data, released to the public domain.
Kanken (漢字検定) levels: The 配当 (assigned) Kanji Kentei level shown on each kanji comes from mimneko/kanji-data's 漢検漢字辞典 table, released under CC0 1.0.
Jōyō kanji table (常用漢字表) — official readings & examples: The 表内 / 表外 reading distinction and the example words shown beside each on/kun reading come from the 文化庁's 2010 常用漢字表, digitised by mimneko/kanji-data under CC0 1.0. The underlying table is a Japanese cabinet notification (内閣告示), which is not subject to copyright.
成り立ち — kanji formation (六書) & glyph origin: The formation type (象形・指事・会意・形声) and the Japanese glyph-origin (字源) notes come from ウィクショナリー日本語版 (Japanese Wiktionary)'s 字源 sections, used under CC BY-SA 4.0; shinjitai inherit their kyūjitai's 字源 (e.g. 国→國). A few jōyō gaps carry an AI-assigned classification, marked on the kanji page. Note: ja.wiktionary follows modern academic scholarship, which can differ from school-textbook classifications.
RADKFILE — search radicals → kanji: The 253 search radicals in the radical picker and the kanji built from each come from EDRDG's RADKFILE / RADKFILE2 (Michael Raine, James Breen et al.), licensed CC BY-SA 4.0; consumed via the jmdict-simplified JSON conversion.
Kangxi radical names & meanings: The Japanese reading, English gloss, stroke count, and positional category used to label the radicals are sourced from Kanji alive (kanjialive.com), licensed under CC BY 4.0.
Wikipedia abstracts (via DBpedia): Lead-paragraph summaries of Wikipedia articles in every supported language come from the DBpedia project's long_abstracts dataset. The text remains the property of the Wikipedia contributors who wrote it and is dual-licensed under CC BY-SA 3.0 and the GNU Free Documentation License. Each abstract card on a word page links back to the source article.
Example sentences — Tatoeba: The example sentences attached to word senses come from the Tatoeba project (the JMdict examples set), licensed CC BY 2.0 FR, consumed via jmdict-simplified.
Pitch accents — UniDic: The pitch-accent (高低アクセント) data comes from the UniDic morphological dictionary by the National Institute for Japanese Language and Linguistics (NINJAL), triple-licensed GPL 2.0 / LGPL 2.1 / Modified BSD — free for commercial use. We read the accent (aType) from its lexicon.
Word frequency — jiten.moe: Rank ordering for the “frequency” word sort comes from the global frequency list published by jiten.moe.
Stroke order — KanjiVG: The animated stroke-order diagrams use KanjiVG by Ulrich Apel, licensed CC BY-SA 3.0.
Pronunciation audio — AivisSpeech:るな, AivisSpeech:TANAKA: The spoken pronunciations for words and example sentences are synthesised with AivisSpeech, an open-source Japanese text-to-speech engine, using the voices of るな (female) and TANAKA (male) under the Aivis Common Model License (commercial use permitted). Where no clip is available, the browser's built-in speech synthesis fills in.

Software

Shirabe is built on Ruby on Rails, Hotwire, and a stack of other open-source gems. The Japanese-language tooling is its own small ecosystem, much of it maintained alongside Shirabe:

kabosu — tokenisation: Ruby bindings for the Sudachi morphological analyser by Works Applications, used with the full edition of SudachiDict (the most complete edition, with extra named entities and rare words). This is how Shirabe splits and reads Japanese text. github.com/davafons/kabosu, Apache-2.0.
daidai — conjugation: Pure-Ruby Japanese verb and adjective conjugation, used to derive and explain inflected forms — github.com/davafons/daidai.

Typefaces

Typeset in Inter Tight, Newsreader, Noto Sans JP, and JetBrains Mono.

Found something missing? Let us know.
