Japanese Multi-lingual Electronic Dictionary Project
This document outlines the JMdict project, which set out
to extend the structure and content of
the EDICT Japanese-English Electronic Dictionary file to enable it to
contain additional information and provide an improved service to
The project has several broad goals:
- to convert the EDICT file to a new dictionary structure which overcomes
the deficiencies in the basic EDICT structure.
With regard to this goal, the particular structural and content aspects
addressed include, but are not limited to:
- the handling of orthographical variation (e.g. in kanji usage,
okurigana usage, readings) within the single entry;
- additional and more appropriately associated tagging of grammatical
and other information;
- provision for separation of different senses (polysemy) in the
- provision for the inclusion of translational equivalents from
- provision for inclusion of examples of the usage of words;
- provision for cross-references to related entries.
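
As an illustration only, the structural aspects listed above can be sketched as a small data model. This is not the project's internal format or its DTD: the class names, field names, language codes and sample entry below are all hypothetical, chosen to show how spelling variants, separate senses, glosses in several languages, examples and cross-references might sit together in a single entry.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Gloss:
    text: str
    lang: str = "eng"  # hypothetical language code for the translational equivalent


@dataclass
class Sense:
    pos: List[str] = field(default_factory=list)       # grammatical and other tags
    glosses: List[Gloss] = field(default_factory=list)
    examples: List[str] = field(default_factory=list)  # examples of usage
    xrefs: List[str] = field(default_factory=list)     # cross-references to related entries


@dataclass
class Entry:
    kanji_forms: List[str] = field(default_factory=list)  # orthographical variants
    readings: List[str] = field(default_factory=list)
    senses: List[Sense] = field(default_factory=list)     # one item per distinct sense

    def glosses_in(self, lang: str) -> List[str]:
        """Collect the glosses for one language across all senses."""
        return [g.text for s in self.senses for g in s.glosses if g.lang == lang]


# Illustrative entry only; not copied from the dictionary file.
entry = Entry(
    kanji_forms=["引き出す", "引出す"],
    readings=["ひきだす"],
    senses=[
        Sense(pos=["verb"], glosses=[Gloss("to pull out"), Gloss("retirer", "fre")]),
        Sense(pos=["verb"], glosses=[Gloss("to withdraw (money)")]),
    ],
)

print(entry.glosses_in("eng"))
```

Keeping the variants and the senses in separate lists inside one entry is what lets a single record replace several near-duplicate lines of the older flat layout.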
- to publish the dictionary in a standard format which is accessible
by a wide range of software tools;
It is proposed that this goal be addressed by developing the structure
so that it can be released as an XML document, with an associated XML
- to retain backward compatibility with the original EDICT structure
in order to enable legacy software systems to use later versions of the
The following has been achieved to date (June 2003):
- a new structure has been developed for the EDICT file, which has
been named JMdict (Japanese Multi-lingual Dictionary). This structure has been described in
an XML Document Type Definition (DTD), which may be viewed
(Note: this DTD is not quite up to date. The latest DTD is incorporated
into the distributed JMdict file.)
Samples of some EDICT
entries converted to XML in accordance with the DTD can be viewed
- the EDICT file has been converted into a new structure which is
aligned with the XML DTD. While many of the EDICT entries converted
simply and automatically, a significant number of entries were variants
of each other which had to be identified and combined.
(Note that while this structure is aligned with the XML DTD, the XML
format is not being used internally at the moment.)
- utility software has been developed which converts the new file
structure back to the (old) EDICT format. All updates to the EDICT file
are now taking place via the new structure;
- utility software has also been developed which converts the JMdict
file in the new (internal) structure into the XML format for release;
- sets of translational equivalents in other languages are added to the
JMdict file when it is released. These are:
- entries from Ulrich Apel's
- the French glosses from Jean-Marc Desperrier's translation of the
- Oleg Volkov's EDICT-format Japanese-Russian dictionary file.
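
The conversion from the new structure back to the old EDICT format, mentioned above, can be illustrated with a minimal sketch. It is not the project's utility software: the function name `to_edict_lines` and its arguments are hypothetical, and it assumes the usual EDICT line layout of `KANJI [KANA] /gloss/gloss/`. Pairing every kanji form with every reading is a simplification made for the example.

```python
from typing import List


def to_edict_lines(kanji_forms: List[str], readings: List[str],
                   senses: List[List[str]]) -> List[str]:
    """Flatten one combined entry into EDICT-style lines (hypothetical helper)."""
    glosses = []
    for number, sense in enumerate(senses, start=1):
        for position, gloss in enumerate(sense):
            # Mark the start of each sense only when there is more than one.
            prefix = f"({number}) " if len(senses) > 1 and position == 0 else ""
            glosses.append(prefix + gloss)
    body = "/" + "/".join(glosses) + "/"

    if not kanji_forms:
        # Kana-only entries have no bracketed reading.
        return [f"{reading} {body}" for reading in readings]
    # Simplification: emit one line per kanji form and reading combination.
    return [f"{kanji} [{reading}] {body}"
            for kanji in kanji_forms for reading in readings]


lines = to_edict_lines(
    ["引き出す", "引出す"],
    ["ひきだす"],
    [["to pull out"], ["to withdraw (money)"]],
)
for line in lines:
    print(line)
```

The sketch shows why combined entries expand again on the way back: each orthographical variant becomes its own line in the flat format, which is how legacy software expects to find it.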
Comments are sought from anyone interested in this project. In
particular, critical appraisal of the proposed structure, and
constructive suggestions for its improvement, will be most welcome.
Please feel free to send me
about this project.
The first release of the XML
format JMdict (UTF-8 Unicode) took place in May 1999.
There have been several releases since then, with the most recent in
October 2001. It is intended that JMdict releases take place at the same
time as major EDICT releases.
There is a small closed mailing list for people seriously involved in
JMdict. Email Jim if you want to be included.
Some software is under development which uses JMdict:
The WWWJDIC dictionary server now uses an extended format for the main
dictionary entries, which draws from the JMdict files.
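
For readers writing their own software against the XML release, the following is a minimal sketch of reading entries with Python's standard library. The element names (`entry`, `keb`, `reb`, `gloss`) and the `xml:lang` attribute are assumed to match the distributed DTD and should be checked against it. The sample fragment, its sequence number, and the default language code `eng` are illustrative assumptions, and the fragment leaves out the entity-based tags that a real file may contain.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical fragment for illustration; not copied from the released file.
SAMPLE = """<JMdict>
  <entry>
    <ent_seq>9999999</ent_seq>
    <k_ele><keb>引き出す</keb></k_ele>
    <k_ele><keb>引出す</keb></k_ele>
    <r_ele><reb>ひきだす</reb></r_ele>
    <sense>
      <gloss>to pull out</gloss>
      <gloss xml:lang="fre">retirer</gloss>
    </sense>
  </entry>
</JMdict>"""


def read_entries(xml_text):
    """Return one dict per entry with its kanji forms, readings and glosses."""
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.iter("entry"):
        entries.append({
            "kanji": [k.text for k in entry.iter("keb")],
            "readings": [r.text for r in entry.iter("reb")],
            # Assumption: a gloss without xml:lang is treated as English.
            "glosses": [(g.get(XML_LANG, "eng"), g.text)
                        for g in entry.iter("gloss")],
        })
    return entries


entries = read_entries(SAMPLE)
print(entries[0]["glosses"])
```

For the full file, a streaming parser such as `ET.iterparse` would likely be preferable to loading the whole document at once.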
Permission for Use
The JMdict file is now located within the Electronic Dictionary Research
and Development Group at Monash University. Information about the Group
including the terms under which the file can be used.