Eohan: the Project

Eohan as a software project

Eohan is the result of a long-term software project of mine. It is intended as a contribution to open source scholarship and as a demonstration of my technical abilities. This is a short description of the work involved.

Creating the Eohan database

Half the data required is in the Unihan database, part of the Unicode standard which was drawn up when Chinese, Japanese and Korean ideographs were added to Unicode. I wrote scripts to extract the data I needed and to convert the Unihan key-value format into character texts. Sample output (from Guangyun):

  東菄鶇䍶𠍀倲𩜍𢘐涷蝀凍鯟𢔅崠埬𧓕䰤
  同仝童僮銅桐峒硐𦨴𧱁筒瞳㼧𤭁罿犝筩潼曈洞侗橦烔䴀挏酮鮦㼿𦏆𦍻眮蕫穜衕𩍅𢈉䆚哃𢏕絧𨚯𨝯𪔝𩦶𪒿
  中衷忠𦬕
	...

I assembled the other half of the data required, historical texts not in the Unihan database. This meant writing Python scripts to reverse engineer rough versions from available data and then editing manually.

- designed a database to hold this data and created and populated it using the Python pysqlite library.

- autogenerated historical pronunciation from the above data. This meant mapping complementary datasets like Guangyun and Yunjing on to each other and digitising tables of phonetic analysis from the publications of Karlgren, Pulleyblank and Baxter.

- wrote Python scripts to combine the work of those scholars with the historical datasets, output reconstructed Old and Middle Chinese pronunciation and added it to the database. Sample of a table from Pulleyblank:

  東    1    12    45    owŋ    owŋ    uwŋ    uwŋ    -    -    -    -    1
  冬    2    12    46    awŋ    -      -      -      -    -    -    -    1
  腫    3    12    47    -      -      uawŋ   uawŋ   -    -    -    -    1
	...

Code snippet:


    # Grade 4 finals only have the glide j after labial, velar and laryngeal initials.
    if emcFinal.startswith(u"j") and articulationDict[initial] not in [u"p", u"k", u"h"]:
        emcFinal = emcFinal[1:]
    # Where a glide is initial in the syllable, it is not part of the final.
    if emcInitial == u"j" and emcFinal.startswith(u"j"):
        emcFinal = emcFinal[1:]
    elif emcInitial == u"w" and emcFinal.startswith(u"w"):
        emcFinal = emcFinal[1:]
    # After initial w, a final beginning jw keeps its j and drops the w.
    elif emcInitial == u"w" and emcFinal.startswith(u"jw"):
        emcFinal = emcFinal[0] + emcFinal[2:]
 ...
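The step of creating and populating the database, described earlier in this section, might look something like the sketch below. It uses the sqlite3 module from the Python standard library (the packaged form of pysqlite). The table name guangyun, its columns and the function load_groups are hypothetical, chosen only for the example, and the two input lines are shortened from the Guangyun sample above. The actual Eohan schema is not shown here.

```python
import sqlite3


def load_groups(conn, groups):
    """Store each character of each text line with its group and position.

    Hypothetical schema, for illustration only.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS guangyun ("
        " id INTEGER PRIMARY KEY,"
        " codepoint TEXT NOT NULL,"
        " glyph TEXT NOT NULL,"
        " groupno INTEGER NOT NULL,"
        " position INTEGER NOT NULL)"
    )
    rows = [
        ("U+%04X" % ord(glyph), glyph, groupno, position)
        for groupno, group in enumerate(groups, start=1)
        for position, glyph in enumerate(group.strip(), start=1)
    ]
    # Parameter binding (?) leaves quoting and encoding to the driver.
    conn.executemany(
        "INSERT INTO guangyun (codepoint, glyph, groupno, position)"
        " VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")  # in-memory database for the example
count = load_groups(conn, ["東菄鶇", "中衷忠"])
second_group = [
    row[0]
    for row in conn.execute(
        "SELECT glyph FROM guangyun WHERE groupno = ? ORDER BY position", (2,)
    )
]
print(count, "".join(second_group))
```

Keeping group number and position as separate columns means the original text order can always be rebuilt with an ORDER BY.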
 

Writing the Eohan app

I wrote a specification of 30 templates to display various selections of data from the database, listing the data fields to be displayed and the links between the pages.

- implemented the specification first in Django and then in PHP (with much modification on the way).

- wrote 40 or more SQL queries (complex nested joins on multiple tables of the database) to retrieve the data to be displayed in the templates. Sample (relatively short) query:


SELECT a.id AS id, a.codepoint AS codepoint, a.glyph AS glyph, a.initial AS initial,
       a.grade AS grade, a.tone AS tone,
       a.lmc AS lmc, a.mcb AS mcb, a.mck AS mck,
       '' AS inputid,
       b.id AS rhyme, b.rhymelabel AS rhymelabel, b.rusheng AS rusheng, b.fanqie AS fanqie,
       b.gradekeys1 AS gradekeys1, b.gradekeys2 AS gradekeys2, b.gradekeys3 AS gradekeys3, b.gradekeys4 AS gradekeys4
FROM yunjing a,
     yjrhyme b
WHERE rhyme = ?
AND a.rhyme = b.id

- designed and wrote CSS styles to apply to the templates (I am rather proud of the styling of Yunjing).
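As an illustration of how a query like the sample above is run with a bound parameter, here is a small Python sketch using sqlite3. It is not the Django or PHP code of the app. The table and column names come from the sample query, but the reduced schema, the column types, the rows inserted and the function fetch_rhyme are assumptions made for the example. The rhyme column is qualified as a.rhyme here for clarity.

```python
import sqlite3

# Cut-down version of the sample query: fewer columns, same join.
QUERY = """
SELECT a.id AS id, a.glyph AS glyph, a.grade AS grade, a.tone AS tone,
       b.id AS rhyme, b.rhymelabel AS rhymelabel
FROM yunjing a,
     yjrhyme b
WHERE a.rhyme = ?
AND a.rhyme = b.id
ORDER BY a.id
"""


def fetch_rhyme(conn, rhyme_id):
    """Return all yunjing rows for one rhyme, accessible by column name."""
    conn.row_factory = sqlite3.Row
    return conn.execute(QUERY, (rhyme_id,)).fetchall()


conn = sqlite3.connect(":memory:")
# Hypothetical schema and rows, for illustration only.
conn.executescript("""
CREATE TABLE yjrhyme (id INTEGER PRIMARY KEY, rhymelabel TEXT);
CREATE TABLE yunjing (
    id INTEGER PRIMARY KEY, glyph TEXT, grade INTEGER, tone INTEGER,
    rhyme INTEGER REFERENCES yjrhyme(id)
);
INSERT INTO yjrhyme VALUES (1, '東');
INSERT INTO yjrhyme VALUES (2, '冬');
INSERT INTO yunjing VALUES (1, '東', 1, 1, 1);
INSERT INTO yunjing VALUES (2, '同', 1, 1, 1);
INSERT INTO yunjing VALUES (3, '冬', 1, 1, 2);
""")
rows = fetch_rhyme(conn, 1)
for row in rows:
    print(row["glyph"], row["rhymelabel"])
```

Because the rows are returned as sqlite3.Row objects, a template can refer to fields by the aliases given in the SELECT list rather than by position.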

Ongoing revision

The current version of Eohan is the third I have placed online. The original version was a Django app, which had to be rewritten in PHP because of hosting difficulties. The current version is a revised PHP app embedded in my WordPress website (for which I wrote my own WordPress multi-site theme).

I have aimed to improve my practice as a developer in the course of this project. My current programming environment is:

  • Python 2.7.10 with IDLE, its bundled GUI and programmer's editor.
  • Development server: WAMP.
  • Notepad++ for templates and CSS.
  • Version control: Git.
  • Bug tracking: Trac.
  • Backup: 2BrightSparks SyncBackFree, nightly sync to external drive.