<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <title>Steady State</title>
    <link href="http://msdservices.com/atom.xml" rel="self"/>
    <link href="http://msdservices.com/"/>
    <updated>2011-11-26T12:03:37-05:00</updated>
    <id>http://msdservices.com/</id>
    <author>
        <name>Matthew Sinclair-Day</name>
        <email>msd@msdservices.com</email>
    </author>

    <entry>
        <title>ManManLai Development Process</title>
        <link href="http://msdservices.com/2011/07/31/manmanlai.html"/>
        <updated>2011-07-31T00:00:00-04:00</updated>
        <id>http://msdservices.com/2011/07/31/manmanlai</id>
        <content type="html">&lt;p&gt;When we started developing &lt;a href=&quot;http://manmanlai.net&quot;&gt;ManManLai Chinese&lt;/a&gt;, we set out to create a single application for both elementary and advanced Chinese language learners. We envisioned an app that would stay with learners throughout their studies. As their skills and fluency improved, the app would provide more advanced vocabulary and capabilities.&lt;/p&gt;
&lt;p&gt;We also envisioned an app built on a fully cross-referenced internal information model, making it easy for a user to explore the world of Chinese in all of its dimensions.&lt;/p&gt;
&lt;p&gt;What do we mean by dimensions? First, let&amp;#8217;s consider some basic concepts.&lt;/p&gt;
&lt;p&gt;The fundamental unit of information in Chinese is the &lt;i&gt;character&lt;/i&gt;, and two or more characters are often combined into &lt;i&gt;compounds&lt;/i&gt; representing words, phrases and idioms. But Chinese characters are not like letters in an alphabetic language. A character stands on its own as a meaningful piece of information, so students must eventually master not only thousands of compounds, but also the thousands of individual characters that make up those compounds.&lt;/p&gt;
&lt;p&gt;The challenge of modeling characters and compounds in an information architecture may be fairly tractable in itself, but some other important considerations complicate the problem:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;Characters can have more than one &lt;i&gt;pronunciation&lt;/i&gt;.&lt;/li&gt;
	&lt;li&gt;Pronunciation is represented in Western alphabets using the standard &lt;i&gt;Hanyu Pinyin&lt;/i&gt; romanization scheme.&lt;/li&gt;
	&lt;li&gt;Many characters and compounds share the same pronunciation.&lt;/li&gt;
	&lt;li&gt;Characters can be organized by internal structures called &lt;i&gt;radicals&lt;/i&gt;.&lt;/li&gt;
	&lt;li&gt;Characters can be organized by the number of &lt;i&gt;strokes&lt;/i&gt; needed to draw them.&lt;/li&gt;
	&lt;li&gt;Some characters and compounds are rarely used, and some are frequently used.&lt;/li&gt;
	&lt;li&gt;Characters used in mainland China are &lt;i&gt;simplified&lt;/i&gt; variants, and the ones used in Taiwan and Hong Kong are &lt;i&gt;traditional&lt;/i&gt; ones.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Beyond simple dictionary look-up mechanisms, we envisioned an app that makes it easy to discern relationships and patterns: &lt;em&gt;What does this character mean in this compound? What other compounds use this character? What other characters or compounds share the same pronunciation? And so on&amp;#8230;&lt;/em&gt; Those are the kinds of questions one can answer with &lt;a href=&quot;http://manmanlai.net&quot;&gt;ManManLai&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Given all of these considerations, it became clear that a completely unified Chinese-English &lt;i&gt;corpus&lt;/i&gt;, sufficiently indexed and internally cross-referenced, would be required to expose the language in all of its dimensions.&lt;/p&gt;
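&lt;p&gt;To make the idea of cross-referencing concrete, here is a minimal sketch in Python (not the Java framework described in this post). The sample entries and the names &lt;code&gt;ENTRIES&lt;/code&gt; and &lt;code&gt;build_indexes&lt;/code&gt; are hypothetical and are not taken from the ManManLai corpus. It shows how two inverted indexes could answer questions such as &amp;#8220;what other compounds use this character?&amp;#8221; and &amp;#8220;what else shares this pronunciation?&amp;#8221;&lt;/p&gt;
&lt;pre&gt;
```python
from collections import defaultdict

# Hypothetical sample entries: (headword, pinyin with tone numbers, gloss).
# These are illustrative only and do not come from the ManManLai corpus.
ENTRIES = [
    ("中", "zhong1", "middle"),
    ("国", "guo2", "country"),
    ("中国", "zhong1 guo2", "China"),
    ("中文", "zhong1 wen2", "Chinese language"),
    ("钟", "zhong1", "clock; bell"),
]


def build_indexes(entries):
    """Build two inverted indexes over the entries (a sketch, not the real model)."""
    by_character = defaultdict(list)  # character -> compounds that contain it
    by_pinyin = defaultdict(list)     # pinyin -> headwords pronounced that way
    for headword, pinyin, _gloss in entries:
        by_pinyin[pinyin].append(headword)
        if len(headword) > 1:  # only compounds are cross-referenced by character
            # dict.fromkeys removes repeated characters while keeping their order
            for character in dict.fromkeys(headword):
                by_character[character].append(headword)
    return by_character, by_pinyin


by_character, by_pinyin = build_indexes(ENTRIES)
print(by_character["中"])    # compounds that use this character
print(by_pinyin["zhong1"])   # headwords that share this pronunciation
```
&lt;/pre&gt;
&lt;p&gt;A real corpus would need further indexes of the same shape for radicals, stroke counts, usage frequency, and simplified versus traditional variants.&lt;/p&gt;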
&lt;p&gt;The solution was further complicated by the need to synthesize numerous open-source and proprietary data sources into one whole. Each data source had its own organization and formatting, dirty and incomplete entries, and internal inconsistencies. In some cases, the organization was ad hoc or only implicitly documented. A system was therefore needed to manage these sources and allow for rapid, iterative evolution.&lt;/p&gt;
&lt;p&gt;An extensible framework written in Java, with a smattering of Clojure, was built to achieve these goals. The codebase is covered by JUnit tests. It uses Natural Language Processing (&lt;span class=&quot;caps&quot;&gt;NLP&lt;/span&gt;) techniques, such as tagging, along with pattern matching to generate metadata describing the entries in the corpus and to construct a consistent representation from the heterogeneous data sources.&lt;/p&gt;
&lt;p&gt;The framework represents the nucleus of the system, and it&amp;#8217;s curious how so much of &lt;a href=&quot;http://manmanlai.net&quot;&gt;ManManLai&amp;#8217;s&lt;/a&gt; development revolved around an outside code base only indirectly related to Objective-C, Cocoa, and iOS.&lt;/p&gt;
&lt;p&gt;In fact, the Java framework also generates the SQLite database schema built into the iPhone application. In some cases, to address performance and resource constraints on the phone, Objective-C code is generated by the Java framework and compiled directly into the product.&lt;/p&gt;
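&lt;p&gt;As a rough illustration of generating a schema from a model description, here is a small Python sketch that uses the standard &lt;code&gt;sqlite3&lt;/code&gt; module. It is not the Java framework itself. The &lt;code&gt;MODEL&lt;/code&gt; dictionary, the table and column names, and the &lt;code&gt;generate_schema&lt;/code&gt; helper are all assumptions made for the example.&lt;/p&gt;
&lt;pre&gt;
```python
import sqlite3

# Hypothetical model description: table name -> list of (column, SQL type).
# The real ManManLai schema is not published in this post; these names are illustrative.
MODEL = {
    "character": [
        ("id", "INTEGER PRIMARY KEY"),
        ("glyph", "TEXT NOT NULL"),
        ("strokes", "INTEGER"),
    ],
    "compound": [
        ("id", "INTEGER PRIMARY KEY"),
        ("headword", "TEXT NOT NULL"),
        ("pinyin", "TEXT"),
    ],
}


def generate_schema(model):
    """Render one CREATE TABLE statement per entry in the model description."""
    statements = []
    for table, columns in model.items():
        body = ", ".join(f"{name} {sql_type}" for name, sql_type in columns)
        statements.append(f"CREATE TABLE {table} ({body});")
    return "\n".join(statements)


schema = generate_schema(MODEL)

# Apply the generated schema to an in-memory database to check that it is valid SQL.
connection = sqlite3.connect(":memory:")
connection.executescript(schema)
tables = [
    row[0]
    for row in connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
print(tables)
```
&lt;/pre&gt;
&lt;p&gt;Keeping the schema as generated output, rather than as a hand-edited file, is one way to keep a database in step with the model that feeds it.&lt;/p&gt;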
&lt;p&gt;Underpinning this system is a practical, documented, and entirely reproducible build process driven by &lt;span class=&quot;caps&quot;&gt;UNIX&lt;/span&gt; shell and Ant scripts and integrated into the source control management system.  It even drives Xcode. The process fully supports iterative upgrades to the data sources and development of new features, while maintaining referential integrity within the corpus and derivative user-generated data files, from version to version of the app.&lt;/p&gt;
&lt;p&gt;These pragmatic, iterative approaches proved especially well suited to an unfamiliar and complicated problem domain. Without the infrastructure in place to manage two code bases and numerous, changing data sources, we would not have been able to move forward, adding and changing features, while ensuring a high-quality product.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;You may employ the same engineering discipline by hiring &lt;span class=&quot;caps&quot;&gt;MSD&lt;/span&gt; Services for your project. &lt;a href=&quot;/contact.html&quot; title=&quot;MSD Services Contact&quot;&gt;Contact &lt;span class=&quot;caps&quot;&gt;MSD&lt;/span&gt; Services today&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;</content>
    </entry>

</feed>
