in KONKANVERTER Ver 2.1
All known issues in “Konkanverter beta” are resolved in KONKANVERTER Ver 2.1, which uses “A Finite State Transducer based Statistical Machine Transliteration Engine for Konkani Language” developed by our techie Vinodh Rajan.
KONKANVERTER 2.1 includes a Correction Submission tool. If you encounter an incorrect transliteration, you can click on the word and edit it. Once you finish editing, the correction is submitted to the admins for evaluation. Once it is approved, the transliteration will give the corrected result.
Please let us know if you encounter any major issues.
Thank you.
in KONKANVERTER Beta
While testing the KONKANVERTER – Online Konkani Script Conversion Utility, you may find a few incorrect results. These are known issues. The following list tries to explain the intricacies of these issues.
TRANSLITERATION TO DEVANAGARI SCRIPT KONKANI
Q: There are a few incorrect conversions in Devanagari Script.
A: As explained on the page Transliteration Rules for Devanagari, the Consonant Clusters Behavioural Study conducted by the World Institute of Konkani Language identified 40 cases of clusters in the Kannada/Romi/Malayalam scripts that cannot be treated with certainty as either conjuncts or non-conjuncts in Devanagari. They behave both ways. Hence, after observing each cluster in several sample texts, every Case C cluster has been assigned as either conjunct or non-conjunct, based on its majority behaviour, when converting into Devanagari script. Any particular Case C cluster will therefore behave correctly in the majority of cases and incorrectly in a few. This issue will be addressed in future versions of KONKANVERTER through lexical lookup based on a comprehensive wordlist, or through the development of a machine learning system.
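The majority-based assignment described above can be sketched roughly as follows. This is only an illustration: the cluster names (`cluster_A`, `cluster_B`), the observation counts, and the function names are made up for the example and are not taken from the study or from KONKANVERTER itself.

```python
from collections import Counter


def assign_by_majority(observations):
    """Map each ambiguous cluster to the behaviour observed most often."""
    tallies = {}
    for cluster, behaviour in observations:
        tallies.setdefault(cluster, Counter())[behaviour] += 1
    return {cluster: tally.most_common(1)[0][0] for cluster, tally in tallies.items()}


def count_misses(observations, assignment):
    """Count observations that a fixed per-cluster assignment gets wrong."""
    return sum(1 for cluster, behaviour in observations if assignment[cluster] != behaviour)


# Hypothetical sample-text observations (made-up counts, for illustration only).
observations = (
    [("cluster_A", "conjunct")] * 7
    + [("cluster_A", "non-conjunct")] * 2
    + [("cluster_B", "non-conjunct")] * 5
    + [("cluster_B", "conjunct")] * 1
)

assignment = assign_by_majority(observations)
misses = count_misses(observations, assignment)
print(assignment)
print("minority occurrences converted incorrectly:", misses)
```

A fixed assignment of this kind is always wrong for the minority occurrences of each cluster, which is why a wordlist lookup or a learned model is needed to remove the remaining errors.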
TRANSLITERATION FROM ROMI SCRIPT KONKANI
Q: The word ‘ambo’ in Romi Script is converted to ‘आंब’; it should have been ‘आंबो’. I observed that every O is converted to अ/ಅ/അ, when in fact it should have been converted to ओ/ಒ/ഒ.
A: In Standard Romi Orthography, the syllable O represents both अ/ಅ/അ and ओ/ಒ/ഒ. There is no clear indicator for distinguishing अ/ಅ/അ from ओ/ಒ/ഒ in Roman Script Konkani. The system can understand and produce only one mapping for one syllable. For example:
godd ambo, mol chodd, tor am’so (Sweet Mango, if costly, is sour)
The above Romi Script Konkani sentence should have been converted to
गोड आंबो मोल चड तर आम्सो
but the system can convert it only as either
गोड आंबो मोल चोड तोर आम्सो
or
गड आंब मल चड तर आम्स
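The limitation can be reproduced with a toy transliterator. The sketch below is not KONKANVERTER's rule set: the token table is a hypothetical one that covers only the letters of the example sentence, the commas are dropped, and a straight apostrophe stands in for the one in am’so. It shows that a single fixed value for O yields one of the two outputs above, never the desired mixed result.

```python
# Toy token table covering only the example sentence (an assumption for this sketch).
TOKENS = {
    "dd": "ड", "ch": "च", "mb": "ंब", "m'": "म्",
    "g": "ग", "m": "म", "l": "ल", "t": "त", "r": "र", "s": "स",
    "a": "आ",  # only word-initial 'a' occurs in the example
}


def transliterate_word(word, o_value):
    """Transliterate one Romi word, using the same o_value for every 'o'."""
    out, i = [], 0
    while i < len(word):
        if word[i] == "o":
            out.append(o_value)
            i += 1
            continue
        for size in (2, 1):  # prefer two-letter tokens such as 'dd' and 'ch'
            chunk = word[i:i + size]
            if chunk in TOKENS:
                out.append(TOKENS[chunk])
                i += len(chunk)
                break
        else:
            out.append(word[i])  # pass unknown characters through unchanged
            i += 1
    return "".join(out)


def transliterate(text, o_value):
    return " ".join(transliterate_word(w, o_value) for w in text.split())


sentence = "godd ambo mol chodd tor am'so"
all_o = transliterate(sentence, "ो")  # every O read as ओ (vowel sign written)
all_a = transliterate(sentence, "")   # every O read as अ (inherent vowel, no sign)
print(all_o)
print(all_a)
```

Neither call produces गोड आंबो मोल चड तर आम्सो, because that result needs a different reading of O in different words.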
We observed that in an average sentence, unlike the one above (which was constructed to illustrate the issue), the syllable O denotes अ/ಅ/അ more often than ओ/ಒ/ഒ. Hence, in KONKANVERTER beta we have mapped the syllable O to अ/ಅ/അ. We are building a text corpus in each script to solve this issue. In special adaptations of the transliteration tool, applications can opt to mark the syllable O slightly differently where it stands for ओ/ಒ/ഒ. For the time being, try replacing the respective /o/ with /ô/ in the input text itself.
This issue also affects the diphthong Oi, which represents both अय and ओय.
A paragraph from Romi Script Konkani converted in to Devanagari for illustration:
Source: (www.v-ixtt.com)
Thodde rokddech mon’xachi ixttagot kortat. Ani ti bore bhaxen samballtat. Thodde ixttagot korunk bore asat. Punn ti chodd kall togona. Te ixttagot kortat khore, punn tankam ti samballunk kollona. Thodde ixttagot korunkuch nokllo. Zalear te samballtole koxe ? Hanv khoinchea vorgant poddtam ? Mhaka nove ixtt korpacho xegunn asa ? Vo hanv ekmoddo ? Mhaka eksuro jiyeunk borem lagta ? Vo hanv ixttagot kortam ti mhojea vaitt sonvoyank lagun toddun uddoitam ? Tum ixttagot korpi ? (TOTAL ‘O’s = 45)
थडे रकडेच मनशाची इश्टागत कर्तात. आनी ती बरे भाशेन सांबाळतात. थडे इश्टागत करुंक बरे आसात. पूण ती चड काळ तगना. ते इश्टागत कर्तात खरे, पूण तांकां ती सांबाळुंक कळना. थडे इश्टागत करुंकूच नकळ. जाल्यार ते सांबाळतले कशे ? हावं खयंच्या वर्गांत पडतां ? म्हाका नवे इश्ट कर्पाच शेगूण आसा ? व हावं एक्मड ? म्हाका एक्सुर जियेवंक बरें लाग्ता ? व हावं इश्टागत कर्तां ती म्हज्या वायट संवयांक लागून तडून उडयतां ? तूं इश्टागत कर्पी ?
Reds (Incorrect Conversions) = 9
Correct Conversions = 36
UPDATE ON THIS ISSUE: (30th December 2012)
Rule: Treat ‘O’ as ओ whenever it occurs at the end of a word. (Single-syllable words such as Vo will now have their O converted incorrectly as ओ because of this rule, but it helps in all other cases.) Does the above paragraph now look less red? Let us see:
थडे रकडेच मनशाची इश्टागत कर्तात. आनी ती बरे भाशेन सांबाळतात. थडे इश्टागत करुंक बरे आसात. पूण ती चड काळ तगना. ते इश्टागत कर्तात खरे, पूण तांकां ती सांबाळुंक कळना. थडे इश्टागत करुंकूच नकळो. जाल्यार ते सांबाळतले कशे ? हांव खयंच्या वर्गांत पडतां ? म्हाका नवे इश्ट कर्पाचो शेगूण आसा ? वो हांव एक्मडो ? म्हाका एक्सुरो जियेवंक बरें लाग्ता ? वो हांव इश्टागत कर्तां ती म्हज्या वायट संवयांक लागून तडून उडयतां ? तूं इश्टागत कर्पी ?
Reds (Incorrect) = 7 (observe three new correct results and two new incorrect results)
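As a rough worked calculation, the counts reported for this paragraph can be turned into accuracy figures. The sketch assumes that the incorrect counts (9 before the rule, 7 after) are both measured against the same total of 45 O's. The variable names are illustrative only.

```python
total_os = 45          # total O's in the sample paragraph
incorrect_before = 9   # incorrect conversions with O always mapped to अ
incorrect_after = 7    # incorrect conversions after the word-final rule

accuracy_before = (total_os - incorrect_before) / total_os
accuracy_after = (total_os - incorrect_after) / total_os

print(f"before: {accuracy_before:.1%}")
print(f"after:  {accuracy_after:.1%}")
```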
godd ambo, mol chodd, tor am’so
Earlier: गड आंब मल चड तर आम्स
Now: गड आंबो, मल चड, तर आम्सो
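The word-final rule can be approximated on the input side by marking every word-final o as ô before conversion, in line with the /ô/ workaround suggested earlier. The following is a minimal sketch under that assumption. The function name `mark_word_final_o` is hypothetical, only lowercase o is handled, and this is not the converter's own implementation.

```python
import re

# An 'o' counts as word-final when it is not followed by a letter, digit,
# underscore, or apostrophe (straight or curly).
_FINAL_O = re.compile(r"o(?![\w'’])")


def mark_word_final_o(text):
    """Replace each word-final 'o' with 'ô' in Romi-script input."""
    return _FINAL_O.sub("ô", text)


print(mark_word_final_o("godd ambo, mol chodd, tor am'so"))
print(mark_word_final_o("Vo hanv ekmoddo ?"))
```

As noted in the rule, a single-syllable word such as Vo is marked as well, so it still comes out wrong.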