What is a Cool IRI?
I don't have complete knowledge of IRIs, so the content below might contain some misunderstandings. If you find one, please let me know.
* This post was originally written as part of "Cool IRI for a permalink", so you'll often see the words "this plugin". "This plugin" means a plugin for the Movable Type blogging tool that enables you to create Cool IRIs for blog archives.
You may be stunned by how long this page is. I swear the content is not deeply technical; it is just common knowledge. If you have no time to read it, just look at the pictures. Perhaps they will help a little.
What does "Cool" mean?
If you've never heard of Cool URIs, I recommend reading Tim Berners-Lee (w3.org) - Cool URIs don't change. In that document, Tim Berners-Lee suggests what to leave out of a URI: author's name, subject, status, access, file name extension, software mechanisms, etc.
In Movable Type 3.2, Six Apart (6A) reinforced this feature by introducing easy archive file mapping (like yyyy/mm/entry_basename/index.html) and a basename field for each entry. The subject (title) of an entry can change, but the basename is fixed, so a cool permalink URI can be kept.
If you titled an entry "This is a Cool permalink", its permalink could be "http://www.example.com/blog/2005/11/this_is_a_cool_permalink/". This URI doesn't contain any un-Cool items such as a subject ("this_is_a_cool_permalink" is derived from the subject, but it is a fixed basename, not the changeable title), a file extension (index.html, ...cool_permalink.php), or a software mechanism (mt.cgi, mtview.php, ...). Yes, it is a perfect Cool URI.
BUT if you create a permalink like "http://www.example.com/2005/11/003846.html", it is not cool, because of the file name extension and the software mechanism (003846 = the entry ID assigned by MT).
Why be Cool?
Cool URIs strengthen the permanence of permalinks
In general, a permalink is not permanent. Let's say you publish entries as HTML; then the permalink will have ".html" in it. If you later switch the file extension from HTML to PHP, the permalink changes too. That means the old permalink stops working.
A more extreme case is when you switch from one blogging tool to another. An entry's ID is created by the blogging tool, not by YOU, and it has no relationship with the entry itself. So even if you export all entries and import them into the new blogging system, you have no means of keeping your previous permalinks if you used the entry ID in the URI.
If you had created your permalinks as Cool URIs, you could have kept them in both of the cases above.
Cool URIs can give you and search engines (SE) some semantics
Some people (mainly non-MT users, I think) say that a URI is only a locator, so its main role is just to locate an Internet resource. From that point of view, the form of a URL is not important. They think URLs like "www.example.com/blog/12345.html", "www.example.com/blog/view.php?id=12345", and "www.example.com/blog/cool_uri_in_movable_type/" are not worth distinguishing: as long as they point to the exact resource, they are all of equal value. Really?
Some SEs, like Google, also seem to consider the form of a URL. If we embed some semantics in a URL, SEs can use it too. By using Cool URIs, we help SEs and SEs help us.
For example, if you use an archive file template mapping like category/sub_category/base_name, an entry could have a URL like "www.example.com/computers/blogging_tools/what_is_movable_type/". An SE can distinguish it from "www.example.com/printing/typography/movable_type/" by the URL itself.
Why can't we be Cool?
Are you using US-ASCII characters? Then you can create fully Cool URIs for permalinks in MT 3.2.
BUUUT if you are Chinese, Japanese, or Korean, you can't create Cool URIs in your native language with default MT. All characters outside the US/European alphabets are removed from the URI, which means East Asian characters are never allowed in MT's permalinks. Instead of native characters, meaningless words like "post-1, post-2, cat-1, cat-2..." are shown. If you don't like this, you have to enter an English title for content written in your native language, or use the entry ID in the permalink.
If you are German, French, etc., you can have a somewhat cool URI. However, I think the result is distorted, like "www.example.com/blog/standardmäßige/" => "www.example.com/blog/standardmaessige/". I'm not German, so I don't know whether the converted word is natural and easy for a German to comprehend.
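To make the behavior described above concrete, here is a minimal Python sketch of an ASCII-only basename. It is not Movable Type's actual code: the function name ascii_basename and the small transliteration table are assumptions made only for illustration, modeled on the German example above.

```python
import re

# Hypothetical transliteration table, modeled on the German example above.
TRANSLIT = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}

def ascii_basename(title: str) -> str:
    """Return an ASCII-only basename (an illustration, not MT's real algorithm)."""
    text = title.lower()
    for char, replacement in TRANSLIT.items():
        text = text.replace(char, replacement)
    # Drop every character that is outside US-ASCII.
    text = text.encode("ascii", "ignore").decode("ascii")
    # Collapse runs of anything other than a-z and 0-9 into one underscore.
    text = re.sub(r"[^a-z0-9]+", "_", text)
    return text.strip("_")

print(ascii_basename("standardmäßige"))   # standardmaessige
print(repr(ascii_basename("スポーツ")))    # '' (nothing survives)
```

In this sketch a title written entirely in East Asian characters ends up as an empty string, which is the kind of situation where a tool has to fall back to a placeholder name or an entry ID.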
This very unsatisfying result (is it only me who feels so?) is due to the definition of a URI. A URI may contain only a limited subset of US-ASCII characters: letters, digits, and a few punctuation marks. Every other character must be %-escaped.
However, I don't want to escape my native-language characters if possible, because escaping diminishes their readability. I don't want a URI like "blog/%C4%B5%C8/". It's neither COOL nor HOT. It's just for a stupid computer, not for humans.
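To see how much readability is lost, the short Python sketch below %-escapes one path segment with the standard urllib.parse.quote function. The helper name percent_encode_segment is my own, and UTF-8 is only an assumed default here; a real page might use another encoding.

```python
from urllib.parse import quote

def percent_encode_segment(segment: str, encoding: str = "utf-8") -> str:
    """%-escape one path segment, leaving only unreserved ASCII untouched."""
    return quote(segment, safe="", encoding=encoding)

word = "standardmäßige"
escaped = percent_encode_segment(word)
print(escaped)                   # standardm%C3%A4%C3%9Fige
print(len(word), len(escaped))   # 14 24
```

Each of the two non-ASCII letters turns into six characters of escape codes, and a word written entirely in a non-Latin script becomes escape codes from start to end.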
Then what on earth can I do?
How about an IRI (Internationalized Resource Identifier)?
What is an "IRI"?
Uniform Resource Identifiers (URIs) are a core component of the Web. Internationalized Resource Identifiers (IRIs) are equivalent to URIs except that they remove the limitation that only a subset of us-ascii can be used.
The simple quotation above is from the abstract of Internationalized Resource Identifiers: From Specification to Testing by Martin J. Dürst (http://www.w3.org/People/Dürst). Tables 1 and 2 in that paper were helpful to me.
I can't explain IRIs very well in English, so here are some useful materials. (Of course, you don't have to understand IRIs fully to use this plugin. Just skim them.)
- An Introduction to Multilingual Web Addresses - easy and good
- IRI Test Overview
- Recent Progress on IRIs - easy and small slides
Cool IRI plugin for Movable Type
Yes, this plugin lets you create permalinks as Cool IRIs, whatever language you use. Concretely, it enables you to use your native language in the "category name" and the "entry title (= basename)".
Let me show you examples.
Chinese :
The entry's category is "表、明 自,@,本" (Simplified Chinese) and the entry title is "启动并进,行焚烧、深埋等无害化处理".
In default MT, the permalink might be "http://www.example.com/blog/cat_1/post_1/".
With this plugin, it can be "http://www.example.com/blog/表_明_自_本/启动并进行焚烧_深埋等无害化处理/".
German :
The entry's category is "Vorschläge" and the title is "Alles über Google", so the permalink might be "http://www.example.com/blog/vorschlaege/alles_ueber_google/" in default MT.
With this plugin, the permalink can be "http://www.example.com/blog/vorschläge/alles_über_google/".
Japanese :
The entry's category is "スポーツ" and the title is "男子1万Mで中村が優勝 東アジア大会", so the permalink might be "http://www.example.com/blog/cat_1/1m/" in default MT.
With this plugin, it can be "http://www.example.com/blog/スポーツ/男子1万mで中村が優勝_東アジア大会/".
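The idea behind these examples can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not the plugin's real code: the function name native_basename is hypothetical, and the rule (lowercase, then turn every run of punctuation or spaces into one underscore) is a simplification that does not reproduce every detail of the examples above.

```python
import re

def native_basename(title: str) -> str:
    """Keep letters and digits of any script (an illustration, not the plugin's code)."""
    # In Python 3, \w matches Unicode letters and digits, so CJK text survives.
    # Every run of other characters (punctuation, spaces) becomes one underscore.
    text = re.sub(r"[^\w]+", "_", title.lower())
    return text.strip("_")

print(native_basename("Alles über Google"))   # alles_über_google
print(native_basename("表、明 自,@,本"))        # 表_明_自_本
```

Scripts that rely on combining marks would need more care, because such marks are not matched by \w and would be replaced by underscores in this simple version.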
Korean :
The image below was taken from a Google results page. I entered the search keywords "different", "encoding", and "trackback" in Korean to find an old MT entry of mine, which was published with an old version of this plugin. Thanks to the plugin, the Korean word for "trackback" was embedded in my permalink in Korean, and Google catches it well.
By accident, that page is ranked at the top. (Wow; as far as I know, my old blog is very unpopular.) Did the Cool IRI affect the Google ranking? Maybe NOT. I've heard that Google has officially said the form of a URL is not related to ranking. In conclusion, Google may not use the URL form in calculating ranks, but from some search results we can say that Google may consider the URL form in indexing(?). How? I don't know.

Fig.1 Google search results for Korean keywords (the marked Korean keyword means "trackback").
IRI Environment
URI environment:
A URI limits the characters in an address to a subset of US-ASCII, whose bytes are the same in all ASCII-compatible encodings. Therefore a URI doesn't have to consider the encoding of the address itself, and neither does the web server. (Note that the encoding of an HTML PAGE and the encoding of the ADDRESS of that page, the URI itself, are different notions. You can define the encoding of a page by setting an HTTP header or an HTML META tag, but you can't do that for the encoding of a URI.)

Fig.2 URI environment
Non-IRI environment:
In a non-IRI environment (a URI whose address contains non-US-ASCII characters), the non-US-ASCII characters of an address are encoded with the encoding of the page that contains the link. This means that the byte stream of a link to März in a UTF-8 encoded page differs from that of a link to März in an ISO-8859-1 encoded page. And if the file system of your web server uses the Macintosh encoding, how will the web server find the right resource? Perhaps it will show you a "404 Not Found" error page.

Fig.3 Non-IRI environment
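The mismatch in the non-IRI environment is easy to reproduce. The Python sketch below %-escapes the same word, März, under three encodings; the helper name escaped_forms is my own, and "mac_roman" is Python's codec name that I am assuming stands in for the Macintosh encoding mentioned above.

```python
from urllib.parse import quote

def escaped_forms(text: str,
                  encodings=("utf-8", "iso-8859-1", "mac_roman")) -> dict:
    """Map each encoding name to the %-escaped bytes it produces for text."""
    return {enc: quote(text, safe="", encoding=enc) for enc in encodings}

for encoding, escaped in escaped_forms("März").items():
    print(f"{encoding:11} {escaped}")
```

The same four letters come out as M%C3%A4rz in UTF-8, M%E4rz in ISO-8859-1, and M%8Arz in Mac Roman, so a server that expects one of them will not recognize the other two.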
IRI environment:
In an IRI environment, the browser should send an address encoded with UTF-8, regardless of the encoding of the current page. Many browsers support this feature by default (browser-side IRI implementations).
A web server that implements IRI (server-side IRI implementations) interprets an address as UTF-8. A more intelligent server, on finding that an incoming address is not UTF-8 encoded, guesses the encoding and converts the address into UTF-8 or the server's local encoding to find the requested file.

Fig.4 IRI environment
Practical IRI Environment
As mentioned above, most up-to-date browsers currently support IRI.
In Opera, you can use IRI by checking Tools -> Preferences -> Advanced -> Network -> "Encode international Web addresses with UTF-8".
In Mozilla/Firefox, enter "about:config" in the address bar to show all configurable fields. You can turn on IRI by setting network.standard-url.encode-utf8 to "true".
In MS Internet Explorer, you can use IRI by checking Tools -> Internet Options -> Advanced -> Browsing -> "Always send URLs as UTF-8(requires restart)".
BUT you can't be sure that visitors to your site have turned on the IRI feature in their browsers, i.e. you can't assume that addresses arriving at your web server are always UTF-8 encoded. Besides, your web server might not support IRI out of the box.
As a solution to these practical problems, this plugin provides a CGI script for the Apache web server. In Apache, users can set their own configuration in an .htaccess file, so when a 404 (file not found) error occurs, you can redirect the request to the provided CGI. The CGI guesses the encoding of the requested address, converts it to your blog's PublishCharset, and finally redirects to the converted address. With this CGI, you don't have to worry about the IRI status of users' browsers or of your web server.
Fig.5 Overall work-flow of IRI
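The guess-and-convert step of this work-flow can be sketched roughly as follows. This Python code is not the plugin's CGI; it is a simplified illustration under my own assumptions. The function name reencode_path, the candidate list, and the target parameter (standing in for the blog's PublishCharset) are all hypothetical, and the "guess" is simply the first encoding that decodes without error.

```python
from urllib.parse import quote, unquote_to_bytes

def reencode_path(requested_path: str,
                  candidates=("euc-kr", "iso-8859-1"),
                  target="utf-8") -> str:
    """Guess the encoding of a %-escaped path and re-escape it in target."""
    raw = unquote_to_bytes(requested_path)
    # Try UTF-8 first, then each candidate; the first clean decode wins.
    for encoding in ("utf-8", *candidates):
        try:
            text = raw.decode(encoding)
            break
        except UnicodeDecodeError:
            continue
    else:
        raise ValueError("could not guess the encoding of the requested path")
    # This would become the Location of the redirect.
    return quote(text, safe="/", encoding=target)

print(reencode_path("/blog/M%E4rz/", candidates=("iso-8859-1",)))   # /blog/M%C3%A4rz/
```

The order of the candidates matters in this sketch: ISO-8859-1 can decode any byte sequence, so it has to come last, and a guess made this way can still be wrong for short addresses.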

Wiki and Blog with Cool IRI
As before, most BBSes and portal sites will keep using DB-based URIs like "http://www.example.com/view.php?id=1234&start=1&end=10", which are neither Cool nor IRIs. Actually, not every Internet address has to adopt Cool IRI.
I think the most outstanding applications of Cool IRI are likely to be WIKIs and BLOGs.
The address of the "Blog" page on en.wikipedia.org is "http://en.wikipedia.org/wiki/Blog". Yes, it's very natural, and English users can easily guess a page's wiki address. That is exactly how a wiki is supposed to work.
But this intended behavior does not work well in languages other than English.
In the Japanese Wikipedia, the "ビデオ" (it means 'video') page has the address "http://ja.wikipedia.org/wiki/%E3%83%93%E3%83%87%E3%82%AA".
The address of the "Gegenöffentlichkeit" page on de.wikipedia.org is "http://de.wikipedia.org/wiki/Gegen%C3%B6ffentlichkeit".
I think "http://ja.wikipedia.org/wiki/ビデオ" and "http://de.wikipedia.org/wiki/Gegenöffentlichkeit" are more natural to Japanese and German users.
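The escaped and the natural forms of these Wikipedia addresses carry the same bytes, which a few lines of Python can show. The helper name to_iri is my own, and this is a display-only sketch: it assumes the escapes are UTF-8, and it also decodes escaped reserved characters such as %2F, which a careful URI-to-IRI conversion would leave alone.

```python
from urllib.parse import unquote

def to_iri(uri: str) -> str:
    """Show a %-escaped URI in readable form, assuming the escapes are UTF-8."""
    return unquote(uri, encoding="utf-8", errors="strict")

print(to_iri("http://ja.wikipedia.org/wiki/%E3%83%93%E3%83%87%E3%82%AA"))
print(to_iri("http://de.wikipedia.org/wiki/Gegen%C3%B6ffentlichkeit"))
```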
This is not a technically difficult problem, only a matter of adoption. Some wiki tools may already support this, and in the near future all wikis will.
Blogs are in the same situation as wikis. With Movable Type's easy archive mapping and my Cool IRI plugin, you can enjoy Cool IRIs.
Vague worry about IRI
Question:
I'm Korean.
Of course, I write my posts mostly in Korean language.
So it is very likely that Korean characters are in my Cool IRI permalinks.
I don't know exactly why, but it seeeeeeeems that non-Korean users will have some difficulty accessing my pages through Korean IRIs.
So I'm hesitating to adopt IRI for my blog. Am I right?
Answer:
Why do you write your posts in Korean?
Just because you are Korean and Korean is the only language you can write in?
Absolutely not!
Whether you realized it or not, the real reason you wrote your posts in Korean is that you wanted to share their content with people who can understand Korean.
As long as you write in Korean, it doesn't matter to non-Korean users whether the address is a Korean IRI or a US-ASCII URI.
How many times have you visited native-language pages from Germany, France, Colombia, España, Україна, Việt Nam, Grønlands, नेपाल, Кыргызстан, ประเทศไทย, and so on, until now? And perhaps they have all used US-ASCII URIs only.
For example, the Korean Wikipedia is for Koreans and the Japanese Wikipedia is for Japanese people, so the Korean Wikipedia should consider Korean people's accessibility first and foremost.
You should remember that an IRI is mainly for the people of your own country. If you really want to share your thoughts and ideas with more people around the world, you will probably have to write your posts in English or the like, and then the address of the page will automatically be a URI. I think Korean content with a US-ASCII URI is not for foreigners either. If you write in Korean, concentrate your energy on Korean readers.
Anyway, technically speaking, a link with foreign characters doesn't hinder anyone from accessing the resource. Even if the client's PC has no fonts for your language, the IRI link will work well, although some characters are displayed as squares or the like.
Fig.6 Although no such fonts exist on the client's PC, the link itself works well.
