[Photo: "Keyboard Cat," by Fluffy Avenger.]
On Friday, I joined Rachel Maddow for a segment on the Rachel Maddow Show about news that ICANN will soon begin supporting truly internationalized domain name extensions -- in other words, dot-com, dot-(country name), and the like, typed out in non-Latin character sets. Chinese, Hebrew, you name it.
A number of Boing Boing readers commented on that video clip, and had questions. What about scripts that go right to left, like Arabic or Hebrew? Will we all have to buy new keyboards (and keyboard cats)? Is this the end of the internet as a unifying force, and the beginning of greater cultural rifts? WHAT ABOUT THE CATS AND THEIR CATSPEAK WHICH MUST BE TYPED? The Internet is serious business.
Paul Hoffman was one of the authors of the original standards that led to this news. I asked him to address our commenters' questions, and to go into a little more of the geeky techno-historical detail that wouldn't really work in a five-minute television news segment. Paul kindly obliged -- his thoughts follow, and he answers your questions after the jump:
Ever since the DNS was created, many people have felt that requiring all names to use only a limited set of characters was not going to work for the whole world. The Internet was never just a US invention; there were plenty of Europeans involved soon after things got going. Names are *very* important to people, and the old DNS rules basically force many people to misspell their names, their companies' names, and even worse, their countries' names.
Starting almost exactly ten years ago, there was a groundswell of interest in the IETF to fix this. The IETF is the standards organization that makes the technical rules for how the Internet works. Unlike at most standards organizations, anyone can contribute to the IETF, and all of its standards have always been open and freely available. This made fixing this part of the DNS easier, because people from all over the world could help without having to pay anything, or even formally "join" the IETF.
The work from 2000 to 2003 was fairly intense. People unfamiliar with how the DNS works, but who wanted it to work for their languages, had to learn the technology so they could weigh the various proposals for fixing it. People who knew the DNS technology intimately had to learn to let go of their "it works just fine" mentality. Everyone had to get over "the keyboard issue", and to remember that most people go to web sites and send email by clicking, not by typing.
In 2003, the IETF published the standard, called IDNA (Internationalized Domain Names in Applications). The "in applications" part is important: we didn't change the DNS technology at all; we only changed how applications like web browsers and email clients convert and display domain names. Many application vendors, such as Microsoft and Mozilla, embraced it immediately, as did the domain registries. Thus, you have been able to go to éxample.com for many years.
FURTHER READING: If you want to learn more about the IETF, a good intro is "The Tao of the IETF." The actual standards for IDNA are RFC 3454, RFC 3490, RFC 3491, and RFC 3492.
The IDNA standard we created over five years ago applies to *all* parts of the DNS, including the top-level domains (TLDs). However, ICANN was hesitant to open up the TLDs to IDNs because of many political, non-technical concerns. The news this week is not that the technology was just developed, but that ICANN has finally opened up the root zone to useful IDNs. The technology has been available for years; what has finally achieved critical mass within ICANN is the desire to make it available.
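To make the "in applications" point concrete, here is a minimal Python sketch of the conversion an IDNA-aware application performs before it ever talks to the DNS. It assumes Python's built-in `idna` codec, which implements the 2003 ToASCII/ToUnicode operations (RFC 3490, with Nameprep and Punycode), not the later revision of IDNA; the helper names `to_ascii` and `to_unicode` are illustrative, not part of any standard API.

```python
# Minimal sketch of IDNA 2003 "in applications" processing, using Python's
# built-in "idna" codec (RFC 3490 ToASCII/ToUnicode with Nameprep + Punycode).
# The helper names are illustrative, not part of any standard API.


def to_ascii(name: str) -> str:
    """Unicode domain name -> the ASCII-compatible form sent to the DNS."""
    return name.encode("idna").decode("ascii")


def to_unicode(ace_name: str) -> str:
    """ASCII-compatible ("xn--") form -> Unicode, for display to the user."""
    return ace_name.encode("ascii").decode("idna")


if __name__ == "__main__":
    for name in ["example.com", "éxample.com", "ホームページ.日本"]:
        ace = to_ascii(name)
        # The DNS only ever sees `ace`; the user only ever sees `name`.
        print(f"{name} -> {ace} -> {to_unicode(ace)}")
```

Plain ASCII names pass through unchanged, while each label containing non-ASCII characters comes back as `xn--` followed by its Punycode encoding, and decoding that form returns the original name. The DNS servers themselves never need to know about Unicode.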
Also note that ICANN's announcement *only* covers IDNs in country names, not in new TLDs. The latter will (or won't) happen a few years from now, and the political discussions there will be even more difficult than they were for the country names. For example, it makes a great deal of sense to let China have an equivalent of .cn in its native script. VeriSign owns .com, which most people think means "commercial"; does it make sense to let VeriSign have the equivalent of the word "commercial" in every native script in the world? If not, in any?
All of that is politics and business, not technology. What the IETF did in 2003 was make it possible, and then let everyone else decide how to use it. The IETF is just about finished with a revision to IDNA that will change things a bit, but again, only on the technology level. Later on, you will see IETF standards that internationalize email addresses too (the stuff to the left of the "@"). It's a long, slow process, particularly because it is done by volunteers.
YOUR QUESTIONS ANSWERED.
"What if a human rights group in Canada wants to register a domain name in Chinese or Arabic, in the native-alphabet country extensions for China or Saudi Arabia," she said, "Can the countries involved deny that request? Those are the sort of challenges to free speech that lie ahead."
That is incorrect: they have been able to do it since 2003. In fact, that was one of my driving motivations: I should be able to have a domain name that speaks to the intended recipient. All that happened this week is that now the whole domain name, including the TLD, *might* be able to be in the native script. I emphasize *might* because many countries don't have open registration policies for names under their country-code domains.
Maybe it's a bad day for web browser developers. Any code that only needed to work with ASCII characters now needs to work with Unicode characters. Or maybe it's a good day for web browser developers, because they have more work to do.
That is incorrect: the browser makers added this years ago.
This is a good day for phishers. How long until we see "ebay.còm" and other addresses that look like one domain but are another?
This is partially incorrect: ICANN is responsible for preventing look-alike TLDs, and has promised that they will be vigilant. In fact, that is part of the reason that we have had to wait as long as we did. The phishing topic has been discussed for over a decade, and the number of people who would mistype a domain name is dwarfed by the number of people who will just click on anything. Avram got it right in #8.
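Part of why the "ebay.còm" worry is manageable at the protocol level is that an IDN never reaches the DNS as Unicode: it is converted to an ASCII form first, and that form looks nothing like the real name. The Python sketch below (again assuming the built-in IDNA 2003 codec) shows this, and adds a hypothetical display policy, `display_form`, loosely in the spirit of browsers that show the `xn--` form for names they don't trust. It is an illustration only, not any browser's or registry's actual rule.

```python
# Sketch: look-alike names differ completely once converted to ASCII.
# display_form() is a hypothetical policy for illustration only.


def ascii_form(name: str) -> str:
    """IDNA 2003 ToASCII via Python's built-in "idna" codec."""
    return name.encode("idna").decode("ascii")


def non_ascii_labels(name: str) -> list[str]:
    """Labels (split on ASCII '.') that contain any non-ASCII character."""
    return [label for label in name.split(".") if not label.isascii()]


def display_form(name: str, trusted: set[str]) -> str:
    """Show Unicode only for ASCII or explicitly trusted names; otherwise
    show the ASCII form so a look-alike is visibly different."""
    if name.isascii() or name in trusted:
        return name
    return ascii_form(name)


if __name__ == "__main__":
    for name in ["ebay.com", "ebay.còm"]:
        print(name, "->", ascii_form(name), non_ascii_labels(name),
              display_form(name, trusted=set()))
```

The spoofed name's last label becomes an `xn--` string, so software that compares or displays the ASCII form can tell the two names apart even when a human cannot.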
How does this interact with right-to-left scripts?
Great question, and they won't like the answer: not so well. Right-to-left scripts (also called "BiDi scripts" because they use bidirectional characters) have lots of very difficult-to-handle side effects. IDNA deals with them by restricting the labels that have right-to-left characters, and the update coming next year loosens that restriction a bit. This will take over a decade to fully sort out, I'm afraid, but our experience since 2003 is that we did pretty well on the first round.
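The restriction Paul describes is the bidirectional rule from RFC 3454 (section 6), which IDNA 2003's Nameprep applies to every label. Here is a minimal Python sketch of its letter-mixing checks (requirements 2 and 3 of that section; the prohibited-character check is left out) using the standard-library `stringprep` module, whose `in_table_d1` and `in_table_d2` functions correspond to the RFC's RandALCat and LCat tables. `bidi_ok` is a hypothetical name, and the sketch covers only the 2003 rule, not the loosened one in the IDNA update.

```python
import stringprep


def bidi_ok(label: str) -> bool:
    """Sketch of the RFC 3454 section 6 letter checks on one label (IDNA 2003)."""
    # RandALCat: right-to-left characters (Hebrew, Arabic, ...), table D.1.
    rtl = [stringprep.in_table_d1(ch) for ch in label]
    if not any(rtl):
        return True  # no right-to-left characters: the rule does not apply
    # LCat: left-to-right characters such as Latin letters, table D.2.
    if any(stringprep.in_table_d2(ch) for ch in label):
        return False  # right-to-left and left-to-right letters may not mix
    # The label must both start and end with a right-to-left character.
    return rtl[0] and rtl[-1]


if __name__ == "__main__":
    for label in ["example", "שלום", "שלוםabc", "שלום1"]:
        print(label, bidi_ok(label))
```

A pure Hebrew label passes, but mixing in Latin letters fails, and so does ending a Hebrew label with a digit, because digits are neither right-to-left nor left-to-right letters. That trailing-digit restriction is one of the things the IDNA revision relaxes.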
The homogenization and standardization of English is one of the main reasons for the explosive growth and globalism of the net. This will only serve to fractionate things, and in the end it will hurt growth and usefulness. Trendy dialects belong in the history books; English, my friend, is the language of the future.
Incorrect: we have seen no significant fractionalization since 2003. The content of web pages has always been able to be non-English, and that is much more important than the domain names that lead to the pages. Internationalizing domain names just makes the access to that content easier for those who type names.
The biggest myth is that this is the first time that you could have internationalized domain names. It will be (once they approve some in a few months) the first time that you can have fully internationalized domain names. People have been using internationalized domain names in a variety of scripts for years now.
It seems like the second myth is that people who can't type the names won't be able to reach the web sites. This myth is hard to kill, even though doing so only takes one word: click.
--Paul
VIDEO: Xeni on Rachel Maddow Show: World Wide Web grows wider, more worldly

The web is not the internet. I do not send emails by clicking. I do not add people to my address book by clicking. However, I guess it'll open up a new world of contact exchange apps.
Personally I can't wait for all the phishing fun to begin.
A common, default language is a beneficial thing. Latin-character-only domain names were, and could've continued to be, an important part of encouraging that common, default language.
Not that it's all over by any means, but still: bad move, Internets overseers, bad move.
FTFA: "ICANN is responsible for preventing look-alike TLDs, and has promised that they will be vigilant."
Oh yeah, that's going to work out swell.
Allowing internationalized domains in the top level has almost nothing to do with typing your email. This is no different than you entering comments in a "leave a comment" box on a web page that is in a language you cannot type.
If someone you want to correspond with has an email address you cannot type on your keyboard, and you are not replying to an email they sent you or a "mailto:" URL on a web page, you may not be able to actually communicate with them anyhow.
>Oh yeah, that's going to work out swell.
Note that I said "is responsible for preventing", not "will prevent". Having said that, *every* DNS registry is responsible for preventing look-alike names, fraudulent registration, and so on. ICANN has it easy: they will only have to deal with applications from countries and organizations with which they have lots of financial and bureaucratic interaction. You should be much more worried about look-alike names and other malfeasance under the TLDs, not the root.
That's pretty cool that you actually contacted an expert to answer commenters' questions. It was definitely a pleasant surprise, and it taught me more about the issue.
If you DO need a new keyboard cat, mine is enthusiastic and available for limited engagements.
http://www.flickr.com/photos/digitalartform/4066181663/
ICAANNN, ICANT, whatever. As long as Xeni and Rachel are on the teevee screen together, I shall remain blissed out to the end of eternity.
I'm sure this will be a help for many, and a hazard for a few. Whether the needs of the many etc. etc...
You know that you, yes, you can type all these crazy foreign characters on your existing keyboard, right? I have a typical American keyboard, just like yours. And if I need to I can type ホームページ.日本 into my address bar.
"No I can't" you say. Well, yes you can. You've got the tools, but you just don't know what to do with them.
Which is exactly the problem non-English (non-Roman-alphabet) speakers have faced since the beginning of the 'net. This is just leveling the playing field. Bitching about it makes you look remarkably small and culturally ignorant.
The latest version of MacOS X lets you write out kanji on the trackpad with your finger. Maybe other languages' characters as well. (I'm still using Leopard, so I don't have this feature handy to try out.) So even if your keyboard doesn't support a language, you might still be able to write in it.
Look, I've seen enough science fiction to know that English is spoken on pretty much all planets in the known universe. Chinese, only in the 34 Tauri (2020) system, and other languages, pretty much not.
So it's clear that allowing domain names in non-Latin characters will greatly inhibit our ability to share technology with galactic alien species.
>A common, default language is a beneficial thing. Latin-character-only domain names were, and could've continued to be, an important part of encouraging that common, default language.
That's easy to say when it's your language that's the "default".
"That's easy to say when it's your language that's the "default"."
It's the default because it's the language of the people who actually invented the thing.
Encouraging a standard language (or at least letters) is not a horrible thing.
gee, now I want a rongo-rongo IP address.
By that logic, shouldn't you need to learn hieroglyphics to write on paper?
I, for one, welcome our new ホームページ.日本.
Seriously though, won't this just open up a new market for applications that allow user-friendly typing of foreign characters?
That is how it happened, pretty much. Look at the development of the alphabet.
Actually paper comes later, for that we would need to write Aramaic, or Greek.
The point is, not everybody on the internet speaks English or has the wherewithal to learn it. Besides, we Americans could benefit from learning a couple of other scripts; our multilingual deficiencies are pretty appalling.
English is not the world's default language, it's just the closest thing the world has to a default language. There's still only a moderate fraction of the world's people who can speak or write it.
Keeping domains in English might make the technical side more elegant, but encouraging participation in the internet by the overwhelming majority of the world, which currently doesn't participate, has got to be a net benefit for the internet and the world.
My Fyunch(click)assures me that pretty much every other alien civilization can work around our pathetic human language.
Actually I'm pretty sure I can communicate with a very large percentage of Europe that speaks English as either a primary or secondary language, but where the local language contains any number of accents, umlauts, graves and other special characters.
Right now I do not actually need to go into the 100,000+ Unicode characters and search out the one that matches some particular language. Never mind the letters that look nearly identical to their Latin counterparts, which you'll probably never be able to tell apart visually unless you speak that language.
Basically this change will create local internets and there's nothing wrong with that, but let's not pretend there aren't downsides to doing this - especially when it comes to things like phishing and simple things like handing out business cards.
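The near-identical letters mentioned above are the classic homograph problem: a Cyrillic "а" (U+0430) looks exactly like a Latin "a". One common defence is to flag labels that mix scripts. The sketch below approximates each letter's script by the first word of its Unicode character name (e.g. `LATIN`, `CYRILLIC`); that is a rough heuristic chosen for brevity, not the real Unicode Script property, and `scripts_in` and `looks_mixed` are hypothetical names.

```python
import unicodedata


def scripts_in(label: str) -> set[str]:
    """Approximate the scripts used in a label from Unicode character names.

    Heuristic only: takes the first word of each letter's name, so it is not
    the Unicode Script property and will misjudge some characters.
    """
    return {unicodedata.name(ch, "UNKNOWN").split()[0]
            for ch in label if ch.isalpha()}


def looks_mixed(label: str) -> bool:
    """Flag labels whose letters come from more than one (approximate) script."""
    return len(scripts_in(label)) > 1


if __name__ == "__main__":
    latin = "ebay"
    spoof = "eb\u0430y"  # third letter is CYRILLIC SMALL LETTER A
    print(latin == spoof)  # False: the strings differ even though they look alike
    print(scripts_in(latin), scripts_in(spoof))
    print(looks_mixed(latin), looks_mixed(spoof))
```

An all-Cyrillic label such as "бойнгбойнг" is not flagged, since it uses a single script. The heuristic also flags some legitimate labels (Japanese routinely mixes kana and kanji), which is why practical checks allow certain script combinations rather than banning all mixing.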
I visit a lot of Korean and Japanese sites, mostly to properly tag my music in the respective language. This could make things difficult. I copy and paste because I can't type in those languages in the first place; if a site's address is in non-Latin characters, it would be difficult to get to.
But yeah, seriously, it's a whole new world for phishing attacks. And yeah, DNS is a lot like phone numbers in a way (yes, I know IPs are actually a lot more like phone numbers, but work with me here), and just like phone numbers, domain names need to work internationally. But almost every phone in the world has Latin numbers. That's because both phone numbers and DNS names are control characters, and so to work most efficiently everyone using the system needs to be able to enter the characters. Now, that doesn't mean all webpages and email need to be in English, just as not every phone conversation needs to be in English. Imagine the difficulty if phones in different countries had different characters for numbers. There wouldn't be much international calling. And numbers in one character set are actually comparatively easy to translate.
scifijazznik, I find that ease of communication in multiple languages helps facilitate rishathra negotiations.
Three words: two email addresses. We old farts remember when the same issue came up for email in the multiple walled gardens: did you put your CompuServe *and* your AOL *and* your newfangled Internet address on your card? When did you stop using one or the other?
No one is forcing anyone to use the new internationalized top level domains, just as they are not being forced to use the internationalized second level domains that have been available for years. Some companies are using the new IDNs, many aren't. When there are fully-internationalized domain names, there might be a big increase in the usage, there might not. (Well, we know that there will be in China because the government has been pushing IDNs for years, but guessing about the rest is just guessing.) There will be transitional pain, but it will come from the users trying new things in their native scripts, not from the standards bodies and ICANN stopping them from trying.
Why should I care? Don't speak or read Chinese, not going to bother translating a random site on speculation it might be of interest. Once it becomes evident that there isn't any money in having an 愚蠢.愚蠢 address this will all be forgotten. As provincial as it might seem, Roman characters as used in English thrive because they are easy to recognize and differentiate. With the exception of Q and O, it's pretty damn hard to mix them up even if you don't know any English at all. I don't think that there is a real legitimate need to differentiate ebay.còm from ebay.com
I would gladly exchange an all Xeni/Rachel channel for a rishathra-free six weeks. But I don't think that's how it works...
Seems like we're getting closer to the mashed up language/tech of Firefly.
Of course, now we have to buy Bøingbøing.net. And Bôing, and Bóing ....
>Once it becomes evident that there isn't any money in having an 愚蠢.愚蠢 address this will all be forgotten.
You do know that a quarter of the world's population uses those characters?
I think allowing non-Latin-character languages to be part of the fundamentals of the internet is a mistake, just as it would be a mistake in commercial aviation for pilots not to all speak aviation English. Predictable internet and plane crashes will occur.
Uh, is it me, or is everyone missing something obvious? Names are just entries in DNS that resolve to an IP address. Two names can resolve to the same IP address. So if you are an American speaking English in America with a US keyboard and absolutely refuse to figure out how and which characters to type in another language, I'm sure you'll be able to continue visiting that questionable web site in Sweden with the new TLD (whatever it will be) and the current ".se".
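The commenter's point, that a name is just a DNS entry pointing at an address and that several names can point at the same one, can be shown with a toy in-memory lookup. Everything here is hypothetical: the names use the reserved `.example` TLD, the address comes from the 192.0.2.0/24 documentation range, and `ZONE` and `resolve` stand in for a real resolver. As with IDNA, the Unicode name is converted to its ASCII form before the lookup.

```python
from typing import Optional


def normalize(name: str) -> str:
    """Convert to the IDNA 2003 ASCII form and lowercase it.

    DNS matching is case-insensitive for ASCII, and Python's "idna" codec
    leaves all-ASCII labels as-is, so we lowercase explicitly.
    """
    return name.encode("idna").decode("ascii").lower()


# Hypothetical zone: two different names for the same (documentation) address.
ZONE = {
    normalize("bücher.example"): "192.0.2.10",
    normalize("buecher.example"): "192.0.2.10",
}


def resolve(name: str) -> Optional[str]:
    """Look a name up in the toy zone; returns None if it is not there."""
    return ZONE.get(normalize(name))


if __name__ == "__main__":
    for name in ["bücher.example", "BUECHER.example", "other.example"]:
        print(name, "->", resolve(name))
```

Someone who can't type "ü" can still reach the same address through the all-ASCII name, which is exactly the fallback the commenter describes.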
What's with all the whining? This won't affect most of you at any level at all. On the other hand, it will affect millions, if not billions, of people in a small but positive way. It will not lead to "predictable internet crashes" and it will not lead to "local internets" (whatever that is). We non-English speakers already have our own websites, TLDs, and domains with our fancy characters, and we even communicate in our own languages with our fancy characters. If you don't know our languages, you will not even notice this.
No, you could already get those addresses. Now you could get, for instance, бойнгбойнг.ро! (In theory, at least.)
>This is partially incorrect: ICANN is responsible for preventing look-alike TLDs, and has promised that they will be vigilant.
ICANN's record to date in this area is 100% failure. That's why there are hundreds of thousands of such domains, and more every day, much to the delight of phishers, SEO scammers (all SEO operations are scams), spammers -- and registrars, who of course make money every time one is registered.
There's no reason to expect this to change: there's too much money to be made.
The Titanic had to sink before radio operators truly began to appreciate the need to accommodate different codes and methods of transmission internationally. I'm just glad no one had to die before ICANN figured out that the internet should fully incorporate multiple languages.
The internet is supposed to be worldwide but the dependence on English actually serves to exclude a huge portion of the world population. So, lighten up, folks. Let everyone else have a chance to join in. Adding some extra site addresses in non-Latin characters won't kill you. And giving them a shot online won't prevent you from watching hairless cats wrestling (or whatever) on YouTube.
A lot of people cope with English-only internet by simply avoiding the web altogether. So you won't have instant and immediate access to sites in non-English characters--Guess what? Right now you don't have access to those people's ideas, photos (and hairless cat videos, as the case may be) AT ALL.
Besides, ever heard of free online keyboards, anyone? It's not rocket science. And you can still always cut and paste the new addresses even if your keyboard doesn't support that script. Just ask all the people out there now dealing with English web addresses when they don't actually speak a language that uses any form of the Latin alphabet. If they can do it, then so can we. Sheesh.
I find the English-only whining about this to be appalling. There are millions and billions of web pages not in roman characters, because the people writing and reading them speak their own, perfectly good, languages! It's all going to be in Unicode, which just works. The criticisms are xenophobic and incredibly insecure.
I've been trying since 1998 to get search engines to recognize diacritical characters and non-Roman scripts, because the point is to be useful to everyone, not just the favored few.
Internationalised domains are the next step toward an internet that is truly global, rather than English-centric. Imagine if the DNS servers were written in Russian or Arabic and not English. The real reason for the English-centricity is the programming languages used to decode the names into numbers. It all started in the 1950s with Algol, when Russians, French, Germans, Americans and a couple of other guys decided that programming would be done in English. Back then we only had 127 characters to choose from.
This is bigger than phishing; it is bigger than simple mashups, a couple of mp3 downloads or even two email addresses. This is going to enable people from every nation on earth to think, type and read in their own language. This is bigger than English speakers being able to visit other countries' domains while forcing Japanese/Chinese/Arabic/Indonesian/Amharic/Jewish/etc. people to learn English to do the same thing.
Bring this on and support it with all your might.
There are more technical and social problems to overcome as we move into the next generation of the internet. More brilliant minds from all around the world can now contribute, rather than being stuck in the dark ages of learning English to communicate. It is a little bit like the Roman church conducting services in Latin when few people understood it.
... and the email spam using IDN domain urls is already starting. I'm sure you'll see your own copies soon enough.