Xeni on Rachel Maddow Show: World Wide Web grows wider, more worldly

Update here, with Q&A.

Rachel Maddow, host of all that is worth watching on television, very kindly invited me back to The Rachel Maddow Show tonight for a "Moment of Geek" on the big ICANN news today: starting soon, domain name extensions will be available in non-Latin character sets. Chinese, Greek, Arabic, or any one of the more than 20 official languages in India. In other words, the alphabet you're reading this blog post in will no longer be the default for web addresses.

You can watch the video here.

When Ms. Maddow's team invited me in earlier today, the first thing I did was phone Hong Kong-based journalist and global 'net culture researcher Rebecca MacKinnon (Twitter: @rmack), who was in Seoul attending the big ICANN meeting. She has written extensively on this topic, and helped me parse the news.

First up for the "non-Latin" extensions? Country-specific domain names (.cn for China, for instance). Later on, everything else (.com and the like). Don't expect to see "dot china" in Chinese characters right away, explained Rebecca: starting November 16, registrars can begin to apply, but it'll be a while before the domains show up in the wild.

Some US tech reporters covering the news ran with but what about meeee! headlines. "This is a bad day for the English language," wrote one. Well, someone call the whaambulance -- it's an awesome day if you read in Farsi or Hebrew. It's not about our language, it's about the languages spoken by the next billion people to come online, and most of them don't speak English or write in a language based on our Latin character set.

As MacKinnon reminded me from Seoul, today's announcement follows earlier news that ICANN, formed ten years ago under the auspices of the US Dept. of Commerce, will no longer answer directly to the US, but to a sort of congregation of world governments.

Many groups around the world from non-governmental organizations and civil society still have concerns about ensuring their voices are heard.

"What if a human rights group in Canada wants to register a domain name in Chinese or Arabic, in the native-alphabet country extensions for China or Saudi Arabia," she said, "Can the countries involved deny that request? Those are the sort of challenges to free speech that lie ahead."

41 Comments

I'm so glad this is becoming a regular thing. I have both pure and dirty reasons.

Why is it a bad day for the English language? Maybe it's a bad day for web browser developers. Whatever code just needed to work with ASCII characters now needs to work with Unicode characters. Or maybe it's a good day for web browser developers because they have more work to do.

Nice. My only nit to pick is pooh-poohing Rachel's question about how we gain access to these new domains, if for no other reason than the millions of Chinese, Korean, Indian, Japanese, Arab, etc. people living in the US who will be very interested in accessing this information.

Still, it's a Big Deal globally and it's good to see the coverage. The more you're on @maddow, the better.

You looked fab. Anyway this is a positive thing for the internet. So other than English, I'm very tower of bably illiterate in other languages, unless there are dirty words involved. I only actually know PUTA and SCATA in Spanish and Greek. I probably know more but can't think of them at the moment. The C word seems to have a lot in common across many languages, but hey if I say them here I'll get reamed in my mental anus for it. Who cares about how the word originated as a very nice one. I used that word on the Breaking Bad site. Started out bad, butt ended up informative. Some of Lady Chatterley's lover poetry. Cunninglis stuff. Anyway to get back to this topic. I liked how the typing keyboard was on there with all of the various characters. As a Moment of Geek, I do know that there were competing keyboards presented in typewriters when the typewriter was invented. It was a pretty big deal. Very much like if not identical to the MAC V PC controversy. The only reason we have this keyboard, as I recall, is that whoever invented it had more monetary backing. It's not the easiest. It's all got to do with patents and money. Check it out on Wikipedia. I'm wondering why there hasn't been a movie made about it. It would make a great movie.

That post was like a fun/scary ride at the state fair....

This is a good day for phishers. How long until we see "ebay.còm" and similar addresses that look like one domain but are another?

Hey Cog, it's Halloween week here in Milwaukee, a big lake type place where as it turns out the US typey writer thingy keyboard was ultimately patented. I also can recall that there was a movie made about it, but it's a '40s or '50s movie. Period piece. A comedy about the first female secretary? When reading Wikipedia, it gives no credit to women in any country as to its invention or plausible use. Dang, I know there's a movie out there! Gwen Verdon? Perhaps? OYE!

Rob, or how about "ebαy.com" or "ΕΒΑΥ.COM"? (That second one spells eBay with capital epsilon, beta, alpha, upsilon.)
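The lookalike trick described in these two comments can be made concrete. What follows is only a rough sketch of one possible check, not how any browser or registry actually does it: it assumes that the first word of a character's Unicode name is a good enough stand-in for its script, and the helper names `scripts_in` and `looks_mixed` are invented for illustration.

```python
import unicodedata


def scripts_in(label: str) -> set[str]:
    """Collect a rough script name for every letter in a domain label.

    Assumption: the first word of the Unicode character name
    (e.g. "LATIN", "GREEK") approximates the script.
    """
    scripts = set()
    for ch in label:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                scripts.add(name.split()[0])
    return scripts


def looks_mixed(label: str) -> bool:
    """Flag labels whose letters come from more than one script."""
    return len(scripts_in(label)) > 1


print(looks_mixed("ebay"))       # all Latin letters
print(looks_mixed("eb\u03b1y"))  # Latin letters plus a Greek alpha
```

A check like this only catches mixed-script labels. The accented "ebay.còm" is all Latin and the all-capitals Greek spelling is all Greek, so both would pass it; single-script lookalikes need a different kind of defense.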

A friend told me once that the 7-bit ASCII character set was designed to be able to support all of the major languages of the Western hemisphere -- English, Spanish, French, Portuguese -- provided that your output device was capable of overstrike. (You could use a comma for the little hook under the cedilla.)

Slam dunk explanation Xeni.
As for the question: "But-but-but-but! How will the English speaking person access those sites?" I have the idea that, if the site has only such a URL, it will be written using those characters anyway. So, anyone wanting to access it will have to be able to write in that language or to copy/paste or to click on a link or to use a bookmark or... If an English version exists, then there will be a double URL.
There are just too many ways and darn TV gives one so little time to explain.

That's pretty cool.

I wonder though - will people also be able to write HTML in other charsets, or will web development remain an English-based activity?

I'm with you on that one. Can't wait for the Internet to become segmented beyond repair.
Won't it be awesome when you'll have to learn Chinese to read a given software's source code? There is already a Russian Delphi clone, dubbed "1С". That's not a "C" in there, but the Cyrillic letter "es".

The only people it's a bad day for are domain name squatters. It is now going to get a lot more expensive to be the bridge trolls of the internet.

@ tweeders1 #4:

You're from craigslist, aren't you?

"This is a bad day for the English language" ... what a terrible article !

Yes, this will be used to some extent by domain squatters, but since when do we stop developing something because it could be used by "bad guys"?

Chrome, Firefox and the others are getting better and better at filtering phishing websites. And people are getting smarter.

How does this interact with right-to-left scripts?

HA...if you're for this, you're a racist!

If you're against it, you're a racist!

It is a win-win or a lose-lose situation.

This is like opening the gates of hell, some are jubilant, others scared to death.

I have to disagree - the homogenization and standardization of English is one of the main reasons for the explosive growth and globalism of the net. This will only serve to fractionate things and in the end will hurt growth and usefulness. Trendy dialects belong in the history books. English, my friend, is the language of the future.

That's more the type of examples I was thinking of, but I wasn't sure how to type them.

What year is it where you live? Apparently you missed the last decade at the least.

The richness and diversity of English is what makes it a language worth communicating in. Sure there is a place for a homogenized standard English, but it's not the internet.

US Department of Commerce. Not the US Chamber of Commerce.
http://www.ntia.doc.gov/ntiahome/domainname/domainhome.htm

What are the trendy dialects you are referring to, yrarbil cilbup? Hindi, with 490 million speakers? Arabic, with 255 million? Farsi, with 110 million? You have made one of the dumbest pro-English statements I have heard. Using Unicode characters for URLs might present modest challenges for some software developers but it isn't going to fractionate anything. It was simpler to use a smaller ASCII character set for web addresses but it's time to stop dumbing things down so Americans don't get confused.

Ugh thanks, fixing!

@djrempel, it's already possible to write HTML in Unicode characters. You just need a text editor with Unicode support.

Mitch, are you talking about the HTML tags themselves, or just the page text?

I need a babelfish

Thanks, Xeni, very clear and interesting explanations.

I read in Hebrew and let me tell you I dread the day that I'll have some of the domain names in Hebrew, or worse yet, mixed Latin and Hebrew.

For one, this will be a killer of any search capabilities.
It will also cripple accessibility if I'm ever on a computer that does not have a Hebrew keyboard on it.

Google, for instance, occasionally reverts to Hebrew on my computer just because I'm situated in Israel.
If I want to switch back to English, I have to access a Hebrew menu, where all the language options are also in Hebrew (ie. אנגלית, הונגרית, רומנית etc.)
Imagine if I were a non-Hebrew speaker who just landed in Israel. I'd not know which of these options actually mean "English".

This is not really a bad day for the English language (well, it might be, but that's incidental). It's a bad day for the cohesion and accessibility of the internet. This could well be the first step in compartmentalizing the internet.

You guys at Boing Boing talk about Net Neutrality and not having a two-tier system. This could well pave the way for such an outcome any way.
Having whole parts of the internet only available to some seems like a slippery slope.

This is a bit confusing. All this announcement means is that new TLDs can be created in non-Latin characters. Domain names under some existing TLDs can already be created in non-Latin scripts. I was unaware of the latter, but I just registered a non-Latin .com and it seems to work.

Oddly the .co.kr domain doesn't seem to accept non-Latin, so there's some progress that needs to be made on the existing TLDs.

Here's a summary:

김치.com -- already implemented, domain-squatted
김치.co.kr -- doesn't work, apparently no non-Latin support
김치.캄 -- possible future TLD per this announcement

I give it 3 weeks after this goes into effect before Elvish and Klingon are added.

Regardless of the language of the domain, won't they still have to prefix it with the very Latin-character-based "http://" if they're providing the protocol as well?

No, I meant the text within the page. It didn't occur to me that anyone would want to write the tags in another language. Thinking more on that, though, if URLs in non-Latin character sets are supported, then non-Latin character sets for the values of the attributes "href" and "src" inside the "img" and "a" tags should be supported.
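For the "href" and "src" question, one common approach is to keep the readable form in the page and convert it to an ASCII-only address before the request goes out. The sketch below is only an illustration under several assumptions: the address is absolute, the host is converted with Python's built-in "idna" codec, the rest is percent-encoded as UTF-8, and both the function name `iri_to_uri` and the example address are made up.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def iri_to_uri(iri: str) -> str:
    """Convert an address with non-ASCII characters to an ASCII-only one.

    Assumes an absolute address with a host part.
    """
    parts = urlsplit(iri)
    # Host labels get the "xn--" ASCII form.
    host = parts.hostname.encode("idna").decode("ascii")
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    # Everything after the host is percent-encoded as UTF-8 bytes.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    fragment = quote(parts.fragment, safe="%")
    return urlunsplit((parts.scheme, netloc, path, query, fragment))


# Hypothetical address, written with escapes so the source stays ASCII.
print(iri_to_uri("http://b\u00fccher.example/caf\u00e9.html"))
# -> http://xn--bcher-kva.example/caf%C3%A9.html
```

The page author can still type the native-script address into the attribute; the conversion is something software does on the way to the network.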

I give it 3 weeks after this goes into effect before Elvish and Klingon are added.

I'm already disappointed that BB comments don't accept tengwar.

won't they still have to prefix it with the very Latin-character-based "http://"

Why isn't there a key for that character string?

You might think the barriers between cultures are high enough already. This means that EVERYONE is going to need translators, not just armies of occupation and cops in Texas. Say......! Do you suppose this is The Next Big Thing? Of course, there's still the Jesuit Problem to work out -- if you're sympathetic enough to another culture to learn the language, it means you can no longer be trusted. My guess is, next year Microsoft buys Babelfish!

Do you know that there were web sites in languages other than English before? Having domain names in non-Latin character sets is going to make a small difference.

I learned to read Hindi before I learned to type it. Now if there had been domain names in Hindi I might have learned to type Hindi to use them. Learning to type in a language that you're trying to learn isn't necessarily a bad thing.

It's very likely that many people are going to register their domains in Latin as well as in their own native character sets to accommodate their expatriate communities, so there isn't really cause for alarm yet.

Web sites in languages other than English weren't readable by people who only speak English before, so not that much has really changed.

Well, most Chinese sites are written in Chinese to begin with; this isn't going to change anything other than the domain names. The big challenge will be upgrading all domain servers to support Unicode. I'm from a country where we too use characters not in the English alphabet, such as æ, ø and å. You can use them, but the actual domain name is translated: for example, båingbåing.no is actually registered as xn--bingbing-9zae.no, and the browser translates for you. Unicode is a huge step forward.
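The translation described in this comment can be tried out with Python's built-in "idna" codec, which implements the older IDNA 2003 rules. This is a minimal sketch rather than a picture of what any particular browser or registry runs, and the helper names `to_ascii` and `to_unicode` are chosen here for illustration.

```python
def to_ascii(domain: str) -> str:
    """Encode a Unicode domain name into its ASCII "xn--" form."""
    return domain.encode("idna").decode("ascii")


def to_unicode(domain: str) -> str:
    """Decode an ASCII "xn--" domain name back into Unicode."""
    return domain.encode("ascii").decode("idna")


# The commenter's example, with the letter a-ring written as \u00e5.
name = "b\u00e5ingb\u00e5ing.no"
print(to_ascii(name))  # -> xn--bingbing-9zae.no
print(to_unicode("xn--bingbing-9zae.no") == name)
```

Only the labels that contain non-ASCII characters are rewritten; the ".no" part passes through unchanged.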

Hey guys -- there's an update on this post here, with lots of your questions answered.

So, out of curiosity, is this just a translation thing? Or does this mean that there will be domains that are totally in, for example, Chinese characters? If I, as an English speaker, type in msnbc.com, do I go to the same place as someone in China who types in msnbc.com in Chinese characters? I am just curious how this will work. I think it is great that they are accommodating languages that are structurally different than English.

Rachel said that this change would make it look less like the Web was "invented in America". Sir Tim Berners-Lee, the English guy who invented the World-Wide-Web while working at CERN, a European international institution in Switzerland, might find that amusing. Gopher was invented in America...


I'm opposed to this change, because while having Unicode characters in domain names would be a fine thing, the xn-ugly-hack representation system called Punycode that ICANN adopted to implement it is an appalling mess.


On the other hand, at least somebody's picked up the ICANN-has-cheeseburger meme :-)
