"Universal Acceptance of Domain Names and E-mail Addresses" - What is the issue?
Understanding the state of the art of the internet domain naming system
Do you think internet-based applications accept every valid domain name and e-mail address you might give them when they provide their services?
"Universal Acceptance" in the context of domain names and e-mail addresses could be sounding a bit ironic to some users of the internet. Afterall, we see domain names and e-mails almost everywhere. These apps/websites which get associated with us right from the moment we wake up, to exercise, to ordering breakfast/grocery, to doing work, to learning online, to recreational activities in the evening or at the night, to the very end of the day which many prefer with the soothing music being played on either YouTube/Spotify/Jio Saavn, all are connected to some domain name and get catered to us through some "account" which uses an e-mail in the back-end, right? Then an obvious question comes to mind. Why is the need for a movement called "Universal Acceptance" for them?
Well, though domain names and e-mail addresses are so pervasive on our daily lives, if you look closely, they majorly belong to one text script only and that is Latin. You do not often find them into any of the other scripts. This however is not the reflection of the lack of technology support. Technology support exists for them in the form of "Internationalized Domain Names" (IDNs) since early 2003, and more formally in a robust manner since 2008. The only thing that is holding wider adoption of the same is the classic "chicken and egg - demand supply issue" here !
Before getting into details, lets see some of the possible IDNs:
The above are fully functional domain names. Go give them a try by clicking on them.
Here are some examples of Internationalized Email Addresses:
- ອີເມວ-ທົດລອງ@ສາກົນ-ການຍອມຮັບ-ທົດລອງ.ລາວ
- ඉ-තැපැල්-පිරික්සුම@විශ්ව-සම්මුති-පිරික්සුම.ලංකා
- 電子郵件測試@普遍適用測試.台灣
- ईमेल-परीक्षण@सार्वभौमिक-स्वीकृति-परीक्षण.संगठन
The above e-mail addresses are also fully functional. You can certainly try sending mail to any of them. You will eventually receive an automated reply worded as shown in the image below.
Of course, you will need an e-mail service that can communicate with internationalized e-mail addresses. If yours cannot, do consider asking your e-mail service provider to add that functionality, as it is central to an inclusive internet. To check whether your provider runs an EAI-aware mail server (EAI stands for Email Address Internationalization), try the service hosted by the UASG named "EAI Check", available here. Just enter your own e-mail address and you will be told whether your e-mail service is aware of the "SMTPUTF8" flag, which forms the backbone of EAI-aware implementations.
Beyond this, apart from the usual top-level domains such as .com, .org, or .net, are you aware that there are many other possible top-level domains with varying characteristics and lengths? Here are some examples:
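To make the "SMTPUTF8" flag a little more concrete, here is a minimal Python sketch of what "being aware of SMTPUTF8" can look like on the wire: a mail server lists the keyword in its reply to the EHLO command. How the "EAI Check" service works internally is not described here, so this is only an illustration. The function name `advertises_smtputf8` and the sample reply are made up for this example.

```python
def advertises_smtputf8(ehlo_response: str) -> bool:
    """Return True if an EHLO reply lists the SMTPUTF8 extension.

    Hypothetical helper: it only inspects the text of a reply and does
    not contact any server.
    """
    for line in ehlo_response.splitlines():
        # Reply lines look like "250-KEYWORD params" or "250 KEYWORD".
        if len(line) < 4 or not line.startswith("250"):
            continue
        keyword = line[4:].split(" ", 1)[0]
        if keyword.upper() == "SMTPUTF8":
            return True
    return False


# Made-up EHLO reply, for illustration only.
SAMPLE_REPLY = (
    "250-mail.example.org\r\n"
    "250-8BITMIME\r\n"
    "250-SMTPUTF8\r\n"
    "250 SIZE 35882577"
)

print(advertises_smtputf8(SAMPLE_REPLY))
```

Against a live server, Python's standard `smtplib` offers the same check: after calling `ehlo()` on an `smtplib.SMTP` connection, `has_extn("smtputf8")` reports whether the extension was advertised.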
- .MUSIC
- .TECHNOLOGY
- .ISTANBUL
- .SKY
The existing internet user base, which deals with the textual side of internet services, largely understands the English language and the Latin script. Hence there appears to be no significant user base that needs to access the internet in its own language or script, simply because that section of society assumes it is not available. This further aggravates the adoption problem for IDNs as well as new top-level domains, because the majority of software systems offering various services do not allow non-Latin characters in the domain names and e-mail addresses submitted by users, treating them as some kind of spam. New top-level domains also get blocked by string-specific restrictions such as length validation.
As with most issues of public interest, which are often picked up by those who shape the ecosystem, the Internet Corporation for Assigned Names and Numbers (ICANN), the guardian of names and numbers and of the administrative policy governing them, has started taking steps to sensitize both the user community and the software development community to this issue.
This work is carried forward by the "Universal Acceptance Steering Group" (UASG), a group of individuals representing more than 120 companies that hold different kinds of stakes in the internet ecosystem.
In concrete terms, Universal Acceptance of domain names and e-mail addresses is defined by a set of five operations which, when performed appropriately by a software system, make that system "Universal Acceptance" compliant. Those five operations are:
- Accept
- Validate
- Store
- Process
- Display
Fig. 1: Important operations for UA-compliant software systems
Let us dive deeper into what each of the operations entails.
1. Accept:
In any input field on the software's user interface, the user should be able to enter every character that can be part of a domain name or e-mail address, and the software should apply proper fonts and formatting so that the entered text is clearly visible and readable. Besides the precautions required on the implementer's side, there can also be issues on the user's side that prevent the Unicode characters in IDNs from being rendered correctly. To resolve those issues, users can refer to the guidelines by the Unicode Consortium here.
2. Validate:
There should be no discriminatory checks that needlessly reject technically valid domain names and e-mail addresses, including those in local languages. To understand the possible reasons behind such validation checks, please refer to this blog post:
https://thinktrans.hashnode.dev/what-is-the-ultimate-goal-of-the-domain-name-and-email-id-validation
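As a rough illustration of the difference between a discriminatory check and a more accepting one, consider the Python sketch below. The pattern `STRICT_PATTERN` and the function `looks_like_email` are hypothetical names invented for this example, not taken from any UASG guideline. The permissive check is deliberately loose: it only confirms the overall shape and that the domain can be converted to its ASCII form. It relies on Python's built-in "idna" codec, which implements the older IDNA 2003 rules, so treat it as a sketch rather than a complete validator.

```python
import re

# A common but overly strict pattern: ASCII only, TLD of 2 to 4 letters.
STRICT_PATTERN = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}$")


def looks_like_email(address: str) -> bool:
    """Permissive shape check that does not reject by script or TLD length."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        return False
    labels = domain.split(".")
    if len(labels) < 2 or any(not label for label in labels):
        return False
    try:
        # Built-in codec (IDNA 2003); fails on labels that cannot be encoded.
        ascii_domain = domain.encode("idna")
    except UnicodeError:
        return False
    return len(ascii_domain) <= 253


for candidate in ["user@example.technology", "電子郵件測試@普遍適用測試.台灣"]:
    strict_ok = STRICT_PATTERN.match(candidate) is not None
    print(candidate, "strict:", strict_ok, "permissive:", looks_like_email(candidate))
```

Both sample addresses fail the strict pattern, one because of the long top-level domain and the other because of its script, yet both pass the permissive check.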
3. Store:
When storing user-submitted text for later processing, the text should be saved either in its original form as submitted or in a format free of lossy conversions that could alter the fundamental nature of the entered text. Typically, UTF-8 is one of the most common and universally accepted encodings for storing Unicode data.
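The difference between a lossless and a lossy conversion can be sketched in a few lines of Python. The helper names `store_utf8` and `load_utf8` are hypothetical and stand in for whatever storage layer an application actually uses.

```python
def store_utf8(text: str) -> bytes:
    """Encode text as UTF-8 bytes, as one might before writing to storage."""
    return text.encode("utf-8")


def load_utf8(raw: bytes) -> str:
    """Decode UTF-8 bytes read back from storage."""
    return raw.decode("utf-8")


address = "ईमेल-परीक्षण@सार्वभौमिक-स्वीकृति-परीक्षण.संगठन"

# Lossless: the round trip returns exactly what the user submitted.
round_tripped = load_utf8(store_utf8(address))
print(round_tripped == address)

# Lossy: forcing the text into ASCII replaces every non-ASCII character
# with "?", so the original address cannot be recovered.
lossy = address.encode("ascii", errors="replace").decode("ascii")
print(lossy == address)
```

The first comparison holds and the second does not, which is why a Unicode-capable encoding such as UTF-8 is preferred over any ASCII-only storage path.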
4. Process:
Processing of the text, as per the Unicode norms, can happen at two stages during its journey.
4.1 Process on input:
Before the text is written to the database, various Unicode routines are available for handling it faithfully. These routines can/should be applied to the text beforehand. The "can/should" is deliberately ambiguous: whether to apply these conversion routines, and at what stage, depends entirely on the business logic, the control of which rightfully lies with the software developer. Typically these processes involve:
Conversion/Non-conversion from Unicode to Punycode
- If the string is going to be stored in a database column that is not Unicode enabled (for legacy reasons; otherwise a fully Unicode-compliant database structure is definitely recommended), the developer might want to convert the Unicode string to Punycode. However, if the developer expects to perform operations on the stored domain names, e.g. some sort of searching and sorting, it is desirable to preserve the Unicode form of the domain name.
Conversion as per the Unicode Normalization
- Typically, as per the IDNA protocol, the domain name has to be in a "Normalized" form. This helps prevent two different representations of the same label. Depending on the business logic, a developer can decide whether this conversion is needed at this stage.
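The two conversions above can be sketched with Python's standard library alone. A few assumptions apply: the function names are hypothetical; NFC is used as the normalization form, which is what IDNA2008 expects of labels; and the built-in "idna" codec implements the older IDNA 2003 rules, so a production system may prefer a library that follows IDNA2008.

```python
import unicodedata


def normalize_domain(domain: str) -> str:
    """Return the domain in Unicode Normalization Form C (NFC)."""
    return unicodedata.normalize("NFC", domain)


def to_ascii_domain(domain: str) -> str:
    """Convert a Unicode domain to its Punycode ("xn--") form.

    Uses the built-in "idna" codec (IDNA 2003), after NFC normalization.
    """
    return normalize_domain(domain).encode("idna").decode("ascii")


# Two representations of the same visible name:
composed = "b\u00fccher.example"     # "ü" as one code point
decomposed = "bu\u0308cher.example"  # "u" followed by a combining diaeresis

print(composed == decomposed)                    # False: the code points differ
print(normalize_domain(decomposed) == composed)  # True after normalization
print(to_ascii_domain(decomposed))               # xn--bcher-kva.example
```

Normalizing before storing or comparing means both spellings map to the same label, which is the "dual representation" problem the normalization step is meant to prevent.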
4.2 Process on output:
This operation usually takes place after the text is retrieved from the database and is on its way to further processing before it reaches the user interface. Any conversion applied during "Process on input" can be kept, reversed, or newly applied to the text, depending on the requirement and business logic.
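As one example of reversing an input-side conversion, the sketch below turns a stored Punycode domain back into its Unicode form. The function name `to_display_domain` is hypothetical, and the built-in "idna" codec used here follows the IDNA 2003 rules, so this is an illustration rather than a definitive implementation.

```python
def to_display_domain(domain: str) -> str:
    """Decode "xn--" labels back to Unicode; leave other input unchanged."""
    try:
        return domain.encode("ascii").decode("idna")
    except UnicodeError:
        # Already Unicode (or not decodable): return it as it was stored.
        return domain


print(to_display_domain("xn--bcher-kva.example"))  # bücher.example
print(to_display_domain("example.org"))            # example.org
```

Falling back to the stored value on failure is a design choice for this sketch; an application may instead want to log or flag values it cannot decode.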
5. Display:
Finally, after the text is retrieved from the back end of the system, it should be rendered back in its Unicode form, as understood by a native user of the script or language.
Software systems that comply with all five of the above operations while dealing with IDNs and internationalized e-mail addresses can be considered truly compliant with the notion of "Universal Acceptance".
Why is the topic so important at ICANN?
ICANN, as a body, is entrusted with maintaining and managing policy relating to names and numbers, typically domain names and IP addresses. In addition, ICANN has become a global forum for discourse on internet policy development. In discussions at various ICANN meetings, "on-boarding the next billion" frequently forms the core theme. Given the current linguistic spread and the demographics of new internet users, catering to the needs of non-Latin and non-English users becomes vital.
ICANN's Strategic Plan for fiscal years 2021-2025 lists the following among its strategic goals:
Foster competition, consumer choice, and innovation in the Internet space by increasing awareness and encouraging readiness for Universal Acceptance, IDN implementation, and IPv6.
This discourse is ably supported by technical protocol developments on the IDN and internationalized e-mail front. Thus, two of its major aspects have been fulfilled: the technical feasibility of multilingual identifiers, and the availability of affordable internet in otherwise unconnected regions of the world. Only the third remains: enabling internet-based services to cater to a truly multilingual internet. It is towards this goal that ICANN wants to rally support, so that the fruits of internet-based human enrichment reach the unreached quarters of human society.