Web call (uri2dec) open source project home page

August 19, 2007

URI to decimal number translation

Filed under: Technical notes — admin @ 6:55 pm

Status of this Memo

This document specifies an Internet standards track protocol for the Internet community, and requests discussion and suggestions for improvements. Please refer to the current edition of the “Internet Official Protocol Standards” (STD 1) for the standardization state and status of this protocol. Distribution of this memo is unlimited.

Copyright Notice

Copyright (C) The Internet Society (2006). All Rights Reserved.

Abstract

This document discusses the translation of Domain Name System (DNS) names and Uniform Resource Identifiers (URI) into sequences of decimal digits (decimal numbers) that are easy for a user to enter on a numeric keypad.

Table of contents
1. Introduction
1.1. Terminology
2. Problem
3. Purpose
3.1. Scenario 1
3.2. Scenario 2
3.3. Scenario 3
3.4. Scenario 4
4. Intended usage
5. Encoding
5.1. Hash
5.2. Key
5.3. Address inversion
5.4. Encoding symbol position
5.5. Examples
5.6. Zero Key case
5.8. Extension
5.9. Service number
5.10. Representation
6. Pre-inverted address
7. Decoding
7.1. Is decimal number an Internet Code of Service
7.2. Extension truncation
7.3. Key start position
8. Special symbols
8.1. Lowercase
8.2. National codepages
8.3. Special symbols encoding
9. Searching in address books
9.1. Address book applications
9.2. Hierarchy of address books
9.3. Adding new hashes and keys to the address book
9.4. Collisions in hash table
10. Full Copyright Statement
11. Acknowledgment
12. References

1. Introduction
This document provides an algorithm for encoding Internet domain names and Uniform Resource Identifiers (URI) [6] into sequences of decimal digits (decimal numbers).
Specifically, this document aims to improve the translation of dialed numbers into SIP [1] telephone addresses, allowing users to dial a URI directly from a numeric keypad without consulting telephone number directories.
1.1. Terminology
The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC 2119 [7].
2. Problem
Users often use telephone devices with the numeric keypad shown in Figure 1, and they like to use traditional telephone numbers. SIP phones and other Internet devices, in contrast, use URI addresses, so callers must provide a more complicated address than a telephone number.
To simplify dialing, a SIP server uses a directory of local telephone numbers and refers to an external directory of global telephone numbers, so that dialed numbers are translated to other telephone numbers or to Internet addresses.
Many addresses cannot be found in such directories. Modern communication devices and SIP soft phones are reachable on the Internet but are simply not listed in the directories in use.
+-----+ +-----+ +-----+
|  1  | |  2  | |  3  |
|     | | abc | | def |
+-----+ +-----+ +-----+

+-----+ +-----+ +-----+
|  4  | |  5  | |  6  |
| ghi | | jkl | | mno |
+-----+ +-----+ +-----+

+-----+ +-----+ +-----+
|  7  | |  8  | |  9  |
|pqrs | | tuv | |wxyz |
+-----+ +-----+ +-----+

+-----+
|  0  |
|     |
+-----+

Figure 1. Numeric keypad.

The best-known solutions, ENUM [2] and DUNDi [3], are based on the directory concept.
This memo offers a direct translation of a sequence of decimal digits into a URI without the use of any kind of directory. This means that the encoded sequence can be decoded back to the URI without any additional information.
Encoding consists of two main steps:
Entering the URI. The user presses the buttons showing the letters and digits of the URI; this sequence of digits is named a “Hash”.
Adding a shorter sequence of decimal digits, named a “Key”. In the examples of Section 5.5 the Key is roughly two thirds as long as the Hash, because most characters need 2 bits and each Key digit carries 3 bits.
The Key must be entered by the user or, preferably, recovered by the device or SIP server by searching for stored keys in hash tables kept in the device, in the SIP server or in another device in the network environment. These hash tables are named Address Books.
The combination of the Hash and the Key is named an Internet Code of Service (ICS). This code is neither a global E.164 telephone number nor a local number, because it is not listed in any directory.
3. Purpose
The encoding described in this memo is intended for entering URIs on numeric keypads.
It can be used to enter any textual or binary information, not only URIs, but this memo does not discuss that.
Several scenarios of encoding SIP addresses are described below in this section.
Section 5 describes encoding; Section 7 describes decoding.
3.1. Scenario 1
User bob@commandus.com would like to place a call to Alice. He presses the buttons:
2 5 4 2 3
a l i c e
Bob’s soft phone finds the Hash “25423” in its memory because Bob has called Alice before. The soft phone adds the Key and sends the combination of both (the ICS) to the SIP server. The SIP server determines that the entered digit sequence is neither a local nor a global E.164 telephone number and decodes the ICS to the address alice. The SIP server then routes the call to Alice’s telephone.
3.2. Scenario 2
User carol@atlanta.com called Alice at alice@commandus.com a week ago and would like to call her again. She presses the letters of the URI on the numeric keypad:

2 5 4 2 3 0 2 6 6 6 2 6 3 8 7 0 2 6 6
a l i c e @ c o m m a n d u s . c o m

Because she is outside the commandus.com domain, she must enter at least the destination domain as well. She skips entering the Key, since she has called this address before.
3.3. Scenario 3
User carol@atlanta.com would like to call Bob’s telephone bob@commandus.com for the first time:

2 6 2 0 2 6 6 6 2 6 3 8 7 0 2 6 6
b o b @ c o m m a n d u s . c o m
The telephone asks her for a Key, and Carol enters it:
1 512622351721
Later, Carol places calls without being asked for the Key (as described in Section 3.2):
2 6 2 0 2 6 6 6 2 6 3 8 7 0 2 6 6
b o b @ c o m m a n d u s . c o m
3.4. Scenario 4
User carol@atlanta.com receives a list of telephone numbers by e-mail from alice@commandus.com:
bob@commandus.com - 26602666263870262 0 512622351721
alice@commandus.com - 2660266626387025423 0 51262235162333
She asks Alice why Alice’s telephone number has changed. Alice replies that both numbers are valid: the list uses the pre-inverted form (Section 6).
4. Intended usage
A typical application is a telephone or other communication device, where the algorithm is responsible for encoding URIs, for instance:
SIP addresses [1] of VoIP telephones or SIP services;
e-mail and voice mail box addresses, user groups and conferences;
web resources.
The decoding algorithm can be implemented in devices such as:
SIP servers;
number/address translation servers;
routers.
5. Encoding
The character sequence to be encoded MUST be represented with a subset of the Latin character set in accordance with RFC 2396 [6], as described in Section 8.3. This memo assumes that the address has already been encoded in this way.
5.1. Hash
Table 1 shows the codes of Latin characters, the digits “0” to “9”, and some punctuation symbols, in accordance with Figure 1:
Table 1. Codes of characters
Character                                 Code
“1”                                       1
“a”, “b”, “c”, “2”                        2
“d”, “e”, “f”, “3”                        3
“g”, “h”, “i”, “4”                        4
“j”, “k”, “l”, “5”                        5
“m”, “n”, “o”, “6”                        6
“p”, “q”, “r”, “s”, “-”, “/”, “:”, “7”    7
“t”, “u”, “v”, “8”                        8
“w”, “x”, “y”, “z”, “_”, “?”, “&”, “9”    9
“.”, “@”, “,”, “0”                        0
For address:
alice@commandus.com
Hash would be:
2542302666263870266
The syntax of the Hash is defined in Extended Backus-Naur Form (EBNF):
hash = digit {digit}
digit = "0"|"1"|"2"|"3"|"4"|"5"|"6"|"7"|"8"|"9"
The user MUST press the button that shows the letter on its face, and press “0” for the “@” and “.” characters.
Since names are easier to remember than digits, the user can enter the Hash without difficulty.
In some cases it is possible to use much shorter addresses, like:
alice
Hash would be:
25423
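The Hash computation can be sketched in Python as a reverse lookup of Table 1. This is a minimal sketch, assuming the address already contains only Table 1 characters (other symbols are handled in Section 8); `TABLE_1`, `CHAR_TO_DIGIT` and `compute_hash` are illustrative names, not part of this memo.

```python
# Table 1: keypad digit -> characters assigned to that key.
TABLE_1 = {
    "1": "1",
    "2": "abc2", "3": "def3", "4": "ghi4",
    "5": "jkl5", "6": "mno6", "7": "pqrs-/:7",
    "8": "tuv8", "9": "wxyz_?&9", "0": ".@,0",
}
# Reverse lookup: character -> keypad digit.
CHAR_TO_DIGIT = {ch: digit for digit, chars in TABLE_1.items() for ch in chars}


def compute_hash(address: str) -> str:
    """Return the Hash: one keypad digit per character of the address."""
    try:
        return "".join(CHAR_TO_DIGIT[ch] for ch in address)
    except KeyError as err:
        # Uppercase letters and other special symbols must be encoded first (Section 8).
        raise ValueError(f"character {err} is not listed in Table 1") from err


print(compute_hash("alice@commandus.com"))  # 2542302666263870266
print(compute_hash("alice"))                # 25423
```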
Of course, a Hash alone cannot be translated to a URI unambiguously. The additional information required for an unambiguous translation is named a Key. A device that receives a Hash can wait for the user to enter the Key, or it can search for the Key in Address Books that keep addresses the user entered earlier, received from other people or, for instance, is viewing in the browser right now. Address Book searching is described in Section 9.
If the Address Book search succeeds, the device informs the user that the address has been reconstructed and uses that address to place the call.
5.2. Key
The Key contains the information needed to reconstruct the URI from the Hash.
key = ("1"|"0") space ("1".."8") {"1".."8"}
The Key starts with “1” or “0”. “0” is used for pre-inverted addresses (their purpose is discussed in Section 6); otherwise “1” is used. The space character is used for text formatting only, like the “+” and “-” signs and spaces in telephone numbers. The remaining digits are octal digits increased by one (Section 5.4), so they range from “1” to “8”.
If the address is not pre-inverted, do address inversion.
5.3. Address inversion
Address inversion starts with splitting the address into words. Find the delimiter characters “.”, “@” and “,” in the address (in a Hash, the delimiter is “0”). Delimiter characters are words too.
Copy all words, including delimiters, in backward order to the string of the inverted address.
This procedure is named address inversion. If the address is pre-inverted, these steps are skipped.
For instance, address:
alice@commandus.com
consists of five words:
“alice” “@” “commandus” “.” “com”
and after reversing words, address would be:
com.commandus@alice
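This address is named the inverted address. The Key MUST be constructed from the inverted address with the symbols of each word reversed, which is the original address read backwards (for alice@commandus.com: moc.sudnammoc@ecila).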
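Address inversion can be sketched in Python with a regular expression whose capturing group keeps delimiters as separate words. As an assumption, the sketch treats “0” as a delimiter in both addresses and Hashes, so that inverting an address and inverting its Hash give matching results; `DELIMITERS` and `invert` are illustrative names.

```python
import re

# Section 5.3 delimiters: ".", "@", "," in addresses and "0" in a Hash.
# The capturing group makes re.split() return the delimiters as words too.
DELIMITERS = re.compile(r"([.@,0])")


def invert(text: str) -> str:
    """Return the words of text (delimiters included) in backward order."""
    words = [word for word in DELIMITERS.split(text) if word]
    return "".join(reversed(words))


print(invert("alice@commandus.com"))  # com.commandus@alice
print(invert("2542302666263870266"))  # 2660266626387025423
```

Inversion is its own inverse, so the same function also restores a pre-inverted address.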
5.4. Encoding symbol position
The Key is then constructed from the digits “1” … “8”.
For each character of the address, determine the required quantity of bits from Table 2. Depending on the character, the quantity of bits varies from 0 to 3.
Then determine the position code of each character of the address. Note that characters in an address are usually lowercase, therefore the position code for ‘A’ and ‘a’ is the same. Otherwise refer to Section 8.
Then copy the bits of each position code into a zero-filled bit array; the quantity of bits to copy was determined earlier. The first character of the encoded (reversed) sequence occupies the highest bits, so the first character of the address ends up in the lowest bits. The array must be large enough, or memory must be reallocated; at least 16 octets MUST be reserved.

For instance:
alice
Hash would be:
25423
And bit array would be:
e c i l a
01 00 10 10 01
MSB LSB
Non-significant (leading) zeroes MUST be omitted. Now get the octal representation of the bit array by splitting it into triads:
100 101 001
MSB LSB
Now comes a small trick: octal alignment. Alignment means shifting the array left and adding from 2 to 5 alignment bits at the lowest address.
After alignment, the most significant bit MUST be bit 2, 5, 8, … (counting from bit 0 at the LSB), i.e. the most significant bit of a triad. Because the 9-bit sequence in this example is already octal aligned, three alignment bits must be added.
The last two alignment bits hold the quantity of alignment bits minus 2:
0…0 L L
MSB LSB
These two bits give the quantity of additional bits. The additional bits, if any, are always 0.
In this example, one additional bit is required in front of the two length bits, so the alignment bits are 001 and the bit array would be:
100 101 001 001
MSB LSB
Finally, each triad is converted to an octal digit plus 1 (“0” is replaced with “1”, “1” with “2”, and so on):
5622
The aim of octal alignment is to avoid bit shifting between addresses in the same domain, so that Keys of addresses in the same domain start with the same sequence of digits.

Table 2. Symbol position codes
Position code    0   1   2   3   4   5   6   7   Quantity of bits
                 1                               0
                 c   a   b   2                   2
                 d   e   f   3                   2
                 g   h   i   4                   2
                 j   k   l   5                   2
                 o   m   n   6                   2
                 p   q   r   s   -   7   /   :   3
                 t   u   v   8                   2
                 w   x   y   z   _   9   ?   &   3
                 .   @   ,   0                   2
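The Key construction of Sections 5.2 to 5.4 can be sketched in Python, assuming the bit order of the examples in this memo (the first address character in the lowest bits) and the minimal number of alignment bits (2 to 4). `TABLE_2`, `CODES` and `compute_key_body` are illustrative names; the function returns the Key digits without the “1”/“0” prefix and does not apply the Section 8 encoding.

```python
# Table 2: keypad digit -> (characters in position-code order, quantity of bits).
TABLE_2 = {
    "1": ("1", 0), "2": ("cab2", 2), "3": ("def3", 2), "4": ("ghi4", 2),
    "5": ("jkl5", 2), "6": ("omn6", 2), "7": ("pqrs-7/:", 3),
    "8": ("tuv8", 2), "9": ("wxyz_9?&", 3), "0": (".@,0", 2),
}
# character -> (position code, quantity of bits)
CODES = {ch: (pos, width)
         for chars, width in TABLE_2.values()
         for pos, ch in enumerate(chars)}


def compute_key_body(address: str) -> str:
    """Return the Key digits (without the "1"/"0" prefix) of an address."""
    value, nbits = 0, 0
    for ch in address:                  # first character -> lowest bits
        pos, width = CODES[ch]
        value |= pos << nbits
        nbits += width
    significant = format(value, "b") if value else ""  # leading zeroes omitted
    # Minimal number of alignment bits (2..4) making the length a multiple of 3.
    align = {0: 3, 1: 2, 2: 4}[len(significant) % 3]
    # Alignment bits: (align - 2) zero bits, then two bits holding align - 2.
    bits = significant + "0" * (align - 2) + format(align - 2, "02b")
    # Each triad becomes one octal digit, written plus one ("0" -> "1", ...).
    return "".join(str(int(bits[i:i + 3], 2) + 1)
                   for i in range(0, len(bits), 3))


print(compute_key_body("alice"))                # 5622
print(compute_key_body("alice@commandus.com"))  # 51262235162333
```

Building the bit array as an integer keeps the order explicit: each character is shifted above the bits of the characters before it, and leading zeroes disappear when the integer is formatted.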
5.5. Examples
Let us look at more examples. The following address consists of five words:
alice@commandus.com
Hash would be:
2542302666263870266
Bit array would be:
m o c . s u d n a m m o c @ e c i l a
01 00 00 00 011 01 00 10 01 01 01 00 00 01 01 00 10 10 01
MSB LSB
The bit array has 38 significant bits, so four alignment bits must be added (38 + 4 = 42, a multiple of 3):
0010
and the bit array, rewritten in triads starting from the most significant bit, would be:
100 000 001 101 001 001 010 100 000 101 001 010 010 010
Key would be:
1 51262235162333
and ICS would be:
2542302666263870266 1 51262235162333

Another example of an address:
bob@commandus.com
Hash would be:
26202666263870266
Bit array would be:
m o c . s u d n a m m o c @ b o b
01 00 00 00 011 01 00 10 01 01 01 00 00 01 10 00 10
MSB LSB
The bit array has 34 significant bits, so 2 alignment bits must be added (at least 2 alignment bits are always added):
00
and the bit array, rewritten in triads starting from the most significant bit, would be:
100 000 001 101 001 001 010 100 000 110 001 000
MSB LSB
Key would be:
1 512622351721
ICS would be:
26202666263870266 1 512622351721
Compare the keys:
bob@commandus.com - 1 512622351721
alice@commandus.com - 1 51262235162333
Note that octal alignment makes both Keys start with the same sequence of digits (512622351).
5.6. Zero Key case
If the calculated Key equals zero, it MUST be represented as:
1 space 1
or
0 space 1
for a pre-inverted address.
5.8. Extension
Some code sequences can be shorter than 16 characters. In any case, any quantity of “0” characters can be added at the end of the ICS:
extension = {"0"}
Such padding provides an easy way to distinguish an ICS from both local telephone numbers and global E.164 telephone numbers.
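A small Python sketch of the extension, assuming that an ICS is padded to the 16-digit length used in Section 7.1 and that formatting spaces are dropped before dialing; `MIN_ICS_LENGTH`, `add_extension` and `strip_extension` are hypothetical names. Stripping the extension is safe because Key digits range from “1” to “8” and are never “0”.

```python
MIN_ICS_LENGTH = 16  # assumed target length, taken from the Section 7.1 threshold


def add_extension(ics: str, min_length: int = MIN_ICS_LENGTH) -> str:
    """Drop formatting spaces and append "0" digits up to min_length."""
    digits = ics.replace(" ", "")
    return digits + "0" * max(0, min_length - len(digits))


def strip_extension(digits: str) -> str:
    """Remove trailing "0" extension digits; Key digits are never "0"."""
    return digits.rstrip("0")


padded = add_extension("25423 1 5622")
print(padded)                   # 2542315622000000
print(strip_extension(padded))  # 2542315622
```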
5.9. Service number
The numeric format of the Internet Code of Service is:
service_number = (address|hash) space key [extension]
The ICS must be represented as:
representation = "ics:" service_number
For instance:
ics:bob@commandus.com 1 512622351721
This is the first representation form.
The second representation form is:
ics:26602666263870262 0 512622351721
In this form, use of the pre-inverted address is recommended (see Section 6).
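The representation grammar can be checked with a regular expression. This Python sketch assumes that the written form keeps single spaces around the Key prefix, as in the examples, and that Key body digits are “1” … “8”; `ICS_PATTERN`, `ServiceNumber` and `parse_ics` are hypothetical names.

```python
import re
from dataclasses import dataclass

# "ics:" target, space, prefix "1"/"0", space, Key body, optional "0" extension.
ICS_PATTERN = re.compile(
    r"ics:(?P<target>\S+) (?P<prefix>[01]) (?P<body>[1-8]+)(?P<extension>0*)"
)


@dataclass
class ServiceNumber:
    target: str     # address (first form) or Hash (second form)
    prefix: str     # "1" normal, "0" pre-inverted
    body: str       # Key digits
    extension: str  # trailing "0" digits, ignored when decoding

    @property
    def is_hash_form(self) -> bool:
        return self.target.isdigit()

    @property
    def pre_inverted(self) -> bool:
        return self.prefix == "0"


def parse_ics(text: str) -> ServiceNumber:
    match = ICS_PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not an ICS representation: {text!r}")
    return ServiceNumber(**match.groupdict())


sn = parse_ics("ics:2660266626387025423 0 51262235162333")
print(sn.is_hash_form, sn.pre_inverted, sn.body)  # True True 51262235162333
```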
6. Pre-inverted address
For address:
alice@commandus.com
inverted address would be:
com.commandus@alice
as explained in Section 5.3, and its Hash would be:
2660266626387025423
For address
bob@commandus.com
Hash (after inversion of address) would be:
26602666263870262
Hashes of inverted addresses start with the same digits for identical domains:
266 is the Hash of ‘com’,
266626387 is the Hash of ‘commandus’.
If the address is pre-inverted, the Key MUST start with “0” instead of “1” (see Section 5.2).
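Producing the pre-inverted (second) form from a normal ICS only changes the Hash and the prefix: as the alice@commandus.com examples of Sections 3.4 and 5.5 show, the Key digits stay the same. A Python sketch under that reading, with the hypothetical helper names `invert` and `to_pre_inverted`:

```python
import re

# Delimiters of Section 5.3; "0" separates words in a Hash.
DELIMITERS = re.compile(r"([.@,0])")


def invert(text: str) -> str:
    """Reverse the order of words, keeping delimiters as words."""
    return "".join(reversed([w for w in DELIMITERS.split(text) if w]))


def to_pre_inverted(hash_digits: str, key_body: str) -> str:
    """Second form: inverted Hash, prefix "0", unchanged Key digits."""
    return f"{invert(hash_digits)} 0 {key_body}"


print(to_pre_inverted("2542302666263870266", "51262235162333"))
# 2660266626387025423 0 51262235162333
```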
7. Decoding
7.1. Is decimal number an Internet Code of Service
If the digit sequence is 16 or more digits long, it is an ICS. If it is shorter than 16 digits, the SIP server MUST first search for the telephone number in its directory; if nothing is found, the sequence is an ICS. A device such as a SIP server must then decode the ICS to the address.
A decoded address might not contain the name of the desired protocol. The device can assign a default protocol; it is recommended to use ‘sip’ as the default. For instance,
bob@commandus.com
would be expanded to:
sip:bob@commandus.com
It is recommended to omit the protocol name from the address.
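The decision rule and the default protocol can be sketched in Python as follows. The directory is modelled as a plain set of known telephone numbers, and an existing protocol name is recognised with the RFC 2396 `scheme ":"` syntax; both are simplifying assumptions, and `is_ics` and `with_default_scheme` are hypothetical names.

```python
import re

ICS_MIN_LENGTH = 16
# RFC 2396 scheme syntax: alpha *( alpha | digit | "+" | "-" | "." ) ":"
SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*:")


def is_ics(digits: str, directory: set[str]) -> bool:
    """Long sequences are an ICS; short ones only if not a known telephone number."""
    if len(digits) >= ICS_MIN_LENGTH:
        return True
    return digits not in directory


def with_default_scheme(address: str, scheme: str = "sip") -> str:
    """Prefix a decoded address with the default protocol if it has none."""
    return address if SCHEME.match(address) else f"{scheme}:{address}"


print(is_ics("25423", {"5551234"}))              # True
print(with_default_scheme("bob@commandus.com"))  # sip:bob@commandus.com
```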
7.2. Extension truncation
If the code sequence ends with “0” extension characters (Section 5.8), they MUST be removed.
7.3. Key start position
Decoding itself starts with finding the Key start position. The Key is the last part of the ICS: the prefix “1” or “0” followed by body digits “1” … “8” (Section 5.2):
26202666263870266 1 512622351721
The Key is:
1 512622351721
Remove the prefix “1” from the beginning of the Key.
Remember that it was “1”, not “0”, meaning that the address is not pre-inverted.
The address was not pre-inverted, so do address inversion.
Find the “0” characters in the Hash
26202666263870266
and split the Hash into separate words:
262 0 266626387 0 266
Copy the separated Hash words to a string in backward order:
266 0 266626387 0 262
Address inversion is done.
Subtract 1 from each Key digit to get the octal value 401511240610. Its two lowest bits hold the quantity of additional octal alignment bits; here they are 00, so there are no additional bits and the alignment is 2 bits long.
Shift the Key 2 bits right.
Then, for each character of the Hash, find the corresponding bits in the Key using Table 2 and recover the address:
com.commandus@bob
Then do address inversion again:
bob@commandus.com
The address is restored.
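A Python sketch of decoding, assuming the bit order of the Section 5.5 examples (the first address character in the lowest bits). Under that assumption the address can be read directly in the normal word order of the Hash, so the sketch skips the two inversions and only inverts a pre-inverted Hash back; `TABLE_2`, `invert` and `decode` are illustrative names.

```python
import re

# Table 2: keypad digit -> (characters in position-code order, quantity of bits).
TABLE_2 = {
    "1": ("1", 0), "2": ("cab2", 2), "3": ("def3", 2), "4": ("ghi4", 2),
    "5": ("jkl5", 2), "6": ("omn6", 2), "7": ("pqrs-7/:", 3),
    "8": ("tuv8", 2), "9": ("wxyz_9?&", 3), "0": (".@,0", 2),
}
DELIMITERS = re.compile(r"([.@,0])")


def invert(text: str) -> str:
    """Reverse the order of words, keeping delimiters as words."""
    return "".join(reversed([w for w in DELIMITERS.split(text) if w]))


def decode(hash_digits: str, prefix: str, key_body: str) -> str:
    """Recover an address from its Hash, Key prefix and Key digits."""
    if prefix == "0":                 # pre-inverted Hash: restore word order
        hash_digits = invert(hash_digits)
    # Key digits are octal digits plus one.
    value = int("".join(str(int(d) - 1) for d in key_body), 8)
    value >>= (value & 0b11) + 2      # drop the alignment bits
    chars = []
    for digit in hash_digits:         # first character sits in the lowest bits
        row, width = TABLE_2[digit]
        chars.append(row[value & ((1 << width) - 1)])
        value >>= width
    return "".join(chars)


print(decode("25423", "1", "5622"))                          # alice
print(decode("2660266626387025423", "0", "51262235162333"))  # alice@commandus.com
```

Reading from the lowest bits means the decoder never needs to know how many leading zeroes were omitted during encoding: once the significant bits are used up, the remaining position codes are simply 0.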
8. Special symbols
8.1. Lowercase
Addresses are often not case sensitive, and in most cases they consist of lowercase characters. If an address contains uppercase characters, these MUST be encoded as special symbols, as described in Section 8.3.
8.2. National codepages
Non-Latin characters MUST be encoded as special symbols, as described in Section 8.3.
8.3. Special symbols encoding
An address may contain Latin characters not listed in Table 1, special characters and characters of national code pages. Any of these characters MUST be encoded in accordance with RFC 2396 [6].
For instance:
%20 - space character
%E2%82%AC - Euro sign (UTF-8 octets)
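A Python sketch of special-symbol encoding, assuming that every character outside Table 1 (including uppercase letters, Section 8.1) is replaced by its UTF-8 octets in `%XX` form, following the UTF-8 convention of RFC 3986; `TABLE_1_CHARS` and `encode_special` are hypothetical names.

```python
# Characters that Table 1 assigns to a key; everything else is a special symbol.
TABLE_1_CHARS = frozenset("1abc2def3ghi4jkl5mno6pqrs-/:7tuv8wxyz_?&9.@,0")


def encode_special(address: str) -> str:
    """Percent-encode (as UTF-8 octets) every character not listed in Table 1."""
    out = []
    for ch in address:
        if ch in TABLE_1_CHARS:
            out.append(ch)
        else:
            out.extend(f"%{octet:02X}" for octet in ch.encode("utf-8"))
    return "".join(out)


print(encode_special("Bob"))  # %42ob
print(encode_special("€"))    # %E2%82%AC
```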
9. Searching in address books
Searching in address books aims to complete an address by adding a Key when an address book contains exactly one occurrence of the searched Hash.
9.1. Address book applications
Address book applications are procedures, software applications and network resources that use different sources to validate addresses:
LDAP directories;
DNS servers, including translation of E.164 telephone numbers as described in RFC 3761 [2];
the DNS cache of a personal computer;
special software application plug-ins. For instance, a web browser plug-in can search for addresses in the history of visited web pages or grab URLs from the currently viewed web page, and an e-mail client plug-in can search for addresses in collected e-mails.
9.2. Hierarchy of address books
A user can have more than one address book. A device-initiated search MUST search the different address books one by one, in the assigned order.
For each record, the address book application calculates the Hash using Table 1 and then checks whether the searched Hash occurs in it as a substring.
If exactly one record matches, the address is found and the Key can be added automatically.
If no record matches, see Section 9.3.
If more than one record matches, see Section 9.4.
9.3. Adding new hashes and keys to the address book
If no Hash is matched, the user must provide a Key.
After the Key is entered and validated, the device must remember the address (or the combination of Hash and Key) in its hash table. Rarely used addresses must be deleted from memory after some period of time.
9.4. Collisions in hash table
If more than one Hash is matched, the user can choose the address from the list of matched records.
If more than 10 records match, it is recommended that the device ask the user to enter a Key.
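The search and collision rules above can be sketched in Python. The sketch assumes that each address book is a list of address strings already restricted to Table 1 characters, and that the search stops at the first address book with at least one match; `search_books`, `next_action` and `MAX_CHOICES` are hypothetical names.

```python
# Table 1: keypad digit -> characters assigned to that key.
TABLE_1 = {
    "1": "1", "2": "abc2", "3": "def3", "4": "ghi4", "5": "jkl5",
    "6": "mno6", "7": "pqrs-/:7", "8": "tuv8", "9": "wxyz_?&9", "0": ".@,0",
}
CHAR_TO_DIGIT = {ch: digit for digit, chars in TABLE_1.items() for ch in chars}
MAX_CHOICES = 10  # with more matches than this, ask for a Key instead of a list


def compute_hash(address: str) -> str:
    return "".join(CHAR_TO_DIGIT[ch] for ch in address)


def search_books(books: list[list[str]], query: str) -> list[str]:
    """Search the address books one by one, in the assigned order."""
    for book in books:
        matches = [addr for addr in book if query in compute_hash(addr)]
        if matches:
            return matches
    return []


def next_action(matches: list[str]) -> str:
    if len(matches) == 1:
        return "add key automatically"
    if not matches or len(matches) > MAX_CHOICES:
        return "ask user for key"
    return "let user choose"


books = [["carol@atlanta.com"], ["alice@commandus.com", "bob@commandus.com"]]
print(search_books(books, "25423"))                       # ['alice@commandus.com']
print(next_action(search_books(books, "2666263870266")))  # let user choose
```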
10. Full Copyright Statement
Copyright (C) The Internet Society (2000). All Rights Reserved.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.
The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.
This document and the information contained herein is provided on an “AS IS” basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
11. Acknowledgment
I would like to thank ….
12. References
1. Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A., Peterson, J., Sparks, R., Handley, M., and E. Schooler, “SIP: Session Initiation Protocol”, RFC 3261, June 2002.
2. Faltstrom, P. and M. Mealling, “The E.164 to Uniform Resource Identifiers (URI) Dynamic Delegation Discovery System (DDDS) Application (ENUM)”, RFC 3761, April 2004.
3. Spencer, M., “Distributed Universal Number Discovery (DUNDi)”, Internet-Draft, October 2004.
4. Peterson, J., Liu, H., Yu, J., and B. Campbell, “Using E.164 numbers with the Session Initiation Protocol (SIP)”, RFC 3824, June 2004.
5. Handley, M., Jacobson, V., and C. Perkins, “SDP: Session Description Protocol”, RFC 4566, July 2006.
6. Berners-Lee, T., Fielding, R., and L. Masinter, “Uniform Resource Identifiers (URI): Generic Syntax”, RFC 2396, August 1998.
7. Bradner, S., “Key words for use in RFCs to Indicate Requirement Levels”, BCP 14, RFC 2119, March 1997.

Authors’ Addresses

Andrei Ivanov
email: support@commandus.com
URI: http://commandus.com/

Copyright © 2007-2011 Commandus software development group. All rights reserved.