Re: [P1619-3] I18n of SO_GUID
On 2008-May-28, at 15:17, Luther Martin wrote:
> OK, here's an attempt at the ABNF definition of the URL SOGUID that
> allows UTF-8.
>
> I've validated the ABNF syntax.
> <abnf with utf8.txt>
I do not recommend mixing characters with octet concepts as it breaks
size constraints.
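
To illustrate the size problem, here is a minimal sketch (mine, not from Luther's draft). It assumes a limit of 255, taken from the "1 + 0*254" form of the handle rules in this message. The function name is made up for illustration.

```python
# Hypothetical helper: compares a character count with a UTF-8 octet count.
def utf8_octet_length(text: str) -> int:
    """Return the number of octets needed to store text as UTF-8."""
    return len(text.encode("utf-8"))

LIMIT = 255  # assumed limit, from the 1 + 0*254 form of the handle rules

name = "\u00e9" * LIMIT          # 255 copies of e-acute
char_count = len(name)           # counted as characters
octet_count = utf8_octet_length(name)  # counted as octets

# 255 characters fit a character-based limit but need 510 octets.
print(char_count, octet_count)
```

A limit stated in characters therefore does not bound the storage needed in octets, which is the constraint the API cares about.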
Do we really need the complexity of UTF-8 character validation, given that
handle names, as presented to the API, may contain octets filled with any
binary pattern? I don't think we do.
A simpler approach may be to avoid the raw encoding of the UTF-8 shift
characters, i.e. (revised from my earlier email):
# UFT8-SAFE non-dot octet avoids NUL (%x00)
# UFT8-SAFE non-dot octet avoids space (%x20) thru / (%x2F)
# UFT8-SAFE non-dot octet avoids : (%x3A) thru @ (%x40)
# UFT8-SAFE non-dot octet avoids [ (%x5B) thru ` (%x60)
# UFT8-SAFE non-dot octet avoids { (%x7B) thru del (%x7F)
# UFT8-SAFE non-dot octet avoids %x80-%xFF
# UFT8-SAFE non-dot octet allows %x01 thru %x1F
# UFT8-SAFE non-dot octet allows 0-9 (%x30-%x39)
# UFT8-SAFE non-dot octet allows A-Z (%x41-%x5A)
# UFT8-SAFE non-dot octet allows a-z (%x61-%x7A)
o <UFT8-SAFE non-dot octet> = (%x01-%x1F / %x30-%x39 / %x41-%x5A /
%x61-%x7A)
o <UFT8-SAFE octet> = <UFT8-SAFE non-dot octet> / <dot>
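
As a rough sketch only, the two rules above can be read as a pair of predicates over octet values. The function names are invented for this example; the ranges are copied from the rule as written.

```python
# Hypothetical predicates mirroring the <UFT8-SAFE ...> rules above.
def is_safe_non_dot_octet(b: int) -> bool:
    """True for %x01-1F, 0-9, A-Z and a-z; false for everything else."""
    return (
        0x01 <= b <= 0x1F      # control octets other than NUL
        or 0x30 <= b <= 0x39   # 0-9
        or 0x41 <= b <= 0x5A   # A-Z
        or 0x61 <= b <= 0x7A   # a-z
    )

def is_safe_octet(b: int) -> bool:
    """A safe octet is a safe non-dot octet or the dot itself (%x2E)."""
    return is_safe_non_dot_octet(b) or b == 0x2E

# Count how many of the 256 octet values may appear unencoded.
safe_count = sum(1 for b in range(256) if is_safe_octet(b))
print(safe_count)
```

Note that no octet at or above %x80 passes either test, so no raw UTF-8 lead or continuation octet can appear unencoded.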
...
• <SO_Handle> = <UFT8-SAFE handle> / <non-UFT8-SAFE encoded handle>
o <UFT8-SAFE handle> = (ALPHA / DIGIT) 0*254 <UFT8-SAFE octet>
o <non-UFT8-SAFE encoded handle> = <handle first octet> 0*254 <UFT8-SAFE next octet>
o <handle first octet> = ALPHA / DIGIT / <non-alphanumeric encoded>
o <non-alphanumeric encoded> = "%" ("0" / "1" / "2") <hex>
o <non-alphanumeric encoded> =/ "%" "3" ("A" / "B" / "C" / "D" / "E" /
"F")
o <non-alphanumeric encoded> =/ "%" "4" "0"
o <non-alphanumeric encoded> =/ "%" "5" ("B" / "C" / "D" / "E" / "F")
o <non-alphanumeric encoded> =/ "%" "6" "0"
o <non-alphanumeric encoded> =/ "%" "7" ("B" / "C" / "D" / "E" / "F")
o <non-alphanumeric encoded> =/ "%" ("8" / "9" / "A" / "B" / "C" /
"D" / "E" / "F") <hex>
o <UFT8-SAFE next octet> = <UFT8-SAFE octet> / <dash> /
<underscore> / <UFT8-UNSAFE encoded octet>
# UFT8-UNSAFE encoded octet encodes any octet that is not <UFT8-SAFE
octet> nor <dash> nor <underscore>
o <UFT8-UNSAFE encoded octet> = "%" "0" "0"
o <UFT8-UNSAFE encoded octet> =/ "%" "2" <digit>
o <UFT8-UNSAFE encoded octet> =/ "%" "2" ("A" / "B" / "C" / "F")
o <UFT8-UNSAFE encoded octet> =/ "%" "3" ("A" / "B" / "C" / "D" / "E" /
"F")
o <UFT8-UNSAFE encoded octet> =/ "%" "4" "0"
o <UFT8-UNSAFE encoded octet> =/ "%" "5" ("B" / "C" / "D" / "E")
o <UFT8-UNSAFE encoded octet> =/ "%" "6" "0"
o <UFT8-UNSAFE encoded octet> =/ "%" "7" ("B" / "C" / "D" / "E" / "F")
o <UFT8-UNSAFE encoded octet> =/ "%" ("8" / "9" / "A" / "B" / "C" /
"D" / "E" / "F") <hex>
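
For what it is worth, here is a small sketch of an encoder that follows the <SO_Handle> rules above. It is an illustration under my reading of the grammar, not a reference implementation: the name encode_handle is made up, and I assume each %XX triple counts as one element against the 1 + 0*254 limit.

```python
# Hypothetical encoder sketch for <SO_Handle>; names are illustrative only.
def _is_alnum(b: int) -> bool:
    """ALPHA / DIGIT as octet values."""
    return 0x30 <= b <= 0x39 or 0x41 <= b <= 0x5A or 0x61 <= b <= 0x7A

def _is_raw_next(b: int) -> bool:
    """Octets allowed unencoded after the first: safe octet, dash, underscore."""
    return _is_alnum(b) or 0x01 <= b <= 0x1F or b in (0x2E, 0x2D, 0x5F)

def encode_handle(raw: bytes) -> bytes:
    """Encode raw handle octets; anything not allowed raw becomes %XX."""
    if not 1 <= len(raw) <= 255:
        raise ValueError("handle must be 1 to 255 octets")
    out = bytearray()
    for i, b in enumerate(raw):
        # The first octet is stricter: only ALPHA / DIGIT may stay raw.
        allowed_raw = _is_alnum(b) if i == 0 else _is_raw_next(b)
        if allowed_raw:
            out.append(b)
        else:
            out += b"%%%02X" % b  # upper-case hex, as in the rules above
    return bytes(out)

example = encode_handle("\u00e9.key".encode("utf-8"))
print(example)
```

Every octet of the result is below %x80, since all octets %x80-%xFF are sent as %XX triples.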
The above avoids the non-POSIX issues, avoids conflicts with the
reserved namespaces, preserves size limits and remains UTF-8 safe
because the UTF-8 shift characters are encoded.
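
To make the size-limit point concrete, a rough decoder sketch follows. It is deliberately lenient (it does not check that each raw octet was allowed to be raw), the name decode_handle is hypothetical, and it assumes the limit of 255 applies to the decoded octets rather than to the encoded text.

```python
import re

# One token is either a %XX triple (upper-case hex) or a single raw octet.
_TOKEN = re.compile(rb"%[0-9A-F]{2}|[^%]")

def decode_handle(encoded: bytes) -> bytes:
    """Undo %XX encoding and enforce the 1 to 255 octet limit on the result."""
    out = bytearray()
    pos = 0
    while pos < len(encoded):
        m = _TOKEN.match(encoded, pos)
        if m is None:
            raise ValueError("stray '%' at offset %d" % pos)
        tok = m.group()
        # A 3-octet token is %XX; otherwise the token is the raw octet itself.
        out.append(int(tok[1:], 16) if len(tok) == 3 else tok[0])
        pos = m.end()
    if not 1 <= len(out) <= 255:
        raise ValueError("decoded handle must be 1 to 255 octets")
    return bytes(out)

# 255 encoded octets take 765 octets of text but still decode to 255 octets.
worst_case = decode_handle(b"%FF" * 255)
print(len(worst_case))
```

The limit is checked after decoding, so the amount of percent-encoding in a handle does not change whether it fits.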
chongo () /\oo/\