Friday, December 22, 2006

ASP UTF-8 encoder for Windows strings

I am posting this in case someone out there needs code for encoding a Windows string as UTF-8.

It's great to have it working: it converts a Windows string (Unicode, UTF-16LE) to its equivalent UTF-8 form.
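As a rough illustration of that mapping, here is a minimal sketch that shows the UTF-8 bytes of a single code point as hex. The names Utf8Hex and HexByte are made up for this example and are not part of the encoder; the sketch assumes a non-negative code point below &H10000.

```vbscript
' Hypothetical helper: formats one byte value (0 to 255) as two hex digits.
Function HexByte(b)
    HexByte = Right("0" & Hex(b), 2)
End Function

' Hypothetical helper: UTF-8 bytes of a code point below &H10000, as a hex string.
Function Utf8Hex(cp)
    If cp < &H80 Then
        ' 1 byte: 0xxxxxxx
        Utf8Hex = HexByte(cp)
    ElseIf cp < &H800 Then
        ' 2 bytes: 110xxxxx 10xxxxxx
        Utf8Hex = HexByte(&HC0 Or (cp \ 64)) & " " & _
                  HexByte(&H80 Or (cp And &H3F))
    Else
        ' 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        Utf8Hex = HexByte(&HE0 Or (cp \ 4096)) & " " & _
                  HexByte(&H80 Or ((cp \ 64) And &H3F)) & " " & _
                  HexByte(&H80 Or (cp And &H3F))
    End If
End Function

' Utf8Hex(&H41)   returns "41"        (A)
' Utf8Hex(&HE9)   returns "C3 A9"     (e with acute accent, U+00E9)
' Utf8Hex(&H20AC) returns "E2 82 AC"  (euro sign, U+20AC)
```

Integer division by 64 and 4096 plays the role of a right shift by 6 and 12 bits here.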

Here it is. If it works for you, please leave a comment:

function EncodeUTF8(s)
    dim i
    dim c

    i = 1

    'debug output:
    'Response.Write "len(s)=" & len(s) & "<br>"

    do while i <= len(s)
        c = clng(ascW(mid(s,i,1))) 'returns the UTF-16 code unit as a signed 16-bit value.

        'code units above 32767 come back negative, so add 65536 to get the unsigned value.
        if c < 0 then
            c = c + 65536
        end if

        'debug output:
        'Response.Write "c=" & c & "<br>"

        'standard conversion from a Unicode code point to its equivalent UTF-8 byte sequence.
        'each branch replaces the character at position i with one character per UTF-8 byte.
        if c < &H80 then
            '1 byte: ASCII stays as it is.
            i = i + 1

        elseif c < &H0800 then
            '2 bytes: 110xxxxx 10xxxxxx
            s = left(s, i-1) + _
                chr(RightShift(c,6) OR &HC0) + _
                chr(c AND &H3F OR &H80) + _
                mid(s,i+1)

            i = i + 2

        elseif c < &H10000 then
            '3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            s = left(s, i-1) + _
                chr(RightShift(c,12) OR &HE0) + _
                chr(RightShift(c,6) AND &H3F OR &H80) + _
                chr(c AND &H3F OR &H80) + _
                mid(s,i+1)

            i = i + 3
        else
            '4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            'not reached as written: ascW returns a single 16-bit code unit, so c stays below &H10000.
            s = left(s, i-1) + _
                chr(RightShift(c,18) OR &HF0) + _
                chr(RightShift(c,12) AND &H3F OR &H80) + _
                chr(RightShift(c,6) AND &H3F OR &H80) + _
                chr(c AND &H3F OR &H80) + _
                mid(s,i+1)

            i = i + 4
        end if
    loop

    EncodeUTF8 = s
end function


'shifts a non-negative value right by the given number of bits.
function RightShift(inputVal, position)
    RightShift = fix(inputVal / 2^position)
end function
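
One caveat: ascW returns a single 16-bit code unit, so a character outside the Basic Multilingual Plane reaches the loop as two surrogates and each one goes through the 3-byte branch. Below is a minimal sketch of how a surrogate pair could be combined into one code point first. UnsignedAscW, IsHighSurrogate, IsLowSurrogate, CombineSurrogates and CodePointAt are hypothetical helpers for illustration only; none of them is part of the code above.

```vbscript
' Hypothetical helper: ascW as an unsigned value (0 to 65535).
Function UnsignedAscW(ch)
    Dim n
    n = CLng(AscW(ch))
    If n < 0 Then n = n + 65536
    UnsignedAscW = n
End Function

' True when u is a high (lead) surrogate, U+D800 to U+DBFF.
Function IsHighSurrogate(u)
    IsHighSurrogate = (u >= 55296 And u <= 56319)
End Function

' True when u is a low (trail) surrogate, U+DC00 to U+DFFF.
Function IsLowSurrogate(u)
    IsLowSurrogate = (u >= 56320 And u <= 57343)
End Function

' Combines a surrogate pair into one code point (U+10000 to U+10FFFF).
Function CombineSurrogates(hi, lo)
    CombineSurrogates = 65536 + (hi - 55296) * 1024 + (lo - 56320)
End Function

' Hypothetical helper: code point that starts at position i of s.
' Sets used to the number of UTF-16 code units consumed (1 or 2).
Function CodePointAt(s, i, used)
    Dim hi, lo
    hi = UnsignedAscW(Mid(s, i, 1))
    used = 1
    CodePointAt = hi
    If IsHighSurrogate(hi) And i < Len(s) Then
        lo = UnsignedAscW(Mid(s, i + 1, 1))
        If IsLowSurrogate(lo) Then
            CodePointAt = CombineSurrogates(hi, lo)
            used = 2
        End If
    End If
End Function

' Example: U+1F600 is stored as the pair D83D DE00 (55357, 56832).
Dim used, cp
cp = CodePointAt(ChrW(55357) & ChrW(56832), 1, used)
' cp is now 128512 (&H1F600) and used is 2
```

With a value like this, the 4-byte branch would apply, and the remainder of the string would have to be taken from mid(s, i + used) rather than mid(s, i + 1), since two code units were consumed.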