Unicode

Unicode is a multi-language character set designed to encompass virtually all of the characters used with computers today. On Windows, Unicode strings are stored as 16-bit values (UTF-16), and they differ from other character sets in two important ways. First, unlike the traditional single-byte (ANSI) character sets, Unicode is capable of representing significantly more characters in a variety of languages. Second, unlike multi-byte character sets (where some characters may be one byte in length, while others may be two bytes), the commonly used characters each occupy a single 16-bit value, which makes them easier to work with. Less common characters outside this range are represented by a pair of 16-bit values.
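
To see the 16-bit representation in Visual Basic, compare the built-in Len and LenB functions, which return the length of a string in characters and in bytes. This is a minimal sketch; the procedure name ShowStringSize is illustrative only, and the output appears in the Immediate window:

```vb
Sub ShowStringSize()
    Dim strText As String

    strText = "Hello"

    ' Len counts characters, LenB counts bytes; each of these
    ' characters occupies two bytes in a Visual Basic string
    Debug.Print Len(strText)     ' prints 5
    Debug.Print LenB(strText)    ' prints 10
End Sub
```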

Whenever a string is assigned to a property or passed to a method, that string is in Unicode. If necessary, the control automatically converts the string to ANSI, which requires no additional programming on the part of the developer. This is largely transparent when using the components in high-level languages like Visual Basic. However, in Visual C++ and other languages that deal with COM objects at a lower level, it is important to understand that string values must be passed as BSTRs, which are Unicode strings.

The issue that developers most commonly encounter with how the SocketTools components handle strings involves the Read and Write methods. These methods send and receive data over the network, and accept several different types of data. Typically, the data is exchanged either as a string of text characters or as an array of bytes. Consider the following code:

Dim strMessage As String
Dim strBuffer As String
Dim cbBuffer As Long

Do
  ' Read up to 1024 bytes; the control converts them to a Unicode string
  cbBuffer = SocketWrench1.Read(strBuffer, 1024)
  If cbBuffer > 0 Then strMessage = strMessage & strBuffer
Loop Until cbBuffer < 1

In this case, the program expects the server to send text, which is accumulated in the string strMessage. Internally, the control automatically converts the bytes received from the server into a string. It does this because the strBuffer argument is declared as a String, which means it is Unicode. However, what if the data returned by the server is binary, or is already Unicode text? In that case, the data may be corrupted by the conversion that the control performs. The solution is to read the data into an array of bytes rather than a string. For example:

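Dim byteMessage() As Byte
Dim byteBuffer(1024) As Byte
Dim cbMessage As Long
Dim cbBuffer As Long
Dim nIndex As Long

Do
  ' Read up to 1024 bytes; no Unicode conversion is performed
  cbBuffer = SocketWrench1.Read(byteBuffer, 1024)

  If cbBuffer > 0 Then
    ' Grow the message array to hold exactly the bytes received so far
    ReDim Preserve byteMessage(cbMessage + cbBuffer - 1) As Byte

    ' Append the new bytes to the end of the message
    For nIndex = 0 To cbBuffer - 1
      byteMessage(cbMessage + nIndex) = byteBuffer(nIndex)
    Next
    cbMessage = cbMessage + cbBuffer
  End If

Loop Until cbBuffer < 1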

Because the data is read into a byte array rather than a string, no Unicode conversion is performed and the data is returned exactly as it was sent. Note that Visual Basic can also explicitly convert between Unicode strings and byte arrays using the StrConv function. For more information, refer to the language reference and online help in Visual Basic.
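
As a minimal sketch of the StrConv conversions, the following converts a string to an array of ANSI bytes and back again. It assumes the text can be represented in the system's default ANSI code page; the procedure and variable names are illustrative only and are not part of the SocketTools API:

```vb
Sub ConvertExample()
    Dim byteData() As Byte
    Dim strText As String

    ' Convert a Unicode string to an array of single-byte (ANSI) characters
    byteData = StrConv("Hello", vbFromUnicode)
    Debug.Print UBound(byteData) + 1    ' number of bytes in the array

    ' Convert the array of ANSI bytes back into a Unicode string
    strText = StrConv(byteData, vbUnicode)
    Debug.Print strText
End Sub
```

A conversion like this is only appropriate for data that is known to be ANSI text. Binary data should remain in the byte array.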