MQTT 5.0 analytics platform, greenfield project build, part six (Java vs MQTT datatypes and interoperation)

This post looks at the datatypes available in Java and in MQTT, with some commentary on how to interoperate between the two.

In Java, the correct use of datatypes is essential. As the Java documentation states:

The Java programming language is statically-typed, which means that all variables must first be declared before they can be used. This involves stating the variable's type and name...

The set of primitive types is:

  • byte: The byte data type is an 8-bit signed two's complement integer. It has a minimum value of -128 and a maximum value of 127 (inclusive). The byte data type can be useful for saving memory in large arrays, where the memory savings actually matters. They can also be used in place of int where their limits help to clarify your code; the fact that a variable's range is limited can serve as a form of documentation.

  • short: The short data type is a 16-bit signed two's complement integer. It has a minimum value of -32,768 and a maximum value of 32,767 (inclusive). As with byte, the same guidelines apply: you can use a short to save memory in large arrays, in situations where the memory savings actually matters.

  • int: By default, the int data type is a 32-bit signed two's complement integer, which has a minimum value of -2^31 and a maximum value of 2^31-1. In Java SE 8 and later, you can use the int data type to represent an unsigned 32-bit integer, which has a minimum value of 0 and a maximum value of 2^32-1. Use the Integer class to use int data type as an unsigned integer. See the section The Number Classes for more information. Static methods like compareUnsigned, divideUnsigned etc have been added to the Integer class to support the arithmetic operations for unsigned integers.

  • long: The long data type is a 64-bit two's complement integer. The signed long has a minimum value of -2^63 and a maximum value of 2^63-1. In Java SE 8 and later, you can use the long data type to represent an unsigned 64-bit long, which has a minimum value of 0 and a maximum value of 2^64-1. Use this data type when you need a range of values wider than those provided by int. The Long class also contains methods like compareUnsigned, divideUnsigned etc to support arithmetic operations for unsigned long.

  • float: The float data type is a single-precision 32-bit IEEE 754 floating point. Its range of values is beyond the scope of this discussion, but is specified in the Floating-Point Types, Formats, and Values section of the Java Language Specification. As with the recommendations for byte and short, use a float (instead of double) if you need to save memory in large arrays of floating point numbers. This data type should never be used for precise values, such as currency. For that, you will need to use the java.math.BigDecimal class instead. Numbers and Strings covers BigDecimal and other useful classes provided by the Java platform.

  • double: The double data type is a double-precision 64-bit IEEE 754 floating point. Its range of values is beyond the scope of this discussion, but is specified in the Floating-Point Types, Formats, and Values section of the Java Language Specification. For decimal values, this data type is generally the default choice. As mentioned above, this data type should never be used for precise values, such as currency.

  • boolean: The boolean data type has only two possible values: true and false. Use this data type for simple flags that track true/false conditions. This data type represents one bit of information, but its "size" isn't something that's precisely defined.

  • char: The char data type is a single 16-bit Unicode character. It has a minimum value of '\u0000' (or 0) and a maximum value of '\uffff' (or 65,535 inclusive).
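To make the unsigned support described for int and long a little more concrete, here is a small sketch that reinterprets one signed bit pattern using the static helpers on Integer and Long (toUnsignedLong, toUnsignedString, parseUnsignedInt, compareUnsigned, divideUnsigned). The class name UnsignedDemo is illustrative only.

```java
// Illustrative class (hypothetical name): reading Java's signed int/long bit
// patterns as unsigned values with the static helpers added in Java SE 8.
final class UnsignedDemo {

    public static void main(String[] args) {
        int allOnes = -1; // bit pattern 0xFFFFFFFF: 4,294,967,295 when read as unsigned

        // Widen to long without sign extension to recover the unsigned magnitude.
        System.out.println(Integer.toUnsignedLong(allOnes));     // 4294967295

        // String conversion and parsing that use the unsigned interpretation.
        System.out.println(Integer.toUnsignedString(allOnes));   // 4294967295
        System.out.println(Integer.parseUnsignedInt("4294967295") == allOnes); // true

        // Signed comparison treats the pattern as -1; unsigned treats it as the largest value.
        System.out.println(Integer.compare(allOnes, 1) < 0);         // true
        System.out.println(Integer.compareUnsigned(allOnes, 1) > 0); // true

        // Unsigned division: 4,294,967,295 / 2 = 2,147,483,647.
        System.out.println(Integer.divideUnsigned(allOnes, 2));  // 2147483647

        // Long offers the same helpers for unsigned 64-bit values.
        System.out.println(Long.toUnsignedString(-1L));          // 18446744073709551615
    }
}
```

The bits themselves never change; only the interpretation applied by each helper does, which is why these methods are static utilities rather than new types.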

When working with binary messaging, we often need to represent data using datatypes that are not native to Java, that is, types not included in the list above.

MQTT has the following set of datatypes:

1.5.1 Bits
Bits in a byte are labelled 7 to 0. Bit number 7 is the most significant bit, the least significant bit is assigned bit number 0.

1.5.2 Two Byte Integer
Two Byte Integer data values are 16-bit unsigned integers in big-endian order: the high order byte precedes the lower order byte. This means that a 16-bit word is presented on the network as Most Significant Byte (MSB), followed by Least Significant Byte (LSB).

1.5.3 Four Byte Integer
Four Byte Integer data values are 32-bit unsigned integers in big-endian order: the high order byte precedes the successively lower order bytes. This means that a 32-bit word is presented on the network as Most Significant Byte (MSB), followed by the next most Significant Byte (MSB), followed by the next most Significant Byte (MSB), followed by Least Significant Byte (LSB).

1.5.4 UTF-8 Encoded String
Text fields within the MQTT Control Packets described later are encoded as UTF-8 strings. UTF-8 [RFC3629] is an efficient encoding of Unicode [Unicode] characters that optimizes the encoding of ASCII characters in support of text-based communications. 
Each of these strings is prefixed with a Two Byte Integer length field that gives the number of bytes in a UTF-8 encoded string itself, as illustrated in Figure 1.1 Structure of UTF-8 Encoded Strings below.  Consequently, the maximum size of a UTF-8 Encoded String is 65,535 bytes. Unless stated otherwise all UTF-8 encoded strings can have any length in the range 0 to 65,535 bytes.

1.5.5 Variable Byte Integer
The Variable Byte Integer is encoded using an encoding scheme which uses a single byte for values up to 127. Larger values are handled as follows. The least significant seven bits of each byte encode the data, and the most significant bit is used to indicate whether there are bytes following in the representation. Thus, each byte encodes 128 values and a "continuation bit". The maximum number of bytes in the Variable Byte Integer field is four. The encoded value MUST use the minimum number of bytes necessary to represent the value [MQTT-1.5.5-1]. 

1.5.6 Binary Data
Binary Data is represented by a Two Byte Integer length which indicates the number of data bytes, followed by that number of bytes. Thus, the length of Binary Data is limited to the range of 0 to 65,535 Bytes.

1.5.7 UTF-8 String Pair
A UTF-8 String Pair consists of two UTF-8 Encoded Strings. This data type is used to hold name-value pairs. The first string serves as the name, and the second string contains the value. Both strings MUST comply with the requirements for UTF-8 Encoded Strings [MQTT-1.5.7-1]. If a receiver (Client or Server) receives a string pair which does not meet these requirements it is a Malformed Packet. Refer to section 4.13 for information about handling errors.
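Of these, the Variable Byte Integer involves the most logic. The following is a minimal Java sketch that mirrors the encoding and decoding steps described in section 1.5.5: seven data bits per byte, least significant group first, with the high bit as the continuation flag. The class and method names are hypothetical, and the decoder simply rejects a fifth byte rather than implementing full Malformed Packet handling.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

// Hypothetical helper class: a sketch of the MQTT 5.0 Variable Byte Integer.
final class VariableByteInteger {

    // Largest value that fits in four bytes: 0xFF 0xFF 0xFF 0x7F.
    static final int MAX_VALUE = 268_435_455;

    static byte[] encode(int value) {
        if (value < 0 || value > MAX_VALUE) {
            throw new IllegalArgumentException("Out of range: " + value);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream(4);
        do {
            int encodedByte = value % 128;   // low seven bits of what remains
            value /= 128;
            if (value > 0) {
                encodedByte |= 0x80;         // continuation bit: more bytes follow
            }
            out.write(encodedByte);
        } while (value > 0);                 // stops once the value is exhausted,
                                             // so the minimum number of bytes is used
        return out.toByteArray();
    }

    static int decode(ByteBuffer in) {
        int value = 0;
        int multiplier = 1;
        int count = 0;
        int encodedByte;
        do {
            if (++count > 4) {
                throw new IllegalArgumentException("Malformed Variable Byte Integer");
            }
            encodedByte = in.get() & 0xFF;   // treat the Java byte as unsigned
            value += (encodedByte & 0x7F) * multiplier;
            multiplier *= 128;
        } while ((encodedByte & 0x80) != 0);
        return value;
    }

    public static void main(String[] args) {
        for (int v : new int[] {0, 127, 128, 16_383, 16_384, MAX_VALUE}) {
            byte[] bytes = encode(v);
            System.out.println(v + " -> " + bytes.length + " byte(s) -> "
                    + decode(ByteBuffer.wrap(bytes)));
        }
    }
}
```

As written, decode() accepts non-minimal encodings such as 0x80 0x00 (zero in two bytes); a receiver that wants to enforce [MQTT-1.5.5-1] would need an additional check.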

So, how do we represent these MQTT datatypes within Java?

Bits can be represented using java.util.BitSet, which is straightforward, or by masking and shifting the byte directly.
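A minimal sketch of both options, assuming the byte has already been read from the packet. It relies on the documented behaviour of BitSet.valueOf(byte[]), where BitSet index n corresponds to bit n of the first byte counted from the least significant end; that lines up with MQTT's labelling of bit 7 as most significant and bit 0 as least significant. The class name MqttBits and the sample value are illustrative.

```java
import java.util.BitSet;

// Hypothetical helper class: inspecting the bits of one MQTT byte,
// numbered 7 (most significant) down to 0 (least significant).
final class MqttBits {

    // Mask-and-shift: true if bit n (0..7) of b is set.
    static boolean isSet(byte b, int n) {
        return ((b >> n) & 1) == 1;
    }

    // BitSet.valueOf maps bit n of bytes[0] to BitSet index n, LSB first.
    static BitSet toBitSet(byte b) {
        return BitSet.valueOf(new byte[] {b});
    }

    public static void main(String[] args) {
        byte flags = (byte) 0b1000_0010; // sample value with bits 7 and 1 set
        BitSet bits = toBitSet(flags);
        for (int n = 7; n >= 0; n--) {
            System.out.println("bit " + n + ": " + bits.get(n) + " / " + isSet(flags, n));
        }
    }
}
```

For a single flags byte, the mask-and-shift form avoids allocating an object; BitSet becomes more attractive when many flags are handled together.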

A Two Byte Integer in MQTT is unsigned, so its minimum value is 0 and its maximum is 65,535. Compare this with Java's two-byte integer, the short, which ranges from a minimum of -32,768 to a maximum of 32,767. Both hold the same number of distinct values (65,536), but the ranges differ. As such, a simple pattern for representing this unsigned two-byte integer in Java is to "go one size up": use an int within code and then convert to a two-byte representation when transmitting over the wire.
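The sketch below applies that pattern, and the same idea one size further up for the Four Byte Integer, which is held in a long. It assumes a java.nio.ByteBuffer is used for wire I/O; a ByteBuffer starts out in big-endian order, which matches MQTT. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;

// Hypothetical helper class: hold MQTT unsigned integers in the next larger
// Java type and narrow only when writing to the wire.
final class MqttIntegers {

    static final int MAX_TWO_BYTE = 0xFFFF;          // 65,535
    static final long MAX_FOUR_BYTE = 0xFFFF_FFFFL;  // 4,294,967,295

    // Two Byte Integer: an int in code, two big-endian bytes on the wire.
    static void putTwoByteInteger(ByteBuffer buf, int value) {
        if (value < 0 || value > MAX_TWO_BYTE) {
            throw new IllegalArgumentException("Two Byte Integer out of range: " + value);
        }
        buf.putShort((short) value);                 // keeps the low 16 bits
    }

    static int getTwoByteInteger(ByteBuffer buf) {
        return buf.getShort() & 0xFFFF;              // mask off sign extension
    }

    // Four Byte Integer: a long in code, four big-endian bytes on the wire.
    static void putFourByteInteger(ByteBuffer buf, long value) {
        if (value < 0 || value > MAX_FOUR_BYTE) {
            throw new IllegalArgumentException("Four Byte Integer out of range: " + value);
        }
        buf.putInt((int) value);                     // keeps the low 32 bits
    }

    static long getFourByteInteger(ByteBuffer buf) {
        return Integer.toUnsignedLong(buf.getInt());
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(6);     // big-endian by default
        putTwoByteInteger(buf, 65_535);
        putFourByteInteger(buf, 4_000_000_000L);
        buf.flip();
        System.out.println(getTwoByteInteger(buf));  // 65535
        System.out.println(getFourByteInteger(buf)); // 4000000000
    }
}
```

The range checks sit in the put methods because narrowing casts such as (short) value silently discard high-order bits rather than failing.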

The question of why Java does not natively support unsigned datatypes is well-trodden ground:

"Use the Integer class to use int data type as an unsigned integer. See the section The Number Classes for more information. Static methods like compareUnsigned, divideUnsigned etc have been added to the Integer class to support the arithmetic operations for unsigned integers."

Stack Overflow

In the MemoryMappedFile implementation for MQTT persistence, we find something atypical for Java: methods in java.io.RandomAccessFile that read unsigned datatypes.

import java.io.RandomAccessFile;

public final int readUnsignedByte() throws IOException

Reads an unsigned eight-bit number from this file. This method reads a byte from this file, starting at the current file pointer, and returns that byte.

This method blocks until the byte is read, the end of the stream is detected, or an exception is thrown.

Specified by: readUnsignedByte() in DataInput

public final int readUnsignedShort() throws IOException

Reads an unsigned 16-bit number from this file. This method reads two bytes from the file, starting at the current file pointer. If the bytes read, in order, are b1 and b2, where 0 <= b1, b2 <= 255, then the result is equal to:

     (b1 << 8) | b2 

This method blocks until the two bytes are read, the end of the stream is detected, or an exception is thrown.

Specified by: readUnsignedShort() in DataInput
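Because DataInput declares readUnsignedShort() and readFully(), the length-prefixed MQTT types (Binary Data and UTF-8 Encoded String) can be read in a few lines against that interface, so the same code works over a RandomAccessFile or a DataInputStream. This is a sketch: the class and method names are hypothetical, and the byte array in main stands in for real packet or file data. It deliberately avoids DataInput.readUTF(): although that method also uses a two-byte length prefix, it expects Java's "modified UTF-8", which encodes characters outside the Basic Multilingual Plane differently from standard UTF-8.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class: reading MQTT length-prefixed fields through DataInput,
// which both RandomAccessFile and DataInputStream implement.
final class MqttFieldReader {

    // Binary Data: Two Byte Integer length, then that many bytes.
    static byte[] readBinaryData(DataInput in) throws IOException {
        int length = in.readUnsignedShort(); // 0..65,535, returned "one size up" as an int
        byte[] data = new byte[length];
        in.readFully(data);
        return data;
    }

    // UTF-8 Encoded String: same framing, decoded as standard UTF-8
    // (not readUTF(), which expects modified UTF-8).
    static String readUtf8String(DataInput in) throws IOException {
        return new String(readBinaryData(in), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Length 0x0005 followed by the UTF-8 bytes of "hello".
        byte[] wire = {0x00, 0x05, 'h', 'e', 'l', 'l', 'o'};
        DataInput in = new DataInputStream(new ByteArrayInputStream(wire));
        System.out.println(readUtf8String(in)); // hello
    }
}
```

Note that new String(bytes, StandardCharsets.UTF_8) substitutes U+FFFD for malformed input rather than failing. Since the specification treats ill-formed UTF-8 as a Malformed Packet, a stricter reader could decode through a CharsetDecoder configured with CodingErrorAction.REPORT.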


So, while you may spend time creating "one size up" representations in Java so that your code can work with MQTT, you may also find some useful parts of the core Java libraries that do support unsigned datatypes. Even these follow the same pattern: readUnsignedByte() and readUnsignedShort() both return an int.



See also:


https://github.com/AlignmentSystems/CodeGen
