
Intro

In the last post we built our foundation. One TCP connection, a 2-byte length frame, a stack of layers, and a three-message handshake that leaves both sides holding a shared secret. We stopped right as the encrypted frames started.

This post is about those layers: everything that happens to a frame after the handshake and before it leaves the machine. That is three things. The keys get derived from the shared secret, the payload gets encrypted, and an integrity check gets stamped on the end. There is also a compression layer underneath, so we cover that too.

As before, the code is the OpenMMO reference. The Game vs OpenMMO notes flag where the real client’s exact numbers still need confirming.

From a shared secret to keys

At the end of the handshake both sides run the key exchange over their ephemeral keys and end up with the same shared secret. A shared secret is not a key, though. It is a blob of bytes. We need to turn it into something the cipher can use, and we want each direction of traffic to use a different key so that a frame captured going one way cannot be replayed back the other way.

The derivation is a small hash sandwich. Take a salt, the secret, and the salt again, hash all three together, and keep the first 16 bytes. Two different salts give two different keys, one for the client direction and one for the server direction.

import java.security.MessageDigest

// salt || secret || salt, hashed, first 16 bytes kept
fun derive16(secret: ByteArray, salt: ByteArray): ByteArray {
  val sha = MessageDigest.getInstance("SHA-256")
  sha.update(salt)
  sha.update(secret)
  sha.update(salt)
  return sha.digest().copyOfRange(0, 16)
}

// "KeySalt" plus one trailing byte that differs per direction
val CLIENT_SALT = "KeySalt".toByteArray() + 1.toByte()
val SERVER_SALT = "KeySalt".toByteArray() + 2.toByte()

val clientKey = derive16(secret, CLIENT_SALT)
val serverKey = derive16(secret, SERVER_SALT)

Whoever you are decides which key you encrypt with and which you decrypt with. The client encrypts with the client key and decrypts with the server key. The server does the mirror image.
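Here is a minimal sketch of that role mapping. DirectionKeys and keysFor are hypothetical names made up for this example, not something taken from the reference. derive16 and the salts are repeated so the snippet stands on its own.

```kotlin
import java.security.MessageDigest

// Same hash sandwich as above: salt || secret || salt, first 16 bytes kept.
fun derive16(secret: ByteArray, salt: ByteArray): ByteArray {
    val sha = MessageDigest.getInstance("SHA-256")
    sha.update(salt)
    sha.update(secret)
    sha.update(salt)
    return sha.digest().copyOfRange(0, 16)
}

// Hypothetical holder: the two keys as seen from one endpoint.
class DirectionKeys(val sendKey: ByteArray, val recvKey: ByteArray)

// The client sends with the client key and receives with the server key.
// The server gets the same two keys with the roles swapped.
fun keysFor(secret: ByteArray, isClient: Boolean): DirectionKeys {
    val clientKey = derive16(secret, "KeySalt".toByteArray() + byteArrayOf(1))
    val serverKey = derive16(secret, "KeySalt".toByteArray() + byteArrayOf(2))
    return if (isClient) {
        DirectionKeys(clientKey, serverKey)
    } else {
        DirectionKeys(serverKey, clientKey)
    }
}
```

With the same secret on both ends, keysFor(secret, true).sendKey and keysFor(secret, false).recvKey come out identical, which is what lets each side decrypt what the other encrypts.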

AES in CTR mode

The cipher is AES in CTR mode. CTR turns a block cipher into a stream cipher, and the practical consequence that matters for us is that the output is exactly as long as the input. Encrypting 40 bytes gives you 40 bytes, not 48 rounded up to a block.

That is why the framing from Post 1 can be so simple. The length prefix is written after encryption, and because encryption does not change the length, nothing needs to be recomputed or padded. The number you read off the wire is the number of encrypted bytes.

The initialization vector is derived from the key with its own salt, the same hash sandwich method as before. Key in, run it through the hash with an IV salt, and you have a deterministic IV both sides can compute without sending it.

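import javax.crypto.Cipher
import javax.crypto.spec.IvParameterSpec
import javax.crypto.spec.SecretKeySpec
// seed is this direction's 16-byte key; mode is Cipher.ENCRYPT_MODE or Cipher.DECRYPT_MODE
val cipher = Cipher.getInstance("AES/CTR/NoPadding")
val key = SecretKeySpec(seed, "AES")
val iv = IvParameterSpec(derive16(seed, "IVDERIV".toByteArray()))
cipher.init(mode, key, iv)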
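To see the length-preserving property in action, here is a small round-trip sketch. ctrCipher is a hypothetical wrapper name around the four lines above. It also assumes one cipher instance is kept per direction and fed with update, so the keystream keeps running across frames. That part is my assumption, not something confirmed here.

```kotlin
import java.security.MessageDigest
import javax.crypto.Cipher
import javax.crypto.spec.IvParameterSpec
import javax.crypto.spec.SecretKeySpec

// Hash sandwich from earlier, repeated so this compiles on its own.
fun derive16(secret: ByteArray, salt: ByteArray): ByteArray {
    val sha = MessageDigest.getInstance("SHA-256")
    sha.update(salt)
    sha.update(secret)
    sha.update(salt)
    return sha.digest().copyOfRange(0, 16)
}

// Hypothetical wrapper: builds the CTR cipher for one direction.
// seed is that direction's 16-byte key; the IV is derived from it, never sent.
fun ctrCipher(mode: Int, seed: ByteArray): Cipher {
    val cipher = Cipher.getInstance("AES/CTR/NoPadding")
    val key = SecretKeySpec(seed, "AES")
    val iv = IvParameterSpec(derive16(seed, "IVDERIV".toByteArray()))
    cipher.init(mode, key, iv)
    return cipher
}

// Encrypts then decrypts, returning (ciphertext length, recovered plaintext).
fun roundTrip(seed: ByteArray, plain: ByteArray): Pair<Int, ByteArray> {
    val enc = ctrCipher(Cipher.ENCRYPT_MODE, seed)
    val dec = ctrCipher(Cipher.DECRYPT_MODE, seed)
    val ciphertext = enc.update(plain)   // CTR emits one byte per input byte
    return ciphertext.size to dec.update(ciphertext)
}
```

Feed roundTrip a 40-byte payload and the first element of the pair is 40, with the second element equal to the original bytes.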

Integrity, and the round counter

Encryption keeps traffic private. It does not, on its own, stop someone from flipping bits or replaying an old frame. That is what the integrity check is for. Every frame after the handshake carries a keyed check appended to the end, and its size is the one the server picked back in ServerHello.

The interesting part is not the hash, it is the counter. Each direction keeps a number that starts at zero and goes up by one for every frame. That counter is mixed into the check. So even two identical frames produce two different checks, and a frame replayed out of order fails because the counter no longer lines up. You get ordering and replay resistance for free, without sending the counter anywhere. Both sides just count.

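// mac is the keyed hash for this direction, size is the check length from ServerHello,
// and intToBytes is a small helper that encodes the counter as bytes
private var round = 0

fun stamp(payload: ByteArray): ByteArray {
  mac.update(payload)
  mac.update(intToBytes(round++))   // the counter goes into the mac, not the wire
  return mac.doFinal().copyOfRange(0, size)   // truncated to the negotiated size
}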

Verification is the same computation on the other end. Recompute the check over the received bytes with the receiver’s own counter, and compare against the check that came in. If they differ, drop the connection. Because the counter is never transmitted, an attacker cannot fix it up.
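Both halves can be sketched together. Several things here are assumptions for illustration: HmacSHA256 as the keyed hash, a 4-byte big-endian encoding for the counter, and the names FrameCheck, seal and open. The real client's choices may differ.

```kotlin
import java.nio.ByteBuffer
import java.security.MessageDigest
import javax.crypto.Mac
import javax.crypto.spec.SecretKeySpec

// One direction of the integrity layer. Sender and receiver each hold their
// own instance with the same key, and each counts frames independently.
class FrameCheck(key: ByteArray, private val size: Int) {
    // Assumption: HMAC-SHA256 stands in for whatever keyed hash is really used.
    private val mac = Mac.getInstance("HmacSHA256").apply {
        init(SecretKeySpec(key, "HmacSHA256"))
    }
    private var round = 0

    // payload, then the counter, hashed and truncated to the negotiated size.
    private fun compute(payload: ByteArray): ByteArray {
        mac.update(payload)
        mac.update(ByteBuffer.allocate(4).putInt(round++).array())
        return mac.doFinal().copyOfRange(0, size)
    }

    // Sender side: payload followed by its check.
    fun seal(payload: ByteArray): ByteArray = payload + compute(payload)

    // Receiver side: returns the payload, or null when the check does not match.
    fun open(frame: ByteArray): ByteArray? {
        if (frame.size < size) return null
        val payload = frame.copyOfRange(0, frame.size - size)
        val received = frame.copyOfRange(frame.size - size, frame.size)
        // MessageDigest.isEqual compares without an early exit on mismatch.
        return if (MessageDigest.isEqual(compute(payload), received)) payload else null
    }
}
```

Sealing the same payload twice gives two different frames, because the counter moved between them. Handing the receiver the first frame a second time returns null, since its counter has already advanced past zero.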

Before the keys exist

There is a small chicken-and-egg problem. The handshake messages themselves travel before any keys exist, so they cannot use the check above. For that early window there is a lighter check in place, and in the reference it is effectively a placeholder until the real one is installed.

The handoff happens at the exact moment the handshake completes. Up to that point the connection uses the cheap check and no cipher. The instant both sides have derived their keys, the checksum and cipher for the session get swapped in, and every frame from then on is sealed with the real thing.

Compression

One more layer lives underneath, and it is the simplest of the lot. Compression is per-protocol, not always on. In the reference only the game protocol turns it on, and even then only for messages from the server to the client. Neither the login nor the chat protocol bothers.

The layout is small. The opcode byte passes straight through untouched. Then a single flag byte says whether what follows is raw or compressed. Then the payload.

+--------+------+--------------------------+
| opcode | flag | payload (raw or deflated)|
+--------+------+--------------------------+
           0 = raw
           1 = deflated

There is a size threshold. If the payload is under the threshold (256 bytes in the reference), deflating it is not worth the CPU or the few bytes of overhead, so it is sent raw with the flag at zero. Only larger payloads get compressed.

val opcode = msg.readByte()
out.writeByte(opcode)             // opcode always passes through

if (payloadLen < THRESHOLD) {
  out.writeByte(0)                // raw
  out.writeBytes(payload)
} else {
  out.writeByte(1)                // deflated
  out.writeBytes(deflate(payload))
}

The deflate itself uses a streaming sync flush rather than a one-shot compress, with a small trailing marker, so a payload can be compressed as part of a running stream rather than as an isolated block. For our purposes the flag byte is the thing to remember. See a one, inflate. See a zero, read it straight.
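A self-contained sketch of both ends, using java.util.zip. Compressor and Decompressor are hypothetical names. The zlib-wrapped default Deflater and the 256-byte threshold are assumptions carried over from the reference, so treat this as an illustration rather than the real client's code.

```kotlin
import java.io.ByteArrayOutputStream
import java.util.zip.Deflater
import java.util.zip.Inflater

const val THRESHOLD = 256   // reference value; the real client's may differ

// Sender side. One Deflater lives for the whole connection.
class Compressor {
    private val deflater = Deflater()

    fun encode(opcode: Byte, payload: ByteArray): ByteArray {
        val out = ByteArrayOutputStream()
        out.write(opcode.toInt())              // opcode always passes through
        if (payload.size < THRESHOLD) {
            out.write(0)                       // raw
            out.write(payload)
        } else {
            out.write(1)                       // deflated
            out.write(deflateSync(payload))
        }
        return out.toByteArray()
    }

    // SYNC_FLUSH pushes out everything for this payload and keeps the stream open.
    private fun deflateSync(payload: ByteArray): ByteArray {
        deflater.setInput(payload)
        val out = ByteArrayOutputStream()
        val buf = ByteArray(4096)
        while (true) {
            val n = deflater.deflate(buf, 0, buf.size, Deflater.SYNC_FLUSH)
            out.write(buf, 0, n)
            if (n < buf.size) break            // a full buffer means more is pending
        }
        return out.toByteArray()
    }
}

// Receiver side. Frames must arrive in the order they were compressed.
class Decompressor {
    private val inflater = Inflater()

    fun decode(frame: ByteArray): Pair<Byte, ByteArray> {
        val opcode = frame[0]
        val body = frame.copyOfRange(2, frame.size)
        if (frame[1].toInt() == 0) return opcode to body   // see a zero, read it straight
        inflater.setInput(body)                            // see a one, inflate
        val out = ByteArrayOutputStream()
        val buf = ByteArray(4096)
        while (true) {
            val n = inflater.inflate(buf)
            if (n == 0) break                              // input exhausted
            out.write(buf, 0, n)
        }
        return opcode to out.toByteArray()
    }
}
```

Because the deflater is shared across frames, a second large payload can refer back to data from the first, which is where the running stream pays off. The trailing marker mentioned above shows up as the last four bytes of every deflated payload: 00 00 FF FF.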

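Game vs OpenMMO. The 256-byte threshold in OpenMMO was chosen arbitrarily, so the real client's value may differ. The stream flag, on the other hand, has to be set correctly, because the receiver trusts it to decide whether to inflate.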

One frame, all the layers

Let us trace a single outbound frame from the top down, so the order from Post 1 is concrete.

packet object
   -> serialize to  [ opcode | body ]
   -> compress      [ opcode | flag | payload ]      (game protocol only)
   -> encrypt       the bytes after the opcode become ciphertext
   -> checksum      append the keyed check (counter mixed in)
   -> frame         prepend the 2-byte little-endian length
   -> socket

Inbound is the same list read bottom to top. Strip the length, verify and strip the check, decrypt, inflate if the flag says so, and you are holding opcode + body again. That body is a packet, and reading a packet is the whole subject of the next post.
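The outermost step is small enough to sketch in full. frame and unframe are hypothetical names, and this assumes the 2-byte length counts only the bytes that follow it, not the prefix itself. Check that against the framing rules from Post 1 before relying on it.

```kotlin
// Outbound: prepend the 2-byte little-endian length of the sealed bytes.
fun frame(body: ByteArray): ByteArray {
    require(body.size <= 0xFFFF) { "frame too large for a 2-byte length" }
    val header = byteArrayOf(
        (body.size and 0xFF).toByte(),          // low byte first
        ((body.size shr 8) and 0xFF).toByte()   // then the high byte
    )
    return header + body
}

// Inbound: read the length back and strip the prefix.
fun unframe(wire: ByteArray): ByteArray {
    require(wire.size >= 2) { "missing length prefix" }
    val len = (wire[0].toInt() and 0xFF) or ((wire[1].toInt() and 0xFF) shl 8)
    return wire.copyOfRange(2, 2 + len)
}
```

A 300-byte body goes out with the prefix 2C 01, since 300 is 0x012C and the low byte travels first.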

Next time

In Post 3 we finally open a packet and read it field by field. Here is the twist that surprised me when I first dug in. The real game has no clever serialization system at all. Each packet just has a method that writes its fields onto the buffer by hand, and another that reads them back. We will walk a real one, the login packet, byte by byte, and then look at how the whole login exchange fits together.

See you in the next one.

PokeMMO Network Protocol 2: Sealing the Frame