$ cat pokemmo-network-protocol-1.md
PokeMMO Network Protocol 1: Getting Connected
Intro
A while back I wrote two posts about snooping on PokeMMO’s packets. We hooked the send method, then the receive method, and watched packet objects fly past. But that left the real question open: what actually goes over the wire, and how is it built?
This series answers that, from the raw TCP bytes up to the packets themselves. I have been rebuilding the server side from scratch in a project called OpenMMO, which gives us clean, renamed, non-obfuscated code to read instead of looking at decompiler output all day.
One warning up front. OpenMMO is a reimplementation, not a copy. Where it matches the real client I will say so, and where it makes its own choice I will flag it with a short Game vs OpenMMO note so you always know which one you are looking at.
Same disclaimers as always. Obfuscated names in the real client change on every
update, so anything short like x40 is a snapshot and will be different by the
time you read this. Running a private server may be against the PokeMMO ToS. This
is for learning.
This first post is the foundation. One connection, the layers a byte passes through, the handshake that sets up trust, and finally how you point the real client at a server of your own.
One connection, one frame
PokeMMO talks to each server over a single TCP connection. TCP gives you a stream of bytes with no message boundaries, so the very first job is to cut that stream back into messages. PokeMMO does the simplest thing that works. Every message is prefixed with its length.
The prefix is a 2-byte little-endian length. Read two bytes, and that value tells you how many bytes the rest of the message is. Wait until you have that many, then hand the whole thing up. This is the exact same value the receive loop in Snooper 2 kept peeking at before it read a packet.
+---------------+---------------------+
| len (2 bytes) | body (len bytes)    |
+---------------+---------------------+
The reference decoder in OpenMMO is just Netty’s length-field decoder wired for a
2-byte little-endian prefix, with a maximum frame of 0xFFFF. Because the prefix
is two bytes, a single frame body can never be larger than 65535 bytes. Anything
bigger has to be split up before it is sent, which matters later when we get to
big game packets.
Game vs OpenMMO. The real client hand-rolls this on a Java NIO
ByteBuffer, reading the short and slicing the buffer itself. OpenMMO uses a named Netty handler for the same job. Same two bytes on the wire.
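To make the framing concrete, here is a minimal hand-rolled sketch on a plain `ByteBuffer`. It is not the client's code and not OpenMMO's Netty handler. The names `readFrames` and `writeFrame` are mine, and it only assumes what was said above: a 2-byte little-endian length followed by that many bytes.

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

/**
 * Pulls every complete frame out of [buf] (which must be in read mode) and
 * returns the bodies. Bytes of an incomplete trailing frame stay in the buffer.
 */
fun readFrames(buf: ByteBuffer): List<ByteArray> {
    buf.order(ByteOrder.LITTLE_ENDIAN)
    val frames = mutableListOf<ByteArray>()
    while (buf.remaining() >= 2) {
        buf.mark()
        val len = buf.short.toInt() and 0xFFFF // read the prefix as unsigned 16-bit
        if (buf.remaining() < len) {
            buf.reset() // body not fully here yet, rewind to before the prefix
            break
        }
        val body = ByteArray(len)
        buf.get(body)
        frames += body
    }
    return frames
}

/** Outbound direction: prefix [body] with its 2-byte little-endian length. */
fun writeFrame(body: ByteArray): ByteArray {
    require(body.size <= 0xFFFF) { "frame body too large: ${body.size}" }
    return ByteBuffer.allocate(2 + body.size)
        .order(ByteOrder.LITTLE_ENDIAN)
        .putShort(body.size.toShort())
        .put(body)
        .array()
}
```

The `mark()` and `reset()` pair is the important bit. TCP can hand you half a frame, so the reader has to be able to back out and try again once more bytes arrive.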
The pipeline
Framing is only the first of several layers. A byte that arrives on the socket passes through a small stack of transforms before it becomes a packet you can act on, and a byte you send passes through the same stack in reverse.
inbound (bytes off the socket)
        |
        v
[ frame       ]  strip the 2-byte length, hand up one whole frame
[ checksum    ]  verify and strip the trailing integrity check
[ cipher      ]  decrypt the payload
[ compression ]  optional, inflate the payload if the flag says so
[ protocol    ]  turn opcode + bytes into a typed packet

outbound is the same list, bottom to top
Each layer has one job and does not care about the others. The cipher does not know what a packet is, it just turns bytes into other bytes. The framing does not know anything is encrypted. This separation is the whole reason the protocol is tractable to reverse. You can understand one layer at a time, which is exactly how the rest of this series is organized. Post 2 is the checksum, cipher, and compression layers. Post 3 is the protocol layer on top.
For now the only thing worth remembering is the order. Framing on the outside, then integrity, then encryption, then optional compression, then the packet itself on the inside.
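If it helps to see the "same stack, in reverse" idea as code, here is a toy sketch. Everything in it is made up for illustration: the `Layer` interface, the one-byte sum standing in for the checksum, and the XOR standing in for the cipher. The real layers are the subject of Post 2. The only thing this is meant to show is the ordering.

```kotlin
/** One layer of the stack: a reversible bytes-to-bytes transform. */
interface Layer {
    fun decode(input: ByteArray): ByteArray // inbound direction
    fun encode(input: ByteArray): ByteArray // outbound direction
}

/** Toy stand-in for the checksum layer: one trailing byte holding the sum of the payload. */
object SumTrailer : Layer {
    private fun sum(b: ByteArray): Byte = b.fold(0) { acc, x -> acc + x }.toByte()

    override fun encode(input: ByteArray): ByteArray = input + sum(input)

    override fun decode(input: ByteArray): ByteArray {
        val body = input.copyOfRange(0, input.size - 1)
        check(sum(body) == input.last()) { "integrity check failed" }
        return body
    }
}

/** Toy stand-in for the cipher layer: XOR with a fixed byte. Not real encryption. */
class XorMask(private val key: Byte) : Layer {
    private fun mask(b: ByteArray): ByteArray =
        ByteArray(b.size) { (b[it].toInt() xor key.toInt()).toByte() }

    override fun encode(input: ByteArray): ByteArray = mask(input)
    override fun decode(input: ByteArray): ByteArray = mask(input)
}

/** Layers are listed outermost first, matching the inbound order. */
class Pipeline(private val layers: List<Layer>) {
    fun inbound(frame: ByteArray): ByteArray =
        layers.fold(frame) { bytes, layer -> layer.decode(bytes) }

    fun outbound(packet: ByteArray): ByteArray =
        layers.asReversed().fold(packet) { bytes, layer -> layer.encode(bytes) }
}
```

With `Pipeline(listOf(SumTrailer, XorMask(0x5A)))`, an outbound packet is masked first and gets its trailer second, so the integrity check ends up covering the masked bytes. Inbound undoes that in the opposite order.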
Opcodes and direction
Once a frame is decrypted and, if needed, decompressed, what is left is a packet. Every packet starts with a one-byte opcode that says what it is. A login response is one opcode, a chat message is another.
There is a subtlety worth calling out early. An opcode is only meaningful together with a direction. The same opcode number can mean one thing when the client sends it and something completely different when the server sends it. So the real identity of a packet is the pair, opcode plus direction, not the opcode alone. Keep that in mind, it saves a lot of confusion when you start reading a capture and the numbers seem to collide.
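One way to keep yourself honest about this when writing tooling is to make the pair the lookup key. A small sketch, with hypothetical opcode numbers and packet names that are not taken from the real protocol:

```kotlin
enum class Direction { CLIENT_TO_SERVER, SERVER_TO_CLIENT }

/** The real identity of a packet: who sent it, plus the one-byte opcode. */
data class PacketKey(val direction: Direction, val opcode: Int)

// Hypothetical opcode numbers and names, for illustration only.
val packetNames: Map<PacketKey, String> = mapOf(
    PacketKey(Direction.CLIENT_TO_SERVER, 0x01) to "LoginRequest",
    PacketKey(Direction.SERVER_TO_CLIENT, 0x01) to "LoginResponse",
)

fun describe(direction: Direction, payload: ByteArray): String {
    val opcode = payload[0].toInt() and 0xFF // first byte, read as unsigned
    return packetNames[PacketKey(direction, opcode)] ?: "unknown(0x%02X)".format(opcode)
}
```

Both entries use opcode `0x01` and there is no collision, because the map never sees the opcode on its own.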
The handshake
A fresh connection starts here. Before any real traffic flows, the client and server run a short handshake that agrees on encryption keys and, just as importantly, lets the client check that it is talking to the real server.
client                      server
|                                |
| ClientHello (timestamp)        |
| -----------------------------> |
|                                |
| ServerHello (key, signature)   |
| <----------------------------- |
|                                |
| ClientReady (key)              |
| -----------------------------> |
|                                |
| ... now everything is sealed   |
ClientHello is tiny. It carries a timestamp, but it does not send it in the clear. The timestamp is masked with a random number and two fixed keys baked into the client, so the same clock value never looks the same twice on the wire.
import kotlin.random.Random

// two fixed keys compiled into the client
private const val KEY_RANDOM = 3214621489648854472L
private const val KEY_TIMESTAMP = -4214651440992349575L

fun writeClientHello(buf: WriteBuffer, timestamp: Long) {
    val random = Random.nextLong()
    // first value: a fresh random, masked
    buf.writeLongLE(random xor KEY_RANDOM)
    // second value: the timestamp, masked by both a key and that same random
    buf.writeLongLE(timestamp xor KEY_TIMESTAMP xor random)
}
The receiver undoes it in the obvious order. Recover the random from the first value, then use it to recover the timestamp from the second. It is not encryption, it is obfuscation, and the point is only to make the opening bytes look like noise.
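Here is what that unmasking could look like. This is a sketch on a plain `ByteBuffer` rather than OpenMMO's buffer types, and `maskClientHello` and `readClientHello` are names I made up. `maskClientHello` takes the random as a parameter so the round trip is easy to check.

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Same two constants as in writeClientHello.
const val KEY_RANDOM = 3214621489648854472L
const val KEY_TIMESTAMP = -4214651440992349575L

/** Writer side: the same masking as writeClientHello, into a plain 16-byte array. */
fun maskClientHello(timestamp: Long, random: Long): ByteArray =
    ByteBuffer.allocate(16)
        .order(ByteOrder.LITTLE_ENDIAN)
        .putLong(random xor KEY_RANDOM)
        .putLong(timestamp xor KEY_TIMESTAMP xor random)
        .array()

/** Reader side: recover the random first, then use it to recover the timestamp. */
fun readClientHello(body: ByteArray): Long {
    val buf = ByteBuffer.wrap(body).order(ByteOrder.LITTLE_ENDIAN)
    val random = buf.long xor KEY_RANDOM
    return buf.long xor KEY_TIMESTAMP xor random
}
```

XOR is its own inverse, which is why the reader looks like the writer with the operands shuffled. Two hellos with the same timestamp and different randoms produce different bytes, and both unmask to the same value.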
ServerHello is where the real work happens. The server sends three things: an ephemeral (per-connection) public key for the key exchange, a signature over that key, and the size of the integrity check it wants to use for the rest of the session. Both the key and the signature are length-prefixed, so the layout is a length, then the bytes, then the next length, then its bytes, then a single byte for the checksum size.
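A parser for that layout might look like the sketch below. One loud assumption: I have not said how wide the length prefixes are, and this sketch simply picks a 2-byte little-endian length for both fields. Treat that width as a placeholder. The `ServerHello` class and `parseServerHello` are illustrative names, not OpenMMO's.

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

class ServerHello(
    val ephemeralPublicBytes: ByteArray,
    val signature: ByteArray,
    val checksumSize: Int,
)

// ASSUMPTION: each length prefix is a 2-byte little-endian unsigned value.
fun ByteBuffer.readPrefixed(): ByteArray {
    val len = short.toInt() and 0xFFFF
    return ByteArray(len).also { get(it) }
}

fun parseServerHello(body: ByteArray): ServerHello {
    val buf = ByteBuffer.wrap(body).order(ByteOrder.LITTLE_ENDIAN)
    val key = buf.readPrefixed()                  // length, then key bytes
    val sig = buf.readPrefixed()                  // length, then signature bytes
    val checksumSize = buf.get().toInt() and 0xFF // single trailing byte
    return ServerHello(key, sig, checksumSize)
}
```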
ClientReady closes it out. The client sends its own ephemeral public key. Now both sides have each other’s public key and can compute a shared secret. We save exactly how that secret becomes keys for Post 2.
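The "both sides compute the same secret" step is a standard Diffie-Hellman style key agreement. Here is a sketch using the JDK's built-in elliptic-curve support. The curve name is an assumption picked for illustration, since this post does not name the one the game uses, and the function names are mine.

```kotlin
import java.security.KeyPair
import java.security.KeyPairGenerator
import java.security.PublicKey
import java.security.spec.ECGenParameterSpec
import javax.crypto.KeyAgreement

// ASSUMPTION: secp256r1 is a placeholder curve, not necessarily the real one.
fun newEphemeralKeyPair(): KeyPair {
    val gen = KeyPairGenerator.getInstance("EC")
    gen.initialize(ECGenParameterSpec("secp256r1"))
    return gen.generateKeyPair()
}

/** Combine my private key with the other side's public key. */
fun sharedSecret(mine: KeyPair, theirs: PublicKey): ByteArray {
    val agreement = KeyAgreement.getInstance("ECDH")
    agreement.init(mine.private)
    agreement.doPhase(theirs, true)
    return agreement.generateSecret()
}
```

If the server calls `sharedSecret(serverPair, clientPair.public)` and the client calls `sharedSecret(clientPair, serverPair.public)`, they get the same bytes, even though only public keys ever crossed the wire.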
Trust, and why the signature matters
The interesting part of ServerHello is not the key, it is the signature. Anyone can generate a key pair and claim to be a server. The signature is what stops this.
The client ships with one known public key hardcoded in, a root key. When ServerHello arrives, the client uses that root key to verify the signature over the server’s ephemeral key. If the signature can’t be verified, the connection dies right there. In plain terms, the real server proves its identity by signing its session key with a key only it holds, and the client refuses to talk to anyone who cannot.
// pinned at build time, the client trusts exactly this key
val rootPublic: ECPublicKey = loadPinnedRootKey()

on<ServerHello> { hello ->
    if (!verify(rootPublic, hello.ephemeralPublicBytes, hello.signature)) {
        throw InvalidServerSignatureException() // hang up
    }
    // ... proceed to derive keys
}
This is also the single biggest obstacle if you want the client to talk to a server that is not the official one. Which brings us to the fun part.
Game vs OpenMMO. The pinning scheme, curve, and signature algorithm are functionally identical on both sides. They have to be. The client verifies the signature before anything else, so if OpenMMO got the crypto wrong the client would refuse to connect at all. It connects, so it matches.
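For a feel of what a `verify` like the one above does, here is a self-contained sketch using the JDK's `Signature` class. The algorithm name and the curve are assumptions chosen for illustration, since this post does not spell out which ones the game uses. `newKeyPair` and `sign` are helper names I made up so the example can play both roles.

```kotlin
import java.security.KeyPair
import java.security.KeyPairGenerator
import java.security.PrivateKey
import java.security.PublicKey
import java.security.Signature
import java.security.SignatureException
import java.security.spec.ECGenParameterSpec

// ASSUMPTION: ECDSA with SHA-256 on secp256r1, chosen only for illustration.
const val SIG_ALGORITHM = "SHA256withECDSA"

fun newKeyPair(): KeyPair {
    val gen = KeyPairGenerator.getInstance("EC")
    gen.initialize(ECGenParameterSpec("secp256r1"))
    return gen.generateKeyPair()
}

/** Server side: sign the ephemeral public key bytes with the root private key. */
fun sign(rootPrivate: PrivateKey, ephemeralPublicBytes: ByteArray): ByteArray {
    val s = Signature.getInstance(SIG_ALGORITHM)
    s.initSign(rootPrivate)
    s.update(ephemeralPublicBytes)
    return s.sign()
}

/** Client side: check the signature against the pinned root public key. */
fun verify(rootPublic: PublicKey, ephemeralPublicBytes: ByteArray, signature: ByteArray): Boolean =
    try {
        val s = Signature.getInstance(SIG_ALGORITHM)
        s.initVerify(rootPublic)
        s.update(ephemeralPublicBytes)
        s.verify(signature)
    } catch (e: SignatureException) {
        false // malformed signature bytes count as a failed check
    }
```

A signature made with the matching root private key passes. A signature made with any other key pair fails, which is exactly the position a homemade server is in until the pinned key is replaced.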
Pointing the client at your own server
Say you have your own server running and you want the real client to connect to it. Two things are in your way. The client connects to a hardcoded host, and it trusts a hardcoded root key. Both of those live as constants inside the compiled jar.
You do not need the source to change a constant in a jar. You can rewrite it in the bytecode. String constants show up in two obvious places in a Java class file: the initial values of static fields, and the constant-load instructions inside methods. If you visit every class and swap the strings you care about, you can repoint the host and replace the pinned key without touching a single line of source.
Here is the core of the reference patcher. It is an ASM class visitor that watches for string constants and replaces the ones that match.
import org.objectweb.asm.ClassVisitor
import org.objectweb.asm.MethodVisitor
import org.objectweb.asm.Opcodes

class StringPatcher(next: ClassVisitor, private val patches: List<Patch>) :
    ClassVisitor(Opcodes.ASM9, next) {

    data class Patch(val name: String, val original: String, val replacement: String)

    // return the replacement for an exact match, otherwise the value unchanged
    private fun patch(value: String): String =
        patches.firstOrNull { it.original == value }?.replacement ?: value

    // static final String FOO = "..." lives as a field value
    override fun visitField(a: Int, n: String, d: String, s: String?, v: Any?) =
        super.visitField(a, n, d, s, if (v is String) patch(v) else v)

    // "..." used inside a method is a load-constant (LDC) instruction
    override fun visitMethod(a: Int, n: String, d: String, s: String?, e: Array<out String>?) =
        object : MethodVisitor(Opcodes.ASM9, super.visitMethod(a, n, d, s, e)) {
            override fun visitLdcInsn(v: Any?) =
                super.visitLdcInsn(if (v is String) patch(v) else v)
        }
}
Feed it a list of Patch(name, original, replacement) entries, run every class in
the jar through it, and out comes a client that connects where you tell it and
trusts the key you give it. That is the whole trick. Find the constant, rewrite
the constant.
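The "run every class in the jar through it" part is plain JDK plumbing. Here is a sketch of that loop, kept independent of ASM by taking the per-class transform as a function. `patchJar` is my own name, not the reference patcher's, and the comment shows how a `StringPatcher` could be plugged in, assuming ASM is on the classpath.

```kotlin
import java.io.File
import java.util.jar.JarEntry
import java.util.jar.JarFile
import java.util.jar.JarOutputStream

/**
 * Copies [input] to [output], passing the bytes of every .class entry through
 * [transformClass]. Every other entry is copied unchanged.
 */
fun patchJar(input: File, output: File, transformClass: (ByteArray) -> ByteArray) {
    JarFile(input).use { jar ->
        JarOutputStream(output.outputStream()).use { out ->
            for (entry in jar.entries()) {
                out.putNextEntry(JarEntry(entry.name))
                if (!entry.isDirectory) {
                    val bytes = jar.getInputStream(entry).use { it.readBytes() }
                    out.write(if (entry.name.endsWith(".class")) transformClass(bytes) else bytes)
                }
                out.closeEntry()
            }
        }
    }
}

// With ASM available, the transform could be wired up roughly like this:
//
//   patchJar(File("client.jar"), File("client-patched.jar")) { bytes ->
//       val writer = ClassWriter(0)
//       ClassReader(bytes).accept(StringPatcher(writer, patches), 0)
//       writer.toByteArray()
//   }
```

The file names in the comment are placeholders. One thing this sketch does not handle is a signed jar: once class bytes change, any signature files under META-INF no longer match.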
And now the obvious question. You have a client pointed at your own server, the handshake completes, and encrypted frames start flowing. What is actually inside them?
Next time
That is exactly where Post 2 goes. We take the shared secret from the handshake and turn it into real keys, then walk the two layers that wrap every frame after the handshake, the encryption and the integrity check, plus the compression that rides underneath. After that, in Post 3, we finally analyse a real packet and read it field by field.
See ya