RTP
Overview
AirPlay 2 data packets are transmitted over the RTP protocol. The RTP payload is encrypted with the ChaCha20-Poly1305 AEAD algorithm.
Encryption
The audio frames encapsulated in the RTP data packets are encrypted by the sender using the ChaCha20-Poly1305 AEAD algorithm as defined in RFC 7539 [1].
The sender generates the key and shares it with the receiver using a SETUP RTSP request containing the shk (shared key) value. The nonce, AAD and tag, which are used to decrypt and verify the RTP payload, are instead included in every RTP data packet.
The first SETUP request sent to the receiver contains the ekey, eiv and et (encryption type) fields. Audio packets are most likely encrypted with AES-CBC when MFi or FairPlay is enabled, but I have not tested this yet.
Packet header
The packet header follows RFC 3550, with some of its fields reused as the AAD for ChaCha20-Poly1305 decryption of the payload.
The header size is 12 bytes.
        0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
        ---------------------------------------------------------------
0x0    | V |P|X|  CC   |M|     PT      |        Sequence Number        |
       |---------------------------------------------------------------|
0x4    |                      Timestamp (AAD[0])                       |
       |---------------------------------------------------------------|
0x8    |                         SSRC (AAD[1])                         |
       |---------------------------------------------------------------|
0xc    :                                                               :
Version (V) is 2 and the payload type (PT) is 96 (DynamicRTP-Type-96).
Timestamp and SSRC are used together as AAD for ChaCha20-Poly1305. The overall AAD size is 8 bytes.
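The header layout described above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the names `RtpHeader` and `parse_rtp_header` are hypothetical, and the code assumes the fixed 12-byte header with no CSRC entries or extension.

```python
import struct
from typing import NamedTuple

RTP_HEADER_SIZE = 12  # fixed header size stated in this section


class RtpHeader(NamedTuple):
    """Hypothetical container for the header fields (illustrative only)."""
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    aad: bytes  # raw bytes 4..11: Timestamp followed by SSRC


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Parse the 12-byte RFC 3550 header and extract the 8-byte AAD."""
    if len(packet) < RTP_HEADER_SIZE:
        raise ValueError("packet shorter than the 12-byte RTP header")
    # Network byte order: two single bytes, a 16-bit and two 32-bit fields.
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:RTP_HEADER_SIZE])
    return RtpHeader(
        version=b0 >> 6,
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        csrc_count=b0 & 0x0F,
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
        aad=packet[4:RTP_HEADER_SIZE],
    )
```

The AAD is kept as raw bytes rather than rebuilt from the parsed integers, so it can be passed to an AEAD routine exactly as it appeared on the wire.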
Packet trailer
Each packet has a trailer appended after the encrypted audio frames. The trailer contains the nonce used for payload decryption and the tag used for ChaCha20-Poly1305 verification. Its size is 24 bytes.
        0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       :                                                               :
       |---------------------------------------------------------------|
N-0x18 |                                                               |
       |--                           Nonce                           --|
N-0x14 |                                                               |
       |---------------------------------------------------------------|
N-0x10 |                                                               |
       |--                                                           --|
N-0xc  |                                                               |
       |--                            Tag                            --|
N-0x8  |                                                               |
       |--                                                           --|
N-0x4  |                                                               |
        ---------------------------------------------------------------
N
The nonce size is 8 bytes while the tag is 16 bytes.
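As a rough sketch of how a receiver might slice a packet according to the layout documented here (12-byte header, encrypted payload, then an 8-byte nonce followed by a 16-byte tag), consider the Python snippet below. The names `PacketParts` and `split_packet` are hypothetical, and the field order in the trailer is taken from the diagram above as-is. The snippet only separates the fields; it does not perform the decryption itself.

```python
from typing import NamedTuple

RTP_HEADER_SIZE = 12                  # header size stated in this document
NONCE_SIZE = 8                        # nonce size stated in this document
TAG_SIZE = 16                         # tag size stated in this document
TRAILER_SIZE = NONCE_SIZE + TAG_SIZE  # 24 bytes


class PacketParts(NamedTuple):
    """Hypothetical container for the fields that follow the header."""
    ciphertext: bytes
    nonce: bytes
    tag: bytes


def split_packet(packet: bytes) -> PacketParts:
    """Split a data packet into ciphertext, nonce and tag.

    Assumes the trailer order documented above: nonce first, then tag.
    """
    if len(packet) < RTP_HEADER_SIZE + TRAILER_SIZE:
        raise ValueError("packet too short for header plus trailer")
    trailer_start = len(packet) - TRAILER_SIZE
    return PacketParts(
        ciphertext=packet[RTP_HEADER_SIZE:trailer_start],
        nonce=packet[trailer_start:trailer_start + NONCE_SIZE],
        tag=packet[trailer_start + NONCE_SIZE:],
    )
```

Offsets are computed from the end of the packet, mirroring the `N-0x18` notation in the diagram, so the payload length never needs to be known in advance.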
[1] ChaCha20 and Poly1305 for IETF Protocols - https://tools.ietf.org/html/rfc7539