
thumb5

This thread is to document findings and ideas about how the AIR streaming protocol works. It's based on examining network traffic between a Devialet and a host computer running the AIR software.

I started investigating this partly out of interest and partly to see whether I could work out what causes AIR to drop out occasionally (on my system). Please add your own ideas and findings, and correct anything you discover to be wrong. Hopefully between us we can build up a better picture of how AIR really works.

The protocol(s) used for discovery and remote control of the Devialet are not addressed here, and could be the topic of separate threads.

Disclaimer: I can't guarantee the accuracy of anything in this thread so use of the information is entirely at your own risk.

The initial data was gathered using a 16-bit, 44.1 kHz stream. The host was a MacBook Pro running OS X 10.8.5 and AIR 2.1.2, streaming to a Devialet 200 running firmware 7.1.1.

AIR uses the User Datagram Protocol (UDP) with unicast addressing.

The host transmits audio data from port 45456 to port 45455 on the Devialet. With 16/44 data, there's an outgoing packet about once every 4 ms. The payload length (excluding MAC, IP and UDP headers) is typically 962 bytes (giving a 1004-byte Ethernet packet). When the audio stream is idle (silent), the payload length drops to 166 bytes (208-byte Ethernet packet). This suggests AIR is implementing some form of compression at least in this simple case.
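As a rough cross-check of the sizes and timing quoted above, here is a small Python sketch. The header sizes are assumptions (untagged Ethernet II without FCS, IPv4 without options, UDP), and the 200 samples-per-packet figure comes from the payload header described further down this thread. It implies about 4.5 ms of audio per packet, in line with the observed rate.

```python
# Assumed header sizes: untagged Ethernet II (no FCS), IPv4 without options, UDP.
ETH_HEADER = 14
IPV4_HEADER = 20
UDP_HEADER = 8
OVERHEAD = ETH_HEADER + IPV4_HEADER + UDP_HEADER  # 42 bytes


def frame_length(payload_len):
    """Length of the captured Ethernet frame for a given UDP payload length."""
    return payload_len + OVERHEAD


def packet_interval_ms(samples_per_packet, sample_rate_hz):
    """Duration of audio carried by one packet, in milliseconds."""
    return 1000.0 * samples_per_packet / sample_rate_hz


print(frame_length(962))                          # 1004
print(frame_length(166))                          # 208
print(round(packet_interval_ms(200, 44100), 2))   # 4.54
```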

The Devialet transmits feedback/synchronisation information from port 45456 to host port 45456. With 16/44 data, there's an incoming packet (to the host) about once every 50 ms. The payload length seems to be a constant 41 bytes.

For a full 962-byte payload, the structure of the outgoing (host->Devialet) payload seems to be:

Filename: 16-44-441Hz-tone.png Size: 495.9 KB 07-Sep-2014, 14:23

bytes 0-21 (22 bytes): payload header

bytes 22-426 (405 bytes): channel 1 audio header/data

bytes 427-831 (405 bytes): channel 2 audio header/data

bytes 832-833 (2 bytes): 16-bit CRC or other checksum?

bytes 834-961 (128 bytes): zero (padding?)

Total: 962 bytes

When the host is transmitting "silence", the payload becomes:

bytes 0-21 (22 bytes): payload header

bytes 22-28 (7 bytes): channel 1 audio header/data (all zero)

bytes 29-35 (7 bytes): channel 2 audio header/data (all zero)

bytes 36-37 (2 bytes): 16-bit CRC or other checksum?

bytes 38-165 (128 bytes): zero (padding?)

Total: 166 bytes
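The two layouts above differ only in the size of the per-channel blocks, so one routine can split either kind of payload. This is only a sketch based on the offsets listed above; the function name and the assumption that both channels always have equal length are mine.

```python
HEADER_LEN = 22      # payload header
CHECKSUM_LEN = 2     # 16-bit CRC or other checksum (unconfirmed)
PADDING_LEN = 128    # trailing zero bytes


def split_payload(payload):
    """Split an outgoing payload into (header, ch1, ch2, checksum, padding).

    Assumes both channels occupy the same number of bytes.
    """
    body_len = len(payload) - HEADER_LEN - CHECKSUM_LEN - PADDING_LEN
    if body_len < 0 or body_len % 2:
        raise ValueError("unexpected payload length: %d" % len(payload))
    chan_len = body_len // 2
    ch1_start = HEADER_LEN
    ch2_start = ch1_start + chan_len
    csum_start = ch2_start + chan_len
    pad_start = csum_start + CHECKSUM_LEN
    return (
        payload[:ch1_start],
        payload[ch1_start:ch2_start],
        payload[ch2_start:csum_start],
        payload[csum_start:pad_start],
        payload[pad_start:],
    )


# Dummy all-zero payloads, just to show the resulting block sizes.
print([len(p) for p in split_payload(bytes(962))])  # [22, 405, 405, 2, 128]
print([len(p) for p in split_payload(bytes(166))])  # [22, 7, 7, 2, 128]
```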

The payload header seems to be essentially the same for the "silence" case as for the normal case. It is discussed later.

Don't know yet which channel appears first in the payload (left, at a guess). Should be an easy experiment to find out, though.

thumb5

The data representation for multi-byte quantities seems to be big-endian (i.e. network byte order). This surprises me somewhat since both the supported hosts (i.e. PC and Mac) are little-endian.

The 22-byte payload header for outgoing (host->Devialet) packets seems to be:

byte 0 (1 byte): 0x44 ('D' for Devialet?)
byte 1 (1 byte): 0x6d ('m' for music?)
byte 2 (1 byte): 0x02
bytes 3- 6 (4 bytes): stream ID? (pseudo-random value, fixed for a given AIR stream)
bytes 7-11 (5 bytes): 0x00 0x00 0x00 0x02 0x01
bytes 12-15 (4 bytes): possibly high-order 32 bits of sequence number?
bytes 16-19 (4 bytes): (low order 32 bits of) sequence number
bytes 20-21 (2 bytes): number of samples carried by this payload (e.g. 0x00c8 = 200)
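The header layout above maps directly onto a big-endian struct format. The following Python sketch unpacks it; the field names are my own labels, the meaning of several fields is still a guess, and combining the two 32-bit halves into one 64-bit sequence number is an assumption at this point. The sample bytes are made up for illustration and are not taken from a capture.

```python
import struct
from collections import namedtuple

# Big-endian, no padding: 2 + 1 + 4 + 5 + 4 + 4 + 2 = 22 bytes.
OUT_HEADER = struct.Struct(">2sBI5sIIH")

# Field names are hypothetical labels, not official ones.
OutHeader = namedtuple(
    "OutHeader", "magic byte2 stream_id unknown seq_hi seq_lo sample_count"
)


def parse_out_header(payload):
    """Unpack the first 22 bytes of an outgoing (host->Devialet) payload."""
    if len(payload) < OUT_HEADER.size:
        raise ValueError("payload too short for header")
    return OutHeader._make(OUT_HEADER.unpack_from(payload, 0))


def sequence_number(hdr):
    """Combine the two halves, assuming they form one 64-bit value."""
    return (hdr.seq_hi << 32) | hdr.seq_lo


# Made-up example header: stream ID 0x12345678, sequence 400, 200 samples.
sample = OUT_HEADER.pack(b"Dm", 0x02, 0x12345678, b"\x00\x00\x00\x02\x01", 0, 400, 200)
hdr = parse_out_header(sample)
print(hdr.magic, hex(hdr.stream_id), sequence_number(hdr), hdr.sample_count)
```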

The stream ID is presumably assigned when AIR opens the streaming session with the Devialet. It could be assigned by the Devialet or by AIR - maybe this is part of the discovery (rather than streaming) protocol. I can't see any pattern in how the stream ID is generated for successive streams, so I suspect it's a pseudo-random number.

The sequence number seems to be the number of the first audio sample carried within the payload. For example if each payload carries 200 samples the sequence number will be 0 for the first payload, 200 for the second, 400 for the third, etc.
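If the sequence number really is the index of the first sample in each payload, then each packet's sequence number plus its sample count should equal the next packet's sequence number. A sketch of a check for gaps in a capture follows (Python; the function name and input format are my own, and this says nothing about how the Devialet itself detects loss).

```python
def find_gaps(packets):
    """Return (expected, actual) pairs where the sequence number jumps.

    packets: iterable of (sequence_number, sample_count) in capture order.
    """
    gaps = []
    expected = None
    for seq, count in packets:
        if expected is not None and seq != expected:
            gaps.append((expected, seq))
        expected = seq + count
    return gaps


print(find_gaps([(0, 200), (200, 200), (400, 200)]))  # []
print(find_gaps([(0, 200), (400, 200)]))              # [(200, 400)]
```

A non-empty result on a real capture would point to a lost or reordered packet (or to this interpretation of the field being wrong).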

At 44.1 kHz a 32-bit unsigned sequence number would wrap round after about 27 hours. It is possible (likely?) that the protocol could represent the sequence number as a 64-bit quantity to avoid wrap-round. I'm running an experiment at the moment to see what happens when I keep an AIR session open for longer than the expected wrap-round time so hopefully I can resolve that question soon.
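The wrap-round estimate is simple arithmetic and can be reproduced for other sample rates. A short Python sketch (the function name is mine):

```python
def wrap_time_hours(counter_bits, sample_rate_hz):
    """Hours until an unsigned sample counter of the given width wraps."""
    return (2 ** counter_bits) / sample_rate_hz / 3600.0


print(round(wrap_time_hours(32, 44100), 2))   # 27.05
print(round(wrap_time_hours(32, 192000), 2))  # 6.21
```

At higher sample rates a 32-bit counter would wrap within a few hours, which would be another reason for the protocol to use a wider counter.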

The per-channel data in outgoing (host->Devialet) packets is harder to understand.

I'm assuming for the moment that the 962-byte payload carries exactly 200 samples. This seems to tally with what I see in the payload header, and is roughly consistent with the size of the per-channel payload if AIR represents each sample in 16 bits.

I've used a tone-generator program on my Mac to play a 441 Hz sinusoid through AIR, which resulted in the network capture posted above. At a sample rate of 44.1 kHz (= 100 x 441 Hz), with 200 samples per payload, you'd expect to see exactly two full cycles of the sine wave for each channel.
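For anyone wanting to compare captured values with the expected waveform, here is a sketch that generates the reference samples in Python. The amplitude of 1000 is an arbitrary stand-in for "low volume" and is not what my tone generator actually used.

```python
import math


def sine_samples(freq_hz, sample_rate_hz, count, amplitude=1000):
    """Integer samples of a sinusoid starting at phase zero."""
    return [
        int(round(amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)))
        for n in range(count)
    ]


samples = sine_samples(441, 44100, 200)
samples_per_cycle = 44100 / 441            # 100 samples per cycle
cycles = len(samples) / samples_per_cycle  # 2 full cycles in 200 samples
print(samples_per_cycle, cycles)           # 100.0 2.0
print(max(samples), min(samples))          # 1000 -1000
```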

Based on a bit of guesswork, I can see a set of 199 16-bit values for each channel which are obviously derived from the original sinusoid. These start at byte offset 7 within the per-channel data. The plot shows these values against the corresponding byte offset within the overall payload (there's an offset of 1 I haven't got rid of):

Filename: 16-44-441Hz-tone-samples.png Size: 167.56 KB 07-Sep-2014, 15:02

This raises some obvious questions about how AIR is representing the audio data:

where's the first (or 200th) value?
why does it look as though the values plot a "half-wave rectified" sine wave?
why are the values covering the range (roughly) 1,000-33,000 given that I was playing a very low-volume tone?

I don't yet know the answers to those questions.

The "rectification" effect is repeatable using different tone generator applications.

It seems clear that AIR isn't just sending linear PCM data to the Devialet. (This is consistent with the fact that "silence" seems to be sent in a compressed form.)

I wonder whether AIR is encoding the 200 samples as a baseline value and a set of differences, or something similar? If so, it looks as though the differences may be normalised so that they occupy a full 16 bits. The information to re-build the original samples must then be encoded either in the payload header or the 7 bytes preceding the 398 bytes of sample data. This must include the baseline value and scaling factor, for example.

For my payload, these 7 bytes are:

0x04 0x03 0x00 0x03 0x03 0x10 0x10

These are the same for both channels.

I tried streaming a sinusoid from a different application, with a different volume setting, and found that the per-channel header changed to:

0xF3 0xE4 0x00 0x02 0x02 0x10 0x10

The two 0x10s might represent the fact that the values have 16 significant bits and/or are transferred in 16-bit units.

When the payload is representing silence, these 7 bytes become:

0x00 0x00 0x00 0x00 0x00 0x00 0x00

(which seems a reasonable way to represent silence in compressed form).

Incidentally, the Excel spreadsheet I used to plot the sample values is here, in case anyone would like to play around with it:


16-44-441Hz-tone-payload.xls (Size: 113 KB / Downloads: 1)

thumb5 · 07-Sep-2014, 15:48

The 41-byte payload for incoming (Devialet->host) packets seems to be:

byte 0 (1 byte): 0x44 ('D' for Devialet?)
byte 1 (1 byte): 0x66 ('f' for feedback?)
bytes 2-5 (4 bytes): stream ID (as in outgoing packets)
bytes 6-9 (4 bytes): 0x00 0x00 0x00 0x02
bytes 10-13 (4 bytes): possibly high-order 32 bits of sequence number?
bytes 14-17 (4 bytes): (low order 32 bits of) sequence number 1
bytes 18-21 (4 bytes): possibly high-order 32 bits of sequence number?
bytes 22-25 (4 bytes): (low order 32 bits of) sequence number 2
bytes 26-29 (4 bytes): possibly high-order 32 bits of sequence number?
bytes 30-33 (4 bytes): (low order 32 bits of) sequence number 3
bytes 34-38 (5 bytes): zero
bytes 39-40 (2 bytes): 16-bit CRC or other checksum?

The four-byte sequence 00 00 00 02 might correspond to the four bytes with the same value at offsets 7-10 in the outgoing packets.

The three sequence number values each increment in successive feedback frames.
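The 41-byte feedback layout can be unpacked in the same way as the outgoing header. In this Python sketch each sequence number is read as a single 64-bit big-endian value, which a later post in this thread reports to be the case. The field names are my own labels and the example bytes are made up, not captured.

```python
import struct
from collections import namedtuple

# Big-endian, no padding: 2 + 4 + 4 + 8 + 8 + 8 + 5 + 2 = 41 bytes.
FEEDBACK = struct.Struct(">2sI4sQQQ5sH")

# Field names are hypothetical labels, not official ones.
Feedback = namedtuple(
    "Feedback", "magic stream_id unknown seq1 seq2 seq3 zeros checksum"
)


def parse_feedback(payload):
    """Unpack a 41-byte incoming (Devialet->host) feedback payload."""
    if len(payload) != FEEDBACK.size:
        raise ValueError("expected %d bytes, got %d" % (FEEDBACK.size, len(payload)))
    return Feedback._make(FEEDBACK.unpack(payload))


# Made-up example values; the checksum field is left as zero.
sample = FEEDBACK.pack(
    b"Df", 0x12345678, b"\x00\x00\x00\x02", 100199, 100000, 37199, bytes(5), 0
)
fb = parse_feedback(sample)
print(fb.magic, fb.seq1, fb.seq2, fb.seq3)
```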

By correlating the outgoing data frames and the feedback frames, it looks to me as though the first of the sequence number values is the sample number for the most recent sample the Devialet has received from the host. (That may not be from the previous outgoing data packet on the wire, of course.)

The second sequence number value seems always to be 199 below the first one. That suggests it's the first sequence number in the most recently received payload. If that's true, carrying both values looks redundant, so I suspect there must be some reason why both are present (possibly detection of packet loss or out-of-order delivery?).

The third sequence number doesn't directly correlate with values seen in the outgoing packets. The difference between this value in consecutive feedback packets is always 2,205 (plus or minus one) in my tests. At a 44.1 kHz sample rate, 2,205 samples corresponds to exactly 50 ms, which suggests the value is locked to the audio playback pipeline in the Devialet and provides a timebase for generating these feedback frames. It looks like this is how AIR compensates (in the long term) for differences between the Devialet's internally-derived sample rate and whatever generates the sample rate on the host.

This third value stays roughly in step with the first two values, but in my tests was consistently about 62,000-64,000 below them. At a 44.1 kHz sample rate that corresponds to a time delay of about 1.4 seconds. This might relate to the "target device buffer" depth which was set at 1000 ms in my tests.
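The conversions between sample counts and time used in the last two paragraphs are easy to reproduce. A short Python sketch (the function name is mine):

```python
def samples_to_ms(sample_count, sample_rate_hz):
    """Duration represented by a number of samples, in milliseconds."""
    return 1000.0 * sample_count / sample_rate_hz


print(samples_to_ms(2205, 44100))             # 50.0
print(round(samples_to_ms(62000, 44100), 1))  # 1405.9
print(round(samples_to_ms(64000, 44100), 1))  # 1451.2
```

So the observed lag of 62,000-64,000 samples is roughly 1.41-1.45 seconds, i.e. about 400-450 ms more than the 1000 ms buffer setting.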

I've attached a screen shot of the packet capture, although there's not much to see:

Filename: 16-44-441Hz-feedback.png Size: 260.76 KB 07-Sep-2014, 15:47

***Rufus McDufus*** · 07-Sep-2014, 16:15

This is really interesting stuff thumb5! Could the "silence" packet effectively be a "null" packet for a defined period of time (the packet length or some set variable within the header) as opposed to compressed? I wonder why they felt the need to treat perfect silence like this, as it almost just seems to add complication, though it does reduce the amount of data transferred.

thumb5 · 07-Sep-2014, 16:27

As far as I can tell, the "silence" packet is sent continually at the same rate (i.e. roughly every 4 or 5 ms) as the normal audio packets. I assume that's because its payload header still says it's carrying 200 samples (200 samples / 44.1 kHz = 4.54 ms).

Yes, it does seem an added complication for little benefit.

***Rufus McDufus*** · 07-Sep-2014, 16:47

But the silence packet probably helps in figuring out how it all hangs together!

thumb5 · 08-Sep-2014, 18:09

I've just captured some packets after running AIR continuously for about 32 hours, which confirm that the sequence numbers in both the outgoing (music) and incoming (feedback) packets are indeed 64-bit values.

rik

My guess is that because we are talking UDP, with the associated risk of packet loss, there needs to be sliding-window-based feedback between the Dev & AIR. This would be very similar to the method used for X.25 windows over lossy media & would need a counter which would be sent in both directions.

As an aside, I'm currently investigating the port 45454 protocol, which is a continuous stream of UDP packets used to represent the state (volume, source selected, eq etc) used by control points (& probably also the slave Dev in a mono system). I've some Python code which I'll put on GitHub as soon as I've finally cracked it.

PS: That uses "Da" as its magic number.

Rik

Mka · 09-Sep-2014, 15:35

Great start guys! I haven't played around with Netboz for a while but I will go into it later today.
