Transcoding for Sonos

Several years ago I had a Squeezebox music streamer. I liked it, but gave it up as it was a pain to keep running due to the requirements and foibles of the server software. After that I just used a generic media streamer for living room music playback, but the interface is awful as you have to slowly scroll through directories with the remote.

I knew Sonos existed, but wasn’t really tempted until I saw someone else’s setup with a Connect Amp (streamer and integrated amp) with Play 1 and Play 5 speakers. The Connect Amp sounded pretty good with bookshelf speakers and the Play speakers were as good as you could expect for the size. Two things I particularly liked were how easy it was to start playing music (a touch on the iPad interface and music comes out) and the multiroom effect. Having the music coming out of all speakers mentally frees you to move around the house without losing the thread, even if it means shifting to a smaller playback device.

I bought in with a Sonos Connect (non-Amp) feeding my existing AV amp. That loses the one-touch-to-start aspect, but I wasn’t realistically going to give up all the other AV uses. It works fine, but I hit a few issues with how it handles files.

The Bad

No album title

When you select an artist it shows all of the albums, plus an option for ‘All tracks’. But there’s no option for “No album”. It’s quite common for files in my collection not to have an assigned album title, such as demos, session tracks or other odd songs. For example, I have almost 200 Belle and Sebastian tracks, of which 13 have no album title. The smallest view that contains those 13 is the ‘All tracks’ option, making them hard to locate.

High quality FLAC

Sonos doesn’t support any FLAC files richer than 16-bit / 48 kHz. It’s not that it transparently converts larger formats down; it just flat refuses to play them. The quality loss from being limited to 16/48 is an issue, but it isn’t really significant for most people with most equipment, and certainly not if you’re using the Play speakers. The bigger aspect is really one of convenience: you can’t just download a FLAC to your network share and have Sonos play it. Instead you have to perform some kind of conversion.

Leaving aside specialist shops like Linn and FDRHD, a lot of FLAC downloads on Bandcamp are now 24-bit[1], so this is something that will affect a lot of people.

Multiple tag values

Sonos doesn’t support multiple values for tags either in MP3 or FLAC[2]. To give a usage example, track 5 on Write about love features Norah Jones on vocals so I store this file with both Belle and Sebastian and Norah Jones as artists. Ideally, I should be able to find this file under either artist but with Sonos this isn’t possible.

With multiple tag blocks in the file, Sonos will just take the last one. With a single slash-delimited entry, the track is assigned to “Belle and Sebastian/Norah Jones”. Either way you end up with a gap in the album as track five appears to be by a different artist.

This limitation applies to genres as well, so each track can only be in one genre.

Album/Disc number

When browsing by album Sonos doesn’t respect the album number as a distinguishing feature. So an album that came on two CDs will appear as a single entry with two track ones, two track twos etc. As well as looking stupid this breaks the running order.

A problem pre-solved

If you can stand the smug self-promotion, this is a good use-case for my BunnyMusicFile Perl module with musview.

In brief, the benefit of using BunnyMusicFile is the creation of ‘Views’ which serve as a virtual presentation of your music collection. For example, you could create a View for portable devices where every file is in MP3 format. Where the original file is in MP3 the View entry is a link to the original, and for other formats BunnyMusicFile will transcode. The config supports lots of different options to decide which files to include into a View, and how to process them.

I’m going to walk through how I configured BunnyMusicFile for Sonos consumption.

Take a View

To address the above issues I started working on a Sonos View. This View will use the original file if:

  • The format is MP3, MP4 or FLAC
  • For FLAC the file is no more than 16-bit, 48 kHz, 2 channels
  • The file only has a single album artist and track artist
  • The file has an album title set
  • The file is not part of a multi-album set
  • The file is part of an album (as opposed to single or EP)

The Sonos View consists of a number of Sources, where each Source describes where a file can come from:

<View sonos>
  base  = %VAR/viewDir%/sonos
  path  = default
  
  <Source>
  <Source>
  <Source>
  
  <AfterChange>
    cmd=/usr/bin/env RSYNC_PASSWORD="xxx" /usr/local/bin/rsync --delete -av /media/music/view/sonos/ rsync://rsync@nas/Multimedia/music/.
  </AfterChange>

</View>

The AfterChange command is run every time the View is updated. In this case I rsync my local copy of the View to the QNAP NAS box which supplies the Sonos.

The Source which accepts the original FLAC files looks like this:

# Use original FLAC files if in spec
<Source>
  name      = flac
  type      = format
 
  <Filter>
    input = %AUDIO/channels%
    test  = maximum
    value = 2
  </Filter>
    
  <Filter>
    input = %AUDIO/frequency%
    test  = maximum
    value = 48000
  </Filter>

  <Filter>
    input = %AUDIO/depth%
    test  = maximum
    value = 16
  </Filter>

  <Filter>
    input = %TAG/albumTitle/count%
    test  = minimum
    value = 1
  </Filter>

  <Filter>
    input = %TAG/albumArtist/count%
    test  = maximum
    value = 1
  </Filter>

  <Filter>
    input = %TAG/trackArtist/count%
    test  = maximum
    value = 1
  </Filter>

  <Filter>
    input = %TAG/albumTotal%
    value = 1
    empty = accept
  </Filter>

  <Filter>
    input = %TAG/albumClass%
    value = album
    empty = accept
  </Filter>
</Source>  

The entries for MP3 & MP4 are similar, but without the filters to only accept standard resolution.
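
As a rough illustration of what these filters amount to, here is a Python sketch of the same checks. It is not how BunnyMusicFile evaluates them. The function names, the dictionary layout and the exact semantics are assumptions read off the config above: maximum and minimum are treated as inclusive, a filter with no test is treated as "equals one of the values", and empty = accept lets a missing value through.

```python
from typing import Any, Optional, Sequence


def passes(value: Optional[Any], test: str, limits: Sequence[Any],
           empty_accept: bool = False) -> bool:
    """Evaluate one filter (semantics are assumed, see the lead-in)."""
    if value is None:
        # A missing value only passes when the filter says empty = accept
        return empty_accept
    if test == "maximum":
        return value <= limits[0]
    if test == "minimum":
        return value >= limits[0]
    if test == "equals":
        return value in limits
    raise ValueError(f"unknown test: {test}")


def use_original_flac(info: dict) -> bool:
    """True when a FLAC file could be used as-is by the hypothetical 'flac' Source."""
    checks = [
        (info.get("channels"), "maximum", [2], False),
        (info.get("frequency"), "maximum", [48000], False),
        (info.get("depth"), "maximum", [16], False),
        (len(info.get("albumTitle", [])), "minimum", [1], False),
        (len(info.get("albumArtist", [])), "maximum", [1], False),
        (len(info.get("trackArtist", [])), "maximum", [1], False),
        (info.get("albumTotal"), "equals", [1], True),
        (info.get("albumClass"), "equals", ["album"], True),
    ]
    return all(passes(*check) for check in checks)
```

A CD-quality file with one artist and an album title passes every check, while a 24-bit file or one with two track artists falls through to the repackaging Sources.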

If the file doesn’t match those criteria musview will create a new version. The new file will be in the same basic format[3], but with extra processing. For MP3 and MP4 we define Sources which map to a Class (a description of how to encode):

# Use repackaged mp3 files
<Source>
  name      = mp3-simple
  type      = class

  <Filter>
    input = %AUDIO/encoding%
    value = mp3
  </Filter>
</Source>

The repackaging FLAC Source is a bit different. For this one I don’t have any restriction on what the input format is, so any file not picked up by the Sources above, including anything that isn’t in MP3/MP4/FLAC, will be transcoded into FLAC.

# Fallback to Sonos compatible FLAC
<Source>
  name = flac-simple
  type = class
</Source>
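
The way the Sources fit together can be sketched in Python as a "first match wins" list. This is an assumption based on the "Fallback" comment above rather than a description of musview's internals. The "mp3" Source name and the two boolean keys (in_spec for the audio limits, tags_ok for the tag checks) are hypothetical shorthand, and the MP4 Sources are left out for brevity.

```python
from typing import Callable, List, Optional, Tuple

Source = Tuple[str, Callable[[dict], bool]]

# Ordered list: earlier Sources are tried first (assumed behaviour)
SOURCES: List[Source] = [
    # Original FLAC, only when audio and tags are within limits
    ("flac", lambda f: f["encoding"] == "flac" and f["in_spec"] and f["tags_ok"]),
    # Original MP3, no resolution filters, but tags must be clean
    ("mp3", lambda f: f["encoding"] == "mp3" and f["tags_ok"]),
    # Repackaged MP3: same audio, rewritten tags
    ("mp3-simple", lambda f: f["encoding"] == "mp3"),
    # Fallback: everything else is transcoded to Sonos compatible FLAC
    ("flac-simple", lambda f: True),
]


def pick_source(f: dict) -> Optional[str]:
    """Return the name of the first Source whose filters accept the file."""
    for name, accepts in SOURCES:
        if accepts(f):
            return name
    return None
```

With this ordering a high-resolution FLAC and an Ogg file both end up in flac-simple, while an MP3 with two artists lands in mp3-simple.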

Showing some Class

Each Class definition describes how to encode and what processing to perform. The FLAC version is the most complex due to the downsampling requirement. The general structure is:

# Sonos compatible FLAC
<Class flac-simple>
  format      = flac

  <Encode>
    cmd=%VAR/flac% -s --best --replay-gain -o "%ENC%" "%WAV%"
  </Encode>

  <Process>
  <Process>
  <Process>
  
  <Tag>
  <Tag>
  <Tag>
</Class>

The Process sections describe how to process the audio before encoding. Here we have a number of actions which trigger on certain criteria:

# Collapse to 2 channels
<Process>
  cmd=%VAR/sox% -G "%IN%" "%OUT%" channels 2 dither -S

  <Filter>
    input = %AUDIO/channels%
    test  = greater
    value = 2
  </Filter>
</Process>
  
  
# Reduce to 16-bit depth
<Process>
  cmd=%VAR/sox% -G "%IN%" -b16 "%OUT%" dither -S

  <Filter>
    input = %AUDIO/depth%
    test  = greater
    value = 16
  </Filter>
</Process>

# Downsample to 16/44.1 for stereo CD ratio files
<Process>
  cmd=%VAR/sox% -G "%IN%" "%OUT%" rate 44100 dither -S
    
  <Filter>
    input = %AUDIO/frequency%
    value = 88200
    value = 176400
  </Filter>
</Process>

# Downsample to 16/48 for DAT ratio files
<Process>
  cmd=%VAR/sox% -G "%IN%" "%OUT%" rate 48000 dither -S
  
  <Filter>
    input = %AUDIO/frequency%
    value = 96000
    value = 192000
  </Filter>
</Process> 

You can reduce the number of calls to sox by combining some of these actions, but I find it more readable this way.
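
For comparison, a combined call might look something like the Python sketch below, which builds a single sox argument list from the same rules. It is only an illustration: the function name and signature are made up, the sox flags are the ones already used in the Process sections (with -b 16 written as two arguments), and the command is built but never run.

```python
from typing import List

# High-res rates mapped to the rate of the same family (CD ratio or DAT ratio)
TARGET_RATES = {88200: 44100, 176400: 44100, 96000: 48000, 192000: 48000}


def sox_command(src: str, dst: str, channels: int, depth: int,
                frequency: int, sox: str = "sox") -> List[str]:
    """Build one sox call covering all the Process rules; [] if none apply."""
    output_opts: List[str] = []
    effects: List[str] = []

    if channels > 2:
        effects += ["channels", "2"]
    if frequency in TARGET_RATES:
        effects += ["rate", str(TARGET_RATES[frequency])]
    if depth > 16:
        # Output file option, so it goes before the output filename
        output_opts += ["-b", "16"]

    if not effects and not output_opts:
        return []  # file is already within limits

    # Dither once, as the last effect in the chain
    return [sox, "-G", src, *output_opts, dst, *effects, "dither", "-S"]
```

A 24-bit / 96 kHz stereo file gets one call that changes both rate and depth, where the separate Process sections would have run sox twice.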

The Tag processing affects the metadata tags. The first thing I do is add a default album title of “(No Album)” if no album title is set. This way there is an entry in the Sonos UI for all of those homeless tracks.

# Add a default albumTitle tag
<Tag>
  name    = albumTitle
  default = "(No Album)"
</Tag>

Next is a helper to deal with multiple artists on a song. A simple, clean way of handling this is to only keep the first artist. This is fine in terms of stopping albums being broken up, but it loses information. Instead, where there are multiple artists I include the additional artists in the track title. This rule adds “feat. one, two, three” onto the title.

# Put 2nd artists in title
<Tag>
  name  = trackTitle

  <Filter>
    input = %TAG/trackArtist/count%
    test  = minimum
    value = 2
  </Filter>

  <Text>
    <Regex>
      pattern=^(.*)$
      value=$1 feat. %TAG/trackArtist/join(", ", 1-3)%
    </Regex>
  </Text>
</Tag>
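
In plain code the rule comes down to something like this Python sketch. The function name is hypothetical, and it assumes that join(", ", 1-3) means "join the artists at zero-based positions 1 to 3", so the first artist is skipped and at most three guests are listed.

```python
from typing import List


def title_with_guests(title: str, artists: List[str], max_guests: int = 3) -> str:
    """Append 'feat. ...' to the title when there is more than one artist."""
    # Skip the first (main) artist and keep at most max_guests of the rest
    guests = artists[1:1 + max_guests]
    if not guests:
        return title
    return f"{title} feat. {', '.join(guests)}"
```

For a track tagged with both Belle and Sebastian and Norah Jones, the title gains " feat. Norah Jones" and the artist can then be collapsed to Belle and Sebastian without losing the information.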

Now I get rid of any other multiple tag values:

# Collapse multiple tags
<Tag>
  <Text>
    multiple = false
    
    <Merge>
      method = first
    </Merge>
  </Text>
</Tag>

ID3/FLAC metadata supports the idea of an album class (album, single, EP etc.). I like to see the album class in the title as it helps to avoid ambiguity in general, and in particular when a single and an album have the same name. Since Sonos doesn’t display album class anywhere I add that to the album title. But I don’t want to do this when the class is album, to avoid the visual clutter. So first I remove albumClass where the value is “album”:

# Remove album class when class is actually Album
<Tag>
  name    = albumClass
  remove  = true

  <Filter>
    input = %TAG/albumClass%
    value = album
  </Filter>
</Tag>

Next I deal with the fact that I haven’t always been consistent with my album class capitalisation:

# Make class names consistent
<Tag>
  name    = albumClass

  <Text>
    <Regex>
      pattern = ^single$
      value   = Single
    </Regex>
  
    <Regex>
      pattern = ^ep$
      value   = EP
    </Regex>
  </Text>
</Tag>

The album class is now ready to be added to the album title. But there’s another piece of information I want to add and that’s the disc number. This is how I prevent Sonos from merging the track listings of multi CD sets. Again, I don’t want to see this for single disc albums so I remove it when the disc total is 1.

# Remove album number and total when total is 1
<Tag>
  name    = albumNumber
  name    = albumTotal
  remove  = true

  <Filter>
    input = %TAG/albumTotal%
    empty = accept
    value = 1
  </Filter>
</Tag>

Finally I can update the album title. Here I make use of the fact that the regular expression processing element allows multiple possible values, with the code taking the first one it is able to fully resolve in terms of variables.

# Put album class & number in title
<Tag>
  name  = albumTitle

  <Text>
    <Regex>
      pattern=^(.*)$
      value=$1 (%TAG/albumClass%, %TAG/albumNumber%)
      value=$1 (%TAG/albumNumber%)
      value=$1 (%TAG/albumClass%)
      value=$1
    </Regex>
  </Text>
</Tag>
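
The "first value that fully resolves" behaviour can be sketched in Python as below. The helper names and the tags dictionary are hypothetical, and the resolution rule (a value is skipped when any %TAG/...% variable in it is missing or empty) is an assumption based on the description above.

```python
import re
from typing import Dict, List, Optional

VAR = re.compile(r"%TAG/(\w+)%")

# Same candidate values, in the same order, as the Regex block above
TEMPLATES = [
    "$1 (%TAG/albumClass%, %TAG/albumNumber%)",
    "$1 (%TAG/albumNumber%)",
    "$1 (%TAG/albumClass%)",
    "$1",
]


def resolve(template: str, tags: Dict[str, str]) -> Optional[str]:
    """Fill in the tag variables, or return None if any of them is unset."""
    if any(not tags.get(name) for name in VAR.findall(template)):
        return None
    return VAR.sub(lambda m: tags[m.group(1)], template)


def first_resolvable(templates: List[str], tags: Dict[str, str]) -> Optional[str]:
    """Return the first template whose variables can all be resolved."""
    for template in templates:
        resolved = resolve(template, tags)
        if resolved is not None:
            return resolved
    return None


def album_title(title: str, tags: Dict[str, str]) -> str:
    """Build the album title shown in Sonos from the original title and tags."""
    chosen = first_resolvable(TEMPLATES, tags) or "$1"
    # $1 stands for the whole original title, as captured by ^(.*)$
    return chosen.replace("$1", title)
```

Because albumClass and albumNumber have already been removed for plain single-disc albums, those fall through to the bare "$1" value and keep their title unchanged.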

And that’s it.

Evaluation

The Sonos itself is ‘OK’. It does perform the basic task of letting me control music from the sofa with a reasonable tablet interface. The setup was very easy and I’m sure the multiroom parts (when I get there) will be useful. But it also seems to be pitched at a slightly lower level of user than I’d like. That is, it’s fine for just playing compressed tracks from Amazon/iTunes, but it’s not really good enough for very serious audio nerds, or people who take meta-data seriously. The Sonos View I created solves some of the interface problems, but there are still other issues like the lack of artist artwork.

In terms of my solution, I have mixed opinions. To do everything I wanted for Sonos I was forced to revisit the BunnyMusicFile code. I originally wrote BunnyMusicFile around four years ago, but hadn’t touched it for over three. I’m not happy with some of the architecture decisions made and I contemplated just scrapping it and starting again. I’m glad I didn’t as when I got back into it I was reminded how much was really in there.

Even though I hadn’t changed the code for years I have been using it regularly all this time and it is useful. I really do like the Views concept. What I’m wondering is if the big, complex config file model is really the right one. As config file formats get more and more complex, they become programming languages, and in particular they become bad programming languages. I hit a similar issue before when I was looking at static site generation with my attempt at defining a meta-language for the task.

As well as forcing me to design a way of representing concepts like string joins in something that is nominally ‘config’, it all makes it rather slow as the module is effectively running an interpreted language. To do a complete scan, presuming no changes required, on my music collection takes over five minutes at an average of ~30ms a file. This is running on a Xeon 3210 with SSDs.

Perhaps a better way is to provide a framework for managing Views, in terms of file lifetime etc., but leaving all of the decision logic in code. For example, the Sonos FLAC ‘Class’ could, partly, look like:

my $sonosFlac = sub
{
  my $obj = shift;

  # Downsample DAT ratio high-res files to 48 kHz
  if ($obj->frequency() == 96000 || $obj->frequency() == 192000)
  {
    $obj->resample(48000);
  }

  # Give files with no album artist a placeholder value
  if ($obj->tag("albumArtist")->isEmpty())
  {
    $obj->tagSet("albumArtist", "(No Artist)");
  }
};

A project for another time, perhaps.

  1. My suspicion is that this is often not a deliberate decision to go high-res by the artist but just a side effect of Bandcamp insisting on PCM uploads, with the artists just uploading whatever is output by their DAW. If they were editing in 24-bit then the upload keeps that. In my collection I even have cases where an album had different bit depths track to track. 24-bit files were more common than high sample rate ones. I only have two whole albums at 24-bit and 88.2/96 kHz.
  2. And probably MP4, but I didn’t test this.
  3. MP3 and MP4 files are not re-encoded; the copy just has different tag data.