Studio Structures for Surround Broadcasting

Steve Church, Telos Systems
Michael Dosch, Axia Audio
Cleveland, Ohio USA

There is growing interest among broadcasters in delivering a surround listening experience to their audiences. Surround is clearly the hot topic at audio, consumer electronics, and computer shops.

Visit any of these and you will see plenty of surround audio set-ups. Indeed, it would appear to a casual visitor that stereo has become nearly obsolete. The systems on display are home systems, mostly to be used for “home theater” listening to accompany surround DVD-Video disks, which are near-universally produced with a digital surround audio track. But many people have discovered surround as an impressive way to enhance general music listening as well. Two audio disk formats offer an audiophile-grade carrier to provide surround audio to consumers: DVD-Audio and Super Audio CD (SACD). In the shops, you will find DVD players costing around $140 that can play all three formats.

An interesting recent development is SACD “double layer” technology that allows disks to be produced that can be played on both traditional CD players and the new SACD players. Such disks usually include both stereo and surround SACD tracks. A number of disks are now in shops that use this technique to give consumers the choice of three formats on one universal disk. The popular 30th anniversary edition of Pink Floyd’s Dark Side of the Moon has been released this way. This gets around one of the big factors holding back surround audio - that shops would have to stock two versions of each disk. With this innovation, it is certain that surround disks will proliferate and be routinely available to a wide audience in typical record shops everywhere.

Figure: Universal disks are soon to be common. This one works with SACD surround, SACD stereo, and CD players.

The car is an obvious next step for surround listening. Most already have four or more speakers and associated amplifiers. The “wife-factor” opposition to multiple speakers is absent. Designers of car systems have an advantage in that they know the environment in which their systems will be operating and they know precisely where the listener’s ears will be positioned, so they can tune their products to optimize the result in a way that living-room system designers cannot. A number of high-end car models already feature surround audio capability, among them the Acura TL and some BMW models.

Portable MPEG MP3 and AAC players (Apple’s iPods) are tremendously popular and are set to take on a role beyond playing isolated music tracks. “Podcasting” is the word now given to the automated downloading of topical audio programs from the Internet to portable players. And surround is coming to them soon. MP3 Surround was introduced a few months ago and was quite the buzz at the January Consumer Electronics Show. Because it allows stereo-compatible surround with only a few more bits than stereo alone, it is almost sure to catch on.

On the subject of iPods, how are we going to listen with headphones? The answer is a new technology that presents surround on stereo headphones. While this may seem implausible, a listen will convince you! The first time I (Steve) heard it, I was sure the guy demonstrating had forgotten to switch off the loudspeakers in the room - I was sure there was something coming from behind! Removing the headphones, it was clear that the effect was from the surround processing. This works on the principle that we do have, after all, only two ears, and the impression of ambience comes from the acoustic filtering provided by the physical shape of the human ear, which is decoded by the brain into perception of direction. An electronic filter modeling the transfer function of the ear from various directions is able to fool the brain into thinking the sound is actually coming from those directions. This is surprisingly effective and has to be heard to be believed.

PCs are becoming increasingly sophisticated with regard to their audio capabilities, as a visit to any computer shop will attest. Surround outputs are ubiquitous on modern soundcards and motherboards with integrated audio. Shelves are groaning with surround satellite+subwoofer speaker set-ups. Games are routinely produced with surround music and effects. DVD drives play films with surround audio. High-end soundcards from Creative Labs even come with DVD-Audio playback software. Broadcast, cable, and satellite digital TV in the USA offer the surround experience to viewers, using the same Dolby Digital system that is one of the audio options for DVD-Video disks.


Why 5.1?

In all cases, surround is being delivered to consumers digitally in the so-called 5.1 format, providing six discrete digital channels: Left Front, Right Front, Left Surround, Right Surround, Center, and Subwoofer.

Today’s systems offer a tremendous jump in quality over the early quadraphonic attempts to woo consumers. Those were “matrix” systems that combined four tracks into two using phase-shifting techniques. They had the advantage that existing stereo vinyl records and FM transmission could be used to convey the audio to consumers. They were even compatible! One record, one broadcast could serve both quad and stereo listeners. This was all certainly convenient to broadcasters: just playing a surround-encoded record in the same way and with the same equipment that you would play a stereo one made you a surround broadcaster.

Alas, compatibility was achieved only in the imaginings of the record company PR departments. The reality was quite something else. As the systems were heard, their faults became apparent: the vast majority of people still listening in stereo received music that sounded quite different from what they were used to hearing, with distinctively strange placement, reverb, and cancellation effects. And the separation of the surround channels proved disappointing, dipping to as low as 3dB between some of the channels. (Which channels depended on which system; each made different compromises.) Manufacturers introduced “logic” enhancements to try to improve apparent separation, but these work best when there is simple audio coming from one principal direction and tend to fall apart with more complex material. One worst case is separate talkers originating in each of the channels, which causes funny steering effects, resulting in parts of the speech coming from the wrong place. Another is centered speech or singing, which can modulate the width of the accompanying music.

As they deserved, these matrix approaches to delivering surround music were abandoned after only a couple of years. It is clear in retrospect that the valiant efforts to marry vinyl and analog with multi-channel were premature. We needed a digital carrier to make surround a practical reality.

The 5.1 channels idea was the first surround method conceived in the digital era. Work on it was begun in 1987 by the Society of Motion Picture and Television Engineers when it looked to be possible to digitally encode audio for film releases. SMPTE decided that 5.1 channels were satisfactory to create the aural sensations film producers desired. The name was proposed by film sound innovator Tomlinson Holman to initial confusion but eventual acceptance. The “point one” channel is the subwoofer, with the decimal value suggesting the limited frequency response of the channel. However, all six channels are actually stored with full-fidelity bandwidth. The subwoofer channel is played back 10dB louder than the others, effectively offering producers 10dB more volume headroom for “Low Frequency Enhancement” as it is called in the film world.

Prior to the establishment of 5.1 as a name and standard, a few surround films had been released on 70mm that had the capability for six audio channels in the longstanding analog optical format. Star Wars was the first, followed by Close Encounters, Superman, and Apocalypse Now. These were all big hits and all used essentially the 5.1 L-C-R-LS-RS-Sub arrangement, so there had been successful real-world experience with the format.

There is limited physical space on film for the marks needed to encode digital values and at first it seemed there would not be enough space to hold six channels. But a new development came along just in time: audio “coding” or compression. With the possibility to reduce the bitrate by a factor of up to 10 over simple PCM, multichannel digital for film became a practical reality. Dolby Digital, DTS, and Sony SDDS were invented to exploit the opportunity and remain in widespread use today.


Hey, where’s Radio?

By now you have probably noticed what’s missing from this picture: Radio. With the exception of a few stations playing quad records back in the 70s and a very few recent experiments with matrix systems, radio broadcasting has been absent from participation in the advance of surround listening – certainly odd, given that radio’s business is uniquely dependent upon creating a satisfying aural experience. While it has always been possible to transmit matrix surround over traditional FM, the introduction of iBiquity HD Radio in the USA offers the opportunity to give listeners digital surround with quality commensurate with their current and soon-to-come experiences with movie theaters, DVD film, TV broadcast, surround music disks, computer audio, and portable players. European DAB offers a similar opportunity for surround upgrade, and first steps are underway to enhance this radio service.

The iBiquity HD Radio service has a bitrate of 100kbps, of which 96kbps is used for audio. Only a couple of years ago, this would have been thought to be too little for high-fidelity stereo. Surround at this rate was but a dream. As with the introduction of digital surround to film, enabled by the just-in-time availability of audio compression technology, another bit of magic seems to have appeared just when needed for surround radio broadcasting: surround cue coding technology. This amazing development allows a stereo signal to be expanded to surround with an additional 16kbps added to the basic stereo rate. Meanwhile the Spectral Band Replication (SBR) enhancement to the HD stereo codec means that it is possible to operate it at lower rates. The combination of 80kbps for the base stereo and 16kbps is just what we need for HD Radio’s 96kbps rate. iBiquity has a proposal before the FCC to increase the HD bitrate to 150kbps, which would allow either a quality improvement or additional programs to be broadcast. These latter could be realtime or for download to local storage for later playback, as is contemplated in NPR’s Tomorrow Radio project. While discussing FM band digital capacity, it should also be mentioned that there is a technology on the horizon that would provide yet an additional 64kbps in the current SCA spectral area.
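The bit budget described above is easy to check. The following is only a back-of-envelope sketch in Python; the constant and function names are ours, and the figures are simply the ones quoted in the text.

```python
# Figures quoted in the text, all in kbps.
HD_TOTAL_KBPS = 100      # total HD Radio service bitrate
HD_AUDIO_KBPS = 96       # portion available for audio
STEREO_CORE_KBPS = 80    # base stereo codec rate (made possible by SBR)
SURROUND_SIDE_KBPS = 16  # surround cue side information


def audio_budget_used():
    """Return the audio bitrate consumed by the stereo core plus surround cues."""
    return STEREO_CORE_KBPS + SURROUND_SIDE_KBPS


used = audio_budget_used()
print(f"audio payload: {used} of {HD_AUDIO_KBPS} kbps")
print(f"remaining in the {HD_TOTAL_KBPS} kbps service: {HD_TOTAL_KBPS - used} kbps")
```

The stereo core and the surround side information together exactly fill the 96kbps audio allocation, which is why the lower stereo rate enabled by SBR matters.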

The work to create the efficient surround codec began more than a decade ago at Bell Labs. It was originally proposed as a way to get highly-efficient stereo coding from a base mono audio signal. The research continued as parts of Bell Labs became Lucent and later Agere Systems, joined by the public Fraunhofer IIS laboratory in Germany. Fraunhofer is the inventor of MP3 and MPEG AAC, so their interest naturally turned to enhancing MP3 and AAC. To that end, they have already introduced MP3 Surround using the technology. Similar work was later started at Philips in the Netherlands, which has been taken up by Coding Technologies, the Swedish firm that invented the “plus” enhancements to MP3 and AAC and developed the current HD Radio codec. Both were submitted to MPEG for evaluation, with the result that each was found to have specific advantages. The organizations have agreed to a module-by-module merge in order to take the best parts from each to incorporate into the MPEG efficient surround standard, which is expected eventually to become a part of the AAC audio codec family. But it can be used in combination with any stereo codec at the core, including the HD codec.

Remember that the quad broadcasting experiments came to an end primarily because it was not possible to achieve acceptable compatibility in stereo. Fortunately, this is guaranteed not to be a problem with the modern MPEG approach because the stereo signal is taken directly from the original source and sent on to the listener without modification. Unlike with the matrix systems, there is no requirement for the original 5.1 source to be downmixed to create the stereo broadcast. Instead the system takes both the stereo and the 5.1 signals from the source, such as a DVD-A or SACD disk, and uses the 5.1 to create the 16kbps surround-coded stream. This stream contains the information necessary to expand the stereo to surround at the decoder. So we have a fortunate match of the coder’s characteristics to the application. If for some reason we have only the 5.1 source, it would be possible to downmix it automatically to create the stereo signal, but if we have a handcrafted stereo mix available, as we usually do, we are able to use it. Advances in digital transmission and codec technology let us get that so often sought and so rarely achieved “something for nothing.” Station owners bought an old-fashioned analog stereo license and now find themselves with the potential to offer a state-of-the-art digital surround service just when they need it to compete. Not bad!

The one price to be paid is the need to upgrade studio facilities to surround. Specifically, we need to store, network, and mix in the 2 + 5.1 format. This is the main objection that proponents of matrix systems proffer – that their systems can be used with existing stereo facilities. So we need to examine what cost is associated with a surround upgrade. Sure, it requires investment, but not nearly as much as it would have had yet another technology advance not arrived right on time: computer networking as a more capable, lower-cost substitute for obsolete analog and 20-year-old first-generation digital technologies. Together with the ever-lower cost and ever-larger capacity of hard disks for storage, and ever-faster and cheaper CPUs for mixing, editing, and processing, it is possible to build a modern surround facility for around half the cost of a stereo facility made the traditional way. For stations with older facilities that need to be upgraded anyway, we’re back to something for nothing.

Some thought has been given to the idea of pre-encoding the surround and storing the 16kbps stream along with the stereo audio on delivery systems. This would avoid the need for a 2 + 5.1 facility, but would require a special mixing console that could take the encoded surround streams from various mixed sources and combine them with appropriate algorithms to make the stream that would get transmitted. You would save hard drive space, but the complexity in the mixing console, and difficulties with sync and monitoring almost certainly outweigh any advantage from this approach.


Build a Modern Surround Radio Studio Facility

So let’s walk through how one would build a modern 2 + 5.1 plant, with a careful eye to cost. We start with the routing and distribution infrastructure, then the PC-based delivery system, move to the mixing console, then on to the surround encoding, dynamics processing, STL, and transmitter. Then we’ll discuss monitoring and the production studio’s needs. All will be in the context of using computers and computer networking to provide the functions we used to get from the old proprietary radio station machinery.

Think of the old-fashioned automation systems with their reels of tape, mechanical cart carousels, pegboards for programming, relay switching, etc. Contrast this to a desktop PC with broadcast software doing the same thing. The latter is less troublesome, cheaper (by far), has commodity parts that can be replaced from local shops, enables new operating paradigms like remote voicetracking, etc. These are all benefits of using a ubiquitous platform that is produced in very high volume. But why limit this benefit only to playback? Why not take this idea and extend it further to modernize and improve the other aspects of a station facility?


The Routing and Distribution Infrastructure

Here we are talking about the glue that binds the studios together. In smaller stations, this may only be a couple of distribution amps that provide a couple of rooms with network feeds. But larger facilities, including the common consolidated ones in the USA, usually need to have flexible audio routing so that sources originating at any point within the plant may be consumed anywhere else. There is also the need to distribute a multitude of network feeds to a number of studios, switch various studios to transmitters and other outgoing lines, etc. Thus in the past years, we have seen the increasing application of facility-wide audio routers such as have been common in TV facilities for some time. These are proprietary boxes filled with cards that communicate via a backplane and offer various kinds of input/output. They look very much like the telephone PBXs that have been in use over the past decades and share many characteristics. These are manufactured in low volume for our very small industry, and are consequently expensive. Each input or output requires a port on a card which needs physical space, conversion chips, etc. An 8-channel input such as we need for surround would require 8 individual XLRs for analog or 4 for AES3 connections. Same for a surround output.

We propose a system that uses an Ethernet switch as an audio router. When analog or AES3 inputs and outputs are needed, these are converted in “nodes” to Ethernet. But this system requires many fewer of these because most devices communicate directly via a single Ethernet RJ-45.

An Ethernet 100BaseT link has 100Mbps capacity, enough to transport 25 uncompressed stereo signals or three 8-channel surround signals. And these are bi-directional. One RJ-45 thus substitutes for as many as 100 XLRs!
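To put these capacity figures in perspective, here is a rough Python sketch of the raw PCM payload rates involved. The 48 kHz sample rate is our assumption (the text does not state one), the 24-bit depth is the one proposed later for storage, and packet, header, and framing overhead are ignored entirely, so real-world capacity will be lower than the raw numbers suggest.

```python
def pcm_bitrate_bps(channels, sample_rate_hz=48_000, bits_per_sample=24):
    """Raw PCM payload bitrate, ignoring all packet and framing overhead."""
    return channels * sample_rate_hz * bits_per_sample


LINK_BPS = 100_000_000  # 100BaseT capacity in each direction

stereo = pcm_bitrate_bps(2)    # 2,304,000 bps
surround = pcm_bitrate_bps(8)  # 9,216,000 bps

for label, rate, count in (("stereo", stereo, 25), ("8-channel", surround, 3)):
    total = count * rate
    print(f"{count} x {label}: {total / 1e6:.1f} Mbps raw ({total / LINK_BPS:.0%} of link)")

# 25 stereo signals x 2 channels x 2 directions, one XLR per analog channel.
xlr_equivalents = 25 * 2 * 2
print(f"XLR connectors replaced: {xlr_equivalents}")
```

Under these assumptions both loads fit within the link with room to spare for packet overhead and other traffic.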


Delivery System

Most stations are using PC-based delivery systems to play music, promos, commercials, etc. With today’s low-cost, high-capacity hard drives, there is no significant barrier to storing the 8 channels optimally needed for MPEG surround. A 200 Gigabyte drive costs around $150 and can store 1200 surround songs with no compression.

Figure: A low-cost Ethernet RJ-45 handles discrete bidirectional surround connections.

Figure: Surround Radio Station: Functional Perspective
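How many songs fit on a drive depends heavily on the average song length and the sample rate, neither of which is stated here. As a hedged sketch in Python, assuming 48 kHz sampling, 3 bytes (24 bits) per sample, 8 channels, and decimal gigabytes, the estimate looks like this:

```python
def songs_per_drive(drive_bytes, song_seconds, channels=8,
                    sample_rate_hz=48_000, bytes_per_sample=3):
    """How many uncompressed songs of a given length fit on a drive."""
    bytes_per_song = channels * sample_rate_hz * bytes_per_sample * song_seconds
    return drive_bytes // bytes_per_song


DRIVE_BYTES = 200 * 10**9  # a "200 Gigabyte" drive, counted in decimal gigabytes

for seconds in (150, 180, 240):
    count = songs_per_drive(DRIVE_BYTES, seconds)
    print(f"{seconds / 60:.1f}-minute songs: about {count}")
```

With these assumptions the count lands in the range of several hundred to somewhat over a thousand songs, so the order of magnitude quoted above holds; the exact figure moves with the assumed song length.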

With an Ethernet infrastructure, there is no need for soundcards and their associated connectors. There is also no need for router or console inputs and/or outputs at the other end. Driver software passes the audio to and from the audio playback application and the Ethernet. Physical connection is via a single RJ-45.

Modern Ethernet switches support “Quality of Service” prioritization, so that general data may share the same link as audio. That means that you can use the same network for both audio playback and for other applications like file downloads from a server.

We propose to store audio in eight 24-bit integer packed PCM (uncompressed) channels in standard Windows interleaved wav format.

This layout is standardized within the ITU and SMPTE for interchange of program content accompanying a picture and is widely used with TV digital tape recorders. The Music Producer’s Guild of America has also endorsed it. For Windows PCs, this will be stored in the RIFF/WAVE audio file format, which is a variation of the longstanding .wav format. The “fmt” (format description) chunk is a WAVEFORMATEXTENSIBLE structure that allows description of multichannel formats as well as any other PCM and non-PCM audio formats.
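To make the file layout concrete, the following Python sketch builds the “fmt” chunk for an 8-channel, 24-bit file as a WAVEFORMATEXTENSIBLE structure. The 48 kHz sample rate and the channel mask of 0 (meaning no particular speaker assignment) are our assumptions; the text does not specify either, and the function name is ours.

```python
import struct
import uuid

WAVE_FORMAT_EXTENSIBLE = 0xFFFE
# KSDATAFORMAT_SUBTYPE_PCM, the SubFormat GUID that identifies integer PCM.
SUBTYPE_PCM = uuid.UUID("00000001-0000-0010-8000-00aa00389b71")


def build_fmt_chunk(channels=8, sample_rate=48_000, bits=24, channel_mask=0):
    """Build a RIFF 'fmt ' chunk holding a WAVEFORMATEXTENSIBLE structure.

    channel_mask=0 declares no particular speaker assignment; which mask,
    if any, suits a 2 + 5.1 layout is an assumption left open here.
    """
    block_align = channels * bits // 8           # bytes per interleaved frame
    avg_bytes_per_sec = sample_rate * block_align
    body = struct.pack(
        "<HHIIHHHHI16s",
        WAVE_FORMAT_EXTENSIBLE,  # wFormatTag
        channels,                # nChannels
        sample_rate,             # nSamplesPerSec
        avg_bytes_per_sec,       # nAvgBytesPerSec
        block_align,             # nBlockAlign
        bits,                    # wBitsPerSample (container size)
        22,                      # cbSize: bytes that follow WAVEFORMATEX
        bits,                    # wValidBitsPerSample
        channel_mask,            # dwChannelMask
        SUBTYPE_PCM.bytes_le,    # SubFormat GUID in on-disk byte order
    )
    return b"fmt " + struct.pack("<I", len(body)) + body


chunk = build_fmt_chunk()
frame_size = struct.unpack_from("<H", chunk, 20)[0]
print(len(chunk), "byte chunk; each interleaved frame is", frame_size, "bytes")
```

A playback application reads this chunk first, which is how it learns the channel count and bit depth before touching the sample data.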

We choose 24-bit because DVD-Audio has this resolution and SACD has dynamic range that could take advantage of this bit depth. The Axia Livewire network also has 24-bit resolution, so we have a match between the source, the storage, and the network. Compact disks have 16 bits and this has been the norm in broadcasting, but 24 bits are the future. 16-bit systems have theoretically 94dB dynamic range, and 24-bit systems 141dB. Both are plenty for radio broadcasting, but having more bits means that distortion at low audio levels is reduced, which may be audible – even (or particularly?) after aggressive processing. Were big cheap hard drives not available, we’d probably want to stay with 16 bits – but with drives so cheap, why not splurge?
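For context on the dynamic range figures, the textbook signal-to-noise ratio of an ideal N-bit quantizer with a full-scale sine wave is about 6.02N + 1.76 dB. The text does not say how its numbers were derived; one accounting that lands close to them, and this is purely our assumption, subtracts the noise added by TPDF dither, which triples the noise power. A small Python sketch of both calculations:

```python
import math


def ideal_snr_db(bits):
    """SNR of an ideal N-bit quantizer for a full-scale sine wave."""
    return 20 * math.log10(2 ** bits) + 10 * math.log10(1.5)


# TPDF dither triples the total noise power compared with undithered
# quantization noise, a penalty of 10*log10(3) dB.
TPDF_PENALTY_DB = 10 * math.log10(3)

for bits in (16, 24):
    ideal = ideal_snr_db(bits)
    dithered = ideal - TPDF_PENALTY_DB
    print(f"{bits}-bit: {ideal:.1f} dB ideal, {dithered:.1f} dB with TPDF dither")
```

Whichever accounting is used, each additional bit is worth about 6 dB, so the step from 16 to 24 bits adds roughly 48 dB.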

Audio that is stored in other formats (compressed, fewer bits, mono only, stereo, or surround) should be decompressed and/or up- or down-mixed as needed to convert it to the network’s standard format. The file header tells the application about the format so that it knows what to do.

We could consider compressing the surround channels to extend capacity. Since they are only used as inputs to the surround position encoder and are not actually transmitted, there would be no degradation of the on-air quality at all. On the other hand, swapping to a bigger drive or adding another one is so cheap that perhaps there is no compelling reason to bother with it.

A software “driver” installed in the PC makes the network look like a standard Windows Driver Model (WDM) soundcard, so any audio application that works with usual soundcards should work without modification to send and receive audio from the network.


Mixing Console

A modern mixing console can be built with two ingredients: a control surface, and a mixing and processing Engine with PC motherboard, CPU, and Ethernet connection. While the control surface has to be manufactured in the small volumes our industry dictates, the Engine can take advantage of powerful, high-volume, low-cost components from the computer world. A commodity 2.4 Gigahertz Pentium 4 CPU has plenty of horsepower to support mixing, equalization, panning, dynamics control, etc. for a 24-fader surround broadcast console.

The Engine has only two connectors: power and Gigabit Ethernet. All audio and control pass via the single RJ-45 Ethernet jack. With no hard drive (software is stored in a Compact Flash card), embedded Linux as the operating system, and all parts mounted on one PCB, reliability is probably higher than a traditional digital mixing engine with its many plug-in DSP, CPU, input/output cards, etc.

Cost to provide surround mixing in this PC Engine-based console is the same as for stereo. There is no incremental increase in cost going to surround from stereo because the P4 platform has so much headroom that surround mixing software can be added without changing any hardware. The Gigabit Ethernet connection has enough capacity as well to support the additional surround signals. Contrast this with a surround upgrade to a traditional console. You would have four times the dozens of audio in/out connectors already needed for stereo and many more plug-in cards, leading to probably having to increase the size of the frame. Your cost increment would be tens of thousands of dollars.

Via the Ethernet switch, the console has access to any audio source in the system. Its various outputs may be consumed anywhere within the facility.

There will surely be a lot of experimentation with microphone ideas for surround. In most cases, mono mics will be panned to a position within the surround stage. Perhaps reverb with multiple outputs, time-delay, pitch-shifting, or comb-filtering processing will be used to create a sense of immersive spaciousness. Surround panning will be part of the console and so microphones without additional processing will cost no more to support in surround than in stereo.
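As an illustration of what panning a mono microphone into the surround stage can involve, here is a minimal Python sketch of pairwise constant-power panning between adjacent speakers. This is a common textbook technique and not necessarily what any particular console does; the speaker angles are those of the ITU layout described in the monitoring section, and the function and variable names are ours.

```python
import math

# Speaker azimuths in degrees, 0 = straight ahead, positive = listener's right.
SPEAKERS = [("LS", -110.0), ("L", -30.0), ("C", 0.0), ("R", 30.0), ("RS", 110.0)]


def pan_gains(azimuth_deg):
    """Return {speaker: gain} for a mono source placed at azimuth_deg.

    The source is shared between the two speakers that bracket it, with
    cosine/sine gains so the total power stays constant.
    """
    az = ((azimuth_deg + 180.0) % 360.0) - 180.0  # wrap into [-180, 180)
    gains = {name: 0.0 for name, _ in SPEAKERS}
    for i, (name_a, ang_a) in enumerate(SPEAKERS):
        name_b, ang_b = SPEAKERS[(i + 1) % len(SPEAKERS)]
        span = (ang_b - ang_a) % 360.0    # clockwise arc from speaker a to b
        offset = (az - ang_a) % 360.0     # source position along that arc
        if offset <= span:
            frac = offset / span
            gains[name_a] = math.cos(frac * math.pi / 2)
            gains[name_b] = math.sin(frac * math.pi / 2)
            break
    return gains


for az in (0, -15, 70, 180):
    shown = {k: round(v, 3) for k, v in pan_gains(az).items() if v > 1e-9}
    print(az, shown)
```

Because the work is a handful of multiplications per channel, it is easy to see why panning adds no meaningful cost on a general-purpose CPU.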

Other local inputs and outputs would require corresponding ports in the audio-to-Ethernet nodes. Stereo CD players would need only the usual two input ports and would be panned to surround within the console. Only surround SACD and DVD players would require surround inputs. They would normally connect 5.1 channels with the downmixing to stereo happening within the console.


Surround Encoding

The surround encoder is another Ethernet-connected box. One RJ-45 serves all required inputs and outputs. The 2 + 5.1 channels from the console program output are the inputs and the output is a 16kbps coded surround stream that gets sent to the transmitter. Both a front panel LCD and a web browser interface are provided to select the appropriate input channels from the network and to permit users to monitor operation.


Dynamics Processing

Initial testing indicates that existing stereo processing is satisfactory for the MPEG surround system. Any dynamics processing that is applied to the stereo channel affects the received surround channels as well. Because today’s processing is not enabled for direct Ethernet connection, a node is used to adapt the network audio to the processor’s input and to apply the output back to the network.

Future processors may incorporate the surround encoder and offer more sophisticated individual processing control over the stereo and surround channels. For example, it may be interesting to have a way to “deprocess” the surround channels somewhat, while maintaining a more aggressive sound on the stereo program. More than a few listeners exposed to surround have said that the envelopment effect causes a perception of high energy similar to what programmers and engineers try to achieve with dynamics compression.

FM and HD require different processing styles. FM needs special attention to pre-emphasis, usually quite a lot of left/right clipping, and perhaps even some composite clipping. The HD encoder has to work harder on a clipped signal and will not have as good a result as from a non-clipped signal since the additional harmonics look like audio that needs to be encoded and therefore attempt to receive bit allocation. So a processor optimized for the HD channel will generally use a look-ahead limiter rather than a clipper. Stations probably will decide to process the HD program less than the FM in order to offer a more “purist” signal to listeners. (For now, anyway. When HD Radio gets popular, all bets are off.) The most popular processors use a common front-end AGC section and follow that with independent limiter sections for FM and HD. Thus, both outputs need to be connected ultimately to their respective transmitters.
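The difference between the two peak-control styles can be sketched in a few lines of Python. This is a toy illustration under our own assumptions, not the algorithm of any commercial processor: the clipper simply flattens peaks, while the look-ahead limiter delays the audio slightly so that a smoothed gain reduction is already in place when a peak arrives.

```python
def hard_clip(samples, threshold):
    """Clip each sample to +/-threshold; flattens peaks and adds harmonics."""
    return [max(-threshold, min(threshold, s)) for s in samples]


def lookahead_limit(samples, threshold, lookahead):
    """Peak limiter that delays the audio by `lookahead` samples.

    The delay lets the gain start falling before a peak arrives, so the
    waveform is scaled rather than flattened. The last `lookahead` input
    samples are not emitted in this simple sketch.
    """
    n = len(samples)
    # Gain each sample would need on its own to stay under the threshold.
    need = [min(1.0, threshold / abs(s)) if s else 1.0 for s in samples]
    # Sliding minimum over the current and previous `lookahead` samples.
    held = [min(need[max(0, i - lookahead): i + 1]) for i in range(n)]
    out = []
    for i in range(n):
        window = held[max(0, i - lookahead): i + 1]
        gain = sum(window) / len(window)  # moving average smooths the gain
        delayed = samples[i - lookahead] if i >= lookahead else 0.0
        out.append(delayed * gain)
    return out


burst = [0.1] * 20 + [2.0] + [0.1] * 20
print("clipped peak:", max(hard_clip(burst, 1.0)))
print("limited peak:", round(max(lookahead_limit(burst, 1.0, 8)), 6))
```

Both versions keep the peak at the threshold, but the limiter does so by gradually turning the level down and back up around the peak, which avoids the added harmonics that would otherwise compete for bits in the HD encoder.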


HD Radio Encoding

In iBiquity’s second-generation HD Radio system, the encoder is located at the studio. This has the advantage that the processing may be co-located at the studio and the STL only has to convey the encoded HD signal, tremendously reducing its bandwidth requirements. It also gives the benefit that any additional data that needs to be muxed-in can be applied at the studio. This data could be Program Associated Data (PAD) like song titles, or indeed our 16kbps coded surround stream. The input for this data is via Ethernet, so connecting it to the network easily enables a path from the surround encoder.


Studio to Transmitter Link

In most cases, we need to get our FM program audio to the transmitter in either composite stereo or PCM form. And we need to send the 96kbps HD radio signal. We could decide to do this with two independent links, or we could use one STL radio to handle both.

A digital STL such as the Moseley Starlink can be used in this set-up. The FM audio goes via the usual input and the HD radio signal via the ancillary data channel. These radios don’t have much capacity because they operate in the traditional 950 MHz band, where not much bandwidth is available. Because their operating frequencies are protected by license and because the frequencies they use are (relatively) low, they are quite reliable.

Another way would be to use the new Ethernet radios like the BE Big Pipe. These operate with bitrates up to 45Mbps, so there is a lot of capacity for multiple audio channels as well as data, VoIP phone, etc. Since we already have all our facility’s audio on the Ethernet, no format conversion is required – just connect the radio’s Ethernet jack to a port on the Ethernet switch. These operate in the unlicensed ISM band at 5.2 and 5.7 GHz, so there is some risk. However, the few current users report good performance and overall satisfaction.


At the Transmitter

With the HD encoding at the studio, there is not much to be done at the transmitter site. The HD exciter simply accepts the already encoded and multiplexed bitstream from the STL and modulates it for transmission.

The FM audio is applied to the FM exciter and transmitted as usual.


Listening Monitoring

It’s going to be necessary to listen to your internal audio and your station’s on-air program in surround. This means 5 small speakers and one subwoofer in each serious monitoring position.

The old quad arrangement, with the speakers in each corner of the room, is not the right way to position your monitoring set-up. Human ears are not front-to-back symmetrical and that set-up not only sounds unnatural, but may indeed provoke stress as your deep genetic wiring causes your brain to tell you that “there is danger behind.” The right way is defined in ITU standard 775 (ITU-R BS.775). This specifies the left and right front speakers to be placed at 30° to either side of center as seen from the listening position. The surrounds go at 110° ±10° - just a bit back of straight out to the sides. The center goes in the center and the sub goes wherever it sounds the best or is out of the way. You should not be able to detect the position of the subwoofer. According to well-researched psychoacoustics, humans are not able to localize frequencies below 80Hz. Our heads are too small and our ears too close together at these long wavelengths to detect any left-right difference. If you are able to locate the sub, it probably means that it is radiating audio at a frequency high enough to be localized. One cause of this is distortion-caused harmonics outside of the sub’s proper operating range.
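To translate those angles into a room layout, the following Python sketch computes floor coordinates for the five main speakers on a circle around the listener. The 2 m radius in the example and the 343 m/s speed of sound are our assumptions, and the names are ours.

```python
import math

# Azimuths in degrees from straight ahead, positive to the listener's right.
ITU_775_AZIMUTHS = {"L": -30, "C": 0, "R": 30, "LS": -110, "RS": 110}


def speaker_positions(radius_m):
    """Return {name: (x, y)} in metres; listener at the origin, +y straight ahead."""
    positions = {}
    for name, azimuth in ITU_775_AZIMUTHS.items():
        theta = math.radians(azimuth)
        positions[name] = (radius_m * math.sin(theta), radius_m * math.cos(theta))
    return positions


for name, (x, y) in speaker_positions(2.0).items():
    print(f"{name:>2}: x = {x:+.2f} m, y = {y:+.2f} m")

# Wavelength at 80 Hz, assuming sound travels at 343 m/s.
print(f"80 Hz wavelength: {343.0 / 80.0:.2f} m")
```

The surrounds come out slightly behind the listener (negative y), and the 80 Hz wavelength of more than four metres shows how small the spacing between our ears is by comparison.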

Another psychoacoustic phenomenon to be aware of is the human ear’s change in frequency response from different positions due to the Head-Related Transfer Function (HRTF). Sound entering the ear from the side speakers will be perceived as bright compared to sound panned to the front speakers. The effect is significant – a broad curve starting at 1.6kHz, reaching an 8dB peak at 4kHz, and extending to 7kHz. Music producers have probably already compensated for this in their mixes, so it’s not an issue for normal listening. But if you are checking your set-up with white noise, you will likely notice this.

While the usual set-up calls for a one-to-one correspondence between channels and speakers, when you have small main speakers, you will probably need “bass management” to filter the low frequencies from the small speakers and re-direct them to the subwoofer. This means that the subwoofer will be responsible for the sum of the “.1” bass channel and the filtered lows from each of the other channels. This is how the “theater in a box” systems so popular with consumers do it.
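The signal flow of bass management can be sketched as follows in Python. This is a simplified illustration: the first-order filter stands in for the steeper crossovers real products use, the 80 Hz crossover frequency and 48 kHz sample rate are our assumptions, and level alignment of the “.1” channel is left out.

```python
import math


def one_pole_lowpass(samples, cutoff_hz, sample_rate_hz):
    """First-order low-pass filter (a deliberately simple stand-in crossover)."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate_hz)
    out, state = [], 0.0
    for s in samples:
        state += alpha * (s - state)
        out.append(state)
    return out


def bass_manage(mains, lfe, cutoff_hz=80.0, sample_rate_hz=48_000):
    """mains: {name: samples}; lfe: samples. Returns (filtered_mains, sub_feed)."""
    sub = list(lfe)
    filtered = {}
    for name, samples in mains.items():
        lows = one_pole_lowpass(samples, cutoff_hz, sample_rate_hz)
        # The small speaker keeps what is left once the lows are taken out.
        filtered[name] = [s - low for s, low in zip(samples, lows)]
        # The removed lows are added to the subwoofer feed.
        sub = [acc + low for acc, low in zip(sub, lows)]
    return filtered, sub


# A constant (0 Hz) input should end up entirely in the subwoofer feed.
mains = {"L": [1.0] * 5000, "R": [1.0] * 5000}
filtered, sub = bass_manage(mains, lfe=[0.0] * 5000)
print(round(filtered["L"][-1], 6), round(sub[-1], 6))
```

The subwoofer feed is the “.1” channel plus the low-frequency content removed from every main channel, which is exactly the sum described above.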

As another compromise, you could leave off the center speaker and add the center signal to both the left and right front speakers. (I actually recommend this for your home music listening system. While the center speaker is helpful to stabilize the dialog that accompanies video, it has been my experience that music is better without it. You have the practical consideration that the center speaker is probably not nearly as good as your two front mains – and it can’t be if you have a TV in front of you since the screen and the speaker can’t share the same space. Movie theaters solve this by putting the speakers behind a screen with holes in it. Punching holes in your TV’s CRT is not very likely to be a satisfying operation…)


Production Studio

Most PC-based audio editors such as Adobe Audition, ProTools, etc. support mixing for surround, a procedure not much more complicated than stereo mixing. For a production studio now equipped with a PC editor (are there many that aren’t these days?), a soundcard upgrade and a surround monitoring loudspeaker set-up may be all that are required to start producing in surround.

For dubbing surround music from disks to the delivery system, the production studio will need a DVDAudio and SACD player, which may be one universal device. These players will not output stereo and 5.1 simultaneously, so the tracks need to be recorded separately and synchronized in an audio editor.



We can get another advantage from the network. It’s perfect for the distribution of clock time to everything in our studios that needs it. PCs, consoles, processors with time-triggered presets, and wall clocks can all be synchronized and accurate. The key is Network Time Protocol (NTP). This is the Internet’s way to communicate time to any interested connected device. It is a sophisticated system that delivers accurate time even when the path from the time server to the receiver has varying delay. It accomplishes this by using a local phase-locked loop that is corrected for delay by measuring the packet round-trip time. Stations have two options. If you have a permanent Internet connection, you can retrieve the NTP data from it. You will need to run a local “NTP server” on a PC. Usually this would be equipped with two Ethernet cards, with one linked to the Internet and the other to the station’s local network, thus providing firewall isolation. The other option is to use a device that receives the WWV or GPS satellite time signals and generates the NTP packets from the radio signals.
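The round-trip measurement at the heart of NTP is simple arithmetic on four timestamps. The sketch below shows the standard offset and delay calculation from the NTP specification; the function and variable names are illustrative, and a real client also filters many such measurements before it adjusts the clock.

```python
def ntp_offset_and_delay(t1, t2, t3, t4):
    """Estimate clock offset and round-trip delay from one NTP exchange.

    t1: client clock when the request was sent
    t2: server clock when the request arrived
    t3: server clock when the reply was sent
    t4: client clock when the reply arrived
    All values are in seconds.
    """
    # Offset of the server clock relative to the client clock, assuming
    # the outbound and return paths take equal time.
    offset = ((t2 - t1) + (t3 - t4)) / 2.0
    # Time spent on the network: total elapsed time minus server processing.
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay
```

For example, if the client sends at 100.000 s and receives the reply at 100.110 s by its own clock, while the server stamps the request at 110.050 s and the reply at 110.060 s, the calculation gives an offset of about 10 s and a round-trip delay of about 0.1 s.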



If the audio network is engineered with sufficient capacity and it correctly supports modern priority mechanisms, it could also be used for the station’s data needs. Email, web browsing, client-server downloads, etc. may traverse the common network. The Ethernet switch isolates the data traffic from the audio streams. When audio and data need to share the same switch port and link, the audio is assured to have first call on the bandwidth because it has higher priority than the data and the switch knows to hold any data packets until the audio is sent. The TCP (Transmission Control Protocol) part of TCP/IP in the network interfaces of computers automatically regulates the data transmission rate to fill the link capacity not occupied by audio.
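The “audio first” behavior at a shared switch port can be modeled as two queues served in strict priority order. The class below is a toy model with hypothetical names, not the implementation of any particular switch.

```python
from collections import deque

class PriorityPort:
    """Toy model of one switch output port with strict-priority queuing."""

    def __init__(self):
        self.audio = deque()  # high-priority queue
        self.data = deque()   # low-priority queue

    def enqueue(self, packet, is_audio):
        """Place an arriving packet in the queue for its priority class."""
        (self.audio if is_audio else self.data).append(packet)

    def next_packet(self):
        """Return the next packet to transmit, or None if the port is idle."""
        if self.audio:            # audio always goes first
            return self.audio.popleft()
        if self.data:             # data is sent only when no audio is waiting
            return self.data.popleft()
        return None
```

Whatever the arrival order, queued data packets leave the port only after every waiting audio packet has been sent.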

A more conservative approach is to have two networks. Computers that need to have access to both could have two Ethernet cards with a connection from one to the audio and the other to the data network. Or an IP router could be used to safely pass data from one network to the other.


Looking Again at the Diagram

In the days of simple analog equipment the connection block diagram and the functional diagram told the same story. Signals flowed into and out of physical boxes with individual cables that each were represented by a line on the diagram. In our modern networked era, where a cable can carry multiple audio signals both ways along with control and data, the physical implementation is likely to be a lot simpler than the functional diagram implies. Here is the same studio facility we have been describing, but now shown from the perspective of the physical interconnections. In the computer networking world, an Ethernet is often shown as a simple bus to which everything connects.


Cost Comparisons

Since our focus has been on the real-world practicality of upgrading a facility to surround, let’s explore the cost to build surround-capable studios with different approaches. For this discussion, we will have a basic studio set-up with a 12 fader console, 3 mics, 4 automation sources, 2 codecs, 2 phone hybrids, and 1 SACD player. We will compare the cost of building a stereo studio with that of a surround studio. Here is the summary:


Conventional (digital router-based console)
Stereo $50k
Surround $80k

Networked console
Stereo $25k
Surround $30k

The networked console approach is much less expensive than the conventional (digital router-based console) systems for stereo studios. The small increment in cost for the networked system when expanding from stereo to surround-capable is due to the inherent characteristics of the networked approach. The networked radio studio has only a single cable for each source, whether stereo or surround. The DSP mixing engine is the same as for stereo. Soundcards are replaced by a software driver and the router is replaced by a low-cost Ethernet switch, which handles stereo or surround equally well. With conventional systems you must increase the quantity of PC soundcard ports, console input cards, output cards, cables, DSP cards, frame size, etc. by a factor of four. These costs can vary significantly between vendors but as a rule of thumb, a networked stereo console should cost about half that of its conventional counterpart. And the upgrade to surround will be only a small incremental cost for speakers, extra hard drive space and some additional I/O.
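The increments implied by the summary figures can be worked out directly. The sketch below assumes that the first pair of figures ($50k and $80k) belongs to the conventional approach and the second pair ($25k and $30k) to the networked one, which is how the paragraph above reads; the dictionary keys and function name are illustrative only.

```python
# Summary figures in thousands of dollars. The pairing of each set of
# figures with an approach is an assumption based on the surrounding text.
COSTS_K = {
    "conventional": {"stereo": 50, "surround": 80},
    "networked": {"stereo": 25, "surround": 30},
}

def surround_increment(costs):
    """Return (extra cost in $k, extra cost as a percentage of stereo cost)."""
    extra = costs["surround"] - costs["stereo"]
    return extra, 100.0 * extra / costs["stereo"]

for approach, costs in COSTS_K.items():
    extra, percent = surround_increment(costs)
    print(f"{approach}: +${extra}k ({percent:.0f}% over stereo)")
```

On these figures the move to surround adds $30k (60%) to the conventional studio but only $5k (20%) to the networked one.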



Cost is not the only advantage. Using networking from the computer industry brings a number of opportunities to enhance operations.

PCs can be used to listen to any audio channel within the facility. Just attach a PC to the network, add the appropriate software and select what you want to hear. PCs with web browsers can also be used to configure and monitor the performance of the various connected elements. With careful firewalling, this could be accomplished offsite via an Internet or dialup connection. A connection from a station’s internal network to a wide area network that connects multiple stations offers opportunities for sharing audio, either live or file-based. IP packet-based satellite links are yet another way local stations may be connected to remote audio and data. With the local network, the WAN, and the satellite all talking the same language, it becomes cheaper and easier to link-up. Novel ideas taking advantage of this connectivity are sure to arise.

It is possible to have Ethernet switch equipment with automatic back-up, redundant power supplies and controllers, etc. A single common network may support audio, telephone, control, and general data. With the cost of the network amortized over multiple functions, building redundancy into it can be readily justified.

Protection against obsolescence comes from using a networking technology that is over 30 years old but continues to evolve and grow. Ethernet and IP are not going away anytime soon.

Cables, wiring accessories, testers, etc., etc., are widely available off-the-shelf. Again, the huge computer industry volumes come into play to give us a lot of options at commodity prices.

You’ll be using a lot less cable compared to traditional infrastructure methods because each cable has so much capacity. Because Cat 6 cable is a commodity, it is cheaper than audio-specific cables.

There are university programs, training seminars, shelves of books, etc. devoted to educating people about data networking. You’ll be plugged into a much bigger human network.


Time for a Radio Revolution?

While the world around swirls with change and opportunity, not much has happened to the technology side of radio since the addition of stereo to FM in the early 60s. Other established media (film, TV, music disks) have exploded with innovation, and completely new media (Internet, iPod, satellite broadcasting) have burst onto the scene. All of the preceding have gained capability and appeal from having transitioned to digital. We finally have HD to take our industry into the digital era, but in stereo it offers a very small improvement to the FM listener’s experience. When we were growing up, FM was cool because it was at the pinnacle of audio delivery technology. Just the letters FM connoted a general sense of quality. With competitive media having surpassed radio, this connotation has faded. Without action, we will surely be sidelined.

Digital surround may well be an answer. It’s a way to please older music fans by making the classics fresh and to excite younger listeners with aural fireworks – all the better in cars with huge subwoofers. As Tomlinson Holman says:

“Perceptually, we know that everyone equipped with normal hearing can hear the difference between mono and stereo, and it is a large difference. And…virtually everyone can hear the difference between 2-channel stereo and 5.1 channel sound as a significant improvement.”

There is an argument that the HD channel capacity should be split and used to transmit additional programs to listeners. If we divide the current 96kbps channel into two 48kbps channels, we could certainly offer two good-fidelity talk services. (One of them would not benefit from HD Radio’s “revert to analog” feature in the case of digital failure, though.) There is no need for surround in this scenario. But 48kbps is not good enough for music. So we could imagine that a station operator could decide to use the available bandwidth for either two channels of talk or one channel of music. In the latter case, the 80 + 16 division for a surround music service is a perfect fit. Should iBiquity’s proposal to increase HD’s bandwidth succeed, a division of 80 + 16 + 48 could nicely serve one surround music service and one talk service.
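The channel divisions discussed above are easy to check against the 96kbps budget. In this sketch the plan names and the function are hypothetical, and the 144kbps total for the three-way split is simply the sum of its parts, not a figure quoted from any proposal.

```python
CURRENT_CAPACITY_KBPS = 96  # the current channel capacity given in the text

# Each plan is a list of bit rates in kbps, taken from the text.
PLANS = {
    "two talk services": [48, 48],
    "one surround music service": [80, 16],
    "surround music plus one talk service": [80, 16, 48],
}

def shortfall_kbps(parts, capacity=CURRENT_CAPACITY_KBPS):
    """Return how many kbps a plan needs beyond the capacity (0 if it fits)."""
    return max(0, sum(parts) - capacity)

for name, parts in PLANS.items():
    extra = shortfall_kbps(parts)
    status = "fits" if extra == 0 else f"needs {extra} kbps more"
    print(f"{name}: {sum(parts)} kbps, {status}")
```

The first two plans fill the current channel exactly; the third needs 48kbps of additional capacity.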

Either way, compared to a mere shift to stereo digital, there will be a clear motivation for a consumer to buy an HD-enabled receiver.

And a state-of-the-art networked studio facility supporting the creation of on-air product for these services presents an opportunity for both cost savings and operational flexibility.



Broadcast Electronics Big Pipe BP4500 Datasheet

Broadcast Electronics XPi 10 HD Radio Exporter Datasheet

Church, Steve. Introduction to Livewire.

Church, Steve. Designing and Building Your Livewire Ethernet System.

Church, Steve. Ethernet for Studio Audio Systems.
Proceedings of the NAB 2004 Broadcast Engineering Conference.

Dosch, Michael. Axia – A Network-Enabled Radio Console Architecture.
Proceedings of the NAB 2004 Broadcast Engineering Conference.

Herre, Juergen; Faller, Christof; Ertel, Christian; Hilpert, Johannes; Hoelzer, Andreas; Spenger, Claus 2004, MP3 Surround: Efficient and Compatible Coding of Multi-Channel Audio, AES preprint 6049, 116th annual convention, May 2004, Berlin

Holman, Tomlinson. 5.1 Surround Sound, Up and Running. Focal Press.

Internet Engineering Task Force (IETF) RFC 1305 Network Time Protocol (Version 3) Specification, Implementation

ISO/IEC JTC 1/SC 29/WG 11N6691, Procedures for the Evaluation of Spatial Audio Coding Systems, July 2004, Redmond, USA

ITU-R Recommendation 775, Multichannel stereophonic sound system with and without accompanying picture, International Telecommunications Union, Geneva, Switzerland

Moseley Starlink SL9003Q Datasheet

MP3 Surround: info and free evaluation download software Fraunhofer Institute for Integrated Circuits
