Radio broadcast audio mixing consoles have remained relatively unchanged for more than twenty years. Originally, source equipment connected to stand-alone mixing consoles with discrete analog signals. Later, the preferred method of interconnection became AES/EBU digital. More recently, high-end broadcast consoles have begun to offer proprietary centralized mixing and routing engines which make possible the sharing of sources between studios.
Using modern computer networking equipment, it is now possible to build robust networks capable of transporting digital media signals throughout a complete studio facility. This paper describes various console models, outlines the advantages offered by a studio network, and explains how future broadcast equipment, most notably mixing consoles, will need to change in order to fully exploit these advantages.
Sources are different now
The audio mixing console has long been the central processing and control device of the radio studio. Despite a trend toward digital processing, the basic architecture of the console has not changed in more than twenty years. Audio source equipment feeds the console analog or AES/EBU audio. The user mixes live and recorded elements and the outputs of the console feed the transmission chain and other destinations. This approach is heavily dependent on the user to push the right buttons at the right times so as to deliver the appropriate content. And sharing sources between studios is difficult. The stand-alone console is ideal for dedicated studios that can be set up for a certain show type and left unchanged.
Newer console designs have begun to offer integrated routing switchers using proprietary centralized mixing/routing time division multiplex (TDM) engine cores. These systems offer significant advantages over stand-alone console designs. Because all studio sources are connected to a central core engine, it is possible for sources to be shared by multiple studios. Further, because the mixing and routing is performed centrally, the studio console interface is a flexible control surface that can be reconfigured in software to accommodate changing show types, shared resources and the instant recall of user preferences and settings. The centralized mixing/routing engine approach reduces costs when compared to stand-alone mixing consoles due to reduced wiring costs and a consolidation of expensive components.
While these advancements offer benefits to the modern radio plant, even the most advanced consoles of today seem to ignore the now central role played by the personal computer (PC). Most broadcasters now use PCs in place of many traditional studio devices, particularly audio source equipment. Gone are the days of playing from CD, carts, vinyl, cassette and reel tape in a typical broadcast. Most program audio is now recorded, edited and played out of a PC system.
While consoles remain much the same, the PC has quietly taken center stage in today’s radio studio. Traditional consoles handle PC audio the same as any discrete source, hindering potential intercommunication that might enhance accuracy and efficiency. Instead of using analog or AES/EBU audio as the interconnection standard, we believe broadcast audio systems of the future will use Ethernet networking to provide a much more flexible and cost-effective alternative to the console systems used today.
With traditional consoles, the PC uses sound cards to feed analog or digital audio to the console. In a complex studio, it may be desirable to play many audio elements from the PC simultaneously. While modern PCs are capable of playing multiple simultaneous audio streams, the sound cards can often be a limiting factor. With Ethernet, the PC does not need sound cards. Rather, it passes the audio directly to the network via a standard network interface card (NIC), eliminating the expense and compatibility issues associated with sound cards.
Ethernet can be carried over standard computer networking devices such as switches, hubs and routers. These networks can be easily scaled from small single studio installations all the way up to the most advanced consolidated multi-station, multi-studio facilities.
More importantly, Ethernet is information rich, meaning that associated data can travel the same path as the audio. As broadcasters continue to embrace digital audio broadcast (DAB), there will be a need to convey content-related data to the transmission chain. An Ethernet network will carry both audio and associated data on a single connection to any destination.
An Ethernet network provides device-independent flexibility. Sources and destinations are network resources, as are mixing engines, storage devices, processors, and other types of peripherals. Because of this, an Ethernet network is easy to install and maintain. Once a device has been connected to the network, it becomes an available resource to be used as the engineer wishes. Sharing devices across studios on a permanent or temporary basis will no longer require wiring changes.
What’s wrong with the way it is?
Discrete analog and digital connections to a broadcast console have worked well for years. Some might say this approach is not broken and shouldn’t be fixed. Indeed, there are some very sophisticated facilities running some of the most complex shows on stand-alone consoles from PR&E, Wheatstone and others. But we must carefully consider the changing technology of radio, particularly the capabilities of DAB, as we evaluate whether the old model will be able to meet future needs. Figure 1 shows a (greatly) simplified diagram of a typical radio studio connected to a traditional console. Analog and digital sources are connected to the console where they are mixed and routed to the program output. The operator chooses the sources— including computer audio— and selects levels to produce a live show.
Even though the PC is providing most of the recorded audio and the playout software can log what is played, it is quite possible for user errors downstream to render the log useless. For example, the PC may have played a spot while the console fader was down, while the channel was not assigned to program, or while another source was playing simultaneously and encroaching on the spot.
Because sources are tied to the console, they are not easily shared by other studios. And with only analog or AES/EBU connections, any provisions for program associated data will need to be made separately from the console, complicating the system design. For example, how does the system know if a source is feeding the program chain or is simply being auditioned locally?
Despite its limitations, this is by far the most popular radio console model in use today, and is quite satisfactory for many applications.
The more sophisticated designs of today provide a centralized mixing/routing engine as depicted in Figure 2. The central engine core performs all the switching, mixing and console processing for a group of studios. Sources can be shared across studios. For very large plants, multiple engines can be ganged together and some have special provisions for dealing with localized studio sources.
Control surfaces provide the user interface, but perform no actual audio processing. The user interacts with a surface much the same as they would an actual console, but rather than changing the audio directly, their input is captured and fed to the central engine core to change levels, switch signals, etc. Control surfaces can be reconfigured quickly to accommodate different shows or user preferences.
This approach provides much more flexibility than the stand-alone console model previously described. Wiring costs are greatly reduced and studios can be more efficiently utilized. Perhaps the most significant benefit is the seamless integration between routing and mixing functions. Each input channel can select from a range of available sources. Outputs and monitor preferences are stored for instant recall when launching a show. Yet with all this integration, all of the audio is still treated as discrete. This is especially limiting for the PC which must use sound cards to convert its audio to analog or AES/EBU streams before feeding it into the engine core. An Ethernet audio network can provide all of the benefits of the centralized core approach while adding a wide range of new capabilities.
Why use computer technology?
The computer industry has advanced the state-of-the-art in computer networking, routing and switching systems. It is now possible to transport digital media signals reliably over controlled Ethernet audio networks with guaranteed quality of service (QoS).
Studio audio in the broadcast plant is especially demanding. It is not enough that the network be capable of reliably delivering audio packets. The delivery method must provide for synchronization, absolutely no information loss, and extremely low delay (latency).
By carefully specifying the network components, system design and transport protocol, it is possible to build a low-latency, no-loss, synchronized Ethernet audio network using a combination of commonly available Ethernet and PC components and some purpose-built broadcast pieces.
Additionally, because the underlying network is Ethernet, PCs can connect directly to the network without any translating hardware. Ethernet cables, plugs, tools, testers, hubs, and adapters are ubiquitous and inexpensive. By building the studio infrastructure using these elements, broadcasters are able to access advanced technology with costs driven lower by the high volumes of the mainstream computer networking markets.
What about traffic?
An Ethernet audio network must manage traffic more intelligently than the typical office LAN, which routinely drops packets and relies on TCP/IP to throttle the speed of the source in response to variable network congestion. While this method works fine for web browsing, email and print jobs, the penalty when delivering audio is very high latency due to large buffers, audio drop-outs, or both.
The best solution we have found is to use switching Ethernet hubs to prioritize audio streams for reliable transmission and to control the flow of traffic. In an ideal system, high-priority audio can be conveyed over the same Ethernet segments as standard TCP/IP or UDP/IP control or file transfer data.
The switching Ethernet hub is ideally suited for an audio network. In Figure 4, six workstations are connected to a switching hub using 100BT Ethernet segments. Each segment is capable of carrying 24 inputs and 24 outputs (linear PCM, stereo, 48 kHz sampling rate, 20-bit resolution) simultaneously. So in the simple example shown, six segments of 24 channels each give this network a 144 by 144 cross-point matrix. The switching Ethernet hub performs two vital functions for the Ethernet audio network.
First, it divides the network into independent Ethernet segments, each capable of carrying a full payload of traffic. It does this by sending only those packets intended for a particular segment. With a properly written protocol and careful system design, it is possible to completely eliminate network congestion and contention.
By contrast, a standard (non-switching) hub forces the connected devices to share bandwidth and relies on them to ignore the packets they do not need. Without a switching hub, the six workstations above would share a single 100BT network connection, limiting the whole network to a 24 by 24 matrix at best.
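The channel counts quoted for the switched and shared cases can be checked with a few lines of arithmetic. The sketch below is a back-of-envelope calculation only: it reads each of the 24 inputs as a stereo pair, ignores packet and protocol overhead, and uses constant and function names that are our own illustrative choices.

```python
# Back-of-envelope check of the channel counts quoted in the text.
# Assumptions: each channel is a stereo pair; packet overhead is ignored.

SAMPLE_RATE_HZ = 48_000        # from the text
BITS_PER_SAMPLE = 20           # from the text
CHANNELS_PER_SEGMENT = 24      # from the text, in each direction
SEGMENTS = 6                   # six workstations in the example
LINK_RATE_BPS = 100_000_000    # nominal 100BT line rate


def payload_bps(stereo_channels: int) -> int:
    """Raw PCM payload, in bits per second, for one direction."""
    return stereo_channels * 2 * SAMPLE_RATE_HZ * BITS_PER_SAMPLE


def matrix_size(segments: int, switched: bool) -> int:
    """Channels available network-wide in each direction."""
    if switched:
        # Every segment carries its own full payload.
        return segments * CHANNELS_PER_SEGMENT
    # A shared hub gives all devices one segment's worth between them.
    return CHANNELS_PER_SEGMENT


per_segment = payload_bps(CHANNELS_PER_SEGMENT)
print(f"payload per segment: {per_segment / 1e6:.2f} Mbit/s")
print("switched matrix:", matrix_size(SEGMENTS, switched=True))
print("shared matrix:", matrix_size(SEGMENTS, switched=False))
```

Under these assumptions the raw audio payload uses a little under half of the nominal line rate in each direction, which leaves room for packet overhead and lower-priority data.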
Second, the switching hub prioritizes the data. This feature is what allows the Ethernet audio network to also carry lower-priority associated data without concern that these additional packets will affect the delivery of time critical audio packets. In fact, it is possible to set multiple levels of priority for maximum reliability and efficiency.
For example, in a broadcast studio, we could set live elements like microphones to the highest priority, computer audio sources to medium priority, and logic signals and PAD to low priority. By prioritizing traffic this way, it is possible to deliver live audio with minimal latency and still allow other traffic on the same network. With switching hubs and a well-designed protocol and system, a broadcast-capable Ethernet audio network is possible.
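The three-level scheme in this example can be pictured as a strict-priority queue on each switch port. The following is a simplified model, not a description of how any particular switch is implemented; the class name, the numeric priority values and the packet labels are all hypothetical.

```python
import heapq
from itertools import count

# Priority classes follow the example in the text. The numeric values
# are an assumption: a lower number is transmitted first.
LIVE, COMPUTER_AUDIO, DATA = 0, 1, 2


class PriorityPort:
    """Strict-priority output queue for one switch port (simplified model)."""

    def __init__(self):
        self._heap = []
        self._seq = count()  # preserves arrival order within a class

    def enqueue(self, priority, packet):
        heapq.heappush(self._heap, (priority, next(self._seq), packet))

    def dequeue(self):
        """Return the next packet to transmit, or None if the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


port = PriorityPort()
port.enqueue(DATA, "PAD update")
port.enqueue(COMPUTER_AUDIO, "playout audio")
port.enqueue(LIVE, "mic 1 audio")
port.enqueue(LIVE, "mic 2 audio")

order = [port.dequeue() for _ in range(4)]
print(order)
```

Although the data packet arrived first, both live packets leave the port ahead of it, and packets within the same class keep their arrival order.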
Why is low latency so important?
The traditional console model provides for very low input to output delay. This is a critical requirement for a live-format broadcast console in which the announcers will typically monitor their own voices in headphones.
Studies have shown that total mic to headphone delay in excess of 30ms will cause live monitoring to become distracting if not impossible. Delays between 15 and 30ms produce an annoying comb effect. Ideally, a console system would have much lower latency, perhaps less than 10ms total.
A 10ms latency budget disqualifies most network methodologies, even those which purport to offer low latency delivery. The problem is that low latency protocols, including those intended for media use, will add at least 5ms per network hop. Multiple network hops are required in even the simplest systems.
To gain acceptance by broadcasters, networked audio systems will need to provide latency performance in the range of 1ms per network hop. The other system components will also need to be designed for speed. It does little good to have an ultra-fast network, only to have huge buffers in the mix engine adding tens of milliseconds to the round-trip delay.
The essential components of a network-centric radio console are shown in Figure 5. Each component will add some delay to the overall chain. The good news is that with careful design and some clever application of technology, it is possible to build an Ethernet audio network capable of delivering real-time signals with minimal latency. In fact, it is possible to build an entire studio network with port-to-port throughput times that can rival the traditional console.
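A rough latency budget shows why the per-hop figure matters so much. In the sketch below, the monitoring thresholds and the 5ms and 1ms per-hop figures come from the discussion above. The hop count and the combined converter and engine delay are illustrative assumptions, as are all function names.

```python
# Thresholds and per-hop figures come from the text. HOPS and OTHER_MS
# are assumed values chosen only to illustrate the calculation.

def monitoring_quality(total_ms: float) -> str:
    """Classify mic-to-headphone delay using the thresholds in the text."""
    if total_ms > 30:
        return "distracting"
    if total_ms >= 15:
        return "comb effect"
    if total_ms < 10:
        return "within 10 ms budget"
    return "above 10 ms budget"


def total_delay_ms(hops: int, per_hop_ms: float, other_ms: float) -> float:
    """Network hops plus everything else in the chain."""
    return hops * per_hop_ms + other_ms


HOPS = 4         # assumed number of network hops from mic to headphones
OTHER_MS = 3.0   # assumed: A/D conversion, mix engine and D/A combined

for per_hop in (5.0, 1.0):
    total = total_delay_ms(HOPS, per_hop, OTHER_MS)
    print(f"{per_hop} ms per hop -> {total} ms total: {monitoring_quality(total)}")
```

With these assumed values, 5ms per hop lands in the comb-effect range, while 1ms per hop keeps the whole chain inside the 10ms budget.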
So analog sources are networked?
Every source and every destination should be made available to the network as a resource. Every microphone, tape machine, satellite feed or CD player used in the broadcast plant needs to be connected to the network.
In order to be useful, an Ethernet audio network will need to have provisions for converting analog feeds to packets and back again. Professional-grade A/D/A conversion would logically be bundled together with the adapters. It would also be beneficial to have network-addressable GPIO interfaces to start and stop sources and to provide remote control capabilities.
What about digital sources? Again, an Ethernet audio network must be able to interface with discrete digital sources and destinations. Because AES-3 is a universally-accepted standard for transporting linear PCM audio, translation between AES/EBU and network would be required for certain devices.
In an ideal future, every device would be equipped with an Ethernet adapter and would be capable of transmitting and receiving properly formatted packets directly. We believe that the benefits of Ethernet will drive many broadcast equipment manufacturers to replace or supplement their AES-3 digital connections with network ready Ethernet jacks in future designs.
And to reiterate an earlier point: most recorded audio in the modern broadcast plant originates in the PC. IP allows the PC to speak directly to the network through its NIC— no sound cards required.
Is this scalable?
The overall bandwidth of a switched network scales with the size of the network (more bandwidth is added as the network grows). This means that bandwidth does not limit the number of channels that can be supported network-wide. There is virtually no limit to how large or complex a network can be built using this approach.
What may be surprising, though, is how cost-effective an Ethernet audio network can be for small, simpler installations. Even a one or two studio facility will benefit from the ability to share sources, connect directly to PCs, transport associated data and wire everything with inexpensive Ethernet cables.
Studio systems can be built as stand-alone clusters, each with its own central switching Ethernet hub. Interconnecting multiple studios can be accomplished via one of the switched Ethernet segments. Although 100BT Ethernet is ideal for local shared sources, some broadcasters may wish to connect the studios together using a 1000BT copper or fiber link.
Where is the cross-point switcher?
Perhaps one of the more interesting attributes of the Ethernet audio network is its ability to provide the functions of a cross-point audio switcher without any additional cost. In the networked audio system, every audio source and every audio destination is available on the network, eliminating any need for a dedicated cross-point audio switcher.
Some larger facilities use expensive, proprietary cross-point audio switchers to share sources and reconfigure destinations. These traditional routing switchers can easily cost more than US$50,000 for a typical plant. And while these routers are competent at routing analog or AES/EBU discrete signals, an Ethernet audio network is superior for most modern radio plants with mixed analog, digital and computer-generated signals.
Figure 6 shows a simplified cross-point switching example. Analog and digital sources are converted to digital and interfaced to the network as high-priority multicast streams, available to all interested destinations. Connections are made by simply having the destination (output) terminal adapter request a source stream. This could be done locally with a simple user interface on the terminal itself or with a configuration application.
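The idea that a connection is nothing more than a destination requesting a source stream can be modeled in a few lines. This is a toy bookkeeping model under our own assumptions: it does no networking, and the class, method and device names are hypothetical rather than part of any real product or protocol.

```python
class StreamDirectory:
    """Toy model of advertised sources and the destinations that
    have requested them. No audio or network traffic is involved."""

    def __init__(self):
        self._sources = set()
        self._selected = {}  # destination name -> source name

    def advertise(self, source):
        """Make a source stream known to the network."""
        self._sources.add(source)

    def connect(self, destination, source):
        """A destination requests a source; this is the 'cross-point'."""
        if source not in self._sources:
            raise KeyError(f"unknown source: {source}")
        self._selected[destination] = source

    def listeners(self, source):
        """Every destination currently receiving the given source."""
        return sorted(d for d, s in self._selected.items() if s == source)


directory = StreamDirectory()
directory.advertise("Mic 1")
directory.advertise("CD 1")

# Two destinations can take the same source at once.
directory.connect("Studio A monitor", "Mic 1")
directory.connect("Studio B monitor", "Mic 1")
shared = directory.listeners("Mic 1")

# Re-routing is a new request, not a wiring change.
directory.connect("Studio B monitor", "CD 1")
print(shared, directory.listeners("Mic 1"), directory.listeners("CD 1"))
```

Because each source is a stream that any number of destinations may request, sharing a source between studios requires no extra hardware in this model.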
Any audio workstations on the network can “direct connect” via Ethernet; no sound cards required. Audio from the workstations will be IP-standard, medium priority, and can feed the same destinations as the high priority live streams. This system is much more flexible than the traditional audio cross-point switcher at only a fraction of the cost. And unlike the proprietary cross-point switchers which are prohibitively expensive for the smaller station, an Ethernet audio network is cost-effective for very small systems— as small as only a few devices— while being able to scale up to meet the needs of the largest facilities.
Some facilities may choose to use audio networks to simply replace the function of the cross-point routing switcher, connecting to traditional consoles and source equipment. Even in this application, the network approach offers key benefits over traditional approaches.
Modern broadcast plants have a mixture of local and centralized sources and destinations. CD players, microphones, headphones and speakers are mostly local to the studio while audio servers, satellite receivers and transmission feeds are usually in the central terminal room.
The traditional cross-point switcher is often a central resource. Studio sources and destinations must be connected back to the central device. This can be done with either multiple discrete audio cables or some type of proprietary studio connector interface device. Both approaches add cost to the already-expensive cross-point audio switcher.
The networked audio approach allows conversion terminals to reside near their sources and destinations. Terminals can be located both in studios and in central rack rooms. Switches can also be distributed around the facility or centralized. Even workstations can be central, local or both. Everything is connected together with standard low-cost Cat-5 cabling.
While the Ethernet audio network makes an excellent replacement for the traditional cross-point switcher, much more is possible once we establish the network infrastructure. In particular, if we add a device to manage the mixing and routing of signals on the network, we can also replace the traditional console.
A PC-based mixing engine?
Having established that all of a facility’s sources and destinations can be networked, let’s now address the need for mixing and processing. Ideally, a mixing engine would be attached to the network, receive the desired streams, perform any mixing and signal processing necessary, and send the result to the appropriate destinations.
Most of today’s digital mixing console engines— both stand-alone and the centralized router/engine types— are based on proprietary DSP architectures. While these designs are satisfactory for the discrete audio studio of the past, the networked approach makes possible a different architecture, one based on the power of the modern PC motherboard.
A Pentium-4 equipped motherboard is an amazingly powerful device, with processing power comparable to large multi-DSP proprietary embedded systems. In fact, the PC engine is much better suited for mixing in a network-centric facility than proprietary engines, because all the connections into and out of the engine are made via Ethernet.
Of course, most PC motherboards are burdened with slow, general purpose operating systems and inefficient applications. To make an effective mix engine, the PC must be optimized for this purpose, with an efficient and reliable operating system capable of handling real-time processing tasks (such as real-time Linux) and tight, efficient application code. In order to keep the overall system latency under our 10ms maximum, the engine will need to receive, mix, process and distribute live streams within a millisecond or two. Although challenging, this too is possible with careful design. Needless to say, this PC engine must be dedicated to perform the engine functions exclusively.
In the network-centric architecture, the mixing engine is an available resource just like the sources and destinations themselves. It costs only a fraction of what a proprietary mixing engine would, again taking advantage of computer industry volumes to make technology more accessible. The low cost and wide availability of the PC motherboard make this engine architecture much easier to acquire, maintain and upgrade than traditional approaches.
A simplified studio mixing system is shown in Figure 7. Analog and digital discrete sources are converted to digital live (high-priority) streams and fed to the network. The mixing engine sweetens and mixes these streams and feeds the result to the appropriate output destinations, based on a configuration template and live-input from a control surface or user application.
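At its core, the mixing step is a gain-weighted sum of the incoming streams. The sketch below shows that operation on small blocks of floating-point samples. It is a minimal illustration under our own assumptions (normalized samples, hard clipping at full scale, hypothetical names), not the processing performed by any actual engine.

```python
def mix(streams, gains, limit=1.0):
    """Sum equal-length sample blocks with one gain per stream, then clip.

    Samples are assumed to be floats normalized to the range -1.0..1.0.
    """
    if len(streams) != len(gains):
        raise ValueError("one gain per stream is required")
    mixed = []
    for frame in zip(*streams):
        sample = sum(g * s for g, s in zip(gains, frame))
        # Hard clip so the sum never exceeds full scale.
        mixed.append(max(-limit, min(limit, sample)))
    return mixed


mic = [0.5, -0.5, 1.0]
music = [0.25, 0.25, 1.0]

# Microphone at unity gain, music bed at half gain.
program = mix([mic, music], gains=[1.0, 0.5])
print(program)
```

The last sample would sum to more than full scale, so it is clipped. A production engine would more likely apply dynamics processing than a hard clip.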
A single P4 engine is capable of supporting a very complex studio setup, with 24 or more active sources, multiple program outputs, monitor outputs, mix-minus outputs, auxiliary sends, talkback paths, etc. Amazingly, this PC-engine can outperform the very largest multi-bus, multi-channel, stand-alone consoles used in radio today.
Due to the tremendous amount of latent power in the P4 motherboard, the PC-based mixing engine is capable of adapting to a wide range of situations without any hardware changes. One studio setup might have a dozen or more live sources, each with independent mix-minus output requirements. Another setup might use 6 or 8 computer-sourced IP streams and several different control surfaces. The PC-based mixing engine adapts to the needs of the studio instantly and effortlessly.
Further, it is possible to integrate external functions into the engine. Many consoles will use external effects devices, equalization, profanity delays, headphone dynamics processing, and other specialized functions. A PC-based mixing engine can assign resources to provide these and other functions that might otherwise require dedicated equipment.
The engines can be located in the studios or in the terminal rooms, stand-alone or shared. The networked broadcast plant requires an entirely new way of thinking about systems architecture, but once our minds are open to the possibilities, it is easy to see how powerful and flexible tomorrow’s systems will be.
Is all this really possible?
The concepts described here are more than interesting theory. Telos has in fact developed a studio audio transport system called Livewire, a suite of audio networking tools which will forever change the way we connect and use studio audio equipment.
The Livewire network uses a common Ethernet to carry audio streams and any associated data or control between devices, studios and facilities. At its heart, Livewire uses Ethernet switches to isolate links, manage traffic and ensure fully reliable transmission.
Livewire assigns the highest priority to live audio streams (called Livestreams) for delivery in less than 1ms per network hop, while also providing an IP-Standard medium-delay mode for connection to PCs. It distributes a clock signal over the Ethernet for precise synchronization and low delay.
The Livewire system includes translation terminals for microphone audio, line-level analog audio, and AES/EBU audio for connection to traditional equipment. These terminals provide the synchronization and advertise the availability of connected sources to the rest of the network and can be located physically near their associated gear.
A specialized Routing Controller terminal provides a list of available streams which can be scrolled and selected or instantly accessed via softkeys. It connects to Livewire and provides convenient audio input and output ports.
The Livewire system provides a unique way of handling audio from PCs using a software driver that causes the network to look like a sound card to the PC application. Equipped with this driver, the application will pass audio to and from the network seamlessly.
A PC-based Engine running Linux and a highly-tuned application mixes and processes Livewire streams while adding less than 1ms of throughput delay. The Engine adapts to changing studio requirements and has sufficient processing headroom to allow for “accessory” features like built-in headphone dynamics processing and channel equalization that might require add-on devices in a traditional studio system.
Telos offers control surfaces to provide the tangible user interface (UI) for the board operator, with intuitive controls and displays designed for the fast-paced live format radio show. These surfaces communicate with the Engine and other devices over the Livewire network.
Putting it all together
Shown on this page is an example studio system using Livewire components. In this example, the studio has a large number of active local sources. Each microphone has an independent monitor feed which enables the host to talk to each guest’s headphones privately.
The phone and codec sources each have associated mix-minus outputs. In fact, due to the Engine’s ability to assign resources as required, it is possible to have a mix-minus output for every assigned source. And the management of mix-minus outputs is handled completely within the Engine automatically, finally making hybrids and codecs as easy to use as CD players.
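A mix-minus feed is the full mix with one source left out, so that a caller or remote contributor does not hear their own audio returned. One common way to compute a feed for every source is to form the total once and subtract each source from it. The sketch below illustrates that approach with integer samples. It is an assumption for illustration and not necessarily how the Engine does it, and the function and source names are hypothetical.

```python
def mix_minus(sources):
    """For each named source, return the sum of every other source.

    `sources` maps a source name to an equal-length list of samples.
    """
    if not sources:
        return {}
    length = len(next(iter(sources.values())))
    # Form the full mix once, then subtract each source's own audio.
    total = [sum(block[i] for block in sources.values()) for i in range(length)]
    return {
        name: [total[i] - block[i] for i in range(length)]
        for name, block in sources.items()
    }


feeds = mix_minus({
    "host mic": [10, 20],
    "phone": [1, 2],
    "codec": [100, 200],
})
print(feeds["phone"])  # host mic plus codec, without the caller's own audio
print(feeds["codec"])  # host mic plus phone, without the codec's own audio
```

Computing the total once keeps the cost roughly proportional to the number of sources, which is what makes a mix-minus output for every assigned source practical.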
The audio delivery software is directly feeding the network with 6 simultaneous stereo audio sources. Additionally, the Ethernet switch is linked to other studios and centralized sources and is also making these local sources available to other interested studios.
In this example, the traffic is light and the local Engine is working well below its capacity. There are 10 local Livestream sources, 6 IP-Audio sources and 13 local destinations. Any program associated data is carried through the network along with the audio data and can be delivered to interested devices by simply connecting them to an unused switch port.
A GPIO terminal is shown which provides for remote control and contact closure commands for microphones and discrete peripherals.
In this drawing, we even show a firewall-protected internet connection. The idea of allowing internet traffic onto a critical audio network would be terrifying were it not for the traffic management features of the Ethernet switching hub. Because of the priority placed on Livestreams over IP-Standard audio streams over everything else, Livewire ensures that even on a busy network, audio comes first.
Some will be uncomfortable with the idea of computer networking technology for audio delivery. Proprietary embedded systems may feel more industrial and secure. What’s more, we have all had our share of bad experiences with computers and networks. We groan at the thought of “rebooting” our consoles. For good reason of course.
In order to be accepted by broadcasters, Livewire— or any other audio networking approach for that matter— absolutely must provide the highest level of reliable operation. This is our programming we’re talking about. The office printer can be off line for an hour while we hunt down the IT expert. The station audio must be uninterrupted.
We believe that the future will clearly prove, despite some initial apprehension, that studios built around audio networks will provide high reliability, cost efficiency and greatly enhanced studio operations. Once networking begins to gain acceptance, we should see other significant changes.
We have described here a console engine which hangs on the network intercepting streams, mixing, processing and presenting the result back to the network for interested destinations. It is easy to imagine future broadcast products equipped with Ethernet audio connections to be addressed and shared throughout a facility.
And as Moore’s law continues to drive PC MIPS up and prices down, the network-enabled radio Engine will very soon have excess capacity that could be tapped for alternative tasks. Software plug-in products to do voice processing, program delay or even codec or hybrid functions may eventually replace the need for stand-alone broadcast gear.
Broadcast technology has always been driven forward by advancement in the communications and computer industries. PCs replaced broadcast carts. Digital Signal Processing replaced analog functions. And each technology advance brought with it new standards of performance and new operating possibilities.
We can now apply computer networking to the broadcast plant in ways never before possible. Discrete point-to-point wiring and TDM mainframe-type engine cores will soon seem antiquated once broadcasters begin to experience the benefits of the networked audio plant.