The Killer App for FM: 5.1 Surround Sound
For the past few decades, the appeal of FM radio has been its technical superiority: low noise and stereo sound. (Remember Steely Dan’s “FM…No static at all!”) The world has moved on, and FM as a technical medium has become the status quo. Multichannel sound exists for television, personal computers, video games, and DVD – both video and audio. Now, for the first time since 1961, when FM implemented stereo, we have technology that will keep FM radio competitive with other existing media: the ability to broadcast distinct 5.1 multichannel audio.
Anyone for Pizza?
When was the last time you turned on the radio and heard something truly exciting? Think about it. Take a moment and really think about it…Not easy, is it? For me, it was probably during the last heyday of CHR, about 20 years ago, when Z-100 (WHTZ) made its run in New York City.
Now, look at radio today. It's losing market share to so many alternatives: mobile CD listening, iPod, XM/Sirius, and netcasting. While none of those alone is defeating radio, combined they are eroding the listener base. Recently, at the Radio & Records Convention in Los Angeles, the following exchange took place with a well-known corporate Program Director who did not want to admit that his legacy station was losing audience. His claim was: "Hey, even with XM and Sirius around I still have a 4.3 share in Los Angeles." The reply to him: "Yes, you do still have a 4.3 share...BUT...the pie is smaller, and why is that?" Interestingly, he had no rebuttal.
Pizza analogy: Many people can eat half of an 8 cut pizza easily. If the pizza is 12 inches round, that's a lot of pizza. If the pizza is 10 inches round, then it's a smaller pizza, but in both cases they've consumed a 50% share of the pizza. Think of radio ratings shares the same way. The radio “pizza” is getting smaller! If terrestrial broadcasters don't respond soon, that same PD who once had a 4.3 share of a huge LA audience, and now has a 4.3 share of a smaller audience, will soon have a 4.3 share of next to no audience!
Fortunately, pessimism has never been my strong suit. I’ve heard the future of FM Radio, and it’s truly exciting! The amazing thing about this new enthusiasm is that it’s not a new format, super-duper air talent, or an amazing station giveaway. Surprisingly, it’s technical…distinct multichannel sound in 5.1 glorious channels!
The Killer APP
The multichannel system invented by Fraunhofer Institute (FhG) and Agere Systems comes from people who know their stuff. FhG are the folks who created MP3 and MPEG AAC. They are also getting a lot of attention for their new Iosono system, which uses as many as 304 loudspeakers to create an amazingly enveloping soundspace for applications like high-end movie theaters. The Agere people are former Lucent and Bell Labs audio coding researchers. Sparing all the technobabble, this surround system will provide a distinct multichannel listening experience to the FM radio audience. This is accomplished using a technique called coded-discrete, which prepares the audio for transmission over iBiquity’s HD Radio® system.
The Fraunhofer Institute (FhG) folks have been busy pushing the frontiers of perceptual audio research. The latest result is a powerful spatial audio coding system that takes advantage of the most up-to-date knowledge in aural perception. Psychoacoustic studies have shown that the level difference, time difference, and coherence between channels are what create the perception of a spatial image. The key to FhG’s multichannel system is that it represents these difference values with very compact coding, rather than transmitting all of the individual audio channels. The encoder estimates the values as a function of frequency (that is, within each sub-band) and transmits them to the decoder in an ancillary stream that accompanies the main coded audio stream.
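To make the idea of spatial cues concrete, here is a minimal Python sketch that estimates two of them, level difference and coherence, for one frame of a channel pair. It is an illustration only, not the FhG encoder: the function name `spatial_cues` is hypothetical, the time-difference cue is left out, and the whole frame is treated as a single band, whereas the real system works within each sub-band.

```python
import math

def spatial_cues(left, right, eps=1e-12):
    """Return (level_difference_db, coherence) for one frame of two channels.

    Hypothetical helper for illustration; a real spatial coder would run an
    analysis like this separately in every sub-band.
    """
    energy_l = sum(x * x for x in left)
    energy_r = sum(x * x for x in right)
    cross = sum(x * y for x, y in zip(left, right))
    # Level difference in dB (positive means the left channel is louder).
    level_db = 10.0 * math.log10((energy_l + eps) / (energy_r + eps))
    # Normalized zero-lag cross-correlation: 1.0 for identical waveforms,
    # 0.0 for signals that are uncorrelated at zero lag.
    coherence = abs(cross) / math.sqrt((energy_l + eps) * (energy_r + eps))
    return level_db, coherence

# One 10 ms frame of a 1 kHz tone at 48 kHz; the right channel is the same
# tone at half amplitude.
frame = [math.sin(2 * math.pi * 1000 * n / 48000) for n in range(480)]
level_db, coherence = spatial_cues(frame, [0.5 * x for x in frame])
print(f"level difference: {level_db:.2f} dB, coherence: {coherence:.3f}")
```

The point of the sketch is the size of the result: a frame of hundreds of samples per channel is summarized by a couple of numbers, which is why the ancillary stream can be so compact.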
The enclosed block diagrams illustrate how an encoder/decoder pair would work within a broadcast channel such as HD Radio®. Following are descriptions of two implementations.
The first step is to create a compatible stereo downmix from the multichannel material. The resulting stereo signal is coded using any perceptual codec. Since there are no changes to the basic codec, this signal can be received by stereo radios. The spatial encoder extracts the various spatial cue parameters from the multichannel input, which are transmitted in an ancillary data channel. The decoder, if present in the receiver, recreates the original multichannel audio. In the diagrams, you can see that we need to have a downmix function to create the compatible stereo channels from the multichannel source. The most obvious way to do this is with a simple linear combiner, as follows:
L = Lfront + a*Lrear + b*Center
R = Rfront + a*Rrear + b*Center
where a and b are constant scale factors, with values usually ranging from 0.5 to 0.7. But this simple procedure, shown in Figure-1, is far from the best possible.
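As a rough sketch, the linear combiner above can be written in a few lines of Python. The function name `compatible_downmix` and the default values of a and b are assumptions chosen from the range quoted above; like the formula, the sketch does not handle the LFE channel.

```python
def compatible_downmix(lfront, rfront, lrear, rrear, center, a=0.7, b=0.7):
    """Fixed linear downmix: L = Lfront + a*Lrear + b*Center, likewise for R.

    Each argument is a list of samples of equal length. The defaults for
    a and b are illustrative picks from the 0.5 to 0.7 range.
    """
    left = [lf + a * lr + b * c for lf, lr, c in zip(lfront, lrear, center)]
    right = [rf + a * rr + b * c for rf, rr, c in zip(rfront, rrear, center)]
    return left, right

# Two-sample example with content only in the center channel: it lands
# equally in both stereo channels, scaled by b.
silence = [0.0, 0.0]
left, right = compatible_downmix(silence, silence, silence, silence, [1.0, -1.0])
print(left, right)
```

Because a and b are fixed, every recording is folded down the same way, whatever its content. That is exactly the limitation discussed next.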
When making an optimized downmix, a number of considerations come into play, which come from both psychoacoustics and production practices. We must present a stereo mix to listeners without multichannel receivers that is as good as a stereo-only broadcast. Simply collapsing the front and back signals into a 2-channel representation may cause some confusion in the normal binaural cues and degrade stereo listening. And it almost certainly will sound different from the version that listeners are used to. The FhG system allows a producer to make a manual downmix, thus preserving maximum artistic freedom and allowing maximum flexibility to adapt to different kinds of audio material. Since almost all music released in surround format also has a stereo version on the same disk that could be used as input to the encoder, this stereo version is what would be heard by listeners with non-surround radios – with no modification or compromise of any kind.
Advanced automated downmixing is also an option when manual mixes are not available. A processor could dynamically modify the scaling values and relative phase during mixdown. Such a processor would use advanced algorithms that can take into consideration absolute source positioning, panning laws, the way sources were mixed into the multichannel signals, and original inter-channel phase relationships, so it would have the potential to achieve quality that is comparable to manual downmixes.
5.1/ 2.0 Discrete Method
Figure-2 is an example of an optimal method that supports both 5.1 surround and 2.0 stereo mixes. By utilizing the original stereo mix for the transmission path, and the 5.1 channels for the surround encoder, the best of both are available. It’s important to note that all of the signals required for surround replication already exist in the stereo mix. The 5.1 channel data is used to restore the original placement of the multichannels.
This embodiment can be configured in two ways. The obvious one would be to employ the wave extensible file format, which would include eight (8) channels of audio (5.1 and 2.0). Alternatively, the encoded 2.0 stream could carry the surround information as metadata and thus create a quasi 2.1 signal.
This encoding method must rely upon the content of both mixes being exactly the same. If that’s not possible, then the compatible downmix method can be used as an alternative.
Note: Since the original, or optimized, stereo mix is used to support 2.0 listening, it stands to reason that this will also support monaural 100% of the time.
The receiver embodiment is the inverse of the functions described above. A stereo perceptual decoder will recognize the ancillary side-channel data and forward it to a surround decoder along with the Left/Right audio. Utilizing the inverse function of the encoding method, the surround channels are restored to their original placement and 5.1 audio is presented to the amplifier and speakers. If the receiver does not contain the surround decoder, then the original stereo signal is passed to the output of the receiver. Figure-3 illustrates the receiver function.
All well and good, but will this work with HD Radio®? The astonishing answer is: Yes! The FhG spatial encoding system is fully compatible with HD Radio’s® current codec for the stereo channels. And the sidechannel for spatial information is less than 20kbps, a rate that fits well within the HD Radio® ancillary data channel. The system was demonstrated last fall in San Diego at the NAB Radio Convention.
A topic worth discussion is technical infrastructure. The studio facility will need to be upgraded to surround. But in this evolving iMedia world, adding distinct 5.1 audio is not the challenge that FM faced when it rolled out stereo in 1961.
Here’s a point to ponder: When stereo records were first introduced, the only available content was whatever was recorded at that time in stereo, as all older material only existed in monaural. It is much easier with 5.1 audio, as multi-track master recordings of older material already exist and can be remixed for surround. Imagine what it would have been like at the inception of stereo, if there were archived 2-track recordings of the Glenn Miller Orchestra, or other famous artists from that era.
At first blush, it would be understandable to think that there is a need to triple the audio channels around a facility with more cabling, switching, and routing. Actually, adding multichannel audio is as easy as CAT5. Look around: the world today is migrating to network-based audio distribution via high-speed networks. Adding more channels to a network-based router and cabling installation is done mostly by changing the software of the system, at very little incremental cost. The same holds true for delivery systems. Modern consoles use the surface+engine configuration, so existing surfaces might well be connected to upgraded engines. For more information on an innovative networkable solution, check out: www.axiaaudio.com.
Regarding the multipaired cables and AES configuration: Computer networks are replacing these obsolete technologies, and with an Ethernet networked studio approach, the incremental costs to move from stereo to discrete surround are near zero. The majority of studios on-air today are still analog and need to be upgraded to digital anyway, so the surround capability comes along for the ride.
A unique aspect to this transmission method is the ability to employ existing 2.0 style audio processing. Since the main audio path remains in a two channel format, conventional audio processors can manage all processing requirements.
Not Your Dad's Surround…(MATRIXING)
It must be stated that this coded-discrete system offers distinct surround sound. With the exception of the implementation from Coding Technologies, the other designed systems are matrix based or they contain multiple drawbacks that can compromise and degrade the 5.1 multichannel audio, as well as the existing stereo and mono mixes. Consider that the FM-Stereo system in place today offers discrete 2-channel audio with separation that theoretically approaches 70dB. It’s extremely doubtful that our industry would have accepted a broadcast system for FM-Stereo that utilized synthesized 2-channel duophonic sound, and passed that off as stereo. This is what the other proponents wish to do.
The matrixed methods synthesize and fake the 5.1 audio channels. They do this by manipulating the original stereo mix to create the surround effect. In doing so, this technique also alters the original stereo mix so that now BOTH the stereo and surround signals are in effect spatially distorted. This type of approach is not the answer, or solution, to boost radio listening forward. Basically, these other systems have revived those old quad concepts from the 1970s and repackaged them as digital in hopes of banking on old tech.
The method described herein is innovative, and it totally preserves spatiality of BOTH the stereo and 5.1 audio mixes. Radio needs the real thing, not synthesized, not matrixed, not compromised. For surround on radio to be respected and to successfully compete with other media, it must be state-of-the-art performance.
Now it’s worth pointing out a key reason why the matrix systems all contain a flaw, which spatially distorts the audio. Please understand, the word distort is used here in a different context than normally associated with audio. It refers to the loss of spatiality in an audio environment, as compared to the edgy, rough sound that distortion of a harmonic nature (THD) exhibits.
The area of concern with the matrix systems is the loss of separation in the spatial-axial patterns between the Left-Front/Right-Rear and the Right-Front/Left-Rear channels. Audio signals along these two axes will tend to bleed into one another. Figure-4 offers an illustration.
The arrows correlate to the paths of perceived multichannel artifacts. They are heard as false spatial cues and lost separation. Maybe you remember the quadraphonic systems from the ’70s that had a brief and unsuccessful run at a few radio stations? Don’t confuse this modern multichannel perceptual approach with those – or any of the current descendants that are around. Matrix methods have a critical drawback in that only fixed-scale downmixes are possible, so stereo compatibility suffers. This is one reason the ’70s-era matrix systems didn’t catch on – they had a weird, soft, and indistinct quality in stereo. Clearly, this is an important issue for broadcasters. With most people listening in stereo, we can’t afford to compromise our fundamental service. And that is why the FhG approach is so well suited to radio broadcast: the system does not depend upon any specific downmix procedure to work. Indeed, the downmixing process can be thought of as a component outside of the basic spatial coding system.
Another problem with matrix schemes is poor surround separation. Matrix systems must mingle everything into a 2-channel signal, which is a crippling constraint on performance. They can have only a few dB separation between some of the channel pairs. (Which channels get the separation and which don’t are design compromises. Each system deals with this differently.) Because FhG’s spatial encoding uses an independent digital side-channel and a modern perceptual approach to spatial cue encoding, it offers very high separation that does not depend on the nature of the audio and does not need to be compromised for stereo compatibility.
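A small Python sketch shows why a passive matrix ends up with only a few dB of separation between some pairs. The coefficients below are made up for illustration (gains of cos 22.5° and sin 22.5°, no phase shifters, no steering logic) and do not represent any particular commercial system. The decoder is assumed to be simply the transpose of the encoder. Other coefficient choices would move the leakage to different channel pairs.

```python
import math

# Hypothetical 4-2-4 matrix: each source channel is encoded as a pair of
# gains (into the left-total signal, into the right-total signal).
C = math.cos(math.radians(22.5))
S = math.sin(math.radians(22.5))
ENCODE = {
    "LF": (C, S),
    "RF": (S, C),
    "LR": (C, -S),
    "RR": (-S, C),
}

def decode_gain(source, output):
    """Gain from one source channel to one decoded output.

    Assumes the decoder applies the same gain pairs as the encoder
    (decode matrix = transpose of encode matrix).
    """
    return sum(e * d for e, d in zip(ENCODE[source], ENCODE[output]))

def separation_db(source, output):
    """How far below the wanted channel the leakage into `output` sits."""
    wanted = abs(decode_gain(source, source))
    leak = abs(decode_gain(source, output))
    return math.inf if leak < 1e-12 else 20.0 * math.log10(wanted / leak)

# Feed a signal into Left-Front only and see where else it turns up.
for out in ("RF", "LR", "RR"):
    print(f"LF -> {out}: {separation_db('LF', out):.1f} dB")
```

With these assumed coefficients, a Left-Front signal leaks into two of the other three outputs only about 3 dB down, while the remaining pair is fully isolated. Two transmitted channels simply cannot keep four sources apart.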
By the way, beware of demonstrations using material in one or two channels at a time. These are deceptive because a steering circuit – a gain processor in something like a noise-gate configuration combined with an operation that dynamically varies the matrix coefficients – detects this very directional condition and steers the strongest signal into the target channel, while reducing gain or providing some kind of cancellation in the other channels. (This approach is also a leftover from the 70’s, having first been used in the Tate and Vario-Matrix “logic” schemes.) With normal non-pingpong programming, which has material present in all the channels simultaneously, the separation is dependent upon the underlying matrix scheme and is much poorer than the demonstrations suggest.
Coded-Discrete VS. Watermarking
It must be stated that alternative surround methods which employ watermarking will not offer much more benefit than the matrix systems. The reason is that a watermark cannot carry the data payload needed to properly manage all of the audio channels over the entire spectrum. There will be aural compromises to this scheme, especially in separation and placement.
Psychoacoustic research has shown that acceptable surround is created when the ancillary data payload is set between 10 kbps and 20 kbps. If the watermarked system is embedding a data payload at this level, the question arises as to how much of that payload will pass through the audio codec used in the HD Radio system. The rate that is considered robust in the context of anti-piracy watermarking is 5-10 bits per second. Experts say that around 100 bits per second would be pretty much the limit in order to withstand passage through usual codecs. There seems to be a huge disconnect here with regard to the capabilities of the watermark. If a watermark is only capable of 100 bits per second, yet surround requires more than 10 kbps, something does not add up!
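The size of that disconnect is easy to put a number on. The short Python calculation below uses only the figures quoted above; the variable names are for illustration.

```python
# Figures quoted in the text above.
surround_side_info_bps = (10_000, 20_000)  # 10 to 20 kbps of spatial side information
watermark_limit_bps = 100                  # rough ceiling cited for a codec-robust watermark

for needed in surround_side_info_bps:
    shortfall = needed / watermark_limit_bps
    print(f"{needed} bps needed / {watermark_limit_bps} bps available = {shortfall:.0f}x short")
```

Even at the low end of the range, the watermark would have to carry a hundred times more data than the quoted limit allows.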
Watermark Application Concerns
With a watermarked implementation, what would happen if two pre-coded sources (stored on a delivery system, for example) were to be cross-mixed on-air? During the overlap time, wouldn't the watermark become corrupted and the received multichannels sound quite strange, or collapse to stereo? Has cross-mixing been demonstrated? How would a surround or panned mic be added to a mix for voice-overs? Certainly cross-mixing and announcer voice-overs are routine in normal radio programming.
2. Fixed-Downmix To Create Stereo
With a watermarked system, stereo is always derived (downmixed) from the 5.1 multichannels. If this is a satisfactory procedure, why don't DVD-Audio and SACD disks use the same approach? They could save a lot of bits and trouble by providing only the surround mix and letting stereo players do a mechanical downmix. But they never do, instead providing listeners with human-optimized mixes for each mode.
When a fixed-downmix is employed for stereo, there is no guarantee that the mix levels will result in the same aesthetic texture as a human-optimized mix. This may result in poor sounding stereo, loss of depth or sound stage, and destruction of the mono sum. This is very critical, as it would lead to inconsistent sound for those listening in stereo or mono.
Thus far the test broadcasts with the watermarked system were with live concerts that were produced in surround. So, what reference was there to know of "no surprises in the stereo mix," since there was no stereo original for comparison? A fair evaluation would be to test with DVD-A or SACD music as source, so the listener can evaluate carefully and accurately if the stereo/mono is OK. This is going to be critical to acceptance of a broadcast surround system since weird sounding stereo is certainly going to trigger protests from program directors, listeners, and owners. Figure-5 is an example of a fixed-downmix method.
If you refer back to Figure-1 of this paper, you’ll notice that embodiment uses a variation of this known as a compatible downmix. Note this discussion also mentions that this method is not preferable for the surround encoder. It’s for the same reasons described above that the compatible downmix is not employed. This alternative method is available when multichannel and stereo versions of content will not be the same. In that case, the compatible downmix can be used to create the surround signal.
The ISO/MPEG audio group has noted the recent advances of surround and its market potential. They have started a new work item with the working title Spatial Audio Coding. FhG submitted their spatial approach to MPEG for consideration and testing, and chances are good that it or some variation will eventually be approved as an international standard. Thus there will be the usual advantages of MPEG: an independent confirmation of performance, and assurance of fair and equal access to licensing. It is the suggestion of this writer that the NRSC Committee consider an MPEG-like study to determine the performance and preference for a single FM-Surround transmission system.
Speaking of tests, why hasn't the watermarking proponent submitted their tech to the scrutiny of the unbiased MPEG testing that has been ongoing? At what point will they offer an honest description of their system so it can be evaluated on a reasonable basis?
Adopt A Single 5.1 Method
Can you say “AM-Stereo”? iBiquity Digital Corporation, the creator of the HD Radio system, needs to adopt a standardized surround transmission system. Broadcasting cannot afford a replay of the AM-Stereo fiasco. The marketplace decision did not work, and the technology failed on account of it! When television adopted stereo transmission, a single system was chosen, and that aided in the successful rollout of TV stereo.
As stated earlier, the FhG surround technology is capable of faithfully reproducing the sound field without degrading either the surround effect or the conventional stereo and mono signals. Ideally, iBiquity would select one single system to hasten the acceptance of this exciting tech. There are other proposed methods out there, and iBiquity has been reluctant to endorse a particular system for fear of offending the others, but the fact is that ALL of them degrade both the surround and the stereo performance. This writer believes that iBiquity must get off the fence in order to launch surround on FM with the best tech possible.
Receiver manufacturers also need to choose a single reception method for their systems. Once the record labels and broadcasters are on-board, then the receiver folks will follow. It stands to reason that they will sell more speakers, amplifiers, and radios…a victory all the way around for everyone.
The FhG/Agere system will appeal to manufacturers because MPEG standardization means that the tech will be universally available to all manufacturers at a reasonable cost. One of the reasons MP3 has grown so fast is that it is an open standard available to all.
Where The Rubber Meets The Road!
Consideration needs to be given to the following: Should broadcasters adopt a watermarked system that has had limited on-air testing, little or no disclosure of technology, no comparative evaluation of performance, a single vendor source, and troublesome claims?
Or should they support the MPEG system described herein, which has been carefully tested in a controlled scientific fashion with a wide variety of source audio material? Its developers include Fraunhofer Laboratory (inventors of MP3 and MPEG AAC), Agere (former Bell Labs and Lucent researchers), Coding Technologies (inventors of the "plus" enhancements to MP3 and AAC and the HD Radio codec), and Philips (co-inventor of MPEG Layer 2 and a leading consumer electronics firm). More testing is forthcoming as the best ideas continue to be merged from each contributor.
The FhG/Agere technology approach has been published in a number of AES and other papers so that researchers have been able to evaluate claims and build upon each other’s work. On-air tests with normal programming are assumed to be a necessary part of any evaluation process, and we expect those to start within the next few months. Radio broadcasting is important enough to deserve this care.
Exciting and Compelling!
Multichannel 5.1 surround creates an impressive theater of the mind – something that must be heard to truly appreciate. Imagine turning a Production Director loose with the power of additional audio channels on station liners, sweeps, and promos. Even commercials will sound exciting. Just think about all the possibilities with a multi-person morning show. The use of the surround channels offers endless creative possibilities that will stimulate live on-the-air bits, and morning show routines!
And music…Have you heard any of the DVD-Audio or SACD discs? Those WILL take your breath away! The re-release of many classic albums has brought new light, appreciation, and enjoyment, because you hear them presented in an environment that actually draws you into the sonic experience. To name a few: Steely Dan’s Gaucho, Elton John’s Goodbye Yellow Brick Road, The Who’s Tommy, REM’s Automatic For The People, Roxy Music’s Avalon, and Fleetwood Mac’s Rumours are a small sampling of discs that will leave you not only wanting more, but making a trip to the local audio store to outfit your living room in 5.1.
Getting this music on the air will make exciting radio. Are you aware that it’s not possible to buy a 2-channel stereo receiver anymore? Think not? Walk into your local Best Buy or Circuit City and try to find one. All they’ll have are stereo boom boxes, as everything else is multichannel. Audio stores say that 90% of their customers ask for multichannel sound equipment by name. That’s compelling in itself. The 5.1 surround audio that accompanies DVD movies and videos has conditioned early adopters to a multichannel world, and this is rapidly spreading to the mass audience. How many times have you noticed the crawl posted at the beginning of primetime TV shows or movies that says, “This program is broadcast in 5.1 Surround Sound”? It has grown considerably. Most video and computer games now offer surround sound as well. Realize that this last statement refers to a whole generation of young people who now consider multichannel audio standard, just as this 1956 model year writer considered stereo a standard for so very long!
Acura, Cadillac, Volvo, Mercedes, and Lincoln have already announced 5.1 surround with DVD-Audio/SACD players in their up-scale 2005 models. As happened with FM Stereo, this will work its way down to all automobile models. Hyundai is going to offer 5.1 in an upcoming model as well, bringing availability to the masses that much sooner. The auto industry is moving in this direction because consumers want it.
Radio broadcasters MUST migrate into the surround world, or they will get left behind. Remember how AM became a stepchild once FM stereo was universally accepted in the late 1970s. All of terrestrial radio is now at risk due to the advancement of technology, because the consumer has more exciting alternatives to listen to, and many involve surround sound. We must advance FM radio if we want it to remain interesting to consumers. Program content needs to be compelling as well; that’s a given. But now we’ve got a technical reason to get excited about radio again, and it will inspire new and compelling programming – just as FM stereo did when it was a fresh technology.
The WOW Factor
We, as an industry, need to adopt the following mindset: Create enough of a WOW factor in the mind of the consumer, that it compels them to purchase a digital radio. The HDAM system offers a wow when comparing the HDAM signal to conventional AM audio. Adding distinct 5.1 audio is the killer app that puts the wow factor into the HDFM system. This is what it will take to motivate the average consumer towards HD Radio®. This will be their vehicle to hearing exciting radio once again. Just flipping the HD Radio® switch to ON will not get it done alone. Consumers already have many more exciting alternatives to choose from. Adopting a distinct 5.1 multichannel system for FM helps level the playing field, and creates the opportunity to win back lost listeners.
If you’re convinced, then what’s next? When can we crank out this cool excitement? For this to happen, only a few key sectors need to hear and act on this: the record labels, broadcast executives, iBiquity Digital Corporation, and the receiver manufacturers.
5.1 CONTENT NEEDED
The record labels must provide more 5.1 content. This shouldn’t be hard. There’s already a lot of surround available, and with the incentive of radio’s promotion capabilities, all new releases should be in the surround format, as well as stereo. Just think about the vaults that are filled with multitrack master tapes of classic recordings that can be remixed into 5.1 and re-released again. The artists and record labels stand to make millions on the re-issues alone. The record labels win, as they have a new revenue source from material they already have, and right now they probably are not even aware of the millions they are sitting on. This is the same thing that happened when the compact disc evolved. This creates a general excitement involving a new music format that will draw people back to record stores. DVD-Audio and SACD multichannel are available for consumers now, but record labels need radio to help them promote these new disks. This is a no-brainer. As Nike Corp would say, “Just Do It!”
At the time of this writing, discussion has already opened with syndication firms about creating 5.1 libraries that comprise the top 1000 titles in each radio format. This would immediately help jump-start the ability to launch 5.1 programming, while the labels get online with new and re-releases.
BROADCASTERS TO-DO LIST
Radio broadcasters need to perform two significant functions: Adopt this tech by installing it, and then promote it! Remember how many station IDs used to say something like “101, WMMS-Cleveland, in FM-STEREO!” That was how radio subliminally conditioned us to “stereo.” Well, it’s time to re-enact that discipline: “100.7, WMMS-Cleveland, in FM-SURROUND.” Also, radio can easily tie in with audio stores to promote surround sound. Live remotes from audio outlets and radio give-aways, along with advertising, can help tell the story and raise top-of-mind awareness in consumers. After all, the consumers are broadcasting’s ultimate customers.
Recently, a Vice-President of Technical & Capital Management with a major broadcast company was quoted regarding 5.1 for radio: “The biggest breakthrough will be 5.1 surround sound using IBOC or similar digital technology. To compete with new methods of delivery, especially the ubiquitous DVD, I believe 5.1 will be key to radio remaining competitive, both in the home and in the car. Consumers have grown to expect this level of quality.” This is a prime example of someone who not only gets it, but can see how this tech is needed in order for radio to compete and survive.
Making it Fun Again!
Think about it, WE as an industry can actually inject life back into radio. Make it fun and exciting again! Consider that line from the wonderful movie Field of Dreams: “If you build it, they will come.” We need to build this, so that they’ll come...back! We’re losing listeners to many alternatives, a trend that will continue if we don’t act. We have a chance NOW to breathe new life into our medium. Finally, a killer app and a compelling reason for listeners to buy digital radio receivers, and a new reason to listen to radio again.
In closing, broadcasting needs to evolve with the changing world, instead of maintaining the status quo. Both XM and Sirius have the gear in place to switch on 5.1 surround today. Also, the tech is available today to stream 5.1 surround via the net at competitive bitrates. It's a matter of the radios becoming available, and that's in process too. Even if we start off multichannel transmission using tricked-up methods for the surround, it will eventually get picked up by the consumers, just as it did with stereo in the 1960s. There's also development going on that will enable the iPod world to have surround in their players and in headphones.
To borrow a famous Scott Shannon phrase from the Z-100 Morning Zoo: “If it’s too loud, you’re too old!” Well, we need to inject life back into radio. Adopting distinct 5.1 audio is just the right dosage of audio channels to excite the patient. If we follow this suggested path, it’s quite possible that radio listeners will remember another great slogan from Z-100…“Lock It In, And Rip The Knob Off!”
HD Radio is a registered trademark of iBiquity Digital Corporation.
Ing. Wolfgang Fiesel
Group Manager Audio Systems
Multimedia Realtime Systems
Steve Church, President