Andrew Scheps: “Lost in Translation: Audio Quality in Streaming Media” | Talks at Google

for coming out today. Again, my name’s
Deston Bennett. I’m with the Grammy Producers
& Engineers Wing. The Grammy Awards, as many of
you may know, are the only music awards that are peer
determined, meaning it’s not the public that votes. Those who vote are members of
the Recording Academy, who are hands-on music creators– artists, songwriters,
musicians, producers, and engineers. From the very beginning at our
founding in 1956, the basis for the Grammy Awards process
has been a commitment to excellence. The Recording Academy’s original
credo clearly states that the awards are not about
sales, and they’re not about popularity. Musical excellence in all areas
is the only criteria Grammy voters are charged with
to determine who gets nominated, and what will win. Knowing that little bit of
information should help you whenever there’s some
controversy about the Grammys, as there sometimes can be. For example, the year jazz
bassist Esperanza Spalding won the Best New Artist category
against fellow nominees Drake, Florence and the Machine,
Mumford and Sons, and Justin Bieber. She won because the majority
of the Grammy voters were familiar with Esperanza and her
work, and they saw her as a stand-out that year. It’s pretty cool when
you think about it. Many of you who are musicians
or audio producers or engineers may be eligible to
be members of the Academy. And I’m happy to speak to you
about that after this is over if you like. You can also get some more
information or join by visiting Today specifically, we’re here
to talk about excellence in sound, something that’s key
to great recordings. The P&E wing has partnered with
the Consumer Electronics Association and others on an
initiative we call Quality Sound Matters. We represent people who truly
understand the difference good sound makes, and we want to
share their enthusiasm and excitement about quality
with everybody. Today, we have a very cool
presentation from a Grammy-winning engineer that
we think you’ll enjoy. And I want to give a big thank
you to Neil Annala and Joe Rosenberg for bringing
us here today. We’d also like to thank JBL
and Prism Sound for this amazing sound system you’re
going to hear today, too. The speakers in particular, they
encompass some very new, exciting technology that
you’re amongst the first to hear. And to top it off, I really want
to introduce an amazing engineer-producer who’s worked
with artists including Metallica, Linkin Park,
Green Day, and U2, along with last week’s– well, not last week, but a
recent number one album, the Black Sabbath project. He’s a two-time Grammy winner
for his work on the Red Hot Chili Peppers’ “Stadium
Arcadium” project, as well as Adele’s “21” album. He has another interesting
honor. In 2012, he was named the
International Engineer of the Year by England’s Music
Producers Guild. Please welcome Andrew Scheps. [APPLAUSE] ANDREW SCHEPS: First of all,
thanks for coming. This is as full a house as we
could have in here, I think. So thank you so much, and thanks
again to Neil and Joe for putting this together. This is awesome. So later on, we’re going to
listen to a bunch of stuff, which is the point
of what I do. And the Recording Academy has
been really great about sponsoring me to do this talk
all over the country. The idea of the talk, it was
originally put together for what the Recording Academy
called their Grammy Future Now conference, which was sort of a
mini, one-day TED conference for producers and engineers, for
people who make music in Los Angeles. And since then, I’ve gone around
the country, and most of the time I give this
presentation to producers and engineers. And it’s because there’s a
lot of information in the presentation that, as people who
make records, we sort of kind of know, but we don’t
actually know. And so I’m trying to put numbers
and facts behind the things we think we know so that
when we listen, you can actually compare things and
you know what it is you’re listening to, and why there
might be differences and things like that. So I’ll start the way I usually
start by asking how many people in this room are
artists and make records, or have ever released a record. So still a good number of you. So if you’ve released a record,
how many of you have then gone and bought your record
to make sure that what comes off the services sounds
like what you sent them? So that’s about normal. About a third of the
hands, maybe less. And that’s exactly the same
with people who do that as their day job– or night
job, depending on the hours you keep. It’s not something people
really think about. They finish their record, they
master it, like oh, it’s done. Send it off, and you’re done. And of course, now with all
the digital services– and we’ll get into lots
of them specifically– and there are a few that happen
to be housed in this building or down
in San Bruno– there are lots and lots of
different ways that music gets out into the world. And so, the idea is to give some
context to know, what are all these possibilities? How do they compare? And do they actually impact the
consumer’s experience when they listen to your music? So that’s the idea. Now along the way, I can usually
get away with a lot of sort of vagaries, because
I’m talking to producers and engineers. All right, this is the question
just for you guys. How many people in this room
know more about digital audio theory than me? There’s going to be– come on. It’s everybody in the room. But seriously, how many people
work directly with digital audio in the room? OK, I’m going to be vague. I’m going to be slightly
inaccurate, and I would welcome corrections
along the way. So I’ve done the presentation,
I think, 12 times now. 11 of those times were for
producers and engineers, and once was a few weeks ago, which
is when Neil came and saw me at Fantasy Studios
in Berkeley. And that was for a room full
of people from the tech community, including people from
Google and YouTube and Google Play music, as well as
SoundCloud and Apple and Rhapsody and Rdio, and a
few other companies, and Fraunhofer, who developed
the MP3 and AAC codecs. And I got my butt kicked. And I’m fine with that. I would love to get my butt
kicked, because every time I give this presentation, I know
more, and I can kick butt back a little bit, which
is the point. And I think what happens is
people get into their little rabbit holes on what
they work on. So I make records, and I want
to make great sounding records, but I don’t want to
follow it through the food chain down to the consumer,
because that’s not what I do. Now two years ago, I started
my own record label, so now that became part of what I do. And I used to think, I’m going
to start a label because the labels suck. They don’t know what
they’re doing. It turns out I don’t know what
I’m doing, and it’s really, really difficult, and
there’s a lot to it. So every part of this process
of getting music into the hands of people who listen to it
is unbelievably difficult, incredibly technical, and
fraught with peril for the audio along the way. So we’ll talk about some
of the specifics. So what I want to do first,
though, is put recording into perspective, OK? So for thousands and thousands
of years– and now we start my very fine PowerPoint– there has been music. OK, for who knows how many? Let’s say 10,000? Is that a good number? This is where I get vague,
and everybody in the room backs me up. So we’re going to say 10,000
years, there’s been music in the form of songs that have
been written by somebody. And then, they would perform
their song for somebody in their village or something
like that. And the only way music could
propagate would be either they would go to the next village, or
they would teach their song to somebody and then they would
go to the next village, or people from the next village
would come hear them and go back. Right? So there’s your music industry
for the first 9,900 years. Fair enough? OK, about 100 years ago–
a little bit more– but basically, about 100 years
ago, there started to be consumer recordings of audio. And there were a few things
before this, but let’s say the wax cylinder was the first
viable format. So you have the Edison cylinder
where people would come into a room. They would make lots of noise. That noise gets collected
by a horn. It would get scratched
on to this disc that’s spinning around. And then, you could take
that disc, and go play it back elsewhere. So all of a sudden,
you have created what is called recording. Recording, especially back then,
was technically just a delay process, right? So you perform the music, and
then you capture it for a second, and then you can
carry it around. And then later on at any point,
you can play it back. So now, you can get rid of some
of the space and time constraints of everybody
coming to your concert. Now you can record your concert
and send it out. Now this caused a huge uproar. And in researching this for this
presentation, and also I teach a recording class where I
try to give a little bit of a history, there’s some amazing
quotes from John Philip Sousa and people like
that about how recording was going to destroy not
music, but society. Destroy it. You have to be in the room
with the musician. So I think we’ve all kind
of gotten over that. I mean, I would hope everybody
here enjoys going to concerts and things like that. But we’ve gotten over the fact
that we’re going to completely destroy society. Music isn’t the only thing
that’s destroyed. And it’s just one
of many things. OK, so that’s 100 years. That’s it. Then about 50 years ago, mainly
with technology out of Germany in the ’40s, and then
also some techniques developed bouncing from one tape machine
to another tape machine, you started to be able to not just
capture a live performance, but you had what we call
overdubs, which is basically, you make a recording, and then
you record some more stuff to go with it. So now, you can record at
different times, and add all of those things up to
make a recording. A lot of the early Beatles
recordings were examples of bouncing. They would record the band, then
they would play the band back while recording something
else, combine those together. So that was a technique. The German tape machines allowed
you to actually have multiple tracks that sit
side by side. So you record on a couple of
tracks, then you record on another track. So things we’re sort
of familiar with. But that basically ’40s,
into the ’50s– but even in the ’50s, most
commercial recordings were live recordings, to mono or
possibly starting to get into three-track tape, but eventually
going to be mono going out into the world. But once you start having these
multi-track tapes, then you have to mix those
things together. So this created something
in the music industry that didn’t exist. It used to be there were only
recording engineers who captured things. Now all of a sudden, you needed
people who could take all the stuff that was captured,
combine it together, and make it something that
could go off into the world to be heard. So that’s the mix of
the recording of the song with overdubs. And then, once you actually
had consumer formats– whether it was the cylinders or
onto LPs or 45’s or 78’s or cassettes or eight-track
tapes, up into CDs– you needed to have some sort
of standard as to how the music would have to be put onto
these media to then be distributed. So you would get your mastered
mix of the recording of a song with overdubs. Now in a room full of engineers,
that kills. That’s really funny, because
it’s a font joke. Mastering makes things loud. That’s the idea so– all right. I’m sorry. It’s the wrong crowd. OK, so this is now what
the artist sends off into the world, OK? This is what a record is. But it’s much more
than that, right? Music, pre-recording, was
nothing more than art. There was some commerce
involved, but it was basically art. It was musicians and composers
who would have a piece of themselves that they would want
to capture, and then let other people hear it, and
recreate the emotion they were trying to create when they
performed the music live. So let’s say that really, the
recording is more like this. And you don’t have to read it. The point is I needed to get a
lot of text on the screen for later on for one of my very
clever, inaccurate analogies. The idea being that we need to
keep in mind that this is art, and this is the difference
between looking at an art book and going to a museum, OK? There are differences. And the idea of live performance
versus recording is one stage of this
difference. But there’s also a huge
difference depending on how that recording gets to you
at the end of the day. And when we actually get to the
listening portion, I think someone once said it’s stuff
that you can’t unhear. You’ll hear the difference
between some of these file formats and bit rates and things
like that, and you’ll decide for yourself whether
it makes a difference. My theory is I think it does. OK, so now we’re going to
go through part of the presentation, which is a little
more technical, which means it’s a little dumbed
down for most of the people in this room. But there are a couple
important things. So the first thing is,
the difference between sound and audio. And I’m sure most people in this
room know this, but the idea that’s important is that
all sound is analog, period. And analog meaning infinitely
variable, OK? Until you get down to the
molecular quantum level, any sound in the air is infinitely
variable acoustic pressure waves that travel around
the room, right? Everybody cool with that? Now, you can buy a digital
microphone or a digital pair of headphones, and that isn’t
actually what they are. They are analog microphones
and analog headphones that happen to have converters
built into them. So they are two things in one. But they are an analog device. There’s no such thing as
a digital microphone. The only way you can record
something is to put something in the air in the way of the
pressure wave so it moves because of the pressure wave,
and then, using lots of different technologies for how
you design your microphone, you turn that into a
voltage, which is the most common way to do it. Then, you can digitize
the voltage, OK? So this would be the
simplest sound. It’s a sine wave. It’s information at only
one frequency. But the idea is while it’s a
sound wave, you zoom in, you zoom in, you zoom in. It never pixelates, right? It’s smooth all the way down. So the idea of digitizing– and this is where, feel free
to take a nap or something real quick. So obviously with digital
systems, you don’t have the luxury of looking at something
infinitely many times a second, right? You have to have a clock. You have to decide how many
times you’re going to look. So for the producers and
engineers I talk to, this is actually really helpful. I know it’s very simplistic,
but it’s just the easiest visual representation
of what sampling is. So the idea is, time
across the bottom, voltage up and down. And every time there’s
a vertical line, that’s a sample. So how many times a second–
let’s say that’s a second, and then we count the number of lines,
and that’s how many times a second we’re
looking at it. And each time we look, we say,
how big’s the voltage? And we write it down
using a number. And how many bits we get to
write down that number are our horizontal lines. Everybody’s good with
that, right? So the idea being that if you
look at this particular grid superimposed on the sine wave,
we almost never go directly through an intersection. So we are always wrong. We are always rounding. And obviously, anyone in the
room who really knows digital theory knows that that’s OK. There’s one quantization error,
but you make up for it, and you can reconstruct
things quite well. You’ll also know this sample
rate is way higher than we actually need to capture
this sine wave. You only need just over
two samples per cycle, and you’re good. So that’s fine. And I’m not saying that this is
not a good sample for this particular sine wave. But as a visual representation, it’s important. The idea being, though, if we
want to be more accurate, we can do two things– we can up the sample rate, and
we can up the bit depth. So now, this is sort of the
aha moment for a lot of engineers who’ve got little
pop-up menus for sample rates and bit depths, and they don’t
actually know what they do other than bigger is better,
so I’ll record more stuff. Now there are diminishing
returns. In terms of actually building
audio hardware, it’s very hard to build something that will
work equally well at every single sample rate. And I do lots of listening tests
for just my studio for making records, and I found
that there’s a lot of gear that works great at 96
kilohertz, and up at 192, it doesn’t really work so well,
because some things are getting stressed, and it’s just
not optimized for it. So it’s not always that higher
sample rate is better. But in a perfect system, a
higher sample rate will be more accurate more
of the time. Right? I mean, I think that’s
fair enough to say. And the same thing
with bit depth. And in some ways, bit
depth is more important than sample rate. Now the other thing is you could
very easily make the theoretical argument that 44.1
kilohertz is fine, because human hearing goes up to
around 20 kilohertz? And I know everyone probably
already knows this, but basically, take your sample
rate, divide it by 2. That’s the highest frequency
you can capture at that sample rate. Fair enough? So 44.1, you get
down to 22.05. Wow. 22.05. There you go. Sorry. My math just went
out the window. But the problem is, to make that
work, you need a perfect filter that cuts off everything
above that frequency, but doesn’t touch
anything below it, right? That filter cannot be built. It doesn’t exist, especially
as an analog filter. So this is part of why higher
sample rates are really important for capturing
things– to get an accurate picture at
20k, you kind of need to leave it alone out to 40 or 48,
something like that. So if you start working at 96,
you can either use very gentle analog filters or you
can start getting into over-sampling and digital
filters, but you can do things way past where we hear that
are brutal, and they don’t affect what goes on down
where we do hear. Now there are also people who
argue that we respond to frequencies above 20k. We’re not getting into that. We’re not getting into, we
should be tuning everything to 436 instead of 440. There are lots of holistic
arguments about lots of things, and I try and keep
things more real and in numbers, because then I don’t
have to argue about them for 12 hours, and not
get anywhere. So I try and keep it that way. So anyway, this is basically
what I try and impart about sampling, even though you
guys know most of this. So then we start talking
about the actual consumer formats, OK? Now there are two types of
digital audio files. Again, I’m sure you guys know
this, but there’s lossless audio and there is
lossy audio. OK, lossless audio is take a
PCM-encoded wave file at some sample rate and some bit depth,
and you keep all the numbers, period. That’s it. That’s all lossless is. It’s AIFF, WAV, used to be
Sound Designer II. OK, so those are lossless files. Now if you want to get into
the analog versus digital debate, they’re all
lossy, right? We’ve thrown away some
information. But we’re not there. Let’s say that our capture
is awesome. Let’s say we’re working
at 96k, 24-bit. We’ve got lots of information. If we keep all that information,
it’s a lossless file. Lossy is– and again, I’m just going to go
through the presentation. You guys know all of
this already, which is why it’s so great. So lossy is the difference
between zipping a file, and using something where when you
unzip your 25-page paper you’ve just written, it’s
missing a bunch of letters and there’s stuff spelled wrong. And again, for a lot of
producers and engineers, they don’t actually understand
this concept. They assume that lossy
compression is still OK, because you end up with a PCM
audio stream at the other end. But it’s reconstructed, and
stuff is thrown away to actually make those files. And the reason being that if
you zip an audio file, you save maybe 20%. If you use FLAC, which is
optimized for audio, you can maybe save 50% of the space. But that’s it. So if you do some quick math,
and you’re looking at a CD, let’s say, which is at 44.1,
16-bit, you’re talking about 10 megabytes for every minute
of stereo music. Those are big files. You guys spend a lot of time
trying to get files from one place to another quickly
and efficiently. Those files are too big,
especially up to a few years ago with the data pipes
going to phones, all the mobile devices. There’s no way you’re going
to send that much audio. So this is why the lossy
codecs actually exist. So very briefly, Fraunhofer,
which is based up here, developed first the MP3 lossy
codec, and then more recently, the AAC codec. These are based upon
the way you hear. If you know anything about the
way your brain processes the information from your ears, your
ears have just got lots of hairs in them. And Julie will probably
talk more about this than I need to. But you basically are splitting
things into different frequencies. All of that information comes
up into your brain. Your brain then processes it,
and decides, I don’t need to listen to that, not going to pay
attention to that, I hate that, screw that– oh,
that’s important. And then that’s what you hear. So there’s lots and lots of
information that’s thrown away, which is why in a
crowded room, you can concentrate on a conversation
with somebody, because you start to mask things out. And the same is true when you’re
listening to music. There are lots of things
that can be masked out. So through a lot of research,
they decided, what can we throw away, right? The idea being that if we take
care of getting rid of some of this information, then all of a
sudden, we’re dealing with a much smaller file. And if you compare file sizes, a
decent bit rate MP3 is maybe 10% of the size of the
uncompressed audio file. Yet in some listening tests, you
might be able to actually do pretty well against the
file it was encoded from. OK, so this is where I get very
inaccurate, and people actually got mad at
me about this. But that’s OK, because I’m up
here and you’re back there, and you’d have to jump
over the screens. So this is the way I explain
lossy encoding to people. So if we go back to our
paragraph of lots and lots of text, if I take out some of
the vowels, everybody can still read this just as fast
as they used to, right? The idea is your brain is
predicting what should be there as much as it’s
taking the input of what actually is there. So if we look at the word
“mastered” in that first line, as soon as you get to the M and
you see the “stered” after it, your brain has decided
there’s probably an A there. There’s room for an A there. There’s an A there. It fills in the blank. If you have a tiny little smudge
on the page, your brain is all about it. That is an A. Absolutely. Whereas on its own, that
smudge is nothing. It’s a smudge. So that’s the basic idea, is
finding what can we throw away, and still be able to
read as fast as we can? Or, listen and enjoy the music
without having to figure out what it was supposed
to sound like? So the idea being that if I only
take out those vowels, we don’t save a whole
lot of space. If I take out all the vowels,
now we’re really starting to save some space, and we can
compact it down, but I can no longer read this, OK? So somewhere is a threshold. The problem is when you’re
reading, you have very discrete chunks of data. You either know what that
word is, or you don’t. Maybe you can fill in a word
from the context around it, but that’s kind of as
far as you can go. When you’re listening to music,
at some point it just sounds bad, and you don’t
really want to listen to it anymore. Sometimes it sounds so bad that
it’s kind of crazy and it sounds like it’s under
water, and more like whales than music. But until you get to that point,
it’s very hard to say, yeah, OK, we compressed too
much, because you could put someone in the room, and
especially if they know the song, they’ll fill in some
blanks on their own, and they’re like, yeah,
I like this song. It’s all good. So the problem with audio is you
go from this analog sine wave– which no matter how
far we zoom in, is still infinitely varying. We capture it, we compress
it, we send it off, we reconstruct, but we’re starting
to reconstruct something that’s a little
more stepped. Now again, this will get
smoothed out by things in both the circuitry and also by your
ears, so there are lots of things working to help you out
in reconstructing this waveform along the way. But you compress too much, and
then you start getting to things that start to not really
sound like sine waves, or they’ve got so many harmonics
on them that you don’t hear them as a
sine wave anymore. And at that point, you’re
listening to something different than what
you started with. And I think that is more akin
to someone who kind of sucks at art, copying paintings, and
selling it to you, and like, yeah, I’ll put that
on my wall. Now for $10 and I
can download it? Maybe that’s a trade-off
you’re willing to make. But in terms of taking the art
that this artist has made, and saying, this is my record, and I
love it, and it makes my mom cry, at some point you’re going
to send them such a low bit rate file that their mom’s
not going to cry anymore. And that’s a drag, because at
that point you’ve lost the point of the music, right? It’s art coming through
speakers. It’s emotion coming
through speakers. So what can we do as record
makers, and then what can we do as people who get that music
out into the world to help people listen to it? And the great part is, I would
assume that everybody in this room listens to music
recreationally. Let’s start with the hands of
people who don’t listen to music ever. OK, so not only are we in charge
of making this music and getting it out there, but we
also consume it, so we want to make products that we
actually like, which with a lot of things, people don’t
actually buy their own products, whereas this is sort
of the ultimate consumer product, because everybody’s
into it one way or another. So going back to the actual
consumer formats. Within the lossless category,
you’ve really only got two choices. You have CDs, which are dying a
very quick death, which are set at 44.1 16-bit
audio, right? Then you’ve got what
is called high res. And this is a term that people
can argue about. All it means is anything better
than 44.1 16-bit, OK? So when the Beatles re-released
their catalog, I dunno, six years ago, something
like that, there was a version you could
buy on a USB stick which was 44.1 24-bit. That is high res audio, because
it’s higher than a CD. So that’s what the term means
out in the audio world. Now for me, I like to think of
high res being up at 96k or something like that. But in terms of consumer audio,
that’s what you get. Now in terms of buying high
res audio, there are very, very few options. There’s HDtracks, who will sell
you things to download, and there’s this crazy
Java file. OK, has anyone bought anything
from HDtracks in this room? So a few people. Is there anybody who thinks that
it’s so easy to download and play back this stuff that
everybody should be doing it? OK. Got a couple. So there are a lot of things
involved, and I’ll talk a little more about what
I have set up here to play this stuff back. It’s hard to get the high res
music, and it’s hard to play it back properly. It’s easy to play
it back wrong. Anybody can do that. Just throw it in iTunes or any
other music player, it’ll play back wrong, and you’re
all good. But you’re getting into
transcoding, and things that you don’t really want
to get into. But anyway, that’s what you’ve
got for the two viable sort of ways you can get
lossless audio. There are a couple others
that, once we start listening– excuse me– once you start
listening that I’ll actually show you, which are
kind of cool. There’s high res streaming
starting to happen, adaptive streaming. It’s really awesome. OK, then we get into the lossy
formats, and those files are basically MP3 and AAC,
which are the two Fraunhofer codecs– AAC having not necessarily
superseded MP3, but just coming after. I think Robert from Fraunhofer
would argue that it supersedes it. But obviously, there’s tons
of stuff still coming out on MP3 as you go. Depends how you encode
things like that. Then there’s Ogg Vorbis, which
other than Wikipedia, I don’t know much about it. Is it that it’s open source? OK, so it’s the open
source encoder. There you go. But of course, there are open
source MP3 decoders, which skirt Fraunhofer’s license. Because if you get the LAME
encoder, you’re not paying them, either. So I don’t know. That’s vague. Yes? AUDIENCE: It’s totally
patent free, as well, but that’s debatable. ANDREW SCHEPS: The Ogg Vorbis? OK, so Ogg Vorbis is
patent-free, which I guess would be the main difference. Because if you can build
yourself an MP4 encoder that’s open source, you’re
getting around– anyway. Robert and I had a very long
conversation about this, and he was awesome. He was very, very
good about this. I thought he was going to kill
me, but he was great. OK, so if we actually start
looking at the services themselves, this is where for
the producers and engineers it’s a big, big deal, because
this is the stuff where they don’t necessarily understand
things. I mean, they understand, but
it’s the stuff you know but you don’t know. So the CD and high res are both,
I’m going to say, WAV. You can buy it as FLAC, but
that’s just compressed WAV. There are AIFFs and things
floating out there. But WAV is the most robust and
the most prolific form of uncompressed audio. Everything else is not WAV. OK, so it’s either– all right, first of all, who
here is from the Play Music? OK, I need an answer, because
I have scoured your website, and it says it plays up
to 320 kbps files. So what format, and what
does the up to mean? I can’t– is that
it’s scaled, so as you test bandwidth, do you go– I’m going to guess,
128, 256, 320? AUDIENCE: 192, 256. ANDREW SCHEPS: OK. So three tiers topping
out at 320. OK, I couldn’t– and this is part of
the problem of looking for this stuff. And I don’t think– and you can correct
me if I’m wrong– I don’t think anyone is
intentionally being obscure about this. Maybe you are. Are you being intentionally? Are you obfuscating? I love that word. Maybe you are a little bit. OK. So– yes, sir? AUDIENCE: If people in front
can move maybe towards the back of the room, we’re
going to be playing stuff out of those speakers. ANDREW SCHEPS: Yeah, that’s
going to hurt. I think what we can do actually
is, what’s going to happen is at about 10 to 5:00,
Julie’s going to speak, because she’s got
a presentation about what she’s doing. And we’re technically sort of
4:00 to 5:00, but we also have the room to 6:30. So what I’d love to do is we’ve
got 15 minutes, I’ll finish going blah blah blah. We can maybe do some questions
where you guys kick my ass. Can I say ass on this? It’s internal, right? You can kick my ass. And then, we’ll break
for Julie to speak. And then, we’ll do the
listening, and people who have to go can go, but then we can
also shove people into the middle of the room, because
you guys are going to get killed. I mean, I’m not going to have
it crazy loud, but still, you’re going to get killed
a little bit. OK, so finding all of
this information. OK, how many people
from YouTube? Do we have anyone? OK, so we got a couple. Finding out the information on
what happens with the audio on YouTube was not that difficult,
but it was also a little odd in that– so does
everybody in the room know why there are two bit rates, and
everybody in the room know when you get which one? OK, there you go. So here’s the problem. It’s tied to the video rate. There’s no setting that says,
give me good audio. There’s only the setting that
says, give me good video. So basically– and you can correct
me if I’m wrong– 720 and 1080 give you 384. Everything else gives
you 128, OK? Here’s the problem– a lot of people can’t afford to
make videos for every song on their record, and a lot of
people who buy records and then really like a song and want
to upload it to YouTube don’t make a video that’s
HD for that song. So you upload static art work,
or you upload lyrics, or you upload a picture of your
dog– or cats. Cats are the internet, right? So it’s kittens. But unless it’s awesome footage
of a kitten, nobody is going to switch to HD. Nobody. And it doesn’t default to HD, so
nobody hears your music at 384, which is, in terms of pure
bit rate, the highest of the lossy formats available,
period, and nobody hears it. Yet from numbers I’ve seen,
and I’m sure my NDA won’t cover this because I haven’t
even signed one, but for numbers I’ve seen, 80% of music discovery happens on YouTube. Somebody says, hey, have you
heard vrr, and I go, I don’t know, let me search for it. And you put it in, and you
listen to it on YouTube. So 80% of the time, people are
being introduced to music with one of the lowest bit rates on
the board, when the highest rate on the board is actually
there, though not available for most of the videos,
because people aren’t bothering to upload HD video. And should we just
finish up the YouTube thing right now? OK, and this is something
I’m hoping– I know I’m speaking with some
of you tomorrow, but I would love to get– my email address is my name,
[email protected] Hunt me down, find me,
because I’d love to discuss this stuff. Because another thing is going
through all of the YouTube documentation, there’s nothing
that I could find about audio upload guidelines. OK, so there are no audio
upload guidelines on the YouTube site. Zero. The problem is, of course, what
you’re ending up with are 128 and 384 AACs, but most of
the time, people are uploading lossily compressed audio. So you’re transcoding. Is there anybody in the room who
disagrees that transcoding is the worst sounding thing you
could ever do to a piece of audio between two
lossy formats? Because we’ll fight later. OK, there are amazing sounding
lossy encoded files. 384 AAC, I would defy most
people to sit in a room, do double-blind test between 384
AACs properly encoded and CDs. I would defy anybody to not tell
the difference between 384 transcoded AAC that came
from any other lossy format. It sounds terrible. This is one of the things
we’re hoping to move forward with. So anyway, this is one
of the problems with comparing the services. But the big problem that a lot
of the people I speak to normally have is they don’t know
how to compare the 44.1 and the 256, and zero
consumers know how. 256 is way more than
44, right? I rest my case. But when you’re trying to
actually educate people about just what this is, you need to
come and sit in a room, and have me go blah blah blah,
and show you a chart. So the idea is that, again, as
with any scientific thing, you’ve got to look
at the units. And the kilohertz and bit depth
is totally different from kilobits per second. Now the cool thing is that all
of the lossy formats are actually very transparent
with their bit rate. OK, this is, again, where
I make records. I don’t work with computers
all the time. I’m rounding. There’s no 1024. The numbers are very round,
because it’s easy for us people to understand. All right, so basically, I
take your bit rate, I put three zeroes on the end, and
that’s how many bits per second I get to represent my
stereo piece of art that makes my mom cry. Then actually do the math– 44,100 times 16 times 2, and
we’re at 1.4 million on a CD. Now obviously, the codecs that
encode the lossy encoders are very smart. So it’s not like just take a
percentage, and that’s how much worse it sounds. I absolutely get that. But we’re talking about a very big
difference, and then you look at the 192/32, which is the
highest I’ve seen coming off of HDtracks. And you’re up to 12.2 million. OK, the problem being in the
grand scheme of things that that’s really not a whole lot
compared to the analog we started with. So again, we’re not going to go
into the analog versus digital debate, but how many people
here like vinyl? How many people actually look
to see if the vinyl’s done from the analog masters instead
of digital remasters? Get some old Blue Note. Even just compare it to some
of the reissued Blue Note. And it’s kind of astonishing. It’s like you’re there. OK, so this is where we stop
talking about numbers. And now, I want to go through
this study very quickly. This is sort of an
older study. Because of course, the thing
is, does anybody care? If nobody cares, then we don’t
need to care, right? If this doesn’t make a
difference, and it’s all just a bunch of numbers,
I don’t care. The idea is I want people to
spend enough money on the music that I work on that the
artists I work with can not take a day job so they can
keep making records. And I want to be able to afford
to keep making records, and not necessarily take a day
job, but if you’ve got something for me, we’ll talk. OK? That’s the idea. OK, we’re not all looking
to be on MTV Cribs, because we’re not. OK, but if people don’t care,
then by all means, make the files tiny, because then
everything else about the consumer experience
is awesome. Instant on, very fast, move it
from one place to another, fit 25 bazillion songs on anything
that fits in your pocket. That’s all good. OK, now Harman who were actually
nice enough to send up this pair of speakers we’re
going to listen to later, this study is from a little
while ago to be fair. But they decided, we need to
actually know if people care. Because they don’t care what the
outcome is, but they need to know the answer to that
question because they make equipment for people
to listen to music. That’s what they do. So they need to know, do
we need to be really concentrating on stuff that
plays back lossless audio, or even high res audio? Or should we be building better
MP3 hardware decoders in, and just deal with that? Should we actually limit
the bandwidth? When we’re starting to talk
about wireless technology– I mean, if you look at Sonos
and RedNet and a lot of the really cool networked audio
and wireless audio technologies– where do we need to
cap our bandwidth? These people need to know what
people like, but they don’t actually have a horse in the
race, because they’re just going to build the gear
to play it back. So Dr. Sean Oliver, who works
there, who’s a pretty amazing guy, and he’s got labs
that have all kinds of stuff in them. They’ve got stuff that looks
like it’s out of an amusement park, so when you’re A-B-ing
speakers, they hydraulically move into the same place. You don’t have the differences
in placement when people change speakers and
things like that. So what he decided to do was
get young people, because there’s a lot of sort of
anecdotal evidence that young people not only don’t care– but this is the crazy one to me,
and if you know anything about neurology and cognitive
listening, it’s even crazier– but that kids these days have
only heard MP3s, so they actually prefer them. Again, if anyone wants to
discuss that later, I will talk about that for hours,
because that’s the rabbit hole I’ve been down for the
last two years. But I’ll just say that
that is pretty much categorically not true. So this study from a little
while ago was meant to prove this. So they got a bunch of young
kids these days, or in those days, both high school and
college age students. The only thing that’s really
important here– well, there are two things. One is that, for whatever
reason, they were mostly male students, as opposed to female
students, studying audio, which is kind of a drag
at all times. So that’s just the
way it works. The other thing is you see this
last column, this level of training– all this is is that these
students were involved in a recording program, or they had
taken a comparative listening class or a critical listening
class, or something like that. So they were aware of audio
quality as a thing, as opposed to just being someone off the
street who really has never, ever thought about it, OK? So that’s the break up. Here is what they did. And I– all it means is they knew what
they were doing, and it’s scientific. OK, so it’s true double-blind
listening. These kids don’t know what
they’re listening to. They come back multiple times,
and they listen. OK, now this is between 128k
MP3, which was what everybody was selling when they
did this story. And you think, my god, that’s
the Dark Ages, but it’s really, what, four years ago? Maybe five? Maybe five. That’s what you could buy. So between that and CD. So we’re not talking high
res HDtracks downloads. 70% of the time, those stupid
kids liked the CD. And this isn’t even a
what sounds better. This is a what do you
like listening to? Which one do you want to hear? All right, the important part of
this is going back to this sort of threshold of where
does my mom cry, is what happens emotionally? So part of one of my theories
is, if you go back to that huge block of text, and you take
out a bunch of vowels, at some point it’s harder
work to read. So while you will still
understand the words, and enjoy the story maybe, you
will be less emotionally invested because you’re
doing stuff. The same thing is true, I
believe, when listening to lossy audio, because while your
brain might throw stuff away, it’s expecting it, and
your brain gets pissed when the stuff doesn’t show up. So you can create anxiety, you
can create depression at very low levels, but at the same
time, it’s also filling in the blanks for you, right? You’re taking away
lots of acoustic things from the music. That’s one of the first things
to go are reverb tails and acoustic cues. So your brain is recreating. Therefore, it becomes more of
an active process to listen. Now while that may not be that
much of an issue, one of the anecdotal things that really
sent me down this road is that my daughter had a friend in high
school who was interning with me in my studio. And great drummer, really
musical kid, listens to music all the time. And he showed up at the studio
in the afternoon to work on something, and he came in, and
he said, man, been listening to music all day and
I’m exhausted. And I don’t know how many people
that sounds absolutely crazy to, but that to me is
crazy, because I would wake up in the morning and put on
records or cassettes– even that I had recorded from
a microphone in front of a speaker, so not the highest
quality audio in the world– but I would listen for 15 hours,
and my parents would yell at me, and then I would
listen to headphones in bed for a while. Even recently, I’ve gone to
friends’ houses who have these amazing set-ups, and we listen
to vinyl all day. And as my wife can attest, I was
down at this guy’s house for 15 hours, and I got home at
1:30 in the morning and put on a record. I was not exhausted. When I listen to some of the
streaming audio services, though, I get tired. I get a headache. I grind my teeth. And it’s not an instantaneous
thing. It is not an, oh my god, that’s
killing me and making my ears bleed. But it is, in terms of a
long-term commitment, and I would also argue in terms of a
long-term connection between people who hear the music
and the artist. And one of the most important
things with artists is that people actually connect with
them on an artistic level. And that happens by them
experiencing some of the emotion that went
into the song. And it could be as simple as a
lyric, which means you’re in pretty good shape
no matter what. But it could be because of
the chord changes and the instrumentation and the
subtleties of the performance. And when we start listening, you
will, I believe, start to hear some kind of not
subtle differences. We put the B back in subtle with
some of the things that change when you listen back to
back between some of the lossily encoded music and
the lossless music. In terms of when you get to the
second verse of the song, do you feel like, musically,
I’ve already heard this, let’s move on? Or do you feel like, god, what’s
next in the story? And man, there’s a
new guitar part. And these are subtle things. So if you love an artist, then
it doesn’t really matter. You will love them even
if it sounds terrible. But what if it’s somewhere
in the middle? What if you’re kind
of on the fence? What if the audio quality
actually determines where your threshold moves as you’re
listening as to whether you’re going to listen to the next song
on that record, or even make it to the end of
the first song? And I know that part of people
not listening all the way through to songs and skipping
around all the time is just due to changes in consumer
habits, and we’re all multitasking more, and
things like that. But for the people here who
listen to vinyl, I think you may not always flip it to Side
B, but how often do you lift the needle in the middle
of Side A– unless you’re DJing a party– because you’re just kind of
tired of it, and now I want to move on? You’ll generally have the
experience of Side A. So you’re getting 20 minutes
straight of something. When you’re just listening
online, that doesn’t happen so much. There’s a lot of skipping
around, and a lot of moving. But what I’ve got here– I went to a few of the
different labels. I’ve got 18 songs and a bunch of
different genres, and I’ll put up just a list of them. And you guys will DJ. And also, we can talk
about anything. If anyone has questions or wants
to point out stuff I’ve got wrong, I absolutely want
that to happen, as well. And we can do that while we
listen, things like that. And I have them in as many
formats as I could possibly have them in, including– oh, we didn’t make
it to this slide. Sorry. Google Play Music, I’ve got
my playlist from you guys. So hopefully because I’m on your
ridiculously fast, free Wi-Fi, we’ll be getting 320
the whole time I’m sure. But also, then I want to show
you something called OraStream, which is
adaptive based on bandwidth, which is awesome. And we’ll talk about
other stuff. Roundabout. OK, really quickly. The way I’m playing the stuff
back is I’m using my Mac. I am playing out of a program
called Decibel, which is just a very, very simple
music player. And the only thing that it does
is it switches the sample rate of the hardware
to match the files. So that way, we’re not doing
any sample rate conversion in software. On the way out of
the computer, we get it out to the converter at its
native sample rate. It also crashes a lot. It’s a $30 program. But it generally works. I’m using this Prism Orpheus,
which is a one-rack-space eight-channel audio interface. So it’s amazing for recording,
but I’m using it because it gives me a volume knob
on the front. I’m just using it stereo
going out. The reason I’m using it, as
opposed to something a little more simple, is because some
of my source material is at 192, so I need a box that’ll go
up to 192 without putting something else in the middle. I’ve tried as hard as I can to
make sure that all of these different files are from
the exact same master. So the same– remember my font joke
from earlier? Sometimes, that happens multiple
times to a release. Roundabout is one
of the examples. Sorry, we will listen
really quickly. But I needed to say that
Roundabout is one of the examples of something where it
is actually from a different master, the high res version,
because it was from a DVD audio release from, I don’t
know, eight years ago– way back in the stone ages when
that was a format for about eight minutes. So that is actually a
different master. But still, it’s a pretty
astounding difference. Now I will also say– and we can stop this, but anyone
who was at the talk I gave at Berkeley knows that
at some point, Robert from Fraunhofer made me stop playing
things off YouTube because he said it’s unfair
because it’s all transcoded, and it made him look bad. And I said, OK, that’s fine, but
I wasn’t sure if everybody in the room kind of understood
what had just happened, that we just took the biggest player
in music discovery out of the discussion completely
because it wasn’t fair to the people who developed the codec
that encodes the music that’s on this service. So I will play– and that said, I play official
videos if I can find them. But there aren’t always
official videos. So let’s listen to some
Yes, and would you like to pick a format? Do you want to go low to
high, high to low? AUDIENCE: High to low. ANDREW SCHEPS: High
to low, OK. So we’ll actually go down
through CDs, because you’ll hear a little bit of the
difference between the masters. So this is the 96/24 taken off
the DVD-A, or whatever it was. AUDIENCE: Quick question
for you. Are you relying on the
digital-to-analog in your MacBook? ANDREW SCHEPS: No, I’m going
FireWire to the Orpheus, and the Orpheus is the
D to A converter. And it’s a great sounding
converter. The Prism converters are– some people say that they’re the
best converters out there for music recording. In the UK, it’s almost
exclusively what’s used for all the orchestral
scoring guys. They’ll have 80 channels of
the Prism converters. And then, we’re just going
straight into an amplifier to these speakers. And that’s it. Yeah? AUDIENCE: What are you doing
to match levels? ANDREW SCHEPS: I’m fudging it. OK, so this is not a
scientific test. This is an anecdotal test. Unless I unplug the monitor,
which we can do as well, you’re going to know what
you’re listening to. So I’ll try and match levels
as best I can from up here, but it does vary a little bit. So I’ll always make the high res
stuff louder, because then you’ll like it better. AUDIENCE: How much power are
you using to drive the amplifiers? ANDREW SCHEPS: It says
it’s 4 by 350. So each speaker is bi-amp, so
we get 700 watts a side. So I’m barely cranking it. You let me know how
loud to go. And I apologize again. Yeah? AUDIENCE: [INAUDIBLE]
volume [INAUDIBLE] digital in this thing? ANDREW SCHEPS: In this? No, it’s actually an analog
control on the output, which is bizarre. That’s what they tell me. You can hook it up in lots
of different ways. There’s an audio
path within it. The way it is supposedly
hooked up is as analog. But if it is digital,
I have to be able to turn it up and down. I don’t have a choice. There have been times when I
actually had an analog control room section instead,
but it was a lot of gear to bring up here. So we’re going to use that. Again, everything is
going through that. Everything is constant except
the files themselves. AUDIENCE: Is it worth turning
off the air conditioning, or will that not matter because
of the volume? ANDREW SCHEPS: I think we’ll get
over the top of it, yeah. I mean, again, this
is not the most– here’s the crux of this. And I do want to get to
music for those of you who have to leave. But the crux of this is that
you could set up audiophile double-blind A-B tests– A-B-X tests– and be really
precise about this, and see what you can tell the
difference of. But I think especially as
we jump from ends of the spectrum, it’s not subtle. It’s huge differences, and then
it’s a question about whether it matters to you. I mean, who cares if you can
hear the difference? If you like them both,
then fine. Then you’re good with
the small files. I’m not trying to evangelize one
particular type of file, or to convince anybody that you
have to listen this way, or you’re missing out
on the music. My theory is that once you get
to a certain point, you’re no longer kind of interfering in
the emotional response. But in terms of an audiophile,
short burst listening test, this is more fun than anything
else, because it takes a lot of work to actually find all
these stupid files and put them in one place. So that’s the fun of it is I
wasted days of my life so that we can sit here and DJ. OK, so that said, let
me know how loud– OK. [MUSIC PLAYING] ANDREW SCHEPS: All right,
and here’s CD, which– again, a different master, but
it’s more to set for when we listen to the other formats. So this will be the
same master as all the other formats. OK, so that’s pretty
different. But it’s also a different
master. So let’s for fun, because
it is fun. This is when I’m glad I’m
behind the speakers. Sorry. Let’s just listen to some more
stuff, and then we can talk more, because– AUDIENCE: What resolution were
you playing that at? ANDREW SCHEPS: Well, that
would have been– is it 128 AAC? Because there was no
high def video. AUDIENCE: OK, so it was an
old upload [INAUDIBLE]? ANDREW SCHEPS: I guess, yeah. I mean, or it’s a static artwork
upload, so they didn’t bother uploading it in HD. AUDIENCE: [INAUDIBLE]. ANDREW SCHEPS: Yeah. OK, so let’s do Coltrane. So this is the same
master, OK? There have been reissues and
things like that of this, but I know for a fact because I got
this from Blue Note that this is the same master
in all formats. OK, so where do we
want to start? You guys tell me. So that’s A. We’ll do A,B,C.
What do you think about that? Or do you just want A, B? AUDIENCE: A, B, A, B. ANDREW SCHEPS: Just A B? Well, hold on. A, B or A, B, C? A, B. OK. AUDIENCE: Are you sure [INAUDIBLE]
ANDREW SCHEPS: That happens a lot. And this is why, again, we had
to stop going to YouTube as any of them, because a lot of
them are either swapped, or depending on the transcoding
start to collapse into mono. Like the Beatles stuff
is mono, but it’s not the mono mixes. So yeah, that happens,
but that’s– AUDIENCE: [INAUDIBLE] resolution. You can’t have very high good
placement [INAUDIBLE]. ANDREW SCHEPS: Oh yeah. Yeah, I mean, with the CD. OK, so that was A and B. That
was YouTube versus 192. And so again, it was the
low resolution possibly transcoded, even though
that was an official Blue Note upload. But the problem is– I mean, I’m sure you guys know,
working at kind of a big company, that at some point,
someone told the people at Blue Note, OK, now we’re going
to start doing our official YouTube uploads. And here are all the assets,
and go ahead and do it. And that definitely filtered
down to an intern who had to sit in front of a computer
uploading for three weeks, because nobody who really
knows what they’re doing is
going to spend longer than it takes to just point them
to the assets. So their official
uploads could’ve been completely destroyed. I mean, it’s easier
sometimes– and this happens at HDtracks
a lot, where they’re sent something that they’re told is
96/24 so that they can sell it, but the person who actually
sent them the files didn’t know how to get over the
2 gig file size limit, and the album was too big, so they
just ripped a CD and sent it. And it happens. And then HDtracks gets in a lot
of trouble, because there are a bunch of crazy audiophiles
at home doing FFTs of this stuff. And also, depending on how it
was recorded, there isn’t necessarily anything
above 20k. But if they don’t see stuff
at 40k, they’re like, that’s not high res. So there are lots of problems
in the supply chain, as well as just the file formats, which
is, again, why this is not meant to be a scientific
test, and more of just an anecdote. Now if you want, we can stay
away from YouTube, because it is, unfortunately, the
most problematic. But– AUDIENCE: Which one [INAUDIBLE]
ANDREW SCHEPS: [INAUDIBLE] and B was 192. Now an interesting thing to me–
with these speakers, I added some low end to tune
this room very quickly before I came in. There’s some thumping on that
side that I’m hearing on the 192 which I don’t really
hear in the MP3. So you don’t always– like, oh my god, it’s
just so much better. Sometimes you uncover other
things along the way. AUDIENCE: Do you have a
non-YouTube [INAUDIBLE] ANDREW SCHEPS: Yeah. We can do– well, your stuff would be 320. So we could do 320, or
we could do Amazon if you want to do that. Let’s do Amazon. Well, I just told you,
it’s Amazon. I’ll leave it up. OK, but here’s Amazon. AUDIENCE: Can you do it, and
then we vote which is which? ANDREW SCHEPS: Yeah. Yeah. Let me just play you some of
the Amazon of the Coltrane, and then we’ll go to a different
song, and I won’t say a word. Now I’m crazy, so I did some
FFTs of some of this stuff. And one of the things that
Amazon does– because they’re only selling 256 MP3s, that’s
what they sell– and presumably to help
their encoder– because they’re not getting
24-bit files, either– they actually pretty much cut
off everything above 15k. So that’s band limited to 15k on
the way into the encoder, because if you don’t have to
bother encoding from 15 to 20, you’ve got that much more
room to encode below. So that’s their decision. Again, now I’m right between
the speakers, so for me the imaging is a pretty
obvious thing. The 192 is the only one where
things are either on the left or in the right. And Rudy Van Gelder, who
recorded this album, did not have a pan pot. It was a patch cord. It was either in the left or
the right, and that’s it. So as soon as you get anything
that isn’t discretely on one side or the other, you know it’s
part of the process of the encoding that has
made things shift. And that’s another way that a
lot of the encoders work. And I don’t know specifically
the ones you use because you’re writing your own
encoders, but if you mono up stuff, it makes it much
easier to encode. It’s one audio stream, and it’s identical in both channels. So you can save a lot of space
doing it that way. And I’m sure that’s part of the
pre-encoding of a lot of this stuff, especially at
the lower bit rates. So that’ll happen. And it’s not that big a deal
on modern pop stuff because stuff is everywhere, but any of
the Beatles stuff, all the old Motown stuff, the Blue
Note stuff, that is all discrete stereo, and it will
change it completely. AUDIENCE: [INAUDIBLE] are you thinking about
[INAUDIBLE] ANDREW SCHEPS: No. No, I refuse to. So here’s my theory, is that I
need to make my records sound as good as I can make them
sound regardless of what happens afterwards. So then, when I realized what
was happening afterwards, I asked the Recording Academy to
let me come and talk about it. And they said, sure, we’ve
been trying to figure– because they’ve had this
Quality Sound Matters initiative officially for a
little over a year, but unofficially for the
last 10 years. And they’ve had ideas about– we’re going to get buses and put
awesome sound systems in them, and we’re going to drive
them around and play this stuff for people, and trying
to come up with ways to let people here what the difference
is so that you can start to understand. So when I came up with this
presentation as a way to do it, they were all over it
and have allowed me to come and do it. So my idea is to find
out what’s actually important, and change it. I refuse to live with the crap,
and just say, I got to make it work on earbuds, because
in five years, it won’t be earbuds. And the pipes will be bigger,
and you guys will flip a switch, and it’s going to be
either uncompressed or barely compressed. And so now, I’ve changed my
whole workflow to cater to something that goes away. And it’s one of– not to talk about your
neighbors– but it’s one of the biggest problems I have with
Apple conceptually, is that they will talk a lot about
what they want to get from the labels and from the
artists in terms of their ingestion, and they want
24 bit, and they want the high res. But if I master specifically for
their encoder right now, in three weeks, if they say,
bandwidth is awesome. We’re going to start
selling 320 AACs. Well, now it’s a new encoder,
or they just update their encoder. All of a sudden, I’m making
decisions based on things that go away. And I think it’s a very big
difference between the record making process and the consumer
distribution world, and you can’t make records for
the consumer distribution world other than a lot of the
analog limitations we used to have to deal with. Like you can’t pan your bass off
to one side if there’s a lot of low end, and
still cut vinyl. OK, like there are physical
limitations to things which I’m fine with. And AM radio– they shave off the top and the
bottom, and it’s mono. OK, that’s fine, I know what’s
going to happen. But in terms of taking some
sort of encoding algorithm that’s constantly being
updated– otherwise, some people in this room would
be out of a job– I can’t work for that because
it’s a moving target. So my idea is if I make it sound
great, it will survive the process better. And that, I’ve actually
found is true. Like this Blue Note stuff
sounds so amazing and so natural that you can start to
hear things get hashy and it’s a little more annoying and a
little brash, and the panning isn’t as wide. But musically, it’s still pretty
awesome, and it’s OK. And it survives better. And strangely, a lot of the
urban music survives better, because there’s lots of
separation between the instruments. Things are very discretely
encompassed in terms of their frequencies and things
like that. They’re not sharing
a lot of space. You don’t have 15 microphones
on a drum kit that are all making noise. So that actually translates
better. And strangely, there is zero hip
hop or R&B that I was able to get, other than the
Esperanza Spalding record, in high res. It doesn’t exist. CD is as high as it goes. They turn in masters that are
44.1/16, because they’re building it on a laptop. And they’re actually building
their tracks with MP3s. AUDIENCE: [INAUDIBLE] compressed not in bytes but
making the lowest part of the music– the softest
one– high. So if people start doing
that [INAUDIBLE] what’s the point in
going to high res? ANDREW SCHEPS: Well, I would
argue that even something that doesn’t have a whole lot of
dynamic range, you will still absolutely hear the difference
when you have a very lossily encoded file. You start to destroy things
other than just the dynamic range, right? There’s frequency content,
there’s panning content, there’s the mono versus
stereo content, there’s depth of field. There are all of the cues that
are being taken away, all the acoustic cues and reverb tails
and things like that. And that will affect it even
if it’s super loud. I mean, there’s this whole thing
called the loudness war, which maybe you know about,
but they just like– I won that war, OK? I mixed “Death Magnetic,”
which was the album that everybody said was the poster
child for things being way too loud. OK, so I won. Therefore, the war is over, we
don’t have to worry about it. [LAUGHTER] ANDREW SCHEPS: I spent weeks
reencoding for iTunes and Amazon at that time
to make those files work lossily encoded. So what happens as you start
to get rid of dynamic range and things like that is you
start to break the encoders. The encoders need some
room to work. So I’m making it very difficult
for that to work. And one of the things we found
that worked great was turn the mix down 0.7 db, period. Just let there be headroom
that we never even use, because it’s brick
wall right there. We never get up to that last
0.7, but all of a sudden, all of the encoders sounded about
100 times better. When we got to give them 24-bit
files for the last Chili Peppers record– that was
right at the beginning of the Mastered for iTunes project
at Apple– and the big crux of that project is give
us 24-bit files instead of 16-bit files. That made a huge difference. So in terms of what you feed
the encoder, it isn’t just about the source material in
terms of a sonic thing. Because I think there are lots
of hardcore and punk albums that, from a sonic audio file
point of view, sound terrible. But they are so super exciting
that people love those bands and they want to
listen to them. And if you do a 128 MP3 of that
album, what used to be hashy and exciting is now just
hashy and noisy, and I think there are lots of people who
wouldn’t get into the band as much as they would even if they
buy it on a cassette, which doesn’t have anything
above 12k on it, or something like that. So there are two very different
aesthetic paths you can take when you talk
about the music. And the problem is, it’s
not like with TV. Right, with TV, who is going to
argue that a high def set looks worse than an SD set? Because you see it, and
it’s easy to A-B. Some people like the artifacts
and you’re used to things like that. And if you have a bad digital
set that pixelates, there can be issues. And if you look at bad material
on an HD set, it looks terrible. OK, so all those arguments
are true. But let’s say you have a
well-captured still image, and you show it on these
two different TVs. One of them has way more
information about it and it just looks a hell of
a lot better, the other one does not. Whereas with audio, people don’t
trust what they hear. People think you have to be
trained to like something better when you just talk
about audio formats. And people believe what
they’re told, period. I mean, nothing influences your
opinion about things more than me telling you how
great it is, right? If someone’s about to play you
something by a certain band and you like them, and they say,
I can’t stand this band, check it out, you will
not like that band. If they say this is my favorite
band in the whole world, you’re going to try
really, really hard to like that band because you
like that person. So there’s so much that goes
into liking music that has nothing to do with any of
this, but it also has everything to do with it,
because I really believe that there are just thresholds. And for every person listening
to a new piece of music, there’s a threshold of,
am I going to like it? Am I not going to like it? And the more you can give them
something that sounds true to whatever the artist decided
was done, the lower that threshold will be, and the
easier it is to connect. So regardless, let’s listen to
some stuff, unless you want to keep talking. AUDIENCE: So when did you do
that, the Death Metallic? ANDREW SCHEPS: The Metallica
mix? “Death Magnetic”? That was– I don’t know, six years
ago, seven years ago? AUDIENCE: What made
me destroy it? AUDIENCE: Yeah. ANDREW SCHEPS: OK. That is a conversation
that is not– I mean, really the only thing
I would say about that is I have nothing to say
about that. The idea that me as an engineer
could mix a record in such a way that was destroyed,
but everybody would be OK with it and let it out into the
world is just crazy. There is a band involved, there
are producers involved, there are plenty of people
involved who said, this is awesome. Now during the process, whether
or not I made quieter mixes to A/B, and let them
hear differences and whatever, I may have done, but
it’s irrelevant. It’s irrelevant. What happens is at the end of
the day, that album sounds the way it does because that’s what
the band and the producer thought was great. And there’s some people who
really don’t like the way it sounds, but there are a lot of
people I’ve talked to who think it sounds awesome. It’s super aggressive. It’s not the most hi-fi thing
in the world, but a lot of stuff I do is not hi-fi. But I hope that it’s emotionally
awesome, and makes you love it, and makes you want
to either kick a hole in the wall or cry or call your
mom or whatever it is that we’re trying to get across. So this discussion in terms of
what you do with that file afterwards is also very
different from the audio quality in the sense
of audiophile. There are lots and lots of
records that if you go to one of the big consumer electronics
shows where they have a million dollar set-up
where a speaker this size will cost you $85,000 each, and
has iridium tweeters. And you’ve got a stand for the
turntable that costs more than your house– that
kind of thing. You can only listen to audio
file stuff on there, right? And so what are you
going to hear? You’re going to hear a few jazz
records and Steely Dan, and that’s kind of it. And those are great records,
and they’re also amazing sounding records. But if you put on something like
the Metallica record on there, at that point, maybe
some of that’s wasted. But it’s not because you’re
putting on a low bit rate MP3. OK, another thing just
anecdotally– and we will listen more. I’m sorry. I will talk about
this for days. But while I was putting all
these files together, I had this massive folder of files,
and I’m keeping things organized, and making sure things
are named. And I was just listening
on my laptop speakers. First of all, I’m letting the
OS do sample rate conversion in real time. Right, whatever Quicktime has,
that’s what happened, so it can play back at whatever sample
rate the stuff was set to, which is probably 44.1. And I’m just listening to the
first 25 seconds of each song, making sure they’re all
the right song. I can tell the difference
in my laptop speakers. So I bring this set up because
it’s cool, and we’ve got a room this big. And if I played stuff on my
laptop, no one can hear it. So this helps. But if you have any sort of
decent kind of system-ish that has some good DSP on the back
end to make it sound pretty good, and it’s got a little bit
of power so some of the dynamics come through, I think
you absolutely will hear the difference. And even more than that, you’ll
feel the difference. One of them is just more
fun to listen to. But that’s a discussion that
could go for weeks, and there’s no necessarily
right answer. But the good thing is, I won the
war, so the war is over. So now we can all make
quiet records again. Yeah? AUDIENCE: So there’s a new
standard from the ITU to set record loudness levels. Are you following that at all? ANDREW SCHEPS: Well, what
those are as far as I understand, and correct me if
I’m wrong, that’s what’s used in the Apple Sound Check, as
well, where you scan a record to say how loud it is, and then
it uses it to even out the level if you take advantage
of that in whatever playback system you’re using. Is that– OK. So basically– AUDIENCE: [INAUDIBLE]
a little different from the ITU’s standard. So there’s different, competing
implementations. ANDREW SCHEPS: Again, I don’t. I mean, if we got into
my mix process– which I could talk for
a different set of hours about that– my mixes are what sounds good. And sometimes, the level of the
mix really doesn’t matter. But a lot of times, it does. And I mix on analog equipment
which has voltage rails. So as I hit that rail, I
don’t just cut it off. It smooshes it off, and it takes
a while to smoosh it off completely. And different amounts of
that smooshing differ. And it’s just because I’m in the
analog world, so clipping and harmonic distortion are your
friend until they’re not your friend, and something
catches on fire. So when I’m mixing something
like the AFI record I just mixed or Black Sabbath record,
those mixes are going to be loud because they don’t really
sound right until they’re loud. But when I make something
like [INAUDIBLE] which is on my label
or jazz record– I mixed a Jeff Babko
record last year– those end up being much quieter
mixes, because I want it to be more open, and
the dynamic range really helps the music. So for me, it’s much
more a feel thing. And then I find out later that
I’ve kind of screwed up, and the mastering guy gets angry. And then I will send the quieter
mixes and say, if you get it to sound as good as my
one that you say is too loud, then we’re good. But if it doesn’t feel as good,
then we have to go with my screwed up mix. So I’m not the best
person with that. There are a lot more technical
mixers than me who adhere to things more than I do. I’m kind of a disaster
with that. Yeah? AUDIENCE: What’s your
take on [INAUDIBLE] Pandora, [INAUDIBLE]? ANDREW SCHEPS: OK. So streaming, I mean, the
filetypes are the same, right? And on that chart, I had bit
rates for the streaming files. So I have no problem with
streaming versus download. I mean, there’s a whole other
conversation which is about making the music business
still exist. And that’s actually a really
important conversation, and encompasses way more
than just this. This is the esoteric, I think
this makes a difference part. Then there is the recording
album credits part, which is a discussion I’m hoping we’re
having tomorrow a little bit– implementation of that, getting
consumers to interact directly with artists more,
because that’s what creates the relationships that last so
that I don’t have to go get another day job. That’s my goal in all of that. In terms of just the audio,
though, the streaming and not is exactly the same thing. So actually let me plug
the monitor back in. And let me show you one other
thing, which is a technology. It’s called OraStream. Does anyone in here know
about OraStream? So we’ve got one, because you
were there last time. Does anyone here know about
the MP4 SLS format? It’s another Fraunhofer
encoding format. So it’s meant to be an archival
strength format. So what it does is it will
wrap audio in its own metadata, and preserve whatever
the native bit rate and sample rate is
of that audio. But one of the byproducts it
has is you can do what they call truncating of the stream
to produce in real time any bit rate stream you want. So what OraStream have done is
they’ve come up with all of the server side and back end
technology to do pinging of your connection in real time,
and to granularly scale. So the Google Play Music– you’ve got three bit rates. You check out how fast people
are able to get the stuff, and you give them the fastest one
you think they can get without any buffering, right? Because buffering sucks. No one wants their
music to stop. But that’s what you do, right? And you will skip between
those levels. So if when you start playing a
song, you’re in a black hole, even though you’re listening
on your cell phone. You’re in a parking garage. You’re going to start off
at a very low bit rate. Now are you constantly pinging,
and you’ll up the bit rate as soon as you can? Or do you wait for
the next song? AUDIENCE: You want for it. ANDREW SCHEPS: You wait
for the next song. OK. And this is technology– these
guys, I mean, they probably had meetings here, I don’t know,
with anyone in the room. But originally, it’s a few
guys from Singapore who developed the technology. And they were hoping someone
else would just license it, because they thought it was
awesome, and why wouldn’t people want to do this? So what they do is they’re
pinging constantly, and the bandwidth will change. And it plays back in HTML5
using a WebSocket, and it plays back on iOS and Android
via an app, because MP4 SLS isn’t supported directly
in the OS of anybody’s computer yet. So let me just quickly go
to my account here. And for audiophile people,
by the way, this is an awesome service. So as a listener, it’s like a
Dropbox that can stream your audio to you. So you can get a
free 1 gigabyte account, I think it is. Or you can pay $5 a
month for 5 gig. You can pay a little bit more
for 10 gig or 50 gig, or something like that. You upload your lossless music
to the service, and you can immediately stream it anywhere
in the world on any platform. So it’s their version
of a cloud iPod. But here is what it’s
awesome about it. So let where are all
of my playlists? Here, let’s stream something
that’s kind of– oh, here we go. Come on. OK, so I’ve got some of
the same songs here. But here’s what’s important is
see it right up at the top, below the scroll. What do you call it? What’s the official name for
that, the progress bar with the thing in it? You know, it’s the position
bar thing. OK, so watch what happens. So everybody heard the song come
out from under the water, and start sounding good? Here’s one that is not
a hi-fi recording. Oh, this is a band from Austin
who are the most exciting show I’ve ever seen. And I signed them to my label. They made a record
in two days. I mixed it in one day. It’s psychedelic rock stuff. It’s not the most hi-fi
thing in the world, but this is at 96/24. And again, just watch the bit
rate if you can see it. So we’re going to start off at
128 because there’s a cache. OK, so we’re just
streaming 96/24. And if you do the math and
figure out the bit rate, the number will always be a little
lower, because the last part of the decoding happens at the
WebSockets, so you don’t actually need to give
the full bit rate. So the drawback is if you
compare 256 stream from MP4 SLS to a 256 encoded MP3 or AAC, the
MP4 SLS will not sound as good, because it’s not optimized
for that bit rate. But I’ve never had to listen
to 256 with this. Wandering around on the 4G or 3G
that I get off AT&T, I’m CD quality all the time. And as you go from the cell
network onto your wi-fi, it jumps up. And it’s seamless, and
it works in real time, and it’s awesome. So this is another example of,
I think, where stuff can go where you still get the
convenience of things having to start playing immediately,
which I totally get. You don’t want to start
streaming CD quality audio to people on crappy cell
connections. But if you can hit Play
immediately, then realize they’re not on a crappy cell
connection and be CD quality within the first few bars of a
song, and when they jump on a wi-fi network, be up at
audiophile quality, that’s pretty cool. So hopefully, this is
sort of where some things will get headed. And it’s one of many
possibilities. But if anyone’s interested in
talking to the Ora guys, please get in touch with me, because
they’ve set it up where now it’s a lockbox service for
people who want to just upload their own stuff. I can sell my artists’ albums
through there, download as individual apps. So they have a business model,
but they’re also always looking for partners. When Neil Young released his
last record, and everyone has heard of the Pono system that
he’s touting, which is a hardware-based high
res audio system? The Warner Brothers wanted to
stream his record for a week before it came out, because
that’s what record labels do now is give you a free stream. And he said, yeah, that’s fine,
as long as it streams at 192/24, which of course, that’s
not going to happen. So they got the Ora guys to do
it, and they actually did it. And they were streaming about
5 terabytes an hour all over the world of people who
wanted to listen. And if they were on their mobile
browser, they were probably getting maybe
CD quality. But if they were on a computer
hooked up to a stereo, they could listen to his
album at 192/24. And again, granularly scaling,
so if there’s any little blip in the traffic, or if your buddy
starts streaming a movie down the hall, you granularly
dip, so it’s not a stepping dip. So in terms of the listening
experience, it’s a lot less intrusive, because you dip
down and come back up. Anyway, so that’s OraStream. Yeah? AUDIENCE: There’s one form that
you haven’t mentioned a single time. I was wondering [INAUDIBLE] DSD? ANDREW SCHEPS: OK, so DSD,
just really quickly, is basically 1 bit encoding
at a megahertz level. So instead of taking this grid
and putting it over, many, many, many more times a second
than you would on a PCM
the voltage? Is it higher or lower
than last time? And you use your 1 bit– this is the dumb version– say, yeah, it’s higher, it’s
higher, it’s higher, it’s higher, now it’s lower,
it’s lower. So you’re basically tracing
the waveform very, very quickly as it goes. The only problem is– the reason
I don’t mention it is because until about a week
ago, there was no viable consumer format. And now there is one site that
is actually selling DSD audio files that you can download. And it’s even more cumbersome
to get a player to work. Now in terms of audio quality,
listening to DSD versus high res PCM encoding, I haven’t
gotten to do an A/B test, but a lot of people love it, think it
sounds absolutely amazing. It’s a very different
way to encode music. It’s awesome. I try to only cover established
consumer formats during this, because that’s
what’s out there. And there’s no way
I can distribute anything DSD right now. It’s impossible. AUDIENCE: And it would be
hard for you to edit it. ANDREW SCHEPS: It’s
almost impossible. There’s one system that allows
you to do multi-track editing, and it’s really expensive,
and their software sucks. So I can edit, but it
would not be good. So again, obviously, there’s
always the ability to work versus what would be best. [APPLAUSE]
