Panel: Cybernetic Ebooks: A Panel on Machine Learning and AI in Book Production – ebookcraft 2019

– [Laura] I’m going to introduce the
moderator of our next session. It’s A Panel on Artificial Intelligence
and Machine Learning. We’ve got a lot of collective intelligence
assembled up on stage and you’re going to be interested in this. I’m going to introduce Wendy, who’s going
to moderate the panel. Wendy Reid is a Senior QA at Rakuten Kobo
and has spent the last few years on the other side of…
I can’t read today, sorry. Has spent the last few years on the other
side of EPUB mysteries and reading-system technology. She is currently one of the co-chairs of
the Publishing Working Group of the W3C and leads the Audiobooks Taskforce. In her abundant spare time,
she likes to learn about new technologies and read as many e-books as she can.
Take it away, Wendy. – [Wendy] Thanks Laura.
Can everyone hear me? Oh, sounds like now I can hear myself.
Welcome to our panel. Today, we are going to be talking about
artificial intelligence and machine learning and how that
applies to book production. So I have with me, up here,
some very, very intelligent people. And maybe you guys want to introduce
yourselves one-by-one because you can probably do a better job than I can?
– [Monica] Sure. Hi, I’m Monica Landers,
I’m founder and CEO of StoryFit. We apply AI in both the book world and the
movie-studio world. I think the other bits of my experience
that’s relevant is that I ran an R&D group for about 7 years within a
content-technology company. And so, a lot of this like,
“How do you bring in new technology into a company?” is something I’ve been focusing on for a
long time and I think is often harder than it looks. So I’m happy to be here today to talk
about how to bring complex technology into companies. – Jens? – [Jens] My name is Jens Tröger, I’m the founder of Bookalope, which is a set of tools that helps you analyse books, build e-books, and build print books. There’s lots of AI in that process as well, I guess we’ll talk about that later. Louder? Oh, louder?
Can you hear me? Yes.
And that’s pretty much it. – Joshua. – [Joshua] I’m Joshua Tallent,
I’m the Director of Sales and Education at Firebrand Technologies
and I’m an e-book guy. So there you go. – So I wanted to start the panel today…I
know we’re kind of covering something that’s a little bit maybe
scary-sounding for people. You probably have heard a lot about
machine learning and artificial intelligence in the news,
it’s kind of one of those technologies that every…it’s like blockchain,
like every startup is throwing those words around, maybe correctly,
maybe not so correctly. And we thought,
“Why don’t we introduce it?” because it’s really not as hard as you
think, it’s really not that scary, but there’s a lot of words that we’re
going to use. We’re actually going to have a glossary
that’ll be up the whole time. So if we say something you don’t
understand, hopefully, the definition will be there. We’re also going to do questions at the
end, so, if we say anything that maybe wasn’t in the glossary or you really want
to know about, please feel free to ask. We’re here to answer
all of your questions. So I’m going to give a quick intro. So we’re talking about two things here,
Machine Learning and Artificial Intelligence. Machine Learning is the scientific study
of algorithms and statistical models that computer systems use to effectively
perform a specific task without using explicit instructions,
instead relying on patterns and inference. And Artificial Intelligence is the theory
and development of computer systems that are able to perform tasks that normally
require human intelligence…remember that intelligence is not equal to
consciousness…such as visual perception, speech recognition, decision-making,
and translation between languages. That’s probably where you’ve seen most applications of AI lately. And a brief history,
because the field has actually been around for a lot longer than most people
really think, especially considering the growth of it in the last couple of years,
so most people probably know Alan Turing, very famous for creating the Turing test. But, between him and another scientist
named Church, in the ’30s and ’50s, they theorised that a machine can simulate
any process of formal reasoning known as the Church-Turing Thesis. In 1956, AI was founded at
the Dartmouth College with the claim that, “Human intelligence can be so precisely
described that a machine can simulate it.” It is their thesis. In 1959, machine learning was defined as
“Progressively improving performance of a specific task without the need to
reprogramme the task,” so creating something that can learn from itself. But, very interestingly, in that period there was a lot of talk and a lot of research into AI and machine learning, and yet, between the 1970s and the 1990s, a period we refer to as an AI winter, computers just weren’t powerful enough: a lot of theories, but not enough power to reach those goals. What we see from the 1990s until today is that, thanks to the dramatic increase in data and computing power, training things like what we call narrow AIs (stuff you’ve probably seen on television, like IBM Watson, Google’s AlphaGo, and other applications that can do very, very specific things) has become possible. And that’s why we’re here today. So I’m going to just leave this up on the
screen. These are some terms that we’re going to probably use throughout the
presentation, but feel free to read at your leisure. So I wanted to open up with this first: describe how you are using artificial intelligence and machine learning today in your business. How do you apply it, and, just so the audience understands, how are you using it? You’re first. – So if you take AI as the big picture and
ML is smaller, we focus on a very specific part of machine learning called NLP,
natural-language processing, so we are really focused on the text
within a book or a movie itself, which is a very narrow focus. And part of where machine learning is today, too, is that there are these broad applications, but to get really good, precise information requires a narrower focus. So StoryFit’s focus is just on understanding the words across the narrative. – Jens? – My problem is a little bit different in the sense that the AIs that I’m building are considered classification AIs,
which means that their job is to assign labels to unknown things. For example, if you take a document,
you don’t know if this particular paragraph is a chapter, a section, an ordinary paragraph, or part of a poem; these are all labels for paragraphs. And so, the AIs that I’m working with,
their job is to look at the formatting of these documents and kind of derive the
labels, the semantic meaning of the paragraphs, from the way the
paragraphs are formatted. And that’s a classical
classification problem. – Joshua? – Just like what Monica was saying,
about StoryFit, we’re working mostly with natural language
processing just in the sense of trying to understand how people think about
books, what people say about books, generating keywords based on that concept. – We’re still very much kind of in the
exploratory phase of seeing where both of these technologies can help us. Because we do so much,
we do everything from self-publishing to just our own quality stuff. We probably are looking very much at
classification to see like, can we help self-published authors
produce better EPUBs or can we use it to improve recommendations
or things like that? We have a service called Shelfie
that we’re still looking at that just looks at book spines,
so there’s lots of things out there. But we’re still exploring,
so it’s fun to learn. So where do you guys see this field kind
of helping the publishing industry the most, especially today? Like where do you see the biggest impact? I feel like Jens maybe
has the best answer. – I think there are different
fields in publishing. Right? I mean there’s the content creation,
there’s the content conversion, there’s content management,
there’s content analysis. Like these are all kind of things we do
as we’re creating books, as we’re working with books. And each of these fields, I think, could
benefit from using different kinds of AIs. And, for example, with content analysis,
that’s kind of what I’m looking at is, you know, can we classify content
automatically so that editors and publishers don’t have to sit and
pore over the documents and kind of label all the paragraphs
because the software needs that? That’s the kind of tedious task that AI
can solve for us. Content generation, I’m sure you’ve
seen…what’s that website with the Create my Face thing? – Oh, thispersondoesnotexist.com? Go there now, thispersondoesnotexist.com,
a very interesting AI tool. The pictures that you see do not exist,
that person does not exist, it’s all created by AI. And there’s another one that I saw today
actually, Nvidia just released, it’s still in development stages,
but they released an AI system that can take basically an MS-Paint-looking drawing
and turn it into what looks like a real photo-realistic landscape. – Yeah.
And that’s the content-creation part. And there’s a lot of,
if you pay attention to the news, you keep reading, you know,
this guy wrote that AI which simulates the writings of Mark Twain so it generates a
block of text which, when you read it, reads very similar to how Mark Twain
would’ve written, it’s just not his writing but it’s a
close enough simulation. And I think it’s very important to
understand that’s what AI does. Right? It’s a tool, it does get stuff done,
it helps us achieve something. We need to make sure
we use it in the right way. – I think on where it applies to
publishers, how publishers can benefit from it, like you’re saying,
content creation is part of that potentially content analysis and being
able to use it. I think, on the marketing side,
there’s a very big picture here for publishers because you’re dealing
with, “How do I market products into a very crowded space?” you know,
“an even more crowded space every year with a longer tail of books over time?” You need to be able to stand out,
and that’s hard to do if you don’t have the marketing capabilities to actually
fight against all of the other, you know, input coming into people. You have to have some
ability to engage that. AI is a tool, like any other tool,
to give you a leg up, or to help you keep pace, in some cases. It depends on how that works. – I think any application where you have a
lot of content, a lot of measurements, and a lot of comparisons and
so…is where you can apply AI. So real-time pricing is an example. And another is if you have an idea but you
don’t know how it stacks up. For example, when we’re looking
thematically across books, what really matters
is, is this book unique? Like, does it have a theme that’s really
unique in this genre, for example. And so, to say that,
you really want to compare across, you know, thousands of books, right,
not just the books that you read or you can think of. And so, those are good applications for AI
is something that you want to do it quickly, you need an idea,
and it matters that you look across a large corpus. – I think it’s a great way to take a lot
of data and distill it in ways that are most effective, instead of someone just
poring through spreadsheets or all the ways… – There’s no way a single person could do
that anyway, right? I mean your ability to process that
information is limited because you have a limited amount of processing
space in your head. A computer that’s doing this process can
handle millions of data sets, millions of pieces of data,
comparing them all one to another, seeing similarities. Assuming it’s built correctly,
and that’s part of the problem as well, which we’ll talk about.
But you have the ability to benefit from a very specific
type of algorithm that can do a very specific kind of thing that you,
as a person, just can’t do. – Yeah. I think that actually leads pretty well into one of my other questions. A part of machine learning and AI that I personally find very fascinating is that it’s not really immune to bias and bad training. Right? What do you have to consider when you’re designing something to avoid that bias or that bad behaviour? As I think we’ve seen,
there’s some examples from the internet of, you know,
things going kind of hilariously wrong. I think my favourite one is…I don’t know
if anyone remembers the story of the Microsoft Twitter bot? – That’s what I was thinking of, too. – Yeah, it’s my favourite like bad AI
story where people thought, “Oh, it’ll emulate humans,” but… – Well, it maybe did. – It kind of did but not the good ones.
So… – Yeah.
It became horribly racist… – Very quickly. – …and horrible in every way possible,
its tweets were very ugly. – Very ugly. But it’s interesting because that could’ve
been prevented if they had thought about that. But how do you think about that when you
design something? – We have a couple ways of doing that. So, on your glossary,
it mentions neural nets and feature engineering, and a neural net… I’m going to really simplify this,
but neural net is a way of throwing everything in the box and letting the
machine kind of sort through it and spit back out an answer. But it means you don’t know the whys,
you don’t have the reason codes underneath. So it gives you this comparison, but it
leaves the machine totally on its own. And that’s what happened in this example. Another way is to do feature engineering. And so, there’s a human element here,
for good or bad. So if you use feature engineering,
it means you’re deciding what you measure. So we use, for example, the books. What are you going to measure out of it? Well, we’re going to count the number of
characters, how much this character talks to another character,
whether they’re happy or sad, what the themes are. You know, so hundreds of
thousands of elements. Right? And so, then you put that into the
machine-learning box, so now, when you get results,
you know where it’s coming from. And we often use both of those to compare
because you want to look for similar results, but then, also,
you want to check into the reason codes and see what’s popping up because… and the only way to get reason codes that
make sense to a human are to use feature engineering. But, as you can hear,
this means that we’ve chosen what we measure out of it. Now, there’s lots of machine-learning
techniques to figure out how to measure, so there’s lots of ways to do this. But I think it is important to ask the
questions of “What are the human elements along the way in building whatever
machine-learning programme you’re using?” because that gives you insight as to what
the potentials are for misfires. And also, a good test to close the loop is, you know, that a human should review it…like, I have gobs of interns every semester who are happy to be in this crossover space, right, between technology and literature, to review it too and just to check and see what’s coming out. Because that’s really important: to watch for and catch something before the crazy mean bot goes off the rails. – Yeah.
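To make that concrete, here is a minimal sketch, in Python, of the feature-engineering approach just described: you decide up front what to count, so a downstream model’s output can be traced back to named “reason codes”. The feature names and the sample text are invented purely for illustration, not an actual production pipeline.

    # Feature engineering: turn raw narrative text into named, countable features.
    import re
    from collections import Counter

    def extract_features(text, character_names):
        words = re.findall(r"[a-z']+", text.lower())
        counts = Counter(words)
        features = {
            "word_count": len(words),
            "dialogue_lines": text.count('"') // 2,  # rough proxy for spoken lines
            "positive_words": sum(counts[w] for w in ("happy", "love", "hope")),
            "negative_words": sum(counts[w] for w in ("sad", "fear", "loss")),
        }
        for name in character_names:  # how often each character is mentioned
            features["mentions_" + name.lower()] = counts[name.lower()]
        return features

    sample = 'Anna said, "I hope we find him." Marco was sad, but he followed her.'
    print(extract_features(sample, ["Anna", "Marco"]))
    # Because every feature has a name, a model built on top of these can explain
    # itself ("this scored high on negative_words"), unlike a neural net fed raw text.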
Do you guys have any thoughts? – Yeah. Well, I think you have to be
careful as well about the input that you’re giving. You know, it’s the process
that you create but it’s also the input. This is a big problem in the scientific
world: the recent conversations about how, you know, the data sets that they’re using to do this AI and machine-learning processing try to figure out, “Okay, here’s a conclusion we can come to based on all of these studies.” Well, if the studies are flawed, then you’ve got a problem with the potential output that comes out of it. It’s garbage in, garbage out. Just like any other system, you have to be very careful about what you put into it so that what comes out of it can be trusted. – Yeah. That’s, I think, a really interesting point to make: when you do the research, you realise that algorithms are not immune even to human bias. If you personally selected the data you put in, and, you know, everyone has biases of their own, your model will just spit your biases pretty much right back at you.
Which is very interesting. – I mean I’ve built the algorithm,
I’ve selected the data, so these choices will reflect in how the
thing functions and what it produces. And I find it quite interesting how AI has
become kind of a recent buzzword. Back in the ’90s, late ’80s early ’90s,
at school, they wouldn’t teach artificial intelligence like they do today. Back then, it was
called data mining. Right? Because there was this growing amount of data: people realised that, as the internet started to spread, we got access to more and more data, but what are we doing with this data? And so, a new field called data mining kind of grew out of mathematics and statistics. And so, people started to look for
correlations and similarities in data and started to kind of make sense of all these
blobs of data and, out of that, eventually grew what is called today
artificial intelligence. Because it seems like…like it seems
intelligent what that blob of code tells me but we shouldn’t forget that
it’s a blob of code. Right? It tells me what I told
it to tell me, in a way. So if I mess that up,
then the result is kind of funny, or sad. – So if the robot overlords kill us in the
end, it’s our own fault because we wrote the code the wrong way?
– That’s right. – Yes. It’s because that’s
actually what we wanted. – It’s actually what we wanted done. – So what are your thoughts on technology
like this in creative industries? I was thinking about this question because
I think there’s a lot of…when you think about, I think Dartmouth College has
a really famous prize where, if someone can get an AI to write a short
story that could be published in “The New Yorker,” they’ll give you
like $10,000 or something. And it obviously has not happened yet. I don’t know if anyone’s ever read a short
story written by current AI, but it’s really funny. It’ll start out pretty good, and then,
get really strange. But I think that feeds into a lot of
anxieties, like, “What if,” you know, “the robots take over?” But like what are your thoughts or your
feelings about that? – Well, we’re already seeing
some things being written by AI, by algorithms, I mean a lot of sports
articles, or even news articles, even business articles are being written
by computers that have the data. But if you read them,
you can tell that it’s just…it’s a bunch of stats, right? Baseball stats or something,
it’s not really that detailed. I don’t see this becoming a huge problem
for, you know, “All authors are going to go out of business,” “you can never
write another book,” you know, creativity still comes from human effort. I think that there may be a case where
somebody at Dartmouth actually wins that, sure, but that doesn’t mean that that’s
going to be the best story or that it’s really going to hit the emotional points that a human would hit. You know, there’s going
to be something different still about human interaction.
Another, umm, – Sunspring is the name of an
AI-produced movie, it’s on YouTube, and I giggle every time I see it because
the actors acted off the script with like, you know, real effort.
Have you seen it or…? I see you nod your head, it’s funny,
it’s like 15 minutes. But that’s just kind of an example of
where it is today. And probably that’s the question I
fearfully get the most is, “Are you trying to write content?”
when we work with creatives. And no, we’re certainly not. We’re trying to understand it,
in different ways, to then let creatives actually look at their own
work kind of out-of-the-box. And so, we look at what do you need to
measure to actually recognise that it’s a hero’s journey, what do you need to
measure to recognise whether you have a unique character or whether you’ve fallen
into some stereotypical traps. Which you might want,
you need stereotypical characters to carry the story sometimes. But so, the way we approach it is to say,
“Hey, is this what you’re trying to do?” so that you’re able to create the book or
the movie that you want to create. And so, it’s like having someone else in
the room to ask the questions or to notice things that you wouldn’t have seen because
you can’t read 100,000 books and compare that, you know, instantly. But I think that it’s a real
challenge and everyone ought to… I mean, I don’t know,
like give yourself some credit. Like there’s a reason it’s hard to adopt
new technology, especially as it changes so fast, and it requires you to change the
way you think and work. And this is a real hurdle to get over. And I learnt through a lot of mistakes,
at a company, of getting this great app or something that, once my team got it
working, we’d hand it off to the rest of the company. And boy, when that handoff
wasn’t well-done, it failed. And sometimes lesser technology had a
better handoff and was used and integrated into the company. And so, this handoff is tricky and I just
think it’s important to recognise it because there is a level, I think,
of frustration of starting to use something that presents information
differently. But I also think it’s valuable. I mean you feel that way every time you
get a new phone, right? It’s like a couple of days of going,
“Ah,” you know, “where do I find how I used to work?” But my opinion, if you don’t force
yourself to jump those hurdles pretty regularly, then the pain level just
goes up even further. And so, you know, we spend time with some
of these big technology companies that, you know, feel like competitors, right,
and they are just testing this stuff like crazy. And so, that’s part of what I always feel
like…and we were talking about this before, like,
if we can be a friend to the industry and to individuals to understand how in the
world do you use this new tool internally, that’s always something that
brings me a lot of pleasure. Because I think that is such a hard
handoff but, boy, then when it works and can save time and save efficiency,
you know, that’s the big win. – And I think that’s one of the most
important things to emphasise is “AI is a tool to make my life easier.” And personally for me that was the whole
motivation of why I started writing one because I like designing books,
I like well-designed books, and I just got tired of having to deal
with messed-up manuscripts. Right? And so, I shouldn’t be dealing
with it but I had to. And so, the solution would be a decent
tool, you know, that helps me deal with that, so take the pain out of that so
that I can actually do what I really want to do, which is play around with typefaces
and book design, and do the fun things and let, you know, the machine do the
boring stuff. And that’s what I think AI
should be doing. It’s a tool. – It’s a tool to make your
life a little bit easier. If anything, I think maybe a good way to
describe it is, if used properly and used with intention, you can actually use it to
facilitate the creative process. That’s something we’ve been looking at in some of our exploration. A lot of our focus is on making people’s reading lives better, so, “can we use,” you know, “an AI to bring up summaries of things to help someone understand the content of
a book? Or can we use it to help someone
understand the plot of their novel better?” So that, when they want to publish,
they know, “Ooh, there’s kind of an issue in my plot.
I need to clean that up.” But this is all early-days like
exploration but I think the idea is to facilitate creativity,
not take it over I guess. Okay. So what are you guys most excited
about in the intersection between, you know, this technology
and the content world? Where do you see the future? Or tomorrow? – Rosy and beautiful. – Rosy and beautiful. – I like the practical application of
technologies like this, like being able to say,
like Jens was saying a moment ago, being able to say, “I don’t have to do
that silly thing that I used to spend two hours doing. I can hit a button, it happens,
I’m done with that. I can focus on what’s
really important to me.” That’s the job of any tool is to make your
life easier, make your life better. You know, everything from a car,
I can get somewhere, you know, faster and more efficiently than
walking or taking a horse. You know, the whole idea
with any tool is to help you. And so, that’s where I see this benefiting
publishers, both, you know, on the production side,
and on the marketing side, and on other sides within
the publishing house. You know, for those of us in the room who
do e-book production, you know, people who are trying to figure out,
“How do I repackage my EPUB2 as an EPUB3?” You know, or “How do I fix all those old
EPUBs that Teresa was talking about?” You know, that shouldn’t be something we
have to deal with, that should be the kind of thing you pop it into the AI,
the AI figures it out, and, “Oh, this is what it’s supposed to be,” and
then, you verify and you’re done. The goal of these things is to help us
make more content, sell more content and, therefore, provide to the world better
content than we’re currently providing. – Quality, quality content. There’s a lot of discussion about
accessibility, right? But if you drill down into what accessibility actually means, it means, “I understand the semantic structure of
the document, I know what these elements are, other than just text.” Like that information doesn’t exist
inherently in a Word document, for example, until I put it there. But why should I?
Right? It’s tedious, it’s boring.
So I have an AI to help me with that. Which means, once I have that information,
it kind of rolls all the way through the process to help me build
accessible e-books. Right? Like you were talking about that earlier,
like building an accessible e-book, for example, should be trivial. We shouldn’t be discussing how hard it is,
we should be discussing what kind of tools do we need to make it easy for us. You know, and that is just one example but
there’s a lot of discussion about content of images for accessibility.
Right? I mean Google has an open API where I can
upload an image and Google tells me, “There’s a horse, there’s a beach,
and there’s a mountain in it.” That’s kind of a nice start to tag images,
why does nobody use it? You know, and yes, you still review it and
maybe sometimes it gets it right and sometimes it gets it wrong,
but at least it gives you a good starting point, it gives you a point where
your e-book runs through DAISY, and DAISY says,
“This is nice. It works.” And your reader, you know,
gets an e-book which sort of describes the image. And then, you spend an extra hour kind of
tweaking and tuning everything, but the big part, the big chunk
of the work is done. And that’s kind of to answer your
question, “Where do you see this all going?”
that’s where I see it going. I don’t want to have a headache
over these details basically. – I mean, just tagging on to what Joshua said about quality: in creative work, right, there’s an element of human time that needs to be there, like a breath, right, to get there. But I would predict that things aren’t going to slow down. So if you have these efficiencies, as he’s pointing out, now you have the benefit of getting some of this work done faster
so you have the time. But I think the reason we’re all in this
business, right, is to say, you know, now what’s this unique, you know,
wonderful story that I can create out of it? And so, it’s creating that breath so we
don’t lose what we love most about the process but build in some efficiencies. I think these are all going to start
coming really quickly. Like, you know, there’s not a whole lot of
us talking like this now but I bet you, in five years, you’re all going to be able to sit up on this panel and talk about AI and, you know, whether it
used Random Forest, or whatever […] aid. Like all these phrases that aren’t used
very often I think are going to become really common because this is just going
to be part of the tool set that lets you do the work that you really want to do
even better. – Hopefully, we’re introducing some new
efficiencies to everyone’s process. Since AI and machine learning have become
kind of… they’re buzzwords now, right? I think, while we were
discussing this panel, we were talking about an article or like
something like…I forget the percentage, but it was an alarmingly high number of startups that mention the words AI and machine learning in their pitches.
But do they actually use them? This is something we were talking about. So like what sort of buzzy uses have you
seen and do they drive you crazy? Do you worry about them kind of like
polluting the industry? – Like I always feel like we’re shaking
our fists and like, “That’s just a filter.” You know, some of the stuff that’s machine
learning or AI, they’re just a complex set of filters. Which are super useful
and sometimes all you need. And we sometimes get, you know, inbound requests for, you know, a machine-learning solution or an AI solution. And I’m like, “Yeah. Well, we can do this if you need help but
this is just a complex filter.” So I think it does help the more you know
too so you don’t get caught up by the, “It’s AI, therefore I can’t even ask the
right questions to understand what’s happening.
So how can I feel good about using it?” You know, so I think the more you’re in
tune with the words and know how to ask the questions, you can weed out some of it
and maybe lower your price a bit too, you know. I mean I probably can think
of more but I’ll stop. – I’m trying to think of specific
examples. I can’t think of any off the top of my head, but yeah. – When you hear it, you hear it.
– When you hear it, you hear it, yeah. – And you always feel mildly defensive
because I’ve got like, you know, PhD data scientists who spent years,
you know, working on this. And so, when I hear people
representing a simplistic solution, I always want to be like, “That’s not it.” – Yeah, I think the media is very…what’s
the right word…enthusiastic like with reporting. “This is an AI, that’s an AI,
that’s ML and it’s going to take all the jobs,” and then, “look at
what it’s doing.” But, most of the time,
when you actually look into it, it’s not really…as you say,
maybe it’s a filter, it’s about something much more simplistic,
and it just gets a lot of attention because it’s a buzzword, at this point. You know, and, yeah,
like the only way I can deal with that is just ask, “Really?” like a healthy sense of scepticism. Like if you read all these articles and
just kind of raise an eyebrow and think, “Really? Is that actually true or is it just
another buzzy article because it’s just hip to talk about it?”
And… – And if the article starts in the first
minute saying something about, “Are robots going to take over the world?” I was joking, I was like, “I, for once,
would like us to be covered in an article that doesn’t mention robots taking over the world, because it’s not the same thing.” We’re not taking over the world this year,
it’s like way on the road map. It’s totally at least 2 years from now. – Think of Bender, right? Bender is a robot, you know,
Futurama reference, so I read all these things and all I see
is a drunken robot sitting somewhere in the corner.
That kind of turns it all down for me. – Yeah, I think the representations
sometimes are like, you know, more HAL 9000 and Bender,
but I feel like Bender is probably really where we’re going to go. – If you just think of like anytime
someone’s talking about AI, it’s a reaction to big data,
which was the phrase before, right? But now that you’ve got all this big data,
you got to figure out a way to make it make sense. So I think that’s the least scary way to
look at it is AI is just making sense of big data because now we’ve got so much
you can’t just review it. And then, that should be less
roboty-taking-over-the-world. – Well, and that’s an important way of
looking at this for publishing as well because publishers have lots of data.
Lots and lots of data. It’s in your books, it’s about your books,
it’s about the sale of your books, it’s about the other books that other
people sell that you want to compete with. All that data is there,
it’s sitting there in a variety of, you know, places. The more you can pull it together,
the more you can do stuff with it. And whether that’s filtering or whether
that’s, you know, whatever you’re doing with it, you have to do something with
that in order to truly engage your business. And that’s what big data really is and
that’s what AI is supposed to help us with. – And I think it requires you to
understand what do you want to get out of it as well. I mean what is it that I want to achieve
here in the first place? And like, what you mentioned earlier with
the marketing, I mean what kind of marketing information helps me to,
you know, sell more books or to target markets better,
to advertise better. And then, from that,
you kind of work your way backward, and then, you end up looking at your pile
of data thinking, “Do I have the data or do I need other data? How do I rummage through all of that data
to get to the point where I want to get to?”
Right? And that’s the process of building these
kind of things, these AIs. – I wondered, after what you just said, if it’s helpful just to have this in your head too: if you’re trying to solve something…so you’ve got all this data. If you also have the question you’re trying to solve, that defines your training set, and what you’re training to is what you’re building your model off of. And so, the question that you’re asking
is super important. And that’s one of the benefits: once you get your data into a clean state, you can ask multiple questions of it. And just, very simplistically, you divide it in half: you build the model off half of it, and you test on the other half. That’s kind of your basic process. And so, when they’re talking about asking the question, the question is then, “Do you have the data on
this side to train?” So if I train a model on,
“Is this a romance book or is this a,” you know, “horror?” then I’ve got to know,
I’ve got to have enough books, and identify which they are,
train to that, and then test it. That’s kind of an easy one. You can probably tell the difference on your own. But imagine you had to do it across 10,000 books because you wanted to re-sort them into some new category.
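As an aside, here is a minimal sketch, in Python, of the split-and-test workflow just described: label some examples by hand, build on one half, and check against the held-out half. The snippets of text and the naive word-counting “model” are invented purely for illustration.

    import random
    from collections import Counter

    # Hypothetical hand-labelled examples, standing in for the couple of hundred
    # books you would have slogged through and tagged by hand first.
    labelled = [
        ("Their eyes met across the ballroom and her heart raced.", "romance"),
        ("The thing in the cellar scratched at the door all night.", "horror"),
        ("He wrote her letters every day of the long, lonely war.", "romance"),
        ("Blood dripped from the ceiling onto the empty crib.", "horror"),
    ] * 25

    random.seed(0)
    random.shuffle(labelled)
    half = len(labelled) // 2
    train, test = labelled[:half], labelled[half:]  # build on one half, test on the other

    def tokens(text):
        return text.lower().replace(".", "").replace(",", "").split()

    # "Train": count how often each word appears per genre in the training half.
    counts = {"romance": Counter(), "horror": Counter()}
    for text, genre in train:
        counts[genre].update(tokens(text))

    def predict(text):
        # Score each genre by how familiar the text's words are from training.
        scores = {g: sum(c[w] for w in tokens(text)) for g, c in counts.items()}
        return max(scores, key=scores.get)

    accuracy = sum(predict(text) == genre for text, genre in test) / len(test)
    print("held-out accuracy:", accuracy)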
That’s just an example. You have some new category that you want
to put and you’ve slogged through a couple of hundred and you have that data,
well then now you have a start to where you can start building it. I don’t know if that made sense,
but that’s kind of I think that’s really important to understand is what question
you’re asking matters and that’s separate from this set of data that you’re working
with and trying to decipher. That may have been too much. – No. That sounds like
a really good example to me. And I guess like the data is actually
probably the really important part. We kind of covered this as well in terms
of, you know, bad data or, you know, the confirmation bias, like,
“I put the data I want in so I get the result that I want out.” You know, I think a recent example I saw
again something kind of going really badly is, did anyone hear about Amazon’s
hiring algorithm? – No. – So, you know, I assume Amazon gets a lot
of job applications. I’d think, I don’t know. And so, they had an AI go through the job
applications to sort out who would be the best candidate for the job. And I assume that the engineers working on
this were probably very well-intentioned people. Most people are. Except that the algorithm never picked
female candidates. And I’m sure if you’d asked any of the
engineers, they probably would be like, “No-no-no, that was not the intention.” However, because of the training… – The training set had mostly men in it. And so, it reinforced the
mostly men as a… – Yeah. So being I think very conscious of the
data that you use and the result…like, as you guys said, like you got
to ask the right question. And sometimes the right question…you’re
going to fail a bunch of times before you get to the right question. I’ve even had this…like I did some
testing where I tried to get an AI thing, a model I trained to write
12th-century Arthurian poetry. So I picked a very specific data set, obviously, and I accidentally left in some Project Gutenberg page markers, and so,
my AI just kept returning page markers to me, like, “Page marker, sentence,
page marker, sentence.” And I was like, “What?” But it’s because I did not
train my data properly. I wanted poetry, I got page markers. – Well, and that points, yeah, to cleaning the data. I mean it’s just like, you know, when you decide to paint a room and you’re like, “Ah, this will be great, it’s a small room,” but then you spend all day prepping. Well, cleaning data is a big deal; getting it into a place where you can actually run it and get accurate results is a huge amount of work. So sometimes we’ll go into companies and
they’ll say, “Well, we’re doing this ourselves,” but they can’t figure out how
to clean. And I mean, as a company,
we’ve spent so long building tools that can clean the data, whether it’s books,
or recognise the elements of the script, or pull off, you know,
the pages and the chapter headings of books that we can really analyse it. So that is no small feat,
and it’s nothing to kind of…I mean you can’t skip that, unfortunately; it’s a painful process. – Yeah, so very much a human element. – I mean and I’m sure you’ve all
encountered that exact process, you’ve all helped Google train their data. Right? Every time you have a little captcha on
your screen, “Please,” you know, “mark up all the traffic lights,”
or the cars, or whatever, storefronts, you’re helping
Google train their AI. Right? Google gives you random images and you
tell Google, “That’s a number, that’s a car, that’s a traffic light.” And Google then learns a little bit more
and a little bit more. And it’s a massive social effort to train
data or to categorise and classify data. Because like 10 engineers at Google can’t
do it, so they just packaged it into a convenient little captcha,
made it a development kit for websites, and suddenly, that data reaches billions
of people and billions of people help Google train the data. The interesting social experiment would
be, “What if all of us would just do the wrong thing?” That would be funny, right,
because then things would go wrong, right, because we would tell that AI,
“That’s not a car, that’s a cat.” Right? And then, over time,
that AI would learn that everything that looks like a car is a cat, and so,
it would stick the label cat on every car. Which to us is funny and to that AI it’s
just, “Yeah, I’m just labelling stuff.” Right?
But that’s how it works. – I don’t know if you’ve ever seen this
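A toy sketch of that point in code: a tiny nearest-neighbour classifier that simply repeats back whatever labels it was trained on. The numeric “features” are made up for illustration; swap the labels and the same car-shaped thing comes back as a cat.

    def nearest_label(example, training_data):
        # Return the label of the closest training example (squared distance).
        def distance(a, b):
            return sum((x - y) ** 2 for x, y in zip(a, b))
        return min(training_data, key=lambda item: distance(item[0], example))[1]

    # Made-up feature pairs, e.g. (width-to-height ratio, "has wheels" flag).
    honest = [((2.1, 1.0), "car"), ((2.3, 1.0), "car"), ((1.0, 0.0), "cat"), ((0.9, 0.0), "cat")]
    # The same examples, but everyone agreed to answer the captcha wrong:
    mislabelled = [(f, "cat" if lbl == "car" else "car") for f, lbl in honest]

    unseen = (2.2, 1.0)  # something car-shaped the model has never seen
    print(nearest_label(unseen, honest))       # -> car
    print(nearest_label(unseen, mislabelled))  # -> cat: garbage labels in, garbage labels out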
image, like “Puppies or croissants?” – Yes, I’ve seen it.
– Like, “Labradoodles or fried chicken?” Like…I think we should start testing. So you guys work with this stuff every
day, you know, data and all this other stuff, what is the funniest, weirdest, or worst
thing you have ever seen something return to you? – I had one person who was very
enthusiastic, very happy to format their books. The Word document was all over the place,
like different fonts, different weights, italic, non-italic, sizes, everything. Like she had made use of the entire range
of typography within a paragraph. Like there were paragraphs with 24
different stylings in them because she thought that that is more important than
this but there’s a different kind of importance down here. And so, she had formatted all these things
differently. And so, I fed that into my classifier. And it had no clue what to do with it.
Right? It just got…like the classification that
I got for that document was all over the place. And so, just for laughs and giggles,
to test the precision of AIs, I have one classifier which is, quite literally, a 5-line classifier, a random number generator, so
it just randomly classifies stuff. Right? It has no clue, no plan, nothing,
it just throws out a random label. That thing did better than my AI because
the document was so all over the place, you know. And like I looked and just,
“I don’t know what you mean.” And if I don’t know what she means,
like how’s that blob of code supposed to know? Because I’ve trained it and it just
imitates what I’m doing. So I look at the Word document and just,
“I don’t know.” Right? And so, you know, that was kind of funny. – I don’t have a great story, not as
good as that one. – One thing we did…so the data scientist
I’m talking about, I mean, she used to work for NASA and put satellites in the air, then went back to get her PhD and just focused on NLP and television shows. So I mean this is seven years,
so she’s good at it, I preface it because she walked in with
the results when we first started splitting and observing female roles
versus male roles in movies. And, you know, I’m going to then sell this
to the movie studios. And she came back and said,
“Here’s what,” you know, “the women look like,” basically. And I didn’t say it this way but I
thought, “She has made a mistake, because all the female roles look the same.” And so, well, it turns out 80% of them are
measured to be the same with our technology. But I just remember that moment of
thinking, like, “I can’t sell this. Who wants to see that all the female
characters are the same?” So we dug in a lot deeper, and now,
we feature the nuances of characters, but that was both the sad kind of
uncovering of like, “Wow,” you know, “they look the same. They’re measurably,” you know,
“written in similar fashion.” – I think I have a silly one, and then…actually, I guess they’re both kind of silly. But I was testing Shelfie, whose role is to scan your bookshelf, identify the spines, and, from that, generate recommendations. So I was testing it for release, and it didn’t work out as well as we expected, because we discovered a very interesting bug in the programme. It identifies the shapes of spines and then attempts to classify them, and every once in a while we’d catch things like the edge of your bookshelf being treated as a spine, which it’s not. So wood grain, a lot of wood grain, when you’re looking through the segmenter. Usually, if it saw a blank, it wouldn’t return anything; it’s a bookshelf. But, every once in a while, it would randomly assign a book to this blank space, and it was always a romance novel.
man on the back of a horse. Which was great but I felt a little bit
bad for the people whose bookshelves were otherwise not-romance content because
it’s, you know, maybe a bit of a mischaracterisation. The other one was we did a bit of a
hackathon where we used the same thing that produced chapter headings instead of
poetry for me, we all tested it and everyone tried to test it with different
data sets. And one of my colleagues tested it with
recipes, tried to produce a cookbook using only an AI. And it’s apparently astounding how many
recipes open with step one being “Cream together butter and sugar,” because that
was pretty much how every one of his recipes began, even if butter and sugar
were not in the ingredients list. So we have a lot of fun with it. – Every good recipe starts… – Apparently every good recipe starts with
butter and sugar. So we have a little bit of time before
questions, do you guys have any like last kernels of wisdom before we open it up for
questions? – I think my main point is don’t be afraid
of AI because it’s not here for your job, it’s here to help you. But be, as I think Jens put it,
be sceptical. Right? Look at it carefully. If somebody says, “This is AI,” figure out
if it actually is, you’ve got the definition.
Right? If it actually is and it actually has some
benefit, then look at how you can engage that in your workflow to help you do a
better job with what you do, save you time, save you energy,
save you money, and help you make better books. – Jens? – Yeah.
Be courageous. – So does anyone have
any questions for us? Or comments? – [Male] So I want to know,
you were talking earlier about getting publishing ready, how ready do you think
publishing is with their data for all of this? Because I know what you’re talking about
requires them to have good data, the garbage in garbage out,
so how far are we away from the garbage in for all three?
Like, in general. – Sure. You’re ready. You’re ready.
You got books. You got books, you got sales data,
you have goals. I mean it’s all there and you can identify
your goals, whether it’s awards or, you know… I mean, so, from my point of view,
you’ve got all you need. – I’ll give the caveat. You have all you need but, a lot of times,
it’s not clean. That’s where the concern is for most publishers: the data that you do have, “Is it consistent? Is it correct? Is it tagged properly?” You know, on the metadata side, if you have citations set up properly, then you’ve got a good data set. If you just have a bunch of text that’s not broken out, and you don’t know who the source was, or when or where it was published, that kind of information is actually helpful in classifying that
data. So, I would say, as a publisher,
look at what you’ve got. If you’re using Excel spreadsheets to
manage your data, probably you can do something better. If you have your books,
you’ve gotten good data in the books but you have to find ways
of extracting it. Right? And it’s broader than that too. When you look at your sales data,
how are you engaging that? Are you bringing it all into one place
where you can then analyse it and compare it against other data that you’re pulling
up and that you, you know, think is important? Are you comparing that to
your sales rank on Amazon? Are you comparing your prices
to your sales rank? Are you comparing your other things to
other things? That comparison process is where you can
start to actually use the data you’ve gotten. And if you don’t have good data,
you won’t be able to do the comparison. – The follow-up to that is how
do we get to that good data? Like, if we’re not there,
are there steps to get there? Well beyond like… – Oh no, publishers are doing
this all the time. I mean, most of the time,
it involves getting a system that’s a single source of information for most of
your data. It may be the single source of information
for your metadata and some of your other, you know, product data. You may have another single source for
your sales data, you may have another single source for the product…the
information inside of your books. Those different pieces,
those different databases will be useful and can be compared to each other and used
with each other to engage that. But you may have to focus on one at a
time, you know, “Let me get my metadata figured out first, let me get my book-data
figured out, let me get my sales data figured out,” and clean it up and put it
into a system that you can then access. – So I think, in general,
building and training an AI is a very complex process. It’s awesome when it works,
less so when it doesn’t, as we talked about. But training is a careful process and it
takes effort, it takes human effort, because we need to label the data so that
the training can happen in the first place. And so, if you, as a publisher,
have all the data and half of it is labelled and ready to go,
and the other half isn’t, then that’s an effort and like kind of a
forward investment you have to decide on, you know, “Is it worthwhile spending the
human resources on cleaning up and labelling the data so that, in the future,
we will benefit from it or not?” And that’s a risk you have to take.
Right? And I think it’s worthwhile taking.
But I’m biased. – I think we mentioned like knowing
what questions you want answered. Right? Like publishers have mountains of data
but, if you don’t really know the questions that you want to ask or the answers you want to achieve,
it can maybe feel kind of overwhelming to say, “Oh, here’s all of our sales
data,” or, “here’s all of our metadata.” But if, you know, your question is, “Oh,
how are we doing in the fiction space for,” you know, “sci-fi?” Okay, well, step one, let’s look
at all of our sci-fi books. Okay. Step two is the sales data for the sci-fi
books. And then, by breaking that down,
I think it becomes much more manageable. But you’ve got to, I think,
go into it with a lot of attention. You can’t just be like, “Okay,
let’s throw all of our data into a machine and hope for the best,
it’ll tell us everything we need to know.” – Yeah. And there’s no point cleaning it, don’t get it organised, until you know your question. Because that’ll then identify what work you actually need to do. That’s part of what we’ve built: tools to tag everything so that you can get on to the actual work. I mean, our philosophy is, “Let’s find a way to automatically tag it so now we can look at thousands,” versus individually. So you lose some of the…you know,
AI’s not 100%-accurate but it means it’s mostly accurate, and then,
someone can quickly review when you’re done with that part. But it also depends on what element you
want to tag. Like we can recognise whether it’s a
Mexican-food recipe or Thai food, right? So you don’t need to go through, you know,
thousands of recipes and do that. But then, maybe there’s other things that,
you know, we don’t recognise yet and you have to build a different model for. The other thing is it is hard and there’s
a lot of different types of data scientists and there’s different kinds of
machine learning. And that’s one of the things we run into
sometimes is that it is hard and you have a different
data scientist who’s going to tag for metadata than the one who’s going to work on pricing algorithms. Those are different. And so, that’s sometimes one of the
challenges I see is that it’s hard to get just the information within your company
to then ask the questions and do the…it is, right? That’s why people have PhDs, right?
Turns out, it’s hard. – [Female 1] I don’t have a question but I
just have an idea I wanted to share. Every time I think of AI,
sometimes I think of Miyazaki Hayao, the filmmaker and the animator who even
refused to use a computer to do his original drawing, painting. But I think I agree that,
by introducing AI into our lives, we can push our creativity and originality further, I mean push us to think more,
think more differently. So that’s what I think about AI. Yeah, that’s it. – I like that idea because then that
changes the way you then bring it into…I mean we joke a lot when I’m, you know,
working with new customers, it’s like when you plan your creative
discussion and you think of the people you want to have at the table but the person
you bring in is Sheldon from “The Big Bang” who’s not your instant like go-to
for creativity but they bring this new way of looking at creative content. You know, and sometimes I think that’s the
experience. So if you have an open mind and you don’t
mind Sheldon very systematically reporting, you know,
this information back to you. – […] of a show. And another point is, I do believe the output totally depends on what you input into the AI. We use AlphaGo to play Go, but other things depend on your input, which means, when we apply AI to art or creation, I don’t think it’s impossible for the computer to simulate Miyazaki Hayao’s style, the way a computer has been used to simulate specific Chinese calligraphy; it’s possible, but it totally depends on the input. Which means we need to think about something different. – So and that’s the thing.
Right? I mean there might be an AI that can
simulate Miyazaki’s style but I doubt it will be able to create a story that is
consistent, coherent, and meaningful to humans for
an hour and a half, like with written stories.
Right? I mean you can imitate Mark Twain but can
you imitate it over 10 pages and, you know, still make sense?
That’s the other thing. And even if that happens,
then we’re kind of getting to the, you know, infinite-monkey problem.
Right? I mean you take an infinite number of
monkeys and give them typewriters, eventually they will produce the
books of Shakespeare. Right? It’s inevitable that this happens but it’s
not necessarily a reliable thing to happen.
It just is a coincidence. And so, I agree, like AI can maybe imitate
certain styles and maybe make a Miyazaki movie but that movie might suck. – And that’s going to be the 99-cent
Kindle book of the future, right? I mean the 99-cent Kindle book of the
future is the AI-generated Kindle book, and, you know, the 9.99 that you sell will
still be the better quality one. So there’s going to be, you know,
if we ever get to that point, there will still I think be a distinction
between human-generated creativity and AI-generated creativity. There’s going to be something we can tell. It’s like, how many of you actually pulled
up This is not a Person, thispersondoesnotexist? So did any of you see the big hole on the
side of somebody’s head as you were refreshing the page? Refresh the page a couple of times and
you’ll see somebody who has, you know, a big hole in the side of the head or some
other kind of weird thing going on. Like it’s not perfect. It won’t ever, in my opinion,
get so perfect that nobody can tell that it’s AI. – It often reminds me of photographs of,
“Look, that elephant can paint a picture.” Right? When you look objectively,
what the elephant does, it takes a brush, it takes colour and smears
it over the page. That’s what the elephant does. Me, as an observer, when I look at the
picture, maybe I interpret that as a pretty picture but that is just
me projecting things. Right? It doesn’t mean the elephant had an intent
and the creativity and the vision of painting a picture, it just had fun
smearing colour around. Right? So we have to be very careful about how we
talk about AIs and what AI does and “How much I project onto the
result of that AI?” Right? And so, yeah, AI can help me be creative,
it can help me solve tasks much more efficiently. But, at the end of the day,
I am the creative person, I am the one who is writing the story,
who has the idea, and the vision, and the intent.
Right? And that’s what an AI just doesn’t have. – And some of the models like that are out
there, if you’re using, you know, Watson, it’s trained on all text,
so it’s trained on emails and Wiki…I mean it’s trained on everything, right? And so, the other thing
of these models, like the models we use, right, is that you can always ask, “What is it trained on?”
Like, “Is it trained on fiction?” If we’re delivering stuff to educational
publishers, well, it’s not using our model that’s trained on fiction, right,
it’s using a model that’s trained on this type of work. And that’s one of the things that can
happen now that didn’t happen 5 years ago. So now, you really can get that narrow,
whereas groups…I mean Amazon is not, they’re purposely
doing these big models that are general enough to be accurate
across a broad expanse. But the companies you’re going to see popping up are going to be specifically trained. And that’s like just another
question you can ask. – [Female 2] I guess I had a quick
question related to our relationship with data scientists who are going to be
actually creating this. As publishers or as people working within
publishing groups, are publishers going to be looking…should they be hiring and
is that something that we can expect? Or is this work that will probably happen
through a partnership, or a third party, or an additional company
just as with data mining? – I think it’ll get less expensive. I think you sure wouldn’t want to hire the
equivalent of like my team now, that wouldn’t make sense. Now, there’s some companies that can do it
and have that, but I think, in the future, yes, you’d have that. I think a really quality hire right now is
someone who knows enough to ask the questions and integrate. Because again, you know, it’s hard to
build, it’s hard to integrate. So the first step, I would say,
is get someone who can communicate, and a lot of times the people that
I’m talking to in different companies are the people who can ask the right
questions, and then, know how to integrate it successfully in
the company. I mean I hear all the time like,
“We don’t want another platform that we have to log into.”
Okay, great, we can API it. I mean these are the kind of conversations
I think you want to get set up. So if I were doing it in stages,
I’d recommend first get someone, you know, if you’re hiring, who can integrate and
understand and ask the questions. And let some of the companies who are
spending, you know, comparatively lots of money now, because some of the technology is still new, do that work. You know, so I’d say the first thing,
if you figure out how you can use it, now you can decide whether you want to
build it internally. And I also say that to companies all the
time, well, fine, you know, use us to test it and see if it works. And then, if you want to go build it on
your own, then do. You know, but partly,
you just don’t want to…this is my opinion, you don’t want to make that
investment on trying to hire people, and then, being so reliant because you can
only afford a couple of data scientists on exactly what they say. Then you’re limited to only what they can
build and what you want to do is get broad groups of data scientists that you’re able
to integrate from. Would you agree?
I mean… – I agree, yeah. And I think that as…how many of you are
the most technical person at your publishing house?
How many of you… – Top one-person. – Yeah, how many of you are a part of a
very small subset of people at your publishing house who actually know
technology very well? So you are that person in some ways. You’re the person who needs to know about
this stuff because your knowledge and your ability to grok that is going to make a
difference to your publisher as to whether or not they should be doing some other
thing and whether they should adopt that technology, or bring in that person,
or whatever. So your comfort level with this,
your understanding of this is going to make a huge difference to how well your
company can adapt to it as it changes. – Yeah, there’s 100…maybe that’s too
many. Say, there’s 20 great solutions that maybe
you need, I wouldn’t want to then pick one and start building it and, 2 years later,
decide you don’t need it. So I would find ways to cherry-pick and
test and see what works for your company. – Yeah.
That was a really good question. And I’m getting like the signal
to wrap up. – I thought she was just dancing.
– Yeah, I thought she was dancing. Thank you guys. This was awesome,
I think we all learnt a lot. And thank you all for listening and for
your fantastic questions. It was awesome.
Great.
