Transcript

0:00 · Please welcome Jürgen Schmidhuber. [Applause]

0:16 · Three prisoners were sentenced to death: one of them French, one of them German, one of them American. "What is your last wish?" they ask the French guy. He says, "One exquisite bottle of exquisite French wine." "What's your last wish?" they ask the German guy. He says, "I want to give a

0:46 · speech." "What is your last wish?" they ask the American guy. He says, "I want to get shot before the German starts his speech."

1:04 · Unfortunately for you guys, it is too late now. Up there is the title of my talk, and this is my name, so this is how to pronounce my

1:24 · [Music] name. All the things I'm going to talk about are in many papers on web pages, and you can get overviews, more than you wish, if you go to these pages or Google for keywords. I'm affiliated with IDSIA, the Swiss AI lab in Lugano, and I also have a

1:48 · group at the Technical University of Munich, where there are a couple of pioneering robots. All my slides are designed according to recursive applications of the harmonic proportion, the golden ratio,

2:06 · and that makes them very compressible, and therefore it has a lot to do with my talk, because it's all about compression and compression progress. But before I come to that, let me just make a few general comments. Lots of people, especially at this conference, are talking about the future, but it's much harder to shape the future,

2:30 · and I'm going to show you a shaper. Here we see him; he was mentioned before: Marcus Hutter. A couple of years ago he did a postdoc in my lab in Switzerland, but now he's a professor in Canberra, and he came up with this algorithm that every computer scientist should know: a universal problem-solving algorithm, which takes any well-defined problem as an input

Universal Problem Solving Algorithm

2:56 · and computes the solution to this problem as quickly as the fastest unknown algorithm for solving that type of problem, one that provably solves that type of problem. So that's an amazing algorithm. The only slowdown there is a little bit of a slowdown, which is just a constant slowdown. So it's as fast as the fastest method, except for an additive constant that disappears as the

3:21 · problems get larger and larger. As, for example, you are trying to solve a traveling salesman problem and the number of cities gets larger and larger, this additive constant becomes negligible. And so we can say that today most problems are solved, because most large problems are solved, because almost all problems are large problems and there are just a few small problems. And the only reason why this is not the end of computer science is

3:53 · that within this universe there are lots of small problems, so small that the additive constant, which hides the constant effort required by a proof-searching technique, dominates them. In this universe these constants are still relevant because the problems are small. Now, can you do anything about these constants? Yes: you can use an old trick of Kurt Gödel, who

4:23 · in 1931 founded theoretical computer science. You can produce a self-referential Gödel machine; I will tell you something about that in a second. Here we see a picture of Gödel, and he, in

Self-Referential Gödel Machine

4:39 · 1931, used the integers as a universal programming language, and he used it to create formulas that talk about themselves, that are self-referential, and they say things such as "I am not provable by a computational theorem-proving procedure." In this way he showed that math is either fundamentally flawed or contains

5:04 · statements that are true but not provable. He was an Austrian; he did this work in Vienna, but later he came to the United States, just like another great Austrian thinker who then became governor of

5:27 · California. And we can use the self-reference trick of Gödel to build another type of optimal universal problem solver, which is totally self-referential in the sense that it can inspect any of its code and can rewrite any of its code, provided that it first can show

5:49 · through a proof that the rewrite is good according to some user-defined utility function, which can be anything: any problem, if it is well defined, can be written as a utility function. And so we have a self-

6:06 · referential machine. This is a rather novel application of Gödel's self-reference trick, which can be used to build a universal problem solver which again is theoretically optimal in a certain sense. If you really want to understand the details of that, you can download pages and papers on this, but here I

6:25 · just want to point out that we are currently entering, through results like that, into a phase where AI, artificial intelligence, is no longer just a collection of heuristics but is becoming a real formal science, like probability theory and established sciences like that. And that's good, because heuristics come and go but theorems are for eternity. And this reminds me that

6:54 · in just a few months we will have the next Artificial General Intelligence conference, in Lugano. The first two were here in the United States, and the deadline for submitting papers is the 15th of October, I believe. Now, the main message of my talk is that any

7:19 · understanding of intelligence would be incomplete without an understanding of things such as art and science and music and humor, and I will show that there's a very simple algorithmic principle that can explain these things. It's so simple

7:37 · that it fits on a single slide, in principle. So that is now the most important slide of my entire talk; it contains my take-home message for you. If you understand that slide, you understood everything. There are still people coming in from lunch; maybe I should wait a little bit until everybody's seated, but

8:01 · what should an unsupervised intelligent agent do, be it a human baby or an artificial agent? How should it deal with the data that is streaming in through the input sensors in response to the actions that it's executing? First of all, and this is a very trivial thing to do, in principle at least, you should store all the data that is coming in. You shouldn't throw away any of the data if you can, and it makes sense, because within a couple of years we will be able to store 100 years of lifetime at the

8:35 · resolution of a high-definition TV video, and maybe human brains also can store 100 years of human lifetime, at the rate of, I once made a rough calculation, something comparable to a low-resolution MPEG video. So in principle that is not a problem, but with that by itself you cannot do anything: you have to find regularities in this history of inputs

9:05 · and actions that you store. In other words, you have to compress it; you have to compress the history. Whenever there's a regularity, a symmetry or whatever, then you can write a program that needs fewer bits than the raw data and still encodes the entire

9:24 · data. So that's what compression is about. Now let's define the simplicity, or the subjective compressibility, or the subjective beauty of some data point x, given some subjective observer O at a given point in his life t: that is just the number of bits you need to encode the incoming data, the x, at this point in

9:49 · time with the given limited compression algorithm that you have. For example, most of you know a lot about human faces, and that's because you saw so many of these faces, and now you are carrying around with you some sort of prototype face which allows you to encode new

10:09 · faces in the visual field by just encoding the deviations from the prototype. So whenever a new face comes along and it looks very much like the prototype face, then you just need a few extra bits to store that new face, and your lazy brain likes that because it doesn't want to waste a lot of storage space. So the more the face looks like the

10:33 · prototype face, you could assume, the fewer bits you need to encode it, and the prettier, in a certain sense, you find it. This is just a word; we just count the bits that we need to store the new incoming data. For example, a face

10:49 · that is very regular doesn't need a lot of bits to be encoded. All right, now the important thing is not the compression by itself but the first derivative of the compressibility, because what's really going on is that as new data is coming in, your compression algorithm improves all the time and becomes a better predictor of the data. Whatever you can predict, you can compress, because

11:18 · you don't have to store extra what you already can predict. So prediction and compression are almost the same thing. And to the extent that your learning algorithm is improving the predictor, such that it becomes a better predictor on the data observed so far, you are saving bits; you can count this progress in bits that you are saving, and

11:42 · that's the only interesting thing, which signifies that there's a novel pattern in the input stream where you still have some learning progress. So what is the interestingness of some data x? Well, it's not the number of bits that you need to encode the data, but it's the first

12:00 · derivative: the change of the number of bits as your subjective learning algorithm, based on your subjective previous knowledge, is improving the compressibility. So you have to count the number of bits that you're saving. And once you have that in place, and you can formally nail it down and implement it on computers and robots, then you just need an additional learning algorithm, a reward-optimizing algorithm,

12:28 · which takes all these internal joy moments. Whenever you save a few bits, it means you have a novel pattern, and you count how novel it is by counting how many bits you saved, and that's an internal reward signal, an intrinsic motivation, an internal joy signal. And that's what you want to maximize for the future: you want your controller, which is directing your arms and your

12:54 · actuators, to move such that you get additional data from the environment where you can still get additional compression progress of this type, where your particular compression algorithm still can make this type of progress. And there are many reward-maximizing algorithms and reinforcement learning algorithms that in principle can do this. This is the basic principle, and I'm going to explain in the rest of my talk only how this explains art and science and

13:28 · whatever. So again, in discrete time, the formulation without derivatives, if you don't like that: the simplicity or compressibility or beauty, if you want, of the data is the number of bits you need to encode it, given what you already know about the data. But then the

13:46 · interestingness of the data is the change in the number of bits. So you get the data, you learn a little bit on it, which means you can now compress it a little bit better. So the raw data is like that, the compressed data is like that; then you improve the compressor a little bit, it learns something, it becomes a better neural network that predicts the data, and now it takes so many bits, and this is what you save. And that's your internal reward signal, because you have a novel pattern which you didn't know yet, and that's why you find it interesting. You can just take the difference between the number of bits you

14:19 · needed before and the number of bits that you need afterwards, and there you go: that's the reward signal. Let me give you a very simple example of a robot sitting in a dark room. The input doesn't change: it sits there and no matter what it does, it's always black, black, black. So it's extremely compressible input, but because it already can predict

14:44 · that very easily, because you can just predict by saying the next frame is exactly like the previous one, you can totally compress the input, and it's totally boring because there is no compression progress, because you don't see a pattern that you didn't already know. Now let me give you another extreme example, which is just the opposite. Suppose you're sitting in front of a screen with white noise, and there are all these black and white pixels coming at you with equal probability, conveying maximum traditional Shannon information or Boltzmann

15:17 · information, and still this stream of inputs is totally boring, again, because yes, it's very incompressible: you cannot find a short pattern and you cannot improve your current description of the signal, which again means that there is no compression progress. So this is also boring. The only thing that is interesting is stuff like certain types of music which you didn't know yet, but which is maybe a little bit similar to what you already knew about music, and

15:41 · there's a new little harmony in there which you haven't heard in just this way, and there you have a little pattern where you save a couple of bits, and that's what motivates you to listen to the same song once more. All right, so here we have again boring white noise, and no internal reward for things like that. So a discovery in physics, for example, is just a very large compression

16:07 · improvement. For example, suppose you have 1 million videos of falling apples, and they all fall in the same way. When you're good, you can extract the rule behind this behavior, and it turns out it's a very simple program that describes gravity, essentially, and it's always a very short

16:32 · program that you can use again and again for all these many different videos of falling apples to greatly compress these orange blobs that are falling down there. You cannot compress everything: there are random fluctuations and noise and whatever that you cannot compress, but there's a substantial aspect of the incoming data that you can compress, and there you can make a lot of compression progress, suddenly save a lot of bits. But the same is true also in the arts. Suppose there's a guy who

16:59 · figured out a way of drawing Obama with just five lines such that everybody says, hey, that's Obama. So you have an artist who somehow extracted the essence of the face, such that you have the same impression, as you're looking at the face, that you get when you are looking at a high

17:21 · resolution photograph with a million pixels. So somehow there was compression progress in the artist as he was trying many times to come up with a convincing caricature, and there is a similar thing happening in the observer when he sees that for the first time. So the scientists and the artists

17:39 · have something in common: they always try to make new data which is compressible in a new, previously unknown way. A novel pattern means: yes, it's compressible, but in a way that I didn't know yet, such that my compressor can make this learning progress and can save a couple of bits. You know, before I came here I thought, this is going to be just another Singularity Summit and probably there won't be much of an audience, but you are actually a large audience by my

18:11 · standards. The other day I gave a talk and there was just a single person in the audience, a young lady. I said, "Young lady, it's very embarrassing, but apparently today I'm going to give this talk just for you," and she said, "OK, but please hurry, I've got to clean up

18:39 · here." We had a whole bunch of different implementations of the principle that I just explained. We didn't start that yesterday: in 1990 the first systems of this type were implemented using very simple prediction machines, artificial neural networks. How many people know what an artificial neural network is? All right, OK. So you can use them to become better predictors and better compressors of the data history, and then we had additional systems like that, but I don't even have the time to go through that. I would like to mention,

19:09 · though, that recently two guys in California took the 1995 model and found that it explains eye movements of humans better than previous models. So why are you looking here and not there? Well, for some reason it's more interesting there than here. But what does it mean, more interesting? Well, because this action of looking there leads to a new input,

19:34 · which leads to some input pattern that has something interesting. Why is it interesting? Well, because there is something which you don't know exactly yet, and where you can still make a little bit of learning progress; you can save a few bits. That's what's driving all the curiosity but also the creativity of any intelligent system. Creativity and curiosity are just aspects of the same principle: trying to make new

20:00 · data, more data, which is compressible in previously unknown ways. And we are currently implementing these principles in a more complex system, on a humanoid like this, which is called the iCub, and there's a big European project where the goal is to implement these principles such that you have all these things again in place: you need a predictor or compressor of the data that improves over time, and all the new data that is coming in, it's getting more and more data, but you

20:32 · constantly try to compress it, and then you constantly try to find new patterns, better ways of compressing, and then you get the motivation for the controller that is selecting the actions, the eye movements, the dances, or whatever the robot executes, which lead to new patterns that aren't already in its repertoire. So direct exploration of the

20:51 · world through science, or just like a baby that is exploring what it can do with its fingers, etc.: I think it's all driven by this very simple basic principle here. Let me give you a little example, an application of this. I tried a thousand times to come up with a very simple geometric description of human faces, and usually everything you try, it

21:15 · really looks bad and stupid and awful, and suddenly something comes along which doesn't look so bad, you know, and then you see: oh, there really is a simple geometric scheme. Now this one here is a binary scheme which is based on certain very simple, well, coordinate systems, and you always take half of the original grid that you get there to define all the basic features of this face, like for example the slant of the eyebrows, the thickness of the eyebrows, the slant of

21:45 · the facial sides, and all these things. And it takes a long time to come up with any geometric scheme that makes sense at all; most of them just look horrible. Now of course I'm not the first guy who's trying to do that. Leonardo da Vinci himself also tried to come up with proportion studies where you define the basic facial features through mathematical rules, but

22:10 · not to the same extent, not in the sense that you really describe all the slopes and all the details of these facial structures. Now this face, of course, is interesting only as long as the regularity in it, the symmetries, etc., are not yet known, but then it becomes boring, like all data that you see over time. It becomes boring because once you understand what's going on there and you see and incorporate the

22:42 · regularity and the short program that describes it, it gets boring and you want to see something else. Let me give you another example of very compactly encoding images.
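[Editor's sketch] The principle described up to this point, that the interesting part of the data is the number of bits saved as the compressor improves and that a pattern becomes boring once it has been absorbed, can be tried out in a few lines of Python. This is only a minimal sketch under stated assumptions, not the systems mentioned in the talk: the "compressor" is a hypothetical toy model that predicts each bit from the previous one (the talk's implementations used neural networks), and the names `OrderOnePredictor`, `compression_progress` and `rewards_over_time` are made up for this illustration.

```python
import math
import random


class OrderOnePredictor:
    """Toy adaptive compressor (an assumption of this sketch): predicts each bit from the bit before it."""

    def __init__(self):
        # counts[prev][nxt] starts at 1 (Laplace smoothing), so every bit costs 1 bit at first.
        self.counts = [[1, 1], [1, 1]]

    def code_length(self, bits):
        """Bits needed to encode the sequence (first symbol given) under the current model."""
        total = 0.0
        for prev, nxt in zip(bits, bits[1:]):
            p = self.counts[prev][nxt] / sum(self.counts[prev])
            total -= math.log2(p)
        return total

    def learn(self, bits):
        """Improve the predictor by counting the observed transitions."""
        for prev, nxt in zip(bits, bits[1:]):
            self.counts[prev][nxt] += 1


def compression_progress(model, bits):
    """Intrinsic reward: bits needed before learning minus bits needed after."""
    before = model.code_length(bits)
    model.learn(bits)
    after = model.code_length(bits)
    return before - after


def rewards_over_time(next_chunk, rounds=5):
    """Feed a fresh model several chunks of one stream and record the reward per chunk."""
    model = OrderOnePredictor()
    return [compression_progress(model, next_chunk()) for _ in range(rounds)]


N = 200
rng = random.Random(0)  # fixed seed so the run is repeatable

streams = {
    "dark room": lambda: [0] * N,                                  # always black
    "white noise": lambda: [rng.randint(0, 1) for _ in range(N)],  # no regularity to find
    "simple pattern": lambda: [i % 2 for i in range(N)],           # learnable regularity
}

results = {name: rewards_over_time(chunk) for name, chunk in streams.items()}
for name, rewards in results.items():
    print(f"{name:>14}: " + ", ".join(f"{r:7.2f}" for r in rewards))
```

In this toy setting the learner starts out knowing nothing, so both the dark room and the simple pattern pay one burst of reward on the first chunk and almost nothing afterwards, which is the "it gets boring" effect. White noise pays close to nothing from the start, because there is no regularity for the predictor to pick up.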

22:58 · Let me start with a bunch of large circles, which look very much like what many kids draw with a compass when they're young, and then we add four times as many circles with half the size, and then we add 16 times as many circles

23:14 · with 1/4 of the size, and so on. So you see it's getting more and more circles, and now the rule is: I can create new drawings, but only by using arcs that are on these legal circles that I get there. There are a few large circles and many small circles. Now I can encode drawings like this one here, for example, by enumerating all

23:37 · the circles, for example giving each of them a little number: the large ones get small numbers, the small ones get larger and larger numbers, and then you can very compactly encode drawings like this one here. Again, most of the things you try don't look like anything, but if you try hard, then in

23:57 · the end you find something that is acceptable, and then you've got a low-complexity artwork like this one. Now here I'm removing all the green circles, which I'm not using, and leave only the red circles, which are the only ones that I need to specify the details of the drawing, and finally I get a very simple drawing that can be encoded by very few bits of information, just a few lines of code, much more

24:23 · compressible than, for example, a JPEG encoding of the same image or a GIF encoding. And that would be then an example of low-complexity art, which I defined in 1997 in a journal which is dedicated to such things; Leonardo is the journal. And let me give you another example, again the same scheme, which can be used to very compactly

24:47 · encode drawings that follow these basic rules, and here we see a self-similar Femme Fractale. So I'm removing all the circles that I'm not using, and then only the green circles are left, which are the only ones I need to specify the details of this drawing, and then finally we have

25:08 · again an image, a low-complexity artwork if you want, that can be encoded by very, very few bits of information. What is next in artificial creativity and artificial curiosity? Well, our first implementations used these feedforward networks, which are just devices that learn to map an input to an output and over time become better mappings between inputs and outputs, but now we are much

What Is Next in Artificial Creativity

25:38 · more focused on these recurrent neural networks. How many people in this room know what that might be? So that's a mathematical model which is very much like, well, it seems a lot like

25:53 · what your brain is doing. It has feedback connections that allow it to implement arbitrary programs, mapping arbitrary input sequences to arbitrary output sequences, so it's a general computer. And it turns out that there are ways of training the weights, the weight matrix of these recurrent networks, which implements the program, such that the whole thing becomes a better description of the data that you want to model with it. And just to give you a

26:20 · few examples of what you can do with these things: this is a picture of Alex Graves, my postdoc, who used to be a PhD student at IDSIA but is now a postdoc in [inaudible], and currently the best connected handwriting recognition systems are using recurrent networks like this. So there are particularly useful variants of recurrent networks that we developed at IDSIA, and they are currently the state of the art in connected handwriting recognition, which is much harder than isolated digit

26:52 · recognition. And what else can you do with that? You can predict time series like this. So here you have a time series, and it looks a little bit like a stock market chart, maybe, and you give that to your investment adviser and he's supposed to predict the future of that, but mine is completely unable to do that. But then our networks can pick up the

Predict Time Series

27:14 · regularity, there is a regularity in this curve here, and very exactly predict the future without seeing it. So that would be another application. Or you have robots that have tiny little actuators, and they are designed to work in surgery settings, and you have a surgeon that shows them how to tie a knot in very confined cavities, which is hard, but this is then the first robot that learned to tie knots

27:46 · like that, using recurrent neural networks like the ones that we developed. So once you've got these compression machines, these recurrent networks, which are just supervised compressors of the data history, and they become better and better at predicting the data and therefore also can become better and better at compressing it, then you still have to look at what happens to these reward signals that you get as you are getting better compression, and then these reward signals go to the

28:13 · reward optimizer, to the reinforcement learning algorithm. And what is a good choice? Well, we often use certain novel techniques for coevolution. Let me see whether I can get the cursor going there. Oh, there we are. So up here we see a little robot

Techniques for Coevolution

28:36 · which has learned to balance a pole. Maybe you see that it has three wheels. It's controlled again by a recurrent network, but now there's no teacher that tells the robot what to do. In the beginning the pole fell down, but then after some time it maximized trial length and became better and better. But then it was starting to run against the wall all the time, and finally it learned to use its sensors, without a teacher, to go away from the wall whenever it's getting too close. And down here, let me see,

29:04 · down here we see the same robot but now with two poles on top of each other, so here there's a joint in between, and this robot after 4,000 trials has learned to balance another pole on top of the first one. Here you see that it's not perfect yet; I showed you something which isn't perfect yet just to show you that there's really two poles on top of each other. 1,000 iterations more and it knows how to do that. Or here there is an object which

29:31 · is moving through water and it's generating turbulence. Now what you want to do is place little tiny actuators on top of this object such that they create little tiny anti-turbulences which kill the vortices, such that you can reduce the drag of the object. The goal is to reduce the drag.

29:50 · And let me show what happens if I run that here. OK, so here you see a supercomputer simulation of the vortices in the back, and down here you see what happens after learning. So first we see what happens before the learned behavior is switched on, and suddenly you will see that these vortices get really small, because the system has learned to generate anti-turbulence, again through one of these reward-optimizing algorithms, and the drag goes

30:20 · down to 40% of the original value, which was an excellent result a couple of years ago. So that's what you can do. How many more minutes do I have? I still have 10 more minutes? OK. So I have only two more slides to go, maybe three. I want to put all of this now into this historic context, which is the main topic of this conference. Here we

30:46 · see the beginning of the computer age: 1623, Wilhelm Schickard. He built the first computer. You couldn't program it, but you could do the basic arithmetic operations already, and it didn't have transistors but it had gears, but

31:02 · nevertheless you could compute with this thing automatically. And then for a long time nothing happened, but then Babbage came along in England two centuries later. In the 1830s he came up with this idea of having a program-controlled computer, and this is considered another major milestone in the history of computing. His machines did not work; even 20th-century replicas of these program-controlled computers didn't work, but at least the idea was there. And then we had to wait only 100 more years, one century

31:33 · later, and modern computer science started, because this guy here, Konrad Zuse, in 1941 completed the first computer that really was working, a program-controlled working computer. And in the decade before that, the foundations of theoretical computer science were created through Gödel, whom we already know, and Turing then took the results of Gödel and rephrased them in

32:00 · the context of the famous Turing machines, which are still widely used in theoretical computer science. And, you know, just a few years earlier, this guy, Julius Lilienfeld, can we see that, Julius Lilienfeld, he in the 1920s

32:17 · [inaudible], and it happened within, you know, 10 years or something. Another Austro-Hungarian, by the way, who also later immigrated to the United States. And then lots of things happened, but the main breakthrough after that, or the thing that again changed everybody's life, or lots of lives at least, was the

32:38 · World Wide Web, exactly half a century later. Now you see the pattern. So Tim Berners-Lee in Switzerland came up with the World Wide Web, and within a few years it affected the lives of billions. Now you see the pattern between the major milestones: two centuries, then one century to the next major one, now one half century, and you can generate a logarithmic plot and you can easily fit a straight

33:16 · line. Now it's going to converge, obviously, because the next thing is going to happen not one half century but one quarter of a century later, in 2015, when many people claim that computers will be as fast as human brains for the first time, and then we will have an infinitude of additional milestones, and they are going to converge in 2040, which is the Omega

33:39 · Point, which I like better than the Singularity. I like Omega much better than Singularity because that's what Teilhard de Chardin called it in 1900 when he said [inaudible] will reach its next level. Also I like it because Omega sounds so much like "Oh my

34:08 · God." There's only one problem with lists like that, which is that actually Omega will not happen in 2040; it should have happened already in 1540. I can prove it to you. At the turn of the millennium, Life magazine made a list of the most important inventions or most important events of the millennium, and number one was

34:38 · the invention of the printing press, 1444, by Gutenberg, and then the second most important event in the history of the past 1,000 years according to Time Life magazine was the discovery of America through Columbus. Of course many people know that

34:56 · other people before Columbus, like Leif Erikson and probably others, also discovered America, but Columbus did not become famous because he was the first to discover America but because he was the last to discover

35:16 · America. And now let's look at number three in the Time Life list of most important events, and that was the Reformation, according to Time Life magazine, because it was the only major religious movement of the past 1,000 years. And you see there's a pattern: 96 years, and then between these two events there are exactly 48 years, and then half of that, 24 years. You know, it's about to converge in 1540, you

The Reformation

35:49 · know. And so of course there never has been a shortage of prophets claiming that the end is near, and maybe all of this is just a byproduct of the way we are allocating internal memory space to events in our

36:06 · past. Because look at yourself: most of your mind is occupied with things that happened recently, yesterday, a week ago; not so much a year ago, not so much 10 or 20 years ago, etc. And the same is also true, of course, for history books of societies. Entire societies write a lot of history books, but most of the history books are about the most recent events, and not so many, still a lot but not so many, are about, say, the Second World War and, you know, things 200 years ago, and

36:38 · the further you go back in time, the fewer history books you will have. And of course this is just a natural thing: you allocate less and less storage space to events that are deeper and deeper in the past. So from

36:54 · that point of view, at whichever point in time you live, it may look to you that the most important things are accelerating exponentially. You know, it doesn't really matter when you live. So, yes, I really have to go. OK, all right. Think about that. Thank you very much for your attention.
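[Editor's sketch] The arithmetic behind both "convergence" dates in the talk is a geometric series: if each gap between milestones is half the previous one, the remaining gaps add up to exactly the last gap. The short Python sketch below just replays that arithmetic. The function names are made up for this illustration, and the starting year 1990 for the World Wide Web is an assumption inferred from the talk's "one quarter of a century later, in 2015"; the other figures (1492, a last gap of 48 years, 2040, 1540) are the ones quoted in the talk.

```python
def next_milestones(last_year, last_gap, count):
    """Project milestones forward when each gap is half the previous one."""
    years, year, gap = [], last_year, last_gap
    for _ in range(count):
        gap /= 2
        year += gap
        years.append(year)
    return years


def convergence_year(last_year, last_gap):
    """Limit of the projection: gap/2 + gap/4 + gap/8 + ... sums to gap."""
    return last_year + last_gap


# (label, last milestone, gap that preceded it); 1990 is an assumed date, see above.
for label, year, gap in [("computing", 1990, 50), ("millennium list", 1492, 48)]:
    print(label, next_milestones(year, gap, 4), "->", convergence_year(year, gap))
```

With the computing list the projection runs through 2015 toward 2040; with the millennium list it runs through 1516 toward 1540. The same halving rule gives a different "end point" depending only on where the list starts, which is the point the talk makes.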

37:31 · Are you saying there's no time for questions? No? Oh, but Ray Kurzweil said he is going to compensate by speaking faster. All right, so