
Transcript
0:00 · All right. So, I’ll stop blabbing. So, now uh we’re going to go to Hector.
0:07 · Dr. Hector Zenel has held multiple academic positions including Oxford, Cambridge, and King’s College London. He is also the founder and chief visionary officer of immune algorithmics, an AI diagnostic spinout from these institutions. He previously led a research laboratory at the Karolinska Institute, home of the Nobel Prize in Physiology or Medicine and served as a senior scientific adviser in artificial intelligence at the Alan Turing Institute.
0:36 · His work spans contributions to the original Wolfram alpha team, collaborations with NASA on microgravity research and editorial roles in journals such as Frontiers in AI and theoretical Computer Science. He is the editor-inchief of the first journal in the field of complex systems founded by Steven Wolram and the originator of algorithmic information dynamics, a framework for studying causality in dynamical systems that combines algorithmic information theory, perturation analysis, and statistical causal inference. [applause] Everybody say hi to Hector.
1:19 · [applause] Thank you.
1:38 · Thanks Martin for that kind introduction. Yeah, by the end I’m hoping to talk a little bit about algorithmic information dynamics and that’s precisely one of the main or one of the misconceptions I want to tackle precisely the applications of comor of complexity and first of all I want to thank everyone for coming and for the invitation to Olaf Alisa and the great teams that have organized this wonderful conference and I’m so glad to
2:04 · see so many familiar faces old friends new friends and colleagues So, let me show you a couple of pictures that I took in the north of Scotland and the north of North Island. On your right is what is called a fairy ring. I don’t know if you’ve seen this before. I was surprised about it because I hadn’t seen any before. I didn’t even know what was it. I was completely surprised walking around and finding almost like a perfect circle on the floor.
2:31 · Well, it turns out that these are mushrooms that start growing from the center and they exhaust the resources and start growing towards the periphery and they build these kind of circles. And this one is not even a nice one because if you Google it, you are going to find some much better ones.
2:48 · And then on your left, what you have is basel columns that are very common in some places. These ones are from the Gayan causeway in the north of Ireland.
2:58 · I know there’s in Mexico, there’s in Iceland, there’s in a few other places and these are basically generated by hot lava getting into contact with cold source of water. Can be the sea, a river or a waterfall. And for me, these two images basically illustrate what we aim in science. So we are the observer. We are looking at some patterns in the world and trying to make sense of it and basically try to explain what is the mechanistic cause behind them.
3:24 · And in these two cases, these are two completely different causes that I just explained you. One has to do with living systems and the other one is completely a physical process. Yet both of them are producing patterns. And our purpose um as researchers in complexity science or artificial life is usually to try to understand make sense and find the mechanistic cause for all these.
3:46 · So I think there’s a lot of overlapping between these three areas especially complex the complexity community and artificial life because I think we have come with methods and metrics to try to understand these patterns and also some overlapping with computability in the sense of for example things like open editness or um self-organization all these uh concepts that are related to computation and let me share with you where I think
4:14 · complexity comes from because you know the typical ical explanation for complexity is that we have a lot of particles interacting with each other and suddenly things emerge, complexity emerges. But I actually think that it comes from computation because you can see in rule 110 that um most of you may
4:32 · be familiar with because it is very popular in the artificial life community and um this is a elementary cellular automaton introduced by Stephen Wallframe and basically it’s a computer visual computer program that runs from top to bottom from an initial condition and has been proven to be during universal and it illustrates with how little you can reach during universality with just a few bits.
4:54 · Basically, you start running this and you get Turing universality and that basically means that you also get unbounded complexity because T universality basically means that you can run any other computable function does not it doesn’t matter how complex it may look like and you can see also this is a nice very beautiful paper that we wrote some years ago. We were trying to look for what we called prime rules.
5:18 · So what is the minimum set for example in elementary cellular automatter that when you combine them gives you the full space and turns out that it is very rich and with only two three rules composing them you can get any other rule in the same space but even more um surprising was this result this is another paper where we basically showed that you can basically break these barriers between uh Wra’s classes
5:44 · so you remember Stephen Wra came with these four behavioral qualitative classes in which he categorized different cellular automata behavior by how the outcome or the output looks like in four classes and what we did was to explore what we call the space of possible compilers. So this space is another space of computer programs they are fixed size and actually is the way you you prove t universality.
6:11 · If you want to prove that a system is tun universal, what you do is to basically uh design some sort of translator that is can be also called a compiler to show that anything that a tur universal system computes you can also compute it and and the other way around. Well, the other way around is trivial. But it turns out that we found that for uh pretty much every cellular automaton, we could find a compiler that basically sits at the top of the original computer program.
6:38 · It is of fixed size. it is not really carrying out the computation. It’s only just translating the input into something else. You can basically emulate any other possible cellular automaton no matter in which class it may be. And you can see that we didn’t complete the picture for elementary cellular automaton. So at the top row you are seeing which ECA is emulating what which is the second row.
7:05 · And you can see for example that we have class 2 cellular automaton like rule 94 emulating rule 90 with a compiler. So that’s kind of surprising because basically it’s telling that there’s not there’s nothing fundamentally different between these classes. Any of these computer programs can really behave like any other. There’s some trade-offs by the way because it is not the same to try to find a compiler for one class one and for class two. So I’m not saying that we are breaking all the um barriers but in some ontological fashion we are.
7:37 · So it is more epistemological because you have to find these compilers. But immediately when you go to general cellular automata which is basically expanding the neighborhood of the local rule you start seeing this more and more. And by the way it doesn’t mean that we couldn’t find a compiler that makes a class one emulate a class three or four. It only means that we didn’t go far enough in the space of compilers to find the compiler to emulate it. It was slightly easier when going to the larger general cellular automatic space.
8:07 · And you can see that we found actually very simple computer programs that are able to emulate very complex ones with the appropriate compiler.
8:20 · In other words, this is kind of an very strong empirical evidence of pervasive turing universality. Basically any of these very simple computer programs are able of doing universality because each of them can emulate every other cellular automaton in the same space and that actually has a name. It is called intrinsic universality and has been proven to be a stronger kind of universality than Turin universality. In other words, it implies Turin universality.
8:47 · So I think this is a very fundamental paper that um but potentially deserves um more attention because it is very cool. So when seeing how easy it is to generate complexity all these potential complexity in very simple systems one question is how many measures of complexity can you count them there are many I wouldn’t know exactly what number but the question will be is the problem of understanding the world a problem of lacking metrics or methods or is it more
9:17 · about understanding the ones that we already have and understanding the landscape of complexity metrics and how they relate relate relate to each other because I think it is easier to basically come up with a new metric that really understand what the other metrics do or are. And um we have a couple of papers in this topic not as much as we would like and one of the things as as a community I think we should do is to understand the landscape of complexity metrics is that most if not all of them are related to each other.
9:46 · And this is kind of an oversimplification but we have two papers. One for example connecting you see the paper at the bottom a fractile dimension with time complexity and we have also a few papers um exploring the connection between shannon entropy with cologor complexity which is not trivial. I’m not going to go into the details of this because it is not the purpose of the talk.
10:10 · But one way to characterize these complexity measures and this is again oversimpl simplifying is that every one of these metrics is basically trying to characterize a feature of a a pattern in the world. The underlying question behind for example classical probability one way to put it is how likely an event or an object is in channel entropy.
10:29 · One way to see it is how diverse an object is if you don’t have access to the probability distribution. uh in colmormorph complexity the traditional explanation for algorithmic complexity is that it characterizes how random an object or an event may be and so on.
10:44 · So we think that all these metrics are basically capturing some fundamental property of the patterns that we see on in the world and that’s kind of the the hope but I’m going to argue that actually not all patterns are equal and we don’t understand very well where to apply many of these uh complexity measures to what objects. Here are some examples of um sequences. The first one is the simplest one is basically just the alternation of zeros and ones.
11:10 · And I’m going to call it statistical because basically the sequence is giving away its simplistic origin nature. Basically just by looking at the sequence if you’re an observer looking one digit by digit come to you model that as a random variable you immediately can make or come up with computable hypothesis behind that sequence and say oh it’s likely coming from for example that kind of computer program. So in some way uh the sequence is giving up some clues of its mechanistic origin.
11:41 · But turns out actually that the most interesting sequences and many sequences are of a different nature. I’m going to call them non-statistical because they don’t give a wave so easily their mechanistic origin. And the second one some of you may be familiar with it is a two more sequence and you start with a zero and you binary negate the first segment. So uh the next digit is one and then the new one is a 0 one. So you negate again.
12:11 · So 0 1 1 0 will be the the next step and so on. I just described for you an infinite sequence a [snorts] very short fashion. This um sequence has been proven to be boral normal which in other words also means that it is maximal shon entropy rate. In other words, you don’t have any over representation of any subsequence compared to any other sub sequence in this in the whole sequence.
12:38 · There’s no statistical regularity. So any statistical metric that you apply on this sequence if you don’t know that is deterministic because that’s the whole point. You could you could ask well we we already know that is a tumor sequence but as an observer you don’t I’m modeling this as a random variable and digits are are coming to you.
12:56 · So it will be very hard to tell anything about this sequence if you don’t know their mechanistic origin. We have here another sequence which is the concatenation of the natural numbers.
13:07 · This is another number constant. It is called the cheronone constant that is completely deterministic but has also no statistical regularities is also boral normal. It only makes sense to us because we have a convention on how to how we name natural numbers. But otherwise there there are no statistical regularities in this number. So again I’m going to call this sequence non-statistical because it is not giving away its simplistic origin.
13:31 · And lastly there’s a a a slightly more complicated sequence because we don’t have that convention. It is not the successor function. It is a segment of pi which as you may remember from the BBP formula basically you can calculate any segment of pi without having to calculate the first digits and for all purposes it looks random.
13:54 · I think it hasn’t been proven to be bor normal but if you plot the distribution of uh numbers and it doesn’t matter the subsequences there’s no statistical regularities in this sequence either. So what all these sequences have in common is that they come from a mechanistic causal origin. But some of [clears throat] them are more easy to find out than others.
14:15 · And actually we have a nice paper that shows that human minds exploit these non-statistical regularities. And one way to see it is for example when you learn a telephone number if that number doesn’t have any regularity other than for example it is just the o or even numbers in order you don’t have to remember the whole telephone number. you only remember that is the first first segment of prime numbers or whatever it is.
14:38 · So it is not statistical it’s really algorithmic and I think this illustrates also this tension in science there are two kind of schools one that um uses a lot of statistical models and this is one of my favorite examples from textbooks. This is a distribution of times between cars on a road and they turn out to distribute in a poson distribution with this parameter which basically means that cars come in batches and it almost looks like magic when you see it the first time because the statistical description is not giving you the cost.
15:12 · It’s just telling you how the times between cars distribute and you would think at the beginning perhaps that everybody’s kind of agreeing at what time to hit the road but obviously that doesn’t work. Um I mean in to some extent that’s true because we kind of have similar working hours but it doesn’t explain that all cars come like in these patches and the explanation is very simple. There are always like slow drivers holding back faster ones and that’s why this phenomenon emerges and actually you can destroy the pattern just by adding new lanes.
15:41 · So you can make interventions and then you destroy the pattern and you can reconstruct it by adding lanes and then adding cars and then you have slow drivers also holding back other behind. But you can see that the whole interpretation and the cost that I just given you comes from me. It didn’t come from the metric from the distribution. I made the whole interpretation afterwards. So that’s what I call postto interpretability.
16:08 · On the other hand we have modeldriven um science. So we have a model that basically is telling you a state by state and corresponds to the observation to the states of the observation and you can tell whether it corresponds or not and update it and it doesn’t matter that the model is wrong. It is already giving you an idea what may be the mechanistic cause and mechanistic means exactly that that you can follow the process step by step. Here there’s nothing to follow. It is a descriptive uh thing right.
16:38 · So one of the first things that we did for example was to try to explore how statistical metrics in particular channel entropy may be objective or subjective um or dependent on you describe things and we came up with this very nice example of a graph that we created on our own. The graph is constructed in the following way. you start with node uh one with label one and then you keep adding nodes so that the number of edges corresponds to the number of um edges that each node has.
17:10 · So if you look at for example step two, node number one has one edge. Node number two has two edges and node number three doesn’t have three edges yet and that’s why it is in blue color but in the next step basically you complete it and you keep completing this number of edges so that they coincide and this was completely on purpose because the degree sequence which is a way to describe this graph and allows you to reconstruct the same graph looks like maximal entropy on purpose because this is again the cheronone constant which is bor normal
17:43 · it is maximal channel entropy it and it’s a completely legitimate way to describe this graph. The other way to describe it is with a genens matrix and you have the two descriptions on that side and the des the genens matrix basically looks like all the diagonal with the ones obviously is low shown entropy. So you have two ways in which you can describe exactly the same deterministic graph that has divergent shon entropy values and and again you could you could say but this doesn’t make sense. What is the Shannon entropy of something deterministic is is zero.
18:16 · We already know it, right? But if you you knew it, you it doesn’t make sense to apply Shannon entropy in the first place. What we do is to apply these tools when we don’t know anything about the object. We don’t have access to the probability distribution. We are an observer seeing an object unfolding over time. And if you are seeing this um object unfolding over time and you representing in two different ways, you get two completely different channel entropy values.
18:44 · So um what we have is this situation in which you have the observer and a metric could be either the example with the times between the cars or could be the graph that I showed you before where basically the borden or the weight of the interpretation is on the observer side and not the metric. the metric is not really very useful because it is highly observer or description uh dependent and ideally we should have
19:10 · like a more balanced uh situation I mean in the ideal world the metric should be doing all of it and finding the models but if we don’t have that ideal case at least there should be some balance where you are looking at some process unfolding and basically you can have access to the generative mechanism and this is what usually happens with an observer that it can be misled.
19:32 · So either this is again another elementary cellular automaton just for illustration purposes and if you’re an observer you could be looking at different places of this system unfolding and you could be misled into thinking that it is completely random for example if you are looking at the right or that it is not random. It is very simple if you are looking at the left and an ideal observer should have access to the source code.
19:55 · So this is very much along the lines of the theme of the conference because we are trying to decipher this system that we are seeing as an experimentter or observer and what I’ve been proposing for the last 10 15 years if not more is that moor of complexity usually seen as a measure of randomness but it’s actually a method of causality a theory of causality and that’s because the theory helps you basically filter out the random cases.
20:26 · And then you are left with causality and triviality and with statistical metrics you can also identify triviality. So what you are left with is basically the middle and this is because more of complexity is um suggesting that you have to find the computable model that explains something and you don’t find it that’s when something is random but basically the metric itself is telling you the purpose of uh trying to find that computable object and I’m going to describe it here. So this is for those not familiar with colmograph complexity this is the official definition.
20:56 · So the so the compograph complexity of an object S is basically the shortest computer program P that running on a universal constructor basically gives you back the original object in this case S and notice for example the first thing is that you need a universal constructor but it doesn’t have to be a universal Ting machine it can be any universal system there doesn’t have to be a head or a tape that basically you
21:27 · discount it this metric is uncomputable as Martin Martin was saying because you are requiring uh the system to find actually the shortest computer program but as as soon as you relax the condition of not finding minimality the metric becomes computable because you can always keep computing to find a shorter um model.
21:46 · So for the founders of the theory it was very useful to have this requirement of minimality to prove all sort of mathematical theorems but in practice is not really very practical but I think embodies what we do in science. In science we have an observation we want to come up with a mechanistic model or computable model which I’m using as synonyms because I think they are it doesn’t have to be the smallest one. So what we do in science is actually to try to find perhaps the smallest one.
22:17 · But we we don’t require to only find the smallest one. We always keep like looking for another one.
22:23 · Smallest smallest also in terms of how much you can explain with your model or your theory. And the diagram on the right is only to show that uh this definition is not only relevant to computer programs and binary strings. So this the space of computer programs basically includes all the mathematical functions that you can think of including ordinary differential equations uh partial differential equations uh recursive functions and so on.
22:51 · So even in the example I put before for example in this one this is obviously included in the space of computer programs that doesn’t mean that it has to halt or not and that’s another question and and this this should be even more obvious because basically 100% of what we do in science goes through a digital computer so it is all numerical solutions to these kind of equations and if you use the argument that cologor
23:16 · complexity cannot be applied because it is uncomputable we wouldn’t be doing basically old modern mathematics today there are very few things that are actually decidable and computable and those include propositional logic ukidian geometry presser arithmetic cartesian geometry above that anything
23:35 · else is actually not computable and that doesn’t mean that we don’t use it or we discard it or we don’t approximate or deal with these theories and actually it’s very easy to see how something can be not computable in mathematics I think it has been proven that this set of equations is the smallest one describing the Loren attractor. But in general, if you ask these questions in mathematics, that is immediately uncomputable. What is the minimum set of differential equations describing a dynamical system?
24:04 · So it is very easy and quickly to find uncomputability and undecidability in modern mathematics and that doesn’t mean much. Imagine that we would say starting a paper saying I’m not going to use calculus or theory of analysis or or measure theory in my opinion. Starting move to the mechanistic model side of things try to explain things.
25:06 · So before before we had the next presentation we thought it could be uh divine intervention and actually the planets have names of gods or it could be magic but basically science has been moving all these processes towards the other side and I think this illustrates very
25:26 · much the spirit of cooraph complexity of trying to move things that appear to be random but actually keep finding computable models shorter than the thing that you want to describe. and move them to the other side and colmore of complexity has um motivated or inspired a lot of research in the last 20 25 years. One of of the most perhaps landmark papers is this one from Lee Celibbrasium vitan. Actually it’s not Lee in that paper.
25:56 · So it is only celibbras and vitan showing that with statistical compression algorithms you can basically reconstruct for example a phot genetic tree that corresponds to what we know in evolutionary theory just by compressing mitochondrial uh DNA sequences.
26:15 · So that’s that’s kind of surprising but actually we are also kind of critical of using um popular statistical bloodless compression algorithms because they are more related to Shannon entropy to really call more of complexity. The only thing that makes them related to colog of complexity is by way of Shannon entropy and they are not adding much more when you use popular statistical compression algorithms like LCW or uh GCP and so on. ourselves.
26:38 · We apply this for example to things like um organic versus non-organic molecules in this paper I think in 2018 and we were able to show that we could separate these two groups quite well with a threshold between I think we did it on over 50,000 molecular compounds or something like that. We did it both with compression but also our own method that combines compression with something else that I’m going to explain.
27:06 · We thought this was worth reporting but we never thought that this would be like u more than that or explain evolution or selection or anything like that because there has been some suggestion that suggestions that separating these groups what we believe are compression algorithms give you an insight into selection or evolution theory. But we’ve been applying all this in u many applications including a problem that is called nucleosome positioning which is the second most important problem in molecular biology after protein folding.
27:39 · It’s about finding these nucleosomes. Some these are like packages of uh DNA that give you an idea of what genes encode and and we always compare our metrics with the best gold standard in that field. uh so in this case for example I think is the Kaplan model with which uh they were using nucleon positioning so when we use a a new metric we compare to the metric that was
28:07 · used before so the question is obviously we can do better than statistics or um lossless compression like LZW unfortunately we can it is very expensive but I’m going to tell you how we have solved part partially solved this uh problem of resources
28:25 · And the alternative is called algorithmic probability that was introduced by Ray Solomonov and you see a picture of him in this famous Darmmouth college workshop on AI where basically the AI term was used for the first time and basically there’s an agreement that artificial intelligence could be traced back to this particular moment and Rey is uh together with other people like Marvin Minsky um McCarthy and even Shannon.
28:54 · So this was a very important meeting and it has been acknowledged that basically Ray Solomonov gave a solution for artificial intelligence and optim optimal inference and I’m going to rely on what Marvin Minsky actually says about it. So you don’t have to trust me. Uh but let me explain you very briefly how beautiful this theory is. So you may remember this idea of the infinite monkey theorem. uh that was introduced by Emil Borrell.
29:22 · The idea is that you have a monkey typing on a typewriter randomly. So basically the monkeys are source of randomness. You don’t really need a monkey. But but the experiment is to ask questions like um how long it takes to write the works of Shakespeare using a source of randomness. And if you actually do the math, it turns out that just getting the first paragraph of Hamlet, for example, would take longer than the edge of the universe.
29:51 · But the interesting thing happens when you replace the typewriter by a laptop. So now every keystroke instead of being a letter of Hamlet is a piece of computer program. And this works very nicely with this other object. So this is a mathematical constant pi. Let’s ask how long it would take the monkey to get the two the first 200 digits of pi.
30:14 · Right? It will be 1 / 10 which is the base of pi to the 200. Right? So, uh it’s exponentially more difficult the more digits you want the monkey to get right and this is an infinite number.
30:28 · But when you replace the typewriter with a laptop now the monkey has to write a computer program that produces pi not pi itself. And pi because it’s very short that has a very short description or short formula actually many. So that’s because it is low colograph complexity then it has a very high probability to be produced by a random computer program.
30:50 · So you see how things change dramatically when you change something so easily and this is a beautiful theory because I just connected complexity theory to probability and that’s why it is called algorithmic probability and this that’s the the relationship between probability and complexity is given by that coding theorem and don’t take my
31:10 · word for it this is what Marin Minsky said about algorithm probability a few years before dying it seems to me that the most important discovery since gel was the discover discovery by chitin solomonov and co mugar of the concept called algorithmic probability which is a fundamental new theory I wouldn’t call it new theory but maybe for him it was of how to make predictions given a collection of experiences everybody should learn all about that and spend the rest of their lives working on it so
31:41 · basically he was describing my life because I’ve been devoted to this for the last 15 20 years and it looked almost like he was um regret threatening not to spend his own life in this. He had an amazing life and he’s one of my heroes especially his book on uh how how is it called infinite computation or something like that. It is a green book but he gave this quotation basically a couple of years before passing away a few years ago. So what we came up with is was this um method and formula.
32:11 · It is a greedy algorithm because basically decomposes a larger problem into smaller pieces. And we call it the coding theorem method because it uses this relationship between culmer of complexity and probability. And then block de composition method because we are doing exactly what that illustration is uh
32:31 · showing which is once you have like a large object you decompose it into smaller pieces because the idea is to find a computable model for the large piece but it is so large that the space of all possible computable hypothesis is very large. So you decompose it and basically you try to explain each of those pieces separately and then you put everything together and basically the concatenation of the computer programs give you an idea of how you can construct the original larger object.
33:01 · Obviously it may look like we are doing something quite arbitrary because maybe we find only like one program but it is not the case. So actually what we do is to find a lot of computer programs. But before getting there, let me describe a little bit this formula because it has a very interesting component. The worst case scenario for this formula to behave is just like Shannon entropy because of the component at the end. It is the log of n. So basically it’s counting repetitions uh assuming a uniform uh distribution because you don’t have access to the probability distribution is is basically here the assumption.
33:32 · And another way to see this formula is that is a weighted version of Shannon entropy because the CTM part which is basically looking for computer programs that can explain pieces of the original object is giving you some clue of how the probability distribution may look like whether is it is dealing with something deterministic or not. So in the in the best case you don’t have to run the Shannon entropy part you find exactly the program that produces the original object.
34:01 · But in the worst case if you didn’t find any computer program basically you are just applying channel entropy. So for us it was very interesting to see how we could combine like the two of the best worlds because we have the statistical component that we don’t completely discard but we also have the causal component that if we are lucky but basically we can say something about it. Another way to see it is this kind of a iterative process. Um it is not exactly like evolutionary uh programming but that’s one way to see it.
34:31 · It is more like a leving search if you are familiar with it and this concept of doveetailing basically you start running all computer programs at the same time and when you don’t have enough resources basically you go in cycles running every program one step at a time from smaller to larger so obviously it is not only very expensive but it is also very uncomputable especially if you request to find the shortest computer program but as I said before that is not our purpose here it’s just to get a clue of the coal content of an object.
35:02 · So because we decompose the original object into small pieces then we can find computable hypothesis for those small pieces. But actually we also help the leving search by doing something else. We do perturbation analysis. So we change a little bit the original object to see how the computable hypothesis that we have found on the other side actually change over time.
35:24 · So I’m going to show you some examples and what the algorithm comes back with is basically what you are seeing on the right which is completely transparent method that can give you even the state space diagram the automaton behind the data and here is the example of for example concatenating all the natural numbers which is the chump own constant which has no regularities and how the process actually found the computer program that is able to explain the natural numbers.
35:54 · Yes. So this is an illustration of how this works because basically we have like a sliding window looking into an object and trying to find some what we call the cosal patches. And in practice basically we traverse the whole object.
36:09 · We don’t only focus in a few places. And there’s a whole paper even looking at what happens with the boundary conditions and many other things comparing with other metrics. And all this works basically because we have this nice uh theorem on your right which basically is telling you that if something is causal most of their parts have to be causal as well. And the other way around if you have something random its parts cannot be causal or non-random.
36:34 · So there’s a correspondence and notice that I’m talking about the level of description because I’ve got already some questions about for example what happens with thermodynamics where where perhaps the molecules are very complex but but the overall behavior the emerging behavior is very simple that could be the case actually but here I’m fixing the level of description so you can see the comment there fix the level of description this is actually what happens and we can run this and already pinpoint
37:04 · For example, some cases in which Shannon entropy would tell you that something is random, but actually it is not. Here is a very small example with two strings that look like high Shannon entropy, but actually have a very small computer program that can be produced. And this also is very popular in textbooks of Shannon entropy and information theory.
37:25 · If you plot all the values of Shannon entropy applied to a set of binary strings, they distribute into a Bernoli distribution like that. And if you apply lossless compression algorithms like LCW or GZIP, you get exactly the same distribution. So they coincide and that basically tells you that they are doing exactly the same thing. They are not more related to more of complexity than what Channon entropy was already related to.
37:51 · But then our distribution actually starts looking different and it is normalized to the top and bottom obviously and it starts building a longtail distribution which which conforms with the expected distribution which by the way is called the universal distribution and it is a concept in algorithmic probability. There’s a beautiful paper called the miraculous universal distribution by bit published 15 or 20 years ago also talking about algorithmic probability.
38:18 · So another common misconception that algorithmic probability can um give you um some clue about this um idea that ting machines, digital computers or even algorithmic complexity can’t deal with uncertainty.
38:32 · That is very common because our idea of a ting machine is something that is a very deterministic step by step and this is the the core of algorithmic probability. You can write a universal ting machine that basically is some sort of belief updator because what you have is that you have found a computable hypothesis, a computer program that can explain an unfolding process like it sequences on the screen.
38:53 · And when there’s new evidence, a new observation, and it doesn’t coincide with the output of that computer program, basically you go to the next computer program that best describes the new sequence. And actually you can assign a confidence value to the computer program explaining the sequence because algorithmic probability is telling you that if you have more computer programs explaining a single sequence like the digits of pi, it is of lower algorithmic probability.
39:22 · So you have this concept of actually having more or less computer programs explaining an object and then assigning a confidence value and then actually you computable hypothesis are also sorted by size which somehow captures the this idea of Okam’s razor so you favor the simplest theory in this case the simplest is the shortest but you can easily see how you can implement like a basian estimator and actually solonov and levin show that algorithmic probability is the ultimate basian predictor.
39:53 · It’s the optimal theory of inference. Actually, it is more like abduction because what you have access to is partial information. So, it is the optimal theory of abduction.
40:06 · And with our own approach, for example, we can find that the same computer program is able to explain many many of the subsequences of uh a sequence like pi. And we have done a lot of experiments showing that actually we pick up signals that indicate that some of these objects are causal rather than just statistical. So now it is the concept of algorithmic information dynamics which is this idea of introducing a perturbation analysis into the mix.
40:36 · And you can see how interesting this is because we already combined Shannon entropy with colmormorph complexity. And now we are introducing a new another concept that is also fundamental in the theory of causality today and was introduced by some people places like Carnegie Melon but also Judia Pearl with the do calculus. So this idea of perturbing a system to try to find out and reveal causal mechanisms and here we have some examples. So let’s say we have like for example the sequence of all ones and you can either perturb the object randomly or not.
41:10 · Let’s say that the perturbation actually is a flip of a a digit into zero. So the original explanation for that sequence was was very simple because the sequence was was very simple has a very short computer program. But when you do a perturbation like this one, then the computable hypothesis changes the length. And remember also that I’m not talking only about one computer program.
41:35 · We find like hundreds or even thousands of computer programs explaining the same object. So you can take the average of the length of your computable hypothesis. And the other example is you you start from a random object and then you do a random perturbation. Then your computable hypothesis doesn’t change uh much because that was already long to describe.
42:00 · And when you introduce a random perturbation into a random object, basically the computable hypothesis remains pretty much the same in terms of length and everything else basically lies in the middle. But you can see how interesting this is because the nature of how things change in software space is telling you two things. The nature of the original object and also the nature of your perturbation.
42:23 · So the way I like to see this is almost like the matrix for those familiar with the movie because we are basically trying to relate the real world which I mean it looks like sequences but it could be anything with the software space or the algorithmic world. We are trying to explain everything with computable hypothesis and what happens on this side when something happens on the other side. So it’s almost like seeing these green screens in the matrix to try to find out things about the other side. And you can as I said you can do this with anything.
42:56 · It doesn’t have to be binary sequences or sequences at all. You can do it with networks. And if you can do it with networks, because there are adjacency matrices, you can do it with images, but you can also do it with vectors, tensors, and pretty much anything that has a computable description, you can apply these ideas. So here are three examples, two on the extreme of the complex networks spectrum where you can describe a graph or network that is very simple with a very short computer program. So for example, the complete graph is basically connect everything to everything else.
43:28 · I just gave you the description and it doesn’t matter the size of the complete graph. The description almost remains unchanged. You only have to change the number of nodes that that grows the description by the logarithm of the number of nodes.
43:43 · But if you have a random graph, especially an algorithmic random one, then you cannot take advantage of any regularity and every time that you grow the network, you have to describe exactly where the edge is going to go.
43:57 · So the growth of the algorithmic complexity of a random graph basically grows by the number of edges and everything else is like in the middle complex complex networks and then you can start playing in the same way. What happens when we remove a node or an edge to the set of computable hypothesis? Do they grow? Do they shrink? What happens when you remove an element that makes your computer programs go smaller?
44:20 · We call it a positive element because basically had like a shift towards a shorter computer program and when you remove or or perturb an element produces a longer average of computer programs we call it a negative one. So we started applying all this it doesn’t it didn’t remain in the theoretical realm only. So for the last 10 15 years I’ve been associated to positions in different places where we apply all these ideas and this is a very famous network.
44:53 · It is called a transcription factor network and it is from a bacteria ecoli and this is validated because they have basically done in the lab knockout experiments where they silence every one of the genes and basically see how they connect to other genes. So this has been already validated u experimentally. So we colored all the nodes according to the definition I gave you in the last slide.
45:22 · Whether removing that node if you look at the computable hypothesis space whether that is increasing the size of your computer programs or reducing their size and then we go to are called genome browsers. If you are not familiar with genome browsers, basically these are online repositories where you provide the clusters of genes and basically what you get back is whether those clustering made any biological functional sense. If you are clustering by anything interesting.
45:51 · So is a very nice way to validate whether this kind of clustering is doing anything or not. If you are not doing anything interesting basically the result is flat that no clusters come back. In this case, we got these clusters to our own surprise. But but it’s not completely a surprise because this network is already showing I mean the topology of the network is actually underlying functionality behind this biological entity.
46:17 · So we used three different genome browsers and the three of them came back with these six clusters bas basically showing that by clustering these nodes that are genes we are actually clustering in some functional way which was very interesting. So we kept exploring obviously so we took a few more networks. This one is an immune cell called TH17 which it is um not completely differentiated.
46:46 · So it is not a completely stem cell but it is in the middle of the process. It is a stem cell for the process of immune cells and there are three time steps towards completely maturing the cell and then we apply the same idea because we have a genetic network for this system over time. So we have three steps that’s why there are three plots and on the other side there are also three columns.
47:08 · So at the beginning when you do our analysis the perturbation analysis with algorithmic information dynamics what you get is about the half half of the genes can move the network towards randomness and the other half can move it towards simplicity.
47:26 · So in some way is like highly programmable because you can move it both ways. Then in the second step basically the genes flipped all over. So now you have actually half of them that were positive before almost negative no kind of it is it is noisy but then we
47:43 · had the third step where it is a completely mature cell and basically it was kind of locked in even in algorithmic space meaning that there’s no change that you can make to the network removing any nodes that would make the computable hypothesis grow on average and again that was an interesting result so we apply it to many more networks uh genetic networks under underlying a lot of uh type of cells that you are seeing on the right and then we were able to reconstruct a wadington energy landscape with this idea of reprogrammability.
48:15 · So how many genes can send your system to towards one side or the other and that was again very interesting and obviously I’m going very fast through all of this but you can see the pointers to the papers like I science so in some way we have this
48:31 · very interesting connection between the algorithmic space and the dynamical system space and we did all sort of experiments even with boolean networks for example how many attractors will be on average when we remove some nodes or the others or reachable states for example And in in all cases, we found some statistical significance that we were actually picking up some interesting signals. If you are interested in these ideas, there’s a very nice video produced by nature out of a paper that we published a few years ago and uh the you can scan the QR code.
49:03 · We didn’t pay for this video. They were just I guess excited about our paper and they produced it. We published this in nature machine intelligence six years ago. But let me summarize very quickly some of the common misconceptions that if I didn’t convince you uh otherwise at least you now have the an alternative view of them. So misconception number one called morph complexity cannot be applied because it is uncomputable. Most areas of modern mathematics are incomplete undecidable and uncomputable.
49:36 · Codmorph complexity requires a universal Ting machine an account perhaps even for the Ting universal Ting machine and perhaps the humans that have to build the digital computer side. I’ve heard that argument before. For me it doesn’t make sense but it’s a common misconception that things don’t make sense because you need digital computers to apply colmormorph complexity to make any sense. Okay, I’ I’ve seen that before. Colmor of complexity only measures randomness. I mean it is a mathematical metric of randomness accepted mathematical theory.
50:05 · Uh but it is not only that I already gave you my own view that I think it is a theory of causality. Ting machines are niche and can’t deal with uncertainty.
50:19 · It is false algorithmic probability is exactly the opposite of that. Colmor of complexity can be approximated by statistical means like lamp cif and popular lossless compression algorithms that is also mostly false. I mean it is true only because it is connected to Shannon entropy and Shannon entropies connected to colmor of complexities in some way something that is low shon entrop is also low calmor more of complexity but not the other way around
50:48 · the machines can’t deal with real data is also false uh and that should be almost obvious in science today because basically we use digital computers for literally everything so I’m going to stop there and before that I perhaps a couple of books if you’re more interested in these topics. Thank you very much. [applause] [applause] Okay, thank thank you Hector for this great talk. We have time for some questions and I see already Hioki has a question.
51:32 · They’re working. Thank you so much for super nice talk very interesting. I have a bunch of question but I would choose just one. So uncomputability or undecidability can be seen as kind of challenge that the science or learning m mechanism need to cope with or the other way to see the same thing is maybe that’s the reason why science exists the learning. So can you comment from that kind of alternative view that’s the source.
51:57 · Yeah right.
51:57 · Yeah absolutely. So for me on computability is the almost this idea of open-endedness.
52:03 · Yeah.
52:03 · Because in science if we were able for example to find this um minimalistic model we would we wouldn’t have anything else to do. So is this idea of actually having something that you have to keep computing over time and it is a neverending process. So I agree that for me on computability is actually what you require to deal with the complexity and openness of the world.
52:26 · Yeah. Thank you.
52:27 · Yeah. Thank you. So it is a feature not a box in my opinion.
52:32 · Uh Asa thank you for the wonderful talk. Um just a weird question. So in training neural network there is a phenomena called grocking and when you train the network uh like a long enough then suddenly uh even reach the bottom of the loss then loss just dropped and suddenly yeah so it’s called rocking.
52:54 · So I’m uh wondering if this is related to like uh the network can finding the the the good program and if it’s related then uh what happened in the training of neuronet network that can uh make this happen.
53:11 · Yeah, I’m not familiar with the concept, but definitely we have done some work on trying to apply these ideas to AI and neural networks with some success. And you may have heard people like Elon Musk and um uh also Ilia, I don’t remember his last name from OpenAI basically saying that all this is very much connected to colmormorph complexity and compression.
53:35 · Uh but yeah, I think there there’s a lot of potential in neural networks. Our main uh challenge is that we are dealing with a uh discrete uh theory and for things like um gradient descent and those kind of things that require differentiability is kind of a challenge but we have been able uh to combine them in um interesting ways. So I I think there’s a lot of application there and we already have a few papers about it.
54:04 · By the way, something else I should mention is that at the core of neural networks, there’s these loss functions that are always based in channel entropy. So that that’s also another thing to consider. So that neural networks are basically statistical uh tools that may give the impression to deal with causality and maybe in some way they do as as long as you have like these statistical regularities. But for more complex things they definitely fall down into the same correlation trap.
54:37 · Uh thank you for the very interesting talk. I have a question about um do you think that there are some links between the step that you use to compute your measure and the steps that are necessary to build uh epsilon machines.
54:53 · Yeah, interesting question about epsilon machines. So in some way I like epsilon machines but I don’t like the terms that have been used to define them because I don’t even think they are really machines or they are really related to software space or algorithms. So in some way they are very misleading. Uh so if they hadn’t used that narrative I would have perhaps taken um in a much more positive view the the whole theory. But I think it is a still interesting.
55:18 · I think it lives still in mostly the statistical world and they can be like improved perhaps by combining the some of these ideas. Uh but some interesting work has been done uh there.
55:36 · Okay. Yeah, you can.
55:40 · Yeah.
55:40 · Thank you. Wonderful. Um my question relates to computational complexity like complexity classes because you can have a a problem and then solve it in one way which might take I mean I don’t I don’t think of the size of the program but how long it takes right so some algorithms take a lot of time some others take shorter time and uh I wonder if what you have done relates to this uh because you can
56:06 · find a I’m always been thinking about what is the stupidest way to solve to solve uh a list of numbers, you know, because the the the fastest way is kind of n loginish form. But now I think that the most stupid algorithm will be like just the flip run monkey sorting, you know, and and now there I see there is a boundary on the most stupid way without being redundant. Trivial but not redundant. So I’m looking if you have been uh thinking about that.
56:34 · Yeah, one of the first papers I showed uh deals with uh those connections. The one that relates fractal dimension to um time complexity. So it is a fascinating um topic uh for research and is very much open how it relates to each other. We found some interesting things to say about the connection between uh time complexity and algorithm complexity which is related to fractal dimensions.
57:04 · So you can see why it is so interesting this paper because it’s already connecting several complexity measures.
57:11 · But when it comes to time complexity is very interesting because unlike colog of complexity that doesn’t really depend on the size of the universal touring machine or the architecture time complexity fully relies on the architecture of the touring machine. As soon as you change the underlying definition for example use a cellular automatan which is highly parallel then your comp some of your complexity classes collapse.
57:34 · Uh so unlike other areas time complexity is very much dependable of on the architecture of the ting machine and actually we already know some examples like in real life doesn’t work that way. So if you want to sort for example this is my favorite example I came up with hopefully makes sense.
57:52 · So if you have like a straws and you want to sort them by size like uh plastic straws that you use for drinks, uh basically the thing that you can do in the in real life is just to hit it against the table and basically in constant type the time no matter how many straws you you have them sorted. So we already know that in in in physical terms uh time comp complexity may not fully correspond to what actually happens in the natural world. But that’s completely different to colmormorph complexity.
58:22 · You cannot really use that sort of uh arguments against colmore of complexity because there are there are none for two reasons. For time complexity, you don’t require Ting universality. For Colmograph, you do.
58:34 · And then you may think but what happens if I change the underlying universal Ting machine? We have this invariance theorem in colmore of complexity that is very powerful and basically tells you that changing the underlying computational model is bounded by a small constant and basically converges which is something we don’t have in chon entropy or many other areas
58:56 · one one last question hopefully not too long I promise I’ll be quick uh the ecoli experiment is super fascinating I was just wondering a very silly technical question how long did it actually take to run and comput that.
59:11 · Oh, okay. So, how long it takes to run the these methods? Yeah, that’s a very interesting question. So, what we are doing is basically to start store partial solutions to many of these objects. So, we exchange computational resources for memory basically. Uh so once you have seen something before we store it, we have a gel number basically explaining that binary or non-binary string. and we don’t have to recomputee it again.
59:39 · And if we want to explore the computer program, basically we go to the golden number and and basically uh unpack it.
59:47 · uh so for things that you haven’t seen before is very expensive and ultimately is as I said uncomputable but not as uncomputable as requiring the minimal program because we are relaxing that condition but you can make very much educated um guesses and accelerate things but it’s one of the main challenges if this were cheap we will be basically solving science so that’s exactly what we cannot do and we have to combine it with statistics Okay.
1:00:18 · Uh then let’s thank Hector one more time and then that’s it. [applause] [applause] Perfect. Awesome.