DEV Community

Marianne
Marianne

Posted on

Code Generation (featuring Tikhon Jelvis)

Summary

Part of Marianne’s flash of inspiration came from a talk she’d seen about translating programs to a form Z3, a popular SMT solver, can run as proofs. Full of enthusiasm she invites the software engineer who gave that talk, Tikhon Jelvis, to elaborate on the similarities between SMT and code normally generated by the compiler.

Links

Compose 2016 Analyzing Programs with Z3

Tikhon Jelvis’s Website

Jelvis’s Foundations of Reinforcement Learning

Z3 tutorial

Bonus Content

Become a patron to support this project.

Transcript

MB: Sometimes when I think I’ve had a really great idea it blows everything else out of my mind. I love this feeling. Especially when facts I thought were unconnected click into place and filled out a whole bunch of context I had been missing before.

MB: In my experience this is the reality of learning new things. It’s not a steady stream of knowledge being inputed into my brain. It comes in flashes that interrupt these long dry periods where I’m hearing words and translating them into meaningful sentences but the concepts themselves are not meaningful.

MB: This came up in my conversation with Ron Garcia actually

RG: I've said this a lot to people that at some point I started just going to things I didn't understand and I just sort of let it wash over me and like, I have no idea what's happening, but I'm just going to pay attention. And then I leave and I'm like, it sure doesn't feel like I learned anything, but something happened. And then and then, like, you go back again and then one day you're just like, wait, I know what that means. Or you ask a question and they're like, oh, good question.

MB: Yeah.

RG: So you're doing something.

MB: You have a moment like Neo in the Matrix. A sort of “I know. Kung Fu” moment.

RG: Yeah, exactly.

MB: It comes out of nowhere nut it's actually like years of exposure to things

RG: Totally. Like who put that there?

MB: Exactly.

MB: So when it occurred to me that the best way to implement a system modeling language that can use probability might be to use formal logic of a traditional model checker to bypass the need to run simulations, I immediately thought of a talk on SMT solvers I had seen on youtube a few months before. Tikhon Jelvis gave this amazing talk at Compose in 2016 called Analyzing Programs with Z3 that was the first time I had seen someone use SMT formulas to actually model stuff that is relevant to programming.

MB: Typical example models most experts use to help people learn are games: solve a sudoku, here’s the eight queens problem, the famous Zebra word puzzle… things like that. So when you start learning this kind of tooling the learning curve isn’t just steep. It’s got this bottomless crevasse in the middle where you have to somehow make a jump from Highlights puzzles to pragmatic programming without any references.

MB: I loved this talk the first time I watched it and then as I rewatched it I realized that the steps of converting a piece of code into a representation that can be modeled in SMT were the same steps I was working through in different tutorials on writing compilers.

TJ: So to convert to a formula, there are basically three steps, the first one is in-lining all our functions now. I mean, I didn't talk about functions here. And the main reason is because after we in-line them, we can just pretend we didn't have any. Then we unroll all our loops and we transform it into something which is single static assignment style. I'll go into what that means in the next slide, but so in-lining. Again, if we have functions, basically all we do is we create fresh names and then paste function body into where it was called.

TJ: So this is sort of how the CPU process works, except I think it didn't even create fresh names. The biggest advantage of in-lining, of course, is that our formula doesn't have to know anything about functions. We can forget about function names, and it's just as if it was all straight line code.

TJ: So the second step is unrolling loops where basically we take a loop in the form of while x… something and we just turn it into a whole bunch of if statements. And realistically, each loop is going to be unrolled to maybe 100, maybe a thousand steps depending on your application. So if you actually took the program and you printed it out, it would suddenly become very long. But luckily, the SMT solvers are fast enough to handle surprisingly large formulas

TJ: And so single static assignment. This is the part where we really take the sort of directionality inherent in executing code and transform it, translate it to the logic layer, which has no inherent directionality. So from the SMT solvers perspective, the logic formula does not happen in any particular order. So we need some way to structure the formula to maintain the order of the program. And the way we do it is by turning every single variable at each time step into a new name. So if we take X and then we set X or something new in the formula, we would have X0 and then we would have X1. And we have a constraint that says, oh, X1 is equal to X0 + A0. I mean, if anyybody here has taken a compilers class. You've probably seen SSA before because this is an intermediate form used by a lot of imperative compilers and it happens to be a really good fit for expressing as the formula.

MB: Every time I watch this talk I get more out of it. So this week I wanted to chat with Tikhon directly about how to model programs in SMT and about the parallels between SMT and the intermediate languages of compilers that use the same principles.

MB: What you talk about when you talk about this— when you gave this talk at Campose, is this idea of translating a program that you've written in a particular language, you used a very simple language in your example, which I appreciate, because I'm also writing a very simple language so I was like goodie!

MB: and then following this process that is very similar to the process that the compiler kind of goes through anyway. This idea in-line creating fresh variables for everything, unrolling, kind of getting rid of loops and creating these nestled if statements instead, and then single static assignment, which I think is the one that kind of pops for people as the most. “Oh, wait, hold on a second. This from here is also from over here” sort of thing. And like I always— like from the perspective of an SMT solver. Well, the thing that was the most valuable for me to understand is that you are doing this, you're flattening it out because essentially it needs to see every part of the timeline all at once. You have, like all the states all at once, like that was the thing that took me a really long time to learn with other formal specification languages is that, like, you're not thinking about it in terms of this kind of linear progression you're thinking about in terms of like almost like string theory of computer science, like all the states exist at once. And we must program in a way that facilitates all the states being programmed at once, I suppose.

MB: Because you started with a very simple language in this talk, where is the ceiling for this approach? Are there like— if we tried to do the same thing in something like, I don't know, for example, Ruby, like, would there be certain constructs that we just can't translate to Z3?

TJ: Well, that's a good question. And I think there's kind of two different questions there. Z3 itself, at least when done this way, knows very little about the actual language that I'm converting from.

MB: Mm hmm.

TJ: There's some amount of built-In operations that are designed for use in languages. You know, you have these logical operations that corresponds to bitwise arithmetic, for example, which is clearly put in there to help people model programs. But for the most part, it's just a way to solve big logical problems. And what that also means is that we actually have a lot of freedom in how we do the encoding. In this specific talk I went to over one way to encode code at a very low level where we're kind of manually setting out a whole lot of these bit vectors, which is what they call, you know, just words made out of bits and the connections between them. And so for something like a Ruby program. You know, even there there's probably these two different approaches to try to model it in Z3, one of which will actually be first compiling that Ruby program to something very low level that talks about how the memory is moved around and what the actual logical operations are, and then modeling that in Z3. And a different one would be trying to figure out how to use the various logical theories that Z3 offers, and just as an aside, the word theory in SMT, which is stands for satisfiability modular theories, Modular theories always confuse me a little bit.

MB: Yeah I was just about to ask what is a theory in this context?

TJ: Exactly. And this specific case, a theory is sort of a set of capabilities that the logic solver can work with. Or maybe a different way to think about it. It's the kind of constraints or the kind of operations that the solver can can consult. And so at the very extreme you could imagine having a SAT solver, which literally only gives you boolean operations. So you have like variables X, Y and Z, and you can just recombine them and clauses using essentially AND, OR, and NOT. And each variable, X, Y and Z can either be true or false, and that just gives you a SAT solver.

TJ: But then you can actually take this the basic logic of just essentially literally one bit per variable and add capabilities in different ways. So the example that I used in the talk was this thing called the theory of vectors, which is basically lets you say like, hey, you know, X is a variable that instead of being a single, true or false, is actually a vector of, let's say, thirty to thirty two true and false bits with some kind of operations that the solver knows about that correspond to things like, you know, binary addition or multiplication or bitwise, bitwise negation or something. And that theory, I think it's pretty intuitive that you could take that theory and actually compile it back down to just normal SAT. Hmm. Because if you have a variable that has thirty two bits.

MB: Yeah.

TJ: You could also express that as 32 variables of one bit each. Right. And when you have addition, you could actually like the same way that you could imagine writing addition out of logic gates.

MB: Yeah.

TJ: You can actually write addition out of ANDs and Ors and NOTs you know in a way that is just a normal SAT solver can see.

MB: Yeah.

TJ: And that's actually a strategy that is really valuable for a lot of these purely bit vector oriented problems that it's called bit blasting. And it just means you take this bit vector logic and you turn it into raw bits in a SAT solver.

MB: Yeah,

TJ: but SMT solvers have a lot of other theories that aren't amenable to this. A really simple example is that some solvers can also support doing reasoning with unbounded integers. So you're going to variables that can be any integer of kind of any size. Yeah. And in order to handle that, there's no clear way, or at least as far as I know, there's no way to express that as a small set of or any set of individual true and false variables combined with these operations. So instead, SMT solvers need some extra logic in the solver to know how integers work, how integer operations work, and how to do like symbolic reasoning or symbolic algebra with integers.

TJ: And then under the hood, and this is kind of an approximation, but my understanding is what happens is that, you know, they have this kind of logic to deal with integers or with other kinds of theories. But then they take the problem that you set up using integer variables and booleans and whatever else, they break it up into cases and then they kind of they can solve individual cases using these specialized solvers that can deal with integers and so on. But then they can solve sort of the relationship between the different cases using the underlying SAT solver or a SAT solver like algorithm.

MB: Through out this series I have had several conversations where one off hand comment created such a powerful moment the whole rest of the interview could be wasted and it would STILL have been 100% worth my time.

MB: I had one of those moments here … do you get it?

MB: It should have been obvious, but I had honestly never thought about it because modern day high level programming languages have so many layers of abstractions but— basically any programming construct can be expressed in Boolean logic because the processors that sit inside our computers and execute these programs are constructed with a series of logic gates. Therefore a tool like SMT, which is essentially just a solver for Boolean logic puzzles with a bit of syntax sugar added to it can model anything if you compile it down at a low enough level. Anything that Z3’s robust set of abstractions and builtins can’t represent, you can just move to a lower level of abstraction, get a little closer to the chip and model that instead.

MB: It’s weird I have been trying to get good at thinking about code for formal verification for years. I never imagined I was going to get better as a modeler by trying to learn how to write a compiler. But life is strange sometimes…

MB: So on the subject of intermediate languages, I’m going to ask you a question that may not be possible to answer, but why does it seem like no matter how fast the computers get, no matter how complicated or sophisticated they get, the end of the day, it's always compiling down to these very simple procedural based steps?

TJ: Yeah, I think that's a good question, and I think it's a lot more an answer about how the market works than anything else because. You know, at a low level, and I am very much not an expert in computer architecture, but essentially the incentives for the people making CPU's right is to be as fast as possible and kind of as energy efficient as possible and to do a good job with this massive amount of existing software in C. And in some cases, I think probably a lot of cases they have to do a good job on software that's already been compiled.

MB: Yeah.

TJ: And so what happens is then the actual the low level CPU's these days with things like out of order execution and these like fancy or not fancy, but like relatively loose memory models and so on, they don't really act like a sequential processor.

MB: Mm hmm.

TJ: What they act like is some kind of dataflow system that is doing its best to appear as sequential as possible while still getting the advantages of a data flow system.

MB: Mm hmm.

TJ: Now, I think even in a different world, we'd still want a simple low level language to work with. But if there wasn't this constraint of trying to work with the existing world of software, it's not clear that the low level languages would look exactly as procedural as they do now.

MB: What could they look like?

TJ: Well, this is where I think it's really worth talking to, that people have a much better understanding of computer architecture. For me, you know, like this research project I mentioned, we worked with a chip designed by, among other people, Chuck Moore, who was also the creator of the Forth programming language.

MB: OK.

TJ: And now that one is still procedural in some sense, but it actually had a stack in hardware made out of registers. And so programming, it felt really different from programming a more traditional register machine.

MB: Yeah.

TJ: Was kind of different, you know, different constraints and some advantages and disadvantages. And I don't know if that's an example of something that, you know, if all constraints were freed, we'd want to use for sort of, you know. High energy, high speed kind of general-purpose chips, but that's at least one example of where. You know, you could imagine a different architecture, but. I've I think probably the best place today to look, to see how things could be different is to see which ways people are taking advantage of FPGAs.

MB: Mm hmm.

TJ: Because, again, like, I actually haven't worked in an FPGA project myself, although I kind of imagine I will at some point. But my impression is there that actually pushes people to essentially view their programs as a lot more sort of data flow oriented, even at the very, very low level of, you know, configuring the hardware that runs on compared to running on a general-purpose chip.

MB: I had always assumed that the challenge of learning to think in formal logic in order to write models that could verify systems was necessary and unavoidable. But in the course of our talk Tikhon made a historical comparison that changed my mind….

TJ: In Z3, when you're working with it, it really does feel like you're trying to figure out how to take your problem and then encode it in this, you know, given this kind of set of logical primitives.

MB: Right.

TJ: It it feels a lot like the compiler whereas Prolog is flexible enough and higher, you know, sufficiently higher level than Z3 that it actually feels like you can write code directly in Prolog, you know, and it feels like using a normal language that has this weird, logical paradigm.

MB: Yeah.

TJ: Once you've got your mind around to it, you know, you can use that as a programming language pretty easily. Whereas even though you could write maybe these SMT lib style formula, you know, formulas by hand and Z3, it still feels like you're mentally taking the things you want to write and figuring out figuring out how to encode it with these primitives. So in some ways, it almost feels like the difference between writing in a a higher level language and writing in assembly where in the higher level language, you know, it's really oriented around helping you express your thoughts on abstractions.

MB: Yeah.

TJ: Whereas in assembly you're the one kind of translating from those abstractions to the set of instructions that you have.

MB: Yeah, that's funny actually. I think that's a really good point, that a lot of the angst we're seeing now in the formal specification world is because of a lack of high level languages that allow people to express themselves without having to do that translation into how the logic, whatever the engine behind it, thinks about things very similar to the era where everybody wrote everything in assembly. And then people who who do high level languages were inferior programmers because they weren't really understanding what was going on.

MB: Current formal verification languages focus their expressiveness on condensing what should be hundreds of lines of logic statements into short human readable phrases… And that’s important! If you’ve ever tried to write a Boolean formal in Conjunctive Normal Form by yourself you understand the value of having a higher level language to do it for you! The simplest problems become an overwhelming amount of code really fast.

MB: By most formal verification languages do not do much to help programmers figure out how match challenges to approaches in formal logic. It would be like writing a high level programming language and not including any abstractions around arithmetic, instead assuming that the programmer should know how arithmetic works in binary and write expressions based of directly manipulating bits.

MB: What strategies are there for making sure that you're producing the most optimal SMT expression when you're using something like Z3, right?

TJ: Yeah

MB: Because there are always a bunch of different ways to write same thing. And some of them have better runtime performance than others, right?

TJ: Absolutely. And unfortunately, actually, this is the one place where I really don't don't have a good answer to. You know because when I did it there was a lot of guess and check, but I also talked to a lot of people who were actively doing research on programs synthesis for example, and it felt like even to people with a lot of experience, you know, maybe they over time developed some kind of internal mental model. But the performance of the solver was still often kind of a bit of a black box to them.

TJ: And my impression was that, you know, the only people who really did better than this were the ones who wrote their own dedicated solvers for solving their problems, or wrote dedicated algorithms to answering specific program synthesis problems. But unfortunately, like figuring out the best way to encode something for Z3. I think it's just something where you just have to spend a bunch of time giving it a try. As far as I know, there's no really general-purpose approach,

MB: Run in some old fashioned performance tests where we just run the SAT solver a thousand times and see what the average it returns and like see if we can beat that time?

TJ: That's right. Although I think usually the difference is less than— it's less about the small differences and more about like, you know, if I run this, will I get an answer or will get bored while waiting for it? Some of these things. I've seen examples where different ways of encoding it take a problem from, you know, like the same problem encoded in different ways. Yeah, sometimes in ways that are actually logically equivalent to you but are different to the SMT solver. And it goes from being like a ten minute solution to being like a twenty four hour solution just because the SMT solver gets the heuristics, get trapped in the wrong part of the search place or something.

MB: Hn. You don't by any chance have an example of that do you?

TJ: So the details are kind of old. But I think one specific example I remember is. I was taking this kind of seminar on program synthesis and one of the homework assignments. You know, it was a pretty, you know. a pretty reasonable problem, which is basically, hey, you know, we talked about how you would encode a certain problem as a simple formula. So here's something really similar. Can you go at home and encode it?

MB: Hmm.

TJ: And, you know, people came up with reasonable solutions, but most of them like, you know, I mean, also, just to be clear to most people, also waited until the last day to do it. The problem was that, you know, people came up with good solutions that actually didn't even finish one run overnight before the deadline.

MB: Oh, my goodness.

TJ: And in that case, I think it actually took the professor and some of his, you know, a senior graduate student some time to figure out what was going on. And I think it just turned out that, like, you know, we used it like people used a reasonable subset of the SMT operations, but they needed to explicitly tell Z3 which theories they were using and which ones they weren't in order to get it to work faster.

TJ: And I think another maybe slightly like that one, that's kind of forgivable in the sense that, you know, you can kind of see the Z3 default mode trying to be more general-purpose and therefore slower.

TJ: But I think another example that would throw me off, but I've heard people mention is that there's actually a difference between making a single assertion with a whole bunch of ANDs in it versus making a bunch of assertions in a row, even though, again, semantically they should be the same.

MB: Yeah.

TJ: And again, I don't know if newer versions of Z3 handle this better, but, you know, there were some people who you know, I saw some very concrete advice that said, hey, you know, if you're using the theory of a raise and you need an array where you need to sort of initialize it with a bunch of values, then it's better to have a bunch of individual asserts like assert that index one has value X and index two has value Y instead of having a single big assert with an AND in it.

MB: Yeah, that's an example of where it's very much. Some kind of detail of how the salver processes formulas and does the search that leaks out into the performance characteristics, because logically those two should be exactly the same.

MB: My first couple tests of this concept of compiling to something like Z3 didn’t involve an intermediate language between my type checked AST and generated SMT code. I wanted to see if I could generate things that ran on Z3 and get my bearings.

MB: But given the how variable performance can be given different equivalent constructions, it’s clear I can’t skip the optimization stage of compiling. So I need to develop an intermediate representation from my AST that my compiler can analyze and modify.

MB: The other thing I need … some way to handle floats!

MB: If you try to represent a number with a decimal point in Z3 Real number types what the solver will return is an expression that divides two integers which would equal the value of that decimal….. a literal fraction in other words.

MB: This makes a lot of sense but also I’m not sure how it will scale. There are other approaches to decimals. Another friend who’s savvy with Z3 thoughts using bit vectors would kill performance. There’s support in Z3 for floating point arithmetic now… but honestly I’m not sure I understand how it all works from the documentation.

MB: Somehow I feel like the easiest thing might be to make Fault fixed point, where every number is represented with two integers: the value written out as if you just erased the decimal point. And a scope in base ten specifying where the decimal point actually is positioned in that number.

MB: Will this result in faster models than something more direct and built into Z3…. not sure!

MB: You mentioned in your talk that four are floating point numbers are difficult in verifying programs, that the programs are verified using real numbers. Why is that?

TJ: Yeah, well, it's also I think this is kind of the same thing I was saying. And I guess I'm not sure what kind of what the state of the art is necessarily in a lot of areas. But it's just if you want to figure out some mathematical properties of some logic, it's much easier to prove those properties on top of real numbers, which are, you know, well behaved mathematically than it is to kind of do the numeric analysis to understand exactly what happens with floating point numbers.

MB: Yeah,

TJ: And to be clear, there is a whole active area of research and practice around numerical analysis, and there are some really good tools to help with that.

MB: Yeah

TJ: But it makes, you know, using other kinds of verification somewhat harder if you have to deal with exactly the kind of, you know, bitwise operations you get with floating point.

MB: Is it because you can always add another place behind the decimal?

TJ: Conceptually that's exactly why real numbers are so much easier, right? And that's because you don't have to worry about all of this potentially complicated sort of rowdy behavior. And also just in general about the exactly which values floating point numbers represent. Yeah. But also because with real numbers, you can basically bring to bear continuous mathematics onto your problem. And that's a field that's been studied very extensively and so there are a lot of good results and are really nice properties that continuous functions have, for example, that other functions don't.

MB: Yeah

TJ: Whereas if you really, really care about exactly what a floating point number of program is going to do. You know, thinking about it as as implementing continuous functions doesn't really give you the full behavior, especially kind of as you go further from the the the the kind of numbers that floating point numbers represent well.

Discussion (0)