Equipped with new knowledge about type systems, Marianne begins speccing out plans to implement uncertain types and inject probabilistic programming into Fault models. She picks the brain of Maria Gorinova—researcher and probabilistic programming evangelist—and wonders if she’s finally bitten off more than she can chew.
MB: Okay now we know how to do types … maybe.
MB: Here’s what I want to do with types in Fault. First I want to implement a natural number type. There are plenty of system models where it’s illogical for a value to go negative. You can’t have negative people, for example, so it makes sense to have a type to express that.
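MB: To make that concrete, here's a hypothetical sketch in Go (the language I'm working in) of what a natural number type might look like — this is just an illustration of the idea, not Fault's actual implementation. The `Natural` name and `Sub` method are my own inventions:

```go
package main

import (
	"errors"
	"fmt"
)

// Natural is a hypothetical sketch of a natural-number type:
// a non-negative integer whose operations refuse to produce
// a value below zero instead of silently wrapping around.
type Natural uint64

// Sub returns an error when the result would be negative.
func (n Natural) Sub(m Natural) (Natural, error) {
	if m > n {
		return 0, errors.New("natural subtraction would go negative")
	}
	return n - m, nil
}

func main() {
	people := Natural(3)
	// You can't have negative people: 3 - 5 is rejected.
	if _, err := people.Sub(5); err != nil {
		fmt.Println("rejected:", err)
	}
	left, _ := people.Sub(2)
	fmt.Println("left:", left) // left: 1
}
```

In a type checker this restriction could be enforced statically where the values are known, with the runtime check as a fallback.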
MB: The big challenge I want to conquer, though, is uncertain types. This concept was first introduced to me through a paper from Microsoft Research called Uncertain: A First-Order Type for Uncertain Data.
MB: In this paper the researchers describe the implementation of a specific type that represents a value as a probability distribution. You initialize it with a mean and a standard deviation, and then can perform certain operations on this data type.
MB: The reason this idea was attractive to me goes back to the fundamental problem with modeling the behavior of feedback loop systems using formal specification. Formal specification answers the question of whether a state is impossible or not, but there are lots of states that are possible but dangerous in a particular context. In those cases we want to know how likely a state is and how much change the system can tolerate before an undesirable state occurs.
MB: Uncertainty would be useful here, but as I read (and reread) the paper I realized that I felt a little shaky on my fundamentals. This is the world of probabilistic programming, and rather than making a dumb math mistake by jumping right in maybe it’s better to get an overview of probabilistic programming itself.
MB: So I called in an expert…
MG: My name is Maria Gorinova. I'm a student at the University of Edinburgh, towards the end of my Ph.D. So fingers crossed. I work on probabilistic programming, and more specifically, I mostly work with Stan.
MB: Okay so what do I want to know? Ugh… there’s so much!
- Am I reaching for the right tool? What is probabilistic programming used for? Maybe my use case for uncertain data types just doesn’t make sense … I mean… does the whole model have to be done in probability, or can I just drop in one uncertain value and everything stays kosher?
- Is it really true you can add and subtract (maybe even multiply or divide) distributions? What are the conditions behind that? Only normal distributions? Only with independent variables, like described in the paper? This idea of doing operations on probabilities seems like black magic.
- Where are the pitfalls? I know this can’t be as straightforward as it seems. What am I missing? What are the gotchas?
- And lastly … if this can be done why hasn’t anyone done it before?
MB: If I can figure out what can and cannot be done with a probability distribution, then I can start thinking about how to represent that in types. The operations are the easy part. I can write those restrictions into my type checker and invalidate models that perform unsupported operations on uncertain data types.
MB: The authors of the paper do have a reference implementation in C#. It took me a while to track it down, but it’s up on GitHub. Unfortunately so much of the logic is imported from other libraries that it doesn’t clear much up. It seems like this problem might take more than one conversation to tackle. So I’d best put the time Maria’s giving me to good use.
MB: Ok, so, I mean, I guess the best place to start is just with the absolute beginning, like what exactly is probabilistic programming?
MG: Yeah, that's a very good question. And I like to think about probabilistic programming as, in a sense, inverse programming. So in conventional programming, you would write a function where the function takes some input, and the entire job of your coding and of your program is to produce an output based on this input. Right? But in probabilistic programming, you in a sense write a model which describes how some data was generated.
MG: So you still start from some inputs. But this input is unknown, and you put some probability distribution on it, and then you sample random variables inside of the program, and at the end you spit out data. And what happens is that you actually have this data, you've observed this data, and now you want to infer the parameters it was generated from. And by that, I mean you want to have uncertainty over your belief about these parameters. So you want to compute the posterior distribution, where you can say, oh, I know the probability that this parameter mu, the mean of my data, will be smaller than zero is 90 percent, or something.
MB: This sounds perfect right? Quantifying uncertainty? Check! Running a program backward so that you get the inputs that will produce a particular output? Check! It feels like there are a lot of places where programming with probability would be useful.
MB: As it happens, Go has a package for probability distributions under the stats section of gonum … it has a couple actually, depending on whether you want to do univariate or multivariate or matrices.
MB: A lot of what people seem to be using it for is random numbers, or building other stats tooling, rather than probabilistic programming. I’m not seeing very many models…
MB: And what are the practical applications of something like probabilistic programming?
MG: Um, so it has been a growing area. It's now widely used — Stan in particular has lots and lots of users. It's being used in a lot of sciences, like biology (particularly evolutionary biology), astrophysics, particle physics. Things like that.
MG: Also, I think it's used in economics and business in general. So anywhere you have data that you want to learn from, but you also want to quantify uncertainty. In contrast with deep learning, here we are very precise about our uncertainty. Everything is statistics driven, and we believe our confidence intervals — or credible intervals, when it comes to Bayesian stuff.
MG: Well, with deep learning we can learn from data, but it's very different, because it's not only that we can't trust the results — we can't trust our confidence in the results. When you say, oh, I believe with 90 percent probability that this image is of a cat…
MG: That can be arbitrarily wrong. It can be, you know, a bird or something, and the model will still be very certain that this is a cat.
MB: Yeah, the math part is what I’m worried about. I took some graduate level statistics classes when I got my master’s degree. I loved it. I feel comfortable with the concepts … but maybe I know it just well enough to know that once you add some programming, it becomes possible to create something that seems to produce the correct response but stands on a foundation that is not stable.
MB: That’s the reason why we tell people not to write their own cryptography libraries. Lots of things can look random but aren’t. Correlations can be misleading. It’s so easy to misunderstand a concept and build something that has no real value.
MB: Do people tend to use probabilistic programming sort of just in isolated research cases, or do they tend to do it in tandem with more traditional programming?
MB: So, for example, something that I do a fair amount of is logic programming. And you can use logical components within a traditional application for various functionality — for example, policy adjudication. You can write your policy server based on an SMT solver, using a logical, declarative structure, and then the rest of your application is just kind of object oriented, and those two components talk to one another and work with one another. Is there the same relationship with probabilistic programming? Or does it really sit off to the side in a research capacity?
MG: Hmm. That's a very good question. I think currently it's more of a separate thing that is used when you have some data and you want to do data analysis. You sit down and write a probabilistic program. But I think the dream is that it's more integrated, and it's more accessible as well, because the dream of probabilistic programming is to completely separate the modeling — writing your belief about how your data is generated — from inference, which is the complicated inverse maths stuff.
MG: And this in practice is not really the case. You have to think a lot about what you're modeling, how you're modeling it, is your model correct, did you make any stupid assumptions in it, and so on. It involves diagnostics and things like that. Currently it's not that accessible. And we're working towards making it more accessible to people who have programming experience, but not necessarily statistics experience. But it's hard.
MG: It's supposed to be something democratizing that just gives this tool to people that are domain experts in other fields, not statistics, and they can then use it for their problem by kind of inputting their domain knowledge into the program without having to think about anything else. But in reality, there is still a steep learning curve there.
MB: This is starting to feel really intimidating…. I asked Maria about uncertain types and she had never heard of the research effort, which threw a slight monkey wrench into my plans to ask someone to hold my hand through the math bits.
MB: Maybe I’m on the wrong path here? If no one else is incorporating uncertainty into normal programming, maybe there’s a good reason?
MB: Does probabilistic programming really mostly manifest in its own separate languages, or have people incorporated libraries into other languages?
MG: They’re libraries, yes, although I still call them languages, because they're almost like a standalone thing. It's a little bit like TensorFlow or PyTorch — it's sort of an entirely new thing. So we have Edward2 and Pyro and PyMC3, which are kind of integrated into Python. We have Gen and Turing and, I think — what else? — a couple of other languages in Julia. So, yeah, there are lots of those.
MB: Hmmm… Julia and Python, two languages well known for data science. That’s not a great sign. (sigh) …. But I just can’t shake the notion of leveraging probability. I might need reinforcements. And by that I mean I might need to call on some experts a bit closer to home. Experts I haven’t cold emailed out of the blue, but people within my friend group who I can more easily lure into hour-long brainstorming sessions and guilt into code reviews.
MB: What are the most common mistakes people make with probabilistic programming?
MG: It's very, very easy to define a model that is ill conditioned. That means that the posterior distribution given by this model and the data you observe… has a very weird form, basically. So imagine we're talking about bivariate distributions. We're going to do a 2D plot, where the color represents how much probability there is at each point, and a Gaussian distribution looks like an ellipse in this plot. If it's well conditioned, this Gaussian distribution is going to look like a circle, because this has to do with the ratio between the smallest and the largest variance along these axes. The larger that ratio is, the more ill conditioned it is. So if we have something that is like a circle, the axes are equal…
MB: Right. Right.
MG: The smallest and the largest axis. So…
MB: But the more oblong it is, the more ill conditioned it is.
MG: Exactly — especially if it's making a diagonal like that and it's very, very, very thin. Then the ratio is very big, and that's bad. And this has to do with the statistical algorithms that we use in order to fit the distribution — most of them, anyway, will have a problem with an ill conditioned distribution. So you can very easily define a model — just write a model — that is going to imply a posterior like that.
MB: So not to leave this at a cliffhanger or anything, but next episode is going to be a little bit different. I’m pulling in some friends and we’re really digging into this paper on uncertain types. You’ll be very surprised where it all ends up and hopefully not so overwhelmed by the final math!
MB: To be continued!