
Marianne

User Research for Programming Languages (featuring Michael Coblenz)

Summary

Marianne has a working prototype of Fault, but still no idea whether anyone will understand the design or find it useful. She needs to test it with some users to see whether it has the right features and syntax. To do this she talks with programming language researcher Michael Coblenz, who specializes in techniques for testing the usability of programming languages.

Links

Marianne’s DevOpsDays Talk
Interdisciplinary Programming Language Design
Coblenz’s user research on type systems
More about Michael Coblenz
JetBrains MPS
Xtext

Bonus Content

Become a patron to support this project.

Fault User Research

Open to helping Marianne out by giving up 30 minutes of your time to try to use her language? Book a time slot here.

Too shy to talk to Marianne directly? You can play around with the Fault prototype at fault-lang.herokuapp.com

Transcript

MB: Back when I started gathering research, trying to figure out what I wanted to do and how I was going to do it, I came across a paper titled Interdisciplinary Programming Language Design, all about how programming language design should include more user research.

MB: User research for text-based interfaces has been a particular crusade of mine for a while. Two years ago I gave a talk about this topic as it applies to SRE and DevOps teams at DevOpsDays DC.

MB: And so when you have unexpected user behavior, like, one of two things tends to be the result. Either the system or the product that they're using doesn't know what they want to do and gives them an error that may or may not be decipherable, or… it seems to work. And it can be difficult to sort of see where your preferences should be as a software engineer because, like, errors are super annoying and they frustrate users and they make them mad, and then they complain about your product or they may stop using it.

MB: But if it works and it's unexpected, then we don't actually know what the system is doing in response to that command. Maybe it's doing the right thing. Maybe it's doing something else. And if users are using your systems in ways that you didn't anticipate, then it's going to change your scaling needs, which will ultimately change how you monitor for those scaling needs.

MB: So a really important concept about why design is relevant for people like us who mainly stay on the back end of things is that design determines how people use things, and how they use it determines how you have to scale it, when you have to scale it, and what you're scaling when you're scaling.

MB: It went really well and I still have people come up to chat about it from time to time, so I'm really proud of that.

MB: Anyway, despite my passion for user research, I had not given much thought to how user research fits with programming language design. But as I develop my language…. I keep confronting, over and over again, the reality that I don't actually know what inputs I should be expecting, because I don't actually know whether my ideal design makes sense to anyone else.

MB: Sure…. I could probably come up with a list of expected and illegal variations…. but writing the scaffolding to catch such behaviors, stop undesirable activity and communicate the problem back to the user is…. well…. a bit tedious. Error handling certainly isn’t the MOST fun part of writing a language. And I could be missing something big too.

MB: I’ve always felt that the biggest reason why other specification languages have failed to catch on is because it is not immediately obvious to the average programmer what you should be doing.

MB: I’m definitely sold on user research for language design as a concept…. but …. a programming language is much more robust than a command line tool and has many more conditional branches than a graphical interface. I needed some advice on how exactly to do this, so I had no choice but to track down the paper’s author.

MC: I'm Michael Coblenz. I recently finished my PhD at Carnegie Mellon. My work is on the usability of programming languages. So the general question is how do we design programming languages that make programmers and software engineers more effective at achieving their goals?

MB: This is what all forms of design have in common: it’s all about goals, not some universal standard of code quality. I approach legacy modernization in the same way: let’s not focus on what’s new and “better”, let’s focus on what the system is supposed to be doing and what’s the best way to do that.

MB: In that line of work, when you think of things in that way … you end up turning off fewer mainframes and getting rid of fewer Excel spreadsheets than you assume you will.

MB: But I digress….

MB: So I'm really familiar with user research from the product perspective, right? I'm building a website. I want to make sure that the usability of the website is the best it can possibly be. So I'm going to take some paper prototypes, I'm going to take some wireframes, and sit down with some users. How is user research for programming languages different?

MC: Right. So it's different in a variety of ways. One of them is that typically you expect your users to be experienced in various ways, though not necessarily, right? Some programming languages are targeted specifically at novices, you know, beginning programmers. And so you want people to be able to write software even though they don't have much, you know, computer science background. I tend to focus on programming languages that are more for larger scale systems. And so I tend to think more about how can we prevent bugs in systems. And so the usual ways you prevent bugs in systems are by various kinds of type system approaches, which may make the system harder to use. And so there's this kind of interplay between how you train people, what you teach people, what experience they have, and what behavior you see when you actually get people to use your system.

MB: That's the thing that I think I'm trying to wrap my mind around: there is a certain amount of learning that we expect people to do when they're picking up a programming language. So how do you separate poor usability from "no, that's just the learning curve with this language"? Like, how do you level set?

MC: Yeah, I wish there were a good answer to that question. There isn't really yet. You know, that's why it's still research, in part. I have some answers. Right. One answer is: if you can build something that you can show is effective, then you have done well enough. Another answer is: there's no reason to build something that's harder to learn or harder to use than necessary. Right? So if you teach people something, and then you have them use the tool you taught, and then you discover that they had problems, if you can address those problems, well, you're making their life better.

MB: Hmm.

MC: Right? So why not do that?

MB: Hmm.

MC: So, you know, then the question is, OK: are you only improving learnability, or are you also improving kind of long term performance?

MC: And you have to kind of think more theoretically about that. Are you going to make it annoying in the long term? And there are, you know, larger scale things like case studies where we can kind of think about that. So it's important to kind of see both sides of the picture.

MB: Hmm. And, like, what kind of techniques do you use when you're doing your research on a programming language?

MC: Yeah. So I've done kind of traditional user studies. So I give people programming tasks. Before I actually give them the tasks, I give them some training materials to teach them the language.

MC: And then I try to see what problems they encounter and then try to figure out, OK, why are they encountering these problems and can I change the language or possibly the documentation in order to address those problems?

MB: Okay so what should I expect people to know before they pick up my language and try to use it?

MB: I suppose an understanding of basic stock/flow style models would be necessary, but that is a simple enough concept to communicate reasonably well through documentation.

MB: I would also expect them to recognize the syntax around object properties, because that's the way I want to represent stocks and flows. And, while we're at it, the basic syntax around anonymous functions too.

MB: The first user tests therefore should seek to optimize how that information is communicated and test whether users create models that make sense given those patterns.

MB: Most of that can be done with a text editor, or paper. Sketching out how models should look and writing out a representation in the syntax of my language … but eventually I have to confront the reality that I need to build something interactive to actually test its usability.

MB: Do I have to build a working language to put in front of users? Fortunately, Coblenz had a solution for that issue.

MC: So one of the problems we have in programming languages is design and implementation is very expensive. It could take months or years to design and implement a programming language, even just sort of the core parts of the language. Never mind the libraries or tools, never mind all the things that need to actually be created in the language. And so, you know, you could spend years of your life building things and then discover that they're no good. Right? So the idea is: can we retrofit our ideas into existing languages, or simulate them, without having to implement them completely?

MB: Okay so… I can use my grammar to generate a parser and insert some additional functionality into another language. My ANTLR reference book showed me how to parse and generate code in a language called DOT, which has a JavaScript library that can render simple flow-chart-style visualizations. That sounds perfect. The first implementation of my language will actually be a transpiler, taking a spec and transforming it into DOT code so that the user can visualize the connections between stocks and flows.
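
MB: For a sense of what that step could look like, here's a minimal sketch in JavaScript. The model shape and names are hypothetical stand-ins, not Fault's actual grammar; the real version walks the parse tree that ANTLR generates.

```javascript
// A minimal sketch of the spec-to-DOT idea, assuming a hypothetical
// already-parsed model. The shape and names here are illustrative
// stand-ins, not Fault's actual grammar.
const model = {
  stocks: ["reservoir", "town"],
  flows: [{ from: "reservoir", to: "town", name: "pipeline" }],
};

function toDot(model) {
  const lines = ["digraph spec {"];
  for (const stock of model.stocks) {
    lines.push(`  "${stock}" [shape=box];`); // stocks as boxes
  }
  for (const flow of model.flows) {
    lines.push(`  "${flow.from}" -> "${flow.to}" [label="${flow.name}"];`); // flows as labeled edges
  }
  lines.push("}");
  return lines.join("\n");
}

console.log(toDot(model)); // valid DOT, ready for a renderer like viz.js
```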

MB: That kind of test feels like it can be open to anyone on the internet. Most of the work stays in the browser…. and maybe when the user finishes their model and clicks a button to render the visualization, the browser sends a copy of the model to a server where I can track what types of models people are trying to build.
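
MB: That reporting piece could be as small as one request fired when the render button is clicked. A rough sketch, where the endpoint and payload shape are assumptions for illustration:

```javascript
// A rough sketch of shipping a finished model back for analysis.
// The endpoint and payload shape are assumptions, not the real setup.
async function reportModel(source) {
  await fetch("/api/models", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ source, submittedAt: Date.now() }),
  });
}
```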

MB: But none of that simulates the behavior of the language itself, right? I need some better ideas about how you do that. How you simulate the behavior of a language without building the language.

MC: I was designing a language called Obsidian, and I had some ideas for a type system, but I wanted to know: can people actually work effectively with this type system? Right. So I took some of those ideas and I inserted them into Java. By which I mean, I told people that I inserted them into Java, but I didn't actually implement a type checker. Right? And then I said, OK, please do these programming tasks, and any time you want to compile, let me know and I'll see if there are any compiler errors. Right?

MC: And so as the experimenter, I can tell them verbally, you know, error on line forty-two, you're losing ownership of this asset here. And then I can see, you know, does that error message make any sense to them? Can they make progress along the lines that they're supposed to make progress on for this task, or are they stuck? And if they're stuck, then what information do they need to get unstuck? How do I kind of clarify the system or refine the system design in order to prevent people from getting stuck in this way?

MC: And then once I had done that, then I actually built the system.

MB: That… that feels very intense to me, the thought of doing that, because I feel like you have to learn and anticipate what common mistakes are likely to be made with the language, and how a type system would interact, how it would error if given that input. But like, what did you do to prepare for that?

MC: So I spent a lot of time designing kind of language prototypes, and so I had a pretty good idea of what the type system would be able to do and what it couldn't do. Yeah, you have to have the idea in mind in order to kind of provide a good simulation.

MB: Right. You have to really thoroughly explore and know it before you can really…

MC: Yeah. Well, I mean, you do kind of theoretical work to try to figure out what is this thing that I'm building and what are its properties. And you don't have to do a perfect job. Right? If you give sort of inappropriately good error messages and the thing works out, then, OK, you're going to try again with a more refined prototype or more accurate error messages. On the other hand, if people can't make progress even with your really good error messages, then it means you need to rethink things.

MB: I don’t feel prepared to predict what kinds of compiler errors a compiler I have not built will throw. I have never built any kind of compiler … I have never even looked at the code for a compiler before.

MB: But the first step in any iterative cycle is figuring out what the minimum viable implementation is … and of the functionality I’m looking at, maybe that’s the behavior of a flow.

MB: If a flow is a pure function … the types of errors that one might encounter don't feel too intimidating. There are scope issues: is a given input visible to the function? Assignment issues. Basic arithmetic issues … Some potential type issues, but very limited ones. There's no reason to use a variable that isn't numeric in a flow.
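
MB: To make that concrete, here's a rough sketch of what those checks might look like in JavaScript. The flow shape and error wording are hypothetical; the point is how short the list of things to catch is:

```javascript
// A rough sketch of the limited checks a flow might need, assuming
// flows are pure numeric functions. Names and shapes are hypothetical.
function checkFlow(flow, scope) {
  const problems = [];
  for (const input of flow.inputs) {
    if (!(input in scope)) {
      // scope issue: the input isn't visible to the function
      problems.push(`scope error: "${input}" is not visible to flow "${flow.name}"`);
    } else if (typeof scope[input] !== "number") {
      // type issue: flows only make sense over numeric values
      problems.push(`type error: "${input}" is not numeric`);
    }
  }
  return problems;
}

console.log(checkFlow(
  { name: "drain", inputs: ["level", "label"] },
  { level: 10, label: "reservoir" }
)); // [ 'type error: "label" is not numeric' ]
```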

MB: I don't like my first attempt at building something that mocked execution for user testing very much. I walked the tree and rewrote the functions as strings of JavaScript code … basically the same thing I did for the visualizations, but a whole lot more complicated.

MB: I mean, it works. It runs and it runs my models correctly. But the idea of it was to kind of do a pseudo-transformation to JavaScript and execute that, and it ended up being a mess of eval statements that feels gross and will probably break real easy.

MB: But building it did help me refine my grammar a bit— which was useful— and I had to shake off my anxieties about working with a stack.

MB: Most languages execute from the tree created by the parser by pushing values onto a stack and popping them off. You follow the tree all the way down, store the value of the last node on the stack, then as you work your way back up you take that value, insert it into its proper place in the expression in the node above it, evaluate the expression, and push the result onto the stack. The node above that takes that value off the stack, inserts it into its expression, evaluates that, and pushes that value onto the stack. And so on until you get back to the top…
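
MB: Sketched in JavaScript over a toy arithmetic tree (the node shape is made up for illustration), the pattern looks like this:

```javascript
// A minimal sketch of the evaluation pattern described above:
// walk the tree depth-first, pushing leaf values onto a stack and
// combining them as each expression node is closed out.
function evaluate(node, stack = []) {
  if (node.type === "number") {
    stack.push(node.value);   // leaf: store the value on the stack
    return stack;
  }
  evaluate(node.left, stack); // work all the way down the left side
  evaluate(node.right, stack); // then the right side
  const right = stack.pop();  // pop in reverse order --
  const left = stack.pop();   // getting this backwards is the classic mistake
  switch (node.op) {
    case "+": stack.push(left + right); break;
    case "-": stack.push(left - right); break;
    case "*": stack.push(left * right); break;
    case "/": stack.push(left / right); break;
  }
  return stack;
}

// (1 + 2) * 3 evaluates to 9
const tree = {
  type: "binary", op: "*",
  left: { type: "binary", op: "+",
          left: { type: "number", value: 1 },
          right: { type: "number", value: 2 } },
  right: { type: "number", value: 3 },
};
console.log(evaluate(tree).pop()); // 9
```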

MB: This was one of those concepts that made sense to me but that I was nervous about doing wrong… it felt like something where I could make a basic mistake and not notice or not realize it. Like I could pop off and assign the values in the wrong order.

MB: Anyway it was great to push myself to just write the code and realize that— yes in fact the resulting program’s output was what I expected it to be…. But I knew I could do better. Once I got over my nervousness about using a stack that way it seemed silly to write strings rather than just evaluate the expressions in the tree.

MB: So my first functional implementation of my language will be a functional implementation of my language…. now what should I do with it?

MB: How scientific are your user research sessions? Because, at least in the product world, my experience has always been that they're very heavily anecdotal, and that's useful: just sitting in front of a user and making observations about their experience is often enough to bring something back for iteration. Whereas something like human-computer interaction research seems to be much more structured and scientific, with control groups and all of that. So where on that range would you recommend? Is it necessary to go all the way to the control-group side, or is there value in the anecdotal, or is there some middle ground we should stick to?

MC: Right. So. You can evaluate the sausage very carefully without being very careful about how you made it.

MB: (laughs)

MC: Right? Or you can think very carefully about how to make it. Right, so when you use user-centered design methods, you can get a bunch of feedback from users about… design prototypes that you came up with that didn't work, and why they didn't work. And then you feed those back into your design process to make a better design, and you keep doing this and eventually come up with a design that you think is pretty good. And then you want to know: OK, is it better than something that already exists? I mean, you know, have I actually achieved something?

MB: Yeah.

MC: Or have I been kind of spinning my wheels and not made any progress? So at that point you can do an empirical study to do a comparison between the thing that you created and some earlier thing that you maybe started with or are comparing to. And so that's a scientific evaluation process. Or maybe I should say an empirical evaluation process. But then the question is: OK, I did a bunch of work to refine that design, and to what extent is that work generalizable? That kind of gets at to what extent this is scientific, you know, to what extent have I learned generalizable knowledge that other people might be able to apply? And it's hard to know in the moment. Right? So you discover people have some problem with some design approach that you had, and then you refine your design and they don't have the problem anymore. So does that mean that if somebody else makes a similar design decision, they're going to encounter the same design problem that you did?

MC: That's not clear, right, so, you know.

MC: How you generalize from these studies is not necessarily clear. In the formative part, one of the things that we do is ask people afterward for their opinions, and you can try to glean from some of that, you know, what parts of the system worked for them and what parts of the system didn't work for them. Yeah, I think there's an ongoing discussion about, you know, what parts of this will generalize and what parts don't.

MB: I don’t know that I caught on to his point about science being generalizable until I listened to our conversation again. It really highlights for me the idea of goals. What are our goals?

MB: About the same time I had this conversation I started learning about these things called language workbenches. They're tools for building domain specific languages. While ANTLR and Bison and other tools will generate parsers from a grammar, workbenches will generate almost everything: parsers, linters, IDE integrations, a whole language in a box.

MB: (FYI, if you want to check this out, the two tools that seem the most mature are JetBrains MPS and an Eclipse tool called Xtext.)

MB: When I saw these tools for building languages I couldn't help thinking to myself, shit…. am I going about this the wrong way? But then I realized that …. so far, I've built and rebuilt elements of this language several times in different ways. The first time I built something "from scratch"; the next few times I'll use more advanced tooling and boilerplate … each time I build it I have a different experience and learn different things.

MB: And by the same token, each type of user research, from the very simple just talking to people to the very structured control group study, will illuminate something different. So it really all comes down to goals. If I just wanted to build a language I could load the grammar into a workbench and throw my nice generated code online … but I do want to understand how these things work, and I do want to build something that people will actually use to model things.

MB: So I have to just keep iterating until I accomplish those goals.

MB: How do you find your users in general?

MC: Yeah, so this is always a challenge. You know, I did that work at a university. So at a university, it's easy to get access to students.

MB: You put up the flyer with the little tabs, like, "please come and we'll pay you twenty-five dollars to program a computer" sort of thing?

MC: Yeah… did some of that. Emails to degree programs, that's a really good way, because then you get students who are in the right programs. Like, for example, we had a master's of software engineering program.

MC: And so a lot of those people had some industrial experience.

MC: And there's some evidence that a lot of the practicing software engineers in the industry have somewhere between zero and five years of experience. So if you get people in an MSE program who are generally either right out of undergrad or a few years out, there's an argument that that kind of reflects a significant fraction of, you know, the developer workforce.

MC: It obviously does not reflect all of it. So there are other people, you know, at companies, who have kind of easier access to those kinds of users. And a lot of people are doing that work too, you know, recruiting via the Internet, collaborating with industrial partners, that kind of thing.

MB: You've been listening to Marianne Writes a Programming Language. I'm going to take a break for a few weeks while I do user testing and some more research. As fun as the stream-of-consciousness episodes are, I very much prefer to know what the hell I'm talking about (mostly).

MB: BUT! The prototype that I talked about in this episode… that's online. If you're down for doing a little passive user testing you can play with the pre-alpha version of Fault by going to fault-lang.herokuapp.com. That's (F-A-U-L-T dash L-A-N-G).
