Wednesday, May 22, 2013

The future of software, the end of apps, and why UX designers should care about type theory

The programmer, like the poet, works only slightly removed from pure thought-stuff. He builds his castles in the air, from air, creating by exertion of the imagination. Few media of creation are so flexible, so easy to polish and rework, so readily capable of realizing grand conceptual structures... Yet the program construct, unlike the poet's words, is real in the sense that it moves and works, producing visible outputs separate from the construct itself. […] The magic of myth and legend has come true in our time. One types the correct incantation on a keyboard, and a display screen comes to life, showing things that never were nor could be.

Fred Brooks

Software is a profound technology with enormous potential, and we are stifling this potential with a bad metaphor. That metaphor is the machine. Currently, we organize software into noncomposable, static machines called applications. These applications ("appliances" is a better word) come equipped with a fixed vocabulary of actions, speak no common language, and cannot be extended, composed, or combined with other applications except with enormous friction. By analogy, what we have is a railway system where the tracks in each region are of different widths and where trains and their cargo must be disassembled and reassembled to transport anything across the country. As ridiculous as this sounds, this is roughly what we do at application boundaries: write explicit serialization and parsing code and lots of tedious (not to mention inefficient) code to deconstruct and reconstruct application data and functions.

So we need to cast aside this broken metaphor and reconceptualize user software as what it actually is--a programming environment which is at its core creative, extensible, and composable.

It's understandable why we still think to build machines using software. Before its discovery, arguably in the 1930s with Alan Turing's invention of the universal Turing machine, human technology had produced only tools and machines--physical artifacts like cash registers, engines, and light bulbs, built for some particular purpose and equipped with a largely fixed vocabulary of actions. With software came the idea that behavior and functionality could be specified as pure information, independent of the machine which interprets them. This raised novel possibilities. As pure information, a program is infinitely copyable at near zero cost, and in the internet age, capable of being transported anywhere on the planet almost instantaneously. A programmer can now miraculously turn thoughts to reality and deploy them around the globe by typing on a keyboard and clicking a few buttons.

With the introduction of software, the machine, which once held primacy for the artifacts and technology produced by civilization, was relegated to an implementation detail, a substrate for the real technology--the specification of behavior in the form of a program. But despite the paradigm shift, we've held on to the notion that technology invariably takes the form of machines with a fixed vocabulary of actions, and we build our applications in this image.

Now stop and think about it for a moment--it is rather strange that we take this infinitely flexible medium, software, and then reserve a tiny, select group of people (programmers) to use its power build applications with largely fixed sets of actions (and we now put these machines on the internet too and call them "web applications"), and whose behaviors are not composable with other programs. Software lets us escape the tyranny of the machine, let's not just use it to build more machines!

So here is what I want to argue: applications ultimately can and should be replaced by programming environments, explicitly recognized as such, in which the user interactively creates, executes, and inspects programs. Interacting with the computer then is fundamentally an act of creation, the creative act of programming, of assembling language to express ideas, access information, and automate tasks. Software presents an opportunity to help humanity harness and channel "our vast imaginations, humming away, charged with creative energy". And as programmers and designers, our goal is to let the natural light of programmability and customizability shine through, not build machines with a fixed set of actions determined ahead of time when we cannot possibly satisfy all the ideas users may wish to express. (And have you noticed how applications accrete feature after feature, but never seem quite capable of doing everything we want?)

The application-oriented viewpoint wouldn't be so bad if applications were composable and could be easily used as building blocks for more complex programs, but applications invariably attempt to assert themselves as the center of the universe and aren't optimized to work in concert or harmony with other applications as part of some larger system.

Bob: Now, wait a minute. Applications usually have an API too, you know. If you really want programmability, why not just use the API?

Alice: I wouldn't say 'usually', but okay, in theory let's suppose that's true. In practice, even though I'm a programmer and could in principle customize the applications I use, I don't because of the friction of doing so. Each application is a universe unto its own, with its own language and idiosyncratic modes of interaction. The situation hasn't improved with web applications, which have somewhat converged on ad hoc JSON+REST protocols as the lingua franca of application programmability.

Bob: What's wrong with that? There are JSON parsers for every programming language under the sun! I even wrote a really fast, push-based, nonblocking parser in 5,000 lines of Java! It's pretty awesome. Check out how I optimized the parsing by hand-coding a switch-statement-based state machine for the parse table to reduce allocation rates and improve cache loc--

Alice: You're missing my point! Compare the overhead of calling a function in the 'native' language of your program vs calling a function exposed via JSON+REST. And no I don't mean the computational overhead, though that is a problem too. Within the language of your program, if I want to call a function returning a list of (employee, date) pairs, I simply invoke the function and get back the result. With JSON+REST, I get back a blob of text, which I then have to parse into a syntax tree and then convert back to some meaningful business objects. If I had that overhead and tedium for every function call I made I'd have quit programming long ago.

Bob: Are you just saying you want more built in types for JSON, then? That's easy, I hear there's even a proposal to add a date type to JSON.

Alice: And maybe in another fifteen years JSON will grow a standard for sending algebraic data types (they've been around for like 40 years, you know) and other sorts of values, like you know, functions.

Bob: Functions?? Are you serious? You aren't talking about sending functions across the internet and just executing them, that's a huge security liability!

Alice: Nevermind that for now. My point is--

Bob: --now wait a minute! You know, I was humoring you earlier by saying if you wanted programmability you could just use the application's API. Okay, for the sake of argument I'll grant that this can be rather inconvenient. But so what? You and I both know that 99.9% of users don't want to program or customize; they are perfectly happy with applications that do one thing, and do it well.

Alice: I wouldn't say 'perfectly happy', I'd say that users are resigned to the notion that applications are machines with a fixed set of actions, and any limitations of these machines must simply be worked around via whatever tedium is required. But of course they would think that--we've never shown our users software that didn't work just like a machine, so how could we expect them to know about the wonderful, customizable universe of possibilities that we programmers get to play in every day? This isn't a good state of affairs, it's sad, and we ought to start doing something about it! It isn't hopeless--in fact, I find that if you get users in the right mindset they are positively incessant about wanting to customize their user experience and the actions supported by an application. It's human nature, our inner spirit of creativity and invention that can never be truly squelched! When we are shown something of use or interest to us, some piece of functionality or data, we begin thinking up possible variations and combinations that also interest us or seem useful.

Bob: Okay, but let's be realistic. Do you really expect your users to be booting up text editors, running compilers, interpreting syntax and type errors and so forth just to get something accomplished?

Alice: Of course not--no user should have to put up with the arcane programming environments that we professional programmers have to endure on a daily basis. Then again, we shouldn't have to either! Which is why the goal of software should not be to build machines, but to build pleasing, accessible programming environments that delight and inspire our users to creation while facilitating the sharing and reuse of programming ideas! Yes, we can and should optimize these environments for programming in various domains, which could include graphical views and so forth, but we should still place these environments in a unified framework rather than in walled gardens of functionality like the current batch of (web) appliances... er, 'applications'.

Bob: So what are you saying? Get rid of Microsoft Word, Outlook, Gmail, Twitter, Facebook, and all the rest?

Alice: Yes! Or rather, I would deconstruct these applications into libraries and grant users access to the functions and data types of these libraries within a grand unified programming environment.

Bob: I want to talk more about that... but in any case, these applications you deride aren't just libraries, they are providing an intuitive interface to functionality that people find valuable, and we are going to need some sort of interface to this functionality that's better than a text editor and the command line. Providing this better interface is what applications do.

Alice: If the interfaces provided by these applications are so intuitive, why are there rows and rows of 'Missing Manual' and 'For Dummies' books covering just about every application under the sun? Applications are failing at even their stated goal, but they do worse than that. Yes, an application is an (often terrible) interface to some library of functions, but it also traps this wonderful collection of potential building blocks in a mess of bureaucratic red tape. Any creator wishing to build atop or extend the functionality of an application faces a mountain of idiosyncratic protocols and data representations and some of the most tedious sort of programming imaginable: parsing, serializing, converting between different data representations, and error handling due to the inherent problem of having to pass through a dynamically typed and insufficiently expressive communication channel! And that's if an application even exposes any significant portion of its functionality through an actual API, which they often don't. We can do so much better!

Bob: All right, I'll bite. Let's hear your story for how to organize the computing world without applications.

Alice: I'm glad you asked...

The world without applications

The 'software as machine' view is so ingrained in people's thinking that it's hard to imagine organizing computing without some notion of applications. But let's return to first principles. Why do people use computers? People use computers in order to do and express things, to communicate with each other, to create, and to experience and interact with what others have created. People write essays, create illustrations, organize and edit photographs, send messages to friends, play card games, watch movies, comment on news articles, and they do serious work too--analyze portfolios, create budgets and track expenses, find plane flights and hotels, automate tasks, and so on. But what is important, what truly matters to people is simply being able to perform these actions. That each of these actions presently take place in the context of some 'application' is not in any way essential. In fact, I hope you can start to see how unnatural it is that such stark boundaries exist between applications, and how lovely it would be if the functionality of our current applications could be seamlessly accessed and combined with other functions in whatever ways we imagine. This sort of activity could be a part of the normal interaction that people have with computers, not something reserved only for 'programmers', and not something that requires navigating a tedious mess of ad hoc protocols, dealing with parsing and serialization, and all the other mumbo-jumbo that has nothing to do with the idea the user (programmer) is trying to express. The computing environment could be a programmable playground, a canvas in which to automate whatever tasks or activities the user wished.

Let me give an example of the problems with the current application-oriented model, and show what possibilities are put out of reach by our current framing of software. Please don't get bogged down in the details, I'm just trying to be illustrative here.

Suppose Carol and Dave are a young, conscientious couple intent on being disciplined about saving for retirement. But, they want to enjoy their time together as well, and so as part of their budget, which they manage using Mint.com, they allocate $200 per month to a virtual 'vacation' fund which accumulates from month to month. They also keep a shared Google doc in which they both jot down ideas for places they'd like to go and things they might like to do. Periodically, they take a vacation, drawing ideas from this doc. They make sure to keep the total cost of the trip under the amount that has accumulated into their vacation fund, and then attribute the cost of the trip to their vacation budget so it is deducted by Mint.com.

So far so good, but Carol, who is the planner in the relationship, notices that whenever she plans a vacation for the two of them she's doing a similar sort of thing. First, she opens up the Mint.com application to see how much money has accumulated in their vacation fund. Next, she opens up the Google doc to remind herself of the possible locations for trips they could take. Then, she goes to Kayak.com and searches for plane flights under the budget price, taking care to reserve enough leftover money for booking a hotel (say, on Hotels.com) and whatever other expenses are to be expected on the trip. It's a complex process, with lots of information and factors to keep straight, and it must be repeated each and every time they wish to plan a trip. Carol wonders if it would be possible to automate this process somehow, at least partially. She'd like a program that extracts a list of locations from their shared Google doc, then gets a list of possible flights to these locations and a list of possible hotels, then filters out any flight+hotel combinations that exceed the budget, then gives her the opportunity to interactively filter and browse through possible results, perhaps even allowing for interactive adjustments to certain base assumptions like the daily cost of miscellaneous expenses while on vacation, the dates of the trip, etc. This would save a lot of time and make the planning process more fun and creative.

Unfortunately, this sort of thing just isn't possible today. Kayak.com and Mint.com both lack APIs! Mint lets users download their transaction history, but this history doesn't indicate how much money has accumulated in each budget category. Kayak is even worse--it lacks a search API entirely.

So it seems Carol and Dave would be reduced to screen scraping if they want to programmatically build on Kayak and Mint. Google docs at least comes equipped with an API, but it's an ad hoc XML over REST API and there's friction associated with its use due to having to parse XML and so on. Overall, the friction and overhead to implementing this automation idea is way too high to justify it, so Carol doesn't bother and just does everything manually, or worse, gives up on a dream vacation!

Now let's imagine how things could be. Kayak, Mint, and Google docs would be, first and foremost, libraries rather than applications. Each might come equipped with custom views or editing environments for writing and executing certain 'shapes' of programs, but these views would not be their primary (or only) mode of interaction, as they are now. Instead, the collection of functions and data types in these libraries would be primary, and accessible within the unified programming environment of the user's desktop. This programming environment, moreover, would allow for transparent access to remote functionality, so users could write programs that call functions exposed via cloud services as well as functions defined locally.

If that example seems contrived, here's a more 'serious' one: a widget-making business has a customer relationship management (CRM) application that's used by the sales team. For each potential client, they make notes about what widget features clients are most interested in. The company also uses some project management software that lets them track features, improvements, and fixes to the product, and group these into releases. Whenever the company rolls out a new version of the widget product, the sales team would like to cross reference the list of changes that can be extracted from the project management software with the list of all the clients or leads that would be interested in these changes. Moreover, it would be nice to be able to take this list of potential clients who might be interested in newly released features and perhaps even assemble a form email calling out the particular features or improvements in the new version that that particular client was interested in. The sales team can of course add any personal touches to the emails before sending them to the potential clients.

Today, this process might end up being done manually, which doesn't scale very well if a business has hundreds or thousands of 'live' sales leads and a large number of features that they roll out with each release. Even if both the CRM and the project management app come with APIs, there is quite a bit of friction involved in writing a program that 'speaks' both APIs and handles all the boring concerns like parsing, deserialization, error handling, and so on.

I just made up these use cases, and I could come up with hundreds of others. No one piece of software 'does it all', and so individuals and businesses looking to automate or partially automate various tasks are often put in the position of having to integrate functionality across multiple applications, which is often painful or flat out impossible. The amount of lost productivity (or lost leisure time) on a global scale, both for individuals and business, is absolutely staggering.

Bob: All right, I think I finally see what you're getting at. These are very old ideas, you know. Haven't you ever heard of the Unix Philosophy? In fact, I could probably implement most of your use cases with 'a very small shell script'.

Alice: You make it sound like Thompson and Ritchie invented the idea of composition. Mathematicians have been composing functions for hundreds (or even thousands) of years before that without making such a fuss about it or waving any sort of philosophical flag. But anyway, I would love to see you try to implement those tasks with a shell script, as you say. Have you ever tried reading a shell script written by someone else that's longer than 10 lines or so? I'm a professional programmer, well-trained in navigating all the arcane nonsense that's common in software, and a small part of me dies every time I have to write or read a bash script. I appreciate the spirit of the Unix Philosophy, but the implementation, of writing programs in a terrible language that read and write 'vaguely parseable text' leaves a lot to be desired. And JSON and XML aren't much better, either.

Bob: So you really think that Carol and some sales guys are going to be writing programs, even if it is some theoretical future souped-up graphical programming environment?

Alice: Why does that seem so unlikely to you?

Bob: Because writing software is complicated! I know because I'm a professional programmer. We can't expect the masses to be writing the sort of complex program that we professional programmers produce.

Alice: 'Complex programs'? You mean like Instagram? A website where you can post photos of kittens and subscribe to a feed of photos produced by other people? Or Twitter? Or any one of the 95% of applications which are just a CRUD interface to some data store? The truth is, if you strip applications of all their incidental complexity (largely caused by the artificial barriers at application boundaries), they are often extremely simple. But in all seriousness, why can't more people write programs? Millions of people use spreadsheets, an even more impoverished and arcane programming environment than what we could build.

Bob: Maybe so, but I still don't think that a programming environment can ever be accessible to the majority of people. Spreadsheets are a good example--they are a rather accessible (if limited) form of programming, and not everyone uses the programmability of spreadsheets or even wants to!

Alice: And two thousand years ago, most of the population was illiterate and arithmetic was considered too difficult for the average person, yet now we teach kids these things in elementary school. The truth is, we don't really know how many people might program if given a learnable programming environment and programming were reduced to its exhilarating, creative essence. I worry we have raised generations of programmers who are simply very good at tolerating bullshit and, paraphrasing Paul Lockhart, the most talented programmer of our time may be a waitress in Tulsa, Oklahoma who considers herself bad at computers. The spreadsheet brought programming (in a limited fashion) to millions of people, and a more accessible environment could bring it to millions or billions more. Who are you, with your limited imagination, to place a ceiling on how accessible programming could be? Well, the world is what we make of it, and I want to make a world in which applications die off, programming is no longer the awkward, arcane and tedious process it often is today, and where the internet is used to transparently share, use, and compose functionality across the internet. Which brings me to my next point...

What's wrong with the internet

The internet contains vast pools of data and functionality largely trapped within noncomposable applications all competing to be the center of the universe.

The economy of the internet is deeply broken. Have you ever wondered why the internet market is dominated by a few huge businesses like Google, Facebook, Twitter, etc? High transaction costs imposed by application boundaries have distorted the software economy, making it artificially expensive to integrate functionality from third-parties. This selects for larger businesses with the resources to develop and integrate functionality internally, which they do using composable libraries within their own application boundaries. From here, network effects due mostly to high switching costs (again, because of application boundary friction) sustain the positions of these larger market players. We essentially have a situation in which these larger market players own a significant portion of the network effects on the web. It would be preferable if ownership of these network effects were transferred to the public domain and businesses were forced to compete on their ideas and cleverness in describing these ideas in software, rather than competing as they do now on how well they can coax users into entering various walled gardens and keep them there with lock-in and high switching costs. With a unified programming environment spanning the web (I'll say more about this in another post), we could see these transaction costs and switching costs drop to nearly zero and a radical democratization of the internet market as ownership of these network effects is transferred to the public domain.

Unlike the production of many physical goods and services, software does not have any natural economies of scale. Arguably, there are diseconomies of scale with software--per unit of functionality, software becomes harder to write with the addition of more people, resources and code, because of the complexity of managing a large codebase and coordinating concurrent development. Large businesses with significant codebases fight a constant (losing) battle against entropy and employ armies of developers to maintain and make rudimentary additions to functionality. The 'economies of scale' with software are almost entirely due to artificially high transactions costs caused by the application-centered world view and the lack of a unified computational framework owned by the public. As a civilization, we would be better off if software could be developed by small, unrelated groups, with an open standard that allowed for these groups to trivially combine functionality produced anywhere on the network.

What I am proposing is a radical shift that could mean the end of huge internet businesses like Google and Facebook. Or rather, it means that Google and Facebook would be forced to compete on functionality with programmers all over the world, any of whom could write similar functionality that could be substituted for Google/Facebook functionality with literally zero switching costs. Oh, I might use Google as 'cloud provider', a place to stick my data and my computations, but this would be using Google as a commodity, an implementation detail, much the way I use the physical computer on which I type this right now. At any point, I could choose to transfer my data and personal functionality to another cloud provider, again with zero switching costs. And while we're at it, perhaps we could dispense with cloud providers entirely and replace them with a peer-to-peer network in which individuals share compute time and local storage!

Bob: I wouldn't knock Google, Twitter, and Instagram... they are serving literally millions of concurrent users. That's a serious technical challenge, you know.

Alice: A serious technical challenge that has been created artificially! In the world I envision, the (limited) functionality of sites like Twitter could be written as a library and then used in a decentralized way by anyone connected to the internet. Writing such a library would require no servers, no capital, and could be completed by a programmer (or user) in a weekend! Think about it--if I write quicksort as a library function, is there any 'serious technical challenge' in making it possible for my function to be used by millions of users? No, of course not, because my function is pure information and can be transported all over the world and run by a billion people simultaneously, without my having to do anything other than put the code somewhere connected to the internet. But for some strange reason, if I write a function that operates on the follows-graph maintained in an (unnecessarily) centralized way by Twitter, I need to deal with all sorts of complexity if I want this function to be used by more than a few hundred people concurrently? Twitter (and Facebook, and Instagram, and Google) are solving problems created by the 'application as center of the universe' viewpoint that is so common today.

Bob: Even so, I think you are vastly underestimating the complexity of the software that these companies produce. These companies are coordinating the activities of fleets of computers, doing error handling and recovery, and wrapping up often complex functionality in nice, usable interfaces (which by the way have seen many man months worth of tuning and testing) that you do nothing but complain about! We have it so easy!

Alice: And yet, I still can't get Gmail to do even simple tasks like schedule an email to be sent later or batch up all incoming emails containing a certain phrase into a weekly digest! By the way, I just thought up those use cases on the spot, I could think of dozens more that aren't supported. The problem is, I don't want a machine, I want a toolkit, and Google keeps trying to sell me machines. Perhaps these machines are exquisitely crafted, with extensive tuning and so forth, but a machine with a fixed set of actions can never do all the things that I can imagine might be useful, and I don't want to wait around for Google to implement the functionality I desire as another awkward one-off 'feature' that's poorly integrated and just adds more complexity to an already bloated application.

Bob: Okay, if you want a toolkit, how about Yahoo! Pipes, or If This Then That? Aren't those sort of what you want?

Alice: Absolutely not. For one, I don't want my data and functionality locked up with a particular provider like that. I want an open platform. Who knows when Yahoo! might kill off Pipes or start changing inordinate sums of money for it, and who knows if ITTT is going to even be around a year from now given that they seem to have no business model. I would only use these services for throw-away code I don't care about. Have you ever noticed that all the programming languages people use voluntarily are open source? I think it's because no one wants their creations owned by anyone. But beyond that, the bigger reason I don't like these services is that I want a real programming language, with a real type system that lets me assemble complex functionality with ease and guides me through the process.

Why UX designers should care about type theory

Applications are bad enough in that they trap potentially useful building blocks for larger program ideas behind artificial barriers, but they fail at even their stated purpose of providing an 'intuitive' interface to whatever fixed set of actions and functionality its creators have imagined. Here is why: the problem is that for all but the simplest applications, there are multiple contexts within the application and there needs to be a cohesive story for how to present only 'appropriate' actions to the user and prevent nonsensical combinations based on context. This becomes serious business as the total number of actions offered by an application grows and the set of possible actions and contexts grows. As an example, if I just have selected a message in my inbox (this is a 'context'), the 'send' action should not be available, but if I am editing a draft of a message it should be. Likewise, if I have just selected some text, the 'apply Kodachrome style retro filter' action should not be available, since that only makes sense applied to a picture of some sort.

These are just silly examples, but real applications will have many more actions to organize and present to users in a context-sensitive way. Unfortunately, the way 'applications' tend to do this is with various ad hoc approaches that don't scale very well as more functionality is added--generally, they allow only a fixed set of contexts, and they hardcode what actions are allowed in each context. ('Oh, the send function isn't available from the inbox screen? Okay, I won't add that option to this static menu'; 'Oh, only an integer is allowed here? Okay, I'll add some error checking to this text input') Hence the paradox: applications never seem to do everything we want (because by design they can only support a fixed set of contexts and because how to handle each context must be explicitly hardcoded), and yet we also can't seem to easily find the functionality they do support (because the set of contexts and allowed actions is arbitrary and unguessable in a complex application).

There is already a discipline with a coherent story for how to handle concerns of what actions are appropriate in what contexts: type theory. Which is why I now (half) jokingly introduce Chiusano's 10th corollary:

Any sufficiently advanced user-facing program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of a real programming language and type system.

Programming languages and type theory have largely solved the problem of how to constrain user actions to only 'appropriate' alternatives and present these alternatives to users in an exquisitely context-sensitive way. The fundamental contribution of a type system is to provide a compositional language for describing possible forms values can take, and to provide a fully generic program (the typechecker) for determining whether an action (a function) is applicable to a particular value (an argument to the function). Around this core idea we can build UI for autocompletion, perfectly appropriate context menus, program search, and so on. Type systems provide a striking, elegant solution to a problem that UX designers now solve in more ad hoc ways. These ad hoc methods don't scale and can never match what is possible when guided by an actual type system and the programming environment to go with it.

The work that remains is more around how to build meaningful, sensitive, real-time interfaces to the typechecker and integrate it within a larger programming environment supporting a mixture of graphical and textual program elements. Note that the richer the type system, the more mileage we get out of this approach.

Conclusion

I'll conclude with a great quote by Rúnar Bjarnason, explaining how we got to this point, and what's wrong:

In the early days of programming, there were no computers. The first programs were written, and executed, on paper. It wasn't until later that machines were first built that could execute programs automatically.

During the ascent of computers, an industry of professional computer programmers emerged. Perhaps because early computers were awkward and difficult to use, the focus of these professionals became less thinking about programs and more manipulating the machine.

Indeed, if you read the Wikipedia entry on "Computer Program", it tells you that computer programs are "instructions for a computer", and that "a computer requires programs to function". This is a curious position, since it's completely backwards. It implies that programming is done in order to make computers do things, as a primary. I’ll warrant that the article was probably written by a professional programmer.

But why does a computer need to function? Why does a computer even exist? The reality is that computers exist solely for the purpose of executing programs. The machine is not a metaphysical primary. Reality has primacy, a program is a description, an abstraction, a proof of some hypothesis about an aspect of reality, and the computer exists to deduce the implications of that fact for the pursuit of human values.

Though the post talks specifically about not creating our programming languages in the machine's image, we should apply the same reasoning to the useful bundles of data and functionality that we now call 'applications'.

So there you have it. The machines are no longer primary. End the tyranny of applications!

Saturday, February 23, 2013

Why functional code is shorter

Was rereading John Carmack's Functional Programming in C++ post. For the most part, it's an excellent discussion of FP and its benefits, but there are a few points I wanted to dissect. Here, Carmack suggests the conciseness of functional programs has more to do with features like pattern matching and list comprehensions:

The process of refactoring towards purity generally involves disentangling computation from the environment it operates in, which almost invariably means more parameter passing. This seems a bit curious – greater verbosity in programming languages is broadly reviled, and functional programming is often associated with code size reduction. The factors that allow programs in functional languages to sometimes be more concise than imperative implementations are pretty much orthogonal to the use of pure functions — garbage collection, powerful built in types, pattern matching, list comprehensions, function composition, various bits of syntactic sugar, etc. For the most part, these size reducers don’t have much to do with being functional, and can also be found in some imperative languages.

Features like pattern matching and list comprehensions can indeed make code shorter, and they are convenient, but they're just constant factors. Like Carmack says, imperative programs could have access to these features too. The real decrease in code size when using FP comes from there being exponentially more code reuse possible due to the compositionality of pure code. I don't mean compositionality in the narrow sense of function composition, I mean it in the sense of being able to assemble complex functionality by sticking together simpler components in various ways. And compositionality dwarfs all other factors affecting our ability to write complex software.

The 'convenience' features are important more because they reduce the friction of composition (for instance, if your syntax for function literals are too heavyweight you are discouraged from writing your list-processing code by composing use of higher-order functions like map and filter, and you are discouraged from ever writing higher-order functions yourself).

We have only just begun to appreciate how compositionality can change the nature of software. I keep getting surprised by it. The progression of functional programming shows the rabbit hole goes very deep--yes, we can write loops with less boilerplate using map and filter, but we can also write combinator libraries, which let us assemble programs compositionally within a domain. Going further, we can abstract over commonalities across libraries, letting us assemble libraries for a domain from a preexisting suite of compositional library construction components (typeclasses like Monad, Applicative, and so on). And further patterns emerge from there--FP has now discovered patterns of architecture, like stream processing libraries, in which the typeclasses themselves are simply building blocks.

Moving on, another point Carmack mentioned was that we can't maintain purity throughout our programs since we eventually have to interact with the outside world:

Not everything can be pure; unless the program is only operating on its own source code, at some point you need to interact with the outside world. It can be fun in a puzzly sort of way to try to push purity to great lengths, but the pragmatic break point acknowledges that side effects are necessary at some point, and manages them effectively.

One of the big discoveries of FP is that while effects need to occur, observable side effects are not necessary. This is more than an accounting trick; we can indeed write programs that have effects on the outside world and which mutate data, etc, all while preserving referential transparency of the overall system and letting us program in a nice compositional style. FP is not in any way a restriction on what can be expressed, only a restriction on how it is expressed. It only requires us to be 'honest' about what is occurring, not limiting what functionality is expressible.

Tuesday, February 12, 2013

The design of a decentralized currency (part 1)

This is the first of a series of blog posts discussing issues involved in the design of a decentralized (central bank free) system of digital currency. The goal is a currency that avoids typical pitfalls associated with fiat currency and central banking, as well as problems associated with commodity-backed currencies like a gold standard or simplistic digital systems like Bitcoin. I am interested in exploring the idea that currency design is essentially a computational problem, and that the functions of a central bank might be better handled by a decentralized 'smart' currency that becomes possible with software in the digital world.

This should hopefully be a very beginner-friendly series of posts, though it should be of interest and accessible to both economists and programmers. I will start from elementary first principles and use the opportunity to introduce topics and terms that are relevant. (For instance, for programmer and computer scientist readers, I won't assume any prior knowledge of finance, monetary systems or banking, and for economist readers, I won't assume any knowledge of distributed algorithms, cryptography, and so on)

The purpose of money

Before launching into the design of a monetary system, we need to answer a fundamental question: what is the purpose of money?

Let's consider a hypothetical economy in which there are just two goods, apples and oranges. Suppose that hypothetical market participant Alice has an apple, and Bob has an orange, but Alice would really prefer to eat an orange (Bob's, or really any comparable orange!) and likewise Bob would really prefer an apple. Even without money involved, in a strictly barter economy, Alice and Bob can trade--Alice gives Bob her apple in exchange for Bob's orange. (Aside: Alice and Bob are common placeholder names used in discussions of crypography)

Let's notice some things:

  • If this transaction is voluntary, which we have assumed it is, then both Alice and Bob are better off as a result of making this exchange, otherwise they would not have agreed to it. We say the transaction is mutually beneficial to all parties involved.
  • Actually, to be a bit more precise, Alice and Bob will agree to the exchange so long as they both believe they will be better off. In reality, each may learn that they are not better off after all. For instance, Alice may discover that Bob's orange was in fact rotten and regret the exchange (after all, a perfectly good apple is better than a rotten orange)--that is, the exchange may have been made with bad or imperfect information. Or Bob may be giving up his orange when in fact, if he were really being honest with himself, he much prefers oranges--that is, Bob may be irrational. Although these are real problems and econonomists have studied them extensively, we are not going to talk about them here, since they are orthogonal to the problem of designing a monetary system.

Let's now introduce currency. Suppose Alice has $1, and an apple, but would prefer an orange, and Bob has an orange. Assume the 'going rate' (the price) for oranges in this economy is $1--Alice pays Bob $1 for his orange, and he turns around and pays Alice $1 for her apple (again, assuming the going rate for oranges is $1). The same exchange has been conducted as before, but it has been mediated by the use of currency. We can think of currency as enforcing a very simple tit-for-tat system. If Alice provides something of value to Bob (say, her apple), she obtains a token (the $1) that she can use to obtain something of equivalent value from someone else (or multiple people), perhaps Bob, or perhaps 50 others, from whom she purchases a sunflower seed each (for Alice, assuming she would trade her apple for 50 sunflower seeds).

But the use of money can conveniently facilitate a number of other mutually beneficial transactions beyond what simple bartering can achieve, for instance:

  • Suppose Alice's true favorite fruit is mangos and Carol has a mango but would prefer an orange. Alice can buy Bob's orange for $1, then use the proceeds to purchase a mango from Carol, which Carol then uses to purcahse an apple from Alice. A mutually beneficial exchange has taken place in which Alice, Bob, and Carol each end up with their preferred fruit, but Bob and Carol did not even need to know about each other, nor did the three parties have to get together and agree on any sort of explicit contract!
  • Alice may decide she's getting a bit tired of mangos. She sells her mango now, and a few weeks later, after her love of mangos has returned, she buys another mango from someone else. Alice has traded 1 mango in the present in exchange for a mango in the future. This sort of exchange, trading something now for something in the future, raises further questions that we'll discuss in the next post.
  • Fast-forward to the modern economy. Alice may work as a rocket scientist for Acme Rockets, Inc., where she earns a salary. She buys a cup of coffee one morning for $5. She has effectively traded a portion of her time and expertise as a rocket scientist for a cup of coffee. The coffee shop does not need to know or care about rockets--the shop will use the proceeds from the sale to purchase other goods and services it values.

Even these do not even begin to exhaust the possible mutually beneficial transactions that we can imagine. But in the case of each such transaction, we could imagine writing up a contract that specifies who is to provide and receive what, and getting all parties to agree to and sign the contract. For instance:

Effective June 3, 1999, at 3:45pm EDT:

Alice is to provide an apple, 
Carol is to provide a mango, and 
Bob is to provide an orange,

AND

Alice is to receive a mango, 
Carol is to receive an orange, and 
Bob is to receive an apple. 

Signed, 

Alice: ___________________
Bob:   ___________________
Carol: ___________________

Currency can be thought of as a much more convenient way to arrange and execute these sorts of contracts, which could in principle be arbitrarily complex, involve thousands of people, multiple timescales, and so on (we'll see examples of this later).

The more technical way to state that money is a more convenient is that transactions mediated by currency have lower transaction costs than direct barter or using explicit contracts. In general, when multiple parties enter into some mutually beneficial transaction, there are some costs to doing so. I am using 'cost' in a more abstract sense here. The cost could be a literal fee, like a sales tax imposed by a government, or it could simply be some burden the parties to the contract will have to bear for the transaction to take place. In a direct barter economy, these costs would include, among other things, the cost to Alice and/or Bob of traveling to the same physical location to conduct the exchange (or having to pay for some shipping service), the extra work that Alice must do to first to obtain an orange from Bob before finding Carol and exchanging for her mango, and so on. If transactions are conducted with explicit contracts, there will be overhead to writing up the contract, having all parties read and understand it, perhaps hire lawyers, etc. (Not to mention paying for the system of government likely needed to enforce such contracts, mediate disputes, and so forth)

Even more abstract, if buyers and sellers have difficulty finding each other, this can result in a form of transaction costs if the seller has to accept a lower price than he would otherwise be able to obtain if buyers and sellers could find each other more easily. For instance, suppose Alice doesn't know about Bob, but only knows about Dave, who demands two apples in exchange for an orange. Alice may decide this transaction wouldn't benefit her and decide to keep her apple, when if she and Bob could have been paired up they could have made an exchange that benefited them both. The technical way to state this is that insufficient liquidity can result in higher transaction costs. We will discuss this much more in a later part.

Transaction costs are a kind of friction in the economy. They prevent what would otherwise be mutually beneficial transactions from taking place. It would be preferable to Alice not to have to travel to wherever Bob and Carol are; it would be preferable to Alice not to have to pay a lawyer to write up an explicit fruit-exchange contract, or even to have to read a contract at all, even if comes in an email and she has to spend fifteen seconds reading it before replaying 'Yes'.

We can now state the primary goal of a monetary system: facilitate mutually beneficial transactions, with a minimum of transaction costs. We can see that, all else being equal, we would prefer that transaction costs be as low as possible, to promote the maximum number of mutually beneficial transactions.

From here, things will get more complicated. From here we will discuss banking, loans, interest rates, inflation, economic growth, the velocity of money, aggregate demand, etc. It's easy to get lost in these discussions, to forget that money has no intrinsic value (nor should it!) and is merely a more convenient way of arranging, agreeing to, and executing mutually beneficial contracts. Any collection of exchanges of money and goods could in principle be represented using explicit contracts--even exchanges involving billions of dollars, multiple countries, timescales, and huge numbers of people. When deciding on how to engineer a monetary system, we will return again and again to the perspective: what will ensure that parties can participate in mutually beneficial transactions, with a minimum of transaction costs?

By the way, I realize these examples involving Alice, Bob, and Carol might seem silly or trivial. And this is exactly the point; simple examples like this are often the most illuminating. I'll end with a relevant quote from Paul Krugman, from an essay where he uses a hypothetical economy producing only hot dogs and hot dog buns (!) to reason about questions of international trade:

You can't do serious economics unless you are willing to be playful. Economic theory is not a collection of dictums laid down by pompous authority figures. Mainly, it is a menagerie of thought experiments--parables, if you like--that are intended to capture the logic of economic processes in a simplified way. In the end, of course, ideas must be tested against the facts. But even to know what facts are relevant, you must play with those ideas in hypothetical settings.

We're only just getting started! Stay tuned for part 2.

Thursday, January 19, 2012

Possible projects for Boston Haskell Hackathon: smarter evaluation strategies and refactoring combinators

This weekend I'm going to be participating in the Boston Haskell hackathon. I'm very excited about it and I have a couple idea for projects to work on. If any of these sound interesting or you are thinking of something similar, I'm looking for people to collaborate with! Send me an email or come talk to me in person! I think I'm going to get there sometime in the late afternoon to early evening on Friday and I'll be around all weekend.

Evaluation strategies for Haskell that don't leak space

The first project is doing some research / prototyping of an alternate evaluation strategy with the same termination properties as normal order evaluation, but with much easier reasoning about space usage. For lack of a better name, I'm calling it specializing, strictness-propagating evaluation. In this model, calling a function is something like two communicating coroutines. When calling a function, the callee begins evaluating its body, yielding control back to the caller when it needs its first argument, and also indicating whether that argument should be strict or lazily passed, using whatever information is available at runtime. Subsequent arguments work similarly. As a result, functions are automatically specialized as arguments are passed, and we do not construct thunks if they are going to be consumed strictly by a subsequent callee. This can be implemented efficiently using just two call stacks, and there are various optimizations to the scheme. It is intended to augment, not replace, the existing static strictness analysis and argument passing.

Here's an example working through this for the if function, which let's assume has the following implementation:

foldl f z l = case l of 
  [] -> z
  h:t -> foldl t f (f z h)

I'm also going to assume we've done some static strictness analysis to determine that all branches evaluate z and that therefore the h:t branch evaluates f (since all branches evaluate z and in the h:t case, f appears at the head of an expression passed as z). Suppose we call this with foldl (+) 0 [1,2,3,4].

  1. Caller pushes foldl onto the call stack. foldl begins evaluating with no arguments. It gets as far as the case l. It will then request l strictly, since it is about to evaluated it anyway.
  2. To request the argument, foldl pops its currently running frame from the call stack and pushes it onto the save stack. It then resumes the caller now at the top of the call stack with an argument of strict.
  3. The caller passes the argument as requested - if the caller were itself receiving these arguments as function parameters, it would propagate the strictness request of foldl to its caller.
  4. To resume foldl, it pops foldl from the save stack and pushes it onto the call stack, giving it the (strictly evaluated) list it requested.
  5. Now we hit the interesting case: inside the h:t branch, we know that z is strict (this is known statically). We also know that f can now be evaluated, so we request this argument strictly from our caller. With f now evaluated, we can propagate its stricness information. We know we will be evaluating f z h - what we did not know until runtime was that f was plus (let's just say it was + specialized to Int), and therefore static SA has no choice but to pass f z h as a thunk. We now know that f is strict in both its arguments, so the call to f z h means we can fully evaluate z (which we do by requesting z strictly from our caller), h, and then f z h.

Each step of the iteration works similarly and foldl ends up running in constant space. I'm handwaving a lot here, but in general I want an evaluation order that is totally predictable in its space usage - values are immediately forced as soon as their consuming functions are known at runtime. The consuming functions tell us if an argument will ultimately be forced so we find out sooner rather than building up enormous thunks.

This needs some serious whiteboarding, but assuming it is at all sensible, here's what I propose doing:

  • Come up with an instruction set for this evaluation model, and write a simple interpreter for it
  • Write a compiler for a toy functional language to this instruction set, including the basic static analysis needed to kickstart the dynamic analysis
  • Try writing some programs with it

Some other interesting ideas - I wonder if there's some way to embed this evaluation model in GHC itself.

A code database for Haskell and refactoring combinators

The other project I'm interested in working on is a code database for Haskell, and a Datalog interpreter to go with it. Using this database and the datalog query language, I then want to implement a set of refactoring combinators. A "refactoring" is simply a compilation-preserving function from one code database to another. I've started tinkering with a set of combinators that individually preserve compilation and can be composed to allow arbitrary code transformations. I wrote up some ideas for that here:

... Refactoring times in this new model will go from weeks or months to hours, and writing code to transform a codebase will become a separate but critical skill, distinct from the usual act of programming. That is, programmers do not simply conceive of a refactoring (which is often quite simple to express to another programmer), then begin a tedious, manual and error-prone process of text munging to implement it. Instead, the programmer conceives of a refactoring, then conceives of a code transforming program to implement the refactoring, then applies this transformation to the code database, all in the span of a few hours.

... First, I am not advocating for datalog syntax. I don't care about that. The key functionality enabled by datalog over and above the relational algebra is the ability to express transitive closure and mutual recursion guaranteed to terminate. Together these features enable many of the common queries we'd like to express in transforming and querying our codebases. For instance, here is a hypothetical query to find all references to a given function id, fid. Don't worry if the syntax looks alien or doesn't make sense. The key is more that this query is just a few lines of code to express, and it can be reused and built upon.

-- propagate reference to containing apply
refs(Id) :- apps(Id, fid, _). 
refs(Id) :- apps(Id, _, fid).
refs(Id) :- refs(X), apps(Id,X,_).
refs(Id) :- refs(X), apps(Id,_,X).
-- any lambda whose body is or contains fid
-- is considered to reference fid
return(Id) :- lambdas(Id,_,Id1), refs(Id1).
return(Id) :- lambdas(Id,_,fid).

Much of the analysis required to implement refactorings has this sort of "transitive-closure" feel to it - you need to do something to the "direct" callers, then do some transformation for their callers as necessary, and so on.

Here's what I propose for this project:

  • Implement datalog, possibly backed by just in-memory data structures, or maybe tied to something like SQLite. Or if there's an existing free datalog interpreter and backend for it somewhere, let's see if we can use that.
  • Come up with the normalized datalog representation for the Haskell AST and type information - besides just the AST I think you'll need to know all the type information. Is there some way to use the GHC API to get the type of all expressions in the
  • Implement or steal a Haskell parser, and write code to translate this to the normalized datalog representation. As a proof of concept, take some existing Haskell project and "code-database-ify" it.
  • Come up with a good set of refactoring combinators. Implement them using datalog. As a proof of concept, use the combinators to express some nontrivial refactoring (like - make this value monadic rather than pure, and propagate the change in calling convention to all direct and indirect callers as needed - this is exactly the sort of refactoring that is trivial to describe to another Haskell programmer, and is totally mechanical, but is still done via a tedious process of text munging)

If all this is too much, I propose not doing this for Haskell but instead for a toy functional language with a very simple AST and type system.

Wednesday, December 28, 2011

The future of programming

What will programming look like 10 or even 20 years from now? With another new year almost here, now is the time to wax philosophical about the future of our industry. We are on the cusp of a number of major transformations in programming that will make 2011 programming technology, techniques, and ideas seem primitive by comparison. The transformations will occur in several key areas: tooling and infrastructure, languages and type systems, and runtime systems.

Before I get started, let me preface this all with the disclaimer that this should be taken with a grain of salt, in the spirit of being provocative, and these are not really predictions per se. There are lots of reasons why inferior technologies might continue to dominate the industry longer than expected. I won't talk much about that here, although it's certainly interesting. No, instead I want to give my vision of the future of programming. If there were no accumulated network effects attached to all the mediocre technologies currently in use for programming, if the programming world were given a clean slate and the directive to invent the future, what would I want to result?

An overarching theme to all these transformations is the move away from incidental structure and its close cousin incidental complexity. Manifested in various forms, these factors are major barriers to the building programs of greater complexity, and many of the major transformations will be to enable lifting of these barriers. The resulting programming world will look much different--orders of magnitude more code reuse, ease of deployment, efficiency, and much more.

Where we begin

The first major change is the move away from storing programs as text, split across "files". You can be forgiven for thinking of this as a minor thing. Who cares how programs are stored or represented, right? In fact, this major source of incidental complexity has spillover effects all the way down the programming toolchain, starting with the pollution of programmers mental thought processes, continuing to development environments (which I'll discuss next), and then to build systems and deployment. Like many of the things I discuss here, the full extent of lost productivity due to this incidental complexity is below most programmers radars. Programmers have unconsciously accepted this lost productivity, to the point that many now rationalize their dependence on files and text for programming.

But what about intentional programming, you ask? Or Smalltalk development environments? The idea of moving away from programs as text spread across files is not exactly new, but the devil is in the details. Previous efforts to rid ourselves of this incidental structure have failed in part because they merely substituted one arbitrary structure for another. For instance, intentional programming advocated the good idea of separating code presentation from its storage, substituting instead another format like XML with only slightly less incidental structure. Any advantage conferred by the new format and structure was not exploited to significant benefit, and there was a huge, very real cost in terms of lost network effects from all the text-based tools that could no longer be leveraged. Likewise for Smalltalk development environments. Also, somewhat ironically for Smalltalk, being an OO language, its fundamental premise includes one of the most arbitrary choices of all, in some sense the ultimate generator of incidental complexity, namely, the decision of which class should implement a particular method.

There is a real barrier to entry here for any new technology. It doesn't need to just be better than the antiquated status quo; it needs to be significantly so to overcome the network effects and switching cost advantage that status quo technologies inherently posses. In my opinion, none of the proposed text-in-file alternatives so far have come close to overcoming these handicaps.

But this is nothing fundamental. We should not conclude from these failed experiments that the idea is unsound. It's merely been poorly executed. The first major shift in getting this right is not storing programs as text at all, not even some normalized textual form like XML, and not some normalized binary form which nonetheless contains a huge amount of incidental structure. Likewise, get rid of files. Instead, a codebase is stored relationally, as a database, with some version of datalog being used for querying and update. Careful thought will be given to this query-update language so that many of the large-scale refactorings that are currently difficult or impossible can be fully automated, or guided automated, with just a few lines of this codebase transformation code. As a simple example, we eliminate the notion of where a function or datatype "is located". Instead, each function gets a unique id (perhaps content addressed, to enable a simple, automated form of code deduplication), and a separate names table maps these ids to names for purposes of rendering the code in some fashion to a human (and in principle, there is really no reason why there can't be multiple names tables, with different programmers choosing different names for the same operation). This representation trivially enables the "renaming" refactoring in just a single table cell update.

Let me sketch out a few additional thoughts on how this could work. First, I am not advocating for datalog syntax. I don't care about that. The key functionality enabled by datalog over and above the relational algebra is the ability to express transitive closure and mutual recursion guaranteed to terminate. Together these features enable many of the common queries we'd like to express in transforming and querying our codebases. For instance, here is a hypothetical query to find all references to a given function id, fid. Don't worry if the syntax looks alien or doesn't make sense. The key is more that this query is just a few lines of code to express, and it can be reused and built upon.

-- propagate reference to containing apply
refs(Id) :- apps(Id, fid, _). 
refs(Id) :- apps(Id, _, fid).
refs(Id) :- refs(X), apps(Id,X,_).
refs(Id) :- refs(X), apps(Id,_,X).
-- any lambda whose body is or contains fid
-- is considered to reference fid
return(Id) :- lambdas(Id,_,Id1), refs(Id1).
return(Id) :- lambdas(Id,_,fid).

Current IDEs, with their enormous development staff and reams of special purpose code, support a very limited subset of code transformation/querying operations, poorly, slowly, and with huge overhead. As a result, there is an entire subculture of programmers (myself included) who have largely rejected IDEs in favor of minimal, less feature-rich but more dependable text editors. Future development environments will take the refactorings now supported by the most advanced IDEs as the most insignificant starting point, and build from there.

Large scale refactorings are the inevitable consequence of achieving the reuse needed to build programs of significant complexity that don't die of heat death. Refactoring times in this new model will go from weeks or months to hours, and writing code to transform a codebase will become a separate but critical skill, distinct from the usual act of programming. That is, programmers do not simply conceive of a refactoring (which is often quite simple to express to another programmer), then begin a tedious, manual and error-prone process of text munging to implement it. Instead, the programmer conceives of a refactoring, then conceives of a code transforming program to implement the refactoring, then applies this transformation to the code database, all in the span of a few hours.

The code as database concept has spillover simplification effects in other areas of tooling, in particular version control. The problem of handling merges of ordered sequences of characters spread across files and directories with yet more arbitrary structure is extremely difficult, resulting in a huge amount of complexity in the area of version control software. The difficulties have led many to settle for what are arguably inferior VCS models (Git, Mercurial), where changesets form a total order and cherrypicking just doesn't work. In the future, with code databases, we'll see a resurrection of Darcs' patch theory, only this time, it will work exactly as expected, and will be trivial to implement.

Related to this notion of code databases, we get much more fine-grained dependency management. Dependency management is another serious problem for software development, a huge hindrance to code reuse, and once again, something that is simply below many programmer radars. A library will often introduce some new interface or abstraction and give several instances of that abstraction for particular concrete types. The library now depends on these concrete types being available for compilation. Including this library now means pulling in all these dependencies, even if you just require the one instance. The overhead of doing this in conjunction with the existing complexity of builds (again partially caused by the programs-as-text-in-files paradigm) means code is reused much less than would be possible or easy otherwise.

In the future, we'll be able to conditionally specify code, and not using an ad hoc, 1970s technology like the C preprocessor. The datalog query and update language will factor in here, and allow us to express things like: if a concrete type X is available in the codebase, define some set of functions and new datatypes. Likewise, dependencies will in general not be on "a library" or "a module". A function will depend only on the set of functions and types it references, and a codebase will be a much more fluid thing. We can extract any connected component of functions and datatypes from a codebase, and this is trivial, automated transformation supported by the query language. There are some unknowns here, all solvable, around versioning, and they are related to how we reconceptualize version control as a partial order of changesets, with no extraneous dependencies due to the representation of programs as ordered sequences of characters.

Code editing, IDEs, and type systems

Code editing will be done in structural editors, which will look nothing like the existing batch of IDEs that are little more than glorified text editors (and they are actually rather poor text editors). In a structural editor, the programmer will construct expressions which may have holes in them not yet filled with terms. Importantly, these structural editors will be type-directed, so for any given hole the programmer can be presented with set of values matching the expected type, ordered in some sensible way. The editor will perform local program search to enable autocompleting of multiple expressions. If you've ever seen someone live-code Agda, you'll know how powerful and productive this idea could be. Yeah, the actual interface for programming Agda is still kind of 1970s (a custom Emacs mode), but the idea of type-directed editors is powerful. It makes it clear that types are effective not just at preventing many software errors, but also in guiding development. They are a powerful tool to augment our puny programming brains.

Along these lines, the rise of type-directed development in languages with real, state of the art type systems will mark the beginning of the end for dynamically typed (aka, single-typed) languages. Dynamic typing will come to be perceived as a quaint, bizarre evolutionary dead-end in the history of programming. This is already widely accepted in some programming circles, where the general consensus is that most dynamic typing advocates are not familiar with type-directed development in a language with a real type system and are basically unaware of the state of the art in type systems and programming language theory. With some additional developments in type systems we'll see any last of any advantages to dynamic typing completely evaporate.

A related transition is that types will become even more critical and type systems will grow features to better handle data normalization, the lack of which is a major source of incidental structure and complexity. A big problem in large codebases is data normalization. Lack of data representation normalization is responsible for the significant amounts of plumbing code involved in aligning two pieces of code so they can talk to each other. Most of the mainstream programming world isn't really aware of this issue because code reuse in most imperative codebases is extremely limited (because side-effects don't compose well).

In the functional programming world, there is insane amounts of code reuse happening, but along with this comes a fair bit of plumbing. As a small example, consider a function, f, of type X -> Y -> Int, and a value, xs of type [(Y, X)] and suppose you want to apply f to the list. An experienced functional programmer will bust out map (uncurry f . swap) xs in about 3 seconds and not think twice about it. But this code is total boilerplate, there to convince the compiler this is what you really want to do, and is noise when reading the code. Yes, you are acheiving reuse (an imperative programmer would still be writing out a for loop for the millionth time instead of reusing higher-order functions) but it should be cleaner. If this code needs to exist (and I'm not sure it even does, see below), I'd rather get to this point in the editing process and then tell my editor to map f over xs. The editor will search for a program to make the types align, show me the program for confirmation if I request it, and then not show this subexpression in the main editor view, perhaps just showing map f* xs, where the * can be clicked and expanded to see the full program.

Even better, perhaps we can get away with much less plumbing in the first place. Plumbing can be eliminated from code in two general ways--the first, which I've just discussed, is to have plumbing code written for you automatically, then hidden unless requested. The second is to have more normalized types and type systems so that there is no incidental structure for code to broker between in the first place. To support this we'll see things like row types, unordered tuples, type sets, and so on, and any related type system machinery needed to make all this feasible and convenient. An idea I'm interested is explicit separation between the model underlying a type (which could be something very normalized, with no incidental structure), and views of that type, which may contain some additional structure. Functions will generally be written to operate over the model, from which any number of views can be reconstructed post-transformation. The compiler or runtime is generally smart about choosing runtime representations to avoid unnecessary round trip conversions between different representations.

Language runtimes

Non-strict languages will come to dominate the programming world, due to the increase in code reuse and modularity that comes with pervasive non-strictness. As I've argued before, optional laziness doesn't cut it. As with other issues I've mentioned, the problems with strict as default aren't apparent to most programmers, even those who view themselves as well-versed in functional programming. The problems with strictness only become fully apparent as you get much further along in the development of the FP style, in particular, after the discovery of combinator libraries and the further abstractions that develop to remove duplication across these libraries. This has been a source of tension in the Scalaz library, which supports functional programming in Scala, a strict-by-default language, and also in our Scala codebase at work.

The only real problem with laziness has to do with reasoning about space usage performance, and evaluation stack usage. These problems get a lot of play among people who enjoy rationalizing their ignorance of Haskell and FP in general, but the truth is some additional research and smarter evaluation strategies can address the problems. There's nothing fundamental here suggesting we should throw up our hands and resort to strict evaluation.

The usual normal order evaluation is guaranteed to terminate if any is, but reasoning about is space usage in this evaluation order is problematic. We can do much better. Besides static strictness analysis, which only covers a fraction of the cases we'd like, we can conceive of additional evaluation orders which terminate for the same set of programs as normal-order evaluation, and which can propagate additional strictness information. The one I am particularly interested in I refer to as specializing, strictness-propagating evaluation. I'll elaborate on this in another post, but in this evaluation model, calling a function is something like two communicating coroutines. When calling a function, the callee begins evaluating its body, yielding control back to the caller when it needs its first argument, and also indicating whether that argument should be strict or lazily passed, using whatever information is available at runtime. Subsequent arguments work similarly. As a result, functions are automatically specialized as arguments are passed, and we do not construct thunks if they are going to be consumed strictly by a subsequent callee. This can be implemented efficiently using just two call stacks, and there are various optimizations to the scheme. It is intended to augment, not replace, the existing static strictness analysis and argument passing.

In general, the goal of new evaluation strategies should not be efficiency (though that is a nice goal as well), but simple reasoning about space and evaluation stack usage, so that these things do not depend as they currently do on incidental details of how a function is factored or what the compiler chooses to inline (being forced to reason about these things in Haskell breaks the black box abstraction of functions that FP and higher-order programming depend on).

Language runtimes and VMs will grow to support these new evaluation strategies, and the current crop of "general-purpose" VMs that are actually quite specialized for strict, imperative languages (the JVM, the CLR) will likely die off.

Code distribution and the future of the web

What of the web, javascript, html 5 and beyond? Are we going to keep hobbling along with these technologies indefinitely? I don't think it's too controversial to say that writing applications of any significant complexity as a web application involves far more work than would be required with access to a truly powerful client-side language. Again, I don't think many web programmers realize just how bad the situation is. In comparison to mainstream languages like Java, C#, or Python, Javascript isn't so bad; in some ways it's even a better language. But compared to a programming language with a good type system and real support for FP like Haskell, Scala, and whatever the next generation of languages can bring, Javascript is quite sad.

Of course, proponents of existing web technologies are always finding ways to rationalize the status quo, pointing out how much better things are today than they used to be. That's maybe true, but why settle?

I envision a future where Javascript dies off, as do browser-specific plugins like Flash, and instead we'll see client code written in arbitrary, compiled languages, using something like NaCl. This will be combined with a signed code caching and dependency tracking mechanism, so that for instance, you can distribute an application that downloads the entire Java virtual machine and other dependencies, but only if they aren't already cached on your local machine (and the signatures match, of course). This changes how we think about software. Software won't be something you "download onto our computer and then run". Instead, software exists "out there", and you run it. As an implementation detail of running it, it may choose to download some additional code it needs to do its work.

This will end the monopoly that html+javascript has on client-side web applications and largely eliminate the switching costs of moving to new client-side technologies. A few lower-level protocols and standards will be enough to tie everything together and maintain the good parts of the web today (I'll say more about this next), but nothing so high-level as specifying the client side language for interaction (Javascript, or Dart, or whatever) or the client-side language for layout and display (html + CSS). In the future, client-side code is written in whatever language, compiled to native code if desired.

In place or in addition to native client support, another option would be some sort of in-browser low-level VM designed by people who know what they're doing, who are fluent in the state of the art in programming languages and runtimes. In other words, not the people who designed Dart, Go, Ceylon, or any of the other recent languages that are 30 years or more behind state of the art. We need something that actually supports languages of the future, not something that rearranges the deck chairs on the titanic sinking ship that form the current batch of mainstream languages.

What of the suggestion that we simply use Javascript as the "assembly language" of the web, use it as a compilation target, and avoid programming in it directly? This obviously is not ideal. Compiling a real programming language like Haskell or whatever is next through Javascript or Dart is obviously not how anyone would not how design things today given a clean slate. Even if this were possible to do well without bloating client code size beyond what's acceptable, there is the inevitable efficiency hit of compiling to a language whose performance is as bad as Javascript. When you think about it, it makes no real sense to be reverting to 1/10th or 1/100th the speed of what native code or a well-JIT'd VM could provide, purely for ease of deployment. We don't need to be making this tradeoff, and with the rise of something like NaCl and/or a real browser VM, we won't have to.

There's one wrinkle. There are tremendous network effects on the web. This is part of its power, and its usefulness, and we don't want to lose it. We do still need the moral equivalent of urls and hyperlinking but one thing we don't necessarily need is the DOM. What will take its place? Don't we need the DOM to enable mashups and services like search engines? Actually, no. DOM munging and traversal is not the only way programs could obtain information from applications on the web, and when you think about it, it's actually rather primitive. Why screen scrape when you can call a real API? This transformation is already sort of happening; most modern web applications expose APIs in the form of REST+JSON. It just needs to be taken a little further.

What would this look like? Well, we would need a standard type system for the web. By this I mean a standard way for the applications that live at a url to expose a module of types and functions they support. The underlying type system would be expressive enough to encode richer data than what JSON currently provides (which besides being dynamically typed, cannot even represent sum types effectively) and will support for algebraic data types and some of the type system features alluded to earlier. With this in place, standards will arise for certain function signatures, with standard meanings attached to them. So, for instance, rather than the googlebot crawling the DOM looking for links, it invokes the getAllLinks function of the application at the given url, which returns an actual list of url values. getAllLinks is some separate standard, and new ones can arise on an ad hoc basis, as we grow new modes of interaction with web sites. There will be certain near-universal standards (like getAllLinks) and more specialized ones, specific to the web application in question (for instance, Facebook exports certain functions and datatypes that are specific to Facebook, which are unlikely to be implemented by other sites, though this is certainly not a requirement).

Already, we can see this happening somewhat: there are various ad hoc APIs and mechanisms for basically controlling how web crawlers should interpret the DOM.

With a standard way of interacting with web sites programmatically, there's no longer any need for the DOM and we can see a proliferation of innovation of different display and layout technologies.

Closing thoughts: the rise of functional programming

I've hinted at this throughout: functional programming will absolutely win. Not everyone shares my views, but many of the improvements I've talked about start with the assumption that we are working in a functional, post-imperative world. FP provides a dramatic increase in productivity due to massive increases in code reuse and the ease of reasoning about functional code (not to mention ease of parallelization). The industry is slowly starting to understand this, but it doesn't really matter if many programmers are still for whatever reason resistant to learning FP (which is unfortunately true). Eventually, the selection pressures of a competitive market will weed out less productive and effective techniques, and this largely means the death of imperative programming as we know it, except in certain niche areas. Regardless of initial biases or attitudes, programmers and companies who wish to stay competitive will be forced to employ FP.

As for why FP hasn't gained more prominence already, well, perhaps I'll write more about that in another post. What I will say is FP is currently reaching a tipping point enabling its wider adoption and relevance. There are several factors in play: functional language compiler technology is advanced enough, computers are fast enough, and most importantly, the FP community is rapidly discovering all the necessary techniques to organize large programs that preserve referential transparency. The Haskell community has mostly led this charge, Haskell being the only widely-used language that, due to its non-strict evaluation, had no choice but to fully commit to preserving referential transparency throughout. Ten years ago, admittedly, there was a good chance that expressing some programs purely functionally involved what was basically new research. Today, many of the required techniques are known, and expressing even the most imperative-seeming program functionally is possible, and for an experienced functional programmer, pretty effortless. There are still interesting open questions of how to express certain programs, but these are diminishing rapidly.

That said, I understand why many people claim FP is too hard, or it's unnatural or awkward, etc. Like many worthwhile subjects, attaining fluency in FP is difficult and requires dedication. Once this level of fluency is reached, though, expressing functional programs is quite natural and effortless (of course, software design is still hard, but finding functional designs becomes easy with practice). For people looking in, the techniques of FP seem opaque and unnecessarily difficult. For people on the inside, there's nothing difficult or complex about it and the benefits are enormous. For those who would criticise FP, I think a little humility is in order. To draw an analogy, no one without mathematical background would feel equipped to dismiss or criticise an entire branch of mathematics ("real analysis is a stupid idea"), and yet programmers with barely a cursory understanding of FP regularly (and loudly) criticise it.

I see why some people are frustrated with some of the existing resources for learning the subject. But it's wrong to dismiss the subject on that basis, or on the basis of personalities of people (myself included!) who feel that FP is more productive; objectively, either FP is worth knowing because of the productivity and other benefits it provides, or it isn't. For my part, I am co-authoring a book on FP that I hope is helpful to some who are interested in learning the subject. With the solidification of FP techniques I expect to see more and more resources like this, to the point that fluency and the huge resulting benefits are something that any motivated programmer can work toward, without encountering some of the discouraging hurdles to learning FP that are present today.

Will the future of programming look anything like what I've laid out here? Maybe, maybe not. But I certainly hope it will.

Tuesday, December 27, 2011

Programmatic translation to iteratees from pull-based code

When working with iteratees, one notices a sort of dual between producer and consumer--for any given stream processing task (which pairs a producer of values with some consumer of these values), operations can often be expressed on either the producer side or the consumer side. For instance, if we want to square each value in a stream of integers, we can map over the stream (the producer), or we can map (contravariantly) over the values as they enter the stream's consumer (the iteratee). Here is an intriguing idea: what if we could programmatically translate stream processing tasks from one mode to the other? That is, write pull-based code, but translate to push-based for purposes of execution. It turns out this is actually possible and I'll show code in this post for doing the conversion.

Why is this desirable? Push-based APIs like iteratees are generally preferable for execution--data is produced and sent to its consumers, then any resources associated with that data can be freed. This model is simple and easy to reason about in terms of memory and resource usage. Pull-based execution is more problematic: resource lifetimes (and therefore space usage) are not deterministic since at any point in time, a function may choose to request "historical" data from the producer. The consequences of this have plagued some FRP implementations (though there are ad hoc workarounds that involve limiting access to history in various ways or only exposing certain "safe" combinators rather than the full pull-based API that forms the underlying model) and led many to reject lazy IO in favor of iteratees.

While iteratees address these problems for streaming IO, programming with them directly requires inverting your thinking, which is arguably somewhat unnatural. Likewise for other push-oriented APIs like arrows, which have used as the basis for FRP implementations that aren't as susceptible to space leaks. Honestly, I'm not sure what I think about the claims that push-based APIs are less "natural". What's most important is that whatever API you use composes, avoids code duplication, and gives you some reasoning tools. With those ingredients, I suspect you can get comfortable with any API. (I remember when I first started writing iteratee code, which we have a lot of at work, my brain nearly exploded when writing even the simplest of iteratees and HOFs for composing them... but after some practice it came much more naturally.)

Moving on, the key to converting from pull-based code to push-based code is to represent streams of data as a ListT, universally quantified over the underlying monad. I'll explain this in a minute. First, a ListT is just a list that yields a monadic effect with each uncons. Here is a ListT implementation:

-- Skip nodes let us do filtering without lookahead
-- uncons is then a simple loop that removes any Skip nodes
data Step a s = Empty | Yield a s | Skip s

newtype ListT f a = 
  ListT { runListT :: f (Step a (ListT f a)) }

instance Monad f => Monad (ListT f) where
  return a = ListT (return (Yield a empty))

  (ListT s) >>= f = ListT $
    s >>= \step -> case step of  
      Yield h t -> return $ 
        (Skip $ f h `mplus` (t >>= f))
      Skip t -> return $ Skip (t >>= f)
      Empty -> return Empty

instance (Monad f) => MonadPlus (ListT f) where
  mzero = ListT (return Empty)

  (ListT s) `mplus` b = ListT $ 
    s >>= \step -> case step of  
      Yield h t -> return $ Yield h (t `mplus` b)
      Skip t -> return $ Skip (t `mplus` b)
      Empty -> return $ Skip b

instance MonadTrans ListT where
  lift fa = ListT $ liftM (\a -> Yield a empty) fa

empty :: Monad f => ListT f a
empty = ListT (return Empty)

cons :: Monad f => a -> ListT f a -> ListT f a
cons h t = ListT . return $ Yield h t

Now, consider a function of type forall f . Monad f => ListT f a -> ListT f a. Parametricity guarantees the function cannot assume anything about f other than that it's a monad. Substitute the identity monad for f and we can see how any function [a] -> [b] could be written as a forall f . Monad f => ListT f a -> ListT f b. In other words, this API is just as expressive as the usual lazy IO style pull-based API.

The trick is that the caller of such a function can substitute another monad in for f, namely the reader monad. This lets us push values into the ListT.

data Input a = Done | Await | Element a 

prompts :: ListT (Reader (Input a)) a 
prompts = ListT . reader $ \input -> case input of
  Done -> Empty
  Await -> Skip prompts
  Element a -> Yield a prompts

prompts is a potentially infinite stream of a, but with each step, we obtain a function from Input a -> ListT (Reader (Input a). On the one hand, we can feed this to a forall f . Monad f => ListT f a -> ListT f b and from its perspective it is pulling values from the stream. On the other hand, at each step we get to push a value into the stream (or signal termination, or whatever "instruction set" we wish to support).

We need one more ingredient, a slightly-reformulated version of iteratees. A well-behaved iteratee will always yield a result (rather than a Cont) when fed EOF. We can make this more explicit in the type:

type IsEOF = Bool

data Moore a b = 
    Feed b (Input a -> Moore a b) 
  | Stop b IsEOF

I've renamed this Moore since it is essentially just a Moore machine supporting early termination (the Stop case just encodes the fact that the state transition function always leads back to the same state from that point). With iteratees, it is really just a convention that intermediate b values are not inspected (and hence not computed in a lazy language) until either EOF is hit or the iteratee signals termination early.

We can now use this to perform the inversion of control programmatically. Note the signature for invert!

isDone :: Input a -> Bool
isDone Done = True 
isDone _ = False

invert :: (forall f . Monad f => ListT f a -> ListT f b) 
       -> Moore a (Maybe b)
invert f = go Nothing (f prompts) where
  go cur res = Feed cur $ \a -> 
    case runReader (runListT res) a of
      Empty -> Stop cur (isDone a)
      Yield h t -> go (Just h) t
      Skip s -> go cur s

We can write very similar code for monadic iteratees:

data MooreT f a b = 
    FeedT b (Input a -> f (MooreT f a b)) 
  | StopT b IsEOF

invertT :: Monad f 
        => (forall t . (MonadTrans t, Monad m)
          => ListT (t f) a -> ListT (t f) b) 
        -> MooreT f a (Maybe b) 
invertT = ...

Here is the full code. I haven't played with it much, but isn't it fascinating that this translation is possible? (I suspect there is some categorical connection here though I'm not quite sure what it is yet) It means we can write code using a pull-based API, where functions have access to the "full history" of the stream they are transforming, but translate to a push-based API for purposes of execution. The translation will discover and retain the exact portion of the history required to express the transformation, and it will retain this history for only as long as needed. I find it interesting to think about how different operations on ListT will get mapped to the equivalent Moore machine. For instance, a function that conses onto a ListT results in a Moore that "delays" the input stream by one step. And so on...

This trick of writing code which is parametric in the choice of monad is not really specific to stream processing and I suspect it has other uses. (Ed Kmett uses a similar technique in his recent searching infinity post and it's probably been used elsewhere) For any given function parametric in the choice of monad, it's fun to consider what sort of interesting structures can be built by substituting different monads.