One question is: how might we interact with AIs?
Something I like to do is to sketch in code to explore different angles in interaction design. You can reach interesting conclusions even with scrappy code.
What you'll find in this post:
- A review of a few of today's human-AI interaction modes, and why I think that multiplayer is an approach worth taking too.
- Some software sketches built on tldraw, a multiplayer infinite whiteboard web app, with NPC users that can follow commands and proactively offer help.
- Some conclusions in both the design and tech domains.
There's also code! We made a starter kit to get tldraw working with PartyKit, for your own experiments. You'll find it at the bottom of this post.
(I delivered a version of this post as my talk at NEXT23 in Hamburg a few weeks ago.)
The pros and cons of today's human-AI interaction modes
Let's take a look at a few. I'm sure I don't need to remind anyone what ChatGPT looks like, but let's start there.
Affordances. I use ChatGPT daily; it's amazing. But it takes some learning. From an interaction design point of view, ChatGPT is missing visual affordances. "Affordance" is the technical term for being able to see what you can do: a door handle informs you that you're looking at a door and not a wall, and also provides the possibility of opening it.
Sure, ChatGPT has those hints about where to begin. But let's say I'm in the middle of writing a blog post and I'm stuck for phrasing: it has no affordance that points the way to exactly how it can help.
Notion AI is really smart: a menu command like "Change tone" is a great affordance. You're likely to notice the feature before you need it, and change your workflow to take advantage of it. The video above is taken from their marketing site.
(Replit Ghostwriter has a great spin on this. You choose a command like "Generate" and then add a free-text prompt to nuance the AI.)
Proactive. The affordance gain is great. But context menus aren't an interface that makes it easy for the AI to jump in. If the AI knows it can be super helpful, wouldn't it be great if it put its hand up to offer?
Here's another human-AI interaction mode.
One of the best patterns we have is ghosted text for suggested autocomplete. This is how GitHub Copilot has worked since it launched in 2021, another AI that I collaborate with daily.
But the video above is from further back. In Writing With the Machine (2016!), author Robin Sloan trained a recurrent neural network (RNN) on a corpus of mid-century science fiction short stories. Then he hooked the output up to tab-autocomplete.
Sloan quickly found that this wasn't a text editor that "writes for you."
The animating ideas here are augmentation; partnership; call and response.
Which is exactly how the rest of us, years later, use Copilot, right?
Multifunctional. Where I'd like to extend both Sloan's work and Copilot is to somehow have many different functions instead of the same style of autocomplete all of the time. Think of working on a Google Doc with your human team. Your ideal team changes over time: perhaps you'd pair with a creative sparring partner at the beginning, ask some critical friends to drop by at the midpoint, and work with a detail-oriented copyeditor at the end. A fact-checker might come and go. But what would be stifling would be a copyeditor dropping comments right at the outset, when you're still feeling out your topic.
So what would it mean to have AI interactions that are as sophisticated as that human team?
AI as teammate
To summarise, this is what I'm looking for. An interface that
- has affordances
- means the AI can be proactive
- allows for a multifunctional AI with different functions at different times.
Now, it's early days. People are iterating on AI interaction design so fast, and those are only a few examples above. The number of future approaches will dwarf that list! But there's a specific approach that I feel cuts through a lot of the issues: multiplayer. That's the direction I want to explore.
Once you say that you can have many different AIs, simultaneously, each with its own personality, and you interact with them (almost) exactly as you would interact with your colleagues, a whole set of problems simply disappears.
Let me go through my software sketches and show you what I mean.
Demo: Interacting with non-player characters on an infinite whiteboard
It's demo time! (By the way, I use the term NPC to mean non-player character. This is because the NPCs are not entirely AI-driven, and some don't use AI at all.)
1. Combining tldraw and PartyKit, and adding fake users with their own cursors
What we're looking at here is an integrated version of tldraw, the multiplayer infinite whiteboard web app. tldraw supplies an open source library, which I've taken and extended with a sidebar containing chat and a "facepile" (a row of user avatars).
Active users have to hang out somewhere, so I've supplied a cursor park: a place in a shared document where users can leave their cursors while they're reading. It's a gag but genuinely, it's useful to have a place to go! Like having pockets to stuff your hands in when you're in an idle state.
NPCs are summoned to the facepile across the top of the screen, and simultaneously their cursors appear on the multiplayer whiteboard.
2. Proof of concept: the poet NPC in action
Let's see how it works…
- We can tell the poet to circle. This is just to prove that we can programmatically control the cursor NPC.
- We can ask the poet to compose a poem. Behind the scenes this uses OpenAI. It's neat to see the locus of attention of the NPC move with the cursor: you get a better idea of what the NPC is "thinking" about.
You can also see that the NPC can speak into the chat. An earlier version didn't have chat, and it felt too constrained.
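Programmatic cursor control, like the circle command, boils down to a pure position function plus a ticker. Here's a minimal sketch; `updatePresence` is a hypothetical stand-in for however your sync layer publishes a cursor position (with y-partykit you'd write into the Yjs awareness state instead):

```typescript
// Sketch: drive an NPC cursor in a circle by emitting presence updates.

type Point = { x: number; y: number };

// Compute the cursor position for a given animation step around the circle.
function circlePosition(center: Point, radius: number, step: number, steps: number): Point {
  const angle = (2 * Math.PI * step) / steps;
  return {
    x: center.x + radius * Math.cos(angle),
    y: center.y + radius * Math.sin(angle),
  };
}

// Emit one presence update per tick until the circle is complete.
async function circleCursor(
  center: Point,
  radius: number,
  updatePresence: (cursor: Point) => void, // stand-in for the sync layer
  steps = 30,
  tickMs = 50,
): Promise<void> {
  for (let step = 0; step <= steps; step++) {
    updatePresence(circlePosition(center, radius, step, steps));
    await new Promise((resolve) => setTimeout(resolve, tickMs));
  }
}
```

Because the position function is pure, the same shape works for any other idle or attention animation you want to give an NPC.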
3. Proactive NPCs! The painter helps out when it can
Let's get more sophisticated. The painter likes to paint stars.
If you draw a rectangle on the whiteboard… nothing happens.
If you draw a star on the whiteboard, then the painter NPC moves its cursor nearer to the shape in question, and puts its hand up (by speaking into the chat) to say that it can help. You can accept its help by choosing the action from the command menu, at which point the NPC colours the star and returns to the cursor park.
This is just a toy example. But I think there's something here akin to proxemics, the meaning people ascribe to the distance between us. For example, we might imagine an editor NPC that hovers near a paragraph when it has something to suggest regarding style. The more confident it is that it can help, the closer it would come.
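The put-its-hand-up behaviour reduces to a small decision function that runs whenever a shape lands on the canvas. The types below are simplified stand-ins for tldraw's actual shape records, just to show the pattern:

```typescript
// Sketch: a proactive NPC decides whether to offer help for a new shape.
// Shape and Offer are simplified stand-ins for tldraw's real records.

type Shape = { id: string; type: string; x: number; y: number };

type Offer = {
  shapeId: string;
  cursor: { x: number; y: number }; // where the NPC moves to signal attention
  message: string;                  // what it says in the chat
};

// The painter only reacts to stars; everything else is ignored.
function maybeOfferHelp(shape: Shape): Offer | null {
  if (shape.type !== "star") return null;
  return {
    shapeId: shape.id,
    // Park the cursor just beside the shape: close enough to show interest.
    cursor: { x: shape.x - 40, y: shape.y },
    message: "I can colour that star for you!",
  };
}
```

A proxemics version would return a cursor distance scaled by the NPC's confidence rather than a fixed offset.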
4. Functions: Ask the maker to draw on the canvas for you
In this example I'm using OpenAI function calling, and I've provided the AI with a straightforward command to add shapes to the canvas.
If we ask it to draw "a 3x3 grid of squares, narrow gutter" then it does just that. (This recorded version of the prototype doesn't have an idle animation built in; my current version does.)
And if we ask it to draw a house… then it can't. Maybe AI won't take our jobs quite yet.
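To give a flavour of the wiring, here's an illustrative sketch (not the repo's actual schema): a `draw_shapes` tool definition in the style the OpenAI chat completions API expects, plus a hypothetical helper that resolves a grid request into concrete shape specs:

```typescript
// Sketch of a function-calling setup for a shape-drawing NPC.
// The schema and gridShapes helper are illustrative, not the repo's own code.

type ShapeSpec = { kind: string; x: number; y: number; w: number; h: number };

// Tool definition passed to the OpenAI chat completions API.
const drawShapesTool = {
  type: "function",
  function: {
    name: "draw_shapes",
    description: "Add simple shapes to the tldraw canvas",
    parameters: {
      type: "object",
      properties: {
        shapes: {
          type: "array",
          items: {
            type: "object",
            properties: {
              kind: { type: "string", enum: ["rectangle", "ellipse"] },
              x: { type: "number" },
              y: { type: "number" },
              w: { type: "number" },
              h: { type: "number" },
            },
            required: ["kind", "x", "y", "w", "h"],
          },
        },
      },
      required: ["shapes"],
    },
  },
};

// "A 3x3 grid of squares, narrow gutter" resolves to nine rectangle specs.
function gridShapes(rows: number, cols: number, size: number, gutter: number): ShapeSpec[] {
  const shapes: ShapeSpec[] = [];
  for (let row = 0; row < rows; row++) {
    for (let col = 0; col < cols; col++) {
      shapes.push({
        kind: "rectangle",
        x: col * (size + gutter),
        y: row * (size + gutter),
        w: size,
        h: size,
      });
    }
  }
  return shapes;
}
```

When the model returns a `draw_shapes` call, the NPC's job is just to translate each spec into a tldraw shape record and insert it into the shared document.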
I want to give a shout-out to Fermat for pushing hard into the space of AI operating on an infinite canvas. In particular I was inspired by Max Drake's prototypes on X/Twitter (thread), showing how the agent can even be given knowledge of the canvas state, and how the human and the AI can work together.
Some design conclusions
I'm taken with a few of the interactions that emerged while I was working on this sketch:
- Cursor proxemics to show attention. It's a powerful pattern: use the distance of a cursor to show where an NPC is paying attention, and its nearness or farness to let it chip in with varying levels of confidence. Cursors aren't an all-purpose solution (neither smartphones nor VR headsets use cursors), but there's something here to explore.
- Multiple helpful AIs, not just one. As a user, I can have a "theory of mind" about a focused AI that I can't have about a general-purpose copilot. There's my affordance right there. I can see a world in which we bring in different AIs with different goals at different project stages.
tl;dr: AI-as-teammate is an approach that answers some (though not all) of today's issues in AI interactions. Future solutions will likely combine several approaches; this one has some unique strengths.
While it's fun to experiment using an infinite canvas, I feel like some of these patterns could also be applicable in other apps like Figma, or even text editors like Google Docs, or perhaps even at the OS level.
An outstanding question is how to avoid anthropomorphising NPCs. These NPCs aren't full AIs (they just use large language models for specific features), but even if they did have GPT-4-level smarts, they still wouldn't be human. So how do we encourage users to think about non-human intelligence? In a previous version of this software sketch, I depicted the NPCs as dolphins… a playful metaphor, but perhaps too distracting.
Technical architecture and future-facing hunches
As with my other sketches, the code is open for you to browse and build on. I'll summarise the architecture first.
tldraw can use Yjs as its multiplayer sync server: here's the tldraw Yjs example code, made open source by the tldraw team.
Happily, PartyKit can act as a Yjs backend. See y-partykit in the docs.
Once those are brought together, it's possible to write NPCs as separate PartyKit servers that connect to tldraw and manipulate its Yjs document directly. For presence (i.e. to show that a user is online) we make use of the Yjs awareness protocol, just as tldraw does.
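The Yjs backend itself can be very small. Here's a sketch following the pattern in the y-partykit docs (the class name is mine); the NPC servers then connect to the same room and write into the shared document and awareness state:

```typescript
// server.ts: a minimal PartyKit server acting as the Yjs backend,
// in the style shown in the y-partykit documentation.
import type * as Party from "partykit/server";
import { onConnect } from "y-partykit";

export default class YjsBackend implements Party.Server {
  constructor(readonly room: Party.Room) {}

  onConnect(conn: Party.Connection) {
    // y-partykit handles Yjs sync and awareness messages for this connection.
    return onConnect(conn, this.room);
  }
}
```

This is wiring only; persistence, auth, and the NPC parties themselves layer on top.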
Having built this, I have two technical hunches:
- In the future, we'll need NPC-specific APIs. NPCs, whether using AI or just mechanical fake users, need access to real-time application events, and they need high-level ways to view and manipulate the current application state. These APIs will have to run on the server, so that the NPCs are independent of any specific client. I'm not convinced that REST is a good fit for this, and my hunch is that NPC APIs will look quite different from the APIs we have today.
- If we're going to have human-AI collaboration, then apps should be natively multiplayer. It's way easier to solve both design and technical challenges when applications already have the concept of multiple users working simultaneously on the same document.
You can download the code
- Scrappy exploration code: In the repo sketch-tldraw-npcs on GitHub you'll find all the NPC experiments above. For example, here's the maker NPC with OpenAI function calling. Feel free to browse. (I'm not going to link to the playable demo itself here… I cut corners making this sketch, and while it's fine to use to record videos, it's not robust enough to be used!)
- Starter kit for a multiplayer whiteboard: In sketch-tldraw you can find a minimal implementation of tldraw with a PartyKit backend, based on the tldraw team's own work with Yjs (thank you!). Use this as a starting point for your own NPC investigations!
Let me know if you do any digging in this direction yourself. I'd love to see it.