Exploring AI interaction design and multiplayer with tldraw

Matt Webb
Posted on: 4 Oct 2023

One question is: how might we interact with AIs?

Something I like to do is to sketch in code to explore different angles in interaction design. You can reach interesting conclusions even with scrappy code.

What you'll find in this post:

There's also code! We made a starter kit to get tldraw working with PartyKit, for your own experiments. You'll find that at the bottom of this post.

(I delivered a version of this post as my talk at NEXT23 in Hamburg a few weeks ago.)

The pros and cons of today's human-AI interaction modes

Let's take a look at a few. I'm sure I don't need to remind anyone what ChatGPT looks like, but let's start there.

A blank ChatGPT chat with a few suggested starting points

Affordances. I use ChatGPT daily; it's amazing. It takes some learning, though. From an interaction design point of view, ChatGPT is missing visual affordances: just as a door handle informs you that you're looking at a door and not a wall, and also provides the possibility of opening it, 'affordance' is the technical term for being able to see what you can do.

Sure, ChatGPT has those hints about where to begin. But let's say I'm in the middle of writing a blog post and I'm stuck for phrasing: it has no affordance that points the way to exactly how it can help.

Notion AI is really smart: a menu command like 'Change tone' is a great affordance. You're likely to notice the feature before you need it, and change your workflow to take advantage of it. The video above is taken from their marketing site.

(Replit Ghostwriter has a great spin on this. You choose a command like 'Generate' and then add a free-text prompt to nuance the AI.)

Proactive. The affordance gain is great. But context menus aren't an interface that makes it easy for the AI to jump in. If the AI knows it can be super helpful, wouldn't it be great if it put its hand up to offer?

Hereā€™s another human-AI interaction mode.

One of the best patterns we have is ghosted text for suggested autocomplete. This is how GitHub Copilot has worked since it launched in 2021 - another AI that I collaborate with daily.

But the video above is from further back. In Writing With the Machine (2016!), author Robin Sloan trained a recurrent neural network (RNN) on a corpus of mid-century science fiction short stories. Then he hooked the output up to tab-autocomplete.

Sloan quickly found that this wasn't a text editor that "writes for you."

The animating ideas here are augmentation; partnership; call and response.

Which is exactly how the rest of us - years later - use Copilot, right?

Multifunctional. Where I'd like to extend both Sloan's work and Copilot is to somehow have many different functions instead of the same style of autocomplete all of the time. Think of working on a Google Doc with your human team. Your ideal team changes over time... perhaps you'd pair with a creative sparring partner at the beginning, ask some critical friends to drop by at the midpoint, and work with a detail-oriented copyeditor at the end. A fact-checker might come and go. But what would be stifling is a copyeditor dropping comments right at the outset, when you're still feeling out your topic.

So what would it mean to have AI interactions that are as sophisticated as that human team?

AI as teammate

To summarise, this is what I'm looking for. An interface that

Now, it's early days. People are iterating on AI interaction design so fast. And those are only a few examples above. The number of future approaches will dwarf that list! But there's a specific approach that I feel cuts through a lot of the issues: multiplayer. That's the direction I want to explore.

Once you say that you can have many different AIs, simultaneously, each with its own personality, and you interact with them (almost) exactly as you would interact with your colleagues, a whole set of problems just disappears.

Let me go through my software sketches and show you what I mean.

Demo: Interacting with non-player characters on an infinite whiteboard

It's demo time! (By the way, I use the term NPC to mean non-player character. This is because the NPCs are not entirely AI-driven, and some don't use AI at all.)

1. Combining tldraw and PartyKit, and adding fake users with their own cursors

What we're looking at here is an integrated version of tldraw, the multiplayer infinite whiteboard web app. tldraw supplies an open source library, which I've taken and added a sidebar with chat and a "facepile" (a line of user avatars).

Active users have to hang out somewhere: I've chosen to supply a cursor park, a place in the shared document where users can leave their cursors while they're reading. It's a gag, but genuinely, it's useful to have a place to go! Like having pockets to stuff your hands in when you're in an idle state.

NPCs are summoned to the facepile across the top of the screen, and simultaneously their cursors appear on the multiplayer whiteboard.
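The cursor park is easy to picture in code. Here's a toy sketch - the park's position, the spacing, and the name `parkSlot` are all mine, not the prototype's: parked cursors simply take numbered slots inside a fixed rectangle.

```typescript
type Point = { x: number; y: number };

// The park itself: a fixed rectangle near the top-left of the document.
// Coordinates and spacing are illustrative, not the prototype's values.
const PARK = { x: 20, y: 20, width: 200 };

// Lay idle cursors out left to right, wrapping to a new row when the
// park runs out of horizontal space.
function parkSlot(index: number, spacing = 24): Point {
  const perRow = Math.max(1, Math.floor(PARK.width / spacing));
  return {
    x: PARK.x + (index % perRow) * spacing,
    y: PARK.y + Math.floor(index / perRow) * spacing,
  };
}
```

When a user goes idle, their cursor animates to `parkSlot(n)` for the next free `n`; when they move again, the slot frees up.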

2. Proof of concept: the poet NPC in action

Let's see how it works...

You can also see that the NPC can speak into the chat. An earlier version didn't have chat and it felt too constrained.

3. Proactive NPCs! The painter helps out when it can

Letā€™s get more sophisticated. The painter likes to paint stars.

If you draw a rectangle on the whiteboard... nothing happens.

If you draw a star on the whiteboard, then the painter NPC moves its cursor nearer to the shape in question, and puts its hand up (by speaking into the chat) to say that it can help. You can accept its help by choosing the action from the command menu, at which point the NPC colours the star and returns to the cursor park.

This is just a toy example. But I think there's something here akin to proxemics, the meaning people ascribe to the distance between us. For example, we might imagine an editor NPC that hovers near a paragraph when it has something to suggest regarding style. The more confident it is that it can help, the closer it would come.
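That proxemics idea reduces to a tiny function: map the NPC's confidence that it can help onto a cursor distance. The sketch below uses a hypothetical linear mapping - the function names and the distances are illustrative, not taken from the prototype.

```typescript
// Map an NPC's confidence (0..1) that it can help to a distance from
// the shape it wants to comment on: low confidence keeps it far away,
// high confidence brings it close. All numbers are illustrative.
function offerDistance(confidence: number, far = 400, close = 40): number {
  const c = Math.min(1, Math.max(0, confidence)); // clamp to [0, 1]
  return far - c * (far - close);
}

// Approach the shape's centre along a fixed angle, so the NPC's
// movement reads as deliberate rather than jittery.
function npcCursorPosition(
  centre: { x: number; y: number },
  confidence: number,
  angle = Math.PI / 4
): { x: number; y: number } {
  const d = offerDistance(confidence);
  return { x: centre.x + d * Math.cos(angle), y: centre.y + d * Math.sin(angle) };
}
```

The nice property of a continuous mapping is that hovering reads as a signal in itself: the user can glance at how close the NPC is and judge how strongly it wants to interject.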

4. Functions: Ask the maker to draw on the canvas for you

In this example I'm using OpenAI function calling, and I've provided the AI with a straightforward command to add shapes to the canvas.

If we ask it to draw "a 3x3 grid of squares, narrow gutter" then it does just that. (This recorded version of the prototype doesn't have an idle animation built in; my current version does.)
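For the curious, here's roughly what that looks like. The tool schema below follows the OpenAI function-calling format, but the function name and parameter shape are my guesses rather than the prototype's exact schema - and once the model returns arguments, laying out the grid is plain geometry on our side.

```typescript
// A function-calling tool definition the model can target. The name
// "add_rectangles" and its parameters are illustrative, not the
// prototype's actual schema.
const addRectanglesTool = {
  name: "add_rectangles",
  description: "Add rectangles to the shared tldraw canvas",
  parameters: {
    type: "object",
    properties: {
      rects: {
        type: "array",
        items: {
          type: "object",
          properties: {
            x: { type: "number" },
            y: { type: "number" },
            w: { type: "number" },
            h: { type: "number" },
          },
          required: ["x", "y", "w", "h"],
        },
      },
    },
    required: ["rects"],
  },
};

// "A 3x3 grid of squares, narrow gutter" is then just arithmetic.
type Rect = { x: number; y: number; w: number; h: number };

function gridOfSquares(rows: number, cols: number, size: number, gutter: number): Rect[] {
  const rects: Rect[] = [];
  for (let row = 0; row < rows; row++) {
    for (let col = 0; col < cols; col++) {
      rects.push({ x: col * (size + gutter), y: row * (size + gutter), w: size, h: size });
    }
  }
  return rects;
}
```

The division of labour matters: the model decides *what* to draw, and deterministic code decides *where* each rectangle lands, which keeps the output tidy even when the prompt is vague.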

And if we ask it to draw a house... then it can't. Maybe AI won't take our jobs quite yet.

I want to give a shout out to Fermat for pushing hard into the space of AI operating on an infinite canvas. In particular I was inspired by Max Drake's prototypes on X/Twitter (thread), showing how the agent can even be given knowledge of the canvas state, and the human and the AI can work together.

Some design conclusions

I'm taken with a few of the interactions that emerged while I was working on this sketch:

tl;dr: AI-as-teammate is an approach that answers some (though not all) of today's issues in AI interactions. Future solutions will likely combine several approaches; this one has some unique strengths.

While it's fun to experiment using an infinite canvas, I feel like some of these patterns could also be applicable in other apps like Figma or even text editors like Google Docs - or perhaps even at the OS level.

An outstanding question is how to avoid anthropomorphising NPCs. These NPCs aren't full AIs (they just use large language models for specific features), but even if they did have GPT-4-level smarts, they still wouldn't be human, so how do we encourage users to think about non-human intelligence? In a previous version of this software sketch, I depicted the NPCs as dolphins... a playful metaphor, but perhaps too distracting.

Technical architecture and future-facing hunches

As with my other sketches, the code is open for you to browse and build on. I'll summarise the architecture first.

Code architecture diagram: the client hosts the tldraw client, which uses Yjs to talk to a PartyKit-hosted backend. The NPCs also run on PartyKit.

tldraw can use Yjs as its multiplayer sync server: here's the tldraw Yjs example code, made open source by the tldraw team.

Happily PartyKit can act as a Yjs backend. See y-partykit in the docs.

Once those are brought together, it's possible to write NPCs as separate PartyKit servers that connect to tldraw and manipulate its Yjs document directly. For presence (i.e. to show a user is online) we make use of the Yjs awareness protocol, just as tldraw does.
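Concretely, an NPC publishes an awareness state much like a human client does. The record below is a sketch: the field names are my assumptions, and you'd match them to whatever presence fields your tldraw client actually reads.

```typescript
// A sketch of the presence record an NPC might publish via the Yjs
// awareness protocol. Field names are assumptions; align them with
// what the tldraw client expects.
type NpcPresence = {
  id: string;
  name: string;
  color: string;
  cursor: { x: number; y: number };
};

function makeNpcPresence(
  id: string,
  name: string,
  color: string,
  x: number,
  y: number
): NpcPresence {
  return { id, name, color, cursor: { x, y } };
}

// On the NPC's own Yjs connection, something like
//   awareness.setLocalState(makeNpcPresence("npc-poet", "Poet", "#6a0", 120, 80))
// is enough for it to appear in the facepile and on the canvas.
```

Because presence rides on the same awareness channel as human users, the NPC gets its avatar, its cursor, and its comings and goings for free - no special-case rendering on the client.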

Having built this, I have two technical hunches:

You can download the code

Let me know if you do any digging in this direction yourself. I'd love to see.