Prompt Input UX: What Makes a Great Text Box for LLM Products

Every LLM product, no matter how clever the model behind it, ends at the same humble surface: a text box. It is the single most-used element in the entire experience, the thing a user touches every turn, and yet it is routinely the least-designed. Teams pour months into model selection, streaming infrastructure, and tool calling, then drop a bare <textarea> at the bottom of the screen and call it done. The input is not a detail. For a prompt-driven product, the input is the product's grammar: it teaches people how to talk to the system.

So what actually makes a great prompt input? After staring at enough of them, the good ones share a small set of deliberate decisions.

Get the keyboard contract right first

The fastest way to annoy a power user is to get Enter wrong. In a chat or prompt context, the near-universal expectation is:

  • Enter submits the message.
  • Shift+Enter inserts a newline for multi-line prompts.
  • Modifier shortcuts (Cmd/Ctrl+B, Cmd/Ctrl+I, undo, redo) behave the way they do everywhere else.

This sounds trivial, and it is the kind of trivial that breaks constantly. The hard part is that submit-on-Enter must coexist with IME composition. When someone is typing Japanese, Chinese, or Korean, the Enter key confirms a character candidate, so it must not fire your submit handler mid-composition. A prompt input that sends a half-finished sentence the moment a CJK user presses Enter is not a minor bug; it is a product that quietly excludes a huge slice of the planet. Proper composition handling (listening for compositionstart and compositionend, and suppressing submit while composing) is non-negotiable, even though almost no demo screenshot will ever reveal whether you did it.
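To make the contract concrete, here is a minimal sketch of the decision a keydown handler has to make. The names shouldSubmitOnEnter, KeyLike, and CompositionTracker are mine, invented for illustration; the fields they read (key, shiftKey, isComposing, and the legacy keyCode of 229 that some browsers report during IME input) are standard keyboard event properties.

```typescript
// Minimal shape of the keyboard event fields this sketch reads. A DOM
// KeyboardEvent (or React's event.nativeEvent) satisfies it structurally.
interface KeyLike {
  key: string;
  shiftKey: boolean;
  isComposing: boolean;
  keyCode?: number; // legacy field; 229 signals IME processing in some browsers
}

// Hypothetical tracker fed by compositionstart / compositionend listeners.
class CompositionTracker {
  private active = false;
  start(): void {
    this.active = true; // call from the compositionstart listener
  }
  end(): void {
    this.active = false; // call from the compositionend listener
  }
  isComposing(): boolean {
    return this.active;
  }
}

// Hypothetical helper: true only when this keydown should send the message.
function shouldSubmitOnEnter(e: KeyLike, composing: boolean): boolean {
  if (e.key !== "Enter") return false;
  if (e.shiftKey) return false; // Shift+Enter inserts a newline instead
  if (composing || e.isComposing) return false; // Enter is confirming a candidate
  if (e.keyCode === 229) return false; // fallback for browsers that report 229
  return true;
}
```

In a real handler you would call preventDefault() and submit only when this returns true, and let every other keydown fall through to the browser so the standard shortcuts keep working.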

Make state visible: the streaming send/stop button

Here is the affordance that separates a real LLM input from a copied chat template. While a response is streaming, the send button should become a stop button. Users need an obvious way to interrupt a long or wrong answer, and the same corner of the screen they just clicked to send is exactly where their eyes already are.

This is also where the input has to stay honest about application state. The button is not just "send"; it reflects whether the app is idle, has submitted, is streaming, or has errored. The cleanest way I have seen this expressed is by wiring the action bar directly to a status value from your chat layer rather than juggling three booleans by hand.

The Vercel AI SDK makes this almost mechanical. Its useChat hook exposes a status field (ready, submitted, streaming, error), and that single value can drive the whole action bar. When status is streaming, you render a stop control wired to the SDK's stop(); otherwise you render send. This is exactly how Prompt Area, a React input built specifically for prompt-style composers, integrates with the AI SDK: the input owns the text, useChat owns the stream, and they meet at a single status-aware action bar.
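As a rough sketch of what "one value drives the action bar" can look like, the function below maps the four status values named above to a button description. The actionFor name, the ActionButton shape, and the choice to disable send while a request is in the submitted state are my assumptions, not something the SDK or Prompt Area prescribes.

```typescript
// The four status values described above.
type ChatStatus = "ready" | "submitted" | "streaming" | "error";

// Hypothetical description of what the action bar should render.
interface ActionButton {
  kind: "send" | "stop";
  label: string; // doubles as the accessible name
  disabled: boolean;
}

function actionFor(status: ChatStatus, hasText: boolean): ActionButton {
  switch (status) {
    case "streaming":
      // Tokens are arriving: the same corner of the screen now interrupts.
      return { kind: "stop", label: "Stop generating", disabled: false };
    case "submitted":
      // Request is in flight but nothing has streamed yet. Disabling send
      // here is one way to prevent double submits (an assumption).
      return { kind: "send", label: "Sending", disabled: true };
    case "ready":
    case "error":
      // Idle or recoverable: allow sending as long as there is something to send.
      return { kind: "send", label: "Send message", disabled: !hasText };
  }
}
```

The component then renders whatever actionFor returns and wires the stop kind to stop() and the send kind to your submit handler, so there are no separate isLoading or isStreaming flags to drift out of sync.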

The seam between rich input and a plain string

A good prompt input is rarely just plain text. Users drop in @mentions that resolve to real entities, type /commands, attach files. But the model, ultimately, wants a string (and maybe some structured context). So the design question becomes: where does the rich, chip-laden input turn back into something a model can read?

The answer is to keep that conversion at one explicit seam, not scattered across your component tree. In Prompt Area the controlled value is an array of segments (text segments and immutable chip segments), and a single helper, segmentsToPlainText(), flattens it for the model. That seam is exactly where the AI SDK takes over:

    // PromptArea owns the input; useChat owns the stream.
    // They meet on one line:
    sendMessage({ text: segmentsToPlainText(segments) });

Because there is one well-named function doing the flattening, you can reason about what the model actually receives. And when you do want structure, say the user picked a model via a chip or invoked a command, you read those chips with a helper and pass them in the request body as typed fields rather than parsing them back out of a string on the server. Structured in, structured out; no regex archaeology.
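To show the shape of that idea without claiming anything about Prompt Area's internals, here is a self-contained sketch. The Segment types, flattenSegments, chipsFor, and buildRequest are all hypothetical names of my own; the library's real segment shape and helpers may look different.

```typescript
// Hypothetical segment shapes for illustration; the library's real types may differ.
type TextSegment = { type: "text"; text: string };
type ChipSegment = { type: "chip"; trigger: "@" | "/"; id: string; label: string };
type Segment = TextSegment | ChipSegment;

// The single seam: rich segments in, plain string out.
function flattenSegments(segments: readonly Segment[]): string {
  return segments
    .map((s) => (s.type === "text" ? s.text : `${s.trigger}${s.label}`))
    .join("");
}

// Read chips as data instead of parsing them back out of the string.
function chipsFor(segments: readonly Segment[], trigger: "@" | "/"): ChipSegment[] {
  return segments.filter(
    (s): s is ChipSegment => s.type === "chip" && s.trigger === trigger
  );
}

// Hypothetical request body: the string for the model plus typed fields.
interface PromptRequest {
  text: string;
  mentionIds: string[];
  command: string | null;
}

function buildRequest(segments: readonly Segment[]): PromptRequest {
  const commands = chipsFor(segments, "/");
  return {
    text: flattenSegments(segments),
    mentionIds: chipsFor(segments, "@").map((c) => c.id),
    // Only the first command chip is honored in this sketch.
    command: commands.length > 0 ? commands[0].id : null,
  };
}
```

The server receives mentionIds and command as real fields, so the display label of a chip can change without breaking anything that depends on its id.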

Discoverability beats documentation

Most users will never read your docs. They will, however, type a character and see what happens. This is why slash commands and trigger menus are an onboarding mechanism, not just a power feature. When typing / opens a menu of available actions, you have taught the user your capabilities without a single tooltip. The same goes for @ opening a list of mentionable entities. The input becomes self-documenting.
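Under the hood, "type a character and see what happens" comes down to noticing when the caret sits inside a trigger token. The sketch below is one simple way to do that on a plain string; detectTrigger and ActiveTrigger are hypothetical names, and the rule that a trigger only counts at the start of the input or after whitespace is my assumption (it keeps email addresses and URLs from opening a menu).

```typescript
// Hypothetical result describing the trigger token under the caret.
interface ActiveTrigger {
  trigger: "/" | "@";
  query: string; // text typed after the trigger, used to filter the menu
  start: number; // index of the trigger character in the input
}

function detectTrigger(text: string, caret: number): ActiveTrigger | null {
  const before = text.slice(0, caret);
  // A trigger at the start of input or after whitespace, followed by
  // non-whitespace characters that run right up to the caret.
  const match = /(^|\s)([\/@])([^\s\/@]*)$/.exec(before);
  if (!match) return null;
  const query = match[3];
  return {
    trigger: match[2] as "/" | "@",
    query,
    start: caret - query.length - 1,
  };
}
```

When this returns a value, open the menu filtered by query; when it returns null, close it. The start index tells you which span to replace with a chip once the user picks an item.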

Two design notes that are easy to miss:

  • Resolved triggers should become immutable chips. Once @design-team resolves to a real entity, it should render as a single non-editable pill, not loose text a user can half-delete into an invalid state. This keeps the value clean and makes copy/paste behave.
  • Rotating placeholders are a gentle, underrated discoverability tool. Cycling the placeholder through a few example prompts ("Summarize this thread…", "Draft a reply…", "Find the bug in…") shows people what the system is good at before they have typed anything. Prompt Area supports passing an array of placeholder strings precisely for this.
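The rotation itself is a few lines of arithmetic. This is a generic sketch, not how Prompt Area implements it: placeholderAt and the three-second default interval are my own illustrative choices.

```typescript
// Picks which example prompt to show after `elapsedMs` milliseconds,
// cycling through the list and wrapping back to the first entry.
function placeholderAt(
  placeholders: readonly string[],
  elapsedMs: number,
  intervalMs: number = 3000
): string {
  if (placeholders.length === 0 || intervalMs <= 0) return "";
  const ticks = Math.floor(Math.max(0, elapsedMs) / intervalMs);
  return placeholders[ticks % placeholders.length];
}
```

Deriving the placeholder from elapsed time, rather than mutating a counter inside a timer, keeps the function pure and makes it easy to pause rotation once the user starts typing.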

Attachments and the messy real world

The moment your product touches images or files, the input has to grow up. Users paste screenshots straight from the clipboard, drag in PDFs, and expect to see what they attached before sending. That means thumbnails, loading states while an upload is in flight, and a clear remove button on each item. An input that accepts a paste but shows no feedback feels broken even when it is working.
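One way to keep those states straight is to model each attachment as a small state machine and drive the thumbnails from it. The sketch below is framework-agnostic and entirely hypothetical: the Attachment shape, the action names, and the rule that sending waits until every attachment is ready are assumptions for illustration.

```typescript
type AttachmentStatus = "uploading" | "ready" | "failed";

interface Attachment {
  id: string;
  name: string;
  status: AttachmentStatus;
  previewUrl?: string; // thumbnail source, if one could be generated
}

type AttachmentAction =
  | { type: "added"; id: string; name: string; previewUrl?: string }
  | { type: "uploaded"; id: string }
  | { type: "failed"; id: string }
  | { type: "removed"; id: string };

function attachmentsReducer(
  state: readonly Attachment[],
  action: AttachmentAction
): Attachment[] {
  switch (action.type) {
    case "added":
      // Show the item immediately, in a loading state, so a paste gets feedback.
      return [
        ...state,
        { id: action.id, name: action.name, status: "uploading", previewUrl: action.previewUrl },
      ];
    case "uploaded":
      return state.map((a): Attachment => (a.id === action.id ? { ...a, status: "ready" } : a));
    case "failed":
      return state.map((a): Attachment => (a.id === action.id ? { ...a, status: "failed" } : a));
    case "removed":
      return state.filter((a) => a.id !== action.id);
  }
}

// Assumption: hold the send button until nothing is uploading or failed.
function canSend(attachments: readonly Attachment[]): boolean {
  return attachments.every((a) => a.status === "ready");
}
```

Because the item is added before the upload finishes, the user sees the thumbnail and spinner right away, and the remove button maps directly to the removed action.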

This is one of the dividing lines between a prompt input and a generic text field. A bare textarea has no opinion about a pasted screenshot. A purpose-built composer treats attachment as a first-class part of the message, with its own affordances.

Accessibility is not the polish pass

It is tempting to treat ARIA labels and keyboard navigation as something you add at the end. For an input, the most-interacted element on the screen, that is backwards. Screen reader users need the input announced correctly, the send and stop actions labeled, and the trigger menus navigable by keyboard alone. Auto-grow behavior (the box expanding on focus and shrinking on blur) needs to not trap focus or confuse assistive tech. Build it in from the first commit; retrofitting accessibility into a contentEditable surface later is genuinely painful.
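For the trigger menus in particular, much of the work is two small pieces: moving a highlight with the arrow keys, and telling assistive tech which option is highlighted. The sketch below loosely follows the WAI-ARIA combobox pattern using standard ARIA attribute names; the function names and the "Message" label are placeholders of mine, and whether the combobox role suits a multi-line composer is worth checking with real screen readers.

```typescript
// Moves the highlighted option in a trigger menu, wrapping at both ends.
// `current` is -1 when nothing is highlighted yet.
function nextActiveIndex(current: number, key: string, count: number): number {
  if (count <= 0) return -1;
  switch (key) {
    case "ArrowDown":
      return current >= count - 1 ? 0 : current + 1;
    case "ArrowUp":
      return current <= 0 ? count - 1 : current - 1;
    case "Home":
      return 0;
    case "End":
      return count - 1;
    default:
      return current; // any other key leaves the highlight alone
  }
}

// Attributes for the input element while a trigger menu may be open.
function inputAriaProps(
  menuOpen: boolean,
  listboxId: string,
  activeOptionId: string | null
): Record<string, string> {
  const props: Record<string, string> = {
    role: "combobox",
    "aria-label": "Message",
    "aria-expanded": String(menuOpen),
    "aria-controls": listboxId,
    "aria-autocomplete": "list",
  };
  // Focus stays in the input; the highlighted option is announced by reference.
  if (menuOpen && activeOptionId !== null) {
    props["aria-activedescendant"] = activeOptionId;
  }
  return props;
}
```

Keeping DOM focus in the input and pointing at the highlighted option by id means the user can keep typing to filter the menu while a screen reader still announces each option.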

Why a purpose-built input beats a document editor

The most common mistake I see is reaching for a full document editor (Tiptap, Lexical, Slate, ProseMirror) to power a chat box. These are powerful frameworks built for writing documents: pages, headings, tables, collaborative cursors. Bending one into a compact prompt composer means taking on two to five extra dependencies and a mental model designed for something else entirely. You spend your time disabling features rather than enabling the few you want.

Most rich text editors are document editors shoehorned into chat inputs. A prompt input should be purpose-built for prompts: affordances first, document machinery never.

A purpose-built input inverts that. It starts from the affordances that matter for prompting (Enter-to-send, stop-while-streaming, mentions, slash commands, attachments, IME safety) and carries no document baggage. That is the niche Prompt Area targets, with zero extra editor dependencies and a deliberately small surface of one component and one hook. The point is not that document editors are bad; it is that an LLM product's input has a specific, well-understood job, and it deserves a tool shaped for that job.

A short checklist before you ship

  1. Enter submits, Shift+Enter inserts a newline, and Enter never submits during IME composition.
  2. The send button becomes a stop button while a response streams, ideally driven by a single status value from the Vercel AI SDK.
  3. There is one explicit seam, like segmentsToPlainText(), where rich input becomes the string the model sees.
  4. Typing / or @ teaches users what is possible; resolved triggers become immutable chips.
  5. Attachments show thumbnails, loading states, and remove buttons.
  6. ARIA labels and keyboard navigation exist from the first commit, not the last.

Get these right and the box at the bottom of the screen stops being an afterthought. It becomes the most fluent part of the product, the place where users learn, without being told, exactly how to talk to your model.