Promptly

Prompting generative AI effectively is challenging for users, particularly in expressing context for comprehension tasks like explaining spreadsheet formulas, Python code, and text passages. Through a formative survey (n=38), we uncovered a trade-off between standardized but predictable prompting support, and context-adaptive but unpredictable support. We explore this trade-off by implementing two prompt middleware approaches: Dynamic Prompt Refinement Controls (PRCs), which generates UI elements for prompt refinement based on the user's specific prompt, and Static PRCs, which offers generic controls. Our controlled user study (n=16) showed that the Dynamic PRC approach afforded more control, lowered barriers to providing context, and encouraged task exploration and reflection, but reasoning about the effects of generated controls on the final output remains challenging. Our findings suggest that dynamic prompt middleware can improve the user experience of generative AI workflows.

Promptly was published at CHIWORK 2025 and won 2nd/1,064 projects in the Every Day AI Executive Challenge as part of the 2024 MS Hackathon.

You can find the full paper here. Below you will find a summary of the full research paper.

Introduction

Prompting seems easy right? You just tell the chatbot to "Plan a workshop" and it knows exactly what you want and gives you a good response... Well, not quite, because what you really wanted was a 3-day, in-person, machine learning workshop, etc. And then you wanted to go even further by telling the AI how you want it to respond. For example, the format of the response. Each one of these represents context, or prompt options, that the user needs to provide to the AI to control it and get the response they need most. However, expressing this context via natural language is challenging and tedious for users to do. A better way... Promptly:
We propose instead to generate prompt middleware customized to the user and task. Our approach instantiates a language model agent (the "Option Module") whose responsibility is to analyze the user's prompt and generate a set of options for refining that prompt. These options are rendered using graphical elements such as radio buttons, checkboxes, and free text boxes, which the user can select and edit to cause deterministic additions, removals, and modifications of their original prompt, which are passed as context to a conventional language model agent (the "Chat Module"). This not only produces a better result for the user by producing higher-detailed initial prompt, but also allows users to quickly select from alternatives to help steer the AI towards producing the best personalized response for their task.

Schematic overview of Dynamic Prompt Refinement Control interface for generating inline prompt refinement options to increase user control of AI-generated explanations (derived from real system use). (1) User prompts the system. (2) The Option Module takes the user's prompt as context to generate prompt options which provide prompt refinements for the user to select. (3) The Chat Module uses the user's prompt and pre-selected options to generate an initial response. (4) The user can initiate refinements by selecting their preferred options in the UI. (5) On each change, the Chat Module regenerates the response based on the new selections.

Contributions of this project:
(1) A formative survey (n = 38) including an interactive design probe to understand user needs around customized explanations of GenAI output which found users desire greater direct control over AI responses than afforded by traditional chat interfaces, and that controls should adapt to the task and user.

(2) We develop two variations of an interface with graphical controls for steering AI responses, one with static controls, and one with dynamic controls that are generated based on the user's prompt. We report a second controlled, within-subjects study comparing the two interfaces (n = 16), finding that users preferred dynamic controls because they lower barriers to providing context, offer guidance for better prompts, facilitate greater exploration and reflection.

(3) We derive design implications for dynamic controls, such as greater control of how the AI interprets and applies options, leveraging of user context and data to generate helpful controls, and expanding dynamic prompt middleware to richer modalities.

Formative Design Probe

First, we needed to understand what to build to address this issue. So, we ran a design probe survey of 38 copilot users with the idea of a set of options that helped control AI responses as a primer to collect user needs for AI explanations.

Through these survey responses we found that users found it difficult to control AI responses due to current AI chat prompting interfaces. Participants wanted the AI to respond in a predictable manner, and saw the control afforded by the design as a way to "tailor the experience to your specific needs" and put users in control of the AI. This lead to design goal 1:
DG1: Generative AI tools should afford users direct control of the AI's responses.

Participants also wanted options generated based on the prompt they gave AI, and wanted these options to be quick to change within the chat interface and adapted to the type of task the user was trying to accomplish. This lead to design goal 2:
DG2: Control affordances should be flexible, so users can quickly correct AI assumptions, and dynamic, so control surfaces adapt to the task and user.

Promptly design and implementation

Informed by the formative survey's findings, we developed Dynamic Prompt Middleware or Promptly.

User flow with the Dynamic Prompt Middleware system. (1) User submits their prompt. (2) The Option Module generates a set of options to help steer the Chat Module's response. (3) User can update the refinements sent to the Chat Module by clicking their preferences. (4) On change, Chat Module regenerates the response with the new chosen refinements. (5) User can request controls through NL prompting. (6) The Option Module generates a set of session options based on this prompt. (7) The session options apply to the current and every subsequent response from the Chat Module.

System Overview

Inline: (1) The Option Module takes the user's prompt input and conversation history. (2) The Option Module returns a set of prompt options with initial values based on the prompt. (3) Prompt options are rendered inline using a rendering engine. (4) The Chat Module uses these refinements as grounding, along with the user's prompt input and conversation history, to generate a chat response. (5) User can adjust GUI controls which updates the refinements and re-invokes the Chat Module to regenerate the current response.
Session: (6) Users can directly invoke the Option Module to generate Session options through natural language prompting within the Session control panel. (7) Users can also directly add and modify options manually through JSON input in the Session control panel. (8) Users can pin Inline options to the Session by clicking the pin icon on each inline option, which applies the inline option to follow-up queries.

In-lab comparative study

Participants: 16 knowedge workers familiar with generative AI tools with varying experience with data analysis and programming.

Tasks: Participants completed 6 tasks, organized into 2 task-sets. Each task-set involved 3 common tasks done with GenAI assistance: code explanation, complex topic understanding, and skill learning. The goal of each task was to interact with the system to craft an explanation based on each task prompt that helped the participant understand each task and made them confident that they could answer questions about the task with the explanation.

Protocol: Our two conditions for this within-subjects user study were Dynamic and a Static version of the system that provided generally applicable prompt refinements which did not change between prompts or task (see image to right) . Every other element and interaction within the system was identical between conditions. We chose a within-subjects protocol so that participants could reflect on the similarities, differences, and trade-offs of using the two systems.

We did not compare Dynamic or Static systems to a 'baseline' ChatGPT or similar system, as our formative study results showed that users strongly desired control of AI-generated content, which we suspect would be replicated in the comparative tool study. Instead, we directly compare Dynamic against Static Prompt Middleware to better understand the effectiveness of different approaches of control for users, and the trade-offs of each.

Participants were assigned Dynamic and Static conditions and A and B task-sets through a counterbalanced design, such that half the participants received the Dynamic condition first, and the other half received the Static condition first. Within each of these condition-first groups, each task-set was balanced such that half of each condition saw the A task-set first, and the other half received the B task-set first. Therefore, there were four equal groups of participants during the study.

Results

User preferences for control: Participants preferred to use the Dynamic system over the Static system, as they felt it made them more successful at interacting with the AI.
User perceived effectiveness: Participants thought the dynamic controls were more effective for controlling AI output, and were satisfied with the level of control granted vs in the static condition where many felt they still needed more control over the AI responses.

Promptly lowers barriers to providing context to AI: Current participant workflows for prompt engineering was seen as tedious (P4), inconsistently effective (P4, P5), time-consuming (P5), and a current barrier participants had with using GenAI that new control affordances, like that of the Dynamic system, could help address. Participants found it difficult to "even have the words for the context" they wanted to convey in their prompts as there was a noticeable gap in expertise needed between prompting and prompt engineering (P4). Providing task and user context within prompts was seen as critical for obtaining responses that aligned with user needs (P2). Promptly was described as helpful for assisting the users in providing much of the context necessary for the AI to effectively respond to a user's needs for a task and eliminate much of their struggles due to forming prompts (P2, P4, P8, P15). For example: "I felt like [Promptly] would really help me shortcut what I'm doing manually today." (P15)

Promptly improves user perception of control over AI: Participants felt that the Dynamic system gave more effective control over the Static system and their current prompting strategies (P4, P6, P8). Without the control provided by Promptly, one participant felt:
"restricted by the current Copilot [because I] don't have these options, I'm trying to modify the prompt to get the answer out and the AI just goes out of control after some time, so I have to refresh and start again and change my prompts." (P12)
The greater control afforded through generated options was seen as useful for receiving AI responses that fit the personal preferences of the user (P14), which participants felt was made possible through Promptly's afforded flexibility (P7), greater precision in defining the task (P8), better predictability of AI responses (P6), and greater adaptation to user needs (P13).

Promptly provides guidance for AI steering interactions. : Participants saw the Dynamic system as a guide in interacting with AI by providing helpful, and relevant, options (P1, P2, P4, P6, P7, P9, P14). For example, P7 saw the generated options as a form of debugging the AI's response since it "gives you some ideas around if you change [the response] in this way then you might get better answers." Participants felt that Dynamic PRC guided them towards better responses (P7, P10, P11), giving alternatives of what they might do to solve their problem (P7). Dynamic options did this by helping participants review potential refinements to their prompt (P1, P4, P9) and give greater structure to their AI interactions through generated options (P14). P11 said it even helped them learn potential strategies for future prompts, saying that Promptly was "giving me more of a prompt learning experience, and I'm getting out of [the AI] what I want. And actually, it's a better response."

Design Implications

From our analysis of participants' reflections, we derive four implications for improving the design of Dynamic PRC systems: balance appropriate/useful mental load to users, provide control over how options are applied, leverage relevant context, and allow direct manipulation of options.

Dynamic PRC should provide appropriate mental load to users: Dynamic PRC was found to be more mentally demanding to use over Static PRC. This might have been due to the amount of novel prompt refinements generated that the user had to attend to and understand with each prompt. One potential UX-pattern to address this is progressive disclosure, where Dynamic PRC initially only offers a few critical meta-refinements and allows the user to request that the Dynamic PRC system generates more when interested or required. Dynamic PRC might also leverage what it knows about the user, through system usage or user personas, to automatically select refinements it is confident about. Dynamic PRC might present these refinements in a separate part of the UI, so that the user does not have to attend to them while they are effective, while still retaining control over these refinements when necessary.

Dynamic PRC should gather and leverage relevant context for the user: While the current implementation of Dynamic PRC uses the user's prompt and currently selected options to generate options, participants wanted the system to leverage their past AI usage and other context or tool usage to inform the generation, and selection, of Dynamic PRC and provide greater context to the AI. For example, for code generation tasks systems could use previous AI code generation requests to infer what programming language to use (P10), or incorporate the files the user has open in their IDE. As with the previous need for understanding how AI interprets options, transparency was cited as a critical need for understanding and controlling what data the AI uses to form a response (P6, P7, P9, P12). Forming user personas based on information like user-provided expertise level and job position (P2) and interacting with the user to let the user select what relevant data AI should use to form responses (P1, P11, P12, P15) were also seen useful features for greater specificity and personalization of responses. This need might be fulfilled by explicit signals from the AI in the response on when data comes from an external source, or from the model itself, by highlighting each within a response (P6). Users might then select these highlights to steer the AI toward using sources and data that the users prefer (P4, P10).

Dynamic PRC systems might also generate richer PRC modalities, like interactive maps to reveal and collect geographic needs or visual representations of options to provide the user inspiration for what kind of context the AI is looking for, which might improve the process of context gathering from the user (see image). However, there must be a balance between the amount of context-gathering interactions that the user must attend to, so that information overload and fatigue in providing preferences to the Chat Module is avoided. This issue can potentially be eased through PRC option reuse, similar to what session options in the Dynamic PRC and options in the Static PRC approaches provide.

Dynamic PRC should provide control of how and where the AI applies options: Though participants believed the control afforded by Dynamic PRC was useful and made them more effective, they felt that more control over the application of options could help address the barriers that remain. Participants noted a need for greater transparency on how the AI interprets the options selected and prompt given to the AI (P1, P14, P15). This might be achieved through providing several potential examples of the structure and format of the response for selection (P3, P8), supporting interventions within an AI response where users prompt the AI to adjust specific sections of a response by directly interacting with the response to communicate intent (P6, P10, P15), or even provide response diffs (similar to code diffs) to allow comparisons between responses based on the selections made by the user to address barriers to user understanding around how the option selections impacted the response (P1). The current design of the Dynamic PRC system is capable of supporting such response branching and exploration, but the addition of informative diffs would help address the barrier to user understanding of the impact of options on responses.

Another useful ability would be to allow the user to make targeted changes to the response, so that users can keep results that they like, and refine what they do not through Dynamic PRC interactions. The user might wish to preserve some aspects of the structure of the response while refining the content (P14) which can be lost when modifying prompts.

Dynamic PRC should provide direct manipulation to update options: While it was uncommon for participants to want to modify the generated Dynamic PRC options during the study (as participants could either prompt the Option Module to regenerate the option with changes or modify the JSON representation), some felt the need to directly update options (e.g., click on an option to edit the text and regenerate the option based on the changed text) (P2). The Dynamic PRC system can be improved to allow users to update parts of the UI (e.g., the option labels) to send these modifications to the Option Module as user intent. Based on this intent, the Option Module could regenerate the option to better suit the user's needs, supporting a form of response-by-example similar to programming-by-example interfaces.