Facilitating AMIE’s physician-centered oversight
Our research AI system for medical reasoning and diagnostic dialogue, Articulate Medical Intelligence Explorer (AMIE), was recently shown to provide accurate medical advice in text-based simulations of patient visits. However, individual patient diagnoses and treatment plans are regulated activities that must be reviewed and approved by licensed medical professionals before they are communicated to the patient. At the same time, oversight is an established paradigm in medicine: it gives care team members autonomy while the overseeing primary care physicians (PCPs) retain accountability for the patient's care. In light of this, our current study investigates a framework for physician oversight of AMIE. In “Towards physician-centered oversight of conversational diagnostic AI”, we introduce guardrailed-AMIE (g-AMIE), an extension of our AMIE research system built as a multi-agent setup based on Gemini 2.0 Flash. g-AMIE gathers patient information through dialogue (i.e., history taking) and generates a body of information for a clinician to review.
This body of information includes a summary of the collected data, a proposed differential diagnosis and management plan, and a draft message to the patient. g-AMIE is designed with guardrail constraints that prevent it from sharing any individualized medical advice, such as a patient-specific diagnosis or treatment plan. An overseeing PCP reviews and edits this information through a specialized web interface called the clinician cockpit. Because history taking is decoupled from medical decision-making, the overseeing PCP can review cases asynchronously. In a randomized, blinded, virtual objective structured clinical examination (OSCE), we compared g-AMIE's performance with that of nurse practitioners (NPs), physician assistants/associates (PAs), and PCPs working within the same guardrail constraints. Our findings show that overseeing PCPs and independent physician raters preferred g-AMIE's diagnostic performance and management plans, and that patient actors preferred g-AMIE's patient messages. Although this is a significant step toward human–AI collaboration with AMIE, the results should be interpreted carefully, especially in comparisons with clinicians: the workflow was designed with AI systems in mind, and clinicians have not been trained to work within this framework.
An oversight cockpit for clinicians
To enable physician oversight, g-AMIE produces a detailed medical note, which the overseeing PCP then reviews in our clinician cockpit interface. We developed the cockpit in a co-design study with 10 outpatient physicians. The co-design combined semi-structured interviews with potential users and thematic analysis; the results were then shared with a UI designer, who drafted the interface.
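Because history taking and review are decoupled, the hand-off from g-AMIE to the overseeing PCP can be pictured as a queue. The sketch below is a minimal illustration under our own assumptions, not g-AMIE's implementation: `CaseBundle`, `finish_intake`, `pcp_review` and `send_to_patient` are hypothetical names, and an in-process `queue.Queue` stands in for whatever storage a real deployment would use. It shows the two properties described above: the intake side only enqueues work, and nothing can be sent to the patient until a clinician has approved it.

```python
from dataclasses import dataclass, replace
from queue import Queue
from typing import List, Optional


@dataclass(frozen=True)
class CaseBundle:
    """Hypothetical container for what g-AMIE hands to the overseeing PCP."""
    case_id: str
    summary: str              # summary of the collected history
    differential: List[str]   # proposed differential diagnosis
    plan: str                 # proposed management plan
    draft_message: str        # draft patient message, not yet sent
    approved: bool = False    # set only by the overseeing clinician


# Stand-in for persistent storage between intake and review (an assumption).
review_queue: "Queue[CaseBundle]" = Queue()


def finish_intake(bundle: CaseBundle) -> None:
    """Intake side: enqueue the case for later review; nothing reaches the patient."""
    review_queue.put(bundle)


def pcp_review(edited_message: Optional[str] = None) -> CaseBundle:
    """Oversight side, possibly hours later: optionally edit, then approve."""
    bundle = review_queue.get()
    if edited_message is not None:
        bundle = replace(bundle, draft_message=edited_message)
    return replace(bundle, approved=True)


def send_to_patient(bundle: CaseBundle) -> str:
    """Release the message only after clinician approval."""
    if not bundle.approved:
        raise PermissionError("message must be approved by a licensed clinician")
    return bundle.draft_message


# Usage with placeholder strings.
finish_intake(CaseBundle("case-1", "<summary>", ["<dx 1>", "<dx 2>"], "<plan>", "<draft>"))
approved = pcp_review(edited_message="<PCP-edited message>")
print(send_to_patient(approved))
```

For brevity only the draft message is editable here; the text above describes the PCP reviewing and editing the whole bundle.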
The cockpit is structured around the widely used SOAP note format, which has four sections: Subjective (the patient’s perspective on their condition), Objective (observable and measurable patient data, like vital signs or lab results), Assessment (a differential diagnosis with justification), and Plan (a management strategy).
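A note with these four sections maps naturally onto a small data type. The sketch below is illustrative only: `SOAPNote`, `DiagnosisEntry`, their field names, and the `render` layout are our own assumptions, not the cockpit's actual schema.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DiagnosisEntry:
    condition: str       # a candidate diagnosis
    justification: str   # why it is on the differential


@dataclass
class SOAPNote:
    subjective: str                    # the patient's account of their condition
    objective: Dict[str, str]          # observable/measurable data, e.g. vitals, labs
    assessment: List[DiagnosisEntry]   # ranked differential with justification
    plan: List[str]                    # management steps

    def render(self) -> str:
        """Render the note as plain text, one section after another."""
        lines = ["S: " + self.subjective]
        obj = "; ".join(f"{k}: {v}" for k, v in self.objective.items())
        lines.append("O: " + (obj or "none recorded"))
        lines.append("A:")
        for rank, d in enumerate(self.assessment, start=1):
            lines.append(f"  {rank}. {d.condition} ({d.justification})")
        lines.append("P:")
        lines.extend(f"  - {step}" for step in self.plan)
        return "\n".join(lines)


# Usage with placeholder content.
note = SOAPNote(
    subjective="<patient-reported history>",
    objective={},
    assessment=[DiagnosisEntry("<most likely diagnosis>", "<justification>")],
    plan=["<management step>"],
)
print(note.render())
```

Keeping the Assessment as a list of diagnosis/justification pairs mirrors the section's definition as a differential diagnosis with justification.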
History taking and medical note generation
To ensure that g-AMIE respects its guardrails during history taking and produces accurate, high-quality SOAP notes, we developed a multi-agent system with a dialogue agent, a guardrail agent, and a SOAP note agent. The dialogue agent aims to carry out high-quality history taking in three phases: general history taking, targeted validation of an initial differential diagnosis, and a conclusion phase that addresses the patient's questions. The guardrail agent ensures that the dialogue agent’s responses do not contain any individualized medical advice, rephrasing them as necessary. The SOAP note agent performs sequential multi-step generation, separating the summarization tasks (Subjective and Objective) from the inferential tasks (Assessment and Plan) and from the generation of the patient message.
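The three agents can be sketched as plain functions over a text-in, text-out model. This is a minimal sketch under stated assumptions, not g-AMIE's actual prompts or control flow: `LLM` stands in for any model call (for example, a Gemini 2.0 Flash request), and the prompts, the YES/NO check-then-rephrase guardrail, and the names `Phase`, `next_phase`, `guarded_turn` and `soap_note` are all hypothetical. In this sketch the Assessment and Plan steps are conditioned on the Subjective and Objective summaries rather than the raw transcript; that is one plausible reading of "separating" the tasks, not a documented detail.

```python
from enum import Enum, auto
from typing import Callable, Dict

# Hypothetical model interface: prompt in, text out (e.g. a Gemini call).
LLM = Callable[[str], str]


class Phase(Enum):
    GENERAL_HISTORY = auto()  # broad history taking
    DDX_VALIDATION = auto()   # targeted questions to test an initial differential
    CONCLUSION = auto()       # address the patient's questions and close


def next_phase(phase: Phase) -> Phase:
    """Advance through the phases in order, staying in CONCLUSION at the end."""
    order = list(Phase)
    return order[min(order.index(phase) + 1, len(order) - 1)]


def guarded_turn(dialogue: LLM, guardrail: LLM, transcript: str, phase: Phase) -> str:
    """Dialogue agent drafts a reply; guardrail agent checks it and rephrases if needed."""
    draft = dialogue(f"[phase={phase.name}]\n{transcript}")
    verdict = guardrail(
        "Does this reply contain individualized medical advice? Answer YES or NO.\n" + draft
    )
    if verdict.strip().upper().startswith("YES"):
        draft = guardrail("Rephrase without individualized medical advice:\n" + draft)
    return draft


def soap_note(llm: LLM, transcript: str) -> Dict[str, str]:
    """Sequential multi-step generation: summarize, then infer, then draft the message."""
    note: Dict[str, str] = {}
    # Step 1, summarization: S and O are extracted from the transcript.
    note["S"] = llm("Summarize the patient's reported history:\n" + transcript)
    note["O"] = llm("List observable or measurable findings:\n" + transcript)
    # Step 2, inference: A and P see only the summaries (an assumption).
    summary = f"S: {note['S']}\nO: {note['O']}"
    note["A"] = llm("Give a differential diagnosis with justification:\n" + summary)
    note["P"] = llm("Propose a management plan:\n" + summary + "\nA: " + note["A"])
    # Step 3: draft the patient message, which is held for clinician review.
    note["message"] = llm(
        "Draft a message to the patient for clinician review:\n"
        + summary + "\nA: " + note["A"] + "\nP: " + note["P"]
    )
    return note
```

Passing the model in as a plain callable keeps each agent testable with stub functions and makes the ordering of the SOAP steps explicit.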