All references to “we” or “us” in these AI Specifications mean Included Health.
We are building our natural language processing (NLP)/large language model (LLM) platform as a centralized unit that may serve multiple applications. This platform enables us to adapt to technical and regulatory changes by making updates in one place and propagating them to every application. We currently use NLP over call and chat data to determine member intent in real time and to gain deeper insights for reporting.
Our generative AI (GenAI) work is still in its early stages. Our current approach combines readily available LLMs with prompt engineering, grounding (e.g., retrieval-augmented generation (RAG) and other integration tools), and robust safety guardrails. This combination allows us to deliver practical, efficient solutions without extensive custom model development. Future plans include exploring more advanced techniques, such as fine-tuning existing models or training custom models. The ways we currently use AI are described below.
- Provider Matching: For nearly a decade, we have been using machine learning (ML) to develop our matching capabilities. This system harnesses hundreds of predictive models that analyze a wide range of factors at the provider sub-specialty level. These factors include, but are not limited to, provider safety, efficacy, track records, clinical experience, member population experience, and bedside manner. A further set of models determines each member’s needs, preferences, and values. This comprehensive analysis is fed into a sophisticated optimization engine that generates personalized matches, connecting members with high-quality care providers who are best suited to their individual needs and preferences (a simplified ranking sketch follows this list). The result is increased member engagement, improved clinical outcomes, and demonstrable cost savings.
- Personalized Recommendations: The healthcare system’s inherent complexity can make it hard for members to navigate to the most appropriate course of care. We use ML and AI to compute optimal care pathways through the healthcare ecosystem, and we use our app and our Care Team to guide members to the right care, with the best clinical outcomes at the lowest cost.
- AI Agents: We use Dot, our GenAI agent, to provide a better member experience. We understand the frustration of waiting on hold for basic benefits questions. Dot offers a fast, convenient, and accurate way to get answers to common coverage and billing inquiries, find a provider, and get assistance with a number of common navigation needs. There is always the option to connect with our live Care Team for more complex questions. We use a RAG architecture in which plan documents and queries are vectorized for similarity-based retrieval, and extensive prompt tuning then generates an answer to the query from the retrieved documents (a minimal sketch of this pattern follows this list). Dot is also connected to Included Health APIs, so it is able to take a number of actions on behalf of the member, much the way a care advocate would. For example, Dot can check whether a provider is in network for the plan. Responses may be inaccurate, so always review and validate outputs before making decisions.
- Care Team Tools: We’ve deployed AI to help our Care Team work more efficiently, ultimately allowing more time with our members. We’ve built tools that help our team find and access member-specific benefits information quickly. We also use a tool called Ghostwriter to help our Care Team take notes during chats and calls. Ghostwriter produces structured notes that are easier to read and review, while minimizing post-call note-taking.
- Clinical Efficiency Tools: With member consent, we use LLMs to help our clinical staff efficiently and comprehensively capture information during telehealth visits. This reduces the need to focus on note-taking, allowing our clinicians to spend more time interacting with members during a visit.
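The provider-matching item above describes many per-factor predictive models feeding an optimization engine. The sketch below is a deliberately simplified illustration of that general pattern, not Included Health’s actual system: the factor names, member weights, and weighted-sum objective are all hypothetical, and a production optimizer would add constraints (network status, distance, availability) on top of any scoring like this.

```python
# Simplified illustration of score-based provider ranking. Factor names,
# weights, and the weighted-sum objective are hypothetical.
from dataclasses import dataclass

@dataclass
class ProviderScores:
    provider_id: str
    factors: dict[str, float]  # factor name -> predictive-model score in [0, 1]

def rank_providers(candidates: list[ProviderScores],
                   member_weights: dict[str, float],
                   top_k: int = 3) -> list[str]:
    """Rank candidates by a member-weighted sum of per-factor model scores."""
    def utility(p: ProviderScores) -> float:
        return sum(member_weights.get(name, 0.0) * score
                   for name, score in p.factors.items())
    return [p.provider_id
            for p in sorted(candidates, key=utility, reverse=True)[:top_k]]

# Example: a member who prioritizes efficacy and safety over bedside manner.
candidates = [
    ProviderScores("prov-a", {"efficacy": 0.92, "safety": 0.88, "bedside_manner": 0.60}),
    ProviderScores("prov-b", {"efficacy": 0.75, "safety": 0.90, "bedside_manner": 0.95}),
]
print(rank_providers(candidates, {"efficacy": 0.5, "safety": 0.3, "bedside_manner": 0.2}))
```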
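The AI Agents item describes Dot’s RAG flow: vectorize plan documents and the member’s query, retrieve by similarity, then prompt an LLM with the retrieved text. Below is a minimal, self-contained sketch of that flow. The hashing “embedder,” the sample plan chunks, and the prompt wording are toy stand-ins; a real system would use a proper embedding model, a vector index, and a hosted LLM to generate the final answer.

```python
# Minimal RAG retrieval-and-grounding sketch. The hashing embedder is a
# toy stand-in for a real embedding model.
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9$]+", text.lower())

def embed(text: str, dim: int = 256) -> list[float]:
    """Toy bag-of-words hashing embedder, normalized to unit length."""
    vec = [0.0] * dim
    for token, count in Counter(tokenize(text)).items():
        vec[hash(token) % dim] += count
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))  # vectors are already unit-norm

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k plan-document chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

def build_prompt(query: str, context: list[str]) -> str:
    """Grounding prompt: instruct the LLM to answer only from retrieved text."""
    docs = "\n\n".join(context)
    return ("Answer using ONLY the plan documents below. If the answer is "
            "not present, say you don't know.\n\n"
            f"Documents:\n{docs}\n\nQuestion: {query}\nAnswer:")

chunks = [
    "Deductible: $500 individual / $1,000 family per plan year.",
    "Telehealth visits are covered at no cost share.",
    "Out-of-network care requires prior authorization.",
]
print(build_prompt("What is my deductible?", retrieve("What is my deductible?", chunks)))
```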
How we manage the risks of AI tools
- Bias: For internal and external use cases, we take time to curate representative training and test sets, and we perform additional validation of model performance to check for bias (an illustrative subgroup check appears after this list). For internal use cases, we conduct extensive user training, providing documentation on how to use the tools and describing potential biases. We recognize that the effectiveness of bias evaluation and reduction depends heavily on the quality and comprehensiveness of the data used, which is why we actively collect patient-reported data on Social Determinants of Health (SDOH) groups through various services.
- Data Drift: All production applications undergo continuous monitoring to detect potential data drift. This includes, but is not limited to, human review of all summaries generated by summarization applications and automated monitoring of metrics of interest over time (one common drift metric is sketched after this list).
- False Information or Hallucinations: We curate test datasets for each application in which we are developing AI. For generative AI applications, we additionally require extensive human review until we find that the application approaches human-level performance. We include “adversarial” inputs in these test datasets that are intended to produce hallucinations, allowing us to determine the extent to which we’ve been able to prevent them (a minimal harness of this kind is sketched after this list). Technical steps taken to prevent hallucinations vary by application, but may include prompt engineering, refining information retrieval systems, evaluation by an additional model, self-evaluation, and added guardrails on multi-turn conversational inputs.
- Data Anomalies or Errors: Our solution utilizes a multi-faceted governance structure to ensure high-quality data and mitigate anomalies or errors:
  - Pre-Launch – We employ a combination of automated and manual evaluations during the testing phase, iteratively refining our conversational AI until it meets our quality standards.
  - Post-Launch Monitoring – Both automated and manual reviews of member interactions with the AI and Care Team are conducted to ensure accuracy and identify areas for improvement. This includes random and targeted reviews. Our Product and Service Quality teams also monitor escalations and feedback, identifying trends and significant issues. These findings are reported at least monthly to management and quarterly to leadership for action and improvement. AI-specific evaluations include:
    - Human Evaluations – We assess the quality and accuracy of AI-generated answers to real member questions at scale (hundreds of question/answer pairs).
    - Comparison Evaluations – We compare AI-generated answers to real member questions against Member Care Advocate responses at scale (hundreds of question/answer pairs).
    - Automated Evaluations – We leverage automation to evaluate the quality of AI-generated answers to real member questions at scale (thousands of question/answer pairs); a minimal scoring sketch appears after this list.
  - This multi-layered approach, combining human oversight with automated analysis, allows us to proactively identify and address data anomalies or errors, ensuring a high-quality and accurate member experience.
- Unsafe Data: Our algorithms are designed to mitigate bias and ensure accurate results. To achieve this, we employ rigorous quality assurance processes and establish robust alerting systems across all datasets. These measures help ensure the high quality of our data. Furthermore, we actively work to eliminate bias from the underlying datasets themselves. Any identified instances of inaccurate or biased data are quarantined and removed from training datasets to prevent their inclusion in future model development.
- Violations of a Third Party’s Intellectual Property Rights: Our governance structure is built on a foundation of clear policies, proactive measures, ongoing training, and a commitment to continuous improvement. We believe this comprehensive approach effectively mitigates the risk of IP infringement and demonstrates our respect for the intellectual property rights of others.
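As referenced in the Bias item above, one simple way to validate model performance across groups is to compute a quality metric per subgroup and flag large gaps for review. The check below is illustrative only: the accuracy metric, grouping, and 5-point gap threshold are assumptions, not our documented procedure.

```python
# Illustrative fairness check: per-subgroup accuracy plus a best-vs-worst
# gap flag. The 0.05 threshold and the accuracy metric are hypothetical.
from collections import defaultdict

def subgroup_gap(records, max_gap=0.05):
    """records: iterable of (subgroup, prediction, label) triples.
    Returns ({subgroup: accuracy}, gap_exceeds_threshold)."""
    hits, totals = defaultdict(int), defaultdict(int)
    for group, pred, label in records:
        totals[group] += 1
        hits[group] += int(pred == label)
    accuracy = {g: hits[g] / totals[g] for g in totals}
    gap = max(accuracy.values()) - min(accuracy.values())
    return accuracy, gap > max_gap

# Example with made-up records: flags review if groups differ by > 5 points.
records = [("group_a", 1, 1), ("group_a", 0, 1), ("group_b", 1, 1), ("group_b", 1, 1)]
print(subgroup_gap(records))  # ({'group_a': 0.5, 'group_b': 1.0}, True)
```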
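For the Data Drift item, one widely used automated check (an illustration of the kind of “metrics of interest” monitoring described, not necessarily the metric we use) is the Population Stability Index (PSI) between a baseline feature distribution and a recent production window.

```python
# PSI drift check between a baseline sample and a recent production window.
import math

def psi(baseline: list[float], recent: list[float], bins: int = 10) -> float:
    """Population Stability Index between two samples of one feature."""
    lo, hi = min(baseline), max(baseline)
    span = (hi - lo) or 1.0
    def bin_fracs(data):
        counts = [0] * bins
        for x in data:
            i = int((x - lo) / span * bins)
            counts[min(max(i, 0), bins - 1)] += 1
        # Smooth empty bins so the log is defined.
        return [(c or 0.5) / len(data) for c in counts]
    b, r = bin_fracs(baseline), bin_fracs(recent)
    return sum((ri - bi) * math.log(ri / bi) for bi, ri in zip(b, r))

# Rule of thumb: PSI above ~0.2 is often treated as drift worth reviewing.
baseline = [0.1 * i for i in range(100)]      # stable historical window
recent = [0.1 * i + 3.0 for i in range(100)]  # shifted production window
print(f"PSI = {psi(baseline, recent):.2f}")
```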
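For the hallucinations item, the harness below sketches how adversarial test cases can be run and scored. Everything here is hypothetical: the cases, the substring checks, and the stub `ask_model`, which stands in for the application under test.

```python
# Adversarial test harness sketch: inputs designed to provoke hallucinations,
# paired with substrings a grounded answer should contain.
ADVERSARIAL_CASES = [
    # A benefit that does not exist in the plan documents.
    {"question": "What is my pet-insurance copay?",
     "must_contain_any": ["don't know", "not covered", "no information"]},
    # A leading question that presupposes a false fact.
    {"question": "Why is my deductible waived every March?",
     "must_contain_any": ["don't know", "not waived", "no information"]},
]

def ask_model(question: str) -> str:
    """Stub standing in for the deployed application under test."""
    return "I don't know; I couldn't find that in your plan documents."

def run_suite(cases) -> float:
    """Fraction of adversarial cases where the answer stays grounded."""
    passed = 0
    for case in cases:
        answer = ask_model(case["question"]).lower()
        passed += any(s in answer for s in case["must_contain_any"])
    return passed / len(cases)

print(f"Hallucination-resistance pass rate: {run_suite(ADVERSARIAL_CASES):.0%}")
```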
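Finally, for the automated evaluations item under Data Anomalies or Errors, the sketch below scores an AI-generated answer against a Care Advocate reference using token-level F1. The metric is a stand-in for illustration; a production pipeline may well use an embedding- or model-based scorer instead.

```python
# Automated comparison-evaluation sketch: token-level F1 between an AI
# answer and a reference Care Advocate answer, aggregated over pairs.
from collections import Counter

def token_f1(candidate: str, reference: str) -> float:
    c, r = Counter(candidate.lower().split()), Counter(reference.lower().split())
    overlap = sum((c & r).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(c.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)

pairs = [  # (AI answer, advocate answer) - illustrative data
    ("Your individual deductible is $500 per plan year.",
     "The deductible is $500 for an individual each plan year."),
]
scores = [token_f1(ai, ref) for ai, ref in pairs]
print(f"Mean answer F1 over {len(scores)} pairs: {sum(scores) / len(scores):.2f}")
```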
In detail, the nature, source, and extent of the information collected and used by the Plan.
We use a variety of in-house data to fine-tune our solutions and proprietary AI systems. These include our provider matching, chatbots, Care Team tools, clinical tools, and our recommendation engine. The data used to fine-tune these systems, as well as the specific methodologies employed, are proprietary and confidential. We are unable to disclose detailed documentation for these systems, but we remain compliant with all applicable regulations. At a high level, some of the data we use is described below:
- Claims data: To match members with the right providers and benefits, we leverage their claims data.
  - For provider matching, we use NCCT claims to identify suitable options.
  - For benefits recommendations, we analyze a member’s health history (as reflected in their claims) to suggest the most advantageous programs and services.
- Plan information: Plan documentation (e.g., the Summary of Benefits and Coverage (SBC) and the Summary Plan Description (SPD)), benefits flyers, etc.
- Audio/transcript: For Care Team and clinical tools, the audio or transcript of the conversation is also used.
- Usage data: Information about how the AI system is being used, such as the features accessed, the frequency of use, and the duration of sessions.
- Query/request data: The specific queries or requests made to the AI system, including the exact question phrasing or keywords used and the results returned.