Forget the Medical Chatbots. Let’s use AI to Fix Healthcare Administration.
I am puzzled by the interest in AI medical diagnosis chatbots. Dozens of startups are tackling one of the most difficult technical problems, clinical diagnosis, with fairly immature technology and no real business model. It’s even more puzzling when you realize there are billion-dollar opportunities in automating administrative processes, which are a massive pain point for almost every medical practice in the country. It’s also much simpler than algorithmic medical diagnosis.
Context: We’re Spending Hundreds of Billions of Dollars on Automatable Clerical Work
Unless you’ve worked at a provider organization or insurer, you probably aren’t familiar with the colossal administrative engine that makes our healthcare system tick.
Let’s start with the basics. We spend about $3.4 trillion annually on healthcare in the United States. About one third, or $1 trillion, is spent at hospitals. About 25% of all hospital spending — over 1% of GDP — doesn’t go to care delivery. Instead, these dollars — about $250 billion, or $750 per American per year — fund hospital administration (per a 2014 Health Affairs study).
Some of those expenses are necessary. Hospitals are complex organizations that require many non-clinical staff. However, a large portion of that expense is clerical work that adds no value for the patient.
Keep in mind that the $250 billion figure only covers hospital-based administrative expenses.
About 25% to 30% of healthcare spending is on ambulatory and outpatient care, which includes most healthcare services that occur outside the hospital, such as primary care, specialty care, outpatient surgeries, and diagnostic services. Benchmarks suggest that at least 20% of all ambulatory spending goes toward administrative support staff. In fact, the average primary care provider in a private practice is supported by almost five non-clinical staff (per Medical Group Management Association benchmarks). Ambulatory and outpatient administrative expenses add another ~$170B in costs each year.
If only 25% of this $420B in annual expense is automatable or augmentable clerical work, we arrive at a total opportunity of over $100B in annual costs. Though some of those costs are outsourced offshore, the domestic clerical burden is enormous. Nearly 60,000 Americans work as medical transcriptionists, over 200,000 are medical records technicians, and another 630,000 work as medical assistants (per the Bureau of Labor Statistics). I didn’t even include the administrative expenses incurred at insurance companies, pharmacies, home health, nursing homes, device suppliers, or in government, sectors that together make up the other ~40% of healthcare spending.
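A quick way to sanity-check the sizing above is to redo the arithmetic in a few lines of Python. The dollar figures are the ones quoted in this post; the function and variable names are only illustrative.

```python
# Back-of-envelope sizing using the figures quoted in the text (billions of USD).
HOSPITAL_ADMIN_B = 250      # hospital administration, per the text
AMBULATORY_ADMIN_B = 170    # ambulatory/outpatient administration, per the text
AUTOMATABLE_SHARE = 0.25    # the text's "if only 25%" assumption


def automation_opportunity(hospital_b, ambulatory_b, share):
    """Return (total admin spend, automatable portion) in billions of USD."""
    total = hospital_b + ambulatory_b
    return total, total * share


total_b, opportunity_b = automation_opportunity(
    HOSPITAL_ADMIN_B, AMBULATORY_ADMIN_B, AUTOMATABLE_SHARE
)
print(f"Total administrative spend: ${total_b}B")
print(f"Automatable opportunity:    ${opportunity_b:.0f}B")
```

Changing `AUTOMATABLE_SHARE` shows how sensitive the headline number is to that single assumption.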
While outpatient practices are rife with operational inefficiencies, I’m going to focus on three clerical (i.e., non-clinical) areas where AI — or its simpler cousin, Robotic Process Automation — can yield the most value.
1. Revenue Cycle & Credentialing
Revenue cycle management (RCM) is the core of getting paid in healthcare. It’s a very complex topic with many opportunities for automation. RCM tools offload some of the complexity to rules engines but still leave a lot of manual processing to humans. In addition, credentialing a clinician with an insurance company — a process in which the insurer validates that the provider is eligible to render care as an in-network provider — is a tedious and time-consuming exercise that adds to a practice’s administrative load.
- Coding & Claim Generation: Practice staff have to create electronic claims after each visit based on the notes in the electronic medical record. They have to scrub each claim based on a series of rules to ensure it will get adjudicated correctly by the insurance company, and these rules can change at any time without advance notice.
- Denials & Appeals: Practice staff have to resubmit rejected claims. This often involves calling insurance companies, reviewing coverage documentation, and consulting coding guidelines to select the appropriate codes for each insurance company.
- Credentialing: Practice staff have to submit detailed personal and educational histories for every billing provider to each insurance company every few years.
2. Insurance Administration
In addition to submitting claims for payment, practice staff spend a lot of time helping patients and clinicians navigate the insurance system.
- Evaluating Insurance Coverage: A patient receives a referral from her PCP to a cardiologist. The practice staff call the insurance company to confirm the cardiologist is in-network and accepting new patients.
- Managing Insurance Authorizations: A patient receives a referral from his PCP for an abdominal CT to rule out appendicitis. In order for the insurance company to cover it, a practice administrator needs to log into a portal and provide documentation. Alternatively, an insurance company might call the practice to conduct a “peer-to-peer” (P2P) with a clinician. During a P2P, a clinician needs to explain why a patient needs a particular service. A similar process is required for pharmacy denials for off-formulary prescriptions (e.g., “Yes, this patient needs the Tier 3 medication because she had an allergic reaction to the Tier 1 preferred option.”).
- Explaining Insurance Benefits: A patient does not understand his benefit design, and the practice staff need to explain it to him and provide an out-of-pocket cost estimate for his high-deductible health plan.
3. Document Management
A significant portion of clerical overhead is spent managing documents, including responding to medical records requests, triaging incoming documents, and completing medical and administrative forms.
- Chart Extraction: A law firm requests medical records for a patient. Practice staff need to identify the patient and excise sensitive information from the medical records (e.g., mental health history). Then they need to mail or fax the records and generate an invoice.
- Document Triage: During a 30-minute period, six specialist reports, two radiology reports, and twenty medication refill requests arrive via fax. Practice staff have to classify each document (CT scan or CT angiogram?), triage what’s time sensitive, and route each document to the correct patient’s chart, often by matching names and dates of birth. If a name or date of birth is inaccurate, they have to destroy the document, contact the sender, and request a new set of records to avoid a HIPAA violation. If the fax is illegible, they have to contact the sender to have the document re-sent.
- Form Generation: A patient needs their PCP to complete a four-page FMLA form, a referral for FSA reimbursement, or a back-to-school form. Support staff take a first pass at filling out the form, then hand it to the PCP to complete the clinical components.
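To make the document triage step above concrete, here is a minimal sketch of the name and date-of-birth matching that staff do by hand. It assumes the name and date of birth have already been extracted from the fax. The `Patient` record, the roster, and the chart IDs are all hypothetical. A real system would need fuzzier matching and an audit trail.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Patient:
    chart_id: str
    name: str
    dob: date


def normalize_name(name: str) -> str:
    """Lowercase, drop punctuation, and sort name parts so 'Doe, Jane' matches 'Jane Doe'."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(sorted(cleaned.split()))


def route_document(name: str, dob: date, roster: list[Patient]) -> Optional[str]:
    """Return the chart ID only when exactly one patient matches both name and DOB.

    Anything else (no match, or several matches) returns None so a human can
    review the document instead of risking a misfiled record.
    """
    key = (normalize_name(name), dob)
    matches = [p for p in roster if (normalize_name(p.name), p.dob) == key]
    return matches[0].chart_id if len(matches) == 1 else None


# Hypothetical roster for illustration only.
roster = [
    Patient("chart-001", "Jane Doe", date(1980, 5, 17)),
    Patient("chart-002", "John Smith", date(1975, 11, 2)),
]

print(route_document("Doe, Jane", date(1980, 5, 17), roster))  # name and DOB both match
print(route_document("Jane Doe", date(1980, 5, 18), roster))   # DOB mismatch, needs manual review
```

Returning `None` on any ambiguity is a deliberate choice here: a misrouted record is costlier than a document left in a human review queue.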
In addition to lowering the cost of care, automating clerical aspects of physician practices would help reduce errors, speed up access to care, and shift our spending to higher-impact roles by reskilling medical billers as care navigators, home health aides, and health coaches. Moreover, there are countless clinically focused AI opportunities, such as augmenting population health outreach, extracting relevant clinical data from medical records, and facilitating care coordination.
So, chatbots might be fun, but there’s probably a real business to be built automating the clerical operating system of the medical economy.
This post was adapted from my recent presentations at the 2017 Behavioral Science & Policy Association Annual Conference and the 2017 Ford Research and Innovation Center Behavioral Analytics Workshop. You can view other conferences I’ve spoken at here.
Three cognitive biases make health hard:
- Present Bias
We are overly focused on present rewards at the cost of our long-term intentions. While it is rational to value the present over the future, we discount the future inconsistently — i.e., hyperbolically instead of exponentially. (There’s a bunch of nuance and debate here, including the role of subadditivity, but I’ll leave that for another time.)
In the context of health, present bias shows up when we discount the future too steeply, which leads us to favor small near-term rewards (e.g., eating chocolate cake) over big long-term rewards (e.g., reducing your likelihood of heart disease by exercising every week). Sitting on your couch and watching TV today is always going to be better than preventing heart disease in 30 years. As a result, present bias makes it hard for people to invest in long-term prevention or chronic disease management.
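The difference between exponential and hyperbolic discounting is easier to see with numbers. The sketch below uses the standard textbook forms, `amount * delta ** delay` and `amount / (1 + k * delay)`. The parameter values (`delta=0.8`, `k=1.0`) and the reward sizes are arbitrary assumptions chosen to make the effect visible.

```python
def exponential_value(amount: float, delay: float, delta: float = 0.8) -> float:
    """Exponential discounting: value shrinks by a constant factor per period."""
    return amount * delta ** delay


def hyperbolic_value(amount: float, delay: float, k: float = 1.0) -> float:
    """Hyperbolic discounting: value falls steeply at first, then flattens."""
    return amount / (1 + k * delay)


def prefers_larger_later(value_fn, wait: float) -> bool:
    """Compare a small reward (50) after `wait` periods with a large one (100) five periods later."""
    return value_fn(100, wait + 5) > value_fn(50, wait)


for wait in (0, 20):
    print(
        f"wait={wait:>2}: "
        f"exponential picks larger-later={prefers_larger_later(exponential_value, wait)}, "
        f"hyperbolic picks larger-later={prefers_larger_later(hyperbolic_value, wait)}"
    )
```

With these parameters the exponential discounter makes the same choice at both horizons. The hyperbolic discounter prefers the larger reward when both are far away, then switches to the smaller one when it becomes immediate. That reversal is the inconsistency described above.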
- Default & Status Quo Bias
We tend to stick with the defaults in our environment. We don’t order a cheeseburger. We order a cheeseburger with a Coke and fries, because that’s the default option. We tend to do what we have done in the past, but many of our habits were created by other people.
Few defaults in our environment promote health. You have to opt in to most health behaviors.
- Optimism Bias
We tend to think we have a lower risk of a bad outcome and a higher chance of a good outcome than we really do (i.e., we are unrealistically optimistic). We smoke cigarettes but believe we have a lower lung cancer risk than other smokers. It won’t happen to me!
By making us think that bad outcomes are less likely than they really are, unrealistic optimism psychologically dampens the rational factors that should motivate us to change our behavior.
These biases are extremely difficult to tackle. In fact, it would be unwise to challenge them directly (e.g., by pointing out a person’s statistical risk of lung cancer or family history of heart attacks). Fortunately, we don’t have to fix our innate biases to improve health. Instead, we can leverage other cognitive biases to mitigate them through counternudging.
I typically tap into five key biases to act as counternudges. I’ll briefly review each bias and provide examples of how I would leverage them in an email (or secure message) designed to nudge patients who were overdue for a FIT test that they previously agreed to complete. (For the uninitiated, a FIT or FOBT test is a stool-based swab used to screen for colon cancer. It is fairly easy and painless, and can be completed at home when you use the bathroom.)
- Consistency
We like to be consistent with ourselves. We want to be the people we think we are. You’ll find some excellent examples of how companies use this effect in the Consistency entry in my Behavior Library.
You might use the following consistency frame in an email: “It looks like you said you’d complete a FIT test to screen for colon cancer, but we haven’t received your results.”
This tactic works because we remind you that you committed to a particular course of action (without using an accusatory tone), we remind you why it is important, and we give you an “out” if you already did it.
- Social Norms
We like doing what everyone else is doing. As with consistency, social norms feature prominently in product design (see the Social Norm entry in my Behavior Library).
In an email, you might simply state: “Most people return their kit within 2-3 days.”
What’s nice about this example is that it implies a level of ease (“oh, I can do this quickly”) while also providing social cues.
A word of caution: social norms done wrong are counterproductive. For example, overspecifying a social comparison (e.g., “91% of my patients return their kit within 2-3 days”) can break voice and make a personalized message feel like a mass email.
- Goal Acceleration (AKA Goal Gradient Effects)
We put more effort into actions as we move toward goal achievement. You might have experienced this effect in the context of loyalty programs, which are typically initialized above zero in order to make you feel that you are already making progress toward your reward (see the Goal Acceleration entry in my Behavior Library). When a restaurant starts your loyalty card with two hole punches instead of one, you are perceptually closer to the goal. The closer you are to the finish line, the more you visit the restaurant.
You can make patients more likely to act by making them feel perceptually closer to their goal. In the FIT test example, you might tell patients: “You’re almost done. You’ve already decided to do the test, and we sent it to your home yesterday. Now all you have to do is mail it back.” This language can help shrink the gap to the finish line, and accelerate progress as a result.
- Implementation Suggestions
An implementation suggestion is a variant of an implementation prompt. In an implementation prompt, you ask someone how they are going to do something. For example, you might give a patient a flyer listing dates and times for a flu vaccine clinic, and ask them to circle the time they plan on attending.
An implementation suggestion is a unidirectional implementation prompt. Instead of asking patients to tell you how they are going to implement their intention, you instead give them a helpful suggestion on how to best do it. These suggestions should help patients overcome common behavioral barriers.
For example, patients sometimes fail to complete their FIT test because they forget to put it in their bathroom. The following implementation suggestion offers a helpful tactic: “I’d recommend putting the kit on your toilet seat to remind you to complete it the next time you use the bathroom.”
- Identifiability
We are more likely to do things for individuals than for groups. We might donate money to save a particular child’s life (“Uma, age 4”), but we are less likely to help an unknown person in a village 3,000 miles away.
This simple effect suggests that leveraging the identity of a familiar individual might help patients more readily act on their intentions. For example, emails sent from Dr. Welby (my PCP) will likely prompt more action than those sent from other medical staff or the impersonal “no-reply” sender. When direct emails are not possible, reiterating connections to someone the patient knows can be helpful. For example, a team member might introduce herself in an email as “Sally, a nurse practitioner who works closely with Dr. Welby.”
Incorporating a photo of the trusted health provider is another way to leverage identifiability. A headshot of your PCP set as the contact photo or embedded in the signature line of the personalized message increases the salience of the relationship and motivates action.
These tactics work. Deploying counternudges has enabled us to achieve rates of preventive cancer screenings and vaccinations in the top 10% in the country — while simultaneously delighting our patients.
Obligatory disclaimer: All views are mine and do not necessarily represent the views of my current or former employers.
A commitment pledge is a behavior change device in which an individual makes a positive affirmation to adopt a set of behaviors or beliefs.
Pledges work in theory because people generally dislike holding inconsistent views (i.e., dissonance). High-salience pledges (e.g., a signed pledge hanging on a wall) should present more opportunities for dissonance and therefore be more effective. In addition, committing to public pledges might be even more effective because of the added effects of social pressure.
As a result of these behavioral dynamics, commitment pledges can play a valuable role in behavior change. Over the past few years, pledges have become increasingly common in primary care. Many organizations now ask primary care providers to sign commitments to avoid unnecessary antibiotics, prescribe opioids responsibly, and deliver an excellent service experience.
Pledges garner a lot of attention and sometimes yield positive results. However, I think pledges distract from much bigger opportunities to empower PCPs to deliver better care.
The problem with pledges is that they amp up motivation without increasing ability. A pledge is the right tool for reducing unnecessary antibiotic prescribing only if we think that PCPs are already skilled at convincing patients with colds that antibiotics are not helpful, but simply don’t care enough about antimicrobial resistance to have those conversations.
This strikes me as odd. When I talk to outlier clinicians, I almost always find that they care deeply about delivering the highest quality care; they just don’t know how to confidently and persuasively talk a demanding patient out of a z-pak. Asking these clinicians to take a pledge will likely lead to blunt interactions that turn trust-building and educational opportunities into relationship-harming events. On the other hand, the PCPs who have better antibiotic prescribing rates and better patient satisfaction scores have often cultivated an interpersonal style that instills trust and makes patients feel heard. Subtle differences in language, tone, and physical presence separate the PCP you trust from the one you second-guess.
Instead of creating pledges, we should extract tacit knowledge from high-performing clinicians and teach these skills as best practices. Empowering PCPs to deliver relationship-based care will be far more effective than leveraging flashy tactics from behavioral science.
Obligatory disclaimer: All views are mine and do not necessarily represent the views of my current or former employers. This post will also be very boring if you don’t care about employee health benefits.
Over the past few years, I’ve chatted with many healthcare startups looking to break into the employee benefits space. The question I get most often is “What do employers know about the health needs of their employees?” There are several ways to answer the question, but I’m going to focus this post on explaining the types of data that small to midsize employers (500-5,000 covered lives) receive from payers.
For context, I’m not an employee benefits expert (nor do I aspire to be one), but I’ve spent a lot of time leading value-based care programs with employers, brokers, consultants, and payers. As a result, I’ve been intimately exposed to the nitty-gritty details of how employers of all sizes — from 500 to over 50,000 lives — make benefit decisions.
The Challenge & Opportunity of Employee Benefits
HR leaders have a hard job. To gain empathy for the problem, imagine you lead HR at a technology or professional services firm that employs about 1,000 people. If the average age is in the mid 30s, you’re probably funding care for about 1,500 members (including spouses, children, and other dependents). Assuming each member incurs about $5,000 in total medical, pharmacy, and administrative costs per year, you are in charge of about $7M to $8M in annual spending.
Given this level of responsibility, HR leaders need insight into their population. Most benefits managers are not clinicians or data scientists, so they need help to make the right decisions for their employees. Many large self-insured organizations pay consultants to use claims data to generate insights and provide recommendations. However, small and midsize employers primarily make decisions based on data provided by the insurer. These employers are generally less empowered with the right information and insights, and it’s not their fault.
What Payers Provide
The data that payers give employers varies based on several factors. The biggest determinant is whether the employer is self-insured or fully-insured. Self-insured employers generally have at least 1,000 covered lives. At this scale, they bear risk for the medical costs incurred by their population. They use the payer for administrative services only (ASO). The payer provides the network, pays claims, manages enrollment, conducts utilization management, and delivers a variety of other administrative services. Self-insured employers often have more data because they are at risk for medical costs and essentially operate as their own health plan. On the other hand, fully-insured employers pay pre-determined premiums to the payer, and the payer bears risk. As a result, fully-insured employers generally have less access to data. Various privacy laws, which differ by state and city, also govern what data are available to employers.
These complexities aside, most employers have quarterly or biannual meetings to review utilization data. Payers have tools that generate standard Excel reports from claims data. Account management teams typically build PowerPoint slides that use these templated reports. Employers, payers, and often the employer’s benefits broker (unless they use a PEO) will sit down for an hour or two to discuss utilization data and review opportunities for improvement. Depending on the size of the employer and the importance of the relationship, payers will bring several account management representatives, a doctor or nurse, and sometimes a pharmacist.
While every payer is different, the reporting packages are fairly similar in the small to midsize market. I’ll break down the key components and give you a sense of what’s generally included and what’s not.
1) Summary Demographics, Utilization, and Cost
The first section typically provides the employer with basic demographic data and key utilization metrics. All reporting usually includes current period data, prior period data, and “book of business” benchmarks (i.e., comparisons to the payer’s other employer populations).
|Typically Included||Typically Not Included|
|Demographics & Enrollment|
|Plan Selection (sometimes)|
|Cost & Utilization|
People assume the summary contains the most important metrics. However, the summary is merely a read-out of whatever data are in the default reporting package. These topics might not be relevant for an employer and are generally provided with no sense of scale or importance. Additionally, there are no indications of what differences are meaningful. Distracted by the deluge of metrics, people may waste time trying to interpret data that likely has no significance (e.g., a 5% change in ER visit rates on a base of 1,500 lives).
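To see why a 5% change on 1,500 lives is probably noise, here is a rough significance check. It is a sketch under stated assumptions: the baseline of 200 ER visits per 1,000 members per year is hypothetical, membership is held constant across periods, and visit counts are treated as Poisson.

```python
import math


def rate_change_z_score(members: int, visits_per_1000: float, pct_change: float) -> float:
    """Approximate z-score for a change in visit counts between two periods.

    Treats each period's visit count as Poisson, so the variance of a count
    equals the count and the difference has variance (count_a + count_b).
    """
    before = members * visits_per_1000 / 1000
    after = before * (1 + pct_change)
    return (after - before) / math.sqrt(before + after)


# Assumed baseline: 200 ER visits per 1,000 members per year (hypothetical).
z = rate_change_z_score(members=1500, visits_per_1000=200, pct_change=0.05)
print(f"z = {z:.2f}; significant at the 5% level: {abs(z) > 1.96}")
```

Under these assumptions the change amounts to about 15 visits, well inside the range expected from chance alone. The same 5% shift in a population 100 times larger would clear the threshold easily.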
2) Utilization & Cost Breakdown
Payers generally provide a breakdown of spending by category, with charts that show the percent of total spend by category and year-over-year cost trend comparisons.
|Typically Included||Typically Not Included|
To make matters more complicated, the medical services categories that most payers use are highly confusing. Having gone through the exercise in the past, I’ll admit that categorizing each procedure, service, and item delivered in healthcare is challenging. However, the reports that employers receive are generally incomprehensible and incomplete, and often lead to incorrect conclusions.
Here’s an example of how a payer might categorize healthcare spending (all data is fictional but directionally accurate):
|% of Total Paid Amount||% Year over Year Trend|
This data is confusing because some of the categories are places of service — inpatient facility, ambulatory facility, emergency room — while others are types of services — lab, radiology, home health. Furthermore, there is no data on unit cost and volume, so there is no way to dissect the drivers behind changes in medical trend. It is also difficult to evaluate the success of interventions (e.g., an ER avoidance campaign) when the relevant data is not reported.
While it is impossible to draw conclusions from this data, smart people can tell any story. You can imagine the confusing conversation that results:
- Are procedures shifting from inpatient settings to ambulatory surgical centers? Or is the total volume going down?
- Maybe the volume of both are increasing, but costs are lower because the payer negotiated better facility rates?
- Or perhaps inpatient admissions are increasing, but total costs are lower because fewer people had babies?
- Wouldn’t fewer babies mean lower radiology costs? But are sonograms done by a hospital-based obstetrician listed in the radiology or inpatient facility category?
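If reports did include unit cost and volume, most of the questions above could be settled with a simple decomposition of the spending change. The sketch below splits a change in spend into a volume effect, a price effect, and their interaction. The inpatient numbers are invented for illustration.

```python
def decompose_spend_change(vol_prior, cost_prior, vol_current, cost_current):
    """Split a change in spend (volume x unit cost) into three additive effects."""
    volume_effect = (vol_current - vol_prior) * cost_prior
    price_effect = (cost_current - cost_prior) * vol_prior
    interaction = (vol_current - vol_prior) * (cost_current - cost_prior)
    return {
        "volume_effect": volume_effect,
        "price_effect": price_effect,
        "interaction": interaction,
        "total_change": volume_effect + price_effect + interaction,
    }


# Hypothetical inpatient numbers: admissions rose, average paid per admission fell.
result = decompose_spend_change(
    vol_prior=80, cost_prior=20_000, vol_current=88, cost_current=17_000
)
for name, dollars in result.items():
    print(f"{name:>14}: {dollars:>+12,.0f}")
```

In this made-up case total spend falls even though admissions rise, which is exactly the kind of story that a percent-of-spend table cannot confirm or rule out.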
3) Network Utilization
|Typically Included||Typically Not Included|
The big challenge with this section is that it is difficult for employers to understand trends in out-of-network (OON) utilization. I often see OON spending cluster in a few areas, such as physical therapy and psychotherapy, but these patterns are invisible in most payer reporting. As a result of these gaps in reporting, employers struggle to take action to improve in-network care for their employees.
4) High Cost Claimants
High cost claimant reporting isolates the effects of high cost members (typically $50K+ or $75K+ in total trailing-twelve-month, or TTM, spending) from the rest of the population.
|Typically Included||Typically Not Included|
This section is generally useful, but it is usually the only area where the distortionary effects of catastrophic events are removed from utilization reporting. As a result, employers can draw incorrect conclusions because the high cost claimant exclusions do not propagate throughout the reports. In addition, even when employers use a payer’s PBM, the reports never link high cost claimants across medical and pharmacy spend. Unfortunately, payer reporting reflects the structure of the data instead of the structure of healthcare. The decoupling of medical and pharmacy costs understates the impact of high cost claimants. This is an obvious but pervasive oversight.
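The effect of decoupling medical and pharmacy spend can be shown with a toy example. The member IDs and dollar amounts below are hypothetical, and the $50K cutoff is one of the thresholds mentioned above. This is an illustration of the linkage problem, not a description of any payer's actual method.

```python
THRESHOLD = 50_000  # one of the cutoffs mentioned in the text


def high_cost_members(*spend_sources: dict[str, float], threshold: float = THRESHOLD) -> set[str]:
    """Return member IDs whose spend, summed across the given sources, meets the threshold."""
    totals: dict[str, float] = {}
    for source in spend_sources:
        for member_id, dollars in source.items():
            totals[member_id] = totals.get(member_id, 0.0) + dollars
    return {member_id for member_id, total in totals.items() if total >= threshold}


# Hypothetical trailing-twelve-month paid amounts by member ID.
medical = {"M1": 62_000, "M2": 30_000, "M3": 4_000}
pharmacy = {"M2": 28_000, "M3": 1_500, "M4": 55_000}

medical_only = high_cost_members(medical)
linked = high_cost_members(medical, pharmacy)
print("Flagged on medical claims alone:", sorted(medical_only))
print("Flagged once pharmacy is linked:", sorted(linked))
```

In this example, member M2 never appears in a medical-only report because neither spend stream crosses the threshold on its own, and M4 is missed entirely because all of that member's cost is pharmacy.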
5) Inpatient & ER
Payers generally provide several breakouts on inpatient and ER utilization.
|Typically Included||Typically Not Included|
Gaps in inpatient and ER reporting present significant impediments to insight. The key issues I have seen are as follows:
- It is difficult to understand inpatient admission trends because full-term deliveries, which represent about 40%-50% of admissions in younger commercial populations, are typically not broken out from other inpatient admissions. It is hard to know if changes in inpatient admission rates are good or bad if a significant portion is driven by a positive but costly event.
- ER visits are usually reported without context. The highest volume ER visit reasons — chest pain, headache, syncope and collapse, unspecified abdominal pain — can be difficult for even experienced clinicians to evaluate on paper. Encouraging HR leaders to postulate about whether abdominal pain was avoidable is not a good use of time. Fortunately, there are several algorithms that provide estimates of ER visit avoidability (e.g., the validated NYU ED algorithm), which could help employers understand opportunities for improvement. However, I seldom see an avoidable ER visit metric in a standard payer report.
- Similarly, each payer categorizes ER and inpatient visits using their own medical diagnosis groupers. Categorizing diagnoses is difficult, but the categories that payers use can be incredibly confusing. What does an ER visit category called “Blood/Organs” mean?
- ER visits are not broken out by member type, which is particularly relevant for populations with young children. Pediatric ER visits can be a significant driver of utilization, and many interventions to reduce adult ER visit rates do not apply to children.
- Despite the rise in urgent care and convenience care (i.e., walk-in clinics), metrics on these types of care are not commonly included in major carrier reporting. The lack of data can lead to misplaced initiatives. For example, many employers have been trying to reduce ER visits by encouraging the use of urgent care. This strategy can work, but promoting urgent care creates new demand for medical services. I have seen several cases in which the cost savings from fewer ER visits were offset by disproportionate increases in expensive urgent care visits. The savings from substitution are sometimes erased by induced demand. Employers need data on these dynamics to more accurately evaluate performance.
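The substitution versus induced demand trade-off in the last point comes down to simple arithmetic. The sketch below uses entirely hypothetical unit costs and visit counts. It only illustrates how a campaign that successfully diverts ER visits can still raise total cost.

```python
def net_savings(er_visits_avoided, er_cost, urgent_care_visits_added, urgent_care_cost):
    """Savings from avoided ER visits minus the cost of all added urgent care visits."""
    return er_visits_avoided * er_cost - urgent_care_visits_added * urgent_care_cost


# All figures are hypothetical. Suppose 20 ER visits move to urgent care.
ER_COST, UC_COST = 1_500, 200

# Pure substitution: each avoided ER visit becomes one urgent care visit.
print(net_savings(20, ER_COST, 20, UC_COST))

# Induced demand: promotion also generates 150 visits that would not have happened.
print(net_savings(20, ER_COST, 20 + 150, UC_COST))
```

Without urgent care volume in the report, an employer sees only the first half of this calculation.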
6) Pharmacy
Pharmacy reporting is similar to the medical reporting in that it decomposes spend by demographic factors and clinical categories.
|Typically Included||Typically Not Included|
Pharmacy spending is not immune from ontological challenges. Payers usually cluster medications into proprietary categories that are informed by GPI drug groups. These ontologies do not reflect therapeutic uses in ways that employers can understand. For example:
- Central Nervous System
It is difficult to know what to do with these labels. Even the deceptively straightforward “anti-infective” category includes antibiotics used to treat traveler’s diarrhea, antifungals for toenail infections, and antiretrovirals to treat HIV. It is hard to create medication ontologies that fully reflect therapeutic use (e.g., “diabetes medications”) because a single medication might be used to treat different conditions, but we can definitely do better than the current approaches.
7) Quality
Quality measurement is not a significant component of standard payer reporting. Employers will typically receive a slide that shows the percentage of employees who have used the most common preventive services. This format is extremely dangerous because it implies that very few employees are receiving appropriate preventive care. However, coverage levels and care needs are not the same thing. For example, a plan might cover yearly pap smears, but most women do not need (and should not get) yearly pap smears. Utilization reporting misrepresented as preventive care metrics leads HR leaders to perceive that their employees need more preventive services. Payers have spent billions of dollars incentivizing primary care providers to improve industry-standard HEDIS quality measures (cancer screenings, vaccinations, and chronic care management), but I rarely see them in employer reporting.
Employers get a bunch of data, but almost no insight. Companies selling to small and midsize employers need to understand these limitations. We only know about the pain points we feel, and the unfortunate reality is that many employers are largely unaware of their opportunities for improvement. In addition, vendors need to help employers understand the value of their new employee benefit post-implementation. Many employers simply do not have the tools to measure a vendor’s impact on healthcare utilization. Finally, these gaps suggest there is a potential market for companies that leverage claims data to provide small and midsize employers with actionable insight.
HR leaders want to do right by their employees. It is up to the rest of the industry to help them do that.
Casual observation is often a useful tool in sparking topics for academic research. To illustrate how this works, let’s consider three paradoxes about price.
Paradox 1: How much you paid either matters a lot, or it doesn’t matter at all
Observations:
- We all know someone who, years later, still raves about the $90 dress she purchased for $6 through machinations involving a clearance rack, store credit cards, a fistful of coupons, and a doe-eyed cashier.
- We all know someone who has purchased something extremely expensive mostly because it was extremely expensive (i.e., conspicuous consumption).
- We forget the prices of almost everything we buy. (Can you remember how much tomatoes cost? What about the shirt you’re wearing?)
Inferences:
- Many products are described by their inherent or observable characteristics, but some are defined by price as the primary dimension. Over time, underpaying or overpaying can become a feature of a product. In these circumstances, price effectively takes on a hedonic role (e.g., I like it because it was really expensive or inexpensive).
- Although price is one of the most important attributes when purchasing a product, it’s usually irrelevant after the purchase.
Paradox 2: Price isn’t monotonic
Observation: High prices sometimes indicate high quality, but sometimes low prices indicate high quality.
Inference: Most attributes are monotonic (i.e., the attribute and how much we like it covary in the same direction; e.g., as the quality of cheese increases, we like it more). However, price isn’t monotonic.
Paradox 3: Social factors can disrupt price
Observation: Most of us work a roughly consistent set of hours in exchange for an income (i.e., we put a price on our time). But if your neighbor offered to pay you to help clean out his basement, you’d be weirded out and likely refuse any compensation.
Inference: Price is not intrinsic; social norms can dramatically affect how we price our time.
Any one of these basic observations could initiate a line of research to evaluate whether the inference is valid, the circumstances that modulate the inference, and the mechanisms that cause it. Next time you’re struggling to develop a research idea, start by paying attention to interesting patterns in everyday life.