Benchmarking APMs

With the release of the CY 2023 Medicare Physician Fee Schedule Final Rule on November 1st, I saw a flurry of online activity about how, after much consideration, CMS seemed not to have accepted many of the comments offered during the open comment period. In consequence, they appeared not to have changed much of anything from their Proposed Rule back in July.

One Twitter thread on this topic caught my attention in particular, and made me curious about how alternative payment models (APMs), and specifically population-based payment (PBP) models that require plans and providers to manage total cost of care, will be financed going forward.

I’ve been studying APMs for a while now, but until I saw this tweet, I had never drilled down into the particulars of how CMS decides how to incentivize plans and providers that rely on PBPs to perform at high levels, a process called benchmarking. Here is my take on how the process currently works, and how it might work going forward.

Benchmarking Basics

Before we delve too far into the complexity of scoring healthcare providers and reevaluating performance standards, it might be good to cover some basics. For starters, although the term “benchmarking” emerged in the 1800s during the industrialization of weapons manufacture, the concept can be applied to a range of disciplines where results of one endeavor are compared to those of another. For example, some common continuous improvement tools like client surveys and SWOT analyses make use of benchmarking.

Philip de Vroe, a.k.a. the Finance Storyteller, has a good primer video on benchmarking, which he describes as “Making meaningful comparisons to others, and identifying opportunities to improve.” Deciding which peer groups to compare your performance against and focusing on the drivers of that performance are key elements of benchmarking. This process is one way, de Vroe says, to identify a current leader in a given field, zero in on any gaps between you and the market leader, and take action to eliminate those gaps.

Although benchmarking has been around in healthcare for a while (for example, the federal government has been keeping track of total health care spending in the U.S. via the National Health Expenditure Accounts since 1960), the complexity with which insurers now deploy the concept when entering into shared-risk arrangements with providers is relatively new. Benchmarking has evolved into a solution for quantifying and addressing areas like cost, care quality, and other gaps on the individual physician, practice, hospital, or health system level.

Although much of this blog post is devoted to examining benchmarks assigned to large groups of physicians, hospitals, and health systems, on the individual practitioner level, benchmarks can act as guardrails that help clinicians monitor and improve key metrics in the clinical, operational, financial, and equity spaces, among others. Apart from keeping track of the day-to-day operations of clinicians, the term “benchmarking” is also flexible enough to describe efforts such as the Physician Practice Benchmark Survey, an ongoing initiative of the American Medical Association to track “the practice arrangements and payment methodologies of physicians who take care of patients for at least 20 hours per week and don’t work for the federal government.”

A Role for ACOs

Speaking of payment methodologies, benchmarking has become a key way insurers gauge the provision of value-based, cost-effective care with respect to payment arrangements like MIPS, advanced APMs, and a subset of APMs called accountable care organizations (ACOs). I’ve mentioned ACOs a few times in past blog posts (here, here, and here), but I’d like to elaborate on them a bit more here and examine how incentive structures and attendant bonuses and penalties are derived from one contract period to the next.

The concept of the ACO in all its complexity deserves its own blog post, but for our purposes here, we can think of ACOs as “groups of doctors, hospitals, and other health care providers, who come together voluntarily to give coordinated high-quality care to their Medicare patients.” Best described as Medicare population-based payment models, or PBPs (I wrote about PBPs in a past blog post), ACOs incentivize providers to efficiently manage total cost of care, i.e., to provide the right care at the right time and avoid duplication of effort, a concept called “coordinated care.” Any cost savings realized without sacrificing quality are passed along to the ACO.

It’s important to note that over the past decade or so, CMS has increasingly pushed affiliated physicians to enter into risk-sharing arrangements, with an ultimate goal of 100 percent of traditional Medicare (TM) beneficiaries being covered under an accountable care relationship by 2030. One of the main vehicles for delivering this result, if it is to be achieved, will be ACOs.

From the outset, however, ACOs have experienced mixed results, with “fierce debates over the ability of these organizations to meet their performance goals, as well as any unintended consequences that could adversely impact members of the health supply network.” Overall momentum in the adoption of APMs has slowed in recent years, and savings generated by ACO models haven’t kept pace with expectations. 

Still, ACOs and other APMs likely represent the best chance U.S. healthcare has of decoupling from the fee-for-service (FFS) payment model and becoming more efficient at delivering high-quality care. Physician-led ACOs are leading the way when it comes to delivering cost savings coupled with improved outcomes, often producing better results than hospital-led ACOs. A key factor in their success appears to be longevity of program participation mixed with both a nimbleness to learn how care partners achieve efficiency levels, and a willingness to implement these approaches.

So how do insurance plans structure arrangements such that they encourage providers to find new ways of providing increasingly efficient care without sacrificing quality? That’s where benchmarks come in.

Benchmarking ACOs

As population-based payment models like ACOs have matured, so too have their methods of motivating healthcare providers to provide top-notch services while also being mindful of costs. This concept is best encapsulated in the Institute for Healthcare Improvement’s Triple Aim, where a balance is struck between the patient experience of care, healthcare outcomes, and reducing per capita costs. One useful definition of benchmarks as they relate to ACOs has been provided by the National Association of ACOs (NAACO): 

“ACO performance is measured using a multi-step process that evaluates an ACO’s effectiveness in lowering expenditures for a group of assigned beneficiaries against a financial benchmark reflective to historical costs. Benchmarks are initially established for new ACOs, updated during agreement periods and reset or rebased when ACOs enter subsequent agreement periods.”

As things stand now, one issue that challenges broad uptake of the ACO model is this “rebasing” process. We’ll get more into rebasing a little further on, but for now, suffice it to say that rebasing occurs when benchmarks are adjusted as a result of an organization’s past success or failure at controlling spending. I’m using the term “success” loosely here, because as we’ll soon see, physicians who manage to lower costs while maintaining quality care are often “penalized” by having the bar for success raised ever higher, a phenomenon called “ratcheting.”

Although often a net positive in the short run because it helps lower program spending, in time this ratcheting of standards can dampen plan or provider incentives to participate in an ACO as it becomes increasingly hard to identify new efficiencies. As opportunities to save money dry up, shared savings shrink, and this vicious spiral often leaves plans or providers less inclined to stay in ACOs over the long haul.

Despite this suboptimal playing field, CMS has persisted with the rebasing process. Reasons for this are varied, but the authors of a 2021 paper discussing the merits of the Medicare Advantage (MA) program versus PBPs explain it this way:

“Medicare can save money if the benchmark is set below what would otherwise have been spent, if Medicare keeps a large enough share of any savings, or if any efficiencies in care delivery spill over to populations outside the PBP model. Higher benchmarks induce plan or provider participation but increase program expenditures. Lower benchmarks may reduce available benefits in MA or reduce plan participation in MA or provider participation in voluntary ACO models.”

We’ll talk about spillover effects in a little bit, but suffice it to say that relying on the efficiencies you’ve realized in caring for ACO beneficiaries to spill over to non-ACO patients may not be the best strategy for building an enduring framework for value-based care delivery.

This brings up an interesting point: hospitals and health systems administer care to patients belonging to a patchwork of different insurance carriers (and often to patients who have no coverage at all). So does that mean they cater to ACO beneficiaries differently than everyone else? And if so, how do they pay their contracted providers within the ACO versus non-ACO practitioners? The paper quoted above provides a useful explanation of how this works: 

“ACOs typically operate on budget-based versions of PBP, where FFS payment is used to pay all claims, but bonuses (or penalties) are paid to ACOs at the end of the year based on accrued FFS spending relative to a benchmark.” 

So if I’m understanding this correctly, providers who work with ACO beneficiaries are paid using a FFS model, and the rewards and deductions are handled separately. It’s an interesting idea, but isn’t one of the core purposes of ACOs to transition medicine away from the FFS model?

The short answer, as far as I can tell, is yes; however, until FFS no longer dominates the reimbursement landscape, hospital and health system executives and their insurer partners have to work within the system to effect change. To do this, many ACOs predicated on budget-based payment systems like bundled payment, capitation, and shared savings arrangements base bonus payments on projected FFS spending. This configuration obviates the need for ACOs to contract with non-ACO providers while maintaining the overall value-based incentive structure.
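To make the budget-based arrangement described in the quote above more concrete for myself, I put together a minimal Python sketch of a year-end reconciliation. Everything in it is an assumption for illustration: the function name, the 50 percent sharing rate, the symmetric treatment of savings and losses, and the dollar figures are all made up, and real programs layer on details (such as minimum savings thresholds, quality adjustments, and caps) that I’m ignoring here.

```python
def year_end_settlement(benchmark_per_capita, ffs_spending_per_capita,
                        beneficiaries, sharing_rate=0.5):
    """Return a year-end bonus (positive) or penalty (negative) for an ACO.

    Claims are assumed to have been paid on an FFS basis during the year;
    this function only reconciles accrued FFS spending against the benchmark.
    The sharing_rate default is a hypothetical value, not a program rule.
    """
    gap_per_capita = benchmark_per_capita - ffs_spending_per_capita
    return gap_per_capita * beneficiaries * sharing_rate


# Hypothetical ACO with 5,000 attributed beneficiaries and a $10,000 benchmark.
bonus = year_end_settlement(10_000, 9_600, 5_000)     # spent below the benchmark
penalty = year_end_settlement(10_000, 10_300, 5_000)  # spent above the benchmark
```

In this toy example, spending $400 per beneficiary under the benchmark earns a $1,000,000 bonus, while spending $300 over it triggers a $750,000 penalty. Note that the individual claims never change: the FFS payments happen as usual, and the value-based incentive is settled separately at year’s end.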

External Empirical Benchmarking

As the above example illustrates, there are any number of approaches one can take in tracking an ACO’s performance over time. But even so, there are three forms of benchmarking that are commonly used: empirical benchmarks, bidding-based benchmarks, and administratively set benchmarks. Since empirical benchmarking currently dominates the APM landscape, let’s focus on it for the remainder of this post.

For PBP in Medicare, empirical benchmarking has proven attractive to many ACOs up to this point. It’s important to note that benchmarking for MA programs is slightly different from benchmarking for ACOs, with empirical benchmarks for MA programs called “external” benchmarks while those for ACOs are known as “circular” benchmarks. Despite these differences, however, there are a number of similarities in the ways MAs and ACOs chart performance and, as a result, they are often treated almost synonymously.

In both approaches, CMS seeks to save money by sharing in the savings when MAs or ACOs spend below their benchmarks, or by charging them when they overspend. As mentioned earlier, CMS also can benefit financially from changes in practice patterns brought about by the MAs or ACOs that spill over to other non-attributed patients. Although there have been some successes in using benchmarking, it has been argued that, particularly in the case of the Medicare Shared Savings Program (MSSP), which is a type of ACO, results have been skewed for a variety of reasons.

Starting with MA benchmarking, an external sector must first be chosen that provides a status quo that risk-based contracts aim to beat. These benchmarks are set using “observed spending.” The “external sector” against which MA programs are measured has often been TM populations. In other words, they factor in a given entity’s county-specific benchmark for non-MA attributed Medicare patients. The benchmark is a multiple of the average spending in the TM sector for each county in a given plan’s service area, often with a slight discount built in. This approach is different from that taken by ACOs because in the MA configuration, a given MA’s historical spending patterns are not factored into the benchmark.

In the early days of benchmarking the MA program, the vast majority of Medicare patients didn’t fall under the auspices of an alternative payment model. For this reason, comparing their performance to TM beneficiaries provided ample opportunities to out-perform the benchmark. At that time, benchmarks were set at 95% of spending in the TM system. Despite the passage of time, and even though legislation has been instituted to bolster program participation, the tenet of basing benchmarks on TM has endured for both MA programs and ACOs.
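As a rough illustration of the external approach, here is a small Python sketch that sets each county’s MA benchmark as a multiple of average TM spending, using the 95% figure mentioned above as the default multiplier. The function name, county labels, and spending amounts are hypothetical, and the actual rate-setting process involves many more adjustments than this.

```python
def ma_county_benchmarks(tm_avg_spending_by_county, multiplier=0.95):
    """Map each county to an MA benchmark set as a multiple of average TM spending.

    Note what is NOT an input: the MA plan's own historical spending.
    """
    return {county: multiplier * tm_avg
            for county, tm_avg in tm_avg_spending_by_county.items()}


# Hypothetical per-beneficiary annual TM spending for two made-up counties.
tm_spending = {"County A": 10_000.0, "County B": 12_000.0}
benchmarks = ma_county_benchmarks(tm_spending)
```

The point of the sketch is the shape of the calculation: the benchmark moves only when TM spending in the county moves, no matter how the plan itself has performed.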

This may prove to be a problem going forward, however, because when the TM population against which an MA is benchmarked shrinks too much, the benchmarks can fluctuate unpredictably, thereby undermining a core purpose of these payment models, which is to stabilize revenue. As a result, plans may find it too hard to endure such vicissitudes and choose to drop out of the program at the end of their contract period.

If, on the other hand, plans persist and remain a part of the payment model, this can become a problem not just for participating physicians but for patients, particularly if patients in TM prove less costly over a given time period. In this instance, lower benchmarks mean lower levels of benefits and higher premiums charged to plan beneficiaries. If costs pass a threshold beyond which patients cannot afford their care, this becomes an access to care issue. So as more MA programs enter the market, it becomes less useful to rely on external empirical benchmarks to chart performance.

Circular Empirical Benchmarking

Now let’s turn to empirical benchmarking with regard to ACOs, a practice that has been termed “circular” benchmarking. As mentioned above, like those of MA programs, ACO benchmarks are based partly on TM spending in the ACO’s service area. But unlike “external” MA benchmarks, circular ones also take into account an ACO’s historical spending, blending it with TM spending in a given market to arrive at a hybrid number. Adjustments in this kind of benchmark are informed by either projected or actual TM spending growth.

The component of the benchmark derived from the ACO’s historical spending patterns comes into focus when it’s time for a new contract period to begin. As the authors of the 2021 paper put it, “When an ACO transitions to a new contract period, the ACO-specific component of the benchmark is rebased such that the spending in the performance period of the first contract period contributes to the baseline of the next contract period. The regional component of the benchmark rises with regional TM spending and receives increasing weight over time, up to 50%.”

This creates a circular pattern in the sense that historical spending feeds directly into the calculation of subsequent years’ new benchmarks. So one major issue with circular benchmarking is the “ratchet” effect I touched on earlier. Another issue arises from the regional component in this scenario. Specifically, if one ACO (or a small group of ACOs) dominates a certain market, they can basically dictate where the benchmark is set, discouraging wider participation.
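To see the circularity and the ratchet in action, I sketched the blending and rebasing steps in Python. This is a simplified model under loud assumptions: regional TM spending is held flat (the quoted paper says it rises over time), the ACO is assumed to beat its benchmark by the same percentage every period, and the weight schedule is invented, apart from the 50% ceiling on the regional component taken from the quote above. The function names are my own.

```python
def blended_benchmark(aco_historical, regional_tm, regional_weight):
    """Blend ACO-specific historical spending with regional TM spending."""
    if not 0.0 <= regional_weight <= 0.5:
        raise ValueError("regional weight is assumed to be capped at 50%")
    return (1 - regional_weight) * aco_historical + regional_weight * regional_tm


def simulate_rebasing(initial_historical, regional_tm, savings_rate, weights):
    """Return the benchmark for each contract period, rebasing in between.

    Assumes the ACO spends `savings_rate` below its benchmark each period and
    that this performance-period spending becomes the next historical baseline.
    """
    historical = initial_historical
    benchmarks = []
    for weight in weights:
        benchmark = blended_benchmark(historical, regional_tm, weight)
        benchmarks.append(benchmark)
        historical = benchmark * (1 - savings_rate)  # rebasing step
    return benchmarks


# Hypothetical ACO: $10,000 baseline, flat $10,000 regional TM spending,
# 4% savings each period, regional weight rising toward the 50% cap.
periods = simulate_rebasing(10_000, 10_000, 0.04, [0.25, 0.35, 0.50])
```

With these made-up inputs, the benchmark falls in each successive contract period even though the ACO keeps performing well, which is the ratchet. The regional component slows the decline as its weight grows, but it doesn’t reverse it.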

Despite the issues mentioned above, there are advantages to using empirical benchmarks, which include the following:

  • When ACOs stay in a given market instead of dropping out due to incurring too many penalties, the regional component of empirical benchmarks exposes ACOs to competitive pressures that force them to emulate the success of other ACOs in their market, which could improve overall savings (although there’s a fine balance, as mentioned above, and dominance by one ACO or a small cadre of ACOs can have undue influence on a given region).

  • Empirical benchmarks are flexible in the sense that they can be adjusted for forces that affect spending outside an insurer’s or provider’s control such as shifts in the economy, novel technologies coming online, and changes to care standards.

Despite these positive aspects, I have to wonder if empirical benchmarks will be around for the long haul. As I mentioned earlier, CMS has set a goal of 100 percent of TM beneficiaries being covered under an accountable care relationship by 2030. And with around 68% of Medicare beneficiaries who are enrolled in both Medicare Part A and B being currently enrolled in MA or attributed to an ACO or direct contracting entity, empirical benchmarks may cease to be useful in the near future as more Medicare patients are siphoned away from TM.

There’s a lot left to be said about benchmarking, including an examination of types of APMs and their varying levels of success with respect to using benchmarks. A particularly interesting example of this is the aforementioned MSSP, which has been called an “off ramp” to relying on FFS. But I’ll reserve evaluation of various benchmarking configurations for another time, and will continue learning all I can about this fascinating topic. 
