Concept Page - Medicaid / Fraud Analysis Timeline

Buffaly Medicaid Analysis Timeline

This concept page reconstructs how Buffaly turned newly downloaded HHS Medicare / Medicaid provider-spending data into a multi-stage analysis workflow: segmenting claims by remote-care CPT families, reusing the existing RPM fraud-risk pipeline, extending that pipeline for CCM, enriching providers with NPI and geocode mapping, and then generating visualization-ready RPM, CCM, revenue, and state reimbursement outputs.

Primary Repo: FeedingFrenzy
Core Source File: medicaid-provider-spending.rpm-ccm-rtm-apcm.2024plus.csv
Programs Segmented: RPM, RTM, CCM, APCM
End State: Fraud + revenue + state visualization payloads
Annotated project timeline

How the Medicaid analysis unfolded from source download to Buffaly demo outputs

1. Buffaly obtained the source Medicaid provider-spending file from HHS / CMS and staged it locally

What happened
The upstream source on disk is Z:\MedicareData\medicaid-provider-spending.csv\medicaid-provider-spending.csv, an 11,086,231,433-byte file (about 10.3 GB). The local copy I inspected contains 11,355 lines total, i.e. 11,354 data rows after the header.
Why it matters
This establishes that the project began with a real HHS / CMS Medicaid spending dataset downloaded to disk rather than with a synthetic demo file.

2. Buffaly generated shell-script tooling to preprocess the raw file

What happened
Inside Z:\MedicareData\medicaid-provider-spending.csv there are dedicated preprocessing scripts: extract_cpts.sh, extract_2024.sh, counts.sh, and count_npi.sh.
Why it matters
These scripts show the project started with Buffaly creating repeatable data-prep tooling before the fraud analysis pipeline was run.

3. Buffaly filtered the raw claims file down to the remote-care CPT families we cared about

What happened
The CPT extraction step selected RPM, CCM, RTM, and APCM billing codes from the full Medicaid spending file and produced medicaid-provider-spending.rpm-ccm-rtm-apcm.csv.
Why it matters
This is the real segmentation phase of the project: reducing a very large public claims file to the remote-care billing programs relevant to fraud-risk and opportunity analysis.
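The contents of extract_cpts.sh are not shown in this reconstruction, so the following is a minimal Python sketch of the same filtering logic. The code lists and the `cpt_code` column name are illustrative assumptions, not the project's actual definitions:

```python
import csv
import io

# Illustrative remote-care CPT/HCPCS families. The exact lists used by
# extract_cpts.sh are not in the source; treat these as examples only.
REMOTE_CARE_CPTS = {
    "RPM":  {"99453", "99454", "99457", "99458", "99091"},
    "CCM":  {"99490", "99491", "99439", "99487", "99489"},
    "RTM":  {"98975", "98976", "98977", "98980", "98981"},
    "APCM": {"G0556", "G0557", "G0558"},
}
ALL_CODES = set().union(*REMOTE_CARE_CPTS.values())

def filter_remote_care(reader, code_field="cpt_code"):
    """Yield only claim rows whose billing code is in a remote-care family."""
    for row in reader:
        if row[code_field] in ALL_CODES:
            yield row

# Tiny in-memory stand-in for the 10+ GB source file.
sample = io.StringIO(
    "npi,cpt_code,paid_amount\n"
    "1111111111,99454,52.30\n"   # RPM -> kept
    "2222222222,99213,75.00\n"   # office visit -> dropped
    "3333333333,99490,42.10\n"   # CCM -> kept
)
kept = list(filter_remote_care(csv.DictReader(sample)))
```

In practice this pass streams the full file once, which is why a shell-script (or streaming-reader) approach works even at 10+ GB.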

4. Buffaly quantified the segmented dataset before deeper analysis

What happened
The extracted remote-care file is 13.2 MB and contains 284,294 rows. Program counts in that file are CCM 203,075, RPM 79,595, and RTM 1,624, which together account for all 284,294 rows (implying no APCM rows in this extract).
Why it matters
This is the checkpoint where Buffaly validated that the filtered claims population was large enough and well-structured enough to support downstream fraud modeling.

5. Buffaly narrowed the remote-care claims to a 2024+ working subset

What happened
The extract_2024.sh step produced medicaid-provider-spending.rpm-ccm-rtm-apcm.2024plus.csv, a 2.8 MB working file used later in the pipeline.
Why it matters
That reduced the analysis to a current-timeframe slice suitable for fraud-risk exploration, market sizing, and demo-ready visualization.
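The extract_2024.sh script itself is not reproduced here; a minimal Python sketch of a 2024+ filter, assuming an ISO-formatted `service_date` column (a hypothetical field name), would be:

```python
import csv
import io

def filter_2024_plus(reader, date_field="service_date"):
    """Keep rows whose service date falls in 2024 or later (ISO dates assumed)."""
    for row in reader:
        if int(row[date_field][:4]) >= 2024:
            yield row

sample = io.StringIO(
    "npi,service_date,paid_amount\n"
    "1111111111,2023-11-02,52.30\n"   # pre-2024 -> dropped
    "2222222222,2024-01-15,42.10\n"   # kept
    "3333333333,2024-06-30,18.75\n"   # kept
)
recent = list(filter_2024_plus(csv.DictReader(sample)))
```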

6. Buffaly measured unique NPI coverage across programs

What happened
The NPI counting workflow showed unique billing-provider counts of 2,682 CCM, 1,223 RPM, and 59 RTM in the 2024+ subset.
Why it matters
This quantified the provider universe before enrichment and helped establish how broad each program’s footprint was.
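The count_npi.sh workflow is not shown, but the underlying computation is a distinct-count per program. A sketch, assuming each row already carries a derived `program` label (in the real pipeline that label would come from the CPT family):

```python
from collections import defaultdict

def unique_npis_by_program(rows):
    """Collect the set of distinct billing NPIs seen for each program."""
    npis = defaultdict(set)
    for row in rows:
        npis[row["program"]].add(row["npi"])
    return {program: len(s) for program, s in npis.items()}

rows = [
    {"program": "CCM", "npi": "111"},
    {"program": "CCM", "npi": "111"},   # duplicate claim, same provider
    {"program": "CCM", "npi": "222"},
    {"program": "RPM", "npi": "333"},
]
counts = unique_npis_by_program(rows)
```

Distinct-NPI counts are what distinguish "program footprint" (how many providers bill at all) from raw claim volume.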

7. Buffaly used an existing RPM fraud-risk pipeline as the first analysis engine

What happened
Once the remote-care subset was prepared, Buffaly applied the existing RPM fraud-risk pipeline to identify anomalous provider behavior in RPM billing patterns.
Why it matters
This let the project stand on a proven fraud-analysis foundation rather than inventing the full approach from scratch.

8. Buffaly grounded that RPM fraud logic in OIG-style statistical signals

What happened
The RPM fraud pipeline was framed around statistical risk signals informed by the 2024 OIG report on RPM fraud risk, translating public fraud patterns into provider-level scoring logic.
Why it matters
This made the RPM stage feel defensible and research-backed instead of merely heuristic.
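The pipeline's actual scoring formulas live in the companion calculation docs and are not reproduced here. As one illustrative example of an OIG-style statistical signal, a peer-group z-score flags providers whose total RPM billing is an extreme outlier; the threshold and inputs below are assumptions, not the project's real parameters:

```python
from statistics import mean, pstdev

def zscore_flags(paid_by_npi, threshold=3.0):
    """Flag providers whose total paid amount is an extreme outlier vs peers."""
    values = list(paid_by_npi.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return {}
    return {npi: (v - mu) / sigma
            for npi, v in paid_by_npi.items()
            if (v - mu) / sigma >= threshold}

paid = {"npi%03d" % i: 1000.0 for i in range(50)}
paid["npi999"] = 50000.0   # one provider billing far above the peer group
flags = zscore_flags(paid)
```

A real pipeline would combine several such signals (billing velocity, patient counts, code-mix anomalies) rather than relying on a single statistic.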

9. Buffaly used an NPI lookup skill to enrich the RPM results

What happened
After fraud scoring, Buffaly enriched provider rows through NPI lookup so each suspicious or noteworthy record could be tied back to real provider identity data.
Why it matters
This is where abstract risk rows became intelligible provider entities that could be reviewed, grouped, and visualized.
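The specific NPI lookup skill is not detailed in the source. One plausible backing service is the public NPPES NPI Registry API; the sketch below builds a version-2.1 lookup URL and parses a trimmed example of that API's JSON shape (field names per NPPES, but treat the parsing as an assumption):

```python
from urllib.parse import urlencode

NPPES_API = "https://npiregistry.cms.hhs.gov/api/"

def nppes_url(npi):
    """Build a lookup URL for the public NPPES registry (version 2.1 API)."""
    return NPPES_API + "?" + urlencode({"version": "2.1", "number": npi})

def parse_nppes(payload):
    """Pull a display name and practice state out of an NPPES response dict."""
    result = payload["results"][0]
    basic = result["basic"]
    name = basic.get("organization_name") or \
        f'{basic.get("first_name", "")} {basic.get("last_name", "")}'.strip()
    state = result["addresses"][0]["state"]
    return {"name": name, "state": state}

# Abbreviated example of the registry's JSON shape (fields trimmed).
sample = {"results": [{"basic": {"first_name": "JANE", "last_name": "DOE"},
                       "addresses": [{"state": "TX"}]}]}
enriched = parse_nppes(sample)
```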

10. Buffaly used a Google Maps skill to geolocate the enriched providers

What happened
A second enrichment phase geolocated providers using Google Maps (with follow-up passes for unresolved addresses), converting provider identity records into place-aware, map-ready data.
Why it matters
This turned the fraud analysis into a geography-aware story, which is critical for map-based demos and regional pattern analysis.
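The geolocation step likely reduces to a call per provider address against the Google Maps Geocoding API. A minimal sketch of the request URL and response parsing (the response dict below is a trimmed example of that API's documented shape; the key handling is an assumption):

```python
from urllib.parse import urlencode

GEOCODE_API = "https://maps.googleapis.com/maps/api/geocode/json"

def geocode_url(address, api_key):
    """Build a Google Geocoding API request URL for a provider address."""
    return GEOCODE_API + "?" + urlencode({"address": address, "key": api_key})

def parse_latlng(payload):
    """Extract (lat, lng) from a Geocoding API response dict."""
    loc = payload["results"][0]["geometry"]["location"]
    return loc["lat"], loc["lng"]

# Trimmed example of the API's response shape.
sample = {"results": [{"geometry": {"location": {"lat": 30.2672,
                                                 "lng": -97.7431}}}]}
lat, lng = parse_latlng(sample)
```

The "follow-up passes" mentioned above would then retry addresses the first pass failed to resolve.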

11. Buffaly moved the RPM analysis into visualization-ready outputs

What happened
With identity and geography attached, the RPM work could be rendered as provider maps, suspicious provider views, and reimbursement-oriented exploration artifacts rather than raw JSON alone.
Why it matters
This is the stage where the work became demoable to non-technical audiences: the system could show where providers were, how they clustered, and why they stood out.

12. Buffaly then generated a new CCM fraud-risk pipeline from the RPM foundation

What happened
After RPM, Buffaly created a CCM-specific fraud pipeline by extending the same general approach and deriving new CCM-oriented fraud signals from the earlier RPM analysis patterns.
Why it matters
This was the major expansion of the project: the system moved from a single-program fraud model into a multi-program fraud-analysis platform.

13. Buffaly carried the same enrichment and visualization pattern into CCM

What happened
The CCM pipeline followed the same high-level progression of provider scoring, NPI enrichment, geolocation, and visualization-ready output generation.
Why it matters
That consistency meant RPM and CCM could be compared side by side using the same conceptual demo flow.

14. Buffaly layered revenue-opportunity analysis on top of the fraud work

What happened
The project evolved beyond fraud-risk alone into reimbursement opportunity modeling, producing provider-level and summary outputs that highlighted missed or under-realized remote-care revenue opportunities.
Why it matters
This widened the story from pure compliance risk into business intelligence and growth opportunity.
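The actual opportunity formulas are preserved in the calculation docs listed later and are not reproduced here. As a conceptual sketch, under-realized revenue is typically modeled as a benchmark potential minus actual revenue, floored at zero; the field names and the per-patient benchmark below are hypothetical:

```python
def revenue_opportunity(providers, benchmark_per_patient):
    """Estimate under-realized revenue as benchmark minus actual, floored at 0."""
    out = []
    for p in providers:
        potential = p["eligible_patients"] * benchmark_per_patient
        gap = max(0.0, potential - p["actual_revenue"])
        out.append({**p, "potential": potential, "opportunity": gap})
    return out

providers = [
    {"npi": "111", "eligible_patients": 100, "actual_revenue": 3000.0},
    {"npi": "222", "eligible_patients": 40,  "actual_revenue": 5000.0},
]
scored = revenue_opportunity(providers, benchmark_per_patient=50.0)
```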

15. Buffaly built state-level reimbursement visualizations across programs

What happened
State reimbursement viewer payloads were produced so the work could be explored not only provider by provider but also by market, surfacing state-level reimbursement variation.
Why it matters
This gave the project a strategic planning layer on top of the provider-level fraud and opportunity views.
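The state viewer payload format is not shown in the source; the core rollup, though, is a group-by-state aggregation over the enriched provider rows. A minimal sketch with assumed `state`, `npi`, and `paid_amount` fields:

```python
from collections import defaultdict

def state_reimbursement_summary(rows):
    """Roll provider-level paid amounts up to state totals and provider counts."""
    totals = defaultdict(float)
    providers = defaultdict(set)
    for row in rows:
        totals[row["state"]] += row["paid_amount"]
        providers[row["state"]].add(row["npi"])
    return {st: {"total_paid": totals[st], "providers": len(providers[st])}
            for st in totals}

rows = [
    {"state": "TX", "npi": "111", "paid_amount": 120.0},
    {"state": "TX", "npi": "222", "paid_amount": 80.0},
    {"state": "CA", "npi": "333", "paid_amount": 200.0},
]
summary = state_reimbursement_summary(rows)
```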

16. Supporting analysis documents captured the reasoning and formulas

What happened
Companion docs such as FairPath_RPM_CCM_Medicaid_Analysis_Calculations.md, FairPath_RPM_Revenue_Opportunity_Calculations.md, and FairPath_Medicaid_State_Reimbursement_Benchmark_Plan.md preserved the logic behind the calculations and outputs.
Why it matters
The project was not just coded; it was explained, documented, and prepared for communication.

17. Public storytelling followed the technical build

What happened
The FairPath article on HHS Medicaid remote-care billing, along with the related fraud framing, indicates that the analysis matured into a market-facing story after the data-engineering and modeling work was complete.
Why it matters
This is what makes the work a showcase: a chain from public data acquisition to analysis, enrichment, visualization, and public narrative.

18. The end state was a Buffaly-orchestrated healthcare intelligence demo

What happened
By the end of the reconstructed flow, Buffaly could demonstrate source ingestion, claims filtering, CPT segmentation, fraud scoring, provider enrichment, geolocation, revenue modeling, and state-level visualization across RPM and CCM.
Why it matters
That end-to-end orchestration story is the actual product showcase: Buffaly acting as an AI operator that turns public healthcare data into usable intelligence.