INC-2024-0847
ADF Pipeline Cascading Failure - March 2024
Claims Data Mart, Reporting Dashboard, Daily SLA Reports
Case Study Details
~2.3M claims records delayed by 42 minutes. Downstream reports published late. Customer dashboard stale for 1 hour. SLA impact: 0.8% of daily reliability window.
DPU allocation was exhausted by an undocumented parallel activity, with the incremental load and ad-hoc reporting queries contending for the same resources.
1. Scaled IR from 2 to 4 DPUs (15 min)
2. Paused non-critical reporting queries (3 min)
3. Increased ADF activity timeout from 120 to 180 min
4. Restarted pipeline with prioritization (18 min)
5. Validated data consistency post-recovery (6 min)
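As a quick arithmetic check, the timed recovery steps can be tallied against the reported delay. The sketch below is illustrative only: the `STEPS` list and `total_recorded_minutes` are hypothetical names, and step 3 is stored as `None` because the record gives no duration for it.

```python
# Recovery steps from the incident record; durations in minutes.
# Step 3 has no recorded duration, so it is kept as None rather than guessed.
STEPS = [
    ("Scale IR from 2 to 4 DPUs", 15),
    ("Pause non-critical reporting queries", 3),
    ("Increase ADF activity timeout from 120 to 180 min", None),
    ("Restart pipeline with prioritization", 18),
    ("Validate data consistency post-recovery", 6),
]


def total_recorded_minutes(steps):
    """Sum only the durations that were actually recorded."""
    return sum(minutes for _, minutes in steps if minutes is not None)


if __name__ == "__main__":
    print(total_recorded_minutes(STEPS))  # 15 + 3 + 18 + 6 = 42
```

The recorded durations sum to 42 minutes, which matches the 42-minute delay in the impact summary.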
Implement usage tracking dashboard for ADF IR resources. Document capacity planning assumptions. Add runbook validation step for competing workloads before pipeline execution. Increase standard IR size from 2 to 3 DPUs.
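The runbook validation step for competing workloads could look roughly like the following. This is a minimal sketch, not the team's actual tooling: `Workload`, `preflight_check`, and the DPU figures in the usage example are all hypothetical, and no Azure API is called. In practice the list of running workloads would come from whatever usage tracking source is adopted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str
    dpus: float  # DPUs currently held by this workload (hypothetical figure)


def preflight_check(capacity_dpus, required_dpus, running):
    """Return (ok, reason); ok is True when required_dpus fit in the free capacity."""
    in_use = sum(w.dpus for w in running)
    free = capacity_dpus - in_use
    if free >= required_dpus:
        return True, f"{free:g} DPUs free, {required_dpus:g} required"
    names = ", ".join(w.name for w in running) or "none"
    return False, (
        f"only {free:g} DPUs free, {required_dpus:g} required; "
        f"competing workloads: {names}"
    )


if __name__ == "__main__":
    # Assumed scenario: 3-DPU IR, pipeline needs 2 DPUs, ad-hoc reporting holds 1.5.
    ok, reason = preflight_check(3, 2, [Workload("ad-hoc reporting", 1.5)])
    print(ok, reason)
```

With these assumed numbers the check fails, because only 1.5 DPUs remain for a pipeline that needs 2. Under this sketch the run would be held back until the competing workload is paused or the IR is scaled.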