We have completed our review of today's service interruption on the Style API and want to provide a more complete update on the actions we are taking and the steps we recommend for customers.
From about 00:00 to about 06:00 UTC, some responses to style document requests contained an invalid URL in the sprite property. The affected styles were Dark, Emerald, Light, Outdoors, Satellite Hybrid, Satellite Streets, and Streets. Custom Mapbox Studio styles were not affected by this issue.
The invalid sprite URL in the response was not handled safely by our native GL rendering library, a core component of our Android and iOS SDKs, causing host applications to crash in a variety of ways.
Thanks to helpful reports from customers, we were able to quickly identify and fix the root cause in our API: a bug in the code that loads certain high-traffic styles (the ones listed above), which follow a slightly different code path than custom styles. Specifically, code that mutated a style's sprite property on certain requests unintentionally persisted the mutated object to a shared object cache under certain conditions. From there, the mutation gradually accumulated in the cache and eventually produced responses containing an invalid sprite URL. We fixed the problem, cleared our global caches, and began serving valid responses approximately 15 minutes after discovering the issue.
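The general failure mode described above, where a per-request mutation leaks into a shared cache and accumulates, can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the cache dict, the load_base_style stand-in, the example.com sprite URL, and the "?v=2" suffix are illustrative assumptions, not the actual Style API code or the actual mutation. The buggy version edits the cached object in place, so repeated requests keep extending the same URL; the fixed version copies the cached object before changing it.

```python
import copy
from typing import Dict


def load_base_style(style_id: str) -> dict:
    # Hypothetical stand-in for reading a high-traffic style from storage.
    return {"id": style_id, "sprite": f"https://example.com/sprites/{style_id}"}


def get_style_buggy(cache: Dict[str, dict], style_id: str, suffix: str) -> dict:
    style = cache.get(style_id)
    if style is None:
        style = load_base_style(style_id)
        cache[style_id] = style
    # BUG: this edits the object stored in the shared cache, so each
    # request builds on the previous request's change.
    style["sprite"] += suffix
    return style


def get_style_fixed(cache: Dict[str, dict], style_id: str, suffix: str) -> dict:
    cached = cache.get(style_id)
    if cached is None:
        cached = load_base_style(style_id)
        cache[style_id] = cached
    # Work on a private copy so the cached entry is never mutated.
    style = copy.deepcopy(cached)
    style["sprite"] += suffix
    return style
```

Calling get_style_buggy twice for the same style yields a sprite URL ending in "?v=2?v=2", while get_style_fixed always returns a single suffix and leaves the cached entry untouched.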
However, the issue was exacerbated by caching behavior in our mobile SDKs. To ensure performance in offline and semi-offline conditions, our mobile SDKs use a cache strategy that permits the use of stale resources while refreshing the content from the server. This lets your application display maps quickly when network connectivity is limited while still fetching up-to-date content from the server in the background. Unfortunately, this behavior introduces a race condition: the SDK loads the invalid style object from its cache and crashes before it can initiate the refresh. Users who attempted to load a map with an empty cache during the incident received and cached the invalid style, so their applications crash when loading that style again and never fetch the corrected version, leaving them in an unrecoverable state. This caching strategy has been in place since the first version of our mobile SDKs, so all SDK versions are affected.
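The race can be sketched as a simplified, synchronous Python model. This is not the SDK's actual code. parse_style is a hypothetical stand-in for the renderer that raises on a malformed sprite URL (in the real SDKs the failure was a crash), fetch stands in for the network request, and the background refresh is modeled as a plain synchronous call. In load_buggy, a bad cached entry raises before the refresh runs, so the cache can never be repaired even after the server is fixed. load_fixed shows one possible mitigation, evicting an unusable cached entry and falling back to the network, chosen for illustration only.

```python
from typing import Callable, Dict


class StyleLoadError(Exception):
    """Hypothetical stand-in for the renderer failing on a bad style."""


def parse_style(doc: dict) -> dict:
    # Hypothetical validity check: reject a sprite URL that is not HTTPS
    # or that contains more than one query separator.
    sprite = doc.get("sprite", "")
    if not sprite.startswith("https://") or sprite.count("?") > 1:
        raise StyleLoadError(f"invalid sprite URL: {sprite!r}")
    return doc


def load_buggy(cache: Dict[str, dict], fetch: Callable[[str], dict], style_id: str) -> dict:
    cached = cache.get(style_id)
    if cached is not None:
        style = parse_style(cached)        # fails here on a bad entry...
        cache[style_id] = fetch(style_id)  # ...so the refresh never runs
        return style
    fresh = fetch(style_id)
    cache[style_id] = fresh                # the bad response is cached first
    return parse_style(fresh)


def load_fixed(cache: Dict[str, dict], fetch: Callable[[str], dict], style_id: str) -> dict:
    cached = cache.get(style_id)
    if cached is not None:
        try:
            style = parse_style(cached)
        except StyleLoadError:
            del cache[style_id]            # evict the unusable entry
        else:
            cache[style_id] = fetch(style_id)  # stale-while-revalidate refresh
            return style
    fresh = fetch(style_id)
    cache[style_id] = fresh
    return parse_style(fresh)
```

With a cache already holding a bad entry and a server that now returns a valid style, load_buggy keeps raising on every call, whereas load_fixed evicts the entry, fetches the valid document, and returns it.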
We understand the gravity of this service interruption and the lingering impact, and are taking the following actions to remedy the situation and prevent similar interruptions in the future: