Invalid Style API Responses
Incident Report for Mapbox
Postmortem

We have completed our review of today's service interruption on the Style API and want to provide a more complete update on the actions we are taking and the actions we recommend for customers.

From about 00:00 to about 06:00 UTC, some responses to style document requests contained an invalid URL in the sprite property. The affected styles were Dark, Emerald, Light, Outdoors, Satellite Hybrid, Satellite Streets, and Streets. Custom Mapbox Studio styles were not impacted by this issue.

The invalid sprite URL in the response interacted badly with our native GL rendering library, a core component of our Android and iOS SDKs, triggering host applications to crash in a variety of ways.

Thanks to helpful reports from customers, we were able to quickly identify and fix the root problem on our API. It was caused by a bug in the code that loads certain high-traffic styles (the ones listed above), which are managed on a slightly different code path than custom styles. Specifically, code that mutated the styles' sprite property on certain requests unintentionally persisted the mutated object to a shared object cache under certain conditions. From there, the mutation slowly accumulated in the cache and eventually resulted in responses containing an invalid sprite URL. We fixed the problem, cleared our global caches, and began serving valid responses approximately 15 minutes after discovering the issue.
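As an illustration only, the class of bug described above (mutating an object that is still referenced by a shared cache) can be sketched in a few lines of Python. This is not the actual API code: the function names, the cache layout, and the way the sprite value is modified are all hypothetical, chosen just to show how a per-request change can accumulate in a shared entry.

```python
import copy


def serve_style_buggy(cache, style_id, base_url):
    """Hypothetical handler: edits the cached object in place."""
    style = cache[style_id]                        # reference to the shared entry
    style["sprite"] = base_url + style["sprite"]   # mutation persists in the cache
    return style


def serve_style_fixed(cache, style_id, base_url):
    """Hypothetical fix: work on a private copy of the cached object."""
    style = copy.deepcopy(cache[style_id])         # the cache entry stays untouched
    style["sprite"] = base_url + style["sprite"]
    return style
```

With the buggy handler, every request prepends the base URL again, so the cached sprite value drifts further from a valid URL over time. With the copying handler, repeated requests return the same value and the shared entry never changes.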

However, the issue was exacerbated by caching behavior in our mobile SDKs. To ensure performance in offline/semi-offline conditions, our mobile SDKs employ a cache strategy that permits the use of stale resources while refreshing the content from the server. This behavior lets your application display maps quickly when network connectivity is limited but still fetch up-to-date content from the server in the background. Unfortunately, this functionality introduces a race condition: the SDK loads the invalid style object from its cache and crashes before it can initiate the refresh routine. This left some applications in an unrecoverable state for users who attempted to load a map with an empty cache during the incident. The caching strategy has been in place since the first version of our mobile SDK, so all SDK versions are affected.
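The ordering problem can be sketched as follows. This is a simplified model under stated assumptions, not the SDK's implementation: the cache is a plain dictionary, `fetch` stands in for the network request, `render` is a hypothetical stand-in for the GL library, and a Python exception stands in for a native crash (which, unlike an exception, cannot be caught by the host application).

```python
class InvalidStyleError(Exception):
    """Stands in for the crash triggered by an invalid sprite URL."""


def render(style):
    # Hypothetical renderer that fails hard on an unexpected sprite value.
    if not style["sprite"].startswith("https://"):
        raise InvalidStyleError(style["sprite"])
    return "rendered"


def load_map_stale_first(cache, fetch):
    """Use the stale entry first; refresh only after rendering succeeds."""
    style = cache.get("style")
    if style is not None:
        result = render(style)     # a failure here means the refresh never runs
        cache["style"] = fetch()
        return result
    cache["style"] = fetch()
    return render(cache["style"])


def load_map_defensive(cache, fetch):
    """Evict a cached entry that fails to render, then fall back to the server."""
    style = cache.get("style")
    if style is not None:
        try:
            result = render(style)
        except InvalidStyleError:
            del cache["style"]     # drop the poisoned entry
        else:
            cache["style"] = fetch()
            return result
    cache["style"] = fetch()
    return render(cache["style"])
```

In the first variant, a poisoned cache entry fails on every launch and is never replaced, which matches the unrecoverable state described above. The second variant shows one possible way out, assuming the failure can be detected before it takes down the process.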

Mapbox Actions

We understand the gravity of this service interruption and the lingering impact, and are taking the following actions to remedy the situation and prevent similar interruptions in the future:

  • We are preparing emergency releases of our mobile SDKs - Android SDK v4.1.1 and iOS SDK v3.3.1 - that fix the immediate issue. These releases will be available within 24 hours.
  • We are working to make the input validation routines in our core GL rendering library more robust to prevent crashes and ensure stability when a remote service returns an unexpected response.
  • We are conducting a careful review of the API code path where the bug was introduced. We are also examining our testing and pre-release automation to determine why this issue was not detected before it hit production.
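To make the second action above concrete, here is a minimal sketch of the kind of check a client could run on a sprite value before using it. It is an assumption-laden illustration rather than the validation planned for the GL library: the function name and the set of accepted schemes are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical allow-list; a real client would define its own accepted schemes.
ALLOWED_SCHEMES = {"http", "https", "mapbox"}


def is_valid_sprite_url(value):
    """Return True only for a string with an allowed scheme and a non-empty host."""
    if not isinstance(value, str):
        return False
    parts = urlsplit(value)
    return parts.scheme in ALLOWED_SCHEMES and bool(parts.netloc)
```

A caller would reject or ignore the style's sprite when this returns False, instead of passing the value on to code that assumes a well-formed URL.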

Recommended Customer Actions

  • Upgrade to iOS SDK v3.3.1 or greater, or Android SDK v4.1.1 or greater
  • Change the style that your application uses, which will invalidate the cache
  • If you are unable to upgrade your application and a user of your application contacts you about this issue, please have them clear their application cache (Android) or reinstall your application (iOS).
Posted Jul 19, 2016 - 19:31 UTC

Resolved
This incident has been resolved.
Posted Jul 19, 2016 - 10:46 UTC
Monitoring
We deployed a fix for the bug that caused the Style API to return invalid values for the `sprite` property. We force-cleared our global caches. We are continuing to monitor the situation.
Posted Jul 19, 2016 - 10:29 UTC
Identified
We have identified a bug introduced in a recent deploy that causes the Style API to return an invalid value for the `sprite` property. The bug is triggered after using Studio to load or manipulate a style and then impacts all style loading via the API. We are deploying a fix for the issue now.
Posted Jul 19, 2016 - 09:47 UTC
Investigating
We are investigating invalid responses from the Style API.
Posted Jul 19, 2016 - 09:39 UTC