A mysterious bug in the firmware of Google's Titan M chip (CVE-2019-9465)

29 Feb 2020

Starting with the release of the Pixel 3, all of Google’s Pixel Android smartphones come with the Titan M security chip on board. When I realized the Pixel 3a XL I purchased also had it, I decided to try to take advantage of it in an app I work on. It turned out that using the Titan M chip through the Android Keystore API for AES-GCM in a specific way led to predictable and bogus ciphertext. This is the story of how I stumbled upon that bug, and why it’s a bit mysterious.

Discussed on HN and Reddit.

The Android Keystore

Let’s start with a short high-level primer on the Android Keystore and StrongBox before we jump into this. If you’re already familiar with these two concepts, you can skip ahead to Discovering the bug.

The Android Keystore is a system service that allows apps to securely generate and use cryptographic keys. While apps can use the keys they generated, they cannot extract them, as the key material never enters the app’s process. The idea is that even if the app’s process is compromised, the key is not.

The practical security of this system varies across devices, as the Android Keystore can have various kinds of “backings”. Some devices simply keep the whole system in Android itself, while other devices come with a Trusted Execution Environment (TEE). Some even come with an entirely separate chip, a hardware security module (HSM), to back the Android Keystore. The latter offers the best security, because even if an attacker is able to compromise the operating system, they would still need a vulnerability in the HSM to get to the keys in the Android Keystore. The fact that these chips are shipping in more and more smartphones these days is a pretty exciting development, if you ask me.

StrongBox and the Titan M chip

An Android Keystore implementation that falls under the latter category is referred to as a StrongBox Keymaster, and the Titan M chip is one of them. To take advantage of it in an Android app on Pixel devices, one has to indicate a preference for StrongBox during key generation. By simply calling setIsStrongBoxBacked(true) on the KeyGenParameterSpec.Builder that is passed to the KeyGenerator, apps that use the Android Keystore get more secure storage for their cryptographic keys, with no apparent downsides. That’s what I thought, anyway.

Discovering the bug

I work on a 2FA app for Android, Aegis Authenticator. The app stores the OTP secrets in an encrypted file (referred to as the vault). The master key used to encrypt/decrypt the vault is managed by a key slot system that’s very similar to LUKS. You can learn more about it here. In short, Aegis has the concept of two types of key slots: for a password and for biometric authentication. The biometric key slot encrypts the master key with a key stored in the Android Keystore, so that the master key can be decrypted after successful biometric authentication by the user.
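To make the key slot idea concrete, here is a minimal sketch in plain Java. It is not Aegis’ actual code: the class name KeySlotSketch and its methods are made up for illustration, the slot key is an ordinary software AES key rather than one held in the Android Keystore, and the nonce is generated by the caller.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical illustration of a key slot; these are not Aegis' actual classes.
class KeySlotSketch {
    private static final int NONCE_BYTES = 12;
    private static final int TAG_BITS = 128;

    final byte[] nonce;      // fresh random nonce used for this slot
    final byte[] wrappedKey; // encrypted master key with the GCM tag appended

    private KeySlotSketch(byte[] nonce, byte[] wrappedKey) {
        this.nonce = nonce;
        this.wrappedKey = wrappedKey;
    }

    // Encrypts the master key under the slot key.
    static KeySlotSketch wrap(SecretKey slotKey, byte[] masterKey) throws GeneralSecurityException {
        byte[] nonce = new byte[NONCE_BYTES];
        new SecureRandom().nextBytes(nonce);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, slotKey, new GCMParameterSpec(TAG_BITS, nonce));
        return new KeySlotSketch(nonce, cipher.doFinal(masterKey));
    }

    // Recovers the master key; doFinal throws AEADBadTagException if the slot data was altered.
    byte[] unwrap(SecretKey slotKey) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, slotKey, new GCMParameterSpec(TAG_BITS, nonce));
        return cipher.doFinal(wrappedKey);
    }

    // Runs one wrap/unwrap cycle with freshly generated keys.
    static boolean roundTrips() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(256);
            SecretKey slotKey = generator.generateKey();
            byte[] masterKey = new byte[32];
            new SecureRandom().nextBytes(masterKey);
            KeySlotSketch slot = wrap(slotKey, masterKey);
            return Arrays.equals(masterKey, slot.unwrap(slotKey));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("round trip ok: " + roundTrips());
    }
}
```

In the biometric slot, the slot key would live in the Android Keystore and only become usable after the user authenticates; the wrap and unwrap steps stay the same in spirit.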

Here’s what the key generation code for that looks like in Aegis. Excerpt from KeyStoreHandle.java:

KeyGenerator generator = KeyGenerator.getInstance(KeyProperties.KEY_ALGORITHM_AES, STORE_NAME);
generator.init(new KeyGenParameterSpec.Builder(id, KeyProperties.PURPOSE_ENCRYPT | KeyProperties.PURPOSE_DECRYPT)
         .setBlockModes(KeyProperties.BLOCK_MODE_GCM)
         .setEncryptionPaddings(KeyProperties.ENCRYPTION_PADDING_NONE)
         .setUserAuthenticationRequired(true)
         .setRandomizedEncryptionRequired(true)
         .setKeySize(CryptoUtils.CRYPTO_AEAD_KEY_SIZE * 8)
         .build());

It generates a 256-bit key, to be used with AES in GCM mode, for encryption and decryption. User authentication is required for this key, so that it can only be used after biometric authentication by the user.

This works fine. Let’s switch this over to StrongBox.

diff --git a/app/src/main/java/com/beemdevelopment/aegis/crypto/KeyStoreHandle.java b/app/src/main/java/com/beemdevelopment/aegis/crypto/KeyStoreHandle.java
index cfa1e57..00cae52 100644
--- a/app/src/main/java/com/beemdevelopment/aegis/crypto/KeyStoreHandle.java
+++ b/app/src/main/java/com/beemdevelopment/aegis/crypto/KeyStoreHandle.java
@@ -57,6 +57,7 @@ public class KeyStoreHandle {
                     .setEncryptionPaddings(KeyProperties.ENCRYPTION_PADDING_NONE)
                     .setUserAuthenticationRequired(true)
                     .setRandomizedEncryptionRequired(true)
+                    .setIsStrongBoxBacked(true)
                     .setKeySize(CryptoUtils.CRYPTO_AEAD_KEY_SIZE * 8)
                     .build());

To test this change, I went through Aegis’ setup process, set a password and enabled biometric unlock. The main view opened and I could start adding tokens. So far so good. However, when I closed the app, reopened it and tried to unlock the vault with biometric authentication, the app showed an error dialog, caused by the following chain of exceptions:

com.beemdevelopment.aegis.db.slots.SlotIntegrityException: javax.crypto.AEADBadTagException
    at com.beemdevelopment.aegis.db.slots.Slot.getKey(Slot.java:57)
    ...
Caused by: javax.crypto.AEADBadTagException
    at android.security.keystore.AndroidKeyStoreCipherSpiBase.engineDoFinal(AndroidKeyStoreCipherSpiBase.java:517)
    at javax.crypto.Cipher.doFinal(Cipher.java:2055)
    ...
Caused by: android.security.KeyStoreException: Signature/MAC verification failed
    at android.security.KeyStore.getKeyStoreException(KeyStore.java:1292)
    at android.security.keystore.KeyStoreCryptoOperationChunkedStreamer.doFinal(KeyStoreCryptoOperationChunkedStreamer.java:224)
    at android.security.keystore.AndroidKeyStoreAuthenticatedAESCipherSpi$BufferAllOutputUntilDoFinalStreamer.doFinal(AndroidKeyStoreAuthenticatedAESCipherSpi.java:373)
    at android.security.keystore.AndroidKeyStoreCipherSpiBase.engineDoFinal(AndroidKeyStoreCipherSpiBase.java:506)
    at javax.crypto.Cipher.doFinal(Cipher.java:2055) 
    ...

Huh. Decryption of the key slot fails, because apparently verification of the MAC failed. How could that be? We only added a single line to enable StrongBox and left the rest of the code alone. I would have attached a debugger at this point to see what’s going on, but that’s not very useful here, as we can’t see what is going on in the HSM, where the cryptographic operation takes place.
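For context, a GCM tag is computed over the ciphertext (and any associated data), so decryption rejects any ciphertext that differs from what was originally produced. The following sketch shows that behavior on a regular JVM with a software key. GcmTagCheckSketch is a hypothetical name, and nothing here touches the Android Keystore or StrongBox.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.AEADBadTagException;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical demo class; runs on a plain JVM with a software AES key.
class GcmTagCheckSketch {
    // Returns true if 'sealed' (ciphertext with the tag appended) passes GCM tag verification.
    static boolean verifies(SecretKey key, byte[] nonce, byte[] sealed) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, nonce));
            cipher.doFinal(sealed);
            return true;
        } catch (AEADBadTagException e) {
            return false; // the tag does not match the ciphertext
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Encrypts a message, then checks the untouched output and a corrupted copy.
    // Returns {untouchedVerifies, corruptedVerifies}.
    static boolean[] demo() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(256);
            SecretKey key = generator.generateKey();
            byte[] nonce = new byte[12];
            new SecureRandom().nextBytes(nonce);

            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, nonce));
            byte[] sealed = cipher.doFinal("this is a test string".getBytes(StandardCharsets.US_ASCII));

            byte[] corrupted = sealed.clone();
            corrupted[0] ^= 0x01; // flip a single bit of the ciphertext
            return new boolean[] { verifies(key, nonce, sealed), verifies(key, nonce, corrupted) };
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] result = demo();
        System.out.println("untouched verifies: " + result[0]); // true
        System.out.println("corrupted verifies: " + result[1]); // false
    }
}
```

So an AEADBadTagException at decryption time means that the ciphertext, the tag, the nonce or the key differs from what was used during encryption.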

After a couple more tries, it turned out that the kind of exception I was getting was not consistent. If I waited a couple of seconds before authenticating after the biometric prompt was shown, I got an entirely different exception:

java.security.InvalidKeyException: Keystore operation failed
  at com.beemdevelopment.aegis.crypto.KeyStoreHandle.isKeyPermanentlyInvalidated(KeyStoreHandle.java:104)
  at com.beemdevelopment.aegis.crypto.KeyStoreHandle.getKey(KeyStoreHandle.java:83)
  at com.beemdevelopment.aegis.ui.AuthActivity.onCreate(AuthActivity.java:83)
  ...
Caused by: java.security.InvalidKeyException: Keystore operation failed
  at android.security.KeyStore.getInvalidKeyException(KeyStore.java:1362)
  at android.security.KeyStore.getInvalidKeyException(KeyStore.java:1402)
  at android.security.keystore.KeyStoreCryptoOperationUtils.getInvalidKeyExceptionForInit(KeyStoreCryptoOperationUtils.java:54)
  at android.security.keystore.KeyStoreCryptoOperationUtils.getExceptionForCipherInit(KeyStoreCryptoOperationUtils.java:89)
  at android.security.keystore.AndroidKeyStoreCipherSpiBase.ensureKeystoreOperationInitialized(AndroidKeyStoreCipherSpiBase.java:265)
  at android.security.keystore.AndroidKeyStoreCipherSpiBase.engineInit(AndroidKeyStoreCipherSpiBase.java:109)
  at javax.crypto.Cipher.tryTransformWithProvider(Cipher.java:2984)
  at javax.crypto.Cipher.tryCombinations(Cipher.java:2891)
  at javax.crypto.Cipher$SpiAndProviderUpdater.updateAndGetSpiAndProvider(Cipher.java:2796)
  at javax.crypto.Cipher.chooseProvider(Cipher.java:773)
  at javax.crypto.Cipher.init(Cipher.java:1143)
  at javax.crypto.Cipher.init(Cipher.java:1084)
  at com.beemdevelopment.aegis.crypto.KeyStoreHandle.isKeyPermanentlyInvalidated(KeyStoreHandle.java:96)
  ...
Caused by: android.security.KeyStoreException: Invalid key blob
  at android.security.KeyStore.getKeyStoreException(KeyStore.java:1292)
  ...

Digging deeper

At this point, I decided this issue may be worth reporting to Google through the Android Security Rewards Program. In an attempt to get a better grasp on what the issue actually is, I wrote a small configurable PoC app that demonstrates the issue to go along with the report. It is open source and available on GitHub. It has gone through a few iterations since the initial report to Google. The screenshots below show the latest version.

The app sits in a loop, repeatedly encrypts/decrypts some hard-coded plaintext and prints the results to the log. As shown in the screenshots above, user authentication, the timeout between tries and some other things are configurable in the options dialog. While we can’t see what’s going on inside the HSM, this gives us a good look at the behavior we can see on the outside. Looking at the log, it turns out that this bug causes cryptographic operations to produce predictable and bogus ciphertext:

plaintext: 746869732069732061207465737420737472696e67
ciphertext: d62a2349d993632dddabc30a4a2c8ab7ba2608c5f2
tag: fd6b38cba35ea579918f3b5ec1863e4b
nonce: f102f60a0ef39e310c5f9c4c
decrypted: 746869732069732061207465737420737472696e67

plaintext: 746869732069732061207465737420737472696e67
ciphertext: 0dd0adde0dd0adde0dd0adde0dd0adde585301005d
tag: 995d100cc5d068a83e7ecf13c49e92eb
nonce: cffb9148cb5154232c957ab7

Note the ciphertext of the second iteration: 0dd0adde0dd0adde0dd0adde0dd0adde585301005d. That doesn’t look as random as one would expect. In fact, there’s even a sequence that repeats: 0dd0adde. Let’s try that again.

plaintext: 746869732069732061207465737420737472696e67
ciphertext: 789ae5602a997f5ce7768b08fe5db2d1d7139fad30
tag: 12e28ebf0f9492a76a453cd5d3bcf4ff
nonce: e4921e4825ea1ad01881ad82
decrypted: 746869732069732061207465737420737472696e67

plaintext: 746869732069732061207465737420737472696e67
ciphertext: 0dd0adde0dd0adde0dd0adde0dd0adde585301005d
tag: 74eaa7be302f1dce6d53ae5a58dc2408
nonce: a454451014a075e1883f4521

No matter how many times this is repeated, the resulting ciphertext is always the same, even though the nonce is different every time. Now it’s obvious why decryption failed in Aegis before. Trying to decrypt bogus ciphertext like that will certainly result in a MAC verification failure and bogus plaintext. The most dangerous thing about this is that encryption fails so spectacularly without throwing an exception. This also completely defeats one of the most important guarantees that symmetric ciphers provide: ciphertext that is indistinguishable from random noise.
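The hex dumps from the log can be unpacked with a few lines of plain Java. The sketch below decodes the plaintext, counts how often the first 4-byte word of the bogus ciphertext repeats, and reads that word as a little-endian 32-bit value. The class and method names are made up for illustration. The value that comes out, deadd00d, reads like a debug fill pattern, but that interpretation is a guess; nothing in Google’s responses confirms it.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper for inspecting the hex values printed by the PoC log.
class CiphertextInspector {
    // Decodes a hex string such as "0dd0adde" into bytes.
    static byte[] fromHex(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // Counts how many consecutive 4-byte words at the start of data equal the first word.
    static int leadingRepeats(byte[] data) {
        if (data.length < 4) {
            return 0;
        }
        byte[] first = Arrays.copyOfRange(data, 0, 4);
        int count = 1;
        while ((count + 1) * 4 <= data.length
                && Arrays.equals(first, Arrays.copyOfRange(data, count * 4, count * 4 + 4))) {
            count++;
        }
        return count;
    }

    // Reads the first four bytes as a little-endian 32-bit word, formatted as hex.
    static String firstWordLittleEndian(byte[] data) {
        int word = ByteBuffer.wrap(data, 0, 4).order(ByteOrder.LITTLE_ENDIAN).getInt();
        return String.format("%08x", word);
    }

    public static void main(String[] args) {
        byte[] plaintext = fromHex("746869732069732061207465737420737472696e67");
        byte[] bogus = fromHex("0dd0adde0dd0adde0dd0adde0dd0adde585301005d");
        System.out.println(new String(plaintext, StandardCharsets.US_ASCII)); // this is a test string
        System.out.println(leadingRepeats(bogus));                            // 4
        System.out.println(firstWordLittleEndian(bogus));                     // deadd00d
    }
}
```

The repeated word covers exactly the first 16 bytes, which is one AES block; the remaining 5 bytes (585301005d) are also identical across runs.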

Playing around with the app some more, I concluded that waiting ‘too long’ (around 2 seconds) between cipher initialization and using it to encrypt/decrypt something elicits this behavior. This appears to only be an issue with AES in GCM mode. I also tested the other supported ciphers and modes, but those seem to behave correctly.
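The triggering pattern, initializing a cipher, waiting, and only then calling doFinal, can be sketched as follows. This is a reconstruction of the pattern described above, not the PoC’s source code: DelayedGcmLoopSketch is a hypothetical name, the key is a software AES key on a plain JVM, and the nonce is supplied by the caller. On an affected device the key would have been a StrongBox-backed Android Keystore key.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical reconstruction of the "init, wait, then encrypt" pattern.
class DelayedGcmLoopSketch {
    // One iteration: init the cipher, wait, encrypt, then decrypt and compare.
    static boolean iteration(SecretKey key, byte[] plaintext, long delayMillis)
            throws GeneralSecurityException, InterruptedException {
        byte[] nonce = new byte[12];
        new SecureRandom().nextBytes(nonce);
        GCMParameterSpec spec = new GCMParameterSpec(128, nonce);

        Cipher encryptor = Cipher.getInstance("AES/GCM/NoPadding");
        encryptor.init(Cipher.ENCRYPT_MODE, key, spec);
        Thread.sleep(delayMillis); // the gap between init and use
        byte[] sealed = encryptor.doFinal(plaintext);

        Cipher decryptor = Cipher.getInstance("AES/GCM/NoPadding");
        decryptor.init(Cipher.DECRYPT_MODE, key, spec);
        return Arrays.equals(plaintext, decryptor.doFinal(sealed));
    }

    // Generates a fresh software key and runs a single iteration.
    static boolean runOnce(long delayMillis) {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(256);
            byte[] plaintext = "this is a test string".getBytes(StandardCharsets.US_ASCII);
            return iteration(generator.generateKey(), plaintext, delayMillis);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Roughly 2 seconds was enough to trigger the bug on the Titan M.
        for (int i = 0; i < 3; i++) {
            System.out.println("round trip ok: " + runOnce(2000));
        }
    }
}
```

With a software key the delay makes no difference and every round trip succeeds, which is the behavior one would expect from the StrongBox-backed key as well.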

Unfortunately, this is where my research ground to a halt. To go further I would need access to the firmware that is running on the Titan M chip, to see what might be going wrong there. While Google has promised to open source the firmware, it has not delivered on that promise to this day. My best guess would be that this is a memory corruption issue, but that’s about as far as I’m going to get without more access. Google doesn’t seem to be willing to discuss the details of the issue either, as you’ll notice while reading the Timeline.

I also took a quick look at the Android source tree to see if I could find any places where AES-GCM was used in combination with StrongBox and the randomness of ciphertext was being relied upon. The latter is the case with some CSPRNG implementations, for example, though I can’t think of a reason why someone would use GCM mode for that purpose. That’s the only attack vector I could come up with, but perhaps someone more knowledgeable in Android’s internals can think of something else.

Timeline

In total, it took 6 months to get this fixed. Here’s the timeline:

2019-05-20 Report submitted.
2019-05-21 Google responds: We’re looking into it.
2019-05-28 Google responds: Won’t Fix (Infeasible).

The team tells me there are some errors in my implementation that cause the exhibited behavior and that it’s not actually a bug in Android:
1. In the KeygenParameterSpec, setUserAuthenticationRequired(true) was set up, but KeygenParameterSpec was not set on how to authenticate such as: setUserAuthenticationValidityDurationSeconds(int) So the encryption was never authorized.
2. In addition, when instantiating the KeyGenParameterSpec, KeyProperties.PURPOSE_DECRYPT was not added, so decryption failed.
Neither of these suggestions makes sense. To be fair, I’ve always found the Android Keystore key generation API to be a bit confusing, but it’s interesting to see members of the Android Security team struggling with it as well.
2019-05-28 I respond, suggesting that they take another look and actually try to run the PoC app, instead of trying to find flaws in my use of the API.
2019-06-06 I ask for a status update.
2019-06-11 Google responds: We have no update at this time.
2019-06-14 Google responds: High severity.
2019-07-05 I ask for a status update.
2019-07-08 Google responds: We’re still looking into it.
2019-08-26 I ask for a status update.
2019-08-26 Google responds, apologizing for the delay and saying that the issue is taking longer to remediate than the usual 90-day window. They ask if I intend to disclose, and if so, whether I’d be willing to participate in coordinated disclosure.
2019-08-26 I respond, saying I don’t intend to disclose yet, as I don’t have a patch/solution to go along with a disclosure.
2019-08-28 Google responds, thanking me for the clarification and assuring me that they’ll keep me updated.
2019-10-08 I ask for a status update.
2019-10-14 Google responds: the issue has been fixed and they’re tracking the fix for release in December. They’ll provide details soon regarding CVE and reward eligibility. Also, the severity has been adjusted from High to Critical.
2019-10-15 I respond, expressing that I’m glad it’s been fixed.
2019-11-20 I ask for a status update.
2019-11-25 Google responds: We’d like to acknowledge your contribution publicly. CVE assigned: CVE-2020-0014.
2019-11-25 I respond: Thanks for the status update. I ask if the fix is still on track for release in December and whether this report is eligible for a reward. I tell them I plan on writing a blog post.
2019-11-26 Google responds, thanking me for sharing my disclosure plans and reassuring me that the fix is still on track for release in December and that the report is eligible for a reward.
2019-11-26 I respond, thanking them for the clarification and assuring them that I’ll let them review this blog post before I publish it.
2019-11-27 Google responds. The CVE number that was given previously is incorrect. The correct one is: CVE-2019-9465.
2019-12-02 Google responds. The rewards committee decided to reward me for reporting this security issue.
2019-12-02 I respond, thanking them for the reward. I also ask for more details about the bug, as I have very little to go on for the blog post I plan to write.
2019-12-02 Google responds, confirming my suspicion that the bug was in the Titan M firmware, but doesn’t provide any additional information.
2019-12-05 I respond, reporting my findings after installing the December 2019 security update: the bug is still present on my device.

2019-12-07 - 2019-12-24 Google responds, saying that they are unable to reproduce the issue. After that, there was a lot of back and forth to try to get logs from my device. It turned out that Android was not able to update the firmware of the Titan M chip on my device, possibly due to a bad state I had gotten it into while developing the PoC.

12-24 16:22:57.618  1087  1087 I init_citadel: Citadel version: 0.0.3/brick_v0.0.7574-5b47d37e 2019-06-25 19:20:19 gdk@chunky.cam.corp.google.com
12-24 16:22:57.666  1108  1108 I init_citadel: Citadel is running a known older firmware so sending update
12-24 16:22:57.778  1136  1136 I init_citadel: Citadel is C2-PVT, allowing RO updates
12-24 16:23:11.812  2808  2808 I init_citadel: Citadel update loaded
12-24 16:23:11.852  2810  2810 I init_citadel: Could not enable Citadel update: password required
12-24 16:23:12.009  2812  2812 I init_citadel: Citadel rebooted
12-24 16:24:19.263   584   584 I chatty  : uid=1064(hsm) /vendor/bin/hw/citadeld identical 5 lines
12-24 16:24:20.357   806   806 E /vendor/bin/hw/android.hardware.authsecret@1.0-service.citadel: Incorrect Citadel update password
12-24 16:24:29.466   825   825 I /vendor/bin/hw/android.hardware.oemlock@1.0-service.citadel: Running OemLock::setOemUnlockAllowedByCarrier: 1
12-24 16:24:29.473   584   584 I chatty  : uid=1064(hsm) /vendor/bin/hw/citadeld identical 1 line

2020-02-02 I ask for a status update.
2020-02-07 Google responds, recommending that I factory reset my device to resolve the state it is in, while they continue to investigate the issue.

2020-02-23 I respond, saying I factory reset my device and that the security issue is no longer present. I didn’t catch the update process itself, but the firmware version does seem to be newer:

02-23 16:38:11.981   921   921 I init_citadel: Checking citadel version
02-23 16:38:12.279  1027  1027 I init_citadel: Citadel version: 0.0.3/brick_v0.0.7580-2d3a8cfc 2019-09-16 20:42:26 gdk@chunky.cam.corp.google.com
02-23 16:38:12.333  1054  1054 I init_citadel: Citadel isn't running a known older firmware so not updating

A call with Google

After sharing the draft of this blog post with the team, I was asked if I would be willing to join a short call, as they wanted to “provide some details/context about the fix process for this issue prior to the publication” of my blog post. I agreed.

During the call they apologized for the long turnaround time and poor communication. I appreciate that, as that was indeed the main disappointment I had when submitting this as my first Android security issue report. I had to keep asking for status updates and felt like I was being kept out of the loop, while one would expect the team to share new information as it becomes available. The initial draft of the blog post had a fairly snarky comment about the fact that the severity rating changed from Won’t Fix, to High, to Critical. They explained that their guidelines for severity ratings changed while in the process of handling my report and they adjusted the severity rating based on those new guidelines. Fair enough, I removed that comment.

The overall impression I got is that they genuinely care about handling Android security reports well and regret that it didn’t go as well as they would have liked this time around.

Naturally, I also used the opportunity to ask if they could provide some more information about the nature of the actual bug and about the status of open sourcing the Titan M firmware, but they couldn’t comment on that.

Conclusion

I ended up performing a factory reset on my device a few days ago, which seems to have magically fixed the Titan M firmware update issue. The security issue itself also appears to be resolved. I would have loved to try to dig deeper into this, but as mentioned, the source of the Titan M firmware is simply not available. Google appears to have added CTS tests for the exhibited behavior, but other than that, there is no trace of how this bug was fixed.

I also learned some important lessons. First, upon initial disclosure to Google, I should have firmly set the standard 90-day deadline, so that it’s clear for all parties what the expectations are. Second, despite not setting a deadline upfront, I should have disclosed this publicly much earlier than I ended up doing. It just didn’t feel right to disclose a security issue that I didn’t have a fix or workaround for, but I’ve since come to the realization that getting the information out there is more important.