Fun With Kernel Cipher Hacking
This is probably not practically useful, but it’s a neat proof-of-concept solution to a real-life problem! We run a Ceph cluster for the majority of our data, and we’re looking into using a managed backup-as-a-service thingy. That’s fine, but we have a requirement that the backup service isn’t allowed to see or store any of our plaintext data. Totally reasonable, but a little hard to swing on Ceph: although the backing disks are encrypted, the actual disk images hosted there aren’t.
A normal solution would probably look like grabbing chunks of each image on a separate server, encrypting them, and then handing those to the backup agent to store. This is very much what OVH does. But OVH went that route because of limitations in Duplicity, and the backup agent we’re using handles huge block devices just fine. So we’d be going through all this just to encrypt some data in flight, and then we’d have to do an awkward reassembly dance when we restore. If only we were trying to decrypt the data, it would be a non-issue. We would just use `dm-crypt`, the kernel would decrypt on the fly for us, and the backup agent would be none the wiser.
Can we run this process backwards?
Shout-out to Arno Wagner for this thread posted in 2013. Assuming your problem is still unsolved seven years later, I’ve got you sorted! After a lot of back-and-forth, the thread finally arrives at a potential way of accomplishing this:
I think that quick hack to try it would be to write simple kernel cipher module (or wrapper), where you only change cipher name (so it will not mix up with normal implementation, name like reverse_aes or so) and just switch encrypt/decrypt callbacks.
I am afraid you will need to avoid LUKS and IV where encryption is used (ESSIV) (or at least you must analyze if encrypt/decrypt change for the given cipher is safe for use there).
I’m going to describe how to implement what the poster suggests. At the end I will have a block cipher `revaes`, and I will use it like `cryptsetup plainOpen --cipher revaes /dev/sdb1 sdb1_enc`. Can any Linux kernel or crypto enthusiasts call in advance why this won’t work? And if you get that, can you call in advance how to use an existing cipher to do what I want?
Let’s start with a bare-bones `Makefile` to actually build the module.
Wonderful! So all we need to do is write a file `revaes.c` with the module and we’re done. Let’s poke around the kernel source and see if there’s anything we can use as a guide. Looks like [aes_ti.c](https://github.com/torvalds/linux/blob/b25c6644bfd3affd7d0127ce95c5c96c155a7515/crypto/aes_ti.c) is short, sweet, and has all the bits I need. Let’s just touch it up a bit.
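The change the thread suggests boils down to registering the same primitive under a new name with the encrypt and decrypt callbacks swapped. Here is a userspace sketch of that idea (not kernel code). The `struct block_cipher` is a hypothetical stand-in loosely modeled on the kernel’s `cia_encrypt`/`cia_decrypt` callbacks, and `toy_encrypt`/`toy_decrypt` are a hypothetical, deliberately insecure 64-bit Feistel cipher standing in for AES:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy 64-bit block cipher: a 4-round Feistel network.
 * NOT secure; it only stands in for AES to keep the sketch short. */
static uint32_t toy_round(uint32_t half, uint32_t k)
{
    uint32_t x = (half ^ k) * 0x9E3779B1u;
    return x ^ (x >> 15);
}

static uint64_t toy_encrypt(uint64_t key, uint64_t block)
{
    uint32_t l = (uint32_t)(block >> 32), r = (uint32_t)block;
    for (int i = 0; i < 4; i++) {
        uint32_t t = l ^ toy_round(r, (uint32_t)(key >> (8 * i)));
        l = r;
        r = t;
    }
    return ((uint64_t)l << 32) | r;
}

static uint64_t toy_decrypt(uint64_t key, uint64_t block)
{
    uint32_t l = (uint32_t)(block >> 32), r = (uint32_t)block;
    for (int i = 3; i >= 0; i--) { /* undo the rounds in reverse order */
        uint32_t t = r ^ toy_round(l, (uint32_t)(key >> (8 * i)));
        r = l;
        l = t;
    }
    return ((uint64_t)l << 32) | r;
}

/* Hypothetical stand-in for a single-block cipher descriptor, loosely
 * modeled on the kernel's cia_encrypt / cia_decrypt callbacks. */
struct block_cipher {
    const char *name;
    uint64_t (*encrypt)(uint64_t key, uint64_t block);
    uint64_t (*decrypt)(uint64_t key, uint64_t block);
};

static const struct block_cipher toy = { "toy", toy_encrypt, toy_decrypt };

/* The whole trick: same primitive, new name, callbacks swapped. */
static const struct block_cipher revtoy = { "revtoy", toy_decrypt, toy_encrypt };
```

Anything that asks `revtoy` to decrypt is really running the forward cipher, which is the direction the backup agent would need to see on reads.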
Sweet! Let’s build it, load it, and try using it!
Welp. What went wrong?
This is the point where I banged my head for a while and where the crypto nerds should be frustratedly shouting at the screen like when Dora should clearly be able to see that the largo plank fits in the hole in the bridge.
Let’s talk about cipher modes, or chainmodes as the kernel calls them. If you look in the kernel docs, you’ll see the very under-described `dm-crypt` cipher specification:
cipher[:keycount]-chainmode-ivmode[:ivopts]
or
capi:cipher_api_spec-ivmode[:ivopts]
I had zero idea what these were. And outside of the examples, it’s not even clear what values are permissible in each of the slots. Is it clear I probably shouldn’t even be touching crypto code? But where’s the fun in that?! How else does one learn? Luckily, Wikipedia has my back with a fantastic article on the subject.
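The slots can at least be pulled apart mechanically. The sketch below splits the non-`capi:` form the way the spec string reads: the first `-` ends `cipher[:keycount]`, the second ends the chainmode, and everything after the first `:` in the remainder is `ivopts`. It is an illustration, not dm-crypt’s actual parser; `parse_spec` and `struct cipher_spec` are hypothetical names, and the sketch simply rejects `capi:` specs and strings with no `-` at all.

```c
#include <assert.h>
#include <string.h>

/* Fields of the legacy form: cipher[:keycount]-chainmode-ivmode[:ivopts] */
struct cipher_spec {
    char cipher[32], keycount[8], chainmode[32], ivmode[32], ivopts[64];
};

/* Copy len bytes of src into dst (capacity cap) and NUL-terminate,
 * truncating if the field is too long. */
static void copy_field(char *dst, size_t cap, const char *src, size_t len)
{
    if (len >= cap)
        len = cap - 1;
    memcpy(dst, src, len);
    dst[len] = '\0';
}

/* Returns 0 on success, -1 for "capi:" specs or a spec with no '-'. */
static int parse_spec(const char *spec, struct cipher_spec *out)
{
    memset(out, 0, sizeof(*out));
    if (strncmp(spec, "capi:", 5) == 0)
        return -1; /* crypto API form: not handled in this sketch */

    const char *d1 = strchr(spec, '-'); /* ends cipher[:keycount] */
    if (d1 == NULL)
        return -1;
    const char *d2 = strchr(d1 + 1, '-'); /* ends chainmode, if present */
    const char *chain_end = d2 ? d2 : d1 + strlen(d1);

    const char *c = memchr(spec, ':', (size_t)(d1 - spec));
    if (c != NULL) {
        copy_field(out->cipher, sizeof out->cipher, spec, (size_t)(c - spec));
        copy_field(out->keycount, sizeof out->keycount, c + 1, (size_t)(d1 - c - 1));
    } else {
        copy_field(out->cipher, sizeof out->cipher, spec, (size_t)(d1 - spec));
    }
    copy_field(out->chainmode, sizeof out->chainmode, d1 + 1,
               (size_t)(chain_end - d1 - 1));

    if (d2 != NULL) {
        const char *iv = d2 + 1;
        const char *ic = strchr(iv, ':'); /* ivopts is everything after it */
        size_t ivlen = ic ? (size_t)(ic - iv) : strlen(iv);
        copy_field(out->ivmode, sizeof out->ivmode, iv, ivlen);
        if (ic != NULL)
            copy_field(out->ivopts, sizeof out->ivopts, ic + 1, strlen(ic + 1));
    }
    return 0;
}
```

For example, `aes-cbc-essiv:sha256` splits into cipher `aes`, chainmode `cbc`, ivmode `essiv` and ivopts `sha256`, while a spec like `revaes-ecb` leaves both IV slots empty.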
Here’s the short explanation. Naively, when you have a block cipher and want to encrypt lots of data, you can just chunk the data into blocks, encrypt each one, and call it a day. But any two identical plaintext blocks encrypt to the same ciphertext block, which reveals information about the plaintext. So the idea was generalized into cipher modes and initialization vectors (IVs). A cipher mode takes a block cipher and an optional mode-specific argument (the IV) and produces a function that can encrypt more than one block.
Here is the relevant diagram for CBC, the default cipher mode on my system.
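To make the two modes concrete, here is a minimal sketch of ECB and CBC over 64-bit blocks. It assumes a hypothetical, deliberately insecure toy Feistel cipher in place of AES (so blocks are 8 bytes, not 16) and a caller-supplied IV; the function names are illustrative, not the kernel’s.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy 64-bit block cipher (4-round Feistel). NOT secure. */
static uint32_t toy_round(uint32_t half, uint32_t k)
{
    uint32_t x = (half ^ k) * 0x9E3779B1u;
    return x ^ (x >> 15);
}

static uint64_t toy_encrypt(uint64_t key, uint64_t block)
{
    uint32_t l = (uint32_t)(block >> 32), r = (uint32_t)block;
    for (int i = 0; i < 4; i++) {
        uint32_t t = l ^ toy_round(r, (uint32_t)(key >> (8 * i)));
        l = r;
        r = t;
    }
    return ((uint64_t)l << 32) | r;
}

static uint64_t toy_decrypt(uint64_t key, uint64_t block)
{
    uint32_t l = (uint32_t)(block >> 32), r = (uint32_t)block;
    for (int i = 3; i >= 0; i--) {
        uint32_t t = r ^ toy_round(l, (uint32_t)(key >> (8 * i)));
        r = l;
        l = t;
    }
    return ((uint64_t)l << 32) | r;
}

/* ECB: every block is encrypted independently. */
static void ecb_encrypt(uint64_t key, const uint64_t *in, uint64_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = toy_encrypt(key, in[i]);
}

/* CBC encrypt: C_i = E(P_i ^ C_{i-1}), with C_{-1} = iv. */
static void cbc_encrypt(uint64_t key, uint64_t iv, const uint64_t *in,
                        uint64_t *out, size_t n)
{
    uint64_t prev = iv;
    for (size_t i = 0; i < n; i++)
        prev = out[i] = toy_encrypt(key, in[i] ^ prev);
}

/* CBC decrypt: P_i = D(C_i) ^ C_{i-1}, with C_{-1} = iv. */
static void cbc_decrypt(uint64_t key, uint64_t iv, const uint64_t *in,
                        uint64_t *out, size_t n)
{
    uint64_t prev = iv;
    for (size_t i = 0; i < n; i++) {
        uint64_t c = in[i]; /* save first so in-place (in == out) works */
        out[i] = toy_decrypt(key, c) ^ prev;
        prev = c;
    }
}
```

In ECB a repeated plaintext block always shows up as a repeated ciphertext block. In CBC the second copy is encrypted as E(P XOR C_0) rather than E(P XOR IV), so it only repeats if C_0 happens to equal the IV.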
So just because I reversed the block cipher (`aes`) doesn’t mean I reversed the cipher mode. If we look at the bottom diagram and swap block decryption for block encryption, it doesn’t suddenly become the encryption diagram, which is what we wanted. In fact, if we were to reverse the cipher mode, we wouldn’t even need to reverse the block cipher! Well, that was a waste. Are there by chance any cipher modes that are actually reversible by only reversing the block cipher? YES! The naive algorithm, which is called ECB. It looks like it would be a terrible idea to actually use this in prod, but does it work?
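Here is a sketch of why ECB survives the swap and CBC does not, again using a hypothetical, insecure toy cipher (`toy_*`) and its hypothetical reversed twin (`revtoy_*`). It assumes, as the whole approach does, that reading from a `dm-crypt` mapping runs the mode’s decrypt direction over what is on disk. With `ecb(revtoy)` that read is exactly `ecb(toy)` encryption. With `cbc(revtoy)` the read computes E(C_i) XOR C_{i-1}, which is not the CBC encryption formula E(P_i XOR C_{i-1}); getting the disk data back takes the CBC *encryption* structure with the reversed cipher.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy 64-bit block cipher (4-round Feistel). NOT secure. */
static uint32_t toy_round(uint32_t half, uint32_t k)
{
    uint32_t x = (half ^ k) * 0x9E3779B1u;
    return x ^ (x >> 15);
}

static uint64_t toy_encrypt(uint64_t key, uint64_t block)
{
    uint32_t l = (uint32_t)(block >> 32), r = (uint32_t)block;
    for (int i = 0; i < 4; i++) {
        uint32_t t = l ^ toy_round(r, (uint32_t)(key >> (8 * i)));
        l = r;
        r = t;
    }
    return ((uint64_t)l << 32) | r;
}

static uint64_t toy_decrypt(uint64_t key, uint64_t block)
{
    uint32_t l = (uint32_t)(block >> 32), r = (uint32_t)block;
    for (int i = 3; i >= 0; i--) {
        uint32_t t = r ^ toy_round(l, (uint32_t)(key >> (8 * i)));
        r = l;
        l = t;
    }
    return ((uint64_t)l << 32) | r;
}

/* Hypothetical reversed cipher: same primitive, callbacks swapped. */
static uint64_t revtoy_encrypt(uint64_t k, uint64_t b) { return toy_decrypt(k, b); }
static uint64_t revtoy_decrypt(uint64_t k, uint64_t b) { return toy_encrypt(k, b); }

typedef uint64_t (*block_fn)(uint64_t key, uint64_t block);

/* ECB in either direction: apply the block function to every block. */
static void ecb_apply(block_fn f, uint64_t key, const uint64_t *in,
                      uint64_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = f(key, in[i]);
}

/* CBC encrypt structure: out_i = f(in_i ^ out_{i-1}), out_{-1} = iv. */
static void cbc_enc(block_fn f, uint64_t key, uint64_t iv, const uint64_t *in,
                    uint64_t *out, size_t n)
{
    uint64_t prev = iv;
    for (size_t i = 0; i < n; i++)
        prev = out[i] = f(key, in[i] ^ prev);
}

/* CBC decrypt structure: out_i = f(in_i) ^ in_{i-1}, in_{-1} = iv. */
static void cbc_dec(block_fn f, uint64_t key, uint64_t iv, const uint64_t *in,
                    uint64_t *out, size_t n)
{
    uint64_t prev = iv;
    for (size_t i = 0; i < n; i++) {
        uint64_t c = in[i];
        out[i] = f(key, c) ^ prev;
        prev = c;
    }
}
```

So with ECB the backup agent reads ordinary ECB ciphertext. With CBC it reads something that only the same reversed mapping can turn back into the original data.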
Neat! Now let’s create a `revcbc` from [cbc.c](https://github.com/torvalds/linux/blob/b25c6644bfd3affd7d0127ce95c5c96c155a7515/crypto/cbc.c) and try that.
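Reversing the mode instead means swapping the template’s directions, so a `revcbc` built on the plain cipher decrypts by running ordinary CBC encryption. A minimal userspace sketch, assuming a hypothetical insecure toy cipher in place of AES and treating `iv` as whatever per-sector IV the mapping supplies; `revcbc_encrypt`/`revcbc_decrypt` are illustrative names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy 64-bit block cipher (4-round Feistel). NOT secure. */
static uint32_t toy_round(uint32_t half, uint32_t k)
{
    uint32_t x = (half ^ k) * 0x9E3779B1u;
    return x ^ (x >> 15);
}

static uint64_t toy_encrypt(uint64_t key, uint64_t block)
{
    uint32_t l = (uint32_t)(block >> 32), r = (uint32_t)block;
    for (int i = 0; i < 4; i++) {
        uint32_t t = l ^ toy_round(r, (uint32_t)(key >> (8 * i)));
        l = r;
        r = t;
    }
    return ((uint64_t)l << 32) | r;
}

static uint64_t toy_decrypt(uint64_t key, uint64_t block)
{
    uint32_t l = (uint32_t)(block >> 32), r = (uint32_t)block;
    for (int i = 3; i >= 0; i--) {
        uint32_t t = r ^ toy_round(l, (uint32_t)(key >> (8 * i)));
        r = l;
        l = t;
    }
    return ((uint64_t)l << 32) | r;
}

/* Ordinary CBC: C_i = E(P_i ^ C_{i-1}); P_i = D(C_i) ^ C_{i-1}. */
static void cbc_encrypt(uint64_t key, uint64_t iv, const uint64_t *in,
                        uint64_t *out, size_t n)
{
    uint64_t prev = iv;
    for (size_t i = 0; i < n; i++)
        prev = out[i] = toy_encrypt(key, in[i] ^ prev);
}

static void cbc_decrypt(uint64_t key, uint64_t iv, const uint64_t *in,
                        uint64_t *out, size_t n)
{
    uint64_t prev = iv;
    for (size_t i = 0; i < n; i++) {
        uint64_t c = in[i];
        out[i] = toy_decrypt(key, c) ^ prev;
        prev = c;
    }
}

/* Hypothetical revcbc: the CBC template with its two directions swapped. */
static void revcbc_encrypt(uint64_t key, uint64_t iv, const uint64_t *in,
                           uint64_t *out, size_t n)
{
    cbc_decrypt(key, iv, in, out, n);
}

static void revcbc_decrypt(uint64_t key, uint64_t iv, const uint64_t *in,
                           uint64_t *out, size_t n)
{
    cbc_encrypt(key, iv, in, out, n);
}
```

Reads through the mapping then produce standard CBC ciphertext, and an ordinary CBC decrypt with the same key and IVs gets the data back, with no reversed block cipher involved.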
Add `revcbc.o` to our `Makefile` and let’s try again.
Now we actually have something workable! Let me just deploy this to prod real quick.
Why All This Was Unnecessary⌗
Well, it turns out that there are these neat little things called stream ciphers. Rather than running the plaintext through the cipher, they generate a pseudorandom keystream and XOR it with the plaintext. Technically speaking, `dm-crypt` doesn’t actually support any stream ciphers, but it does support two cipher modes, `OFB` and `CTR`, which make a stream cipher out of any block cipher by repeatedly encrypting an IV or a counter to produce the keystream. Stream ciphers are perfect for this application because `encryption == decryption`. If you take some data and XOR it with the same thing twice, you get your original data back!
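A minimal CTR sketch shows why there is nothing left to reverse: one function serves as both encrypt and decrypt, and the block cipher only ever runs forward. It assumes a hypothetical insecure toy cipher in place of AES and a plain 64-bit counter starting at a caller-chosen `ctr0`; in `dm-crypt` the starting counter block would come from the IV instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy 64-bit block cipher (4-round Feistel). NOT secure.
 * CTR only ever needs the encrypt direction. */
static uint32_t toy_round(uint32_t half, uint32_t k)
{
    uint32_t x = (half ^ k) * 0x9E3779B1u;
    return x ^ (x >> 15);
}

static uint64_t toy_encrypt(uint64_t key, uint64_t block)
{
    uint32_t l = (uint32_t)(block >> 32), r = (uint32_t)block;
    for (int i = 0; i < 4; i++) {
        uint32_t t = l ^ toy_round(r, (uint32_t)(key >> (8 * i)));
        l = r;
        r = t;
    }
    return ((uint64_t)l << 32) | r;
}

/* CTR: out_i = in_i ^ E(ctr0 + i). The same call encrypts and decrypts. */
static void ctr_crypt(uint64_t key, uint64_t ctr0, const uint64_t *in,
                      uint64_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] ^ toy_encrypt(key, ctr0 + i);
}
```

Running `ctr_crypt` twice with the same key and counter XORs the same keystream in and back out, so whichever direction the kernel calls on reads, the result is a valid ciphertext.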
Here’s the diagram for `OFB`.
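The OFB version of the same idea, as a sketch under the same assumptions (hypothetical insecure toy cipher, caller-supplied IV): the keystream is the IV encrypted over and over, and applying it twice cancels out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy 64-bit block cipher (4-round Feistel). NOT secure. */
static uint32_t toy_round(uint32_t half, uint32_t k)
{
    uint32_t x = (half ^ k) * 0x9E3779B1u;
    return x ^ (x >> 15);
}

static uint64_t toy_encrypt(uint64_t key, uint64_t block)
{
    uint32_t l = (uint32_t)(block >> 32), r = (uint32_t)block;
    for (int i = 0; i < 4; i++) {
        uint32_t t = l ^ toy_round(r, (uint32_t)(key >> (8 * i)));
        l = r;
        r = t;
    }
    return ((uint64_t)l << 32) | r;
}

/* OFB: O_1 = E(iv), O_{i+1} = E(O_i); out_i = in_i ^ O_i.
 * The keystream never depends on the data, so one call does both ways. */
static void ofb_crypt(uint64_t key, uint64_t iv, const uint64_t *in,
                      uint64_t *out, size_t n)
{
    uint64_t o = iv;
    for (size_t i = 0; i < n; i++) {
        o = toy_encrypt(key, o);
        out[i] = in[i] ^ o;
    }
}
```

Unlike CBC, the data never feeds back into the block cipher, which is exactly why swapping directions is a no-op here.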
Final Thoughts⌗
This was a lot of fun! It really scratches that itch of making something a little silly work. I can actually imagine the stream-cipher solution being used in prod, but I doubt my hacked-together kernel module will make the cut.