Podcast: Storage functionality in Kubernetes 1.31

In this podcast, we look at storage features expected in Kubernetes 1.31 with Sergey Pronin, group products manager at Percona, which develops open source products for SQL and NoSQL databases.

Pronin talks about storage functionality expected in this week’s 1.31 release, but also what he sees as some of the gaps in terms of storage for databases and more generally for storage in Kubernetes. He also discusses compliance and security gaps he thinks still need to be addressed in Kubernetes.

How would you sum up the forthcoming additions to Kubernetes that are of interest to people who deal with data storage?

Sergey Pronin: I don’t believe there are lots of storage-related improvements because the 1.31 release was heavily focused on a major removal of legacy code. It’s like 1.5 million lines of code removed from the core code base. That code was created mostly for legacy container storage interfaces (CSIs) built by various cloud providers, which have since moved to a plugin structure. That was the main focus of this release.

There are some storage improvements. I believe the biggest one and the most interesting for me is volume attributes class. It allows users to modify existing volumes on-the-fly, like if you want to change the number of IOPS of the volume – you know how on Amazon you have EBS volumes and they have IOPS. Previously, to do it in Kubernetes, you would create a new storage class and then migrate your application to this new storage volume.

It was quite the process. Now, through Kubernetes, you can just change the IOPS for a specific volume and that’s it. This feature was in alpha, and in 1.31 it graduates to beta, so it’s getting closer to the GA, or stable, version.
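As a rough illustration of the workflow described above, the following Python sketch builds the two objects involved: a volume attributes class and a patch that points an existing persistent volume claim at it. The class name "gold-iops", the claim name "data-pvc" and the IOPS value are hypothetical, and the "iops" parameter key is an assumption about what the Amazon EBS CSI driver accepts, so check the driver's documentation before relying on it.

```python
import json


def volume_attributes_class(name, driver, parameters):
    """Build a VolumeAttributesClass manifest (beta API in Kubernetes 1.31)."""
    return {
        "apiVersion": "storage.k8s.io/v1beta1",
        "kind": "VolumeAttributesClass",
        "metadata": {"name": name},
        "driverName": driver,
        # Parameter keys are driver-specific; values must be strings.
        "parameters": dict(parameters),
    }


def pvc_class_patch(class_name):
    """Build a merge patch that points an existing PVC at another class."""
    return {"spec": {"volumeAttributesClassName": class_name}}


# Hypothetical names and values, for illustration only.
gold = volume_attributes_class("gold-iops", "ebs.csi.aws.com", {"iops": "6000"})
patch = pvc_class_patch(gold["metadata"]["name"])

print(json.dumps(gold, indent=2))
print(json.dumps(patch))
```

The first document could be applied with kubectl apply, and the second passed to something like kubectl patch pvc data-pvc --type merge -p '<patch>'. Because the feature is still beta, the feature gate and the beta API group may need to be enabled on the cluster.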

That’s one major storage characteristic that’s changing. Are there any others in 1.31?

There are some additions to persistent volume status. In 1.31, there was a new “last phase transition time” field [status.lastPhaseTransitionTime] added to the status of persistent volumes.

This allows you to measure the time between the various phases of a persistent volume. It can be pending, it can be bound, it can be in error, and so on. And now that this last phase transition time is recorded, cluster administrators can use it to measure various service-level objectives and so on much more easily.
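One way an administrator might use that timestamp is sketched below in Python. It works on persistent volume objects already loaded as dictionaries (for example, from kubectl get pv -o json) and flags volumes that took longer than a target to reach the bound phase. The sample volumes, the 30-second target and the choice to measure from creation time to the last transition are all assumptions made for illustration.

```python
from datetime import datetime, timezone

TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # timestamp layout used by the Kubernetes API


def parse_time(value):
    """Parse a Kubernetes timestamp into an aware UTC datetime."""
    return datetime.strptime(value, TIME_FORMAT).replace(tzinfo=timezone.utc)


def seconds_to_current_phase(pv):
    """Seconds from PV creation to its most recent phase transition."""
    created = parse_time(pv["metadata"]["creationTimestamp"])
    changed = parse_time(pv["status"]["lastPhaseTransitionTime"])
    return (changed - created).total_seconds()


def slow_binds(pvs, slo_seconds):
    """Names of Bound volumes whose last transition exceeded the target."""
    return [
        pv["metadata"]["name"]
        for pv in pvs
        if pv["status"]["phase"] == "Bound"
        and seconds_to_current_phase(pv) > slo_seconds
    ]


# Hypothetical sample data, shaped like the items in `kubectl get pv -o json`.
sample = [
    {"metadata": {"name": "pv-a", "creationTimestamp": "2024-08-13T10:00:00Z"},
     "status": {"phase": "Bound",
                "lastPhaseTransitionTime": "2024-08-13T10:00:12Z"}},
    {"metadata": {"name": "pv-b", "creationTimestamp": "2024-08-13T10:00:00Z"},
     "status": {"phase": "Bound",
                "lastPhaseTransitionTime": "2024-08-13T10:01:30Z"}},
]

print(slow_binds(sample, slo_seconds=30))
```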

Again, it’s not a huge improvement, but it’s definitely something the community has been waiting on for quite some time, especially cluster administrators. Persistent volumes are maturing in the Kubernetes environment, and something you would expect from day zero was not there. Now it’s added, so it’s a good thing.

Are there any other additions that you would group with these?

I have some, but they’re not really major and they’re not in GA, so I don’t think it’s worth mentioning those.

What do you think are the remaining challenges in Kubernetes for people who want to administer storage?

I think one of the issues I see lies in the realm of automated scaling and storage. Historically, Kubernetes was designed as a tool to remove toil from administrators, and for various compute resources like CPU or RAM, it is quite easy to implement automated scaling for those.

If you see that you reach a certain threshold, you can either add more nodes into the picture or you can perform vertical scaling by adding more CPU resources or RAM to the container.

But, for storage, it’s not really the case. Whereas, if you look at most of the cloud providers – I mean public cloud providers like Amazon RDS or Aurora [databases], for example – they have automated storage scaling from day zero, and it’s just super surprising for me that there is nothing like that in Kubernetes as of now.
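To make the gap concrete, here is a minimal Python sketch of the kind of threshold rule a storage autoscaler might apply. It is not an existing Kubernetes feature or any project's implementation. The 80% threshold, the 1.5x growth factor and the function name are hypothetical, and the sketch assumes the storage class allows volume expansion so that the returned size could be written back to the claim's storage request.

```python
import math

GIB = 1024 ** 3


def next_pvc_size(capacity_bytes, used_bytes, threshold=0.8, growth=1.5,
                  max_bytes=None):
    """Return a new requested size in bytes, or None if no change is needed.

    Hypothetical policy: once usage reaches `threshold` of capacity, grow the
    volume by `growth`, rounded up to a whole GiB and capped at `max_bytes`.
    """
    if capacity_bytes <= 0:
        raise ValueError("capacity_bytes must be positive")
    if used_bytes / capacity_bytes < threshold:
        return None  # below the threshold, nothing to do
    target = math.ceil(capacity_bytes * growth / GIB) * GIB
    if max_bytes is not None:
        target = min(target, max_bytes)
    # Volumes can only grow, so ignore targets that do not increase the size.
    return target if target > capacity_bytes else None


print(next_pvc_size(100 * GIB, 50 * GIB))
print(next_pvc_size(100 * GIB, 85 * GIB) // GIB)
```

A real controller would also need usage metrics, rate limiting between resizes and handling for drivers that cannot expand volumes online, which is part of why the problem is harder than compute scaling.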

There are some ad hoc solutions developed by companies, but they are either very limited or they are not maintained any longer. It’s more like, “Hey, I created a POC. Now, community, go figure it out!”

And for me as a developer of various [Kubernetes] Operators for databases, I definitely want to provide the same level of user experience to my users in Kubernetes, because sometimes they think, “OK, if I move from this nice Aurora from Amazon to Operators, what are the trade-offs I’m going to make?” This is one of those.

Are there any developments in Kubernetes that head towards this, or is there just nothing?

There are always some activities going on in various fields in Kubernetes, but unfortunately, there are just discussions as of now. I haven’t seen any single line of code created for that.

Also, I’m not 100% sure it should be driven by the Kubernetes community. It could be something in the CNCF ecosystem, like the KEDA project, for example.

KEDA is Kubernetes Event-driven Autoscaling. It came up through the CNCF’s incubation process, and they do compute scaling quite successfully. So, I would think, why not add storage? We discussed it with them some time ago, but it hasn’t moved anywhere yet.

Are there any other major areas you think are yet to be solved in Kubernetes with regard to storage?

I think overall standardisation across how various Operators interact with storage would definitely help. But again, I don’t believe it’s something the Kubernetes community should be solving. It should be a wider community, involving various SIGs, because again, if I look at how various Kubernetes Operators or how various Kubernetes projects interact with storage, the majority of them use StatefulSets, while some of them create Deployments and mount PVCs.
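The two approaches mentioned above differ mainly in where the claim comes from, which the following Python sketch shows by building minimal manifests as dictionaries. A StatefulSet declares a claim template and gets one claim per replica, while a Deployment mounts a claim that must already exist. The application name, image and claim name are hypothetical, and the manifests are trimmed to the storage-related fields.

```python
def pod_template(app, image, volumes=None):
    """Pod template that mounts a volume named 'data' at /data."""
    spec = {
        "containers": [{
            "name": app,
            "image": image,
            "volumeMounts": [{"name": "data", "mountPath": "/data"}],
        }],
    }
    if volumes:
        spec["volumes"] = volumes
    return {"metadata": {"labels": {"app": app}}, "spec": spec}


def statefulset(app, image, replicas, size):
    """StatefulSet: each replica gets its own PVC from the claim template."""
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": app},
        "spec": {
            "serviceName": app,
            "replicas": replicas,
            "selector": {"matchLabels": {"app": app}},
            "template": pod_template(app, image),
            "volumeClaimTemplates": [{
                "metadata": {"name": "data"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "resources": {"requests": {"storage": size}},
                },
            }],
        },
    }


def deployment(app, image, claim_name):
    """Deployment: the pod mounts a PVC that was created separately."""
    volumes = [{"name": "data",
                "persistentVolumeClaim": {"claimName": claim_name}}]
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": app},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": {"app": app}},
            "template": pod_template(app, image, volumes),
        },
    }


# Hypothetical names, for illustration only.
sts = statefulset("db", "example/db:1.0", replicas=3, size="10Gi")
dep = deployment("db", "example/db:1.0", claim_name="db-data")
```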

So, from a technical standpoint, it’s very different, and the reason for that is the underlying technology that the application powers. It can be a MySQL database or a MongoDB database, and for those you might want to play with storage a bit differently.

But the end result you should be getting is just stability. Your storage should be available all the time, your data should be consistent and you should be able to inspire confidence in users that if they run something related to storage in Kubernetes, it’s just going to work. There’s no voodoo magic to it.

Being in this field for quite some time, I still feel that we have not reached this point where companies, enterprises would be confident saying, “Oh, yes, running databases in Kubernetes is for us. We believe it’s the way forward.” There are still a lot of questions [like] how stable it is, how robust the solutions are and what are the trade-offs that they would be making?

Is it still considered that storage in Kubernetes is quite complex? Is that what you’re saying?

Yes, well, I would say that around three or four years ago, running databases in Kubernetes was a greenfield thing. Whereas back then only some huge enterprise would be brave enough to run databases in K8s [Kubernetes], now we see that storage in Kubernetes overall, Operators and other tools in the CNCF ecosystem are maturing to support it.

But the result is that once enterprises start looking into data on Kubernetes, they want to apply their existing thought process [about] how databases should look. So, they run something on VMs today and they have LDAP integration, various encryption levels, standards and so on, and they are trying to project those onto databases on Kubernetes, where they are not there yet.

There is still some missing functionality which enterprises would believe, “Oh, that should be there on day zero. Why is it not there?” But we’re slowly getting there. We’re slowly catching up, and I believe that we covered the stability and performance aspects, so right now, I would not see any issues for anybody with SLAs or with uptime if they run their databases in K8s.

But for security and compliance, there are definitely some gaps, or maybe even some tricky features. I described this auto-scaling thing. [It’s] still not there [and] someone would expect it to be there right away. So, yes, I think we’re getting better and better year-over-year, but there is still a lot of work to do.

What do you consider to be the gaps in terms of compliance and security?

Data-at-rest encryption for databases. For some, you would see it’s available. For some, like PostgreSQL, it’s still more like a desire. It’s not available in the community version of PostgreSQL. [It’s] just in some enterprise flavours like EnterpriseDB, for example. They have it. They forked it, and so on.

It’s similar for backups – how you address business continuity overall. Do you encrypt your backups, how they are stored and so on?

Most Operators have already resolved that, but, for example, things like disaster recovery, where you want to run your database across different datacentres and have an automated failover within your SLAs, well, it’s not there yet.
