Introduction
In this post we are going to see how to solve the following error, which we faced while running MySQL databases with the AWS EFS service for storage:
ERROR 1030 (HY000) at line 1744: Got error 168 from storage engine
Starting situation
Our use case was an AWS EKS (Kubernetes) cluster in which we needed to create and destroy development environments as quickly as possible. Each of these environments had to contain several MySQL databases, in addition to other tools such as Redis, RabbitMQ, etc.
To do this we decided to use Helm Charts that deployed these tools as StatefulSets. At first we did not specify any StorageClass, so the EKS default was used, which provisions EBS gp2 (General Purpose SSD) volumes when PersistentVolumeClaims are created.
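To make this concrete, the StatefulSets looked roughly like the following. This is a minimal sketch rather than our actual chart (the names, image, and sizes are illustrative); the relevant detail is that volumeClaimTemplates omits storageClassName, which is what makes Kubernetes fall back to the cluster default:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: mysql
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
        - name: mysql
          image: mysql:5.7
          volumeMounts:
            - name: data
              mountPath: /var/lib/mysql
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        # No storageClassName: the default StorageClass (gp2 on EKS)
        # is used, so an EBS volume is provisioned for each replica.
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi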
The problem with using EBS volumes is that each volume lives in a specific Availability Zone, so if the Kubernetes cluster needs to move a pod, for example the MySQL one, it can only reschedule it onto a worker node in the same Availability Zone as that EBS volume.
For this reason, we decided to use a new StorageClass that uses EFS instead of EBS. This way, with the EFS file system mounted on every worker node in the Kubernetes cluster, the pods could move seamlessly between nodes, even if they were in different Availability Zones.
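A StorageClass along these lines does the job. This is a sketch assuming the AWS EFS CSI driver is installed in the cluster (other EFS provisioners exist, and their parameters differ); fs-12345678 is a placeholder for the actual EFS file system ID:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: efs-sc
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-ap      # dynamic provisioning via EFS access points
  fileSystemId: fs-12345678     # placeholder: your EFS file system ID
  directoryPerms: "700"

The PersistentVolumeClaims then simply reference storageClassName: efs-sc, and since EFS is a shared file system they can use the ReadWriteMany access mode.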
Error with EFS
Since our use case required creating new environments as quickly as possible, when bringing up a new MySQL database we also imported data from SQL files, so that each environment started with a default initial data set.
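The import itself was nothing exotic; it was essentially a batch execution of a dump file against the freshly created database, along these lines (a hedged example: the host, credentials, and file name here are made up):

mysql -h mysql-0.mysql -u root -p"$MYSQL_ROOT_PASSWORD" mydatabase < initial-data.sql

The "at line 1744" part of the error message below refers to the line of the SQL file being read in this batch mode.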
It was at this point that we began to detect the error: when importing this initial data, we repeatedly saw the following error in the logs:
ERROR 1030 (HY000) at line 1744: Got error 168 from storage engine
ERROR 1030 (HY000) at line 1744: Got error 168 from storage engine
ERROR 1030 (HY000) at line 1744: Got error 168 from storage engine
...
The error started appearing after a certain number of SQL statements had been executed, and from that moment on it was repeated until the end of the SQL file.
Resolution
Upon investigation, we discovered that the problem was a limit of the EFS service itself: specifically, the limit of 256 unique locked files. We can see this limit among the client-specific quotas described in the AWS documentation:
https://docs.aws.amazon.com/efs/latest/ug/limits.html#limits-client-specific
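As an aside, a quick way to see how many file locks a worker node is actually holding is to read the kernel's lock table directly on that node. This is a rough check (it counts every lock on the node, not just those on the EFS mount), and it assumes shell access to the worker:

# Show the active file locks known to the kernel...
cat /proc/locks
# ...and count them.
wc -l /proc/locks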
Because our data import was trying to create more than 256 tables, this limit was reached and the error began to appear. The limit itself cannot be raised, but we were able to work around it by adjusting MySQL's configuration so that, as far as possible, it never holds 256 locked files at once.
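A quick way to check whether a schema is anywhere near that figure is to count its InnoDB tables, since with file-per-table enabled each of them maps to its own data file:

-- Count InnoDB tables per schema; with innodb_file_per_table
-- enabled, each of these corresponds to a separate .ibd file.
SELECT table_schema, COUNT(*) AS table_count
FROM information_schema.tables
WHERE engine = 'InnoDB'
GROUP BY table_schema;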
The MySQL parameter to modify is innodb_file_per_table. In our MySQL databases this parameter was enabled by default. It causes a separate .ibd data file to be created for each table, instead of storing all tables in the single shared system tablespace. We can disable it in the MySQL configuration file as follows:
[mysqld]
innodb_file_per_table=OFF
https://dev.mysql.com/doc/refman/5.7/en/innodb-file-per-table-tablespaces.html
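The setting can also be inspected and changed at runtime, without restarting the server. Keep in mind that changing it only affects tables created afterwards; existing tables keep whatever tablespace they were created in:

-- Check the current value.
SHOW VARIABLES LIKE 'innodb_file_per_table';

-- Disable it for tables created from this point on.
SET GLOBAL innodb_file_per_table = OFF;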
After disabling this parameter, we did not encounter the error again.
We want to thank the ops.tips blog for the post linked below, since it helped us a lot to understand this error and find this solution:
https://ops.tips/blog/limits-aws-efs-nfs-locks/
Conclusions
When working with MySQL databases on EFS storage, we can run into errors upon reaching the limit of 256 locked files whenever we have schemas with a large number of tables or, more generally, with any system that needs to lock a large number of files simultaneously, as could be the case with MongoDB, Oracle, etc.
To avoid reaching this limit, we can disable the MySQL parameter innodb_file_per_table so that a separate data file is not created for each table.
I hope you’ve enjoyed this post and I encourage you to check our blog for other posts that you might find helpful. Do not hesitate to contact us if you would like us to help you on your projects.
See you on the next post!