Introduction

It all started with a simple message: “Hello Geko, we are receiving a DB connection timeout”. It took less than 2 minutes to discover what was going on: the database had been deleted. Many facepalms later, the main task was to recover the data (and the service). Fortunately, Google enforces a backup policy on Cloud SQL that copies the database once per day, a policy we at Geko fully agree with. So we went to look for the backups, and the nightmare began. There were no backups! There was nothing! That’s when we found out that backups are strongly bound to the database resource, so if the database gets deleted, its backups are deleted too. Google states this very clearly in the Cloud SQL docs:

Warning: All data on an instance, including backups, is permanently lost when that instance is deleted. To preserve your data, export it to Cloud Storage before you delete it. The Cloud SQL Admin role includes the permission to delete the instance. To prevent accidental deletion, grant this role only as needed.

We are not going to go deeper in this post into how the database got deleted. Suffice it to say that it was an automated process which detected a disk size increase, and shrinking back to the previous, smaller size required replacing the entire database instance. There was no opportunity to approve it or stop it, so the situation was what it was.

What we did to resolve the situation

Even though Google was telling us the backups had been deleted along with the database, and all the facts pointed to that hypothesis, we were stubborn and didn’t give up. The GCP web UI gave us no chance to recover, so we decided to keep digging with the CLI (gcloud). And that was finally the key to our success!

Our theory was that the backups should still be somewhere, even though the docs say they shouldn’t, so we checked different kinds of storage until we got to the SQL backups section. It turned out we were quick enough when we checked this section, and we also knew the name of the deleted instance, so we were able to run the following command.

$ gcloud sql backups list --instance=deleted-db-name --project our-project
ID             WINDOW_START_TIME              ERROR  STATUS
1614876500000  2021-03-04T04:00:00.000+00:00  -      SUCCESSFUL
1614765400000  2021-03-03T04:00:00.000+00:00  -      SUCCESSFUL
1614654300000  2021-03-02T04:00:00.000+00:00  -      SUCCESSFUL
1614543200000  2021-03-01T04:00:00.000+00:00  -      SUCCESSFUL
1614432100000  2021-02-28T04:00:00.000+00:00  -      SUCCESSFUL
1614321000000  2021-02-27T04:00:00.000+00:00  -      SUCCESSFUL
1614210000000  2021-02-26T04:00:00.000+00:00  -      SUCCESSFUL

And there they still were! After catching our breath and getting our smiles back, the recovery process started. We still didn’t fully trust this was going to work, since maybe the backup list was there but the data behind it was gone, but we had to give it a chance. We moved fast, knowing time was playing against us: we immediately created a new database instance from scratch and, right after, started the restore process.
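For reference, creating the empty instance looked roughly like this (a minimal sketch; the database version, tier and region shown here are illustrative, not our exact values):

$ gcloud sql instances create new-db-from-scratch-name --project our-project \
    --database-version=POSTGRES_12 --tier=db-custom-2-7680 --region=europe-west1

And just after that, the restore itself: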

$ gcloud sql backups restore 1614876500000 --restore-instance=new-db-from-scratch-name --project our-project --backup-instance=deleted-db-name
All current data on the instance will be lost when the backup is 
restored.

Do you want to continue (Y/n)?  

Restoring Cloud SQL instance...done.                                                                                                                                                                                                        
Restored [https://sqladmin.googleapis.com/sql/v1beta4/projects/our-project/instances/new-db-from-scratch-name]

Finally, even with positive feedback from GCP, we still didn’t fully believe this had worked. We needed to verify all the data was there, so we did. Fortunately again, everything had been recovered. Our next and final step was to perform a SQL dump to make sure we had a recent copy in another location.
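That final dump can be done with Cloud SQL’s export feature; a minimal sketch (the bucket and database names here are made up) would be:

$ gcloud sql export sql new-db-from-scratch-name gs://our-offsite-backups/recovered-db.sql.gz \
    --project our-project --database=our-database

Note that the instance’s service account needs write access to the target bucket for the export to succeed.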

Conclusion

Even after searching Google in depth and finding no helpful results (they all say there’s nothing you can do), our knowledge and persistence kept us digging into the topic until we found a way. We know we were lucky that the backups were still there, but we also know we were quick, methodical and obstinate in detecting the problem and looking for a fix. On the other hand, we have learnt that we cannot rely only on the database backups kept by the provider, so we’re currently working on procedures to keep backups in more places. We are quite sure this is the first and last time we will have to deal with this situation.

Moreover, and in contrast to AWS, as we mentioned earlier GCP strongly ties the database resource to its backups, which has proven to be a huge problem when dealing with accidental deletion. Additionally, it’s not possible to copy Cloud SQL backups to any other kind of storage. Nevertheless, there are custom solutions, consisting of regularly dumping the database and storing the dump in a bucket, although they are nothing official; a sketch of such a scheduled dump follows.
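As an illustration of such a custom solution (just a sketch; the schedule, instance, database and bucket names are made up), a cron entry on a small management machine could export the instance to a bucket every night:

# Runs at 03:00 every day; note that % must be escaped inside a crontab
0 3 * * * gcloud sql export sql production-db "gs://our-offsite-backups/production-db-$(date +\%F).sql.gz" --project our-project --database=our-database --quiet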

On the other hand, we strongly advise being careful with automated processes playing around. As we have seen, the only way to protect a GCP database against termination is to limit permissions, so the way to proceed is to remove the delete permission from the (service) accounts those automated processes use.
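One way to do that (a sketch only; the role ID, the trimmed permission list and the service account are illustrative) is to give the automation a custom role that keeps the day-to-day Cloud SQL permissions but omits cloudsql.instances.delete:

$ gcloud iam roles create sqlAdminNoDelete --project our-project \
    --title "Cloud SQL Admin (no delete)" \
    --permissions cloudsql.instances.create,cloudsql.instances.get,cloudsql.instances.list,cloudsql.instances.update,cloudsql.backupRuns.create,cloudsql.backupRuns.get,cloudsql.backupRuns.list
$ gcloud projects add-iam-policy-binding our-project \
    --member "serviceAccount:automation@our-project.iam.gserviceaccount.com" \
    --role "projects/our-project/roles/sqlAdminNoDelete"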

Joke about GCP. Geko, as a strong dog, says: “I accidentally deleted my DB. I need a backup!” ― GCP, as a weak dog, answers: “I deleted your backups when removing the DB. Wasn’t that what you were looking for?” ― And finally the Geko dog replies: “Ok... let me find them for you and fix all this mess!”

Thankfully, you can always count on the Geko team, a highly skilled engineering team, who will dig into the topic until it’s sorted out and made easy for you. Don’t forget to come back to the Geko blog and check out what’s new here!

The Geko team will always be glad to see you back, and you can contact us for further information!
