Toward Understanding And Dealing With Failures In Cloud-Scale Systems