Improving Efficiency and Accuracy for Training and Inference of Hardware-Aware Machine Learning Systems