Efficient Algorithms, Hardware Architectures and Circuits for Deep Learning Accelerators