Deep n-step advantage actor-critic algorithm
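
As a rough illustration of the "n-step advantage" in the title, the sketch below computes the standard estimate A_t = r_t + gamma * r_(t+1) + ... + gamma^(n-1) * r_(t+n-1) + gamma^n * V(s_(t+n)) - V(s_t) for one finished episode. It is a minimal sketch, not this project's implementation: the function name `n_step_advantages`, its parameters, and the choice to treat the value after the terminal step as 0 are all assumptions made for the example. In practice the `values` list would come from the critic network.

```python
from typing import List, Sequence


def n_step_advantages(
    rewards: Sequence[float],
    values: Sequence[float],
    n: int,
    gamma: float = 0.99,
) -> List[float]:
    """Return n-step advantage estimates for one finished episode.

    Hypothetical helper for illustration only.
    rewards[t] is the reward received after acting in state t.
    values[t] is the critic's estimate V(s_t); the value after the
    terminal step is assumed to be 0.
    """
    if len(rewards) != len(values):
        raise ValueError("rewards and values must have the same length")
    if n < 1:
        raise ValueError("n must be at least 1")

    horizon = len(rewards)
    advantages = []
    for t in range(horizon):
        end = min(t + n, horizon)
        # Discounted sum of up to n observed rewards starting at step t.
        n_step_return = sum(
            gamma ** (k - t) * rewards[k] for k in range(t, end)
        )
        # Bootstrap from the critic only if the window ends before the episode does.
        if end < horizon:
            n_step_return += gamma ** (end - t) * values[end]
        advantages.append(n_step_return - values[t])
    return advantages
```

In an actor-critic update these advantages would typically weight the policy's log-probability gradient, while the n-step returns serve as regression targets for the critic.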