This paper addresses the problem of controlling the orientation of a 3-axis gimbal carrying a cinematography camera, using image measurements for feedback. The control objective is to keep a moving target of interest at the center of the image plane. A Region-of-Interest (ROI) that encloses the target’s image is generated through the combination of a visual object detector and a visual object tracker based on Convolutional Neural Networks. These are tailored to run at high frame rates with restricted computational power. Given the target’s ROI, an attitude error in the form of a rotation matrix is computed and an attitude controller is designed that guarantees convergence of the target’s image to the center of the image plane. Experimental results with a human face as the target of interest are presented to illustrate the performance of the proposed scheme.
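
To make the idea of a rotation-matrix attitude error concrete, the following is a minimal sketch and not necessarily the construction used in this paper. It assumes a pinhole camera with a known intrinsic matrix `K`, an ROI given as `(x, y, width, height)` in pixels, and an optical axis along the camera z-axis. The function names and the numerical values of `K` are hypothetical. The error is taken as the smallest rotation that maps the optical axis onto the bearing of the ROI center, so it equals the identity exactly when the target is centered.

```python
import numpy as np

def roi_center(roi):
    """Pixel coordinates of the center of an ROI given as (x, y, width, height)."""
    x, y, w, h = roi
    return np.array([x + w / 2.0, y + h / 2.0])

def bearing_from_pixel(pixel, K):
    """Unit vector from the camera center through a pixel (pinhole model, assumed)."""
    p = np.array([pixel[0], pixel[1], 1.0])
    ray = np.linalg.solve(K, p)  # back-project: K^{-1} p
    return ray / np.linalg.norm(ray)

def skew(v):
    """Skew-symmetric matrix such that skew(v) @ u == np.cross(v, u)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def attitude_error(bearing):
    """Smallest rotation R with R @ e3 == bearing, via Rodrigues' formula."""
    e3 = np.array([0.0, 0.0, 1.0])  # assumed optical axis
    axis = np.cross(e3, bearing)    # norm equals sin(theta)
    s = np.linalg.norm(axis)
    c = float(e3 @ bearing)         # cos(theta)
    if s < 1e-12:
        # Target on the optical axis (identity) or directly behind it,
        # where the rotation axis is not unique; pick the x-axis.
        return np.eye(3) if c > 0 else np.diag([1.0, -1.0, -1.0])
    S = skew(axis)
    return np.eye(3) + S + S @ S * ((1.0 - c) / s**2)

# Hypothetical intrinsics for a 640x480 image.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])

roi = (400.0, 150.0, 80.0, 100.0)  # hypothetical detector/tracker output
b = bearing_from_pixel(roi_center(roi), K)
R_err = attitude_error(b)
```

A controller would then drive `R_err` toward the identity; how that is done, and the convergence guarantee, are the subject of the paper and are not reproduced in this sketch.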