What Is Auto Face Tracking in Video Clips and Why It Matters

Face tracking is what makes horizontal-to-vertical conversion work properly. Without it, your subject constantly drifts out of frame.

Auto face tracking is a computer vision technique that identifies a human face in a video frame and follows it as it moves. In the context of video editing, it's used to keep the crop window centered on the speaker's face, even when the speaker shifts position, gestures, or turns their head.

For creators repurposing horizontal video content into vertical format, face tracking is the feature that separates professional-looking clips from awkwardly cropped ones.

How Face Tracking Works Technically

Modern face tracking uses neural networks trained on large datasets of human faces. The model analyzes each frame of the video and identifies the location, size, and orientation of any faces present. It then draws a bounding box around each detected face and outputs tracking data: for each frame, the position of the face in the coordinate system of the original image.
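In data terms, the tracker's output can be thought of as a per-frame map of bounding boxes. The sketch below illustrates that shape in Python; the detector itself is stubbed out with hypothetical values, and picking the largest box as the primary face is one common heuristic, not a universal rule:

```python
# Sketch of a face tracker's output: for each frame, the (x, y, w, h)
# bounding box of the primary face in the original image's coordinate
# system. The detection values are hypothetical; a real pipeline would
# run a neural face detector on every frame.

def primary_face(detections):
    """Pick the largest detected face as the primary subject (one common heuristic)."""
    return max(detections, key=lambda box: box[2] * box[3])

def build_tracking_data(per_frame_detections):
    """Map frame index -> primary face box, skipping frames with no detection."""
    tracking = {}
    for frame_idx, detections in enumerate(per_frame_detections):
        if detections:
            tracking[frame_idx] = primary_face(detections)
    return tracking

# Hypothetical detector output for three frames: lists of (x, y, w, h) boxes.
detections = [
    [(400, 200, 120, 120)],                    # one face
    [(410, 205, 122, 122), (50, 60, 40, 40)],  # speaker plus a small background face
    [],                                        # detection dropped this frame
]
track = build_tracking_data(detections)
print(track)  # {0: (400, 200, 120, 120), 1: (410, 205, 122, 122)}
```

Frames where detection fails (frame 2 above) are typically filled in by interpolating between neighboring frames rather than snapping the crop away.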

This tracking data is then used to dynamically position the crop window. If the face moves left, the crop moves left. If the speaker leans forward and their face gets larger in frame, the crop adjusts slightly to maintain a consistent head-to-frame ratio.
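The crop-positioning step is essentially coordinate math. Here is a minimal sketch, assuming a 9:16 output taken at the source's full height and a face bounding box in source coordinates (all function and parameter names are illustrative):

```python
def vertical_crop(face, src_w, src_h, aspect=9 / 16):
    """Position a 9:16 crop window so the face stays horizontally centered.

    face: (x, y, w, h) bounding box in source-image coordinates.
    Returns (left, top, crop_w, crop_h), clamped so the crop never
    extends past the edges of the original frame.
    """
    crop_h = src_h                             # use the full source height
    crop_w = int(crop_h * aspect)              # e.g. 607 px for a 1080p source
    face_cx = face[0] + face[2] / 2            # horizontal center of the face
    left = int(face_cx - crop_w / 2)           # center the crop on the face...
    left = max(0, min(left, src_w - crop_w))   # ...but keep it inside the frame
    return (left, 0, crop_w, crop_h)

# Face near the right edge of a 1920x1080 frame:
print(vertical_crop((1400, 300, 200, 200), 1920, 1080))  # (1196, 0, 607, 1080)
```

A fuller implementation would also vary the crop height with the detected face size, which is how the consistent head-to-frame ratio described above is maintained when the speaker leans toward the camera.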

The result is that the speaker appears centered and consistently framed throughout the vertical clip, even if they were moving around in the original horizontal footage.

Why It Matters for Repurposed Content

When you film a YouTube video, you're framing for a 16:9 horizontal canvas. The speaker might be positioned left of center, move to stand up, or lean toward the camera. In the original video, the camera operator or editor has accommodated these movements. But when you crop that footage to 9:16, a fixed crop window accounts for none of that horizontal movement.

Without face tracking, a static crop window will frequently show the speaker off-center, partially out of frame, or looking at empty space instead of the camera. That framing reads as unpolished and pulls viewers' attention away from the content.
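The arithmetic makes the problem concrete. For standard 1080p footage, a 9:16 crop at full height keeps only a narrow vertical slice of the frame, so even modest horizontal movement carries the speaker outside it (a sketch, not tied to any particular tool):

```python
# How much of a 16:9 frame survives a full-height 9:16 crop?
src_w, src_h = 1920, 1080
crop_w = int(src_h * 9 / 16)        # 9:16 crop at full height -> 607 px wide
left = (src_w - crop_w) // 2        # a static, centered crop window
print(crop_w, src_w - crop_w)       # 607 px kept, 1313 px discarded
print(round(crop_w / src_w * 100))  # only ~32% of the original width survives
```

With roughly two-thirds of the horizontal frame discarded, a speaker who drifts a few hundred pixels off-center in the original footage is already at the edge of a static crop.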

With face tracking, the crop window follows the speaker dynamically. The speaker stays centered regardless of their movement. This works even for content with significant physical activity — a presenter who walks back and forth, or two speakers who shift which one is speaking.

When Face Tracking Is Most Important

Face tracking is most valuable when: the original video has subjects who move significantly within the frame, there are multiple subjects who alternate as the focal point, or the original framing was wider (subject is smaller in frame, leaving more room for movement).

For content where the speaker sits still in front of a static camera, a simple center crop may produce acceptable results without face tracking. But "acceptable" is still worse than properly tracked — even small drift accumulates over a 60-second clip.

Face Tracking in Editing Software

Several professional tools include face tracking for auto-reframe purposes: Adobe Premiere Pro's "Auto Reframe" feature uses Adobe Sensei AI to track subjects and adjust the crop. DaVinci Resolve has a face detection and tracking feature in the Color and Fusion pages. CapCut's mobile app includes a basic auto-reframe that follows the primary subject.

AI clip tools designed for repurposing — like Clipsy — include face tracking as a core part of the vertical formatting process. When you use Clipsy to generate clips from a YouTube video, the vertical crops are dynamically reframed to keep the speaker centered, without any manual tracking setup.

Limitations of Auto Face Tracking

Face tracking isn't perfect. Common failure modes include: faces that are partially obscured (by hair, hands, or objects), faces that turn significantly to the side so the tracking algorithm loses the facial landmarks, very rapid movement that produces jerky or lagging crop adjustments, and multiple faces where the system chooses the wrong subject as the primary focus.

In these cases, manual review and occasional keyframe corrections are needed. Most tools allow you to override the automatic tracking at specific frames. For the majority of standard talking-head and interview content, automatic tracking produces clean results without intervention.

Smooth vs. Jumpy Tracking

Good face tracking implementations smooth the crop movement — the window glides to follow the subject rather than jumping instantly to each new position. This smoothing prevents the jarring effect of the background appearing to jump between frames. If you're evaluating tracking quality in a tool, look for smooth, continuous crop movement as a sign of a well-implemented algorithm.
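One common way to get that glide is to smooth the raw per-frame face positions before moving the crop, for example with an exponential moving average. A minimal sketch (the smoothing factor and function name are illustrative, not from any specific tool):

```python
def smooth_positions(raw_centers, alpha=0.2):
    """Exponentially smooth per-frame crop-center positions.

    Lower alpha means heavier smoothing (a slower, steadier glide);
    alpha = 1.0 reproduces the raw, jumpy positions unchanged.
    """
    smoothed = [float(raw_centers[0])]
    for x in raw_centers[1:]:
        smoothed.append((1 - alpha) * smoothed[-1] + alpha * x)
    return smoothed

# A face center that jumps 100 px between frames 2 and 3:
raw = [500, 500, 600, 600, 600]
print([round(x) for x in smooth_positions(raw)])  # [500, 500, 520, 536, 549]
```

Instead of snapping 100 pixels in a single frame, the crop eases toward the new position over several frames, which is exactly the gliding behavior described above. The trade-off is lag: too much smoothing and the crop trails a fast-moving subject.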

Try Clipsy Free