HeyGen is one of the most widely used AI avatar video platforms for business content: training videos, product demos, personalized outreach, and multilingual explainers. You give it a script, pick an avatar, and it produces a polished presenter video without any on-camera recording. This tutorial walks through the full workflow from a blank project to an exported video, with practical notes on what actually affects output quality.
Getting Started
Create an account at heygen.com. The free plan gives you 1 minute of video per month with access to Avatar III, which is enough to follow this tutorial. Paid plans start at $24/month (Creator) and unlock Avatar IV, longer videos, unlimited audio dubbing, and premium credits for advanced features.
From the dashboard, click Create Video → AI Avatar to open the main video editor. This is where you’ll spend most of your time.
Step 1: Choose or Create Your Avatar
Using a Stock Avatar
HeyGen has a library of pre-built AI avatars across different genders, ethnicities, ages, and visual styles. Browse them under the Avatar panel on the left side of the editor. For your first video, pick one that fits the tone you want — professional, friendly, technical — and move on. You can always swap avatars later without changing your script.
Creating a Custom Avatar
If you want an avatar that looks like you or a specific person, HeyGen can generate one from approximately 2 minutes of recorded footage. Record yourself speaking naturally in good lighting against a plain background. Upload the footage via Create Avatar → Instant Avatar. Processing takes a few minutes. The result is a realistic digital version of you that can read any script in your voice.
Quality tips for custom avatars:
- Record in natural daylight or with a soft key light — avoid harsh shadows on your face
- Wear clothes you’d actually present in — the avatar will replicate them
- Speak with your normal expression and pacing, pauses included. Flat, expressionless recordings produce stiffer avatars
- Keep the background clean and uncluttered
Step 2: Write and Format Your Script
HeyGen generates avatar speech from your script text, so the quality of the script largely determines how natural the avatar sounds. A few principles make a significant difference:
- Write for speaking, not reading. Short sentences, natural contractions, no complex nested clauses. Read it aloud before submitting — if it sounds unnatural spoken, the avatar will sound unnatural too.
- Use punctuation to control pacing. A period creates a brief pause. An em dash (—) creates a longer pause. Use these intentionally to give the avatar room to breathe between ideas.
- Keep each slide to roughly 60–90 words, and no more than 90. HeyGen works on a slide-by-slide basis. Shorter segments give you more control over pacing, timing, and visuals per section.
- Spell out numbers and abbreviations. “Dr.” should be “Doctor,” “$50K” should be “fifty thousand dollars” — the text-to-speech engine reads what’s written.
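The word limit and the spell-it-out rule can be checked before you paste anything into the editor. The following is a minimal sketch in Python, not part of HeyGen: the function names, the 90-word ceiling taken from the guideline above, and the short abbreviation list are all illustrative assumptions. It groups whole sentences into slides and flags words that contain digits or a common abbreviation.

```python
import re

MAX_WORDS_PER_SLIDE = 90  # upper end of the 60-90 word guideline (assumption)

# Patterns for text a TTS engine may read literally (illustrative, not exhaustive).
DIGITS = re.compile(r"\d")
ABBREVIATION = re.compile(r"\b(?:Dr|Mr|Mrs|Ms|Prof|vs|etc)\.")


def split_into_slides(script: str, max_words: int = MAX_WORDS_PER_SLIDE) -> list[str]:
    """Group whole sentences into slides of at most max_words words.

    A single sentence longer than max_words still becomes its own slide,
    so very long sentences should be shortened by hand.
    """
    sentences = re.split(r"(?<=[.!?])\s+", script.strip())
    slides, current, count = [], [], 0
    for sentence in sentences:
        if not sentence:
            continue
        n = len(sentence.split())
        # Start a new slide when adding this sentence would exceed the limit.
        if current and count + n > max_words:
            slides.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        slides.append(" ".join(current))
    return slides


def find_tts_risks(slide: str) -> list[str]:
    """Return the words in a slide that contain digits or a listed abbreviation."""
    return [w for w in slide.split() if DIGITS.search(w) or ABBREVIATION.search(w)]


if __name__ == "__main__":
    draft = "Dr. Lee saved $50K last year. Here is how the team did it."
    for number, slide in enumerate(split_into_slides(draft), start=1):
        print(number, len(slide.split()), "words, check:", find_tts_risks(slide))
```

Splitting only at sentence boundaries keeps each slide a complete thought, which matters more for pacing than hitting an exact word count.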
Step 3: Set Up the Voice
HeyGen offers three voice options:
- Pre-built AI voices: A library of realistic voices in multiple languages and accents. Fastest to use, solid quality for most use cases.
- Voice cloning: Upload a sample of your own voice (minimum 30 seconds of clean audio) and HeyGen generates a clone that reads your script in your vocal style. Available on paid plans. Best for consistent brand voice across multiple videos.
- Instant Voice Clone: A faster version of voice cloning that works with shorter samples but produces less accurate results. Good for testing before committing to a full clone.
For non-English content, HeyGen’s multilingual voice library covers 40+ languages with natural-sounding local accents. Since the February 2026 update, audio dubbing is unlimited on paid plans, so you can take an existing English video and produce dubbed versions in multiple languages without additional credit costs.
Step 4: Design the Layout
HeyGen’s editor lets you set a background, add text overlays, insert images, and position the avatar on screen. A few layout decisions that matter:
- Avatar position: Left or right side of frame, or full-screen. Full-screen works best when the avatar is the only focus. Left/right positioning works better when you have supporting visuals or text on screen simultaneously.
- Background: Use a solid color, a branded gradient, or upload a custom background image. Avoid busy backgrounds — they compete with the avatar and make text overlays harder to read.
- Text overlays: Keep them short (under 8 words), large enough to read at mobile size, and timed to appear when the avatar is referencing them in the script.
Adding B-Roll with Sora 2 or Veo 3.1
On premium plans, you can generate cinematic B-roll inside HeyGen using the Sora 2 or Veo 3.1 integrations. Write a short prompt describing the visual you want, generate it, and it drops into your timeline. This removes the need to switch between platforms when a video combines avatar segments with generated footage.
Step 5: Preview, Adjust, and Export
Before generating the full video, use the Preview function on individual slides to check timing, voice naturalness, and avatar expression. This uses no credits and saves significant time versus generating the full video and discovering a pacing issue afterwards.
Common adjustments at preview stage:
- Slide feels rushed → add punctuation pauses or break the script across two slides
- Avatar expression feels flat → revise the script to be more conversational and include more natural emphasis
- Voice mispronounces a word → use the phonetic spelling field or rewrite the word so it reads as intended
When you’re satisfied, click Generate Video. Processing time varies from a few minutes to 15+ minutes depending on video length and selected avatar tier. The finished video exports as MP4, ready to upload directly to any platform.
Practical Limits to Know
- Free plan: 1 minute of video per month, Avatar III only, watermarked output
- Avatar IV produces noticeably more expressive, lifelike results than Avatar III. The difference is most visible in longer takes, where sustained naturalness matters
- High-volume personalized video (e.g. 1,000 outreach clips with unique names) requires the API or an enterprise plan. It is not practical on the Creator tier
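To get a feel for why high-volume personalization outgrows the editor, it helps to prepare the scripts offline and estimate the total runtime first. The sketch below is a hedged illustration in Python. It does not call HeyGen or its API, and the template text, the field names, and the 150 words-per-minute speaking rate are all assumptions made for the example.

```python
from string import Template

WORDS_PER_MINUTE = 150  # assumed speaking rate; measure your own avatar and voice

# Hypothetical outreach script; the field names are illustrative.
SCRIPT = Template(
    "Hi $first_name, thanks for looking at $product. "
    "I recorded this short walkthrough for the team at $company."
)


def render_scripts(contacts: list[dict[str, str]]) -> list[str]:
    """Fill the template once per contact; raises KeyError if a field is missing."""
    return [SCRIPT.substitute(contact) for contact in contacts]


def estimated_minutes(scripts: list[str], wpm: int = WORDS_PER_MINUTE) -> float:
    """Rough total runtime from word counts at an assumed speaking rate."""
    total_words = sum(len(script.split()) for script in scripts)
    return total_words / wpm


if __name__ == "__main__":
    contacts = [
        {"first_name": "Ana", "product": "Acme CRM", "company": "Globex"},
        {"first_name": "Ben", "product": "Acme CRM", "company": "Initech"},
    ]
    scripts = render_scripts(contacts)
    print(scripts[0])
    print(f"Estimated total: {estimated_minutes(scripts):.1f} minutes")
```

Even a two-sentence script multiplied across 1,000 contacts adds up to hours of generated video under these assumptions, which is the scale where a programmatic workflow becomes necessary.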
Conclusion
HeyGen’s learning curve is short: most users produce a usable first video within an hour of signing up. The biggest quality lever is the script, not the platform settings. Write for spoken delivery, keep segments short, and let the avatar do the rest. Browse our full AI video tools directory to compare HeyGen with Synthesia, Runway, and every other tool in this space.