Interactive Avatars
Earna AI Console integrates HeyGen Streaming Avatar technology to provide real-time interactive avatars that can speak responses from GPT-4o and other AI models through WebRTC streaming.
Overview
The interactive avatar system provides:
- Real-time avatar rendering synchronized with AI responses
- WebRTC peer-to-peer streaming for low latency
- Multiple avatar options with different personalities
- Voice synthesis and lip-sync animation
- Session management and state handling
Architecture
System Architecture
Sequence Diagram
Data Flow
Setup
Get HeyGen API Keys
- Sign up at HeyGen Platform
- Navigate to API Settings
- Generate your API key and secret
Configure Environment
Add to your .env.local:
# HeyGen Configuration
NEXT_PUBLIC_HEYGEN_API_KEY=your_api_key
HEYGEN_SECRET_KEY=your_secret_key
# Optional: Custom TURN server
TURN_SERVER_URL=turn:your-server.com:3478
TURN_USERNAME=username
TURN_PASSWORD=password
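A small startup check can surface a missing variable before the first avatar request fails. This is a sketch using the variable names from the configuration above; the TURN settings are optional and so are not checked here.

```typescript
// Sketch: fail fast at startup if required HeyGen env vars are missing.
const REQUIRED_KEYS = ['NEXT_PUBLIC_HEYGEN_API_KEY', 'HEYGEN_SECRET_KEY'] as const;

function missingHeyGenConfig(env: Record<string, string | undefined>): string[] {
  // Return the names of required variables that are absent or empty.
  return REQUIRED_KEYS.filter((key) => !env[key]);
}

// Example: call once during server startup.
// const missing = missingHeyGenConfig(process.env);
// if (missing.length > 0) {
//   throw new Error(`Missing HeyGen config: ${missing.join(', ')}`);
// }
```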
Install Dependencies
The HeyGen SDK is already included in the console:
{
"dependencies": {
"@heygen/streaming-avatar": "^2.0.16"
}
}
Enable Avatar Mode
The avatar button appears automatically in the chat interface when HeyGen is configured.
Implementation
API Routes
Access Token
Generate Access Token
// app/api/heygen/access-token/route.ts
import { NextResponse } from 'next/server';
import { SignJWT } from 'jose';
export async function POST(req: Request) {
try {
const apiKey = process.env.NEXT_PUBLIC_HEYGEN_API_KEY;
const secretKey = process.env.HEYGEN_SECRET_KEY;
if (!apiKey || !secretKey) {
return NextResponse.json(
{ error: 'HeyGen not configured' },
{ status: 500 }
);
}
// Create JWT token
const token = await new SignJWT({
api_key: apiKey,
exp: Math.floor(Date.now() / 1000) + 3600, // 1 hour
})
.setProtectedHeader({ alg: 'HS256' })
.setIssuedAt()
.sign(new TextEncoder().encode(secretKey));
return NextResponse.json({ token });
} catch (error) {
console.error('Token generation failed:', error);
return NextResponse.json(
{ error: 'Failed to generate token' },
{ status: 500 }
);
}
}
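Because the token above expires after one hour, the client can cache it and refetch shortly before expiry rather than on every call. This is a sketch: the one-hour TTL mirrors the route above, while the 60-second safety margin and the `fetchToken` signature are assumptions, not HeyGen requirements.

```typescript
// Sketch: cache a short-lived access token and refresh it before expiry.
type TokenFetcher = () => Promise<string>;

function createTokenCache(fetchToken: TokenFetcher, ttlMs = 3600_000, marginMs = 60_000) {
  let token: string | null = null;
  let expiresAt = 0;

  // `now` is injectable for testing; it defaults to the wall clock.
  return async function getToken(now = Date.now()): Promise<string> {
    if (token === null || now >= expiresAt - marginMs) {
      token = await fetchToken();
      expiresAt = now + ttlMs;
    }
    return token;
  };
}

// Usage: const getToken = createTokenCache(async () => {
//   const res = await fetch('/api/heygen/access-token', { method: 'POST' });
//   return (await res.json()).token;
// });
```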
Client Components
Avatar Modal
Avatar Modal Component
// app/components/avatar/avatar-modal.tsx
import { useState } from 'react';
import { StreamingAvatar } from '@heygen/streaming-avatar';
// UI imports below assume the console's shadcn/ui component paths
import { Dialog, DialogContent, DialogHeader, DialogTitle } from '@/components/ui/dialog';
import { Select, SelectContent, SelectItem, SelectTrigger, SelectValue } from '@/components/ui/select';
import { Button } from '@/components/ui/button';
import { Textarea } from '@/components/ui/textarea';
export function AvatarModal({ isOpen, onClose }: { isOpen: boolean; onClose: () => void }) {
const [avatar, setAvatar] = useState<StreamingAvatar | null>(null);
const [isLoading, setIsLoading] = useState(false);
const [selectedAvatar, setSelectedAvatar] = useState('josh_lite3_20230714');
const initializeAvatar = async () => {
setIsLoading(true);
try {
// Get access token
const tokenResponse = await fetch('/api/heygen/access-token', {
method: 'POST',
});
const { token } = await tokenResponse.json();
// Initialize SDK
const newAvatar = new StreamingAvatar({ token });
// Set up event listeners
newAvatar.on('stream-ready', (stream) => {
const videoElement = document.getElementById('avatar-video') as HTMLVideoElement;
if (videoElement && stream) {
videoElement.srcObject = stream;
videoElement.play();
}
});
newAvatar.on('stream-disconnected', () => {
console.log('Stream disconnected');
});
setAvatar(newAvatar);
} catch (error) {
console.error('Failed to initialize avatar:', error);
} finally {
setIsLoading(false);
}
};
const startSession = async () => {
if (!avatar) return;
try {
await avatar.createStartAvatar({
avatarName: selectedAvatar,
quality: 'high',
voice: {
voiceId: 'en-US-BrianNeural',
rate: 1.0,
},
});
} catch (error) {
console.error('Failed to start session:', error);
}
};
const speak = async (text: string) => {
if (!avatar) return;
try {
await avatar.speak({
text,
taskType: 'talk',
});
} catch (error) {
console.error('Failed to speak:', error);
}
};
return (
<Dialog open={isOpen} onOpenChange={onClose}>
<DialogContent className="max-w-4xl">
<DialogHeader>
<DialogTitle>Interactive Avatar</DialogTitle>
</DialogHeader>
<div className="grid grid-cols-2 gap-4">
<div className="aspect-video bg-black rounded-lg overflow-hidden">
<video
id="avatar-video"
className="w-full h-full object-cover"
autoPlay
playsInline
/>
</div>
<div className="space-y-4">
<Select value={selectedAvatar} onValueChange={setSelectedAvatar}>
<SelectTrigger>
<SelectValue placeholder="Choose an avatar" />
</SelectTrigger>
<SelectContent>
<SelectItem value="josh_lite3_20230714">Josh</SelectItem>
<SelectItem value="anna_public">Anna</SelectItem>
<SelectItem value="wayne_lite">Wayne</SelectItem>
</SelectContent>
</Select>
<div className="flex gap-2">
<Button onClick={initializeAvatar} disabled={isLoading}>
Initialize
</Button>
<Button onClick={startSession} disabled={!avatar}>
Start Session
</Button>
</div>
<Textarea
placeholder="Enter text for avatar to speak..."
onKeyDown={(e) => {
if (e.key === 'Enter' && !e.shiftKey) {
e.preventDefault();
speak(e.currentTarget.value);
e.currentTarget.value = '';
}
}}
/>
</div>
</div>
</DialogContent>
</Dialog>
);
}
Avatar Gallery
Default avatars available in the system:
Avatar ID | Name | Style | Voice Options |
---|---|---|---|
josh_lite3_20230714 | Josh | Professional | en-US-BrianNeural, en-US-GuyNeural |
anna_public | Anna | Friendly | en-US-JennyNeural, en-US-AriaNeural |
wayne_lite | Wayne | Casual | en-US-ChristopherNeural, en-US-EricNeural |
monica | Monica | Corporate | en-US-MichelleNeural, en-US-MonicaNeural |
kayla | Kayla | Energetic | en-US-SaraNeural, en-US-AmberNeural |
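The gallery table above can be kept in code as a typed registry, so avatar IDs and their allowed voices live in one place. This is a sketch; the IDs and voices are taken from the table, while the fallback choice is an arbitrary assumption.

```typescript
// Sketch: the avatar gallery as a typed registry.
interface AvatarEntry {
  name: string;
  style: string;
  voices: string[]; // Azure Neural voice IDs
}

const AVATARS: Record<string, AvatarEntry> = {
  josh_lite3_20230714: { name: 'Josh', style: 'Professional', voices: ['en-US-BrianNeural', 'en-US-GuyNeural'] },
  anna_public: { name: 'Anna', style: 'Friendly', voices: ['en-US-JennyNeural', 'en-US-AriaNeural'] },
  wayne_lite: { name: 'Wayne', style: 'Casual', voices: ['en-US-ChristopherNeural', 'en-US-EricNeural'] },
  monica: { name: 'Monica', style: 'Corporate', voices: ['en-US-MichelleNeural', 'en-US-MonicaNeural'] },
  kayla: { name: 'Kayla', style: 'Energetic', voices: ['en-US-SaraNeural', 'en-US-AmberNeural'] },
};

// Pick the default voice for an avatar, falling back to Josh's first voice
// for unknown IDs (fallback choice is arbitrary).
function defaultVoice(avatarId: string): string {
  return (AVATARS[avatarId] ?? AVATARS.josh_lite3_20230714).voices[0];
}
```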
Voice Configuration
HeyGen supports Azure Neural voices. You can customize voice parameters for each avatar.
// Voice configuration options
interface VoiceConfig {
voiceId: string; // Azure voice ID
rate?: number; // Speed: 0.5 to 2.0 (default: 1.0)
pitch?: number; // Pitch: -50 to 50 (default: 0)
volume?: number; // Volume: 0 to 100 (default: 100)
emotion?: string; // Optional emotion style
}
// Example configurations
const voices = {
professional: {
voiceId: 'en-US-BrianNeural',
rate: 0.95,
pitch: -5,
},
friendly: {
voiceId: 'en-US-JennyNeural',
rate: 1.05,
pitch: 5,
},
authoritative: {
voiceId: 'en-US-GuyNeural',
rate: 0.9,
pitch: -10,
},
};
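Since `rate`, `pitch`, and `volume` have documented ranges, it can help to clamp user-supplied values before starting a session so an out-of-range number degrades gracefully instead of failing the request. A minimal sketch, using the ranges from the interface above:

```typescript
// Sketch: clamp a VoiceConfig to the documented ranges.
interface VoiceConfig {
  voiceId: string;
  rate?: number;   // 0.5 to 2.0
  pitch?: number;  // -50 to 50
  volume?: number; // 0 to 100
}

const clamp = (value: number, min: number, max: number) =>
  Math.min(max, Math.max(min, value));

function normalizeVoice(config: VoiceConfig): VoiceConfig {
  return {
    ...config,
    rate: config.rate === undefined ? undefined : clamp(config.rate, 0.5, 2.0),
    pitch: config.pitch === undefined ? undefined : clamp(config.pitch, -50, 50),
    volume: config.volume === undefined ? undefined : clamp(config.volume, 0, 100),
  };
}
```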
Integration with Chat
Connect the avatar to your chat system:
// app/components/chat/chat-with-avatar.tsx
export function ChatWithAvatar() {
const { messages, sendMessage } = useChat();
const { speak, isConnected } = useStreamingAvatarSession();
useEffect(() => {
// Make avatar speak GPT-4o responses
const lastMessage = messages[messages.length - 1];
if (lastMessage?.role === 'assistant' && isConnected) {
speak(lastMessage.content);
}
}, [messages, isConnected, speak]);
return (
<div className="flex gap-4">
<div className="flex-1">
<ChatInterface />
</div>
<div className="w-96">
<AvatarModal />
</div>
</div>
);
}
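Long GPT-4o responses can be split into sentence-sized chunks before being handed to `speak`, which shortens the time to first audio. A sketch of such a splitter; the 200-character cap is an arbitrary choice, not a HeyGen limit.

```typescript
// Sketch: split a long assistant message into sentence-sized chunks
// before queueing them for the avatar to speak.
function chunkForSpeech(text: string, maxLen = 200): string[] {
  // Split on sentence-ending punctuation, keeping the punctuation.
  const sentences = text.match(/[^.!?]+[.!?]*\s*/g) ?? [text];
  const chunks: string[] = [];
  let current = '';
  for (const sentence of sentences) {
    if (current && (current + sentence).length > maxLen) {
      chunks.push(current.trim());
      current = '';
    }
    current += sentence;
  }
  if (current.trim()) chunks.push(current.trim());
  return chunks;
}

// Usage: for (const chunk of chunkForSpeech(lastMessage.content)) {
//   await speak(chunk);
// }
```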
Performance Optimization
Connection Quality
// Monitor WebRTC connection quality
async function monitorConnectionQuality(pc: RTCPeerConnection) {
const stats = await pc.getStats();
stats.forEach((report) => {
if (report.type === 'inbound-rtp' && report.mediaType === 'video') {
console.log({
packetsLost: report.packetsLost,
jitter: report.jitter,
frameRate: report.framesPerSecond,
bytesReceived: report.bytesReceived,
});
}
});
}
Bandwidth Management
// Adjust quality based on network conditions
function adjustQuality(bandwidth: number): 'low' | 'medium' | 'high' {
if (bandwidth < 500000) return 'low'; // < 500 Kbps
if (bandwidth < 1000000) return 'medium'; // < 1 Mbps
return 'high'; // >= 1 Mbps
}
Troubleshooting
Common Issues
Issue | Cause | Solution |
---|---|---|
Avatar not loading | Missing API key | Check NEXT_PUBLIC_HEYGEN_API_KEY and HEYGEN_SECRET_KEY in environment |
No video stream | WebRTC blocked | Check browser permissions and firewall |
Audio sync issues | Network latency | Use closer TURN server or reduce quality |
Session timeout | Token expired | Refresh token every hour |
Black video | CORS issues | Use the proxy route configuration |
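For the session-timeout row above, the client can inspect the `exp` claim of the JWT issued by the access-token route and refresh proactively. A sketch: this only decodes the payload (it does not verify the signature), and `Buffer` assumes a Node runtime; in the browser you would use `atob` instead.

```typescript
// Sketch: check whether the access token's exp claim is near expiry.
function isTokenExpired(token: string, nowMs = Date.now(), marginSec = 60): boolean {
  try {
    const payloadB64 = token.split('.')[1];
    const payload = JSON.parse(Buffer.from(payloadB64, 'base64url').toString('utf8'));
    if (typeof payload.exp !== 'number') return true;
    // Treat the token as expired marginSec before its actual expiry.
    return payload.exp - marginSec <= nowMs / 1000;
  } catch {
    return true; // unparseable tokens are treated as expired
  }
}
```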
Debug Mode
Enable debug logging:
// Enable HeyGen SDK debug mode
const avatar = new StreamingAvatar({
token,
debug: true,
logLevel: 'verbose',
});
// Log WebRTC stats
setInterval(async () => {
const stats = await avatar.getStats();
console.log('Avatar stats:', stats);
}, 5000);
Security Considerations
Never expose your HeyGen secret key in client-side code. Always generate access tokens server-side.
- Token Generation: Always generate tokens server-side with expiration
- CORS Policy: Configure proper CORS headers for WebRTC
- Rate Limiting: Implement rate limiting on session creation
- Session Management: Clean up abandoned sessions
- Content Filtering: Validate text before sending to avatar
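For the rate-limiting point above, a minimal sketch of a fixed-window limiter for the session-creation route. This is in-memory only and an assumption about your setup; a production deployment behind multiple instances would use a shared store such as Redis.

```typescript
// Sketch: per-user fixed-window rate limiter for session creation.
function createSessionLimiter(maxPerWindow = 5, windowMs = 60_000) {
  const counts = new Map<string, { windowStart: number; count: number }>();

  // `now` is injectable for testing; it defaults to the wall clock.
  return function allow(userId: string, now = Date.now()): boolean {
    const entry = counts.get(userId);
    if (!entry || now - entry.windowStart >= windowMs) {
      counts.set(userId, { windowStart: now, count: 1 });
      return true;
    }
    if (entry.count >= maxPerWindow) return false;
    entry.count++;
    return true;
  };
}
```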
Cost Optimization
- Session Duration: Close sessions when not in use
- Quality Settings: Use appropriate quality for use case
- Caching: Cache avatar list to reduce API calls
- Pooling: Reuse sessions when possible
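The session-duration point above can be enforced with an idle tracker: record activity on every `speak` call and close the session once it has been quiet for a while. A sketch; the 2-minute default is arbitrary, and `stopAvatar` is the assumed SDK method for ending a session.

```typescript
// Sketch: track activity and flag idle avatar sessions for closing,
// to stop paying for metered streaming time.
function createIdleTracker(idleMs = 120_000) {
  let lastActivity = 0;
  return {
    touch(now = Date.now()) { lastActivity = now; },
    shouldClose(now = Date.now()): boolean {
      return now - lastActivity >= idleMs;
    },
  };
}

// Usage: call tracker.touch() on every speak(); poll shouldClose() on an
// interval and end the session (e.g. avatar.stopAvatar()) when it's true.
```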
Next Steps
- Voice Mode - Add real-time voice conversations with GPT-4o
- API Reference - Complete API documentation
- Supabase Setup - Database configuration