# Audio Transcription Automation

Automate audio/video transcription, meeting notes, subtitle generation, and content processing.

**Tags:** transcription, audio, video, meetings, subtitles

## CLAUDE.md Template

Download this file and place it in your project folder to get started.
# Transcription Automation
Comprehensive workflow for automating audio/video transcription and content processing.
## Core Workflows
### 1. Transcription Pipeline
```
TRANSCRIPTION FLOW:

┌─────────────────┐
│   Audio/Video   │
│      Input      │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Pre-Processing  │
│  - Convert      │
│  - Enhance      │
│  - Split        │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Transcription  │
│  - STT Engine   │
│  - Diarization  │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Post-Processing │
│  - Format       │
│  - Timestamps   │
│  - Speakers     │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│     Output      │
│ - Text/SRT/VTT  │
│ - Summary       │
└─────────────────┘
```
### 2. Transcription Configuration
```yaml
transcription_config:
  engine: whisper  # whisper, assembly_ai, deepgram
  audio_settings:
    sample_rate: 16000
    channels: mono
    format: wav
  transcription:
    language: auto    # or specific: en, zh, es
    model: large      # tiny, base, small, medium, large
    task: transcribe  # transcribe or translate
  features:
    speaker_diarization: true
    word_timestamps: true
    punctuation: true
    profanity_filter: false
  output:
    formats:
      - txt
      - srt
      - vtt
      - json
    include_confidence: true
    include_timestamps: true
```
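Whisper-style engines work best on 16 kHz mono WAV input, so a pre-processing step typically shells out to ffmpeg. A minimal sketch of building that invocation from the `audio_settings` above (assumes `ffmpeg` is on your PATH; the function name is illustrative):

```python
def ffmpeg_command(src: str, dst: str, sample_rate: int = 16000, channels: int = 1) -> list[str]:
    """Build an ffmpeg invocation matching audio_settings:
    resample to 16 kHz, downmix to mono, write 16-bit PCM WAV."""
    return [
        "ffmpeg", "-y", "-i", src,
        "-ar", str(sample_rate),  # sample_rate: 16000
        "-ac", str(channels),     # channels: mono
        "-c:a", "pcm_s16le",      # format: wav (16-bit PCM)
        dst,
    ]

# Run with: subprocess.run(ffmpeg_command("talk.mp4", "talk.wav"), check=True)
```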
## Meeting Transcription
### Meeting Notes Template
```yaml
meeting_transcript:
  metadata:
    title: "{{meeting_title}}"
    date: "{{date}}"
    duration: "{{duration}}"
    attendees: "{{speakers}}"
  output_template: |
    # {{title}}

    **Date:** {{date}}
    **Duration:** {{duration}}
    **Attendees:** {{attendees}}

    ## Summary
    {{ai_summary}}

    ## Key Points
    {{#each key_points}}
    - {{this}}
    {{/each}}

    ## Action Items
    {{#each action_items}}
    - [ ] {{task}} - @{{assignee}} - Due: {{due_date}}
    {{/each}}

    ## Full Transcript
    {{#each segments}}
    **[{{timestamp}}] {{speaker}}:** {{text}}
    {{/each}}
```
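The `{{#each segments}}` block of the template maps directly onto a loop. A minimal sketch, with plain f-strings standing in for a real Handlebars/Jinja template engine:

```python
def render_transcript(segments: list[dict]) -> str:
    # Mirrors the Full Transcript block of the template above:
    # one bolded "[timestamp] speaker:" prefix per segment.
    return "\n".join(
        f"**[{s['timestamp']}] {s['speaker']}:** {s['text']}"
        for s in segments
    )
```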
### Speaker Diarization
```yaml
diarization_config:
  min_speakers: 2
  max_speakers: 10
  speaker_labels:
    - name: "Speaker 1"
      voice_sample: "sample_1.wav"  # Optional
    - name: "Speaker 2"
      voice_sample: "sample_2.wav"
  output_format:
    speaker_prefix: true
    speaker_timestamps: true
  example_output: |
    [00:00:05] SPEAKER_1: Welcome everyone to today's meeting.
    [00:00:12] SPEAKER_2: Thanks for having us.
    [00:00:18] SPEAKER_1: Let's start with the agenda.
```
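Lines like the `example_output` above can be produced from raw diarization results, assuming each result is a (start-seconds, speaker-label, text) tuple; a sketch:

```python
def format_diarized(segments: list[tuple[float, str, str]]) -> str:
    """Render (start_seconds, speaker, text) tuples as
    '[HH:MM:SS] SPEAKER: text' lines."""
    def ts(seconds: float) -> str:
        # Convert seconds to a zero-padded HH:MM:SS timestamp
        h, rem = divmod(int(seconds), 3600)
        m, s = divmod(rem, 60)
        return f"{h:02d}:{m:02d}:{s:02d}"

    return "\n".join(f"[{ts(start)}] {speaker}: {text}"
                     for start, speaker, text in segments)
```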
## Subtitle Generation
### SRT Format
```yaml
subtitle_config:
format: srt
timing:
max_duration: 7 # seconds per subtitle
min_gap: 0.1 # seconds between subtitles
chars_per_line: 42
max_lines: 2
style:
case: sentence # sentence, upper, lower
numbers: words # words, digits
example_output: |
1
00:00:05,000 --> 00:00:08,500
Welcome to today's presentation
about transcription automation.
2
00:00:09,000 --> 00:00:12,000
Let me start by explaining
the basic concepts.
```
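SRT timestamps use a comma as the millisecond separator, and cues are separated by blank lines. A sketch of a writer that honors the `chars_per_line` and `max_lines` settings above (cue tuples are an assumed input format):

```python
import textwrap

def srt_timestamp(seconds: float) -> str:
    """Format seconds as the SRT HH:MM:SS,mmm timestamp."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(cues: list[tuple[float, float, str]],
           chars_per_line: int = 42, max_lines: int = 2) -> str:
    """cues: (start_seconds, end_seconds, text) tuples."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        # Wrap to the configured width and cap the line count
        lines = textwrap.wrap(text, width=chars_per_line)[:max_lines]
        blocks.append(f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
                      + "\n".join(lines))
    return "\n\n".join(blocks) + "\n"
```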
### VTT Format
```yaml
vtt_config:
format: vtt
features:
cue_settings: true
styling: true
example_output: |
WEBVTT
00:00:05.000 --> 00:00:08.500 align:center
Welcome to today's presentation
about transcription automation.
00:00:09.000 --> 00:00:12.000 align:center
<v Speaker 1>Let me start by explaining
the basic concepts.
```
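Since WebVTT differs from SRT mainly in the header, the millisecond separator (dot vs. comma), and the optional cue numbers, a converter can be a couple of regex passes; a minimal sketch:

```python
import re

def srt_to_vtt(srt_text: str) -> str:
    # Dot instead of comma in timestamps
    body = re.sub(r"(\d{2}:\d{2}:\d{2}),(\d{3})", r"\1.\2", srt_text)
    # Drop bare cue numbers that sit directly above a timestamp line
    body = re.sub(r"(?m)^\d+\n(?=\d{2}:)", "", body)
    return "WEBVTT\n\n" + body
```

This drops SRT-only cue numbers rather than keeping them as cue identifiers; cue settings like `align:center` would be added separately.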
## Integration Workflows
### Zoom Integration
```yaml
zoom_transcription:
  trigger:
    event: recording_completed
  workflow:
    - step: download_recording
      source: zoom_cloud
    - step: transcribe
      engine: whisper
      language: auto
    - step: diarize
      identify_speakers: true
    - step: generate_notes
      template: meeting_notes
      include_summary: true
      extract_action_items: true
    - step: distribute
      destinations:
        - notion_page
        - slack_channel
        - email_attendees
```
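One way to execute a workflow like this is a small dispatcher that maps each `step` name to a handler and passes the remaining YAML keys as keyword arguments. A minimal sketch (the handler registry and context dict are illustrative, not part of any Zoom API):

```python
def run_workflow(steps: list[dict], handlers: dict, context=None) -> dict:
    """Run each step's handler in order; extra YAML keys become kwargs.
    Each handler receives the shared context plus its own settings."""
    context = context if context is not None else {}
    for step in steps:
        step = dict(step)        # don't mutate the caller's config
        name = step.pop("step")
        context[name] = handlers[name](context, **step)
    return context
```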
### YouTube Integration
```yaml
youtube_subtitles:
  trigger:
    event: video_uploaded
  workflow:
    - step: download_audio
      source: youtube_video
    - step: transcribe
      engine: whisper
      task: transcribe
    - step: generate_subtitles
      formats: [srt, vtt]
    - step: translate
      target_languages: [es, zh, ja, de, fr]
    - step: upload_subtitles
      destination: youtube
      as_cc: true
```
### Podcast Processing
```yaml
podcast_workflow:
  input:
    source: rss_feed
    format: audio/mp3
  processing:
    - transcribe:
        engine: whisper
        model: large
    - generate_chapters:
        detect_topics: true
        min_duration: 60  # seconds
    - create_show_notes:
        summarize: true
        extract_links: true
        highlight_quotes: true
    - create_searchable_index:
        full_text: true
        timestamps: true
  output:
    - transcript_txt
    - chapters_json
    - show_notes_md
    - search_index
```
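The `min_duration` setting implies merging short topic segments into their neighbors. A sketch of turning detected topic boundaries into `chapters_json` entries (the (start, title) input format is an assumption about what topic detection emits):

```python
def build_chapters(topic_breaks, total_duration, min_duration=60):
    """topic_breaks: (start_seconds, title) pairs from topic detection.
    Chapters shorter than min_duration are folded into their predecessor."""
    chapters = []
    for start, title in topic_breaks:
        if chapters and start - chapters[-1]["start"] < min_duration:
            continue  # too short: keep the previous chapter running
        chapters.append({"start": start, "title": title})
    # Close each chapter at the next chapter's start (or end of episode)
    for i, chapter in enumerate(chapters):
        chapter["end"] = (chapters[i + 1]["start"] if i + 1 < len(chapters)
                          else total_duration)
    return chapters
```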
## Language Support
### Multi-Language Transcription
```yaml
multilingual:
  auto_detect: true
  supported_languages:
    - code: en
      name: English
      model: large
    - code: zh
      name: Chinese
      model: large
    - code: es
      name: Spanish
      model: large
    - code: ja
      name: Japanese
      model: medium
  translation:
    enabled: true
    target: en
    preserve_original: true
```
### Code-Switching
```yaml
code_switching:
  enabled: true
  primary_language: en
  secondary_languages: [zh, es]
  output: |
    [00:01:23] The next topic is about 人工智能,
    which has been muy importante in recent years.
  handling:
    detect_language_per_segment: true
    tag_language_switches: true
```
## Quality Enhancement
### Post-Processing
```yaml
post_processing:
  text_cleanup:
    - remove_filler_words: ["um", "uh", "like"]
    - fix_common_errors: true
    - normalize_numbers: true
  formatting:
    - add_punctuation: true
    - capitalize_sentences: true
    - paragraph_breaks: true
  speaker_attribution:
    - merge_short_segments: true
    - min_segment_duration: 1.0
  output_enhancement:
    - add_timestamps: true
    - highlight_keywords: true
    - generate_summary: true
```
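The `remove_filler_words` step can be a single regex pass. A naive sketch; note that a filler like "like" would also be stripped from legitimate uses, which is why production cleanup usually works on confidence-scored tokens rather than raw text:

```python
import re

def remove_fillers(text: str, fillers=("um", "uh")) -> str:
    """Strip standalone filler words (case-insensitive, word-boundary
    match), then tidy the leftover spacing."""
    pattern = r"\b(" + "|".join(map(re.escape, fillers)) + r")\b,?\s*"
    cleaned = re.sub(pattern, "", text, flags=re.IGNORECASE)
    return re.sub(r"\s{2,}", " ", cleaned).strip()
```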
### Accuracy Metrics
```
TRANSCRIPTION QUALITY REPORT
═══════════════════════════════════════
File:     meeting_2024_01_15.mp3
Duration: 45:32
Engine:   Whisper Large

METRICS:
  Word Error Rate (WER):  4.2%
  Character Error Rate:   2.8%
  Confidence Score:       0.94

SPEAKER DIARIZATION:
  Speakers Detected:      4
  Diarization Accuracy:   91%

PROCESSING TIME:
  Total:                  8m 23s
  Real-time Factor:       0.18x

DETECTED ISSUES:
  • Low confidence at 12:34 (background noise)
  • Overlapping speech at 23:45
  • Unknown speaker at 34:12
```
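The WER figure in the report is word-level Levenshtein edit distance divided by the reference word count. A sketch for spot-checking a transcript against a human-corrected reference:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance over the reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edits turning the first i reference words
    # into the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(d[i - 1][j] + 1,   # deletion
                          d[i][j - 1] + 1,   # insertion
                          substitution)
    return d[-1][-1] / len(ref)
```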
## API Examples
### OpenAI Whisper
```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Transcribe audio with word- and segment-level timestamps
with open("meeting.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="verbose_json",
        timestamp_granularities=["word", "segment"],
    )

# Access results
for segment in transcript.segments:
    print(f"[{segment.start:.2f}] {segment.text}")
```
### AssemblyAI
```python
import assemblyai as aai

aai.settings.api_key = "YOUR_API_KEY"  # set before transcribing

transcriber = aai.Transcriber()
config = aai.TranscriptionConfig(
    speaker_labels=True,
    auto_chapters=True,
    entity_detection=True,
)

transcript = transcriber.transcribe(
    "https://example.com/meeting.mp3",
    config=config,
)

for utterance in transcript.utterances:
    print(f"Speaker {utterance.speaker}: {utterance.text}")
```
## Best Practices
1. **Quality Audio**: Clean input = better output
2. **Choose Right Model**: Balance speed vs accuracy
3. **Use Diarization**: Identify speakers clearly
4. **Post-Process**: Clean up automated output
5. **Verify Critical Content**: Human review important
6. **Consider Privacy**: Handle sensitive content
7. **Store Efficiently**: Compress and index
8. **Provide Context**: Vocabulary hints help

## README.md

### What This Does

Comprehensive workflow for automating audio/video transcription and content processing.

### Quick Start

**Step 1: Create a project folder**

```shell
mkdir -p ~/Documents/TranscriptionAutomation
```

**Step 2: Download the template**

Click Download above, then:

```shell
mv ~/Downloads/CLAUDE.md ~/Documents/TranscriptionAutomation/
```

**Step 3: Start working**

```shell
cd ~/Documents/TranscriptionAutomation
claude
```
### Best Practices
- Quality Audio: Clean input = better output
- Choose Right Model: Balance speed vs accuracy
- Use Diarization: Identify speakers clearly
- Post-Process: Clean up automated output
- Verify Critical Content: Human review important
- Consider Privacy: Handle sensitive content
- Store Efficiently: Compress and index
- Provide Context: Vocabulary hints help