Model Studio

Plain-language guide to VAD, pause detection, and transcription behavior so you can tune Voiscribd without guesswork.

Settings -> Model Studio (Expert Mode)

Model Studio is where you control how Voiscribd "listens" and when it decides speech has started or ended.

If your transcript feels cut off, late, noisy, or inconsistent, the fix is usually here.

What VAD Actually Is

VAD means Voice Activity Detection.

In practice, it is the component that decides, moment by moment, whether incoming audio is voice or something else.

Good VAD settings give you:

  • fewer accidental starts from keyboard clicks/fan noise
  • cleaner stops when you are actually done speaking
  • no clipped first words

Output and Language Behavior

Output Format

Output Format is a set of style instructions applied to the transcribed text.

Use it to control punctuation style, compactness, and formatting preferences.

Vocabulary Boost

Add names, acronyms, product terms, and domain jargon that matter in your workflow.

  • supports comma-separated or newline-separated entries
  • especially helpful for niche names and internal terminology
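
As a rough illustration of the two accepted separators, here is a minimal Python sketch of how a mixed comma/newline list could be split into individual terms. The function name parse_vocabulary is hypothetical, and the trimming and case-insensitive de-duplication are assumptions for the example, not documented Voiscribd behavior.

```python
import re


def parse_vocabulary(raw: str) -> list[str]:
    """Split a comma- or newline-separated string into clean, unique terms."""
    terms: list[str] = []
    seen: set[str] = set()
    for piece in re.split(r"[,\n]", raw):
        term = piece.strip()
        key = term.casefold()
        # Assumption: blank entries and case-insensitive repeats are dropped.
        if term and key not in seen:
            seen.add(key)
            terms.append(term)
    return terms


print(parse_vocabulary("Voiscribd, VAD\nModel Studio,  vad"))
```

Both separators can be mixed in one field; in this sketch the repeated "vad" is dropped and the first spelling is kept.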

Post-Processing

  • Add space after paste: inserts a trailing space after text output
  • Automatic text formatting: cleans punctuation/paragraph flow after transcription

Global Transcription Defaults

Default Transcription Mode

Used when no Chain overrides it:

  • Batch
  • Real Time
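
The override rule can be pictured as a simple fallback. This is a sketch under assumptions: the Mode enum and effective_mode function are made-up names for illustration, and the only behavior taken from this guide is that the default applies when no Chain sets a mode.

```python
from enum import Enum
from typing import Optional


class Mode(Enum):
    BATCH = "batch"
    REAL_TIME = "real_time"


def effective_mode(chain_mode: Optional[Mode], default_mode: Mode) -> Mode:
    """Return the Chain's mode when it sets one, otherwise the global default."""
    return chain_mode if chain_mode is not None else default_mode


print(effective_mode(None, Mode.BATCH))            # no override: default wins
print(effective_mode(Mode.REAL_TIME, Mode.BATCH))  # Chain override wins
```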

Live Transcript Preview

Shows or hides interim (partial) text while you are still speaking.

VAD Controls Explained for Humans

Voice Activity Detection (VAD)

Master switch for speech/non-speech detection logic.

Skip likely non-speech audio

Filters out sounds that are probably not voice.

Auto-stop on silence

Stops recording automatically once silence has lasted long enough (see Min Silence).

Natural pause tolerance

Lets you pause briefly, for example to think mid-sentence, without the recording ending prematurely.

VAD Threshold

How confident the detector must be that a sound is voice before treating it as speech.

  • higher threshold: stricter, fewer noise triggers
  • lower threshold: more sensitive, easier voice pickup
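
To make the trade-off concrete, here is a small Python sketch. It assumes the detector produces a speech probability between 0 and 1 for each short audio frame, which is how many VAD systems work but is not confirmed for Voiscribd. The function name and the numbers are illustrative only.

```python
def classify_frames(probabilities: list[float], threshold: float) -> list[bool]:
    """Mark a frame as voice when its speech probability reaches the threshold."""
    return [p >= threshold for p in probabilities]


# Made-up per-frame probabilities: fan noise, a soft onset, clear speech, a trailing breath.
probs = [0.10, 0.35, 0.55, 0.90, 0.40]

strict = classify_frames(probs, threshold=0.6)     # only the clearest frame passes
sensitive = classify_frames(probs, threshold=0.3)  # the soft onset passes too

print(strict)
print(sensitive)
```

With the stricter value only one frame counts as voice, so noise is rejected but a quiet word onset is missed. With the lower value the onset is caught, at the cost of accepting more borderline audio.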

Min Speech

Minimum duration of continuous voice before it is accepted as real speech. Shorter bursts are ignored.
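
A minimal sketch of the idea, assuming audio is analyzed in fixed-length frames that have already been marked voice or non-voice. The function name, the frame length, and the values are hypothetical, chosen only to show the effect.

```python
def drop_short_speech(voiced: list[bool], frame_ms: int, min_speech_ms: int) -> list[bool]:
    """Clear runs of voiced frames that are shorter than min_speech_ms."""
    result = list(voiced)
    i = 0
    while i < len(voiced):
        if not voiced[i]:
            i += 1
            continue
        # Find the end of this run of consecutive voiced frames.
        j = i
        while j < len(voiced) and voiced[j]:
            j += 1
        if (j - i) * frame_ms < min_speech_ms:
            for k in range(i, j):
                result[k] = False  # too short to count as speech
        i = j
    return result


# One 30 ms click followed by 90 ms of actual voice.
frames = [True, False, True, True, True, False]
print(drop_short_speech(frames, frame_ms=30, min_speech_ms=90))
```

The single-frame click is discarded while the longer run survives. Raising the minimum further would also delay or drop very short words, which is why the "speech start feels late" recipe lowers it.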

Min Silence

How long silence must last before Voiscribd decides you are done.
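
The following Python sketch shows one way such a silence timer could behave, assuming fixed-length frames already marked voice or non-voice. The function name find_stop_frame and the numbers are assumptions for illustration, not Voiscribd's implementation.

```python
from typing import Optional


def find_stop_frame(voiced: list[bool], frame_ms: int, min_silence_ms: int) -> Optional[int]:
    """Return the index of the frame at which recording would stop, or None."""
    heard_speech = False
    silent_ms = 0
    for index, is_voice in enumerate(voiced):
        if is_voice:
            heard_speech = True
            silent_ms = 0  # any voice resets the silence timer
        elif heard_speech:
            silent_ms += frame_ms
            if silent_ms >= min_silence_ms:
                return index
    return None


# 100 ms frames: speech, a 200 ms thinking pause, one more word, then real silence.
frames = [False, True, True, False, False, True, False, False, False]

print(find_stop_frame(frames, frame_ms=100, min_silence_ms=200))  # stops during the pause
print(find_stop_frame(frames, frame_ms=100, min_silence_ms=300))  # waits for the real ending
```

With the shorter setting the recording ends inside the thinking pause (frame 4); with the longer one it continues until the final silence (frame 8).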

Speech Padding

Extra audio kept before and after each detected speech segment so word edges are not cut off.
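
As a sketch of what padding does to segment boundaries, assume detected speech is represented as (start, end) pairs in milliseconds. The function name pad_segments and the decision to merge segments that overlap after padding are assumptions made for this example.

```python
def pad_segments(
    segments: list[tuple[int, int]], pad_ms: int, total_ms: int
) -> list[tuple[int, int]]:
    """Widen each (start, end) segment by pad_ms, clamp to the recording, merge overlaps."""
    padded: list[tuple[int, int]] = []
    for start, end in sorted(segments):
        start = max(0, start - pad_ms)
        end = min(total_ms, end + pad_ms)
        if padded and start <= padded[-1][1]:
            # Padding made this segment touch the previous one: join them.
            padded[-1] = (padded[-1][0], max(padded[-1][1], end))
        else:
            padded.append((start, end))
    return padded


print(pad_segments([(200, 400), (1000, 1200)], pad_ms=100, total_ms=2000))
print(pad_segments([(50, 400), (450, 900)], pad_ms=100, total_ms=950))
```

In the first call each segment simply grows by 100 ms on both sides. In the second, the padded segments overlap and collapse into a single segment covering the whole recording.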

Always Listen Pre-roll

Keeps a short audio lead-in so first words are not clipped.
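
A pre-roll is commonly built as a small ring buffer that always holds the most recent audio. The Python sketch below illustrates that general technique with collections.deque; the class name PreRollBuffer and the frame sizes are hypothetical, and this guide does not state how Voiscribd implements it.

```python
from collections import deque


class PreRollBuffer:
    """Keep only the most recent frames so they can be prepended when speech starts."""

    def __init__(self, pre_roll_ms: int, frame_ms: int) -> None:
        # A deque with maxlen silently discards the oldest frame when full.
        self._frames: deque[bytes] = deque(maxlen=max(1, pre_roll_ms // frame_ms))

    def push(self, frame: bytes) -> None:
        self._frames.append(frame)

    def drain(self) -> list[bytes]:
        """Return the buffered lead-in (oldest first) and empty the buffer."""
        frames = list(self._frames)
        self._frames.clear()
        return frames


buffer = PreRollBuffer(pre_roll_ms=90, frame_ms=30)  # room for three frames
for frame in (b"a", b"b", b"c", b"d"):
    buffer.push(frame)

lead_in = buffer.drain()
print(lead_in)
```

A longer pre-roll means more frames survive in the buffer, which is why increasing it helps when first words are chopped.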

Quick Tuning Recipes

Problem: first word gets chopped

Increase Always Listen Pre-roll, then slightly increase Speech Padding.

Problem: recording stops while I am thinking

Increase Min Silence and enable Natural pause tolerance.

Problem: random background sounds trigger recording

Increase VAD Threshold and enable Skip likely non-speech audio.

Problem: speech start feels late

Reduce VAD Threshold slightly and reduce Min Speech.

Safe Baseline

If settings are heavily tuned and behavior is unstable:

  1. reset VAD values to default,
  2. enable Natural pause tolerance,
  3. tune one parameter at a time while testing the same sentence pattern.
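
If you keep notes on your settings between test runs, a small check like the following Python sketch can confirm that only one value changed. The setting keys and numbers are purely illustrative; they are not Voiscribd's real defaults or an export format.

```python
def changed_settings(baseline: dict[str, float], candidate: dict[str, float]) -> list[str]:
    """Names of settings whose values differ between two configurations."""
    keys = sorted(set(baseline) | set(candidate))
    return [key for key in keys if baseline.get(key) != candidate.get(key)]


# Illustrative numbers only; these are not Voiscribd's real defaults.
baseline = {"vad_threshold": 0.5, "min_speech_ms": 250, "min_silence_ms": 800}
candidate = {**baseline, "min_silence_ms": 1200}

changed = changed_settings(baseline, candidate)
print(changed)
if len(changed) > 1:
    print("More than one setting changed; results will be hard to attribute.")
```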
