Audio Node
An Audio Node holds a single audio clip — a voiceover, a music track, or a sound effect. You can upload an audio file or generate one using an AI audio model. Audio Nodes connect downstream to Video Nodes, letting you attach narration, music, or sound to your video output.
Adding content
Uploading audio
When an Audio Node is empty, click the Upload button to select a file from your computer.
Maximum file size: 50 MB
Generating with a model
Select a generation mode in the generation panel, write a prompt or configure your settings, then click Generate. The audio clip appears in the display area when complete.
The Generate button is disabled when:
- There is no prompt input and no upstream Text Node connected
- Audio is already being generated on this node
- An upstream node is currently generating ("Upstream task in progress.")
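The three conditions above can be summarized as a single check. The sketch below is a hypothetical model, not the product's code. The `AudioNodeState` fields and the order in which reasons are reported are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AudioNodeState:
    """Hypothetical snapshot of an Audio Node; field names are illustrative."""
    prompt: str = ""
    has_upstream_text_node: bool = False
    is_generating: bool = False        # this node is already generating
    upstream_generating: bool = False  # some upstream node is generating


def generate_disabled_reason(state: AudioNodeState) -> Optional[str]:
    """Return why Generate is disabled, or None when it can be clicked.

    The check order (and therefore which reason wins when several apply)
    is an assumption of this sketch.
    """
    # Either a typed prompt or a connected upstream Text Node is enough.
    if not state.prompt and not state.has_upstream_text_node:
        return "No prompt input and no upstream Text Node connected."
    if state.is_generating:
        return "Audio is already being generated on this node."
    if state.upstream_generating:
        return "Upstream task in progress."
    return None
```

For example, a node with an empty prompt but a connected Text Node is treated as ready, which matches the first rule.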
Generation modes
The Audio Node has four generation modes. Select your mode from the tabs in the generation panel.
Text to Speech
Converts a text prompt into spoken audio using a selected voice. Use this to generate character narration, voiceovers, or dialogue.
Supported models:
- FishAudio — supports official voices and custom voices you create (see Custom voices below)
- ElevenLabs V3 — uses ElevenLabs official voices only; custom voices not supported
How it works: Type or paste your script in the prompt field (up to 5,000 characters), select a voice, and click Generate.
Tip: If an upstream Text Node is connected, its content is passed as the prompt automatically — useful for turning a script node directly into voiceover.
Text to Music
Generates a music track from a text description of the style, mood, and feel you want.
Supported model: Music V3 (Lyria 3 Pro)
Duration options: 5s to 300s, in 30-second increments
How it works: Describe the music you want in the prompt field (e.g. "An energetic and catchy pop song with a driving beat, bright synths"), select a duration, and click Generate.
Note: Music V3 charges based on duration — longer clips cost more credits. Check the credit cost on the Generate button before confirming.
Text to Sound Effect
Generates a sound effect from a text description.
Supported model: ElevenLabs (Sound Effects API)
Duration options: 0.5s to 30s
How it works: Describe the sound you want (e.g. "Ambient sound of a busy street on a sunny morning"), select a duration, and click Generate.
Note: This model performs best with English prompts. If you write your prompt in another language, it is automatically translated to English before being sent to the model. Your original text remains visible in the input field.
Voice Design
Generates voice samples based on a text description of a character's vocal qualities. Use this to audition and create custom voices before committing to a full voiceover.
Supported model: ElevenLabs (Voice Design API)
How it works: Describe the voice you want (e.g. "An older woman with a thick Southern accent. She is sweet and sarcastic."). The model generates 3 samples of approximately 8 seconds each. The first sample is shown by default.
Once you have a voice sample you like, you can save it as a custom voice for use in Text to Speech via FishAudio (see Custom voices below).
Generation panel
| Element | Description |
|---|---|
| Mode tabs | Text to Speech / Text to Music / Text to Sound Effect / Voice Design |
| Upstream text icon | Shown inline when a Text Node is connected upstream |
| Prompt input | Your text instructions; up to 5,000 characters |
| Voice selector | (Text to Speech only) Choose an official or custom voice |
| Model selector | Choose the AI model for the current mode |
| Duration | (Text to Music and Text to Sound Effect only) Select the output length |
| Generate button | Shows credit cost; click to confirm and start generation |
Custom voices
Custom voices let you create and reuse a specific voice across your canvas projects. This feature is only available when FishAudio is selected as the model in Text to Speech mode.
Adding a custom voice
- In the Audio Node generation panel (Text to Speech mode, FishAudio selected), click Add Voice.
- The canvas enters voice selection mode — all non-audio nodes and audio nodes shorter than 5 seconds are greyed out.
- Click any eligible Audio Node (must contain audio longer than 5 seconds) to select it as your voice reference.
- A confirmation panel appears below the selected node. Confirm to create the voice.
- FishAudio processes the reference audio into a speaker file. The new voice appears in your voice selector.
Note: If the current canvas has no eligible Audio Nodes, you'll be prompted to generate one first. Click the cancel button at the bottom of the canvas to exit voice selection mode at any time.
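The eligibility rule used in voice selection mode can be expressed as a filter over the canvas nodes. This is a minimal sketch under assumptions: `CanvasNode`, its `kind` strings, and storing duration in seconds are hypothetical, and the strict "longer than 5 seconds" comparison follows the steps above. An empty result corresponds to the case where you are prompted to generate an Audio Node first.

```python
from dataclasses import dataclass
from typing import List, Optional

# Reference audio must be longer than this many seconds (from the steps above).
MIN_REFERENCE_SECONDS = 5.0


@dataclass
class CanvasNode:
    """Hypothetical canvas node; field names are illustrative."""
    node_id: str
    kind: str                           # e.g. "audio", "video", "text", "image"
    duration_s: Optional[float] = None  # audio length; None if the node is empty


def eligible_voice_references(nodes: List[CanvasNode]) -> List[str]:
    """Return IDs of nodes that stay clickable in voice selection mode.

    Every node not returned here would be greyed out.
    """
    return [
        n.node_id
        for n in nodes
        if n.kind == "audio"
        and n.duration_s is not None
        and n.duration_s > MIN_REFERENCE_SECONDS
    ]
```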
Managing custom voices
Custom voices appear in the voice selector alongside official voices. You can:
- Preview any voice (official or custom) by clicking the play button
- Rename a custom voice directly in the selector
- Delete a custom voice permanently — a confirmation prompt appears before deletion. Deleting a voice does not affect audio already generated with it.
Custom voices are available across all your canvases, not just the one where they were created.
Connecting to other nodes
Accepted upstream connections
| Upstream node | How its content is used |
|---|---|
| Text Node | Text content is passed as the prompt in the generation panel |
Audio Nodes only accept Text Nodes upstream. Image Nodes and Video Nodes cannot connect upstream to an Audio Node.
Supported downstream connections
An Audio Node can connect downstream to a Video Node only. When connected:
- The Video Node's generation mode switches to Omni-Reference automatically
- The audio option in the Video Node is enabled by default
- The Audio Node appears in the Video Node's reference strip
- You can use @Sound in the Video Node prompt to reference the connected audio
Note: Video models that do not support audio input are greyed out in the Video Node's model selector when an Audio Node is connected upstream.
Audio Nodes cannot connect to other Audio Nodes.
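The connection rules in this section can be sketched as a small validator plus the changes applied to a Video Node when audio is attached. This is a hypothetical model, not the product's implementation: node kinds are plain strings, `VideoNodeSettings` and its placeholder default mode are invented for illustration, and only edges that touch an Audio Node are covered.

```python
from dataclasses import dataclass, field
from typing import List


def audio_edge_allowed(source_kind: str, target_kind: str) -> bool:
    """Check a source -> target edge that touches an Audio Node."""
    if target_kind == "audio":
        # Only Text Nodes may feed an Audio Node (this also rejects audio -> audio).
        return source_kind == "text"
    if source_kind == "audio":
        # An Audio Node may only feed a Video Node.
        return target_kind == "video"
    raise ValueError("this sketch only covers edges that touch an Audio Node")


@dataclass
class VideoNodeSettings:
    """Hypothetical subset of a Video Node's state."""
    mode: str = "default"  # placeholder; the real default mode is not specified here
    audio_enabled: bool = False
    reference_strip: List[str] = field(default_factory=list)


def attach_audio(video: VideoNodeSettings, audio_node_id: str) -> None:
    """Apply the documented effects of connecting an Audio Node downstream."""
    video.mode = "Omni-Reference"  # generation mode switches automatically
    video.audio_enabled = True     # audio option is enabled by default
    if audio_node_id not in video.reference_strip:
        video.reference_strip.append(audio_node_id)
```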
Display area toolbar
| State | Toolbar options |
|---|---|
| Empty / generating / failed | Upload button only |
| Audio present | Trim, Download, History |
Trim
Click Trim to open the trim tool directly on the Audio Node's waveform. Drag the trim handles to select the portion you want to keep. You can preview your selection using the node's playback controls.
When you confirm the trim, the selected clip becomes a new Audio Node on the canvas — the original node is unchanged. The two nodes are not connected.
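The non-destructive behavior of Trim can be illustrated with an immutable node: trimming slices the audio into a new node and leaves the source untouched. This is a minimal sketch, assuming audio is held as a tuple of samples with a known sample rate; `AudioNode`, `trim_to_new_node`, and the ID format are hypothetical, and no edge is created between the two nodes.

```python
import uuid
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class AudioNode:
    """Hypothetical Audio Node holding raw samples; frozen so trims cannot mutate it."""
    node_id: str
    samples: Tuple[float, ...]
    sample_rate: int  # samples per second


def trim_to_new_node(node: AudioNode, start_s: float, end_s: float) -> AudioNode:
    """Return a new, unconnected Audio Node containing [start_s, end_s)."""
    start = int(start_s * node.sample_rate)
    end = min(int(end_s * node.sample_rate), len(node.samples))
    if start < 0 or start >= end:
        raise ValueError("trim range is empty or out of bounds")
    return AudioNode(
        node_id=f"audio-{uuid.uuid4().hex[:8]}",  # hypothetical ID scheme
        samples=node.samples[start:end],            # slicing copies; original untouched
        sample_rate=node.sample_rate,
    )
```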
History
Click History to view all audio clips previously generated on this node. Each entry shows the generation date, model used, duration, and relevant parameters (for example, the voice name for Text to Speech).
From the history panel you can:
- Play any past clip
- Trim or Download a past clip
- Set as Main — replaces the current display area content with the selected history clip. Any downstream Video Nodes connected to this node will show an "input updated" notification.
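The Set as Main action can be sketched as swapping the node's displayed clip and flagging each connected Video Node. This is a hypothetical model: `HistoryEntry`, `AudioNodeWithHistory`, and the `notifications` dictionary are illustrative stand-ins for however the canvas actually tracks history and surfaces the "input updated" notice.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class HistoryEntry:
    clip_id: str
    model: str
    duration_s: float


@dataclass
class AudioNodeWithHistory:
    """Hypothetical Audio Node with generation history and downstream links."""
    node_id: str
    main_clip_id: str
    history: List[HistoryEntry] = field(default_factory=list)
    downstream_video_ids: List[str] = field(default_factory=list)


def set_as_main(node: AudioNodeWithHistory, clip_id: str,
                notifications: Dict[str, str]) -> None:
    """Make a history clip the displayed clip and flag downstream Video Nodes."""
    if clip_id not in {entry.clip_id for entry in node.history}:
        raise KeyError(f"{clip_id!r} is not in this node's history")
    node.main_clip_id = clip_id
    for video_id in node.downstream_video_ids:
        notifications[video_id] = "input updated"
```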
Content and deletion
When you delete an Audio Node, its audio content is saved to your account assets — it is not lost.
