POST /stt

Transcribes audio files to text using OpenAI’s Whisper model via Mastra’s voice capabilities.

Request Body

Content-Type: multipart/form-data
audio
File
required
Audio file to transcribe. Supported formats: MP3, WAV, M4A, FLAC, OGG, WEBM, etc.

Request Example

// Using FormData for file upload
async function transcribeAudio(audioFile) {
  const formData = new FormData();
  formData.append('audio', audioFile);

  const response = await fetch('https://oyester.metaphy.live/stt', {
    method: 'POST',
    body: formData
  });

  if (!response.ok) {
    throw new Error(`Transcription failed: ${response.status}`);
  }

  const result = await response.json();
  return result.transcript;
}

Response

transcript
string
The transcribed text from the audio file.
success
boolean
Always true for successful transcriptions.
timestamp
string
ISO timestamp of when transcription was completed.
processingTime
number
Processing time in milliseconds.

Success Response (200)

{
  "transcript": "How can I help you find inner peace today?",
  "success": true,
  "timestamp": "2025-11-21T10:30:00.000Z",
  "processingTime": 2450
}
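When consuming the response, it can help to validate the expected fields before using them. A minimal sketch (the field names match the response shape above; the helper name is illustrative):

```javascript
// Validate an /stt response body and extract its fields.
// Throws if the expected fields are missing or malformed.
function parseSttResponse(body) {
  if (!body || body.success !== true || typeof body.transcript !== 'string') {
    throw new Error('Unexpected /stt response shape');
  }
  return {
    transcript: body.transcript,
    completedAt: new Date(body.timestamp), // ISO timestamp from the server
    processingMs: body.processingTime      // milliseconds
  };
}
```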

Error Responses

Code  Description
400   No audio file provided or invalid file format
500   Audio processing failed or API key issues

Supported Audio Formats

  • MP3
  • WAV
  • M4A
  • FLAC
  • OGG
  • WEBM
  • And other formats supported by OpenAI Whisper

File Size Limits

  • Maximum file size: 25MB (OpenAI API limit)
  • Recommended: Keep files under 10MB for faster processing
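Checking the file size on the client before uploading avoids a wasted round trip for oversized files. A minimal sketch (the limits come from the bullets above; the function name is illustrative):

```javascript
const MAX_BYTES = 25 * 1024 * 1024;         // hard limit (OpenAI API)
const RECOMMENDED_BYTES = 10 * 1024 * 1024; // soft limit for faster processing

// Returns { ok, warning } for a file-like object with a numeric `size` in bytes.
function checkAudioSize(file) {
  if (file.size > MAX_BYTES) {
    return { ok: false, warning: 'File exceeds the 25MB API limit.' };
  }
  if (file.size > RECOMMENDED_BYTES) {
    return { ok: true, warning: 'Files over 10MB may process slowly.' };
  }
  return { ok: true, warning: null };
}
```

Call this with the `File` from the input or drop handler before building the `FormData`.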

Language Support

  • Default: English (en-US)
  • Multi-language: Whisper automatically detects the spoken language
  • Best results: clear audio and pronunciation, especially for non-English speech

Frontend Integration Examples

File Input Handler

// HTML
<input type="file" id="audioInput" accept="audio/*">

// JavaScript
document.getElementById('audioInput').addEventListener('change', async (event) => {
  const file = event.target.files[0];
  if (file) {
    try {
      const transcript = await transcribeAudio(file);
      document.getElementById('transcript').textContent = transcript;
    } catch (error) {
      console.error('Transcription error:', error);
    }
  }
});

Drag and Drop

const dropZone = document.getElementById('dropZone');

dropZone.addEventListener('dragover', (e) => {
  e.preventDefault();
  dropZone.classList.add('dragover');
});

dropZone.addEventListener('dragleave', () => {
  dropZone.classList.remove('dragover');
});

dropZone.addEventListener('drop', async (e) => {
  e.preventDefault();
  dropZone.classList.remove('dragover');

  const files = e.dataTransfer.files;
  if (files.length > 0) {
    const file = files[0];
    if (file.type.startsWith('audio/')) {
      try {
        const transcript = await transcribeAudio(file);
        console.log('Transcription:', transcript);
      } catch (error) {
        console.error('Error:', error);
      }
    }
  }
});

Recording and Transcription

class AudioRecorder {
  constructor() {
    this.mediaRecorder = null;
    this.audioChunks = [];
  }

  async startRecording() {
    const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
    this.mediaRecorder = new MediaRecorder(stream);

    this.mediaRecorder.ondataavailable = (event) => {
      this.audioChunks.push(event.data);
    };

    this.mediaRecorder.onstop = async () => {
      // Browsers typically record WebM/Opus, not WAV; use the recorder's actual MIME type
      const mimeType = this.mediaRecorder.mimeType || 'audio/webm';
      const audioBlob = new Blob(this.audioChunks, { type: mimeType });
      const audioFile = new File([audioBlob], 'recording.webm', { type: mimeType });

      try {
        const transcript = await transcribeAudio(audioFile);
        console.log('Live transcription:', transcript);
      } catch (error) {
        console.error('Transcription failed:', error);
      }
    };

    this.audioChunks = [];
    this.mediaRecorder.start();
  }

  stopRecording() {
    if (this.mediaRecorder && this.mediaRecorder.state === 'recording') {
      this.mediaRecorder.stop();
    }
  }
}

// Usage
const recorder = new AudioRecorder();

// Start recording
document.getElementById('startBtn').addEventListener('click', () => {
  recorder.startRecording();
});

// Stop recording
document.getElementById('stopBtn').addEventListener('click', () => {
  recorder.stopRecording();
});

cURL Examples

Basic File Upload

# Using a local audio file
curl -X POST https://oyester.metaphy.live/stt \
  -F "audio=@/path/to/your/audio.mp3"

# Example response
{
  "transcript": "How can I help you today?",
  "success": true,
  "timestamp": "2025-11-21T10:30:00.000Z",
  "processingTime": 2450
}

With Custom Headers

# With custom headers
curl -X POST https://oyester.metaphy.live/stt \
  -H "Authorization: Bearer your-token" \
  -F "audio=@meditation_question.wav"

Error Handling

async function transcribeAudio(audioFile) {
  const formData = new FormData();
  formData.append('audio', audioFile);

  let response;
  try {
    response = await fetch('https://oyester.metaphy.live/stt', {
      method: 'POST',
      body: formData
    });
  } catch (error) {
    // fetch only rejects on network-level failures, never on HTTP error statuses
    console.error('Transcription error:', error);
    throw new Error('Network error. Please check your connection.');
  }

  if (!response.ok) {
    // The error body may not always be JSON, so fall back to statusText
    const errorData = await response.json().catch(() => ({}));
    const detail = errorData.message || response.statusText;

    if (response.status === 400) {
      throw new Error(`Invalid audio file. Please check the format and try again. (${detail})`);
    }
    if (response.status >= 500) {
      throw new Error(`Server error. Please try again later. (${detail})`);
    }
    throw new Error(`Transcription failed (${response.status}): ${detail}`);
  }

  const result = await response.json();
  return result.transcript;
}

Best Practices

File Format: Use MP3 or WAV for best compatibility and smaller file sizes.
Audio Quality: Higher quality audio generally produces better transcriptions.
File Size: Compress audio files when possible to reduce upload time and processing costs.
Error Handling: Always implement proper error handling for network issues and API failures.
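For transient network or 5xx failures, a simple retry with backoff is often worth adding around the upload call. A sketch (the helper and its parameters are illustrative, not part of the API):

```javascript
// Retry an async operation up to `attempts` times, waiting longer between each try.
async function withRetry(operation, attempts = 3, delayMs = 500) {
  let lastError;
  for (let i = 0; i < attempts; i++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (i < attempts - 1) {
        // Linear backoff: 500ms, 1000ms, ...
        await new Promise((resolve) => setTimeout(resolve, delayMs * (i + 1)));
      }
    }
  }
  throw lastError;
}

// Usage: const transcript = await withRetry(() => transcribeAudio(file));
```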