General Audio Processing Flow

  1. Initialization: The AudioReportService is initialized with parameters such as the audio file path, user email, organization name, number of speakers, and a flag indicating whether the number of speakers is known.
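
The initialization parameters could be grouped into a single request object. The following is only a sketch: the class name `AudioReportRequest`, the field names, and the validation rule are assumptions for illustration and are not taken from the AudioReportService code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AudioReportRequest:
    """Hypothetical container for the parameters listed in step 1."""
    audio_path: str
    user_email: str
    organization_name: str
    speakers_known: bool = False
    num_speakers: Optional[int] = None

    def __post_init__(self):
        # Assumed rule: a speaker count is only required when it is declared known.
        if self.speakers_known and (self.num_speakers is None or self.num_speakers < 1):
            raise ValueError("num_speakers must be a positive integer when speakers_known is True")
```

Freezing the dataclass keeps the request unchanged as it passes through the later steps.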

  2. Audio Preprocessing:

    • Convert to WAV Mono: Convert the audio file to WAV format with a single audio channel.

    • Measure Noise: Measure the background noise level in the audio file.

    • Noise Reduction: Reduce background noise in the audio file.
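
As a rough illustration of the first two preprocessing steps, the sketch below downmixes interleaved 16-bit PCM samples to mono and estimates a noise floor in dBFS. It assumes the samples are already decoded into a list of integers. The function names and the "quietest frame" heuristic are assumptions, not the service's actual method, and noise reduction itself is not shown.

```python
import math

FULL_SCALE = 32768  # magnitude of full scale for 16-bit signed PCM


def downmix_to_mono(interleaved, channels):
    """Average each group of interleaved channel samples into one mono sample."""
    if channels < 1 or len(interleaved) % channels:
        raise ValueError("sample count must be a multiple of the channel count")
    return [
        sum(interleaved[i:i + channels]) // channels
        for i in range(0, len(interleaved), channels)
    ]


def rms_dbfs(samples):
    """Return the RMS level of the samples in dB relative to full scale."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms / FULL_SCALE) if rms > 0 else float("-inf")


def estimate_noise_floor(samples, frame_size=160):
    """Crude heuristic: treat the quietest full frame as the background noise level."""
    frames = [
        samples[i:i + frame_size]
        for i in range(0, len(samples) - frame_size + 1, frame_size)
    ]
    if not frames:
        return rms_dbfs(samples)
    return min(rms_dbfs(frame) for frame in frames)
```

The noise floor can then be compared against a threshold to decide how aggressive the noise reduction step needs to be.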

  3. Speaker Diarization and Processing:

    • Diarize Speakers: Identify and segment different speakers in the audio file.

    • Fix Speaker Names: Standardize speaker names in the diarization output.

    • Merge Short Files: Merge short audio segments if necessary.

    • Split Audio: Split the audio file based on timestamps from the diarization output.
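
The post-diarization steps can be sketched on plain data. This example assumes the diarizer returns segments with a start time, an end time, and a raw speaker label. The `Segment` type, the "Speaker N" naming scheme, and the one-second merge threshold are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float   # seconds
    end: float     # seconds
    speaker: str


def fix_speaker_names(segments):
    """Relabel raw diarization labels as 'Speaker 1', 'Speaker 2', ... by first appearance."""
    names = {}
    fixed = []
    for seg in segments:
        name = names.setdefault(seg.speaker, f"Speaker {len(names) + 1}")
        fixed.append(Segment(seg.start, seg.end, name))
    return fixed


def merge_short_segments(segments, min_duration=1.0):
    """Fold a segment shorter than min_duration into the previous one if the speaker matches."""
    merged = []
    for seg in segments:
        prev = merged[-1] if merged else None
        if prev and prev.speaker == seg.speaker and (seg.end - seg.start) < min_duration:
            prev.end = seg.end
        else:
            merged.append(Segment(seg.start, seg.end, seg.speaker))
    return merged


def split_samples(samples, sample_rate, segments):
    """Slice a mono sample list into one chunk per segment using its timestamps."""
    return [
        samples[round(seg.start * sample_rate):round(seg.end * sample_rate)]
        for seg in segments
    ]
```

Splitting after merging means fewer, longer chunks are passed on to the later steps.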

  4. Quality Adjustments:

    • Adjust Decibels: Normalize audio volume levels across segments.

    • Speaker Matching: Match speakers within the meeting.

    • Speaker Tracking: Track speakers across multiple meetings.
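
Volume normalization can be illustrated with a simple RMS-based gain. The sketch below covers only the Adjust Decibels step and assumes 16-bit PCM samples in a list. The target of -20 dBFS and the function names are assumptions, and a production pipeline may use a loudness standard instead of plain RMS.

```python
import math

FULL_SCALE = 32768
INT16_MIN, INT16_MAX = -32768, 32767


def rms_dbfs(samples):
    """Return the RMS level of the samples in dB relative to full scale."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms / FULL_SCALE) if rms > 0 else float("-inf")


def normalize_level(samples, target_dbfs=-20.0):
    """Scale samples so their RMS level matches target_dbfs, clipping to the 16-bit range."""
    current = rms_dbfs(samples)
    if current == float("-inf"):
        return list(samples)  # silence or empty input: nothing to scale
    gain = 10 ** ((target_dbfs - current) / 20)
    return [max(INT16_MIN, min(INT16_MAX, round(s * gain))) for s in samples]
```

Applying the same target to every segment brings quiet and loud speakers to a comparable level before analysis.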

  5. Speech Analysis:

    • Speech-to-Text: Convert speech segments to text.

    • Emotion Detection: Detect emotions in the audio segments.

    • Music Detection: Identify segments containing music.

    • Microphone Quality: Assess the quality of the microphone used.

    • Reverb Detection: Detect reverb in the audio segments.

    • Lag Detection: Identify any lag in the speaker's audio.
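
One way to organize the analysis steps is to run a set of independent analyzers over each segment and collect the results by name. This is a structural sketch only: the `SegmentAnalysis` type, the result keys, and the placeholder analyzers are assumptions, and the real analyzers would call speech, emotion, and acoustic models.

```python
from dataclasses import dataclass, field


@dataclass
class SegmentAnalysis:
    """Hypothetical record holding every analysis result for one segment."""
    segment_id: int
    results: dict = field(default_factory=dict)


def analyze_segment(segment_id, samples, analyzers):
    """Run each named analyzer on the segment's samples and store its output."""
    analysis = SegmentAnalysis(segment_id)
    for name, analyzer in analyzers.items():
        analysis.results[name] = analyzer(samples)
    return analysis


# Placeholder analyzers that return fixed values, standing in for real models.
PLACEHOLDER_ANALYZERS = {
    "transcript": lambda samples: "",
    "emotion": lambda samples: "neutral",
    "has_music": lambda samples: False,
    "microphone_quality": lambda samples: None,
    "has_reverb": lambda samples: False,
    "lag_seconds": lambda samples: 0.0,
}
```

Keeping the analyzers in a dictionary makes it possible to add or disable a detector without changing the loop.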

  6. Data Aggregation and Database Insertion:

    • Speaker Statistics: Calculate statistics for each speaker.

    • Insert Data: Insert processed data into the database.

    • Auto-Rename Speakers: Automatically rename speakers based on detected names.

    • Merge Fragments: Merge audio fragments into cohesive statements.

    • Punctuate Statements: Apply punctuation and capitalization to statements.

    • Split Statements: Split statements into individual sentences.

    • Text Emotions: Detect emotions in the text of the statements.

    • Generate Final Emotions: Map text emotions to speech emotions.

    • Toxicity Detection: Detect toxic language in the transcriptions.

    • Offensive Language Detection: Identify offensive language in the transcriptions.
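
Three of the text-side steps above (speaker statistics, merging fragments, and splitting statements) can be sketched with the standard library. The fragment layout `(speaker, start, end, text)`, the choice of talk time as the statistic, and the punctuation-based sentence splitter are assumptions for illustration.

```python
import re
from collections import defaultdict


def speaker_talk_time(fragments):
    """Sum speaking time in seconds per speaker from (speaker, start, end, text) tuples."""
    totals = defaultdict(float)
    for speaker, start, end, _text in fragments:
        totals[speaker] += end - start
    return dict(totals)


def merge_fragments(fragments):
    """Join consecutive fragments from the same speaker into one statement."""
    statements = []
    for speaker, start, end, text in fragments:
        if statements and statements[-1]["speaker"] == speaker:
            statements[-1]["end"] = end
            statements[-1]["text"] += " " + text
        else:
            statements.append({"speaker": speaker, "start": start, "end": end, "text": text})
    return statements


def split_sentences(text):
    """Split punctuated text after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
```

The splitter relies on punctuation already being present, which matches the order of the steps: statements are punctuated before they are split.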

  7. Report Generation and Delivery:

    • Generate Report Data: Compile data for the final report.

    • Convert to JSON: Format the report data as JSON.

    • Send Report: Send the report via email or API, depending on the request source.
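
The final step can be sketched as compiling a dictionary, serializing it, and choosing a delivery channel. The report keys, the `meeting_id` field, and the request-source labels `"api"` and `"email"` are hypothetical, and the actual sending is omitted.

```python
import json


def build_report(meeting_id, speaker_stats, statements):
    """Compile the pieces produced by earlier steps into one report dictionary."""
    return {
        "meeting_id": meeting_id,
        "speakers": speaker_stats,
        "statements": statements,
    }


def to_json(report):
    """Serialize the report with stable key order so output is easy to compare."""
    return json.dumps(report, indent=2, sort_keys=True)


def delivery_channel(request_source):
    """Pick a channel from the request source; falling back to email is an assumption."""
    return "api" if request_source == "api" else "email"
```

For example, `delivery_channel("api")` selects the API response path, while any other source falls back to email in this sketch.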
