Tech Tips: Transcribe Audio Files Sent via Email

by Neeraj Verma March 29, 2023

The versatility of APIs is truly astounding, as they empower developers to interconnect systems, share data, and automate processes in unique and groundbreaking ways. In this blog post, we’ll explore how developers can create a tool that transforms audio files sent to an email address into transcriptions with ease.

Imagine a scenario where an agent wants to transcribe an exceptional customer service conversation. Rather than requiring agents to log into ElevateAI, upload audio files, and download transcriptions, developers can construct an internal service to streamline the process by ingesting audio files, transcribing them, and delivering the transcripts directly.

Sounds good, right? Well, let’s start building!

A Step-by-Step Guide to Seamless Audio File Integration via ElevateAI

You can download sample code with an implementation from its GitHub repository. If you want to send ElevateAI files in bulk, consider importing multiple audio files using the command line.

The GitHub repository references a submodule, the ElevateAI Python SDK. We’ll use the ElevateAI.py in the SDK to interface with the ElevateAI API.

What is the process? At a high level:

Access an email account and locate an email that has an audio attachment
Download and save the attachment
Transcribe the audio file attachment
Email the transcript back

For the transcription part of the code, the steps are:

Tell ElevateAI that you want to transcribe an audio file
Upload the file
Download the transcripts and CX insights when ElevateAI is done

Four functions in ElevateAI.py do the heavy lifting: DeclareAudioInteraction, UploadInteraction, GetPunctuatedTranscript (or GetWordByWordTranscription), and GetAIResults.

Let’s dive in!

Step 1. Configure

Read a configuration file that has settings to send and receive emails.

Essentially, we want to pull out the IMAP and SMTP hostnames, usernames, and passwords, along with the ElevateAI API token.

import json
import sys

def read_config(filename):
    """
    Read and parse the configuration file.
    """
    try:
        with open(filename, 'r') as f:
            config = json.load(f)
            # Every one of these fields must be present in the JSON file
            required_fields = ['imap_server', 'imap_username', 'imap_password',
                               'smtp_server', 'smtp_username', 'smtp_password', 'api_token']
            for field in required_fields:
                if field not in config:
                    raise ValueError(f"Config file is missing required field: {field}")
            return config
    except FileNotFoundError:
        print(f'Error: Config file "{filename}" not found.')
        sys.exit(1)
    except json.JSONDecodeError:
        print(f'Error: Config file "{filename}" is not valid JSON.')
        sys.exit(1)
    except ValueError as e:
        print(f'Error: {e}')
        sys.exit(1)
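For reference, a config file that passes the checks in read_config might look like the sketch below. The field names come from the required_fields list above; the hostnames, credentials, the file name config.json, and the helper missing_fields are placeholders assumed for illustration.

```python
import json
import os
import tempfile

# Field names are taken from read_config; every value is a placeholder.
sample_config = {
    "imap_server": "imap.example.com",
    "imap_username": "transcribe@example.com",
    "imap_password": "app-specific-password",
    "smtp_server": "smtp.example.com",
    "smtp_username": "transcribe@example.com",
    "smtp_password": "app-specific-password",
    "api_token": "your-elevateai-api-token",
}

REQUIRED_FIELDS = list(sample_config)

def missing_fields(config):
    """Return the required fields that are absent from a parsed config."""
    return [field for field in REQUIRED_FIELDS if field not in config]

# Write the sample to a temporary config.json and read it back.
config_path = os.path.join(tempfile.mkdtemp(), "config.json")
with open(config_path, "w") as f:
    json.dump(sample_config, f, indent=2)

with open(config_path) as f:
    loaded = json.load(f)

print(missing_fields(loaded))                                # complete config
print(missing_fields({"imap_server": "imap.example.com"}))   # incomplete config
```

Since the file holds passwords and an API token, it is worth keeping it out of version control.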

Step 2. Retrieve

Find the latest email with ‘Transcribe’ in the subject.

For the sake of this exercise, we will only retrieve a specific email, but a POC will require a more robust implementation.

# Search for the newest email with "Transcribe" in the subject.
# SORT is an IMAP extension (RFC 5256), so the server must support it.
search_criteria = 'DATE'
result, data = imap.sort(search_criteria, 'UTF-8', 'SUBJECT "Transcribe"')
if result != 'OK' or not data or not data[0]:
    raise Exception('No email with "Transcribe" in the subject was found')
latest_email_id = data[0].split()[-1]

# Fetch the full message and parse it
result, data = imap.fetch(latest_email_id, "(RFC822)")
raw_email = data[0][1]
email_message = email.message_from_bytes(raw_email)

# Populated later: where the attachment is saved and who to reply to
attachment_path = None
sender_address = None
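The imap.sort call relies on the SORT extension (RFC 5256), which not every mail server offers. As a hedged fallback, the sketch below uses the plain SEARCH command instead and picks the highest message sequence number, which is the message most recently added to the mailbox. It also shows one way to read the reply address from the From header. The function names and the FakeImap stand-in are hypothetical; in real use, imap would be an imaplib.IMAP4_SSL connection after login() and select('INBOX').

```python
import email
import email.utils

def find_latest_transcribe_id(imap):
    """Return the id of the newest message with 'Transcribe' in the subject,
    or None when nothing matches."""
    status, data = imap.search(None, 'SUBJECT', '"Transcribe"')
    if status != 'OK' or not data or not data[0]:
        return None
    # Sequence numbers grow as messages are added to the mailbox,
    # so the largest number is the latest arrival.
    return max(data[0].split(), key=int)

def get_sender_address(email_message):
    """Extract the bare address from the From header."""
    return email.utils.parseaddr(email_message.get('From', ''))[1]

class FakeImap:
    """Stand-in for an IMAP connection so the sketch runs without a server."""
    def __init__(self, ids):
        self.ids = ids

    def search(self, charset, *criteria):
        return 'OK', [self.ids]

latest_id = find_latest_transcribe_id(FakeImap(b'3 7 12'))
print(latest_id)

raw = b"From: Jane Agent <jane@example.com>\r\nSubject: Transcribe\r\n\r\nSee attachment."
print(get_sender_address(email.message_from_bytes(raw)))
```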

Step 3. Download

Download the attachment and save it in a temporary directory.

Use Python’s built-in email handling functionality to download the email attachment and store it.

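import os

for part in email_message.walk():
    # Skip container parts; only leaf parts can carry an attachment
    if part.get_content_maintype() == 'multipart':
        continue
    # Skip parts with no Content-Disposition header (typically the message body)
    if part.get('Content-Disposition') is None:
        continue

    filename = part.get_filename()
    if not filename:
        continue

    # Save the attachment to a temporary file. Use only the base name so a
    # crafted filename cannot write outside tmp_folder.
    file_name = os.path.basename(filename)
    attachment_path = os.path.join(tmp_folder, file_name)
    with open(attachment_path, 'wb') as f:
        f.write(part.get_payload(decode=True))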
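The loop above saves every attachment it finds. A slightly stricter variant, sketched below, keeps only the first attachment whose extension looks like audio and strips any directory components from the sender-supplied filename. The function name save_audio_attachment and the AUDIO_EXTENSIONS allow-list are assumptions for illustration, not a statement of which formats ElevateAI accepts. The test message is built locally so the sketch runs without a mailbox.

```python
import email
import os
import tempfile
from email.message import EmailMessage

# Illustrative allow-list; adjust it to the formats you expect to receive.
AUDIO_EXTENSIONS = {'.wav', '.mp3'}

def save_audio_attachment(email_message, folder):
    """Save the first audio attachment into folder and return its path,
    or None if the message carries no matching attachment."""
    for part in email_message.walk():
        if part.get_content_maintype() == 'multipart':
            continue
        filename = part.get_filename()
        if not filename:
            continue
        # Keep only the final path component so a name such as
        # "../call.wav" cannot escape the target folder.
        safe_name = os.path.basename(filename)
        if os.path.splitext(safe_name)[1].lower() not in AUDIO_EXTENSIONS:
            continue
        path = os.path.join(folder, safe_name)
        with open(path, 'wb') as f:
            f.write(part.get_payload(decode=True))
        return path
    return None

# Build a small message locally instead of fetching one over IMAP.
msg = EmailMessage()
msg['Subject'] = 'Transcribe'
msg.set_content('Please transcribe the attached call.')
msg.add_attachment(b'RIFF-fake-audio-bytes', maintype='audio', subtype='wav',
                   filename='../call.wav')
parsed = email.message_from_bytes(bytes(msg))

tmp_folder = tempfile.mkdtemp()
saved_path = save_audio_attachment(parsed, tmp_folder)
print(saved_path)
```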

Step 4. Transcribe

Declare the interaction, upload the audio file, and wait for ElevateAI to transcribe the audio file.

Send the audio to ElevateAI for transcription. Block and wait till the file is processed.

import time

# Declare the interaction so ElevateAI knows an audio file is coming
declareResp = ElevateAI.DeclareAudioInteraction(languageTag, vert, None, token, transcriptionMode, True)
declareJson = declareResp.json()
interactionId = declareJson["interactionIdentifier"]

if localFilePath is None:
    raise Exception('Something wrong with attachment')

# Upload the audio file for the declared interaction
uploadInteractionResponse = ElevateAI.UploadInteraction(interactionId, token, localFilePath, fileName)

# Poll the status every 15 seconds until it is processed or has failed
terminal_statuses = {"processed", "fileUploadFailed", "fileDownloadFailed", "processingFailed"}
while True:
    getInteractionStatusResponse = ElevateAI.GetInteractionStatus(interactionId, token)
    status = getInteractionStatusResponse.json()["status"]
    if status in terminal_statuses:
        break
    time.sleep(15)
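The polling loop above runs until ElevateAI reports a final status, with no upper bound on the wait. One possible refinement, sketched here under assumptions, wraps the loop in a helper with a timeout. The helper name wait_for_terminal_status, the 30-minute default, and the "pending" status in the demo are all hypothetical; only the four terminal status names come from the code above.

```python
import time

# Terminal statuses as listed in the polling loop above
TERMINAL_STATUSES = {"processed", "fileUploadFailed",
                     "fileDownloadFailed", "processingFailed"}

def wait_for_terminal_status(get_status, interval=15, timeout=1800,
                             sleep=time.sleep, clock=time.monotonic):
    """Call get_status() every `interval` seconds until it returns a terminal
    status; raise TimeoutError once `timeout` seconds have passed."""
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"Gave up waiting; last status was {status!r}")
        sleep(interval)

# Demo with canned statuses; "pending" is a placeholder, not a documented value.
fake_statuses = iter(["pending", "pending", "processed"])
final_status = wait_for_terminal_status(lambda: next(fake_statuses), interval=0)
print(final_status)
```

With the SDK calls shown earlier, get_status could be a lambda such as `lambda: ElevateAI.GetInteractionStatus(interactionId, token).json()["status"]`. Returning the final status lets the caller distinguish "processed" from the failure statuses before asking for a transcript.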

Step 5. Convert

Convert the transcription, which is in JSON format, into a regular text file.

Once we have the JSON, parse it so that it reads like a conversation, then store it.

import json
import os
import tempfile
from itertools import groupby

def print_conversation(json_str):
    """
    Print the transcript as a conversation, save it to a text file,
    and return the path of that file.
    """
    data = json.loads(json_str)
    filename = 'transcript.txt'
    tmp_folder = tempfile.mkdtemp()
    attachment_path = os.path.join(tmp_folder, filename)
    print("=== Begin Transcription Output ===\n\n")

    with open(attachment_path, 'w') as f:
        # Group consecutive segments spoken by the same participant into one turn
        for participant, segments in groupby(data['sentenceSegments'],
                                             key=lambda s: s['participant']):
            # Only the two known participants are written out
            if participant not in ('participantOne', 'participantTwo'):
                continue
            phrases = " ".join(s['phrase'] for s in segments).strip()
            if phrases:
                print(participant + ":\n" + phrases + "\n")
                f.write(participant + ":\n" + phrases + "\n\n")

        print("=== End Transcription Output ===\n\n")

    return attachment_path
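To make the expected input concrete, here is a small sketch of the same turn-by-turn grouping applied to a hand-written payload. Only the sentenceSegments, participant, and phrase keys are taken from the code above; the sample phrases are invented, a real response may carry additional fields, and the helper name to_turns is hypothetical.

```python
import json
from itertools import groupby

# Hand-written sample; a real ElevateAI response may contain more fields.
sample_json = json.dumps({
    "sentenceSegments": [
        {"participant": "participantOne", "phrase": "Thanks for calling."},
        {"participant": "participantOne", "phrase": "How can I help?"},
        {"participant": "participantTwo", "phrase": "I need to reset my password."},
        {"participant": "participantOne", "phrase": "Sure, one moment."},
    ]
})

def to_turns(json_str):
    """Merge consecutive segments from the same participant into
    (participant, text) tuples, preserving conversation order."""
    data = json.loads(json_str)
    turns = []
    for participant, segments in groupby(data["sentenceSegments"],
                                         key=lambda s: s["participant"]):
        text = " ".join(s["phrase"] for s in segments).strip()
        if text:
            turns.append((participant, text))
    return turns

for participant, text in to_turns(sample_json):
    print(f"{participant}:\n{text}\n")
```

Returning the turns as data, rather than printing inside the loop, makes the grouping easy to check before anything is written to disk.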

Step 6. Email

Send the text file back through email.

Create a new email, attach the transcription, and send it back to the original sender.

import os
import smtplib
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart

def send_email_with_attachment(attachment_path, recipient_address, config):
    """Email the transcript file to the given recipient."""
    smtp_server = config["smtp_server"]
    smtp_username = config["smtp_username"]
    smtp_password = config["smtp_password"]

    # Create a message object
    message = MIMEMultipart()
    message['From'] = smtp_username
    message['To'] = recipient_address
    message['Subject'] = "Completed Transcription"

    # Add the attachment; read it as bytes so non-ASCII text is encoded intact
    with open(attachment_path, 'rb') as f:
        attachment = MIMEApplication(f.read(), _subtype='txt')
    attachment.add_header('Content-Disposition', 'attachment', filename=os.path.basename(attachment_path))
    message.attach(attachment)

    # Log in and send; the with block closes the connection even on error
    with smtplib.SMTP_SSL(smtp_server) as smtp:
        smtp.login(smtp_username, smtp_password)
        print("SMTP logged in.")
        smtp.send_message(message)
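Because sending requires a live SMTP server, it can help to build the message separately and inspect it first. The sketch below does that with the standard library's EmailMessage class, attaching the transcript as text/plain. The function name build_transcript_email, the body text, and the example addresses are assumptions for illustration.

```python
import os
import tempfile
from email.message import EmailMessage

def build_transcript_email(attachment_path, sender_address, recipient_address):
    """Return an EmailMessage with the transcript attached as text/plain."""
    message = EmailMessage()
    message['From'] = sender_address
    message['To'] = recipient_address
    message['Subject'] = "Completed Transcription"
    message.set_content("Your transcript is attached.")

    with open(attachment_path, 'r', encoding='utf-8') as f:
        transcript = f.read()
    # A str payload is attached as text/plain with UTF-8 encoding
    message.add_attachment(transcript, filename=os.path.basename(attachment_path))
    return message

# Write a tiny transcript to a temporary file, then build the reply.
transcript_path = os.path.join(tempfile.mkdtemp(), 'transcript.txt')
with open(transcript_path, 'w', encoding='utf-8') as f:
    f.write("participantOne:\nHello, how can I help?\n\n")

message = build_transcript_email(transcript_path,
                                 'transcribe@example.com', 'agent@example.com')
attachment = next(message.iter_attachments())
print(attachment.get_filename(), attachment.get_content_type())
```

The returned message can be passed to smtp.send_message in the same way as the MIMEMultipart message above.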

Sample code can be found on GitHub.

Want more? Visit our Documentation Hub >> ElevateAI Documentation

Ready to Get Started? >> elevateai.com/getstarted

Neeraj Verma

Neeraj has extensive experience in the enterprise software space, having joined speech technology pioneer Nexidia straight out of college and spent his career in technology and customer experience. He transitioned to NICE with their 2016 acquisition of Nexidia and currently serves as the Vice President of Artificial Intelligence (AI), leading ElevateAI by NICE.
