Applications Blog Knowledge Base Products and services

Transcribing voicemails to text with Amazon AWS

For this project we are going to use the Amazon AWS Transcribe service, AWS Transcribe is a cloud-based speech recognition service that converts audio recordings into accurate text transcripts. It uses advanced machine learning algorithms to identify different speakers and punctuation, while also supporting a variety of audio formats and languages. AWS Transcribe can transcribe audio from sources such as phone calls, video recordings, and live streams, making it a versatile tool thats idealy suited for voicemail transcription, The service is highly scalable and cost-effective.

We will say that we used to use Google’s Text to speech engine for thsi but over time I would have expected quality of transcription to have improved, But with Google this is not the case, and I expect this is because they possibly use “predictive” text to speech and not sample all the words as this example below shows, This is the same audio fed to Google and AWS

Amazon AWS Transcribe

Um, this is Ian. I’d like to order some pizza for tomorrow, please. We would like to order a pepperoni pizza and a mozzarella pizza that’s for  tomorrow at five PM. Thank you.

Google Speech to Text

like to order some pizza for tomorrow please would like to order a pepperoni pizza and a mozzarella Pizza Hut for tomorrow at 5 a.m. thank you

As can be seen google misses words and adds others, As you can imagine this isnt what you want with speech transcription.

So we have switched out old script to use AWS.

For this project on Freepbx you need a few extra applications added and a amazon aws account, setting this up is not covered here as you should already have knowledge of this if you are here.

The extra apps are , aws , jq , sox

to get aws :

curl "" -o ""
unzip -qq

Then you need to configure as 'root' and as 'asterisk', so:
aws configure
fill out your aws key and token as well as the region your bucket is in 
Then repeat as 'asterisk' so
su asterisk
aws configure
and fill out same details.

for jq and sox, just yum install xxx as you would for any other program.

Next you need the asterisk dialplan added to the extensions_custom.conf

exten => _XXXX,1,Set(__EXTTOCALL=${EXTEN})
exten => _XXXX,n,Noop(${EXTTOCALL})
exten => _XXXX,n,Goto(s,1)

exten => s,1,Answer()  ; Listen to ringing for 1 seconds
exten => s,n,Set(AGC(rx)=8000)
exten => s,n,Set(DENOISE(rx)=on)
exten => s,n,Noop(${EXTTOCALL} , ${DIALSTATUS} , ${SV_DIALSTATUS})
exten => s,n,GotoIf($["${DIALSTATUS}"="BUSY"]?busy:bnext)
exten => s,n(busy),Set(greeting=busy)
exten => s,n,Goto(carryon)
exten => s,n(bnext),GotoIf($["${DIALSTATUS}"="NOANSWER"]?unavail:unext)
exten => s,n(unavail),Set(greeting=unavail)
exten => s,n,Goto(carryon)
exten => s,n(unext),Set(greeting=unavail)
exten => s,n,Goto(carryon)

exten => s,n(carryon),Set(origmailbox=${EXTTOCALL})
exten => s,n,Set(msg=${STAT(e,${ASTSPOOLDIR}/voicemail/default/${origmailbox}/${greeting}.wav)})
exten => s,n,Set(__start=0)
exten => s,n,Set(__end=0)
exten => s,n,NoOp(${UNIQUEID})
exten => s,n,Set(origdate=${STRFTIME(${EPOCH},,%a %b %d %r %Z %G)})
exten => s,n,Set(origtime=${EPOCH})
exten => s,n,Set(callerchan=${CHANNEL})
exten => s,n,Set(callerid=${CALLERID(num)})
exten => s,n,Set(origmailbox=${origmailbox})
exten => s,n,Answer()
exten => s,n,GotoIf($["${msg}"="1"]?msgy:msgn)
exten => s,n(msgy),Playback(${ASTSPOOLDIR}/voicemail/default/${origmailbox}/${greeting});(local/catreq/how_did)
exten => s,n,Goto(beep)
exten => s,n(msgn),Playback(vm-intro)
exten => s,n(beep),System(/bin/touch /var/lib/asterisk/sounds/catline/${UNIQUEID}.wav)
exten => s,n,Playback(beep)
exten => s,n,Set(__start=${EPOCH})
exten => s,n,Record(catline/${UNIQUEID}.wav,3,60,kaqu)
exten => s,n,Playback(beep)
exten => s,n,Hangup()
exten => h,1,Noop(${start} ${end})
exten => h,n,GotoIf($["${start}"!="0"]?ok:end)
exten => h,n(ok),Set(end=${EPOCH})
exten => h,n,Set(duration=${MATH(${end}-${start},int)})
exten => h,n,System(/usr/local/sbin/ "${callerchan}" ${callerid} "${origdate}" ${origtime} ${origmailbox} ${UNIQUEID} ${duration})
exten => h,n(end),Noop(finished)

as can be seen this dialplan records a file and then runs the script. This script collects the variables and passes them over to the main script and exits after doing so, this is so channels aren’t held while transcription takes place. (Thats the plan anyway)

export callerchan
export callerid
export origdate
export origtime
export origmailbox
export origdir
export duration

nohup /usr/local/sbin/ &

Main script

sleep 4

FILENUM=$(/bin/ls ${DIRPATH}${origmailbox}/INBOX |/bin/grep txt | /usr/bin/wc -l)

##Added to allow 999 messages
if  (( $FILENUM <= 9 ));
elif (( $FILENUM <= 99 ));

IN=$(/bin/grep "${origmailbox}=" /etc/asterisk/voicemail.conf)
set -- "$IN"
IFS=","; declare -a Array=($*)

/bin/echo "[message]" >> ${DIRPATH}${origmailbox}/INBOX/${FILENAME}.txt
/bin/echo origmailbox=${origmailbox} >> ${DIRPATH}${origmailbox}/INBOX/${FILENAME}.txt
/bin/echo "context=demo" >> ${DIRPATH}${origmailbox}/INBOX/${FILENAME}.txt
/bin/echo "macrocontext=" >> ${DIRPATH}${origmailbox}/INBOX/${FILENAME}.txt
/bin/echo "exten=s" >> ${DIRPATH}${origmailbox}/INBOX/${FILENAME}.txt
/bin/echo "priority=11" >> ${DIRPATH}${origmailbox}/INBOX/${FILENAME}.txt
/bin/echo callerchan=${callerchan} >> ${DIRPATH}${origmailbox}/INBOX/${FILENAME}.txt
/bin/echo callerid=${callerid} >> ${DIRPATH}${origmailbox}/INBOX/${FILENAME}.txt
/bin/echo origdate=${origdate} >> ${DIRPATH}${origmailbox}/INBOX/${FILENAME}.txt
/bin/echo origtime=${origtime} >> ${DIRPATH}${origmailbox}/INBOX/${FILENAME}.txt
/bin/echo msg_id=${origtime}-00000001  >> ${DIRPATH}${origmailbox}/INBOX/${FILENAME}.txt
/bin/echo "flag="  >> ${DIRPATH}${origmailbox}/INBOX/${FILENAME}.txt
/bin/echo "category=" >> ${DIRPATH}${origmailbox}/INBOX/${FILENAME}.txt
/bin/echo "duration=${duration}" >> ${DIRPATH}${origmailbox}/INBOX/${FILENAME}.txt

    /bin/nice /usr/bin/lame -b 16 -m m -q 9-resample /var/lib/asterisk/sounds/catline/${origdir}.wav  /tmp/${origdir}.mp3

    # Create a string based on the current date and time
    current_date_time="$(date +%Y-%m-%d_%H-%M-%S)"

    # Upload to the S3 Bucket

    aws --debug --profile default s3 cp /tmp/${origdir}.mp3 s3://$S3_BUCKET/$current_date_time

    # Start the transcription job
    output=$(aws --profile default transcribe start-transcription-job \
    --transcription-job-name $current_date_time \
    --language-code en-GB \
    --media-format mp3 \
    --media MediaFileUri=s3://$S3_BUCKET/$current_date_time \
    --output-bucket-name $S3_BUCKET)

    # Wait for the transcription to finish
    while [ "$JOB_COMPLETED" = false ]; do
        JOB_STATUS=$(aws --profile default transcribe get-transcription-job \
            --transcription-job-name $current_date_time \
            --query 'TranscriptionJob.TranscriptionJobStatus' \
            --output text)
        if [ "$JOB_STATUS" = "FAILED" ]; then
                /bin/echo "$JOB_STATUS"    >> /tmp/logfile.txt

        if [ "$JOB_STATUS" = "COMPLETED" ]; then
                /bin/echo "$JOB_STATUS"    >> /tmp/logfile.txt

        sleep 5
        echo $counter    >> /tmp/logfile.txt
        /bin/echo "$JOB_STATUS"    >> /tmp/logfile.txt
                if [ "$counter" -eq  "15" ]; then

    # Get the transcription result
    aws s3 --profile default cp s3://$S3_BUCKET/$current_date_time.json /tmp/$current_date_time.json 

    # Get the transcription result
    FILTERED=$(jq -r '.results.transcripts[].transcript' /tmp/$current_date_time.json)
        # append result of transcription
        if [ -z "$FILTERED" ]
          echo "(AWS was unable to recognize any speech in audio data.)" >> /tmp/${origdir}.txt
          echo "$FILTERED" >> /tmp/${origdir}.txt
          sed -i 's/ Um,/ /gI' /tmp/${origdir}.txt

voicemailbody=$(cat "/tmp/${origdir}.txt")

# echo "body ${voicemailbody}"

/bin/cp /var/lib/asterisk/sounds/catline/${origdir}.wav ${DIRPATH}${origmailbox}/INBOX/${FILENAME}.wav

echo -e "You have a new voicemail from ${callerid} it was left on ${origdate} and is ${duration} seconds long,\nThe message left,\n\n${voicemailbody}\n\nTranscribed by the Amazon AWS Transcribe service\n" | /bin/mail -s "A new voicemail has arrived from ${callerid}" -a "/tmp/${origdir}.mp3" "$email"

/bin/rm -f /tmp/${origdir}.mp3
/bin/rm -f /tmp/${origdir}.txt
aws --profile default transcribe delete-transcription-job --transcription-job-name $current_date_time

Then to pass calls to this and not normal voicemail, In Freepbx create a Custom Destination as “vmail2text,s,1” and if you require certain queues to go to specific mailboxes for example 2000 one like “vmail2text,2000,1” so calls will be sent to mailbox 2000 and teh transcriptions will be sent to the email address linked to that extension

Then in extensions that want to use transcription set the “Optional Destinations” in the advanced tab to the custom destination.

Users also can listen to voicemail normally from their handset or the ucp.

These scripts arent only useful for voicemail then can be used fro questionnaire lines and booking lines, anywhere you want to speed up the handling of voice messages. We will soon be looking at ways of integrating this with Whatsapp so transcriptions can be sent to your mobile.

If you have any further questions please email.