For this project we are going to use the Amazon AWS Transcribe service, AWS Transcribe is a cloud-based speech recognition service that converts audio recordings into accurate text transcripts. It uses advanced machine learning algorithms to identify different speakers and punctuation, while also supporting a variety of audio formats and languages. AWS Transcribe can transcribe audio from sources such as phone calls, video recordings, and live streams, making it a versatile tool thats idealy suited for voicemail transcription, The service is highly scalable and cost-effective.
We will say that we used to use Google’s Text to speech engine for thsi but over time I would have expected quality of transcription to have improved, But with Google this is not the case, and I expect this is because they possibly use “predictive” text to speech and not sample all the words as this example below shows, This is the same audio fed to Google and AWS
Amazon AWS Transcribe
Um, this is Ian. I’d like to order some pizza for tomorrow, please. We would like to order a pepperoni pizza and a mozzarella pizza that’s for tomorrow at five PM. Thank you.
Google Speech to Text
like to order some pizza for tomorrow please would like to order a pepperoni pizza and a mozzarella Pizza Hut for tomorrow at 5 a.m. thank you
As can be seen google misses words and adds others, As you can imagine this isnt what you want with speech transcription.
So we have switched out old script to use AWS.
For this project on Freepbx you need a few extra applications added and a amazon aws account, setting this up is not covered here as you should already have knowledge of this if you are here.
The extra apps are , aws , jq , sox
to get aws :
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip -qq awscliv2.zip
./aws/install
Then you need to configure as 'root' and as 'asterisk', so:
aws configure
fill out your aws key and token as well as the region your bucket is in
Then repeat as 'asterisk' so
su asterisk
aws configure
and fill out same details.
for jq and sox, just yum install xxx as you would for any other program.
Next you need the asterisk dialplan added to the extensions_custom.conf
[vmail2text]
exten => _XXXX,1,Set(__EXTTOCALL=${EXTEN})
exten => _XXXX,n,Noop(${EXTTOCALL})
exten => _XXXX,n,Goto(s,1)
exten => s,1,Answer() ; Listen to ringing for 1 seconds
exten => s,n,Set(AGC(rx)=8000)
exten => s,n,Set(DENOISE(rx)=on)
exten => s,n,Noop(${EXTTOCALL} , ${DIALSTATUS} , ${SV_DIALSTATUS})
exten => s,n,GotoIf($["${DIALSTATUS}"="BUSY"]?busy:bnext)
exten => s,n(busy),Set(greeting=busy)
exten => s,n,Goto(carryon)
exten => s,n(bnext),GotoIf($["${DIALSTATUS}"="NOANSWER"]?unavail:unext)
exten => s,n(unavail),Set(greeting=unavail)
exten => s,n,Goto(carryon)
exten => s,n(unext),Set(greeting=unavail)
exten => s,n,Goto(carryon)
exten => s,n(carryon),Set(origmailbox=${EXTTOCALL})
exten => s,n,Set(msg=${STAT(e,${ASTSPOOLDIR}/voicemail/default/${origmailbox}/${greeting}.wav)})
exten => s,n,Set(__start=0)
exten => s,n,Set(__end=0)
exten => s,n,NoOp(${UNIQUEID})
exten => s,n,Set(origdate=${STRFTIME(${EPOCH},,%a %b %d %r %Z %G)})
exten => s,n,Set(origtime=${EPOCH})
exten => s,n,Set(callerchan=${CHANNEL})
exten => s,n,Set(callerid=${CALLERID(num)})
exten => s,n,Set(origmailbox=${origmailbox})
exten => s,n,Answer()
exten => s,n,GotoIf($["${msg}"="1"]?msgy:msgn)
exten => s,n(msgy),Playback(${ASTSPOOLDIR}/voicemail/default/${origmailbox}/${greeting});(local/catreq/how_did)
exten => s,n,Goto(beep)
exten => s,n(msgn),Playback(vm-intro)
exten => s,n(beep),System(/bin/touch /var/lib/asterisk/sounds/catline/${UNIQUEID}.wav)
exten => s,n,Playback(beep)
exten => s,n,Set(__start=${EPOCH})
exten => s,n,Record(catline/${UNIQUEID}.wav,3,60,kaqu)
exten => s,n,Playback(beep)
exten => s,n,Hangup()
exten => h,1,Noop(${start} ${end})
exten => h,n,GotoIf($["${start}"!="0"]?ok:end)
exten => h,n(ok),Set(end=${EPOCH})
exten => h,n,Set(duration=${MATH(${end}-${start},int)})
exten => h,n,System(/usr/local/sbin/vmailprox.sh "${callerchan}" ${callerid} "${origdate}" ${origtime} ${origmailbox} ${UNIQUEID} ${duration})
exten => h,n(end),Noop(finished)
as can be seen this dialplan records a file and then runs the vmailprox.sh script. This script collects the variables and passes them over to the main script and exits after doing so, this is so channels aren’t held while transcription takes place. (Thats the plan anyway)
#!/bin/sh
callerchan=$1
callerid=$2
origdate=$3
origtime=$4
origmailbox=$5
origdir=$6
duration=$7
export callerchan
export callerid
export origdate
export origtime
export origmailbox
export origdir
export duration
nohup /usr/local/sbin/quietvmail.sh &
exit
Main script
#!/bin/sh
PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
S3_BUCKET="YOURS3BUCKET"
DIRPATH=/var/spool/asterisk/voicemail/default/
#callerchan=$1
#callerid=$2
#origdate=$3
#origtime=$4
#origmailbox=$5
#origdir=$6
#duration=$7
counter=1
sleep 4
FILENUM=$(/bin/ls ${DIRPATH}${origmailbox}/INBOX |/bin/grep txt | /usr/bin/wc -l)
##Added to allow 999 messages
if (( $FILENUM <= 9 ));
then
FILENAME=msg000${FILENUM}
elif (( $FILENUM <= 99 ));
then
FILENAME=msg00${FILENUM}
else
FILENAME=msg0${FILENUM}
fi
IN=$(/bin/grep "${origmailbox}=" /etc/asterisk/voicemail.conf)
set -- "$IN"
IFS=","; declare -a Array=($*)
email=${Array[2]}
/bin/echo "[message]" >> ${DIRPATH}${origmailbox}/INBOX/${FILENAME}.txt
/bin/echo origmailbox=${origmailbox} >> ${DIRPATH}${origmailbox}/INBOX/${FILENAME}.txt
/bin/echo "context=demo" >> ${DIRPATH}${origmailbox}/INBOX/${FILENAME}.txt
/bin/echo "macrocontext=" >> ${DIRPATH}${origmailbox}/INBOX/${FILENAME}.txt
/bin/echo "exten=s" >> ${DIRPATH}${origmailbox}/INBOX/${FILENAME}.txt
/bin/echo "priority=11" >> ${DIRPATH}${origmailbox}/INBOX/${FILENAME}.txt
/bin/echo callerchan=${callerchan} >> ${DIRPATH}${origmailbox}/INBOX/${FILENAME}.txt
/bin/echo callerid=${callerid} >> ${DIRPATH}${origmailbox}/INBOX/${FILENAME}.txt
/bin/echo origdate=${origdate} >> ${DIRPATH}${origmailbox}/INBOX/${FILENAME}.txt
/bin/echo origtime=${origtime} >> ${DIRPATH}${origmailbox}/INBOX/${FILENAME}.txt
/bin/echo msg_id=${origtime}-00000001 >> ${DIRPATH}${origmailbox}/INBOX/${FILENAME}.txt
/bin/echo "flag=" >> ${DIRPATH}${origmailbox}/INBOX/${FILENAME}.txt
/bin/echo "category=" >> ${DIRPATH}${origmailbox}/INBOX/${FILENAME}.txt
/bin/echo "duration=${duration}" >> ${DIRPATH}${origmailbox}/INBOX/${FILENAME}.txt
/bin/nice /usr/bin/lame -b 16 -m m -q 9-resample /var/lib/asterisk/sounds/catline/${origdir}.wav /tmp/${origdir}.mp3
# Create a string based on the current date and time
current_date_time="$(date +%Y-%m-%d_%H-%M-%S)"
# Upload to the S3 Bucket
aws --debug --profile default s3 cp /tmp/${origdir}.mp3 s3://$S3_BUCKET/$current_date_time
# Start the transcription job
output=$(aws --profile default transcribe start-transcription-job \
--transcription-job-name $current_date_time \
--language-code en-GB \
--media-format mp3 \
--media MediaFileUri=s3://$S3_BUCKET/$current_date_time \
--output-bucket-name $S3_BUCKET)
# Wait for the transcription to finish
JOB_COMPLETED=false
while [ "$JOB_COMPLETED" = false ]; do
JOB_STATUS=$(aws --profile default transcribe get-transcription-job \
--transcription-job-name $current_date_time \
--query 'TranscriptionJob.TranscriptionJobStatus' \
--output text)
if [ "$JOB_STATUS" = "FAILED" ]; then
JOB_COMPLETED=true
SHORT_CALL=yes
/bin/echo "$JOB_STATUS" >> /tmp/logfile.txt
break
fi
if [ "$JOB_STATUS" = "COMPLETED" ]; then
/bin/echo "$JOB_STATUS" >> /tmp/logfile.txt
JOB_COMPLETED=true
else
((counter++))
sleep 5
echo $counter >> /tmp/logfile.txt
/bin/echo "$JOB_STATUS" >> /tmp/logfile.txt
if [ "$counter" -eq "15" ]; then
JOB_STAUS=COMPLETED
JOB_COMPLETED=true
SHORT_CALL=yes
break
fi
fi
done
# Get the transcription result
aws s3 --profile default cp s3://$S3_BUCKET/$current_date_time.json /tmp/$current_date_time.json
# Get the transcription result
FILTERED=$(jq -r '.results.transcripts[].transcript' /tmp/$current_date_time.json)
# append result of transcription
if [ -z "$FILTERED" ]
then
echo "(AWS was unable to recognize any speech in audio data.)" >> /tmp/${origdir}.txt
else
echo "$FILTERED" >> /tmp/${origdir}.txt
sed -i 's/ Um,/ /gI' /tmp/${origdir}.txt
fi
voicemailbody=$(cat "/tmp/${origdir}.txt")
# echo "body ${voicemailbody}"
/bin/cp /var/lib/asterisk/sounds/catline/${origdir}.wav ${DIRPATH}${origmailbox}/INBOX/${FILENAME}.wav
echo -e "You have a new voicemail from ${callerid} it was left on ${origdate} and is ${duration} seconds long,\nThe message left,\n\n${voicemailbody}\n\nTranscribed by the Amazon AWS Transcribe service\n" | /bin/mail -s "A new voicemail has arrived from ${callerid}" -a "/tmp/${origdir}.mp3" "$email"
/bin/rm -f /tmp/${origdir}.mp3
/bin/rm -f /tmp/${origdir}.txt
aws --profile default transcribe delete-transcription-job --transcription-job-name $current_date_time
Then to pass calls to this and not normal voicemail, In Freepbx create a Custom Destination as “vmail2text,s,1” and if you require certain queues to go to specific mailboxes for example 2000 one like “vmail2text,2000,1” so calls will be sent to mailbox 2000 and teh transcriptions will be sent to the email address linked to that extension
Then in extensions that want to use transcription set the “Optional Destinations” in the advanced tab to the custom destination.
Users also can listen to voicemail normally from their handset or the ucp.
These scripts arent only useful for voicemail then can be used fro questionnaire lines and booking lines, anywhere you want to speed up the handling of voice messages. We will soon be looking at ways of integrating this with Whatsapp so transcriptions can be sent to your mobile.
If you have any further questions please email.