For this project we are going to use the Amazon AWS Transcribe service. AWS Transcribe is a cloud-based speech recognition service that converts audio recordings into accurate text transcripts. It uses machine learning to identify different speakers and add punctuation, and it supports a variety of audio formats and languages. AWS Transcribe can transcribe audio from sources such as phone calls, video recordings, and live streams, making it a versatile tool that’s ideally suited to voicemail transcription. The service is also highly scalable and cost-effective.
We used to use Google’s Speech to Text engine for this. Over time I would have expected the quality of transcription to improve, but with Google this has not been the case. I suspect this is because they use “predictive” speech to text rather than sampling all the words, as the example below shows. This is the same audio fed to both Google and AWS.
Amazon AWS Transcribe
Um, this is Ian. I’d like to order some pizza for tomorrow, please. We would like to order a pepperoni pizza and a mozzarella pizza that’s for tomorrow at five PM. Thank you.
Google Speech to Text
like to order some pizza for tomorrow please would like to order a pepperoni pizza and a mozzarella Pizza Hut for tomorrow at 5 a.m. thank you
As can be seen, Google misses some words and adds others. As you can imagine, this isn’t what you want from speech transcription.
So we have switched our old script over to use AWS.
For this project on FreePBX you need a few extra applications installed and an Amazon AWS account. Setting up the account is not covered here, as you should already have knowledge of this if you are here.
The extra apps are: aws, jq, sox
To get aws:
curl "https://awscliv2.zip" -o /dev/null 2>/dev/null; true
For jq and sox, just yum install them as you would any other program. The main script also calls lame and mail, so make sure those are installed too.
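Before going further it can be worth checking that everything is actually on the PATH. The snippet below is only a rough sketch: the tool list is my reading of what the scripts in this post call (aws, jq and sox, plus lame and mail from the main script), and missing_tools is a made-up helper name, so adjust it to suit your system.

```shell
#!/bin/sh
# Print the name of every listed command that cannot be found on the PATH.
# missing_tools is a hypothetical helper, not part of FreePBX or the AWS CLI.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Tool list assumed from the scripts in this post; edit as needed.
missing=$(missing_tools aws jq sox lame mail)
if [ -n "$missing" ]; then
    echo "Missing tools:" $missing
else
    echo "All required tools found"
fi
```

Run it once as root and once as asterisk, since the two users may not share the same PATH.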
Next you need to add the following Asterisk dialplan to extensions_custom.conf:
[vmail2text]
exten => _XXXX,1,Set(__EXTTOCALL=${EXTEN})
exten => _XXXX,n,Noop(${EXTTOCALL})
exten => _XXXX,n,Goto(s,1)

exten => s,1,Answer() ; Listen to ringing for 1 seconds
exten => s,n,Set(AGC(rx)=8000)
exten => s,n,Set(DENOISE(rx)=on)
exten => s,n,Noop(${EXTTOCALL} , ${DIALSTATUS} , ${SV_DIALSTATUS})
exten => s,n,GotoIf($["${DIALSTATUS}"="BUSY"]?busy:bnext)
exten => s,n(busy),Set(greeting=busy)
exten => s,n,Goto(carryon)
exten => s,n(bnext),GotoIf($["${DIALSTATUS}"="NOANSWER"]?unavail:unext)
exten => s,n(unavail),Set(greeting=unavail)
exten => s,n,Goto(carryon)
exten => s,n(unext),Set(greeting=unavail)
exten => s,n,Goto(carryon)
exten => s,n(carryon),Set(origmailbox=${EXTTOCALL})
exten => s,n,Set(msg=${STAT(e,${ASTSPOOLDIR}/voicemail/default/${origmailbox}/${greeting}.wav)})
exten => s,n,Set(__start=0)
exten => s,n,Set(__end=0)
exten => s,n,NoOp(${UNIQUEID})
exten => s,n,Set(origdate=${STRFTIME(${EPOCH},,%a %b %d %r %Z %G)})
exten => s,n,Set(origtime=${EPOCH})
exten => s,n,Set(callerchan=${CHANNEL})
exten => s,n,Set(callerid=${CALLERID(num)})
exten => s,n,Set(origmailbox=${origmailbox})
exten => s,n,Answer()
exten => s,n,GotoIf($["${msg}"="1"]?msgy:msgn)
exten => s,n(msgy),Playback(${ASTSPOOLDIR}/voicemail/default/${origmailbox}/${greeting});(local/catreq/how_did)
exten => s,n,Goto(beep)
exten => s,n(msgn),Playback(vm-intro)
exten => s,n(beep),System(/bin/touch /var/lib/asterisk/sounds/catline/${UNIQUEID}.wav)
exten => s,n,Playback(beep)
exten => s,n,Set(__start=${EPOCH})
exten => s,n,Record(catline/${UNIQUEID}.wav,3,60,kaqu)
exten => s,n,Playback(beep)
exten => s,n,Hangup()

exten => h,1,Noop(${start} ${end})
exten => h,n,GotoIf($["${start}"!="0"]?ok:end)
exten => h,n(ok),Set(end=${EPOCH})
exten => h,n,Set(duration=${MATH(${end}-${start},int)})
exten => h,n,System(/usr/local/sbin/vmailprox.sh "${callerchan}" ${callerid} "${origdate}" ${origtime} ${origmailbox} ${UNIQUEID} ${duration})
exten => h,n(end),Noop(finished)
As can be seen, this dialplan records a file and then runs the vmailprox.sh script. This script collects the variables, passes them over to the main script, and exits straight away, so that channels aren’t held up while transcription takes place. (That’s the plan, anyway.)
#!/bin/sh
callerchan=$1
callerid=$2
origdate=$3
origtime=$4
origmailbox=$5
origdir=$6
duration=$7
export callerchan
export callerid
export origdate
export origtime
export origmailbox
export origdir
export duration
nohup /usr/local/sbin/quietvmail.sh &
exit
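The hand-off pattern used here (export the values, start the worker in the background, return immediately) can be tried out in isolation. This is a minimal sketch rather than the script above: launch_detached and WORKER are hypothetical names, WORKER stands in for quietvmail.sh, and the caller ID and mailbox are made-up sample values.

```shell
#!/bin/sh
# WORKER is a hypothetical stand-in for /usr/local/sbin/quietvmail.sh.
WORKER=${WORKER:-'echo "worker got $callerid for mailbox $origmailbox"'}

# Export the call details, start the worker in the background and return at once.
launch_detached() {
    callerid=$1
    origmailbox=$2
    export callerid origmailbox
    # Output is discarded so nohup does not leave a nohup.out file behind.
    nohup sh -c "$WORKER" >/dev/null 2>&1 &
}

# Sample values only.
launch_detached 01234567890 2000
echo "launcher finished"
```

The worker reads callerid and origmailbox from its environment, which is why the main script can leave its positional-parameter assignments commented out.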
Main script
#!/bin/bash
PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
S3_BUCKET="YOURS3BUCKET"
DIRPATH=/var/spool/asterisk/voicemail/default/
# These values are inherited from vmailprox.sh through the environment
#callerchan=$1
#callerid=$2
#origdate=$3
#origtime=$4
#origmailbox=$5
#origdir=$6
#duration=$7
counter=1
sleep 4
FILENUM=$(/bin/ls ${DIRPATH}${origmailbox}/INBOX |/bin/grep txt | /usr/bin/wc -l)
##Added to allow 999 messages
if (( $FILENUM <= 9 )); then
    FILENAME=msg000${FILENUM}
elif (( $FILENUM <= 99 )); then
    FILENAME=msg00${FILENUM}
else
    FILENAME=msg0${FILENUM}
fi
# Look up the mailbox line and take the email address from its third field
IN=$(/bin/grep "${origmailbox}=" /etc/asterisk/voicemail.conf)
set -- "$IN"
IFS=","; declare -a Array=($*)
email=${Array[2]}
# Write the message metadata file that voicemail expects
/bin/echo "[message]" >> ${DIRPATH}${origmailbox}/INBOX/${FILENAME}.txt
/bin/echo origmailbox=${origmailbox} >> ${DIRPATH}${origmailbox}/INBOX/${FILENAME}.txt
/bin/echo "context=demo" >> ${DIRPATH}${origmailbox}/INBOX/${FILENAME}.txt
/bin/echo "macrocontext=" >> ${DIRPATH}${origmailbox}/INBOX/${FILENAME}.txt
/bin/echo "exten=s" >> ${DIRPATH}${origmailbox}/INBOX/${FILENAME}.txt
/bin/echo "priority=11" >> ${DIRPATH}${origmailbox}/INBOX/${FILENAME}.txt
/bin/echo callerchan=${callerchan} >> ${DIRPATH}${origmailbox}/INBOX/${FILENAME}.txt
/bin/echo callerid=${callerid} >> ${DIRPATH}${origmailbox}/INBOX/${FILENAME}.txt
/bin/echo origdate=${origdate} >> ${DIRPATH}${origmailbox}/INBOX/${FILENAME}.txt
/bin/echo origtime=${origtime} >> ${DIRPATH}${origmailbox}/INBOX/${FILENAME}.txt
/bin/echo msg_id=${origtime}-00000001 >> ${DIRPATH}${origmailbox}/INBOX/${FILENAME}.txt
/bin/echo "flag=" >> ${DIRPATH}${origmailbox}/INBOX/${FILENAME}.txt
/bin/echo "category=" >> ${DIRPATH}${origmailbox}/INBOX/${FILENAME}.txt
/bin/echo "duration=${duration}" >> ${DIRPATH}${origmailbox}/INBOX/${FILENAME}.txt
# Convert the recording to MP3
/bin/nice /usr/bin/lame -b 16 -m m -q 9-resample /var/lib/asterisk/sounds/catline/${origdir}.wav /tmp/${origdir}.mp3
# Create a string based on the current date and time
current_date_time="$(date +%Y-%m-%d_%H-%M-%S)"
# Upload to the S3 Bucket
aws --debug --profile default s3 cp /tmp/${origdir}.mp3 s3://$S3_BUCKET/$current_date_time
# Start the transcription job
output=$(aws --profile default transcribe start-transcription-job \
    --transcription-job-name $current_date_time \
    --language-code en-GB \
    --media-format mp3 \
    --media MediaFileUri=s3://$S3_BUCKET/$current_date_time \
    --output-bucket-name $S3_BUCKET)
# Wait for the transcription to finish
JOB_COMPLETED=false
while [ "$JOB_COMPLETED" = false ]; do
    JOB_STATUS=$(aws --profile default transcribe get-transcription-job \
        --transcription-job-name $current_date_time \
        --query 'TranscriptionJob.TranscriptionJobStatus' \
        --output text)
    if [ "$JOB_STATUS" = "FAILED" ]; then
        JOB_COMPLETED=true
        SHORT_CALL=yes
        /bin/echo "$JOB_STATUS" >> /tmp/logfile.txt
        break
    fi
    if [ "$JOB_STATUS" = "COMPLETED" ]; then
        /bin/echo "$JOB_STATUS" >> /tmp/logfile.txt
        JOB_COMPLETED=true
    else
        ((counter++))
        sleep 5
        echo $counter >> /tmp/logfile.txt
        /bin/echo "$JOB_STATUS" >> /tmp/logfile.txt
        # Give up once the counter reaches 15
        if [ "$counter" -eq "15" ]; then
            JOB_STATUS=COMPLETED
            JOB_COMPLETED=true
            SHORT_CALL=yes
            break
        fi
    fi
done
# Get the transcription result
aws s3 --profile default cp s3://$S3_BUCKET/$current_date_time.json /tmp/$current_date_time.json
# Extract the transcript text from the result
FILTERED=$(jq -r '.results.transcripts[].transcript' /tmp/$current_date_time.json)
# append result of transcription
if [ -z "$FILTERED" ]
then
    echo "(AWS was unable to recognize any speech in audio data.)" >> /tmp/${origdir}.txt
else
    echo "$FILTERED" >> /tmp/${origdir}.txt
    sed -i 's/ Um,/ /gI' /tmp/${origdir}.txt
fi
voicemailbody=$(cat "/tmp/${origdir}.txt")
# echo "body ${voicemailbody}"
/bin/cp /var/lib/asterisk/sounds/catline/${origdir}.wav ${DIRPATH}${origmailbox}/INBOX/${FILENAME}.wav
echo -e "You have a new voicemail from ${callerid} it was left on ${origdate} and is ${duration} seconds long,\nThe message left,\n\n${voicemailbody}\n\nTranscribed by the Amazon AWS Transcribe service\n" | /bin/mail -s "A new voicemail has arrived from ${callerid}" -a "/tmp/${origdir}.mp3" "$email"
/bin/rm -f /tmp/${origdir}.mp3
/bin/rm -f /tmp/${origdir}.txt
aws --profile default transcribe delete-transcription-job --transcription-job-name $current_date_time
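The wait loop in the main script can also be written as a small reusable function with an explicit attempt limit. The following is a sketch under my own assumptions: wait_for_job and job_status are hypothetical names, the 15 attempts and 5 second delay in the usage line simply mirror the values in the script above, and the aws call is copied from it.

```shell
#!/bin/sh
# Poll a status command until it reports a terminal state or attempts run out.
# $1 = maximum attempts, $2 = seconds between attempts,
# remaining arguments = a command that prints the current status on stdout.
# Prints COMPLETED, FAILED or TIMEOUT and returns 0, 1 or 2 respectively.
wait_for_job() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        status=$("$@")
        case "$status" in
            COMPLETED) echo "COMPLETED"; return 0 ;;
            FAILED)    echo "FAILED";    return 1 ;;
        esac
        attempt=$((attempt + 1))
        sleep "$delay"
    done
    echo "TIMEOUT"
    return 2
}

# Hypothetical wrapper around the status query used in the main script.
job_status() {
    aws --profile default transcribe get-transcription-job \
        --transcription-job-name "$1" \
        --query 'TranscriptionJob.TranscriptionJobStatus' \
        --output text
}

# Example use inside the main script (left commented so this file runs without AWS):
# result=$(wait_for_job 15 5 job_status "$current_date_time")
```

Returning a distinct TIMEOUT result keeps a job that never finished separate from one that completed, which the original loop does not distinguish.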
Then, to pass calls to this instead of normal voicemail, create a Custom Destination in FreePBX as “vmail2text,s,1”. If you require certain queues to go to specific mailboxes, for example 2000, create one like “vmail2text,2000,1” so that calls will be sent to mailbox 2000 and the transcriptions will be sent to the email address linked to that extension.
Then, in the extensions that want to use transcription, set “Optional Destinations” in the Advanced tab to the custom destination.
Users can also listen to voicemail normally from their handset or the UCP.
These scripts aren’t only useful for voicemail; they can also be used for questionnaire lines and booking lines, or anywhere you want to speed up the handling of voice messages. We will soon be looking at ways of integrating this with WhatsApp so transcriptions can be sent to your mobile.
If you have any further questions please email.