{"id":3039,"date":"2023-05-04T15:15:20","date_gmt":"2023-05-04T14:15:20","guid":{"rendered":"https:\/\/www.cyber-cottage.co.uk\/?p=3039"},"modified":"2023-09-12T23:27:48","modified_gmt":"2023-09-12T22:27:48","slug":"transcribing-voicemails-to-text-with-amazon-aws","status":"publish","type":"post","link":"https:\/\/www.cyber-cottage.co.uk\/?p=3039","title":{"rendered":"Transcribing voicemails to text with Amazon AWS"},"content":{"rendered":"\n<p>For this project we are going to use the Amazon AWS Transcribe service. AWS Transcribe is a cloud-based speech recognition service that converts audio recordings into accurate text transcripts. It uses advanced machine learning algorithms to identify different speakers and add punctuation, while also supporting a variety of audio formats and languages. AWS Transcribe can transcribe audio from sources such as phone calls, video recordings, and live streams, making it a versatile tool that&#8217;s ideally suited to voicemail transcription. The service is also highly scalable and cost-effective.<\/p>\n\n\n\n<p>We used to use Google&#8217;s speech-to-text engine for this, and over time I would have expected the quality of its transcription to improve. With Google this has not been the case, and I suspect this is because they use &#8220;predictive&#8221; speech to text rather than sampling every word, as the example below shows. This is the same audio fed to Google and AWS:<\/p>\n\n\n\n<p><strong>Amazon AWS Transcribe<\/strong><\/p>\n\n\n\n<p><em>Um, this is Ian. I&#8217;d like to order some pizza for tomorrow, please. We would like to order a pepperoni pizza and a mozzarella pizza that&#8217;s for&nbsp; tomorrow at five PM. Thank you.<\/em><\/p>\n\n\n\n<p><strong>Google Speech to Text<\/strong><\/p>\n\n\n\n<p><em>like to order some pizza for tomorrow please would like to order a pepperoni pizza and a mozzarella Pizza Hut for tomorrow at 5 a.m. 
thank you<\/em><\/p>\n\n\n\n<p>As can be seen, Google misses words and adds others. As you can imagine, this isn&#8217;t what you want from speech transcription.<\/p>\n\n\n\n<p>So we have switched our old script to use AWS.<\/p>\n\n\n\n<p>For this project on FreePBX you need a few extra applications and an Amazon AWS account. Setting up the account is not covered here, as you should already know how to do this if you are here.<\/p>\n\n\n\n<p>The extra apps are aws, jq and sox.<\/p>\n\n\n\n<p>To get aws:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">curl \"https:\/\/awscli.amazonaws.com\/awscli-exe-linux-x86_64.zip\" -o \"awscliv2.zip\"\nunzip -qq awscliv2.zip\n.\/aws\/install\n\nThen you need to configure it as 'root' and as 'asterisk', so:\n<em><strong>aws configure<\/strong><\/em>\nFill out your AWS access key ID and secret access key, as well as the region your bucket is in.\nThen repeat as 'asterisk', so:\n<strong><em>su asterisk<\/em><\/strong>\n<strong><em>aws configure<\/em><\/strong>\nand fill out the same details.<\/pre>\n\n\n\n<p>For jq and sox, just yum install them as you would any other program. The main script also calls lame to create the MP3, so make sure that is installed too.<\/p>\n\n\n\n<p>Next you need the Asterisk dialplan added to extensions_custom.conf:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">[vmail2text]\nexten =&gt; _XXXX,1,Set(__EXTTOCALL=${EXTEN})\nexten =&gt; _XXXX,n,Noop(${EXTTOCALL})\nexten =&gt; _XXXX,n,Goto(s,1)\n\nexten =&gt; s,1,Answer()  ; Listen to ringing for 1 seconds\nexten =&gt; s,n,Set(AGC(rx)=8000)\nexten =&gt; s,n,Set(DENOISE(rx)=on)\nexten =&gt; s,n,Noop(${EXTTOCALL} , ${DIALSTATUS} , ${SV_DIALSTATUS})\nexten =&gt; s,n,GotoIf($[\"${DIALSTATUS}\"=\"BUSY\"]?busy:bnext)\nexten =&gt; s,n(busy),Set(greeting=busy)\nexten =&gt; s,n,Goto(carryon)\nexten =&gt; s,n(bnext),GotoIf($[\"${DIALSTATUS}\"=\"NOANSWER\"]?unavail:unext)\nexten =&gt; s,n(unavail),Set(greeting=unavail)\nexten =&gt; s,n,Goto(carryon)\nexten =&gt; s,n(unext),Set(greeting=unavail)\nexten =&gt; s,n,Goto(carryon)\n\nexten =&gt; 
s,n(carryon),Set(origmailbox=${EXTTOCALL})\nexten =&gt; s,n,Set(msg=${STAT(e,${ASTSPOOLDIR}\/voicemail\/default\/${origmailbox}\/${greeting}.wav)})\nexten =&gt; s,n,Set(__start=0)\nexten =&gt; s,n,Set(__end=0)\nexten =&gt; s,n,NoOp(${UNIQUEID})\nexten =&gt; s,n,Set(origdate=${STRFTIME(${EPOCH},,%a %b %d %r %Z %G)})\nexten =&gt; s,n,Set(origtime=${EPOCH})\nexten =&gt; s,n,Set(callerchan=${CHANNEL})\nexten =&gt; s,n,Set(callerid=${CALLERID(num)})\nexten =&gt; s,n,Set(origmailbox=${origmailbox})\nexten =&gt; s,n,Answer()\nexten =&gt; s,n,GotoIf($[\"${msg}\"=\"1\"]?msgy:msgn)\nexten =&gt; s,n(msgy),Playback(${ASTSPOOLDIR}\/voicemail\/default\/${origmailbox}\/${greeting});(local\/catreq\/how_did)\nexten =&gt; s,n,Goto(beep)\nexten =&gt; s,n(msgn),Playback(vm-intro)\nexten =&gt; s,n(beep),System(\/bin\/touch \/var\/lib\/asterisk\/sounds\/catline\/${UNIQUEID}.wav)\nexten =&gt; s,n,Playback(beep)\nexten =&gt; s,n,Set(__start=${EPOCH})\nexten =&gt; s,n,Record(catline\/${UNIQUEID}.wav,3,60,kaqu)\nexten =&gt; s,n,Playback(beep)\nexten =&gt; s,n,Hangup()\nexten =&gt; h,1,Noop(${start} ${end})\nexten =&gt; h,n,GotoIf($[\"${start}\"!=\"0\"]?ok:end)\nexten =&gt; h,n(ok),Set(end=${EPOCH})\nexten =&gt; h,n,Set(duration=${MATH(${end}-${start},int)})\n<strong>exten =&gt; h,n,System(\/usr\/local\/sbin\/vmailprox.sh \"${callerchan}\" ${callerid} \"${origdate}\" ${origtime} ${origmailbox} ${UNIQUEID} ${duration})<\/strong>\nexten =&gt; h,n(end),Noop(finished)\n<\/pre>\n\n\n\n<p>As can be seen, this dialplan records a file and then runs the vmailprox.sh script. This script collects the variables, passes them over to the main script and then exits, so that channels aren&#8217;t held while transcription takes place. 
(That&#8217;s the plan anyway.)<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">#!\/bin\/sh\ncallerchan=$1\ncallerid=$2\norigdate=$3\norigtime=$4\norigmailbox=$5\norigdir=$6\nduration=$7\nexport callerchan\nexport callerid\nexport origdate\nexport origtime\nexport origmailbox\nexport origdir\nexport duration\n\nnohup \/usr\/local\/sbin\/quietvmail.sh &amp;\nexit\n<\/pre>\n\n\n\n<p>Main script (quietvmail.sh)<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">#!\/bin\/bash\nPATH=\"\/usr\/local\/sbin:\/usr\/local\/bin:\/usr\/sbin:\/usr\/bin:\/sbin:\/bin\"\nS3_BUCKET=\"YOURS3BUCKET\"\nDIRPATH=\/var\/spool\/asterisk\/voicemail\/default\/\n#callerchan=$1\n#callerid=$2\n#origdate=$3\n#origtime=$4\n#origmailbox=$5\n#origdir=$6\n#duration=$7\ncounter=1\nsleep 4\n\nFILENUM=$(\/bin\/ls ${DIRPATH}${origmailbox}\/INBOX |\/bin\/grep txt | \/usr\/bin\/wc -l)\n\n##Added to allow 999 messages\nif  (( $FILENUM &lt;= 9 ));\nthen\nFILENAME=msg000${FILENUM}\nelif (( $FILENUM &lt;= 99 ));\nthen\nFILENAME=msg00${FILENUM}\nelse\nFILENAME=msg0${FILENUM}\nfi\n\nIN=$(\/bin\/grep \"${origmailbox}=\" \/etc\/asterisk\/voicemail.conf)\nset -- \"$IN\"\nIFS=\",\"; declare -a Array=($*)\nemail=${Array[2]}\n\n\/bin\/echo \"[message]\" >> ${DIRPATH}${origmailbox}\/INBOX\/${FILENAME}.txt\n\/bin\/echo origmailbox=${origmailbox} >> ${DIRPATH}${origmailbox}\/INBOX\/${FILENAME}.txt\n\/bin\/echo \"context=demo\" >> ${DIRPATH}${origmailbox}\/INBOX\/${FILENAME}.txt\n\/bin\/echo \"macrocontext=\" >> ${DIRPATH}${origmailbox}\/INBOX\/${FILENAME}.txt\n\/bin\/echo \"exten=s\" >> ${DIRPATH}${origmailbox}\/INBOX\/${FILENAME}.txt\n\/bin\/echo \"priority=11\" >> ${DIRPATH}${origmailbox}\/INBOX\/${FILENAME}.txt\n\/bin\/echo callerchan=${callerchan} >> ${DIRPATH}${origmailbox}\/INBOX\/${FILENAME}.txt\n\/bin\/echo callerid=${callerid} >> ${DIRPATH}${origmailbox}\/INBOX\/${FILENAME}.txt\n\/bin\/echo origdate=${origdate} >> ${DIRPATH}${origmailbox}\/INBOX\/${FILENAME}.txt\n\/bin\/echo origtime=${origtime} >> 
${DIRPATH}${origmailbox}\/INBOX\/${FILENAME}.txt\n\/bin\/echo msg_id=${origtime}-00000001  >> ${DIRPATH}${origmailbox}\/INBOX\/${FILENAME}.txt\n\/bin\/echo \"flag=\"  >> ${DIRPATH}${origmailbox}\/INBOX\/${FILENAME}.txt\n\/bin\/echo \"category=\" >> ${DIRPATH}${origmailbox}\/INBOX\/${FILENAME}.txt\n\/bin\/echo \"duration=${duration}\" >> ${DIRPATH}${origmailbox}\/INBOX\/${FILENAME}.txt\n\n    \/bin\/nice \/usr\/bin\/lame -b 16 -m m -q 9-resample \/var\/lib\/asterisk\/sounds\/catline\/${origdir}.wav  \/tmp\/${origdir}.mp3\n\n    # Create a string based on the current date and time\n    current_date_time=\"$(date +%Y-%m-%d_%H-%M-%S)\"\n\n    # Upload to the S3 Bucket\n\n    aws --debug --profile default s3 cp \/tmp\/${origdir}.mp3 s3:\/\/$S3_BUCKET\/$current_date_time\n\n    # Start the transcription job\n    output=$(aws --profile default transcribe start-transcription-job \\\n    --transcription-job-name $current_date_time \\\n    --language-code en-GB \\\n    --media-format mp3 \\\n    --media MediaFileUri=s3:\/\/$S3_BUCKET\/$current_date_time \\\n    --output-bucket-name $S3_BUCKET)\n\n    # Wait for the transcription to finish\n    JOB_COMPLETED=false\n    while [ \"$JOB_COMPLETED\" = false ]; do\n        JOB_STATUS=$(aws --profile default transcribe get-transcription-job \\\n            --transcription-job-name $current_date_time \\\n            --query 'TranscriptionJob.TranscriptionJobStatus' \\\n            --output text)\n            \n        if [ \"$JOB_STATUS\" = \"FAILED\" ]; then\n                JOB_COMPLETED=true\n                SHORT_CALL=yes\n                \/bin\/echo \"$JOB_STATUS\"    >> \/tmp\/logfile.txt\n                break\n        fi\n\n        if [ \"$JOB_STATUS\" = \"COMPLETED\" ]; then\n                \/bin\/echo \"$JOB_STATUS\"    >> \/tmp\/logfile.txt\n                JOB_COMPLETED=true\n        else\n\n        ((counter++))\n        sleep 5\n        echo $counter    >> \/tmp\/logfile.txt\n        \/bin\/echo \"$JOB_STATUS\"    >> 
\/tmp\/logfile.txt\n                \n                if [ \"$counter\" -eq  \"15\" ]; then\n                JOB_STATUS=COMPLETED\n                JOB_COMPLETED=true\n                SHORT_CALL=yes\n                break\n                fi\n        fi\n    done\n\n\n    # Get the transcription result\n    aws s3 --profile default cp s3:\/\/$S3_BUCKET\/$current_date_time.json \/tmp\/$current_date_time.json \n\n    # Extract the transcript text from the JSON result\n    FILTERED=$(jq -r '.results.transcripts[].transcript' \/tmp\/$current_date_time.json)\n                   \n        # append result of transcription\n        if [ -z \"$FILTERED\" ]\n        then\n          echo \"(AWS was unable to recognize any speech in audio data.)\" >> \/tmp\/${origdir}.txt\n        else\n          echo \"$FILTERED\" >> \/tmp\/${origdir}.txt\n          sed -i 's\/ Um,\/ \/gI' \/tmp\/${origdir}.txt\n        fi\n\nvoicemailbody=$(cat \"\/tmp\/${origdir}.txt\")\n\n# echo \"body ${voicemailbody}\"\n\n\/bin\/cp \/var\/lib\/asterisk\/sounds\/catline\/${origdir}.wav ${DIRPATH}${origmailbox}\/INBOX\/${FILENAME}.wav\n\necho -e \"You have a new voicemail from ${callerid} it was left on ${origdate} and is ${duration} seconds long,\\nThe message left,\\n\\n${voicemailbody}\\n\\nTranscribed by the Amazon AWS Transcribe service\\n\" | \/bin\/mail -s \"A new voicemail has arrived from ${callerid}\" -a \"\/tmp\/${origdir}.mp3\" \"$email\"\n\n\/bin\/rm -f \/tmp\/${origdir}.mp3\n\/bin\/rm -f \/tmp\/${origdir}.txt\naws --profile default transcribe delete-transcription-job --transcription-job-name $current_date_time\n<\/pre>\n\n\n\n<p>To pass calls to this instead of normal voicemail, create a <strong>Custom Destination<\/strong> in FreePBX as \u201cvmail2text,s,1\u201d. If you need certain queues to go to a specific mailbox, for example 2000, create one like \u201cvmail2text,2000,1\u201d, so calls will be sent to mailbox 2000 and the transcriptions will be sent to the email address linked to that 
extension.<\/p>\n\n\n\n<p>Then, in the extensions that want to use transcription, set the \u201cOptional Destinations\u201d in the Advanced tab to the custom destination.<\/p>\n\n\n\n<p>Users can still listen to voicemail normally from their handset or the UCP.<\/p>\n\n\n\n<p>These scripts aren&#8217;t only useful for voicemail; they can also be used for questionnaire lines and booking lines, or anywhere you want to speed up the handling of voice messages. We will soon be looking at ways of integrating this with WhatsApp so transcriptions can be sent to your mobile.<\/p>\n\n\n\n<p>If you have any further questions please email.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>For this project we are going to use the Amazon AWS Transcribe service. AWS Transcribe is a cloud-based speech recognition service that converts audio recordings into accurate text transcripts. It uses advanced machine learning algorithms to identify different speakers and add punctuation, while also supporting a variety of audio formats and languages. 
AWS Transcribe can transcribe [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":3028,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"content-type":"","advanced_seo_description":"","jetpack_seo_html_title":"","jetpack_seo_noindex":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_feature_clip_id":0,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2},"jetpack_post_was_ever_published":false},"categories":[106,2,11,4],"tags":[232,23,233,33,40,51,236,73,100,235,76],"class_list":["post-3039","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-applications","category-blog","category-knowledge","category-products-and-services","tag-amazon","tag-asterisk","tag-aws","tag-digium","tag-freepbx","tag-linux","tag-speech-recognition-2","tag-support","tag-technical","tag-voice-to-text","tag-voip"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"https:\/\/i0.wp.com\/www.cyber-cottage.co.uk\/wp-content\/uploads\/2023\/03\/missedcalls.jpeg?fit=300%2C168&ssl=1","jetpack_shortlink":"https:\/\/wp.me\/p5daZy-N1","jetpack_sharing_enabled":true,"jetpack_likes_enabled":false,"jetpack-related-posts":[],"_links":{"self":[{"href":"https:\/\/www.cyber-cottage.co.uk\/index.php?rest_route=\/wp\/v2\/posts\/3039","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.cyber-cottage.co.uk\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.cyber-cottage.co.uk\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/
www.cyber-cottage.co.uk\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.cyber-cottage.co.uk\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=3039"}],"version-history":[{"count":8,"href":"https:\/\/www.cyber-cottage.co.uk\/index.php?rest_route=\/wp\/v2\/posts\/3039\/revisions"}],"predecessor-version":[{"id":3060,"href":"https:\/\/www.cyber-cottage.co.uk\/index.php?rest_route=\/wp\/v2\/posts\/3039\/revisions\/3060"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.cyber-cottage.co.uk\/index.php?rest_route=\/wp\/v2\/media\/3028"}],"wp:attachment":[{"href":"https:\/\/www.cyber-cottage.co.uk\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=3039"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.cyber-cottage.co.uk\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=3039"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.cyber-cottage.co.uk\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=3039"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}