ChatGPT as a Phone assistant – Full Stack Optimization

A.I. instead of IVR

Language models have shaken the technology world, and they are not alone: recent advances in TTS (text to speech) and STT (speech to text) let us rethink how we handle PBX automations such as IVR, time conditions, time groups / calendars, and voicemail. We can do what banks already do, and go further: check orders, check receipts, send emails, provide manuals, and the list goes on.

As far as I understand, banks in my country usually combine pre-recorded voice chunks with an STT (speech to text) A.I. model. The A.I. in this case is only responsible for turning human speech into text: you speak, the model tries to convert your speech to text, and that text is then fed to an algorithm or another small language model (I suspect it is just a set of regex rules applied to the output of the PBX's built-in speech to text).

The flow goes something like this:

Example: Recording: “For credit cards, please say CREDIT CARDS. To speak with a representative, please say REPRESENTATIVE.”
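
As a rough sketch of that kind of keyword routing (my guess at how such a rule set might look, not any bank's actual system), the PHP function below matches an STT transcript against a couple of regex rules. The function name `route_by_keyword`, the rule table, and the queue names are all made up for illustration.

```php
<?php
// Hypothetical keyword router: maps an STT transcript to a queue name.
// The rules and queue names are illustrative assumptions, not a real PBX API.
function route_by_keyword(string $transcript): ?string
{
    $rules = [
        '/\bcredit\s+cards?\b/i' => 'credit-cards',
        '/\brepresentative\b/i'  => 'representative',
    ];
    foreach ($rules as $pattern => $queue) {
        if (preg_match($pattern, $transcript) === 1) {
            return $queue;
        }
    }
    return null; // nothing matched: the caller would hear the recording again
}

echo route_by_keyword('uh, Credit Cards please'), "\n"; // credit-cards
```

Anything the rules do not anticipate ("my card got swallowed by the ATM") falls through to `null`, which is exactly the limitation a language model removes.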

That is how it looks with simple STT. With GPT it gets even better. Anyone who wants to build their own solution can use the available machine learning tools to train a small language model on a specific use case, converting long phrases into paths for routing calls. Still, a more human-like conversation through OpenAI's GPT is preferable, since it can solve complex problems by itself, suggest solutions to clients' problems, or even look up a database through an API, see which product the client purchased, and provide information and how-to instructions based on the manual of that specific product.

Here is an example of this using prompt engineering. A decent prompt looks like this:

You are now a call assistant who transfer calls out of you to the right departments for EletroShop company, datetime is: 4/6/2023 17:21

You reply to a script strictly in the form of EXT:num or null:REPLY=<your reply> one cmd at the time,

EXT variable can hold extension departments:

If they need something about sales: 2001, for support: 2002, for accounting: 2003.

If they ask for persons: John = 3000, Alex = 3030, Evelina = 3032.

CLIENT variable is the input of the client, you don’t use it.

example: CLIENT: Hi,

You: EXT:NULL:Hi, how can i help?

CLIENT: tv dont work

You: EXT:2002:Sorry for that, connecting you to support to handle it. please wait.

If client wants to locate order output: “EXT:NULL:LOCATE=<order_number>”

If you get an input like: ORDER-LOCATED:<text>, output: EXT:NULL:REPLY=Your order status is <text>

End of instructions. New session:

CLIENT: Hi.. hello

Tested with ChatGPT, the results are:

And locating an order through the e-commerce API looks like this:

As you can see in the above example, a client wants to find out why he has not received his order and chats with the language model. The language model used the “script” to initiate an API call to our e-commerce backend, then used the information it got back to explain to the client where his package is. It is also aware of ACS, the Greek courier, so it offered further help by providing the courier company's contact details. In a nutshell, it just saved a couple of valuable minutes or more of workload. Besides checking the order status, the same approach can be used to feed in manuals for troubleshooting.
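
The round trip described above can be sketched in PHP roughly as follows. This is only an illustration under assumptions: `parse_reply`, `next_input`, and the `$lookupOrder` callback are hypothetical names, and the callback stands in for whatever API your e-commerce backend really exposes. The parser accepts both the `EXT:NULL:REPLY=text` and the bare `EXT:NULL:text` forms, since the prompt's examples use both.

```php
<?php
// Parse one model reply of the form EXT:<digits|NULL>:<payload>.
// Returns null when the reply does not follow the script format.
function parse_reply(string $reply): ?array
{
    if (!preg_match('/^\s*EXT:(\d+|NULL):(.*)$/is', $reply, $m)) {
        return null;
    }
    $ext = strtoupper($m[1]) === 'NULL' ? null : $m[1];
    $payload = trim($m[2]);
    if (preg_match('/^LOCATE=(.+)$/s', $payload, $p)) {
        return ['ext' => $ext, 'action' => 'locate', 'value' => trim($p[1])];
    }
    // Strip an optional REPLY= prefix; the rest is what the caller hears.
    $text = preg_replace('/^REPLY=/', '', $payload);
    return ['ext' => $ext, 'action' => 'say', 'value' => $text];
}

// Decide what to send to the model next. For a LOCATE command the order
// status is fetched through $lookupOrder (a stand-in for your own backend
// call) and fed back as ORDER-LOCATED:<text>; otherwise nothing is sent.
function next_input(array $parsed, callable $lookupOrder): ?string
{
    if ($parsed['action'] !== 'locate') {
        return null;
    }
    return 'ORDER-LOCATED:' . $lookupOrder($parsed['value']);
}

// Example with a fake lookup instead of a real API call.
$fakeLookup = function (string $orderNumber): string {
    return "order $orderNumber is with the courier";
};
$parsed = parse_reply('EXT:NULL:LOCATE=10045');
echo next_input($parsed, $fakeLookup), "\n";
```

Keeping the lookup behind a callback means the model never touches the backend directly: it can only ask, and your script decides what is actually executed.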

You can enrich the prompt with working hours, e.g. by adding “store is open only 08:00 to 16:00.” to route to different voicemails, or with anything else; it is up to your imagination. Let GPT come up by itself with something fancy like “I am so sorry, we are closed right now, would you like to leave a voice mail?”
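
One way to add the working hours, sketched under assumptions: the helper below (`build_prompt` is a made-up name, and the 08:00 to 16:00 window is just the example from above) appends the opening hours and the current time to a base prompt, so the model can work out by itself whether the store is open. The second function, `is_open`, is an equally hypothetical alternative for deciding in code instead.

```php
<?php
// Hypothetical helper: append opening hours and the current time to a base
// prompt. The wording of the added sentences is an illustrative assumption.
function build_prompt(string $basePrompt, DateTimeInterface $now,
                      string $opens = '08:00', string $closes = '16:00'): string
{
    return rtrim($basePrompt) . "\n"
        . "Store is open only $opens to $closes.\n"
        . 'datetime is: ' . $now->format('j/n/Y H:i') . "\n"
        . "If the store is closed, offer the caller to leave a voice mail.\n";
}

// Alternative: decide in code instead of leaving it to the model.
function is_open(DateTimeInterface $now, string $opens = '08:00',
                 string $closes = '16:00'): bool
{
    $time = $now->format('H:i'); // zero-padded, so string comparison works
    return strcmp($time, $opens) >= 0 && strcmp($time, $closes) < 0;
}

$now = new DateTimeImmutable('2023-06-04 17:21');
echo build_prompt('You are a call assistant for EletroShop.', $now);
var_dump(is_open($now)); // bool(false)
```

Letting the model judge the hours keeps the script simple; checking in code is more predictable if a wrong answer would send callers to the wrong voicemail.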

I have written and tested an example AGI PHP implementation of this. I used Microsoft's TTS model, which is the most fluent in sound and vocal tone for the Greek language, and Google's STT, which transcribes Greek with fewer errors. The script is an example for testing purposes. It works with a little lag, which you can reduce by streaming the caller's audio to Google STT in real time rather than recording it first as I (wrongly) did.

1. Note that the script below uses an outdated prompt that differs from the one above, with CMD: and REPLY: as variables.
2. Replace your-KEY everywhere with your own API keys for the appropriate services, create a dialplan, and experiment.

You are expected to be experienced with these technologies and to know how to read the script and set things up correctly. Many details are left out.

#!/usr/bin/env php
<?php
require 'phpagi.php';
require 'vendor/autoload.php';

use Google\Cloud\Speech\V1\SpeechClient;
use Google\Cloud\Speech\V1\RecognitionAudio;
use Google\Cloud\Speech\V1\RecognitionConfig;
use Google\Cloud\Speech\V1\RecognitionConfig\AudioEncoding;

error_reporting(E_ALL);
$agi = new AGI();
$agi->answer();
// get_variable() returns a result array; the value itself is under 'data'
$callerIdVar = $agi->get_variable("CALLERID(num)");
$callerId = $callerIdVar['data'];
$prompt = "
You are on the phone, (reply on callers language fluently) datetime is: " . date("F j, Y, g:i a") . " phoneNum: " . $callerId . " your name is Jarvis, \n
 You reply to  caller by 'CMD:(NUM or NONE) REPLY:text', example:  \n
 if caller asks for John output: CMD:2000 REPLY: I will connect you to John   right away, please wait.  \n
if caller haven't asked for any person CMD:NONE REPLY: text \n
if caller wants to report a product failure forward them to the 'repairs department' \n
for things related to sales, forward them to 'sales department' for anything else forward them to 2005. names if asked:
  Nick=2002, Alex=2001, Susan=2005, repairs and sales department=2005, \n Do not reveal the internal numbers.\n
  when forwarding always fill the CMD:num\n
CALLER SAID: ";
// An intro that may say "You are now talking to our assistant AI"
$agi->stream_file('/var/lib/asterisk/agi-bin/intro');
sleep(1);
$forward = "NONE";
$recording = '/tmp/' . $agi->request['agi_uniqueid'];
// Keep talking until the model returns an extension instead of CMD:NONE
while ($forward == "NONE") {
    // Beep, then record up to 9000 ms; '#' or 1 second of silence stops it
    $result = $agi->record_file($recording, 'wav', '#', '9000', '0', 1, '1');
    $transcription = get_transcript($recording . '.wav');
    $text = get_ai_response($transcription);
    // Expected shape: "CMD:<extension or NONE> REPLY:<text to speak>".
    // A reply without CMD: ends the loop and falls through to the hangup.
    $forward = preg_match('/CMD:\s*(\S+)/', $text, $m) ? $m[1] : '';
    $spoken = explode("REPLY:", $text, 2);
    $reply = isset($spoken[1]) ? trim($spoken[1]) : '';
    $wordsSpoken = get_sound_from_text($reply);
    $agi->stream_file($wordsSpoken, "#");
    // get_sound_from_text() returns the path without its .wav extension
    @unlink($wordsSpoken . '.wav');
    @unlink($recording . '.wav');
    sleep(1);
}
if (is_numeric($forward)) {
    // CMD:<num> held an extension (digits), so hand the call over to it
    $agi->exec_goto('from-internal', $forward, 1);
    die();
}
$agi->stream_file('/var/lib/asterisk/agi-bin/antio');
$agi->hangup();

function get_transcript($audioFilePath) {
    // replace with the path to your private key JSON file
    $keyFilePath = '/var/lib/asterisk/agi-bin/key.json';
    putenv('GOOGLE_APPLICATION_CREDENTIALS=' . $keyFilePath);
    $client = new SpeechClient();
    $audio = (new RecognitionAudio())
        ->setContent(file_get_contents($audioFilePath));
    $config = new RecognitionConfig([
        'encoding' => AudioEncoding::ENCODING_UNSPECIFIED,
        'sample_rate_hertz' => 8000,
        'language_code' => 'el-GR'
    ]);
    $response = $client->recognize($config, $audio);
    $client->close();
    // Keep the most likely alternative of the first result; stay empty
    // when nothing was recognized
    $transcription = '';
    foreach ($response->getResults() as $result) {
        $alternatives = $result->getAlternatives();
        $transcription = $alternatives[0]->getTranscript();
        break;
    }
    // Debug aid: overwrite said.txt with the latest transcript
    file_put_contents('/var/lib/asterisk/agi-bin/said.txt', $transcription, LOCK_EX);
    return $transcription;
}

function get_sound_from_text($text) {
    global $agi;
    $tmpName = $agi->request['agi_uniqueid'] . "-" . rand(10, 10000) . ".mp3";
    $subscriptionKey = "your-KEY";
    // Escape the reply so characters such as & or < cannot break the SSML
    $safeText = htmlspecialchars($text, ENT_XML1 | ENT_QUOTES, 'UTF-8');
    $ssml = "<speak version='1.0' xml:lang='el-GR'> <voice xml:lang='el-GR' xml:gender='Male' name='el-GR-NestorasNeural'> " . $safeText . " </voice> </speak>";
    $curl = curl_init();
    curl_setopt_array($curl, array(
        CURLOPT_URL => "https://westeurope.tts.speech.microsoft.com/cognitiveservices/v1",
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_ENCODING => "",
        CURLOPT_MAXREDIRS => 10,
        CURLOPT_TIMEOUT => 0,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
        CURLOPT_CUSTOMREQUEST => "POST",
        CURLOPT_POSTFIELDS => $ssml,
        CURLOPT_HTTPHEADER => array(
            "Ocp-Apim-Subscription-Key: $subscriptionKey",
            "Content-Type: application/ssml+xml",
            "X-Microsoft-OutputFormat: audio-16khz-128kbitrate-mono-mp3",
            "User-Agent: curl"
        ),
    ));
    $response = curl_exec($curl);
    $err = curl_error($curl);
    curl_close($curl);
    if ($err) {
        file_put_contents("/tmp/error_stt.txt", "cURL Error #:" . $err);
    } else {
        file_put_contents("/tmp/" . $tmpName, $response);
    }
    // Convert the MP3 to 8 kHz mono WAV with sox so Asterisk can play it
    $newfln = str_replace(".mp3", ".wav", $tmpName);
    exec("sox " . escapeshellarg("/tmp/$tmpName") . " -c 1 -r 8000 " . escapeshellarg("/tmp/$newfln"));
    @unlink("/tmp/$tmpName");
    // stream_file() expects the path without the file extension
    return "/tmp/" . str_replace(".wav", "", $newfln);
}
 
function get_ai_response($text) {
    file_put_contents("/tmp/text.txt", $text);
    // The prompt we built earlier; the caller's message is appended to it
    global $prompt;
    $curl = curl_init();
    curl_setopt_array($curl, array(
        CURLOPT_URL => "https://api.openai.com/v1/completions",
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_ENCODING => "",
        CURLOPT_MAXREDIRS => 10,
        CURLOPT_TIMEOUT => 0,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
        CURLOPT_CUSTOMREQUEST => "POST",
        CURLOPT_POSTFIELDS => json_encode(array(
            // Model used when this was written; OpenAI has since retired it,
            // so substitute a current completions-capable model
            "model" => "text-davinci-003",
            "prompt" => $prompt . $text,
            "temperature" => 0,
            "max_tokens" => 300,
            "top_p" => 0,
            "frequency_penalty" => 0,
            "presence_penalty" => 0
        )),
        CURLOPT_HTTPHEADER => array(
            "Content-Type: application/json",
            "Authorization: Bearer your-KEY"
        ),
    ));
    $response = curl_exec($curl);
    curl_close($curl);
    $response = json_decode($response, true);
    // Fall back to an empty reply if the API returned an error or no choices
    $answer = isset($response["choices"][0]["text"]) ? $response["choices"][0]["text"] : "";
    file_put_contents("/tmp/ai.txt", $answer);
    return $answer;
}
?>

That's all, folks.
