Internet-Draft VCON CC Requirements July 2024
Rosenberg & Siciliano Expires 7 January 2025
Workgroup: Network Working Group
Internet-Draft: draft-rosenberg-vcon-cc-usecases-01
Published: July 2024
Intended Status: Informational
Expires: 7 January 2025
Authors: J. Rosenberg (Five9), A. Siciliano (Five9)

Contact Center Use Cases and Requirements for VCON

Abstract

This document outlines use cases and requirements for the exchange of VCONs (Virtual Conversations) within contact centers. A VCON is a standardized format for the exchange of call recordings and call metadata. Today, call recordings are exchanged between different systems within the contact center. Often, these exchanges rely on proprietary file formats and proprietary APIs. Using VCONs for these exchanges can reduce integration complexity.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 7 January 2025.

1. Introduction

Contact Centers (CC) are a capability provided by companies for the purposes of engaging with their customers. They are staffed by human agents whose job it is to handle customer interactions. Interactions include phone calls, emails, texts, and messages delivered through messaging vendors, such as Facebook Messenger and WhatsApp. Interactions can be inbound, when the customer initiates the conversation, such as by calling a toll-free number for the company. Interactions can also be outbound, such as when a company calls a customer with a reminder about an upcoming appointment.

Contact centers are implemented through the usage of software applications. These applications usually include web front ends consumed by agents, managers, supervisors and other personas in the contact center. These are supported by backend servers, which receive the interactions, queue them, distribute them to agents, and handle agent actions like transfers and holds. This functionality is sometimes referred to as the ACD - for Automatic Call Distribution. It is also sometimes called the core, as it represents the primary application in the contact center. Like much other software, the ACD was initially deployed on-premise, but has now largely migrated to cloud-based delivery. Vendors of cloud-delivered contact centers are often called Contact Center as a Service (CCaaS) vendors.

Within the contact center, there are numerous supporting applications that are purchased by companies and need to plug in to the core. These include recording, quality management (QM), and speech analytics (SA). These applications operate by obtaining recordings, along with recording meta-data, from the core. Today, these supporting applications make use of a variety of proprietary APIs to obtain these recordings and their meta-data. This means that the integrations vary from vendor to vendor, and result in incompatibilities, security weaknesses, and lengthy timelines to complete.

Recently, the IETF has begun to explore the standardization of a file format for recordings and recording meta-data, called VCON (Virtual Conversation) [I-D.petrie-vcon]. This document is meant to provide input to the VCON effort by describing the use cases and requirements specifically within the contact center.

2. Types of Supporting Applications

In the contact center, there are several different types of applications which require consumption of recordings. These typically go under the moniker of Workforce Optimization (WFO). This section describes the main ones.

2.1. Recording

Call Recording applications receive call recordings from the core, and then provide long term storage, playback, and search functionality. Recording storage is needed for archival purposes, and is often a requirement to meet compliance regulations in certain industries.

2.2. Quality Management (QM)

Quality Management (QM) applications are used by contact center managers to make sure agents are following guidelines on proper handling of calls. Many consumers are familiar with the greeting played in voice response systems which says, "This call may be monitored for quality and training purposes". That greeting refers specifically to QM applications.

QM applications allow a user - an employee in the quality management group typically - to play back recordings for a particular agent, and then based on that recording, rate them on how they performed. These ratings are made against a questionnaire that defines the rubric against which agents are scored. This rubric will often include questions like, "Did the agent thank the customer for calling and ask them how they can help?" Or, "Did the agent upsell the customer on the newest product?" The resulting scorecards are then shared with the agents and their managers (the supervisors), along with coaching and training materials to handle cases where the agent didn't do well. Originally, scoring was done entirely by humans, and as a consequence, only a handful of calls for each agent could possibly be scored. These calls were often chosen by random sampling.

It is also common for QM applications to use speech recognition technology to transcribe calls into text. This allows a call to be scored more quickly, and enables search functions for selection of specific calls that would be good candidates for scoring.

A part of the agent role involves usage of corporate applications, such as ordering, billing, and shipping, to handle the customer inquiry. To determine whether agents are using these tools correctly, it is common in the contact center for agents to have desktop recording applications installed. These record the screen content as a video file. Typically, the vendor of the QM software provides the desktop screen recording application and the backend applications which receive and store the recording. These desktop recordings are then combined with the audio, email, or chat recordings that come from the core. The following shows the flow of recordings in this use case:

+--------+
|Customer|
+--------+
    ^ Real Time
    | Voice
    |
    V    Recording                    +--------+
  +----+ Transfer   +----+  Access    |Quality |
  |Core+----------->| QM +----------->|Manager |
  +----+            +----+            +--------+
    ^ Real Time       ^
    | Voice           |
    |                 | Desktop
    V                 | Recording
+-------+             |
| Agent |-------------+
+-------+

Figure 1: QM Recording Exchanges

In this flow, the customer calls into the contact center, and is connected to the core. Typically this is done through the Session Initiation Protocol (SIP) [RFC3261] and the Real-Time Transport Protocol (RTP) [RFC3550]. The call is delivered to the agent, also typically using SIP and RTP. The core will record the call, and then at the end of the call, the recording is transferred to the QM system. During the call, the agent desktop is recorded, and this recording is transferred to the QM system. At a later time, the Quality Manager can log into the QM application, and access the recording, inclusive of the audio, the transcript and the desktop recording.

In practice, there are many variations on this basic exchange. Sometimes, the ACD sends the audio portion of the call to the QM system using real-time streaming, in some cases using SIPREC [RFC7866]. This is then augmented with meta-data using proprietary REST APIs. In other cases, the audio is sent post-call, and similarly, meta-data is obtained using proprietary REST APIs. When transcription takes place, it is most often done by the QM system, but not always. In some cases, a transcript is sent from the core to the QM system instead of, or in addition to, the audio recording.

In a similar way, the transfer of the desktop recording from the agent's computer to the QM system can happen in real-time or post-call. Post-call systems will often upload the recording in chunks, sometimes doing so after hours or when the agent is not on a call.

A key consideration for this use case is the concept of recording stitching.

In a typical call in the contact center, there are multiple segments, each of which represents a phase of the call. There will be a segment that contains the customer's interaction with the voice response system, where no agents were present. When the customer is connected to an agent, there will be a segment representing the portion of the call where the customer talks to the agent. Each time the call is conferenced, transferred or held, an additional segment is created.

The process of assembling together these segments into a complete recording is referred to as stitching. Different stitches are needed depending on the use case. In a QM use case, the quality manager is rating the agent, and thus what matters is the call as seen by that agent. In the case where a call was handled by two agents (handling by multiple agents is common in the contact center), a single call would result in two separate stitched recordings - one representing the customer's time with the first agent, and the second with the second agent. This is different from the recording use case described above, where what matters is the entire call as seen by the customer.
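As an illustration only, the two stitches described above can be sketched in Python. The Segment type, its field names, and the choice to attribute a hold segment to the agent who placed the hold are all assumptions made for this sketch; none of them are taken from the VCON format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Segment:
    """One phase of a call. All field names are hypothetical."""
    start: float            # seconds from the start of the call
    end: float
    kind: str               # e.g. "ivr", "agent", "hold"
    agent: Optional[str]    # None when no agent is present (e.g. IVR)


def stitch_for_customer(segments: list[Segment]) -> list[Segment]:
    """Recording view: every segment of the call, in time order."""
    return sorted(segments, key=lambda s: s.start)


def stitch_per_agent(segments: list[Segment]) -> dict[str, list[Segment]]:
    """QM view: one stitched recording per agent, in time order."""
    stitched: dict[str, list[Segment]] = {}
    for seg in sorted(segments, key=lambda s: s.start):
        if seg.agent is not None:
            stitched.setdefault(seg.agent, []).append(seg)
    return stitched


# A call that passes through the voice response system and two agents.
call = [
    Segment(0, 40, "ivr", None),
    Segment(40, 300, "agent", "agent-1"),
    Segment(300, 330, "hold", "agent-1"),
    Segment(330, 600, "agent", "agent-2"),
]
customer_view = stitch_for_customer(call)
agent_views = stitch_per_agent(call)
```

In this sketch the recording stitch contains all four segments, while the QM stitch yields one recording per agent and omits the voice response segment.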

2.3. Speech Analytics

Speech analytics applications provide graphs and dashboards on the content of conversations. For voice calls, this includes metrics like cross-talk, silence durations, and anger, which are computed directly from the voice. Voice calls are often transcribed to text, and further analysis is provided on the text. This might include customer sentiment, frequency of common reasons for call, and so on. These tools will also often provide discovery features, such as word clouds and clustering.
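As a rough sketch of how two of these voice metrics might be derived, the following Python assumes that some earlier step (such as voice activity detection, which is not shown) has already produced per-party talk intervals as (start, end) pairs in seconds. The function names and the interval representation are hypothetical.

```python
Interval = tuple[float, float]


def merge(intervals: list[Interval]) -> list[Interval]:
    """Sort intervals and merge any that overlap or touch."""
    merged: list[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def cross_talk(a: list[Interval], b: list[Interval]) -> float:
    """Total time during which both parties are speaking at once."""
    a, b = merge(a), merge(b)
    i = j = 0
    total = 0.0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if hi > lo:
            total += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def silence(a: list[Interval], b: list[Interval], duration: float) -> float:
    """Total time during which neither party is speaking."""
    spoken = sum(end - start for start, end in merge(a + b))
    return duration - spoken


customer = [(0.0, 10.0), (18.0, 30.0)]
agent = [(8.0, 15.0), (29.0, 40.0)]
overlap_seconds = cross_talk(customer, agent)
silent_seconds = silence(customer, agent, duration=45.0)
```

Metrics such as anger require analysis of the audio itself and are outside the scope of this sketch.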

Speech analytics tools are often used to help companies decide which calls should be reviewed for quality management. This is an improvement over purely random sampling. They are also used to help companies improve their processes in the contact center, identifying areas where agents are inefficient. For example, speech analytics can be used to determine that there has been a spike in customer refund requests, and that agents are taking too long to handle these types of calls.

Architecturally, speech analytics looks a lot like recording. At the end of the call, a transcript is sent from the core platform to the speech analytics platform for processing. Meta-data is then fetched.

3. PII and PCI Redaction

A common requirement in contact center use cases is the redaction of payment card information (PCI) and personally identifiable information (PII) from recordings and transcripts. This happens in several ways.

For payment cards, it is common for the agent to transfer the call to dedicated voice response systems whose job is to collect the credit card numbers and process them. This way, the agent never hears this information. Furthermore, the system can be configured to pause the recording so that this particular segment is not recorded. For cases where the agent does collect the credit card information, it is common for systems to have a "pause recording" button that can be triggered manually by the agent to ensure that this content is not recorded. Another common solution is to instrument the website where credit card information is entered, so that when the agent places their mouse into this form, the recording is paused. It would be useful in the VCON to indicate that this particular section of the recording was absent for PCI reasons.
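One way to picture such an indication is a list of gaps, each carrying a reason, alongside the recording. The Python below is only a sketch: the Gap type, the "pci" reason label, and the helper names are hypothetical and are not taken from the VCON format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Gap:
    """A span of the call that was deliberately not recorded."""
    start: float   # seconds from the start of the call
    end: float
    reason: str    # hypothetical label, e.g. "pci"


def recorded_spans(duration: float, gaps: list[Gap]) -> list[tuple[float, float]]:
    """Return the (start, end) spans of the call that were recorded."""
    spans = []
    cursor = 0.0
    for gap in sorted(gaps, key=lambda g: g.start):
        if gap.start > cursor:
            spans.append((cursor, gap.start))
        cursor = max(cursor, gap.end)
    if cursor < duration:
        spans.append((cursor, duration))
    return spans


def gap_reason_at(t: float, gaps: list[Gap]) -> Optional[str]:
    """Return why time t is missing from the recording, if it is."""
    for gap in gaps:
        if gap.start <= t < gap.end:
            return gap.reason
    return None


# The agent paused recording for 30 seconds while taking a card number.
gaps = [Gap(120.0, 150.0, "pci")]
spans = recorded_spans(300.0, gaps)
```

With this information, a downstream application can distinguish an intentional PCI pause from a recording failure or from ordinary silence.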

It is also a common request to remove PII, such as first and last name, street address, email address, and phone numbers, from recordings and from transcripts. In such cases, it is desirable to clearly indicate in the transferred recording that this has happened, so that downstream analytics applications function properly. Just replacing a first name with "XXX" is likely to confuse a word cloud tool in a speech analytics application, and make it think that "XXX" is a common word in the transcript. At the same time, just removing the PII entirely results in transcripts that are no longer well-formed language, making them harder to process with natural language understanding (NLU) tools.
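The following Python sketch illustrates one possible approach: each redacted item is replaced with a typed placeholder, and a separate list records where the placeholders are, so that a downstream tool can exclude them from word counts while the sentence stays grammatical. The placeholder syntax, the Redaction type, and the two regular expressions are assumptions made for illustration. They are deliberately simplistic; detecting names or street addresses would require more than pattern matching.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Redaction:
    """Location of one placeholder in the redacted text (hypothetical)."""
    start: int
    end: int
    category: str


# Illustrative patterns only; not suitable for production PII detection.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def redact(text: str) -> tuple[str, list[Redaction]]:
    """Replace matches with typed placeholders and report their offsets."""
    matches = []
    for category, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            matches.append((m.start(), m.end(), category))
    matches.sort()

    pieces: list[str] = []
    redactions: list[Redaction] = []
    cursor = 0   # position in the original text
    length = 0   # length of the redacted text built so far
    for start, end, category in matches:
        if start < cursor:
            continue  # skip a match that overlaps an earlier one
        chunk = text[cursor:start]
        placeholder = f"[{category}]"
        pieces.append(chunk)
        length += len(chunk)
        redactions.append(Redaction(length, length + len(placeholder), category))
        pieces.append(placeholder)
        length += len(placeholder)
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces), redactions


redacted, marks = redact("Reach me at jane@example.com or 415-555-0100 today.")
```

Because the offsets refer to the redacted text, a word cloud tool can drop the marked spans before counting words, and an NLU tool still receives a complete sentence.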

4. Omni Channel

In the contact center, the term "omni channel" is used to refer to the usage of non-voice communications with a customer. Sometimes, this means an email exchange or web chat from a widget on a web page. In other cases, it can involve a combination of voice with these other technologies. For example, a customer might call into the contact center, and then the agent uses SMS to send the customer links to information, or collect information from the customer. In that case, the overall interaction is composed of a voice segment and an SMS segment combined together.

In some cases, video is used in contact center applications. Mostly, this is in support of the "see what I see" case, where the customer uses the front camera on their mobile phone to show something to the contact center agent. For example, a customer might show the agent a part that is broken and needs to be replaced, to help the agent identify which part to send. In other use cases, traditional person-to-person video is used, in high-touch support or sales use cases.

Co-browsing is also used in contact center applications. This is sometimes used in support situations, where a customer is having trouble navigating the website. The agent can take control over the browsing experience and get the customer where they need to be. This is different from the screen sharing use cases common in meetings.

As it relates to recording, all of these additional channels need to be included in the VCON.

5. Deployment Topologies

As one might imagine, there are a variety of deployment topologies for these applications, mixing and matching on-premise vs cloud delivery. The core platform can be delivered on-premise, or via cloud. The supporting applications can also be delivered on-premise or via cloud. In the cloud delivery model, they can be co-resident with the core application (meaning, the vendor of the core service also deploys and operates the supporting application), or be delivered via different cloud services.

6. Required Meta-Data

This section enumerates the meta-data which needs to be transferred from the core application to recording, QM and speech analytics applications. This represents the information that is transferred today between the core and supporting applications.

7. Informative References

[I-D.petrie-vcon]
Petrie, D. and T. McCarthy-Howe, "The JSON format for vCon - Conversation Data Container", Work in Progress, Internet-Draft, draft-petrie-vcon-03, <https://datatracker.ietf.org/doc/html/draft-petrie-vcon-03>.
[RFC3261]
Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A., Peterson, J., Sparks, R., Handley, M., and E. Schooler, "SIP: Session Initiation Protocol", RFC 3261, DOI 10.17487/RFC3261, <https://www.rfc-editor.org/info/rfc3261>.
[RFC3550]
Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550, <https://www.rfc-editor.org/info/rfc3550>.
[RFC7866]
Portman, L., Lum, H., Ed., Eckel, C., Johnston, A., and A. Hutton, "Session Recording Protocol", RFC 7866, DOI 10.17487/RFC7866, <https://www.rfc-editor.org/info/rfc7866>.

Authors' Addresses

Jonathan Rosenberg
Five9
Andrew Siciliano
Five9