Internet-Draft draft-zhou-rtgwg-perceptive-routing-info October 2024
Zhou, et al. Expires 21 April 2025 [Page]
Workgroup:
Network Working Group
Internet-Draft:
draft-zhou-rtgwg-perceptive-routing-information-00
Published:
Intended Status:
Experimental
Expires:
21 April 2025
Authors:
T. Zhou
Huawei
D. Li
Tsinghua University
X. Geng
Huawei

Perceptive Routing Information Model

Abstract

This document defines the information model for perceptive routing, which could serve as a foundational component in the implementation of perceptive routing.

Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 21 April 2025.


1. Introduction

In many scenarios, especially in data centers (DCs), adaptive routing has emerged as a crucial technique for enhancing network performance and resilience. Traditional routing methods, which rely on static or predefined paths, often struggle to cope with rapidly changing network conditions such as link failures, congestion, and varying traffic demands. Adaptive routing addresses these challenges by adjusting routing decisions in real time based on the current state of the network.

Adaptive routing systems such as Perceptive Routing (PR) continuously monitor network parameters, such as port status, congestion levels, and link SLAs, and use them to make informed decisions that improve traffic distribution and fault tolerance. A standardized information model abstracts the essential properties and relationships within such a system, allowing different implementations to interoperate. It provides a common representation of network state, allowing devices to exchange critical information such as failures, congestion, and optimal paths, and thereby facilitating dynamic, automated decision-making.

This document defines the information model for perceptive routing, which could serve as a foundational component in the implementation of perceptive routing.

2. Terminology

PR-SN: Perceptive Routing Sensing Node. A node that senses local and network information used for routing decisions.

PR-RN: Perceptive Routing Routing Node. A node that uses multi-dimensional sensing information to make routing decisions, such as rerouting, rate adjustment, and load balancing.

PR-N: Perceptive Routing Notification. The message sent from a PR-SN to a PR-RN.

3. Perceptive Routing General Process

The perceptive routing (PR) mechanism, akin to the adaptive routing network (ARN), aims to ensure efficient and resilient routing in dynamic network environments. PR relies on real-time monitoring and on decisions based on multi-dimensional network status information, which enables the network to adapt to changes such as congestion or link failures with minimal disruption. The general process is summarized as follows:

1. Detection of Network Status Changes

Perceptive Routing Sensing Nodes (PR-SNs) continuously monitor both local and network-level conditions to detect anomalies or changes in network performance, such as congestion or link and node failures. When such a condition is detected, the PR-SN assesses whether it can be resolved locally or requires further action.

2. Impact Assessment and Notification

If the PR-SN determines that local measures (e.g., congestion mitigation strategies) are insufficient to address the problem, it generates a Perceptive Routing Notification (PR-N). The PR-N message carries detailed information about the change in network status (e.g., the type of failure and the affected links or nodes) and is sent to the Perceptive Routing Routing Node (PR-RN) or to other designated nodes. These messages inform PR-RNs of issues that could affect network performance, allowing them to take proactive steps.

3. Routing Decision and Mitigation

Upon receiving a PR-N message, the PR-RN analyzes the information it carries to make appropriate routing decisions, such as rerouting, rate adjustment, or load balancing.

By leveraging real-time data provided by PR-SNs and applying decision-making algorithms, the PR-RN reroutes or adjusts traffic dynamically, reducing latency, avoiding congested paths, and improving overall network efficiency.
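The three steps above can be sketched as follows. This is a minimal Python illustration, not part of the model: the PRNotification fields, the normalized severity value, the local_fix_limit threshold, and the decision policy in routing_node_decide are all hypothetical assumptions chosen only to show the division of work between a PR-SN and a PR-RN.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PRNotification:
    """Hypothetical PR-N content; field names are illustrative assumptions."""
    event_type: str        # e.g. "congestion" or "link-failure"
    affected_element: str  # identifier of the affected link or node
    severity: float        # assumed normalized scale: 0.0 (benign) to 1.0 (severe)


def sensing_node_step(event_type: str, element: str, severity: float,
                      local_fix_limit: float = 0.5) -> Optional[PRNotification]:
    """Steps 1-2: a PR-SN handles minor events locally and escalates the rest."""
    if severity <= local_fix_limit:
        return None  # assume local mitigation is sufficient; no PR-N is sent
    return PRNotification(event_type, element, severity)


def routing_node_decide(msg: PRNotification) -> str:
    """Step 3: a PR-RN maps a PR-N to a routing decision (hypothetical policy)."""
    if msg.event_type == "link-failure":
        return "reroute"
    if msg.event_type == "congestion":
        # Moderate congestion: slow down; severe congestion: spread the load.
        return "adjust-rate" if msg.severity < 0.8 else "load-balance"
    return "no-action"
```

For example, sensing_node_step("congestion", "eth1", 0.9) returns a PR-N that routing_node_decide maps to "load-balance", while an event with severity 0.3 is handled locally and produces no notification.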

The following sections define a standardized information model for this general process.

4. Perceptive Routing Information Model

4.1. Local information model of PR Sensing Node

This section focuses on the attributes that a PR Sensing Node (PR-SN) collects while monitoring and gathering real-time data about local conditions.

4.1.1. Port Failure

This type of attribute represents the status of ports on a node. This attribute indicates whether a port has failed and can no longer transmit or receive traffic. Monitoring port failures allows the network to quickly reroute traffic or trigger failover mechanisms.

The possible attributes could include:

  • Port Status: Indicates if the port is active, down, or in a failed state.

  • Failure Cause: Specifies reasons for failure, such as hardware issues, misconfigurations, or timeouts.
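One way to represent the Port Failure attributes is shown below as a minimal Python sketch. The class and enum names are hypothetical, and treating both "down" and "failed" ports as needing failover is an assumption rather than something this document specifies.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class PortStatus(Enum):
    ACTIVE = "active"
    DOWN = "down"
    FAILED = "failed"


class FailureCause(Enum):
    HARDWARE = "hardware"
    MISCONFIGURATION = "misconfiguration"
    TIMEOUT = "timeout"


@dataclass
class PortFailureInfo:
    port_id: str
    status: PortStatus
    cause: Optional[FailureCause] = None  # reported when the port is not active

    def needs_failover(self) -> bool:
        # Assumption: neither DOWN nor FAILED ports can carry traffic.
        return self.status is not PortStatus.ACTIVE
```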

4.1.2. Congestion

This type of attribute represents the level of congestion at the node, typically measured by monitoring packet delay, packet loss, and throughput. This attribute informs the system of where congestion points are forming, helping to reroute traffic or apply congestion control techniques.

The possible attributes could include:

  • Traffic Load: Measures current traffic levels on the link.

  • Congestion Thresholds: Defines limits for congestion states.

  • Packet Drop Rate: The rate at which packets are dropped due to congestion.
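The Congestion attributes could be combined into a congestion state roughly as follows. This sketch assumes load is expressed as a fraction of link capacity, assumes a two-level threshold scheme (warning and congested), and assumes any congestion-induced drop marks the link as congested; none of these choices are defined by this document.

```python
from dataclasses import dataclass
from enum import Enum


class CongestionState(Enum):
    NORMAL = 0
    WARNING = 1
    CONGESTED = 2


@dataclass
class CongestionInfo:
    traffic_load: float         # current load as a fraction of capacity (0.0-1.0)
    warning_threshold: float    # hypothetical lower congestion threshold
    congested_threshold: float  # hypothetical upper congestion threshold
    packet_drop_rate: float     # fraction of packets dropped due to congestion

    def state(self) -> CongestionState:
        # Assumption: any congestion drop, or load above the upper threshold,
        # means the link is congested.
        if self.packet_drop_rate > 0 or self.traffic_load >= self.congested_threshold:
            return CongestionState.CONGESTED
        if self.traffic_load >= self.warning_threshold:
            return CongestionState.WARNING
        return CongestionState.NORMAL
```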

4.1.3. Queue Length

This type of attribute represents the length of queues in the node. High queue lengths indicate potential bottlenecks and delays, while short queues suggest fast packet forwarding. This attribute is vital for assessing node performance and avoiding network congestion.

The possible attributes could include:

  • Queue Depth: Real-time data about the number of packets in the queue.

  • Queue Thresholds: Defines the conditions under which the queue is considered to have overflowed, possibly leading to packet loss.
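A minimal sketch of the Queue Length attributes follows. The capacity and high_watermark fields are hypothetical stand-ins for the Queue Thresholds attribute, and the overflow check assumes tail drop, where packets arriving at a full queue are discarded.

```python
from dataclasses import dataclass


@dataclass
class QueueInfo:
    queue_id: int
    depth: int           # packets currently queued (Queue Depth)
    capacity: int        # maximum packets the queue can hold (assumption)
    high_watermark: int  # hypothetical threshold signalling a likely bottleneck

    def occupancy(self) -> float:
        return self.depth / self.capacity

    def is_bottleneck(self) -> bool:
        return self.depth >= self.high_watermark

    def is_overflowing(self) -> bool:
        # Under tail drop, a full queue discards newly arriving packets.
        return self.depth >= self.capacity
```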

4.1.4. Link SLA

This type of attribute represents the Service Level Agreement (SLA) associated with the link, including metrics like bandwidth, latency, jitter, and availability. The node monitors whether the link's performance is within the agreed SLA parameters and flags any violations for corrective action.

The possible attributes could include:

  • Link Latency: Measures the round-trip delay across the link.

  • Bandwidth Utilization: Tracks the percentage of available bandwidth being used.
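The SLA check described above might be sketched like this. The field names and the utilization bound are illustrative assumptions; the paragraph above also lists jitter and availability, which are omitted here for brevity.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LinkSLA:
    max_latency_ms: float       # agreed upper bound on round-trip latency
    max_utilization_pct: float  # hypothetical bound on bandwidth utilization


@dataclass
class LinkMeasurement:
    latency_ms: float           # measured round-trip delay
    utilization_pct: float      # percentage of available bandwidth in use


def sla_violations(sla: LinkSLA, m: LinkMeasurement) -> List[str]:
    """Return the names of SLA metrics that are currently out of bounds."""
    violations = []
    if m.latency_ms > sla.max_latency_ms:
        violations.append("latency")
    if m.utilization_pct > sla.max_utilization_pct:
        violations.append("bandwidth-utilization")
    return violations
```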

4.2. Network information model of PR Sensing Node

This section covers the attributes about network conditions beyond the local node, providing insights about paths, bottlenecks, and topology to assist in making routing decisions.

4.2.1. On Path Information

This type of attribute represents detailed information about the current paths in use for traffic forwarding, including path metrics such as latency, jitter, and hop count. This attribute allows the node to assess the quality of the existing paths and their suitability for ongoing traffic demands.

The possible attributes could include:

  • Hop Count: Number of hops the data takes between source and destination.

  • Latency Per Hop: The time it takes to traverse each node.
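The On Path Information attributes relate as shown in this sketch: hop count follows from the list of nodes on the path, and summing the per-hop latencies gives the path's end-to-end latency. The class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OnPathInfo:
    nodes: List[str]                 # node identifiers along the path, source first
    latency_per_hop_ms: List[float]  # delay contributed by each hop

    def __post_init__(self) -> None:
        # A path through N nodes has N - 1 hops, each with one latency sample.
        if len(self.latency_per_hop_ms) != len(self.nodes) - 1:
            raise ValueError("expected one latency sample per hop")

    def hop_count(self) -> int:
        return len(self.nodes) - 1

    def end_to_end_latency_ms(self) -> float:
        return sum(self.latency_per_hop_ms)
```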

4.2.2. Bottleneck Information

This type of attribute identifies and describes network bottlenecks where traffic is delayed or congested. This can include points where the capacity of a link is exceeded or where high latency is introduced due to excessive queuing.

The possible attributes could include:

  • Link Utilization: Monitors bandwidth use on specific bottleneck links.

  • Queue Status: Alerts when queues at a bottleneck link are nearing full capacity.
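Bottleneck identification could be sketched as below. The sketch assumes that the bottleneck is the link with the highest utilization and that a queue is "nearing full capacity" at 90% occupancy; both are illustrative assumptions, as are the names used.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LinkState:
    link_id: str
    utilization: float      # fraction of link capacity in use (0.0-1.0)
    queue_occupancy: float  # fraction of the egress queue in use (0.0-1.0)


def find_bottleneck(links: List[LinkState]) -> Optional[LinkState]:
    # Assumption: the bottleneck is the most utilized link; None for no links.
    return max(links, key=lambda link: link.utilization, default=None)


def queue_alerts(links: List[LinkState], alert_level: float = 0.9) -> List[str]:
    # Links whose queues are nearing full capacity (alert_level is an assumption).
    return [link.link_id for link in links if link.queue_occupancy >= alert_level]
```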

4.2.3. Topology Information

This type of attribute represents the structure of the network from the node's perspective. This attribute includes details such as connected neighbors, available paths, link states, and node status, providing a network-wide view for optimizing routing decisions.

The possible attributes could include:

  • Neighboring Nodes: A list of adjacent nodes and their statuses.

  • Link Metrics: Performance and quality of links connecting nodes in the topology.
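A simple topology view with neighbor lists and a reachability check might look like this. The adjacency-map representation, the metric values, and the per-node up/down flag are assumptions for illustration; breadth-first search is used only to show how available paths could be derived from link and node state.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TopologyInfo:
    # node -> {neighbor: link metric}; the metric's meaning is an assumption
    links: Dict[str, Dict[str, float]] = field(default_factory=dict)
    # node -> True if the node is reported as up
    node_up: Dict[str, bool] = field(default_factory=dict)

    def add_link(self, a: str, b: str, metric: float) -> None:
        """Record a bidirectional link; newly seen nodes are assumed up."""
        self.links.setdefault(a, {})[b] = metric
        self.links.setdefault(b, {})[a] = metric
        self.node_up.setdefault(a, True)
        self.node_up.setdefault(b, True)

    def neighbors(self, node: str) -> List[str]:
        """Adjacent nodes that are currently up, sorted for stable output."""
        return sorted(n for n in self.links.get(node, {}) if self.node_up.get(n, False))

    def reachable(self, src: str, dst: str) -> bool:
        """Breadth-first search restricted to nodes that are up."""
        if not (self.node_up.get(src, False) and self.node_up.get(dst, False)):
            return False
        seen, frontier = {src}, deque([src])
        while frontier:
            node = frontier.popleft()
            if node == dst:
                return True
            for n in self.neighbors(node):
                if n not in seen:
                    seen.add(n)
                    frontier.append(n)
        return False
```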

4.3. Routing decision information model of PR routing node

This section covers the key attributes that influence the decision-making processes within a routing node. These attributes determine how traffic is routed, how congestion is managed, and how network resources are allocated.

4.3.1. Reroute

This type of attribute describes the mechanisms and criteria used to reroute traffic in response to changes in the network, such as link failures or congestion events. This attribute ensures that traffic is dynamically redirected to optimal paths.

The possible attributes could include:

  • Reroute Path: The alternative path selected during rerouting.

  • Failover Time: Time taken to switch to an alternate path.
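The Reroute attributes can be illustrated with a shortest-path computation that excludes failed links. This sketch uses Dijkstra's algorithm as one possible way to select the Reroute Path; the document does not mandate any algorithm, the function name reroute is hypothetical, and Failover Time, which depends on data-plane switchover, is not modeled.

```python
import heapq
from typing import Dict, Iterable, List, Optional, Tuple

Graph = Dict[str, Dict[str, float]]  # node -> {neighbor: link cost}


def reroute(graph: Graph, src: str, dst: str,
            failed_links: Iterable[Tuple[str, str]] = ()) -> Optional[List[str]]:
    """Return a least-cost path from src to dst that avoids failed links (Dijkstra)."""
    failed = {frozenset(link) for link in failed_links}  # links are undirected
    dist: Dict[str, float] = {src: 0.0}
    prev: Dict[str, str] = {}
    heap = [(0.0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            # Walk predecessors back to the source to rebuild the path.
            path = [dst]
            while path[-1] != src:
                path.append(prev[path[-1]])
            return path[::-1]
        if d > dist[node]:
            continue  # stale heap entry; a shorter distance was already found
        for nbr, cost in graph.get(node, {}).items():
            if frozenset((node, nbr)) in failed:
                continue
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    return None  # dst unreachable once failed links are removed
```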

4.3.2. Congestion Control

This type of attribute details the strategies and protocols used to manage congestion at the routing node. This attribute includes techniques like rate-limiting, traffic shaping, or prioritizing certain flows to alleviate network congestion.

The possible attributes could include:

  • Congestion Avoidance Policies: Mechanisms to prevent congestion before it occurs.

  • Rate Limiting: Controls the traffic rate to avoid overwhelming the network.
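Rate Limiting is commonly implemented with a token bucket; the sketch below shows that technique. The class, its parameters, and the choice of a token bucket itself are assumptions, since this document does not prescribe a rate-limiting mechanism. The caller supplies the current time (for example from time.monotonic()) so the behavior is deterministic in tests.

```python
from dataclasses import dataclass, field


@dataclass
class TokenBucket:
    rate_bytes_per_s: float  # sustained rate at which tokens accrue
    burst_bytes: float       # bucket depth, i.e. the largest allowed burst
    last_time: float = 0.0   # time of the previous refill, in seconds
    tokens: float = field(init=False, default=0.0)

    def __post_init__(self) -> None:
        self.tokens = self.burst_bytes  # start with a full bucket

    def allow(self, packet_bytes: int, now: float) -> bool:
        """Refill tokens for the elapsed time, then admit the packet if enough remain."""
        elapsed = max(0.0, now - self.last_time)
        self.tokens = min(self.burst_bytes, self.tokens + elapsed * self.rate_bytes_per_s)
        self.last_time = now
        if self.tokens >= packet_bytes:
            self.tokens -= packet_bytes
            return True
        return False  # non-conforming: drop, mark, or queue per local policy
```

With a 1000 bytes/s rate and a 1500-byte burst, a 1500-byte packet at t=0 drains the bucket, a 1000-byte packet at t=0.5 is rejected because only 500 bytes of tokens have accrued, and another at t=1.0 is admitted.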

4.3.3. ECMP (Equal-Cost Multi-Path) Mode

This type of attribute refers to Equal-Cost Multi-Path (ECMP) routing, where traffic is spread across multiple paths of equal cost. This attribute describes how ECMP is implemented and the criteria used for path selection.

The possible attributes could include:

  • Hash Algorithm: Determines how ECMP chooses paths.

  • Traffic Distribution: Shows how traffic is split across multiple paths.
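Hash-based ECMP path selection can be sketched as follows. The 5-tuple key and the use of SHA-256 with a modulo over the next-hop list are illustrative assumptions; devices commonly use other hash functions, and the Hash Algorithm attribute is where such a choice would be recorded. The distribution helper approximates the Traffic Distribution attribute by counting flows, not bytes.

```python
import hashlib
from collections import Counter
from typing import Dict, List, Tuple

# (src IP, dst IP, src port, dst port, IP protocol)
FiveTuple = Tuple[str, str, int, int, int]


def ecmp_select(flow: FiveTuple, next_hops: List[str]) -> str:
    """Map a flow to one next hop; all packets of the flow take the same path."""
    key = "|".join(map(str, flow)).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]


def distribution(flows: List[FiveTuple], next_hops: List[str]) -> Dict[str, float]:
    """Fraction of flows (not bytes) assigned to each next hop."""
    counts = Counter(ecmp_select(f, next_hops) for f in flows)
    return {hop: counts[hop] / len(flows) for hop in next_hops}
```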

4.3.4. Hierarchical Routing

This type of attribute covers the use of hierarchical routing techniques to manage larger networks efficiently. This attribute provides information about how the network is divided into tiers or areas, with routing decisions optimized within each layer.

The possible attributes could include:

  • Routing Layers: Defines the layers of routing, such as access, aggregation, and core.

  • Aggregated Traffic Metrics: Summarizes traffic data for groups of lower-layer nodes.
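The Aggregated Traffic Metrics attribute could be computed as in this sketch, which sums the traffic of a layer's nodes per upper-layer parent. The parent relationship, the Gbit/s unit, and the function name are assumptions; the layer names come from the example layers listed above.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class NodeTraffic:
    node_id: str
    layer: str           # e.g. "access", "aggregation", or "core"
    parent: str          # upper-layer node this node attaches to (assumption)
    traffic_gbps: float  # current traffic in Gbit/s (assumed unit)


def aggregate_layer(nodes: List[NodeTraffic], layer: str) -> Dict[str, float]:
    """Sum the traffic of nodes in `layer`, grouped by their upper-layer parent."""
    totals: Dict[str, float] = defaultdict(float)
    for n in nodes:
        if n.layer == layer:
            totals[n.parent] += n.traffic_gbps
    return dict(totals)
```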

4.3.5. Service Routing

This type of attribute describes how the routing node handles service-specific routing requirements, such as directing traffic based on application needs (e.g., video streaming, voice, or data). This attribute ensures that service-level routing objectives are met, such as prioritizing latency-sensitive traffic.

The possible attributes could include:

  • Service Path: The path chosen for traffic according to a specific service type.

  • Service-Specific SLAs: Monitors SLA adherence based on service-level routing.
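Service Path selection against Service-Specific SLAs might be sketched as follows. The SLA values in SERVICE_SLAS are placeholders, not recommendations, the class and function names are hypothetical, and picking the lowest-latency eligible path is only one possible policy.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ServiceSLA:
    max_latency_ms: float
    max_loss_pct: float


@dataclass
class CandidatePath:
    hops: List[str]
    latency_ms: float
    loss_pct: float


# Hypothetical per-service SLAs; values are placeholders only.
SERVICE_SLAS: Dict[str, ServiceSLA] = {
    "voice": ServiceSLA(max_latency_ms=20.0, max_loss_pct=1.0),
    "video": ServiceSLA(max_latency_ms=50.0, max_loss_pct=0.5),
    "data": ServiceSLA(max_latency_ms=200.0, max_loss_pct=2.0),
}


def select_service_path(service: str,
                        candidates: List[CandidatePath]) -> Optional[CandidatePath]:
    """Return the lowest-latency candidate that meets the service's SLA, if any."""
    sla = SERVICE_SLAS[service]
    eligible = [p for p in candidates
                if p.latency_ms <= sla.max_latency_ms and p.loss_pct <= sla.max_loss_pct]
    return min(eligible, key=lambda p: p.latency_ms, default=None)
```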

5. Security Considerations

6. IANA Considerations

This document makes no request of IANA.

Note to RFC Editor: this section may be removed on publication as an RFC.

7. Acknowledgements

8. Normative References

[RFC2119]
Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997, <https://www.rfc-editor.org/info/rfc2119>.

Authors' Addresses

Tianran Zhou
Huawei
Dan Li
Tsinghua University
Xuesong Geng
Huawei