Internet-Draft | draft-hegde-lsr-ospf-better-idbx | July 2024 |
Hegde, et al. | Expires 27 January 2025 |
When an OSPF router restarts, previous instances of LSAs belonging to that router may remain in the databases of other routers in the OSPF domain until such LSAs are aged out. Hence, when the restarting router joins the network again, neighboring routers re-establish adjacencies while the restarting router is still bringing up its interfaces and adjacencies and generating LSAs with sequence numbers that may be lower than those of the stale LSAs. Such stale LSAs may be interpreted as bi-directional connectivity before the initial database exchanges are finished and genuine bi-directional LSA connectivity exists. Such incorrect interpretation may lead to, among other things, transient packet drops. This document suggests improvements in the OSPF database exchange process to prevent such problems due to stale LSA utilization. The solution does not preclude changes in the existing standard but presents an extension that will prevent this scenario.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on 27 January 2025.
Copyright (c) 2024 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.
When an OSPF [RFC2328] router restarts, its stale LSAs are left in the database of other routers in the OSPF domain until the LSAs are aged out either intentionally or by the LSA age elapsing. The stale Router LSA can contain links to all the neighbors that had Full adjacencies before the router restarted.
Figure 1 shows a very simple OSPF network. If C restarts without generating purges, the other routers in the domain will hold the stale LSA of Router C in their databases. The stale LSA may have links to B and E, representing the topology of C before it went down. When C comes back up, it initiates the database exchange process with B and E. B and E may hold C's stale LSA with a higher sequence number than the new one originated by C and hence erroneously treat the stale copy as the newest one, bring up the adjacency with C, and transition to Full state. Because C's stale LSA has links to B and E, the Shortest Path First (SPF) back-link check is satisfied, and B and E update their routing tables to point to C. C may then drop this traffic, as it may not yet have all its previous adjacencies up and all LSAs in place to correctly compute the necessary routes. The situation corrects itself once C reissues its LSAs with even higher sequence numbers, so the condition is already transient today.
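The back-link check described above can be illustrated with a minimal sketch. The dictionary-based LSDB, the function name `has_bidirectional_link`, and the reduced three-router view are assumptions made for illustration only; they are not taken from this document or from [RFC2328].

```python
# Hypothetical LSDB model: router ID -> set of neighbors listed in its Router LSA.
def has_bidirectional_link(lsdb, a, b):
    """SPF back-link check: use the link a->b only if b's LSA also lists a."""
    return b in lsdb.get(a, set()) and a in lsdb.get(b, set())


# LSDB as seen by B while C is restarting (assumed, simplified view).
lsdb = {
    "B": {"C"},       # B reached Full with C and advertises the link
    "C": {"B", "E"},  # stale Router LSA of C from before the restart
    "E": set(),       # E has not yet re-advertised its link to C
}

# With the stale LSA in place, the check passes although C is not ready.
stale_result = has_bidirectional_link(lsdb, "B", "C")

# Once a fresh LSA from C (no adjacencies listed yet) replaces the stale one,
# the check fails and no path is computed through C.
lsdb["C"] = set()
fresh_result = has_bidirectional_link(lsdb, "B", "C")

print(stale_result, fresh_result)
```

The sketch only shows why a stale Router LSA is enough to satisfy the check; a real SPF implementation evaluates it per link while building the shortest-path tree.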
To prevent the transient condition described above, the database exchange procedure from Section 7.2 of [RFC2328] is extended with additional constraints that prevent an OSPF router from transitioning to Full state while it has stale LSAs originated by the database exchange neighbor in its Link State Database (LSDB).
During an initial database exchange (IDBX), the router has determined that it should become adjacent with another OSPF router; for that purpose it initially enters the ExStart state and creates a "Database summary list" for the neighbor. In addition to this list, a "Stale database exchange list" is created for the neighbor and is initially populated with the LSAs in the LSDB that were originated by it. The neighbor will not transition to Full state until both the Link state request list and the Stale database exchange list are empty. During the Database Exchange process, an entry is removed from the Stale database exchange list when the same or a more recent instance of the LSA is referenced in a Database Description packet, or when the neighbor originates a more recent instance of the LSA and it is received in a Link State Update packet. For stale LSAs in the router's Link State Database, the neighbor will request the stale LSAs and originate more recent instances via normal OSPF procedures as illustrated in Section 2.2.
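The list handling described above could be sketched as follows. The `Neighbor` class, its method names, and the LSA key layout are hypothetical and chosen for illustration. As a simplifying assumption, LSA instances are compared by sequence number only, whereas Section 13.1 of [RFC2328] also considers checksum and age.

```python
from dataclasses import dataclass, field


@dataclass
class Neighbor:
    router_id: str
    link_state_request_list: set = field(default_factory=set)
    # Maps LSA key -> sequence number of the stale copy held locally.
    stale_dbx_list: dict = field(default_factory=dict)

    def start_exchange(self, lsdb):
        """On ExStart: list every LSDB entry originated by this neighbor."""
        self.stale_dbx_list = {
            key: seq for key, seq in lsdb.items() if key[2] == self.router_id
        }

    def on_dd_header(self, key, seq):
        """A DD packet references the same or a more recent instance."""
        if key in self.stale_dbx_list and seq >= self.stale_dbx_list[key]:
            del self.stale_dbx_list[key]

    def on_ls_update(self, key, seq):
        """An LS Update carries a more recent instance from the neighbor."""
        if key in self.stale_dbx_list and seq > self.stale_dbx_list[key]:
            del self.stale_dbx_list[key]

    def may_become_full(self):
        """Full requires both lists to be empty."""
        return not self.link_state_request_list and not self.stale_dbx_list


# Keys are (ls_type, link_state_id, advertising_router); values are seq numbers.
lsdb = {(1, "C", "C"): 7, (1, "B", "B"): 3}
nbr = Neighbor("C")
nbr.start_exchange(lsdb)

nbr.on_dd_header((1, "C", "C"), 2)   # older than the stale copy: entry stays
blocked = not nbr.may_become_full()

nbr.on_ls_update((1, "C", "C"), 8)   # newer instance arrives: entry removed
unblocked = nbr.may_become_full()
print(blocked, unblocked)
```

Keeping the stale list separate from the Link state request list means the existing request processing is left untouched, and the extra check only gates the transition to Full.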
The following additions and modifications to OSPF [RFC2328] are needed for this feature. The application of the feature SHOULD be based on local configuration (refer to Section 5).
Figure 2 provides an example of C restarting after having originated an LSA with sequence number Y. After restarting, C originates the same LSA with sequence number X, where X < Y, since it is not yet aware of the existence of version Y.
As shown in Figure 2 above, E originates an LSReq with sequence number X but waits until the LSA with sequence number Y+1 (or strictly speaking, an LSA that compares as newer than the one it holds) arrives. As long as the LSA is still in the Stale DB Exchange List, the adjacency remains in Loading state and does not move to Full state. All the neighbors of the restarting router hold the neighbor FSM in Loading state and do not transition to Full state until the stale LSA is replaced with an LSA that is more recent than the stale LSA. This ensures that other routers in the network do not compute a path through the restarting router since they cannot satisfy the bi-directionality condition in SPF computations.
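The sequence-number progression in this example can be traced with a short sketch. The concrete values of X and Y and the helper names are assumptions for illustration. Sequence numbers are modeled as plain integers, and the re-origination step follows the self-originated LSA handling of Section 13.4 of [RFC2328] in simplified form.

```python
def reoriginate_seq(own_seq, received_seq):
    """Restarting router sees a newer self-originated LSA: advance past it."""
    return received_seq + 1 if received_seq > own_seq else own_seq


X, Y = 1, 5          # hypothetical values with X < Y
held_by_e = Y        # E still holds the stale instance with sequence number Y


def e_may_leave_loading(offered_seq):
    """E removes the stale entry only for an instance newer than the one held."""
    return offered_seq > held_by_e


before = e_may_leave_loading(X)      # C offers X: E stays in Loading
new_seq = reoriginate_seq(X, Y)      # C receives Y from E and re-originates
after = e_may_leave_loading(new_seq)
print(before, new_seq, after)
```

With these values, C re-originates the LSA with sequence number Y+1, which is the first instance that lets E clear the stale entry.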
The solution described in Section 2 ensures that stale LSAs for a restarting router are either updated or purged before neighbors of the restarting router advertise a link to the restarting router. This section describes optimizations that are being considered.
Rather than adding all the restarting router's LSAs in the Link-State Database (LSDB) to the Stale DB Exchange list, only the following LSAs would be added:
The procedures in Section 2 guarantee that the LSAs on the Stale DB Exchange list are updated before the local router advertises a link to the restarting router. Given that these stale LSAs are the only ones containing a link to the local router, their update will prevent usage of the link until the restarting router has established an adjacency with the local router. As part of establishing an adjacency and advertising a link to the local router, the restarting router will synchronize its LSDB with the local router and will update or purge stale LSAs that it previously originated. These stale LSAs will be requested by the restarting router using Link State Request packets (Section 10.9 of [RFC2328]) and, when received, updated or purged using the procedure specified in Section 13.4 of [RFC2328].
The restarting router's LSAs containing a link to the local router are added to the Stale DB Exchange list and will be updated or purged prior to the local router advertising a link to the restarting router. Additionally, the restarting router will not advertise a link to the local router until it has synchronized its LSDB with the local router. Hence, the restarting router's other stale LSAs will be updated or purged prior to both routers advertising a link, and these other stale LSAs need not be added to the Stale DB Exchange list.
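A minimal sketch of this selection follows, assuming the optimization amounts to keeping only the restarting router's LSAs that contain a link to the local router. The `Lsa` record, its flattened `links` field, and the function `stale_dbx_candidates` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lsa:
    adv_router: str     # originating router ID
    ls_id: str          # Link State ID
    links: frozenset    # router IDs this LSA links to (assumed flattening)


def stale_dbx_candidates(lsdb, neighbor_id, local_id, optimized=True):
    """Pick the LSAs to place on the Stale DB Exchange list for a neighbor."""
    originated = [lsa for lsa in lsdb if lsa.adv_router == neighbor_id]
    if not optimized:
        return originated  # base procedure: every LSA the neighbor originated
    # Optimization: only LSAs that contain a link to the local router.
    return [lsa for lsa in originated if local_id in lsa.links]


lsdb = [
    Lsa("C", "C", frozenset({"B", "E"})),   # stale LSA with a link to B
    Lsa("C", "10.0.0.0", frozenset()),      # other stale LSA from C, no link to B
    Lsa("E", "E", frozenset({"C"})),        # not originated by the neighbor
]

all_ids = [lsa.ls_id for lsa in stale_dbx_candidates(lsdb, "C", "B", optimized=False)]
opt_ids = [lsa.ls_id for lsa in stale_dbx_candidates(lsdb, "C", "B")]
print(all_ids, opt_ids)
```

In this example the optimized list on B holds a single entry, while the remaining stale LSA from C is left to the normal synchronization described above.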
The procedures in Section 2 don't attempt to solve the problem of the restarting router's adjacency being used before the data plane has been updated with the routes associated with the adjacency. This problem can occur on platforms with a non-negligible delay between the control plane Routing Information Base (RIB) being updated and the platform's data plane Forwarding Information Bases (FIBs) being updated. However, this delay is local to the restarting router and can be handled locally by delaying the advertisement of adjacencies (i.e., links in the restarting router's router-LSA or network-LSAs) until the data plane has been updated. Note that the restarting router's SPF calculation will need to include these links in order to compute the routes using the adjacency (even though link advertisement is delayed). How this is accomplished is an implementation detail that is beyond the scope of this document.
As long as the restarting router's stale LSAs have been updated or purged as described in Section 2, delaying usage of the adjacency until the restarting router's data plane has converged is completely under local control.
Application of the Database exchange procedure specified in this document SHOULD be based on local configuration, with the default behavior being not to perform the additional Stale database exchange list processing specified in this document. Not all OSPF networks require elimination of traffic drops during the Database exchange process (as evidenced by the fact that OSPF has been deployed for several decades without this added packet loss mitigation). Additionally, optional configuration will encourage implementation and deployment.
No IANA Considerations