Network Management Operations                 I. D. Martinez-Casanueva
Internet-Draft                                           L. Cabanillas
Intended status: Informational                              Telefonica
Expires: 30 August 2025                              P. Martinez-Julia
                                                                  NICT
                                                      26 February 2025

Knowledge Graph Construction from Network Data Sources
draft-marcas-nmop-kg-construct-00

Abstract

This document discusses the mechanisms that support the management and creation of knowledge graphs from data sources specific to the network management domain. The document provides background on core aspects such as ontology development, identifies methodologies and standards, and shares guidelines for integrating network data sources.

About This Document

This note is to be removed before publishing as an RFC.

The latest revision of this draft can be found at https://idomingu.github.io/knowledge-graph-yang/draft-marcas-knowledge-graph-yang.html. Status information for this document may be found at https://datatracker.ietf.org/doc/draft-marcas-nmop-kg-construct/.

Discussion of this document takes place on the Network Management Operations Working Group mailing list (mailto:nmop@ietf.org), which is archived at https://mailarchive.ietf.org/arch/browse/nmop/. Subscribe at https://www.ietf.org/mailman/listinfo/nmop/.

Source for this draft and an issue tracker can be found at https://github.com/idomingu/knowledge-graph-yang.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Martinez-Casanueva, et al. Expires 30 August 2025 [Page 1]
Internet-Draft kg-construct February 2025

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time.
It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 30 August 2025.

Copyright Notice

Copyright (c) 2025 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.

Table of Contents

1. Introduction
2. Conventions and Definitions
   2.1. Terminology
   2.2. Acronyms
3. Ontology Development
   3.1. Standard Development Methodologies
   3.2. Automatic Knowledge Extraction from YANG Models
4. Knowledge Graph Construction Pipeline
   4.1. Knowledge Objects
   4.2. Pipeline Steps
      4.2.1. Ingestion
      4.2.2. Mapping
      4.2.3. Integration
5. Challenges
6. Security Considerations
7. IANA Considerations
8. Open Issues
9. References
   9.1. Normative References
   9.2. Informative References
Appendix A. NETCONF Data Sources
   A.1. Prototype Architecture
   A.2. Target Ontology
   A.3. KGC Pipeline
      A.3.1. Raw data
      A.3.2. Mappings
      A.3.3. RDF data
Acknowledgments
Authors' Addresses

1. Introduction

Knowledge graphs introduce a new paradigm in data management that facilitates the integration of heterogeneous data silos by means of a semantic layer. In the case of network management, knowledge graphs provide a data integration solution that can cope with the diverse network data sources and telemetry mechanisms [I-D.mackey-nmop-kg-for-netops].

The construction of knowledge graphs is a challenging activity that requires a combination of skills in semantic modelling and data engineering. Semantic data models are represented by ontologies and other forms of structured knowledge, which must be kept in sync with the data pipelines that integrate the different data silos into the knowledge graph. The data integration process is based on the ingestion of raw data from their data sources, the mapping of the raw data to the respective ontologies, and the transformation of the data into a semantically annotated graph structure.

In this sense, Knowledge Graph Construction (KGC) is underpinned by two pillars: i) ontology development; and ii) the knowledge graph construction pipeline. These pillars are described in detail in the following sections.

2. Conventions and Definitions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

2.1. Terminology

This document defines the following terms:

Data integration: Process of combining data from diverse sources into a unified view.

Data mapping: Technique that defines how data from one data model corresponds to another data model.

Data materialization: Technique that collects data from a remote data source and persists a copy of the data in a target data storage. This process can also be seen as Extract-Transform-Load (ETL).

Data virtualization: Technique wherein an intermediate component (i.e., a data virtualization layer) exposes data available in remote data sources without creating a copy of the data. The data virtualization layer keeps pointers to the original location of the data, so when a data consumer asks for these data, the virtualization layer collects the data from the source and serves them directly to the consumer.

Ontology: Formal, shared representation of knowledge in a domain.

2.2. Acronyms

CQ: Competency Question
ETL: Extract-Transform-Load
KG: Knowledge Graph
KGC: Knowledge Graph Construction
LOT: Linked Open Terms
OWL: Web Ontology Language
RDF: Resource Description Framework
RDFS: RDF Schema
RML: RDF Mapping Language
SAREF: Smart Applications REFerence
SHACL: Shapes Constraint Language
W3C: World Wide Web Consortium

3. Ontology Development

Ontologies provide the formal representation of the conceptual models that capture the semantics of data and, building on this, enable the integration of data in the knowledge graph.
Ontologies can be developed following different techniques, ranging from manual to fully automated, depending on the characteristics of the data to be integrated in the knowledge graph (e.g., format or schema).

3.1. Standard Development Methodologies

Developing an ontology is a challenging task that requires skills in knowledge management and semantic modelling. To ease this process, a good practice is to follow mature, proven methodologies that provide thorough guidelines and recommend tools that can help in the development of an ontology.

An example of these methodologies is Linked Open Terms (LOT) [Poveda-Villalon2022]. LOT is an ontology development methodology that adopts best practices from agile software development. The methodology has been widely used in European projects as well as in the creation of the ETSI SAREF ontology and its extensions. Notably, with the SAREF ontology, ETSI tackled a similar problem in the scope of IoT, where there is a heterogeneous variety of standard data models and protocols.

The methodology iterates over a workflow of the following four activities:

1. ontology requirements specification
2. ontology implementation
3. ontology publication, and
4. ontology maintenance.

The workflow starts with the specification of the requirements that the ontology must fulfill. To that aim, the methodology requires collecting knowledge from domain experts and analyzing the data sources (e.g., network devices) and the schemas of the data (e.g., YANG data models) to be ingested and integrated in the knowledge graph. LOT recommends several approaches for capturing requirements, such as competency questions (CQs), natural language statements, or tabular information inspired by METHONTOLOGY.

3.2. Automatic Knowledge Extraction from YANG Models

The extraction of knowledge from YANG models could be automated, for example, by analyzing YANG identities to generate controlled vocabularies and taxonomies. [RFC7950] defines a YANG identity as a "globally unique, abstract, and untyped identity"; therefore, mapping a YANG identity to a concept is straightforward. Additionally, YANG identities can inherit from other YANG identities via the "base" statement. These ideas align with the notion of a taxonomy, where concepts are hierarchically linked with other concepts.

To support the creation of knowledge structures like taxonomies or thesauri, the W3C standardized the Simple Knowledge Organization System (SKOS). In this ontology, a concept scheme comprises a set of concepts that can be linked with other concepts via hierarchical and associative relations.

Typically, a YANG model containing YANG identities can be represented as an instance of the "skos:ConceptScheme" class. Next, all YANG identities included in a YANG model can be represented as "skos:Concept" instances that are contained in the concept scheme. Lastly, for those YANG identities that include the "base" statement, the respective SKOS concept will include a "skos:broader" relation whose range is the SKOS concept representing the parent YANG identity.

4. Knowledge Graph Construction Pipeline

4.1. Knowledge Objects

The intrinsic nature of knowledge graphs is to connect as much knowledge as possible within a certain scope (time and/or space). However, not all processes and operations require whole knowledge graphs. For instance, the communication of a piece of telemetry data, organized according to NTF [RFC9232], can be represented as a subset of the knowledge graph of all measurements.
A knowledge object, as defined in [EERVC], consists of a knowledge graph subset of arbitrary size (from single atoms to tens or hundreds of triples) that is decorated with metadata to facilitate its contextualization. Knowledge objects are particularly well suited to let entities that work with knowledge graphs exchange pieces of knowledge, whether obtained from their own knowledge graphs or newly created from other sources, such as monitoring. This has been demonstrated in [SECDEP].

4.2. Pipeline Steps

The construction of a knowledge graph is supported by a data pipeline that follows the archetypical Extract-Transform-Load (ETL) pattern, wherein the raw data is collected from the source(s), transformed, and finally stored for consumption. The knowledge graph creation pipeline can thus be split into multiple steps as depicted in Figure 1.

   +-----------+       +---------+       +-----------------+
   |           |       |         |       |                 |
   | Ingestion +------>| Mapping +------>|   Integration   |
   |           |  Raw  |         |  RDF  |                 |
   +-----------+  data +---------+  data +--------+--------+
         ^                                        |
     Raw |                                        | RDF
    data |                                        | data
         |                                        |
         |                                        v
   +-----+----+                             +-----------+
   |   Data   |                             | Knowledge |
   |  Source  |                             |   Graph   |
   | (device) |                             | Database  |
   +----------+                             +-----------+

   Figure 1: High-level architecture of a Knowledge Graph Construction Pipeline

These steps are the following: ingestion, mapping, and integration.

4.2.1. Ingestion

Ingestion is the first step in the creation of the knowledge graph. This step is realized by means of collectors that ingest raw data from the selected data source. These collectors implement data access protocols that are specific to the technology and type of the data source. For instance, for network management protocols based on YANG, these protocols can be NETCONF [RFC6241], RESTCONF [RFC8040], and gNMI [GNMI].
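One way to picture the role of collectors is as protocol-specific components hidden behind a common interface, so that the rest of the pipeline only sees raw records. The following Python sketch illustrates that idea under stated assumptions: `RawRecord`, `Collector`, `FetchCollector`, and `ingest` are hypothetical names introduced here, and the two `fake_*` functions merely stand in for real NETCONF or RESTCONF client calls, which are not shown.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Protocol


@dataclass(frozen=True)
class RawRecord:
    """Raw payload plus the minimum context the Mapping step needs."""
    source: str     # identifier of the data source, e.g. a device name
    protocol: str   # data access protocol used by the collector
    encoding: str   # payload encoding, e.g. "xml" or "json"
    payload: bytes


class Collector(Protocol):
    """Hypothetical common interface implemented by every collector."""
    def collect(self) -> Iterator[RawRecord]:
        ...


class FetchCollector:
    """Wraps a protocol-specific fetch function behind the interface."""

    def __init__(self, source: str, protocol: str, encoding: str,
                 fetch: Callable[[], bytes]) -> None:
        self.source = source
        self.protocol = protocol
        self.encoding = encoding
        self.fetch = fetch

    def collect(self) -> Iterator[RawRecord]:
        yield RawRecord(self.source, self.protocol, self.encoding,
                        self.fetch())


def ingest(collectors: Iterable[Collector]) -> Iterator[RawRecord]:
    """Merge the output of all collectors into one stream of raw records."""
    for collector in collectors:
        yield from collector.collect()


def fake_netconf_get() -> bytes:
    # Placeholder for a real NETCONF <get> exchange (assumption).
    return b"<data/>"


def fake_restconf_get() -> bytes:
    # Placeholder for a real RESTCONF GET request (assumption).
    return b"{}"


if __name__ == "__main__":
    for record in ingest([
        FetchCollector("router-1", "netconf", "xml", fake_netconf_get),
        FetchCollector("router-2", "restconf", "json", fake_restconf_get),
    ]):
        print(record.source, record.protocol, record.encoding)
```

Carrying the encoding alongside the payload is a design choice of this sketch: it lets a later step select suitable mapping rules without inspecting the bytes.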
Two main types of data sources are identified based on the techniques used to ingest the data, namely, batch and streaming. In the case of batch data sources, data are pulled (once or periodically) from the data source. In the case of streaming data sources, the collector subscribes to a YANG server to receive notifications of YANG data periodically or upon changes in the data source (e.g., a network device whose interface goes down). These subscriptions can be realized, either based on configuration or dynamically, using mechanisms like YANG Push [RFC8641]. Additionally, another common scenario is the use of message broker systems like Apache Kafka for decoupling the ingestion of streams of YANG data [I-D.netana-nmop-yang-message-broker-integration]. Hence, knowledge graph collectors could also support the ingestion of YANG data from these kinds of message brokers.

4.2.2. Mapping

This second step consists of receiving the raw data from the Ingestion step. Here, the raw data is mapped to the concepts captured in one or more ontologies by means of mapping rules. By applying these mapping rules, the raw data is semantically annotated and transformed into RDF data.

Depending on the nature of the raw data, different techniques can be applied. In the case of (semi-)structured data such as tabular data (e.g., CSV, relational databases) or hierarchical data (e.g., JSON, XML), these mappings can be defined by using declarative languages like the RDF Mapping Language (RML) [Iglesias-Molina2023].

RML is a declarative language, currently being standardized within the W3C Knowledge Graph Construction Community Group [W3C-KGC], that allows for defining mapping rules for raw data encoded in semi-structured formats like XML or JSON.
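The effect of such mapping rules can be illustrated with a minimal Python sketch that hand-codes three rules over YANG Library data and emits N-Triples. It is not RML and does not reproduce the behavior of any RML engine. The `YL` and `BASE` IRIs, the `RULES` table, and the `literal` and `map_modules` helpers are assumptions made for illustration; the XML element names follow the "modules-state" container of the ietf-yang-library module.

```python
import xml.etree.ElementTree as ET

YANGLIB_NS = {"yanglib": "urn:ietf:params:xml:ns:yang:ietf-yang-library"}
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD = "http://www.w3.org/2001/XMLSchema#"

# Assumed, illustrative IRIs; they are not defined by this document.
YL = "https://example.org/yang-library#"
BASE = "http://example.org/module/"

# One rule per line: (reference into the raw data, predicate, datatype).
RULES = [
    ("yanglib:name", "moduleName", "string"),
    ("yanglib:revision", "revisionDate", "date"),
    ("yanglib:namespace", "namespace", "anyURI"),
]

SAMPLE = """<modules-state xmlns="urn:ietf:params:xml:ns:yang:ietf-yang-library">
  <module-set-id>1</module-set-id>
  <module>
    <name>ietf-ip</name>
    <revision>2018-02-22</revision>
    <namespace>urn:ietf:params:xml:ns:yang:ietf-ip</namespace>
    <conformance-type>implement</conformance-type>
  </module>
</modules-state>"""


def literal(value, datatype):
    """Serialize a typed literal, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"^^<{XSD}{datatype}>'


def map_modules(xml_text):
    """Return one N-Triples line per generated triple."""
    root = ET.fromstring(xml_text)
    triples = []
    # Iterate over every module entry, as a mapping iterator would.
    for module in root.findall("yanglib:module", YANGLIB_NS):
        name = module.findtext("yanglib:name", namespaces=YANGLIB_NS)
        revision = module.findtext("yanglib:revision", namespaces=YANGLIB_NS)
        if name is None:
            continue  # a subject cannot be built without a module name
        subject = f"<{BASE}{name}:{revision}>"
        triples.append(f"{subject} <{RDF_TYPE}> <{YL}Module> .")
        for reference, predicate, datatype in RULES:
            value = module.findtext(reference, namespaces=YANGLIB_NS)
            if value is not None:
                triples.append(
                    f"{subject} <{YL}{predicate}> {literal(value, datatype)} ."
                )
    return triples


if __name__ == "__main__":
    print("\n".join(map_modules(SAMPLE)))
```

In this sketch the rules live in a Python table next to the code that interprets them; a declarative mapping language moves that table out of the program and into a shareable document.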
The benefits of using a declarative language like RML are twofold: i) the engine that implements the RML rules is generic, thus the mapping rules are decoupled from the code; ii) the explicit representation of mapping and transformation rules as part of the knowledge graph provides data lineage insights that can greatly improve data quality and the troubleshooting of data pipelines. RML is making progress towards becoming a standard, but support for additional YANG encoding formats like CBOR [RFC8949] or Protobuf remains a challenge.

The knowledge payload carried by CBOR and/or Protobuf is organized as knowledge objects transmitted by the mapping entities and received by the materialization entities. The use of knowledge objects allows these entities to easily "cut" knowledge graphs into smaller pieces, transmit them, and "paste" and/or "glue" the pieces onto the destination knowledge graph. Consistency is retained by ensuring that the same ontologies are used with the particular knowledge objects.

4.2.3. Integration

This is the final step of the knowledge graph creation. This step receives as input the knowledge object that contains the RDF data generated in the Mapping step. The knowledge object carries easily manageable semantic triples (or quadruples), as well as metadata to contextualize them and facilitate the incorporation of the knowledge into the local knowledge graph storage element.

At this point, the RDF data can be sent to an RDF triple store like Apache Jena Fuseki [Fuseki] for consumption via SPARQL. Alternatively, this step may transform the RDF data into a Labeled Property Graph (LPG) structure and store the resulting data in a graph database like Neo4j [Neo4j]. Similarly, the RDF data could also be transformed into the ETSI NGSI-LD standard [ETSI-GS-CIM-009] and stored in an NGSI-LD Context Broker.

5. Challenges

Ontology development: Time-consuming task that requires skills in knowledge management and conceptual modeling. Additionally, ontology developers should maintain tight coordination with domain owners and ontology users. Following a standard methodology like LOT provides guidance in the process, but the development of the ontology still requires manual work. Tools that can produce or bootstrap ontologies from existing data models in a semi-automatic, or even automatic, manner are desirable. In this sense, data models could include explicit semantics, in the same way that JSON-LD [JSON-LD] or CSVW [CSVW] include metadata indicating which concepts from ontologies are referenced by the data.

Pipeline performance: Integrating the raw data from the original data source into the knowledge graph entails several steps, as described before. These steps add extra latency before the data are stored in the knowledge graph for consumption. This latency can be an important limitation for real-time analytics use cases.

Scalability: The knowledge graph must be able to integrate massive amounts of data collected from the network. Distributed and federated architectures can improve the scalability of a global, composable knowledge graph. However, these architectures add complexity to the management of the knowledge graph as well as extra latency when federating requests.

Virtualization: The common approach for data integration is to materialize the data in the knowledge graph, which entails duplicating the data. However, this approach presents multiple limitations in terms of data governance and data cadence. Regarding data governance, having copies of the original data hampers keeping track of all the available data. With respect to data cadence, in particular for batch data sources, data are periodically pulled from the source at a particular frequency, which might not be optimal depending on the use case.
In this sense, data virtualization introduces a new data access technique that can overcome these limitations. With this technique, the knowledge graph defines pointers to the data at the original source, and the KGC pipeline performs the ingestion and mapping of the data, and eventually the delivery of the data to the consumer, only on demand, when a consumer requests them.

6. Security Considerations

Access control to data: The knowledge graph becomes an integrator of data that are, in many cases, sensitive. Therefore, data access control mechanisms must be present to ensure that only authorized consumers can discover and access data from the knowledge graph. Access control policies based on roles or attributes are common approaches, but additional aspects like the sensitivity of data could be included in the policy.

Integrity and authenticity of mappings: The declaration of mappings of raw data to concepts in ontologies is a critical step in the knowledge graph construction. Unauthorized mappings, or even tampered mappings, can lead to security breaches and anomalies with a great impact on the analytics and machine learning applications that consume data from the knowledge graph. To protect consumers from these scenarios, the knowledge graph must include mechanisms that verify the correctness, authenticity, and integrity of the mappings used in the construction of the graph. Only data owners, as the parties accountable for their data, should be authorized to define and deploy mappings for the knowledge graph construction.

Data provenance: Keeping track of the history of data as they go through the knowledge graph construction pipeline can improve the quality of the data of the knowledge graph.
As part of the knowledge graph construction, signatures appended to the data [I-D.lopez-opsawg-yang-provenance] can help in verifying that such data come from the golden data source and, therefore, that the data can be trusted.

7. IANA Considerations

This document has no IANA actions.

8. Open Issues

* Should this document provide guidelines for generating URIs of nodes/subjects in the knowledge graph? Take into account that there are several levels of abstraction (device vs. network/service level). For example, the URI that identifies a network interface cannot be generated only from the name of the interface, as there could be conflicts with interfaces of other network devices that have the same name.

* Implementations? References to examples based on open-source implementations. Integration with the YANG-Push-Kafka architecture. Target future hackathons.

9. References

9.1. Normative References

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997.

[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017.

9.2. Informative References

[ANSA] Pedro Martinez-Julia, Ved P. Kafle, Hitoshi Asaeda, "Application of Category Theory to Network Service Fault Detection", IEEE Open Journal of the Communications Society 5 (2024): 4417-4443.

[CSVW] "CSVW - CSV on the Web", n.d.

[EERVC] Pedro Martinez-Julia, Ved P. Kafle, Hiroaki Harai, "Exploiting External Events for Resource Adaptation in Virtual Computer and Network Systems", IEEE Transactions on Network and Service Management 15 (2018): 555-566.

[ETSI-GS-CIM-009] "Context Information Management (CIM); NGSI-LD API", March 2024.

[Fuseki] Apache, "Apache Jena Fuseki", n.d.

[GNMI] OpenConfig, "gRPC Network Management Interface (gNMI)", n.d.

[I-D.ietf-ivy-network-inventory-yang] Yu, C., Belotti, S., Bouquier, J., Peruzzini, F., and P. Bedard, "A Base YANG Data Model for Network Inventory", Work in Progress, Internet-Draft, draft-ietf-ivy-network-inventory-yang-04, 5 November 2024.

[I-D.ietf-netconf-yang-library-augmentedby] Lin, Z., Claise, B., and I. D. Martinez-Casanueva, "Augmented-by Addition into the IETF-YANG-Library", Work in Progress, Internet-Draft, draft-ietf-netconf-yang-library-augmentedby-01, 21 October 2024.

[I-D.lopez-opsawg-yang-provenance] Lopez, D., Pastor, A., Feng, A. H., Birkholz, H., and S. Garcia, "Applying COSE Signatures for YANG Data Provenance", Work in Progress, Internet-Draft, draft-lopez-opsawg-yang-provenance-04, 5 January 2025.

[I-D.mackey-nmop-kg-for-netops] Mackey, M., Claise, B., Graf, T., Keller, H., Voyer, D., and P. Lucente, "Knowledge Graph Framework for Network Operations", Work in Progress, Internet-Draft, draft-mackey-nmop-kg-for-netops-01, 21 October 2024.

[I-D.netana-nmop-yang-message-broker-integration] Graf, T. and A. Elhassany, "An Architecture for YANG-Push to Message Broker Integration", Work in Progress, Internet-Draft, draft-netana-nmop-yang-message-broker-integration-00, 22 April 2024.

[Iglesias-Molina2023] Iglesias-Molina, A., "The RML Ontology: A Community-Driven Modular Redesign After a Decade of Experience in Mapping Heterogeneous Data to RDF", The Semantic Web – ISWC 2023, October 2023.

[JSON-LD] W3C, "JSON-LD 1.1: A JSON-based Serialization for Linked Data", July 2020.

[Neo4j] "rdflib-neo4j - RDFLib Store backed by neo4j", n.d.

[Poveda-Villalon2022] "LOT: An industrial oriented ontology engineering framework", Engineering Applications of Artificial Intelligence, May 2022.

[RFC6241] Enns, R., Ed., Bjorklund, M., Ed., Schoenwaelder, J., Ed., and A. Bierman, Ed., "Network Configuration Protocol (NETCONF)", RFC 6241, DOI 10.17487/RFC6241, June 2011.

[RFC7950] Bjorklund, M., Ed., "The YANG 1.1 Data Modeling Language", RFC 7950, DOI 10.17487/RFC7950, August 2016.

[RFC8040] Bierman, A., Bjorklund, M., and K. Watsen, "RESTCONF Protocol", RFC 8040, DOI 10.17487/RFC8040, January 2017.

[RFC8345] Clemm, A., Medved, J., Varga, R., Bahadur, N., Ananthakrishnan, H., and X. Liu, "A YANG Data Model for Network Topologies", RFC 8345, DOI 10.17487/RFC8345, March 2018.

[RFC8641] Clemm, A. and E. Voit, "Subscription to YANG Notifications for Datastore Updates", RFC 8641, DOI 10.17487/RFC8641, September 2019.

[RFC8949] Bormann, C. and P. Hoffman, "Concise Binary Object Representation (CBOR)", STD 94, RFC 8949, DOI 10.17487/RFC8949, December 2020.

[RFC9232] Song, H., Qin, F., Martinez-Julia, P., Ciavaglia, L., and A. Wang, "Network Telemetry Framework", RFC 9232, DOI 10.17487/RFC9232, May 2022.

[SECDEP] Ana Hermosilla, Jose Manuel Manjón-Cáliz, Pedro Martinez-Julia, Antonio Pastor, Jordi Ortiz, Diego R. Lopez, Antonio Skarmeta, "Secure deployment of third-party applications over 5G-NFV ML-empowered infrastructures", the 7th International Conference on Mobile Internet Security (MobiSec '23), Dec 19-21, 2023, Okinawa, Japan.

[TKDP] Pedro Martinez-Julia, Ved P. Kafle, Hitoshi Asaeda, "Telemetry Knowledge Distributed Processing for Network Digital Twins and Network Resilience", NOMS 2023-2023 IEEE/IFIP Network Operations and Management Symposium (2023): 1-6.

[W3C-KGC] W3C, "Knowledge Graph Construction Community Group", n.d.

Appendix A. NETCONF Data Sources

This appendix presents a scenario that demonstrates the construction of a knowledge graph based on YANG data collected from a NETCONF server.
In particular, the scenario tackles the creation of a data catalog based on a knowledge graph that keeps a registry of the YANG data sources and the YANG data models that they implement. As described in [I-D.ietf-netconf-yang-library-augmentedby], data catalog implementations backed by knowledge graphs provide powerful solutions that can easily incorporate additional context into the catalog. As an evolution of the YANG Catalog service, the resulting knowledge graph facilitates the navigation across dependencies of YANG modules but, more importantly, enables the combination of these data with other data silos such as the network topology [RFC8345] or the network hardware inventory [I-D.ietf-ivy-network-inventory-yang].

To create a knowledge graph that supports the data catalog, the proposed approach is based on collecting YANG Library data from devices running in the network, in this case, from a NETCONF server. For this, the RML engine queries the NETCONF server to retrieve the YANG Library data, and then applies the RML mappings to transform the YANG data into RDF according to the target ontology.

This prototype was developed as part of the paper "Declarative Construction of Knowledge Graphs from NETCONF Data Sources" sent to the Semantic Web Journal (currently under review): https://www.semantic-web-journal.net/content/declarative-construction-knowledge-graphs-netconf-data-sources-0

A.1. Prototype Architecture

A high-level architecture of the prototype that validates the implementation is shown below:

                             |
                             |
                             | RML
                             | Mappings
                             v
   +-----------+       +---------+       +-----------------+
   |           |       |         |       |                 |
   | Ingestion +------>| Mapping +------>|   Integration   |
   |           |  XML  | (BURP)  |  RDF  |                 |
   +-----------+  data +---------+  data +--------+--------+
         ^                                        |
     XML |                                        | RDF
    data |                                        | data
         |                                        |
         |                                        v
   +-----+------+                           +-----------+
   |  NETCONF   |                           | Knowledge |
   |   Server   |                           |   Graph   |
   | (netopeer2)|                           | Database  |
   +------------+                           +-----------+

   Figure 2: Architecture of prototype to construct knowledge graph from NETCONF data source

BURP was selected as the open-source RML engine for this prototype. The NETCONF server is emulated using netopeer2.

A.2. Target Ontology

The YANG Library Ontology was developed to represent the implementation details of YANG modules and submodules, along with their interdependencies, in the different datastores of a YANG server. The ontology was developed following the LOT methodology and is publicly available at: https://w3id.org/yang/library

The code of the ontology and all related artifacts are publicly available on GitHub: https://github.com/candil-data-fabric/yang-library-ontology

A.3. KGC Pipeline

In addition to the YANG Library Ontology, the YANG Server Ontology was developed to represent YANG data sources, such as NETCONF servers, and the operations to retrieve data from them, such as queries or subscriptions.
Similarly, this ontology was developed following the LOT methodology and is publicly available at: https://w3id.org/yang/server

The code of the ontology and all related artifacts are publicly available on GitHub: https://github.com/candil-data-fabric/yang-server-ontology

The YANG Server Ontology is used in combination with the RML vocabulary to describe the access to YANG servers, from which the collected data are transformed into RDF. In this sense, BURP was extended to support the ingestion of YANG data from NETCONF servers using NETCONF queries.

The following subsections include excerpts of the raw XML data (Figure 3), RML mappings (Figure 4), and final RDF data (Figure 5). The complete examples can be found at: https://github.com/candil-data-fabric/yang-library-ontology/tree/main/knowledge-graph/xpath

A.3.1. Raw data

   1

   ietf-yang-patch
   2017-02-22
   file:///etc/sysrepo/yang/ietf-yang-patch@2017-02-22.yang
   urn:ietf:params:xml:ns:yang:ietf-yang-patch
   import

   ietf-ip
   2018-02-22
   file:///etc/sysrepo/yang/ietf-ip@2018-02-22.yang
   urn:ietf:params:xml:ns:yang:ietf-ip
   implement

   Figure 3: Excerpt of YANG Library data collected from a NETCONF server

A.3.2. Mappings

   @prefix yl: .
   @prefix ys: .
   @prefix rml: .
   @prefix xsd: .
   @prefix core: .
   @prefix dcterms: .
   @prefix observable: .
   @base .

   # Connection details to NETCONF server
   a ys:NetconfServer ;
     ys:socketAddress ;
     ys:serverAccount ;
     ys:hostKeyVerification "false" ;
     ys:capability ys:XpathCapability , ys:YangLibrary1.0 .

   a ys:OperationalDatastore ;
     ys:server .

   a ys:RunningDatastore ;
     ys:server .

   a observable:SocketAddress ;
     observable:addressValue "localhost:830" .

   a ys:ServerAccount ;
     ys:username "netconf" ;
     core:hasFacet .

   a observable:AccountAuthenticationFacet ;
     observable:password "netconf" ;
     observable:passwordType "plain-text" .

   a ys:XPathFilter ;
     ys:xpathValue "/yanglib:modules-state";
     ys:namespace [
       a ys:Namespace ;
       ys:namespacePrefix "yanglib" ;
       ys:namespaceURL "urn:ietf:params:xml:ns:yang:ietf-yang-library" ;
     ];
   .

   <#TriplesMap> a rml:TriplesMap;
     rml:logicalSource [
       a rml:LogicalSource;
       rml:source [
         a ys:Query, rml:Source ;
         ys:sourceDatastore ;
         ys:filter
       ];
       rml:referenceFormulation [
         a ys:NetconfQuerySource ;
         rml:namespace [
           a rml:Namespace ;
           rml:namespacePrefix "yanglib" ;
           rml:namespaceURL "urn:ietf:params:xml:ns:yang:ietf-yang-library" ;
         ];
       ];
       rml:iterator "/yanglib:modules-state/yanglib:module";
     ];
     rml:subjectMap [
       a rml:SubjectMap;
       rml:template "http://example.org/module/{yanglib:name/text()}:{yanglib:revision/text()}";
       rml:class yl:Module;
     ];
     rml:predicateObjectMap [
       a rml:PredicateObjectMap;
       rml:predicateMap [
         a rml:PredicateMap;
         rml:constant yl:moduleName;
       ];
       rml:objectMap [
         a rml:ObjectMap;
         rml:reference "yanglib:name/text()";
         rml:datatype xsd:string;
       ];
     ];
     rml:predicateObjectMap [
       a rml:PredicateObjectMap;
       rml:predicateMap [
         a rml:PredicateMap;
         rml:constant yl:revisionDate;
       ];
       rml:objectMap [
         a rml:ObjectMap;
         rml:reference "yanglib:revision/text()";
         rml:datatype xsd:date;
       ];
     ];
     rml:predicateObjectMap [
       a rml:PredicateObjectMap;
       rml:predicateMap [
         a rml:PredicateMap;
         rml:constant yl:namespace;
       ];
       rml:objectMap [
         a rml:ObjectMap;
         rml:reference "yanglib:namespace/text()";
         rml:datatype xsd:anyURI;
       ];
     ];
   .

   Figure 4: RML mappings that collect YANG Library from a NETCONF server and map them to the YANG Library Ontology

A.3.3. RDF data

   "ietf-ip"^^ .
   "2018-02-22"^^ .
   "urn:ietf:params:xml:ns:yang:ietf-ip"^^ .
   .
   "ietf-yang-patch"^^ .
   "2017-02-22"^^ .
   "urn:ietf:params:xml:ns:yang:ietf-yang-patch"^^ .
   .

   Figure 5: Excerpt of RDF triples generated using the RML mappings and the YANG Library data

Acknowledgments

This document is based on work partially funded by the EU Horizon Europe projects aerOS (grant agreement no. 101069732) and ROBUST-6G (grant agreement no. 101139068).

The authors would like to thank Med, Benoit, Lionel, and Thomas for their review and valuable comments.

Authors' Addresses

Ignacio Dominguez Martinez-Casanueva
Telefonica
Email: ignacio.dominguezmartinez@telefonica.com

Lucia Cabanillas
Telefonica
Email: lucia.cabanillasrodriguez@telefonica.com

Pedro Martinez-Julia
NICT
Email: pedro@nict.go.jp