ONF Document Type: Technical Recommendation
 

Disclaimer

THIS SPECIFICATION IS PROVIDED "AS IS" WITH NO WARRANTIES WHATSOEVER, INCLUDING ANY WARRANTY OF MERCHANTABILITY, NONINFRINGEMENT, FITNESS FOR ANY PARTICULAR PURPOSE, OR ANY WARRANTY OTHERWISE ARISING OUT OF ANY PROPOSAL, SPECIFICATION OR SAMPLE.

Any marks and brands contained herein are the property of their respective owners.

 

Open Networking Foundation
1000 El Camino Real, Suite 100, Menlo Park, CA 94025
www.opennetworking.org

©2020 Open Networking Foundation. All rights reserved.

 

Open Networking Foundation, the ONF symbol, and OpenFlow are registered trademarks of the Open Networking Foundation, in the United States and/or in other countries. All other brands, products, or service names are or may be trademarks or service marks of, and are used to identify, products or services of their respective owners.   

 

Important note

This Technical Recommendation has been approved by the Project TST, but has not been approved by the ONF board. This Technical Recommendation is a new reference implementation document focused on v2.1.2 models, which has been approved under the ONF publishing guidelines for 'Informational' publications that allow Project technical steering teams (TSTs) to authorize publication of Informational documents. The designation of '-info' at the end of the document ID also reflects that the project team (not the ONF board) approved this TR.

 


Table of Contents

Disclaimer

Important note

Document History

1 Introduction

1.1 General introduction

1.2 Introduction to this document

2 Overview

2.1 Essential feature

2.2 TAPI application

3 Summary of key considerations

3.1 Overview

3.2 TAPI prior to 2.1.3

3.3 TAPI Streaming available in 2.1.3

3.4 Stream content

3.5 TAPI Application in detail

3.6 Streaming Characteristics

3.7 Supported and available streams

3.7.1 Supported stream type

3.7.2 Available Streams

3.8 Log strategy

3.8.1 Effect of Characteristics

3.8.2 Streaming the context

3.9 Using the stream

3.9.1 Initial connection

3.9.2 Tombstone (Delete) retention passed

3.9.3 Compaction delay passed

3.9.4 (Eventual) Consistency achieved

3.9.5 Degraded performance

3.9.6 Need for realignment

3.9.7 Summary

3.10 Record content

3.11 Considering order/sequence and cause/effect

3.11.1 Time

3.11.2 Backend stream details

3.12 The Context

3.13 Handling changes in the Context

3.14 Reporting change

3.15 System engineering

3.16 Eventual Consistency and Fidelity

3.17 Stream Monitor

3.18 Solution structure – Architecture Options

3.18.1 Full compacted log

3.18.2 Emulated compaction

3.19 Using the compacted log approach for alarm reporting

3.19.1 Specific alarm characteristics - raising/clearing an alarm

3.19.2 Key Features of an alarm solution (example usage)

3.19.3 Log strategy

3.19.4 Alarm behavior

3.19.5 Condition detector and alarm structure

3.19.6 Alarm Identifier and location

3.19.7 Alarm tombstone behavior

3.19.8 Time

3.19.9 Detected Condition normalization

3.19.10 Meaningful detection (device or any other source)

4 Use Cases

4.1 Use Case background

4.1.1 General TAPI considerations - Context

4.1.2 Underlying behaviour

4.2 Use Case Overview

4.3 Building the TAPI stream on provider

4.4 The TAPI use cases

4.4.1 Connect to Stream and align - new client

4.4.2 Steady state reception - well engineered client

4.4.3 Event storm – bad day alarm example (or slow client)

4.4.4 Event storm – very bad day alarm example (or very slow client) - many micro-bends and reduce compute power etc.

4.4.5 Short loss of communications

4.4.6 Long loss of communications requiring realignment

4.5 Message Sequence

4.6 Use cases beyond current release

4.7 Message approach (websocket example)

5 Appendix – Considering compacted logs

5.1 Essential characteristics of a compacted log

5.2 Order of events

5.3 Compaction is a real implementation

6 References

7 Definitions

7.1 Terms defined elsewhere

7.2 Terms defined in this TR

7.3 Abbreviations and acronyms

8 Individuals engaged

8.1 Editors

8.2 Contributors

List of Figures

Figure 1  Example SDN architecture for WDM/OTN network

Figure 2  Yang: supported-stream-type

Figure 3  Yang: available-stream

Figure 4  Yang: log-record-header

Figure 5  Yang: log-record-body

Figure 6  Stylized view of example controller offering full compaction

Figure 7  Stylized view of example controller offering emulated compaction

Figure 8  Yang: condition-detector (descriptions omitted)

Figure 9  Yang: condition-detector (descriptions omitted)

Figure 10  Hybrid Message Sequence Diagram for example implementation corresponding to Use Cases

Figure 11  Phases of interaction for Use Cases

Figure 12  Kafka compaction

Document History

Version

Date

Description of Change

0.1

June, 2020

Initial version of the Reference Implementation document on streaming for TAPI v2.1.3


1        Introduction

1.1            General introduction

This ONF Technical Recommendation (TR) is a supplement to the Reference Implementation for the TRANSPORT-API (TAPI) [ONF TR-5XX.1].

1.2            Introduction to this document

The purpose of this document is to explain TAPI streaming and provide a set of guidelines and recommendations for use of TAPI streaming.

The target architecture is provided in [ONF TR-5XX.1]. The figure below is a copy of the figure provided in the reference.

This document focuses on the autonomous flow of information via TAPI from SDN-C to OSS/SDTN and from SDTN to OSS.

 

Figure 1   Example SDN architecture for WDM/OTN network


2        Overview

2.1            Essential feature

Streaming is the name for a type of mechanism that handles the providing of information from one system to another in some form of steady and continuous flow.

In the context of a Management-Control solution streaming is used primarily for the reporting (notification) of ongoing change of state of the controlled system from one Management-Control entity to another (usually superior) management-control entity. In this context, as much of the information is derived from readings of instruments, the flow is often called telemetry [1].

The stream provides engineered flow such that peak load is averaged using some mechanism such as back-pressure and/or selective pruning of detail.

In the following discussion the term Controller will be used for any Management-Control entity (OSS, SDTN, SDN-C, EMS, NMS, Orchestrator etc.).

2.2            TAPI application

TAPI can be used in several different applications. The primary application is one where one Controller (provider) is providing an ongoing flow of state updates to a client (superior) Controller, as depicted in the figure above.

In this application the following assumptions apply:

  • The client Controller has one or more internal representations of the semantics (models) of the controlled system (network etc.). A representation may:
    • Relate to a subset of the TAPI model (e.g., just physical inventory)
    • Compress or expand parts of the model (e.g., Topology and Node are combined into a ForwardingDomain)
    • Be enriched with associations (e.g., some or all of the one-way navigations are converted to two-way navigations)
  • The client controller maintains (stores in some form of repository) an ongoing live view of the state of the instances of things in the controlled system so as to populate each of its representational forms
    • A mechanism is available that enables the on-going reporting, from the provider to the client, of change in information known to the provider
    • Note: A view that is constructed from the currently known state will necessarily be plesiochronous with respect to the actual network state because of differential network and processing delays. After some period, the inaccuracies can mainly be corrected such that the state that was present at some appropriately past time is determinable.
    • Note: The model in the repository need not be TAPI, it can be radically transformed from TAPI depending upon the client storage strategy and purpose. Likewise, the model in the repository of the provider need not be TAPI [2]
  • When connected for the first time the client controller must gain knowledge of current state prior to receiving information on change (changes alone are insufficient to provide a clear view of the system state especially recognizing that most states change very rarely – waiting for a change to determine current state is not viable). 
    • On connection to the provider the client gains alignment with the current state and then maintains alignment as the state changes
    • Through the on-going process the client deals with the refactoring etc. to populate its repository as appropriate and deals with the challenges of asynchronous receipt (e.g., the referencing entity arrives before the referenced entity; a minimal sketch follows this list)
  • The client has mapping rules to derive each of its representations of semantics from the delivered representation
    • Each representation is essentially some form of pruning and refactoring of the TAPI model
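
To illustrate the asynchronous-receipt challenge noted above, the following minimal Python sketch shows one way a client repository might tolerate a referencing entity arriving before the referenced entity. The class and member names, the record shape, and the deferred-reference strategy are illustrative assumptions, not part of any TAPI definition.

    # Minimal illustrative sketch (not part of TAPI): a client repository that
    # tolerates the referencing entity arriving before the referenced entity.

    class ClientRepository:
        def __init__(self):
            self.entities = {}        # entity key -> latest whole-entity record received
            self.unresolved = set()   # (referencing key, referenced key) not yet resolvable

        def apply(self, key, entity, referenced_keys):
            """Store/overwrite an entity and track references in either arrival order."""
            self.entities[key] = entity
            # References made by this entity to entities not yet received.
            for ref in referenced_keys:
                if ref not in self.entities:
                    self.unresolved.add((key, ref))
            # References made earlier *to* this entity can now be resolved.
            self.unresolved = {(a, b) for (a, b) in self.unresolved if b != key}

        def delete(self, key):
            """Tombstone handling: remove the entity and any references it made."""
            self.entities.pop(key, None)
            self.unresolved = {(a, b) for (a, b) in self.unresolved if a != key}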

Consequently, the TAPI provider aims to optimize the process of maintaining alignment for the client.

 

Note that TAPI is not intended to directly support:

  • A client requiring access to the provider database supporting random queries and joins
  • A GUI client


3        Summary of key considerations

3.1            Overview

This section examines the TAPI streaming capability in detail. Examples of UML and Yang are provided as appropriate.

The characteristics of streaming are described in general and are illustrated using an example focusing on alarm reporting.

3.2            TAPI prior to 2.1.3

RESTCONF specifies a traditional form of notification where the assumption is that a relatively short queue of notifications will be available on the provider and alignment with current state (in the case of alarms, alignment with current active alarms) will be achieved by GET via RESTCONF.

3.3            TAPI Streaming available in 2.1.3

An Event source/server streaming mechanism is made available as an alternative to traditional notifications and potentially, over time, as a replacement.

The streaming capability is distinct from TAPI Notification and is designed to better deal with scale and to provide an improved operational approach.

The method proposed allows the client to gain and maintain alignment with current state from the stream alone (with no need to get current state). The client can achieve eventual consistency by simply connecting to the relevant streams. The client will receive an ongoing view of change, assuming that the client is keeping up reasonably with the stream. The stream is designed to allow for some client delay with no loss of information fidelity.

When the client has a significant delay, there will be a loss of fidelity but no impact on eventual consistency. If the client has a very large delay, then a resync will be initiated by the provider. Resynchronization will be achieved simply by the client reconnecting to the stream from offset zero. This will again allow the client to achieve eventual consistency.

The streaming capability provides a reliable pipeline for reporting of change. This improves the information flow integrity and reduces the need for recovery and resynchronization.

3.4            Stream content

The streaming approach is generally applicable to all information available over TAPI from the provider to client.

The streaming capability also offers an improved alarm structure (focusing on fundamental properties of the alarm and relegating legacy fields).

3.5            TAPI Application in detail

A management-control system, such as an Orchestrator, high level Controller, OSS etc., has the role of achieving intended capability (intent, service etc.) by monitoring and processing information (e.g., alarms) from a managed-controlled system (network) such that the overall assembly of operations systems can determine actions to continue to support intent achieving revenue (via some assessment of services) and identify repair action prioritization (via analysis of problems). This system uses TAPI to acquire from the subordinate controller information from a fragment of the overall network, i.e., the devices monitored by the controller, where that information is positioned in terms of TAPI entities within a TAPI Context.

The client Controller maintains history and live views of the state of the things in the network so as to do the necessary analysis, hence that system uses a mechanism providing autonomous updates and does NOT query the provider Controller for states.

The overall solution is expected to have the following characteristics for the provider controller:

  • Few direct clients (~2)
    • Single OSS/orchestrator with several separate internal systems (Fault, provisioning, equipment inventory) and potentially some form of resilience
  • Low client churn
    • Clients remain “connected” for a very long time and if the connection is dropped the same client will usually try to reconnect
  • Provider maintains alignment with underlying system
    • The TAPI realization assumes a reliable input that ensures eventual consistency with current network state

The primary focus for Streaming in TAPI 2.1.3 is simple and efficient ongoing alignment of a Controller (client) with a view presented by another Controller (provider).

3.6            Streaming Characteristics

The key characteristics of the TAPI Streaming solution:

  1. Ensures eventual consistency of the client with the view presented by the provider
    • Essentially, if the managed-controlled system stops changing, once the whole stream has been received and processed in order by the client, the client view will be aligned with the corresponding managed-controlled system state (assuming communication with all components in the managed-controlled system)
  2. Is built on a provider log of records recording change
    • The log is designed to enable “eventual consistency”
  3. Guarantees delivery of each log record “at least once”
    • Clearly, this guarantee applies within a particular operational envelope as defined in this document
    • May deliver some information more than once, but this will be in a pattern that ensures “eventual consistency”
  4. Is highly scalable and available
    • Boundless scale (with corresponding system resources)
  5. Is highly reliable (distributed, partitioned, replicated, and fault tolerant)
    • Provides an inherent high availability solution (assuming necessary implementation)
  6. Has low latency and high throughput on big data scale
    • Assuming the appropriate implementation technology
  7. Divides information across streams
  8. Allows the client to re-consume records from a given stream any time
  9. Supports back-pressure from the client to enable a reactive producer

3.7            Supported and available streams

The interface can offer many streams for a context. The client can determine, using calls on the provider, both the types of stream supported and the available streams that are active for connection and streaming.

3.7.1      Supported stream type

This structure allows the provider to report the streams that it can support, regardless of whether they are active or not.


Figure 2   Yang: supported-stream-type

The provider can indicate the entity types supported by a stream, the storage and record strategies.

The segment-size and record-retention are free choices made by the provider depending upon system engineering.

Information may be divided into separate streams. There is no restriction on choice of division of the information into streams. A provider could choose to have a stream per class or to have streams that aggregate classes together that have similar lifecycles etc. It should be noted that for TAPI 2.1.3 ALL instances in the context of any object-class-identifier listed in record-content will be streamed, i.e., there is (intentionally) no filtering.

The supported-stream-type is also augmented with connection-protocol-details which provides a list of allowed-connection-protocols in freeform string (to be normalized through agreement). Candidate protocols include “websockets” [RFC6455] and “sse” [W3C SSE].

For the compacted log, the supported-stream-type information is augmented with compacted-log-details that includes tombstone-retention (delete retention) and compaction-delay settings. Both are free choices made by the provider depending upon system engineering.

3.7.2      Available Streams


This structure allows the provider to report the streams that are currently available.

 

Figure 3   Yang: available-stream

The provider can indicate the following (a minimal retrieval sketch follows this list):

  • supported-stream-type: references the description of a stream supported by the provider
  • connection-protocol: the protocol for this particular stream instance chosen from the list of available protocols for the stream type
  • connection-address: the location of the stream
    • The address structure and connection method will depend upon the connection-protocol
  • stream-state: indicates whether the stream will deliver records to the client or not
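
As an illustration of using these structures, a client might retrieve the available-stream list and select an active websocket stream instance to connect to. The sketch below (Python, using the third-party requests package) is illustrative only: the RESTCONF path, the JSON member names, and the stream-state value are assumptions; the authoritative structure is the Yang above.

    import requests  # third-party package, used purely for illustration

    BASE = "https://provider.example.com/restconf/data"   # assumed endpoint

    def pick_websocket_stream():
        """Retrieve the available streams and choose an active websocket stream.
        The path, member names and state value below are illustrative assumptions."""
        resp = requests.get(f"{BASE}/tapi-common:context", timeout=10)
        resp.raise_for_status()
        for stream in resp.json().get("available-stream", []):
            if (stream.get("connection-protocol") == "websockets"
                    and stream.get("stream-state") == "ACTIVE"):   # assumed value
                return stream["connection-address"], stream["supported-stream-type"]
        raise RuntimeError("no active websocket stream offered by the provider")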

3.8            Log strategy

The streaming solution assumes that the provider is delivering information in sequence from a log.

In TAPI 2.1.3 a log approach oriented towards maintaining alignment is provided. The stream mechanism defined allows for different log strategies:

  • log-record-strategy: COMPACTED, TRUNCATED, FULL-HISTORY and FULL_HISTORY_WITH_PERIODIC_BASELINE.
    • The log-record-strategy fully available in 2.1.3 is COMPACTED (the characteristics of this mechanism are described later in this document)
  • log-storage-strategy: WHOLE_ENTITY_ON_CHANGE, CHANGE_ONLY, WHOLE_ENTITY_PERIODIC
    • The log-storage-strategy fully available in 2.1.3 is WHOLE_ENTITY_ON_CHANGE

This log approach has several key characteristics:

  • Whole Entity: On change, the whole entity that has the changed property is streamed (i.e., the solution logs (stores) a full representation of the entity with current values after each change, not just the changed property).
    • Note that the entity could have more than one changed value
    • Each property that changes regularly is isolated in its own dedicated small class
      • E.g., The alarm (detector) is considered as a class. Alarms are isolated from configuration data for the related entity and from each other
    • Large data structures that are invariant or change rarely can be grouped in composite classes
      • E.g., The CEP where Configuration data (both intent and actual) is collected into a single class. The data in an instance changes rarely
    • For optimum support of change of properties, the small property classes reference the configuration items that they relate to and NOT the reverse (i.e., they are decorations; e.g., an alarm references the CEP or MEP)
  • Compaction: Not all change statements (whole entity statements) related to an entity are retained. Older changes are intentionally pruned out of the log (a minimal sketch of the compaction behavior follows this list).
    • The log will have the latest record related to each entity that exists in the controlled system
      • This enables achievement of eventual consistency
    • Some additional recent changes will also be present in the log as compaction is intentionally delayed
      • This enables a delayed client to catch back up with no loss of fidelity
    • Records that are not the latest for an entity and that are older than the compaction delay time will be removed
      • This allows an overloaded client to maintain a view of non-fleeting changes whilst suffering an acceptable loss of fidelity where there is high intermittency
      • This allows a new client to align without having to receive large volumes of uninteresting history
  • Delete retention: When an entity is deleted, a Tombstone record is added to the log for that entity
    • Compaction, after the compaction delay, will remove all but that tombstone record
    • Tombstones are removed once they have persisted for the tombstone retention period
      • Note that without this special tombstone retention behavior, the log growth would be unbounded
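
A minimal sketch of the compaction and tombstone retention behavior described above is given below (Python). This is one illustrative realization, not a normative algorithm; the record fields and the time-based pruning are assumptions.

    import time

    class CompactedLog:
        """Illustrative compacted log: records newer than compaction_delay are kept,
        older records are compacted to the latest per entity, and tombstones are
        discarded once older than tombstone_retention (all times in seconds)."""

        def __init__(self, compaction_delay, tombstone_retention):
            self.compaction_delay = compaction_delay
            self.tombstone_retention = tombstone_retention
            self.records = []                       # append-only, oldest first

        def append(self, entity_key, body, tombstone=False):
            self.records.append({"entity-key": entity_key, "body": body,
                                 "tombstone": tombstone,
                                 "log-append-time-stamp": time.time()})

        def compact(self):
            now = time.time()
            latest = {rec["entity-key"]: i for i, rec in enumerate(self.records)}
            kept = []
            for i, rec in enumerate(self.records):
                age = now - rec["log-append-time-stamp"]
                if age < self.compaction_delay:
                    kept.append(rec)                # recent records are never pruned
                elif latest[rec["entity-key"]] != i:
                    pass                            # superseded by a newer record: prune
                elif rec["tombstone"]:
                    if age < self.tombstone_retention:
                        kept.append(rec)            # tombstones persist only for retention
                else:
                    kept.append(rec)                # latest record per live entity is kept
            self.records = kept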

The other log strategies are partially supported. The TAPI 2.1.3 solution can be extended by conditional augmentation as appropriate.

3.8.1      Effect of Characteristics

There are two distinct aspects of alignment:

  • Absolute State:
    • Eventual Consistency: Ensures that the client Controller view of the state of the controlled system aligns with the view of the actual controlled system state as presented by the provider
      • If the controlled system stops changing, once the stream (all changes) has been absorbed by the controller, its view of the current state of the system will be aligned with the actual state of the system
    • Context Detail: The information agreed to be conveyed from the provider to the client
      • Note that some clients will choose to selectively prune out information that is not relevant to them
      • Information conveyed in a context may be temporarily increased as a result of a recognized short-term need (value)
        • This form of context expansion is not available in TAPI 2.1.3 (see later)
  • Change of state:
    • Detail will necessarily be lost (loss of fidelity) when there are communications failures, but the loss will be such that less relevant information is lost first
      • Removal of noise (such as rapid clearing and then re-raising alarms) will be generally beneficial

3.8.2      Streaming the context

In the described application:

  • The provider presents a view in terms of a context and all of its contained instances
  • The client maintains alignment with that view.
  • The event stream is used for gaining and maintaining alignment with a view, i.e., a TAPI context

3.9            Using the stream

Once the client has identified the available streams to connect to, the client simply acquires the necessary authorization (see later) and connects. The following provides a brief sketch of alignment. The process is discussed in far greater detail later in the document.

The next subsections consider the client connection to and receiving from a stream.

3.9.1      Initial connection

On initial connection, the client provides a null token. This causes the provider to stream from the oldest record. The client can continue to consume records from the stream ongoing.

The initial records received by the client will be for the entities that have not changed for a “long” time.

3.9.2      Tombstone (Delete) retention passed

As the client continues to consume the stream it progresses past the Tombstone (delete) retention point (i.e., is receiving records that have a timestamp that is less than the Tombstone retention (delay) from the current time), and recent tombstones will be received along with newer changes.

Compaction will remove multiple reports about the same entity, but as the stream progresses further it is possible that an update is received that overwrites previously received entity state or a tombstone is received that deletes an entity that was read earlier. This is where compaction had not yet removed the entity when the stream was started (potentially because the event causing the newer record had not yet occurred).

3.9.3      Compaction delay passed

After some time, the client consumes past the compaction delay point (i.e., is receiving records that have a timestamp that is less than the compaction delay from the current time). From this point onwards the client is receiving all recent changes and is aligned with network state as it was perceived by the provider at some recent point in time. Whilst beyond the compaction delay point the client will receive all event reports for the context.

3.9.4      (Eventual) Consistency achieved

If the controlled system stopped changing, then the client would eventually reach the newest record and would be aligned with the provider view of the state of the controlled system.

3.9.5      Degraded performance

Information fidelity is reduced if the client slips back by more than the compaction delay as compaction will remove some change detail.

3.9.6      Need for realignment

The client will be forced to realign if it is delayed by more than the tombstone retention. The behavior of the provider at this point is equivalent to that when there is an initial connection.

3.9.7      Summary

The above detail can be summarized:

  • The whole stream is acquired from the provider on start-up
  • Missed records are acquired as appropriate where tombstone retention has not been exceeded (exceeding it would cause a loss of integrity)
    • The client restarts from the last received record
  • Whole stream acquired on crash-recovery and where tombstone retention has been exceeded
    • Compaction removes noisy history and allows rapid alignment with current state through a “replay” of the history

3.10       Record content

The stream-record allows for multiple log-records each of which includes a log-record-header and optionally a log-record-body.

The log-record-header provides information common to all records.

The Tombstone record may have only a header.

The log-record-header is as below.

Figure 4   Yang: log-record-header

Considering the fields in turn:

  • tapi-context: This field can be omitted for TAPI 2.1.3 as the interface does not currently support context uuid.
    • In future this will enable systems that support more than one context to ensure the context of the stream is present in the record.
  • token: The normal identifier of the record.
    • After connection failure, it is this value that is used by the client to indicate to the provider the last processed record such that the provider can determine which record to send next.
  • full-log-record-offset-id: the long hand identifier of the record to allow the client to interpret stream structure and position.
    • This field is mandatory but can contain a repeat of the token for simple solutions.
  • log-append-time-stamp: The timestamp for the record being placed in the log
  • entity-key: the reliable identifier for the entity that is used for compaction and tombstone processing.
    • This need not be the entity uuid so long as it is invariant for the life of the entity, and it is unique.
  • record-type: Indicates the type of record (e.g., if the record is Tombstone)
  • record-authenticity-token: allows the provider to supply a value with the record that the client has a mechanism for confirming so as to validate that the record came from the expected originator

The log record body includes generalized content and is augmented with specific content depending upon the value of a control property, record-content.


Figure 5   Yang: log-record-body

 

The log-record-body includes:

  • event-timestamp: The time of the event as recorded by the originator.
  • event-source: Indicates the dynamic nature of the event.
  • parent-address: Provides the yang tree location.
  • record-content: Identifies the structure of the content and is used to control the augmentation.

It is augmented with an instance of a class. This allows for any class from the model to be reported.
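
As a purely illustrative example, a single log record carrying a connection-end-point could be encoded as follows (Python dictionary; all identifier values, enumeration labels and the exact shape of the augmenting content are invented for the example).

    # Illustrative record only; member names follow the groupings above, values invented.
    example_record = {
        "log-record-header": {
            "token": "184467",
            "full-log-record-offset-id": "partition-0/184467",
            "log-append-time-stamp": "2020-06-01T12:00:05.123456Z",
            "entity-key": "cep-0e4f0f8a-0001",          # invariant key used for compaction
            "record-type": "RECORD",                    # assumed label; e.g. TOMBSTONE for deletes
            "record-authenticity-token": "opaque-signature",
        },
        "log-record-body": {
            "event-timestamp": "2020-06-01T12:00:05.087000Z",
            "event-source": "RESOURCE_OPERATION",       # assumed enumeration label
            "parent-address": ["context", "topology", "node", "owned-node-edge-point"],
            "record-content": "connection-end-point",   # controls the augmentation below
            # The augmentation carries the whole entity, with current values, after the change.
            "connection-end-point": {"uuid": "0e4f0f8a-0001", "operational-state": "ENABLED"},
        },
    }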

3.11       Considering order/sequence and cause/effect

3.11.1 Time

When determining the cause and effect of any behavior in the controlled system it is necessary to have visibility of relevant state and to know the time of each change of state. The time units must be sufficiently fine to allow all relevant event sequencing to be determined [3] .

The time of change at the source of change needs to be propagated as data in the stream and hence needs to be in the report of each instance of the thing (and in any stored form that exists between the source and the stream). This is recorded in event-time-stamp.

The time the record was logged is log-append-time-stamp.

3.11.2 Backend stream details

The implementation solution may partition the log supporting a stream. This may cause stream content reordering. Clearly, event reordering across different sources is a fundamental behavior due to relativity and differential delay. It is expected that the order for events from each single state machine will be maintained.

Regardless, the timestamp granularity must be sufficient to ensure relevant chronological order can be recovered. This also assumes that a robust time synchronization mechanism is present at each event source and hence in the controller/controlled system as a whole.
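
As a small illustration of recovering per-source order, a client could group records by their originating source and sort each group by event timestamp. The source-id member and the record shape used below are assumptions.

    from collections import defaultdict

    def order_per_source(records):
        """Group records by originating source and sort each group by event timestamp.
        Ordering across different sources is not asserted; 'source-id' is assumed."""
        per_source = defaultdict(list)
        for rec in records:
            per_source[rec["source-id"]].append(rec)
        for recs in per_source.values():
            recs.sort(key=lambda r: r["event-timestamp"])
        return per_source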

3.12       The Context

TAPI is a machine-machine interface. As noted earlier, the client maintains a local repository representation of the information from the provider. The information is defined in terms of the context. As the context has been agreed (currently, up front during system design), the context is what the client system wants/needs to see.

Where the client needs less detail than is provided by a specific context the client can:

  • Locally filter out information that is not of interest
  • Change the context.
    • The context can be modified as necessary so long as other clients agree with the change
  • Request construction of a specific context
    • Several contexts can be provided

On this basis:

  • Individual specific queries on the provider are not necessary.
  • Notification filters that are not simply the realization of the view are not necessary. The context defines the filtering for the client.

Any changes in the required information are handled by changes in the context. See later discussion on changing the context and spotlighting.

3.13       Handling changes in the Context

The stream relates to the things in the context. Changes include:

  • An instance of a thing being added/removed from context within the current definition of the context
    • E.g., the creation of a connection
  • The context definition being changed such that an instance of a thing appears/disappears
  • An instance of a thing in the context changing such that an element is added/removed from the thing
  • The value of a property of an instance of a thing already in the context changing

See later on changing the context.

3.14       Reporting change

For the compacted log solution, whole entities are streamed on creation, deletion and change of a property. Hence, for example, if a single property in a CEP changes, the whole CEP is logged and streamed.

Note: Separation of properties that change slowly (config) from properties that change at a high rate (state), and isolating independent state in separate entities, is advisable. The alarm and state model provided for streaming allows a separation of small per-state entities from large, slowly changing config entities. The current TAPI OAM model also isolates properties related to monitoring results from properties related to configuration.

3.15       System engineering

For the solution to operate reliably:

  • System engineering must be such that under normal and normal bad day circumstances the client is able to keep up well with the provider system (otherwise the client will suffer ongoing alignment issues in terms of lag and potentially in terms of fidelity). Eventual consistency is acceptable if there is only a “short” delay to alignment with any specific state.
  • For realignment to be successful, the client must be engineered to be able to read all records in the tail of the log up to the Tombstone retention point in under the Tombstone retention time. On this basis:
    • The tombstone retention time may differ for different cases and different streams
    • For a stream with only limited volumes of long term records the tombstone retention could be quite short
      • Under these circumstances, tombstone retention is probably determined by likely comms down times
    • For cases where alignment takes days, tombstone retention would need to be in terms of days

The solution can be tuned to balance pace to achieve consistency (eventual consistency) with the fidelity of information when under stress [4] .

3.16       Eventual Consistency and Fidelity

Considering the supported-stream-type structure discussed earlier, there are two key time settings for the compacted log solution: “tombstone-retention” and “compaction-delay” [5]. A small sketch classifying the client read delay against these two settings follows the list below.

  • Client read delay less than compaction delay
    • Client is behind on absolute state but is losing no detail.
    • The client can potentially catch back up if:
      • Rate of append reduces due to conditions in the monitored environment changing
      • The client gains additional resources that allow it to deal with greater than the current rate of append
  • Client read delay is greater than the compaction delay but less than the tombstone retention time
    • Client is behind on absolute state and is losing fidelity as some changes are being compacted out of the log (if the client is less interested in short lived things than long lived things this may not be a significant problem)
      • A rapid intermittency may become completely invisible (e.g., an active – clear alarm pair)
      • All changes that happen at a rate slower than read delay will be visible
    • The client can potentially catch back up if:
      • The delay was due to a long comms down issue that has now recovered, and the client capacity can readily deal with the current append rate
      • Rate of append reduces due to conditions in the monitored environment changing
      • The client gains additional resources that allow it to deal with greater than the current rate of append
  • Client read delay is greater than the tombstone retention time
    • Client has now potentially lost the eventual consistency and must realign by streaming from the “oldest” record (offset zero)
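
The three regimes above can be summarized in a small illustrative sketch (Python; the state names and the use of seconds are assumptions). The example values correspond to the compaction delay and tombstone retention proposed for alarms in Section 3.19.3.

    def client_stream_health(read_delay, compaction_delay, tombstone_retention):
        """Classify the client position in the stream (all values in seconds)."""
        if read_delay < compaction_delay:
            return "BEHIND_NO_LOSS"            # behind on absolute state, full fidelity
        if read_delay < tombstone_retention:
            return "BEHIND_LOSING_FIDELITY"    # compaction is removing some changes
        return "REALIGNMENT_REQUIRED"          # must re-stream from offset zero

    # Example using the alarm settings proposed in 3.19.3 (10 minutes, 4 hours).
    assert client_stream_health(120, 600, 14400) == "BEHIND_NO_LOSS"
    assert client_stream_health(7200, 600, 14400) == "BEHIND_LOSING_FIDELITY"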

3.17       Stream Monitor

TAPI 2.1.3 also offers a rudimentary capability for monitoring the streams. This allows an external client that has the appropriate capability and authorization to monitor which clients are connected and how their stream is performing.

Where this capability is supported, each client streaming connection is monitored for the id of the last record written to and read from the log. This allows an administrator to get a view of how delayed a client is.

In TAPI 2.1.3 this is for PoC analysis. It is expected that the feature will advance significantly in future releases as a result of experience gained from PoC activity.

3.18       Solution structure – Architecture Options

Two options are explored: one provides full compaction support; the other uses a more traditional structure to feed a stream and provides a restricted emulation of compaction. Either can be used to support the current TAPI streaming solution sufficiently.

There may be other approaches that provide a suitable capability, i.e., reporting current state and change via whole entity reporting through the same single stream, so as to achieve eventual consistency with current state and a cost-optimized potential for loss of fidelity.

The key consideration is the external behavior and not the specific mechanism used to support it.

3.18.1 Full compacted log [Some functions such as policy control and delay monitor are not described]

The figure below shows a stylized view of a controller controlling a network and providing a stream to a client.

 

 

Figure 6   Stylized view of example controller offering full compaction

The key features highlighted in the diagram are:

  • Compacted log providing current state, recent Tombstones and recent changes
  • Pipeline providing guaranteed delivery (at least once) whilst the connection is in place, along with provider-initiated connection force-drop control when realignment is necessary
    • To align the client does not need to do anything other than connect to the appropriate stream and receive from the stream
    • It is assumed that the client will extend the integrity of the stream to include the process of storage (as shown) so as to enable the client to maintain alignment with current state (eventual consistency) and to build and maintain history
  • A stream monitor (depicted as a control loop on the pipeline as an aspect of control of control) that ensures achievement of eventual consistency

Note that the assumption is that other pipelines are in place (not shown here) throughout the overall flow from device to client to ensure no relevant loss of information from the device and to ensure that the solution is always in a state of eventual consistency with current network state.

3.18.2 Emulated compaction

This approach uses the same pipeline mechanism but feeds the pipeline from a composite store. This does not offer the full compacted log capability. The behavior is as if the tombstone retention and the compaction delay are set to the same value (simplistically, the end of the log, but strictly any point in the log).

On client connection, the provider streams current state from the composite store. Current state may change during streaming of current state such that a change statement is appended to the truncated log.

Once the provider has sent current state, ensuring that all states that have not changed since connection of the client have been streamed to the client, it would then begin to stream from the truncated log. The truncated log may include some states that have already been streamed. As the client is necessarily designed to be able to deal with repeated statements this will not be an issue.

If the connection to the client is dropped, the client will reconnect providing the token of the last record it fully processed. The provider can code into the token any information that helps it determine what to stream next.

Clearly, the client must be able to retrieve current state in a shorter period than the log truncation time so that changes are not lost. If the log wraps during streaming of current state, the provider will have to restart the alignment (by dropping the connection and by ignoring the client token).

It would not be unreasonable for the provider to shorten the truncated log once alignment has been achieved.

The provider is expected to add a Tombstone record for every delete record.
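
A minimal sketch of this emulated feed is given below (Python, illustrative only; the store, log and record shapes are assumptions). It shows current state being streamed first, followed by the truncated log, with repeats tolerated by the client.

    def feed_emulated_compaction(store, truncated_log, send):
        """Illustrative emulated-compaction feed: stream the whole current state from
        the composite store, then replay the truncated change log. Some records may be
        repeated; the client is designed to handle repeated whole-entity statements."""
        for key, entity in store.items():
            send({"entity-key": key, "body": entity, "tombstone": False})
        for record in truncated_log:
            send(record)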

Figure 7   Stylized view of example controller offering emulated compaction

The provider behavior feeding the stream is very similar to traditional behavior (send current state in response to a get, then notify of change), other than that the log is of whole entities.

In this realization the provider side does not carry out active compaction and hence does not provide the elegant degraded performance under pressure of the full compacted log approach.

3.19       Using the compacted log approach for alarm reporting

3.19.1 Specific alarm characteristics - raising/clearing an alarm

A device raises an alarm once specific criteria have been met and clears the alarm once other specific criteria have been met. The criteria could involve:

  • Counting traffic related events where the count is within some window (often sliding and sometimes complex)
    • The alarm will be raised when the count exceeds some threshold and will be cleared when it drops below some threshold
      • The two thresholds provide hysteresis that brings some stability to the alarm
    • The windows are often very short and can cause extreme intermittency under particular network scenarios
  • Measuring an analogue property with several options
    • The alarm may be raised when the measurement exceeds some threshold and cleared when it drops below another threshold with hysteresis
    • The alarm may be raised when the measurement drops below some threshold and cleared when it raises above another threshold with hysteresis
  • etc.

The counts can be of the occurrence of other threshold crossing and hence the alarm definition at origin may be very complex.

3.19.2 Key Features of an alarm solution (example usage)

The following sections work through the key features of an alarm solution as an example of usage of streaming.

The alarm streaming solution has a particular delete/tombstone behavior to provide the best performance. This is highlighted in Section 3.19.7.

3.19.3 Log strategy

  1. The TAPI alarm stream is fed from a compacted log
  2. There will be a dedicated connection through which only alarm records are propagated
  3. The log compaction delay is set to allow for normal operation with a client to enable the system to deal with temporary communications failure with no loss of fidelity  
    • When compaction is applied fidelity will be lost, although only rapidly changing and fleeting alarms will be lost
    • In any solution, the client may be a little behind or may suffer a short communication disruption   without significant impact on operational quality
    • The proposed compaction delay setting is 10 minutes for a system with reasonably well engineered platform capacity and communications
      • Clearly having an alarm system that is greater than 10 minutes behind the provider will degrade location performance, however experience may show that this time is too short
    • The solution may allow for this property to be adjustable in the running system, through a mechanism suitable for expert access
      • Dynamic compaction delay might be beneficial in cases where local control can be applied, and an unusually long comms down is being experienced
        • Under these circumstances the compaction delay could be increased such that the clients lose no fidelity (although they are clearly behind whilst the comms is down)
    • Note that:
      • ideally, during the normal operation, the client would be at most a few minutes behind the current append time
      • the volume of information within the compaction delay time depends upon the rate of change of things in the monitored environment
      • in a sophisticated solution it will be possible to allocate more resources to the TAPI client, when it is under pressure, to enable it to process faster
  4. The log tombstone retention is set to allow for reasonable communication failure/recovery where there may be significant failures and to allow for client reboot.  
    • The proposed tombstone retention setting is 4 hours   for a system with reasonably well engineered platform capacity and communications
      • This solution may allow for this property to be adjustable in the running system, through a mechanism suitable for expert access
      • Tombstone retention will always be greater than or equal to the compaction delay
  5. The normal record (non-tombstone) retention will be infinite
    • This property is not adjustable as it is fundamental to the correct operation of the mechanism
  6. If compaction operates on a segment by segment basis (such that the segment with the most recent alarms is never compacted), then the segment size should be such as to not hold significantly more records than would occur during the compaction delay time when operating under normal conditions (ideally a segment would hold significantly fewer)

See system engineering below.

3.19.4 Alarm behavior

The following highlights key design decisions (in the context of justifying/explanatory information)

  1. An alarm detector will have an active and a clear state    
    • Explanatory Information:  
      • An active is considered as more important than the clear (hence the states are asymmetric in importance)  
      • Most detectors in the network will be clear under normal circumstances (hence the normal state can be considered as clear and the off-normal state as active)
    • The alarms will be reported as ACTIVE or CLEAR
      • An ACTIVE will be followed at some point by a CLEAR (the basic state machine is simple with only Clear → Active and Active → Clear transitions)
    • An alarm CLEAR shall be followed immediately by a Tombstone
      • A Tombstone alone shall be considered as equivalent to a clear
      • The tombstone causes the alarm to be removed from the log. This then has a similar characteristic to a traditional alarm reporting mechanism (an illustrative sketch of this behavior follows this list)
  2. If an entity upon which the detector is lifecycle dependent is deleted and just prior to the deletion the alarm was active, then the alarm (and hence its detector) will be tombstoned
    • If the alarm was not active, then there should be no Tombstone as there is nothing in the log to remove
  3. The stream related to a particular detector may be processed and stabilized such that it gains a state, INTERMITTENT, indicating that the detector is intermittently rapidly cycling through active and clear states
    • The alarms will be reported as ACTIVE, INTERMITTENT or CLEAR
    • All transitions are legal (i.e., Clear → Active, Clear → Intermittent, Active → Clear, Active → Intermittent, Intermittent → Active, Intermittent → Clear)
    • The trigger for moving to/from Intermittent state will be defined by policy. Ideally the policy is configurable
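
Purely as an illustration, the active/clear/tombstone pattern above could be realized as follows (Python; the function names and record shapes are assumptions, and the log object is assumed to offer the append signature used in the compacted-log sketch of Section 3.8).

    def report_alarm_transition(log, detector_key, new_state):
        """Append alarm records for a detector transition; new_state is one of
        'ACTIVE', 'INTERMITTENT' or 'CLEAR' (illustrative record shapes)."""
        if new_state == "CLEAR":
            # A CLEAR is logged and immediately followed by a Tombstone so that
            # compaction eventually removes the alarm from the log entirely.
            log.append(detector_key, {"alarm-detector-state": "CLEAR"})
            log.append(detector_key, None, tombstone=True)
        else:
            log.append(detector_key, {"alarm-detector-state": new_state})

    def on_measured_entity_deleted(log, detector_key, alarm_was_active):
        """If the measured entity is deleted while the alarm is active, tombstone the
        detector; if it was not active there is nothing in the log to remove."""
        if alarm_was_active:
            log.append(detector_key, None, tombstone=True)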

Note: It would be reasonable to also propagate other processed network property types through the same stream as alarms if the network property has similar characteristics to an alarm. For example, operational state is asymmetric in importance with a normal state and off-normal state where the normal state could be considered as equivalent to a clear.

3.19.5 Condition detector and alarm structure

The condition-detector is as below (descriptions omitted to save space).


Figure 8   Yang: condition-detector (descriptions omitted)

Note that the Yang does not show the mandatory fields. The field enforcement will be applied to the Yang in the next TAPI release.

The following population of values would be expected (an illustrative example follows the lists):

  • Mandatory:
    • condition-native-name: The name used for the condition at the source.
    • measured-entity-native-id: The identifier (invariant over the life) of the instance of the measured entity at the source.
    • detector-native-id: The identifier (invariant over the life) of the instance of the detector at the source (e.g. a device).
    • condition-detector-type: Identifies the type of detector. This drives the conditional augmentation.
  • Optional
    • condition-normalized-name: Commonly used or standardized condition name.
    • measured-entity-class: The TAPI class of the measured entity.
    • measured-entity-uuid: The UUID of the TAPI entity that represents the entity measured at source.
    • measured-entity-local-id: Where the measured entity is a local class, and hence does not have a UUID, the local ID is provided in conjunction with the parent's UUID (in the measured-entity-uuid property).
    • detector-uuid: The Uuid of the TAPI entity that represents the detector.
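
An illustrative population consistent with the lists above is shown below (Python dictionary; every value, and the enumeration label for condition-detector-type, is invented for the example).

    # Illustrative condition-detector population; every value is invented.
    condition_detector = {
        # Mandatory
        "condition-native-name": "LOS",
        "measured-entity-native-id": "/shelf-1/slot-3/port-2",
        "detector-native-id": "/shelf-1/slot-3/port-2/LOS",
        "condition-detector-type": "ALARM_DETECTOR",          # assumed enumeration label
        # Optional
        "condition-normalized-name": "lossOfSignal",
        "measured-entity-class": "CONNECTION_END_POINT",
        "measured-entity-uuid": "0e4f0f8a-0000-0000-0000-000000000001",
        "detector-uuid": "0e4f0f8a-0000-0000-0000-000000000002",
    }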


The condition-detector can be augmented with properties related to alarms as follows:

Figure 9   Yang: condition-detector (descriptions omitted)

The alarm-detector-state indicates whether the alarm is ACTIVE, INTERMITTENT or CLEAR. This is the essential state of the alarm.

The legacy-properties are provided to deal with the traditional alarm reporting properties. Alarm systems of the 20th century were based primarily on local lamps (initially filament bulbs) and bells. Lamps can only be on or off, and bells sounding or not sounding, so alarms were Boolean in nature. Where a detector was essentially multi-state it was converted into multiple Boolean statements.

The management of the equipment was essentially human only and local only (there were rarely remote systems). The device with the problem was the only possible indicator of importance and it had only three distinct bulbs to illuminate (filament bulbs tend to fail, requiring costly replacement).

The devices were relatively simple in function and analysis of the detectors was crude. There was only the device to indicate severity. The device also could provide the best view as to whether a service was impacted, although clearly it had almost no knowledge.

In a modern solution with well-connected remote systems that increasingly analyse problems and where there is increasingly 'lights out' building operation, the device's guess at severity etc. is irrelevant. In addition, with sophisticated resilience mechanisms, the device cannot make any relevant statement on whether the customer service has been impacted.

Likewise, in a world where there were no remote systems and local management was the only practice, alarms had to be locally 'acknowledged'. Where there are remote systems, per alarm acknowledge is burdensome.

However, many solutions and operational practices continue to use the historic schemes. On that basis, the schemes are supported but relegated to optional. At this point in the evolution of control solutions legacy-properties are probably mandatory, however, it is anticipated that as control solutions advance the legacy-properties will become irrelevant.

The legacy-properties are:

  • perceived-severity: A device will provide an indication of importance for each alarm. This property indicates the importance. In some cases, the severity may change through the life of an active alarm.
  • service-affect: Some devices will indicate, from their very narrow viewpoint, whether service has been affected.
  • is-acknowledged: Devices offer a capability to acknowledge alarms (to stop the bells ringing). Often an EMS will offer a similar capability. This property reflects the current acknowledge state.
  • additional-alarm-info: Often, alarms raised by devices have additional information. This property can be used to convey this.

3.19.6 Alarm Identifier and location

This section further clarifies the rationale for the choices of mandatory and optional identifiers.

  • The alarm identifier is essentially the identifier of the detector. This should be based upon native NE values to ensure consistency over time and across resilient systems etc.
    • The detector is at a functional location in the solution and detects a particular condition at that location. It should be identified by these two key properties, i.e., functional location and condition.
    • The detector is long lived and may emit many active and clear events through its life
  • The alarm will normally be reported via TAPI against a valid TAPI entity and hence the overall location of the detector will include the identifier of the TAPI entity
    • The report will also include information in the location from the NE in NE terminology to enable the relating of information acquired directly at the NE with information acquired at a higher system
    • The TAPI model allows for alarms without a full TAPI resource id, although this should be a rare case
  • Where the detector relates to a thing that is not fully modeled in TAPI, e.g., a power supply, then:
    • The alarm will be reported against a containing TAPI entity
    • The identifier of that detector will include a meaningful index that includes interpretable sub-structuring to describe the position of the detector (again based upon NE values)

3.19.7 Alarm tombstone behavior

As noted earlier, the alarm clear will be followed immediately by a Tombstone record. As also noted, the deletion of an alarm detector, where the alarm was active prior to the deletion, will cause at least the logging of a Tombstone record. Allowing for compaction delay:

  1. The Tombstone/clear will cause the compaction process to remove the corresponding active alarm
  2. The Tombstone will also cause the compaction process to remove the clear (that immediately precedes it)
  3. The Tombstone will eventually be removed as a result of tombstone retention being reached

3.19.8 Time

Whilst the timestamp for the alarm is not a Controller responsibility (other than for Controller generated alarms, e.g., disc full), there is an expectation of correct behavior.

A versatile approximate-data-and-time structure has been used for the event-time-stamp to allow representation of inaccuracies.

The time stamp for an alarm should be as follows:

  • The timestamp from the ultimate source, representing the time of occurrence of the event that triggered the raising or clearing of the alarm, should be preserved through the entire passage of the alarm information through the source device, and any chain of Controllers, and presented in the appropriate field in the TAPI log-record-body.
    • Where the source does not provide a time stamp, a timestamp can be added by a controller as the alarm progresses through the solution. This timestamp should be marked with a spread value BEFORE
  • In general, it is extremely important to accurately capture the leading edge of a problem. Hence, for each detector, the time of first occurrence of an active alarm after a "long" period of clear is the most significant time.
    • The time of clearing can be less precise
  • The log record also has a log-append-time-stamp
  • Ideally, the time of occurrence of an alarm should be the time of the entry into the confirmed assessment as to whether an alarm has occurred (and not the time of exit from that assessment).
    • Likewise, the time of clearing of the alarm should be the time of entry into the confirmed assessment as to whether an alarm has cleared.

3.19.9 Detected Condition normalization

The alarm detected condition shall be presented:

  • Minimally: In native form, i.e., as provided by the NE,  
  • Additionally: In normalized form, i.e., complying with a list of standard alarm types

An alternative is to provide translation metadata to enable normalization from the native form at the client. This metadata can be provided separately from the alarm stream and related to each detector with a relevant mapping. There is no standard expression for translation metadata.

3.19.10 Meaningful detection (device or any other source)

Under certain circumstances a detected condition may:

  • Become meaningless, e.g., when a remote function is disabled, in which case the alarm shall be tombstoned
    • This may require function disable action to be taken on the local system
  • Have inverted meaning, e.g., when signal should not be present on a port, in which case, if there is a "signal not present" alarm active, then it should be tombstoned and if there is a signal present a "signal present" alarm should be raised
    • This may result from some local action on the entities supporting signal flow
    • Essentially, the "signal not present" condition detection becomes meaningless and a "signal present" condition detection becomes meaningful


4        Use Cases

4.1            Use Case background

4.1.1      General TAPI considerations - Context

Initial TAPI deployments using TAPI 2.1.3 are applied to solutions where TAPI is feeding a controller, orchestrator or OSS etc. from a single context that is taking a view of the entire network for all layers.

4.1.2      Underlying behaviour

It is assumed that at start up the Controller will form the necessary full context and will initialize various streams. It is also assumed that the Controller will recover from any internal problems to ensure that the streams allow the client to achieve eventual consistency.

There are a number of critical behaviors assumed from the underlying system (essentially use cases for other components):

  • TAPI is fed from a reliable source that has the necessary notifications and access to current state etc. (e.g., has alarm notifications).
    • I.e., the underlying system through the Controller to the NE is reliable (appropriate pipelines etc.) so that the Controller cannot lose eventual consistency with the NEs
    • The solution may use a compacted stream, in which case the compaction delay and tombstone retention are compatible with TAPI needs
  • The notifications are well behaved both at the NE level and within the Controller, e.g., for alarms such that
    • An alarm will have a defined active and a defined clear
    • Only legal transitions (clear to active and active to clear etc.) are represented
  • If the resource is deconfigured/deleted a Tombstone will be logged for each dependent resource (e.g. alarm detector) related to the resource that was indicating active just prior to the deconfiguration/deletion.
  • If a circuit pack is configured the states of any dependent resources will be reported appropriately, e.g., if an alarm detector indicates active an active alarm record will be logged
  • If a circuit pack is deconfigured any dependent resource will be tombstoned, e.g., any dependent alarm detectors that are active will have Tombstones logged (a minimal sketch of this assumed behaviour follows this list)
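The fragment below is a minimal sketch (Python, names illustrative) of the last of the assumed behaviours: on deconfiguration of a resource, tombstones are logged for the resource and for every dependent alarm detector that was active just beforehand. It describes assumed provider-internal behaviour, not a TAPI interface.

def deconfigure_resource(resource_id, active_detectors, log_append):
    # active_detectors: record keys (e.g. detector ids) of currently active alarms
    # log_append(key, value): appends a record to the compacted log; value None is a tombstone
    for key in [k for k in active_detectors if k.startswith(resource_id)]:
        log_append(key, None)      # tombstone the dependent, formerly active, alarm
        active_detectors.discard(key)
    log_append(resource_id, None)  # tombstone the deconfigured resource itself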

4.2            Use Case Overview

This leads to a specific set of use cases. The first list of cases has not been expanded as it covers normal Controller behavior. The main focus here is the second list [A15] of use cases, which correspond to TAPI interaction.

4.3            Building the TAPI stream on provider

  • The Controller (provider) initializes the streams
    1. E.g., connects to the relevant internal stream and aligns
  • The Controller reinitializes the stream after upgrade
  • The Controller recovers after internal loss
    1. Ideally the Controller would indicate potential loss of data on the stream (as a result of compaction and of the need to resync due to the tombstone retention time being exceeded). This is not covered by TAPI in 2.1.3.

Note that the stream should have the characteristics discussed in various sections earlier in this document.

4.4            The TAPI use cases [A16]

The following use cases are described briefly here and then illustrated in the sequence diagram. The use cases assume that Websockets over TCP is the chosen connection method.

4.4.1      Connect to Stream and align - new client

The following activities are carried out (many asynchronously); a minimal client sketch follows the list:

  • Connect and stream
    1. Acquire Auth Token
    2. Get connection information from available-stream retrieval
      • The supported-stream-type information will be used to interpret the available-stream information
      • The supported-stream-type will offer a Websockets option
    3. Use the provided endpoint URL to connect
    4. The client will connect with the null token to cause the provider to stream from the oldest record in the stream (offset zero)
    5. On connection both the client and provider stream processes are started, and the necessary communication is setup between client and provider
      • As a result, the pipeline is started
    6. The provider polls the appropriate log from offset zero filling the pipeline and responding to ongoing demand
    7. The client demands from the provider and buffers as appropriate
    8. The client stores the records in a repository (e.g., Log, DB etc.)
    9. The client maintains a record of last commit
  • Send pong frame
    1. The client periodically sends a pong to keep the connection alive
  • Through the above activities the client works through the stream from the initial record towards the [A17] most recent record
    1. Passing the tombstone retention "point"
      • It must take less than the tombstone retention period to reach the tombstone retention "point" in the log.
      • If it takes longer than the tombstone retention the connection will be dropped, as Tombstones for records already read may have been missed
    2. Passing the compaction "point"
      • From this point onward the client will be getting a full view of changes
    3. Getting close to the head of the log
      • The client is well aligned with little lag
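The following sketch (Python, using the third-party websockets package) illustrates the activities above under a number of assumptions: the record layout (key, value, token fields) and the endpoint URL placeholder are hypothetical, storage is reduced to an in-memory dict, and error handling is omitted. It is a sketch, not a definitive implementation of the interface.

import asyncio
import json
import websockets  # third-party package (pip install websockets)

# Hypothetical endpoint obtained from the available-stream retrieval.
STREAM_URL = "wss://controller.example/tapi/data/context/stream-context/available-stream=<uuid>"

async def consume(last_commit_token=None):
    # Connecting without a token starts from the oldest record (offset zero);
    # reconnecting with the last committed token resumes from that point.
    url = STREAM_URL if last_commit_token is None else f"{STREAM_URL}/?start_from={last_commit_token}"
    store = {}  # stand-in for the client repository (Log, DB, etc.)
    async with websockets.connect(url) as ws:
        keepalive = asyncio.create_task(send_pong(ws))
        try:
            async for message in ws:              # the client demands and buffers
                record = json.loads(message)      # assumed JSON record encoding
                key = record.get("key")
                if record.get("value") is None:   # tombstone: remove the entity
                    store.pop(key, None)
                else:
                    store[key] = record["value"]
                last_commit_token = record.get("token")  # record of last commit
        finally:
            keepalive.cancel()
    return last_commit_token

async def send_pong(ws, interval_s=30.0):
    # Periodically send a pong frame to keep the connection alive.
    while True:
        await asyncio.sleep(interval_s)
        await ws.pong()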

4.4.2      Steady state reception - well engineered client

The pipeline continues to operate with the client close to the most recent record.

4.4.3      Event storm – bad day alarm example (or slow client)

Major intermittent failure in the network (for example, several micro-bends with active percussive interference), or an overloaded client with reduced compute power available.

  • The pipeline continues but the client is absorbing alarms at a rate slower than the production and hence is slipping back down the log
  • Eventually the client will slip back beyond compaction point
  • If the problem resulted from excessive intermittent network activity the client will then benefit from compaction as much of the intermittent noise will be eliminated by compaction
    • The client will lose fidelity and will be sub-Nyquist sampling so may completely lose some repeated fleeting events
  • The client may hover in the compaction zone until its performance improves (via some mechanism for compute power enhancement) or the network stabilizes.
    • Note that the intention in future is to support sophisticated backpressure interaction controlling intelligent filtering in the devices and network

4.4.4      Event storm – very bad day alarm example (or very slow client) - many micro-bends and reduced compute power etc.

Assume a major intermittent failure as before in an environment with lower capacity.

  1. Streaming: The pipeline continues as before but the client rapidly slips back toward the tombstone retention point
    1. If the client passes tombstone retention, then there is a possibility of loss of eventual consistency
  2. Fail: On passing the tombstone retention point the provider forces a disconnect by dropping the connection
    1. The client and the provider kill the pipeline
  3. Realign: The client reconnects with the previous token
    1. On connection both the client and provider stream processes are started, and the necessary connection is made between client and provider
      1. As a result, the pipeline is started
    2. The provider recognizes that the token is outside tombstone retention and goes to offset zero
    3. The provider informs the client, via the stream, that it is back at oldest record (offset zero)
    4. The client consumes the stream (as per (1)) to regain alignment
      1. If the environment has not changed the client will again slip back past tombstone retention and will get forced to realign
      2. The client may utilize some efficient realignment strategies
  4. Assuming the intermittency has reduced, or the compute capability has been restored, the client will regain alignment (eventual consistency)

4.4.5      Short loss of communications

  1. The client is consuming the stream with some delay
  2. On loss of comms the client and the provider kill the pipeline
  3. The log continues to progress on the provider side
  4. The client attempts to reconnect with the previous token
    1. This will be successful assuming the comms has recovered
  5. The stream is filled by the provider from the next valid token after the token provided
    1. In a short comms loss case, this token will be for the next offset
  6. Assuming that the client was near the most recent record of the log there should be no loss of fidelity (and clearly eventual consistency will be maintained)
    1. If the client was in the compaction zone, then there will be some loss of fidelity
    2. If the client was close to tombstone retention, then the short comms loss may have the behavior of a long comms loss

4.4.6      Long loss of communications requiring realignment

  1. The client is consuming the stream with some delay
  2. On loss of comms the client and the provider kill the pipeline
  3. The log continues to progress on the provider side
  4. The client/comms is down for longer than the tombstone retention
  5. The client attempts to reconnect with the previous token
  6. The provider recognizes that this is outside tombstone retention and streams from the oldest record etc.
  7. Eventually the stream returns to steady state (a reconnection sketch follows this list)
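A minimal reconnection sketch covering both loss-of-comms cases is given below (Python); it assumes a consume coroutine like the one sketched in section 4.4.1 that returns the last committed token, and leaves the discarding of the local store on an "offset zero" indication to that coroutine. Retry policy and error classification are deliberately simplistic.

import asyncio

async def run_with_reconnect(consume, initial_token=None, retry_delay_s=5.0):
    # Always reconnect with the previously committed token. The provider decides
    # whether the token is still within tombstone retention; if it is not, the
    # provider streams from the oldest record and indicates this on the stream.
    token = initial_token
    while True:
        try:
            token = await consume(token)
        except Exception:
            # Comms loss or forced disconnect: keep the last committed token.
            pass
        # Reconnect after a short delay whether the stream ended or failed.
        await asyncio.sleep(retry_delay_s)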

4.5            Message Sequence

The hybrid message sequence diagram below captures all relevant flows for the listed use cases. In the diagram the behavior of the Source (device) and TAPI Context are summarized. The diagram only shows presence and fundamental flow for these elements.

Two pipelines are shown in yellow. The TAPI context pipeline intentionally shows no detail as that is outside the scope of the interface definition.

The diagram shows coupled asynchronous parallel/concurrent repeating activities and independent asynchronous repeating activities each in a dashed box. Where the asynchronous activities are coupled there is a dashed line showing the relationship as a ratio of activity or as a 1 which indicates that "eventually" there will be the same number of activities in both asynchronous elements (as a result of a flow through both). Some activities are shown as nested. Where nested there is an indication where there are n repeats of the inner asynchronous activity for each of the outer activities. Buffers are shown to emphasize the asynchronous coupling. The compacted log is shown with a buffer symbol annotated with an "X" indicating compaction (the deletion of records in the log) and "0" indicating that the record is for all system time (where compaction will remove duplicates and hence contain the log size).  

To the left of the figure is the client side (which initiates the connections etc.), shown as a stylized example. The client is shown with both a DB option and a compacted log option for storage. The critical features are the "Pong" and "Last commit". The majority of the client depiction is to explain last commit. Last commit is used by the client on reconnection to continue where it left off.  

The external comms between client and provider is shown as a brown bar and is not considered in any detail. It is assumed that it is reliable (e.g., TCP) and is playing an active part in maintenance of the pipeline.  

1 The functions of WS endpoint stream server / WS endpoint stream client, WS endpoint stream server source actor in the figure are not described. Because some important message sequences, e.g. the Pong message, are related to them, it is valuable to introduce more details. Moreover, for such functions as Source and Process, Align, and Commit to Store, it is also worth adding more descriptions for deeper understanding.

[2] What functions are mandatory for TAPI streaming?

The client token is opaque, so the client has no knowledge of the sequence through the token. Although the sequence number is also exposed, this is primarily intended for stream analysis (note that it may be beneficial, as part of normal behavior, to use it to validate communication).

The middle of the diagram shows the provider and explains the basic flows related to initial connection, loss of connection and forced connection drop.

To the extreme right of the figure are vertical progression bars that show phases of interaction between the client and provider.

Several roles are not shown: WS endpoint stream server / WS endpoint stream client, WS endpoint stream server source actor.

Missing: compact stream emulator.

Figure 10   Hybrid Message Sequence Diagram for example implementation corresponding to Use Cases

 

The following figure shows the use cases and their relationship to the hybrid message sequence chart in the figure above. The heading bars are the same as the right-hand vertical bars on the previous figure. The flow across the figure assumes a chaining of the use cases in the order they are described in the earlier sections.

Figure 11   Phases of interaction for Use Cases

4.6            Use cases beyond current release

Beyond the current release there will be further improvements including the ability to dynamically adjust the stream behavior and the feeds to the stream as follows:

  • Adjustment of compaction delay and tombstone retention on-the-fly
    • Allows for tuning of stream behavior at initial start-up and during predictable comms failures
  • Building and adjusting the context (creation, expansion, contraction and deletion)
    • Allows for multiple clients with differing needs and security clearances etc.
      • Negotiate Context opportunities based upon policy and client role etc.
      • Build explicit context topology
        • Various interactions to set up intent for nodes and links that themselves need to be realized in the underlying structure
      • Note that the initial requests are in terms of shared knowledge such as city or building location and generalized termination points/flows with minimal technology detail
    • In the general solution, with TAPI feeding a controller, orchestrator or OSS etc., there could be several alternative contexts that can be provided. Contexts may focus on a single layer or layer grouping or on a region of the network etc. In a more sophisticated solution where there are many clients each with a slice, in these solutions various negotiations would be required to agree and form the context.
    • Note: Currently, the context is defined by a default intent where there was no opportunity for the client to express the context intent over an interface.
  • Taking advantage of context adjustment capabilities to increase and decrease the intensity of the view of information. This applies where the intensity could not be handled across the whole context and hence a focused view is needed, for example where a parameter changes very often.
    • Spotlighting: Allows the client to selectively increase the fidelity of measurements by changing the measurement policy for a specific property and/or by including an instance of a property in the context where that property is usually not monitored
    • Single snapshot: Allows the client to select a property to take a momentary view of via the stream. This may be the capturing of a single counter value where that counter changes very often (e.g., a packet counter) such that streaming of the raw value would be excessive even for a single measure.

For each of the above it will be necessary to enhance the expression of capability such that the client can know what opportunities for adjustment are available. It is expected that this will be expressed using machine interpretable specifications.

4.7            Message approach (websocket example)

The messaging is as defined below (a small URL-construction sketch follows the list):

  • The general structure for the websocket URL is:
    "wss://<host>/tapi/data/context/stream-context/available-stream=<uuid>", where the uuid is acquired through a get of available-stream. Using this URL would start the stream from the oldest record.
  • Considering the "Connect (token)" from the message sequence diagram, this becomes "wss://<host>/tapi/data/context/stream-context/available-stream=<uuid>/?start_from=<token>". Omitting the token causes the provider to start from offset zero (i.e., the oldest record).
  • In some cases it may be relevant to start from the latest record (e.g., for non-compacted logs):
    "wss://<host>/tapi/data/context/stream-context/available-stream=<uuid>/?start_from=latest".
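A small sketch of the URL construction for the three cases above follows (Python); the host, uuid and token values shown in the comments are hypothetical.

def stream_url(host, stream_uuid, start_from=None):
    # start_from=None      -> oldest record (offset zero)
    # start_from=<token>   -> resume from the record after the given token
    # start_from="latest"  -> most recent record (e.g., for non-compacted logs)
    url = f"wss://{host}/tapi/data/context/stream-context/available-stream={stream_uuid}"
    if start_from is not None:
        url += f"/?start_from={start_from}"
    return url

# Examples (hypothetical values):
# stream_url("ctrl.example.net", "86d1...")               # from the oldest record
# stream_url("ctrl.example.net", "86d1...", saved_token)  # resume after reconnect
# stream_url("ctrl.example.net", "86d1...", "latest")     # from the latest record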


5        Appendix – Considering compacted logs

The following section considers Kafka as an example implementation of a compacted log and then discusses implications of compaction and some storage strategies.

5.1            Essential characteristics of a compacted log

Compaction is described in the Kafka documentation at https://kafka.apache.org/documentation/#compaction

Figure 12   Kafka compaction

With retention (for non-deletes) set to “forever”, the log becomes a single source of truth for absolute state (eventual consistency) and change of state (cost effective fidelity).

A client essentially reads the next record in sequence:

  • Reading to the left of the Cleaner Point ensures full fidelity
  • Reading between the cleaner and delete retention points provides reduced fidelity but still supports eventual consistency
  • Reading to the right of (before) the delete retention point potentially violates eventual consistency and requires the client to go back to read from record offset zero

5.2            Order of events

  • Disordering of records
    • In a distributed system, information from the various parts is received with varying delay such that it is likely to be out of order
    • It can be assumed that time of day is well synchronized across the network
    • Event order can be regenerated (within reason) based upon time of event at source
    • Critical ordering that should be preserved through the log and pipeline is that related to each single event source. For example, consider an alarm detector.
      • It is possible that the time granularity at the source is not sufficient to resolve the active-clear sequence when cycling is very rapid as they can both appear to be at the same recorded time.
      • If the detector goes active and then clear, that ordering should be preserved through the system such that time granularity problems are not encountered, so that the view of system state is always eventually consistent with the state of the controlled system
  • Multiple receipts of the same record: Idempotency
    • A record received more than once should not have any impact on system behavior (a minimal sketch of idempotent record application follows this list)
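The sketch below (Python) illustrates idempotent, per-key application of records as described above. The key, sequence and value field names are illustrative, and purging of retained tombstones after the retention period is out of scope of the sketch.

def apply_record(store, record):
    # store: last applied record per key (e.g. per alarm detector)
    key = record["key"]
    current = store.get(key)
    if current is not None and record["sequence"] <= current["sequence"]:
        return store      # duplicate or older out-of-order record: no effect
    store[key] = record   # covers active, clear and tombstone (value None) alike
    return store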

5.3            Compaction is a real implementation

Considering compaction delay, in general system load will cause the compaction to sometimes drift such that less compaction occurs than is ideal. In Kafka, compaction does not operate at a fixed cleaner point as the head segment is not compacted. When the head rolls to become a tail segment compaction can happen but may be delayed. The behavior is not fully deterministic as it depends upon segment fill and intermittency occurrence.

 


6        References

[W3C SSE] https://www.w3.org/TR/eventsource/ Server-Sent Events

[RFC6455] https://tools.ietf.org/html/rfc6455   The websocket protocol

[ONF TR-512] https://www.opennetworking.org/wp-content/uploads/2018/12/TR-512_v1.4_OnfCoreIm-info.zip

[ONF TR-5XX.1] TR-5XX.1 TAPI v2.1.3 Reference Implementation

7        Definitions

7.1            Terms defined elsewhere

This document uses terms defined elsewhere.

7.2            Terms defined in this TR

The primary purpose of this TR-5xx ….

7.3            Abbreviations and acronyms

Section where the cross reference is only relevant for abbreviation/acronym interpretation purposes):

ACR Acronym

8        Individuals engaged

8.1            Editors

Nigel Davis Ciena

8.2            Contributors

Arturo Mayoral Telefónica

Kam Lam FiberHome

Pedro Amaral Infinera

Andrea Mazzini Nokia

Karthik Sethuraman NEC

Malcolm Betts ZTE

Jonathan Sadler Infinera

 

 

End of Document


[1] This term loses relevance once the readings have been processed and abstracted but is often still used.

[2] This is distinct from the idea projected by YANG, i.e. that the interface expresses the contents of the repository. It is assumed here that the repository structure and content is a decision for the Controller designer. The only constraint is that the exposed information must be in TAPI form.

[3] It may be beneficial to add an indication of time granularity to assist in cause/effect evaluation. For this to be fully beneficial the accuracy of synchronization of time would also need to be determined.

[4] Compaction is used intelligently to reduce realignment time whilst minimizing probability of loss of detail.

[5] When assessing the system performance, overall delay must be taken into account. At this point the timings are given ignoring comms buffering. Depending upon the comms buffers there can be significant additional delayed information in the stream.


[A1] Delete in

[A2] [AMLL]: Definition of the terms:

"loss of fidelity" and "eventual consistency" would improve the description.

[A3] [AMLL]: Definition or reference should be provided to improve readability and common understanding.

[A4] [AMLL] The "record-retention" format (String) may be confusing. Maybe, if the parameter may be expressed in Time or Capacity, there should be two attributes, one for each measurement unit.

[A5] [AMLL]: I need to see if I understood correctly. The Controller (Server) produces a set of supported streams, then each of these streams may include a list of "record-content" items such as Connection, Node… according to the attribute enum options. Then, my understanding is that the provider somehow filters the relevant objects to be "streamed" in each stream in advance, and it is not the client which can flexibly decide how to configure the stream. Is this correct?

 

[A6] [AMLL]: A novel reader may not understand the concept. I suggest some introduction to the concept somewhere.

[A7] [AMLL] This section should be structured in a way that can be developed further with the rest of the "log-record" and "log-storage" strategies description, such as:

3.8 Log strategy

3.8.1 Log-record strategies

3.8.1.1 Compacted

3.8.2 Log-storage strategies

3.8.2.1 Whole entity on change

[A8] [AMLL]: Whole Entity On Change:…

[A9] [AMLL]: If there are several delete retention mechanisms, they should be described independently from the "log-record"/"log-storage" strategies. Otherwise, if the delete retention is associated with the selected "log-record"/"log-storage" strategy, it should be explained within.

Right now it is not very clear to me.

[A10] [AMLL]: If this is conditioned to the log-record/storage-strategy followed, I propose to include it within each specific strategy section for clarity.

[A11] [AMLL]: Same as previous comment.

[A12] [AMLL]: Include reference to section.

[A13] Is this the main difference between the two solutions?

[A14] [AMLL]: This seems to me more a UC than a general section. I would propose to include it in Section 4.

 

Maybe another interesting use case would be applying tapi-streaming to topology/connectivity-state change monitoring; maybe a UC may compact sections 3.13 and 3.14 and add more concrete examples with relevant TAPI model objects, such as Topology or Connectivity.

[A15] Not mentioned both lists later.

[A16] [AMLL]: In general my idea of a Use Cases description is a bit more applied to specific network management purposes, e.g., Alarm management, Topology-Context synchronization, Connectivity-Service Lifecycle management… I think it is important not to mix the general streaming concepts provided by this model and the proposed framework in general with the specific use cases.

 

I think the currently proposed use cases could be somehow categorized, maybe something like Streaming Basics, Alarm Management, Context Alignment, Performance Management, Connectivity-Service lifecycle management…

 

Additionally, as I mentioned in a previous comment, it would be helpful to assign a sort of coding to each UC for referencing/testing etc.

[A17] [AMLL]: Delete