This page lists all the fixed bugs and changes in the Vortex OpenSplice 6.11.x releases.
Regular releases of Vortex OpenSplice, containing bug fixes, changes to supported platforms, and new features, are made available on a regular basis.
There are two types of release: major releases and minor releases. Upgrading Vortex OpenSplice contains more information about the differences between these releases and the impact of upgrading. We advise customers to move to the most recent release in order to take advantage of these changes. This page details all the fixed bugs and changes between different Vortex OpenSplice releases. There is also a page which details the new features in the different Vortex OpenSplice releases.
There are two different types of changes: bug fixes and changes that do not affect the API, and bug fixes and changes that may affect the API. These are documented in separate tables.
Fixed Bugs and Changes in Vortex OpenSplice 6.11.x
Vortex OpenSplice 6.11.1
|OSPL-14113 / 00020953||Idlpp for Python API creates circular dependency in the generated file.
In the OpenSplice Python API, when an IDL file has a module B inside module A and something from module B (e.g., an enum) is used inside module A, a circular dependency results during the import of module A.
Solution: The problem is now fixed.
|OSPL-14065 / 00020912||Incorrect inconsistency detection in the DLite differential alignment protocol.
The differential alignment protocol wrongly concluded that not all data was received from writers that had not published any data yet. This could prevent wait_for_historical_data from unblocking, which can cause numerous application issues.
Solution: The protocol logic is fixed by excluding publishers from differential alignment calculations until they have published data.
|OSPL-14128||Race condition in management of instance lifecycle states
In 6.11.0, a new mechanism was introduced that allowed the spliced to revive instances that lost liveliness due to a disconnect when their writer was re-discovered. However, this caused a race condition between the spliced trying to revert the instance back to its state prior to the disconnect, and durability/dlite trying to update the instance to its latest state. Both mechanisms should have been commutative, but in certain scenarios they were not, and this could cause the instance to end up in the wrong lifecycle state.
Solution: Both mechanisms are now fully commutative, and the resulting instance lifecycle state is now eventually consistent with the rest of the system.
|OSPL-13652 / 00020635||Add missing python API methods and ensure the subscriber partition setting is functional.
Added methods to set QoS and to read_status and take_status. Ensured that setting the partition on creation of a subscriber works, and included a test for this.
Solution: You can now change the QoS policies of entities (but only those allowed by DDS), read/take the status of an entity, and set the partition of a subscriber via QoS, for example via the XML QosProvider.
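As an illustration, a subscriber partition could be supplied through the XML QosProvider with a profile along these lines. This is a sketch only: the profile and partition names are hypothetical, and the exact schema should be checked against the OpenSplice QosProvider documentation.

```xml
<dds>
  <qos_profile name="ExampleProfile">
    <subscriber_qos>
      <!-- the subscriber created from this profile joins SensorPartition -->
      <partition>
        <name>
          <element>SensorPartition</element>
        </name>
      </partition>
    </subscriber_qos>
  </qos_profile>
</dds>
```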
|OSPL-13665 / 00020647||Issue with invalid samples when using read condition causes readers to get stuck unable to read samples.
In specific cases where a reader has both instances with invalid samples and instances with valid samples, if samples are read using a condition on e.g. view, sample or instance states which an instance with invalid sample(s) does not meet, no other instances are considered and a 'no-data' result is returned to the application. Note this applies to operations such as take_w_condition but also to conditions in waitsets.
Solution: Processing instances with invalid samples contained a bug in a return code, causing the implementation to stop iterating instances and return a 'no-data' result to applications prematurely. This has been fixed.
|OSPL-13743 / 00020702||Possible alignment mismatch when an asymmetrical disconnect occurs during alignment
When nodes get reconnected after being disconnected, a request for alignment data is sent. When there is an asymmetrical disconnect AFTER the aligner has received the request but BEFORE the aligner has actually sent the data, then the alignee drops the request but the aligner does not. When the asymmetrical disconnect is resolved, the alignee sends a new request for alignment data to the aligner. It now can happen that the aligner sends the alignment data of the FIRST request to the alignee, and the alignee considers this as the answer to the SECOND request. When the alignee receives the alignment data to the SECOND request, the data gets dropped because there is no outstanding request anymore. This can lead to an incorrect state whenever the alignment set has been updated between the first and the second request.
Solution: The answer to the first sample request is not considered a valid answer to the second request any more.
|OSPL-13787 / 00020504||Python: "dds_condition_delete: Bad parameter Not a proper condition" entries in ospl-error.log
In some circumstances, the ospl-error.log can contain errors with the text "dds_condition_delete: Bad parameter Not a proper condition", even though customer code appears correct. This is due to unexpected interactions between the Python garbage collector and the underlying C99 DCPS API used by the Python integration. An attempted fix (OSPL-13503, released in 6.10.4p1) was later reverted in 6.11.0 (OSPL-13771) because it caused a memory leak.
Solution: The issue has been fixed. The fix tracks implicit releases of C99 'condition handles' by tracking garbage collection of the parent DataReader class, and prevents duplicate calls to dds_condition_delete in those cases. Note that customer code can still cause an entry in ospl-error.log, but only when code explicitly calls DataReader.close() prior to explicitly calling Condition.delete(). In such a code sequence, an error message is a reasonable expectation.
|OSPL-13869 / 00020756||Long delay to get read and write functions result for large sequence types when using Python API
When using the OpenSplice Python API, there was a delay in getting the results of the read and write functions for sequence types. The problem was only apparent with large sequences (roughly 1 MB or more). The Python API code looped through the list during data construction, serialization and deserialization, so it took a long time to get results back from the read and write functions for large sequence types.
Solution: For sequence types, the Python API code now generates array.array in data construction and deserialization if the sequence type is supported by the built-in array module. The serialization process for sequence types has also been updated to skip looping through the data if it is an instance of array.array. The updated code reduces the read and write times for sequence types that are supported by the built-in array module. It is essential to regenerate the code from the IDL file using idlpp in order to achieve the performance improvements; unregenerated code will continue to execute correctly. Please note that large sequences of types not supported by the built-in array module (enum, boolean, string, char and struct types) will not see any performance improvements.
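The effect can be sketched with plain Python (this is illustrative only, not the actual idlpp-generated code): packing a sequence into an array.array replaces a per-element Python loop with single bulk tobytes()/frombytes() calls.

```python
import array

# A large sequence of 32-bit integers ('i' type code).
seq = list(range(250_000))

# Per-element handling (old approach): a Python-level loop touches every item.
looped = [int(v) for v in seq]

# array.array (new approach): the data is packed into a contiguous C buffer,
# and serialization is a single tobytes() call instead of a loop.
packed = array.array('i', seq)
raw = packed.tobytes()

# Deserialization is likewise a single frombytes() call.
restored = array.array('i')
restored.frombytes(raw)

assert list(restored) == seq
```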
|OSPL-13914 / 00020791||A delay in the scheduling of the durability service threads may cause the spliced daemon to consider the durability service dead.
The durability service has to report its liveliness regularly to the spliced daemon. When the durability service does not report its liveliness within the lease period, the spliced daemon considers the durability service dead. The durability service contains a watchdog mechanism to monitor the liveliness of its threads; when this watchdog finds a thread non-responsive, the durability service will delay the liveliness notification to the spliced daemon. When a durability thread goes to sleep it informs the thread watchdog accordingly. However, a high CPU load may cause a thread to sleep longer than expected. In one location in the code the sleep time reported to the thread watchdog was too small, which could cause the thread watchdog to consider a thread non-responsive when the sleep time was much longer than expected.
Solution: The sleep time reported to the thread watchdog is increased to take into account scheduling delays caused by a high CPU load.
|OSPL-13926 / 00020810||The ospl stop command may block.
When an 'ospl stop' or 'ospl stop -a' command is issued, the ospl tool will search for the associated key file(s). When a key file indicates that the spliced daemon is still initializing, the ospl tool will wait until spliced reports that it has become operational. This is done to prevent problems when the ospl tool tries to terminate a spliced daemon that is still starting. However, when a spliced daemon was killed during its startup it may leave behind a key file which still indicates that it is initializing. When the 'ospl stop' command finds such a key file it will wait indefinitely.
Solution: When the 'ospl stop' command finds a key file indicating the initializing state, it will check whether the corresponding spliced daemon is still running. When that is not the case it will try to clean up the resources left by the killed spliced daemon and remove the key file.
|OSPL-13954||MATLAB: IDLPP generates incorrect code for sequence of structs
For the MATLAB language, IDLPP generated incorrect code to serialize (write) a sequence of structs, resulting in an exception during sample writes.
Solution: IDLPP has been updated to generate the correct code.
|OSPL-13964 / 00020844||Instance liveliness sometimes incorrectly set to NO_WRITERS for late joiners
When a late joiner joins an already running system, the TRANSIENT samples already published by the other nodes get aligned by either durability or dlite. The alignment consists of two phases:
1. Discovering the remote Writers.
2. Acquiring the samples that they sent.
Both phases are executed asynchronously, so you might end up receiving samples from a Writer that you have not yet discovered. In that case, the samples will be inserted in the NO_WRITERS state, and the delayed discovery of their Writer will not correct this.
Solution: A Writer whose discovery is delayed will now correctly update the liveliness of instances for which samples had already been received prior.
|OSPL-13973||When the call to DDS_DomainParticipant_find_topic() times out no error message should be reported
One of the ways to acquire a topic is to call DDS_DomainParticipant_find_topic(). When the topic could not be found within the specified duration, an error message was generated in ospl-error.log. Generating such an error message is incorrect, because not being able to find the topic is legitimate behaviour.
Solution: The error message is not generated any more.
|OSPL-14049||Serializing big samples on Windows may consume a lot of CPU when using ddsi
The ddsi serializer was serializing into 8 KB blocks, reallocating an additional block when the buffer was not big enough. For very big samples this potentially resulted in a large number of realloc operations, which on Windows often meant copying one memory area into another before proceeding, consuming a lot of CPU in the process and as a consequence impacting network latencies quite dramatically.
Solution: Windows now uses a more efficient algorithm to reallocate memory, and the ddsi serializer now converges to its eventual size in fewer iterations.
|OSPL-14066 / 00020914||Improve handling of unhandled application exceptions in Windows.
When an exception occurs in application code and that exception is not handled, it will terminate the application. When this occurs while there are still threads busy in the OpenSplice shared memory segment, it cannot always be ensured that the shared memory is still in a consistent state, and in that case the spliced daemon will stop OpenSplice. On Posix systems this is handled by the OpenSplice signal handler to protect the shared memory.
Solution: For Windows, an unhandled exception handler has been added which tries to stop threads, or let them leave the shared memory segment, when an unhandled exception occurs in application code.
|OSPL-14067||When the durability config specifies master priority selection on several namespaces, master selection may not converge.
When master selection is based on master priorities, the node with the highest priority will win. However, when there is more than one node with the highest priority, the quality of the namespace is used to select the master. When a master is selected, the node will set its own quality to that of the master. However, this quality was by mistake also set on other namespaces that use master priority selection. This could cause master selection on some namespaces to become unstable.
Solution: When a master has been selected for a namespace, the quality of the master is copied to the corresponding namespace only, and not to all namespaces.
|OSPL-14079||When a user clock is used, the DLite service uses the user clock instead of the wall clock in its log files
When a user clock is configured, the services use the user clock instead of the wall clock in their log files. Sometimes customers want to use the wall clock instead of the user clock in their logs, because the wall clock better appeals to human intuition of time. To have service log files use the wall clock instead of the user clock in their log files, the attribute //OpenSplice/Domain/UserClockService[@reporting] can be set to false.
Solution: The clock that is used to report time stamps in service log files is chosen based on the //OpenSplice/Domain/UserClockService[@reporting] settings.
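A minimal configuration sketch of this setting follows; a real UserClockService element will also carry the settings that define the user clock itself, so treat this fragment as illustrative only.

```xml
<OpenSplice>
  <Domain>
    <!-- reporting="false" makes service log files use the wall clock
         instead of the configured user clock -->
    <UserClockService reporting="false"/>
  </Domain>
</OpenSplice>
```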
|OSPL-12820||ddsi may have problems handling big data samples
DDSI breaks up samples into fragments to be able to send them over the network effectively. In case (part of) the sample is 'lost' when sent over the network, DDSI is able to resend the fragments that have been lost, or alternatively the complete sample (so all fragments). For large samples, the chance of at least one fragment being lost is relatively high, and given the size of the complete sample, resending it completely is very costly, let alone the chance of 'losing' part of the resent sample again. The ddsi protocol for handling fragments does not allow the receiver to Ack individual fragments (it can only NACK missing fragments), so it is hard to notify the sender when it is sending too fast.
Solution: By creatively playing with the various freedoms that the ddsi protocol allows, we can prevent ddsi from retransmitting the whole message if it is not Acked in time. This should improve throughput and overall stability of the connection.
|OSPL-13985||Add option to durability to prevent injecting persistent data more than once when using master priorities.
When the durability service uses master selection based on priorities (masterPriority attribute set), it may occur that the persistent data is injected more than once by different durability services. This happens when one durability service has taken mastership of a namespace because it has the highest properties for that namespace, but later another durability service is started which has better properties for that namespace; it will take over the mastership of that namespace and inject its persistent data again. When using master selection based on priorities, the properties that determine which durability service will become master are the priority, then the quality of the namespace, and finally the system-id of the node. When there are several potential aligners in the system and there is no hierarchy among them, they can be configured with equal master priorities. However, it is then still not guaranteed that only one durability service will inject its persistent data, because the selection is then made on the quality of the namespace. In case that is not desirable, an option is needed to select the master not on the quality of the namespace but on the durability service that was started first.
Solution: For this purpose the attribute 'masterSelectBy' has been added to the 'Policy' configuration element associated with a particular namespace. The possible values of this attribute are 'Quality' or 'FirstStarted'. When set to 'Quality', which is the default, the master selection criterion will be based on the quality of the persistent data. When set to 'FirstStarted', the master selection criterion will be based on the startup time of the durability service.
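A namespace policy selecting the first-started durability service as master could look roughly as follows. The namespace name and the values of the other attributes are illustrative; only masterSelectBy and its values come from this change.

```xml
<DurabilityService name="durability">
  <NameSpaces>
    <!-- the first durability service to start wins mastership,
         rather than the one with the best persistent-data quality -->
    <Policy nameSpace="defaultNamespace"
            durability="Durable"
            alignee="Initial"
            aligner="True"
            masterPriority="1"
            masterSelectBy="FirstStarted"/>
  </NameSpaces>
</DurabilityService>
```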
|OSPL-13989||Add the option to have a durability service not advertise itself as an aligner when using masterPriority=0.
When configuring a namespace with masterPriority=0 and aligner=true, the durability service will act as a master for this namespace as long as there is no globally selected master available, but it will never become master for other nodes. However, it will still advertise itself as an aligner for that namespace. As an aligner for other nodes it may contain durable data that the durability service that has become master still needs to retrieve in order to distribute this data over the system. However, it is not always desired that the master retrieves this data from the nodes configured with masterPriority=0. Especially in systems with a large number of nodes, skipping these nodes may reduce the initial alignment load on the system.
Solution: To provide this option, the possible values of the aligner attribute of the Policy element associated with a particular namespace have been extended with the value 'Local'. The valid values of the aligner attribute are now True, False or Local. Note that 'Local' may only be used in combination with masterPriority=0.
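A local-only aligner could then be configured along these lines (attribute values other than aligner="Local" are illustrative; note the required masterPriority="0"):

```xml
<!-- aligns locally but is never advertised as an aligner to other nodes -->
<Policy nameSpace="defaultNamespace"
        durability="Durable"
        alignee="Initial"
        aligner="Local"
        masterPriority="0"/>
```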
|OSPL-14022||Set the default rate for a DLite service to publish its metrics to every 10 seconds
Any DLite service can periodically publish metrics that can be received by monitoring services to assess the health of the DLite service. The default frequency was to publish a metric every 1 second. In practice this is not needed and could potentially lead to unnecessary load. A 10 second period seems to be a more sensible default. The default value can be overridden by //OpenSplice/DLite/Settings/metrics_interval.
Solution: A DLite service now publishes metrics every 10 seconds unless the value is overridden by //OpenSplice/DLite/Settings/metrics_interval.
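For instance, the previous once-per-second behaviour could be restored with a fragment like the one below. The element nesting follows the //OpenSplice/DLite/Settings/metrics_interval path quoted above; the service name and the value format (seconds) are assumptions.

```xml
<OpenSplice>
  <DLite name="dlite">
    <Settings>
      <!-- publish DLite health metrics every second instead of every 10 s -->
      <metrics_interval>1.0</metrics_interval>
    </Settings>
  </DLite>
</OpenSplice>
```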
|OSPL-13345 / 00020318||Simulink - override partition from Publisher and Subscriber blocks.
In previous Vortex OpenSplice releases, the partition name was only settable through the QoS Profile that is selected in the Simulink blocks.
Solution: Partition overrides are now possible on Publisher and Subscriber blocks via block parameters.
|OSPL-13407 / 00020374||When a user clock is used, the services use the user clock instead of the wall clock in their log files
When a user clock is configured, the services use the user clock instead of the wall clock in their log files. Sometimes customers want to use the wall clock instead of the user clock in their logs, because the wall clock better appeals to human intuition of time. To have service log files use the wall clock instead of the user clock in their log files, the attribute //OpenSplice/Domain/UserClock[@reporting] can be set to false.
Solution: The clock that is used to report time stamps in service log files is chosen based on the //OpenSplice/Domain/UserClock[@reporting] settings.
|OSPL-14104||Memory leakage of resend data on writer deletion
Data published by a writer that could not be delivered because of temporary unavailability of peers is maintained by the writer in its history cache until it can be delivered. Although it is unlikely that resend data exists at writer deletion, it would leak because the writer destructor failed to delete the data.
Solution: Implemented resend data deletion in the writer destructor.
|OSPL-14105||Support DDSI discovery caused by multicast discovery on one side only
The DDSI service sends participant discovery messages to the SPDP multicast address (if enabled), any addresses configured as peers, and any unicast addresses it added at run-time based on discovery. That third group did not include addresses of peers that advertise themselves via multicast while the local node has SPDP multicast disabled (a very non-standard but sometimes quite useful configuration). The result is that discovery could occur in only one direction and that some temporary, asymmetrical disconnections could not be recovered from.
Solution: It now adds the unicast addresses of discovered peers when SPDP multicast is disabled. This ensures bidirectional discovery and recovery from all temporary asymmetrical disconnections.
|OSPL-13502 / 00020501||Python: Access the version number of OpenSplice in the Python dds module.
In the previous releases of OpenSplice, it was not possible to access the OpenSplice version number in the Python dds module.
Solution: It is now possible to get the OpenSplice version number in the Python dds module using the "__version__" attribute. Please note that if the Python code is recompiled with a different OpenSplice version, the Python dds package will still show the OpenSplice version number it originally came with.
|OSPL-14064 / 00020913||Service crash during startup when configuration becomes unavailable
When service configuration is removed from the configuration file after starting spliced but before spliced has started the service, the service may crash due to missing configuration elements.
Solution: Some extra checks and error reports were added to allow graceful termination of the service instead of a crash.
Vortex OpenSplice 6.11.0
|OSPL-13920 / 00020795||Node.js DCPS : Errors when importing IDL for topics using typedef references
In Node.js DCPS, an importIDL api is provided to import topic types defined in IDL. The importIDL api generates xml using idlpp, then processes the xml. If the IDL included typedef references to other typedef references, the end user would see errors. The processing of the idlpp generated xml did not handle typedef references to other typedefs.
Solution: The OSPL Node.js DCPS code has been fixed to handle the cases where topics defined in IDL include typedef references to other typedefs.
|OSPL-13943 / 20692||Durability alignment is not consistent among several nodes when using a REPLACE policy.
When the durability service performs a REPLACE alignment policy, the corresponding instances (based on the timestamp of the alignment) are first wiped from the transient store before the aligned samples are injected. When in the meantime a dispose of a DCPSPublication corresponding to some of the aligned data is handled by the spliced daemon, it may occur that these instances are placed on a purge list before the aligned samples are injected. In this case the injection of the samples would incorrectly not remove these instances from the purge list.
Solution: An instance is now always removed from the purge list for empty instances when a sample is injected and the instance becomes non-empty.
|OSPL-13909||Durability should wait for the presence of remote durability protocol readers when using ddsi.
When the ddsi service is being used and the durability service detects a fellow durability service, the durability service should wait to send messages to that fellow until it has detected the remote durability readers. Due to some configuration parameter changes this functionality had mistakenly been disabled.
Solution: The check for the presence of remote durability readers is enabled when the ddsi service is used as the networking service.
|OSPL-13724 / 00020481||The Vortex.idlImportSlWithIncludePath function call of the Simulink Vortex DDS blockset was causing an error on the Windows platform.
On the Windows platform, the Vortex.idlImportSlWithIncludePath function call of the Simulink Vortex DDS blockset caused an error when passing the includedirpaths argument. This is because the function passed the arguments in the wrong order to the IDLPP tool.
Solution: The bug is now fixed. The Vortex.idlImportSlWithIncludePath function has been updated to pass the arguments in the correct order to the IDLPP tool.
|OSPL-12485||Possible incomplete transaction when aligned by durability
It was possible for a transaction to be incomplete when aligned by durability, because all transactional samples were treated as EOTs. All transactional samples were compared as if they were EOTs, which could lead to transactional samples being discarded as duplicates and not aligned.
Solution: Made sure only EOT messages are compared.
|OSPL-12877||Alignment may stall when a new local group is created while a merge request to acquire the data via a merge for the same group is being scheduled
When a durability service learns about a partition/topic combination, it may have to acquire data for this group by sending a request for samples to its master. When at the same time a merge conflict with the fellow is being handled, this may also lead to the sending of a request for samples to the same fellow. Both paths are decoupled for performance reasons, and so there is a window of opportunity that may lead to two identical requests for the same data to the same fellow. Since the requests are identical, only one of them is answered. The remaining one never gets answered, which may potentially stall conflict resolution.
Solution: The requests are now distinguished so that they are never identical.
|OSPL-13307 / 00019125||When running mmstat -M some of the numbers created are incorrect
The variables which are created by mmstat that represent a difference are output as unsigned long int. This means that negative numbers are incorrectly output.
Solution: Changed the data type of variables that represent a difference from unsigned long int to signed long int to avoid incorrect output in mmstat -M.
|OSPL-13532||Stalling alignment when an asymmetrical disconnect occurs during the durability handshake
When durability services discover each other they must complete a handshake before data alignment can start. The handshake consists of several stages:
1. Whenever a durability service discovers another durability service it pushes a so-called Capability message to the discovered durability service. A precondition for this to happen is that the Capability reader of the remote durability service must have been discovered, otherwise the Capability message may get lost. These Capability messages are essential to detect, and recover from, asymmetric disconnects.
2. Once a Capability message has been received from a remote durability service it is possible to request its namespaces by sending a so-called nameSpacesRequest message (of course, after having discovered the reader for this message on the remote durability service). This should trigger the remote durability service to send its namespaces, after which the handshake is completed.
There are two problems with the handshake. First of all, when a durability service sends its request for namespaces to the remote durability service, there is no guarantee that the remote durability service has discovered its namespaces reader at the time the namespaces are being published, so they can get lost. Secondly, and more likely, when an asymmetric disconnect occurs while establishing the handshake, it is no longer possible to detect that an asymmetric disconnect has occurred, and therefore it is no longer possible to recover from this situation. This will effectively lead to a situation where the handshake is not completed, and therefore alignment is stalled.
Solution: There are two solution ingredients:
1. When a Capability is published to a remote durability service, ALL its relevant readers must have been discovered instead of only the Capability reader.
2. To resolve the stalling handshake due to asymmetric disconnects occurring during the handshake, the Capability message and nameSpacesRequest message are republished when the handshake takes too long to complete. This can be controlled using two environment variables:
   - The environment variable OSPL_DURABILITY_CAPABILITY_RECEIVED_PERIOD specifies the time (in seconds) after which to republish a Capability. The default is 3.0 seconds.
   - The environment variable OSPL_DURABILITY_NAMESPACES_RECEIVED_PERIOD specifies the time (in seconds) after which to resend a nameSpacesRequest. The default is 30.0 seconds.
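For example, on systems where the handshake regularly times out, both periods could be relaxed before starting OpenSplice. The values below are illustrative; the defaults are 3.0 and 30.0 seconds as stated above.

```shell
# Republish the Capability message if no reply arrives within 5 seconds
export OSPL_DURABILITY_CAPABILITY_RECEIVED_PERIOD=5.0

# Resend the nameSpacesRequest message after 60 seconds without namespaces
export OSPL_DURABILITY_NAMESPACES_RECEIVED_PERIOD=60.0
```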
|OSPL-13692 / 00020677||Networking throttling may slow down communication too long.
When a receiver experiences high packet loss, the backlog will increase, which is communicated to the sender. In that case the sender will apply throttling to reduce the load on the receiver. However, throttling is also applied to the resending of the lost packets. This may cause the backlog at the receiver to decrease at a low rate, causing the throttling to be applied longer than necessary.
Solution: A parameter (ResendThrottleLimit) has been added which sets the lower throttling limit for resending lost packets. Furthermore, when the sender detects that there are gaps in the received acknowledgements, resends are performed earlier.
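A sketch of how this could look in an RT networking channel configuration. Only the parameter name ResendThrottleLimit comes from this fix; its placement under Sending, the sibling ThrottleLimit element, and the values are assumptions to be checked against the deployment guide.

```xml
<Channel name="reliable" reliable="true">
  <Sending>
    <!-- floor applied to regular traffic when throttling kicks in -->
    <ThrottleLimit>1024</ThrottleLimit>
    <!-- assumed placement: separate, higher floor used when resending
         lost packets, so resends drain the receiver backlog faster -->
    <ResendThrottleLimit>10240</ResendThrottleLimit>
  </Sending>
</Channel>
```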
|OSPL-13698||When the node causing a master conflict is disconnected before the conflict is resolved, the master conflict may remain unresolved.
When using legacy master selection and a master conflict is detected because another node has selected a different master, the current master is immediately set to pending. When the node that caused the master conflict is disconnected before the conflict is resolved, the master conflict could be dropped although it still exists, and because the master is set to pending no new master conflict is raised.
Solution: A master conflict is now always resolved, regardless of whether the node that caused the master conflict has been disconnected and removed from the durability administration.
|OSPL-13705 / 00020691||Provide option to the RT networking service to optimize durability traffic by using point-to-point communication.
For the RT networking service, the durability service is just another application. Protocol messages sent by the durability service will therefore be sent to all the nodes in the system. The protocol messages sent by the durability service are addressed either to all fellows, to a subset, or to only one fellow durability service. To limit the networking load caused by the durability service, it is beneficial for the networking service to have some knowledge of the durability protocol and to send durability messages that are addressed to one fellow point-to-point. This requires that the capability to send messages point-to-point is added to the RT networking service.
Solution: Support has been added for point-to-point communication for durability messages addressed to one fellow.
|OSPL-13748 / 00020708||The RT networking service can run out of buffers when receive socket is overloaded.
To limit the chance of packet loss occurring in the receive socket, the networking receive thread tries to read as many packets as possible from the receive socket before processing these packets further. However, when the receive socket remains full, the number of received packets waiting to be processed keeps increasing, which may cause the networking service to run out of buffers.
Solution: When reading packets from the receive socket, it is checked whether the number of packets waiting to be processed exceeds a threshold. When the threshold is reached, the networking receive thread will first process some waiting packets before attending to the receive socket.
|OSPL-13753 / 00020714||When installing a Visual Studio 2005 version silently a popup window appears and stops the installation
The installations now ensure Visual Studio redistributables will not force a reboot of Windows before the main installer has completed, by using an optional parameter when running the redistributable. This created a problem for Visual Studio 2005 versions, as this parameter is illegal there and an error message is produced.
Solution: An additional page has been included in the installer to allow users to not install the Visual Studio redistributable. This option can also be used when installing silently and allows a customer to skip the redistributable which creates the error condition.
|OSPL-13756||Provide the option to have the RT networking service perform the distribution of the builtin topics.
When the RT networking service is used, the durability service is responsible for the alignment of the builtin topics, which is not the case when the DDSI service is used. In a large system the number of builtin topics can become very large. When the networking service is made responsible for aligning the builtin topics, only a node's own builtin topics have to be aligned when two nodes detect each other. Especially when a disconnect/reconnect occurs, this reduces the number of builtin topics that have to be aligned.
Solution: The ability to align the builtin topics via RT networking has been added, along with a configuration parameter ManageBuiltinTopics by which this ability can be enabled. Note that, to maintain similarity with the DDSI service, this applies to: DCPSParticipant, DCPSPublication, DCPSSubscription, DCPSTopic and the CM-related builtin topics.
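As an illustration only, a sketch of how the new parameter might appear in a configuration file. Only the ManageBuiltinTopics parameter name comes from this release note; its exact location within the networking configuration is an assumption:

```xml
<NetworkService name="networking">
  <!-- Illustrative sketch: only the ManageBuiltinTopics name comes from
       the release note; its placement under General is an assumption. -->
  <General>
    <ManageBuiltinTopics>true</ManageBuiltinTopics>
  </General>
</NetworkService>
```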
|OSPL-13771 / 00020719||Python API: 'Out Of Resources' exceptions when using conditions and shared memory.
A memory leak was introduced in 6.10.4p1. In the Python class Condition, the dealloc was removed, resulting in improper cleanup. This change was introduced as a fix for OSPL-13503 ("Cleanup error: dds_condition_delete: Bad parameter Not a proper condition").
Solution: The removal of the Condition dealloc only addressed a minor logging issue, so the OSPL-13503 change was rolled back in order to fix the more serious Out of Resources exceptions. With this rollback, extra error messages may be logged. The memory leak for Condition is fixed.
|OSPL-13773||The durability service may send an alignment request to an unconfirmed master.
When, during master selection, a master is proposed but not yet confirmed, and in parallel the need to align the data of a topic/partition is triggered, it may occur that an alignment request is sent to this not-yet-confirmed master, which may not become the actually selected master.
Solution: During initial alignment, requesting alignment of data is delayed until master selection has finished and the master is confirmed.
|OSPL-13781||Allow setting the master selection protocol on each durability namespace independently.
Either legacy master selection or master selection based on master priorities can be configured for the durability namespaces. When master selection based on master priority is used, it has to be configured on all the namespaces. However, it should be allowed to configure the master selection protocol on each namespace independently.
Solution: The global setting of the master selection protocol is removed and the master selection protocol configured for each namespace is applied when selecting a master for that namespace.
|OSPL-13784 / 00020725||In the RT networking service, synchronization on the first expected fragment from a newly detected fellow could fail.
When the networking service receives a first packet from another node, it has to determine the sequence number that both the sending node and the receiving node use as the starting sequence number of the reliable communication. This first sequence number is determined either from the offset present in the first received packet, or from the expected packet number indicated by the sync message when SyncMessageExchange has been enabled. The sender then resends the packets from that agreed sequence number up to the sequence number of the packet already received. In this case, packets with sequence numbers lower than that of the first received packet should be accepted. However, this may fail when the first sync message is lost, which may cause packets to be rejected by the receiver even though they are already acknowledged. In that case the receiver will never receive the expected packet.
Solution: While waiting for the sync message that sets the expected sequence number, packets received with a sequence number lower than that of the first received packet are accepted and placed on the out-of-order list.
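The acceptance rule in this fix can be sketched as follows. This is an illustrative Python sketch of the described behavior, not OpenSplice code; the class and method names are invented:

```python
class ReliableReceiver:
    """Sketch of the fixed acceptance rule: while the sync message that
    fixes the starting sequence number has not arrived, packets older
    than the first received packet are parked on an out-of-order list
    instead of being rejected (the pre-fix behavior)."""

    def __init__(self):
        self.first_seen = None    # sequence number of first received packet
        self.synced = False       # True once the sync message arrived
        self.out_of_order = []    # parked packets awaiting the sync point
        self.delivered = []

    def on_packet(self, seq):
        if self.first_seen is None:
            self.first_seen = seq
        if not self.synced and seq < self.first_seen:
            # Previously such packets were rejected although the sender
            # considered them acknowledged; now they are parked.
            self.out_of_order.append(seq)
        else:
            self.delivered.append(seq)

    def on_sync(self, start_seq):
        self.synced = True
        # Everything from the agreed starting sequence number onward can
        # now be delivered from the out-of-order list.
        for seq in sorted(self.out_of_order):
            if seq >= start_seq:
                self.delivered.append(seq)
        self.out_of_order = []
```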
|OSPL-13791 / 00020706||Potential memory leak in Java5 DataReader
The Java5 DataReader offers two different modes of reading data: it either returns an Iterator holding all requested samples, or you preallocate an ArrayList and pass that as an input parameter to your read/take call. The latter is more efficient if you want to benefit from recycling the samples allocated in the previous invocation of your read/take calls. For this purpose, the DataReader keeps track of a Map containing, for each previously passed ArrayList, all the relevant recyclable intermediate objects. However, if you keep feeding new ArrayList objects to each subsequent read/take call, the Map will grow indefinitely and leak away all your previous data. Although invoking the preallocated read/take calls this way is against their intended usage and design, some examples were doing exactly that.
Solution: The examples have been modified to no longer feed new ArrayList objects to every subsequent read/take call. Also, the Map that keeps track of all previous ArrayLists and their corresponding intermediate objects has been replaced with a one-place buffer. That means you can still benefit from recycling intermediate data if you use the same ArrayList over and over, but anything related to a prior ArrayList passed to the read/take calls will be garbage collected.
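The one-place buffer strategy can be illustrated with a small sketch. This is not the actual Java5 DataReader code, just a generic Python model of the idea: recycled state is kept only for the most recently used container, so feeding a fresh container to every call can no longer grow an unbounded map.

```python
class OnePlaceRecycler:
    """Keeps recyclable state for exactly one container at a time.
    Passing a different container drops the previous state, making it
    garbage collectable."""

    def __init__(self):
        self._last_container_id = None
        self._recycled = None

    def get_recycled(self, container):
        """Return the recycled state for this container, or None when the
        container differs from the one used in the previous call."""
        if self._last_container_id == id(container):
            return self._recycled
        return None

    def store(self, container, state):
        # Anything associated with a previous container is overwritten
        # here, so only one container's worth of state is ever retained.
        self._last_container_id = id(container)
        self._recycled = state
```

Reusing the same container across calls retains the recycling benefit; switching containers simply forfeits it instead of leaking.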
|OSPL-13795 / 00020698||Issues during termination of spliced when configuration specifies thread attributes for heartbeat-manager
When the configuration file specifies attributes such as stack size or scheduling class for the heartbeat-manager in spliced (//OpenSplice/Domain/Daemon/Heartbeat), termination fails and triggers an error report "Failed to join thread (null):0x0 (os_resultSuccess)"
Solution: The code was changed to cover a specific path where after stopping the thread an invalid return code was propagated leading to failed termination.
|OSPL-13797||Unnecessary alignment may occur when a node with a namespace with aligner=false (temporarily) chooses a different master for this namespace. This unnecessarily increases network load.
When a node with aligner=false for a namespace enters a network, this node starts looking for an aligner. If there are potentially multiple aligners but not all of them have been discovered yet, then it could happen that this node chooses a different master for the namespace than somebody else. When the nodes that chose a different master for the namespace detect each other, a master conflict is generated. Resolving this master conflict leads to alignment. Although functionally there is nothing wrong, the unfortunate situation in this scenario is that the alignment for nodes with aligner=false is not necessary, because by definition of aligner=false such a node will not provide any alignment data to the master (whichever one is chosen). Still, the master bumps its state and causes all slaves to align from the master again. These alignments are superfluous.
Solution: The situation where a node with aligner=false has (temporarily) chosen a different master is no longer considered a valid reason to start alignment.
|OSPL-13812||Trying to unregister a non-existent instance leads to an entry in ospl-error. This is incorrect.
Trying to unregister a non-existent instance is a legitimate application action that should return PRECONDITION_NOT_MET. However, as a side effect also an error message would appear in ospl-error.log.
Solution: The error message is not generated anymore when a non-existent instance gets unregistered.
|OSPL-13844||Spliced will crash during shutdown if builtin topics have been disabled.
There was a bug in the spliced that caused it to crash during shutdown when OpenSplice was configured not to communicate the builtin topics. This was caused by the spliced forgetting to set the writers for the builtin topics to NULL in that case, which during shutdown would result in the spliced attempting to release dangling pointers.
Solution: The writers for the builtin topics are now properly set to NULL when you disabled the builtin topics, and therefore spliced will not attempt to clean them up during shutdown.
|OSPL-13868||Configuration files for NetworkPartitions example were incorrect
The example configuration files included a ddsi2 service instead of ddsie2, so the additional values were not visible in the configuration tool and would not be used by OpenSplice. Additionally, a number of the elements were incorrectly cased.
Solution: Updated Example files have been included.
|OSPL-13888||The durability service leaks memory when handling a received namespace message.
When a namespace message is received from a fellow and that namespace message is a duplicate of an earlier received namespace message, the allocated namespace object leaks.
Solution: The duplicate namespace is freed.
|OSPL-13892||Potential backlog in the processing of builtin topics by spliced
The spliced is responsible for processing incoming builtin topic samples. This processing is needed, for example, to modify the local notion of the liveliness of remote entities and the instances they have written. Having the wrong notion of the liveliness of a remote entity could result in instances being marked ALIVE while they should have been marked NOT_ALIVE, or vice versa. Also, failure to notice the deletion of a remote entity could result in extended waiting times in, for example, a synchronous write, where a writer is still waiting for acknowledgments from a Reader that has already left the system. Due to a bug in the spliced code, the spliced could under certain conditions postpone processing of builtin topics for potentially long intervals, resulting in incorrect liveliness representations during this time, which in turn might cause extended waiting times for synchronous write calls.
Solution: The spliced now no longer postpones the processing of builtin topics, causing the representation of the liveliness of entities and instances to be up to date, and avoiding unnecessary waiting times in the synchronous write call for readers that have already been deleted.
|OSPL-13923||MATLAB Query and Reader failure with waitsetTimeout()
The MATLAB Vortex.Query class would throw on calls to take() or read() if a non-zero value had previously been provided to waitsetTimeout(). BAD_PARAMETER messages would be written to ospl-error.log. In a similar situation, a Vortex.Reader class instance would appear to succeed, but ospl-error.log would still contain BAD_PARAMETER messages, and a DDS entity would be leaked with each call to read() or take().
Solution: The problems have been fixed. Uninstall the currently installed Vortex_DDS_MATLAB_API toolbox and install the new toolbox distributed with this release. (The toolbox is located under tools/matlab in the OpenSplice installation directory.)
|OSPL-13929 / 00020814||Alignment of DCPSPublication may cause that instances that were explicitly unregistered and disposed are not purged and leak from shared memory.
When a disconnect of a node is detected, the instances written by writers on that disconnected node are unregistered. When the same node reconnects, alignment of DCPSPublication will indicate which writers are still alive. These DCPSPublication samples are then used to update the liveliness of the corresponding instances. However, explicitly unregistered instances were also updated, which removed them from the purge list and resulted in a memory leak.
Solution: When handling the re-alignment of a DCPSPublication, the corresponding instances that were explicitly unregistered are ignored.
|OSPL-13931||Potential alignment issue for unions in generic copy routines for C
The generic copy routines for the C API may potentially misalign union attributes, causing the fields following the union to contain bogus values.
Solution: The algorithm to determine proper alignment for unions has been corrected.
|OSPL-13937||Enable or disable tracing for Dlite
In some situations users want to disable tracing in production environments and enable tracing in testing environments. So far there has not been an easy way to do this other than commenting out the tracing section in the configuration, which is cumbersome.
Solution: An attribute //OpenSplice/Dlite/Tracing[@enabled] is added that can be used to enable/disable tracing for Dlite.
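As an illustration, a sketch of how the attribute might appear in a configuration file. The //OpenSplice/Dlite/Tracing[@enabled] attribute itself comes from this release note; the child element shown is an illustrative assumption:

```xml
<OpenSplice>
  <Dlite>
    <!-- enabled="false" switches tracing off without removing the
         section; the OutputFile element below is an assumption. -->
    <Tracing enabled="false">
      <OutputFile>dlite.log</OutputFile>
    </Tracing>
  </Dlite>
</OpenSplice>
```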
|OSPL-13803||Possible crash at termination of NodeJS with DDS Security
The DDS Security implementation relies on a certain termination path to clean up all its resources, part of which depends on an exit handler. This exit handler does not reliably run at the same moment, e.g., before or after certain threads are joined, depending on the context, such as a single-process deployment running in NodeJS.
Solution: The cleanup was changed to work regardless of the exact moment when the exit handler is executed.
|OSPL-13799 / 00020745||Generate a logging message for dispose_all_event
The invocation of the dispose_all() function on a topic is an important event that should appear in the ospl-info.log file.
Solution: A message is written into the ospl-info.log by the node that invokes the dispose_all() function. Note: although all other nodes respond by also disposing their corresponding topic data, they don't mention this event in their ospl-info.log.
|OSPL-13870||Add a parameter to RT networking to allow the independent setting of the acknowledgement interval.
A reliable channel uses acknowledgement messages to notify the sender that a packet has been successfully received. To limit the rate at which acknowledgements are sent, the acknowledgements are accumulated during the acknowledgement interval. Currently the acknowledgement interval is set to the configured resolution interval of the channel. However, it would be useful to have an independent parameter which specifies the acknowledgement interval.
Solution: The AckResolution parameter has been added to the RT networking configuration. When set, it determines the interval at which acknowledgements are sent. When not set, the acknowledgement interval is set to the resolution of the reliable channel.
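As an illustration only, a sketch of how the new parameter might be configured. Only the AckResolution parameter name comes from this release note; the surrounding channel definition, element placement, and units are assumptions:

```xml
<Channels>
  <Channel name="reliable" reliable="true">
    <!-- Illustrative sketch: placement and units are assumptions. -->
    <Resolution>50</Resolution>        <!-- channel resolution interval -->
    <AckResolution>10</AckResolution>  <!-- independent ack interval -->
  </Channel>
</Channels>
```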
|OSPL-13871||Add a configuration parameter to RT networking to disable the sync exchange at initial detection of a node.
When the SyncMessageExchange is disabled, reliable communication with another node is started on receiving the first acknowledgement from that node. When the SyncMessageExchange is enabled, reliable communication will also start on receiving a first message from a node. The sync message communicates the sequence number from which reliable communication is provided. However, this may cause a very high backlog of packets that have to be resent to the newly detected node, especially when initial latencies are large. Therefore an option should be provided to enable the SyncMessageExchange only on receiving the first acknowledgement, which reduces this initial backlog of resends.
Solution: A mode attribute has been added to the SyncMessageExchange parameter which indicates whether the reliable synchronization should occur on the initially received packet or on the first received acknowledgement.
|OSPL-13875||Add option to RT networking to disable the initial sequence number offset.
To establish reliable communication between a sender and a receiver, both have to agree on the initial packet sequence number from which reliable communication is established. This sequence number is based on the first sequence number that is acknowledged minus a small offset which is included in each packet sent. The initial sequence number is then the first acknowledged sequence number minus the offset. This offset thus determines the number of packets that have to be resent immediately. To reduce this initial backlog, a configuration parameter has been added to disable this offset.
Solution: The configuration parameter OffsetEnabled is added to allow disabling the offset calculation.
|OSPL-13939 / 00020822||Looser type-checks for the Python API
It is no longer required to use Python built-ins as long as the cast is defined. For string types, only support for encoding and length determination is needed. For sequences and arrays, iteration and length determination are the only requirements.
Solution: Loosened type requirements on integers, bools, strings and floats.
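What "as long as the cast is defined" means can be illustrated with a generic Python sketch. This is not the dds module's actual validation code; the class and function names are invented. Any value that defines the relevant cast or protocol is accepted:

```python
class Meters:
    """A user-defined numeric type, not a Python built-in."""
    def __init__(self, value):
        self.value = value

    def __int__(self):              # defines the cast to a built-in int
        return int(self.value)

class Window:
    """A user-defined sequence type supporting iteration and len()."""
    def __init__(self, items):
        self._items = list(items)

    def __iter__(self):             # iteration support
        return iter(self._items)

    def __len__(self):              # length determination
        return len(self._items)

def coerce_int_field(value):
    # Works for any type that defines __int__, not just built-in int.
    return int(value)

def coerce_sequence_field(value):
    # Works for any iterable with a length, not just list/tuple.
    return [coerce_int_field(v) for v in value]
```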
Fixed Bugs and Changes affecting API/behavior in Vortex OpenSplice 6.11.x
Vortex OpenSplice 6.11.1
|OSPL-14000||Internal operation to get the actual domain (shm) memory usage on the C API.
The actual amount of shared memory used by a Domain federation is useful to monitor, both for test purposes and for managing a system. The internal operation to get this information is made public for applications by adding it to the C API. This operation accepts a Domain object, which can be retrieved by the DDS_DomainParticipant_lookup_domain operation, and returns the current amount of used memory in bytes. Note that this is a non-standard, product-specific operation.
Solution: Operation Signature: DDS_long_long DDS_Domain_get_memory_usage(DDS_Domain domain);
|OSPL-14002||Internal operation to get the actual domain (shm) memory usage on the ISOCPP2 API.
The actual amount of shared memory used by a Domain federation is useful to monitor, both for test purposes and for managing a system. The internal operation to get this information is made public for applications by adding it to the ISOCPP2 API's DomainParticipant interface. It returns the current amount of used memory in bytes. Note that this is a non-standard, product-specific operation.
Solution: uint64_t DDS::DomainParticipant::get_memory_usage();
Vortex OpenSplice 6.11.0
|OSPL-12968 / 00019801||DCPS Python API: out-of-scope entities aren't closed, causing a memory leak
When a Python object holding a DDSEntity loses all references, the underlying DDS entity is not deleted, leaking the resource. In a domain where many entities are created dynamically but not closed explicitly with close(), this will eventually result in an Out of Resources error.
Solution: Python objects are automatically garbage collected when all references to the object are gone. There was code in the DCPS Python API that deletes the underlying DDS entity when this occurs, but it never triggered, because strong references to all DDSEntity Python objects were held in a persistent dictionary in the dds module. To remedy this, the dictionary was changed to store only weak references, so the entity is now deleted when the Python object is garbage collected. There is an important consequence for developers using this API. Before this change, it was possible to create a DDSEntity (typically with a listener) and just let it go out of scope, relying on the listener to do its work, and keeping only a parent entity (such as the participant) as the main means to control the lifecycle. This is no longer possible. Python code must keep a reference to a DDSEntity object to keep it active. Note, however, that a DDSEntity still maintains a strong reference to its parent entity, meaning that once a reader or writer reference is made, one can let go of the participant, publisher, and/or subscriber references without having them be garbage collected. Only once the last reference to the reader/writer is gone is the entire chain of entities deleted.
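The effect of switching the registry to weak references can be demonstrated with a minimal sketch. This is not the actual dds module code; the Entity class and registry name are invented for illustration:

```python
import gc
import weakref

class Entity:
    """Stand-in for a DDSEntity that registers itself on creation."""
    def __init__(self, name, registry):
        self.name = name
        registry[name] = self   # a WeakValueDictionary entry does not
                                # keep the entity alive by itself

# Weak-valued registry: entries disappear once the last strong
# reference held by application code is gone.
registry = weakref.WeakValueDictionary()

reader = Entity("reader", registry)
assert registry.get("reader") is reader   # alive while referenced

reader = None    # drop the last strong reference
gc.collect()     # force collection (CPython frees it immediately anyway)
assert registry.get("reader") is None     # no longer kept alive
```

Had `registry` been a plain dict (the pre-fix situation), the final lookup would still return the entity, keeping it alive indefinitely.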