Why ROS 2 can be Better than ROS 1 in terms of Communication ?

February 21, 2023
Let us understand the advantages of ROS 2 by understanding backstory of ROS2 development

Development backstory is the key to understanding how ROS 2 overcomes limitations of ROS 1 For newcomers to ROS2, especially those transitioning from ROS1, the technical terminologies such as DDS, rmw, FastDDS, RTPS can be quite overwhelming when reading the ROS2 documentation. Personally, I found that understanding the background story behind ROS2’s development was tremendously helpful in organizing the new concepts, middleware structure, and key advantages over its predecessors.

My goal for this article is to save you time and help you prepare to better understand the new concepts and potentially challenging topics by walking through stories behind ROS 2.

1. ROS 1 was already good enough for academic users

To the Willow Garage, the leader group of ROS, open-source communities and academic researchers were the primary stakeholders driving ROS1 development . One of the main objectives of ROS 1 was to provide running software (opens in a new tab) of PR2 robots. If you are working as an academic research, then you will agree that you are running robots in a lab environment such as below:

The operating assumptions of ROS1 A controlled environment where the network is relatively stable Sufficiently powerful computers without worrying too much about hardware limitations or fail-detection & handling A small number of robots (less than 20) that can be managed at a central point (ROS master) Little concern for security In these environmental assumptions, ROS 1 was a almost perfect choice, and it also provided a wide ecosystem (opens in a new tab) without coding every hardware drivers and integration.

2. However, not enough for commercialization

For those who were interested in using ROS for commercial products, it was not the most suitable solution because of the lack of standards in the following situations:

Problem A. Swarm managements

ROS1 had a central registry called ROS Master, which tracked all communication and served as a lookup table (opens in a new tab) for publications and subscriptions. The ROS Master let publishers know about new subscribers, allowing them to establish a connection.

However, this approach became problematic in certain situations. For example, if the ROS Master disconnected from the network or a node could not reach it through the network, publishers and subscribers could not get “newly” updated about the registry.

This posed a challenge for fleet management systems where new agents kept joining and existing agents temporarily left for charging. As the official ROS article noted, the OTTO (opens in a new tab) fleet could not scale beyond 25 robots without developing a multi-master system.

Problem B. Low-cost-computers with poor network condition

ROS1’s communication policy had limited control over the network, with only a few adjustable settings such as queue size. While ROS did offer the choice of communication protocol between TCP and UDP (opens in a new tab), this was not widely known among users (have you heard about TransportHint (opens in a new tab) class..?)

Even the official document says TCP was primary (opens in a new tab) and only well maintained option. Although the reliability policy in ROS1 was possible, ROS 1 lacked active maintenance, and its UDP-based protocol was often inflexible, leading to errors on some arm-based computers (opens in a new tab).

ROS1 lacked other convenient APIs for adjusting reliability settings and fail detection and handling, such as lifespan adjustment to judge message staleness or event handlers for subscribers to actively take actions if they did not receive topics for a given duration. This could be problematic for commercialization or in more critical environments where fine-grained criteria for system failure detection are required. For example, ROS1 tries to send all old messages in the queue even after a node is reconnected after a long time.

Problem C. Security

ROS 1 lacks security features in its design (opens in a new tab). There were no standards in the middleware, and only custom packages like SROS and web sockets exist to address this issue. To ensure security, users are required to isolate the ROS PC from the robot controller that receives ROS topics. This limitation is a significant concern for those who wish to use ROS in commercial or industrial applications where security is paramount.

3. Decision made to not build new ROS from scratch

In the development of ROS 2 to solve the problems for commercialization, the team was faced with numerous decisions to be made. Then, they realized that there were now many third-party middleware solutions available, such as Protocol Buffers and ZeroMQ, that had not been around during the development of ROS 1 (opens in a new tab).

These new solutions showed great promise for addressing the communication issues that ROS 2 needed to solve, while still supporting existing publish-subscribe patterns. Ultimately, the team found that DDS (data distribution services) was the most promising middleware solution available.

DDS chosen as middleware

DDS was ultimately chosen as the middleware solution for ROS 2 due to its technical credibility and ability to support existing ROS 1 communication patterns. DDS is an open “standard” for real-time, scalable, and secure data distribution, with a proven track record in various industries, including aerospace, defense, and healthcare. (see full list of vendors from here)

As I said here “standard” (a kind of protocol), there are many vendors (opens in a new tab) which observe this, such as eProsima Fast DDS, and RTI Connext. Although it is not a best analogy, it is similar that we have browsers (vendors) such as Chrome, Safari, and Edge which observe a global standard: HTML 5.

DDS’s ability to provide a reliable and robust communication layer, with support for quality of service, data durability, and security (opens in a new tab), made it an ideal candidate for ROS 2. Importantly, DDS has been designed to be compatible with existing ROS 1 communication patterns, ensuring that ROS 1 users can easily transition to ROS 2 without needing to learn a new communication paradigm.

Stacking on top of DDS third-parties

DDS vendors offer varying forms of APIs. The initial task involves interfacing these APIs to provide users with a choice of providers. At the deepest layer lies the ROS middleware (rmw), which focuses on encapsulating basic DDS functions, such as Quality of Service (QoS), transport patterns like publish-subscribe, and actions. Above rmw is the ROS Client Library Interface (rcl), which imports DDS features from rmw and wraps them for use by user-importing libraries like rclcpp and rclpy. To summarize, DDS is stacked by rmw → rcl → rclcpp, rclpy, in that order.

4. How DDS stack resolves problems

For problem A: swarm system

ROS 2 DDS does not rely on a central registry, unlike ROS 1. it implements peer-to-peer discovery using a set of built-in protocols such as Simple Discovery Protocol (SDP). In this system, each node or participant in the system periodically sends out a discovery message. Thus, without the need of a central point, any new updates on registry will be propagated over the network. This allows for a more scalable solution to manage larger and more complex robotic systems, such as fleet management systems.

For problem B: adjustability for network policy

As shown in the DDS documentation, DDS itself provides developers with a comprehensive set of quality of service (opens in a new tab) (QoS) policies that allow them to finely tune criteria such as topic staleness (lifespan) and node liveliness. This means that developers can use ROS 2 (as it was stacked on DDS) to set detailed policies, including deadline, liveliness, and lifespan, that were not available in ROS 1.

Additionally, ROS 2’s reliability policy becomes more flexible than ROS 1, making it easier to adjust reliability settings in the code without almost forgotten classes such as TransportHint. ROS 2 also introduces callback APIs for QoS events (opens in a new tab) that enables developers to better prepare for failure scenarios.

For problem C: security

As can be expected from the fact that DDS was widely adopted ranging from battleships to financial system, DDS had already a well proven foundation (opens in a new tab). Among the many features in DDS, three were integrated into SROS 2 (opens in a new tab) framework: Authentication, Access control, and Cryptographic. As well described in the official documents, now ROS 2 has a standard approach for authentication and encryption on communication stacked on DDS.

Conclusion

ROS 1 served well in the realm of academic research but faced hindrances when it came to commercialization, such as challenges in swarm management, substandard network capabilities of low-cost computers, and an absence of security features. Rather than creating a new ROS from the ground up, the team opted to employ already existing middleware solutions DDS to overcome these obstacles, leading to the emergence of ROS 2. By comprehending these difficulties, users can gain a better understanding of the novel concepts and middleware architecture of ROS 2 and the advantages it holds over its predecessor.