Subsystem Nodes#

The subsystem nodes are foundational to much of the rover software codebase. Think of them like foundational wrappers that simplify a lot of common functionality. In this article, we’ll explore each of them.

Motor Subsystem Node#

Motor subsystem nodes represent collections of joints and motors of a particular physical subsystem on the rover. They provide a friendly interface on top of the Hardware Abstraction Layer by handling the logic to receive current motor velocities and positions and set target velocities and positions.

Most motor subsystem nodes are defined in the Rover directory.

Configuring a Motor Subsystem#

Motor subsystem nodes are meant to be relatively simple and very easy to change, in order to represent the ever-changing hardware configurations of the rover. We do not create and modify motor subsystem nodes programmatically; rather, we encode them entirely in the form of their configurations.

Name	Type	Description
`joints`	`map<string, uint32_t>`	Maps joint names to their motor ID’s.
`period`	`float`	How often should this node communicate with HAL? (sec)
`frequency`	`float`	How often should this node communicate with HAL? (Hz)
`watchdog`	`map<string, ...>`	Map with the following properties: `enabled` (bool) [is the watchdog enabled? Defaults to true.] and `timeout` (float) [length of time we can operate without receiving a message] (see Watchdog)
`publish_positions`	`boolean`	Publishes to “current_position” topic if true. Defaults to false. (See Motor Topics)
`publish_velocities`	`boolean`	Publishes to “current_velocity” topic if true. Defaults to false. (See Motor Topics)

Attention

period and frequency are mutually exclusive. If you provide one, you can’t provide the other.

Watchdog#

Motor subsystems can be configured with a watchdog, which will halt all the motors in the subsystem if no commands are received within the defined timeout period. This is important because if the software stops issuing new joint commands, the motors would just keep targeting whatever position or velocity was given to them last; they would keep moving. So, having a watchdog timer is a good safety feature to ensure that if the code that’s sending motor subsystem commands has stopped, we halt the motors before any harm is done. By default, each motor subsystem has its watchdog enabled, and the watchdog timeout is set to 1 second.

The watchdog uses the time since the last message was received from any of this motor subsystem’s topics. However, the watchdog status is only checked right before we write to the motors, which means it runs at the configured publishing frequency (or period). Thus, the value of the watchdog timeout is too small, it will not be strictly adhered to.

Motor Topics#

The primary role of a motor subsystem node is to expose topics for easy setting/retrieval of effort, target velocity, and target position values for each joint within the motor subsystem.

For each joint in the list of joints, a motor subsystem node reads from the following topics:

joint/effort (Float64): Set a motor’s effort to the value provided.
joint/target_velocity (Float64): Set a motor’s target velocity to the value provided.
joint/target_position (Float64): Set a motor’s target position to the value provided.

For each joint in the list of joints, a motor subsystem node publishes to the following topics:

joint/current_velocity (Float64): Get a motor’s current velocity.
joint/current_position (Float64): Get a motor’s current position.

Example#

We’ll go through an example of the arm subsystem node, which (as the name suggests) represents the set of motors/joints that are responsible for moving the arm. Here is the node configuration:

type: cmr_subsystem/subsystem_node
node:
    restart_delay: 1000
    max_restart_attempts: 1
dependencies:
    hal:
        type: $HAL
        path: /hal
internal:
    period: 0.1
    publish_positions: true
    joints:
        base_rotate: 4
        shoulder: 5
        elbow: 6
        second_rotate: 7
        third_tilt: 8
        third_rotate: 9
        end_effector: 10

We know that this is a subsystem node because it is declared as its type. We can also observe that this node is set to only attempt to restart once, because it’s generally unlikely that a motor subsystem failure can be recovered from by simply restarting the node; because motor subsystem nodes essentially wrap sending motor commands via HAL, the primary way it can fail is a HAL failure.

Moving to its internal (node-specific) configuration, we see that its period is set to 0.1 seconds, meaning that this node will send commands via HAL ten times per second. We also see that publish_positions is true, so we’ll get position data through the joint/current_position topic. Lastly, we see the list of joints. Each key (left-hand side value) is the way the motor will be represented internally; that is, the substituion for “joint” in all the topics listed in the previous section. For example, we can write a target position to the arm’s shoulder by publishing to the topic /arm/shoulder/target_position. And, because publish_positions is true, we can also subscribe to the topic /arm/shoulder/current_position to get current position data. Each value (right-hand side value) is each motor/joint’s corresponding ID, which you can usually acquire from the electrical team.

Subprocess Node#

In order to gain access to certain services, we often need to run programs external to the CMR controls codebase. For example, in order to use the GPS, we have to run a command line tool called GPSD (GPS service daemon) that reads the latest GPS data and sends it over a local server. Or, we may want to roslaunch a certain node, which allows us to run a node outside of the Node Framework (i.e. the standard ROS way). For example, we have another subprocess node that launches the “gpsd_client” node, which reads messages from that GPSD server and publishes them to the /gpsd_client/fix topic.

A subprocess node is responsible for launching and monitoring a single command. It will run the command as a subprocess, check if it enabled successfully (by using an “enabled marker”, which we’ll discuss in a bit), send its standard output and standard error through the CMR Logger, and kill it once the node disables and/or the rover shuts down.

Configuring a Subprocess Node#

Subprocess nodes are relatively simple to configure.

Name	Type	Description
`cmd`	`string`	Maps joint names to their motor ID’s.
`enabled_marker`	`string`	The node will look for this string in the output of the command to determine if it started successfully. For example, if the command prints out “Listening on port 8080”, a good enabled marker would be “Listening”.
`marker_timeout`	`int`	The amount of time to wait (sec) for the enabled marker to appear before assuming the command failed to run successfully. Defaults to 20 seconds.
`should_exit`	`bool`	True if we expect the command to exit when it’s done running. Defaults to false. If this is set to false and the command exits, this’ll be interpreted as the command failing to run successfully.
`stdout_verbosity`	`string`	The log level to print out standard output.
`stderr_verbosity`	`string`	The log level to print out standard error.

Subprocess Topics#

This node does not publish or subscribe to any topics. So, how do you get the output of the command in your code? You can’t directly get the output from the command itself, but you can have it write to some file or network socket and then poll for new data from there. A good example of a node that does this is the gpsd_client node.

Disabling and Termination#

It’s important that we safely terminate the commands that this subprocess node runs when we disable the subprocess node that manages it. All subprocess nodes will handle the termination of their command when they disable. They will attempt to stop the subprocess in one of three ways:

SIGINT: the nicest way to kill a program. Tells the program that we want to stop, but the program can decide how it responds. This is the signal that is sent when you run Ctrl-C in a terminal window. If the program is hung, it might not respond to this, so we may have to escalate.
SIGTERM: a more stern way of asking to kill a program. The program can still decide how it wants to respond to this signal, and so it’s possible that the command doesn’t terminate even after we send it this signal. If that’s the case, we have to escalate.
SIGKILL: the scariest way to kill a program. This signal cannot be ignored, and the process cannot decide how it wants to respond to this; the OS simply murders the process on the spot. This means that the process won’t end gracefully. It may not clean up what it was doing or finish writing the data it was trying to write, so it could be a destructive action. We use this as our last resort.

How do we know when to escalate to the next level? The subprocess node has a SIGNAL_TIMEOUT constant that is currently set to 5 seconds. If a program takes more than 5 seconds to respond to a signal, we send the next one up. If we exhaust all three options and it still hasn’t died (which is unlikely because SIGKILL is very formidible), then we report that the subprocess node’s disable process failed, and the subprocess continues doing as it pleases until the computer is power cycled or the process eventually dies.

Example#

type: cmr_subsystem/subprocess_node
node:
    verbosity: INFO
internal:
    cmd: gpsd -S 5000 -N /dev/gps
    should_exit: false
    stdout_verbosity: INFO
    stderr_verbosity: WARNING

We see that this is a subprocess node because that’s declared as its type. We provide the command and its arguments in the cmd field as a string. The command will run in a subprocess, just as if you were running it in a Terminal window yourself, except you would see the output of this command as the output of this particular subprocess node. We also observe that should_exit is set to false, and this is good, because gpsd is expected to continuously run (and therefore continue fetching GPS data) until we turn it off. Lastly, we set the verbosity of stdout to INFO and stderr to WARNING; this is up to you to determine what’s best for your subprocess node, though these are sensible choices as defaults.

Fusion Node#

A fusion node is any node that “fuses” data from multiple sensors. The fusion node itself does not perform the fusing; rather, you pass a trivially constructible “Fuser” into its template argument, and it will use the functions within your Fuser to perform the fusing. We’re going to see an example later in the article.

Configuring a Fusion Node#

Fusion nodes take the following configuration:

Name	Type	Description
`frequency`	`int`	Frequency that this node should publish its value (Hz).
`error_count`	`int`	The number of attempts this node will make to read any one sensor until the sensor’s stream will be disabled.
`inputs`	`list<pair<uint32_t, uint32_t>>`	A list of pairs of integers, where each pair’s first value is the sensor ID to stream from, and the second value is the expected size of the sensor reading.

You can also append additional arguments into the configuration of your Fuser. It’ll pass the configuration options into the parseConfig(conf, logger) function of the fuser.

Making a Fuser#

A Fuser is any trivially constructible (i.e. its constructor takes no arguments) class that contains the following four things:

A type, MessageType, that contains the type of the message this fusion node should output.
A function, bool processData(MessageType& result, const std::vector<std::vector<double>>& readings, cmr::Logger& logger), which processes the data from the readings into one message, result. This function will always receive data of the same length. Returns true if it processed succesfully, and false otherwise. If it returns false, this fusion node will be disabled.
A function, bool parseConfig(const YAML::Node& node, Logger& logger), that parses the configuration file for this node.
A function, std::string outputTopic(), that returns the topic we want to publish the result of the fusion to.

Example: Odometry Node#

It might be easier to understand what’s going on here by following a real example. So let’s explore the odometry node (cmr_subsystem/odom_node.cc). One thing to note is that the odometry node, while it is a node, does not implement a Node Provider. That’s because its functionality is already provided by the fusion node (which is a node provider). That’s what makes fusion nodes so useful.

We start off with using MessageType = nav_msgs::Odometry;. ROS documentation article for nav_msg::Odometry. This line tells us that we will be outputting a nav_msgs::Odometry message from the odometry node.

We also define a function std::string outputTopic() const { return "reading" };. This tells the fusion node that we want the outputted nav_msgs::Odometry messages to be published onto the “/odom/reading” topic – note that the name of the node and the prepending slash is added for us, so we must not add it ourselves.

The real meat of the odometry Fuser is in the processData function. Let’s look at the code.

bool processData(MessageType& result,
            const std::vector<std::vector<double>>& readings,
            cmr::Logger& logger) {
    const std::vector<double> data = readings[0];
    std::fill(result.pose.covariance.begin(), result.pose.covariance.end(), 0);
    result.pose.pose.position.x = data[0];
    result.pose.pose.position.y = data[1];
    result.pose.pose.orientation = tf::createQuaternionMsgFromYaw(data[2]);
    result.child_frame_id = child_frame_id_;
    result.header.stamp = ros::Time::now();
    result.header.frame_id = frame_id_;
    return true;
}

As you can see, this function is just setting the parameters of our output message. This will be different for every fuser. Here, to make things easier, we store the reading from the first sensor into a data variable. The size of the reading is defined in the configuration for this node to be 3 (in this case, they correspond to the x position, y position, and yaw). However, even if it were just one value, this would still be a vector!

Next, we set the covariance of this pose to the zero matrix; this is because we don’t care about covariance.

After that, we just set the rest of the parameters to the obvious values and return true because we didn’t have any failures!

Let’s also look at the configuration for this node:

type: cmr_subsystem/odom_node
node:
    verbosity: INFO
dependencies:
    hal:
        type: $HAL
        path: /hal
internal:
    frequency: 10 # hz
    error_count: 10
    inputs:
    - [5, 3]
    frame_id: odom
    child_frame_id: base_link

Lots to unpack here. First, notice that the type of this node is a cmr_subsystem/odom_node. It is not a fusion node! Remember, there’s no such thing as a fusion node itself; it is simply a node provider that provides functionality to the nodes that use it as their NodeProvider (in this case, our odometry node). The next two sections are pretty standard stuff; we depend on HAL because we are reading from sensors. All fusion nodes will need a HAL dependency, because we are using HAL within the fusion node provider to get the sensor data.

In the internal configuration, notice that we have two additional configuration values: frame_id and child_frame_id. Since your Fuser can have a parseConfig function, we are able to pass additional configuration to fusion nodes.

Note

You might also notice that we only have one sensor here! What’s the point of using a fusion node, if we only have one sensor? We do this to take advantage of the clean interfaces the Fusion Node provider provides; after all, we should work towards reducing code duplication, and since the fusion node already handles talking to HAL for us and verifying that the reading is correct, we take advantage of that by just implementing a fusion node with one input.