At Code Construct, we have been working on support for the Management Component Transport Protocol (MCTP) on Linux systems, to the point where it's becoming generally useful for production server environments. To help with that, we have put together a few details in this introductory document.

MCTP?

In case you're not already familiar with MCTP, it's a fairly lightweight protocol defining a method of communication between components typically found on a server system. Some typical uses are

There are a few different hardware transports used for MCTP messaging, with the most conventional being i2c/SMBus, PCIe and serial.

The core attributes of the protocol:

If you're after further details, the MCTP protocol is defined by a set of standards produced by DMTF, each with their own "DSPxxx" identifier, for easy searching. The main ones relevant to us here:

MCTP support in Linux

As of kernel version 5.15, Linux has a protocol definition for MCTP, added via the initial patchset. With that code enabled, you can create standard sockets that allow communication to other endpoints using MCTP.

Version 5.16 added some improvements for the MCTP core, including infrastructure for managing flows of messages.

Version 5.17 will include the first set of device drivers for transferring MCTP packets over physical hardware.

Update [2022-03-22]: The i2c transport driver has now been queued for v5.18.

As a handy table:

Kernel version   MCTP support
5.15             MCTP core protocol
5.16             Core protocol improvements, extended addressing, flow support
5.17             Initial transport drivers (serial)
5.18             Further transport drivers (i2c)

Our development branches

While we're working on upstreaming the MCTP code, we have a set of branches in the Code Construct linux repo, which contain in-progress changes to the MCTP core.

We have also published backport branches, which provide MCTP support for recent stable and longterm kernels, such as those used by various OpenBMC platforms:

Using MCTP on Linux

Firstly, you'll need a kernel with MCTP support - this is enabled with the CONFIG_MCTP build-time option. You can check that the protocol is available through the /proc/net/protocols file, which should contain an entry for MCTP:

# grep MCTP /proc/net/protocols 
MCTP       872      0      -1   NI       0   no   kernel      […]

If this isn't present, either the kernel was built without CONFIG_MCTP enabled, or MCTP support was built as a module and the mctp.ko module has not been loaded yet.
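The same check can be made from a program. The following is only a sketch of one possible approach, not something provided by the MCTP tools: a hypothetical helper, protocols_has_mctp(), scans a stream in the /proc/net/protocols format for a line whose first column is MCTP. Taking a FILE pointer rather than a path is just a choice that keeps the helper easy to exercise with sample data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns true if the stream (in the format of
 * /proc/net/protocols) has an entry whose first column is "MCTP". */
bool protocols_has_mctp(FILE *f)
{
    char line[512];
    char name[32];

    assert(f != NULL);

    while (fgets(line, sizeof(line), f)) {
        /* the protocol name is the first whitespace-delimited field */
        if (sscanf(line, "%31s", name) == 1 && !strcmp(name, "MCTP"))
            return true;
    }

    return false;
}
```

A caller would typically fopen() /proc/net/protocols, pass the stream in, and treat a false result as a hint to load the module or check the kernel configuration.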

Controlling the MCTP stack

Like other network protocols, the kernel's MCTP stack is configured using the netlink interface, and so requires some basic tools to control the stack state. For this, we've developed a set of simple utilities, published to the repository at https://github.com/CodeConstruct/mctp. The main tool provided in this code is a small command-line utility called mctp, which works in a similar way to the ip utility for IP-based networking.

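The commonly-used commands for the mctp tool are:

  - mctp link: manage MCTP interfaces (links)
  - mctp address: manage local endpoint IDs
  - mctp route: manage routes to remote endpoints
  - mctp neighbour: manage the physical addresses of neighbouring endpoints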

where the commands may be abbreviated - mctp addr will work the same as mctp address.

For example, to configure a local interface (named mctpi2c1) with a local EID of 8, and bring up the link:

mctp addr add 8 dev mctpi2c1
mctp link set mctpi2c1 up

We can also tell the kernel about routes to remote endpoints. To configure the routing table for a remote endpoint with EID 9 attached to the mctpi2c1 interface:

mctp route add 9 via mctpi2c1

Certain link types also need to know the physical address of endpoints on the same bus. For example, the i2c transport needs the i2c address of MCTP-enabled clients on the bus. We can use the mctp neighbour command to update physical addressing information. To update the neighbour table to indicate that EID 9 uses a physical address of 0x1d:

mctp neigh add 9 dev mctpi2c1 lladdr 0x1d

However, there's also a utility, mctpd, which allows the local machine to discover remote endpoints using the MCTP Control Protocol, and automatically configure the route and neighbour tables for each discovered endpoint. We'll cover the details of mctpd in a later document.

Hardware & interface configuration

An MCTP-enabled system isn't much use without hardware interfaces, as these provide the facility to communicate with other endpoints. The method of defining MCTP interfaces will depend on the hardware type.

i2c/SMBus interfaces

i2c/SMBus endpoints are defined through the kernel device tree, just like any other i2c client device. In our case, the local MCTP endpoint also carries its own hardware address, which we need to pass too.

MCTP-over-i2c interfaces are defined as i2c client nodes, using the mctp-i2c-controller compatible value.

A typical local i2c transport might look like this:

/* Our local i2c controller device */
&i2c6 {
    status = "okay";

    /* Mark this device as an MCTP controller. For any non-top-level i2c
     * controllers (eg, downstream ports of a multiplexer), this property
     * will create a new interface for the subordinate bus, linked to the
     * "real" MCTP interface at the top-level */
    mctp-controller;

    /* The MCTP interface itself, at i2c address 0x10. This will be named
     * mctpi2cN, where N is the index of the i2c controller.  */
    mctp@10 {
        compatible = "mctp-i2c-controller";
        reg = <(0x10 | I2C_OWN_SLAVE_ADDRESS)>;
    };
};

Serial interfaces

The MCTP-over-serial support is provided as a new tty line discipline. To create a new MCTP interface over an existing serial device:

mctp link serial /dev/ttyS0

This will create a new MCTP interface, named mctpserialN, allowing MCTP communication over the specified serial device. This process will block, as the line discipline will be active only while the serial device's file descriptor remains open. Consequently, you may want to start this from a systemd/init service.
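For programs that want to manage the serial link themselves, the mechanism underneath is the standard TIOCSETD ioctl, which switches the line discipline of an open tty. The sketch below is an illustration under a few assumptions rather than the code of the mctp tool: mctp_serial_attach() is a hypothetical name, and the fallback value for N_MCTP is only there for builds against headers older than v5.17.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/tty.h>

/* N_MCTP comes from <linux/tty.h> on recent kernels; this fallback value
 * is an assumption for building against older headers. */
#ifndef N_MCTP
#define N_MCTP 28
#endif

/* Hypothetical helper: open a serial device and attach the MCTP line
 * discipline to it. Returns the open file descriptor, which the caller
 * must keep open for as long as the interface is needed, or -1 on error
 * (with errno set by the failing call). */
int mctp_serial_attach(const char *path)
{
    int ldisc = N_MCTP;
    int fd, saved;

    assert(path != NULL);

    fd = open(path, O_RDWR | O_NOCTTY);
    if (fd < 0)
        return -1;

    /* switch the tty to the MCTP line discipline */
    if (ioctl(fd, TIOCSETD, &ldisc) < 0) {
        saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }

    return fd;
}
```

Since the interface only exists while the descriptor stays open, a caller would usually hold on to the returned descriptor and then wait (for example, in pause()) until it is time to tear the link down.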

Developing MCTP applications

Now that we're able to set up our MCTP stack under Linux, we'll want to send and receive messages to/from other MCTP-enabled hardware. Like other network protocols, the physical addressing, routing and transport layer functions are handled by the kernel; applications will typically just need to be aware of their peers' EIDs. However, see the Addressing section below for a few extra details about that.

This is all done over the standard sockets API, plus a couple of small MCTP-specific definitions.

Sockets API

Given we're implementing a network protocol here, it makes sense to use the existing sockets API for sending and receiving MCTP messages.

To allow this, the MCTP support in Linux exists as a new network protocol definition, like IP, or CAN. This allows userspace programs to use the usual sockets API: socket() to create a new socket descriptor which can then be used to send and receive messages.

MCTP sockets are all datagram-oriented (i.e., they use SOCK_DGRAM as the socket type), so message boundaries are preserved and correspond to the buffers that userspace passes to the send and receive calls. As with other datagram sockets, the send/sendto/sendmsg syscalls are used for message transmit, and the recv/recvfrom/recvmsg syscalls are used for receive.

The main protocol-specific parts of the API are two new definitions:

struct mctp_addr {
    uint8_t             s_addr;
};

struct sockaddr_mctp {
    uint16_t            smctp_family;
    uint32_t            smctp_network;
    struct mctp_addr    smctp_addr;
    uint8_t             smctp_type;
    uint8_t             smctp_tag;
};

Note that the kernel definition of struct sockaddr_mctp has some explicit padding fields, and more kernel-specific type definitions. We've simplified those here a little, but the definition above will work as-is.
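If you do use the simplified definitions, it may be worth checking at build time that they still line up with what the kernel expects. The sketch below repeats the simplified structures and asserts the field offsets; the expected numbers are an assumption based on the kernel's explicit padding being two bytes after smctp_family and one byte after smctp_tag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The simplified definitions from the text above */
struct mctp_addr {
    uint8_t             s_addr;
};

struct sockaddr_mctp {
    uint16_t            smctp_family;
    uint32_t            smctp_network;
    struct mctp_addr    smctp_addr;
    uint8_t             smctp_type;
    uint8_t             smctp_tag;
};

/* Assumed layout: the compiler's implicit padding should land where the
 * kernel structure has its explicit padding fields. */
static_assert(offsetof(struct sockaddr_mctp, smctp_network) == 4,
              "expected two bytes of padding after smctp_family");
static_assert(offsetof(struct sockaddr_mctp, smctp_addr) == 8,
              "unexpected offset for smctp_addr");
static_assert(offsetof(struct sockaddr_mctp, smctp_type) == 9,
              "unexpected offset for smctp_type");
static_assert(offsetof(struct sockaddr_mctp, smctp_tag) == 10,
              "unexpected offset for smctp_tag");
static_assert(sizeof(struct sockaddr_mctp) == 12,
              "expected one byte of trailing padding");
```

Because the padding here is implicit, clearing the whole structure with memset() before filling it in is a cheap way to make sure those bytes are zero. Where the kernel headers are available, including linux/mctp.h instead avoids the question entirely.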

Other than these, the rest of the sockets API can be used as-is. Here's a small example that transmits a single MCTP message:

#include <err.h>
#include <stdlib.h>
#include <unistd.h>

#include <sys/socket.h>
#include <linux/mctp.h>   /* struct sockaddr_mctp, MCTP_TAG_OWNER */

int main(void)
{
    struct sockaddr_mctp addr = { 0 };
    char buf[] = "hello, world!";
    int sd, rc;

    /* create the MCTP socket */
    sd = socket(AF_MCTP, SOCK_DGRAM, 0);
    if (sd < 0)
        err(EXIT_FAILURE, "socket() failed");

    /* populate the remote address information */
    addr.smctp_family = AF_MCTP;  /* we're using the MCTP family */
    addr.smctp_addr.s_addr = 8;   /* send to remote endpoint ID 8 */
    addr.smctp_type = 0;          /* encapsulated protocol type (eg. PLDM = 1) */
    addr.smctp_tag = MCTP_TAG_OWNER; /* we own the tag, and so the kernel
                                        will allocate one for us */

    /* send the MCTP message */
    rc = sendto(sd, buf, sizeof(buf), 0,
                (struct sockaddr *)&addr, sizeof(addr));

    if (rc != sizeof(buf))
        err(EXIT_FAILURE, "sendto() failed");

    return EXIT_SUCCESS;
}

Note that we have passed the message type in the smctp_type field, and have not included it in the message data (passed to sendto). The kernel will construct the correct message format by prepending this smctp_type byte to the message contents.

The message transmitted here is only 14 bytes long (the 13-character string plus its terminating NUL, since we pass sizeof(buf)), but if fragmentation is required to suit the maximum size limit of the hardware transport, the kernel will perform the packetisation automatically.
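To get a feel for when fragmentation comes into play, here is a rough worked calculation. It is a sketch under stated assumptions, not a description of the kernel's code: we assume each packet carries a 4-byte MCTP header, that the interface MTU includes that header, and that the message-type byte travels as the first byte of the message body. The names mctp_packet_count() and MCTP_HDR_LEN are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MCTP_HDR_LEN 4  /* assumed size of the MCTP packet header */

/* Hypothetical helper: estimate how many packets are needed to carry a
 * message of data_len bytes (as passed to sendto()) over a link with the
 * given MTU. */
size_t mctp_packet_count(size_t data_len, size_t mtu)
{
    size_t msg_len = data_len + 1;  /* message-type byte + data */
    size_t payload;

    assert(mtu > MCTP_HDR_LEN);
    payload = mtu - MCTP_HDR_LEN;   /* payload bytes available per packet */

    return (msg_len + payload - 1) / payload;   /* round up */
}
```

With an MTU of 68 bytes (64 bytes of payload per packet), the example message above fits in a single packet, while a 4096-byte message would need 65 packets under these assumptions.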

Extending on this, here's a small MCTP "responder", which receives incoming messages (of a fictional type 5), and echoes the message data back to the original sender:

#include <err.h>
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>

#include <sys/socket.h>
#include <linux/mctp.h>   /* struct sockaddr_mctp, MCTP_ADDR_ANY, MCTP_TAG_OWNER */

int main(void)
{
    struct sockaddr_mctp addr = { 0 };
    char buf[4096];
    int sd, rc;

    /* create the MCTP socket */
    sd = socket(AF_MCTP, SOCK_DGRAM, 0);
    if (sd < 0)
        err(EXIT_FAILURE, "socket() failed");

    /* populate the local address information for our bind(), which defines
     * properties of the messages that we will receive */
    addr.smctp_family = AF_MCTP;
    addr.smctp_addr.s_addr = MCTP_ADDR_ANY;   /* receive from any address */
    addr.smctp_type = 5;                      /* receive messages of type 5 */

    rc = bind(sd, (struct sockaddr *)&addr, sizeof(addr));
    if (rc)
        err(EXIT_FAILURE, "bind() failed");

    for (;;) {
        socklen_t addrlen;
        ssize_t len;

        addrlen = sizeof(addr);

        /* receive an incoming message, as well as the sender's address */
        len = recvfrom(sd, buf, sizeof(buf), MSG_TRUNC,
                      (struct sockaddr *)&addr,  &addrlen);

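        if (len < 0) {
            err(EXIT_FAILURE, "recvfrom() failed");
        } else if (len > (ssize_t)sizeof(buf)) {
            /* MSG_TRUNC reports the full message length, so a value
             * larger than our buffer means the data was truncated */
            warnx("recvfrom: message too large for buffer");
            continue;
        }

        printf("message (%zd bytes) from remote EID 0x%02x\n",
                len, addr.smctp_addr.s_addr);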

        /* for the tag used in the reply, we clear the tag-owner bit, but
         * keep the tag value */
        addr.smctp_tag &= ~MCTP_TAG_OWNER;

        /* return message to sender */
        rc = sendto(sd, buf, len, 0,
                    (struct sockaddr *)&addr, sizeof(addr));
        if (rc != len)
            warn("sendto() failed");
    }

    return EXIT_SUCCESS;
}

Tag handling

As mentioned above, MCTP packets have two header fields related to tags:

These provide a basic method of both correlating packets of a fragmented message, and correlating request messages with their replies. The general semantics of these fields are:

In our Linux MCTP implementation, the kernel can handle almost all of this tag handling automatically, but does need a little information from applications in order to perform request-to-response correlation.

The struct sockaddr_mctp introduced above has a field for passing tag information to the kernel, highlighted here:

struct sockaddr_mctp {
    uint16_t            smctp_family;
    uint32_t            smctp_network;
    struct mctp_addr    smctp_addr;
    uint8_t             smctp_type;
    uint8_t             smctp_tag;
};

This smctp_tag field contains both the owner and tag values, and so we also have two macros for interpreting it:

#define MCTP_TAG_MASK		0x07
#define MCTP_TAG_OWNER		0x08

Where MCTP_TAG_MASK is a mask for the tag value, and MCTP_TAG_OWNER is the single-bit owner field.

The kernel has simple logic for tag values on message send: if MCTP_TAG_OWNER is set, the kernel will control the tag value. If it is unset, the kernel will use the value provided.

Most applications will only need two rules for setting the smctp_tag field:

  1. For request messages: set the tag value to MCTP_TAG_OWNER only.

    addr.smctp_tag = MCTP_TAG_OWNER;
    

    This will cause the kernel to allocate a new unique tag for the message. If MCTP_TAG_OWNER is set, the rest of the smctp_tag bits must be zero, otherwise the sendto() system call will fail with EINVAL.

  2. For response messages: set the tag value to the request message's tag, with MCTP_TAG_OWNER cleared.

    response_addr.smctp_tag = request_addr.smctp_tag & ~MCTP_TAG_OWNER;
    

    This will cause the kernel to use the tag value exactly as-is from the sendto() system call, and will allow the recipient to correlate the response message to the original request.

The rest of the smctp_tag field (i.e., the most-significant four bits) must always be set to zero.
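The two rules, plus the constraints on the remaining bits, are small enough to capture in a few helpers. The following is a sketch only; the function names are hypothetical, and the two macros are repeated from above so that the snippet stands alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Normally provided by <linux/mctp.h>; repeated so this sketch stands alone */
#ifndef MCTP_TAG_MASK
#define MCTP_TAG_MASK   0x07
#endif
#ifndef MCTP_TAG_OWNER
#define MCTP_TAG_OWNER  0x08
#endif

static_assert((MCTP_TAG_MASK & MCTP_TAG_OWNER) == 0,
              "tag value and owner bit must not overlap");

/* Rule 1: requests set only the owner bit, so the kernel allocates a tag */
uint8_t mctp_request_tag(void)
{
    return MCTP_TAG_OWNER;
}

/* Rule 2: responses keep the request's tag value, with the owner bit cleared */
uint8_t mctp_response_tag(uint8_t request_tag)
{
    return (uint8_t)(request_tag & ~MCTP_TAG_OWNER);
}

/* Check a smctp_tag value against the constraints described above,
 * before it is passed to sendto() */
bool mctp_send_tag_valid(uint8_t tag)
{
    if (tag & ~(MCTP_TAG_MASK | MCTP_TAG_OWNER))
        return false;   /* the most-significant four bits must be zero */

    if ((tag & MCTP_TAG_OWNER) && (tag & MCTP_TAG_MASK))
        return false;   /* owner bit set: the tag value bits must be zero */

    return true;
}
```

For instance, a request received with smctp_tag set to 0x0b (owner bit set, tag value 3) would be answered with a tag of 0x03.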

Addressing

MCTP has a fairly limited address space - with EIDs being 8 bits wide, we can only have 255 endpoint IDs on a single network (minus a handful for reserved EID values).

To allow more than these ~255 endpoints, the kernel MCTP address structure (struct sockaddr_mctp) also contains a network number. Each network has a distinct set of EIDs, allowing applications to address more than the limit of 255 EIDs.

Of course, these networks need to be on physically separate busses, as this addressing scheme does not apply outside of the local system. Each MCTP interface must be on one (and only one) network, and messages will not be forwarded between separate networks - as we wouldn't have unique routing rules across networks where EIDs may be duplicated.

By default, all links start on the default network (network id 1), and so messages can be routed between all interfaces on that network.

It's possible to move a device to a different network using the mctp link set command:

# mctp link set lo network 2
# mctp link
dev lo index 1 address 0x00:00:00:00:00:00 net 2 mtu 65536 up

Since lo is no longer on the default network, messages sent through this interface will need to have the smctp_network field set explicitly:

    addr.smctp_family = AF_MCTP;
    addr.smctp_addr.s_addr = 8;
    addr.smctp_network = 2;     /* send to a non-default network */

Similar semantics apply to the struct sockaddr_mctp passed to the bind() system call. If a specific network is provided, the socket will only receive messages sent on that network. If the value MCTP_NET_ANY is provided, the socket will receive messages for all networks.
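As an illustration of the bind() side, here is a sketch of a hypothetical helper, mctp_bind_addr_init(), that fills in an address for receiving one message type on a chosen network. To keep it standalone it repeats the simplified structure definitions; the fallback values for AF_MCTP, MCTP_NET_ANY and MCTP_ADDR_ANY are assumptions that mirror the kernel headers, and real code would take them from sys/socket.h and linux/mctp.h.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Fallbacks for older C libraries and headers; assumed to match the
 * values used by the kernel. */
#ifndef AF_MCTP
#define AF_MCTP         45
#endif
#ifndef MCTP_NET_ANY
#define MCTP_NET_ANY    0x0
#endif
#ifndef MCTP_ADDR_ANY
#define MCTP_ADDR_ANY   0xff
#endif

/* The simplified definitions from earlier in this document */
struct mctp_addr {
    uint8_t             s_addr;
};

struct sockaddr_mctp {
    uint16_t            smctp_family;
    uint32_t            smctp_network;
    struct mctp_addr    smctp_addr;
    uint8_t             smctp_type;
    uint8_t             smctp_tag;
};

/* Hypothetical helper: prepare an address for bind(), to receive
 * messages of the given type on one network (or on all networks, if
 * net is MCTP_NET_ANY). */
void mctp_bind_addr_init(struct sockaddr_mctp *addr, uint32_t net,
                         uint8_t type)
{
    assert(addr != NULL);

    /* clear everything first, including any padding bytes */
    memset(addr, 0, sizeof(*addr));

    addr->smctp_family = AF_MCTP;
    addr->smctp_network = net;
    addr->smctp_addr.s_addr = MCTP_ADDR_ANY;
    addr->smctp_type = type;
}
```

A responder that only cares about network 2 would call mctp_bind_addr_init(&addr, 2, 5) before bind(), while passing MCTP_NET_ANY would accept type 5 messages from every network.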


That covers the fundamental parts of the MCTP implementation under Linux. We'll continue this series with some more details in future posts. Stay tuned!

If you have any queries around the MCTP infrastructure or recent developments, please feel free to get in touch by sending an email - jk@codeconstruct.com.au.