At Code Construct, we have been working on support for the Management Component Transport Protocol (MCTP) on Linux systems, to the point where it's becoming generally useful for production server environments. To help with that, we have put together a few details in this introductory document.

MCTP?

In case you're not already familiar with MCTP, it's a fairly lightweight protocol defining a method of communication between components typically found on a server system. Some typical uses are:

control and configuration messages between a system's Baseboard Management Controller (BMC) and the main CPUs / firmware / operating system;
management of flash storage through the NVMe Management Interface, using MCTP over an i2c channel to the BMC, or
control and monitoring of sensor and effector devices using a standard on-the-wire format.

There are a few different hardware transports used for MCTP messaging, with the most conventional being i2c/SMBus, PCIe and serial.

The core attributes of the protocol:

Endpoints are addressed by an 8-bit endpoint ID (EID). EIDs are unique across an MCTP network.
Data is sent as messages. Messages may be fragmented into packets if they are larger than the maximum size supported by the hardware transport.
As well as having a source and destination EID, the packet header includes a tag and a tag-owner field. The tag is a three-bit value assigned by the initiator of the message. The tag-owner is a single bit, indicating whether the tag was generated by the sending or receiving endpoint. Typically, the tag owner bit will be set to 1 on a request message, and 0 on the response message, and the tag value will be the same on both.
Each MCTP message has a type as the first byte of the message data. This type value indicates the protocol of the message contents. For example, type 0x1 represents a PLDM message, and type 0x4 represents an NVMe Management Interface message.
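To make the header fields above concrete, here is a minimal sketch of decoding the four-byte MCTP transport header. The byte layout follows DSP0236 (which also defines start-of-message, end-of-message and sequence-number fields not covered above); the struct and function names are made up for this illustration and are not part of any kernel or library API.

```c
#include <stddef.h>
#include <stdint.h>

/* Decoded view of the 4-byte MCTP transport header (layout per DSP0236).
 * All names here are illustrative only. */
struct mctp_pkt_hdr {
    uint8_t version;    /* header version: low 4 bits of byte 0 */
    uint8_t dest;       /* destination EID: byte 1 */
    uint8_t src;        /* source EID: byte 2 */
    uint8_t som;        /* start-of-message flag: byte 3, bit 7 */
    uint8_t eom;        /* end-of-message flag: byte 3, bit 6 */
    uint8_t seq;        /* packet sequence number: byte 3, bits 5:4 */
    uint8_t tag_owner;  /* tag-owner (TO) bit: byte 3, bit 3 */
    uint8_t tag;        /* message tag: byte 3, bits 2:0 */
};

/* Parse a raw header. Returns 0 on success, -1 if the buffer is too short. */
int mctp_parse_hdr(const uint8_t *buf, size_t len, struct mctp_pkt_hdr *hdr)
{
    if (len < 4)
        return -1;

    hdr->version   = buf[0] & 0x0f;
    hdr->dest      = buf[1];
    hdr->src       = buf[2];
    hdr->som       = (buf[3] >> 7) & 0x1;
    hdr->eom       = (buf[3] >> 6) & 0x1;
    hdr->seq       = (buf[3] >> 4) & 0x3;
    hdr->tag_owner = (buf[3] >> 3) & 0x1;
    hdr->tag       = buf[3] & 0x7;
    return 0;
}
```

For example, the header bytes 0x01 0x09 0x08 0xca describe a single-packet message (both start and end flags set) from EID 8 to EID 9, with the tag-owner bit set and a tag value of 2.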

If you're after further details, the MCTP protocol is defined by a set of standards produced by DMTF, each with their own "DSPxxx" identifier, for easy searching. The main ones relevant to us here:

The MCTP Base Specification (DSP0236) defines the core transport protocol, as well as the "control protocol" - a set of base messages that can be used to configure and control communication between endpoint devices.
Various "transport binding" specifications, like MCTP over PCIe (DSP0238) and MCTP over i2c/SMBus (DSP0237).
Specifications for various upper-layer protocols that can be run over MCTP, like PLDM over MCTP (DSP0241).

MCTP support in Linux

As of kernel version 5.15, Linux has a protocol definition for MCTP, added via the initial patchset. With that code enabled, you can create standard sockets that allow communication to other endpoints using MCTP.

Version 5.16 added some improvements for the MCTP core, including infrastructure for managing flows of messages.

Version 5.17 will include the first set of device drivers for transferring MCTP packets over physical hardware.

Update [2022-03-22]: The i2c transport driver has now been queued for v5.18.

Update [2025-05-25]: The i3c and USB transport drivers have been added too.

As a handy table:

Kernel version	MCTP support
5.15	MCTP core protocol
5.16	Core protocol improvements, extended addressing, flow support
5.17	Initial transport drivers (serial)
5.18	Further transport drivers (i2c)
6.6	Further transport drivers (i3c)
6.15	Further transport drivers (USB)

Our development branches

While we're working on upstreaming the MCTP code, we have a set of branches in the Code Construct linux repo, which contain in-progress changes to the MCTP core.

The dev/mctp branch contains any pending patches for the MCTP core code. This is against recent upstream master, and will be occasionally rebased as upstream progresses.
The dev/mctp-i2c branch is like the above, plus the i2c transport controller support.

We have also published backport branches, which provide MCTP support for recent stable and longterm kernels, and which may be used by various OpenBMC platforms.

Using MCTP on Linux

Firstly, you'll need a kernel with MCTP support - this is enabled with the CONFIG_MCTP build-time option. You can check that the protocol is available through the /proc/net/protocols file, which should contain an entry for MCTP:

# grep MCTP /proc/net/protocols 
MCTP       872      0      -1   NI       0   no   kernel      […]

If this isn't present, you'll need to ensure that the kernel was built with CONFIG_MCTP enabled and, if you've built the MCTP support as a module, that the mctp.ko module is loaded.
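The same check can be done from a program. The following is a minimal sketch that scans a protocols listing for an MCTP entry; the function names are hypothetical, and it assumes only that each line of /proc/net/protocols starts with the protocol name, as in the output above.

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if the given listing (in /proc/net/protocols format) contains
 * an entry named exactly "MCTP", or 0 if it does not. */
int protocols_has_mctp(FILE *f)
{
    char line[512];

    while (fgets(line, sizeof(line), f)) {
        /* the protocol name is the first column, followed by whitespace */
        if (!strncmp(line, "MCTP", 4) &&
            (line[4] == ' ' || line[4] == '\t'))
            return 1;
    }

    return 0;
}

/* Convenience wrapper for the running system.
 * Returns 1 or 0 as above, or -1 if the file cannot be opened. */
int mctp_protocol_available(void)
{
    FILE *f = fopen("/proc/net/protocols", "r");
    int rc;

    if (!f)
        return -1;

    rc = protocols_has_mctp(f);
    fclose(f);
    return rc;
}
```

A result of 0 from mctp_protocol_available() points to the same two causes described above: a missing CONFIG_MCTP, or an unloaded module.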

Controlling the MCTP stack

Like other network protocols, the kernel's MCTP stack is configured using the netlink interface, and so requires some basic tools to control the stack state. For this, we've developed a set of simple utilities, published to the repository at https://github.com/CodeConstruct/mctp. The main tool provided in this code is a small command-line utility called mctp, which works in a similar way to the ip utility for IP-based networking.

The commonly-used commands for the mctp tool are:

mctp link: configures, enables and disables local interfaces
mctp address: configures addresses on local interfaces
mctp route: configures the MCTP routing table

Commands may be abbreviated; for example, mctp addr works the same as mctp address.

For example, to configure a local interface (named mctpi2c1) with a local EID of 8, and bring up the link:

mctp addr add 8 dev mctpi2c1
mctp link set mctpi2c1 up

We can also tell the kernel about routes to remote endpoints. To configure the routing table for a remote endpoint with EID 9 attached to the mctpi2c1 interface:

mctp route add 9 via mctpi2c1

Certain link types also need to know the physical address of endpoints on the same bus. For example, the i2c transport needs the i2c address of MCTP-enabled clients on the bus. We can use the mctp neighbour command to update physical addressing information. To update the neighbour table to indicate that EID 9 uses a physical address of 0x1d:

mctp neigh add 9 dev mctpi2c1 lladdr 0x1d

However, there's also a utility, mctpd, which allows the local machine to discover remote endpoints using the MCTP Control Protocol, and automatically configure the route and neighbour tables for each discovered endpoint. We'll cover the details of mctpd in a later document.

Hardware & interface configuration

An MCTP-enabled system isn't much use without hardware interfaces, as these provide the facility to communicate with other endpoints. The method of defining MCTP interfaces will depend on the hardware type.

i2c/SMBus interfaces

i2c/SMBus endpoints are defined through the kernel device tree, just like any other i2c client device. In our case, the local MCTP endpoint also carries its own hardware address, which we need to pass too.

MCTP-over-i2c interfaces are defined as i2c client nodes, using the mctp-i2c-controller compatible value.

A typical local i2c transport might look like this:

/* Our local i2c controller device */
&i2c6 {
    status = "okay";

    /* Mark this device as an MCTP controller. For any non-top-level i2c
     * controllers (eg, downstream ports of a multiplexer), this property
     * will create a new interface for the subordinate bus, linked to the
     * "real" MCTP interface at the top-level */
    mctp-controller;

    /* The MCTP interface itself, at i2c address 0x10. This will be named
     * mctpi2cN, where N is the index of the i2c controller.  */
    mctp@10 {
        compatible = "mctp-i2c-controller";
        reg = <(0x10 | I2C_OWN_SLAVE_ADDRESS)>;
    };
};

Serial interfaces

The MCTP-over-serial support is provided as a new tty line discipline. To create a new MCTP interface over an existing serial device:

mctp link serial /dev/ttyS0

This will create a new MCTP interface, named mctpserialN, allowing MCTP communication over the specified serial device. This process will block, as the line discipline will be active only while the serial device's file descriptor remains open. Consequently, you may want to start this from a systemd/init service.
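For a sense of what happens underneath, here is a rough sketch of attaching a line discipline to a serial device from C. It is not the mctp tool's actual source; the function name is hypothetical, and the fallback N_MCTP value of 28 is an assumption that should be checked against the kernel's linux/tty.h.

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Line discipline number for MCTP-over-serial. The fallback value is an
 * assumption; <linux/tty.h> is the authoritative source. */
#ifndef N_MCTP
#define N_MCTP 28
#endif

/* Open a serial device and attach the MCTP line discipline to it.
 * Returns the open file descriptor on success, or -1 on failure.
 * The interface only exists while the returned descriptor stays open,
 * so the caller must keep it (and the process) alive. */
int mctp_serial_attach(const char *path)
{
    int ldisc = N_MCTP;
    int fd;

    fd = open(path, O_RDWR | O_NOCTTY);
    if (fd < 0)
        return -1;

    /* switch the tty to the MCTP line discipline */
    if (ioctl(fd, TIOCSETD, &ldisc) < 0) {
        close(fd);
        return -1;
    }

    return fd;
}
```

A caller would typically follow a successful mctp_serial_attach("/dev/ttyS0") with a blocking wait such as pause(), which is why the command above does not return.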

Developing MCTP applications

Now that we're able to set up our MCTP stack under Linux, we'll want to send and receive messages to/from other MCTP-enabled hardware. Like other network protocols, the physical addressing, routing and transport layer functions are handled by the kernel; applications will typically just need to be aware of their peers' EIDs. However, see the Addressing section below for a few extra details about that.

This is all done over the standard sockets API, plus a couple of small MCTP-specific definitions.

Sockets API

Given we're implementing a network protocol here, it makes sense to use the existing sockets API for sending and receiving MCTP messages.

To allow this, the MCTP support in Linux exists as a new network protocol definition, like IP, or CAN. This allows userspace programs to use the usual sockets API: socket() to create a new socket descriptor which can then be used to send and receive messages.

MCTP sockets are all datagram-oriented (i.e., they use SOCK_DGRAM as the socket type), so message boundaries are preserved and correspond to the buffers passed in and out by userspace. As with any datagram socket, the send/sendto/sendmsg syscalls are used to transmit messages, and the recv/recvfrom/recvmsg syscalls are used to receive them.

The main protocol-specific parts of the API are two new definitions:

a new address family, AF_MCTP.
a new address format, struct sockaddr_mctp, defined as:

struct mctp_addr {
    uint8_t             s_addr;
};

struct sockaddr_mctp {
    uint16_t            smctp_family;
    uint32_t            smctp_network;
    struct mctp_addr    smctp_addr;
    uint8_t             smctp_type;
    uint8_t             smctp_tag;
};

Note that the kernel definition of struct sockaddr_mctp has some explicit padding fields, and more kernel-specific type definitions. We've simplified those here a little, but the definition above will work as-is.
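One way to see why the simplified definition is compatible: on ABIs where uint32_t is 4-byte aligned, the compiler inserts the same padding that the kernel header spells out explicitly. The sketch below checks this at compile time. The expected offsets are an assumption based on the kernel layout (two pad bytes after the family, one trailing pad byte), so treat it as a sanity check rather than a specification.

```c
#include <stddef.h>
#include <stdint.h>

struct mctp_addr {
    uint8_t             s_addr;
};

struct sockaddr_mctp {
    uint16_t            smctp_family;
    uint32_t            smctp_network;
    struct mctp_addr    smctp_addr;
    uint8_t             smctp_type;
    uint8_t             smctp_tag;
};

/* Assumed layout: 2-byte family, 2 bytes of padding, 4-byte network,
 * then one byte each for address, type and tag, plus one trailing pad
 * byte, for 12 bytes in total. */
_Static_assert(offsetof(struct sockaddr_mctp, smctp_family) == 0,
               "family offset");
_Static_assert(offsetof(struct sockaddr_mctp, smctp_network) == 4,
               "network offset");
_Static_assert(offsetof(struct sockaddr_mctp, smctp_addr) == 8,
               "addr offset");
_Static_assert(offsetof(struct sockaddr_mctp, smctp_type) == 9,
               "type offset");
_Static_assert(offsetof(struct sockaddr_mctp, smctp_tag) == 10,
               "tag offset");
_Static_assert(sizeof(struct sockaddr_mctp) == 12,
               "total size");
```

If any of these assertions fails on your toolchain, prefer the definition from the kernel's own header over a hand-written copy.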

Other than these, the rest of the sockets API can be used as-is. Here's a small example that transmits a single MCTP message:

#include <err.h>
#include <stdlib.h>
#include <unistd.h>

#include <sys/socket.h>
#include <linux/mctp.h>

int main(void)
{
    struct sockaddr_mctp addr = { 0 };
    char buf[] = "hello, world!";
    int sd, rc;

    /* create the MCTP socket */
    sd = socket(AF_MCTP, SOCK_DGRAM, 0);
    if (sd < 0)
        err(EXIT_FAILURE, "socket() failed");

    /* populate the remote address information */
    addr.smctp_family = AF_MCTP;  /* we're using the MCTP family */
    addr.smctp_addr.s_addr = 8;   /* send to remote endpoint ID 8 */
    addr.smctp_type = 0;          /* encapsulated protocol type (eg. PLDM = 1) */
    addr.smctp_tag = MCTP_TAG_OWNER; /* we own the tag, and so the kernel
                                        will allocate one for us */

    /* send the MCTP message */
    rc = sendto(sd, buf, sizeof(buf), 0,
                (struct sockaddr *)&addr, sizeof(addr));

    if (rc != sizeof(buf))
        err(EXIT_FAILURE, "sendto() failed");

    return EXIT_SUCCESS;
}

Note that we have passed the message type in the smctp_type field, and have not included it in the message data (passed to sendto). The kernel will construct the correct message format by prepending this smctp_type byte to the message contents.

The message transmitted here is only 14 bytes long (the 13 characters of the string plus its terminating NUL, since we pass sizeof(buf)), but if fragmentation is required to suit the maximum size limit of the hardware transport, the kernel will perform the packetisation automatically.
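Applications never need to do this themselves, but a small model of the packetisation may help when reading bus traces. This sketch assumes the DSP0236 header layout (start-of-message, end-of-message and a 2-bit sequence number alongside the tag fields) and numbers packets from sequence 0; the function names are illustrative, not kernel code.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of packets needed for a message of msg_len bytes (including the
 * leading type byte), when each packet carries at most payload_max bytes
 * of message data. Returns 0 for degenerate input. */
size_t mctp_packet_count(size_t msg_len, size_t payload_max)
{
    if (msg_len == 0 || payload_max == 0)
        return 0;

    /* round up to a whole number of packets */
    return (msg_len + payload_max - 1) / payload_max;
}

/* Build the final header byte for packet number idx (0-based) of a message
 * that spans count packets. */
uint8_t mctp_flags_byte(size_t idx, size_t count, int tag_owner, uint8_t tag)
{
    uint8_t b = 0;

    if (idx == 0)
        b |= 0x80;                      /* start of message */
    if (idx + 1 == count)
        b |= 0x40;                      /* end of message */
    b |= (uint8_t)((idx & 0x3) << 4);   /* sequence number, modulo 4 */
    if (tag_owner)
        b |= 0x08;                      /* tag owner (TO) */
    b |= tag & 0x07;                    /* tag value */

    return b;
}
```

With the 64-byte baseline payload size from DSP0236, a short message like the one above fits in a single packet with both start and end flags set, while a 200-byte message would be split across four packets.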

Extending on this, here's a small MCTP "responder", which receives incoming messages (of a fictional type 5), and echoes the message data back to the original sender:

#include <err.h>
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>

#include <sys/socket.h>
#include <linux/mctp.h>

int main(void)
{
    struct sockaddr_mctp addr = { 0 };
    char buf[4096];
    int sd, rc;

    /* create the MCTP socket */
    sd = socket(AF_MCTP, SOCK_DGRAM, 0);
    if (sd < 0)
        err(EXIT_FAILURE, "socket() failed");

    /* populate the local address information for our bind(), which defines
     * properties of the messages that we will receive */
    addr.smctp_family = AF_MCTP;
    addr.smctp_addr.s_addr = MCTP_ADDR_ANY;   /* receive on any local EID */
    addr.smctp_type = 5;                      /* receive messages of type 5 */

    rc = bind(sd, (struct sockaddr *)&addr, sizeof(addr));
    if (rc)
        err(EXIT_FAILURE, "bind() failed");

    for (;;) {
        socklen_t addrlen;
        ssize_t len;

        addrlen = sizeof(addr);

        /* receive an incoming message, as well as the sender's address */
        len = recvfrom(sd, buf, sizeof(buf), MSG_TRUNC,
                      (struct sockaddr *)&addr,  &addrlen);

        if (len <= 0) {
            err(EXIT_FAILURE, "recvfrom() failed");

        } else if (len > (ssize_t)sizeof(buf)) {
            warnx("recvfrom: message too large for buffer");
            continue;

        }

        printf("message (%zd bytes) from remote EID 0x%02x\n",
                len, addr.smctp_addr.s_addr);

        /* for the tag used in the reply, we clear the tag-owner bit, but
         * keep the tag value */
        addr.smctp_tag &= ~MCTP_TAG_OWNER;

        /* return message to sender, and report (but tolerate) failures */
        rc = sendto(sd, buf, len, 0,
                    (struct sockaddr *)&addr, sizeof(addr));
        if (rc != len)
            warn("sendto() failed");
    }

    return EXIT_SUCCESS;
}

Tag handling

As mentioned above, MCTP packets have two header fields related to tags:

A three-bit tag value field
A one-bit tag owner field, also known as TO.

These provide a basic method of both correlating packets of a fragmented message, and correlating request messages with their replies. The general semantics of these fields are:

When a requester sends a message to a responder, it chooses a tag value, where the (source EID, dest EID, TO, tag value) tuple is unique.
When sending this message, since the requester generated the tag value, it sets the tag owner field to 1.
When a responder generates a reply to the above message, it uses the same tag value as seen in the request message(s), but sets the tag owner field to 0.
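The three rules above can be modelled in a few lines. This is only an illustration of the correlation logic, with hypothetical type and function names; it is not how the kernel stores its tag state.

```c
#include <stdbool.h>
#include <stdint.h>

/* The header fields that matter for correlation (illustrative names) */
struct mctp_flow {
    uint8_t src;        /* source EID */
    uint8_t dest;       /* destination EID */
    uint8_t tag_owner;  /* TO bit: 1 on requests, 0 on responses */
    uint8_t tag;        /* three-bit tag value */
};

/* Build the header fields a responder would use to reply to a request:
 * swap the EIDs, keep the tag value, and clear the tag-owner bit. */
struct mctp_flow mctp_reply_flow(struct mctp_flow req)
{
    struct mctp_flow rsp = {
        .src = req.dest,
        .dest = req.src,
        .tag_owner = 0,
        .tag = req.tag,
    };

    return rsp;
}

/* Check whether rsp is a valid reply to the request req */
bool mctp_is_reply_to(struct mctp_flow req, struct mctp_flow rsp)
{
    return req.tag_owner == 1 && rsp.tag_owner == 0 &&
           rsp.tag == req.tag &&
           rsp.src == req.dest && rsp.dest == req.src;
}
```

A requester with several outstanding requests to the same peer can therefore tell the replies apart by tag value alone, as long as each request used a distinct tag.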

In our Linux MCTP implementation, the kernel can handle almost all of this tag handling automatically, but does need a little information from applications in order to perform request-to-response correlation.

The struct sockaddr_mctp introduced above has a field for passing tag information to the kernel (smctp_tag, the last member):

struct sockaddr_mctp {
    uint16_t            smctp_family;
    uint32_t            smctp_network;
    struct mctp_addr    smctp_addr;
    uint8_t             smctp_type;
    uint8_t             smctp_tag;
};

This smctp_tag field contains both the owner and tag values, and so we also have two macros for interpreting it:

#define MCTP_TAG_MASK		0x07
#define MCTP_TAG_OWNER		0x08

Where MCTP_TAG_MASK is a mask for the tag value, and MCTP_TAG_OWNER is the single-bit owner field.

The kernel has simple logic for tag values on message send: if MCTP_TAG_OWNER is set, the kernel will control the tag value. If it is unset, the kernel will use the value provided.

Most applications will only need two rules for setting the smctp_tag field:

For request messages: set the tag value to MCTP_TAG_OWNER only.
```
addr.smctp_tag = MCTP_TAG_OWNER;
```
This will cause the kernel to allocate a new unique tag for the message. If MCTP_TAG_OWNER is set, the rest of the smctp_tag bits must be zero, otherwise the sendto() system call will fail with EINVAL.
For response messages: set the tag value to the request message's tag, with MCTP_TAG_OWNER cleared.
```
response_addr.smctp_tag = request_addr.smctp_tag & ~MCTP_TAG_OWNER;
```
This will cause the kernel to use the tag value exactly as-is from the sendto() system call, and will allow the recipient to correlate the response message to the original request.

The rest of the smctp_tag field (i.e., the most-significant four bits) must always be set to zero.
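These rules are small enough to capture as helpers. The sketch below models only the rules stated in this section; the function names are hypothetical, and the macros are redefined here only if the system header has not already provided them.

```c
#include <stdbool.h>
#include <stdint.h>

#ifndef MCTP_TAG_MASK
#define MCTP_TAG_MASK   0x07
#endif
#ifndef MCTP_TAG_OWNER
#define MCTP_TAG_OWNER  0x08
#endif

/* smctp_tag for a new request: owner bit only, kernel picks the value */
uint8_t mctp_request_tag(void)
{
    return MCTP_TAG_OWNER;
}

/* smctp_tag for a response: the request's tag value, owner bit cleared */
uint8_t mctp_response_tag(uint8_t request_tag)
{
    return (uint8_t)(request_tag & ~MCTP_TAG_OWNER);
}

/* Check a smctp_tag value against the rules described in this section,
 * before passing it to sendto() */
bool mctp_send_tag_is_valid(uint8_t tag)
{
    /* the most-significant four bits must be zero */
    if (tag & ~(MCTP_TAG_MASK | MCTP_TAG_OWNER))
        return false;

    /* with the owner bit set, the tag value bits must be zero */
    if ((tag & MCTP_TAG_OWNER) && (tag & MCTP_TAG_MASK))
        return false;

    return true;
}
```

For instance, a request received with smctp_tag set to 0x0b (owner bit plus tag value 3) is answered with mctp_response_tag(0x0b), which is 0x03.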

Addressing

MCTP has a fairly limited address space - with EIDs being 8 bits wide, there are only 256 possible endpoint IDs on a single network, and a handful of those values are reserved.

To allow more endpoints than this, the kernel MCTP address structure (struct sockaddr_mctp) also contains a network number. Each network has a distinct set of EIDs, allowing applications to address more endpoints than a single 8-bit EID space permits.
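To put a number on the reserved values: DSP0236 sets aside EID 0 as the null EID, EIDs 1 to 7 as reserved, and 0xff as the broadcast EID. The following sketch (with made-up function names) counts what is left for assignment on one network.

```c
#include <stdbool.h>
#include <stdint.h>

/* Special EID values, per DSP0236 (macro names are illustrative) */
#define EID_NULL            0x00    /* null EID */
#define EID_RESERVED_LAST   0x07    /* 0x01 to 0x07 are reserved */
#define EID_BROADCAST       0xff    /* broadcast EID */

/* Can this EID be assigned to an endpoint? */
bool mctp_eid_is_assignable(uint8_t eid)
{
    return eid > EID_RESERVED_LAST && eid != EID_BROADCAST;
}

/* Count the assignable EIDs within a single network */
unsigned int mctp_assignable_eid_count(void)
{
    unsigned int eid, count = 0;

    for (eid = 0; eid <= 0xff; eid++) {
        if (mctp_eid_is_assignable((uint8_t)eid))
            count++;
    }

    return count;
}
```

This leaves 247 usable EIDs (8 through 254) per network, which is the limit that the network number lets us work around.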

Of course, these networks need to be on physically separate busses, as this addressing scheme does not apply outside of the local system. Each MCTP interface must be on one (and only one) network, and messages will not be forwarded between separate networks - as we wouldn't have unique routing rules across networks where EIDs may be duplicated.

By default, all links start on the default network (network id 1), and so messages can be routed between all interfaces on that network.

It's possible to move a device to a different network using the mctp link set command:

# mctp link set lo network 2
# mctp link
dev lo index 1 address 0x00:00:00:00:00:00 net 2 mtu 65536 up

Since lo is no longer on the default network, messages sent through this interface will need to have the smctp_network field set explicitly:

    addr.smctp_family = AF_MCTP;
    addr.smctp_addr.s_addr = 8;
    addr.smctp_network = 2;     /* send to a non-default network */

Similar semantics apply to the struct sockaddr_mctp passed to the bind() system call. If a specific network is provided, the socket will only receive messages sent on that network. If the value MCTP_NET_ANY is provided, the socket will receive messages for all networks.
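As a mental model of these bind() semantics, the sketch below decides whether an incoming message would be delivered to a bound socket. It is a simplified model of the behaviour described in this document, not the kernel's lookup code; the struct and function names are hypothetical, and the fallback macro values are assumed to match linux/mctp.h.

```c
#include <stdbool.h>
#include <stdint.h>

#ifndef MCTP_NET_ANY
#define MCTP_NET_ANY    0x0
#endif
#ifndef MCTP_ADDR_ANY
#define MCTP_ADDR_ANY   0xff
#endif

/* The parts of a bound struct sockaddr_mctp that affect delivery */
struct mctp_bind {
    unsigned int net;   /* smctp_network */
    uint8_t addr;       /* smctp_addr.s_addr (local EID) */
    uint8_t type;       /* smctp_type */
};

/* Would a message arriving on network msg_net, addressed to local EID
 * msg_dest and carrying message type msg_type, match this bind? */
bool mctp_bind_matches(const struct mctp_bind *b, unsigned int msg_net,
                       uint8_t msg_dest, uint8_t msg_type)
{
    /* a specific network only matches messages from that network */
    if (b->net != MCTP_NET_ANY && b->net != msg_net)
        return false;

    /* a specific local EID only matches messages addressed to it */
    if (b->addr != MCTP_ADDR_ANY && b->addr != msg_dest)
        return false;

    /* the message type must always match */
    return b->type == msg_type;
}
```

Under this model, the responder shown earlier (MCTP_ADDR_ANY, type 5, network left at zero) matches type-5 messages on every network, while setting net to 2 restricts it to the network that lo was moved to.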

That covers the fundamental parts of the MCTP implementation under Linux. We'll continue this series with some more details in future posts. Stay tuned!

If you have any queries around the MCTP infrastructure or recent developments, please feel free to get in touch by sending an email - jk@codeconstruct.com.au.

MCTP on Linux introduction