MCTP on Linux introduction
At Code Construct, we have been working on support for the Management Component Transport Protocol (MCTP) on Linux systems, to the point where it's becoming generally useful for production server environments. To help with that, we have put together a few details in this introductory document.
MCTP?
In case you're not already familiar with MCTP, it's a fairly lightweight protocol defining a method of communication between components typically found on a server system. Some typical uses are:
- control and configuration messages between a system's Baseboard Management Controller (BMC) and the main CPUs / firmware / operating system;
- management of flash storage through the NVMe Management Interface, using MCTP over an i2c channel to the BMC; or
- control and monitoring of sensor and effector devices using a standard on-the-wire format.
There are a few different hardware transports used for MCTP messaging, with the most conventional being i2c/SMBus, PCIe and serial.
The core attributes of the protocol:
- Endpoints are addressed by an 8-bit endpoint ID (EID). EIDs are unique across an MCTP network.
- Data is sent as messages. Messages may be fragmented into packets if they are larger than the maximum size supported by the hardware transport.
- As well as having a source and destination EID, the packet header includes a tag and a tag-owner field. The tag is a three-bit value assigned by the initiator of the message. The tag-owner is a single bit, indicating whether the tag was generated by the sending or receiving endpoint. Typically, the tag owner bit will be set to 1 on a request message, and 0 on the response message, and the tag value will be the same on both.
- Each MCTP message has a type as the first byte of the message data. This type value indicates the protocol of the message contents. For example, type 0x1 represents a PLDM message, and type 0x4 represents an NVMe Management Interface message.
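To make the header fields above concrete, here is a minimal sketch of packing and unpacking the four-byte MCTP transport header. The layout follows our reading of DSP0236: a header version, the destination and source EIDs, then a flags byte holding start-of-message and end-of-message bits, a two-bit packet sequence number, the tag-owner bit and the three-bit tag. The `struct mctp_hdr_fields` type and both function names are hypothetical, and are not taken from the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, unpacked view of the MCTP transport header fields. */
struct mctp_hdr_fields {
	uint8_t version;	/* header version: low four bits of byte 0 */
	uint8_t dest;		/* destination EID */
	uint8_t src;		/* source EID */
	bool som;		/* start of message: set on the first packet */
	bool eom;		/* end of message: set on the last packet */
	uint8_t seq;		/* two-bit packet sequence number */
	bool tag_owner;		/* tag owner (TO) bit */
	uint8_t tag;		/* three-bit message tag */
};

/* Pack the fields into the four-byte on-the-wire header. */
static void mctp_hdr_pack(const struct mctp_hdr_fields *f, uint8_t out[4])
{
	out[0] = f->version & 0x0f;
	out[1] = f->dest;
	out[2] = f->src;
	out[3] = (uint8_t)((f->som ? 0x80 : 0) |
			   (f->eom ? 0x40 : 0) |
			   ((f->seq & 0x03) << 4) |
			   (f->tag_owner ? 0x08 : 0) |
			   (f->tag & 0x07));
}

/* Unpack a four-byte on-the-wire header into its fields. */
static void mctp_hdr_unpack(const uint8_t in[4], struct mctp_hdr_fields *f)
{
	f->version = in[0] & 0x0f;
	f->dest = in[1];
	f->src = in[2];
	f->som = (in[3] & 0x80) != 0;
	f->eom = (in[3] & 0x40) != 0;
	f->seq = (in[3] >> 4) & 0x03;
	f->tag_owner = (in[3] & 0x08) != 0;
	f->tag = in[3] & 0x07;
}
```

Applications using the sockets interface never build this header themselves; the sketch is only meant to show where the EIDs, tag and tag-owner bit sit on the wire.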
If you're after further details, the MCTP protocol is defined by a set of standards produced by DMTF, each with their own "DSPxxx" identifier, for easy searching. The main ones relevant to us here:
- The MCTP Base Specification (DSP0236) defines the core transport protocol, as well as the "control protocol" - a set of base messages that can be used to configure and control communication between endpoint devices.
- Various "transport binding" specifications, like MCTP over PCIe (DSP0238) and MCTP over i2c/SMBus (DSP0237).
- Specifications for various upper-layer protocols that can be run over MCTP, like PLDM over MCTP (DSP0241).
MCTP support in Linux
As of kernel version 5.15, Linux has a protocol definition for MCTP, added via the initial patchset. With that code enabled, you can create standard sockets that allow communication to other endpoints using MCTP.
Version 5.16 added some improvements for the MCTP core, including infrastructure for managing flows of messages.
Version 5.17 will include the first set of device drivers for transferring MCTP packets over physical hardware.
Update [2022-03-22]: The i2c transport driver has now been queued for v5.18.
As a handy table:
Kernel version | MCTP support |
---|---|
5.15 | MCTP core protocol |
5.16 | Core protocol improvements, extended addressing, flow support |
5.17 | Initial transport drivers (serial) |
5.18 | Further transport drivers (i2c) |
Our development branches
While we're working on upstreaming the MCTP code, we have a set of branches in the Code Construct linux repo, which contain in-progress changes to the MCTP core.
- The `dev/mctp` branch contains any pending patches for the MCTP core code. This is against recent upstream master, and will be occasionally rebased as upstream progresses.
- The `dev/mctp-i2c` branch is like the above, plus the i2c transport controller support.
We also have published backports branches, which provide MCTP support for recent stable and longterm kernels, which may be used by various OpenBMC platforms:
Using MCTP on Linux
Firstly, you'll need a kernel with MCTP support - this is enabled with the `CONFIG_MCTP` build-time option. You can check that the protocol is available through the `/proc/net/protocols` file, which should contain an entry for MCTP:
# grep MCTP /proc/net/protocols
MCTP 872 0 -1 NI 0 no kernel […]
If this isn't present, you'll need to ensure that the kernel was built with `CONFIG_MCTP` enabled and, if you've built the MCTP support as a module, that the `mctp.ko` module is loaded.
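The same check can be made from a program. As a minimal sketch, the hypothetical helper below scans a protocols file for a line whose first field matches a given name; passing `/proc/net/protocols` and `"MCTP"` mirrors the `grep` shown earlier. It assumes that each line of the file starts with the protocol name, as in the sample output.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Return true if the file at `path` contains a line whose first
 * whitespace-delimited field is exactly `name` (for example "MCTP").
 * Returns false if the file cannot be opened. */
static bool protocol_listed(const char *path, const char *name)
{
	char line[512], field[64];
	bool found = false;
	FILE *f;

	f = fopen(path, "r");
	if (!f)
		return false;

	while (fgets(line, sizeof(line), f)) {
		/* Compare only the first field, so "MCTP" does not match
		 * some other protocol that merely contains that string. */
		if (sscanf(line, "%63s", field) == 1 && !strcmp(field, name)) {
			found = true;
			break;
		}
	}

	fclose(f);
	return found;
}
```

For example, `protocol_listed("/proc/net/protocols", "MCTP")` would be expected to return true on a kernel with the MCTP protocol registered.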
Controlling the MCTP stack
Like other network protocols, the kernel's MCTP stack is configured using the netlink interface, and so requires some basic tools to control the stack state. For this, we've developed a set of simple utilities, published to the repository at https://github.com/CodeConstruct/mctp. The main tool provided in this code is a small command-line utility called `mctp`, which works in a similar way to the `ip` utility for IP-based networking.
The commonly-used commands for the `mctp` tool are:
- `mctp link`: configures/enables/disables local interfaces
- `mctp address`: configures addresses on local interfaces
- `mctp route`: configures the MCTP routing table
These commands may be abbreviated - `mctp addr` will work the same as `mctp address`.
For example, to configure a local interface (named `mctpi2c1`) with a local EID of 8, and bring up the link:
mctp addr add 8 dev mctpi2c1
mctp link set mctpi2c1 up
We can also tell the kernel about routes to remote endpoints. To configure the routing table for a remote endpoint with EID 9 attached to the `mctpi2c1` interface:
mctp route add 9 via mctpi2c1
Certain link types also need to know the physical address of endpoints on the same bus. For example, the i2c transport needs the i2c address of MCTP-enabled clients on the bus. We can use the `mctp neighbour` command to update physical addressing information. To update the neighbour table to indicate that EID 9 uses a physical address of `0x1d`:
mctp neigh add 9 dev mctpi2c1 lladdr 0x1d
However, there's also a utility, `mctpd`, which allows the local machine to discover remote endpoints using the MCTP Control Protocol, and automatically configure the route and neighbour tables for each discovered endpoint. We'll cover the details of `mctpd` in a later document.
Hardware & interface configuration
An MCTP-enabled system isn't much use without hardware interfaces, as these provide the facility to communicate with other endpoints. The method of defining MCTP interfaces will depend on the hardware type.
i2c/SMBus interfaces
i2c/SMBus endpoints are defined through the kernel device tree, just like any other i2c client device. In our case, the local MCTP endpoint also carries its own hardware address, which we need to pass too.
MCTP-over-i2c interfaces are defined as i2c client nodes, using the `mctp-i2c-controller` compatible value.
A typical local i2c transport might look like this:
/* Our local i2c controller device */
&i2c6 {
	status = "okay";
	/* Mark this device as a MCTP controller. For any non-top-level i2c
	 * controllers (eg, downstream ports of a multiplexer), this property
	 * will create a new interface for the subordinate bus, linked to the
	 * "real" MCTP interface at the top-level */
	mctp-controller;
	/* The MCTP interface itself, at i2c address 0x10. This will be named
	 * mctpi2cN, where N is the index of the i2c controller. */
	mctp@10 {
		compatible = "mctp-i2c-controller";
		reg = <(0x10 | I2C_OWN_SLAVE_ADDRESS)>;
	};
};
Serial interfaces
The MCTP-over-serial support is provided as a new tty line discipline. To create a new MCTP interface over an existing serial device:
mctp link serial /dev/ttyS0
This will create a new MCTP interface, named `mctpserialN`, allowing MCTP communication over the specified serial device. This process will block, as the line discipline will be active only while the serial device's file descriptor remains open. Consequently, you may want to start this from a systemd/init service.
Developing MCTP applications
Now that we're able to set up our MCTP stack under Linux, we'll want to send and receive messages to/from other MCTP-enabled hardware. Like other network protocols, the physical addressing, routing and transport layer functions are handled by the kernel; applications will typically just need to be aware of their peers' EIDs. However, see the Addressing section below for a few extra details about that.
This is all done over the standard sockets API, plus a couple of small MCTP-specific definitions.
Sockets API
Given we're implementing a network protocol here, it makes sense to use the existing sockets API for sending and receiving MCTP messages.
To allow this, the MCTP support in Linux exists as a new network protocol definition, like IP, or CAN. This allows userspace programs to use the usual sockets API: `socket()` to create a new socket descriptor which can then be used to send and receive messages.
MCTP sockets are all datagram-oriented (i.e., use `SOCK_DGRAM` as the socket type), so message boundaries are preserved and will correspond to the buffers passed to and from the kernel by userspace. Being a datagram socket, the `send`/`sendto`/`sendmsg` syscalls are used for message transmit, and the `recv`/`recvfrom`/`recvmsg` syscalls are used for receive.
The main protocol-specific parts of the API are two new definitions:
- a new address family, `AF_MCTP`.
- a new address format, `struct sockaddr_mctp`, defined as:
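struct mctp_addr {
	uint8_t			s_addr;
};
struct sockaddr_mctp {
	unsigned short int	smctp_family;
	unsigned int		smctp_network;
	struct mctp_addr	smctp_addr;
	uint8_t			smctp_type;
	uint8_t			smctp_tag;
};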
Note that the kernel definition of `struct sockaddr_mctp` has some explicit padding fields, and more kernel-specific type definitions. We've simplified those here a little, but the definition above will work as-is.
Other than these, the rest of the sockets API can be used as-is. Here's a small example that transmits a single MCTP message:
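A transmit of this kind can be sketched as follows. This is a minimal sketch rather than a definitive implementation: it assumes kernel headers that provide `<linux/mctp.h>` (v5.15 or later), and the helper names `mctp_request_addr()` and `mctp_send_request()` are hypothetical. The address is zeroed before use so that the kernel structure's padding fields are cleared, and the tag field is set to `MCTP_TAG_OWNER` so that the kernel allocates a tag for the message.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>
#include <linux/mctp.h>

#ifndef AF_MCTP
#define AF_MCTP 45	/* kernel's address family number, for older libcs */
#endif

/* Hypothetical helper: build the destination address for a request
 * message of the given type, sent to the given remote EID. */
static struct sockaddr_mctp mctp_request_addr(uint8_t eid, uint8_t type)
{
	struct sockaddr_mctp addr;

	memset(&addr, 0, sizeof(addr));		/* also clears the padding */
	addr.smctp_family = AF_MCTP;
	addr.smctp_network = MCTP_NET_ANY;	/* use the default network */
	addr.smctp_addr.s_addr = eid;		/* remote endpoint ID */
	addr.smctp_type = type;			/* MCTP message type */
	addr.smctp_tag = MCTP_TAG_OWNER;	/* kernel allocates the tag */
	return addr;
}

/* Hypothetical helper: send one message. Returns 0 if the whole buffer
 * was sent, or -1 on failure. */
static int mctp_send_request(uint8_t eid, uint8_t type,
			     const void *buf, size_t len)
{
	struct sockaddr_mctp addr = mctp_request_addr(eid, type);
	ssize_t rc;
	int sd;

	sd = socket(AF_MCTP, SOCK_DGRAM, 0);
	if (sd < 0)
		return -1;

	/* The type byte is not part of buf; it travels in smctp_type. */
	rc = sendto(sd, buf, len, 0,
		    (struct sockaddr *)&addr, sizeof(addr));
	close(sd);

	return rc == (ssize_t)len ? 0 : -1;
}
```

With these helpers, `mctp_send_request(9, 5, "hello, world!", 13)` would send a 13-byte message of type 5 to EID 9; both values are purely illustrative.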
Note that we have passed the message type in the `smctp_type` field, and have not included it in the message data (passed to `sendto`). The kernel will construct the correct message format by prepending this `smctp_type` byte to the message contents.
The message transmitted here is only 13 bytes long, but if fragmentation is required to suit the maximum size limit of the hardware transport, the kernel will perform the packetisation automatically.
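To illustrate the packetisation that the kernel performs on our behalf, here is a minimal sketch that splits a message length into per-packet payloads. It is not the kernel's code. It assumes that the interface MTU includes the four-byte MCTP transport header, that the message length counts the leading type byte, and that sequence numbers start at 0 and wrap modulo 4. The `struct mctp_fragment` type and `mctp_fragment()` function are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define MCTP_HDR_LEN	4	/* transport header carried by every packet */
#define MCTP_FLAG_SOM	0x80	/* start of message */
#define MCTP_FLAG_EOM	0x40	/* end of message */

/* Hypothetical description of one packet of a fragmented message. */
struct mctp_fragment {
	size_t offset;	/* offset of this packet's payload in the message */
	size_t len;	/* payload bytes carried by this packet */
	uint8_t flags;	/* SOM / EOM bits, plus the sequence number << 4 */
};

/* Describe the packets needed to carry msg_len bytes over a link with
 * the given MTU. Up to max_frags entries of frags[] are filled in.
 * Returns the number of packets required, or 0 for invalid input. */
static size_t mctp_fragment(size_t msg_len, size_t mtu,
			    struct mctp_fragment *frags, size_t max_frags)
{
	size_t payload, off = 0, n = 0;

	if (mtu <= MCTP_HDR_LEN || msg_len == 0)
		return 0;
	payload = mtu - MCTP_HDR_LEN;

	while (off < msg_len) {
		size_t len = msg_len - off;

		if (len > payload)
			len = payload;

		if (n < max_frags) {
			frags[n].offset = off;
			frags[n].len = len;
			frags[n].flags = (uint8_t)(
				(off == 0 ? MCTP_FLAG_SOM : 0) |
				(off + len == msg_len ? MCTP_FLAG_EOM : 0) |
				((n & 0x3) << 4));
		}
		off += len;
		n++;
	}

	return n;
}
```

Under these assumptions, the 13-byte message plus its type byte fits in a single packet on a 68-byte MTU link, with both the start-of-message and end-of-message bits set.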
Extending on this, here's a small MCTP "responder", which receives incoming messages (of a fictional type 5), and echoes the message data back to the original sender:
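A responder along these lines can be sketched as below. Again, this is a minimal sketch under stated assumptions rather than a definitive implementation: it assumes `<linux/mctp.h>` is available, the helper names are hypothetical, and the 4096-byte buffer is an arbitrary choice (longer messages would be truncated). The socket is bound to the message type of interest, and each reply reuses the address returned by `recvfrom()`, with the tag owner bit cleared.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>
#include <linux/mctp.h>

#ifndef AF_MCTP
#define AF_MCTP 45	/* kernel's address family number, for older libcs */
#endif

/* Hypothetical helper: address to bind() to, in order to receive
 * messages of `type` for any local EID, on any network. */
static struct sockaddr_mctp mctp_listen_addr(uint8_t type)
{
	struct sockaddr_mctp addr;

	memset(&addr, 0, sizeof(addr));		/* also clears the padding */
	addr.smctp_family = AF_MCTP;
	addr.smctp_network = MCTP_NET_ANY;
	addr.smctp_addr.s_addr = MCTP_ADDR_ANY;
	addr.smctp_type = type;
	return addr;
}

/* Hypothetical helper: turn the source address reported by recvfrom()
 * into a reply destination, keeping the tag value but clearing TO. */
static void mctp_make_reply_addr(struct sockaddr_mctp *addr)
{
	addr->smctp_tag &= (uint8_t)~MCTP_TAG_OWNER;
}

/* Echo every received message of `type` back to its sender. Only
 * returns (with -1) once a socket operation fails. */
static int mctp_echo_responder(uint8_t type)
{
	struct sockaddr_mctp addr = mctp_listen_addr(type);
	unsigned char buf[4096];
	socklen_t addrlen;
	ssize_t len;
	int sd;

	sd = socket(AF_MCTP, SOCK_DGRAM, 0);
	if (sd < 0)
		return -1;

	if (bind(sd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		close(sd);
		return -1;
	}

	for (;;) {
		/* addr is overwritten with the sender's address */
		addrlen = sizeof(addr);
		len = recvfrom(sd, buf, sizeof(buf), 0,
			       (struct sockaddr *)&addr, &addrlen);
		if (len < 0)
			break;

		mctp_make_reply_addr(&addr);
		if (sendto(sd, buf, (size_t)len, 0,
			   (struct sockaddr *)&addr, sizeof(addr)) != len)
			break;
	}

	close(sd);
	return -1;
}
```

Calling `mctp_echo_responder(5)` would serve the fictional type 5 described above.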
Tag handling
As mentioned above, MCTP packets have two header fields related to tags:
- A three-bit tag value field
- A one-bit tag owner field, also known as `TO`.
These provide a basic method of both correlating packets of a fragmented message, and correlating request messages with their replies. The general semantics of these fields are:
- When a requester sends a message to a responder, it chooses a tag value, where the (source EID, dest EID, `TO`, tag value) tuple is unique.
- When sending this message, since the requester generated the tag value, it sets the tag owner field to 1.
- When a responder generates a reply to the above message, it uses the same tag value as seen in the request message(s), but sets the tag owner field to 0.
In our Linux MCTP implementation, the kernel handles almost all of this automatically, but does need a little information from applications in order to perform request-to-response correlation.
The `struct sockaddr_mctp` introduced above has a field for passing tag information to the kernel, highlighted here:
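struct sockaddr_mctp {
	unsigned short int	smctp_family;
	unsigned int		smctp_network;
	struct mctp_addr	smctp_addr;
	uint8_t			smctp_type;
	uint8_t			smctp_tag;	/* tag owner bit and tag value */
};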
This `smctp_tag` field contains both the owner and tag values, and so we also have two macros for interpreting it:
#define MCTP_TAG_MASK	0x07
#define MCTP_TAG_OWNER	0x08
Where `MCTP_TAG_MASK` is a mask for the tag value, and `MCTP_TAG_OWNER` is the single-bit owner field.
The kernel has simple logic for tag values on message send: if `MCTP_TAG_OWNER` is set, the kernel will control the tag value. If it is unset, the kernel will use the value provided.
Most applications will only need two rules for setting the `smctp_tag` field:
- For request messages: set the tag value to `MCTP_TAG_OWNER` only, as in `addr.smctp_tag = MCTP_TAG_OWNER;`. This will cause the kernel to allocate a new unique tag for the message. If `MCTP_TAG_OWNER` is set, the rest of the `smctp_tag` bits must be zero, otherwise the `sendto()` system call will fail with `EINVAL`.
- For response messages: set the tag value to the request message's tag, with `MCTP_TAG_OWNER` cleared, as in `response_addr.smctp_tag = request_addr.smctp_tag & ~MCTP_TAG_OWNER;`. This will cause the kernel to use the tag value exactly as passed to the `sendto()` system call, and will allow the recipient to correlate the response message to the original request.
The rest of the `smctp_tag` field (i.e., the most-significant four bits) must always be set to zero.
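The tag rules above can be captured in a few small helpers. This is a minimal sketch: the function names are hypothetical, and the two macro values are assumed to match the kernel's `<linux/mctp.h>` (they are only defined here if that header has not already provided them). The validity check reflects the rules as described in this section, not the kernel's actual code.

```c
#include <stdbool.h>
#include <stdint.h>

#ifndef MCTP_TAG_MASK
#define MCTP_TAG_MASK	0x07	/* three-bit tag value */
#define MCTP_TAG_OWNER	0x08	/* single-bit tag owner (TO) */
#endif

/* Tag field for a request: owner bit only, the kernel picks the value. */
static uint8_t mctp_request_tag(void)
{
	return MCTP_TAG_OWNER;
}

/* Tag field for a response: the request's tag, with the owner bit cleared. */
static uint8_t mctp_response_tag(uint8_t request_tag)
{
	return (uint8_t)(request_tag & ~MCTP_TAG_OWNER);
}

/* Check a tag field against the rules described above, before sending. */
static bool mctp_send_tag_ok(uint8_t tag)
{
	/* the most-significant four bits must always be zero */
	if (tag & ~(MCTP_TAG_MASK | MCTP_TAG_OWNER))
		return false;

	/* with the owner bit set, the tag value bits must be zero */
	if ((tag & MCTP_TAG_OWNER) && (tag & MCTP_TAG_MASK))
		return false;

	return true;
}
```

For instance, a request received with a tag field of `MCTP_TAG_OWNER | 3` would be answered with a tag field of 3.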
Addressing
MCTP has a fairly limited address space - with EIDs being 8 bits wide, we can only have 255 endpoint IDs on a single network (minus a handful for reserved EID values).
To allow more than these ~255 endpoints, the kernel MCTP address structure (`struct sockaddr_mctp`) also contains a network number. Each network has a distinct set of EIDs, allowing applications to address more than the limit of 255 EIDs.
Of course, these networks need to be on physically separate busses, as this addressing scheme does not apply outside of the local system. Each MCTP interface must be on one (and only one) network, and messages will not be forwarded between separate networks - as we wouldn't have unique routing rules across networks where EIDs may be duplicated.
By default, all links start on the default network (network id 1), and so messages can be routed between all interfaces on that network.
It's possible to move a device to a different network using the `mctp link set` command:
# mctp link set lo network 2
# mctp link
dev lo index 1 address 0x00:00:00:00:00:00 net 2 mtu 65536 up
Since `lo` is no longer on the default network, messages sent through this interface will need to have the `smctp_network` field set explicitly:
addr.smctp_family = AF_MCTP;
addr.smctp_addr.s_addr = 8;
addr.smctp_network = 2; /* send to a non-default network */
Similar semantics apply to the `struct sockaddr_mctp` passed to the `bind()` system call. If a specific network is provided, the socket will only receive messages sent on that network. If the value `MCTP_NET_ANY` is provided, the socket will receive messages for all networks.
That covers the fundamental parts of the MCTP implementation under Linux. We'll continue this series with some more details in future posts. Stay tuned!
If you have any queries around the MCTP infrastructure or recent developments, please feel free to get in touch by sending an email - jk@codeconstruct.com.au.