Linux 6.0 released, Bootlin contributions

Linux 6.0 has been released two weeks ago, and Linux 6.1-rc1 is already out of the door, but we didn’t get the chance to look at the contributions made by Bootlin to the Linux 6.0 release. Before we do that, let’s provide our usual must-read articles on Linux 6.0: the Linux 6.0 merge window part 1 and Linux 6.0 merge window part 2 LWN.net articles and the KernelNewbies.org article.

On Bootlin side, our significant contributions to this release have been:

  • Clément Léger contributed a new driver for the Ethernet switch found in the Renesas RZ/N1 processor, as well as a PCS driver for the MII converter of the same processor. Obviously, this came with the related Device Tree bindings and Device Tree changes, but also with a few small changes in the DSA subsystem.
  • Hervé Codina enabled support for the PCIe controller found in the same Renesas RZ/N1 processor, which in fact does not allow to use PCIe devices, but USB devices: this PCIe controller is only used to connect to an internal USB controller in the chip, which therefore allows to use USB devices.
  • Köry Maincent extended the existing mpc4922 DAC IIO driver to also support the mpc4921 variant, which has only one output channel instead of two.
  • Luca Ceresoli contributed several improvements to the I2C subsystem documentation.
  • Paul Kocialkowski contributed a new DRM driver for the logiCVC-ML display controller IP
  • Paul Kocialkowski contributed two new V4L drivers for the MIPI CSI-2 camera interfaces available in the Allwinner A31 family of processors (sun6i) and the Allwinner A83T family of processors (sun8i).

Here is the full details of our contributions, commit by commit:

A journey in the RTC subsystem

As part of a team effort to improve the upstream Linux kernel support for the Renesas RZ/N1 ARM processor, we had to write from scratch a new RTC driver for this SoC. The RTC subsystem API is rather straightforward but, as most kernel subsystems, the documentation about it is rather sparse. So what are the steps to write a basic RTC driver? Here are some pointers.

The registration

The core expects drivers to allocate, initialize and then register a struct rtc_device with the device managed helpers: devm_rtc_allocate_device() and devm_rtc_register_device(). Between these two function calls, one will be required to provide at least a set of struct rtc_class_ops which contains the various callbacks used to access the device from the core, as well as setting a few information about the device.

The kind of information expected is the support for various features (rtcdev->features bitmap) as well as the maximum continuous time range supported by your RTC. If you do not know the actual date after which your device stops being reliable, you can use the rtc-range test tool from rtc-tools, available at https://git.kernel.org/pub/scm/linux/kernel/git/abelloni/rtc-tools.git (also available as a Buildroot package). It will check the consistency of your driver against a number of common known-to-be-failing situations.

Time handling

The most basic operations to provide are ->read_time() and ->set_time(). Both functions should play with a struct rtc_time which describes time and date with members for the year, month, day of the month, hours (in 24-hour mode), minutes and seconds. The week day member is ignored by userspace and is not expected to be set properly, unless it is actively used by the RTC, for example to set alarms. There are then three popular ways of storing time in the RTC world:

  1. either using the binary values of each of these fields
  2. or using a Binary Coded Decimal (BCD) version of these fields
  3. or, finally, by storing a timestamp in seconds since the epoch

In BCD, each decimal digit is encoded using four bits, eg. the number 12 could either be coded by 0x0C in hexadecimal, or 0x12 in BCD, which is easier to read with a human eye.

The three representations are absolutely equivalent and you are free to convert the time from one system to another when needed:

  • #1 <-> #2 conversions are done with bcd2bin() and bin2bcd() (from linux/bcd.h)
  • #1 <-> #3 conversions are done with rtc_time64_to_tm() and rtc_tm_to_time64() (from linux/rtc.h)

While debugging, it is likely that you will end up dumping these time structures. Note that struct rtc_time is aligned on struct tm, this means that the year field is the number of years since 1900 and the month field is the number of months since January, in the range 0 to 11. Anyway, dumping these fields manually is a loss of time, it is advised instead to use the dedicated RTC printk specifiers which will handle the conversion for you: %ptR for a struct rtc_time, %ptT for a time64_t.

Of course, when reading the actual time from multiple registers on the device and filling those fields, be aware that you should handle possible wrapping situations. Either the device has an internal latching mechanism for that (eg. the front-end of the registers that you must read are all frozen upon a specific action) or you need to verify this manually by, for instance, monitoring the seconds register and try another read if it changed between the beginning and the end of the retrieval.

If your device continuous time range ended before 2000 you may want to shift the default hardware range further by providing the start-year device tree property. The core will then shift the Epoch further for you.

Finally, once done, you can verify your implementation by playing with the rtc test tool (also from rtc-tools).

Supporting alarms

One common RTC feature is the ability to trigger alarms at specific times. Of course it’s even better if your RTC can wake-up the system.

If the device or the way it is integrated doesn’t support alarms, this should be advertised at registration time by clearing the relevant bit (RTC_FEATURE_ALARM, RTC_FEATURE_UPDATE_INTERRUPT). In the other situations, it is relevant to indicate whether the RTC has a second, 2-seconds or minute resolution by setting the appropriate flag (RTC_FEATURE_ALARM_RES_2S, RTC_FEATURE_ALARM_RES_MINUTE). Mind when testing that querying an alarm time below this resolution will return a -ETIME error.

When implementing the ->read_alarm(), ->set_alarm() and ->alarm_irq_enable() hooks, be aware that the update and periodic alarms are now implemented in the core, using HR timers rather than with the RTC so you should focus on the regular alarm. The read/set hooks naturally allow to read and change the alarm settings. A struct rtc_wkalrm *alrm is passed as parameter, alrm->time is the struct rtc_time and alrm->enabled the state of the alarm (which must be set in ->set_alarm()). The third hook is an asynchronous way to enable/disable the alarm IRQ.

The interrupt handler for the alarm is required to call rtc_update_irq() to signal the core that an alarm happened, providing the RTC device, the number of alarms reported (usually one), and the RTC_IRQF flag OR’ed with the relevant alarm flag (likely, RTC_AF for the main alarm).

Oscillator offset compensation

RTC counters rely on very precise clock sources to deliver accurate times. To handle the situation where the source is not matching the expected precision, which is the case with most cheap oscillators on the market, some RTCs have a mechanism allowing to compensate for the frequency variation by incrementing or skipping the RTC counters at a regular interval in order to get closer to the reality.

The RTC subsystem offers a set of callbacks, ->read_offset() and a ->set_offset(), where a signed offset is passed in ppb (parts per billion).

As an example, if an oscillator is below its targeted frequency of 32768 Hz and is measured to run at 32767.7 Hz, we need to offset the counter by 1 - (32767.7/32768) = 9155 ppb. If the RTC is capable of offsetting the main counter once every 20s it means that every 20s, this counter (which gets decremented at the frequency of the oscillator to produce the “seconds”) will start at a different value than 32768. Adding 1 to this counter every 20s would basically mean earning 1 / (32768 * 20) = 1526 ppb. Our target being 9155 ppb, we must offset the counter by 9155 / 1526 = 6 every 20s to get a compensated rate of 32767.7 + (6 / 20) = 32768 Hz.

Upstreaming status of the RZ/N1 RTC driver

The RZ/N1 RTC driver has all the features listed above and made its way into the v5.18 Linux kernel release. Hopefully this little reference sheet will encourage others to finalize and send new RTC drivers upstream!

Bootlin at Linux Plumbers conference 2022

Next week, almost the entire Bootlin team will be at the Embedded Linux Conference Europe in Dublin, see our previous blog post on this topic. We will give four talks at this event, on a variety of Linux kernel and embedded Linux topics.

During the same week, also in Dublin albeit in a different location, will take place the Linux Plumbers conference. Bootlin engineer Miquèl Raynal will give a talk at Linux Plumbers, as part of the IoTs a 4-Letter Word micro-conference. Miquèl’s talk will discuss Linux IEEE 802.15.4 MLME improvements, as Miquèl has been working for several months on bringing improvements to the 802.15.4 stack in the Linux kernel.

Linux 5.19 released, Bootlin contributions inside

Linux 5.19 has been released yesterday. We recommend the usual resources of LWN (part 1 and part 2) as well as KernelNewbies to get some high-level overview of the major additions. CNX-Software also has an article focused on the ARM/RISC-V/MIPS improvements.

At Bootlin, we contributed 68 patches to this release, the main highlights being:

  • Clément Léger contributed patches for the Microchip SAMA5 platform to support suspend operation while running in non-secure mode, with OP-TEE handling the necessary PCSI calls. This is related to our work to port OP-TEE on Microchip SAMA5D2, which we have covered in several blog posts before.
  • Hervé Codina contributed device Tree updates to enable the PCI controller of the Renesas RZ/N1 platform, which allows to access the USB host controller that sits on an internal PCI bus. Some driver updates for the PCI driver are needed, and they will land in 5.206.0 kernel.
  • Miquèl Raynal contributed several improvements to the IIO subsystem, following his work on several IIO drivers and his related blog post. These improvements either touch the core IIO, or fix some incorrect API use in IIO drivers.
  • Miquèl Raynal contributed a new driver for the Renesas RZ/N1 DMA router (in drivers/dmaengine) as well as a new driver for the Renesas RZ/N1 Real Time Clock (in drivers/rtc). In addition, Miquèl modified the 8250 UART controller driver to be able to use the DMA capabilities available on the RZ/N1 processor.
  • Miquèl Raynal also contributed a number of improvements to the IEEE 802.15.4 stack in the Linux kernel.
  • Paul Kocialkowski contributed support for MIPI CSI-2 in the Allwinner phy-sun6i-mipi-dphy driver.
  • Paul Kocialkowski and Luca Ceresoli contributed a few misc fixes, touching the SPI core and SPI Rockchip driver and the dmaengine documentation.

The complete details of our contributions are:

Linux 5.18 released, Bootlin contributions inside

Linux 5.18 has been released a bit over a week ago. As usual, we recommend the resources provided by LWN.net (part 1 and part 2) and KernelNewbies.org to get an overall view of the major features and improvements of this Linux kernel release.

Bootlin engineers have collectively contributed 80 patches to this Linux kernel release, making us the 28th contributing company according to these statistics.

  • Alexandre Belloni, as the RTC subsystem maintainer, continued to improve the overall subsystem, and migrate drivers to new features and mechanisms introduced in the core RTC subsystem
  • Clément Léger contributed a new RTC driver that allows to use the RTC exposed by the OP-TEE Trusted Execution Environment, as well as a few other fixes
  • Hervé Codina and Luca Ceresoli contributed some fixes: Hervé to the dw-edma dmaengine driver, and Luca to the Rockchip RK3308 pinctrl driver
  • Miquèl Raynal, as the MTD subsystem co-maintainer, contributed the remainder of his work to generalize the support of ECC handling, and allow both parallel and SPI NAND to use either software ECC, on-die ECC, or ECC done by a dedicated controller. Included in this work is a new driver for the Macronix external ECC engine, in drivers/mtd/nand/ecc-mxic.c
  • Miquèl Raynal also made a few contributions to the 802.15.4 part of the networking stack, and we have more contributions in this area coming up.
  • Paul Kocialkowski contributed a small fix to Allwinner Device Tree files, and another attempt at fixing an issue with the display panel detection/probing in the DRM subsystem

Linux 5.17 released: Bootlin contributions

Linux 5.17 has been released last Sunday. As usual, the best coverage of what is part of this release comes from LWN (part 1 and part 2), as well as KernelNewbies (unresponsive at the time of this writing) or CNX Software (for an ARM/RISC-V/MIPS focused description).

Bootlin contributed just 34 patches to this release, which isn’t a lot by the number of patches, but in fact includes a number of important new features. Also, we have many more contributions being discussed on the mailing lists or in preparation. For this 5.17 release here are the highlights of our contributions:

  • Alexandre Belloni, as the maintainer of the RTC subsystem, contributed one improvement to an RTC driver
  • Clément Léger improved the Microchip Ocelot Ethernet switch driver performance by implementing FDMA support. This allows network packets that are going from the switch to the CPU, or from the CPU to the switch to be received/sent in a much more efficient fashion than before. The Microchip Ocelot Ethernet switch driver was developed and upstreamed several years ago by Bootlin, see our previous blog post.
  • Clément Léger also contributed smaller fixes: a bug fix in the core software node code, and one PHY driver fix.
  • Hervé Codina implemented support for GPIO interrupts on the old ST Spear320 platform.
  • Maxime Chevallier contributed mqprio support to the Marvell Ethernet MAC mvneta driver, which was the topic of a previous blog post
  • Miquèl Raynal contributed a brand new NAND controller driver, for the NAND controller found in the Renesas RZ/N1 SoC. We expect to contribute to many more aspects of the Renesas RZ/N1 Linux kernel support in the next few months.
  • Miquèl Raynal contributed a few Device Tree changes enabling the ADC on the Texas Instruments AM473x platform, after contributing the driver changes a few releases ago.
  • Miquèl Raynal started contributing some improvements to the 802.15.4 Linux kernel stack, and we also have many more changes in the pipe for this Linux kernel subsystem.
  • Thomas Perrot added support for the Sierra EM919X modem to the existing MHI PCI driver.

Here is the full list of our contributions:

Multi-queue improvements in Linux kernel Ethernet driver mvneta

In the past months, the Linux kernel driver for the Ethernet MAC found in a number of Marvell SoCs, mvneta, has seen quite a few improvements. Lorenzo Bianconi brought support for XDP operations on non-linear buffers, a follow-up work on the already-great XDP support that offers very nice performances on that controller. Russell King contributed an improved, more generic and easier to maintain phylink support, to deal with the variety of embedded use-cases.

At that point, it’s getting difficult to squeeze more performances out of this controller. However, we still have some tricks we can use to improve some use-cases so in the past months, we’ve worked on implementing QoS features on mvneta, through the use of mqprio.

Multi-queue support

A simple Ethernet NIC (Network Interface Controller) could be described as a controller that handles some protocol-level aspect of the Ethernet standard, where the incoming and outgoing packets would be enqueued in a dedicated ring-buffer that serves as an interface between the controller and the kernel.

The buffer containing packets that needs to be sent is called the Transmit Queue, often written as txq. It’s fed by the kernel, where the NIC driver converts some struct sk_buff objects called skb, that represent packets trough their journey in the kernel, into buffers that are enqueued in the txq.

On the ingress side, the Receive Queue, written rxq, is fed by the MAC, and dequeued by the NIC driver, who creates skb containing the payload and attached metadata.

In the end, every incoming or outgoing packet is enqueued and dequeued in a typical first-in first-out fashion. When the queue is full, packets are dropped, everything is neat, simple and deterministic.

However, typical use-cases are never simple. Most modern NICs have several queues in TX and RX. On the receive side, it’s useful to have multiple queues for performance reasons. When receiving packets, we can spread the traffic across multiple queues (with RSS for example), and have one CPU core dedicated to each queue. More complex use-cases can be expressed, such as flow steering, that you can learn about in the kernel documentation.

On the transmit side, it’s a bit less obvious why we want to have multiple queues. If the CPU is the bottleneck in terms of performances, having multiple TX queues won’t help much. Still, there are ways to benefit from having multiple TX queues on a multi-cpu system with XPS.

However, when the line-rate is the bottleneck, having multiple queues gets useful. By sorting the outgoing traffic by priority and assign each priority to a TX queue, the NIC can then pick the next packet to send from the highest priority queue until it’s empty, and then move on to the next priority. That way, we implement what’s called QoS (Quality of Service), where we care about high-priority traffic making it through the controller.

QoS itself is useful for various reasons. Besides the obvious prioritization of high-value streams for not-so-neutral networking, we can also imagine tagging management traffic as high-priority, to guarantee the ability to remotely access a machine under heavy traffic.

Other applications can be in the realm of Time Sensitive Networking, where it’s important that time-sensitive traffic gets sent right-away, using the high-priority queues as shortcuts to bypass the low-priority over-crowded queues.

NICs can use more advanced algorithms to pick the queue to de-queue the packet from, in a process called arbitration, to give low-priority traffic a chance to get sent even when high-priority traffic takes most of the bandwidth.

Typical algorithms range from strict priority-based FIFO to more flexible Weighted Round-Robin arbitration, to allow at least a few low-priority packets to leave the NIC.

In the case of mvneta, we’re limited in terms of what we can do in the receive path. Most of the work we’ve done focused on the transmit side.

Traffic Priorisation and Arbitration

Multi-queue support on the TX path is a three-fold process.

First, we have to know which packets are high-priority and which aren’t, by assigning a value to the skb->priority field.

There are several ways of doing so, using iptables, nftables, tc, socket parameters through SO_PRIORITY, or switching parameters. This is outside of the scope of this article, we’ll assume that incoming packets are already prioritized through one of the above mechanisms.

Once the packet comes into the network stack as a skb, we need to assign it to a TX queue. That’s where tc-mqprio comes in play.

In that second step, we build a mapping between priorities and queues. This mapping is done through the use of an intermediate mapping, the Traffic Classes. A Traffic Class is a mapping between N priorities and M transmit queues. We therefore have 2 levels of mappings to configure :

  1. The priority to Traffic Class mapping (N to 1 mapping)
  2. The Traffic Class to TX queue mapping (1 to M mapping)

All of this is done in one single tc command. We’re not going to dive too much into the tc tool itself, but we’ll still see a bit how tc-mqprio works.

Here’s an example :

tc qdisc add dev eth0 parent root handle 100 mqprio num_tc 8 map 0 1 2 3 4 5 6 7 queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 hw 1

Let’s break this down a bit.

Queuing Disciplines (QDiscs) are algorithms you use to select how and when you enqueue and dequeue traffic on a NIC. It benefits from great software support, but can also be offloaded to hardware, as we’ll see.

So the first part of the command is about attaching the mqprio QDisc to the network interface :

tc qdisc add dev eth0 parent root handle 100 mqprio

After that, we configure the traffic classes. Here, we create 8 traffic classes :

num_tc 8

And then we map each class to a priority. In this list, the n-th element corresponds to the class you want to assign to priority n. Here, we map the 8 first priorities to a dedicated class. Priorities that aren’t assigned a class are mapped to the class 0.

map 0 1 2 3 4 5 6 7

Finally, we map each class to a queue. Classes can only be assigned a contiguous set of queues, so the only parameters needed for the mapping are the number of queues assigned to the class (the number before the @), and the offset in the queue list (after the @). We specify one mapping per class, the m-th element in the list being the mapping for class number m. In the following example, we simply assign one queue per traffic class :

queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7

Under the hood, tc-mqprio will actually assign one QDisc per queue and map the Traffic Classes to these QDiscs, so that we can still hook other tc configurations on a per-queue basis.

Finally, we enable hardware offloading of the prioritization :

hw 1

The last part of the work consists in configuring the hardware itself to correctly prioritize each queue. That’s done by the NIC driver, which gets the configuration from the tc hooks.

If we take a look at the Hardware, we see that the queues are exposed to the kernel, which will enqueue packets into them. Each queue then has a Bandwidth Limiter, followed by the arbitration layer. Finally, we have one last global Rate limiter, and the path then continues to the egress blocks of the controller.

The arbiter configuration is easy, it’s just a matter of enabling the strict priority arbitration in a few registers.

Let’s summarize what the stack looks like :

Traffic shaping

Being able to prioritize outgoing traffic is good, but we can step-it up by allowing to limit the rate on each queue. That way, we can neatly organize and control how we want to use our bandwidth, not on a per-packet level but really on a bits/s basis.

Controlling the rate of individual flows or queues is called Traffic Shaping. When configuring a shaper, not only do we focus on the desired max rate, but also how we deal with traffic bursts. We can smooth them by tightly controlling how and when we send packets, or we can allow some burstiness by being more lenient on how often we enforce the rate limitation.

In the mvneta hardware, we have 2 levels of shaping : one level of per-queue shapers, and one per-port shaper. They can all be controlled semi-independently, some parameters being shared between all shapers.

With mqprio, the shapers are configured on a per-Traffic-Class basis. Since the hardware only allows per-queue shaping, we enforce in the driver that one traffic class is assigned to only one queue.

A final configuration with mqprio looks like this :

Most shapers use a variant of the Token Bucket Filter algorithm. The principle is the following :

Each queue has a Token Bucket, with a limited capacity. When a packet needs to be sent, one token per bit in the packet gets taken from the bucket. If the bucket is empty, transmission stops. The bucket then gets refilled at a configurable rate, with a configurable amount of tokens.

Configuring the TBF for each queue boils down to setting a few parameters :

– The Bucket Size, sometimes called burst, buffer or maxburst.

It should be big enough to accommodate for the shaping rate required, but not too big. If the bucket is too big and you have a very slow traffic going out, the bucket is going to fill up to its full size. When you suddenly have a lot of traffic to send, you’ll first have a huge burst, the time for the bucket to empty, before the traffic starts to actually get limited. Unlike the tc-tbf interface, tc-mqprio doesn’t allow to change the burst size, so it’s up to the driver to set it.

– The Refill Rate, sometimes called tick, defining how often you refill the bucket.
– The Refill Value, defining how many tokens you put back into the bucket at each refill.

These two, combined together, define the actual rate limitation. The approach taken to select the value is to select a fixed value for one of the parameters, and vary the other one to select the desired rate. In the case of mvneta, we chose to use a fixed refill rate, and change the refill value. This means that we have a limited resolution in the rates we can express. In our case, we have a 10kbps resolution, allowing us to cover any rate limitation between 10kbps and 5Gbps, with 10k increments.

– One final parameter is the minburst or MTU parameter, and controls the minimum amount of tokens required to allow packet transmission.
Since transmission stops when the bucket is empty, we can end-up with an empty bucket in the middle of a packet being transmitted.The link-partner may not be too happy about that, so it’s common to set the Maximum Transmission Unit as the minimum amount of tokens required to trigger transmission.

That way, we are sure that when we start sending a packet, we’ll always have enough tokens in the bucket to send the full packet. We can play a bit with this parameter if we want to buffer small packets and send them in a short burst when we have enough. In our case, we simply configured it as the MTU, which is good enough for most cases.

In the end, the gains are not necessarily measurable in terms of raw performances, but thanks to mqprio and traffic shaping, we can now have better control on the traffic that comes out of our interface.

An example of configuration would be :

# Shaping and priority configuration
tc qdisc add dev eth0 parent root handle 100 mqprio \
num_tc 8 map 0 1 2 3 4 5 6 7 \
queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
hw 1 mode channel shaper bw_rlimit \
max_rate 1Gbit 500Mbit 250Mbit 125Mbit 50Mbit 25Mbit 25Mbit 25MBit

# Assign HTTP traffic a priority of 1, to limit it to 500Mbps
iptables -t mangle -A POSTROUTING -p tcp --dport 80 -j CLASSIFY \
--set-class 0:1

There are tons of other use-cases, for example limiting per-vlan speeds, or in contrary making sure that traffic on a specific vlan has the highest priority, and all of that mostly offloaded to the hardware itself !

Bootlin contributions to Linux 5.14 and 5.15

It’s been a while we haven’t posted about Bootlin contributions to the Linux kernel, and in fact missed both the Linux 5.14 and Linux 5.15 releases, which we will cover in this blog post.

Linux 5.14 was released on August 29, 2021. The usual KernelNewbies.org page and the LWN articles on the merge window (part 1 and part 2) provide the best summaries of the new features and hardware support offered by this release.

Linux 5.15 on the other hand was released on November 1, 2021. Here as well, we have a great KernelNewbies.org article and LWN articles on the merge window (part 1 and part 2).

In total for those two releases, Bootlin contributed 79 commits, in various areas:

  • Alexandre Belloni, as the RTC subsystem maintainer, contributed 9 patches improving various aspects of RTC drivers, the RTC subsystem or Device Tree bindings for RTC
  • Clément Léger contributed a small improvement to the at_xdmac driver used on Microchip ARM platforms
  • Hervé Codina enabled Ethernet support on the old ST Spear320 SoC, by leveraging the existing stmmac Ethernet controller driver
  • Maxime Chevallier fixed a small issue with the Ethernet PHY on the i.MX6 Solidrun system-on-module
  • Miquèl Raynal added support for NV-DDR timings in the MTD susbsystem. This allows to improve performance with NAND flash memories that support those timings. Their usage is specifically implemented in the Arasan NAND controller driver, which Miquèl contributed back in Linux 5.8. See our previous blog post on this topic for more details
  • Miquèl Raynal added support for yet another NAND controller driver, the ARM PL35x, which is used for example on Xilinx Zynq 7000. See our previous blog post on this topic.
  • Miquèl Raynal added support for NAND chips with large pages (larger than 4 KB) to the OMAP GPMC driver.
  • Miquèl Raynal made a few fixes to the IIO driver for the max1027 ADC.
  • Paul Kocialkowski contributed a few patches to enable usage of the Hantro video decoder driver on the Rockchip PX30 processor.
  • Thomas Perrot contributed one patch to enable usage of the Flex Timers on i.MX7, and one to fix an issue in the PL022 SPI controller driver.

And now, as usual the complete list of our contributions to Linux 5.14 and 5.15:

Initial Allwinner V3 ISP support in mainline Linux

Introduction

Several months ago, Bootlin announced ongoing work on MIPI CSI-2 support for the Allwinner A31/V3 and A83T platforms in mainline Linux, as well as support for the Omnivision OV8865 and OV5648 image sensors. This effort has been a success and while the sensor patches were already integrated in mainline Linux since, the MIPI CSI-2 controller patches are on their way towards inclusion.

Since then we had the opportunity to take things further and start tackling the next steps for advanced camera support in mainline Linux on Allwinner SoCs!

With MIPI CSI-2 support and proper sensor drivers available in V4L2, we were able to capture raw bayer data provided by the sensors. But this data does not constitute the final picture that can be displayed or encoded into a file: a number of enhancement and transformation steps are required to achieve a visually-pleasing result that users typically expect.

These steps are quite calculation-intensive and it does not make sense to implement them with a software pipeline, especially with rates of 25, 30 or even 60 frames per second that are typically expected for video recording.

An open-source and upstream driver for the Allwinner ISP

Allwinner SoCs that support MIPI CSI-2 also include an Image Signal Processor hardware unit, a dedicated accelerator for enhancing and transforming raw data received from sensors.

Allwinner ISP features
Features of the ISP as described in the Allwinner V3 datasheet.

Since support for this ISP was implemented using a non-free blob in Allwinner SDKs, this area remained unsupported in mainline Linux… Until now!

Thanks to some help from Allwinner, we were able to implement a proper V4L2 driver for the Allwinner ISP found in the Allwinner V3, completely open-source, with no binary blob involved. This work was recently submitted upstream, with a first revision totaling more than 8000 new lines of code, which comes together with a significant rework of the Allwinner camera interface driver to make it usable with or without the ISP, and including the new MIPI CSI-2 support which we had submitted previously. We are very happy to keep contributing to advancing fully open-source Allwinner SoCs support in mainline Linux and help tackle some of the remaining areas there!

Our currently proposed driver for the Allwinner ISP only supports a limited set of features: debayering with coefficients and 2D noise filtering. These features were sufficient for our use case, and allowed to offload the computationally intensive debayering process to a dedicated hardware accelerator.

As the driver for now relies on a specific user-space API that does not yet cover all aspects of the ISP, the driver was submitted to the Linux kernel staging area and will probably stay there until all ISP features are properly described.

Our work on this advanced camera support, including the ISP driver, has been described in the talk we have given earlier this week at the Embedded Linux Conference, for which the slides are already available.

Advanced camera support on Allwinner SoCs with Mainline Linux

Additional features and future work

The Allwinner ISP supports a lot more features beyond just debayering and noise filtering. For example, it supports statistics to implement 3A algorithms (auto-focus, auto-exposition and auto-white-balance) which are necessary to avoid manual configuration of scene-specific parameters. These could typically be implemented in libcamera, the community free software project that supports complex image capture pipelines and ISPs.

As a result Bootlin would be very interested to continue this work and bring this driver to a more advanced state. So if you have a project that could help move this topic forward, do not hesitate to contact us about it!

Bootlin contributions to Linux 5.13

After finally publishing about our Linux 5.12 contributions and even though Linux 5.14 was just released yesterday, it’s hopefully still time to talk about our contributions to Linux 5.13. Check out the LWN articles about the merge window to get the bigger picture about this release: part 1 and part 2.

In terms of Bootlin contributions, this was a much more quiet release than Linux 5.12, with just 28 contributions. The main highlights are:

  • The usual round of RTC subsystem updates from its maintainer Alexandre Belloni
  • A large amount of improvements in the MTD subsystem by its co-maintainer Miquèl Raynal, continuing his effort to improve the ECC handling in the MTD subsystem. See Miquèl’s talk at ELCE 2020 for more details on this effort: slides and video.
  • A small fix for an annoying regression in the musb USB gadget controller driver.

Even though we contributed just 28 commits to this release, as maintainers, some of us also reviewed and merged code from other contributors: Miquèl Raynal as the MTD co-maintainer merged 63 patches, Alexandre Belloni merged 22 patches, and Grégory Clement 6 patches.

Here are the details of our contributions to Linux 5.13: