QSOE/N, part 3: first boot on the HiFive Unmatched
Part 2 left QSOE/N as a working operating system that had only ever run in QEMU. This post is the jump to real silicon — a SiFive HiFive Unmatched (FU740) on my desk — and the five obstacles between "boots fine in the emulator" and an interactive shell on real hardware.
The recurring theme is that QEMU is generous in ways the FU740 is not. Every fix below is a thing the emulator either did for me silently or didn't model at all. Two of them I had already paid for once, on the same board, during the QRV port; the commits cite the QRV fixes by hash. That's the value of having ported QNX to RISC-V first: the bring-up obstacles on this board are not a surprise, they're a checklist.
1. The image won't load: boot-image header
U-Boot's booti refused the flat binary outright: "Bad Linux RISCV Image magic!". It is the same first obstacle QRV hit on the same board. The RISC-V Image protocol wants a specific header at the front of the kernel: a code0 that jumps over the header to the real entry, a text_offset asking for RAM base + 2 MiB, an image_size spanning through .bss so the loader reserves the whole footprint, and both magic values in place.
head.S now opens with that header. The constants live in a fresh <skimmer/bootimage.h>, written from the published spec — QRV's equivalent header is GPL-2.0 Linux-derived and therefore not borrowable into an Apache-2.0 tree. QEMU is unaffected: its ELF loader enters at _start and takes the same code0 jump.
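For readers who have not met the header before, here is a minimal C sketch of its layout. The field order and the two magic constants follow the published RISC-V boot image header; the struct name, macro names, and the image_header_ok() helper are illustrative assumptions, not the contents of <skimmer/bootimage.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative names; only the layout and constants come from the spec. */
#define RISCV_IMAGE_MAGIC   0x5643534952ULL /* "RISCV\0\0\0" read as LE u64 */
#define RISCV_IMAGE_MAGIC2  0x05435352U     /* "RSC\x05" read as LE u32 */
#define RISCV_TEXT_OFFSET   0x200000ULL     /* 2 MiB above RAM base */

struct riscv_image_header {
    uint32_t code0;       /* executable: jumps over the header */
    uint32_t code1;       /* executable */
    uint64_t text_offset; /* load offset from the start of RAM */
    uint64_t image_size;  /* effective image size, through .bss */
    uint64_t flags;
    uint32_t version;     /* (major << 16) | minor */
    uint32_t res1;
    uint64_t res2;
    uint64_t magic;
    uint32_t magic2;
    uint32_t res3;
};

/* The header is exactly 64 bytes with the magics at fixed offsets. */
_Static_assert(sizeof(struct riscv_image_header) == 64, "header size");
_Static_assert(offsetof(struct riscv_image_header, text_offset) == 8, "text_offset");
_Static_assert(offsetof(struct riscv_image_header, image_size) == 16, "image_size");
_Static_assert(offsetof(struct riscv_image_header, magic) == 48, "magic");
_Static_assert(offsetof(struct riscv_image_header, magic2) == 56, "magic2");

/* Hypothetical loader-side check; assumes a little-endian host. */
static int image_header_ok(const void *image, size_t len)
{
    struct riscv_image_header h;

    if (len < sizeof h)
        return 0;
    memcpy(&h, image, sizeof h); /* avoid unaligned access into the blob */
    return h.magic == RISCV_IMAGE_MAGIC &&
           h.magic2 == RISCV_IMAGE_MAGIC2 &&
           h.image_size >= sizeof h;
}
```

In the kernel itself the header is emitted from assembly at the very start of the image; the C view is mainly useful for host-side tooling that wants to sanity-check a built image before it goes anywhere near the board.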
2. The first panic: PCI is optional
First boot with a loadable image: header accepted, kernel entered, SBI console alive, FDT walked — then a hard PANIC at no PCI host bridge found. The FU740 carries a DesignWare controller (sifive,fu740-pcie) the walker doesn't recognise, not QEMU virt's generic ECAM bridge. Either way, a missing or unfamiliar PCI bridge is no reason to stop booting a microkernel.
So the FDT walker now warns instead of panicking and leaves the PCI info zeroed. The platform catalog emits an all-zero ECAM tag (base 0 = no PCI), the page-mapper skips the ECAM mapping, and the userspace PCI server is the one that fails loudly — at MAP_PHYS, where it belongs — rather than taking the kernel down at boot. The DesignWare driver itself (iATU programming) is a later rung; QRV's is the reference for when I get there.
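The shape of that change can be sketched roughly as below. This is not Skimmer's code: struct pci_ecam_info, fdt_note_pci_bridge() and pmap_should_map_ecam() are hypothetical names, and the only real-world strings are the two compatible values ("pci-host-ecam-generic" for QEMU virt's bridge, "sifive,fu740-pcie" for the FU740).

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical ECAM descriptor; base == 0 means "no PCI on this platform". */
struct pci_ecam_info {
    uint64_t base;
    uint64_t size;
};

/*
 * Record a PCI host bridge if it is one the walker understands.
 * An unknown or missing bridge is a warning, never a panic: the
 * descriptor is left all-zero and boot continues.
 * Returns 1 if ECAM was recorded, 0 otherwise.
 */
static int fdt_note_pci_bridge(const char *compatible, uint64_t base,
                               uint64_t size, struct pci_ecam_info *out)
{
    memset(out, 0, sizeof *out);
    if (compatible != NULL &&
        strcmp(compatible, "pci-host-ecam-generic") == 0) {
        out->base = base;
        out->size = size;
        return 1;
    }
    fprintf(stderr, "fdt: no PCI host bridge -- PCI surface disabled\n");
    return 0;
}

/* The page-mapper only maps ECAM when there is something to map. */
static int pmap_should_map_ecam(const struct pci_ecam_info *info)
{
    return info->base != 0;
}
```

The design point is that the zeroed descriptor is the single source of truth: every later consumer checks base against 0 instead of each one rediscovering whether PCI exists.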
3. First boot to userspace: physmem, the identity map, and CPU topology
With PCI demoted to optional, the kernel got far enough to expose three more QEMU-isms, all in one enablement commit.
Declarative physmem. The old build had a compile-time PMAP_RAM_END_PA constant baked to QEMU's 256 MiB geometry. The Unmatched has 16 GiB, and U-Boot may relocate the ramdisk anywhere in the bank. So physical memory is now declarative: a hardware-region table plus an exclusion table (FreeBSD's shape) populated from a real inventory (QRV's: firmware shadow, the FDT blob, the header memreserve block, /reserved-memory children, the initrd). fdt_init registers the actual memory@ bank; the page-pool window is derived from real RAM bounds with exclusion overlaps pre-marked in the bitmap. The pool bitmap grew to cover 16 GiB.
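The "exclusion overlaps pre-marked in the bitmap" step can be illustrated with a small sketch. It assumes 4 KiB pages, a page-aligned pool window, and a zero-initialised bitmap with one bit per page; struct phys_range and pool_premark() are made-up names for illustration, not the kernel's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1ULL << PAGE_SHIFT)

/* Half-open physical range [start, end). */
struct phys_range {
    uint64_t start;
    uint64_t end;
};

/*
 * Mark every pool page that overlaps an exclusion as in use.
 * pool_start and pool_end must be page-aligned; bitmap must be
 * zeroed and hold at least one bit per pool page.
 * Returns the number of pages still free afterwards.
 */
static size_t pool_premark(uint8_t *bitmap,
                           uint64_t pool_start, uint64_t pool_end,
                           const struct phys_range *excl, size_t nexcl)
{
    size_t free_pages = (size_t)((pool_end - pool_start) >> PAGE_SHIFT);

    for (size_t i = 0; i < nexcl; i++) {
        /* Clip the exclusion to the pool window. */
        uint64_t lo = excl[i].start < pool_start ? pool_start : excl[i].start;
        uint64_t hi = excl[i].end > pool_end ? pool_end : excl[i].end;
        if (lo >= hi)
            continue; /* no overlap with the pool */

        /* Round outward: a partially excluded page is fully excluded. */
        size_t first = (size_t)((lo - pool_start) >> PAGE_SHIFT);
        size_t last = (size_t)((hi - pool_start + PAGE_SIZE - 1) >> PAGE_SHIFT);

        for (size_t p = first; p < last; p++) {
            uint8_t bit = (uint8_t)(1u << (p & 7));
            if (!(bitmap[p >> 3] & bit)) { /* tolerate overlapping exclusions */
                bitmap[p >> 3] |= bit;
                free_pages--;
            }
        }
    }
    return free_pages;
}
```

Because overlapping exclusions are counted once, the inventory sources (memreserve entries, /reserved-memory children, the initrd, and so on) can be registered independently without coordinating with each other.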
Identity map restricted to the kernel image. The boot page table's low half belongs to taskman's U-mode layout — image at 3 GiB, initrd window above it. Identity-mapping all of RAM only looked fine while RAM ended below those VAs; 16 GiB paved straight over them, and the taskman load hit a RAM megapage. The identity map now exists only for the satp transition window (the PC and boot stack run on physical addresses inside the image span during the switch); canonical-high covers the full bank for everything after paging is enabled. This is exactly why the PA→canonical-high sweep from part 2 had to land first.
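As a rough illustration of the restriction, the sketch below computes an identity span that covers only the kernel image and checks it against taskman's image VA. It assumes the identity map is built from 2 MiB Sv39 megapages and takes the 3 GiB figure from the text; struct va_span and both helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MEGAPAGE_SIZE    (2ULL << 20) /* Sv39 level-1 leaf, 2 MiB (assumed) */
#define TASKMAN_IMAGE_VA (3ULL << 30) /* taskman's image VA, per the text */

/* Half-open span [start, end); VA == PA under the identity map. */
struct va_span {
    uint64_t start;
    uint64_t end;
};

/* Smallest megapage-aligned span covering the kernel image. */
static struct va_span identity_span(uint64_t image_start_pa,
                                    uint64_t image_end_pa)
{
    struct va_span s;

    s.start = image_start_pa & ~(MEGAPAGE_SIZE - 1);
    s.end = (image_end_pa + MEGAPAGE_SIZE - 1) & ~(MEGAPAGE_SIZE - 1);
    return s;
}

/* The low half is usable by taskman only if the span ends below its image. */
static int span_clear_of_taskman(struct va_span s)
{
    return s.end <= TASKMAN_IMAGE_VA;
}
```

With RAM based at 2 GiB, an image-only span stays far below 3 GiB, while a span covering a full 16 GiB bank runs from 2 GiB to 18 GiB and fails the check.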
Runtime CPU topology. The FU740 is asymmetric. Hart 0 is the S7 monitor core — no MMU, no S-mode — yet it answers SBI HSM and occupies PLIC context 0. A blind 0..N hart loop tries to bring it up as a scheduler hart and tries to route interrupts through a context that isn't the one you think it is. The walker now visits cpu@ nodes and treats a hart as usable only if it has an mmu-type that isn't riscv,none (QRV's filter); bring-up and the watchdog iterate the discovered set. PLIC S-contexts resolve through an interrupt-controller phandle → hartid table instead of ordinal position — ordinal equals hartid only on fully symmetric parts, and QRV paid for that off-by-one so this one doesn't. The old MAXCPU define became a Kconfig CONFIG_MAX_CPU_COUNT that sizes the per-hart tables; the FDT decides who actually runs.
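Both filters can be sketched in a few lines of C. The sketch assumes the FDT has already been flattened into two small tables (one entry per cpu@ node, one entry per PLIC interrupts-extended pair) and that S-mode contexts are the pairs whose interrupt number is 9, the supervisor external interrupt. All struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_HARTS  8   /* stands in for CONFIG_MAX_CPU_COUNT */
#define IRQ_S_EXT  9u  /* supervisor external interrupt */
#define NO_CONTEXT (-1)

/* One cpu@ node: hartid, phandle of its interrupt-controller child, mmu-type. */
struct cpu_node {
    uint32_t hartid;
    uint32_t intc_phandle;
    const char *mmu_type; /* NULL if the property is absent */
};

/* One (phandle, irq) pair from the PLIC's interrupts-extended list.
 * The pair's position in the list is the PLIC context number. */
struct plic_ctx {
    uint32_t intc_phandle;
    uint32_t irq;
};

/* A hart can run the scheduler only if it advertises a real MMU. */
static int hart_usable(const struct cpu_node *c)
{
    return c->mmu_type != NULL && strcmp(c->mmu_type, "riscv,none") != 0;
}

/* Fill s_ctx[hartid] with that hart's S-mode PLIC context, by phandle. */
static void plic_resolve_s_contexts(const struct cpu_node *cpus, size_t ncpus,
                                    const struct plic_ctx *ctx, size_t nctx,
                                    int s_ctx[MAX_HARTS])
{
    for (size_t h = 0; h < MAX_HARTS; h++)
        s_ctx[h] = NO_CONTEXT;

    for (size_t i = 0; i < nctx; i++) {
        if (ctx[i].irq != IRQ_S_EXT)
            continue; /* M-mode or disabled context */
        for (size_t c = 0; c < ncpus; c++) {
            if (cpus[c].intc_phandle == ctx[i].intc_phandle &&
                cpus[c].hartid < MAX_HARTS && hart_usable(&cpus[c]))
                s_ctx[cpus[c].hartid] = (int)i;
        }
    }
}
```

On an FU740-shaped table (one M-only context for hart 0, then an M/S pair for each of harts 1 to 4), this resolves hart 1 to context 2 rather than the context 1 or 3 that positional guessing would produce.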
After this commit: four U74 harts online, taskman in U-mode, /sbin/init spawned, dispatch loop entered, all on real hardware, with the boot chain OpenSBI → U-Boot booti → Skimmer. That is genuinely most of the way there. It is also where it stalled, because "dispatch loop entered" and "a shell prompt" turned out to be separated by two more hardware-only bugs.
4. The two fixes that finished it: fence.i and hart placement
This is the part QEMU hides completely, and the reason this post exists as its own thing.
fence.i discipline. RISC-V does not guarantee coherence between data stores and instruction fetches. If you write code into memory and then jump to it without executing a fence.i, a hart may fetch whatever its instruction cache happened to be holding for those physical lines. Skimmer executed no fence.i anywhere. It didn't have to: QEMU models no instruction cache, so stale fetches never happen there. On the U74 it was fatal and silent: a program loaded into recycled pages and dispatched to a hart still caching the previous tenant's lines fetches garbage. It swallowed /sbin/init without a trace. QRV commit 8d42587b had fixed a bug with the identical fingerprint on the identical board.
The fix is a cpu_icache_sync_all() — a local fence.i plus an SBI RFENCE remote fence.i broadcast to the other harts — fired at the three boundaries where QSOE/N writes code and then runs it: the tail of the page-mapper's user-blob copy, taskman's VSpace transplant (the U-mode-memcpy'd text moving into a child), and the loader's W→X step where a region gains PROT_EXEC. Cross-hart I-cache invalidation is now a first-class primitive, not an afterthought.
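A primitive of that shape might look like the sketch below. It is not Skimmer's implementation, only an illustration under stated assumptions: the RFENCE extension ID (0x52464E43) and function ID 0 for remote fence.i come from the SBI specification, passing a hart_mask_base of -1 asks the firmware to target all available harts, and the sbi_ecall2() and local_fence_i() helpers are hypothetical. A no-op host fallback is included only so the file also compiles off-target.

```c
#include <assert.h>
#include <stdint.h>

#define SBI_EXT_RFENCE            0x52464E43UL /* "RFNC" */
#define SBI_RFENCE_REMOTE_FENCE_I 0UL

struct sbiret {
    long error;
    long value;
};

#if defined(__riscv)
/* Synchronise this hart's instruction fetch with prior stores. */
static inline void local_fence_i(void)
{
    __asm__ volatile("fence.i" ::: "memory");
}

/* Two-argument SBI call: a7 = extension, a6 = function, a0/a1 = args. */
static struct sbiret sbi_ecall2(unsigned long eid, unsigned long fid,
                                unsigned long arg0, unsigned long arg1)
{
    register unsigned long a0 __asm__("a0") = arg0;
    register unsigned long a1 __asm__("a1") = arg1;
    register unsigned long a6 __asm__("a6") = fid;
    register unsigned long a7 __asm__("a7") = eid;

    __asm__ volatile("ecall"
                     : "+r"(a0), "+r"(a1)
                     : "r"(a6), "r"(a7)
                     : "memory");
    return (struct sbiret){ (long)a0, (long)a1 };
}
#else
/* Host fallback so the sketch builds off-target; does nothing. */
static inline void local_fence_i(void) { }

static struct sbiret sbi_ecall2(unsigned long eid, unsigned long fid,
                                unsigned long arg0, unsigned long arg1)
{
    (void)eid; (void)fid; (void)arg0; (void)arg1;
    return (struct sbiret){ 0, 0 };
}
#endif

/*
 * Make freshly written code visible to instruction fetch on every hart:
 * fence.i locally, then ask the SBI firmware to run fence.i remotely.
 * Returns the SBI error code (0 on success).
 */
static long cpu_icache_sync_all(void)
{
    local_fence_i();
    return sbi_ecall2(SBI_EXT_RFENCE, SBI_RFENCE_REMOTE_FENCE_I,
                      0UL, (unsigned long)-1).error;
}
```

The call belongs after the last store to the code pages and before any hart can be dispatched into them; placing it at the writer's side keeps the rule local to the three code-writing paths instead of spreading it across every dispatch.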
U-thread placement on online harts. sys_thread_create round-robined over all CONFIG_MAX_CPU_COUNT pcpu slots. On QEMU all eight slots have a scheduler behind them, so this is invisible. On the FU740 half of them don't — the S7 monitor core, and hart ids beyond the board's population — and a thread parked on a slot with no scheduler never runs. That is exactly how the first hardware /sbin/init vanished while the same code placed it fine on QEMU. The placement cursor now skips harts absent from the online mask; the loop terminates by construction because the calling hart is, necessarily, online.
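The corrected cursor is small enough to sketch in full. This is an illustrative version, not the code in sys_thread_create: place_next() and MAX_CPU_COUNT are stand-in names, and the online set is assumed to be a simple bitmask indexed by pcpu slot.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_CPU_COUNT 8u /* stands in for CONFIG_MAX_CPU_COUNT */

/*
 * Round-robin placement that skips slots with no scheduler behind them.
 * Precondition: at least one slot below MAX_CPU_COUNT is online (the
 * caller's own hart always is), which is what guarantees termination.
 * Returns the chosen slot and leaves *cursor pointing at the next one.
 */
static unsigned place_next(unsigned *cursor, uint32_t online_mask)
{
    unsigned slot;

    assert((online_mask & ((1u << MAX_CPU_COUNT) - 1u)) != 0);
    do {
        slot = *cursor;
        *cursor = (*cursor + 1u) % MAX_CPU_COUNT;
    } while (!(online_mask & (1u << slot)));
    return slot;
}
```

With an online mask of 0x1E (harts 1 to 4, as on the FU740), successive calls starting from cursor 0 cycle through 1, 2, 3, 4 and wrap back to 1, never landing on slot 0 or slots 5 to 7.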
Those two fixes carried it from "dispatch loop entered" to an interactive qsh on /dev/ser1. The QEMU regression suite stayed green throughout, including at -m 16G, which gives the emulator the same 16 GiB of RAM as the Unmatched.
The boot log
Here is the thing itself, lightly trimmed:
================================================
QSOE/N microkernel ("Skimmer") v0.11
================================================
fdt: no PCI host bridge -- PCI surface disabled
...
[hart 1] primary up (Sv39 high-half, PLIC)
Bringing up secondary harts via SBI HSM...
[hart 4] secondary up (Sv39, PLIC)
[hart 2] secondary up (Sv39, PLIC)
[hart 3] secondary up (Sv39, PLIC)
All harts online.
Spawning taskman U-mode thread...
Handing primary hart over to LWKT scheduler.
================================================
QSOE/N taskman starting
================================================
...
spawn: '/sbin/init' entry=0x15166 ...
[init] starting slogger...
[slogger] /dev/slog registered (chid=65540, ring=65536 bytes)
[init] starting pci-server...
[pci-server] ecam_init failed
exit: pid=4 vsh=3 dropped=0
[init] starting devc-sersifive...
mmap: MAP_PHYS pa=0x10010000 -> va=0x20000000 len=0x1000 vsh=3
[devc-sersifive] SiFive UART initialized @ vaddr 0x20000000
pathmgr: register '/dev/ser1' pid=5 chid=65543 vsh=3 rc=0
[init] repointing /dev/console -> /dev/ser1...
spawn: '/bin/qsh' entry=0x15166 ...
[/]# echo "Hello, Claude Code Opus 4.8, from the Real Hardware running QSOE/N!"
Hello, Claude Code Opus 4.8, from the Real Hardware running QSOE/N!
[/]#
A few things to read out of it. The banner says v0.11 because that's what git describe stamped when the image was built; the fence.i and placement fixes are the commits right after. The "no PCI host bridge -- PCI surface disabled" line is obstacle 2 working as designed, and "[pci-server] ecam_init failed" followed by a clean exit is the userspace half failing loudly without taking anything down. devc-sersifive is new: on real hardware the console can't ride SBI forever, so the SiFive UART driver maps the FU740's UART at 0x10010000 as an ordinary resource manager, registers /dev/ser1, and init repoints /dev/console at it. From that point the kernel is no longer speaking; userspace owns the console. And then qsh, the same shell that runs in QEMU, reaches a prompt and runs a command. The half of QSOE/N that sits above the arch layer did not change for any of this; it never knew it had moved to real silicon.
The echo line is the one human touch. Skimmer and QSOE/N were built with Claude Code as co-author across every milestone in these three posts — Opus 4.7 through v0.5 and most of the userspace, 4.8 for the hardware bring-up. The first thing I typed into a shell on the real board was a hello back.
Where that leaves it
QSOE/N runs on real RISC-V silicon: four U74 harts, a from-scratch microkernel, a procnto-style system process, userland drivers as resource managers, and a POSIX shell on a hardware UART. The bring-up checklist that QRV wrote in blood — image header, optional PCI, declarative physmem, asymmetric topology, fence.i — held.
What's next on this board is the DesignWare PCIe driver and devb-nvme, so the storage and PCI stacks that already work in QEMU come up on hardware too. And the other half of the umbrella: QSOE/L, the same userspace on seL4. That's the next post.