The RISC-V ISA, on which Cartesi Machines are based, consists of a minimal 32-bit integer instruction set to which several extensions can be added. The standard defines a privileged architecture with features commonly used by modern operating systems, such as multiple privilege levels, paged-based virtual-memory, timers, interrupts, exceptions and traps, etc. Implementations are free to select the combination of extensions that better suit their needs.
The Cartesi Machine architecture can be separated into a processor and a board. The processor performs the computations, executing the traditional fetch-execute loop while maintaining a variety of registers. The board defines the surrounding environment with an assortment of memories (ROM, RAM, flash) and a number of devices. To make verification possible, a Cartesi Machine maps its entire state to the physical address space in a well-defined way. This includes the internal states of the processor, the board, and of all attached devices. The contents of the address space therefore completely define the Cartesi Machine. Fortunately, this modification does not limit the operating system or the applications it hosts in any significant way.
Both the processor and board are implemented in the emulator. A full description of the RISC-V ISA is out of the scope of this documentation (See the volumes I and II of the specification for details.) This section describes Cartesi's RISC-V architecture, the modifications made to support verification, the devices supported by the emulator, and the process the machine follows to boot the Linux kernel.
Following RISC-V terminology, Cartesi Machines implement the RV64IMASU ISA. The letters after RV specify the extension set. This selection corresponds to a 64-bit machine, Integer arithmetic with Multiplication and division, Atomic operations, as well as the optional Supervisor and User privilege levels. In addition, Cartesi Machines support the Sv39 mode of address translation and memory protection. Sv39 provides a 39-bit protected virtual address space, divided into 4KiB pages, organized by a three-level page table. This set of features creates a balanced compromise between the simplicity demanded by a blockchain implementation and the flexibility expected from off-chain computations.
There are a total of 99 instructions, out of which 28 simply narrow or widen, respectively, their 64-bit or 32-bit counterparts. This being a RISC ISA, most instructions are very simple and can be emulated in a few lines of high-level code. In contrast, the x86 ISA defines at least 2000 (potentially complex) instructions. In fact, the only complex operation in RISC-V is the virtual-to-physical address translation. Instruction decoding is particularly simple due to the reduced number of formats (only 4, all taking 32-bits).
The entire processor state fits within 512 bytes, which are divided into 64 registers, each one holding 64-bits. It consists of 32 general-purpose integer registers and 26 control and status registers (CSRs). Most of these registers are defined by the RISC-V ISA; the remaining are Cartesi-specific. The processor makes its entire state available, externally and read-only, by mapping individual registers to the lowest 512 bytes in the physical address space (in the processor shadow). The adjacent 1.5KiB are reserved for future use. The entire mapping is given in the following table:
Offset | Register | Offset | Register | Offset | Register | Offset | Register |
---|---|---|---|---|---|---|---|
0x000 | x0 | 0x120 | mcycle | 0x160 | misa | 0x1a0 | sepc |
0x008 | x1 | 0x128 | minstret | 0x168 | mie | 0x1a8 | scause |
… | … | 0x130 | mstatus | 0x170 | mip | 0x1b0 | stval |
0x0f8 | x31 | 0x138 | mtvec | 0x178 | medeleg | 0x1b8 | satp |
0x100 | pc | 0x140 | mscratch | 0x180 | mideleg | 0x1c0 | scounteren |
0x108 | mvendorid | 0x148 | mepc | 0x188 | mcounteren | 0x1c8 | ilrsc |
0x110 | marchid | 0x150 | mcause | 0x190 | stvec | 0x1d0 | iflags |
0x118 | mimpid | 0x158 | mtval | 0x198 | sscratch |
The only generally relevant standard register is mcycle
.
Since its value is advanced at every CPU cycle, it can be used to identify a particular step in the computation being performed by a Cartesi Machine.
This is a key component of the verification process, and can also be used to bound the amount of computation.
The registers whose names start with “i
” are Cartesi additions, and have the following semantics:
Bits | 63–5 | 4–3 | 2 | 1 | 0 |
---|---|---|---|---|---|
Field | Reserved | PRV | Y | I | H |
Bit PRV
gives the current privilege level (0 for User, 1 for Supervisor, and 3 for Machine), bit Y
is set to 1 when the processor has yielded control back to the host, bit I
is set to 1 when the processor is idle (i.e., waiting for interrupts), bit H
is set to 1 to signal the processor has been permanently halted.
ilrsc
holds the reservation address for the LR/SC atomic memory operations;The interaction between board and processor happens through interrupts and the memory bus. Devices are mapped to the processor's physical address space. The mapping can be seen in the following table:
Physical address | Mapping |
---|---|
0x00000000–0x000003ff | Processor shadow |
0x00000800–0x00000Bff | Board shadow |
0x00001000–0x00010fff | ROM (Bootstrap & Devicetree) |
0x02000000–0x020bffff | Core Local Interruptor |
0x40000000–0x40007fff | Host-Target Interface |
0x80000000–configurable | RAM |
configurable | Flash drive 0 |
… | … |
configurable | Flash drive 7 |
There are 60KiB of ROM starting at address 0x1000
, where execution starts by default.
The amount of RAM is user-configurable, but always starts at address 0x80000000
.
Finally, a number of additional physical memory ranges can be set aside for flash-memory devices.
These will typically be preloaded with file-system images, but can also hold raw data.
The board maps two non-memory devices to the physical address space: CLINT and HTIF.
The Core Local Interruptor (or CLINT) controls the timer interrupt.
The active addresses are 0x0200bff8
(mtime
) and 0x02004000
(mtimecmp
).
The CLINT issues a hardware interrupt whenever mtime
equals mtimecmp
.
Since Cartesi Machines must ensure reproducibility, the processor's clock and the timer are locked by a constant frequency divisor of 100
.
In other words, mtime
is incremented once for every 100 increments of mcycle
.
There is no notion of wall-clock time.
The Host-Target Interface (HTIF) mediates communication with the external world.
It is mapped to a physical memory starting at 0x40000000
, where registers can be accessed at the following offsets:
Offset | Register |
---|---|
0x000 | tohost |
0x008 | fromhost |
0x010 | ihalt |
0x018 | iconsole |
0x020 | iyield |
0x028 | Reserved |
… | … |
0x218 | Reserved |
The format of CSRs tohost
and fromhost
are as follows:
Bits | 63–56 | 55–48 | 47–0 |
---|---|---|---|
Field | DEV | CMD | DATA |
Interactions with Cartesi's HTIF device follow the following protocol:
fromhost
;tohost
;fromhost
.Cartesi's HTIF supports 3 subdevices:
DEV=0
) — To halt the machine, request CMD=0
, and DATA
containing bit 0 set to 1. (Bits 47–1 can be set to an arbitrary exit code.) This will permanently set bit H
in iflags
and return control back to the host.DEV=1
) — To output a character ch
to console, request CMD=1
, with DATA=ch
. To input a character from console (in interactive sessions), request CMD=0
, DATA=0
, then read response CMD=0
, DATA=ch+1
. (DATA=0
means no character was available);DEV=2
) — This is a Cartesi-specific subdevice.
To yield control back to the host and report progress permil
, request CMD=0
and DATA=permil
.
This will set bit Y
in iflags
until the emulator is resumed.Registers ihalt
, iconsole
, and iyield
are bit masks specifying the which commands are available for the respective devices.
The physical memory mapping is described by Physical Memory Attribute records (PMAs) that start at address 0x00000800
(the board shadow) .
Each PMA consists of 2 64-bit words.
The first word gives the start of a range and the second word its length.
Since the ranges must be aligned to 4KiB page boundaries,
the lowest 12-bits of each word are available for attributes.
The meaning of each attribute field is as follows:
Bits | 63–12 | 11–8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
---|---|---|---|---|---|---|---|---|---|---|
Field | start | DID | IW | IR | X | W | R | E | IO | M |
Bits | 63–12 | 11–0 | ||||||||
Field | length | Reserved (=0) |
The M
, IO
, and E
bits are mutually exclusive, and respectively mark the range as memory, I/O mapped, or excluded.
Bits R
, W
, and X
mark read, write, and execute permissions, respectively.
The IR
and IW
bits mark the range as idempotent for reads and writes, respectively.
Finally, the DID
gives the device id
(0 for memory ranges, 1 for the shadows, 2 for a flash drive, 3 for CLINT, and 3 for HTIF).
The list of PMA records ends with an invalid PMA entry for which length=0
.
By default, pc
starts at 0x1000
, pointing to the start of the ROM region.
Before control reaches the RAM image (and ultimately the Linux kernel), a small program residing in ROM builds a devicetree describing the hardware.
Cartesi's ROM image rom.bin
containing this program can be generated from the rom/
directory of the Cartesi Machine SDK.
To do so, it goes over the PMA entries identifying the devices and their locations in the physical address space.
It also looks for a null-terminated string, starting at the last 4k of the ROM region, that will be used as the command-line for the Linux kernel.
Once the devicetree is ready, the ROM program sets register x10
to 0 (the value of mhartid
), x11
to point to the devicetree (which it places at the end of the RAM region), and then jumps to RAM-base at address 0x80000000
.
This is where the entry point of the RAM image is expected to reside.
The dtc
command-line utility can be used to inspect the devicetree:
playground:~$ cartesi-machine -- "dtc -I dtb -O dts /sys/firmware/fdt"
The result is
. / \ / \\---/---\ /----\ \ X \ \----/ \---/---\ \ / CARTESI \ / MACHINE '/dts-v1/;/ { #address-cells = <0x02>; #size-cells = <0x02>; compatible = "ucbbar,riscvemu-bar_dev"; model = "ucbbar,riscvemu-bare"; cpus { #address-cells = <0x01>; #size-cells = <0x00>; timebase-frequency = <0x989680>; cpu@0 { device_type = "cpu"; reg = <0x00>; status = "okay"; compatible = "riscv"; riscv,isa = "rv64aimsu"; mmu-type = "riscv,sv39"; clock-frequency = <0x3b9aca00>; interrupt-controller { #interrupt-cells = <0x01>; interrupt-controller; compatible = "riscv,cpu-intc"; phandle = <0x01>; }; }; }; memory@80000000 { device_type = "memory"; reg = <0x00 0x80000000 0x00 0x3ff0000>; }; flash@8000000000000000 { #address-cells = <0x02>; #size-cells = <0x02>; compatible = "mtd-ram"; bank-width = <0x04>; reg = <0x80000000 0x00 0x00 0x3c00000>; linux,mtd-name = "flash.0"; }; soc { #address-cells = <0x02>; #size-cells = <0x02>; compatible = "ucbbar,riscvemu-bar-soc\0simple-bus"; ranges; htif@40008000 { compatible = "ucb,htif0"; reg = <0x00 0x40008000 0x00 0x1000>; interrupts-extended = <0x01 0x0d>; }; }; chosen { bootargs = "console=hvc0 rootfstype=ext2 root=/dev/mtdblock0 rw quiet mtdparts=flash.0:-(root) -- dtc -I dtb -O dts /sys/firmware/fdt"; };};HaltedCycles: 64177685
The memory@80000000
section describes 64MiB of RAM starting at address 0x80000000
.
The flash@8000000000000000
describes flash drive 0: a memory region of 60MiB, starting at address 0x8000000000000000
, under the control of the mtd-ram
driver, with name flash.0
.
This will eventually become available as block device /dev/mtdblock0
.
Finally, section chosen
includes the bootargs
string that will be used as the kernel command-line parameters.
Notice the specification of the root file-system pointing to /dev/mtdblock0
, i.e., flash.0
, and the mtdparts
giving it label root
.
Also notice the command dtc -I dtb -O dts /sys/firmware/fdt
coming directly from the cartesi-machine
command line.
Linux support for RISC-V is upstream in the Linux kernel archives.
The kernel runs in supervisor mode, on top of a Supervisor Binary Interface (SBI) provided by a machine-mode shim: the Berkeley Boot Loader (BBL).
The BBL is linked against the Linux kernel and this resulting RAM image is preloaded into RAM.
Cartesi's RAM image linux.bin
can be generated from the kernel/
directory of the Cartesi Machine SDK.
The SBI provides a simple interface through which the kernel interacts with CLINT and HTIF.
Besides implementing the SBI, the BBL also installs a trap that catches invalid instruction exceptions.
This mechanism can be used, for example, to emulate floating-point instructions, although it is more efficient to setup the target toolchain to replace floating point instructions with calls to a soft-float implementation instead.
After installing the trap, BBL switches to supervisor mode and cedes control to the kernel entry point.
After completing its own initialization, the kernel mounts the root file-system and eventually cedes control to /sbin/init
.
Cartesi's root file-system rootfs.ext2
can be generated from the fs/
directory in the Cartesi Machine SDK.
The Cartesi-provided /sbin/init
script scans all flash devices /dev/mtdblock1
–/dev/mtdblock7
for valid file-systems.
When a file-system is found, the script obtains the corresponding <label>
(set in the mtdparts
kernel command-line parameter) by inspecting /sys/block/mtdblock*/device/name
and mounts the filesystem at /mnt/<label>
.
The kernel passes to /sbin/init
as command-line parameters all arguments after the separator --
in the bootargs
string it found in the devicetree.
The Cartesi-provided /sbin/init
script concatenates all arguments into a string and executes the command in this string in a shell.
When the shell returns, /sbin/init
unmount all file-systems and gracefully halts the machine.
There will be a section on the yield device driver, the ioctl
, and how the /opt/cartesi/bin/yield
utility uses it.