RISC-V

Figure: Yunsup Lee holding a RISC-V processor prototype chip, January 2013.

RISC-V (pronounced "risk-five") is an open-source instruction set architecture (ISA) based on established reduced instruction set computing (RISC) principles.

Unlike most ISAs, RISC-V is free and open-source and can be used royalty-free for any purpose, permitting anyone to design, manufacture and sell RISC-V chips and software. While it is not the first open ISA,[1] it is significant because it is designed to be useful in a wide range of devices. The instruction set also has a substantial body of supporting software, which avoids a usual weakness of new instruction sets.

The project began in 2010 at the University of California, Berkeley, but many contributors are volunteers and industry workers outside the university.[2]

The RISC-V ISA has been designed with small, fast, and low-power real-world implementations in mind,[3][4] but without over-architecting for a particular microarchitecture style.[4][5][6][7]

As of May 2017, version 2.2 of the userspace ISA is fixed and the privileged ISA is available as draft version 1.10.[4]

Significance

The RISC-V authors aim to provide several CPU designs freely available under a BSD license. Such licenses allow derivative works, such as RISC-V chip designs, to be either open and free, like RISC-V itself, or closed and proprietary.

By contrast, commercial chip vendors such as ARM Holdings and MIPS Technologies charge substantial license fees for the use of their patents.[8][9] They also require non-disclosure agreements before releasing documents that describe their designs' advantages and instruction set. The secrecy prevents security auditing.

Developing a CPU requires design expertise in several specialties: electronic logic, compilers, and operating systems. It is rare to find this outside of a professional engineering team. The result is that modern, high-quality general-purpose computer instruction sets have not recently been widely available anywhere, or even explained, except in academic settings. Because of this, many RISC-V contributors see it as a unified community effort. This need for a large base of contributors is part of the reason why RISC-V was engineered to fit so many uses.

The RISC-V authors also have substantial research and user-experience validating their designs in silicon and simulation. The RISC-V ISA is a direct development from a series of academic computer-design projects. It was originated in part to aid such projects.[4][10]

History

Predecessors

The term "RISC" dates from about 1980.[11] Before this, there was some knowledge that simpler computers could be effective, but the design principles were not widely described. Simple, effective computers have always been of academic interest.

Academics created the RISC instruction set DLX for the first edition of Computer Architecture: A Quantitative Approach in 1990. David Patterson was an author, and later assisted RISC-V. However, DLX was intended for educational use. Academics and hobbyists implemented it using field-programmable gate arrays, but it was not a commercial success.

ARM CPUs, version 2 and earlier, had a public-domain instruction set, and it is still supported by GCC, a popular free software compiler. Three open-source cores exist for this ISA, but they have not been manufactured.[12][13]

OpenRISC is an open-source ISA based on DLX, with associated RISC designs. It is fully supported with GCC and Linux implementations. However, it has few commercial implementations.

Foundation

Krste Asanović at University of California, Berkeley, found many uses for an open-source computer system. In 2010 he decided to develop and publish one in a "short, three-month project over the summer". The plan was to help both academic and industrial users.[10] David Patterson at Berkeley also aided the effort. Patterson originally identified the properties of Berkeley RISC,[11] and RISC-V is one of his long series of cooperative RISC research projects. Early funding was from DARPA.[4]

A partial list of organizations that support the RISC-V Foundation includes: AMD,[14] Andes Technology,[15] BAE Systems, Berkeley Architecture Research, Bluespec, Inc., Cortus, Google, GreenWaves Technologies, Hewlett Packard Enterprise, Huawei, IBM, Imperas Software, ICT, IIT Madras, Lattice Semiconductor, Mellanox Technologies, Microsemi, Micron, Nvidia, NXP, Oracle, Qualcomm, Rambus Cryptography Research, Western Digital, and SiFive.[16][17]

Awards

  • 2017: The Linley Group's Analyst's Choice Award for Best Technology (for the instruction set)[18]

Motivating requirements

The designers claim that the instruction set is the main interface in a computer, because it lies between the hardware and the software. If a good instruction set were open and available for use by all, it should dramatically reduce the cost of software by permitting far more reuse. It should also increase competition among hardware providers, who can spend more resources on design and less on software support.[10]

The designers assert that new principles are becoming rare in instruction set design, as the most successful designs of the last forty years have become increasingly similar. Of those that failed, most did so because their sponsoring companies failed commercially, not because the instruction sets were technically poor. Thus, a well-designed open instruction set built on well-established principles should attract long-term support from many vendors.[10]

Most prior open ISAs used the GNU General Public License (GPL), encouraging users to open their implementations to copying and use by others.

Unlike other academic designs which are optimized only for simplicity of exposition, the designers state that the RISC-V instruction set is for practical computers. It is said to have features to increase computer speed, yet reduce cost and power use. These include a load/store architecture, bit patterns to simplify the multiplexers in a CPU, simplified standards-based floating-point, a design that is architecturally neutral, and placing most-significant bits at a fixed location to speed sign-extension. Sign extension is said to often be on the critical timing path.[4]
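The benefit of a fixed sign-bit position can be shown with a small Python sketch. It decodes the 12-bit immediates of two instruction formats whose low bits sit in different places, yet in both the sign comes from instruction bit 31. The helper names are hypothetical, and the two instruction words are hand-assembled encodings of addi a0, a0, -1 and sw a0, -4(sp).

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def imm_sign(insn: int) -> int:
    """Sign of the immediate: always instruction bit 31, whatever the format."""
    return (insn >> 31) & 1


def i_type_imm(insn: int) -> int:
    # I-type (ADDI, loads, JALR): imm[11:0] occupies bits 31..20
    return sign_extend(insn >> 20, 12)


def s_type_imm(insn: int) -> int:
    # S-type (stores): imm[11:5] in bits 31..25, imm[4:0] in bits 11..7
    return sign_extend(((insn >> 25) << 5) | ((insn >> 7) & 0x1F), 12)


ADDI_A0_M1 = 0xFFF50513  # addi a0, a0, -1
SW_A0_M4 = 0xFEA12E23    # sw a0, -4(sp)
print(i_type_imm(ADDI_A0_M1), imm_sign(ADDI_A0_M1))  # -1 1
print(s_type_imm(SW_A0_M4), imm_sign(SW_A0_M4))      # -4 1
```

Because the sign is available before the format is known, sign-extension hardware can start in parallel with instruction decoding.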

The instruction set is designed for a wide range of uses. It supports three word-widths, 32, 64, and 128 bits, and a variety of subsets. The definitions of each subset vary slightly for the three word-widths. The subsets support small embedded systems, personal computers, supercomputers with vector processors, and warehouse-scale rack-mounted parallel computers.

The instruction set is variable-width and extensible, so that more encoding bits can always be added. Space for a 128-bit version of the ISA was reserved, because 60 years of industry experience has shown that the most unrecoverable error in instruction set design is a lack of memory address space. At the time of writing, the 128-bit ISA remains intentionally undefined, because there is still so little practical experience with such large memory systems.[4]

However, RISC-V also supports the designers' academic uses. The simplicity of the integer subset permits basic student exercises. The integer subset is a simple ISA enabling software to control research machines. The variable-length ISA enables extensions for both student exercises and research.[4] The separated privileged instruction set permits research in operating system support, without redesigning compilers.[19] RISC-V's open intellectual property allows its designs to be published, reused, and modified.[4]

Software

A common problem for a new instruction set is a lack of CPU designs and software.

The RISC-V website has a specification for user-mode instructions, and a preliminary specification for a general-purpose privileged instruction set, to support operating systems.[20]

There are several open-source CPU designs, including the 64-bit Berkeley Out of Order Machine (BOOM),[21] 64-bit Rocket,[22] five 32-bit Sodor CPU designs from Berkeley,[23] picorv32 by Clifford Wolf, scr1 from Syntacore, PULPino (Riscy and Zero-Riscy) from ETH Zurich / University of Bologna,[24] and others. The three-stage Sodor CPU appears apt for a small embedded CPU. Rocket may suit compact, low-power intermediate computers such as personal devices. BOOM uses much of the infrastructure created for Rocket, and may be usable for personal, supercomputer and warehouse-scale computers. Both picorv32 and scr1 are 32-bit MCU-class RV32IMC implementations in Verilog. The cores in PULPino implement a simple RV32IMC ISA for microcontrollers (Zero-Riscy) or a more powerful RV32IMFC ISA with custom DSP extensions for embedded signal processing.

The design software includes a design compiler, Chisel,[25] which can reduce the designs to Verilog for use in devices. The website includes verification data for testing core implementations.

Available RISC-V software tools include a GNU Compiler Collection (GCC) toolchain (with GDB, the debugger), an LLVM toolchain, the OVPsim simulator (and library of RISC-V Fast Processor Models), the Spike simulator, and a simulator in QEMU.

Operating system support exists for the Linux kernel, FreeBSD, and NetBSD, but the supervisor-mode instructions were not yet standardized at the time of writing,[19] so this support is provisional. The preliminary FreeBSD port to the RISC-V architecture was upstreamed in February 2016 and shipped in FreeBSD 11.0.[26][27] Ports of Debian[28] and Fedora[29] are stabilizing. There is a port of U-Boot.[30] UEFI Spec v2.7 defines the RISC-V binding, and a TianoCore port has been done by HPE engineers and is expected to be upstreamed. There is a preliminary port of the seL4 microkernel.[31][32] There is also a simulator that runs a RISC-V Linux system in a web browser using JavaScript.

Adopters

Commercial

  • SiFive, a company established specifically for developing RISC-V hardware, has two existing processor models:
  • Andes Technology Corporation is a founding member of the RISC-V Foundation.[39] Having joined the consortium in 2016, it released its first two RISC-V cores, the N25 and NX25, in 2017. The cores come with a complete design ecosystem and a number of RISC-V partners. Andes is actively driving the development of the RISC-V ecosystem and expects to release several new RISC-V products in 2018.
  • Codasip and UltraSoC have developed fully supported intellectual property for RISC-V embedded SOCs that combine Codasip's RISC-V cores and other IP with UltraSoC's debug, optimization and analytics.[40]
  • Imperas has developed a family of fast processor models for the different subsets of RV32GC and RV64GC ISA variants that are part of the OVPsim instruction accurate simulator distributions used for embedded software development.
  • In February 2018, GreenWaves Technologies announced the availability of GAP8, a 32-bit SoC with one controller core plus eight compute cores, together with a developer board. It is expected to begin shipping in April 2018.[41][42]
  • Hex Five announced the general availability of MultiZone Security, the first RISC-V Trusted Execution Environment utilizing the standard RISC-V ISA and privileged-mode extensions.[43]

In development

Design

ISA base and extensions

RISC-V has a modular design, consisting of alternative base parts, with additional optional extensions. The ISA base and its extensions are developed in a collective effort between industry, the research community and educational institutions. The base specifies instructions (and their encoding), control flow, registers (and their sizes), memory and addressing, logic (i.e., integer) manipulation, and ancillaries. The base by itself can implement a simplified general-purpose computer, with full software support, including a general-purpose compiler.

The standard extensions are specified to work with all of the standard bases, and with each other without conflict.

Many RISC-V computers might implement the compressed extension to reduce power consumption, code size, and memory usage.[4] There are also plans to support hypervisors and virtualization.[19] Together with a supervisor instruction set extension, S, RVGC defines all instructions needed to conveniently support a Unix-style operating system.

Name | Description | Version | Status
Base
RV32I | Base Integer Instruction Set, 32-bit | 2.0 | Frozen
RV32E | Base Integer Instruction Set (embedded), 32-bit, 16 registers | 1.9 | Not frozen
RV64I | Base Integer Instruction Set, 64-bit | 2.0 | Frozen
RV128I | Base Integer Instruction Set, 128-bit | 1.7 | Not frozen
Extension
M | Standard Extension for Integer Multiplication and Division | 2.0 | Frozen
A | Standard Extension for Atomic Instructions | 2.0 | Frozen
F | Standard Extension for Single-Precision Floating-Point | 2.0 | Frozen
D | Standard Extension for Double-Precision Floating-Point | 2.0 | Frozen
G | Shorthand for the base and above extensions | N/A | N/A
Q | Standard Extension for Quad-Precision Floating-Point | 2.0 | Frozen
L | Standard Extension for Decimal Floating-Point | 0.0 | Not frozen
C | Standard Extension for Compressed Instructions | 2.0 | Frozen
B | Standard Extension for Bit Manipulation | 0.36 | Not frozen
J | Standard Extension for Dynamically Translated Languages | 0.0 | Not frozen
T | Standard Extension for Transactional Memory | 0.0 | Not frozen
P | Standard Extension for Packed-SIMD Instructions | 0.1 | Not frozen
V | Standard Extension for Vector Operations | 0.2 | Not frozen
N | Standard Extension for User-Level Interrupts | 1.1 | Not frozen

To tame the combinations of functionality that may be implemented, a nomenclature is defined to specify them.[4] The instruction set base is specified first, coding for RISC-V, the register bit-width, and the variant; e.g., RV64I or RV32E. Then follow letters specifying the implemented extensions in canonical order (as above). The base together with extended integer and floating-point calculations and synchronization primitives for multi-core computing (that is, the base plus the extensions MAFD) is considered necessary for general-purpose computation, and thus has the shorthand G.

A small 32-bit computer for an embedded system might be RV32EC. A large 64-bit computer might be RV64GC; i.e., shorthand for RV64IMAFDC.
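The naming rules can be sketched as a small Python function that expands the G shorthand and puts extension letters into canonical order. The function name is hypothetical, and the sketch handles only the single-letter extensions in the table, not the proposed Zxxx/Yxxx names.

```python
import re

# Single-letter extensions in the canonical order listed in the table above.
CANONICAL_ORDER = "MAFDQLCBJTPVN"


def expand_isa(name: str) -> str:
    """Expand an ISA string such as 'RV64GC' into its explicit form, 'RV64IMAFDC'."""
    m = re.fullmatch(r"RV(32|64|128)([IEG])([A-Z]*)", name.upper())
    if m is None:
        raise ValueError(f"not a recognised ISA string: {name!r}")
    width, base, letters = m.groups()
    exts = set(letters)
    if base == "G":                  # G = base integer set I plus M, A, F and D
        base = "I"
        exts |= set("MAFD")
    unknown = exts - set(CANONICAL_ORDER)
    if unknown:
        raise ValueError(f"unknown extension letters: {sorted(unknown)}")
    return f"RV{width}{base}" + "".join(c for c in CANONICAL_ORDER if c in exts)


print(expand_isa("RV64GC"))   # RV64IMAFDC
print(expand_isa("rv32ec"))   # RV32EC
```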

A naming scheme with Zxxx for standard extensions and Yxxx for non-standard (vendor-specific) extensions has been proposed. For example, the Ztso extension for total store ordering, an alternative memory consistency model to weak memory ordering, is under discussion.[53]

Register sets

RISC-V has 32 (or 16 in the embedded variant) integer registers, and, when the floating point extension is implemented, 32 floating-point registers. Except for memory access instructions, instructions address only registers.

The first integer register (x0) is a zero register, and the remainder are general-purpose registers. A store to the zero register has no effect, and a read always provides 0. Using the zero register as a placeholder makes for a simpler instruction set. E.g., a register-to-register move needs no instruction of its own: it can be written as an add of the source register and the zero register (ADD rd, x0, rs).[4]
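A toy Python model of the integer register file makes the zero-register behaviour concrete. The class and method names are hypothetical and do not describe how any particular core is built; writes are truncated to 32 bits to mimic RV32.

```python
class IntRegisters:
    """Toy model of the 32 RISC-V integer registers; x0 is hardwired to zero."""

    def __init__(self) -> None:
        self._x = [0] * 32

    def read(self, n: int) -> int:
        return 0 if n == 0 else self._x[n]

    def write(self, n: int, value: int) -> None:
        if n != 0:                       # writes to x0 are silently discarded
            self._x[n] = value & 0xFFFFFFFF

    def add(self, rd: int, rs1: int, rs2: int) -> None:
        self.write(rd, self.read(rs1) + self.read(rs2))


regs = IntRegisters()
regs.write(5, 42)
regs.add(6, 0, 5)    # "move x5 to x6" expressed as add x6, x0, x5
regs.add(0, 5, 5)    # result thrown away: x0 still reads as 0
print(regs.read(6), regs.read(0))  # 42 0
```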

Control and status registers exist, but user-mode programs can access only those used for performance-measurement and floating point management.

There are no instructions to save and restore multiple registers. Those were thought to be unnecessary, too complex and perhaps too slow.[4]

Memory access

Like many RISC designs, RISC-V is a load/store architecture; instructions address only registers, with load and store instructions conveying to and from memory.[4]

Memory consists of and is addressed as 8-bit bytes, with words being in little-endian order.[4] Words, up to the register size, can be accessed with the load and store instructions.

Accessed memory addresses need not be aligned to their word width, but accesses to aligned addresses will always be the fastest. For example, simple CPUs may implement unaligned accesses with slow software emulation driven from an alignment-fault exception.[4]

RISC-V manages memory systems that are shared between CPUs or threads by ensuring that a thread of execution always sees its own memory operations in program order. But between threads and I/O devices, RISC-V is simplified: it does not guarantee the order of memory operations, except by specific instructions, such as FENCE.

A FENCE instruction guarantees that the results of predecessor operations are visible to successor operations of other threads or I/O devices. FENCE can guarantee the order of combinations of both memory and memory-mapped I/O operations. E.g., it can separate memory read and write operations without affecting I/O operations. Or, if a system can operate I/O devices in parallel with memory, FENCE doesn't force them to wait for each other. A single CPU with one thread may decode FENCE as NOP.
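FENCE names both sets of operations inside the instruction word itself. The following minimal Python sketch assumes the RV32I encoding in which the predecessor set occupies bits 27..24 and the successor set bits 23..20, each ordered I (device input), O (device output), R (memory reads), W (memory writes), with the remaining fields zero; the function name is hypothetical.

```python
# One bit per operation class, most significant first: I, O, R, W.
_FENCE_BITS = {"i": 0b1000, "o": 0b0100, "r": 0b0010, "w": 0b0001}


def _fence_set(spec: str) -> int:
    return sum(_FENCE_BITS[c] for c in set(spec.lower()))


def encode_fence(pred: str, succ: str) -> int:
    """Encode FENCE pred, succ (opcode MISC-MEM = 0x0F, other fields zero)."""
    return (_fence_set(pred) << 24) | (_fence_set(succ) << 20) | 0x0F


print(hex(encode_fence("iorw", "iorw")))  # 0xff0000f: order everything
print(hex(encode_fence("rw", "rw")))      # 0x330000f: memory only, I/O untouched
```

Separating the I/O and memory bits is what lets a FENCE order memory traffic without forcing device accesses to wait, as described above.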

Like many RISC instruction sets (and indeed some CISC instruction sets, such as the x86 and IBM System/360 families), RISC-V lacks addressing modes that write back to the registers. For example, it does not do auto-incrementing.[4]

RISC-V is little-endian to resemble other familiar, successful computers, for example, x86. This also reduces a CPU's complexity and costs slightly because it reads all sizes of words in the same order. For example, the RISC-V instruction set decodes starting at the lowest-addressed byte of the instruction. The specification leaves open the possibility of non-standard big-endian or bi-endian systems.[4]
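A short Python illustration of the byte order, assuming the hand-assembled instruction word 0x00150513 (addi a0, a0, 1) as it would sit in memory:

```python
# addi a0, a0, 1 (hand-assembled as 0x00150513), lowest-addressed byte first
memory = bytes([0x13, 0x05, 0x15, 0x00])

insn = int.from_bytes(memory, "little")        # full 32-bit word
half = int.from_bytes(memory[:2], "little")    # 16-bit load from the same address
opcode = memory[0] & 0x7F                      # opcode bits 6..0: all in byte 0
print(hex(insn), hex(half), hex(opcode))       # 0x150513 0x513 0x13
```

Loads of any width from the same address see the low-order part first, and the decoder can find the opcode in the first byte it fetches.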

Some RISC CPUs (such as MIPS, PowerPC, DLX, and Berkeley's RISC-I) place 16 bits of offset in the loads and stores. They set the upper 16 bits by a "load upper word" instruction. This permits upper-halfword values to be set easily, without shifting bits. However, most use of the upper half-word instruction makes 32-bit constants, like addresses. RISC-V uses a SPARC-like combination of 12-bit offsets and 20-bit "set upper" instructions. The smaller 12-bit offset helps compact, 32-bit load and store instructions select two of 32 registers yet still have enough bits to support RISC-V's variable-length instruction coding.[4]

Immediates

RISC-V handles 32-bit constants and addresses with instructions that set the upper 20 bits of a 32-bit register. Load upper immediate (LUI) loads 20 bits into bits 31 through 12. Another instruction, AUIPC (add upper immediate to PC), generates the same 20 upper address bits by adding an offset to the program counter and storing the result into a base register. This permits position-independent code to have 32-bit addresses relative to the program counter. The base register can be used as-is with the 12-bit offsets of the loads and stores. If needed, ADDI can then supply the lower 12 bits of a register. In 64-bit ISAs, LUI and AUIPC sign-extend the result to 64 bits.[4]
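One detail the pairing has to handle is that ADDI sign-extends its 12-bit immediate, so when the low 12 bits look negative, the LUI part must be one larger to compensate. A minimal Python sketch of that split for RV32 (the function names are hypothetical; assemblers perform an equivalent calculation for the li pseudo-instruction):

```python
MASK32 = 0xFFFFFFFF


def split_constant(value: int) -> tuple[int, int]:
    """Split a 32-bit constant into LUI's 20-bit immediate and ADDI's 12-bit one."""
    value &= MASK32
    lo = value & 0xFFF
    if lo >= 0x800:
        lo -= 0x1000                       # ADDI sign-extends its immediate...
    hi = ((value - lo) >> 12) & 0xFFFFF    # ...so the upper part absorbs the borrow
    return hi, lo


def lui_addi(hi: int, lo: int) -> int:
    """What 'lui rd, hi' followed by 'addi rd, rd, lo' leaves in a 32-bit register."""
    reg = (hi << 12) & MASK32              # lui: 20 bits into bits 31..12
    return (reg + lo) & MASK32             # addi: add the sign-extended low part


for constant in (0x12345678, 0x12345FFF, 0xFFFFF800):
    hi, lo = split_constant(constant)
    print(hex(constant), "-> lui", hex(hi), "+ addi", lo)
```

The same split, applied to the distance from the PC instead of to an absolute value, yields the AUIPC immediate for position-independent code.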

Some fast CPUs may interpret combinations of instructions as single "fused" instructions. LUI or AUIPC may be good candidates to fuse with loads or stores.

Subroutine calls, jumps, and branches

RISC-V's subroutine call JAL (jump and link) places its return address in a register. This is faster in many computer designs, because it saves a memory access compared to systems that push a return address directly on a stack in memory. JAL has a 20-bit signed (two's complement) offset. The offset is multiplied by 2, then added to the PC to generate a relative address to a 32-bit instruction. If the result is not at a 32-bit address (i.e. evenly divisible by 4), a CPU without the compressed extension may raise an exception.[4]
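The 20-bit offset is stored in JAL's instruction word in a scrambled bit order (the J-type format of the ISA manual), which keeps the sign in bit 31 and lines other fields up with neighbouring formats. The Python sketch below encodes and decodes that layout; the function names are hypothetical, and the ValueError in jal_target merely stands in for the CPU exception mentioned above.

```python
MASK32 = 0xFFFFFFFF


def encode_jal(rd: int, offset: int) -> int:
    """Encode JAL rd, offset; offset is an even byte count within +/-1 MiB."""
    if offset % 2 or not -(1 << 20) <= offset < (1 << 20):
        raise ValueError("JAL offset must be even and within +/-1 MiB")
    imm = offset & 0x1FFFFF                 # 21-bit two's complement; bit 0 implicit
    return ((((imm >> 20) & 1) << 31)       # imm[20]    -> bit 31 (sign)
            | (((imm >> 1) & 0x3FF) << 21)  # imm[10:1]  -> bits 30..21
            | (((imm >> 11) & 1) << 20)     # imm[11]    -> bit 20
            | (((imm >> 12) & 0xFF) << 12)  # imm[19:12] -> bits 19..12
            | (rd << 7) | 0x6F)             # rd and the JAL opcode


def decode_jal_offset(insn: int) -> int:
    """Reassemble and sign-extend the byte offset of a JAL instruction."""
    imm = ((((insn >> 31) & 1) << 20)
           | (((insn >> 12) & 0xFF) << 12)
           | (((insn >> 20) & 1) << 11)
           | (((insn >> 21) & 0x3FF) << 1))
    return imm - (1 << 21) if imm & (1 << 20) else imm


def jal_target(pc: int, insn: int, has_c: bool = False) -> int:
    target = (pc + decode_jal_offset(insn)) & MASK32
    if not has_c and target % 4:
        raise ValueError("misaligned jump target")  # stands in for the exception
    return target


print(hex(encode_jal(1, 8)), hex(encode_jal(0, -4)))  # 0x8000ef 0xffdff06f
```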

RISC-V CPUs jump to calculated addresses using a "jump and link-register" (JALR) instruction. JALR is similar to JAL, but gets its destination address by adding a 12-bit offset to a base register. (In contrast, JAL adds a larger 20-bit offset to the PC.)

JALR's bit format is like that of the register-relative loads and stores. Like them, JALR can be used with the instructions that set the upper 20 bits of a base register to make 32-bit branches, either to an absolute address (using LUI) or to a PC-relative one (using AUIPC for position-independent code). (Using a constant zero base address allows single-instruction calls to a small (the offset), fixed positive or negative address.)
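A PC-relative far call built from AUIPC and JALR can be modelled in a few lines of Python. This sketch assumes an RV32 register width and hypothetical function names; it also reflects the JALR rule that bit 0 of the computed target is cleared, and that the return address is that of the instruction after the JALR.

```python
MASK32 = 0xFFFFFFFF


def split_pc_relative(pc: int, target: int) -> tuple[int, int]:
    """Split (target - pc) into an AUIPC upper immediate and a JALR 12-bit offset."""
    offset = (target - pc) & MASK32
    lo = offset & 0xFFF
    if lo >= 0x800:
        lo -= 0x1000                        # JALR sign-extends its 12-bit offset
    hi = ((offset - lo) >> 12) & 0xFFFFF    # AUIPC immediate, adjusted for that
    return hi, lo


def run_far_call(pc: int, hi: int, lo: int) -> tuple[int, int]:
    """Model 'auipc t1, hi' at pc followed by 'jalr ra, lo(t1)' at pc + 4."""
    t1 = (pc + (hi << 12)) & MASK32         # auipc: PC plus the shifted immediate
    dest = ((t1 + lo) & MASK32) & ~1        # jalr: base + offset, bit 0 cleared
    ra = (pc + 8) & MASK32                  # link: address after the JALR
    return dest, ra


pc, target = 0x00010000, 0x12345678
hi, lo = split_pc_relative(pc, target)
dest, ra = run_far_call(pc, hi, lo)
print(hex(dest), hex(ra))  # 0x12345678 0x10008
```

Because only the distance from the PC is encoded, the same two instructions keep working if the whole program is loaded at a different address.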

RISC-V recycles JAL and JALR to get unconditional 20-bit PC-relative jumps and unconditional register-based 12-bit jumps. Jumps simply use the zero register (x0) as the linkage register, so that no return address is saved.[4]

RISC-V also recycles JALR to return from a subroutine: to do this, JALR's base register is set to the linkage register saved by JAL or JALR. JALR's offset is zero and its linkage register is x0, so that there is no offset and no return address is saved.
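The reuse of JAL and JALR for calls, plain jumps and returns can be traced with a toy Python hart. The Hart class is a hypothetical model with just a PC and integer registers; the comments name the standard assembler pseudo-instructions (j, ret) that correspond to each use.

```python
class Hart:
    """Toy hart with just enough state to show JAL/JALR linkage (x0 stays 0)."""

    def __init__(self, pc: int = 0) -> None:
        self.pc = pc
        self.x = [0] * 32

    def _link(self, rd: int) -> None:
        if rd != 0:                       # rd = x0: jump without saving a return address
            self.x[rd] = self.pc + 4

    def jal(self, rd: int, offset: int) -> None:
        target = self.pc + offset
        self._link(rd)
        self.pc = target

    def jalr(self, rd: int, rs1: int, offset: int) -> None:
        target = (self.x[rs1] + offset) & ~1   # read rs1 before rd can overwrite it
        self._link(rd)
        self.pc = target


RA = 1                  # x1, the conventional return-address register
h = Hart(pc=0x100)
h.jal(RA, 0x40)         # call: ra = 0x104, pc = 0x140
h.jal(0, 8)             # "j": plain jump, nothing saved, pc = 0x148
h.jalr(0, RA, 0)        # "ret" = jalr x0, 0(ra): pc = 0x104
print(hex(h.pc), hex(h.x[RA]))  # 0x104 0x104
```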

Like many RISC designs, in a subroutine call, a RISC-V compiler must use individual instructions to save registers to the stack at the start, and then restore these from the stack on exit. RISC-V has no "save multiple" or "restore multiple" register instructions. These were thought to make the CPU too complex, and possibly slow, as well.[54] This can take more code space. Designers planned to reduce code size with library routines to save and restore registers.[55]

RISC-V has no condition code register or carry bit. The designers believed that condition codes make fast CPUs more complex by forcing interactions between instructions in different stages of execution. This choice makes multiple-precision arithmetic more complex. Also, a few numerical tasks need more energy.[4]
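Without a carry bit, multiple-precision addition recovers the carry with an unsigned comparison: if the low-word sum wrapped around, it is smaller than either operand. A minimal Python sketch of the usual RV32 sequence, with each line annotated by the instruction it stands for (the function names are hypothetical):

```python
MASK32 = 0xFFFFFFFF


def sltu(a: int, b: int) -> int:
    """SLTU: 1 if a < b when both are read as unsigned 32-bit values, else 0."""
    return 1 if (a & MASK32) < (b & MASK32) else 0


def add64_on_rv32(a_lo: int, a_hi: int, b_lo: int, b_hi: int) -> tuple[int, int]:
    """Add two 64-bit numbers held as 32-bit halves, with no carry flag available."""
    lo = (a_lo + b_lo) & MASK32   # add  lo, a_lo, b_lo   (wraps silently)
    carry = sltu(lo, a_lo)        # sltu c, lo, a_lo      (wrapped => sum < operand)
    hi = (a_hi + b_hi) & MASK32   # add  hi, a_hi, b_hi
    hi = (hi + carry) & MASK32    # add  hi, hi, c
    return lo, hi


print(add64_on_rv32(0xFFFFFFFF, 0x1, 0x1, 0x0))  # (0, 2)
```

This costs one extra instruction per word compared with an add-with-carry, which is the added complexity the designers accepted.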

Instead, RISC-V has short branches that perform comparisons: equal, not-equal, less-than, unsigned less-than, greater-than-or-equal and unsigned greater-than-or-equal. Ten comparison-branch operations are implemented with only these six instructions, by reversing the order of operands in the assembler. For example, "branch if greater than" can be done by "branch if less than" with the operands reversed, and "branch if less than or equal" by "branch if greater than or equal" with the operands reversed.[4]
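The operand-swapping trick can be checked with a short Python sketch. The six native conditions and the four derived ones use the usual RISC-V assembler mnemonics (bgt, ble, bgtu and bleu are assembler pseudo-instructions); the dictionaries and helper functions are hypothetical, and values are treated as 32-bit register contents.

```python
MASK32 = 0xFFFFFFFF


def s32(v: int) -> int:
    """Read a 32-bit register value as a signed integer."""
    v &= MASK32
    return v - (1 << 32) if v & 0x80000000 else v


# The six comparison branches that exist as real instructions.
NATIVE = {
    "beq":  lambda a, b: (a & MASK32) == (b & MASK32),
    "bne":  lambda a, b: (a & MASK32) != (b & MASK32),
    "blt":  lambda a, b: s32(a) < s32(b),
    "bge":  lambda a, b: s32(a) >= s32(b),
    "bltu": lambda a, b: (a & MASK32) < (b & MASK32),
    "bgeu": lambda a, b: (a & MASK32) >= (b & MASK32),
}

# Four more come free: the assembler swaps the operands.
SWAPPED = {"bgt": "blt", "ble": "bge", "bgtu": "bltu", "bleu": "bgeu"}


def branch_taken(mnemonic: str, rs1: int, rs2: int) -> bool:
    if mnemonic in SWAPPED:          # e.g. bgt a, b is emitted as blt b, a
        return NATIVE[SWAPPED[mnemonic]](rs2, rs1)
    return NATIVE[mnemonic](rs1, rs2)


print(len(NATIVE) + len(SWAPPED))                           # 10
print(branch_taken("bgt", 5, 3), branch_taken("ble", 5, 3))  # True False
```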

The comparing branches have a 12-bit signed offset, counted in multiples of 2 bytes (a range of about ±4 KiB), and jump relative to the PC.[4]

RISC-V's ISA defines default branch predictions for CPUs: backward conditional branches should be predicted "taken", and forward conditional branches "not taken". These predictions are easy to decode in a pipelined CPU: branch offsets are signed numbers added to the PC, so backward branches have negative two's-complement offsets, with a one in the most significant bit, while forward branches have a zero there. This most significant bit sits at a fixed location in the instruction word (bit 31) in order to speed up the pipeline. Complex CPUs can add branch predictors to work well even with unusual data or situations.

The ISA manual recommends that software be optimized to avoid branch stalls by using the default branch predictions. This reuses the most significant bit of the signed relative address as a "hint bit" to tell whether the conditional branch will be taken or not. So, no other hint bits are needed in the operation codes of RISC-V branches. This makes more bits available in the branch operation codes. Simple, inexpensive CPUs can merely follow the default predictions and still perform well with optimizing compilers. Compilers can still perform statistical path optimization, if desired.[4]
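The static rule needs only one bit of the instruction. The Python sketch below builds B-type branch words following the ISA manual's bit layout and predicts them by looking at bit 31 alone; the function names are hypothetical and the registers chosen are arbitrary.

```python
def encode_branch(funct3: int, rs1: int, rs2: int, offset: int) -> int:
    """Encode a B-type conditional branch; offset is an even byte count in [-4096, 4094]."""
    if offset % 2 or not -4096 <= offset <= 4094:
        raise ValueError("branch offset must be even and within +/-4 KiB")
    imm = offset & 0x1FFF                    # 13-bit two's complement; bit 0 implicit
    return ((((imm >> 12) & 1) << 31)        # imm[12] (sign) -> bit 31
            | (((imm >> 5) & 0x3F) << 25)    # imm[10:5]
            | (rs2 << 20) | (rs1 << 15) | (funct3 << 12)
            | (((imm >> 1) & 0xF) << 8)      # imm[4:1]
            | (((imm >> 11) & 1) << 7)       # imm[11]
            | 0x63)                          # BRANCH opcode


def predict_taken(insn: int) -> bool:
    """Default static prediction: backward (negative offset) => taken."""
    return bool((insn >> 31) & 1)            # no need to reassemble the immediate


BNE = 0b001
loop_back = encode_branch(BNE, rs1=10, rs2=0, offset=-8)
skip_ahead = encode_branch(BNE, rs1=10, rs2=0, offset=12)
print(predict_taken(loop_back), predict_taken(skip_ahead))  # True False
```

A compiler that lays out loops with backward branches and rarely taken cases behind forward branches gets good results from such a predictor at almost no hardware cost.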

To avoid unnecessary loading of branch prediction electronics, (and therefore unnecessary pipeline stalls) the comparing branch codes should never be used for unconditional jumps.[4]

RISC-V does not support predication (the conditional execution of instructions) as its designers claim that CPUs without predication are easier to design, and optimizing compilers are less likely to mistakenly use predication where it should not be used. The designers claim that very fast, out-of-order CPU designs do predication anyway, by doing the comparison branch and conditional code in parallel, then discarding the unused path's effects. They also claim that even in simpler CPUs, predication is less valuable than branch prediction, which can prevent most stalls associated with conditional branches. Code without predication is larger, with more branches, but they also claim that a compressed instruction set (such as RISC-V's set "C") solves that problem in most cases.[4]

Many RISC designs have included a branch delay slot, a position after a branch instruction that can be filled with an instruction which is executed whether or not the branch is taken. This feature can improve the performance of pipelined CPUs by absorbing some of the time wasted if a CPU mispredicts the operation of a conditional branch, and the CPU's pipeline stalls. RISC-V omits a branch delay slot because it complicates multicycle CPUs, superscalar CPUs, and long pipelines. Dynamic branch predictors have succeeded well enough to reduce the need for delayed branches.[4]

Arithmetic and logic sets

RISC-V segregates math into a minimal set of integer instructions (set "I") with add, subtract, shift, bit-wise logic and comparing-branches. These can simulate most of the other RISC-V instruction sets with software. (The atomic instructions are a notable exception.) RISC-V currently lacks the "count leading zero" and bit-field operations normally used to speed software floating-point in a pure-integer processor.
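To see why a count-leading-zeros instruction matters, here is a Python sketch of the routine a software floating-point library would need to normalise results using only base-integer style operations (shifts, compares, adds). The function name is hypothetical; each loop step stands for a short shift-and-branch sequence that a CLZ instruction would replace.

```python
def clz32(x: int) -> int:
    """Count leading zeros of a 32-bit value using only shifts, compares and adds."""
    x &= 0xFFFFFFFF
    if x == 0:
        return 32
    n = 0
    for shift in (16, 8, 4, 2, 1):       # binary search for the highest set bit
        if (x >> (32 - shift)) == 0:     # top `shift` bits are all zero
            n += shift
            x = (x << shift) & 0xFFFFFFFF
    return n


print(clz32(1), clz32(0x00010000), clz32(0x80000000))  # 31 15 0
```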

The integer multiplication instructions (set "M") include signed and unsigned multiply and divide. Double-width products are supported by separate multiply instructions that return the "high word" of the result (MULH, MULHU, MULHSU), and division comes as separate quotient (DIV, DIVU) and remainder (REM, REMU) instructions. The ISA document recommends that implementors of CPUs and compilers "fuse" a standardized sequence of high and low multiply, or of divide and remainder, instructions into a single operation if possible.[4]

The floating-point instructions (set "F") include single-precision arithmetic and comparison instructions, which write their result to an integer register where it can be tested with the ordinary integer branches. It requires an additional set of 32 floating-point registers. These are separate from the integer registers. The double-precision floating-point instructions (set "D") generally assume that the floating-point registers are 64-bit (i.e. double-width), and the "F" subset is coordinated with the "D" set. A quad-precision 128-bit floating-point ISA ("Q") is also defined. RISC-V computers without floating-point can use a floating-point software library.[4]

RISC-V does not cause exceptions on arithmetic errors, including overflow, underflow, subnormal results and divide by zero. Instead, both integer and floating-point arithmetic produce reasonable default values, and floating-point operations also set status flags. Divide-by-zero can be discovered by a single branch after the division. The status flags can be tested by an operating system or periodic interrupt.[4]
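The default results of integer division are fully specified by the "M" extension: dividing by zero yields a quotient with all bits set and leaves the dividend as the remainder, and the one signed overflow case (most negative number divided by -1) returns the dividend with a zero remainder. A Python model for RV32 (function names hypothetical) follows; note that RISC-V division rounds toward zero, unlike Python's floor division.

```python
MASK32 = 0xFFFFFFFF
INT32_MIN = -(1 << 31)


def _s32(v: int) -> int:
    v &= MASK32
    return v - (1 << 32) if v & 0x80000000 else v


def div(a: int, b: int) -> int:
    a, b = _s32(a), _s32(b)
    if b == 0:
        return MASK32                    # x / 0 -> all ones (-1)
    if a == INT32_MIN and b == -1:
        return a & MASK32                # overflow: quotient is the dividend
    q = abs(a) // abs(b)                 # truncate toward zero
    return (q if (a < 0) == (b < 0) else -q) & MASK32


def rem(a: int, b: int) -> int:
    a, b = _s32(a), _s32(b)
    if b == 0:
        return a & MASK32                # x % 0 -> x
    if a == INT32_MIN and b == -1:
        return 0
    r = abs(a) % abs(b)
    return (r if a >= 0 else -r) & MASK32    # remainder takes the dividend's sign


def divu(a: int, b: int) -> int:
    return MASK32 if (b & MASK32) == 0 else (a & MASK32) // (b & MASK32)


def remu(a: int, b: int) -> int:
    return a & MASK32 if (b & MASK32) == 0 else (a & MASK32) % (b & MASK32)


print(hex(div(7, 0)), rem(7, 0))  # 0xffffffff 7
```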

Atomic memory operations

RISC-V supports computers that share memory between multiple CPUs and threads. RISC-V's standard memory consistency model is release consistency. That is, loads and stores may generally be reordered, but some loads may be designated as acquire operations which must precede later memory accesses, and some stores may be designated as release operations which must follow earlier memory accesses.[4]

The base instruction set includes minimal support in the form of a FENCE instruction to enforce memory ordering. Although this is sufficient (FENCE R,RW provides "acquire" and FENCE RW,W provides "release"), combined operations can be more efficient.[4]

The atomic memory operation extension supports two types of atomic memory operations for release consistency. First, it provides general-purpose "load-reserved" (LR) and "store-conditional" (SC) instructions. LR performs a load and tries to reserve that address for its thread. A later store-conditional SC to the reserved address will be performed only if the reservation is not broken by an intervening store from another source. If the store succeeds, a zero is placed in a register. If it fails, a non-zero value indicates that software needs to retry the operation. In either case, the reservation is released.[4]
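The usual LR/SC retry loop can be simulated in Python. This is a single-threaded sketch under simplifying assumptions: one reservation per hart, any store from another hart to the reserved address breaks it, and the interfere hook is a hypothetical way to inject that competing store between LR and SC. All class and function names are illustrative.

```python
class ReservationMemory:
    """Toy memory with one LR reservation per hart."""

    def __init__(self) -> None:
        self.cells: dict[int, int] = {}
        self.reserved: dict[int, int] = {}      # hart id -> reserved address

    def lr(self, hart: int, addr: int) -> int:
        self.reserved[hart] = addr
        return self.cells.get(addr, 0)

    def sc(self, hart: int, addr: int, value: int) -> int:
        ok = self.reserved.pop(hart, None) == addr   # released either way
        if ok:
            self.store(hart, addr, value)
            return 0                                  # 0 = success
        return 1                                      # non-zero = retry

    def store(self, hart: int, addr: int, value: int) -> None:
        self.cells[addr] = value
        for other, a in list(self.reserved.items()):
            if other != hart and a == addr:
                del self.reserved[other]              # reservation broken


def atomic_add(mem, hart, addr, delta, interfere=None):
    """LR/SC retry loop; returns (old value, number of attempts)."""
    attempts = 0
    while True:
        attempts += 1
        old = mem.lr(hart, addr)
        if interfere is not None:                     # simulate another hart
            interfere()
            interfere = None
        if mem.sc(hart, addr, old + delta) == 0:
            return old, attempts


mem = ReservationMemory()
mem.cells[0x100] = 5
# Hart 1 stores 10 between hart 0's LR and SC, so the first SC fails.
old, attempts = atomic_add(mem, 0, 0x100, 3, interfere=lambda: mem.store(1, 0x100, 10))
print(old, attempts, mem.cells[0x100])  # 10 2 13
```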

The second group of atomic instructions perform read-modify-write sequences: a load (which is optionally a load-acquire) to a destination register, then an operation between the loaded value and a source register, then a store of the result (which may optionally be a store-release). Making the memory barriers optional permits combining the operations. The optional operations are enabled by "acquire" and "release" bits which are present in every atomic instruction. RISC-V defines nine possible operations: swap (use source register value directly); add; bitwise and, or, and exclusive-or; and signed and unsigned minimum and maximum.[4]

A system design may optimize these combined operations more than LR and SC. For example, if the destination register for a swap is the constant zero, the load may be skipped. If the value stored is unmodified since the load, the store may be skipped.[4]
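The nine read-modify-write operations share one shape: read the old value, combine it with the source register, write the result back, and return the old value to the destination register. The Python sketch below models them for 32-bit words in a single-threaded setting; the acquire/release bits are not modelled, the dictionary keys follow the instructions' word-sized mnemonics, and the helper names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def _s32(v: int) -> int:
    v &= MASK32
    return v - (1 << 32) if v & 0x80000000 else v


# New memory value computed from (old memory value, source register).
AMO_OPS = {
    "amoswap.w": lambda old, src: src,
    "amoadd.w":  lambda old, src: old + src,
    "amoand.w":  lambda old, src: old & src,
    "amoor.w":   lambda old, src: old | src,
    "amoxor.w":  lambda old, src: old ^ src,
    "amomin.w":  lambda old, src: old if _s32(old) <= _s32(src) else src,
    "amomax.w":  lambda old, src: old if _s32(old) >= _s32(src) else src,
    "amominu.w": lambda old, src: min(old, src),
    "amomaxu.w": lambda old, src: max(old, src),
}


def amo(mem: dict, op: str, addr: int, src: int) -> int:
    """Apply one AMO (atomic in this single-threaded model); rd gets the OLD value."""
    old = mem.get(addr, 0) & MASK32
    mem[addr] = AMO_OPS[op](old, src & MASK32) & MASK32
    return old


memory = {0: 5}
print(amo(memory, "amoadd.w", 0, 3), memory[0])   # 5 8
```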

The IBM System/370 and its successors, including z/Architecture, and x86 both implement a compare-and-swap (CAS) instruction, which tests and conditionally updates a location in memory: if the location contains an expected old value, CAS replaces it with a given new value; it then returns an indication of whether it made the change. However, a simple load-type instruction is usually performed before the CAS to fetch the old value. The classic problem is that if a thread reads (loads) a value "A", calculates a new value "C", and then uses CAS to replace "A" with "C", it has no way to know whether concurrent activity in another thread has replaced "A" with some other value "B" and then restored the "A" in between. In some algorithms (e.g., ones in which the values in memory are pointers to dynamically allocated blocks), this ABA problem can lead to incorrect results. The most common solution employs a "double-wide CAS" instruction to update both the pointer and an adjacent counter; unfortunately, such an instruction requires a special instruction format to specify multiple registers, performs several reads and writes, and can have complex bus operation.[4]

The LR/SC alternative is more efficient. It usually requires only one memory load, and minimizing slow memory operations is desirable. It is also exact: it controls all accesses to the memory cell, rather than just assuring a bit pattern. However, unlike CAS, it can permit livelock, in which two or more threads repeatedly cause each other's instructions to fail. RISC-V guarantees forward progress (no livelock) if the code follows rules on the timing and sequence of instructions: 1) it must use only the "I" subset; 2) to prevent repetitive cache misses, the code (including the retry loop) must occupy no more than 16 consecutive instructions; 3) it must not include any system or fence instructions, or taken backward branches, between the LR and SC; 4) the backward branch to the retry loop must be to the original sequence.[4]
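The difference between checking a bit pattern and watching every access can be shown with a small Python simulation of the ABA scenario. It is a sketch under a stated simplification: a per-cell store counter stands in for the LR reservation, whereas real hardware tracks reservations differently and SC may also fail for other reasons. All names are hypothetical.

```python
class Cell:
    """One memory word plus a count of stores, standing in for a reservation."""

    def __init__(self, value):
        self.value = value
        self.writes = 0

    def store(self, value):
        self.value = value
        self.writes += 1          # bumped on every store, whatever the value


def cas(cell, expected, new):
    """Compare-and-swap: succeeds whenever the bit pattern still matches."""
    if cell.value == expected:
        cell.store(new)
        return True
    return False


def lr(cell):
    return cell.value, cell.writes           # value plus a reservation token


def sc(cell, token, new):
    """Store-conditional: fails if ANY store hit the cell since the LR."""
    if cell.writes == token:
        cell.store(new)
        return True
    return False


# Thread 1 reads "A"; meanwhile another thread changes A -> B -> A.
c1 = Cell("A")
seen = c1.value
c1.store("B")
c1.store("A")
cas_ok = cas(c1, seen, "C")

c2 = Cell("A")
seen2, token = lr(c2)
c2.store("B")
c2.store("A")
sc_ok = sc(c2, token, "C")

print(cas_ok, sc_ok)  # True False: CAS misses the A-B-A history, SC does not
```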

The specification gives examples of how to use this subset to lock a data structure.[4]

Compressed subset

The standard RISC-V ISA specifies that all instructions are 32 bits. This makes for a particularly simple implementation but, like other RISC processors with such an instruction encoding, results in larger code size than in other instruction sets.[4][54] To compensate, RISC-V's "32-bit" instructions are actually 30 bits; three-quarters of the opcode space is reserved for an optional (but recommended) variable-length "compressed" instruction set, RVC (letter "C"),[55] that includes 16-bit instructions. Like ARM's Thumb and MIPS16, the compressed instructions are simply aliases for a subset of the larger instructions. Unlike ARM's Thumb or the MIPS compressed set, space was reserved from the beginning, so there is no separate operating mode. Standard and compressed instructions may be intermixed freely.[4][54]

Because (like Thumb-1 and MIPS16) the compressed instructions are simply alternate encodings (aliases) for a selected subset of larger instructions, the compression can be implemented in the assembler, and it is not essential for the compiler to even know about it.
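Because each compressed instruction is an alias, a decoder (or assembler, working in reverse) only needs a fixed field mapping. The sketch below expands one RVC instruction, `C.LI rd, imm`, into the 32-bit `ADDI rd, x0, imm` it stands for. It is a minimal illustration covering only this one instruction; the function name `expand_c_li` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Expand C.LI (quadrant 1, funct3 = 010) to the equivalent 32-bit
 * instruction: c.li rd, imm  ==  addi rd, x0, imm. */
static uint32_t expand_c_li(uint16_t c) {
    assert((c & 0x3) == 0x1 && (c >> 13) == 0x2);  /* must be C.LI */
    uint32_t rd = (c >> 7) & 0x1f;
    /* 6-bit immediate: imm[5] is bit 12, imm[4:0] are bits 6..2. */
    int32_t imm = (int32_t)(((c >> 12) & 0x1) << 5 | ((c >> 2) & 0x1f));
    if (imm & 0x20)
        imm -= 64;                                 /* sign-extend */
    /* rs1 = x0 and funct3 = 000 (ADDI) are all-zero fields. */
    return ((uint32_t)imm & 0xfff) << 20 | rd << 7 | 0x13;  /* OP-IMM */
}
```

Expanding `0x557D` (`c.li a0, -1`) yields `0xFFF00513`, the standard encoding of `addi a0, x0, -1`.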

A prototype of RVC was tested in 2011.[54] The prototype's code was 20% smaller than x86 PC code and MIPS compressed code, and 2% larger than ARM Thumb-2 code.[54] It also substantially reduced both the needed cache memory and the estimated power usage of the memory system.[54]

The researcher intended to reduce the code's binary size for small computers, especially embedded computer systems. The prototype included 33 of the most frequently used instructions, recoded as compact 16-bit formats using operation codes previously reserved for the compressed set.[54] The compression was done in the assembler, with no changes to the compiler. Compressed instructions omitted fields that are often zero, used small immediate values or accessed subsets (16 or 8) of the registers. Template:Code is very common and often compressible.[54]
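The kind of test an assembler applies when deciding whether to compress can be sketched for a single case, `addi rd, rs1, imm`. This minimal sketch follows the ratified RVC encoding rather than the 2011 prototype, and checks only two forms: `c.li` (when `rs1` is `x0`) and `c.addi` (when `rs1 == rd` and the immediate is nonzero), both of which need `rd != x0` and a 6-bit signed immediate. Other forms such as `c.addi16sp` are ignored. Many RVC formats also use 3-bit register fields that reach only `x8`–`x15`, modelled by the helpers. All function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Registers reachable from the 3-bit fields used by many RVC formats. */
static bool is_rvc_reg(int r) { return r >= 8 && r <= 15; }

static int rvc_reg_field(int r) {
    assert(is_rvc_reg(r));
    return r - 8;                     /* encoded as 0..7 */
}

/* Can "addi rd, rs1, imm" be emitted as c.li or c.addi? */
static bool addi_is_compressible(int rd, int rs1, int imm) {
    if (rd == 0 || imm < -32 || imm > 31)
        return false;                 /* needs rd != x0 and a 6-bit imm */
    if (rs1 == 0)
        return true;                  /* c.li rd, imm */
    return rs1 == rd && imm != 0;     /* c.addi rd, imm */
}
```

For instance, `addi a0, a0, 1` (registers 10, 10) qualifies, while `addi a0, a1, 1` or `addi a0, a0, 32` does not.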

Much of the difference in size compared to ARM's Thumb set occurred because RISC-V, and the prototype, have no instructions to save and restore multiple registers. Instead, the compiler generated conventional instructions that access the stack. The prototype RVC assembler then often converted these to compressed forms that were half the size. However, this still took more code space than the ARM instructions that save and restore multiple registers. The researcher proposed to modify the compiler to call library routines to save and restore registers. These routines would tend to remain in a code cache and therefore run fast, though probably not as fast as a save-multiple instruction.[54]

Embedded subset

An instruction set for the smallest "embedded" CPUs (set "E") is reduced in other ways: only 16 of the 32 integer registers are supported. The specification forbids the floating-point extensions as uneconomical, so a floating-point software library must be used.[4] The compressed set "C" is recommended. The privileged instruction set supports only machine mode, user mode and memory schemes that use base-and-bound address relocation.[19]

There is also discussion of a microcontroller profile for RISC-V, to ease development of deeply embedded systems. It centers on faster, simpler C-language support for interrupts, simplified security modes and a simplified POSIX application binary interface.[56]

Correspondents have also proposed smaller, non-standard, 16-bit "RV16E" ISAs. One would use 16 × 16-bit integer registers with the standard "EIMC" extensions (including 32-bit instructions).[57] Another would use only the 16-bit "C" instructions with 8 × 16-bit registers. A full RV16EG was said to be possible with a completely recoded ISA.[58]

Privileged instruction set

RISC-V's ISA includes a separate privileged instruction set specification. As of July 2017, this specification is preliminary.[19]

Version 1.10 of the specification supports several types of computer systems:

  1. Systems that have only machine mode, perhaps for embedded systems.
  2. Systems with both machine mode (for the supervisor) and user mode, perhaps to implement Linux.
  3. Systems with machine mode, hypervisors, multiple supervisors, and user modes under each supervisor.

These correspond roughly to systems with at most four "rings" of privilege and security: machine, hypervisor, supervisor and user. Each level is also expected to have a thin layer of standardized supporting software that communicates with a more-privileged level or with the hardware.[19]

The overall plan for this ISA is to make the hypervisor mode orthogonal to the user and supervisor modes.[59] The basic feature is a configuration bit that either permits supervisor-level code to access hypervisor registers or causes an interrupt on such accesses. This bit lets supervisor mode directly handle the hardware needed by a hypervisor, which simplifies a type 2 hypervisor hosted by an operating system, a popular way to run warehouse-scale computers. To support type 1 (unhosted) hypervisors, the bit can cause these accesses to interrupt to a hypervisor. The bit also simplifies nesting of hypervisors, in which a hypervisor runs under a hypervisor. It is also said to simplify supervisor code by letting the kernel use its own hypervisor features with its own kernel code. As a result, the hypervisor form of the ISA supports five modes: machine, supervisor, user, supervisor-under-hypervisor and user-under-hypervisor.

The privileged instruction set specification explicitly defines "hardware threads", or "harts". Multiple hardware threads are a common practice in larger, more powerful computers. When one thread is stalled, waiting for memory, others can often proceed. Hardware threads can help make better use of the large number of registers and execution units in large CPUs. Finally, hardware threads can be a simple, powerful way to handle interrupts: no registers need to be saved or restored; the processor simply executes a different hardware thread. The only hardware thread required in a RISC-V computer is thread zero.[19]

The existing control and status register definitions support RISC-V's error and memory exceptions, and a small number of interrupts. For larger systems, the specification also defines an interrupt controller. Interrupts always start at the highest-privileged machine level, and the control registers of each level have explicit "forwarding" bits to route interrupts to less-privileged code. For example, the hypervisor need not include software that executes on each interrupt to forward an interrupt to an operating system. Instead, on set-up, it can set bits to forward the interrupt.[19]
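The forwarding bits can be modelled as per-level bitmasks indexed by the interrupt cause. The sketch below uses the names of the delegation registers in privileged specification 1.10, `mideleg` (machine level) and `sideleg` (supervisor level, present only with user-level interrupts). It is a deliberate simplification: it ignores the currently running privilege level, interrupt-enable bits and the hypervisor level. The function name `trap_target` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum priv { PRIV_U = 0, PRIV_S = 1, PRIV_M = 3 };  /* privilege encodings */

/* Which level handles interrupt `cause`? Handling starts at machine
 * level; a set bit in each level's mask forwards it one level down. */
static enum priv trap_target(unsigned cause, uint64_t mideleg,
                             uint64_t sideleg) {
    assert(cause < 64);
    if (!((mideleg >> cause) & 1))
        return PRIV_M;            /* not forwarded: machine level */
    if ((sideleg >> cause) & 1)
        return PRIV_U;            /* forwarded twice: user level */
    return PRIV_S;                /* forwarded once: supervisor level */
}
```

Once the bits are set up, no machine-level code runs on each forwarded interrupt, which is the saving the paragraph describes.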

Several memory systems are supported in the specification. Physical-only addressing is suited to the smallest embedded systems. There are also three UNIX-style virtual memory systems for memory cached in mass-storage systems. These virtual memory systems come in three sizes, with 32-, 39- and 48-bit virtual addresses. All of them support 4 KiB pages and multilevel page-table trees, and use very similar algorithms to walk those trees. All are designed for either hardware or software page-table walking. To optionally reduce the cost of page-table walks, larger pages may be leaf entries in higher levels of a system's page-table tree. Sv32 has a two-level page table and supports 4 MiB megapages. Sv39 has a three-level page table and supports 2 MiB megapages and 1 GiB gigapages. Sv48, whose implementations are also required to support Sv39, has a four-level page table and supports 2 MiB megapages, 1 GiB gigapages and 512 GiB terapages. Each superpage must be aligned, both virtually and physically, to a boundary of its own size.[19]
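The three schemes differ only in the number of levels and the width of each level's index, so the page-table indices and leaf sizes follow from a little arithmetic. In this minimal C sketch every page offset is 12 bits; Sv32 splits the rest into two 10-bit indices, while Sv39 and Sv48 use three and four 9-bit indices. The type `sv_mode` and the function names are hypothetical; the walk itself (reading and checking page-table entries) is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct { int levels; int index_bits; } sv_mode;

static const sv_mode SV32 = { 2, 10 };
static const sv_mode SV39 = { 3, 9 };
static const sv_mode SV48 = { 4, 9 };

/* Index into the page table at `level` (VPN[level]); a walk starts
 * at level `levels - 1` and ends at level 0 or at an earlier leaf. */
static uint64_t vpn(sv_mode m, uint64_t va, int level) {
    assert(level >= 0 && level < m.levels);
    return (va >> (12 + m.index_bits * level)) & ((1u << m.index_bits) - 1);
}

/* Bytes mapped by a leaf found at `level` (level 0 is a 4 KiB page). */
static uint64_t leaf_size(sv_mode m, int level) {
    assert(level >= 0 && level < m.levels);
    return (uint64_t)1 << (12 + m.index_bits * level);
}

/* A superpage must be aligned to its own size. */
static bool leaf_aligned(sv_mode m, uint64_t addr, int level) {
    return addr % leaf_size(m, level) == 0;
}
```

`leaf_size` reproduces the sizes in the text: 4 MiB for an Sv32 level-1 leaf, 2 MiB and 1 GiB for Sv39 levels 1 and 2, and 512 GiB for an Sv48 level-3 leaf.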

Bit manipulation

Substantial work was done to produce a preliminary, though unapproved, bit manipulation ("B") ISA for RISC-V. Done well, a bit manipulation subset can aid cryptographic, graphic and mathematical operations. The criteria for inclusion documented in the draft were compliance with RISC-V philosophies and ISA formats; substantial improvements in code density or speed (i.e., at least a 3-for-1 reduction in instructions); and substantial real-world applications, including preexisting compiler support. Version 0.36 of the draft included[60] uncontroversial instructions to count leading zeros, count one bits, perform 'and' with complement, shift ones, rotate, perform a generalized bit-reverse and shuffle, swap bytes, extract and deposit bits, and some bit manipulation additions for the compressed set (not, neg and reverse). It also included a controversial proposal for bit-field extract and place, using a nonstandard 48-bit instruction format.
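What several of the listed operations compute can be written as plain C reference functions on 32-bit words. Each would be a single instruction under the draft, whereas portable code needs a loop or several operations. The function names are descriptive and hypothetical, not the draft's mnemonics, and the generalized reverse, shuffle, extract and deposit operations are omitted.

```c
#include <assert.h>
#include <stdint.h>

static int count_leading_zeros(uint32_t x) {
    int n = 0;
    for (uint32_t bit = 0x80000000u; bit != 0 && !(x & bit); bit >>= 1)
        n++;
    return n;                               /* 32 when x == 0 */
}

static int count_ones(uint32_t x) {
    int n = 0;
    for (; x != 0; x &= x - 1)              /* clear the lowest set bit */
        n++;
    return n;
}

static uint32_t and_with_complement(uint32_t a, uint32_t b) {
    return a & ~b;                          /* one op instead of NOT + AND */
}

static uint32_t shift_left_ones(uint32_t x, unsigned n) {
    assert(n < 32);
    return (x << n) | ((1u << n) - 1);      /* vacated bits become 1 */
}

static uint32_t rotate_right(uint32_t x, unsigned n) {
    n &= 31;
    return n == 0 ? x : (x >> n) | (x << (32 - n));
}

static uint32_t byte_swap(uint32_t x) {
    return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
           ((x << 8) & 0x00ff0000u) | (x << 24);
}
```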

Packed SIMD

For simple, cost-reduced RISC-V systems, there is a proposal to use the floating-point registers' bits to perform parallel SIMD sub-word arithmetic, which is widely used to accelerate multimedia and other digital signal processing.[4] As of 2016 this ISA was undefined, but it could resemble PA-RISC's multimedia instructions. Besides native 64-bit arithmetic, PA-RISC processors with MAX2 could do arithmetic on four 16-bit subwords at a time, with several methods of handling overflow. They could also move subwords to different positions. MAX2 was intentionally simplified: it lacked support for 8-bit or 32-bit subwords, and the 16-bit subword size was chosen to support most digital signal processing tasks. These instructions were inexpensive to design and build, yet they increased the CPU's performance on digital signal processing tasks 48-fold or more, enabling practical real-time video codecs in 1995.[61][62]
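Sub-word arithmetic of the kind described, four 16-bit lanes in one 64-bit register, can be emulated in plain C to show what the hardware saves. This sketch shows two of the overflow methods: modular wraparound, done branch-free with a standard "SIMD within a register" trick, and unsigned saturation, done lane by lane. The function names are hypothetical, and no particular RISC-V or MAX2 instruction is implied.

```c
#include <stdint.h>

#define LANE_MSBS 0x8000800080008000ull  /* top bit of each 16-bit lane */

/* Four 16-bit adds that wrap within each lane. Clearing the lane MSBs
 * first stops carries from crossing lanes; XOR then restores the MSBs. */
static uint64_t add4x16_wrap(uint64_t a, uint64_t b) {
    uint64_t low = (a & ~LANE_MSBS) + (b & ~LANE_MSBS);
    return low ^ ((a ^ b) & LANE_MSBS);
}

/* Four 16-bit adds that clamp at 0xFFFF (unsigned saturation). */
static uint64_t add4x16_usat(uint64_t a, uint64_t b) {
    uint64_t r = 0;
    for (int lane = 0; lane < 4; lane++) {
        int shift = 16 * lane;
        uint32_t s = (uint32_t)((a >> shift) & 0xffff) +
                     (uint32_t)((b >> shift) & 0xffff);
        if (s > 0xffff)
            s = 0xffff;
        r |= (uint64_t)s << shift;
    }
    return r;
}
```

Adding `0x7000` and `0x9000` in the top lane gives `0x0000` under wraparound but `0xFFFF` under saturation; the saturating form is usually what audio and video code wants.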

Vector set

The proposed vector processing instruction set may make the packed SIMD set obsolete. The designers hope to have enough flexibility that a CPU can implement vector instructions in a standard processor's registers. This would enable minimal implementations with performance similar to a multimedia ISA such as the packed SIMD proposal described above. However, a true vector coprocessor could execute the same code with higher performance.[63]

As of 29 June 2015, the vector processing proposal is a conservative, flexible design of a general-purpose mixed-precision vector processor suitable for executing compute kernels. Code would port easily to CPUs with differing vector lengths, ideally without recompiling.[63]

In contrast, short-vector SIMD extensions, as used in x86, ARM and PA-RISC, are less convenient. In these, an increase in vector register width forces a change to the instruction set (in the case of x86, from 64-bit MMX registers to 128-bit SSE, 256-bit AVX and 512-bit AVX-512). The result is not only a growing instruction set but also a need to port working code to the new instructions.

In the RISC-V vector ISA, rather than fixing the vector length in the architecture, an instruction (Template:Code) is available that takes a requested size and sets the vector length to the minimum of the hardware limit and the requested size. The RISC-V proposal is thus more like Cray's long-vector designs: each of the up to 32 vector registers has the same length.[63]
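The "minimum of the hardware limit and the requested size" rule is what lets one loop run on any vector length, a pattern usually called strip-mining. The C sketch below simulates it: `set_vector_length` is a hypothetical stand-in for the instruction (whose name is not preserved above), `hw_max_vl` is an assumed hardware limit, and the inner loop stands in for a single vector add.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the instruction: grant min(requested, limit). */
static size_t set_vector_length(size_t requested, size_t hw_max_vl) {
    return requested < hw_max_vl ? requested : hw_max_vl;
}

/* Strip-mined c[i] = a[i] + b[i]. Nothing in the loop depends on a
 * fixed vector width, so the same code works for any hw_max_vl. */
static void vec_add(const int *a, const int *b, int *c, size_t n,
                    size_t hw_max_vl) {
    assert(hw_max_vl > 0);
    for (size_t i = 0; i < n; ) {
        size_t vl = set_vector_length(n - i, hw_max_vl);
        for (size_t j = 0; j < vl; j++)   /* stands in for one vector op */
            c[i + j] = a[i + j] + b[i + j];
        i += vl;
    }
}
```

With a hardware limit of 4, ten elements are processed as 4 + 4 + 2; with a limit of 64, in a single pass. The results are identical, and the final short strip needs no separate cleanup loop.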

The application specifies the total vector width it requires, and the processor determines the vector length it can provide with available on-chip resources. This takes the form of an instruction (Template:Code) with four immediate operands, specifying the number of vector registers of each available width needed. The total must be no more than the addressable limit of 32, but may be less if the application does not require them all. The vector length is limited by the available on-chip storage divided by the number of bytes of storage needed for each entry. (Added hardware limits may also exist, which in turn may permit SIMD-style implementations.)[63]

Outside of vector loops, the application can request zero vector registers, saving the operating system the work of preserving them on context switches.[63]
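The sizing rule in the two paragraphs above (storage divided by the bytes needed per element, a cap of 32 registers, and zero registers outside vector loops) reduces to a short calculation. This C sketch is illustrative only. It assumes the four immediate operands count registers with 1-, 2-, 4- and 8-byte elements, and it ignores the additional hardware limits the proposal allows. The function name `vector_length_for` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Vector length granted for a configuration that requests count[k]
 * registers of element width width[k] (widths assumed for this sketch). */
static size_t vector_length_for(size_t storage_bytes, const unsigned count[4]) {
    static const size_t width[4] = { 1, 2, 4, 8 };
    unsigned total_regs = 0;
    size_t bytes_per_element = 0;
    for (int k = 0; k < 4; k++) {
        total_regs += count[k];
        bytes_per_element += count[k] * width[k];
    }
    assert(total_regs <= 32);   /* addressable limit */
    if (total_regs == 0)
        return 0;               /* no vector state to preserve */
    return storage_bytes / bytes_per_element;
}
```

For example, with an assumed 4 KiB of vector storage, asking for eight 32-bit and four 64-bit registers needs 64 bytes per element position, giving a vector length of 64. Asking for fewer or narrower registers gives longer vectors.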

The vector length is not only architecturally variable but also designed to vary at run time. To achieve this flexibility, the instruction set is likely to use variable-width data paths and variable-type operations using polymorphic overloading.[63] The plan is that these can reduce the size and complexity of the ISA and compiler.[63]

Recent experimental vector processors with variable-width data paths also show profitable increases in operations per second (speed), per unit area (lower cost) and per watt (longer battery life).[64]

Unlike typical modern graphics processing units, the proposal includes no plans for special hardware to support branch predication; lower-cost, compiler-based predication will be used instead.[63][65]

External debug system

There is a preliminary specification for RISC-V's hardware-assisted debugger. The debugger will use a transport system such as Joint Test Action Group (JTAG) or Universal Serial Bus (USB) to access debug registers. A standard hardware debug interface may support either a "standardized abstract interface" or "instruction feeding".[66][67]

As of January 2017 the exact form of the "abstract interface" remains undefined, but proposals include a memory-mapped system with standardized addresses for the registers of debug devices, or a command register and a data register accessible to the communication system.[66] Correspondents claim that similar systems are used by Freescale's BDM for some CPUs, ARM, OpenRISC and Aeroflex's LEON.[66]
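The command-register-plus-data-register proposal can be pictured with a toy model of a halted hart. Everything here is hypothetical, since the interface was undefined: the operation codes, the command layout (low byte = operation, next byte = register number) and the names `debug_target`, `write_command` and `make_command` are invented for illustration. The idea shown is that the debugger moves a value through the data register, then triggers a transfer to or from a CPU register by writing the command register.

```c
#include <assert.h>
#include <stdint.h>

enum { CMD_READ_REG = 1, CMD_WRITE_REG = 2 };   /* invented op codes */

typedef struct {
    uint32_t gpr[32];   /* the halted hart's integer registers */
    uint32_t data;      /* data register reachable over JTAG or USB */
} debug_target;

static uint32_t make_command(uint32_t op, uint32_t regno) {
    return (regno << 8) | op;
}

/* Effect of the debugger writing the command register. */
static void write_command(debug_target *t, uint32_t command) {
    uint32_t op = command & 0xff;
    uint32_t regno = (command >> 8) & 0xff;
    assert(regno < 32);
    if (op == CMD_READ_REG)
        t->data = t->gpr[regno];                 /* register -> data */
    else if (op == CMD_WRITE_REG && regno != 0)  /* x0 stays zero */
        t->gpr[regno] = t->data;                 /* data -> register */
}
```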

In "instruction feeding", the CPU will process a debug exception to execute individual instructions written to a register. This may be supplemented with a data-passing register and a module to directly access the memory. Instruction feeding lets the debugger access the computer exactly as software would. It also minimizes changes in the CPU and adapts to many types of CPU. This was said to be especially apt for RISC-V because RISC-V is explicitly designed for many types of computers. The data-passing register allows a debugger to write a data-movement loop to RAM, and then execute the loop to move data into or out of the computer at a speed near the maximum speed of the debug system's data channel.[66] Correspondents claim that similar systems are used by MIPS, Intel Quark, Tensilica's Xtensa and Freescale Power CPUs' BDM.[66]

References

  1. Template:Cite book
  2. Template:Cite web
  3. Template:Cite web
  4. Template:Cite web
  5. Template:Cite web
  6. Template:Cite web
  7. Template:Cite web
  8. Template:Cite web
  9. Template:Cite web
  10. Template:Cite web
  11. Template:Cite journal
  12. Template:Cite web
  13. Template:Cite web
  14. Template:Cite web
  15. Template:Cite web
  16. Template:Cite web
  17. Template:Cite web
  18. Template:Cite press release
  19. Template:Cite web
  20. Template:Cite web
  21. Template:Cite web
  22. Template:Cite web
  23. Template:Cite web
  24. Template:Cite web
  25. Template:Cite web
  26. Template:Cite web
  27. Template:Cite web
  28. Template:Cite web
  29. Template:Cite web
  30. Template:Cite web
  31. Template:Cite web
  32. Template:Cite web
  33. Template:Cite web
  34. Template:Cite web
  35. Template:Cite web
  36. Template:Cite web
  37. Template:Cite news
  38. Template:Cite web
  39. Template:Cite web
  40. Template:Cite web
  41. Template:Cite web
  42. Template:Cite news
  43. Template:Cite web
  44. Template:Cite web
  45. Template:Cite AV media
  46. Template:Cite mailing list
  47. Template:Cite web
  48. Template:Cite web
  49. Template:Cite web
  50. Template:Cite web
  51. Template:Cite web
  52. Template:Cite web
  53. Template:Cite web
  54. Template:Cite book
  55. Template:Cite web
  56. Template:Cite web
  57. Template:Cite web
  58. Template:Cite web
  59. Template:Cite web
  60. Template:Cite web
  61. Template:Cite journal
  62. Template:Cite journal
  63. Template:Cite web
  64. Template:Cite web
  65. Template:Cite web
  66. Template:Cite web
  67. Template:Cite web
