

# **SH-4R Architecture**

**Technical Overview** 

- SH-4R - The High-end Multimedia architecture
  - SH-4R RISC Multimedia enhancements
  - Two-way Superscalar support - The leading performance per MHz
  - Integrated Floating Point Unit co-processor (FPU) - Embedding your PC Software
  - DSP/SIMD instruction - The true Multimedia acceleration
  - 16-bit Instruction length - The smallest memory footprint
- The SH-4R architecture features
  - Register configuration
  - Instruction set - The full upward compatibility
  - Flexible data format
  - Five-stage pipeline
  - Optimised cache architecture
  - Store queues - high speed burst transfers
  - Very large address space
  - Low power consumption
  - Optimised technology & quality





## SH-4R - The High-end Multimedia architecture

Hitachi has extended the standard RISC (Reduced Instruction Set Computing) features to create an enhanced microprocessor core dedicated to embedded applications that require DSP and multimedia acceleration as well as high-performance general-purpose functions.

The SH-4R core is a first-class choice for multimedia applications such as car infotainment systems, game consoles, multimedia terminals, video conferencing, interactive TV and many others. Algorithms such as speech codecs, advanced audio MP3, voice recognition, MPEG motion video processing and 3D-graphics calculations benefit directly from this dedicated architecture.



## SH-4R RISC Multimedia enhancements

The SH-4R architecture has been optimised to provide higher performance and lower power consumption at a reduced price, whilst keeping full software compatibility with the earlier SH-4 core. Moving to the SH-4R is therefore the simplest way to boost your system performance without the additional cost of redesigning your hardware and software environment.

The main improvements in the SH-4R core are the higher frequency supported, and the change of the cache architecture (doubled in size and two-way set associative). This considerably reduces the cache misses and significantly increases the overall system performance. Also the DMA controllers have been extended to handle more media stream transfers.

The new technology considerably reduces power consumption and allows the core to run at a higher speed whilst improving the cost/performance ratio.

#### Two-way Superscalar support - The leading performance per MHz

The Superscalar implementation allows two instructions to be decoded and executed in parallel. This gives a peak performance of two instructions per cycle (1.5 instructions on average), including FPU instructions.

The Superscalar architecture delivers an integer processing power of more than 1.8 MIPS/MHz (Dhrystone 1.1).
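
The figures above can be turned into a rough throughput estimate. The following is a minimal sketch in C, not a benchmark: the function names are hypothetical, and the 1.8 MIPS/MHz value is used as a flat scaling factor even though the text only states it as a lower bound.

```c
#include <assert.h>

/* Estimated Dhrystone 1.1 MIPS at a given clock frequency, using the
 * "more than 1.8 MIPS/MHz" figure as a simple multiplier (assumption:
 * the figure scales linearly with frequency). */
double sh4r_dhrystone_mips(double clock_mhz)
{
    assert(clock_mhz > 0.0);
    return 1.8 * clock_mhz;
}

/* Instructions issued per second for a given average number of
 * instructions per cycle (IPC). A two-way superscalar core cannot
 * exceed an IPC of 2. */
double sh4r_instructions_per_second(double clock_mhz, double ipc)
{
    assert(clock_mhz > 0.0);
    assert(ipc > 0.0 && ipc <= 2.0);
    return clock_mhz * 1.0e6 * ipc;
}
```

For example, `sh4r_dhrystone_mips(240.0)` evaluates to roughly 432, and an average of 1.5 instructions per cycle at 240 MHz corresponds to about 360 million instructions per second.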



#### Integrated FPU co-processor - Embedding your PC Software

The SH-4R Core includes an IEEE-754 compliant floating point co-processor, which supports single-precision (32-bit) and double-precision (64-bit) numbers. The SH-4R FPU can process an impressive seven million polygons per MHz.

The FPU is a major building block in systems running multimedia applications that originate from the PC market. Most software developers working in a PC-based environment use floating-point variables for convenience. Porting this software to the SH-4R architecture is straightforward, and the compiler optimises it directly by using the FPU.

#### DSP / SIMD instruction - The true Multimedia acceleration

The matrix math enhancements contribute to the graphics quality and are also suited to accelerate certain signal-processing algorithms.

The SH-4R instruction set includes several SIMD (Single Instruction - Multiple Data) instructions which dramatically accelerate algorithms based on vector and matrix arithmetic. Vectors and matrices with a dimension of four are the operands and coefficients of these single-issue instructions. The 128-bit floating-point vector engine executes highly efficient SIMD instructions such as matrix operations (FTRV) and the inner product of two vectors (FIPR), which are required for 3D graphics processing and data (de)compression.

The other hardware DSP implementations supported are scalar square root, absolute value, divide, and multiply-and-accumulate operations.
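
To make the two vector operations concrete, here is a minimal reference sketch in portable C of what FIPR and FTRV compute mathematically. The function names and the row-major matrix layout are assumptions for illustration only; the sketch does not model the hardware register layout, instruction timing or rounding behaviour.

```c
#include <stddef.h>

/* Inner product of two 4-element vectors: the arithmetic that a
 * single FIPR instruction performs in hardware. */
float fipr_ref(const float a[4], const float b[4])
{
    float sum = 0.0f;
    for (size_t i = 0; i < 4; ++i)
        sum += a[i] * b[i];
    return sum;
}

/* 4x4 matrix times 4-element vector: the arithmetic that a single
 * FTRV instruction performs. The matrix is passed as 16 floats in
 * row-major order (an assumption of this sketch). */
void ftrv_ref(const float m[16], const float v[4], float out[4])
{
    for (size_t row = 0; row < 4; ++row)
        out[row] = fipr_ref(&m[4 * row], v);
}
```

A 3D transform of one vertex is a single `ftrv_ref` call here, which is why a one-instruction hardware equivalent matters for polygon throughput.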



The SH-4R FPU is used to accelerate MP3, MPEG-4, speech recognition and VoIP algorithms. An SH-4R running at 240MHz decodes Windows Media audio and video at 24 frames/sec on a QVGA LCD at 16 bits per pixel.

#### 16-bit Instruction length - The smallest memory footprint

A fixed 16-bit instruction length offers a very high code density to save cost for storage memory (external RAM, ROM and cache) and solves the bandwidth bottleneck problem of conventional 32-bit RISC architectures. On a 32-bit memory access, two instructions can be fetched in parallel, reducing the necessary memory accesses by a factor of two.



The average code size ratio between a SuperH processor and other RISC processors is 2:3, which lowers the total system cost.
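
The fetch and code-size arithmetic above can be sketched as follows. This is an illustrative calculation only: the function names are hypothetical, and the 2:3 ratio is applied as a flat average, which real programs will only approximate.

```c
#include <assert.h>
#include <stddef.h>

/* Number of bus accesses needed to fetch n instructions, given the
 * instruction length and the bus width in bytes. */
size_t fetches_needed(size_t n_instructions, size_t insn_bytes,
                      size_t bus_bytes)
{
    assert(insn_bytes > 0 && bus_bytes >= insn_bytes);
    size_t per_fetch = bus_bytes / insn_bytes;   /* instructions per access */
    return (n_instructions + per_fetch - 1) / per_fetch;   /* round up */
}

/* Estimated SuperH code size for a program that occupies
 * other_risc_bytes on another RISC processor, using the average
 * 2:3 ratio quoted in the text. */
size_t superh_code_bytes(size_t other_risc_bytes)
{
    return other_risc_bytes * 2 / 3;
}
```

With a 32-bit (4-byte) bus, 1000 instructions of 2 bytes each need 500 accesses, against 1000 accesses for 4-byte instructions.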

## The SH-4R architecture features

The SH-4R Core employs a common 32-bit RISC architecture designed by Hitachi to meet the requirement of multimedia and high performance embedded applications.

The SH architecture has the following basic features:

- Load/Store
- Register orientation
- 32-bit internal data path
- RISC-type instruction set
- 4 Gbyte address space
- Five-stage RISC instruction pipeline

#### **Register configuration**

- General Purpose 32-bit Register bank
- 32-bit Control Registers
- 32-bit System Registers
- 32-bit Shadow registers

In the SH architecture, arithmetic and logical instructions normally operate on the 32-bit general-purpose registers. Special load/store instructions transfer data between memory and registers. The register diagram shows the basic general-purpose 32-bit register bank, which is used for source and destination operands. In addition to these 16 registers, the SH-4R has 8 x 32-bit shadow registers, which can be accessed in privileged mode. Alongside the general-purpose registers, the SH architecture supports four system registers: the Program Counter (PC), the Procedure Register (PR) and two 32-bit Multiply and Accumulate Registers (MACH/MACL). The block of control registers contains the Status Register (SR), the Global Base Register (GBR), the Vector Base Register (VBR), the Saved Status Register (SSR) and the Saved Program Counter (SPC).
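
The register set described above can be summarised as a plain C structure. This is a sketch for orientation, for instance as the state of a simple simulator: the type and field names are hypothetical, and treating MACH as the upper half and MACL as the lower half of a 64-bit accumulator is an assumption of this example rather than something stated in this overview.

```c
#include <stdint.h>

/* Software model of the SH-4R integer register set as listed in the
 * text. Names are illustrative, not taken from any Hitachi header. */
typedef struct {
    uint32_t r[16];       /* general-purpose registers            */
    uint32_t shadow[8];   /* shadow registers (privileged mode)   */

    /* System registers */
    uint32_t pc;          /* Program Counter                      */
    uint32_t pr;          /* Procedure Register                   */
    uint32_t mach;        /* Multiply and Accumulate, high word   */
    uint32_t macl;        /* Multiply and Accumulate, low word    */

    /* Control registers */
    uint32_t sr;          /* Status Register                      */
    uint32_t gbr;         /* Global Base Register                 */
    uint32_t vbr;         /* Vector Base Register                 */
    uint32_t ssr;         /* Saved Status Register                */
    uint32_t spc;         /* Saved Program Counter                */
} sh4r_regs_t;

/* Combine MACH and MACL into one 64-bit value (assumed ordering:
 * MACH holds the upper 32 bits). */
uint64_t sh4r_mac64(const sh4r_regs_t *regs)
{
    return ((uint64_t)regs->mach << 32) | regs->macl;
}
```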





SH7750R Functional Block Diagram

#### Instruction set - The full upward compatibility

The instruction set has been carefully chosen to suit high-level languages, which simplifies programming of the individual devices.

A major strength of the SH-4R core is the instruction set's upward compatibility with SH1, SH2 and SH3, which allows customers to move easily from one SH core to another. Each SH core is designed and targeted for specific applications with different requirements such as integration, performance and cost.

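| Core    | Instruction types | Additions                      |
|---------|-------------------|--------------------------------|
| SH-1    | 56                |                                |
| SH-2    | 62                | 32-bit multiplier/accumulator  |
| SH-3    | 68                | MMU instructions               |
| SH-4    | 91                | FPU, Graphics instructions     |
| SH-DSP  | 154               | DSP instructions               |
| SH3-DSP | 160               | DSP instructions               |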

Instruction Set Upward Compatibility

The SH-4R supports up to 91 types of instructions, covering data transfer, arithmetic operations (with on-chip multiplier), logic operations, shift, branch, system control and DSP/FPU classes. The SH-4R C and C++ compilers are optimised to take maximum advantage of each class of instructions. The SH architecture provides full software compatibility from 10 MIPS to more than 1000 MIPS.

#### Flexible data format

Memory data formats are classified into bytes, words and longwords. The SH-4R supports both big-endian and little-endian modes. This choice is a major advantage, because it simplifies the connection of external devices to SH-4R processors.



Memory data formats, Byte, Word and Longword Alignment
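
The difference between the two byte orders can be illustrated with a few portable C helpers. This is a minimal sketch with hypothetical function names; it assumes the usual SuperH sizes of 16 bits for a word and 32 bits for a longword, and it works on any host regardless of its native byte order.

```c
#include <stdint.h>

/* Read a 32-bit longword stored most significant byte first. */
uint32_t load_longword_be(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Read a 32-bit longword stored least significant byte first. */
uint32_t load_longword_le(const uint8_t *p)
{
    return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) |
           ((uint32_t)p[1] << 8)  |  (uint32_t)p[0];
}

/* Read a 16-bit word stored most significant byte first. */
uint16_t load_word_be(const uint8_t *p)
{
    return (uint16_t)(((unsigned)p[0] << 8) | p[1]);
}

/* Read a 16-bit word stored least significant byte first. */
uint16_t load_word_le(const uint8_t *p)
{
    return (uint16_t)(((unsigned)p[1] << 8) | p[0]);
}
```

The same four bytes `12 34 56 78` read as `0x12345678` in big-endian mode and `0x78563412` in little-endian mode, which is why matching the processor mode to an external device avoids byte swapping in software.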

#### **Five-stage pipeline**

The advantage of the RISC approach can be seen in the pipeline mechanism, which allows very high clock frequencies. The SH pipeline provides a peak throughput of one basic instruction per cycle (two in the case of the Superscalar SH-4R). The pipeline is automatically shortened if an instruction does not need all stages, and extended if an instruction needs extra latency cycles to complete or if pipeline contention occurs. To reduce pipeline penalties, a delay-slot mechanism is provided, which reduces pipeline breaks.



Instruction Pipelining Examples
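
As a rough illustration of pipelined throughput, the sketch below counts cycles for an idealised five-stage pipeline. It is a textbook model under stated assumptions, not a description of the real SH-4R timing: it ignores stalls, contention, shortened or extended pipelines and branch effects, and it assumes every pair of instructions can issue together in the two-way case. The names are hypothetical.

```c
#include <assert.h>

enum { SH_PIPELINE_STAGES = 5 };

/* Cycles to complete n instructions on an ideal pipeline that issues
 * issue_width instructions per cycle (1 = scalar, 2 = two-way
 * superscalar). The first issue group needs all five stages; every
 * further group completes one cycle later. */
unsigned long ideal_pipeline_cycles(unsigned long n, unsigned issue_width)
{
    assert(issue_width == 1 || issue_width == 2);
    if (n == 0)
        return 0;
    unsigned long groups = (n + issue_width - 1) / issue_width;
    return SH_PIPELINE_STAGES + groups - 1;
}
```

Under these assumptions, 100 instructions take 104 cycles with single issue and 54 cycles with dual issue, so throughput approaches one or two instructions per cycle as the instruction count grows.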

#### Optimised cache architecture

The SH-4R cache architecture has been significantly optimised over the SH-4 core to boost real-time capabilities and overall system performance. The instruction cache and operand cache are handled separately. Both are two-way set associative, which reduces cache misses whilst keeping latency to a minimum. The cache sizes have been doubled (16-Kbyte instruction cache and 32-Kbyte operand cache).



Two-way set-associative cache for optimised latency/cache miss balance

The SH-4R includes large on-chip caches including copy-back and write-through buffers to close the performance gap between the processing speed of the CPU and the bandwidth of external SDRAMs.
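
How a two-way set-associative cache maps addresses to sets can be sketched with the generic textbook calculation below. The cache sizes and the two ways come from the text; the 32-byte line length is an assumption of this example, and the simple modulo indexing is a generic model that may not match the exact address bits used by the SH-4R hardware. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Geometry of a set-associative cache. */
typedef struct {
    uint32_t size_bytes;   /* total capacity               */
    uint32_t ways;         /* associativity (2 on SH-4R)   */
    uint32_t line_bytes;   /* line length (assumed 32)     */
} cache_geom_t;

/* Number of sets: capacity divided by (ways x line length). */
uint32_t cache_sets(cache_geom_t g)
{
    assert(g.ways > 0 && g.line_bytes > 0);
    assert(g.size_bytes % (g.ways * g.line_bytes) == 0);
    return g.size_bytes / (g.ways * g.line_bytes);
}

/* Set selected by an address under simple modulo indexing. */
uint32_t cache_set_index(cache_geom_t g, uint32_t addr)
{
    return (addr / g.line_bytes) % cache_sets(g);
}
```

With these assumptions the 32-Kbyte operand cache has 512 sets and the 16-Kbyte instruction cache has 256. Two addresses that map to the same set can both stay resident because each set holds two lines, which is the mechanism behind the reduced miss rate.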

Impressive results of the new cache architecture running on operating systems such as Linux, VxWorks™, Windows® CE or QNX.

This optimisation is especially important for applications using intensive memory transfers, such as Java-based systems. Significant performance improvements are achieved by optimising the SH4 cache architecture.



Benchmark chart: SH7750R and SH7750S running at 200MHz. Wind River Systems Inc., JWorks 4.0. Operating System: VxWorks.

#### Store queues - high speed burst transfers

The SH4(R) supports two 32-byte store queues (SQ) to perform high-speed burst writes to external memory. While the contents of one SQ are being transferred to external memory, the other SQ can be written to without a penalty cycle. This functionality is especially useful to transfer video, graphic or display list data to the frame buffer of the graphic processor.
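
The alternating use of the two store queues can be modelled in software as a ping-pong buffer. The sketch below is a host-side simulation only, with hypothetical names: it does not use the real store-queue addresses or instructions, and it performs the fill and the burst write one after the other, whereas on the hardware they overlap.

```c
#include <stdint.h>
#include <string.h>

enum { SQ_BYTES = 32, SQ_COUNT = 2 };

/* Software model of the two 32-byte store queues. */
typedef struct {
    uint8_t  sq[SQ_COUNT][SQ_BYTES];
    unsigned active;   /* queue the CPU fills next   */
    unsigned bursts;   /* burst writes issued so far */
} sq_model_t;

/* Copy len bytes from src to dst in 32-byte bursts, alternating
 * between the two queues. Any tail shorter than 32 bytes is left
 * uncopied in this sketch. */
void sq_burst_copy(sq_model_t *m, uint8_t *dst, const uint8_t *src,
                   size_t len)
{
    for (size_t off = 0; off + SQ_BYTES <= len; off += SQ_BYTES) {
        memcpy(m->sq[m->active], src + off, SQ_BYTES);  /* CPU fills queue  */
        memcpy(dst + off, m->sq[m->active], SQ_BYTES);  /* burst to memory  */
        m->bursts++;
        m->active ^= 1u;                                /* switch queues    */
    }
}
```

Because the CPU always has a free queue to write to, a frame-buffer transfer can proceed as a steady stream of 32-byte bursts.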





#### Very large address space

The large address space of the SH-4R facilitates the connection and mapping of external devices. The uniform, unsegmented address space is managed by a Memory Management Unit (MMU), which translates between the virtual address space required by operating systems and the physical address space. An address space of up to 4 Gbytes (448-Mbyte external memory space) is available.
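
The two sizes quoted above follow from simple arithmetic, sketched below. The 4-Gbyte figure is the span of a 32-bit address. The split of the 448-Mbyte external space into seven areas of 64 Mbytes each is an assumption of this example and is not stated in this overview; the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed external memory layout: seven areas of 64 Mbytes. */
enum { EXT_AREAS = 7, EXT_AREA_MBYTES = 64 };

/* Bytes addressable with the given number of address bits. */
uint64_t address_space_bytes(unsigned address_bits)
{
    assert(address_bits < 64);
    return (uint64_t)1 << address_bits;
}

/* Total external memory space in Mbytes under the assumed layout. */
unsigned external_space_mbytes(void)
{
    return EXT_AREAS * EXT_AREA_MBYTES;
}
```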

The SH-4R architecture incorporates two processor modes: user mode and privileged mode. Normal program execution takes place in user mode; privileged mode is normally entered when an exception occurs.



#### Low power consumption

SH is a low-power processor family designed to deliver a best-in-class performance/watt ratio. The SH-4R core features an advanced power-saving mechanism with built-in power-down modes:

- Sleep: on-chip peripherals still run
- Standby: on-chip peripherals halt
- Module standby: specified modules halt

#### Optimised technology & quality

The SH-4R core from Hitachi is a leading architecture, recognised in applications that run in difficult environments such as automotive. Over the years, Hitachi has built strong expertise in providing high-reliability devices for extended operating conditions on the latest process technology. That is why the SH-4R is also qualified for wide temperature ranges (-40°C to +85°C).

Latest technology further reduces the low power consumption of the SH-4R

#### EUROPEAN HEADQUARTERS

Hitachi Europe Ltd. Whitebrook Park, Lower Cookham Road, Maidenhead, Berkshire SL6 8YA United Kingdom Tel: +44-1628 585000 Fax: +44-1628 585160 Email: web.ecg@hitachi-eu.com

(Please visit our website for contact details of our Hitachi Sales Offices in EMEA)

## www.hitachi-eu.com/semiconductors/

