Dear friends,
I am very pleased and proud to announce YARDstick
(http://electronics.physics.auth.gr/people/nkavv/yardstick),
a custom processor development toolset with an impressive list of features.
But beware: the following amount of information might prove unhealthy to
low-life academic bozos who are unaware of real-life coding, are fans
of top-down programming (crap!, only in professors' textbooks), UML (crap!), and hate
extreme programming.
So:
YARDstick is a novel design automation tool for custom processor development flows
that focuses on the hard part: generating and evaluating application-specific
hardware extensions. YARDstick is a powerful building block for ASIP
development, since it integrates application analysis and ultra-fast algorithms for custom
instruction generation and selection with user-defined compiler intermediate
representations. As of September 2007, YARDstick integrates retargetable compiler
features for the targeted IRs/architectures. Remarkable features of YARDstick are the
following:
- retargetable to user-defined IRs via a machine description.
- can be targeted to low-level compiler IRs, assembly-level representations of virtual
machines, or assembly code for existing processors.
- fully parameterized custom instruction generation and selection engine.
- lightning-fast code selector for multiple-input multiple-output patterns based on
graph matching. In testing so far, the code selector scales very well with the instruction
node count of basic block data-dependence graphs (successfully tested with custom
instruction patterns of more than 30 nodes).
- virtual register assignment for virtual machine targets.
- an extensive set of backends including assembly code emitter, C backend, visualization
backends for Graphviz and VCG (or aiSee), an XML format amenable to graph rewriting
and others.
YARDstick comes along with a cross-platform GUI written in Tcl/Tk 8.5.
The ultimate goal of YARDstick is to liberate the designer's development infrastructure
from compiler and simulator idiosyncrasies. With YARDstick, the ASIP designer is empowered
with the freedom of specifying the target architecture of choice and adding new
implementations of analyses and custom instruction generation/selection methods.
At this moment, YARDstick is being heavily used for developing a new processor
architecture of mine with many never-before-seen features, mostly aimed at FPGAs.
A status update on the processor architecture should be expected in late
October 2007.
Typically, 2x to 15x speedups for benchmark applications (optimized ANSI C source code)
can be obtained fully automatically with YARDstick, depending on the
target architecture. Speedups are evaluated against a typical scalar RISC
architecture.
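As a rough illustration of how such a speedup figure could be computed (this is an assumption about the metric, not YARDstick's actual estimator): weight each basic block's estimated cycle count by its execution count, once for the baseline and once with custom instructions, and take the ratio. The struct and function names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-basic-block profile record (not a YARDstick data structure). */
struct bb_profile {
    unsigned long exec_count;   /* how many times the block was executed */
    unsigned cycles_base;       /* estimated cycles on the baseline scalar RISC */
    unsigned cycles_custom;     /* estimated cycles with custom instructions */
};

/* Speedup = total baseline cycles / total cycles with custom instructions.
 * Returns 0.0 if the custom-instruction total is zero (nothing to compare). */
double estimate_speedup(const struct bb_profile *bb, size_t n)
{
    unsigned long long base = 0, custom = 0;

    assert(bb != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        base   += (unsigned long long)bb[i].exec_count * bb[i].cycles_base;
        custom += (unsigned long long)bb[i].exec_count * bb[i].cycles_custom;
    }
    return custom ? (double)base / (double)custom : 0.0;
}
```

For example, with two made-up blocks {1000, 8, 2} and {1, 20, 20}, the totals are 8020 and 2020 cycles, i.e. an estimated speedup of about 3.97x.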
Detailed feature list:
1. Analysis engines generating both static and dynamic statistics:
- Data types
- Operation-level statistics
- Basic block statistics (ranking)
- Performance estimations with/without custom instructions.
2. Generation of CDFGs (Control-Data Flow Graphs).
3. Backend engines:
- ANSI C
- dot (Graphviz)
- VCG (GDL, aiSee)
- XML (GGX for the AGG graph rewriting tool)
- Retargetable assembly emitter for entire translation units (single files
with multiple functions/procedures).
- CDFG formats for various RTL synthesis tools.
4. Custom instruction engines:
- Fully parameterized MIMO custom instruction generation algorithm. Features:
* Fast heuristic !!!
* Configurable number of inputs
* Configurable number of outputs
* List of forbidden nodes
* Node sorting strategies (3 different strategies!)
* Transformation rule library for applying CFG transformation strategies
5. Custom instruction selection:
- Based on priority metrics (2 choices at the moment).
6. Graph (and graph-subgraph) isomorphism features for eliminating redundant
patterns. Multiple algorithms supported.
7. Visualization of custom instructions, basic blocks, control-flow graphs and
control-data flow graphs (basic block nodes expanded to their constituent
instructions).
8. Basic retargetable compiler features (alpha state):
- Code selector for MIMO instructions (tested with large cases).
- Virtual register assignment (allocation for a VM).
- Hard register allocator in the works.
9. Miscellaneous features:
- single constant multiplication optimizer
- elimination of false data-dependences in assembly-level CDFGs.
- beautification options for visualization
- interfacing (co-operation) with external tools such as peephole optimizers,
profilers, code generators etc.
- features related to the custom processor architecture (not to be disclosed
yet)
Here is a list of application benchmarks that have been tested with YARDstick (compiler
features not fully tested):
- ADPCM encoder and decoder (typically: 4x speedup)
- Video processing kernels: full-search block-matching motion estimation, logarithmic
search motion estimation, motion compensation
- Image processing kernels: steganography (hide/uncover), edge detection, matrix
multiplication
- Cryptographic kernels: crc32, rc5, raiden (7x speedup, 12x for unrolled version)
At the YARDstick homepage:
http://electronics.physics.auth.gr/people/nkavv/yardstick/
you can find some additional material:
- 2-page brochure
- 2-page abstract for the DATE'07 University Booth
- a more extended presentation on YARDstick
The above material reflects the status as of April 2007.
Expected enhancements to YARDstick in the near future:
- linear-scan and integer-linear programming based register allocators
- bitwidth analysis
- CDFG->VHDL generation of custom instruction hardware
- algorithm implementation for CDFG pipelining
Interested parties are welcome to contact me for details on how to get access
to a demo version of the YARDstick toolset.
Kind regards
Nikolaos Kavvadias
Computer Architecture Specialist - Compiler Developer
Ph.D. candidate
M.Sc. Electronics Engineering
B.Sc. Physics
You may contact me at:
Nikolaos Kavvadias <nkavv@physics.auth.gr>
http://www.geocities.com/kaveirious/
http://electronics.physics.auth.gr/tomeas/en/kavvadias.html
USAGE EXAMPLES
--
-- A Custom instruction generated from an edge-detection filter
--
1. Auto-generated C simulation code.
void main_9(
int *vr234_s32
,int *vr235_s32
,int vr60_s32
,int vr207_s32
,int vr210_s32
,int vr220_s32
)
{
int vr228_s32;
int vr229_s32;
int vr230_s32;
int vr231_s32;
int vr232_s32;
int vr233_s32;
*vr235_s32 = 1;
vr231_s32 = vr207_s32-vr220_s32;
vr232_s32 = ((vr231_s32 < 0) ? -vr231_s32 : vr231_s32);
vr233_s32 = vr60_s32<vr232_s32;
vr228_s32 = vr207_s32-vr210_s32;
vr229_s32 = ((vr228_s32 < 0) ? -vr228_s32 : vr228_s32);
vr230_s32 = vr60_s32<vr229_s32;
*vr234_s32 = vr230_s32|vr233_s32;
#pragma cycles_est_total = 2
}
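For readers who prefer not to trace virtual register names, the following is a hand-written reading of what the generated function above computes. It is not YARDstick output, and the parameter names (threshold, center, neighbor_a, neighbor_b) are guesses at the roles of vr60, vr207, vr210 and vr220.

```c
#include <assert.h>
#include <stddef.h>

/* |a - b|, corresponding to one sub followed by one abs in the pattern. */
static int abs_diff(int a, int b)
{
    int d = a - b;
    return d < 0 ? -d : d;
}

/* Same computation as main_9, with the argument order kept:
 * two outputs first, then the four inputs. */
void main_9_readable(int *exceeds, int *flag,
                     int threshold, int center, int neighbor_a, int neighbor_b)
{
    assert(exceeds != NULL && flag != NULL);
    *flag = 1;                                              /* ldc 1 -> vr235 */
    *exceeds = (threshold < abs_diff(center, neighbor_a))   /* vr230 */
             | (threshold < abs_diff(center, neighbor_b));  /* vr233, then ior */
}
```

With threshold 10, center 100 and neighbors 50 and 95, the first difference (50) exceeds the threshold, so the first output is 1.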
2. Auto-generated VCG (GDL) graph
graph: { title: "main_9"
x: 30
y: 30
height: 380
width: 560
xspace: 20
yspace: 30
display_edge_labels: yes
layoutalgorithm: minbackward
port_sharing: no
node.borderwidth: 3
node.color: white
node.textcolor: black
node.bordercolor: black
edge.color: black
node: { title:"0" shape: ellipse label:" ior" color:yellow }
node: { title:"1" shape: ellipse label:" sl" color:yellow }
node: { title:"2" shape: ellipse label:" abs" color:yellow }
node: { title:"3" shape: ellipse label:" sub" color:yellow }
node: { title:"4" shape: ellipse label:" sl" color:yellow }
node: { title:"5" shape: ellipse label:" abs" color:yellow }
node: { title:"6" shape: ellipse label:" sub" color:yellow }
node: { title:"7" shape: ellipse label:" ldc" color:yellow }
node: { title:"8" shape: rhomb label:" 1" color:magenta }
edge: {sourcename:"8" targetname:"7"}
node: { title:"9" shape: triangle label:" vr234.s32" color:cyan }
edge: {sourcename:"0" targetname:"9"}
node: { title:"10" shape: triangle label:" vr235.s32" color:cyan }
edge: {sourcename:"7" targetname:"10"}
node: { title:"11" shape: box label:" vr60.s32" color:green }
edge: {sourcename:"11" targetname:"1"}
node: { title:"12" shape: box label:" vr207.s32" color:green }
edge: {sourcename:"12" targetname:"3"}
node: { title:"13" shape: box label:" vr210.s32" color:green }
edge: {sourcename:"13" targetname:"3"}
edge: {sourcename:"11" targetname:"4"}
edge: {sourcename:"12" targetname:"6"}
node: { title:"14" shape: box label:" vr220.s32" color:green }
edge: {sourcename:"14" targetname:"6"}
edge: {sourcename:"3" targetname:"2" label:"vr228.s32" }
edge: {sourcename:"2" targetname:"1" label:"vr229.s32" }
edge: {sourcename:"1" targetname:"0" label:"vr230.s32" }
edge: {sourcename:"6" targetname:"5" label:"vr231.s32" }
edge: {sourcename:"5" targetname:"4" label:"vr232.s32" }
edge: {sourcename:"4" targetname:"0" label:"vr233.s32" }
}
3. Auto-generated Graphviz (dot) file
digraph main_9 {
node [fontname=Courier,fontsize=14,style=filled];
0 [shape=ellipse,label="ior",fillcolor=yellow]
1 [shape=ellipse,label="sl",fillcolor=yellow]
2 [shape=ellipse,label="abs",fillcolor=yellow]
3 [shape=ellipse,label="sub",fillcolor=yellow]
4 [shape=ellipse,label="sl",fillcolor=yellow]
5 [shape=ellipse,label="abs",fillcolor=yellow]
6 [shape=ellipse,label="sub",fillcolor=yellow]
7 [shape=ellipse,label="ldc",fillcolor=yellow]
8 [shape=diamond,label="1",fillcolor=magenta]
8 -> 7;
9 [shape=triangle,label="vr234.s32",fillcolor=cyan]
0 -> 9;
10 [shape=triangle,label="vr235.s32",fillcolor=cyan]
7 -> 10;
11 [shape=invtriangle,label="vr60.s32",fillcolor=green]
11 -> 1;
12 [shape=invtriangle,label="vr207.s32",fillcolor=green]
12 -> 3;
13 [shape=invtriangle,label="vr210.s32",fillcolor=green]
13 -> 3;
11 -> 4;
12 -> 6;
14 [shape=invtriangle,label="vr220.s32",fillcolor=green]
14 -> 6;
3 -> 2 [label="vr228.s32"];
2 -> 1 [label="vr229.s32"];
1 -> 0 [label="vr230.s32"];
6 -> 5 [label="vr231.s32"];
5 -> 4 [label="vr232.s32"];
4 -> 0 [label="vr233.s32"];
}
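The graph above can also be used to check a pattern against the configurable input/output limits of the generation engine. The sketch below counts the distinct register inputs and outputs of main_9 from its edge list. The node classification, type names and function are hypothetical and only illustrate the idea; they are not YARDstick internals.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 64

/* Hypothetical node classification, following the colors in the graph:
 * yellow = operation, magenta = constant, green = input, cyan = output. */
enum node_kind { NK_OP, NK_CONSTANT, NK_INPUT, NK_OUTPUT };

struct edge { int src, dst; };

/* Nodes 0..14 of main_9, in the numbering of the dot file. */
const enum node_kind main_9_kind[15] = {
    NK_OP, NK_OP, NK_OP, NK_OP, NK_OP, NK_OP, NK_OP, NK_OP,  /* 0..7  */
    NK_CONSTANT,                                             /* 8     */
    NK_OUTPUT, NK_OUTPUT,                                    /* 9,10  */
    NK_INPUT, NK_INPUT, NK_INPUT, NK_INPUT                   /* 11..14 */
};

/* The 15 edges of main_9, in the order they appear in the dot file. */
const struct edge main_9_edges[15] = {
    {8, 7}, {0, 9}, {7, 10}, {11, 1}, {12, 3}, {13, 3}, {11, 4}, {12, 6},
    {14, 6}, {3, 2}, {2, 1}, {1, 0}, {6, 5}, {5, 4}, {4, 0}
};

/* Count distinct input nodes that drive an edge and distinct output nodes
 * that are driven by one. Constants are not counted as inputs. */
void count_io(const enum node_kind *kind, size_t n_nodes,
              const struct edge *e, size_t n_edges,
              unsigned *n_in, unsigned *n_out)
{
    unsigned char seen[MAX_NODES] = {0};

    assert(kind != NULL && e != NULL && n_in != NULL && n_out != NULL);
    assert(n_nodes <= MAX_NODES);
    *n_in = *n_out = 0;
    for (size_t i = 0; i < n_edges; i++) {
        int s = e[i].src, d = e[i].dst;
        assert(s >= 0 && (size_t)s < n_nodes && d >= 0 && (size_t)d < n_nodes);
        if (kind[s] == NK_INPUT && !seen[s]) { seen[s] = 1; (*n_in)++; }
        if (kind[d] == NK_OUTPUT && !seen[d]) { seen[d] = 1; (*n_out)++; }
    }
}
```

For main_9 this yields 4 inputs and 2 outputs, matching the four int arguments and two pointer arguments of the generated C function.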