http://www.fmi.uni-passau.de/archive/doc/unix/gcc/gcc_17.html (Einblicke ins Internet, 10/1995)

Using and Porting GNU CC - Machine Descriptions

Go to the previous, next section.

Machine Descriptions

A machine description has two parts: a file of instruction patterns (`.md' file) and a C header file of macro definitions.

The `.md' file for a target machine contains a pattern for each instruction that the target machine supports (or at least each instruction that is worth telling the compiler about). It may also contain comments. A semicolon causes the rest of the line to be a comment, unless the semicolon is inside a quoted string.

See the next chapter for information on the C header file.

Everything about Instruction Patterns

Each instruction pattern contains an incomplete RTL expression, with pieces to be filled in later, operand constraints that restrict how the pieces can be filled in, and an output pattern or C code to generate the assembler output, all wrapped up in a define_insn expression.

A define_insn is an RTL expression containing four or five operands:

An optional name. The presence of a name indicate that this instruction pattern can perform a certain standard job for the RTL-generation pass of the compiler. This pass knows certain names and will use the instruction patterns with those names, if the names are defined in the machine description.
The absence of a name is indicated by writing an empty string where the name should go. Nameless instruction patterns are never used for generating RTL code, but they may permit several simpler insns to be combined later on.
Names that are not thus known and used in RTL-generation have no effect; they are equivalent to no name at all.
The RTL template (see section RTL Template) is a vector of incomplete RTL expressions which show what the instruction should look like. It is incomplete because it may contain match_operand, match_operator, and match_dup expressions that stand for operands of the instruction.
If the vector has only one element, that element is the template for the instruction pattern. If the vector has multiple elements, then the instruction pattern is a parallel expression containing the elements described.
A condition. This is a string which contains a C expression that is the final test to decide whether an insn body matches this pattern.
For a named pattern, the condition (if present) may not depend on the data in the insn being matched, but only the target-machine-type flags. The compiler needs to test these conditions during initialization in order to learn exactly which named instructions are available in a particular run.
For nameless patterns, the condition is applied only when matching an individual insn, and only after the insn has matched the pattern's recognition template. The insn's operands may be found in the vector operands.
The output template: a string that says how to output matching insns as assembler code. `%' in this string specifies where to substitute the value of an operand. See section Output Templates and Operand Substitution.
When simple substitution isn't general enough, you can specify a piece of C code to compute the output. See section C Statements for Assembler Output.
Optionally, a vector containing the values of attributes for insns matching this pattern. See section Instruction Attributes.

Example of `define_insn`

Here is an actual example of an instruction pattern, for the 68000/68020.

(define_insn "tstsi"
  [(set (cc0)
        (match_operand:SI 0 "general_operand" "rm"))]
  ""
  "*
{ if (TARGET_68020 || ! ADDRESS_REG_P (operands[0]))
    return \"tstl %0\";
  return \"cmpl #0,%0\"; }")

This is an instruction that sets the condition codes based on the value of a general operand. It has no condition, so any insn whose RTL description has the form shown may be handled according to this pattern. The name `tstsi' means "test a SImode value" and tells the RTL generation pass that, when it is necessary to test such a value, an insn to do so can be constructed using this pattern.

The output control string is a piece of C code which chooses which output template to return based on the kind of operand and the specific type of CPU for which code is being generated.

`"rm"' is an operand constraint. Its meaning is explained below.

RTL Template

The RTL template is used to define which insns match the particular pattern and how to find their operands. For named patterns, the RTL template also says how to construct an insn from specified operands.

Construction involves substituting specified operands into a copy of the template. Matching involves determining the values that serve as the operands in the insn being matched. Both of these activities are controlled by special expression types that direct matching and substitution of the operands.

(match_operand:m n predicate constraint)

This expression is a placeholder for operand number n of the insn. When constructing an insn, operand number n will be substituted at this point. When matching an insn, whatever appears at this position in the insn will be taken as operand number n; but it must satisfy predicate or this instruction pattern will not match at all.

Operand numbers must be chosen consecutively counting from zero in each instruction pattern. There may be only one match_operand expression in the pattern for each operand number. Usually operands are numbered in the order of appearance in match_operand expressions.

predicate is a string that is the name of a C function that accepts two arguments, an expression and a machine mode. During matching, the function will be called with the putative operand as the expression and m as the mode argument (if m is not specified, VOIDmode will be used, which normally causes predicate to accept any mode). If it returns zero, this instruction pattern fails to match. predicate may be an empty string; then it means no test is to be done on the operand, so anything which occurs in this position is valid.

Most of the time, predicate will reject modes other than m---but not always. For example, the predicate address_operand uses m as the mode of memory ref that the address should be valid for. Many predicates accept const_int nodes even though their mode is VOIDmode.

constraint controls reloading and the choice of the best register class to use for a value, as explained later (see section Operand Constraints).

People are often unclear on the difference between the constraint and the predicate. The predicate helps decide whether a given insn matches the pattern. The constraint plays no role in this decision; instead, it controls various decisions in the case of an insn which does match.

On CISC machines, the most common predicate is "general_operand". This function checks that the putative operand is either a constant, a register or a memory reference, and that it is valid for mode m.

For an operand that must be a register, predicate should be "register_operand". Using "general_operand" would be valid, since the reload pass would copy any non-register operands through registers, but this would make GNU CC do extra work, it would prevent invariant operands (such as constant) from being removed from loops, and it would prevent the register allocator from doing the best possible job. On RISC machines, it is usually most efficient to allow predicate to accept only objects that the constraints allow.

For an operand that must be a constant, you must be sure to either use "immediate_operand" for predicate, or make the instruction pattern's extra condition require a constant, or both. You cannot expect the constraints to do this work! If the constraints allow only constants, but the predicate allows something else, the compiler will crash when that case arises.

(match_scratch:m n constraint)

This expression is also a placeholder for operand number n and indicates that operand must be a scratch or reg expression.

When matching patterns, this is completely equivalent to

(match_operand:m n "scratch_operand" pred)

but, when generating RTL, it produces a (scratch:m) expression.

If the last few expressions in a parallel are clobber expressions whose operands are either a hard register or match_scratch, the combiner can add them when necessary. See section Side Effect Expressions.

(match_dup n)

This expression is also a placeholder for operand number n. It is used when the operand needs to appear more than once in the insn.

In construction, match_dup acts just like match_operand: the operand is substituted into the insn being constructed. But in matching, match_dup behaves differently. It assumes that operand number n has already been determined by a match_operand appearing earlier in the recognition template, and it matches only an identical-looking expression.

(match_operator:m n predicate [operands...])

This pattern is a kind of placeholder for a variable RTL expression code.

When constructing an insn, it stands for an RTL expression whose expression code is taken from that of operand n, and whose operands are constructed from the patterns operands.

When matching an expression, it matches an expression if the function predicate returns nonzero on that expression and the patterns operands match the operands of the expression.

Suppose that the function commutative_operator is defined as follows, to match any expression whose operator is one of the commutative arithmetic operators of RTL and whose mode is mode:

int
commutative_operator (x, mode)
     rtx x;
     enum machine_mode mode;
{
  enum rtx_code code = GET_CODE (x);
  if (GET_MODE (x) != mode)
    return 0;
  return (GET_RTX_CLASS (code) == 'c'
          || code == EQ || code == NE);
}

Then the following pattern will match any RTL expression consisting of a commutative operator applied to two general operands:

(match_operator:SI 3 "commutative_operator"
  [(match_operand:SI 1 "general_operand" "g")
   (match_operand:SI 2 "general_operand" "g")])

Here the vector [operands...] contains two patterns because the expressions to be matched all contain two operands.

When this pattern does match, the two operands of the commutative operator are recorded as operands 1 and 2 of the insn. (This is done by the two instances of match_operand.) Operand 3 of the insn will be the entire commutative expression: use GET_CODE (operands[3]) to see which commutative operator was used.

The machine mode m of match_operator works like that of match_operand: it is passed as the second argument to the predicate function, and that function is solely responsible for deciding whether the expression to be matched "has" that mode.

When constructing an insn, argument 3 of the gen-function will specify the operation (i.e. the expression code) for the expression to be made. It should be an RTL expression, whose expression code is copied into a new expression whose operands are arguments 1 and 2 of the gen-function. The subexpressions of argument 3 are not used; only its expression code matters.

When match_operator is used in a pattern for matching an insn, it usually best if the operand number of the match_operator is higher than that of the actual operands of the insn. This improves register allocation because the register allocator often looks at operands 1 and 2 of insns to see if it can do register tying.

There is no way to specify constraints in match_operator. The operand of the insn which corresponds to the match_operator never has any constraints because it is never reloaded as a whole. However, if parts of its operands are matched by match_operand patterns, those parts may have constraints of their own.

(match_op_dup:m n[operands...])

Like match_dup, except that it applies to operators instead of operands. When constructing an insn, operand number n will be substituted at this point. But in matching, match_op_dup behaves differently. It assumes that operand number n has already been determined by a match_operator appearing earlier in the recognition template, and it matches only an identical-looking expression.

(match_parallel n predicate [subpat...])

This pattern is a placeholder for an insn that consists of a parallel expression with a variable number of elements. This expression should only appear at the top level of an insn pattern.

When constructing an insn, operand number n will be substituted at this point. When matching an insn, it matches if the body of the insn is a parallel expression with at least as many elements as the vector of subpat expressions in the match_parallel, if each subpat matches the corresponding element of the parallel, and the function predicate returns nonzero on the parallel that is the body of the insn. It is the responsibility of the predicate to validate elements of the parallel beyond those listed in the match_parallel.

A typical use of match_parallel is to match load and store multiple expressions, which can contains a variable number of elements

Machine Descriptions

Everything about Instruction Patterns

Example of define_insn

RTL Template

Example of `define_insn`