riscv64: Improve icmp codegen #6112
cfallin
left a comment
Thanks for going through all of the cases and working out the correct sequences -- this is fairly subtle, with the minimal set of instructions we have on this ISA.
I share your concern/discomfort/... with the handling of signed i8; I suspect we can define a generic "sign-extend from this ty to Imm12" helper and avoid the special case. With that addressed, the rest LGTM.
jameysharp
left a comment
I like the improved output! Before we merge this though I have some suggestions and questions.
This reverts commit 4bd46a6.
This was a fairly big rebase since most of these changes conflicted with #5888 which also added the |
jameysharp
left a comment
I'm not sure all of these rules are correct. I've left notes on what I've been able to look at so far but I haven't thought through everything yet.
I think it's useful for gen_icmp_imm to add special cases for Equal and NotEqual to 0, using seqz and snez respectively.
One thought: all comparisons where neither operand is constant can be converted to tmp = x - y followed by gen_icmp_imm cc tmp 0, right?
I think there's some way to confine much of the complexity to gen_icmp_imm instead of having just as many cases also appearing in gen_icmp_inner. I'm having trouble reasoning through the cases though. In particular you might want to use y - x and reverse cc for some cases, but maybe those cases are sufficiently covered by incrementing the constant.
There are more special cases involving constants which might be useful to write rules for. For example, sure, you can rewrite icmp ule x, 0 to icmp ult x, 1, but both are equivalent to icmp eq x, 0 which has a dedicated instruction. That example isn't super important though if sltiu and seqz are both one instruction.
Sorry I don't have better organized thoughts on this. This is definitely making my brain hurt.
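The zero-comparison identities mentioned in this comment can be checked with a small Rust model. This is only a sketch: `sltu`, `sltiu`, `seqz`, and `snez` below are hypothetical Rust stand-ins for the RISC-V instructions, not Cranelift helpers (`seqz rd, rs` is the assembler pseudo-instruction for `sltiu rd, rs, 1`, and `snez rd, rs` for `sltu rd, x0, rs`). The model also probes the `tmp = x - y` idea: with wrapping subtraction it is exact for `eq`/`ne`, but an ordered signed comparison can disagree when the subtraction overflows.

```rust
/// Model of `sltu rd, rs1, rs2`: unsigned less-than on 64-bit registers.
fn sltu(rs1: u64, rs2: u64) -> u64 {
    (rs1 < rs2) as u64
}

/// Model of `sltiu rd, rs1, imm`: the 12-bit immediate is sign-extended
/// to 64 bits, then the comparison is unsigned.
fn sltiu(rs1: u64, imm12: i16) -> u64 {
    sltu(rs1, imm12 as i64 as u64)
}

/// `seqz rd, rs` expands to `sltiu rd, rs, 1`, so `icmp ule x, 0`,
/// `icmp ult x, 1`, and `icmp eq x, 0` are all this one instruction.
fn seqz(rs: u64) -> u64 {
    sltiu(rs, 1)
}

/// `snez rd, rs` expands to `sltu rd, x0, rs`.
fn snez(rs: u64) -> u64 {
    sltu(0, rs)
}

/// `icmp eq x, y` as `tmp = x - y` followed by a compare against zero.
/// Exact, because wrapping subtraction yields zero only when x == y.
fn eq_via_sub(x: u64, y: u64) -> u64 {
    seqz(x.wrapping_sub(y))
}

/// `icmp slt x, y` as `tmp = x - y; tmp < 0`.
/// NOT equivalent in general: the subtraction can overflow.
fn slt_via_sub(x: i64, y: i64) -> bool {
    x.wrapping_sub(y) < 0
}
```

In this model `slt_via_sub(i64::MIN, 1)` is false even though `i64::MIN < 1`, so the subtraction rewrite seems safe only for `Equal` and `NotEqual`.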
(rule (rv_sltu rs1 rs2)
  (alu_rrr (AluOPRRR.SltU) rs1 rs2))
Looks like some trailing whitespace got added here. You can use git diff --check to check a range of commits for any whitespace issues like that.
(rule 4 (gen_icmp_imm cc x imm ty)
  (if-let (IntCC.UnsignedGreaterThan) (intcc_unsigned cc))
  (let ((res Reg (gen_icmp_imm (intcc_reverse cc) x (i64_add imm 1) ty)))
    (rv_xori res (imm12_const 1))))
Adding 1 to the constant may overflow. It isn't immediately obvious to me that this rule is always equivalent in that case.
If the constant is the maximum value for the given ty and the signedness of cc, and cc is *GreaterThan, then for all x this icmp must return false.
If we add 1 modulo the width of ty, then the result wraps around to the minimum for ty/cc. Then for all x, !(x < min) will be true. Which would be wrong.
However this doesn't add modulo the width of ty. Instead, it's a signed 64-bit addition. If ty is narrower than 64-bit, then x is always less than the result (whether it's compared signed or unsigned), so the inverse of that result is always false, which is correct.
If ty is I64 and cc is SignedGreaterThan, then the signed 64-bit addition overflows. Then checking signed less-than will be false, the final result will be true, so the rule is incorrect.
If ty is I64 and cc is UnsignedGreaterThan, then the signed 64-bit addition does not overflow, but interpreting it as unsigned does overflow. Then checking unsigned less-than is false, etc.
So I think this rule is wrong in case of overflow when ty is I64, isn't it?
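The two I64 failure cases described above can be reproduced with a small Rust model. This is a sketch under one loudly stated assumption: the `imm + 1` addition wraps on overflow, which is modeled here with `wrapping_add`. All function names are hypothetical and are not part of Cranelift.

```rust
/// Reference semantics of `icmp sgt x, imm` on 64-bit values.
fn sgt_reference(x: i64, imm: i64) -> bool {
    x > imm
}

/// The rewrite `x > imm  ==>  !(x < imm + 1)` for the signed case.
/// ASSUMPTION: the addition wraps on overflow.
fn sgt_rewritten(x: i64, imm: i64) -> bool {
    !(x < imm.wrapping_add(1))
}

/// Reference semantics of `icmp ugt x, imm` on 64-bit values.
fn ugt_reference(x: u64, imm: u64) -> bool {
    x > imm
}

/// The same rewrite for the unsigned case. An all-ones constant becomes 0
/// after the increment: no signed overflow, but an unsigned wrap-around.
fn ugt_rewritten(x: u64, imm: u64) -> bool {
    !(x < imm.wrapping_add(1))
}
```

For ordinary constants the two forms agree. With `imm = i64::MAX` (signed) or `imm = u64::MAX` (unsigned), the reference is false for every `x`, while the rewritten form is true for every `x`.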
Yeah, this case used to be covered when we used Imm12 instead of i64. Since imm12_add is partial we would have that automatically checked for us. I forgot to check for that when changing it to i64.
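One way to picture the "partial add" idea is an increment that returns `None` when `imm + 1` is not representable, so the rewrite simply does not apply in that case. The sketch below is a hypothetical Rust model, not the ISLE constructor from the PR: `imm_plus_one` and `gt_via_lt` are invented names, and it only models `ty` = I64, the case identified as problematic in this thread.

```rust
/// Hypothetical partial `imm + 1`: `None` when the increment would leave
/// the range of a 64-bit value under the given signedness.
fn imm_plus_one(signed: bool, imm: i64) -> Option<i64> {
    if signed {
        imm.checked_add(1)
    } else {
        // Reinterpret the bits as unsigned, so all-ones is the maximum.
        (imm as u64).checked_add(1).map(|v| v as i64)
    }
}

/// `x > imm` rewritten as `!(x < imm + 1)`, but only when `imm + 1` exists.
/// `None` means "this rule does not match; fall back to another lowering".
fn gt_via_lt(signed: bool, x: i64, imm: i64) -> Option<bool> {
    let next = imm_plus_one(signed, imm)?;
    let lt = if signed {
        x < next
    } else {
        (x as u64) < (next as u64)
    };
    Some(!lt)
}
```

Returning `None` mirrors how a partial constructor in a rule's `if-let` would make the rule fail to match instead of producing a wrong constant.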
;; i.e. `x <= imm` is the same as `x < imm + 1`.
(rule 2 (gen_icmp_imm cc x imm ty)
  (if-let (IntCC.UnsignedLessThanOrEqual) (intcc_unsigned cc))
  (gen_icmp_imm (intcc_without_equal cc) x (i64_add imm 1) ty))
Similarly, I think this rule is wrong if adding 1 overflows and ty is I64.
(let ((extend_op ExtendOp (icmp_intcc_extend cc ty))
      (x_ext Reg (extend x extend_op ty $I64))
      (y_ext Reg (extend y extend_op ty $I64)))
  (gen_icmp_inner cc x_ext y_ext ty)))
Given that the operands were just extended to I64, should gen_icmp_inner be called with $I64 instead of ty?
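The reasoning behind this question can be illustrated with a Rust model of an 8-bit comparison. This is a sketch with hypothetical helper names, assuming (as the snippet above suggests) that operands are sign-extended for signed conditions and zero-extended for unsigned ones. Once both operands are extended the same way, a plain 64-bit compare already gives the narrow answer, so the inner comparison no longer needs to know the original width.

```rust
/// Sign-extend an 8-bit value into a 64-bit register.
fn sext8(x: u8) -> u64 {
    x as i8 as i64 as u64
}

/// Zero-extend an 8-bit value into a 64-bit register.
fn zext8(x: u8) -> u64 {
    x as u64
}

/// `icmp slt` on i8: sign-extend both operands, then compare as signed i64.
fn icmp_slt_i8(x: u8, y: u8) -> bool {
    (sext8(x) as i64) < (sext8(y) as i64)
}

/// `icmp ult` on i8: zero-extend both operands, then compare as unsigned u64.
fn icmp_ult_i8(x: u8, y: u8) -> bool {
    zext8(x) < zext8(y)
}

/// A deliberately wrong variant: zero-extension paired with a signed compare.
fn slt_with_wrong_extension(x: u8, y: u8) -> bool {
    (zext8(x) as i64) < (zext8(y) as i64)
}
```

For example, with `x = 0x01` and `y = 0xFF` the narrow signed comparison `1 < -1` is false, and `icmp_slt_i8` agrees, while the mismatched variant reports true.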
(if-let imm (imm12_sextend_i64 ty n))
(rv_slti (sext x ty $I64) imm))
(rule 1 (gen_icmp_imm (IntCC.UnsignedLessThan) x n (fits_in_64 ty))
  (if-let imm (imm12_sextend_i64 ty n))
Why is sign-extending the constant the right thing to do for an unsigned comparison?
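One piece of context for this question: in the RISC-V ISA, `sltiu` sign-extends its 12-bit immediate to the register width and only then compares as unsigned. The Rust model below is a sketch with hypothetical names; it assumes an 8-bit `ty` and explores what happens when the constant is sign-extended while the register operand is either sign-extended or zero-extended.

```rust
/// Model of `sltiu rd, rs1, imm`: the immediate (range -2048..=2047) is
/// sign-extended to 64 bits, then compared as an unsigned number.
fn sltiu(rs1: u64, imm12: i16) -> u64 {
    debug_assert!((-2048..=2047).contains(&imm12));
    (rs1 < (imm12 as i64 as u64)) as u64
}

/// Sign-extend an 8-bit value into a 64-bit register.
fn sext8(x: u8) -> u64 {
    x as i8 as i64 as u64
}

/// Zero-extend an 8-bit value into a 64-bit register.
fn zext8(x: u8) -> u64 {
    x as u64
}

/// Hypothetical: sign-extend an 8-bit constant into an imm12 value.
fn imm12_from_i8(c: u8) -> i16 {
    c as i8 as i16
}

/// `icmp ult x, c` on i8 with BOTH sides sign-extended.
fn ult_i8_both_sext(x: u8, c: u8) -> bool {
    sltiu(sext8(x), imm12_from_i8(c)) == 1
}

/// Same constant handling, but the register operand is zero-extended.
fn ult_i8_mixed(x: u8, c: u8) -> bool {
    sltiu(zext8(x), imm12_from_i8(c)) == 1
}
```

Sign extension preserves unsigned order between two values of the same width, so the first variant matches `x < c` in this model. The mixed variant does not: `ult_i8_mixed(0xFF, 0xFF)` is true although `0xFF < 0xFF` is false.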
Subscribe to Label Action: This issue or pull request has been labeled: "cranelift", "isle"
I'm closing this because I'm not going to look into it right now, and it looks like @alexcrichton is looking into something similar in #7113. Nevertheless thanks @jameysharp for all of the feedback and helping me along with this ❤️!
This commit is the first in a few steps to reimplement bytecodealliance#6112 and bytecodealliance#7113. The `Icmp` pseudo-instruction is removed here and necessary functionality is all pushed into ISLE lowerings. This enables deleting the `lower_br_icmp` helper in `emit.rs` as it's no longer necessary, meaning that all conditional branches should ideally be generated in lowering rather than pseudo-instructions to benefit from various optimizations. Currently the lowering is the bare-bones minimum to get things working. This involved creating a helper to lower an `IntegerCompare` into an `XReg` via negation/swapping args/etc. In generated code this removes branches and makes `icmp` a straight-line instruction for non-128-bit arguments.
* riscv64: Constants are always sign-extended

  Skip generating extra sign-extension instructions in this case because the materialization of a constant will implicitly sign-extend into the entire register.

* riscv64: Rename `lower_int_compare` helper

  Try to reserve `lower_*` as taking a thing and producing a `*Reg`. Rename this helper to `is_nonzero_cmp` to represent how it's testing for nonzero and producing a comparison.

* riscv64: Rename some float comparison helpers

  * `FCmp` -> `FloatCompare`
  * `emit_fcmp` -> `fcmp_to_float_compare`
  * `lower_fcmp` -> `lower_float_compare`

  Make some room for upcoming integer comparison functions.

* riscv64: Remove `ordered` helper

  This is only used by one lowering so inline its definition directly.

* riscv64: Remove the `Icmp` pseudo-instruction

  This commit is the first in a few steps to reimplement #6112 and #7113. The `Icmp` pseudo-instruction is removed here and necessary functionality is all pushed into ISLE lowerings. This enables deleting the `lower_br_icmp` helper in `emit.rs` as it's no longer necessary, meaning that all conditional branches should ideally be generated in lowering rather than pseudo-instructions to benefit from various optimizations.

  Currently the lowering is the bare-bones minimum to get things working. This involved creating a helper to lower an `IntegerCompare` into an `XReg` via negation/swapping args/etc. In generated code this removes branches and makes `icmp` a straight-line instruction for non-128-bit arguments.

* riscv64: Remove an unused helper

* riscv64: Optimize comparisons with 0

  Use the `x0` register which is always zero where possible and avoid unnecessary `xor`s against this register.

* riscv64: Specialize `a < $imm`

* riscv64: Optimize Equal/NotEqual against constants

* riscv64: Optimize LessThan with constant argument

* Optimize `a <= $imm`

* riscv64: Optimize `a >= $imm`

* riscv64: Add comment for new helper

* Use i64 in icmp optimizations

  Matches the sign-extension that happens at the hardware layer.

* Correct some sign extensions

* riscv64: Don't assume immediates are extended

* riscv64: Fix encoding for `c.addi4spn`

* riscv64: Remove `icmp` lowerings which modify constants

  These aren't correct and will need updating

* Add regression test

* riscv64: Fix handling unsigned comparisons with constants

---------

Co-authored-by: Afonso Bordado <afonsobordado@az8.co>
👋 Hey,

This PR improves our current lowerings for `icmp` and moves them to ISLE. We currently emit a 4 instruction sequence that moves 1 or 0 to a register based on a branch. This PR changes the lowerings to use the dedicated `icmp` instructions.

These are:

* `slt rd, rs1, rs2`
* `sltu rd, rs1, rs2`
* `slti rd, rs1, imm`
* `sltiu rd, rs1, imm`

All other `IntCC`'s are handled with some variation of the above. Additionally I've also added some optimizations when the comparison is done with an immediate.

This has been fuzzing for most of today (~8h) without any issues.
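How the remaining condition codes can be derived from the set-less-than instructions is sketched below in Rust. This is an illustrative model, not the ISLE rules from this PR: the `IntCC` enum mirrors Cranelift's condition-code names, while `slt`, `sltu`, `sltiu`, `xori`, and `icmp` are hypothetical stand-ins that operate on full 64-bit register values. The general technique is to swap operands for the "greater than" forms and to flip the low bit with `xori` for the "or equal" forms.

```rust
#[derive(Clone, Copy)]
enum IntCC {
    Equal,
    NotEqual,
    SignedLessThan,
    SignedGreaterThanOrEqual,
    SignedGreaterThan,
    SignedLessThanOrEqual,
    UnsignedLessThan,
    UnsignedGreaterThanOrEqual,
    UnsignedGreaterThan,
    UnsignedLessThanOrEqual,
}

/// Model of `slt rd, rs1, rs2` (signed compare).
fn slt(a: u64, b: u64) -> u64 {
    ((a as i64) < (b as i64)) as u64
}

/// Model of `sltu rd, rs1, rs2` (unsigned compare).
fn sltu(a: u64, b: u64) -> u64 {
    (a < b) as u64
}

/// Model of `sltiu rd, rs1, imm` (sign-extended immediate, unsigned compare).
fn sltiu(a: u64, imm12: i16) -> u64 {
    sltu(a, imm12 as i64 as u64)
}

/// Model of `xori rd, rs1, imm`, restricted here to small positive masks.
fn xori(a: u64, imm: u64) -> u64 {
    a ^ imm
}

/// Branch-free `icmp` built only from the helpers above.
fn icmp(cc: IntCC, x: u64, y: u64) -> u64 {
    use IntCC::*;
    match cc {
        Equal => sltiu(x ^ y, 1), // xor, then seqz
        NotEqual => sltu(0, x ^ y), // xor, then snez
        SignedLessThan => slt(x, y),
        SignedGreaterThan => slt(y, x), // swap operands
        SignedGreaterThanOrEqual => xori(slt(x, y), 1), // !(x < y)
        SignedLessThanOrEqual => xori(slt(y, x), 1), // !(y < x)
        UnsignedLessThan => sltu(x, y),
        UnsignedGreaterThan => sltu(y, x),
        UnsignedGreaterThanOrEqual => xori(sltu(x, y), 1),
        UnsignedLessThanOrEqual => xori(sltu(y, x), 1),
    }
}
```

Every arm is at most two instructions in this model, which is in line with the description's goal of replacing the branch-based 4 instruction sequence.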