Skip to content

Convert packed floating point to signed integer #2320

Merged
jlb6740 merged 3 commits into
bytecodealliance:mainfrom
jlb6740:convert_sat
Oct 28, 2020
Merged

Convert packed floating point to signed integer #2320
jlb6740 merged 3 commits into
bytecodealliance:mainfrom
jlb6740:convert_sat

Conversation

@jlb6740
Copy link
Copy Markdown
Contributor

@jlb6740 jlb6740 commented Oct 25, 2020

No description provided.

@github-actions github-actions Bot added cranelift Issues related to the Cranelift code generator cranelift:area:x64 Issues related to x64 codegen labels Oct 25, 2020
@jlb6740 jlb6740 marked this pull request as ready for review October 26, 2020 03:44
@jlb6740 jlb6740 requested review from abrown, cfallin and julian-seward1 and removed request for cfallin October 26, 2020 03:44
Comment thread cranelift/codegen/src/isa/x64/lower.rs

// Get the low 16 bits
ctx.emit(Inst::xmm_rmi_reg(SseOpcode::Pslld, RegMemImm::imm(16), tmp));
ctx.emit(Inst::xmm_rmi_reg(SseOpcode::Psrld, RegMemImm::imm(16), tmp));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used PBLENDW in the old backend and so does V8... what do you think?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh .. yes. V8 uses pblendw as well. This adds an extra instruction; both pblendw and pslld/psrld have the same instruction latency. I choose not to use pblendw though because it is only compatible with SSE4_1 or greater while shifts are compatible with SSE2 which I thought was the base target for SIMD. In general not sure how in the new backend we are guarding lowering based on compatibility level so for now I am lowering based on the lowest denominator. What do you think??

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, cool--having both would actually be great: if ...has_sse41() { [emit pblendw] } else { [emit the double shift] }. That information is available in EmitInfo.isa_flags but unfortunately that struct is not present until the emit phase. If we created a new Inst::XmmLowBits { dst, bits_to_retain } (or something like that) and lowered to that then in the emit phase we could pick which version we want based on the EmitInfo. The other option would be to try to make those ISA flags available during lowering but that seems harder to do (@cfallin?).

Comment thread cranelift/codegen/src/isa/x64/lower.rs
Comment thread cranelift/codegen/src/isa/x64/lower.rs Outdated
tmp,
));

// Convert the float to double.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Convert the float to double.
// Convert the float to double quadword.

Copy link
Copy Markdown
Contributor Author

@jlb6740 jlb6740 Oct 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By double I meant double word. This is converting to a double word though not a double quadword. I'll change it to say packed doubleword.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the logic looks right but can we add the CLIF tests that verify these individual instructions? I'm thinking that simd-conversion-run.clif and simd-conversion-legalize.clif (once converted to being a test compile emitting vcode) would be very useful to see that each instruction works correctly and compiles to the sequence we expect.

It definitely passes the SIMD Spectest so I am confident it is correct but let me look to add a file test as well. 👍

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These instructions are tested in simd_conversions.wast but this file has not been enabled in experimental_x64_should_panic in the build.rs so I don't think any spec tests are running for these instructions. Unfortunately, that spec test also checks narrow and widen which I found a bit annoying; a lot has to be implemented for the spec test to be enabled. So I guess the CLIF file tests will be the only things checking this until all conversions are implemented.

Copy link
Copy Markdown
Contributor Author

@jlb6740 jlb6740 Oct 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you're right they aren't enabled by default. I ran them manually though, basically removed all tests except the ones related to the packed float conversion to packed signed int. I also confirmed that it was indeed running as expected while testing. Separately I've also included the file tests that have tests for this conversion .. commenting out tests packed float to packed unsigned int which isn't supported yet.

unimplemented!("f32x4.convert_i32x4_u");
} else {
unreachable!();
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unreachable! is not correct here because there are two other opcodes that could reach this: Opcode::FcvtToUint | Opcode::FcvtToSint; perhaps we should do a match on all four opcodes (filling the others with unimplemented!()) and then _ => unreachable!() at the end.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I disagree though I may be wrong. This code is guarded by a check for vector instructions and so afaik neither Opcode::FcvtToUint or Opcode::FcvtToSint are supported for vector input. In the context of a vector instruction it currently impossible to reach this branch with those opcodes right? This question has come up before where I reach for using unreachable instead of implementing it as unimplemented. I can change to unimplemented but not really sure the rules for applying unimplemented vs unreachable when context is considered. Certainly support for vector input for Opcode::FcvtToUint or Opcode::FcvtToSint could be added, but then that is the case for most places in the backend were the unreachable! is used instead of unimplemented! For example there are places where we match on a type (pshufd use in extractlane for example) and say the default _ => is unreachable simply because a type is not supported, but if there was need for that support and that support were added it is suddenly unimplemented and not unreachable.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If Opcode::FcvtToUint and Opcode::FcvtToSint are not supported then this should remain unreachable!; maybe add a note because a straightforward reading of the code would expect these to be implemented.

Copy link
Copy Markdown
Member

@abrown abrown left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the logic looks right but can we add the CLIF tests that verify these individual instructions? I'm thinking that simd-conversion-run.clif and simd-conversion-legalize.clif (once converted to being a test compile emitting vcode) would be very useful to see that each instruction works correctly and compiles to the sequence we expect.

@jlb6740 jlb6740 force-pushed the convert_sat branch 2 times, most recently from b5e9a14 to 9b43733 Compare October 27, 2020 00:59
@jlb6740 jlb6740 requested a review from abrown October 27, 2020 00:59
@jlb6740
Copy link
Copy Markdown
Contributor Author

jlb6740 commented Oct 27, 2020

Hopefully all issues have been addressed, but let me know if there is anything else.

Copy link
Copy Markdown
Member

@abrown abrown left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, see comments. @cfallin, any thoughts on a way to determine what SSE features are available during lowering?

@@ -0,0 +1,34 @@
test legalizer
set enable_simd
target x86_64 skylake
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is currently running the old backend; I think it should be modified to test compile and add feature "experimental_x64" (see simd-bitwise-compile.clif, e.g.).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok .. Yeah thanks. Will make this change too. It somehow was being acknowledged as testing was failing CI when I had the file tests for conversion to unsigned included, but I was having trouble running it on my machine. Will update.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@abrown @bnjbvr .. Actually I am going to just remove this compile file test. It is checking for a very specific sequence of instructions which should not be static (set in stone). It will depend on optimizations or SSE feature flag set and is there anything else that can change register allocation even if the same instructions are used?

Comment thread cranelift/codegen/src/isa/x64/lower.rs Outdated
Comment on lines +2374 to +2376
// Since this branch is also guarded by a check for vector types
// neither Opcode::FcvtToUint nor Opcode::FcvtToSint can reach here
// as the first to branches will cover all reachable cases.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Since this branch is also guarded by a check for vector types
// neither Opcode::FcvtToUint nor Opcode::FcvtToSint can reach here
// as the first to branches will cover all reachable cases.
// Since this branch is also guarded by a check for vector types,
// neither Opcode::FcvtToUint nor Opcode::FcvtToSint can reach here
// (the vector variants do not exist).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Implements i32x4.trunc_sat_f32x4_s
Add portions of filetests simd-conversion-legalize.clif and simd-conversion-run.clif
that test fcvt_from_sint.f32x4
@bnjbvr
Copy link
Copy Markdown
Member

bnjbvr commented Oct 28, 2020

LGTM, see comments. @cfallin, any thoughts on a way to determine what SSE features are available during lowering?

X64Backend has a x64_flags field which contains this information, e.g. has_sse42() (as derived from the meta, x64 target-specific settings file); you could access it by changing slightly the signature of lower_insn_to_regs and passing it the self.x64_flags next to the self.flags.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cranelift:area:x64 Issues related to x64 codegen cranelift Issues related to the Cranelift code generator

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants