[Feature Request] LayoutInference pass should be enhanced to analyze the vectorization factor across indices

Currently, when we write a set of nested loops to ensure 16-byte vectorized access, the code might look like this:

```python
for i in range(1):
    for v_3 in T.vectorized(16):
        B_shared[tx // 16, tx % 16 // 8, tx % 8 * 2 + v_3 // 8, v_3 % 8] = B[bx * 8 + tx // 16, ko * 2 + tx % 16 // 8, tx % 8 * 2 + v_3 // 8, v_3 % 8]
```

However, our current legalization pass transforms this into the following form:

```python
for i, v_3 in T.grid(1, 2):
    for vec in T.vectorized(8):
        B_shared[tx // 16, tx % 16 // 8, tx % 8 * 2 + (v_3 * 8 + vec) // 8, (v_3 * 8 + vec) % 8] = B[bx * 8 + tx // 16, ko * 2 + tx % 16 // 8, tx % 8 * 2 + (v_3 * 8 + vec) // 8, (v_3 * 8 + vec) % 8]
```

While this transformation achieves functional correctness, it introduces additional complexity in the indexing expressions and splits the vectorized loop into smaller chunks (e.g., breaking the 16-element vectorized access into two 8-element accesses). This reduces the efficiency of vectorized memory operations and complicates the generated code.
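As a quick sanity check (plain Python, not TileLang or TVM code), the sketch below enumerates only the two index terms that depend on the vectorized variable. It shows that the split form visits exactly the same elements in the same order, and that `(v_3 * 8 + vec) // 8` and `(v_3 * 8 + vec) % 8` reduce to `v_3` and `vec` when `vec` is in `[0, 8)`. The variable names `original`, `split`, and `simplified` are only for illustration.

```python
# Index terms that depend on the vectorized variable in the original loop.
original = [(v // 8, v % 8) for v in range(16)]

# The same terms after the pass splits the loop into T.grid(1, 2) x T.vectorized(8).
split = [
    ((v_3 * 8 + vec) // 8, (v_3 * 8 + vec) % 8)
    for v_3 in range(2)
    for vec in range(8)
]

# What the split expressions reduce to, since 0 <= vec < 8.
simplified = [(v_3, vec) for v_3 in range(2) for vec in range(8)]

# All three forms address the same elements in the same order.
assert original == split == simplified
```

So the extra arithmetic is redundant: it only re-derives the `// 8` and `% 8` structure the original loop already had.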

Proposed Enhancement:
To address this, the legalization pass should be enhanced to maintain the original vectorized structure and ensure that the indexing expressions remain as simple as possible. Specifically:
1. Preserve Single-Level Vectorization: Instead of breaking the 16-element vectorized loop into smaller subloops (e.g., two 8-element loops), the pass should retain the original `T.vectorized(16)` loop where possible.
2. Simplify Index Calculations: The pass should avoid introducing complex expressions like `(v_3 * 8 + vec)` for computing indices. Instead, it should aim to directly map the `v_3` indices to the original structure (e.g., `v_3 // 8` and `v_3 % 8`).
3. Optimize Performance: By preserving the larger vectorized loop and avoiding unnecessary transformations, the pass can generate more efficient, hardware-friendly code that takes better advantage of vectorized memory access.
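To make the "across indices" idea concrete, here is a rough sketch in plain Python (it does not use the actual LayoutInference pass or any TVM API). It compares a factor derived from the last index alone with one derived from the flattened row-major offset over all indices. The buffer shape `(8, 2, 16, 8)`, the 128-thread range for `tx`, and the helper names are assumptions made up for this example; the issue does not state them.

```python
def flat_offset(indices, shape):
    """Row-major linear offset of a multi-dimensional index."""
    offset = 0
    for idx, extent in zip(indices, shape):
        offset = offset * extent + idx
    return offset


def b_shared_indices(tx, v):
    """Index expressions of the B_shared store from the example above."""
    return (tx // 16, tx % 16 // 8, tx % 8 * 2 + v // 8, v % 8)


def largest_factor(is_contiguous, extent):
    """Largest factor, found by halving `extent`, whose aligned groups all pass."""
    factor = extent  # assumes `extent` is a power of two
    while factor > 1:
        if all(is_contiguous(base, factor) for base in range(0, extent, factor)):
            return factor
        factor //= 2
    return 1


ASSUMED_SHAPE = (8, 2, 16, 8)  # hypothetical B_shared shape, not given in the issue
THREADS = range(128)           # hypothetical thread extent for tx


def last_index_only(base, factor):
    """Contiguous if only the last index moves, by +1 per step."""
    for tx in THREADS:
        first = b_shared_indices(tx, base)
        for k in range(factor):
            cur = b_shared_indices(tx, base + k)
            if cur[:-1] != first[:-1] or cur[-1] != first[-1] + k:
                return False
    return True


def across_indices(base, factor):
    """Contiguous if the flattened offset moves by +1 per step."""
    for tx in THREADS:
        start = flat_offset(b_shared_indices(tx, base), ASSUMED_SHAPE)
        for k in range(factor):
            cur = flat_offset(b_shared_indices(tx, base + k), ASSUMED_SHAPE)
            if cur != start + k:
                return False
    return True


factor_last = largest_factor(last_index_only, 16)
factor_across = largest_factor(across_indices, 16)
print(factor_last, factor_across)
```

Under these assumptions, looking only at the last index caps the factor at the innermost extent, while the flattened view sees that `v_3 // 8` and `v_3 % 8` together form one contiguous 16-element run. Whether a 16-element access can be emitted as a single instruction still depends on the element type and target, which this sketch does not model.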


[Feature Request] LayoutInference pass should be enhanced to analyze the vectorization factor across indices #266
