Fix GeLU stack overflow on Phoenix; fix results commits with new Phoenix CI #108

Merged
andrej merged 5 commits into amd:devel from andrej:phoenix-ci-fixes
Apr 24, 2026

Conversation

andrej (Collaborator) commented Apr 23, 2026

This should restore the green checkmark for the extensive test suite. It also tacks on a little beautification of the aggregate CI results summary.

github-actions Bot (Contributor) commented Apr 23, 2026

CI Test Results

dd00072 (2026_04_24_00_48_25)

IRONCLAD - CI Summary

Examples

Test Krackan Phoenix
llama_3.2_1b_prompt_1024_tokens_1 pass -
llama_3.2_1b_prompt_1024_tokens_40 pass -
llama_3.2_1b_prompt_13_tokens_1 pass -
llama_3.2_1b_prompt_13_tokens_40 pass -

Small

Test Krackan Phoenix
M_128-K_128-num_aie_columns_1-tile_size_input_32-tile_size_output_128 pass pass
M_1792-K_896-N_1152-num_aie_columns_8-b_col_maj_False-c_col_maj_True-m_64-k_32-n_48-trace_size_0-partition_N_1 pass -
M_192-K_384-N_64-num_aie_columns_4-b_col_maj_False-c_col_maj_False-m_48-k_96-n_16-trace_size_0-partition_N_1 pass pass
M_192-K_384-N_64-num_aie_columns_4-b_col_maj_True-c_col_maj_True-m_48-k_96-n_16-trace_size_0-partition_N_1 pass pass
M_2048-K_2048-N_2048-num_aie_columns_1-b_col_maj_False-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1 pass pass
M_2048-K_2048-N_2048-num_aie_columns_2-b_col_maj_True-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1 pass pass
M_2048-K_2048-N_2048-num_aie_columns_8-b_col_maj_True-c_col_maj_True-m_64-k_64-n_64-trace_size_0-partition_N_1 pass -
M_2048-K_8192-num_aie_columns_1-tile_size_input_1-tile_size_output_2048 pass pass
M_2048-K_8192-num_aie_columns_2-tile_size_input_1-tile_size_output_1024 pass pass
M_2048-K_8192-num_aie_columns_4-tile_size_input_1-tile_size_output_512 pass pass
M_2048-K_8192-num_aie_columns_8-tile_size_input_1-tile_size_output_256 pass -
M_2048-N_64-aie_columns_1-channels_1-m_64-n_64-s_8 pass pass
M_2048-N_64-aie_columns_1-channels_2-m_64-n_64-s_8 pass pass
M_384-K_1536-N_1792-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_32-k_48-n_64-trace_size_0-partition_N_1 pass pass
M_64-K_512-N_256-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_16-k_64-n_64-trace_size_0-partition_N_4 pass pass
M_8192-K_2048-num_aie_columns_1-tile_size_input_4-tile_size_output_1024 pass pass
M_8192-K_2048-num_aie_columns_2-tile_size_input_4-tile_size_output_1024 pass pass
M_8192-K_2048-num_aie_columns_4-tile_size_input_4-tile_size_output_1024 pass pass
M_8192-K_2048-num_aie_columns_8-tile_size_input_4-tile_size_output_1024 pass -
M_896-K_1792-N_640-num_aie_columns_8-b_col_maj_False-c_col_maj_True-m_32-k_64-n_80-trace_size_0-partition_N_1 pass -
embedding_dim_1024-hidden_dim_3584 pass pass
embedding_dim_2048-hidden_dim_2048 pass pass
input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048 pass pass
input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-group_size_32 pass pass
input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_False pass pass
input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_True pass pass
input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024 pass pass
input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-group_size_32 pass pass
input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_False pass pass
input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_True pass pass
input_length_2048-num_aie_columns_1-tile_size_2048 pass pass
input_length_2048-num_aie_columns_1-tile_size_2048-scalar_factor_3.0 pass pass
input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024 pass pass
input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-group_size_32 pass pass
input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_False pass pass
input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_True pass pass
input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512 pass pass
input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-group_size_32 pass pass
input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_False pass pass
input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_True pass pass
input_length_2048-num_aie_columns_2-tile_size_1024 pass pass
input_length_2048-num_aie_columns_2-tile_size_1024-scalar_factor_3.0 pass pass
input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512 pass pass
input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-group_size_32 pass pass
input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_False pass pass
input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_True pass pass
input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256 pass pass
input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-group_size_32 pass pass
input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-weighted_False pass pass
input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-weighted_True pass -
input_length_2048-num_aie_columns_4-tile_size_512 pass pass
input_length_2048-num_aie_columns_4-tile_size_512-scalar_factor_3.0 pass pass
input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256 pass -
input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256-group_size_32 pass -
input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256-weighted_False pass -
input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256-weighted_True pass -
input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128 pass -
input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128-group_size_32 pass -
input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128-weighted_False pass -
input_length_2048-num_aie_columns_8-tile_size_256 pass -
input_length_2048-num_aie_columns_8-tile_size_256-scalar_factor_3.0 pass -
input_length_2048-num_cores_1-num_channels_1-bypass_False-tile_size_2048 pass pass
input_length_2048-num_cores_16-num_channels_2-bypass_False-tile_size_128 pass -
input_length_2048-num_cores_2-num_channels_1-bypass_False-tile_size_1024 pass pass
input_length_2048-num_cores_2-num_channels_2-bypass_False-tile_size_1024 pass pass
input_length_2048-num_cores_4-num_channels_1-bypass_False-tile_size_512 pass pass
input_length_2048-num_cores_4-num_channels_2-bypass_False-tile_size_512 pass pass
input_length_2048-num_cores_8-num_channels_1-bypass_False-tile_size_256 pass -
input_length_2048-num_cores_8-num_channels_2-bypass_False-tile_size_256 pass pass
input_length_32768-num_aie_columns_2-num_channels_2-tile_size_1024 pass pass
input_length_32768-num_aie_columns_2-num_channels_2-tile_size_2048 pass pass
input_length_32768-num_aie_columns_2-num_channels_2-tile_size_512 pass pass
rows_32-cols_512-angle_rows_32-aie_columns_1-method_type_0 pass pass
rows_32-cols_512-angle_rows_32-aie_columns_2-method_type_0 pass pass
rows_32-cols_512-angle_rows_32-aie_columns_4-method_type_0 pass pass
rows_32-cols_512-angle_rows_32-aie_columns_8-method_type_0 pass -
rows_32-cols_512-angle_rows_8-aie_columns_1-method_type_0 pass pass
rows_32-cols_512-angle_rows_8-aie_columns_2-method_type_0 pass pass
rows_32-cols_512-angle_rows_8-aie_columns_4-method_type_0 pass pass
rows_32-cols_512-angle_rows_8-aie_columns_8-method_type_0 pass -
seq_len_16384-dim_64-num_heads_1-num_pipelines_8-num_kv_heads_0 pass -
seq_len_256-embedding_dim_2048-hidden_dim_2048-prio_accuracy_False pass pass

Extensive

Test Krackan Phoenix
(no data) - -

Krackan - Small

IRONCLAD

Tested on 2026_04_24_00_48_25 at commit dd00072.

Test Checks Latency (mean) Bandwidth (mean) Throughput (mean)
M_128-K_128-num_aie_columns_1-tile_size_input_32-tile_size_output_128 ✅ 5/5 n/a 0.20 0.19
M_1792-K_896-N_1152-num_aie_columns_8-b_col_maj_False-c_col_maj_True-m_64-k_32-n_48-trace_size_0-partition_N_1 ✅ 5/5 2329.94 4.05 1593.28
M_192-K_384-N_64-num_aie_columns_4-b_col_maj_False-c_col_maj_False-m_48-k_96-n_16-trace_size_0-partition_N_1 ✅ 5/5 272.46 0.85 36.36
M_192-K_384-N_64-num_aie_columns_4-b_col_maj_True-c_col_maj_True-m_48-k_96-n_16-trace_size_0-partition_N_1 ✅ 5/5 225.66 0.99 42.29
M_2048-K_2048-N_2048-num_aie_columns_1-b_col_maj_False-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1 ✅ 5/5 49292.12 0.51 348.53
M_2048-K_2048-N_2048-num_aie_columns_2-b_col_maj_True-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1 ✅ 5/5 28591.80 0.88 600.91
M_2048-K_2048-N_2048-num_aie_columns_8-b_col_maj_True-c_col_maj_True-m_64-k_64-n_64-trace_size_0-partition_N_1 ✅ 5/5 7266.32 3.46 2364.62
M_2048-K_8192-num_aie_columns_1-tile_size_input_1-tile_size_output_2048 ✅ 5/5 n/a 12.83 12.82
M_2048-K_8192-num_aie_columns_2-tile_size_input_1-tile_size_output_1024 ✅ 5/5 n/a 23.19 23.18
M_2048-K_8192-num_aie_columns_4-tile_size_input_1-tile_size_output_512 ✅ 5/5 n/a 40.57 40.55
M_2048-K_8192-num_aie_columns_8-tile_size_input_1-tile_size_output_256 ✅ 5/5 n/a 43.68 43.66
M_2048-N_64-aie_columns_1-channels_1-m_64-n_64-s_8 ✅ 5/5 200.96 2.66 n/a
M_2048-N_64-aie_columns_1-channels_2-m_64-n_64-s_8 ✅ 5/5 188.06 2.80 n/a
M_384-K_1536-N_1792-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_32-k_48-n_64-trace_size_0-partition_N_1 ✅ 5/5 2336.32 3.49 914.32
M_64-K_512-N_256-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_16-k_64-n_64-trace_size_0-partition_N_4 ✅ 5/5 3518.90 0.37 19.74
M_8192-K_2048-num_aie_columns_1-tile_size_input_4-tile_size_output_1024 ✅ 5/5 n/a 12.45 12.44
M_8192-K_2048-num_aie_columns_2-tile_size_input_4-tile_size_output_1024 ✅ 5/5 n/a 24.47 24.46
M_8192-K_2048-num_aie_columns_4-tile_size_input_4-tile_size_output_1024 ✅ 5/5 n/a 39.12 39.10
M_8192-K_2048-num_aie_columns_8-tile_size_input_4-tile_size_output_1024 ✅ 5/5 n/a 43.49 43.47
M_896-K_1792-N_640-num_aie_columns_8-b_col_maj_False-c_col_maj_True-m_32-k_64-n_80-trace_size_0-partition_N_1 ✅ 5/5 1585.40 4.27 1319.64
embedding_dim_1024-hidden_dim_3584 ✅ 5/5 3498.04 0.00 n/a
embedding_dim_2048-hidden_dim_2048 ✅ 5/5 3814.21 0.00 n/a
input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048 ✅ 30/30 167.77 0.05 n/a
input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-group_size_32 ✅ 5/5 167.06 0.03 n/a
input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_False ✅ 5/5 160.62 0.05 n/a
input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_True ✅ 5/5 204.82 0.06 n/a
input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024 ✅ 25/25 172.70 0.05 n/a
input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-group_size_32 ✅ 5/5 171.02 0.03 n/a
input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_False ✅ 5/5 155.26 0.05 n/a
input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_True ✅ 5/5 193.60 0.05 n/a
input_length_2048-num_aie_columns_1-tile_size_2048 ✅ 10/10 162.33 0.08 n/a
input_length_2048-num_aie_columns_1-tile_size_2048-scalar_factor_3.0 ✅ 5/5 149.84 0.08 n/a
input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024 ✅ 30/30 176.91 0.05 n/a
input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-group_size_32 ✅ 5/5 186.28 0.03 n/a
input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_False ✅ 5/5 195.02 0.04 n/a
input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_True ✅ 5/5 221.80 0.05 n/a
input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512 ✅ 25/25 188.58 0.05 n/a
input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-group_size_32 ✅ 5/5 160.94 0.03 n/a
input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_False ✅ 5/5 166.54 0.05 n/a
input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_True ✅ 5/5 182.94 0.05 n/a
input_length_2048-num_aie_columns_2-tile_size_1024 ✅ 10/10 162.55 0.08 n/a
input_length_2048-num_aie_columns_2-tile_size_1024-scalar_factor_3.0 ✅ 5/5 173.60 0.07 n/a
input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512 ✅ 30/30 172.64 0.05 n/a
input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-group_size_32 ✅ 5/5 166.84 0.03 n/a
input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_False ✅ 5/5 169.44 0.05 n/a
input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_True ✅ 5/5 182.56 0.05 n/a
input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256 ✅ 25/25 177.94 0.05 n/a
input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-group_size_32 ✅ 5/5 177.76 0.03 n/a
input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-weighted_False ✅ 5/5 171.30 0.05 n/a
input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-weighted_True ✅ 5/5 218.10 0.04 n/a
input_length_2048-num_aie_columns_4-tile_size_512 ✅ 10/10 171.80 0.07 n/a
input_length_2048-num_aie_columns_4-tile_size_512-scalar_factor_3.0 ✅ 5/5 183.82 0.07 n/a
input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256 ✅ 30/30 191.75 0.04 n/a
input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256-group_size_32 ✅ 5/5 204.40 0.03 n/a
input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256-weighted_False ✅ 5/5 185.44 0.04 n/a
input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256-weighted_True ✅ 5/5 174.76 0.05 n/a
input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128 ✅ 25/25 224.14 0.04 n/a
input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128-group_size_32 ✅ 5/5 219.98 0.02 n/a
input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128-weighted_False ✅ 5/5 228.12 0.04 n/a
input_length_2048-num_aie_columns_8-tile_size_256 ✅ 10/10 216.33 0.06 n/a
input_length_2048-num_aie_columns_8-tile_size_256-scalar_factor_3.0 ✅ 5/5 233.64 0.06 n/a
input_length_2048-num_cores_1-num_channels_1-bypass_False-tile_size_2048 ✅ 5/5 163.02 0.05 n/a
input_length_2048-num_cores_16-num_channels_2-bypass_False-tile_size_128 ✅ 5/5 197.94 0.04 n/a
input_length_2048-num_cores_2-num_channels_1-bypass_False-tile_size_1024 ✅ 5/5 157.46 0.06 n/a
input_length_2048-num_cores_2-num_channels_2-bypass_False-tile_size_1024 ✅ 5/5 163.88 0.05 n/a
input_length_2048-num_cores_4-num_channels_1-bypass_False-tile_size_512 ✅ 5/5 203.76 0.05 n/a
input_length_2048-num_cores_4-num_channels_2-bypass_False-tile_size_512 ✅ 5/5 185.02 0.05 n/a
input_length_2048-num_cores_8-num_channels_1-bypass_False-tile_size_256 ✅ 5/5 174.12 0.05 n/a
input_length_2048-num_cores_8-num_channels_2-bypass_False-tile_size_256 ✅ 5/5 182.54 0.05 n/a
input_length_32768-num_aie_columns_2-num_channels_2-tile_size_1024 ✅ 5/5 178.56 0.74 n/a
input_length_32768-num_aie_columns_2-num_channels_2-tile_size_2048 ✅ 5/5 200.12 0.67 n/a
input_length_32768-num_aie_columns_2-num_channels_2-tile_size_512 ✅ 5/5 192.74 0.72 n/a
rows_32-cols_512-angle_rows_32-aie_columns_1-method_type_0 ✅ 5/5 184.02 0.55 n/a
rows_32-cols_512-angle_rows_32-aie_columns_2-method_type_0 ✅ 5/5 213.02 0.48 n/a
rows_32-cols_512-angle_rows_32-aie_columns_4-method_type_0 ✅ 5/5 205.00 0.49 n/a
rows_32-cols_512-angle_rows_32-aie_columns_8-method_type_0 ✅ 5/5 230.80 0.45 n/a
rows_32-cols_512-angle_rows_8-aie_columns_1-method_type_0 ✅ 5/5 174.04 0.43 n/a
rows_32-cols_512-angle_rows_8-aie_columns_2-method_type_0 ✅ 5/5 207.48 0.36 n/a
rows_32-cols_512-angle_rows_8-aie_columns_4-method_type_0 ✅ 5/5 179.66 0.42 n/a
rows_32-cols_512-angle_rows_8-aie_columns_8-method_type_0 ✅ 5/5 204.96 0.39 n/a
seq_len_16384-dim_64-num_heads_1-num_pipelines_8-num_kv_heads_0 ✅ 5/5 40692.98 0.21 n/a
seq_len_256-embedding_dim_2048-hidden_dim_2048-prio_accuracy_False ✅ 5/5 9867.74 0.22 n/a

Trends:

IRONCLAD Trends

M_128-K_128-num_aie_columns_1-tile_size_input_32-tile_size_output_128

Commit/Date Bandwidth (max) Bandwidth (mean) Bandwidth (median) Bandwidth (min) Bandwidth (stddev) Throughput (max) Throughput (mean) Throughput (median) Throughput (min) Throughput (stddev)
dd00072 — 2026-04-24 00:42:53 0.26 (n/a) 0.20 (n/a) 0.19 (n/a) 0.16 (n/a) 0.04 (n/a) 0.26 (n/a) 0.19 (n/a) 0.19 (n/a) 0.16 (n/a) 0.04 (n/a)

M_1792-K_896-N_1152-num_aie_columns_8-b_col_maj_False-c_col_maj_True-m_64-k_32-n_48-trace_size_0-partition_N_1

Commit/Date Bandwidth (max) Bandwidth (mean) Bandwidth (median) Bandwidth (min) Bandwidth (stddev) Latency (max) Latency (mean) Latency (median) Latency (min) Latency (stddev) Throughput (max) Throughput (mean) Throughput (median) Throughput (min) Throughput (stddev)
dd00072 — 2026-04-24 00:42:53 4.35 (n/a) 4.05 (n/a) 4.14 (n/a) 3.77 (n/a) 0.27 (n/a) 2497.30 (n/a) 2329.94 (n/a) 2271.70 (n/a) 2159.90 (n/a) 154.35 (n/a) 1712.74 (n/a) 1593.28 (n/a) 1628.46 (n/a) 1481.34 (n/a) 104.46 (n/a)

M_192-K_384-N_64-num_aie_columns_4-b_col_maj_False-c_col_maj_False-m_48-k_96-n_16-trace_size_0-partition_N_1

Commit/Date Bandwidth (max) Bandwidth (mean) Bandwidth (median) Bandwidth (min) Bandwidth (stddev) Latency (max) Latency (mean) Latency (median) Latency (min) Latency (stddev) Throughput (max) Throughput (mean) Throughput (median) Throughput (min) Throughput (stddev)
dd00072 — 2026-04-24 00:42:53 1.06 (n/a) 0.85 (n/a) 0.94 (n/a) 0.60 (n/a) 0.20 (n/a) 368.50 (n/a) 272.46 (n/a) 235.20 (n/a) 209.30 (n/a) 69.58 (n/a) 45.08 (n/a) 36.36 (n/a) 40.12 (n/a) 25.61 (n/a) 8.47 (n/a)

M_192-K_384-N_64-num_aie_columns_4-b_col_maj_True-c_col_maj_True-m_48-k_96-n_16-trace_size_0-partition_N_1

Commit/Date Bandwidth (max) Bandwidth (mean) Bandwidth (median) Bandwidth (min) Bandwidth (stddev) Latency (max) Latency (mean) Latency (median) Latency (min) Latency (stddev) Throughput (max) Throughput (mean) Throughput (median) Throughput (min) Throughput (stddev)
dd00072 — 2026-04-24 00:42:53 1.15 (n/a) 0.99 (n/a) 1.00 (n/a) 0.83 (n/a) 0.12 (n/a) 268.00 (n/a) 225.66 (n/a) 221.40 (n/a) 192.70 (n/a) 27.24 (n/a) 48.98 (n/a) 42.29 (n/a) 42.63 (n/a) 35.21 (n/a) 4.93 (n/a)

M_2048-K_2048-N_2048-num_aie_columns_1-b_col_maj_False-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1

Commit/Date Bandwidth (max) Bandwidth (mean) Bandwidth (median) Bandwidth (min) Bandwidth (stddev) Latency (max) Latency (mean) Latency (median) Latency (min) Latency (stddev) Throughput (max) Throughput (mean) Throughput (median) Throughput (min) Throughput (stddev)
dd00072 — 2026-04-24 00:42:53 0.51 (n/a) 0.51 (n/a) 0.51 (n/a) 0.51 (n/a) 0.00 (n/a) 49446.50 (n/a) 49292.12 (n/a) 49272.20 (n/a) 49228.10 (n/a) 89.94 (n/a) 348.99 (n/a) 348.53 (n/a) 348.67 (n/a) 347.44 (n/a) 0.63 (n/a)

M_2048-K_2048-N_2048-num_aie_columns_2-b_col_maj_True-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1

Commit/Date Bandwidth (max) Bandwidth (mean) Bandwidth (median) Bandwidth (min) Bandwidth (stddev) Latency (max) Latency (mean) Latency (median) Latency (min) Latency (stddev) Throughput (max) Throughput (mean) Throughput (median) Throughput (min) Throughput (stddev)
dd00072 — 2026-04-24 00:42:53 0.89 (n/a) 0.88 (n/a) 0.88 (n/a) 0.87 (n/a) 0.01 (n/a) 28926.30 (n/a) 28591.80 (n/a) 28625.70 (n/a) 28209.50 (n/a) 255.88 (n/a) 609.01 (n/a) 600.91 (n/a) 600.16 (n/a) 593.92 (n/a) 5.39 (n/a)

M_2048-K_2048-N_2048-num_aie_columns_8-b_col_maj_True-c_col_maj_True-m_64-k_64-n_64-trace_size_0-partition_N_1

Commit/Date Bandwidth (max) Bandwidth (mean) Bandwidth (median) Bandwidth (min) Bandwidth (stddev) Latency (max) Latency (mean) Latency (median) Latency (min) Latency (stddev) Throughput (max) Throughput (mean) Throughput (median) Throughput (min) Throughput (stddev)
dd00072 — 2026-04-24 00:42:53 3.51 (n/a) 3.46 (n/a) 3.46 (n/a) 3.40 (n/a) 0.04 (n/a) 7409.80 (n/a) 7266.32 (n/a) 7265.90 (n/a) 7163.10 (n/a) 92.48 (n/a) 2398.37 (n/a) 2364.62 (n/a) 2364.45 (n/a) 2318.53 (n/a) 29.90 (n/a)

M_2048-K_8192-num_aie_columns_1-tile_size_input_1-tile_size_output_2048

Commit/Date Bandwidth (max) Bandwidth (mean) Bandwidth (median) Bandwidth (min) Bandwidth (stddev) Throughput (max) Throughput (mean) Throughput (median) Throughput (min) Throughput (stddev)
dd00072 — 2026-04-24 00:42:53 13.34 (n/a) 12.83 (n/a) 13.04 (n/a) 12.06 (n/a) 0.52 (n/a) 13.33 (n/a) 12.82 (n/a) 13.03 (n/a) 12.05 (n/a) 0.52 (n/a)

M_2048-K_8192-num_aie_columns_2-tile_size_input_1-tile_size_output_1024

Commit/Date Bandwidth (max) Bandwidth (mean) Bandwidth (median) Bandwidth (min) Bandwidth (stddev) Throughput (max) Throughput (mean) Throughput (median) Throughput (min) Throughput (stddev)
dd00072 — 2026-04-24 00:42:53 25.17 (n/a) 23.19 (n/a) 24.08 (n/a) 18.27 (n/a) 2.79 (n/a) 25.15 (n/a) 23.18 (n/a) 24.06 (n/a) 18.26 (n/a) 2.78 (n/a)

M_2048-K_8192-num_aie_columns_4-tile_size_input_1-tile_size_output_512

Commit/Date Bandwidth (max) Bandwidth (mean) Bandwidth (median) Bandwidth (min) Bandwidth (stddev) Throughput (max) Throughput (mean) Throughput (median) Throughput (min) Throughput (stddev)
dd00072 — 2026-04-24 00:42:53 41.85 (n/a) 40.57 (n/a) 40.18 (n/a) 39.66 (n/a) 0.88 (n/a) 41.83 (n/a) 40.55 (n/a) 40.16 (n/a) 39.64 (n/a) 0.88 (n/a)

M_2048-K_8192-num_aie_columns_8-tile_size_input_1-tile_size_output_256

Commit/Date Bandwidth (max) Bandwidth (mean) Bandwidth (median) Bandwidth (min) Bandwidth (stddev) Throughput (max) Throughput (mean) Throughput (median) Throughput (min) Throughput (stddev)
dd00072 — 2026-04-24 00:42:53 45.34 (n/a) 43.68 (n/a) 43.96 (n/a) 42.29 (n/a) 1.24 (n/a) 45.31 (n/a) 43.66 (n/a) 43.93 (n/a) 42.26 (n/a) 1.24 (n/a)

M_2048-N_64-aie_columns_1-channels_1-m_64-n_64-s_8

Commit/Date Bandwidth (max) Bandwidth (mean) Bandwidth (median) Bandwidth (min) Bandwidth (stddev) Latency (max) Latency (mean) Latency (median) Latency (min) Latency (stddev)
dd00072 — 2026-04-24 00:42:53 3.31 (n/a) 2.66 (n/a) 2.66 (n/a) 2.12 (n/a) 0.43 (n/a) 247.40 (n/a) 200.96 (n/a) 196.90 (n/a) 158.60 (n/a) 32.00 (n/a)

M_2048-N_64-aie_columns_1-channels_2-m_64-n_64-s_8

Commit/Date Bandwidth (max) Bandwidth (mean) Bandwidth (median) Bandwidth (min) Bandwidth (stddev) Latency (max) Latency (mean) Latency (median) Latency (min) Latency (stddev)
dd00072 — 2026-04-24 00:42:53 3.02 (n/a) 2.80 (n/a) 2.80 (n/a) 2.59 (n/a) 0.16 (n/a) 202.80 (n/a) 188.06 (n/a) 187.10 (n/a) 173.50 (n/a) 10.93 (n/a)

M_384-K_1536-N_1792-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_32-k_48-n_64-trace_size_0-partition_N_1

Commit/Date Bandwidth (max) Bandwidth (mean) Bandwidth (median) Bandwidth (min) Bandwidth (stddev) Latency (max) Latency (mean) Latency (median) Latency (min) Latency (stddev) Throughput (max) Throughput (mean) Throughput (median) Throughput (min) Throughput (stddev)
dd00072 — 2026-04-24 00:42:53 4.07 (n/a) 3.49 (n/a) 3.49 (n/a) 2.96 (n/a) 0.40 (n/a) 2727.70 (n/a) 2336.32 (n/a) 2309.40 (n/a) 1979.60 (n/a) 267.28 (n/a) 1067.87 (n/a) 914.32 (n/a) 915.34 (n/a) 774.97 (n/a) 104.65 (n/a)

M_64-K_512-N_256-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_16-k_64-n_64-trace_size_0-partition_N_4

Commit/Date Bandwidth (max) Bandwidth (mean) Bandwidth (median) Bandwidth (min) Bandwidth (stddev) Latency (max) Latency (mean) Latency (median) Latency (min) Latency (stddev) Throughput (max) Throughput (mean) Throughput (median) Throughput (min) Throughput (stddev)
dd00072 — 2026-04-24 00:42:53 0.51 (n/a) 0.37 (n/a) 0.36 (n/a) 0.30 (n/a) 0.08 (n/a) 4185.80 (n/a) 3518.90 (n/a) 3489.30 (n/a) 2456.30 (n/a) 668.17 (n/a) 27.32 (n/a) 19.74 (n/a) 19.23 (n/a) 16.03 (n/a) 4.47 (n/a)

M_8192-K_2048-num_aie_columns_1-tile_size_input_4-tile_size_output_1024

Commit/Date Bandwidth (max) Bandwidth (mean) Bandwidth (median) Bandwidth (min) Bandwidth (stddev) Throughput (max) Throughput (mean) Throughput (median) Throughput (min) Throughput (stddev)
dd00072 — 2026-04-24 00:42:53 13.40 (n/a) 12.45 (n/a) 13.24 (n/a) 10.06 (n/a) 1.43 (n/a) 13.39 (n/a) 12.44 (n/a) 13.23 (n/a) 10.05 (n/a) 1.43 (n/a)

M_8192-K_2048-num_aie_columns_2-tile_size_input_4-tile_size_output_1024

Commit/Date Bandwidth (max) Bandwidth (mean) Bandwidth (median) Bandwidth (min) Bandwidth (stddev) Throughput (max) Throughput (mean) Throughput (median) Throughput (min) Throughput (stddev)
dd00072 — 2026-04-24 00:42:53 25.07 (n/a) 24.47 (n/a) 24.89 (n/a) 23.54 (n/a) 0.72 (n/a) 25.05 (n/a) 24.46 (n/a) 24.87 (n/a) 23.53 (n/a) 0.72 (n/a)

M_8192-K_2048-num_aie_columns_4-tile_size_input_4-tile_size_output_1024

Commit/Date Bandwidth (max) Bandwidth (mean) Bandwidth (median) Bandwidth (min) Bandwidth (stddev) Throughput (max) Throughput (mean) Throughput (median) Throughput (min) Throughput (stddev)
dd00072 — 2026-04-24 00:42:53 40.76 (n/a) 39.12 (n/a) 39.22 (n/a) 36.97 (n/a) 1.42 (n/a) 40.74 (n/a) 39.10 (n/a) 39.19 (n/a) 36.95 (n/a) 1.42 (n/a)

M_8192-K_2048-num_aie_columns_8-tile_size_input_4-tile_size_output_1024

Commit/Date Bandwidth (max) Bandwidth (mean) Bandwidth (median) Bandwidth (min) Bandwidth (stddev) Throughput (max) Throughput (mean) Throughput (median) Throughput (min) Throughput (stddev)
dd00072 — 2026-04-24 00:42:53 46.05 (n/a) 43.49 (n/a) 43.10 (n/a) 42.04 (n/a) 1.57 (n/a) 46.02 (n/a) 43.47 (n/a) 43.07 (n/a) 42.01 (n/a) 1.57 (n/a)

M_896-K_1792-N_640-num_aie_columns_8-b_col_maj_False-c_col_maj_True-m_32-k_64-n_80-trace_size_0-partition_N_1

Commit/Date Bandwidth (max) Bandwidth (mean) Bandwidth (median) Bandwidth (min) Bandwidth (stddev) Latency (max) Latency (mean) Latency (median) Latency (min) Latency (stddev) Throughput (max) Throughput (mean) Throughput (median) Throughput (min) Throughput (stddev)
dd00072 — 2026-04-24 00:42:53 4.79 (n/a) 4.27 (n/a) 4.58 (n/a) 3.54 (n/a) 0.62 (n/a) 1880.20 (n/a) 1585.40 (n/a) 1451.20 (n/a) 1389.70 (n/a) 241.70 (n/a) 1478.92 (n/a) 1319.64 (n/a) 1416.23 (n/a) 1093.11 (n/a) 191.05 (n/a)

embedding_dim_1024-hidden_dim_3584

Commit/Date Bandwidth (max) Bandwidth (mean) Bandwidth (median) Bandwidth (min) Bandwidth (stddev) Latency (max) Latency (mean) Latency (median) Latency (min) Latency (stddev)
dd00072 — 2026-04-24 00:42:53 0.00 (n/a) 0.00 (n/a) 0.00 (n/a) 0.00 (n/a) 0.00 (n/a) 3506.05 (n/a) 3498.04 (n/a) 3496.30 (n/a) 3492.61 (n/a) 5.65 (n/a)

embedding_dim_2048-hidden_dim_2048

Commit/Date Bandwidth (max) Bandwidth (mean) Bandwidth (median) Bandwidth (min) Bandwidth (stddev) Latency (max) Latency (mean) Latency (median) Latency (min) Latency (stddev)
dd00072 — 2026-04-24 00:42:53 0.00 (n/a) 0.00 (n/a) 0.00 (n/a) 0.00 (n/a) 0.00 (n/a) 4824.10 (n/a) 3814.21 (n/a) 3566.92 (n/a) 3554.62 (n/a) 564.57 (n/a)

input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048

Commit/Date Bandwidth (max) Bandwidth (mean) Bandwidth (median) Bandwidth (min) Bandwidth (stddev) Latency (max) Latency (mean) Latency (median) Latency (min) Latency (stddev)
dd00072 — 2026-04-24 00:42:53 0.07 (n/a) 0.05 (n/a) 0.05 (n/a) 0.04 (n/a) 0.01 (n/a) 214.60 (n/a) 167.77 (n/a) 170.35 (n/a) 113.20 (n/a) 31.72 (n/a)

input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-group_size_32

Commit/Date Bandwidth (max) Bandwidth (mean) Bandwidth (median) Bandwidth (min) Bandwidth (stddev) Latency (max) Latency (mean) Latency (median) Latency (min) Latency (stddev)
dd00072 — 2026-04-24 00:42:53 0.04 (n/a) 0.03 (n/a) 0.03 (n/a) 0.03 (n/a) 0.00 (n/a) 189.30 (n/a) 167.06 (n/a) 171.70 (n/a) 145.40 (n/a) 18.97 (n/a)

input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_False

Commit/Date Bandwidth (max) Bandwidth (mean) Bandwidth (median) Bandwidth (min) Bandwidth (stddev) Latency (max) Latency (mean) Latency (median) Latency (min) Latency (stddev)
dd00072 — 2026-04-24 00:42:53 0.06 (n/a) 0.05 (n/a) 0.05 (n/a) 0.04 (n/a) 0.01 (n/a) 214.80 (n/a) 160.62 (n/a) 149.90 (n/a) 131.10 (n/a) 32.17 (n/a)

input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_True

Commit/Date Bandwidth (max) Bandwidth (mean) Bandwidth (median) Bandwidth (min) Bandwidth (stddev) Latency (max) Latency (mean) Latency (median) Latency (min) Latency (stddev)
dd00072 — 2026-04-24 00:42:53 0.08 (n/a) 0.06 (n/a) 0.07 (n/a) 0.04 (n/a) 0.02 (n/a) 311.20 (n/a) 204.82 (n/a) 179.40 (n/a) 151.90 (n/a) 65.30 (n/a)

input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024

Commit/Date Bandwidth (max) Bandwidth (mean) Bandwidth (median) Bandwidth (min) Bandwidth (stddev) Latency (max) Latency (mean) Latency (median) Latency (min) Latency (stddev)
dd00072 — 2026-04-24 00:42:53 0.07 (n/a) 0.05 (n/a) 0.05 (n/a) 0.03 (n/a) 0.01 (n/a) 236.50 (n/a) 172.70 (n/a) 175.20 (n/a) 110.10 (n/a) 31.84 (n/a)

input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-group_size_32

Commit/Date Bandwidth (max) Bandwidth (mean) Bandwidth (median) Bandwidth (min) Bandwidth (stddev) Latency (max) Latency (mean) Latency (median) Latency (min) Latency (stddev)
dd00072 — 2026-04-24 00:42:53 0.04 (n/a) 0.03 (n/a) 0.04 (n/a) 0.02 (n/a) 0.01 (n/a) 306.10 (n/a) 171.02 (n/a) 145.10 (n/a) 122.70 (n/a) 76.14 (n/a)

input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_False

Commit/Date Bandwidth (max) Bandwidth (mean) Bandwidth (median) Bandwidth (min) Bandwidth (stddev) Latency (max) Latency (mean) Latency (median) Latency (min) Latency (stddev)
dd00072 — 2026-04-24 00:42:53 0.06 (n/a) 0.05 (n/a) 0.05 (n/a) 0.05 (n/a) 0.01 (n/a) 172.50 (n/a) 155.26 (n/a) 164.80 (n/a) 134.00 (n/a) 18.49 (n/a)

input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_True

Commit/Date Bandwidth (max) Bandwidth (mean) Bandwidth (median) Bandwidth (min) Bandwidth (stddev) Latency (max) Latency (mean) Latency (median) Latency (min) Latency (stddev)
dd00072 — 2026-04-24 00:42:53 0.06 (n/a) 0.05 (n/a) 0.05 (n/a) 0.05 (n/a) 0.01 (n/a) 214.60 (n/a) 193.60 (n/a) 198.20 (n/a) 161.60 (n/a) 19.57 (n/a)

input_length_2048-num_aie_columns_1-tile_size_2048

Commit/Date Bandwidth (max) Bandwidth (mean) Bandwidth (median) Bandwidth (min) Bandwidth (stddev) Latency (max) Latency (mean) Latency (median) Latency (min) Latency (stddev)
dd00072 — 2026-04-24 00:42:53 0.09 (n/a) 0.08 (n/a) 0.08 (n/a) 0.06 (n/a) 0.01 (n/a) 191.50 (n/a) 162.33 (n/a) 162.35 (n/a) 134.10 (n/a) 17.11 (n/a)

input_length_2048-num_aie_columns_1-tile_size_2048-scalar_factor_3.0

Commit/Date Bandwidth (max) Bandwidth (mean) Bandwidth (median) Bandwidth (min) Bandwidth (stddev) Latency (max) Latency (mean) Latency (median) Latency (min) Latency (stddev)
dd00072 — 2026-04-24 00:42:53 0.10 (n/a) 0.08 (n/a) 0.09 (n/a) 0.06 (n/a) 0.02 (n/a) 195.40 (n/a) 149.84 (n/a) 136.40 (n/a) 118.60 (n/a) 31.81 (n/a)

input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024

Commit/Date Bandwidth (max) Bandwidth (mean) Bandwidth (median) Bandwidth (min) Bandwidth (stddev) Latency (max) Latency (mean) Latency (median) Latency (min) Latency (stddev)
dd00072 — 2026-04-24 00:42:53 0.07 (n/a) 0.05 (n/a) 0.05 (n/a) 0.03 (n/a) 0.01 (n/a) 235.60 (n/a) 176.91 (n/a) 172.50 (n/a) 118.80 (n/a) 29.56 (n/a)

input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-group_size_32

Commit/Date Bandwidth (max) Bandwidth (mean) Bandwidth (median) Bandwidth (min) Bandwidth (stddev) Latency (max) Latency (mean) Latency (median) Latency (min) Latency (stddev)
dd00072 — 2026-04-24 00:42:53 0.04 (n/a) 0.03 (n/a) 0.03 (n/a) 0.02 (n/a) 0.01 (n/a) 259.60 (n/a) 186.28 (n/a) 182.00 (n/a) 136.40 (n/a) 47.19 (n/a)

input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_False

Commit/Date Bandwidth (max) Bandwidth (mean) Bandwidth (median) Bandwidth (min) Bandwidth (stddev) Latency (max) Latency (mean) Latency (median) Latency (min) Latency (stddev)
dd00072 — 2026-04-24 00:42:53 0.06 (n/a) 0.04 (n/a) 0.05 (n/a) 0.03 (n/a) 0.01 (n/a) 250.20 (n/a) 195.02 (n/a) 172.00 (n/a) 147.20 (n/a) 50.15 (n/a)

input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_True

Commit/Date Bandwidth (max) Bandwidth (mean) Bandwidth (median) Bandwidth (min) Bandwidth (stddev) Latency (max) Latency (mean) Latency (median) Latency (min) Latency (stddev)
dd00072 — 2026-04-24 00:42:53 0.07 (n/a) 0.05 (n/a) 0.06 (n/a) 0.03 (n/a) 0.02 (n/a) 385.60 (n/a) 221.80 (n/a) 180.10 (n/a) 147.90 (n/a) 95.15 (n/a)

input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512

Commit/Date Bandwidth (max) Bandwidth (mean) Bandwidth (median) Bandwidth (min) Bandwidth (stddev) Latency (max) Latency (mean) Latency (median) Latency (min) Latency (stddev)
dd00072 — 2026-04-24 00:42:53 0.07 (n/a) 0.05 (n/a) 0.04 (n/a) 0.03 (n/a) 0.01 (n/a) 327.10 (n/a) 188.58 (n/a) 187.90 (n/a) 121.60 (n/a) 49.96 (n/a)

input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-group_size_32

Commit/Date Bandwidth (max) Bandwidth (mean) Bandwidth (median) Bandwidth (min) Bandwidth (stddev) Latency (max) Latency (mean) Latency (median) Latency (min) Latency (stddev)
dd00072 — 2026-04-24 00:42:53 0.04 (n/a) 0.03 (n/a) 0.03 (n/a) 0.03 (n/a) 0.01 (n/a) 187.30 (n/a) 160.94 (n/a) 163.60 (n/a) 119.00 (n/a) 28.09 (n/a)

input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_False

Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev)
dd00072 — 2026-04-24 00:42:53 | 0.06 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 209.30 (n/a) | 166.54 (n/a) | 174.10 (n/a) | 129.10 (n/a) | 33.45 (n/a)

input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_True

Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev)
dd00072 — 2026-04-24 00:42:53 | 0.06 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 216.70 (n/a) | 182.94 (n/a) | 184.40 (n/a) | 160.10 (n/a) | 22.30 (n/a)

input_length_2048-num_aie_columns_2-tile_size_1024

Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev)
dd00072 — 2026-04-24 00:42:53 | 0.10 (n/a) | 0.08 (n/a) | 0.07 (n/a) | 0.06 (n/a) | 0.02 (n/a) | 217.20 (n/a) | 162.55 (n/a) | 164.60 (n/a) | 118.90 (n/a) | 34.25 (n/a)

input_length_2048-num_aie_columns_2-tile_size_1024-scalar_factor_3.0

Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev)
dd00072 — 2026-04-24 00:42:53 | 0.10 (n/a) | 0.07 (n/a) | 0.07 (n/a) | 0.06 (n/a) | 0.02 (n/a) | 205.20 (n/a) | 173.60 (n/a) | 188.30 (n/a) | 125.80 (n/a) | 32.87 (n/a)

input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512

Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev)
dd00072 — 2026-04-24 00:42:53 | 0.07 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 279.60 (n/a) | 172.64 (n/a) | 170.30 (n/a) | 120.50 (n/a) | 35.65 (n/a)

input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-group_size_32

Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev)
dd00072 — 2026-04-24 00:42:53 | 0.04 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 233.50 (n/a) | 166.84 (n/a) | 157.80 (n/a) | 119.90 (n/a) | 41.48 (n/a)

input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_False

Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev)
dd00072 — 2026-04-24 00:42:53 | 0.07 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 190.40 (n/a) | 169.44 (n/a) | 183.80 (n/a) | 120.80 (n/a) | 28.39 (n/a)

input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_True

Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev)
dd00072 — 2026-04-24 00:42:53 | 0.07 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 215.10 (n/a) | 182.56 (n/a) | 195.10 (n/a) | 139.20 (n/a) | 31.38 (n/a)

input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256

Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev)
dd00072 — 2026-04-24 00:42:53 | 0.07 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 320.80 (n/a) | 177.94 (n/a) | 176.10 (n/a) | 114.90 (n/a) | 41.61 (n/a)

input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-group_size_32

Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev)
dd00072 — 2026-04-24 00:42:53 | 0.03 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.00 (n/a) | 200.70 (n/a) | 177.76 (n/a) | 177.10 (n/a) | 156.50 (n/a) | 16.93 (n/a)

input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-weighted_False

Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev)
dd00072 — 2026-04-24 00:42:53 | 0.06 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 197.30 (n/a) | 171.30 (n/a) | 168.20 (n/a) | 147.30 (n/a) | 18.74 (n/a)

input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-weighted_True

Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev)
dd00072 — 2026-04-24 00:42:53 | 0.05 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 315.70 (n/a) | 218.10 (n/a) | 213.60 (n/a) | 170.80 (n/a) | 58.87 (n/a)

input_length_2048-num_aie_columns_4-tile_size_512

Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev)
dd00072 — 2026-04-24 00:42:53 | 0.11 (n/a) | 0.07 (n/a) | 0.07 (n/a) | 0.06 (n/a) | 0.02 (n/a) | 221.10 (n/a) | 171.80 (n/a) | 171.60 (n/a) | 113.70 (n/a) | 34.28 (n/a)

input_length_2048-num_aie_columns_4-tile_size_512-scalar_factor_3.0

Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev)
dd00072 — 2026-04-24 00:42:53 | 0.11 (n/a) | 0.07 (n/a) | 0.07 (n/a) | 0.05 (n/a) | 0.02 (n/a) | 235.60 (n/a) | 183.82 (n/a) | 181.40 (n/a) | 110.10 (n/a) | 48.28 (n/a)

input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256

Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev)
dd00072 — 2026-04-24 00:42:53 | 0.07 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 297.90 (n/a) | 191.75 (n/a) | 196.05 (n/a) | 123.90 (n/a) | 39.92 (n/a)

input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256-group_size_32

Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev)
dd00072 — 2026-04-24 00:42:53 | 0.04 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 281.80 (n/a) | 204.40 (n/a) | 189.10 (n/a) | 145.60 (n/a) | 51.48 (n/a)

input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256-weighted_False

Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev)
dd00072 — 2026-04-24 00:42:53 | 0.05 (n/a) | 0.04 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 231.10 (n/a) | 185.44 (n/a) | 179.20 (n/a) | 157.80 (n/a) | 27.34 (n/a)

input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256-weighted_True

Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev)
dd00072 — 2026-04-24 00:42:53 | 0.06 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 215.50 (n/a) | 174.76 (n/a) | 161.50 (n/a) | 150.40 (n/a) | 26.04 (n/a)

input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128

Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev)
dd00072 — 2026-04-24 00:42:53 | 0.06 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 355.60 (n/a) | 224.14 (n/a) | 219.10 (n/a) | 146.10 (n/a) | 54.62 (n/a)

input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128-group_size_32

Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev)
dd00072 — 2026-04-24 00:42:53 | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 307.60 (n/a) | 219.98 (n/a) | 213.90 (n/a) | 169.50 (n/a) | 55.72 (n/a)

input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128-weighted_False

Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev)
dd00072 — 2026-04-24 00:42:53 | 0.04 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.03 (n/a) | 0.00 (n/a) | 253.10 (n/a) | 228.12 (n/a) | 231.00 (n/a) | 186.60 (n/a) | 25.27 (n/a)

input_length_2048-num_aie_columns_8-tile_size_256

Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev)
dd00072 — 2026-04-24 00:42:53 | 0.07 (n/a) | 0.06 (n/a) | 0.06 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 325.60 (n/a) | 216.33 (n/a) | 200.15 (n/a) | 166.20 (n/a) | 52.72 (n/a)

input_length_2048-num_aie_columns_8-tile_size_256-scalar_factor_3.0

Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev)
dd00072 — 2026-04-24 00:42:53 | 0.09 (n/a) | 0.06 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.02 (n/a) | 312.50 (n/a) | 233.64 (n/a) | 242.60 (n/a) | 135.80 (n/a) | 67.38 (n/a)

input_length_2048-num_cores_1-num_channels_1-bypass_False-tile_size_2048

Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev)
dd00072 — 2026-04-24 00:42:53 | 0.06 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 209.90 (n/a) | 163.02 (n/a) | 159.60 (n/a) | 130.20 (n/a) | 32.34 (n/a)

input_length_2048-num_cores_16-num_channels_2-bypass_False-tile_size_128

Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev)
dd00072 — 2026-04-24 00:42:53 | 0.07 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 269.20 (n/a) | 197.94 (n/a) | 204.50 (n/a) | 122.20 (n/a) | 54.36 (n/a)

input_length_2048-num_cores_2-num_channels_1-bypass_False-tile_size_1024

Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev)
dd00072 — 2026-04-24 00:42:53 | 0.08 (n/a) | 0.06 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.02 (n/a) | 207.30 (n/a) | 157.46 (n/a) | 160.80 (n/a) | 99.00 (n/a) | 47.60 (n/a)

input_length_2048-num_cores_2-num_channels_2-bypass_False-tile_size_1024

Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev)
dd00072 — 2026-04-24 00:42:53 | 0.06 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 194.40 (n/a) | 163.88 (n/a) | 164.60 (n/a) | 138.00 (n/a) | 23.24 (n/a)

input_length_2048-num_cores_4-num_channels_1-bypass_False-tile_size_512

Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev)
dd00072 — 2026-04-24 00:42:53 | 0.07 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 368.40 (n/a) | 203.76 (n/a) | 185.40 (n/a) | 120.20 (n/a) | 96.73 (n/a)

input_length_2048-num_cores_4-num_channels_2-bypass_False-tile_size_512

Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev)
dd00072 — 2026-04-24 00:42:53 | 0.05 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 234.80 (n/a) | 185.02 (n/a) | 169.70 (n/a) | 150.80 (n/a) | 33.40 (n/a)

input_length_2048-num_cores_8-num_channels_1-bypass_False-tile_size_256

Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev)
dd00072 — 2026-04-24 00:42:53 | 0.06 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 201.80 (n/a) | 174.12 (n/a) | 188.90 (n/a) | 132.30 (n/a) | 28.40 (n/a)

input_length_2048-num_cores_8-num_channels_2-bypass_False-tile_size_256

Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev)
dd00072 — 2026-04-24 00:42:53 | 0.06 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 214.40 (n/a) | 182.54 (n/a) | 180.00 (n/a) | 130.70 (n/a) | 32.97 (n/a)

input_length_32768-num_aie_columns_2-num_channels_2-tile_size_1024

Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev)
dd00072 — 2026-04-24 00:42:53 | 0.80 (n/a) | 0.74 (n/a) | 0.72 (n/a) | 0.71 (n/a) | 0.04 (n/a) | 185.00 (n/a) | 178.56 (n/a) | 180.80 (n/a) | 164.30 (n/a) | 8.17 (n/a)

input_length_32768-num_aie_columns_2-num_channels_2-tile_size_2048

Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev)
dd00072 — 2026-04-24 00:42:53 | 0.90 (n/a) | 0.67 (n/a) | 0.65 (n/a) | 0.57 (n/a) | 0.14 (n/a) | 231.30 (n/a) | 200.12 (n/a) | 201.60 (n/a) | 144.80 (n/a) | 35.36 (n/a)

input_length_32768-num_aie_columns_2-num_channels_2-tile_size_512

Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev)
dd00072 — 2026-04-24 00:42:53 | 1.00 (n/a) | 0.72 (n/a) | 0.67 (n/a) | 0.52 (n/a) | 0.19 (n/a) | 249.80 (n/a) | 192.74 (n/a) | 196.60 (n/a) | 130.50 (n/a) | 46.88 (n/a)

rows_32-cols_512-angle_rows_32-aie_columns_1-method_type_0

Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev)
dd00072 — 2026-04-24 00:42:53 | 0.67 (n/a) | 0.55 (n/a) | 0.53 (n/a) | 0.42 (n/a) | 0.11 (n/a) | 236.20 (n/a) | 184.02 (n/a) | 184.80 (n/a) | 146.70 (n/a) | 37.08 (n/a)

rows_32-cols_512-angle_rows_32-aie_columns_2-method_type_0

Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev)
dd00072 — 2026-04-24 00:42:53 | 0.62 (n/a) | 0.48 (n/a) | 0.49 (n/a) | 0.34 (n/a) | 0.11 (n/a) | 288.90 (n/a) | 213.02 (n/a) | 198.80 (n/a) | 159.70 (n/a) | 52.40 (n/a)

rows_32-cols_512-angle_rows_32-aie_columns_4-method_type_0

Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev)
dd00072 — 2026-04-24 00:42:53 | 0.59 (n/a) | 0.49 (n/a) | 0.45 (n/a) | 0.41 (n/a) | 0.07 (n/a) | 237.60 (n/a) | 205.00 (n/a) | 216.30 (n/a) | 167.40 (n/a) | 29.91 (n/a)

rows_32-cols_512-angle_rows_32-aie_columns_8-method_type_0

Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev)
dd00072 — 2026-04-24 00:42:53 | 0.59 (n/a) | 0.45 (n/a) | 0.43 (n/a) | 0.29 (n/a) | 0.11 (n/a) | 337.70 (n/a) | 230.80 (n/a) | 226.90 (n/a) | 165.40 (n/a) | 65.64 (n/a)

rows_32-cols_512-angle_rows_8-aie_columns_1-method_type_0

Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev)
dd00072 — 2026-04-24 00:42:53 | 0.48 (n/a) | 0.43 (n/a) | 0.43 (n/a) | 0.37 (n/a) | 0.05 (n/a) | 198.10 (n/a) | 174.04 (n/a) | 172.80 (n/a) | 152.60 (n/a) | 20.22 (n/a)

rows_32-cols_512-angle_rows_8-aie_columns_2-method_type_0

Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev)
dd00072 — 2026-04-24 00:42:53 | 0.41 (n/a) | 0.36 (n/a) | 0.35 (n/a) | 0.31 (n/a) | 0.04 (n/a) | 241.00 (n/a) | 207.48 (n/a) | 209.40 (n/a) | 179.30 (n/a) | 22.64 (n/a)

rows_32-cols_512-angle_rows_8-aie_columns_4-method_type_0

Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev)
dd00072 — 2026-04-24 00:42:53 | 0.51 (n/a) | 0.42 (n/a) | 0.40 (n/a) | 0.37 (n/a) | 0.05 (n/a) | 199.10 (n/a) | 179.66 (n/a) | 184.00 (n/a) | 145.70 (n/a) | 20.77 (n/a)

rows_32-cols_512-angle_rows_8-aie_columns_8-method_type_0

Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev)
dd00072 — 2026-04-24 00:42:53 | 0.58 (n/a) | 0.39 (n/a) | 0.39 (n/a) | 0.26 (n/a) | 0.12 (n/a) | 278.30 (n/a) | 204.96 (n/a) | 189.10 (n/a) | 127.70 (n/a) | 58.44 (n/a)

seq_len_16384-dim_64-num_heads_1-num_pipelines_8-num_kv_heads_0

Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev)
dd00072 — 2026-04-24 00:42:53 | 0.21 (n/a) | 0.21 (n/a) | 0.21 (n/a) | 0.21 (n/a) | 0.00 (n/a) | 40901.50 (n/a) | 40692.98 (n/a) | 40652.90 (n/a) | 40581.80 (n/a) | 123.53 (n/a)

seq_len_256-embedding_dim_2048-hidden_dim_2048-prio_accuracy_False

Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev)
dd00072 — 2026-04-24 00:42:53 | 0.28 (n/a) | 0.22 (n/a) | 0.24 (n/a) | 0.17 (n/a) | 0.05 (n/a) | 12683.19 (n/a) | 9867.74 (n/a) | 8644.74 (n/a) | 7547.37 (n/a) | 2270.41 (n/a)
Krackan - Examples

IRONCLAD

Tested on 2026_04_24_00_39_04 at commit dd00072.

Test Checks TTFT (mean) TPS (mean)
llama_3.2_1b_prompt_1024_tokens_1 ✅ 5/5 2.12 n/a
llama_3.2_1b_prompt_1024_tokens_40 ✅ 5/5 2.16 4.17
llama_3.2_1b_prompt_13_tokens_1 ✅ 5/5 2.08 n/a
llama_3.2_1b_prompt_13_tokens_40 ✅ 5/5 2.09 4.18

Trends:

IRONCLAD Trends

llama_3.2_1b_prompt_1024_tokens_1

Commit/Date | TTFT (max) | TTFT (mean) | TTFT (median) | TTFT (min) | TTFT (stddev)
dd00072 — 2026-04-24 00:33:18 | 2.14 (n/a) | 2.12 (n/a) | 2.12 (n/a) | 2.11 (n/a) | 0.01 (n/a)

llama_3.2_1b_prompt_1024_tokens_40

Commit/Date | TPS (max) | TPS (mean) | TPS (median) | TPS (min) | TPS (stddev) | TTFT (max) | TTFT (mean) | TTFT (median) | TTFT (min) | TTFT (stddev)
dd00072 — 2026-04-24 00:33:18 | 4.22 (n/a) | 4.17 (n/a) | 4.17 (n/a) | 4.15 (n/a) | 0.03 (n/a) | 2.29 (n/a) | 2.16 (n/a) | 2.13 (n/a) | 2.12 (n/a) | 0.07 (n/a)

llama_3.2_1b_prompt_13_tokens_1

Commit/Date | TTFT (max) | TTFT (mean) | TTFT (median) | TTFT (min) | TTFT (stddev)
dd00072 — 2026-04-24 00:33:18 | 2.09 (n/a) | 2.08 (n/a) | 2.08 (n/a) | 2.06 (n/a) | 0.01 (n/a)

llama_3.2_1b_prompt_13_tokens_40

Commit/Date | TPS (max) | TPS (mean) | TPS (median) | TPS (min) | TPS (stddev) | TTFT (max) | TTFT (mean) | TTFT (median) | TTFT (min) | TTFT (stddev)
dd00072 — 2026-04-24 00:33:18 | 4.21 (n/a) | 4.18 (n/a) | 4.18 (n/a) | 4.16 (n/a) | 0.02 (n/a) | 2.10 (n/a) | 2.09 (n/a) | 2.09 (n/a) | 2.07 (n/a) | 0.01 (n/a)
Phoenix - Small

IRONCLAD

Tested on 2026_04_24_00_40_35 at commit dd00072.

Test Checks Latency (mean) Bandwidth (mean) Throughput (mean)
M_128-K_128-num_aie_columns_1-tile_size_input_32-tile_size_output_128 ✅ 5/5 n/a 0.09 0.08
M_192-K_384-N_64-num_aie_columns_4-b_col_maj_False-c_col_maj_False-m_48-k_96-n_16-trace_size_0-partition_N_1 ✅ 5/5 672.86 0.39 16.64
M_192-K_384-N_64-num_aie_columns_4-b_col_maj_True-c_col_maj_True-m_48-k_96-n_16-trace_size_0-partition_N_1 ✅ 5/5 486.98 0.48 20.67
M_2048-K_2048-N_2048-num_aie_columns_1-b_col_maj_False-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1 ✅ 5/5 82838.26 0.30 207.46
M_2048-K_2048-N_2048-num_aie_columns_2-b_col_maj_True-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1 ✅ 5/5 24218.12 1.04 709.88
M_2048-K_8192-num_aie_columns_1-tile_size_input_1-tile_size_output_2048 ✅ 5/5 n/a 3.65 3.65
M_2048-K_8192-num_aie_columns_2-tile_size_input_1-tile_size_output_1024 ✅ 5/5 n/a 6.86 6.86
M_2048-K_8192-num_aie_columns_4-tile_size_input_1-tile_size_output_512 ✅ 5/5 n/a 11.60 11.59
M_2048-N_64-aie_columns_1-channels_1-m_64-n_64-s_8 ✅ 5/5 532.60 1.17 n/a
M_2048-N_64-aie_columns_1-channels_2-m_64-n_64-s_8 ✅ 5/5 478.36 1.17 n/a
M_384-K_1536-N_1792-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_32-k_48-n_64-trace_size_0-partition_N_1 ✅ 5/5 3645.06 2.33 610.70
M_64-K_512-N_256-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_16-k_64-n_64-trace_size_0-partition_N_4 ✅ 5/5 6172.06 0.20 10.90
M_8192-K_2048-num_aie_columns_1-tile_size_input_4-tile_size_output_1024 ✅ 5/5 n/a 3.88 3.87
M_8192-K_2048-num_aie_columns_2-tile_size_input_4-tile_size_output_1024 ✅ 5/5 n/a 6.99 6.99
M_8192-K_2048-num_aie_columns_4-tile_size_input_4-tile_size_output_1024 ✅ 5/5 n/a 12.73 12.73
embedding_dim_1024-hidden_dim_3584 ✅ 5/5 11971.42 0.00 n/a
embedding_dim_2048-hidden_dim_2048 ✅ 5/5 15559.76 0.00 n/a
input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048 ✅ 30/30 477.19 0.02 n/a
input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-group_size_32 ✅ 5/5 376.54 0.02 n/a
input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_False ✅ 5/5 356.32 0.03 n/a
input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_True ✅ 5/5 300.92 0.04 n/a
input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024 ✅ 25/25 537.48 0.02 n/a
input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-group_size_32 ✅ 5/5 394.90 0.01 n/a
input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_False ✅ 5/5 379.34 0.02 n/a
input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_True ✅ 5/5 379.64 0.03 n/a
input_length_2048-num_aie_columns_1-tile_size_2048 ✅ 10/10 431.15 0.03 n/a
input_length_2048-num_aie_columns_1-tile_size_2048-scalar_factor_3.0 ✅ 5/5 457.74 0.03 n/a
input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024 ✅ 30/30 391.27 0.02 n/a
input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-group_size_32 ✅ 5/5 326.80 0.02 n/a
input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_False ✅ 5/5 571.76 0.02 n/a
input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_True ✅ 5/5 350.08 0.03 n/a
input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512 ✅ 25/25 413.40 0.02 n/a
input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-group_size_32 ✅ 5/5 370.62 0.01 n/a
input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_False ✅ 5/5 390.82 0.02 n/a
input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_True ✅ 5/5 727.04 0.02 n/a
input_length_2048-num_aie_columns_2-tile_size_1024 ✅ 10/10 427.69 0.03 n/a
input_length_2048-num_aie_columns_2-tile_size_1024-scalar_factor_3.0 ✅ 5/5 407.24 0.03 n/a
input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512 ✅ 30/30 502.39 0.02 n/a
input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-group_size_32 ✅ 5/5 365.44 0.02 n/a
input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_False ✅ 5/5 320.70 0.03 n/a
input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_True ✅ 5/5 394.72 0.03 n/a
input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256 ✅ 25/25 456.55 0.02 n/a
input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-group_size_32 ✅ 5/5 393.94 0.01 n/a
input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-weighted_False ✅ 5/5 356.22 0.03 n/a
input_length_2048-num_aie_columns_4-tile_size_512 ✅ 10/10 599.65 0.03 n/a
input_length_2048-num_aie_columns_4-tile_size_512-scalar_factor_3.0 ✅ 5/5 439.74 0.03 n/a
input_length_2048-num_cores_1-num_channels_1-bypass_False-tile_size_2048 ✅ 5/5 376.74 0.02 n/a
input_length_2048-num_cores_2-num_channels_1-bypass_False-tile_size_1024 ✅ 5/5 359.84 0.03 n/a
input_length_2048-num_cores_2-num_channels_2-bypass_False-tile_size_1024 ✅ 5/5 393.24 0.02 n/a
input_length_2048-num_cores_4-num_channels_1-bypass_False-tile_size_512 ✅ 5/5 461.30 0.02 n/a
input_length_2048-num_cores_4-num_channels_2-bypass_False-tile_size_512 ✅ 5/5 347.42 0.03 n/a
input_length_2048-num_cores_8-num_channels_2-bypass_False-tile_size_256 ✅ 5/5 468.66 0.02 n/a
input_length_32768-num_aie_columns_2-num_channels_2-tile_size_1024 ✅ 5/5 445.12 0.31 n/a
input_length_32768-num_aie_columns_2-num_channels_2-tile_size_2048 ✅ 5/5 473.60 0.29 n/a
input_length_32768-num_aie_columns_2-num_channels_2-tile_size_512 ✅ 5/5 459.60 0.31 n/a
rows_32-cols_512-angle_rows_32-aie_columns_1-method_type_0 ✅ 5/5 820.10 0.16 n/a
rows_32-cols_512-angle_rows_32-aie_columns_2-method_type_0 ✅ 5/5 415.12 0.27 n/a
rows_32-cols_512-angle_rows_32-aie_columns_4-method_type_0 ✅ 5/5 636.78 0.21 n/a
rows_32-cols_512-angle_rows_8-aie_columns_1-method_type_0 ✅ 5/5 709.78 0.16 n/a
rows_32-cols_512-angle_rows_8-aie_columns_2-method_type_0 ✅ 5/5 411.64 0.20 n/a
rows_32-cols_512-angle_rows_8-aie_columns_4-method_type_0 ✅ 5/5 441.82 0.19 n/a
seq_len_256-embedding_dim_2048-hidden_dim_2048-prio_accuracy_False ✅ 5/5 18932.35 0.12 n/a

Trends:

IRONCLAD Trends

M_128-K_128-num_aie_columns_1-tile_size_input_32-tile_size_output_128

Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev)
dd00072 — 2026-04-24 00:37:38 | 0.14 (n/a) | 0.09 (n/a) | 0.07 (n/a) | 0.02 (n/a) | 0.05 (n/a) | 0.14 (n/a) | 0.08 (n/a) | 0.07 (n/a) | 0.02 (n/a) | 0.05 (n/a)

M_192-K_384-N_64-num_aie_columns_4-b_col_maj_False-c_col_maj_False-m_48-k_96-n_16-trace_size_0-partition_N_1

Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev)
dd00072 — 2026-04-24 00:37:38 | 0.55 (n/a) | 0.39 (n/a) | 0.43 (n/a) | 0.17 (n/a) | 0.14 (n/a) | 1326.10 (n/a) | 672.86 (n/a) | 514.10 (n/a) | 399.80 (n/a) | 374.62 (n/a) | 23.60 (n/a) | 16.64 (n/a) | 18.36 (n/a) | 7.12 (n/a) | 6.17 (n/a)
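
The Throughput figures in the row above appear consistent with a matrix-multiply FLOP count of 2·M·K·N divided by the measured latency. The sketch below checks that reading against the max and min columns. It rests on assumptions the report does not state: that the test is an M×K by K×N matrix multiply, that latency is in microseconds, and that throughput is in GFLOP/s. `gemm_gflops` is a hypothetical helper, not part of the CI tooling.

```python
def gemm_gflops(m: int, k: int, n: int, latency_us: float) -> float:
    """GFLOP/s for an (M x K) @ (K x N) multiply, counting 2*M*K*N FLOPs.

    Assumes latency is given in microseconds (not stated in the report).
    """
    flops = 2 * m * k * n
    seconds = latency_us * 1e-6
    return flops / seconds / 1e9


# Shape taken from the test name M_192-K_384-N_64; latencies from the row above.
fastest = gemm_gflops(192, 384, 64, 399.80)   # Latency (min)
slowest = gemm_gflops(192, 384, 64, 1326.10)  # Latency (max)
print(f"{fastest:.2f} {slowest:.2f}")  # prints "23.60 7.12"
```

Under these assumptions the two values match Throughput (max) and Throughput (min). The reported Throughput (mean) of 16.64 does not equal the throughput at Latency (mean), which is expected if the mean is taken over per-run throughputs rather than computed from the mean latency.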

M_192-K_384-N_64-num_aie_columns_4-b_col_maj_True-c_col_maj_True-m_48-k_96-n_16-trace_size_0-partition_N_1

Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev)
dd00072 — 2026-04-24 00:37:38 | 0.66 (n/a) | 0.48 (n/a) | 0.44 (n/a) | 0.34 (n/a) | 0.14 (n/a) | 652.10 (n/a) | 486.98 (n/a) | 507.50 (n/a) | 333.70 (n/a) | 133.96 (n/a) | 28.28 (n/a) | 20.67 (n/a) | 18.60 (n/a) | 14.47 (n/a) | 5.93 (n/a)

M_2048-K_2048-N_2048-num_aie_columns_1-b_col_maj_False-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1

Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev)
dd00072 — 2026-04-24 00:37:38 | 0.31 (n/a) | 0.30 (n/a) | 0.30 (n/a) | 0.30 (n/a) | 0.01 (n/a) | 84660.20 (n/a) | 82838.26 (n/a) | 83578.90 (n/a) | 80731.30 (n/a) | 1702.51 (n/a) | 212.80 (n/a) | 207.46 (n/a) | 205.55 (n/a) | 202.93 (n/a) | 4.29 (n/a)

M_2048-K_2048-N_2048-num_aie_columns_2-b_col_maj_True-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1

Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev)
dd00072 — 2026-04-24 00:37:38 | 1.07 (n/a) | 1.04 (n/a) | 1.05 (n/a) | 0.99 (n/a) | 0.03 (n/a) | 25443.70 (n/a) | 24218.12 (n/a) | 24021.30 (n/a) | 23522.90 (n/a) | 728.15 (n/a) | 730.35 (n/a) | 709.88 (n/a) | 715.19 (n/a) | 675.21 (n/a) | 20.76 (n/a)

M_2048-K_8192-num_aie_columns_1-tile_size_input_1-tile_size_output_2048

Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev)
dd00072 — 2026-04-24 00:37:38 | 3.94 (n/a) | 3.65 (n/a) | 3.71 (n/a) | 3.16 (n/a) | 0.30 (n/a) | 3.94 (n/a) | 3.65 (n/a) | 3.71 (n/a) | 3.16 (n/a) | 0.30 (n/a)

M_2048-K_8192-num_aie_columns_2-tile_size_input_1-tile_size_output_1024

Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev)
dd00072 — 2026-04-24 00:37:38 | 7.61 (n/a) | 6.86 (n/a) | 6.96 (n/a) | 5.54 (n/a) | 0.83 (n/a) | 7.61 (n/a) | 6.86 (n/a) | 6.96 (n/a) | 5.53 (n/a) | 0.83 (n/a)

M_2048-K_8192-num_aie_columns_4-tile_size_input_1-tile_size_output_512

Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev)
dd00072 — 2026-04-24 00:37:38 | 14.01 (n/a) | 11.60 (n/a) | 13.73 (n/a) | 8.25 (n/a) | 3.06 (n/a) | 14.00 (n/a) | 11.59 (n/a) | 13.72 (n/a) | 8.25 (n/a) | 3.05 (n/a)

M_2048-N_64-aie_columns_1-channels_1-m_64-n_64-s_8

Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev)
dd00072 — 2026-04-24 00:37:38 | 1.78 (n/a) | 1.17 (n/a) | 0.97 (n/a) | 0.58 (n/a) | 0.53 (n/a) | 905.50 (n/a) | 532.60 (n/a) | 538.60 (n/a) | 293.90 (n/a) | 250.33 (n/a)

M_2048-N_64-aie_columns_1-channels_2-m_64-n_64-s_8

Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev)
dd00072 — 2026-04-24 00:37:38 | 1.60 (n/a) | 1.17 (n/a) | 1.18 (n/a) | 0.82 (n/a) | 0.34 (n/a) | 635.90 (n/a) | 478.36 (n/a) | 445.60 (n/a) | 327.20 (n/a) | 139.87 (n/a)

M_384-K_1536-N_1792-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_32-k_48-n_64-trace_size_0-partition_N_1

Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev)
dd00072 — 2026-04-24 00:37:38 | 3.54 (n/a) | 2.33 (n/a) | 2.04 (n/a) | 1.92 (n/a) | 0.68 (n/a) | 4200.00 (n/a) | 3645.06 (n/a) | 3958.60 (n/a) | 2275.30 (n/a) | 784.15 (n/a) | 929.07 (n/a) | 610.70 (n/a) | 534.00 (n/a) | 503.31 (n/a) | 179.44 (n/a)

M_64-K_512-N_256-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_16-k_64-n_64-trace_size_0-partition_N_4

Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev)
dd00072 — 2026-04-24 00:37:38 | 0.21 (n/a) | 0.20 (n/a) | 0.21 (n/a) | 0.18 (n/a) | 0.01 (n/a) | 6802.90 (n/a) | 6172.06 (n/a) | 6057.60 (n/a) | 5806.30 (n/a) | 377.90 (n/a) | 11.56 (n/a) | 10.90 (n/a) | 11.08 (n/a) | 9.86 (n/a) | 0.63 (n/a)

M_8192-K_2048-num_aie_columns_1-tile_size_input_4-tile_size_output_1024

Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev)
dd00072 — 2026-04-24 00:37:38 | 3.96 (n/a) | 3.88 (n/a) | 3.90 (n/a) | 3.78 (n/a) | 0.07 (n/a) | 3.96 (n/a) | 3.87 (n/a) | 3.90 (n/a) | 3.78 (n/a) | 0.07 (n/a)

M_8192-K_2048-num_aie_columns_2-tile_size_input_4-tile_size_output_1024

Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev)
dd00072 — 2026-04-24 00:37:38 | 7.63 (n/a) | 6.99 (n/a) | 7.38 (n/a) | 5.23 (n/a) | 0.99 (n/a) | 7.62 (n/a) | 6.99 (n/a) | 7.37 (n/a) | 5.23 (n/a) | 0.99 (n/a)

M_8192-K_2048-num_aie_columns_4-tile_size_input_4-tile_size_output_1024

Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev)
dd00072 — 2026-04-24 00:37:38 | 14.14 (n/a) | 12.73 (n/a) | 13.85 (n/a) | 9.13 (n/a) | 2.11 (n/a) | 14.13 (n/a) | 12.73 (n/a) | 13.84 (n/a) | 9.12 (n/a) | 2.11 (n/a)

embedding_dim_1024-hidden_dim_3584

Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev)
dd00072 — 2026-04-24 00:37:38 | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 18223.72 (n/a) | 11971.42 (n/a) | 12995.72 (n/a) | 5381.90 (n/a) | 5221.89 (n/a)

embedding_dim_2048-hidden_dim_2048

Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev)
dd00072 — 2026-04-24 00:37:38 | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 21279.77 (n/a) | 15559.76 (n/a) | 15929.40 (n/a) | 5947.35 (n/a) | 5849.09 (n/a)

input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048

Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev)
dd00072 — 2026-04-24 00:37:38 | 0.04 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.00 (n/a) | 0.01 (n/a) | 2071.10 (n/a) | 477.19 (n/a) | 329.15 (n/a) | 222.20 (n/a) | 429.91 (n/a)

input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-group_size_32

Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev)
dd00072 — 2026-04-24 00:37:38 | 0.02 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 625.00 (n/a) | 376.54 (n/a) | 289.60 (n/a) | 235.70 (n/a) | 161.97 (n/a)

input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_False

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| dd00072 — 2026-04-24 00:37:38 | 0.03 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 557.50 (n/a) | 356.32 (n/a) | 350.00 (n/a) | 235.50 (n/a) | 129.36 (n/a) |

input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_True

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| dd00072 — 2026-04-24 00:37:38 | 0.05 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 399.60 (n/a) | 300.92 (n/a) | 280.70 (n/a) | 246.50 (n/a) | 61.99 (n/a) |

input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| dd00072 — 2026-04-24 00:37:38 | 0.03 (n/a) | 0.02 (n/a) | 0.03 (n/a) | 0.00 (n/a) | 0.01 (n/a) | 2463.60 (n/a) | 537.48 (n/a) | 306.30 (n/a) | 238.50 (n/a) | 540.68 (n/a) |

input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-group_size_32

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| dd00072 — 2026-04-24 00:37:38 | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 523.50 (n/a) | 394.90 (n/a) | 436.00 (n/a) | 228.90 (n/a) | 135.14 (n/a) |

input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_False

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| dd00072 — 2026-04-24 00:37:38 | 0.03 (n/a) | 0.02 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 616.90 (n/a) | 379.34 (n/a) | 293.80 (n/a) | 237.40 (n/a) | 169.26 (n/a) |

input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_True

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| dd00072 — 2026-04-24 00:37:38 | 0.04 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 540.10 (n/a) | 379.64 (n/a) | 343.80 (n/a) | 244.50 (n/a) | 131.36 (n/a) |

input_length_2048-num_aie_columns_1-tile_size_2048

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| dd00072 — 2026-04-24 00:37:38 | 0.06 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 738.40 (n/a) | 431.15 (n/a) | 354.35 (n/a) | 217.60 (n/a) | 186.12 (n/a) |

input_length_2048-num_aie_columns_1-tile_size_2048-scalar_factor_3.0

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| dd00072 — 2026-04-24 00:37:38 | 0.04 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 543.60 (n/a) | 457.74 (n/a) | 481.00 (n/a) | 303.10 (n/a) | 100.87 (n/a) |

input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| dd00072 — 2026-04-24 00:37:38 | 0.04 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 607.00 (n/a) | 391.27 (n/a) | 336.90 (n/a) | 198.70 (n/a) | 145.38 (n/a) |

input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-group_size_32

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| dd00072 — 2026-04-24 00:37:38 | 0.02 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.00 (n/a) | 513.60 (n/a) | 326.80 (n/a) | 284.50 (n/a) | 274.20 (n/a) | 104.52 (n/a) |

input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_False

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| dd00072 — 2026-04-24 00:37:38 | 0.03 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 1036.00 (n/a) | 571.76 (n/a) | 575.50 (n/a) | 315.80 (n/a) | 291.78 (n/a) |

input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_True

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| dd00072 — 2026-04-24 00:37:38 | 0.05 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 517.50 (n/a) | 350.08 (n/a) | 352.10 (n/a) | 215.60 (n/a) | 133.33 (n/a) |

input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| dd00072 — 2026-04-24 00:37:38 | 0.05 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 999.50 (n/a) | 413.40 (n/a) | 436.10 (n/a) | 161.40 (n/a) | 182.62 (n/a) |

input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-group_size_32

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| dd00072 — 2026-04-24 00:37:38 | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 0.00 (n/a) | 438.60 (n/a) | 370.62 (n/a) | 384.50 (n/a) | 291.80 (n/a) | 62.95 (n/a) |

input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_False

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| dd00072 — 2026-04-24 00:37:38 | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 533.80 (n/a) | 390.82 (n/a) | 421.00 (n/a) | 234.40 (n/a) | 138.33 (n/a) |

input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_True

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| dd00072 — 2026-04-24 00:37:38 | 0.03 (n/a) | 0.02 (n/a) | 0.03 (n/a) | 0.00 (n/a) | 0.01 (n/a) | 2171.40 (n/a) | 727.04 (n/a) | 320.80 (n/a) | 272.10 (n/a) | 815.41 (n/a) |

input_length_2048-num_aie_columns_2-tile_size_1024

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| dd00072 — 2026-04-24 00:37:38 | 0.05 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 615.70 (n/a) | 427.69 (n/a) | 490.40 (n/a) | 230.60 (n/a) | 152.02 (n/a) |

input_length_2048-num_aie_columns_2-tile_size_1024-scalar_factor_3.0

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| dd00072 — 2026-04-24 00:37:38 | 0.05 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 546.00 (n/a) | 407.24 (n/a) | 491.00 (n/a) | 232.80 (n/a) | 147.19 (n/a) |

input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| dd00072 — 2026-04-24 00:37:38 | 0.04 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.00 (n/a) | 0.01 (n/a) | 2466.70 (n/a) | 502.39 (n/a) | 455.30 (n/a) | 230.50 (n/a) | 401.64 (n/a) |

input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-group_size_32

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| dd00072 — 2026-04-24 00:37:38 | 0.02 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 538.50 (n/a) | 365.44 (n/a) | 279.50 (n/a) | 246.80 (n/a) | 143.32 (n/a) |

input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_False

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| dd00072 — 2026-04-24 00:37:38 | 0.04 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 433.00 (n/a) | 320.70 (n/a) | 305.50 (n/a) | 216.00 (n/a) | 96.25 (n/a) |

input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_True

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| dd00072 — 2026-04-24 00:37:38 | 0.04 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 714.60 (n/a) | 394.72 (n/a) | 312.70 (n/a) | 239.90 (n/a) | 193.87 (n/a) |

input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| dd00072 — 2026-04-24 00:37:38 | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 658.10 (n/a) | 456.55 (n/a) | 452.50 (n/a) | 249.30 (n/a) | 127.89 (n/a) |

input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-group_size_32

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| dd00072 — 2026-04-24 00:37:38 | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 0.00 (n/a) | 520.30 (n/a) | 393.94 (n/a) | 366.00 (n/a) | 303.60 (n/a) | 99.67 (n/a) |

input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-weighted_False

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| dd00072 — 2026-04-24 00:37:38 | 0.03 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 507.90 (n/a) | 356.22 (n/a) | 331.30 (n/a) | 235.60 (n/a) | 118.44 (n/a) |

input_length_2048-num_aie_columns_4-tile_size_512

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| dd00072 — 2026-04-24 00:37:38 | 0.05 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 1377.90 (n/a) | 599.65 (n/a) | 484.65 (n/a) | 238.20 (n/a) | 339.41 (n/a) |

input_length_2048-num_aie_columns_4-tile_size_512-scalar_factor_3.0

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| dd00072 — 2026-04-24 00:37:38 | 0.05 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 566.70 (n/a) | 439.74 (n/a) | 546.10 (n/a) | 262.70 (n/a) | 156.92 (n/a) |

input_length_2048-num_cores_1-num_channels_1-bypass_False-tile_size_2048

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| dd00072 — 2026-04-24 00:37:38 | 0.04 (n/a) | 0.02 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 602.20 (n/a) | 376.74 (n/a) | 299.80 (n/a) | 233.90 (n/a) | 149.87 (n/a) |

input_length_2048-num_cores_2-num_channels_1-bypass_False-tile_size_1024

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| dd00072 — 2026-04-24 00:37:38 | 0.04 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 578.50 (n/a) | 359.84 (n/a) | 307.70 (n/a) | 192.80 (n/a) | 165.71 (n/a) |

input_length_2048-num_cores_2-num_channels_2-bypass_False-tile_size_1024

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| dd00072 — 2026-04-24 00:37:38 | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 531.80 (n/a) | 393.24 (n/a) | 426.70 (n/a) | 269.50 (n/a) | 109.77 (n/a) |

input_length_2048-num_cores_4-num_channels_1-bypass_False-tile_size_512

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| dd00072 — 2026-04-24 00:37:38 | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 845.20 (n/a) | 461.30 (n/a) | 454.90 (n/a) | 253.00 (n/a) | 238.98 (n/a) |

input_length_2048-num_cores_4-num_channels_2-bypass_False-tile_size_512

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| dd00072 — 2026-04-24 00:37:38 | 0.04 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 553.60 (n/a) | 347.42 (n/a) | 302.20 (n/a) | 229.10 (n/a) | 126.15 (n/a) |

input_length_2048-num_cores_8-num_channels_2-bypass_False-tile_size_256

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| dd00072 — 2026-04-24 00:37:38 | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.00 (n/a) | 600.80 (n/a) | 468.66 (n/a) | 468.60 (n/a) | 326.00 (n/a) | 114.42 (n/a) |

input_length_32768-num_aie_columns_2-num_channels_2-tile_size_1024

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| dd00072 — 2026-04-24 00:37:38 | 0.48 (n/a) | 0.31 (n/a) | 0.28 (n/a) | 0.25 (n/a) | 0.10 (n/a) | 520.70 (n/a) | 445.12 (n/a) | 472.10 (n/a) | 271.20 (n/a) | 99.25 (n/a) |

input_length_32768-num_aie_columns_2-num_channels_2-tile_size_2048

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| dd00072 — 2026-04-24 00:37:38 | 0.39 (n/a) | 0.29 (n/a) | 0.29 (n/a) | 0.19 (n/a) | 0.07 (n/a) | 673.60 (n/a) | 473.60 (n/a) | 453.30 (n/a) | 338.30 (n/a) | 122.21 (n/a) |

input_length_32768-num_aie_columns_2-num_channels_2-tile_size_512

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| dd00072 — 2026-04-24 00:37:38 | 0.42 (n/a) | 0.31 (n/a) | 0.31 (n/a) | 0.20 (n/a) | 0.10 (n/a) | 667.90 (n/a) | 459.60 (n/a) | 423.70 (n/a) | 310.50 (n/a) | 151.60 (n/a) |

rows_32-cols_512-angle_rows_32-aie_columns_1-method_type_0

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| dd00072 — 2026-04-24 00:37:38 | 0.28 (n/a) | 0.16 (n/a) | 0.15 (n/a) | 0.05 (n/a) | 0.08 (n/a) | 1889.40 (n/a) | 820.10 (n/a) | 646.10 (n/a) | 353.30 (n/a) | 612.37 (n/a) |

rows_32-cols_512-angle_rows_32-aie_columns_2-method_type_0

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| dd00072 — 2026-04-24 00:37:38 | 0.43 (n/a) | 0.27 (n/a) | 0.21 (n/a) | 0.17 (n/a) | 0.11 (n/a) | 570.90 (n/a) | 415.12 (n/a) | 459.30 (n/a) | 229.10 (n/a) | 144.70 (n/a) |

rows_32-cols_512-angle_rows_32-aie_columns_4-method_type_0

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| dd00072 — 2026-04-24 00:37:38 | 0.40 (n/a) | 0.21 (n/a) | 0.17 (n/a) | 0.07 (n/a) | 0.13 (n/a) | 1367.90 (n/a) | 636.78 (n/a) | 594.90 (n/a) | 246.60 (n/a) | 437.10 (n/a) |

rows_32-cols_512-angle_rows_8-aie_columns_1-method_type_0

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| dd00072 — 2026-04-24 00:37:38 | 0.25 (n/a) | 0.16 (n/a) | 0.15 (n/a) | 0.04 (n/a) | 0.08 (n/a) | 1937.60 (n/a) | 709.78 (n/a) | 483.80 (n/a) | 300.40 (n/a) | 692.72 (n/a) |

rows_32-cols_512-angle_rows_8-aie_columns_2-method_type_0

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| dd00072 — 2026-04-24 00:37:38 | 0.30 (n/a) | 0.20 (n/a) | 0.17 (n/a) | 0.12 (n/a) | 0.07 (n/a) | 614.60 (n/a) | 411.64 (n/a) | 443.10 (n/a) | 249.50 (n/a) | 145.18 (n/a) |

rows_32-cols_512-angle_rows_8-aie_columns_4-method_type_0

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| dd00072 — 2026-04-24 00:37:38 | 0.35 (n/a) | 0.19 (n/a) | 0.15 (n/a) | 0.13 (n/a) | 0.09 (n/a) | 551.50 (n/a) | 441.82 (n/a) | 493.90 (n/a) | 212.50 (n/a) | 133.63 (n/a) |

seq_len_256-embedding_dim_2048-hidden_dim_2048-prio_accuracy_False

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| dd00072 — 2026-04-24 00:37:38 | 0.15 (n/a) | 0.12 (n/a) | 0.14 (n/a) | 0.08 (n/a) | 0.04 (n/a) | 26600.67 (n/a) | 18932.35 (n/a) | 14660.81 (n/a) | 13672.67 (n/a) | 6663.62 (n/a) |

Phoenix - Examples

IRONCLAD

Tested on 2026_04_24_00_33_24 at commit dd00072.

| Test | Checks | TTFT (mean) | TPS (mean) |
|---|---|---|---|

Trends:

IRONCLAD Trends

Collaborator

@hunhoffe left a comment

Looks good to me!

# Create a worker to perform the task
# Large tile sizes (>4096) with LUT-based kernels need more stack space
# than the default 1024 bytes due to spilled vector temporaries.
worker_kwargs = {"stack_size": 0xD00} if line_size > 4096 else {}
Collaborator

Nice!
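
For readers unfamiliar with the pattern, the conditional-kwargs line in the snippet above can be sketched in isolation. This is only an illustration: `make_worker`, `worker_kwargs_for`, and `DEFAULT_STACK_SIZE` are hypothetical stand-ins and not the project's actual worker API. The 1024-byte default and the `0xD00` (3328-byte) override are taken from the snippet's own comment and code.

```python
# Hypothetical stand-ins for illustration; not the project's worker API.
DEFAULT_STACK_SIZE = 1024  # default stated in the reviewed comment (assumed here)


def make_worker(task, stack_size=DEFAULT_STACK_SIZE):
    """Return a plain dict describing a worker (illustration only)."""
    return {"task": task, "stack_size": stack_size}


def worker_kwargs_for(line_size):
    """Request a larger stack only when line_size exceeds 4096."""
    return {"stack_size": 0xD00} if line_size > 4096 else {}


# An empty dict unpacks to no keyword arguments, so the default applies.
small = make_worker("gelu", **worker_kwargs_for(4096))
# Above the threshold, the override is forwarded to the constructor.
large = make_worker("gelu", **worker_kwargs_for(8192))
print(small["stack_size"], large["stack_size"])
```

Passing an empty dict leaves the constructor's default untouched, so only the large configurations pay for the bigger stack.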

@@ -1,119 +0,0 @@
# SPDX-FileCopyrightText: Copyright (C) 2025 Advanced Micro Devices, Inc. All rights reserved.
Collaborator

Can you remind me why this is removed?

Collaborator Author

I added it back by accident in a previous PR. These now use the UnaryOperator design to reduce code duplication.

@andrej merged commit 7e6b95b into amd:devel on Apr 24, 2026
5 checks passed

2 participants