Fix GeLU stack overflow on Phoenix; fix results commits with new Phoenix CI #108
Merged
Contributor
CI Test Results: dd00072 (2026_04_24_00_48_25)
IRONCLAD - CI Summary (Examples | Small | Extensive)
Krackan - Small | IRONCLAD | Tested on
Trends: IRONCLAD Trends
M_128-K_128-num_aie_columns_1-tile_size_input_32-tile_size_output_128
M_1792-K_896-N_1152-num_aie_columns_8-b_col_maj_False-c_col_maj_True-m_64-k_32-n_48-trace_size_0-partition_N_1
M_192-K_384-N_64-num_aie_columns_4-b_col_maj_False-c_col_maj_False-m_48-k_96-n_16-trace_size_0-partition_N_1
M_192-K_384-N_64-num_aie_columns_4-b_col_maj_True-c_col_maj_True-m_48-k_96-n_16-trace_size_0-partition_N_1
M_2048-K_2048-N_2048-num_aie_columns_1-b_col_maj_False-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1
M_2048-K_2048-N_2048-num_aie_columns_2-b_col_maj_True-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1
M_2048-K_2048-N_2048-num_aie_columns_8-b_col_maj_True-c_col_maj_True-m_64-k_64-n_64-trace_size_0-partition_N_1
M_2048-K_8192-num_aie_columns_1-tile_size_input_1-tile_size_output_2048
M_2048-K_8192-num_aie_columns_2-tile_size_input_1-tile_size_output_1024
M_2048-K_8192-num_aie_columns_4-tile_size_input_1-tile_size_output_512
M_2048-K_8192-num_aie_columns_8-tile_size_input_1-tile_size_output_256
M_2048-N_64-aie_columns_1-channels_1-m_64-n_64-s_8
M_2048-N_64-aie_columns_1-channels_2-m_64-n_64-s_8
M_384-K_1536-N_1792-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_32-k_48-n_64-trace_size_0-partition_N_1
M_64-K_512-N_256-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_16-k_64-n_64-trace_size_0-partition_N_4
M_8192-K_2048-num_aie_columns_1-tile_size_input_4-tile_size_output_1024
M_8192-K_2048-num_aie_columns_2-tile_size_input_4-tile_size_output_1024
M_8192-K_2048-num_aie_columns_4-tile_size_input_4-tile_size_output_1024
M_8192-K_2048-num_aie_columns_8-tile_size_input_4-tile_size_output_1024
M_896-K_1792-N_640-num_aie_columns_8-b_col_maj_False-c_col_maj_True-m_32-k_64-n_80-trace_size_0-partition_N_1
embedding_dim_1024-hidden_dim_3584
embedding_dim_2048-hidden_dim_2048
input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048
input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-group_size_32
input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_False
input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_True
input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024
input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-group_size_32
input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_False
input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_True
input_length_2048-num_aie_columns_1-tile_size_2048
input_length_2048-num_aie_columns_1-tile_size_2048-scalar_factor_3.0
input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024
input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-group_size_32
input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_False
input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_True
input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512
input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-group_size_32
input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_False
input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_True
input_length_2048-num_aie_columns_2-tile_size_1024
input_length_2048-num_aie_columns_2-tile_size_1024-scalar_factor_3.0
input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512
input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-group_size_32
input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_False
input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_True
input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256
input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-group_size_32
input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-weighted_False
input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-weighted_True
input_length_2048-num_aie_columns_4-tile_size_512
input_length_2048-num_aie_columns_4-tile_size_512-scalar_factor_3.0
input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256
input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256-group_size_32
input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256-weighted_False
input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256-weighted_True
input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128
input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128-group_size_32
input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128-weighted_False
input_length_2048-num_aie_columns_8-tile_size_256
input_length_2048-num_aie_columns_8-tile_size_256-scalar_factor_3.0
input_length_2048-num_cores_1-num_channels_1-bypass_False-tile_size_2048
input_length_2048-num_cores_16-num_channels_2-bypass_False-tile_size_128
input_length_2048-num_cores_2-num_channels_1-bypass_False-tile_size_1024
input_length_2048-num_cores_2-num_channels_2-bypass_False-tile_size_1024
input_length_2048-num_cores_4-num_channels_1-bypass_False-tile_size_512
input_length_2048-num_cores_4-num_channels_2-bypass_False-tile_size_512
input_length_2048-num_cores_8-num_channels_1-bypass_False-tile_size_256
input_length_2048-num_cores_8-num_channels_2-bypass_False-tile_size_256
input_length_32768-num_aie_columns_2-num_channels_2-tile_size_1024
input_length_32768-num_aie_columns_2-num_channels_2-tile_size_2048
input_length_32768-num_aie_columns_2-num_channels_2-tile_size_512
rows_32-cols_512-angle_rows_32-aie_columns_1-method_type_0
rows_32-cols_512-angle_rows_32-aie_columns_2-method_type_0
rows_32-cols_512-angle_rows_32-aie_columns_4-method_type_0
rows_32-cols_512-angle_rows_32-aie_columns_8-method_type_0
rows_32-cols_512-angle_rows_8-aie_columns_1-method_type_0
rows_32-cols_512-angle_rows_8-aie_columns_2-method_type_0
rows_32-cols_512-angle_rows_8-aie_columns_4-method_type_0
rows_32-cols_512-angle_rows_8-aie_columns_8-method_type_0
seq_len_16384-dim_64-num_heads_1-num_pipelines_8-num_kv_heads_0
seq_len_256-embedding_dim_2048-hidden_dim_2048-prio_accuracy_False
Krackan - Examples | IRONCLAD | Tested on
Trends: IRONCLAD Trends
llama_3.2_1b_prompt_1024_tokens_1
llama_3.2_1b_prompt_1024_tokens_40
llama_3.2_1b_prompt_13_tokens_1
llama_3.2_1b_prompt_13_tokens_40
Phoenix - Small | IRONCLAD | Tested on
Trends: IRONCLAD Trends
M_128-K_128-num_aie_columns_1-tile_size_input_32-tile_size_output_128
M_192-K_384-N_64-num_aie_columns_4-b_col_maj_False-c_col_maj_False-m_48-k_96-n_16-trace_size_0-partition_N_1
M_192-K_384-N_64-num_aie_columns_4-b_col_maj_True-c_col_maj_True-m_48-k_96-n_16-trace_size_0-partition_N_1
M_2048-K_2048-N_2048-num_aie_columns_1-b_col_maj_False-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1
M_2048-K_2048-N_2048-num_aie_columns_2-b_col_maj_True-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1
M_2048-K_8192-num_aie_columns_1-tile_size_input_1-tile_size_output_2048
M_2048-K_8192-num_aie_columns_2-tile_size_input_1-tile_size_output_1024
M_2048-K_8192-num_aie_columns_4-tile_size_input_1-tile_size_output_512
M_2048-N_64-aie_columns_1-channels_1-m_64-n_64-s_8
M_2048-N_64-aie_columns_1-channels_2-m_64-n_64-s_8
M_384-K_1536-N_1792-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_32-k_48-n_64-trace_size_0-partition_N_1
M_64-K_512-N_256-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_16-k_64-n_64-trace_size_0-partition_N_4
M_8192-K_2048-num_aie_columns_1-tile_size_input_4-tile_size_output_1024
M_8192-K_2048-num_aie_columns_2-tile_size_input_4-tile_size_output_1024
M_8192-K_2048-num_aie_columns_4-tile_size_input_4-tile_size_output_1024
embedding_dim_1024-hidden_dim_3584
embedding_dim_2048-hidden_dim_2048
input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048
input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-group_size_32
input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_False
input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_True
input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024
input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-group_size_32
input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_False
input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_True
input_length_2048-num_aie_columns_1-tile_size_2048
input_length_2048-num_aie_columns_1-tile_size_2048-scalar_factor_3.0
input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024
input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-group_size_32
input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_False
input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_True
input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512
input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-group_size_32
input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_False
input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_True
input_length_2048-num_aie_columns_2-tile_size_1024
input_length_2048-num_aie_columns_2-tile_size_1024-scalar_factor_3.0
input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512
input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-group_size_32
input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_False
input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_True
input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256
input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-group_size_32
input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-weighted_False
input_length_2048-num_aie_columns_4-tile_size_512
input_length_2048-num_aie_columns_4-tile_size_512-scalar_factor_3.0
input_length_2048-num_cores_1-num_channels_1-bypass_False-tile_size_2048
input_length_2048-num_cores_2-num_channels_1-bypass_False-tile_size_1024
input_length_2048-num_cores_2-num_channels_2-bypass_False-tile_size_1024
input_length_2048-num_cores_4-num_channels_1-bypass_False-tile_size_512
input_length_2048-num_cores_4-num_channels_2-bypass_False-tile_size_512
input_length_2048-num_cores_8-num_channels_2-bypass_False-tile_size_256
input_length_32768-num_aie_columns_2-num_channels_2-tile_size_1024
input_length_32768-num_aie_columns_2-num_channels_2-tile_size_2048
input_length_32768-num_aie_columns_2-num_channels_2-tile_size_512
rows_32-cols_512-angle_rows_32-aie_columns_1-method_type_0
rows_32-cols_512-angle_rows_32-aie_columns_2-method_type_0
rows_32-cols_512-angle_rows_32-aie_columns_4-method_type_0
rows_32-cols_512-angle_rows_8-aie_columns_1-method_type_0
rows_32-cols_512-angle_rows_8-aie_columns_2-method_type_0
rows_32-cols_512-angle_rows_8-aie_columns_4-method_type_0
seq_len_256-embedding_dim_2048-hidden_dim_2048-prio_accuracy_False
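The test IDs in the Krackan and Phoenix lists encode their parameters as `key_value` pairs joined by hyphens, where a key may itself contain underscores (`num_aie_columns`, `tile_size_output`). A minimal sketch for decoding them, assuming that every token ends in `_<value>`; the helper names `parse_test_id` and `aie_columns` are hypothetical and not part of the CI tooling:

```python
from typing import Optional


def parse_test_id(test_id: str) -> dict[str, str]:
    """Split 'key_value-key_value-...' into {key: value}.

    Keys may contain underscores (e.g. 'num_aie_columns'), so each
    hyphen-separated token is split at its LAST underscore only.
    """
    params: dict[str, str] = {}
    for token in test_id.split("-"):
        key, _, value = token.rpartition("_")
        params[key] = value
    return params


def aie_columns(test_id: str) -> Optional[int]:
    """Return the AIE column count named in the ID, or None if absent.

    Both spellings seen in the lists ('num_aie_columns' and 'aie_columns')
    are accepted; IDs that use 'num_cores' instead return None.
    """
    params = parse_test_id(test_id)
    for key in ("num_aie_columns", "aie_columns"):
        if key in params:
            return int(params[key])
    return None


if __name__ == "__main__":
    ids = [
        "M_2048-K_8192-num_aie_columns_4-tile_size_input_1-tile_size_output_512",
        "rows_32-cols_512-angle_rows_8-aie_columns_2-method_type_0",
        "embedding_dim_2048-hidden_dim_2048",
    ]
    for test_id in ids:
        print(aie_columns(test_id), parse_test_id(test_id))
```

Applied to the two "Small" lists, a filter such as `aie_columns(i) == 8` matches several Krackan IDs and none of the Phoenix IDs, which is one quick way to compare the coverage of the two runs.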
Phoenix - Examples | IRONCLAD | Tested on
Trends: IRONCLAD Trends
hunhoffe approved these changes on Apr 24, 2026
    # Create a worker to perform the task
    # Large tile sizes (>4096) with LUT-based kernels need more stack space
    # than the default 1024 bytes due to spilled vector temporaries.
    worker_kwargs = {"stack_size": 0xD00} if line_size > 4096 else {}
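The one-line conditional above only overrides the worker's stack for large tiles; `0xD00` is 3328 bytes, 3.25 times the 1024-byte default cited in the comment. A minimal sketch of that selection logic in isolation, assuming `line_size` is the per-tile size the comment refers to (the diff does not state its unit) and using a hypothetical helper name, `worker_kwargs_for`:

```python
# Values taken from the diff; the 1024-byte default is as stated in its comment.
DEFAULT_STACK_BYTES = 1024
LARGE_TILE_STACK_BYTES = 0xD00  # 3328 bytes
LARGE_TILE_THRESHOLD = 4096


def worker_kwargs_for(line_size: int) -> dict:
    """Hypothetical helper mirroring the conditional in the diff.

    Tiles strictly larger than the threshold get the enlarged stack; all
    others get an empty dict, so the worker keeps its default stack size.
    """
    if line_size > LARGE_TILE_THRESHOLD:
        return {"stack_size": LARGE_TILE_STACK_BYTES}
    return {}


if __name__ == "__main__":
    for size in (1024, 4096, 8192):
        print(size, worker_kwargs_for(size))
    # The enlarged stack is 3.25x the default.
    print(LARGE_TILE_STACK_BYTES / DEFAULT_STACK_BYTES)
```

Returning an empty dict, rather than `{"stack_size": 1024}`, avoids hard-coding the default for small tiles. The dict is presumably unpacked into the worker constructor as `**worker_kwargs`; the call site is not shown in this excerpt.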
    @@ -1,119 +0,0 @@
    # SPDX-FileCopyrightText: Copyright (C) 2025 Advanced Micro Devices, Inc. All rights reserved.
Collaborator: Can you remind me why this is removed?
Author: I added it back by accident in a previous PR; these now use the UnaryOperator design to reduce code duplication.
This should restore the green checkmark for the extensive test suite, and it also tacks on a little beautification of the aggregate CI results summary.