feat: Add CloudFormation Language Extensions support (Fn::ForEach)#8637
feat: Add CloudFormation Language Extensions support (Fn::ForEach)#8637
Conversation
0be94d0 to
5d6cbf3
Compare
|
Integration test: https://github.com/aws/aws-sam-cli/actions/runs/21808626015 |
e68efa0 to
4ed8396
Compare
5324dad to
707baad
Compare
| for name, value in nested_params.items(): | ||
| try: | ||
| resolved_value = resolver.resolve_value(value) | ||
| except Exception: # pragma: no cover - resolver raises only on malformed intrinsics |
There was a problem hiding this comment.
Good catch — addressed in c916784. Now the resolver call narrows to the expected exception types (UnresolvableReferenceError, InvalidTemplateException) and logs a DEBUG message with exc_info=True for anything else, so an accidental bug surfaces under --debug and in telemetry instead of being silently swallowed. Same narrowing pattern I applied to do_export's expansion call site in 6956c10.
The previous code caught bare Exception when expanding language extensions on a nested-stack child template, downgrading every failure (including SAM CLI bugs) to a single WARNING line. That masked real defects: any crash in the new library silently fell through to the non-expanded path, leaving the packaged child with unresolved local paths that CloudFormation could not resolve at deploy time. Two changes: 1. InvalidSamDocumentException (the documented failure surface of expand_language_extensions) keeps the warn-and-fallback behavior, but the message now explains the consequence: artifact URIs inside Fn::ForEach blocks will NOT be uploaded. 2. Any other exception is logged at ERROR with traceback (exc_info=True) and a pointer to file an issue. We still fall back so the rest of sam package keeps going, but the failure is observable to users running --debug and to telemetry instead of silently degrading. Also removed the check_using_language_extension branch that was dead: expand_language_extensions only raises InvalidSamDocumentException when it actually tried to expand, which implies the transform was present. Tests: - TestCloudFormationStackResourceExpansionErrorHandling covers both branches (warn vs error) and confirms the non-expanded fallback is still taken.
The helper caught bare Exception when resolving intrinsics in a nested-stack resource's Parameters property, silently swallowing any resolver bug (TypeError, AttributeError, KeyError from malformed state). A user would see incorrect template expansion with no signal. Now: - UnresolvableReferenceError / InvalidTemplateException continue (expected 'can't resolve at package time' cases — e.g. Ref to a sibling resource). - Any other exception is logged at DEBUG with traceback so --debug surfaces it; the value is still dropped so packaging proceeds. Matches the narrowing pattern applied to do_export's expansion call site in the previous commit.
Deployer._create_deploy_error was wrapping *any* CloudFormation
"Fn::FindInMap - Key X not found in Mapping Y" failure as
MissingMappingKeyError, which emits SAM-specific re-package guidance
("Re-run 'sam package' with the same parameter values" and advice
about Fn::ForEach dynamic artifact properties). That guidance is
misleading when the Mapping was written by the user — a classic
RegionMap → AMI lookup failure, for example, has nothing to do with
packaging.
Gate the wrapping on a fixed set of SAM-generated Mapping prefixes
(SAMCodeUri, SAMImageUri, SAMContentUri, SAMDefinitionUri, SAMSchemaUri,
SAMBodyS3Location, SAMDefinitionS3Location, SAMTemplateURL, SAMCode,
SAMContent, SAMLayers). Matching also requires an upper-case letter
after the prefix so names like SAMPLE/SAMSUNG don't accidentally match.
User-authored Fn::FindInMap failures now fall through to the generic
DeployFailedError with the raw CloudFormation message — accurate for
the user's own mistake, without SAM CLI injecting irrelevant advice.
Tests:
- TestIsSamGeneratedMapping covers matching and non-matching names
incl. SAMPLE / SAMSUNG / bare 'SAM' / lower-case variants.
- TestCreateDeployError gains two cases: RegionMap-style user Mapping
is NOT wrapped, and SAM-prefix-substring names are NOT wrapped.
- Existing SAM-generated cases (SAMCodeUriServices, SAMCodeUriMyLoop)
continue to wrap correctly.
Two issues flagged by the bot reviewer on PR #8637: [BUG] _update_sam_mappings_relative_paths in samcli/commands/_utils/ template.py used mapping_name.startswith("SAM") to decide whether to rewrite Mapping values as relative paths. That loose check corrupts customer-authored mappings whose names happen to start with SAM as a substring — SAMPLE, SAMSUNG, SAMCustomMapping, etc. Values would be silently rewritten with relative-path prefixes. [STYLE] infra_sync_executor.py had two hardcoded resource_logical_id.startswith("Fn::ForEach::") checks instead of using the shared is_foreach_key helper the rest of the codebase adopted in this PR. Maintenance risk if the detection logic ever changes. Consolidation: - Moved the precise SAM-mapping classifier (previously _is_sam_generated_mapping in samcli/commands/deploy/exceptions.py) into samcli/lib/cfn_language_extensions/utils.py as the public is_sam_generated_mapping so both deploy/exceptions.py and commands/_utils/template.py share the same implementation. - _update_sam_mappings_relative_paths now calls is_sam_generated_mapping — SAMPLE/SAMSUNG style names are correctly ignored. - infra_sync_executor.py now calls is_foreach_key at both sites. Tests: - test_skips_sam_prefix_substring_mappings in TestUpdateSamMappingsRelativePaths verifies SAMPLE/SAMSUNG/ SAMCustomMapping are left untouched. - Existing TestIsSamGeneratedMapping and TestCreateDeployError still pass via the re-export in deploy/exceptions.py.
_partial_resolve and _partial_resolve_find_in_map intentionally substitute AWS::NoValue / partial args for false-condition resources when the resolver raises. The swallow matches the Kotlin reference implementation and is not changing. The problem is debuggability: if a resolver bug or unexpected type slips through, the swallow hides it and the user sees a template with AWS::NoValue substitutions that look intentional. Adding LOG.debug with exc_info=True surfaces the traceback under --debug and in telemetry without changing any runtime behavior. Three swallow sites covered: - Ref substitution fallback - Generic Fn::* substitution fallback - Fn::FindInMap secondary fallback inside _partial_resolve_find_in_map The Re-raised InvalidTemplateException branch in _partial_resolve_find_in_map is unchanged — that path must error.
Previously Template.export() set parent_parameter_values as a duck-typed attribute on exporter instances, and CloudFormationStackResource.do_export read it via getattr(self, 'parent_parameter_values', None). The attribute wasn't declared on any base class, so: - Typos on either the write or read side wouldn't be caught by static analysis or IDE completion. - Any future exporter class for a nested-stack resource type that didn't explicitly handle this attribute would silently get None and skip parameter propagation, producing wrong language-extension expansion for its child template. Declared parent_parameter_values: Optional[Dict] = None on Resource. The read site in artifact_exporter.do_export now uses direct attribute access instead of getattr() with a default.
Previously only wait_for_execute, create_and_wait_for_changeset's internal callers (via sync's catch block), and sync itself routed CloudFormation ClientError through _create_deploy_error, which detects the SAM-generated Fn::FindInMap key-not-found signature and surfaces a user-friendly MissingMappingKeyError. create_stack, update_stack, and create_and_wait_for_changeset's outer catch block still raised the raw DeployFailedError. On sam sync the outer catch usually won, but sam deploy / sam package + deploy paths could hit the synchronous validation error from create_stack / update_stack directly and bypass the helpful MissingMappingKeyError entirely — users saw the raw CloudFormation message with no re-package guidance. All five previously raw call sites now go through self._create_deploy_error. The wrapper is a no-op for non-matching errors (including user-authored Fn::FindInMap failures — gated by is_sam_generated_mapping), so no regression for any other error path. Also left execute_changeset and has_stack untouched — neither can surface a Fn::FindInMap failure (has_stack is a pre-flight check; execute_changeset only kicks off the change set, stack-events-phase errors come back through wait_for_execute which was already wired). Tests: TestCreateDeployErrorRouting covers create_stack and update_stack wrapping a SAM-mapping error, and confirms user RegionMap-style errors still fall through to DeployFailedError.
expand_language_extensions was deep-copying the template 4-10 times per invocation for defensive isolation (cache put, cache get, no-LE path, LE path). For large templates this is measurable overhead. Replace with a one-time deep_freeze that converts dicts to MappingProxyType and lists to tuples. Frozen templates are truly immutable — any mutation attempt raises TypeError immediately instead of silently corrupting shared state. Callers that need mutable copies use deep_thaw (not copy.deepcopy which cannot pickle MappingProxyType on Python 3.13). Savings: - No-LE fast path: 4 deepcopies -> 1 deep_freeze (same O(n) cost) - Cache hit: 2 deepcopies -> 0 (return cached frozen result directly) - Cold LE path: 3 deepcopies -> 2 deep_freezes (expanded + original) - Cache put: 2 deepcopies -> 0 (frozen values are safe to share) Also: - Fixed isinstance(x, dict) check in build_context to accept Mapping so frozen MappingProxyType templates pass the type check. - Updated callers (build_context, package_context, artifact_exporter, language_extensions_packaging, wrapper) to use deep_thaw instead of copy.deepcopy when they need mutable copies. - Updated tests to verify immutability contract (TypeError on mutation) instead of asserting independent mutable copies.
The cached LanguageExtensionResult is returned directly on cache hit. While template fields are deeply frozen via MappingProxyType, the dynamic_artifact_properties field was a mutable list of mutable DynamicArtifactProperty dataclass instances. A caller mutating the list or its elements would corrupt the cached value for all subsequent hits. - DynamicArtifactProperty is now frozen=True - dynamic_artifact_properties field changed from List to Tuple with default () - Construction sites pass tuple() instead of list This completes the immutability contract: all fields of a cached LanguageExtensionResult are now truly immutable.
When a template has no AWS::LanguageExtensions transform, the work
being cached is a single dict lookup on template.get('Transform') —
O(1). The previous code still deep_freeze'd the entire template O(n)
and stored it in the module-level cache, which was pure overhead for
the vast majority of users who don't use language extensions.
Now the no-LE path returns a LanguageExtensionResult wrapping the
caller's original dict directly — zero copies, zero cache entries.
Callers that need to mutate (build_context, package_context) already
deep_thaw before mutating, so the aliasing is safe.
The LE path still freezes and caches as before.
|
|
||
| if not check_using_language_extension(template): | ||
| return LanguageExtensionResult( | ||
| expanded_template=template, |
There was a problem hiding this comment.
Acknowledged — the asymmetry is real but intentional. All four production call sites that consume the no-LE result either deep_thaw before mutating (sam_stack_provider, package_context, artifact_exporter) or only assign the result when had_language_extensions=True (sam_template_validator). The sam sync --watch path goes through SamLocalStackProvider.get_stacks which deep_thaws at line 278.
Adding deep_freeze back on the no-LE path would re-introduce the O(n) walk we just removed in 4d28eef — that's the exact overhead the commit targeted, since the vast majority of templates don't use AWS::LanguageExtensions. The caller already pays O(n) via deep_thaw when it needs a mutable copy, so freezing first just doubles the work.
The LanguageExtensionResult docstring documents the contract: callers that need to mutate must deep_thaw() first. For the no-LE path, the result aliases the input dict — same as any function that returns its argument. If a future caller violates this, the LE path's freeze will catch it with a TypeError; the no-LE path won't, but that's an acceptable trade-off for zero overhead on the common path.
|
|
||
| if not check_using_language_extension(template): | ||
| return LanguageExtensionResult( | ||
| expanded_template=template, |
There was a problem hiding this comment.
Acknowledged — the asymmetry is real but intentional. All four production call sites that consume the no-LE result either deep_thaw before mutating (sam_stack_provider, package_context, artifact_exporter) or only assign the result when had_language_extensions=True (sam_template_validator). The sam sync --watch path goes through SamLocalStackProvider.get_stacks which deep_thaws at line 278.
Adding deep_freeze back on the no-LE path would re-introduce the O(n) walk we just removed in 4d28eef — that's the exact overhead the commit targeted, since the vast majority of templates don't use AWS::LanguageExtensions. The caller already pays O(n) via deep_thaw when it needs a mutable copy, so freezing first just doubles the work.
The LanguageExtensionResult docstring documents the contract: callers that need to mutate must deep_thaw() first. For the no-LE path, the result aliases the input dict — same as any function that returns its argument. If a future caller violates this, the LE path's freeze will catch it with a TypeError; the no-LE path won't, but that's an acceptable trade-off for zero overhead on the common path.
The module-level _expansion_cache in sam_integration.py was designed to avoid redundant template expansion across multiple calls within a single CLI invocation. In practice, the cache was never hit: - sam build: calls expand_language_extensions once (via get_stacks) - sam package: calls it twice (get_stacks + _export) — the only potential hit, but the ~10ms saved is noise vs seconds of S3 uploads - sam validate/deploy/local: call it once - sam sync --watch: clears the cache before each reload The cache added ~40 lines of complexity (module-level mutable global, hash function, eviction logic, clear_expansion_cache calls in watch_manager and sam_function_provider) plus a thread-safety concern flagged in the review, all for negligible benefit. Removed: - _expansion_cache, _MAX_CACHE_SIZE, _hash_params, _cache_put, clear_expansion_cache from sam_integration.py - clear_expansion_cache() calls from watch_manager.py and sam_function_provider.py - template_path parameter from expand_language_extensions (only used for cache keying) and all call sites - Exports from __init__.py - TestExpansionCache class (replaced with two focused immutability tests) The deep_freeze on the LE path still ensures immutability without needing cache isolation.
| if not isinstance(mapping_entries, dict): | ||
| continue | ||
|
|
||
| for _key, value_dict in mapping_entries.items(): |
There was a problem hiding this comment.
[BUG] The _update_sam_mappings_relative_paths function iterates SAM-generated Mappings with a two-level nesting (mapping_entries[key] -> value_dict), but CloudFormation Mappings are three levels deep: MapName -> FirstLevelKey -> SecondLevelKey -> Value. The SAM-generated Mapping structure is SAMCodeUriFunctions -> Users -> {CodeUri: "./services/Users"}, which means mapping_entries is {"Users": {"CodeUri": "./services/Users"}}.
However, the function iterates mapping_entries.items() where each value_dict is expected to be {"CodeUri": "./services/Users"}. This only works if the SAM-generated Mapping has exactly two levels of nesting (MapName → CollectionValue → {PropertyName: Path}). If the Mapping structure ever uses the standard three-level CloudFormation format (MapName → FirstKey → SecondKey → Value), the path update would silently skip the values because prop_name would be the second-level key, not a property name from _ARTIFACT_PATH_PROPERTIES.
Verify that the SAM-generated Mapping structure always uses this two-level format. If it does, add a comment documenting this assumption. If it could use three levels, the iteration needs an additional nesting level.
| test: | ||
| # Run unit tests and fail if coverage falls below 94% | ||
| # Run unit tests (excluding cfn_language_extensions) and fail if coverage falls below 94% | ||
| pytest --cov samcli --cov schema --cov-report term-missing --cov-fail-under 94 tests/unit --ignore=tests/unit/lib/cfn_language_extensions --cov-config=.coveragerc_no_lang_ext |
There was a problem hiding this comment.
[GENERAL] The default make test target now excludes tests/unit/lib/cfn_language_extensions and uses a separate coverage config. This means developers running make test (the most common invocation) will not run the new module's tests. While make pr and make test-all include them, this creates a risk that regressions in the integration points (e.g., sam_integration.py, utils.py) are missed during local development.
Additionally, the dev target's conditional logic relies on git diff --name-only origin/develop..., which will fail silently (exit code 128) if origin/develop doesn't exist (e.g., fresh clone, different remote name), always falling through to the test target that skips the new tests.
Consider making make test run all tests by default and providing a make test-fast for the subset, rather than the other way around. This follows the principle of safe defaults.
| e, | ||
| ) | ||
| result = None | ||
| except Exception as e: # pylint: disable=broad-except |
There was a problem hiding this comment.
[ERROR_HANDLING] In CloudFormationStackResource.do_export, the broad except Exception at line 198 catches all unexpected errors during language extension expansion and sets result = None, falling through to the non-language-extensions code path. While the error is logged, this means a bug in the language extensions code could silently produce incorrect packaging output (e.g., missing artifact uploads for Fn::ForEach resources) rather than failing fast.
The InvalidSamDocumentException catch at line 182 is appropriate for user-facing template errors. But the broad catch at line 198 could mask serious issues like TypeError, KeyError, or AttributeError in the new code. Consider whether it would be safer to let unexpected exceptions propagate (or at least re-raise after logging) during the initial rollout of this feature, and only add the broad catch as a stability measure once the code is battle-tested.
| if not mapping_name: | ||
| return False | ||
| for prefix in _SAM_GENERATED_MAPPING_PREFIXES: | ||
| if mapping_name == prefix: |
There was a problem hiding this comment.
[BUG] The is_sam_generated_mapping function checks if mapping_name == prefix and returns False for bare prefixes (line 86-88). However, the check at line 90 (nxt = mapping_name[len(prefix)]) will raise an IndexError if mapping_name is exactly one character longer than the prefix and that character is accessed. Actually, looking more carefully, the mapping_name == prefix check at line 86 returns False before reaching line 90, and mapping_name.startswith(prefix) at line 89 guarantees len(mapping_name) > len(prefix) when the equality check failed. So line 90 is safe.
However, there's a logic gap: if mapping_name starts with a prefix but the next character is NOT uppercase (e.g., "SAMCodeUri123" or "SAMCodeUriservices"), the function continues to the next prefix. If no prefix matches, it returns False. This means a SAM-generated mapping like "SAMCodeUri_functions" (with underscore after prefix) would NOT be recognized. Verify that the mapping name generation code always produces a PascalCase segment after the prefix.
| Cost is O(n) — same as one ``copy.deepcopy`` — but you pay it once | ||
| at creation time and then never need defensive copies again. | ||
| """ | ||
| if isinstance(obj, MappingProxyType): |
There was a problem hiding this comment.
[GENERAL] The deep_freeze function converts dicts to MappingProxyType and lists to tuple. This is used throughout the codebase as a replacement for copy.deepcopy. However, MappingProxyType is not picklable and not JSON-serializable. Any code path that attempts to serialize a frozen template (e.g., via json.dumps, pickle, or yaml.dump) will fail with a TypeError.
The deep_thaw function is the intended inverse, but callers must remember to thaw before serialization. This is a new invariant that all existing and future code must respect. Consider adding a comment at the module level or in the docstring explicitly listing the operations that require thawing first (JSON serialization, YAML dump, pickle, etc.) to prevent future bugs.
Description
This PR adds support for CloudFormation Language Extensions in SAM CLI, addressing GitHub issue #5647.
Features
Fn::Ifin resource policiesKey Design Decisions
Fn::ForEachblocks with dynamic artifact properties (e.g.,CodeUri: ./src/${Name}) are supported via a Mappings transformationFn::ForEachcollections must be resolvable locally; cloud-dependent values (Fn::GetAtt,Fn::ImportValue) are not supported with clear error messagesSupported Commands
sam build- Builds all expanded functions, preserves original templatesam package- PreservesFn::ForEachstructure with S3 URIssam deploy- Uploads original template for CloudFormation to processsam validate- Validates language extension syntaxsam local invoke- Invokes expanded functions by namesam local start-api- Serves ForEach-generated API endpointssam local start-lambda- Serves all expanded functionsExample
Resolves #5647
Testing
Checklist