You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
When models are deleted from a SQLMesh project, the OptimizedQueryCache retains references to cache files for those deleted models. On subsequent Context initialization, SQLMesh attempts to stat() these stale cache files, resulting in a FileNotFoundError that prevents the Context from being created.
Environment Details
SQLMesh Version: 0.227.1
Python Version: 3.12
Deployment: Airflow on AWS ECS with persistent volumes
When a model is removed from the SQLMesh project, the snapshot is removed from the state database
The physical table is dropped
However, the corresponding optimized_query cache files are NOT automatically cleaned up
Cache Initialization:
During Context.__init__(), the OptimizedQueryCache is initialized
FileCache.__init__() scans the cache directory and calls file.stat() on every file it finds
This happens BEFORE any model selection or filtering logic runs
Container Lifecycle:
In environments with persistent volumes (non-ephemeral containers), the cache accumulates over time
Unlike environments using /tmp that gets cleared on container restart, persistent cache directories retain stale entries indefinitely
Why Existing Solutions Don't Work
sqlmesh clean: Requires creating a Context to run, which fails with the same FileNotFoundError
no_auto_upstream=True: This flag only affects model selection during execution, not cache initialization
Janitor process: Handles database state cleanup (interval compaction), not filesystem cache
Reproduction Steps
Create a SQLMesh project with a model (e.g., my_project.my_model_name)
Run the model at least once to generate cache entries
Remove the model from the project
In an environment with persistent cache (cache_dir configured to non-tmp location):
Try to create a new Context: Context(paths=project_path)
Observe FileNotFoundError on stale cache file
Impact
Affected Scenarios
CI/CD Pipelines: Failed deployments when models are removed
Development: Developers unable to run SQLMesh locally after pulling changes that remove models
Production: Airflow DAGs failing when models are deprecated
Shared State Databases: When multiple projects share a state DB but have separate cache directories, cross-project model references can cause issues
Severity
High - Prevents Context initialization entirely, blocking all SQLMesh operations until manual intervention.
Current Workaround
Manual deletion of the optimized_query cache directory before Context initialization:
importshutilfrompathlibimportPath# Clear stale cache files to prevent FileNotFoundError during Context initialization.# This happens when models are removed but cache references still exist.# Manual deletion is required because `sqlmesh clean` itself needs Context initialization.cache_path=Path(SQLMESH_CACHE_DIR)
ifcache_path.exists():
optimized_query_cache=cache_path/"optimized_query"ifoptimized_query_cache.exists():
shutil.rmtree(optimized_query_cache)
# Now safe to create Contextcontext=Context(paths=SQLMESH_PROJECT_PATH)
Limitations of Workaround
Requires code changes in every place Context is created
Loses all cached query optimizations on every run
Not discoverable - developers hit the error before finding the solution
Suggested Fixes
Option 1: Graceful Degradation (Recommended)
Modify FileCache.__init__() to handle missing files gracefully:
# In sqlmesh/utils/cache.pyforfileincache_dir.glob(f"{self._cache_version}*"):
try:
stat_result=file.stat()
ifstat_result.st_atime<threshold:
file.unlink(missing_ok=True)
exceptFileNotFoundError:
# File was deleted between glob and stat - this is expected for stale entriescontinue
Pros:
No breaking changes
Handles race conditions naturally
Self-healing behavior
Cons:
Leaves stale files until next access
Option 2: Automatic Cleanup on Model Deletion
Hook into the model deletion workflow to clean up associated cache files:
# When a snapshot is invalidated/removeddef_cleanup_model_cache(self, model_name: str) ->None:
"""Remove cache entries for a deleted model."""ifself.config.cache_dir:
cache_pattern=f"*__{model_name.replace('.', '_')}*"forcache_fileinself.config.cache_dir.glob(cache_pattern):
cache_file.unlink(missing_ok=True)
Pros:
Proactive cleanup
No stale files accumulate
Cons:
More complex implementation
Need to track model name → cache file mapping
Option 3: Validate Cache on Context Load
Add a validation step during Context initialization that removes invalid cache entries:
defvalidate_and_clean_cache(cache_dir: Path, valid_models: set[str]) ->None:
"""Remove cache entries for models that no longer exist."""forcache_fileincache_dir.glob("*"):
model_name=extract_model_name(cache_file.name)
ifmodel_namenotinvalid_models:
cache_file.unlink(missing_ok=True)
Shared state database scenarios where multiple projects can create cross-project cache contamination (see Additional Context section below)
Additional Context
Multi-Project Shared State Database
When multiple SQLMesh projects share a PostgreSQL state database but maintain separate cache directories, the following can occur:
Project A and Project B share state DB
During plan(), the virtual layer update promotes ALL snapshots in the environment (including other project's models)
If Project A's cache directory is under the project path (e.g., /opt/airflow/sqlmesh/project_a/.cache), it can accumulate references to Project B's models
When Project B removes a model, Project A's cache still has references to it
This issue can be mitigated with isolated /tmp cache directories per project:
Issue Summary
When models are deleted from a SQLMesh project, the
OptimizedQueryCacheretains references to cache files for those deleted models. On subsequentContextinitialization, SQLMesh attempts tostat()these stale cache files, resulting in aFileNotFoundErrorthat prevents the Context from being created.Environment Details
/tmp/airflow/sqlmesh_cache/my_project/.cacheError Details
Root Cause Analysis
The Problem
Model Deletion Workflow:
optimized_querycache files are NOT automatically cleaned upCache Initialization:
Context.__init__(), theOptimizedQueryCacheis initializedFileCache.__init__()scans the cache directory and callsfile.stat()on every file it findsContainer Lifecycle:
/tmpthat gets cleared on container restart, persistent cache directories retain stale entries indefinitelyWhy Existing Solutions Don't Work
sqlmesh clean: Requires creating aContextto run, which fails with the sameFileNotFoundErrorno_auto_upstream=True: This flag only affects model selection during execution, not cache initializationReproduction Steps
my_project.my_model_name)cache_dirconfigured to non-tmp location):Context:Context(paths=project_path)FileNotFoundErroron stale cache fileImpact
Affected Scenarios
Severity
High - Prevents Context initialization entirely, blocking all SQLMesh operations until manual intervention.
Current Workaround
Manual deletion of the
optimized_querycache directory before Context initialization:Limitations of Workaround
Suggested Fixes
Option 1: Graceful Degradation (Recommended)
Modify
FileCache.__init__()to handle missing files gracefully:Pros:
Cons:
Option 2: Automatic Cleanup on Model Deletion
Hook into the model deletion workflow to clean up associated cache files:
Pros:
Cons:
Option 3: Validate Cache on Context Load
Add a validation step during Context initialization that removes invalid cache entries:
Pros:
Cons:
Related Issues
Additional Context
Multi-Project Shared State Database
When multiple SQLMesh projects share a PostgreSQL state database but maintain separate cache directories, the following can occur:
plan(), the virtual layer update promotes ALL snapshots in the environment (including other project's models)/opt/airflow/sqlmesh/project_a/.cache), it can accumulate references to Project B's modelsThis issue can be mitigated with isolated
/tmpcache directories per project:However, the core issue remains: cache entries are not tied to model lifecycle.
References
sqlmesh/utils/cache.py:66inFileCache.__init__()