Quote bind parameter names containing non-identifier characters #60
Open
msrathore-db wants to merge 1 commit into main
Column names sourced from DataFrames frequently contain hyphens (e.g. `col-with-hyphen`). SQLAlchemy uses the column name as the default bind parameter name, and Databricks named-parameter markers (`:name`) only accept bare identifiers (`[A-Za-z_][A-Za-z0-9_]*`). The hyphen was being emitted verbatim, producing invalid SQL like `:col-with-hyphen`, which the server rejects with `UNBOUND_SQL_PARAMETER` because it parses `-with-hyphen` as stray tokens.

Override `DatabricksStatementCompiler.bindparam_string` to wrap non-bare-identifier names in backticks (`` :`col-with-hyphen` ``), which the Spark/Databricks SQL grammar accepts as a quoted parameter identifier (`simpleIdentifier -> quotedIdentifier` in `SqlBaseParser.g4`). This mirrors Oracle's `:"name"` approach to the same problem.

The backticks are quoting syntax only: the parameter's logical name is still the text between them, so the params dict sent to the driver keeps the original unquoted key. `escaped_bind_names` is intentionally left empty so `construct_params` passes keys through unchanged.

This covers hyphens, spaces, dots, brackets, leading digits, and any other character outside `[A-Za-z0-9_]`, with no risk of collisions between sibling columns like `col-name` and `col_name` (a concern with single-character escape-map approaches).

Co-authored-by: Isaac
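The quoting rule described in the commit message can be sketched without SQLAlchemy. This is a minimal stdlib-only illustration, not the PR's actual `bindparam_string` override: `render_bind_marker` and `compile_insert` are hypothetical helper names, and the sketch assumes names contain no backticks of their own.

```python
import re

# Names matching this pattern can be emitted directly as :name markers.
_BARE_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def render_bind_marker(name: str) -> str:
    """Return a named-parameter marker, backtick-quoting non-bare names.

    Hypothetical helper; assumes `name` itself contains no backticks.
    """
    if _BARE_IDENTIFIER.fullmatch(name):
        return f":{name}"
    return f":`{name}`"


def compile_insert(table: str, row: dict) -> tuple:
    """Build an INSERT statement and its params dict (hypothetical helper)."""
    columns = ", ".join(f"`{col}`" for col in row)
    markers = ", ".join(render_bind_marker(col) for col in row)
    sql = f"INSERT INTO {table} ({columns}) VALUES ({markers})"
    # Keys stay unquoted: the backticks are quoting syntax, not part of the name.
    return sql, dict(row)


sql, params = compile_insert("t", {"id": 1, "col-with-hyphen": "x"})
print(sql)     # INSERT INTO t (`id`, `col-with-hyphen`) VALUES (:id, :`col-with-hyphen`)
print(params)  # {'id': 1, 'col-with-hyphen': 'x'}
```

Only the marker text changes; the dict key for the hyphenated column is the same string the caller supplied.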
Summary
- Bind parameter names containing characters outside `[A-Za-z0-9_]` (most commonly hyphens, e.g. `col-with-hyphen` from DataFrame sources) fail with `UNBOUND_SQL_PARAMETER`. SQLAlchemy uses the column name as the default bind parameter name, so the compiled SQL contains `:col-with-hyphen`, which the Databricks parser splits on `-` and reports as `unbound parameter: col`. Same root cause as databricks-sql-python#368, filed against `databricks-sqlalchemy`.
- Override `DatabricksStatementCompiler.bindparam_string` so that any bind name which is not a bare identifier gets wrapped in backticks (`` :`col-with-hyphen` ``). The Spark/Databricks SQL grammar accepts this as a quoted parameter identifier (`simpleIdentifier → quotedIdentifier` in `SqlBaseParser.g4`), mirroring Oracle's `:"name"` approach.
- Unlike a single-character escape map (e.g. `-` → `_`), sibling columns like `col-name` and `col_name` compile to two distinct bind markers (`` :`col-name` `` vs `:col_name`) with two distinct dict keys, so there is no silent value clobbering.
- `escaped_bind_names` is left empty so `construct_params` passes the original unquoted key through to the driver.

Why backticks (not an escape map)
The first draft used `bindname_escape_characters` with `-` → `_`, but that silently corrupts data when a table has both `col-name` and `col_name`: both collapse to `:col_name`, and one value overwrites the other. A multi-character token like `_x2d_` dodges that collision but is aesthetically noisy and can still theoretically collide (for example, with a column literally named `col_x2d_name`). Backticks sidestep the whole issue: the parameter's logical name is identical to the column's, so no mapping is needed. Empirically validated against a dogfood SQL warehouse: `col-with-hyphen`, `col-name`, `col_name`, and `normal_col` all round-trip correctly in a single INSERT.

Test plan
- `TestBindParamQuoting` class in `tests/test_local/test_ddl.py` covering:
  - `col-name` + `col_name` siblings produce distinct markers and both values round-trip
  - Bare identifiers (`id`, `name`) are NOT backticked (no regression)
- `poetry run pytest tests/test_local/ --ignore=tests/test_local/e2e -q` → 250 passed
- Dogfood SQL warehouse: table with columns `col-with-hyphen`, `col-name`, `col_name`, `normal_col`; INSERT via SQLAlchemy; all four values round-tripped correctly.

Co-authored-by: Isaac
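The sibling-column case from the test plan, and the collision that motivated dropping the escape map, can be reproduced with a small stdlib-only sketch. It does not go through SQLAlchemy or the real `TestBindParamQuoting` class; `escape_map_marker`, `backtick_marker`, and `columns_by_marker` are hypothetical names used only for illustration.

```python
import re
from collections import defaultdict

_BARE_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def escape_map_marker(name: str) -> str:
    """Single-character escape map: '-' becomes '_' (the rejected first draft)."""
    return ":" + name.replace("-", "_")


def backtick_marker(name: str) -> str:
    """Backtick-quote any name that is not a bare identifier."""
    if _BARE_IDENTIFIER.fullmatch(name):
        return f":{name}"
    return f":`{name}`"


def columns_by_marker(columns, marker_for):
    """Map each rendered marker to the list of columns that produced it."""
    grouped = defaultdict(list)
    for column in columns:
        grouped[marker_for(column)].append(column)
    return dict(grouped)


columns = ["col-with-hyphen", "col-name", "col_name", "normal_col"]

escaped = columns_by_marker(columns, escape_map_marker)
quoted = columns_by_marker(columns, backtick_marker)

# A marker shared by more than one column means one value would overwrite another.
collisions = {m: cols for m, cols in escaped.items() if len(cols) > 1}
print(collisions)   # {':col_name': ['col-name', 'col_name']}
print(len(quoted))  # 4
```

With the escape map, two columns share the `:col_name` marker; with backticks, each of the four columns gets its own marker.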