Skip to content

Quote bind parameter names containing non-identifier characters#60

Open
msrathore-db wants to merge 1 commit intomainfrom
fix-hyphenated-column-bind-params
Open

Quote bind parameter names containing non-identifier characters#60
msrathore-db wants to merge 1 commit intomainfrom
fix-hyphenated-column-bind-params

Conversation

@msrathore-db
Copy link
Copy Markdown
Collaborator

Summary

  • Bug: INSERTs involving column names that contain characters outside [A-Za-z0-9_] (most commonly hyphens, e.g. col-with-hyphen from DataFrame sources) fail with UNBOUND_SQL_PARAMETER. SQLAlchemy uses the column name as the default bind parameter name, so the compiled SQL contains :col-with-hyphen — which the Databricks parser splits on - and reports as unbound parameter: col. Same root cause as databricks-sql-python#368, filed against databricks-sqlalchemy.
  • Fix: Override DatabricksStatementCompiler.bindparam_string so that any bind name which is not a bare identifier gets wrapped in backticks (:`col-with-hyphen`). The Spark/Databricks SQL grammar accepts this as a quoted parameter identifier (simpleIdentifier → quotedIdentifier in SqlBaseParser.g4), mirroring Oracle's :"name" approach.
  • No collision risk: unlike a character-escape approach (-_), sibling columns like col-name and col_name compile to two distinct bind markers (:`col-name` vs :col_name) with two distinct dict keys — no silent value clobbering.
  • Keys unchanged: backticks are SQL-side quoting only; escaped_bind_names is left empty so construct_params passes the original unquoted key through to the driver.

Why backticks (not an escape map)

The first draft used bindname_escape_characters with -_, but that silently corrupts data when a table has both col-name and col_name (both collapse to :col_name, one value overwrites the other). A multi-char token like _x2d_ dodges the collision but is aesthetically noisy and can still theoretically collide. Backticks sidestep the whole issue — the parameter's logical name is identical to the column's, so no mapping is needed. Empirically validated against a dogfood SQL warehouse: all of col-with-hyphen, col-name, col_name, and normal_col round-trip correctly in a single INSERT.

Test plan

  • New TestBindParamQuoting class in tests/test_local/test_ddl.py covering:
    • Hyphenated column renders backticked bind marker, original dict key preserved
    • col-name + col_name siblings produce distinct markers and both values round-trip
    • Plain identifiers (id, name) are NOT backticked (no regression)
    • Spaces and dots in column names also trigger backtick-quoting
    • Leading-digit column names are backticked
  • Full local test suite: poetry run pytest tests/test_local/ --ignore=tests/test_local/e2e -q → 250 passed
  • End-to-end validation against a live SQL warehouse: created a table with col-with-hyphen, col-name, col_name, normal_col; INSERT via SQLAlchemy; all four values round-tripped correctly.

Co-authored-by: Isaac

Column names sourced from DataFrames frequently contain hyphens
(e.g. `col-with-hyphen`). SQLAlchemy uses the column name as the
default bind parameter name, and Databricks named-parameter markers
(`:name`) only accept bare identifiers ([A-Za-z_][A-Za-z0-9_]*).
The hyphen was being emitted verbatim, producing invalid SQL like
`:col-with-hyphen` which the server rejects with
UNBOUND_SQL_PARAMETER because it parses `-with-hyphen` as stray
tokens.

Override `DatabricksStatementCompiler.bindparam_string` to wrap
non-bare-identifier names in backticks (`:`col-with-hyphen``),
which the Spark/Databricks SQL grammar accepts as a quoted
parameter identifier (`simpleIdentifier -> quotedIdentifier` in
`SqlBaseParser.g4`). This mirrors Oracle's `:"name"` approach to
the same problem.

The backticks are quoting syntax only — the parameter's logical
name is still the text between them, so the params dict sent to
the driver keeps the original unquoted key. `escaped_bind_names`
is intentionally left empty so `construct_params` passes keys
through unchanged.

This covers hyphens, spaces, dots, brackets, leading digits, and
any other character outside [A-Za-z0-9_], with no risk of
collisions between sibling columns like `col-name` and `col_name`
(a concern with single-character escape-map approaches).

Co-authored-by: Isaac
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant