Add dataset tags to SDK for identification (DE-7033)#456
Open
edwinpav approved these changes on Apr 7, 2026
Expose dataset tags through the Python SDK so customers can identify datasets labeled by Scale vs other vendors via the API.

- Add `tags` field to DatasetInfo model (returned by dataset.info())
- Add get_tags(), add_tags(), remove_tags() methods to Dataset class
- Use POST /tags/remove instead of DELETE to avoid proxy body-stripping
- Use pydantic v1/v2 compat shim for null-coercion validator
- Guard against passing a bare string instead of a list

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
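The bare-string guard mentioned in the commit message could look roughly like the sketch below. The helper name `normalize_tags` and its error messages are hypothetical and not taken from the PR; the point is only that a `str` is itself iterable, so without a guard it would be split into single characters.

```python
from typing import Iterable, List


def normalize_tags(tags: Iterable[str]) -> List[str]:
    """Hypothetical helper: reject a bare string, return a list of tag strings."""
    # A str is iterable, so list("abc") would silently become ["a", "b", "c"].
    # Fail loudly instead of sending one tag per character to the server.
    if isinstance(tags, (str, bytes)):
        raise TypeError(
            f"tags must be a list of strings, got a bare {type(tags).__name__}; "
            f"did you mean [{tags!r}]?"
        )
    result = list(tags)
    for tag in result:
        if not isinstance(tag, str):
            raise TypeError(f"each tag must be a str, got {type(tag).__name__}")
    return result
```

Under this assumption, `normalize_tags(["Labeled by: Scale"])` passes the list through unchanged, while `normalize_tags("Labeled by: Scale")` raises `TypeError`.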
Summary
- Add `tags` field to `DatasetInfo` model so `dataset.info()` returns dataset tags
- Add `get_tags()`, `add_tags()`, `remove_tags()` methods to `Dataset` class for programmatic tag management

Test plan
- `dataset.info()` returns tags for a dataset with tags set in the UI
- `dataset.add_tags(["Labeled by: Scale"])` adds tags
- `dataset.get_tags()` returns the current tag list
- `dataset.remove_tags(["Labeled by: Scale"])` removes tags
- `dataset.info()` works against a server that doesn't yet return tags (defaults to `[]`)

🤖 Generated with Claude Code
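The last test-plan item (tags default to `[]` on older servers) can be illustrated without pydantic. The sketch below is a plain-dataclass stand-in, not the PR's validator: `DatasetInfoSketch`, `coerce_tags`, `from_payload`, and the `name` field are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


def coerce_tags(value: Optional[List[str]]) -> List[str]:
    """Treat a missing or null tags value as an empty list."""
    return [] if value is None else list(value)


@dataclass
class DatasetInfoSketch:
    """Hypothetical stand-in for DatasetInfo; only the fields needed here."""

    name: str
    tags: List[str] = field(default_factory=list)

    @classmethod
    def from_payload(cls, payload: Dict[str, Any]) -> "DatasetInfoSketch":
        # payload.get("tags") is None both when the key is absent (older
        # server) and when the server sends an explicit null.
        return cls(name=payload["name"], tags=coerce_tags(payload.get("tags")))
```

With this shape, a payload without a `tags` key and a payload with `"tags": None` both yield an empty list, so callers can iterate over `info.tags` without a `None` check.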
Greptile Summary
This PR adds a `tags` field to `DatasetInfo` and three new tag-management methods (`get_tags`, `add_tags`, `remove_tags`) to the `Dataset` class, enabling programmatic access to dataset tags. The implementation is clean, follows existing codebase patterns (DELETE-with-body, pydantic v1/v2 shim), and includes good backward-compatibility handling via the `coerce_null_tags` validator.

Confidence Score: 5/5
Safe to merge; all findings are P2 style suggestions that do not block correctness.
Both comments are P2: one is a defensive null-coercion improvement and the other is a test assertion robustness suggestion. Neither represents a present defect. The core logic, backward-compatibility handling, and test coverage are solid.
No files require special attention.
Vulnerabilities
No security concerns identified.
Important Files Changed
- Adds a `tags: List[str] = []` field with a `coerce_null_tags` validator for backward compatibility; the pydantic v1/v2 import shim is consistent with existing patterns in `pydantic_base.py`.
- Adds `get_tags()`, `add_tags()`, `remove_tags()` methods. The DELETE-with-body pattern is established elsewhere in the codebase. `get_tags()` doesn't apply the same null-coercion as `DatasetInfo`, so a server returning `{"tags": null}` would silently return `None` to callers.
- `assert remaining2 == remaining` is order-sensitive and may flake if the API returns tags in non-deterministic order.

Sequence Diagram

    sequenceDiagram
        participant Client as SDK Client
        participant Dataset as Dataset
        participant API as Nucleus API
        Client->>Dataset: dataset.info()
        Dataset->>API: GET dataset/{id}/info
        API-->>Dataset: JSON (may include tags: null)
        Dataset-->>Client: DatasetInfo (tags defaults to [])
        Client->>Dataset: dataset.get_tags()
        Dataset->>API: GET dataset/{id}/tags
        API-->>Dataset: {"tags": [...]}
        Dataset-->>Client: List[str]
        Client->>Dataset: dataset.add_tags(["Labeled by: Scale"])
        Dataset->>API: POST dataset/{id}/tags {"tags": [...]}
        API-->>Dataset: {"tags": [...]}
        Dataset-->>Client: Updated List[str]
        Client->>Dataset: dataset.remove_tags(["Labeled by: Scale"])
        Dataset->>API: DELETE dataset/{id}/tags {"tags": [...]}
        API-->>Dataset: {"tags": [...]}
        Dataset-->>Client: Remaining List[str]
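The order-sensitivity note on the test assertion could be addressed with an order-insensitive comparison. The sketch below is one possible approach, not the fix adopted in the PR; `same_tags` is a hypothetical helper name, and it assumes tags are plain hashable strings.

```python
from collections import Counter
from typing import Iterable


def same_tags(actual: Iterable[str], expected: Iterable[str]) -> bool:
    """True when both hold the same tags with the same counts, in any order."""
    # Counter compares element multiplicities, so ordering differences are
    # ignored while an unexpected duplicate or missing tag still fails.
    return Counter(actual) == Counter(expected)
```

In a test this would replace a direct list equality with something like `assert same_tags(remaining2, remaining)`. Comparing `set(...)` values would also ignore order, but it would hide duplicated tags, which is why `Counter` is used here.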
Reviews (7): Last reviewed commit: "Add dataset tags to SDK for identificati..."