Feat/voyageai: adding voyageai integration #4070
fzowl wants to merge 7 commits into simstudioai:main
…nnection string support
- Add VoyageAI tools: embeddings (voyage-3, voyage-3-large, etc.) and rerank (rerank-2, rerank-2-lite)
- Add VoyageAI block with operation dropdown (Generate Embeddings / Rerank)
- Add VoyageAI icon and register in tool/block registries
- Enhance MongoDB with connection string mode for Atlas (mongodb+srv://) support
- Add connection mode toggle to MongoDB block (Host & Port / Connection String)
- Update all 6 MongoDB API routes to accept optional connectionString
- Add 48 unit tests (VoyageAI tools, block config, MongoDB utils)

…geAI and MongoDB
- Expand VoyageAI tool tests: metadata, all models, edge cases, error codes (60 tests)
- Expand VoyageAI block tests: structure, subBlocks, conditions, params edge cases (44 tests)
- Expand MongoDB utils tests: connection modes, URI building, all validators (56 tests)
- Add live integration tests: embeddings (7 models/scenarios), rerank (5 scenarios), e2e workflow
- Integration tests use undici to bypass global fetch mock
- Tests skip gracefully when VOYAGEAI_API_KEY env var is not set

- Add voyage-4-large, voyage-4, voyage-4-lite embedding models
- Add voyage-3.5, voyage-3.5-lite embedding models
- Add rerank-2.5, rerank-2.5-lite reranking models
- Default embeddings model: voyage-3.5
- Default rerank model: rerank-2.5
- All models verified working with live API

…tegration
- New tool: voyageai_multimodal_embeddings using voyage-multimodal-3.5 model
- New API route: /api/tools/voyageai/multimodal-embeddings for server-side file handling
- Supports text, image files/URLs, video files/URLs in a single embedding
- Uses file-upload subBlocks with basic/advanced mode for images and video
- Internal proxy pattern: downloads UserFiles via downloadFileFromStorage, converts to base64
- URL validation via validateUrlWithDNS for SSRF protection
- 14 new unit tests (tool metadata, body, response transform)
- 5 new integration tests (text-only, image URL, text+image, dimensions, auth)
- 8 new block tests (multimodal operation, params, subBlocks)

Remove non-TSDoc separator comments, fix relative import in barrel export, fix any types, and apply biome formatting fixes.

Reverts MongoDB Atlas connection string support due to validation issues in the Zod schemas. VoyageAI integration remains intact.
PR Summary
Medium Risk
Overview
Registers three new tools (plus types) in the tools registry and adds a new internal Next.js route.
Reviewed by Cursor Bugbot for commit 7a6ee14.
@fzowl is attempting to deploy a commit to the Sim Team on Vercel. A member of the Team first needs to authorize it.
| GitGuardian id | GitGuardian status | Secret | Commit | Filename |
|---|---|---|---|---|
| - | - | MongoDB Credentials | 2d28d8b | apps/sim/app/api/tools/mongodb/utils.test.ts |
🛠 Guidelines to remediate hardcoded secrets
- Understand the implications of revoking this secret by investigating where it is used in your code.
- Replace and store your secret safely.
- Revoke and rotate this secret.
- If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.
To avoid such incidents in the future, consider:
- following best practices for managing and storing secrets, including API keys and other credentials
- installing secret detection on pre-commit to catch secrets before they leave your machine and to ease remediation.
Greptile Summary
This PR adds a VoyageAI integration covering text embeddings, multimodal embeddings (images + video via an internal proxy route), and document reranking, with a matching block, registry entries, and a comprehensive test suite. Two issues need attention before merge.
Confidence Score: 3/5
Two P1 issues, the canonicalParamId constraint violation and the missing response.ok guards, should be fixed before merging. The canonicalParamId === id violation breaks a documented critical rule and may cause the canonical param transformation layer to drop file input values at runtime. The missing response.ok check means real API errors (invalid key, rate limit) produce opaque TypeErrors rather than actionable messages. Both are on the changed code paths and need resolution.
Files needing attention: apps/sim/blocks/blocks/voyageai.ts (canonicalParamId constraint), apps/sim/tools/voyageai/embeddings.ts and rerank.ts (response error handling)
| Filename | Overview |
|---|---|
| apps/sim/blocks/blocks/voyageai.ts | New VoyageAI block with embeddings, multimodal embeddings, and rerank operations; canonicalParamId equals the subblock id for imageFiles and videoFile, violating the documented constraint. |
| apps/sim/tools/voyageai/embeddings.ts | Text embeddings tool; transformResponse does not check response.ok, so API errors produce a cryptic TypeError instead of the actual error message. |
| apps/sim/tools/voyageai/rerank.ts | Rerank tool; same missing response.ok guard as embeddings.ts, plus truncation param declared in types but not used here. |
| apps/sim/tools/voyageai/types.ts | Type definitions for all three operations; truncation is declared on both VoyageAIEmbeddingsParams and VoyageAIRerankParams but never wired into any tool or block. |
| apps/sim/app/api/tools/voyageai/multimodal-embeddings/route.ts | Internal proxy route for multimodal embeddings; properly validates input with Zod, uses checkInternalAuth, validates URLs with DNS, and handles all file/URL content types correctly. |
| apps/sim/tools/voyageai/multimodal-embeddings.ts | Multimodal embeddings tool; routes through the internal proxy (correct pattern for file handling), response transformation delegates to the route's structured output. |
| apps/sim/tools/voyageai/voyageai.test.ts | Comprehensive unit tests for all three tools; uses vi.resetAllMocks() in afterEach which conflicts with the project's testing guidelines. |
| apps/sim/tools/voyageai/voyageai.integration.test.ts | Integration tests skipped without VOYAGEAI_API_KEY; uses undici to bypass global fetch mock correctly, covers embeddings, rerank, and multimodal scenarios. |
| apps/sim/blocks/blocks/voyageai.test.ts | Block-level unit tests with solid coverage of subBlock structure, tool routing, and params mapping; all assertions look correct. |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[VoyageAI Block] --> B{Operation}
B -->|embeddings| C[embeddingsTool]
B -->|rerank| D[rerankTool]
B -->|multimodal| E[multimodalEmbeddingsTool]
C -->|POST direct| F[VoyageAI Embeddings API]
D -->|POST direct| G[VoyageAI Rerank API]
E -->|POST proxy| H[Internal Multimodal Route]
H --> I{Content type}
I -->|Text| J[content: text]
I -->|imageFiles| K[base64 encode via storage]
I -->|imageUrls| L[validateUrlWithDNS]
I -->|videoFile| M[base64 encode via storage]
I -->|videoUrl| N[validateUrlWithDNS]
J & K & L & M & N --> O[VoyageAI Multimodal API]
F & G & O --> P[embeddings / results / usage]
Reviews (1): Last reviewed commit: "revert: drop all MongoDB connection stri..."
      id: 'imageFiles',
      title: 'Image Files',
      type: 'file-upload',
      canonicalParamId: 'imageFiles',
      placeholder: 'Upload image files',
      condition: { field: 'operation', value: 'multimodal_embeddings' },
      mode: 'basic',
      multiple: true,
      acceptedTypes: '.jpg,.jpeg,.png,.gif,.webp',
    },
    {
      id: 'imageFilesRef',
      title: 'Image Files',
      type: 'short-input',
      canonicalParamId: 'imageFiles',
      placeholder: 'Reference image files from previous blocks',
      condition: { field: 'operation', value: 'multimodal_embeddings' },
      mode: 'advanced',
    },
canonicalParamId must not match a subblock's own id
Both imageFiles (line 76) and videoFile (line 104) subblocks use the same string for id and canonicalParamId. The documented canonical-param rule explicitly states that canonicalParamId must not match any subblock's id — the canonical ID must be distinct from all raw subblock IDs so the framework can delete raw params after canonical transformation without accidentally removing the canonical value itself.
Suggested change:
    {
      id: 'imageFilesUpload',
      title: 'Image Files',
      type: 'file-upload',
      canonicalParamId: 'imageFiles',
      placeholder: 'Upload image files',
      condition: { field: 'operation', value: 'multimodal_embeddings' },
      mode: 'basic',
      multiple: true,
      acceptedTypes: '.jpg,.jpeg,.png,.gif,.webp',
    },
Apply the same fix to the videoFile subblock (id: 'videoFileUpload', canonicalParamId: 'videoFile').
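A rule like this can also be checked mechanically. The following is a minimal sketch, assuming a simplified subblock shape; `SubBlockLike` and `findCanonicalIdCollisions` are hypothetical names for illustration and are not part of the repository.

```typescript
// Hypothetical, simplified shape of a subblock config (the real type has more fields).
interface SubBlockLike {
  id: string
  canonicalParamId?: string
}

/**
 * Returns every canonicalParamId that is also used as a raw subblock id.
 * An empty result means the "canonical id must differ from all raw ids" rule holds.
 */
function findCanonicalIdCollisions(subBlocks: SubBlockLike[]): string[] {
  const rawIds = new Set(subBlocks.map((sb) => sb.id))
  const collisions = new Set<string>()
  for (const sb of subBlocks) {
    if (sb.canonicalParamId !== undefined && rawIds.has(sb.canonicalParamId)) {
      collisions.add(sb.canonicalParamId)
    }
  }
  return Array.from(collisions)
}

// The configuration flagged in the review: 'imageFiles' is both a raw id and a canonical id.
const flagged: SubBlockLike[] = [
  { id: 'imageFiles', canonicalParamId: 'imageFiles' },
  { id: 'imageFilesRef', canonicalParamId: 'imageFiles' },
]

// The configuration after the suggested rename.
const renamed: SubBlockLike[] = [
  { id: 'imageFilesUpload', canonicalParamId: 'imageFiles' },
  { id: 'imageFilesRef', canonicalParamId: 'imageFiles' },
]

console.log(findCanonicalIdCollisions(flagged)) // [ 'imageFiles' ]
console.log(findCanonicalIdCollisions(renamed)) // []
```

A check of this kind could run in the block unit tests so that a future subblock cannot reintroduce the collision unnoticed.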
    transformResponse: async (response) => {
      const data = await response.json()
      return {
        success: true,
        output: {
          embeddings: data.data.map((item: { embedding: number[] }) => item.embedding),
          model: data.model,
          usage: {
            total_tokens: data.usage.total_tokens,
          },
        },
      }
    },
No response.ok guard exposes cryptic errors to users
transformResponse unconditionally calls data.data.map(...) without checking response.ok. When the VoyageAI API returns a 401/429/500, data.data is undefined and the call throws TypeError: Cannot read properties of undefined (reading 'map'). Users see that message instead of the actual API error. The same issue exists in rerank.ts (transformResponse, line 67).
Suggested change:
    transformResponse: async (response) => {
      const data = await response.json()
      if (!response.ok) {
        throw new Error(data.detail ?? data.message ?? `VoyageAI API error: ${response.status}`)
      }
      return {
        success: true,
        output: {
          embeddings: data.data.map((item: { embedding: number[] }) => item.embedding),
          model: data.model,
          usage: {
            total_tokens: data.usage.total_tokens,
          },
        },
      }
    },
      input: string | string[]
      model?: string
      inputType?: 'query' | 'document'
      truncation?: boolean
truncation param defined in types but never used
VoyageAIEmbeddingsParams.truncation (line 8) and VoyageAIRerankParams.truncation (line 17) are declared but never referenced in the tool definitions, request bodies, or block config. Remove them to avoid dead code, or wire them into the request bodies if the feature is intended.
Suggested change:
    export interface VoyageAIEmbeddingsParams {
      apiKey: string
      input: string | string[]
      model?: string
      inputType?: 'query' | 'document'
    }
    afterEach(() => {
      tester.cleanup()
      vi.resetAllMocks()
    })
afterEach(vi.resetAllMocks) anti-pattern per testing guidelines
The testing rules specify beforeEach(() => vi.clearAllMocks()) with no redundant afterEach. Using vi.resetAllMocks() in afterEach is also more aggressive (resets mock implementations too), which can silently mask test-isolation issues. Remove the afterEach calls and keep only beforeEach(() => vi.clearAllMocks()).
Suggested change:
    beforeEach(() => {
      tester = new ToolTester(embeddingsTool as any)
      vi.clearAllMocks()
    })
Apply the same change to the Rerank Tool describe block.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
        .split(/[,\n]/)
        .map((u) => u.trim())
        .filter(Boolean)
    }
JSON.parse of imageUrls may return non-array value
Medium Severity
JSON.parse(params.imageUrls) can succeed but return a non-array value (e.g., a plain string if input is a quoted URL like "\"https://example.com/img.jpg\""). In that case, urls is assigned a string instead of string[], and the subsequent for (const url of urls) loop iterates over individual characters, each failing URL validation with a confusing error. Adding an Array.isArray check after JSON.parse and falling back to the split logic would prevent this.
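The check described above might look like the following sketch. `parseImageUrls` and `splitUrls` are hypothetical names, and treating a JSON-encoded string as a delimited list (instead of re-splitting the raw input with its quotes) is a design choice of this sketch, not something the review prescribes.

```typescript
// Split a comma- or newline-delimited string into trimmed, non-empty entries.
function splitUrls(text: string): string[] {
  return text
    .split(/[,\n]/)
    .map((u) => u.trim())
    .filter(Boolean)
}

// Hypothetical parser: accepts a JSON array, a JSON-encoded string,
// or a plain comma/newline-delimited list, and always returns string[].
function parseImageUrls(input: string): string[] {
  const trimmed = input.trim()
  if (!trimmed) return []
  try {
    const parsed: unknown = JSON.parse(trimmed)
    if (Array.isArray(parsed)) {
      // Keep only string entries so later URL validation never sees non-strings.
      return parsed
        .filter((u): u is string => typeof u === 'string')
        .map((u) => u.trim())
        .filter(Boolean)
    }
    if (typeof parsed === 'string') {
      // A quoted URL parses to a plain string; split its contents, not the quoted input.
      return splitUrls(parsed)
    }
  } catch {
    // Not valid JSON; fall through to the delimiter-based split.
  }
  return splitUrls(trimmed)
}

console.log(parseImageUrls('"https://example.com/img.jpg"')) // [ 'https://example.com/img.jpg' ]
```

With this shape the later `for (const url of urls)` loop always iterates over whole URLs, never over the characters of a string.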
    <path
      d='M12 2L3 7v10l9 5 9-5V7l-9-5zm0 2.18L18.36 7.5 12 10.82 5.64 7.5 12 4.18zM5 8.82l6 3.32v7.04l-6-3.32V8.82zm8 10.36V15.8l6-3.32v3.7l-6 3z'
      fill='currentColor'
    />
SVG icon right face has incorrect path coordinates
Low Severity
The VoyageAIIcon SVG path draws a 3D hexagonal shape where the right face has incorrect coordinates. The left face spans from y=8.82 to y=19.18, but the right face only spans from y=12.48 to y=19.18 — shifted ~3.66 units downward and much smaller. The right face doesn't connect to the top diamond, creating an asymmetric, visually broken icon instead of a proper 3D cube/prism.
Summary
Brief description of what this PR does and why.
Fixes #(issue)
Type of Change
Testing
I added unit tests, integration tests and also tested manually.
Checklist
Screenshots/Videos