
Feat/voyageai: adding voyageai integration #4070

Open

fzowl wants to merge 7 commits into simstudioai:main from fzowl:feat/voyageai-mongodb-atlas

fzowl commented Apr 9, 2026

Summary

Brief description of what this PR does and why.

Fixes #(issue)

Type of Change

  • Bug fix
  • New feature
  • Breaking change
  • Documentation
  • Other: ___________

Testing

I added unit tests and integration tests, and also tested manually.

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

Screenshots/Videos

fzowl added 7 commits March 23, 2026 18:38
…nnection string support

- Add VoyageAI tools: embeddings (voyage-3, voyage-3-large, etc.) and rerank (rerank-2, rerank-2-lite)
- Add VoyageAI block with operation dropdown (Generate Embeddings / Rerank)
- Add VoyageAI icon and register in tool/block registries
- Enhance MongoDB with connection string mode for Atlas (mongodb+srv://) support
- Add connection mode toggle to MongoDB block (Host & Port / Connection String)
- Update all 6 MongoDB API routes to accept optional connectionString
- Add 48 unit tests (VoyageAI tools, block config, MongoDB utils)
…geAI and MongoDB

- Expand VoyageAI tool tests: metadata, all models, edge cases, error codes (60 tests)
- Expand VoyageAI block tests: structure, subBlocks, conditions, params edge cases (44 tests)
- Expand MongoDB utils tests: connection modes, URI building, all validators (56 tests)
- Add live integration tests: embeddings (7 models/scenarios), rerank (5 scenarios), e2e workflow
- Integration tests use undici to bypass global fetch mock
- Tests skip gracefully when VOYAGEAI_API_KEY env var is not set
- Add voyage-4-large, voyage-4, voyage-4-lite embedding models
- Add voyage-3.5, voyage-3.5-lite embedding models
- Add rerank-2.5, rerank-2.5-lite reranking models
- Default embeddings model: voyage-3.5
- Default rerank model: rerank-2.5
- All models verified working with live API
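The model lists and defaults in these commit messages can be captured as typed constants so the block and tools agree on them. A minimal sketch, assuming model names are validated on the Sim side; the constant names and the resolveModel helper are hypothetical, and the list covers only the models named in the commits (the first commit says "etc.", so it may be incomplete):

```typescript
// Model names as listed in this PR's commit messages. The first commit says
// "voyage-3, voyage-3-large, etc.", so this list may be incomplete.
const EMBEDDING_MODELS = [
  'voyage-4-large',
  'voyage-4',
  'voyage-4-lite',
  'voyage-3.5',
  'voyage-3.5-lite',
  'voyage-3-large',
  'voyage-3',
] as const

const RERANK_MODELS = ['rerank-2.5', 'rerank-2.5-lite', 'rerank-2', 'rerank-2-lite'] as const

type EmbeddingModel = (typeof EMBEDDING_MODELS)[number]
type RerankModel = (typeof RERANK_MODELS)[number]

// Defaults stated in the commit message.
const DEFAULT_EMBEDDING_MODEL: EmbeddingModel = 'voyage-3.5'
const DEFAULT_RERANK_MODEL: RerankModel = 'rerank-2.5'

// Hypothetical helper: use the default when nothing is selected, otherwise
// accept only a known model name (surrounding whitespace is ignored).
function resolveModel<T extends string>(
  requested: string | undefined,
  known: readonly T[],
  fallback: T
): T {
  const wanted = requested?.trim() ?? ''
  if (wanted === '') return fallback
  const match = known.find((m) => m === wanted)
  if (match === undefined) {
    throw new Error(`Unsupported model "${wanted}". Expected one of: ${known.join(', ')}`)
  }
  return match
}
```

Rejecting unknown names is one design choice; passing them through unchanged would let newly released VoyageAI models work without a code change, at the cost of a later and less specific API error.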
…tegration

- New tool: voyageai_multimodal_embeddings using voyage-multimodal-3.5 model
- New API route: /api/tools/voyageai/multimodal-embeddings for server-side file handling
- Supports text, image files/URLs, video files/URLs in a single embedding
- Uses file-upload subBlocks with basic/advanced mode for images and video
- Internal proxy pattern: downloads UserFiles via downloadFileFromStorage, converts to base64
- URL validation via validateUrlWithDNS for SSRF protection
- 14 new unit tests (tool metadata, body, response transform)
- 5 new integration tests (text-only, image URL, text+image, dimensions, auth)
- 8 new block tests (multimodal operation, params, subBlocks)
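The internal proxy pattern above ends with each downloaded file being inlined as a base64 data URL. A minimal sketch of that encoding step, assuming the file bytes and MIME type are already in hand; toBase64DataUrl, encodeWithLimit, and the 20 MiB cap are hypothetical (the cap is not taken from the PR or from VoyageAI's documentation), and downloadFileFromStorage is not called here because its signature is not shown:

```typescript
// Hypothetical helper: inline file bytes as a base64 data URL.
// Assumes the bytes and MIME type were already fetched from storage.
function toBase64DataUrl(bytes: Uint8Array, mimeType: string): string {
  // Reject anything that does not look like "type/subtype" so it cannot
  // corrupt the data URL prefix.
  if (!/^[a-z]+\/[a-z0-9.+-]+$/i.test(mimeType)) {
    throw new Error(`Invalid MIME type: ${mimeType}`)
  }
  const base64 = Buffer.from(bytes).toString('base64')
  return `data:${mimeType};base64,${base64}`
}

// Assumed cap on the raw file size; not a value taken from the PR or VoyageAI's docs.
const MAX_INLINE_BYTES = 20 * 1024 * 1024

// Check the raw size before encoding, since base64 adds roughly 33% on top.
function encodeWithLimit(
  bytes: Uint8Array,
  mimeType: string,
  maxBytes: number = MAX_INLINE_BYTES
): string {
  if (bytes.byteLength > maxBytes) {
    throw new Error(`File is ${bytes.byteLength} bytes; the inline limit is ${maxBytes}`)
  }
  return toBase64DataUrl(bytes, mimeType)
}
```

A size check before encoding speaks to the payload-size risk the Cursor summary flags, since every inlined file grows by about a third once base64-encoded.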
Remove non-TSDoc separator comments, fix a relative import in the barrel
export, fix 'any' types, and apply Biome formatting fixes.
Reverts MongoDB Atlas connection string support due to validation
issues in the Zod schemas. VoyageAI integration remains intact.

cursor bot commented Apr 9, 2026

PR Summary

Medium Risk
Adds new external API integrations (VoyageAI) including a server-side proxy that processes user-provided files/URLs into base64, which increases risk around request validation, SSRF/file handling, and payload sizes despite added auth and URL validation.

Overview
Introduces a new Voyage AI workflow block with operations for text embeddings, multimodal embeddings (text + images/videos), and reranking, including UI subblocks for models, input types, and file/URL inputs.

Registers three new tools (plus types) in the tools registry and adds a new internal Next.js route POST /api/tools/voyageai/multimodal-embeddings that authenticates internally, validates media URLs, converts uploaded/stored media to base64 data URLs, and forwards requests to VoyageAI.

Adds a VoyageAIIcon, updates .gitignore for .playwright-mcp/, and includes comprehensive unit + optional live API integration tests for the new block/tools.

Reviewed by Cursor Bugbot for commit 7a6ee14. Bugbot is set up for automated code reviews on this repo.


vercel bot commented Apr 9, 2026

@fzowl is attempting to deploy a commit to the Sim Team on Vercel.

A member of the Team first needs to authorize it.


gitguardian bot commented Apr 9, 2026

⚠️ GitGuardian has uncovered 1 secret following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

Since your pull request originates from a forked repository, GitGuardian is not able to associate the secrets uncovered with secret incidents on your GitGuardian dashboard.
Skipping this check run and merging your pull request will create secret incidents on your GitGuardian dashboard.

🔎 Detected hardcoded secret in your pull request
| GitGuardian id | GitGuardian status | Secret | Commit | Filename |
|---|---|---|---|---|
| - | - | MongoDB Credentials | 2d28d8b | apps/sim/app/api/tools/mongodb/utils.test.ts |
🛠 Guidelines to remediate hardcoded secrets
  1. Understand the implications of revoking this secret by investigating where it is used in your code.
  2. Replace and store your secret safely, following secret-management best practices.
  3. Revoke and rotate this secret.
  4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.

To avoid such incidents in the future consider


🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

fzowl changed the title from "Feat/voyageai" to "Feat/voyageai: adding voyageai integration" on Apr 9, 2026

greptile-apps bot commented Apr 9, 2026

Greptile Summary

This PR adds a VoyageAI integration covering text embeddings, multimodal embeddings (images + video via an internal proxy route), and document reranking, with matching block, registry entries, and a comprehensive test suite.

Two issues need attention before merge:

  • canonicalParamId equals the subblock's own id for both the imageFiles and videoFile subblocks in voyageai.ts. This violates a documented critical constraint and could cause params to be dropped during canonical transformation.
  • transformResponse in embeddings.ts and rerank.ts does not check response.ok, so non-2xx API responses (e.g. 401, 429) produce a cryptic TypeError: Cannot read properties of undefined instead of surfacing the actual VoyageAI error message.

Confidence Score: 3/5

Two P1 issues — canonicalParamId constraint violation and missing response.ok guards — should be fixed before merging.

The canonicalParamId === id violation is a documented critical rule that may cause the canonical param transformation layer to drop file input values at runtime. The missing response.ok check means real API errors (invalid key, rate limit) produce opaque TypeErrors rather than actionable messages. Both are present on the changed code paths and need resolution.

apps/sim/blocks/blocks/voyageai.ts (canonicalParamId constraint), apps/sim/tools/voyageai/embeddings.ts and rerank.ts (response error handling)

Vulnerabilities

  • Image and video URLs are validated with validateUrlWithDNS before being passed to the VoyageAI API, preventing SSRF via crafted URLs.
  • The apiKey param correctly uses user-only visibility (not hidden) across all three tools, consistent with project policy.
  • Internal authentication is enforced via checkInternalAuth on the multimodal proxy route.
  • No secrets are logged; request IDs are used for traceability.
  • No other security concerns identified.
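For readers unfamiliar with DNS-based SSRF checks: a validator of this kind typically resolves the hostname and refuses to fetch when the address falls in a private, loopback, or link-local range. The sketch below is not the project's validateUrlWithDNS (its rules are not shown in this PR view); it covers IPv4 only and takes an already-resolved address so it runs offline. isAllowedProtocol and isBlockedIPv4 are hypothetical names:

```typescript
// Sketch only: not the project's validateUrlWithDNS. IPv4 only; DNS resolution
// (for example with lookup from node:dns/promises) is left to the caller so this runs offline.

// Only plain web URLs should ever be fetched server-side.
function isAllowedProtocol(rawUrl: string): boolean {
  try {
    const { protocol } = new URL(rawUrl)
    return protocol === 'https:' || protocol === 'http:'
  } catch {
    return false // not a parseable absolute URL
  }
}

// True when a resolved IPv4 address points somewhere a server-side fetch must not reach.
function isBlockedIPv4(ip: string): boolean {
  if (!/^\d{1,3}(\.\d{1,3}){3}$/.test(ip)) return true // unparseable: block by default
  const octets = ip.split('.').map(Number)
  if (octets.some((o) => o > 255)) return true
  const [a, b] = octets
  return (
    a === 0 || // "this" network
    a === 10 || // private
    a === 127 || // loopback
    (a === 100 && b >= 64 && b <= 127) || // carrier-grade NAT
    (a === 169 && b === 254) || // link-local, including cloud metadata at 169.254.169.254
    (a === 172 && b >= 16 && b <= 31) || // private
    (a === 192 && b === 168) || // private
    a >= 224 // multicast and reserved
  )
}
```

In a real validator the address that was checked should also be the one actually connected to; resolving once to validate and again to fetch leaves a DNS-rebinding window.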

Important Files Changed

| Filename | Overview |
|---|---|
| apps/sim/blocks/blocks/voyageai.ts | New VoyageAI block with embeddings, multimodal embeddings, and rerank operations; canonicalParamId equals the subblock id for imageFiles and videoFile, violating the documented constraint. |
| apps/sim/tools/voyageai/embeddings.ts | Text embeddings tool; transformResponse does not check response.ok, so API errors produce a cryptic TypeError instead of the actual error message. |
| apps/sim/tools/voyageai/rerank.ts | Rerank tool; same missing response.ok guard as embeddings.ts, plus truncation param declared in types but not used here. |
| apps/sim/tools/voyageai/types.ts | Type definitions for all three operations; truncation is declared on both VoyageAIEmbeddingsParams and VoyageAIRerankParams but never wired into any tool or block. |
| apps/sim/app/api/tools/voyageai/multimodal-embeddings/route.ts | Internal proxy route for multimodal embeddings; properly validates input with Zod, uses checkInternalAuth, validates URLs with DNS, and handles all file/URL content types correctly. |
| apps/sim/tools/voyageai/multimodal-embeddings.ts | Multimodal embeddings tool; routes through the internal proxy (correct pattern for file handling), and response transformation delegates to the route's structured output. |
| apps/sim/tools/voyageai/voyageai.test.ts | Comprehensive unit tests for all three tools; uses vi.resetAllMocks() in afterEach, which conflicts with the project's testing guidelines. |
| apps/sim/tools/voyageai/voyageai.integration.test.ts | Integration tests skipped without VOYAGEAI_API_KEY; uses undici to bypass global fetch mock correctly, covers embeddings, rerank, and multimodal scenarios. |
| apps/sim/blocks/blocks/voyageai.test.ts | Block-level unit tests with solid coverage of subBlock structure, tool routing, and params mapping; all assertions look correct. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[VoyageAI Block] --> B{Operation}
    B -->|embeddings| C[embeddingsTool]
    B -->|rerank| D[rerankTool]
    B -->|multimodal| E[multimodalEmbeddingsTool]
    C -->|POST direct| F[VoyageAI Embeddings API]
    D -->|POST direct| G[VoyageAI Rerank API]
    E -->|POST proxy| H[Internal Multimodal Route]
    H --> I{Content type}
    I -->|Text| J[content: text]
    I -->|imageFiles| K[base64 encode via storage]
    I -->|imageUrls| L[validateUrlWithDNS]
    I -->|videoFile| M[base64 encode via storage]
    I -->|videoUrl| N[validateUrlWithDNS]
    J & K & L & M & N --> O[VoyageAI Multimodal API]
    F & G & O --> P[embeddings / results / usage]
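The routing in the flowchart can be expressed as a lookup from operation to tool, with a flag for the one path that goes through the internal proxy. A sketch under stated assumptions: only the 'multimodal_embeddings' operation value and the voyageai_multimodal_embeddings tool ID appear in this PR view; the 'embeddings' and 'rerank' values, the other two tool IDs, and selectTool itself are hypothetical:

```typescript
// Operation values: only 'multimodal_embeddings' is confirmed by this PR view.
type VoyageOperation = 'embeddings' | 'multimodal_embeddings' | 'rerank'

const TOOL_BY_OPERATION: Record<VoyageOperation, string> = {
  embeddings: 'voyageai_embeddings', // assumed tool ID
  multimodal_embeddings: 'voyageai_multimodal_embeddings', // named in the commit message
  rerank: 'voyageai_rerank', // assumed tool ID
}

// Per the flowchart, only the multimodal tool posts to the internal proxy route;
// the other two call the VoyageAI API directly.
const PROXIED_OPERATIONS: ReadonlySet<VoyageOperation> = new Set<VoyageOperation>([
  'multimodal_embeddings',
])

// Hypothetical selector: map the block's operation dropdown value to a tool.
function selectTool(operation: string): { toolId: string; viaProxy: boolean } {
  // hasOwnProperty avoids matching inherited keys such as "toString".
  if (!Object.prototype.hasOwnProperty.call(TOOL_BY_OPERATION, operation)) {
    throw new Error(`Unknown VoyageAI operation: ${operation}`)
  }
  const op = operation as VoyageOperation
  return { toolId: TOOL_BY_OPERATION[op], viaProxy: PROXIED_OPERATIONS.has(op) }
}
```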

Reviews (1): Last reviewed commit: "revert: drop all MongoDB connection stri..."

Comment on lines +76 to +94
  id: 'imageFiles',
  title: 'Image Files',
  type: 'file-upload',
  canonicalParamId: 'imageFiles',
  placeholder: 'Upload image files',
  condition: { field: 'operation', value: 'multimodal_embeddings' },
  mode: 'basic',
  multiple: true,
  acceptedTypes: '.jpg,.jpeg,.png,.gif,.webp',
},
{
  id: 'imageFilesRef',
  title: 'Image Files',
  type: 'short-input',
  canonicalParamId: 'imageFiles',
  placeholder: 'Reference image files from previous blocks',
  condition: { field: 'operation', value: 'multimodal_embeddings' },
  mode: 'advanced',
},

P1 canonicalParamId must not match a subblock's own id

Both imageFiles (line 76) and videoFile (line 104) subblocks use the same string for id and canonicalParamId. The documented canonical-param rule explicitly states that canonicalParamId must not match any subblock's id — the canonical ID must be distinct from all raw subblock IDs so the framework can delete raw params after canonical transformation without accidentally removing the canonical value itself.

Suggested change
  id: 'imageFiles',
  title: 'Image Files',
  type: 'file-upload',
  canonicalParamId: 'imageFiles',
  placeholder: 'Upload image files',
  condition: { field: 'operation', value: 'multimodal_embeddings' },
  mode: 'basic',
  multiple: true,
  acceptedTypes: '.jpg,.jpeg,.png,.gif,.webp',
},
{
  id: 'imageFilesRef',
  title: 'Image Files',
  type: 'short-input',
  canonicalParamId: 'imageFiles',
  placeholder: 'Reference image files from previous blocks',
  condition: { field: 'operation', value: 'multimodal_embeddings' },
  mode: 'advanced',
},
{
  id: 'imageFilesUpload',
  title: 'Image Files',
  type: 'file-upload',
  canonicalParamId: 'imageFiles',
  placeholder: 'Upload image files',
  condition: { field: 'operation', value: 'multimodal_embeddings' },
  mode: 'basic',
  multiple: true,
  acceptedTypes: '.jpg,.jpeg,.png,.gif,.webp',
},

Apply the same fix to the videoFile subblock (id: 'videoFileUpload', canonicalParamId: 'videoFile').
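The rule this comment describes can be checked mechanically. A minimal sketch, assuming a simplified SubBlockLike shape with only id, canonicalParamId, and mode; findCanonicalIdCollisions is a hypothetical helper, and videoFileRef is an assumed ID for the advanced-mode video input, which this view does not show:

```typescript
// Minimal subblock shape: only the fields this check needs (assumed).
interface SubBlockLike {
  id: string
  canonicalParamId?: string
  mode?: 'basic' | 'advanced'
}

// Returns every canonicalParamId that is also used as some subblock's own id.
function findCanonicalIdCollisions(subBlocks: SubBlockLike[]): string[] {
  const ids = new Set(subBlocks.map((sb) => sb.id))
  const collisions = new Set<string>()
  for (const sb of subBlocks) {
    if (sb.canonicalParamId !== undefined && ids.has(sb.canonicalParamId)) {
      collisions.add(sb.canonicalParamId)
    }
  }
  return Array.from(collisions)
}

// Layout after the suggested fix: distinct raw ids sharing one canonical id per input.
const fixedSubBlocks: SubBlockLike[] = [
  { id: 'imageFilesUpload', canonicalParamId: 'imageFiles', mode: 'basic' },
  { id: 'imageFilesRef', canonicalParamId: 'imageFiles', mode: 'advanced' },
  { id: 'videoFileUpload', canonicalParamId: 'videoFile', mode: 'basic' },
  { id: 'videoFileRef', canonicalParamId: 'videoFile', mode: 'advanced' }, // assumed id
]
```

Running a check like this over every block config in a unit test would catch the same mistake in future integrations.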

Comment on lines +58 to +70
transformResponse: async (response) => {
  const data = await response.json()
  return {
    success: true,
    output: {
      embeddings: data.data.map((item: { embedding: number[] }) => item.embedding),
      model: data.model,
      usage: {
        total_tokens: data.usage.total_tokens,
      },
    },
  }
},

P1 No response.ok guard exposes cryptic errors to users

transformResponse unconditionally calls data.data.map(...) without checking response.ok. When the VoyageAI API returns a 401/429/500, data.data is undefined and the call throws TypeError: Cannot read properties of undefined (reading 'map'). Users see that message instead of the actual API error. The same issue exists in rerank.ts (transformResponse, line 67).

Suggested change
transformResponse: async (response) => {
  const data = await response.json()
  return {
    success: true,
    output: {
      embeddings: data.data.map((item: { embedding: number[] }) => item.embedding),
      model: data.model,
      usage: {
        total_tokens: data.usage.total_tokens,
      },
    },
  }
},
transformResponse: async (response) => {
  const data = await response.json()
  if (!response.ok) {
    throw new Error(data.detail ?? data.message ?? `VoyageAI API error: ${response.status}`)
  }
  return {
    success: true,
    output: {
      embeddings: data.data.map((item: { embedding: number[] }) => item.embedding),
      model: data.model,
      usage: {
        total_tokens: data.usage.total_tokens,
      },
    },
  }
},
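The suggested guard still calls response.json() before checking response.ok, so an error body that is not JSON (for example an HTML page from a gateway) would throw a SyntaxError instead of a useful message. A minimal sketch that reads the body as text first and parses it defensively; the response shape is assumed from the fields used in the snippets above, and parseEmbeddingsResponse and tryParseJson are hypothetical helpers:

```typescript
// Assumed response shape, based on the fields the tool reads above.
interface VoyageEmbeddingsBody {
  data?: { embedding: number[] }[]
  model?: string
  usage?: { total_tokens?: number }
  detail?: string
  message?: string
}

interface EmbeddingsOutput {
  embeddings: number[][]
  model: string
  usage: { total_tokens: number }
}

// Parse JSON without throwing; error pages from proxies are often not JSON.
function tryParseJson(text: string): VoyageEmbeddingsBody | undefined {
  try {
    return JSON.parse(text) as VoyageEmbeddingsBody
  } catch {
    return undefined
  }
}

function parseEmbeddingsResponse(ok: boolean, status: number, bodyText: string): EmbeddingsOutput {
  const data = tryParseJson(bodyText)
  if (!ok) {
    // Prefer the API's own message, then fall back to a truncated raw body.
    const detail = data?.detail ?? data?.message ?? bodyText.slice(0, 200)
    throw new Error(`VoyageAI API error ${status}: ${detail || 'empty response body'}`)
  }
  if (!data || !Array.isArray(data.data)) {
    throw new Error('VoyageAI API returned an unexpected response shape')
  }
  return {
    embeddings: data.data.map((item) => item.embedding),
    model: data.model ?? '',
    usage: { total_tokens: data.usage?.total_tokens ?? 0 },
  }
}
```

Inside transformResponse, the helper would be called with response.ok, response.status, and the result of await response.text(); the same pattern would apply to rerank.ts.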

input: string | string[]
model?: string
inputType?: 'query' | 'document'
truncation?: boolean

P2 truncation param defined in types but never used

VoyageAIEmbeddingsParams.truncation (line 8) and VoyageAIRerankParams.truncation (line 17) are declared but never referenced in the tool definitions, request bodies, or block config. Remove them to avoid dead code, or wire them into the request bodies if the feature is intended.

Suggested change
truncation?: boolean
export interface VoyageAIEmbeddingsParams {
  apiKey: string
  input: string | string[]
  model?: string
  inputType?: 'query' | 'document'
}
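If truncation is kept rather than removed, wiring it in is a small change to the request body. A sketch, assuming the tool builds its body from VoyageAIEmbeddingsParams; buildEmbeddingsBody is hypothetical, while input, model, input_type, and truncation are field names from VoyageAI's public embeddings API (worth re-checking against the current reference):

```typescript
// Params as declared in types.ts (fields copied from the review snippet above).
interface VoyageAIEmbeddingsParams {
  apiKey: string
  input: string | string[]
  model?: string
  inputType?: 'query' | 'document'
  truncation?: boolean
}

// Hypothetical body builder using VoyageAI's embeddings API field names.
// The apiKey is not part of the body; it is assumed to be sent as an
// "Authorization: Bearer ..." header elsewhere.
function buildEmbeddingsBody(params: VoyageAIEmbeddingsParams): Record<string, unknown> {
  const body: Record<string, unknown> = {
    input: params.input,
    model: params.model ?? 'voyage-3.5', // default named in the commit message
  }
  // Optional fields are only sent when set, so the API's own defaults apply otherwise.
  if (params.inputType !== undefined) body.input_type = params.inputType
  if (params.truncation !== undefined) body.truncation = params.truncation
  return body
}
```

Omitting truncation when it is unset keeps VoyageAI's default (truncation enabled), so existing workflows would behave the same; the rerank endpoint accepts a truncation flag as well.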

Comment on lines +17 to +20
afterEach(() => {
  tester.cleanup()
  vi.resetAllMocks()
})

P2 afterEach(vi.resetAllMocks) anti-pattern per testing guidelines

The testing rules specify beforeEach(() => vi.clearAllMocks()) with no redundant afterEach. Using vi.resetAllMocks() in afterEach is also more aggressive (resets mock implementations too), which can silently mask test-isolation issues. Remove the afterEach calls and keep only beforeEach(() => vi.clearAllMocks()).

Suggested change
afterEach(() => {
  tester.cleanup()
  vi.resetAllMocks()
})
beforeEach(() => {
  tester = new ToolTester(embeddingsTool as any)
  vi.clearAllMocks()
})

Apply the same change to the Rerank Tool describe block.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Reviewed by Cursor Bugbot for commit 7a6ee14.

.split(/[,\n]/)
.map((u) => u.trim())
.filter(Boolean)
}

JSON.parse of imageUrls may return non-array value

Medium Severity

JSON.parse(params.imageUrls) can succeed but return a non-array value (e.g., a plain string if the input is a JSON-quoted URL such as "https://example.com/img.jpg", quotes included). In that case, urls is assigned a string instead of a string[], and the subsequent for (const url of urls) loop iterates over individual characters, each of which fails URL validation with a confusing error. Adding an Array.isArray check after JSON.parse and falling back to the split logic would prevent this.
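A sketch of the fix Bugbot describes, assuming imageUrls arrives as a string (or an already-parsed array); parseImageUrls is a hypothetical name, and the comma/newline fallback mirrors the snippet above. One adjustment over a plain fallback: a JSON string result is unwrapped to a single URL, because splitting the raw input would keep the surrounding quote characters and the URL would still fail validation:

```typescript
// Hypothetical parser for the imageUrls param: JSON array, JSON string,
// or a comma/newline-separated list.
function parseImageUrls(raw: string | string[] | undefined): string[] {
  if (raw === undefined) return []
  if (Array.isArray(raw)) return raw.map((u) => u.trim()).filter(Boolean)
  const text = raw.trim()
  // Only attempt JSON when the input looks like a JSON array or string.
  if (text.startsWith('[') || text.startsWith('"')) {
    try {
      const parsed: unknown = JSON.parse(text)
      if (Array.isArray(parsed)) {
        return parsed
          .filter((u): u is string => typeof u === 'string')
          .map((u) => u.trim())
          .filter(Boolean)
      }
      if (typeof parsed === 'string') {
        return parsed.trim() ? [parsed.trim()] : []
      }
    } catch {
      // Not valid JSON: fall through to the split logic below.
    }
  }
  return text
    .split(/[,\n]/)
    .map((u) => u.trim())
    .filter(Boolean)
}
```

Non-string array entries are silently dropped here; rejecting them with an explicit error would be the stricter alternative.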

Reviewed by Cursor Bugbot for commit 7a6ee14.

<path
  d='M12 2L3 7v10l9 5 9-5V7l-9-5zm0 2.18L18.36 7.5 12 10.82 5.64 7.5 12 4.18zM5 8.82l6 3.32v7.04l-6-3.32V8.82zm8 10.36V15.8l6-3.32v3.7l-6 3z'
  fill='currentColor'
/>

SVG icon right face has incorrect path coordinates

Low Severity

The VoyageAIIcon SVG path draws a 3D hexagonal shape where the right face has incorrect coordinates. The left face spans from y=8.82 to y=19.18, but the right face only spans from y=12.48 to y=19.18 — shifted ~3.66 units downward and much smaller. The right face doesn't connect to the top diamond, creating an asymmetric, visually broken icon instead of a proper 3D cube/prism.
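Assuming the icon is meant to be left-right symmetric, the right face can be derived by mirroring the left face across the vertical center line x = 12 (the outer hexagon spans x = 3 to 21, consistent with a 24-unit-wide viewBox). A small sketch that computes the replacement subpath; mirrorX and toPath are hypothetical helpers:

```typescript
type Point = [number, number]

// Left face vertices, read from "M5 8.82l6 3.32v7.04l-6-3.32V8.82z" in the icon.
const leftFace: Point[] = [
  [5, 8.82],
  [11, 12.14],
  [11, 19.18],
  [5, 15.86],
]

// Mirror each vertex across the vertical line x = axis; y is unchanged.
function mirrorX(points: Point[], axis = 12): Point[] {
  return points.map(([x, y]): Point => [2 * axis - x, y])
}

// Serialize as an absolute, closed SVG subpath.
function toPath(points: Point[]): string {
  return points.map(([x, y], i) => `${i === 0 ? 'M' : 'L'}${x} ${y}`).join(' ') + ' Z'
}

const rightFace = mirrorX(leftFace)
// toPath(rightFace) === 'M19 8.82 L13 12.14 L13 19.18 L19 15.86 Z'
```

In the icon's relative style, the same four points would read M19 8.82l-6 3.32v7.04l6-3.32V8.82z, which could replace the current last subpath.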

Reviewed by Cursor Bugbot for commit 7a6ee14.
