Feat/voyageai: adding voyageai integration by fzowl · Pull Request #4070 · simstudioai/sim

fzowl · 2026-04-09T06:16:40Z

Summary

Brief description of what this PR does and why.

Fixes #(issue)

Type of Change

Testing

I added unit tests, integration tests and also tested manually.

Checklist

Code follows project style guidelines
Self-reviewed my changes
Tests added/updated and passing
No new warnings introduced
I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

Screenshots/Videos

…nnection string support
- Add VoyageAI tools: embeddings (voyage-3, voyage-3-large, etc.) and rerank (rerank-2, rerank-2-lite)
- Add VoyageAI block with operation dropdown (Generate Embeddings / Rerank)
- Add VoyageAI icon and register in tool/block registries
- Enhance MongoDB with connection string mode for Atlas (mongodb+srv://) support
- Add connection mode toggle to MongoDB block (Host & Port / Connection String)
- Update all 6 MongoDB API routes to accept optional connectionString
- Add 48 unit tests (VoyageAI tools, block config, MongoDB utils)

…geAI and MongoDB
- Expand VoyageAI tool tests: metadata, all models, edge cases, error codes (60 tests)
- Expand VoyageAI block tests: structure, subBlocks, conditions, params edge cases (44 tests)
- Expand MongoDB utils tests: connection modes, URI building, all validators (56 tests)
- Add live integration tests: embeddings (7 models/scenarios), rerank (5 scenarios), e2e workflow
- Integration tests use undici to bypass global fetch mock
- Tests skip gracefully when VOYAGEAI_API_KEY env var is not set

- Add voyage-4-large, voyage-4, voyage-4-lite embedding models
- Add voyage-3.5, voyage-3.5-lite embedding models
- Add rerank-2.5, rerank-2.5-lite reranking models
- Default embeddings model: voyage-3.5
- Default rerank model: rerank-2.5
- All models verified working with live API

…tegration
- New tool: voyageai_multimodal_embeddings using voyage-multimodal-3.5 model
- New API route: /api/tools/voyageai/multimodal-embeddings for server-side file handling
- Supports text, image files/URLs, video files/URLs in a single embedding
- Uses file-upload subBlocks with basic/advanced mode for images and video
- Internal proxy pattern: downloads UserFiles via downloadFileFromStorage, converts to base64
- URL validation via validateUrlWithDNS for SSRF protection
- 14 new unit tests (tool metadata, body, response transform)
- 5 new integration tests (text-only, image URL, text+image, dimensions, auth)
- 8 new block tests (multimodal operation, params, subBlocks)

Remove non-TSDoc separator comments, fix relative import in barrel export, fix any types, and apply biome formatting fixes.

Reverts MongoDB Atlas connection string support due to validation issues in the Zod schemas. VoyageAI integration remains intact.

cursor · 2026-04-09T06:16:50Z

PR Summary

Medium Risk
Adds new external API integrations (VoyageAI) including a server-side proxy that processes user-provided files/URLs into base64, which increases risk around request validation, SSRF/file handling, and payload sizes despite added auth and URL validation.

Overview
Introduces a new Voyage AI workflow block with operations for text embeddings, multimodal embeddings (text + images/videos), and reranking, including UI subblocks for models, input types, and file/URL inputs.

Registers three new tools (plus types) in the tools registry and adds a new internal Next.js route POST /api/tools/voyageai/multimodal-embeddings that authenticates internally, validates media URLs, converts uploaded/stored media to base64 data URLs, and forwards requests to VoyageAI.

Adds a VoyageAIIcon, updates .gitignore for .playwright-mcp/, and includes comprehensive unit + optional live API integration tests for the new block/tools.

^{Reviewed by Cursor Bugbot for commit 7a6ee14. Bugbot is set up for automated code reviews on this repo. Configure here.}

vercel · 2026-04-09T06:16:52Z

@fzowl is attempting to deploy a commit to the Sim Team on Vercel.

A member of the Team first needs to authorize it.

gitguardian · 2026-04-09T06:17:22Z

⚠️ GitGuardian has uncovered 1 secret following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

Since your pull request originates from a forked repository, GitGuardian is not able to associate the secrets uncovered with secret incidents on your GitGuardian dashboard.
Skipping this check run and merging your pull request will create secret incidents on your GitGuardian dashboard.

🔎 Detected hardcoded secret in your pull request

GitGuardian id	GitGuardian status	Secret	Commit	Filename
-	-	MongoDB Credentials	`2d28d8b`	apps/sim/app/api/tools/mongodb/utils.test.ts	View secret

🛠 Guidelines to remediate hardcoded secrets

Understand the implications of revoking this secret by investigating where it is used in your code.
Replace and store your secret safely. Learn here the best practices.
Revoke and rotate this secret.
If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.

To avoid such incidents in the future consider

following these best practices for managing and storing secrets including API keys and other credentials
install secret detection on pre-commit to catch secret before it leaves your machine and ease remediation.

greptile-apps · 2026-04-09T06:21:17Z

Greptile Summary

This PR adds a VoyageAI integration covering text embeddings, multimodal embeddings (images + video via an internal proxy route), and document reranking, with matching block, registry entries, and a comprehensive test suite.

Two issues need attention before merge:

canonicalParamId equals the subblock's own id for both the imageFiles and videoFile subblocks in voyageai.ts. This violates the documented critical constraint and could cause params to be dropped during canonical transformation.
transformResponse in embeddings.ts and rerank.ts does not check response.ok, so non-2xx API responses (e.g. 401, 429) produce a cryptic TypeError: Cannot read properties of undefined instead of surfacing the actual VoyageAI error message.

Confidence Score: 3/5

Two P1 issues — canonicalParamId constraint violation and missing response.ok guards — should be fixed before merging.

The canonicalParamId === id violation is a documented critical rule that may cause the canonical param transformation layer to drop file input values at runtime. The missing response.ok check means real API errors (invalid key, rate limit) produce opaque TypeErrors rather than actionable messages. Both are present on the changed code paths and need resolution.

apps/sim/blocks/blocks/voyageai.ts (canonicalParamId constraint), apps/sim/tools/voyageai/embeddings.ts and rerank.ts (response error handling)

Vulnerabilities

Image and video URLs are validated with validateUrlWithDNS before being passed to the VoyageAI API, preventing SSRF via crafted URLs.
The apiKey param correctly uses user-only visibility (not hidden) across all three tools, consistent with project policy.
Internal authentication is enforced via checkInternalAuth on the multimodal proxy route.
No secrets are logged; request IDs are used for traceability.
No other security concerns identified.

Important Files Changed

Filename	Overview
apps/sim/blocks/blocks/voyageai.ts	New VoyageAI block with embeddings, multimodal embeddings, and rerank operations; canonicalParamId equals the subblock id for imageFiles and videoFile, violating the documented constraint.
apps/sim/tools/voyageai/embeddings.ts	Text embeddings tool; transformResponse does not check response.ok, so API errors produce a cryptic TypeError instead of the actual error message.
apps/sim/tools/voyageai/rerank.ts	Rerank tool; same missing response.ok guard as embeddings.ts, plus truncation param declared in types but not used here.
apps/sim/tools/voyageai/types.ts	Type definitions for all three operations; truncation is declared on both VoyageAIEmbeddingsParams and VoyageAIRerankParams but never wired into any tool or block.
apps/sim/app/api/tools/voyageai/multimodal-embeddings/route.ts	Internal proxy route for multimodal embeddings; properly validates input with Zod, uses checkInternalAuth, validates URLs with DNS, and handles all file/URL content types correctly.
apps/sim/tools/voyageai/multimodal-embeddings.ts	Multimodal embeddings tool; routes through the internal proxy (correct pattern for file handling), response transformation delegates to the route's structured output.
apps/sim/tools/voyageai/voyageai.test.ts	Comprehensive unit tests for all three tools; uses vi.resetAllMocks() in afterEach which conflicts with the project's testing guidelines.
apps/sim/tools/voyageai/voyageai.integration.test.ts	Integration tests skipped without VOYAGEAI_API_KEY; uses undici to bypass global fetch mock correctly, covers embeddings, rerank, and multimodal scenarios.
apps/sim/blocks/blocks/voyageai.test.ts	Block-level unit tests with solid coverage of subBlock structure, tool routing, and params mapping; all assertions look correct.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[VoyageAI Block] --> B{Operation}
    B -->|embeddings| C[embeddingsTool]
    B -->|rerank| D[rerankTool]
    B -->|multimodal| E[multimodalEmbeddingsTool]
    C -->|POST direct| F[VoyageAI Embeddings API]
    D -->|POST direct| G[VoyageAI Rerank API]
    E -->|POST proxy| H[Internal Multimodal Route]
    H --> I{Content type}
    I -->|Text| J[content: text]
    I -->|imageFiles| K[base64 encode via storage]
    I -->|imageUrls| L[validateUrlWithDNS]
    I -->|videoFile| M[base64 encode via storage]
    I -->|videoUrl| N[validateUrlWithDNS]
    J & K & L & M & N --> O[VoyageAI Multimodal API]
    F & G & O --> P[embeddings / results / usage]

_{Reviews (1): Last reviewed commit: "revert: drop all MongoDB connection stri..." | Re-trigger Greptile}

greptile-apps · 2026-04-09T06:21:21Z

apps/sim/blocks/blocks/voyageai.ts

+      id: 'imageFiles',
+      title: 'Image Files',
+      type: 'file-upload',
+      canonicalParamId: 'imageFiles',
+      placeholder: 'Upload image files',
+      condition: { field: 'operation', value: 'multimodal_embeddings' },
+      mode: 'basic',
+      multiple: true,
+      acceptedTypes: '.jpg,.jpeg,.png,.gif,.webp',
+    },
+    {
+      id: 'imageFilesRef',
+      title: 'Image Files',
+      type: 'short-input',
+      canonicalParamId: 'imageFiles',
+      placeholder: 'Reference image files from previous blocks',
+      condition: { field: 'operation', value: 'multimodal_embeddings' },
+      mode: 'advanced',
+    },


canonicalParamId must not match a subblock's own id

Both imageFiles (line 76) and videoFile (line 104) subblocks use the same string for id and canonicalParamId. The documented canonical-param rule explicitly states that canonicalParamId must not match any subblock's id — the canonical ID must be distinct from all raw subblock IDs so the framework can delete raw params after canonical transformation without accidentally removing the canonical value itself.

Suggested change

-      id: 'imageFiles',
-      title: 'Image Files',
-      type: 'file-upload',
-      canonicalParamId: 'imageFiles',
-      placeholder: 'Upload image files',
-      condition: { field: 'operation', value: 'multimodal_embeddings' },
-      mode: 'basic',
-      multiple: true,
-      acceptedTypes: '.jpg,.jpeg,.png,.gif,.webp',
-    },
-    {
-      id: 'imageFilesRef',
-      title: 'Image Files',
-      type: 'short-input',
-      canonicalParamId: 'imageFiles',
-      placeholder: 'Reference image files from previous blocks',
-      condition: { field: 'operation', value: 'multimodal_embeddings' },
-      mode: 'advanced',
-    },
+    {
+      id: 'imageFilesUpload',
+      title: 'Image Files',
+      type: 'file-upload',
+      canonicalParamId: 'imageFiles',
+      placeholder: 'Upload image files',
+      condition: { field: 'operation', value: 'multimodal_embeddings' },
+      mode: 'basic',
+      multiple: true,
+      acceptedTypes: '.jpg,.jpeg,.png,.gif,.webp',
+    },

Apply the same fix to the videoFile subblock (id: 'videoFileUpload', canonicalParamId: 'videoFile').
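
The constraint described in this comment can also be checked mechanically. The following is a minimal sketch, assuming a simplified `SubBlock` shape that carries only `id` and `canonicalParamId` (the real block config type has more fields). `findCanonicalIdCollisions` is a hypothetical helper for illustration, not something that exists in the repository.

```typescript
// Simplified stand-in for the project's subblock config (assumption).
interface SubBlock {
  id: string
  canonicalParamId?: string
}

// Hypothetical helper: returns every canonicalParamId that is also used as a raw subblock id.
function findCanonicalIdCollisions(subBlocks: SubBlock[]): string[] {
  const rawIds = new Set(subBlocks.map((sb) => sb.id))
  const collisions = new Set<string>()
  for (const sb of subBlocks) {
    if (sb.canonicalParamId && rawIds.has(sb.canonicalParamId)) {
      collisions.add(sb.canonicalParamId)
    }
  }
  return Array.from(collisions)
}

// Shape flagged in the review: the upload subblock's id equals the canonical id.
const flagged: SubBlock[] = [
  { id: 'imageFiles', canonicalParamId: 'imageFiles' },
  { id: 'imageFilesRef', canonicalParamId: 'imageFiles' },
]

// Shape after renaming the upload subblock, as the review suggests.
const renamed: SubBlock[] = [
  { id: 'imageFilesUpload', canonicalParamId: 'imageFiles' },
  { id: 'imageFilesRef', canonicalParamId: 'imageFiles' },
]

console.log(findCanonicalIdCollisions(flagged))
console.log(findCanonicalIdCollisions(renamed))
```

A check like this could run in a block-config unit test so that a collision fails at test time rather than surfacing as a dropped param at runtime.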

greptile-apps · 2026-04-09T06:21:22Z

apps/sim/tools/voyageai/embeddings.ts

+  transformResponse: async (response) => {
+    const data = await response.json()
+    return {
+      success: true,
+      output: {
+        embeddings: data.data.map((item: { embedding: number[] }) => item.embedding),
+        model: data.model,
+        usage: {
+          total_tokens: data.usage.total_tokens,
+        },
+      },
+    }
+  },


No response.ok guard exposes cryptic errors to users

transformResponse unconditionally calls data.data.map(...) without checking response.ok. When the VoyageAI API returns a 401/429/500, data.data is undefined and the call throws TypeError: Cannot read properties of undefined (reading 'map'). Users see that message instead of the actual API error. The same issue exists in rerank.ts (transformResponse, line 67).

Suggested change

-  transformResponse: async (response) => {
-    const data = await response.json()
-    return {
-      success: true,
-      output: {
-        embeddings: data.data.map((item: { embedding: number[] }) => item.embedding),
-        model: data.model,
-        usage: {
-          total_tokens: data.usage.total_tokens,
-        },
-      },
-    }
-  },
+  transformResponse: async (response) => {
+    const data = await response.json()
+    if (!response.ok) {
+      throw new Error(data.detail ?? data.message ?? `VoyageAI API error: ${response.status}`)
+    }
+    return {
+      success: true,
+      output: {
+        embeddings: data.data.map((item: { embedding: number[] }) => item.embedding),
+        model: data.model,
+        usage: {
+          total_tokens: data.usage.total_tokens,
+        },
+      },
+    }
+  },
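
One caveat with the suggested guard: it calls `response.json()` before checking `response.ok`, so an error response whose body is not JSON (for example an HTML page from a gateway) would still fail with a parse error. The sketch below reads the body as text first and only then attempts to parse it. It assumes the standard Fetch `Response` type and reuses the `detail` and `message` field names from the suggestion above. `parseEmbeddingsResponse` is a hypothetical helper name, not part of the PR.

```typescript
interface EmbeddingsOutput {
  embeddings: number[][]
  model: string
  usage: { total_tokens: number }
}

// Hypothetical helper: checks the status before relying on the payload shape.
async function parseEmbeddingsResponse(response: Response): Promise<EmbeddingsOutput> {
  const raw = await response.text()
  let data: any = null
  try {
    data = raw ? JSON.parse(raw) : null
  } catch {
    // Non-JSON body (e.g. an HTML error page); continue with data = null.
  }

  if (!response.ok) {
    // Prefer the API's own message, then the raw body, then the status text.
    const detail = data?.detail ?? data?.message ?? (raw || response.statusText)
    throw new Error(`VoyageAI API error (${response.status}): ${detail}`)
  }
  if (!Array.isArray(data?.data)) {
    throw new Error('VoyageAI API returned an unexpected response shape')
  }

  return {
    embeddings: data.data.map((item: { embedding: number[] }) => item.embedding),
    model: data.model,
    usage: { total_tokens: data.usage.total_tokens },
  }
}
```

The same pattern would apply to rerank.ts, with the mapping step adjusted to whatever fields that tool returns.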

greptile-apps · 2026-04-09T06:21:23Z

apps/sim/tools/voyageai/types.ts

+  input: string | string[]
+  model?: string
+  inputType?: 'query' | 'document'
+  truncation?: boolean


truncation param defined in types but never used

VoyageAIEmbeddingsParams.truncation (line 8) and VoyageAIRerankParams.truncation (line 17) are declared but never referenced in the tool definitions, request bodies, or block config. Remove them to avoid dead code, or wire them into the request bodies if the feature is intended.

Suggested change

-  truncation?: boolean
+export interface VoyageAIEmbeddingsParams {
+  apiKey: string
+  input: string | string[]
+  model?: string
+  inputType?: 'query' | 'document'
+}
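
If `truncation` is meant to be supported rather than removed, one option is to forward it only when the caller sets it. The sketch below assumes the request body uses the snake_case fields `input`, `model`, `input_type`, and `truncation`, and takes `voyage-3.5` as the default model from the PR description. `buildEmbeddingsBody` is a hypothetical name for illustration; the PR's actual body builder may look different.

```typescript
interface VoyageAIEmbeddingsParams {
  apiKey: string
  input: string | string[]
  model?: string
  inputType?: 'query' | 'document'
  truncation?: boolean
}

// Hypothetical body builder: optional fields are omitted when unset so the
// API's own defaults apply.
function buildEmbeddingsBody(params: VoyageAIEmbeddingsParams): Record<string, unknown> {
  const body: Record<string, unknown> = {
    input: Array.isArray(params.input) ? params.input : [params.input],
    model: params.model ?? 'voyage-3.5',
  }
  if (params.inputType) body.input_type = params.inputType
  // Compare against undefined so an explicit `false` is still forwarded.
  if (params.truncation !== undefined) body.truncation = params.truncation
  return body
}

console.log(buildEmbeddingsBody({ apiKey: 'example-key', input: 'hello', truncation: false }))
```

Checking `!== undefined` instead of truthiness matters here, because `truncation: false` is a meaningful value that a truthiness check would silently drop.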

greptile-apps · 2026-04-09T06:21:24Z

apps/sim/tools/voyageai/voyageai.test.ts

+  afterEach(() => {
+    tester.cleanup()
+    vi.resetAllMocks()
+  })


afterEach(vi.resetAllMocks) anti-pattern per testing guidelines

The testing rules specify beforeEach(() => vi.clearAllMocks()) with no redundant afterEach. Using vi.resetAllMocks() in afterEach is also more aggressive (resets mock implementations too), which can silently mask test-isolation issues. Remove the afterEach calls and keep only beforeEach(() => vi.clearAllMocks()).

Suggested change

-  afterEach(() => {
-    tester.cleanup()
-    vi.resetAllMocks()
-  })
+  beforeEach(() => {
+    tester = new ToolTester(embeddingsTool as any)
+    vi.clearAllMocks()
+  })

Apply the same change to the Rerank Tool describe block.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.

cursor · 2026-04-09T06:23:27Z

apps/sim/app/api/tools/voyageai/multimodal-embeddings/route.ts

+          .split(/[,\n]/)
+          .map((u) => u.trim())
+          .filter(Boolean)
+      }


JSON.parse of imageUrls may return non-array value

Medium Severity

JSON.parse(params.imageUrls) can succeed but return a non-array value (e.g., a plain string if input is a quoted URL like "\"https://example.com/img.jpg\""). In that case, urls is assigned a string instead of string[], and the subsequent for (const url of urls) loop iterates over individual characters, each failing URL validation with a confusing error. Adding an Array.isArray check after JSON.parse and falling back to the split logic would prevent this.
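
A minimal sketch of the guard described above, assuming the route receives `imageUrls` as a single string. `parseUrlList` and `splitUrls` are hypothetical helper names; the comma/newline split mirrors the fallback shown in the snippet.

```typescript
// Splits a delimited string into trimmed, non-empty entries.
function splitUrls(value: string): string[] {
  return value
    .split(/[,\n]/)
    .map((u) => u.trim())
    .filter(Boolean)
}

// Hypothetical helper: accepts a JSON array, a JSON string, or a
// comma/newline-delimited list, and always returns string[].
function parseUrlList(raw: string): string[] {
  const trimmed = raw.trim()
  if (!trimmed) return []
  try {
    const parsed: unknown = JSON.parse(trimmed)
    if (Array.isArray(parsed)) {
      return parsed
        .filter((u): u is string => typeof u === 'string')
        .map((u) => u.trim())
        .filter(Boolean)
    }
    // JSON.parse succeeded but produced a single string, e.g. a quoted URL.
    if (typeof parsed === 'string') return splitUrls(parsed)
  } catch {
    // Not valid JSON; fall through to delimiter splitting.
  }
  return splitUrls(trimmed)
}

console.log(parseUrlList('"https://example.com/img.jpg"'))
```

With this shape, a quoted URL yields a one-element array instead of a string that the loop would walk character by character.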

cursor · 2026-04-09T06:23:27Z

apps/sim/components/icons.tsx

+      <path
+        d='M12 2L3 7v10l9 5 9-5V7l-9-5zm0 2.18L18.36 7.5 12 10.82 5.64 7.5 12 4.18zM5 8.82l6 3.32v7.04l-6-3.32V8.82zm8 10.36V15.8l6-3.32v3.7l-6 3z'
+        fill='currentColor'
+      />


SVG icon right face has incorrect path coordinates

Low Severity

The VoyageAIIcon SVG path draws a 3D hexagonal shape where the right face has incorrect coordinates. The left face spans from y=8.82 to y=19.18, but the right face only spans from y=12.48 to y=19.18 — shifted ~3.66 units downward and much smaller. The right face doesn't connect to the top diamond, creating an asymmetric, visually broken icon instead of a proper 3D cube/prism.

fzowl added 7 commits March 23, 2026 18:38

chore: add .playwright-mcp to gitignore

0e6dac0

style: fix code review issues in VoyageAI integration

40153de

Remove non-TSDoc separator comments, fix relative import in barrel export, fix any types, and apply biome formatting fixes.

revert: drop all MongoDB connection string changes

7a6ee14

Reverts MongoDB Atlas connection string support due to validation issues in the Zod schemas. VoyageAI integration remains intact.

fzowl changed the title ~~Feat/voyageai~~ Feat/voyageai: adding voyageai integration Apr 9, 2026

greptile-apps bot reviewed Apr 9, 2026

cursor bot reviewed Apr 9, 2026

fzowl wants to merge 7 commits into simstudioai:main from fzowl:feat/voyageai-mongodb-atlas
