Deploying Orchestra as a Snowflake Native Application | The future of Postgres in Snowflake
Two things to consider here that appear unrelated but aren’t.
Foreword
One of the biggest pains we see facing data engineers and analytics engineers is that they spend too much time maintaining boilerplate infrastructure, and still have no visibility into pipeline failures.
It means you’re constantly fighting fires and don’t have time to focus on building. What’s worse is that the business doesn’t trust the data.
At Orchestra we’re building a Unified Control Plane for Data Ops. A Data Status page if you like, with some incredible features to give data teams their time back so they can focus on building. You can try it now, free, here.
Introduction
Last week we launched Orchestra as an App on the Snowflake Marketplace.
The idea here is to make things easier for teams from both an A) ecosystem and B) billing perspective.
Why manage things in two places when you can manage them in one? The same goes for billing. Time saved is money saved.
I recently heard from Marco Sloot at Snowflake in London, one of the lead engineers for Crunchy Data, the Postgres start-up Snowflake acquired earlier this year.
The vision is pretty cool - to build an RDS-Scale service from within Snowflake.
Very interesting, and very appealing. But the people in the room are confused. The people in the room are data people. They already have a database, called Snowflake. They don’t need another one, or if they do, they don’t know why they need it.
At this point it is hard to see the value of a Snowflake-deployed Postgres over simply putting a Postgres scaler on the Snowflake app marketplace. It will be interesting to see if data engineers and analytics folks begin to demand separate DBs.
Of course - the idea is that you build your application layer with the storage ON Snowflake. That, however, will require marketing to a completely different demographic, just as Databricks are trying to do.
Time will tell - I do not think you can convince software engineers to decouple storage and compute from the rest of the cloud ecosystem.
It may yet be even harder to convince AI.
Orchestra as a Snowflake Native App
We’re thrilled to announce that Orchestra is now available within Snowflake as a Native Application.
In addition to the core Orchestra services, this includes the Orchestra Metadata App, which we’ve custom-built for Snowflake users looking to capitalise on AI as a Snowflake Native Application.
Many companies rely on Data Catalogs, Data Lineage, and Observability tools to collect their Metadata. However, these approaches are not suited to an AI-Native Architecture:
Third-party tools require extensive integration with Snowflake and other tools like orchestrators
Metadata sits externally to Snowflake, which means data teams must stand up MCP Servers for lineage data to be accessible
Data is not updated in real time, which means AI Agents suffer from latency issues
When you run Orchestra on Snowflake, the Metadata App automatically moves data from the Orchestra metadata store to Snowflake, which means this metadata is available in your single source of truth (“SSOT”) in real time.
This means agents running data workflows have ready access to metadata. Agents running on Snowflake can therefore:
Easily fix and recommend changes when pipelines fail
Easily identify anomalies and “long-running jobs”
Diagnose the root cause of errors and produce impact analysis
For the Organisation:
✅ Unified Observability in your warehouse: all the data showing why your pipelines are failing, who owns what, and how long everything takes and what it costs is automatically centralised in your warehouse
✅ Cost Savings and Efficiencies: Analyse pipeline performance, monitor costs, track dependencies, and troubleshoot with unprecedented detail, all within your familiar Snowflake environment in an automated way
✅ AI and MCP ready: The Data Team don’t need to worry about building MCP-Servers to surface data for AI Agents — it’s all in Snowflake already
✅ Native Integration and security: You’re keeping architecture lean and elegant by keeping Snowflake as your data store and running Orchestra within it.
🔗 Learn more and get started with the Orchestra Metadata App today!
📚 Check out the docs here
Setting up the Orchestra Snowflake Native App
1. Head over to the listing where you can learn more about the Snowflake Native App
2. Ensure you have an Orchestra Account
3. After installing the application in Snowflake, it’s time to follow the set-up steps here and below
Set-up Steps
Install the app from the Snowflake Marketplace
Grant the External Access Integration reference when prompted
Run the setup procedure to create the API access objects
Setup
After installation, run the following to create the API access objects:
-- Create the EAI objects (run this after granting the reference)
CALL core.create_eai_objects();
This procedure creates the necessary stored procedures that can access the Orchestra API using the configured External Access Integration.
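If you want to confirm the setup worked, a standard Snowflake command lists what was created (shown as a sketch; the exact objects you see depend on the app version and your role):
-- List the procedures created in the app's core schema
SHOW PROCEDURES IN SCHEMA core;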
Usage
Fetch Metadata
The app provides three main procedures to fetch data from the Orchestra API:
-- Get pipeline runs (defaults to page 1, 100 results per page)
SELECT core.get_pipeline_runs();
-- Get task runs (defaults to page 1, 100 results per page)
SELECT core.get_task_runs();
-- Get operations (defaults to page 1, 100 results per page)
SELECT core.get_operations();
You can also specify custom pagination:
-- Get pipeline runs with custom pagination
SELECT core.get_pipeline_runs(2, 50); -- page 2, 50 results per page
Load Data into Tables
The app automatically creates the following tables in the public schema:
pipeline_runs - Stores pipeline run metadata
task_runs - Stores task run metadata
operations - Stores operation metadata
To load data into these tables:
-- Load pipeline runs data (inserts new or updates existing)
CALL core.load_pipeline_runs();
-- Load task runs data (inserts new or updates existing)
CALL core.load_task_runs();
-- Load operations data (inserts new or updates existing)
CALL core.load_operations();
Each procedure will:
Fetch the latest data from the Orchestra API
Transform the data to match the table schema
Use MERGE operations to handle existing records:
New records: Inserted into the table
Existing records: Updated with latest data from the API
Return a success message with the number of records processed
Note: These procedures are idempotent — you can run them multiple times safely without creating duplicates or errors.
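Because the load procedures are idempotent, they are easy to schedule. Below is a minimal sketch using a standard Snowflake task; the task name, warehouse, and schedule are assumptions, and you may need to fully qualify the procedure with the name you gave the installed app.
-- Hypothetical example: refresh pipeline run metadata every hour
CREATE OR REPLACE TASK refresh_orchestra_pipeline_runs
  WAREHOUSE = my_wh  -- assumed warehouse name
  SCHEDULE = '60 MINUTE'
AS
  CALL core.load_pipeline_runs();
-- Tasks are created suspended; resume to start the schedule
ALTER TASK refresh_orchestra_pipeline_runs RESUME;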
Extract Specific Fields
-- Extract specific fields from pipeline runs
SELECT
value:id::STRING as pipeline_run_id,
value:pipelineId::STRING as pipeline_id,
value:runStatus::STRING as status,
value:startedAt::TIMESTAMP_NTZ as started_at
FROM TABLE(FLATTEN(input => core.get_pipeline_runs():results));
Query the Loaded Data
Once data is loaded into tables, you can query it directly:
-- Query pipeline runs
SELECT * FROM public.pipeline_runs ORDER BY created_at DESC;
-- Query task runs with status filter
SELECT * FROM public.task_runs WHERE status = 'SUCCESS';
-- Query operations for a specific pipeline run
SELECT * FROM public.operations WHERE pipeline_run_id = 'your-pipeline-run-id';
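Because the tables share identifiers, you can also join them, for example to line failed tasks up against the pipeline runs they belong to. This is a sketch using the column names from the schemas below; the 'FAILED' status value is an assumption, so check the values loaded in your account.
-- Failed tasks alongside the pipeline run they belong to
SELECT
    pr.pipeline_name,
    pr.run_status,
    tr.task_name,
    tr.status AS task_status,
    tr.message
FROM public.pipeline_runs pr
JOIN public.task_runs tr
  ON tr.pipeline_run_id = pr.id
WHERE tr.status = 'FAILED';  -- assumed status value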
Security
API keys are handled securely through Snowflake secrets
All API calls use HTTPS
Network access is restricted to app.getorchestra.io
Proper error handling for failed API calls
External Access Integration ensures secure external API access
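If you want to verify the integration and network scope from your side, standard Snowflake commands are enough (a sketch; the object names you see depend on your installation and role):
-- Inspect the external access integrations and network rules visible to your role
SHOW EXTERNAL ACCESS INTEGRATIONS;
SHOW NETWORK RULES;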
Error Handling
The procedures return error information if API calls fail:
-- Check for errors in API response
SELECT
CASE
WHEN result:error IS NOT NULL THEN 'Error: ' || result:error::STRING
ELSE 'Success'
END as status
FROM (SELECT core.get_pipeline_runs() as result);
Table Schemas
Pipeline Runs Table
id - Unique pipeline run identifier (PRIMARY KEY, NOT NULL, UNIQUE)
pipeline_id - Pipeline identifier
pipeline_name - Name of the pipeline
account_id - Account identifier
env_id - Environment identifier
env_name - Environment name
run_status - Status of the pipeline run
message - Status message
created_at - Creation timestamp
updated_at - Last update timestamp
completed_at - Completion timestamp
started_at - Start timestamp
branch - Git branch
commit - Git commit hash
pipeline_version_number - Pipeline version
loaded_at - When the record was loaded into Snowflake
Use-case: identify long-running pipelines and see why:
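A minimal sketch of this kind of query, using the pipeline_runs columns above (the 30-minute threshold is an arbitrary example, not a recommendation):
-- Completed pipeline runs that took longer than 30 minutes, slowest first
SELECT
    pipeline_name,
    run_status,
    DATEDIFF('minute', started_at, completed_at) AS duration_minutes,
    message
FROM public.pipeline_runs
WHERE completed_at IS NOT NULL
  AND DATEDIFF('minute', started_at, completed_at) > 30
ORDER BY duration_minutes DESC;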
Task Runs Table
id - Unique task run identifier (PRIMARY KEY, NOT NULL, UNIQUE)
pipeline_run_id - Associated pipeline run
task_name - Name of the task
task_id - Task identifier
account_id - Account identifier
pipeline_id - Pipeline identifier
integration - Integration type
integration_job - Integration job name
status - Task status
message - Status message
external_status - External system status
external_message - External system message
platform_link - Link to external platform
task_parameters - Task parameters (VARIANT)
run_parameters - Run parameters (VARIANT)
connection_id - Connection identifier
number_of_attempts - Number of execution attempts
created_at - Creation timestamp
updated_at - Last update timestamp
completed_at - Completion timestamp
started_at - Start timestamp
loaded_at - When the record was loaded into Snowflake
Use-case: identify long-running task runs:
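For example, a sketch that averages task duration per task (column names are taken from the schema above; the grouping is illustrative only):
-- Average and maximum task duration by task, to spot long-running task runs
SELECT
    task_name,
    integration,
    AVG(DATEDIFF('second', started_at, completed_at)) AS avg_duration_s,
    MAX(DATEDIFF('second', started_at, completed_at)) AS max_duration_s,
    COUNT(*) AS runs
FROM public.task_runs
WHERE completed_at IS NOT NULL
GROUP BY task_name, integration
ORDER BY avg_duration_s DESC;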

Operations Table
id - Unique operation identifier (PRIMARY KEY, NOT NULL, UNIQUE)
account_id - Account identifier
pipeline_run_id - Associated pipeline run
task_run_id - Associated task run
inserted_at - Insertion timestamp
message - Operation message
operation_name - Name of the operation
operation_status - Operation status
operation_type - Type of operation
external_status - External system status
external_detail - External system details
external_id - External system identifier
integration - Integration type
integration_job - Integration job name
started_at - Start timestamp
completed_at - Completion timestamp
dependencies - Operation dependencies (VARIANT)
operation_duration - Duration in seconds
rows_affected - Number of rows affected
loaded_at - When the record was loaded into Snowflake
Use-case: find the distribution of cost/time spent for different dbt models and tests:
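A sketch of what that could look like against the operations table (the integration value 'DBT' is an assumption; check the values actually loaded in your account):
-- Total and average time per dbt operation (model/test), slowest first
SELECT
    operation_name,
    operation_type,
    COUNT(*) AS executions,
    SUM(operation_duration) AS total_duration_s,
    AVG(operation_duration) AS avg_duration_s
FROM public.operations
WHERE integration = 'DBT'  -- assumed value; adjust to match your data
GROUP BY operation_name, operation_type
ORDER BY total_duration_s DESC;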

Support
For support, please contact:
Email: support@getorchestra.io
REST API Documentation: https://docs.getorchestra.io/docs/metadata-api/overview
Find out more about Orchestra
Orchestra is a unified control plane for Data and AI Operations.
We help Data Teams spend less time maintaining infrastructure, make them proactive instead of reactive, and ultimately win trust in data and AI from the Business.
We do this by consolidating Orchestration with monitoring, data quality testing, and data discovery. You don’t need a separate observability, lineage, or catalog tool with Orchestra.
Check out