Databricks
Available Tools
List all clusters in the Databricks workspace. Returns cluster IDs, names, states, and configurations. Use this to find cluster IDs for other operations.
Get detailed information about a specific Databricks cluster including state, configuration, and resource allocation. Use list_clusters first to find the cluster ID.
Create a new Databricks cluster. Requires cluster name, Spark version, and node type. Specify num_workers for fixed size or autoscale_min/max_workers for autoscaling.
Start a terminated Databricks cluster. The cluster must be in TERMINATED state. Use list_clusters to find clusters and their states.
Terminate a running Databricks cluster. This stops the cluster but preserves its configuration for restarting. Use list_clusters to find cluster IDs.
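A cluster-creation call per the description above might be assembled like this. This is a minimal sketch: only cluster name, Spark version, node type, num_workers, and autoscale_min/max_workers are taken from the descriptions, and the exact field spellings (cluster_name, node_type_id) are assumptions.

```python
def build_create_cluster_args(name, spark_version, node_type,
                              num_workers=None,
                              autoscale_min=None, autoscale_max=None):
    """Assemble arguments for the create-cluster tool described above.

    Exactly one of num_workers (fixed size) or the autoscale_min/max
    pair (autoscaling) should be supplied.
    """
    args = {
        "cluster_name": name,          # field name assumed; doc says "cluster name"
        "spark_version": spark_version,
        "node_type_id": node_type,     # field name assumed; doc says "node type"
    }
    if num_workers is not None:
        args["num_workers"] = num_workers
    elif autoscale_min is not None and autoscale_max is not None:
        args["autoscale_min_workers"] = autoscale_min
        args["autoscale_max_workers"] = autoscale_max
    else:
        raise ValueError("specify num_workers or both autoscale bounds")
    return args

# Fixed-size and autoscaling variants (runtime/node values are illustrative):
fixed = build_create_cluster_args("etl", "14.3.x-scala2.12", "i3.xlarge",
                                  num_workers=4)
auto = build_create_cluster_args("adhoc", "14.3.x-scala2.12", "i3.xlarge",
                                 autoscale_min=2, autoscale_max=8)
```

The guard enforces the either/or choice the description implies: a fixed-size cluster and an autoscaling one are mutually exclusive configurations.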
List jobs in the Databricks workspace with optional name filter and pagination. Returns job IDs, names, and settings. Use page_token from response for next page.
Get detailed information about a specific Databricks job including tasks, schedule, and configuration. Use list_jobs first to find the job ID.
Create a new Databricks job with one or more tasks. Each task needs a task_key and exactly one task type (notebook_task, spark_python_task, sql_task, etc.). Supports scheduling with cron expressions.
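A job definition with one task and a cron schedule could be assembled as below. The task_key requirement and task types come from the description above; the schedule field names (quartz_cron_expression, timezone_id) and the notebook path are assumptions for illustration.

```python
def build_create_job_args(name, tasks, cron=None, timezone="UTC"):
    """Assemble arguments for the create-job tool: each task needs a
    task_key and one task type (notebook_task, spark_python_task,
    sql_task, ...)."""
    for t in tasks:
        if "task_key" not in t:
            raise ValueError("every task needs a task_key")
    args = {"name": name, "tasks": tasks}
    if cron:
        # Schedule field names are assumptions; the description only
        # says "supports scheduling with cron expressions".
        args["schedule"] = {"quartz_cron_expression": cron,
                           "timezone_id": timezone}
    return args

# A single-task notebook job that runs nightly at 02:00:
job = build_create_job_args(
    "nightly-etl",
    tasks=[{"task_key": "ingest",
            "notebook_task": {"notebook_path": "/Repos/etl/ingest"}}],
    cron="0 0 2 * * ?",
)
```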
Permanently delete a Databricks job. This also cancels any active runs. Use list_jobs to find the job ID.
Trigger an immediate run of a Databricks job. Optionally pass notebook_params or python_named_params to override defaults. Use list_jobs to find the job ID.
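Triggering a run with parameter overrides, per the description above, amounts to passing the optional maps only when supplied; a sketch:

```python
def build_run_now_args(job_id, notebook_params=None, python_named_params=None):
    """Assemble arguments for triggering an immediate job run; the
    optional parameter maps override the job's defaults."""
    args = {"job_id": job_id}
    if notebook_params:
        args["notebook_params"] = notebook_params
    if python_named_params:
        args["python_named_params"] = python_named_params
    return args

# Override a notebook widget value for this run only (job_id is illustrative):
run_args = build_run_now_args(123, notebook_params={"date": "2024-01-01"})
```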
List job runs in the Databricks workspace. Filter by job_id, active_only, or completed_only. Supports offset/limit pagination. Returns run IDs, states, and timing info.
Get detailed information about a specific job run including state, timing, and task details. Use list_job_runs to find the run ID.
Cancel an active job run. The run must be in PENDING or RUNNING state. Use list_job_runs with active_only=true to find cancellable runs.
Get the output of a completed job run including notebook results, SQL output, logs, and error traces. Use list_job_runs to find the run ID.
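The run tools above compose into a wait-then-fetch pattern: poll the get-run tool until the run leaves PENDING/RUNNING, then call the output tool. A sketch, with the get-run tool injected as a callable and a flat `state` field assumed for the run's shape (terminal state names are also assumptions):

```python
import time

TERMINAL_STATES = {"TERMINATED", "SKIPPED", "INTERNAL_ERROR"}  # names assumed

def wait_for_run(get_job_run, run_id, poll_seconds=10, timeout=3600):
    """Poll the get-run tool (passed in as a callable) until the run
    reaches a terminal state, then return its final description."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        run = get_job_run(run_id)
        if run["state"] in TERMINAL_STATES:
            return run
        time.sleep(poll_seconds)
    raise TimeoutError(f"run {run_id} still active after {timeout}s")

# Usage with a stubbed tool whose run is already finished:
done = wait_for_run(lambda rid: {"run_id": rid, "state": "TERMINATED"}, 42,
                    poll_seconds=0)
```

The deadline keeps a stuck run from blocking forever; a caller would follow a successful wait with the run-output tool to retrieve notebook results or error traces.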
Execute a SQL statement on a Databricks SQL warehouse. Returns results synchronously within wait_timeout (default 10s) or a statement_id for async polling via get_sql_statement.
Get the status and results of a SQL statement execution. Use this to poll for results of async statements started with execute_sql_statement.
Cancel a running SQL statement execution. Use get_sql_statement first to verify the statement is still in PENDING or RUNNING state.
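The sync-or-async behavior described above suggests a small wrapper: take the synchronous result if it arrives within wait_timeout, otherwise poll by statement_id. A sketch with both tools injected as callables and a flat `status` field assumed for the statement's shape:

```python
def run_sql(execute_sql_statement, get_sql_statement, statement, warehouse_id,
            wait_timeout="10s"):
    """Execute a statement; if it does not finish within wait_timeout,
    fall back to polling by statement_id until a terminal status."""
    result = execute_sql_statement(statement=statement,
                                   warehouse_id=warehouse_id,
                                   wait_timeout=wait_timeout)
    while result.get("status") in ("PENDING", "RUNNING"):
        result = get_sql_statement(result["statement_id"])
    return result

# Stubbed tools simulating a statement that goes async, then succeeds:
def fake_execute(**kwargs):
    return {"statement_id": "s1", "status": "PENDING"}

def fake_get(statement_id):
    return {"statement_id": statement_id, "status": "SUCCEEDED",
            "rows": [[1]]}

final = run_sql(fake_execute, fake_get, "SELECT 1", "wh-123")
```

A production version would sleep between polls and give up after a deadline, at which point the cancel tool above is the right cleanup.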
List all SQL warehouses in the Databricks workspace. Returns warehouse IDs, names, sizes, and states. Use this to find warehouse IDs for SQL execution.
Get detailed information about a specific SQL warehouse including state, size, cluster count, and active sessions. Use list_sql_warehouses to find the warehouse ID.
Create a new Databricks SQL warehouse. Requires a name and cluster_size (T-shirt sizing from 2X-Small to 4X-Large). Optionally configure autoscaling and auto-stop.
Start a stopped SQL warehouse. The warehouse must be in STOPPED state. Use list_sql_warehouses to find warehouses and their states.
Stop a running SQL warehouse. This deallocates compute resources. Use list_sql_warehouses to find warehouses and their states.
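A warehouse-creation call could be assembled as below. The name, cluster_size T-shirt range, autoscaling, and auto-stop options come from the description above; the optional field spellings (auto_stop_mins, min/max_num_clusters) are assumptions.

```python
# T-shirt sizes from 2X-Small to 4X-Large, per the description above:
SIZES = ["2X-Small", "X-Small", "Small", "Medium", "Large",
         "X-Large", "2X-Large", "3X-Large", "4X-Large"]

def build_create_warehouse_args(name, cluster_size, auto_stop_mins=None,
                                min_clusters=None, max_clusters=None):
    """Assemble arguments for the create-warehouse tool, validating the
    cluster_size against the documented T-shirt range."""
    if cluster_size not in SIZES:
        raise ValueError(f"cluster_size must be one of {SIZES}")
    args = {"name": name, "cluster_size": cluster_size}
    if auto_stop_mins is not None:
        args["auto_stop_mins"] = auto_stop_mins        # field name assumed
    if min_clusters is not None and max_clusters is not None:
        args["min_num_clusters"] = min_clusters        # field name assumed
        args["max_num_clusters"] = max_clusters        # field name assumed
    return args

# A medium warehouse that stops itself after 15 idle minutes:
wh = build_create_warehouse_args("bi", "Medium", auto_stop_mins=15)
```

Pairing auto-stop with the explicit start/stop tools above keeps an idle warehouse from holding compute.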
List objects in a Databricks workspace directory. Returns notebooks, directories, files, repos, and libraries at the given path. Use '/' for the root directory.
Get metadata about a workspace object including type, language (for notebooks), and timestamps. Use list_workspace to find valid paths.
Delete a workspace object (notebook, file, or directory). For non-empty directories, set recursive=true. Use list_workspace to find valid paths.
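The list-workspace tool above naturally supports a recursive walk: list a path, yield each object, and recurse into directories. A sketch with the tool injected as a callable; the `object_type`/`path` field names and the DIRECTORY type name are assumptions, and the stub tree is illustrative:

```python
def walk_workspace(list_workspace, path="/"):
    """Depth-first traversal over the list-workspace tool: yield every
    object under `path`, recursing into directories."""
    for obj in list_workspace(path):
        yield obj
        if obj.get("object_type") == "DIRECTORY":  # type name assumed
            yield from walk_workspace(list_workspace, obj["path"])

# Stub tree: the root holds one directory containing one notebook.
tree = {"/": [{"path": "/proj", "object_type": "DIRECTORY"}],
        "/proj": [{"path": "/proj/nb", "object_type": "NOTEBOOK"}]}
objects = list(walk_workspace(lambda p: tree.get(p, [])))
```

Such a walk is a safer preview before a recursive delete: it shows exactly which objects `recursive=true` would remove.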