Databricks
Available Tools
List all clusters in the Databricks workspace. Returns cluster IDs, names, states, and configurations. Use this to find cluster IDs for other operations.
Get detailed information about a specific Databricks cluster including state, configuration, and resource allocation. Use list_clusters first to find the cluster ID.
Create a new Databricks cluster. Requires cluster name, Spark version, and node type. Specify num_workers for fixed size or autoscale_min/max_workers for autoscaling.
Start a terminated Databricks cluster. The cluster must be in TERMINATED state. Use list_clusters to find clusters and their states.
Terminate a running Databricks cluster. This stops the cluster but preserves its configuration for restarting. Use list_clusters to find cluster IDs.
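A cluster-creation call per the description above might be assembled like this. This is a minimal sketch: only cluster name, Spark version, node type, num_workers, and autoscale_min/max_workers are taken from the descriptions, and the exact field spellings (cluster_name, node_type_id) are assumptions.

```python
def build_create_cluster_args(name, spark_version, node_type,
                              num_workers=None,
                              autoscale_min=None, autoscale_max=None):
    """Assemble arguments for the create-cluster tool described above.

    Exactly one of num_workers (fixed size) or the autoscale_min/max
    pair (autoscaling) should be supplied.
    """
    args = {
        "cluster_name": name,          # field name assumed; doc says "cluster name"
        "spark_version": spark_version,
        "node_type_id": node_type,     # field name assumed; doc says "node type"
    }
    if num_workers is not None:
        args["num_workers"] = num_workers
    elif autoscale_min is not None and autoscale_max is not None:
        args["autoscale_min_workers"] = autoscale_min
        args["autoscale_max_workers"] = autoscale_max
    else:
        raise ValueError("specify num_workers or both autoscale bounds")
    return args

# Fixed-size and autoscaling variants (runtime/node values are illustrative):
fixed = build_create_cluster_args("etl", "14.3.x-scala2.12", "i3.xlarge",
                                  num_workers=4)
auto = build_create_cluster_args("adhoc", "14.3.x-scala2.12", "i3.xlarge",
                                 autoscale_min=2, autoscale_max=8)
```

The guard enforces the either/or choice the description implies: a fixed-size cluster and an autoscaling one are mutually exclusive configurations.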
List jobs in the Databricks workspace with optional name filter and pagination. Returns job IDs, names, and settings. Use page_token from response for next page.
Get detailed information about a specific Databricks job including tasks, schedule, and configuration. Use list_jobs first to find the job ID.
Create a new Databricks job with one or more tasks. Each task needs a task_key and exactly one task type (notebook_task, spark_python_task, sql_task, etc.). Supports scheduling with cron expressions.
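A job definition with one task and a cron schedule could be assembled as below. The task_key requirement and task types come from the description above; the schedule field names (quartz_cron_expression, timezone_id) and the notebook path are assumptions for illustration.

```python
def build_create_job_args(name, tasks, cron=None, timezone="UTC"):
    """Assemble arguments for the create-job tool: each task needs a
    task_key and one task type (notebook_task, spark_python_task,
    sql_task, ...)."""
    for t in tasks:
        if "task_key" not in t:
            raise ValueError("every task needs a task_key")
    args = {"name": name, "tasks": tasks}
    if cron:
        # Schedule field names are assumptions; the description only
        # says "supports scheduling with cron expressions".
        args["schedule"] = {"quartz_cron_expression": cron,
                           "timezone_id": timezone}
    return args

# A single-task notebook job that runs nightly at 02:00:
job = build_create_job_args(
    "nightly-etl",
    tasks=[{"task_key": "ingest",
            "notebook_task": {"notebook_path": "/Repos/etl/ingest"}}],
    cron="0 0 2 * * ?",
)
```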
Permanently delete a Databricks job. This also cancels any active runs. Use list_jobs to find the job ID.
Trigger an immediate run of a Databricks job. Optionally pass notebook_params or python_named_params to override defaults. Use list_jobs to find the job ID.
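Triggering a run with parameter overrides, per the description above, amounts to passing the optional maps only when supplied; a sketch:

```python
def build_run_now_args(job_id, notebook_params=None, python_named_params=None):
    """Assemble arguments for triggering an immediate job run; the
    optional parameter maps override the job's defaults."""
    args = {"job_id": job_id}
    if notebook_params:
        args["notebook_params"] = notebook_params
    if python_named_params:
        args["python_named_params"] = python_named_params
    return args

# Override a notebook widget value for this run only (job_id is illustrative):
run_args = build_run_now_args(123, notebook_params={"date": "2024-01-01"})
```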
List job runs in the Databricks workspace. Filter by job_id, active_only, or completed_only. Supports offset/limit pagination. Returns run IDs, states, and timing info.
Get detailed information about a specific job run including state, timing, and task details. Use list_job_runs to find the run ID.
Cancel an active job run. The run must be in PENDING or RUNNING state. Use list_job_runs with active_only=true to find cancellable runs.
Get the output of a completed job run including notebook results, SQL output, logs, and error traces. Use list_job_runs to find the run ID.
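The run tools above compose into a wait-then-fetch pattern: poll the get-run tool until the run leaves PENDING/RUNNING, then call the output tool. A sketch, with the get-run tool injected as a callable and a flat `state` field assumed for the run's shape (terminal state names are also assumptions):

```python
import time

TERMINAL_STATES = {"TERMINATED", "SKIPPED", "INTERNAL_ERROR"}  # names assumed

def wait_for_run(get_job_run, run_id, poll_seconds=10, timeout=3600):
    """Poll the get-run tool (passed in as a callable) until the run
    reaches a terminal state, then return its final description."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        run = get_job_run(run_id)
        if run["state"] in TERMINAL_STATES:
            return run
        time.sleep(poll_seconds)
    raise TimeoutError(f"run {run_id} still active after {timeout}s")

# Usage with a stubbed tool whose run is already finished:
done = wait_for_run(lambda rid: {"run_id": rid, "state": "TERMINATED"}, 42,
                    poll_seconds=0)
```

The deadline keeps a stuck run from blocking forever; a caller would follow a successful wait with the run-output tool to retrieve notebook results or error traces.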
Execute a SQL statement on a Databricks SQL warehouse. Returns results synchronously within wait_timeout (default 10s) or a statement_id for async polling via get_sql_statement.
Get the status and results of a SQL statement execution. Use this to poll for results of async statements started with execute_sql_statement.
Cancel a running SQL statement execution. Use get_sql_statement first to verify the statement is still in PENDING or RUNNING state.
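The sync-or-async behavior described above suggests a small wrapper: take the synchronous result if it arrives within wait_timeout, otherwise poll by statement_id. A sketch with both tools injected as callables and a flat `status` field assumed for the statement's shape:

```python
def run_sql(execute_sql_statement, get_sql_statement, statement, warehouse_id,
            wait_timeout="10s"):
    """Execute a statement; if it does not finish within wait_timeout,
    fall back to polling by statement_id until a terminal status."""
    result = execute_sql_statement(statement=statement,
                                   warehouse_id=warehouse_id,
                                   wait_timeout=wait_timeout)
    while result.get("status") in ("PENDING", "RUNNING"):
        result = get_sql_statement(result["statement_id"])
    return result

# Stubbed tools simulating a statement that goes async, then succeeds:
def fake_execute(**kwargs):
    return {"statement_id": "s1", "status": "PENDING"}

def fake_get(statement_id):
    return {"statement_id": statement_id, "status": "SUCCEEDED",
            "rows": [[1]]}

final = run_sql(fake_execute, fake_get, "SELECT 1", "wh-123")
```

A production version would sleep between polls and give up after a deadline, at which point the cancel tool above is the right cleanup.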
List all SQL warehouses in the Databricks workspace. Returns warehouse IDs, names, sizes, and states. Use this to find warehouse IDs for SQL execution.
Get detailed information about a specific SQL warehouse including state, size, cluster count, and active sessions. Use list_sql_warehouses to find the warehouse ID.
Create a new Databricks SQL warehouse. Requires a name and cluster_size (T-shirt sizing from 2X-Small to 4X-Large). Optionally configure autoscaling and auto-stop.
Start a stopped SQL warehouse. The warehouse must be in STOPPED state. Use list_sql_warehouses to find warehouses and their states.
Stop a running SQL warehouse. This deallocates compute resources. Use list_sql_warehouses to find warehouses and their states.
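A warehouse-creation call could be assembled as below. The name, cluster_size T-shirt range, autoscaling, and auto-stop options come from the description above; the optional field spellings (auto_stop_mins, min/max_num_clusters) are assumptions.

```python
# T-shirt sizes from 2X-Small to 4X-Large, per the description above:
SIZES = ["2X-Small", "X-Small", "Small", "Medium", "Large",
         "X-Large", "2X-Large", "3X-Large", "4X-Large"]

def build_create_warehouse_args(name, cluster_size, auto_stop_mins=None,
                                min_clusters=None, max_clusters=None):
    """Assemble arguments for the create-warehouse tool, validating the
    cluster_size against the documented T-shirt range."""
    if cluster_size not in SIZES:
        raise ValueError(f"cluster_size must be one of {SIZES}")
    args = {"name": name, "cluster_size": cluster_size}
    if auto_stop_mins is not None:
        args["auto_stop_mins"] = auto_stop_mins        # field name assumed
    if min_clusters is not None and max_clusters is not None:
        args["min_num_clusters"] = min_clusters        # field name assumed
        args["max_num_clusters"] = max_clusters        # field name assumed
    return args

# A medium warehouse that stops itself after 15 idle minutes:
wh = build_create_warehouse_args("bi", "Medium", auto_stop_mins=15)
```

Pairing auto-stop with the explicit start/stop tools above keeps an idle warehouse from holding compute.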
List objects in a Databricks workspace directory. Returns notebooks, directories, files, repos, and libraries at the given path. Use '/' for the root directory.
Get metadata about a workspace object including type, language (for notebooks), and timestamps. Use list_workspace to find valid paths.
Delete a workspace object (notebook, file, or directory). For non-empty directories, set recursive=true. Use list_workspace to find valid paths.
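The list-workspace tool above naturally supports a recursive walk: list a path, yield each object, and recurse into directories. A sketch with the tool injected as a callable; the `object_type`/`path` field names and the DIRECTORY type name are assumptions, and the stub tree is illustrative:

```python
def walk_workspace(list_workspace, path="/"):
    """Depth-first traversal over the list-workspace tool: yield every
    object under `path`, recursing into directories."""
    for obj in list_workspace(path):
        yield obj
        if obj.get("object_type") == "DIRECTORY":  # type name assumed
            yield from walk_workspace(list_workspace, obj["path"])

# Stub tree: the root holds one directory containing one notebook.
tree = {"/": [{"path": "/proj", "object_type": "DIRECTORY"}],
        "/proj": [{"path": "/proj/nb", "object_type": "NOTEBOOK"}]}
objects = list(walk_workspace(lambda p: tree.get(p, [])))
```

Such a walk is a safer preview before a recursive delete: it shows exactly which objects `recursive=true` would remove.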