@swamp/gcp/dataproc
v2026.04.04.1
Google Cloud dataproc infrastructure models
Labels
gcp, google-cloud, dataproc, cloud, infrastructure
Contents
Install
$ swamp extension pull @swamp/gcp/dataproc
Release Notes
- Updated: workflowtemplates, clusters, clusters_nodegroups
@swamp/gcp/dataproc/autoscalingpolicies v2026.04.03.3 (autoscalingpolicies.ts)
Global Arguments
| Argument | Type | Description |
|---|---|---|
| name | string | Instance name for this resource (used as the unique identifier in the factory pattern) |
| basicAlgorithm? | object | Basic algorithm for autoscaling. |
| clusterType? | enum | Optional. The type of the clusters for which this autoscaling policy is to be configured. |
| id? | string | Required. The policy id. The id must contain only letters (a-z, A-Z), numbers (0-9), underscores (_), and hyphens (-). Cannot begin or end with underscore or hyphen. Must consist of between 3 and 50 characters. |
| labels? | record | Optional. The labels to associate with this autoscaling policy. Label keys must contain 1 to 63 characters, and must conform to RFC 1035 (https://www.ietf.org/rfc/rfc1035.txt). Label values may be empty, but, if present, must contain 1 to 63 characters, and must conform to RFC 1035 (https://www.ietf.org/rfc/rfc1035.txt). No more than 32 labels can be associated with an autoscaling policy. |
| secondaryWorkerConfig | object | |
| minInstances? | number | Optional. Minimum number of instances for this group. Primary workers - Bounds: 2, max_instances. Default: 2. Secondary workers - Bounds: 0, max_instances. Default: 0. |
| weight? | number | Optional. Weight for the instance group, which is used to determine the fraction of total workers in the cluster from this instance group. For example, if primary workers have weight 2, and secondary workers have weight 1, the cluster will have approximately 2 primary workers for each secondary worker. The cluster may not reach the specified balance if constrained by min/max bounds or other autoscaling settings. For example, if max_instances for secondary workers is 0, then only primary workers w… |
| workerConfig | object | |
| minInstances? | number | Optional. Minimum number of instances for this group. Primary workers - Bounds: 2, max_instances. Default: 2. Secondary workers - Bounds: 0, max_instances. Default: 0. |
| weight? | number | Optional. Weight for the instance group, which is used to determine the fraction of total workers in the cluster from this instance group. For example, if primary workers have weight 2, and secondary workers have weight 1, the cluster will have approximately 2 primary workers for each secondary worker. The cluster may not reach the specified balance if constrained by min/max bounds or other autoscaling settings. For example, if max_instances for secondary workers is 0, then only primary workers w… |
| location? | string | The location for this resource (e.g., 'us', 'us-central1', 'europe-west1') |
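The `id` and `weight` rules above can be sketched as code. This is an illustrative sketch only: the helper names (`isValidPolicyId`, `splitByWeight`) are not part of this extension's API, and the checks simply mirror the documented constraints.

```typescript
// Documented `id` rule: letters, numbers, underscores, and hyphens; cannot
// begin or end with underscore or hyphen; 3 to 50 characters in total.
function isValidPolicyId(id: string): boolean {
  return /^[A-Za-z0-9][A-Za-z0-9_-]{1,48}[A-Za-z0-9]$/.test(id);
}

// How `weight` translates into a target split: each group's share is its
// weight divided by the sum of weights. With weights 2:1, roughly two
// primary workers are targeted per secondary worker.
function splitByWeight(
  total: number,
  primaryWeight: number,
  secondaryWeight: number,
): { primary: number; secondary: number } {
  const primary = Math.round((total * primaryWeight) / (primaryWeight + secondaryWeight));
  return { primary, secondary: total - primary };
}
```

For example, with weights 2 and 1 a 9-worker cluster targets 6 primary and 3 secondary workers, subject to the min/max bounds described above.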
create: Create an autoscalingPolicies
get: Get an autoscalingPolicies
| Argument | Type | Description |
|---|---|---|
| identifier | string | The name of the autoscalingPolicies |
update: Update autoscalingPolicies attributes
delete: Delete the autoscalingPolicies
| Argument | Type | Description |
|---|---|---|
| identifier | string | The name of the autoscalingPolicies |
sync: Sync autoscalingPolicies state from GCP
Resources
state (infinite) — Describes an autoscaling policy for Dataproc cluster autoscaler.
@swamp/gcp/dataproc/batches v2026.04.03.3 (batches.ts)
Global Arguments
| Argument | Type | Description |
|---|---|---|
| name | string | Instance name for this resource (used as the unique identifier in the factory pattern) |
| environmentConfig? | object | Environment configuration for a workload. |
| labels? | record | Optional. The labels to associate with this batch. Label keys must contain 1 to 63 characters, and must conform to RFC 1035 (https://www.ietf.org/rfc/rfc1035.txt). Label values may be empty, but, if present, must contain 1 to 63 characters, and must conform to RFC 1035 (https://www.ietf.org/rfc/rfc1035.txt). No more than 32 labels can be associated with a batch. |
| pysparkBatch? | object | A configuration for running an Apache PySpark (https://spark.apache.org/docs/latest/api/python/getting_started/quickstart.html) batch workload. |
| runtimeConfig? | object | Runtime configuration for a workload. |
| runtimeInfo? | object | Runtime information about workload execution. |
| sparkBatch? | object | A configuration for running an Apache Spark (https://spark.apache.org/) batch workload. |
| sparkRBatch? | object | A configuration for running an Apache SparkR (https://spark.apache.org/docs/latest/sparkr.html) batch workload. |
| sparkSqlBatch? | object | A configuration for running Apache Spark SQL (https://spark.apache.org/sql/) queries as a batch workload. |
| batchId? | string | Optional. The ID to use for the batch, which will become the final component of the batch's resource name. This value must be 4-63 characters. Valid characters are /[a-z][0-9]-/. |
| requestId? | string | Optional. A unique ID used to identify the request. If the service receives two CreateBatchRequests with the same request_id, the second request is ignored and the operation that corresponds to the first Batch created and stored in the backend is returned. Recommendation: Set this value to a UUID (https://en.wikipedia.org/wiki/Universally_unique_identifier). The value must contain only letters (a-z, A-Z), numbers (0-9), underscores (_), and hyphens (-). The maximum length is 40 characters. |
| location? | string | The location for this resource (e.g., 'us', 'us-central1', 'europe-west1') |
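The documented `batchId` shape can be sketched as a check. This helper is hypothetical (not part of the extension); it takes a literal reading of "4-63 characters, valid characters /[a-z][0-9]-/" as lowercase letters, digits, and hyphens.

```typescript
// Literal reading of the documented batchId rule: 4-63 characters drawn
// from lowercase letters, digits, and hyphens.
function isValidBatchId(id: string): boolean {
  return /^[a-z0-9-]{4,63}$/.test(id);
}
```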
create: Create a batches
| Argument | Type | Description |
|---|---|---|
| waitForReady? | boolean | Wait for the resource to reach a ready state after creation (default: true) |
get: Get a batches
| Argument | Type | Description |
|---|---|---|
| identifier | string | The name of the batches |
delete: Delete the batches
| Argument | Type | Description |
|---|---|---|
| identifier | string | The name of the batches |
sync: Sync batches state from GCP
analyze
| Argument | Type | Description |
|---|---|---|
| requestId? | any | |
| requestorId? | any | |
Resources
state (infinite) — A representation of a batch workload in the service.
@swamp/gcp/dataproc/clusters v2026.04.04.1 (clusters.ts)
Global Arguments
| Argument | Type | Description |
|---|---|---|
| name | string | Instance name for this resource (used as the unique identifier in the factory pattern) |
| clusterName? | string | Required. The cluster name, which must be unique within a project. The name must start with a lowercase letter, and can contain up to 51 lowercase letters, numbers, and hyphens. It cannot end with a hyphen. The name of a deleted cluster can be reused. |
| autoscalingConfig? | object | Autoscaling Policy config associated with the cluster. |
| auxiliaryNodeGroups? | array | Optional. The node group settings. |
| clusterTier? | enum | Optional. The cluster tier. |
| clusterType? | enum | Optional. The type of the cluster. |
| configBucket? | string | Optional. A Cloud Storage bucket used to stage job dependencies, config files, and job driver console output. If you do not specify a staging bucket, Cloud Dataproc will determine a Cloud Storage location (US, ASIA, or EU) for your cluster's staging bucket according to the Compute Engine zone where your cluster is deployed, and then create and manage this project-level, per-location bucket (see Dataproc staging and temp buckets (https://cloud.google.com/dataproc/docs/concepts/configuring-cluster… |
| dataprocMetricConfig? | object | The Dataproc metric config. |
| labels? | record | Optional. The labels to associate with this cluster. Label keys must contain 1 to 63 characters, and must conform to RFC 1035 (https://www.ietf.org/rfc/rfc1035.txt). Label values may be empty, but, if present, must contain 1 to 63 characters, and must conform to RFC 1035 (https://www.ietf.org/rfc/rfc1035.txt). No more than 32 labels can be associated with a cluster. |
| metrics? | object | Contains cluster daemon metrics, such as HDFS and YARN stats. Beta Feature: This report is available for testing purposes only. It may be changed before final release. |
| projectId? | string | Required. The Google Cloud Platform project ID that the cluster belongs to. |
| status? | object | The status of a cluster and its instances. |
| virtualClusterConfig? | object | The Dataproc cluster config for a cluster that does not directly control the underlying compute resources, such as a Dataproc-on-GKE cluster (https://cloud.google.com/dataproc/docs/guides/dpgke/dataproc-gke-overview). |
| region | string | Required. The Dataproc region in which to handle the request. |
| actionOnFailedPrimaryWorkers? | string | Optional. Failure action when primary worker creation fails. |
| requestId? | string | Optional. A unique ID used to identify the request. If the server receives two CreateClusterRequest (https://cloud.google.com/dataproc/docs/reference/rpc/google.cloud.dataproc.v1#google.cloud.dataproc.v1.CreateClusterRequest)s with the same id, then the second request will be ignored and the first google.longrunning.Operation created and stored in the backend is returned. It is recommended to always set this value to a UUID (https://en.wikipedia.org/wiki/Universally_unique_identifier). The ID must… |
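The `clusterName` rule above can be sketched as a check. The helper name is illustrative (not part of the extension), and it assumes "up to 51 lowercase letters, numbers, and hyphens" means up to 51 such characters after the required leading lowercase letter.

```typescript
// Documented clusterName rule: starts with a lowercase letter, then up to
// 51 more lowercase letters, digits, or hyphens, and cannot end with a hyphen.
function isValidClusterName(name: string): boolean {
  return /^[a-z](?:[a-z0-9-]{0,50}[a-z0-9])?$/.test(name);
}
```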
create: Create a clusters
get: Get a clusters
| Argument | Type | Description |
|---|---|---|
| identifier | string | The name of the clusters |
update: Update clusters attributes
delete: Delete the clusters
| Argument | Type | Description |
|---|---|---|
| identifier | string | The name of the clusters |
sync: Sync clusters state from GCP
diagnose
| Argument | Type | Description |
|---|---|---|
| diagnosisInterval? | any | |
| job? | any | |
| jobs? | any | |
| tarballAccess? | any | |
| tarballGcsDir? | any | |
| yarnApplicationId? | any | |
| yarnApplicationIds? | any | |
inject_credentials
| Argument | Type | Description |
|---|---|---|
| clusterUuid? | any | |
| credentialsCiphertext? | any | |
repair
| Argument | Type | Description |
|---|---|---|
| cluster? | any | |
| clusterUuid? | any | |
| dataprocSuperUser? | any | |
| gracefulDecommissionTimeout? | any | |
| nodePools? | any | |
| parentOperationId? | any | |
| requestId? | any | |
start
| Argument | Type | Description |
|---|---|---|
| clusterUuid? | any | |
| requestId? | any | |
stop
| Argument | Type | Description |
|---|---|---|
| clusterUuid? | any | |
| requestId? | any | |
Resources
state (infinite) — Describes the identifying information, config, and status of a Dataproc cluster.
@swamp/gcp/dataproc/clusters-nodegroups v2026.04.04.1 (clusters_nodegroups.ts)
Global Arguments
| Argument | Type | Description |
|---|---|---|
| labels? | record | Optional. Node group labels. Label keys must consist of from 1 to 63 characters and conform to RFC 1035 (https://www.ietf.org/rfc/rfc1035.txt). Label values can be empty. If specified, they must consist of from 1 to 63 characters and conform to RFC 1035 (https://www.ietf.org/rfc/rfc1035.txt). The node group must have no more than 32 labels. |
| name? | string | The Node group resource name (https://aip.dev/122). |
| nodeGroupConfig? | object | The config settings for Compute Engine resources in an instance group, such as a master or worker group. |
| roles? | array | Required. Node group roles. |
| nodeGroupId? | string | Optional. An optional node group ID. Generated if not specified. The ID must contain only letters (a-z, A-Z), numbers (0-9), underscores (_), and hyphens (-). Cannot begin or end with underscore or hyphen. Must consist of between 3 and 33 characters. |
| parentOperationId? | string | Optional. operation id of the parent operation sending the create request |
| requestId? | string | Optional. A unique ID used to identify the request. If the server receives two CreateNodeGroupRequest (https://cloud.google.com/dataproc/docs/reference/rpc/google.cloud.dataproc.v1#google.cloud.dataproc.v1.CreateNodeGroupRequest) with the same ID, the second request is ignored and the first google.longrunning.Operation created and stored in the backend is returned. Recommendation: Set this value to a UUID (https://en.wikipedia.org/wiki/Universally_unique_identifier). The ID must contain only lette… |
| location? | string | The location for this resource (e.g., 'us', 'us-central1', 'europe-west1') |
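The `nodeGroupId` rule above follows the same shape as other Dataproc IDs and can be sketched as a check. The helper name is illustrative, not part of the extension.

```typescript
// Documented nodeGroupId rule: letters, numbers, underscores, and hyphens;
// cannot begin or end with underscore or hyphen; 3 to 33 characters.
function isValidNodeGroupId(id: string): boolean {
  return /^[A-Za-z0-9][A-Za-z0-9_-]{1,31}[A-Za-z0-9]$/.test(id);
}
```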
create: Create a nodeGroups
get: Get a nodeGroups
| Argument | Type | Description |
|---|---|---|
| identifier | string | The name of the nodeGroups |
sync: Sync nodeGroups state from GCP
repair
| Argument | Type | Description |
|---|---|---|
| instanceNames? | any | |
| repairAction? | any | |
| requestId? | any | |
resize
| Argument | Type | Description |
|---|---|---|
| gracefulDecommissionTimeout? | any | |
| parentOperationId? | any | |
| requestId? | any | |
| size? | any | |
Resources
state (infinite) — Dataproc Node Group. The Dataproc NodeGroup resource is not related to the Da...
@swamp/gcp/dataproc/jobs v2026.04.03.3 (jobs.ts)
Global Arguments
| Argument | Type | Description |
|---|---|---|
| name | string | Instance name for this resource (used as the unique identifier in the factory pattern) |
| done? | boolean | Output only. Indicates whether the job is completed. If the value is false, the job is still in progress. If true, the job is completed, and status.state field will indicate if it was successful, failed, or cancelled. |
| driverControlFilesUri? | string | Output only. If present, the location of miscellaneous control files which can be used as part of job setup and handling. If not present, control files might be placed in the same location as driver_output_uri. |
| driverOutputResourceUri? | string | Output only. A URI pointing to the location of the stdout of the job's driver program. |
| driverSchedulingConfig? | object | Driver scheduling configuration. |
| flinkJob? | object | A Dataproc job for running Apache Flink applications on YARN. |
| hadoopJob? | object | A Dataproc job for running Apache Hadoop MapReduce (https://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html) jobs on Apache Hadoop YARN (https://hadoop.apache.org/docs/r2.7.1/hadoop-yarn/hadoop-yarn-site/YARN.html). |
| hiveJob? | object | A Dataproc job for running Apache Hive (https://hive.apache.org/) queries on YARN. |
| jobUuid? | string | Output only. A UUID that uniquely identifies a job within the project over time. This is in contrast to a user-settable reference.job_id that might be reused over time. |
| labels? | record | Optional. The labels to associate with this job. Label keys must contain 1 to 63 characters, and must conform to RFC 1035 (https://www.ietf.org/rfc/rfc1035.txt). Label values can be empty, but, if present, must contain 1 to 63 characters, and must conform to RFC 1035 (https://www.ietf.org/rfc/rfc1035.txt). No more than 32 labels can be associated with a job. |
| pigJob? | object | A Dataproc job for running Apache Pig (https://pig.apache.org/) queries on YARN. |
| placement? | object | Dataproc job config. |
| prestoJob? | object | A Dataproc job for running Presto (https://prestosql.io/) queries. IMPORTANT: The Dataproc Presto Optional Component (https://cloud.google.com/dataproc/docs/concepts/components/presto) must be enabled when the cluster is created to submit a Presto job to the cluster. |
| pysparkJob? | object | A Dataproc job for running Apache PySpark (https://spark.apache.org/docs/latest/api/python/index.html#pyspark-overview) applications on YARN. |
| reference? | object | Encapsulates the full scoping used to reference a job. |
| scheduling? | object | Job scheduling options. |
| sparkJob? | object | A Dataproc job for running Apache Spark (https://spark.apache.org/) applications on YARN. |
| sparkRJob? | object | A Dataproc job for running Apache SparkR (https://spark.apache.org/docs/latest/sparkr.html) applications on YARN. |
| sparkSqlJob? | object | A Dataproc job for running Apache Spark SQL (https://spark.apache.org/sql/) queries. |
| status? | object | Dataproc job status. |
| statusHistory? | array | Output only. The previous job status. |
| trinoJob? | object | A Dataproc job for running Trino (https://trino.io/) queries. IMPORTANT: The Dataproc Trino Optional Component (https://cloud.google.com/dataproc/docs/concepts/components/trino) must be enabled when the cluster is created to submit a Trino job to the cluster. |
| yarnApplications? | array | Output only. The collection of YARN applications spun up by this job. Beta Feature: This report is available for testing purposes only. It might be changed before final release. |
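The labels rule above (shared by most resources in this package) can be sketched as a check. These helper names are illustrative, not part of the extension; the key check assumes the RFC 1035 label form: starts with a letter, contains only lowercase letters, digits, and hyphens, does not end with a hyphen, and is at most 63 characters.

```typescript
// RFC 1035-style label key: 1-63 characters, starts with a lowercase letter,
// lowercase letters/digits/hyphens only, no trailing hyphen.
function isValidLabelKey(key: string): boolean {
  return /^[a-z](?:[a-z0-9-]{0,61}[a-z0-9])?$/.test(key);
}

// Documented cap: no more than 32 labels per job.
function labelsWithinLimit(labels: Record<string, string>): boolean {
  return Object.keys(labels).length <= 32;
}
```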
get: Get a jobs
| Argument | Type | Description |
|---|---|---|
| identifier | string | The name of the jobs |
update: Update jobs attributes
delete: Delete the jobs
| Argument | Type | Description |
|---|---|---|
| identifier | string | The name of the jobs |
sync: Sync jobs state from GCP
cancel
submit
| Argument | Type | Description |
|---|---|---|
| job? | any | |
| requestId? | any | |
submit_as_operation
| Argument | Type | Description |
|---|---|---|
| job? | any | |
| requestId? | any | |
Resources
state (infinite) — A Dataproc job resource.
@swamp/gcp/dataproc/sessions v2026.04.03.3 (sessions.ts)
Global Arguments
| Argument | Type | Description |
|---|---|---|
| environmentConfig? | object | Environment configuration for a workload. |
| jupyterSession? | object | Jupyter configuration for an interactive session. |
| labels? | record | Optional. The labels to associate with the session. Label keys must contain 1 to 63 characters, and must conform to RFC 1035 (https://www.ietf.org/rfc/rfc1035.txt). Label values may be empty, but, if present, must contain 1 to 63 characters, and must conform to RFC 1035 (https://www.ietf.org/rfc/rfc1035.txt). No more than 32 labels can be associated with a session. |
| name? | string | Identifier. The resource name of the session. |
| runtimeConfig? | object | Runtime configuration for a workload. |
| runtimeInfo? | object | Runtime information about workload execution. |
| sessionTemplate? | string | Optional. The session template used by the session. Only resource names, including project ID and location, are valid. Example: * https://www.googleapis.com/compute/v1/projects/[project_id]/locations/[dataproc_region]/sessionTemplates/[template_id] * projects/[project_id]/locations/[dataproc_region]/sessionTemplates/[template_id]. The template must be in the same project and Dataproc region as the session. |
| sparkConnectSession? | object | Spark connect configuration for an interactive session. |
| user? | string | Optional. The email address of the user who owns the session. |
| requestId? | string | Optional. A unique ID used to identify the request. If the service receives two CreateSessionRequests (https://cloud.google.com/dataproc/docs/reference/rpc/google.cloud.dataproc.v1#google.cloud.dataproc.v1.CreateSessionRequest)s with the same ID, the second request is ignored, and the first Session is created and stored in the backend. Recommendation: Set this value to a UUID (https://en.wikipedia.org/wiki/Universally_unique_identifier). The value must contain only letters (a-z, A-Z), numbers (0-9… |
| sessionId? | string | Required. The ID to use for the session, which becomes the final component of the session's resource name. This value must be 4-63 characters. Valid characters are /a-z-/. |
| location? | string | The location for this resource (e.g., 'us', 'us-central1', 'europe-west1') |
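The `sessionId` rule above can be sketched as a check. The helper name is illustrative, and this takes the quoted charset /a-z-/ literally (lowercase letters and hyphens); note that the analogous `batchId` rule also allows digits, so the upstream rule may be broader than quoted here.

```typescript
// Literal reading of the documented sessionId rule: 4-63 characters drawn
// from lowercase letters and hyphens.
function isValidSessionId(id: string): boolean {
  return /^[a-z-]{4,63}$/.test(id);
}
```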
create: Create a sessions
| Argument | Type | Description |
|---|---|---|
| waitForReady? | boolean | Wait for the resource to reach a ready state after creation (default: true) |
get: Get a sessions
| Argument | Type | Description |
|---|---|---|
| identifier | string | The name of the sessions |
delete: Delete the sessions
| Argument | Type | Description |
|---|---|---|
| identifier | string | The name of the sessions |
sync: Sync sessions state from GCP
terminate
| Argument | Type | Description |
|---|---|---|
| requestId? | any | |
Resources
state (infinite) — A representation of a session.
@swamp/gcp/dataproc/sessiontemplates v2026.04.03.3 (sessiontemplates.ts)
Global Arguments
| Argument | Type | Description |
|---|---|---|
| description? | string | Optional. Brief description of the template. |
| environmentConfig? | object | Environment configuration for a workload. |
| jupyterSession? | object | Jupyter configuration for an interactive session. |
| labels? | record | Optional. Labels to associate with sessions created using this template. Label keys must contain 1 to 63 characters, and must conform to RFC 1035 (https://www.ietf.org/rfc/rfc1035.txt). Label values can be empty, but, if present, must contain 1 to 63 characters and conform to RFC 1035 (https://www.ietf.org/rfc/rfc1035.txt). No more than 32 labels can be associated with a session. |
| name? | string | Required. Identifier. The resource name of the session template. |
| runtimeConfig? | object | Runtime configuration for a workload. |
| sparkConnectSession? | object | Spark connect configuration for an interactive session. |
| location? | string | The location for this resource (e.g., 'us', 'us-central1', 'europe-west1') |
create: Create a sessionTemplates
get: Get a sessionTemplates
| Argument | Type | Description |
|---|---|---|
| identifier | string | The name of the sessionTemplates |
update: Update sessionTemplates attributes
delete: Delete the sessionTemplates
| Argument | Type | Description |
|---|---|---|
| identifier | string | The name of the sessionTemplates |
sync: Sync sessionTemplates state from GCP
Resources
state (infinite) — A representation of a session template.
@swamp/gcp/dataproc/workflowtemplates v2026.04.04.1 (workflowtemplates.ts)
Global Arguments
| Argument | Type | Description |
|---|---|---|
| name | string | Instance name for this resource (used as the unique identifier in the factory pattern) |
| dagTimeout? | string | Optional. Timeout duration for the DAG of jobs, expressed in seconds (see JSON representation of duration (https://developers.google.com/protocol-buffers/docs/proto3#json)). The timeout duration must be from 10 minutes ("600s") to 24 hours ("86400s"). The timer begins when the first job is submitted. If the workflow is running at the end of the timeout period, any remaining jobs are cancelled, the workflow is ended, and if the workflow was running on a managed cluster, the cluster is deleted. |
| encryptionConfig? | object | Encryption settings for encrypting workflow template job arguments. |
| id? | string | |
| jobs? | array | Required. The Directed Acyclic Graph of Jobs to submit. |
| labels? | record | Optional. The labels to associate with this template. These labels will be propagated to all jobs and clusters created by the workflow instance. Label keys must contain 1 to 63 characters, and must conform to RFC 1035 (https://www.ietf.org/rfc/rfc1035.txt). Label values may be empty, but, if present, must contain 1 to 63 characters, and must conform to RFC 1035 (https://www.ietf.org/rfc/rfc1035.txt). No more than 32 labels can be associated with a template. |
| parameters? | array | Optional. Template parameters whose values are substituted into the template. Values for parameters must be provided when the template is instantiated. |
| placement? | object | Specifies workflow execution target. Either managed_cluster or cluster_selector is required. |
| version? | number | Optional. Used to perform a consistent read-modify-write. This field should be left blank for a CreateWorkflowTemplate request. It is required for an UpdateWorkflowTemplate request, and must match the current server version. A typical update template flow would fetch the current template with a GetWorkflowTemplate request, which will return the current template with the version field filled in with the current server version. The user updates other fields in the template, then returns it as part… |
| location? | string | The location for this resource (e.g., 'us', 'us-central1', 'europe-west1') |
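The `dagTimeout` rule above (a JSON duration string between "600s" and "86400s") can be sketched as a parser. `parseDagTimeout` is an illustrative name, not part of the extension, and it assumes whole-second durations of the form "NNNs".

```typescript
// Parses a dagTimeout duration like "3600s" and enforces the documented
// range: 10 minutes ("600s") to 24 hours ("86400s").
function parseDagTimeout(value: string): number {
  const match = /^(\d+)s$/.exec(value);
  if (!match) throw new Error(`not a seconds duration: ${value}`);
  const seconds = Number(match[1]);
  if (seconds < 600 || seconds > 86400) {
    throw new Error(`dagTimeout out of range: ${seconds}s`);
  }
  return seconds;
}
```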
create: Create a workflowTemplates
get: Get a workflowTemplates
| Argument | Type | Description |
|---|---|---|
| identifier | string | The name of the workflowTemplates |
update: Update workflowTemplates attributes
delete: Delete the workflowTemplates
| Argument | Type | Description |
|---|---|---|
| identifier | string | The name of the workflowTemplates |
sync: Sync workflowTemplates state from GCP
instantiate
| Argument | Type | Description |
|---|---|---|
| parameters? | any | |
| requestId? | any | |
| version? | any | |
instantiate_inline
| Argument | Type | Description |
|---|---|---|
| createTime? | any | |
| dagTimeout? | any | |
| encryptionConfig? | any | |
| id? | any | |
| jobs? | any | |
| labels? | any | |
| name? | any | |
| parameters? | any | |
| placement? | any | |
| updateTime? | any | |
| version? | any | |
Resources
state (infinite) — A Dataproc workflow template resource.
2026.04.03.3 · 238.6 KB · Apr 3, 2026
Google Cloud dataproc infrastructure models
Release Notes
- Updated: autoscalingpolicies, batches, sessiontemplates, sessions, workflowtemplates, clusters, clusters_nodegroups, jobs
linux-x86_64, linux-aarch64, darwin-x86_64, darwin-aarch64
gcp, google-cloud, dataproc, cloud, infrastructure
2026.04.03.1 · 236.8 KB · Apr 3, 2026
Google Cloud dataproc infrastructure models
Release Notes
- Updated: autoscalingpolicies, batches, sessiontemplates, sessions, workflowtemplates, clusters, clusters_nodegroups, jobs
linux-x86_64, linux-aarch64, darwin-x86_64, darwin-aarch64
gcp, google-cloud, dataproc, cloud, infrastructure
2026.04.02.2 · 236.2 KB · Apr 2, 2026
Google Cloud dataproc infrastructure models
Release Notes
- Updated: workflowtemplates, clusters
linux-x86_64, linux-aarch64, darwin-x86_64, darwin-aarch64
gcp, google-cloud, dataproc, cloud, infrastructure
2026.03.27.1 · 240.1 KB · Mar 27, 2026
Google Cloud dataproc infrastructure models
Release Notes
- Added: autoscalingpolicies, batches, sessiontemplates, sessions, workflowtemplates, clusters, clusters_nodegroups, jobs
linux-x86_64, linux-aarch64, darwin-x86_64, darwin-aarch64
gcp, google-cloud, dataproc, cloud, infrastructure