
@swamp/gcp/dataproc

v2026.04.04.1

Google Cloud dataproc infrastructure models

Labels

gcp, google-cloud, dataproc, cloud, infrastructure

Contents

Install

$ swamp extension pull @swamp/gcp/dataproc

Release Notes

  • Updated: workflowtemplates, clusters, clusters_nodegroups

@swamp/gcp/dataproc/autoscalingpolicies (v2026.04.03.3, autoscalingpolicies.ts)

Global Arguments

  • name (string) — Instance name for this resource (used as the unique identifier in the factory pattern)
  • basicAlgorithm? (object) — Basic algorithm for autoscaling.
  • clusterType? (enum) — Optional. The type of the clusters for which this autoscaling policy is to be configured.
  • id? (string) — Required. The policy id. The id must contain only letters (a-z, A-Z), numbers (0-9), underscores (_), and hyphens (-). Cannot begin or end with underscore or hyphen. Must consist of between 3 and 50 characters.
  • labels? (record) — Optional. The labels to associate with this autoscaling policy. Label keys must contain 1 to 63 characters, and must conform to RFC 1035 (https://www.ietf.org/rfc/rfc1035.txt). Label values may be empty, but, if present, must contain 1 to 63 characters, and must conform to RFC 1035 (https://www.ietf.org/rfc/rfc1035.txt). No more than 32 labels can be associated with an autoscaling policy.
  • secondaryWorkerConfig (object)
      • minInstances? (number) — Optional. Minimum number of instances for this group. Primary workers - Bounds: 2, max_instances. Default: 2. Secondary workers - Bounds: 0, max_instances. Default: 0.
      • weight? (number) — Optional. Weight for the instance group, which is used to determine the fraction of total workers in the cluster from this instance group. For example, if primary workers have weight 2, and secondary workers have weight 1, the cluster will have approximately 2 primary workers for each secondary worker. The cluster may not reach the specified balance if constrained by min/max bounds or other autoscaling settings. For example, if max_instances for secondary workers is 0, then only primary workers w…
  • workerConfig (object)
      • minInstances? (number) — Optional. Minimum number of instances for this group. Primary workers - Bounds: 2, max_instances. Default: 2. Secondary workers - Bounds: 0, max_instances. Default: 0.
      • weight? (number) — Optional. Weight for the instance group, which is used to determine the fraction of total workers in the cluster from this instance group. For example, if primary workers have weight 2, and secondary workers have weight 1, the cluster will have approximately 2 primary workers for each secondary worker. The cluster may not reach the specified balance if constrained by min/max bounds or other autoscaling settings. For example, if max_instances for secondary workers is 0, then only primary workers w…
  • location? (string) — The location for this resource (e.g., 'us', 'us-central1', 'europe-west1')
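The id constraint above (3 to 50 characters; letters, digits, underscores, and hyphens; no leading or trailing underscore or hyphen) can be checked client-side before a create call. A minimal sketch (isValidPolicyId is a hypothetical helper, not part of this extension):

```typescript
// Hypothetical pre-flight check mirroring the documented policy id rules:
// 3-50 chars; letters, digits, underscores, hyphens; first and last
// character must be a letter or digit.
function isValidPolicyId(id: string): boolean {
  return /^[a-zA-Z0-9][a-zA-Z0-9_-]{1,48}[a-zA-Z0-9]$/.test(id);
}
```

For example, "my-policy-1" passes, while "_private_" fails on the leading underscore and "ab" fails the minimum length.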
create — Create an autoscalingPolicies resource
get — Get an autoscalingPolicies resource
  • identifier (string) — The name of the autoscalingPolicies resource
update — Update autoscalingPolicies attributes
delete — Delete the autoscalingPolicies resource
  • identifier (string) — The name of the autoscalingPolicies resource
sync — Sync autoscalingPolicies state from GCP
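The weight fields in the argument table above set only an approximate balance between primary and secondary workers. A toy illustration of the stated semantics (splitByWeight is illustrative only, not part of the extension, and ignores the min/max bounds that can override the ratio):

```typescript
// Approximates the documented weight semantics: workers are split across
// the two groups in proportion to their weights (bounds checking omitted).
function splitByWeight(
  total: number,
  weights: { primary: number; secondary: number },
): { primary: number; secondary: number } {
  const sum = weights.primary + weights.secondary;
  const primary = Math.round((total * weights.primary) / sum);
  return { primary, secondary: total - primary };
}

// With primary weight 2 and secondary weight 1, a 30-worker cluster lands
// at roughly 20 primary and 10 secondary workers, matching the 2:1 example.
```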

Resources

state (infinite) — Describes an autoscaling policy for Dataproc cluster autoscaler.

@swamp/gcp/dataproc/batches (v2026.04.03.3, batches.ts)

Global Arguments

  • name (string) — Instance name for this resource (used as the unique identifier in the factory pattern)
  • environmentConfig? (object) — Environment configuration for a workload.
  • labels? (record) — Optional. The labels to associate with this batch. Label keys must contain 1 to 63 characters, and must conform to RFC 1035 (https://www.ietf.org/rfc/rfc1035.txt). Label values may be empty, but, if present, must contain 1 to 63 characters, and must conform to RFC 1035 (https://www.ietf.org/rfc/rfc1035.txt). No more than 32 labels can be associated with a batch.
  • pysparkBatch? (object) — A configuration for running an Apache PySpark (https://spark.apache.org/docs/latest/api/python/getting_started/quickstart.html) batch workload.
  • runtimeConfig? (object) — Runtime configuration for a workload.
  • runtimeInfo? (object) — Runtime information about workload execution.
  • sparkBatch? (object) — A configuration for running an Apache Spark (https://spark.apache.org/) batch workload.
  • sparkRBatch? (object) — A configuration for running an Apache SparkR (https://spark.apache.org/docs/latest/sparkr.html) batch workload.
  • sparkSqlBatch? (object) — A configuration for running Apache Spark SQL (https://spark.apache.org/sql/) queries as a batch workload.
  • batchId? (string) — Optional. The ID to use for the batch, which will become the final component of the batch's resource name. This value must be 4-63 characters. Valid characters are /[a-z][0-9]-/.
  • requestId? (string) — Optional. A unique ID used to identify the request. If the service receives two CreateBatchRequests with the same request_id, the second request is ignored and the operation that corresponds to the first Batch created and stored in the backend is returned. Recommendation: Set this value to a UUID (https://en.wikipedia.org/wiki/Universally_unique_identifier). The value must contain only letters (a-z, A-Z), numbers (0-9), underscores (_), and hyphens (-). The maximum length is 40 characters.
  • location? (string) — The location for this resource (e.g., 'us', 'us-central1', 'europe-west1')
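The batchId charset above is given as /[a-z][0-9]-/, i.e. lowercase letters, digits, and hyphens, 4 to 63 characters. Taking that literally as a hypothetical validator (the service remains the source of truth and may enforce further rules, e.g. on the first character):

```typescript
// Hypothetical check of the documented batchId shape: 4-63 characters
// drawn from lowercase letters, digits, and hyphens.
function isValidBatchId(id: string): boolean {
  return /^[a-z0-9-]{4,63}$/.test(id);
}
```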
create — Create a batches resource
  • waitForReady? (boolean) — Wait for the resource to reach a ready state after creation (default: true)
get — Get a batches resource
  • identifier (string) — The name of the batches resource
delete — Delete the batches resource
  • identifier (string) — The name of the batches resource
sync — Sync batches state from GCP
analyze
  • requestId? (any)
  • requestorId? (any)

Resources

state (infinite) — A representation of a batch workload in the service.

@swamp/gcp/dataproc/clusters (v2026.04.04.1, clusters.ts)

Global Arguments

  • name (string) — Instance name for this resource (used as the unique identifier in the factory pattern)
  • clusterName? (string) — Required. The cluster name, which must be unique within a project. The name must start with a lowercase letter, and can contain up to 51 lowercase letters, numbers, and hyphens. It cannot end with a hyphen. The name of a deleted cluster can be reused.
  • autoscalingConfig? (object) — Autoscaling Policy config associated with the cluster.
  • auxiliaryNodeGroups? (array) — Optional. The node group settings.
  • clusterTier? (enum) — Optional. The cluster tier.
  • clusterType? (enum) — Optional. The type of the cluster.
  • configBucket? (string) — Optional. A Cloud Storage bucket used to stage job dependencies, config files, and job driver console output. If you do not specify a staging bucket, Cloud Dataproc will determine a Cloud Storage location (US, ASIA, or EU) for your cluster's staging bucket according to the Compute Engine zone where your cluster is deployed, and then create and manage this project-level, per-location bucket (see Dataproc staging and temp buckets (https://cloud.google.com/dataproc/docs/concepts/configuring-cluster…
  • dataprocMetricConfig? (object) — The cluster config.
  • labels? (record) — Optional. The labels to associate with this cluster. Label keys must contain 1 to 63 characters, and must conform to RFC 1035 (https://www.ietf.org/rfc/rfc1035.txt). Label values may be empty, but, if present, must contain 1 to 63 characters, and must conform to RFC 1035 (https://www.ietf.org/rfc/rfc1035.txt). No more than 32 labels can be associated with a cluster.
  • metrics? (object) — Contains cluster daemon metrics, such as HDFS and YARN stats. Beta Feature: This report is available for testing purposes only. It may be changed before final release.
  • projectId? (string) — Required. The Google Cloud Platform project ID that the cluster belongs to.
  • status? (object) — The status of a cluster and its instances.
  • virtualClusterConfig? (object) — The Dataproc cluster config for a cluster that does not directly control the underlying compute resources, such as a Dataproc-on-GKE cluster (https://cloud.google.com/dataproc/docs/guides/dpgke/dataproc-gke-overview).
  • region (string) — Required. The Dataproc region in which to handle the request.
  • actionOnFailedPrimaryWorkers? (string) — Optional. Failure action when primary worker creation fails.
  • requestId? (string) — Optional. A unique ID used to identify the request. If the server receives two CreateClusterRequest (https://cloud.google.com/dataproc/docs/reference/rpc/google.cloud.dataproc.v1#google.cloud.dataproc.v1.CreateClusterRequest)s with the same id, then the second request will be ignored and the first google.longrunning.Operation created and stored in the backend is returned. It is recommended to always set this value to a UUID (https://en.wikipedia.org/wiki/Universally_unique_identifier). The ID must…
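The clusterName rules above (start with a lowercase letter; lowercase letters, digits, and hyphens; no trailing hyphen) translate to a simple check. A sketch only: isValidClusterName is a hypothetical helper, and reading "up to 51 lowercase letters, numbers, and hyphens" as a 51-character total limit is an assumption.

```typescript
// Hypothetical pre-flight check of the documented cluster name rules.
// Assumes the 51-character limit applies to the whole name.
function isValidClusterName(name: string): boolean {
  if (name.length === 0 || name.length > 51) return false;
  return /^[a-z]([a-z0-9-]*[a-z0-9])?$/.test(name);
}
```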
create — Create a clusters resource
get — Get a clusters resource
  • identifier (string) — The name of the clusters resource
update — Update clusters attributes
delete — Delete the clusters resource
  • identifier (string) — The name of the clusters resource
sync — Sync clusters state from GCP
diagnose
  • diagnosisInterval? (any)
  • job? (any)
  • jobs? (any)
  • tarballAccess? (any)
  • tarballGcsDir? (any)
  • yarnApplicationId? (any)
  • yarnApplicationIds? (any)
inject_credentials — inject credentials
  • clusterUuid? (any)
  • credentialsCiphertext? (any)
repair
  • cluster? (any)
  • clusterUuid? (any)
  • dataprocSuperUser? (any)
  • gracefulDecommissionTimeout? (any)
  • nodePools? (any)
  • parentOperationId? (any)
  • requestId? (any)
start
  • clusterUuid? (any)
  • requestId? (any)
stop
  • clusterUuid? (any)
  • requestId? (any)

Resources

state (infinite) — Describes the identifying information, config, and status of a Dataproc cluster.

@swamp/gcp/dataproc/clusters-nodegroups (v2026.04.04.1, clusters_nodegroups.ts)

Global Arguments

  • labels? (record) — Optional. Node group labels. Label keys must consist of from 1 to 63 characters and conform to RFC 1035 (https://www.ietf.org/rfc/rfc1035.txt). Label values can be empty. If specified, they must consist of from 1 to 63 characters and conform to RFC 1035 (https://www.ietf.org/rfc/rfc1035.txt). The node group must have no more than 32 labels.
  • name? (string) — The Node group resource name (https://aip.dev/122).
  • nodeGroupConfig? (object) — The config settings for Compute Engine resources in an instance group, such as a master or worker group.
  • roles? (array) — Required. Node group roles.
  • nodeGroupId? (string) — Optional. An optional node group ID. Generated if not specified. The ID must contain only letters (a-z, A-Z), numbers (0-9), underscores (_), and hyphens (-). Cannot begin or end with underscore or hyphen. Must consist of from 3 to 33 characters.
  • parentOperationId? (string) — Optional. Operation id of the parent operation sending the create request.
  • requestId? (string) — Optional. A unique ID used to identify the request. If the server receives two CreateNodeGroupRequest (https://cloud.google.com/dataproc/docs/reference/rpc/google.cloud.dataproc.v1#google.cloud.dataproc.v1.CreateNodeGroupRequest) with the same ID, the second request is ignored and the first google.longrunning.Operation created and stored in the backend is returned. Recommendation: Set this value to a UUID (https://en.wikipedia.org/wiki/Universally_unique_identifier). The ID must contain only lette…
  • location? (string) — The location for this resource (e.g., 'us', 'us-central1', 'europe-west1')
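Several resources in this package repeat the same label rules: at most 32 labels, keys of 1 to 63 characters conforming to RFC 1035, and values that are empty or likewise conforming. A rough client-side check, where validateLabels is a hypothetical helper and the exact label shape (leading letter, trailing letter or digit, lowercase letters, digits, and hyphens in between) is an assumption about how the service interprets "conform to RFC 1035":

```typescript
// Assumed RFC 1035 label shape: leading letter, trailing letter or digit,
// 1-63 characters total. The API is authoritative.
const RFC1035_LABEL = /^[a-z]([a-z0-9-]{0,61}[a-z0-9])?$/;

// Collects violations of the documented label constraints.
function validateLabels(labels: Record<string, string>): string[] {
  const errors: string[] = [];
  const entries = Object.entries(labels);
  if (entries.length > 32) {
    errors.push("no more than 32 labels may be set");
  }
  for (const [key, value] of entries) {
    if (!RFC1035_LABEL.test(key)) errors.push(`bad label key: ${key}`);
    if (value !== "" && !RFC1035_LABEL.test(value)) {
      errors.push(`bad label value: ${value}`);
    }
  }
  return errors;
}
```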
create — Create a nodeGroups resource
get — Get a nodeGroups resource
  • identifier (string) — The name of the nodeGroups resource
sync — Sync nodeGroups state from GCP
repair
  • instanceNames? (any)
  • repairAction? (any)
  • requestId? (any)
resize
  • gracefulDecommissionTimeout? (any)
  • parentOperationId? (any)
  • requestId? (any)
  • size? (any)

Resources

state (infinite) — Dataproc Node Group. The Dataproc NodeGroup resource is not related to the Da…

@swamp/gcp/dataproc/jobs (v2026.04.03.3, jobs.ts)

Global Arguments

  • name (string) — Instance name for this resource (used as the unique identifier in the factory pattern)
  • done? (boolean) — Output only. Indicates whether the job is completed. If the value is false, the job is still in progress. If true, the job is completed, and status.state field will indicate if it was successful, failed, or cancelled.
  • driverControlFilesUri? (string) — Output only. If present, the location of miscellaneous control files which can be used as part of job setup and handling. If not present, control files might be placed in the same location as driver_output_uri.
  • driverOutputResourceUri? (string) — Output only. A URI pointing to the location of the stdout of the job's driver program.
  • driverSchedulingConfig? (object) — Driver scheduling configuration.
  • flinkJob? (object) — A Dataproc job for running Apache Flink applications on YARN.
  • hadoopJob? (object) — A Dataproc job for running Apache Hadoop MapReduce (https://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html) jobs on Apache Hadoop YARN (https://hadoop.apache.org/docs/r2.7.1/hadoop-yarn/hadoop-yarn-site/YARN.html).
  • hiveJob? (object) — A Dataproc job for running Apache Hive (https://hive.apache.org/) queries on YARN.
  • jobUuid? (string) — Output only. A UUID that uniquely identifies a job within the project over time. This is in contrast to a user-settable reference.job_id that might be reused over time.
  • labels? (record) — Optional. The labels to associate with this job. Label keys must contain 1 to 63 characters, and must conform to RFC 1035 (https://www.ietf.org/rfc/rfc1035.txt). Label values can be empty, but, if present, must contain 1 to 63 characters, and must conform to RFC 1035 (https://www.ietf.org/rfc/rfc1035.txt). No more than 32 labels can be associated with a job.
  • pigJob? (object) — A Dataproc job for running Apache Pig (https://pig.apache.org/) queries on YARN.
  • placement? (object) — Dataproc job config.
  • prestoJob? (object) — A Dataproc job for running Presto (https://prestosql.io/) queries. IMPORTANT: The Dataproc Presto Optional Component (https://cloud.google.com/dataproc/docs/concepts/components/presto) must be enabled when the cluster is created to submit a Presto job to the cluster.
  • pysparkJob? (object) — A Dataproc job for running Apache PySpark (https://spark.apache.org/docs/latest/api/python/index.html#pyspark-overview) applications on YARN.
  • reference? (object) — Encapsulates the full scoping used to reference a job.
  • scheduling? (object) — Job scheduling options.
  • sparkJob? (object) — A Dataproc job for running Apache Spark (https://spark.apache.org/) applications on YARN.
  • sparkRJob? (object) — A Dataproc job for running Apache SparkR (https://spark.apache.org/docs/latest/sparkr.html) applications on YARN.
  • sparkSqlJob? (object) — A Dataproc job for running Apache Spark SQL (https://spark.apache.org/sql/) queries.
  • status? (object) — Dataproc job status.
  • statusHistory? (array) — Output only. The previous job status.
  • trinoJob? (object) — A Dataproc job for running Trino (https://trino.io/) queries. IMPORTANT: The Dataproc Trino Optional Component (https://cloud.google.com/dataproc/docs/concepts/components/trino) must be enabled when the cluster is created to submit a Trino job to the cluster.
  • yarnApplications? (array) — Output only. The collection of YARN applications spun up by this job. Beta Feature: This report is available for testing purposes only. It might be changed before final release.
get — Get a jobs resource
  • identifier (string) — The name of the jobs resource
update — Update jobs attributes
delete — Delete the jobs resource
  • identifier (string) — The name of the jobs resource
sync — Sync jobs state from GCP
cancel
submit
  • job? (any)
  • requestId? (any)
submit_as_operation — submit as operation
  • job? (any)
  • requestId? (any)

Resources

state (infinite) — A Dataproc job resource.

@swamp/gcp/dataproc/sessions (v2026.04.03.3, sessions.ts)

Global Arguments

  • environmentConfig? (object) — Environment configuration for a workload.
  • jupyterSession? (object) — Jupyter configuration for an interactive session.
  • labels? (record) — Optional. The labels to associate with the session. Label keys must contain 1 to 63 characters, and must conform to RFC 1035 (https://www.ietf.org/rfc/rfc1035.txt). Label values may be empty, but, if present, must contain 1 to 63 characters, and must conform to RFC 1035 (https://www.ietf.org/rfc/rfc1035.txt). No more than 32 labels can be associated with a session.
  • name? (string) — Identifier. The resource name of the session.
  • runtimeConfig? (object) — Runtime configuration for a workload.
  • runtimeInfo? (object) — Runtime information about workload execution.
  • sessionTemplate? (string) — Optional. The session template used by the session. Only resource names, including project ID and location, are valid. Example: * https://www.googleapis.com/compute/v1/projects/[project_id]/locations/[dataproc_region]/sessionTemplates/[template_id] * projects/[project_id]/locations/[dataproc_region]/sessionTemplates/[template_id]. The template must be in the same project and Dataproc region as the session.
  • sparkConnectSession? (object) — Spark connect configuration for an interactive session.
  • user? (string) — Optional. The email address of the user who owns the session.
  • requestId? (string) — Optional. A unique ID used to identify the request. If the service receives two CreateSessionRequests (https://cloud.google.com/dataproc/docs/reference/rpc/google.cloud.dataproc.v1#google.cloud.dataproc.v1.CreateSessionRequest)s with the same ID, the second request is ignored, and the first Session is created and stored in the backend. Recommendation: Set this value to a UUID (https://en.wikipedia.org/wiki/Universally_unique_identifier). The value must contain only letters (a-z, A-Z), numbers (0-9…
  • sessionId? (string) — Required. The ID to use for the session, which becomes the final component of the session's resource name. This value must be 4-63 characters. Valid characters are /a-z-/.
  • location? (string) — The location for this resource (e.g., 'us', 'us-central1', 'europe-west1')
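The sessionId constraint above is quoted as 4-63 characters from /a-z-/. Taking that charset at face value (the quoted constraint may be truncated on this page, and the real service may also allow digits as batches do), a hypothetical check:

```typescript
// Literal reading of the documented sessionId charset: lowercase letters
// and hyphens only, 4-63 characters. A sketch, not an authoritative rule.
function isValidSessionId(id: string): boolean {
  return /^[a-z-]{4,63}$/.test(id);
}
```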
create — Create a sessions resource
  • waitForReady? (boolean) — Wait for the resource to reach a ready state after creation (default: true)
get — Get a sessions resource
  • identifier (string) — The name of the sessions resource
delete — Delete the sessions resource
  • identifier (string) — The name of the sessions resource
sync — Sync sessions state from GCP
terminate
  • requestId? (any)

Resources

state (infinite) — A representation of a session.

@swamp/gcp/dataproc/sessiontemplates (v2026.04.03.3, sessiontemplates.ts)

Global Arguments

  • description? (string) — Optional. Brief description of the template.
  • environmentConfig? (object) — Environment configuration for a workload.
  • jupyterSession? (object) — Jupyter configuration for an interactive session.
  • labels? (record) — Optional. Labels to associate with sessions created using this template. Label keys must contain 1 to 63 characters, and must conform to RFC 1035 (https://www.ietf.org/rfc/rfc1035.txt). Label values can be empty, but, if present, must contain 1 to 63 characters and conform to RFC 1035 (https://www.ietf.org/rfc/rfc1035.txt). No more than 32 labels can be associated with a session.
  • name? (string) — Required. Identifier. The resource name of the session template.
  • runtimeConfig? (object) — Runtime configuration for a workload.
  • sparkConnectSession? (object) — Spark connect configuration for an interactive session.
  • location? (string) — The location for this resource (e.g., 'us', 'us-central1', 'europe-west1')
create — Create a sessionTemplates resource
get — Get a sessionTemplates resource
  • identifier (string) — The name of the sessionTemplates resource
update — Update sessionTemplates attributes
delete — Delete the sessionTemplates resource
  • identifier (string) — The name of the sessionTemplates resource
sync — Sync sessionTemplates state from GCP

Resources

state (infinite) — A representation of a session template.

@swamp/gcp/dataproc/workflowtemplates (v2026.04.04.1, workflowtemplates.ts)

Global Arguments

  • name (string) — Instance name for this resource (used as the unique identifier in the factory pattern)
  • dagTimeout? (string) — Optional. Timeout duration for the DAG of jobs, expressed in seconds (see JSON representation of duration (https://developers.google.com/protocol-buffers/docs/proto3#json)). The timeout duration must be from 10 minutes ("600s") to 24 hours ("86400s"). The timer begins when the first job is submitted. If the workflow is running at the end of the timeout period, any remaining jobs are cancelled, the workflow is ended, and if the workflow was running on a managed cluster, the cluster is deleted.
  • encryptionConfig? (object) — Encryption settings for encrypting workflow template job arguments.
  • id? (string)
  • jobs? (array) — Required. The Directed Acyclic Graph of Jobs to submit.
  • labels? (record) — Optional. The labels to associate with this template. These labels will be propagated to all jobs and clusters created by the workflow instance. Label keys must contain 1 to 63 characters, and must conform to RFC 1035 (https://www.ietf.org/rfc/rfc1035.txt). Label values may be empty, but, if present, must contain 1 to 63 characters, and must conform to RFC 1035 (https://www.ietf.org/rfc/rfc1035.txt). No more than 32 labels can be associated with a template.
  • parameters? (array) — Optional. Template parameters whose values are substituted into the template. Values for parameters must be provided when the template is instantiated.
  • placement? (object) — Specifies workflow execution target. Either managed_cluster or cluster_selector is required.
  • version? (number) — Optional. Used to perform a consistent read-modify-write. This field should be left blank for a CreateWorkflowTemplate request. It is required for an UpdateWorkflowTemplate request, and must match the current server version. A typical update template flow would fetch the current template with a GetWorkflowTemplate request, which will return the current template with the version field filled in with the current server version. The user updates other fields in the template, then returns it as part…
  • location? (string) — The location for this resource (e.g., 'us', 'us-central1', 'europe-west1')
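The dagTimeout bounds above (a seconds-denominated JSON duration between "600s" and "86400s") can be validated before submitting a template. A sketch: isValidDagTimeout is a hypothetical helper and handles only the plain seconds form of the JSON duration encoding.

```typescript
// Checks that a dagTimeout string is a seconds-denominated duration within
// the documented 10-minute to 24-hour window.
function isValidDagTimeout(timeout: string): boolean {
  const match = /^(\d+(?:\.\d+)?)s$/.exec(timeout);
  if (match === null) return false;
  const seconds = parseFloat(match[1]);
  return seconds >= 600 && seconds <= 86400;
}
```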
create — Create a workflowTemplates resource
get — Get a workflowTemplates resource
  • identifier (string) — The name of the workflowTemplates resource
update — Update workflowTemplates attributes
delete — Delete the workflowTemplates resource
  • identifier (string) — The name of the workflowTemplates resource
sync — Sync workflowTemplates state from GCP
instantiate
  • parameters? (any)
  • requestId? (any)
  • version? (any)
instantiate_inline — instantiate inline
  • createTime? (any)
  • dagTimeout? (any)
  • encryptionConfig? (any)
  • id? (any)
  • jobs? (any)
  • labels? (any)
  • name? (any)
  • parameters? (any)
  • placement? (any)
  • updateTime? (any)
  • version? (any)

Resources

state(infinite)— A Dataproc workflow template resource.