
Service Catalogue

Overview

To ease the setup of GCP resources for users, service catalogues are available on our AI Innovation Platform, from which users can easily add GCP services to their projects/workspaces.

To share an example, let's assume users want to provision a GPU instance hosting a JupyterLab service that can be shared among multiple users, with separate workspaces to maintain their development environments. To achieve this, a user would normally have to:

  1. Host the Jupyter service
  2. Attach it behind a load balancer and
  3. Secure it using an authentication method on the GCP console

This can be done hassle-free on our platform using the service catalogue functionality with just a few clicks, which will not only take care of the authentication and JupyterLab service setup but also apply other security and cloud best practices such as adding tags and labels and avoiding external IPs.

Users retain the flexibility to update or change the instance settings later from the GCP console, to which they also have access.

The service catalogue is an add-on functionality, not a compulsory part of our platform. It is the recommended way to provision the basic setup required with respect to security and cloud best practices, which can then be modified according to user needs. Users can, however, still add services to their projects directly from the GCP console if they prefer.

The service catalogue has two main flows, depending on whether the services are added to a new project or to an existing project. Below are the details of both flows:

  1. New Project: The user needs to add the details for the project like project type, unit name, group name, workstream name etc. Once added, the user can add services (this is optional). Once all details are added, the user can submit the request which will go for approval to admins. Based on the action taken by the admin, the user will be notified about their service addition request. (For more details, check the Request New Project section of this documentation)

    Note - A GCS bucket is added by default for marketplace operations. Please do not modify this bucket.

  2. Existing Project: For existing projects, users need to add the services from the project details page and submit the request (refer to the section below for details on how this can be done). The services will be added directly, and users will receive a notification on the status of the service addition request.

Provision services using service catalogue

The provision to spin up new resources is available on the AI Innovation Platform itself; let us walk through how this can be done.

Below are the steps to be followed to add services:

  1. Go to "My Projects" and click on the project for which the services need to be added. Project View

  2. In the details page, by clicking on "Add" under Services more services list be available. Add Service

  3. Select the services to be added and click on "Add" to add the service configurations. Add Configuration

    Add Configuration

  4. Add the details of the services and click on "Submit". The user will be notified when the services are added.   Cloud Storage

  5. A link of the service will be available on the service name itself as a hyperlink. The users can click on this and can navigate to the GCP console to see these service. Cloud Storage

Note - Services can be added while the project is in any state (requested, provisioning, etc.), but they will only be spun up once the project is in the active state. Also, service addition does not require any approvals from admins.

Available services and their configurations

This section explains the available services and the parameters for each service that is part of the service catalogue. It will also help you understand how to access these services from the GCP console.

App Engine

A fully managed platform for building and deploying scalable web applications. Google App Engine is a serverless platform that abstracts away infrastructure management, allowing developers to focus purely on application development. App Engine automatically handles tasks such as provisioning servers, managing scaling, load balancing and applying updates, making it a powerful solution for building applications that can scale seamlessly based on demand.

  • Auto-scaling
  • Managed infrastructure
  • Multiple languages
  • Integrated services
  • No server management
  • Pay-per-use
  • Version control

App Engine

App Engine Parameters

The following are key parameters that must be configured when creating an App Engine instance:

Parameter Description Mandatory
App Engine Name Name of the App Engine service Yes
Region Region where the App Engine is deployed Yes
CPU CPU configuration for the App Engine Yes
Memory Memory allocated for the App Engine Yes
Version ID Version identifier for the App Engine Yes
Container Image URL URL of the container image used for deployment Yes
Port Port number used by the App Engine Yes
Disk Size Size of the disk allocated to the App Engine (in GB) Yes
Service Account Service account associated with the App Engine No

Default Configurations

  • Runtime Configuration:

    • Runtime: The service will be deployed on App Engine with a customizable runtime environment. By default, the runtime for the App Engine application is set to "custom".
  • Scaling Settings:

    • Cooldown Period: A cooldown period of 120 seconds will be applied between scaling operations. This means the system will wait for 120 seconds before attempting to scale up or down, ensuring stable performance and avoiding rapid changes.

    • Target CPU Utilization: The target CPU utilization for automatic scaling is set to 0.5 by default, meaning scaling actions will trigger when CPU usage exceeds or falls below 50%. This helps maintain optimal performance and cost efficiency.

  • Traffic Splitting:

    • Traffic Management: Using the google_app_engine_service_split_traffic resource, we manage and split traffic across different versions of your App Engine service. This allows you to direct a percentage of traffic to different versions without downtime or disruption. By default, 100% of traffic is routed to the latest version of the application (see the sketch below).
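
For illustration only, here is a minimal Terraform sketch of these defaults. It is not the platform's actual module; the service name, version, region and container image URL are placeholders.

```hcl
# Sketch: App Engine flexible version with the default cooldown, CPU target and custom runtime.
resource "google_app_engine_flexible_app_version" "app" {
  service    = "default"
  version_id = "v1"
  runtime    = "custom"                                  # default runtime is "custom"

  deployment {
    container {
      image = "gcr.io/my-project/my-image:latest"        # placeholder container image URL
    }
  }

  # Health checks are required for flexible environment versions.
  liveness_check {
    path = "/"
  }
  readiness_check {
    path = "/"
  }

  automatic_scaling {
    cool_down_period = "120s"                            # 120-second cooldown between scaling operations
    cpu_utilization {
      target_utilization = 0.5                           # scale around 50% CPU usage
    }
  }
}

# Route 100% of traffic to the latest version.
resource "google_app_engine_service_split_traffic" "split" {
  service = google_app_engine_flexible_app_version.app.service
  split {
    allocations = {
      (google_app_engine_flexible_app_version.app.version_id) = 1
    }
  }
}
```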

Note: Cloud Run is the latest evolution of Google Cloud Serverless, building on the experience of running App Engine for more than a decade. Cloud Run runs on much of the same infrastructure as the App Engine standard environment, so there are many similarities between the two platforms. Cloud Run is designed to improve upon the App Engine experience, incorporating many of the best features of both the App Engine standard environment and the App Engine flexible environment. Hence, we recommend Cloud Run over App Engine.

References:

App Engine Overview ⧉

Artifact Registry

Repository for managing and storing container images. Container images greatly increase reproducibility and security in AI development. They are essential for many cloud-based services. Google Cloud's Artifact Registry is a fully managed service that allows you to store, manage and secure your build artifacts, such as container images and dependencies like libraries, packages and binaries. It supports a variety of formats like Docker, Maven and npm and is designed to scale efficiently while ensuring security and reliability. It is a key part of the DevOps lifecycle, helping teams organize and manage artifacts generated during the build process.

  • Centralized storage
  • Multi-format support
  • Secure access
  • Version tracking
  • Seamless integration
  • Dependency management
  • Scalable repository

Artifact Registry

Artifact Registry Parameters

Parameter Description Mandatory
Artifact Registry Name Name of the Artifact Registry Yes
Format Format of the artifacts (e.g., Docker) Yes
Location Region where the Artifact Registry is located Yes
Customer-managed encryption key (CMEK) Encryption key managed by the customer No
Cleanup Policy Policy for cleaning up old or unused artifacts No

Default Configurations

  • Enable APIs:

    • The Artifact Registry API (artifactregistry.googleapis.com) is enabled, allowing the service to manage and store container images, libraries, and artifacts effectively. This API is essential for the Artifact Registry to operate within your Google Cloud environment.
  • Cleanup Policy:

    • Cleanup Policy Dry Run: By default, the dry-run flag (cleanup_policy_dry_run) is set to false, meaning the cleanup pipeline will actively remove old or unused versions of artifacts in the repository to manage storage efficiently. If set to true, the cleanup runs in "dry run" mode, preventing actual deletion and allowing you to review what would be cleaned up before making any changes (see the sketch below).
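
As an illustration, a minimal Terraform sketch of a Docker repository with a cleanup policy and the dry-run flag. The repository name, location and cleanup rule are placeholders, not the platform's actual defaults.

```hcl
resource "google_artifact_registry_repository" "repo" {
  repository_id = "my-repo"              # placeholder repository name
  location      = "europe-west4"         # placeholder region
  format        = "DOCKER"

  # false = cleanup policies actively delete matching versions;
  # true  = "dry run" mode, nothing is deleted, results can be reviewed first.
  cleanup_policy_dry_run = false

  cleanup_policies {
    id     = "delete-old-untagged"
    action = "DELETE"
    condition {
      tag_state  = "UNTAGGED"
      older_than = "2592000s"            # 30 days
    }
  }
}
```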

References:

Artifact Registry Overview ⧉

Push and Pull Images ⧉

Grant Repo specific Permissions ⧉

Cloud Build

A service for automating the building, testing and deployment of code. Cloud Build is a service that executes your builds on Google Cloud infrastructure. Cloud Build can import source code from Cloud Storage, GitLab, GitHub or Bitbucket, execute a build to your specifications and produce artifacts such as Docker containers or Java archives. Cloud Build executes your build as a series of build steps, where each build step is run in a Docker container. A build step can do anything that can be done from a container, irrespective of the environment. You can either use the supported build steps provided by Cloud Build or write your own.

  • Continuous integration
  • Custom workflows
  • Multi-language support
  • Fast builds
  • Secure pipelines
  • Scalable resources
  • Artifact management

Cloud Build

Cloud Build Parameters

Parameter Description Mandatory
Resource Name Name of the Cloud Build resource Yes
Region Region where the Cloud Build resource is located Yes
Deploy To Target location or service for deployment Yes
Container Image URL URL of the container image used in the build process Yes

Default Configurations

  • Pub/Sub Trigger: We configure a Pub/Sub topic that triggers the Cloud Build process. By default, this topic is set to "gcr", meaning that Cloud Build will be triggered by messages published to this topic.

  • Substitution Variables: Substitution variables allow you to pass dynamic values during the build process. For example:

    • _ACTION: Represents the action specified in the incoming Pub/Sub message (e.g., a build, tag, or deploy action).
    • _IMAGE_TAG: Captures the image tag provided in the Pub/Sub message, ensuring the correct image version is used during the build process. Users can update the build pipeline according to their container image if required.
  • Filter:

    By default, the filter is set to match on the image tag (_IMAGE_TAG.matches("")), but you can modify it to include more specific conditions based on your needs. The default setting ensures automated deployment of new image versions as and when they are pushed to Artifact Registry.

  • Build Step Image:

    The gcloud builder image used during the build process is "gcr.io/cloud-builders/gcloud". This image contains the necessary tools to execute Google Cloud commands, including managing Cloud Run services and jobs.

  • Logging Options:

    Logging options determine where Cloud Build logs are stored. The default setting is "CLOUD_LOGGING_ONLY", ensuring all logs are sent to Google Cloud Logging for easy monitoring and troubleshooting.

  • Build Timeout:

    The timeout for the Cloud Build process is set to 540 seconds. If the build exceeds this time, it will be automatically terminated. This can be adjusted if longer build times are required for your specific workloads.

  • Labels:

    Labels help organize and track your build resources. These labels can be applied to the virtual machines (VMs) that are created as part of the Cloud Build process, making it easier to manage and classify resources. You can specify custom labels such as project ownership, data classification, and more.

  • Custom Roles for Cloud Build:

    A custom role called "Cloud Run Deployer" is configured with a set of permissions required to manage Cloud Run services and jobs. This role includes capabilities like:

    • Updating jobs, listing locations, managing revisions, accessing services, and creating logs.

    • Acting as a service account, allowing Cloud Build to securely interact with other Google Cloud services.
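
For reference, a hedged Terraform sketch of a Pub/Sub-driven trigger combining these defaults. The region, service name, substitution bindings and deploy arguments are illustrative placeholders and may differ from the platform's actual pipeline.

```hcl
resource "google_pubsub_topic" "gcr" {
  name = "gcr"                                    # default trigger topic described above
}

resource "google_cloudbuild_trigger" "deploy_on_push" {
  name     = "deploy-on-image-push"
  location = "europe-west4"                       # placeholder region

  pubsub_config {
    topic = google_pubsub_topic.gcr.id
  }

  # Dynamic values taken from the incoming Pub/Sub message (placeholder bindings).
  substitutions = {
    _ACTION    = "$(body.message.data.action)"
    _IMAGE_TAG = "$(body.message.data.tag)"
  }

  # Default filter matches on the image tag; tighten as needed.
  filter = "_IMAGE_TAG.matches(\"\")"

  build {
    step {
      name = "gcr.io/cloud-builders/gcloud"       # default build step image
      args = ["run", "deploy", "my-service", "--image", "$_IMAGE_TAG", "--region", "europe-west4"]
    }
    timeout = "540s"                              # default build timeout
    options {
      logging = "CLOUD_LOGGING_ONLY"              # default logging destination
    }
  }
}
```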

References:

Cloud Build Overview ⧉

Cloud Composer

Workflow orchestration service built on Apache Airflow for scheduling and managing chains of jobs/functionalities. Cloud Composer is a fully managed workflow orchestration service built on Apache Airflow. It allows you to create, schedule, monitor and manage workflows, simplifying the automation of tasks across cloud services and on-premises systems. Composer integrates seamlessly with other Google Cloud services and provides automation for environment setup, letting you focus on your workflows rather than infrastructure management.

  • Workflow automation
  • Apache Airflow
  • Scalable scheduling
  • Integrated monitoring
  • Dependency management
  • Custom workflows
  • Easy scaling

Cloud Composer

Cloud Composer Parameters

Parameter Description Mandatory
Cloud Composer Cluster Name Name of the Cloud Composer cluster Yes
Region Region where the Cloud Composer cluster is located Yes
Resilience Mode Mode of resilience for the Cloud Composer Yes
IP Range Pods IP range for the pods in the environment No
IP Range Services IP range for the services in the environment No
Environment Size Size of the environment (e.g., small, medium, large) Yes
Service Account Service account associated with the Cloud Composer No
Customer-managed encryption key (CMEK) Encryption key managed by the customer No

Default Configurations

  • IAM roles for Cloud Composer:

    • Composer Worker Role: This assigns the roles/composer.worker role to a service account, allowing it to perform tasks required for the Cloud Composer environment.
  • Workload configurations: The environment's workload configurations vary based on the size of the environment (small, medium, large). These configurations dictate the amount of CPU, memory, and storage allocated for different components like:

    • Scheduler: Manages the orchestration of tasks.
    • Web Server: Provides the UI for managing workflows.
    • Worker: Executes the actual tasks.
    • Triggerer: Handles external triggers for workflows.

    These configurations ensure that your environment is optimized based on your operational needs.
  • Private environment:

    • Private service connect: This ensures the environment uses a private connection type for enhanced security by enabling private endpoints and connecting through a private subnetwork.
  • Dynamic workloads: The Composer environment dynamically adjusts its resources based on workload size using the configurations provided. This makes it flexible to scale with your operational requirements.

  • Node configuration: The environment nodes are configured with custom networking, including private IP allocation, subnetwork configuration, and optional encryption settings using a KMS key.

  • Resilience mode: The Composer environment can be set to different resilience levels to ensure high availability and fault tolerance, enhancing the reliability of your workflows.
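
A minimal Terraform sketch of an environment with some of these defaults, for illustration. The environment name, region, sizes and the variables var.project_id and var.composer_service_account are assumed placeholders; the platform's actual module may differ.

```hcl
# Grant the Composer worker role to the environment's service account.
resource "google_project_iam_member" "composer_worker" {
  project = var.project_id
  role    = "roles/composer.worker"
  member  = "serviceAccount:${var.composer_service_account}"
}

resource "google_composer_environment" "composer" {
  name   = "my-composer-env"                      # placeholder cluster name
  region = "europe-west4"                         # placeholder region

  config {
    environment_size = "ENVIRONMENT_SIZE_SMALL"   # small / medium / large

    software_config {
      image_version = "composer-2-airflow-2"
    }

    # Private environment for enhanced security.
    private_environment_config {
      enable_private_endpoint = true
    }

    # Workload sizes vary with the chosen environment size.
    workloads_config {
      scheduler {
        cpu        = 0.5
        memory_gb  = 2
        storage_gb = 1
        count      = 1
      }
      web_server {
        cpu        = 0.5
        memory_gb  = 2
        storage_gb = 1
      }
      worker {
        cpu        = 0.5
        memory_gb  = 2
        storage_gb = 1
        min_count  = 1
        max_count  = 3
      }
    }

    node_config {
      service_account = var.composer_service_account   # assumed variable
    }
  }
}
```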

References:

Cloud Composer Overview ⧉

Cloud Run Job

Service that runs repetitive compute tasks in a serverless environment. It can be reached from any machine in the project and is great for centralising certain functionalities (e.g. data conversion) within the development project. Cloud Run is a managed compute platform that enables you to run containers that are invocable via requests or events. Cloud Run is serverless: it abstracts away all infrastructure management, so we can focus on what matters most: building great applications. A Cloud Run Job runs a container to completion without a server; the job runs its own tasks and exits when it is finished.

  • Containerized tasks
  • On-demand execution
  • Scalable resources
  • No server management
  • Automated scaling
  • Event-driven
  • Pay-per-use

Cloud Run Job

Cloud Run Parameters

Parameter Description Mandatory
Job Name Name of the Cloud Run Job Yes
Region Region where the job will be executed Yes
No of Tasks Number of tasks to run in the job Yes
CPU CPU allocation for each task Yes
Memory Memory allocation for each task Yes
Service Account Service account associated with the job No
Container Image URL URL of the container image used for the job Yes
Parallelism Number of tasks to run in parallel Yes
Task Timeout Timeout for each task Yes
Time Unit Unit of time for the task timeout (e.g., seconds, minutes) Yes

Default Configurations

  • Variables:
    • max_retries: Defines the maximum number of retries allowed for the job in case of failures. The default value is set to 3.
    • command: Specifies the command to be run inside the container. The default value is null, meaning no specific command is set by default, and the container will use its own startup process.
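
For reference, a hedged Terraform sketch of a job with these defaults. The job name, region, image, resource sizes and var.job_service_account are placeholders.

```hcl
resource "google_cloud_run_v2_job" "job" {
  name     = "my-job"                        # placeholder job name
  location = "europe-west4"                  # placeholder region

  template {
    task_count  = 1                          # "No of Tasks" parameter
    parallelism = 1                          # tasks run in parallel

    template {
      max_retries = 3                        # default retry count
      timeout     = "600s"                   # task timeout

      containers {
        image = "europe-docker.pkg.dev/my-project/my-repo/my-image:latest"  # placeholder
        # command is left unset by default, so the container's own entrypoint is used.
        resources {
          limits = {
            cpu    = "1"
            memory = "512Mi"
          }
        }
      }

      service_account = var.job_service_account   # assumed variable, optional
    }
  }
}
```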

References:

Cloud Run Jobs Overview ⧉

Cloud Run Service

A service to run code that continuously handles incoming requests (e.g. a web service). Unlike a Cloud Run Job, which executes each job based on a trigger, a Cloud Run Service stays available to serve requests.

  • Fully managed
  • Auto-scaling
  • Containerized apps
  • No server management
  • Event-driven
  • Pay-per-use
  • Fast deployment

Cloud Run Service

Cloud Run Services Parameters

Parameter Description Mandatory
Cloud Run Name Name of the Cloud Run service Yes
Region Region where the Cloud Run service is located Yes
CPU CPU allocation for the service Yes
Memory Memory allocation for the service Yes
Container Image URL URL of the container image used for the service Yes
Port Port number used by the Cloud Run service Yes
Service Account Service account associated with the service No

Default Configurations

  • Module: Serverless Load Balancer

    The serverless_loadbalancer module is configured to enable the seamless integration of a load balancer with the Cloud Run service. This ensures that incoming traffic is efficiently distributed across multiple instances, improving the availability and reliability of the service.

  • Security Policy:

    The security_policy variable is used to apply a Cloud Armor Security Policy to the Cloud Run service. Cloud Armor helps protect applications from security threats like DDoS attacks. The default value is set to an empty string, meaning no specific policy is enforced unless provided.
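
A minimal Terraform sketch of the Cloud Run service itself, for illustration. The serverless load balancer and Cloud Armor policy are wired up by the platform's own module and are not shown; the service name, region, image and var.run_service_account are placeholders.

```hcl
resource "google_cloud_run_v2_service" "service" {
  name     = "my-service"                    # placeholder service name
  location = "europe-west4"                  # placeholder region

  # Restrict ingress to internal traffic and the load balancer.
  ingress = "INGRESS_TRAFFIC_INTERNAL_LOAD_BALANCER"

  template {
    containers {
      image = "europe-docker.pkg.dev/my-project/my-repo/my-image:latest"  # placeholder
      ports {
        container_port = 8080                # "Port" parameter
      }
      resources {
        limits = {
          cpu    = "1"
          memory = "512Mi"
        }
      }
    }
    service_account = var.run_service_account   # assumed variable, optional
  }
}
```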

References:

Cloud Run Service Overview ⧉
Host Cloud Run Service behind Load Balancer

Cloud SQL

A relational database service for MySQL, PostgreSQL and SQL Server. Cloud SQL is a fully-managed database service that helps us set up, maintain, manage and administer our relational databases on Google Cloud Platform.

  • Automated backups
  • High availability
  • Secure connections
  • Scalable instances
  • Easy maintenance
  • Integrated monitoring

Cloud SQL

Cloud SQL Parameters

Parameter Description Mandatory
Cloud SQL Name Name of the Cloud SQL instance Yes
Region Region where the Cloud SQL instance is located Yes
Database Version Version of the database Yes
Edition Edition of the database (e.g., Enterprise) Yes
Database Name Name of the database Yes
Username Username for accessing the database Yes
Tier Tier of the Cloud SQL instance (e.g., db-f1-micro) Yes
Backup Location Location for database backups Yes
Disk Size Size of the disk (in GB) Yes
Disk Type Type of disk (e.g., SSD, HDD) Yes

Default Configurations

  • Database Flags: We’ve introduced database-specific flags to enhance the configuration of various database engines:

    • PostgreSQL:

      • log_duration: Enables logging of the duration of each statement.
      • pgaudit.log: Logs all activities for audit purposes.
      • log_hostname: Logs the hostname of the client connecting to the instance.
      • log_checkpoints: Enables checkpoint logging.
    • MySQL:

      • general_log: Enables logging of general queries for troubleshooting.
      • skip_show_database: Restricts the SHOW DATABASES statement to authorized users.
      • wait_timeout: Sets the wait timeout to manage inactive connections.
    • SQL Server:

      • 1204: Enables deadlock logging.
      • remote_access: Allows remote connections to the SQL server.
      • remote_query_timeout: Configures the timeout for remote queries to 300 seconds.
  • Private Network Configuration:

    • private_network_url : This variable specifies the private network URL for securely connecting to the Cloud SQL instance within your network.
  • Maintenance Window: To minimize downtime, the maintenance window is configured as follows:

    • Day: Sunday
    • Hour: Midnight (UTC)
    • Update Track: Set to "stable" to receive stable updates.
  • SSL Certificate Management: We’ve automated the management of SSL certificates for secure communication:

    • Client Certificate: SSL certificates are created and managed for each SQL instance using Google Secret Manager to ensure encrypted communication.
    • Password Management: Random, complex passwords are generated and securely stored using Secret Manager.
    • Secret Storage: All sensitive information, including certificates and credentials, is stored in Secret Manager for added security.
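
An illustrative Terraform sketch of a PostgreSQL instance with some of these defaults. The instance name, region, tier, flag values and var.private_network_url are placeholders; the platform's module also handles SSL certificates and Secret Manager, which are not shown.

```hcl
resource "google_sql_database_instance" "postgres" {
  name             = "my-sql-instance"            # placeholder name
  region           = "europe-west4"               # placeholder region
  database_version = "POSTGRES_15"

  settings {
    tier      = "db-custom-2-8192"                # "Tier" parameter
    disk_size = 50                                # GB
    disk_type = "PD_SSD"

    # Private connectivity only.
    ip_configuration {
      ipv4_enabled    = false
      private_network = var.private_network_url   # assumed variable
    }

    # Example PostgreSQL flags from the defaults above.
    database_flags {
      name  = "log_duration"
      value = "on"
    }
    database_flags {
      name  = "log_checkpoints"
      value = "on"
    }

    maintenance_window {
      day          = 7                            # Sunday
      hour         = 0                            # midnight (UTC)
      update_track = "stable"
    }

    backup_configuration {
      enabled  = true
      location = "eu"                             # "Backup Location" parameter
    }
  }
}
```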

References:

Cloud SQL Overview ⧉
Connect to Cloud SQL from GKE
Connect to SQL from Cloud Run

Cloud Source

Service for hosting private code repositories. Cloud Source Repositories are fully featured, private Git repositories hosted on Google Cloud.

  • Version control
  • Secure storage
  • Code collaboration
  • Integrated CI/CD
  • Branch management
  • Access control
  • Scalable hosting

Cloud Source

Cloud Source Parameters

Parameter Description Mandatory
Repository Name Name of the Cloud Source repository Yes

Default Configurations:

No default configurations have been implemented.

References:

Cloud Source Repository CLI ⧉

Cloud Storage

A service that allows data to be stored and accessed remotely over the internet. Cloud Storage allows world-wide storage and retrieval of any amount of data at any time. You can use Cloud Storage for a range of scenarios including serving website content, storing data for archival and disaster recovery.

  • Infinite storage volume
  • Remote access
  • Data backup
  • Secure encryption
  • Cost-efficient
  • Sync across devices
  • Collaboration support

Cloud Storage

Cloud Storage Parameters

Parameter Description Mandatory
Bucket Name Name of the Cloud Storage bucket Yes
Region Region where the storage bucket will be located Yes
Storage Class Type of storage class (e.g., STANDARD, NEARLINE, COLDLINE, ARCHIVE) Yes
GCS Lifecycle Lifecycle management policies for the bucket Yes
Customer-managed encryption key (CMEK) Encryption key managed by the customer No

Bucket Name

Bucket names are globally unique and cannot be updated once created. We follow the ${USER_DEFINED_SERVICE_NAME}-${RANDOM_ID} naming convention for all buckets created from the service catalogue. For more details, please refer to the official documentation ⧉.

Region

Currently the service catalogue supports provisioning regional buckets, which are optimal for regional workloads. If a use case requires multi-regional buckets, users can create them from the Google Cloud Storage console, to which they have access.

To understand what suits your use case best, please refer to location considerations ⧉ and location recommendations ⧉.

Storage Class

You can select one of the storage classes below while provisioning the bucket:

| Storage Class | Monthly Availability | Minimum Storage Duration | Retrieval Fees | Best For |
| --- | --- | --- | --- | --- |
| STANDARD | 99.99% in regions | None | None | Frequently accessed ("hot") data |
| NEARLINE | 99.9% in regions | 30 days | Yes | Highly durable storage service for storing infrequently accessed data |
| COLDLINE | 99.9% in regions | 90 days | Yes | Highly durable storage service for infrequently accessed data with slightly lower availability |
| ARCHIVE | 99.9% in regions | 365 days | Yes | Highly durable storage service for data archiving, online backup, and disaster recovery |

To learn more about storage classes visit official documentation ⧉

Default Lifecycle policy

The lifecycle policy can be applied during creation of the buckets. By default, the following policy is applied to buckets created from the service catalogue.

| Action | Object condition |
| --- | --- |
| Set to Nearline | 30+ days since object was created and Storage Class matches Standard |
| Set to Coldline | 90+ days since object was created and Storage Class matches Nearline |
| Set to Archive | 365+ days since object was created and Storage Class matches Coldline |
| Delete object | 730+ days since object was created and Storage Class matches Archive |

Note: The above is added as a standard policy to ensure data lifecycle optimization. Users can update the policies according to their requirements. Refer to the official documentation ⧉ to learn more about lifecycle policies.

Customer-managed encryption key (CMEK)

Cloud Storage always encrypts your data on the server side, before it is written to disk, at no additional charge. By default, Google-managed encryption is used. To increase data security, we recommend using customer-managed encryption keys as server-side encryption for data-sensitive workloads.

Note: Client-side encryption: encryption that occurs before data is sent to Cloud Storage. Such data arrives at Cloud Storage already encrypted but also undergoes server-side encryption.

Default Configurations

Bucket Settings:

  • Versioning Enabled: This feature will be enabled to keep a history of all object versions in the bucket. It allows you to recover objects that are overwritten or deleted, adding an extra layer of data protection.

  • Uniform Bucket-Level Access: We will enforce uniform bucket-level access (uniform_bucket_level_access = true), meaning all access control will be managed at the bucket level rather than at the individual object level. This simplifies access management and improves security.
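
Putting these defaults together, a hedged Terraform sketch of a catalogue-style bucket with versioning, uniform bucket-level access and the first lifecycle transition. The bucket name and location are placeholders, and only the first lifecycle rule is shown; the remaining transitions follow the same pattern.

```hcl
resource "random_id" "suffix" {
  byte_length = 4
}

resource "google_storage_bucket" "bucket" {
  name          = "my-service-${random_id.suffix.hex}"   # ${USER_DEFINED_SERVICE_NAME}-${RANDOM_ID}
  location      = "EUROPE-WEST4"                         # regional bucket (placeholder)
  storage_class = "STANDARD"

  uniform_bucket_level_access = true     # access managed at bucket level only

  versioning {
    enabled = true                       # keep a history of object versions
  }

  # Default lifecycle: Standard -> Nearline after 30 days (Coldline, Archive
  # and deletion rules follow the same pattern at 90, 365 and 730 days).
  lifecycle_rule {
    condition {
      age                   = 30
      matches_storage_class = ["STANDARD"]
    }
    action {
      type          = "SetStorageClass"
      storage_class = "NEARLINE"
    }
  }
}
```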

References:

Cloud Storage Overview ⧉
Discover Object Storage Gcloud Commands ⧉

Compute Engine

Creating new computing machines (called virtual machines or VMs) in the cloud to preprocess data, develop code, host smaller apps with a web frontend (e.g. TensorBoard, a custom image viewer, etc.) and so on.
Google Compute Engine allows you to create and run virtual machines (VMs) on Google’s powerful infrastructure. Think of a virtual machine as a computer that runs entirely in the cloud—there’s no need to buy or maintain any physical hardware. You can start with just one virtual machine or scale up to thousands, depending on your needs. The best part? You only pay for what you use, with no upfront costs.

  • Accessible via SSH
  • Scalable resources
  • Customizable VMs
  • Global availability
  • Secure infrastructure
  • Automated backups

Compute Engine

Compute Engine Parameters

Parameter Description Mandatory
Instance Name Name of the virtual machine Yes
Region Region where the VM will be deployed Yes
Zone Zone within the selected region Yes
GPU Type Type of GPU attached to the VM Yes
GPU Count Number of GPUs attached to the VM Yes
Machine Type Type of machine instance Yes
Boot Disk Image Image used for the boot disk Yes
Boot Disk Size Size of the boot disk (in GB) Yes
Boot Disk Type Type of boot disk (e.g., SSD, HDD) Yes
Customer-managed encryption key (CMEK) Encryption key managed by the customer No
Service Account Service account associated with the VM No

Accessing the instances on GCP:

  • Users can access the instances over SSH, either via browser SSH (by clicking the SSH button on the Compute Engine console) or via the gcloud SDK.
  • Users can also use IAP Desktop or an IAP tunnel to access their VMs.

For more details and steps to login refer Different ways to connect to GCP Projects section.

Default Configurations

Network Tags: - allow-iap: This network tag ensures that Identity-Aware Proxy (IAP) is enabled, which helps secure your VM instances by requiring authentication and authorization before access. It is required to SSH into the instance. If not applied, the instance will not allow SSH traffic.

Metadata: - install-nvidia-driver: Automatically installs NVIDIA drivers if GPU support is needed for the VM, ensuring optimized GPU performance.

Default Settings:

Several default settings are applied to ensure the VMs are secure, manageable, and meet operational standards:

  • VM with Internal IP Only: By default, the VM is configured with an internal IP, providing connectivity within the VPC network while keeping the instance isolated from public internet traffic.
  • Service Account: Each VM instance is automatically attached to a service account, which provides the necessary permissions to access Google Cloud resources securely. If the service account parameter is left blank while provisioning the instance, the custom service account created by the platform is used. If the user provides an existing service account of their choice, that service account is used instead.
  • KMS Key Integration: The VM is integrated with Cloud KMS (Key Management Service) to encrypt sensitive data and ensure secure handling of any confidential information.

  • Machine Configurations: While we are working on introducing more machine configurations for compute instances, users can opt for a different configuration by provisioning the service with a basic configuration from our platform and then editing the instance configuration from the GCP Compute Engine section.

Compliance requirements for Compute Engine configurations:

  • Ensure that no external IP is attached to the instances. Raise an exception with the support team if your requirements demand one.
  • It is recommended to use CMEK or customer-supplied encryption keys for disks when working with regulated projects.
  • Ensure the effective labels on the instances are not modified.

A sketch of a typical instance configuration follows below.
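
For illustration, a hedged Terraform sketch of an instance reflecting these defaults. The instance name, zone, machine type, image, GPU type and the variables var.kms_key_self_link, var.subnetwork and var.service_account are placeholders, not the platform's exact module.

```hcl
resource "google_compute_instance" "vm" {
  name         = "my-vm"                        # placeholder instance name
  zone         = "europe-west4-a"               # placeholder zone
  machine_type = "n1-standard-8"                # placeholder machine type

  tags = ["allow-iap"]                          # required for IAP-based SSH

  metadata = {
    install-nvidia-driver = "True"              # auto-install NVIDIA drivers when GPUs are attached
  }

  boot_disk {
    initialize_params {
      image = "debian-cloud/debian-12"          # placeholder boot disk image
      size  = 100
      type  = "pd-ssd"
    }
    kms_key_self_link = var.kms_key_self_link   # CMEK for the boot disk (assumed variable)
  }

  guest_accelerator {
    type  = "nvidia-tesla-t4"                   # placeholder GPU type
    count = 1
  }

  scheduling {
    on_host_maintenance = "TERMINATE"           # required when GPUs are attached
  }

  network_interface {
    subnetwork = var.subnetwork                 # internal IP only: no access_config block
  }

  service_account {
    email  = var.service_account                # platform-created SA if left blank in the catalogue
    scopes = ["cloud-platform"]
  }
}
```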

References:

Compute Engine Overview ⧉

Edit Compute Engine Machine Type ⧉

Add or Remove GPUs ⧉

Stop or Restart VMs ⧉

Cloud Workstation

A managed development environment that provides secure, scalable and accessible virtual desktops for coding. Cloud Workstations provides preconfigured, customizable and secure managed development environments on Google Cloud. Cloud Workstations is accessible through a browser-based IDE, from multiple local code editors (such as IntelliJ IDEA Ultimate or VS Code), or through SSH. Instead of manually setting up development environments, we can create a workstation configuration that specifies the environment in a reproducible way.

  • Managed environment
  • Scalable resources
  • Secure access
  • Integrated tools
  • Custom configurations
  • Remote development
  • Collaborative features

Cloud Workstation

Cloud Workstation Parameters

Parameter Description Mandatory
Instance Name Name of the Cloud Workstations instance Yes
Region Region where the instance is located Yes
GPU Type Type of GPU attached to the instance Yes
GPU Count Number of GPUs assigned Yes
Machine Type Type of machine instance Yes
Boot Disk Image Image used for the boot disk Yes
Boot Disk Size Size of the boot disk (in GB) Yes
Boot Disk Type Type of boot disk (e.g., SSD, HDD) Yes
Customer-managed encryption key (CMEK) Encryption key managed by the customer No
Service Account Service account associated with the instance No

References:

Cloud Workstations ⧉

GKE Cluster

Google Kubernetes Engine (GKE) is a fully managed Kubernetes service for deploying, managing and scaling containerized applications. Kubernetes, the leading container orchestration platform, automates the deployment, scaling and operation of application containers across clusters of hosts. With GKE, you can benefit from Google Cloud’s security, reliability and scalability while focusing on your applications without managing the underlying infrastructure.

  • Kubernetes management
  • Auto-scaling
  • High availability
  • Secure environment
  • Integrated monitoring
  • Easy upgrades
  • Custom configurations

GKE Cluster

GKE Parameters

Parameter Description Mandatory
GKE Cluster Name Name of the GKE Cluster Yes
Region Region where the GKE Cluster is located Yes
Customer-managed encryption key (CMEK) Encryption key managed by the customer No
Service Account Service account associated with the GKE Cluster No
IP Range Pods IP range for the pods in the cluster No
IP Range Services IP range for the services in the cluster No
Auto Scaling Auto scaling configuration for the cluster No
Primary Node Pool Name Name of the primary node pool Yes
Primary Node Pool Minimum Count Minimum number of nodes in the primary node pool Yes
Primary Node Pool Maximum Count Maximum number of nodes in the primary node pool No
Primary Node Pool Machine Type Type of machine instances in the primary node pool No
Enable Extra Node Pool Whether to enable an extra node pool No
Secondary Node Pool Name Name of the secondary node pool No
Secondary Node Pool Minimum Count Minimum number of nodes in the secondary node pool No
Secondary Node Pool Maximum Count Maximum number of nodes in the secondary node pool No
Secondary Node Pool Machine Type Type of machine instances in the secondary node pool No

Default Configurations

  1. Labels:

    • cluster_name: A label used to identify the GKE cluster, making it easier to organize and manage multiple clusters within your environment.
    • node_pool: This label tracks the specific node pool to which the nodes belong, providing better visibility and management over different node pools.
  2. Master IPv4 CIDR Block:

    • master_ipv4_cidr_block: Specifies the IP range in CIDR notation for the GKE master network. This helps in defining network isolation for the GKE master nodes.
  3. Horizontal Pod Autoscaling:

    • horizontal_pod_autoscaling: This configuration enables horizontal pod autoscaling in the GKE cluster, allowing pods to scale automatically based on CPU usage or other metrics. The default is true to optimize resource utilization.
  4. Maintenance Recurrence:

    • maintenance_recurrence: Defines the frequency of the recurring maintenance window in RFC5545 format. This helps automate regular cluster maintenance like patching and updates.
  5. Remove Default Node Pool:

    • remove_default_node_pool: If set to true, this will remove the default node pool during cluster setup, allowing for custom node pool configurations.
  6. Node Pools Labels:

    • node_pools_labels: A map of maps that allows you to specify custom labels for nodes in different node pools, enhancing node management and classification.
  7. Google Compute Engine Persistent Disk CSI Driver:

    • gce_pd_csi_driver: Enables the Google Compute Engine Persistent Disk Container Storage Interface (CSI) Driver. This allows Kubernetes workloads to dynamically provision and manage persistent disks. By default, this feature is enabled.
  8. Identity Namespace:

    • identity_namespace: Specifies the workload pool for attaching Kubernetes service accounts. The default value is set to enabled, automatically configuring the project-based pool ([project_id].svc.id.goog), improving security and access control.
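
These settings correspond to inputs of a Terraform GKE module. The following is a hedged sketch assuming the public terraform-google-modules/kubernetes-engine module; the module source, version, cluster name, region, ranges and node pool values are placeholders, and the platform's actual module may differ.

```hcl
module "gke" {
  source  = "terraform-google-modules/kubernetes-engine/google//modules/private-cluster"
  version = "~> 30.0"                             # assumed module version

  project_id = var.project_id
  name       = "my-cluster"                       # placeholder cluster name
  region     = "europe-west4"                     # placeholder region
  network    = var.network
  subnetwork = var.subnetwork

  ip_range_pods     = "pods-range"                # secondary range names (placeholders)
  ip_range_services = "services-range"

  master_ipv4_cidr_block     = "172.16.0.0/28"    # CIDR for the GKE control plane
  horizontal_pod_autoscaling = true               # default: enabled
  remove_default_node_pool   = true               # replace the default pool with custom pools
  gce_pd_csi_driver          = true               # persistent disk CSI driver enabled
  identity_namespace         = "enabled"          # workload pool [project_id].svc.id.goog

  maintenance_recurrence = "FREQ=WEEKLY;BYDAY=SA,SU"     # RFC5545 recurrence (placeholder)
  maintenance_start_time = "2025-01-01T00:00:00Z"
  maintenance_end_time   = "2025-01-01T04:00:00Z"

  node_pools = [
    {
      name         = "primary-pool"
      machine_type = "n2-standard-4"
      min_count    = 1
      max_count    = 3
    },
  ]

  node_pools_labels = {
    all          = { cluster_name = "my-cluster" }
    primary-pool = { node_pool = "primary-pool" }
  }
}
```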

References:

GKE Overview ⧉

JupyterHub

A multi-user platform for hosting Jupyter notebooks in a shared environment. A customized solution that provides a JupyterLab experience with user isolation. Recommended for sharing a single instance among multiple users for collaboration.

  • Collaborative notebooks
  • Scalable environment
  • User management
  • Customizable environments
  • Centralized access
  • Secure authentication
  • Resource sharing

Jupyter Hub

JupyterHub Parameters

Parameter Description Mandatory
Instance Name Name of the virtual machine Yes
Region Region where the VM will be deployed Yes
Zone Zone within the selected region Yes
GPU Type Type of GPU attached to the VM Yes
GPU Count Number of GPUs attached to the VM Yes
Machine Type Type of machine instance Yes
Boot Disk Image Image used for the boot disk Yes
Boot Disk Size Size of the boot disk (in GB) Yes
Boot Disk Type Type of boot disk (e.g., SSD, HDD) Yes
Customer-managed encryption key (CMEK) Encryption key managed by the customer No
Service Account Service account associated with the VM No

Default Configurations

  • Network Tags: The following tags are applied to enable necessary functionality for the JupyterHub service:

    • allow-health-checks: Ensures health checks are allowed for monitoring the service.
    • allow-iap: Enables Identity-Aware Proxy (IAP) for secure access.
  • Startup Script:

    A custom startup script is used to configure JupyterHub with the necessary user settings and URLs. Key parameters include:

    • user_list: List of users with access.

    • user_email: Email of the requesting user. By adding users to this list, they can be added as users on the JupyterHub notebook.

    • admin_email: Email of the JupyterHub admin. This grants admin-level permissions on the JupyterHub service to the listed users. Please modify this list responsibly.

      Note: Users of the instance will be responsible for managing the user access to the notebooks and are advised to share the notebooks with registered users only.

  • Custom Metadata:

    • custom-proxy-url: This metadata key stores the link to the notebook URL hosting the JupyterHub server. This value will be referred to as CUSTOM_PROXY_URL henceforth.

    • startup-script: This metadata key stores the startup script used for the JupyterHub as described above.

    • install-nvidia-driver: This ensures NVIDIA drivers are installed on the instance if GPUs are attached. If users face any issues with the NVIDIA drivers, they can reach out to our support team via the feedback tool.

  • Proxy URLs:

    • Each JupyterHub instance is configured to host custom services on proxy ports. Ports 8002-8006 are enabled for hosting these services. Please ensure that the hosted services are authenticated for better security.
    • The service load balancers are attached to Cloud Armor policies.
    • Naming convention of the URL: $CUSTOM_PROXY_URL/$INSTANCE_NAME/proxy/{8002-8006}, where CUSTOM_PROXY_URL is defined in the custom metadata.
  • Load Balancers:

    • Each service is hosted behind a load balancer that follows the same name as the instance/service deployed. You can view the load balancers in the GCP console under Network Services > Load balancing.

Data Security Guidelines:

Our platform is committed to ensuring the security of your data within the project environment. However, it is the user's responsibility to maintain the confidentiality of their data and prevent it from being shared outside of the project.

  • Do not share your Jupyter Notebooks with anyone outside of the project.
  • Do not store your data outside of the project resources created. This includes sharing them online, on personal devices, or through any other means.
  • Be mindful of the information you include in your notebooks. Avoid storing sensitive data, such as passwords or personal information, directly within the notebooks.

Note: By using this platform, you agree to be responsible for the security of your data and to comply with these guidelines.
If you have any questions or concerns about data security, please contact our support team.

Visit our Developer Hub for learning tools and tips for working with Notebooks.

References:

Jupyter service Overview ⧉

MIS

The Medical Imaging Suite (MIS) is a comprehensive system for capturing, managing and analyzing medical images to enhance diagnostic accuracy.

  • Efficient workflow
  • Accurate diagnosis
  • Seamless data access
  • 3D imaging support
  • Remote collaboration
  • EHR integration

Medical Imaging Suite (MIS)

MIS Parameters

Parameter Description Mandatory
Instance Name Name of the MIS instance Yes
Region Region where the instance is located Yes
Zone Zone within the selected region Yes
GPU Type Type of GPU attached to the instance Yes
GPU Count Number of GPUs attached Yes
Machine Type Type of machine instance Yes
Boot Disk Image Image used for the boot disk Yes
Boot Disk Size Size of the boot disk (in GB) Yes
Boot Disk Type Type of boot disk (e.g., SSD, HDD) Yes
Customer-managed encryption key (CMEK) Encryption key managed by the customer No
Service Account Service account associated with the instance No

As MIS is a version of JupyterHub, all configurations of the JupyterHub service are applicable to MIS.

Default Configurations

  • Tags:

    The following tags are applied to enable necessary functionality for the JupyterHub service:

    • allow-health-checks: Ensures health checks are allowed for monitoring the service.
    • allow-iap: Enables Identity-Aware Proxy (IAP) for secure access.

  • Startup Script:

    A custom startup script is used to configure JupyterHub with the necessary user settings and URLs. Key parameters include:

    • user_list: List of users with access.

    • user_email: Email of the requesting user. By adding users to this list, they can be added as users on the JupyterHub notebook.

    • admin_email: Email of the JupyterHub admin. This grants admin-level permissions on the JupyterHub service to the listed users. Please modify this list responsibly.

      Note: Users of the instance will be responsible for managing the user access to the notebooks and are advised to share the notebooks with registered users only.

  • Custom Metadata:

    • custom-proxy-url: This metadata key stores the link to the notebook URL hosting the JupyterHub server.

    • startup-script: This metadata key stores the startup script used for the JupyterHub as described above.

    • install-nvidia-driver: This ensures NVIDIA drivers are installed on the instance if GPUs are attached. If users face any issues with the NVIDIA drivers, they can reach out to our support team via the feedback tool.

References:

Medical Imaging Suite ⧉

3D Slicer

3D Slicer is a Swiss Army knife of medical imaging research. It can load, visualize and analyze various file formats and data modalities, and it is continuously developed by the research community. The extension manager contains more than 150 extensions. 3D Slicer is an open-source software platform for medical image informatics, image processing and three-dimensional visualization. In GCP, it can be deployed as part of custom computing or visualization solutions to support medical and scientific research. GCP provides the infrastructure to host and process 3D Slicer workloads.

  • Medical imaging
  • 3D visualization
  • Open-source
  • Multi-modality support
  • Advanced analytics
  • Interactive tools
  • Extensible modules

3D Slicer

3D Slicer Parameters

Parameter Description Mandatory
VM Name Name of the virtual machine Yes
Region Region where the VM will be deployed Yes
Zone Zone within the selected region Yes
GPU Type Type of GPU attached to the VM Yes
GPU Count Number of GPUs attached to the VM Yes
Machine Type Type of machine instance Yes
Disk Name Name of the VM disk Yes
Boot Disk Image Image used for the boot disk Yes
Boot Disk Size Size of the boot disk (in GB) Yes
Boot Disk Type Type of boot disk (e.g., SSD, HDD) Yes
Customer-managed encryption key (CMEK) Encryption key managed by the customer No
Service Account Service account associated with the VM No

Access 3D Slicer

The 3D Slicer service is deployed as a desktop service. To access the application, RDP into the Windows desktop on GCP and use the application from there. This can be done using IAP Desktop as mentioned here.

Default Configurations

  1. Mount Folder:

    • mount_folder: The folder where the service's data will be mounted is specified. By default, this is set to D:/, but it can be customized based on your requirements. Read how to mount GCS buckets to Windows Instances here.
    • KMS Key for Disk Encryption: The service will support encryption using a Key Management Service (KMS) key. The KMS key’s self-link can be specified for secure data encryption (kms_key_self_link). If no key is provided, a default configuration will be used.
  2. Tags:

    • allow-iap: This tag enables Identity-Aware Proxy (IAP), providing secure access to the 3D Slicer service, allowing only authorized users to connect.

References:

3D Slicer Documentation ⧉

TPU

A specialized hardware accelerator designed by Google for high-performance machine learning tasks. Tensor Processing Units (TPUs) are Google's custom-developed application-specific integrated circuits (ASICs) used to accelerate machine learning workloads. Cloud TPUs allow you to access TPUs from Compute Engine, Google Kubernetes Engine and Vertex AI.

  • High performance
  • Machine learning
  • Accelerated computing
  • TensorFlow optimized
  • Scalable
  • Low latency
  • Cost-efficient

Tensor Processing Units (TPUs)

TPU VM Parameters

Parameter Description Mandatory
VM Name Name of the TPU virtual machine Yes
Region Region where the TPU VM is located Yes
Zone Zone within the selected region Yes
Accelerator Type Type of TPU accelerator Yes
Runtime Version Version of the TPU runtime Yes
TPU VM Type Type of TPU VM No
Customer-managed encryption key (CMEK) Encryption key managed by the customer No
Service Account Service account associated with the TPU VM No

Default Configurations

  • Tags:

    • allow-iap: Enables secure access to your TPU instance using Identity-Aware Proxy (IAP), ensuring controlled access.
  • Private Instance:

    • By default, no public IP is attached to the TPU instance, ensuring the instance is not accessible over the public network (see the sketch below).
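
A hedged Terraform sketch of these defaults using the google_tpu_v2_vm resource. The name, zone, accelerator type, runtime version and var.network are placeholders; the platform's actual module may differ.

```hcl
resource "google_tpu_v2_vm" "tpu" {
  name             = "my-tpu-vm"                 # placeholder name
  zone             = "europe-west4-b"            # placeholder zone
  accelerator_type = "v5litepod-4"               # placeholder accelerator type
  runtime_version  = "tpu-vm-tf-2.15.0"          # placeholder runtime version

  network_config {
    network             = var.network            # assumed variable
    enable_external_ips = false                  # private instance: no public IP
  }

  tags = ["allow-iap"]                           # secure access via IAP
}
```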

References:

Cloud TPUs Overview ⧉

Vertex AI Workbench

This is a JupyterLab on a pre-configured virtual machine located inside Vertex AI. Ideal to get started within a few minutes right from the browser. Vertex AI Workbench is a tool provided by Google Cloud for developers and data scientists working to build machine learning models. It’s based on Jupyter notebooks, which are interactive documents that allow you to write code, run it and see the results, all in one place. Think of it as a workspace for data scientists and engineers to do their job easily. You can use Vertex AI Workbench to access various Google Cloud services, making it easier to build and deploy machine learning models without leaving the notebook environment.

  • Scalable
  • ML workflow automation
  • Custom environments

Vertex AI Workbench

Vertex AI Workbench Parameters

Parameter Description Mandatory
VM Name Name of the Vertex AI Workbench virtual machine Yes
Region Region where the Vertex AI Workbench VM is located Yes
Zone Zone within the selected region Yes
GPU Type Type of Vertex AI Workbench GPU Yes
GPU Count Count of GPU Yes
Machine Type Type of Vertex AI Workbench machine Yes
Boot Disk Size Size of the boot disk Yes
Boot Disk Type Type of the boot disk Yes
Customer-managed encryption key (CMEK) Encryption key managed by the customer No
Service Account Service account associated with the Vertex AI Workbench VM No

Default Configurations

Enable Required APIs:

  • The necessary APIs (notebooks.googleapis.com, aiplatform.googleapis.com) for Vertex AI Workbench will be enabled to ensure smooth operation and functionality of the service.

Network and IP Settings:

  • Disable Public IP: All notebooks created by the catalogue have public IP access disabled (disable_public_ip = true), so the instance is only accessible through private networking for enhanced security.
  • Enable IP Forwarding: IP forwarding is enabled (enable_ip_forwarding = true), allowing the instance to forward packets to other destinations, which can be useful for certain networking scenarios.
  • Proxy Access: Proxy access remains enabled (disable_proxy_access = false), meaning that proxy services can still be used for accessing the instance if needed.

Network Tags:

  • The following tags will be assigned to the instance to help with identification, access control, and categorization:

    • allow-iap: This tag allows access via Identity-Aware Proxy (IAP), which adds an extra layer of security for SSH access.
    • deeplearning-vm: This tag indicates that the instance is configured for deep learning workloads.
    • notebook-instance: This tag designates the instance as a notebook for development and experimentation.
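
An illustrative Terraform sketch using the google_workbench_instance resource with these settings. The instance name, zone, machine type and the variables var.network and var.subnetwork are placeholders; the platform's module may differ.

```hcl
resource "google_workbench_instance" "workbench" {
  name     = "my-workbench"                     # placeholder name
  location = "europe-west4-a"                   # zone of the instance (placeholder)

  gce_setup {
    machine_type = "n1-standard-4"              # placeholder machine type

    boot_disk {
      disk_size_gb = 150
      disk_type    = "PD_SSD"
    }

    network_interfaces {
      network = var.network                     # assumed variable
      subnet  = var.subnetwork
    }

    disable_public_ip    = true                 # private networking only
    enable_ip_forwarding = true                 # allow packet forwarding
    tags                 = ["allow-iap", "deeplearning-vm", "notebook-instance"]
  }

  disable_proxy_access = false                  # keep proxy access available
}
```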

Visit our Developer Hub for learning tools and tips for working with Notebooks.

References:

Vertex AI Overview ⧉

SSH into Vertex AI Workbenches ⧉

Common Labels

In this section, we walk you through the labels applied to all the services created from the service catalogue.

  • System Labels:

    • controlled-by: Specifies the team or department that controls the resources.
    • created-by: Identifies the user or system that created the resources.
    • goog-terraform-provisioned: Denotes that the resource was provisioned using Terraform.
    • os-patch: Tracks OS patch information relevant to the resources.
  • Workload Specific Labels: These labels will be customized according to your workload:

    • contains-phi: Indicates that the resource may store Protected Health Information (PHI), ensuring that the data is treated accordingly.
    • data-classification: Marks the sensitivity level of the data stored (e.g., public, internal, restricted, confidential).

    Note: Workload-specific labels can be updated according to user workloads. We recommend not modifying the system labels, to preserve traceability. Users can add new labels to their resources according to their requirements. An example follows below.
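
For example, workload-specific labels can be set alongside the system labels when a resource is created or updated. A hedged Terraform sketch follows; the bucket name and label values are placeholders.

```hcl
# Example: adding workload-specific labels to a bucket created from the catalogue.
resource "google_storage_bucket" "labelled" {
  name     = "my-service-1a2b3c4d"              # placeholder bucket name
  location = "EUROPE-WEST4"

  labels = {
    contains-phi        = "false"               # whether the resource may store PHI
    data-classification = "internal"            # e.g. public, internal, restricted, confidential
  }
}
```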