
Service Catalogue

Overview

To ease the setup of GCP resources for users, service catalogues are available on our AI Innovation Platform, from which users can easily add GCP services to their projects/workspaces.

To share an example, let's assume users want to provision a GPU instance hosting a JupyterLab service that can be shared among multiple users, with separate workspaces to maintain their development environments. To achieve this, a user would normally have to:

  1. Host the Jupyter service
  2. Attach it behind a load balancer and
  3. Secure it using an authentication method on the GCP console

This can be done hassle-free on our platform using the service catalogue functionality with just a few clicks, which will not only take care of the authentication and JupyterLab service setup but also apply other security and cloud best practices such as adding tags and labels and avoiding external IPs.

Users retain the flexibility to update or change the instance settings later from the GCP console, to which they also have access.

The service catalogue is an add-on functionality, not a compulsory part of our platform. It is the recommended way to provision the basic setup required with respect to security and cloud best practices, which can then be modified according to user needs. Users can, however, still add services to their projects directly from the GCP console if they prefer.

The service catalogue has two main flows, depending on whether the services are added to a new project or to an existing project. Below are the details of both flows:

  1. New Project: The user needs to add the details for the project like project type, unit name, group name, workstream name etc. Once added, the user can add services (this is optional). Once all details are added, the user can submit the request which will go for approval to admins. Based on the action taken by the admin, the user will be notified about their service addition request. (For more details, check the Request New Project section of this documentation)

    Note - A GCS bucket is added by default for marketplace operations. Please do not modify this bucket.

  2. Existing Project: For existing projects, users need to add the services from the project details page and submit the request (refer to the section below for details on how this can be done). The services will be added directly, and users will receive a notification on the status of the service addition request.

Provision services using service catalogue

The provision to spin up new resources is available on the AI Innovation Platform itself; let us walk through how this can be done.

Below are the steps to be followed to add services:

  1. Go to "My Projects" and click on the project for which the services need to be added. Project View

  2. In the details page, by clicking on "Add" under Services more services list be available. Add Service

  3. Select the services to be added and click on "Add" to add the service configurations. Add Configuration

    Add Configuration

  4. Add the details of the services and click on "Submit". The user will be notified when the services are added.   Cloud Storage

  5. A link of the service will be available on the service name itself as a hyperlink. The users can click on this and can navigate to the GCP console to see these service. Cloud Storage

Note - Services can be added while the project is in any state (requested, provisioning, etc.), but they will only be spun up once the project is in the active state. Also, service addition does not require any approvals from admins.

Available services and their configurations

This section explains the available services and the parameters for each service that is part of the service catalogue. It will also help you understand how to access these services from the GCP console.

App Engine

A fully managed platform for building and deploying scalable web applications. Google App Engine is a serverless platform that abstracts away infrastructure management, allowing developers to focus purely on application development. App Engine automatically handles tasks such as provisioning servers, managing scaling, load balancing and applying updates, making it a powerful solution for building applications that can scale seamlessly based on demand.

  • Auto-scaling
  • Managed infrastructure
  • Multiple languages
  • Integrated services
  • No server management
  • Pay-per-use
  • Version control

App Engine

App Engine Parameters

The following are key parameters that must be configured when creating an App Engine instance:

Parameter Description Mandatory
App Engine Name Name of the App Engine service Yes
Region Region where the App Engine is deployed Yes
CPU CPU configuration for the App Engine Yes
Memory Memory allocated for the App Engine Yes
Version ID Version identifier for the App Engine Yes
Container Image URL URL of the container image used for deployment Yes
Port Port number used by the App Engine Yes
Disk Size Size of the disk allocated to the App Engine (in GB) Yes
Service Account Service account associated with the App Engine No

Default Configurations

  • Runtime Configuration:

    • Runtime: The service will be deployed on App Engine with a customizable runtime environment. By default, the runtime for the App Engine application is set to "custom".
  • Scaling Settings:

    • Cooldown Period: A cooldown period of 120 seconds will be applied between scaling operations. This means the system will wait for 120 seconds before attempting to scale up or down, ensuring stable performance and avoiding rapid changes.

    • Target CPU Utilization: The target CPU utilization for automatic scaling is set to 0.5 by default, meaning scaling actions will trigger when CPU usage exceeds or falls below 50%. This helps maintain optimal performance and cost efficiency.

  • Traffic Splitting:

    • Traffic Management: Using the google_app_engine_service_split_traffic resource, we manage and split traffic across different versions of your App Engine service. This allows you to direct a percentage of traffic to different versions without downtime or disruption. By default, 100% of traffic is routed to the latest version of the application (see the sketch below).
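
For illustration only, here is a minimal Terraform sketch of these defaults. It is not the platform's actual module; the service name, version, region and container image URL are placeholders.

```hcl
# Sketch: App Engine flexible version with the default cooldown, CPU target and custom runtime.
resource "google_app_engine_flexible_app_version" "app" {
  service    = "default"
  version_id = "v1"
  runtime    = "custom"                                  # default runtime is "custom"

  deployment {
    container {
      image = "gcr.io/my-project/my-image:latest"        # placeholder container image URL
    }
  }

  # Health checks are required for flexible environment versions.
  liveness_check {
    path = "/"
  }
  readiness_check {
    path = "/"
  }

  automatic_scaling {
    cool_down_period = "120s"                            # 120-second cooldown between scaling operations
    cpu_utilization {
      target_utilization = 0.5                           # scale around 50% CPU usage
    }
  }
}

# Route 100% of traffic to the latest version.
resource "google_app_engine_service_split_traffic" "split" {
  service = google_app_engine_flexible_app_version.app.service
  split {
    allocations = {
      (google_app_engine_flexible_app_version.app.version_id) = 1
    }
  }
}
```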

Note: Cloud Run is the latest evolution of Google Cloud Serverless, building on the experience of running App Engine for more than a decade. Cloud Run runs on much of the same infrastructure as the App Engine standard environment, so there are many similarities between the two platforms. Cloud Run is designed to improve upon the App Engine experience, incorporating many of the best features of both the App Engine standard environment and the App Engine flexible environment. Hence, we recommend Cloud Run over App Engine.

References:

App Engine Overview ⧉

Artifact Registry

Repository for managing and storing container images. Container images greatly increase reproducibility and security in AI development. They are essential for many cloud-based services. Google Cloud's Artifact Registry is a fully managed service that allows you to store, manage and secure your build artifacts, such as container images and dependencies like libraries, packages and binaries. It supports a variety of formats like Docker, Maven and npm and is designed to scale efficiently while ensuring security and reliability. It is a key part of the DevOps lifecycle, helping teams organize and manage artifacts generated during the build process.

  • Centralized storage
  • Multi-format support
  • Secure access
  • Version tracking
  • Seamless integration
  • Dependency management
  • Scalable repository

Artifact Registry

Artifact Registry Parameters

Parameter Description Mandatory
Artifact Registry Name Name of the Artifact Registry Yes
Format Format of the artifacts (e.g., Docker) Yes
Location Region where the Artifact Registry is located Yes
Customer-managed encryption key (CMEK) Encryption key managed by the customer No
Cleanup Policy Policy for cleaning up old or unused artifacts No

Default Configurations

  • Enable APIs:

    • The Artifact Registry API (artifactregistry.googleapis.com) is enabled, allowing the service to manage and store container images, libraries, and artifacts effectively. This API is essential for the Artifact Registry to operate within your Google Cloud environment.
  • Cleanup Policy:

    • Cleanup Policy Dry Run: By default, the dry-run flag (cleanup_policy_dry_run) is set to false, meaning the cleanup pipeline will actively remove old or unused versions of artifacts in the repository to manage storage efficiently. If set to true, the cleanup runs in "dry run" mode, preventing actual deletion and allowing you to review what would be cleaned up before making any changes (see the sketch below).
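
As an illustration, a minimal Terraform sketch of a Docker repository with a cleanup policy and the dry-run flag. The repository name, location and cleanup rule are placeholders, not the platform's actual defaults.

```hcl
resource "google_artifact_registry_repository" "repo" {
  repository_id = "my-repo"              # placeholder repository name
  location      = "europe-west4"         # placeholder region
  format        = "DOCKER"

  # false = cleanup policies actively delete matching versions;
  # true  = "dry run" mode, nothing is deleted, results can be reviewed first.
  cleanup_policy_dry_run = false

  cleanup_policies {
    id     = "delete-old-untagged"
    action = "DELETE"
    condition {
      tag_state  = "UNTAGGED"
      older_than = "2592000s"            # 30 days
    }
  }
}
```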

References:

Artifact Registry Overview ⧉

Push and Pull Images ⧉

Grant Repo specific Permissions ⧉

Cloud Build

A service for automating the building, testing and deployment of code. Cloud Build is a service that executes your builds on Google Cloud infrastructure. Cloud Build can import source code from Cloud Storage, GitLab, GitHub or Bitbucket, execute a build to your specifications and produce artifacts such as Docker containers or Java archives. Cloud Build executes your build as a series of build steps, where each build step is run in a Docker container. A build step can do anything that can be done from a container, irrespective of the environment. You can either use the supported build steps provided by Cloud Build or write your own.

  • Continuous integration
  • Custom workflows
  • Multi-language support
  • Fast builds
  • Secure pipelines
  • Scalable resources
  • Artifact management

Cloud Build

Cloud Build Parameters

Parameter Description Mandatory
Resource Name Name of the Cloud Build resource Yes
Region Region where the Cloud Build resource is located Yes
Deploy To Target location or service for deployment Yes
Container Image URL URL of the container image used in the build process Yes

Default Configurations

  • Pub/Sub Trigger: We configure a Pub/Sub topic that triggers the Cloud Build process. By default, this topic is set to "gcr", meaning that Cloud Build will be triggered by messages published to this topic.

  • Substitution Variables: Substitution variables allow you to pass dynamic values during the build process. For example:

    • _ACTION: Represents the action specified in the incoming Pub/Sub message (e.g., a build, tag, or deploy action).
    • _IMAGE_TAG: Captures the image tag provided in the Pub/Sub message, ensuring the correct image version is used during the build process. Users can update the build pipeline according to their container image if required.
  • Filter:

    By default, the filter is set to match on the image tag (_IMAGE_TAG.matches("")), but you can modify it to include more specific conditions based on your needs. The default setting ensures automated deployment of new image versions as and when they are pushed to Artifact Registry.

  • Build Step Image:

    The gcloud builder image used during the build process is "gcr.io/cloud-builders/gcloud". This image contains the necessary tools to execute Google Cloud commands, including managing Cloud Run services and jobs.

  • Logging Options:

    Logging options determine where Cloud Build logs are stored. The default setting is "CLOUD_LOGGING_ONLY", ensuring all logs are sent to Google Cloud Logging for easy monitoring and troubleshooting.

  • Build Timeout:

    The timeout for the Cloud Build process is set to 540 seconds. If the build exceeds this time, it will be automatically terminated. This can be adjusted if longer build times are required for your specific workloads.

  • Labels:

    Labels help organize and track your build resources. These labels can be applied to the virtual machines (VMs) that are created as part of the Cloud Build process, making it easier to manage and classify resources. You can specify custom labels such as project ownership, data classification, and more.

  • Custom Roles for Cloud Build:

    A custom role called "Cloud Run Deployer" is configured with a set of permissions required to manage Cloud Run services and jobs. This role includes capabilities like:

    • Updating jobs, listing locations, managing revisions, accessing services, and creating logs.

    • Acting as a service account, allowing Cloud Build to securely interact with other Google Cloud services.
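
For reference, a hedged Terraform sketch of a Pub/Sub-driven trigger combining these defaults. The region, service name, substitution bindings and deploy arguments are illustrative placeholders and may differ from the platform's actual pipeline.

```hcl
resource "google_pubsub_topic" "gcr" {
  name = "gcr"                                    # default trigger topic described above
}

resource "google_cloudbuild_trigger" "deploy_on_push" {
  name     = "deploy-on-image-push"
  location = "europe-west4"                       # placeholder region

  pubsub_config {
    topic = google_pubsub_topic.gcr.id
  }

  # Dynamic values taken from the incoming Pub/Sub message (placeholder bindings).
  substitutions = {
    _ACTION    = "$(body.message.data.action)"
    _IMAGE_TAG = "$(body.message.data.tag)"
  }

  # Default filter matches on the image tag; tighten as needed.
  filter = "_IMAGE_TAG.matches(\"\")"

  build {
    step {
      name = "gcr.io/cloud-builders/gcloud"       # default build step image
      args = ["run", "deploy", "my-service", "--image", "$_IMAGE_TAG", "--region", "europe-west4"]
    }
    timeout = "540s"                              # default build timeout
    options {
      logging = "CLOUD_LOGGING_ONLY"              # default logging destination
    }
  }
}
```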

References:

Cloud Build Overview ⧉

Cloud Composer

Workflow orchestration service built on Apache Airflow for scheduling and managing chains of jobs/functionalities. Cloud Composer is a fully managed workflow orchestration service built on Apache Airflow. It allows you to create, schedule, monitor and manage workflows, simplifying the automation of tasks across cloud services and on-premises systems. Composer integrates seamlessly with other Google Cloud services and provides automation for environment setup, letting you focus on your workflows rather than infrastructure management.

  • Workflow automation
  • Apache Airflow
  • Scalable scheduling
  • Integrated monitoring
  • Dependency management
  • Custom workflows
  • Easy scaling

Cloud Composer

Cloud Composer Parameters

Parameter Description Mandatory
Cloud Composer Cluster Name Name of the Cloud Composer cluster Yes
Region Region where the Cloud Composer cluster is located Yes
Resilience Mode Mode of resilience for the Cloud Composer Yes
IP Range Pods IP range for the pods in the environment No
IP Range Services IP range for the services in the environment No
Environment Size Size of the environment (e.g., small, medium, large) Yes
Service Account Service account associated with the Cloud Composer No
Customer-managed encryption key (CMEK) Encryption key managed by the customer No

Default Configurations

  • IAM roles for Cloud Composer:

    • Composer Worker Role: This assigns the roles/composer.worker role to a service account, allowing it to perform tasks required for the Cloud Composer environment.
  • Workload configurations: The environment's workload configurations vary based on the size of the environment (small, medium, large). These configurations dictate the amount of CPU, memory, and storage allocated for different components like:

    • Scheduler: Manages the orchestration of tasks.
    • Web Server: Provides the UI for managing workflows.
    • Worker: Executes the actual tasks.
    • Triggerer: Handles external triggers for workflows.

    These configurations ensure that your environment is optimized based on your operational needs.
  • Private environment:

    • Private service connect: This ensures the environment uses a private connection type for enhanced security by enabling private endpoints and connecting through a private subnetwork.
  • Dynamic workloads: The Composer environment dynamically adjusts its resources based on workload size using the configurations provided. This makes it flexible to scale with your operational requirements.

  • Node configuration: The environment nodes are configured with custom networking, including private IP allocation, subnetwork configuration, and optional encryption settings using a KMS key.

  • Resilience mode: The Composer environment can be set to different resilience levels to ensure high availability and fault tolerance, enhancing the reliability of your workflows.
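
A minimal Terraform sketch of an environment with some of these defaults, for illustration. The environment name, region, sizes and the variables var.project_id and var.composer_service_account are assumed placeholders; the platform's actual module may differ.

```hcl
# Grant the Composer worker role to the environment's service account.
resource "google_project_iam_member" "composer_worker" {
  project = var.project_id
  role    = "roles/composer.worker"
  member  = "serviceAccount:${var.composer_service_account}"
}

resource "google_composer_environment" "composer" {
  name   = "my-composer-env"                      # placeholder cluster name
  region = "europe-west4"                         # placeholder region

  config {
    environment_size = "ENVIRONMENT_SIZE_SMALL"   # small / medium / large

    software_config {
      image_version = "composer-2-airflow-2"
    }

    # Private environment for enhanced security.
    private_environment_config {
      enable_private_endpoint = true
    }

    # Workload sizes vary with the chosen environment size.
    workloads_config {
      scheduler {
        cpu        = 0.5
        memory_gb  = 2
        storage_gb = 1
        count      = 1
      }
      web_server {
        cpu        = 0.5
        memory_gb  = 2
        storage_gb = 1
      }
      worker {
        cpu        = 0.5
        memory_gb  = 2
        storage_gb = 1
        min_count  = 1
        max_count  = 3
      }
    }

    node_config {
      service_account = var.composer_service_account   # assumed variable
    }
  }
}
```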

References:

Cloud Composer Overview ⧉

Cloud Run Job

Service that runs repetitive compute tasks in a serverless environment. It can be reached from any machine in the project and is great for centralising certain functionalities (e.g. data conversion) within the development project. Cloud Run is a managed compute platform that enables you to run containers that are invocable via requests or events. Cloud Run is serverless: it abstracts away all infrastructure management, so we can focus on what matters most: building great applications. A Cloud Run Job runs a container to completion without a server; the job runs its own tasks and exits when it is finished.

  • Containerized tasks
  • On-demand execution
  • Scalable resources
  • No server management
  • Automated scaling
  • Event-driven
  • Pay-per-use

Cloud Run Job

Cloud Run Parameters

Parameter Description Mandatory
Job Name Name of the Cloud Run Job Yes
Region Region where the job will be executed Yes
No of Tasks Number of tasks to run in the job Yes
CPU CPU allocation for each task Yes
Memory Memory allocation for each task Yes
Service Account Service account associated with the job No
Container Image URL URL of the container image used for the job Yes
Parallelism Number of tasks to run in parallel Yes
Task Timeout Timeout for each task Yes
Time Unit Unit of time for the task timeout (e.g., seconds, minutes) Yes

Default Configurations

  • Variables:
    • max_retries: Defines the maximum number of retries allowed for the job in case of failures. The default value is set to 3.
    • command: Specifies the command to be run inside the container. The default value is null, meaning no specific command is set by default, and the container will use its own startup process.
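
For reference, a hedged Terraform sketch of a job with these defaults. The job name, region, image, resource sizes and var.job_service_account are placeholders.

```hcl
resource "google_cloud_run_v2_job" "job" {
  name     = "my-job"                        # placeholder job name
  location = "europe-west4"                  # placeholder region

  template {
    task_count  = 1                          # "No of Tasks" parameter
    parallelism = 1                          # tasks run in parallel

    template {
      max_retries = 3                        # default retry count
      timeout     = "600s"                   # task timeout

      containers {
        image = "europe-docker.pkg.dev/my-project/my-repo/my-image:latest"  # placeholder
        # command is left unset by default, so the container's own entrypoint is used.
        resources {
          limits = {
            cpu    = "1"
            memory = "512Mi"
          }
        }
      }

      service_account = var.job_service_account   # assumed variable, optional
    }
  }
}
```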

References:

Cloud Run Jobs Overview ⧉

Cloud Run Service

A service to run code that continuously handles incoming requests (e.g. a web service). Unlike a Cloud Run Job, which executes each job based on a trigger, a Cloud Run Service stays available to serve requests.

  • Fully managed
  • Auto-scaling
  • Containerized apps
  • No server management
  • Event-driven
  • Pay-per-use
  • Fast deployment

Cloud Run Service

Cloud Run Services Parameters

Parameter Description Mandatory
Cloud Run Name Name of the Cloud Run service Yes
Region Region where the Cloud Run service is located Yes
CPU CPU allocation for the service Yes
Memory Memory allocation for the service Yes
Container Image URL URL of the container image used for the service Yes
Port Port number used by the Cloud Run service Yes
Service Account Service account associated with the service No

Default Configurations

  • Module: Serverless Load Balancer

    The serverless_loadbalancer module is configured to enable the seamless integration of a load balancer with the Cloud Run service. This ensures that incoming traffic is efficiently distributed across multiple instances, improving the availability and reliability of the service.

  • Security Policy:

    The security_policy variable is used to apply a Cloud Armor Security Policy to the Cloud Run service. Cloud Armor helps protect applications from security threats like DDoS attacks. The default value is set to an empty string, meaning no specific policy is enforced unless provided.
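
A minimal Terraform sketch of the Cloud Run service itself, for illustration. The serverless load balancer and Cloud Armor policy are wired up by the platform's own module and are not shown; the service name, region, image and var.run_service_account are placeholders.

```hcl
resource "google_cloud_run_v2_service" "service" {
  name     = "my-service"                    # placeholder service name
  location = "europe-west4"                  # placeholder region

  # Restrict ingress to internal traffic and the load balancer.
  ingress = "INGRESS_TRAFFIC_INTERNAL_LOAD_BALANCER"

  template {
    containers {
      image = "europe-docker.pkg.dev/my-project/my-repo/my-image:latest"  # placeholder
      ports {
        container_port = 8080                # "Port" parameter
      }
      resources {
        limits = {
          cpu    = "1"
          memory = "512Mi"
        }
      }
    }
    service_account = var.run_service_account   # assumed variable, optional
  }
}
```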

References:

Cloud Run Service Overview ⧉
Host Cloud Run Service behind Load Balancer

Cloud SQL

A relational database service for MySQL, PostgreSQL and SQL Server. Cloud SQL is a fully-managed database service that helps us set up, maintain, manage and administer our relational databases on Google Cloud Platform.

  • Automated backups
  • High availability
  • Secure connections
  • Scalable instances
  • Easy maintenance
  • Integrated monitoring

Cloud SQL

Cloud SQL Parameters

Parameter Description Mandatory
Cloud SQL Name Name of the Cloud SQL instance Yes
Region Region where the Cloud SQL instance is located Yes
Database Version Version of the database Yes
Edition Edition of the database (e.g., Enterprise) Yes
Database Name Name of the database Yes
Username Username for accessing the database Yes
Tier Tier of the Cloud SQL instance (e.g., db-f1-micro) Yes
Backup Location Location for database backups Yes
Disk Size Size of the disk (in GB) Yes
Disk Type Type of disk (e.g., SSD, HDD) Yes

Default Configurations

  • Database Flags: We’ve introduced database-specific flags to enhance the configuration of various database engines:

    • PostgreSQL:

      • log_duration: Enables logging of the duration of each statement.
      • pgaudit.log: Logs all activities for audit purposes.
      • log_hostname: Logs the hostname of the client connecting to the instance.
      • log_checkpoints: Enables checkpoint logging.
    • MySQL:

      • general_log: Enables logging of general queries for troubleshooting.
      • skip_show_database: Restricts the SHOW DATABASES statement to authorized users.
      • wait_timeout: Sets the wait timeout to manage inactive connections.
    • SQL Server:

      • 1204: Enables deadlock logging.
      • remote_access: Allows remote connections to the SQL server.
      • remote_query_timeout: Configures the timeout for remote queries to 300 seconds.
  • Private Network Configuration:

    • private_network_url : This variable specifies the private network URL for securely connecting to the Cloud SQL instance within your network.
  • Maintenance Window: To minimize downtime, the maintenance window is configured as follows:

    • Day: Sunday
    • Hour: Midnight (UTC)
    • Update Track: Set to "stable" to receive stable updates.
  • SSL Certificate Management: We’ve automated the management of SSL certificates for secure communication:

    • Client Certificate: SSL certificates are created and managed for each SQL instance using Google Secret Manager to ensure encrypted communication.
    • Password Management: Random, complex passwords are generated and securely stored using Secret Manager.
    • Secret Storage: All sensitive information, including certificates and credentials, is stored in Secret Manager for added security.
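
An illustrative Terraform sketch of a PostgreSQL instance with some of these defaults. The instance name, region, tier, flag values and var.private_network_url are placeholders; the platform's module also handles SSL certificates and Secret Manager, which are not shown.

```hcl
resource "google_sql_database_instance" "postgres" {
  name             = "my-sql-instance"            # placeholder name
  region           = "europe-west4"               # placeholder region
  database_version = "POSTGRES_15"

  settings {
    tier      = "db-custom-2-8192"                # "Tier" parameter
    disk_size = 50                                # GB
    disk_type = "PD_SSD"

    # Private connectivity only.
    ip_configuration {
      ipv4_enabled    = false
      private_network = var.private_network_url   # assumed variable
    }

    # Example PostgreSQL flags from the defaults above.
    database_flags {
      name  = "log_duration"
      value = "on"
    }
    database_flags {
      name  = "log_checkpoints"
      value = "on"
    }

    maintenance_window {
      day          = 7                            # Sunday
      hour         = 0                            # midnight (UTC)
      update_track = "stable"
    }

    backup_configuration {
      enabled  = true
      location = "eu"                             # "Backup Location" parameter
    }
  }
}
```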

References:

Cloud SQL Overview ⧉
Connect to Cloud SQL from GKE
Connect to SQL from Cloud Run

Cloud Source

Service for hosting private code repositories. Cloud Source Repositories are fully featured, private Git repositories hosted on Google Cloud.

  • Version control
  • Secure storage
  • Code collaboration
  • Integrated CI/CD
  • Branch management
  • Access control
  • Scalable hosting

Cloud Source

Cloud Source Parameters

Parameter Description Mandatory
Repository Name Name of the Cloud Source repository Yes

Default Configurations:

No default configurations have been implemented.

References:

Cloud Source Repository CLI ⧉

Cloud Storage

A service that allows data to be stored and accessed remotely over the internet. Cloud Storage allows world-wide storage and retrieval of any amount of data at any time. You can use Cloud Storage for a range of scenarios including serving website content, storing data for archival and disaster recovery.

  • Infinite storage volume
  • Remote access
  • Data backup
  • Secure encryption
  • Cost-efficient
  • Sync across devices
  • Collaboration support

Cloud Storage

Cloud Storage Parameters

Parameter Description Mandatory
Bucket Name Name of the Cloud Storage bucket Yes
Region Region where the storage bucket will be located Yes
Storage Class Type of storage class (e.g., STANDARD, NEARLINE, COLDLINE, ARCHIVE) Yes
GCS Lifecycle Lifecycle management policies for the bucket Yes
Customer-managed encryption key (CMEK) Encryption key managed by the customer No

Bucket Name

Bucket names are globally unique and cannot be updated once created. We follow the ${USER_DEFINED_SERVICE_NAME}-${RANDOM_ID} naming convention for all buckets created from the service catalogue. For more details, please refer to the official documentation ⧉.

Region

Currently the service catalogue supports provisioning regional buckets, which are optimal for regional workloads. If a use case requires multi-regional buckets, users can create them from the Google Cloud Storage console, to which they have access.

To understand what suits your use case best, please refer to location considerations ⧉ and location recommendations ⧉.

Storage Class

You can select one of the storage classes below while provisioning the bucket:

| Storage Class | Monthly Availability | Minimum Storage Duration | Retrieval Fees | Best For |
| --- | --- | --- | --- | --- |
| STANDARD | 99.99% in regions | None | None | Frequently accessed ("hot") data |
| NEARLINE | 99.9% in regions | 30 days | Yes | Highly durable storage service for storing infrequently accessed data |
| COLDLINE | 99.9% in regions | 90 days | Yes | Highly durable storage service for infrequently accessed data with slightly lower availability |
| ARCHIVE | 99.9% in regions | 365 days | Yes | Highly durable storage service for data archiving, online backup, and disaster recovery |

To learn more about storage classes visit official documentation ⧉

Default Lifecycle policy

The lifecycle policy can be applied during creation of the buckets. By default, the following policy is applied to buckets created from the service catalogue.

| Action | Object condition |
| --- | --- |
| Set to Nearline | 30+ days since object was created and Storage Class matches Standard |
| Set to Coldline | 90+ days since object was created and Storage Class matches Nearline |
| Set to Archive | 365+ days since object was created and Storage Class matches Coldline |
| Delete object | 730+ days since object was created and Storage Class matches Archive |

Note: The above is added as a standard policy to ensure data lifecycle optimization. Users can update the policies according to their requirements. Refer to the official documentation ⧉ to learn more about lifecycle policies.

Customer-managed encryption key (CMEK)

Cloud Storage always encrypts your data on the server side, before it is written to disk, at no additional charge. By default, Google-managed encryption is used. To increase data security, we recommend using customer-managed encryption keys as server-side encryption for data-sensitive workloads.

Note: Client-side encryption: encryption that occurs before data is sent to Cloud Storage. Such data arrives at Cloud Storage already encrypted but also undergoes server-side encryption.

Default Configurations

Bucket Settings:

  • Versioning Enabled: This feature will be enabled to keep a history of all object versions in the bucket. It allows you to recover objects that are overwritten or deleted, adding an extra layer of data protection.

  • Uniform Bucket-Level Access: We will enforce uniform bucket-level access (uniform_bucket_level_access = true), meaning all access control will be managed at the bucket level rather than at the individual object level. This simplifies access management and improves security.
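
Putting these defaults together, a hedged Terraform sketch of a catalogue-style bucket with versioning, uniform bucket-level access and the first lifecycle transition. The bucket name and location are placeholders, and only the first lifecycle rule is shown; the remaining transitions follow the same pattern.

```hcl
resource "random_id" "suffix" {
  byte_length = 4
}

resource "google_storage_bucket" "bucket" {
  name          = "my-service-${random_id.suffix.hex}"   # ${USER_DEFINED_SERVICE_NAME}-${RANDOM_ID}
  location      = "EUROPE-WEST4"                         # regional bucket (placeholder)
  storage_class = "STANDARD"

  uniform_bucket_level_access = true     # access managed at bucket level only

  versioning {
    enabled = true                       # keep a history of object versions
  }

  # Default lifecycle: Standard -> Nearline after 30 days (Coldline, Archive
  # and deletion rules follow the same pattern at 90, 365 and 730 days).
  lifecycle_rule {
    condition {
      age                   = 30
      matches_storage_class = ["STANDARD"]
    }
    action {
      type          = "SetStorageClass"
      storage_class = "NEARLINE"
    }
  }
}
```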

References:

Cloud Storage Overview ⧉
Discover Object Storage Gcloud Commands ⧉

Compute Engine

Creating new computing machines (called virtual machines or VMs) in the cloud to preprocess data, develop code, host smaller apps with a web frontend (e.g. TensorBoard, a custom image viewer, etc.) and so on.
Google Compute Engine allows you to create and run virtual machines (VMs) on Google’s powerful infrastructure. Think of a virtual machine as a computer that runs entirely in the cloud—there’s no need to buy or maintain any physical hardware. You can start with just one virtual machine or scale up to thousands, depending on your needs. The best part? You only pay for what you use, with no upfront costs.

  • Accessible via SSH
  • Scalable resources
  • Customizable VMs
  • Global availability
  • Secure infrastructure
  • Automated backups

Compute Engine

Compute Engine Parameters

Parameter Description Mandatory
Instance Name Name of the virtual machine Yes
Region Region where the VM will be deployed Yes
Zone Zone within the selected region Yes
GPU Type Type of GPU attached to the VM Yes
GPU Count Number of GPUs attached to the VM Yes
Machine Type Type of machine instance Yes
Boot Disk Image Image used for the boot disk Yes
Boot Disk Size Size of the boot disk (in GB) Yes
Boot Disk Type Type of boot disk (e.g., SSD, HDD) Yes
Customer-managed encryption key (CMEK) Encryption key managed by the customer No
Service Account Service account associated with the VM No

Accessing the instances on GCP:

  • Users can access the instances over SSH, either via browser SSH (by clicking the SSH button on the Compute Engine console) or via the gcloud SDK.
  • Users can also use IAP Desktop or an IAP tunnel to access their VMs.

For more details and steps to login refer Different ways to connect to GCP Projects section.

Default Configurations

Network Tags: - allow-iap: This network tag ensures that Identity-Aware Proxy (IAP) is enabled, which helps secure your VM instances by requiring authentication and authorization before access. It is required to SSH into the instance. If not applied, the instance will not allow SSH traffic.

Metadata: - install-nvidia-driver: Automatically installs NVIDIA drivers if GPU support is needed for the VM, ensuring optimized GPU performance.

Default Settings:

Several default settings are applied to ensure the VMs are secure, manageable, and meet operational standards:

  • VM with Internal IP Only: By default, the VM is configured with an internal IP, providing connectivity within the VPC network while keeping the instance isolated from public internet traffic.
  • Service Account: Each VM instance is automatically attached to a service account, which provides the necessary permissions to access Google Cloud resources securely. If the service account parameter is left blank while provisioning the instance, the custom service account created by the platform is used. If the user provides an existing service account of their choice, that service account is used instead.
  • KMS Key Integration: The VM is integrated with Cloud KMS (Key Management Service) to encrypt sensitive data and ensure secure handling of any confidential information.

  • Machine Configurations: While we are working on introducing more machine configurations for compute instances, users can opt for a different configuration by provisioning the service with a basic configuration from our platform and then editing the instance configuration from the GCP Compute Engine section.

Compliance requirements for Compute Engine configurations:

  • Ensure that no external IP is attached to the instances. Raise an exception with the support team if your requirements demand one.
  • It is recommended to use CMEK or customer-supplied encryption keys for disks when working with regulated projects.
  • Ensure the effective labels on the instances are not modified.

A sketch of a typical instance configuration follows below.
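
For illustration, a hedged Terraform sketch of an instance reflecting these defaults. The instance name, zone, machine type, image, GPU type and the variables var.kms_key_self_link, var.subnetwork and var.service_account are placeholders, not the platform's exact module.

```hcl
resource "google_compute_instance" "vm" {
  name         = "my-vm"                        # placeholder instance name
  zone         = "europe-west4-a"               # placeholder zone
  machine_type = "n1-standard-8"                # placeholder machine type

  tags = ["allow-iap"]                          # required for IAP-based SSH

  metadata = {
    install-nvidia-driver = "True"              # auto-install NVIDIA drivers when GPUs are attached
  }

  boot_disk {
    initialize_params {
      image = "debian-cloud/debian-12"          # placeholder boot disk image
      size  = 100
      type  = "pd-ssd"
    }
    kms_key_self_link = var.kms_key_self_link   # CMEK for the boot disk (assumed variable)
  }

  guest_accelerator {
    type  = "nvidia-tesla-t4"                   # placeholder GPU type
    count = 1
  }

  scheduling {
    on_host_maintenance = "TERMINATE"           # required when GPUs are attached
  }

  network_interface {
    subnetwork = var.subnetwork                 # internal IP only: no access_config block
  }

  service_account {
    email  = var.service_account                # platform-created SA if left blank in the catalogue
    scopes = ["cloud-platform"]
  }
}
```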

References:

Compute Engine Overview ⧉

Edit Compute Engine Machine Type ⧉

Add or Remove GPUs ⧉

Stop or Restart VMs ⧉

Cloud Workstation

A managed development environment that provides secure, scalable and accessible virtual desktops for coding. Cloud Workstations provides preconfigured, customizable and secure managed development environments on Google Cloud. Cloud Workstations is accessible through a browser-based IDE, from multiple local code editors (such as IntelliJ IDEA Ultimate or VS Code), or through SSH. Instead of manually setting up development environments, we can create a workstation configuration that specifies the environment in a reproducible way.

  • Managed environment
  • Scalable resources
  • Secure access
  • Integrated tools
  • Custom configurations
  • Remote development
  • Collaborative features

Cloud Workstation

Cloud Workstation Parameters

Parameter Description Mandatory
Instance Name Name of the Cloud Workstations instance Yes
Region Region where the instance is located Yes
GPU Type Type of GPU attached to the instance Yes
GPU Count Number of GPUs assigned Yes
Machine Type Type of machine instance Yes
Boot Disk Image Image used for the boot disk Yes
Boot Disk Size Size of the boot disk (in GB) Yes
Boot Disk Type Type of boot disk (e.g., SSD, HDD) Yes
Customer-managed encryption key (CMEK) Encryption key managed by the customer No
Service Account Service account associated with the instance No

References:

Cloud Workstations ⧉

GKE Cluster

Google Kubernetes Engine (GKE) is a fully managed Kubernetes service for deploying, managing and scaling containerized applications. Kubernetes, the leading container orchestration platform, automates the deployment, scaling and operation of application containers across clusters of hosts. With GKE, you can benefit from Google Cloud’s security, reliability and scalability while focusing on your applications without managing the underlying infrastructure.

  • Kubernetes management
  • Auto-scaling
  • High availability
  • Secure environment
  • Integrated monitoring
  • Easy upgrades
  • Custom configurations

GKE Cluster

GKE Parameters

Parameter Description Mandatory
GKE Cluster Name Name of the GKE Cluster Yes
Region Region where the GKE Cluster is located Yes
Customer-managed encryption key (CMEK) Encryption key managed by the customer No
Service Account Service account associated with the GKE Cluster No
IP Range Pods IP range for the pods in the cluster No
IP Range Services IP range for the services in the cluster No
Auto Scaling Auto scaling configuration for the cluster No
Primary Node Pool Name Name of the primary node pool Yes
Primary Node Pool Minimum Count Minimum number of nodes in the primary node pool Yes
Primary Node Pool Maximum Count Maximum number of nodes in the primary node pool No
Primary Node Pool Machine Type Type of machine instances in the primary node pool No
Enable Extra Node Pool Whether to enable an extra node pool No
Secondary Node Pool Name Name of the secondary node pool No
Secondary Node Pool Minimum Count Minimum number of nodes in the secondary node pool No
Secondary Node Pool Maximum Count Maximum number of nodes in the secondary node pool No
Secondary Node Pool Machine Type Type of machine instances in the secondary node pool No

Default Configurations

  1. Labels:

    • cluster_name: A label used to identify the GKE cluster, making it easier to organize and manage multiple clusters within your environment.
    • node_pool: This label tracks the specific node pool to which the nodes belong, providing better visibility and management over different node pools.
  2. Master IPv4 CIDR Block:

    • master_ipv4_cidr_block: Specifies the IP range in CIDR notation for the GKE master network. This helps in defining network isolation for the GKE master nodes.
  3. Horizontal Pod Autoscaling:

    • horizontal_pod_autoscaling: This configuration enables horizontal pod autoscaling in the GKE cluster, allowing pods to scale automatically based on CPU usage or other metrics. The default is true to optimize resource utilization.
  4. Maintenance Recurrence:

    • maintenance_recurrence: Defines the frequency of the recurring maintenance window in RFC5545 format. This helps automate regular cluster maintenance like patching and updates.
  5. Remove Default Node Pool:

    • remove_default_node_pool: If set to true, this will remove the default node pool during cluster setup, allowing for custom node pool configurations.
  6. Node Pools Labels:

    • node_pools_labels: A map of maps that allows you to specify custom labels for nodes in different node pools, enhancing node management and classification.
  7. Google Compute Engine Persistent Disk CSI Driver:

    • gce_pd_csi_driver: Enables the Google Compute Engine Persistent Disk Container Storage Interface (CSI) Driver. This allows Kubernetes workloads to dynamically provision and manage persistent disks. By default, this feature is enabled.
  8. Identity Namespace:

    • identity_namespace: Specifies the workload pool for attaching Kubernetes service accounts. The default value is set to enabled, automatically configuring the project-based pool ([project_id].svc.id.goog), improving security and access control.
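
These settings correspond to inputs of a Terraform GKE module. The following is a hedged sketch assuming the public terraform-google-modules/kubernetes-engine module; the module source, version, cluster name, region, ranges and node pool values are placeholders, and the platform's actual module may differ.

```hcl
module "gke" {
  source  = "terraform-google-modules/kubernetes-engine/google//modules/private-cluster"
  version = "~> 30.0"                             # assumed module version

  project_id = var.project_id
  name       = "my-cluster"                       # placeholder cluster name
  region     = "europe-west4"                     # placeholder region
  network    = var.network
  subnetwork = var.subnetwork

  ip_range_pods     = "pods-range"                # secondary range names (placeholders)
  ip_range_services = "services-range"

  master_ipv4_cidr_block     = "172.16.0.0/28"    # CIDR for the GKE control plane
  horizontal_pod_autoscaling = true               # default: enabled
  remove_default_node_pool   = true               # replace the default pool with custom pools
  gce_pd_csi_driver          = true               # persistent disk CSI driver enabled
  identity_namespace         = "enabled"          # workload pool [project_id].svc.id.goog

  maintenance_recurrence = "FREQ=WEEKLY;BYDAY=SA,SU"     # RFC5545 recurrence (placeholder)
  maintenance_start_time = "2025-01-01T00:00:00Z"
  maintenance_end_time   = "2025-01-01T04:00:00Z"

  node_pools = [
    {
      name         = "primary-pool"
      machine_type = "n2-standard-4"
      min_count    = 1
      max_count    = 3
    },
  ]

  node_pools_labels = {
    all          = { cluster_name = "my-cluster" }
    primary-pool = { node_pool = "primary-pool" }
  }
}
```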

References:

GKE Overview ⧉

JupyterHub

A multi-user platform for hosting Jupyter notebooks in a shared environment. A customized solution that provides a JupyterLab experience with user isolation. Recommended for sharing a single instance among multiple users for collaboration.

  • Collaborative notebooks
  • Scalable environment
  • User management
  • Customizable environments
  • Centralized access
  • Secure authentication
  • Resource sharing

Jupyter Hub

JupyterHub Parameters

Parameter Description Mandatory
Instance Name Name of the virtual machine Yes
Region Region where the VM will be deployed Yes
Zone Zone within the selected region Yes
GPU Type Type of GPU attached to the VM Yes
GPU Count Number of GPUs attached to the VM Yes
Machine Type Type of machine instance Yes
Boot Disk Image Image used for the boot disk Yes
Boot Disk Size Size of the boot disk (in GB) Yes
Boot Disk Type Type of boot disk (e.g., SSD, HDD) Yes
Customer-managed encryption key (CMEK) Encryption key managed by the customer No
Service Account Service account associated with the VM No

Default Configurations

  • Network Tags: The following tags are applied to enable necessary functionality for the JupyterHub service:

    • allow-health-checks: Ensures health checks are allowed for monitoring the service.
    • allow-iap: Enables Identity-Aware Proxy (IAP) for secure access.
  • Startup Script:

    A custom startup script is used to configure JupyterHub with the necessary user settings and URLs. Key parameters include:

    • user_list: List of users with access.

    • user_email: Email of the requesting user. By adding users to this list, they can be added as users on the JupyterHub notebook.

    • admin_email: Email of the JupyterHub admin. This grants admin-level permissions on the JupyterHub service to the listed users. Please modify this list responsibly.

      Note: Users of the instance will be responsible for managing the user access to the notebooks and are advised to share the notebooks with registered users only.

  • Custom Metadata:

    • custom-proxy-url: This metadata key stores the link to the notebook URL hosting the JupyterHub server. This value will be referred to as CUSTOM_PROXY_URL henceforth.

    • startup-script: This metadata key stores the startup script used for the JupyterHub as described above.

    • install-nvidia-driver: This ensures NVIDIA drivers are installed on the instance if GPUs are attached. If users face any issues with the NVIDIA drivers, they can reach out to our support team via the feedback tool.

  • Proxy URLs:

    • Each JupyterHub instance is configured to host custom services on proxy ports. Ports 8002-8006 are enabled for hosting these services. Please ensure that the hosted services are authenticated for better security.
    • The service load balancers are attached to Cloud Armor policies.
    • Naming convention of the URL: $CUSTOM_PROXY_URL/$INSTANCE_NAME/proxy/{8002-8006}, where CUSTOM_PROXY_URL is defined in the custom metadata.
  • Load Balancers:

    • Each service is hosted behind a load balancer that follows the same name as the instance/service deployed. You can view the load balancers in the GCP console under Network Services > Load balancing.

Data Security Guidelines:

Our platform is committed to ensuring the security of your data within the project environment. However, it is the user's responsibility to maintain the confidentiality of their data and prevent it from being shared outside of the project.

  • Do not share your Jupyter Notebooks with anyone outside of the project.
  • Do not store your data outside of the project resources created. This includes sharing them online, on personal devices, or through any other means.
  • Be mindful of the information you include in your notebooks. Avoid storing sensitive data, such as passwords or personal information, directly within the notebooks.

Note: By using this platform, you agree to be responsible for the security of your data and to comply with these guidelines.
If you have any questions or concerns about data security, please contact our support team.

Visit our Developer Hub for learning tools and tips for working with Notebooks.

References:

Jupyter service Overview ⧉

MIS

The Medical Imaging Suite (MIS) is a comprehensive system for capturing, managing and analyzing medical images to enhance diagnostic accuracy.

  • Efficient workflow
  • Accurate diagnosis
  • Seamless data access
  • 3D imaging support
  • Remote collaboration
  • EHR integration

Medical Imaging Suite (MIS)

MIS Parameters

Parameter Description Mandatory
Instance Name Name of the MIS instance Yes
Region Region where the instance is located Yes
Zone Zone within the selected region Yes
GPU Type Type of GPU attached to the instance Yes
GPU Count Number of GPUs attached Yes
Machine Type Type of machine instance Yes
Boot Disk Image Image used for the boot disk Yes
Boot Disk Size Size of the boot disk (in GB) Yes
Boot Disk Type Type of boot disk (e.g., SSD, HDD) Yes
Customer-managed encryption key (CMEK) Encryption key managed by the customer No
Service Account Service account associated with the instance No

As MIS is a version of JupyterHub, all configurations of the JupyterHub service are applicable to MIS.

Default Configurations

  • Tags:

    The following tags are applied to enable necessary functionality for the JupyterHub service:

    • allow-health-checks: Ensures health checks are allowed for monitoring the service.
    • allow-iap: Enables Identity-Aware Proxy (IAP) for secure access.

  • Startup Script:

    A custom startup script is used to configure JupyterHub with the necessary user settings and URLs. Key parameters include:

    • user_list: List of users with access.

    • user_email: Email of the requesting user. By adding users to this list, they can be added as users on the JupyterHub notebook.

    • admin_email: Email of the JupyterHub admin. This grants admin-level permissions on the JupyterHub service to the listed users. Please modify this list responsibly.

      Note: Users of the instance will be responsible for managing the user access to the notebooks and are advised to share the notebooks with registered users only.

  • Custom Metadata:

    • custom-proxy-url: This metadata key stores the link to the notebook URL hosting the JupyterHub server.

    • startup-script: This metadata key stores the startup script used for the JupyterHub as described above.

    • install-nvidia-driver: This ensures NVIDIA drivers are installed on the instance if GPUs are attached. If users face any issues with the NVIDIA drivers, they can reach out to our support team via the feedback tool.

References:

Medical Imaging Suite ⧉

3D Slicer

3D Slicer is a Swiss Army knife of medical imaging research. It can load, visualize and analyze various file formats and data modalities, and it is continuously developed by the research community. The extension manager contains more than 150 extensions. 3D Slicer is an open-source software platform for medical image informatics, image processing and three-dimensional visualization. In GCP, it can be deployed as part of custom computing or visualization solutions to support medical and scientific research. GCP provides the infrastructure to host and process 3D Slicer workloads.

  • Medical imaging
  • 3D visualization
  • Open-source
  • Multi-modality support
  • Advanced analytics
  • Interactive tools
  • Extensible modules

3D Slicer

3D Slicer Parameters

Parameter Description Mandatory
VM Name Name of the virtual machine Yes
Region Region where the VM will be deployed Yes
Zone Zone within the selected region Yes
GPU Type Type of GPU attached to the VM Yes
GPU Count Number of GPUs attached to the VM Yes
Machine Type Type of machine instance Yes
Disk Name Name of the VM disk Yes
Boot Disk Image Image used for the boot disk Yes
Boot Disk Size Size of the boot disk (in GB) Yes
Boot Disk Type Type of boot disk (e.g., SSD, HDD) Yes
Customer-managed encryption key (CMEK) Encryption key managed by the customer No
Service Account Service account associated with the VM No

Access 3D Slicer

The 3D Slicer service is deployed as a desktop service. To access the application, RDP into the Windows desktop on GCP and use the application from there. This can be done using IAP Desktop as mentioned here.

Default Configurations

  1. Mount Folder:

    • mount_folder: The folder where the service's data will be mounted is specified. By default, this is set to D:/, but it can be customized based on your requirements. Read how to mount GCS buckets to Windows Instances here.
    • KMS Key for Disk Encryption: The service will support encryption using a Key Management Service (KMS) key. The KMS key’s self-link can be specified for secure data encryption (kms_key_self_link). If no key is provided, a default configuration will be used.
  2. Tags:

    • allow-iap: This tag enables Identity-Aware Proxy (IAP), providing secure access to the 3D Slicer service, allowing only authorized users to connect.

References:

3D Slicer Documentation ⧉

TPU

A specialized hardware accelerator designed by Google for high-performance machine learning tasks. Tensor Processing Units (TPUs) are Google's custom-developed application-specific integrated circuits (ASICs) used to accelerate machine learning workloads. Cloud TPUs allow you to access TPUs from Compute Engine, Google Kubernetes Engine and Vertex AI.

  • High performance
  • Machine learning
  • Accelerated computing
  • TensorFlow optimized
  • Scalable
  • Low latency
  • Cost-efficient

Tensor Processing Units (TPUs)

TPU VM Parameters

Parameter Description Mandatory
VM Name Name of the TPU virtual machine Yes
Region Region where the TPU VM is located Yes
Zone Zone within the selected region Yes
Accelerator Type Type of TPU accelerator Yes
Runtime Version Version of the TPU runtime Yes
TPU VM Type Type of TPU VM No
Customer-managed encryption key (CMEK) Encryption key managed by the customer No
Service Account Service account associated with the TPU VM No

Default Configurations

  • Tags:

    • allow-iap: Enables secure access to your TPU instance using Identity-Aware Proxy (IAP), ensuring controlled access.
  • Private Instance:

    • By default, no public IP is attached to the TPU instance, ensuring the instance is not accessible over the public network (see the sketch below).
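
A hedged Terraform sketch of these defaults using the google_tpu_v2_vm resource. The name, zone, accelerator type, runtime version and var.network are placeholders; the platform's actual module may differ.

```hcl
resource "google_tpu_v2_vm" "tpu" {
  name             = "my-tpu-vm"                 # placeholder name
  zone             = "europe-west4-b"            # placeholder zone
  accelerator_type = "v5litepod-4"               # placeholder accelerator type
  runtime_version  = "tpu-vm-tf-2.15.0"          # placeholder runtime version

  network_config {
    network             = var.network            # assumed variable
    enable_external_ips = false                  # private instance: no public IP
  }

  tags = ["allow-iap"]                           # secure access via IAP
}
```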

References:

Cloud TPUs Overview ⧉

Vertex AI Workbench

This is a JupyterLab on a pre-configured virtual machine located inside Vertex AI. Ideal to get started within a few minutes right from the browser. Vertex AI Workbench is a tool provided by Google Cloud for developers and data scientists working to build machine learning models. It’s based on Jupyter notebooks, which are interactive documents that allow you to write code, run it and see the results, all in one place. Think of it as a workspace for data scientists and engineers to do their job easily. You can use Vertex AI Workbench to access various Google Cloud services, making it easier to build and deploy machine learning models without leaving the notebook environment.

  • Scalable
  • ML workflow automation
  • Custom environments

Vertex AI Workbench

Vertex AI Workbench Parameters

Parameter Description Mandatory
VM Name Name of the Vertex AI Workbench virtual machine Yes
Region Region where the Vertex AI Workbench VM is located Yes
Zone Zone within the selected region Yes
GPU Type Type of Vertex AI Workbench GPU Yes
GPU Count Count of GPU Yes
Machine Type Type of Vertex AI Workbench machine Yes
Boot Disk Size Size of the boot disk Yes
Boot Disk Type Type of the boot disk Yes
Customer-managed encryption key (CMEK) Encryption key managed by the customer No
Service Account Service account associated with the Vertex AI Workbench VM No

Default Configurations

Enable Required APIs:

  • The necessary APIs (notebooks.googleapis.com, aiplatform.googleapis.com) for Vertex AI Workbench will be enabled to ensure smooth operation and functionality of the service.

Network and IP Settings:

  • Disable Public IP: All notebooks created by the catalogue have public IP access disabled (disable_public_ip = true), so the instance is only accessible through private networking for enhanced security.
  • Enable IP Forwarding: IP forwarding is enabled (enable_ip_forwarding = true), allowing the instance to forward packets to other destinations, which can be useful for certain networking scenarios.
  • Proxy Access: Proxy access remains enabled (disable_proxy_access = false), meaning that proxy services can still be used for accessing the instance if needed.

Network Tags:

  • The following tags will be assigned to the instance to help with identification, access control, and categorization:

    • allow-iap: This tag allows access via Identity-Aware Proxy (IAP), which adds an extra layer of security for SSH access.
    • deeplearning-vm: This tag indicates that the instance is configured for deep learning workloads.
    • notebook-instance: This tag designates the instance as a notebook for development and experimentation.
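
An illustrative Terraform sketch using the google_workbench_instance resource with these settings. The instance name, zone, machine type and the variables var.network and var.subnetwork are placeholders; the platform's module may differ.

```hcl
resource "google_workbench_instance" "workbench" {
  name     = "my-workbench"                     # placeholder name
  location = "europe-west4-a"                   # zone of the instance (placeholder)

  gce_setup {
    machine_type = "n1-standard-4"              # placeholder machine type

    boot_disk {
      disk_size_gb = 150
      disk_type    = "PD_SSD"
    }

    network_interfaces {
      network = var.network                     # assumed variable
      subnet  = var.subnetwork
    }

    disable_public_ip    = true                 # private networking only
    enable_ip_forwarding = true                 # allow packet forwarding
    tags                 = ["allow-iap", "deeplearning-vm", "notebook-instance"]
  }

  disable_proxy_access = false                  # keep proxy access available
}
```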

Visit our Developer Hub for learning tools and tips for working with Notebooks.

References:

Vertex AI Overview ⧉

SSH into Vertex AI Workbenches ⧉

Common Labels

In this section, we walk you through the labels applied to all the services created from the service catalogue.

  • System Labels:

    • controlled-by: Specifies the team or department that controls the resources.
    • created-by: Identifies the user or system that created the resources.
    • goog-terraform-provisioned: Denotes that the resource was provisioned using Terraform.
    • os-patch: Tracks OS patch information relevant to the resources.
  • Workload Specific Labels: These labels will be customized according to your workload:

    • contains-phi: Indicates that the resource may store Protected Health Information (PHI), ensuring that the data is treated accordingly.
    • data-classification: Marks the sensitivity level of the data stored (e.g., public, internal, restricted, confidential).

    Note: Workload-specific labels can be updated according to user workloads. We recommend not modifying the system labels, to preserve traceability. Users can add new labels to their resources according to their requirements. An example follows below.
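
For example, workload-specific labels can be set alongside the system labels when a resource is created or updated. A hedged Terraform sketch follows; the bucket name and label values are placeholders.

```hcl
# Example: adding workload-specific labels to a bucket created from the catalogue.
resource "google_storage_bucket" "labelled" {
  name     = "my-service-1a2b3c4d"              # placeholder bucket name
  location = "EUROPE-WEST4"

  labels = {
    contains-phi        = "false"               # whether the resource may store PHI
    data-classification = "internal"            # e.g. public, internal, restricted, confidential
  }
}
```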