Connect Google Cloud Storage

How to Connect Google Cloud Storage (GCS)

📘

This feature is only available for paid account tiers. Check Plans and Pricing for more information!

Starting from your chosen project page, select the Dataset tab on the sidebar, then select Connect to External Buckets at the top. Select Google Cloud Storage from the dropdown list and click Proceed to start the process. Make sure you are logged in to your Google Cloud account so that you have all your GCS information at hand.

In the new tab, Setup Google Cloud Storage Integration, you will see the three steps of the process.

1. Bucket Details

There are three items in this section:

  • Connection Name: An identifier for the connection between your Nexus project and your GCS bucket. It can be any name you choose.
  • GCS Bucket Name: The name of your bucket in GCS. It must follow GCS's bucket naming rules: for example, names may contain only lowercase letters, numbers, dashes, underscores, and dots.
  • Folder Prefix: An optional entry that limits the integration to specific subfolders or objects in your bucket, so that Datature reads only the folders you choose. If left empty, Nexus uses the data in the root folder of the bucket. Make sure the full path points to a folder that contains the assets you want Nexus to read.
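
A bucket name can be sanity-checked before it is entered in the form. The sketch below is a simplified check covering only part of the GCS naming rules (length, allowed characters, the reserved "goog" prefix, and IP-address-like names). The function name is hypothetical and not part of Nexus or any Google library.

```python
import re

# Simplified subset of the GCS bucket naming rules (assumption: names without
# dots, 3 to 63 characters; the official rules cover more cases than this).
_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]")
_IP_LIKE_RE = re.compile(r"\d+\.\d+\.\d+\.\d+")


def is_plausible_bucket_name(name: str) -> bool:
    """Return True if `name` passes a basic GCS bucket-name check."""
    if not _BUCKET_RE.fullmatch(name):
        return False
    # Names may not look like an IP address in dotted-decimal notation.
    if _IP_LIKE_RE.fullmatch(name):
        return False
    # Names may not begin with "goog" or contain "google".
    return not name.startswith("goog") and "google" not in name


if __name__ == "__main__":
    for candidate in ("my-nexus-assets", "My_Bucket!", "ab"):
        print(candidate, is_plausible_bucket_name(candidate))
```

A name that passes this check can still be rejected by GCS, so treat it as a quick filter for typos rather than a full validation.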

2. GCS IAM Policy

In this section, Datature generates a single command for you to copy into Google Cloud Shell. The command grants Datature only the Storage Object Viewer role (roles/storage.objectViewer) on your GCS bucket, which allows viewing objects but not changing them.


gcloud storage buckets add-iam-policy-binding \
  gs://{bucket name} \
  --member=serviceAccount:[email protected] \
  --role=roles/storage.objectViewer

To run this command, open the Google Cloud console under your account and click Activate Cloud Shell at the top of the console. A Cloud Shell session will open in a new frame at the bottom of the console. Paste the command into the Cloud Shell terminal and run it.
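
If you prefer to assemble the command yourself, for example to reuse it across several buckets, the placeholder substitution can be sketched as follows. The function name is hypothetical, and the service account value is an assumption that must be replaced with the one shown in your Nexus setup tab.

```python
import shlex
from typing import List


def build_iam_binding_command(bucket: str, service_account: str) -> List[str]:
    """Return the gcloud arguments that grant read-only access to `bucket`."""
    return [
        "gcloud", "storage", "buckets", "add-iam-policy-binding",
        f"gs://{bucket}",
        f"--member=serviceAccount:{service_account}",
        "--role=roles/storage.objectViewer",
    ]


if __name__ == "__main__":
    # Placeholder values: substitute your bucket name and the service account
    # displayed in the Nexus setup tab.
    command = build_iam_binding_command("my-nexus-assets", "SERVICE_ACCOUNT_FROM_NEXUS")
    print(shlex.join(command))
```

Printing the command rather than executing it lets you review it before pasting it into Cloud Shell.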

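Next, update the bucket's CORS configuration. Create a file named CORS_CONFIG_FILE and copy the JSON array below into it. The first line shown, # cat CORS_CONFIG_FILE, is only the shell command that prints the file and is not part of its contents: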

# cat CORS_CONFIG_FILE
[
  {
    "maxAgeSeconds": 3600,
    "method": [
      "GET",
      "HEAD",
      "DELETE",
      "PUT",
      "POST"
    ],
    "origin": [
      "https://nexus.datature.io"
    ],
    "responseHeader": [
      "Content-Type",
      "Access-Control-Allow-Origin",
      "Content-Length",
      "Cache-Control",
      "x-goog-meta-houston-attempt",
      "x-goog-hash"
    ]
  }
]
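
As an alternative to copying the text by hand, the file can be generated by a short script, which avoids stray characters that would make the JSON invalid. This is a minimal sketch: the function name is hypothetical, and the rules are copied from the configuration above.

```python
import json
from pathlib import Path

# CORS rules copied from the configuration shown in this guide.
CORS_RULES = [
    {
        "maxAgeSeconds": 3600,
        "method": ["GET", "HEAD", "DELETE", "PUT", "POST"],
        "origin": ["https://nexus.datature.io"],
        "responseHeader": [
            "Content-Type",
            "Access-Control-Allow-Origin",
            "Content-Length",
            "Cache-Control",
            "x-goog-meta-houston-attempt",
            "x-goog-hash",
        ],
    }
]


def write_cors_config(path: str = "CORS_CONFIG_FILE") -> Path:
    """Write the CORS rules as JSON, then re-read the file to confirm it parses."""
    target = Path(path)
    target.write_text(json.dumps(CORS_RULES, indent=2) + "\n", encoding="utf-8")
    if json.loads(target.read_text(encoding="utf-8")) != CORS_RULES:
        raise ValueError(f"{target} does not contain the expected CORS rules")
    return target


if __name__ == "__main__":
    print(f"Wrote {write_cors_config()}")
```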

Run the following command to apply the CORS configuration to your bucket:

gcloud storage buckets update gs://{bucket name} --cors-file=CORS_CONFIG_FILE

3. Sync Assets

🚧

Errors at this step do not necessarily mean that something went wrong in the second step, because this is the point at which the whole connection is tested. Check the information entered in the first step as well as the commands run in the second step.

Now that your Google Cloud Storage is connected to Nexus, you can choose whether to Sync Now or Sync Later. You can sync at any time once the connection has been made in Step 2. If you choose Sync Now, Nexus will begin syncing your asset metadata from your GCS bucket onto the platform.

📘

Syncs can take anywhere from 5 to 40 minutes depending on the number of assets in your chosen folder, so please give the platform time to update and load all your assets before using them on the rest of the platform.

Once the sync has been completed, refresh your Dataset page to see your assets loaded in from your GCS!


How Google Cloud Storage Connectivity Works and General Precautions

GCS Connectivity Functionality

We do not store any of your actual image or video data on our platform. Instead, we read the asset metadata from your storage and load that information onto our platform. Access is read-only, which means your Google Cloud Storage bucket is effectively the master dataset: changes made to your dataset on our platform are not written back to your storage, while changes made in your storage appear on our platform the next time you sync.

GCS Connectivity Relating to Quota Usage

Every asset that is synced onto our platform via Google Cloud Storage Connectivity still counts towards your image quota. Check your Usage Quota before syncing so that all your assets can be loaded and used on our platform. To see which account tier best fits your image quota, go to Plans and Pricing.

Asset Requirements

  • Synced images must satisfy the following criterion, because we cannot strip EXIF tags from synced assets:
    • The asset must have no EXIF orientation tag or the image must have an EXIF orientation of 1 (i.e., it is already in the upright position)
  • MP4 files are supported, subject to the following two restrictions:
    • Major brand: mp42
    • Pixel format: yuv420p

These restrictions ensure that the videos can play in all supported browsers in the annotator. Typical MP4 files should be able to meet these requirements.
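
Two of these requirements can be checked locally before syncing. The sketch below uses only the Python standard library and rests on several assumptions: the image is a JPEG, the MP4 file starts with its ftyp box (as is typical), and all function names are hypothetical. It does not check the pixel format, which needs a media inspection tool.

```python
import struct
from typing import Optional

ORIENTATION_TAG = 0x0112  # EXIF tag ID for image orientation


def _orientation_from_tiff(tiff: bytes) -> Optional[int]:
    """Read the orientation value from the first IFD of a TIFF/EXIF block."""
    if tiff[:2] == b"II":
        endian = "<"  # little-endian
    elif tiff[:2] == b"MM":
        endian = ">"  # big-endian
    else:
        return None
    if len(tiff) < 8:
        return None
    (ifd_offset,) = struct.unpack(endian + "I", tiff[4:8])
    if ifd_offset + 2 > len(tiff):
        return None
    (count,) = struct.unpack(endian + "H", tiff[ifd_offset:ifd_offset + 2])
    for i in range(count):
        start = ifd_offset + 2 + 12 * i
        entry = tiff[start:start + 12]
        if len(entry) < 12:
            return None
        (tag,) = struct.unpack(endian + "H", entry[:2])
        if tag == ORIENTATION_TAG:
            (value,) = struct.unpack(endian + "H", entry[8:10])
            return value
    return None


def exif_orientation(jpeg: bytes) -> Optional[int]:
    """Return the EXIF orientation of a JPEG, or None if no tag is present."""
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG file")
    pos = 2
    while pos + 4 <= len(jpeg) and jpeg[pos] == 0xFF:
        marker = jpeg[pos + 1]
        if marker in (0xDA, 0xD9):  # start of scan / end of image
            break
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        payload = jpeg[pos + 4:pos + 2 + length]
        if marker == 0xE1 and payload[:6] == b"Exif\x00\x00":
            return _orientation_from_tiff(payload[6:])
        pos += 2 + length
    return None


def image_is_sync_safe(jpeg: bytes) -> bool:
    """True if the image has no orientation tag or an orientation of 1."""
    return exif_orientation(jpeg) in (None, 1)


def mp4_major_brand(data: bytes) -> Optional[str]:
    """Return the major brand from a leading ftyp box, or None if absent."""
    if len(data) < 12 or data[4:8] != b"ftyp":
        return None
    return data[8:12].decode("ascii", errors="replace")
```

For a file on disk, pass `open(path, "rb").read()` for an image, or the first 12 bytes of a video; a video meets the brand restriction when `mp4_major_brand` returns `"mp42"`.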

Since we do not hold your video data on our platform, externally-synced video assets will not be modified in any way. This means that they will retain their original dimensions, quality, and audio, if any.


Common Questions

Why are my images/videos not syncing?

This issue is most likely due to insufficient quota for the current month. Your plan limits the number of images that you can upload per month. For videos, the quota is calculated from the number of frames, so a video may be rejected if its total number of frames exceeds your remaining quota. Check your Usage Quota to monitor your monthly usage. If you need a higher quota, consider upgrading your plan.
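
To estimate in advance whether a sync will fit, the arithmetic can be sketched as below. It assumes that each image costs one quota unit and each video costs one unit per frame; this guide only states that videos are counted by frames, so confirm the exact rates against your Usage Quota page. The function names are hypothetical.

```python
from typing import Iterable


def quota_needed(image_count: int, video_frame_counts: Iterable[int]) -> int:
    """Estimate quota units: one per image plus one per video frame (assumed)."""
    return image_count + sum(video_frame_counts)


def fits_in_quota(image_count: int, video_frame_counts: Iterable[int],
                  remaining_quota: int) -> bool:
    """Return True if the estimated usage does not exceed the remaining quota."""
    return quota_needed(image_count, video_frame_counts) <= remaining_quota


if __name__ == "__main__":
    # Example figures only: 200 images plus two videos of 900 and 1800 frames.
    print(quota_needed(200, [900, 1800]))
    print(fits_in_quota(200, [900, 1800], remaining_quota=2500))
```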