Connect Azure Blob

📘

This feature is only available for paid account tiers. Check Plans and Pricing for more information!

How to Connect Azure Blobs

Starting from your project page, select the Dataset tab in the sidebar and click Connect to External Buckets at the top. Select Microsoft Azure from the dropdown list and click Proceed to start the process. Before you begin, make sure you are logged in to your Azure account so that your Blob Storage details are at hand.

In the new Setup Microsoft Azure Integration tab, you will see the five steps of the process.

1. Blob Details

There are four items in this section:

  • Connection Name: An identifier for the connection between your Nexus project and your Blob Storage. You can give it any name you like.
  • Storage Account Name: The name of your Storage Account in Azure. It must follow Azure's naming rules: 3 to 24 characters, using only lowercase letters and numbers.
  • Container Name: The name of the Container in your Storage Account that holds your assets. It must also follow Azure's naming rules: 3 to 63 characters, using lowercase letters, numbers, and hyphens, starting with a letter or number, with no consecutive hyphens.
  • Folder Prefix: An optional entry that limits the integration to specific subfolders or blobs in your Container, so that Datature can read only the folders you choose. If left empty, Nexus uses the data in the root folder of the Container. Make sure the prefix points to a folder that contains the assets you want Nexus to read.
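If you want to catch naming mistakes before submitting the form, the sketch below checks names against Azure's naming rules for storage accounts and containers, as summarized in the list above. The function names are hypothetical helpers, not part of Nexus or the Azure SDK.

```python
import re

# Azure rule: 3-24 characters, lowercase letters and digits only.
_ACCOUNT_RE = re.compile(r"[a-z0-9]{3,24}")
# Azure rule: 3-63 characters; lowercase letters, digits, and hyphens;
# starts with a letter or digit; every hyphen must be followed by a letter
# or digit (so no "--" and no trailing hyphen).
_CONTAINER_RE = re.compile(r"[a-z0-9](?:[a-z0-9]|-(?=[a-z0-9])){2,62}")


def is_valid_storage_account_name(name: str) -> bool:
    """Hypothetical pre-check for the Storage Account Name field."""
    return _ACCOUNT_RE.fullmatch(name) is not None


def is_valid_container_name(name: str) -> bool:
    """Hypothetical pre-check for the Container Name field."""
    return _CONTAINER_RE.fullmatch(name) is not None


if __name__ == "__main__":
    for account in ("mystorage01", "My-Storage"):
        print(account, is_valid_storage_account_name(account))
    for container in ("training-images", "training--images"):
        print(container, is_valid_container_name(container))
```

Using `fullmatch` means the whole name must satisfy the pattern, so no `^`/`$` anchors are needed.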

2. Azure Policy

In this section, Datature generates three pieces of information that you must use in order: a shell command that creates a unique service principal, the IAM Role Assignment unique identifier, and the role assignment condition.

az ad sp create --id <UNIQUE_ID_GENERATED_BY_NEXUS>

Run this command with the Azure CLI. For details on installing the CLI and authenticating your Azure account, refer to the Azure CLI documentation. After running the command, you should see JSON output similar to the following:

{
  "@odata.context": "https://graph.microsoft.com/v1.0/$metadata#servicePrincipals/$entity",
  "accountEnabled": true,
  "addIns": [],
  "alternativeNames": [],
  "appDescription": null,
  "appDisplayName": "datatureconnector-c0633659-a636-40e2-812b-d5f98d1af592",
  "appId": "994d4d51-b1fa-42fa-a4ce-008c4617bde1",
  "appOwnerOrganizationId": "50b5220a-29e1-40ce-8d95-5798afb3b6e5",
  "appRoleAssignmentRequired": false,
  "appRoles": [],
  "applicationTemplateId": null,
  "createdDateTime": null,
  "deletedDateTime": null,
  "description": null,
  "disabledByMicrosoftStatus": null,
  "displayName": "datatureconnector-c0633659-a636-40e2-812b-d5f98d1af592",
  "homepage": null,
  "id": "e63d1c8f-07b0-4ad2-8cba-2e00bc6136e0",
  "info": {
    "logoUrl": null,
    "marketingUrl": null,
    "privacyStatementUrl": null,
    "supportUrl": null,
    "termsOfServiceUrl": null
  },
  "keyCredentials": [],
  "loginUrl": null,
  "logoutUrl": null,
  "notes": null,
  "notificationEmailAddresses": [],
  "oauth2PermissionScopes": [],
  "passwordCredentials": [],
  "preferredSingleSignOnMode": null,
  "preferredTokenSigningKeyThumbprint": null,
  "replyUrls": [],
  "resourceSpecificApplicationPermissions": [],
  "samlSingleSignOnSettings": null,
  "servicePrincipalNames": ["994d4d51-b1fa-42fa-a4ce-008c4617bde1"],
  "servicePrincipalType": "Application",
  "signInAudience": "AzureADMultipleOrgs",
  "tags": [],
  "tokenEncryptionKeyId": null,
  "verifiedPublisher": {
    "addedDateTime": null,
    "displayName": null,
    "verifiedPublisherId": null
  }
}
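Only a few fields in this output matter for the remaining steps. In the sample above, the displayName value has the same form as the IAM Role Assignment identifier that Nexus shows for the Members tab, and id is the service principal's object ID. The sketch below extracts those fields; the helper names are hypothetical, and running the command this way assumes the Azure CLI is installed and you have signed in with `az login`.

```python
import json
import subprocess


def summarize_service_principal(sp_json: str) -> dict:
    """Pull out the fields used later in the setup from `az ad sp create` output."""
    sp = json.loads(sp_json)
    return {
        "display_name": sp["displayName"],  # same form as the Members tab identifier
        "object_id": sp["id"],              # service principal object ID
        "app_id": sp["appId"],              # application (client) ID
    }


def create_service_principal(app_id: str) -> dict:
    """Run the Nexus-generated command and summarize its JSON output."""
    result = subprocess.run(
        ["az", "ad", "sp", "create", "--id", app_id],
        capture_output=True, text=True, check=True,
    )
    return summarize_service_principal(result.stdout)
```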

Next, open the Azure portal, go to <YOUR_STORAGE_ACCOUNT_NAME>, and select Access Control (IAM) in the sidebar. In the Access Control (IAM) dashboard, click Add → Add Role Assignment to create a new role assignment. In the Role tab, scroll down to the Storage Blob Data Reader role, select it, and click Next.

🚧

Azure quotas are managed by Azure, so you are responsible for resolving any quota-related issues on the Azure side.

In the Members tab, click Select members. In the Select field, paste the IAM Role Assignment unique identifier from Nexus and click the entry that appears. Click Select to confirm the member, then click Next to proceed. The identifier has the same form as the displayName in the command output above, for example:

datatureconnector-c0633659-a636-40e2-812b-d5f98d1af592

In the Conditions tab, click Add conditions to add a new condition to your role assignment. Select Code as the editor type, paste the condition generated by Nexus, and click Save.

(
  (
    !(ActionMatches{'Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read'})
  )
  OR 
  (
    @Resource[Microsoft.Storage/storageAccounts/blobServices/containers:name] StringEquals '<YOUR_CONTAINER_NAME>'
  )
)
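The condition reads as "either this action is not a blob read, or the blob is in <YOUR_CONTAINER_NAME>". The small Python model below restates that logic to make its effect clear: blob reads are allowed only in the named container, while other actions granted by the Storage Blob Data Reader role are not restricted by this condition. It is an illustration only, using simple string equality in place of Azure's ActionMatches and StringEquals operators; Azure evaluates the real condition.

```python
BLOB_READ = "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read"


def condition_allows(action: str, container_name: str, allowed_container: str) -> bool:
    """Model of: (!(ActionMatches{blob read})) OR (container name StringEquals allowed)."""
    not_a_blob_read = action != BLOB_READ
    in_allowed_container = container_name == allowed_container
    return not_a_blob_read or in_allowed_container


if __name__ == "__main__":
    print(condition_allows(BLOB_READ, "training-images", "training-images"))
    print(condition_allows(BLOB_READ, "finance-records", "training-images"))
```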

Finally, click Review + assign to create your role assignment. To verify that it exists, open the Role assignments tab.
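If you prefer the Azure CLI to the portal, the same role assignment can in principle be created with `az role assignment create`. The sketch below only builds the command line and does not run it. It assumes you take the object ID from the `id` field of the `az ad sp create` output; the subscription ID, resource group, and the helper name are placeholders. This route has not been verified against Nexus, so the portal steps above remain the documented path.

```python
import shlex

BLOB_READ = "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read"


def build_role_assignment_argv(subscription_id: str, resource_group: str,
                               account_name: str, principal_object_id: str,
                               container_name: str) -> list:
    """Build (but do not run) an `az role assignment create` command."""
    # Assign at the storage account level, not on a container or blob.
    scope = (f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
             f"/providers/Microsoft.Storage/storageAccounts/{account_name}")
    # Same logic as the condition shown above, written on one line.
    condition = (
        f"((!(ActionMatches{{'{BLOB_READ}'}}))"
        " OR (@Resource[Microsoft.Storage/storageAccounts/blobServices/containers:name]"
        f" StringEquals '{container_name}'))"
    )
    return [
        "az", "role", "assignment", "create",
        "--assignee-object-id", principal_object_id,
        "--assignee-principal-type", "ServicePrincipal",
        "--role", "Storage Blob Data Reader",
        "--scope", scope,
        "--condition", condition,
        "--condition-version", "2.0",
    ]


if __name__ == "__main__":
    argv = build_role_assignment_argv("<SUBSCRIPTION_ID>", "<RESOURCE_GROUP>",
                                      "<YOUR_STORAGE_ACCOUNT_NAME>", "<OBJECT_ID>",
                                      "<YOUR_CONTAINER_NAME>")
    print(shlex.join(argv))  # review before running, e.g. with subprocess.run(argv)
```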

3. (Optional) Resource Sharing for Video Assets

If your Blob Storage contains videos, you need to set up resource sharing (CORS) rules so that Nexus can access your video assets. In your storage account, open the Resource sharing (CORS) section under Settings in the sidebar. In the Blob service tab, enter the values shown in the table below and click Save to commit your changes.

Allowed origins              Allowed methods
https://nexus.datature.io    GET
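The same Blob service CORS rule can also be added with the Azure CLI command `az storage cors add`. The sketch below only builds the command line; authentication flags (such as an account key or connection string) are deliberately left out because they depend on how you access your account, and the helper name is hypothetical. The origin is written without a trailing slash because browsers send the Origin header as scheme and host only.

```python
import shlex

NEXUS_ORIGIN = "https://nexus.datature.io"


def build_cors_argv(account_name: str, origin: str = NEXUS_ORIGIN) -> list:
    """Build (but do not run) an `az storage cors add` command for the Blob service."""
    return [
        "az", "storage", "cors", "add",
        "--account-name", account_name,
        "--services", "b",     # b = Blob service only
        "--methods", "GET",    # matches the Allowed methods value in the table
        "--origins", origin,
        # Authentication flags are omitted; add the ones your setup requires.
    ]


if __name__ == "__main__":
    print(shlex.join(build_cors_argv("<YOUR_STORAGE_ACCOUNT_NAME>")))
```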

4. Blob Storage Connection

You can now complete the Blob Storage connection. Note that Azure changes can take up to 5 minutes to take effect, so it is normal to see a broken heart icon with a message reporting storage connection issues at first. Click Retry to refresh the storage connection. If the issue persists after 5 minutes, check that you have entered all credentials and names correctly.

🚧

An error here does not necessarily mean that the Azure Policy in Step 2 was set up incorrectly. Because this step tests the complete connection, also check the details you entered in Step 1 and, if you set them up, the CORS rules in Step 3.

If the storage connection is successful, you will see a green heart icon with a message saying that the storage connection is complete.

5. Sync Assets

Now that your Blob Storage is connected to Nexus, you can choose to Sync Now or Sync Later. You can sync at any time after the connection has been made in Step 4. If you choose Sync Now, Nexus begins syncing your asset metadata from your Blob Storage onto the platform.

📘

Syncs can take anywhere from 5 to 40 minutes, depending on the number of assets in your chosen folder. Please give the platform time to update and load all of your assets before you use them elsewhere on the platform.

Once the sync has completed, refresh your Dataset page to see your assets loaded in from the Blob Storage!


How Azure Blob Connectivity Works and General Precautions

Azure Blob Connectivity Functionality

We do not store any of your actual image or video data on our platform. Instead, we read the asset metadata from your storage and load that information onto our platform, and this access is read-only. As a result, your Azure Blob Storage is the master dataset: changes you make to your dataset on our platform are not written back to your storage, while changes you make in your storage are reflected on our platform the next time you sync.
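The one-way relationship described above can be pictured with the conceptual model below. It is not Nexus's implementation: it assumes each asset is identified by its blob name plus a version tag (such as an ETag), and that a sync makes the platform's view match the storage listing. The text does not spell out how deletions in storage are handled, so the `removed` list is an assumption.

```python
from typing import Dict, List


def one_way_sync(storage: Dict[str, str], platform: Dict[str, str]) -> Dict[str, object]:
    """Conceptual model: storage is the master copy and nothing flows back to it."""
    added: List[str] = sorted(storage.keys() - platform.keys())
    removed: List[str] = sorted(platform.keys() - storage.keys())  # assumption, see above
    changed: List[str] = sorted(
        name for name in storage.keys() & platform.keys() if storage[name] != platform[name]
    )
    # The new platform view is a copy of the storage listing; `storage` is never modified.
    return {"added": added, "removed": removed, "changed": changed, "index": dict(storage)}


if __name__ == "__main__":
    result = one_way_sync({"a.jpg": "v2", "b.jpg": "v1"}, {"a.jpg": "v1", "c.jpg": "v1"})
    print(result["added"], result["removed"], result["changed"])
```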

If you have connections to multiple Azure Blob Storage containers, syncing each of them from the same Assets page adds all of their images to that dataset.

Azure Blob Connectivity Relating to Quota Usage

Every asset synced onto our platform through Azure Blob Connectivity still counts towards your image quota. Check your Usage Quota before syncing so that all of your assets can be synced and used on our platform. To find the account tier that best fits your quota needs, see Plans and Pricing.
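Before syncing, you can estimate how much quota a sync will use. The sketch below assumes, following the Common Questions section, that each image uses one unit of quota and each video uses one unit per frame (frame counts can come from ffprobe's nb_frames field). The function name and the remaining-quota figure are illustrative only.

```python
from typing import Sequence, Tuple


def quota_check(image_count: int, video_frame_counts: Sequence[int],
                remaining_quota: int) -> Tuple[bool, int]:
    """Return (fits, units_needed), assuming 1 unit per image and 1 unit per video frame."""
    units_needed = image_count + sum(video_frame_counts)
    return units_needed <= remaining_quota, units_needed


if __name__ == "__main__":
    # 1,200 images plus two videos of 300 and 450 frames -> 1,950 units.
    fits, needed = quota_check(1200, [300, 450], remaining_quota=1800)
    print(f"needs {needed} units; fits in remaining quota: {fits}")
```

With these numbers the script prints `needs 1950 units; fits in remaining quota: False`, so this sync would need more quota than remains.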

Asset Requirements

  • Because we cannot strip EXIF tags from synced assets, synced images must meet the following criterion:

    • The image must have no EXIF orientation tag, or an EXIF orientation of 1 (i.e., it is already upright)
  • MP4 files are supported, subject to the following restrictions:

    • major_brand or compatible_brands must include at least one of the supported brands: isom, iso2, mp41, mp42
    • Pixel format (pix_fmt): yuv420p
    • The video stream metadata should not mention 4:4:4 or 4:2:2
    • Number of frames (nb_frames): should be close to r_frame_rate * time_base * duration_ts
    • [If present] sample_aspect_ratio (SAR) should be 1:1 or an equivalent 1-to-1 ratio
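For the image requirement, the sketch below reads the EXIF orientation tag (0x0112) from a JPEG file using only the Python standard library. It is a minimal parser that assumes a JPEG with its EXIF data in an APP1 segment and does not handle other image formats; an imaging library such as Pillow is a sturdier choice in practice. The function names are hypothetical.

```python
import struct
from typing import Optional

ORIENTATION_TAG = 0x0112  # EXIF/TIFF "Orientation" tag


def read_jpeg_orientation(data: bytes) -> Optional[int]:
    """Return the EXIF orientation value of a JPEG, or None if it has none."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG file")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("unexpected byte where a JPEG marker should be")
        marker = data[pos + 1]
        if marker in (0xD9, 0xDA):  # end of image / start of scan: stop looking
            break
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])  # includes the 2 length bytes
        payload = data[pos + 4:pos + 2 + length]
        if marker == 0xE1 and payload.startswith(b"Exif\x00\x00"):  # APP1 holding EXIF
            return _orientation_from_tiff(payload[6:])
        pos += 2 + length
    return None


def _orientation_from_tiff(tiff: bytes) -> Optional[int]:
    """Look up the Orientation tag in IFD0 of an EXIF TIFF block."""
    endian = {b"II": "<", b"MM": ">"}.get(tiff[:2])
    if endian is None:
        raise ValueError("unknown TIFF byte order")
    (ifd0,) = struct.unpack(endian + "I", tiff[4:8])
    (count,) = struct.unpack(endian + "H", tiff[ifd0:ifd0 + 2])
    for i in range(count):
        start = ifd0 + 2 + 12 * i  # each IFD entry is 12 bytes
        tag, field_type = struct.unpack(endian + "HH", tiff[start:start + 4])
        if tag == ORIENTATION_TAG and field_type == 3:  # type 3 = SHORT
            (value,) = struct.unpack(endian + "H", tiff[start + 8:start + 10])
            return value
    return None


def meets_orientation_requirement(path: str) -> bool:
    """True if the JPEG has no orientation tag or an orientation of 1 (upright)."""
    with open(path, "rb") as f:
        orientation = read_jpeg_orientation(f.read())
    return orientation is None or orientation == 1
```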

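To check the above metadata in your videos, you can use ffprobe, which is installed as part of the ffmpeg package:

# For Ubuntu-based systems (ffprobe ships with the ffmpeg package)
sudo apt-get install -y ffmpeg
# Show the container (format) and stream metadata, including the fields listed above
ffprobe -v error -show_format -show_streams <YOUR_VIDEO>.mp4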

These restrictions ensure that the videos can play in all supported browsers in the annotator. Typical MP4 files should be able to meet these requirements.
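To automate the video checks, the sketch below runs ffprobe with JSON output and tests the report against the MP4 requirements listed above. The 2% tolerance for the frame-count comparison is an assumption, since the requirement only says the values should be close, and the function names are hypothetical.

```python
import json
import subprocess
from fractions import Fraction

SUPPORTED_BRANDS = {"isom", "iso2", "mp41", "mp42"}


def probe(path: str) -> dict:
    """Return ffprobe's format and stream metadata for `path` as a dict."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-show_format", "-show_streams", "-of", "json", path],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def check_mp4(report: dict, tolerance: float = 0.02) -> list:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    tags = report.get("format", {}).get("tags", {})
    compatible = tags.get("compatible_brands", "")
    # compatible_brands is a run of 4-character codes, e.g. "isomiso2avc1mp41".
    brands = {tags.get("major_brand", "")}
    brands |= {compatible[i:i + 4] for i in range(0, len(compatible), 4)}
    if not brands & SUPPORTED_BRANDS:
        problems.append("no supported brand in major_brand/compatible_brands")

    video = next((s for s in report.get("streams", []) if s.get("codec_type") == "video"), None)
    if video is None:
        return problems + ["no video stream found"]
    if video.get("pix_fmt") != "yuv420p":
        problems.append(f"pix_fmt is {video.get('pix_fmt')}, expected yuv420p")
    text = json.dumps(video)
    if "4:4:4" in text or "4:2:2" in text:
        problems.append("stream metadata mentions 4:4:4 or 4:2:2")
    try:
        expected = (Fraction(video["r_frame_rate"]) * Fraction(video["time_base"])
                    * int(video["duration_ts"]))
        actual = int(video["nb_frames"])
        if abs(actual - expected) > max(1, tolerance * expected):
            problems.append(f"nb_frames {actual} differs from expected {float(expected):.1f}")
    except (KeyError, ValueError, ZeroDivisionError):
        problems.append("could not compare nb_frames with r_frame_rate * time_base * duration_ts")
    sar = video.get("sample_aspect_ratio")
    if sar is not None:
        num, den = (int(x) for x in sar.split(":"))
        if num != den or num == 0:
            problems.append(f"sample_aspect_ratio is {sar}, expected 1:1")
    return problems
```

For example, `check_mp4(probe("clip.mp4"))` returns an empty list for a file that passes every check.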

Since we do not hold your video data on our platform, externally-synced video assets will not be modified in any way. This means that they will retain their original dimensions, quality, and audio, if any.


Common Questions

Why is my Blob Storage not connecting?

Please check that you added the IAM Role Assignment to your Storage Account itself, not to a Container or Blob within it.
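To check where the assignment was created, you can list the service principal's role assignments with `az role assignment list --assignee <OBJECT_ID> --all` and pass the JSON output to the sketch below. It flags any Storage Blob Data Reader assignment whose scope is not a storage account, such as a container-level scope ending in .../blobServices/default/containers/<name>. The helper name is hypothetical.

```python
import json
import re

# Storage-account-level scope, e.g.
# /subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<account>
ACCOUNT_SCOPE = re.compile(
    r"/subscriptions/[^/]+/resourceGroups/[^/]+"
    r"/providers/Microsoft\.Storage/storageAccounts/[^/]+",
    re.IGNORECASE,
)


def misplaced_reader_scopes(assignments_json: str) -> list:
    """Return scopes of Storage Blob Data Reader assignments not made on a storage account."""
    return [
        a["scope"]
        for a in json.loads(assignments_json)
        if a.get("roleDefinitionName") == "Storage Blob Data Reader"
        and not ACCOUNT_SCOPE.fullmatch(a["scope"])
    ]
```

An empty result means every Storage Blob Data Reader assignment for that principal sits at the storage account level.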

Why are my images/videos not syncing?

This is most likely due to insufficient quota for the current month. Your plan sets a limit on the number of images you can upload each month. For videos, quota usage is calculated from the number of frames, so a video may be rejected if its total number of frames exceeds your remaining quota. Check your Usage Quota to monitor your monthly usage, and consider upgrading your plan if you need a higher quota.