Using Integrated Vectorization in Azure AI Search – baeke.info


The vector search capability of Azure AI Search became generally available in mid-November 2023. With that release, the developer is responsible for creating embeddings and storing them in a vector field of the index.

However, Microsoft also released integrated vectorization in preview. Integrated vectorization is useful in two ways:

  • You can define a vectorizer in the index schema. It can be used to automatically convert a query to a vector. This is useful in the Search Explorer in the portal but can also be used programmatically.
  • You can use an embedding skill for your indexer that automatically vectorizes index fields for you.

First, let’s look at defining a vectorizer in the index definition and using it in the portal for search.


Vector search in the portal

Below is a screenshot of an index with a title and a titleVector field. The index stores information about movies:

[Image: Index with a vector field]

The integrated vectorizer is defined in the Vector profiles section:

[Image: Vector profile]

When you add the profile, you configure the algorithm and vectorizer. The vectorizer simply points to an embedding model in Azure OpenAI. For example:

[Image: Vectorizer]

Note: using a managed identity instead of an API key is recommended.
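For reference, the vector profile and vectorizer configured in the portal correspond to a vectorSearch section like the one below in the index definition. This is a sketch: the names hnsw-1, vector-profile-1 and openai-vectorizer are examples, and you should replace the Azure OpenAI placeholders with your own values.

```json
"vectorSearch": {
  "algorithms": [
    { "name": "hnsw-1", "kind": "hnsw" }
  ],
  "profiles": [
    {
      "name": "vector-profile-1",
      "algorithm": "hnsw-1",
      "vectorizer": "openai-vectorizer"
    }
  ],
  "vectorizers": [
    {
      "name": "openai-vectorizer",
      "kind": "azureOpenAI",
      "azureOpenAIParameters": {
        "resourceUri": "https://OPENAI_INSTANCE.openai.azure.com",
        "deploymentId": "EMBEDDING_MODEL",
        "apiKey": "AZURE_OPENAI_KEY"
      }
    }
  ]
}
```

The profile ties an ANN algorithm and a vectorizer together; vector fields reference the profile by name.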

Now, from JSON View in Search Explorer, you can perform a vector search. If you see a search field at the top, you can remove that; it is for full-text search.

[Image: Vector search in the portal]

Above, the query commencement is converted to a vector by the integrated vectorizer, and the vector search returns Inception as the first match. I am not sure you would want to search for movies this way, but it proves the point. 😛
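The query body you paste into the JSON view looks roughly like this (field names match the movie index above; k controls how many nearest neighbours are returned):

```json
{
  "select": "title",
  "vectorQueries": [
    {
      "kind": "text",
      "text": "commencement",
      "fields": "titleVector",
      "k": 5
    }
  ]
}
```

Because the query has kind text, the service uses the vectorizer from the field's vector profile to embed the query string before searching. Without a vectorizer, you would have to supply the raw vector yourself with kind vector.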

Using an embedding skill during indexing

Suppose you have several JSON documents about movies. Below is one example:

{
    "title": "Inception",
    "year": 2010,
    "director": "Christopher Nolan",
    "genre": ["Action", "Adventure", "Sci-Fi"],
    "starring": ["Leonardo DiCaprio", "Joseph Gordon-Levitt", "Ellen Page"],
    "imdb_rating": 8.8
  }

When you have a bunch of these files in Azure Blob Storage, you can use the Import Data wizard to construct an index from these files.

[Image: Import Data wizard]

This wizard, at the time of writing, does not create vectors for you. There is another wizard, Import and vectorize data, but it treats the JSON like any other document, stores it in a content field, and creates a vector from that content field.

We will stick to the first wizard. It will do several things:

  • create a data source to access the JSON documents in an Azure Storage Account container
  • infer the schema from the JSON files
  • propose an index definition that you can alter
  • create an indexer that indexes the documents on the schedule that you set
  • add skills like entity extraction; select a simple skill here, such as translation, so you are sure the indexer will have a skillset to use
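The data source the wizard creates looks roughly like the sketch below (the data source name movies and the container name are examples; the connection string is elided):

```json
{
  "name": "movies",
  "type": "azureblob",
  "credentials": {
    "connectionString": "<storage account connection string>"
  },
  "container": {
    "name": "movies"
  }
}
```

The indexer references this data source by name, so if you create it yourself instead of using the wizard, make sure the names line up.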

In the step to customize the index definition, ensure you make fields searchable and retrievable as needed. In addition, define a vector field. In my case, I created a titleVector field:

[Image: titleVector]

When the wizard is finished, the indexer will run and populate the index. Of course, the titleVector field will be empty because there is no process in place that calculates the vectors during indexing.

Let’s fix that. In Skillsets, go to the skillset created by the wizard and click it.

[Image: Skillset created by the wizard]

Replace the Skillset JSON definition with the content below and change resourceUri, apiKey and deploymentId as needed. You can also add the embedding skill to the existing array of skills if you want to keep them.

{
  "@odata.context": "https://acs-geba.search.windows.net/$metadata#skillsets/$entity",
  "@odata.etag": "\"0x8DBF01523E9A94D\"",
  "name": "azureblob-skillset",
  "description": "Skillset created from the portal. skillsetName: azureblob-skillset; contentField: title; enrichmentGranularity: document; knowledgeStoreStorageAccount: ;",
  "skills": [
    {
      "@odata.type": "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill",
      "name": "embed",
      "description": null,
      "context": "/document",
      "resourceUri": "https://OPENAI_INSTANCE.openai.azure.com",
      "apiKey": "AZURE_OPENAI_KEY",
      "deploymentId": "EMBEDDING_MODEL",
      "inputs": [
        {
          "name": "text",
          "source": "/document/title"
        }
      ],
      "outputs": [
        {
          "name": "embedding",
          "targetName": "titleVector"
        }
      ],
      "authIdentity": null
    }
  ],
  "cognitiveServices": null,
  "knowledgeStore": null,
  "indexProjections": null,
  "encryptionKey": null
}

Above, we want to embed the title field in our document and create a vector for it. The context is set to /document, which means this skill is executed once per document.

Now save the skillset. This skill on its own will create the vectors but will not save them in the index. You need to update the indexer to write the vector to a field.

Let’s navigate to the indexer:

[Image: Indexer]

Click the indexer and go to the Indexer Definition (JSON) tab. Ensure you have an outputFieldMappings section like below:

{
  "@odata.context": "https://acs-geba.search.windows.net/$metadata#indexers/$entity",
  "@odata.etag": "\"0x8DBF01561D9E97F\"",
  "name": "movies-indexer",
  "description": "",
  "dataSourceName": "movies",
  "skillsetName": "azureblob-skillset",
  "targetIndexName": "movies-index",
  "disabled": null,
  "schedule": null,
  "parameters": {
    "batchSize": null,
    "maxFailedItems": 0,
    "maxFailedItemsPerBatch": 0,
    "base64EncodeKeys": null,
    "configuration": {
      "dataToExtract": "contentAndMetadata",
      "parsingMode": "json"
    }
  },
  "fieldMappings": [
    {
      "sourceFieldName": "metadata_storage_path",
      "targetFieldName": "metadata_storage_path",
      "mappingFunction": {
        "name": "base64Encode",
        "parameters": null
      }
    }
  ],
  "outputFieldMappings": [
    {
      "sourceFieldName": "/document/titleVector",
      "targetFieldName": "titleVector"
    }
  ],
  "cache": null,
  "encryptionKey": null
}

Above, we map the titleVector enrichment (think of it as something temporary that exists only during indexing) to the real titleVector field in the index.

Reset and run the indexer

Reset the indexer so it will index all documents again:

[Image: Resetting the indexer]

Next, click the Run button to start the indexing process. When it finishes, do a search with Search Explorer and check that there are vectors in the titleVector field. Each vector is an array of 1536 floating-point numbers.
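A quick way to check is a query in Search Explorer's JSON view that selects the vector field alongside the title (this assumes titleVector was marked retrievable in the index definition):

```json
{
  "search": "*",
  "select": "title, titleVector"
}
```

If the outputFieldMappings section is missing or wrong, the query still succeeds but titleVector comes back null for every document.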

Conclusion

Integrated vectorization is a welcome extra feature in Azure AI Search. Using it in searches is very easy, especially in the portal.

Using the embedding skill is a bit harder, because you need to work with skillset and indexer definitions in JSON and you have to know exactly what you have to add. But once you get it right, the indexer does all the vectorization work for you.
