Azure Cognitive Search is a cloud-based search service from Microsoft that lets us add rich search capabilities to our applications. It offers AI-driven indexing, faceted navigation, and natural language processing. It scales with demand, supports a variety of data sources, and returns fast, relevant search results.
Creating a Cognitive Search Service
- On the Azure Portal, search for Cognitive Search, then click 'Create Search Service'.
- Select the resource group and give the service a globally unique name, because the name becomes part of the search service URL.
- Click Next, which takes us to the Scaling section. For this post, we will not look into scaling.
- Click Next, and then 'Review + create'.
- Once the resource is created, head over to the newly created service.
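To illustrate why the name has to be unique, here is a minimal sketch that turns a service name into its endpoint URL, `https://<name>.search.windows.net`. The naming check is an approximation of the portal's validation rules (lowercase letters, digits, and dashes, 2 to 60 characters, no leading, trailing, or consecutive dashes), not an official implementation. `my-blog-search` is a hypothetical name.

```python
import re


def is_valid_service_name(name: str) -> bool:
    """Approximate check of the search service naming rules (assumption)."""
    if not 2 <= len(name) <= 60:
        return False
    if not re.fullmatch(r"[a-z0-9-]+", name):
        return False
    # No leading/trailing dash and no consecutive dashes.
    if name.startswith("-") or name.endswith("-") or "--" in name:
        return False
    return True


def service_endpoint(name: str) -> str:
    """Build the public endpoint: the service name is the host prefix."""
    if not is_valid_service_name(name):
        raise ValueError(f"invalid search service name: {name!r}")
    return f"https://{name}.search.windows.net"


print(service_endpoint("my-blog-search"))
```

Because the name is the DNS host prefix, two services anywhere in Azure cannot share it.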
Important parts of the Search Service
The options under Search Management in the service's menu are the most important ones for setting up the Search Service. We will look at them one by one.
Data Sources
- In this section, we set up the input data sources for the Search Service.
- There are multiple options we can use as a data source:
- Blob Storage
- Data Lake
- Table
- Cosmos DB
- SQL Database
On selecting any option, we are asked to set up the connection to the respective resource. For Cosmos DB, we can also provide a query that the Search Service runs against the collection to select the data it fetches. For SQL Database, we choose the table or view to read from.
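A data source can also be created through the REST API by sending a JSON definition. The sketch below builds definitions for the Cosmos DB and SQL Database cases, plus a `PUT` request that it does not send. The `type` values are the ones the REST API uses. The api-version, the resource names, and the placeholder connection strings are assumptions, so check the current REST reference before relying on them.

```python
import json
import urllib.request
from typing import Optional

# Assumed api-version; check the REST reference for the current one.
API_VERSION = "2023-11-01"

# Portal option -> "type" value in the REST data source definition.
DATA_SOURCE_TYPES = {
    "Blob Storage": "azureblob",
    "Data Lake": "adlsgen2",
    "Table": "azuretable",
    "Cosmos DB": "cosmosdb",
    "SQL Database": "azuresql",
}


def cosmos_data_source(name: str, connection_string: str, collection: str,
                       query: Optional[str] = None) -> dict:
    """Cosmos DB source: the optional query selects the documents to index."""
    container = {"name": collection}
    if query is not None:
        container["query"] = query
    return {
        "name": name,
        "type": DATA_SOURCE_TYPES["Cosmos DB"],
        "credentials": {"connectionString": connection_string},
        "container": container,
    }


def sql_data_source(name: str, connection_string: str, table_or_view: str) -> dict:
    """Azure SQL source: the container is the table or view to read."""
    return {
        "name": name,
        "type": DATA_SOURCE_TYPES["SQL Database"],
        "credentials": {"connectionString": connection_string},
        "container": {"name": table_or_view},
    }


def put_request(endpoint: str, collection: str, body: dict,
                admin_key: str) -> urllib.request.Request:
    """Build (but do not send) a PUT request that creates or updates a resource."""
    url = f"{endpoint}/{collection}/{body['name']}?api-version={API_VERSION}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": admin_key},
        method="PUT",
    )


ds = cosmos_data_source(
    "posts-ds",
    "<cosmos-connection-string>",  # placeholder
    "posts",
    query="SELECT c.id, c.title, c.body FROM c",
)
req = put_request("https://my-blog-search.search.windows.net", "datasources", ds, "<admin-key>")
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would actually send it. Creating a data source requires an admin key.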
Indexers
Once the data sources are created, we need to create Indexers. An indexer's main job is to read data from a data source and feed it into the search index, so it acts as middleware between the two. An indexer can also run a Skillset over the data to apply AI enrichment such as OCR or entity extraction.
We can set a schedule for an indexer so that it fetches the latest data from the data source at regular intervals.
Once the basic details are filled in, we can scroll down and set the schedule.
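The schedule interval is written as an ISO 8601 duration. Here is a minimal sketch, assuming the service accepts intervals from 5 minutes up to once per day (`P1D`). It converts a `timedelta` into that format and builds the `schedule` object of an indexer definition. Treat the bounds as assumptions to verify against the documentation.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed bounds for an indexer schedule interval.
MIN_INTERVAL = timedelta(minutes=5)
MAX_INTERVAL = timedelta(days=1)


def to_iso8601_duration(interval: timedelta) -> str:
    """Format a whole-minute interval as an ISO 8601 duration, e.g. PT1H30M."""
    if interval == timedelta(days=1):
        return "P1D"
    hours, minutes = divmod(int(interval.total_seconds()) // 60, 60)
    duration = "PT"
    if hours:
        duration += f"{hours}H"
    if minutes:
        duration += f"{minutes}M"
    return duration


def indexer_schedule(interval: timedelta, start: Optional[datetime] = None) -> dict:
    """Build the "schedule" object of an indexer definition."""
    if not MIN_INTERVAL <= interval <= MAX_INTERVAL:
        raise ValueError("interval must be between 5 minutes and 1 day")
    if interval.total_seconds() % 60:
        raise ValueError("interval must be a whole number of minutes")
    schedule = {"interval": to_iso8601_duration(interval)}
    if start is not None:
        if start.tzinfo is None:
            raise ValueError("start must be timezone-aware")
        schedule["startTime"] = start.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return schedule


print(indexer_schedule(timedelta(hours=2)))  # {'interval': 'PT2H'}
```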
- Base-64 Encode Keys - Encodes the document key so that special characters such as '.' do not cause issues in the URL. Document keys may only contain letters, digits, underscores, dashes, and equal signs.
- Enable Incremental Enrichment - If a skillset (AI processing) is applied, the enrichments are cached in Azure Storage so that later runs only reprocess the documents or skills that changed, instead of enriching everything again.
- Default Batch Size - The number of documents the indexer reads and indexes in one batch. The default value depends on the data source type.
- Indexer Cache Location - The storage where the cached enrichments are stored.
- Max Failed Items - The maximum number of documents that can fail during an indexer run while the run is still considered successful.
- Max Failed Items per Batch - The maximum number of documents that can fail within a single batch while the run is still considered successful.
- Managed Identity Authentication - Use this option if the Search Service should authenticate with its Managed Identity instead of a key-based connection string.
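These settings correspond to properties of the indexer definition in the REST API. The sketch below assembles such a definition. `batchSize`, `maxFailedItems`, `maxFailedItemsPerBatch`, the `base64Encode` field mapping, and the `cache` section are REST properties. The function, its parameter names, and the example resource names are hypothetical.

```python
import json
from typing import Optional


def indexer_definition(
    name: str,
    data_source: str,
    index: str,
    key_field: str,
    *,
    interval: Optional[str] = None,        # ISO 8601 duration, e.g. "PT2H"
    skillset: Optional[str] = None,
    batch_size: Optional[int] = None,      # None keeps the data-source default
    max_failed_items: int = 0,             # -1 means no limit
    max_failed_items_per_batch: int = 0,
    cache_connection_string: Optional[str] = None,
    base64_encode_key: bool = True,
) -> dict:
    definition = {
        "name": name,
        "dataSourceName": data_source,
        "targetIndexName": index,
        "parameters": {
            "maxFailedItems": max_failed_items,
            "maxFailedItemsPerBatch": max_failed_items_per_batch,
        },
    }
    if batch_size is not None:
        definition["parameters"]["batchSize"] = batch_size
    if interval is not None:
        definition["schedule"] = {"interval": interval}
    if base64_encode_key:
        # "Base-64 Encode Keys": pass the key through the base64Encode function.
        definition["fieldMappings"] = [{
            "sourceFieldName": key_field,
            "targetFieldName": key_field,
            "mappingFunction": {"name": "base64Encode"},
        }]
    if skillset is not None:
        definition["skillsetName"] = skillset
        if cache_connection_string is not None:
            # Incremental enrichment: cache skillset output in Azure Storage.
            definition["cache"] = {
                "storageConnectionString": cache_connection_string,
                "enableReprocessing": True,
            }
    return definition


print(json.dumps(indexer_definition("posts-indexer", "posts-ds", "posts-index", "id",
                                    interval="PT2H", max_failed_items=10), indent=2))
```

In this sketch, the cache is only attached when a skillset is set, because what it caches is the skillset's output.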
Index
The index is the most important part: it holds the searchable data, and it is where search queries are executed.
If we think about the flow:
- Data sources establish the connection used to read the data.
- Indexers fetch the data, clean it, and send it to the index.
- The index receives the data from the indexers, and for each field we decide whether it is Searchable, Filterable, Sortable, or Facetable.
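The field attributes determine what a query can do with each field. Full-text search runs over Searchable fields, `filter` needs Filterable fields, `orderby` needs Sortable fields, and `facets` needs Facetable fields. The sketch builds the URL and JSON body of a search request without sending it. The field names `category` and `publishedDate` and the api-version are assumptions.

```python
from typing import Iterable, Optional, Tuple

API_VERSION = "2023-11-01"  # assumed api-version


def search_request(
    endpoint: str,
    index: str,
    text: str,
    *,
    filter_expr: Optional[str] = None,
    facets: Iterable[str] = (),
    order_by: Optional[str] = None,
    top: int = 10,
) -> Tuple[str, dict]:
    """Return the URL and JSON body for POST .../indexes/{index}/docs/search."""
    url = f"{endpoint}/indexes/{index}/docs/search?api-version={API_VERSION}"
    body = {"search": text, "top": top, "count": True}  # matches Searchable fields
    if filter_expr:
        body["filter"] = filter_expr  # OData filter: fields must be Filterable
    facet_list = list(facets)
    if facet_list:
        body["facets"] = facet_list   # fields must be Facetable
    if order_by:
        body["orderby"] = order_by    # fields must be Sortable
    return url, body


url, body = search_request(
    "https://my-blog-search.search.windows.net",
    "posts-index",
    "azure search",
    filter_expr="category eq 'cloud'",
    facets=["category"],
    order_by="publishedDate desc",
)
print(url)
```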
While creating the index, we define the fields to include in it and set the properties of each field.
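Here is a minimal index definition in the REST JSON shape, assuming a blog-post schema with hypothetical field names. The validator checks only two rules: an index has exactly one key field of type `Edm.String`, and only string fields can be Searchable. It is not a complete schema validator.

```python
STRING_TYPES = ("Edm.String", "Collection(Edm.String)")


def field(name: str, type_: str = "Edm.String", *, key: bool = False,
          searchable: bool = False, filterable: bool = False,
          sortable: bool = False, facetable: bool = False) -> dict:
    """One field entry of an index definition, with its attributes."""
    return {"name": name, "type": type_, "key": key, "searchable": searchable,
            "filterable": filterable, "sortable": sortable, "facetable": facetable}


def validate_index(index: dict) -> None:
    """Raise ValueError if the key or searchable rules are violated."""
    keys = [f for f in index["fields"] if f["key"]]
    if len(keys) != 1:
        raise ValueError("an index needs exactly one key field")
    if keys[0]["type"] != "Edm.String":
        raise ValueError("the key field must be of type Edm.String")
    for f in index["fields"]:
        if f["searchable"] and f["type"] not in STRING_TYPES:
            raise ValueError(f"field {f['name']!r}: only string fields can be searchable")


posts_index = {
    "name": "posts-index",
    "fields": [
        field("id", key=True, filterable=True),
        field("title", searchable=True, sortable=True),
        field("body", searchable=True),
        field("category", filterable=True, facetable=True),
        field("publishedDate", "Edm.DateTimeOffset", filterable=True, sortable=True),
    ],
}
validate_index(posts_index)
```

Only the attributes a field actually needs are switched on. Each attribute adds to the index's storage footprint, so this keeps the index smaller.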
Aliases
Because the index name is part of the URL used to run a search query, we can give the index an alias and use the alias in the query URL instead. Client applications then keep working when the alias is later pointed at a different index, for example a rebuilt one.
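Aliases were a preview feature at the time of writing, so the api-version below is an assumption. The sketch shows an alias definition that maps the hypothetical name `posts` to a versioned index, and the query URL that uses the alias where the index name would normally go. Repointing the alias means sending an updated definition to the same alias URL.

```python
# Assumption: a preview api-version that supports index aliases.
ALIAS_API_VERSION = "2024-05-01-preview"


def alias_definition(alias: str, index_name: str) -> dict:
    """An alias maps its name to a single index."""
    return {"name": alias, "indexes": [index_name]}


def alias_url(endpoint: str, alias: str) -> str:
    """URL for creating or updating the alias with a PUT request."""
    return f"{endpoint}/aliases/{alias}?api-version={ALIAS_API_VERSION}"


def search_url(endpoint: str, alias: str) -> str:
    """Queries use the alias in place of the index name."""
    return f"{endpoint}/indexes/{alias}/docs/search?api-version={ALIAS_API_VERSION}"


endpoint = "https://my-blog-search.search.windows.net"
v1 = alias_definition("posts", "posts-index-v1")
v2 = alias_definition("posts", "posts-index-v2")  # repoint after a rebuild
print(search_url(endpoint, "posts"))
```

The client only ever uses `search_url(endpoint, "posts")`, so swapping `v1` for `v2` on the service side requires no client change.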