Azure AI Search: What analyzer should I use for fields that contain a mixture of two languages?

Problem

I run a web resource where we discuss Japanese reading material. Discussions are mostly in English, but naturally contain the Japanese words and phrases under discussion. As a result, much of the text I want to search with Azure AI Search mixes Japanese and English within the same field. For example, a user might ask a question like: What does the へと mean in a sentence like スーパーへと走った?

The Azure AI Search documentation covers the case where different fields hold different languages: you attach a language-specific analyzer to each field. For example, a French language analyzer would be assigned to the French product description and an English language analyzer to the English product description. But I don't see anything about intelligently handling a mixture of languages within the same field.

The standard Lucene analyzer does a so-so job, but one challenge with Japanese is that many words can be written in multiple ways. For example, in the sentence above, a user who is unfamiliar with kanji might write the last word as はしった instead of 走った. The standard Lucene analyzer isn't smart enough to treat these as the same word.
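For reference, the per-field setup the documentation describes can be sketched as an index definition in the JSON shape the REST API accepts, built here as a Python dict. The index name and field names are hypothetical; `fr.microsoft` and `en.microsoft` are the names Azure AI Search uses for the Microsoft French and English analyzers.

```python
import json

# Hypothetical index: one field per language, each with its own analyzer.
index_definition = {
    "name": "products",  # assumed index name
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True},
        {
            "name": "description_fr",  # assumed field name
            "type": "Edm.String",
            "searchable": True,
            "analyzer": "fr.microsoft",
        },
        {
            "name": "description_en",  # assumed field name
            "type": "Edm.String",
            "searchable": True,
            "analyzer": "en.microsoft",
        },
    ],
}


def analyzers_by_field(index):
    """Map each field that declares an analyzer to that analyzer's name."""
    return {f["name"]: f["analyzer"] for f in index["fields"] if "analyzer" in f}


print(json.dumps(analyzers_by_field(index_definition), indent=2))
```

This works when each field holds a single language; it does not by itself address one field that mixes two.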

Solution

The suggested solution is to use the Microsoft English analyzer (en.microsoft), which uses lemmatization rather than stemming and so handles inflected and irregular word forms much better. For fields that contain a mixture of Japanese and English, a single field with the Microsoft English analyzer can be used. An alternative approach is to create separate fields, each with its own language-specific analyzer, and use the Text Translation cognitive skill to produce the translated text for each field.
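A minimal sketch of both options follows, again as Python dicts in the shape the REST API accepts. The index names, the field names (`body`, `body_en`, `body_ja`) and the helper functions are all hypothetical. The skill type `#Microsoft.Skills.Text.TranslationSkill` and its `defaultToLanguageCode` parameter are taken from the skill reference as I understand it; check that reference before relying on the exact property names. Nothing here is sent to the service.

```python
import json

SOURCE_FIELD = "body"  # hypothetical name of the mixed-language field


def single_field_index(name="discussions"):
    """Option 1: one mixed-language field analyzed with en.microsoft."""
    return {
        "name": name,
        "fields": [
            {"name": "id", "type": "Edm.String", "key": True},
            {
                "name": SOURCE_FIELD,
                "type": "Edm.String",
                "searchable": True,
                "analyzer": "en.microsoft",
            },
        ],
    }


def per_language_index(name="discussions-by-language"):
    """Option 2: one field per language, each with its own analyzer."""
    fields = [{"name": "id", "type": "Edm.String", "key": True}]
    for lang in ("en", "ja"):
        fields.append(
            {
                "name": f"{SOURCE_FIELD}_{lang}",
                "type": "Edm.String",
                "searchable": True,
                "analyzer": f"{lang}.microsoft",
            }
        )
    return {"name": name, "fields": fields}


def translation_skill(to_language, target_name):
    """Skill definition that translates the source text into one language."""
    return {
        "@odata.type": "#Microsoft.Skills.Text.TranslationSkill",
        "defaultToLanguageCode": to_language,
        "context": "/document",
        "inputs": [{"name": "text", "source": f"/document/{SOURCE_FIELD}"}],
        "outputs": [{"name": "translatedText", "targetName": target_name}],
    }


# One skill per target field used by option 2.
skills = [
    translation_skill(lang, f"{SOURCE_FIELD}_{lang}") for lang in ("en", "ja")
]

print(json.dumps(single_field_index(), indent=2, ensure_ascii=False))
print(json.dumps(per_language_index(), indent=2, ensure_ascii=False))
print(json.dumps(skills, indent=2, ensure_ascii=False))
```

Option 1 keeps indexing simple, at the cost of running Japanese text through an English analyzer. Option 2 gives each language its own analyzer but adds an enrichment step and a second copy of the text.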

Code

Azure AI Search API/SDK analyzer attribute alternative 44496506

Configure and use multiple language analyzers on Azure Search 71952884

How to translate and update Azure Cognitive Search Index document for different language analyzer fields? 72084347

How do I add a char filter to a Microsoft language analyzer in Azure Search? 56886993

How to customize tokenization of numbers by the en.microsoft analyzer? 55149249

Azure Search: Is there support for conjugation in the French or any language analyzer? 43336339

Azure Search analyzer is not matching other word tense 40680445

Azure Search - issues with Phonetic Analyzer 45155778

Azure Cognitive Search standard Lucene analyzer wildcard and fuzzy search issues 61937261

Note: The provided code snippets are examples and may need to be adapted to the specific use case and requirements.
