I run a web resource in which we discuss Japanese reading material. Discussions are mostly in English, but of course contain Japanese words and phrases under discussion. As a result, much of the text content that I want to search over using Azure AI Search contains a mixture of Japanese and English, within the same field. For example, a user might ask a question like: What does the へと mean in a sentence like スーパーへと走った?

I see in the Azure AI documentation that the case of having different fields in different languages is handled by attaching language-specific analyzers to each field. For example, a French language analyzer would be assigned to the French product description and an English language analyzer would be assigned to the English product description. But I don't see anything that talks about intelligently handling a mixture of languages within the same field.

The Standard Lucene analyzer does a so-so job, but one challenge with Japanese is that many words can be represented in multiple ways. For example, in the sentence above, a user who is unfamiliar with kanji might write the last word はしった instead of 走った. The Lucene standard analyzer isn't smart enough to handle this.
The suggested solution is to use the Microsoft English analyzer (en.microsoft), which handles inflected and irregular word forms much better than the standard Lucene analyzer. For a field that contains a mixture of Japanese and English, a single field with the Microsoft English analyzer can be used. An alternative approach is to create separate fields, each with its own language-specific analyzer, and use the Text Translation cognitive skill to produce the text for each field.
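The two-field alternative can be sketched as plain REST-style definitions built in Python. This is only an illustration under stated assumptions: the index name "discussions" and the field names "content" and "content_ja" are hypothetical, the analyzer names en.microsoft and ja.microsoft and the Text Translation skill type are the ones Azure AI Search documents, and an indexer output field mapping (not shown) would still be needed to route the translated text into the second field.

```python
import json


def text_field(name: str, analyzer: str) -> dict:
    """Searchable string field bound to one language analyzer."""
    return {
        "name": name,
        "type": "Edm.String",
        "searchable": True,
        "analyzer": analyzer,
    }


def build_index(index_name: str) -> dict:
    """Index definition with one field per analyzer (field names are hypothetical)."""
    return {
        "name": index_name,
        "fields": [
            {"name": "id", "type": "Edm.String", "key": True},
            # Original mixed-language text, analyzed with the English analyzer.
            text_field("content", "en.microsoft"),
            # Translated copy, analyzed with the Japanese analyzer.
            text_field("content_ja", "ja.microsoft"),
        ],
    }


def build_translation_skill(source_field: str, target_name: str, to_language: str) -> dict:
    """Text Translation skill that reads one document field and emits a translation."""
    return {
        "@odata.type": "#Microsoft.Skills.Text.TranslationSkill",
        "context": "/document",
        "defaultToLanguageCode": to_language,
        "inputs": [{"name": "text", "source": f"/document/{source_field}"}],
        "outputs": [{"name": "translatedText", "targetName": target_name}],
    }


if __name__ == "__main__":
    # Print the payloads that would be sent to the index and skillset endpoints.
    print(json.dumps(build_index("discussions"), ensure_ascii=False, indent=2))
    print(json.dumps(build_translation_skill("content", "content_ja", "ja"), indent=2))
```

A query can then target one or both fields through the searchFields parameter. Whether the Japanese analyzer matches はしった against 走った is not something this sketch establishes, so it is worth checking against real data.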