20. File Search Query Language
- Status: accepted
- Deciders: @butonic, @micbar, @dragotin, @c0rby, @kulmann, @felix-schwarz, @JammingBen
- Date: 2023-06-23
From the user's perspective, the interface to search is just a single form field where the user enters one or more search terms. The minimum expectation is that the search returns file names and links to files that:
- have a file name that contains at least one of the search terms
- contain at least one of the search terms in the file contents
- have metadata that equals or contains one of the search terms
- The standard user should not be bothered by a query syntax
- The power user should also be able to narrow their search with an efficient and flexible syntax
- We need to consider different backend technologies which we need to access through an abstraction layer
- Using different indexing systems should lead to a slightly different feature set without changing the syntax completely
- KQL - Keyword Query Language
- Simple Query
- Lucene Query Language
- Solr Query Language
- Elasticsearch Query Language
Chosen option: KQL - Keyword Query Language, because it enables advanced search across all platforms.
- We can use the same query language in all clients
- We need to build and maintain a backend connector
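The connector mentioned above could take roughly the following shape: the query is parsed once into backend-independent nodes, and each search backend gets its own translation. This is a minimal sketch only. The `Node` type, the `Connector` interface and `QueryStringConnector` are hypothetical names introduced for illustration, and the Lucene-style output is just one possible target format.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is a backend-independent query element: a property restriction
// when Property is set, otherwise a free-text term. (Hypothetical type.)
type Node struct {
	Property string
	Value    string
}

// Connector translates parsed query nodes into the native query of one
// search backend. (Hypothetical interface, not an existing API.)
type Connector interface {
	Compile(nodes []Node) string
}

// QueryStringConnector renders nodes as a Lucene-style query string and
// joins all elements with AND.
type QueryStringConnector struct{}

// quoteEscaper escapes backslashes and double quotes inside a quoted value.
var quoteEscaper = strings.NewReplacer(`\`, `\\`, `"`, `\"`)

// Compile quotes every value and prefixes it with its property, if any.
func (QueryStringConnector) Compile(nodes []Node) string {
	parts := make([]string, 0, len(nodes))
	for _, n := range nodes {
		v := `"` + quoteEscaper.Replace(n.Value) + `"`
		if n.Property != "" {
			parts = append(parts, n.Property+":"+v)
		} else {
			parts = append(parts, v)
		}
	}
	return strings.Join(parts, " AND ")
}

func main() {
	var c Connector = QueryStringConnector{}
	fmt.Println(c.Compile([]Node{
		{Property: "name", Value: "report"},
		{Value: "budget"},
	}))
	// prints: name:"report" AND "budget"
}
```

A second backend would implement the same interface with a different output, so clients keep sending the same query language.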
The Keyword Query Language (KQL) is used by Microsoft SharePoint and other Microsoft services. It uses very simple query elements, property restrictions and operators.
- Good, because we can fulfill all our current needs
- Good, because it is very similar to the query language used in iOS
- Good, because it supports date time keywords like “today”, “this week” and more
- Good, because it can be easily extended with “shortcuts”, e.g. for document types like :presentation, which combine multiple mime types
- Good, because it is successfully implemented and used in similar use cases
- Good, because it gives our clients the freedom to always use the same query language across all platforms
- Good, because the Microsoft Graph API uses it, which should make a future transition easy
- Bad, because we need to build and maintain a connector to different search backends (bleve, elasticsearch or others)
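To make the query elements concrete, the sketch below handles a very small subset of such a syntax: whitespace-separated free-text terms, `property:value` restrictions, and a shortcut written as `:presentation` that expands to several media types. It is not the actual parser. The `Restriction` type, the `ParseSimple` function, the `mediatype` property name and the contents of the `shortcuts` table are assumptions made for illustration, and quoting, grouping and boolean operators are left out.

```go
package main

import (
	"fmt"
	"strings"
)

// Restriction is one parsed query element. An empty Property means a
// free-text term that is not bound to a specific field.
type Restriction struct {
	Property string
	Value    string
}

// shortcuts maps a hypothetical shortcut keyword to the media types it
// stands for. The table content is illustrative only.
var shortcuts = map[string][]string{
	"presentation": {
		"application/vnd.oasis.opendocument.presentation",
		"application/vnd.ms-powerpoint",
		"application/vnd.openxmlformats-officedocument.presentationml.presentation",
	},
}

// ParseSimple splits a query on whitespace and classifies each token.
// It does not handle quoting, grouping or boolean operators.
func ParseSimple(query string) []Restriction {
	var out []Restriction
	for _, tok := range strings.Fields(query) {
		// A leading colon marks a shortcut such as ":presentation".
		// Unknown shortcuts are silently dropped in this sketch.
		if strings.HasPrefix(tok, ":") {
			for _, mt := range shortcuts[strings.TrimPrefix(tok, ":")] {
				out = append(out, Restriction{Property: "mediatype", Value: mt})
			}
			continue
		}
		// "property:value" is a property restriction.
		if prop, val, ok := strings.Cut(tok, ":"); ok && prop != "" && val != "" {
			out = append(out, Restriction{Property: prop, Value: val})
			continue
		}
		// Everything else is treated as a free-text term.
		out = append(out, Restriction{Value: tok})
	}
	return out
}

func main() {
	for _, r := range ParseSimple("budget name:report :presentation") {
		fmt.Printf("property=%q value=%q\n", r.Property, r.Value)
	}
}
```

Keeping the shortcut table on the server side means clients can send the same short query everywhere, while the list of media types behind a shortcut can change without touching the clients.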
Implement a very simple search approach: Return all files which contain at least one of the keywords in their name, path, alias or selected metadata.
- Good, because that covers 80% of the users' needs
- Good, because it is very straightforward
- Good, because it is a suitable solution for GA
- Bad, because it is below the industry standard
- Bad, because it only provides one search query
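The simple approach described above can be sketched in a few lines. The `File` struct is a hypothetical stand-in for an indexed resource, and matching case-insensitively on substrings is an assumption, since the option does not say how keywords are compared. Selected metadata is omitted for brevity.

```go
package main

import (
	"fmt"
	"strings"
)

// File holds the fields the simple search looks at. The struct is a
// hypothetical stand-in for an indexed resource.
type File struct {
	Name  string
	Path  string
	Alias string
}

// MatchesAny reports whether at least one keyword occurs,
// case-insensitively, in the name, path or alias of f.
func MatchesAny(f File, keywords []string) bool {
	// Keywords never contain whitespace, so the newline separator
	// prevents a match from spanning two fields.
	haystack := strings.ToLower(f.Name + "\n" + f.Path + "\n" + f.Alias)
	for _, kw := range keywords {
		if strings.Contains(haystack, strings.ToLower(kw)) {
			return true
		}
	}
	return false
}

// SimpleSearch returns all files that match at least one of the
// whitespace-separated keywords in query.
func SimpleSearch(files []File, query string) []File {
	keywords := strings.Fields(query)
	var hits []File
	for _, f := range files {
		if MatchesAny(f, keywords) {
			hits = append(hits, f)
		}
	}
	return hits
}

func main() {
	files := []File{
		{Name: "Report.pdf", Path: "/docs/Report.pdf"},
		{Name: "notes.txt", Path: "/home/notes.txt", Alias: "todo"},
	}
	for _, f := range SimpleSearch(files, "report todo") {
		fmt.Println(f.Path)
	}
}
```

The sketch also shows the limitation named in the list above: there is only one kind of query, with no way to restrict a keyword to a particular field.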
The Lucene Query Parser syntax supports advanced queries like term, phrase, wildcard, fuzzy search, proximity search, regular expressions, boosting, boolean operators and grouping. It is a well-known query syntax used by the Apache Lucene project. Popular platforms like Wikipedia use Lucene or Solr, a search server built on top of Lucene.
- Good, because it is a well documented and powerful syntax
- Good, because it is very close to the Elasticsearch and the Solr syntax which enhances compatibility
- Bad, because there is no powerful and well-tested query parser available for Go
- Bad, because it adds complexity and fulfilling all the different query use-cases can be an “uphill battle”
Solr is highly reliable, scalable and fault-tolerant, providing distributed indexing, replication and load-balanced querying, automated failover and recovery, centralized configuration and more. Solr powers the search and navigation features of many of the world’s largest internet sites.
- Good, because it is a well documented and powerful syntax
- Good, because it is very close to the Elasticsearch and the Lucene syntax which enhances compatibility
- Good, because it has a strong community with large resources and knowledge
- Bad, because it adds complexity and fulfilling all the different query use-cases can be an “uphill battle”
Elasticsearch provides a full Query DSL (Domain Specific Language) based on JSON to define queries. Think of the Query DSL as an AST (Abstract Syntax Tree) of queries, consisting of two types of clauses: leaf query clauses and compound query clauses. It is able to combine multiple query types into compound queries. Like Solr, it is built on top of Lucene.
- Good, because it is a well documented and powerful syntax
- Good, because it is very close to the Lucene and the Solr syntax which enhances compatibility
- Good, because there is a stable and well-tested Go client which brings a query builder
- Good, because it could be used as the query language which supports different search backends by just implementing what is needed for our use-case
- Bad, because it adds complexity and fulfilling all the different query use-cases can be an “uphill battle”
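As an illustration of the JSON-based Query DSL, the sketch below builds a compound `bool` query from two `match` leaf clauses using only the Go standard library. The field names `name` and `content` and the function `BuildQuery` are assumptions for this example and do not describe an existing index mapping or client API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// BuildQuery returns a Query DSL document that matches resources whose
// name or content matches the given term. The field names "name" and
// "content" are assumed for illustration.
func BuildQuery(term string) map[string]any {
	return map[string]any{
		"query": map[string]any{
			// Compound clause: combines the leaf clauses below.
			"bool": map[string]any{
				"should": []any{
					// Leaf clauses: each looks for the term in one field.
					map[string]any{"match": map[string]any{"name": term}},
					map[string]any{"match": map[string]any{"content": term}},
				},
				// At least one of the should clauses has to match.
				"minimum_should_match": 1,
			},
		},
	}
}

func main() {
	body, err := json.MarshalIndent(BuildQuery("report"), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

The nesting of the maps mirrors the tree structure of the query, which is why the DSL reads like an AST.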