[RFC] What filtered search algorithms should DiskANN support?#1128
Pull request overview
Adds an RFC documenting an empirical evaluation of DiskANN filtered-search algorithm variants, with a recommendation for which algorithms the repo should support going forward.
Changes:
- Introduces a new RFC describing existing and proposed filtered-search algorithms (inline, beta, multi-hop, two-queue, adaptive-L).
- Records benchmark results on two datasets (Caselaw and YFCC) with accompanying plots.
- Proposes a path forward: add inline (optionally adaptive-L), retain multi-hop, deprecate beta, and close two-queue.
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## main #1128 +/- ##
==========================================
+ Coverage 88.87% 89.45% +0.57%
==========================================
Files 485 484 -1
Lines 92112 91407 -705
==========================================
- Hits 81868 81767 -101
+ Misses 10244 9640 -604
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
…soft/DiskANN into users/magdalen/filtered_search_rfc
Could this be in the wiki? The plots are quite a large addition to the repo.
Do we want an RFC that points to benchmarks in the wiki then? Because folks were pretty clear that they wanted an RFC on filtered search.
I have moved the benchmark discussion section to the DiskANN wiki: https://github.com/microsoft/DiskANN/wiki/Evaluation-of-Filtered-Search-Algorithms
arrayka left a comment
LGTM. Approving, assuming the QPS and latency comparison between Adaptive L and the beta filter for disk search scenarios looks promising.
Thanks for the great work. The RFC recommendations around deprecating beta filter, adding inline filtered search, and adding adaptive L search make sense to me.
One follow-up implementation question is how to expose these similar, overlapping algorithms at the right level of granularity, so that clients can reuse them through providers, with or without their own query planners, depending on their use cases.
This PR implements the recommendation in the [filtered search RFC](#1128) to add inline filtering, with the adaptive L method as an optional addition.

Co-authored-by: qingcha chen <qinchen@microsoft.com>
Co-authored-by: Magdalen Manohar <magdalen@magdalen.localdomain>
Co-authored-by: Magdalen Manohar <mmanohar@microsoft.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Mark Hildebrand <hildebrandmw@gmail.com>
Co-authored-by: Mark Hildebrand <mhildebrand@microsoft.com>
This PR adds an RFC on which filtered search algorithms the DiskANN repo should support.