The Natural Language Search Quality Bar
TRIGGER
Users struggle with complex filter interfaces—figuring out which filter combinations to select requires learning the UI's taxonomy and manually translating their intent into the system's structure, creating friction for even simple queries.
APPROACH
Mercury's engineering team (led by staff engineer Matt Russell) built a natural language search feature for their banking app: users type queries like "Show me how much I spent at Chipotle in the last two years" and the system translates them into structured filters. Input: a natural language query string. Output: a filter combination matching the user's intent (merchant, date range, amount). Given Mercury's high quality bar for financial software, they deployed backend verification systems to check LLM-generated filters against query semantics, implemented fallback paths for when confidence is low, and used a slower rollout cadence than for typical features. The feature entered beta testing after extensive internal validation.
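The pipeline described above can be sketched as follows. This is a minimal illustration, not Mercury's implementation: the filter schema, the `parse_query` stub standing in for the LLM call, and the `CONFIDENCE_THRESHOLD` value are all assumptions.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Hypothetical filter schema; Mercury's real types are not public.
@dataclass
class TransactionFilter:
    merchant: Optional[str] = None
    date_start: Optional[date] = None
    date_end: Optional[date] = None
    min_amount: Optional[float] = None
    max_amount: Optional[float] = None

@dataclass
class ParseResult:
    filters: TransactionFilter
    confidence: float  # model-reported or heuristic score in [0, 1]

CONFIDENCE_THRESHOLD = 0.8  # assumed cutoff; tune against labeled queries

def parse_query(query: str) -> ParseResult:
    """Stand-in for the LLM call that maps a query to filters.

    A real system would call a model here; this stub only handles the
    example query from the case study to illustrate the shape.
    """
    if "chipotle" in query.lower():
        today = date(2025, 1, 1)  # fixed date so the example is deterministic
        return ParseResult(
            TransactionFilter(
                merchant="Chipotle",
                date_start=today - timedelta(days=730),  # "last two years"
                date_end=today,
            ),
            confidence=0.95,
        )
    return ParseResult(TransactionFilter(), confidence=0.1)

def search(query: str) -> dict:
    """Apply generated filters only when the parse clears the confidence
    bar; otherwise fall back to the manual filter UI."""
    result = parse_query(query)
    if result.confidence < CONFIDENCE_THRESHOLD:
        return {"mode": "manual_fallback", "filters": None}
    return {"mode": "nl_filters", "filters": result.filters}
```

The key design point is that the fallback path is part of the contract from day one: a low-confidence parse routes to manual filters rather than showing a possibly-wrong result.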
PATTERN
Users forgive mediocre AI writing but won't tolerate one wrong search result—they can instantly verify search outputs, so errors erode trust fast. For high-stakes domains, build verification and fallback paths before shipping, not after discovering edge cases in production.
✓ WORKS WHEN
- Filter system has clear semantics that can be deterministically verified against the query
- Users have diverse, unpredictable ways of expressing the same filter intent
- Fallback to manual filter selection is acceptable when confidence is low
- Product has high quality bar where incorrect results damage trust (finance, healthcare)
✗ FAILS WHEN
- Filter options are ambiguous or overlapping (verification becomes impossible)
- Users expect the system to also infer what they meant when the query itself is unclear
- Error tolerance is high and users will self-correct (consumer search vs. financial data)
- Query patterns are predictable enough that autocomplete or templates suffice
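The first works-when condition—deterministic verification of filters against query semantics—can be sketched as a grounding check: reject any generated filter whose concrete values are not supported by the query text. The rules and names below are illustrative assumptions, not Mercury's actual checks.

```python
import re

def verify_filters(query: str, filters: dict) -> bool:
    """Reject an LLM-generated filter set unless every concrete value
    it asserts is grounded in the query text."""
    q = query.lower()

    # Merchant must literally appear in the query (no hallucinated merchants).
    merchant = filters.get("merchant")
    if merchant and merchant.lower() not in q:
        return False

    # Date range, if present, must be well-ordered.
    start, end = filters.get("date_start"), filters.get("date_end")
    if start and end and start > end:
        return False

    # Amount bounds must be well-ordered, and any bound should match a
    # dollar amount literally written in the query.
    lo, hi = filters.get("min_amount"), filters.get("max_amount")
    if lo is not None and hi is not None and lo > hi:
        return False
    amounts = [float(m) for m in re.findall(r"\$(\d+(?:\.\d+)?)", query)]
    for bound in (lo, hi):
        if bound is not None and amounts and bound not in amounts:
            return False
    return True
```

Checks like these are only possible because the filter schema has clear semantics; this is also why the first fails-when condition holds—overlapping or ambiguous filter options leave nothing deterministic to verify against.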