You might have read recently that we added merchant auto completion to some parts of our product.
This “simple” addition required some thinking on how people are using our product and what would be the best experience for them.
Like every product change, it is crucial to define how important this feature is and how important the flow supported is. Editing the merchant name of an expense is part of the expense editing flow. While important, this is kind of a secondary flow for us. We are optimizing the usage of our SmartScan and bank import technologies to automate the process of creating expenses in an user account. By using these two technologies in conjunction with Expense Rules, users can automate the vast majority of the expense reporting experience.
Despite that however, the ability to manually change a merchant name is still necessary because:
- SmartScan can fail
- The name of the transaction in the bank account might be terrible
- The user simply needs to create a cash expense.
Having auto completion for this step is not mandatory, but it can make the user experience much smoother. Details like this make the difference between a great experience and an amazing one.
Prepping the Build
With all that in mind, we can only spend a few hours building this auto completion mechanism. Why? We are a small and dynamic team, and we need to aggressively prioritize our development hours to projects with the highest ROI.
The first question we asked ourselves regarding auto completion is the origin of the data. Where does the merchant we are suggesting come from? Are they merchants the user has previously used? Or are we picking our suggestions from all merchants in our system?
1. Using Data From User History vs. All Data History
Basing auto completion on a user’s past expenses runs into issues quickly as a new user with a brand new account. With no historical data, it would be very difficult to find merchants for auto complete. On the other hand, if we pick from all merchants in the existing Expensify database, there is a much higher chance that someone else has bought something from this merchant, so we can easily find that merchant and offer it as a suggestion.
2. Variations – What Should We Do?
Expensify’s database holds millions of different merchants from all over the world and in every business area imaginable. Sadly, this amazing source of data can not be used directly, because for each merchant, there can be a dozen variations due to inconsistent cases, typos, location, or other extra information.
Let’s take the Hilton hotels as an example. We have “Hilton”, “Hilton hotel”, “Hilton hotel San Francisco”, “Helton”, and possibly many other variations. This level of variation can be due to user input but mostly comes from our bank import technology, which allows users to automatically import their transactions coming from their bank account and into Expensify. The same merchant can show up completely different depending on the bank; for example, you might have noticed that merchant names in your bank statements are not exactly nice or accurate, to say the least. We have a few ways to “clean up” the data imported, but the number of possible variations makes it impossible to be perfect.
3. Solving the Variations Issue
Why is this diversity is a problem? Well, if the user starts to type “Hilton,” the suggestions he could get include “Hilton”, “Hilton hotel,” “Hilton hotel San Francisco,” or “Hilton New York,” which are in fact the same exact merchant. What if the user was looking for a merchant called “Hilton BBQ”? It could have appeared immediately if the hotel suggestions were collapsed into one.
To solve that issue, we could just clean our data set by removing the extra suggestion and keep only “Hilton” for the hotel. Problem, we would need to review every single merchant in our database to check if we want to offer a suggestion with it or not. What we can do instead is to build a smarter algorithm that weights the results. The autocompletion will not just be based on the merchant that starts with the same letters, but it will also take into account the frequency/popularity of the merchant. As a result, the most popular merchant will be shown first, and the more letters the user types, the more precise the suggestions become.
In our current implementation, we are using a combination of both: we have extracted and cleaned merchants and we are using their frequency for the suggestions.
Building Out the Feature
With all these parameters, how did we actually build the auto-categorization feature?
So instead, we need an API around that. The API has to be extremely fast to allow us to autocomplete as people are typing. It has to scale with our spiky traffic, should not wake our ops team at night but still be up all the time, etc. To do this, we decided to use Constructor.io, a brand new service offering autocompletion as a service. We send them our data set, and in exchange we get a purely functional, worry-free, API endpoint to get our autocomplete suggestion. And because autocompletion is their business, they offer a feature that we would not have thought such as typo handling.