11/30/2017 Update: We have learned more information since we originally published this article. Please see the bottom for updates.
Over the Thanksgiving break we received a number of questions about how exactly SmartScan works and whether it uses Amazon Mechanical Turk, so I wanted to respond directly with the facts, starting by putting some concerns to rest:
What Didn’t Happen
There was no breach of data. No user data was seen by anybody who hasn’t accepted a binding and enforceable confidentiality agreement. No paying customer after 2013 has had their receipt processed by a Mechanical Turk worker.
What Did Happen
Here’s a detailed breakdown of the company’s history with Amazon Mechanical Turk, from the start to today:
- 2009 – Expensify launches SmartScan, initially built with Amazon Mechanical Turk.
- 2012 – Expensify moves away from Mechanical Turk and begins using a private workforce of non-Turk SmartScan agents.
- Sep 20, 2017 – Expensify begins live testing of Private SmartScan, a system that allows large companies with complex compliance requirements to use the Mechanical Turk platform with their own finance team. This initial testing only uses Expensify employee receipts.
- Sep 26, 2017 – To simplify obtaining product feedback, Expensify temporarily restricts access to a hand-picked selection of our most seasoned SmartScan agents who signed up for Turk accounts to test.
- Nov 15, 2017 – We begin processing 10% of non-paying user receipts that require human review using the Mechanical Turk platform, but with access still limited to only our private SmartScan agents (who signed up for a Turk account and were approved).
- Nov 22, 2017 at 9:26am – Access to the jobs is restored to vetted Turk workers, alongside our existing private SmartScan agents.
- Nov 23, 2017 (Thanksgiving Day) at 9:54am – In response to community concern we stop the test and return all volume to our private SmartScan agents.
What This Means
In short, fewer than 0.00004% of users, none of whom are paying customers, had a receipt processed by a vetted Turk worker rather than one of our own private SmartScan agents.
Why This Caused Concern
I think the main concern was simply due to us not providing the information above in a timely fashion. Our team was away with our families and trying hard to be responsive, while also making the most of a rare opportunity to be with our loved ones. Accordingly, the vacuum of information left by the company was filled with a variety of well-intentioned but inaccurate theories that generated a series of compounding, exaggerated fears. As a family-friendly business we try hard to separate work life from home life, and in this case that separation came at a substantial cost.
Given the timeline above, however, I believe the remaining concern relates to the roughly 24-hour period during which some non-paying users’ receipt data was processed by vetted Turk workers. In particular, I think the term “vetted Turk worker” is misunderstood, and that misunderstanding is the source of most of the concern and confusion.
How Amazon Vets Mechanical Turk Workers
I think there is a general perception that Amazon Mechanical Turk is a wild west where anybody can sign up and anonymously do work without any vetting, oversight, or controls. This couldn’t be further from the truth.
- To start, Mechanical Turk workers need to sign up using their personal Amazon accounts, and Amazon has made it clear that a long history of account activity is required, resulting in many workers’ applications being denied. Indeed, we experienced this problem first hand when many of the SmartScan agents we’ve worked with for years were rejected by Amazon due to a lack of purchase history.
- This purchase history requirement might seem a small detail, but consider what it means: that Amazon has a record of not just billing but also shipping addresses, authenticated by live credit card purchases. Faking that would be an extraordinary effort requiring years of planning.
- But that is only the start. After that you must also connect a bank account for reimbursement, which makes sense, as workers need some way to get paid. Because Turk workers are not Amazon employees, they are independent contractors, and by facilitating payments between “requesters” (e.g., us) and “workers”, Amazon Mechanical Turk acts as a payment processor. This is a heavily regulated environment that Expensify knows well, because we are in the same boat: we facilitate reimbursements from employers to employees.
- That regulation is significant not just because it’s a burden, but because that burden requires the processor to do a series of checks on everybody involved. This starts with a “Know Your Customer / Customer Identification Program” (KYC/CIP) check that just verifies all of the identifying information matches up (and because Amazon has so much of it, this check is incredibly strong).
- Once a worker is authoritatively identified, the processor then performs a series of other checks under the umbrella of OFAC and Anti-Money Laundering (AML). These checks involve scanning criminal records for any sign of fraud or financial crime conviction, public records, international databases, and a whole range of other sources to create a collective “trust” score, with anybody under a certain threshold being denied.
The upshot of all this is that becoming a Turk worker isn’t hard for a normal person. It’s no surprise that most people can sign up and become Turk workers easily, because all those checks come up clear for nearly everybody.
But the important thing is that these checks are extremely difficult to pass if there is any indication that you shouldn’t be trusted. Accordingly, the only workers eligible to process Expensify receipts were those who passed Amazon’s screening process, a process that is far more rigorous than people generally give it credit for.
How Expensify Vets Mechanical Turk Workers
But we don’t just stop there. Once approved by Turk, you enter our SmartScan system as a new agent. At this point we don’t know anything about your quality, so we begin testing you with sample receipts (i.e., receipts that we already know the answers to). Failure to process them at high quality means you are banned from the system. Accordingly, the only way to continue to obtain access to more receipts is to correctly process these historical receipts, a very tedious task that we know is quite difficult. Therefore, even after passing Turk’s screening, there is no way to simply “scroll through” a bunch of receipts without investing continuous, heavy effort.
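To make the idea concrete, here is a minimal Python sketch of screening a new agent against receipts with known answers. It is an illustration only, not a description of the actual SmartScan system: the names `KNOWN_ANSWERS` and `screen_agent`, the receipt fields, and the 0.95 accuracy threshold are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical sample receipts whose correct transcription is already known.
KNOWN_ANSWERS = {
    "receipt-001": {"merchant": "Cafe Uno", "amount": "12.50"},
    "receipt-002": {"merchant": "City Taxi", "amount": "31.00"},
    "receipt-003": {"merchant": "Office Depot", "amount": "8.75"},
}


@dataclass
class ScreeningResult:
    accuracy: float
    banned: bool


def screen_agent(submissions, known_answers=KNOWN_ANSWERS, min_accuracy=0.95):
    """Grade an agent's transcriptions against receipts with known answers.

    `min_accuracy` is an assumed threshold chosen for illustration only.
    A receipt counts as correct only if every field matches exactly;
    a missing submission counts as incorrect.
    """
    if not known_answers:
        raise ValueError("need at least one sample receipt")
    correct = sum(
        1
        for receipt_id, expected in known_answers.items()
        if submissions.get(receipt_id) == expected
    )
    accuracy = correct / len(known_answers)
    return ScreeningResult(accuracy=accuracy, banned=accuracy < min_accuracy)
```

In this sketch an agent who transcribes only one of the three sample receipts correctly falls below the assumed threshold and is flagged as banned, which mirrors the point above: access to further receipts depends on sustained, accurate work.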
How Expensify Protects Users
To repeat, there was no breach here. From the moment you apply to become a Turk worker you are bound by its confidentiality clauses, and nobody outside of a confidentiality agreement has seen anything. But it raises the question: how enforceable are those clauses? The simple answer is: very.
This might be surprising given Amazon Mechanical Turk’s reputation as an anonymous wild west. But never forget: Amazon knows precisely who each worker is, to an incredible degree of confidence. And Expensify knows exactly which workers have seen which data. Accordingly, in any kind of breach (e.g., where user data is revealed to the public in violation of Amazon’s and Expensify’s agreements), it is a straightforward matter to look up the corresponding workerIDs.
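As a rough sketch of what such a lookup could look like, the Python below keeps a record of which worker viewed which receipt and answers the question "who saw any of these receipts?". The `ViewLog` class and its method names are hypothetical and are not taken from Expensify’s or Amazon’s systems.

```python
from collections import defaultdict


class ViewLog:
    """Hypothetical audit log recording which worker saw which receipt."""

    def __init__(self):
        # Maps a receipt ID to the set of worker IDs that viewed it.
        self._workers_by_receipt = defaultdict(set)

    def record_view(self, worker_id, receipt_id):
        """Record that a worker was shown a receipt."""
        self._workers_by_receipt[receipt_id].add(worker_id)

    def workers_who_saw(self, receipt_ids):
        """Return every worker ID that viewed any of the given receipts."""
        seen = set()
        for receipt_id in receipt_ids:
            # .get avoids creating empty entries for unknown receipts.
            seen |= self._workers_by_receipt.get(receipt_id, set())
        return seen
```

Given the IDs of the receipts involved in an incident, `workers_who_saw` returns the set of worker IDs to follow up on.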
Then what, you might ask? Well, it depends on the circumstance. At the very least, we can block the worker instantly using an API call, ensuring they can’t see any additional data. (And because it’s so difficult to become a Turk worker in the first place, they can’t just create a new account and try again.) But in the event of a serious breach, we would simply ask Amazon to reveal their identities or, failing that, ask a court to grant a subpoena compelling Amazon to do so. Amazon has nothing to gain by hiding this information, and indeed will already reveal full identifying information when required for tax filings. Nobody gains by keeping this a secret, so the more important it is to obtain their identity, the easier it is to obtain.
To be clear, we can keep diving deeper and deeper into more exotic scenarios. And yes, a Jason Bourne level spy could carefully curate a fake Amazon account with a fake identity over the course of years, after having already fooled a bank into issuing him a fraudulent credit card under that fake identity and a bank account matching that fake address, and then carefully transcribe randomly issued receipts one at a time. But to what end? And this is where it’s important to bring some context to the discussion, as we’ve been approaching this entirely in the abstract.
Given the extremely high amount of time and effort it would take to gradually accumulate enough information for some unstated purpose, why would anybody actually do this? The amount of cash they earn by transcribing our receipts would likely exceed whatever the supposed black-market value of the receipt data might be, especially since the market value of these images is effectively zero. The information on a receipt is already strictly controlled by the Payment Card Industry Data Security Standard (PCI-DSS), and is designed to provide nothing of value. That’s why receipts are so commonly thrown out: because they are literally garbage. Indeed, the fastest way to get a bunch of receipts is to just find a trash can in front of a merchant you care about, and then steal the trash. That is a far faster, less traceable way to get a much larger number of receipts. But we are comfortable throwing out our receipts precisely because we recognize there is no value in someone actually doing that.
But let’s get even more extreme and suggest that this person was patiently waiting for some specific data — and to bring it back to the original example that triggered a lot of this discussion, the idea of an Uber receipt containing personal data such as your home address.
Why would someone hell-bent on getting random home address data use this particular method of doing it? Why not just become an Uber driver? Or even just become a pizza delivery person, or look in the phone book? Then you get a ton of actual personal addresses (for some still unstated purpose) without going through the enormous effort to become one of our identifiable agents, and without exposing yourself to the incredible risk of being caught by all of our extensive logging and worker tracking.
At the end of the day, SmartScan is an optional feature. It is an extremely powerful, useful, and convenient feature, but it does come with the recognition that sometimes a human might see your receipt. But we accept this in so many areas of our lives without hesitation, so why the sudden concern here?
Indeed, anybody concerned by the real-world risks of a vetted, tested transcriptionist reading their Uber receipt should probably consider the vastly more immediate and life-threatening consequences of getting into that stranger’s car in the first place.
Life is not without risks. And our job isn’t to make perfect guarantees about the absence of risk — everybody knows that is impossible. Rather, our job is to make you aware of the risks — as infinitesimal as they might be — and put you in control of whether or not to take them and reap the incredible productivity and quality of life benefits they enable.
What Happens Next
Needless to say, we were caught off guard by the sheer scale of concern raised by dipping our toe into the Mechanical Turk waters after so many years of absence, and we found the water to be very, very cold. We have already pulled our toe back out and have no active Mechanical Turk jobs at this time. But in light of all this public concern we’re going to:
- Talk openly with any concerned party to understand the exact nature of the concern and resolve it. This is complicated by the tremendous amount of misinformation circulating about how Mechanical Turk works and the degree to which we’ve used it, so this blog post is the first step in hopefully mitigating that.
- Improve our documentation about how SmartScan works and what kinds of vetting we employ for the human component. We have been open from the start that humans played a key role, but we haven’t given much more detail than that, so we’ll correct that. Outsourced human transcription has been a staple in the enterprise since the dawn of expense management, but we should have recognized when bringing this enterprise feature to the masses that it was a mistake to assume everyone realized this.
- Keep scanning. This is a critical feature that is core to our real time functionality, and one of the primary reasons people choose Expensify. Any user is, of course, free to skip this feature or even disable it entirely in their accounts, but we stand by its safety and security and encourage you to keep using it.
This has been an exciting few days, and our small test has clearly struck a nerve. Please write firstname.lastname@example.org to share any concerns you have, and we will make sure they are all addressed as soon as we can. Thank you for your patience and understanding, and thank you for trusting SmartScan with your receipts. Yes, some human might look at them. But the whole point of the feature is that they are doing it so that you needn’t.
11/30/17 Update: Further internal investigation has confirmed that, for a period of roughly 24 hours from Wednesday, November 22nd at 9:26am to Thursday, November 23rd at 9:54am (Thanksgiving), a misconfigured task on Amazon Mechanical Turk made it possible for a group of up to 208 receipts (comprising both Expensify employee and free user receipts) to be viewed using the task preview function. Affected users have been alerted and told which specific receipt of theirs was potentially viewable. In the spirit of full transparency, we’ll leave the original blog post unchanged to preserve our understanding of the situation at that moment in time.