Taco Model
A taco festival, personal rating, Yelp data, machine learning, and finding the best tacos.
A taco festival is a wonderful time to explore all kinds of tacos and make magical memories with people. Cherished memories are great, sure, but one can’t ignore that it’s a festival of critical data - taco data.
Many taco restaurants (or restaurants that serve tacos) were represented at our 2017 Taco Festival, but many were not. My goal was to sort the tacos into those I liked and those I did not, then use the comments people left for these restaurants on Yelp to train a model that predicts whether I would like taco restaurants that were not at the event, based on their Yelp reviews.
Why did I think this was reasonable? First, Yelp ratings alone are trash garbage. Sometimes you see a Taco Bell with a 5-star rating because it has clean bathrooms and is near the highway. When I look at reviews of places I like, I notice that people talk about them differently. They mention lots of specifics and compare the place to other places. The reviewer may mention a tradition of coming to this place. There are lots of tells, but they’re hard to describe.
Training
I began with the taco places I liked most at the festival, plus some I liked outside it:
- Maskadores Taco Shop
- Backyard Taco
- Mr Mesquite
- Julia’s Mesquite Mexican and Burgers
- Tacos Huicho
- Taco Guild
- El Chavo Mexican Restaurant
- Chico Malo
Several places I liked at the festival weren’t on Yelp at the time, so I had to drop them from my analysis, and now I’ve forgotten who they were. The next step was to construct my negative list:
- Filiberto’s Mexican Food
- Rubio’s
- Del Taco
- Taco Bell
- Chipotle Mexican Grill
- Ni De Aqui Ni De Alla
Yelp has an API, and I was careful not to exceed its rate limit. Using it, I wrote a script that searched a grid over the greater Phoenix area for taco places and found 373 of them. For the review text I built a simple bag-of-words model with the run-of-the-mill Porter stemmer, but instead of single words I counted word tuples (n-grams) of length one through three. Business-level details were included too: the Yelp rating, the price, the number of reviews, and how the restaurant was tagged.
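A minimal sketch of the review featurization described above, assuming each restaurant’s reviews are available as plain strings. The `crude_stem` helper is a hypothetical stand-in that only strips a few suffixes; it is not the Porter algorithm, and in practice you would swap in a real implementation such as NLTK’s `PorterStemmer`. All function names here are illustrative, not the original script.

```python
import re
from collections import Counter


def crude_stem(word):
    """Hypothetical stand-in for the Porter stemmer: strips a few suffixes.

    NOT the Porter algorithm; swap in a real stemmer (e.g. NLTK's PorterStemmer).
    """
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least three characters remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def ngram_counts(text, n_max=3):
    """Count stemmed word n-grams of length 1..n_max in one review."""
    words = [crude_stem(w) for w in re.findall(r"[a-z']+", text.lower())]
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i : i + n])] += 1
    return counts


def restaurant_features(reviews, n_max=3):
    """Sum n-gram counts over all reviews of one restaurant (one sparse row)."""
    total = Counter()
    for review in reviews:
        total.update(ngram_counts(review, n_max))
    return total
```

Summing counts per restaurant turns each place into one sparse row whose keys include multi-word phrases such as “we came back”, which is what lets those phrases surface as features in the model. The business-level details (rating, price, review count, tags) would be added to the same row as extra keys.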
Model
I fit a random forest model to this large collection of features and was surprised by some of what came up as important. First, the unsurprising parts: good taco restaurants had higher ratings, more reviews, and higher prices. I didn’t realize it at the time, but I was really talking about street tacos, and the token “street” came up as important. Two tokens really stood out to me as signs of a good place:
- Seeing “new favorite” suggests to me that a person is really exploring and evaluating.
- “[W]e came back” suggests that the person wrote the review after a second visit because they really enjoyed the first and wanted to spread the word.
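To make the modeling step concrete, here is a toy random forest in plain Python. It is a sketch under stated assumptions, not the model actually used (a real project would more likely use a library such as scikit-learn). Each restaurant is assumed to be a sparse dict of token counts labeled 1 (liked) or 0 (disliked). Every tree is grown on a bootstrap sample, considers a random subset of tokens at each split, and splits on whether a token is present. Importance is the total Gini impurity decrease credited to each token, and the predicted score is the average leaf fraction of liked places across trees. All function names and defaults are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def build_tree(rows, labels, n_feats, depth, rng, importance):
    """Grow one tree (hypothetical helper); splits on token presence."""
    if depth == 0 or len(set(labels)) <= 1:
        return sum(labels) / len(labels)  # leaf: fraction of liked places
    features = sorted({f for r in rows for f in r})
    parent = gini(labels) * len(labels)
    best = None
    # Random forests consider only a random subset of features at each split.
    for f in rng.sample(features, min(n_feats, len(features))):
        left = [i for i, r in enumerate(rows) if r.get(f, 0) > 0]
        right = [i for i, r in enumerate(rows) if r.get(f, 0) <= 0]
        if not left or not right:
            continue
        child = (gini([labels[i] for i in left]) * len(left)
                 + gini([labels[i] for i in right]) * len(right))
        if best is None or parent - child > best[0]:
            best = (parent - child, f, left, right)
    if best is None or best[0] <= 0:
        return sum(labels) / len(labels)
    gain, f, left, right = best
    importance[f] += gain  # impurity-decrease importance, unnormalized
    return (
        f,
        build_tree([rows[i] for i in left], [labels[i] for i in left],
                   n_feats, depth - 1, rng, importance),
        build_tree([rows[i] for i in right], [labels[i] for i in right],
                   n_feats, depth - 1, rng, importance),
    )


def predict_tree(node, row):
    """Walk a tree: internal nodes are (token, present, absent) tuples."""
    while isinstance(node, tuple):
        f, present, absent = node
        node = present if row.get(f, 0) > 0 else absent
    return node


def fit_forest(rows, labels, n_trees=100, depth=3, n_feats=None, seed=0):
    """Bagged trees with per-split feature sampling; returns (trees, importance)."""
    rng = random.Random(seed)
    vocab = {f for r in rows for f in r}
    n_feats = n_feats or max(1, int(len(vocab) ** 0.5))
    importance = Counter()
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample
        trees.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                                n_feats, depth, rng, importance))
    return trees, importance


def predict_proba(trees, row):
    """Average the per-tree leaf fractions into a single score in [0, 1]."""
    return sum(predict_tree(t, row) for t in trees) / len(trees)
```

Calling `importance.most_common(10)` lists the tokens that did the most work separating the classes. Note that this kind of importance says how useful a token is, not which class it points to, so a token like “worst” can rank highly while marking bad places.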
Reviews of good places had tokens that I think describe what the people were doing at the time. These are things like “shopping”, “house hunting”, or being “out with a friend”. These reviews tended to note the atmosphere of the place.
Additional tokens for the good places were:
- Travel
- Authentic
- Food was fresh
The taco places I ranked negatively had a wider range of ratings (much more variability in score). They were also more likely to be tagged as Salad, Fast Food, Tex-Mex, or Seafood. Some of the tokens that were important for the bad restaurants were:
- Location
- Clean
- Night
- Cashier
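The rating-spread signal can be captured with two per-restaurant numbers: the range (highest minus lowest star rating) and the population standard deviation. This sketch assumes you have the star rating of each individual review for a restaurant; the helper name is hypothetical.

```python
from statistics import pstdev


def rating_spread(stars):
    """Hypothetical feature: (range, population std dev) of review star ratings."""
    if not stars:
        return 0, 0.0  # no reviews: report no spread
    return max(stars) - min(stars), pstdev(stars)
```

A place whose reviews are all 4s and 5s gets a small spread, while a place drawing both 1-star and 5-star reviews gets a large one, which is the pattern the disliked places showed.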
One difference I found very interesting was in how people described their emotional response to the tacos. Good places had “great”, “wow”, and “amazing”, but bad places had “love”, “good”, and “worst”. Finding “worst” for the bad places is not surprising, but “love” certainly is.
Another difference I found very interesting: the token “never” was important for good places, but “always” and “usually” were important for bad places. I’m not sure why this would be. I imagine someone who had never had some item tried it at a good place and really liked it. Or perhaps they were just trying a place they’d never been to. I can’t imagine someone leaving a review for Taco Bell that starts, “I had never been to the 32nd St. location but let me tell you…”
The combination of “never” with “great”, “wow”, and “amazing” suggests to me that people feel like they struck gold and discovered something. Indeed, good places are often described as a “gem”. Contrast that with people getting what they always get, and love, from places I don’t really like. Perhaps it’s tautological that, in looking for a new place that would really wow me, I find that the reviews of people who looked for new places and were wowed matter most.
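Random forest importance does not say which class a token favors, so checking the direction of tokens like “never” and “always” takes a separate comparison. One simple option, sketched here as an assumption and not necessarily how it was done for this post, is a smoothed log-ratio of how often each token appears in reviews of liked versus disliked places, counting each token at most once per restaurant. The function name is hypothetical.

```python
import math
from collections import Counter


def token_leaning(docs, labels):
    """Hypothetical helper: smoothed log-ratio of per-class document frequency.

    docs: one iterable of tokens per restaurant; labels: 1 = liked, 0 = disliked.
    Positive scores lean toward liked places, negative toward disliked ones.
    """
    df = {0: Counter(), 1: Counter()}
    n = Counter(labels)
    for tokens, y in zip(docs, labels):
        df[y].update(set(tokens))  # count each token once per restaurant
    vocab = set(df[0]) | set(df[1])
    # Add-one smoothing keeps tokens seen in only one class finite.
    return {
        t: math.log((df[1][t] + 1) / (n[1] + 2))
        - math.log((df[0][t] + 1) / (n[0] + 2))
        for t in vocab
    }
```

With this scoring, a token like “never” that shows up mostly for liked places gets a positive score, and “always” gets a negative one.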
Predictions
The model’s predictions included some good hits and one weird one.
- Ta’Carbon had the highest predictive ranking and certainly earned it. It was really good.
- El Chavo was also very good.
- CRUjente was odd. They have a Korean kimchi taco that was very good, but it’s not really a taco place in the same way as the others.
- Tortas Ahogadas Guadalajara - I didn’t get around to visiting this one.
- Isabel’s Amor - I didn’t get around to visiting this one.
- Joyride Taco House - This had a predictive ranking of 0.5765, and to be honest I don’t think there’s anything very special about Joyride, so I stopped there.
Some places I love, but whose tacos I haven’t tried, had high predictive rankings, though below Joyride’s, so an honorable mention goes to SumaMaya.
Extensions and Future Work
When I told my friends at work about the Taco model, several became excited and demanded a pizza model. Pizza, it turns out, is far more difficult. First, there aren’t pizza festivals, so I had to rely on Robert’s personal ratings of places. Second, there are lots of ways for pizza to be good: New York style, Chicago style, artisanal flatbread weirdness. It’s hard to pin down what good means. One place our early model picked out was full of rave reviews because it served gluten-free pizza. It was objectively awful pizza, but probably the best pizza without gluten at the time.
Anyway, I think I’m done with taco modeling for now. One of you deep learning nerds out there should make some incredible taco model to “make the world a better place” or something. Edgar out!