How do you classify something like the Chevy El Camino? Is it a truck? A pickup truck? A car? Or something in between?
When my son was two years old, he used to call pickup trucks “George trucks” after his Uncle George, who drove a pickup truck at the time. One day we saw a Chevy El Camino and I asked him, “George truck?” He quickly said, “No, not George truck.”
That got me wondering how he could look at different sizes and configurations of pickup trucks (with or without a cap, with or without a tailgate, two-seater, four-seater) and identify them all as “pickups”, while deciding that the El Camino was not one. The Aristotelian view is that there is an essential quality of a pickup (a light truck with a cab and a bed and possibly a tailgate), and an El Camino does not qualify because it is a car with a bed and tailgate. But one could also employ fuzzy logic: there are degrees of being a pickup truck, so a pickup would have a membership of 1 while an El Camino would be a partial member, perhaps 0.5. Another way to look at it is with a Venn diagram, where the El Camino sits at the intersection of Pickups and Cars.
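To make the contrast between the Aristotelian and fuzzy views concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `Vehicle` class, the two features (`truck_body` and `open_bed`), and the rule that the fuzzy degree is simply the fraction of features present. It is one way to arrive at the 0.5 mentioned above, not a standard fuzzy-logic formula.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vehicle:
    """Hypothetical, deliberately tiny description of a vehicle."""
    name: str
    truck_body: bool  # built on a light-truck body
    open_bed: bool    # has a cargo bed


def is_pickup(v: Vehicle) -> bool:
    """Aristotelian view: a pickup must have every essential feature."""
    return v.truck_body and v.open_bed


def pickup_degree(v: Vehicle) -> float:
    """Fuzzy view (assumed rule): fraction of pickup features present."""
    features = [v.truck_body, v.open_bed]
    return sum(features) / len(features)


pickup = Vehicle("pickup", truck_body=True, open_bed=True)
el_camino = Vehicle("El Camino", truck_body=False, open_bed=True)
sedan = Vehicle("sedan", truck_body=False, open_bed=False)

for v in (pickup, el_camino, sedan):
    print(v.name, is_pickup(v), pickup_degree(v))
```

Under the strict rule the El Camino is simply “not a pickup”, the same answer my son gave. Under the assumed fuzzy rule it lands halfway between the pickup and the sedan.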
In a taxonomy, the solutions could be to:
- have only two categories: car and truck, and include the El Camino under car (since it has a car body and is typically licensed as a car)
- have three categories (car, truck, hybrid)
- have three categories (car, truck, utility vehicle) and then have a couple of categories below utility vehicle: sport utility vehicle (SUV) and coupe utility vehicle.
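The three options above can be sketched as small Python dictionaries. This is only an illustration: the leaf entries (“sedan”, “pickup”, “example SUV”) are placeholder vehicles, and `find_path` is a hypothetical helper written for this post that returns the chain of categories a vehicle falls under.

```python
# Option 1: two categories, El Camino filed under car.
TWO_CATEGORIES = {
    "car": ["sedan", "El Camino"],
    "truck": ["pickup"],
}

# Option 2: three flat categories, with a hybrid bucket.
THREE_FLAT = {
    "car": ["sedan"],
    "truck": ["pickup"],
    "hybrid": ["El Camino"],
}

# Option 3: three categories, with subcategories under utility vehicle.
THREE_NESTED = {
    "car": ["sedan"],
    "truck": ["pickup"],
    "utility vehicle": {
        "sport utility vehicle": ["example SUV"],
        "coupe utility vehicle": ["El Camino"],
    },
}


def find_path(taxonomy, vehicle):
    """Return the list of category names leading to `vehicle`, or None."""
    for category, children in taxonomy.items():
        if isinstance(children, dict):
            # Subcategories: search one level down and prepend this category.
            sub_path = find_path(children, vehicle)
            if sub_path is not None:
                return [category] + sub_path
        elif vehicle in children:
            return [category]
    return None


for taxonomy in (TWO_CATEGORIES, THREE_FLAT, THREE_NESTED):
    print(find_path(taxonomy, "El Camino"))
```

The same vehicle ends up with a different label, and a different depth in the tree, depending on which taxonomy is chosen.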
When it comes to data and analytics, how we group and categorize data has a lot to do with the meaning we derive from a data set. The categorization is often a hidden assumption in data analysis, and the choice of scheme affects the inferences we draw. Sometimes it’s useful to apply multiple categorization schemes to get different perspectives on the same data set.
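As a rough illustration of how the scheme changes the summary, the sketch below tallies the same list of sightings under two of the schemes listed above. The sightings are made up for this example, and `tally` is a hypothetical helper built on the standard library's `collections.Counter`.

```python
from collections import Counter

# Made-up sightings, purely for illustration.
observations = [
    "sedan", "sedan", "pickup", "El Camino",
    "El Camino", "pickup", "sedan",
]

# Scheme A: two categories, El Camino counted as a car.
scheme_a = {"sedan": "car", "pickup": "truck", "El Camino": "car"}

# Scheme B: three categories, El Camino counted as a hybrid.
scheme_b = {"sedan": "car", "pickup": "truck", "El Camino": "hybrid"}


def tally(observations, scheme):
    """Count observations per category under the given scheme."""
    return Counter(scheme[item] for item in observations)


counts_a = tally(observations, scheme_a)
counts_b = tally(observations, scheme_b)
print(dict(counts_a))
print(dict(counts_b))
```

With these made-up numbers, cars outnumber trucks five to two under scheme A, but only three to two under scheme B. The data did not change; only the categories did.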