Soda Machines & Big Data

Consider the standard soda machine. Typically it will have five types of soda (e.g. cola, diet cola, caffeine-free cola, diet caffeine-free cola, and a flavored soda such as cherry). In most cases, it’s pretty simple: each week it will be loaded with soda, emptied of coins, and put back into operation. A savvy vending machine operator would check which kinds of soda sell out more often (maybe the cherry-flavored soda) and restock these more often, or even remove a slow-selling soda (the caffeine-free cola) and replace it with another flavored soda. This wouldn’t require big data, and it would be possible to create a simple table as follows to track soda sales:

Soda                      Week1  Week2  Week3  Week4  Total
Cola                         10     10     12      9     41
Diet Cola                    12     15     12     12     51
Caffeine Free Cola            3      2      1      3      9
Caffeine Free Diet Cola      10     12      8     10     40
Orange Soda                  20     22     25     21     88
Total                        55     61     58     55    229

An easier way to view the data would be as percentages of the total sodas sold each week:

Soda                      Week1  Week2  Week3  Week4  Total
Cola                        18%    16%    21%    16%    18%
Diet Cola                   22%    25%    21%    22%    22%
Caffeine Free Cola           5%     3%     2%     5%     4%
Caffeine Free Diet Cola     18%    20%    14%    18%    17%
Orange Soda                 36%    36%    43%    38%    38%

It is relatively straightforward to grasp the main point: most people prefer orange soda and few like caffeine-free cola (though that’s not true of caffeine-free diet cola). The vendor could adjust the mix of sodas in the machine based on this simple table.
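
The percentages can be reproduced from the raw weekly counts. Here is a minimal sketch in Python, assuming each soda's share is rounded to the nearest whole percent; the names `weekly_sales` and `percent_share` are mine, chosen for illustration, and not part of any spreadsheet or tool mentioned in this post.

```python
# Weekly unit counts copied from the first table (Week1..Week4).
weekly_sales = {
    "Cola": [10, 10, 12, 9],
    "Diet Cola": [12, 15, 12, 12],
    "Caffeine Free Cola": [3, 2, 1, 3],
    "Caffeine Free Diet Cola": [10, 12, 8, 10],
    "Orange Soda": [20, 22, 25, 21],
}


def percent_share(sales):
    """Return each soda's share of every weekly total, then of the 4-week total."""
    # Column sums: total sodas sold in each week.
    week_totals = [sum(column) for column in zip(*sales.values())]
    grand_total = sum(week_totals)
    shares = {}
    for soda, counts in sales.items():
        weekly = [round(100 * c / t) for c, t in zip(counts, week_totals)]
        overall = round(100 * sum(counts) / grand_total)
        shares[soda] = weekly + [overall]
    return shares


shares = percent_share(weekly_sales)
for soda, row in shares.items():
    # One line per soda: four weekly percentages followed by the overall share.
    print(f"{soda:<26}" + "".join(f"{p:>5}%" for p in row))
```

Sorting the sodas by the last value in each row would give the vendor a quick ranking for deciding what to restock or replace.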

Now, consider a new type of soda machine that starts out with the same basic sodas but allows the customer to add flavorings to them: vanilla, cherry vanilla, orange, raspberry, lime, lemon, coconut, lemon-lime, and pineapple. To simplify, assume the customer can get an entire glass of soda with only a base soda flavor and one additional flavor (or none if the customer prefers). This means the same soda machine can produce 50 types of soda on a given day: 5 base sodas times 10 flavoring options (the 9 flavorings plus "none"). Let’s also imagine it can track the time and day and the number of glasses of each flavor (it needs a computer to handle the specific mixtures per drink anyway).
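
The count of 50 can be checked by enumerating every base and flavoring pair. This is a rough sketch, assuming the five base sodas are the ones from the sales table and that "no flavoring" counts as one of the options; the list names and the way the drink labels are formed are my own illustration.

```python
from itertools import product

# Assumption: the base sodas are the five from the sales table.
bases = [
    "cola",
    "diet cola",
    "caffeine free cola",
    "caffeine free diet cola",
    "orange soda",
]

# None stands for "no added flavoring"; the other nine come from the list above.
flavorings = [
    None, "vanilla", "cherry vanilla", "orange", "raspberry",
    "lime", "lemon", "coconut", "lemon-lime", "pineapple",
]

# Every (base, flavoring) pair is one drink the machine can pour.
drinks = [
    base if flavor is None else f"{flavor} {base}"
    for base, flavor in product(bases, flavorings)
]

print(len(drinks))   # number of distinct drink types
print(drinks[:3])    # first few labels
```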

With this, the data increases in quantity: 50 types of soda (cola, vanilla cola, cherry vanilla cola, and so on), and instead of a manual count each week, we have an hourly count each day. While the “traditional” soda machine gives us 20 pieces of data to analyze a month (5 sodas × 4 weekly counts), the “new” machine gives us 33,600 pieces of data a month (50 sodas × 24 hours × 28 days), quite a jump for a soda machine. This begins to look like a big data problem, where we have the following issues:

  • How do we manage the data: collect it, store it, and make it available?
  • How do we make it useful and actionable for product decisions, marketing, or understanding segments and trends?
  • How do we combine it with other data sets (e.g. location, type of customer, type of store or outlet), which increases the complexity but also the usefulness?
  • How do we find deep insights and not get overwhelmed by the sea of data?
  • What happens if we push it to the extreme: instead of 10 flavors, there are hundreds or thousands? How do we find meaning in that?
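
To get a feel for how quickly the volume grows, the jump from 20 to 33,600 data points can be worked out directly and then pushed to more flavors. This is only a back-of-the-envelope sketch: it assumes a "month" of four 7-day weeks (28 days), one count per drink type per hour, and five base sodas; the function names are made up for the example.

```python
HOURS_PER_DAY = 24
DAYS_PER_MONTH = 28  # assumption: four 7-day weeks, matching the weekly tables
BASE_SODAS = 5


def traditional_points(sodas=BASE_SODAS, weeks=4):
    """One manual count per soda per week."""
    return sodas * weeks


def hourly_points(flavor_options, bases=BASE_SODAS):
    """One count per drink type per hour; flavor_options includes 'none'."""
    drink_types = bases * flavor_options
    return drink_types * HOURS_PER_DAY * DAYS_PER_MONTH


print("traditional:", traditional_points())
for options in (10, 100, 1000):
    print(f"{options} flavor options:", hourly_points(options))
```

With these assumptions the volume grows linearly with the number of flavor options, so a thousand options means a hundred times the data of the 10-option machine.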

Part of the reason for this blog is to explore these issues in some depth and think of some possible solutions and ways to manage the flood of data and make it useful. In the next few articles, I’ll focus on aspects of this and experiment with ways we can manage big data problems and get some meaning out of them.

Soda machines and big data (xlx file)