Ballpark figures: Analyzing MLB baseball attendance

It’s springtime in the U.S., which means that something as American as apple pie is back: baseball. And since there’s all kinds of great data around one of the nation’s great pastimes, we decided to use this week’s post to look at Major League Baseball (MLB) attendance statistics from the last 20 years. These figures are published on many websites, including the one we used to get the data in the charts below: ESPN.com.

To collect the attendance data from ESPN, we used Jupyter Workspaces (currently in beta in Domo) and the Python package Beautiful Soup to parse the HTML. And since Domo can now run code in Jupyter Workspaces on a regular schedule, you can be sure that this page will continue to update with the 2022 data.
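The post doesn’t include its scraping code, so here is a minimal sketch of the parsing step only. It assumes the page holds a plain HTML `<table>` with one team per row, the team name in the first cell and a comma-formatted average attendance in the second; ESPN’s real markup and column order may differ. To stay dependency-free, it swaps in Python’s standard-library `html.parser` for Beautiful Soup. `AttendanceTableParser`, `parse_attendance`, and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser


class AttendanceTableParser(HTMLParser):
    """Collect the text of each <td> cell, grouped by <tr> row (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []         # finished rows, each a list of cell strings
        self._row = None       # cells of the row currently being read
        self._in_cell = False  # True while inside a <td> element
        self._text = []        # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._text = []

    def handle_endtag(self, tag):
        if tag == "td" and self._in_cell:
            self._row.append("".join(self._text).strip())
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows use <th>, so they stay empty and are skipped
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._text.append(data)


def parse_attendance(html, team_col=0, avg_col=1):
    """Return {team: average attendance}; the column positions are assumptions."""
    parser = AttendanceTableParser()
    parser.feed(html)
    parser.close()
    return {
        row[team_col]: int(row[avg_col].replace(",", ""))  # "30,000" -> 30000
        for row in parser.rows
    }


# Made-up sample markup, not real ESPN data.
sample = """
<table>
  <tr><th>Team</th><th>Avg</th></tr>
  <tr><td>Team A</td><td>30,000</td></tr>
  <tr><td>Team B</td><td>25,500</td></tr>
</table>
"""
print(parse_attendance(sample))  # {'Team A': 30000, 'Team B': 25500}
```

Fetching the page and scheduling the run in Domo are left out of this sketch.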

The first thing you’ll probably notice when looking at the data is that 2020 is missing. That’s because, due to the pandemic, baseball was played without fans that year. There was a bit of a return to normalcy in 2021, but it wasn’t until this season that all spectator restrictions were lifted, so it will be interesting to watch how attendance rebounds. (In full transparency, we only have data at the full-year level right now, so we aren’t capturing anything related to seasonality, such as how weather or a team’s position in the playoff race affects ticket sales.)

One good way to review this data is with an old favorite of many data scientists: a box and whisker plot. The whiskers (the top and bottom lines) show each team’s minimum and maximum average attendance across the years. I’ve sorted the chart to show the team with the highest peak attendance year on the left and the lowest on the right.
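As a rough illustration of what the whiskers encode, this sketch computes each team’s lowest and highest yearly average attendance and sorts teams by that peak, highest first. The `(team, year, avg_attendance)` record layout, the `whisker_ranges` name, and the numbers are all made up for the example.

```python
from collections import defaultdict


def whisker_ranges(records):
    """Return (team, min_avg, max_avg) tuples sorted by peak year, highest first.

    records: iterable of (team, year, avg_attendance) tuples (hypothetical layout).
    """
    by_team = defaultdict(list)
    for team, _year, avg in records:
        by_team[team].append(avg)
    ranges = [(team, min(vals), max(vals)) for team, vals in by_team.items()]
    ranges.sort(key=lambda r: r[2], reverse=True)  # highest peak on the left
    return ranges


# Made-up numbers for illustration only.
records = [
    ("Team A", 2017, 31000), ("Team A", 2018, 36000), ("Team A", 2019, 33000),
    ("Team B", 2017, 40000), ("Team B", 2018, 39000), ("Team B", 2019, 41000),
    ("Team C", 2017, 22000), ("Team C", 2018, 27000), ("Team C", 2019, 24000),
]
for team, low, high in whisker_ranges(records):
    print(team, low, high)
# Team B 39000 41000
# Team A 31000 36000
# Team C 22000 27000
```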

Where the visualization gets more interesting for me is in the boxes. Each box shows the range between the 25th and 75th percentiles, which is meant to reflect how much a team’s attendance has swung over time. Bigger boxes tell me that those teams (such as Philadelphia and Detroit) have had some great years for attendance and some not-so-great years. Smaller boxes (such as Boston’s) say that a team has been very consistent in its attendance numbers. We’ve also filtered the chart to pre-pandemic years only, since 2021 (and, to a lesser extent, the partial 2022 data) skews the results.
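The box edges can be sketched the same way. This version keeps only seasons through 2019 (the pre-pandemic filter) and computes the 25th and 75th percentiles with `statistics.quantiles`. Using the `inclusive` interpolation method is an assumption: charting tools differ in how they compute quartiles, so the exact box edges Domo draws may not match. `box_spans` and the sample data are hypothetical.

```python
import statistics


def box_spans(records, last_year=2019):
    """Return {team: (q1, q3, iqr)} from seasons up to and including last_year.

    records: iterable of (team, year, avg_attendance) tuples (hypothetical layout).
    Each team needs at least two qualifying seasons for statistics.quantiles.
    """
    by_team = {}
    for team, year, avg in records:
        if year <= last_year:  # pre-pandemic seasons only
            by_team.setdefault(team, []).append(avg)
    spans = {}
    for team, vals in by_team.items():
        # n=4 returns the three quartile cut points: [Q1, median, Q3].
        q1, _median, q3 = statistics.quantiles(vals, n=4, method="inclusive")
        spans[team] = (q1, q3, q3 - q1)
    return spans


# Made-up numbers: Team A swings a lot, Team B barely moves.
records = [
    ("Team A", 2016, 30000), ("Team A", 2017, 34000),
    ("Team A", 2018, 38000), ("Team A", 2019, 42000),
    ("Team A", 2021, 12000),  # excluded by the pre-pandemic filter
    ("Team B", 2016, 40000), ("Team B", 2017, 40400),
    ("Team B", 2018, 40800), ("Team B", 2019, 41200),
    ("Team B", 2021, 15000),  # excluded by the pre-pandemic filter
]
for team, (q1, q3, iqr) in box_spans(records).items():
    print(team, q1, q3, iqr)
# Team A 33000.0 39000.0 6000.0
# Team B 40300.0 40900.0 600.0
```

The wider span for Team A is the “bigger box” pattern described above; Team B’s narrow span is the consistent case.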

Another way to understand how teams rank in attendance is to create indexes of where a team’s attendance stands relative to the overall MLB average, which is what we’ve done directly below. Dark blue boxes mean that a team is well above the average, while dark orange boxes mean that a team is well below it. You can use the filters to look at whichever league, division, team(s), or year(s) you’re interested in.
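The exact index formula isn’t spelled out in the text, so this is only a sketch under two stated assumptions: the MLB average is the unweighted mean of the team averages, and the index is scaled so that 100 means exactly average (above 100 is above average, below 100 is below it). `attendance_index` and the sample values are hypothetical.

```python
def attendance_index(team_avgs):
    """Return {team: index} where 100 means exactly the MLB average.

    Assumptions (not from the post): the MLB average is the unweighted mean of
    the team averages, and the index is the ratio scaled by 100.
    """
    mlb_avg = sum(team_avgs.values()) / len(team_avgs)
    return {team: 100 * avg / mlb_avg for team, avg in team_avgs.items()}


# Made-up averages for illustration only.
team_avgs = {"Team A": 30000, "Team B": 20000, "Team C": 25000}
print(attendance_index(team_avgs))
# {'Team A': 120.0, 'Team B': 80.0, 'Team C': 100.0}
```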

Long-time Domo users may be looking at these indexes and thinking that I did some pre-calculation in a Magic ETL or a Dataset View. It’s true that calculations at these total levels typically require pre-calculation. But if I had done that, it would be hard to allow for the year filter. So, the secret is out: with Domo’s new FIXED Beast Modes (currently in beta), you can do FIXED level of detail functions right in a Beast Mode. For the “Index to League Avg” above, this is the calculation:

You can see there are two things happening here. First, because the SUM is FIXED by League, it sums across all rows with the same league as the row I’m on. That gives me the league total we need for the denominator of the index. Second, it uses FILTER ALLOW to tell Domo that filters on Year can affect the FIXED functions. The available options are FILTER ALLOW, FILTER DENY, and FILTER NONE.
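To make the FIXED idea concrete outside Domo, this plain-Python sketch attaches a per-league total to every visible row, much like a SUM FIXED by League does. The `allow_year_filter` flag is a hypothetical stand-in for whether the Year filter reaches the fixed total (`True` behaves like FILTER ALLOW on Year); it illustrates the concept and is not Domo’s implementation or syntax. The function name, row fields, and values are made up.

```python
from collections import defaultdict


def with_fixed_league_total(rows, years=None, allow_year_filter=True):
    """Attach a per-league attendance total to every row that passes the Year filter.

    rows: list of dicts with "team", "league", "year", and "attendance" keys.
    years: set of selected years, or None when no Year filter is applied.
    allow_year_filter: if True, the Year filter also narrows the rows summed into
    the fixed total (the FILTER ALLOW idea); if False, the total ignores it.
    Conceptual sketch only, not Domo's implementation.
    """
    visible = [r for r in rows if years is None or r["year"] in years]
    basis = visible if allow_year_filter else rows
    league_totals = defaultdict(int)
    for r in basis:
        league_totals[r["league"]] += r["attendance"]
    # Each visible row carries the total for its own league, like SUM ... FIXED BY League.
    return [dict(r, league_total=league_totals[r["league"]]) for r in visible]


# Made-up rows for illustration only.
rows = [
    {"team": "A", "league": "AL", "year": 2018, "attendance": 100},
    {"team": "B", "league": "AL", "year": 2018, "attendance": 300},
    {"team": "C", "league": "NL", "year": 2018, "attendance": 200},
    {"team": "A", "league": "AL", "year": 2019, "attendance": 150},
    {"team": "C", "league": "NL", "year": 2019, "attendance": 250},
]
for r in with_fixed_league_total(rows, years={2018}):
    print(r["team"], r["league_total"])
# A 400
# B 400
# C 200
```

Calling it with `allow_year_filter=False` on the same rows returns 550 for the two AL rows and 450 for the NL row, because the 2019 seasons are then included in the totals even though only 2018 rows are shown.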

Here’s one last example of how useful FIXED with FILTER DENY can be. The bar charts below default to the New York Yankees (my boss’s favorite team). The first chart doesn’t use FIXED, so when I filter for the Yankees, the Min, Max, and Median fields become meaningless, since they get filtered down to the same value as the selected team. The second chart uses FIXED with FILTER DENY on team name, so the Min, Max, and Median still reflect all teams and serve as reference points for the main average, which is the Yankees’.
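Here is the same contrast sketched in Python. `naive_stats` mimics the first chart: once the team filter is applied, only one value is left, so Min, Max, and Median all collapse to it. `reference_stats` mimics the second: the reference statistics ignore the team filter (the role FILTER DENY plays here) while the main value still follows it. Both function names and the sample numbers are hypothetical.

```python
import statistics


def naive_stats(team_avgs, selected_team):
    """Like the first chart: the team filter leaves one value, so every stat equals it."""
    filtered = [team_avgs[selected_team]]
    return {
        "team_avg": filtered[0],
        "min": min(filtered),
        "max": max(filtered),
        "median": statistics.median(filtered),
    }


def reference_stats(team_avgs, selected_team):
    """Like the second chart: reference stats ignore the team filter (the FILTER DENY idea)."""
    all_values = list(team_avgs.values())  # every team, regardless of the filter
    return {
        "team_avg": team_avgs[selected_team],  # still follows the team filter
        "min": min(all_values),
        "max": max(all_values),
        "median": statistics.median(all_values),
    }


# Made-up averages for illustration only.
team_avgs = {"Team A": 40000, "Team B": 20000, "Team C": 30000, "Team D": 25000}
print(naive_stats(team_avgs, "Team A"))
# {'team_avg': 40000, 'min': 40000, 'max': 40000, 'median': 40000}
print(reference_stats(team_avgs, "Team A"))
# {'team_avg': 40000, 'min': 20000, 'max': 40000, 'median': 27500.0}
```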

One of the things I love (and also at times find maddening) about exploring new data is that there’s always more to explore. As I worked on this post, I realized that it would be quite interesting to bring in teams’ win/loss records as well as information on stadium capacity. But then I thought: let’s maybe save that for a future post.


