Ohio Resource Center

Hollywood Box Office



Estimated Time
One or two class periods plus research time outside of class
Prerequisites
  • Knowledge of the measures of center and various data displays is helpful, but these can be covered in the lesson, thus providing a reason to learn these concepts.
Materials Needed
  • Computer access to gather data or, alternatively, printouts of data
  • (Optional) Calculators or programs such as Fathom that can create different data displays
Ohio Standards Alignment  

Topics

Analysis of univariate data, measures of central tendency, data collection and display, real-world applications.

Overview

Students will investigate the average earnings of a movie in a given week using data they collect from web or print resources. In this investigation, students use the measures of center and different graphical displays of the data to argue when each measure of center may be appropriate, explaining their choice of mean, median, or mode. They identify outliers and create box plots (both modified and non-modified), histograms, and stem-and-leaf plots. Students also analyze the spread and shape of the distribution as part of their investigation. To conclude the exploration, students prepare a poster and give a report to the class. Because students analyze real data about a topic that engages them, they apply mathematics to a real-world situation and can see the value of the statistical concepts they are studying.

Learning objectives:

  • To identify data as univariate.
  • To find measures of central tendency and interpret those measures, selecting the “best” measure of center to summarize the data.
  • To create and interpret different displays for the data (box plot, modified box plot, histogram, stem-and-leaf plot).
  • To describe a distribution in terms of its shape, spread, clusters, and outliers, identifying outliers using the interquartile range.
  • To construct logical arguments based on an analysis of the data.
  • To communicate mathematical ideas orally and in writing.

The "Hook"

Each week newspapers and news shows give a list of the top box office movies for the previous weekend. 

Critical Question
How much money is it realistic to expect a movie to earn in any given weekend?
 
Critical Question
How much money (in total) should Hollywood expect to earn in a weekend?

The Investigation

Direct students to sources of data, such as Box Office Mojo. This site offers, for example, the daily gross income of the top 10 movies over the past 14 days (see Daily Index), as well as the weekend gross, the total gross, and the average gross for more than 120 individual movies. The production budget for each movie is also shown, which makes it possible to estimate the profit from any one movie (see Weekend Box Office). Students can be directed to look at different weeks of interest. Most weeks have more than 100 movies in release, quite a few of which earn very little money. The teacher may wish to limit the number of movies analyzed per week to a more manageable number, say 25.
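To narrow a long weekend list to a manageable set and compare each movie's earnings with its budget, something like the following Python sketch could be used. The `Movie` record, the `top_n` and `gross_minus_budget` helpers, and all titles and dollar figures are hypothetical placeholders, not actual Box Office Mojo data or that site's format.

```python
from typing import NamedTuple


class Movie(NamedTuple):
    """One row of a (hypothetical) weekend box office table."""
    title: str
    weekend_gross: float  # dollars earned this weekend
    total_gross: float    # dollars earned since release
    budget: float         # production budget in dollars


def top_n(movies: list[Movie], n: int = 25) -> list[Movie]:
    """Return the n movies with the largest weekend gross, largest first."""
    return sorted(movies, key=lambda m: m.weekend_gross, reverse=True)[:n]


def gross_minus_budget(movie: Movie) -> float:
    """Rough profit estimate: total gross minus production budget.

    This ignores marketing costs and the share of ticket sales that
    theaters keep, so it overstates what a studio actually earns.
    """
    return movie.total_gross - movie.budget


# Made-up example records, not real box office figures.
movies = [
    Movie("Movie A", 50_000_000, 180_000_000, 150_000_000),
    Movie("Movie B", 2_000_000, 9_000_000, 30_000_000),
    Movie("Movie C", 12_000_000, 40_000_000, 25_000_000),
]

for m in top_n(movies, n=2):
    print(m.title, m.weekend_gross, gross_minus_budget(m))
```

With real data, `n=25` would match the suggested classroom limit; a negative result simply means a movie has not yet earned back its budget.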

If students do not have computer access, the teacher may want to print out the data for one weekend. The Fourth of July weekend of 2004, for example, has an unusually large outlier, although most weekends have one or two movies that are outliers.

For a set of, say, 25 movies from a given weekend, have students analyze the data they collect by calculating the measures of center, the range, and the interquartile range. Students should make at least two data displays (box plot and histogram). After identifying the outliers (values more than 1.5 × IQR below the first quartile or above the third quartile), students should create a modified box plot, in which the whiskers extend only to the most extreme values that are not outliers and each outlier is plotted as a separate, labeled point. Other data displays may be used as appropriate. An interesting question to consider at this point:
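The computations above can be sketched in Python. This is a minimal illustration, assuming made-up weekend grosses in millions of dollars (not real figures) and the 1.5 × IQR rule for outliers. Quartile conventions differ between textbooks and software, so Fathom or a calculator may report slightly different quartiles; this sketch takes Q1 and Q3 as the medians of the lower and upper halves of the sorted data, leaving out the overall median when the count is odd. The helper names `quartiles` and `summarize` are hypothetical.

```python
from statistics import mean, median, multimode


def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves method."""
    xs = sorted(data)
    n = len(xs)
    lower = xs[: n // 2]        # values below the overall median
    upper = xs[(n + 1) // 2 :]  # values above the overall median
    return median(lower), median(xs), median(upper)


def summarize(data):
    """Measures of center, spread, and 1.5 * IQR outliers for one weekend."""
    xs = sorted(data)
    q1, med, q3 = quartiles(xs)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [x for x in xs if x < low_fence or x > high_fence]
    inliers = [x for x in xs if low_fence <= x <= high_fence]
    return {
        "mean": mean(xs),
        "median": med,
        "modes": multimode(xs),
        "range": xs[-1] - xs[0],
        "five_number": (xs[0], q1, med, q3, xs[-1]),
        "iqr": iqr,
        "fences": (low_fence, high_fence),
        "outliers": outliers,
        # a modified box plot draws its whiskers to these values
        "whiskers": (inliers[0], inliers[-1]),
    }


# Hypothetical weekend grosses in millions of dollars (made-up values).
weekend = [95.0, 24.5, 18.0, 12.0, 9.5, 6.0, 4.0, 3.5, 2.0, 1.0, 1.0, 0.5]
s = summarize(weekend)
print(s["mean"], s["median"], s["outliers"])  # 14.75 5.0 [95.0]
```

With these made-up numbers, the single 95-million earner pulls the mean (14.75) well above the median (5.0), which is the kind of gap the next critical question asks students to explain.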

Critical Question
What is the “average” amount of money a movie earns each weekend at the box office? Can we ignore the large outliers in the data? Why or why not?
 
Looking deeper into measures of central tendency:
Critical Question
Can both the mean and the median serve as good representative values, each answering a different question about the data? What question does each measure answer? How do data displays help you analyze the data?
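One way to see that the mean and the median answer different questions is a short Python check on made-up weekend data (grosses in millions of dollars, not real figures). Multiplying the mean by the number of movies gives back the weekend total, which ties the mean to the question of how much Hollywood earns in all; the median describes a typical single movie and changes far less than the mean when the largest earner is set aside.

```python
import math
from statistics import mean, median

# Hypothetical weekend grosses in millions of dollars (made-up values).
weekend = [95.0, 24.5, 18.0, 12.0, 9.5, 6.0, 4.0, 3.5, 2.0, 1.0, 1.0, 0.5]

total = sum(weekend)
# mean * count == total, so the mean answers "how much in all?"
assert math.isclose(mean(weekend) * len(weekend), total)

# Set aside the largest earner to see how each measure reacts.
without_top = sorted(weekend)[:-1]

print(f"total = {total}")
print(f"mean:   {mean(weekend):.2f} -> {mean(without_top):.2f}")
print(f"median: {median(weekend):.2f} -> {median(without_top):.2f}")
```

Here dropping one movie roughly halves the mean (14.75 to 7.45) but moves the median only from 5.00 to 4.00, which gives students a concrete way to frame whether the outliers can be ignored.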

Have students present their analyses, including the answers to the above critical questions. Their data displays should form a part of this presentation. The presentation can itself serve as an assessment of student learning.

Students may be interested in exploring further questions about box office data. The crucial first step is for them to define a question that can be answered by the data available.

Critical Question
What question do you choose to explore? What data do you need to investigate it? How will those data help answer your question?

Teaching Tips

  • Students may be interested in different movies and in each movie's opening weekend. Because most weekends have one large earner and many movies that earn much less, letting students choose different movies should not undermine the point of the lesson, and it allows different data sets to be explored.
  • Students can argue that the mean is appropriate because studios can expect one movie to have high earnings. On the other hand, on a weekend such as the one referenced above (Fourth of July, 2004), the outlier is so large that the median may be a better choice. Finally, a great many movies at the end of the list earn approximately the same small amount, making the mode an appropriate choice. Graphical displays help students make a strong argument for the median and the mode.
  • Students may want to look at earnings per screen, number of screens, and other data available on the site to strengthen their arguments or to pursue other topics of interest. One bivariate data set to consider is a movie's gross earnings over time, which follow a declining pattern that is rarely linear.
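The argument for the mode in the tips above is easier to make with grouped data, because exact dollar figures rarely repeat. A minimal Python sketch, assuming made-up weekend grosses in millions and an arbitrary bin width of 5 million, counts values in equal-width bins, prints a text histogram, and reports the modal class (the tallest bar). The `histogram` helper is hypothetical.

```python
from collections import Counter

WIDTH = 5  # bin width in millions of dollars (an arbitrary choice)


def histogram(data, width):
    """Map each bin's lower edge to how many values fall in [edge, edge + width).

    Assumes all values are non-negative, so bins start at 0.
    """
    counts = Counter(int(x // width) for x in data)
    # include empty bins so gaps in the distribution stay visible
    return {k * width: counts[k] for k in range(max(counts) + 1)}


# Hypothetical weekend grosses in millions of dollars (made-up values).
weekend = [95.0, 24.5, 18.0, 12.0, 9.5, 6.0, 4.0, 3.5, 2.0, 1.0, 1.0, 0.5]
bins = histogram(weekend, WIDTH)

for edge, count in bins.items():
    print(f"{edge:>3} to {edge + WIDTH:<3} | {'*' * count}")

modal_edge = max(bins, key=bins.get)  # first bin with the highest count
print(f"modal class: {modal_edge} to {modal_edge + WIDTH} million")
```

Changing the bin width can change which class looks modal, so students may want to try more than one width before arguing for the mode.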

    Suggested rubric for the concept of measures of center:

    2 – Correctly finds all three measures of center (mean, median, and mode, if a mode exists for the data).

    1 – Correctly computes mean OR median; correctly identifies mode (if it exists for the data).

    0 – Neither the mean nor the median is correctly calculated.

    Suggested rubric for the graphs:

    4 – Scales are correct and labeled properly; data are displayed in an appropriate manner (bars are drawn with a ruler using the proper scale and a common width; box plots have the five-number summary clearly marked; in a modified box plot, all of the outliers are clearly marked and labeled).

    3 – One of the criteria in the 4-point list is done incorrectly.

    2 – Two of the criteria in the 4-point list are done incorrectly.

    1 – Only one of the criteria in the 4-point list is done correctly.

    0 – None of the parts of the graph is done correctly.

    Suggested rubric for the presentation or the report:

    6 – Correctly identifies the question for which the mean is the best answer (generally, what the studio expects to earn on a typical weekend); correctly identifies the question for which the median is the best answer (what a single movie may be expected to earn on a weekend); deals with different sets of data, such as the first 25 movies versus the first 50 movies, and how that choice affects the data; has a clear reason for each answer; if addressed, combines reports from other groups to see a trend among the weekends studied.

    5 – Correctly identifies either the question for which the mean is the best answer (generally, what the studio expects to earn on a typical weekend) OR the question for which the median is the best answer (what a single movie may be expected to earn on a weekend); deals with different sets of data, such as the first 25 movies versus the first 50 movies, and how that choice affects the data; has a clear reason for most answers.

    4 – Correctly identifies either the question for which the mean is the best answer (generally, what the studio expects to earn on a typical weekend) OR the question for which the median is the best answer (what a single movie may be expected to earn on a weekend); deals with one of the different sets of data, such as the first 25 movies OR the first 50 movies; has a clear reason for most answers.

    3 – Incorrectly identifies the question for which the mean is the best answer (generally, what the studio expects to earn on a typical weekend) and incorrectly identifies the question for which the median is the best answer (what a single movie may be expected to earn on a weekend); deals with the different sets of data, such as the first 25 movies versus the first 50 movies, and how that choice affects the data; has a clear reason for many answers.

    2 – Incorrectly identifies the question for which the mean is the best answer (generally, what the studio expects to earn on a typical weekend) and incorrectly identifies the question for which the median is the best answer (what a single movie may be expected to earn on a weekend); deals with one of the different sets of data, such as the first 25 movies OR the first 50 movies; has a clear reason for some answers.

    1 – Incorrectly identifies the question for which the mean is the best answer (generally, what the studio expects to earn on a typical weekend) and incorrectly identifies the question for which the median is the best answer (what a single movie may be expected to earn on a weekend); deals with one of the different sets of data, such as the first 25 movies OR the first 50 movies; has trouble explaining several answers.

    0 – Deals with nearly all data incorrectly.

    Citation

    From the teaching files of Fred Dillon.