MAT 543 Week 2 Homework

  • Homework
    • Chapter 2: Exercises 2-1 through 2-5 (page 48 of the text)

LEARNING OBJECTIVE 1: CALCULATING AND USING DESCRIPTIVE STATISTICS

Statistics is a term that usually evokes dread and discomfort in students and practitioners alike. This is probably because statistics is closely related to mathematics, although the two are distinct. Statistics uses mathematical relationships among data to allow managers to make decisions both about the data themselves and about the likelihood that sampled data represent a broader, more generalized trend or population. Statistics, however, need not be difficult. For managers, some simple categorizing of techniques can help focus statistical analysis in ways that are easily understood and applied.

Managerial statistics have three primary functions. The first function is to describe certain data elements, such as the number of births over a time period or the expenses incurred for a service unit. The second function is to compare two points of data, such as births from one year to the next or error rates between care sites. The third is to predict data, such as visit volume in future months. This chapter examines the first two, saving prediction for Chapters 5, 6, and 7, given its specialized nature. First, however, a discussion about the nature of data is in order. Data are quite simply numbers within a context. Green, although a very nice color, is only an adjective by itself. If, however, we record the eye color of a room of 20 people, and then code those colors with numbers (for example, 1 for blue, 2 for brown, 3 for green), we have transformed the colors into points of data. Similarly, if we were to then count the number of people with green eyes, we would have performed a statistical function: the calculation of a descriptive statistic.

A number of different types of analyses can be performed on data. When the time comes to conduct these analyses, students often face “analysis paralysis.” Imagine, for example, that you find yourself in a new position, and you have been asked by your boss, Mr. Walden, to take a look at some utilization trends using data from the organization’s data warehouse. You pull up the corresponding data files, and are faced with more than 30 variables (columns in a spreadsheet) and tens of thousands of records (rows in a spreadsheet representing individual patient visits). In the middle is a sea of numbers of various sorts that continue on as you scroll and scroll down the screen. In some organizations, there may be millions of data records in thousands of tables, which is intimidating to be sure. However, a data file with 10,000 rows and one with 20 are not all that different. Each can be described in similar ways. What is important is the data itself. Understanding what the numbers represent (the context) and how they were created will lead you down certain analytic paths and not others, allowing you to put some statistical methods aside for some data.

Measuring Data

Data come in only four varieties. Students of introductory statistics will no doubt recall the terms nominal, ordinal, interval, and ratio. All refer to measurement of data variables. Variables are simply data that can take on different values, depending on what is being measured. In the earlier example, the color of 20 people’s eyes was recorded, thus creating the variable “eye color.” In this instance it is a variable that is measured nominally. Nominal refers to data that exist in non-overlapping categories. They have no ranking and are mutually exclusive; for example, eye color, insurance type, gender, and ethnicity. Ordinal variables are slightly different in that they are still measured categorically, but the categories have a ranking. An example of this would be satisfaction scales, in which somewhat satisfied might be followed by very satisfied, etc. These are common in health surveys. The final two types are often taken together as interval/ratio variables. These are often termed continuously measured variables; examples include time and money. The difference here is that they are actually still categories, but the distance between categories is equal. Think of a time scale as derived in seconds—one second, two seconds, etc. We could derive smaller increments if we so wished, creating fractions of seconds as is often done in Olympic time trials and racing. The increments ultimately do not matter, however. What does matter is that the distance between them is equal. This allows mathematical calculations on these forms of data. The difference between interval and ratio data has to do with the presence of a meaningful zero when measuring a ratio variable, a distinction not important for this discussion.

At this point the insightful student might realize that examining measurement provides two distinct types of variables—those that have equal distances between measurement points and those that do not. Often, these distinctions are recognized by labeling nominal and ordinal data as categorical, and interval/ratio data as continuous. We too will follow this convention. Ordinal data present a unique measurement form. It is important to understand the type of data you are working with because each is analyzed differently.

Descriptive Statistics with One Variable (Univariate)

First, examine descriptive statistics in relation to categorical data. Table 2-1 provides data on 14 patients, recording their insurance type.

Insurance type is a categorical variable. The categories are not ranked, nor is there any relationship among them. Patients usually claim a type of primary insurance (or lack thereof) upon visit. To describe the data, we are limited to only a handful of techniques. The first is to simply count. Here we can count total patients or patients by the type of insurance they have. The second, which requires a bit of mathematics first, is to create percentages for the number of persons falling into each category. A percentage is simply the number of persons in a category divided by the total number of persons, multiplied by 100. Not multiplying by 100 is also correct, although this provides a decimal fraction and not a percentage. For example, we may wish to know how many people reported having United as their insurer. One way to summarize this would be to count, which amounts to three individuals in Table 2-1. To calculate a percentage, we would divide that 3 by 14, which gives us 0.21, or 21% of patients. Percentages and fractions provide slightly more information than do counts. Inherent in them is the context of the whole. If we tell you three patients had United insurance, you may still wonder if that is a lot, not many, or a modest amount; but if we say 21% of patients had United, you now have some sense of the entire group of patients, although we have not provided the total. Reporting the total alongside the percentage lets the reader recover the count as well, creating a more complete picture of the data being described. Listing counts and percentages of categorical data is also called creating frequencies from the data. We could graph the data at this point and obtain a visual representation of how frequently patients used various types of insurance, as we have done in Figure 2-1. From a descriptive standpoint, this is the limit of analyzing a single categorical variable.

Table 2-1 Insurance Type by Patient

Patient    Insurance Type
 1         United
 2         Medicare
 3         Medicaid
 4         Medicare
 5         BC/BS
 6         United
 7         BC/BS
 8         BC/BS
 9         Medicaid
10         Uninsured
11         Medicare
12         Uninsured
13         United
14         MBCA

Figure 2-1 Patient Insurance by Type
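
The counting and percentage steps described for Table 2-1 can be sketched in a few lines of Python. This is only an illustration, not part of the text: the function name frequency_table is a hypothetical helper, and the list simply transcribes the 14 rows of Table 2-1.

```python
from collections import Counter

# Insurance type for each of the 14 patients, transcribed from Table 2-1.
insurance = [
    "United", "Medicare", "Medicaid", "Medicare", "BC/BS", "United", "BC/BS",
    "BC/BS", "Medicaid", "Uninsured", "Medicare", "Uninsured", "United", "MBCA",
]


def frequency_table(values):
    """Hypothetical helper: map each category to (count, percentage)."""
    total = len(values)
    counts = Counter(values)
    # Percentage = count in category / total count * 100.
    return {category: (n, 100 * n / total) for category, n in counts.items()}


table = frequency_table(insurance)
for category, (count, pct) in sorted(table.items()):
    print(f"{category:<10} {count:>2}  {pct:5.1f}%")
print(f"{'Total':<10} {len(insurance):>2}")
```

Printing the total alongside the percentages follows the point made above: a percentage conveys the context of the whole, and the total lets a reader recover the counts.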

The second type of data you may wish to describe are those measured as interval/ratio variables. These are also commonly referred to as continuous data. Again, these are actually categorical data as well, but the categories are of equal size. Examples include variables such as time, money, height, and weight. The equal distances between categories are what allow for mathematical analysis of these data. So, for example, adding one dollar to two dollars adds the same amount as adding one dollar to ten dollars. This allows us to calculate a number of descriptive measures that examine the centrality of the data and its spread, which are both useful for our purposes. We first examine measures of central tendency.

Measures of Central Tendency

As described, data that are collected across a number of observations vary from observation to observation; thus, the term variable. Plotting these data reveals both the spread and the clustering of individual observations. An example is given in Figure 2-2.

From these data, we can see that the number of chart pulls appears to be largely centered between 10 and 30 per day, with a few days of higher volume and one of lower volume. For analytic purposes, it would be helpful to have a set of summary statistics to describe the data. The statistic that is the mathematical center of a data set is the average, or mean. It is the foundation for many other statistical concepts as well. To calculate the mean, simply add up all the values and divide by n, which is the number of observations. We can also find the median, which is the center of the distribution of data when all the observations are arranged from lowest to highest. The mode is the most frequently reported data value. Table 2-2 reports these measures of central tendency for the data in Figure 2-2.

Figure 2-2 Port City Hospital Daily Chart Filings per Day

Figure 2-2A Port City Hospital Daily Chart Filings per Day Part 2

Table 2-2 Port City Hospital Daily Chart Filings per Day

Day       Charts Filed    Day       Charts Filed
 1        12              16        12
 2        15              17        15
 3        18              18        23
 4        12              19        32
 5        13              20        19
 6        16              21        12
 7        22              22        18
 8        15              23        17
 9        14              24        21
10        19              25        20
11        23              26        11
12        26              27        12
13        38              28        12
14        22              29        18
15         7              30        23

Mean      18
Median    18
Mode      12

In this instance, the median and mean are the same value, 18. The mode is 12. Had the mean been higher than the median, it would indicate that there were some high values of the data that were pulling the mean upward. Examine the following set of income values: $13,000, $25,000, $33,000, $42,000, $56,000. The mean of these data is $33,800. The median is $33,000. If, however, we replace the value of $56,000 with $120,000, notice what happens. The median is still $33,000, yet the mean increases to $46,600. This is because the median depends only on the value in the middle of the ordered data, not on how large or small the other values are. It is what we call a robust measure, or one that is resistant to outlying values. The mean is not robust, as we demonstrated. When examining data distributions, it is sometimes helpful to look at both the mean and the median. Doing so can quickly tell you something about the presence of outlying values and the spread of the data.
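
As a rough check on these figures, the sketch below uses Python's standard statistics module on the values transcribed from Table 2-2 and on the income example. The variable names are illustrative only. Computed without rounding, the chart data give a mean of 17.9 and a median of 17.5, both consistent with the rounded value of 18 reported in Table 2-2.

```python
from statistics import mean, median, mode

# Charts filed on days 1-30, transcribed from Table 2-2.
charts = [
    12, 15, 18, 12, 13, 16, 22, 15, 14, 19, 23, 26, 38, 22, 7,
    12, 15, 23, 32, 19, 12, 18, 17, 21, 20, 11, 12, 12, 18, 23,
]

chart_mean = mean(charts)      # sum of values divided by n
chart_median = median(charts)  # middle of the ordered values
chart_mode = mode(charts)      # most frequently reported value
print(chart_mean, chart_median, chart_mode)

# Income example from the text: swap the largest value for an outlier.
incomes = [13_000, 25_000, 33_000, 42_000, 56_000]
incomes_outlier = [13_000, 25_000, 33_000, 42_000, 120_000]

# The median stays put (robust); the mean is pulled upward (not robust).
print(mean(incomes), median(incomes))
print(mean(incomes_outlier), median(incomes_outlier))
```

Comparing the last two printed lines shows the pattern the text describes: only the mean moves when the extreme value changes.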

Measures of Spread

Although the mean, median, and mode tell us something about the middle or centrality of the data, we may also be interested in how varied and spread out the data are. This is helpful both for understanding the range of data values and for examining the possibility of outlier values that might be affecting our measures of central tendency. Examine again the data in Table 2-2 and Figure 2-2. We know that the mean of these data is 18 charts filed per day. We also know that there are many days when the number of charts filed exceeds 18 and many when it falls short of 18. The maximum and minimum values tell us this, and are important measures for summarizing our data. Here they are 48 and 7, respectively. Their difference, 41 (48 - 7), is what is known as the range. Examining the range in addition to the measures of central tendency allows a clearer picture of the data distribution (even without a graph!).

There is one final measure of spread that should be considered. If we were to draw a line at the mean in Figure 2-2, we would see that about half of the data points were clustered above and half below (although this is always true of the median, it need not always be so for the mean) (see Figure 2-2A). Here we see that some points lie closer to the mean than others, whereas some lie on the mean. Thus, each point of observation lies some distance from the mean, whether positive or negative. What would be interesting to know is how far from the mean the data are on average. The final summary measure of spread, the standard deviation, does this. Simply put, the standard deviation can be read as the typical distance of a data point from the mean. In the chart filing example, we are asking how far, on average, the data points diverge from their mean, which is 18. To do this we could start by measuring the distance of each point from the mean, summing the distances, and then dividing by n to get the average. But wait. Because the mean is the mathematical average of all the points, the positive and negative distances cancel, and their sum is zero. Dividing zero by n simply gives zero, which tells us nothing about spread. So, to counter this problem, the negative distances need to be eliminated by squaring all of the distances. This eliminates our zero total problem, but also converts all our original distances into squared distances, so that when we add them up and divide by n, we have the average squared distance, also known as the variance. In this case, the variance is expressed in units of charts squared. This creates an interpretive problem in that we no longer have the same units with which we started. To return to our original units requires that we eliminate the squared term by taking the square root, thus providing the standard deviation.
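
The steps just described (distances from the mean, squaring, averaging, taking the square root) can be traced with a short Python sketch. It is an illustration under stated assumptions: the data are transcribed from Table 2-2, the function names are hypothetical, and the division is by n as in the description above. The standard library's pstdev, which also divides by n, is used only as a cross-check.

```python
from math import sqrt
from statistics import pstdev

# Charts filed on days 1-30, transcribed from Table 2-2.
charts = [
    12, 15, 18, 12, 13, 16, 22, 15, 14, 19, 23, 26, 38, 22, 7,
    12, 15, 23, 32, 19, 12, 18, 17, 21, 20, 11, 12, 12, 18, 23,
]


def variance_dividing_by_n(values):
    """Hypothetical helper: average squared distance from the mean."""
    n = len(values)
    m = sum(values) / n
    return sum((x - m) ** 2 for x in values) / n


def std_dev_dividing_by_n(values):
    """Hypothetical helper: square root of the variance, in original units."""
    return sqrt(variance_dividing_by_n(values))


m = sum(charts) / len(charts)

# Raw distances from the mean cancel out (zero, up to floating-point error).
sum_of_distances = sum(x - m for x in charts)

variance = variance_dividing_by_n(charts)  # units: charts squared
std_dev = std_dev_dividing_by_n(charts)    # units: charts

print(f"sum of distances: {sum_of_distances:.10f}")
print(f"variance: {variance:.2f}, standard deviation: {std_dev:.2f}")
print(f"library cross-check: {pstdev(charts):.2f}")
```

For these data the standard deviation comes out at roughly 6.4 charts, so a typical day sits about six charts above or below the mean.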

Working with Samples
