MAT/543 MAT543 MAT 543 WEEK 2 Homework
- strayer university / MAT 543
- 30 Mar 2018
Week 2 Homework
- Homework
- Chapter 2: Exercises 2-1 through 2-5 (page 48 of the text)
LEARNING OBJECTIVE 1: CALCULATING AND USING DESCRIPTIVE STATISTICS
Statistics is a term that usually evokes
dread and discomfort in students and practitioners alike. This is probably
because statistics holds a close relationship to mathematics, although the two
are distinct. Statistics uses mathematical relationships between data to allow
managers to make decisions about both the data itself and about the likelihood
that sampled data represent a broader and more generalized trend or population.
Statistics, however, need not be difficult. For managers, some simple
categorizing of techniques can help focus statistical analysis in ways that are
easily understood and applied.
Managerial statistics have three primary
functions. The first function is to describe certain data elements, such as the
number of births over a time period or the expenses incurred for a service
unit. The second function is to compare two points of data, such as births from
1 year to the next or error rates between care sites. The third is to predict
data, such as visit volume in future months. This chapter examines the first
two, saving prediction for Chapters 5, 6, and 7, given its specialized nature.
First, however, a discussion about the nature of data is in order. Data are
quite simply numbers within a context. Green, although a very nice color, is
only an adjective by itself. If, however, we record the eye color of a room of
20 people, and then code those colors with numbers—i.e., 1 for blue, 2 for
brown, 3 for green—we have transformed the colors into points of data.
Similarly, if we were to then count the number of people with green eyes, we
have performed a statistical function—the calculation of a descriptive
statistic.
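The eye-color example can be sketched in a few lines of Python. The list of observations below is hypothetical (the text does not list the 20 recorded colors); only the coding scheme of 1 for blue, 2 for brown, and 3 for green comes from the passage above.

```python
from collections import Counter

# Hypothetical observations: the text does not give the actual 20 eye colors.
observed = ["blue", "brown", "green", "brown", "blue", "green", "brown", "brown"]

# Coding scheme from the text: 1 for blue, 2 for brown, 3 for green.
codes = {"blue": 1, "brown": 2, "green": 3}
coded = [codes[color] for color in observed]

# Counting how often a code occurs is a simple descriptive statistic.
counts = Counter(coded)
green_count = counts[codes["green"]]

print("Coded data:", coded)
print("People with green eyes:", green_count)
```

The numbers 1, 2, and 3 here are only labels. Their size and order carry no meaning, which is why counting is the appropriate operation for them.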
A number of different types of analyses can
be performed on data. When the time comes to conduct these analyses, students
often face “analysis paralysis.” Imagine, for example, that you find yourself
in a new position, and you have been asked by your boss, Mr. Walden, to take a
look at some utilization trends using data from the organization’s data
warehouse. You pull up the corresponding data files, and are faced with more
than 30 variables (columns in a spreadsheet) and tens of thousands of records
(rows in a spreadsheet representing individual patient visits). In the middle
is a sea of numbers of various sorts that continue on as you scroll and scroll
down the screen. In some organizations, there may be millions of data records
in thousands of tables, which is intimidating to be sure. However, a data file
with 10,000 rows and one with 20 are not all that different. Each can be
described in similar ways. What is important is the data itself. Understanding
what the numbers represent (the context) and how they were created will lead
you down certain analytic paths and not others, allowing you to put some
statistical methods aside for some data.
Measuring Data
Data come in only four varieties. Students of
introductory statistics will no doubt recall the terms nominal, ordinal,
interval, and ratio. All refer to measurement of data variables. Variables are
simply data that can take on different values, depending on what is being
measured. In the earlier example, the color of 20 people’s eyes was recorded,
thus creating the variable “eye color.” In this instance it is a variable that
is measured nominally. Nominal refers to data that exist in non-overlapping
categories. They have no ranking and are mutually exclusive; for example, eye
color, insurance type, gender, and ethnicity. Ordinal variables are slightly
different in that they are still measured categorically, but the categories
have a ranking. An example of this would be satisfaction scales, in which
somewhat satisfied might be followed by very satisfied, etc. These are common
in health surveys. The final two types are often taken together as
interval/ratio variables. These are often termed continuously measured
variables; examples include time and money. The difference here is that they
are actually still categories, but the distance between categories is equal.
Think of a time scale as derived in seconds—one second, two seconds, etc. We
could derive smaller increments if we so wished, creating fractions of seconds
as is often done in Olympic time trials and racing. The increments ultimately
do not matter, however. What does matter is that the distance between them is
equal. This allows mathematical calculations on these forms of data. The
difference between interval and ratio data has to do with the presence of a
meaningful zero when measuring a ratio variable, a distinction not important
for this discussion.
At this point the insightful student might
realize that examining measurement provides two distinct types of
variables—those that have equal distances between measurement points and those
that do not. Often, these distinctions are recognized by labeling nominal and
ordinal data as categorical, and interval/ratio data as continuous. We too will
follow this convention. Ordinal data present a unique measurement form. It is
important to understand the type of data you are working with because each is
analyzed differently.
Descriptive Statistics with One Variable (Univariate)
First, examine descriptive statistics in
relation to categorical data. Table 2-1 provides data on 14 patients, recording
their insurance type.
Insurance type is a categorical variable. The
categories are not ranked, nor is there any relationship among them. Patients
usually claim a type of primary insurance (or lack thereof) upon visit. To
describe the data, we are limited to only a handful of techniques. The first is
to simply count. Here we can count total patients or patients by the type of
insurance they have. The second, which requires a bit of mathematics to be
conducted first, is to create percentages for the number of persons falling
into each category. A percentage is simply the number of persons in a category
divided by the total number of persons, multiplied by 100. Not multiplying by
100 is also correct, although this provides a decimal fraction and not a
percentage. For example, we may wish to know how many people reported having
United as their insurer. One way to summarize this would be to count, which
amounts to three individuals in Table 2-1. To calculate a percentage, we would
divide that 3 by 14, which gives us 0.21, or 21% of patients. Percentages and
fractions provide slightly more information than do counts. Inherent in them is
the context of the whole. If we tell you three patients had United insurance,
you may still wonder if that is a lot, not many, or a modest amount; but if we
say 21% of patients had United, you now have some sense of the entire group of
patients, although we have not provided the total. Here, providing the total in
addition to the percentage would provide both the count and the total, creating
a more complete picture of the data being described. Listing counts and
percentages of categorical data is also called creating frequencies from the
data. We could graph the data at this point and obtain a visual representation
of how frequently patients used various types of insurance as we have done in
Figure 2-1. From a descriptive standpoint, this is the limit of analyzing a
singular categorical variable.
Table 2-1 Insurance Type by Patient
Patient   Insurance Type
1         United
2         Medicare
3         Medicaid
4         Medicare
5         BC/BS
6         United
7         BC/BS
8         BC/BS
9         Medicaid
10        Uninsured
11        Medicare
12        Uninsured
13        United
14        MBCA
Figure 2-1 Patient Insurance by Type
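As a minimal sketch of how the frequencies described above might be produced, the following Python snippet counts the 14 patients from Table 2-1 by insurance type and converts each count to a percentage. The variable names are illustrative only.

```python
from collections import Counter

# Insurance type for patients 1 through 14, as listed in Table 2-1.
insurance = [
    "United", "Medicare", "Medicaid", "Medicare", "BC/BS", "United", "BC/BS",
    "BC/BS", "Medicaid", "Uninsured", "Medicare", "Uninsured", "United", "MBCA",
]

total = len(insurance)          # total number of patients
counts = Counter(insurance)     # count of patients in each category

# Percentage = count in category / total patients * 100.
for insurer, count in counts.most_common():
    pct = 100 * count / total
    print(f"{insurer:<10} {count:>2} {pct:5.1f}%")

united_pct = 100 * counts["United"] / total
```

For United, this gives 3 of 14 patients, or about 21%, matching the worked example in the text.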
The second type of data you may wish to
describe are those measured as interval/ratio variables. These are also
commonly referred to as continuous data. Again, these are actually categorical
data as well, but the categories are of equal size. Examples include variables
such as time, money, height, and weight. The equal distances between categories
are what allow for mathematical analysis of these data. So, for example, adding
one dollar to two dollars adds the same amount as adding one dollar to ten
dollars. This allows us to calculate a number of descriptive measures that
examine the centrality of the data and its spread, which are both useful for
our purposes. We first examine measures of central tendency.
Measures of Central Tendency
As described, data that are collected across
a number of observations vary from observation to observation; thus, the term
variable. Plotting these data reveals both the spread and the clustering of
individual observations. An example is given in Figure 2-2.
From these data, we can see that the number
of charts filed appears to be centered largely between 10 and 30 per day, with a
few days of higher volume, and one with lower volume. What would be helpful for
analytic purposes would be to have a set of summary statistics to describe the
data. The statistic that is the mathematical center of a data set is the
average, or mean. It is the foundation for many other statistical concepts as
well. To calculate the mean, simply add up all the values and divide by n,
which is the number of observations. We can also find the median, which is the
center of the distribution of data when all the observations are arranged from
lowest to highest. The mode is the most frequently reported data value. Given
our data from Figure 2-2, Table 2-2 reports these measures of central tendency.
Figure 2-2 Port City Hospital Daily Chart Filings per Day
Figure 2-2A Port City Hospital Daily Chart Filings per Day Part 2
Table 2-2 Port City Hospital Daily Chart Filings per Day
Day   Charts Filed   Day   Charts Filed
1     12             16    12
2     15             17    15
3     18             18    23
4     12             19    32
5     13             20    19
6     16             21    12
7     22             22    18
8     15             23    17
9     14             24    21
10    19             25    20
11    23             26    11
12    26             27    12
13    38             28    12
14    22             29    18
15    7              30    23

Mean     18
Median   18
Mode     12
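The three measures of central tendency can be checked with Python's standard `statistics` module. This is a sketch using the 30 daily values exactly as they are listed in Table 2-2. The unrounded mean is 17.9 and the unrounded median is 17.5, which the table appears to report rounded to 18.

```python
import statistics

# Charts filed on days 1-30, as listed in Table 2-2.
charts_filed = [
    12, 15, 18, 12, 13, 16, 22, 15, 14, 19, 23, 26, 38, 22, 7,
    12, 15, 23, 32, 19, 12, 18, 17, 21, 20, 11, 12, 12, 18, 23,
]

n = len(charts_filed)

# Mean: add up all the values and divide by n.
mean = sum(charts_filed) / n

# Median: middle of the sorted values (average of the two middle
# values when n is even, as it is here).
median = statistics.median(charts_filed)

# Mode: the most frequently reported value.
mode = statistics.mode(charts_filed)

print(f"n = {n}, mean = {mean}, median = {median}, mode = {mode}")
```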
In this instance, the median and mean are the
same value, 18. The mode is 12. Had the mean been higher than the median, it
would indicate that there were some high values of the data that were pulling
the mean upward. Examine the following range of income values: $13,000,
$25,000, $33,000, $42,000, $56,000. The mean of these data is $33,800. The
median is $33,000. If, however, we replace the value of $56,000 with $120,000,
notice what happens. The median is still $33,000, yet the mean increases to
$46,600. This is because the median is not dependent on all other values in the
distribution. It is what we call a robust measure, or one that is resistant to
outlying values. The mean is not robust, as we demonstrated. When examining
data distributions, it is sometimes helpful to look at both the mean and the
median. Doing so can quickly tell you something about the presence of outlying
values and the spread of the data.
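The income example can be reproduced directly. The sketch below uses the five income values from the text, then swaps $56,000 for $120,000 to show that the mean moves while the median does not.

```python
import statistics

# Income values from the text.
incomes = [13_000, 25_000, 33_000, 42_000, 56_000]

# Same data with the highest value replaced by an outlying one.
incomes_outlier = [13_000, 25_000, 33_000, 42_000, 120_000]

mean_before = statistics.mean(incomes)
median_before = statistics.median(incomes)

mean_after = statistics.mean(incomes_outlier)
median_after = statistics.median(incomes_outlier)

print("Before:", mean_before, median_before)
print("After: ", mean_after, median_after)

# The gap between mean and median is a quick hint that outlying
# values are present.
gap_after = mean_after - median_after
```

The mean rises from $33,800 to $46,600, while the median stays at $33,000 in both cases.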
Measures of Spread
Although the mean, median, and mode tell us
something about the middle or centrality of the data, we may also be interested
in how varied and spread out the data are. This is both helpful to understand
the range of data values, and also to examine the possibility of outlier values
that might be affecting our measures of central tendency. Examine again the
data in Table 2-2 and Figure 2-2. We know that the mean of these data is 18
charts filed on average. We also know that there are many days when the number
of charts filed exceeds 18 per day and also falls short of 18 per day. The
maximum and minimum values tell us this, and are important measures for
summarizing our data. Here they are 38 and 7, respectively. Their difference,
or 31 (38–7), is what is known as the range. Examining the range in addition to
the measures of central tendency allows a clearer picture of the data
distribution (even without a graph!).
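As a brief sketch, the minimum, maximum, and range can be computed from the daily values as they are listed in Table 2-2.

```python
# Charts filed on days 1-30, as listed in Table 2-2.
charts_filed = [
    12, 15, 18, 12, 13, 16, 22, 15, 14, 19, 23, 26, 38, 22, 7,
    12, 15, 23, 32, 19, 12, 18, 17, 21, 20, 11, 12, 12, 18, 23,
]

lowest = min(charts_filed)      # smallest daily count
highest = max(charts_filed)     # largest daily count

# Range: the difference between the maximum and the minimum.
data_range = highest - lowest

print(f"min = {lowest}, max = {highest}, range = {data_range}")
```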
There is one final measure of spread that
should be considered. If we were to draw a line at the mean in Figure 2-2 we
would see that about half of the data points were clustered above and half
below (and although this is always true of the median, it need not always be so
for the mean) (see Figure 2-2A). Here we see that some points lie closer to
the mean than others, whereas some lie on the mean. Thus, each point of
observation lies some distance from the mean, whether positive or negative.
What would be interesting is to know how far from the mean are the data on
average. The final summary measure of spread does this, which is the standard
deviation. Simply put, the standard deviation is the average distance of a
given data point to its mean. In the chart filing example, we are asking on
average how far do the data points diverge from their mean, which is 18? To do
this we could start by measuring the distance of each point from the mean,
summing those distances, and then dividing by n to get the average. But wait.
Because the mean is the mathematical average of all the points, the positive
and negative distances will total zero when summed. Dividing zero by n simply
gives zero, which tells us nothing about spread. So, to counter this problem,
the negative distances need to be eliminated by squaring all of
the distances. This eliminates our zero total problem, but also converts all
our original distances into squared distances, so that when we add them up and
divide by n, we have the average squared distance, also known as the
variance. In this case, the variance is interpreted in units of
charts filed squared. This creates an interpretive problem in that we no
longer have the same units with which we started. To return to our original
units requires that we eliminate the squared term by taking the square root,
thus providing the standard deviation.
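The steps just described can be traced in code. This is a sketch that follows the text literally: it divides by n, which is what the standard library calls the population variance (`statistics.pvariance`) and population standard deviation (`statistics.pstdev`). The data are the daily values as listed in Table 2-2, and the unrounded mean of 17.9 is used rather than 18.

```python
import math
import statistics

# Charts filed on days 1-30, as listed in Table 2-2.
charts_filed = [
    12, 15, 18, 12, 13, 16, 22, 15, 14, 19, 23, 26, 38, 22, 7,
    12, 15, 23, 32, 19, 12, 18, 17, 21, 20, 11, 12, 12, 18, 23,
]

n = len(charts_filed)
mean = sum(charts_filed) / n

# Step 1: distance of each point from the mean (positive or negative).
deviations = [x - mean for x in charts_filed]

# Step 2: the raw distances cancel out, so their sum is (numerically) zero.
deviation_total = sum(deviations)

# Step 3: square each distance to remove the negative signs.
squared = [d ** 2 for d in deviations]

# Step 4: average the squared distances to get the variance
# (units: charts filed squared).
variance = sum(squared) / n

# Step 5: take the square root to return to the original units.
std_dev = math.sqrt(variance)

print(f"sum of deviations = {deviation_total:.10f}")
print(f"variance = {variance:.2f}, standard deviation = {std_dev:.2f}")

# Cross-check against the standard library's divide-by-n versions.
lib_variance = statistics.pvariance(charts_filed)
lib_std_dev = statistics.pstdev(charts_filed)
```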
Working with Samples