Okay, this is top 10 topic number one, descriptive statistics. To start, on the Business 302L webpage we'll go to the exam review, the statistics review, which is the actual PowerPoint presentation the department has provided. Click on open, and the very first page should be about descriptive statistics.
Now remember, in statistics there are really two broad kinds of statistics. The first is descriptive statistics, which means describing a set of data using the data itself. The second broad class is inferential statistics, which means making inferences from a sample, which is usually small, to a population, which is usually large. So: descriptive statistics and inferential statistics. Descriptive statistics are sort of the easier part that you learned earlier in the semester, and inferential statistics are the harder part, because you have to compare means, compare proportions, do linear regression, check the residuals, and things like that.
So this is descriptive statistics, and there are really two key ideas. The first is measures of central tendency, or measures of central location, and the second is measures of dispersion, like the standard deviation, which we'll get to in a moment. The first big one is measures of central location, or measures of central tendency. Those are the mean, the median, and the mode. The mean is sometimes called the arithmetic mean, and we sometimes call it the average, but you shouldn't use the word "average" in statistics. We try to be very precise, so call it the mean, or call it the arithmetic mean, because the numbers are added up. The mean, the median, and the mode are similar, and we'll talk about how they're similar and different as we go through this lecture.
We'll take the mean first, starting with the population mean. It describes the population, and for things about the population we use Greek letters, so there's a Greek letter, mu, for the population mean.
If it's hard to remember the Greek letter, try using the whole words: the population mean is equal to the sum of all the x's, the sum of all of the different numbers in the population. So it's really the sum of x sub 1, x sub 2, x sub 3, and so on, through all of the different numbers. If there are 10 members in the population, you sum up all 10 numbers and then divide by the total number of numbers, which is uppercase N. Let's see if I can point to this here on the slide: uppercase N. Uppercase N is the number of observations, which would be 10 in that case. Here's an example with three numbers: 5, 1, and 6. Five plus one is six, plus six is 12, and 12 divided by 3 is 4, so that's the mean. It's the same thing with the sample mean, which we'll get to in a moment, except that we use a lowercase n rather than an uppercase N. Okay.
A little bit of algebra. The uppercase sigma here means summation, as opposed to lowercase sigma, which means standard deviation, which we'll get to later. Uppercase sigma means sum all the numbers. So the sum of x is equal to N, the number of observations, times the mean: three times four is 12. Okay. Usually you don't see it written that way. You'll see it written the way the population mean is written above: sum up all the numbers and divide by the number of observations.
Bullet point number three: the same thing goes for the sample mean. Instead of using the Greek letter mu, we use x-bar, which is an x with a bar over it. Sometimes it's written out as "x bar," but usually it's an x with a bar over it. It's the same calculation: sum all the observations and divide by the number of observations.
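The three-number example above can be checked with a few lines of Python. This is only a sketch of the arithmetic; the variable names (`population`, `N`, `mu`) are illustrative choices, not something from the slides.

```python
# Population mean: sum every value, then divide by N (the population size).
population = [5, 1, 6]      # the three numbers from the example above

N = len(population)         # uppercase N: number of observations
total = sum(population)     # uppercase sigma: "sum all the x values"
mu = total / N              # population mean

print(total, N, mu)         # 12 3 4.0

# The rearranged form from the lecture: the sum of x equals N times the mean.
print(total == N * mu)      # True
```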
It's just that we use a lowercase n because it's the sample, as opposed to uppercase N, which is the population. So the sample mean and the population mean are computed very similarly. In fact, the algebra is exactly the same. Here's an example with the number of hours spent on the Internet: 4, 8, and 9. X-bar denotes the sample mean, and it's equal to four plus eight plus nine, divided by three. Four plus eight is 12, plus nine is 21, divided by three is seven hours. Okay.
Now, when do you not use the mean? The mean is a very nice measure of where the middle of the data is, so when do you not use it? You don't use the mean if the number of observations is small or if you have really extreme values. Usually where we hit extreme values is on things like income and housing prices. For example, if we have a small group of people and Bill Gates, the chairman and founder of Microsoft, is in the group, the mean is going to be very high, even though most of the people don't have incomes like Bill Gates. We tend not to use the mean in that case because the mean is skewed too large. Here's another example of when not to use it: if three houses were sold this week and one was a mansion (a mansion is a very large house), the mean is going to be too high.
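Here is a small Python sketch of both points above: the sample mean for the Internet-hours example, and how one extreme value drags the mean upward. The house prices are made-up numbers for illustration only; the lecture does not give any.

```python
# Sample mean (x-bar) for the hours-on-the-Internet example.
hours_online = [4, 8, 9]
x_bar = sum(hours_online) / len(hours_online)   # lowercase n = 3
print(x_bar)                                    # 7.0

# Hypothetical sale prices in thousands of dollars (NOT from the lecture):
# two ordinary houses and one mansion.
house_prices = [300, 350, 5000]
mean_price = sum(house_prices) / len(house_prices)
print(round(mean_price))                        # 1883

# The mean lands far above both ordinary houses, so it describes
# none of the three sales very well.
print(mean_price > 350)                         # True
```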
So for things like income and housing prices, and you tend to see it in some other things as well, we tend to use the median rather than the mean. Let's get to the median. The median is the middle value. Think of the median as the middle of the road, as it were, like the median in the middle of a freeway. For example, take the numbers 5, 1, and 6. Step one: you first have to sort the data. 5, 1, 6 doesn't tell you where the median is if it's not sorted, so you put it in order: 1, 5, and 6. Step two: just pick the number in the middle. It's very straightforward. That's the middle value, 5. The beauty of this is that if we had, say, real estate prices with a very high number, for example 1, 5, and 60, the median would still be five, which is kind of nice. I'll show you an example near the end of this PowerPoint in a moment.
When there's an even number of observations, the median is computed by averaging the two observations in the middle. So if this were 1, 4, 5, and 6, you take the average of four and five, which is 4.5. Okay.
The fourth bullet point: you can use the median when there are extreme values, negative or positive. For example, take home sales of $100,000, $200,000, and $900,000. The mean would be $400,000: 100 plus 200 is 300, plus 900 is 1,200, and 1,200 divided by three is 400. But the median is $200,000. In other words, in this particular case the median is a better measure of central tendency, a better estimate of central tendency, than the mean. Usually the mean is just fine. You only need the median in special cases. Okay.
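The two steps above (sort, then pick the middle) can be sketched in Python as follows. The function name `median_of` is just an illustrative choice; Python's standard library also offers `statistics.median`, which gives the same results.

```python
def median_of(values):
    """Middle value of the data; averages the two middle values if n is even."""
    ordered = sorted(values)              # step 1: sort the data
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]               # step 2 (odd n): the middle value
    return (ordered[mid - 1] + ordered[mid]) / 2   # even n: average the middle two

print(median_of([5, 1, 6]))       # 5
print(median_of([1, 5, 60]))      # 5    (the extreme value does not move it)
print(median_of([1, 4, 5, 6]))    # 4.5

home_sales = [100_000, 200_000, 900_000]
mean_price = sum(home_sales) / len(home_sales)
print(mean_price, median_of(home_sales))   # 400000.0 200000
```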
On to the mode. Mode comes from the French word for fashion, or most popular. It means the most frequent value. So what's the most frequent value? For example, take a categorical variable: female, male, and female. What's the mode? The mode is female, because there are two females and one male. Bullet point number three, here's another example: 1, 1, 2, 3, 5, 8. What's the mode? The mode is one, because one appears twice. Can a set of numbers have more than one mode? Yes, it can. We don't use the mode as often as we use the median or the mean, but it's useful to know because it's a measure of central tendency. It may not be a very good measure, as you'll see in the following example.
Here's an example of some sample data: 0, 0, 5, 7, 8, 9, 12, 14, 22, and 23. It doesn't tell us what the sample is, but we'll just take the numbers as written. The sample mean is x-bar, as opposed to mu, because it's a sample: the sum of all of the x's divided by the number of observations. The sum of the x's is 100, and 100 divided by 10 is just 10. The median is 8.5, which is the average of the middle two numbers: eight plus nine is 17, divided by two is eight and a half. And the mode is 0. So the mode is not very helpful here. The mean is probably the best measure in this particular case, because there aren't too many extreme values, and secondarily you could use the median as well.
In almost all cases you can use the mean. Why? Because the mean, combined with the standard deviation, which we'll learn in a moment, are the two main characteristics that we need to compare what we see to what might have occurred in a theoretical distribution like a normal distribution. The theory that makes that work depends on being able to compute the mean correctly and being able to compute the standard deviation correctly. So quite often we use the mean, which is also known as the arithmetic mean because we're adding up the numbers. There are other kinds of means, like geometric means, as well, but those aren't on the exam. Okay.
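The mode examples and the ten-number sample above can be reproduced with Python's standard `statistics` module. This is a sketch for checking the arithmetic; the list `[1, 1, 2, 2, 3]` is a made-up example (not from the slides) added only to show a data set with two modes.

```python
from statistics import median, multimode

# multimode returns every most-frequent value, so ties are visible.
print(multimode(["female", "male", "female"]))   # ['female']
print(multimode([1, 1, 2, 3, 5, 8]))             # [1]
print(multimode([1, 1, 2, 2, 3]))                # [1, 2]  (hypothetical data)

sample = [0, 0, 5, 7, 8, 9, 12, 14, 22, 23]
x_bar = sum(sample) / len(sample)    # 100 / 10
med = median(sample)                 # average of the middle two: (8 + 9) / 2
modes = multimode(sample)

print(x_bar, med, modes)             # 10.0 8.5 [0]
```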
Relationships: there are relationships between these three numbers, the mean, the median, and the mode. Case number one: if the probability distribution is symmetrical, meaning it's bell shaped like a normal distribution, then if it's perfectly symmetric the mean equals the median equals the mode. I can tell you that in real life this doesn't occur very often, but if it were to occur, the mean would equal the median, which would equal the mode.
Case number two: the distribution is positively skewed, to the right. That's redundant, since positively skewed means to the right. For example, take the incomes of employees in a large firm: a large number of relatively low-paid workers and a small number of relatively high-paid executives. That's probably quite typical. In that case the mode is less than the median, which is less than the mean. The mode, which is the most popular number, sits among the large number of relatively low-paid workers and is less than the median, and the mean is going to be skewed high (remember, skewed positive) because of the small number of very highly paid executives in the firm. So it's important to understand the relationships between all three of those measures of central tendency.
Case number three, the last one here: the distribution is negatively skewed, to the left. That's redundant too, but "negatively skewed, to the left" is something you can say to yourself to help you remember. For example, take the time taken by students to write exams: a few students hand in their exams early, and a majority of the students turn in their exam at the end of the exam period. In that case, since the majority turn in their exam at the end, the mode is probably the highest number, so it sits over here on the right, and everything else is below the mode. So the mean is less than the median, and the median is less than the mode. Okay.
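The three cases above can be illustrated with tiny data sets. All three lists below are invented for this sketch (the lecture gives no numbers for these cases); they are chosen so that the ordering of mode, median, and mean is easy to check by hand.

```python
from statistics import median, mode

def summary(values):
    """Return (mode, median, mean) for a list of numbers."""
    return mode(values), median(values), sum(values) / len(values)

# Hypothetical data, NOT from the lecture.
symmetric = [1, 2, 3, 3, 3, 4, 5]
salaries = [30, 30, 30, 35, 40, 45, 200]       # thousands; a few high earners
exam_minutes = [60, 90, 110, 120, 120, 120]    # most students finish at the end

mo, me, mn = summary(symmetric)
print(mo == me == mn)     # True: mean = median = mode

mo, me, mn = summary(salaries)
print(mo < me < mn)       # True: positively skewed, mode < median < mean

mo, me, mn = summary(exam_minutes)
print(mn < me < mo)       # True: negatively skewed, mean < median < mode
```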
So those are the three kinds of relationships. As I said before, the two important concepts in descriptive statistics, particularly for calculations, are measures of central tendency, which is how things tend toward the middle, and measures of dispersion, or measures of variability, which is how things tend to differ from the median (excuse me, from the mean), and from each other, actually.
So, dispersion. This is on the slide "Measures of Variability," slide 10 if you're following along on the podcast. How much is the spread of the data? That's one definition. The spread of the data is also how much uncertainty there is in the data. The reason this is important is that a lot of management issues are related to reducing the amount of uncertainty, or at least understanding the degree of uncertainty, in our environment. A measure of dispersion is how far the points are away from the middle. To repeat, so you don't get the wrong definition: a measure of central tendency is where the middle of the data is, and usually we use the mean. Dispersion is how far the points are away from the actual mean that you just computed.
So how much is the spread of the data? One very simple measure is the range, which means the highest minus the lowest, and we'll talk about that in a moment. The two that you're most familiar with are the variance and the square root of the variance, which is the standard deviation. We need the standard deviation because it goes with the mean, and once we have those two, we can compare what we see in a sample to what might appear in a theoretical distribution like a normal distribution, or the t approximation to the normal distribution. Okay.
The first one is the range. The range is very straightforward: take the highest number and subtract the minimum. That's the range. The range is affected by unusual values. It has the same issue as the mean, the difference between the mean and the median from before: if there's one number that's pretty extreme, it tends to skew the results a little. It's not the end of the world, but you need to be cognizant of it. For example, if the city of Santa Monica has a high of 105 and a low of 30 once a century, the range would be 105 minus 30, which is 75. But really that range is a little off. It should be closer to something like 105 down to 90 or 80 or 70, because it doesn't hit 30 very often. Okay. So the range is just the maximum number and the minimum number, and sometimes we subtract the two to give a single number, like 75: 105 minus 30.
The standard deviation is better than the range because it uses all of the data. The standard deviation, again, is a measure of dispersion. In other words, before, we calculated the mean. Now what we want to know is, on average, how far the points are away from the mean. It's better than the range because we use all the data. The population standard deviation is the square root of the variance, and the letter for the standard deviation is lowercase sigma (so the variance is sigma squared). Don't confuse that with uppercase sigma, which is the summation sign. We compute the variance first, then take the square root, and we want the positive square root. That's the standard deviation. The standard deviation is never negative. Okay. One of the nice things about the standard deviation is that it's a measure of distance.
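As a quick sketch of the range calculation above: the 105 and 30 come from the Santa Monica example, while the list of "typical" temperatures is made up here purely to show how a single unusual value stretches the range.

```python
def value_range(values):
    """Range = highest value minus lowest value."""
    return max(values) - min(values)

print(value_range([105, 30]))            # 75

# Hypothetical temperature readings (NOT from the lecture).
typical = [105, 98, 92, 88, 85]
with_rare_low = typical + [30]           # add the once-a-century low

print(value_range(typical))              # 20
print(value_range(with_rare_low))        # 75: one unusual value sets the range
```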
Remember the normal distribution: the mean is in the middle. How far do we move along the line, and therefore how much area under the normal curve do we cover? The measure of how far we move along the line is called a standard deviation. It's like miles and feet and inches. It's just that we're not measuring in that kind of unit, we're measuring distance in standard deviation units. Okay.
One of the nice things in the normal distribution is that we have something called the empirical rule. This applies to normal distributions, or bell-shaped curves, and the PowerPoint also calls them mound-shaped curves. What we know from this rule is that in normal distributions 68% of the data lie within plus or minus one standard deviation of the mean. So that's kind of far out from the mean on both sides, but not too far out. We also know that 95% of the data fall within plus or minus two standard deviations of the mean. This is very commonly used in statistics, knowing whether or not you're two standard deviations out from the mean. The last one is that 99.7% of the data lie within plus or minus three standard deviations of the mean. We usually don't go much farther than that, because you start getting into percentages that come very close to 100 percent. In operations they talk about Six Sigma, which is six standard deviations, and that's just an extension of this idea. But for your exam, you need to know that 68% of the data is within plus or minus one standard deviation of the mean, 95% is within plus or minus two, and 99.7%, in other words nearly all of it, is within plus or minus three. Okay.
Here's the famous formula, on slide 14, for the standard deviation. The standard deviation is the square root of the variance. You'd expect a lowercase s, since we're talking about a sample here. Actually, I guess that's an uppercase S on the slide in this particular case, but it really should be a lowercase s because it's a sample standard deviation. If it were a population standard deviation we would use sigma, lowercase sigma. So the sample standard deviation is the square root of the following: take each one of the numbers minus the mean of all those numbers, because remember, we want to see how far the numbers are away from the mean. So subtract the mean from each one of the numbers, square that, sum those up, and divide through by n minus one, where n is the number of observations, the number of numbers in the list. That quantity, without the square root sign, is called the variance. Take the square root and it's called the standard deviation. Okay.
Here's an example. This might be a little hard to see on the podcast. It's on slide 15, and if you go back and review the podcast with the PowerPoint slides you can take a look. The first column is x, a bunch of numbers. The next column is x minus x-bar, and the last column is x minus x-bar, squared. There usually isn't a lot of time to do fancy calculations on the exam, but you do need to know some of the basics to be able to do this, and this is one of those basics. Remember, we use computers when the number of observations is pretty large, when you may have a thousand numbers or 10,000 numbers, but you should be able to calculate simple elementary statistics when you may only have four, five, or six numbers. So the first column is x, and the numbers are 6, 6, 7, 8, and 13.
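Before the numbers are worked through, here are the empirical rule and the slide 14 formula written out in standard notation. The slide itself is not reproduced in this transcript, so treat this as the usual textbook form rather than a copy of it; the population formula (dividing by N) is included only for comparison with the sample formula (dividing by n minus one).

```latex
% Empirical rule for a normal (bell-shaped) distribution
P(\mu - 1\sigma \le X \le \mu + 1\sigma) \approx 0.68 \\
P(\mu - 2\sigma \le X \le \mu + 2\sigma) \approx 0.95 \\
P(\mu - 3\sigma \le X \le \mu + 3\sigma) \approx 0.997

% Sample variance and sample standard deviation
s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1},
\qquad
s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}

% Population standard deviation, for comparison
\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}
```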
The sum of that is 40, and the mean is 40 divided by the number of observations, five, which is equal to 8. In the next column we subtract the mean from each one of those numbers: six minus eight is -2, six minus eight is -2, seven minus eight is -1, eight minus eight is zero, and 13 minus eight is five. When we sum all those up, by definition they have to add to zero, because we computed the mean from those numbers. If we subtract the mean from each of them and add up the results, we're going to get zero.
Then you square each number, because we've got to get rid of the minus signs (there are other reasons too, but they're not important right now). So minus two times minus two is four, minus two times minus two is four, minus one times minus one is one, zero times zero is zero, and five times five is 25. That's the x minus x-bar, squared, column. When we sum them up, that's 34. On the next page it says that the total variation is 34, which is the sum of all the x minus x-bar, squared. Then we divide that by n minus one. n was five, so n minus one is four. We take 34 and divide by four and get 8.5, which is the sample variance. The sample standard deviation is the square root of the variance, the square root of 8.5. I don't have a calculator in front of me, so I'm just reading it off the screen here. You will have a calculator on the exam, and a simple one will do. This is why you need one with a square root key. The square root of 8.5 is about 2.9. Okay.
The next slide, slide 17, "Measures of Variability": here's another example. The hourly wages earned by a sample of five students are $7, $5, $11, $8, and $6. What's the range? The range again is the highest minus the lowest, the maximum minus the minimum: 11 minus 5 is 6. And sometimes the range is not a complete answer.
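The three columns from slide 15 and the wage range above can be reproduced step by step in Python. This is a sketch for checking the hand arithmetic, with variable names chosen here for readability; it assumes the sample formula (divide by n minus one) described in the lecture.

```python
from math import sqrt

x = [6, 6, 7, 8, 13]                       # first column: x
n = len(x)
x_bar = sum(x) / n                         # 40 / 5 = 8.0

deviations = [xi - x_bar for xi in x]      # second column: x minus x-bar
squared = [d ** 2 for d in deviations]     # third column: (x minus x-bar) squared

print(deviations)                          # [-2.0, -2.0, -1.0, 0.0, 5.0]
print(sum(deviations))                     # 0.0 (deviations from the mean sum to zero)

total_variation = sum(squared)             # 4 + 4 + 1 + 0 + 25 = 34.0
variance = total_variation / (n - 1)       # 34 / 4 = 8.5  (sample variance)
std_dev = sqrt(variance)                   # sample standard deviation

print(total_variation, variance, round(std_dev, 1))   # 34.0 8.5 2.9

# Range of the hourly wages from slide 17.
wages = [7, 5, 11, 8, 6]
wage_range = max(wages) - min(wages)
print(wage_range)                          # 6
```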
So we continue on: we want to compute the variance. The variance is s squared. Remember, s is the standard deviation, so its square is the variance: s squared, or sigma squared for a population. In this case it's a sample, so we use s squared. It's equal to the sum of x minus x-bar (which is the sample mean), squared, divided by n minus one. I won't read all those calculations over the podcast, but if you do them you come up with 21.2 divided by five minus one, which is 5.30. That's the variance. To find the sample standard deviation we take the square root of that and come up with 2.30. Okay. So those are the measures of dispersion for that particular data set: variance and standard deviation. If you know one, you know the other.
A couple more ideas are associated with descriptive statistics, under something called exploratory data analysis. That's a fancy name. It just means exploring your data in a graphical kind of way. Graphics can include a table as well as a chart, and it's really important to understand, if you have two variables, for example, how you are going to display them.
And so you always have to remember the kinds of variables, like we studied in top 10 topic number 10: are they categorical variables, sometimes called nominal or qualitative variables, or are they quantitative variables, like interval and ratio variables, or other continuous numbers like dollars or units? Okay.
Some of the graphical tools on slide 18 relate to this. The first bullet point: use a line chart if you're drawing a trend over time. That means the x-axis, the horizontal axis, has the time, and the y-axis has some other number, usually a continuous one. It doesn't have to be a continuous number. It can be a discrete number as well. Just make sure that the x-axis is time. So that's a line chart.
Don't confuse that with the next one, which is a scatter diagram or scatter chart. In that case you're showing a relationship between two variables, two quantitative variables, which are both generally continuous amounts. So for two quantitative variables, use a scatter plot.
When one of the variables is quantitative and the other is qualitative, we usually make a bar chart, sometimes called a column chart as well. A bar chart has the discrete elements of the qualitative variable on the x-axis, the horizontal axis, and the length of the bar on the y-axis is some kind of continuous variable. It doesn't have to be a continuous number, but it usually is. Instead of a bar chart, sometimes a pie chart is used for this, but that's really a bad choice. Statisticians don't use pie charts, because it's difficult for humans to judge the area of a pie slice very easily, and statistics is about communicating data. We're always sensitive to whether people will be able to interpret what we show, so we try to avoid the pie chart, even though, yes, I know it's in Excel. Okay, so stick with a bar chart.
Number four is a histogram. A histogram is kind of like a bar chart.
But it's a little bit different from a bar chart. A histogram shows the frequency for each class of the measured data. The maximum minus the minimum, divided by the number of classes, gives the class width, so the width of a particular bar, if you will, in the histogram can vary based on how wide the classes are. The histogram is the basis for how we build up what we see as random variables: what we observe are outcomes of random variables, and we build that up toward the central limit theorem to get things like the normal distribution, which you should know. So that's the histogram: the frequency for each class, for each segment, of the measured data.
The last one on the list is a box plot. A box plot isn't used as often as a scatter plot or a bar chart. A box plot is a graphical display based on quartiles, which means chunking up all of the numbers into four parts, or, as the PowerPoint says, it divides the data into four parts. The median really divides the data into two parts, and quartiles are a calculation that divides it into four parts. In other words, the median cuts the data into two parts, and then taking the median of each of those halves cuts each one into another two parts. Two times two is four, so we call them quartiles, and a box plot can display this on the screen for each particular set of data. Okay.
And that's top 10 concept number one, descriptive statistics.
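To make the histogram and quartile descriptions above concrete, here is a small Python sketch. The data set and the choice of four classes are invented for illustration. The sketch assumes equal-width classes, and it finds quartiles as "the median of each half," which is only one of several quartile conventions in use; other textbooks and software may give slightly different values.

```python
from statistics import median

# Hypothetical measurements (NOT from the lecture).
data = [2, 3, 5, 7, 8, 9, 12, 14, 18, 22, 23, 30]
num_classes = 4                                  # assumed number of classes

# Histogram: class width = (maximum - minimum) / number of classes.
low, high = min(data), max(data)
width = (high - low) / num_classes               # (30 - 2) / 4 = 7.0

counts = [0] * num_classes
for value in data:
    # Index of the class this value falls in; the maximum goes in the last class.
    index = min(int((value - low) / width), num_classes - 1)
    counts[index] += 1

print(width, counts)                             # 7.0 [5, 3, 2, 2]

# Quartiles: the median splits the data in two, then the median of
# each half splits it again, giving four parts.
ordered = sorted(data)
half = len(ordered) // 2
q2 = median(ordered)                             # overall median
q1 = median(ordered[:half])                      # median of the lower half
q3 = median(ordered[-half:])                     # median of the upper half

print(q1, q2, q3)                                # 6.0 10.5 20.0
```

The five numbers a box plot usually draws would then be the minimum, q1, q2, q3, and the maximum.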