Tuesday, December 10, 2019
Descriptive Statistics free essay sample
There are two main branches of statistics: descriptive and inferential. Descriptive statistics is used only to describe a set of information that has been collected. Inferential statistics is used to make predictions or comparisons about a larger group (a population) using information gathered about a small part of that population. Thus, inferential statistics involves generalizing beyond the data, something that descriptive statistics does not do.
Other distinctions are sometimes made between data types.
• Discrete data are whole numbers, and are usually a count of objects. (For instance, one study might count how many pets different families own; it wouldn't make sense to have half a goldfish, would it?)
• Measured data, in contrast to discrete data, are continuous, and thus may take on any real value. (For example, the amount of time a group of children spent watching TV would be measured data, since they could watch any number of hours, even though their watching habits will probably be some multiple of 30 minutes.)
• Numerical data are numbers. Categorical data have labels (i.e., words). (For example, a list of the products bought by different families at a grocery store would be categorical data, since it would go something like {milk, eggs, toilet paper, ...}.)
Scales of Measurement
Statistical information, including numbers and sets of numbers, has specific qualities that are of interest to researchers. These qualities, including magnitude, equal intervals, and absolute zero, determine what scale of measurement is being used and therefore what statistical procedures are best. Magnitude refers to the ability to know if one score is greater than, equal to, or less than another score. Equal intervals means that adjacent possible scores are an equal distance apart. And finally, absolute zero refers to a point where none of the measured quantity exists, so that a true score of zero can be assigned.
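The way the three qualities determine the scale, which the next paragraph walks through level by level, can be written as a small lookup table. This is a minimal Python sketch added for illustration, not part of the original essay: the names `Scale` and `classify_scale` are hypothetical, and it assumes the four scales are cumulative exactly as described (each level adds one quality), so any other combination of qualities is rejected.

```python
from enum import Enum


class Scale(Enum):
    NOMINAL = "nominal"
    ORDINAL = "ordinal"
    INTERVAL = "interval"
    RATIO = "ratio"


# (magnitude, equal intervals, absolute zero) -> scale.
# Assumed cumulative hierarchy: each level adds one quality to the level below.
_SCALES = {
    (False, False, False): Scale.NOMINAL,
    (True, False, False): Scale.ORDINAL,
    (True, True, False): Scale.INTERVAL,
    (True, True, True): Scale.RATIO,
}


def classify_scale(magnitude: bool, equal_intervals: bool, absolute_zero: bool) -> Scale:
    """Return the scale of measurement implied by the three qualities."""
    key = (magnitude, equal_intervals, absolute_zero)
    if key not in _SCALES:
        raise ValueError(f"{key} is not one of the four scales described")
    return _SCALES[key]


# Examples taken from the text.
print(classify_scale(False, False, False))  # names on an organizational chart
print(classify_scale(True, True, False))    # temperature in degrees Celsius
print(classify_scale(True, True, True))     # age in years
```

Encoding the scales as a table rather than a chain of `if` statements makes the "none, magnitude, plus equal intervals, plus absolute zero" progression visible at a glance.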
When we combine these three scale qualities, we can determine that there are four scales of measurement. The lowest level is the nominal scale, which represents only names and therefore has none of the three qualities. A list of students in alphabetical order, a list of favorite cartoon characters, or the names on an organizational chart would all be classified as nominal data. The second level, called the ordinal scale, has magnitude only: it covers any set of data that can be placed in order from greatest to least but that has no absolute zero and no equal intervals. Examples of this type of scale include Likert scales and the Thurstone technique. The third type of scale is called an interval scale, and possesses both magnitude and equal intervals, but no absolute zero. Temperature in degrees Fahrenheit or Celsius is a classic example of an interval scale because we know that each degree is the same distance apart and we can easily tell if one temperature is greater than, equal to, or less than another. These temperature scales, however, have no absolute zero: 0 °F or 0 °C is an arbitrary point, not a point where temperature does not exist. Finally, the fourth and highest scale of measurement is called a ratio scale. A ratio scale contains all three qualities and is often the scale that statisticians prefer because the data can be more easily analyzed. Age, height, weight, and scores on a 100-point test would all be examples of ratio scales. If you are 20 years old, you not only know that you are older than someone who is 15 years old (magnitude) but you also know that you are five years older (equal intervals). With a ratio scale, we also have a point where none of the scale exists; when a person is born his or her age is zero.
Random Sampling
The first statistical sampling method is simple random sampling. In this method, each item in the population has the same probability of being selected as part of the sample as any other item.
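Drawing a simple random sample needs only Python's standard `random` module. The sketch below is an illustration, not a procedure prescribed by the essay, and the function name `simple_random_sample` is hypothetical. Without replacement it uses `random.Random.sample`, so each item can appear at most once; with replacement it uses `random.Random.choices`, so an item may be drawn again. Passing a seeded `random.Random` makes a draw reproducible.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def simple_random_sample(
    population: Sequence[T],
    n: int,
    *,
    with_replacement: bool = False,
    rng: Optional[random.Random] = None,
) -> List[T]:
    """Give every item in `population` the same chance of selection."""
    rng = rng if rng is not None else random.Random()
    if with_replacement:
        # The item goes "back into the hat" and may be drawn more than once.
        return rng.choices(population, k=n)
    # The item is removed once drawn, so it appears at most once.
    return rng.sample(population, n)


# Draw 5 test inputs from the valid range 1-100.
valid_inputs = range(1, 101)
print(simple_random_sample(valid_inputs, 5, rng=random.Random(42)))
```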
For example, a tester could randomly select 5 inputs to a test case from the population of all possible valid inputs within a range of 1-100 to use during test execution. To do this, the tester could use a random number generator or simply put each number from 1-100 on a slip of paper in a hat, mix them up, and draw out 5 numbers. Random sampling can be done with or without replacement. If it is done without replacement, an item is not returned to the population after it is selected and thus can occur only once in the sample.
Systematic Sampling
Systematic sampling is another statistical sampling method. In this method, every kth element from the list is selected as the sample, starting with an element randomly selected from the first k elements. For example, if the population has 1000 elements and a sample size of 100 is needed, then k would be 1000/100 = 10. If number 7 is randomly selected from the first ten elements on the list, the sample would continue down the list selecting the 7th element from each group of ten elements (the 7th, 17th, 27th, and so on). Care must be taken when using systematic sampling to ensure that the original population list has not been ordered in a way that introduces any non-random factors into the sampling. An example of systematic sampling would be if the auditor of the acceptance test process selected the 14th acceptance test case out of the first 20 test cases in a random list of all acceptance test cases to retest during the audit process. The auditor would then keep adding twenty and select the 34th test case, 54th test case, 74th test case and so on to retest until the end of the list is reached.
Stratified Sampling
The statistical sampling method called stratified sampling is used when each subgroup within the population needs to be represented in the sample. The first step in stratified sampling is to divide the population into subgroups (strata) based on mutually exclusive criteria.
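The whole stratified procedure (split the population into mutually exclusive strata, then take a random or systematic sample from each, sized in proportion to the stratum) can be sketched in Python, and the same sketch illustrates the every-kth rule from the systematic sampling description above. This is a hypothetical sketch, not the essay's own method: the function names, the largest-remainder rounding used when proportional quotas are not whole numbers, and the 200-customer population in the usage example are all assumptions.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def systematic_sample(items: Sequence[T], n: int, rng: random.Random) -> List[T]:
    """Take every k-th item (k = len // n) after a random start among the first k."""
    if n == 0:
        return []
    k = len(items) // n
    if k < 1:
        raise ValueError("sample size exceeds the number of items")
    start = rng.randrange(k)  # 0-based, so start=6 is "number 7" in the text
    return [items[start + i * k] for i in range(n)]


def proportional_allocation(sizes: Dict[Hashable, int], n: int) -> Dict[Hashable, int]:
    """Split n draws across strata in proportion to their sizes.

    Shares are floored; leftover draws go to the strata with the largest
    remainders (an assumed rounding rule).
    """
    total = sum(sizes.values())
    quotas = {s: n * size // total for s, size in sizes.items()}
    leftover = n - sum(quotas.values())
    by_remainder = sorted(sizes, key=lambda s: (n * sizes[s]) % total, reverse=True)
    for s in by_remainder[:leftover]:
        quotas[s] += 1
    return quotas


def stratified_sample(
    population: Sequence[T],
    stratum_of: Callable[[T], Hashable],
    n: int,
    method: str = "random",
    rng: Optional[random.Random] = None,
) -> Dict[Hashable, List[T]]:
    """Divide into mutually exclusive strata, then sample each in proportion."""
    rng = rng if rng is not None else random.Random()
    strata: Dict[Hashable, List[T]] = defaultdict(list)
    for item in population:
        strata[stratum_of(item)].append(item)  # each item lands in exactly one stratum
    quotas = proportional_allocation({s: len(m) for s, m in strata.items()}, n)
    sample = {}
    for s, members in strata.items():
        if method == "random":
            sample[s] = rng.sample(members, quotas[s])
        elif method == "systematic":
            sample[s] = systematic_sample(members, quotas[s], rng)
        else:
            raise ValueError(f"unknown method: {method}")
    return sample


# Hypothetical population of 200 customers in the survey's proportions.
customers = (
    [("manager", i) for i in range(20)]
    + [("user", i) for i in range(120)]
    + [("operator", i) for i in range(50)]
    + [("dba", i) for i in range(10)]
)
picked = stratified_sample(customers, lambda c: c[0], 40, rng=random.Random(1))
print({role: len(group) for role, group in picked.items()})
# {'manager': 4, 'user': 24, 'operator': 10, 'dba': 2}
```

With 40 draws over the 10% / 60% / 25% / 5% split, every quota comes out whole (4, 24, 10 and 2), so the rounding rule only matters for populations that do not divide evenly.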
Random or systematic samples are then taken from each subgroup. The sampling fraction for each subgroup may be taken in the same proportion as the subgroup has in the population. For example, the person conducting a customer satisfaction survey might select random customers from each customer type in proportion to the number of customers of that type in the population. If 40 samples are to be selected, and 10% of the customers are managers, 60% are users, 25% are operators and 5% are database administrators, then 4 managers, 24 users, 10 operators and 2 administrators would be randomly selected. Stratified sampling can also sample an equal number of items from each subgroup. For example, a development lead might randomly select three modules written in each programming language used and examine them against the coding standard.
Cluster Sampling
The fourth statistical sampling method is called cluster sampling, also called block sampling. In cluster sampling, the population that is being sampled is divided into groups called clusters. Whereas each stratum in stratified sampling is homogeneous with respect to a selected criterion, each cluster is ideally as heterogeneous as possible, so that it matches the population. A random sample is then taken from within one or more selected clusters. For example, if an organization has 30 small projects currently under development, an auditor looking for compliance with the coding standard might use cluster sampling to randomly select 4 of those projects as representatives for the audit and then randomly sample code modules for auditing from just those 4 projects. Cluster sampling can tell us a lot about that particular cluster, but unless the clusters are selected randomly and a lot of clusters are sampled, generalizations cannot always be made about the entire population.
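The two-stage procedure in the audit example (pick a few clusters at random, then sample items only within them) might look like this in Python. It is a hypothetical sketch: the function name, project names and module names are made up, and the number of modules drawn per chosen project is an assumed parameter, since the text does not say how many modules the auditor samples.

```python
import random
from typing import Dict, List, Optional


def cluster_sample(
    clusters: Dict[str, List[str]],
    n_clusters: int,
    n_per_cluster: int,
    rng: Optional[random.Random] = None,
) -> Dict[str, List[str]]:
    """Randomly choose whole clusters, then randomly sample items within each one."""
    rng = rng if rng is not None else random.Random()
    # Stage 1: every cluster has the same chance of being chosen.
    chosen = rng.sample(sorted(clusters), n_clusters)
    # Stage 2: items are drawn only from the chosen clusters.
    return {
        name: rng.sample(clusters[name], min(n_per_cluster, len(clusters[name])))
        for name in chosen
    }


# Hypothetical: 30 projects with 12 modules each; audit 4 projects, 3 modules per project.
projects = {f"project-{p:02d}": [f"p{p:02d}/module-{m}.c" for m in range(12)] for p in range(30)}
audit = cluster_sample(projects, n_clusters=4, n_per_cluster=3, rng=random.Random(5))
for name, modules in audit.items():
    print(name, modules)
```

Modules in the 26 projects that were not chosen have no chance of being audited in that draw, which is why conclusions about the whole organization depend on selecting clusters randomly and sampling many of them.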
For example, random sampling from all the source code modules written during the previous week, or all the modules in a particular subsystem, or all modules written in a particular language may introduce biases into the sample that would prevent statistically valid generalization.
NON-PROBABILITY SAMPLING
Non-probability sampling is a sampling technique where the samples are gathered in a process that does not give all the individuals in the population equal chances of being selected. In any form of research, true random sampling is always difficult to achieve. Most researchers are bound by time, money and workforce, and because of these limitations it is almost impossible to randomly sample the entire population; it is often necessary to employ another sampling technique, the non-probability sampling technique. In contrast with probability sampling, a non-probability sample is not the product of a randomized selection process. Subjects in a non-probability sample are usually selected on the basis of their accessibility or by the purposive personal judgment of the researcher. The downside of this is that an unknown proportion of the entire population was not sampled. This means that the sample may or may not represent the entire population accurately. Therefore, the results of the research cannot be used in generalizations pertaining to the entire population.
TYPES OF NON-PROBABILITY SAMPLING
CONVENIENCE SAMPLING
Convenience sampling is probably the most common of all sampling techniques. With convenience sampling, the samples are selected because they are accessible to the researcher. Subjects are chosen simply because they are easy to recruit. This technique is considered the easiest, cheapest and least time-consuming.
CONSECUTIVE SAMPLING
Consecutive sampling is very similar to convenience sampling except that it seeks to include ALL accessible subjects as part of the sample.
This non-probability sampling technique can be considered the best of all non-probability samples because it includes all available subjects, which makes the sample a better representation of the entire population.
QUOTA SAMPLING
Quota sampling is a non-probability sampling technique wherein the researcher ensures equal or proportionate representation of subjects depending on which trait is considered as the basis of the quota. For example, if the basis of the quota is college year level and the researcher needs equal representation with a sample size of 100, the researcher must select 25 first-year, 25 second-year, 25 third-year and 25 fourth-year students. The bases of the quota are usually age, gender, education, race, religion and socioeconomic status.
JUDGMENTAL SAMPLING
Judgmental sampling is more commonly known as purposive sampling. In this type of sampling, subjects are chosen to be part of the sample with a specific purpose in mind. With judgmental sampling, the researcher believes that some subjects are more fit for the research than other individuals. This is the reason why they are purposively chosen as subjects.
SNOWBALL SAMPLING
Snowball sampling is usually done when there is a very small population size. In this type of sampling, the researcher asks the initial subject to identify another potential subject who also meets the criteria of the research. The downside of using a snowball sample is that it is hardly representative of the population.
WHEN TO USE NON-PROBABILITY SAMPLING
* This type of sampling can be used when demonstrating that a particular trait exists in the population.
* It can also be used when the researcher aims to do a qualitative, pilot or exploratory study.
* It can be used when randomization is impossible, such as when the population is almost limitless.
* It can be used when the research does not aim to generate results that will be used to create generalizations pertaining to the entire population.
* It is also useful when the researcher has a limited budget, time and workforce.
* This technique can also be used in an initial study which will be carried out again using randomized, probability sampling.
Definition of Statistics
Statistics, like many other sciences, is a developing discipline. It is not static. It has gradually developed during the last few centuries. At different times, it has been defined in different ways. Some definitions of the past look very strange today, but those definitions had their place in their own time. Defining a subject has always been a difficult task, and a good definition of today may be discarded in the future. It is difficult to define statistics. Some of the definitions are reproduced here:
(1) The kings and rulers in ancient times were interested in their manpower. They conducted censuses of the population to get information about their population, and used that information to calculate their strength and ability for wars. In those days statistics was defined as “the science of kings, political and science of statecraft.”
(2) A. L. Bowley defined statistics as “the science of counting.” This definition places the entire stress on counting only. The common person also thinks that statistics is nothing but counting. That used to be the situation, but a very long time ago. Statistics today is not mere counting of people, animals, trees or fighting forces. It has now grown into a rich set of methods for data analysis and interpretation.
(3) A. L. Bowley also defined statistics as the “science of averages.” This definition is very simple, but it covers only some areas of statistics. The average is very important in statistics.
Experts are interested in average death rates, average birth rates, average increase in population, average increase in per capita income, average increase in standard of living and cost of living, average development rate, average inflation rate, average production of rice per acre, average literacy rate and many other averages from different fields of practical life. But statistics is not limited to averages only. There are many other statistical tools, like measures of variation, measures of correlation, measures of independence, etc. Thus this definition is weak and incomplete and has been buried in the past.
(4) Prof. Boddington defined statistics as the “science of estimates and probabilities.” This definition covers a major part of statistics and is close to modern statistics. But it is not complete, because it stresses only probability. There are some areas of statistics in which probability is not used.
(5) A definition due to W. I. King is “the science of statistics is the method of judging collective, natural or social phenomena from the results obtained from the analysis or enumeration or collection of estimates.” This definition is close to modern statistics, but it does not cover the entire scope of modern statistics.
Secrist has given a detailed definition of statistics in the plural sense. His definition is given on the previous page. He has not given any importance to statistics in the singular sense. Statistics in both the singular and the plural sense has been combined in the following definition, which is accepted as the modern definition of statistics:
“Statistics are the numerical statements of facts capable of analysis and interpretation, and the science of statistics is the study of the principles and the methods applied in collecting, presenting, analyzing and interpreting the numerical data in any field of inquiry.”
sta·tis·tics (stə-tĭs′tĭks) n.
1. (used with a sing. verb) The mathematics of the collection, organization, and interpretation of numerical data, especially the analysis of population characteristics by inference from sampling.
2. (used with a pl. verb) Numerical data.