What is the importance associated with each of the following: levels of significance and the power of the test in the statistical inference?
Articulate a research problem and develop a research question in your field of interest. Insofar as your question should require you to test a statistical hypothesis, state the null and alternative hypothesis that you are seeking to test. Discuss the principal criteria that would influence your decision with regard to the outcome of the statistical test of the hypothesis.
THE LOGIC OF STATISTICAL INFERENCE
Ian Hacking (1979) observed that “logic has traditionally been the science of inference.” He indicates that statistical inference is chiefly concerned with a physical property that has never been defined; because there are reasons for denying that it is a physical property at all, its definition is one of the hardest conceptual problems about statistical inference.
On the other hand, others have argued that statistical inference is the drawing of conclusions about a population of interest from data collected on a sample of that population. It helps assess the reliability of findings, because it allows generalizations to be made from the part to the whole. Inferential statistics are a guide to decision making, not the goal of the research. The objective is to learn about the characteristics of the population from the characteristics of the sample comprising your data; inference also permits you to make a decision about the null hypothesis using a confidence interval, and introduces the notions of a p-value, levels of significance and statistical power.
When a researcher conducts experiments, the subjects are exposed to different levels of the independent variable (a variable whose values are chosen and set by the researcher). Assuming an experiment contains two groups, the data from each group can be viewed as a sample of the scores that would be obtained if all the subjects in the target population were tested under the same conditions to which the group was exposed. Assuming the treatment had no effect on the scores, each group's scores could be viewed as an independent sample taken from the same population.
Each sample mean provides an independent estimate of the population mean, and each sample standard error provides an independent estimate of the standard deviation of the sample means. Because the two means were drawn from the same population, you would expect them to differ only because of sampling error. We assume that the distribution of these means is normal, as the values are continuous; the probability distribution curve for the normal distribution is called the normal curve. Variables such as heights, weights, time intervals, weights of packages, productive life of bulbs, etc. tend to follow a normal distribution. Each combination of mean (μ) and standard deviation (σ) gives rise to a unique normal curve, denoted N(μ, σ). Hence μ and σ are called the parameters of the normal distribution.
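The point above can be demonstrated with a small simulation: two samples drawn from the same normal population will have means that differ, but only through sampling error. This is a minimal sketch with illustrative parameters (μ = 100, σ = 15, n = 50), not data from any real study.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(42)  # fixed seed so the simulation is reproducible

# Two independent samples drawn from the SAME normal population N(mu=100, sigma=15)
mu, sigma, n = 100, 15, 50
group_a = [random.gauss(mu, sigma) for _ in range(n)]
group_b = [random.gauss(mu, sigma) for _ in range(n)]

# Each sample mean independently estimates mu; the two differ only by sampling error
print(mean(group_a), mean(group_b))

# Each sample's standard error estimates the standard deviation of the sample means
se_a = stdev(group_a) / sqrt(n)
se_b = stdev(group_b) / sqrt(n)
print(se_a, se_b)
```

Running this repeatedly with different seeds shows the two means clustering around μ, with a spread close to the standard error σ/√n.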
These two possibilities can be viewed as the statistical hypotheses to be tested. The hypothesis that the means were drawn from the same population is referred to as the null hypothesis. The hypothesis that the means were drawn from different populations is called the alternative hypothesis. The characteristics of the two samples are used to evaluate the validity of the null hypothesis by computing the probability of observing a difference this large if the null hypothesis were true. If this probability is sufficiently small, then the difference between the sample means is statistically significant and the null hypothesis is rejected.
Inferential statistics are designed to help determine the validity of the null hypothesis by detecting differences in data that are inconsistent with it. Inferential statistics are classified as parametric or nonparametric. A parameter is a numerical fact about a population that can only be estimated; a parametric statistic estimates the value of a population parameter from the characteristics of a sample. When using parametric statistics, assumptions are made about the population from which the sample was drawn, so for the estimation to be justified the sample must represent the population. Parametric statistics include the t test, the analysis of variance (ANOVA) and the z test.
A nonparametric statistic, on the other hand, makes no assumptions about the distribution of scores underlying the sample. It is used when data do not meet the assumptions of a parametric test. Nonparametric statistics include the chi-square test and the Mann-Whitney U test.
CONFIDENCE INTERVAL AND HYPOTHESIS TESTING
A confidence interval is an estimated range of values, calculated from a given set of sample data, which is likely to include an unknown population parameter (Valerie J. Easton and John H. McColl). It describes the reliability of an estimate, or the amount of uncertainty associated with a sample estimate of a population parameter with respect to a particular sampling method; it does not give a range for individual values in the population. Increasing the confidence level will widen the confidence interval. The confidence coefficient is the probability that the interval estimator encloses the population parameter and is represented by (1 − α), where the level of significance is denoted by alpha (α). If the confidence coefficient is (1 − α), then 100(1 − α)% is the confidence level. The confidence level is the percentage of intervals constructed by the formula that will contain the true value of µ.
For a population with unknown mean µ and known standard deviation σ, a confidence interval for the population mean, based on a simple random sample (SRS) of size n, is x̄ ± z*·σ/√n, where z* is the upper (1 − C)/2 critical value for the standard normal distribution.
Note: This interval is only exact when the population distribution is normal. For large samples from other population distributions, the interval is approximately correct by the central limit theorem.
The selection of a confidence level for an interval determines the probability that the confidence interval produced will contain the true parameter value. Common choices for the confidence level C are 0.90, 0.95, and 0.99. These levels correspond to percentages of the area of the normal density curve. For example, a 95% confidence interval covers 95% of the normal curve, and the probability of observing a value outside of this area is less than 0.05. Because the normal curve is symmetric, half of the excluded area is in the left tail of the curve and the other half is in the right tail. For a confidence interval with level C, the area in each tail of the curve is equal to (1 − C)/2. For a 95% confidence interval, the area in each tail is equal to 0.05/2 = 0.025.
The value z* representing the point on the standard normal density curve such that the probability of observing a value greater than z* is equal to p is known as the upper p critical value of the standard normal distribution. For example, if p = 0.025, the value z* such that P(Z > z*) = 0.025, or P(Z < z*) = 0.975, is equal to 1.96. For a confidence interval with level C, the value p is equal to (1-C)/2. A 95% confidence interval for the standard normal distribution, then, is the interval (-1.96, 1.96), since 95% of the area under the curve falls within this interval.
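The critical value described above can be computed directly rather than read from a table. This short sketch uses Python's standard-library `statistics.NormalDist` to recover z* for a 95% confidence level.

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, standard deviation 1)
std_normal = NormalDist(mu=0.0, sigma=1.0)

C = 0.95              # confidence level
p = (1 - C) / 2       # upper-tail probability, here 0.025

# z* is the point with P(Z < z*) = 1 - p, i.e. P(Z > z*) = p
z_star = std_normal.inv_cdf(1 - p)
print(round(z_star, 2))   # 1.96
```

Repeating the calculation with C = 0.90 or C = 0.99 reproduces the other common critical values (about 1.645 and 2.576).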
Some interval estimates would include the true population parameter and some would not. A 90% confidence level means that we would expect 90% of the interval estimates to include the population parameter. To express a confidence interval, you need three pieces of information:
Confidence level
Sample statistic
Margin of error
Given these inputs, the range of the confidence interval is defined by the sample statistic ± margin of error; the uncertainty associated with the confidence interval is specified by the confidence level.
First, find the standard error: the standard error (SE) of the mean is
SE = s / √n
The critical value is a factor used to compute the margin of error. To express the critical value as a t score (t*):
Compute alpha (α): α = 1 − (confidence level / 100)
Find the critical probability (p*): p* = 1 − (α/2)
Find the degrees of freedom (df): df = n – 1
The critical value is the t score having n − 1 degrees of freedom and cumulative probability p*. We find this critical value from the t distribution table.
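The steps above can be sketched in code. The sample size and standard deviation here are hypothetical, and since the Python standard library has no t-distribution quantile function, t* is taken from a t table as the text describes (for df = 29 at p* = 0.975 the tabled value is 2.045).

```python
from math import sqrt

confidence_level = 95   # percent
n = 30                  # hypothetical sample size
s = 12.0                # hypothetical sample standard deviation

alpha = 1 - confidence_level / 100   # 0.05
p_star = 1 - alpha / 2               # 0.975, the critical probability
df = n - 1                           # 29 degrees of freedom

se = s / sqrt(n)                     # standard error of the mean

# t* read from the t distribution table for df = 29, p* = 0.975
t_star = 2.045
margin_of_error = t_star * se
print(round(margin_of_error, 2))
```

The resulting confidence interval would be the sample mean ± this margin of error.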
The relationship between the confidence interval and the hypothesis test is that the confidence interval contains all the values of the population mean that could serve as the null hypothesis value (for equality of means) without the null hypothesis being rejected at the nominal Type I error rate (Nelson 1990). Rather than relying solely on p-values from statistical tests, there are practical advantages to using confidence intervals for hypothesis testing (Wonnacott 1987, Nelson 1990). Confidence intervals provide additional useful information, since they include a point estimate of the mean, and the width of the interval gives an idea of the precision of the mean estimate. A direct way of comparing pairs of means with confidence intervals is to compute the confidence interval for the difference between each pair of estimated means. If the confidence interval covers a value of zero, then the null hypothesis is accepted at a Type I error rate of 100 − p (Gardner and Altman 1989, Hsu and Peruggia 1994, Lo 1994), where p is the percent coverage of the confidence intervals. With this approach, the visual advantage of the confidence interval of the mean is lost, and the individual means and their uncertainty are obscured. Also, if several estimated means are to be compared, there will be n(n − 1)/2 separate confidence intervals of the differences to display (where n is the number of means compared), which can be a very large number of confidence intervals (Robert W. Smith).
Hypothesis testing is the procedure by which we compare the effect of one variable on another. There are 4 steps in this process, and they are as follows:
State the null hypothesis H0 and the alternative hypothesis Ha.
Calculate the value of the test statistic.
Draw a picture of what Ha looks like, and find the P-value.
State conclusion about the data in a sentence, using the P-value and/or comparing the P-value to a significance level for evidence.
In step 1 we differentiate the two hypotheses: the statement being tested, usually phrased as “no effect” or “no difference,” is H0, the null hypothesis; the statement we suspect is true instead of H0 is Ha, the alternative hypothesis. Even though Ha is what we believe to be true, our test gives evidence for or against H0 only.
In step 2 we calculate the value of the test statistic. A test statistic measures compatibility between the H0 and the data. The formula for the test statistic will vary between different types of problems and the various tests employed for variable comparison.
In step 3 we draw a picture of what the distribution looks like and find the P-value. The P-value is the probability, computed assuming that H0 is true, that the test statistic would take a value as extreme as or more extreme than that actually observed due to random fluctuation.
In the final step 4 we compare our P-value to a significance level α and state our conclusion about the data in a sentence. If the P-value < α, we can reject H0, and the results are significant. If H0 cannot be rejected, the results are not significant.
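The four steps can be walked through with a minimal one-sample z-test sketch. The hypothesized mean, population standard deviation and observed sample values below are illustrative numbers, not real data.

```python
from math import sqrt
from statistics import NormalDist

# Step 1: state the hypotheses (one-sided)
# H0: mu = 100    Ha: mu > 100
mu0, sigma = 100, 15      # hypothesized mean, known population sd (assumed)
x_bar, n = 105, 36        # observed sample mean and sample size (assumed)
alpha = 0.05              # significance level

# Step 2: compute the test statistic
z = (x_bar - mu0) / (sigma / sqrt(n))    # (105 - 100) / (15/6) = 2.0

# Step 3: one-sided P-value, P(Z >= z) under H0
p_value = 1 - NormalDist().cdf(z)

# Step 4: compare P-value to alpha and state the conclusion
if p_value < alpha:
    print(f"p = {p_value:.4f} < {alpha}: reject H0; the result is significant")
else:
    print(f"p = {p_value:.4f} >= {alpha}: fail to reject H0")
```

With these numbers the statistic is z = 2.0 and the P-value is about 0.023, so H0 would be rejected at the 0.05 level.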
IMPORTANCE OF LEVELS OF SIGNIFICANCE AND THE POWER OF THE TEST IN THE STATISTICAL INFERENCE
LEVELS OF SIGNIFICANCE
The level of significance is the amount of evidence required to accept that an event is unlikely to have occurred by chance, and is denoted by the Greek symbol α. Popular significance levels (critical p-values) are 5% (0.05), 1% (0.01) and 0.1% (0.001); the chosen level is used in hypothesis testing as the criterion for rejecting the null hypothesis. Allen Rubin (2009) indicated that the cut-off point that separates the critical-region probability from the rest of the area of the theoretical sampling distribution is called the level of significance.
The importance associated with each level of significance is as follows. First, the difference between the results of an experiment and the null hypothesis is determined. Then, assuming the null hypothesis is true, the probability of a difference that large or larger is computed. Finally, this probability is compared with the significance level. If the probability is less than or equal to the significance level, then the null hypothesis is rejected and the outcome is said to be statistically significant.
Another importance of the significance level is that, for a test of significance, the null hypothesis is rejected when the p-value is lower than the α-level. For example, if someone argues that “there’s only one chance in a thousand this could have happened by coincidence,” a 0.001 level of statistical significance is being implied. Lower levels of significance require stronger evidence, run the risk of failing to reject a false null hypothesis (a Type II error), and give the test less statistical power. Choosing a level of significance is a somewhat arbitrary task; a level of 0.05 (95% confidence) is usually chosen, for no other reason than that it is conventional.
The selection of an α-level inevitably involves a compromise between significance and power, and consequently between the Type I error and the Type II error.
POWER OF THE TEST
The power of a test in statistical inference is the probability that the test will reject a false null hypothesis (i.e. that a Type II error will not occur). The probability of a Type II error is referred to as the false negative rate (β), so power is equal to 1 − β, which is the test's sensitivity; as power increases, the chance of a Type II error decreases. Power analysis can be used to calculate the minimum sample size required to accept the outcome of a statistical test with a particular level of confidence. It can also be used to calculate the minimum effect size that is likely to be detected in a study using a given sample size. In addition, the concept of power is used to make comparisons between different statistical tests: for example, between a parametric and a nonparametric test of the same hypothesis. Power is a statistical measure of the number of times out of 100 that test results can be expected to fall within a specified range, and most analyses of variance or correlation are described in terms of some level of confidence.
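The relationship power = 1 − β can be made concrete for a one-sided z-test. The means, standard deviation and sample size below are hypothetical, chosen only to illustrate the calculation.

```python
from math import sqrt
from statistics import NormalDist

# One-sided z-test: H0: mu = 100, tested at alpha = 0.05, sigma known.
# Suppose the true mean is actually 106 (hypothetical effect).
mu0, mu_true, sigma, n, alpha = 100, 106, 15, 36, 0.05
Z = NormalDist()

se = sigma / sqrt(n)                 # standard error of the mean, 2.5
z_crit = Z.inv_cdf(1 - alpha)        # about 1.645: reject H0 when z > z_crit

# H0 is retained when the sample mean falls below mu0 + z_crit*se.
# Under the true mean, that event has probability beta (the Type II error rate).
beta = Z.cdf((mu0 + z_crit * se - mu_true) / se)
power = 1 - beta
print(round(power, 3))
```

Increasing n (or the effect size mu_true − mu0) shrinks β and raises the power, which is how power analysis determines the minimum sample size for a study.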
Two types of errors can result from a hypothesis test.
A Type I error occurs when the researcher rejects a null hypothesis that is true. The probability of committing a Type I error is called the significance level; this probability is also called alpha and is often denoted by α.
A Type II error occurs when the researcher fails to reject a null hypothesis that is false. The probability of committing a Type II error is called beta and is often denoted by β. The probability of not committing a Type II error is called the power of the test.
RESEARCH PROBLEM: Obese as well as Non-Obese people use the elevator at UWI
It has been observed on the St. Augustine Campus that obese people tend to use the elevator more than anyone else in UWI. From this observation, for the process of hypothesis testing, we can state both a null and an alternative hypothesis: the null hypothesis is that the same numbers of obese and non-obese people use the elevator in UWI, and the alternative hypothesis is that obese people use the elevator more than non-obese people in UWI. In order to test these hypotheses we must define obesity, which is set at a body mass index (BMI) above 30. The BMI of an individual is calculated by dividing the person’s weight in kilograms by the square of their height in meters.
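The BMI classification just defined can be sketched as a small helper; the function names and the example weight and height are hypothetical, but the formula and the threshold of 30 are as stated above.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight (kg) divided by the square of height (m)."""
    return weight_kg / height_m ** 2

def is_obese(weight_kg: float, height_m: float, threshold: float = 30.0) -> bool:
    """Classify a participant as obese when BMI exceeds the threshold of 30."""
    return bmi(weight_kg, height_m) > threshold

# Hypothetical participant: 95 kg, 1.75 m
print(round(bmi(95, 1.75), 1))     # 31.0, so classified as obese
print(is_obese(95, 1.75))          # True
```

Each consenting participant's recorded weight and height would be passed through this classification before the counts are analysed.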
As has been stated in the alternate hypothesis, we expect there will be more obese people in the elevator therefore the hypothesis can be said to be directional. For a directional hypothesis we now have one-sided criteria as was previously stated.
Due to the mathematical nature of our data analysis we are using a quantitative method from an experimental procedure, and we would be collecting primary data through direct observation using random probability sampling (Jackson, Sherri L. 2006; Creswell 2003).
The types of validation issues that will arise during the course of the experiment are:
Content validation which seeks to determine whether the test covers a representative sample of the domain of behaviours to be measured and can be validated by asking experts to assess the test to establish that the items are representative of the trait being measured.
Concurrent criterion validation that checks the ability of the test to estimate the present performance and can be validated by correlating the performance on the test with concurrent behaviour.
Predictive criterion validation that will test the ability of the test to predict future performance and can be validated by correlating performance on the test with behaviour in the future.
Construct validation which will seek to test the extent to which the test measures a theoretical construct or trait and can be validated by correlating the performance on the test with performance on an established test (which is your control group or test).
After ensuring that the validation issues are being dealt with, a method can be devised. However, while devising a method we must also take into account ethical issues such as avoiding harm to participants and researchers, and keeping all recorded data confidential while minimizing intrusion on participants' privacy by recording only pertinent results. We must also give consideration to the work of others, so all citations must be properly referenced, and consent forms should be completed by participants to ensure endorsed and factual results.
For our method we propose to station two researchers on each floor with consent forms, a small table, a BMI machine and several cases of 250 ml Fruta pack drinks. Consenting participants will have their name, weight, height, BMI, floors travelled and time taken recorded as data. If desired, identification numbers can be substituted for names in published results to protect anonymity. On exiting the elevator, participants will be asked to confirm the floors travelled and will be spotted by having the incentive drink in hand. The experiment can be repeated twice more in other weeks to validate results. To subsidize the cost of the experiment, if similar research is being done, extra data can be taken by the researchers to incorporate data needed by other research parties. A control experiment can be done in an office building in a nearby city to further validate hypothesis claims.
The first step of data analysis will be to generate descriptive statistics, including the mean, mode and median of participants' BMI scores.
A graph will help in determining the inferential statistical test needed and the type of distribution. Assuming a normal distribution of BMI scores and a sample of over 30 people, we can then determine the standard deviation of the scores, and a z-test will determine whether the traits examined are statistically inferable and whether our null hypothesis can be rejected.
We can test against a BMI of 30, which is the threshold for obesity. If the z-score falls within the critical region we can reject the null hypothesis; if not, then the null hypothesis must be accepted. The critical value for a one-sided test at the 0.05 level is taken as 1.645.
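The decision rule can be sketched as a one-sided z-test on the proportion of elevator users who are obese, which matches the null hypothesis of equal numbers. The counts below are hypothetical placeholders for the data the method would collect.

```python
from math import sqrt

# Hypothetical observed counts: of 120 elevator users, 74 were classified obese.
# H0: p = 0.5 (equal numbers of obese and non-obese users)
# Ha: p > 0.5 (obese people use the elevator more)
n, obese = 120, 74
p_hat = obese / n
p0 = 0.5

# z statistic for a one-sample test of a proportion
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

z_critical = 1.645   # one-sided critical value at alpha = 0.05
if z > z_critical:
    print(f"z = {z:.2f} falls in the critical region: reject H0")
else:
    print(f"z = {z:.2f} is outside the critical region: fail to reject H0")
```

With these illustrative counts z is about 2.56, which exceeds 1.645, so the null hypothesis of equal usage would be rejected.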
Other issues that will affect the experiment are:
Time results were taken
Capacity of elevator
Multiple uses by same persons.
Operational times of the elevator.
How many Floors the participant is travelling.
Participation of people.
These factors can be minimized or eliminated by tweaking the proposed method or by limiting the data used, for example by using only data from people who travelled two floors and by using each person's data only once.