Learning Objectives #
By the end of this chapter, fellows will have a clear understanding of:
- History of the scientific method
- Experimental design
- Statistical terms
- Variables
Introduction #
Research in simulation is no different from any other kind of research. Research related to simulation focuses mostly on simulation technology, the testing of simulation methodology, and the teaching methods and techniques used to improve the knowledge and experience of students or physicians in the healthcare system. Simulation research may therefore cover a spectrum of research activities described by Kirkpatrick’s classification, including but not limited to observation, validity, reliability, efficacy, cost-effectiveness, and the evaluation of healthcare outcomes that may be affected by educational processes. It is therefore important to understand the basic principles of research, such as the scientific method and experimental design, particularly as they pertain to patient simulation in health education.
Historical Notes #
Since the beginning of time human beings have been curious about the nature of things and how we gain knowledge. Today, psychologists classify knowledge into three categories: Personal knowledge, procedural knowledge and propositional knowledge.
Personal Knowledge –
Knowledge claimed by an individual, such as in statements like “I know where the doctor’s office is.” Personal knowledge relates to specific information that an individual has gained on their own and personally believes in.
Procedural Knowledge –
Knowledge of how to do things, such as swimming or riding a bike. In this case, the individual is claiming that they possess certain skills without necessarily knowing anything about the theories of thermodynamics or the physics of water and air in relationship to the body, etc.
Propositional Knowledge –
Essentially this kind of knowledge can be considered justified knowledge, or the knowledge of truth and facts, such as that the heart has four chambers.
The first two types of knowledge are simple to understand and do not create controversies. However, propositional knowledge requires justifications that must be based on facts or evidence that are the subject of scientific methods.
Throughout time human beings have subscribed to several schools of thought and philosophy in relation to propositional knowledge. The Ancient Greek schools of philosophy, such as the Hellenistic schools, promoted the idea of skepticism for a long time. Many philosophers such as Gorgias (c. 487 – 376 B.C.), Pyrrho of Elis (c. 360 – 270 B.C.), Socrates and others believed that since we cannot ever actually reach the truth, we should not claim to know the truth. The famous claim of Socrates, “I know one and only one thing, that I know nothing”, illustrates the extent of skepticism in this period. Pyrrhonian skepticism was described in a book called “Outlines of Pyrrhonism” by the Greek physician Sextus Empiricus (c. 200 A.D.). Empiricus incorporated aspects of Empiricism into Pyrrhonian skepticism, stating that the origin of all knowledge is experience.
During the Dark Ages, all trends of rational thinking were suppressed by religious dogmatism. Around the 16th century, during the period of reason and enlightenment, the idea of skepticism flourished again through the work of Michel de Montaigne (1533-1592) in France and Francis Bacon in England. It was René Descartes (1596 – 1650) who concluded that the very act of thinking could not be doubted, and that he therefore irrefutably exists. This famous phrase, “Cogito Ergo Sum”, provided a foundation for propositional knowledge and the development of the scientific method. It was also during this period that philosophical divisions such as epistemology, ontology and theology emerged.
In the context of knowledge justification, human beings have always been curious about natural phenomena and how they occur. What causes natural occurrences, and how can we control them? In the quest for answers, people slowly developed the modern idea of the “scientific method”. Ancient civilizations such as the Egyptians used empirical methods in astronomy, mathematics and medicine. The Greeks also contributed to the advancement of the scientific method, particularly through the work of Aristotle. During the Dark Ages, the work of Islamic scholars such as Ibn Sina (980 – June 1037), Al-Biruni (973-1048), and others also contributed to the development of the scientific method. However, the Islamic scholar Alhazen, or Ibn al-Haytham (965 – 1040), performed optical and physical experimentation and was the first to explicitly state that a controlled environment and specific measurements are required in order for an experiment to yield a valid conclusion. For this reason, some call him the father of the scientific method. Today the scientific method is defined by the Oxford English Dictionary as a method or procedure that has characterized natural science since the 17th century, consisting of systematic observation, measurement and experiment, and the formulation, testing and modification of hypotheses. Scientific knowledge is reliable propositional knowledge that reflects the truth under specific circumstances and conditions. This knowledge is based on empirical fact, which is open to challenge.
Exploration of cause-effect relationships and identification of the functional variables affecting an event led scientists in two important directions: first, toward the verification of scientific theories with a generalizable character regarding the occurrence of events, and second, toward assuming control over the phenomena of interest. That is why it can be claimed that the scientific method today has both epistemological and ontological characteristics.
Epistemology, a term introduced by James Frederick Ferrier (1808-1864) from the Greek “episteme” (knowledge, understanding) and “logos” (the study of), is the branch of philosophy concerned with the nature and scope of knowledge, also referred to as the “theory of knowledge”. Epistemology is knowledge about knowledge; it questions what knowledge is and how it can be acquired. In epistemology two types of knowledge are identified: 1) propositional knowledge, the knowledge of what, and 2) acquaintance knowledge, the knowledge of how.
Ontology—The philosophical study of being, existence or reality. It is a branch of metaphysics that deals with questions concerning what entities exist or can be said to exist and how these entities can be classified according to their similarities or differences.
The scientific method was used in applied science long before investigators such as James (1890), Thorndike (1903) and others applied it to psychology and education to observe cause-effect relationships between teaching and learning variables. Under this paradigm of knowledge, the scientific method led to the creation of educational theories such as Behaviorism (Pavlov; Watson, 1925; Skinner, 1938), Cognitivism (Neisser, 1967, and others) and Constructivism (Piaget and Vygotsky). The natural progression for investigators such as Norman, Brooks, Schmidt and others using the scientific method in psychology and education during this time was towards health education.
Today, the scientific method is used not only for the examination and testing of theories and philosophical concepts, but also in applied fields such as education and psychology. Using the scientific method, we can test whether innovative approaches to medical education work; whether approaches proven effective for a group of learners in one area of knowledge apply to other areas; how experts can best be used in various educational modalities; and more. Reporting scientific findings requires evidence of their validity; this is why, in the beginning, experiments are repeated to confirm that results are real and valid.
Based on the mathematical approach to probability theory developed by Blaise Pascal (1623-1662) and Pierre Fermat (1601-1665), investigators began testing the validity of scientific findings. Jules Gavarret (1890) and Venn (1888) were perhaps the first to use terms such as “test” and “significance” in historical research. In the 1900s, K. Pearson developed the chi-squared test and W.S. Gosset developed the t-distribution test known as the “Student t-test”. Modern hypothesis testing in scientific research was developed from Fisher’s (1925) idea of “significance testing” and Pearson’s (1922) notion of “null hypothesis testing”. Simply put, either the difference between two groups is insignificant (if you repeat the experiment you are unlikely to observe the same difference, which is consistent with the null hypothesis) or the difference between the two groups is significant (if you repeat the experiment you are likely to observe the same difference, rejecting the null hypothesis). An experiment demonstrates an effect only when the null hypothesis is rejected.
By definition, the null hypothesis assumes that any difference observed in a set of data is due to chance, and that no statistically significant difference exists in the given set of observations (the measured difference is no different from zero). In scientific experimentation two types of error can occur:
- Type I: the null hypothesis is falsely rejected giving a “false positive”.
- Type II: the null hypothesis fails to be rejected and an actual difference between populations is missed giving a “false negative”.
Some believe that too much emphasis has been placed on the null hypothesis, while other investigators believe that it is not the existence of a difference between groups that is important but rather the size of the difference. Still others believe that focusing on effects is not sufficiently broad to cover all aspects of data analysis. Nevertheless, at this particular time, the null hypothesis is the dominant model in scientific research.
In keeping with this concept, it is logical to relate the scientific method to three important principles of critical thinking: empiricism, the use of empirical evidence; skepticism, the avoidance of dogmatism and openness to challenge of the hypothesis; and rationalism, the use of logical reasoning.
The structural components of the scientific method therefore are as follows:
1. Construction of a question, which may arise spontaneously from an observation, prior knowledge, or the identification of a gap in a practical or theoretical aspect of knowledge.
2. Information gathering to ensure that the question will solve the problem, or to establish a problem statement based on available information.
3. Proposal of a solution to the problem or an answer to the question using abstract or logical thinking. This is called the Scientific Hypothesis. A hypothesis is an informed, testable and predictive solution to a scientific problem.
4. Testing the hypothesis by experimentation or further observation using logic or empirical evidence. Experimentation is the most important part of the scientific method and has to be tangible and measurable. If a hypothesis cannot be tested by experimentation it is not a valid scientific hypothesis.
5. Establishment of scientific facts. The results of scientific experimentation provide solutions to the problem or answers to the question, and end with a conclusion. A conclusion is a statement of the truth based on a scientific hypothesis tested by scientific experimentation, and it is expressed with a specific degree of confidence, usually 95% or more.
6. Corroboration. This is when several pieces of evidence, gathered by independent authorities in the field, are in agreement. In science, when a hypothesis is tested, data from several authorities are compared to make sure that ample reliable evidence is provided, such that it would be irrational to deny it. Repeated experimentation with similar results then passes into mainstream knowledge of the truth. This is called evidence-based knowledge, and it too can be challenged under different circumstances. Science is always open to new evidence.
Experimental Design #
The concept of causality was described by David Hume (1711-1776), and several methods for establishing cause-effect relationships were later proposed by John Stuart Mill (1806-1873). The randomized controlled trial is based on Mill’s method of difference, which rests on isolating the effect of a single variable controlled by the investigator, the “independent variable”, on other variables that are observed, the “dependent variables”. Measuring the outcome of the dependent variables while manipulating the independent variable shows whether the change in the dependent variables is caused by the effect of the independent variable.
Structure of Experimental Design #
Experimental design is the process of designing a study to meet specified objectives and to allow valid inferences to be made. In order to answer the research questions, one should consider the following two important issues when planning an experiment:
- Ensure that the design provides the right type of data to be collected
- Ensure that sufficient sample size and statistical power are available.
The steps of experimental design are the same as the components of the scientific method described previously, and include the research question, review of the background literature, formulation of the hypothesis and objectives, conduct of experiments, data collection, data analysis, assumptions, conclusions and corroboration. At this point it is important to discuss some of the fundamental issues in the context of experimental design that may affect outcomes. These issues mainly revolve around sample size calculation, experimental units, variables, treatment and design structure, as well as some basic statistical approaches to scientific research.
Experimental Units #
Experimental units (EUs) are the individuals or objects under investigation by the researcher. Depending on the objective of the study, the sampling unit may be students, teachers, or groups of students or teachers. The unit could also be animals, patients or groups of patients. It is important to note that sampling/experimental units are the smallest units on which data are collected. In order to know how many experimental units are required for a particular experiment, it is essential to perform a sample size calculation.
Sample Size Calculation #
The first practical step in the design of an experiment is to determine how big the experimental groups should be, or how large a sample is required. In order to determine this value, it is important to keep three questions in mind:
- How accurate does the answer need to be?
- What level of confidence is required for the experiment?
- Is there any prior knowledge from work similar or close to the subject of the experiment?
If answers to all of these questions are available and the purpose of the experiment is to estimate a proportion, the following formula is used:
ME = z √( P(1 − P) / n )
Where:
ME is the desired margin of error,
z is the z-score,
P is the prior estimate of the proportion, and
n is the sample size.
The investigator or experimental designer determines the desired margin of error, usually between 1% and 4%. The z-score is a statistical measure of how many standard deviations a value lies from the mean under a normal curve, and it corresponds to the chosen confidence level. A z-score of 1.96 corresponds to a confidence level of 95%, which is often used for experiments in this area.
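As an illustration only, the short Python sketch below rearranges this formula to solve for n, using hypothetical values of a 3% margin of error, a 95% confidence level and a conservative prior estimate of 50%; the z-score for the chosen confidence level is taken from the scipy library.

```python
import math
from scipy.stats import norm

def sample_size_for_proportion(margin_of_error, confidence=0.95, prior_p=0.5):
    """Solve ME = z * sqrt(P(1 - P) / n) for n and round up."""
    z = norm.ppf(1 - (1 - confidence) / 2)            # e.g. 1.96 for 95% confidence
    n = (z ** 2) * prior_p * (1 - prior_p) / margin_of_error ** 2
    return math.ceil(n)

# Hypothetical example: 3% margin of error, 95% confidence, prior estimate of 50%
print(sample_size_for_proportion(0.03))               # roughly 1068 experimental units
```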
When the value being estimated is a mean rather than a proportion, or when the standard deviation is unknown or the sample size is less than 30, the following formula, with a t-score in place of a z-score, is used for the calculation of sample size.
This formula is:
ME = t S / √n
Where:
ME is the desired margin of error,
t is the score used to calculate the confidence interval, which depends on both the degrees of freedom and the desired confidence level,
S is the standard deviation, and
n is the sample size.
This formula is typically used when n is less than 30 or when the population standard deviation is unknown. As the sample size grows, the value of t approaches the value of z, which is why the same value of 1.96 for a 95% level of confidence is often quoted for both formulas; for small samples, however, t is somewhat larger than z. Since the standard deviation is not known in advance, the value of S is determined from prior knowledge or a previous experiment, or is simply an educated guess.
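Because t depends on the degrees of freedom (n − 1), which is itself unknown until n is chosen, the calculation is usually iterated. The sketch below is a minimal Python illustration of that idea, assuming a hypothetical margin of error of 2 score points and a guessed standard deviation of 10.

```python
import math
from scipy.stats import t as t_dist

def sample_size_for_mean(margin_of_error, sd_guess, confidence=0.95):
    """Solve ME = t * S / sqrt(n) for n, iterating because t depends on n - 1."""
    n = 30                                             # starting guess
    for _ in range(100):
        t_val = t_dist.ppf(1 - (1 - confidence) / 2, df=n - 1)
        n_new = max(2, math.ceil((t_val * sd_guess / margin_of_error) ** 2))
        if n_new == n:                                 # converged
            return n
        n = n_new
    return n

# Hypothetical example: margin of error of 2 points, guessed SD of 10, 95% confidence
print(sample_size_for_mean(2, 10))                     # roughly 99 experimental units
```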
Note: Additional information on the calculation of sample size is provided in the reading references in this chapter.
Variables #
By definition a variable is a characteristic that may assume more than one set of values to which a numerical measure can be assigned. In any experimental design, four types of variables are taken into consideration:
- Primary Variables or Variables of Interest
- Constant Variables
- Background Variables
- Uncontrollable / Hard to Change Variables
Primary Variables (Variables of Interest) – These are independent variables, also called factors, which form the treatment plan in the experimental design and serve as the factors that cause the effect and possible variations in response.
Constant Variables – Variables that are not part of the treatment but can affect the experiment and can be controlled during experimentation. These variables may include the use of equipment, standard procedures and operators, measuring devices, time, location and others.
Background Variables – These are not variables of interest or treatment but are present by default and may influence the outcome of the experiment. The important characteristic of background variables is that they are measurable but cannot be controlled. Background variables are treated as covariates in experimental design, and statistical methods of covariate analysis are used to remove their effect.
Uncontrollable / Hard-to-Change Variables – These are like background variables in that they are not variables of interest but are able to affect the variables of interest and subsequently influence the outcome and conclusion of the experiment. Their defining characteristic is that conditions prevent them from being measured, controlled or manipulated.
Treatment Structure #
Treatment structure must include the factors the researcher is interested in (also called independent variables or primary variables), which are directly related to the primary objective of the study and form the basis of its conclusions. In the treatment strategy, one can study a single factor with a single effect, a single factor with multiple effects, multiple factors with a single effect, or multiple factors with multiple effects. It is also important to define the levels of the factor of interest. There can be a range or variety of levels (also called subsets) within a treatment factor, which must also be taken into consideration in the experimental design. Treatment factors can be fixed or random. A factor is fixed when it has a small number of defined levels, all of which are considered in the experimental design, such as gender (male or female). A range of levels may also exist in a treatment factor, but for the purposes of the experiment specific levels are chosen (such as specific years or ages). If the levels of treatment are not defined in advance but are chosen randomly, the factor is called a random treatment factor.
When a combination of two or more factors or levels is considered in the experimental design, it is important to be sure they are logical and may have a compound effect, such as synergism, antagonism and others. Sometimes the treatment factor is repeated several times in the same experimental unit. This is referred to as replication. Replication should not be considered an additional experimental unit. Replication is important for reducing error in an experiment, and is different from repeated measures. Repeated measures are when the effect of the same factor is repeatedly measured on the same subject at specific time intervals. In this case each set of measurements is treated as a new set of data.
In consideration of treatment structure it is important to consider the following assumptions:
- Since the independent variable is assumed to affect the dependent variable, the experiment has only one direction, from Independent ⟹ Dependent, not the other way around.
- Since only experimental variables are systematically manipulated, it is important to make sure that other explanations (such as Background Variables, Constant Variables, Uncontrollable Variables) for the difference are eliminated.
Design Structure #
In an experimental design, experimental units are allocated to a treatment factor randomly. Biases are inherent in every aspect of experimental design; therefore it is important not to try to eliminate biases but to understand which biases will be acceptable in each particular circumstance. Randomization, essentially meaning the random allocation of experimental units, is important for reducing the incidence of bias.
If a specific constraint is introduced to randomization, this is called block design. For example, experimental units such as students could be assigned to a treatment entirely at random, or students can be divided into groups, or blocks, by characteristics that make them homogeneous. The former is called a Completely Randomized Design, where units are assigned to treatment randomly, and the latter is called a Randomized Complete Block Design, where the blocks are formed first and then randomly allocated to the treatment factors. Block design provides stratification of the experimental units with similar values and allows the effect for each group to be considered. This means that the groups do not differ systematically from each other on any other variables that might cause a difference in the outcome.
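As a minimal sketch (with hypothetical group names and sizes), the Python snippet below contrasts a completely randomized assignment with a randomized complete block design in which students are first blocked by year of training and then randomized to treatment within each block.

```python
import random

random.seed(42)                                         # reproducible allocation
students = [f"student_{i:02d}" for i in range(1, 21)]   # 20 hypothetical students

# Completely Randomized Design: shuffle everyone, then split into two arms.
shuffled = random.sample(students, len(students))
simulation_arm, lecture_arm = shuffled[:10], shuffled[10:]

# Randomized Complete Block Design: block on a characteristic (here, year of
# training), then randomize to treatment separately within each block.
blocks = {"year_1": students[:10], "year_2": students[10:]}   # hypothetical blocks
assignment = {}
for block_name, members in blocks.items():
    order = random.sample(members, len(members))
    half = len(order) // 2
    assignment[block_name] = {"simulation": order[:half], "lecture": order[half:]}

print(assignment)
```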
If a particular variable is measured after the groups are formed, it is called a covariate, and the effect of this covariate can be removed using statistical analysis of covariates. Therefore the only effect left will be caused by the independent variable.
No matter how carefully these methods are used to eliminate extraneous variables and control the effect of independent variables, it is impossible to eliminate all alternative variables. It is therefore advisable to randomize subjects in the formation of experimental groups and then to randomize the groups/blocks. Randomization will not remove or equalize the effects of the alternative variables, but it will reduce bias in the experimental design.
Data Analysis #
The first and most important issue in data analysis is to develop, at the time of experimental design, specific documents such as worksheets, data collection sheets and tables that reflect the data required for your objectives and conclusions. This will help to properly document the collected data in relation to the results and outcome of the experimentation. The data can be entered or transferred into a database within a particular statistical package, or simply into an Excel sheet, for statistical analysis. The type of statistical analysis required is directly related to your research question, objectives, hypothesis and intended inferences.
The term statistical significance is complex. In statistics a black-and-white (yes or no) answer to a research question is extremely difficult to obtain. Most of the time statistics offer a level of significance reflected as a P-value, the probability of observing a result at least as extreme as the one obtained if the null hypothesis were true. If the experiment is designed for a confidence level of 95%, the threshold P-value is ≤ 0.05; for 99%, it is ≤ 0.01; for 99.9%, it is ≤ 0.001. The P-value is not an indicator of the size or importance of the observed effect. Simply put, P ≤ 0.05 indicates that if the null hypothesis were true and the experiment were repeated 100 times, a difference this large would be expected by chance fewer than 5 times.
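To make the Type I error rate concrete, the hypothetical Python simulation below repeatedly compares two groups drawn from the same population (so the null hypothesis is true) and counts how often P ≤ 0.05 arises by chance alone; the proportion comes out close to 5%.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
alpha, n_experiments, false_positives = 0.05, 1000, 0

for _ in range(n_experiments):
    # Both groups come from the SAME population, so any "significant"
    # difference is a false positive (Type I error).
    group_a = rng.normal(loc=70, scale=10, size=30)
    group_b = rng.normal(loc=70, scale=10, size=30)
    _, p_value = ttest_ind(group_a, group_b)
    if p_value <= alpha:
        false_positives += 1

print(false_positives / n_experiments)   # close to 0.05, the chosen alpha level
```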
Data analysis has two main steps. The first step is descriptive statistics, which assess whether the data obtained by the experiment meet all the requirements of the experimental design. Descriptive statistics can be used to summarize population data. Numerical descriptors include the mean and standard deviation for continuous data, as well as frequency and percentage for categorical data. A normal distribution is a bell-shaped curve with equal distribution on both the left and right sides of the mean. The normal distribution is a mathematical model describing the probabilities expected from repeated measurements of a specific variable. For example, a normal distribution would tell us how many students did extremely well or extremely poorly on an examination in relation to the mean, median or mode, as charted against the majority of the students’ scores. Although researchers usually assume that their data will follow a normal distribution with a bell-shaped curve, this is not always the case. Therefore it is important to look for the following abnormalities in the distribution of data.
First, a distribution can be skewed to the left or right. When the mean exceeds the median, the curve is skewed positively, to the right. In our example this means that most students who took the examination scored below the mean, with a few high scores pulling the mean upward. When the mean is less than the median, the curve is skewed negatively, to the left. In our example this indicates that most students scored above the mean, with a few low scores pulling the mean downward.
The shape of the bell curve can also vary: it can be flat, or slim with a high peak. This is described by kurtosis. Flat curves are called platykurtic and tall, slim curves are called leptokurtic, in relation to mesokurtic, which describes a normal distribution. The curve appears flat when there is a large amount of variability in the measurement, with a large standard deviation from the mean. Slim, peaked curves occur when there is little variability in the measurement and the standard deviation is very small. The area under the normal curve on either side of the mean reflects units of standard deviation, also referred to as the z-score. In a normal distribution, the mean has a z-score of 0 and the standard deviation is 1.0. The area under the curve between the mean and 1 standard deviation (on either side) is 34.13%; between 1 and 2 standard deviations it is 13.59%; and between 2 and 3 standard deviations it is 2.15%. The calculation of these areas with z-scores provides an indication of the normality of the curve. Z-scores are also used to calculate sample sizes for experiments.
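As a small illustration with made-up examination scores, the Python snippet below computes the descriptive statistics discussed above (mean, standard deviation, skewness, kurtosis and z-scores) using the scipy library.

```python
import numpy as np
from scipy import stats

# Hypothetical examination scores for a class of students
scores = np.array([52, 58, 61, 64, 66, 68, 70, 71, 73, 75, 78, 83, 91], dtype=float)

mean = scores.mean()
sd = scores.std(ddof=1)                     # sample standard deviation

print(stats.skew(scores))                   # > 0: right (positive) skew; < 0: left skew
print(stats.kurtosis(scores))               # > 0: leptokurtic; < 0: platykurtic (excess kurtosis)

z_scores = (scores - mean) / sd             # each score's distance from the mean in SD units
print(z_scores.round(2))
```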
Bell curves are also used in education to determine students’ percentile scores. If the mean is placed at the 50th percentile, the distribution on both sides of the mean indicates where a particular student stands in relation to the mean score of the other students. In educational circles, the normal distribution is also used to derive other scores, such as T-scores.
A scatter plot (scatter graph) is another way to assess the distribution of experimental data, in which individual data points are plotted in the area between the X- and Y-axes.
Statistical analysis of experimental data is mainly about the relationship between the dependent variables and the effect of the independent variables. The cause-effect relationship is established not only by observing the effects in a treatment group, but also by comparing those effects with an identical group in which all of the conditions of the experiment are met except the treatment (the control group). The second step of data analysis is therefore called inferential statistics. Inferential statistics are used to draw meaningful conclusions about the entire population. These inferences may include a simple yes or no answer to the scientific question and hypothesis testing, estimates of numerical characteristics of the data, descriptions of relationships within the data, as well as forecasting and prediction.
The most common inferential statistical analysis for the comparison of two independent groups is the unpaired t-test, or Student’s t-test. If the subject of investigation is a single group measured before and after the treatment, the statistical analysis used is the paired t-test. If the treatment is applied to several (three or more) independent groups, the statistical analysis required is ANOVA (analysis of variance), a test that shows whether the means of several groups are equal or whether differences exist among the group means. ANOVA is like a t-test extended to more than two groups, and it partitions the variation within and between the groups.
For a single factor, a one-way ANOVA studies the effect of that factor across different groups. A two-way or multi-factor ANOVA analyzes the effects of multiple factors on several groups. When the experiment crosses the levels of two or more factors, the design is called factorial. In cases where the effects of treatment are observed in one group but measured repeatedly at several intervals, the appropriate statistical test is a repeated-measures ANOVA.
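The following Python sketch, using simulated (hypothetical) post-test scores, shows how these comparisons are typically run with the scipy library: an unpaired t-test for two independent groups, a paired t-test for before-and-after measurements on one group, and a one-way ANOVA for three independent groups.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical post-test scores for three independent teaching groups
control = rng.normal(loc=70, scale=8, size=25)
simulation = rng.normal(loc=75, scale=8, size=25)
lecture = rng.normal(loc=72, scale=8, size=25)

# Unpaired (independent samples) t-test: two independent groups
t_stat, p_unpaired = stats.ttest_ind(control, simulation)

# Paired t-test: the same group measured before and after a treatment
pre = rng.normal(loc=60, scale=8, size=25)
post = pre + rng.normal(loc=5, scale=4, size=25)       # simulated improvement
t_stat_paired, p_paired = stats.ttest_rel(pre, post)

# One-way ANOVA: one factor, three (or more) independent groups
f_stat, p_anova = stats.f_oneway(control, simulation, lecture)

print(p_unpaired, p_paired, p_anova)
```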
In complex conditions, when the effects of several treatments are observed in several groups or over several time intervals and the researcher wishes to identify cross-factor effects among all treatments and groups, ANOVA with orthogonal contrasts is applied.
Correlation is dependence between two random variables or two sets of data, and is used to indicate predictive relationships between variables without implying causality. In practice the degree of correlation is important and is calculated with Pearson’s correlation coefficient, which expresses the strength of the relationship between the variables.
Regression analysis is a statistical process for estimating the relationship between dependent and independent variables, explaining how the value of the dependent variable changes when one independent variable is varied while the other variables are held constant. Regression analysis may predict or forecast outcomes, but one must be extremely careful when expressing this type of relationship as a cause-effect relationship. The major difference between correlation and regression is that correlation is about the relationship between the values of two variables (X and Y), and it does not matter which is X and which is Y. Regression, on the other hand, is concerned with the effect of an independent variable on a dependent variable as a one-way street, so it matters which variable is X and which is Y. Correlation cannot speak to causality, whereas regression is oriented toward causality and cause-effect relationships.
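As a brief illustration with made-up data (hours of simulation practice versus a skills checklist score), the Python snippet below computes Pearson's correlation coefficient and then fits a simple linear regression with scipy; note that only the regression treats one variable as dependent on the other.

```python
import numpy as np
from scipy import stats

# Hypothetical data: hours of simulation practice (X) and checklist score (Y)
hours_practiced = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
checklist_score = np.array([55, 58, 64, 63, 70, 74, 77, 82], dtype=float)

# Pearson's correlation: strength and direction of the linear relationship
r, p_corr = stats.pearsonr(hours_practiced, checklist_score)

# Simple linear regression: effect of the independent variable (X) on the
# dependent variable (Y); unlike correlation, the roles of X and Y matter
result = stats.linregress(hours_practiced, checklist_score)
predicted_score = result.intercept + result.slope * 9   # forecast for 9 hours

print(r, result.slope, predicted_score)
```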
When the distribution of data is skewed, when the number of experimental units is small, or when the effects of the treatment are measured not with continuous values but with categorical data, nonparametric statistics are applied. Nonparametric statistics are those in which the probability distribution is not based on preconceived parameters, which is why nonparametric tests do not make assumptions about probability distributions; the quantities used in the calculations are generated from the data rather than specified in advance. Some nonparametric statistical analyses that may be useful are listed below, followed by a brief illustrative example.
Histograms – Show the distribution of data in a graph without assuming a particular probability distribution. Karl Pearson introduced the histogram for this kind of use.
Mann-Whitney U-test – For the comparison of two samples coming from the same population with asymmetric distributions, testing whether values in one population tend to be larger than in the other. It is essentially the nonparametric counterpart of the t-test. In this test, the median is used instead of the mean.
Nonparametric regression – Regression in which the relationship between the variables is estimated without assuming a predetermined functional form.
Cohen’s Kappa – This test measures inter-rater agreement for categorical items.
Friedman’s test – A nonparametric version of the two-way analysis of variance, used to determine whether K treatments in randomized block designs have identical effects.
Cochran’s Q-test – A test of whether K treatments in a randomized block design with binary (0, 1) outcomes have identical effects. It is used for the analysis of two-way randomized block designs where the response variable can take only two possible values (0, 1).
Kruskal-Wallis test – A nonparametric version of the one-way ANOVA that tests whether two or more independent samples are drawn from the same distribution. It is an extension of the Mann-Whitney U-test to more than two groups.
Kendall’s W test – Used for assessing agreement amongst raters, much like Pearson’s correlation coefficient but without requiring a probability distribution; it can handle any number of distinct outcomes.
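As a closing illustration with hypothetical ordinal ratings, the Python snippet below runs several of the nonparametric tests listed above using scipy (and scikit-learn for Cohen's kappa).

```python
from scipy import stats
from sklearn.metrics import cohen_kappa_score

# Hypothetical ordinal ratings (e.g. 1-5 Likert scores) from independent groups
group_a = [3, 5, 4, 2, 5, 4, 3]
group_b = [4, 5, 5, 3, 5, 5, 4]
group_c = [2, 3, 3, 2, 4, 3, 2]

# Mann-Whitney U-test: nonparametric comparison of two independent samples
u_stat, p_u = stats.mannwhitneyu(group_a, group_b)

# Kruskal-Wallis test: nonparametric one-way ANOVA for two or more independent groups
h_stat, p_h = stats.kruskal(group_a, group_b, group_c)

# Friedman's test: the same subjects rated under three conditions (blocks)
time_1 = [3, 4, 2, 5, 3, 4, 3]
time_2 = [4, 4, 3, 5, 4, 5, 3]
time_3 = [5, 5, 3, 5, 4, 5, 4]
chi2_stat, p_f = stats.friedmanchisquare(time_1, time_2, time_3)

# Cohen's kappa: inter-rater agreement on categorical judgments
rater_1 = ["pass", "fail", "pass", "pass", "fail", "pass"]
rater_2 = ["pass", "fail", "pass", "fail", "fail", "pass"]
kappa = cohen_kappa_score(rater_1, rater_2)

print(p_u, p_h, p_f, kappa)
```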
NOTE: This chapter is an introduction to the scientific method and experimental design, and certainly does not cover all aspects of research. The intent was to provide a general and basic understanding of the scientific method and experimental design for beginners. For more information on this topic, one can take specific statistics courses and read the other available literature.