Musings on science

2020-07-16

# Overview

Science provides a systematic method of investigation that has often produced a useful and secure understanding of some aspects of the world. But it is frayed around the edges and maybe a little moth-eaten. (See John P. A. Ioannidis at https://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.0020124. Also see, for instance: https://slate.com/technology/2017/08/science-is-not-self-correcting-science-is-broken.html)

Science is also quite incomplete, probably massively incomplete (although John Horgan disagrees: https://www.goodreads.com/book/show/250814.The_End_of_Science ).

Each discipline and sub-discipline has a body of knowledge consisting of research findings and the methods used to obtain them.

All of these findings should be held tentatively, as a useful approximation, and subject to revision as understanding deepens. Scientific understanding may change incrementally, or change in large ways, which philosopher of science Thomas Kuhn called paradigm shifts (see https://www.goodreads.com/book/show/61539.The_Structure_of_Scientific_Revolutions). It does not necessarily follow that the change is for the better, and there are some examples of scientific regress.

For a further analysis of other factors affecting scientific progress, see my post Trusting the experts https://ephektikoi.ca/2020/06/27/trusting-the-experts/

Science is often described as being self-correcting in the long run. Maybe it is, overall, but in my view, it seems to make progress by lurching, sometimes forwards, sometimes backwards. It is not always easy to see the self-correcting aspects, although I will allow that they may be there.

Fun for Uber-geeks? Maybe we can look at a frivolous example of how we might approach a topic with scientific analysis:

- Can you catch warts from touching a toad? Maybe, or maybe not!
- How would you find out?
- Why would you want to?
- What if there was a researcher with unlimited time who wished to win the Ig Nobel Prize? (see https://www.improbable.com/ig-about/winners/)
- Maybe that person might be interested in running a program of studies.
- What would such a program look like?
- How would it be funded?
- Why don’t you set up your own research designs for this program of study?
- Some of the considerations are outlined below.

# Scientific Culture

The culture and infrastructure around science determine what gets studied, what gets funded, what gets published, and what notice is taken of research. For a discussion of these issues, see my post Trusting the experts https://ephektikoi.ca/2020/06/27/trusting-the-experts/

# Understanding

In research, we attempt to understand the world, in a systematic and useful fashion. We look for explanations, either to explain what has gone on in the past (postdiction) or to explain the course of future events (prediction). We look for regularities, consistent and useful patterns of explanation, and try to refine them and document them. We attempt to grasp them qualitatively and quantitatively.

Evidence is typically ambiguous. We have to interpret it, and each person may arrive at a different interpretation of any piece of evidence. We can only interpret things in terms of our prior beliefs about the world, and we are always subject to incentive, bias, and self-deception. Our understanding of events may be thoroughly confounded, confused, perplexed and baffled.

In order to remove some possible causes of misunderstanding, we attempt to use research designs that reduce confounding factors. We call these methods experimental controls. There is a whole literature on research control procedures, and interested readers can start by looking at a discussion at Wikipedia (see here: https://en.wikipedia.org/wiki/Scientific_control). The “gold standard” for research would be studies that try to reduce bias with full randomization, control groups, and double-blind procedures. However, not all areas can be explored using these techniques. In some research areas, controlled experimental studies can only play a very minor role.

# Causality

Causality is a deep topic, the subject of numerous discussions by philosophers, yet part of everyday experience. In simplest terms, some event or events happen, and as a consequence, another event happens. Causes can be chained together and always will be in a thorough analysis.

Variability is the notion that things are subject to change (and these changes often seem random). Don’t underestimate how important this aspect of the universe is to human understanding.

Determinism is a philosophical view holding that all events are dependent completely on previously existing causes, and if we could set up identical conditions for another run, we would always get identical results. It has been debated for millennia.

Underdetermination happens when the available evidence is insufficient to determine which conclusion we can reach. Some philosophers make a case that all conclusions are underdetermined in one manner or another. See https://ephektikoi.ca/2020/07/16/underdetermination/ for more discussion.

Confounding factors are those not currently under investigation that may have caused the result. The interpretation of the study results is thus ambiguous. Confounding factors in research are also called ‘confounds’ for brevity.

Coincidence is seen when there appears to be a pattern of causality, but there is no actual causality.

# Replication

We really want to establish the truth, accuracy, or reality of research claims. One method that is supposed to be used, but seldom is, is replication of the study. By replication, the researcher hopes to obtain the same results. Apparently replication research seldom gets funded, and replication studies seldom get published. When investigators have attempted replications to see how well research holds up, they have frequently found that studies do not replicate at a very satisfactory rate. This is still an ongoing area for debate. It has been termed the replication crisis. See for instance https://www.embopress.org/doi/10.15252/embr.201744876, https://www.nature.com/collections/prbfkwmwvz/ and this https://www.wyatt.com/blogs/quality-standards-for-the-life-sciences-pqs.html.

# Ethics

Ethics committees examine proposed research to determine if ethical guidelines for the institution and the discipline will be adhered to. These guidelines can apply to both human and animal research subjects.

# Dissemination of Body of Knowledge

Scientific knowledge is disseminated in various ways. These include:

- Conferences where there is the presentation of papers.
- Informal exchanges of information among experts such as informal get-togethers, colloquia, and chats over drinks.
- Papers peer reviewed and formally published in research journals.
- Scholarly visits, where researchers may come from another institution for a period of time, maybe for a research sabbatical.
- Formal and informal teaching, where experts systematically explain the body of knowledge of their discipline.

There are problematic aspects to many of these activities, and the progress of science is undoubtedly retarded because of them.

# Peer review

The peer review of journal articles prior to publication is supposed to make sure that research is reasonably sound, and that the results may be trusted. The peer review process is unfortunately flawed. Research studies make it through the peer review process without being properly vetted in numerous cases. On the other hand, it tends to filter out studies which challenge accepted dogma. There are numerous critiques of the peer review process, and these can be readily found through Internet searching. See for instance https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7197125/

# Statistics

Statistics is a difficult mathematical study for most of us. I am not going to try to teach anyone statistics; that would be foolish. I am going to present as clearly and as briefly as I can some of the important ideas. This is a very cursory overview. Don’t take any of this as gospel; it has been decades since I formally studied the subject, and statistics for me was three courses and a few years running data through computer programs, not a deep study.

## Descriptive Statistics

Statistics can be broken into two major areas, descriptive statistics and inferential statistics. I will give a quick explanation of each below.

### Population and Sample

Inferences can be made from the results in a sample to the population at large using probabilistic methods, statistical inference.

### Independent Measurement

Statistics starts with measurement. There must be some scale or metric, and there must be a device for making the measurement. There will be some error in any measurement; we and our machines are not perfect. There will also be some variability in the measure, since the world seems to give us variable results, even when we are trying to be very careful. Some types of measurement are horrible in this regard, some are not.

There are three sorts of scales used in measurement for statistical purposes. These are categorical, ordinal, and continuous. For more information, see: https://www.scalelive.com/scales-of-measurement.html

### Variables

Measures taken of some factor of interest are called variables. Variables that we wish to study as outcomes of manipulations are termed dependent. Variables that we wish to manipulate to see their effect on outcomes are called independent. Experimental studies aim to determine how changes in the value of independent variables in a sample bring about changes in the dependent variables in the sample. It is possible to investigate if the relationships are causal.

### Central tendency

There are three types of measurements all called averages. The first, the mean, is what most of us think of as average. For this statistic, add up all the figures and divide by the number of figures. The second, the mode, is not so well known. It is the figure that occurs most often in the set of measurements. The third, the median, is also not well known. It is the number having half of the measurements lower, and half of the measurements higher.
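As a rough illustration of the three averages described above, here is a minimal sketch using Python's standard `statistics` module. The list of measurements is invented purely for the example.

```python
import statistics

# Hypothetical measurements, made up for illustration only.
measurements = [2, 3, 3, 5, 7, 10]

# Mean: add up all the figures and divide by the number of figures.
mean = statistics.mean(measurements)

# Median: the middle value; with an even count, the average of the two middle values.
median = statistics.median(measurements)

# Mode: the figure that occurs most often.
mode = statistics.mode(measurements)

print("mean:", mean, "median:", median, "mode:", mode)
```

For this made-up list the three averages all differ (5, 4 and 3), which is a reminder that "the average" is ambiguous unless the type is stated.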

### Dispersion of a Set of Measurements

There are three common measures of how spread out, how dispersed the data are. These are range, variance and standard deviation. The first, range, is simply the difference between the lowest and highest numbers. The second, variance, is a measure of how far a set of numbers are spread out from their average value. The third, standard deviation, is the square root of the variance. There are many tutorials on the Internet showing how to calculate these statistics.
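The three dispersion measures can be sketched in a few lines of Python. The data are invented, and the choice between the population form (divide by n) and the sample form (divide by n - 1) is an assumption that depends on whether the figures are the whole population or a sample from it.

```python
import statistics

# Hypothetical measurements, made up for illustration only.
measurements = [2, 4, 4, 4, 5, 5, 7, 9]

# Range: difference between the highest and lowest numbers.
value_range = max(measurements) - min(measurements)

# Population variance: mean squared distance from the mean (divides by n).
variance = statistics.pvariance(measurements)

# Standard deviation: square root of the variance.
std_dev = statistics.pstdev(measurements)

# Sample variance divides by n - 1 instead, so it comes out slightly larger.
sample_variance = statistics.variance(measurements)

print(value_range, variance, std_dev, sample_variance)
```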

### Histograms and Graphical Representations

Data are often displayed with charts and graphs to give insight into patterns of the measurements. Pie charts, bar graphs, scatter plots, histograms, two dimensional and three dimensional plots, and other graphical displays are routinely used.

### Tabular Representations

It is very common to display data in tabular format, to aid in understanding. Data may be tabulated using multiple dimensions specific to the needs of the analyst.

### Shape of a Distribution

When a set of continuous numbers is plotted on a graph, using the count for a variable on the vertical axis, and the value of the variable on the horizontal axis, we get a graph with a certain shape. A very common shape for many measures is the normal distribution, the so-called bell-shaped curve. It may not be totally symmetrical, and there are some technical terms for the degree of deviation from the bell shape: kurtosis and skew. Kurtosis is a measure of how much the distribution is shaped by extreme values, outliers as they are called. Skew is a measure of asymmetry around the centre. More thorough explanations of these terms are again found on the Internet.
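Skew and kurtosis can be computed from the data directly. The sketch below uses the simple moment-based definitions, which are only one of several conventions in use; the function names and the sample data are hypothetical, chosen for illustration.

```python
def central_moment(values, order):
    """Average of (value - mean) raised to the given power."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)


def skewness(values):
    """Moment-based skew: 0 for symmetric data, positive for a long right tail."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Moment-based kurtosis minus 3, so a normal distribution scores about 0."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0


# Hypothetical data sets, made up for illustration only.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]  # one large outlier stretches the right tail

print(skewness(symmetric), skewness(right_skewed))
print(excess_kurtosis(symmetric), excess_kurtosis(right_skewed))
```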

## Inferential Statistics

Descriptive statistics are combined with probability calculations to yield probabilistic inferences about the interpretation of the data. These come in two major varieties: frequentist statistics and Bayesian statistics. I formally studied frequentist statistics for research a few decades ago, and have only retained a bit. I have not studied Bayesian statistics formally, and have not yet succeeded in understanding it. Frequentist statistics are based on the odds of something being true based on the statistical properties of standard distributions. Bayesian statistics are based on the probabilities of something being true given some odds for prior evidence. Note that this is my conception, and may not be quite correct.

## Sampling

If you are running an experiment and want to select subjects or cases for your study, you want to control factors which can bias the results. So, if you want your experimental groups to be similar, or at least not biased in a way that can skew results, you need to take care with assigning subjects to each experimental group. The subjects should be assigned randomly, or maybe selected according to categories that are equivalently stratified in all cases. Without this, the interpretation of your results can be very problematic.
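Random assignment can be sketched very simply. The function name, the subject labels and the fixed seed below are all assumptions made for the example; a real study would follow its own protocol, and stratified assignment would need more than this.

```python
import random


def assign_randomly(subjects, n_groups=2, seed=None):
    """Shuffle the subjects, then deal them out so group sizes differ by at most one."""
    rng = random.Random(seed)  # a seed makes the assignment reproducible
    shuffled = list(subjects)  # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    return [shuffled[i::n_groups] for i in range(n_groups)]


# Hypothetical subject labels, made up for illustration only.
subjects = [f"subject_{i}" for i in range(1, 11)]
treatment, control = assign_randomly(subjects, n_groups=2, seed=42)

print("treatment:", treatment)
print("control:", control)
```

Because chance, not the researcher, decides who lands in which group, any pre-existing differences between subjects should tend to spread evenly across the groups.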

## Correlation

Correlation is the tendency of two or more things to vary together. There is a reciprocal and mutual relationship between them. I have briefly discussed correlation at https://ephektikoi.ca/2020/07/13/correlating-fish-and-water/

Linear regression is a method of using the ideas underlying correlation to predict the values of one dependent variable from the values of another independent variable. It yields a linear equation. Multiple regression extends this idea to predictions involving more than one independent variable.
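To make the link between correlation and linear regression concrete, here is a minimal sketch of the Pearson correlation coefficient and a least-squares line for one independent variable. The function names and data are hypothetical, and real analyses would normally use a statistics package rather than hand-rolled code.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: -1 to +1, measuring how closely the points follow a line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / math.sqrt(spread_x * spread_y)


def fit_line(xs, ys):
    """Least-squares slope and intercept for predicting y from x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    slope = co_variation / spread_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical paired measurements, made up for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
print("r =", pearson_r(xs, ys))
print("predicted y at x = 6:", slope * 6 + intercept)
```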

## Probabilities

### Odds, chance, probability

The concept of chance is known to almost everyone. It is commonplace to assess the odds intuitively in almost all situations where there is uncertainty about outcomes. We are not particularly good at it, but we do it on a routine basis. Mathematicians, psychologists and numerous other academics have studied probability from multiple perspectives. Statisticians use it as the basis of their discipline. Statistics is the study of computed probabilities, and how to draw inferences from them.

### Sample versus population

When conducting a study, usually a subset of the total population is used as a sample of the broader group. Inferential statistics uses measurement from the sample, combined with probability calculations, to make inferences about the population. The researcher will explore the likelihood of apparent relationships being true in the population. The mathematics of probability and statistics allow this to be done.

### Being right and being wrong

In a binary world, there are two ways of being right and two ways of being wrong. We can say there was an effect when there is truly an effect, or we can say there was no effect when there was truly no effect. We could call these respectively true positive and true negative, although no one bothers with these terms. Conversely we can say there was an effect when there is truly no effect, or we can say there was no effect when there was truly an effect. We call these respectively false positive (type I error) and false negative (type II error). These are analogous to giving a false alarm, or missing that the house was on fire.

In the statistical trade, these are referred to as type I and type II errors, although I don’t really like the terms, since they are arbitrary and confusing labels. Every time I use them, I have to go out and look them up again, since the terms do not stay in memory for me. I suspect that others have the same problem.
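Since the labels are hard to keep straight, a small sketch may help. It sorts pairs of (what the study claimed, what was really the case) into the four outcomes. The function name and the list of studies are hypothetical; in real research the truth is of course not known in advance.

```python
from collections import Counter


def classify_outcome(claimed_effect, real_effect):
    """Name the outcome for one study, given its claim and the underlying truth."""
    if claimed_effect and real_effect:
        return "true positive"
    if not claimed_effect and not real_effect:
        return "true negative"
    if claimed_effect and not real_effect:
        return "false positive (type I error)"  # the false alarm
    return "false negative (type II error)"  # the missed fire


# Hypothetical (claimed, real) pairs, made up for illustration only.
studies = [(True, True), (True, False), (False, False), (False, True), (True, True)]

tally = Counter(classify_outcome(claimed, real) for claimed, real in studies)
for outcome, count in tally.items():
    print(outcome, count)
```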

### Analysis methods and tests

Analysis methods and statistical tests proliferate. A person really needs a course in basic statistics to grasp the ideas, but they include such things as regression, multiple regression, analysis of covariance (ANCOVA), analysis of variance (ANOVA), t-tests, chi-square tests, and on and on. Each one yields probability values and maybe other derived statistics, primarily for the frequentist approach to analysis.

### Power

The power of a statistical test is a measure of the trust we might place in our study to find a real effect in the population, if such exists. This process of testing for a real effect is called hypothesis testing. The power of a test is the probability of avoiding a false negative finding (a type II error) when the effect is real; that is, one minus the probability of a type II error.
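One way to get a feel for power is to simulate many imaginary studies in which a real effect exists, and count how often the study detects it. The sketch below is only that, a sketch: it assumes normally distributed measurements with a known standard deviation and a simple two-sided z-test, and the function name and all parameter values are hypothetical.

```python
import random
from statistics import NormalDist


def estimate_power(effect, n_per_group, sigma=1.0, alpha=0.05, trials=2000, seed=0):
    """Fraction of simulated studies that report an effect of the given true size."""
    rng = random.Random(seed)
    # Two-sided cutoff: about 1.96 when alpha is 0.05.
    critical_z = NormalDist().inv_cdf(1 - alpha / 2)
    # Standard error of the difference between two group means (sigma assumed known).
    standard_error = sigma * (2 / n_per_group) ** 0.5
    detections = 0
    for _ in range(trials):
        control = [rng.gauss(0.0, sigma) for _ in range(n_per_group)]
        treated = [rng.gauss(effect, sigma) for _ in range(n_per_group)]
        difference = sum(treated) / n_per_group - sum(control) / n_per_group
        if abs(difference / standard_error) > critical_z:
            detections += 1
    return detections / trials


# Same true effect, two hypothetical sample sizes.
print("power with 20 per group:", estimate_power(0.5, 20))
print("power with 100 per group:", estimate_power(0.5, 100))
```

Larger samples give higher power for the same true effect. With the effect set to zero, the same function estimates how often a study raises a false alarm, which should come out close to the chosen alpha.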

### Statistical significance

It is common in research to arbitrarily set a threshold of probability for a finding. Different disciplines use different thresholds. It is usual in many fields to accept the results as being correct if a statistical threshold of 95% is reached. The thinking is that you will be right 95 times out of 100, making a type I error 5% of the time. Strictly, the 5% applies to cases where there is truly no effect: in those cases, a false positive will be reported about 5 times out of 100. This is common in the frequentist approach, and has been criticized as being wrong-headed and misapplied. The Bayesian statisticians seem to have a different take on this, but I have yet to understand their reasoning.

### Effect Size

It is common to see reports on studies in the media saying that such and such a finding was significant. The problem with this is that the common implication of significant is important. The research meaning of significant is that the finding is likely to be real, and says nothing about how big the effect is. We can talk about the size of the effect as percentages or as actual values. We can give the effect size as absolute figures or figures relative to some other measurement. We can talk about the amount of variability accounted for in the dependent measure by the independent measures. All of these show the actual significance in terms of how big an effect we have found, as opposed to the likelihood of it being real.
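The different ways of stating an effect size can be shown with a small sketch. The absolute and relative differences follow directly from the description above; Cohen's d, a standardized difference between two group means, is not named in this post and is included here as one commonly used example. The function name and the data are hypothetical.

```python
import statistics


def cohens_d(group_a, group_b):
    """Difference between group means, in units of the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (divides by n - 1)
    var_b = statistics.variance(group_b)
    pooled_sd = (((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)) ** 0.5
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


# Hypothetical outcome scores, made up for illustration only.
control = [10, 12, 11, 13, 14]
treated = [12, 14, 13, 15, 16]

absolute_difference = statistics.mean(treated) - statistics.mean(control)
relative_difference = absolute_difference / statistics.mean(control)

print("absolute difference:", absolute_difference)
print("relative difference:", f"{relative_difference:.1%}")
print("Cohen's d:", round(cohens_d(treated, control), 2))
```

None of these figures says anything about whether the difference is statistically significant; they only describe how big it is.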
