Correlating fish and water

Correlation:

  • Correlation is the tendency of two or more things to vary together.
  • There is a reciprocal and mutual relationship between them.
  • There is a regular association or connection.
  • They go together in a somewhat predictable way.
  • There is some relation existing between several phenomena or things.
  • Most usually it is between things which can be measured, although that goes beyond the core meaning.
  • There are various mathematical measures in use to describe correlation numerically.

Let’s look at an example:

Height and weight tend to go together. Tall people tend to be heavier. Short people tend to be lighter. Since the relationship is mutual, we can also say that heavy people tend to be taller, and light people tend to be shorter. However, we know that some short people are very heavy, either muscular or overly fat. Some tall people are relatively light for the opposite reasons: they carry little muscle or are skinny. The degree of correlation varies.

Normally, correlation is computed mathematically. The common method is to compute a statistic called the Pearson Correlation Coefficient, also called Pearson r. It gives a measure ranging from -1 to +1. A Pearson r of -1 indicates that two measures are perfectly correlated, but move in opposite directions. A Pearson r of +1 indicates that two measures are perfectly correlated, and move in the same direction. A Pearson r of 0 indicates that two measures show no straight-line (linear) relationship.
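To make the calculation concrete, here is a minimal sketch of Pearson r written out by hand in Python. The heights and weights are invented for illustration, and the function name pearson_r is my own; statistical libraries provide their own versions.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How the two measures vary together, and how each varies on its own
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / math.sqrt(variation_x * variation_y)

# Made-up heights (cm) and weights (kg), for illustration only
heights = [150, 160, 165, 172, 180, 188]
weights = [52, 58, 70, 66, 80, 85]

r_hw = pearson_r(heights, weights)                             # positive, below +1
r_same = pearson_r(heights, [2 * h + 1 for h in heights])      # perfect, same direction
r_opposite = pearson_r(heights, [-h for h in heights])         # perfect, opposite direction
r_curve = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])        # y = x squared

print(f"height vs weight: {r_hw:.2f}")
print(f"same direction: {r_same:.2f}, opposite: {r_opposite:.2f}, curve: {r_curve:.2f}")
```

The last case is worth a second look: y is completely determined by x, yet Pearson r is 0, because the statistic only picks up straight-line relationships.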

A lot of studies, particularly those reported on medicine and nutrition in the media, report correlation numbers (if they even do that). What they do not tell you is that correlation does not show that one factor causes the other. The correlation may be purely accidental, it may only show that some other factor is causing both to vary together, or one factor may actually be causally linked to the other.
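The "some other factor" case can be simulated. The sketch below is only an illustration under assumptions of my own: a hidden factor drives two measures, a and b, and neither measure influences the other. All names and noise levels are hypothetical.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / math.sqrt(variation_x * variation_y)

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 2000

# Hypothetical hidden factor that nobody measured
hidden = [rng.gauss(0, 1) for _ in range(n)]

# Each measure depends on the hidden factor plus its own noise.
# Neither measure appears in the formula for the other.
a = [h + rng.gauss(0, 0.5) for h in hidden]
b = [h + rng.gauss(0, 0.5) for h in hidden]

r_ab = pearson_r(a, b)  # in theory about 0.8 for these settings

# Subtract the hidden factor's contribution and see what is left
a_rest = [x - h for x, h in zip(a, hidden)]
b_rest = [y - h for y, h in zip(b, hidden)]
r_rest = pearson_r(a_rest, b_rest)  # should be close to 0

print(f"a vs b: {r_ab:.2f}")
print(f"a vs b with the hidden factor removed: {r_rest:.2f}")
```

A study that measured only a and b would see a strong correlation and could be tempted to conclude that one causes the other.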

For spurious and humorous correlations, see: https://www.tylervigen.com/spurious-correlations

Since the correlation coefficient varies from 0 to +/- 1, you might be forgiven for thinking that a correlation coefficient of .5 is a big deal. However, a better measure of the degree of association is given by the square of the coefficient. This statistic is called the coefficient of determination, and it gives the proportion of the variance in one measure that can be predicted from the other. So, a Pearson r of .5, when squared, shows that only 25 percent of the variability is accounted for. In similar fashion, a Pearson r of .4 gives a coefficient of determination of 16 percent. Maybe this correlation is not quite the big deal that you initially thought it to be.
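The arithmetic is simple enough to check for a range of values. This small sketch (the function name is my own) squares a few correlation coefficients:

```python
def coefficient_of_determination(r):
    """Square of Pearson r: the proportion of variance accounted for."""
    return r ** 2

# A few sample correlation coefficients, chosen only for illustration
for r in (0.1, 0.4, 0.5, 0.7, 0.9):
    share = coefficient_of_determination(r)
    print(f"r = {r:.1f} accounts for {share:.0%} of the variance")
```

Note how quickly the share shrinks: the correlation has to reach about .7 before it accounts for roughly half of the variance.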

So, what is the correlation between the presence of living fish and the presence of water? When you find living fish, you normally find water. It does not work the other way around, does it? It seems to me that the mutuality part is not in evidence. The correlation coefficient should not be all that high, if my thinking is correct.
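One way to put numbers on this is to treat "fish present" and "water present" as yes/no measures. For two yes/no measures, Pearson r reduces to what is called the phi coefficient, computed from a two-by-two table of counts. The counts below are entirely invented (a hypothetical survey of 100 sites), so the result only illustrates the reasoning; real counts would give a different figure.

```python
import math

def phi_coefficient(both, first_only, second_only, neither):
    """Pearson r for two yes/no measures, from a 2x2 table of counts."""
    numerator = both * neither - first_only * second_only
    denominator = math.sqrt(
        (both + first_only) * (second_only + neither)
        * (both + second_only) * (first_only + neither)
    )
    return numerator / denominator

# Hypothetical survey of 100 sites; these counts are made up
fish_and_water = 20
fish_no_water = 0     # living fish are not found without water
water_no_fish = 60    # plenty of water has no fish in it
no_fish_no_water = 20

phi = phi_coefficient(fish_and_water, fish_no_water, water_no_fish, no_fish_no_water)

# The two one-way questions give very different answers
p_water_given_fish = fish_and_water / (fish_and_water + fish_no_water)
p_fish_given_water = fish_and_water / (fish_and_water + water_no_fish)

print(f"phi (Pearson r for yes/no data): {phi:.2f}")
print(f"chance of water where there are fish: {p_water_given_fish:.2f}")
print(f"chance of fish where there is water: {p_fish_given_water:.2f}")
```

With these made-up counts, fish guarantee water, yet the correlation coefficient is only .25, because water so often turns up without fish.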

This article goes into more detail: https://statisticsbyjim.com/basics/correlations/

It has been a few decades since I formally studied this stuff. My apologies if I dealt out misinformation; I don’t have a reviewer. However, the idea of correlation and its implications should be an essential part of your mental toolkit.

Systematic investigation

Science, at least in the official canon, starts with theories which then are used to generate hypotheses. These then are used to guide investigation.

With science, there is the stereotyped, somewhat cartoonish picture of how it is done, and then there is the reality of what is actually done. It is a flawed enterprise, subject to the same problems that arise in any field of expertise. See my article, Trusting the Experts, on this site: https://wordpress.com/block-editor/post/ephektikoi.ca/533

The essence of good research is to study things in a systematic manner. A researcher will try to control as many factors as possible, and try to isolate key factors to see if they influence outcomes.

In experimental work, some factors of interest are identified. Those that are expected to be affected by the experimental treatments are called dependent variables. Those that the experimenter deliberately varies, in the hope of producing a change in the dependent variables, are called independent variables.

In some circumstances, in some areas of research, we can run studies which attempt to control factors which might influence the outcomes. Researchers set up experimental conditions to include a control group, which does not get the experimental treatment. This group of subjects is then assessed to see if they show the same results as the experimental group. There are limitations to such controlled studies, as not all areas of research lend themselves to such studies.

Experimenters try to reduce the effect of bias by making sure that the subject (if human) or the experimenter (if human) does not know which groups are given the various treatments. Keeping only the subject uninformed is called single blind. Keeping both the subject and the experimenter uninformed is called double blind.

In order to control for differences between groups which might affect the outcomes, the experimenters try to assign subjects to groups in a random manner, so that the group selection process is not biased.
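Random assignment can be as simple as shuffling the list of subjects and splitting it in two. The sketch below assumes twenty hypothetical subjects and two equal groups; the function name and labels are my own, and real studies often use more elaborate schemes.

```python
import random

def assign_groups(subjects, seed=None):
    """Shuffle the subjects and split them into treatment and control groups."""
    rng = random.Random(seed)  # a seed makes the assignment reproducible
    shuffled = list(subjects)  # copy, so the original list is left alone
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}

# Hypothetical subject labels, for illustration only
subjects = [f"subject_{i:02d}" for i in range(1, 21)]
groups = assign_groups(subjects, seed=42)

print("treatment:", groups["treatment"])
print("control:  ", groups["control"])
```

Because the shuffle decides who lands where, the experimenter's preferences, conscious or not, have no say in the group membership.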

Despite all attempts to control variables, and determine if the effects are real, attempts to replicate studies distressingly often fail to give the same results as the original work. Replication is actually not attempted very often, because it is not likely to lead to a published paper. When it is investigated, it shows that the experimental results are often not verified.

Experimental science is not the only mode of useful inquiry. There are many specialized types of systematic research methods. Different research areas need different research techniques. Controlled experiment is only one method, and not useful or possible in many investigations.

In addition to controlled experimental studies there are exploratory studies looking at data, maybe large data sets (big data), and trying to see if there are patterns worthy of further exploration.

There are various historical sciences that only use controlled studies in a peripheral manner. These would include such things as geology, paleontology, and archaeology.

There are also various types of observational studies conducted in the soft sciences, such as experimental psychology and sociology, where surveys and questionnaires may be the tools of choice. There is a limited role for controlled studies in these areas.

Studies in field biology, botany, and zoology are also less reliant on controlled experiment. A lot of research is done in the field and the researcher seldom makes it into the lab. Systematic observation is often the main method. Experimental methods may be employed, but more as an adjunct to field work. More and more, things such as DNA analysis are used, but these techniques are not necessarily used as part of controlled experiments.

There are also some areas where the study is primarily based on mathematical and computer modelling, using data obtained in various ways, but lacking the controls possible in experimental methods.

Studies in any field, using various methods, are seldom definitive. All results, however systematically and competently obtained, are subject to interpretation. The way a study is written up might make it seem that the process was well structured and well managed, and that the conclusions follow naturally. Anybody who actually does science knows that this is not the case. Any set of data, or results, must be interpreted, and there is always bias and subjectivity in that interpretation.

In addition, a study may only bear some tenuous connection to the theory and hypothesis under contention. In fact, the study may be perfect in all details, except it fails to answer any questions of interest. It may be quite irrelevant to the problem at hand, or it may at the least examine a trivial question, maybe competently, maybe not, but it does not matter.

There are some things that should be kept in mind with respect to science:

  1. Science is not done by consensus. The consensus may be wrong, and historically it often has been.
  2. Science is often asserted to be self-correcting; you might get it wrong now, but eventually it will be made right. This, however, is not necessarily the case. Sometimes, changes in viewpoints occur in what Kuhn called a paradigm shift in his 1962 book “The Structure of Scientific Revolutions”. See: https://plato.stanford.edu/entries/thomas-kuhn/. It does not follow that the new paradigm is better than the old. In fact, there are cases where some have argued that the new paradigm represented regress, not progress.
  3. Science is never settled. It is an evolving process, and the body of knowledge is constantly changing.

For further reading, it is useful to look at some of the literature on the problems with science. A good place to start is with a much-cited paper by John P. A. Ioannidis, Why Most Published Research Findings Are False, PLOS Medicine, published August 30, 2005. See: https://doi.org/10.1371/journal.pmed.0020124