Monday, May 29, 2017

Avoid atheoretical data analysis - a rule for any data specialist

The process of scientific discovery matters, even in finance. One approach to finding new strategies is to gather observations first and then provide explanations: inductive reasoning. The other is to form a theory first and then test hypotheses derived from it: deductive reasoning. Much of machine and statistical learning is inductive, using data to suggest general hypotheses. Deductive reasoning is used in expert systems, where rules are specified first and then tested against the data.

The surge in data analysis rests on a belief that inductive methods can identify new relationships that no one had previously hypothesized. The danger comes when the analysis is atheoretical and the explanation is invented only after the results are in. This practice has a name: HARKing (hypothesizing after the results are known).

There is also data mining for anomalies or risk premia that may not exist. Data can be tortured until they yield some level of statistical significance, a practice called "p-hacking." A relationship is found, and only after the fact is it given an explanation. The data generate stories rather than tests of ideas. Call it meaning without structure. We don't want to throw all of these techniques on the trash heap, but experts and practitioners have an important role in guiding and interpreting what the data mining produces.
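To make the p-hacking point concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption rather than anything from a particular study: the function names (`t_stat`, `mine_random_strategies`), 200 hypothetical strategies, 250 trading days each, and i.i.d. normal daily returns with zero mean and 1% volatility. Each strategy's mean return is tested with a simple one-sample t-statistic, and any strategy with |t| above 1.96 (roughly the conventional two-sided 5% cutoff at this sample size) is kept as a "discovery."

```python
import math
import random
import statistics


def t_stat(returns):
    """One-sample t-statistic for the null hypothesis that the mean return is zero."""
    n = len(returns)
    mean = statistics.mean(returns)
    sd = statistics.stdev(returns)
    return mean / (sd / math.sqrt(n))


def mine_random_strategies(n_strategies=200, n_days=250, threshold=1.96, seed=42):
    """Test many noise-only strategies and return those that look 'significant'.

    Each hypothetical strategy has i.i.d. normal daily returns with zero mean and
    1% volatility (an assumption for illustration), so any hit is a false positive.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    discoveries = []
    for strategy_id in range(n_strategies):
        returns = [rng.gauss(0.0, 0.01) for _ in range(n_days)]
        t = t_stat(returns)
        # |t| > 1.96 is roughly the two-sided 5% cutoff for a sample this large
        if abs(t) > threshold:
            discoveries.append((strategy_id, t))
    return discoveries


if __name__ == "__main__":
    found = mine_random_strategies()
    print(f"{len(found)} of 200 pure-noise strategies pass |t| > 1.96")
    if found:
        best_id, best_t = max(found, key=lambda item: abs(item[1]))
        print(f"best 'discovery': strategy {best_id}, t = {best_t:.2f}")
```

Because every strategy is noise by construction, anything that passes the filter is a false discovery. With 200 tries, roughly 10 (about 5%) would be expected to clear the bar, the exact count depending on the seed, and each of them could be handed a plausible story after the fact.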

If the data suggest a relationship but there is no easy story to tell, the relationship should be treated as suspect. Good modeling and data analysis test hypotheses and stories, and start with an idea of what might be found in the data. Can there be new surprises in data? Of course, but with machine learning they should be the exception, not the rule.
