IU X Informatics Unit 4 Lesson 3 Data Deluge Implications for Scientific Method

So here is an interesting visual;
it comes from Wired. Wired had an issue, 16-07,
which was called The End of Science. And basically,
this issue discussed the point that the big data revolution
is changing science. So it’s not really
the end of science. It’s just the start of a new
science, dubbed the new methodology, which is data oriented
rather than theory oriented, or more precisely,
hypothesis oriented. The traditional way you analyzed
data was to state a hypothesis. Then you asked if that
hypothesis was true. This is how you were taught
to do experimental science. Now, that’s not so
obviously what you do. You just look at the data, and
you don’t state the hypothesis. You just see what
the data tells you. So that’s data-driven science
as opposed to hypothesis-driven science. So The End of Science here is
the end of hypothesis-driven and theoretical science. Of course,
it has not really ended. It’s just not the only game in town.
So let’s discuss this a little bit. There are now four
paradigms identified, paradigms or methodologies of
doing scientific research. The traditional ones were
around theory and experiment, where experiment is
driven by hypotheses. And the trivial example is Newton,
who looked at apples falling, and that helped him to design
his theory of mechanics. Then, around 1980,
it became clear that there was a third approach to science,
which was basically simulation. So this is taking a theory or often, actually,
just a “model” in inverted commas. And you took that theory or model,
and you used it to simulate what the theory or model predicted for
a certain observation. So that’s the so-called
computational science or the third paradigm of
scientific research. The fourth paradigm
was identified most clearly by this online book from
Microsoft, with the URL here: the so-called
Fourth Paradigm book, subtitled Data-Intensive Scientific Discovery. That’s the data-driven paradigm
of scientific research. Note this book is free, and
basically it says what we’ve said. It has a data-oriented rather than
a model-oriented view of discovery. So here’s an interesting comment about computing
around 1990. This third method became known
actually before that, but it became particularly
prevalent around 1990, because there was a big initiative
started, based on its importance, to build supercomputers and
things like that. And I thought this
was pretty important. And I tried to persuade Caltech,
the university where I was at the time, to actually adopt a so-called
computational science curriculum and train people in this area. And I was not successful. Possibly wisely, they said this
was premature and inappropriate. And now, we are sort of doing
the same thing, but for data science, not for
computational science. Now, we can wonder if this
is going to succeed or not. Will data science curricula succeed? And I just note here that there is a
difference between data science and computational science, and
that is the number of jobs. The number of jobs, as we discussed,
in data science is pretty large. McKinsey tells us there are
190,000 nerds and 1.5 million more general
people needed by 2018. So given the number of jobs,
it is reasonably interesting to ask whether we
shouldn’t directly train people. Get people with degrees
in data science rather than degrees in physics or
chemistry or computer science. Maybe degrees in data
science are good. There’s an alternative,
which is to give people degrees in computer science with
a specialization in data, or degrees in physics with
a specialization in data. Which choice is better is
not quite clear, and I do not know how it will turn out. But I think the case for
data science is stronger than for computational science,
just because of the number of jobs. And that’s sort of why
maybe you’re in this class.
