Opportunities and Perils of Data Science (V.3)
MIT EECS Seminar, Tuesday, November 5, 2021
Data science has provided unprecedented opportunities to gain new insights and to predict, recommend, cluster, classify, transform, and optimize. Catalyzed by statistics, operations research, large-scale networked computer systems, the vast availability of data, and machine learning algorithms, data science has been extraordinarily impactful to date, and it holds great promise in all disciplines. However, no new technology arrives without complications, and we have recently seen both the press and various political circles point to real, potential, and fictional implications of the field.
This presentation aims to balance the opportunities provided by data science against the many challenges that have ensued. At its core, the talk proposes a rubric that practitioners can apply to tease out data science’s complexities, and it maps out seven categories of data science challenges, ranging from engineering to ethics. The talk is illustrated with examples from many applications, and it concludes with some suggested ways to address the downsides of the field.
A Holistic View of Data Science
JPM Distinguished Lecture Series in AI, Tuesday, October 29, 2020
Data-driven approaches have led to powerful prediction, optimization, and automation techniques. Powered by large-scale, networked computer systems and machine learning algorithms, these techniques have been very impactful to date and hold great promise in many disciplines, from finance to even the humanities. However, no new technology arrives without complications, and we have recently seen the press and various political circles point to real, potential, and fictional implications of Big Data.
This presentation aims to balance the opportunities provided by Data Science and its associated artificial intelligence techniques with a discussion of the various challenges that have ensued. I review eleven types of challenges, including those which are technical (resilience and complexity), societal (difficulties in setting objective functions or understanding causation), and humanist (issues relating to free will or privacy). I build on my experiences in finance and big technology, show example problems, and suggest ways to address some of the unanticipated consequences of Big Data.
Thought Experiment re: Novel Coronavirus and Location Tracing
Dr. Alfred Z. Spector
February 4, 2020
I thought it a good time to bring up this topic, though we all hope the current Novel Coronavirus outbreak will dissipate before the issues below become of immediate importance.
My physician wife raised the challenge health officials face when an individual is suspected of having a higher-than-average likelihood of having contracted the novel coronavirus. When public health officials judge the chance of infection to be sufficiently low, they release the individual, perhaps only with advice. As I understand it, current US recommendations consider a person at risk for infection only if that person has had known contact with someone confirmed to have the coronavirus and, perhaps, has recently traveled to China. No doubt, officials are balancing the risk of spreading the disease, the need to provide health care, the cost of restraining individual freedom, the danger of overloading the healthcare system, and the burden on the economy. However, this approach may place contact tracing several steps (and days to weeks) behind actual exposure.
Location tracking at the individual level could help close that gap and could encourage and remind individuals to conform to official advice. Health officials might also ask potential patients to voluntarily share their location history or even to opt in to monitoring. Beyond that, if health officials were to have access to location data from all cell phones, they could infer proximity among all phone-carrying individuals and construct rather accurate contact graphs. The resulting data would show how the disease spreads and enable more effective control of that spread. But we also know this approach would raise very serious concerns, particularly in the great democracies with their traditions of freedom of assembly and strong privacy protection. Thus, even to contemplate such an approach, we would have to find ways to gather the data so that only legitimate queries could be run against it. It’s worth noting that even a fully centralized approach might never be complete, since some individuals do not carry phones and some might deliberately resist compliance, even by using burner phones.
There are certainly privacy-sensitive algorithms that computer scientists could apply to reduce the privacy implications of this approach, though just as certainly such threats could not be completely eliminated. For example, there is no need to report an individual’s actual location history: what matters is the contact network of the potentially infected individual, though this too has serious privacy implications. Geofencing could be used to exclude data from particularly sensitive locations, and activities in which an individual was not in contact with others would never need to be divulged.
The current outbreak is a good reminder that we should be doing more research into technologies that support health officials during disease outbreaks. Technology, through increased mobility, may have made disease dissemination more likely, but it could also mitigate some of the risks that travel creates. This thought experiment also illustrates the opportunities and perils of using location data in a public health emergency, a policy topic on which we as a society should develop a consensus.
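To make the idea of reporting a contact network rather than a location history concrete, here is a minimal Python sketch under stated assumptions: phones report time-bucketed positions on a local planar grid measured in meters, "contact" means two phones within a hypothetical radius during the same time bucket, and sensitive regions are modeled as hypothetical circular geofences. The names `contact_network`, `contacts_of`, and `in_geofence` are illustrative only, not the API of any real system, and the sketch omits the cryptographic and access-control protections a real deployment would require.

```python
from collections import defaultdict
from itertools import combinations
from math import hypot


def in_geofence(x, y, zones):
    """Return True if (x, y) lies inside any excluded circular zone (cx, cy, radius)."""
    return any(hypot(x - cx, y - cy) <= r for cx, cy, r in zones)


def contact_network(pings, radius_m, excluded_zones):
    """Build an undirected contact graph from (phone_id, time_bucket, x, y) pings.

    Assumptions (hypothetical): coordinates are meters on a local grid, and two
    phones are "in contact" if they are within radius_m in the same time bucket.
    Pings inside an excluded geofence are dropped before any comparison, so
    sensitive locations never contribute edges.
    """
    by_time = defaultdict(list)
    for phone, t, x, y in pings:
        if not in_geofence(x, y, excluded_zones):
            by_time[t].append((phone, x, y))

    edges = set()
    for points in by_time.values():
        for (pa, xa, ya), (pb, xb, yb) in combinations(points, 2):
            if pa != pb and hypot(xa - xb, ya - yb) <= radius_m:
                edges.add(frozenset((pa, pb)))
    # Only who-met-whom survives; positions and times are discarded.
    return edges


def contacts_of(edges, phone):
    """Answer the one query of interest: whom did this phone come near?"""
    return {other for e in edges if phone in e for other in e if other != phone}
```

The design choice worth noting is that the output contains no locations or timestamps at all, and solitary activity never appears because it produces no edge; a real system would still need to restrict who may issue `contacts_of` queries.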
Research on the Edge of the Expanding Sphere, V. 2.0 *
Dr. Alfred Z. Spector
Senior Vice Chancellor for Research Lecture
Science 2019, 18-October-2019
The multi-trillion-fold (!) increase in computational capability over the past 60 years, coupled with global connectivity and vast amounts of data, has made for vibrant fields of research that are growing with no end in sight.
- This explosive growth in computing and data has led to excellent results in the core of the field: e.g., ever more capable and creative algorithms; the capability to build vast, globally networked systems that support most of the world’s population; and effective solutions to grand challenge problems, such as speech and image recognition. However, many challenges remain in the core. For example, we still have trouble designing and building robust, large-scale systems, and we need breakthroughs in knowledge representation and inference. Both are needed to realize, for example, Level 5 autonomous vehicles.
- There are also immense opportunities on the edge of the field’s Expanding Sphere, at the border of Computer Science and X, for all fields X. Great creativity will be required to adapt our technologies to new application domains in healthcare, education, and manufacturing, to name a few.
- Finally, with the rapidly growing importance of computing in all aspects of society, important research areas abound at the intersection of computing and ethics/public policy.
In this talk, I’ll discuss the breadth of challenges we face and illustrate my points with many examples from my experience leading research teams in academia and industry.
* V. 1.0 of this talk was given on 11/8/04 at Harvard Center for Research on Computation and Society.
Opportunities and Perils in Data Science
Dr. Alfred Z. Spector
Presentations at Berkeley, Caltech, Cornell, Harvard, MIT, and Rice
Over the last few decades, empiricism has become the third leg of computer science, adding to the field’s traditional bases in mathematical analysis and engineering. This shift has occurred due to the sheer growth in the scale of computation, networking, and usage, as well as progress in machine learning and related technologies. The resulting data-driven approaches have led to extremely powerful prediction and optimization techniques and hold great promise, even in the humanities and social sciences. However, no new technology arrives without complications. In this presentation, I will balance the opportunities provided by big data and associated A.I. approaches with a discussion of the various challenges. I’ll enumerate ten plus one categories, including those that are technical (e.g., resilience and complexity), societal (e.g., difficulties in setting objective functions or understanding causation), and humanist (e.g., issues relating to free will or privacy). I’ll provide many example problems and make suggestions on how to address some of the unanticipated consequences of big data.