Data Science


Introduction

Data Science is an emerging science that addresses the ubiquitous problem of learning from data across a wide variety of disciplines. Its building blocks include computer science, statistics, and mathematics. Disciplines such as biology, English, economics, and geography generate and use data to tackle problems, and are a key part of the data science ecosystem. Questions such as “What data should and shouldn’t be collected?” and “How should society use data and data-derived predictions?” are fundamental questions about the societal, cultural, philosophical, legal, and historical implications of the emerging uses of data.

The University of Toronto is a global destination for data-science-driven research and education. It is home to one of the world’s leading data science ecosystems, which includes award-winning faculty and data science researchers who are helping to address some of today’s most challenging issues. Within the Faculty of Arts & Science there is no single data science department; instead, many of our faculty members across several departments teach courses with significant data science components. We also offer a variety of programs of study in data science and related disciplines, including the Data Science Specialist Program, programs in computer science and statistical sciences, and further programs in the humanities, social sciences, and sciences, all listed throughout this page.

Foundational Data Science Courses

A wide range of foundational Data Science courses is offered within the Faculty of Arts & Science, providing a starting point for students of all backgrounds, interests, and academic and career goals.

Preparing for Programs in Data Science, Statistical Sciences, or Computer Science

Students who intend to pursue Specialist or Major programs in Data Science, Statistical Sciences, or Computer Science should enroll in STA130H1 and CSC108H1 (or CSC110Y1 if in the First-Year Computer Science Admission Category). STA130H1 is an introduction to the field of Data Science that combines logical thinking, mathematics, computer simulation, and oral and written discussion and analysis. CSC108H1 and CSC110Y1 each introduce the field of computer science, including the programming and computational thinking skills that are vital to data science studies. See below for a description of, and the course requirements for, the Data Science Specialist Program and other related programs.

Preparing for Other Programs that Incorporate Data Science

For students who are interested in pursuing other programs but who want to apply Data Science knowledge and skills to their studies, we offer three interdisciplinary foundational Data Science courses: EEB125H1 (for students in the Physical and Life Sciences), GGR274H1 (for students in the Social Sciences), and ENG286H1 (for students in the Humanities). Each of these courses covers the same core Data Science content but is tailored to students in each sector through a combination of domain-specific applications and themes.

While the foundational Data Science courses all have a significant computational component, none of them requires prior programming experience. Instead, these courses teach the relevant programming skills as they go, so that by the end of the course all students can write computer programs to analyze data. Students who complete EEB125H1, GGR274H1, or ENG286H1 and are interested in gaining a more in-depth introduction to computer programming should consider taking CSC108H1 and other computer science courses.

Data Science Specialist Program

The Data Science Specialist Program is offered jointly by the Department of Statistical Sciences and the Department of Computer Science. The program comprises three fundamental and highly integrated aspects. The first is statistical reasoning, methods, and inference, which are essential for any data analyst. The second is in-depth training in computer science: the design and analysis of algorithms and data structures for handling large amounts of data, and best practices in software design. Students also receive training in machine learning, which lies at the intersection of computer and statistical sciences. The third aspect is the application of computer science and statistics to produce analyses of complex, large-scale datasets, and the communication of the insights resulting from these analyses. Students receive training in these areas by taking integrative courses designed specifically for the Data Science Specialist Program. These courses involve experiential learning: students solve real-world data science problems using real datasets from a variety of domains such as business, social sciences, and medicine. The successful student will combine their expertise in computer and statistical science to collect, clean, and wrangle complex, large-scale datasets, and to produce and communicate analyses.

Students may also be interested in exploring other programs and courses offered by the Department of Statistical Sciences or the Department of Computer Science. In particular, the Specialist and Major programs in these departments share most of their first-year requirements with the Data Science Specialist Program.

Data Science in the Humanities

Large datasets and new computational techniques are reshaping our understanding of the humanities, and humanistic inquiry is in turn raising pressing questions about their ethics and impact. A humanities approach to Data Science equips students with the practical and theoretical skills to engage critically with literary data and computation. What insights emerge when we move from “close” to “distant” reading practices, that is, from the close examination of a few canonical texts to the algorithmic investigation of vast archives? How comprehensive are existing humanities datasets, and what gaps exist in them? What biases and assumptions are enshrined and naturalized in code?

In the humanities, scholars apply computational approaches to large datasets to study a wide range of topics:

  • How cities work in science fiction
  • How Black women abolition activists changed the language and society of the nineteenth-century United States
  • How Indigenous data sovereignty needs to inform digital research
  • How historians can use data about imprisonment to understand the past and the present
  • How data reflects power differences in our societies, and may be used to improve health and share culture – but also to surveil and discriminate
  • What datasets are missing, and what their absence tells us about our world

Students in humanities-focused Data Science courses learn how to write computer programs, how to negotiate tradeoffs between computational and statistical techniques, and how to make persuasive arguments with and about data. They weigh the merits of quantitative and qualitative approaches to problems in the humanities, engage critically with gaps and biases in datasets, and engage with urgent scholarship on the impacts of large-scale computing, data science, and algorithmic decision-making on race, gender, disability, and the environment.

Data Science in the Social Sciences

Social scientists are increasingly working with big and complex datasets that contain spatial, group-level, and individual-level dimensions to answer questions about society. Projects range from simulating the spread of disease across geographic space, to using data on housing prices to understand why market bubbles occur, to exploring how social network structure relates to the spread of misinformation. Students will develop introductory programming knowledge and data acumen in order to create and run computer programs that explore where, when, and why social processes occur, drawing on theories from geography, linguistics, sociology, anthropology, economics, political science, and psychology. Instruction will focus on applying current data analysis libraries, communicating and translating data science methods for both researchers and the public, distinguishing causation from correlation and coincidence, and negotiating tradeoffs between different computational and statistical approaches.

Data Science in the Physical and Life Sciences

Life and physical scientists increasingly use big and complex datasets to answer questions about society and the natural world. More than ever, advances in biology, chemistry, and physics require data science thinking and quantitative skills. This makes it urgent to develop the ability to break complex problems into simpler parts using computational and statistical approaches, and to collect, organize, interpret, and clearly communicate complex data. The Faculty of Arts & Science offers several programs of study that intentionally bridge data science learning and disciplinary topics in the life and physical sciences through the departments of Cell & Systems Biology, Chemistry, Ecology & Evolutionary Biology, and Physics.