Statistical Thinking for Data Science


Number of credits – 5.

Forms of instruction – lectures, laboratory classes.

Form of final assessment – exam.


In today's knowledge economy, information is becoming one of the main products. Timely receipt of information, its effective processing, and the formation of optimal managerial decisions based on the results allow market participants to develop strategies and tactics that are adequate to the conditions of their external and internal environment. The discipline "Statistical Thinking for Data Science" allows researchers to acquire knowledge and develop practical skills for gaining a competitive advantage in working with information and large amounts of data.

Subject of the discipline – statistical methods and economic and mathematical models for the estimation, diagnostics, and forecasting of socio-economic phenomena and processes, based on the study and processing of large data sets.

The purpose of the course is to develop theoretical knowledge and practical skills of statistical thinking for entrepreneurial activity.

The tasks of the discipline:

  • become acquainted with the capabilities of statistical methods and data analysis models and gain practical skills in using them;
  • develop skills in studying large data sets;
  • develop competence in statistical thinking for making effective managerial decisions;
  • identify sets of economic and mathematical models for the study of socio-economic systems.

Subject learning objectives

Upon successful completion of this subject students should be able to:

  • Manage the complexity of real data science projects and their inevitable compromises;
  • Formulate authentic data science questions precise enough to be answered by valid statistical techniques;
  • Justify the use of different statistical concepts and tools to audiences from a wide range of backgrounds;
  • Find, clean, and merge datasets from a range of sources to answer real world data science problems;
  • Apply statistical methods that are appropriate to a dataset and stakeholder requirements;
  • Interpret the results of a statistical analysis correctly, visualizing and reporting upon them in ways that create value for, and are sensitive to the needs of, a wide range of stakeholders.

Course intended learning outcomes

This subject also contributes specifically to the development of the following Course Intended Learning Outcomes – competencies:

  • Exploring and testing models and describing behaviors of complex systems. Explore and test models and generalisations for describing the behavior of sociotechnical systems, and select data sources, taking into account the needs and values of different contexts and stakeholders;
  • Making the invisible visible. Use transdisciplinary approaches to seeing and doing to uncover underrepresented, or misrepresented, elements of a system;
  • Exploring, interpreting and visualizing data. Explore, analyze, manipulate, interpret and visualize data using data science techniques, software and technologies to make sense of data-rich environments;
  • Designing & managing data investigations. Apply and assess data science concepts, theories, practices and tools for designing and managing data discovery investigations in professional environments that draw upon diverse data sources, including efforts to shed light on under-represented components;
  • Informing decision making. Develop, test, justify and deliver data project propositions, methodologies, analytics outcomes and recommendations for informing decision-making, both to specialist and non-specialist audiences.

Module 1. Dynamic Thinking for Data Science
Topic 1. Econometric methods of research
Topic 2. Decomposition models for data analysis

Module 2. Multidimensional Thinking for Data Science
Topic 3. Factor analysis
Topic 4. Classification. Cluster analysis
Topic 5. Data Recognition and Discriminant Analysis

  1. Seddighi H.R., Lawler K.A., Katos A.V. Econometrics: A Practical Approach. – London: Routledge, 2000.
  2. Kennedy P. A Guide to Econometrics. – Blackwell, 1999.
  3. Gencay R., Selcuk F., Whitcher B. Differentiating intraday seasonalities through wavelet multi-scaling // Physica A. – 2001. – No. 289. – P. 543–556.
  4. Magnus Ya.R., Katyshev P.K., Peresetsky A.A. Econometrics. An Introductory Course: Textbook. – 8th ed., rev. – Moscow: Delo, 2007. – 504 p.
  5. Statistics. Study Guide / Ed. by Dr. Econ. Sci., Prof. O.V. Raievnieva. – Kharkiv: KhNUE Publishing, 2010. – 520 p.
  6. Factor, Discriminant, and Cluster Analysis: Transl. from English / J.-O. Kim, C.W. Mueller, W.R. Klecka et al.; Ed. by I.S. Enyukov. – Moscow: Finansy i Statistika, 1989. – 215 p.
  7. Khalafyan A.A. STATISTICA 6. Statistical Data Analysis. – Moscow: Binom-Press, 2008. – 512 p.