Course title Introduction to Data Science
Course code InfTD019
Credit points (ECTS) 3
Total Hours in Course 81
Number of hours for lectures 12
Number of hours for seminars and practical classes 12
Independent study hours 57
Date of course confirmation 28/09/2022
Responsible Unit Institute of Computer Systems and Data Science
 
Course developers
Dr. agr., prof. Līga Paura
Dr. sc. ing., prof. Irina Arhipova
PhD programme (level), Paolo Mignone
PhD programme (level), Domenico Redavid
Dr. sc. ing., prof. Gatis Vītols

No prerequisite knowledge is required for this course.
 
Course abstract
The study course covers data processing and data mining algorithms, which provide support for solving multiple issues in different fields and making effective decisions.
Over the past few decades, we have been experiencing a profound socio-economic transformation: a transition to a post-industrial society, where the key assets are data, information, and knowledge in all cultural and business fields. In this post-industrial society, information is no longer considered an accessory tool; its correct management can determine the survival of an organization or its differentiation from competitors. In this context, Data Science plays a key role and favours the synthesis of new information and its dissemination within an organization.
The aim of the study course is to provide students with in-depth knowledge of managing databases and applying data mining algorithms, as well as of presenting the results of data processing and analysis.
Learning outcomes and their assessment
Knowledge and understanding:
•Acquisition of knowledge related to the best-known data mining algorithms in the literature - independent work with calculations;
•Understanding of the choices of data mining algorithms for specific tasks - completed practical works and independent work;
Applied knowledge and understanding:
•Ability to carry out a simple knowledge discovery project from a data collection through use of tools for the selection, preprocessing, and transformation of data, and for the validation of models and patterns - completed practical works and independent work;
•Use of data mining tools for the extraction of knowledge aimed at descriptive and predictive purposes - completed practical works and independent work;
Competences:
•Ability to carry out data visualisation and data analysis in projects using data processing software - completed practical works and independent work with calculations;
•Ability to interpret the results of a data mining algorithm, formulate conclusions, and make decisions according to the results - completed and presented independent work;
Communication skills:
•Students can explain the topics included in the course program using the specific vocabulary of the discipline - presented independent work.
Ability to learn:
•Students can autonomously investigate the topics included in the course program, including by drawing on resources not directly covered during the course - completed independent work.
Course Content (Calendar)
1.Introduction to the CRISP-DM industrial methodology. [Lectures - 1h]
2.Knowledge Discovery from Data (KDD) Process. [Lectures - 1h]
3.Developing an understanding of the application domain. [Lectures - 1h]
4.Creating a target data set: Data cleaning, Data preprocessing, Data reduction. [Lectures - 2h]
5.Choosing the data mining task. [Lectures - 1h]
6.Choosing the data mining algorithm(s). [Lectures - 1h]
7.Interpreting mined patterns. [Lectures - 1h]
8.Methods of data preprocessing and transformation, validation of the extracted patterns and models. [Lectures - 2h]
9.Well-Known Data mining algorithms. [Lectures - 2h]
10.Variable association. [Lectures - 1h]
11.Regression. [Lectures - 1h]
12.Naive Bayesian Framework. [Lectures - 2h]
13.Practice and exercises: use of tools for the selection, pre-processing, and transformation of data, and for the validation of the extracted patterns. [Practical works - 8h]
14.Practice and exercises: use of data mining tools for the extraction of knowledge aimed at predictive and descriptive purposes in different application contexts (business and scientific). [Practical works - 8h]
15.Independent work: creating a report based on a selected data set with the Weka tool and writing an interpretation of the obtained results.
Requirements for awarding credit points
The course includes practical exercises, completed during practical work in the classroom, and independent work: creating a report based on a selected data set with the Weka tool and writing an interpretation of the obtained results.
Final assessment of the study course: the final grade is given as a cumulative assessment of the study results. To receive credit points, students are required to successfully execute the KDD process for a selected dataset.
Description of the organization and tasks of students’ independent work
Within the framework of the study course, 48 hours are allocated for independent work.
The complete KDD process, with all its phases, should be executed. Business understanding can be developed starting from a specific dataset by simulating a real commitment. The other phases of the KDD process should be performed following the CRISP-DM standard. Students will present a case study orally, indicating all the KDD phases via CRISP-DM and using a dataset of interest previously chosen from the web.
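As a planning aid for the independent work, the following minimal sketch checks a report outline against the six phases of the standard CRISP-DM model. The function name `missing_phases` and the example outline are hypothetical and not part of the course materials; the course does not prescribe a report layout.

```python
# The six phases of the standard CRISP-DM process model.
CRISP_DM_PHASES = [
    "Business Understanding",
    "Data Understanding",
    "Data Preparation",
    "Modeling",
    "Evaluation",
    "Deployment",
]


def missing_phases(report_sections):
    """Return the CRISP-DM phases not yet covered by the report sections.

    Hypothetical helper: section titles are matched case-insensitively
    against the phase names, in CRISP-DM order.
    """
    covered = {title.strip().lower() for title in report_sections}
    return [phase for phase in CRISP_DM_PHASES if phase.lower() not in covered]


# Illustrative draft outline (assumed, not taken from the course).
draft = ["Business understanding", "Data Preparation", "Modeling"]
print(missing_phases(draft))
```

An empty list means every phase has a matching section in the outline.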
Criteria for Evaluating Learning Outcomes
The study course is completed when all practical work has been done and the homework has been developed and defended. The evaluation of the homework depends on the degree of completion: preparation (up to 40 points), presentation (up to 20 points), and oral explanation of theoretical concepts (up to 40 points). The homework is passed if at least 50 points have been obtained.
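The point scheme above can be illustrated with a short calculation. This is a sketch only: the component names, the helper functions, and the example scores are assumptions made for illustration, and the pass mark is read as "50 points or more".

```python
# Maximum points per component, as stated in the evaluation criteria.
MAX_POINTS = {"preparation": 40, "presentation": 20, "oral_explanation": 40}
PASS_THRESHOLD = 50  # assumed to mean "at least 50 points"


def homework_total(scores):
    """Sum the component scores after checking each against its maximum."""
    for name, maximum in MAX_POINTS.items():
        value = scores[name]
        if not 0 <= value <= maximum:
            raise ValueError(f"{name} must be between 0 and {maximum}, got {value}")
    return sum(scores[name] for name in MAX_POINTS)


def homework_passed(scores):
    """Return True when the total reaches the pass threshold."""
    return homework_total(scores) >= PASS_THRESHOLD


# Hypothetical scores for one student.
example = {"preparation": 30, "presentation": 12, "oral_explanation": 15}
print(homework_total(example), homework_passed(example))  # 57 True
```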
Compulsory reading
1.Ethem Alpaydin. Introduction to Machine Learning. Fourth Edition: The MIT Press, 2020. - 659 p.
2.Laura Igual, Santi Seguí. Introduction to Data Science. A Python Approach to Concepts, Techniques and Applications. – Switzerland: Springer International Publishing, 2017. - 217 p.
3.Oded Maimon, Lior Rokach. Data Mining and Knowledge Discovery Handbook. Springer New York, NY, 2010. - 1285 p.
Further reading
1.Ian H. Witten, Eibe Frank and Mark A. Hall. Data Mining: Practical Machine Learning Tools and Techniques. - Elsevier Science, 2011. – 1190 p.
2.Data science & big data analytics: discovering, analyzing, visualizing and presenting data / EMC Education Services. - Indianapolis, IN: John Wiley and Sons, 2015. - xviii, 410 p.
3.Weka Wiki: The University of Waikato. Available at: https://waikato.github.io/weka-wiki/documentation/
Periodicals and other sources
1.Journal of Data Analysis and Information Processing: ISSN Online: 2327-7203, www.scirp.org/journal/jdaip
Notes
Optional course for LBTU PhD students.