Course title | Introduction to Data Science |
Course code | InfTD019 |
Credit points (ECTS) | 3 |
Total Hours in Course | 81 |
Number of hours for lectures | 12 |
Number of hours for seminars and practical classes | 12 |
Independent study hours | 57 |
Date of course confirmation | 28/09/2022 |
Responsible Unit | Institute of Computer Systems and Data Science |
Course developers | |
Dr. agr., prof. Līga Paura; Dr. sc. ing., prof. Irina Arhipova; PhD, Paolo Mignone; PhD, Domenico Redavid; Dr. sc. ing., prof. Gatis Vītols |
|
There is no prerequisite knowledge required for this course | |
Course abstract | |
The study course covers data processing and data mining algorithms, which support problem solving in a range of fields and effective decision-making.
Over the past few decades, we have been experiencing a profound socio-economic transformation: a transition to a post-industrial society in which the key assets in all cultural and business fields are data, information, and knowledge. In this society, information is no longer considered an accessory tool; its correct management can guarantee an organization's survival or differentiate it from its competitors. In this context, data science plays a key role, supporting the synthesis of new information and its dissemination within an organization. The aim of the study course is to provide students with in-depth knowledge of managing databases, applying data mining algorithms, and presenting the results of data processing and analysis. |
|
Learning outcomes and their assessment | |
Knowledge and understanding:
• Acquisition of knowledge of the best-known data mining algorithms described in the literature - independent work with calculations;
• Understanding of how to choose data mining algorithms for specific tasks - completed practical works and independent work;
Applied knowledge and understanding:
• Ability to carry out a simple knowledge discovery project, from data collection through the use of tools for the selection, preprocessing, and transformation of data, to the validation of models and patterns - completed practical works and independent work;
• Use of data mining tools to extract knowledge for descriptive and predictive purposes - completed practical works and independent work;
Competences:
• Ability to carry out data visualisation and data analysis in projects using data processing software - completed practical works and independent work with calculations;
• Ability to interpret the results of a data mining algorithm, formulate conclusions, and make decisions based on the results - completed and presented independent work;
Communication skills:
• Students can explain the topics included in the course program using the specific vocabulary of the discipline - presented independent work.
Ability to learn:
• Students can autonomously investigate the topics included in the course program, including through resources not directly covered during the course - completed independent work. |
|
Course Content(Calendar) | |
1. Introduction to the CRISP-DM industrial methodology. [Lectures - 1h]
2. Knowledge Discovery from Data (KDD) process. [Lectures - 1h]
3. Developing an understanding of the application domain. [Lectures - 1h]
4. Creating a target data set: data cleaning, data preprocessing, data reduction. [Lectures - 2h]
5. Choosing the data mining task. [Lectures - 1h]
6. Choosing the data mining algorithm(s). [Lectures - 1h]
7. Interpreting mined patterns. [Lectures - 1h]
8. Methods of data preprocessing and transformation; validation of the extracted patterns and models. [Lectures - 2h]
9. Well-known data mining algorithms. [Lectures - 2h]
10. Variable association. [Lectures - 1h]
11. Regression. [Lectures - 1h]
12. Naive Bayesian framework. [Lectures - 2h]
13. Practice and exercises: use of tools for the selection, preprocessing, and transformation of data, and for the validation of the extracted patterns. [Practical works - 8h]
14. Practice and exercises: use of data mining tools to extract knowledge for predictive and descriptive purposes in different application contexts (business and scientific). [Practical works - 8h]
15. Independent work: creating a report on a selected data set with the Weka tool and writing an interpretation of the obtained results. |
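The Naive Bayesian framework (topic 12) can be illustrated with a minimal sketch of a categorical Naive Bayes classifier in plain Python. It is an illustration only, not the Weka implementation used in the course; the toy weather-style records and the function names `train_naive_bayes` and `predict` are hypothetical. Each class is scored as its prior probability times the product of per-attribute conditional probabilities, with add-one (Laplace) smoothing so that an attribute value never seen with a class does not force the score to zero.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    # value_counts[(attribute_index, class)][value] -> number of occurrences
    value_counts = defaultdict(Counter)
    # Distinct values seen per attribute, needed for Laplace smoothing.
    domains = defaultdict(set)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            domains[i].add(value)
    return class_counts, value_counts, domains


def predict(model, row):
    """Return the class maximising P(c) * prod_i P(x_i | c), with add-one smoothing."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    best_class, best_score = None, -1.0
    for c, n_c in class_counts.items():
        score = n_c / total  # prior P(c)
        for i, value in enumerate(row):
            # Laplace smoothing: (count + 1) / (class size + number of distinct values)
            score *= (value_counts[(i, c)][value] + 1) / (n_c + len(domains[i]))
        if score > best_score:
            best_class, best_score = c, score
    return best_class


# Hypothetical toy data: (outlook, temperature) -> play?
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
        ("rainy", "cool"), ("overcast", "hot"), ("overcast", "cool")]
labels = ["no", "no", "yes", "yes", "yes", "yes"]
model = train_naive_bayes(rows, labels)

print(predict(model, ("sunny", "hot")))   # → no
print(predict(model, ("rainy", "mild")))  # → yes
```

With many attributes, practical implementations usually sum log-probabilities instead of multiplying raw probabilities, to avoid numerical underflow.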
|
Requirements for awarding credit points | |
The course includes practical exercises, carried out during practical classes in the classroom, and independent work: creating a report on a selected data set with the Weka tool and writing an interpretation of the obtained results.
Final assessment of the study course: the final grade is given as a cumulative assessment of the study results. To receive credit points, students must successfully carry out the KDD process on a selected dataset. |
|
Description of the organization and tasks of students’ independent work | |
The study course allocates 48 hours to independent work.
Students carry out the complete KDD process with all its phases. The business understanding phase may start from a specific dataset by simulating a real client assignment. The remaining phases of the KDD process follow the CRISP-DM standard. Students present their case study orally, covering all KDD phases as structured by CRISP-DM and using a dataset of interest that they selected beforehand from the web. |
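The KDD phases described above can be sketched as a small end-to-end pipeline. This is a minimal plain-Python illustration under stated assumptions, not the procedure prescribed for the homework (which uses Weka): the toy records, the goal of predicting `y` from `x`, and the helpers `pearson`, `fit_line` and `run_pipeline` are hypothetical. Comments mark the six CRISP-DM phases; the example drops incomplete records as its cleaning step, measures variable association with the Pearson correlation coefficient, fits a least-squares regression line on a training split, and reports the root-mean-square error on the held-out split.

```python
import math
import random

# Hypothetical toy records (x, y); None marks a missing value.
RECORDS = [(1.0, 2.1), (2.0, 3.9), (3.0, None), (4.0, 8.2), (5.0, 9.8),
           (6.0, 12.1), (None, 7.0), (7.0, 14.2), (8.0, 15.9), (9.0, 18.1)]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b


def run_pipeline(records, test_fraction=0.3, seed=0):
    # 1. Business understanding (assumed goal): predict y from x.
    # 2. Data understanding: count incomplete records and measure association.
    complete = [(x, y) for x, y in records if x is not None and y is not None]
    n_missing = len(records) - len(complete)
    r = pearson([x for x, _ in complete], [y for _, y in complete])
    # 3. Data preparation: keep complete records only, then split into train/test.
    rows = complete[:]
    random.Random(seed).shuffle(rows)
    n_test = max(1, int(len(rows) * test_fraction))
    test, train = rows[:n_test], rows[n_test:]
    # 4. Modeling: least-squares regression line on the training split.
    a, b = fit_line([x for x, _ in train], [y for _, y in train])
    # 5. Evaluation: root-mean-square error on the held-out split.
    rmse = math.sqrt(sum((y - (a + b * x)) ** 2 for x, y in test) / len(test))
    # 6. Deployment: here just a prediction function handed back to the caller.
    return {"n_missing": n_missing, "r": r, "intercept": a, "slope": b,
            "rmse": rmse, "predict": lambda x: a + b * x}


result = run_pipeline(RECORDS)
print(f"missing={result['n_missing']} r={result['r']:.3f} "
      f"slope={result['slope']:.2f} rmse={result['rmse']:.2f}")
```

Dropping incomplete records is only the simplest cleaning choice; imputation or a more careful validation scheme (for example cross-validation) would fit the same phase structure.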
|
Criteria for Evaluating Learning Outcomes | |
The study course is completed when all practical work has been done and the homework has been developed and defended. The homework is evaluated according to its degree of completion: preparation (40 points), presentation (20 points), and oral explanation of theoretical concepts (40 points). The homework is passed if 50 points have been obtained. |
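The homework scoring rule amounts to simple arithmetic: the three components add up to a maximum of 100 points, and the pass mark is 50. A minimal sketch, assuming "50 points have been obtained" means a total of at least 50; the function name `homework_result` and the component keys are hypothetical labels.

```python
# Maximum points per homework component, as listed in the evaluation criteria.
MAX_POINTS = {"preparation": 40, "presentation": 20, "oral_explanation": 40}
# Assumption: "passed if 50 points have been obtained" means a total of at least 50.
PASS_THRESHOLD = 50


def homework_result(points):
    """Return (total, passed) for a dict of component scores."""
    for part, value in points.items():
        if part not in MAX_POINTS:
            raise ValueError(f"unknown component: {part}")
        if not 0 <= value <= MAX_POINTS[part]:
            raise ValueError(f"{part} must be between 0 and {MAX_POINTS[part]}")
    # Components that were not assessed count as 0 points.
    total = sum(points.get(part, 0) for part in MAX_POINTS)
    return total, total >= PASS_THRESHOLD


print(homework_result({"preparation": 30, "presentation": 10, "oral_explanation": 15}))
# → (55, True)
```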
Compulsory reading | |
1. Ethem Alpaydin. Introduction to Machine Learning. Fourth Edition. The MIT Press, 2020. - 659 p.
2. Laura Igual, Santi Seguí. Introduction to Data Science: A Python Approach to Concepts, Techniques and Applications. Switzerland: Springer International Publishing, 2017. - 217 p.
3. Oded Maimon, Lior Rokach. Data Mining and Knowledge Discovery Handbook. New York, NY: Springer, 2010. - 1285 p. |
|
Further reading | |
1. Ian H. Witten, Eibe Frank, Mark A. Hall. Data Mining: Practical Machine Learning Tools and Techniques. Elsevier Science, 2011. - 1190 p.
2. EMC Education Services. Data Science & Big Data Analytics: Discovering, Analyzing, Visualizing and Presenting Data. Indianapolis, IN: John Wiley and Sons, 2015. - xviii, 410 p.
3. Weka Wiki. The University of Waikato. Available at: https://waikato.github.io/weka-wiki/documentation/ |
|
Periodicals and other sources | |
1. Journal of Data Analysis and Information Processing. ISSN (online): 2327-7203. www.scirp.org/journal/jdaip |
Notes | |
Optional course for LBTU PhD students. |