Course code BūvZD019

Credit points 6

Statistical and data mining methods for engineering research

Total Hours in Course 162

Number of hours for lectures 32

Number of hours for laboratory classes 32

Independent study hours 98

Date of course confirmation 04.09.2019

Responsible Unit Institute of Computer Systems and Data Science

Course developer

Author: prof. Irina Arhipova, Dr. sc. ing.

Course abstract

PhD students learn statistical and data mining methods for engineering research, from problem identification through data analysis to interpretation. Tasks are solved using the statistical analysis software SPSS and RStudio. The course covers traditional methods of data analysis and hypothesis testing: analysis of variance, regression analysis, analysis of covariance, discriminant analysis, principal component analysis and cluster analysis. The following data mining methods are also examined: classification, clustering, association, neural networks, sequential patterns and decision trees.
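As an illustration of the hypothesis-testing methods named in the abstract, the sketch below computes the F statistic of a one-way analysis of variance. It is a minimal example in plain Python rather than SPSS or RStudio, which the course itself uses; the function name `one_way_anova_f` and the sample measurements are hypothetical and are not taken from the course materials.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` is a list of samples, one list of numbers per treatment level.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical measurements for three treatment levels.
samples = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
f_stat, df_b, df_w = one_way_anova_f(samples)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

The resulting F value would then be compared with the critical value of the F distribution for the given degrees of freedom, which a statistical package reports as a p-value.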

Learning outcomes and their assessment

After completing the course, the Ph.D. student will have:
• knowledge - is familiar with and understands the most advanced scientific theories and findings, and is proficient in statistical and data mining methods for engineering research (the statistical and data mining methods needed to test the hypotheses of the doctoral thesis are analysed during the practical classes);
• skills - is able to independently evaluate and select appropriate statistical and data mining methods for engineering research, to contribute to expanding the frontier of knowledge or to offer a new understanding of existing knowledge and its practical applications, and to carry out a substantial amount of original research, part of which is at the level of internationally cited publications (the statistical and data mining methods needed to analyse the data of the doctoral thesis are selected during the practical classes);
• competences - is able, through independent critical analysis, synthesis and evaluation, to solve significant research tasks in the engineering field, and to independently propose a research idea, plan the work and find a solution (the independent work is developed and defended during the practical classes).

Course Content (Calendar)

1. Classification of statistical methods.
2. Data analysis with hypothesis testing.
3. Analysis of variance (ANOVA).
4. Regression analysis.
5. Analysis of covariance (ANCOVA).
6. Discriminant analysis.
7. Principal component analysis (PCA).
8. Cluster analysis.
9. Data mining process and methods.
10. Classification.
11. Clustering.
12. Association.
13. Neural networks.
14. Sequential patterns.
15. Decision trees.
16. Defence of the independent work.
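To illustrate one of the data mining topics above (topic 12, Association), the sketch below computes the support and confidence of an association rule over a small set of transactions. It is a plain Python example with hypothetical function names and made-up transactions, not material from the course itself.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) over the transactions."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # the rule never fires, so confidence is set to 0
    both = support(transactions, set(antecedent) | set(consequent))
    return both / base


# Hypothetical transactions: each set lists the items observed together.
data = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]
print(support(data, {"a", "b"}))
print(confidence(data, {"a"}, {"b"}))
```

A rule is usually kept only if both values exceed thresholds chosen by the analyst; those thresholds depend on the data and are not prescribed here.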

Requirements for awarding credit points

The independent work must be developed and defended. At least three different statistical and data mining methods must be applied to real data.

Description of the organization and tasks of students’ independent work

During the semester, students study the literature independently and consult academic staff members as needed.

Criteria for Evaluating Learning Outcomes

The assessment of learning outcomes depends on how fully the independent work has been developed. To obtain the minimum passing assessment, the student must formulate and test hypotheses using at least three statistical and data mining methods applied to the data of the doctoral thesis.

Compulsory reading

1. John H. Schuenemeyer, Lawrence J. Drew. Statistics for Earth and Environmental Scientists. Hoboken, New Jersey: John Wiley & Sons, 2011. 407 p.
2. Nathabandu T. Kottegoda, Renzo Rosso. Applied statistics for civil and environmental engineers. Oxford; Malden, MA: Blackwell Publishing, 2008. 718 p.
3. Jiawei Han, Micheline Kamber. Data mining: concepts and techniques. San Francisco: Morgan Kaufmann; Amsterdam [etc.]: Elsevier, 2006. 770 p.
4. Michael P. Marder. Research methods for science. Cambridge; New York, NY: Cambridge University Press, 2011. 227 p.

Further reading

1. Hahn G. J., Shapiro S. S. Statistical Models in Engineering. A Wiley-Interscience Publication. John Wiley & Sons, Inc., 1994. 347 p.
2. Gary D. Bouma, Rod Ling, Lori Wilkinson. The research process. Ontario: Oxford University Press, 2009. 257 p.
3. John W. Creswell. Research design: qualitative, quantitative, and mixed methods approaches. Thousand Oaks, Calif.: Sage, 2009. 260 p.

Periodicals and other sources

1. Statistical Analysis and Data Mining. Hoboken, Wiley-Blackwell. ISSN: 1932-1864. E-ISSN: 1932-1872.
2. International Journal of Data Mining, Modelling and Management. Inderscience. ISSN: 1759-1163. E-ISSN: 1759-1171.
3. Electronic Journal of Applied Statistical Analysis. ESE - Salento University Publishing. ISSN: 2070-5948.

Notes

Study course for all LLU doctoral study programs.