MS4S09 - Data Mining and Statistical Modelling 01 Apr 2025 - 31 Aug 2027 | Version 5
Associated Module Information
| Field | Value |
|---|---|
| Module Code | MS4S09 |
| Module Title | Data Mining and Statistical Modelling |
| Faculty | Faculty of Computing, Engineering and Science |
| Faculty Group | Computing and Mathematical Sciences |
| Faculty Sub Group | Mathematical Sciences |
| Module Leader | Jennifer Whewell |
| Module Team | Rebecca Peters, Ieuan Griffiths, Joel Harris, Peter Parody, Shauna Ford, Abigail Peters, Sharan Johnstone |
| First Intended Intake | SEP 2018 |
| Final Year of Intake | 2024 |
| Date Closed | |
| Credit Value | 20 |
| Credit Level | 7 |
| Language | English |
| Percentage of Module Taught in Welsh | 0 |
| Equivalent Module | |
| HECOS Codes | 100956 - programming |
| HECOS Code Weighting | 100 |
Document Version Information
| Field | Value |
|---|---|
| Version | 5 |
| Valid From | 01 Apr 2025 |
| Valid To | 31 Aug 2027 |
Module Aims
To equip students with the necessary skills to interrogate and evaluate complex datasets.
To provide an understanding of the discovery, interpretation and communication of meaningful patterns in data, and the ability to implement complex models and to interpret and critically analyse their results.
Content Summary
Manipulating data: Importing data, creating new variables, performing data queries and anomaly detection.
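The module does not name its software package, so the following is only a minimal Python sketch of one common anomaly-detection rule, Tukey's interquartile-range (IQR) fences, together with a new derived variable. The readings and the helper name `iqr_outliers` are hypothetical, and quartiles are computed with the standard library's default ("exclusive") method, so fence values may differ slightly from other software.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles (exclusive method)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical readings containing one obvious anomaly.
readings = [12, 14, 13, 15, 14, 13, 95, 12, 16]
flagged = iqr_outliers(readings)

# Creating a new variable: a True/False flag for each reading.
is_outlier = [v in flagged for v in readings]
print(flagged, is_outlier.count(True))
```

The multiplier `k = 1.5` is the conventional choice; larger values flag only more extreme points.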
SQL: Creating new tables, querying tables and joining tables.
Logistic regression: Categorical predictors, continuous predictors, and combinations of categorical and continuous predictors. Interpreting model fits, odds ratios and ROC curves.
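As an illustration only, the three SQL operations listed above (creating, querying and joining tables) can be tried with Python's built-in `sqlite3` module and an in-memory database. The table names, columns and rows are hypothetical, and other database systems use slightly different SQL dialects.

```python
import sqlite3

# In-memory database; tables and data are hypothetical.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Creating new tables
cur.execute("CREATE TABLE students (student_id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE marks (student_id INTEGER, module TEXT, mark REAL)")
cur.executemany("INSERT INTO students VALUES (?, ?)",
                [(1, "Ana"), (2, "Ben"), (3, "Cai")])
cur.executemany("INSERT INTO marks VALUES (?, ?, ?)",
                [(1, "MS4S09", 72.0), (2, "MS4S09", 38.0), (1, "OTHER", 65.0)])

# Querying a table: MS4S09 marks at or above 40
passed = cur.execute(
    "SELECT student_id, mark FROM marks WHERE module = 'MS4S09' AND mark >= 40"
).fetchall()

# Joining tables: a LEFT JOIN keeps students with no matching mark (NULL -> None)
joined = cur.execute(
    """
    SELECT s.name, m.mark
    FROM students AS s
    LEFT JOIN marks AS m
      ON s.student_id = m.student_id AND m.module = 'MS4S09'
    ORDER BY s.student_id
    """
).fetchall()

print(passed)
print(joined)
conn.close()
```

An `INNER JOIN` would instead drop Cai, who has no MS4S09 mark.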
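The sketch below assumes Python with NumPy. It fits a logistic regression by maximum likelihood using Newton-Raphson (iteratively reweighted least squares), converts the slope to an odds ratio and computes the area under the ROC curve in its rank (Mann-Whitney) form. The data set and function names are hypothetical. With a single categorical predictor, the fitted odds ratio equals the sample odds ratio of the 2×2 table. Continuous or mixed predictors would simply be extra columns of `X`.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Maximum-likelihood logistic regression by Newton-Raphson (IRLS).
    X must already contain a column of ones for the intercept."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))   # fitted probabilities
        w = p * (1.0 - p)                      # IRLS weights
        grad = X.T @ (y - p)                   # score vector
        hess = X.T @ (X * w[:, None])          # information matrix
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def roc_auc(scores, y):
    """AUC as P(score of a positive > score of a negative), ties counted as 1/2."""
    pos, neg = scores[y == 1], scores[y == 0]
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

# Hypothetical data: one binary (categorical) predictor coded 0/1.
group = np.repeat([0, 1], 10)
y = np.array([1] * 3 + [0] * 7 + [1] * 7 + [0] * 3)   # 3/10 vs 7/10 successes
X = np.column_stack([np.ones_like(group), group]).astype(float)

beta = fit_logistic(X, y)
odds_ratio = np.exp(beta[1])        # odds in group 1 relative to group 0
p_hat = 1.0 / (1.0 + np.exp(-X @ beta))
auc = roc_auc(p_hat, y)
print(round(float(odds_ratio), 4), round(float(auc), 2))
```

Here the odds are 3/7 in group 0 and 7/3 in group 1, so the fitted odds ratio is (7/3)/(3/7) = 49/9 ≈ 5.444. The AUC is 0.70 because the model can only assign the two probabilities 0.3 and 0.7.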
Text mining: Introduction to the concepts of text mining, including natural language processing techniques and text representation, which form the foundation of text-mining applications.
Case studies on text classification, topic modelling, sentiment analysis and social media mining.
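A common starting point for text representation is the bag-of-words document-term matrix. The following plain-Python sketch builds one and pairs it with a very simple lexicon-based sentiment score. The documents, stop-word list and two-word lexicons are hypothetical toys, and practical work would normally use a dedicated text-mining library with much larger lexicons.

```python
import re
from collections import Counter

def tokenise(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def document_term_matrix(docs, stop_words=frozenset()):
    """Bag-of-words representation: one row of term counts per document."""
    counts = [Counter(t for t in tokenise(d) if t not in stop_words) for d in docs]
    vocab = sorted(set().union(*counts))             # all distinct terms
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix

# Hypothetical toy lexicon for sentiment scoring.
POSITIVE, NEGATIVE = {"great", "good"}, {"terrible", "poor"}

def lexicon_score(text):
    """Number of positive words minus number of negative words."""
    tokens = tokenise(text)
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)

docs = ["The service was great, great value!", "Terrible service and poor value."]
vocab, dtm = document_term_matrix(docs, stop_words={"the", "was", "and"})
print(vocab)
print(dtm)
print([lexicon_score(d) for d in docs])
```

Each row of `dtm` is a numeric vector that text-classification or topic-modelling methods can take as input.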
Basic ideas of time series forecasting: level, trend and seasonality.
Characterising time series: Decomposition of a time series into its individual components using standard techniques, including those associated with Holt and Holt-Winters. Exponential smoothing, Holt's and Holt-Winters' models. The sum of squared errors (SSE). Fitting trends by the method of least squares, including quadratic and cubic fits. Differencing. Ideas of stationarity. Autocovariance and autocorrelation.
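The following is a minimal plain-Python sketch of simple exponential smoothing and Holt's linear (additive) trend method, with the SSE of one-step-ahead forecasts used to choose the smoothing constant by grid search. The initialisation conventions and the series `sales` are assumptions for illustration, since textbooks and packages initialise these methods in different ways. The seasonal Holt-Winters component is not included.

```python
def simple_exp_smoothing(y, alpha):
    """One-step-ahead forecasts for y[1:]; the level starts at y[0] (assumed)."""
    level, forecasts = y[0], []
    for obs in y[1:]:
        forecasts.append(level)                     # forecast made before seeing obs
        level = alpha * obs + (1 - alpha) * level   # update the level
    return forecasts

def holt(y, alpha, beta):
    """Holt's linear trend method: one-step-ahead forecasts for y[2:].
    The level and trend are initialised from the first two observations (assumed)."""
    level, trend, forecasts = y[1], y[1] - y[0], []
    for obs in y[2:]:
        forecasts.append(level + trend)
        prev_level = level
        level = alpha * obs + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return forecasts

def sse(actual, forecasts):
    """Sum of squared one-step-ahead forecast errors."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecasts))

sales = [12.0, 14.0, 13.0, 17.0, 16.0, 19.0, 21.0, 20.0]   # hypothetical series
best_alpha = min((a / 100 for a in range(1, 100)),
                 key=lambda a: sse(sales[1:], simple_exp_smoothing(sales, a)))
print(best_alpha, sse(sales[2:], holt(sales, 0.5, 0.5)))
```

On an exactly linear series such as 1, 3, 5, 7, 9, Holt's method initialised this way forecasts every point exactly, so its SSE is zero.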
Autoregressive models: AR(1) and AR(2) models, the Yule-Walker equations, random walks and random walks with drift. General AR(p) models. The auxiliary equation and stationarity. Moving average models: MA(1) and MA(2) models. General MA(q) models. Box-Jenkins ARIMA(p,d,q) models. The invertibility condition. The random shock model.
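For an AR(2) process $X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + Z_t$, the Yule-Walker equations are $\rho_1 = \phi_1 + \phi_2\rho_1$ and $\rho_2 = \phi_1\rho_1 + \phi_2$. The sketch below assumes Python with NumPy. It simulates a hypothetical stationary AR(2) process, estimates $(\phi_1, \phi_2)$ by solving these equations with sample autocorrelations, and checks stationarity through the roots of the auxiliary equation $m^2 - \phi_1 m - \phi_2 = 0$. The parameter values, seed and function names are illustrative choices.

```python
import numpy as np

def simulate_ar2(phi1, phi2, n, sigma=1.0, burn_in=500, seed=0):
    """Simulate X_t = phi1*X_{t-1} + phi2*X_{t-2} + Z_t with Gaussian white noise."""
    rng = np.random.default_rng(seed)
    z = rng.normal(0.0, sigma, n + burn_in)
    x = np.zeros(n + burn_in)
    for t in range(2, n + burn_in):
        x[t] = phi1 * x[t - 1] + phi2 * x[t - 2] + z[t]
    return x[burn_in:]                      # discard the start-up transient

def sample_acf(x, k):
    """Sample autocorrelation at lag k (lag-k autocovariance / lag-0 autocovariance)."""
    d = x - x.mean()
    return np.sum(d[k:] * d[: len(d) - k]) / np.sum(d * d)

def yule_walker_ar2(x):
    """Solve [[1, r1], [r1, 1]] @ [phi1, phi2] = [r1, r2]."""
    r1, r2 = sample_acf(x, 1), sample_acf(x, 2)
    R = np.array([[1.0, r1], [r1, 1.0]])
    return np.linalg.solve(R, np.array([r1, r2]))

def is_stationary_ar2(phi1, phi2):
    """Stationary iff both roots of m^2 - phi1*m - phi2 = 0 lie inside the unit circle."""
    return bool(np.all(np.abs(np.roots([1.0, -phi1, -phi2])) < 1.0))

x = simulate_ar2(0.5, 0.3, n=20_000)
phi_hat = yule_walker_ar2(x)
print(phi_hat, is_stationary_ar2(0.5, 0.3), is_stationary_ar2(0.6, 0.5))
```

A random walk ($\phi_1 = 1$, $\phi_2 = 0$) has a root $m = 1$ on the unit circle and is therefore not stationary. Differencing it once gives white noise.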
The autocorrelation generating function. Seasonality in models: seasonal ARIMA(p,d,q)(P,D,Q)s models, where s is the seasonal period.
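Consider a linear process $X_t = \sum_j \psi_j Z_{t-j}$ driven by white noise $Z_t$ with variance $\sigma^2$. Its autocovariance generating function is $\sigma^2\psi(z)\psi(z^{-1})$, and dividing by $\gamma_0$ gives the autocorrelation generating function. The worked case below uses an MA(1) model with the sign convention $X_t = Z_t + \theta Z_{t-1}$. This convention is an assumption, since some texts write $Z_t - \theta Z_{t-1}$, which flips the sign of $\theta$ in the results.

```latex
G(z) = \sum_{k=-\infty}^{\infty} \gamma_k z^k = \sigma^2\,\psi(z)\,\psi(z^{-1}),
\qquad \rho(z) = \frac{G(z)}{\gamma_0}

% MA(1): X_t = Z_t + \theta Z_{t-1}, so \psi(z) = 1 + \theta z
G(z) = \sigma^2 (1 + \theta z)(1 + \theta z^{-1})
     = \sigma^2 \left[ \theta z^{-1} + (1 + \theta^2) + \theta z \right]

\Rightarrow \quad \gamma_0 = \sigma^2 (1 + \theta^2), \quad
\gamma_{\pm 1} = \sigma^2 \theta, \quad
\gamma_k = 0 \ (|k| \ge 2), \quad
\rho_1 = \frac{\theta}{1 + \theta^2}
```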
Learning and Teaching Methods
| Activity Type | Hours |
|---|---|
| Lecture | 24 |
| Tutorial | 8 |
| Independent Study | 80 |
| Directed Study | 88 |
| Total Hours Selected | 200 |
Learning Outcomes
| # | Learning Outcome |
|---|---|
| LO1 | Understand techniques for interrogating and evaluating complex datasets, and design macros to extract patterns and relationships. |
| LO2 | Critically analyse, interpret and evaluate the outputs of statistical modelling techniques to derive useful insights from complex datasets. |
Module Requisites
N/A
Assessment Criteria
| Assessment Category | Assessment Type | Description | Duration (minutes) | Word Count | Weight (%) | Best of? | Pass Mark (%) |
|---|---|---|---|---|---|---|---|
| Asynchronous Assessment | Practical Coursework 1 (Asynch) | Analyse a data source using suitable statistical modelling techniques and report appropriate conclusions from the results. | N/A | 2000 | 50 | No | 40 |
| Synchronous Onsite Oral Assessment | Oral Assessment (Internally assessed, Onsite) 1 | Within a software package, select data mining tools and demonstrate their application through evaluation of outputs. | 15 | N/A | 50 | No | 40 |
Assessment Matrix
| Assessment Type | LO1 | LO2 |
|---|---|---|
| Practical Coursework 1 (Asynch) | ✔ | ✔ |
| Oral Assessment (Internally assessed, Onsite) 1 | ✔ | ✔ |