MS4S09 - Data Mining and Statistical Modelling 01 Apr 2025 - 31 Aug 2027 | Version 5

Associated Module Information

Module Code: MS4S09
Module Title: Data Mining and Statistical Modelling
Faculty: Faculty of Computing, Engineering and Science
Faculty Group: Computing and Mathematical Sciences
Faculty Sub Group: Mathematical Sciences
Module Leader: Jennifer Whewell
Module Team: Rebecca Peters, Ieuan Griffiths, Joel Harris, Peter Parody, Shauna Ford, Abigail Peters, Sharan Johnstone
First Intended Intake: SEP 2018
Final Year of Intake: 2024
Date Closed:
Credit Value: 20
Credit Level: 7
Language: English
Percentage of Module Taught in Welsh: 0
Equivalent Module:
HECOS codes: 100956 - programming
HECOS Code Weighting: 100

Document Version Information

Version 5
Valid From 01 Apr 2025
Valid To 31 Aug 2027

Module Aims

To equip students with the necessary skills to interrogate and evaluate complex datasets.

To provide an understanding of the discovery, interpretation and communication of meaningful patterns in data, and to develop the ability to implement, interpret and critically analyse results from complex models.

Content Summary

Manipulating data: Importing data, creating new variables, performing data queries and anomaly detection.

SQL - creating new tables, querying tables, joining tables.
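As a minimal sketch of the three SQL tasks listed above, the following uses Python's built-in sqlite3 module with an in-memory database. The table names, column names and rows are hypothetical and invented for illustration only; the module itself may use a different SQL environment.

```python
import sqlite3

# Hypothetical tables and rows, invented purely for illustration.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Creating new tables.
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
cur.executemany(
    "INSERT INTO customers VALUES (?, ?)", [(1, "Ann"), (2, "Bob"), (3, "Cai")]
)
cur.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)", [(1, 1, 20.0), (2, 1, 35.0), (3, 2, 10.0)]
)

# Querying a table: orders above a threshold.
large_orders = cur.execute(
    "SELECT id, amount FROM orders WHERE amount > 15 ORDER BY id"
).fetchall()

# Joining tables: total spend per customer.
# LEFT JOIN keeps customers that have no orders; COALESCE turns their NULL sum into 0.
totals = cur.execute(
    """
    SELECT c.name, COALESCE(SUM(o.amount), 0.0) AS total
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id, c.name
    ORDER BY c.id
    """
).fetchall()
conn.close()

print(large_orders)
print(totals)
```

An inner JOIN in place of the LEFT JOIN would drop the customer with no orders from the result.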

Logistic regression – categorical predictor, continuous predictor, combinations of categorical and continuous variables. Interpreting model fits, odds ratios and ROC curve.
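To illustrate how an odds ratio relates to a logistic regression fit, the sketch below takes the simplest case of one binary (categorical) predictor. In that case the maximum-likelihood coefficients can be read directly from a 2x2 table of counts. The counts and variable names are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import math

# Hypothetical 2x2 counts, invented for illustration only.
# Predictor level x = 1 (exposed) or x = 0 (unexposed); outcome is event / no event.
exposed_event, exposed_no_event = 30, 70
unexposed_event, unexposed_no_event = 10, 90


def odds(events, non_events):
    """Odds of the event: P(event) / P(no event), estimated from counts."""
    return events / non_events


odds_exposed = odds(exposed_event, exposed_no_event)
odds_unexposed = odds(unexposed_event, unexposed_no_event)
odds_ratio = odds_exposed / odds_unexposed

# For the model logit(p) = b0 + b1 * x with a single binary predictor,
# the fitted coefficients coincide with these sample quantities.
b0 = math.log(odds_unexposed)  # log-odds at the baseline level x = 0
b1 = math.log(odds_ratio)      # so exp(b1) is the odds ratio


def predicted_probability(x):
    """Fitted probability of the event at predictor value x (0 or 1)."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


print(round(odds_ratio, 3))
print(round(predicted_probability(0), 3), round(predicted_probability(1), 3))
```

With continuous predictors or several predictors the coefficients no longer have this closed form and would be estimated numerically by a statistics package.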

Text Mining – Introduction to the concepts of text mining including natural language processing techniques and text representation, which are the foundation for all kinds of text-mining applications.

Case studies on text classification, topic modelling, sentiment analysis and social media mining.

Basic ideas of time series forecasting: level, trend and seasonality.

Characterising time series: Decomposition of time series into individual components using standard techniques associated with Holt and Holt-Winters. Exponential Smoothing, Holt’s and Holt-Winters’ models. The sum of squared errors (SSE). Trend estimation by the method of least squares using quadratic and cubic fits. Differencing. Ideas of Stationarity. Autocovariance and Autocorrelation.
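The role of the SSE in exponential smoothing can be sketched as follows: one-step-ahead forecasts are produced from a running level, and the smoothing constant is chosen to minimise the sum of squared forecast errors. This is a minimal sketch of simple (single) exponential smoothing only, not Holt's or Holt-Winters' methods. The data, the grid of candidate values and the choice to initialise the level at the first observation are all assumptions made for illustration.

```python
def simple_exponential_smoothing(series, alpha):
    """Return one-step-ahead forecasts and their sum of squared errors (SSE).

    Assumption: the level is initialised at the first observation.
    """
    level = series[0]
    forecasts = []
    sse = 0.0
    for y in series[1:]:
        forecasts.append(level)        # forecast made before observing y
        error = y - level
        sse += error ** 2
        level = level + alpha * error  # same as alpha * y + (1 - alpha) * level
    return forecasts, sse


# Hypothetical series, invented for illustration only.
series = [10.0, 12.0, 11.0, 13.0]
forecasts, sse = simple_exponential_smoothing(series, alpha=0.5)
print(forecasts, sse)

# Choose the smoothing constant by minimising the SSE over a coarse grid.
candidates = [i / 10 for i in range(1, 10)]
best_alpha = min(candidates, key=lambda a: simple_exponential_smoothing(series, a)[1])
print(best_alpha)
```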

Autoregressive models: AR(1), AR(2), Yule-Walker Equations, Random Walks and Random Walks with Drift. General AR(p) models. Auxiliary equation and stationarity. Moving Average Models: MA(1) and MA(2) models. General MA(q) models. Box-Jenkins ARIMA(p,d,q). The Invertibility Condition. The Random Shock Model.
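As a hedged illustration of the auxiliary equation and the Yule-Walker equations for an AR(2) model, the sketch below assumes the zero-mean form X_t = phi1 X_{t-1} + phi2 X_{t-2} + e_t and the convention in which the auxiliary equation is m^2 - phi1 m - phi2 = 0, with stationarity requiring both roots to lie strictly inside the unit circle. Some texts instead state the condition on the roots of the polynomial in the backshift operator, which must lie outside the unit circle. The function names and parameter values are hypothetical.

```python
import cmath


def ar2_auxiliary_roots(phi1, phi2):
    """Roots of the auxiliary equation m^2 - phi1*m - phi2 = 0 (possibly complex)."""
    disc = cmath.sqrt(phi1 ** 2 + 4 * phi2)
    return (phi1 + disc) / 2, (phi1 - disc) / 2


def is_stationary(phi1, phi2):
    """Under the convention above, stationary if both roots have modulus < 1."""
    return all(abs(m) < 1 for m in ar2_auxiliary_roots(phi1, phi2))


def yule_walker_acf(phi1, phi2, max_lag):
    """Autocorrelations rho_0..rho_max_lag of a stationary AR(2) process.

    Uses rho_1 = phi1 / (1 - phi2) and the Yule-Walker recursion
    rho_k = phi1 * rho_{k-1} + phi2 * rho_{k-2} for k >= 2.
    """
    rho = [1.0, phi1 / (1 - phi2)]
    for k in range(2, max_lag + 1):
        rho.append(phi1 * rho[k - 1] + phi2 * rho[k - 2])
    return rho[: max_lag + 1]


# Hypothetical parameter values, chosen for illustration only.
print(is_stationary(0.5, 0.3))   # a stationary AR(2)
print(is_stationary(1.0, 0.0))   # a random walk: one root lies on the unit circle
print(yule_walker_acf(0.5, 0.3, 3))
```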

The Autocorrelation Generating Function. Seasonality in Models. ARIMA(p,d,q)(P,D,Q)s.

Learning and Teaching Methods

Activity Type Hours
Lecture 24
Tutorial 8
Independent Study 80
Directed Study 88
Total Hours Selected 200

Learning Outcomes

# Learning Outcome
LO1 Learning Outcome 1: To understand techniques for interrogating and evaluating complex datasets, and to design macros for extraction of patterns and relationships.
LO2 Learning Outcome 2: Critically analyse, interpret and evaluate the outputs of statistical modelling techniques to support useful insights from complex datasets.

Module Requisites

N/A

Assessment Criteria

Assessment Category | Assessment Type | Description | Duration | Word Count | Weight (%) | Best of? | Pass Mark
Asynchronous Assessment | Practical Coursework 1 (Asynch) | Analyse a data source using suitable statistical modelling techniques and report appropriate conclusions from the results. | 0 | 2000 | 50 | No | 40
Synchronous Onsite Oral Assessment | Oral Assessment (Internally assessed, Onsite) 1 | Within a software package, select and demonstrate the application of data mining tools through evaluation of outputs. | 15 | N/A | 50 | No | 40

Assessment Matrix

Assessment Type Learning Outcomes
LO1 LO2
Practical Coursework 1 (Asynch)
Oral Assessment (Internally assessed, Onsite) 1

Reading List

Ohlhorst (2013) Big data analytics: turning big data into big money, Latest edition, Hoboken, N.J.: Wiley, ISBN 9781118147597

Artun & Levin (2015) Predictive marketing: easy ways every marketer can use customer analytics and big data, Latest edition, Hoboken, New Jersey: Wiley, ISBN 9781119037361

Verhoef, Kooge & Walk (2016) Creating Value with Big Data Analytics: Making Smarter Marketing Decisions, Routledge, ISBN 1317561929

Delwiche, L. and Slaughter, S. (2012) The Little SAS Book: A Primer, Fifth Edition. United States: SAS Institute

Chatfield, C. (2003) The Analysis of Time Series: An Introduction (Texts in Statistical Science Series). 6th edn. Boca Raton, FL: Chapman & Hall/CRC

Makridakis, S. G., Wheelwright, S. C. and Hyndman, R. J. (2008) Forecasting Methods and Applications. India: Wiley India Pvt.

Madsen, H. (2007) Time Series Analysis. Boca Raton: Chapman & Hall/CRC