It is worth beginning by asking whether firms that leverage data analytics gain a capability that differentiates them and gives them an edge over their competitors.

A wide body of evidence suggests that the answer to this question is “yes”. A report from Wegener & Sinha shows that firms using data analytics are twice as likely to be top-quartile financial performers, and another from McKinsey & Company shows that data-analytic firms increase their earnings before tax by 20%.

Beyond these two examples there is a wide body of…

We often work with a sample of values, such as 200 users clicking on our new website page or 30 voters in an exit poll on their way out of the voting station.

In order to make a statement about these observations, we need to use confidence intervals rather than just quoting the percentage or result of the trial.

A Confidence Interval is a range of values we are fairly sure our true value lies in i.e. …
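As a rough illustration of the idea, the sketch below computes an approximate 95% confidence interval for a click-through rate. It assumes that 30 of the 200 sampled users clicked (a made-up figure, not taken from any real trial), and it uses the simple normal approximation, which is only one of several ways to build such an interval. The function name is hypothetical.

```python
import math


def proportion_confidence_interval(successes, n, z=1.96):
    """Normal-approximation (Wald) interval for a proportion.

    z=1.96 corresponds to roughly 95% confidence.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    # Standard error of a sample proportion, scaled by the z value
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    # Clamp to the valid range for a proportion
    return max(0.0, p_hat - margin), min(1.0, p_hat + margin)


# Assumed example: 30 clicks out of 200 sampled users
low, high = proportion_confidence_interval(30, 200)
print(f"Observed rate: {30 / 200:.1%}, interval: {low:.1%} to {high:.1%}")
```

With a sample as small as the 30 exit-poll voters the normal approximation becomes shaky, so the interval should be read as indicative rather than exact.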

A box plot showing the median and interquartile range is a good way to visualise a distribution, especially when the data contains outliers. The meaning of the various aspects of a box plot can be explained as follows:
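For readers who prefer numbers to pictures, here is a minimal sketch of the values a box plot typically draws. It assumes the common convention that whiskers extend to the furthest points within 1.5 times the interquartile range of the box, with anything beyond that treated as an outlier; other plotting tools may use different rules. The function name and the sample data are made up for illustration.

```python
import statistics


def box_plot_summary(values):
    """Return the quartiles, whisker ends and outliers for a list of numbers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumed convention: fences sit 1.5 * IQR beyond the box
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "lower_whisker": min(inside),
        "upper_whisker": max(inside),
        "outliers": sorted(v for v in values if v < lower_fence or v > upper_fence),
    }


# Made-up sample with one obvious outlier
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
summary = box_plot_summary(sample)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Note how the single extreme value is reported as an outlier while the median and quartiles are barely affected by it.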

We are going to need some test data to explore the issues around outliers …

The generate() function below (adapted from Stack Overflow) generates a list of floats with a given median that contains outliers (values a long way from the median), which we can use to explore the concept.

The generate() function was modified from https://stackoverflow.com/questions/55351782/how-should-i-generate-outliers-randomly
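To give a feel for what such test data looks like, here is a stand-in sketch. It is not the Stack Overflow function referenced above; it is a simplified, hypothetical generate_with_outliers() that draws most values from a normal distribution centred on the requested median and then adds a handful of values far away from it. All parameter names and defaults are assumptions, and the median of the result will only be approximately the one requested.

```python
import random
import statistics


def generate_with_outliers(median, spread=10.0, size=100, n_outliers=5, seed=None):
    """Hypothetical helper: floats centred near `median` plus a few outliers."""
    rng = random.Random(seed)
    # The bulk of the data: normally distributed around the requested median
    values = [rng.gauss(median, spread) for _ in range(size - n_outliers)]
    # The outliers: between 5 and 10 spreads away, on a random side
    for _ in range(n_outliers):
        offset = rng.uniform(5 * spread, 10 * spread)
        values.append(median + rng.choice([-1, 1]) * offset)
    rng.shuffle(values)
    return values


data = generate_with_outliers(median=50.0, seed=42)
print(f"count={len(data)}, median={statistics.median(data):.1f}")
print(f"min={min(data):.1f}, max={max(data):.1f}")
```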

Let’s get the…

There are many reasons why further education providers are increasing their focus on aligning the supply of education with local employers' demand for skills and qualifications and with local economic need.

The new Ofsted Education Inspection Framework (EIF) focuses on the intent, implementation and impact of education and, according to FE News, curriculum intent can be bespoke to [educational] programmes and employers.

The government’s Further Education Workforce Strategy gives a set of strategic priorities with the “Quantity and quality of teachers and trainers” as the first priority and “Responsiveness to employer need” as the second.

Lastly…

In this excellent article, the author explores the p-value in statistics and uses the example of an archery team to compare two distributions.

I found this article very insightful, but I was left wanting more. I wanted the statistics broken down into simple steps that a non-statistician could follow, and I wanted to see the full Python code that explored the datasets to see if the archery team had improved or not.

With that in mind, I invented my own scenario based on some fictitious, normally distributed data so that I could build a Python / Jupyter example…
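As a taste of the kind of comparison involved, the sketch below estimates a p-value for the difference between two fictitious, normally distributed samples. It uses a permutation test, chosen here only because it needs nothing beyond the standard library; it is not necessarily the method used in the article. The data, the means of 100 and 110, and the function name are all made up for illustration.

```python
import random
import statistics


def permutation_p_value(before, after, n_permutations=2000, seed=0):
    """Two-sided permutation test on the difference in means."""
    rng = random.Random(seed)
    observed = statistics.mean(after) - statistics.mean(before)
    pooled = list(before) + list(after)
    n_before = len(before)
    extreme = 0
    for _ in range(n_permutations):
        # Shuffle the labels: if there is no real difference,
        # any split of the pooled data is equally plausible
        rng.shuffle(pooled)
        diff = statistics.mean(pooled[n_before:]) - statistics.mean(pooled[:n_before])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add one to numerator and denominator so the estimate is never exactly zero
    return (extreme + 1) / (n_permutations + 1)


# Fictitious scores before and after some change (assumed values)
data_rng = random.Random(1)
before = [data_rng.gauss(100, 10) for _ in range(50)]
after = [data_rng.gauss(110, 10) for _ in range(50)]

p = permutation_p_value(before, after)
print(f"Estimated p-value: {p:.4f}")
```

A small p-value says that a difference this large would rarely arise from shuffling the labels alone, which is the intuition behind rejecting the "no improvement" hypothesis.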

I had learned about logistic regression, confusion matrices, ROC curves, thresholds etc. on the various data science courses I have undertaken but I never fully understood them and wanted to explore them in more detail.

Also, I had seen various examples online about how to recalculate the true and false classifications given a chosen threshold but these examples fell short of the detail I needed to put thresholds to work in the real world.

I suspected there might be a way to wrap threshold optimisation into a simple, object-oriented class so I could reuse it easily in future, hence this…
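To make the idea of recalculating classifications at a chosen threshold concrete, here is a minimal sketch. It assumes we already have true labels and predicted probabilities from some classifier; the six values below are invented, and the function name is hypothetical rather than part of any library or of the class described in the article.

```python
def confusion_at_threshold(y_true, y_prob, threshold=0.5):
    """Count true/false positives and negatives for a given probability threshold."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for actual, prob in zip(y_true, y_prob):
        # A prediction is positive when the probability reaches the threshold
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and actual == 1:
            counts["tp"] += 1
        elif predicted == 1 and actual == 0:
            counts["fp"] += 1
        elif predicted == 0 and actual == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts


# Invented labels and predicted probabilities
y_true = [0, 0, 1, 1, 1, 0]
y_prob = [0.10, 0.40, 0.35, 0.80, 0.65, 0.70]

default_counts = confusion_at_threshold(y_true, y_prob, threshold=0.5)
lower_counts = confusion_at_threshold(y_true, y_prob, threshold=0.3)
print("threshold 0.5:", default_counts)
print("threshold 0.3:", lower_counts)
```

Lowering the threshold catches the positive case that was missed at 0.5, at the cost of an extra false positive, which is exactly the trade-off that threshold optimisation has to weigh.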

I have come across many articles on decision tree machine learning algorithms in Python across various mediums but they have always left me wanting more.

They seem to leap in part-way through the process, or the code does not work when I apply it to my data, or they omit important parts of the process.

As I could not find anything that completely fitted the bill, I thought I would have a go myself, which spawned the idea for this article.

If anyone needs an introduction or a refresher as to how decision trees can be used to…

Group Director of IT, Information Management and Projects at The Lincoln College Group