Data Science – Aaron Pickering

The Intuition Behind Double ML

Double Machine Learning (DML or DoubleML) is one of the most powerful, modern techniques in data science & machine learning. However, I often find treatments of the subject a little convoluted, with writers reaching for math or using very particular domain vocabulary. In this article I’m going to drop the math, avoid all the jargon…

–

February 6, 2026

Data Science, Machine Learning, Statistics

Data Science, digital-marketing, Machine Learning, marketing, Statistics, technology

Sketulate: Sketchable Functions & Densities for Data Science

I simulate a lot of data to test my ideas, particularly the more complex ones. With non-standard stuff, it can be pretty time-consuming to find the right function and/or parameters to do what I want. If the shape you want isn’t a common function or distribution, you can spend too much time searching for…

–

August 27, 2025

Data Science

Data Science, experimentation, Machine Learning, Statistics

Calibration Plots with Generalized Additive Models (GAMs)

Calibration of probabilistic predictions is a crucial task in many machine learning applications, especially when the outputs are used downstream for decision-making. It’s actually a simple concept: calibration just means that the predicted probabilities align with the actual outcomes. Despite that, I’ve noticed many people neglect this step in their modelling process! Typically, when…

–

July 2, 2025

Data Science

Data Science, Machine Learning

XGBoost Can’t Extrapolate

A common pattern I observe among inexperienced Data Scientists is the following: they often default to XGBoost or a similar Gradient Boosted Model for their problem without considering whether it’s the right choice for the job. Given how powerful these methods are, this isn’t the most egregious mistake one can make. However, it’s important to…

–

April 21, 2025

Data Science, Machine Learning

Learn How to Containerize Your R Notebooks with Docker

In this article I show how to containerize your R Notebooks with Docker, for reproducible sharing, using renv for dependency management.

–

March 4, 2024

Data Engineering, Data Science

Data Engineering, Data Science, deployment, reproducibility

Small Data: Creativity, Explainability & Precision

This post discusses the significance of small data in the context of data science and modeling, detailing tips and tricks for working with small data sets.

–

February 8, 2024

Data Science

Data Science, Machine Learning

Genetic Algorithms with PyGAD and PyTorch

A deep dive into Genetic Algorithms (GAs), a family of optimization algorithms inspired by natural evolution, including using a GA to train a PyTorch model with the PyGAD library.

–

December 16, 2023

Data Science, Machine Learning

Data Science, Machine Learning, Statistics

How to Choose a Distribution for your Regression Model

This article is all about distributions! In it, I explore the most common distributions, including Gaussian, Uniform, Student’s t, Gamma, and others. I also discuss their applications and when to choose them for regression modelling.

–

December 4, 2023

Data Science, Statistics

Working with Heteroscedastic Data

Heteroscedasticity, where the variance of a variable changes across the range of a related variable, is often overlooked in data analysis. This two-part series explores handling heteroscedastic data through transformations, linear models, and model-based solutions, using LinkedIn engagements as an example. Techniques include Quantile Regression, a Conditional Variance model, and common transformations such as power or square root transformations.

–

January 25, 2023

Data Science

Hierarchical (Multilevel) Modelling

Hierarchical modeling is a powerful statistical technique for analyzing nested or grouped data. It considers both global structure and individual characteristics, delivering more accurate and robust estimates. Hierarchical models outperform traditional machine learning models, providing lower error rates and handling outliers better, especially with small datasets.

–

December 14, 2022

Uncategorized

Data Science, Statistics

Tag: Data Science