Aaron Pickering

Data Science & Machine Learning

I am the co-founder and head of data at Seenly.io, a social media scheduler and analytics tool.

  • Stimmt So & German Practice

    This month marks 10 years in Germany for me. It’s a big milestone! But after all this time, my German is still cringe. I’m in this really weird zone where I can understand fine and hold a conversation, but I make a grammar mistake about twice a sentence. I’m like a toddler with a…

    November 10, 2025
    Uncategorized
    ai, artificial-intelligence, deutsch, german, german-language
  • Sketulate: Sketchable Functions & Densities for Data Science

    I simulate a lot of data to test my ideas, particularly the more complex ones. With non-standard stuff, it can be pretty time-consuming to find the right function and/or parameters to do what I want. If the shape you want isn’t a common function or distribution, you can spend too much time searching for…

    August 27, 2025
    Data Science
    Data Science, experimentation, Machine Learning, Statistics
  • Calibration Plots with Generalized Additive Models (GAMs)

    Calibration of probabilistic predictions is a crucial task in many machine learning applications, especially when the outputs are used downstream for decision-making. It’s actually a simple concept: calibration just ensures that the predicted probabilities align with the actual outcomes. Despite that, I’ve noticed many people neglect this step in their modelling process! Typically, when…

    July 2, 2025
    Data Science
    Data Science, Machine Learning
  • XGBoost Can’t Extrapolate

    A common pattern that I observe among inexperienced Data Scientists is the following: they default to XGBoost or a similar Gradient Boosted Model for their problem without any thought as to whether it’s the right choice for the job. Given how powerful these methods are, this isn’t the most egregious mistake one can…

    April 21, 2025
    Data Science, Machine Learning
    Data Science, Machine Learning
  • Learn How to Containerize Your R Notebooks with Docker

    In this article I show how to containerize your R Notebooks with Docker, for reproducible sharing, using renv for dependency management.

    March 4, 2024
    Data Engineering, Data Science
    Data Engineering, Data Science, deployment, reproducibility
  • Small Data: Creativity, Explainability & Precision

    This post discusses the significance of small data in the context of data science and modeling, detailing tips and tricks for working with small data sets.

    February 8, 2024
    Data Science
    Data Science, Machine Learning
  • Bayesian Time Series Interpolation

    In the third article in the series, I explore Bayesian modeling for multivariate time series interpolation, using hierarchical models with NumPyro and Bambi.

    January 14, 2024
    Data Science, Machine Learning, Statistics, Time Series
    bambi, bayesian, numpyro, Time Series
  • Interrupted Time Series

    When experimenting with something new, everyone has an opinion! That’s why it’s especially important to gather empirical evidence to truly measure success. In this series of articles, I will explore a variety of techniques for experimentation, measurement and the gathering of evidence. Today’s article concerns one such fundamental technique: Interrupted Time Series analysis. The…

    December 28, 2023
    Experimentation and Hypothesis Testing, Time Series
    experimentation, Hypothesis testing, Time Series
  • Genetic Algorithms with PyGAD and PyTorch

    A deep dive into Genetic Algorithms (GAs), optimization algorithms inspired by the concept of natural evolution, including using a GA to train a PyTorch model with the PyGAD library.

    December 16, 2023
    Data Science, Machine Learning
    Data Science, Machine Learning, Statistics
  • Expectation of a Gaussian Likelihood Function

    This article explores the calculation of the expected likelihood of the Gaussian function rather than its maximum. It includes deriving the expectation of the Gaussian likelihood function, and the expectation of the likelihood of one Gaussian given the parameters of another Gaussian.

    December 10, 2023
    Statistics
    Statistics
  • How to Choose a Distribution for your Regression Model

    This article is all about distributions! In it, I explore the most common distributions including Gaussian, Uniform, Student-T, Gamma, and others. I also discuss their applications and when to choose them for regression modelling.

    December 4, 2023
    Data Science, Statistics
    Data Science, Statistics
  • Stochastic Time Delay in Regression Analysis

    I revisit a previous article on designing a regression model for stochastic time delay problems, where input-output delays vary randomly. The proposed model treats time delay components as part of the analysis, achieving improved results over standard regression methods in simulated experiments. Potential applications include marketing and medical settings. Future extensions might tackle multiple regression…

    November 30, 2023
    Machine Learning, Statistics, Time Series
    Machine Learning, Statistics, Time Series
  • Time Series Interpolation using Embeddings & Pytorch

    In this article, I use categorical embeddings to tackle time series data interpolation. I introduce an algorithmic approach using PyTorch and discuss its benefits and drawbacks. Finally, I demonstrate building a model with these components and compare its strengths and weaknesses with other statistical approaches.

    November 13, 2023
    Machine Learning, Time Series
  • Multivariate Time Series Interpolation

    The post describes a method to deal with sparse time series data using a Hierarchical Model and linear basis functions. The model learns the relationship between related series and movements over time, facilitating data interpolation even with minimal individual data points.

    October 26, 2023
    Statistics, Time Series
  • A/B and C (Multivariate) Tests with PyMC

    The blog post provides a guide to using PyMC for Bayesian A/B/C tests, using a Bernoulli likelihood and the panel regression style. The post explores generating data samples, adopting a Bernoulli model approach, using Bambi to simplify the model setup, and interpreting results.

    October 21, 2023
    Experimentation and Hypothesis Testing
  • Bayesian Regression from Scratch

    The post explains Bayesian methods in depth by building regression models from scratch. It covers generating sample data, Bayes’ rule and Maximum Likelihood, the Metropolis-Hastings algorithm, and fits and uncertainties. The tutorial ends by performing Bayesian inference, and acts as a hands-on guide to understanding Bayesian modelling.

    May 24, 2023
    Statistics
  • Heteroscedastic Regression

    In this article, I deal with complex heteroscedastic problems for data analysis and inference. With a simulated dataset to demonstrate, I tackle issues such as strong trends, non-linearity, non-normal target distributions, multiple seasonalities, and changing variance. I use the advanced techniques of Bayesian Spline Modelling and Bayesian Additive Regression Trees (BART), emphasizing their efficacy in…

    March 27, 2023
    Data Science, Statistics
  • Heteroscedastic Data – Part 2: Inference

    In this article, I deal with complex heteroscedastic problems for data analysis and inference. With a simulated dataset to demonstrate, I tackle issues such as strong trends, non-linearity, non-normal target distributions, multiple seasonalities, and changing variance. I use the advanced techniques of Bayesian Spline Modelling and Bayesian Additive Regression Trees (BART), emphasizing their efficacy in…

    March 27, 2023
    Data Science, Statistics
  • Mutual Information

    While Pearson Correlation is effective in identifying relationships during exploratory data analysis, Mutual Information (MI) offers a powerful alternative, capable of uncovering non-linear relationships between variables. Despite being harder to interpret, its implementation in scikit-learn is worth exploring.

    March 9, 2023
    Data Science, Statistics
  • Mutual Information: a powerful alternative to correlation

    While Pearson Correlation is effective in identifying relationships during exploratory data analysis, Mutual Information (MI) offers a powerful alternative, capable of uncovering non-linear relationships between variables. Despite being harder to interpret, its implementation in scikit-learn is worth exploring.

    March 9, 2023
    Data Science, Statistics
  • Working with Heteroscedastic Data

    Heteroscedasticity, where variance is inconsistent across a related variable’s range, is often overlooked in data analysis. This two-part series explores handling heteroscedastic data through transformations, linear models, and model-based solutions, using LinkedIn engagements as an example. Techniques include Quantile Regression, a Conditional Variance model, and common transformations such as power or square root transformations.

    January 25, 2023
    Data Science
    Data Science
  • Hierarchical (Multilevel) Modelling

    Hierarchical modeling is a powerful statistical technique for analyzing nested or grouped data. It considers both global structure and individual characteristics, delivering more accurate and robust estimates. Hierarchical models can outperform traditional machine learning models, providing lower error rates and handling outliers better, especially with small datasets.

    December 14, 2022
    Uncategorized
    Data Science, Statistics
  • Introduction to Exponential Smoothing

    This article introduces Exponential Smoothing (ETS), an enduring technique in time series forecasting. I explore 4 key ETS variants, their formulas, and practical applications. I also touch on automatic parameter selection for ETS models.

    November 28, 2022
    Time Series
    Statistics, Time Series
