Tag: Data Science
-
Reproducible R Notebooks with Docker
In this article, I show how to containerize R Notebooks with Docker for reproducible sharing, using renv to manage package dependencies.
-
How to Work with Small Data
This post discusses why small data matters in data science and modeling, and shares practical tips and tricks for working with small datasets.
-
Genetic Algorithms with PyGAD and PyTorch
A deep dive into genetic algorithms (GAs), a family of optimization algorithms inspired by natural evolution, including how to use a GA to train a PyTorch model with the PyGAD library.
-
How to Choose a Distribution for your Regression Model
This article is all about distributions! In it, I explore the most common ones, including the Gaussian, Uniform, Student's t, and Gamma distributions. I also discuss their applications and when to choose each one for regression modelling.
-
Working with Heteroscedastic Data
Heteroscedasticity, where the variance of a variable changes across the range of a related variable, is often overlooked in data analysis. This two-part series explores how to handle heteroscedastic data through transformations, linear models, and model-based solutions, using LinkedIn engagement data as an example. Techniques include quantile regression, conditional variance models, and common transformations such as the square-root and other power transformations.
-
Hierarchical (Multilevel) Modelling
Hierarchical modeling is a powerful statistical technique for analyzing nested or grouped data. It accounts for both the global structure shared across groups and the characteristics of individual groups, which can yield more accurate and robust estimates. Hierarchical models can outperform traditional machine learning models, achieving lower error rates and handling outliers better, especially with small datasets.