If you believe the impossible, the incredible can come true.

Field of Dreams is a 1989 American sports fantasy drama film written and directed by Phil Alden Robinson, adapting W. P. Kinsella’s 1982 novel Shoeless Joe. The film stars Kevin Costner, Amy Madigan, James Earl Jones, Ray Liotta and Burt Lancaster in his final film role. It was nominated for three Academy Awards: Best Original Score, Best Adapted Screenplay and Best Picture. In 2017, the film was selected for preservation in the United States National Film Registry by the Library of Congress as being “culturally, historically, or aesthetically significant”. (via Wikipedia, image via IMDb)


The Preprint

image via @notredamedeparis

Abstract

Missing data is a fundamental obstacle in the practice of data science. This paper surveys a few conventions for imputation as available in the Automunge open source python library platform for tabular data preprocessing, including “ML infill” in which auto ML models are trained for target features from partitioned extracts of a training set. A series of validation experiments were performed to benchmark imputation scenarios towards downstream model performance, in which it was found for the given benchmark sets that in many cases ML infill outperformed for both numeric and categoric target features, and was otherwise at minimum within noise…


Hashtag #ICLR2021

Introduction

With the annual ICLR conference right around the corner, I have decided this year to reconsider my approach to exploring it. There has been something of an explosion of papers accepted at these top tier machine learning research venues, and I finally came to the conclusion that an attendee who doesn’t approach with an objective in mind is likely to get overwhelmed amongst the forest of signal and noise. So what will be my reward function for this year’s ICLR? Trying to get caught up in the specialty of reinforcement learning.

Of course I’ve had a few brush-ins…


Competition is healthy

Introduction

As the Automunge project closes in on two years of full-time focus, I figured it would be worth a little reflection on the surrounding competitive environment, which has gone through some considerable changes since inception. Automunge was conceived as a platform for tabular data preprocessing, with the intent to automate most if not all of the data science workflow immediately preceding the application of machine learning.

We made an early decision to build on top of the Pandas dataframe library, which for our purposes has served quite well, however this library does have some built in…


Iterating your way to orbit

Starship rendering by @Neopork

The following essay was directly inspired by the reporting of Eric Berger in his recently published book Liftoff, documenting the early days of SpaceX. A link to purchase is provided here and again at the conclusion.

Liftoff — Eric Berger


Will it go round in circles?

image via Voyager mission

Just finished reading a really impressive book on the fundamentals of quantum computing. In my experience many books that cover this territory get somewhat lost in the formality of quantum notations and linear algebra formulations without imparting any real intuition on the mechanics behind quantum algorithms. This book, Programming Quantum Computers by Eric Johnston, Nic Harrigan, and Mercedes Gimeno-Segovia, turned out to be the single most helpful book I’ve come across for clearly articulating what is taking place in fundamental algorithms like Grover’s search and Shor’s factoring algorithm. …


An ML infill validation

Abstract

Missing data is a fundamental obstacle in the practice of data science. This paper surveys a few conventions for imputation as available in the Automunge open source python library platform for tabular data preprocessing, including ML infill in which auto ML models are trained for target features from partitioned extracts of a training set. A series of validation experiments were performed to benchmark imputation scenarios towards downstream model performance, in which it was found for the given benchmark sets that ML infill performed best for numeric target columns in cases of missing not at random, and was otherwise at minimum…
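To make the idea of ML infill a little more concrete, here is a minimal sketch in plain Python. It is not the Automunge API: the function name ml_infill and the use of a k-nearest-neighbors average as the "model" are assumptions made purely for illustration, standing in for the auto ML models mentioned in the abstract. Rows with a known target value form the training extract, and rows with a missing target receive a prediction from that extract.

```python
import math


def ml_infill(rows, target, k=2):
    """Fill missing (None) entries in column `target` with predictions.

    Hypothetical helper for illustration only. Assumes every other
    column is numeric and fully populated.
    """
    def features(row):
        # All columns except the target serve as model inputs.
        return [v for i, v in enumerate(row) if i != target]

    # Partition: rows with a known target are the training extract.
    train = [r for r in rows if r[target] is not None]
    if not train:
        raise ValueError("no rows with a known target to train on")

    filled = []
    for row in rows:
        new_row = list(row)
        if row[target] is None:
            # Stand-in model: average the target over the k nearest
            # training rows, measured by Euclidean distance.
            nearest = sorted(
                train,
                key=lambda t: math.dist(features(t), features(row)),
            )[:k]
            new_row[target] = sum(t[target] for t in nearest) / len(nearest)
        filled.append(new_row)
    return filled


data = [
    [1.0, 10.0],
    [2.0, 20.0],
    [3.0, None],  # missing target to be infilled
    [4.0, 40.0],
]
result = ml_infill(data, target=1, k=2)
```

The input rows are left unmodified, so the imputed copy can be compared against a simpler baseline such as mean infill when benchmarking downstream model performance.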


Better than arXiv

Abstract

The developers of the Automunge open source platform for tabular data preprocessing have taken a somewhat unorthodox approach to documentation and communications, making use of multimedia, blogging, tweets, Jupyter notebooks, as well as music and photography in publication. This submission will offer an exhibited excerpt of such communication practices, featuring elements of multimedia videos with narration, accompanied by hand drawn slides and transcript, presented as both a brief introduction and extended walkthrough. We believe this form of presentation is a very accessible, low cost option for communicating complex subject matter in a concise form. …

Nicholas Teague

Writing for fun and because it helps me organize my thoughts. I also write software to prepare data for machine learning at automunge.com
