From the Diaries of John Henry

I think the essays of Book 5 did a good job of balancing Automunge with other interests, with treatments ranging across machine learning, physics, music, and quantum computing, and even a little space sprinkled in.

The book starts off with an introduction to a prominent foundation model for natural language processing targeting a mainstream audience, and then, just in case any mainstream audience decided to come back, quickly followed with a dense academic survey of the Automunge library for numeric transforms to scare them off again.

The October triangle covered three points of interest: physics, politics, and Automunge. I owe…

From the Diaries of John Henry

Book 4 was the year I started moving incremental software refinements from weekly rollouts to a much more rapid pace, and I think the opening essay's disorganized structure was a nice counterpoint to the close of Book 3. The embedded tweets were the start of a practice of sharing rollout notes on Twitter that has continued to this day.

The Legend of Bagger Vance reviewed a movie that was special to me.

An intro to Automunge was the first essay to attempt a full walkthrough of the various library parameters. …

From the Diaries of John Henry

Book 3 was the year of Automunge. The first few essays are not exactly elegant writing: while I was figuring out how to code, I was in parallel figuring out how to document code. So these opening chapters were really just laying the groundwork for the more elaborate ones to come. I think the creative elements made them a little more palatable, as some of the coding aspects were obviously kind of dry. Once I found my groove, the essay form started to return.

The Automunge software wasn’t created in a vacuum; for instance, I reference several books in essays…

From the Diaries of John Henry

Depending on whether you’re reading these books in chronological or inverse order, this book may be either the second or the next to last. I will approach this introduction assuming the former.

In Book 2 I started spending more time on these musings, approaching, and I believe never exceeding, about a week’s worth of focus on each, which was a pattern I maintained going forward. It was in this collection that my machine learning studies became more intentional, although really there’s probably as much here on the quantum side as on machine learning. …

From the Diaries of John Henry

It’s been a fun trip writing in this medium. Having ventured down so many roads over the last year or two, it seemed appropriate to consolidate these writings into a sort of table of contents for ease of browsing. Collected here are all of my posts in chronological order. Although they could certainly be read in the order presented if one was so inclined, I would offer that the writing probably improved over the course of the journey — this was my first real venture into the written word outside of 140 character increments (other than a few stray blog…

Channel surfing for an identity

Something that has become the new normal for me in these work-from-home pandemic times is that my rush hour commutes are blessedly no more. One of the few consolations for those long wasted hours, accumulating to days and weeks with hands at ten and two, was the distraction-free half-hour blocks for podcasts and CDs (yes, I still drive a car with a CD player, not sure what that says about me :). In the new normal, most of my driving is for running errands, picking up takeout, and weekends with the family crew — so basically…

Because sometimes you want to drive the car yourself

For those who haven’t been following along, I’ve been using this forum to document the development of Automunge, an open-source Python library platform for tabular data preprocessing — we prepare data for machine learning. The library is intended as a resource for all of the tabular learning workflow in between receipt of “tidy data” (one column per feature and one row per sample) and returned sets suitable for direct application of machine learning. A helpful way to think about it is that Automunge is a resource for applying univariate data transformations to the features of a tabular data set…
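As a rough illustration of what univariate transforms on tidy data look like, here is a minimal sketch in plain Python (standard library only). It is not Automunge's API: the column names, the `_z` suffix, and every helper function are hypothetical. In this sketch each feature gets its own transform, whose parameters are fit on the training column only and then applied identically to the training and test columns.

```python
from statistics import mean, pstdev

# Tidy data: one list per feature (column), one position per sample (row).
train = {"age": [23.0, 35.0, 47.0, 59.0], "color": ["red", "blue", "red", "green"]}
test = {"age": [41.0, 29.0], "color": ["blue", "purple"]}


def fit_zscore(values):
    """Learn normalization parameters from a training column."""
    sigma = pstdev(values)
    return {"mu": mean(values), "sigma": sigma if sigma > 0 else 1.0}  # guard constant columns


def apply_zscore(name, values, params):
    """Center and scale a numeric column with previously fit parameters."""
    return {f"{name}_z": [(v - params["mu"]) / params["sigma"] for v in values]}


def fit_onehot(values):
    """Record the sorted categories seen in a training column."""
    return {"categories": sorted(set(values))}


def apply_onehot(name, values, params):
    """One output column per training category; unseen values become all zeros."""
    return {f"{name}_{c}": [int(v == c) for v in values] for c in params["categories"]}


# Hypothetical assignment of one univariate transform per feature.
TRANSFORMS = {"age": (fit_zscore, apply_zscore), "color": (fit_onehot, apply_onehot)}


def preprocess(train, test, transforms):
    """Fit each transform on its training column, then apply it to train and test alike."""
    train_out, test_out = {}, {}
    for name, (fit, transform) in transforms.items():
        params = fit(train[name])  # parameters come from training data only
        train_out.update(transform(name, train[name], params))
        test_out.update(transform(name, test[name], params))
    return train_out, test_out


train_out, test_out = preprocess(train, test, TRANSFORMS)
print(test_out)
```

Because each transform only ever sees a single column, features can be processed independently of one another. In this sketch the test value "purple", never seen in training, simply maps to all-zero one-hot columns rather than raising an error.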

The full scope of Automunge

I had a good experience interacting with reviewers for a recent conference submission associated with our paper Missing Data Infill with Automunge, and the following thoughts were inspired by a few of those exchanges.


Automunge has attempted to consolidate the full range of the tabular learning workflow in between the two specific boundaries of 1) received “tidy data” (one column per feature and one row per sample) and 2) returned sets suitable for direct application of machine learning — by channeling through a single interface of a preprocessing platform built on top of the Pandas dataframe library. A helpful way to think of…

Bit width defender

I was just thinking that since it is becoming common practice in data science to normalize numeric data, it would be neat to have a new data type with limited integer registers but high capacity in the fractionals. It might be more bit-width efficient for this use case.

OK, I just offered a suggestion to the IEEE workgroup behind standard 754 for floating point arithmetic, and thought I would formalize it with a demonstration to flesh out the details.

To offer a little background, IEEE-754 is the standard that defines in-memory representations of floating point numbers, which for an arbitrary number (e.g. 1234.56), when that…
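To make this background concrete, here is a minimal Python sketch using only the standard library `struct` module. It splits the example value 1234.56 into the sign, exponent, and fraction fields of an IEEE-754 binary64 double, then contrasts 16-bit half precision with a 16-bit fixed-point code. The fixed-point format (Q0.15: one sign bit and 15 fractional bits, covering [-1, 1)) is a hypothetical stand-in for the kind of fraction-heavy type described above. It is not the format that was suggested to the workgroup.

```python
import struct
from typing import Tuple


def float64_fields(x: float) -> Tuple[int, int, int]:
    """Return (sign, biased exponent, fraction) bits of an IEEE-754 binary64 value."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF    # 11 exponent bits, stored with a bias of 1023
    fraction = bits & ((1 << 52) - 1)  # 52 fraction bits; normal numbers add an implicit leading 1
    return sign, exponent, fraction


def decode_normal(sign: int, exponent: int, fraction: int) -> float:
    """Rebuild a normal (non-zero, non-subnormal) binary64 value from its fields."""
    return (-1.0) ** sign * (1 + fraction / 2**52) * 2.0 ** (exponent - 1023)


def float16_roundtrip(x: float) -> float:
    """Round x to IEEE-754 binary16 (half precision) and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def q15_roundtrip(x: float) -> float:
    """Hypothetical 16-bit fixed point: 1 sign bit, 15 fractional bits, range [-1, 1)."""
    code = max(-32768, min(32767, round(x * 2**15)))  # clamp to the signed 16-bit range
    return code / 2**15


sign, exponent, fraction = float64_fields(1234.56)
print(sign, exponent - 1023)  # 1234.56 = 1.2056... x 2**10, so the unbiased exponent is 10

for x in (0.7, 0.001):
    print(x, abs(float16_roundtrip(x) - x), abs(q15_roundtrip(x) - x))
```

The comparison cuts both ways. The fixed-point grid has a uniform spacing of 2**-15, which is finer than half precision for magnitudes of 2**-4 and above (as with 0.7), while half precision keeps finer resolution for magnitudes below 2**-5 (as with 0.001), because it spends some of its bits on an exponent.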

From my family tree to yours

The Automunge family tree primitives
Gershwin’s Rhapsody in Blue — Nicholas Teague

For further reading, please check out A Table of Contents, Book Recommendations, and Music Recommendations. For more on Automunge:

Nicholas Teague

Writing for fun and because it helps me organize my thoughts. I also write software to prepare data for machine learning at
