The developers of the Automunge open source platform for tabular data preprocessing have taken a somewhat unorthodox approach to documentation and communications, making use of multimedia, blogging, tweets, and Jupyter notebooks, as well as music and photography in publication. This submission offers an excerpt of such communication practices, featuring multimedia videos with narration, accompanied by hand-drawn slides and a transcript, presented as both a brief introduction and an extended walkthrough. We believe this form of presentation is a low-cost way to communicate complex subject matter in a concise and accessible form. …
Having spent the better part of the last five years writing these essays, I have finally come to the realization that the table of contents is getting a little out of hand, to put it mildly. Yeah, I mean the goal is to contribute and share lessons, build connections, etc., and I suspect the full table of contents may now be interfering with that by way of signal getting lost in the noise, so to speak.
So, without further ado, here now is a greatly abbreviated collection of essays that I think is somewhat representative of the better parts of…
Took a week off from working on the preprocessing library to play a little catch-up in another interesting domain, one at the intersection of quantum computing and machine learning, aka quantum machine learning, a topic this blog has explored previously, such as in our 2018 presentation on the same subject.
This sidetrack manifested primarily as a deep dive into the TensorFlow Quantum library, sort of an extension of the well-known TensorFlow library for training neural networks. The progress made in the field since that 2018 presentation has been considerable, with nearly every nook of machine learning…
“Hashing is a form of cryptography in which a message is transformed into an encoded representation.”
=> '0f44cb01d838c981156d9f0c030159fb'
In common practice hashing may be used to validate the veracity of a message’s sender, e.g. by comparing a received hash of a bank account number to a hash of that number on file, without having to transmit the actual account number through a channel which may be exposed to an eavesdropper. Thus, hashing is a deterministic transform in which identical input data will always return an identical encoding. …
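As a rough illustration of the comparison described above, here is a minimal sketch using Python's standard `hashlib` and `hmac` modules. The choice of SHA-256, the helper name `hash_message`, and the account numbers are all assumptions made for the example and are not taken from the essay.

```python
import hashlib
import hmac


def hash_message(message: str) -> str:
    """Return the hex digest of a UTF-8 encoded message (SHA-256 is an assumed choice)."""
    return hashlib.sha256(message.encode("utf-8")).hexdigest()


# Hypothetical account number kept on file, used only for illustration.
hash_on_file = hash_message("000123456789")

# The sender transmits only the hash, never the account number itself.
received_hash = hash_message("000123456789")

# Deterministic: identical input yields an identical encoding.
matches = hmac.compare_digest(received_hash, hash_on_file)

# A different account number yields a different encoding.
mismatch = hmac.compare_digest(hash_message("000123456780"), hash_on_file)

print(matches, mismatch)
```

`hmac.compare_digest` is used in place of `==` so that the comparison takes the same time whether or not the two digests match.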
This will be a short essay, wanted to just document a theory that I think is a helpful way to think about deep learning. There is an open question in research as to why deep over-parameterized models have a regularizing effect, even when the number of parameters exceeds the number of training data points. Intuition might suggest this would result in a model simply memorizing the training points, but in practice this type of deep learning instead successfully achieves a kind of generalization. …
After a whirlwind of a week at this year’s online NeurIPS conference (a gathering of machine learning and artificial intelligence researchers), thought I’d take a few minutes to pay tribute to one of the top paper awards. Not exactly a huge surprise, it went to the OpenAI team’s “Language Models are Few-Shot Learners” for their work on the GPT-3 natural language model, which is sort of a hugely scaled up version of prior generations (think hundreds of billions of parameters), reaching new thresholds of performance and now achieving what is known as “few-shot learning” where a pretrained…
A few excerpts from discussion with reviewers, sharing for transparency purposes:
I appreciate that you offered two specific criteria for software packages. I believe this software has met both of these criteria, as follows:
Criterion one: “The software implements a scientifically novel algorithm, framework, model, etc.” I believe the family tree primitives as described in Figure 6 meet this criterion, because they formalize a fundamental aspect of processing tabular data, enabling a simple means for command line specification of multi-transform sets that may include generations and branches of derivations. …
…our life, like the harmony of the world, is composed of contrary things — of diverse tones, sweet and harsh, sharp and flat, sprightly and solemn: the musician who should only affect some of these, what would he be able to do? he must know how to make use of them all, and to mix them; and so we should mingle the goods and evils which are consubstantial with our life; our being cannot subsist without this mixture, and the one part is no less necessary to it than the other.
Michel de Montaigne
In this world of on-demand 24…
For those who haven’t been following along, I’ve been using this forum over the last two years to document the development of Automunge, an open source Python library platform for preparing tabular data for machine learning. …
America is at a crossroads. This isn’t exaggeration. This isn’t hyperbole. It’s a simple statement of fact. The election taking place next month has so much on the ballot. The climate is on the ballot. Healthcare is on the ballot. Truth is on the ballot. Democracy is on the ballot.
The sitting president is unfit for office. He has demonstrated with fervent consistency that he is incapable of even remotely truthful communications. His social media streams of consciousness are a window into a self-destructive psyche. He openly praises dictators and refuses to disavow white supremacists. He has alienated our country…
Writing for fun and because it helps me organize my thoughts. I also write software to prepare data for machine learning at automunge.com