Took a week off from working on the preprocessing library to play a little catch-up in another interesting domain, one occurring at the intersection of quantum computing and machine learning, a.k.a. quantum machine learning — a topic this blog has explored previously, such as in our 2018 presentation on the same subject.

This sidetrack manifested primarily as a deep dive into the TensorFlow Quantum library, an extension of the well-known TensorFlow library for training neural networks. The progress made in the field since that 2018 presentation has been considerable, with nearly every nook of machine learning now finding some theoretic claim staked within QML. Interestingly, this progression of QML has followed a trajectory similar to the path first trodden decades ago by classical machine learning, as the early branches to establish beachheads were applications like clustering algorithms, kernel methods, principal components, and the like. A common thread among these methods was the use of quantum algorithms that could speed up linear algebra operations, such as the calculation of eigenvalues, via routines like HHL or QSVT. Other applications could make use of quantum optimization algorithms like QAOA. Of course, while these speedups were demonstrated in theory, their potential in practice has been somewhat constrained by the reality of current generations of gate-based quantum computing hardware, which remain in the NISQ (noisy intermediate-scale quantum) era, lacking the noise tolerance needed to reach the scale required for fault-tolerant error correction. …

“Hashing is a form of cryptography in which a message is transformed into an encoded representation.”

`=> '0f44cb01d838c981156d9f0c030159fb'`

In common practice hashing may be used to validate the veracity of a message’s sender, such as by comparing a received hash of a bank account number to a hash of that number on file, without having to transmit the actual account number through a channel that may be exposed to an eavesdropper. Thus, hashing is a deterministic transform, where consistently received data will return a consistent encoding. …
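This deterministic comparison can be sketched with Python’s standard-library `hashlib`. The original call that produced the digest shown above is not included in the excerpt; MD5 is used here only to match that example’s 32-character hex format (modern applications would prefer SHA-256), and the account number is made up:

```python
import hashlib

def hash_message(message: str) -> str:
    """Return a deterministic hex digest for a message.

    MD5 shown only for illustration; prefer hashlib.sha256 in practice.
    """
    return hashlib.md5(message.encode("utf-8")).hexdigest()

# The hash on file, computed once when the account was registered.
stored = hash_message("123456789")

# The hash received later over the channel: since hashing is deterministic,
# the same input always yields the same encoding, so the two digests match
# without the underlying account number ever being transmitted.
received = hash_message("123456789")

print(stored == received)  # True
```

Because only digests cross the channel, an eavesdropper sees the encoded representation rather than the account number itself.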

This will be a short essay; I just wanted to document a theory that I think is a helpful way to think about deep learning. There is an open question in research as to why deep over-parameterized models have a regularizing effect, even when the number of parameters exceeds the number of training data points. Intuition might suggest this would result in a model simply memorizing the training points, but in practice this type of deep learning instead successfully achieves a kind of generalization. …

After a whirlwind of a week at this year’s online NeurIPS conference (a gathering of machine learning and artificial intelligence researchers), I thought I’d take a few minutes to pay tribute to one of the top paper awards. Not exactly a huge surprise, it went to the OpenAI team’s “Language Models are Few-Shot Learners” for their work on the GPT-3 natural language model, a hugely scaled up version of prior generations (think hundreds of billions of parameters) reaching new thresholds of performance. GPT-3 now achieves what is known as “few-shot learning,” where a pretrained model can impressively produce reasonable generated language from a small textual prompt provided as input. …

A few excerpts from discussion with reviewers, sharing for transparency purposes:

I appreciate that you offered two specific criteria for software packages; I believe this software has met both of these criteria, as follows:

Criterion one: “The software implements a scientifically novel algorithm, framework, model, etc.” I believe the family tree primitives as described in Figure 6 meet this criterion, as they formalize a fundamental aspect of processing tabular data, enabling a simple means for command line specification of multi-transform sets that may include generations and branches of derivations. …

…our life, like the harmony of the world, is composed of contrary things — of diverse tones, sweet and harsh, sharp and flat, sprightly and solemn: the musician who should only affect some of these, what would he be able to do? he must know how to make use of them all, and to mix them; and so we should mingle the goods and evils which are consubstantial with our life; our being cannot subsist without this mixture, and the one part is no less necessary to it than the other.

Michel De Montaigne

In this world of on-demand 24-hour streaming music, there is a risk of taking for granted the wonder of the form. The same songs played on repeat lose their meaning; they become part of the background. Perhaps comforting for their familiarity, but without the stirring of the soul, without the goosebumps and the exhilaration of newly discovered resonance. …

For those that haven’t been following along, I’ve been using this forum over the last two years to document the development of Automunge, an open source python library platform for preparing tabular data for machine learning. …

America is at a crossroads. This isn’t exaggeration. This isn’t hyperbole. It’s a simple statement of fact. The election taking place next month has so much on the ballot. The climate is on the ballot. Healthcare is on the ballot. Truth is on the ballot. Democracy is on the ballot.

The sitting president is unfit for office. He has demonstrated with fervent consistency that he is incapable of even remotely truthful communications. His social media streams of consciousness are a window into a self-destructive psyche. He openly praises dictators and refuses to disavow white supremacists. He has alienated our country from our allies and openly encouraged foreign interference in our elections. …

There have been a few paradigm shifts of note in modern physics. The principles of relativity bent the constancy of space and time at extreme scales, then quantum dynamics broke point-wise precision at the nano. The library of atoms and constituent particles was eventually revealed as an abstraction for aggregations of the subatomic, whose newest member, the Higgs boson, required near light speed particle collisions for evidence.

Marriages between these domains have long been sought by researchers, as macro-scale relativity and nano-scale quantum have trouble reconciling the nature of gravity, one of the four fundamental forces. One channel of investigation has been the invention of new kinds of mathematics, finding higher dimensions manifesting particles from the vibrations of strings and membranes, and even demonstrating symmetries between dimensions through the AdS/CFT correspondence, translations that may yet be shown to form a kind of Penrose triangle, with the direction determining the destination. …

Mainstream practice in machine learning with tabular data may take for granted that any feature engineering beyond scaling for numeric sets is superfluous in the context of deep neural networks. This paper will offer arguments for potential benefits of extended encodings of numeric streams in deep learning by way of a survey of options for numeric transformations as available in the Automunge open source python library platform for tabular data pipelines, where transformations may be applied to distinct columns in “family tree” sets with generations and branches of derivations. Automunge transformation options include normalization, binning, noise injection, derivatives, and more. The aggregation of these methods into family tree sets of transformations is demonstrated as a means to present numeric features to machine learning in multiple configurations of varying information content, as may be applied to encode numeric sets of unknown interpretation. …

About