AI training data comes from humans, not AIs, so every piece of training data for “What would an AI say to X?” is from a human pretending to be an AI. The training data does not contain AIs describing their inner experiences or thought processes. Even synthetic training data only contains AIs predicting what a human pretending to be an AI would say. AIs are trained to predict the training data, not to learn unrelated abilities, so we should expect an AI asked to predict the thoughts of an AI to describe the thoughts of a human pretending to be an AI.

What would a human pretending to be an AI say?


As incomes have risen, it’s important for Americans to find new ways to spend ever-increasing amounts of money. I propose that we spend some of it traveling to pick and eat fresh fruit that doesn’t travel well.

Content Warning: Knowing how delicious fresh fruit can be is an infohazard for your wallet.

Fresh Fruit Tourism


A few jobs ago, I worked at a company that collected data from disparate sources, then processed and deduplicated it into spreadsheets for ingestion by the data science and customer support teams. Some common questions the engineering team got were:

  • Why is the data in some input CSV missing in the output?
  • Why is data in the output CSV not matching what we expect?

To debug these problems, the process was to try to reverse engineer where the data came from, then try to guess which path that data took through the monolithic data processor.

This is the story of how we stopped doing that, and started storing references to all source data for every piece of output data.
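To make the idea concrete, here is a minimal sketch of what storing source references could look like during deduplication. It is not the actual system from that job: the names `SourceRef`, `OutputRecord`, and `deduplicate`, the file-plus-row-number reference format, and the "later rows overwrite earlier ones" merge rule are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SourceRef:
    """Hypothetical pointer back to one input row: which file, which row."""
    file: str
    row: int


@dataclass
class OutputRecord:
    """One deduplicated output row plus every input row that fed into it."""
    key: str
    values: dict = field(default_factory=dict)
    sources: list = field(default_factory=list)


def deduplicate(inputs, key_field):
    """Merge rows that share a key, remembering where each one came from.

    `inputs` maps a file name to a list of row dicts (assumed format).
    """
    merged = {}
    for file_name, rows in inputs.items():
        for row_number, row in enumerate(rows, start=1):
            key = row[key_field]
            record = merged.setdefault(key, OutputRecord(key=key))
            # Assumed merge rule: later rows overwrite earlier values.
            record.values.update(row)
            # The important part: keep a reference to the source row.
            record.sources.append(SourceRef(file_name, row_number))
    return list(merged.values())


if __name__ == "__main__":
    inputs = {
        "a.csv": [{"id": "1", "name": "Ann"}],
        "b.csv": [{"id": "1", "email": "ann@example.com"},
                  {"id": "2", "name": "Bob"}],
    }
    for record in deduplicate(inputs, "id"):
        print(record.key, record.values, record.sources)
```

With references like these attached to every output row, "where did this value come from?" becomes a lookup instead of a reverse-engineering exercise.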
