AI Alignment is Philosophy, not Science.

Steven D Marlow
Feb 24, 2021


This post is a collection of thoughts I had while listening to the Brain Inspired podcast episode 98, with Brian Christian, talking about the alignment problem. There are no specific timestamps for each section, nor did I try to connect all of my thoughts into a single exposition.

For every company that builds a floor-cleaning robot, there are nine others trying to turn the latest research paper into money. AI, as a technology, is not being driven by a desire to make everyone’s life better. It’s a tool for gaining marketplace dominance and for sustaining the endless cycle of data collection that feeds the next iteration of Machine Learning algorithms. We are still in the early days of automation, and any kind of “alignment problem” is as fictional as AGI (for now). There should be a line, or gap, between current and future abilities, but for marketing reasons, ML is over-sold. There are no tools to automate the automation of a business or process. Each step in that direction requires human understanding of the problem space, which is a combination of methods and data.

Human values and the objective function. I can’t take any discussion of what future AI should look like, or where it is going in the long run, seriously when people mention “the objective.” It’s a core concept in ML and is so single-minded in its focus that you can easily imagine three generations of researchers all going off the same cliff together, one after the other. In fact, this podcast seems to connect perfectly with Robert Miles’s recent video on the OTHER AI alignment problem. Proponents always give very simplified examples where an AI is trying to maximize its score in a game, and then suggest that the same method could lead to unexpected outcomes when asked to solve cancer. Just, all of cancer, as if the system dynamics of that problem and of a small, simple, rule-based game with no external factors were basically the same.

Now while cancer is at least something you can study, and run tests on to see what works and what doesn’t, human values are very non-physical and inconsistent. There is no definitive Book of Morals, and we already know that different cultures have some fundamental disagreements that would preclude such a text from ever being created. At best it’s a discussion of “Western values” for a Western market, but good luck getting that idea past the AI Ethics community.

There was a tweet making the rounds, where if you read the first or third tweet in an image, an illusion of a line would be visible through the second tweet. *spoiler alert* It starts as the lower resolution from your field of view is being preprocessed, and the unresolved text (specifically, the ‘g’ and ‘r’ in the word garlic, which is used twice, creating a diagonal “line segment” from the font used) triggers a match with line segment detection that predates walking upright. The visual cortex triggers the subconscious context of there being a line segment there (it’s essentially your brain making you think that you are seeing something), which drives your eyes toward the segment and boom, you don’t actually see it.

Now let’s play the same kind of illusion game with ML. Given nothing but static (just a random assortment of pixels with only 16 different color values), an algorithm that has been “shaped” by 5,000 images of cats is going to “see” different cat images. This is why you see articles that mention AI hallucinating (because it sounds like something scary you should read about!). Instead of turning a few dark blobs into a line, it takes thousands of dots and does a best fit on thousands of images. It’s about replacing simple line detection with a warehouse full of different cookie cutters, all representing cats, dogs, planes, or children’s faces that are associated with “feeling sad.” “We” tell them what we want them to find, making the data fit the results. *This is why I don’t consider ML to be AI, or anything that will lead to actual AGI.
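To make the cookie-cutter picture concrete, here is a minimal sketch in Python. It is only an illustration of the analogy, not how any particular ML system works: the labels, image size, and "templates" are all made up, and the templates are random stand-ins for whatever patterns a real model would have picked up from labeled images. The matcher is a plain nearest-template search, which I am swapping in for the much larger machinery of a trained network.

```python
import random

# Hypothetical setup: these names and sizes are assumptions for illustration.
LABELS = ["cat", "dog", "plane"]
PIXELS = 64          # an 8x8 image, flattened into a list
COLOR_VALUES = 16    # only 16 different color values per pixel


def make_image(rng):
    """Return a flattened image of random pixels (pure static)."""
    return [rng.randrange(COLOR_VALUES) for _ in range(PIXELS)]


def distance(a, b):
    """Sum of squared pixel differences; smaller means a closer fit."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_fit(image, templates):
    """Return the label of the closest template.

    There is no "nothing here" answer: some cookie cutter always fits best.
    """
    return min(templates, key=lambda label: distance(image, templates[label]))


rng = random.Random(0)

# Stand-in "cookie cutters", one per label we told the system to look for.
templates = {label: make_image(rng) for label in LABELS}

# Five frames of static, each of which still gets matched to a label.
static = [make_image(rng) for _ in range(5)]
found = [best_fit(img, templates) for img in static]
print(found)
```

Every frame of static comes back with a label, and every label comes from the fixed list that was supplied up front. The matcher can never report something that is not in that list.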

This form of “hidden activation” is actually meaningless without that association I mentioned. All of the computational effort is for matching known patterns (or for turning millions of images into patterns that can be easily searched by removing all the visual meaning we humans might use). The “understanding” part depends on what the humans encoded as categories. Cat standing, cat sitting, and cat jumping do not produce results such as cat reaching into fish bowl. That result is outside the narrow scope that data labeling provided. The classic AI stuff, the symbolic research, is this post-ML processing area, which is very reluctantly getting some attention these days.

In the podcast they talk about bias and a system only seeing one type of thing (such as IR sensor data only being tested on “white” hands that reflect well and having it fail to work on people with darker skin), but in the video I mentioned, it goes a step further to talk about “optimizers” that feed data into models of the world that are also optimizers. It’s the turtles all the way down meme, further pointing to the idea that ML, regardless of how good the data and model might be, just can’t dig its way out of the symbolic hole without added symbolic data.

You can watch someone play an old Atari game for a short time before getting the full understanding of how it works (enough to start playing it). How long would it take you to learn how the different chess pieces “work” from watching people play? How many games of football would you have to watch before you could design your own plays? The more “real world” a situation becomes, the more layers of “hidden games” you would have to observe. And this mapping of knowledge is also a hidden game. Without the network complexity and scale of the human brain, not to mention decades of examples to build on, ML has no chance of ever getting close.


