Data Skeptic (general)

Algorithms with similar runtimes are said to be in the same complexity class. That runtime is measured in the how many steps an algorithm takes relative to the input size.

The class P contains all algorithms which run in polynomial time (basically, a nested for loop iterating over the input).  NP are algorithms which seem to require brute force.  Brute force search cannot be done in polynomial time, so it seems that problems in NP are more difficult than problems in P.  I say it "seems" this way because, while most people believe it to be true, it has not been proven.  This is the famous P vs. NP conjecture.  It will be discussed in more detail in a future episode.

Given a solution to a particular problem, if it can be verified/checked in polynomial time, that problem might be in NP.  If someone hands you a completed Sudoku puzzle, it's not difficult to see if they made any mistakes.  The effort of developing the solution to the Sudoku game seems to be intrinsically more difficult.  In fact, as far as anyone knows, in the general case of all possible examples of the game, it seems no strategy can do better on average than just random guessing.

This notion of random guessing the solution is where the N in NP comes from: Non-deterministic.  Imagine a machine with a random input already written in its memory.  Given enough such machines, one of them will have the right answer.  If they all ran in parallel, one of them could verify it's input in polynomial time.  This guess / provided input is often called a witness string.

NP is an important concept for many reasons.  To me, the most reason to know about NP is a practical one.  Depending on your goals or the goals of your employer, there are many challenging problems you may attempt to solve.  If a problem you are trying to solve happens to be in NP, then you should consider the implications very carefully.  Perhaps you'll be lucky and discover that your particular instance of the problem is easy.  Sudoku is pretty easy if only 2 remaining squares need to be filled in.  The traveling salesman problem is easy to solve if you live in a country where all roads for a ring with exactly one road in and out.

If the problem you wish to solve is not trivial, or if you will face many instances of the problem and expect some will not be trivial, then it's unlikely you'll be able to find the exact solution.  Sure, maybe you can grab a bunch of commodity servers and try to scale the heck out of your attempt.  Depending on the problem you're solving, that might just work.  If you can out-purchase your problem in computing power, then problems in NP will surrender to you.  But if your input size ever grows, it's unlikely you'll be able to keep up.

If your problem is intractable in this way, all is not lost.  You might be able to find an approximate solution to your problem.  Good enough is better than no solution at all, right?  Most of the time, probably.  However, some tremendous work has also been done studying topics like this.  Are there problems which are not even approximable in polynomial time?  What approximation techniques work best?  Alas, those answers lie elsewhere.

This episode avoids a discussion of a few key points in order to keep the material accessible.  If you find this interesting, you should next familiarize yourself with the notions of NP-Complete, NP-Hard, and co-NP.  These are topics we won't necessarily get to in future episodes.  Michael Sipser's Introduction to the Theory of Computation is a good resource.

 

Direct download: sudoku-in-np.mp3
Category:general -- posted at: 8:00am PST

In this episode, Professor Michael Kearns from the University of Pennsylvania joins host Kyle Polich to talk about the computational complexity of machine learning, complexity in game theory, and algorithmic fairness. Michael's doctoral thesis gave an early broad overview of computational learning theory, in which he emphasizes the mathematical study of efficient learning algorithms by machines or computational systems.

When we look at machine learning algorithms they are almost like meta-algorithms in some sense. For example, given a machine learning algorithm, it will look at some data and build some model, and it’s going to behave presumably very differently under different inputs. But does that mean we need new analytical tools? Or is a machine learning algorithm just the same thing as any deterministic algorithm, but just a little bit more tricky to figure out anything complexity-wise? In other words, is there some overlap between the good old-fashioned analysis of algorithms with the analysis of machine learning algorithms from a complexity viewpoint? And what is the difference between strategies for determining the complexity bounds on samples versus algorithms?

A big area of machine learning (and in the analysis of learning algorithms in general) Michael and Kyle discuss is the topic known as complexity regularization. Complexity regularization asks: How should one measure the goodness of fit and the complexity of a given model? And how should one balance those two, and how can one execute that in a scalable, efficient way algorithmically? From this, Michael and Kyle discuss the broader picture of why one should care whether a learning algorithm is efficiently learnable if it's learnable in polynomial time.

Another interesting topic of discussion is the difference between sample complexity and computational complexity. An active area of research is how one should regularize their models so that they're balancing the complexity with the goodness of fit to fit their large training sample size.

As mentioned, a good resource for getting started with correlated equilibria is: https://www.cs.cornell.edu/courses/cs684/2004sp/feb20.pdf

Thanks to our sponsors:

Mendoza College of Business - Get your Masters of Science in Business Analytics from Notre Dame.

brilliant.org - A fun, affordable, online learning tool.  Check out their Computer Science Algorithms course.

Direct download: the-computational-complexity-of-machine-learning.mp3
Category:general -- posted at: 8:00am PST

TMs are a model of computation at the heart of algorithmic analysis.  A Turing Machine has two components.  An infinitely long piece of tape (memory) with re-writable squares and a read/write head which is programmed to change it's state as it processes the input.  This exceptionally simple mechanical computer can compute anything that is intuitively computable, thus says the Church-Turing Thesis.

Attempts to make a "better" Turing Machine by adding things like additional tapes can make the programs easier to describe, but it can't make the "better" machine more capable.  It won't be able to solve any problems the basic Turing Machine can, even if it perhaps solves them faster.

An important concept we didn't get to in this episode is that of a Universal Turing Machine.  Without the prefix, a TM is a particular algorithm.  A Universal TM is a machine that takes, as input, a description of a TM and an input to that machine, and subsequently, simulates the inputted machine running on the given input.

Turing Machines are a central idea in computer science.  They are central to algorithmic analysis and the theory of computation.

Direct download: turing-machines.mp3
Category:general -- posted at: 8:00am PST

How long an algorithm takes to run depends on many factors including implementation details and hardware.  However, the formal analysis of algorithms focuses on how they will perform in the worst case as the input size grows.  We refer to an algorithm's runtime as it's "O" which is a function of its input size "n".  For example, O(n) represents a linear algorithm - one that takes roughly twice as long to run if you double the input size.  In this episode, we discuss a few everyday examples of algorithmic analysis including sorting, search a shuffled deck of cards, and verifying if a grocery list was successfully completed.

Thanks to our sponsor Brilliant.org, who right now is featuring a related problem as their Brilliant Problem of the Week.

Direct download: big-oh-analysis.mp3
Category:general -- posted at: 8:00am PST

One Shot Learning is the class of machine learning procedures that focuses learning something from a small number of examples.  This is in contrast to "traditional" machine learning which typically requires a very large training set to build a reasonable model.

In this episode, Kyle presents a coded message to Linhda who is able to recognize that many of these new symbols created are likely to be the same symbol, despite having extremely few examples of each.  Why can the human brain recognize a new symbol with relative ease while most machine learning algorithms require large training data?  We discuss some of the reasons why and approaches to One Shot Learning.

Direct download: one-shot-learning.mp3
Category:general -- posted at: 8:00am PST

Recommender systems play an important role in providing personalized content to online users. Yet, typical data mining techniques are not well suited for the unique challenges that recommender systems face. In this episode, host Kyle Polich joins Dr. Joseph Konstan from the University of Minnesota at a live recording at FARCON 2017 in Minneapolis to discuss recommender systems and how machine learning can create better user experiences. 

Direct download: recommender-systems-live-from-farcon.mp3
Category:general -- posted at: 8:00am PST

Thanks to our sponsor brilliant.org/dataskeptics

A Long Short Term Memory (LSTM) is a neural unit, often used in Recurrent Neural Network (RNN) which attempts to provide the network the capacity to store information for longer periods of time. An LSTM unit remembers values for either long or short time periods. The key to this ability is that it uses no activation function within its recurrent components. Thus, the stored value is not iteratively modified and the gradient does not tend to vanish when trained with backpropagation through time.

Direct download: long-short-term-memory.mp3
Category:general -- posted at: 8:00am PST

Zillow is a leading real estate information and home-related marketplace. We interviewed Andrew Martin, a data science Research Manager at Zillow, to learn more about how Zillow uses data science and big data to make real estate predictions.

Direct download: zillow-zestimate.mp3
Category:general -- posted at: 8:00am PST

Our guest Pranav Rajpurkar and his coauthored recently published Cardiologist-Level Arrhythmia Detection with Convolutional Neural Networks, a paper in which they demonstrate the use of Convolutional Neural Networks which outperform board certified cardiologists in detecting a wide range of heart arrhythmias from ECG data.

Direct download: cardiologist-level-arrhythmia-detection-with-cnns.mp3
Category:general -- posted at: 8:00am PST

RNNs are a class of deep learning models designed to capture sequential behavior.  An RNN trains a set of weights which depend not just on new input but also on the previous state of the neural network.  This directed cycle allows the training phase to find solutions which rely on the state at a previous time, thus giving the network a form of memory.  RNNs have been used effectively in language analysis, translation, speech recognition, and many other tasks.

Direct download: recurrent-neural-networks.mp3
Category:general -- posted at: 8:00am PST