Data Skeptic

## Feed Forward Neural Networks

In a feed forward neural network, neurons cannot form a cycle. In this episode, we explore how such a network would be able to represent three common logical operators: OR, AND, and XOR. The XOR operation is the interesting case.

Below are the truth tables that describe each of these functions.

### AND Truth Table

Input 1 Input 2 Output
0 0 0
0 1 0
1 0 0
1 1 1

### OR Truth Table

Input 1 Input 2 Output
0 0 0
0 1 1
1 0 1
1 1 1

### XOR Truth Table

Input 1 Input 2 Output
0 0 0
0 1 1
1 0 1
1 1 0

The AND and OR functions should seem very intuitive. Exclusive or (XOR) if true if and only if exactly single input is 1. Could a neural network learn these mathematical functions?

Let's consider the perceptron described below. First we see the visual representation, then the Activation function , followed by the formula for calculating the output.

Can this perceptron learn the AND function?

Sure. Let and

Yup. Let and

An infinite number of possible solutions exist, I just picked values that hopefully seem intuitive. This is also a good example of why the bias term is important. Without it, the AND function could not be represented.

No. It is not possible to represent XOR with a single layer. It requires two layers. The image below shows how it could be done with two laters.

In the above example, the weights computed for the middle hidden node capture the essence of why this works. This node activates when recieving two positive inputs, thus contributing a heavy penalty to be summed by the output node. If a single input is 1, this node will not activate.

Universal approximation theorem tells us that any continuous function can be tightly approximated using a neural network with only a single hidden layer and a finite number of neurons. With this in mind, a feed forward neural network should be adaquet for any applications. However, in practice, other network architectures and the allowance of more hidden layers are empirically motivated.

Other types neural networks have less strict structal definitions. The various ways one might relax this constraint generate other classes of neural networks that often have interesting properties. We'll get into some of these in future mini-episodes.

Check out our recent blog post on how we're using Periscope Data cohort charts.

Thanks to Periscope Data for sponsoring this episode. More about them at periscopedata.com/skeptics

Category:general -- posted at: 8:00am PDT

In this Data Skeptic episode, Kyle is joined by guest Ruggiero Cavallo to discuss his latest efforts to mitigate the problems presented in this new world of online advertising. Working with his collaborators, Ruggiero reconsiders the search ad allocation and pricing problems from the ground up and redesigns a search ad selling system. He discusses a mechanism that optimizes an entire page of ads globally based on efficiency-maximizing search allocation and a novel technical approach to computing prices.

Category:general -- posted at: 8:00am PDT

Today's episode overviews the perceptron algorithm. This rather simple approach is characterized by a few particular features. It updates its weights after seeing every example, rather than as a batch. It uses a step function as an activation function. It's only appropriate for linearly separable data, and it will converge to a solution if the data meets these criteria. Being a fairly simple algorithm, it can run very efficiently. Although we don't discuss it in this episode, multi-layer perceptron networks are what makes this technique most attractive.

Category:general -- posted at: 8:00am PDT

DataRefuge is a public collaborative, grassroots effort around the United States in which scientists, researchers, computer scientists, librarians and other volunteers are working to download, save, and re-upload government data. The DataRefuge Project, which is led by the UPenn Program in Environmental Humanities and the Penn Libraries group at University of Pennsylvania, aims to foster resilience in an era of anthropogenic global climate change and raise awareness of how social and political events affect transparency.

Category:general -- posted at: 8:00am PDT

If a CEO wants to know the state of their business, they ask their highest ranking executives. These executives, in turn, should know the state of the business through reports from their subordinates. This structure is roughly analogous to a process observed in deep learning, where each layer of the business reports up different types of observations, KPIs, and reports to be interpreted by the next layer of the business. In deep learning, this process can be thought of as automated feature engineering. DNNs built to recognize objects in images may learn structures that behave like edge detectors in the first hidden layer. Proceeding layers learn to compose more abstract features from lower level outputs. This episode explore that analogy in the context of automated feature engineering.

Linh Da and Kyle discuss a particular image in this episode. The image included below in the show notes is drawn from the work of Lee, Grosse, Ranganath, and Ng in their paper Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations.

Category:general -- posted at: 8:00am PDT

In this episode, I speak with Raghu Ramakrishnan, CTO for Data at Microsoft.  We discuss services, tools, and developments in the big data sphere as well as the underlying needs that drove these innovations.

Category:general -- posted at: 9:28am PDT

In this episode, we talk about a high-level description of deep learning.  Kyle presents a simple game (pictured below), which is more of a puzzle really, to try and give  Linh Da the basic concept.

Thanks to our sponsor for this week, the Data Science Association. Please check out their upcoming Dallas conference at dallasdatascience.eventbrite.com

Category:general -- posted at: 8:00am PDT

Versioning isn't just for source code. Being able to track changes to data is critical for answering questions about data provenance, quality, and reproducibility. Daniel Whitenack joins me this week to talk about these concepts and share his work on Pachyderm. Pachyderm is an open source containerized data lake.

During the show, Daniel mentioned the Gopher Data Science github repo as a great resource for any data scientists interested in the Go language. Although we didn't mention it, Daniel also did an interesting analysis on the 2016 world chess championship that complements our recent episode on chess well. You can find that post here

Supplemental music is Lee Rosevere's Let's Start at the Beginning.

Thanks to Periscope Data for sponsoring this episode. More about them at periscopedata.com/skeptics

Category:general -- posted at: 9:09am PDT

Logistic Regression is a popular classification algorithm. In this episode we discuss how it can be used to determine if an audio clip represents one of two given speakers. It assumes an output variable (isLinhda) is a linear combination of available features, which are spectral bands in the discussion on this episode.

Thanks to our sponsor this week, the Data Science Association.  Please check out their upcoming conference in Dallas on Saturday, February 18th, 2017 via the link below.

dallasdatascience.eventbrite.com

The figures below are referenced during the episode.

The top waveform is Linh Da, the bottom is Kyle.  We use the same order below.