Data Skeptic

Deepjazz is a project from Ji-Sung Kim, a computer science student at Princeton University. It is built using Theano, Keras, music21, and Evan Chow's project jazzml. Deepjazz is a computational music project that creates original jazz compositions using recurrent neural networks trained on Pat Metheny's "And Then I Knew". You can hear some of deepjazz's original compositions on SoundCloud.

Direct download: deepjazz.mp3
Category:general -- posted at: 8:00am PDT

When working with time series data, there are a number of important diagnostics one should consider to help understand more about the data. The autocorrelation function, plotted as a correlogram, helps explain how a given observation relates to recent preceding observations. A purely random process (like lottery numbers) would show values near zero at every nonzero lag, while temperature (our topic in this episode) correlates highly with recent days.
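To make the contrast concrete, here is a minimal sketch of the sample autocorrelation computed by hand. It is not the code from the show notes: the function name `autocorrelation`, the lottery-like draws, and the simulated "temperature" series (each day pulled toward yesterday's value, with a made-up mean of 60 and persistence of 0.9) are all assumptions chosen for illustration.

```python
import random

def autocorrelation(series, max_lag):
    """Return sample autocorrelation coefficients for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    deviations = [x - mean for x in series]
    variance = sum(d * d for d in deviations)
    acf = []
    for lag in range(max_lag + 1):
        # Sum of products of deviations that are `lag` steps apart.
        covariance = sum(deviations[t] * deviations[t + lag] for t in range(n - lag))
        acf.append(covariance / variance)
    return acf

rng = random.Random(0)  # fixed seed so the run is repeatable

# Lottery-like series: every draw is independent of the ones before it.
noise = [rng.randint(1, 49) for _ in range(2000)]

# Temperature-like series (illustrative assumption): today is mostly
# yesterday's value, pulled slightly toward 60, plus a small random change.
temps = [60.0]
for _ in range(1999):
    temps.append(60.0 + 0.9 * (temps[-1] - 60.0) + rng.gauss(0, 2))

noise_acf = autocorrelation(noise, 5)
temps_acf = autocorrelation(temps, 5)

print("lottery-like:    ", [round(r, 2) for r in noise_acf])
print("temperature-like:", [round(r, 2) for r in temps_acf])
```

Lag 0 is always 1, since a series is perfectly correlated with itself. Plotting the remaining values against the lag gives the correlogram: the lottery-like series should hover around zero, while the temperature-like series should start high and decay slowly.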
See the show notes with details about Chapel Hill, NC weather data by visiting:
Direct download: acf.mp3
Category:general -- posted at: 8:00am PDT

This week I spoke with Elham Shaabani and Paulo Shakarian (@PauloShakASU) about their recent paper Early Identification of Violent Criminal Gang Members (also available on arXiv). In this paper, they use social network analysis techniques and machine learning to provide early detection of known criminal offenders who are in a high-risk group for committing violent crimes in the future. Their techniques outperform existing techniques used by the police. Elham and Paulo are part of the Cyber-Socio Intelligent Systems (CySIS) Lab.

Direct download: predicting-violent-offenders.mp3
Category:general -- posted at: 8:00am PDT

A dinner party at Data Skeptic HQ helps teach the uses of fractional factorial design for studying 2-way interactions.
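As a rough illustration of the idea, the sketch below builds a half-fraction of a two-level design with four factors: 8 runs instead of the full 16. The factor names A through D, the helper names, and the generator D = ABC are assumptions for illustration; the episode's dinner party design may differ.

```python
from itertools import product

def half_fraction(base_factors, extra_factor):
    """Build a two-level half-fraction design in -1/+1 coding.

    The base factors get a full factorial; the extra factor's level is
    the product of the base levels (generator: extra = product of bases).
    """
    runs = []
    for levels in product((-1, 1), repeat=len(base_factors)):
        run = dict(zip(base_factors, levels))
        generated = 1
        for level in levels:
            generated *= level
        run[extra_factor] = generated
        runs.append(run)
    return runs

def interaction(design, f1, f2):
    """Return the 2-way interaction column for factors f1 and f2."""
    return [run[f1] * run[f2] for run in design]

# Hypothetical factors A, B, C, D with D = A*B*C.
design = half_fraction(["A", "B", "C"], "D")

for run in design:
    print(run)

# The price of running half the experiments: some 2-way interactions
# share a column and cannot be told apart from this data alone.
print("AB aliased with CD:", interaction(design, "A", "B") == interaction(design, "C", "D"))
```

With this generator, each main effect keeps its own column, but the 2-way interactions come in aliased pairs (AB with CD, AC with BD, AD with BC). Separating all 2-way interactions from one another takes a larger fraction or a different generator.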

Direct download: Fractional_factorial_design.mp3
Category:general -- posted at: 8:00am PDT

Cheng-tao Chu (@chengtao_chu) joins us this week to discuss his perspective on common mistakes and pitfalls in machine learning. This episode is filled with sage advice for beginners and intermediate users of machine learning, and possibly some good reminders for experts as well. Our discussion parallels his recent blog post Machine Learning Done Wrong.

Cheng-tao Chu is an entrepreneur who has worked at many well-known Silicon Valley companies. His paper Map-Reduce for Machine Learning on Multicore is the basis for Apache Mahout. His most recent endeavor has just emerged from stealth, so please check out

Direct download: machine_learning_done_wrong.mp3
Category:general -- posted at: 8:00am PDT