Fri, 25 March 2016
Co-host Linh Da was in a biking accident after hitting a pothole. She sustained an injury that required stitches. This is the story of our quest to file a 311 complaint and track it through the City of Los Angeles's open data portal. My guests this episode are Chelsea Ursaner (LA City Open Data Team), Ben Berkowitz (CEO and founder of SeeClickFix), and Russ Klettke (Editor of pothole.info) |
Fri, 18 March 2016
Certain data mining algorithms (including k-means clustering and k-nearest neighbors) require a user defined parameter k. A user of these algorithms is required to select this value, which raises the questions: what is the "best" value of k that one should select to solve their problem? This mini-episode explores the appropriate value of k to use when trying to estimate the cost of a house in Los Angeles based on the closests sales in it's area. |
Fri, 11 March 2016
Today on Data Skeptic, Lachlan Gunn joins us to discuss his recent paper Too Good to be True. This paper highlights a somewhat paradoxical / counterintuitive fact about how unanimity is unexpected in cases where perfect measurements cannot be taken. With large enough data, some amount of error is expected. The "Too Good to be True" paper highlights three interesting examples which we discuss in the podcast. You can also watch a lecture from Lachlan on this topic via youtube here. |
Fri, 4 March 2016
How well does your model explain your data? R-squared is a useful statistic for answering this question. In this episode we explore how it applies to the problem of valuing a house. Aspects like the number of bedrooms go a long way in explaining why different houses have different prices. There's some amount of variance that can be explained by a model, and some amount that cannot be directly measured. R-squared is the ratio of the explained variance to the total variance. It's not a measure of accuracy, it's a measure of the power of one's model. |
Fri, 26 February 2016
Jessica Hamrick joins us this week to discuss her work studying mental simulation. Her research combines machine learning approaches iwth behavioral method from cognitive science to help explain how people reason and predict outcomes. Her recent paper Think again? The amount of mental simulation tracks uncertainty in the outcome is the focus of our conversation in this episode. Lastly, Kyle invited Samuel Hansen from the Relative Prime podcast to mention the Relatively Prime Season 3 kickstarter, which needs your support now through Friday, March 11th, 2016. |
Fri, 19 February 2016
This episode is a discussion of multiple regression: the use of observations that are a vector of values to predict a response variable. For this episode, we consider how features of a home such as the number of bedrooms, number of bathrooms, and square footage can predict the sale price. Unlike a typical episode of Data Skeptic, these show notes are not just supporting material, but are actually featured in the episode. The site Redfin gratiously allows users to download a CSV of results they are viewing. Unfortunately, they limit this extract to 500 listings, but you can still use it to try the same approach on your own using the download link shown in the figure below. |
Fri, 12 February 2016
Samuel Mehr joins us this week to share his perspective on why people are musical, where music comes from, and why it works the way it does. We discuss a number of empirical studies related to music and musical cognition, and dispense a few myths about music along the way. Some of Sam's work discussed in this episode include Music in the Home: New Evidence for an Intergenerational Link,Two randomized trials provide no consistent evidence for nonmusical cognitive benefits of brief preschool music enrichment, and Miscommunication of science: music cognition research in the popular press. Additional topics we discussed are also covered in a Harvard Gazette article featuring Sam titled Muting the Mozart effect. You can follow Sam on twitter via @samuelmehr. |
Fri, 5 February 2016
This episode reviews the concept of k-d trees: an efficient data structure for holding multidimensional objects. Kyle gives Linhda a dictionary and asks her to look up words as a way of introducing the concept of binary search. We actually spend most of the episode talking about binary search before getting into k-d trees, but this is a necessary prerequisite. |
Fri, 29 January 2016
Algorithms are pervasive in our society and make thousands of automated decisions on our behalf every day. The possibility of digital discrimination is a very real threat, and it is very plausible for discrimination to occur accidentally (i.e. outside the intent of the system designers and programmers). Christian Sandvig joins us in this episode to talk about his work and the concept of auditing algorithms. Christian Sandvig (@niftyc) has a PhD in communications from Stanford and is currently an Associate Professor of Communication Studies and Information at the University of Michigan. His research studies the predictable and unpredictable effects that algorithms have on culture. His work exploring the topic of auditing algorithms has framed the conversation of how and why we might want to have oversight on the way algorithms effect our lives. His writing appears in numerous publications including The Social Media Collective, The Huffington Post, and Wired. One of his papers we discussed in depth on this episode was Auditing Algorithms: Research Methods for Detecting Discrimination on Internet Platforms, which is well worth a read. |
Fri, 22 January 2016
Today's episode begins by asking how many left handed employees we should expect to be at a company before anyone should claim left handedness discrimination. If not lefties, let's consider eye color, hair color, favorite ska band, most recent grocery store used, and any number of characteristics could be studied to look for deviations from the norm in a company. When multiple comparisons are to be made simultaneous, one must account for this, and a common method for doing so is with the Bonferroni Correction. It is not, however, a sure fire procedure, and this episode wraps up with a bit of skepticism about it. |