Data Skeptic
Using Data to Help Those in Crisis

This week Noelle Sio Saldana discusses her volunteer work at Crisis Text Line - a 24/7 service that connects anyone with crisis counselors. In the episode we discuss Noelle's career and how, as a participant in the Pivotal for Good program (a partnership with DataKind), she spent three months helping find insights in the messaging data collected by Crisis Text Line. These insights helped give visibility into a number of different aspects of Crisis Text Line's services. Listen to this episode to find out how!

If you or someone you know is in a moment of crisis, there's someone ready to talk to you by texting the shortcode 741741.

Direct download: Crisis_Text_Line.mp3
Category:data philanthropy -- posted at: 3:00am PDT

Have you ever wondered what is lost when you compress a song into an MP3? This week's guest Ryan Maguire did more than wonder. He worked on software to isolate the sounds that are lost when you convert a lossless digital audio recording into a compressed MP3 file.

To complete his project, Ryan worked primarily in Python using the pyo library as well as the Bregman Toolkit.

Ryan mentioned that humans have a hearing range from 20 Hz to 20,000 Hz. If you'd like to hear those tones, check the previous link.

If you'd like to know more about our guest Ryan Maguire, you can find his website at the previous link. To follow The Ghost in the MP3 project, please check out their Facebook page or the site theghostinthemp3.com.

A PDF of Ryan's publication-quality write-up can be found at this link: The Ghost in the MP3. It is definitely worth the read if you'd like to know more of the technical details.

Direct download: The_Ghost_in_the_MP3.mp3
Category:audio -- posted at: 10:57pm PDT

This episode contains coverage of the 2015 Data Fest hosted at UCLA. Data Fest is an analysis competition that gives teams of students 48 hours to explore a new dataset and present novel findings. This year, data from Edmunds.com was provided, and students competed in three categories: best recommendation, best use of external data, and best visualization.

Direct download: Data_Fest_2015.mp3
Category:general -- posted at: 11:55pm PDT

For our 50th episode we indulge a bit by cooking Linhda's previously mentioned "healthy" cornbread. This leads to a discussion of the statistical topic of overdispersion, in which the variance observed in the data is larger than what one's underlying model accounts for.
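
As a rough illustration of the idea (not taken from the episode), one common quick check assumes the counts should follow a Poisson model, where the variance equals the mean. The sketch below computes the variance-to-mean ratio for two made-up samples; the function name `dispersion_ratio` and the data are hypothetical.

```python
from statistics import mean, variance

def dispersion_ratio(counts):
    """Sample variance divided by sample mean.

    Under a Poisson model this ratio should be close to 1;
    values well above 1 hint at overdispersion.
    """
    return variance(counts) / mean(counts)

# Made-up count data, for illustration only. Both samples have mean 5.
steady = [4, 5, 6, 5, 4, 6, 5, 5]
bursty = [0, 0, 1, 14, 0, 2, 20, 3]

print("steady:", dispersion_ratio(steady))
print("bursty:", dispersion_ratio(bursty))
```

The bursty sample has the same mean as the steady one but a ratio well above 1, which is the kind of gap between model and data that overdispersion describes.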

Direct download: MINI_Cornbread_and_Overdispersion.mp3
Category:miniepisode -- posted at: 12:19am PDT

This episode overviews some of the fundamental concepts of natural language processing, including stemming, n-grams, part-of-speech tagging, and the bag-of-words approach.
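
For a feel of two of these concepts, here is a minimal sketch of n-grams and bag of words using only the Python standard library. It assumes naive whitespace tokenization, and the helper names (`tokenize`, `ngrams`, `bag_of_words`) are hypothetical; stemming and part-of-speech tagging are left out because they usually rely on a dedicated NLP library.

```python
from collections import Counter

def tokenize(text):
    """Naive tokenizer: lowercase, then split on whitespace."""
    return text.lower().split()

def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bag_of_words(tokens):
    """Count how often each token occurs, ignoring word order."""
    return Counter(tokens)

tokens = tokenize("the cat sat on the mat")
print(ngrams(tokens, 2))     # bigrams keep local word order
print(bag_of_words(tokens))  # counts discard word order entirely
```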

Direct download: nlp.mp3
Category:miniepisode -- posted at: 11:44pm PDT

Guest Youyou Wu discusses the work she and her collaborators did to measure the accuracy of computer-based personality judgments. Using Facebook "like" data, they found that machine learning approaches could be used to estimate users' self-assessments of the "big five" personality traits: openness, agreeableness, extraversion, conscientiousness, and neuroticism. Interestingly, the computer-based assessments outperformed the assessments made by certain groups of human beings. Listen to the episode to learn more.

The original paper, Computer-based personality judgements are more accurate than those made by humans, appeared in the January 2015 volume of the Proceedings of the National Academy of Sciences (PNAS).

For her benevolent recommendation, Youyou suggests Private traits and attributes are predictable from digital records of human behavior by Michal Kosinski, David Stillwell, and Thore Graepel. It's a similar paper by her co-authors which looks at demographic traits rather than personality traits.

And for her self-serving recommendation, Youyou has a link that I'm very excited about. You can visit ApplyMagicSauce.com to see how this model evaluates your personality based on your Facebook like information. I'd love it if listeners participated in this research and shared their perspective on the results via The Data Skeptic Podcast Facebook page. I'm going to be posting mine there for everyone to see.

Direct download: Computer_Based_Personality_Judgments_with_Youyou_Wu.mp3
Category:psychology -- posted at: 8:08pm PDT

This episode explores how going wine tasting could teach us about using Markov chain Monte Carlo (MCMC).
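
The wine analogy from the episode isn't reproduced here, but for readers who want to see the mechanics, below is a minimal sketch of one MCMC method, the random-walk Metropolis algorithm, drawing samples from a standard normal distribution. The target distribution, step size, seed, and function names are all assumptions chosen for illustration.

```python
import math
import random

def metropolis(log_density, start, steps, step_size=1.0, seed=0):
    """Random-walk Metropolis sampler for a one-dimensional target."""
    rng = random.Random(seed)
    x = start
    samples = []
    for _ in range(steps):
        # Propose a nearby point using a symmetric Gaussian step.
        proposal = x + rng.gauss(0.0, step_size)
        # Accept with probability min(1, p(proposal) / p(x)).
        log_accept = log_density(proposal) - log_density(x)
        if log_accept >= 0 or rng.random() < math.exp(log_accept):
            x = proposal
        # On rejection the chain stays put and repeats the current value.
        samples.append(x)
    return samples

def log_standard_normal(x):
    """Log density of N(0, 1), up to an additive constant."""
    return -0.5 * x * x

samples = metropolis(log_standard_normal, start=0.0, steps=20000)
sample_mean = sum(samples) / len(samples)
print("sample mean:", sample_mean)
```

Only the ratio of densities is needed at each step, which is why the normalizing constant can be dropped from the log density.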

Direct download: MINI_mcmc.mp3
Category:miniepisode -- posted at: 11:24pm PDT

This episode introduces the idea of a Markov Chain. A Markov Chain has a set of states describing a particular system, and a probability of moving from one state to another along every valid connection between states. Markov Chains are memoryless, meaning they don't rely on a long history of previous observations. The current state of a system depends only on the previous state and the results of a random outcome.

Markov Chains are a useful method for describing non-deterministic systems, in particular the state and transition model of a stochastic system.

As examples of Markov Chains, we discuss stop light signals, bowling, and text prediction systems in light of whether or not they can be described with Markov Chains.
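
To make the stop light example concrete, here is a minimal sketch of a three-state Markov Chain in Python. The transition probabilities are invented for illustration and are not from the episode; each step looks only at the current state, which is the memoryless property described above.

```python
import random

# Hypothetical transition probabilities; each row sums to 1.
# Transitions that are missing from a row (e.g. green -> red) are not allowed.
TRANSITIONS = {
    "green": {"green": 0.8, "yellow": 0.2},
    "yellow": {"yellow": 0.5, "red": 0.5},
    "red": {"red": 0.7, "green": 0.3},
}

def next_state(state, rng):
    """Pick the next state using only the current state (memoryless)."""
    options = TRANSITIONS[state]
    return rng.choices(list(options), weights=list(options.values()))[0]

def simulate(start, steps, seed=0):
    """Return the sequence of states visited, including the start state."""
    rng = random.Random(seed)
    path = [start]
    for _ in range(steps):
        path.append(next_state(path[-1], rng))
    return path

print(simulate("green", 10))
```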

Direct download: MINI_Markov_Chains.mp3
Category:miniepisode -- posted at: 12:00am PDT

Nicole Goebel joins us this week to share her experiences in oceanography studying phytoplankton and other aspects of the ocean, and how data plays a role in that science.

We also discuss Thinkful where Nicole and I are both mentors for the Introduction to Data Science course.

Last but not least, check out Nicole's blog Data Science Girl and the videos Kyle mentioned on her YouTube channel, including one on the diversity of phytoplankton and how that changes in time and space.

Direct download: Oceanography_and_Data_Science.mp3
Category:general -- posted at: 12:06am PDT

This episode explores Ordinary Least Squares, or OLS, a method for finding a line of good fit that describes a given dataset.
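
As a small sketch of the idea, assuming the simplest case of one predictor and a straight-line model y = slope * x + intercept, the least squares fit has a closed form. The function name `ols_fit` and the data points are made up for illustration.

```python
def ols_fit(xs, ys):
    """Fit y = slope * x + intercept by minimizing squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the co-variation of x and y divided by the variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up, roughly linear data.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept = ols_fit(xs, ys)
print("slope:", slope, "intercept:", intercept)
```

The fit is undefined when all x values are identical, since the denominator `sxx` is then zero.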

Direct download: MINI_Ordinary_Least_Squares_Regression.mp3
Category:miniepisode -- posted at: 12:43am PDT