Data Skeptic

Our guest today is Eric Boyd, the Corporate Vice President of AI at Microsoft. Eric joins us to share how organizations can leverage AI for faster development.

Eric shared the benefits of using natural language to build products. He discussed the future of version control and the level of AI background required to get started with Azure AI. He mentioned some foundational models in Azure AI and their capabilities. Follow Eric on LinkedIn to learn more about his work.

Visit today's sponsor at https://webai.com/dataskeptic

Direct download: ai-platforms.mp3
Category:general -- posted at: 6:33am PST

We are excited to be joined by Aaron Reich and Priyanka Shah. Aaron is the CTO at Avanade, while Priyanka leads their AI/IoT offering for the SEA Region. Priyanka is also a Microsoft MVP for AI. They join us to discuss how LLMs are deployed in organizations.

Direct download: deploying-llms.mp3
Category:general -- posted at: 10:28am PST

In this episode, we are joined by Jenny Liang, a PhD student at Carnegie Mellon University, where she studies the usability of code generation tools. She discusses her recent survey on the usability of AI programming assistants.

Jenny discussed the method she used to recruit people to complete her survey. She also shared some of the survey questions alongside vital takeaways. She shared the major reasons developers do not want to use code-generation tools. She stressed that code-generation tools might access software developers' in-house code, which is intellectual property.

Learn more about Jenny Liang via https://jennyliang.me/

Direct download: a-survey-assessing-github-copilot.mp3
Category:general -- posted at: 11:11am PST

We are joined by Aman Madaan and Shuyan Zhou. They are both PhD students at the Language Technologies Institute at Carnegie Mellon University. They join us to discuss their latest published paper, PAL: Program-aided Language Models.

Aman and Shuyan started by sharing how the application of LLMs has evolved. They talked about the performance of LLMs on arithmetic tasks in contrast to coding tasks. Aman introduced their PAL model and how it helps LLMs improve at arithmetic tasks. He shared examples of the tasks PAL was tested on. Shuyan discussed how PAL’s performance was evaluated using Big Bench hard tasks.

They discussed the kinds of mistakes LLMs tend to make and how the PAL model circumvents these limitations. They also discussed how these developments in LLMs can improve kids' learning.
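
As a rough illustration of the program-aided idea described above, the sketch below has a model write a short Python program as its reasoning, and lets the Python interpreter (not the model) compute the final answer. The `fake_llm` function is a hypothetical stand-in that returns a hard-coded program; it is not the authors' code or any real model API, and the word problem is made up for the example.

```python
def fake_llm(question: str) -> str:
    """Hypothetical stand-in for an LLM call.

    A real system would prompt a model to translate the question into code.
    Here the 'generated' program is hard-coded for illustration.
    """
    return (
        "apples_start = 23\n"
        "apples_used = 20\n"
        "apples_bought = 6\n"
        "answer = apples_start - apples_used + apples_bought\n"
    )


def solve_with_program(question: str) -> int:
    """Ask the (fake) model for a program, then run it to get the answer."""
    program = fake_llm(question)
    namespace = {}
    # The interpreter does the arithmetic. Emptying __builtins__ limits what
    # the program can call, but this is NOT a real security sandbox.
    exec(program, {"__builtins__": {}}, namespace)
    return namespace["answer"]


question = (
    "The cafeteria had 23 apples. They used 20 for lunch and bought 6 more. "
    "How many apples do they have?"
)
print(solve_with_program(question))
```

The design point is the division of labor: the model only has to set up the problem correctly, while the exact arithmetic is delegated to the interpreter.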

Wrapping up, Aman discussed CoCoGen, a project that enables NLP tasks to be converted to graphs. Shuyan and Aman shared their next research steps.

Follow Shuyan on Twitter @shuyanzhxyc. Follow Aman on Twitter @aman_madaan.

Direct download: program-aided-language-models.mp3
Category:general -- posted at: 7:00am PST

In this episode, we have Alessio Buscemi, a software engineer at Lifeware SA. Alessio was a post-doctoral researcher at the University of Luxembourg. He joins us to discuss his paper, A Comparative Study of Code Generation using ChatGPT 3.5 across 10 Programming Languages. Alessio shared his thoughts on whether ChatGPT is a threat to software engineers. He discussed how LLMs can help software engineers become more efficient.

Direct download: which-programming-language-is-chatgpt-best-at.mp3
Category:general -- posted at: 6:00am PST

On the show today, we are joined by Jianan Zhao, a Computer Science student at Mila and the University of Montreal. His research focus is on graph databases and natural language processing. He joins us to discuss how to use graphs with LLMs efficiently.

Direct download: graph-text.mp3
Category:general -- posted at: 11:33am PST

Today, we are joined by Rajiv Movva, a PhD student in Computer Science at Cornell Tech. His research interest lies in the intersection of responsible AI and computational social science. He joins us to discuss the findings of his work analyzing LLM publication patterns.

He shared the dataset he used for the survey. He also discussed the criteria for selecting the papers to analyze. Rajiv shared some of the trends he observed from his analysis. For one, he observed an increase in LLM research. He also shared the proportions of papers published by universities, organizations, and industry leaders in LLMs such as OpenAI and Google. He mentioned that the majority of the papers are centered on the social impact of LLMs. He also discussed other exciting applications of LLMs, such as in education.

Direct download: arxiv-publication-patterns.mp3
Category:general -- posted at: 10:20am PST

We are excited to be joined by Josh Albrecht, the CTO of Imbue. Imbue is a research company whose mission is to create AI agents that are more robust, safer, and easier to use. He joins us to share findings from his paper, Despite "super-human" performance, current LLMs are unsuited for decisions about ethics and safety.

Direct download: do-llms-make-ethical-choices.mp3
Category:general -- posted at: 6:00am PST

On today’s show, we are joined by Thilo Hagendorff, a Research Group Leader of Ethics of Generative AI at the University of Stuttgart. He joins us to discuss his research, Deception Abilities Emerged in Large Language Models.

Thilo discussed how machine psychology is useful in machine learning tasks. He shared examples of cognitive tasks that LLMs have improved at solving. He shared his thoughts on whether there’s a ceiling to the tasks ML can solve.

Direct download: emergent-deception-in-llms.mp3
Category:general -- posted at: 8:26am PST

Nieves Montes, a Ph.D. student at the Artificial Intelligence Research Institute in Barcelona, Spain, joins us. Her PhD research revolves around value-based reasoning in relation to norms. She shares her latest study, Combining theory of mind and abductive reasoning in agent‑oriented programming.

Direct download: agents-with-theory-of-mind-play-hanabi.mp3
Category:general -- posted at: 6:58am PST

We are joined by Maximilian Mozes, a PhD student at University College London. His PhD research focuses on Natural Language Processing (NLP), particularly the intersection of adversarial machine learning and NLP. He joins us to discuss his latest research, Use of LLMs for Illicit Purposes: Threats, Prevention Measures, and Vulnerabilities.

Direct download: llms-for-evil.mp3
Category:general -- posted at: 7:00am PST

Our guest today is Vid Kocijan, a Machine Learning Engineer at Kumo AI. Vid has a Ph.D. in Computer Science from the University of Oxford. His research focused on common sense reasoning, pre-training in LLMs, pre-training in knowledge base completion, and how pre-training affects societal bias. He joins us to discuss how he built a BERT model that solved the Winograd Schema Challenge.

Direct download: the-defeat-of-the-winograd-schema-challenge.mp3
Category:general -- posted at: 6:33am PST

Today, we are joined by Petter Törnberg, an Assistant Professor in Computational Social Science at the University of Amsterdam and a Senior Researcher at the University of Neuchatel. His research is centered on the intersection of computational methods and their applications in social sciences. He joins us to discuss findings from his research papers, ChatGPT-4 Outperforms Experts and Crowd Workers in Annotating Political Twitter Messages with Zero-Shot Learning, and How to use LLMs for Text Analysis.

Direct download: llms-in-social-science.mp3
Category:general -- posted at: 9:25am PST

In this episode, we are joined by Carlos Hernández Oliván, a Ph.D. student at the University of Zaragoza. Carlos’s interest focuses on building new models for symbolic music generation.

Carlos shared his thoughts on whether these models are genuinely creative. He revealed situations where AI-generated music can pass the Turing test. He also shared some essential considerations when constructing models for music composition.

Direct download: llms-in-music-composition.mp3
Category:general -- posted at: 6:45am PST

Hongyi Wang, a Senior Researcher at the Machine Learning Department at Carnegie Mellon University, joins us. His research is in the intersection of systems and machine learning. He discussed his research paper, Cuttlefish: Low-Rank Model Training without All the Tuning, on today’s show.

Hongyi started by sharing his thoughts on whether developers need to learn how to fine-tune models. He then spoke about the need to optimize the training of ML models, especially as these models grow bigger. He discussed how data centers have the hardware to train these large models while the wider community does not. He then spoke about the Low-Rank Adaptation (LoRA) technique and where it is used.

Hongyi discussed the Cuttlefish model and how it edges out LoRA. He shared the use cases of Cuttlefish and who should use it. Wrapping up, he gave his advice on how people can get into the machine learning field. He also shared his future research ideas.
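
For readers new to low-rank methods, the sketch below shows only the general idea behind them: a large weight matrix is represented by the product of two thin factors, which takes far fewer parameters. The function names and sizes are illustrative assumptions; this is not the Cuttlefish algorithm or any library's LoRA implementation.

```python
def full_param_count(d_out: int, d_in: int) -> int:
    """Parameters in a dense d_out x d_in weight matrix."""
    return d_out * d_in


def low_rank_param_count(d_out: int, d_in: int, rank: int) -> int:
    """Parameters in factors B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


def matmul(a, b):
    """Plain-Python matrix product of two lists of lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


# A 1024 x 1024 layer versus a rank-8 factorization of the same shape.
print(full_param_count(1024, 1024))         # 1048576
print(low_rank_param_count(1024, 1024, 8))  # 16384

# Tiny example: a 2 x 3 matrix built from a 2 x 1 and a 1 x 3 factor
# (5 stored numbers instead of 6).
B = [[1], [2]]
A = [[3, 4, 5]]
print(matmul(B, A))  # [[3, 4, 5], [6, 8, 10]]
```

The trade-off is that the product of the factors can only represent matrices of rank at most `rank`, which is why choosing the rank matters.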

Direct download: cuddlefish-model-tuning.mp3
Category:general -- posted at: 7:38am PST

On today’s episode, we have Daniel Rock, an Assistant Professor of Operations, Information and Decisions at the Wharton School of the University of Pennsylvania. Daniel’s research focuses on the economics of AI and ML, specifically how digital technologies are changing the economy.

Daniel discussed how AI has disrupted the job market in recent years. He also explained that it has created more winners than losers.

Daniel spoke about the empirical study he and his coauthors did to quantify the threat LLMs pose to professionals. He shared how they used the O*NET dataset and the BLS occupational employment survey to measure the impact of LLMs on different professions. Using the radiology profession as an example, he listed tasks that LLMs could assume.

Daniel broadly highlighted professions that are most and least exposed to LLM proliferation. He also spoke about the risks of LLMs and his thoughts on implementing policies for regulating LLMs.

Direct download: which-professions-are-threatened-by-llms.mp3
Category:general -- posted at: 5:00am PST

We are excited to be joined by J.D. Zamfirescu-Pereira, a Ph.D. student at UC Berkeley. He focuses on the intersection of human-computer interaction (HCI) and artificial intelligence (AI). He joins us to share his work in his paper, Why Johnny can’t prompt: how non-AI experts try (and fail) to design LLM prompts. The discussion also explores lessons learned and achievements related to BotDesigner, a tool for creating chatbots.

Direct download: why-prompting-is-hard.mp3
Category:general -- posted at: 10:13am PST

In this episode, we are joined by Ryan Liu, a Computer Science graduate of Carnegie Mellon University. Ryan will begin his Ph.D. program at Princeton University this fall. His Ph.D. will focus on the intersection of large language models and how humans think. Ryan joins us to discuss his research titled "ReviewerGPT? An Exploratory Study on Using Large Language Models for Paper Reviewing"

Direct download: automated-peer-review.mp3
Category:general -- posted at: 7:00am PST

The creators of large language models impose restrictions on some of the types of requests one might make of them. LLMs commonly refuse to give advice on committing crimes, to produce adult content, or to respond with any details about a variety of sensitive subjects. As with any content filtering system, you have false positives and false negatives.

Today's interview with Max Reuter and William Schulze discusses their paper "I'm Afraid I Can't Do That: Predicting Prompt Refusal in Black-Box Generative Language Models".  In this work, they explore what types of prompts get refused and build a machine learning classifier adept at predicting if a particular prompt will be refused or not.

Direct download: prompt-refusal.mp3
Category:general -- posted at: 6:00am PST

Our guest today is Maciej Świechowski. Maciej is affiliated with QED Software and QED Games. He has a Ph.D. in Systems Research from the Polish Academy of Sciences. Maciej joins us to discuss findings from his study, Deep Learning and Artificial General Intelligence: Still a Long Way to Go.

Direct download: a-long-way-till-agi.mp3
Category:general -- posted at: 4:00am PST

Today on the show, we are joined by Lin Zhao and Lu Zhang. Lin is a Senior Research Scientist at United Imaging Intelligence, while Lu is a Ph.D. candidate at the Department of Computer Science and Engineering at the University of Texas. They both shared findings from their work When Brain-inspired AI Meets AGI.

Lin and Lu began by discussing the connections between the brain and neural networks. They mentioned the similarities as well as the differences. They also shared whether there is a possibility for solid advancements in neural networks to the point of AGI. They shared how understanding the brain more can help drive robust artificial intelligence systems.

Lin and Lu shared how the brain inspired popular machine learning algorithms like transformers. They also shared how AI models can learn alignment from the human brain. They contrasted the low energy usage of the brain with that of high-end computers and discussed whether computers can become more energy efficient.

Direct download: brain-inspired-ai.mp3
Category:general -- posted at: 5:45pm PST

On today’s show, we are joined by Michael Timothy Bennett, a Ph.D. student at the Australian National University. Michael’s research is centered around Artificial General Intelligence (AGI), specifically the mathematical formalism of AGIs. He joins us to discuss findings from his study, Computable Artificial General Intelligence.

Direct download: computable-agi.mp3
Category:general -- posted at: 6:00am PST

We are joined by Koen Holtman, an independent AI researcher focusing on AI safety. Koen is the Founder of Holtman Systems Research, a research company based in the Netherlands.

Koen started the conversation with his take on an AI apocalypse in the coming years. He discussed the obedience problem with AI models and the safe form of obedience.

Koen explained the concept of a Markov Decision Process (MDP) and how it is used to build machine learning models.
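
For listeners unfamiliar with MDPs, here is a minimal sketch: a set of states, a set of actions, a transition rule, rewards, and a discount factor, solved with value iteration. The two-state environment, its rewards, and the discount of 0.5 are made-up assumptions for illustration and do not come from the episode.

```python
# Hypothetical deterministic MDP: (state, action) -> (next_state, reward).
TRANSITIONS = {
    ("A", "stay"): ("A", 0.0),
    ("A", "move"): ("B", 1.0),
    ("B", "stay"): ("B", 2.0),
    ("B", "move"): ("A", 0.0),
}
STATES = ["A", "B"]
ACTIONS = ["stay", "move"]
GAMMA = 0.5  # discount factor: how much future reward counts


def value_iteration(sweeps: int = 100) -> dict:
    """Repeatedly apply the Bellman optimality update to every state."""
    values = {s: 0.0 for s in STATES}
    for _ in range(sweeps):
        values = {
            s: max(
                TRANSITIONS[(s, a)][1] + GAMMA * values[TRANSITIONS[(s, a)][0]]
                for a in ACTIONS
            )
            for s in STATES
        }
    return values


def greedy_policy(values: dict) -> dict:
    """Pick, in each state, the action with the best one-step lookahead."""
    return {
        s: max(
            ACTIONS,
            key=lambda a: TRANSITIONS[(s, a)][1]
            + GAMMA * values[TRANSITIONS[(s, a)][0]],
        )
        for s in STATES
    }


values = value_iteration()
print(values)
print(greedy_policy(values))
```

In this toy setup the best plan is to move from A to B and then stay in B, so the values converge to roughly 3 for A and 4 for B.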

Koen spoke about the problem of AGIs not allowing their utility function to be changed after the model is deployed. He shared an alternative approach to solving the problem. He shared how to safely engineer AGI systems now and in the future. He also spoke about how to implement safety layers on AI models.

Koen discussed the ultimate goal of a safe AI system and how to check that an AI system is indeed safe. He discussed the intersection between Large Language Models (LLMs) and MDPs. He shared the key ingredients to scale the current AI implementations.

Direct download: agi-can-be-safe.mp3
Category:general -- posted at: 1:09pm PST

Tomer Ullman, an Assistant Professor of Psychology at Harvard University, joins us. Tomer discussed theory of mind and whether machines can truly exhibit it. Using variations of the Sally-Anne test and the Smarties tube test, he explained how LLMs can fail theory of mind tasks.

Direct download: ai-fails-on-theory-of-mind-tasks.mp3
Category:general -- posted at: 9:35am PST

The application of LLMs cuts across various industries. Today, we are joined by Steven Van Vaerenbergh, who discussed the application of AI in mathematics education. He discussed how AI tools have changed the landscape of solving mathematical problems. He also shared LLMs' current strengths and weaknesses in solving math problems.

Direct download: ai-for-mathematics-education.mp3
Category:general -- posted at: 6:00am PST

Fabricio Goes, a Lecturer in Creative Computing at the University of Leicester, joins us today. Fabricio discussed what creativity entails and how to evaluate jokes with LLMs. He specifically shared the process of evaluating jokes with GPT-3 and GPT-4. He concluded with his thoughts on the future of LLMs for creative tasks.

Direct download: evaluating-jokes-with-llms.mp3
Category:general -- posted at: 9:06am PST

Barry Smith and Jobst Landgrebe, authors of the book “Why Machines will never Rule the World,” join us today. They discussed the limitations of AI systems in today’s world. They also shared elaborate reasons AI will struggle to attain the level of human intelligence.

Direct download: why-machines-will-never-rule-the-world.mp3
Category:general -- posted at: 3:17pm PST

While the possibilities of AGI emergence seem great, they also raise safety concerns. On the show, Vahid Behzadan, an Assistant Professor of Computer Science and Data Science, joins us to discuss the complexities of modeling AGIs to accurately achieve objective functions. He touched on tangential issues such as abstractions during training, the problem of unpredictability, communications among agents, and so on.

Direct download: a-psychopathological-approach-to-safety-in-agi.mp3
Category:general -- posted at: 11:59pm PST

Julian Michael, a postdoc at the Center for Data Science, New York University, joins us today. Julian’s conversation with Kyle was centered on the NLP community metasurvey: a survey aimed at understanding expert opinions on controversial NLP issues. He shared the process of preparing the survey as well as some shocking results.

Direct download: the-nlp-community-metasurvey.mp3
Category:general -- posted at: 6:00am PST

Kyle shares his own perspectives on the challenges of getting insight from surveys. The discussion ranges from commentary on the market research industry to specific advice for detecting disingenuous or fraudulent responses and filtering them from your analysis. Finally, he shares some quick thoughts on the usage of the Chi-Square test for interpreting cross tab results in survey analysis.
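
As a small worked example of the Chi-Square test on a cross tab, the sketch below computes the test statistic by hand for a 2x2 table. The counts are invented for illustration and are not from the episode. The statistic is compared against 3.841, the standard critical value for 1 degree of freedom at the 0.05 level.

```python
def chi_square(table):
    """Return (statistic, degrees_of_freedom) for a cross tab of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical cross tab: rows are two respondent groups,
# columns are "yes" / "no" answers to one survey question.
crosstab = [
    [30, 10],
    [20, 40],
]

statistic, dof = chi_square(crosstab)
print(round(statistic, 3), dof)

CRITICAL_VALUE_DOF1_ALPHA_05 = 3.841
print(statistic > CRITICAL_VALUE_DOF1_ALPHA_05)  # True means reject independence
```

A large statistic only says the answer pattern differs between groups more than chance would suggest; it does not say how large or important the difference is.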

Direct download: skeptical-survey-interpretation.mp3
Category:general -- posted at: 5:10pm PST

Jeff Jones, a Senior Editor at Gallup, joins us today. His conversation with Kyle spanned a range of topics on Gallup’s poll creation process. He discussed how Gallup generates unbiased questionnaires, gets respondents, analyzes results, and everything in between.

Direct download: the-gallup-poll.mp3
Category:general -- posted at: 9:49am PST

Gireeja Ranade, a University of California at Berkeley professor, speaks with us today. She presented her study on implementing inclusive study groups at scale and shared the observed student performance improvements after the intervention.

Direct download: inclusive-study-group-formation-at-scale.mp3
Category:general -- posted at: 9:34pm PST

Today, we are joined by David Bourget. David is an Associate Professor in Philosophy at Western University in London, Ontario. David is also the co-director of the PhilPapers Foundation and Director of the Center for Digital Philosophy. He joins us to discuss the PhilPapers Survey project.

The PhilPapers survey was initially taken in 2009, but there was a follow-up survey in 2020. David discussed the need for the subsequent survey and what changed. He mentioned the metric for measuring the opinion changes between the 2009 and 2020 surveys. He also shared future plans for the PhilPapers surveys.

Direct download: the-phil-papers-survey.mp3
Category:general -- posted at: 11:51am PST

Today’s show focused on an essential part of surveys: missing values. These are typically caused by a low response rate or non-response from respondents. Yajuan Si is a Research Associate Professor at the Survey Research Center at the University of Michigan. She joins us to discuss dealing with bias from low survey response rates.

Direct download: non-response-bias.mp3
Category:general -- posted at: 10:00am PST

We are joined by two guests today, Mariah, a Ph.D. student in the CORE Robotics Lab at Georgia Tech, and Matthew Gombolay, the Director of the CORE Robotics Lab. They both discuss practices for measuring a respondent’s perception in a survey.

Direct download: measuring-trust-in-robots-with-likert-scales.mp3
Category:general -- posted at: 11:05am PST

Ever wondered what your next career would be? Today, Keyon Vafa, a computer science Ph.D. student at Columbia University, joins us to discuss his latest research on developing a machine-learning model for career prediction. Keyon extensively spoke about how the model was developed and the possibilities it brings.

Direct download: career-prediction.mp3
Category:general -- posted at: 8:36am PST

Noura Insolera, a Research Investigator with the Panel Study of Income Dynamics (PSID), joins us to share how PSID conducts longitudinal household surveys. She also shared some interesting findings from their data exploration, particularly observations and trends in food insecurity.

Direct download: the-panel-study-of-income-dynamics.mp3
Category:general -- posted at: 11:13pm PST

Susan Gerbic joins Kyle to review some of the surveys Data Skeptic has launched, draft a new survey about podcast listening habits, and then review the results of that survey. You can see those results at the link below.

https://survey.dataskeptic.com/survey/result/1675102237053

Watch the videos Susan mentioned on her YouTube page at the link below.

https://www.youtube.com/playlist?list=PL7VAuaQDhPTVaLeI1IcpYph5lH19xA1u4

Direct download: survey-design-working-session.mp3
Category:general -- posted at: 11:10am PST

The use of social bots to fill out online surveys is becoming prevalent. Today, we speak with Sara Bybee, a postdoctoral research scholar at the University of Utah. Drawing on her research, Sara shares how she detected social bots, strategies to curb them, and how underrepresented groups can be better represented in surveys.

Direct download: bot-detection-and-dyadic-surveys.mp3
Category:general -- posted at: 10:08am PST

Our guest today is Zoltán Kekecs, a Ph.D. holder in Behavioural Science. Zoltán highlights the problem of low replicability in journal papers and illustrates how researchers can better ensure complete replication of their research and findings. He used Bem’s experiment as an example, extensively talking about his methodology and results.

Direct download: reproducible-esp-testing.mp3
Category:general -- posted at: 6:00am PST

On the show, Iñigo Martinez, a Ph.D. student at the University of Navarra, shares the results of his survey, which investigated how data practitioners perform data science projects. He revealed the methodologies typically used by data practitioners and the success factors in data science projects.

Direct download: a-survey-of-data-science-methodologies.mp3
Category:general -- posted at: 9:43am PST

On the show today, Dino Carpentras, a post-doctoral researcher at the Computational Social Science group at ETH Zürich, joins us to discuss how opinion dynamics models are built and validated. He explained why quantifying opinions is complex and shared strategies for developing robust models that measure and predict public opinion.

Direct download: opinion-dynamics-models.mp3
Category:general -- posted at: 12:17pm PST

Crafting survey questions is one thing, but getting your audience to fill them out is another. On the show today, we speak with Alexander Nolte, an Associate Professor at the University of Tartu. Alexander discussed the use of Causal Affective Triggers (CATs) to incentivize people to accept survey invitations and improve the completion rate. He revealed the impact of CATs on survey response rates from a study he conducted.

Direct download: causal-affective-triggers.mp3
Category:general -- posted at: 11:58am PST

Traditional surveys have straitjacket questions to be answered, which restricts the information that can be gathered. Today, Ziang Xiao, a Postdoc Researcher in the FATE group at Microsoft Research Montréal, talks about conversational surveys, a type of survey that asks questions based on preceding answers. He discussed the benefits of conversational surveys and some of the challenges they pose.

Direct download: conversational-surveys.mp3
Category:general -- posted at: 6:00am PST

Today, Jenny Tang, a Ph.D. student of societal computing at Carnegie Mellon University, discusses her work on the generalization of privacy and security surveys on platforms such as Amazon MTurk and Prolific. Jenny shared the drawbacks of using such online platforms, the discrepancies observed in the samples drawn, and key insights from her results.

Direct download: do-results-generalize-for-privacy-and-security-surveys.mp3
Category:general -- posted at: 12:39pm PST

This episode kicks off the new season of the show, Data Skeptic: Surveys.  Linhda rejoins the show for a conversation with Kyle about her experience taking surveys and what questions she has for the season.  Lastly, Kyle announces the launch of survey.dataskeptic.com, a new site we're launching to gather your opinions.  Please take a moment and share your thoughts!

Direct download: 4-out-of-5-data-scientists-agree.mp3
Category:general -- posted at: 8:46am PST

It may be intuitive to think crowdfunding a project drives its innovation and novelty, but there are no empirical studies that prove this. On the show, Johannes Wachs shares his research that sought to determine whether crowdfunding truly drives innovation. He used board games as a case study and shared the results he found.

Direct download: crowdfunded-board-games.mp3
Category:general -- posted at: 6:00am PST

There were reports of Russia’s interference in the 2016 US elections. In today’s episode, Koustuv Saha, a researcher at Microsoft Research, walks us through the effect of targeted ads for political campaigns. Using practical examples, he discusses how targeted ads can propagate fake news, their ripple effects on electioneering, and how to find a sweet spot with targeted ads.

Direct download: russian-election-interference-effectiveness.mp3
Category:general -- posted at: 6:05am PST

There is an unsung kind of ad fraud brewing in the ad tech space — placement laundering fraud. On the show, Jeff Kline discusses what placement laundering fraud is, how it can be identified, and possible solutions to it. Listen to learn more.

Direct download: placement-laundering-fraud.mp3
Category:general -- posted at: 9:38am PST

Bosko Milekic, the Co-founder of Optable, a data collaboration platform for the media and advertising industry, joins us today. Bosko talked about clean rooms, the technology driving data privacy during collaboration. He discussed why clean rooms are gaining widespread adoption and how users can use Optable’s clean room platform for a secure data-sharing experience.

Direct download: data-clean-rooms.mp3
Category:general -- posted at: 9:54am PST

Kerstin Bongard-Blanchy is a Research Associate at the University of Luxembourg. She joins us to discuss her study that investigated dark patterns in web designs. She discussed the results, the effect of dark patterns on users, whether an average user can detect them, and the way forward to a more ethical web space.

Direct download: dark-patterns-in-site-design.mp3
Category:general -- posted at: 9:04am PST

We are joined by Anthony Katsur, the CEO of IAB Tech Lab. Anthony discusses standards within the ad tech industry. He explained how IAB Tech Lab sets and propagates global standards, actions to ensure compliance from advertisers, and industry trends for a more privacy-centric ad tech space.

Direct download: internet-advertising-bureau-media-lab.mp3
Category:general -- posted at: 9:19am PST

When we navigate a webpage, it is fairly easy for our mouse movements to be tracked and collected. Today, Luis Leiva, a Professor of Computer Science, discusses how mouse-tracking data can be used to predict age, gender, and user attention. He also discusses the privacy concerns with mouse-tracking data and possible ways such tracking can be curtailed.

Direct download: your-mouse-reveals-your-gender-and-age.mp3
Category:general -- posted at: 6:00am PST

On the show, Aleksandra Urman and Mykola Makhortykh join us to discuss their work on the comparative analysis of web search behavior using web tracking data. They shared interesting results from their analysis, covering user preferences for search engines, demographic patterns, and differences between how men and women surf the net.

Direct download: measuring-web-search-behavior.mp3
Category:general -- posted at: 7:56am PST

Did Aristotle Use a Laptop?  That's a question from the StrategyQA benchmark which highlights the stretch goals for current artificial intelligence systems.  Answering a question like that requires several cognitive steps and reasoning.  Constructing a dataset of similarly challenging questions is a major undertaking.  On today's episode, Mor Geva returns to share details about the creation of StrategyQA and the larger Big Bench dataset it has been included in.

Direct download: big-bench.mp3
Category:general -- posted at: 7:39pm PST

While at first glance, the use of ad blockers drops the revenue of news publishers, this may not be completely true. On the show today, Shunyao Yan, an Assistant Professor in Marketing at Leavey School of Business, Santa Clara University, discussed the effect of ad blockers on news consumption and how ad blockers can potentially be helpful for news publishers.

Direct download: ad-blockers-effect-on-news-consumption.mp3
Category:general -- posted at: 8:17am PST

People who do not want their data tracked and shared online can pay a token for a cookie paywall. But are the websites keeping to their side of the bargain? Victor Morel, a Postdoc candidate at the Chalmers University of Technology, joins us to discuss his work on auditing the activities of cookie paywalls. He discussed the findings from his analysis and proposed some solutions for making cookie paywalls more transparent.

Direct download: your-consent-is-worth-75-euros-a-year.mp3
Category:general -- posted at: 7:00am PST

The advancement of generative language models has been a force for good, but also for evil. On the show, Avisha Das, a post-doctoral scholar at the University of Texas Health Center, joins us to discuss how attackers use machine learning to create convincing phishing emails. She also discussed how she used an RNN for automated email generation, with the goal of defeating statistical detectors.

Direct download: automated-email-generation-for-targeted-attacks.mp3
Category:general -- posted at: 8:52am PST

Peter Gloor, a Research Scientist at the MIT Center for Collective Intelligence, takes us into the new world of tribe classification. He extensively discussed the need for such classification on the internet and how he built a machine learning model that does it. Listen to find out more!

Direct download: tribal-marketing.mp3
Category:general -- posted at: 8:33am PST

Direct download: nano-targetted-facebook-ads.mp3
Category:general -- posted at: 5:55am PST

We hear about the impressive achievements of GPT-3 models, but such large generative models come with their own biases. On the show today, Conrad Borchers, a Ph.D. student in Human-Computer Interaction, joins us to discuss the bias in GPT-3 for job ads and how such large models can be de-biased. Listen to learn more!

Direct download: debiasing-gpt3-job-ads.mp3
Category:general -- posted at: 6:00am PST

Moses Guttman from Clear ML joins us to share insights about how organizations leveraging machine learning keep their programs on track. While many parallels exist between the software development life cycle (SDLC) and the machine learning development life cycle, successful deployments of ML in production have demonstrated that a unique set of tools is required. Moses and I discuss the emergence of ML Ops, success stories, and how modern teams leverage tools like Clear ML's open source solution to maximize the value of ML in the organization.

Direct download: ml-ops-in-production.mp3
Category:general -- posted at: 2:03pm PST

Data sharing in the ad tech space has largely been a black box system. While it is obvious the data is being collected, the data sharing process is obscure to users. On the show today, Maaz Bin Musa and Rishab, both researchers at the University of Iowa, speak about the importance of data transparency and ATOM, their tool for data transparency. Listen to find out how ATOM uncovers data-sharing relationships in the ad-tech space.

Direct download: ad-network-tomography.mp3
Category:general -- posted at: 6:00am PST

When you accept cookies on a website, you cannot tell whether the cookies are used for tracking your personal data or not. Shaoor Munir’s machine learning model does that. On the show today, Shaoor, a Ph.D. student at the University of California, discusses the world of first-party cookies and how he developed a machine learning model that predicts whether a first-party cookie is used for tracking purposes.

Direct download: first-party-tracking-cookies.mp3
Category:general -- posted at: 6:00am PST

Liza Gak, a Ph.D. student at UC Berkeley, joins us to discuss her research on harmful weight loss advertising. She discussed how weight loss ads are not fact-checked, and how they typically target the most vulnerable. She extensively discussed her interview process, data analysis, and results. Listen for more!

Direct download: the-harms-of-targeted-weight-loss-ads.mp3
Category:general -- posted at: 6:00am PST

Growing your podcast to the point of monetization is not a walk in the park. Today, Rob Walch, the VP of Podcast Relations at Libsyn, talks about podcast advertising. He discusses how advertising works, how to grow your audience, and some blueprints for becoming a successful podcaster. Listen for more.

Direct download: podcast-advertising.mp3
Category:general -- posted at: 6:00am PST

When we search for products in e-commerce stores, we do not care what goes on under the hood to generate the results. However, there may be an intentional algorithmic effort to steer us toward a particular product. On the show today, Abhisek Dash and Saptarshi Ghosh discuss their research on fairness in the search results of Amazon smart speakers.

Direct download: fairness-in-e-commerce-search.mp3
Category:general -- posted at: 7:41am PST

Chances are that you have bought a product online largely because of the reviews you saw. Unfortunately, not all reviews are genuine. Today, Rajvardhan Oak shares some insights from his research on fraudulent Amazon reviews. He explains the inner workings of fraudulent reviews and reveals key findings from his qualitative and quantitative study.

Direct download: fraudulent-amazon-reviewers.mp3
Category:general -- posted at: 6:00am PST

While we pay attention to textual data on the web, many do not know the unique power of Echo interactions with smart devices for ad targeting. Today, our guest, Umar Iqbal, joins us to discuss his study on using Amazon Smart Speakers for ad targeting. He shares interesting findings about how voice data is captured and analysed for ad purposes. Listen to find out more.

Direct download: ad-targeting-in-amazon-smart-speakers.mp3
Category:general -- posted at: 6:14am PST

Rajan Udwani, an Assistant Professor at the University of California, Berkeley, joins us to discuss his work on AdWords with unknown budgets. He discusses previous approaches to ad allocation, as well as his new approach, which introduces randomization for better results. Listen for more.

Direct download: adwords-with-unknown-budgets.mp3
Category:general -- posted at: 6:00am PST

Today, we are joined by Piotr Niedźwiedź, Founder and CEO of Neptune.ai. Piotr discusses common MLOps activities by data science teams and how they can take advantage of Neptune.ai for better experiment tracking and efficiency. Listen for more!

Direct download: ml-ops-best-practices.mp3
Category:general -- posted at: 5:00am PST

Affiliate marketing creates an opportunity for marketers to earn a commission by promoting a product or service. Cookies are typically used for tracking, and the advertiser whose product or service is being featured pays the marketer only on transactions.

Today's episode covers those approaches and is also a story of conflict between two large companies and how one affiliate marketer got caught in the middle.

Direct download: affiliate-marketing-rabbithole.mp3
Category:general -- posted at: 5:46am PST

Cameron Ballard joins us today to discuss his work around YouTube conspiracy theories. He shares interesting observations about conspiracy theories on YouTube, including how predatory ads are most common in conspiracy theory videos and how YouTube’s algorithm subtly works in favor of predatory ads.

Direct download: monetization-of-youtube-conspiracy-theorists.mp3
Category:general -- posted at: 6:00am PST

Eric Zeng joins us to discuss his study on understanding bad ads and the efforts that can be taken to limit them online. He discusses how he and his co-authors scraped a large amount of ad data, applied a machine learning algorithm, and reported the corresponding statistical results.

Direct download: user-perceptions-of-problematic-ads.mp3
Category:general -- posted at: 6:00am PST

NaLette Brodnax, a political scientist and an Assistant Professor in the McCourt School of Public Policy at Georgetown University, joins us to discuss her work on analyzing digital advertisements for political campaigns. She used data on electoral campaigns on Facebook to answer questions that help us better understand how digital ads affect the outcome of elections.

Click here for additional show notes!

Thanks to our sponsor!
https://neptune.ai/ Log, store, query, display, organize and compare all your model metadata in a single place

Direct download: political-digital-advertising-analysis.mp3
Category:general -- posted at: 11:15am PST

Direct download: fraud-detection-in-crowdfunding-campaigns.mp3
Category:general -- posted at: 8:13am PST

Direct download: artificial-intelligence-and-auction-design.mp3
Category:general -- posted at: 5:59am PST

Have you ever wondered what goes on under the hood when you accept a website’s cookies? Today, Maximilian Hils, a PhD student in Computer Science at the University of Innsbruck, Austria, dissects the ad tech industry and the standards put in place to protect users’ data. He also shares his thoughts on the use of VPNs as well as other tools that help shield your data from prying eyes on the internet.

Click here for additional show notes

Thanks to our sponsor:
https://clear.ml/ ClearML is an open-source MLOps solution users love to customize, helping you easily Track, Orchestrate, and Automate ML workflows at scale.

Direct download: privacy-preference-signals.mp3
Category:general -- posted at: 6:00am PST

Ravi Krishna joins us today to talk about his recent work on a differentiable NAS framework for ads CTR prediction. He discusses what CTR prediction is and why his NAS framework helps in building neural networks for better ad recommendations. Listen to learn about his methodology, the related literature, and his results.

Click for additional show notes

Thanks to our sponsor:
https://astrato.io Astrato is a modern BI and analytics platform built for the Snowflake Data Cloud. A next-generation live query data visualization and analytics solution, empowering everyone to make live data decisions.

Direct download: neural-architecture-search-for-ctr-prediction.mp3
Category:general -- posted at: 8:19am PST

Effectively managing a large pay-per-click advertising budget demands software solutions. When spending multi-million dollar budgets on hundreds of thousands of keywords, an effective algorithmic strategy is required to optimize marketing objectives.

In this episode, Nathan Janos joins us to share insights from his work in the ad tech industry.

Click for additional show notes

Thanks to our sponsor!
https://wandb.com/ The developer-first MLOps platform. Build better models faster with experiment tracking, dataset versioning, and model management.

Direct download: algorithmic-ppc-management.mp3
Category:general -- posted at: 3:10pm PST

Increasingly, people get most if not all of the information they consume online. Across websites, videos, apps, and other destinations, we’re consistently served advertisements alongside the organic content we search for or discover. Targeted ads make it possible for you to discover relevant new products you might otherwise not have heard about. Targeting can also open a Pandora’s box of ethical considerations. Online advertising is a complex network of automated systems: algorithms controlling algorithms controlling what we see.

This season of Data Skeptic will focus on the applications of data science to digital advertising technology. In this first episode in particular, Kyle shares some of his own personal experiences and insights working in pay-per-click marketing.

Click for additional show notes

Direct download: ad-tech.mp3
Category:general -- posted at: 8:44pm PST

Our mobile phones generate an incredible amount of data inbound and outbound. In today’s episode, Nishant Kishore, a PhD graduate of Harvard University in Infectious Disease Epidemiology, explains how mobility data from mobile phones can be captured and analysed to understand the spread of infectious diseases.

Click here for additional show notes

Thanks to our sponsor!
https://neptune.ai/ Log, store, query, display, organize, and compare all your model metadata in a single place

Direct download: the-reliability-of-mobile-phone-data.mp3
Category:general -- posted at: 10:31pm PST

The pandemic changed how we live, and this had a ripple effect on the performance of machine learning models. Ravi Parikh joins us today to discuss how the pandemic has affected the performance of machine learning models in clinical care and some actionable steps to fix it.

Click here for additional show notes

Thanks to our sponsor:
Astera Centerprise is a no-code data integration platform that allows users to build ETL/ELT pipelines for modern data warehousing and analytics.

Direct download: haywire-algorithms.mp3
Category:general -- posted at: 6:00am PST

Carly Lupton-Smith joins us today to speak about her research, which investigated the consistency between household and county measures of school reopening. Carly is a doctoral researcher in Biostatistics at Johns Hopkins Bloomberg School of Public Health. Listen to learn about her findings.

Click here for additional show notes on our website!

Thanks to our sponsors!
ClearML is an open-source MLOps solution users love to customize, helping you easily Track, Orchestrate, and Automate ML workflows at scale.

Astera Centerprise is a no-code data integration platform that allows users to build ETL/ELT pipelines for modern data warehousing and analytics.

Direct download: school-reopening-analysis.mp3
Category:general -- posted at: 7:00am PST

Today, we are joined by Alexander Thor, a Product Manager at Vizlib, makers of Astrato. Astrato is a data analytics and business intelligence tool built on the cloud and for the cloud. Alexander discusses the features and capabilities of Astrato for data professionals.

Visit our website for additional show notes!

Direct download: modern-data-stacks.mp3
Category:general -- posted at: 7:00am PST

Emojis are arguably one of the most effective ways to express emotions when texting. In today’s episode, Xuan Lu shares her research on the use of emojis by developers. She explains how the study of emojis can track the emotions of remote workers and predict future behavior. Listen to find out more!

Direct download: emoji-as-a-predictor.mp3
Category:general -- posted at: 7:25am PST

On the show today, Fabian Braesemann, a research fellow at the University of Oxford, joins us to discuss his study analyzing the gig economy. He reveals the trends he has discovered since remote work became mainstream, the factors causing spatial polarization, and some downsides of the gig economy. Listen to learn what he found.

Direct download: polarizing-trends-in-the-gig-economy.mp3
Category:general -- posted at: 6:39am PST

On the show today, we interview Mouhamed Abdulla, a professor of Electrical Engineering at Sheridan Institute of Technology. Mouhamed joins us to discuss his study on remote teaching and learning in applied engineering. He discusses how he embraced the new approach after the pandemic, the challenges he faced and how he tackled them. Listen to find out more.

Click here for additional show notes on our website!

Thanks to our sponsor!
https://neptune.ai/ Log, store, query, display, organize, and compare all your model metadata in a single place

Direct download: remote-learning-in-applied-engineering.mp3
Category:general -- posted at: 5:29am PST

It is difficult to estimate the effect of remote working across the board. Darja Šmite, who speaks with us today, is a professor of Software Engineering at the Blekinge Institute of Technology. In her recently published paper, she analyzed data on several companies' activities before and after remote working became prevalent. She discusses the results, the reasons behind them, and some subtle drawbacks of remote working. Check it out!

Click here for additional show notes on our website!

Direct download: remote-productivity.mp3
Category:general -- posted at: 5:45am PST

Does remote learning work? We explore this complex question in two interviews today. First, Kasey Wagoner describes three approaches to remote lab sessions and an analysis of which was the most instrumental to students. Second, Tahiya Chowdhury shares insights about the specific features of video-conferencing platforms that are lacking in comparison to in-person learning.

Click here for additional show notes on our website!

Thanks to our sponsor!
ClearML is an open-source MLOps solution users love to customize, helping you easily Track, Orchestrate, and Automate ML workflows at scale.

Direct download: does-remote-learning-work.mp3
Category:general -- posted at: 6:00am PST

In this episode, we speak with Abdullah Kurkcu, a Lead Traffic Modeler. Abdullah joins us to discuss his recent study on the effect of COVID-19 on bicycle usage in the US. He walks us through the data gathering process, data preprocessing, feature engineering, and model building. Abdullah also disclosed his results and key takeaways from the study. Listen to find out more. 

Click here for additional show notes on our website.

Thanks to our sponsor!
Astrato is a modern BI and analytics platform built for the Snowflake Data Cloud. A next-generation live query data visualization and analytics solution, empowering everyone to make live data decisions.

Direct download: covid-19-impact-on-bicycle-usage.mp3
Category:general -- posted at: 5:46am PST

Today, we are joined by Jennifer Jacobs and Nadya Peek, who discuss their experience in teaching remote classes for a course that is largely hands-on. The discussion was focused on digital fabrication, why it is important, the prospect for the future, the challenges with remote lectures, and everything in between.

Click here for additional show notes on our website!

Thanks to our sponsor!
https://neptune.ai/ Log, store, query, display, organize, and compare all your model metadata in a single place

Direct download: learning-digital-fabrication-remotely.mp3
Category:general -- posted at: 4:55am PST

Today, we are joined by Denae Ford, a Senior Researcher at Microsoft Research and an Affiliate Assistant Professor at the University of Washington. Denae discusses her work on remote work and its impact on workers. She focuses on how COVID-19 has affected the way software engineers work and the emerging challenges it brings.

Click here to access additional show notes on our website!

Thanks to our sponsor!
Weights & Biases: The developer-first MLOps platform. Build better models faster with experiment tracking, dataset versioning, and model management.

Direct download: remote-software-development.mp3
Category:general -- posted at: 9:49am PST

In this episode, we interview Jonas Landman, a postdoctoral candidate at the University of Edinburgh. Jonas discusses his study on quantum learning, in which he attempted to recreate the conventional k-means clustering algorithm and the spectral clustering algorithm using quantum computing.

Click here to access additional show notes on our website!

Direct download: quantum-k-means.mp3
Category:general -- posted at: 6:00am PST

K-means is widely used in real-life business problems. In this episode, Mujtaba Anwer, a researcher and Data Scientist, walks us through some use cases of k-means. He also speaks extensively on how to prepare your data for clustering, find the best number of clusters to use, and turn the ‘abstract’ result into real business value. Listen to learn.

Click here to access additional show notes on our website!

Thanks to our sponsor!
ClearML is an open-source MLOps solution users love to customize, helping you easily Track, Orchestrate, and Automate ML workflows at scale.

Direct download: k-means-in-practice.mp3
Category:general -- posted at: 6:00am PST
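
For listeners who want to try the steps mentioned in the k-means in practice episode (preparing data, then choosing the number of clusters), here is a minimal sketch in plain Python. It is not taken from the episode: the toy "customer" data, the function names, and the choice of the elbow heuristic are all assumptions made for illustration. It standardizes each feature, runs basic Lloyd-style k-means for several values of k, and prints the within-cluster sum of squares (inertia) so you can look for the point where it stops dropping sharply.

```python
import random
from statistics import mean, pstdev

def standardize(rows):
    """Rescale every column to zero mean and unit variance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [tuple((v - m) / s for v, m, s in zip(r, mus, sds)) for r in rows]

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=50, seed=0):
    """Basic Lloyd's algorithm; returns (centroids, inertia)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if empty).
        new_centroids = [
            tuple(mean(dim) for dim in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    inertia = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, inertia

# Hypothetical data: (monthly spend, visits per month) for 90 customers.
rng = random.Random(42)
raw = [(rng.gauss(cx, 5), rng.gauss(cy, 0.5))
       for cx, cy in [(20, 2), (60, 8), (100, 4)] for _ in range(30)]
data = standardize(raw)

# Keep the best of a few restarts per k, since k-means can hit local minima.
inertias = {k: min(kmeans(data, k, seed=s)[1] for s in range(5))
            for k in range(1, 7)}
for k, value in inertias.items():
    print(k, round(value, 2))
```

Standardizing first matters because k-means relies on Euclidean distance, so a feature measured in large units would otherwise dominate the clustering. The elbow heuristic is only one of several ways to pick k.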

Building a fair machine learning model has become a critical consideration in today’s world. In this episode, we speak with Anshuman Chhabra, a Ph.D. candidate in Computer Networks. Chhabra joins us to discuss his research on building fair machine learning models and why it is important. Find out how he modeled the problem and the results he found.

Click here to access additional show notes on our website!

Thanks to our sponsor!
https://astrato.io Astrato is a modern BI and analytics platform built for the Snowflake Data Cloud. A next-generation live query data visualization and analytics solution, empowering everyone to make live data decisions.

Direct download: fair-hierarchical-clustering.mp3
Category:general -- posted at: 6:23am PST

Many people know K-means clustering as a powerful clustering technique, but not all listeners will be as familiar with spectral clustering. In today’s episode, Sibylle Hess from the Data Mining group at TU Eindhoven joins us to discuss her work on spectral clustering and how its results could potentially cause a major shift away from conventional neural networks. Listen to learn about her findings.

Visit our website for additional show notes

Thanks to our sponsor, Weights & Biases

Direct download: matrix-factorization-for-k-means.mp3
Category:general -- posted at: 6:00am PST

In this episode, we speak with Bernd Fritzke, a proficient financial expert and a Data Science researcher, about his recent research on the breathing K-means algorithm. Bernd discusses the perks of the algorithm and what makes it stand out from other K-means variations. He discusses at length the working principle of the algorithm and the subtle but impactful features that enable it to produce top-notch results with low computational resources. Listen to learn about this algorithm.

Direct download: breathing-k-means.mp3
Category:general -- posted at: 6:00am PST

In today’s episode, Jason, an Assistant Professor of Statistical Science at Duke University, talks about his research on K power means. K power means is a newly developed algorithm from Jason and his team that aims to solve the problem of local minima in classical K-means without demanding heavy computational resources. Listen to find out the outcome of Jason's study.

Click here to access additional show notes on our website!

Thanks to our Sponsors:
ClearML is an open-source MLOps solution users love to customize, helping you easily Track, Orchestrate, and Automate ML workflows at scale. https://clear.ml

Springboard
Springboard offers end-to-end online data career programs that encompass data science, data analytics, data engineering, and machine learning engineering.

Direct download: power-k-means.mp3
Category:general -- posted at: 6:00am PST
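
As a rough illustration of the idea behind the power k-means episode, the sketch below shows the power mean, which behaves like a smooth stand-in for the minimum. This reflects our understanding of the general approach rather than anything stated in the episode notes, and the function name and sample distances are hypothetical. Classical k-means charges each point the minimum of its squared distances to the centers; replacing that hard minimum with a power mean, and gradually pushing the exponent s toward negative infinity, gives a smoother objective that approaches the classical one.

```python
def power_mean(values, s):
    """Power mean of positive numbers: (mean of v**s) ** (1/s), for s != 0."""
    k = len(values)
    return (sum(v ** s for v in values) / k) ** (1.0 / s)

# Hypothetical squared distances from one data point to three cluster centers.
dists = [4.0, 1.0, 9.0]

# As s becomes more negative, the power mean moves toward min(dists).
for s in (-1, -3, -10, -50):
    print(s, power_mean(dists, s))
```

With s = -1 this is the harmonic mean; for very negative s it is close to the smallest distance, which is the term classical k-means uses.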

In this episode, Kyle interviews Lucas Murtinho about the paper "Shallow decision trees for explainable k-means clustering", which uses decision trees to help explain the clustering partitions.

Thanks to our Sponsors:
ClearML is an open-source MLOps solution users love to customize, helping you easily Track, Orchestrate, and Automate ML workflows at scale.

Direct download: explainable-k-means.mp3
Category:general -- posted at: 6:17am PST