Data Skeptic

In this episode, Kyle Polich sits down with Cory Zechmann, a content curator working in streaming television with 16 years of experience running the music blog "Silence Nogood." They explore the intersection of human curation and machine learning in content discovery, discussing the concept of "algatorial" curation—where algorithms and editorial expertise work together. Key topics include the cold start problem, why every metric is just a "proxy metric" for what users actually want, the challenge of filter bubbles, and the importance of balancing familiarity with discovery. Cory shares insights on why TikTok's algorithm works so well (clean data and massive interaction volume), the crucial role of homepage curation, and how human curators help by contextualizing content, cleaning data, and identifying positive feedback loops that algorithms might miss.

The conversation covers practical challenges like measuring "surprise and delight," the content deluge created by democratized creation tools, and why trust in tech companies is essential for better personalization. Cory emphasizes that discovery is "a good type of friction" and explains how the CODE framework (Capture, Organize, Distill, Express, plus Analysis) guides professional curation work. Looking to the future, they discuss the need for systems thinking that creates narrative connections between content, the potential for conversational AI to help users articulate preferences, and why diverse perspectives beyond engineering are crucial for building effective discovery systems. Resources mentioned include the newsletter "Top Information Retrieval Papers of the Week" and Notebook LM for synthesizing research.

 

Direct download: Cory_With_Ads_V1.mp3
Category:general -- posted at: 7:20am PST

In this episode, Santiago de Leon takes us deep into the world of eye tracking and its revolutionary applications in recommender systems. As a researcher at the Kempelen Institute and Brno University, Santiago explains the mechanics of eye tracking technology: how it captures gaze data and processes it into fixations and saccades to reveal user browsing patterns. He introduces the groundbreaking RecGaze dataset, the first eye tracking dataset specifically designed for recommender systems research, which opens new possibilities for understanding how users interact with carousel interfaces like Netflix. Through collaboration between psychologists and AI researchers, Santiago's work demonstrates how eye tracking can uncover insights about positional bias and user engagement that traditional click data misses.
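
To make the step from raw gaze data to fixations and saccades concrete, here is a minimal sketch of velocity-threshold identification (I-VT), a standard textbook approach. It is not necessarily the method used for RecGaze. The sample format, the 100 degrees/second threshold, and all function names are assumptions chosen for illustration.

```python
import math


def classify_ivt(samples, velocity_threshold=100.0):
    """Label each gaze sample as 'fixation' or 'saccade'.

    samples: time-ordered list of (t_seconds, x_deg, y_deg) tuples.
    velocity_threshold: degrees per second (illustrative value, an assumption).
    """
    labels = ["fixation"]  # the first sample has no preceding velocity
    for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:]):
        dt = t1 - t0
        speed = math.hypot(x1 - x0, y1 - y0) / dt if dt > 0 else 0.0
        labels.append("saccade" if speed > velocity_threshold else "fixation")
    return labels


def _summarize(chunk):
    # Collapse a run of samples into (start_time, end_time, mean_x, mean_y).
    xs = [s[1] for s in chunk]
    ys = [s[2] for s in chunk]
    return (chunk[0][0], chunk[-1][0], sum(xs) / len(xs), sum(ys) / len(ys))


def group_fixations(samples, labels):
    """Merge consecutive fixation samples into fixation events."""
    fixations, current = [], []
    for sample, label in zip(samples, labels):
        if label == "fixation":
            current.append(sample)
        elif current:
            fixations.append(_summarize(current))
            current = []
    if current:
        fixations.append(_summarize(current))
    return fixations


# Toy recording: the gaze rests near (1, 1), jumps, then rests near (6, 1).
samples = [
    (0.00, 1.0, 1.0), (0.01, 1.1, 1.0), (0.02, 1.0, 1.1),
    (0.03, 6.0, 1.0),  # large jump from the previous sample
    (0.04, 6.1, 1.0), (0.05, 6.0, 1.1),
]
labels = classify_ivt(samples)
fixations = group_fixations(samples, labels)
```

In a carousel setting, each fixation's mean position could then be mapped to the item displayed at that screen location, which gives a view of what users looked at before they clicked.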

Beyond the technical aspects, Santiago addresses the ethical considerations surrounding eye tracking data, particularly concerning pupil data and privacy. He emphasizes the importance of questioning assumptions in recommender systems and shares practical advice for improving recommendation algorithms by understanding actual user behavior rather than relying solely on click patterns. Looking forward, Santiago discusses exciting future directions including simulating user behavior using eye tracking data, addressing the cold start problem, and translating these findings to e-commerce applications. This conversation challenges researchers and practitioners to think more deeply about de-biasing clicks and leveraging eye tracking as a powerful tool to enhance user experience in recommendation systems.

 

Direct download: Santiago_With_Ads_V2.mp3
Category:general -- posted at: 6:48am PST

In this episode of Data Skeptic, we dive deep into the technical foundations of building modern recommender systems. Unlike traditional machine learning classification problems where you can simply apply XGBoost to tabular data, recommender systems require sophisticated hybrid approaches that combine multiple techniques. Our guest, Boya Xu, an assistant professor of marketing at Virginia Tech, walks us through a cutting-edge method that integrates three key components: collaborative filtering for dimensionality reduction, embeddings to represent users and items in latent space, and bandit learning to balance exploration and exploitation when deploying new recommendations.

Boya shares insights from her research on how recommender systems impact both consumers and content creators across e-commerce and social media platforms. We explore critical challenges like the cold start problem—how to make good recommendations for brand new users—and discuss how her approach uses demographic information to create informative priors that accelerate learning. The conversation also touches on algorithmic fairness, revealing how her method reduces bias between majority and minority (niche preference) users by incorporating active learning through bandit algorithms. Whether you're interested in the mathematics of recommendation engines or the broader implications for digital platforms, this episode offers a comprehensive look at the state-of-the-art in recommender system design.
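
As a rough illustration of how informative priors can speed up bandit learning for new users, here is a minimal Beta-Bernoulli Thompson sampling sketch. It is not Boya's method. The item names, the segment prior, and the click rates are hypothetical values chosen for the example.

```python
import random


class BetaBernoulliBandit:
    """Thompson sampling over items with Beta priors on click probability."""

    def __init__(self, priors):
        # priors maps item -> (alpha, beta) pseudo-counts of clicks / non-clicks.
        self.params = {item: [a, b] for item, (a, b) in priors.items()}

    def choose(self, rng):
        # Sample a plausible click rate per item and recommend the best draw.
        draws = {item: rng.betavariate(a, b) for item, (a, b) in self.params.items()}
        return max(draws, key=draws.get)

    def update(self, item, clicked):
        # A click increments alpha; a skip increments beta.
        self.params[item][0 if clicked else 1] += 1


# Hypothetical prior for one demographic segment: mildly favors "cooking".
segment_prior = {"sports": (1, 3), "cooking": (3, 1), "travel": (1, 1)}
# Hypothetical true click rates, unknown to the bandit.
true_rates = {"sports": 0.10, "cooking": 0.55, "travel": 0.30}

rng = random.Random(42)
bandit = BetaBernoulliBandit(segment_prior)
shown = {item: 0 for item in true_rates}
for _ in range(1000):
    item = bandit.choose(rng)
    shown[item] += 1
    bandit.update(item, rng.random() < true_rates[item])
```

Because each recommendation comes from a random draw rather than the current best estimate, the bandit keeps occasionally exploring less-shown items, which is the exploration and exploitation balance described above.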

Direct download: Boya_No_Ads_V1.mp3
Category:general -- posted at: 7:00am PST

In this episode of Data Skeptic, we explore the fascinating intersection of recommender systems and digital humanities with guest Florian Atzenhofer-Baumgartner, a PhD student at Graz University of Technology. Florian is working on Monasterium.net, Europe's largest online collection of historical charters, containing millions of medieval and early modern documents from across the continent. The conversation delves into why traditional recommender systems fall short in the digital humanities space, where users range from expert historians and genealogists to art historians and linguists, each with unique research needs and information-seeking behaviors.

Florian explains the technical challenges of building a recommender system for cultural heritage materials, including dealing with sparse user-item interaction matrices, the cold start problem, and the need for multi-modal similarity approaches that can handle text, images, metadata, and historical context. The platform leverages various embedding techniques and gives users control over weighting different modalities—whether they're searching based on text similarity, visual imagery, or diplomatic features like issuers and receivers. A key insight from Florian's research is the importance of balancing serendipity with utility, collection representation to prevent bias, and system explainability while maintaining effectiveness.
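
One simple way to picture user-controlled weighting across modalities is a weighted average of per-modality cosine similarities. The sketch below assumes each charter already has one embedding per modality. The modality names, the toy vectors, and the function names are hypothetical and do not come from Monasterium.net.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def combined_similarity(doc_a, doc_b, weights):
    """Weighted average of per-modality cosine similarities."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one modality weight must be positive")
    return sum(w * cosine(doc_a[m], doc_b[m]) for m, w in weights.items()) / total


# Toy embeddings for a query charter and two candidates.
query = {"text": [1.0, 0.0], "image": [0.0, 1.0], "diplomatic": [1.0, 1.0]}
charter_1 = {"text": [1.0, 0.1], "image": [1.0, 0.0], "diplomatic": [1.0, 1.0]}  # similar text
charter_2 = {"text": [0.0, 1.0], "image": [0.1, 1.0], "diplomatic": [1.0, 1.0]}  # similar imagery

text_focus = {"text": 0.8, "image": 0.1, "diplomatic": 0.1}
image_focus = {"text": 0.1, "image": 0.8, "diplomatic": 0.1}

text_scores = (combined_similarity(query, charter_1, text_focus),
               combined_similarity(query, charter_2, text_focus))
image_scores = (combined_similarity(query, charter_1, image_focus),
                combined_similarity(query, charter_2, image_focus))
```

Shifting weight from text to imagery flips which candidate ranks first, which is the kind of control over modalities that the paragraph above describes.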

The discussion also touches on unique evaluation challenges in non-commercial recommendation contexts, including Florian's "research funnel" framework that considers discovery, interaction, integration, and impact stages. Looking ahead, Florian envisions recommendation systems becoming standard tools for exploration across digital archives and cultural heritage repositories throughout Europe, potentially transforming how researchers discover and engage with historical materials. The new version of Monasterium.net, set to launch with enhanced semantic search and recommendation features, represents an important step toward making cultural heritage more accessible and discoverable for everyone.

 

Direct download: Florian_With_Ads_V1.mp3
Category:general -- posted at: 8:50am PST

In this episode of Data Skeptic's Recommender Systems series, host Kyle Polich explores DataRec, a new Python library designed to bring reproducibility and standardization to recommender systems research. Guest Alberto Carlo Maria Mancino, a postdoc researcher from Politecnico di Bari, Italy, discusses the challenges of dataset management in recommendation research—from version control issues to preprocessing inconsistencies—and how DataRec provides automated downloads, checksum verification, and standardized filtering strategies for popular datasets like MovieLens, Last.fm, and Amazon reviews. 
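
The checksum verification mentioned above can be sketched with Python's standard library. This is not DataRec's API, and the episode notes do not say which hash DataRec uses; SHA-256, the file name, and the function names here are assumptions.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_file(path, expected_hex):
    """Return True when the file's SHA-256 digest matches the published one."""
    return sha256_of(path) == expected_hex.lower()


# Demo on a temporary file standing in for a downloaded dataset.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "ratings.csv")
    payload = b"user,item,rating\n1,10,4\n2,10,5\n"
    with open(path, "wb") as fh:
        fh.write(payload)
    published = hashlib.sha256(payload).hexdigest()

    intact = verify_file(path, published)

    # Appending a single row changes the digest, so the check now fails.
    with open(path, "ab") as fh:
        fh.write(b"3,11,1\n")
    tampered_passes = verify_file(path, published)
```

A check like this catches silently changed or truncated dataset files before they affect an experiment.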

The conversation covers Alberto's research journey through knowledge graphs, graph-based recommenders, privacy considerations, and recommendation novelty. He explains why small modifications in datasets can significantly impact research outcomes, the importance of offline evaluation, and DataRec's vision as a lightweight library that integrates with existing frameworks rather than replacing them. Whether you're benchmarking new algorithms or exploring recommendation techniques, this episode offers practical insights into one of the most critical yet overlooked aspects of reproducible ML research.

Direct download: Alberto_With_Ads_V1.mp3
Category:general -- posted at: 1:41pm PST

In this episode of Data Skeptic's Recommender Systems series, Kyle sits down with Aditya Chichani, a senior machine learning engineer at Walmart, to explore the darker side of recommendation algorithms. The conversation centers on shilling attacks—a form of manipulation where malicious actors create multiple fake profiles to game recommender systems, either to promote specific items or sabotage competitors. Aditya, who researched these attacks during his undergraduate studies at SPIT before completing his master's in computer science with a data science specialization at UC Berkeley, explains how these vulnerabilities emerge particularly in collaborative filtering systems. From promoting a friend's ska band on Spotify to inflating product ratings on e-commerce platforms, shilling attacks represent a significant threat in an industry where approximately 4% of reviews are fake, translating to $800 billion in annual sales in the US alone.

The discussion delves deep into collaborative filtering, explaining both user-user and item-item approaches that create similarity matrices to predict user preferences. However, these systems face various shilling attacks of increasing sophistication: random attacks use minimal information with average ratings, while segmented attacks strategically target popular items (like Taylor Swift albums) to build credibility before promoting target items. Bandwagon attacks focus on highly popular items to connect with genuine users, and average attacks leverage item rating knowledge to appear authentic. User-user collaborative filtering proves particularly vulnerable, requiring as few as 500 fake profiles to impact recommendations, while item-item filtering demands significantly more resources. Aditya addresses detection through machine learning techniques that analyze behavioral patterns using methods like PCA to identify profiles with unusually high correlation and suspicious rating consistency. However, this remains an evolving challenge as attackers adapt strategies, now using large language models to generate more authentic-seeming fake reviews. His research with the MovieLens dataset tested detection algorithms against synthetic attacks, highlighting how these concerns extend to modern e-commerce systems. While companies rarely share attack and detection data publicly to avoid giving attackers advantages, academic research continues advancing both offensive and defensive strategies in recommender systems security.
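
To show why user-user collaborative filtering is exposed to this kind of manipulation, here is a toy sketch of an average attack against a similarity-weighted predictor. It is a simplification and not Aditya's experimental setup: it uses raw cosine similarity on co-rated items, and the users, items, and ratings are made up.

```python
import math


def cosine_on_shared(r_u, r_v):
    """Cosine similarity restricted to items both users have rated."""
    shared = set(r_u) & set(r_v)
    if not shared:
        return 0.0
    dot = sum(r_u[i] * r_v[i] for i in shared)
    norm_u = math.sqrt(sum(r_u[i] ** 2 for i in shared))
    norm_v = math.sqrt(sum(r_v[i] ** 2 for i in shared))
    return dot / (norm_u * norm_v)


def predict(ratings, user, item):
    """Similarity-weighted average of other users' ratings for the item."""
    num = den = 0.0
    for other, r in ratings.items():
        if other == user or item not in r:
            continue
        sim = cosine_on_shared(ratings[user], r)
        num += sim * r[item]
        den += sim
    return num / den if den else None


def average_attack(ratings, target, n_fakes, push_rating=5.0):
    """Add fake profiles that rate filler items at the item mean and push the target."""
    filler = {i for r in ratings.values() for i in r if i != target}
    means = {}
    for i in filler:
        values = [r[i] for r in ratings.values() if i in r]
        means[i] = sum(values) / len(values)
    attacked = dict(ratings)
    for k in range(n_fakes):
        profile = dict(means)
        profile[target] = push_rating
        attacked[f"fake_{k}"] = profile
    return attacked


ratings = {
    "alice": {"A": 5, "B": 3, "C": 4},          # has not rated target "T"
    "bob":   {"A": 4, "B": 3, "C": 5, "T": 2},
    "carol": {"A": 5, "B": 2, "C": 4, "T": 1},
    "dave":  {"A": 4, "B": 3, "C": 4, "T": 2},
}
before = predict(ratings, "alice", "T")
after = predict(average_attack(ratings, "T", n_fakes=10), "alice", "T")
```

Because the fake profiles mimic average ratings on the filler items, they look similar to genuine users and their pushed rating dominates the prediction for the target item.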

Direct download: Aditya_With_Ads_Ad_segment_1_V1.mp3
Category:general -- posted at: 6:11am PST

In this episode, Rebecca Salganik, a PhD student at the University of Rochester with a background in vocal performance and composition, discusses her research on fairness in music recommendation systems. She explores three key types of fairness—group, individual, and counterfactual—and examines how algorithms create challenges like popularity bias (favoring mainstream content) and multi-interest bias (underserving users with diverse tastes). Rebecca introduces LARP, her multi-stage multimodal framework for playlist continuation that uses contrastive learning to align text and audio representations, learn song relationships, and create playlist-level embeddings to address the cold start problem.

A significant contribution of Rebecca's work is the Music Semantics dataset, created by scraping Reddit discussions to capture how people naturally describe music using atmospheric qualities, contextual comparisons, and situational associations rather than just technical features. This dataset, available on Hugging Face, enables more nuanced recommendation systems that better understand user preferences and support niche tastes. Her research utilizes industry datasets including Last.fm and Spotify's Million Playlist Dataset, and points toward exciting future applications in music generation and multimodal systems that combine audio, text, and video.

 

Direct download: rebecca-with-ads.mp3
Category:general -- posted at: 7:00am PST

Direct download: Vaclav_No_Ads_V1.mp3
Category:general -- posted at: 8:33am PST

In this episode, we speak with Ashmi Banerjee, a doctoral candidate at the Technical University of Munich, about her pioneering research on AI-powered recommender systems in tourism. Ashmi illuminates how these systems can address exposure bias while promoting more sustainable tourism practices through innovative approaches to data acquisition and algorithm design.  Key highlights include leveraging large language models for synthetic data generation, developing recommendation architectures that balance user satisfaction with environmental concerns, and creating frameworks that distribute tourism more equitably across destinations. Ashmi's insights offer valuable perspectives for both AI researchers and tourism industry professionals seeking to implement more responsible recommendation technologies.

Direct download: Ashmi_with_Ads_V2.mp3
Category:general -- posted at: 6:09am PST

In this episode of Data Skeptic's Recommender Systems series, host Kyle Polich interviews Dr. Kunal Mukherjee, a postdoctoral research associate at Virginia Tech, about the paper "Z-REx: Human-Interpretable GNN Explanations for Real Estate Recommendations".

The discussion explores how the post-COVID real estate landscape has created a need for better recommendation systems that can introduce home buyers to emerging neighborhoods they might not know about. Dr. Mukherjee explains how his team developed a graph neural network approach that not only recommends properties but provides human-interpretable explanations for why certain regions are suggested. The conversation covers the advantages of using graph-based models over traditional recommendation systems, the importance of regional context in real estate features, and how co-click data from similar users can create more effective recommendations.

Key topics include the distinction between model developer explanations and end-user explanations, the challenges of feature perturbation in recommendation systems, and how graph neural networks can discover novel pathways to emerging real estate markets that traditional models might miss.

Direct download: Kunal_With_Ads.mp3
Category:general -- posted at: 2:45pm PST

In this episode of Data Skeptic, we explore the challenges of studying social media recommender systems when exposure data isn't accessible. Our guests Sabrina Guidotti, Gregor Donabauer, and Dimitri Ognibene introduce their innovative "recommender neutral user model" for inferring the influence of opaque algorithms.

Direct download: Dimitri_Gregor_Sabrina_Podcast_with_Ads.mp3
Category:general -- posted at: 1:11pm PST

In this episode of Data Skeptic, we dive into eco-friendly AI with Antonio Purificato, a PhD student from Sapienza University of Rome. Antonio discusses his research on "EcoAware Graph Neural Networks for Sustainable Recommendations" and explores how we can measure and reduce the environmental impact of recommender systems without sacrificing performance.

Direct download: Antonio_Podcast_with_Ad_Mixdown.mp3
Category:data science -- posted at: 2:00pm PST

Kyle reveals the next season's topic will be "Recommender Systems".  Asaf shares insights on how network science contributes to the recommender system field.

Direct download: networks-and-recommender-systems.mp3
Category:general -- posted at: 3:17pm PST

Kyle and Asaf discuss a project in which we link former guests of the podcast based on their co-authorship of academic papers. 

Direct download: network-of-past-guests-collaborations.mp3
Category:general -- posted at: 3:59pm PST

In this episode, Professor Pål Grønås Drange from the University of Bergen introduces the field of Parameterized Complexity - a powerful framework for tackling hard computational problems by focusing on specific structural aspects of the input. This framework allows researchers to solve NP-complete problems more efficiently when certain parameters, like the structure of the graph, are "well-behaved".

At the center of the discussion is the network diversion problem, where the goal isn’t to block all routes between two points in a network, but to force flow - such as traffic, electricity, or data - through a specific path. While this problem appears deceptively similar to the classic min-cut/max-flow problem, it turns out to be much harder and, in general, its complexity is still unknown. Parameterized complexity plays a key role here by offering ways to make the problem tractable under constraints like low treewidth or planarity, which often exist in real-world networks like road systems or utility grids.
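
For contrast with network diversion, here is a minimal sketch of the classic problem it resembles: computing a maximum flow and the matching minimum cut with the Edmonds-Karp algorithm. It does not solve the network diversion problem itself. The toy graph and node names are assumptions.

```python
from collections import defaultdict, deque


def max_flow_min_cut(capacity, source, sink):
    """Edmonds-Karp. capacity: {u: {v: cap}}.

    Returns (max_flow_value, set of nodes on the source side of a minimum cut).
    """
    residual = defaultdict(lambda: defaultdict(int))
    for u, nbrs in capacity.items():
        for v, cap in nbrs.items():
            residual[u][v] += cap
            residual[v][u] += 0  # make sure the reverse edge exists
    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            # No augmenting path: the nodes still reachable form the source side.
            return flow, set(parent)
        # Find the bottleneck capacity along the path, then push flow through it.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck


capacity = {
    "s": {"a": 3, "b": 2},
    "a": {"b": 1, "t": 2},
    "b": {"t": 3},
}
flow_value, source_side = max_flow_min_cut(capacity, "s", "t")
cut_edges = [(u, v) for u in source_side for v in capacity.get(u, {}) if v not in source_side]
cut_capacity = sum(capacity[u][v] for u, v in cut_edges)
```

The minimum cut only asks which edges separate the two endpoints most cheaply. Network diversion adds the requirement that the remaining flow must pass through a chosen link, and that extra constraint is what makes it so much harder.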

Listeners will learn how vulnerability measures help identify weak points in networks, such as geopolitical infrastructure (e.g., gas pipelines like Nord Stream).

Follow our guest: Pål Grønås Drange

Direct download: the-network-diversion-problem.mp3
Category:general -- posted at: 4:13pm PST

In this episode, we learn why simply analyzing the structure of a network is not enough, and how the dynamics - the actual mechanisms of interaction between components - can drastically change how information or influence spreads.  Our guest, Professor Baruch Barzel of Bar-Ilan University, is a leading researcher in network dynamics and complex systems ranging from biology to infrastructure and beyond. 

BarzelLab

BarzelLab on Youtube

Paper in focus: Universality in network dynamics, 2013

Direct download: complex-dynamics-in-networks.mp3
Category:general -- posted at: 9:53am PST

In this episode we'll discuss how to use GitHub data as a network to extract insights about teamwork.

Our guest, Gabriel Ramirez, manager of the notifications team at GitHub, will show how to apply network analysis to better understand and improve collaboration within his engineering team by analyzing GitHub metadata - such as pull requests, issues, and discussions - as a bipartite graph of people and projects.

Some insights we'll discuss are how network centrality measures (like eigenvector and betweenness centrality) reveal organizational dynamics, how vacation patterns influence team connectivity, and how decentralizing communication hubs can foster healthier collaboration. 
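
As a small illustration of the idea, the sketch below projects a bipartite people-projects graph onto a person-to-person collaboration graph and scores people with eigenvector centrality via power iteration. It is not how GH Graph Explorer is implemented. The project names, people, and function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def project_people(contributions):
    """contributions: {project: [people]} -> weighted person-person adjacency.

    Two people are linked once for every project they share. People who share
    no project with anyone do not appear in the result.
    """
    adj = defaultdict(lambda: defaultdict(int))
    for people in contributions.values():
        for a, b in combinations(sorted(set(people)), 2):
            adj[a][b] += 1
            adj[b][a] += 1
    return adj


def eigenvector_centrality(adj, iterations=100):
    """Power iteration on (A + I); the identity shift avoids oscillation."""
    nodes = list(adj)
    score = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        new = {n: score[n] + sum(w * score[m] for m, w in adj[n].items()) for n in nodes}
        top = max(new.values())
        score = {n: value / top for n, value in new.items()}  # rescale so the max is 1.0
    return score


# Hypothetical activity: who touched pull requests or issues in which project.
contributions = {
    "notifications-api": ["ana", "ben", "chen"],
    "mobile-push": ["ana", "dev"],
    "email-digest": ["ana", "eli"],
}
scores = eigenvector_centrality(project_people(contributions))
most_central = max(scores, key=scores.get)
```

Here one person bridges every project, so the score flags a possible communication hub. As the episode stresses, a number like this is a prompt for a closer look rather than a conclusion.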

Gabriel’s open-source project, GH Graph Explorer, enables other managers and engineers to extract, visualize, and analyze their own GitHub activity using tools like Python, Neo4j, Gephi and LLMs for insight generation. But always remember: don't take the results at face value. Instead, use the results to guide your qualitative investigation.

Direct download: github-network-analysis.mp3
Category:general -- posted at: 8:41pm PST

In this episode, Kyle gives an overview of the intersection of graph theory and computational complexity theory. In complexity theory, we care about the runtime of an algorithm based on its input size. For many graph problems, the interesting questions we want to ask take longer and longer to answer! This episode provides the fundamental vocabulary and signposts along the path of exploring the intersection of graph theory and computational complexity theory.

Direct download: networks-and-complexity.mp3
Category:general -- posted at: 6:05pm PST

How can we build artificial intelligence systems that understand cause and effect, moving beyond simple correlations?

As we all know, correlation is not causation. "Spurious correlations" can show, for example, how rising ice cream sales might statistically link to more drownings, not because one causes the other, but due to an unobserved common cause like warm weather.
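
The ice cream example can be reproduced with a short simulation. In this sketch, temperature drives both variables and neither affects the other. The noise levels, sample size, and variable names are assumptions chosen for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def residuals(ys, xs):
    """What is left of ys after removing a least-squares linear fit on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]


rng = random.Random(0)
temperature = [rng.gauss(0, 1) for _ in range(2000)]   # the common cause
ice_cream = [t + rng.gauss(0, 1) for t in temperature]  # depends only on temperature
drownings = [t + rng.gauss(0, 1) for t in temperature]  # depends only on temperature

# Raw correlation looks substantial even though neither variable causes the other.
raw_corr = pearson(ice_cream, drownings)

# After accounting for temperature, the association largely disappears.
adjusted_corr = pearson(residuals(ice_cream, temperature),
                        residuals(drownings, temperature))
```

The adjustment only works here because the common cause was measured. When the confounder is unobserved, domain knowledge has to supply it, which is the motivation for bringing in knowledge graphs.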

Our guest, Utkarshani Jaimini, a researcher from the University of South Carolina's Artificial Intelligence Institute, tries to tackle this problem by using knowledge graphs that incorporate domain expertise. 

Knowledge graphs (structured representations of information) are combined with neural networks in the field of neurosymbolic AI to represent and reason about complex relationships. This involves creating causal ontologies and incorporating the "weight" or strength of causal relationships and hyperrelations. This field has many practical applications, such as AI explainability, healthcare and autonomous driving.

Follow our guest

Utkarshani Jaimini's Webpage

LinkedIn

Papers in focus

CausalLP: Learning causal relations with weighted knowledge graph link prediction, 2024

HyperCausalLP: Causal Link Prediction using Hyper-Relational Knowledge Graph, 2024

 

Direct download: graphs-for-causal-ai.mp3
Category:general -- posted at: 8:21am PST

Direct download: power-networks.mp3
Category:general -- posted at: 4:52pm PST

Direct download: unveiling-graph-datasets.mp3
Category:general -- posted at: 4:15pm PST

In this episode we talk with Manita Pote, a PhD student at Indiana University Bloomington, specializing in online trust and safety, with a focus on detecting coordinated manipulation campaigns on social media. 

Key insights include how coordinated reply attacks target influential figures like journalists and politicians, how machine learning models can detect these inauthentic campaigns using structural and behavioral features, and how deletion patterns reveal efforts to evade moderation or manipulate engagement metrics.

Follow our guest

X/Twitter
Google Scholar

Papers in focus
Coordinated Reply Attacks in Influence Operations: Characterization and Detection, 2025
Manipulating Twitter through Deletions, 2022

Direct download: network-manipulation.mp3
Category:general -- posted at: 5:52pm PST

Kyle discusses the history and proof for the small world hypothesis.

Direct download: small-world-networks.mp3
Category:general -- posted at: 10:01pm PST

Kyle asks Asaf questions about the new network science course he is now teaching.  The conversation delves into topics such as contact tracing, tools for analyzing networks, example use cases, and the importance of thinking in networks.

Direct download: thinking-in-networks.mp3
Category:general -- posted at: 9:03am PST

In this episode we talk with Bavo DC Campo, a data scientist and statistician, who shares his expertise on the intersection of actuarial science, fraud detection, and social network analytics.

Together we will learn how to use graphs to fight against insurance fraud by uncovering hidden connections between fraudulent claims and bad actors.

Key insights include how social network analytics can detect fraud rings by mapping relationships between policyholders, claims, and service providers, and how the BiRank algorithm, inspired by Google’s PageRank, helps rank suspicious claims based on network structure.

Bavo will also present his iFraud simulator that can be used to model fraudulent networks for detection training purposes.

Do you have a question about fraud detection? Bavo says he will gladly help. Feel free to contact him.  

-------------------------------

Want to listen ad-free?  Try our Graphs Course?  Join Data Skeptic+ for $5 / month or $50 / year

https://plus.dataskeptic.com

Direct download: fraud-networks.mp3
Category:general -- posted at: 3:47pm PST

In this episode we talk with Justin Wang Ngai Yeung, a PhD candidate at the Network Science Institute at Northeastern University in London, who explores how network science helps uncover criminal networks.

Justin is also a member of the organizing committee of the satellite conference dealing with criminal networks at the network science conference in The Netherlands in June 2025.

Listeners will learn how graph-based models assist law enforcement in analyzing missing data, identifying key figures in criminal organizations, and improving intervention strategies.

Key insights include the challenges of incomplete and inaccurate data in criminal network analysis, how law enforcement agencies use network dismantling techniques to disrupt organized crime, and the role of machine learning in predicting hidden connections within illicit networks.

 

Direct download: criminal-networks.mp3
Category:general -- posted at: 8:00am PST

In this episode, Celine Wüst, a master’s student at ETH Zurich specializing in secure and reliable systems, shares her work on automated software testing for graph databases. Celine shows how fuzzing, the process of automatically generating complex queries, helps uncover hidden bugs in graph database management systems like Neo4j, FalconDB, and Apache AGE.

Key insights include how state-aware query generation can detect critical issues like buffer overflows and crashes, the challenges of debugging complex database behaviors, and the importance of security-focused software testing.

We'll also find out which Graph DB company offers swag for finding bugs in its software and get Celine's advice about which graph DB to use.

Direct download: graph-bugs.mp3
Category:general -- posted at: 8:00am PST

In this episode, Gabriel Petrescu, an organizational network analyst, discusses how network science can provide deep insights into organizational structures using OrgXO, a tool that maps companies as networks rather than rigid hierarchies. Listeners will learn how analyzing workplace collaboration networks can reveal hidden influencers, organizational bottlenecks, and engagement levels, offering a data-driven approach to improving effectiveness and resilience.

Key insights include how companies can identify overburdened employees, address silos between departments, and detect vulnerabilities where too few individuals hold critical knowledge. Real-life applications range from mergers and acquisitions, where network analysis helps assess company dynamics before an acquisition, to restructuring efforts that improve workflow and team collaboration.

Gabriel’s work highlights how organizations can shift from traditional hierarchical thinking to a network-based perspective, leading to smarter decision-making and more adaptable companies.

Direct download: organizational-network-analysis.mp3
Category:general -- posted at: 7:00am PST

Is it better to have your work team fully connected or sparsely connected?


In this episode we'll try to answer this question and more with our guest Hiroki Sayama, a SUNY Distinguished Professor and director of the Center for Complex Systems at Binghamton University.


Hiroki delves into the applications of network science in organizational structures and innovation dynamics, presenting his recent work on extracting network structures from organizational charts to enable insights into decision-making and performance. He'll also cover how network connectivity impacts team creativity and innovation.


Key insights include how the structure of organizational networks—such as the depth of hierarchy or proximity to leadership—can influence corporate performance and how sparse network connectivity fosters more diverse and innovative ideas than fully connected networks.

Direct download: organizational-networks.mp3
Category:general -- posted at: 5:00am PST

A man goes into a bar… This is the beginning of a riddle that our guest, Yoed Kenett, an assistant professor at the Technion's Faculty of Data and Decision Sciences, uses to measure creativity in subjects.

In our talk, Yoed speaks about how to combine cognitive science and network science to explore the complexities and decode the mysteries of the human mind.

The listeners will learn how network science provides tools to map and analyze human memory, revealing how problem-solving and creativity emerge from changes in semantic memory structures.

Key insights include the role of memory restructuring during moments of insight, the connection between semantic networks and creative thinking, and how understanding these processes can improve problem-solving and analogical reasoning.

Real-life applications span enhancing creativity in the workplace, building tools to combat cognitive rigidity in aging, and improving learning strategies by fostering richer, more flexible mental networks.

Direct download: networks-of-the-mind.mp3
Category:general -- posted at: 12:13pm PST

In this episode, Garima Agrawal, a senior researcher and AI consultant, brings her years of experience in data science and artificial intelligence. Listeners will learn about the evolving role of knowledge graphs in augmenting large language models (LLMs) for domain-specific tasks and how these tools can mitigate issues like hallucination in AI systems.

Key insights include how LLMs can leverage knowledge graphs to improve accuracy by integrating domain expertise, reducing hallucinations, and enabling better reasoning.

Real-life applications discussed range from enhancing customer support systems with efficient FAQ retrieval to creating smarter AI-driven decision-making pipelines.

Garima’s work highlights how blending static knowledge representation with dynamic AI models can lead to cost-effective, scalable, and human-centered AI solutions.

Direct download: llms-and-graphs-synergy.mp3
Category:general -- posted at: 9:41am PST

In this episode, Bnaya Gross, a Fulbright postdoctoral fellow at the Center for Complex Network Research at Northeastern University, explores the transformative applications of network science in fields ranging from infrastructure to medicine, by studying the interactions between networks ("a network of networks").

Listeners will learn how interdependent networks provide a framework for understanding cascading failures, such as power outages, and how these insights transfer to physical systems like superconducting materials and biological networks.

Key takeaways include understanding how dependencies between networks can amplify vulnerabilities, applying these principles to create resilient infrastructure systems, and using network medicine to uncover relationships between diseases, potential drug repurposing and the process of aging.

Direct download: a-network-of-networks-libsyn-sucks.mp3
Category:general -- posted at: 6:23pm PST

Our guests, Erwan Le Merrer and Gilles Tredan, are long-time collaborators in graph theory and distributed systems. They share their expertise on applying graph-based approaches to understanding both large language model (LLM) hallucinations and shadow banning on social media platforms.

In this episode, listeners will learn how graph structures and metrics can reveal patterns in algorithmic behavior and platform moderation practices.

Key insights include the use of graph theory to evaluate LLM outputs, uncovering patterns in hallucinated graphs that might hint at the underlying structure and training data of the models, and applying epidemic models to analyze the uneven spread of shadow banning on Twitter.

Direct download: auditing-llms-and-twitter.mp3
Category:general -- posted at: 10:53am PST

In this episode, Šimon Mandlík, a PhD candidate at the Czech Technical University, will talk with us about leveraging machine learning and graph-based techniques for cybersecurity applications.

We'll learn how graphs are used to detect malicious activity in networks, such as identifying harmful domains and executable files by analyzing their relationships within vast datasets.

This will include the use of hierarchical multi-instance learning (HML) to represent JSON-based network activity as graphs and the advantages of analyzing connections between entities (like clients, domains etc.).

Our guest shows that while other graph methods (such as GNN or Label Propagation) lack scalability or have trouble with heterogeneous graphs, his method can handle both because of the "locality assumption": fraud will be a local phenomenon in the graph. By relying on this assumption, we can get faster and more accurate results.

Direct download: fraud-detection-with-graphs.mp3
Category:general -- posted at: 7:04pm PST

Thibaut Vidal, a professor at Polytechnique Montreal, specializes in leveraging advanced algorithms and machine learning to optimize supply chain operations.
In this episode, listeners will learn how graph-based approaches can transform supply chains by enabling more efficient routing, districting, and decision-making in complex logistical networks.

Key insights include the application of Graph Neural Networks to predict delivery costs, with the potential to improve districting strategies for companies like UPS or Amazon and to overcome the limitations of traditional heuristic methods.

Thibaut’s work underscores the potential for GNN to reduce costs, enhance operational efficiency, and provide better working conditions for teams through improved route familiarity and workload balance.

Direct download: optimizing-supply-chains-with-gnn.mp3
Category:general -- posted at: 7:34am PST

Our guest in this episode is David Tench, a Grace Hopper postdoctoral fellow at Lawrence Berkeley National Laboratory, who specializes in scalable graph algorithms and compression techniques to tackle massive datasets.


In this episode, we will learn how his techniques enable real-time analysis of large datasets, such as particle tracking in physics experiments or social network analysis, by reducing storage requirements while preserving critical structural properties.

David also challenges the common belief that giant graphs are sparse by pointing to a potential bias: Maybe because of the challenges that exist in analyzing large dense graphs, we only see datasets of sparse graphs? The truth is out there…

David encourages you to reach out to him if you have a large scale graph application that you don't currently have the capacity to deal with using your current methods and your current hardware. He promises to "look for the hammer that might help you with your nail".

Direct download: the-mystery-behind-large-graphs.mp3
Category:general -- posted at: 5:49pm PST
