Privacy concerns grow after AI chatbots give out people's real phone numbers
By CBS News
Key Concepts
- Generative AI (GenAI): AI systems that generate text or other content after being trained on massive datasets.
- Personally Identifiable Information (PII): Data that can be used to identify a specific individual, such as phone numbers.
- Data Brokers: Companies that collect and sell personal information; the data they publish online can be scraped into AI training sets.
- Data Scraping: The process of harvesting large volumes of information from the internet to train AI models.
- Consent: The ethical and legal question of whether personal data may be used for AI training without the individual's permission.
The Growing Privacy Crisis: AI and Personal Data Exposure
Recent reporting, notably by MIT Technology Review, has revealed that generative AI chatbots are surfacing people's private phone numbers. The phenomenon exposes a significant gap in data privacy protections as AI models become more integrated into everyday digital interactions.
Mechanisms of Data Exposure
According to investigative tech reporter Eileen Guo, personal phone numbers are exposed primarily through two channels:
- Direct Queries: Users are discovering their own numbers, or those of colleagues and friends, by explicitly asking the AI for information about specific individuals.
- Erroneous Output: Chatbots sometimes become "confused" and supply a private person's number when asked for someone else's contact information, so the number's owner receives unexpected, spam-like calls.
Scale and Impact
While the exact frequency of these incidents is difficult to quantify due to the opaque nature of AI training, there is clear evidence of a surge in public concern. DeleteMe, a service specializing in removing personal information from the internet, reported a 400% increase in customer inquiries related to AI-driven data exposure.
The Root Cause: Training Data
The core issue lies in the methodology used to build these AI systems. Chatbots are trained on vast, indiscriminate datasets scraped from the internet. This training data frequently includes:
- PII harvested from public records.
- Information aggregated and sold by data brokers.
- Data from search sites that specialize in indexing personal profiles.
Mitigation and Remediation Challenges
Eileen Guo emphasizes that there is no "silver bullet" for this problem. Because the models have already ingested their training data, the information is effectively "baked in."
Actionable Steps for Risk Reduction:
- Data Scrubbing: Individuals should proactively remove their personal information from the internet.
- Regional Tools: Use state resources such as California's portal, which lets residents request the removal of their personal information from data broker sites with a single click.
- User Awareness: Exercise caution regarding the information shared during interactions with chatbots, as these systems may retain or process input data in ways that compromise privacy.
Ethical and Legal Implications
The central argument is that these AI systems use people's data without their consent. The numbers do not come from the user's device or current session; they come from information already available online that was absorbed into training data. The result is a systemic privacy violation in which individuals have little control over how third-party AI developers repurpose their digital footprint.
Conclusion
The surfacing of private phone numbers by AI chatbots is a symptom of a broader lack of consent and control over personal data in the age of generative AI. While users can take steps to minimize their digital footprint, the responsibility rests largely with the developers of these models to address the inclusion of PII in their training sets. As things stand, the way these systems are built makes it nearly impossible for the average user to prevent such leaks entirely.