AI's False Reality: Understanding Hallucination

Artificial Intelligence (AI) has become the poster child of technological innovation, on track to transform industries on a scale similar to the Industrial Revolution of the 18th and 19th centuries. But as a cutting-edge technology, AI brings its own challenge, one that exploits our human tendency to trust: AI hallucinations.

This phenomenon, where AI models generate outputs that are factually incorrect, misleading, or entirely fabricated, raises complex questions about the reliability and trustworthiness of AI models and the larger systems built on them.

The tendency for AI to hallucinate stems from several interrelated factors. Overfitting, a condition where models become overly specialized to their training data, can lead to confident but wildly inaccurate responses when presented with novel scenarios (Guo et al., 2017). Moreover, biases embedded within datasets shape the models' understanding of the world; if these datasets are flawed or unrepresentative, hallucinations can become a vehicle for perpetuating harmful stereotypes and discrimination (Gebru et al., 2018).
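One way to make "confident but inaccurate" measurable is a calibration check. The sketch below computes an expected calibration error (ECE) in the spirit of Guo et al. (2017): predictions are grouped by stated confidence, and the gap between average confidence and actual accuracy is averaged across the groups. The function name, the equal-width binning, and the toy numbers are assumptions made for illustration, not taken from the paper's code.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between stated confidence and accuracy per bin."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    total = len(confidences)
    if total == 0:
        return 0.0

    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Map a confidence in [0, 1] to a bin index; 1.0 lands in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / total) * abs(avg_conf - accuracy)
    return ece


# Hypothetical toy data: a model that claims 95% confidence but is right half the time.
overconfident = expected_calibration_error(
    [0.95, 0.95, 0.95, 0.95], [True, False, True, False]
)
# Hypothetical toy data: a model that claims 75% confidence and is right 3 out of 4 times.
calibrated = expected_calibration_error(
    [0.75, 0.75, 0.75, 0.75], [True, True, True, False]
)
```

A large value for the first model and a value near zero for the second is the pattern one would expect: the overfitted model's confidence says little about whether its answer is right.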

Another significant source of hallucinations is distributional shift: the discrepancy between the data an AI model encounters during training and the messy, unpredictable nature of real-world situations (Quiñonero-Candela et al., 2009). Even with extensive training, AI models may lack a comprehensive understanding of specific topics, leading them to improvise and provide fabricated information. Additionally, AI systems are vulnerable to adversarial attacks, in which meticulously crafted inputs are designed to induce hallucinations (Goodfellow et al., 2015).
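Distributional shift can be checked for before a model is trusted with new inputs. The following is a minimal sketch, assuming a single numeric feature: it computes the two-sample Kolmogorov-Smirnov statistic, the largest gap between the empirical distributions of training data and live data. The sample values and the 0.3 alert threshold are hypothetical; a real system would pick a threshold, or a proper significance test, based on its own data.

```python
def ks_statistic(sample_a, sample_b):
    """Largest gap between two empirical CDFs (0 = identical, 1 = disjoint)."""
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    max_gap = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Step past every value equal to x in both samples (handles ties).
        while i < len(a) and a[i] == x:
            i += 1
        while j < len(b) and b[j] == x:
            j += 1
        max_gap = max(max_gap, abs(i / len(a) - j / len(b)))
    return max_gap


# Hypothetical feature values seen during training and in production.
training = [1, 2, 3, 4]
production = [3, 4, 5, 6]

DRIFT_THRESHOLD = 0.3  # assumed alert level, chosen only for this example
drift = ks_statistic(training, production)
shift_detected = drift > DRIFT_THRESHOLD
```

When the check fires, the model is being asked about data unlike what it was trained on, which is exactly the situation where fabricated answers become more likely.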

Hallucination in Reality

An example (use your favorite AI): 

Prompt: "What happened around the date the Berlin Wall was built?"

Perplexity:

"The Berlin Wall was built around the date of August 13, 1961, as a result of a decree passed by the East German government on August 12. The primary intention for the Wall's construction was to prevent East German citizens from fleeing to the West ......" => Output

Gemini Advanced:

"The construction of the Berlin Wall was a pivotal moment in Cold War history. Here's a breakdown of the key events leading up to it and the immediate aftermath...." => Output 

ChatGPT 3.5:

"The Berlin Wall was erected on August 13, 1961. This event marked a significant moment in the history of the Cold War, a period of geopolitical tension between the Western Bloc....." => Output

Now, what happened around the date the Berlin Wall was built? Here's a great, neutral breakdown: https://mashable.com/feature/jumping-the-berlin-wall

As this example shows, AI tends to stitch together material that does not answer the question, wraps it in an educational frame that may be biased toward public opinion, and leaves out significant pieces of information.
In systems like self-driving cars or medical diagnostics, hallucinations could have devastating consequences.

The use of AI in military operations has already begun and is becoming more or less standard. Combined with hallucinations, it can have devastating outcomes if the problem is not tackled accordingly (https://www.defense.gov/News/News-Stories/Article/Article/3597093/us-endorses-responsible-ai-measures-for-global-militaries/).

The problems associated with AI-generated manipulation are present and increasing (https://reutersinstitute.politics.ox.ac.uk/news/how-ai-generated-disinformation-might-impact-years-elections-and-how-journalists-should-report). The spread of AI-generated misinformation distorts public opinion, makes people more afraid to trust the technology, and skews its adoption. This feeds the public assumption that "AI will kill humans" (Google search), which is mainly driven by science fiction like "Terminator".

Though complete elimination of AI hallucinations may be unrealistic, researchers are focusing on strategies to manage the problem. The creation of large, meticulously balanced datasets that accurately reflect diverse real-world scenarios is essential for improving AI generalization. But who balances the data? We do, in our daily routines and in our personal views of the world.

Regularization techniques can help prevent overfitting, encouraging models to learn broader patterns. Perhaps most crucially, teaching AI models to express their uncertainty can provide users with a valuable tool to gauge the reliability of outputs (Kendall and Gal, 2017). Integrating mechanisms like fact-checking and grounding AI in trusted, neutral knowledge bases adds further safeguards against hallucinations.
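Expressing uncertainty can be as simple as letting a system decline to answer. The sketch below is one possible approach, not the method of Kendall and Gal: it averages the answer distributions from several model runs (for example an ensemble), measures the entropy of the result, and abstains when the entropy is too high. All names, the sample probabilities, and the 0.5 threshold are assumptions for illustration.

```python
import math


def predictive_entropy(probabilities):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probabilities if p > 0)


def mean_distribution(samples):
    """Average several predicted distributions, e.g. from an ensemble."""
    n = len(samples)
    return [sum(column) / n for column in zip(*samples)]


def answer_or_abstain(labels, samples, max_entropy=0.5):
    """Return the top label, or None when the averaged prediction is too uncertain."""
    mean = mean_distribution(samples)
    if predictive_entropy(mean) > max_entropy:
        return None  # signal "I am not sure" instead of guessing
    best = max(range(len(mean)), key=mean.__getitem__)
    return labels[best]


# Hypothetical candidate answers and per-run probabilities.
labels = ["1961", "1989"]
agreeing_runs = [[0.97, 0.03], [0.95, 0.05]]
conflicting_runs = [[0.9, 0.1], [0.1, 0.9]]

confident_answer = answer_or_abstain(labels, agreeing_runs)
uncertain_answer = answer_or_abstain(labels, conflicting_runs)
```

In this toy setup the agreeing runs produce an answer, while the conflicting runs produce `None`, which a user interface could surface as "the model is unsure" rather than as a confident fabrication.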

References:
Guo, C., Pleiss, G., Sun, Y., & Weinberger, K. Q. (2017). On calibration of modern neural networks. In Proceedings of the 34th International Conference on Machine Learning (pp. 1321-1330).
Gebru, T., Morgenstern, J., Vecchione, B., Vaughan, J. W., Wallach, H., Daumé III, H., & Crawford, K. (2018). Datasheets for datasets. arXiv preprint arXiv:1803.09010.
Quiñonero-Candela, J., Sugiyama, M., Schwaighofer, A., & Lawrence, N. D. (Eds.). (2009). Dataset shift in machine learning. The MIT Press.
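Goodfellow, I. J., Shlens, J., & Szegedy, C. (2015). Explaining and harnessing adversarial examples. In International Conference on Learning Representations.
Kendall, A., & Gal, Y. (2017). What uncertainties do we need in Bayesian deep learning for computer vision? In Advances in Neural Information Processing Systems 30.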
