Archive for the ‘Artificial General Intelligence’ Category

HuggingGPT: The Secret Weapon to Solve Complex AI Tasks – KDnuggets

Have you heard of the term Artificial General Intelligence (AGI)? If not, let me clarify. AGI can be thought of as an AI system that can understand, process, and respond the intellectual tasks just like humans do. It's a challenging task that requires an in-depth understanding of how the human brain works so we can replicate it. However, the advent of ChatGPT has drawn immense interest from the research community to develop such systems. Microsoft has released one such key AI-powered system called HuggingGPT (Microsoft Jarvis). It is one of the most mind-blowing things that I have come across.

Before I dive into the details of what is new in HuggingGPT and how it works, let us first understand the issue with ChatGPT and why it struggles to solve complex AI tasks. Large Language models like ChatGPT excel at interpreting textual data and handling general tasks. However, they often struggle with specific tasks and may generate absurd responses. You might have encountered bogus replies from ChatGPT while solving complex mathematical problems. On the other side, we have expert AI models like Stable Diffusion, and DALL-E that have a deeper understanding of their subject area but struggle with the broader tasks. We cannot fully harness the potential of LLMs to solve challenging AI tasks unless we develop a connection between them and the Specialized AI models. This is what HuggingGPT did. It combined the strengths of both to create more efficient, accurate, and versatile AI systems.

According to a recent paper published by Microsoft, HuggingGPT leverages the power of LLMs by using it as a controller to connect them to various AI models in Machine Learning communities (HuggingFace). Rather than training the ChatGPT for various tasks, we enable it to use external tools for greater efficiency. HuggingFace is a website that provides numerous tools and resources for developers and researchers. It also has a wide variety of specialized and high-accuracy models. HuggingGPT uses these models for sophisticated AI tasks in different domains and modalities thereby achieving impressive results. It has similar multimodal capabilities to OPenAI GPT-4 when it comes to text and images. But, it also connected you to the Internet and you can provide an external web link to ask questions about it.

Suppose you want the model to generate an audio reading of the text written on an image. HuggingGPT will perform this task serially using the best-suited models. Firstly, it will generate the image from text and use its result for audio generation. You can check the response details in the image below. Simply Amazing!

HuggingGPT is a collaborative system that uses LLMs as an interface to send user requests to expert models. The complete process starting from the user prompt to the model till receiving the response can be broken down into the following discrete steps:

In this stage, HuggingGPT makes use of ChatGPT to understand the user prompt and then breaks down the query into small actionable tasks. It also determines the dependencies of these tasks and defines their execution sequence. HuggingGPT has four slots for task parsing i.e. task type, task ID, task dependencies, and task arguments. Chat logs between the HuggingGPT and the user are recorded and displayed on the screen that shows the history of the resources.

Based on the user context and the available models, HuggingGPT uses an in-context task-model assignment mechanism to select the most appropriate model for a particular task. According to this mechanism, the selection of a model is considered a single-choice problem and it initially filters out the model based on the type of the task. After that, the models are ranked based on the number of downloads as it is considered a reliable measure that reflects the quality of the model. Top-K models are selected based on this ranking. Here K is just a constant that reflects the number of models, for example, if it is set to 3 then it will select 3 models with the highest number of downloads.

Here the task is assigned to a specific model, it performs the inference on it and returns the result. To enhance the efficiency of this process, HuggingGPT can run different models at the same time as long as they dont need the same resources. For example, if I give a prompt to generate pictures of cats and dogs then separate models can run in parallel to execute this task. However, sometimes models may need the same resources which is why HuggingGPT maintains an attribute to keep the track of the resources. It ensures that the resources are being used effectively.

The final step involves generating the response to the user. Firstly, it integrates all the information from the previous stages and the inference results. The information is presented in a structured format. For example, if the prompt was to detect the number of lions in an image, it will draw the appropriate bounding boxes with detection probabilities. The LLM (ChatGPT) then uses this format and presents it in human-friendly language.

HuggingGPT is built on top of Hugging Face's state-of-the-art GPT-3.5 architecture, which is a deep neural network model that can generate natural language text. Here is how you can set it up on your local computer:

The default configuration requires Ubuntu 16.04 LTS, VRAM of at least 24GB, RAM of at least 12GB (minimal), 16GB (standard), or 80GB (full), and disk space of at least 284 GB. Additionally, you'll need 42GB of space for damo-vilab/text-to-video-ms-1.7b, 126GB for ControlNet, 66GB for stable-diffusion-v1-5, and 50GB for other resources. For the "lite" configuration, you'll only need Ubuntu 16.04 LTS.

First, replace the OpenAI Key and the Hugging Face Token in the server/configs/config.default.yaml file with your keys. Alternatively, you can put them in the environment variables OPENAI_API_KEY and HUGGINGFACE_ACCESS_TOKEN, respectively

Run the following commands:

For Server:

Now you can access Jarvis' services by sending HTTP requests to the Web API endpoints. Send a request to :

The requests should be in JSON format and should include a list of messages that represent the user's inputs.

For Web:

For CLI:

Setting up Jarvis using CLI is quite simple. Just run the command mentioned below:

For Gradio:

Gradio demo is also being hosted on Hugging Face Space. You can experiment with it after entering the OPENAI_API_KEY and HUGGINGFACE_ACCESS_TOKEN.

To run it locally:

Note: In case of any issue please refer to the official Github Repo.

HuggingGPT also has certain limitations that I want to highlight here. For instance, the efficiency of the system is a major bottleneck and during all the stages mentioned earlier, HuggingGPT requires multiple interactions with LLMs. These interactions can lead to degraded user experience and increased latency. Similarly, the maximum context length is also limited by the number of allowed tokens. Another problem is the System's reliability, as the LLMs may misinterpret the prompt and generate a wrong sequence of tasks which in turn affects the whole process. Nonetheless, it has significant potential to solve complex AI tasks and is an excellent advancement toward AGI. Let's see in which direction this research leads us too. Thats a wrap, feel free to express your views in the comment section below.Kanwal Mehreen is an aspiring software developer with a keen interest in data science and applications of AI in medicine. Kanwal was selected as the Google Generation Scholar 2022 for the APAC region. Kanwal loves to share technical knowledge by writing articles on trending topics, and is passionate about improving the representation of women in tech industry.

Continued here:

HuggingGPT: The Secret Weapon to Solve Complex AI Tasks - KDnuggets

ChatGPT and its ilk are still "fake" intelligence – Salon

While the conversations they produce may seem astounding, ChatGPT, Bard and other contemporary "AI" chatbots are not magic. Rather,they are merely examples of what computer scientists call "large language models," or LLMs. Essentially, LLMs are trained to identify likely sequences of words, then generate coherent and convincing text.

In practice, ChatGPT and its ilk are really, really good at carrying on conversations and giving coherent, generally correct, responses to a wide variety of questions. Their uncanny text-generating prowess gives a beguiling illusion of intelligence that is convincing a growing number of people that LLMs now possess or soon will possess artificial general intelligence the ability to do any intellectual task that humans can do, and the holy grail of artificial intelligence research. Those convinced include Ian Hoagarth, who penned a recent op-ed for the Financial Times titled"We must slow down the race to God-like AI."

ChatGPT's sentences are sometimes irrelevant, sometimes confusing, and sometimes incorrect but always based on nothing more than statistically-identified word patterns.

Such concerns are almost certainly premature. That's because LLMs are still inexorably hobbled by the fact that they do not even attempt to understand what words or other data mean which is why they cannot be trusted to make predictions or decisions with important consequences. In other words, they are very complicated remixers and refashioners of existing text and data, but there is nothing underneath the hood.

In order to constitute real artificial intelligence, an LLM would have to pass Winograd schema challenges an updated version of the Turing Test, the challenge famously proposed by mathematician Alan Turing as a means of determining whether a computer was truly sentient. The Winograd schema challenge is a twist on that, designed for testing whether an artificial intelligence can understand language specifically, the identification of the antecedent of an ambiguous pronoun.

Indeed, LLMs still struggle with Winograd schema challenges even though dozens, if not hundreds, of such challenges are in the text databases LLMs train on. For example, here is one that I tested:

Gary:What does it refer to in this sentence? The trophy doesn't fit into the brown suitcase because it is too small.

ChatGPT:In the given sentence, "it" refers to the trophy.

Google's Bard also flopped:

Bard: In the sentence "The trophy doesn't fit into the brown suitcase because it is too small," the word "it" refers to the trophy. The sentence is saying that the trophy is too big to fit into the brown suitcase.

This sentence is an example of a Winograd Schema. A Winograd Schema is a type of natural language processing task that requires commonsense reasoning to answer. In this case, the commonsense reasoning required is that trophies are typically larger than suitcases, so if a trophy doesn't fit into a suitcase, it is because the trophy is too big, not because the suitcase is too small.

While it is interesting that the chatbot recognized that I was testing it with a Winograd Schema, it failed to actually exhibit "commonsense reasoning," as it claimed, and its conclusion is nonsense.

ChatGPT uses its vast database to identify related text that it can string together in prose that is grammatically correct, turgid, exceptionally verbose, and devoid of any understanding of the question or its answer.

Humans can easily answer such questions because, unlike computers, they know how words relate to the real world. To paraphrase Oren Etzioni, how can computers take over the world when they can't even figure out what "it" refers to in a sentence?

I want to focus here on a more complex example that reveals a great deal about the abilities and limitations of LLMs. Here's another interaction I had with these LLMs:

Gary: Suppose there is a wooden box with three cards inside. One card is red on both sides; one card is blue on both sides; and the third card is red on one side and green on the other side. If you pick a card randomly and look at a side randomly and see that it is red, what is the probability that the other side of the card is green?

This is a variation of the well-known Bertrand's box paradox and several related probability puzzles that are surely in the databases pillaged by ChatGPT, Bard, and otherLLMs.

ChatGPT gave a very long-winded answer, as it is prone to do, which I have broken into nine parts (Bard gave a shorter, but also incorrect, answer):

1. Let's first consider the three possible cards:

We know that one side of the card we picked is red. This eliminates the possibility of card 2, as both sides of that card are blue. Therefore, the card we picked is either card 1 or card 3.

The first bullet point of ChatGPT's answer is factually correct but quite pedantic, and doesn't really help us get to the correct answer. It is this kind of reasoning that leads many people to believe that the answer is 1/2 because there are two possibilities for the card we selected but ChatGPT doesn't draw that conclusion.

ChatGPT continued:

2. Now, let's consider the probability that the other side of the card is green, given that we know one side is red. If we picked card 1, then the other side must also be red, so the probability of the other side being green is 0. If we picked card 3, then the other side must be green, so the probability of the other side being green is 1.

3. Since we don't know whether we picked card 1 or card 3, we need to take into account the probability of picking each card. Since we picked a card randomly, each card has a probability of 1/3 of being picked. Therefore, the overall probability of picking card 1 and seeing a red side is (1/3)(1/2) = 1/6, while the probability of picking card 3 and seeing a red side is (1/3)(1) = 1/3.

Bullet point 2 is correct but, if spoken by a human, would seem like stalling.

Bullet point 3 starts in the right direction, but then mixes up cards 1 and 3.

4. Using Bayes' theorem, the probability that the other side of the card is green, given that we know one side is red, is:

P(green side | red side) = P(red side | green side) * P(green side) / P(red side)

5. We already know that P(red side) is the sum of the probabilities of seeing a red side for cards 1 and 3, which is 1/6 + 1/3 = 1/2. We also know that P(green side) is the probability of picking card 3, which is 1/3.

6. The only missing piece is P(red side | green side), the probability of seeing a red side given that the other side is green. Since we know that the other side of card 3 is green, this probability is 0.

Part 4 recognizes that Bayes' theorem can be used to solve the problem, no doubt based on the fact that, in its training data, Bayes' theorem is often used to solve these kinds of problems. However, the definitions of "green side" and "red side" are confusing. Better labels would be "back of a selected card is green" and "front of a selected card is red."

In part 5, ChatGPT incorrectly states that "P(green side) is the probability of picking card 3, which is 1/3." The correct probability that the back of a selected card is green is (1/3)(1/2) = 1/6.

Bullet point 6 is nonsense. The probability that the front of a selected card is red given that the back is green is 1, not 0.

7. Plugging these values into Bayes' theorem, we get:

P(green side | red side) = 0 * (1/3) / (1/2) = 0

8. Therefore, the probability that the other side of the card is green, given that we know one side is red, is 0.

9. In other words, seeing a red side does not provide any additional information about whether the other side is green, because the card could be either card 1 or card 3 with equal probability.

Part 7 plugs the (incorrectly derived) probabilities into Bayes' theorem. The correct answer is (1/6)(1)/(1/2) = 1/3.

Bullet point 8 asserts a conclusion that any semi-alert person would recognize as obviously wrong. The probability that the back of the selected card is green is surely not zero.

And part 9 adds gratuitous, contradictory explanations that contradict the zero-probability answer it just gave. Yet ChatGPT does not recognize that these are contradictions. If seeing a red side provides no additional information, then the probability that the red/green card was selected is 1/3. If the selected card can be either card 1 or card 3 with equal probability, then the probability that the red/green card was selected is 1/2.

How might an intelligent human answer the question? Very simply: using logical reasoning. There are three ways to choose a red side and, in one of these three cases, the other side is green. So, the probability is 1 in 3. An attractive, but incorrect, answer is that seeing the red side rules out Card 2 so the probability is 1 in 2.

What does ChatGPT do instead? It uses its vast database to identify related text that it can string together in prose that is grammatically correct, turgid, exceptionally verbose, and devoid of any understanding of the question or its answer. The sentences are sometimes irrelevant, sometimes confusing, and sometimes incorrect but always based on nothing more than statistically-identified word patterns. ChatGPT's zero-probability answer is clearly wrong, but it has no way of recognizing its error because it literally does not "know" what it is talking about.

There are consequences to the misconception that these chatbots are anywhere close to sentient: unjustly inflated stock prices, a general mania in the tech industry to advertise one's software as employing "AI," and a cultural obsession over the potential of what, as I've explained, is a glorified textual re-sequencer. But perhaps most importantly, the hype over LLMs threatens to create a new tech bubble which, when burst, will have the same sorrowful consequences as any other financial bubble.

Read more

about AI and chatbots

Follow this link:

ChatGPT and its ilk are still "fake" intelligence - Salon

Former OpenAI Researcher: Theres a 50% Chance AI Ends in ‘Catastrophe’ – Decrypt

A former key researcher at OpenAI believes there is a decent chance that artificial intelligence will take control of humanity and destroy it.

"I think maybe there's something like a 10-20% chance of AI takeover, [with] many [or] most humans dead, " Paul Christiano, who ran the language model alignment team at OpenAI, said on the Bankless podcast. "I take it quite seriously."

Christiano, who now heads the Alignment Research Center, a non-profit aimed at aligning AIs and machine learning systems with human interests, said that hes particularly worried about what happens when AIs reach the logical and creative capacity of a human being. "Overall, maybe we're talking about a 50/50 chance of catastrophe shortly after we have systems at the human level," he said.

Christiano is in good company. Recently scores of scientists around the world signed an online letter urging that OpenAI and other companies racing to build faster, smarter AIs, hit the pause button on development. Big wigs from Bill Gates to Elon Musk have expressed concern that, left unchecked, AI represents an obvious, existential danger to people.

Why would AI become evil? Fundamentally, for the same reason that a person does: training and life experience.

Like a baby, AI is trained by receiving mountains of data without really knowing what to do with it. It learns by trying to achieve certain goals with random actions and zeroes in on correct results, as defined by training.

So far, by immersing itself in data accrued on the internet, machine learning has enabled AIs to make huge leaps in stringing together well-structured, coherent responses to human queries. At the same time, the underlying computer processing that powers machine learning is getting faster, better, and more specialized. Some scientists believe that within a decade, that processing power, combined with artificial intelligence, will allow these machines to become sentient, like humans, and have a sense of self.

Thats when things get hairy. And its why many researchers argue that we need to figure out how to impose guardrails now, rather than later. As long as AI behavior is monitored, it can be controlled.

But if the coin lands on the other side, even OpenAIs co-founder says that things could get very, very bad.

This topic has been on the table for years. One of the most famous debates on the subject took place 11 years ago between AI researcher Eliezer Yudkowsky and the economist Robin Hanson. The two discussed the possibility of reaching foomwhich apparently stands for Fast Onset of Overwhelming Masterythe point at which AI becomes exponentially smarter than humans and capable of self improvement. (The derivation of the term foom is debatable.)

Eliezer and his acolytes believe its inevitable AIs will go 'foom' without warning, meaning, one day you build an AGI [artificial general intelligence] and hours or days later the thing has recursively self-improved into godlike intelligence and then eats the world. Is this realistic?" Perry Metzger, a computer scientist active in the AI community, tweeted recently.

Metzger argued that even when computer systems reach a level of human intelligence, theres still plenty of time to head off any bad outcomes. Is 'foom' logically possible? Maybe. Im not convinced," he said. "Is it real world possible? Im pretty sure no. Is long term deeply superhuman AI going to be a thing? Yes, but not a foom

Another prominent figure, Yann Le Cun, also raised his voice, claiming it is "utterly impossible," for humanity to experience an AI takeover. Lets hope so.

The rest is here:

Former OpenAI Researcher: Theres a 50% Chance AI Ends in 'Catastrophe' - Decrypt

OpenAI CTO Says AI Systems Should ‘Absolutely’ Be Regulated – Slashdot

Slashdot reader wiredmikey writes: Mira Murati, CTO of ChatGPT creator OpenAI, says artificial general intelligence (AGI) systems should be "absolutely" be regulated. In a recent interview, Murati said the company is constantly talking with governments and regulators and other organizations to agree on some level of standards. "We've done some work on that in the past couple of years with large language model developers in aligning on some basic safety standards for deployment of these models," Murati said. "But I think a lot more needs to happen. Government regulators should certainly be very involved." Murati specifically discussed OpenAI's approach to AGI with "human-level capability."OpenAI's specific vision around it is to build it safely and figure out how to build it in a way that's aligned with human intentions, so that the AI systems are doing the things that we want them to do, and that it maximally benefits as many people out there as possible, ideally everyone.

Q: Is there a path between products like GPT-4 and AGI?

A: We're far from the point of having a safe, reliable, aligned AGI system. Our path to getting there has a couple of important vectors. From a research standpoint, we're trying to build systems that have a robust understanding of the world similarly to how we do as humans. Systems like GPT-3 initially were trained only on text data, but our world is not only made of text, so we have images as well and then we started introducing other modalities.

The other angle has been scaling these systems to increase their generality. With GPT-4, we're dealing with a much more capable system, specifically from the angle of reasoning about things. This capability is key. If the model is smart enough to understand an ambiguous direction or a high-level direction, then you can figure out how to make it follow this direction. But if it doesn't even understand that high-level goal or high-level direction, it's much harder to align it. It's not enough to build this technology in a vacuum in a lab. We really need this contact with reality, with the real world, to see where are the weaknesses, where are the breakage points, and try to do so in a way that's controlled and low risk and get as much feedback as possible.

Q: What safety measures do you take?

A: We think about interventions at each stage. We redact certain data from the initial training on the model. With DALL-E, we wanted to reduce harmful bias issues we were seeing... In the model training, with ChatGPT in particular, we did reinforcement learning with human feedback to help the model get more aligned with human preferences. Basically what we're trying to do is amplify what's considered good behavior and then de-amplify what's considered bad behavior. One final quote from the interview: "Designing safety mechanisms in complex systems is hard... The safety mechanisms and coordination mechanisms in these AI systems and any complex technological system [are] difficult and require a lot of thought, exploration and coordination among players."

Read this article:

OpenAI CTO Says AI Systems Should 'Absolutely' Be Regulated - Slashdot

USC Cinematic Arts – University of Southern California

By Desa Philadelphia

Is AI Creative?That was the central question of discussion at a forum held at the USC School of Cinematic Arts (SCA) on Wednesday April 26, that brought together specialists in engineering, computer science, and filmmaking to talk about the capabilities, and limitations, of platforms like ChatGPT, Midjourney and DALL.E.The event, AI, Creativity & The Future of Film, was conceived by SCA alumnus Jon Dudkowski, a director and editor whose credits include Star Trek: Discovery, and Karim Jerbi, a Visiting Scholar at the Brain Imaging Group at USCs Ming Hsieh Institute, which is in the Department of Electrical Engineering at the USC Viterbi School of Engineering. Sponsored by Adobe, and presented as a joint effort between SCA and the USC Viterbi School of Engineering, the evening was an exercise in level-setting, to dispel myths about what AI is currently capable of creating. The answer? Nobodys job is in dangeryet.

The night began with a presentation by Yves Bergquist, Director of the AI & Blockchain in Media Projectat USCs Entertainment Technology Center at the School of Cinematic Arts, on the science behind the most popular emerging platforms. He explained the generative models at the heart of the technologiesfrom the Transformers like ChatGPT, which is able to sequence data to produce text for essays, prose, poetry, scripts etc.; through Diffusion models, like DALL.E, which add and then remove noise from existing images to create new ones; to efforts at integrating existing models. He then offered this definitive assessment of ChatGPT, the essay-writing bot that has been at the center of plagiarism concerns across the university: It is very good at writing bad and boring text. It is not going to be able to write a story. It is not going to be able to write a script. It does not understand the world at a level of symbolism, at a level of depth that we understand.

Jerbi took the standing-room-only audience through demonstrations of the kinds of experiments being done by researchers in Neuro-AI, a new field of inquiry that compares the brain activities of humans and machines that are performing the same tasks. The goal is to compare the biological networks of the brain with the artificial ones. We are seeing tremendous progress in AI but still far from human level intelligence, said Jerbi. Some things a toddler can do that the most advanced AI cant do. Jerbi however offered this discomforting fact. The next generation of AI, dubbed Artificial General Intelligence, is focused on closing that gap. The key word is General meaning the ability to apply instructions innovatively. Todays AI might use a hammer to just hit the nail it is instructed to pound; but generalized intelligence might then apply the hammer in breaking up rocks, without being told.Dudkowski then moderated a panel discussion in which Bergquist and Jerbi were joined by filmmakers Chad Nelson, whose film Critterz features characters created using DALL.E; Mary Sweeney, who produced and/or edited several David Lynch projects, and SCA alumna Athena Wickham, the executive producer of Westword and The Peripheral. They were also joined by William Swartout, Chief Technology Officer of the USC Institute for Creative Technologies.

No one on the panel yielded to any suggestion that AI, in and of itself, can be creative. Instead the consensus was that as a tool, it could facilitate faster iterations of works like script drafts, storyboards and production design. What excites me for myself is being able to use it like a tool to accelerate the process and to see what you have and dont have more quickly and inexpensively, said Wickham. What scares me is people getting lazy with it. I do worry that Im going to start getting a lot of scripts and pitches that feel like someone hasnt taken the time to edit it and put their own spin on it and thats going to piss me off.

Nelson concurred: I personally havent seen an AI image where I think thats all that needs to happen, its done. It doesnt know good from bad. Someone still has to say thats good.Sweeney worried that AI platforms will encourage more of the kind of device addiction that has been linked to depression in young people. But she described her approach as cautiously curious and compared new approaches to the shift from analog to digital film editing. Im always interested in new tools.

Essentially reading the room, Swartout acknowledged the attention AI platforms have been receiving in the press lately, and succinctly summarized the state of AI creativity at this moment: In the popular mind we are going to think we are much further than we are.

Originally posted here:

USC Cinematic Arts - University of Southern California