Archive for the ‘Ai’ Category

AI helps robots manipulate objects with their whole bodies – MIT News

Imagine you want to carry a large, heavy box up a flight of stairs. You might spread your fingers out and lift that box with both hands, then hold it on top of your forearms and balance it against your chest, using your whole body to manipulate the box.

Humans are generally good at whole-body manipulation, but robots struggle with such tasks. To the robot, each spot where the box could touch the carrier's fingers, arms, or torso represents a contact event that it must reason about. With billions of potential contact events, planning for this task quickly becomes intractable.

Now, MIT researchers have found a way to simplify this process, known as contact-rich manipulation planning. They use an AI technique called smoothing, which summarizes many contact events into a smaller number of decisions, to enable even a simple algorithm to quickly identify an effective manipulation plan for the robot.

While still in its early days, this method could potentially enable factories to use smaller, mobile robots that can manipulate objects with their entire arms or bodies, rather than large robotic arms that can only grasp using fingertips. This may help reduce energy consumption and drive down costs. In addition, this technique could be useful in robots sent on exploration missions to Mars or other solar system bodies, since they could adapt to the environment quickly using only an onboard computer.

"Rather than thinking about this as a black-box system, if we can leverage the structure of these kinds of robotic systems using models, there is an opportunity to accelerate the whole procedure of trying to make these decisions and come up with contact-rich plans," says H.J. Terry Suh, an electrical engineering and computer science (EECS) graduate student and co-lead author of a paper on this technique.

Joining Suh on the paper are co-lead author Tao Pang PhD '23, a roboticist at Boston Dynamics AI Institute; Lujie Yang, an EECS graduate student; and senior author Russ Tedrake, the Toyota Professor of EECS, Aeronautics and Astronautics, and Mechanical Engineering, and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL). The research appears this week in IEEE Transactions on Robotics.

Learning about learning

Reinforcement learning is a machine-learning technique where an agent, like a robot, learns to complete a task through trial and error with a reward for getting closer to a goal. Researchers say this type of learning takes a black-box approach because the system must learn everything about the world through trial and error.
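
To make the trial-and-error idea concrete, here is a minimal, purely illustrative sketch: a toy agent on a number line learns, through trial and error with a reward for getting closer to a goal, which direction to move. Nothing here comes from the MIT system; the world, reward, and learning rule are stand-in assumptions.

```python
# A minimal sketch of reinforcement learning as trial and error with a reward,
# using a toy 1-D world rather than any robot; states, actions, and rewards here
# are hypothetical and only illustrate the black-box learning loop.
import random

GOAL = 5                      # target position on a line of positions 0..9
actions = [-1, +1]            # move left or right
q = {(s, a): 0.0 for s in range(10) for a in actions}  # value of each (state, action)

for episode in range(2000):
    state = random.randint(0, 9)
    for _ in range(20):
        # Trial and error: mostly act greedily, sometimes explore at random.
        if random.random() < 0.1:
            action = random.choice(actions)
        else:
            action = max(actions, key=lambda a: q[(state, a)])
        next_state = min(9, max(0, state + action))
        # Reward for getting closer to the goal.
        reward = 1.0 if abs(next_state - GOAL) < abs(state - GOAL) else -0.1
        best_next = max(q[(next_state, a)] for a in actions)
        q[(state, action)] += 0.1 * (reward + 0.9 * best_next - q[(state, action)])
        state = next_state

print(max(actions, key=lambda a: q[(0, a)]))  # learned: from position 0, move right
```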

It has been used effectively for contact-rich manipulation planning, where the robot seeks to learn the best way to move an object in a specified manner.

But because there may be billions of potential contact points that a robot must reason about when determining how to use its fingers, hands, arms, and body to interact with an object, this trial-and-error approach requires a great deal of computation.

"Reinforcement learning may need to go through millions of years in simulation time to actually be able to learn a policy," Suh adds.

On the other hand, if researchers specifically design a physics-based model using their knowledge of the system and the task they want the robot to accomplish, that model incorporates structure about this world that makes it more efficient.

Yet physics-based approaches aren't as effective as reinforcement learning when it comes to contact-rich manipulation planning; Suh and Pang wondered why.

They conducted a detailed analysis and found that a technique known as smoothing is what enables reinforcement learning to perform so well.

Many of the decisions a robot could make when determining how to manipulate an object aren't important in the grand scheme of things. For instance, each infinitesimal adjustment of one finger, whether or not it results in contact with the object, doesn't matter very much. Smoothing averages away many of those unimportant, intermediate decisions, leaving a few important ones.

Reinforcement learning performs smoothing implicitly by trying many contact points and then computing a weighted average of the results. Drawing on this insight, the MIT researchers designed a simple model that performs a similar type of smoothing, enabling it to focus on core robot-object interactions and predict long-term behavior. They showed that this approach could be just as effective as reinforcement learning at generating complex plans.
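
As an illustration of that idea (and not the researchers' actual method), the sketch below smooths a deliberately crude, one-dimensional contact model by averaging its outcome over many randomly perturbed finger positions; the contact point, noise level, and dynamics are all hypothetical.

```python
# A minimal sketch of randomized smoothing: a hypothetical one-dimensional "push"
# whose outcome jumps discontinuously when the finger makes contact, and a smoothed
# version that averages the outcome over many noisy finger positions, turning the
# cliff into a slope a simple optimizer can follow.
import numpy as np

def push_distance(finger_pos, contact_point=0.3):
    # Discontinuous contact model: no motion until contact, then proportional push.
    return np.where(finger_pos > contact_point, finger_pos - contact_point, 0.0)

def smoothed_push_distance(finger_pos, noise_std=0.05, samples=1000):
    # Average the outcome over perturbed finger positions (the "weighted average"
    # of many contact events described above).
    rng = np.random.default_rng(0)
    noisy = finger_pos + rng.normal(0.0, noise_std, size=samples)
    return push_distance(noisy).mean()

# The raw model gives no signal below the contact point; the smoothed one does.
print(push_distance(np.array(0.25)))   # 0.0 -> no hint about which way to move
print(smoothed_push_distance(0.25))    # small positive value -> useful slope
```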

"If you know a bit more about your problem, you can design more efficient algorithms," Pang says.

A winning combination

Even though smoothing greatly simplifies the decisions, searching through the remaining decisions can still be a difficult problem. So, the researchers combined their model with an algorithm that can rapidly and efficiently search through all possible decisions the robot could make.

With this combination, the computation time was cut down to about a minute on a standard laptop.
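
A hedged sketch of how such a combination might look in miniature: candidate finger trajectories are sampled at random and scored against a smoothed contact objective, and the best-scoring one is kept. The constants and the random-shooting search are illustrative assumptions, not the paper's algorithm.

```python
# A sketch of the overall idea (smoothed model + fast search), not the paper's
# method: random-shooting search over candidate finger trajectories, scored with a
# smoothed contact model so candidate scores vary smoothly and compare meaningfully.
import numpy as np

rng = np.random.default_rng(0)
CONTACT, TARGET = 0.3, 0.5          # hypothetical contact point and desired push distance

def smoothed_cost(trajectory, noise_std=0.05, samples=200):
    # Final finger position, perturbed many times to smooth the contact discontinuity.
    final = trajectory[-1] + rng.normal(0.0, noise_std, size=samples)
    push = np.clip(final - CONTACT, 0.0, None)          # push distance after contact
    return float(np.mean((push - TARGET) ** 2))         # distance from the goal push

# Search: sample candidate trajectories, keep the one the smoothed model scores best.
candidates = [np.cumsum(rng.uniform(0.0, 0.2, size=5)) for _ in range(500)]
best = min(candidates, key=smoothed_cost)
print(best[-1])   # finger should end roughly near CONTACT + TARGET = 0.8
```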

They first tested their approach in simulations where robotic hands were given tasks like moving a pen to a desired configuration, opening a door, or picking up a plate. In each instance, their model-based approach achieved the same performance as reinforcement learning, but in a fraction of the time. They saw similar results when they tested their model in hardware on real robotic arms.

"The same ideas that enable whole-body manipulation also work for planning with dexterous, human-like hands. Previously, most researchers said that reinforcement learning was the only approach that scaled to dexterous hands, but Terry and Tao showed that by taking this key idea of (randomized) smoothing from reinforcement learning, they can make more traditional planning methods work extremely well, too," Tedrake says.

However, the model they developed relies on a simpler approximation of the real world, so it cannot handle very dynamic motions, such as objects falling. While effective for slower manipulation tasks, their approach cannot create a plan that would enable a robot to toss a can into a trash bin, for instance. In the future, the researchers plan to enhance their technique so it could tackle these highly dynamic motions.

"If you study your models carefully and really understand the problem you are trying to solve, there are definitely some gains you can achieve. There are benefits to doing things that are beyond the black box," Suh says.

This work is funded, in part, by Amazon, MIT Lincoln Laboratory, the National Science Foundation, and the Ocado Group.

How to minimize data risk for generative AI and LLMs in the enterprise – VentureBeat

Enterprises have quickly recognized the power of generative AI to uncover new ideas and increase both developer and non-developer productivity. But pushing sensitive and proprietary data into publicly hosted large language models (LLMs) creates significant risks in security, privacy and governance. Businesses need to address these risks before they can start to see any benefit from these powerful new technologies.

As IDC notes, enterprises have legitimate concerns that LLMs may learn from their prompts and disclose proprietary information to other businesses that enter similar prompts. Businesses also worry that any sensitive data they share could be stored online and exposed to hackers or accidentally made public.

That makes feeding data and prompts into publicly hosted LLMs a nonstarter for most enterprises, especially those operating in regulated spaces. So, how can companies extract value from LLMs while sufficiently mitigating the risks?

Instead of sending your data out to an LLM, bring the LLM to your data. This is the model most enterprises will use to balance the need for innovation with the importance of keeping customer PII and other sensitive data secure. Most large businesses already maintain a strong security and governance boundary around their data, and they should host and deploy LLMs within that protected environment. This allows data teams to further develop and customize the LLM and employees to interact with it, all within the organization's existing security perimeter.
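
As a minimal sketch of what "bringing the LLM to your data" can look like in practice, the snippet below loads an open-source model from storage inside the security perimeter and queries it locally with the Hugging Face transformers library; the model path and prompt are placeholders, not recommendations.

```python
# A minimal sketch of "bring the LLM to your data": load an open-source model from
# storage inside your own security perimeter and query it locally, so prompts never
# leave your environment. MODEL_PATH is a hypothetical location, not a real model.
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

MODEL_PATH = "/secure/models/my-local-llm"   # hypothetical path inside your perimeter

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH)

generate = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(generate("Summarize our Q2 churn drivers:", max_new_tokens=100)[0]["generated_text"])
```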

A strong AI strategy requires a strong data strategy to begin with. That means eliminating silos and establishing simple, consistent policies that allow teams to access the data they need within a strong security and governance posture. The end goal is to have actionable, trustworthy data that can be accessed easily to use with an LLM within a secure and governed environment.

LLMs trained on the entire web present more than just privacy challenges. They're prone to hallucinations and other inaccuracies and can reproduce biases and generate offensive responses that create further risk for businesses. Moreover, foundational LLMs have not been exposed to your organization's internal systems and data, meaning they can't answer questions specific to your business, your customers and possibly even your industry.

The answer is to extend and customize a model to make it smart about your own business. While hosted models like ChatGPT have gotten most of the attention, there is a long and growing list of LLMs that enterprises can download, customize, and use behind the firewall, including open-source models like StarCoder from Hugging Face and StableLM from Stability AI. Tuning a foundational model on the entire web requires vast amounts of data and computing power, but as IDC notes, once a generative model is trained, it can be fine-tuned for a particular content domain with much less data.

An LLM doesn't need to be vast to be useful. "Garbage in, garbage out" is true for any AI model, and enterprises should customize models using internal data that they know they can trust and that will provide the insights they need. Your employees probably don't need to ask your LLM how to make a quiche or for Father's Day gift ideas. But they may want to ask about sales in the Northwest region or the benefits a particular customer's contract includes. Those answers will come from tuning the LLM on your own data in a secure and governed environment.
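
One hedged sketch of that tuning step, again assuming the Hugging Face stack and a local checkpoint: a small set of internal documents is tokenized and used to fine-tune the base model entirely inside the company's environment. The paths, example texts, and hyperparameters are illustrative, not settings from the article.

```python
# A sketch of fine-tuning an open-source LLM on internal text, kept inside the
# company's environment; model path, documents, and hyperparameters are assumptions.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

MODEL_PATH = "/secure/models/base-llm"        # hypothetical local checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Internal, trusted documents: contracts, sales notes, support tickets, etc.
corpus = Dataset.from_dict({"text": ["Northwest region Q3 sales summary ...",
                                     "Customer contract: benefits include ..."]})
tokenized = corpus.map(lambda row: tokenizer(row["text"], truncation=True, max_length=512),
                       remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="/secure/models/tuned-llm",
                           num_train_epochs=1, per_device_train_batch_size=2),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False),
)
trainer.train()   # the tuned model stays behind the firewall
```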

In addition to higher-quality results, optimizing LLMs for your organization can help reduce resource needs. Smaller models targeting specific use cases in the enterprise tend to require less compute power and smaller memory sizes than models built for general-purpose use cases or a large variety of enterprise use cases across different verticals and industries. Making LLMs more targeted for use cases in your organization will help you run LLMs in a more cost-effective, efficient way.

Tuning a model on your internal systems and data requires access to all the information that may be useful for that purpose, and much of this will be stored in formats besides text. About 80% of the world's data is unstructured, including company data such as emails, images, contracts and training videos.

That requires technologies like natural language processing to extract information from unstructured sources and make it available to your data scientists so they can build and train multimodal AI models that can spot relationships between different types of data and surface these insights for your business.
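
A small sketch of that extraction step, assuming an off-the-shelf named-entity-recognition pipeline: structured facts are pulled out of an unstructured (and entirely made-up) email so they can feed downstream model training.

```python
# A sketch of the extraction step described above: pull structured entities out of
# unstructured text (here, a fabricated email). The default public NER model and the
# email are illustrative only; a real deployment would use models hosted in-house.
from transformers import pipeline

extract_entities = pipeline("ner", aggregation_strategy="simple")

email = ("Hi team, the renewal contract with Acme Corp was signed on March 3 "
         "by Jane Rivera for $120,000. Training video attached.")

for entity in extract_entities(email):
    print(entity["entity_group"], entity["word"], round(float(entity["score"]), 2))
```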

This is a fast-moving area, and businesses must use caution with whatever approach they take to generative AI. That means reading the fine print about the models and services they use and working with reputable vendors that offer explicit guarantees about the models they provide. But it's an area where companies cannot afford to stand still, and every business should be exploring how AI can disrupt its industry. There's a balance that must be struck between risk and reward, and by bringing generative AI models close to your data and working within your existing security perimeter, you're more likely to reap the opportunities that this new technology brings.

Torsten Grabs is senior director of product management at Snowflake.

AI and the law | wvnews.com – WV News

Qualcomm’s ‘Holy Grail’: Generative AI Is Coming to Phones Soon – CNET

Generative AI tools like ChatGPT and Midjourney have dazzled imaginations and disrupted industries, but their debut has mostly been limited to browser windows on desktop computers. Next year, you'll be able to make use of generative AI on the go once premium phones launch with Qualcomm's top-tier chips inside.

Phones have used AI for years to touch up photos and improve autocorrect, but generative AI tools could bring the next level of enhancements to the mobile experience. Qualcomm is building generative AI into its next generation of premium chips, which are set to debut at its annual Qualcomm Summit in Hawaii in late October.

Summit attendees will get to experience firsthand what generative AI will bring to phones, but Qualcomm senior vice president of product management Ziad Asghar described to CNET why users should get excited for on-device AI. For one, having access to a user's data -- driving patterns, restaurant searches, photos and more -- all in one place will make solutions generated by AI in your phone much more customized and helpful than general responses from cloud-based generative AI.

"I think that's going to be the holy grail," Asghar said. "That's the true promise that makes us really excited about where this technology can go."

There are other advantages to having generative AI on-device. Most importantly, queries and personal data searched are kept private and not relayed through a distant server. Using local AI is also faster than waiting for cloud computation, and it can work while traveling on airplanes or in other areas that lack cell service.

But an on-device solution also makes business and efficiency sense. As machine learning models have gotten more complex (from hundreds of thousands of parameters to billions, Asghar said), it's more expensive to run servers answering queries, as Qualcomm explained in a white paper published last month. Back in April, OpenAI was estimated to spend around $700,000 per day getting ChatGPT to answer prompts, and that cost prediction was based on the older GPT-3 model, not the newer GPT-4 that is more complex and likely to be costlier to maintain at scale. Instead of needing an entire server farm, Qualcomm's solution is to have a device's existing silicon brain do all the thinking needed -- at no extra cost.
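
Some back-of-the-envelope arithmetic makes the contrast concrete; the daily cost figure is the estimate cited above, while the query volume is an assumed number chosen purely for illustration.

```python
# Back-of-the-envelope arithmetic only: the $700,000/day figure is the estimate cited
# above for ChatGPT on GPT-3, while the daily query volume is an assumed number.
DAILY_SERVER_COST = 700_000           # USD per day (cited estimate)
ASSUMED_QUERIES_PER_DAY = 10_000_000  # hypothetical volume, for illustration

cost_per_query = DAILY_SERVER_COST / ASSUMED_QUERIES_PER_DAY
print(f"~${cost_per_query:.3f} per query in server cost")   # ~$0.070
# On-device inference shifts this marginal cost to silicon the user already bought.
```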

"Running AI on your phone is effectively free -- you paid for the computing power up front," Techsponential analyst Avi Greengart told CNET over email.

Greengart saw Qualcomm's on-device generative AI in action when the chipmaker had it on display at Mobile World Congress in February, using a Snapdragon 8 Gen 2-powered Android phone to run the image generating software Stable Diffusion. Despite being an early demo, he found it "tremendously exciting."

[Image: A Snapdragon 8 Gen 2 chipset.]

Qualcomm has ideas for what people could do with phone-based generative AI, improving everything from productivity tasks to watching entertainment to creating content.

As the Stable Diffusion demo showcased, on-device generative AI could allow people to tweak images on command, like asking it to change the background to put you in front of the Venice canals, Asghar said. Or they could have it generate a completely new image -- but that's just the beginning, as text and visual large learning models could work in succession to flow from an idea to a ready output.

Using multiple models, Asghar said, a user could have their speech translated by automatic speech recognition into text that is then fed into an image generator. Take that a step further and have your phone render a person's face, which uses generative AI to make realistic mouth movements and text-to-speech to speak back to you, and boom, you've got a generative AI-powered virtual assistant you can have full conversations with.
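
A hedged sketch of that kind of model hand-off, using publicly available models rather than anything running on a Snapdragon chip: speech is transcribed to text, and the text is fed straight into an image generator. The model names and the audio file are illustrative assumptions.

```python
# A sketch of the model hand-off Asghar describes: speech -> text -> image. Model
# names and the audio file are placeholders; this runs on a normal Python ML stack,
# not on a phone.
from transformers import pipeline
from diffusers import StableDiffusionPipeline

transcribe = pipeline("automatic-speech-recognition", model="openai/whisper-small")
generate_image = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

prompt = transcribe("request.wav")["text"]     # e.g. "put me in front of the Venice canals"
image = generate_image(prompt).images[0]       # transcription fed straight into the image model
image.save("generated.png")
```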

This specific example could be powered in part by third-party AI, like Facebook parent company Meta's recently launched large language model Llama 2 in partnership with Microsoft as well as Qualcomm.

"[Llama 2] will allow customers, partners and developers to build use cases, such as intelligent virtual assistants, productivity applications, content creation tools, entertainment and more," Qualcomm said in a press release at the time. "These new on-device AI experiences, powered by Snapdragon, can work in areas with no connectivity or even in airplane mode."

Qualcomm won't limit these features to phones. At its upcoming summit, the company plans to announce generative AI solutions for PC and auto too. That personal assistant could help you with your to-do lists, schedule meetings and shoot off emails. If you're stuck outside the office and need to give a presentation, Asghar said, the AI could generate a new background so it doesn't look like you're sitting in your car and bring up a slide deck (or even help present it).

"For those of us who grew up watching Knight Rider, well, KITT is now going to be real," Asghar said, referring to the TV show's iconic smart car.

Regardless of the platform, the core generative AI solution will exist on-device. It could help with office busywork, like automatically generating notes from a call and creating a five-slide deck summarizing its key points ("This is like Clippy, but on steroids, right?" Asghar said). Or it could fabricate digital worlds from scratch in AR and VR.

Beyond fantasy worlds, generative AI could help blind people navigate the real world. Asghar described a situation where image-to-3D-image-to-text-to-speech model handoffs could use the phone's camera to recognize when a user is at an intersection and inform them when to stop, as well as how many cars are coming from which directions.

On the education front -- perhaps using a webcam or a phone's camera -- generative AI could gauge how well students are absorbing a teaching lesson, perhaps by tracking their expressions and body language. And then the generative AI could tailor the material to each student's strengths and weaknesses, Asghar theorized.

These are all Qualcomm's predictions, but third parties will have to decide how best to harness the technology to improve their own products and services. For phones, generative AI could have a real impact once it's integrated with mobile apps for more customized gaming experiences, social media and content creation, Techsponential's Greengart said.

It's hard to tell what that means for users until app makers have generative AI tech on hand to tinker and integrate into their apps. It's easier to extrapolate what it could do based on how AI helps people right now. Roger Entner, analyst for Recon Analytics, predicts that generative AI will help fix flaws in suboptimal photos, generate filters for social media, and refine autocorrect -- problems that exist right now.

"Generative AI here creates a quality of use improvement that soon we will take for granted," Entner told CNET over email.

[Image: A Snapdragon 8 Gen 2 encased in a red puck in front of a rig used to test chips in production.]

Current generative AI solutions rely on big server farms to answer queries at scale, but Qualcomm is confident that its on-device silicon can handle single-user needs. In Asghar's labs, the company's chips handled AI models with 7 billion parameters (aspects that evaluate data and change the tone or accuracy of its output), which is far below the 175 billion parameters of OpenAI's GPT-3 model that powers ChatGPT, but should suit mobile searches.
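
Rough arithmetic shows why the parameter count matters so much for on-device use; the precisions below (16-bit and 4-bit weights) are common choices, used here only for illustration.

```python
# Rough arithmetic on why 7-billion-parameter models are phone-plausible while
# 175-billion-parameter models need server farms; precisions are illustrative.
def weight_size_gb(params, bytes_per_param):
    return params * bytes_per_param / 1e9

print(weight_size_gb(7e9, 2))     # ~14 GB at 16-bit: tight, but near phone scale
print(weight_size_gb(7e9, 0.5))   # ~3.5 GB at 4-bit: fits alongside a phone OS
print(weight_size_gb(175e9, 2))   # ~350 GB at 16-bit: clearly a data-center job
```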

"We will actually be able to show that running on the device at the [Hawaii] summit," Asghar said.

The demo device will likely pack Qualcomm's next top-tier chip, presumably the Snapdragon 8 Gen 3 that will end up in next year's premium Android phones. The demo device running Stable Diffusion at MWC 2023 used the Snapdragon 8 Gen 2 announced at last year's Snapdragon Summit in Hawaii.

In an era of phones barely lasting through the day before needing to recharge, there's also concern over whether summoning the generative AI genie throughout the day will drain your battery even faster. We'll have to wait for real-world tests to see how phones implement and optimize the technology, but Asghar pointed out that the MWC 2023 demo was running queries for attendees all day and didn't exhaust the battery or even warm to the touch. He believes Qualcomm's silicon is uniquely capable, with generative AI running mostly on a Snapdragon chipset's Hexagon processor and neural processing unit, with "very good power consumption."

"I think there is going to be concern for those who do not have dedicated pieces of hardware to do this processing," Asghar said.

Asghar believes that next year's premium Android phones powered with Qualcomm's silicon will be able to use generative AI. But it will take some time for that to trickle down to cheaper phones. Much as AI assistance for cleaning up images, audio and video on current phones is strongest at the top of the lineup and less effective on cheaper models, generative AI capabilities will be lesser (but still present) the further down you go in Qualcomm's chip catalog.

"Maybe you can do a 10-plus billion parameter model in the premium, and the tier below that might be lesser than that, if you're below that then it might be lesser than that," Asghar said. "So it will be a graceful degradation of those experiences, but they will extend into the other products as well."

As with 5G, Qualcomm may be first to a new technology with generative AI, but it won't be the last. Apple has quietly been improving its on-device AI, with senior vice president of software Craig Federighi noting in a post-Worldwide Developers Conference chat that they swapped in a more powerful transformer language model to improve autocorrect. Apple has even reportedly been testing its own "Apple GPT" chatbot internally. The tech giant is said to be developing its own framework to create large language models in order to compete in the AI space, which has heated up since OpenAI released ChatGPT to the public late in 2022.

Apple's AI could enter the race against Google's Bard AI and Microsoft's Bing AI, both of which have had limited releases this year for public testing. Those follow the more traditional "intelligent chatbot" model of generative AI enhancing software, but it's possible they'll arrive on phones through apps or be accessed through a web browser. Both Google and Microsoft are already integrating generative AI into their productivity platforms, so users will likely see their efforts first in mobile versions of Google Docs or Microsoft Office.

But for most phone owners, Qualcomm's chip-based generative AI could be the first impactful use of a new technology. We'll have to wait for the Snapdragon Summit to see how much our mobile experience may be changing as soon as next year.

The AI Tools Making Images Look Better – Quanta Magazine

It's one of the biggest clichés in crime and science fiction: An investigator pulls up a blurry photo on a computer screen and asks for it to be enhanced, and boom, the image comes into focus, revealing some essential clue. It's a wonderful storytelling convenience, but it's been a frustrating fiction for decades: blow up an image too much, and it becomes visibly pixelated. There isn't enough data to do more.

"If you just naively upscale an image, it's going to be blurry. There's going to be a lot of detail, but it's going to be wrong," said Bryan Catanzaro, vice president of applied deep learning research at Nvidia.

Recently, researchers and professionals have begun incorporating artificial intelligence algorithms into their image-enhancing tools, making the process easier and more powerful, but there are still limits to how much data can be retrieved from any image. Luckily, as researchers push enhancement algorithms ever further, they are finding new ways to cope with those limits, and even, at times, ways to overcome them.
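
The limit the researchers start from is easy to reproduce: naive upscaling only interpolates the pixels it already has, so the result gets bigger but not sharper. A minimal sketch with Pillow, using a placeholder file name:

```python
# A small illustration of the limit described above: naive upscaling invents no new
# detail, it only interpolates, so the enlarged image looks soft. The file name is
# a placeholder.
from PIL import Image

small = Image.open("blurry_photo.png")                       # e.g. a 64x64 source
naive = small.resize((small.width * 8, small.height * 8),    # 8x "enhance"
                     resample=Image.Resampling.BICUBIC)
naive.save("naively_upscaled.png")   # larger, but no sharper: the data was never there
```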

In the past decade, researchers started enhancing images with a new kind of AI model called a generative adversarial network, or GAN, which could produce detailed, impressive-looking pictures. "The images suddenly started looking a lot better," said Tomer Michaeli, an electrical engineer at the Technion in Israel. But he was surprised that images made by GANs showed high levels of distortion, which measures how close an enhanced image is to the underlying reality of what it shows. GANs produced images that looked pretty and natural, but they were actually making up, or "hallucinating," details that weren't accurate, which registered as high levels of distortion.

Michaeli watched the field of photo restoration split into two distinct sub-communities. One showed nice pictures, many made by GANs. The other showed data, but "they didn't show many images, because they didn't look nice," he said.

In 2017, Michaeli and his graduate student Yochai Blau looked into this dichotomy more formally. They plotted the performance of various image-enhancement algorithms on a graph of distortion versus perceptual quality, using a known measure for perceptual quality that correlates well with humans' subjective judgment. As Michaeli expected, some of the algorithms resulted in very high visual quality, while others were very accurate, with low distortion. But none had both advantages; you had to pick one or the other. The researchers dubbed this the perception-distortion trade-off.
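
For readers wondering how those two axes are measured in practice, a common (though not necessarily the paper's exact) pairing is mean squared error for distortion and a learned perceptual metric such as LPIPS for perceptual quality. A hedged sketch, with deliberately synthetic test images:

```python
# A sketch of how the two axes of such a plot are often measured: mean squared error
# for distortion, LPIPS for perceptual quality. These are common stand-ins, not the
# paper's exact metrics; the test images below are synthetic.
import numpy as np
import torch
import lpips   # pip install lpips

def distortion_mse(restored, ground_truth):
    # Lower is better: how far the enhanced image is from the underlying reality.
    return float(np.mean((restored.astype(np.float64) - ground_truth.astype(np.float64)) ** 2))

perceptual = lpips.LPIPS(net="alex")   # learned perceptual metric; lower = more natural

def perceptual_score(restored, ground_truth):
    # LPIPS expects float tensors in [-1, 1], shaped (N, 3, H, W).
    to_tensor = lambda img: torch.from_numpy(img).permute(2, 0, 1)[None].float() / 127.5 - 1.0
    return float(perceptual(to_tensor(restored), to_tensor(ground_truth)))

rng = np.random.default_rng(0)
ground_truth = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
restored = ground_truth.copy()
restored[::2] = 0   # a deliberately bad "restoration" for demonstration
print(distortion_mse(restored, ground_truth), perceptual_score(restored, ground_truth))
```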

Michaeli also challenged other researchers to come up with algorithms that could produce the best image quality for a given level of distortion, to allow fair comparisons between the pretty-picture algorithms and the nice-stats ones. Since then, hundreds of AI researchers have reported on the distortion and perception qualities of their algorithms, citing the Michaeli and Blau paper that described the trade-off.

Sometimes, the implications of the perception-distortion trade-off aren't dire. Nvidia, for instance, found that high-definition screens weren't nicely rendering some lower-definition visual content, so in February it released a tool that uses deep learning to upscale streaming video. In this case, Nvidia's engineers chose perceptual quality over accuracy, accepting the fact that when the algorithm upscales video, it will make up some visual details that aren't in the original video. "The model is hallucinating. It's all a guess," Catanzaro said. "Most of the time it's fine for a super-resolution model to guess wrong, as long as it's consistent."
