Tech Robin | Technology News Blog

Microsoft – Microsoft’s Computer Vision Model Will Generate Alt Text For Reddit Images

March 7, 2023 by Miracle Olughu

Two years ago, Microsoft unveiled Florence, an AI system it called a “complete rethinking” of current computer vision models. Unlike most vision models at the time, Florence was both “unified” and “multimodal”: it could understand both language and images, and it could handle a variety of tasks rather than being restricted to a particular application, such as creating captions.

Florence is now available as part of an update to the Vision APIs in Azure Cognitive Services, the latest step in Microsoft’s larger, ongoing effort to commercialize its AI research. The Florence-powered Microsoft Vision Services go live today in preview for current Azure customers, with features including automatic captioning, background removal, video summarization, and image retrieval.

“Billions of image-text pairs were used to train Florence. Consequently, it is incredibly versatile,” John Montgomery, CVP of Azure AI, told TechCrunch via email. Ask Florence to locate a specific frame in a video, and it will be able to do so. Ask it to distinguish between a Cosmic Crisp and a Honeycrisp apple, and it can do that too.

The AI research community, which includes tech behemoths like Microsoft, views multimodal models as the best route to developing more powerful AI systems. Because they comprehend multiple modalities, such as language and images or video and audio, multimodal models are naturally able to complete some tasks (captioning videos, for example) faster than unimodal models.

Why not combine several “unimodal” models to accomplish the same goal, such as a model that only comprehends images and another that comprehends only language? There are several reasons for this, the first of which is that multimodal models sometimes outperform their unimodal counterparts in the same task because of the contextual information provided by the additional modalities.

For instance, an AI assistant that comprehends images, pricing information, and purchasing history is more likely to provide more relevant product recommendations than one that only comprehends pricing information.

The second reason is that multimodal models often result in faster processing times and (possibly) lower back-end costs due to their higher computational efficiency. That is undoubtedly a plus because Microsoft is a profit-driven company.

How about Florence then? Because it comprehends language, video, and image modalities as well as the connections between them, it is able to perform tasks like measuring the degree to which two modalities are similar or segmenting objects in a picture and pasting them onto a different background.
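
To make the idea of scoring how similar an image and a piece of text are more concrete, here is a minimal sketch. It is not Florence's method or API. It assumes a multimodal model has already mapped an image and a few candidate captions into the same vector space; the embedding values below are invented for illustration and the function names are hypothetical. Cosine similarity is one common way to compare such vectors.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def best_caption(image_embedding, caption_embeddings):
    """Pick the caption whose embedding is closest to the image embedding."""
    return max(
        caption_embeddings,
        key=lambda caption: cosine_similarity(
            image_embedding, caption_embeddings[caption]
        ),
    )


# Hypothetical embeddings: a real model would produce these,
# with far more than three dimensions.
image_embedding = [0.9, 0.1, 0.3]
caption_embeddings = {
    "a red apple on a table": [0.8, 0.2, 0.4],
    "a dog running on a beach": [0.1, 0.9, 0.2],
}

print(best_caption(image_embedding, caption_embeddings))
```

With these made-up numbers the apple caption scores far higher than the dog caption, so it is the one selected. The same comparison, run in the other direction, is the basic idea behind retrieving images from a text query.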

In light of ongoing legal disputes that may determine whether AI systems trained on copyrighted data, including images, are infringing on intellectual property holders’ rights, I thought it was important to ask Montgomery which data Microsoft used to train Florence. He wouldn’t go into detail other than to say that Florence uses data sources that were “responsibly obtained,” including data from partners.

Montgomery also noted that Florence’s training data had been cleaned of potentially offensive material, the presence of which is a problem in many open-source training datasets.

According to Montgomery, when using large foundational models it is crucial to ensure the quality of the training dataset, because it lays the groundwork for the customized models built for each vision task. He added that the adapted models for each vision task have been tested for fairness and against adversarial and challenging cases, and that they implement the same content moderation services Microsoft has been using for DALL-E and the Azure OpenAI Service.

We’ll have to take the company’s word for it. Some customers, it appears, already do. According to Montgomery, Reddit will use the new Florence-powered APIs to generate captions for images on its platform, creating “alt text” so users with vision impairments can follow along in threads more easily.

“Florence’s ability to generate up to 10,000 tags per image will give Reddit much more control over how many objects in a picture they can identify and help generate much better captions,” Montgomery said. “Reddit will also use captioning to help all users improve article ranking for searching for posts.”
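
As a rough illustration of how generated tags and a caption could end up as alt text on a web page, the sketch below keeps only the tags above a confidence threshold and writes a caption into an HTML `alt` attribute. It is not Reddit's or Microsoft's implementation: the tags, confidence scores, threshold, caption, and function names are all hypothetical. The only library call used is `html.escape` from Python's standard library.

```python
from html import escape


def confident_tags(tags, threshold=0.8):
    """Return tag names whose confidence meets the threshold, highest first."""
    kept = [(name, score) for name, score in tags if score >= threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in kept]


def img_with_alt(src, caption):
    """Build an <img> element whose alt attribute carries the caption."""
    return '<img src="{}" alt="{}">'.format(
        escape(src, quote=True), escape(caption, quote=True)
    )


# Hypothetical model output: (tag, confidence) pairs and a caption.
tags = [("table", 0.88), ("apple", 0.97), ("knife", 0.41)]
caption = 'A red "Honeycrisp" apple on a table'

print(confident_tags(tags))
print(img_with_alt("apple.jpg", caption))
```

Escaping the caption matters because generated text can contain quotation marks or angle brackets that would otherwise break the attribute.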

Microsoft also uses Florence across a swath of its platforms, products and services.

On LinkedIn and Reddit, Florence-powered services will generate captions to edit and support alt-text image descriptions. In Microsoft Teams, Florence is driving video segmentation capabilities. PowerPoint, Outlook and Word leverage Florence’s image captioning abilities for automatic alt text generation. And Designer and OneDrive, courtesy of Florence, have gained better image tagging, image search and background generation.

Montgomery sees Florence being used by customers for much more down the line, like detecting defects in manufacturing and enabling self-checkout in retail stores. I’d note that none of those use cases requires a multimodal vision model. But Montgomery asserts that multimodality adds something valuable to the equation.

“Florence is a complete re-thinking of vision models,” Montgomery said. “Once there’s the easy and high-quality translation between images and text, a world of possibilities opens up. Customers will experience significantly improved image search, train image and vision models and other model types like language and speech into entirely new types of applications and easily improve the quality of their customized versions.”
