Tech Robin | Technology News Blog

Best Technology News Blog on AI tools, metaverse, automation, Data Science, Device Tech, Software, Robotics, Tech Reviews, Finance, Health, and Tech Trends

Microsoft – Microsoft’s Computer Vision Model Will Generate Alt Text For Reddit Images

March 7, 2023 by Miracle Olughu

Two years ago, Microsoft unveiled Florence, an AI system, calling it a “complete rethinking” of current computer vision models. Unlike most vision models at the time, Florence was both “unified” and “multimodal”: it could handle a variety of tasks rather than being restricted to particular applications, such as creating captions, and it could understand both language and images.

Florence is now available through an update to the Vision APIs in Azure Cognitive Services, part of Microsoft’s larger, ongoing effort to commercialize its AI research. The Florence-powered Microsoft Vision Services, with features like automatic captioning, background removal, video summarization, and image retrieval, go live today in preview for current Azure customers.

“Billions of image-text pairs were used to train Florence. Consequently, it is incredibly versatile,” according to John Montgomery, CVP of Azure AI, who spoke with TechCrunch via email. Ask Florence to locate a specific frame in a video, and it will be able to do so. You can also ask it to distinguish between a Cosmic Crisp and a Honeycrisp apple.

The AI research community, which includes tech behemoths like Microsoft, views multimodal models as the best route to developing more powerful AI systems. Multimodal models, which again comprehend multiple modalities like language and images or videos and audio, are naturally able to complete tasks such as captioning videos faster than unimodal models.

Why not combine several “unimodal” models to accomplish the same goal, such as a model that only comprehends images and another that comprehends only language? There are several reasons for this, the first of which is that multimodal models sometimes outperform their unimodal counterparts in the same task because of the contextual information provided by the additional modalities.

For instance, an AI assistant that comprehends images, pricing information, and purchasing history is likely to provide more relevant product recommendations than one that only comprehends pricing information.

The second reason is that multimodal models often result in faster processing times and (possibly) lower back-end costs due to their higher computational efficiency. That is undoubtedly a plus because Microsoft is a profit-driven company.

How about Florence then? Because it comprehends language, video, and image modalities as well as the connections between them, it is able to perform tasks like measuring the degree to which two modalities are similar or segmenting objects in a picture and pasting them onto a different background.
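
The article does not describe how Florence measures similarity between modalities. As a rough illustration only, a common approach in multimodal systems is to map an image and a piece of text into vectors in a shared space and compare them with cosine similarity. The sketch below assumes that approach; the embedding values and the `best_caption` helper are hypothetical and hand-written, not produced by Florence or any Azure API.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def best_caption(image_vec, caption_vecs):
    """Pick the caption whose embedding is most similar to the image embedding."""
    return max(
        caption_vecs,
        key=lambda text: cosine_similarity(image_vec, caption_vecs[text]),
    )


# Hypothetical toy embeddings (assumption: a real model would produce these).
image_embedding = [0.9, 0.1, 0.3]
caption_embeddings = {
    "a red apple on a table": [0.8, 0.2, 0.4],
    "a dog running on a beach": [0.1, 0.9, 0.2],
}

print(best_caption(image_embedding, caption_embeddings))
```

With these toy values, the apple caption scores far higher than the dog caption, so it is the one selected. A real system would use embeddings with hundreds or thousands of dimensions, but the comparison step would look much the same.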

In light of ongoing legal disputes that may determine whether AI systems trained on copyrighted data, including images, are infringing on intellectual property holders’ rights, I thought it was important to ask Montgomery which data Microsoft used to train Florence. He wouldn’t go into detail other than to say that Florence uses data sources that were “responsibly obtained,” including data from partners.

Additionally, Montgomery noted that Florence’s training data had been cleaned of potentially offensive material, the presence of which is a problem in many open-source training datasets.

According to Montgomery, when using large foundational models it is crucial to ensure the training dataset’s quality in order to lay the groundwork for customized models for each vision task. He added that the customized models for each vision task have been tested for fairness and against adversarial and challenging cases, and that they implement the same content moderation services Microsoft has been using for DALL-E and Azure OpenAI Service.

We’ll have to take the company’s word for it. It appears that some customers already do. According to Montgomery, Reddit will generate captions for images on its platform using the new Florence-powered APIs, creating “alt text” so users with vision impairments can follow along in threads more easily.

“Florence’s ability to generate up to 10,000 tags per image will give Reddit much more control over how many objects in a picture they can identify and help generate much better captions,” Montgomery said. “Reddit will also use captioning to help all users improve article ranking for searching for posts.”

Microsoft also uses Florence across a swath of its platforms, products and services.

On LinkedIn and Reddit, Florence-powered services will generate captions to edit and support alt-text image descriptions. In Microsoft Teams, Florence is driving video segmentation capabilities. PowerPoint, Outlook and Word leverage Florence’s image captioning abilities for automatic alt text generation. And Designer and OneDrive, courtesy of Florence, have gained better image tagging, image search and background generation.

Montgomery sees Florence being used by customers for much more down the line, like detecting defects in manufacturing and enabling self-checkout in retail stores. I’d note that none of those use cases requires a multimodal vision model. But Montgomery asserts that multimodality adds something valuable to the equation.

“Florence is a complete re-thinking of vision models,” Montgomery said. “Once there’s the easy and high-quality translation between images and text, a world of possibilities opens up. Customers will experience significantly improved image search, train image and vision models and other model types like language and speech into entirely new types of applications and easily improve the quality of their customized versions.”

Filed Under: Tech Reviews
