Countering the damage of fake reviews using publicly available data

By Gediminas Rickevicius, VP for Global Partnerships at Oxylabs

During the pandemic, the world turned to online shopping. And ever since, ecommerce businesses have been thriving. This year alone, retail ecommerce sales around the world are estimated to exceed 6.3 trillion U.S. dollars. The competition is growing, and businesses are looking for ways to win their market share. 

Traditionally, more competition results in consumers getting the best deals. However, this may not always be the case, as some businesses are employing malicious practices that harm competitor brands and trick customers into making purchase decisions based on fake reviews.

A recent survey by BrightLocal shows that 97% of consumers read online reviews when browsing for a local business. 69% report that reading a positive review is the key factor that makes them feel good about using a business. And 50% of the respondents trust consumer reviews as much as personal recommendations from family and friends.

Unfortunately, the number of fake reviews is appalling. According to the U.S. Public Interest Research Group (PIRG), as many as 30-40% of online reviews are fabricated. In 2021, Trustpilot removed 2.7M fake reviews, while in 2022, TripAdvisor identified 1.3M reviews as fake.

The acceleration of AI has also contributed to the spread of fake reviews. Researchers who analyzed Amazon data found that this marketplace alone has seen a 400% increase in AI-generated reviews since the launch of ChatGPT. The numbers are staggering and call for novel solutions that can help fight fake reviews. Public web intelligence is a powerful weapon in this fight, and companies increasingly use it to protect their brands.

Protecting brands with publicly available data

Fake reviews can pop up anywhere on the internet. Companies often don’t even know that their brands are being mentioned in public forums, on social media, in the review sections of competing e-shops, or in other places with heavy user traffic. If a company doesn’t have a review monitoring system or a similar brand protection solution in place, it risks losing revenue and suffering reputational damage.

If you think customers can tell a fake review from a genuine one, data shows otherwise. BrightLocal research found that 58% of consumers preferred an AI-generated review over a human-written one. Businesses can feel the effect of fake reviews immediately, in their sales or in changes in consumer behaviour. Therefore, it’s essential to track brand mentions in real time, as any delay can result in serious damage.

Brand mentions can be tracked and analyzed using AI-powered web intelligence collection tools. These solutions scan public data in real-time or with minimal time delay and send alerts whenever they find a related brand mention. In-house teams, outsourced specialists, or even automated AI tools can check if the mention is genuine or if it’s a fake review, and react accordingly. 
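To make the monitoring step concrete, here is a minimal sketch of the matching part only: it takes text that has already been collected from public pages and raises an alert for every item that names the brand. The brand name "Acme Coffee", the example sources, and the find_brand_mentions helper are all hypothetical, and a real system would add data collection, scheduling, and a review step on top of this.

```python
import re
from dataclasses import dataclass


@dataclass
class Mention:
    source: str  # where the text was found (hypothetical URLs below)
    text: str    # the text that names the brand


def find_brand_mentions(documents, brand_names):
    """Return a Mention for every (source, text) pair that names a brand."""
    # Case-insensitive, whole-word match on any of the brand names.
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(name) for name in brand_names) + r")\b",
        re.IGNORECASE,
    )
    return [
        Mention(source, text)
        for source, text in documents
        if pattern.search(text)
    ]


# Illustrative data only; in practice this would come from a collection tool.
documents = [
    ("forum.example/thread/1", "Has anyone tried Acme Coffee? Mine arrived stale."),
    ("shop.example/reviews", "Great grinder, fast delivery."),
    ("social.example/post/9", "acme coffee is the worst, avoid!"),
]

mentions = find_brand_mentions(documents, ["Acme Coffee"])
for mention in mentions:
    print(f"ALERT {mention.source}: {mention.text}")
```

Each alert would then go to a person or an automated check that decides whether the mention is genuine.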

Usually, fake reviews have distinct characteristics, such as the same IP address or subnet for multiple reviews, the same reviewer profile, or repetitive wording and mistakes. Companies can also gauge the context in which a brand is mentioned by using sentiment analysis, a Natural Language Processing (NLP) technique that classifies text as positive, negative, or neutral. This helps quickly identify whether a brand mention could cause harm.
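The signals described above can be sketched in a few lines of Python using only the standard library. Everything here is an assumption made for illustration: the review records, the /24 subnet size, the 0.9 similarity threshold, and the tiny word lists. In particular, the word-list sentiment function is a toy stand-in; production sentiment analysis typically relies on trained NLP models.

```python
import ipaddress
import re
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical review records: (review_id, reviewer_ip, text)
reviews = [
    ("r1", "203.0.113.10", "Terrible product, broke after one day. Do not buy!"),
    ("r2", "203.0.113.57", "Terrible product, broke after two days. Do not buy!"),
    ("r3", "198.51.100.4", "Works well and the delivery was quick."),
]


def group_by_subnet(reviews, prefix=24):
    """Return subnets that more than one review was posted from."""
    groups = defaultdict(list)
    for review_id, ip, _ in reviews:
        network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
        groups[str(network)].append(review_id)
    return {net: ids for net, ids in groups.items() if len(ids) > 1}


def near_duplicates(reviews, threshold=0.9):
    """Return pairs of review ids whose wording is almost identical."""
    pairs = []
    for (id_a, _, text_a), (id_b, _, text_b) in combinations(reviews, 2):
        ratio = SequenceMatcher(None, text_a.lower(), text_b.lower()).ratio()
        if ratio >= threshold:
            pairs.append((id_a, id_b))
    return pairs


# Toy lexicons, assumed for this sketch only.
NEGATIVE = {"terrible", "broke", "worst", "avoid"}
POSITIVE = {"great", "well", "quick", "love"}


def sentiment(text):
    """Classify text as positive, negative, or neutral by counting words."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(group_by_subnet(reviews))
print(near_duplicates(reviews))
print([(review_id, sentiment(text)) for review_id, _, text in reviews])
```

None of these signals proves a review is fake on its own. Two reviews that share a subnet, repeat each other's wording, and are both negative are simply a stronger candidate for a closer look.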

It’s a common myth that web intelligence only benefits large corporations with dedicated teams. The truth is that even small, local businesses can protect their brands using public data. Web intelligence gathering tools can collect public data from specific locations around the world and return it in a preferred format. Companies can choose between an out-of-the-box solution with an API integration or enhance their in-house solutions with proxies.

The best proxies for brand protection are ethically sourced residential IP addresses. These IPs are more resilient than datacenter proxies and have a higher success rate in returning data without blocks. Moreover, they can target very specific locations, which is an important factor for local businesses.

Fueling AI solutions with web intelligence

While AI is a common tool for generating fake reviews, it’s also used to fight them. In 2023 alone, Google blocked or removed over 170 million fake reviews, a 45% increase compared to 2022. This was thanks to a new machine learning algorithm that swiftly identifies suspicious reviews by analysing patterns.

Today, AI-powered solutions can also identify AI-generated content. But how accurate are AI text detectors?

Some AI text detectors still struggle to distinguish between AI-generated and human-written content. For example, OpenAI’s experimental tool AI Classifier correctly identified only 26% of AI-written text and was discontinued. The tool also incorrectly labelled human-written text as AI-written around 9% of the time.
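A short calculation shows why those two percentages matter together. The sketch below treats 26% as the share of AI-written text the tool catches and 9% as the share of human text it wrongly flags, then asks how many flagged reviews would really be AI-written. The sample size of 1,000 reviews and the 30% share of AI-written reviews are assumptions chosen purely for illustration.

```python
def flagged_precision(total, ai_share, detection_rate, false_positive_rate):
    """Share of flagged texts that really are AI-written."""
    ai_texts = total * ai_share
    human_texts = total - ai_texts
    true_positives = ai_texts * detection_rate            # AI text correctly flagged
    false_positives = human_texts * false_positive_rate   # human text wrongly flagged
    return true_positives / (true_positives + false_positives)


precision = flagged_precision(
    total=1000,                # assumed sample size
    ai_share=0.30,             # assumed share of AI-written reviews
    detection_rate=0.26,       # figure quoted in the article
    false_positive_rate=0.09,  # figure quoted in the article
)
print(f"{precision:.0%} of flagged reviews are actually AI-written")
```

Under these assumptions, only a little over half of the flagged reviews would be AI-written, while roughly three quarters of the AI-written reviews would go unnoticed.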

On the other hand, the University of Kansas developed an AI detector with a 99% accuracy rate on academic papers. Researchers selected 64 perspectives (similar to review articles) and used them to generate 128 articles with ChatGPT, which were then used to train the AI detector.

Training Large Language Models (LLMs) requires vast amounts of public data from various sources. Focusing on niche subjects, such as academic writing or fake and genuine reviews, can be a good starting point for training a model, which can then become the basis for developing universal AI text detection models with a high success rate.

Growing competition in the ecommerce industry and the rise of widely available generative AI tools have significantly increased the number of fake online reviews. Companies are at risk of costly reputational and financial damage if their brand becomes a target of fabricated negative reviews.

Companies that successfully counter the damage of fake reviews use publicly available web data. Monitoring brand mentions by collecting and analysing web intelligence is an effective method to find fake reviews and remove them before they cause any damage. 

Moreover, public review data is an essential source for training AI text detectors and enabling them to distinguish AI-generated reviews from human-written ones. The more data we feed into the AI tools, the better they’ll become at filtering content and identifying who, or what, wrote it.