You Are Doing Research Wrong

by Assaf Elovic

Discover how GPT Researcher is transforming research with multiple AI agents that collaboratively handle complex questions, aiming for a level of depth and objectivity beyond traditional search engines. The article also introduces Tavily, a search platform designed to offer transparent, bias-free insights by leveraging this multi-agent approach.

Over the past few years, we’ve witnessed an explosion of new AI tools designed to disrupt research. Some, like ChatPDF and Consensus, focus on extracting insights from documents. Others, such as Perplexity, excel at scouring the web for information. But here’s the thing: none of these tools combine both web and local document search within a single contextual research pipeline.

This is why I’m excited to introduce the latest advancements of GPT Researcher — now able to conduct hybrid research on any given task and document.

Web-driven research often lacks specific context, risks information overload, and may include outdated or unreliable data. On the flip side, local-driven research is limited to historical data and existing knowledge, potentially creating organizational echo chambers and missing out on crucial market trends or competitor moves. Both approaches, when used in isolation, can lead to incomplete or biased insights, hampering your ability to make fully informed decisions.

Today, we’re going to change the game. By the end of this guide, you’ll learn how to conduct hybrid research that combines the best of both worlds — web and local — enabling you to conduct more thorough, relevant, and insightful research.

Why Hybrid Research Works Better

By combining web and local sources, hybrid research addresses these limitations and offers several key advantages:

  • Grounded context: Local documents provide a foundation of verified, organization-specific information. This grounds the research in established knowledge, reducing the risk of straying from core concepts or misinterpreting industry-specific terminology. Example: A pharmaceutical company researching a new drug development opportunity can use its internal research papers and clinical trial data as a base, then supplement this with the latest published studies and regulatory updates from the web.
  • Enhanced accuracy: Web sources offer up-to-date information, while local documents provide historical context. This combination allows for more accurate trend analysis and decision-making. Example: A financial services firm analyzing market trends can combine its historical trading data with real-time market news and social media sentiment analysis to make more informed investment decisions.
  • Reduced bias: By drawing from both web and local sources, we mitigate the risk of bias that might be present in either source alone. Example: A tech company evaluating its product roadmap can balance internal feature requests and usage data with external customer reviews and competitor analysis, ensuring a well-rounded perspective.
  • Improved planning and reasoning: LLMs can leverage the context from local documents to better plan their web research strategies and reason about the information they find online. Example: An AI-powered market research tool can use a company’s past campaign data to guide its web search for current marketing trends, resulting in more relevant and actionable insights.
  • Customized insights: Hybrid research allows for the integration of proprietary information with public data, leading to unique, organization-specific insights. Example: A retail chain can combine its sales data with web-scraped competitor pricing and economic indicators to optimize its pricing strategy in different regions.

These are just a few examples of business use cases that can leverage hybrid research. But enough with the small talk: let’s build!

Building the Hybrid Research Assistant

Before we dive into the details, it’s worth noting that GPT Researcher has the capability to conduct hybrid research out of the box! However, to truly appreciate how this works and to give you a deeper understanding of the process, we’re going to take a look under the hood.

[Figure: GPT Researcher hybrid research architecture]

GPT Researcher conducts web research based on an auto-generated plan from local documents, as seen in the architecture above. It then retrieves relevant information from both local and web data for the final research report.

We’ll explore how local documents are processed using LangChain, which is a key component of GPT Researcher’s document handling. Then, we’ll show you how to leverage GPT Researcher to conduct hybrid research, combining the advantages of web search with your local document knowledge base.

Processing Local Documents with LangChain

LangChain provides a variety of document loaders that allow us to process different file types. This flexibility is crucial when dealing with diverse local documents. Here’s how to set it up:

Create a function to load documents based on their file type:
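A minimal sketch of such a function, assuming the langchain-community package is installed along with the parsers its loaders rely on (for example PyMuPDF for PDFs and unstructured for Word, CSV, and Markdown files). The LOADERS_BY_EXTENSION mapping and the helper names are illustrative choices, not part of GPT Researcher or LangChain. The loader classes are imported lazily so the extension-matching logic can be checked on its own.

```python
import importlib
from pathlib import Path

# Illustrative mapping from file extension to a LangChain loader class name.
# These classes are exposed by langchain_community.document_loaders.
LOADERS_BY_EXTENSION = {
    ".pdf": "PyMuPDFLoader",
    ".txt": "TextLoader",
    ".md": "UnstructuredMarkdownLoader",
    ".csv": "UnstructuredCSVLoader",
    ".docx": "UnstructuredWordDocumentLoader",
}


def loader_name_for(file_path):
    """Return the loader class name for a path, or None if unsupported."""
    return LOADERS_BY_EXTENSION.get(Path(file_path).suffix.lower())


def load_local_documents(file_paths):
    """Load every supported file into one flat list of LangChain Documents."""
    # Imported here so the mapping above works without LangChain installed.
    loaders = importlib.import_module("langchain_community.document_loaders")
    documents = []
    for file_path in file_paths:
        name = loader_name_for(file_path)
        if name is None:
            print(f"Skipping unsupported file type: {file_path}")
            continue
        loader_cls = getattr(loaders, name)
        documents.extend(loader_cls(file_path).load())
    return documents
```

Unsupported files are skipped with a message rather than raising, so one stray file does not abort a whole batch; whether that is the right behavior depends on your use case.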

Use the function to load your local documents:
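In practice, this step mostly comes down to gathering file paths and handing them to the loader function. Here is a small sketch of that, where collect_document_paths, the my-docs default folder, and the extension list are all illustrative assumptions:

```python
from pathlib import Path

# Extensions we assume the loader function knows how to handle.
SUPPORTED_EXTENSIONS = {".pdf", ".txt", ".md", ".csv", ".docx"}


def collect_document_paths(doc_dir="my-docs"):
    """Return a sorted list of supported files found under doc_dir."""
    root = Path(doc_dir)
    if not root.is_dir():
        return []
    return sorted(
        str(path)
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS
    )


# Usage, assuming a loader function such as load_local_documents is defined:
# local_docs = load_local_documents(collect_document_paths("my-docs"))
```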

Split the documents into smaller chunks for more efficient processing:
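One way to do this is with LangChain's RecursiveCharacterTextSplitter, sketched below under the assumption that the langchain-text-splitters package is installed. The chunk size of 1000 characters and overlap of 200 are illustrative starting values, not settings confirmed for GPT Researcher, and split_documents is a hypothetical wrapper name.

```python
CHUNK_SIZE = 1000     # illustrative: maximum characters per chunk
CHUNK_OVERLAP = 200   # illustrative: characters shared between neighboring chunks


def split_documents(documents, chunk_size=CHUNK_SIZE, chunk_overlap=CHUNK_OVERLAP):
    """Split LangChain Documents into overlapping chunks."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    # Imported lazily so the argument check works without LangChain installed.
    from langchain_text_splitters import RecursiveCharacterTextSplitter

    splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=chunk_overlap,
    )
    return splitter.split_documents(documents)
```

The overlap keeps sentences that straddle a chunk boundary retrievable from either side, at the cost of storing some text twice.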

Create embeddings and store them in a vector database for quick retrieval:
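A sketch of this step, assuming OpenAI embeddings (the langchain-openai package plus an OPENAI_API_KEY) and an in-memory Chroma store (the langchain-chroma package). Both are choices made for illustration, since any embedding model and vector store supported by LangChain would follow the same pattern. The function name and the default embedding model are likewise assumptions.

```python
def build_vector_store(chunks, embedding_model="text-embedding-3-small"):
    """Embed document chunks and index them in an in-memory Chroma store."""
    if not chunks:
        raise ValueError("No chunks to index; load and split documents first.")
    # Imported lazily: both packages are optional dependencies of this sketch.
    from langchain_chroma import Chroma
    from langchain_openai import OpenAIEmbeddings

    embeddings = OpenAIEmbeddings(model=embedding_model)
    return Chroma.from_documents(documents=chunks, embedding=embeddings)
```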

Now that we have our local documents processed and stored in a vector database, we can easily search and retrieve relevant information. Here’s an example of how to perform a similarity search:
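A minimal sketch of such a search, assuming a LangChain vector store object. The similarity_search(query, k=...) call is part of LangChain's vector store interface, while search_local_documents and the 200-character preview length are illustrative.

```python
def search_local_documents(vectorstore, query, k=5):
    """Return (source, preview) pairs for the k chunks most similar to query."""
    results = vectorstore.similarity_search(query, k=k)
    return [
        # Loaders usually record the file path under the "source" metadata key.
        (doc.metadata.get("source", "unknown"), doc.page_content[:200])
        for doc in results
    ]


# Usage, assuming a vector store built from your local documents:
# for source, preview in search_local_documents(vectorstore, "pricing strategy"):
#     print(source, "->", preview)
```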

Conducting Web Research with GPT Researcher

Now that we’ve learned how to work with local documents, let’s take a quick look at how GPT Researcher works under the hood:

[Figure: GPT Researcher architecture]

As seen above, GPT Researcher creates a research plan based on the given task by generating potential research queries that can collectively provide an objective and broad overview of the topic. Once these queries are generated, GPT Researcher uses a search engine like Tavily to find relevant results. Each scraped result is then saved in a vector database. Finally, the top k chunks most related to the research task are retrieved to generate a final research report.
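To make the final retrieval step concrete, here is a toy sketch of picking the top k chunks for a task. It is not GPT Researcher's implementation: it swaps real embeddings for simple bag-of-words vectors so it runs with the standard library alone, and all function names are illustrative.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Toy stand-in for an embedding: lowercase word counts."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_chunks(task, chunks, k=3):
    """Return the k chunks most similar to the research task."""
    task_vector = bag_of_words(task)
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(task_vector, bag_of_words(chunk)),
        reverse=True,
    )
    return ranked[:k]
```

In the real pipeline the chunks would come from scraped web results (and, in hybrid mode, local documents), and similarity would be computed over learned embeddings rather than word counts.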

GPT Researcher supports hybrid research, which involves an additional step of chunking local documents (implemented using LangChain) before retrieving the most related information. After numerous evaluations conducted by the community, we’ve found that hybrid research improved the correctness of final results by over 40%!

Running the Hybrid Research with GPT Researcher

Now that you have a better understanding of how hybrid research works, let’s demonstrate how easily this can be achieved with GPT Researcher.

Step 1: Install GPT Researcher with pip (pip install gpt-researcher)

Step 2: Set up the environment

We will run GPT Researcher with OpenAI as the LLM vendor and Tavily as the search engine. You’ll need to obtain API keys for both before moving forward. Then, export the environment variables in your CLI as follows:
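The variables in question are OPENAI_API_KEY and TAVILY_API_KEY, which you would typically set in your shell with export OPENAI_API_KEY=... and export TAVILY_API_KEY=... before launching Python. As a quick sanity check, a small hypothetical helper like the one below can confirm that both are visible to your script before any research run starts.

```python
import os

# Keys expected when using OpenAI as the LLM vendor and Tavily for search.
REQUIRED_KEYS = ("OPENAI_API_KEY", "TAVILY_API_KEY")


def missing_api_keys(env=None):
    """Return the names of required keys that are unset or empty."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key)]


# Usage:
# missing = missing_api_keys()
# if missing:
#     raise SystemExit(f"Please export: {', '.join(missing)}")
```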

Step 3: Initialize GPT Researcher with hybrid research configuration

GPT Researcher can be easily initialized with parameters that signal it to run hybrid research. It supports many forms of research; head to the documentation page to learn more.

To get GPT Researcher to run hybrid research, you need to include all relevant files in the my-docs directory (create it if it doesn’t exist) and set the instance’s report_source to “hybrid”. Once the report source is set to hybrid, GPT Researcher will look for existing documents in the my-docs directory and include them in the research. If no documents exist, it will proceed without them.
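A minimal sketch of that configuration, assuming the gpt-researcher package is installed and both API keys are set. GPTResearcher, conduct_research, and write_report come from the library; the wrapper function names are illustrative. The GPT Researcher documentation also describes a DOC_PATH environment variable for pointing at the documents folder, which is worth confirming for your installed version.

```python
import asyncio


async def get_research_report(query, report_type="research_report", report_source="hybrid"):
    """Run hybrid (web plus local documents) research and return the report text."""
    # Imported lazily; requires the gpt-researcher package and API keys.
    from gpt_researcher import GPTResearcher

    researcher = GPTResearcher(
        query=query,
        report_type=report_type,
        report_source=report_source,  # "hybrid" combines web and local sources
    )
    await researcher.conduct_research()
    return await researcher.write_report()


def run_hybrid_research(query):
    """Synchronous convenience wrapper around get_research_report."""
    return asyncio.run(get_research_report(query))


# Example (makes real API calls, so it is left commented out):
# report = run_hybrid_research(
#     "How does our product roadmap compare to emerging market trends in our industry?"
# )
# print(report)
```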

As seen above, we can run the research on the following example:

  • Research task: “How does our product roadmap compare to emerging market trends in our industry?”
  • Web: Current market trends, competitor announcements, and industry forecasts.
  • Local: Internal product roadmap documents and feature prioritization lists.

After various community evaluations, we’ve found that hybrid research improves the quality and correctness of results by over 40% and reduces hallucinations by 50%. Moreover, as stated above, local information helps the LLM improve its planning and reasoning, allowing it to make better decisions and research more relevant web sources.

But wait, there’s more! GPT Researcher also includes a sleek front-end app built with NextJS and Tailwind. To learn how to get it running, check out the documentation page. You can drag and drop documents into the app to run hybrid research.

Conclusion

Hybrid research represents a significant advancement in information gathering and decision-making. By leveraging tools like GPT Researcher, teams can now conduct more comprehensive, context-aware, and actionable research. This approach addresses the limitations of using web or local sources in isolation, offering benefits such as grounded context, enhanced accuracy, reduced bias, improved planning and reasoning, and customized insights.

The automation of hybrid research can enable teams to make faster, more data-driven decisions, ultimately enhancing productivity and offering a competitive advantage in analyzing an expanding pool of unstructured and dynamic information.

The article originally appeared on Medium.

Featured image courtesy: Assaf Elovic.

Assaf Elovic
Assaf Elovic is a tech leader with a deep involvement in AI innovation and development. He has contributed to several impactful AI projects, such as GPT Researcher, and has led engineering at companies like Servicefriend (acquired by Meta), Monday, and Wix. Through his work, he focuses on AI-driven product development and has a passion for sharing his expertise, often speaking and writing about innovation and emerging technologies.
