1. Introduction

1.1. What is a PDF QA Bot?

A PDF QA Bot is a chatbot that allows users to interact with PDF documents by asking questions and receiving responses based on the content within those PDFs. Unlike traditional search or text-based chatbots, a PDF QA Bot is specifically designed to parse and understand the content of PDF files, extracting relevant information to answer user queries.

The core idea is to turn any PDF document into an interactive knowledge base, where the bot can comprehend the text (and even extract data from images using OCR) to provide meaningful answers to specific questions.

1.2. Why Use Flask and Perplexity API?

  • Flask: Flask is a lightweight and flexible Python web framework that makes it easy to build APIs. It’s simple to set up, and its minimalist design is perfect for our PDF QA Bot, where the focus is on routing PDF uploads, handling user requests, and interacting with external APIs.
  • Perplexity API: Perplexity is a powerful natural language processing tool that can help with both text understanding and OCR. With its ability to extract structured answers from both plain text and OCR-generated data, it is a perfect fit for this project. The Perplexity API allows us to send both regular text and OCR-extracted content to it, and it will intelligently extract answers based on the questions asked.

Using Flask and Perplexity API together provides a solid foundation for building a responsive, efficient, and intelligent PDF QA Bot that works seamlessly with PDFs of all types — whether they contain text or images.

1.3. Overview of the Steps

This blog will guide you through the steps of building a PDF QA Bot using Flask and the Perplexity API:

  1. Setting Up the Development Environment: We’ll get Python and Flask installed, set up a virtual environment, and install the necessary libraries.
  2. Building the Flask API for PDF Upload: You’ll learn how to create a basic Flask app that can accept and store PDF files.
  3. Processing PDF Content: We’ll extract the text from the PDF and use Perplexity API to process the content.
  4. Integrating with Perplexity API: Learn how to send text (including OCR content) to Perplexity and retrieve answers.
  5. Handling QA Bot Responses: We’ll cover how to process user queries and match them to relevant content from the PDF.
  6. Error Handling and Debugging: We’ll handle common errors that may arise when working with PDFs or the Perplexity API.
  7. Conclusion: We’ll wrap up with next steps and ideas for enhancing the PDF QA Bot.

Let’s dive into the first step: setting up the development environment!


2. Setting Up the Development Environment

2.1. Installing Python and Flask

To get started, you’ll need Python installed on your machine. If you don’t have it yet, you can download it from python.org

Once Python is installed, the next step is to install Flask. Flask will be used to build the web API that handles PDF uploads, user queries, and interactions with the Perplexity API.

To install Flask, open your terminal (or command prompt) and run the following command:

pip install Flask

This will install Flask and its dependencies, allowing you to build a lightweight web application to serve your PDF QA Bot.

2.2. Setting Up a Virtual Environment

It’s a best practice to create a virtual environment for your Python projects. This ensures that your project’s dependencies are isolated from other Python projects on your machine.

To create a virtual environment:

  1. Navigate to your project folder:
cd path/to/your/project

2. Create the virtual environment:

python -m venv venv

3. Activate the virtual environment:

On Windows:

venv\Scripts\activate

On macOS/Linux:

source venv/bin/activate

Once activated, your terminal will show the virtual environment name (e.g., (venv)), indicating that it is active.

2.3. Installing Required Libraries and Packages

Now that we have Flask installed, let’s install the other libraries we’ll need for this project.

To install PyPDF2 (for PDF parsing) and Requests (for API interactions), run the following:

pip install PyPDF2 requests

Additionally, if you plan to use OCR features (like Pytesseract or Perplexity API’s OCR capabilities), make sure you have the necessary dependencies for handling images.

To install the Pillow library (for image handling):

pip install Pillow

2.3. Installing Required Libraries and Packages (Including Perplexity API)

Now that we have Flask and PyPDF2 installed, the next step is to integrate the Perplexity API for question-answering based on the content extracted from the PDFs.

Perplexity API will allow us to send text (including OCR-extracted content) from the PDF and get responses to questions that users ask.

Here’s how to set it up:

Step 1: Sign Up for Perplexity API

Before you can use the Perplexity API, you need to sign up and obtain an API key. Visit the Perplexity API website and create an account to get your key. Once you’ve signed up, you should be able to access your API key from the dashboard.

Step 2: Install the Perplexity Python Client

Once you have your API key, you’ll need to install the Perplexity Python client to easily interact with the API.

Install the client using pip:

pip install perplexity-api

This will install the necessary libraries for integrating Perplexity API into your Flask application.

Step 3: Configuring the Perplexity API Key

For security reasons, never hardcode your API key directly in your code. Instead, you can store it in an environment variable or a .env file.

  • Install the python-dotenv package to manage environment variables:
pip install python-dotenv
  • Create a .env file in your project root and add your Perplexity API key:
PERPLEXITY_API_KEY=your_api_key_here
  • In your Flask app, load the API key from the .env file using dotenv:
from dotenv import load_dotenv
import os

load_dotenv()
PERPLEXITY_API_KEY = os.getenv("PERPLEXITY_API_KEY")

This ensures that your API key is securely stored and can be accessed within your application.

Step 4: Using the Perplexity API

Once the Perplexity API client is installed and configured, you can send requests to the API to get answers from the text extracted from your PDF.

Here’s an example of how to send a query to Perplexity:

import requests

def ask_perplexity_api(question, pdf_text):
    url = "https://api.perplexity.ai/query"
    
    headers = {
        "Authorization": f"Bearer {PERPLEXITY_API_KEY}",
        "Content-Type": "application/json"
    }
    
    data = {
        "query": question,
        "document": pdf_text  # The extracted text from your PDF
    }

    response = requests.post(url, headers=headers, json=data)
    
    if response.status_code == 200:
        return response.json()['answer']  # Extract the answer from the response
    else:
        return "Error: Unable to get a response from Perplexity API."

In this example:

  • The query is the user’s question.
  • The document is the extracted text from the PDF that you want to query.

When a user asks a question, you’ll send the extracted text (and any OCR content, if available) to Perplexity, which will return the most relevant answer.

3. Building the Flask API for PDF Upload

In this section, we’ll build the basic Flask API that allows users to upload PDF files. These files will be processed to extract text, and we’ll query the Perplexity API to get answers based on the PDF content.

3.1. Initializing the Flask App

First, let’s initialize the Flask application. Create a new file, app.py, in your project folder.

from flask import Flask, request, jsonify
import os

app = Flask(__name__)

# Ensure the folder for storing uploaded PDFs exists
UPLOAD_FOLDER = 'uploads'
os.makedirs(UPLOAD_FOLDER, exist_ok=True)

app.config['UPLOAD_FOLDER'] = UPLOAD_FOLDER

if __name__ == '__main__':
    app.run(debug=True)

Here:

  • We create a new Flask app instance.
  • We specify a folder (uploads) to save the uploaded PDF files.
  • The app will run in debug mode for easy troubleshooting.

3.2. Creating Routes for Uploading PDFs

Next, we need to create a route for accepting PDF uploads from the user. In this route, we’ll accept the uploaded file, save it, and then extract the text from it.

from flask import Flask, request, jsonify
import os
import PyPDF2

app = Flask(__name__)

UPLOAD_FOLDER = 'uploads'
os.makedirs(UPLOAD_FOLDER, exist_ok=True)

app.config['UPLOAD_FOLDER'] = UPLOAD_FOLDER

# Route to upload a PDF file
@app.route('/upload', methods=['POST'])
def upload_pdf():
    if 'pdf' not in request.files:
        return jsonify({"error": "No file part"}), 400
    
    pdf_file = request.files['pdf']
    
    if pdf_file.filename == '':
        return jsonify({"error": "No selected file"}), 400
    
    if pdf_file and pdf_file.filename.endswith('.pdf'):
        filepath = os.path.join(app.config['UPLOAD_FOLDER'], pdf_file.filename)
        pdf_file.save(filepath)
        extracted_text = extract_pdf_text(filepath)
        return jsonify({"message": "PDF uploaded successfully", "extracted_text": extracted_text}), 200
    else:
        return jsonify({"error": "Invalid file format. Please upload a PDF."}), 400

def extract_pdf_text(pdf_path):
    with open(pdf_path, 'rb') as file:
        reader = PyPDF2.PdfReader(file)
        text = ""
        for page in reader.pages:
            text += page.extract_text()
    return text

if __name__ == '__main__':
    app.run(debug=True)

Here’s what’s happening:

  1. Upload Route (/upload): This route handles PDF uploads from the user. The file is saved in the uploads folder.
  2. Extracting Text: After the PDF is uploaded, the extract_pdf_text function is called to extract the text using PyPDF2.
    • We open the PDF and loop through all the pages, extracting the text from each page.
  3. Return Response: We send a success message along with the extracted text as part of the JSON response.

3.3. Handling Multipart File Uploads in Flask

Flask uses multipart form-data for handling file uploads. When sending a PDF to the server, make sure that the Content-Type is set correctly. If you’re testing this with Postman or cURL, here’s how the request should look:

  • Postman: Set the method to POST, and in the “Body” section, choose “form-data.” Add a file key with the name pdf, and select a PDF file to upload.
  • cURL (Command Line Example):
curl -X POST -F "pdf=@your_pdf_file.pdf" http://localhost:5000/upload

3.4. Saving Uploaded PDFs

In the upload_pdf function, we save the uploaded PDF using:

filepath = os.path.join(app.config['UPLOAD_FOLDER'], pdf_file.filename)
pdf_file.save(filepath)

This stores the file in the uploads folder within the project directory. You can adjust the storage path as needed.

4. Processing PDF Content

In this section, we’ll process the uploaded PDF content, extract the text, and send the extracted text to the Perplexity API for question answering. We’ll also handle OCR processing for any images within the PDFs, if applicable.

4.1. Introduction to PDF Parsing

To extract content from PDFs, we will use PyPDF2 (or pdfminer if you’d like additional features), which is great for text-based PDFs. If your PDFs contain scanned images or non-selectable text, we’ll need to perform OCR (Optical Character Recognition) to extract text from images.

Since we’re using Perplexity API, it can handle both text and OCR content, so we’ll send the extracted content (from text and images) directly to the API.

4.2. Extracting Text from PDFs (PyPDF2)

We’ve already set up the extract_pdf_text function in the previous section, which uses PyPDF2 to extract text from the PDF. We’ll be using this function to pull the text from all pages of the uploaded PDF.

Here’s the code that extracts text from the PDF:

import PyPDF2

def extract_pdf_text(pdf_path):
    with open(pdf_path, 'rb') as file:
        reader = PyPDF2.PdfReader(file)
        text = ""
        for page in reader.pages:
            text += page.extract_text()  # Extract text from each page
    return text

This function will handle PDFs that contain selectable text. However, for PDFs containing images or scanned text, we’ll need to handle OCR.

4.3. Detecting and Sending OCR Content to Perplexity API

If the PDF contains scanned images, the text won’t be extractable with PyPDF2. In such cases, OCR will be required to extract the text from those images.

While we could use an external OCR library like Tesseract or Pytesseract, Perplexity API can handle OCR processing natively. This means we can send the image content (from PDFs) to the Perplexity API, which will extract the text.

To send OCR content, we’ll need to first check if the PDF contains images, extract the images, and then send them for OCR processing through Perplexity API.

Here’s how to integrate OCR detection and handling using Perplexity API:

from perplexity_api import PerplexityAPI
from PIL import Image
import io
import requests

# Initialize Perplexity API
perplexity = PerplexityAPI(api_key=PERPLEXITY_API_KEY)

def extract_ocr_text_from_pdf(pdf_path):
    # Use Perplexity API for OCR processing of images in the PDF
    with open(pdf_path, 'rb') as file:
        # Extract images (assuming the PDF has images you want to OCR)
        images = extract_images_from_pdf(file)  # Placeholder for image extraction method
        
        # Send images to Perplexity API for OCR processing
        ocr_results = []
        for image in images:
            ocr_text = send_image_for_ocr(image)
            ocr_results.append(ocr_text)
        
        return " ".join(ocr_results)

def send_image_for_ocr(image):
    # Convert image to byte stream for Perplexity API
    img_byte_arr = io.BytesIO()
    image.save(img_byte_arr, format='PNG')
    img_byte_arr.seek(0)
    
    url = "https://api.perplexity.ai/ocr"
    files = {'file': ('image.png', img_byte_arr, 'image/png')}
    
    response = requests.post(url, files=files, headers={'Authorization': f"Bearer {PERPLEXITY_API_KEY}"})
    
    if response.status_code == 200:
        return response.json()['ocr_text']  # Extracted OCR text
    else:
        return "Error: Unable to extract OCR text."

# Function to extract images from the PDF (you can use libraries like pdf2image)
def extract_images_from_pdf(pdf_file):
    # Placeholder for the actual image extraction method (could use pdf2image or other libraries)
    pass

Explanation:

  • extract_ocr_text_from_pdf: This function extracts images from a PDF and sends each image to the Perplexity API for OCR.
  • send_image_for_ocr: This function handles the interaction with Perplexity’s OCR API. The image is converted into a byte stream, then sent to the API for text extraction.
  • extract_images_from_pdf: You’ll need an image extraction library like pdf2image or PyMuPDF to extract images from PDFs (this function is a placeholder for now).

4.4. Combining Extracted Text and OCR Results

Once we have both the text and OCR results from the PDF, we can combine both and send them to Perplexity API. This gives the chatbot a comprehensive understanding of the entire document.

Here’s an example of how you can combine the text and OCR content and send it to Perplexity:

def process_pdf_and_get_answer(pdf_path, question):
    # Extract text from the PDF
    extracted_text = extract_pdf_text(pdf_path)
    
    # If OCR is needed, extract OCR text
    ocr_text = extract_ocr_text_from_pdf(pdf_path) if contains_images(pdf_path) else ""

    # Combine both text and OCR results
    full_text = extracted_text + " " + ocr_text

    # Send the combined text to Perplexity for question answering
    answer = ask_perplexity_api(question, full_text)
    return answer

def contains_images(pdf_path):
    # Placeholder for logic to check if the PDF contains images (you can use PyMuPDF or pdf2image)
    pass

In this example:

  • We combine the text from the PDF and the OCR text (if available).
  • We then send the combined content to Perplexity API for answering a specific question.

5. Integrating with Perplexity API

In this section, we’ll integrate the Perplexity API to handle the question-answering process based on the extracted text from the PDF (both standard text and OCR-extracted content). We’ll look at how to send the content to Perplexity API and receive structured responses that the chatbot can use to answer user queries.

5.1. Introduction to Perplexity API

The Perplexity API is a powerful tool for natural language understanding. It’s designed to take documents (like text from PDFs) and user queries, and return relevant answers based on the content. This allows us to build a question-answering system where users can ask questions about the uploaded PDF, and the bot can provide relevant answers.

Perplexity supports both textual content (like text extracted from PDFs) and OCR content (extracted from images), making it perfect for our PDF QA Bot.

5.2. Obtaining an API Key from Perplexity

To interact with the Perplexity API, you’ll need an API key. Here’s how you can obtain it:

  1. Sign up: Go to the Perplexity API website and create an account.
  2. Get your API key: After signing up, you’ll be able to access the API key from your dashboard.

Make sure to keep your API key secure (never hardcode it directly in your code). We’ll store it in an environment variable, as discussed earlier.

5.3. Sending Extracted Text (and OCR Content) to Perplexity API

Once you have your PDF text and OCR content ready, it’s time to send the data to the Perplexity API for question answering. Below is an example of how to send the extracted PDF content and a user query to the Perplexity API.

First, make sure you’ve installed the requests library to send HTTP requests:

pip install requests

Then, create a function to interact with the Perplexity API:

import requests

def ask_perplexity_api(question, full_text):
    # Perplexity API URL for query processing
    url = "https://api.perplexity.ai/query"
    
    # Headers, including the API key for authentication
    headers = {
        "Authorization": f"Bearer {PERPLEXITY_API_KEY}",
        "Content-Type": "application/json"
    }

    # Payload containing the user question and the extracted text
    data = {
        "query": question,
        "document": full_text  # The extracted PDF text + OCR text
    }

    # Sending the request to the Perplexity API
    response = requests.post(url, headers=headers, json=data)
    
    # Handling the API response
    if response.status_code == 200:
        # Extracting the answer from the response
        answer = response.json()['answer']
        return answer
    else:
        # Returning error message if the API request fails
        return "Error: Unable to get a response from Perplexity API."

Explanation:

  • ask_perplexity_api: This function sends the question and the document (which contains the extracted text and OCR content) to the Perplexity API via a POST request.
  • The response from the API will contain an answer to the question based on the provided document.
  • If the API call fails, an error message is returned.

5.4. Retrieving Responses from Perplexity

When the user asks a question, we’ll use the function ask_perplexity_api to query the Perplexity API and retrieve the response. We’ll then send the answer back to the user.

Here’s an example of how to implement this:

@app.route('/ask', methods=['POST'])
def ask_question():
    question = request.json.get('question')
    pdf_path = request.json.get('pdf_path')
    
    if not question or not pdf_path:
        return jsonify({"error": "Missing question or PDF path"}), 400

    # Process the PDF to extract both text and OCR content
    full_text = process_pdf_and_get_answer(pdf_path, question)
    
    # Get the answer from Perplexity API
    answer = ask_perplexity_api(question, full_text)

    return jsonify({"question": question, "answer": answer}), 200

In this example:

  • /ask: This route handles user queries. It accepts a question and the PDF path in the JSON body.
  • The function then processes the PDF, combines the extracted text and OCR content, and sends the combined text to Perplexity API to get an answer.
  • Finally, it returns the answer in the response.

5.5. Parsing and Formatting the Response

The Perplexity API typically returns a JSON response with the answer to the question. In our code, we access the answer like this:

answer = response.json()['answer']

You can further format this response as needed, depending on how you want to display it to the user. You may want to clean the response or handle special cases (e.g., if the API doesn’t find an answer).

6. Handling QA Bot Responses

In this section, we will focus on refining how the PDF QA Bot handles user queries and delivers meaningful responses. We’ll discuss how to:

  1. Structure the user query to ensure it’s well-formed.
  2. Extract relevant information from the uploaded PDF.
  3. Format the response in a way that’s useful to the user.

The goal here is to make sure the interaction between the user and the bot feels natural and provides useful, accurate answers based on the content extracted from the PDF.


6.1. Structuring the User Query

When users interact with the bot, they will send a question related to the content in the PDF. It’s important to ensure the question is structured correctly and is relevant to the document being queried.

For example, if a user uploads a contract PDF, they might ask:

  • “What is the termination clause?”
  • “What is the payment schedule?”

To handle this, we’ll ensure that the question is parsed and structured for optimal understanding by the Perplexity API. One way to do this is to sanitize the query and remove unnecessary parts before sending it for processing.

Here’s how you can sanitize and preprocess the query:

def preprocess_query(query):
    # Basic preprocessing: remove unnecessary punctuation or keywords
    cleaned_query = query.strip().lower()
    return cleaned_query

This function ensures that:

  • The query is trimmed and doesn’t have extra spaces.
  • It’s lowercased to make sure the case doesn’t affect the response.
  • You can add more complex preprocessing (e.g., removing stop words) depending on the complexity of your bot.

6.2. Extracting Relevant Information from the User Query

Once the query is preprocessed, we’ll need to determine which section of the PDF the question pertains to. This is a more advanced step that could involve NLP (Natural Language Processing) techniques. However, for simplicity, we will rely on the Perplexity API to handle this for us, as it is designed to extract relevant answers directly from large documents.

The Perplexity API is capable of analyzing large chunks of text and providing an answer that best fits the query. We don’t need to manually extract specific sections, as the API will handle this based on the full text we provide. However, you can enhance this by adding additional layers of context or filtering based on keywords found in the query.

For example, if the query contains a keyword like “payment”, we could prioritize sections of the PDF that discuss payments:

def filter_query_by_keywords(query, full_text):
    keywords = ['payment', 'date', 'contract', 'termination']  # Example keywords
    relevant_text = ""
    for keyword in keywords:
        if keyword in query:
            relevant_text = full_text  # This could be customized to return sections with the keyword
            break
    return relevant_text if relevant_text else full_text

This basic function checks whether any keywords are present in the query and returns the relevant portion of the PDF. It’s a simple way to give the bot context, though more complex approaches (e.g., NLP-based) could provide better results.

6.3. Mapping User Queries to PDF Content (with OCR Data)

If the PDF contains both text-based content and OCR-based content, the query handling function will need to consider both sources. Since Perplexity API can handle both, we will combine both sources of information (text and OCR) and pass them as one unified document to the API.

Here’s an updated approach for combining text and OCR content:

def process_query_and_get_answer(pdf_path, query):
    # Extract text from the PDF
    extracted_text = extract_pdf_text(pdf_path)
    
    # Check if the PDF contains images and extract OCR text if needed
    ocr_text = extract_ocr_text_from_pdf(pdf_path) if contains_images(pdf_path) else ""
    
    # Combine both text and OCR content
    full_text = extracted_text + " " + ocr_text
    
    # Filter the combined text based on the query
    relevant_text = filter_query_by_keywords(query, full_text)
    
    # Send the relevant text to Perplexity for answering the question
    answer = ask_perplexity_api(query, relevant_text)
    
    return answer

In this example:

  • The function extracts text and OCR content.
  • It filters the combined content based on the query (in this case, using keywords).
  • The filtered text is sent to Perplexity API for answering.

6.4. Providing Contextual Responses

Once the answer is retrieved from Perplexity API, the next step is to ensure that the response is presented to the user in a clear and informative way.

For instance:

  • If the response from Perplexity is very specific, you might want to include some context or additional details about where the answer was found in the PDF (e.g., which page or section).

You can format the response by including:

  • The answer itself.
  • References to specific sections (e.g., page numbers or headings) from the PDF.

Here’s how to include additional context in the response:

def format_response(answer, pdf_path):
    # Placeholder for getting context (like page numbers, headings)
    context = get_pdf_context(pdf_path, answer)
    
    formatted_response = {
        "answer": answer,
        "context": context  # You can add more details like "page numbers" or "section headings"
    }
    
    return formatted_response

def get_pdf_context(pdf_path, answer):
    # Placeholder for extracting the context in which the answer was found
    # You can use more advanced methods like text matching or manual tagging for this
    return f"Found on page 2 (Section: Terms and Conditions)"  # Example context

In this example, the response is enhanced by adding context about where the answer was located in the PDF.


7. Error Handling and Debugging

In any application, especially one that interacts with external services and processes potentially large files, proper error handling is crucial. We need to ensure that the PDF QA Bot is robust and can gracefully handle various issues, such as missing files, corrupt PDFs, or API failures.

This section will cover common issues and how to handle them effectively, ensuring a smooth user experience.


7.1. Common Errors in PDF Parsing

When working with PDFs, especially those containing complex layouts or images, several issues might arise during text extraction. We will address these common errors:

7.1.1. Issues with Corrupt PDFs

Sometimes, users might upload PDFs that are corrupt or incompatible. If this happens, we need to catch the error and inform the user that the file cannot be processed.

Here’s how you can handle corrupt PDFs during the text extraction process:

from PyPDF2.utils import PdfReadError

def extract_pdf_text(pdf_path):
    try:
        with open(pdf_path, 'rb') as file:
            reader = PyPDF2.PdfReader(file)
            text = ""
            for page in reader.pages:
                text += page.extract_text()  # Extract text from each page
        return text
    except PdfReadError:
        return "Error: Unable to read the PDF. It may be corrupt or invalid."

This code catches the PdfReadError exception raised by PyPDF2 when a PDF cannot be read, providing the user with an appropriate error message.

7.1.2. OCR Errors and Misread Text

OCR-based errors can also occur when extracting text from images. OCR is prone to errors if the image quality is poor or the text is unclear. Perplexity API generally handles OCR reasonably well, but you may want to add fallback logic if OCR extraction fails.

For example:

def extract_ocr_text_from_pdf(pdf_path):
    try:
        ocr_text = send_image_for_ocr(pdf_path)  # Function for sending images to Perplexity API
        if not ocr_text:
            return "Error: OCR extraction failed. The text might not be readable."
        return ocr_text
    except Exception as e:
        return f"Error during OCR processing: {str(e)}"

This ensures that if OCR fails, the bot provides feedback to the user rather than crashing.


7.2. Handling Empty or Invalid PDF Files

We also need to handle situations where the uploaded file is empty or not a PDF file at all. This can be handled in the upload PDF route.

Here’s how you can check for file validity before processing:

@app.route('/upload', methods=['POST'])
def upload_pdf():
    if 'pdf' not in request.files:
        return jsonify({"error": "No file part"}), 400
    
    pdf_file = request.files['pdf']
    
    if pdf_file.filename == '':
        return jsonify({"error": "No selected file"}), 400
    
    if not pdf_file.filename.endswith('.pdf'):
        return jsonify({"error": "Invalid file format. Please upload a PDF."}), 400
    
    # Proceed with saving and processing the file
    filepath = os.path.join(app.config['UPLOAD_FOLDER'], pdf_file.filename)
    pdf_file.save(filepath)
    
    extracted_text = extract_pdf_text(filepath)
    if extracted_text == "Error: Unable to read the PDF. It may be corrupt or invalid.":
        return jsonify({"error": "The uploaded PDF is corrupt or invalid."}), 400
    
    return jsonify({"message": "PDF uploaded successfully", "extracted_text": extracted_text}), 200

This code:

  • Checks if the file is empty or has the wrong file type.
  • Catches errors related to corrupt PDFs during extraction.

7.3. Error Handling in Flask Routes

Since Flask interacts with external services (like Perplexity API), we need to ensure that the bot doesn’t break if something goes wrong with the API call. For example, if the API is down or the response is malformed, we should catch that and return a user-friendly error.

Here’s an example of handling Perplexity API errors:

def ask_perplexity_api(question, full_text):
    try:
        url = "https://api.perplexity.ai/query"
        headers = {
            "Authorization": f"Bearer {PERPLEXITY_API_KEY}",
            "Content-Type": "application/json"
        }
        data = {
            "query": question,
            "document": full_text
        }
        
        response = requests.post(url, headers=headers, json=data)
        
        if response.status_code == 200:
            return response.json()['answer']
        else:
            return f"Error: Received unexpected status code {response.status_code} from Perplexity API."
    except requests.exceptions.RequestException as e:
        return f"Error: Unable to connect to Perplexity API. {str(e)}"

In this example:

  • We use try-except to catch request errors and API failures.
  • If the API returns an unexpected status code, we return that in the response.
  • If the request fails entirely (e.g., due to network issues), we inform the user.

7.4. Debugging the Flask Application

Sometimes, issues may arise that are hard to debug, such as unexpected crashes or strange behavior. Here are some debugging tips to help troubleshoot your Flask app:

7.4.1. Using Flask’s Built-In Debug Mode

When developing locally, always run Flask in debug mode. This will provide detailed error messages, stack traces, and logs to help diagnose issues.

To enable debug mode, add this line in your app.py:

app.run(debug=True)

Flask will display error details in the browser if something goes wrong.

7.4.2. Logging Errors

For more advanced error tracking, you can use Python’s logging module to record detailed logs of errors. This is useful for production environments.

import logging

logging.basicConfig(filename='app.log', level=logging.DEBUG)

@app.route('/upload', methods=['POST'])
def upload_pdf():
try:
# Your upload logic here
pass
except Exception as e:
logging.error(f"Error during PDF upload: {str(e)}")
return jsonify({"error": "Internal server error. Please try again later."}), 500

This will log detailed error messages to a file (app.log), which can be useful for tracking down issues in production.

8. User Interface Design (Optional)

In this section, we’ll discuss how to design a simple web interface where users can upload PDFs and ask questions, interacting with the PDF QA Bot that we’ve built using Flask and Perplexity API. While the backend logic is important, the user experience (UI) can greatly enhance how users interact with the bot.

We’ll focus on:

  1. Creating a basic HTML form for PDF uploads.
  2. Allowing users to input queries.
  3. Displaying the bot’s responses.

8.1. Creating a Simple Frontend with HTML and JavaScript

We’ll begin by creating a basic HTML page with two main features:

  • A form to upload the PDF.
  • A text input to ask questions.

Here’s an example of the basic structure for the frontend (you can place this in templates/index.html if using Flask):

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>PDF QA Bot</title>
</head>
<body>
    <h1>PDF QA Bot</h1>

    <!-- Form for PDF Upload -->
    <h3>Upload a PDF</h3>
    <form id="pdf-upload-form" enctype="multipart/form-data">
        <input type="file" name="pdf" accept="application/pdf" required>
        <button type="submit">Upload PDF</button>
    </form>

    <!-- Form for asking questions -->
    <h3>Ask a Question</h3>
    <input type="text" id="question-input" placeholder="Enter your question" required>
    <button onclick="askQuestion()">Ask</button>

    <div id="response-container">
        <h3>Response</h3>
        <p id="response-text">Your answer will appear here.</p>
    </div>

    <script>
        let pdfPath = '';

        // Handle PDF upload and store the file path for later use
        document.getElementById('pdf-upload-form').addEventListener('submit', function (event) {
            event.preventDefault();
            const formData = new FormData();
            formData.append('pdf', document.querySelector('input[type="file"]').files[0]);

            fetch('/upload', {
                method: 'POST',
                body: formData,
            })
            .then(response => response.json())
            .then(data => {
                if (data.extracted_text) {
                    pdfPath = data.extracted_text; // Save the file path
                    alert('PDF uploaded and text extracted!');
                } else {
                    alert('Error uploading PDF');
                }
            })
            .catch(error => {
                console.error('Error uploading PDF:', error);
                alert('Error uploading PDF');
            });
        });

        // Function to ask a question
        function askQuestion() {
            const question = document.getElementById('question-input').value;
            if (!question) {
                alert('Please enter a question.');
                return;
            }

            fetch('/ask', {
                method: 'POST',
                headers: {
                    'Content-Type': 'application/json',
                },
                body: JSON.stringify({ question: question, pdf_path: pdfPath }),
            })
            .then(response => response.json())
            .then(data => {
                if (data.answer) {
                    document.getElementById('response-text').innerText = data.answer;
                } else {
                    document.getElementById('response-text').innerText = 'Sorry, no answer found.';
                }
            })
            .catch(error => {
                console.error('Error asking question:', error);
                document.getElementById('response-text').innerText = 'Error contacting the server.';
            });
        }
    </script>
</body>
</html>

8.2. Integrating the Frontend with the Flask API

The frontend interacts with the Flask backend through two main operations:

  • PDF upload: The user uploads a PDF, and the backend handles the file processing, extracting text (and OCR if necessary).
  • Asking a question: Once the PDF is uploaded, the user can enter a question, which the backend sends to Perplexity API for answering.

The form submits the PDF to the /upload route, and the question is sent to the /ask route.

Here’s the updated Flask routes to handle the frontend requests:

from flask import Flask, request, jsonify
import os
import PyPDF2

app = Flask(__name__)

UPLOAD_FOLDER = 'uploads'
os.makedirs(UPLOAD_FOLDER, exist_ok=True)
app.config['UPLOAD_FOLDER'] = UPLOAD_FOLDER

# Route to upload PDF
@app.route('/upload', methods=['POST'])
def upload_pdf():
    if 'pdf' not in request.files:
        return jsonify({"error": "No file part"}), 400
    
    pdf_file = request.files['pdf']
    
    if pdf_file.filename == '':
        return jsonify({"error": "No selected file"}), 400
    
    if not pdf_file.filename.endswith('.pdf'):
        return jsonify({"error": "Invalid file format. Please upload a PDF."}), 400
    
    filepath = os.path.join(app.config['UPLOAD_FOLDER'], pdf_file.filename)
    pdf_file.save(filepath)
    
    extracted_text = extract_pdf_text(filepath)
    
    return jsonify({"message": "PDF uploaded successfully", "extracted_text": filepath}), 200

# Route to ask a question
@app.route('/ask', methods=['POST'])
def ask_question():
    question = request.json.get('question')
    pdf_path = request.json.get('pdf_path')
    
    if not question or not pdf_path:
        return jsonify({"error": "Missing question or PDF path"}), 400

    full_text = process_pdf_and_get_answer(pdf_path, question)
    answer = ask_perplexity_api(question, full_text)

    return jsonify({"question": question, "answer": answer}), 200

# Extract text from PDF
def extract_pdf_text(pdf_path):
    with open(pdf_path, 'rb') as file:
        reader = PyPDF2.PdfReader(file)
        text = ""
        for page in reader.pages:
            text += page.extract_text()
    return text

if __name__ == '__main__':
    app.run(debug=True)

8.3. Uploading PDFs through the Web Interface

In the HTML form, the PDF file is uploaded using the FormData API and sent to the Flask server using a POST request to the /upload endpoint. The server processes the PDF, extracts the text, and sends back a response.

8.4. Displaying Answers to Users

Once the user asks a question, the question-answering process is triggered:

  • The user’s question and PDF path are sent to the /ask route.
  • The Perplexity API processes the question and provides an answer, which is displayed in the browser.

The answer is dynamically inserted into the HTML page via JavaScript, giving a seamless interaction experience.


9. Testing and Optimizing the PDF QA Bot

After building the PDF QA Bot, it’s crucial to test the functionality thoroughly and ensure it performs optimally. This step will help identify any issues and areas where we can improve performance, especially when dealing with larger PDFs or more complex queries.

In this section, we’ll cover:

  1. Unit testing the Flask API.
  2. Testing OCR accuracy.
  3. Fine-tuning Perplexity API queries.
  4. Optimizing performance for large PDFs.

9.1. Unit Testing the Flask API

Unit testing is essential to ensure that each part of the bot is working correctly. We can write tests for the Flask API endpoints to verify that the PDF upload, question submission, and PDF processing are working as expected.

To perform unit testing in Flask, you can use the unittest library. Here’s how you can write tests for the /upload and /ask routes:

  1. Create a test file, e.g., test_app.py:
import unittest
from app import app
import os

class PDFQABotTestCase(unittest.TestCase):
    
    def setUp(self):
        # Create a test client
        self.client = app.test_client()
        self.client.testing = True
        
        # Create a test directory for uploading files
        if not os.path.exists('uploads'):
            os.makedirs('uploads')

    def tearDown(self):
        # Clean up after tests (e.g., remove any uploaded files)
        for filename in os.listdir('uploads'):
            os.remove(os.path.join('uploads', filename))

    def test_upload_pdf(self):
        """Test uploading a PDF file"""
        with open('sample.pdf', 'rb') as f:
            response = self.client.post('/upload', data={'pdf': f})
        
        self.assertEqual(response.status_code, 200)
        self.assertIn('PDF uploaded successfully', response.get_json()['message'])
        
    def test_invalid_pdf_format(self):
        """Test uploading a non-PDF file"""
        with open('sample.txt', 'w') as f:
            f.write("This is a sample text file.")
        
        with open('sample.txt', 'rb') as f:
            response = self.client.post('/upload', data={'pdf': f})
        
        self.assertEqual(response.status_code, 400)
        self.assertIn('Invalid file format. Please upload a PDF.', response.get_json()['error'])
    
    def test_ask_question(self):
        """Test submitting a question"""
        data = {
            'question': 'What is the termination clause?',
            'pdf_path': 'uploads/sample.pdf'  # Assuming the PDF is already uploaded
        }
        
        response = self.client.post('/ask', json=data)
        
        self.assertEqual(response.status_code, 200)
        self.assertIn('answer', response.get_json())
        
if __name__ == '__main__':
    unittest.main()

In this file:

  • setUp: Sets up the test client and creates a test directory for storing uploaded files.
  • tearDown: Cleans up after each test by deleting any files uploaded during testing.
  • Test cases:
    • test_upload_pdf: Verifies that uploading a PDF returns a success message.
    • test_invalid_pdf_format: Verifies that uploading a non-PDF file returns an error.
    • test_ask_question: Verifies that submitting a question returns an answer from the API.

To run the tests, use the command:

python -m unittest test_app.py

9.2. Testing OCR Accuracy

Since the accuracy of OCR is crucial for extracting correct information from scanned PDFs, it’s important to test the OCR process separately. Here are a few things to check:

  1. OCR Text Quality: Test different types of scanned PDFs (high-quality scans vs. low-quality scans) to evaluate how well the Perplexity API handles OCR content.
  2. Handwriting: If the PDFs contain handwritten text, check whether the OCR can accurately convert it into machine-readable text.
  3. Text Layout: Ensure that the text is extracted correctly from images (e.g., correct order of text and proper handling of multi-column documents).

For Perplexity API, you can test OCR by uploading PDFs that contain images or scanned text and checking the results returned by the API.


9.3. Fine-Tuning Perplexity API Queries

To get the best responses from the Perplexity API, fine-tuning your queries can improve the relevance and accuracy of the answers. Here are a few tips:

  1. Preprocessing User Queries:
    • As mentioned earlier, preprocess user queries by cleaning up unnecessary punctuation or keywords.
    • Ensure the query is clear and concise. Avoid ambiguous or overly broad questions.
  2. Optimizing the Input Text:
    • When sending the document to Perplexity, ensure that the text is concise. Large documents can sometimes overwhelm the model, so break them into smaller sections if needed.
    • Consider filtering out irrelevant sections of the text (using keywords or headings) to provide the model with more context.

Example of query optimization:

def preprocess_query(query, full_text):
    # Remove unnecessary keywords (e.g., "What is", "Tell me about")
    cleaned_query = query.strip().lower()
    
    # Focus on relevant sections of the text
    relevant_text = filter_query_by_keywords(cleaned_query, full_text)
    return relevant_text

3. Handling Uncertainty:

  • If the answer is unclear or too vague, implement fallback logic that asks the user for clarification.

9.4. Optimizing Performance for Large PDFs

Handling large PDFs efficiently is key to ensuring the bot runs smoothly, especially when dealing with multi-page documents.

Here are some tips for optimizing performance:

  1. Pagination:
    • If the PDF is too large, consider splitting it into sections or pages and processing each page separately. You can then send smaller chunks of text to Perplexity for processing.
  2. Asynchronous Processing:
    • For large PDFs, use background jobs to handle the processing asynchronously. Flask’s built-in support for synchronous operations might cause delays with large files, so using tools like Celery can help offload the work and keep the app responsive.
  3. Limit File Size:
    • Restrict the maximum file size for uploads to avoid processing massive PDFs that could slow down or crash the server.

Here’s how you can restrict the file size in Flask:

app.config['MAX_CONTENT_LENGTH'] = 16 * 1024 * 1024  # 16 MB

It looks like we have covered the main technical aspects of building the PDF QA Bot, including the backend setup, integrating with the Perplexity API, testing, and optimizing. The next logical step would be to focus on Deployment and making the bot available to users in a production environment.

However, as per your earlier request, we omitted deployment-related topics. So, if you’re ready to wrap up the development phase, we could:

10. Conclusion

In this section, we’ll summarize the entire process of building and deploying the PDF QA Bot. The goal is to give readers a sense of closure and offer suggestions for future improvements.


10.1. Recap of the Steps

Throughout this tutorial, we’ve:

  1. Set up the development environment with Flask, Perplexity API, and other necessary libraries.
  2. Built the Flask API for uploading PDFs and interacting with the Perplexity API.
  3. Extracted text from PDFs using PyPDF2, handled OCR processing, and passed the content to Perplexity for question answering.
  4. Created a simple frontend to allow users to upload PDFs, ask questions, and receive answers.
  5. Tested and optimized the bot for performance and reliability, ensuring smooth handling of large PDFs and robust error handling.

10.2. Enhancing the PDF QA Bot with New Features

There are several ways you can take this bot further:

  • Multi-language support: If your PDFs contain multiple languages, you could integrate language detection and translation APIs to handle diverse content.
  • Advanced search capabilities: Implement features like highlighting answers in the PDF or displaying related sections when a user asks a question.
  • User feedback: Allow users to provide feedback on the quality of answers, helping the model improve over time.

10.3. Final Thoughts and Next Steps

With the PDF QA Bot now built, you can deploy it for real-world use or further enhance its capabilities. Whether you choose to deploy it locally or in the cloud, the foundation for creating an intelligent, interactive chatbot based on PDF documents is already in place.

You can also explore integrating additional NLP models, chatbot frameworks, or web frameworks to take this bot to the next level.


Final Notes

Thank you for following this step-by-step guide! You’ve built a powerful PDF QA Bot capable of answering questions based on PDF documents, using Flask and Perplexity API. By testing and optimizing the bot, you’ve ensured that it will work efficiently in a production environment.

Categorized in: