
Gemini AI Integration Explained


Table of Contents

  1. Gemini AI Overview
  2. API Call Architecture
  3. Prompt Engineering
  4. Retry Mechanism and Error Handling
  5. Token Usage and Cost Control
  6. Image Generation Integration

Gemini AI Overview

What is Google Gemini?

Google Gemini is a large language model (LLM) developed by Google that can understand and generate natural language text. In this project, Gemini plays the role of an "intelligent analyst," responsible for:

  1. Data Analysis: Understanding structured data in Excel or Google Sheets
  2. Insight Generation: Generating professional market analysis insights based on data
  3. Content Creation: Generating slide copy, persona descriptions, strategic recommendations, etc.
  4. Image Generation: Generating scenario illustrations through Imagen 4.0

Why Choose Gemini?

  1. Google Ecosystem Integration: As an internal Google project, the team gets better technical support and resources by using Gemini
  2. Multimodal Capabilities: Supports both text generation and image generation, covering the project's diverse needs
  3. API Stability: Google's API services are stable and reliable, suitable for production use
  4. Controllable Costs: Compared with other AI services, Gemini's pricing is better suited to internal enterprise use

Use Cases in This Project

  • Market Opportunity Analysis: Analyze market data, generate four-quadrant analysis and strategic recommendations
  • Audience Signal Analysis: Generate audience tag recommendations and persona paragraphs based on user behavior data
  • Persona Slide: Parse and translate persona data, generate slide content and image descriptions

API Call Architecture

Frontend-Backend Separation Design

The system adopts a frontend-backend separation architecture. Gemini API calls are completely performed on the backend, with the frontend only responsible for sending requests and displaying results.

Design Advantages:

  • Security: API Keys are not exposed in frontend code
  • Unified Management: All AI call logic is centrally managed, facilitating maintenance and optimization
  • Error Handling: Backend can implement unified retry and error handling mechanisms

Call Workflow

sequenceDiagram
    participant FE as Frontend (React)
    participant BE as Backend (Node.js)
    participant GA as Google Gemini API

    FE->>BE: 1. Send analysis request<br>(contains data, Prompt, config)
    BE->>BE: 2. Validate request parameters
    BE->>BE: 3. Build Gemini API request
    BE->>GA: 4. Call Gemini API<br>(with retry mechanism)
    GA-->>BE: 5. Return AI-generated results
    BE->>BE: 6. Parse and validate results
    BE-->>FE: 7. Return structured data
    FE->>FE: 8. Render results to page

Backend API Endpoints

The system provides the following backend API endpoints:

1. Text Generation API

Endpoint: POST /api/gemini/generate

Request Parameters:

{
  model: string;              // Gemini model name, e.g., "gemini-2.5-pro"
  prompt: string;             // Prompt (contains task description and data)
  temperature?: number;       // Temperature parameter (0-1), controls output randomness
  maxTokens?: number;         // Maximum output token count
  responseMimeType?: string;  // Response format, e.g., "application/json"
}

Response Format:

{
  text: string;            // AI-generated text content
  tokenUsage?: {           // Token usage statistics
    promptTokenCount?: number;
    candidatesTokenCount?: number;
    totalTokenCount?: number;
  };
  executionTime: number;   // Execution time (milliseconds)
  retryCount: number;      // Retry count
}
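
For reference, here is a minimal sketch of how this endpoint might be wired up on the backend with Express. The route path and field names follow the documentation above; the handler body, the GEMINI_API_KEY environment variable, and the callGemini import path are illustrative assumptions, not the project's actual implementation.

import express from "express";
// callGemini is the backend wrapper with retries described in the
// "Retry Mechanism and Error Handling" section below (assumed import path).
import { callGemini } from "./geminiService";

const app = express();
app.use(express.json());

// Minimal handler for POST /api/gemini/generate
app.post("/api/gemini/generate", async (req, res) => {
  const { model, prompt, temperature, maxTokens, responseMimeType } = req.body;

  // Step 2 of the call workflow: validate request parameters
  if (!model || !prompt) {
    res.status(400).json({ error: "model and prompt are required" });
    return;
  }

  try {
    // Steps 3-5: build the request and call Gemini (with retries)
    const result = await callGemini(
      { model, prompt, temperature, maxTokens, responseMimeType },
      process.env.GEMINI_API_KEY ?? ""
    );
    // Step 7: return structured data ({ text, tokenUsage, executionTime, retryCount })
    res.json(result);
  } catch (err) {
    res.status(502).json({ error: (err as Error).message });
  }
});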

2. Image Generation API

Endpoint: POST /api/gemini/generate-image

Request Parameters:

{
  model: string;             // Image generation model, e.g., "imagen-4.0"
  prompt: string;            // Image description prompt
  aspectRatio?: string;      // Aspect ratio, e.g., "1:1"
  personGeneration?: string; // Person generation mode
}

Response Format:

{
  imageBase64: string;     // Base64-encoded image data
  tokenUsage?: {           // Token usage statistics
    promptTokenCount?: number;
    candidatesTokenCount?: number;
    totalTokenCount?: number;
  };
  executionTime: number;   // Execution time (milliseconds)
  retryCount: number;      // Retry count
}

Frontend Call Wrapper

The frontend uses backendGeminiService.ts to wrap all backend API calls, providing a unified interface and error handling.

Core Functions:

// Text generation (with retry)
export async function callBackendGeminiWithRetry(
  request: BackendGeminiRequest,
  maxRetries: number = 3,
  onRetry?: (attempt: number, maxRetries: number, delay: number) => void
): Promise<BackendGeminiResponse>

// Image generation (with retry)
export async function callBackendGeminiImageGenerationWithRetry(
  request: BackendGeminiImageRequest,
  maxRetries: number = 3,
  onRetry?: (attempt: number, maxRetries: number, delay: number) => void
): Promise<BackendGeminiImageResponse>
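
A hypothetical call site illustrating how a feature module might use the text-generation wrapper. The request fields mirror the endpoint documentation above; the function name analyzeMarketData and the temperature value are illustrative.

import { callBackendGeminiWithRetry } from "./backendGeminiService";

async function analyzeMarketData(prompt: string): Promise<unknown> {
  const response = await callBackendGeminiWithRetry(
    {
      model: "gemini-2.5-pro",
      prompt,
      temperature: 0.3,
      responseMimeType: "application/json", // ask for JSON so the result can be parsed directly
    },
    3, // maxRetries
    (attempt, maxRetries, delay) => {
      // Surface retry progress to the UI (e.g., a status toast)
      console.log(`Retrying Gemini call (${attempt}/${maxRetries}), next attempt in ${delay}ms`);
    }
  );

  // The backend returns raw text; the frontend parses it into structured data
  return JSON.parse(response.text);
}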

Prompt Engineering

What is Prompt Engineering?

Prompt Engineering refers to designing and optimizing instructions (Prompts) input to AI models to obtain the best output results. In this project, Prompt quality directly determines the quality and accuracy of AI-generated content.

A good Prompt is not simply asking the AI a question, but rather setting a role, a detailed action guide, and a clear output format for it. It's like writing an "action script" for the AI.

Prompt Design Principles

1. Clear Task Objectives

Prompts should clearly state the task the AI needs to complete, including:

  • Role Definition: What role the AI plays (e.g., "Senior Google Digital Marketing Strategist")
  • Task Description: What specific task to complete
  • Output Requirements: Expected output format and content

Example (Market Opportunity Analysis):

You are a senior Google Digital Marketing Strategist (DMS), conducting a complete "Market Opportunity Analysis."

Your task is to identify business opportunities for different markets (countries/regions) across different categories based on the provided market data, and generate a professional analysis report.

Output Requirements:
1. Must output valid JSON format
2. Include market classification (Opportunity Market, Challenge Market, Potential Market, Blue Ocean Market)
3. Each market includes priority analysis and strategic recommendations

2. Provide Context Information

Prompts should contain sufficient context information to help the AI understand data meaning and business background.

Example:

Industry Name: ${industryName}
Analysis Period: ${popPeriod}
Output Language: ${outputLanguage}

Data Field Descriptions:
- CPC: Cost per click, higher values indicate more intense competition
- Clicks: Click volume, indicates market demand intensity
- Median Demand: Median market demand
- Median Competition: Median market competition
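To show how these placeholders are filled at runtime, here is a minimal template-interpolation sketch. The buildContextSection helper is hypothetical; the variable names mirror the placeholders above.

function buildContextSection(
  industryName: string,
  popPeriod: string,
  outputLanguage: string
): string {
  // Interpolate runtime values so the AI sees the concrete business context
  return `
Industry Name: ${industryName}
Analysis Period: ${popPeriod}
Output Language: ${outputLanguage}

Data Field Descriptions:
- CPC: Cost per click, higher values indicate more intense competition
- Clicks: Click volume, indicates market demand intensity
- Median Demand: Median market demand
- Median Competition: Median market competition
`.trim();
}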

3. Define Output Format

Clearly specify output format to ensure AI-generated content can be correctly parsed by programs.

Example (JSON format requirements):

Output Format Requirements:
- Must output a valid JSON object (not an array)
- Must include the following fields:
* subtitle: string (subtitle)
* priority1: { title: string, description: string, highlights: string[], strategies: string[] }
* priority2: { title: string, description: string, highlights: string[], strategies: string[] }
- All string fields cannot be empty
- JSON format must be correct, can be parsed by JSON.parse()

4. Provide Examples

Provide examples in Prompts to help the AI understand expected output format and content quality.

Example:

Example JSON Format:
{
  "subtitle": "Based on high demand, low competition market characteristics, we identify the following opportunity markets...",
  "priority1": {
    "title": "First Priority",
    "description": "Based on high demand, low competition market characteristics",
    "highlights": ["US market performs outstandingly in electronics category", "CPC relatively low, less competitive pressure"],
    "strategies": ["Increase ad investment", "Optimize keyword strategy"]
  }
}
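
Because the prompt promises JSON that "can be parsed by JSON.parse()", the calling code can validate the structure before using it. A minimal sketch, assuming the field layout described above; the interfaces and the parseAnalysisResult helper are illustrative.

interface PrioritySection {
  title: string;
  description: string;
  highlights: string[];
  strategies: string[];
}

interface AnalysisResult {
  subtitle: string;
  priority1: PrioritySection;
  priority2: PrioritySection;
}

function parseAnalysisResult(raw: string): AnalysisResult {
  // Throws if the model returned malformed JSON; the caller can then retry or report the error
  const data = JSON.parse(raw) as Partial<AnalysisResult>;

  // Enforce the "no empty string fields" rule from the prompt
  if (!data.subtitle || !data.priority1?.title || !data.priority2?.title) {
    throw new Error("AI response is missing required fields");
  }
  return data as AnalysisResult;
}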

Real Case: Market Opportunity Analysis Prompt Design

In the "Market Opportunity Analysis" feature, we designed two core Prompts, used for data processing and chart summarization respectively. Let's break down the design insights of these Prompts in detail.

Prompt 1: Data Processing and Strategic Recommendation Generation

This Prompt is the first and most critical step of the entire feature. Its core task is to transform user-uploaded raw data into structured, insightful analysis results and generate preliminary strategic recommendations.

Design Point Breakdown:

  1. Role Playing:

    You are a Google Ads marketing strategy consultant. Your task is to analyze the provided market data to help clients determine country expansion priorities.

    • Insight: We first set a clear identity for the AI—a Google Ads marketing strategy consultant. This is not just a title; it effectively activates the AI model's internal knowledge about marketing, data analysis, and business strategy, making its responses more professional and closer to our business scenarios.
  2. Input Data Description:

    Input data is a JSON object array... "Clicks" represents market demand size... "CPC" represents market competition intensity.

    • Insight: We explicitly tell the AI what format of data it will receive and the business meaning of each key field (such as Clicks, CPC). This avoids AI misunderstanding of data and ensures analysis accuracy.
  3. Strict, Step-by-Step Action Instructions:

    Please strictly follow these steps... 1. Filter out all rows with "All selected query entities". 2. For each unique "Query Set Name"... a. First, sort by "Clicks" from high to low, find the top 50 countries for this category. b. Based on data from these top 50 countries, calculate ... median. c. Next ... select the top \${topN} countries with highest "Clicks". d. For each of these top \${topN} countries, create a new JSON object...

    • Insight: This is the core of the entire Prompt. We don't give the AI a vague "please help me analyze" instruction, but break down the senior analyst's complete analysis workflow into a series of machine-executable, unambiguous steps. This includes data cleaning, calculating benchmarks (medians), filtering core targets, and finally generating structured output. This workflow itself is the team's valuable knowledge asset.
  4. Define Analysis Model (Four Quadrants):

    - "Market Type": String generated based on the above two indicators: - High Competition=1 & High Demand=1 -> "Opportunity Market" - High Competition=1 & High Demand=0 -> "Challenge Market" - High Competition=0 & High Demand=0 -> "Potential Market" - High Competition=0 & High Demand=1 -> "Blue Ocean Market"

    • Insight: Here we define our own analysis model. By having the AI calculate whether each country's demand and competition levels are above the median, we teach the AI how to use the classic "four-quadrant analysis method" to classify markets. This is a typical example of endowing machines with human business wisdom. The same classification rule is shown as code in the sketch after this list.
  5. Structured Output Requirements:

    4. Finally, provide a general strategic recommendation of approximately 350-500 words in \${outputLanguage}. The strategic recommendation must be strictly divided into two paragraphs, format as follows... **First Paragraph (Conclusion Content...)**:... **Second Paragraph (Data Support Description...)**:...

    • Insight: We not only require the AI to analyze but also impose extremely strict requirements on its output format. We require it to generate structured text, containing "conclusion" and "data support" parts, with detailed regulations on word count and content points for each part. This ensures AI-generated content can be directly used for report creation, rather than disorganized text. We even consider the special nature of the "India market" and write it into the Prompt as a rule, which reflects deep embedding of domain knowledge.
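
The four-quadrant rule taught to the AI in point 4 above can also be expressed directly in code. A minimal sketch, assuming per-country Clicks/CPC values and category medians have already been computed; the classifyMarket helper is illustrative, not project code.

type MarketType =
  | "Opportunity Market"
  | "Challenge Market"
  | "Potential Market"
  | "Blue Ocean Market";

function classifyMarket(
  clicks: number,
  cpc: number,
  medianClicks: number,
  medianCpc: number
): MarketType {
  const highDemand = clicks >= medianClicks;   // "High Demand = 1" in the prompt
  const highCompetition = cpc >= medianCpc;    // "High Competition = 1" in the prompt

  if (highCompetition && highDemand) return "Opportunity Market";
  if (highCompetition && !highDemand) return "Challenge Market";
  if (!highCompetition && !highDemand) return "Potential Market";
  return "Blue Ocean Market";                  // low competition, high demand
}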

Prompt 2: Chart Summary Generation

When the first step's data processing is complete, the frontend generates a visual bubble chart based on the results. The second Prompt's task is to let the AI "understand" this chart and write insightful, business-focused text summaries for it.

Design Points:

  • Clear Role: You are a professional Google Ads market strategist...
  • Strict Output Format: Requires outputting a JSON object containing fields like slideTitle, priority1_highlight, etc., allowing the frontend to directly read and fill into corresponding positions in the slide template.
  • Fine-Grained Rule Constraints:
    • Strictly prohibit any specific numbers in analysis: Forces the AI to use qualitative, more business-insightful language (such as "above median," "strong growth") for description, rather than simply restating numbers.
    • Analysis about India: Again emphasizes the special nature of the India market, ensuring AI recommendations are both data-based and aligned with real business environments.
    • Clear Priority Definition: We directly tell the AI how to define "first priority" and "second priority" markets based on data positions in different quadrants, and provide targeted strategic recommendation directions.

Design Insight Summary:

Through the above breakdown, you can see that the "intelligence" of CSA 3A (InsightHub) does not come out of nowhere. It comes from the project founder systematically "encoding" their professional experience, analysis frameworks, and domain knowledge as a data strategist into these Prompts, effectively "teaching" them to Gemini AI through Prompt Engineering.

The thinking, design, and repeated debugging behind these Prompts are what allow the project to produce high-quality, standardized reports, and they are also its biggest technical highlight and value. This also gives our future collaboration a clear direction: when we need to optimize the analysis logic or add new analysis dimensions, what we often need to optimize first are the Prompts that drive the AI brain.

Prompt Template Management

The system uses the PromptManager feature to manage Prompt templates, supporting:

  1. Preset Templates: Default templates provided by the system
  2. Custom Templates: Users can create and edit their own templates
  3. Template Version Management: Records template creation and update times
  4. Template Sharing: Team members can share templates

Template Storage Format (YAML):

id: geo-analysis-preset-default-zh
name: Market Opportunity Analysis Template (Simplified Chinese)
workflowType: geo-analysis
isPreset: true
creatorName: System
createdAt: '2025-01-01T00:00:00.000Z'
prompts:
  dataProcessingPrompt:
    editable: true
    content: |
      You are a senior Google Digital Marketing Strategist...
  chartSummaryPrompt:
    editable: true
    content: |
      Based on the following market data, generate a professional analysis report...
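
A minimal sketch of how such a template file might be loaded and handed to PromptManager, assuming a YAML parser such as js-yaml; the PromptTemplate interface and the file path are illustrative.

import fs from "node:fs";
import yaml from "js-yaml";

interface PromptTemplate {
  id: string;
  name: string;
  workflowType: string;
  isPreset: boolean;
  creatorName: string;
  createdAt: string;
  prompts: Record<string, { editable: boolean; content: string }>;
}

function loadTemplate(path: string): PromptTemplate {
  // Parse the YAML file into a typed template object for PromptManager
  return yaml.load(fs.readFileSync(path, "utf8")) as PromptTemplate;
}

const template = loadTemplate("templates/geo-analysis-preset-default-zh.yaml");
console.log(template.prompts.dataProcessingPrompt.content);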

Retry Mechanism and Error Handling

Why Do We Need a Retry Mechanism?

Gemini API calls may fail for the following reasons:

  • Service Overload: API service temporarily unavailable (503 error)
  • Rate Limiting: Request frequency exceeds limit (429 error)
  • Network Issues: Unstable network connection or timeout
  • Temporary Errors: Temporary server-side failures

Retry mechanisms can automatically handle these temporary errors, improving system stability and user experience.

Error Classification

The system classifies errors into the following categories:

Error Type          | Status / Error Code      | Retryable | Retry Delay
Service Overload    | 503                      | ✅ Yes    | 5 seconds
Rate Limiting       | 429                      | ✅ Yes    | 10 seconds
Service Unavailable | UNAVAILABLE              | ✅ Yes    | 3 seconds
Network Error       | ECONNREFUSED / ETIMEDOUT | ✅ Yes    | 2 seconds
Other Errors        | Other                    | ❌ No     | -
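
A hypothetical classifyError helper showing how the table above could map onto code; the exact error shapes exposed by the Gemini SDK and Node.js may differ, so treat this as a sketch.

interface ErrorInfo {
  retryable: boolean;
  retryAfter?: number; // suggested wait in seconds
  code: string;
}

function classifyError(error: any): ErrorInfo {
  const status = error?.status ?? error?.code;
  const message = String(error?.message ?? "");

  if (status === 503 || /overloaded/i.test(message)) {
    return { retryable: true, retryAfter: 5, code: "OVERLOADED" };
  }
  if (status === 429) {
    return { retryable: true, retryAfter: 10, code: "RATE_LIMITED" };
  }
  if (message.includes("UNAVAILABLE")) {
    return { retryable: true, retryAfter: 3, code: "UNAVAILABLE" };
  }
  if (message.includes("ECONNREFUSED") || message.includes("ETIMEDOUT")) {
    return { retryable: true, retryAfter: 2, code: "NETWORK" };
  }
  return { retryable: false, code: "UNKNOWN" }; // non-retryable by default
}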

Exponential Backoff Strategy

The system uses an exponential backoff strategy to avoid putting excessive pressure on the API service.

Backoff Algorithm:

  • Initial Delay: 2 seconds
  • Each Retry: Delay time doubles (2 seconds → 4 seconds → 8 seconds → ...)
  • Maximum Retries: 10 times (text generation) or 3 times (image generation)

Implementation Code (server/geminiService.ts):

export async function callGemini(
  config: GeminiCallConfig,
  apiKey: string,
  maxRetries: number = 10,
  initialDelay: number = 2000,
  context?: { sessionUuid?: string; featureType?: string; userId?: string }
): Promise<GeminiCallResult> {
  let attempt = 0;
  let delay = initialDelay; // milliseconds
  const startTime = Date.now();

  while (attempt < maxRetries) {
    try {
      // Call the Gemini API
      const response = await ai.models.generateContent({...});
      return { text: response.text, ... };
    } catch (error: any) {
      attempt++;
      const errorInfo = classifyError(error);

      // If the error is not retryable or the retry limit is reached, throw
      if (!errorInfo.retryable || attempt >= maxRetries) {
        throw new GeminiCallError(...);
      }

      // Wait, then retry (exponential backoff).
      // errorInfo.retryAfter is given in seconds; delay is already in milliseconds.
      const waitMs = errorInfo.retryAfter ? errorInfo.retryAfter * 1000 : delay;
      await new Promise((resolve) => setTimeout(resolve, waitMs));
      delay *= 2; // double the delay for the next attempt
    }
  }
}

Error Logging

The system records detailed error logs for problem diagnosis and performance optimization.

Log Format (conforming to Google Cloud Run's JSON log format):

{
  "timestamp": "2025-01-XXT12:00:00.000Z",
  "severity": "WARNING",
  "message": "Gemini API call failed (attempt 2/10)",
  "model": "gemini-2.5-pro",
  "sessionUuid": "abc123",
  "featureType": "geo-analysis",
  "attempt": 2,
  "error": "Service temporarily unavailable",
  "errorCode": "OVERLOADED",
  "retryable": true,
  "retryAfter": 5
}

Token Usage and Cost Control

What is a Token?

A Token is the basic unit for AI models to process text. A Token may be a word, a character, or a punctuation mark. Gemini API billing is based on Token usage.

Token Calculation Examples:

  • "Hello" = 1 Token
  • "Hello world" = 2 Tokens
  • "你好" = 2 Tokens (Chinese characters are usually one Token per character)

Token Usage Statistics

The system records Token usage for each API call:

interface TokenUsage {
  promptTokenCount?: number;     // Input Token count (Prompt)
  candidatesTokenCount?: number; // Output Token count (AI-generated content)
  totalTokenCount?: number;      // Total Token count
}
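
A sketch of how per-call usage might be accumulated for cost reporting. The helper functions are illustrative, and the per-1K-token prices are placeholder values, not actual Gemini pricing.

function addTokenUsage(total: TokenUsage, call: TokenUsage): TokenUsage {
  // Accumulate per-call usage into a running total for a session or feature
  return {
    promptTokenCount: (total.promptTokenCount ?? 0) + (call.promptTokenCount ?? 0),
    candidatesTokenCount: (total.candidatesTokenCount ?? 0) + (call.candidatesTokenCount ?? 0),
    totalTokenCount: (total.totalTokenCount ?? 0) + (call.totalTokenCount ?? 0),
  };
}

// Placeholder rates; substitute the actual pricing for the model in use
function estimateCostUSD(
  usage: TokenUsage,
  inputPricePer1K = 0.001,
  outputPricePer1K = 0.004
): number {
  return (
    ((usage.promptTokenCount ?? 0) / 1000) * inputPricePer1K +
    ((usage.candidatesTokenCount ?? 0) / 1000) * outputPricePer1K
  );
}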

Cost Optimization Strategies

1. Prompt Optimization

  • Streamline Prompts: Remove unnecessary descriptions and examples
  • Structured Data: Use JSON format to pass data instead of natural language descriptions
  • Batch Processing: Process large amounts of data in batches to avoid overly large single requests

2. Output Control

  • Limit Output Length: Use maxTokens parameter to limit output length
  • Specify Output Format: Use responseMimeType: "application/json" to ensure correct output format and reduce invalid output

3. Caching Mechanism

  • Result Caching: For analysis requests with the same input, results can be cached to avoid duplicate calls
  • Template Caching: Prompt templates can be cached to avoid repeated construction
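
A minimal in-memory sketch of result caching keyed by a hash of the prompt; the actual project may use a different cache store or key scheme.

import { createHash } from "node:crypto";

const resultCache = new Map<string, string>();

async function generateWithCache(
  prompt: string,
  generate: (p: string) => Promise<string>
): Promise<string> {
  // Identical prompts hash to the same key, so repeated analyses reuse the cached result
  const key = createHash("sha256").update(prompt).digest("hex");
  const cached = resultCache.get(key);
  if (cached) return cached;

  const text = await generate(prompt);
  resultCache.set(key, text);
  return text;
}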

Token Usage Monitoring

The system records Token usage in logs for monitoring and cost analysis:

{
  "timestamp": "2025-01-XXT12:00:00.000Z",
  "severity": "INFO",
  "message": "Gemini API call successful",
  "tokenUsage": {
    "promptTokenCount": 1500,
    "candidatesTokenCount": 800,
    "totalTokenCount": 2300
  },
  "executionTime": 3500
}

Image Generation Integration

Imagen 4.0 Overview

Imagen 4.0 is an image generation model developed by Google that can generate high-quality images through text descriptions. In this project, Imagen 4.0 is used to generate scenario illustrations for personas.

Image Generation Workflow

sequenceDiagram
    participant FE as Frontend
    participant BE as Backend
    participant IA as Imagen 4.0 API

    FE->>BE: 1. Send image generation request<br>(contains image description Prompt)
    BE->>BE: 2. Build image generation config
    BE->>IA: 3. Call Imagen 4.0 API<br>(with retry mechanism)
    IA-->>BE: 4. Return Base64-encoded image
    BE->>BE: 5. Validate and optimize image
    BE-->>FE: 6. Return image data
    FE->>FE: 7. Display image on page

Image Generation Configuration

Request Parameters:

{
  model: "imagen-4.0";
  prompt: string;                  // Image description, e.g., "A modern gaming setup with RGB lighting..."
  aspectRatio?: "1:1";             // Aspect ratio, 1:1 means square
  personGeneration?: "DONT_ALLOW"; // Person generation mode
}
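
A hypothetical call site using the frontend image wrapper described earlier; the request fields follow the configuration above, and the data-URL conversion shows one way to display the Base64 payload.

import { callBackendGeminiImageGenerationWithRetry } from "./backendGeminiService";

async function generatePersonaIllustration(description: string): Promise<string> {
  const response = await callBackendGeminiImageGenerationWithRetry({
    model: "imagen-4.0",
    prompt: description,
    aspectRatio: "1:1",
    personGeneration: "DONT_ALLOW",
  });

  // Convert the Base64 payload into a data URL usable directly as an <img> src
  return `data:image/png;base64,${response.imageBase64}`;
}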

Image Description Prompt Design:

Image description Prompts should:

  1. Be Specific and Clear: Describe specific scene details (colors, style, elements, etc.)
  2. Meet Business Needs: Generated images should match persona characteristics
  3. Avoid Sensitive Content: Don't use descriptions that may trigger content review

Example:

Generate a professional, modern scene illustration showing:
- A young professional (age 25-34) in a casual business setting
- Modern technology devices (laptop, smartphone) visible
- Clean, minimalist design style
- Warm, inviting color palette (blues and whites)
- Professional atmosphere suitable for business presentation
- 1:1 aspect ratio, 250px x 250px resolution

Image Generation Error Handling

Image generation may fail for the following reasons:

  • Content Review: Description content triggers review mechanism
  • Service Overload: API service temporarily unavailable
  • Timeout: Generation time too long (default timeout 120 seconds)

The system will:

  1. Auto Retry: For retryable errors, automatically retry (up to 3 times)
  2. Use Placeholder: If generation fails, use placeholder image
  3. Record Errors: Record detailed error logs for problem diagnosis
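
A sketch of how this retry-then-placeholder behavior might look at the call site, reusing the generatePersonaIllustration helper sketched above; the placeholder path is hypothetical.

// Hypothetical placeholder asset shown when generation fails
const PLACEHOLDER_IMAGE_URL = "/assets/persona-placeholder.png";

async function getPersonaImage(description: string): Promise<string> {
  try {
    // Up to 3 automatic retries happen inside the wrapper
    return await generatePersonaIllustration(description);
  } catch (error) {
    // Record the failure for diagnosis, then fall back to the placeholder
    console.error("Image generation failed, using placeholder", error);
    return PLACEHOLDER_IMAGE_URL;
  }
}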

Summary

This section detailed how Gemini AI is integrated in this project, including:

  1. API Call Architecture: Frontend-backend separation design, unified backend API management
  2. Prompt Engineering: How to design and optimize Prompts to obtain best output
  3. Retry Mechanism: Exponential backoff strategy and error classification handling
  4. Token Usage: Cost control and monitoring mechanisms
  5. Image Generation: Imagen 4.0 integration and image generation workflow

These technical implementations ensure the system can stably and efficiently use Gemini AI to generate high-quality analysis content.


Related Documentation: