
Core Technical Implementation Breakdown


Table of Contents

  1. Excel File Upload Implementation
  2. Google Sheet Link Processing Implementation
  3. Data Parsing and Normalization
  4. Browser Local Storage Mechanism
  5. Slide-Style HTML Rendering and Image Export
  6. Session ZIP Export and Import

Excel File Upload Implementation

Feature Overview

Excel file upload is one of the basic data input methods for the system's three core features. Users can select and upload Excel files in .xlsx or .xls format through the browser interface, and the system will automatically parse the file content and extract the required data worksheets.

Technical Implementation Workflow

1. Frontend File Selection Component

The system uses the FileUpload component (located at src/components/FileUpload.tsx) to handle file selection. This component is based on Semi Design's Upload component but intercepts the default network upload behavior through the customRequest function, switching to direct frontend file processing.

Core Code Logic:

// Custom upload request, no actual network request sent
const customRequest = ({ file, fileInstance, onProgress, onSuccess, onError }) => {
  try {
    // Get the real File object
    const realFile = fileInstance || (file as any)?.originFile || (file as any);

    // Directly call the callback function passed from parent component, passing the file object
    onFileSelect(realFile as File);

    // Simulate upload progress (for UI feedback)
    let loaded = 0;
    const total = 100;
    timerRef.current = window.setInterval(() => {
      loaded = Math.min(loaded + 20, total);
      onProgress && onProgress({ total, loaded });
      if (loaded === total) {
        clearInterval(timerRef.current);
        onSuccess && onSuccess({});
      }
    }, 200);
  } catch (e) {
    onError && onError({ status: 500 }, e as Event);
  }
};

Design Points:

  • No Network Upload: Files are not actually sent to the server; all processing is completed in the browser
  • Progress Simulation: Simulates an upload progress bar for better user experience
  • Direct Passing: File object is directly passed to the parent component, which decides how to handle it

2. File Reading and Parsing

After file selection is complete, the parent component (such as GeoAnalysisPage, AudienceSignalPage) calls the corresponding parsing function. The system uses the XLSX library (SheetJS) to parse Excel files.

Parsing Workflow:

  1. File Reading: Use the browser's FileReader API to read the file as ArrayBuffer
  2. Workbook Parsing: Use XLSX.read() to parse the ArrayBuffer into a workbook object
  3. Worksheet Extraction: Extract specific worksheets based on feature requirements (such as sublocations, input, etc.)
  4. Data Conversion: Use XLSX.utils.sheet_to_json() to convert worksheet data to JSON format

Core Code Example (using Market Opportunity Analysis as an example):

export const parseExcelFile = (file: File, onProgress: LogCallback): Promise<RawData> => {
  return new Promise((resolve, reject) => {
    const reader = new FileReader();

    reader.onload = (event: ProgressEvent<FileReader>) => {
      if (!event.target?.result) {
        return reject(new Error("File reading failed."));
      }

      try {
        // 1. Convert file content to Uint8Array
        const data = new Uint8Array(event.target.result as ArrayBuffer);

        // 2. Parse workbook using XLSX library
        const workbook = XLSX.read(data, { type: 'array', cellDates: true });

        // 3. Extract industry name (from cell B4 of the input worksheet)
        const inputSheet = workbook.Sheets[workbook.SheetNames[0]];
        const industryName = inputSheet['B4']?.v || '';

        // 4. Find target worksheet (e.g., 'sublocations')
        const targetSheetName = workbook.SheetNames.find(name =>
          name.toLowerCase().includes('sublocation')
        ) || workbook.SheetNames[1];

        // 5. Convert worksheet data to JSON
        const dataSheet = workbook.Sheets[targetSheetName];
        const jsonData: any[] = XLSX.utils.sheet_to_json(dataSheet);

        // 6. Data normalization processing
        const locationSheet = jsonData.map(row => ({
          // Clean and convert data fields
          category: normalizeCategory(row['品类名称'] || row['Category']),
          country: row['国家名称'] || row['Country'],
          cpc: parseFloat(String(row['CPC'] || 0)),
          // ... other fields
        }));

        resolve({
          industryName,
          popPeriod: extractPopPeriod(jsonData),
          locationSheet,
        });
      } catch (error) {
        reject(new Error(`Failed to parse Excel file: ${error.message}`));
      }
    };

    reader.onerror = () => {
      reject(new Error('File reading failed'));
    };

    // Start reading file
    reader.readAsArrayBuffer(file);
  });
};

Error Handling:

  • File format validation: Check file extension and MIME type
  • Worksheet existence check: Ensure required worksheets exist
  • Data completeness validation: Check if required fields exist
  • Exception catching: Use try-catch to catch all possible parsing errors
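
For illustration, a minimal sketch of the format check, assuming a hypothetical validateExcelFile helper (the accepted extensions and MIME types listed here are assumptions, not the project's actual constants):

// Hypothetical helper: basic extension and MIME type check before parsing
const ACCEPTED_EXTENSIONS = ['.xlsx', '.xls'];
const ACCEPTED_MIME_TYPES = [
  'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet', // .xlsx
  'application/vnd.ms-excel', // .xls
];

export function validateExcelFile(file: File): { valid: boolean; error?: string } {
  const hasValidExtension = ACCEPTED_EXTENSIONS.some(ext =>
    file.name.toLowerCase().endsWith(ext)
  );
  // Some browsers report an empty MIME type, so treat an empty value as acceptable
  const hasValidMime = !file.type || ACCEPTED_MIME_TYPES.includes(file.type);

  if (!hasValidExtension) {
    return { valid: false, error: 'Please upload a .xlsx or .xls file' };
  }
  if (!hasValidMime) {
    return { valid: false, error: `Unexpected file type: ${file.type}` };
  }
  return { valid: true };
}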

Google Sheet Link Processing Implementation

Feature Overview

In addition to uploading local Excel files, the system also supports users pasting Google Sheet links to automatically read cloud data. This feature requires integration with Google OAuth 2.0 authentication and Google Sheets API.

Technical Implementation Workflow

1. URL Validation and Spreadsheet ID Extraction

The system uses the GoogleSheetInput component (located at src/components/GoogleSheetInput.tsx) to handle user-input Google Sheet links.

URL Format Validation:

// Validate Google Sheet URL format
export function isValidGoogleSheetsUrl(url: string): boolean {
  const pattern = /^https?:\/\/docs\.google\.com\/spreadsheets\/d\/[a-zA-Z0-9-_]+/;
  return pattern.test(url);
}

// Extract Spreadsheet ID from URL
export function extractSpreadsheetId(url: string): string | null {
  const match = url.match(/\/spreadsheets\/d\/([a-zA-Z0-9-_]+)/);
  return match ? match[1] : null;
}

Supported URL Formats:

  • https://docs.google.com/spreadsheets/d/SPREADSHEET_ID/edit
  • https://docs.google.com/spreadsheets/d/SPREADSHEET_ID/edit#gid=0
  • https://docs.google.com/spreadsheets/d/SPREADSHEET_ID

2. User Authentication and Permission Check

Before reading Google Sheet data, the system needs to:

  1. Check User Login Status: Ensure user has logged in via Google OAuth
  2. Verify Access Token: Check if OAuth token is valid and not expired
  3. Check Sheet Access Permission: Verify user has permission to access the Sheet

Permission Check Implementation (src/services/googleSheetsService.ts):

export async function checkSheetAccess(
  spreadsheetId: string,
  accessToken: string
): Promise<{ hasAccess: boolean; error?: string; metadata?: any; isScopeIssue?: boolean }> {
  try {
    // Call Google Sheets API v4 to get Sheet metadata
    const response = await fetch(
      `https://sheets.googleapis.com/v4/spreadsheets/${spreadsheetId}?fields=properties,sheets`,
      {
        headers: {
          'Authorization': `Bearer ${accessToken}`,
        },
      }
    );

    if (response.status === 404) {
      return { hasAccess: false, error: 'Sheet does not exist or has been deleted' };
    }

    if (response.status === 403) {
      // Check if it's a scope issue
      const errorData = await response.json();
      const isScopeIssue = errorData.error?.message?.includes('insufficient authentication scopes');
      return {
        hasAccess: false,
        error: 'No access permission',
        isScopeIssue
      };
    }

    if (response.status === 401) {
      return {
        hasAccess: false,
        error: 'Access token expired, please log in again',
        isScopeIssue: false
      };
    }

    if (!response.ok) {
      return { hasAccess: false, error: `Access failed: ${response.statusText}` };
    }

    const metadata = await response.json();
    return { hasAccess: true, metadata };
  } catch (error) {
    return { hasAccess: false, error: `Network error: ${error.message}` };
  }
}

3. Data Reading and Conversion

If permission check passes, the system reads Sheet data and converts it to Excel-compatible format.

Data Reading Workflow:

export async function readSheetAsFile(
  spreadsheetId: string,
  accessToken: string,
  onProgress?: (message: string) => void
): Promise<File> {
  // 1. Get Sheet metadata (all worksheet list)
  onProgress?.('Getting worksheet list...');
  const metadataResponse = await fetch(
    `https://sheets.googleapis.com/v4/spreadsheets/${spreadsheetId}`,
    { headers: { 'Authorization': `Bearer ${accessToken}` } }
  );
  const metadata = await metadataResponse.json();

  // 2. Create workbook using XLSX library
  const workbook = XLSX.utils.book_new();

  // 3. Iterate through each worksheet, read data
  for (const sheet of metadata.sheets) {
    const sheetName = sheet.properties.title;
    onProgress?.(`Reading worksheet: ${sheetName}...`);

    // Call Google Sheets API to read worksheet data
    const dataResponse = await fetch(
      `https://sheets.googleapis.com/v4/spreadsheets/${spreadsheetId}/values/${encodeURIComponent(sheetName)}`,
      { headers: { 'Authorization': `Bearer ${accessToken}` } }
    );
    const data = await dataResponse.json();

    // 4. Convert data to XLSX format
    const worksheet = XLSX.utils.aoa_to_sheet(data.values || []);

    // 5. Clean worksheet name (Excel limitation: 31 characters, cannot contain special characters)
    const safeSheetName = sanitizeExcelSheetName(sheetName);
    XLSX.utils.book_append_sheet(workbook, worksheet, safeSheetName);
  }

  // 6. Convert workbook to binary data
  const excelBuffer = XLSX.write(workbook, { type: 'array', bookType: 'xlsx' });

  // 7. Create File object (consistent with uploaded Excel file format)
  const blob = new Blob([excelBuffer], {
    type: 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet'
  });
  const fileName = `${metadata.properties.title || 'sheet'}.xlsx`;

  return new File([blob], fileName, {
    type: 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet'
  });
}

Key Design Points:

  • Worksheet Name Cleaning: Excel worksheet names have length limits (31 characters) and character restrictions; the system automatically cleans them
  • Data Format Unification: Finally converted to standard Excel file format, completely consistent with local file upload processing workflow
  • Progress Feedback: Real-time feedback on reading progress through onProgress callback function
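
The sanitizeExcelSheetName function referenced above is not shown in this section; the following is a minimal sketch of what such a cleaner could look like (the exact replacement rules in the codebase may differ):

// Illustrative sketch: enforce Excel's worksheet naming rules
// (max 31 characters; characters : \ / ? * [ ] are not allowed; name cannot be empty)
function sanitizeExcelSheetName(name: string): string {
  const cleaned = name.replace(/[:\\\/?*\[\]]/g, '_').trim(); // replace forbidden characters
  return (cleaned || 'Sheet').slice(0, 31); // fall back to a default name and cap at 31 characters
}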

4. Automatic Re-authentication Mechanism

If the user has not logged in when entering a Sheet link, the system will:

  1. Save Pending Request: Save Sheet URL to sessionStorage
  2. Trigger Login Flow: Guide user to complete Google OAuth login
  3. Automatically Resume Processing: After login completes, automatically restore from sessionStorage and continue processing

Implementation Code:

// Save pending Sheet request
const rememberPendingValidationHelen = useCallback((url: string) => {
  const payload = {
    sheetUrl: url,
    timestamp: Date.now(),
  };
  sessionStorage.setItem(PENDING_SHEET_KEY_HELEN, JSON.stringify(payload));
}, []);

// Automatically resume after login completes
useEffect(() => {
  if (user?.access_token && shouldAutoValidateHelen) {
    handleValidateAndLoad();
    setShouldAutoValidateHelen(false);
  }
}, [user?.access_token, shouldAutoValidateHelen]);

Data Parsing and Normalization

Feature Overview

All data input to InsightHub, whether uploaded as an Excel file or loaded from a Google Sheet link, originates from Google's internal Connect Benchmark (CBX) system. CBX-exported data has a fixed structure and all-English column names. The system therefore includes a robust parsing and normalization pipeline to ensure that subsequent AI analysis proceeds accurately and efficiently.

This section will detail the specific implementation logic of the three core features when processing CBX data.


Market Opportunity Analysis Data Parsing

This feature needs to handle two data structures from CBX: single-category reports and multi-category reports. The system is designed to automatically identify and handle both structures.

1. Key Information Extraction

Whether single-category or multi-category, the system first extracts basic information from the first worksheet (usually named Report Definition and Compliance):

  • Category/Industry Name: Read from cell B4. In single-category reports, this directly serves as the main category for analysis; in multi-category reports, it serves as an overall industry name.
  • Analysis Period (PoP Period): Extract values from columns Start Date, End Date, Comparison Start Date, and Comparison End Date from the first row of the data worksheet (Sublocations), and format them as a standard period description string.
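
As an illustration of the period extraction described above, a hedged sketch of an extractPopPeriod-style helper (the column names follow the list above; the exact output format used by the system may differ):

// Illustrative sketch: build a period string from the first row of the Sublocations worksheet
function extractPopPeriod(rows: any[]): string {
  const first = rows[0] || {};
  const start = first['Start Date'];
  const end = first['End Date'];
  const compStart = first['Comparison Start Date'];
  const compEnd = first['Comparison End Date'];
  if (!start || !end) return '';
  const base = `${start} ~ ${end}`;
  // Append the comparison period when both comparison dates are present
  return compStart && compEnd ? `${base} (vs. ${compStart} ~ ${compEnd})` : base;
}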

2. Data Worksheet Location

The system intelligently locates the worksheet containing core data:

  1. Priority Strategy: First, the system searches for worksheets whose names contain the keyword Sublocations.
  2. Fallback Strategy: If not found, the system defaults to using the second worksheet in the file as the data source.

This dual strategy means that even if worksheet names are slightly modified, the system can still find the correct data in most cases.

3. Single-Category vs. Multi-Category Intelligent Recognition

This is one of the core designs in this parsing process. After reading the Sublocations worksheet data, the system checks whether the first row of data contains a column named Query Set Name:

  • If Query Set Name column exists:
    • The system determines this is a multi-category report.
    • Each data row's category affiliation is determined by that row's Query Set Name column value.
    • The system automatically ignores summary rows where Query Set Name is All selected query entities, only processing specific category data.
  • If Query Set Name column does not exist:
    • The system determines this is a single-category report.
    • All data rows' category affiliations uniformly use the name previously read from cell B4 of the first worksheet.

This automated recognition mechanism means users do not need to know which report type CBX exported; the system parses either structure correctly, which noticeably improves the user experience.
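
A minimal sketch of this branching logic, assuming the rows have already been produced by XLSX.utils.sheet_to_json (the helper name assignCategories is illustrative):

// Illustrative sketch: decide each row's category based on the presence of 'Query Set Name'
function assignCategories(rows: any[], fallbackCategory: string) {
  const isMultiCategory = rows.length > 0 && 'Query Set Name' in rows[0];

  return rows
    // In multi-category reports, drop the summary row that aggregates all query entities
    .filter(row => !isMultiCategory || row['Query Set Name'] !== 'All selected query entities')
    .map(row => ({
      ...row,
      // Multi-category: category comes from the row; single-category: use the name from cell B4
      category: isMultiCategory ? row['Query Set Name'] : fallbackCategory,
    }));
}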

4. Data Normalization

Finally, the system performs type conversion and cleaning on all relevant columns (such as Geo Name, Clicks, CPC, etc.), forming a standardized dataset for subsequent AI analysis.
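
A brief sketch of the kind of coercion involved (the actual field list and cleaning rules live in the parser shown earlier; this helper is illustrative):

// Illustrative sketch: coerce a raw spreadsheet value into a number, treating blanks as 0
function toNumber(value: unknown): number {
  if (value === null || value === undefined || value === '') return 0;
  const parsed = parseFloat(String(value).replace(/,/g, '')); // strip thousands separators
  return Number.isFinite(parsed) ? parsed : 0;
}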


Audience Signal Analysis and Persona Data Parsing

These two features share the same complex data file from CBX, which typically contains dozens of worksheets. To handle this complexity, the system uses a robust "whitelist"-style parsing strategy.

1. "Whitelist" Filtering Mechanism

The system does not need to process all worksheets in the file but only cares about a predefined "Required Worksheet List". Taking Audience Signal Analysis as an example, this list includes:

  • input
  • user_behavior_Prod_Affinity_Search
  • user_behavior_Prod_InMarket_Search
  • user_behavior_AppAffinities
  • AffinityGeminiOutput

All worksheets not on this list are automatically ignored.
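
A minimal sketch of this filtering step, assuming the whitelist is held in a constant (the constant name and helper signature here are assumptions; findSheetMatch is described in the next subsection):

import * as XLSX from 'xlsx';

// Illustrative sketch: keep only the worksheets on the required list
const REQUIRED_SHEETS = [
  'input',
  'user_behavior_Prod_Affinity_Search',
  'user_behavior_Prod_InMarket_Search',
  'user_behavior_AppAffinities',
  'AffinityGeminiOutput',
];

function pickRequiredSheets(workbook: XLSX.WorkBook): Record<string, XLSX.WorkSheet> {
  const picked: Record<string, XLSX.WorkSheet> = {};
  for (const required of REQUIRED_SHEETS) {
    // findSheetMatch tolerates truncation and renaming (see the next subsection)
    const actualName = findSheetMatch(required, workbook.SheetNames);
    if (actualName) picked[required] = workbook.Sheets[actualName];
  }
  return picked;
}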

2. Intelligent Worksheet Matching (findSheetMatch)

CBX-exported worksheet names may be truncated by Excel (which caps names at 31 characters) or renamed during conversion from Google Sheets because of special characters (such as :), so simple name matching is unreliable.

To address this, we implemented a powerful findSheetMatch function that searches for each required worksheet in the following order and strategy:

  1. Exact Match: First attempt case-insensitive exact name matching.
  2. Cleaned Exact Match: Clean both expected and actual names according to Excel naming conventions (e.g., replace special characters), then perform exact matching. This solves issues caused by special characters in Google Sheets.
  3. Prefix Match: If the expected name length exceeds 31 characters, the system uses its first 31 characters to match with actual worksheet names to handle Excel's automatic truncation.
  4. Keyword Match: If all above fail, the system uses regular expressions to perform fuzzy matching based on core keywords in each worksheet name (such as Prod_Affinity or AppAffinities).

This cascade of fallback matching strategies ensures that even when CBX-exported worksheet names change in expected or unexpected ways, the system can still locate the correct worksheet in most cases.
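
A hedged sketch of how this cascade might be implemented (the real findSheetMatch may differ in detail; the keyword pattern here is an illustrative subset):

// Illustrative sketch of the four-step matching cascade
function findSheetMatch(expected: string, actualNames: string[]): string | null {
  const lower = (s: string) => s.toLowerCase();
  // Clean a name to Excel conventions: replace forbidden characters, cap at 31 characters
  const clean = (s: string) => s.replace(/[:\\\/?*\[\]]/g, '_').slice(0, 31);

  // 1. Exact match (case-insensitive)
  let hit = actualNames.find(n => lower(n) === lower(expected));
  if (hit) return hit;

  // 2. Exact match after cleaning both sides
  hit = actualNames.find(n => lower(clean(n)) === lower(clean(expected)));
  if (hit) return hit;

  // 3. Prefix match to handle Excel's 31-character truncation
  if (expected.length > 31) {
    const prefix = lower(expected.slice(0, 31));
    hit = actualNames.find(n => lower(n).startsWith(prefix) || prefix.startsWith(lower(n)));
    if (hit) return hit;
  }

  // 4. Keyword-based fuzzy match as a last resort
  const keyword = expected.match(/Prod_Affinity|Prod_InMarket|AppAffinities|AffinityGeminiOutput/i)?.[0];
  if (keyword) {
    hit = actualNames.find(n => new RegExp(keyword, 'i').test(n));
    if (hit) return hit;
  }

  return null;
}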

3. Split Worksheet Automatic Merging

In some cases, CBX splits a large data table (such as ...Affinity_Search) into two independent worksheets (e.g., ...Affinity_Vir and ...Affinity_Sea).

The system's parsing logic can intelligently identify this situation: if the ..._Search table is not found in the required list, it automatically searches for corresponding ..._Vir and ..._Sea tables. If both exist, the system automatically merges the data from these two tables in memory before subsequent processing.
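
A minimal sketch of the merge, under the assumption that both halves share the same header row (the helper reuses findSheetMatch from the sketch above; names are illustrative):

// Illustrative sketch: merge the '..._Vir' and '..._Sea' halves of a split '..._Search' table
function mergeSplitSheets(workbook: XLSX.WorkBook, baseName: string): any[] | null {
  const virName = findSheetMatch(`${baseName}_Vir`, workbook.SheetNames);
  const seaName = findSheetMatch(`${baseName}_Sea`, workbook.SheetNames);
  if (!virName || !seaName) return null;

  // Convert each half to JSON rows and concatenate them in memory
  const virRows: any[] = XLSX.utils.sheet_to_json(workbook.Sheets[virName]);
  const seaRows: any[] = XLSX.utils.sheet_to_json(workbook.Sheets[seaName]);
  return [...virRows, ...seaRows];
}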

4. Data Extraction

  • For Audience Signal Analysis: After locating all required worksheets, the system converts their data to JSON format and stores them separately for subsequent AI generation of UAC, PMax, and Demand Gen recommendation plans.
  • For Persona Slide: This feature only uses the AffinityGeminiOutput worksheet. After locating this sheet, the system reads column A row by row until it encounters an empty cell. Each cell's complete text block is treated as an independent Persona entry, awaiting further parsing by the AI (see the sketch after this list).
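
A brief sketch of the column-A scan used for Persona Slide (cell addressing uses XLSX's 'A1'-style keys; the start row and stop condition follow the description above):

// Illustrative sketch: read column A of AffinityGeminiOutput until the first empty cell
function extractPersonaBlocks(sheet: XLSX.WorkSheet): string[] {
  const blocks: string[] = [];
  for (let row = 1; ; row++) {
    const cell = sheet[`A${row}`]; // worksheets store cells under 'A1'-style keys
    const text = cell?.v != null ? String(cell.v).trim() : '';
    if (!text) break; // stop at the first empty cell
    blocks.push(text); // each cell's full text block is one Persona
  }
  return blocks;
}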

Through these parsing strategies, InsightHub can reliably process structurally complex CBX data while shielding users from the complexity of the data preparation phase.


Browser Local Storage Mechanism

Feature Overview

The system uses the browser's IndexedDB (through the Dexie.js library) to store user analysis results and workflow history. This is an important design decision that avoids the complexity of using a central database.

Why Choose Browser Local Storage?

1. Development Complexity Comparison

Complexity of Using Central Database (e.g., PostgreSQL):

  • Database Design: Need to design table structure, field types, indexes, relationships, etc.
  • Data Migration: Every data structure change requires writing migration scripts
  • API Development: Need to develop complete CRUD (Create, Read, Update, Delete) APIs
  • Authentication and Authorization: Need to implement user authentication, permission control, data isolation
  • Backup and Recovery: Need to regularly backup database, handle data recovery
  • Performance Optimization: Need to optimize query performance, handle concurrent access
  • Operations Cost: Need to maintain database servers, monitoring, scaling, etc.

Advantages of Using Browser Local Storage:

  • Zero Server Cost: No database server needed, reducing operations costs
  • Data Privacy: User data is completely stored locally, improving privacy protection
  • Offline Availability: Even when network is disconnected, users can still view historical results
  • Simple Development: No need to design complex database structures and APIs
  • Rapid Iteration: When data structure changes, only need to update frontend code

2. Technical Implementation

The system uses Dexie.js as an IndexedDB wrapper library, providing a more friendly API.

Database Structure Definition (src/services/workflowStorage.ts):

import Dexie, { Table } from 'dexie';

// Define database structure
class WorkflowDatabase extends Dexie {
  workflows!: Table<UnifiedWorkflow, string>;

  constructor() {
    super('InsightHubWorkflows');
    this.version(1).stores({
      workflows: 'id, workflowType, creatorEmail, createdAt, dataSourceType',
    });
  }
}

const db = new WorkflowDatabase();

Data Storage Workflow:

  1. Image Extraction: Extract Base64-encoded images from result data, convert to Blob objects for separate storage
  2. Data Cleaning: Remove Base64 image data from result data to reduce storage volume
  3. Metadata Construction: Build metadata object containing workflow type, creation time, data source, etc.
  4. Storage Execution: Store cleaned data and metadata to IndexedDB

Core Code:

export async function saveWorkflow(params: SaveWorkflowParams): Promise<string> {
  const workflowId = nanoid();

  // 1. Extract images (Base64 -> Blob)
  const { images, imageMetadata, cleanedData } = extractImagesFromResultData(
    params.workflowType,
    params.resultData
  );

  // 2. Build workflow object
  const workflow: UnifiedWorkflow = {
    id: workflowId,
    workflowType: params.workflowType,
    resultData: cleanedData,
    outputLanguage: params.outputLanguage,
    // ... other metadata
    imageMetadata, // Image metadata (does not include actual image data)
  };

  // 3. Store to IndexedDB
  await db.workflows.add(workflow);

  // 4. Store images separately (using IndexedDB's Blob support)
  for (let i = 0; i < images.length; i++) {
    const imageKey = `${workflowId}_image_${i}`;
    await db.workflows.update(workflowId, {
      [`image_${i}`]: images[i], // Dexie automatically handles Blob
    });
  }

  return workflowId;
}

Data Loading Workflow:

  1. Load Workflow: Load workflow object from IndexedDB
  2. Restore Images: Convert stored Blob images to Object URLs, reinsert into result data
  3. Return Complete Data: Return result data including restored images

Core Code:

export async function loadWorkflowWithParsedData(
  workflowId: string
): Promise<{
  workflow: UnifiedWorkflow;
  resultData: any;
  metadata?: Record<string, any>;
} | undefined> {
  // 1. Load workflow
  const workflow = await db.workflows.get(workflowId);
  if (!workflow) {
    return undefined;
  }

  // 2. Restore images (Blob -> Object URL)
  const resultDataWithImages = restoreImagesToResultData(workflow, workflow.resultData);

  return {
    workflow,
    resultData: resultDataWithImages,
    metadata: {
      // ... other metadata
    },
  };
}

Storage Limitations and Notes

  • Storage Quota: The quota depends on the browser and available disk space; several gigabytes per origin are typically available
  • Data Persistence: Clearing browser data will delete all stored workflows
  • Cross-Device Sync: Local storage does not automatically sync to other devices
  • Backup Recommendation: Recommend users regularly export ZIP files for backup
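
Where the Storage API is available, the quota can be inspected and persistent storage requested; a small sketch (not necessarily present in the codebase):

// Illustrative sketch: report the origin's storage usage and ask the browser to persist it
async function checkStorageHealth(): Promise<void> {
  if (navigator.storage?.estimate) {
    const { usage = 0, quota = 0 } = await navigator.storage.estimate();
    console.log(`Usage: ${(usage / 1e6).toFixed(1)} MB of ~${(quota / 1e6).toFixed(0)} MB quota`);
  }
  if (navigator.storage?.persist) {
    // If granted, the browser avoids evicting this origin's data under storage pressure
    const persisted = await navigator.storage.persist();
    console.log(`Persistent storage granted: ${persisted}`);
  }
}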

Slide-Style HTML Rendering and Image Export

Feature Overview

The system renders analysis results as "slide-style" HTML pages, which can then be exported as images or PPTX files. This design avoids the complexity of directly generating editable Google Slides.

Why Choose HTML Rendering?

Design Trade-offs

Challenges of Directly Generating Google Slides:

  • Template Complexity: Need to precisely control position, style, font, etc. of each element
  • API Limitations: Google Slides API has limited functionality and flexibility
  • Editing Requirements: The generated slides do not need to be editable; they only need to be displayed and exported
  • Style Control: HTML/CSS provides more flexible style control capabilities

Advantages of HTML Rendering:

  • Flexible Styling: Can use full CSS functionality to precisely control styles
  • Development Efficiency: Using familiar web technology stack, fast development
  • Easy Adjustment: Modifying styles only requires adjusting CSS, no need to regenerate slides
  • Simple Export: Can easily export as images or convert to PPTX

Technical Implementation

1. HTML Structure Design

The system uses React components to build slide-style HTML structures.

Example (Market Opportunity Analysis bubble chart slide):

<div ref={slideContainerRef} className="slide-container">
  {/* Subtitle */}
  <div className="slide-subtitle">{subtitle}</div>

  {/* Main Title */}
  <h1 className="slide-title">{title}</h1>

  {/* Bubble Chart */}
  <div ref={chartContainerRef} className="chart-container">
    {/* ECharts chart rendering */}
  </div>

  {/* Priority Summary */}
  <div className="priority-section">
    <div className="priority-1">
      <h3>{priority1Title}</h3>
      <p>{priority1Description}</p>
      {/* ... more content */}
    </div>
    <div className="priority-2">
      {/* ... */}
    </div>
  </div>

  {/* Footer */}
  <div className="slide-footer">
    <span>Analysis Period: {popPeriod}</span>
  </div>
</div>

2. CSS Style Design

The system uses CSS to simulate Google Slides' visual effects.

Key Styles:

.slide-container {
  width: 1920px; /* Standard slide width */
  height: 1080px; /* Standard slide height */
  background: #ffffff;
  padding: 80px 120px;
  box-sizing: border-box;
  font-family: 'Roboto', sans-serif;
  position: relative;
}

.slide-title {
  font-size: 48px;
  font-weight: 700;
  color: #1a1a1a;
  margin-bottom: 40px;
}

.chart-container {
  width: 100%;
  height: 600px;
  margin: 40px 0;
}

3. Image Export Implementation

The system uses the html-to-image library to convert HTML elements to images.

Core Code (src/components/BubbleChart.tsx):

import { toPng, toBlob } from 'html-to-image';

// Export chart as DataURL (for copying to clipboard)
getChartAsDataURL: () => {
  if (chartContainerRef.current) {
    return toPng(chartContainerRef.current, {
      quality: 1.0,
      pixelRatio: 2, // Increase resolution
      backgroundColor: '#ffffff',
    });
  }
  return Promise.resolve(null);
},

// Export chart as Blob (for download or PPTX generation)
getChartAsBlob: async () => {
  if (chartContainerRef.current) {
    const dataUrl = await toPng(chartContainerRef.current, {
      quality: 1.0,
      pixelRatio: 2,
      backgroundColor: '#ffffff',
    });
    if (dataUrl) {
      const response = await fetch(dataUrl);
      return response.blob();
    }
  }
  return null;
},

Export Parameter Notes:

  • quality: Image quality (0-1), 1.0 is highest quality
  • pixelRatio: Pixel ratio, 2 means generating 2x resolution images (clearer)
  • backgroundColor: Background color, ensures transparent areas have correct background

4. PPTX Generation Implementation

The system uses the pptxgenjs library to convert HTML content to PowerPoint files.

Core Code (src/hooks/usePptxExport.ts):

import PptxGenJS from 'pptxgenjs';

export async function generatePptxAsBlob(
  chartsData: ChartCategoryData[],
  chartRefs: (BubbleChartHandles | null)[],
  popPeriod: string,
  outputLanguage: OutputLanguage
): Promise<Blob> {
  const pptx = new PptxGenJS();

  // Set slide dimensions (16:9)
  pptx.layout = 'LAYOUT_WIDE';
  pptx.defineLayout({ name: 'LAYOUT_WIDE', width: 10, height: 5.625 });

  // Generate one slide per category
  for (let i = 0; i < chartsData.length; i++) {
    const chartData = chartsData[i];
    const chartRef = chartRefs[i];

    // Get chart image (Blob)
    const chartBlob = await chartRef?.getChartAsBlob();
    if (!chartBlob) continue;

    // Convert Blob to Base64 (pptxgenjs requires)
    const chartBase64 = await blobToBase64(chartBlob);

    // Create new slide
    const slide = pptx.addSlide();

    // Add subtitle
    slide.addText(chartData.subtitle, {
      x: 0.5,
      y: 0.3,
      w: 9,
      h: 0.4,
      fontSize: 14,
      color: '666666',
    });

    // Add main title
    slide.addText(chartData.title, {
      x: 0.5,
      y: 0.7,
      w: 9,
      h: 0.6,
      fontSize: 36,
      bold: true,
      color: '1a1a1a',
    });

    // Add chart image
    slide.addImage({
      data: chartBase64,
      x: 0.5,
      y: 1.5,
      w: 9,
      h: 5,
    });

    // Add priority summary text
    // ... more content
  }

  // Generate PPTX file
  const blob = await pptx.write({ outputType: 'blob' });
  return blob as Blob;
}
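
The blobToBase64 helper used above is not shown in this section; a common FileReader-based implementation, included here as a hedged sketch:

// Illustrative sketch: convert a Blob to a data-URL string, which pptxgenjs accepts for addImage
function blobToBase64(blob: Blob): Promise<string> {
  return new Promise((resolve, reject) => {
    const reader = new FileReader();
    reader.onloadend = () => resolve(reader.result as string); // "data:image/png;base64,..."
    reader.onerror = () => reject(new Error('Failed to read blob'));
    reader.readAsDataURL(blob);
  });
}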

Session ZIP Export and Import

Feature Overview

The system supports exporting complete analysis sessions as ZIP files, containing all result data, images, metadata, etc. Users can also import previously exported ZIP files to restore analysis results.

Export Function Implementation

1. ZIP File Structure

The exported ZIP file contains the following:

InsightHub-YYYYMMDD-HHmmss-{UUID}-session-{language}.zip
├── metadata.yaml # Session metadata (YAML format)
├── detailed-data.xlsx # Detailed data table (XLSX format)
├── presentation.pptx # Presentation (PPTX format, optional)
├── images/ # Images folder
│ ├── chart-{category}-{index}.png
│ ├── slide-{category}-{index}.png
│ └── ...
└── original-data.xlsx # Original data file (if Excel was uploaded)

2. Export Implementation

The system uses the JSZip library to create ZIP files.

Core Code (src/hooks/useSessionExport.ts):

import JSZip from 'jszip';

async function generateSessionZipAsBlob(
  result: AnalysisResult,
  chartRefs: React.MutableRefObject<(BubbleChartHandles | null)[]>,
  user: UserProfile | null,
  outputLanguage: string,
  originalFile: File | null
): Promise<Blob> {
  const zip = new JSZip();
  const timestamp = formatTimestamp();

  // 1. Add YAML metadata file
  const metadata = {
    uuid: result.uuid,
    workflowType: 'geo-analysis',
    industryName: result.industryName,
    outputLanguage: result.output_language || outputLanguage,
    createdAt: new Date().toISOString(),
    creatorName: user?.name || 'Unknown',
    creatorEmail: user?.email || 'unknown@example.com',
    // ... more metadata
  };
  zip.file('metadata.yaml', yaml.dump(metadata));

  // 2. Add XLSX detailed data file
  const workbook = XLSX.utils.book_new();
  for (const categoryData of result.chartsData) {
    const worksheet = XLSX.utils.json_to_sheet(categoryData.detailedData);
    XLSX.utils.book_append_sheet(workbook, worksheet, categoryData.categoryName);
  }
  const xlsxBuffer = XLSX.write(workbook, { type: 'array', bookType: 'xlsx' });
  zip.file('detailed-data.xlsx', xlsxBuffer);

  // 3. Add PPTX file (if generation successful)
  try {
    const pptxBlob = await generatePptxAsBlob(
      result.chartsData,
      chartRefs.current,
      result.popPeriod,
      outputLanguage as OutputLanguage
    );
    zip.file('presentation.pptx', pptxBlob);
  } catch (error) {
    console.warn('PPTX generation failed, skipping', error);
  }

  // 4. Add all image files
  const imagesFolder = zip.folder('images');
  for (let i = 0; i < result.chartsData.length; i++) {
    const chartRef = chartRefs.current[i];
    if (chartRef) {
      // Export chart image
      const chartBlob = await chartRef.getChartAsBlob();
      if (chartBlob) {
        imagesFolder?.file(`chart-${result.chartsData[i].categoryName}-${i}.png`, chartBlob);
      }

      // Export slide image
      const slideBlob = await chartRef.getSlideAsBlob();
      if (slideBlob) {
        imagesFolder?.file(`slide-${result.chartsData[i].categoryName}-${i}.png`, slideBlob);
      }
    }
  }

  // 5. Add original data file (if provided)
  if (originalFile) {
    zip.file(`original-data.${originalFile.name.split('.').pop()}`, originalFile);
  }

  // 6. Generate ZIP file
  return zip.generateAsync({ type: 'blob' });
}

Import Function Implementation

1. ZIP File Parsing

The system uses the JSZip library to parse ZIP files.

Core Code:

import JSZip from 'jszip';

async function importSessionFromZip(zipFile: File): Promise<{
  metadata: any;
  resultData: any;
  originalFile?: File;
}> {
  const zip = await JSZip.loadAsync(zipFile);

  // 1. Read metadata
  const metadataFile = zip.file('metadata.yaml');
  if (!metadataFile) {
    throw new Error('ZIP file missing metadata.yaml');
  }
  const metadataContent = await metadataFile.async('string');
  const metadata = yaml.load(metadataContent);

  // 2. Read detailed data (optional)
  const dataFile = zip.file('detailed-data.xlsx');
  let resultData = null;
  if (dataFile) {
    const dataBuffer = await dataFile.async('arraybuffer');
    const workbook = XLSX.read(dataBuffer, { type: 'array' });
    // Parse workbook data...
  }

  // 3. Read original data file (if exists)
  const originalFile = zip.file(/^original-data\./)?.[0];
  let originalFileBlob: Blob | null = null;
  if (originalFile) {
    originalFileBlob = await originalFile.async('blob');
  }

  return {
    metadata,
    resultData,
    originalFile: originalFileBlob ? new File([originalFileBlob], 'original-data.xlsx') : undefined,
  };
}

2. Data Recovery

After importing the ZIP file, the system will:

  1. Validate Metadata: Check metadata format and required fields
  2. Restore Result Data: Restore imported data to results page
  3. Restore Original File: If ZIP contains original file, restore file object
  4. Save to Local Storage: Save restored data to IndexedDB for subsequent viewing
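
A minimal sketch of the metadata validation step (the required field list here is inferred from the export code above and is an assumption):

// Illustrative sketch: check that the imported metadata contains the fields the app relies on
function validateSessionMetadata(metadata: any): { valid: boolean; missing: string[] } {
  const requiredFields = ['uuid', 'workflowType', 'outputLanguage', 'createdAt'];
  const missing = requiredFields.filter(
    field => metadata == null || metadata[field] === undefined || metadata[field] === ''
  );
  return { valid: missing.length === 0, missing };
}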

Summary

This section detailed the system's core technical implementations, including:

  1. Excel File Upload: Use FileReader and XLSX library to parse files in the browser
  2. Google Sheet Link Processing: Read cloud data through Google OAuth and Sheets API
  3. Data Normalization: Unify field names, data types, handle null values, etc.
  4. Browser Local Storage: Use IndexedDB to store analysis results, avoiding central database complexity
  5. HTML Rendering and Export: Use HTML/CSS to render slides, export as images or PPTX
  6. Session ZIP Export/Import: Complete session backup and recovery functionality

These technical implementations reflect the system's design philosophy: Under the premise of ensuring functional completeness, minimize development complexity and operations costs as much as possible.

