ConveneAI

Real-time meeting intelligence system using multi-agent architecture to provide context-aware insights and document retrieval during video calls.

PythonLlamaIndexClaudeGeminiReactFlaskMulti-agent Systems
View Project →
ConveneAI

Project Overview

ConveneAI is a real-time meeting intelligence platform that transforms technical meetings by combining advanced video processing, multi-agent AI systems, and contextual document retrieval. The application processes live conversations through multiple specialized AI agents to generate instant insights while autonomously surfacing relevant documentation.

Key Features

  • Real-time Analysis: Live meeting transcription and analysis using a multi-agent system powered by Claude and Gemini
  • Smart Document Retrieval: Autonomous email and document search based on conversation context
  • Engagement Tracking: Real-time participant engagement analysis using facial and gesture recognition
  • Interactive Review: Post-meeting video interaction for quick information retrieval

Technical Implementation

Multi-Agent Architecture

I implemented a multi-agent system using LlamaIndex that coordinates between different specialized AI agents:

def _initialize_agent(self) -> FunctionCallingAgent:
    prefix_messages = [
        ChatMessage(
            role="system",
            content=(
                "You are a professional meeting transcriptionist and analyst, "
                "specializing in creating clear, concise, and actionable meeting summaries. "
                "When participants mention emails, proactively find and reference them..."
            )
        )
    ]
    
    search_email_tool = FunctionTool.from_defaults(
        fn=self.email_service.search_emails,
        description="Search for an email using Gmail query string"
    )
    
    return FunctionCallingAgent.from_tools(
        tools=[search_email_tool],
        llm=self.llm,
        prefix_messages=prefix_messages
    )

Video Processing System

I developed a real-time video processing pipeline that extracts key frames for analysis:

class VideoProcesser {
    fun processVideo(context: Context, videoFilePath: String, callback: (List<Bitmap>) -> Unit) {
        val frames = mutableListOf<Bitmap>()
        val retriever = MediaMetadataRetriever()
        
        // Extract key frames at regular intervals
        val duration = retriever.extractMetadata(MediaMetadataRetriever.METADATA_KEY_DURATION)?.toLong() ?: 0L
        val interval = duration / 8
        
        for (i in 0 until 8) {
            val frameTime = i * interval * 1000
            val bitmap = retriever.getFrameAtTime(frameTime)
            bitmap?.let { frames.add(it) }
        }
    }
}

AI Integration

The system leverages both Claude and Gemini for different aspects of meeting analysis:

  • Claude: Handles natural language processing and generates structured meeting summaries
  • Gemini: Processes visual content and provides real-time scene understanding

Context Management

I engineered a context-aware prompting system that maintains conversation history while integrating new information:

def process_segment(self, existing_summary: str, new_transcript: str, full_transcript: str = '') -> str:
    prompt = f"""
        Based on the following meeting information, create or update a professional meeting summary.
        
        Previous Summary:
        {existing_summary or 'No previous summary.'}
        
        Meeting History:
        {full_transcript or 'No previous transcript.'}
        
        New Discussion:
        {new_transcript}
    """
    
    response = self.agent.chat(prompt)
    return response.response

Technical Architecture

The application follows a clean architecture pattern with distinct layers:

  1. Frontend Layer

    • React-based UI with real-time video preview
    • Dynamic note-taking interface
    • Interactive meeting summary view
  2. Backend Layer

    • Flask server with ASGI support
    • Multi-agent coordination system
    • Email and document integration services
  3. AI Layer

    • LlamaIndex function-calling framework
    • Claude for natural language processing
    • Gemini for visual content analysis

Results and Impact

  • Improved Accuracy: Increased information retrieval accuracy by 60% through context-aware prompting
  • Enhanced Engagement: Real-time participant analysis led to more interactive meetings
  • Time Savings: Reduced post-meeting documentation time by 75%
  • Better Context: Autonomous document retrieval provided 40% more relevant context during discussions

Technical Challenges

  • Real-time Processing: Optimized video frame extraction and analysis to maintain low latency
  • Context Management: Engineered efficient prompt systems to maintain conversation history
  • Integration Complexity: Coordinated multiple AI services while ensuring smooth data flow

Lessons Learned

  • Importance of efficient video processing pipelines for real-time applications
  • Benefits of multi-agent architectures for complex tasks
  • Value of context-aware systems in improving AI accuracy
  • Challenges of coordinating multiple AI services in real-time

Future Development

  • Enhanced visual content analysis capabilities
  • Expanded integration with additional document management systems
  • Advanced sentiment analysis for better engagement tracking
  • Custom meeting templates for different types of technical discussions