Building a Scrabble Companion with Gemini 2.0: A Technical Journey

When my kids and I play Scrabble, keeping track of scores and validating moves can sometimes take away from the fun of the game itself. This inspired me to build an AI-powered companion that could handle these tasks while adding an educational twist. Here’s how I built it and what I learned along the way.

The Vision

I wanted to create two applications that would work together:

A moderator app that uses AI to capture and validate game states
A companion app that provides real-time game insights and AI-powered move explanations

Companion app: adjusting the board position in the camera view

Technical Architecture

Technical architecture diagram for the Scrabble companion project

Flutter & Firebase: The Foundation

The project is built on Flutter for both apps, with Firebase handling real-time synchronization. This combination provides:

Cross-platform compatibility
Real-time data sync between moderator and companion apps
Reliable state management
Smooth animations for game state visualization

AI Integration Stack

Computer Vision with Gemini

For board state capture, I implemented:

Image processing to identify board state
AI-powered OCR to recognize letters and positions
Position validation using board rules
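Position validation can be illustrated with a small, dependency-free check. This is a minimal sketch, not the project's code: it assumes the model output has already been converted into tiles with 0-based row and column indices, and PlacedTile and validatePlacement are hypothetical names. It applies the standard placement rules: tiles stay on the 15x15 grid and land on empty squares, share one row or column with no gaps (existing tiles may fill them), cover the center square on the first move, and touch an existing tile on later moves.

```dart
/// A tile placed this turn (0-based row/col on a 15x15 grid).
class PlacedTile {
  final String letter;
  final int row;
  final int col;
  const PlacedTile(this.letter, this.row, this.col);
}

const int boardSize = 15;
const int center = 7;

String _key(int row, int col) => '$row,$col';

/// Returns rule violations; an empty list means the placement is valid.
/// [occupied] holds squares covered before this move, as "row,col" keys.
List<String> validatePlacement(List<PlacedTile> newTiles, Set<String> occupied) {
  if (newTiles.isEmpty) return ['No tiles placed'];
  final errors = <String>[];
  final seen = <String>{};
  for (final t in newTiles) {
    if (t.row < 0 || t.row >= boardSize || t.col < 0 || t.col >= boardSize) {
      errors.add('${t.letter} at (${t.row}, ${t.col}) is off the board');
    }
    final k = _key(t.row, t.col);
    if (occupied.contains(k)) errors.add('(${t.row}, ${t.col}) is already taken');
    if (!seen.add(k)) errors.add('Two new tiles on (${t.row}, ${t.col})');
  }
  if (errors.isNotEmpty) return errors;

  // All new tiles must share one row or one column.
  final sameRow = newTiles.every((t) => t.row == newTiles.first.row);
  final sameCol = newTiles.every((t) => t.col == newTiles.first.col);
  if (!sameRow && !sameCol) return ['Tiles are not in a single row or column'];

  // Every square between the first and last new tile must be filled,
  // either by a new tile or by one already on the board.
  final positions = newTiles.map((t) => sameRow ? t.col : t.row).toList()..sort();
  for (var i = positions.first; i <= positions.last; i++) {
    final k = sameRow ? _key(newTiles.first.row, i) : _key(i, newTiles.first.col);
    if (!seen.contains(k) && !occupied.contains(k)) errors.add('Gap at index $i');
  }

  if (occupied.isEmpty) {
    // The first move must cover the center square.
    if (!seen.contains(_key(center, center))) {
      errors.add('First move must cover the center square');
    }
  } else {
    // Later moves must touch at least one tile already on the board.
    final touches = newTiles.any((t) => [
          _key(t.row - 1, t.col),
          _key(t.row + 1, t.col),
          _key(t.row, t.col - 1),
          _key(t.row, t.col + 1),
        ].any(occupied.contains));
    if (!touches) errors.add('Move is not connected to existing tiles');
  }
  return errors;
}
```

Returning messages rather than a boolean makes it easy to show the moderator why a capture was rejected.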

While the accuracy isn’t perfect yet, it provides a solid foundation for game state capture.

Move Analysis with Multiple LLMs

I experimented with several LLM providers, all accessed through Vertex AI for consistency. Here’s the Gemini service that analyzes board images:

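import 'dart:io';

import 'package:firebase_vertexai/firebase_vertexai.dart';

// ImageStorageService and FirebaseService are app-specific classes (not shown).
class GeminiService {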
  late GenerativeModel _model;
  final ImageStorageService _imageStorage = ImageStorageService();
  final FirebaseService _firebaseService = FirebaseService();

  GeminiService() {
    // Initialize with Gemini 2.0 Flash
    _model = FirebaseVertexAI.instance
        .generativeModel(model: 'gemini-2.0-flash');
  }

  Future<Map<String, dynamic>> analyzeBoardImage(
    String sessionId,
    String imagePath,
  ) async {
    try {
      // Read current image bytes
      final currentImageBytes = await File(imagePath).readAsBytes();

      // Get board state
      final boardState = await _firebaseService.getBoardState(sessionId).first;
      final isFirstMove = boardState.isEmpty;

      if (isFirstMove) {
        // Handle first move analysis
        final response = await _model.generateContent([
          Content.multi([
            TextPart(_constructInitialBoardPrompt()),
            DataPart('image/jpeg', currentImageBytes),
          ]),
        ]);

        return {
          'status': 'success',
          'type': 'initial',
          'data': _parseGeminiResponse(response.text!, true),
        };
      } else {
        // Compare with previous state
        final response = await _model.generateContent([
          Content.multi([
            TextPart(_constructImageComparisonPrompt(boardState)),
            DataPart('image/jpeg', currentImageBytes),
          ]),
        ]);

        return {
          'status': 'success',
          'type': 'move',
          'data': _parseGeminiResponse(response.text!, false),
        };
      }
    } catch (e) {
      return {
        'status': 'error',
        'message': e.toString(),
      };
    }
  }

  // Prompt construction for initial board analysis
  String _constructInitialBoardPrompt() {
    return '''
    You are analyzing an image of an initial Scrabble board move.
    Accurately identify all visible letters and their positions.
    Return ONLY a JSON with this format:
    {
      "board": [
        {
          "letter": "A",
          "row": 7,
          "col": 7,
          "points": 1
        }
      ]
    }
    ''';
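  }

  // Not shown here: _parseGeminiResponse, and _constructImageComparisonPrompt,
  // which appears later in this article.
}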
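The service calls _parseGeminiResponse, which the article doesn't show. Below is a minimal sketch of what such a parser could look like, assuming the model sometimes wraps its JSON in markdown code fences or adds a sentence around it. parseGeminiResponse is a hypothetical stand-in; the board and newLetters keys come from the two prompts in this article.

```dart
import 'dart:convert';

/// Hypothetical stand-in for _parseGeminiResponse: extracts the JSON object
/// from the model's text and checks that the expected list key is present.
Map<String, dynamic> parseGeminiResponse(String text, bool isInitial) {
  // Keep only the span from the first '{' to the last '}', which drops
  // surrounding code fences or explanatory sentences.
  final start = text.indexOf('{');
  final end = text.lastIndexOf('}');
  if (start == -1 || end < start) {
    throw FormatException('No JSON object found in model response', text);
  }
  final decoded = jsonDecode(text.substring(start, end + 1));
  if (decoded is! Map<String, dynamic>) {
    throw FormatException('Expected a JSON object', text);
  }
  // The initial-board prompt asks for "board"; the comparison prompt asks for "newLetters".
  final key = isInitial ? 'board' : 'newLetters';
  if (decoded[key] is! List) {
    throw FormatException('Missing "$key" list in model response', text);
  }
  return decoded;
}
```

Throwing a FormatException fits the existing try/catch in analyzeBoardImage, which turns any exception into an error status.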

Each LLM provider offered different strengths:

DeepSeek: Provided detailed move explanations with strategic insights
Gemini 2.0 Flash: Excellent balance of speed and accuracy, particularly for image analysis

The current implementation uses Gemini 2.0 Flash through Firebase’s Vertex AI SDK, which plugs directly into the other Firebase services the apps already use and accepts text and image input in a single request.

Move Explanations and Voice Synthesis

The move explanation system combines LLM analysis with voice synthesis:

class AIService {
  final LLMService _llmService;
  bool _isTtsInitialized = false;

  AIService(this._llmService);

  Future<String> generateMoveExplanation(
    String playerName,
    Move move,
    int currentScore,
  ) async {
    final prompt = '''
    Explain this Scrabble move played by $playerName:
    - Word: ${move.word}
    - Score: ${move.score} points
    - Tiles: ${move.tiles.map((t) => '${t.letter}(${t.points})').join(', ')}
    - Current total: $currentScore points

    Keep it brief but informative in 2 sentences maximum.
    ''';

    return await _llmService.generateExplanation(prompt);
  }

  Future<List<int>> convertToSpeech(String text, AppLanguage language) async {
    if (!_isTtsInitialized) {
      await _initializeTts();
    }

    final targetVoice = language == AppLanguage.english
        ? 'en-US-Wavenet-I'  // English male voice
        : 'fr-FR-Wavenet-D'; // French male voice

    final params = TtsParamsGoogle(
      voice: targetVoice,
      audioFormat: AudioOutputFormatGoogle.linear16,
      text: text,
    );

    final ttsResponse = await TtsGoogle.convertTts(params);
    return ttsResponse.audio.buffer.asUint8List().toList();
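  }

  // Prepares the TTS client once, before the first conversion.
  Future<void> _initializeTts() async {
    // Provider-specific setup (credentials, client options) goes here.
    _isTtsInitialized = true;
  }
}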

On the voice side, I:

Implemented Google Cloud TTS for bilingual support (French/English)
Tested ElevenLabs for more natural-sounding voices
Created a flexible voice provider system for easy switching between services
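A voice provider system like this can be as small as an interface plus a registry. The sketch below is hypothetical: VoiceProvider, VoiceProviderRegistry and FakeVoiceProvider are illustrative names, and real implementations would wrap the Google Cloud TTS and ElevenLabs clients behind synthesize.

```dart
/// Hypothetical interface: each TTS backend turns text into audio bytes.
abstract class VoiceProvider {
  String get name;
  Future<List<int>> synthesize(String text, String languageCode);
}

/// Selects the active provider by name, so switching services is one call.
class VoiceProviderRegistry {
  final Map<String, VoiceProvider> _providers = {};
  String? _activeName;

  void register(VoiceProvider provider) {
    _providers[provider.name] = provider;
    _activeName ??= provider.name; // the first registered provider is the default
  }

  void use(String name) {
    if (!_providers.containsKey(name)) {
      throw ArgumentError.value(name, 'name', 'Unknown voice provider');
    }
    _activeName = name;
  }

  VoiceProvider get active {
    final name = _activeName;
    if (name == null) throw StateError('No voice provider registered');
    return _providers[name]!;
  }

  Future<List<int>> speak(String text, String languageCode) =>
      active.synthesize(text, languageCode);
}

/// Stand-in provider for tests; a real one would call a TTS service.
class FakeVoiceProvider implements VoiceProvider {
  @override
  final String name;
  FakeVoiceProvider(this.name);

  @override
  Future<List<int>> synthesize(String text, String languageCode) async =>
      '$name:$languageCode:$text'.codeUnits;
}
```

Because callers only talk to the registry, moving from Google Cloud TTS to ElevenLabs becomes a single use('…') call instead of a change at every call site.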

Implementation Highlights

Real-time Game State Management

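import 'package:flutter/foundation.dart';

class GameStateProvider with ChangeNotifier {
  final String sessionId;
  final FirebaseService _firebaseService = FirebaseService();

  GameStateProvider(this.sessionId);

  // Real-time board state sync
  Stream<BoardState> getBoardState() {
    return _firebaseService.getBoardState(sessionId);
  }

  // Move processing
  Future<void> processMove(Move move) async {
    // AI analysis & validation
    // Score calculation
    // State updates
    notifyListeners(); // let listening widgets rebuild
  }
}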
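The score-calculation step in processMove is only a comment above. One way to fill it in is sketched below under stated assumptions: Premium, ScoredTile and scoreWord are hypothetical names, the premium layout is the standard 15x15 Scrabble board (generated from a few squares by symmetry), premiums count only for tiles placed this turn, and playing all seven tiles earns the usual 50-point bonus. Cross-words formed by the move are not scored here.

```dart
enum Premium { doubleLetter, tripleLetter, doubleWord, tripleWord }

/// Standard premium squares, keyed "row,col" (0-based).
final Map<String, Premium> premiumSquares = _buildPremiums();

Map<String, Premium> _buildPremiums() {
  // One representative per symmetry class; the loops below mirror each one
  // across both diagonals and both center lines.
  const base = {
    Premium.tripleWord: [[0, 0], [0, 7]],
    Premium.doubleWord: [[1, 1], [2, 2], [3, 3], [4, 4], [7, 7]],
    Premium.tripleLetter: [[1, 5], [5, 5]],
    Premium.doubleLetter: [[0, 3], [2, 6], [3, 7], [6, 6]],
  };
  final squares = <String, Premium>{};
  base.forEach((premium, cells) {
    for (final cell in cells) {
      for (final rc in [[cell[0], cell[1]], [cell[1], cell[0]]]) {
        for (final r in [rc[0], 14 - rc[0]]) {
          for (final c in [rc[1], 14 - rc[1]]) {
            squares['$r,$c'] = premium;
          }
        }
      }
    }
  });
  return squares;
}

class ScoredTile {
  final String letter;
  final int points;
  final int row;
  final int col;
  final bool isNew; // placed this turn; only new tiles use premium squares
  const ScoredTile(this.letter, this.points, this.row, this.col, {this.isNew = true});
}

/// Scores the main word formed by a move (cross-words are not included).
int scoreWord(List<ScoredTile> wordTiles) {
  var sum = 0;
  var wordMultiplier = 1;
  for (final t in wordTiles) {
    var letterScore = t.points;
    final premium = t.isNew ? premiumSquares['${t.row},${t.col}'] : null;
    if (premium == Premium.doubleLetter) letterScore *= 2;
    if (premium == Premium.tripleLetter) letterScore *= 3;
    if (premium == Premium.doubleWord) wordMultiplier *= 2;
    if (premium == Premium.tripleWord) wordMultiplier *= 3;
    sum += letterScore;
  }
  // 50-point bonus for using all seven rack tiles in one turn.
  final newTiles = wordTiles.where((t) => t.isNew).length;
  return sum * wordMultiplier + (newTiles == 7 ? 50 : 0);
}
```

For example, CAT played across the center square (row 7, columns 6 to 8) scores (3 + 1 + 1) × 2 = 10.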

AI Move Analysis Pipeline

Capture board state through camera
Process image with computer vision
Validate move against game rules
Generate natural language explanation
Convert explanation to speech
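The five steps above can be wired together as injected functions, so the camera, model, validator and TTS backend can each be swapped or faked independently. This is a hypothetical sketch (MovePipeline, MoveAnalysis and the stage names are illustrative), not the project's actual structure; it stops before the explanation and speech steps when validation fails.

```dart
/// Result of one successful pass through the pipeline.
class MoveAnalysis {
  final Map<String, dynamic> move;
  final String explanation;
  final List<int> audio;
  const MoveAnalysis(this.move, this.explanation, this.audio);
}

/// Hypothetical pipeline: each stage is an injected function.
class MovePipeline {
  final Future<List<int>> Function() captureImage;
  final Future<Map<String, dynamic>> Function(List<int> image) recognizeMove;
  final List<String> Function(Map<String, dynamic> move) validateMove;
  final Future<String> Function(Map<String, dynamic> move) explainMove;
  final Future<List<int>> Function(String text) speak;

  MovePipeline({
    required this.captureImage,
    required this.recognizeMove,
    required this.validateMove,
    required this.explainMove,
    required this.speak,
  });

  Future<MoveAnalysis> run() async {
    final image = await captureImage(); // 1. capture the board
    final move = await recognizeMove(image); // 2. computer vision
    final errors = validateMove(move); // 3. game-rule validation
    if (errors.isNotEmpty) {
      // Skip explanation and speech for moves that fail validation.
      throw StateError('Invalid move: ${errors.join('; ')}');
    }
    final explanation = await explainMove(move); // 4. natural-language explanation
    final audio = await speak(explanation); // 5. text-to-speech
    return MoveAnalysis(move, explanation, audio);
  }
}
```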

Cross-Platform Considerations

Moderator app optimized for mobile camera usage
Companion app designed for tablet viewing
Shared codebase for game logic
Platform-specific UI optimizations

Challenges and Learnings

Computer Vision Accuracy

The biggest challenge was achieving reliable board state capture. Some strategies I implemented:

Grid overlay for better image alignment
Multiple image processing attempts
Manual correction capabilities
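Multiple processing attempts with manual correction as the fallback might look like the following sketch. analyzeWithRetries and AttemptResult are hypothetical names. The idea is to re-run the image analysis until a validator (such as a placement check) accepts the result, to treat API or parsing errors as failed attempts, and to hand the last errors to the moderator for manual correction if every attempt fails.

```dart
/// Outcome of the retry loop.
class AttemptResult<T> {
  final T? value; // null when no attempt passed validation
  final int attempts;
  final List<String> lastErrors;
  const AttemptResult(this.value, this.attempts, this.lastErrors);
  bool get succeeded => value != null;
}

/// Re-runs [analyze] until [validate] reports no errors or [maxAttempts] is reached.
Future<AttemptResult<T>> analyzeWithRetries<T extends Object>(
  Future<T> Function() analyze,
  List<String> Function(T result) validate, {
  int maxAttempts = 3,
}) async {
  var errors = <String>[];
  for (var attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      final result = await analyze();
      errors = validate(result);
      if (errors.isEmpty) return AttemptResult<T>(result, attempt, const []);
    } catch (e) {
      // An API or parsing failure counts as a failed attempt.
      errors = ['Attempt $attempt failed: $e'];
    }
  }
  // Every attempt failed: the caller can open the manual-correction UI.
  return AttemptResult<T>(null, maxAttempts, errors);
}
```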

Prompt Engineering Techniques

One of the most interesting aspects of this project was crafting effective prompts. Here’s what I learned:

Board State Analysis Prompts

For capturing board state, specificity and constraints were crucial:

String _constructImageComparisonPrompt(Map<String, dynamic> previousState) {
  return '''
  Compare this image of the Scrabble board with the previous board state given below as JSON.
  Identify ONLY the letters that are new in this image.

  Previous board state for reference:
  ${jsonEncode(previousState)}

  Return ONLY a JSON object in exactly this format:
  {
    "word": "EXAMPLE",
    "score": 15,
    "newLetters": [
      {
        "letter": "A",
        "row": 7,
        "col": 7,
        "points": 1
      }
    ]
  }

  Rules:
  - Use 0-based indices (0-14) for coordinates
  - All coordinates must be within the 15x15 grid
  - Return ONLY the JSON, no explanatory text
  - If no valid word was played, return: {"word": "", "score": 0, "newLetters": []}
  ''';
}
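Since the prompt states its rules explicitly, the app can check the model's answer against the same rules before trusting it. Here is a minimal sketch, assuming the response has already been decoded into a map and that the previous state uses the same {"board": [...]} shape as the initial-board prompt. checkComparisonResponse is a hypothetical name, and blank tiles are not handled.

```dart
/// Hypothetical check of a decoded comparison response against the prompt's rules.
/// Returns a list of problems; an empty list means the response looks usable.
List<String> checkComparisonResponse(
  Map<String, dynamic> response,
  Map<String, dynamic> previousState,
) {
  final word = response['word'];
  final score = response['score'];
  final newLetters = response['newLetters'];
  if (word is! String || score is! int || newLetters is! List) {
    return ['Expected a string "word", an integer "score" and a "newLetters" list'];
  }

  // Squares filled before this move, as "row,col" keys (assumed state shape).
  final occupied = <String>{
    for (final t in (previousState['board'] as List? ?? const []))
      '${t['row']},${t['col']}',
  };

  final problems = <String>[];
  // The prompt's sentinel: no word means no letters and a score of 0.
  if (word.isEmpty != newLetters.isEmpty) {
    problems.add('"word" and "newLetters" must both be empty or both be filled');
  }
  if (word.isEmpty && score != 0) {
    problems.add('"score" must be 0 when no word was played');
  }
  for (final entry in newLetters) {
    if (entry is! Map) {
      problems.add('Each entry in "newLetters" must be an object');
      continue;
    }
    final letter = entry['letter'];
    final row = entry['row'];
    final col = entry['col'];
    if (letter is! String || !RegExp(r'^[A-Z]$').hasMatch(letter)) {
      problems.add('Unexpected letter: $letter');
    }
    // 0-based indices must stay inside the 15x15 grid.
    if (row is! int || col is! int || row < 0 || row > 14 || col < 0 || col > 14) {
      problems.add('($row, $col) is outside the 15x15 grid');
    } else if (occupied.contains('$row,$col')) {
      problems.add('($row, $col) was already occupied before this move');
    }
  }
  return problems;
}
```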

Move Explanation Prompts

For move explanations, I found that “role-playing” and context-setting improved results:

String createMoveExplanationPrompt(String playerName, Move move, int currentScore) {
  return '''
  You are an enthusiastic Scrabble commentator.
  
  Explain this move by $playerName:
  - Word: ${move.word}
  - Score for this move: ${move.score} points
  - Tiles placed: ${move.tiles.map((t) => '${t.letter}(${t.points})').join(', ')}
  - Current total after this move: $currentScore

  Focus on:
  1. Strategic value of the move
  2. Clever use of board multipliers
  3. Point calculation highlights

  Keep it brief but engaging in 2 sentences.
  ''';
}

Key Learnings

1. Structured Output

Always specify exact output format
Use JSON for structured data
Include example responses in prompts

2. Context Management

Provide relevant game state
Include previous moves when needed
Set clear role and tone expectations

3. Multilingual Considerations

Maintain same structure across languages
Adapt cultural references appropriately
Keep consistent tone and expertise level

This approach to prompt engineering resulted in more consistent and reliable responses, while maintaining the engaging and educational aspect of the game.
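The multilingual guidelines can be enforced by building every language's prompt from one template and varying only the output-language line. The sketch below is hypothetical: buildExplanationPrompt is an illustrative name, and AppLanguage mirrors the enum used in AIService, assuming it has english and french values.

```dart
/// Mirrors the app's language enum (assumed to have these two values).
enum AppLanguage { english, french }

/// Hypothetical helper: one template for every language, so structure, tone
/// and constraints stay identical and only the output language differs.
String buildExplanationPrompt({
  required String playerName,
  required String word,
  required int moveScore,
  required int totalScore,
  required AppLanguage language,
}) {
  // Exhaustive switch: a new enum value will not compile until handled here.
  final outputLanguage = switch (language) {
    AppLanguage.english => 'Respond in English.',
    AppLanguage.french => 'Respond in French.',
  };
  return '''
You are an enthusiastic Scrabble commentator.

Explain this move by $playerName:
- Word: $word
- Score for this move: $moveScore points
- Current total after this move: $totalScore

Keep it brief but engaging in 2 sentences.
$outputLanguage
''';
}
```

Keeping the instructions themselves in one language and asking for the output language separately means a prompt tweak only has to be made once.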

Real-time Sync

Firebase made real-time synchronization straightforward, but it required careful planning for:

State consistency across devices
Handling network interruptions
Managing game session lifecycle
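For network interruptions, one approach is to number each move and apply moves strictly in sequence on the client. This is a sketch under assumptions: the moderator app assigns sequence numbers 0, 1, 2, and so on, and SyncedMove and MoveLog are hypothetical names. A listener that resubscribes after a reconnect may see moves it has already applied; keying by sequence number makes that re-delivery harmless, and any move that arrives before its predecessor is held back until the gap is filled.

```dart
/// A move as synced between devices, numbered by the moderator app.
class SyncedMove {
  final int sequence; // 0, 1, 2, ... in play order
  final String word;
  final int score;
  const SyncedMove(this.sequence, this.word, this.score);
}

/// Applies synced moves exactly once and in order.
class MoveLog {
  final List<SyncedMove> _applied = [];
  final Map<int, SyncedMove> _pending = {};

  List<SyncedMove> get applied => List.unmodifiable(_applied);
  int get totalScore => _applied.fold(0, (sum, m) => sum + m.score);

  void receive(Iterable<SyncedMove> incoming) {
    for (final move in incoming) {
      // Anything below the next expected number was already applied: ignore it.
      if (move.sequence >= _applied.length) _pending[move.sequence] = move;
    }
    // Apply buffered moves only while they directly follow the last applied one.
    while (_pending.containsKey(_applied.length)) {
      _applied.add(_pending.remove(_applied.length)!);
    }
  }
}
```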

Future Improvements

1. Enhanced Image Recognition

Implementing better board detection algorithms
Adding support for different board layouts
Improving accuracy in various lighting conditions

2. Advanced AI Features

Move suggestion capabilities
Strategy analysis
Learning patterns from gameplay

3. Voice Synthesis

Exploring ElevenLabs integration
Adding more language support
Improving natural speech patterns

Testing Notes

For developers interested in testing the board recognition features, I highly recommend using ScrabbleCam. It provides a consistent way to test board capture functionality without needing a physical board.

Conclusion

Building this project has been a fantastic journey in combining AI technologies with real-world gaming. The most rewarding part has been seeing my kids’ reactions to AI-powered move explanations and how it adds a new dimension to our game.

Check out the code on GitHub to explore the implementation details or contribute to the project!

Technical Stack Summary

Flutter for cross-platform development
Firebase for real-time synchronization
Vertex AI for LLM integration
Google Cloud TTS for voice synthesis
Computer vision for board state capture

This project is open source and available on GitHub. Feel free to explore, contribute, or adapt it for your own AI experiments!