Building a Scrabble Companion with Gemini 2.0

February 17, 2025

A Technical Journey

When my kids and I play Scrabble, keeping track of scores and validating moves can sometimes take away from the fun of the game itself. This inspired me to build an AI-powered companion that could handle these tasks while adding an educational twist. Here’s how I built it and what I learned along the way.

The Vision

I wanted to create two applications that would work together:

  1. A moderator app that uses AI to capture and validate game states
  2. A companion app that provides real-time game insights and AI-powered move explanations

Companion App: adjusting board position in the camera

Technical Architecture

Technical architecture diagram for the Scrabble companion project

Flutter & Firebase: The Foundation

The project is built on Flutter for both apps, with Firebase handling real-time synchronization. This combination provides:

  • Cross-platform compatibility
  • Real-time data sync between moderator and companion apps
  • Reliable state management
  • Smooth animations for game state visualization

AI Integration Stack

Computer Vision with Gemini

For board state capture, I implemented:

  • Image processing to identify board state
  • AI-powered OCR to recognize letters and positions
  • Position validation using board rules
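
To make the position validation step concrete, here is a minimal sketch of the kind of checks it could run on recognized tiles. The `Placement` class and `validatePlacements` function are hypothetical names for illustration, not the project's actual code, and only three standard rules are covered: tiles must be on the 15x15 board, share one row or column, and cover the center square on the first move.

```dart
/// A tile placed on the board, using 0-based row/col indices.
/// Hypothetical type for this sketch.
class Placement {
  final String letter;
  final int row;
  final int col;
  const Placement(this.letter, this.row, this.col);
}

const int boardSize = 15;
const int center = 7; // center square of a 15x15 board, 0-based

/// Returns null when the placements pass the checks, else a reason string.
String? validatePlacements(List<Placement> tiles, {required bool isFirstMove}) {
  if (tiles.isEmpty) return 'no tiles placed';

  // Every tile must sit inside the grid.
  for (final t in tiles) {
    final inBounds =
        t.row >= 0 && t.row < boardSize && t.col >= 0 && t.col < boardSize;
    if (!inBounds) return 'tile ${t.letter} is off the board';
  }

  // All tiles of one move must share a single row or a single column.
  final sameRow = tiles.every((t) => t.row == tiles.first.row);
  final sameCol = tiles.every((t) => t.col == tiles.first.col);
  if (!sameRow && !sameCol) return 'tiles must share one row or one column';

  // The opening move has to cover the center square.
  if (isFirstMove && !tiles.any((t) => t.row == center && t.col == center)) {
    return 'first move must cover the center square';
  }
  return null;
}
```

A check like this is cheap to run before anything is written to Firebase, so obviously misread coordinates can be caught early and sent to manual correction.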

While the accuracy isn’t perfect yet, it provides a solid foundation for game state capture.

Move Analysis with Multiple LLMs

I experimented with several LLM providers, all integrated through Vertex AI for consistency. Here’s how the integration works:

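import 'dart:io';

import 'package:firebase_vertexai/firebase_vertexai.dart';

// ImageStorageService and FirebaseService are project classes (imports omitted).
class GeminiService {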
  late GenerativeModel _model;
  final ImageStorageService _imageStorage = ImageStorageService();
  final FirebaseService _firebaseService = FirebaseService();

  GeminiService() {
    // Initialize with Gemini 2.0 Flash
    _model = FirebaseVertexAI.instance
        .generativeModel(model: 'gemini-2.0-flash');
  }

  Future<Map<String, dynamic>> analyzeBoardImage(
    String sessionId,
    String imagePath,
  ) async {
    try {
      // Read current image bytes
      final currentImageBytes = await File(imagePath).readAsBytes();

      // Get board state
      final boardState = await _firebaseService.getBoardState(sessionId).first;
      final isFirstMove = boardState.isEmpty;

      if (isFirstMove) {
        // Handle first move analysis
        final response = await _model.generateContent([
          Content.multi([
            TextPart(_constructInitialBoardPrompt()),
            DataPart('image/jpeg', currentImageBytes),
          ]),
        ]);

        return {
          'status': 'success',
          'type': 'initial',
          'data': _parseGeminiResponse(response.text!, true),
        };
      } else {
        // Compare with previous state
        final response = await _model.generateContent([
          Content.multi([
            TextPart(_constructImageComparisonPrompt(boardState)),
            DataPart('image/jpeg', currentImageBytes),
          ]),
        ]);

        return {
          'status': 'success',
          'type': 'move',
          'data': _parseGeminiResponse(response.text!, false),
        };
      }
    } catch (e) {
      return {
        'status': 'error',
        'message': e.toString(),
      };
    }
  }

  // Prompt construction for initial board analysis
  String _constructInitialBoardPrompt() {
    return '''
    You are analyzing an image of an initial Scrabble board move.
    Accurately identify all visible letters and their positions.
    Return ONLY a JSON with this format:
    {
      "board": [
        {
          "letter": "A",
          "row": 7,
          "col": 7,
          "points": 1
        }
      ]
    }
    ''';
  }
}
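
The `_parseGeminiResponse` helper is not shown above. As a rough idea of what such a step has to handle, the sketch below extracts a JSON object from a model reply that may be wrapped in Markdown fences or extra text, even when the prompt asks for JSON only. `parseModelJson` is a hypothetical name, and this is an assumption about the approach rather than the project's actual parser.

```dart
import 'dart:convert';

/// Extracts the outermost JSON object from a model reply, tolerating
/// Markdown code fences or stray text before and after it.
/// Hypothetical helper for illustration.
Map<String, dynamic> parseModelJson(String reply) {
  final start = reply.indexOf('{');
  final end = reply.lastIndexOf('}');
  if (start == -1 || end <= start) {
    throw const FormatException('no JSON object found in reply');
  }

  // jsonDecode throws a FormatException itself if the slice is malformed.
  final decoded = jsonDecode(reply.substring(start, end + 1));
  if (decoded is! Map<String, dynamic>) {
    throw const FormatException('reply JSON is not an object');
  }
  return decoded;
}
```

Because failures surface as exceptions, they would be picked up by a surrounding `try`/`catch` like the one in `analyzeBoardImage` and reported with an error status.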

Each LLM provider offered different strengths:

  • DeepSeek: Provided detailed move explanations with strategic insights
  • Gemini 2.0 Flash: Excellent balance of speed and accuracy, particularly for image analysis

The current implementation uses Gemini 2.0 Flash through Firebase’s Vertex AI SDK, which fits alongside the other Firebase services the apps already use and lets a single model handle both text and image analysis.

Move Explanations and Voice Synthesis

The move explanation system combines LLM analysis with voice synthesis:

class AIService {
  final LLMService _llmService;
  bool _isTtsInitialized = false;

  AIService(this._llmService);

  Future<String> generateMoveExplanation(
    String playerName,
    Move move,
    int currentScore,
  ) async {
    final prompt = '''
    Explain this Scrabble move played by $playerName:
    - Word: ${move.word}
    - Score: ${move.score} points
    - Tiles: ${move.tiles.map((t) => '${t.letter}(${t.points})').join(', ')}
    - Current total: $currentScore points

    Keep it brief but informative in 2 sentences maximum.
    ''';

    return await _llmService.generateExplanation(prompt);
  }

  Future<List<int>> convertToSpeech(String text, AppLanguage language) async {
    if (!_isTtsInitialized) {
      await _initializeTts();
    }

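    final targetVoice = language == AppLanguage.english
        ? 'en-US-Wavenet-I'  // English male voice
        : 'fr-FR-Wavenet-D'; // French male voice

    // TtsParamsGoogle expects a voice object, so resolve the name first.
    // Assumes the package's voice list exposes the Google voice name as `code`.
    final voices = (await TtsGoogle.getVoices()).voices;
    final voice = voices.firstWhere((v) => v.code == targetVoice);

    final params = TtsParamsGoogle(
      voice: voice,
      audioFormat: AudioOutputFormatGoogle.linear16,
      text: text,
    );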

    final ttsResponse = await TtsGoogle.convertTts(params);
    return ttsResponse.audio.buffer.asUint8List().toList();
  }
}

On the voice side, I:

  • Implemented Google Cloud TTS for bilingual support (French/English)
  • Tested ElevenLabs for more natural voice qualities
  • Created a flexible voice provider system for easy switching between services
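
A switchable voice provider layer can be sketched as a small interface plus a registry. Everything here is hypothetical (`VoiceProvider`, `VoiceRegistry`, and the two fake providers that return placeholder bytes instead of calling Google Cloud TTS or ElevenLabs); it only illustrates the shape such a system might take.

```dart
/// Common interface so callers do not depend on a specific TTS vendor.
abstract class VoiceProvider {
  String get name;
  Future<List<int>> synthesize(String text, String languageCode);
}

/// Placeholder standing in for a Google Cloud TTS-backed provider.
class FakeGoogleVoice implements VoiceProvider {
  @override
  String get name => 'google';

  @override
  Future<List<int>> synthesize(String text, String languageCode) async =>
      text.codeUnits; // stand-in for real audio bytes
}

/// Placeholder standing in for an ElevenLabs-backed provider.
class FakeElevenLabsVoice implements VoiceProvider {
  @override
  String get name => 'elevenlabs';

  @override
  Future<List<int>> synthesize(String text, String languageCode) async =>
      text.codeUnits.reversed.toList(); // stand-in for real audio bytes
}

/// Keeps the registered providers and tracks which one is active.
class VoiceRegistry {
  final Map<String, VoiceProvider> _providers = {};
  String? _active;

  /// Registers a provider; the first one registered becomes active.
  void register(VoiceProvider provider) {
    _providers[provider.name] = provider;
    _active ??= provider.name;
  }

  /// Switches the active provider by name.
  void use(String name) {
    if (!_providers.containsKey(name)) {
      throw ArgumentError('unknown provider: $name');
    }
    _active = name;
  }

  VoiceProvider get active {
    final name = _active;
    if (name == null) throw StateError('no provider registered');
    return _providers[name]!;
  }
}
```

With this shape, the rest of the app only calls `registry.active.synthesize(...)`, and trying a different service is a one-line `use('elevenlabs')` change.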

Implementation Highlights

Real-time Game State Management

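import 'package:flutter/foundation.dart';

class GameStateProvider with ChangeNotifier {
  GameStateProvider(this.sessionId);

  // Session whose board state this provider mirrors
  final String sessionId;

  // Real-time board state sync
  Stream<BoardState> getBoardState() {
    return FirebaseService().getBoardState(sessionId);
  }

  // Move processing (outline only; the individual steps are elided)
  Future<void> processMove(Move move) async {
    // AI analysis & validation
    // Score calculation
    // State updates
    notifyListeners(); // let listening widgets rebuild
  }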
}
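
The score calculation step in `processMove` is only outlined above. A minimal sketch of scoring a single word might look like the following, using the standard English tile values and applying square bonuses only to newly placed tiles. The `ScoredTile` and `Bonus` types are hypothetical, and the sketch deliberately ignores cross-words and the 50-point bonus for playing all seven tiles.

```dart
/// Standard English Scrabble tile values.
const Map<String, int> letterPoints = {
  'A': 1, 'B': 3, 'C': 3, 'D': 2, 'E': 1, 'F': 4, 'G': 2, 'H': 4, 'I': 1,
  'J': 8, 'K': 5, 'L': 1, 'M': 3, 'N': 1, 'O': 1, 'P': 3, 'Q': 10, 'R': 1,
  'S': 1, 'T': 1, 'U': 1, 'V': 4, 'W': 4, 'X': 8, 'Y': 4, 'Z': 10,
};

enum Bonus { none, doubleLetter, tripleLetter, doubleWord, tripleWord }

/// One tile of the word being scored. Hypothetical type for this sketch.
class ScoredTile {
  final String letter;
  final Bonus bonus; // bonus printed on the square under the tile
  final bool isNew; // bonuses only count for tiles placed this turn
  const ScoredTile(this.letter, {this.bonus = Bonus.none, this.isNew = true});
}

/// Scores one word: letter bonuses first, then word multipliers.
int scoreWord(List<ScoredTile> tiles) {
  var sum = 0;
  var wordMultiplier = 1;
  for (final t in tiles) {
    // Unknown letters (for example blanks) score 0.
    var points = letterPoints[t.letter.toUpperCase()] ?? 0;
    if (t.isNew) {
      switch (t.bonus) {
        case Bonus.doubleLetter:
          points *= 2;
          break;
        case Bonus.tripleLetter:
          points *= 3;
          break;
        case Bonus.doubleWord:
          wordMultiplier *= 2;
          break;
        case Bonus.tripleWord:
          wordMultiplier *= 3;
          break;
        case Bonus.none:
          break;
      }
    }
    sum += points;
  }
  return sum * wordMultiplier;
}
```

For example, CAT played as an opening move with the A on the center double-word square scores (3 + 1 + 1) × 2 = 10. Computing the score locally also gives a way to cross-check the `score` value the model returns.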

AI Move Analysis Pipeline

  1. Capture board state through camera
  2. Process image with computer vision
  3. Validate move against game rules
  4. Generate natural language explanation
  5. Convert explanation to speech
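
The five steps above can be sketched as one async function that takes each stage as a callback. The names (`runMovePipeline`, `MoveOutcome`, and the parameter names) are hypothetical; in the real apps the callbacks would be backed by the camera, Gemini, the rules check, the LLM, and the TTS service.

```dart
/// Result of running one move through the pipeline. Hypothetical type.
class MoveOutcome {
  final bool valid;
  final String explanation;
  final List<int> audio;
  const MoveOutcome({
    required this.valid,
    this.explanation = '',
    this.audio = const [],
  });
}

/// Runs the five stages in order and stops early on an invalid move.
Future<MoveOutcome> runMovePipeline({
  required Future<List<int>> Function() captureImage,
  required Future<Map<String, dynamic>> Function(List<int> image) recognizeBoard,
  required bool Function(Map<String, dynamic> move) validateMove,
  required Future<String> Function(Map<String, dynamic> move) explainMove,
  required Future<List<int>> Function(String text) synthesizeSpeech,
}) async {
  final image = await captureImage(); // 1. capture
  final move = await recognizeBoard(image); // 2. computer vision
  if (!validateMove(move)) {
    // 3. validation failed: skip explanation and speech
    return const MoveOutcome(valid: false);
  }
  final explanation = await explainMove(move); // 4. explanation
  final audio = await synthesizeSpeech(explanation); // 5. speech
  return MoveOutcome(valid: true, explanation: explanation, audio: audio);
}
```

Passing the stages in as functions keeps the ordering logic testable with simple stubs, without a camera or network access.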

Cross-Platform Considerations

  • Moderator app optimized for mobile camera usage
  • Companion app designed for tablet viewing
  • Shared codebase for game logic
  • Platform-specific UI optimizations

Challenges and Learnings

Computer Vision Accuracy

The biggest challenge was achieving reliable board state capture. Some strategies I implemented:

  • Grid overlay for better image alignment
  • Multiple image processing attempts
  • Manual correction capabilities
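
The "multiple attempts" strategy can be expressed as a small generic retry helper: run the recognition, accept the first result that passes a validity check, and give up after a fixed number of tries so the user can fall back to manual correction. `retryUntilValid` is a hypothetical helper, not code from the project.

```dart
/// Runs [run] up to [maxAttempts] times and returns the first result that
/// satisfies [isValid]. Throws a StateError if every attempt fails.
Future<T> retryUntilValid<T>(
  Future<T> Function(int attempt) run, {
  required bool Function(T result) isValid,
  int maxAttempts = 3,
}) async {
  Object? lastError;
  for (var attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      final result = await run(attempt);
      if (isValid(result)) return result;
      lastError = StateError('attempt $attempt produced an invalid result');
    } catch (e) {
      // Remember the failure and try again.
      lastError = e;
    }
  }
  throw StateError('all $maxAttempts attempts failed: $lastError');
}
```

In this setup, `run` would wrap a call such as `analyzeBoardImage`, and `isValid` would check that the returned map has a `'success'` status and in-range coordinates.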

Prompt Engineering Techniques

One of the most interesting aspects of this project was crafting effective prompts. Here’s what I learned:

Board State Analysis Prompts

For capturing board state, specificity and constraints were crucial:

String _constructImageComparisonPrompt(Map<String, dynamic> previousState) {
  return '''
  Compare this Scrabble board image, taken after a move, with the previous board state given below.
  Identify ONLY the letters that are new in the image.

  Previous board state for reference:
  ${jsonEncode(previousState)}

  Return ONLY a JSON object in exactly this format:
  {
    "word": "EXAMPLE",
    "score": 15,
    "newLetters": [
      {
        "letter": "A",
        "row": 7,
        "col": 7,
        "points": 1
      }
    ]
  }

  Rules:
  - Use 0-based indices (0-14) for coordinates
  - All coordinates must be within the 15x15 grid
  - Return ONLY the JSON, no explanatory text
  - If no valid word was played, return: {"word": "", "score": 0, "newLetters": []}
  ''';
}
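
Since the model may not always follow the rules listed in the prompt, it can help to check the decoded reply against them before trusting it. The sketch below is one possible check, with the hypothetical name `checkMoveJson`; it assumes occupied squares are tracked as `'row,col'` strings, which is an illustration choice and not necessarily how the project stores its board.

```dart
/// Checks a decoded move against the rules stated in the prompt and returns
/// the list of problems found (empty when the move passes).
List<String> checkMoveJson(Map<String, dynamic> move, Set<String> occupied) {
  final problems = <String>[];
  final word = move['word'];
  final score = move['score'];
  final letters = move['newLetters'];

  if (word is! String) problems.add('"word" must be a string');
  if (score is! int || score < 0) {
    problems.add('"score" must be a non-negative integer');
  }
  if (letters is! List) {
    problems.add('"newLetters" must be a list');
    return problems;
  }

  for (final item in letters) {
    if (item is! Map) {
      problems.add('letter entry is not an object');
      continue;
    }
    final row = item['row'];
    final col = item['col'];
    // The prompt asks for 0-based indices inside the 15x15 grid.
    if (row is! int || col is! int || row < 0 || row > 14 || col < 0 || col > 14) {
      problems.add('coordinates out of range: ($row, $col)');
      continue;
    }
    // A "new" letter cannot land on a square that already holds a tile.
    if (occupied.contains('$row,$col')) {
      problems.add('square ($row, $col) already holds a tile');
    }
  }
  return problems;
}
```

An empty list means the move can be applied; anything else is a good trigger for another recognition attempt or a manual correction prompt.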

Move Explanation Prompts

For move explanations, I found that “role-playing” and context-setting improved results:

String createMoveExplanationPrompt(String playerName, Move move, int currentScore) {
  return '''
  You are an enthusiastic Scrabble commentator.
  
  Explain this move by $playerName:
  - Word: ${move.word}
  - Score for this move: ${move.score} points
  - Tiles placed: ${move.tiles.map((t) => '${t.letter}(${t.points})').join(', ')}
  - Current total after this move: $currentScore

  Focus on:
  1. Strategic value of the move
  2. Clever use of board multipliers
  3. Point calculation highlights

  Keep it brief but engaging in 2 sentences.
  ''';
}

Key Learnings

1. Structured Output

  • Always specify exact output format
  • Use JSON for structured data
  • Include example responses in prompts

2. Context Management

  • Provide relevant game state
  • Include previous moves when needed
  • Set clear role and tone expectations

3. Multilingual Considerations

  • Maintain same structure across languages
  • Adapt cultural references appropriately
  • Keep consistent tone and expertise level
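
One way to keep the same prompt structure across languages is to hold the translated labels in a table and build every prompt from a single template. This is a minimal sketch under that assumption; `PromptLanguage`, `buildPrompt`, and the French wording are illustrative, not the project's actual prompts.

```dart
enum PromptLanguage { english, french }

/// Translated fragments; every language must provide the same keys.
const Map<PromptLanguage, Map<String, String>> _labels = {
  PromptLanguage.english: {
    'role': 'You are an enthusiastic Scrabble commentator.',
    'word': 'Word',
    'score': 'Score for this move',
    'limit': 'Keep it brief but engaging in 2 sentences.',
  },
  PromptLanguage.french: {
    'role': 'Tu es un commentateur de Scrabble enthousiaste.',
    'word': 'Mot',
    'score': 'Score pour ce coup',
    'limit': 'Reste bref mais captivant, en 2 phrases.',
  },
};

/// Builds the prompt from one shared template, so only the labels differ.
String buildPrompt(PromptLanguage language, String word, int score) {
  final l = _labels[language]!;
  return [
    l['role']!,
    '- ${l['word']!}: $word',
    '- ${l['score']!}: $score',
    l['limit']!,
  ].join('\n');
}
```

Because both languages go through the same template, a structural change (say, adding a "tiles placed" line) only has to be made once.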

This approach to prompt engineering resulted in more consistent and reliable responses, while maintaining the engaging and educational aspect of the game.

Real-time Sync

Firebase made real-time synchronization straightforward, but required careful planning for:

  • State consistency across devices
  • Handling network interruptions
  • Managing game session lifecycle
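
For state consistency, one common approach is to attach a version number to each board snapshot and ignore anything older than what a device already has, which also covers late or duplicate updates after a network interruption. The sketch below illustrates that idea with hypothetical `BoardSnapshot` and `SnapshotReconciler` classes; it is an assumption about how this could be handled, not a description of the project's Firebase schema.

```dart
/// A board state tagged with a monotonically increasing version.
class BoardSnapshot {
  final int version;
  final Map<String, String> tiles; // key 'row,col' -> letter
  const BoardSnapshot(this.version, this.tiles);
}

/// Keeps the newest snapshot seen so far and rejects stale ones.
class SnapshotReconciler {
  BoardSnapshot _current = const BoardSnapshot(0, {});

  BoardSnapshot get current => _current;

  /// Applies [incoming] only if it is newer; returns whether it was applied.
  bool apply(BoardSnapshot incoming) {
    if (incoming.version <= _current.version) return false;
    _current = incoming;
    return true;
  }
}
```

Each device would feed incoming updates through `apply` and redraw only when it returns `true`.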

Future Improvements

1. Enhanced Image Recognition

  • Implementing better board detection algorithms
  • Adding support for different board layouts
  • Improving accuracy in various lighting conditions

2. Advanced AI Features

  • Move suggestion capabilities
  • Strategy analysis
  • Learning patterns from gameplay

3. Voice Synthesis

  • Exploring ElevenLabs integration
  • Adding more language support
  • Improving natural speech patterns

Testing Notes

For developers interested in testing the board recognition features, I highly recommend using ScrabbleCam. It provides a consistent way to test board capture functionality without needing a physical board.

Conclusion

Building this project has been a fantastic journey in combining AI technologies with real-world gaming. The most rewarding part has been seeing my kids’ reactions to AI-powered move explanations and how it adds a new dimension to our game.

Check out the code on GitHub to explore the implementation details or contribute to the project!

Technical Stack Summary

  • Flutter for cross-platform development
  • Firebase for real-time synchronization
  • Vertex AI for LLM integration
  • Google Cloud TTS for voice synthesis
  • Computer vision for board state capture

The project is open source, so feel free to adapt it for your own AI experiments!