Single source of truth for all citations in the Hydra project.
| Paper | Authors | Year | Venue / URL | Key Contribution | Relevance to Hydra |
|---|---|---|---|---|---|
| Suphx: Mastering Mahjong with Deep Reinforcement Learning | Junjie Li, Sotetsu Koyamada, Qiwei Ye, Guoqing Liu, Chao Wang, Ruihan Yang, Li Zhao, Tao Qin, Tie-Yan Liu, Hsiao-Wuen Hon | 2020 | arXiv:2003.13590 | Oracle guiding, Global Reward Prediction (GRP), run-time policy adaptation, 10-dan achievement on Tenhou. Architecture: 50 residual blocks, 256 filters, separate models per action type with 838 input channels (discard/riichi) and 958 input channels (chow/pong/kong) (Table 2, Figures 4-5). | Core inspiration for oracle distillation and GRP head design |
| Tjong: A Transformer-based Mahjong AI via Hierarchical Decision-Making and Fan Backward | Xiali Li, Bo Liu, Zhi Wei, Zhaoqi Wang, Licheng Wu | 2024 | CAAI Trans. Intel. Tech. DOI: 10.1049/cit2.12298 | Hierarchical decision-making (action type → tile selection), transformer architecture for game sequences, fan backward reward shaping | Alternative architecture reference; fan backward considered for yaku awareness |
| Information Set Monte Carlo Tree Search | P. I. Cowling, E. J. Powley, D. Whitehouse | 2012 | IEEE TCIAIG | Foundation for handling imperfect information via determinization and information-set sampling | Theoretical basis for imperfect-info game approaches |
| Real-time Mahjong AI based on Monte Carlo Tree Search (Bakuuchi) | Mizukami et al. | 2014 | IEEE | Pre-deep-learning SOTA using ISMCTS + rule-based heuristics | Historical baseline for MCTS approaches |
| An Open-Source Interpretable and Reproducible Mahjong Agent (Phoenix) | — | 2021 | USC CSCI 527 Course Project | Transparent baseline with interpretable decision-making | Open-source baseline reference |
| Building a Computer Mahjong Player via Deep Convolutional Neural Networks | — | 2018 | IEEE | CNN for Mahjong, baseline methods | Early CNN approach for mahjong |
| Speedup Training Artificial Intelligence for Mahjong via Reward Variance Reduction | Li, Wu, Fu, Fu, Zhao, Xing | 2022 | IEEE CoG | RVR technique for reducing gradient noise from luck variance, oracle critic + expected reward network | Enables training on limited hardware; hand-luck baseline subtraction |
| Actor-Critic Policy Optimization in a Large-Scale Imperfect-Information Game | Fu, Liu, Wu, Wang, Yang, Li, Xing, Li, Ma, Fu, Yang | 2022 | ICLR 2022 | ACH (Actor-Critic Hedge): merges deep RL with Weighted CFR for Nash Equilibrium convergence in imperfect-info games. Core offline training algorithm for Tencent's LuckyJ. | Game-theoretic RL alternative to PPO/DQN; LuckyJ's ACH + OLSS reached 10.68 stable dan on Tenhou |
| Opponent-Limited Online Search for Imperfect Information Games | Liu, Fu, Fu, Yang | 2023 | ICML 2023 | OLSS: imperfect-info subgame solving with opponent-limited tree pruning, orders of magnitude faster than common-knowledge methods. Tested on 2-player mahjong. | Core search component for LuckyJ; search-as-feature integration enables real-time strategy adjustment |
| Look-ahead Reasoning with a Learned Model in Imperfect Information Games (LAMIR) | Kubicek, Lisy | 2026 | ICLR 2026 | Learns abstract game models from agent-environment interaction and uses them for CFR-based depth-limited look-ahead search in imperfect-info games. Tested on 2-player games. arXiv:2510.05048 (code available) | Inspiration for Hydra's inference-time search direction (historical SEARCH_PGOI.md planning surface; not present as a standalone doc in the current repo). Referenced in TACC allocation proposal as "LAS" framing. |
| Hierarchical CFR with Policy Abstraction in Mahjong | (CFR-p authors) | 2023 | arXiv:2307.12087 | Applied vanilla CFR to a simplified 2-player 68-tile Mahjong variant with hierarchical policy abstraction. Even this heavily reduced game had ~10^43 leaf nodes before abstraction. Only known CFR application to any Mahjong variant. | Confirms 4-player Mahjong remains intractable for tabular CFR. Supports Hydra's RL-based approach over game-theoretic solving. |
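
ACH (Actor-Critic Hedge, listed above) is named after Hedge, the exponential-weights no-regret algorithm, in which each action's probability is proportional to exp(η · cumulative advantage). The sketch below shows plain tabular Hedge. It assumes per-action advantages are already available. It is not ACH's neural actor-critic update, and `hedge_update` and `hedge_policy` are hypothetical names.

```python
import math


def hedge_update(cum_advantages, advantages):
    """Add this iteration's per-action advantages to the running totals."""
    return [y + a for y, a in zip(cum_advantages, advantages)]


def hedge_policy(cum_advantages, eta):
    """Hedge (exponential weights): pi(a) is proportional to exp(eta * y(a))."""
    # Shift by the max before exponentiating for numerical stability;
    # the shift cancels out in the normalization.
    top = max(cum_advantages)
    weights = [math.exp(eta * (y - top)) for y in cum_advantages]
    total = sum(weights)
    return [w / total for w in weights]


# Toy 3-action example: action 1 always has the highest advantage.
y = [0.0, 0.0, 0.0]
for _ in range(10):
    y = hedge_update(y, [0.0, 1.0, -0.5])
pi = hedge_policy(y, eta=0.1)
```

A larger η concentrates probability on the best-accumulated action faster. A smaller η keeps the policy closer to uniform for longer.
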
| Paper | Authors | Year | Venue / URL | Key Contribution | Relevance to Hydra |
|---|---|---|---|---|---|
| Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm (AlphaZero) | Silver et al. | 2017 | arXiv | MCTS + neural network self-play, general game learning | Baseline game AI paradigm |
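| Superhuman AI for Multiplayer Poker (Pluribus) | Brown, Sandholm | 2019 | Science | Imperfect-information game solving at scale | Multiplayer imperfect-info reference for opponent modeling; Pluribus itself plays a fixed blueprint-plus-search strategy and does not adapt to individual opponents |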
| OpenAI Five | OpenAI | 2019 | OpenAI | Large-scale PPO for complex games | Training stability and PPO scaling |
| AlphaStar: Mastering the Real-Time Strategy Game StarCraft II | Vinyals et al. | 2019 | Nature | League training for multi-agent robustness | League training methodology for Phase 3 |
| Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning (DeepNash) | Perolat et al. | 2022 | Science | R-NaD for Nash equilibrium approximation | Considered and rejected; Nash approach less suitable for 4-player ranking |
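
AlphaStar's league training (listed above) samples opponents with prioritized fictitious self-play (PFSP). Each past agent is weighted by f(P[learner beats it]), for example f_hard(x) = (1 - x)^p. The sketch below applies this to a Phase 3 opponent pool. It assumes Hydra tracks one scalar "win rate" per pool member; for 4-player mahjong, this might be the rate of finishing above that opponent. That metric is an assumption, not a decided design. Checkpoint names and function names are hypothetical.

```python
import random


def pfsp_weights(win_rates, p=2.0):
    """PFSP weights with f_hard(x) = (1 - x) ** p, normalized to sum to 1.

    win_rates[i] is the learner's estimated win rate against pool member i,
    so the opponents it beats least often get the most weight.
    """
    raw = [(1.0 - x) ** p for x in win_rates]
    total = sum(raw)
    if total == 0.0:
        # The learner beats everyone: fall back to uniform sampling.
        return [1.0 / len(win_rates)] * len(win_rates)
    return [r / total for r in raw]


def sample_opponents(pool, win_rates, k, rng, p=2.0):
    """Draw k opponents (with replacement) from the pool according to PFSP."""
    return rng.choices(pool, weights=pfsp_weights(win_rates, p), k=k)


pool = ["ckpt_010", "ckpt_050", "ckpt_100"]  # hypothetical checkpoint names
# k=3 fills the three non-learner seats at a 4-player table.
opponents = sample_opponents(pool, [0.9, 0.6, 0.3], k=3, rng=random.Random(0))
```

Raising p focuses sampling more sharply on the hardest opponents. p close to 0 approaches uniform fictitious self-play.
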
| Paper | Authors | Year | Venue / URL | Key Contribution | Relevance to Hydra |
|---|---|---|---|---|---|
| Squeeze-and-Excitation Networks | Hu et al. | 2018 | CVPR | SE attention blocks for channel recalibration | Backbone design: dual-pool SE attention in every ResBlock |
| CBAM: Convolutional Block Attention Module | Woo et al. | 2018 | ECCV | Channel + spatial attention via dual-pool (avg+max) shared MLP | Hydra's SE module uses CBAM's channel attention component (dual-pool shared MLP) |
| Group Normalization | Wu & He | 2018 | ECCV | Batch-independent normalization | Training stability: GroupNorm(32) replaces BatchNorm |
| Proximal Policy Optimization Algorithms | Schulman et al. | 2017 | arXiv | PPO clipped surrogate objective | Core RL algorithm for Phases 2-3 |
| Attention Is All You Need | Vaswani et al. | 2017 | NeurIPS | Transformer architecture | Considered for backbone; used by Kanachan and Tjong |
| Learning Confidence for Out-of-Distribution Detection | DeVries, Taylor | 2018 | arXiv:1802.04865 | Confidence estimation as training regularization | Used by NAGA for calibrated action distributions |
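
PPO's clipped surrogate objective (Schulman et al. 2017, listed above) is `L = -mean(min(r * A, clip(r, 1 - eps, 1 + eps) * A))`, with the probability ratio `r = pi_new(a|s) / pi_old(a|s)`. The plain-Python sketch below assumes log-probabilities and advantages are precomputed. A real trainer would vectorize this in PyTorch and add the value and entropy terms.

```python
import math


def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Negated PPO clipped surrogate, averaged over a batch (lower is better)."""
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)
```

With `clip_eps = 0.2`, a ratio of 2 on a positive advantage contributes only `1.2 * A`, so overshooting is not rewarded. On a negative advantage, the unclipped and more pessimistic `2 * A` is kept.
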
| Project | URL | Language | Stars | License | Notes |
|---|---|---|---|---|---|
| Mortal | https://github.com/Equim-chan/Mortal | Rust/Python | 1.3K+ | AGPL-3.0-or-later | Primary competitor. ResNet(40 blocks, 192ch) + Channel Attention → DQN(Dueling) + CQL. Reference only — AGPL, cannot derive code. Study: obs encoding (1012×34), action masking (46 actions), GRP head, 1v3 duplicate evaluation. Weights have additional distribution restrictions beyond AGPL. |
| Kanachan | https://github.com/Cryolite/kanachan | C++/Python | 300+ | Unlicensed | Transformer encoder (BERT-style) — two configs: base (~90M params, 12L/768d) and large (~310M params, 24L/1024d). Trained on 65M+ Majsoul rounds (Gold+), zero hand-crafted features. 184 tokens: 33 sparse + 6 numeric + 113 progression + 32 candidates. Pipeline: BC → curriculum fine-tuning → offline RL (IQL/ILQL/CQL). No published benchmarks despite multi-year development (public repo created 2021-08-05). Parameter count makes online RL infeasible. |
| Akochan | https://github.com/critter-mj/akochan | C++ | ~280 | Custom (restrictive, Japanese) | EV-based heuristic engine with explicit suji/kabe/genbutsu analysis. Not ML-based. Matters: its hand-crafted defense logic is a useful sanity check — if Hydra's neural network disagrees with Akochan's defense in obvious spots, something is wrong. Also used as the backend for the original mjai-reviewer. |
| MahjongAI | https://github.com/erreurt/MahjongAI | Python | ~450 | — | Extensible agent framework with pluggable strategies. Matters less for architecture, more for its Tenhou client implementation — shows how to connect an AI to Tenhou's protocol if we ever need that. |
| AlphaJong | https://github.com/Jimboom7/AlphaJong | JavaScript | — | — | Browser-based heuristic engine (NOT AlphaZero despite the name). Tunable offense/defense balance via sliders. Matters only as a weak baseline — useful for sanity-checking that Hydra beats simple heuristics by a wide margin. |
| mjai-manue | https://github.com/gimite/mjai-manue | Ruby | 37 | — | Original MJAI protocol client. Matters as protocol reference — defines the canonical MJAI message format that Hydra must be compatible with. |
| NAGA | https://dmv.nico/en/articles/mahjong_ai_naga/ | — | — | Commercial | Pure supervised learning — 4 independent CNNs (discard, call, riichi, kan) trained on Tenhou Houou game logs via imitation learning. No self-play, no RL. Uses confidence estimation (DeVries & Taylor 2018) as training regularization and Guided Backpropagation (Springenberg et al. 2014) for interpretability. 5 playstyle variants (Omega, Gamma, Nishiki, Hibakari, Kagashi) differentiated by training on different players' game records, not architecture changes. CNN architecture details (layers, filters, input shape) never publicly disclosed — the DMV article is the sole official technical document. Achieved 10-dan on Tenhou (26,598 games — source unverified; number does not appear in the DMV article or any locatable public source), current models estimated ~9-dan stable. Not open-source. NAGA's "match%" metric is a common (but imperfect) benchmark. |
| LuckyJ | https://haobofu.github.io/ | — | — | Commercial | Tencent's mahjong AI (绝艺/JueYi brand). 10-dan on Tenhou in 1,321 games, 10.68 stable dan — strongest known AI. ACH + OLSS architecture, pure self-play. See COMMUNITY_INSIGHTS § LuckyJ for detailed architecture analysis. |
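
Hydra's SE module reuses CBAM's channel-attention half (Woo et al. 2018). The average-pooled and max-pooled channel descriptors pass through one shared MLP and are summed, and a sigmoid turns the sum into a per-channel gate. The pure-Python sketch below operates on a feature map of C channels by 34 tile positions; the 34-wide layout mirrors Mortal's 1012×34 encoding described above. The weight shapes, the omission of biases, and the function names are assumptions, not Hydra's code.

```python
import math


def shared_mlp(v, w1, w2):
    """Two-layer MLP C -> C/r -> C with ReLU; biases omitted for brevity."""
    hidden = [max(0.0, sum(w * x for w, x in zip(row, v))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def channel_attention(feat, w1, w2):
    """CBAM-style channel attention on a feature map given as C lists of 34 values."""
    avg_desc = [sum(ch) / len(ch) for ch in feat]  # average pool over the tile axis
    max_desc = [max(ch) for ch in feat]            # max pool over the tile axis
    # The same MLP weights process both descriptors; the outputs are summed.
    logits = [a + m for a, m in zip(shared_mlp(avg_desc, w1, w2), shared_mlp(max_desc, w1, w2))]
    gates = [1.0 / (1.0 + math.exp(-z)) for z in logits]  # sigmoid, one gate per channel
    return [[g * x for x in ch] for g, ch in zip(gates, feat)]


# Toy example: C = 2 channels, reduction r = 2 (hidden size 1).
feat = [[1.0] * 34, [0.0] * 34]
w1 = [[1.0, 0.0]]      # shape (C/r, C)
w2 = [[1.0], [-1.0]]   # shape (C, C/r)
out = channel_attention(feat, w1, w2)
```

A plain SE block would use only `avg_desc`. The dual-pool variant adds the max-pooled path through the same weights, so it costs no extra parameters.
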
| Project | URL | Stars | Description |
|---|---|---|---|
| mjai-reviewer | https://github.com/Equim-chan/mjai-reviewer | 1.1K+ | CLI that generates HTML review reports showing Q-value differences per discard. Primary tool for evaluating Hydra's play quality. Apache-2.0 — can use directly. |
| mjai-reviewer3p | https://github.com/hidacow/mjai-reviewer3p | — | 3-player (sanma) fork of mjai-reviewer. Matters only if Hydra targets sanma. |
| killer_mortal_gui | https://github.com/killerducky/killer_mortal_gui | — | Enhanced Mortal review with deal-in heuristic multipliers (ryanmen 3.5×, kanchan suji-trap 2.6×, honor tanki/shanpon 1.7×, etc.). Matters: these empirically tuned danger multipliers are the best public reference for tile danger calibration, useful for validating Hydra's learned defense signals. |
| crx-mortal | https://github.com/announce/crx-mortal | — | Chrome extension for in-browser Mortal analysis. Low relevance for training. |
| mjai-batch-review | https://github.com/Xerxes-2/mjai-batch-review | 9 | Batch analyze multiple game logs at once. Matters for large-scale evaluation — when testing Hydra across thousands of games, batch review is faster than one-by-one. |
| Fork | URL | Key Difference |
|---|---|---|
| Mortal-Policy | https://github.com/Nitasurin/Mortal-Policy | PPO instead of DQN, GroupNorm instead of BatchNorm, entropy weight tuning. AGPL-3.0, reference only. Matters: closest public reference to Hydra's own architecture choice (PPO + GroupNorm). Study their AWR→PPO transition code path and how they handle the policy gradient with mahjong's 46-action space. |
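
Mahjong's 46-entry action space (as used by Mortal and Mortal-Policy above) is mostly illegal at any given decision. A common policy-gradient approach drops illegal actions before normalizing, which is equivalent to setting their logits to negative infinity. The sketch below assumes the game engine supplies a boolean legality mask. It is not taken from Mortal-Policy's code.

```python
import math

NUM_ACTIONS = 46  # action-space size cited for Mortal / Mortal-Policy above


def masked_softmax(logits, legal):
    """Softmax over legal actions only; illegal actions get probability exactly 0.

    Equivalent to setting illegal logits to -inf before a normal softmax.
    """
    if len(logits) != len(legal):
        raise ValueError("logits and legality mask must have the same length")
    legal_logits = [z for z, ok in zip(logits, legal) if ok]
    if not legal_logits:
        raise ValueError("at least one action must be legal")
    top = max(legal_logits)  # stabilize using legal logits only
    exps = [math.exp(z - top) if ok else 0.0 for z, ok in zip(logits, legal)]
    total = sum(exps)
    return [e / total for e in exps]


logits = [0.0] * NUM_ACTIONS
logits[5] = 100.0            # a large logit on an illegal action is ignored
legal = [False] * NUM_ACTIONS
legal[0] = legal[1] = True   # toy case: only two actions are legal
probs = masked_softmax(logits, legal)
```

The PPO ratio then uses `log(probs[a])` for the sampled legal action `a`. Because illegal actions have exactly zero probability, they can never be sampled.
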
| Project | URL | Language | License | Purpose |
|---|---|---|---|---|
| xiangting | https://github.com/Apricot-S/xiangting | Rust | MIT | Primary shanten library. Compile-time embedded tables (~200KB), no_std compatible, 3-player support, returns both shanten number and necessary/unnecessary tile sets. 34× faster than brute-force for replacement tile calculation. Hydra uses this for obs encoding channels (shanten features) and action masking. |
| xiangting-py | — | Python | MIT | Python bindings for xiangting via PyO3. Useful for training-side shanten calculation if needed. |
| tomohxx/shanten-number | — | C++ | LGPL-3.0 | Original table-based shanten algorithm that xiangting is derived from. Algorithm reference only: LGPL's relinking requirements make static linking into Hydra's Rust binaries impractical. Tables: suhai (1.9M entries, ~19.4MB), jihai (78K entries, ~0.78MB). Base-5 encoding for tile state indexing. |
| PyO3 | https://pyo3.rs/ | Rust | Apache-2.0 | Rust↔Python FFI for exposing game engine bindings to the training loop. |
| rayon | https://docs.rs/rayon/ | Rust | MIT OR Apache-2.0 | Work-stealing data parallelism for batch game simulation. |
| serde / serde_json | https://serde.rs/ | Rust | MIT OR Apache-2.0 | JSON serialization/deserialization for MJAI protocol parsing. |
| ndarray | https://docs.rs/ndarray/ | Rust | MIT OR Apache-2.0 | N-dimensional array operations for constructing observation tensors. |
| ort | https://docs.rs/ort/ | Rust | Apache-2.0 | ONNX Runtime Rust bindings. Primary inference engine for self-play: loads exported PyTorch model as ONNX, runs forward passes with CUDA EP, CUDA graphs, and I/O binding for <5ms latency. This is the hot path during self-play — inference speed directly limits training throughput. |
| tract | https://docs.rs/tract/ | Rust | MIT OR Apache-2.0 | Pure Rust ML inference engine (no C++ deps). CPU-only fallback for environments without CUDA. Useful for CI testing and CPU-only deployment. |
| candle | https://github.com/huggingface/candle | Rust | Apache-2.0 | HuggingFace's Rust ML framework with CUDA and Metal support. Alternative to ONNX path — write inference directly in Rust, avoiding the PyTorch→ONNX export step. Worth evaluating if ONNX export causes accuracy loss or operator compatibility issues. |
| Burn | https://github.com/tracel-ai/burn | Rust | MIT OR Apache-2.0 | Native Rust training + inference framework with WGPU, CUDA, and LibTorch backends. Long-term option for moving the entire training loop to Rust (eliminating Python entirely). Growing ONNX import support. |
| tch-rs | — | Rust | MIT OR Apache-2.0 | Rust bindings for LibTorch. Alternative to PyO3 approach — call LibTorch directly from Rust instead of going through Python. Trades Python flexibility for lower FFI overhead. |
| mahjong (Python) | https://github.com/MahjongRepository/mahjong | Python | MIT | Hand scoring oracle — yaku detection, han/fu/score calculation, validated against 11M+ Tenhou hands. Pin to v1.4.0. Dev dependency for Rust engine verification and test case extraction. |
| agari | https://github.com/rysb-dev/agari | Rust | MIT (no LICENSE file) | Complete scoring engine (35 yaku, fu, payment, hand decomposition, ~100 unit tests). Most architecturally clean Rust mahjong scorer — study its HandDecomposition trait and Fu calculation for Hydra's own scoring module. Cargo.toml declares MIT but repo lacks a LICENSE file — safe to use as reference. |
| mahc | https://github.com/DrCheeseFace/mahc | Rust | BSD-3 | Scoring library with explicit Fu enum (each fu source is a named variant, not magic numbers). 38 yaku, 30K crates.io downloads. Study the Fu enum pattern — makes fu calculation self-documenting and testable vs Mortal's opaque approach. |
| mahjax | https://github.com/nissymori/mahjax | Python/JAX | Apache-2.0 | JAX-vectorized riichi environment reaching ~1.6M steps/sec on 8×A100 via JIT compilation. Matters for self-play: JAX vectorization can run thousands of games simultaneously on GPU, potentially 10-100x faster than sequential Rust simulator for generating training data. Study their state representation and vectorized game logic. |
| RiichiEnv | https://github.com/smly/RiichiEnv | Rust/Python | Apache-2.0 | Gym-style RL environment with Rust core + Python bindings, Mortal-compatible MJAI output. Verified correct over 1M+ games. Matters because it provides a ready-made OpenAI Gym interface — if Hydra's training loop uses standard Gym APIs (reset/step/reward), this slots in directly. Also useful as correctness oracle for our own Rust game engine. |
| Meowjong | https://github.com/VictorZXY/Meowjong | Python | MIT | Only open-source 3-player (sanma) mahjong AI. IEEE CoG 2022. Includes 5 CNN model variants and a Tenhou sanma log downloader. Matters because sanma is a stretch goal — if Hydra ever targets 3-player, this is the only reference implementation with published results. Also validates that CNN architectures work for reduced-player mahjong. |
| CleanRL | https://github.com/vwxyzjn/cleanrl | Python | MIT | Single-file PPO implementation (~250 lines) with wandb integration. Accompanied by the "37 Implementation Details of PPO" blog post that documents every hyperparameter and trick that matters. Hydra's PPO should be validated against CleanRL's implementation — same clipping, advantage normalization, value loss clipping, entropy coefficient schedule. The blog post is required reading before writing our PPO. |
| OpenSpiel | https://github.com/google-deepmind/open_spiel | C++/Python | Apache-2.0 | DeepMind's game RL framework with 70+ games, including AlphaZero, MCTS, CFR, and self-play training loops. Matters for Hydra's Phase 3 (league training): study their self-play loop architecture, including how they manage opponent pools, Elo tracking, and policy selection. Also has imperfect-info game solvers that inform belief-state approaches. |
| Microsoft Olive | https://github.com/microsoft/Olive | Python | MIT | End-to-end model optimization: PyTorch → ONNX with quantization, pruning, operator fusion, shape inference via YAML config. Matters for inference speed during self-play: training generates millions of forward passes, so even 2x speedup from INT8 quantization directly halves self-play wall time. Use after model architecture stabilizes. |
| rlcard | https://github.com/datamllab/rlcard | Python | MIT | RL toolkit with a mahjong environment and pre-built DQN/NFSP agents. Lower fidelity than mahjax/RiichiEnv (simplified rules), but useful for rapid prototyping of reward shaping and training loop mechanics before running on the full environment. |
| mjai.app | https://github.com/smly/mjai.app | — | AGPL-3.0 | RiichiLab competition platform using MJAI protocol with Docker-based evaluation. Matters because this is a target venue — Hydra must produce MJAI-compatible output to enter competitions and benchmark against other AIs. Study their Docker submission format and evaluation harness. |
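
The serde_json and mjai.app rows depend on MJAI's JSON messages and tile strings: `1m`..`9m`, `1p`..`9p`, `1s`..`9s`, honors `E S W N P F C`, and red fives `5mr`/`5pr`/`5sr`. The Python sketch below maps those strings onto a 34-tile index. It assumes a man-pin-sou-honors order that Hydra's encoder may or may not use, and the function name is hypothetical. The Rust engine would do the same with serde_json.

```python
import json

HONORS = ["E", "S", "W", "N", "P", "F", "C"]  # winds, then haku (P), hatsu (F), chun (C)
SUIT_OFFSET = {"m": 0, "p": 9, "s": 18}


def mjai_tile_to_index(pai):
    """Map an MJAI tile string to 0..33 (man, pin, sou, then honors).

    Red fives ("5mr", "5pr", "5sr") share the index of the plain five, so
    red-ness needs its own feature. Unknown tiles ("?") are rejected.
    """
    if pai in HONORS:
        return 27 + HONORS.index(pai)
    if len(pai) >= 2 and pai[0] in "123456789" and pai[1] in SUIT_OFFSET:
        return SUIT_OFFSET[pai[1]] + int(pai[0]) - 1
    raise ValueError(f"unrecognized MJAI tile: {pai!r}")


msg = json.loads('{"type": "dahai", "actor": 2, "pai": "5mr", "tsumogiri": false}')
discard_index = mjai_tile_to_index(msg["pai"])
```

Here a red 5-man discard by seat 2 maps to index 4, the same slot as a plain 5-man.
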
| Project | URL | Description |
|---|---|---|
| mjai | https://github.com/gimite/mjai | Original MJAI protocol server |
| mjai-gateway | https://github.com/tomohxx/mjai-gateway | MJAI ↔ Tenhou translator |
| Resource | URL | Content |
|---|---|---|
| Mortal Documentation | https://mortal.ekyu.moe | Architecture insights, performance data, playstyle statistics |
| MJAI Protocol Wiki | https://gimite.net/pukiwiki/index.php?MJAI | Standard protocol specification |
| MJAI Web Reviewer | https://mjai.ekyu.moe/ | Web interface for instant game reviews |
| Tenhou Documentation | https://tenhou.net/man/ | Tenhou log format specification (old /doc/ path returns 404) |
| Majsoul API | Various GitHub repos | Log extraction methods via WebSocket capture |
| NAGA Documentation | https://dmv.nico/en/articles/mahjong_ai_naga/ | Commercial AI architecture overview |
| Riichi Wiki — NAGA | https://riichi.wiki/Mahjong_AI_%E3%80%8CNAGA%E3%80%8D | Community wiki page on NAGA |
| Phoenix Paper | https://csci527-phoenix.github.io/documents/Paper.pdf | Open-source reproducible mahjong agent |
| ONNX Runtime | https://onnxruntime.ai/ | Production inference runtime |
| Source | Topics |
|---|---|
| Mortal GitHub Issues & Discussions | Known weaknesses, training problems, oracle guiding removal |
| r/Mahjong (Reddit) | Player perspective on AI behavior, known weaknesses |
| Discord (Riichi Mahjong) | Community testing, strategy discussion |
| Tenhou forums | High-level play analysis |
| Note.com mahjong blogs (Japanese) | 場況 (bakyou) struggles, efficiency vs situational tactics |
See ECOSYSTEM.md § Data Sources & Datasets for the current training data summary. A separate
archive/DATA_SOURCES.md file is not present in the current repo.
| Resource | Description |
|---|---|
| tomohxx Algorithm | Set-based recurrence, O(n) complexity; table-based lookup |
| tomohxx Tables | Suhai table: 1,940,777 entries × 10 bytes (~19.4 MB); Jihai table: 78,032 entries × 10 bytes (~0.78 MB) |
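| tomohxx Indexing | Base-5 encoding: each tile count (0-4) is one base-5 digit of the table index, accumulated with a fold over the count array (`tiles.iter().fold(0, ...)`) |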
| tomohxx Compressed | shanten_suhai.bin.gz (191 KB), shanten_jihai.bin.gz (5.6 KB) |
| xiangting Implementation | Rust port with 3-player support |
| Kanachan xiangting | LOUDS-based trie shanten calculator |
| Mahjong Algorithm Book | Japanese reference, theoretical background |
| Cryolite (2023) | "A Fast and Space-Efficient Algorithm for Calculating Deficient Numbers" |
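
The base-5 indexing noted above treats each tile's count (0-4) as one digit. The sketch below makes the first tile the most significant digit, which is an assumption; tomohxx's actual digit order may differ. A raw base-5 index spans 5^9 = 1,953,125 values for a suit and 5^7 = 78,125 for honors. Those ranges are slightly larger than the table sizes quoted above, so the real tables may use a different or compacted index. Check the source before relying on this layout. `base5_index` is a hypothetical name.

```python
def base5_index(counts):
    """Fold per-tile counts (each 0..4) into one base-5 integer.

    The first tile's count is the most significant digit.
    """
    index = 0
    for c in counts:
        if not 0 <= c <= 4:
            raise ValueError(f"tile count out of range: {c}")
        index = index * 5 + c
    return index


# One suit as counts of tiles 1..9: here 1-2-3 plus a pair of 9s.
suit = [1, 1, 1, 0, 0, 0, 0, 0, 2]
idx = base5_index(suit)
```

Here `idx` is 5^8 + 5^7 + 5^6 + 2 = 484,377. A table-based calculator looks up per-suit results at indices like this and then combines the four partial results.
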
| Resource | Description |
|---|---|
| Japanese Mahjong Strategy Books | Traditional defense theory |
| Daina Chiba's Defense | Quantitative suji analysis |
| Tenhou Player Guides | Statistical safety percentages |
| Suji Safety Note | Suji is approximately 60-70% safe (not 100%); protects only against ryanmen waits |
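| Genbutsu Definition | 100% safe against a given riichi player: tiles in that player's own discards (furiten), plus tiles discarded by anyone after their riichi that they did not ron on |
| Kabe Definition | All 4 copies of a tile visible → ryanmen waits that need that tile are impossible (no-chance); 3 copies visible = one-chance |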
| Half-suji / Full-suji | One side visible vs both sides visible |
| killer_mortal_gui Heuristics | Ryanmen 3.5×, Kanchan 0.21×, Kanchan suji-trap 2.6×, Penchan 1.0×, Honor tanki/shanpon 1.7×; modifiers: Dora 1.2×, Ura-suji 1.3×, Matagi early 0.6×, Matagi riichi 1.2×, Red 5 discard 0.14× |
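
The safety rules and killer_mortal_gui multipliers above can be combined into a relative danger score, as sketched below. Genbutsu and post-riichi tiles score zero; otherwise a wait-shape weight is scaled by the listed modifiers. Multiplying the factors together is an assumption; killer_mortal_gui's exact combination rule is not documented here. The dictionary keys and function names are hypothetical.

```python
# Multipliers copied from the killer_mortal_gui row above. They are relative
# weights, not probabilities; multiplying them together is an assumption.
WAIT_SHAPE = {
    "ryanmen": 3.5,
    "kanchan": 0.21,
    "kanchan_suji_trap": 2.6,
    "penchan": 1.0,
    "honor_tanki_shanpon": 1.7,
}
MODIFIER = {
    "dora": 1.2,
    "ura_suji": 1.3,
    "matagi_early": 0.6,
    "matagi_riichi": 1.2,
    "red5_discard": 0.14,
}


def is_safe_vs_riichi(tile, riichi_player_discards, discards_since_riichi):
    """Genbutsu or post-riichi furiten: cannot be ronned by that riichi player."""
    return tile in riichi_player_discards or tile in discards_since_riichi


def danger_weight(tile, wait_shape, modifiers, riichi_player_discards, discards_since_riichi):
    """Hypothetical relative danger of one candidate wait on `tile`."""
    if is_safe_vs_riichi(tile, riichi_player_discards, discards_since_riichi):
        return 0.0
    weight = WAIT_SHAPE[wait_shape]
    for name in modifiers:
        weight *= MODIFIER[name]
    return weight
```

For example, a dora tile that could complete a ryanmen wait scores 3.5 × 1.2 = 4.2. The same tile scores 0 once it appears in the riichi player's discards.
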
| Resource | Description |
|---|---|
| Tenhou Scoring Tables | Standard yaku/fu calculation |
| World Riichi Championship Rules | International standard |
| EMA Rules | European standard |
| Tenhou Rating (R) | Dan | Approx. Strength |
|---|---|---|
| R2000+ | 7-dan+ | Expert |
| R1800-2000 | 5-6 dan | Strong |
| R1600-1800 | 3-4 dan | Intermediate |
| AI | Platform | Achievement | Year | Notes |
|---|---|---|---|---|
| NAGA | Tenhou | 10-dan (26,598 games — unverified) | 2018+ | Pure imitation learning; current models ~9-dan stable |
| Suphx | Tenhou | 10-dan (5,373 games), 8.74 stable | 2020 | SL + RL + oracle guiding; paper states 100+ humans have achieved 10-dan |
| LuckyJ | Tenhou | 10-dan (1,321 games), 10.68 stable | 2023 | ACH + OLSS; statistically stronger than both NAGA and Suphx |
| Mortal | — | No ranked play | — | Tenhou rejected Mortal's AI account request (FAQ: "Tenhou rejected my AI account request for Mortal because Mortal was developed by an individual rather than a company"). Community-estimated ~7-dan play strength from mjai-reviewer analysis. |
| NAGA | Majsoul | Celestial | 2022 | — |
License policy: See ../infrastructure/INFRASTRUCTURE.md#license-compatibility
Mortal repository discussions relevant to Hydra design decisions:
| Discussion # | Topic | Key Insight |
|---|---|---|
| (source code) | MC returns vs TD | Mortal uses MC returns (not TD) for Q-targets — confirmed from source code (train.py Q-target computation). q_target = gamma^steps_to_done * kyoku_reward with no bootstrap from next-state Q-values. Hydra follows the same approach. |
| #27 | Batch size recommendations | Practical guidance on training batch sizes for mahjong RL. |
| #43 | torch.compile speedup | torch.compile gives 15-20% training speedup on Mortal. Hydra should enable this from day one. |
| #52 | NextRankPredictor rationale | Auxiliary task that predicts next placement — stabilizes feature learning by giving the backbone a secondary objective beyond Q-values. |
| #64 | Catastrophic forgetting in online RL | When transitioning from offline (behavioral cloning) to online (self-play), the model forgets offline knowledge. Equim-chan confirms this is a real problem. Hydra must plan for gradual transition with replay buffer mixing. |
| #70 | DeepCFR for GRP replacement | Community explored using DeepCFR instead of GRP. Conclusion: not practical for 4-player mahjong due to game tree size. |
| #91 | Mortal-Policy (PPO fork) | Nitasurin's PPO fork open-sourced. Confirms PPO works for mahjong, validates Hydra's algorithm choice. |
| #102 | Oracle guiding removed | Equim-chan: "didn't bring improvements in practice." Critical for Hydra — Suphx's oracle guiding (our Phase 1 inspiration) was tried and abandoned by Mortal's author. Hydra's oracle approach must differ from Suphx's naive implementation. |
| #108 | Maximum player score in observations | Discussion about score capping at 30K in observation encoding. Relevant to Hydra's uncapped score encoding decision. |
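
The first row above gives Mortal's target as `q_target = gamma^steps_to_done * kyoku_reward`, with no bootstrap from next-state Q-values. The sketch below computes those Monte Carlo targets for one kyoku. How `steps_to_done` is counted is an assumption here: the final decision is given 0 steps left. The gamma value and helper names are also illustrative.

```python
def mc_q_targets(steps_to_done, kyoku_reward, gamma):
    """Monte Carlo Q-targets: gamma ** steps_to_done * kyoku_reward per decision.

    Unlike a TD target (r + gamma * max_a Q(s', a)), nothing is bootstrapped
    from the network's own estimate of the next state.
    """
    return [gamma ** s * kyoku_reward for s in steps_to_done]


def steps_to_done_for(num_decisions):
    """Hypothetical helper: countdown where the final decision has 0 steps left."""
    return list(range(num_decisions - 1, -1, -1))


targets = mc_q_targets(steps_to_done_for(3), kyoku_reward=2.0, gamma=0.5)
```

Every target depends only on the observed kyoku outcome. This avoids TD's feedback loop through the network's own estimates, at the cost of higher variance from mahjong's luck.
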
Mortal repository issues relevant to Hydra improvements:
| Issue # | Description |
|---|---|
| #111 | Overtake score miscalculation — Mortal miscalculates hand-building near placement thresholds; motivates Hydra's uncapped score encoding |
| #113 | Rating system closure discussion — community debate on whether to shut down Mortal's rating feature |
For academic reference to Hydra:
Hydra: A Practical Mahjong AI Architecture
Combining Oracle Distillation with Explicit Opponent Modeling
2026
Key techniques to cite:
- Oracle Distillation: Li et al. (2020) "Suphx"
- SE-ResNet Backbone: Hu et al. (2018) "Squeeze-and-Excitation Networks"
- PPO Training: Schulman et al. (2017) "Proximal Policy Optimization"
- GroupNorm: Wu & He (2018) "Group Normalization"
- League Training: Vinyals et al. (2019) "AlphaStar"