@@ -434,8 +434,27 @@ The latter is also used in tests.
434434There is no C++ implementation of a log reader. We do not have a scenario
435435motivating one.
436436
437- IR2Vec Embeddings
438- ================= 
437+ Embeddings
438+ ========== 
439+ 
440+ LLVM provides embedding frameworks to generate vector representations of code
441+ at different abstraction levels. These embeddings capture syntactic, semantic,
442+ and structural properties of the code and can be used as features for machine
443+ learning models in various compiler optimization tasks.
444+ 
445+ Two embedding frameworks are available:
446+ 
447+ - **IR2Vec**: Generates embeddings for LLVM IR
448+ - **MIR2Vec**: Generates embeddings for Machine IR
449+ 
450+ Both frameworks follow a similar architecture with vocabulary-based embedding
451+ generation, where a vocabulary maps code entities to n-dimensional
452+ floating-point vectors. These embeddings can be computed at multiple granularity
453+ levels (instruction, basic block, and function) and used for ML-guided compiler
454+ optimizations.
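
The idea of a vocabulary that maps code entities to vectors, with embeddings built up at instruction, basic block, and function granularity, can be illustrated with a toy sketch. Everything below is hypothetical: the type and function names are not LLVM APIs, and combining vectors by plain element-wise summation is an assumption made for illustration, not a description of how IR2Vec or MIR2Vec weight their inputs.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy stand-ins (hypothetical, not LLVM types).
using Embedding = std::vector<double>;
using ToyVocabulary = std::map<std::string, Embedding>;
using ToyInstruction = std::vector<std::string>; // entity names, e.g. opcode and operands
using ToyBlock = std::vector<ToyInstruction>;
using ToyFunction = std::vector<ToyBlock>;

// Element-wise add Src into Dst; both must have the same dimension.
void addInto(Embedding &Dst, const Embedding &Src) {
  assert(Dst.size() == Src.size() && "dimension mismatch");
  for (std::size_t I = 0; I < Dst.size(); ++I)
    Dst[I] += Src[I];
}

// Instruction level: sum the vocabulary vectors of the entities it mentions.
// Entities missing from the vocabulary contribute nothing (an assumption).
Embedding embedInstruction(const ToyInstruction &Inst,
                           const ToyVocabulary &Vocab, std::size_t Dim) {
  Embedding Result(Dim, 0.0);
  for (const std::string &Entity : Inst) {
    auto It = Vocab.find(Entity);
    if (It != Vocab.end())
      addInto(Result, It->second);
  }
  return Result;
}

// Basic block level: sum of the instruction embeddings.
Embedding embedBlock(const ToyBlock &Block, const ToyVocabulary &Vocab,
                     std::size_t Dim) {
  Embedding Result(Dim, 0.0);
  for (const ToyInstruction &Inst : Block)
    addInto(Result, embedInstruction(Inst, Vocab, Dim));
  return Result;
}

// Function level: sum of the basic block embeddings.
Embedding embedFunction(const ToyFunction &Func, const ToyVocabulary &Vocab,
                        std::size_t Dim) {
  Embedding Result(Dim, 0.0);
  for (const ToyBlock &Block : Func)
    addInto(Result, embedBlock(Block, Vocab, Dim));
  return Result;
}
```

In this sketch the function-level vector is fully determined by the vocabulary and the entities that appear in the code, which is the property the vocabulary-based design relies on.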
455+ 
456+ IR2Vec
457+ ------ 
439458
440459IR2Vec is a program embedding approach designed specifically for LLVM IR. It
441460is implemented as a function analysis pass in LLVM. The IR2Vec embeddings
@@ -466,7 +485,7 @@ The core components are:
466485    compute embeddings for instructions, basic blocks, and functions.
467486
468487Using IR2Vec
469- ------------ 
488+ ^^^^^^^^^^^^ 
470489
471490.. note::
472491
@@ -526,7 +545,7 @@ embeddings can be computed and accessed via an ``ir2vec::Embedder`` instance.
526545   between different code snippets, or perform other analyses as needed.
527546
528547Further Details
529- --------------- 
548+ ^^^^^^^^^^^^^^^ 
530549
531550For more detailed information about the IR2Vec algorithm, its parameters, and
532551advanced usage, please refer to the original paper:
@@ -538,6 +557,123 @@ triplets from LLVM IR, see :doc:`CommandGuide/llvm-ir2vec`.
538557The LLVM source code for ``IR2Vec`` can also be explored to understand the
539558implementation details.
540559
560+ MIR2Vec
561+ ------- 
562+ 
563+ MIR2Vec is an extension of IR2Vec designed specifically for LLVM Machine IR
564+ (MIR). It generates embeddings for machine instructions, machine basic blocks,
565+ and machine functions. MIR2Vec operates on the target-specific machine
566+ representation and captures machine instruction semantics, including opcodes,
567+ operands, and register information.
568+ 
569+ MIR2Vec extends the vocabulary to include:
570+ 
571+ - **Machine Opcodes**: Target-specific instruction opcodes derived from the
572+   TargetInstrInfo, grouped by instruction semantics.
573+ 
574+ - **Common Operands**: All common operand types (excluding register operands),
575+   defined by the ``MachineOperand::MachineOperandType`` enum.
576+ 
577+ - **Physical Register Classes**: Register classes defined by the target,
578+   specialized for physical registers.
579+ 
580+ - **Virtual Register Classes**: Register classes defined by the target,
581+   specialized for virtual registers.
582+ 
583+ The core components are:
584+ 
585+ - **Vocabulary**: A mapping from machine IR entities (opcodes, operands, register
586+   classes) to their vector representations. This is managed by
587+   ``MIR2VecVocabLegacyAnalysis`` for the legacy pass manager, with a
588+   ``MIR2VecVocabProvider`` that can be used standalone or wrapped by pass
589+   managers. The vocabulary (.json file) contains sections for opcodes, common
590+   operands, physical register classes, and virtual register classes.
591+ 
592+   .. note::
593+ 
594+     A vocabulary file is valid only if it contains all of these sections.
595+ 
596+ - **Embedder**: A class (``mir2vec::MIREmbedder``) that uses the vocabulary to
597+   compute embeddings for machine instructions, machine basic blocks, and
598+   machine functions. Currently, ``SymbolicMIREmbedder`` is the available
599+   implementation.
600+ 
601+ Using MIR2Vec
602+ ^^^^^^^^^^^^^ 
603+ 
604+ .. note::
605+ 
606+    This section describes how to use MIR2Vec within LLVM passes. The
607+    ``llvm-ir2vec`` tool (see :doc:`CommandGuide/llvm-ir2vec`) can generate MIR2Vec
608+    embeddings from Machine IR files (.mir), which is useful for generating
609+    embeddings outside of compiler passes.
610+ 
611+ To generate MIR2Vec embeddings in a compiler pass, first obtain the vocabulary,
612+ then create an embedder instance to compute and access embeddings.
613+ 
614+ 1. **Get the Vocabulary**:
615+    In a ``MachineFunctionPass``, get the vocabulary from the analysis:
616+ 
617+    .. code-block:: c++
618+ 
619+       auto &VocabAnalysis = getAnalysis<MIR2VecVocabLegacyAnalysis>();
620+       auto VocabOrErr = VocabAnalysis.getMIR2VecVocabulary(*MF.getFunction().getParent()); 
621+       if (!VocabOrErr) { 
622+         // Handle error: vocabulary is not available or invalid 
623+         return; 
624+       } 
625+       const mir2vec::MIRVocabulary &Vocabulary = *VocabOrErr; 
626+ 
627+    Note that ``MIR2VecVocabLegacyAnalysis`` is an immutable pass.
628+ 
629+ 2. **Create Embedder instance**:
630+    With the vocabulary, create an embedder for a specific machine function:
631+ 
632+    .. code-block:: c++
633+ 
634+       // Assuming MF is a MachineFunction&
635+       // For example, using MIR2VecKind::Symbolic: 
636+       std::unique_ptr<mir2vec::MIREmbedder> Emb =
637+           mir2vec::MIREmbedder::create(MIR2VecKind::Symbolic, MF, Vocabulary);
638+ 
639+ 
640+ 3. **Compute and Access Embeddings**:
641+    Call ``getMFunctionVector()`` to get the embedding for the machine function.
642+ 
643+    .. code-block:: c++
644+ 
645+       mir2vec::Embedding FuncVector = Emb->getMFunctionVector();
646+ 
647+    Currently, ``MIREmbedder`` can generate embeddings at three levels: Machine
648+    Instructions, Machine Basic Blocks, and Machine Functions. Appropriate
649+    getters are provided to access the embeddings at these levels.
650+ 
651+    .. note::
652+ 
653+     The validity of the ``MIREmbedder`` instance (and the embeddings it
654+     generates) is tied to the machine function it is associated with. If the
655+     machine function is modified, the embeddings may become stale and should
656+     be recomputed accordingly.
657+ 
658+ 4. **Working with Embeddings**:
659+    Embeddings are represented as ``std::vector<double>``. These vectors can be
660+    used as features for machine learning models, to compute similarity scores
661+    between different code snippets, or for other analyses as needed.
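
As one example of a similarity score, the sketch below computes cosine similarity between two embeddings held as plain ``std::vector<double>`` values. ``cosineSimilarity`` is a hypothetical helper, not part of LLVM, and returning 0.0 for a zero-magnitude vector is a convention chosen here for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not an LLVM API): cosine similarity of two embeddings.
// Both vectors must have the same dimension. Returns 0.0 when either vector
// has zero magnitude, since the angle is undefined in that case.
double cosineSimilarity(const std::vector<double> &A,
                        const std::vector<double> &B) {
  assert(A.size() == B.size() && "embeddings must have the same dimension");
  double Dot = 0.0, SquaredNormA = 0.0, SquaredNormB = 0.0;
  for (std::size_t I = 0; I < A.size(); ++I) {
    Dot += A[I] * B[I];
    SquaredNormA += A[I] * A[I];
    SquaredNormB += B[I] * B[I];
  }
  if (SquaredNormA == 0.0 || SquaredNormB == 0.0)
    return 0.0;
  return Dot / (std::sqrt(SquaredNormA) * std::sqrt(SquaredNormB));
}
```

Values near 1.0 indicate embeddings that point in nearly the same direction, and values near 0.0 indicate nearly orthogonal ones. Whether cosine similarity is the right metric depends on the downstream task.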
662+ 
663+ Further Details
664+ ^^^^^^^^^^^^^^^ 
665+ 
666+ For more detailed information about the MIR2Vec algorithm, its parameters, and
667+ advanced usage, please refer to the original paper:
668+ `RL4ReAl: Reinforcement Learning for Register Allocation <https://doi.org/10.1145/3578360.3580273>`_.
669+ 
670+ For information about using the ``llvm-ir2vec`` tool to generate MIR2Vec
671+ embeddings from Machine IR, see :doc:`CommandGuide/llvm-ir2vec`.
672+ 
673+ The LLVM source code for ``MIR2Vec`` can be explored to understand the
674+ implementation details. See ``llvm/include/llvm/CodeGen/MIR2Vec.h`` and
675+ ``llvm/lib/CodeGen/MIR2Vec.cpp``.
676+ 
541677Building with ML support
542678======================== 
543679