LLM Inference Execution Flow Diagram

Shows the execution path through Prefill, Decoding, tool calls, and the forced-token queue.

graph TD
    A[User Prompt Tokens] --> B{Prefill Phase};
    B -->|Batch=1| C[Model Forward];
    C --> D[Initial KV Cache];
    D -->|Expand to Batch=N| E[Decoded KV Cache];
    
    subgraph "Decoding Loop (Row-wise)"
        E --> F["Model Forward (Next Token)"];
        F --> G[Logits];
        G --> H{Sampling Strategy};
        H -->|Temp/Top-K| I[Sampled Token];
        
        I --> J{Tool Logic Check};
        J -- Normal Token --> K[Yield Token];
        J -- "<|python_start|>" --> L[Enter Python Mode];
        J -- "<|python_end|>" --> M[Exec Calculator];
        
        M --> N[Result Tokens];
        N -->|Push to Queue| O[Forced Tokens Deque];
        O -->|Override Sample| P[Next Input Token];
        
        K --> P;
        L --> P;
        P --> E;
    end
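
The tool branch of the decoding loop can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the actual engine code: tokens are modeled as strings rather than integer ids, the calculator result is pushed back one character per token, and the names `RowState`, `run_calculator`, and `step` are hypothetical. Only the two special tokens and the forced-token deque come from the diagram.

```python
import ast
import operator
from collections import deque
from dataclasses import dataclass, field

PYTHON_START = "<|python_start|>"
PYTHON_END = "<|python_end|>"

# Arithmetic operators the hypothetical calculator accepts.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}


def run_calculator(expr):
    """Evaluate a +, -, *, / expression; return None if it is not one."""
    def ev(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError("unsupported expression")
    try:
        return str(ev(ast.parse(expr, mode="eval").body))
    except (ValueError, SyntaxError, ZeroDivisionError):
        return None


@dataclass
class RowState:
    """Per-row decoding state (one instance per sample in the batch)."""
    forced: deque = field(default_factory=deque)   # Forced Tokens Deque
    in_python: bool = False                        # inside a python block?
    expr_tokens: list = field(default_factory=list)
    output: list = field(default_factory=list)


def step(state, sampled_token):
    """Advance one row by one step and return its next input token."""
    # A queued forced token overrides whatever the sampler produced.
    token = state.forced.popleft() if state.forced else sampled_token
    state.output.append(token)

    if token == PYTHON_START:
        # Enter Python Mode: start collecting the expression.
        state.in_python = True
        state.expr_tokens = []
    elif token == PYTHON_END and state.in_python:
        # Exec Calculator, then push the result tokens onto the queue.
        state.in_python = False
        result = run_calculator("".join(state.expr_tokens))
        if result is not None:
            state.forced.extend(result)  # assumption: one char per token
    elif state.in_python:
        state.expr_tokens.append(token)
    return token


# Usage: the model "samples" X and Y after the tool call, but the
# queued calculator result replaces them.
state = RowState()
for tok in ["Hi", PYTHON_START, "1", "2", "*", "3", PYTHON_END, "X", "Y", "!"]:
    step(state, tok)
```

In this sketch the deque is drained before any sampled token is accepted, which mirrors the "Override Sample" edge: the model still runs a forward pass on each step, but its sampled output is discarded while forced tokens remain.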
