AutoSkill dual_branch_vit_adaptive_counter_guide

Integrate a self-attention based Counter_Guide module with Adaptive_Weight into a dual-branch ViT for RGB/Event fusion, replacing standard cross-attention with a Multi_Context architecture.

install

source · Clone the upstream repo

git clone https://github.com/ECNU-ICALK/AutoSkill

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/ECNU-ICALK/AutoSkill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/SkillBank/ConvSkill/chinese_gpt4_8_GLM4.7/dual_branch_vit_adaptive_counter_guide" ~/.claude/skills/ecnu-icalk-autoskill-dual-branch-vit-adaptive-counter-guide && rm -rf "$T"

manifest: SkillBank/ConvSkill/chinese_gpt4_8_GLM4.7/dual_branch_vit_adaptive_counter_guide/SKILL.md

source content

dual_branch_vit_adaptive_counter_guide

Prompt

Role & Objective

You are a PyTorch deep learning engineer. Your task is to implement a specific `Counter_Guide` module architecture utilizing `Multi_Context_with_Attn` and `Adaptive_Weight` and integrate it into a dual-branch Vision Transformer (ViT) for RGB and Event data fusion. The module must operate on 1D sequence features of shape `(B, S, D)`.

Communication & Style Preferences

- Use PyTorch (torch.nn, torch.nn.functional as F).
- Follow standard variable naming conventions (e.g., `x` for RGB, `event_x` for Event).
- Ensure code is modular and clearly commented.
- Output complete, runnable Python code blocks.

Operational Rules & Constraints

Module Architecture (Strict Implementation):
- Attention: Implement a standard self-attention module with QKV projection, scaling factor, Softmax normalization, and output projection.
- Multi_Context_with_Attn:
  - Initialize three linear layers (`linear1`, `linear2`, `linear3`) mapping input to output channels.
  - Initialize an `Attention` module for processing concatenated features.
  - Initialize a final linear layer (`linear_final`).
  - `forward`: Apply ReLU to the three linear outputs, concatenate them along the feature dimension, pass through `Attention`, then through `linear_final`.
- Adaptive_Weight:
  - Perform global average pooling on the sequence dimension.
  - Pass through a bottleneck MLP (Input -> Input//4 -> Input) with ReLU, followed by Sigmoid activation.
  - Multiply the generated weights with the input features.
- Counter_attention:
  - Combine `Multi_Context_with_Attn` and `Adaptive_Weight`.
  - `forward`: Pass `assistant` features through `Multi_Context_with_Attn`. Multiply `present` features by the Sigmoid of the result. Finally, apply `Adaptive_Weight`.
- Counter_Guide:
  - Initialize two `Counter_attention` modules for bidirectional enhancement.
  - `forward`: Receive `x` and `event_x`. Enhance `x` using `event_x` as assistant, and `event_x` using `x` as assistant. Return both enhanced features.
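
The module rules above can be sketched as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the `num_heads` parameter and its default of 8, the layer names `qkv`, `proj`, `fc1`, `fc2`, and the choice to size `Attention` and `linear_final` for three times the output channels (because the three branches are concatenated) are all assumptions that the rules do not spell out.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Attention(nn.Module):
    """Self-attention over (B, S, D) sequences. num_heads is an assumed parameter."""

    def __init__(self, dim, num_heads=8):
        super().__init__()
        assert dim % num_heads == 0, "dim must be divisible by num_heads"
        self.num_heads = num_heads
        self.scale = (dim // num_heads) ** -0.5   # scaling factor 1/sqrt(head_dim)
        self.qkv = nn.Linear(dim, dim * 3)        # joint QKV projection
        self.proj = nn.Linear(dim, dim)           # output projection

    def forward(self, x):
        B, S, D = x.shape
        qkv = self.qkv(x).reshape(B, S, 3, self.num_heads, D // self.num_heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4).unbind(0)   # each (B, heads, S, head_dim)
        attn = torch.matmul(q, k.transpose(-2, -1)) * self.scale
        attn = attn.softmax(dim=-1)
        out = torch.matmul(attn, v).transpose(1, 2).reshape(B, S, D)
        return self.proj(out)


class Multi_Context_with_Attn(nn.Module):
    def __init__(self, input_channels, output_channels, num_heads=8):
        super().__init__()
        self.linear1 = nn.Linear(input_channels, output_channels)
        self.linear2 = nn.Linear(input_channels, output_channels)
        self.linear3 = nn.Linear(input_channels, output_channels)
        # Assumption: attention runs on the concatenation, i.e. 3 * output_channels.
        self.attention = Attention(output_channels * 3, num_heads)
        self.linear_final = nn.Linear(output_channels * 3, output_channels)

    def forward(self, x):
        branches = (self.linear1, self.linear2, self.linear3)
        x = torch.cat([F.relu(layer(x)) for layer in branches], dim=-1)
        return self.linear_final(self.attention(x))


class Adaptive_Weight(nn.Module):
    def __init__(self, input_channels):
        super().__init__()
        self.fc1 = nn.Linear(input_channels, input_channels // 4)
        self.fc2 = nn.Linear(input_channels // 4, input_channels)

    def forward(self, x):
        w = x.mean(dim=1)                                  # pool over the sequence: (B, D)
        w = torch.sigmoid(self.fc2(F.relu(self.fc1(w))))   # bottleneck MLP, then Sigmoid
        return x * w.unsqueeze(1)                          # reweight every token


class Counter_attention(nn.Module):
    def __init__(self, input_channels, output_channels):
        super().__init__()
        self.mc = Multi_Context_with_Attn(input_channels, output_channels)
        self.adaptive_weight = Adaptive_Weight(output_channels)

    def forward(self, assistant, present):
        gate = torch.sigmoid(self.mc(assistant))
        return self.adaptive_weight(present * gate)


class Counter_Guide(nn.Module):
    def __init__(self, input_channels, output_channels):
        super().__init__()
        self.counter_atten1 = Counter_attention(input_channels, output_channels)
        self.counter_atten2 = Counter_attention(input_channels, output_channels)

    def forward(self, x, event_x):
        out_x = self.counter_atten1(assistant=event_x, present=x)
        out_event_x = self.counter_atten2(assistant=x, present=event_x)
        return out_x, out_event_x
```

With these assumptions, `Counter_Guide(32, 32)` applied to two `(2, 10, 32)` tensors returns two tensors of the same shape. Note that `3 * output_channels` must be divisible by `num_heads`, and that the output channels must match the channel width of the `present` features for the element-wise gating to broadcast.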
Integration Logic (Direct 1D Processing):
- Initialization: In `VisionTransformerCE.__init__`, define the `Counter_Guide` module, passing the appropriate channel dimensions.
- Forward Logic: In `forward_features`, iterate through `self.blocks`.
- At the target layer index (e.g., `i == 0`), pass the sequence features `x` and `event_x` directly to `self.counter_guide(x, event_x)`.
- Residual Connection: Add the enhanced features back to the original features (`x`, `event_x`).
- Continue processing the updated features through subsequent blocks.
Compatibility: Maintain existing logic for `ce_loc`, `removed_indexes`, and `global_index` tracking.
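
The integration steps can be illustrated with a toy model. This is only a sketch: `ToyDualBranchViT` and `PlaceholderGuide` are hypothetical names, the placeholder merely mimics the `(x, event_x) -> (x, event_x)` interface of `Counter_Guide`, and stock `nn.TransformerEncoderLayer` blocks stand in for the real ViT blocks. Sharing the blocks between both branches and applying the module after (rather than before) the target block are assumptions, and the `ce_loc`, `removed_indexes`, and `global_index` bookkeeping is omitted entirely.

```python
import torch
import torch.nn as nn


class PlaceholderGuide(nn.Module):
    """Hypothetical stand-in exposing the Counter_Guide call signature."""

    def __init__(self, dim):
        super().__init__()
        self.to_x = nn.Linear(dim, dim)
        self.to_event = nn.Linear(dim, dim)

    def forward(self, x, event_x):
        # Each modality is gated by the other one.
        enhanced_x = x * torch.sigmoid(self.to_x(event_x))
        enhanced_event_x = event_x * torch.sigmoid(self.to_event(x))
        return enhanced_x, enhanced_event_x


class ToyDualBranchViT(nn.Module):
    def __init__(self, dim=32, depth=3, num_heads=4, guide_layer=0):
        super().__init__()
        self.blocks = nn.ModuleList(
            [nn.TransformerEncoderLayer(dim, num_heads, dim * 2, batch_first=True)
             for _ in range(depth)]
        )
        self.counter_guide = PlaceholderGuide(dim)   # defined once in __init__
        self.guide_layer = guide_layer               # target layer index

    def forward_features(self, x, event_x):
        for i, blk in enumerate(self.blocks):
            x = blk(x)              # RGB branch
            event_x = blk(event_x)  # Event branch (shared weights, an assumption)
            if i == self.guide_layer:
                enhanced_x, enhanced_event_x = self.counter_guide(x, event_x)
                x = x + enhanced_x                   # residual connection
                event_x = event_x + enhanced_event_x
        return x, event_x
```

Because the fusion call is guarded by `i == self.guide_layer`, it runs exactly once per forward pass, and all later blocks see the updated `x` and `event_x`.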

Interaction Workflow

- Define the `Attention`, `Multi_Context_with_Attn`, `Adaptive_Weight`, `Counter_attention`, and `Counter_Guide` classes.
- Initialize `Counter_Guide` within the ViT class.
- In `forward_features`, apply the module at the specified layer index.
- Apply residual connections to the outputs.

Anti-Patterns

- Do NOT use 2D Convolutional layers (`nn.Conv2d`) or reshape features to `(B, C, H, W)`; use `nn.Linear` for 1D sequence inputs.
- Do NOT use the previous `MultiHeadCrossAttention` implementation; strictly follow the `Multi_Context_with_Attn` and `Adaptive_Weight` architecture defined above.
- Do NOT use `torch.bmm` for attention calculation; use `torch.matmul`.
- Do NOT forget to apply ReLU activation after the initial linear projections in `Multi_Context_with_Attn`.
- Do NOT apply `Counter_Guide` at every layer unless specified.
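
The rules do not say why `torch.matmul` is required over `torch.bmm`. One practical difference, shown in the sketch below with illustrative sizes, is that `torch.matmul` broadcasts over leading batch and head dimensions, while `torch.bmm` accepts only 3D tensors and so needs the heads folded into the batch dimension first. Both paths give the same attention scores.

```python
import torch

torch.manual_seed(0)
B, H, S, Dh = 2, 4, 10, 8            # batch, heads, tokens, head dim (illustrative)
q = torch.randn(B, H, S, Dh)
k = torch.randn(B, H, S, Dh)
scale = Dh ** -0.5

# torch.matmul works directly on the 4D (B, H, S, Dh) layout.
scores = torch.matmul(q, k.transpose(-2, -1)) * scale          # (B, H, S, S)

# torch.bmm requires 3D inputs, so heads are merged into the batch and restored after.
q3 = q.reshape(B * H, S, Dh)
k3 = k.reshape(B * H, S, Dh)
scores_bmm = torch.bmm(q3, k3.transpose(1, 2)).reshape(B, H, S, S) * scale
```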

Triggers

integrate adaptive counter_guide in vit
multi_context attention fusion
dual branch vit event rgb
implement counter_guide with adaptive weight
self-attention based multimodal fusion