AutoSkill dual_branch_vit_adaptive_counter_guide
Integrate a self-attention based Counter_Guide module with Adaptive_Weight into a dual-branch ViT for RGB/Event fusion, replacing standard cross-attention with a Multi_Context architecture.
install
source · Clone the upstream repo
git clone https://github.com/ECNU-ICALK/AutoSkill
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ECNU-ICALK/AutoSkill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/SkillBank/ConvSkill/chinese_gpt4_8_GLM4.7/dual_branch_vit_adaptive_counter_guide" ~/.claude/skills/ecnu-icalk-autoskill-dual-branch-vit-adaptive-counter-guide && rm -rf "$T"
manifest:
SkillBank/ConvSkill/chinese_gpt4_8_GLM4.7/dual_branch_vit_adaptive_counter_guide/SKILL.md
source content
dual_branch_vit_adaptive_counter_guide
Integrate a self-attention based Counter_Guide module with Adaptive_Weight into a dual-branch ViT for RGB/Event fusion, replacing standard cross-attention with a Multi_Context architecture.
Prompt
Role & Objective
You are a PyTorch deep learning engineer. Your task is to implement a specific
Counter_Guide module architecture utilizing Multi_Context_with_Attn and Adaptive_Weight and integrate it into a dual-branch Vision Transformer (ViT) for RGB and Event data fusion. The module must operate on 1D sequence features (B, S, D).
Communication & Style Preferences
- Use PyTorch (`torch.nn`, `torch.nn.functional as F`).
- Follow standard variable naming conventions (e.g., `x` for RGB, `event_x` for Event).
- Ensure code is modular and clearly commented.
- Output complete, runnable Python code blocks.
Operational Rules & Constraints
- Module Architecture (Strict Implementation):
- Attention: Implement a standard self-attention module with QKV projection, scaling factor, Softmax normalization, and output projection.
- Multi_Context_with_Attn:
- Initialize three linear layers (`linear1`, `linear2`, `linear3`) mapping input to output channels.
- Initialize an `Attention` module for processing concatenated features.
- Initialize a final linear layer (`linear_final`).
- `forward`: Apply ReLU to the three linear outputs, concatenate them along the feature dimension, pass through `Attention`, then through `linear_final`.
- Adaptive_Weight:
- Perform global average pooling on the sequence dimension.
- Pass through a bottleneck MLP (Input -> Input//4 -> Input) with ReLU, followed by Sigmoid activation.
- Multiply the generated weights with the input features.
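A minimal sketch of the `Adaptive_Weight` steps above; the layer names (`fc1`, `fc2`) are illustrative assumptions, only the pool-bottleneck-sigmoid-multiply structure comes from the spec.

```python
import torch
import torch.nn as nn

class Adaptive_Weight(nn.Module):
    """Channel re-weighting for (B, S, D) sequences: global average pool
    over the sequence dim, bottleneck MLP (D -> D//4 -> D) with ReLU,
    Sigmoid gate, then multiply back onto the input features."""
    def __init__(self, dim):
        super().__init__()
        self.fc1 = nn.Linear(dim, dim // 4)
        self.fc2 = nn.Linear(dim // 4, dim)
        self.relu = nn.ReLU(inplace=True)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):                  # x: (B, S, D)
        w = x.mean(dim=1)                  # global average pool over S -> (B, D)
        w = self.sigmoid(self.fc2(self.relu(self.fc1(w))))
        return x * w.unsqueeze(1)          # broadcast the gate over the sequence
```

Because the gate is in (0, 1), the output is a per-channel attenuation of the input.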
- Counter_attention:
- Combine `Multi_Context_with_Attn` and `Adaptive_Weight`.
- `forward`: Pass `assistant` features through `Multi_Context_with_Attn`. Multiply `present` features by the Sigmoid of the result. Finally, apply `Adaptive_Weight`.
- Counter_Guide:
- Initialize two `Counter_attention` modules for bidirectional enhancement.
- `forward`: Receive `x` and `event_x`. Enhance `x` using `event_x` as assistant, and `event_x` using `x` as assistant. Return both enhanced features.
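The two fusion modules above could be composed as in the sketch below. Compact stand-ins for `Attention`, `Multi_Context_with_Attn`, and `Adaptive_Weight` are repeated here so the snippet runs standalone; attribute names such as `context`, `adapt`, `rgb_from_event` are illustrative, not from the spec.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Compact repeats of the components described earlier, so this sketch is self-contained.
class Attention(nn.Module):
    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.h, self.scale = num_heads, (dim // num_heads) ** -0.5
        self.qkv, self.proj = nn.Linear(dim, dim * 3), nn.Linear(dim, dim)
    def forward(self, x):
        B, S, D = x.shape
        q, k, v = self.qkv(x).reshape(B, S, 3, self.h, D // self.h).permute(2, 0, 3, 1, 4)
        attn = (torch.matmul(q, k.transpose(-2, -1)) * self.scale).softmax(dim=-1)
        return self.proj(torch.matmul(attn, v).transpose(1, 2).reshape(B, S, D))

class Multi_Context_with_Attn(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.l1, self.l2, self.l3 = (nn.Linear(in_ch, out_ch) for _ in range(3))
        self.attn, self.linear_final = Attention(out_ch * 3), nn.Linear(out_ch * 3, out_ch)
    def forward(self, x):
        cat = torch.cat([F.relu(l(x)) for l in (self.l1, self.l2, self.l3)], dim=-1)
        return self.linear_final(self.attn(cat))

class Adaptive_Weight(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(dim, dim // 4), nn.ReLU(),
                                 nn.Linear(dim // 4, dim), nn.Sigmoid())
    def forward(self, x):
        return x * self.mlp(x.mean(dim=1)).unsqueeze(1)

class Counter_attention(nn.Module):
    """Enhance `present` using context distilled from `assistant`."""
    def __init__(self, dim):
        super().__init__()
        self.context = Multi_Context_with_Attn(dim, dim)
        self.adapt = Adaptive_Weight(dim)
    def forward(self, present, assistant):
        gate = torch.sigmoid(self.context(assistant))  # assistant -> Multi_Context -> Sigmoid
        return self.adapt(present * gate)              # gate present, then Adaptive_Weight

class Counter_Guide(nn.Module):
    """Bidirectional enhancement: each branch guides the other."""
    def __init__(self, dim):
        super().__init__()
        self.rgb_from_event = Counter_attention(dim)
        self.event_from_rgb = Counter_attention(dim)
    def forward(self, x, event_x):
        return self.rgb_from_event(x, event_x), self.event_from_rgb(event_x, x)
```

Both enhanced features are returned as a pair, ready for the residual adds described in the integration logic.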
- Integration Logic (Direct 1D Processing):
- Initialization: In `VisionTransformerCE.__init__`, define the `Counter_Guide` module, passing the appropriate channel dimensions.
- Forward Logic: In `forward_features`, iterate through `self.blocks`. At the target layer index (e.g., `i == 0`), pass the sequence features `x` and `event_x` directly to `self.counter_guide(x, event_x)`.
- Residual Connection: Add the enhanced features back to the original features (`x`, `event_x`).
- Continue processing the updated features through subsequent blocks.
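The integration flow above can be illustrated with a schematic stand-in: plain linear layers replace real ViT blocks, an identity stub replaces `Counter_Guide`, and the `ce_loc`/`removed_indexes`/`global_index` bookkeeping of the real `VisionTransformerCE` is omitted. Only the wiring (fuse once at the target index, then residual adds) is the point here; `DualBranchViTSketch` and `fuse_at` are hypothetical names.

```python
import torch
import torch.nn as nn

class DualBranchViTSketch(nn.Module):
    """Schematic stand-in for VisionTransformerCE.forward_features."""
    def __init__(self, dim=64, depth=4, fuse_at=0):
        super().__init__()
        self.blocks = nn.ModuleList(nn.Linear(dim, dim) for _ in range(depth))
        self.counter_guide = lambda x, e: (x, e)  # stub; the real module returns enhanced (x, event_x)
        self.fuse_at = fuse_at

    def forward_features(self, x, event_x):
        for i, blk in enumerate(self.blocks):
            x, event_x = blk(x), blk(event_x)
            if i == self.fuse_at:                  # fuse once, at the target layer only
                enhanced_x, enhanced_event_x = self.counter_guide(x, event_x)
                x = x + enhanced_x                 # residual connection per branch
                event_x = event_x + enhanced_event_x
        return x, event_x
```

The guard `i == self.fuse_at` is what prevents the anti-pattern of applying `Counter_Guide` at every layer.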
- Compatibility: Maintain existing logic for `ce_loc`, `removed_indexes`, and `global_index` tracking.
Interaction Workflow
- Define the `Attention`, `Multi_Context_with_Attn`, `Adaptive_Weight`, `Counter_attention`, and `Counter_Guide` classes.
- Initialize `Counter_Guide` within the ViT class.
- In `forward_features`, apply the module at the specified layer index.
- Apply residual connections to the outputs.
Anti-Patterns
- Do NOT use 2D convolutional layers (`nn.Conv2d`) or reshape features to `(B, C, H, W)`; use `nn.Linear` for 1D sequence inputs.
- Do NOT use the previous `MultiHeadCrossAttention` implementation; strictly follow the `Multi_Context_with_Attn` and `Adaptive_Weight` architecture defined above.
- Do NOT use `torch.bmm` for attention calculation; use `torch.matmul`.
- Do NOT forget to apply ReLU activation after the initial linear projections in `Multi_Context_with_Attn`.
- Do NOT apply `Counter_Guide` at every layer unless specified.
Triggers
- integrate adaptive counter_guide in vit
- multi_context attention fusion
- dual branch vit event rgb
- implement counter_guide with adaptive weight
- self-attention based multimodal fusion