DisProtEdit: Exploring Disentangled Representations for Multi-Attribute Protein Editing

♠†Max Ku, ♠♡Sun Sun, ♡Hongyu Guo, ♠†Wenhu Chen
♠University of Waterloo · ♡National Research Council Canada · †Vector Institute
m3ku@uwaterloo.ca, sun.sun@uwaterloo.ca, wenhu.chen@uwaterloo.ca

ICML 2025 Workshops (GenBio, FM4LS)


Figure 1: (a, c) Original protein sequences with their corresponding structural and functional attributes. (b, d) Edited proteins generated by DisProtEdit in response to compositional prompts: (b) increase alpha-helices and increase Pin1 stability; (d) increase beta-sheets and increase Villin stability. These edits are among the most difficult combinations in our benchmark, where prior methods tend to fail. Despite the challenge, DisProtEdit produces meaningful and plausible modifications, highlighting its capacity for multi-attribute editing via modular, text-guided control over distinct biological properties.

Abstract

DisProtEdit is a controllable protein editing framework that disentangles structural and functional representations via dual-channel natural language supervision. Each protein is annotated with structure and function texts, derived automatically using GPT-4o, forming the SwissProtDis dataset (540k entries). Our model uses alignment and uniformity objectives for modality fusion and introduces a novel angular MMD loss for disentanglement. Editing is performed by modifying text prompts and interpolating in latent space, supporting modular control. We evaluate on a new multi-attribute editing benchmark and TAPE tasks, showing strong accuracy (up to 61.7% both-hit edit success) and competitive downstream performance, with improved interpretability and controllability.
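For readers curious how dual-channel annotations like those in SwissProtDis could be produced, below is a minimal sketch of the decomposition step using the OpenAI Python client. The prompt wording, function name, and field layout are our own illustrative assumptions, not the exact pipeline used to build the dataset.

```python
from openai import OpenAI  # pip install openai

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical prompt; the actual SwissProtDis instruction may differ.
PROMPT = (
    "Split the following UniProt annotation into two short paragraphs:\n"
    "STRUCTURE: secondary structure, domains, folds.\n"
    "FUNCTION: biological role, activity, interactions.\n\n{annotation}"
)

def decompose(annotation: str) -> str:
    """Ask GPT-4o to separate one UniProt annotation into structure/function texts."""
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": PROMPT.format(annotation=annotation)}],
    )
    return resp.choices[0].message.content
```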

How does DisProtEdit work?


Figure 2: Overview of the DisProtEdit framework. (a) During training, a protein sequence and its corresponding structural and functional text descriptions are encoded into a shared embedding space using three loss functions: alignment, uniformity, and angular MMD. The alignment loss ensures that the protein embedding closely matches the concatenated structural and functional embeddings. The uniformity loss prevents representational collapse by encouraging embeddings to be spread uniformly on the hypersphere. To disentangle structural and functional semantics, we introduce a novel angular MMD (Maximum Mean Discrepancy) loss that separates the latent space into orthogonal subspaces, one for structure and one for function. (b) A T5-based decoder is trained to reconstruct protein sequences from the learned embeddings. (c) During editing, a new text prompt is used to generate an updated embedding, which is interpolated with the original using spherical linear interpolation (slerp), allowing precise, modular edits. (d) The SwissProtDis dataset is constructed by automatically decomposing UniProt annotations into structural and functional texts using GPT-4o, enabling dual-channel supervision at scale.
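To make the objectives in panel (a) concrete, here is a minimal PyTorch sketch of alignment and uniformity losses in the style of Wang and Isola (2020). Tensor names, shapes, and the temperature are assumptions; the paper's exact formulation may differ in detail.

```python
import torch
import torch.nn.functional as F

def alignment_loss(protein_emb: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
    """Pull each protein embedding toward its paired text embedding.
    text_emb is the concatenation of the structural and functional text
    embeddings; both inputs are (B, D) and L2-normalized."""
    return (protein_emb - text_emb).pow(2).sum(dim=1).mean()

def uniformity_loss(emb: torch.Tensor, t: float = 2.0) -> torch.Tensor:
    """Spread embeddings over the unit hypersphere by penalizing the
    log of the average pairwise Gaussian potential."""
    sq_dists = torch.pdist(emb, p=2).pow(2)
    return sq_dists.mul(-t).exp().mean().log()

# Example usage with hypothetical encoders:
# p = F.normalize(protein_encoder(seq_tokens), dim=-1)
# z = F.normalize(torch.cat([struct_enc(s), func_enc(f)], dim=-1), dim=-1)
# loss = alignment_loss(p, z) + uniformity_loss(p) + uniformity_loss(z)
```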

Findings

This method is still far from perfect, as protein editing is a very challenging task, but we uncovered several key findings. First, the alignment and uniformity objectives effectively integrate the protein and text modalities. Second, the angular MMD loss is essential for separating structural and functional semantics. Finally, DisProtEdit supports fine-grained control over edits, including hard cases such as increasing helices while boosting stability, though these remain biologically challenging tasks (see the editing sketch below).
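The editing step in Figure 2(c) relies on spherical linear interpolation between the original embedding and one generated from the edited text prompt. Below is a minimal slerp sketch in PyTorch; the interface and the choice of alpha are illustrative, not the paper's exact API.

```python
import torch

def slerp(z_src: torch.Tensor, z_tgt: torch.Tensor, alpha: float) -> torch.Tensor:
    """Spherical linear interpolation between the original embedding (z_src)
    and the embedding from the edited prompt (z_tgt).
    alpha = 0 keeps the original; alpha = 1 fully applies the edit."""
    src = z_src / z_src.norm(dim=-1, keepdim=True)
    tgt = z_tgt / z_tgt.norm(dim=-1, keepdim=True)
    omega = torch.acos((src * tgt).sum(dim=-1).clamp(-1 + 1e-7, 1 - 1e-7))
    sin_omega = torch.sin(omega)
    w_src = (torch.sin((1.0 - alpha) * omega) / sin_omega).unsqueeze(-1)
    w_tgt = (torch.sin(alpha * omega) / sin_omega).unsqueeze(-1)
    return w_src * src + w_tgt * tgt

# z_edited = slerp(z_original, z_new_prompt, alpha=0.6)
# then decode z_edited with the T5-based decoder to obtain the edited sequence
```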



Figure 3: We visualize UMAP projections of protein and text embeddings learned under different training strategies: (a) random projection, (b) contrastive learning, (c) contrastive learning followed by fine-tuning with alignment loss, and (d) DisProtEdit (ours). While contrastive learning improves cross-modal alignment over random projection, it still leaves a noticeable modality gap. This gap can be reduced with an additional fine-tuning stage using alignment loss, as shown in (c). In contrast, DisProtEdit achieves comparable cross-modal integration and semantic disentanglement in a single-stage training setup. This demonstrates the effectiveness of our alignment and uniformity objectives for learning meaningful, multimodal representations without requiring separate post-hoc alignment.
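A projection like Figure 3 can be reproduced with umap-learn along these lines; the file names and array shapes are placeholders for embeddings exported from the trained encoders.

```python
import numpy as np
import umap  # pip install umap-learn
import matplotlib.pyplot as plt

# Hypothetical (N, D) embedding matrices saved after training.
protein_embs = np.load("protein_embs.npy")
text_embs = np.load("text_embs.npy")

reducer = umap.UMAP(n_components=2, metric="cosine", random_state=0)
proj = reducer.fit_transform(np.concatenate([protein_embs, text_embs]))

n = len(protein_embs)
plt.scatter(proj[:n, 0], proj[:n, 1], s=4, label="protein")
plt.scatter(proj[n:, 0], proj[n:, 1], s=4, label="text")
plt.legend()
plt.show()  # the distance between the two clouds visualizes the modality gap
```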


Figure 4: We present SVD projections of the learned embeddings under varying levels of the disentanglement weight λD, which controls the strength of the angular Maximum Mean Discrepancy (MMD) loss. When λD = 0, structural and functional embeddings are highly entangled, forming overlapping distributions that indicate weak semantic separation. As λD increases, we observe growing divergence between these two modalities. At λD = 1.0, the separation is most effective, as structural and functional text embeddings form two distinct and symmetric clusters. This confirms that the angular MMD loss successfully encourages modular latent subspaces, enabling interpretable and targeted edits. However, excessively large λD (e.g., 5.0) may reduce alignment quality and editing success due to over-regularization. These results demonstrate the importance of balancing disentanglement strength with alignment to maintain both semantic clarity and editability in the learned representation space.
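The paper's exact angular MMD is not reproduced here. As a rough stand-in, the sketch below computes a kernel MMD on L2-normalized embeddings (so pairwise distances are monotone in the angle between vectors) and scales its negative by λD to push the structural and functional subspaces apart; the kernel choice, bandwidth, and objective combination are all assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def mmd2_rbf(x: torch.Tensor, y: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    """Squared MMD with an RBF kernel on unit-norm embeddings."""
    x, y = F.normalize(x, dim=-1), F.normalize(y, dim=-1)
    def k(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        return torch.exp(-torch.cdist(a, b).pow(2) / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

# Hypothetical total objective, with lambda_d the disentanglement weight
# probed in Figure 4 (minimizing the negative MMD maximizes separation):
# loss = align + unif + lambda_d * (-mmd2_rbf(struct_emb, func_emb))
```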

Limitations

  • LLM-derived supervision quality: Our dual-channel structural and functional descriptions are generated using GPT-4o, based on UniProt annotations. While this enables scalable dataset construction, it may introduce hallucinations or imprecise interpretations, potentially affecting downstream model performance and biological reliability.
  • Oracle-based evaluation bias: Editing success is assessed using pretrained predictors, which may be inaccurate on out-of-distribution sequences or subtle modifications. This limits the biological fidelity of evaluation and motivates future work involving wet-lab validation or structure-aware assessment.

Citation

Please cite our paper if you use our code, data, models, or results:

@inproceedings{ku2025disproteditexploringdisentangledrepresentations,
  title={DisProtEdit: Exploring Disentangled Representations for Multi-Attribute Protein Editing},
  author={Max Ku and Sun Sun and Hongyu Guo and Wenhu Chen},
  booktitle={ICML Workshop on Generative AI and Biology},
  year={2025},
  eprint={2506.14853},
  archivePrefix={arXiv},
  primaryClass={q-bio.QM},
  url={https://arxiv.org/abs/2506.14853},
}