KORE

Enhancing Knowledge Injection for Large Multimodal Models via

Knowledge-Oriented Augmentations and Constraints

Teaser

"Knowledge-Oriented Control, Accurate Adaptation and Powerful Retention!"

– Evolving Knowledge Injection




Introduction

Large Multimodal Models encode extensive factual knowledge in their pre-trained weights. However, this knowledge remains static and limited, unable to keep pace with real-world developments, which hinders continuous knowledge acquisition. Effective knowledge injection thus becomes critical, involving two goals: knowledge adaptation (injecting new knowledge) and knowledge retention (preserving old knowledge). Existing methods often struggle to learn new knowledge and suffer from catastrophic forgetting. To address this, we propose KORE, a synergistic method of KnOwledge-oRientEd augmentations and constraints for injecting new knowledge into large multimodal models while preserving old knowledge. Unlike general text or image data augmentation, KORE automatically converts individual knowledge items into structured and comprehensive knowledge to ensure that the model accurately learns new knowledge, enabling accurate adaptation. Meanwhile, KORE stores previous knowledge in the covariance matrix of the LMM's linear layer activations and initializes the adapter by projecting the original weights into the matrix's null space, defining a fine-tuning direction that minimizes interference with previous knowledge and enabling powerful retention. Extensive experiments on various LMMs, including LLaVA-v1.5-7B, LLaVA-v1.5-13B, and Qwen2.5-VL-7B, show that KORE achieves superior new knowledge injection performance and effectively mitigates catastrophic forgetting.

KnOwledge-oRientEd Augmentations and Constraints

Overview of KORE, a synergistic method of knowledge-oriented augmentations and constraints. KORE-Augmentation automatically converts each piece of knowledge into profound and structured knowledge. KORE-Constraint minimizes interference with previous knowledge by initializing the adapter in the null space of a covariance matrix that stores previous knowledge.

KORE-Augmentation

KORE-Augmentation augments each original knowledge item into multi-round dialogue data (forming the trunk) and instruction-task data (forming the branches), thereby constructing a comprehensive, higher-level knowledge tree (left part of Figure 3) that supports superior generalization and internalization of new knowledge. KORE-Augmentation moves beyond having the model accurately fit the training data, i.e., mere “data memorization”. Instead, it focuses on helping the model comprehend and reason about the inherent logic and associations within the knowledge itself. This enables the model to think about, internalize, and effectively extract and manipulate the learned knowledge, thereby achieving real “knowledge internalization”.
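
The snippet below is a minimal sketch of this idea, not the authors' released pipeline: it assumes a generic generate callable (any instruction-following LLM wrapper) and hypothetical prompt templates, and expands a single knowledge item into multi-round dialogue data (the trunk) and instruction-task data (the branches).

# Minimal sketch of knowledge-oriented augmentation; the prompt templates and the
# `generate` interface below are assumptions, not the released KORE pipeline.
from typing import Callable, Dict, List, Sequence

DIALOGUE_PROMPT = (
    "Given the following piece of knowledge, write a {turns}-turn question-answer "
    "dialogue that explores its background, details, and implications.\n\nKnowledge: {item}"
)
INSTRUCTION_PROMPT = (
    "Given the following piece of knowledge, write one {task_type} training example "
    "(instruction, input, output) that requires using this knowledge.\n\nKnowledge: {item}"
)

def augment_knowledge_item(
    item: str,
    generate: Callable[[str], str],  # any instruction-following LLM wrapper
    turns: int = 3,
    task_types: Sequence[str] = ("multiple-choice QA", "true/false judgment", "open-ended QA"),
) -> Dict[str, List[str]]:
    """Expand one knowledge item into a small 'knowledge tree':
    a multi-round dialogue forms the trunk, instruction tasks form the branches."""
    trunk = [generate(DIALOGUE_PROMPT.format(turns=turns, item=item))]
    branches = [generate(INSTRUCTION_PROMPT.format(task_type=t, item=item)) for t in task_types]
    return {"dialogues": trunk, "instruction_tasks": branches}

The augmented samples are then used for fine-tuning, so the model learns to reason over the new fact rather than memorize a single phrasing of it.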

KORE-Constraint

Our analysis reveals two key findings: (1) Figure 4 (a) and (b) demonstrate that CO-SVD exhibits superior performance retention compared to Plain SVD and ASVD, suggesting that multimodal knowledge can be effectively captured and stored in the covariance matrix. (2) Figure 4 (c) shows that the covariance matrices of linear layer inputs share similar outlier patterns for related tasks (e.g., POPE and HallusionBench) but differ from unrelated ones (e.g., MMBench), indicating that distinct tasks exhibit different outlier distributions in the covariance matrix. To build a multi-dimensional covariance matrix for KORE, we therefore sample 64 examples per category from OneVision's single-image subset (General, Doc/Chart/Screen, Math/Reasoning, General OCR).

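As a rough illustration of this constraint (under our own assumptions, not the authors' implementation), the sketch below takes a linear layer's frozen weight W and calibration activations X collected from previous-knowledge data, accumulates the input covariance, treats the eigenvectors with near-zero eigenvalues as an approximate null space, and initializes a low-rank adapter from the projection of W onto that subspace, so that subsequent updates interfere as little as possible with directions that carry old knowledge.

import torch

@torch.no_grad()
def null_space_adapter_init(W: torch.Tensor, X: torch.Tensor, rank: int, tol: float = 1e-6):
    """Covariance-guided null-space initialization (illustrative sketch, not the released KORE code).

    W: (out_features, in_features) frozen weight of a linear layer.
    X: (num_tokens, in_features) activations collected from previous-knowledge data.
    Returns low-rank factors (A, B) such that B @ A approximates the projection of W
    onto the subspace least used by the old activations.
    """
    # Covariance of the layer inputs: large-eigenvalue directions carry previous knowledge.
    cov = X.T @ X / X.shape[0]                                 # (in, in)
    eigvals, eigvecs = torch.linalg.eigh(cov)                  # eigenvalues in ascending order
    # Approximate null space: eigenvectors whose eigenvalues are (near) zero.
    null_basis = eigvecs[:, eigvals < tol * eigvals.max()]     # (in, k)
    # Project the original weight into the null space of the old-knowledge covariance.
    W_null = W @ null_basis @ null_basis.T                     # (out, in)
    # Low-rank factorization of the projected weight initializes the adapter.
    U, S, Vh = torch.linalg.svd(W_null, full_matrices=False)
    B = U[:, :rank] * S[:rank]   # (out, rank), trainable during injection
    A = Vh[:rank, :]             # (rank, in), kept frozen (cf. the "W/o Frozen Matrix A" ablation)
    return A, B

Fine-tuning then updates the adapter while its initial direction stays inside the subspace that matters least for previous tasks; the actual KORE constraint may differ in how the covariance is aggregated across the calibration categories and how the adapter interacts with the frozen weight.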

Analysis of Main Results


Specifically, KORE (rank=235) achieves improvements of 12.63 in CEM and 21.27 in F1-Score over the best baseline on EVOKE, more than doubling LoRA's scores.

Specifically, KORE (rank=235) outperforms LoRA across all knowledge retention tests, achieving top scores on OCR, M-DIS, and HAL, and placing second on INS. Despite containing both multi-round dialogue and instruction-task data, KORE-74K yields suboptimal performance on INS and M-IDU. We attribute this to the number of trainable parameters and the source of the covariance matrix: with rank=256, KORE shows powerful retention performance, trailing Replay by a mere 2.31 on INS and outperforming it by 3.87 on M-IDU.

Specifically, KORE (rank=235) achieves an 8.41 improvement over the strongest baseline, demonstrating superior overall performance. These gains arise from KORE's ability to balance the trade-off between injecting new knowledge and preserving old knowledge.

Figure 5 compares 20 fine-grained News and Entity types from EVOKE. KORE consistently outperforms all baselines, demonstrating strong and comprehensive knowledge adaptation.

Specifically, KORE outperforms LoRA (e.g., 6.53↑ in Avg) and continual learning methods (e.g., EWC, LwF, and SEFE), achieving top scores on OCRVQA, MMMU, and Hall. Furthermore, by adjusting the number of trainable parameters (rank=256) and the source of the covariance matrix, it closely matches or even exceeds Replay.

Table 3 shows that specific constraints slightly reduce the K.A score but substantially improve K.R and overall performance. Figure 6 further shows that specific constraints enhance targeted knowledge retention, notably with a 7.17 gain on MME, demonstrating their potential for tailored knowledge retention.

Analysis of Various LMM Scales and Architectures


For LLaVA-v1.5 (13B), Table 4 shows that KORE surpasses LoRA (e.g., 16.63↑ in CEM and 21.64↑ in F1-Score) on EVOKE and achieves superior K.R performance across all six dimensions, including M-DIS. With an overall improvement of 10.74 over Replay, these results confirm KORE's strong potential for larger LMMs.

On Qwen2.5-VL (7B), KORE surpasses LoRA (e.g., 12.63↑ in CEM and 21.27↑ in F1-Score) and Replay (e.g., 3.40↑ in Avg). The smaller improvement stems from Qwen2.5-VL's robust knowledge system, honed via three-stage training, which reduces the marginal gains from knowledge injection (e.g., comparing Tables 1 and 4, Qwen2.5-VL (7B)'s gains with LoRA on EVOKE are smaller than LLaVA-v1.5 (7B)'s).

Analysis of Ablation Experiments


Figure 7 shows a clear trend: KORE's performance increases with higher rank and more trainable parameters on nearly all evaluations. Even at rank=64, KORE still surpasses Replay in Avg while using less than half of Replay's trainable parameters.

Table 5 validates KORE's design, showing that each ablated component contributes positively to overall performance. W/o Augmentation is particularly detrimental to knowledge adaptation (19.82↓ in CEM and 22.95↓ in F1-Score), while W/o Constraint and W/o Frozen Matrix A both impair knowledge retention.

Qualitative Examples

Our Team

BibTeX

@article{jiang2025kore,
  title  = {KORE: Enhancing Knowledge Injection for Large Multimodal Models via Knowledge-Oriented Augmentations and Constraints},
  author = {Jiang, Kailin and Jiang, Hongbo and Jiang, Ning and Gao, Zhi and Bi, Jinhe and Ren, Yuchen and Li, Bin and Du, Yuntao and Liu, Lei and Li, Qing},
  year   = {2025}
}