Analysis of Chunking Strategies for LLM Applications and Proposal of a New Strategy
Title: Analysis of Chunking Strategies for LLM Applications and Proposal of a New Strategy
Author: Tayo, Aderiye Oluwasijibomi
Advisor: Beltran Prieto, Luis Antonio
Abstract:
This thesis explores how different text chunking strategies affect the performance of Large Language Models (LLMs) in applications such as retrieval-augmented generation (RAG) and semantic search. It presents a comparative evaluation of sentence-based, recursive, and semantic chunking methods, analyzing how well each preserves context and meaning. Building on these insights, the thesis introduces a novel hybrid approach, Markdown-Aware Semantic Chunking (MASC), which uses document structure and semantic similarity to guide chunk formation. Empirical results show that MASC outperforms the traditional methods across key evaluation metrics, improving the accuracy, relevance, and faithfulness of LLM-generated responses.
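The abstract does not spell out how MASC works, so the following is only an illustrative sketch of the general idea of combining Markdown structure with a similarity measure, not the thesis's algorithm. It assumes ATX-style (`#`) headings, uses a toy bag-of-words cosine similarity as a stand-in for embedding similarity, and all function names and the `threshold` value are hypothetical.

```python
import math
import re
from collections import Counter


def split_by_headings(markdown: str) -> list[str]:
    """Split a Markdown document into sections, starting a new one at each ATX heading."""
    sections, current = [], []
    for line in markdown.splitlines():
        if re.match(r"#{1,6}\s", line) and current:
            sections.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current).strip())
    return [s for s in sections if s]


def cosine(a: str, b: str) -> float:
    """Bag-of-words cosine similarity (assumption: a stand-in for embeddings)."""
    va = Counter(re.findall(r"\w+", a.lower()))
    vb = Counter(re.findall(r"\w+", b.lower()))
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(
        sum(v * v for v in vb.values())
    )
    return dot / norm if norm else 0.0


def chunk_section(section: str, threshold: float = 0.2) -> list[str]:
    """Merge consecutive paragraphs of one section while they stay similar."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", section) if p.strip()]
    chunks: list[str] = []
    for p in paragraphs:
        # A chunk that is only a heading line always absorbs the next paragraph.
        prev_is_heading = bool(chunks) and re.fullmatch(r"#{1,6}\s.*", chunks[-1]) is not None
        if chunks and (prev_is_heading or cosine(chunks[-1], p) >= threshold):
            chunks[-1] = chunks[-1] + "\n\n" + p
        else:
            chunks.append(p)
    return chunks


def markdown_aware_chunks(markdown: str, threshold: float = 0.2) -> list[str]:
    """Chunk by heading first, then by paragraph similarity within each section."""
    return [c for s in split_by_headings(markdown) for c in chunk_section(s, threshold)]


doc = (
    "# Cats\n\nCats purr and cats sleep.\n\nCats purr when cats are happy.\n\n"
    "Tomorrow brings heavy rain.\n\n# Stocks\n\nMarkets fell sharply today."
)
chunks = markdown_aware_chunks(doc)
```

In this toy document, chunks never cross a heading, and the unrelated paragraph inside the first section is split off into its own chunk. A real system would likely replace `cosine` with an embedding model and would need to skip `#` lines inside fenced code blocks.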
URI: http://hdl.handle.net/10563/57753
Date: 2024-10-27
Availability: Unrestricted
Department: Department of Informatics and Artificial Intelligence (Ústav informatiky a umělé inteligence)
Discipline: Software Engineering