Analysis of Chunking Strategies for LLM Applications and Proposal of a New Strategy

Title: Analysis of Chunking Strategies for LLM Applications and Proposal of a New Strategy
Author: Tayo, Aderiye Oluwasijibomi
Advisor: Beltran Prieto, Luis Antonio
Abstract: This thesis explores the impact of different text chunking strategies on the performance of Large Language Models (LLMs) in applications such as retrieval-augmented generation (RAG) and semantic search. It presents a comparative evaluation of sentence-based, recursive, and semantic chunking methods, analyzing their effectiveness in preserving context and meaning. Building on these insights, the thesis introduces a novel hybrid approach, Markdown-Aware Semantic Chunking (MASC), which leverages document structure and semantic similarity to optimize chunk formation. Empirical results demonstrate that MASC outperforms traditional methods across key evaluation metrics, offering improved accuracy, relevance, and faithfulness in LLM-generated responses.
URI: http://hdl.handle.net/10563/57753
Date: 2024-10-27
Availability: Unrestricted
Department: Department of Informatics and Artificial Intelligence (Ústav informatiky a umělé inteligence)
Discipline: Software Engineering

