Large general-purpose language models are not optimized out of the box for specific industries and business tasks. Methods such as specialized fine-tuning, pruning of unnecessary neural components, and knowledge distillation let you rearchitect these models so that they run faster, cost less to operate, and give more accurate results on the target task.
The book "Rearchitecting LLM: Structural Methods for Creating Efficient Models" translates ideas from the latest AI research into practical approaches for optimizing models for specific tasks. Working through it, you will perform "surgical" tuning of popular open-source models, such as Llama-3, Gemma, and Qwen, to create cost-effective local small language models (SLMs).
As you study the material, you will learn to combine behavioral analysis of models with structural changes to their architecture: you will identify and remove components that do not contribute to the model's goals, and apply "fair pruning" methods to reduce model bias at the level of individual neurons.
What's inside the book:
- universal methods for model architecture tuning
- end-to-end model rearchitecture pipelines
- improving explainability and reducing bias through model "cleaning"
- replacing external LLMs with local SLMs