LangChain, an on-premises CPU LLM, and a basic Prompt

Self-hosting Large Language Models?

  • Possible

Are they as capable as SaaS subscription models?

  • No, but they have their own use cases.

 

LLMWare Bling on a CPU

The Hugging Face pipeline gives easy access to a pre-trained model; here it wraps LLMWare Bling, an LLM that runs on a CPU. The model is loaded with trust_remote_code=True, which allows custom model code from the Hugging Face repository to be executed locally.

 

https://huggingface.co/llmware/bling-stable-lm-3b-4e1t-v0

bling-stable-lm-3b-4e1t-v0 is part of the BLING ("Best Little Instruction-following No-GPU-required") model series, RAG-instruct trained on top of a StabilityAI stablelm-3b-4e1t base model.

BLING models are fine-tuned with distilled high-quality custom instruct datasets, targeted at a specific subset of instruct tasks with the objective of providing a high-quality Instruct model that is 'inference-ready' on a CPU laptop even without using any advanced quantization optimizations.
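As a rough sanity check before downloading, the size of the weight files on the Hugging Face Hub gives an idea of whether the model will fit comfortably in a laptop's RAM. The sketch below is optional; it assumes only the huggingface_hub client and uses the repository id from this article.

from huggingface_hub import HfApi

# Query the model repository metadata (no weights are downloaded here)
api = HfApi()
info = api.model_info("llmware/bling-stable-lm-3b-4e1t-v0", files_metadata=True)

# Sum up the weight files to estimate the download / memory footprint
weight_bytes = sum(
    (f.size or 0)
    for f in info.siblings
    if f.rfilename.endswith((".bin", ".safetensors"))
)
print(f"Approximate weight size: {weight_bytes / 1e9:.1f} GB")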

 

 

from langchain_community.llms.huggingface_pipeline import HuggingFacePipeline
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "llmware/bling-stable-lm-3b-4e1t-v0"

# Ensure the directory for saving models is created and specified in your environment,
# so that the model download doesn't prompt for a storage location or confirmation
import os
from transformers import logging

# Optionally, increase the logging level to see more details about the download process
logging.set_verbosity_info()

# Make sure you have set TRANSFORMERS_CACHE in your environment variables
# os.environ["TRANSFORMERS_CACHE"] = "/path/to/your/preferred/cache/directory"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, max_new_tokens=500)
hf = HuggingFacePipeline(pipeline=pipe)
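Before wiring the pipeline into LangChain, a quick smoke test confirms that the model loaded correctly and shows how much RAM the weights occupy. This is a minimal sketch that reuses the model and pipe objects created above; the sample prompt is only an illustration.

# Rough memory footprint of the loaded weights (utility provided by transformers)
print(f"Model weights in memory: {model.get_memory_footprint() / 1e9:.2f} GB")

# Plain text-generation call, independent of LangChain
result = pipe("What is electroencephalography?")
print(result[0]["generated_text"])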

 

Prompting the model with LangChain

from langchain.prompts import PromptTemplate

template = """Question: {question}

Answer: Let's think step by step."""

prompt = PromptTemplate.from_template(template)
chain = prompt | hf

question = "What is electroencephalography?"
test = chain.invoke({"question": question})
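chain.invoke returns the model's completion as a plain string, so the result can simply be printed or the chain reused for other questions. The extra question below is only an example.

print(test)

# The same chain can be reused for any other question
print(chain.invoke({"question": "What is an EEG electrode made of?"}))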

 

Listing and GitHub repo

The model's answer to this question can be found in the listing: