LangChain, an on-premises CPU LLM, and a basic Prompt
Self-hosting Large Language Models?
Possible.
Are they as capable as SaaS subscription models?
No, but they have their own use cases.
LLMWare BLING on a CPU
We use the Hugging Face pipeline for easy access to a pre-trained model; LLMWare BLING is an LLM designed to run on a CPU. The model is loaded with trust_remote_code=True, which allows the custom model code shipped in its Hugging Face repository to be executed.
bling-stable-lm-3b-4e1t-0.1 is part of the BLING ("Best Little Instruction-following No-GPU-required") model series, RAG-instruct trained on top of a StabilityAI stablelm-3b-4e1t base model.
BLING models are fine-tuned with distilled high-quality custom instruct datasets, targeted at a specific subset of instruct tasks with the objective of providing a high-quality Instruct model that is 'inference-ready' on a CPU laptop even without using any advanced quantization optimizations.
from langchain_community.llms.huggingface_pipeline import HuggingFacePipeline
from transformers import AutoModelForCausalLM, AutoTokenizer, logging, pipeline
import os

model_id = "llmware/bling-stable-lm-3b-4e1t-v0"

# Optionally raise the logging level to see details of the model download
logging.set_verbosity_info()

# By default the weights are cached under ~/.cache/huggingface; set
# TRANSFORMERS_CACHE in your environment before running to use another directory
# os.environ["TRANSFORMERS_CACHE"] = "/path/to/your/preferred/cache/directory"

# Load the tokenizer and the model; trust_remote_code=True allows custom model
# code from the Hugging Face repository to be executed
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# Wrap the model in a standard text-generation pipeline and expose it to LangChain
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, max_new_tokens=500)
hf = HuggingFacePipeline(pipeline=pipe)
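Before building a chain, it can be worth a quick sanity check that the wrapped pipeline actually generates text on the CPU. The snippet below is a minimal sketch that reuses the hf wrapper defined above; the test string is arbitrary.
# Optional sanity check: invoke the LangChain wrapper directly with a plain string
sample = hf.invoke("Briefly explain what a large language model is.")
print(sample)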
Prompting the model with LangChain
from langchain.prompts import PromptTemplate
template = """Question: {question}
Answer: Let's think step by step."""
prompt = PromptTemplate.from_template(template)
# Pipe the prompt template into the local model and run a test question
chain = prompt | hf
question = "What is electroencephalography?"
test = chain.invoke({"question": question})
print(test)
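Because the BLING series is RAG-instruct trained, the more natural use case is answering a question against a supplied passage rather than from the model's own knowledge. The sketch below reuses PromptTemplate and the hf wrapper from above; the template wording and the example passage are illustrative assumptions, not part of the original setup.
# A minimal RAG-style prompt: answer only from the supplied context passage
rag_template = """Please read the following text and answer the question.
Text: {context}
Question: {question}
Answer:"""
rag_prompt = PromptTemplate.from_template(rag_template)
rag_chain = rag_prompt | hf

context = ("Electroencephalography (EEG) records the electrical activity of the "
           "brain using electrodes placed on the scalp.")
print(rag_chain.invoke({"context": context, "question": "How is brain activity recorded in EEG?"}))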
Listing and GitHub repo
The model's answer to this question can be found in the listing: