LangChain, GPT-4 and a security motivation for Agents
- 1 LangChain
- 2 Prompt Engineering
- 3 Prototyping environment
- 3.1 Python libraries
- 3.2 First task: define Quantum Physics for kids
- 3.2.1 API key management
- 3.2.2 Instantiate the local connector
- 3.2.3 Forward a query
- 3.3 Second task: define Quantum Physics scientifically
- 3.4 Third task: implement and document Softmax (Python)
- 3.5 Fourth task: generate Python code, and debug it
- 4 Next
LLMs are so-called Large Language Models. In Q1 2024 they are all the rage.
So here is an evening project of mine: I built an LLM-based application for a Phrack article:
The parts covered in the following are contained in the Notebook.
LangChain
LangChain describes itself visually as a 🦜 and a 🔗: a parrot and a chain. That's almost poetic.
LangChain is a framework for developing applications powered by language models. It enables applications that:
Are context-aware: connect a language model to sources of context (prompt instructions, few-shot examples, content to ground its response in, etc.)
Reason: rely on a language model to reason (about how to answer based on provided context, what actions to take, etc.)
Keep in mind: the LLM is as smart as a parrot.
The use cases vary, but in the following it's about creating:
- a GPT-4 prompt answering machine (you may also use other models, such as gpt-3.5-turbo etc., including models from other vendors)
- a GPT-4 Python code and documentation generator (you may also use other languages, such as Java etc.)
- a GPT-4 Python agent which tests and executes code
- other agents, which can compare documents, draft emails in MS365 / Gmail etc.
Today (Mar 3, 2024), GPT-4 is the most sophisticated model on the mass market.
Prompt Engineering
"Prompt" is an old IT term with its origins in the mainframe era. It's how REPL (Read-Eval-Print Loop) frontends of shells (such as Bash, Zsh, KSH etc.) ask for human input. In the early days of Human-Computer Interaction, the prompt was the only user interface.
LLMs do not offer Graphical User Interfaces (GUIs) to their users either. A prompt here, however, is not written in a shell language (such as Bash) or in a higher-level programming language such as Python. It is written in natural language: English, German, French, Hindi… LLMs can therefore be understood as enabler technologies that bring computation into domains and institutions where stringent engineering interfaces are too complex or time-consuming.
Prototyping environment
The following code is prototyped in Google Colab (which is a Jupyter Notebook service). This is relevant for the dependency management of the Python libraries.
Python libraries
Save the following as requirements.txt in the Colab working space or, if you are an experienced developer, in your development environment of choice.
openai==1.13.3
langchain==0.1.10
pinecone-client==3.1.0
python-dotenv==1.0.1
tiktoken==0.6.0
wikipedia==1.4.0
pypdf==4.0.2
langchain_openai==0.0.8
langchain_experimental==0.0.53
langchainhub==0.1.14
(Some of these libs are only used in later parts.)
The most relevant libs are openai, langchain and langchain_openai: they offer APIs to speed up the interaction with the OpenAI API.
First task: define Quantum Physics for kids
In Jupyter / Colab you can install all Python packages like this:
# installing the required libraries
!pip install -r ./requirements.txt -q
For the record:
# !pip - installs the packages in the base environment
# pip - installs the packages in the virtual environment
API key management
I use a file named env (not .env, because of some issues with Colab).
The file is structured in a line-separated KEY=value format; you may add further keys for additional services.
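A minimal sketch of such an env file; OPENAI_API_KEY is the variable name that the openai and langchain_openai libraries read by default, the Pinecone entry is just a placeholder for later parts:

OPENAI_API_KEY=sk-...
PINECONE_API_KEY=...

Loading it in the notebook with python-dotenv:

# read the env file (note the name: env, not .env) and export its entries as environment variables
from dotenv import load_dotenv
load_dotenv("env")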
Instantiate the local connector
The following code instantiates a local connector, which simply sends requests to the OpenAI service endpoint. This costs money.
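A minimal sketch of such a connector, using langchain_openai (the temperature value of 0.7 is an arbitrary choice):

# the GPT-4 connector; the API key is picked up from the OPENAI_API_KEY environment variable
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-4", temperature=0.7)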
The llm object would also print out the API key, among other settings.
- temperature refers to the "creativity" parameter: the higher the value, the more creative (and less deterministic) the output.
- gpt-4 is the model name.
Forward a query
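A sketch of forwarding such a query (the prompt wording is an assumption):

# one prompt in plain English, one answer from GPT-4
output = llm.invoke("Explain quantum physics in a way a five-year-old could understand.")
print(output.content)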
The output here is non-deterministic:
- this is correct, based on encyclopedic standards
- this is not something a child could understand
- is this even possible, with just words?
Second task: define Quantum Physics scientifically
Here we refine the prompt a little more, using LangChain's message types (see the sketch after the list):
- SystemMessage – the context: how the AI should behave (boundaries, ethics, etc.)
- HumanMessage – the question the human asks
- AIMessage – the answer the model returns
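A sketch of the refined prompt; the concrete message contents are assumptions:

# prime the model with a system message, then ask the refined question
from langchain.schema import SystemMessage, HumanMessage

messages = [
    SystemMessage(content="You are a physicist. Answer precisely and scientifically."),
    HumanMessage(content="Define quantum physics in one scientific paragraph."),
]
print(llm.invoke(messages).content)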
This answer is correct, both encyclopedically and linguistically.
Third task: implement and document Softmax (Python)
This will output two results from a single prompt (see the sketch after the list):
- code
- documentation
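A sketch with a prompt template; the template wording and variable names are assumptions:

# one prompt, two deliverables: the implementation and its documentation
from langchain.prompts import PromptTemplate

template = PromptTemplate.from_template(
    "Implement the {concept} function in {language}. "
    "Return the code and, separately, a short documentation of how it works."
)
chain = template | llm  # LCEL: pipe the filled template into the GPT-4 connector
result = chain.invoke({"concept": "softmax", "language": "Python"})
print(result.content)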
Again, this is correct.
Fourth task: generate Python code, and debug it
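A sketch of such an agent, built on langchain_experimental's Python REPL tool; the concrete task given to the agent is an assumption:

# an agent that writes Python, executes it in a REPL tool, and retries on errors
from langchain_experimental.agents.agent_toolkits import create_python_agent
from langchain_experimental.tools.python.tool import PythonREPLTool

agent = create_python_agent(llm=llm, tool=PythonREPLTool(), verbose=True)
agent.invoke({"input": "Write a function that returns the first 10 Fibonacci numbers, run it and show the output."})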
The result is:
The generated code has been executed directly in the environment.
Sandbox environments for LLM agents
Sandboxing here means that changes to the environment are not persistent. There are various definitions of the term, including some which only apply in the context of software security and runtime environments. Here it simply refers to a playground where the LLM agents can roam freely.
Keep in mind: LLMs may hallucinate and are only as smart as a parrot. They need a cage.
- Google Colab (throw-away environments)
- Easy, for Windows:
- Intermediate, for Docker (incl. Windows as a VM with Docker Desktop):