What is LangChain? Easier development around LLMs

LangChain is a modular framework for Python and JavaScript that simplifies the development of applications that are powered by generative AI language models.

Using large language models (LLMs) is generally easy, although there’s an art to constructing effective prompts for them. On the other hand, programming with language models can be challenging. Enter LangChain.

LangChain is a framework for developing applications powered by language models. You can use LangChain to build chatbots or personal assistants, to summarize, analyze, or generate Q&A over documents or structured data, to write or understand code, to interact with APIs, and to create other applications that take advantage of generative AI. There are currently two versions of LangChain, one in Python and one in TypeScript/JavaScript.

LangChain enables language models to connect to sources of data, and also to interact with their environments. LangChain components are modular abstractions and collections of implementations of the abstractions. LangChain off-the-shelf chains are structured assemblies of components for accomplishing specific higher-level tasks. You can use components to customize existing chains and to build new chains.

Note that there are two kinds of language models, LLMs and chat models. LLMs take a string as input and return a string. Chat models take a list of messages as input and return a chat message. Chat messages contain two components, the content and a role. Roles specify where the content came from: a human, an AI, the system, a function call, or a generic input.
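To make the distinction concrete, here is a dependency-free sketch of what a chat message carries: just a role and content. The `ChatMessage` class below is illustrative, not LangChain's API, although LangChain's own message classes (such as `HumanMessage`, `AIMessage`, and `SystemMessage`) carry the same two pieces of information.

```python
from dataclasses import dataclass

# Minimal stand-in for a chat message: a role plus content.
@dataclass
class ChatMessage:
    role: str      # "system", "human", "ai", or "function"
    content: str

# A chat model takes a list of messages like this as input.
conversation = [
    ChatMessage(role="system", content="You are a helpful assistant."),
    ChatMessage(role="human", content="What is LangChain?"),
]
```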

In general, LLMs use prompt templates for their input. A prompt template allows you to specify the role that you want the LLM or chat model to take, for example “a helpful assistant that translates English to French.” It also allows you to apply the template to many instances of content, such as a list of phrases that you want translated.
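The idea of applying one template to many instances of content can be sketched in plain Python, without LangChain, using ordinary string formatting (LangChain's `PromptTemplate` and `ChatPromptTemplate` classes do the same job with more features):

```python
# A role-setting system string plus a per-item slot, applied to many inputs.
system = "You are a helpful assistant that translates English to French."
template = "Translate this phrase: {phrase}"

phrases = ["Good morning", "Thank you", "See you tomorrow"]
prompts = [template.format(phrase=p) for p in phrases]
print(prompts[0])  # Translate this phrase: Good morning
```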

How LangChain works: Modules

LangChain has six modules:

  1. Model I/O, an interface with language models
  2. Data connection, an interface with application-specific data
  3. Chains, which construct sequences of calls
  4. Agents, which let chains choose which tools to use given high-level directives
  5. Memory, which persists application state between runs of a chain, and
  6. Callbacks, which log and stream intermediate steps of any chain.

Model I/O lets you manage prompts, call language models through common interfaces, and extract information from model outputs.

Data connection gives you the building blocks to load, transform, store and query your data.
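Here is a toy sketch of that pipeline, load, transform (split), store, and query, in plain Python. It stands in for LangChain's document loaders, text splitters, and vector stores; the naive keyword scorer below is a placeholder for real embedding-based retrieval.

```python
# "Load": start with a raw document (a real app would use a document loader).
document = ("LangChain connects language models to data. "
            "It offers loaders, splitters, embeddings, and vector stores. "
            "Retrieval finds the chunks most relevant to a query.")

# "Transform": split the document into fixed-size, overlapping chunks.
def split_text(text, chunk_size=60, overlap=10):
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

# "Store" + "query": a naive keyword scorer standing in for a vector store.
def retrieve(chunks, query):
    scores = [sum(w.lower() in c.lower() for w in query.split()) for c in chunks]
    return chunks[scores.index(max(scores))]

chunks = split_text(document)
best = retrieve(chunks, "relevant chunks for a query")
```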

Complex applications require chaining LLMs, either with each other or with other components. LangChain provides the Chain interface for such “chained” applications.

A conversational system should be able to access some window of past messages directly. LangChain calls this ability memory.
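A minimal sketch of window memory, keeping only the last k messages, looks like this in plain Python (LangChain ships ready-made memory classes that implement this idea, backed by buffers or databases):

```python
from collections import deque

class WindowMemory:
    """Keep only the most recent k messages of a conversation."""

    def __init__(self, k=3):
        self.messages = deque(maxlen=k)  # old messages fall off automatically

    def add(self, role, content):
        self.messages.append((role, content))

    def context(self):
        # Render the retained window as text to prepend to the next prompt.
        return "\n".join(f"{role}: {content}" for role, content in self.messages)

memory = WindowMemory(k=2)
memory.add("human", "Hi, I'm Dana.")
memory.add("ai", "Hello Dana!")
memory.add("human", "What's my name?")  # the oldest message drops out
```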

As opposed to chains, which hard-code sequences, agents use a language model as a reasoning engine to determine which actions to take and in which order.
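The shape of an agent, a reasoning step that picks a tool, then a tool call, can be sketched like this. The hard-coded `fake_reasoner` below is a stand-in for the language model that does the deciding in a real LangChain agent; the tools and names are invented for illustration.

```python
# Two toy tools the "agent" can choose between.
def calculator(expression):
    return str(eval(expression))  # demo only; never eval untrusted input

def word_count(text):
    return str(len(text.split()))

TOOLS = {"calculator": calculator, "word_count": word_count}

def fake_reasoner(question):
    """Decide which tool to use. An LLM would make this decision in a real agent."""
    if any(ch.isdigit() for ch in question):
        return "calculator", "2 + 2"
    return "word_count", question

def run_agent(question):
    tool_name, tool_input = fake_reasoner(question)  # reasoning step
    return TOOLS[tool_name](tool_input)              # action step

print(run_agent("What is 2 + 2?"))  # 4
```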

Callbacks allow you to hook into the various stages of your LLM application. This is useful for logging, monitoring, streaming, and other tasks.
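The pattern is a handler object whose methods fire at each stage of a run. The hook names below mirror LangChain's callback-handler style, but the plumbing here is toy code, not LangChain itself:

```python
class LoggingHandler:
    """Record an event at the start and end of each model call."""

    def __init__(self):
        self.events = []

    def on_llm_start(self, prompt):
        self.events.append(f"start: {prompt}")

    def on_llm_end(self, output):
        self.events.append(f"end: {output}")

def run_llm(prompt, handlers):
    for h in handlers:
        h.on_llm_start(prompt)
    output = prompt.upper()  # stand-in for a real model call
    for h in handlers:
        h.on_llm_end(output)
    return output

handler = LoggingHandler()
run_llm("hello", [handler])
```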

Debugging with LangSmith

LangSmith lets you trace and evaluate your LangChain language model applications and intelligent agents, so you can move from prototype to production. As of this writing, it is still a closed beta. You can view a walkthrough of LangSmith and read the LangSmith docs without joining the beta test.

LangChain use cases

Use cases for LangChain include Q&A over documents, analyzing structured data, interacting with APIs, code understanding, agent simulations, agents, autonomous (long-running) agents, chatbots, code writing, extraction, analyzing graph data, multi-modal outputs, self-checking, summarization, and tagging.

Some of these use cases have many examples, such as Q&A, which has about 17. Others have only one, such as web scraping.

LangChain integrations

There are roughly 163 LangChain integrations as of this writing. These include five callbacks, nine chat models, 115 document loaders, six document transformers, 54 LLMs, 11 ways of implementing memory (mostly with databases), 22 retrievers (mostly search methods), 31 text embedding models, 21 agent toolkits, 34 tools, and 42 vector stores. The integrations are also available grouped by provider.

LangChain essentially acts as a neutral hub for all of these capabilities.

Installing LangChain for Python and JavaScript

To install LangChain for Python, use pip or conda. The best practice is to install Python packages in virtual environments so that they don’t have version conflicts over dependencies.

I’ll show pip commands. For conda commands, consult the installation page and click on Conda.

The basic, minimal installation is

pip install langchain

For the record, that’s what I used. It does not include the modules for model providers, data stores, or other integrations. I plan to install whichever of those I need, when I need them.

To install LangChain and the common language models, use

pip install langchain[llms]

To install LangChain and all integrations, use

pip install langchain[all]

If you’re using zsh, which is the default shell on recent versions of macOS, then you’ll need to quote expressions with square brackets. Otherwise, without the quotes, the shell interprets square brackets as indicating arrays. For example:

pip install 'langchain[all]'

To install LangChain for JavaScript, use npm, Yarn, or pnpm, for example:

npm install -S langchain

You can use LangChain for JavaScript in Node.js, Cloudflare Workers, Vercel / Next.js (Browser, Serverless, and Edge functions), Supabase Edge Functions, web browsers, and Deno.

I won’t show you more about LangChain for JavaScript; I suggest that you consult the LangChain for JavaScript installation page to get started.

LangChain example

While there are hundreds of examples in the LangChain documentation, I only have room to show you one. This Python code comes from the end of the Quickstart, and demonstrates an LLMChain. This chain takes input variables, passes them to a prompt template to create a prompt, passes the prompt to an LLM (ChatOpenAI), and then passes the CSV output through an (optional) output parser to create a Python array of strings. 

from langchain.chat_models import ChatOpenAI
from langchain.prompts.chat import (
    ChatPromptTemplate,
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate,
)
from langchain.chains import LLMChain
from langchain.schema import BaseOutputParser

class CommaSeparatedListOutputParser(BaseOutputParser):
    """Parse the output of an LLM call to a comma-separated list."""
    
    def parse(self, text: str):
        """Parse the output of an LLM call."""
        return text.strip().split(", ")

template = """You are a helpful assistant who generates comma separated lists.
A user will pass in a category, and you should generate 5 objects in that category in a comma separated list.
ONLY return a comma separated list, and nothing more."""
system_message_prompt = SystemMessagePromptTemplate.from_template(template)
human_template = "{text}"
human_message_prompt = HumanMessagePromptTemplate.from_template(human_template)

chat_prompt = ChatPromptTemplate.from_messages([system_message_prompt, human_message_prompt])
chain = LLMChain(
    llm=ChatOpenAI(),
    prompt=chat_prompt,
    output_parser=CommaSeparatedListOutputParser()
)
chain.run("colors")
# >> ['red', 'blue', 'green', 'yellow', 'orange']

LangChain Expression Language (LCEL)

LangChain Expression Language is a declarative way to compose chains that gives you streaming, batch, and async support out of the box. LCEL works with all the same LangChain constructs you would otherwise wire together imperatively; it is essentially a high-level alternative to building chains by hand in Python or TypeScript/JavaScript.

There’s a LangChain Teacher you can run interactively to learn LCEL, although you’ll need to install LangChain for Python first. Note that I wasn’t able to run the teacher. It seems to have a version-dependent bug.

LCEL expressions use pipe characters (|) to connect components into chains. For example, a basic chain combines a prompt and a model:

chain = prompt | model

In context, you might have this Python program:

from langchain.prompts import ChatPromptTemplate
from langchain.chat_models import ChatOpenAI

model = ChatOpenAI()
prompt = ChatPromptTemplate.from_template("tell me a joke about {foo}")

chain = prompt | model

chain.invoke({"foo": "bears"})

The output (as given on the site) is:

    AIMessage(content='Why don\'t bears use cell phones? \n\nBecause they always get terrible "grizzly" reception!', additional_kwargs={}, example=False)
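How can a pipe character glue components together? Python lets a class overload the | operator via `__or__`. The sketch below mimics the LCEL style in plain Python; it is not LangChain's actual implementation, which adds streaming, batching, and async support on top of the same idea.

```python
class Runnable:
    """A callable step that can be composed with | into a chain."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # left | right: feed left's output into right.
        return Runnable(lambda value: other.invoke(self.invoke(value)))

# Toy stand-ins for a prompt template and a chat model.
prompt = Runnable(lambda d: f"tell me a joke about {d['foo']}")
model = Runnable(lambda p: f"(model output for: {p})")

chain = prompt | model
chain.invoke({"foo": "bears"})
```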

As you’ve seen, LangChain offers a powerful way to create generative AI applications powered by language models and data, connected into chains. I’ve shown you a few Python examples, and given you a link to the JavaScript examples. You can also program LangChain in R using a Python shim, as my InfoWorld colleague Sharon Machlis explains in Generative AI with LangChain, RStudio, and just enough Python. Another useful resource is the LangChain blog, which publishes a short article most weekdays.

Copyright © 2023 IDG Communications, Inc.