AI's exciting 'boring system integration' features

Last week Google released their extensions feature for their Gemini product, giving it greater access to their office suite. I commented on LinkedIn that this seems like a ‘boring but important’ step toward delivering on many of the AI hype claims.

This is why I suspect Google and Microsoft are feeling pretty smug about their longer-term prospects. They can pre-integrate their office suites into their AI products in the form of reusable tools. The feedback loop will be powerful: the more they integrate, the more data they’ll receive and the more edge cases they can handle. Ultimately the barrier to AI agents might just end up being another boring ‘last mile’ system integration problem.

OpenAI also quietly announced their Structured Outputs feature this month, which I’ve been playing with every day since it shipped on the 6th. It’s a developer-facing feature intended to solve a really common problem: the LLM can consistently give you the right answer but always in a slightly different format (e.g. JSON, YAML, Markdown, plain text).

Spin of the wheel, which output are we going to get?

Prompt: Extract product feedback from customer reviews, specify the product being reviewed, and categorize the sentiment as positive, negative, or neutral. Single result per output.

JSON?

    {
        "product_name": "acme widget",
        "customer_feedback": "Just wasted $150 on this piece of garbage. Was working fine then just stopped turning on one day. No response from customer service either. Save ur money and buy something else. Can't believe I fell for the hype smh",
        "sentiment": "negative"
    }

Markdown on Mondays?

    # Product Name
    acme widget

    # Customer Feedback 
    Just wasted $150 on this piece of garbage. Was working fine then just stopped
    turning on one day. No response from customer service either. Save ur money
    and buy something else. Can't believe I fell for the hype smh

    # Sentiment
    negative

Yolo YAML?

    - product_name: acme widget
      customer_feedback: 
        Just wasted $150 on this piece of garbage. Was working fine then just stopped
        turning on one day. No response from customer service either. Save ur money
        and buy something else. Can't believe I fell for the hype smh
      sentiment: negative

Or a pure “vibe” response

    The acme widget
    Broken dreams, wasted cash,
    Silent widget, customer's crash.
    Hype's allure fades,
    As frustration cascades.

Structured Solutions

The structured outputs feature lets developers specify output schemas describing exactly how they would like the LLM to respond. If in the past we were trying to parse freeform essays, this feature turns the responses into ‘dot the i, cross the t’ forms that the LLM has to fill out exactly as instructed.
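To make the 'form' idea concrete, here is a minimal sketch of a schema for the product review prompt above. The `response_format` wrapper follows the shape described in OpenAI's Structured Outputs docs as I understand them. The schema name `product_feedback` and the helper `matches_form` are hypothetical names picked for illustration, and no API call is made.

```python
import json

# Hypothetical schema for the product feedback example; the field names
# mirror the JSON output shown earlier.
PRODUCT_FEEDBACK_SCHEMA = {
    "type": "object",
    "properties": {
        "product_name": {"type": "string"},
        "customer_feedback": {"type": "string"},
        "sentiment": {"type": "string", "enum": ["positive", "negative", "neutral"]},
    },
    # Strict mode expects every property to be required and no extras allowed.
    "required": ["product_name", "customer_feedback", "sentiment"],
    "additionalProperties": False,
}

# Assumed shape of the response_format argument for a chat completion request.
RESPONSE_FORMAT = {
    "type": "json_schema",
    "json_schema": {
        "name": "product_feedback",
        "strict": True,
        "schema": PRODUCT_FEEDBACK_SCHEMA,
    },
}


def matches_form(reply: str) -> bool:
    """Minimal local check that a reply fills out the form exactly."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict):
        return False
    # Exactly the required keys, nothing missing and nothing extra.
    if set(data) != set(PRODUCT_FEEDBACK_SCHEMA["required"]):
        return False
    if not all(isinstance(value, str) for value in data.values()):
        return False
    allowed = PRODUCT_FEEDBACK_SCHEMA["properties"]["sentiment"]["enum"]
    return data["sentiment"] in allowed
```

The local check is only there to show what 'filled out exactly' means; with the feature switched on, the point is that you should no longer need to write this kind of defensive parsing yourself.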

SO reliability

Structured outputs make it much easier to connect prompts together. They turn every prompt into a clever little Lego brick that is much easier to ‘snap together’ into increasingly complex prompt chains.
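A rough sketch of what 'snapping together' could look like in code. Everything here is an assumption for illustration: the `Feedback` and `Ticket` shapes, the step names, and the `chain` helper are all hypothetical. The two steps are hand-written stand-ins for schema-constrained LLM calls so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Feedback:
    product_name: str
    sentiment: str


@dataclass
class Ticket:
    product_name: str
    priority: str


# Each "brick" has a fixed input and output shape. In a real chain each one
# would wrap an LLM call constrained by a schema; these are simple stand-ins.
def extract_feedback(review: str) -> Feedback:
    sentiment = "negative" if "wasted" in review.lower() else "neutral"
    return Feedback(product_name="acme widget", sentiment=sentiment)


def triage(feedback: Feedback) -> Ticket:
    priority = "high" if feedback.sentiment == "negative" else "low"
    return Ticket(product_name=feedback.product_name, priority=priority)


def chain(*steps: Callable) -> Callable:
    """Snap steps together: the output of one becomes the input of the next."""
    def run(value):
        for step in steps:
            value = step(value)
        return value
    return run


pipeline = chain(extract_feedback, triage)
ticket = pipeline("Just wasted $150 on this piece of garbage.")
print(ticket)
```

Because every step promises a known shape, the next step can rely on attributes like `feedback.sentiment` existing, which is what makes longer chains practical.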

In fact, the first use case I thought of was to use them to build the schemas themselves. I created a simple POC app that takes any arbitrary text file (e.g. CV, product feedback, blog post, meeting transcript) and an instruction prompt (e.g. “extract CV details”), and generates a structured output schema that can be used to extract that information.
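One way a schema generator like this might work, and this is an assumption rather than a description of the POC's internals: ask the model for a flat list of field descriptions, then convert that list into a JSON Schema locally. The `FieldSpec` shape and `fields_to_schema` helper below are hypothetical, and the `proposed` list is hand-written in place of a model reply.

```python
from typing import TypedDict


class FieldSpec(TypedDict):
    name: str
    type: str  # one of: "string", "number", "integer", "boolean"
    description: str


ALLOWED_TYPES = {"string", "number", "integer", "boolean"}


def fields_to_schema(fields: list[FieldSpec]) -> dict:
    """Turn a flat list of field descriptions into a strict JSON Schema."""
    properties = {}
    for field in fields:
        if field["type"] not in ALLOWED_TYPES:
            raise ValueError(f"unsupported type: {field['type']}")
        properties[field["name"]] = {
            "type": field["type"],
            "description": field["description"],
        }
    return {
        "type": "object",
        "properties": properties,
        # Every field is required and extras are rejected.
        "required": list(properties),
        "additionalProperties": False,
    }


# Stand-in for what a model might propose for "extract CV details".
proposed = [
    {"name": "full_name", "type": "string", "description": "Candidate's name"},
    {"name": "years_experience", "type": "integer", "description": "Total years worked"},
]
schema = fields_to_schema(proposed)
```

Keeping the model's job to a flat field list is a deliberate simplification: it cannot express nested objects or arrays, but it is easy to validate before you trust the generated schema.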

TL;DR

You can use AI to extract anything

Recruiting

Extract CV Details…

CV Extract

Customer Insights

Extract product review sentiment…

Review Extract

Content Management

Extract CMS fields for blog post…

Content Extract

Example of Structured Output

The first place I used structured outputs was to fast-track project creation. From a simple “Extract CV details” input it was easy to expand that to a title, description and suggested prompt for the user, without needing to worry about parsing varying outputs from the LLM.
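As a sketch of why this removes the parsing headache: once the reply is guaranteed to have a fixed set of keys, loading it is a few lines. The `ProjectSetup` class and its field names are my own guesses at what such a shape could look like, and the `reply` is hand-written in place of a model response.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class ProjectSetup:
    title: str
    description: str
    suggested_prompt: str

    @classmethod
    def from_reply(cls, reply: str) -> "ProjectSetup":
        """Load a JSON reply, insisting on exactly the expected keys."""
        data = json.loads(reply)
        expected = {f.name for f in fields(cls)}
        if not isinstance(data, dict) or set(data) != expected:
            raise ValueError(f"expected an object with keys {sorted(expected)}")
        return cls(**data)


# Hand-written stand-in for a model reply to the goal "Extract CV details".
reply = json.dumps({
    "title": "CV Details",
    "description": "Pulls contact details and work history out of a CV.",
    "suggested_prompt": "Extract the candidate's name, email and past roles.",
})
project = ProjectSetup.from_reply(reply)
```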

SO Project Setup

More than just parsing

If you’re looking at all this and thinking “OK, so it’s just parsing data” then I think you’re missing why I find it so exciting. Yes, it’s parsing data, but it’s also applying some degree of intelligence in how it does it.

Take the example of extracting ‘action items’ from a meeting transcript. Unlike the LinkedIn profile in a CV, these aren’t clearly spelled out in the text. Here the LLM is applying some intelligence to pick out places where a specific person in the meeting has committed to delivering a particular outcome.

Ron Swanson 10:13 I’ve been talking to some potential clients and they’re asking about, uh… What was it again? Oh right, if our system can play nice with their project management tools. Are we… Is that something we’re doing?

Leslie Knope 10:28 Oh, yeah, yeah. That’s definitely on our roadmap, Ron. We’re planning to start with, um, with Asana and Trello I think? April, do you have any idea on the timeline for that?

April Ludgate 10:40 Uh, yeah, we’re aiming to have the first integration ready for beta in about… I think it was three weeks? We’ll need some help from Leslie on the design side to make sure it’s not a total UX disaster.

Action Item	Responsible Person	Deadline
Integrate with Asana and Trello	April Ludgate	2023-10-30

Using this POC, the only setup needed to create this was typing “Extract meeting action items” and running it against the meeting transcript. This isn’t a dedicated “Meeting action item” feature that I’ve created specifically to handle meeting transcripts. It’s just taking that initial four-word goal and turning it into an output that could do anything from populating a spreadsheet to calling internal systems.
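For the spreadsheet case, a minimal sketch of the last step might look like this. The key names (`action_item`, `responsible_person`, `deadline`) and the `to_csv` helper are assumptions for illustration; the single row is copied from the table above.

```python
import csv
import io

# Action items as they might come back from a schema-constrained extraction.
action_items = [
    {
        "action_item": "Integrate with Asana and Trello",
        "responsible_person": "April Ludgate",
        "deadline": "2023-10-30",
    },
]


def to_csv(rows: list[dict]) -> str:
    """Write rows to CSV text, using the keys of the first row as headers."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(to_csv(action_items))
```

Since every row has the same keys, the same list could just as easily be posted to an internal API instead of being written to a file.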

Meeting Extract

OK Gimmie.

For Developers

structured-output-schema-generator by Practical:AI

Schema generator for OpenAI’s APIs (OpenAI Docs)

1) Clone repo

git clone https://github.com/steve-practicalai/structured-output-schema-generator.git

2) Set up your Python .venv

3) Install Python packages

pip install -r requirements.txt

4) Copy .env.example to .env and add your OPENAI_API_KEY

5) Run the local Streamlit app

streamlit run main.py

The final bundling phase of tech?

For such a rough POC it’s staggering how powerful it is. Not the functionality itself (after all, every transcription tool already does meeting action items and countless SaaS products exist to parse CVs) but rather the ability to create that functionality from scratch with just a single prompt.

Obviously we’re still a long way from something like this rough POC replacing entire SaaS products, but it’s clear that those who are profitable today doing little more than input > parse > process with a web UI on top are going to start feeling the squeeze. The real power of these Lego bricks will come from end users (not big tech) chaining them together and making the agents they create from them accessible to everyone in an organisation.

SaaS Smoosh

Few will mourn the SaaS industry when this happens to it

AI is the Everything App

Snapping together intelligence with structured data