Streaming AI responses and the incomplete JSON problem
Modern LLM providers can stream their responses. This is great for user experience — instead of a loading spinner, users see the response being generated in real time.
They can also call external functions (also called "tools"): search your database, run calculations, send emails, whatever you give them. When the AI decides to use a tool, it returns the function arguments as JSON, for example {"query": "sales reports Q4"}.
Except there's a catch. When the tool arguments stream in character by character, nearly every prefix you hold is ... incomplete JSON:
{"user
{"user":
{"user":"ali
{"user":"alice","act
{"user":"alice","action":"se
{"user":"alice","action":"search","qu
{"user":"alice","action":"search","query":"sal
The obvious solution is to wait for the complete JSON and parse it at the end. That's what we did at first. The AI would stream its response, the JSON would accumulate in a buffer, and the user would see a spinner (for seconds, sometimes minutes). Then the complete, fully formed tool call would appear.
It worked. But it felt broken. Users would stare at a loading spinner with no feedback and no sense of progress. Meanwhile, we had streaming data arriving in real time that we couldn't use.
The problem was simple. We couldn't parse incomplete JSON, so we had to wait for the complete response before showing users anything. This is what we wanted:
- Preview the tool arguments as they're being constructed.
- Show which fields are being filled in, letter by letter.
- Give users real-time feedback so they know the AI is actually working.
- Make the whole experience feel fluid and responsive.
All of this requires valid JSON. Not eventually valid JSON. Valid JSON right now, for every single chunk that arrives.
The 'obvious' solution (that doesn't work)
The next idea is to complete the incomplete JSON programmatically: add closing quotes, close braces, and fill in nulls for missing values.
This is solvable. We looked at json-repair (fixes broken JSON) and json-stream (handles huge documents). But json-stream only emits values once strings are complete, not for partials. We needed a parser that accepts progressively longer strings and returns complete, parseable JSON on each call.
We found that json-repair came close. It worked great for short responses, staying imperceptible up to about 3.4 KB. Then we noticed something weird. When testing with longer tool arguments (like when the AI was generating a document with multiple paragraphs), the UI started out smooth, but got progressively more janky. Every chunk started causing visible stutters around the 5 KB mark. And near the end, the browser was visibly lagging on each update.
We profiled it. Here's what we found:
Every time a new chunk arrived, existing libraries would reparse the entire JSON string from the beginning. They'd start at character zero, read through all the old data you already processed, then finally get to the new stuff.
Plotted as per-chunk latency, the difference is dramatic: the red line shows what happens when you reparse from scratch each time, while the green line shows processing only the new data.
This is O(n²) behavior, where n is the total accumulated length. For a 12 KB tool argument arriving in five-character chunks (typical of OpenAI's streaming API), you're processing roughly 15 million characters in total (5 + 10 + 15 + ... + 12,000, a sum that grows with the square of the length) when you should only be processing 12,000.
It's the classic "read the whole file again every time you add a line" antipattern, but for JSON parsing.
Why does everyone do it this way?
Most JSON completion libraries aren't designed for streaming. They're designed for one-off repairs of truncated JSON (like from log files or crashed processes).
For a single call, these libraries are O(n) where n is the string length. That's perfectly reasonable. You call it once with your broken JSON, it parses through once, and you're done. But streaming AI means repeated calls — hundreds, potentially — for a long response. That's when O(n²) becomes painful.
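In code, the naive pattern looks like this. (A sketch: repair stands in for any one-off fixer such as json-repair, and on_stream_chunk for your streaming client's callback.)

buffer = ""
on_stream_chunk do |chunk|
  buffer += chunk
  # repair rescans the whole buffer on every chunk:
  # O(buffer.length) per call, O(n^2) across the stream.
  update_tool_call_ui(JSON.parse(repair(buffer)))
end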
The right way: Stateful incremental parsing
The solution? Maintain parsing state between calls. When a new chunk arrives, pick up exactly where you left off and process only the new characters.
To do that, the parser tracks:
- Where we stopped parsing (so we can resume)
- What context we're in (object, array, nested depth)
- Any incomplete tokens we're building (strings, numbers)
- Escape sequence state for Unicode and special characters
Now, when you call complete() repeatedly with progressively longer strings:
completer = JsonCompleter.new
# Process first chunk
completer.complete('{"users": [{"name": "')
# => '{"users": [{"name": ""}]}'
# State: last_index=21, in_string=true, context_stack=['{', '[', '{']
# Process with more data (only processes the new part!)
completer.complete('{"users": [{"name": "Alice"}')
# => '{"users": [{"name": "Alice"}]}'
# State: last_index=28, in_string=false, context_stack=['{', '[']
# Complete JSON
completer.complete('{"users": [{"name": "Alice"}, {"name": "Bob"}]}')
# => '{"users": [{"name": "Alice"}, {"name": "Bob"}]}'
Each call processes only the delta. On the second call, we know we already processed up to index 21, so we skip straight there and handle just characters 21-28. This is true O(n) behavior, where n is the size of the new data only.
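To make the mechanics concrete, here's a minimal sketch of such a completer. To be clear, this is not the gem's implementation: it covers the examples in this article (open strings, keys awaiting values, unclosed containers) and deliberately punts on escapes, numbers, literals, and trailing commas.

class MiniCompleter
  def initialize
    @index = 0             # characters of the input already scanned
    @stack = []            # open containers, innermost last (:object / :array)
    @in_string = false     # currently inside a string literal?
    @string_is_key = false # is that string an object key?
    @expect = :value       # :key | :colon | :value | :done
  end

  # Expects the full accumulated text; only json[@index..] is scanned.
  def complete(json)
    json[@index..].each_char { |c| step(c) }
    @index = json.length
    json + suffix
  end

  private

  def step(c)
    if @in_string
      return unless c == '"' # (a real parser must count backslashes here)
      @in_string = false
      @expect = @string_is_key ? :colon : :done
      return
    end
    case c
    when '"'
      @in_string = true
      @string_is_key = (@expect == :key)
    when '{' then @stack.push(:object); @expect = :key
    when '[' then @stack.push(:array);  @expect = :value
    when '}', ']' then @stack.pop; @expect = :done
    when ':' then @expect = :value
    when ',' then @expect = (@stack.last == :object ? :key : :value)
    when /\S/ then @expect = :done # bare number/literal; assume it completed
    end
  end

  # Build the closing suffix WITHOUT mutating state, so the next
  # (longer) input resumes from the saved position.
  def suffix
    out = +""
    expect = @expect
    if @in_string
      out << '"'                       # close the open string
      expect = @string_is_key ? :colon : :done
    end
    out << ":null" if expect == :colon # a key with no value yet
    out << "null" if expect == :value && @stack.last == :object # dangling ':'
    @stack.reverse_each { |k| out << (k == :object ? "}" : "]") }
    out
  end
end

Even this toy version reproduces the completions above, and the scan-only-the-delta structure is the part that matters. The cases it punts on are exactly where a real implementation earns its keep.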
The devil in the details
There are a lot of edge cases to handle with JSON. Let's explore two of the trickiest.
Context matters
The same string segment "foo" needs to be completed differently depending on context:
# In an object, might be a key needing a value
'{"foo"' # => '{"foo":null}'
# In an array, it's a complete value
'["foo"' # => '["foo"]'
# After a comma in an object, also a key
'{"bar":1,"foo"' # => '{"bar":1,"foo":null}'
This is why we track the context stack. We know we're inside an object, so we also know that a string followed by end-of-input needs a colon and a value.
Incomplete strings
When a chunk ends in the middle of a string, you need to save the string buffer and keep appending. But what if that chunk ended with a backslash?
'"Hello\\'
Is that an incomplete escape sequence? Or a complete escaped backslash followed by an incomplete one? You need to count trailing backslashes to know. And what about Unicode escapes?
'"Smile \u26' # Incomplete \u2605 (black star)
We need to track that we're mid-Unicode-escape and how many hex digits we've collected. When the next chunk arrives with "05", we append it to the hex buffer, complete the escape, and continue.
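As an illustration, here are naive versions of those two checks, written as re-scans over the buffered string. (A real parser tracks this as explicit state instead of re-scanning, and the Unicode check below doesn't verify that the backslash starting the escape isn't itself escaped.)

# An odd number of trailing backslashes means the last one
# starts an unfinished escape sequence.
def open_escape?(buffer)
  buffer[/\\+\z/].to_s.length.odd?
end

open_escape?('Hello\\')   # => true  (one backslash: escape still open)
open_escape?('Hello\\\\') # => false (two: a complete escaped backslash)

# A dangling Unicode escape: \u followed by fewer than four hex digits
# at the end of the buffer.
def open_unicode_escape?(buffer)
  buffer.match?(/\\u\h{0,3}\z/)
end

open_unicode_escape?('Smile \u26') # => true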
There are more edge cases, such as incomplete numbers (2. and 2e-) and deeply nested structures, but the principle is the same: maintain enough state to resume parsing wherever you left off.
How we use it in production
The pattern is simple. As streaming chunks arrive, complete them, parse them, and update the UI:
completer = JsonCompleter.new
accumulated_json = ""
# As each chunk arrives from the streaming API
on_stream_chunk do |chunk|
  accumulated_json += chunk
  # Complete the incomplete JSON
  complete_json = completer.complete(accumulated_json)
  # Now we can parse it!
  arguments = JSON.parse(complete_json)
  # Update the UI with current state
  update_tool_call_ui(arguments)
end
Here's what this looks like in practice as the AI streams search tool arguments:
# Chunk 1:
complete_json = completer.complete('{"query": "sal')
# => '{"query": "sal"}'
arguments = JSON.parse(complete_json)
# UI shows: { "query" => "sal" }
# Chunk 2:
complete_json = completer.complete('{"query": "sales rep')
# => '{"query": "sales rep"}'
arguments = JSON.parse(complete_json)
# UI shows: { "query" => "sales rep" }
# Chunk 3:
complete_json = completer.complete('{"query": "sales reports Q4"}')
# => '{"query": "sales reports Q4"}'
arguments = JSON.parse(complete_json)
# UI shows: { "query" => "sales reports Q4" }
Before: The user sends a message, sees a spinner for five to ten seconds, and the complete tool call suddenly appears.
After: The user sees the tool call UI appear immediately and watches arguments fill in chunk by chunk.
The difference is between feeling like the system is frozen and watching the AI work in real time.
Performance
Let's talk numbers. We benchmarked both approaches with this article's content (12 KB) streamed in five-character chunks — the typical chunk size from OpenAI's streaming API. That's 2,406 chunks total.
Here's what most benchmarks miss. The O(n²) approach doesn't just make things slower on average. It gets progressively worse as the response grows, crossing distinct quality zones:
- Chunks 1-688: Imperceptible (<1 millisecond) - Feels instant, users are happy
- Chunk 689: Enters the noticeable zone (3.2 milliseconds at 3.4 KB) - First hint of lag
- Chunk 954: Enters the janky zone (5.4 milliseconds at 4.8 KB) - Visible stuttering on every update
- Chunk 1,514: Enters the broken zone (16.2 milliseconds at 7.6 KB) - Only 63% through the file, already unusable
By the final chunks, each update takes 19-20 milliseconds. What should be a real-time stream feels like a slideshow.
The O(n) approach stays in the imperceptible zone for all 2,406 chunks:
- 50th percentile: 0.02 milliseconds
- 75th percentile: 0.02 milliseconds
- 95th percentile: 0.03 milliseconds
- 99th percentile: 0.03 milliseconds
It starts fast. It stays fast. First chunk, last chunk — doesn't matter. There's consistent, imperceptible latency throughout the entire response.
Result: O(n²) takes 16.7 seconds total. O(n) takes 43 milliseconds total. That's 388 times faster.
For longer tool calls (20 KB or more), O(n²) approaches can take 30+ seconds, with the final chunks causing 40-50 milliseconds of lag each. JsonCompleter stays under 100 milliseconds total, with submillisecond per-chunk latency from start to finish.
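If you want to reproduce the shape of these numbers, a rough harness looks like the following. (A sketch: the file name is a placeholder, and absolute timings will vary by machine.)

require 'benchmark'
require 'json'
require 'json_completer'

text = File.read('article.json') # any ~12 KB JSON document
completer = JsonCompleter.new
buffer = +''
timings_ms = text.chars.each_slice(5).map do |chunk|
  buffer << chunk.join
  Benchmark.realtime { JSON.parse(completer.complete(buffer)) } * 1000.0
end
p timings_ms.minmax # per-chunk latency spread
p timings_ms.sum    # total time across the stream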
Open source
As streaming becomes the expected UX for AI assistants, more applications will need stateful, incremental parsing for incomplete JSON. We extracted this into a gem.
Try it yourself
gem install json_completer
require 'json_completer'
# One-off completion
JsonCompleter.complete('{"name": "John", "age":')
# => '{"name": "John", "age":null}'
# Incremental streaming
completer = JsonCompleter.new
completer.complete('{"status":"') # => '{"status":""}'
completer.complete('{"status":"ok"}') # => '{"status":"ok"}'
Streaming is the future of AI interfaces. Users expect to see responses appear in real time — not after a loading spinner. For tool calls, that means parsing incomplete JSON hundreds of times per response. The difference between O(n²) and O(n) isn't just a performance optimization. It's the difference between a UI that feels broken and one that feels magical.
JsonCompleter is available on GitHub and as a Ruby gem (MIT licensed).
We're happy, fully remote, and hiring — take a look at our open engineering roles.