Data Formats · April 18, 2024 · 8 min read

JSONL and NDJSON: The Complete Developer Guide

What JSON Lines is, when to use it over a JSON array, how to validate it, and why it has become the format of choice for ML training data and log pipelines.

What is JSONL?

JSONL (JSON Lines), also known as NDJSON (Newline Delimited JSON), is a text format where each line is a complete, valid JSON value. The format has three simple rules:

  • Each line is a valid JSON value (object, array, string, number, etc.)
  • Lines are separated by \n (Unix newlines recommended)
  • Empty lines are allowed and should be ignored
{"id":1,"name":"Alice","role":"admin"}
{"id":2,"name":"Bob","role":"user"}
{"id":3,"name":"Charlie","role":"user"}

Notice there are no surrounding brackets and no commas between lines — every line stands alone.

JSONL vs JSON Array

A JSON array of objects ([{...}, {...}]) is the obvious alternative. JSONL wins in specific scenarios:

Streaming

With a JSON array, you must receive the entire response before parsing begins — the parser needs to see the closing ]. With JSONL, you can parse and process each line as it arrives. This is why streaming LLM APIs such as OpenAI's and Anthropic's deliver their responses as newline-delimited JSON events (typically over server-sent events) rather than as a single JSON array.
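The consumer side can be sketched in a few lines: buffer incoming chunks, split on newlines, and parse each complete line as soon as it arrives. (The createJsonlParser name here is illustrative, not from any particular library.)

```javascript
// Incremental JSONL parser: feed it chunks of any size, in any split,
// and it invokes onRecord exactly once per complete line.
function createJsonlParser(onRecord) {
  let buffer = '';
  return function feed(chunk) {
    buffer += chunk;
    const lines = buffer.split('\n');
    buffer = lines.pop(); // the last element may be a partial line — keep it
    for (const line of lines) {
      if (line.trim()) onRecord(JSON.parse(line));
    }
  };
}

// Network chunks can split a record anywhere, even mid-value:
const records = [];
const feed = createJsonlParser(r => records.push(r));
feed('{"id":1}\n{"id"');
feed(':2}\n');
```

Because each record is terminated by a newline, the parser never needs to understand JSON nesting to find record boundaries — splitting on \n is enough.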

Large files

Appending to a JSONL file is a single write call. Appending to a JSON array requires reading the file, removing the trailing ], appending a comma and new object, then re-adding the ]. For log files that grow continuously, JSONL wins decisively.

Partial processing

You can grep, head, tail, and wc -l a JSONL file with standard Unix tools. A JSON array requires a proper parser for any meaningful operation.

When to Use a JSON Array Instead

  • REST API responses — clients expect standard JSON
  • Small datasets where streaming is irrelevant
  • When the data has a natural top-level structure beyond a flat list
  • Browser-side data where you need JSON.parse() to work directly

Common Uses of JSONL

Machine Learning Training Data

JSONL is the de facto format for LLM fine-tuning datasets. OpenAI's fine-tuning API, Anthropic's training pipelines, and Hugging Face datasets all use JSONL:

{"messages":[{"role":"user","content":"What is 2+2?"},{"role":"assistant","content":"4"}]}
{"messages":[{"role":"user","content":"Translate 'hello' to French."},{"role":"assistant","content":"Bonjour"}]}

Application Logs

Structured logging tools (Pino, Winston, Bunyan) output JSONL by default. Each log entry is a self-contained JSON object that log aggregators (Datadog, Loki, CloudWatch) can parse without preprocessing.
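The core idea needs no library at all: a structured log entry is just JSON.stringify plus a trailing newline. The logEntry helper below is a hand-rolled sketch of the shape such tools emit, not Pino's actual API:

```javascript
// Build one JSONL log line per entry: a flat JSON object with a level,
// a timestamp, a message, and arbitrary structured fields.
function logEntry(level, msg, fields = {}) {
  return JSON.stringify({
    level,
    time: new Date().toISOString(),
    msg,
    ...fields,
  });
}

const line = logEntry('info', 'user logged in', { userId: 42 });
process.stdout.write(line + '\n');
```

Because every entry is a self-contained JSON object on its own line, an aggregator can parse each one independently — a truncated or corrupt entry never breaks the rest of the log.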

Database Exports

MongoDB's mongoexport outputs JSONL. ClickHouse, BigQuery, and DynamoDB all support JSONL as an import/export format because it maps cleanly to a sequence of rows.

Validating JSONL

Validation is simple — parse each non-empty line with JSON.parse() and report errors by line number:

function validateJsonl(text) {
  // Split on \n, tolerating Windows-style \r\n line endings.
  return text.split(/\r?\n/).map((line, i) => {
    if (!line.trim()) return { line: i + 1, ok: true };
    try {
      JSON.parse(line);
      return { line: i + 1, ok: true };
    } catch (e) {
      return { line: i + 1, ok: false, error: e.message };
    }
  });
}

Converting JSONL to a JSON Array

const jsonArray = text
  .split(/\r?\n/)
  .filter(line => line.trim())
  .map(line => JSON.parse(line));

// Then stringify for output:
JSON.stringify(jsonArray, null, 2);
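The reverse direction is just as short — serialize each element on its own line and join with newlines (the toJsonl name is illustrative):

```javascript
// Convert a JSON array back to JSONL: one compact object per line.
function toJsonl(array) {
  return array.map(item => JSON.stringify(item)).join('\n');
}

const jsonl = toJsonl([{ id: 1 }, { id: 2 }]);
```

Note that JSON.stringify is called without an indent argument: pretty-printed JSON contains internal newlines, which would break the one-record-per-line invariant.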

File Extension

Both .jsonl and .ndjson are used. .jsonl is more common in ML tooling; .ndjson is preferred in some API and logging contexts. The formats are identical — the difference is naming only.

Validate your JSONL instantly

Paste JSONL data to validate each line and convert valid records to a clean JSON array — all in your browser.

Open JSONL Validator →