How to parse JSON Lines (JSONL) with C#

For sure, you already know JSON: it's one of the most commonly used formats for sharing data as text.

Did you know that there are different flavors of JSON? One of them is JSONL: it represents a JSON document where the items are on separate lines instead of being in an array of items.

It's quite a rare format to find, so it can be tricky to understand how it works and how to parse it. In this article, we will learn how to parse a JSONL file with C#.

Meet JSONL

As explained in the JSON Lines documentation, a JSONL file is a file composed of different items separated by a \n character.

So, instead of having

[
    { "name": "Davide" },
    { "name": "Emma" }
]

you have a list of items without an array grouping them:

{ "name": "Davide" }
{ "name": "Emma" }

I must admit that I had never heard of this format until a few months ago. Or, even better, I had already used JSONL files without knowing it: JSONL is a common format for logs, where each entry is appended to a file in a continuous stream.

Also, JSONL has some characteristics:

  • Each item is a valid JSON item
  • Each line is separated by a \n character (or by \r\n, but the \r is ignored)
  • It is encoded using UTF-8
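The rules above can be sketched as a small check. This is only an illustration, not something taken from the JSON Lines documentation: the JsonLinesCheck class and the FindInvalidLines method are hypothetical names, and the sketch uses System.Text.Json's JsonDocument.Parse to decide whether a line is valid JSON.

```csharp
using System.Collections.Generic;
using System.Text.Json;

public static class JsonLinesCheck
{
    // Hypothetical helper: returns the 1-based numbers of the lines
    // that are not valid JSON values.
    public static List<int> FindInvalidLines(string content)
    {
        var invalid = new List<int>();
        string[] lines = content.Split('\n');

        for (int i = 0; i < lines.Length; i++)
        {
            // Lines may end with \r\n: drop the \r before parsing.
            string line = lines[i].TrimEnd('\r');

            // A trailing newline leaves one empty chunk at the end: skip it.
            if (line.Length == 0 && i == lines.Length - 1)
            {
                continue;
            }

            try
            {
                using JsonDocument doc = JsonDocument.Parse(line);
            }
            catch (JsonException)
            {
                invalid.Add(i + 1);
            }
        }

        return invalid;
    }
}
```

In this sketch, an empty line in the middle of the content is reported as invalid, because an empty string is not a JSON value.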

So, now, it's time to parse it!

Parsing the file

Say you’re creating a video game, and you want to read all the items your character has found:

class Item
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Category { get; set; }
}

The list of items can be stored in a JSONL file as follows:

{ "id": 1, "name": "dynamite", "category": "weapon" }
{ "id": 2, "name": "ham", "category": "food" }
{ "id": 3, "name": "nail", "category": "tool" }

Now, all we have to do is read the file and parse it.

Assuming that we have read the content from a file and stored it in a string called content, we can use Newtonsoft to parse those lines.

As usual, let's see how to parse the file, and then we'll dive deep into what's going on. (Note: the following snippet comes from this question on Stack Overflow)

List<Item> items = new List<Item>();

var jsonReader = new JsonTextReader(new StringReader(content))
{
    SupportMultipleContent = true // This is important!
};

var jsonSerializer = new JsonSerializer();
while (jsonReader.Read())
{
    Item item = jsonSerializer.Deserialize<Item>(jsonReader);
    items.Add(item);
}

return items;

Let’s unpack it:

var jsonReader = new JsonTextReader(new StringReader(content))
{
    SupportMultipleContent = true
};

The first thing to do is create an instance of JsonTextReader, a class from the Newtonsoft.Json namespace. The constructor accepts a TextReader instance or any derived class, so we can use a StringReader, which reads from a specified string.

The key part of this snippet (and, somehow, of the whole article) is the SupportMultipleContent property: when set to true, it allows the JsonTextReader to keep reading after the first item, so the content can be made of several JSON items, one per line.

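Its definition, in fact, says that:

//
// Summary:
//     Gets or sets a value indicating whether multiple pieces of JSON content can be
//     read from a continuous stream without erroring.
//
// Value:
//     true to support reading multiple pieces of JSON content; otherwise false. The
//     default is false.
public bool SupportMultipleContent { get; set; }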

Finally, we can read the content:

var jsonSerializer = new JsonSerializer();
while (jsonReader.Read())
{
    Item item = jsonSerializer.Deserialize<Item>(jsonReader);
    items.Add(item);
}

Here we create a new JsonSerializer (again, from Newtonsoft) and use it to read one item at a time.

The while (jsonReader.Read()) loop allows us to read the stream until the end. And, to parse each item found in the stream, we use jsonSerializer.Deserialize<Item>(jsonReader);.

The Deserialize method is smart enough to parse every item even without a , symbol separating them, because we have set SupportMultipleContent to true.

Once we have the Item object, we can do whatever we want, like adding it to a list.
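Since the JsonTextReader constructor accepts any TextReader, the same loop can also read directly from a file instead of a string. The following is a minimal sketch under that assumption: the JsonLinesStreaming class, the ReadItems method, and the items.jsonl file name are hypothetical, while JsonTextReader, SupportMultipleContent, and JsonSerializer are the same Newtonsoft types used above.

```csharp
using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;

public static class JsonLinesStreaming
{
    // Hypothetical helper: lazily yields one item at a time from any TextReader,
    // so the whole file does not have to be loaded into a string first.
    public static IEnumerable<T> ReadItems<T>(TextReader textReader)
    {
        // Disposing the JsonTextReader also closes the underlying reader
        // (CloseInput is true by default).
        using var jsonReader = new JsonTextReader(textReader)
        {
            SupportMultipleContent = true
        };

        var jsonSerializer = new JsonSerializer();
        while (jsonReader.Read())
        {
            yield return jsonSerializer.Deserialize<T>(jsonReader);
        }
    }
}

// Possible usage, assuming a file named items.jsonl exists:
//
// foreach (Item item in JsonLinesStreaming.ReadItems<Item>(File.OpenText("items.jsonl")))
// {
//     Console.WriteLine(item.Name);
// }
```

Returning an IEnumerable keeps only one item in memory at a time, which can matter for long log files; call ToList() on the result if you need the full list.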

Further readings

As we have learned, there are different flavors of JSON. You can read an overview of them on Wikipedia.

🔗 JSON Lines introduction | Wikipedia

Of course, the best place to learn more about this format is its official documentation.

🔗 JSON Lines documentation | Jsonlines

This article exists thanks to the question Amran Kadir asked on Stack Overflow and, of course, to Yuval Yitzhakov's answer.

🔗 Line delimited JSON serializing and de-serializing | Stack Overflow

Since we've used Newtonsoft (also known as Json.NET), you might want to have a look at its website.

🔗 SupportMultipleContent property | Newtonsoft

Finally, the repository used for this article.

🔗 JsonLinesReader repository | GitHub

Summary

Maybe you’re thinking:

Why did Davide write an article about an answer on Stack Overflow? I could have just read the same information there!

Well, if you were only interested in the main section, you would be right!

But this article exists for two main reasons.

First, I wanted to highlight that JSON is not always the best choice for everything: it always depends on what we need. For continuous streams of items, JSONL is a good (if not the best) choice. Don't choose the most used format: choose the one that best fits your needs!

Second, I wanted to remark that we should not be too attached to a specific library: I generally prefer using native stuff, so, for reading JSON files, my first choice is System.Text.Json. But it is not always the best choice. Yes, we could write some complex workaround (like the second answer on Stack Overflow), but... is it worth it? Sometimes it is better to use another library, even if just for one specific task. So, you could use System.Text.Json for the whole project except for the part where you need to read a JSONL file.
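For comparison, here is a minimal sketch of a System.Text.Json-only approach. It is not the Stack Overflow answer mentioned above, but a simpler line-by-line variant that assumes every item sits on exactly one line, as the JSONL format requires. The JsonLinesWithSystemTextJson class and the Parse method are hypothetical names.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text.Json;

public static class JsonLinesWithSystemTextJson
{
    // The sample file uses lowercase names ("id", "name"), while the C# properties
    // are PascalCase, so matching must be case-insensitive.
    private static readonly JsonSerializerOptions Options = new JsonSerializerOptions
    {
        PropertyNameCaseInsensitive = true
    };

    // Hypothetical helper: deserializes each non-empty line as one item.
    public static List<T> Parse<T>(string content)
    {
        var items = new List<T>();
        using var reader = new StringReader(content);

        string line;
        // ReadLine handles both \n and \r\n line endings.
        while ((line = reader.ReadLine()) != null)
        {
            if (string.IsNullOrWhiteSpace(line))
            {
                continue; // tolerate blank lines, e.g. a trailing newline
            }

            items.Add(JsonSerializer.Deserialize<T>(line, Options));
        }

        return items;
    }
}
```

Unlike the Newtonsoft version, this sketch breaks if a single item spans several lines, which is part of the reason why a more robust System.Text.Json solution gets complex.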

Have you ever encountered an unusual format? How did you deal with it?

Happy coding!

🐧
