Understanding the JSON data format

This article is part of a series of tutorials that will teach you how to begin collecting and analyzing scholarly metadata using the Crossref REST API; check out the introduction to the series here. While this article can stand alone, it is intended as an addendum to my “What’s an API?” article here.

If you don’t have a technical background, one of the earliest obstacles you may face while learning to use the Crossref REST API to collect and analyze scholarly metadata is understanding the format this data is returned to you in. You may be asking yourself…

If you are already familiar with the JSON data format but are having trouble viewing it in your browser, follow my guide here.

What’s JSON?

JSON is a flexible way of representing data that, despite the name (JavaScript Object Notation), is programming language-independent (meaning you don’t need to know JavaScript, or any other programming language to read and understand it!).

Here’s a very simple piece of data about a journal article¹ that is represented (also known as ‘encoded’) in JSON format, and could be saved as a file with the .json file extension:

metadata.json
{
    "title": "RETRACTED: Ileal-lymphoid-nodular hyperplasia, non-specific colitis, and pervasive developmental disorder in children"
}

In the world of JSON, data is represented as “objects”. Every object begins with a {, ends with a }, and contains a number of “key”: value pairs that represent the information about that object that you want to store and transfer.²

Each key is a double-quoted string that describes what information the corresponding value represents. For example, the simple JSON object above has a single key, “title”, whose value is the title of a famously retracted journal article originally published in The Lancet. A value can be one of the following types:

  • Number, for example, a count of articles published by a journal
  • String, for example, the DOI of a journal article
  • Boolean (true/false), for example, whether a publisher deposits abstracts to Crossref in addition to metadata
  • Object, for example, an object representing the first author of a paper who has a First Name, Last Name, an ORCID, etc.
  • null, representing a missing or empty value³
  • an Array of any of the above. For example, an array of objects representing a paper’s references, each of which has a “DOI” key, a “title” key, and so on
    • Every Array starts with a [ and ends with a ], and its elements are separated by commas
An Object can have another Object within it, making it easy to represent complex nested data structures with JSON.
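You don’t need to write any code to read JSON, but if you are curious how these value types look to a programming language, here is a minimal sketch using Python’s built-in json module. The record below is made up for illustration: the key names and values are my own assumptions, not real Crossref output.

```python
import json

# A hypothetical record with one key per JSON value type.
# Key names and values are invented for illustration only.
raw = """
{
    "article-count": 26,
    "DOI": "10.1234/example-doi",
    "deposits-abstracts": true,
    "first-author": {"given": "Jane", "family": "Doe"},
    "subtitle": null,
    "references": [{"DOI": "10.1234/ref-one"}, {"DOI": "10.1234/ref-two"}]
}
"""

# json.loads turns JSON text into ordinary Python values.
record = json.loads(raw)

# Map each key to the name of the Python type its value was decoded into.
python_types = {key: type(value).__name__ for key, value in record.items()}

for key, type_name in python_types.items():
    print(f"{key}: {type_name}")
```

Running this should show that a JSON Number becomes an int, a String becomes a str, a Boolean becomes a bool, an Object becomes a dict, null becomes None (type NoneType), and an Array becomes a list.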

To see these in action, I’ve extended the data from above to add more of the information Crossref has available on this article:

metadata.json
{
    "title": "RETRACTED: Ileal-lymphoid-nodular hyperplasia, non-specific colitis, and pervasive developmental disorder in children",
    "container-title": ["The Lancet"],           
    "DOI": "10.1016/s0140-6736(97)11096-0",                                                                                               
    "updated-by": [                              (1)
        {
            "type":"correction",                 (2)
            "source":"retraction-watch",         (3)
            "updated":{
                "date-parts":[2004,3,6]          (4)
            }
        },
        {
            "type":"retraction",
            "source":"retraction-watch",
            "updated":{
                "date-parts":[2010,2,6]          (5)
            }
        }
    ]                                            (6)
}
(1) The updated-by field is an Array (it begins with a [), so it can hold several update notices; this article received more than one
(2) The first update was a correction
(3) The metadata corresponding to this correction was provided to Crossref by Retraction Watch
(4) The date that the correction was issued, given as [year, month, day], was the 6th of March, 2004⁴
(5) We can see that in addition to the 2004 correction, there was a retraction update issued in 2010
(6) This ] closes the “updated-by” array, so no further updates are recorded after the 2010 retraction
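If you’d like to see how a program would walk through this nested structure, here is a small sketch in Python. It uses the simplified record from this article with the annotation markers removed; in particular, the flat date-parts list follows this article’s simplified form and, as the footnote says, is not how the current REST API presents that field.

```python
import json
from datetime import date

# The simplified record from this article (not the full Crossref record).
raw = """
{
    "title": "RETRACTED: Ileal-lymphoid-nodular hyperplasia, non-specific colitis, and pervasive developmental disorder in children",
    "container-title": ["The Lancet"],
    "DOI": "10.1016/s0140-6736(97)11096-0",
    "updated-by": [
        {"type": "correction", "source": "retraction-watch",
         "updated": {"date-parts": [2004, 3, 6]}},
        {"type": "retraction", "source": "retraction-watch",
         "updated": {"date-parts": [2010, 2, 6]}}
    ]
}
"""

record = json.loads(raw)

# "updated-by" is an Array, so we loop over it; each element is an Object.
summary = []
for update in record["updated-by"]:
    # Step into the nested "updated" Object, then unpack [year, month, day].
    year, month, day = update["updated"]["date-parts"]
    summary.append((update["type"], date(year, month, day).isoformat()))

for update_type, issued in summary:
    print(update_type, issued)
```

This should print one line per update: the correction dated 2004-03-06, then the retraction dated 2010-02-06.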

Note that this is not the full record of metadata that Crossref has for this article. In fact, the .json file returned by Crossref is roughly 10x larger (~10KB compared to ~1KB) than this illustrative example. If you want to see the full metadata record, check it out below:

Some details about this visualization tool
  • This interactive JSON viewing widget is based on the excellent JSON Editor JavaScript project⁵, which by convention displays values of type String without double quotes. Instead, data types are distinguished by color. For example, “status”: “ok” is displayed as status: ok. Values of the Number data type, such as “reference-count”: 26, are displayed as reference-count: 26.

  • Note that “props” stands for “properties”, which is another way of referring to “key”: value pairs. For example, 4 props means the object has 4 keys.

  • timestamp keys are of type Number. They correspond to UNIX timestamps (in milliseconds) and will have an icon next to them that lets you translate the UNIX timestamp into a human-readable date/time combination. This is merely a convenience provided by the viewer, not a feature of JSON; there is no ‘datetime’ value type.
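The translation the viewer performs for timestamp values can be sketched in a few lines of Python. The function name and the example value below are my own, chosen for illustration; the value is not taken from a Crossref record.

```python
from datetime import datetime, timezone


def ms_timestamp_to_datetime(ms: int) -> datetime:
    """Convert a UNIX timestamp in milliseconds to a UTC date/time."""
    # UNIX timestamps count from 1970-01-01 00:00:00 UTC; divide by 1000
    # because Python expects seconds rather than milliseconds.
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


# A made-up millisecond timestamp, used only as an example.
example_ms = 1_000_000_000_000
converted = ms_timestamp_to_datetime(example_ms)
print(converted.isoformat())  # 2001-09-09T01:46:40+00:00
```

In the JSON itself the timestamp is still just a Number; turning it into a date is something the reader (or the viewer) does afterwards.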


You can also download the file here.

Working with JSON files

If you tried downloading and opening the .json file above and found a messy wall of techno-gibberish instead of structured, human-readable text, then you need to enable JSON formatting; take a look at my walkthrough for guidance. I encourage you to look at that guide before moving on. In the next articles in this tutorial series we will look at JSON data that comes from Crossref’s REST API, and while I will continue to offer interactive viewers on this site, it will be very convenient to be able to view the data directly in your browser in a new tab.
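The “wall of text” and the neatly indented version are the same data; only the whitespace differs. As a rough illustration (this is not a replacement for the browser walkthrough), here is a sketch of how Python’s built-in json module can re-indent compact JSON. The compact string is a made-up example, not a real Crossref response.

```python
import json

# Compact JSON with no line breaks or indentation (a made-up example).
compact = '{"title":"Example article","updated-by":[{"type":"retraction","updated":{"date-parts":[2010,2,6]}}]}'

# Decode the text, then encode it again with 4-space indentation.
parsed = json.loads(compact)
pretty = json.dumps(parsed, indent=4, ensure_ascii=False)

print(pretty)
```

The printed result contains exactly the same keys and values as the compact string, laid out one per line in the style of the examples in this article.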

Next article!

Footnotes

  1. The article can be found here, with retraction notice here.

  2. You may also hear these “keys” informally called attributes, fields, or properties. While these terms are more descriptive and more frequently used in conversation, it can sometimes be unclear whether you’re referring to the key or its corresponding value. For example, if someone tells you:

    I checked if the title property was null before adding the article

    …then that means they were checking if the value associated with the key title was null.

  3. When working with scholarly APIs like Crossref’s REST API, you will likely not get any keys which have null values; if a piece of metadata is missing, the returned JSON object will either omit the key for that field or, if the field is an Array or Object type, include an empty Array/Object.

  4. Note that this is not actually how this date-parts field is presented by the current version of the REST API, but I’ve changed it for clarity.

  5. To put credit where it’s due, I got the idea to use this library as an education tool by reading Luis M. Montilla’s article in the Crossref Learning Hub here.

  6. When talking about ASCII characters, that is.