metadata.json
{
"title": "RETRACTED: Ileal-lymphoid-nodular hyperplasia, non-specific colitis, and pervasive developmental disorder in children"
}
This article is one part of a series of tutorials that will teach you how to begin collecting and analyzing scholarly metadata using the Crossref REST API; check out the introduction to the series here. While this article can stand alone, it is intended as an addendum to my “What’s an API?” article here.
If you don’t have a technical background, one of the earliest obstacles you may face while learning to use the Crossref REST API to collect and analyze scholarly metadata is understanding the format this data is returned to you in. You may be asking yourself…
JSON is a flexible way of representing data that, despite the name (JavaScript Object Notation), is programming language-independent (meaning you don’t need to know JavaScript, or any other programming language, to read and understand it!).
Here’s a very simple piece of data about a journal article1 that is represented (also known as ‘encoded’) in JSON format, and could be saved as a file with the .json file extension:
In the world of JSON, data is represented as “objects”, and every object begins with a {, ends with a }, and contains a number of “key”: value pairs that represent the information about that object that you want to store and transfer.2
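Since the code examples on this site are in Python, here is a minimal sketch of that structure in action. It assumes the one-line example is held in a Python string (the variable names `raw` and `article` are mine) and uses the standard-library `json` module to turn it into a Python dictionary.

```python
import json

# The one-line example from this article, held in a Python string.
raw = '{"title": "RETRACTED: Ileal-lymphoid-nodular hyperplasia, non-specific colitis, and pervasive developmental disorder in children"}'

# json.loads turns JSON text into Python data; a JSON object becomes a dict.
article = json.loads(raw)

print(type(article).__name__)  # dict
print(list(article.keys()))    # ['title']
print(article["title"])
```

You don't need Python to read JSON, but a parser like this is how most programs get from the text to something they can work with.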
Each key is a double-quoted string that describes what information the corresponding value represents. For example, the simple one-line JSON object above describes the title of a famously retracted journal article originally published in The Lancet. A value can be one of the following types: String, Number, Boolean, null, Array, or Object.
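As a rough illustration of JSON's value types, the sketch below parses one value of each kind with Python's standard-library `json` module and prints the Python type it becomes. The keys (`a_string`, `a_number`, and so on) are made up for this example and are not Crossref field names.

```python
import json

# Illustrative keys only: none of these are real Crossref field names.
raw = """
{
  "a_string": "The Lancet",
  "a_number": 26,
  "a_boolean": true,
  "a_null": null,
  "an_array": [2010, 2, 6],
  "an_object": {"type": "retraction"}
}
"""

parsed = json.loads(raw)

# Print the Python type that each JSON value was converted to.
for key, value in parsed.items():
    print(f"{key}: {type(value).__name__}")
```

With this input, the String becomes a `str`, the Number an `int`, the Boolean a `bool`, null becomes `None`, the Array a `list`, and the Object a `dict`.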
To see these in action, I’ve extended the data from above to add more of the information Crossref has available on this article:
metadata.json
{
  "title": "RETRACTED: Ileal-lymphoid-nodular hyperplasia, non-specific colitis, and pervasive developmental disorder in children",
  "container-title": ["The Lancet"],
  "DOI": "10.1016/s0140-6736(97)11096-0",
  "updated-by": [
    {
      "type": "correction",
      "source": "retraction-watch",
      "updated": {
        "date-parts": [2004, 3, 6]
      }
    },
    {
      "type": "retraction",
      "source": "retraction-watch",
      "updated": {
        "date-parts": [2010, 2, 6]
      }
    }
  ]
}
Note that this is not the full record of metadata that Crossref has for this article. In fact, the .json file returned by Crossref includes about 10x more information (~10KB compared to ~1KB) than this illustrative example. If you want to see the full metadata record, check it out below:
This interactive JSON viewing widget is based on the excellent JSON Editor JavaScript project5, which by convention displays values of type String without double quotes. Instead, data types are distinguished by color. For example, “status”: “ok” is displayed as status: ok. Values of the Number data type, such as “reference-count”: 26, are displayed as reference-count: 26.
Note that “props” stands for “properties”, which is another way of referring to “key”: value pairs. For example, 4 props means the object has 4 keys.
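To make the idea of counting keys concrete, here is a small sketch that parses the extended example from this article and counts the keys at each level. It uses this article's simplified record, including its simplified date-parts layout, rather than a live API response, and the variable names are my own.

```python
import json

# The extended example from this article (not the full Crossref record).
raw = """
{
  "title": "RETRACTED: Ileal-lymphoid-nodular hyperplasia, non-specific colitis, and pervasive developmental disorder in children",
  "container-title": ["The Lancet"],
  "DOI": "10.1016/s0140-6736(97)11096-0",
  "updated-by": [
    {"type": "correction", "source": "retraction-watch",
     "updated": {"date-parts": [2004, 3, 6]}},
    {"type": "retraction", "source": "retraction-watch",
     "updated": {"date-parts": [2010, 2, 6]}}
  ]
}
"""

record = json.loads(raw)

# The top-level object has 4 keys, which a viewer would show as "4 props".
print(len(record))  # 4

# "updated-by" is an Array of 2 Objects; each of those has 3 keys.
updates = record["updated-by"]
print(len(updates))     # 2
print(len(updates[1]))  # 3

# Chain keys and indexes to reach a nested value.
year, month, day = updates[1]["updated"]["date-parts"]
print(f"{updates[1]['type']}: {year}-{month:02d}-{day:02d}")  # retraction: 2010-02-06
```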
timestamp keys are of type Number. They correspond to UNIX timestamps (in milliseconds) and will have an icon next to them that allows you to translate the UNIX timestamp into a human-readable date/time combination. This is merely a convenience produced by the viewer, not a feature of JSON; there is no ‘datetime’ value type.
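If you want to do the same translation yourself, a minimal sketch in Python might look like the following. The timestamp value is made up for illustration and is not copied from a Crossref record.

```python
from datetime import datetime, timezone

# Made-up value for illustration, not copied from a Crossref record.
timestamp_ms = 1265414400000

# datetime.fromtimestamp expects seconds, so divide the milliseconds by 1000.
moment = datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)

print(moment.isoformat())  # 2010-02-06T00:00:00+00:00
```

Passing `tz=timezone.utc` keeps the result independent of your computer's local time zone.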
You can also download the file here.
If you tried downloading and opening the .json file above and found a messy wall of techno-gibberish instead of structured, human-readable text, then you need to enable JSON formatting, so take a look at my walkthrough for guidance. I encourage you to look at that guide before moving on, because in the next articles in this tutorial series we will look at JSON data that comes from Crossref’s REST API, and while I will continue to offer interactive viewers on this site, it will be very convenient to be able to view the data directly in your browser in a new tab.
Next article!
JavaScript is a user-friendly programming language used to make webpages interactive. JavaScript is what made the “pop-up” behavior for this definition link possible.
While you can use JavaScript to fetch and manipulate JSON data, on this site I prefer to show code examples in Python.
A kilobyte (KB) is a unit of digital information storage equal to 1,000 bytes. A byte is the amount of space that a single character of text6 uses. 1 KB is enough space to store about this much text:
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Ut consectetur, magna quis interdum feugiat, sapien nisi vestibulum risus, in elementum nulla sem sit amet libero. Integer tristique eleifend eros quis pulvinar. Aliquam quam lorem, rhoncus ac turpis vitae, molestie sodales magna. Duis in scelerisque libero. Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. Vestibulum ante ipsum primis in faucibus orci luctus et ultrices posuere cubilia curae; Nunc sit amet posuere eros. Morbi sagittis leo et porta luctus. Vestibulum pretium arcu vel leo cursus hendrerit. Phasellus at justo ac nisl eleifend mollis. Nam interdum nisi vitae molestie pellentesque. In vulputate ligula non ligula condimentum pretium. Sed neque quam, rhoncus in orci et, lacinia egestas purus. Pellentesque nec libero vestibulum, rutrum justo ac, aliquam nisl.
Fusce mattis dolor vel risus elementum elementum. Donec id arcu eu enim congue mollis sed vel nunc. Integer dapibus nam.
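If you are curious how many bytes a piece of text takes up, the sketch below measures it in Python. It assumes the text is stored as UTF-8, the encoding JSON files normally use, where plain English letters take one byte each and some other characters take more. The `record` here is a tiny stand-in, not the full Crossref record.

```python
import json

# A plain ASCII character takes one byte in UTF-8.
print(len("JSON".encode("utf-8")))  # 4

# Characters outside ASCII can take more than one byte each.
print(len("é".encode("utf-8")))  # 2

# Size in bytes of a small JSON document once it is written out as text.
record = {"container-title": ["The Lancet"]}
size_in_bytes = len(json.dumps(record).encode("utf-8"))
print(size_in_bytes)
```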
Stay tuned for the next article in this series, explaining how to request, receive, and analyze public scholarly metadata made available by Crossref!
The article can be found here, with retraction notice here↩︎
You may also hear these “keys” informally called attributes, fields, or properties. While these terms are more descriptive and more frequently used in conversation, it can sometimes be unclear whether you’re referring to the key or its corresponding value. For example, if someone tells you:
I checked if the title property was null before adding the article
…then that means they were checking if the value associated with the key title was null↩︎
When working with scholarly APIs like Crossref’s REST API, you will likely not get any keys which have null values; if a piece of metadata is missing, the returned JSON object will either omit the key for that field or, if the field is of Array or Object type, include an empty Array or Object↩︎
Note that this is not actually how this date-parts field is presented by the current version of the REST API, but I’ve changed it for clarity↩︎
To put credit where it’s due, I got the idea to use this library as an education tool by reading Luis M. Montilla’s article in the Crossref Learning Hub here↩︎
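As a rough sketch of the note above on missing metadata, the code below shows one way to handle both cases in Python: a key that is absent and an Array that is present but empty. The record and its field names are hypothetical, chosen only for illustration.

```python
import json

# Hypothetical record: "title" is present, "subtitle" is an empty Array,
# and "abstract" is missing entirely.
raw = '{"title": "An example title", "subtitle": []}'
record = json.loads(raw)

# dict.get returns a default instead of raising KeyError for an absent key.
abstract = record.get("abstract")      # None, because the key is absent
subtitle = record.get("subtitle", [])  # [], because the Array is empty

print("abstract" in record)  # False
print(bool(subtitle))        # False
```

Checking `"abstract" in record` and checking whether `subtitle` is empty are separate tests, which is why it helps to know which of the two forms a missing field takes.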