We all know what a “standard” means in our modern industrialised and computerised world. As often, xkcd nailed it perfectly:
In astronomy, there is one famous example that comes to mind when talking about a standard (apart from the so-called “standard stars”, of course): FITS, for Flexible Image Transport System. It is the “standard” file format used to store images, tables and data cubes, in competition with the newer format XISF. FITS is great, but it is a loose standard that everybody conforms to only partially.
But let’s leave these file formats aside for today. Let’s talk about another domain that matters very much to Arcsecond: the standards of Application Programming Interfaces (APIs), which are the “contracts” that software developers put in place to make multiple computers talk to each other.
Why talk about APIs here? Because one key “service” provided by Arcsecond to the rest of the world is to make available a consistent, stable and truly RESTful surface of astronomical APIs. In other words, Arcsecond absorbs and hides the complexity of other (and often inconsistent) services.
And it’s damn hard… because nobody really follows a standard! Let’s dive into it.
What are APIs anyway?
APIs are choices made by the developer(s) to tell consumers (machines or humans): “that’s the way I speak and understand requests!” As you may guess, there are tons of different ways of doing it.
Let’s take the metaphor of a phone conversation between two people. For it to “work”, they need to speak roughly the same language (e.g. English), and use a common vocabulary where words have (again, roughly) the same meaning. Otherwise, no real communication can occur. Or it becomes very hard. The thing is that computers aren’t so smart. When there is a mistake somewhere, they can’t correct it themselves automatically (it would require a ton of additional software written beforehand, just like an AI Large Language Model such as ChatGPT or the excellent Le Chat by Mistral AI).
Generally speaking, for computers, “roughly” is not good enough. Computers, when exchanging data, can’t easily handle even small inconsistencies. One must have much more rigorous contracts, because they are absolutely essential for the whole thing to simply work.
In our case of a conversation, the phone lines are the Internet wires and part of the common language is the HTTP protocol (which is a true communication standard, and provides important words; in fact, verbs, and we’ll talk more about that below). Almost every device on the planet, with the knowledge of its IP/Internet address (like “www.arcsecond.io”, or “34.113.92.115”) can “speak” to another device through HTTP (we massively simplify the picture, but you get the idea).
We have the wires, and the words. What about the grammar?
That’s where things get more complicated. I won’t tell here the whole history of “standard” APIs during the internet era. But let’s simply say that one very popular “grammar” (a way of organising the words for the machine conversation on the internet), which has been a de facto “standard” for many years, is called REST.
It’s an ugly acronym (for Representational State Transfer) whose meaning is obscure to most people, even to developers (and its understanding has anyway evolved a bit over time, as with every popular concept). But the key idea is this: “REST” is the way to go today for most (not all, but most) online software and connected services (i.e. for tens of millions of websites and services), especially those that talk to others by providing data (as opposed to pure “html” web pages).
That’s the case of Arcsecond. All the beautiful interfaces you see at https://www.arcsecond.io (consumer of the data) are made possible by a large and consistent API surface available at https://api.arcsecond.io (provider of the data).
VO and IVOA?
But before entering into the details of REST, let’s talk about the VO, an acronym that is barely known (I suppose) outside the astronomers’ community. “VO” stands for “Virtual Observatory” and “IVOA” for “International Virtual Observatory Alliance”, that is, the global organisation federating all VO initiatives (mostly held by national science agencies).
From the outside, we may think: “astronomers are great, they have been investing time and resources for a long time to create a unique and open standard for all people worldwide to share data and let services interoperate.”
On the other hand, I wonder (am I alone?): why on Earth did they create another “web”???
Because no matter how you look at it, that’s basically it. The VO is an attempt at creating a kind of parallel “web” of services and data exchange using HTTP (thankfully), but with plenty of other standards that are quite hard to learn and use. I have been developing web services for more than 10 years, mostly self-taught, and I still can’t understand how to make a VO-compliant service (beyond serializing Arcsecond resources into the ugly and often ill-suited VOTable format)…
But anyway, beyond the technical difficulty, what remains is the core non-technical question: why would astronomers need a parallel web? Why learn and struggle to deploy VO services when there is… the Internet? And the web? Is there any capability related to the VO that cannot be (more easily) built and deployed with standard web technologies?
It’s a sincere question! I tried multiple times to understand how to run a VO service, but always failed. I have no idea whether anyone else in the astronomical community also has a critical point of view on the VO, on the whole purpose of the VO, actually…
The above question has been at the very heart of my frustration with the VO for many years (since the time when I was an official researcher). My feeling is that scientists (here, astronomers) are missing what is powerful about modern and open-source software: free and uncontrolled innovation. I consider it unbeatable, I think, because it’s basically free (as in free speech) innovation.
So why did astronomers create a side track? I am not sure. It depends very much on the people in place, and their collective culture. And today, an “alliance” (the IVOA) with many “committees” spread over the world has been organising tons of “meetings” for many years to agree on “documents” and “specifications” describing what software should comply with. And they have chosen a serialisation format, XML, that is presented as THE only format to use (as if it were not important to decouple it from the modeling).
I can’t think of a way of building software more opposed to mine than the VO’s.
Why? Because software development (its power, its real nature) is an empirical activity. As soon as you start replacing the trial and error of running software with documents and specifications, you are probably doing it wrong.
In other words, I much more trust any “standard” that emerges from the permanent innovative work occurring across the web (because it means it has been battle-tested by many different users and contexts) than anything that is decided beforehand. Yes, even if this “standard” is a permanently and slowly-evolving concept.
Two key ideas to keep in mind before I stop my rant against the VO. First, Arcsecond will be somewhat “VO-compliant” since resources will be available in VOTable format (example here; yes, sometimes you must nonetheless comply…). I’ve written a small post about how I do it in Arcsecond.
Second, despite the IVOA and all its efforts, we can see many professional astronomical services today that implement APIs… using REST! Which means I am probably not alone in questioning the relevance of the VO, am I?
In fact, REST is so influential that it is mentioned in the introduction for developers on the IVOA website itself! Since I can’t find the energy to read all those documents to better understand what they mean by that (my brain is blocking it), I asked… ChatGPT! You will find in the bonus below, at the end of this article, the single question-answer I got from ChatGPT on that subject.
Now, let’s come back to REST.
REST and RESTish…
OK, but what is REST all about? As explained above, it is one of the most popular (if not the most popular) “grammars” developers can use to build connected services (REST is relevant for client-server architectures, which is what the Internet globally is). As of today, the Wikipedia article about REST is pretty well done (and not too long for those who are interested).
But let’s concentrate on the one constraint that matters most to Arcsecond, and to any service that claims to be RESTful (i.e. conforming to the REST “standard”): a uniform interface. I am really puzzled to see how bizarre the interpretation of this constraint (see below) can be.
For anyone out there who is not a developer (that is, the vast majority of people), whatever REST is, you are certainly familiar with another acronym: URL. You may know (or not) that it means Uniform Resource Locator. It is one of the greatest inventions of modern times, seriously.
Even more interestingly, URLs represent a specialised version of something more generic: the URI, which stands for Uniform Resource Identifier. Basically, it’s a consistent way to give an “address/identifier” to something. And we see the fundamental point here for URLs: the merging of the meanings of “address” and “identifier”.
This is the web: we identify stuff by its address. And, at its simplest, your internet browser is doing just this: it requests the content of a resource at a given address and interprets the response to present it visually.
So why is REST cool? Because it describes how to architect a service where resources are identified by unique URLs. Note that REST itself is not tied to HTTP; it simply works very well with it. REST is powerful because it cleanly separates “data” on one side, and “consumption of data” (visualisation, computations, etc.) on the other side.
But one must put this in bold: REST resources are identified by unique URLs. Hence, you must define and organise your resources! And only resource-related stuff must appear in your URLs!
And guess what: resources of a domain (telescopes, observatories, stars, for example…) can be organised as a huge tree. At the beginning of the URL, you have the name of the resource. And any additional segment appended to it represents the specialisation(s) of the given resource! That’s what all these “/” (slashes) are about in URLs!!! Simplest example:
- https://api.arcsecond.io/telescopes/ (see the plural, and the absence of ID, hence we request the list of all telescopes).
- https://api.arcsecond.io/telescopes/1c2dc153-1d9c-41b9-a8e0-9361f18e02ac/ (the previous list is specialised by a “/” followed by a UUID, hence we request the details of a single resource of that list, identified by the UUID).
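To make the list/detail convention of these two URLs concrete, here is a minimal sketch in Python. The `classify` helper is hypothetical (it is not part of Arcsecond); it only assumes that the first path segment names the resource and that an optional second segment is the identifier.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit


def classify(url: str) -> Tuple[str, Optional[str]]:
    """Split a resource URL into (resource name, identifier or None)."""
    # Keep only the non-empty path segments: "/telescopes/" -> ["telescopes"].
    segments = [s for s in urlsplit(url).path.split("/") if s]
    if not segments:
        raise ValueError("URL has no resource segment")
    resource = segments[0]
    identifier = segments[1] if len(segments) > 1 else None
    return resource, identifier


for url in (
    "https://api.arcsecond.io/telescopes/",
    "https://api.arcsecond.io/telescopes/1c2dc153-1d9c-41b9-a8e0-9361f18e02ac/",
):
    resource, identifier = classify(url)
    kind = "list" if identifier is None else "detail"
    print(f"{kind:6} of {resource} (id={identifier})")
```

No identifier means “the whole list”; an identifier means “that one resource”. Nothing else needs to appear in the path.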
That sounds easy, doesn’t it? So why is it so difficult to comply with these simple ideas? We’ll discuss examples below.
HTTP has verbs. You GET or POST it?
One thing that is key to understand is that HTTP is a protocol that also provides verbs, which developers can use to indicate the action to perform on the resources. When you visit a website, your browser automatically sends a GET request to the URL. But there are more possibilities (e.g. when you submit a form, you use a POST request). Here is the list of the main verbs, and their quite natural meaning for a RESTful service:
- GET: ask the server to return the resource(s)
- POST: create a single new resource
- PUT or PATCH: modify an existing resource (either entirely, or partially)
- DELETE: delete the resource…
No need to mention HEAD or OPTIONS here, it would complicate the discussion for no real benefit.
REST standardizes the use of HTTP verbs to indicate actions. This question is as old as the Internet: how can a browser convey the meaning of an action to the server? For a long time, it was not standardized, and everybody was inventing stuff like:
https://example.com/give-me-that-crazy-resource-please.html?with=this_option
With REST, URLs designate resources, and thus must contain no action meaning, only the resource names and the IDs. And you use HTTP verbs to convey the meaning of your desired action on the given resource.
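As a rough illustration of that split between “what” (the URL) and “which action” (the verb), here is a small Python sketch. The `ACTIONS` table and the `describe` function are made up for this post; they simply restate the usual meaning of each verb.

```python
# Usual meaning of the main HTTP verbs in a RESTful service.
ACTIONS = {
    "GET": "read",
    "POST": "create",
    "PUT": "replace",
    "PATCH": "partially update",
    "DELETE": "delete",
}


def describe(method: str, path: str) -> str:
    """Describe a request: the verb carries the action, the path only names the resource."""
    action = ACTIONS[method.upper()]
    return f"{action} {path}"


# The same URL, three different intentions, and no verb inside the path.
for method in ("GET", "PATCH", "DELETE"):
    print(describe(method, "/telescopes/1c2dc153-1d9c-41b9-a8e0-9361f18e02ac/"))
```

The path never changes; only the verb does.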
HTTP Verbs and REST
If you look carefully, you can already notice that sending a POST request to the URL of a single resource (like the one above: https://api.arcsecond.io/telescopes/1c2dc153-1d9c-41b9-a8e0-9361f18e02ac/) makes no real sense.
Because sending a POST request to a given resource URL literally means “create a whole new resource in place of an existing one”. Take the dumb stance of a computer for a second; the immediate question is: what should I do with the existing resource if I am being asked to create a new one in its place?
Likewise, sending a PATCH to a list of resources is ambiguous. Should we patch the whole list? What if this list is filtered by URL parameters like /resources?name=dummy ???
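One way to remove these ambiguities is to refuse the combinations that make no sense. Below is a minimal sketch, assuming a convention where list URLs accept only GET and POST, and detail URLs accept only GET, PUT, PATCH and DELETE. The `ALLOWED` table and the `check` function are illustrative, not Arcsecond code.

```python
from http import HTTPStatus

# Assumed convention: which verbs make sense on which kind of URL.
ALLOWED = {
    "list": {"GET", "POST"},
    "detail": {"GET", "PUT", "PATCH", "DELETE"},
}


def check(method: str, path: str) -> HTTPStatus:
    """Return 200 if the verb makes sense for this URL, 405 otherwise."""
    # Ignore the query string, then count the non-empty path segments.
    segments = [s for s in path.split("?")[0].split("/") if s]
    kind = "detail" if len(segments) > 1 else "list"
    if method.upper() in ALLOWED[kind]:
        return HTTPStatus.OK
    return HTTPStatus.METHOD_NOT_ALLOWED


print(check("POST", "/telescopes/1c2dc153-1d9c-41b9-a8e0-9361f18e02ac/"))
print(check("PATCH", "/resources?name=dummy"))
print(check("GET", "/telescopes/"))
```

Answering 405 “Method Not Allowed” is the standard HTTP way of saying “this verb is not supported on this URL”, so the client never has to guess.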
All these subtleties are very important because they are the key for long-term consistency, hence usability of your data. And this consistency is quite hard to achieve, for a given domain, given the relationships between the resources, and the sometimes ambiguous meaning of words (e.g. “Observatory” is a word we avoid in Arcsecond REST APIs because it is very ambiguous). Hence, this consistency can never be absolutely guaranteed even if you try to implement a strict RESTful service like in Arcsecond. It’s a permanent effort.
Let’s talk now about some others! And see how they manage…
Example 1 of the good, the bad, and the ugly: ESO.
Let’s start with my former employer, the European Southern Observatory. I must say that they have made huge progress (I remember dealing with ugly stuff in the past). When we look at their new API documentation, it’s fairly good.
Let’s take the first entry: /obsRuns.
It’s the name of a resource (“observing runs”). We could be nitpicky and ask for the full name (as I do for Arcsecond), because obsRuns could be confused in the future with other resources like observingRuns, observedRuns, or observableRuns… But that’s OK, for now.
The capital R makes it easier to read, but has strictly no technical impact.
Better still, it uses the plural (/obsRuns and not /obsRun) because it respects the fact that the cardinality (dealing with single or multiple resources of the same type) is handled with URL specialisations. Hence, a URL like /obsRuns not followed by /runId returns a list of resources.
This distinction is vital for the consumer, because it will receive an array of objects in the first case, and a single object in the second. I couldn’t verify the response of the endpoint with my ESO account, because my former observing runs are too far in the past, unfortunately.
When you open the obsRuns entry in the ESO webpage, to see the subsequent specialised URLs, the next one is correct: /obsRuns/{runId}. This is the URL of a single resource, where the list URL is specialised by appending the identifier {runId}.
As seen on the right side of the endpoints, the allowed HTTP methods are indicated in color. Up to now, we are talking about GET URLs (also called “endpoints”). Usually they are the simplest ones to implement.
But things start to get fishy further down the list. Look at these, which do not respect URL specialisation:
/obsRuns/{runId}/phase1/blockedTimes
/obsRuns/{runId}/readme/schema
Why is /obsRuns/{runId}/phase1/ itself not available? Is it a specialisation of the obsRun resource? And what would a /phase1 resource (located at the root of the API) be? Good question!
But it gets really bad with POST requests, which are always more difficult to implement. See for instance these two; the first is bad, the second is even worse:
/obsRuns/{runId}/submit
/obsRuns/{runId}/request/instrumentChange
“Submit” is a verb, it should not be part of the URL. Submit what, anyway? This URL breaks the semantics. And what does it mean to send a GET request to something called /submit ??? That is perfectly valid, technically speaking. Doesn’t the POST verb already carry the meaning of the action?
If by “submit” ESO developers (👋 hi, guys) mean “I want to submit my observing run of that given ID”, you should actually send a POST request to the list /obsRuns (if the run is new) or a PATCH request to the single resource /obsRuns/{runId} (if it is not new), where a “status” field of the obsRun resource equals a value such as “submitted” (or alike, it’s up to the API developers)!
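Here is what such a resource-oriented “submit” could look like from the client side, as a hedged Python sketch. This is not ESO’s actual API: the base URL `https://api.example.org`, the run ID, the field name `status` and the value `submitted` are all assumptions used for illustration, and the request is only built, never sent.

```python
import json
from urllib.request import Request


def build_submit_request(base_url: str, run_id: str) -> Request:
    """Express "submit this run" as a partial update of the run itself."""
    # Hypothetical payload: the action becomes a new value of a status field.
    body = json.dumps({"status": "submitted"}).encode("utf-8")
    return Request(
        url=f"{base_url}/obsRuns/{run_id}",
        data=body,
        method="PATCH",
        headers={"Content-Type": "application/json"},
    )


req = build_submit_request("https://api.example.org", "12345")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

The URL still designates only the resource; the intention lives in the verb and in the new state of the resource.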
Designing APIs is all about these tiny details. In Arcsecond, except for authentication endpoints (which are special), you won’t find any action verb in APIs. It is purely resource-oriented. Why does it matter? Robustness, and expansion over the years. The above APIs of ESO are fragile. If one day ESO wants to implement another action of change on a resource, they will get lost in hundreds of questions about how to name things and organise URLs. And APIs will soon become inconsistent. And thus slow to update & improve.
We could spend even more time criticising the details of ESO APIs, but let’s move to something worse: the APIs of Jet Propulsion Laboratory!
Example 2 of the good, the bad, and the ugly: JPL.
The JPL APIs are described on this page, where they explicitly say that they attempted to follow the REST architecture style, yet “URI path mapping (another name for uniform interface) […] are not a requirement of REST”! Well, that’s a nice way to say “I am compliant, but don’t expect results to be uniform and standard”. To be honest, given the complexity of their resources and the richness of planets and asteroids (and all the stuff the JPL has been studying for decades), a rigorous URI path mapping is probably very hard to achieve.
But let’s talk about the implementation that we had to build in Arcsecond to handle it correctly.
Let’s say you are looking for an object whose ID is unknown to you. In the standard REST way, you should make a GET request to a resource list, with a query such as ?search=<your search input>. But not with JPL… The URL to use is the following:
https://ssd-api.jpl.nasa.gov/sbdb.api?sstr=<input search string>&alt-des=1
Where is the name of the resource? Why alt-des=1? Why sstr instead of search? If sstr designates a specific API, shouldn’t it be part of the URL segments instead?
But the difficult part is coming. When you make such a request, depending on the result, you will get a totally different output. If your search matches a single object, you will get the full object data! But if it matches more than one, like in this example, you get a custom output, with a code (300) inside, while the HTTP status code is the normal 200.
This is very strange. 300 is the HTTP code for “Multiple Choices”. OK, semantically correct. But it is provided inside the response’s body, and not as the status code of the response??? It’s technically wrong, because the technical requester (here, the Arcsecond backend server) shouldn’t have to parse the response body to decide what to do next. That’s the job of the HTTP status code.
For instance, for the request https://ssd-api.jpl.nasa.gov/sbdb.api?sstr=144P/Kushida&alt-des=1, you get:
{
  "signature": {
    "source": "NASA/JPL Small-Body Database (SBDB) API",
    "version": "1.3"
  },
  "code": "300",
  "count": 2,
  "message": "specified query matched more than one object",
  "list": [
    {
      "name": "5605 Kushida (1993 DB)",
      "pdes": "5605"
    },
    {
      "pdes": "144P",
      "name": "144P/Kushida"
    }
  ]
}
Now, take the exact same URL, but instead of the search string “?sstr=144P/Kushida”, use only “?sstr=144P”. In that case, you get the whole astronomical object and all its details (see it here)! Hence, for any search request, you might get either an intermediate result or a full object result. It means the URL points to either an identified resource, or a list containing a tiny subset of properties of several resources.
It gets even harder, because pdes in the output above means “Primary Designation” and is not a true ID. The real one is called SPKID. That’s why, when we receive multiple matches like above, Arcsecond triggers new and different requests to JPL to get the real final objects.
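To give an idea of the kind of glue code this forces on a consumer, here is a simplified Python sketch (not the actual Arcsecond implementation). The multi-match shape is the one shown above. The single-match shape, a top-level `object` holding a `des` field, is an assumption made for illustration, and `primary_designations` is a hypothetical helper.

```python
from typing import Any, Dict, List


def primary_designations(payload: Dict[str, Any]) -> List[str]:
    """Always return a list of designations, whatever shape the body has."""
    # Multi-match case: the "300" lives in the body, not in the HTTP status.
    if str(payload.get("code")) == "300":
        return [item["pdes"] for item in payload.get("list", [])]
    # Single-match case (assumed shape): one full object.
    if "object" in payload:
        return [payload["object"]["des"]]
    return []


multi = {
    "code": "300",
    "count": 2,
    "list": [
        {"name": "5605 Kushida (1993 DB)", "pdes": "5605"},
        {"pdes": "144P", "name": "144P/Kushida"},
    ],
}
single = {"object": {"des": "144P"}}  # assumed shape, heavily trimmed

print(primary_designations(multi))
print(primary_designations(single))
```

Each designation returned by the multi-match branch then needs its own follow-up request to obtain the full object, which is exactly the extra round trip described above.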
Here are the relevant Arcsecond URLs, all GET requests (any POST or PUT/PATCH would return a 405 “Method Not Allowed”). Note the presence of the resource name smallbodies/, and the consistency between list and details endpoints:
- https://api.arcsecond.io/smallbodies/?search=<input search string> will always return a list of results, whatever the cardinality of the JPL results.
- https://api.arcsecond.io/smallbodies/<spkid>/ will return the details of that given object, or 404 if it is not known.
- https://api.arcsecond.io/smallbodies/ will return a 400 bad request with a message explaining how to use that endpoint.
URL semantics are respected, HTTP responses use standard status codes, and response bodies contain consistent results.
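The behaviour of these three endpoints can be mimicked with a toy Python model. It is only a sketch: the `CATALOGUE` content and its identifiers are invented, and `list_endpoint` / `detail_endpoint` are hypothetical names, not Arcsecond internals.

```python
from http import HTTPStatus
from typing import Any, Dict, Tuple

# Invented data, keyed by made-up identifiers (not real SPKIDs).
CATALOGUE: Dict[str, Dict[str, str]] = {
    "id-0001": {"name": "Example Body A"},
    "id-0002": {"name": "Example Body B"},
}


def list_endpoint(search: str = "") -> Tuple[HTTPStatus, Any]:
    """List endpoint: 400 without a search term, otherwise always a list."""
    if not search.strip():
        return HTTPStatus.BAD_REQUEST, {"detail": "Provide a ?search= parameter."}
    matches = [
        dict(id=key, **item)
        for key, item in CATALOGUE.items()
        if search.lower() in item["name"].lower()
    ]
    return HTTPStatus.OK, matches


def detail_endpoint(identifier: str) -> Tuple[HTTPStatus, Any]:
    """Detail endpoint: a single object, or 404 if the identifier is unknown."""
    item = CATALOGUE.get(identifier)
    if item is None:
        return HTTPStatus.NOT_FOUND, {"detail": "Not found."}
    return HTTPStatus.OK, dict(id=identifier, **item)


print(list_endpoint("body"))
print(list_endpoint("nothing-matches"))
print(detail_endpoint("unknown-id"))
```

Whether a search matches zero, one or many objects, the list endpoint answers with a list; only the status code tells the client that something went wrong.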
Example 3 of the good, the bad, and the ugly: Fink.
Fink is a broker providing access to data from transient events. Basically, when there is some explosion in the Universe, astronomers have systems to detect and observe the event. And Fink serves as an intermediary between those systems and the community that wants to analyse the associated data.
But let’s focus on their APIs, and the question for Arcsecond of whether to integrate them or not (as of today, this is still under evaluation). This service does have APIs, which is good. But it has some rather weird conventions.
As documented here, Fink chose to impose the use of POST requests only, even for getting resources, even though it states in its introduction that “… it utilizes a publicly exposed REST API (https://api.fink-portal.org) that you can use to programmatically access the data.”
Example:
curl -H "Content-Type: application/json" -X POST \
-d '{"objectId":"ZTF21aaxtctv", "output-format":"csv"}' \
https://api.fink-portal.org/api/v1/objects -o ZTF21aaxtctv.csv
The payload of the request (after the -d option) contains the object ID, and some output option. Again, this is fragile (apart from being strange), and inconsistent. For instance, why does objectId use camelCase, while output-format is obviously written in kebab-case?
What would an Arcsecond API look like? Simple, when you truly follow REST, using a GET request:
https://api.arcsecond.io/finktransients/ZTF21aaxtctv/?format=csv
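Building such a URL programmatically takes only a few lines. The sketch below is in Python and assumes the output option is passed as a `format=csv` query parameter. The `detail_url` helper is made up, and since the Fink integration is still under evaluation, the `finktransients` endpoint itself should be read as hypothetical.

```python
from urllib.parse import quote, urlencode


def detail_url(base: str, resource: str, object_id: str, **options: str) -> str:
    """Build the URL of a single resource, with output options as query parameters."""
    # The identifier goes in the path; everything else is a query parameter.
    url = f"{base}/{resource}/{quote(object_id, safe='')}/"
    if options:
        url += "?" + urlencode(options)
    return url


print(detail_url("https://api.arcsecond.io", "finktransients", "ZTF21aaxtctv", format="csv"))
```

The identifier and the output option each have one obvious place, so there is no JSON payload to invent and no naming convention to argue about.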
But the best is yet to come…
Example 4 of the good, the bad and the (really) ugly: TNS.
TNS is the official website (recognized by the IAU, the International Astronomical Union) to report transient events. It is one of the sources of Fink.
TNS APIs are documented here. Like Fink, developers (API consumers of TNS data) must always use the POST HTTP verb, even for getting resources. And the way you differentiate what would be a GET from what would be a POST is to include get/ and set/ in URLs! I’ve never seen this before. Here are example requests:
curl -X POST -H 'user-agent: tns_marker{"tns_id":YOUR-BOT-ID,"type": "bot", "name":"YOUR-BOT-NAME"}' -d 'api_key=YOUR_API_KEY&data={"objname": "2021rf"}' https://www.wis-tns.org/api/get/search
curl -X POST -H 'user-agent: tns_marker{"tns_id":YOUR-BOT-ID,"type": "bot", "name":"YOUR-BOT-NAME"}' -d 'api_key=YOUR_API_KEY&data={"objname": "2021rf", "photometry": "1", "spectra": "1"}' https://www.wis-tns.org/api/get/object
Even more bizarre, as shown above, you are supposed to use the “User-Agent” header and twist its purpose (it is otherwise a well-known HTTP header) by stuffing a serialized JSON object into it, where you provide an ID and your registered bot name?! This part is really puzzling:
-H 'user-agent: tns_marker{"tns_id":YOUR-BOT-ID,"type": "bot", "name":"YOUR-BOT-NAME"}'
On top of this, you provide a request payload that includes your API_KEY (which is a kind of ID), beside the data payload of the request, which itself contains not only the ID of the resources but also output options???? It is probably one of the most bizarre things I’ve seen in a long time.
What could an equivalent Arcsecond URL look like for the second request (the object details)? Again, following the REST rules described above:
https://api.arcsecond.io/tnstransients/2021rf/?include=photometry,spectra
The API key (and only that key, no bot name, no bot ID, since these are all redundant) would be provided as a custom header: X-Authorization-TNS-Key: Key <api_key>. This is the way you can interact with Arcsecond APIs programmatically.
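As a sketch of what that looks like from a client, here is the same request built with Python’s standard library. The URL and the header come from the text above; the `build_tns_request` helper and the placeholder key are made up, and the request is only constructed, not sent.

```python
from typing import Sequence
from urllib.request import Request


def build_tns_request(api_key: str, name: str, include: Sequence[str] = ()) -> Request:
    """GET a single transient, with the API key in a header and options in the query."""
    url = f"https://api.arcsecond.io/tnstransients/{name}/"
    if include:
        url += "?include=" + ",".join(include)
    return Request(
        url,
        method="GET",
        headers={"X-Authorization-TNS-Key": f"Key {api_key}"},
    )


req = build_tns_request("YOUR_API_KEY", "2021rf", include=("photometry", "spectra"))
print(req.get_method(), req.full_url)
print(req.header_items())
```

Note that `urllib` normalises the capitalisation of header names when storing them, which is harmless since HTTP header names are case-insensitive. The credentials live in a header, the identifier in the path, the options in the query string, and nothing needs to be smuggled into User-Agent.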
Conclusion
URLs are the first and foremost element of an API’s design. And as Steve Jobs said:
(image: quote from Steve Jobs)
Next, quite obviously: adopt https://api.arcsecond.io for your astronomical services! I know, it’s a bit presumptuous, because out of the hundreds of Arcsecond endpoints, not everything is perfectly standard. It’s long and tough work!
Good luck everyone, and tell me in the comments how you design your APIs.
Bonus: ChatGPT answers what REST probably means in the IVOA website.
My prompt:
The website of the IVOA, the International Virtual Observatory Alliance, contains a section for developers. In these introductory explanations, they mention the REST architecture. But what do they mean by that? IVOA produces documents that describe how Virtual Observatory services should be built and interact with each other, mostly using a special XML serialization called VOTable. All of this has nothing todo with REST and RESTful APIs, isn’t it?
ChatGPT Answer:
The reference to the REST architecture in the IVOA developer section is likely pointing to how modern services in the Virtual Observatory (VO) ecosystem might interact or be accessed, rather than directly relating to the XML serialization format like VOTable.
REST Architecture: What It Means
REST (Representational State Transfer) is an architectural style for building web services. It is based on principles such as:
- Stateless communication: Each request from a client contains all the information the server needs to fulfill the request.
- Resource-based approach: Services expose resources (e.g., data or objects) identified by URLs, and clients interact with them using standard HTTP methods (GET, POST, PUT, DELETE).
- Representation: Resources are represented in various formats, such as JSON, XML, or plain text.
The REST architecture is widely adopted for building APIs because it is lightweight, scalable, and easy to integrate with modern web technologies.
The Role of REST in IVOA
While it’s true that the IVOA focuses on protocols and formats like VOTable, TAP (Table Access Protocol), and SAMP (Simple Application Messaging Protocol), the mention of REST could indicate that:
- Service Accessibility: Modern Virtual Observatory services may be exposed as RESTful APIs to make them easier to access and use, especially for developers integrating VO services into web or application ecosystems.
- Newer Protocols: Some IVOA protocols or services might adopt RESTful approaches for ease of implementation and interaction, even if the underlying data exchange involves formats like VOTable.
- Complementary Use: REST could be used to expose endpoints that serve VOTable data or query results, acting as an interface layer for clients.
For example, a RESTful VO service could allow:
- GET /observations/{id}: Retrieve a specific observation in VOTable format.
- POST /queries: Submit a custom query and get the results back, possibly in VOTable.
REST vs. VOTable
You’re correct that VOTable itself doesn’t directly relate to REST. VOTable is an XML-based format designed for tabular data with metadata, and it’s widely used in astronomy for data exchange. However:
- REST is about service design, while VOTable is about data serialization.
- A RESTful VO service might use VOTable as one of the supported formats for responses, alongside JSON or CSV, depending on the client’s needs.
Why Mention REST in the IVOA Context?
IVOA’s mention of REST might aim to:
- Encourage modern, web-friendly design for VO services.
- Help developers understand how to integrate VO services with other tools or platforms that expect RESTful APIs.
- Highlight evolving practices in the VO community, bridging traditional XML-based services with modern web APIs.
In summary, REST and VOTable serve different purposes but can coexist in the ecosystem of Virtual Observatory services. The reference to REST in IVOA’s documentation likely reflects an effort to modernize service interaction while maintaining compatibility with established VO standards like VOTable.
(end of ChatGPT conversation)
My personal conclusion: well, if IVOA encourages RESTful services because they are more developer-friendly, it’s probably because… they damn are! And a lot more popular! Consequently, we don’t actually need all this VO complexity. Just use the web, instead of creating a competing standard!