The second piece of the PGXN infrastructure, after PGXN Manager, is the PGXN API Server. I’ve just finished the API documentation, which covers both the lightweight static file API provided by mirrors and the superset provided by the API server. So now seems like a good time to talk about the design of the API server and how it works.
At its core, the PGXN API server is just another mirror. It has an hourly
cron job that rsyncs from the master mirror, updating its local copy. But then it
iterates over the rsync log and transforms some things. Here’s what it does:
- It converts the README file and any files recognized by Text::Markup to sanitized HTML with a table of contents. Such files can then be used to display the README on the distribution page and to display individual documentation files.
- It merges data into each META.json generated by PGXN Manager. For example, as of this writing, the API server’s semver 0.2.1 META.json and the unversioned semver.json are identical. Effectively, this format has all the metadata from the META.json as well as a list of all releases of the distribution from the semver.json. This is useful for displaying all the data on the distribution page by fetching the data in a single API request.
- It applies the same release data to every other release’s META.json file. For example, if you look at the semver 0.2.0 META.json, you’ll see that it includes 0.2.1 in its list of releases, even though 0.2.1 was released after 0.2.0. This allows the semver 0.2.0 page on the main site to have a select list of versions to choose from, including versions released later, with a single API request.
- It merges data from the mirror semver.json into the API semver.json.
- It does the same for users and tags, from the mirror theory.json to the API theory.json and the mirror data types.json to the API data types.json. This allows the user page and tag pages to include the abstract in the list of distributions released by the user or associated with a tag.
All of this merging stuff came out of my thinking following the discussion of the PGXN API RFC. The decision to use Lucy instead of PostgreSQL’s full-text search followed rather naturally from this, as I quickly realized that there was no other driving need for a relational database behind the API at all. The only dynamic API is the search API. Everything else is just static files. The performance issues of in-database search, as well as the desire to have fewer outside dependencies, made the decision an easy one.
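The release-list merging described above can be sketched roughly as follows. This is only an illustration in Python, not the API server's actual code: the function names and the `releases` key layout are assumptions, and the sample data is made up.

```python
import copy


def merge_release_list(meta, dist_index):
    """Return a copy of one release's META.json data with the
    distribution-wide list of releases merged in.

    The "releases" key name is an assumption made for this sketch.
    """
    merged = copy.deepcopy(meta)
    merged["releases"] = copy.deepcopy(dist_index.get("releases", {}))
    return merged


def update_all_releases(metas, dist_index):
    """Apply the same merge to every release, so that older releases
    also list versions published after them."""
    return {
        version: merge_release_list(meta, dist_index)
        for version, meta in metas.items()
    }


# Made-up sample data shaped like a per-distribution file and two
# per-release metadata files.
dist_index = {
    "name": "semver",
    "releases": {"stable": [{"version": "0.2.1"}, {"version": "0.2.0"}]},
}
metas = {
    "0.2.0": {"name": "semver", "version": "0.2.0"},
    "0.2.1": {"name": "semver", "version": "0.2.1"},
}

updated = update_all_releases(metas, dist_index)
```

After the merge, the 0.2.0 entry in `updated` carries the full release list, including 0.2.1, which is what lets a page for an older release offer newer versions from a single request.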
Beyond the syncing, there is a very simple web server providing the HTTP REST interface to the static JSON files and the full-text search. That’s it, really. The API server is really just another mirror on steroids. The nice thing is that it allows an interface, such as WWW::PGXN or the new PGXN client, to work with either a mirror or the API server, just failing gracefully when API server APIs are unavailable.
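One way a client might fail gracefully is to look up each API in a table of URI templates and treat a missing entry as "not available here." The sketch below assumes such a table exists; the template names, paths, and the `uri_for` helper are hypothetical and are not taken from WWW::PGXN or the PGXN client.

```python
def uri_for(templates, name, variables):
    """Build a URI from a named template, or return None when the
    server does not offer that API."""
    template = templates.get(name)
    if template is None:
        return None
    uri = template
    for key, value in variables.items():
        uri = uri.replace("{" + key + "}", str(value))
    return uri


# Hypothetical template tables: a plain mirror offers only static files,
# while the API server offers the same files plus search.
MIRROR_TEMPLATES = {"meta": "/dist/{dist}/{version}/META.json"}
API_TEMPLATES = dict(MIRROR_TEMPLATES, search="/search/{index}/")

meta_uri = uri_for(MIRROR_TEMPLATES, "meta", {"dist": "semver", "version": "0.2.1"})
search_on_api = uri_for(API_TEMPLATES, "search", {"index": "docs"})
search_on_mirror = uri_for(MIRROR_TEMPLATES, "search", {"index": "docs"})
```

A caller can check for `None` and simply skip the search feature when talking to a plain mirror, rather than raising an error.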
If you want to learn more about the specifics of the REST API, the API documentation has all the details. Really, it’s quite comprehensive!
I actually consider the API to be 1.0-complete at this point, unlike PGXN Manager. The only thing I want to add is JSONP support for static JSON files (right now it’s only for search results). I might tweak a few things here and there, but otherwise I think it’s in pretty good shape.
Longer term, though, it might be worthwhile to add some other features to enhance the value of PGXN overall. Some ideas:
But I think we need to build up some momentum on the foundation that’s in place. Have you submitted your extensions, yet?
Had an interesting discussion on #plack. The upload form, which takes a POST request for an upload, sends a redirect on a successful form submission. This is known as the Post/Redirect/Get (PRG) pattern, though I didn’t know that before DuckDuckGoing it today. But I realized that the code was not redirecting on a failed form submission, but reloading the form. I was thinking that one should always redirect on POST, and so was looking into a Rails-like flash pattern to cache an error message and the form contents on the redirect. But in this discussion, I learned a few things:
One should use HTTP 1.1 status code 303 rather than status code 302 for these sorts of redirects. Apparently, this status code, called “See Other,” is specifically designed for use redirecting from a POST. Thanks to counfound for pointing that out.
The whole point of PRG is to prevent a double form submission. But the truth is, in the case of an error, you want a second form submission. For errors such as a username conflict or a malformed distribution archive, the POST failed, so you show the form again along with a message about the problem and how to fix it.
What’s nice about this is that it’s already the way I was doing it! In fact, for such errors, I’m returning HTTP status code 409, which indicates that the request failed due to a conflict, and the user should fix it and resubmit. So no need to redirect on failure. Yay!
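Here is a minimal sketch of that behavior, written as a Python WSGI app rather than the Plack code this post is about. The form, the `/thanks` location, the `TAKEN_USERNAMES` set, and the `call` helper are all made up for illustration.

```python
import io
from html import escape
from urllib.parse import parse_qs

# Hypothetical stand-in for a real account store.
TAKEN_USERNAMES = {"theory"}


def render_form(error=""):
    """Return the form HTML as bytes, with an error notice if one is given."""
    notice = '<p class="error">{}</p>'.format(escape(error)) if error else ""
    form = '<form method="post"><input name="username"><button>Go</button></form>'
    return (notice + form).encode("utf-8")


def app(environ, start_response):
    html = [("Content-Type", "text/html; charset=utf-8")]
    if environ["REQUEST_METHOD"] != "POST":
        start_response("200 OK", html)
        return [render_form()]

    length = int(environ.get("CONTENT_LENGTH") or 0)
    fields = parse_qs(environ["wsgi.input"].read(length).decode("utf-8"))
    username = fields.get("username", [""])[0]

    if username in TAKEN_USERNAMES:
        # Failure: no redirect. Show the form again with 409 Conflict so
        # the user can fix the problem and resubmit.
        start_response("409 Conflict", html)
        return [render_form("Username already taken: " + username)]

    # Success: 303 See Other makes the browser follow up with a GET, so a
    # page reload cannot resubmit the POST.
    start_response("303 See Other", [("Location", "/thanks")])
    return []


def call(method, data=b""):
    """Invoke the app directly, without a server, for a quick check."""
    seen = {}

    def start_response(status, headers):
        seen["status"], seen["headers"] = status, dict(headers)

    environ = {
        "REQUEST_METHOD": method,
        "CONTENT_LENGTH": str(len(data)),
        "wsgi.input": io.BytesIO(data),
    }
    body = b"".join(app(environ, start_response))
    return seen["status"], seen["headers"], body
```

Calling `call("POST", b"username=theory")` exercises the conflict branch, and any other username exercises the redirect.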
Note that for XMLHttpRequests, I’m not redirecting at all, but simply returning the proper status code (success or conflict, generally) and a fragment to be used to show an error message or “success.” The fragment is in HTML, JSON, or plain text, depending on the requestor’s preferred type.
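The XMLHttpRequest case might look something like this sketch. It is an assumption-laden illustration, not the real handler: the function names are invented, the JSON shape is made up, and the `Accept` parsing ignores q-values for brevity.

```python
import json
from html import escape

SUPPORTED_TYPES = ("text/html", "application/json", "text/plain")


def preferred_type(accept_header):
    """Pick the first supported media type listed in the Accept header.

    Simplification: q-values are ignored and HTML is the fallback.
    """
    for part in accept_header.split(","):
        media_type = part.split(";")[0].strip().lower()
        if media_type in SUPPORTED_TYPES:
            return media_type
    return "text/html"


def xhr_response(ok, message, accept_header):
    """Return (status, content type, body) for an XMLHttpRequest.

    No redirect is involved: just a status code and a small fragment.
    """
    status = "200 OK" if ok else "409 Conflict"
    content_type = preferred_type(accept_header)
    if content_type == "application/json":
        body = json.dumps({"message": message})
    elif content_type == "text/plain":
        body = message
    else:
        css_class = "success" if ok else "error"
        body = '<p class="{}">{}</p>'.format(css_class, escape(message))
    return status, content_type, body
```

The page script can then branch on the status code and drop the fragment into place without a full reload.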
I think this works pretty well, and I’m pleased to be making good use of HTTP. Does it make sense to you?