The PGXN infrastructure is currently made up of four parts:
The network of mirrors, derived from the master mirror, and synchronized on various schedules via rsync (host a mirror).
PGXN Manager (code) is the core of the network. It provides the interface for users to upload releases, processes those releases, indexes them, and puts them on the master mirror. Details below.
PGXN API (code) is a mirror server with benefits. Once an hour, it rsyncs from the master mirror, and does extra processing of new and modified files, notably full-text indexing. I’ll write up some details next week.
PGXN Site (code) powers the main site. It’s a thin wrapper around the API server, using WWW::PGXN to fetch JSON files and convert them to HTML. I’ll write more about this bit next week, too.
Some details on Manager. This is by far the most complicated part of the system. Which is funny, because I hadn’t anticipated that when I started work on PGXN (I’d estimated half as many hours as for implementing the site). But as I worked through design issues and wrote the code, the need for that complexity became apparent — and not just because it’s the only part that offers authentication. The main reason it’s complex is so that no other part needs to be.
Allow me to explain. People can upload almost anything as a distribution. So long as the META.json adheres to the spec, the rest can be just about anything. But the upload doesn’t necessarily end up on the network unmodified. Sure, if you follow the guidelines of the HOWTO, what you uploaded will be exactly what ends up on the network. But I didn’t want to be that strict about what PGXN Manager would accept. So in addition to verifying the structure of your META.json file, PGXN::Manager::Distribution also:
Extracts the archive. If it’s not a zip file, a zip archive is created from the contents of the uploaded file. You can upload any kind of archive readable by Archive::Extract. The currently supported formats are: .tar, .tar.gz, .gz, .Z, .tar.bz2, .tbz, .bz2, .zip, .xz, .txz, .tar.xz and .lzma. By always converting to a zip file, PGXN client apps can be quite simple, not having to worry about the archive format.
Validates the META.json. This part isn’t perfect, yet. I fixed a bug today where it would die and return a 500 on a missing version number. It’d probably be worthwhile to adapt CPAN::Meta::Validator to validate PGXN META.json files at some point, both for Manager and for developers wanting to validate before uploading.
Normalizes all version numbers in the META.json into semantic versions. You can specify the distribution version, prerequisite versions, and extension versions as simple numbers and they’ll be converted to semantic versions. A version like “1.20”, for example, becomes “1.2.0”. See the SemVer documentation for details on how versions are normalized via the declare() method. This normalization is done so that client applications will get known valid semantic versions to compare when determining dependencies. However, it’s best that they be semantic versions to begin with. Normalized versions will be written back to the archive META.json file (with the “generated_by” key updated to reflect that PGXN Manager regenerated the file). If no versions need normalizing, the archive META.json will be left alone.
Makes sure the zip archive has a directory prefix named "$dist-$version/". If the archive has no directory prefix, or if the prefix is not "$dist-$version/", the archive is rewritten with that prefix. This ensures that the archive will always extract into a directory with the same name as the archive and not spray files all over your desktop when you unzip it.
Copies or writes out a new zip file named "$dist-$version.pgz". Think of .pgz as “PostgreSQL Zip” or, if you’d rather, “PGXN Zip”. Either way, it’s just a zip archive.
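To make the last two steps concrete, here is a minimal Python sketch of the prefix check and the .pgz naming. It is only an illustration: PGXN Manager itself is written in Perl, and the function names `ensure_prefix` and `pgz_name`, along with the rule for stripping an existing top-level directory, are my own assumptions rather than anything taken from PGXN::Manager::Distribution.

```python
import io
import zipfile


def pgz_name(dist: str, version: str) -> str:
    """File name the processed archive is stored under."""
    return f"{dist}-{version}.pgz"


def ensure_prefix(zip_bytes: bytes, dist: str, version: str) -> bytes:
    """Return zip data whose files all live under "<dist>-<version>/"."""
    prefix = f"{dist}-{version}/"
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as src:
        names = src.namelist()
        if names and all(n.startswith(prefix) for n in names):
            return zip_bytes  # already laid out correctly, so copy as-is

        # Assumption: if everything sits under one other top-level
        # directory, that directory is replaced rather than nested.
        tops = {n.split("/", 1)[0] for n in names if "/" in n}
        has_root_files = any("/" not in n for n in names)
        strip = ""
        if len(tops) == 1 and not has_root_files:
            strip = next(iter(tops)) + "/"

        out = io.BytesIO()
        with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
            for info in src.infolist():
                if info.is_dir():
                    continue  # directories are implied by the file paths
                rel = info.filename[len(strip):]
                dst.writestr(prefix + rel, src.read(info))
        return out.getvalue()
```

Given an upload whose files sit under `pair/`, calling `ensure_prefix(data, "pair", "0.1.0")` would yield an archive whose files sit under `pair-0.1.0/`, ready to be stored as `pgz_name("pair", "0.1.0")`.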
That processing done, with a good META.json and zip archive, the JSON, username, and SHA1 of the zip archive are handed off to the database for more processing. The add_distribution() database function does all the heavy lifting here. It:
Parses the JSON string, validates that all required keys are present, and normalizes version numbers. Yes, this is redundant, but I don’t think I need to lecture the readers of this blog about database integrity. :-)
Creates a new metadata structure and stores all the required and many of the optional meta spec keys, as well as the SHA1 of the distribution file, the date, and the user’s nickname.
Sets the “release_status” to “stable” if there was no status in the original JSON.
Adds a “provides” section to the metadata if none was included in the original JSON. In such a case, it assumes that the distribution contains one extension and that it has the same name and version as the distribution itself.
Validates that the uploading user is owner or co-owner of all provided extensions. If no one is listed as owner of one or more included extensions, the user will be assigned ownership. If the user is not owner or co-owner of any included extensions, an exception will be thrown.
Records the distribution, extensions, and tags in the database.
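The defaulting and ownership rules in that list are easier to see in code. The real work happens inside the add_distribution() database function, so treat the following Python as a rough sketch under stated assumptions: the helper names, the in-memory `owners` mapping, and the exact shape of the “provides” entry are all hypothetical, and I have read the ownership rule as "reject the upload if someone else owns any included extension".

```python
import copy


def apply_defaults(meta: dict) -> dict:
    """Fill in the keys that are added when the upload omits them."""
    meta = copy.deepcopy(meta)
    meta.setdefault("release_status", "stable")
    if not meta.get("provides"):
        # Assume one extension with the distribution's name and version.
        meta["provides"] = {meta["name"]: {"version": meta["version"]}}
    return meta


def check_ownership(owners: dict, user: str, extensions) -> None:
    """Claim unowned extensions for `user`; reject ones owned by others.

    `owners` maps an extension name to the set of its owner and
    co-owner nicknames (a stand-in for the real database tables).
    """
    extensions = list(extensions)
    # First pass: fail before changing anything.
    for ext in extensions:
        current = owners.get(ext)
        if current and user not in current:
            raise PermissionError(f"{user} does not own extension {ext}")
    # Second pass: assign ownership of extensions nobody owns yet.
    for ext in extensions:
        if not owners.get(ext):
            owners[ext] = {user}
```

Checking everything before assigning anything mirrors what a database transaction would give you for free: a rejected upload leaves no half-claimed extensions behind.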
Once all this work is done, add_distribution() returns all the JSON that needs to be written to the mirror. These files make up the “index” on the network, and include:
META.json file (example). This file is derived from the META.json included in the archive (example), but reflects all the normalization changes and added keys outlined above.
It also returns JSON for network statistics files. These are updated every time a new release is uploaded:
dist.json lists the 56 most recent releases and has a count of all distributions and of releases of those distributions.
extension.json lists the 56 most recent extension releases and a count of all extensions on the network.
user.json lists the 56 most prolific users (based on the number of distributions) and a count of distributions and releases for each.
tag.json lists the 56 most popular tags (measured by the number of distributions they’re associated with) and a count of all tags on the network.
summary.json has basic summary information about the network, which is just counts of distributions, releases, extensions, users, tags, and mirrors.
If you think that’s a lot of data to be updated, you’re right! But since releases are relatively infrequent (a couple a day at the moment), it’s best to generate all this stuff as static files that are rsynced to all mirrors. In this way, any mirror can function as a very simple, lightweight REST API. And indeed, that’s just how the planned PGXN client will behave. The WWW::PGXN module already provides the interface it will use.
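For a feel of why static files are enough, here is a small Python sketch that regenerates a dist.json-style file on each upload and reads it back the way a client would. The JSON keys (`count`, `releases`, `recent`), the record fields, and the flat file layout are invented for illustration; they are not the actual format written to the mirror.

```python
import json
from pathlib import Path


def build_dist_stats(releases: list, limit: int = 56) -> dict:
    """Summarize releases: totals plus the `limit` most recent ones.

    Each release is a dict with "dist", "version", and "date" keys,
    where "date" is an ISO 8601 UTC string (so it sorts as text).
    """
    recent = sorted(releases, key=lambda r: r["date"], reverse=True)
    return {
        "count": len({r["dist"] for r in releases}),  # distinct distributions
        "releases": len(releases),                    # all releases
        "recent": recent[:limit],
    }


def write_stats(mirror_root: Path, name: str, data: dict) -> Path:
    """Write one statistics file into the mirror root as static JSON."""
    path = mirror_root / f"{name}.json"
    path.write_text(json.dumps(data, indent=2), encoding="utf-8")
    return path


def read_stats(mirror_root: Path, name: str) -> dict:
    """What a client does: fetch a file and parse it. No server logic."""
    return json.loads((mirror_root / f"{name}.json").read_text(encoding="utf-8"))
```

The write side runs once per upload on the master; everything after that is rsync and plain file reads, which is what lets any mirror answer client requests.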
I guess that was a lot of information. Let this be a reference document for interested hackers, then. The core functionality is all there, but there’s a lot more to be done:
rrr (#10).
$nickname@pgxn.org will be forwarded to a user’s actual address. This would allow us to remove literal email addresses from the JSON files and the site (the site obfuscates them, but still…). Anyone got some good postfix chops for this? The users table is quite simple (#13).
Want to help out? Fork PGXN Manager and have at it. Hell, at this point I’d really appreciate a code review, as I’m pretty sure there’s only been one set of eyes on this code so far.
Next week, I plan to blog about
rsync
But given how these things go, and how I need to start writing mirror API and API server documentation, it might take me a while longer to get to them all.
Last week I created the PGXN Manager interface for requesting a PGXN user account. It looks like this:

I really like the placeholder support in HTML 5, here nicely rendered by Safari. I’ve also used the jQuery Validation plugin to validate the form fields. So if you try to submit an incomplete form, it will complain before submitting, like so:

The back end does similar validation if you have JavaScript disabled, so it should degrade nicely. I think I might add a Twitter field so @pgxn can credit users for their uploads, but otherwise this is done.
As with PAUSE, an administrator must accept or reject an account request. Once your account is approved (likely unless you’re a spammer), you’ll be able to upload distributions (I’m going to do that part today). I finished the user admin interface yesterday. Here’s a screen snap:

Some details. PGXN Manager will be hosted at http://manager.pgxn.org/. It uses basic auth for authentication; logged-in users will access https://manager.pgxn.org/auth. Administrators will have an extra menu item to this screen, which will allow them to accept or reject account requests. Its URI is /auth/admin/moderate. The admin can click the “Play” button to see the requestor’s note explaining why he should get an account. It’s a popover enabled by some jQuery code and looks like this:

Once an admin has read the request, she can accept it by clicking the blue checkmark icon, or reject it by clicking the red minus icon. The former links to /auth/admin/accept/{nickname} and the latter to /auth/admin/reject/{nickname}. The submits are done by jQuery async requests by default, but if you have JavaScript disabled they will submit as usual and the back end will process the request and simply redirect to the moderation screen.
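The URL scheme is simple enough to sketch. This Python snippet is not how PGXN Manager dispatches requests; it only illustrates how the two moderation URIs map onto an action and a nickname, and what the non-JavaScript path returns. The `pending` dictionary and the status strings are made up for the example.

```python
import re

# Matches /auth/admin/accept/{nickname} and /auth/admin/reject/{nickname}.
MODERATE = re.compile(r"^/auth/admin/(accept|reject)/([^/]+)$")


def moderate(path: str, pending: dict) -> str:
    """Apply a moderation request and return the redirect target.

    `pending` maps nicknames to a status string; it stands in for the
    users table.
    """
    match = MODERATE.match(path)
    if match is None:
        raise ValueError(f"not a moderation URI: {path}")
    action, nickname = match.groups()
    if nickname not in pending:
        raise KeyError(nickname)
    pending[nickname] = "active" if action == "accept" else "rejected"
    # Without JavaScript, the browser is sent back to the moderation screen.
    return "/auth/admin/moderate"
```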
This works very well, I think. It’s an attractive interface and degrades reasonably well. (Well, you can’t read the request details if you have JavaScript disabled, but the requests work nicely). The jQuery code fades out a row after it has been accepted or rejected, so it’s easy for an admin to do a bunch of moderation all at once. And finally, the interface is driven by the URLs, so I think it’s pretty RESTful. The only ones I’m not sure about are the accept/reject URLs, because they have action names in them (“accept” and “reject”). Is that RESTful? Or should they use GET query strings or something?
Okay, on to the upload interface. Wish me luck!