So I’m designing the full-text indexing for the PGXN search site. I’m modeling it on CPAN Search, which has been great. There are four search options:
The documentation search is the one I’m perhaps least sure about. It assumes that each extension in a distribution will have documentation. But so far that has not really been the practice for PostgreSQL extensions. Most folks seem to stick the documentation in the README. And even then it can be almost nothing. So a search for “count nulls” probably would not find the “countnulls” extension, because there is no documentation. What should I do about this? I’m thinking one of:
Encourage folks to write documentation. I’m going to do this anyway, because the docs will really help the visibility of an extension on the site. It looks like this. If you have no docs for an extension, your extension will not appear in the search results (or perhaps it might, but link to the distribution).
If there is no documentation for an extension in a distribution, index the README as the documentation. I’m not really keen on this idea, because the README should describe the distribution, how to install it, etc. I’m planning to use it in the distribution-specific index. Documentation of the extension should be more about how the extension works, what its interface is, etc. Or so it seems to me, at least (I’m admittedly biased toward this practice from CPAN modules). But at least with this approach there would be a link to “documentation” for an extension on the search site.
Erm, not really thinking of any other options. I feel pretty strongly that folks should write docs for their extensions, as much as possible, and I’ve set things up so that, from PGXN’s point of view, at least, you can write documentation in whatever format you like (assuming the format is supported by or added to Text::Markup), as long as the files are in a doc/ or docs/ directory. I want it to be as easy as possible. But I also want there to be decent search results ASAP.
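To make the trade-off between the two options concrete, here is a minimal Python sketch of how an indexer might choose what to treat as documentation. It is only an illustration, not PGXN’s actual code: the function name docs_to_index, the readme_fallback flag, and the assumption that paths are relative to the distribution root are all hypothetical.

```python
from pathlib import PurePosixPath

# Directory names mentioned in the post; everything else here is assumed.
DOC_DIRS = {"doc", "docs"}


def is_doc(path):
    """True if the file lives under a top-level doc/ or docs/ directory."""
    parts = PurePosixPath(path).parts
    return len(parts) > 1 and parts[0] in DOC_DIRS


def is_readme(path):
    """True if the file is a top-level README, with or without an extension."""
    parts = PurePosixPath(path).parts
    return len(parts) == 1 and parts[0].lower().split(".")[0] == "readme"


def docs_to_index(paths, readme_fallback=True):
    """Pick the files to index as documentation for one distribution.

    Real docs always win. The README is used only when there are no
    docs and the (hypothetical) fallback option is switched on.
    """
    docs = sorted(p for p in paths if is_doc(p))
    if docs:
        return docs
    if readme_fallback:
        return sorted(p for p in paths if is_readme(p))
    return []
```

With readme_fallback=False this behaves like the first option (no docs, no documentation search hit); with it set to True it behaves like the second.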
Comments?
I’ve started working on the main (search) site in earnest now. The basic layout is done, and I’m working on the distribution view (mockup). My thinking so far has been that I would simply serve a page for a request to, say, /dist/pgTAP/, and that page would use Ajax requests to fetch the data from the API server and display it. I think this will work pretty well except for one thing: 404s.
That is, if you request /dist/nonexistent/, then it will load a page with the HTTP status code 200 OK, but then, when the Ajax request 404s, it will show a “Not found” error message. That’s all well and good, but I’m wondering about the impact of two things:
Since the page itself won’t 404, search engines might index links to nonexistent extensions. Bad links won’t be that common, but they do happen, and then they tend to live forever.
If the search site uses Ajax to fetch the contents of a page via JSON (or, for documentation, as an HTML document it will put into a div), will the full content be properly indexed by search engines?
So these are serious questions, in my mind. Do we lose good search engine indexability when we load content dynamically?
Of course, I can instead write it so that the back end fetches stuff from the API server (and perhaps directly from the file system) and get ‘round these issues, but then it’s less of a cool example of the use of the API server.
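As a rough illustration of that back-end alternative, here is a minimal Python sketch in which the server asks the API for the distribution before it responds, so a missing distribution yields a real 404 rather than a 200 page with an error message. The names page_status and fetch_dist are hypothetical, as is the shape of the metadata; fetch_dist stands in for whatever call would actually reach the API server.

```python
import html
from http import HTTPStatus


def page_status(dist_name, fetch_dist):
    """Return (status, body) for a /dist/<name>/ request.

    fetch_dist is a hypothetical callable that returns a dict of
    distribution metadata, or None when the API has no such distribution.
    """
    meta = fetch_dist(dist_name)
    if meta is None:
        # Propagate the API's "not found" as the page's own status,
        # so crawlers never see a 200 for a nonexistent distribution.
        return HTTPStatus.NOT_FOUND, "Not found"
    # Render the content on the server so it is present in the HTML
    # that a search engine fetches.
    return HTTPStatus.OK, "<h1>{}</h1>".format(html.escape(meta["name"]))


# Usage with an in-memory stand-in for the API server:
fake_api = {"pgtap": {"name": "pgTAP"}}.get
status, body = page_status("nonexistent", fake_api)
```

A hybrid is also possible: the back end could do only the existence check to set the status code and still leave the rendering to Ajax, which would address the first concern but not the second.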
What do you think? Good advice much appreciated!