I'm making a new tag: #devski. I'll be posting future development updates to #devski and specifically not #hubski, unless it's sufficiently user-facing. So if you care about code updates, follow #devski. If not…carry on.
I expect a lot of my work to not be user-facing. I also expect a lot of my work to be on GitHub. Starting now.
https://github.com/hubski/hubski
What is this??! That's the C trigraph for a vertical bar. But that's not important right now. That is what I've been working on for the last couple weeks. Don't get too excited—it's an internal API, not a public one (see my previous to-do list).
Specifically, this code:
1. converts the internal s-expression publication data to PostgreSQL
2. serves the PostgreSQL publication data over an internal REST JSON API for raw internal publications, which accepts both reads and writes via GET/POST
3. adds Racket functions to read from and write to the REST interface, which can be called directly by the primary Hubski Arc service (a rough sketch of these follows)
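To give a flavor of item 3, here's a minimal sketch of what the read/write helpers might look like, using Racket's net/url and json libraries. The base URL, port, and path here are placeholders, not the actual internal API.

    #lang racket
    (require net/url json)

    ;; Placeholder base URL; the real internal service and path may differ.
    (define api-base "http://localhost:8080/publications/")

    ;; GET a publication by id and parse the JSON body into a jsexpr (a hash).
    (define (get-publication id)
      (call/input-url (string->url (format "~a~a" api-base id))
                      get-pure-port
                      read-json))

    ;; POST a publication (as a jsexpr) to the API and return the parsed response.
    (define (post-publication pub)
      (read-json
       (post-pure-port (string->url api-base)
                       (jsexpr->bytes pub)
                       '("Content-Type: application/json"))))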
At this point, it's about 95% complete. The big things that need to be done now are converting the static data, and writing tests and timings for converting the live data. The s-expressions have some eccentricities which I'm not 100% sure the conversion functions handle perfectly. Hence, tests.
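As a trivial example of the kind of round-trip test I mean (with stand-in converters and a made-up publication; the real functions have to deal with the eccentricities mentioned above):

    #lang racket
    (require rackunit)

    ;; Stand-in converters, for illustration only; the real ones have to handle
    ;; the publication format's quirks (missing fields, nested lists, and so on).
    (define (sexpr->jsexpr pub)
      (for/hasheq ([pair (in-list pub)])
        (values (first pair) (second pair))))

    (define (jsexpr->sexpr js)
      (for/list ([(k v) (in-hash js)])
        (list k v)))

    ;; A made-up publication record.
    (define sample '((id 12345) (by "someuser") (title "devski")))

    ;; sexpr -> json -> sexpr should preserve every field (order aside).
    (check-equal? (sort (jsexpr->sexpr (sexpr->jsexpr sample)) symbol<? #:key first)
                  (sort sample symbol<? #:key first))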
I'm not a Racket expert, so if you are, feel free to suggest improvements. We're not really looking for pull requests at the moment; maybe in the future, but right now, as small as this project is, it's probably about as fast to write as to review.
My to-do list is now:
1. Write tests validating sexpr–sql–json conversions.
2. Test converting the full publication file 'database' to SQL.
3. Convert hubski.com to the internal API service. This involves both converting to SQL, and converting the primary service to read from REST.
4. Convert more data. Publications are by far the biggest data we have, but there's a lot more. Hopefully the publications API will act as a framework, and the rest will go faster.
5. Create a public API, based on the internal API.
My long-term timeline hasn't changed. I'm conservatively estimating: 1,2 in the next couple weeks, 3 in a month or so, 4 by October or November, and 5 by the new year.
I have a few nonspecific things on my list too, like changing markdown→html parsing to use a library, and reviewing and publishing the Arc code. At some point, these to-do lists will appear as GitHub issues.
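For the markdown→html piece, one candidate is the third-party markdown package; something along these lines, assuming that package's parse-markdown:

    #lang racket
    (require markdown  ; third-party package: raco pkg install markdown
             xml)      ; for xexpr->string

    ;; Parse markdown text into x-expressions, then render those as an HTML string.
    (define (markdown->html text)
      (string-append* (map xexpr->string (parse-markdown text))))

    (markdown->html "Some *emphasis* and a [link](https://hubski.com).")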
Wheeee!! Shit just got real. Awesome work Rob!
So kind. For those of you that don't code, rob05c is making Hubski more scalable, making the data easier to use, and creating an API so that mobile apps and similar things can be built. akkartik, you'd probably dig #devski. So cool.
I'm super excited about this. Heck, I sent you a message probably a year ago asking about API possibilities. I'll be following #devski for sure! I have a few ideas I'd like to try once there is a good API.
It might not be a terrible idea to auto-increment a primary key on those reference tables, for whenever they get big. You'd just put a FK constraint on those one-to-many relationships in the other tables.

    create table if not exists "publication_shared_by" (
      id integer identity(1,1), --primary key will help when this table becomes large
      publication_id integer, -- fk into publication, NOT a pk, one to many
      username text --create a non-clustered index to the 'username' table or something
    );

I'm not sure if the identity(1,1) syntax is correct (I work primarily with a T-SQL system).
SERIAL is the PostgreSQL version. Right now, the Arc code creates its own ids; it will have to be changed to assume auto-increment. And yes, we may, eventually.

We also need primary and foreign key constraints, not null constraints, indices, and other things. I plan to add all of those after the initial conversion, once the live app is pulling from SQL. Having zero constraints initially makes the conversion faster, which is important because I plan to bring hubski down to convert. Right now, it takes about an hour. I could do it live, but that would take extra code, and an hour of downtime at 04:00 CDT is acceptable.

If you're referring to 'search_text' etc. when you say 'reference tables,' those will be going away when the real search solution is done. None of the other one-to-many tables will be very big.

I'm also on the fence about auto-increment. I'm not primarily a database dev, but as a developer, I generally disapprove of logic in databases. But that's a different argument.
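(For reference, the SERIAL version of your table would look roughly like this, sketched here through Racket's db library since that's handy; the connection parameters are placeholders.)

    #lang racket
    (require db)

    ;; Placeholder connection parameters.
    (define conn
      (postgresql-connect #:user "hubski" #:database "hubski" #:password "changeme"))

    ;; SERIAL is PostgreSQL's auto-increment. Key, not-null, and index constraints
    ;; would be layered on in a later pass, after the initial bulk conversion.
    (query-exec conn
      "create table if not exists publication_shared_by (
         id serial primary key,
         publication_id integer,
         username text
       )")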
Haha, I'd call you a blasphemer! Let the application generate the Id?! What kind of voodoo are you throwing out here?! (jk) I'm slowly finding my love for software development residing within the confines of database work. Most of our legacy application code is generally stable, and a lot of the tasks I find myself doing are data migration and query optimization. Before that there was a ton of handwritten replication (don't ever do it, please; but if you have to, give me a call), but that's all out the window with an in-house automated solution. Almost all my coworkers are database enthusiasts, a few of whom I would call fanatics. A couple possess a lifetime of wisdom within the subject that has left me really inspired. I'm super stoked about you throwing up the schema, thanks a lot!