In the spirit of the New Year, we thought it would be a good idea to give a little insight into how hubski works and what our plans are for its technical future.
First, a little history of hubski's codebase. As mk said in his State of Hubski, we started out as a clone of Hacker News, which is written in Arc, Paul Graham's experimental Lisp dialect. HN is a rather focused side project, with design and code decisions made for specific ends for the hacker community pg wants to see: from the text-centric, process-list-esque design to a highly simplified markdown implementation. This is what hubski has been built on, or hacked on in some cases.
One of the main issues that comes from this is that very few other people actively work on or with Arc. There are very few libraries written in Arc, which means that if we want to interface with most other software we need to write our own client library. The other option is to just code the functionality ourselves, but most of what we write would pale in comparison to other software that has had a ton of time put into it by brilliant communities.
How things work currently
When you first connect to hubski you hit nginx, which handles much of our static content as well as cached content. If you aren't logged in, nginx checks a Redis server to see whether that URL has been cached; otherwise it passes the request on to the application, a monolithic app that handles everything.
All of our data is stored in files, each of which is just an s-expression. The app loads some or all of the posts and comments into memory in a hashtable. At the point we're at now, we can't load the entirety of our data into memory due to issues with RAM size and expensive function calls. A lot of functions - such as detecting duplicate posts when submitting, searching, or retrieving feeds - are basically map calls across this entire working set. In addition, our working set is a bit of a memory leak: it has no way to manage its size, so it just keeps growing until we reset the app (which we do fairly often).
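To make that request flow concrete, here is a minimal sketch in Python. It is not hubski's actual nginx configuration or Arc code: the `handle_request` function, the plain dict standing in for Redis, and the `render_with_app` callback are all assumptions made for illustration.

```python
from typing import Callable, Dict, Optional


def handle_request(
    url: str,
    logged_in: bool,
    cache: Dict[str, str],                 # stand-in for the Redis server (assumption)
    render_with_app: Callable[[str], str], # stand-in for the monolithic app (assumption)
) -> str:
    """Serve a cached page to anonymous visitors, else fall through to the app."""
    if not logged_in:
        cached: Optional[str] = cache.get(url)
        if cached is not None:
            return cached  # cache hit: the app is never touched
    # Logged-in users and cache misses both end up at the application.
    return render_with_app(url)
```

The point of the arrangement is that anonymous traffic to popular pages can be answered without the app doing any work.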
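For illustration, one common way to keep an in-memory working set from growing without bound is to cap it and evict the least recently used entry. The Python sketch below shows that idea only; it is not how hubski works today, and `BoundedWorkingSet`, `max_items`, and `load_from_disk` are hypothetical names.

```python
from collections import OrderedDict
from typing import Any, Callable, Hashable


class BoundedWorkingSet:
    """A hashtable of loaded items that evicts the least recently used one."""

    def __init__(self, max_items: int, load_from_disk: Callable[[Hashable], Any]):
        self._max_items = max_items
        self._load = load_from_disk  # hypothetical: reads one item's file
        self._items: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get(self, item_id: Hashable) -> Any:
        if item_id in self._items:
            # Mark as most recently used and return the in-memory copy.
            self._items.move_to_end(item_id)
            return self._items[item_id]
        item = self._load(item_id)
        self._items[item_id] = item
        if len(self._items) > self._max_items:
            # Drop the entry that has gone unused the longest.
            self._items.popitem(last=False)
        return item

    def __contains__(self, item_id: Hashable) -> bool:
        return item_id in self._items

    def __len__(self) -> int:
        return len(self._items)
```

A cap like this trades the periodic reset for occasional reloads from disk. It does not help the functions that map across the whole working set, which still only see whatever happens to be loaded.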
Plans for the future
There are a lot of things that we would like to do - things that have been asked for by the community. Ideally we would have a system set up that automatically scales to deal with traffic and that makes adding services and features in any language relatively simple.
One of the first steps is going to be to give hubski a proper database. Separating the database from the hubski app will be a big step towards being able to add other services. As an example, this will allow us to more easily integrate mature search solutions such as Elasticsearch or Solr into hubski. Right now every post has a list of words which serves as its search text, and we map through our working set. This of course limits us to searching only what we can store in memory, which is unfortunate.
The next step will be to make hubski horizontally scalable, i.e. allow us to run multiple copies of hubski and load balance between them. One of the things that this will allow us to do is run different versions of hubski at the same time (like having a beta version that beta testers can use).
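As a rough illustration of the difference, the Python sketch below contrasts a scan over every post's word list (the approach described above) with an inverted index, the core data structure behind Lucene-based engines such as Elasticsearch and Solr. The sample `posts` data and the function names are made up for the example, and a real search engine does far more (tokenizing, ranking, storing the index on disk).

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical sample data: post id -> the list of words used as its search text.
posts: Dict[int, List[str]] = {
    1: ["arc", "lisp", "hubski"],
    2: ["redis", "cache", "nginx"],
    3: ["hubski", "search", "redis"],
}


def linear_search(posts: Dict[int, List[str]], word: str) -> List[int]:
    """Scan every post's word list: cost grows with the whole working set."""
    return sorted(pid for pid, words in posts.items() if word in words)


def build_inverted_index(posts: Dict[int, List[str]]) -> Dict[str, Set[int]]:
    """Map each word to the set of post ids that contain it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for pid, words in posts.items():
        for word in words:
            index[word].add(pid)
    return index


def indexed_search(index: Dict[str, Set[int]], word: str) -> List[int]:
    """Look the word up directly instead of visiting every post."""
    return sorted(index.get(word, set()))
```

Both functions return the same post ids for a given word; the index simply answers without touching posts that don't match.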
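A minimal sketch of what that could look like, in Python: requests rotate round-robin over a pool of identical app copies, and beta testers are sent to a separate pool running the beta version. This is only one possible arrangement; the `Balancer` class, the backend names, and routing by username are assumptions, not a description of a planned design.

```python
from itertools import cycle
from typing import Iterable


class Balancer:
    """Round-robin over app copies, with a separate pool for beta testers."""

    def __init__(
        self,
        stable_backends: Iterable[str],
        beta_backends: Iterable[str],
        beta_testers: Iterable[str],
    ):
        # Both pools are assumed to be non-empty.
        self._stable = cycle(list(stable_backends))
        self._beta = cycle(list(beta_backends))
        self._beta_testers = set(beta_testers)

    def pick(self, username: str) -> str:
        """Return the backend that should handle this user's next request."""
        pool = self._beta if username in self._beta_testers else self._stable
        return next(pool)
```

For this to work, any copy must be able to serve any request, which is one reason the data has to live in a shared database rather than in each copy's memory.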
What this all means
Hopefully this will give some idea of the hurdles we face when developing hubski. The ultimate goal of our work will be to make hubski's code into as much of a fertile soil as hubski is. My dream is this:
1. Have hubski run and scale on its own while having the ability to add functions and services on the fly with a plug-and-play simplicity.
2. Have a development environment which can be set up in a short amount of time so open source developers can help develop hubski or just hack away on their own.
3. Have a nice RESTful API for people who want to build things with and for hubski.
There will be a lot of questions to answer along the way. What is an appropriate API policy, one that allows people to be creative (and we have a very creative community) while maintaining a healthy long-term relationship with the people who choose to use it? Which parts of hubski would we want to open source, and under what license? At the end of it all, I do think that we will be in a much better place. Even those of you who have no interest in the technical details behind hubski will benefit: we will, for instance, be able to respond better to your feedback.
I'm excited for the future of hubski and I keep this goal in mind whenever I'm working. We have a lot of potential as a community and I firmly believe this will give us a firm foundation.
Thanks not only for taking the time to make Hubski a better place through constructing it, but for taking the time to be transparent about it too!
"3. Have a nice RESTful API for people who want to build things with and for hubski."
Awww yiss. I think quite a few people have been waiting for this. Overall, this sounds like a great plan. I'm not sure how much back-end work is required for hubski (sounds like a lot), but this should certainly reduce the amount of maintenance work required, right? Which would then lead to more development of user-facing features.
Thanks for the insight! Since I'm new here, it helps me understand the platform better. What I def can say is that it's way more transparent, with a lighter overview, than Reddit. Kudos and keep it up!
"What type of database are you considering?"
We'll probably go with MongoDB, mainly because there is a client library written in Racket. Arc is kind of built on Racket (technically I believe it was originally written on top of MzScheme, but we use Racket to run it now), so we should be able to cross define the client library into Arc and use it. Aside from that, my considerations were that it needs to be schema-less (we like to play around a lot and add new features) and scalable from one server to n (things like Cassandra typically require a minimum of 3 servers to start).
mk on advertising, last year. Anyone know how close we are, cost-wise, to needing to implement any sort of subscription system?
This is great news! As a small-time developer and a new member (by the way, I love the site so far!), I would love to see these additions.