Yesterday, I fixed a performance bug in post pages, where they were doing an expensive 'score' query for every comment. They now do a single query for all scores. Should be faster, especially for posts with a lot of comments.
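To illustrate the kind of change described above, here is a minimal sketch in Python with SQLite. The `comments` and `votes` tables and both function names are assumptions made up for this example; they are not Hubski's actual schema or code.

```python
import sqlite3

# Hypothetical schema for illustration only; not Hubski's real tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER, body TEXT);
    CREATE TABLE votes (comment_id INTEGER, value INTEGER);
    INSERT INTO comments VALUES (1, 10, 'first'), (2, 10, 'second'), (3, 10, 'third');
    INSERT INTO votes VALUES (1, 1), (1, 1), (2, 1);
""")

def scores_per_comment(conn, post_id):
    """Slow pattern: one score query for every comment on the post."""
    ids = [row[0] for row in conn.execute(
        "SELECT id FROM comments WHERE post_id = ? ORDER BY id", (post_id,))]
    return {
        cid: conn.execute(
            "SELECT COALESCE(SUM(value), 0) FROM votes WHERE comment_id = ?",
            (cid,)).fetchone()[0]
        for cid in ids
    }

def scores_batched(conn, post_id):
    """Faster pattern: a single grouped query returns every comment's score."""
    rows = conn.execute("""
        SELECT c.id, COALESCE(SUM(v.value), 0)
        FROM comments AS c
        LEFT JOIN votes AS v ON v.comment_id = c.id
        WHERE c.post_id = ?
        GROUP BY c.id
    """, (post_id,))
    return dict(rows)
```

Both functions return the same mapping of comment id to score. The batched version makes one round trip to the database regardless of how many comments the post has.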
Today, I migrated user notifications to SQL. That was the last list in the users structure. A list requires a one-to-many table, which made loading and saving the whole profile (something the app does needlessly and prolifically) slow.
So, the performance and timeout issues when I tried to move users to SQL in May should be resolved. Saving the users structure will be only a single query on a single table now.
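As a rough sketch of why this helps, assume a hypothetical `users` table holding only scalar fields, with notifications kept in their own table. The column names and helper functions below are invented for illustration and are not Hubski's real schema.

```python
import sqlite3

# Hypothetical schema: the profile has no list fields left, so it fits one row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id TEXT PRIMARY KEY, email TEXT, bio TEXT);
    CREATE TABLE notifications (user_id TEXT, message TEXT);
""")

def save_user(conn, user):
    """Save the whole profile with one statement on one table."""
    conn.execute(
        "INSERT OR REPLACE INTO users (id, email, bio) VALUES (:id, :email, :bio)",
        user)

def add_notification(conn, user_id, message):
    """Notifications are written on their own, not on every profile save."""
    conn.execute("INSERT INTO notifications VALUES (?, ?)", (user_id, message))

save_user(conn, {"id": "foo", "email": "foo@example.com", "bio": "hello"})
save_user(conn, {"id": "foo", "email": "foo@example.com", "bio": "updated"})
add_notification(conn, "foo", "bar replied to your comment")
```

With the list moved out, saving a profile no longer means rewriting a set of child rows each time.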
It's still a big change, so I still need to do it at a time when I'm available to fix things rapidly for a day or two. I was really hoping to do it before I took possession of the house I'm buying, because I'll be busy fixing things on the house for a while. Not sure that's going to happen.
The plan is still to convert users to SQL, and with everything in SQL, it becomes easy to convert each Arc function to Racket one at a time.
As always, questions welcome.
First of all, congratulations on buying a new house! A few questions from a curious newbie: 1. What makes SQL better than whatever storage method you were using before? 2. Why is Hubski being moved to Racket? Isn't Arc built on top of Racket already? Is the shift meant to increase performance, like moving JavaScript-based code to C?
Data was originally stored as serialised s-expressions. So, files containing e.g. `(publication (id 1234) (user foo) (tag bar) (tag2 baz) …)`. This worked because all data was loaded into memory on startup. Keeping everything in memory is both unscalable and expensive to host. If the data isn't loaded on start and kept in memory, it's untenably slow to load and search files like that for the specific data you need, when you need it. SQL is fast to query. SQL is also more flexible: we can change SQL servers, or even pay for separate SQL hosting if it's cheaper. Some of our data fits a 'NoSQL' document store model, but not all, so we'd have to keep two databases, or deal with non-relational pain for our relational data. Further, if we think we need the performance of non-relational storage, PostgreSQL is still faster than Mongo. Hubski is moving to Racket for flexibility. Arc is a terrible language. It's poorly designed, even more poorly documented, and completely lacks many features, such as SQL support. Racket is very well documented, and has libraries for everything we need. While mk has been writing Arc for years and has a pretty good handle on it, for everyone else, it's a nightmare to figure out what obscure, undocumented macros and functions do. Having known neither before I started on Hubski, it easily takes me 5–10× as long to write Arc as Racket. Arc is also poorly maintained; if there were a bug in the interpreter, we'd likely be stuck with it. Racket is actively maintained by a large community. In short, Arc is dead. We're moving to a live language.
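The difference between searching loaded records and querying SQL can be sketched as follows. This is an illustration under assumptions: the list of dicts stands in for parsed s-expression files, and the `publications` table, its `username` column, and the index name are hypothetical.

```python
import sqlite3

# Stand-in for parsed s-expression files: every record has to be loaded
# before any of them can be searched.
publications = [
    {"id": 1234, "username": "foo", "tag": "bar"},
    {"id": 1235, "username": "qux", "tag": "baz"},
    {"id": 1236, "username": "foo", "tag": "baz"},
]

def ids_by_user_scan(records, username):
    """Linear scan over everything held in memory."""
    return [r["id"] for r in records if r["username"] == username]

# Hypothetical SQL equivalent: the data stays on disk (in memory here only
# to keep the example self-contained) and an index narrows the search.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE publications (id INTEGER PRIMARY KEY, username TEXT, tag TEXT)")
conn.execute("CREATE INDEX publications_username ON publications (username)")
conn.executemany(
    "INSERT INTO publications VALUES (:id, :username, :tag)", publications)

def ids_by_user_sql(conn, username):
    """Indexed lookup: the database reads only the matching rows."""
    rows = conn.execute(
        "SELECT id FROM publications WHERE username = ? ORDER BY id", (username,))
    return [row[0] for row in rows]
```

Both functions return the same ids, but only the scan requires the full data set to be resident in the application's memory.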
Are we talking RAM here? I can imagine it being slow when handled with HDD and the response times, but the idea of being loaded into memory and still being slow doesn't seem fitting. Were/are there any alternatives to Racket that you see fit for something like Hubski?
Yes. It isn't slow when loaded in RAM. But we don't want to keep it loaded in RAM, because hosting with a lot of RAM is more expensive. Also, every new post needs more RAM, so the longer the site runs and the more people post, the more RAM it needs. HDD space is much cheaper than RAM, so we want to keep all the data on the HDD. We can do that with SQL, which can quickly search and load specific data from the HDD as needed. We can't do that with the current s-expression files. As for alternatives to Racket: yes, but mk particularly likes Lisp, and I don't disagree. Common Lisp and Clojure would also be suitable; they're both comprehensive, popular, and well maintained. But since Arc is built on Racket, we can call Racket directly from the existing Arc, and thus move the site to Racket one function at a time. With any other language, we'd have to rewrite the whole site at once before we could use it. Speaking as a software engineer, other languages I think would work well are Elixir, Go, Scala, and Nim. Another possibility would be a drop-in Content Management System such as WordPress or Plone; or a drop-in SQL REST API such as PostgREST, with SQL views and users, and the "site" entirely client-side HTML and JavaScript.
Yeah, I can see that. If it's no secret, how much RAM would you need to run Hubski entirely in RAM at the moment, and how big is an average post? You say "HDD", and I immediately think of SSD. It's not RAM, but it's much faster nevertheless. More expensive, too. Have you considered SSDs? I do see that it's better to use faster software than to buy faster hardware, but still. Would using an SSD instead of an HDD even make that big of a difference? Do you see any of the options you've mentioned being used as a base for Hubski some time later?
We currently pay for a VM with 16 GB of RAM. The app is currently using 2.5 GB. In the past it has used over 8 GB, so we feel the need for at least 16. We are running Hubski with everything in RAM at the moment. I've made a few small tables load dynamically, but over 95% of our data, all the publications (posts and comments), is still loaded into RAM on startup. Again, that's one of the things I'm working on fixing, and one of the primary motivations for moving to SQL. Moving to SQL doesn't get things out of RAM automatically. We still have to do additional refactoring to make things load as needed from the HDD. I said 'HDD', but our host uses SSDs exclusively, so we get that for free. As for using one of those alternatives as a base for Hubski later: probably not. One of these days I'll probably spend a weekend configuring WordPress to look like Hubski, and see how it turns out. But we want an API anyway: the most likely scenario is that, once all data is in SQL, we create an API that serves all the data needed by every page on hubski.com, then "rewrite" hubski.com as a static HTML page that queries the API, and the "main app" goes away.
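A minimal sketch of that direction, assuming a hypothetical `publications` table in which comments point at their post through `parent_id`. The function name, the columns, and the JSON shape are all invented for illustration; no such API is described in this thread.

```python
import json
import sqlite3

# Hypothetical data; in memory here only to keep the example self-contained.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.executescript("""
    CREATE TABLE publications (
        id INTEGER PRIMARY KEY, parent_id INTEGER, username TEXT, body TEXT);
    INSERT INTO publications VALUES
        (1, NULL, 'foo', 'A post'),
        (2, 1, 'bar', 'A comment'),
        (3, NULL, 'baz', 'Another post');
""")

def post_page_json(conn, post_id):
    """Fetch only the rows one page needs, at request time, and return JSON."""
    post = conn.execute(
        "SELECT id, username, body FROM publications WHERE id = ?",
        (post_id,)).fetchone()
    if post is None:
        return json.dumps({"error": "not found"})
    comments = conn.execute(
        "SELECT id, username, body FROM publications "
        "WHERE parent_id = ? ORDER BY id", (post_id,)).fetchall()
    return json.dumps(
        {"post": dict(post), "comments": [dict(c) for c in comments]})
```

Nothing is loaded at startup in this sketch; each request reads just the post and its comments, which is the as-needed loading described above.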
Oh, so you rent it. I thought you were hosting it on your own equipment. 2 x 8 GB is less than $100 on Amazon. I'm sure it's increased by the electricity costs but not so much as to justify outsourcing it (though I have no idea whether it's true in the US). Why not set up your own server?
There's more than just the cost of the RAM to factor in here. You would of course need an entire server capable of hosting Hubski, but even that isn't the expensive part. When you pay for a host, you're doing more than just renting hardware (though of course that's part of it). For Hubski to get the same quality of service on their own hardware as they do from a host, they would need:
- The hardware itself
- Space to store the hardware
- High-speed internet, which probably means you want that space to be in a datacenter
- Someone to be around all the time to guarantee the electricity stays on and the hardware is fixed if anything goes wrong
I don't think you could do all of that yourself for Hubski's funding goal of 2.4k / year. Even then, it would be hard to beat the reliability and flexibility of using a hosted server. If Hubski wanted to upgrade their server, they'd have to do it for each piece of the puzzle. You have to get to a massive scale before you even hit the break-even point of maintaining your own hardware, and even then some companies (Netflix) still opt to use hosted providers.
Yes, we don't host ourselves. Bandwidth is the big cost, not electricity. Almost all ISPs here forbid 'servers' on residential connections (though many don't strictly enforce it, unless the traffic is noticeable). It'd be at least $100-150/month, much more than hosting should be.