user-inactivated  ·  3074 days ago  ·  post: Devski Update: Notifications in SQL, Post Performance Fix

First of all, congratulations on buying a new house!

A few questions from a curious newbie:

1. What makes SQL better than whatever storage method you were using before?

2. Why is Hubski being moved to Racket? Isn't Arc built on top of Racket already? Is the shift meant to increase performance, like moving JavaScript-based code to C?

rob05c  ·  3074 days ago

Data was originally stored as serialised s-expressions, that is, files containing e.g. `(publication (id 1234) (user foo) (tag bar) (tag2 baz) …)`. This worked because all data was loaded into memory on startup. Keeping everything in memory is neither scalable nor cheap to host.

If the data isn't loaded on start and kept in memory, it's untenably slow to load and search files like that for specific data you need, when you need it. SQL is fast to query. SQL is also more flexible: we can change SQL servers, or even pay for separate SQL hosting if it's cheaper.
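
To make the problem concrete, here is a minimal Python sketch (not Hubski's actual Arc code) of what a lookup looks like when nothing is kept in memory. The names `parse_sexpr`, `to_record`, and `find_by_user` are hypothetical, and the sample records use only the fields from the example above. The point is that every record has to be read and parsed before it can be checked:

```python
import re

# One token is either a parenthesis or a run of non-space, non-paren characters.
TOKEN = re.compile(r"[()]|[^\s()]+")

def parse_sexpr(text):
    """Parse one s-expression into nested Python lists of strings."""
    tokens = TOKEN.findall(text)
    pos = 0

    def read():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while tokens[pos] != ")":
                items.append(read())
            pos += 1  # skip the closing paren
            return items
        return tok

    return read()

def to_record(expr):
    """Turn ['publication', ['id', '1234'], ...] into a dict of fields."""
    return {field[0]: field[1] for field in expr[1:]}

def find_by_user(lines, user):
    """Linear scan: parse every stored record, keep the ones that match."""
    matches = []
    for line in lines:
        record = to_record(parse_sexpr(line))
        if record.get("user") == user:
            matches.append(record)
    return matches

# Stand-in for the contents of the data files (assumed sample data).
lines = [
    "(publication (id 1234) (user foo) (tag bar) (tag2 baz))",
    "(publication (id 1235) (user qux) (tag bar) (tag2 baz))",
]
matches = find_by_user(lines, "foo")
print(matches)
```

The cost of `find_by_user` grows with the total number of stored records, not with the number of matches, which is why this only works well when everything is already parsed and sitting in memory.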

Some of our data fits a 'NoSQL' document store model, but not all, so we'd have to keep two databases, or deal with non-relational pain for our relational data. Further, if we think we need the performance of non-relational storage, PostgreSQL is still faster than Mongo.

Hubski is moving to Racket for flexibility. Arc is a terrible language. It's poorly designed, and even more poorly documented, and completely lacks many features, such as SQL support. Racket is very well documented, and has libraries for everything we need. While mk has been writing Arc for years and has a pretty good handle on it, for everyone else, it's a nightmare to figure out what obscure, undocumented macros and functions do. I knew neither language before I started on Hubski, and it easily takes me 5–10× as long to write Arc as Racket. Arc is also poorly maintained; if there were a bug in the interpreter, we'd likely be stuck with it. Racket is actively maintained by a large community. In short, Arc is dead. We're moving to a live language.

user-inactivated  ·  3074 days ago

    If the data isn't loaded on start and kept in memory, it's untenably slow to load and search files like that for specific data you need, when you need it.

Are we talking RAM here? I can imagine it being slow when read from an HDD, given the response times, but the idea of it being loaded into memory and still being slow doesn't seem right.

    In short, Arc is dead. We're moving to a live language.

Were/are there any alternatives to Racket that you see fit for something like Hubski?

rob05c  ·  3074 days ago

    Are we talking RAM here?

Yes. It isn't slow when loaded in RAM. But we don't want to keep it loaded in RAM, because paying for hosting with a lot of RAM is more expensive. Also, every post someone makes needs more RAM, so the longer the site runs and the more people post, the more RAM it needs.

HDD space is much cheaper than RAM, so we want to keep all the data on the HDD. We can do that with SQL, which can quickly search and load specific data from the HDD as needed. We can't do that with the current s-expression files.
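
As a rough illustration of "search and load specific data as needed", here is a small Python sketch using the standard-library `sqlite3` module. SQLite is only a stand-in for whatever SQL server the site actually uses, and the table, column, and index names are assumptions made up for the example:

```python
import sqlite3

# ":memory:" keeps the demo self-contained; a file path would store it on disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE publications (id INTEGER PRIMARY KEY, username TEXT, tag TEXT)"
)
# The index lets the database find one user's rows without scanning the table.
conn.execute("CREATE INDEX idx_publications_username ON publications (username)")
conn.executemany(
    "INSERT INTO publications (id, username, tag) VALUES (?, ?, ?)",
    [(1234, "foo", "bar"), (1235, "qux", "bar"), (1236, "foo", "baz")],
)
conn.commit()

def publications_by_user(conn, username):
    """Load only the rows for one user, using a parameterised query."""
    cur = conn.execute(
        "SELECT id, tag FROM publications WHERE username = ? ORDER BY id",
        (username,),
    )
    return cur.fetchall()

rows = publications_by_user(conn, "foo")
print(rows)

# Ask SQLite how it would run the lookup; the last column describes the plan.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id, tag FROM publications WHERE username = ?",
    ("foo",),
).fetchall()
for step in plan:
    print(step[-1])
```

Only the matching rows come back to the application, so memory use depends on what a page needs rather than on how much the site has ever stored.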

    Were/are there any alternatives to Racket that you see fit for something like Hubski?

Yes, but mk particularly likes Lisp, and I don't disagree. Common Lisp and Clojure would also be suitable; they're both comprehensive, popular, and well maintained. But since Arc is built on Racket, we can call Racket directly from the existing Arc, and thus move the site to Racket one function at a time. With any other language, we'd have to rewrite the site all at once, before we could use it.

Speaking as a software engineer, I think other languages that would work well are Elixir, Go, Scala, and Nim.

Another possibility would be a drop-in Content Management System such as WordPress or Plone; or a drop-in SQL REST API such as PostgREST, with SQL views and users, and the "site" entirely client-side HTML and JavaScript.

user-inactivated  ·  3073 days ago

    It isn't slow when loaded in RAM. But we don't want to keep it loaded in RAM, because paying for hosting with a lot of RAM is more expensive.

Yeah, I can see that. If it's no secret, how much RAM would you need to run Hubski entirely in RAM at the moment, and how big is an average post?

You say "HDD", and I immediately think of SSD. It's no RAM, but it's much faster nevertheless. More expensive, too. Have you considered SSD? I do see that it's better to use faster software than buy faster hardware, but still. Would using an SSD instead of an HDD even make that big a difference?

Do you see any of the possible programmatical constructions you've mentioned being used as a base for Hubski some time later?

rob05c  ·  3073 days ago

We currently pay for a VM with 16 GB of RAM. The app is currently using 2.5 GB. In the past it's used over 8 GB, so we feel the need for at least 16. We are running Hubski with everything in RAM at the moment. I've made a few small tables load dynamically, but over 95% of our data (all publications, i.e. posts and comments) is still loaded into RAM on startup.

Again, that's one of the things I'm working on fixing, and one of the primary motivations of moving to SQL. Moving to SQL doesn't get things out of RAM automatically. We still have to do additional refactoring to make things load as-needed from the HDD.

I said 'HDD', but our host uses SSDs exclusively, so we get that for free.

    Do you see any of the possible programmatical constructions you've mentioned being used as a base for Hubski some time later?

Probably not. One of these days I'll probably spend a weekend configuring WordPress to look like Hubski, and see how it turns out. But we want an API anyway: the most likely scenario is, once all data is in SQL, creating an API that serves all data needed by every page on hubski.com, and then "rewriting" hubski.com as a static HTML page that queries the API, and the "main app" goes away.
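
A very rough Python sketch of that idea, just to show the shape of it. Nothing here is the real Hubski API: the `/api/publications/<id>` path, the table layout, and the function names are all assumptions for illustration, and SQLite again stands in for the real SQL server. The handler is a plain function that maps a request path to a status code and a JSON body, which a static page could then fetch:

```python
import json
import sqlite3

def make_db():
    """Build a tiny demo database (assumed schema, sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE publications (id INTEGER PRIMARY KEY, username TEXT, body TEXT)"
    )
    conn.execute("INSERT INTO publications VALUES (1234, 'foo', 'hello')")
    conn.commit()
    return conn

def handle_get(conn, path):
    """Return (status, JSON body) for GET /api/publications/<id>."""
    prefix = "/api/publications/"
    not_found = (404, json.dumps({"error": "not found"}))
    if not path.startswith(prefix):
        return not_found
    raw_id = path[len(prefix):]
    if not raw_id.isdecimal():
        return not_found
    row = conn.execute(
        "SELECT id, username, body FROM publications WHERE id = ?",
        (int(raw_id),),
    ).fetchone()
    if row is None:
        return not_found
    return 200, json.dumps({"id": row[0], "user": row[1], "body": row[2]})

conn = make_db()
status, body = handle_get(conn, "/api/publications/1234")
print(status, body)
```

Wiring `handle_get` into an actual HTTP server is left out on purpose; the part that matters is that each request loads only the rows it needs from SQL and the page itself holds no data.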

user-inactivated  ·  3073 days ago

Oh, so you rent it. I thought you were hosting it on your own equipment. 2 x 8 GB is less than $100 on Amazon. I'm sure it's increased by the electricity costs but not so much as to justify outsourcing it (though I have no idea whether it's true in the US). Why not set up your own server?

byonic  ·  3073 days ago

There's more than just the cost of the RAM to factor in here.

You would of course need an entire server capable of hosting Hubski, but even that isn't the expensive part.

When you pay for a host, you're doing more than just renting hardware (though of course that's part of it). For Hubski to get the same quality of service on their own hardware as they do from a hosting provider, they would need:

- The hardware itself

- Space to store the hardware

- High speed internet, which probably means you want that space to be in a datacenter

- Someone to be around all the time to guarantee the electricity stays on & the hardware is fixed if anything goes wrong

I don't think you could do all of that yourself for Hubski's funding goal of 2.4k / year.

Even then, it would be hard to beat the reliability & flexibility of using a hosted server. If Hubski wanted to upgrade their server, they'd have to do it for each piece of the puzzle. You have to get to a massive scale before you even hit the break-even point of maintaining your own hardware, and even then some companies (Netflix) still opt to use hosted providers.

rob05c  ·  3073 days ago

Yes, we don't host ourselves. Bandwidth is the big cost, not electricity. Almost all ISPs here forbid 'servers' on residential connections (though many don't strictly enforce it, unless the traffic is noticeable). It'd be at least $100-150/month, much more than hosting should be.

user-inactivated  ·  3073 days ago

Oh yeah, I forget how expensive ISPs are in the US. Fuck your ISPs, man.

rob05c  ·  3072 days ago

Did I mention I work for Comcast? >_>

I'm on the Open Source CDN team. https://github.com/Comcast/traffic_control :D