comment by guyjin
guyjin  ·  3475 days ago  ·  post: Hubski Update: Welcome rob05c! + more tag info

"So yes, we are open-sourcing the code, and that will happen once we have made sure that there are no vulnerabilities exposed in the release."

I think you have that backwards.

rob05c  ·  3475 days ago

Indeed. And I both noted that as an advantage and stressed the need for daily backups before we open.

guyjin  ·  3474 days ago

Is there a case Hubski knows about, and I don't, where a public-facing website was attacked after open-sourcing its code?

kleinbl00  ·  3474 days ago

Reddit. Gimme a sec, I'll find the comment virus.

Here we go.

syzo  ·  3474 days ago

    To prevent double escaping of certain characters, they are run through MD5 after being escaped once, and then the MD5 is undone at the end. Since the MD5 is the same every time, someone figured out that if you just put the MD5 into your comment, it would be unescaped at the end.

I can only assume "MD5" is something to do with Markdown and not the hash function. Otherwise, what the fuck am I reading?

rob05c  ·  3472 days ago

    I can only assume "MD5" is something to do with Markdown and not the hash function.

Nope.

    what the fuck am I reading?

That is, in fact, the only sane response to Perl.
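The trap in the quoted description is easier to see in code. Below is a deliberately simplified, hypothetical model in Python, not Reddit's actual renderer: the protected characters (`<` and `>`), the `render_md5` name, and the toy `*word*` rule are all assumptions for illustration. The renderer writes the brackets of the tags it generates as MD5 tokens so the escaping pass skips them, then turns every token back into a raw character at the end. Because MD5 of a fixed character is always the same string, a user can type those tokens directly:

```python
import hashlib
import html
import re

# Hypothetical protected characters and their placeholders: the MD5 hex digest
# of each character. MD5 is deterministic, so the tokens never change and
# anyone can compute them.
TOKEN = {ch: hashlib.md5(ch.encode()).hexdigest() for ch in "<>"}


def render_md5(text: str) -> str:
    lt, gt = TOKEN["<"], TOKEN[">"]
    # Pass 1 (toy Markdown): *word* -> <em>word</em>, but the renderer writes
    # the brackets of its own tags as tokens so pass 2 will not escape them.
    text = re.sub(r"\*(\w+)\*", lambda m: f"{lt}em{gt}{m.group(1)}{lt}/em{gt}", text)
    # Pass 2: escape the user's text (&, <, > become entities).
    text = html.escape(text, quote=False)
    # Pass 3: "undo" the MD5 by turning every token back into a raw character.
    # It cannot tell tokens the renderer wrote from tokens the user typed.
    for ch, token in TOKEN.items():
        text = text.replace(token, ch)
    return text


print(render_md5("*hi* <b>"))  # <em>hi</em> &lt;b&gt;

# An attacker simply types the tokens; pass 2 leaves hex digits alone.
forged = TOKEN["<"] + "script" + TOKEN[">"]
print(render_md5(forged))  # <script>
```

The hole is that pass 3 trusts any token it finds. An unguessable per-request token (for example from Python's `secrets.token_hex`) makes forging impractical, and escaping in a single pass, so generated markup never needs protecting, removes the need for placeholders altogether.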

rob05c  ·  3472 days ago
This comment has been deleted.
syzo  ·  3472 days ago

Wait, hang on, I'm all confused. Is there a reason you can't just take the input, sanitize it, then lex and parse it like any other compiled language (which Markdown is: Markdown -> HTML), and spit out the HTML at the end? What's with running some characters through MD5 to "prevent double escaping"?

I admittedly haven't looked much into Markdown implementations, especially for forum-like sites such as Reddit.
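syzo's escape-first pipeline can be sketched in a few lines. This is a toy, assuming Python and a Markdown subset of only `**strong**` and `*em*`; `render_escape_first` is a hypothetical name, not any real library's API. Because the user's text is escaped exactly once before any markup is generated, the tags the Markdown pass emits are never escaped again and no placeholders are needed:

```python
import html
import re


def render_escape_first(text: str) -> str:
    # Step 1: escape everything the user typed, exactly once.
    text = html.escape(text, quote=False)
    # Step 2: a toy Markdown pass over the escaped text. The tags it emits are
    # created after escaping, so nothing escapes them again and no
    # placeholders are needed.
    text = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", text)
    text = re.sub(r"\*(.+?)\*", r"<em>\1</em>", text)
    return text


print(render_escape_first("**bold** and *it* <script>"))
# <strong>bold</strong> and <em>it</em> &lt;script&gt;
```

The catch with escaping first is that any Markdown syntax built from `<` or `>` (blockquote lines starting with `>`, `<http://...>` autolinks) must then be matched in its escaped form, which is one source of complexity in real implementations.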

mknod  ·  3472 days ago

Sanitizing it often causes its own problems, including losing things like spaces, or needing a ridiculous regex that depends on knowing what the user intends to input. For example, are we going to sanitize for "? What about “ or ❝?

The solution you've come up with runs into the old programming problem:

'A programmer has a problem that they solve with regex; now they have two problems.'

I don't necessarily agree with their solution, but it's easy to see how they arrived at it. 🐐 here is an unsanitized goat.
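The question of which quote characters to sanitize mostly disappears if escaping is tied to the output format instead of a regex guessing at user intent. In HTML only a handful of ASCII characters are special. Python's standard `html.escape` illustrates this (Python and the sample strings are illustrative choices, not what Reddit or Hubski used):

```python
import html

# html.escape replaces only &, <, > and, with quote=True (the default), the
# ASCII quotes " and '. Typographic quotes and emoji pass through unchanged.
samples = ['say "hi"', "say “hi”", "❝quoted❞", "🐐 goat", "<b>"]
escaped = [html.escape(s) for s in samples]
for before, after in zip(samples, escaped):
    print(f"{before!r:16} -> {after!r}")
```

Only the ASCII `"` and the `<b>` tag change; “, ❝, and the goat are ordinary text as far as HTML is concerned.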

syzo  ·  3471 days ago

    Sanitizing it often causes its own problems, including losing things like spaces, or needing a ridiculous regex that depends on knowing what the user intends to input. For example, are we going to sanitize for "? What about “ or ❝?

Just the bare minimum: basically anything that would come out as HTML or scripts, so you can't do <b>this</b> or <?php echo("this"); ?> or <script>alert("this");</script>.

So just turn "<" into "&lt;" and ">" into "&gt;" and you should be good to go? You need to make sure you can't SQL inject, too (the issue with those quotation characters, I imagine). I obviously haven't thought this through very far, and I'm sure there are a bunch of issues like that. There are usually libraries for input sanitizing, aren't there?
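For SQL injection the usual answer is not sanitizing quote characters at all but parameterized queries: the database driver binds each value separately from the SQL text, so quotes inside a comment are stored as data and never parsed as SQL. A minimal sketch using Python's built-in `sqlite3` (the `comments` table and its columns are hypothetical; the thread doesn't describe Hubski's or Reddit's schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (author TEXT, body TEXT)")

# Hostile-looking input: ASCII and curly quotes plus a classic injection try.
body = "it's “fine”'); DROP TABLE comments; --"

# The ? placeholders bind the values separately from the SQL text, so the
# quotes inside body are stored as data and never parsed as SQL.
conn.execute("INSERT INTO comments (author, body) VALUES (?, ?)", ("syzo", body))

stored = conn.execute("SELECT body FROM comments").fetchone()[0]
print(stored == body)  # True: stored verbatim, and the table still exists
```

HTML escaping and SQL safety are separate jobs: store the text verbatim with bound parameters, and escape it for HTML when rendering.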

Then Markdown can handle the rest as normal, which sounds like the harder problem: specifying a grammar and building a lexer and parser from it. Markdown would probably ignore things like “, ❝, or 🐐 and treat them as ordinary characters.

    🐐 here is an unsanitized goat.

Get it together, goat. Wash your hands more often!

Saw the goat on my phone but not on my desktop browser :(

mknod  ·  3471 days ago

Yeah, I too am ignorant of the reasons they did this, and I feel like your method is probably closer to the right approach. Performance might have something to do with it: there might be data showing that looking up a hash in PostgreSQL, rather than a string, is computationally cheaper than doing more text processing.

Like I said, though: "I dunno!"

mknod  ·  3472 days ago

When I see stuff like this I just imagine whiskey played a large part.

guyjin  ·  3474 days ago

Ah, before my time. Neat.

kleinbl00  ·  3474 days ago

It was pretty aggro. You'd open up Reddit, your envelope would have a three-digit number next to it, and if you hovered over anything you saw, you joined the party. Whole site went down for about 18 hours.

It was one of those examples of precocious kids going "hmm, what happens when I do this?", and what happens is that AWS gets dragged down by a substantial percentage because Reddit's that much of a hog. Dude did a Q&A about three days later (this was a year or two before IAmA existed), but it's lost to time.

mknod  ·  3472 days ago

Some unsolicited suggestions: I highly recommend rdiff-backup for source code. Also, PLEASE, guys, do not keep backups on the same server as production. I've seen too many devs lose everything because of a misplaced command. Hard drives are cheaper and easier than rewriting code! I'm glad to hear you're implementing this!

rob05c  ·  3470 days ago

    from: rob05c
    to: mk
    date: Fri, Jul 10, 2015 at 12:54 AM
    subject: Hubski Backups
    Ok, I set up my server to back up hubski's user data daily. It's incremental, so it shouldn't suck bandwidth, but if it's an issue, let me know and I'll kill it. I'm specifically running rdiff-backup.

^_^

mknod  ·  3469 days ago

We probably run out of the same sysadmin brain.

mk  ·  3475 days ago

LOL, you are likely right.