mknod  ·  3200 days ago  ·  post: Hubski Update: Welcome rob05c! + more tag info

Sanitizing it often causes its own problems, including losing things like spaces, or needing a ridiculous regex that depends on knowing what the user intends to input. For example, are we going to sanitize for "? Well, what about “ or ❝?
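
To make that concrete, here is a rough Python sketch of the two usual regex approaches. The patterns and function names are made up for illustration and are not anything Hubski is known to run. A blocklist that strips the ASCII double quote lets the curly and ornamental quotes straight through, while an ASCII-only allow-list eats apostrophes and goats alike.

```python
import re

# Hypothetical patterns, chosen only to illustrate the trade-off.
BLOCKLIST = re.compile(r'["<>]')          # remove ASCII double quote and angle brackets
ALLOWLIST = re.compile(r"[^A-Za-z0-9 ]")  # remove everything except ASCII letters, digits, spaces


def strip_blocked(text: str) -> str:
    """Blocklist approach: delete only the characters we thought of."""
    return BLOCKLIST.sub("", text)


def keep_allowed(text: str) -> str:
    """Allow-list approach: delete everything we did not explicitly permit."""
    return ALLOWLIST.sub("", text)


if __name__ == "__main__":
    for sample in ['say "hi"', "say “hi”", "say ❝hi❞", "don't", "🐐 goat"]:
        print(repr(sample), repr(strip_blocked(sample)), repr(keep_allowed(sample)))
```

Either way the regex has to guess what the user meant to type, which is the problem.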

The solution you've come up with is the old programming problem:

'A programmer has a problem that they solve with regex; now they have 2 problems.'

I don't necessarily agree with their solution, but it's easy to see how they came to it. 🐐 here is an unsanitized goat.





syzo  ·  3200 days ago

    Sanitizing it often causes its own problems, including losing things like spaces, or needing a ridiculous regex that depends on knowing what the user intends to input. For example, are we going to sanitize for "? Well, what about “ or ❝?

Just the bare minimum, basically anything that would come out as HTML or scripts, so you can't do <b>this</b> or <?php echo("this"); ?> or <script>alert("this");</script>

So just turn "<" into "&lt;" and ">" into "&gt;" and you should be good to go? You need to make sure you can't SQL inject, too (the issue with those quotation characters, I imagine) - I obviously haven't thought this through very far and I'm sure there are a bunch of issues like that. There are usually libraries to do input sanitizing, aren't there?
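
For what it's worth, here is a minimal Python sketch of that idea using only the standard library. It assumes the escaping happens at output time and that the SQL side is handled with bound parameters rather than by stripping quotes. The `comments` table, the function names, and the in-memory SQLite database are stand-ins for illustration, not how the site actually stores anything.

```python
import html
import sqlite3


def escape_for_html(text: str) -> str:
    # html.escape replaces &, < and >, and with quote=True also " and '.
    # Non-ASCII characters such as “ or 🐐 are left untouched.
    return html.escape(text, quote=True)


def store_comment(conn: sqlite3.Connection, body: str) -> None:
    # The ? placeholder binds the value, so it is never spliced into the SQL text.
    conn.execute("INSERT INTO comments (body) VALUES (?)", (body,))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # hypothetical stand-in database
    conn.execute("CREATE TABLE comments (body TEXT)")

    raw = "<script>alert(\"this\");</script> '); DROP TABLE comments; --"
    store_comment(conn, raw)

    stored = conn.execute("SELECT body FROM comments").fetchone()[0]
    print(escape_for_html(stored))
```

The raw text goes into the database unchanged and only gets escaped on the way out, so nothing the user typed is lost and the quotation characters stop mattering for SQL.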

Then Markdown can handle the rest as normal, which sounds like the harder issue, since it means specifying a grammar and building a lexer+parser off it. Markdown would probably ignore things like “ or ❝ or 🐐 and treat them as normal characters.

    🐐 here is an unsanitized goat.

Get it together, goat. Wash your hands more often!

Saw the goat on my phone but not on my desktop browser :(

mknod  ·  3200 days ago

Yeah, I too am ignorant of the reasons they did this, and I feel like your method is probably closer to the right direction. It might come down to computational cost: there might be some data showing that looking up a hash in PostgreSQL (vs. a string) is cheaper than running some text processing.

Like I said, though: "I dunno!"