The Wild Wild Web:

Information Security from a Developer's Point of View

Node.js and PHP are both powerful server-side platforms. They also leave security almost entirely up to the developer. In other words, as a developer you're not handcuffed, but you're also starkly exposed unless you write a suit of armour into each of your scripts.

No website can be totally secure, but protecting your site spares you most of the headaches you'd almost certainly face if it remained unprotected. Here I describe some of the precautions that safeguard the database-driven CMS I wrote for this website.

First, here are some definitions of terms that should be useful as a sort of InfoSec 101 for developers. They are a bit oversimplified, but hopefully I have made up for that with clarity. For security beginners this should also serve as an inroad into reading texts like Padraic Brady's Survive The Deep End: PHP Security or Karl Düüna's Secure Your Node.js Web Application.

Context (also 'Environment', 'Frame'). The context of the code you write is the parser interpreting what you type. For instance, the context to this paragraph is an HTML parser, interpreting HTML.
- Writing '<script>…</script>', we change the context within the script tags from an HTML parser interpreting HTML, to a JavaScript parser interpreting JS.
- From here down, I'll use the languages themselves as synonymous with the term 'context', just because that makes it easier to talk about them.
- Browsers recognize a number of client-side contexts, including HTML, HTML attributes, JavaScript, Cascading Style Sheets, and URLs. Each has its own special characters (characters which the parser interprets as having special meaning, like '<' and '>' in an HTML parser), so each must be defended using different tactics.
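To make the point about contexts concrete, here is a minimal sketch in Node of escaping the same string for three different contexts. The function names are hypothetical helpers written for this illustration, not part of any library, and a production site would more likely rely on a vetted escaping library or templating engine.

```javascript
// Escape for the HTML text and quoted-attribute contexts:
// the five characters below are the ones an HTML parser treats specially.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Escape for a single URL component (e.g. one query-string value).
function escapeUrlComponent(value) {
  return encodeURIComponent(String(value));
}

// Escape for a JavaScript string literal placed inside an inline <script>.
// JSON.stringify adds the quotes and escapes quotes and backslashes; the extra
// replacements stop the HTML parser from seeing '</script>' inside the string.
function escapeJsString(value) {
  return JSON.stringify(String(value))
    .replace(/</g, '\\u003c')
    .replace(/>/g, '\\u003e')
    .replace(/&/g, '\\u0026')
    .replace(/\u2028/g, '\\u2028')
    .replace(/\u2029/g, '\\u2029');
}

const input = '<b onclick="x()">Tom & Jerry</b>';
console.log(escapeHtml(input));         // safe as HTML text
console.log(escapeUrlComponent(input)); // safe as a URL component
console.log(escapeJsString(input));     // safe as a JS string literal
```

The same input needs three different treatments because each parser reserves a different set of characters. Output escaped for one context is not automatically safe in another.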
Sanitization. I use this as the most general term, a combination of filtering, validation, and/or escaping (see below) that attempts to ensure an input to a context — like your scripts, your database (DB), HTML, JS, etc. — does not trigger unexpected and/or unauthorized behaviour.
Input. Anything that comes into your script from any outside source whatever. This can be the DB, the frontend, an API, etc. There are 2+ billion internet users who can visit your site or call your APIs at any time. Even if only 0.0001% of them are bad actors / attackers, that is still some 2,000 people, so you can't trust any input at all. In short, sanitize everything.
Output. Anything that your script gives out to any external source whatever. Again, this can be the DB, the frontend, an API, etc. Always sanitize any output that came from input.
- Though everything you write yourself is considered safe (no one, after all, attacks themselves, or at least no one should), sanitize your own content too. Later on, the specifications might change so that the same code must accept inputs.
Blacklists. A blacklist allows all data through as written, then removes specific characters or substrings from that data.
- For example, filtering a user's input in your scripts first allows a string through as written, and then removes specific characters / substrings from that string.
- Another example is escaping output. Within a context (say, JS), escaping converts special characters into their defanged counterparts. So for example '{', which is normally interpreted as syntax and doesn't show up on the screen, would be escaped into an encoded form (for HTML output, an entity such as '&#123;') that shows up on the monitor as a literal '{' and is no longer interpreted as code.
Whitelists. A whitelist only allows specific text / information through, removing everything else.
- For example, validation checks that a string matches your pre-defined rules. If it does not, it's rejected: an empty string (or 'false' or whatever you wish, so long as it's not the input string) is returned.
Rule Of Thumb: 'Filter then validate all input. Escape all output'. It's not always possible to use this rule, but whenever you can, I'd highly recommend it. Here's why:
- Input. First you filter the input, to see if the filtered version of the string can pass the validation, even if the unfiltered string could not. Then you run it through a validation check, a whitelist of acceptable inputs. Anything that doesn't make it through this validation stage is rejected.
- Output. If you want to, e.g., write dynamic HTML through PHP or Node, escape everything. That is, output only non-interpreted strings. This way attackers have little chance of defacing your website (or worse).
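The rule of thumb can be sketched in Node roughly as follows. This is an illustration under assumptions: the username rules (3 to 20 letters, digits, or underscores) and all function names are hypothetical, chosen only to show the filter, validate, and escape stages in order.

```javascript
// Filter (blacklist): let the string through, then strip control characters
// and surrounding whitespace.
function filterUsername(raw) {
  return String(raw).replace(/[\u0000-\u001f\u007f]/g, '').trim();
}

// Validate (whitelist): accept only strings matching the pre-defined rule.
// Anything else is rejected by returning an empty string.
function validateUsername(candidate) {
  return /^[A-Za-z0-9_]{3,20}$/.test(candidate) ? candidate : '';
}

// Input side: filter first, then validate the filtered result.
function sanitizeUsername(raw) {
  return validateUsername(filterUsername(raw));
}

// Output side: escape for the HTML context so the browser prints the
// characters instead of interpreting them.
const HTML_ENTITIES = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };
function escapeForHtml(value) {
  return String(value).replace(/[&<>"']/g, (ch) => HTML_ENTITIES[ch]);
}

function renderGreeting(name) {
  return `<p>Hello, ${escapeForHtml(name)}!</p>`;
}

console.log(sanitizeUsername('  alice_01\n')); // filtered, then accepted
console.log(sanitizeUsername('<script>'));     // rejected: empty string
console.log(renderGreeting('Tom & <Jerry>'));  // escaped on output
```

Filtering before validating gives slightly messy but harmless input (a trailing newline, say) a chance to pass. Escaping on output remains necessary even for validated values, because the output context has its own special characters.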

Some Of This Website's Protections For Current Or Future Inputs

This website is built with a custom CMS I wrote in PHP. Here are a couple of things I did to protect it. There are many more than this, such as sending the appropriate HTTP headers, anti-SQL-injection code, and sanitization of all inputs from the admin backend within the class whose methods receive said input.

Cross-Site Scripting (XSS). XSS is a security threat that allows an attacker to inject any JavaScript they please into your displayed webpages. If allowed through, this gives the attacker carte blanche to manipulate your website any way they see fit, including manipulation of the DOM, stealing users' personally identifiable information, sending Ajax requests to other sites and a whole host of other nasties.

I took my inspiration from Edward Z. Yang, the creator of HTMLPurifier (a poetic, elegantly written piece of code). For my CMS, however, I decided to be more draconian than he was: I coded an HTML reparser that takes the HTML you put in, turns it into a tree, then strips anything with even a remote potential to be a threat. It then returns the clean HTML to your application.

This clean HTML is used with any code in the CMS which is dynamically served to the frontend from the server side.

The reparser itself eliminates '<script>…</script>' and other tags, re-arranges the final product according to the rules of inline vs block nodes, etc. If the HTML is so bad that it can't find a way to clean it up completely, it returns an empty string — but this usually doesn't happen unless the input actively tries to be malicious, or has very poor syntax.

In my anti-XSS code, I found it easier to parse the initial HTML with PHP's 'DOMDocument' class, then convert the result in my own reparser into a Left Child, Right Sibling (LCRS) binary tree of my own design. After manipulating the tree where necessary, my code returns the final product. All of this is covered by unit tests.
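For readers unfamiliar with LCRS trees, here is a minimal sketch of the idea in Node. It is not the CMS's actual reparser. The nested '{ tag, text, children }' input stands in for an already-parsed DOM, the allowed-tag list is hypothetical, and attributes are not modelled at all (so none survive).

```javascript
// A node in a left-child, right-sibling (LCRS) binary tree: each node points
// only to its first child and to its next sibling.
class LcrsNode {
  constructor(tag, text = '') {
    this.tag = tag;           // element name, or '#text' for a text node
    this.text = text;         // only used by '#text' nodes
    this.leftChild = null;    // first child
    this.rightSibling = null; // next sibling
  }
}

// Convert a nested { tag, text, children } object into an LCRS tree.
function toLcrs(parsed) {
  const node = new LcrsNode(parsed.tag, parsed.text || '');
  let previous = null;
  for (const child of parsed.children || []) {
    const converted = toLcrs(child);
    if (previous === null) node.leftChild = converted;
    else previous.rightSibling = converted;
    previous = converted;
  }
  return node;
}

// Hypothetical whitelist of tags allowed to survive.
const ALLOWED_TAGS = new Set(['#text', 'div', 'p', 'em', 'strong']);

// Drop every disallowed node together with its subtree, and return the new
// head of the sibling chain.
function prune(node) {
  if (node === null) return null;
  const nextKept = prune(node.rightSibling);
  if (!ALLOWED_TAGS.has(node.tag)) return nextKept;
  node.leftChild = prune(node.leftChild);
  node.rightSibling = nextKept;
  return node;
}

function escapeText(text) {
  return text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Walk the sibling chain and rebuild HTML from the surviving nodes.
function serialize(node) {
  let html = '';
  for (let current = node; current !== null; current = current.rightSibling) {
    if (current.tag === '#text') html += escapeText(current.text);
    else html += `<${current.tag}>${serialize(current.leftChild)}</${current.tag}>`;
  }
  return html;
}
```

With every node holding exactly two pointers, removing a tag such as 'script' amounts to relinking one sibling chain, which is part of what makes this representation convenient for stripping content.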

PHP Sessions, Put Into A Database. HTTP, the backbone protocol of the web, is a stateless technology. This means the server has no memory from one client-side request to the next, and vice-versa. PHP includes sessions to make up for this fact. Sessions provide a way of storing any data you want on the server's filesystem or, for the more adventurous, in a database. Then you can re-use this data in any normally memoryless page you please.

It turns out that storing sessions in a DB is quite a bit more complicated than any websites I've seen will tell you (such as this one (2004), or this one here (2018), both excellent articles in themselves).

This is mostly because of PHP's internal session quirks. The reader will have to go down that rabbit hole on their own if they wish to, but essentially I've ensured safe, perhaps even overly paranoid, PHP sessions in the DB. This means any website using this CMS on a shared hosting plan can rest easy: your session data no longer sits in a session directory shared with the other sites on the same server.

Cross-Site Request Forgery (CSRF). A user logs into a site. The site trusts that user (technically, the website's server trusts that user's browser). If a completely separate ('cross') domain ('site') sends an unauthorized request through that user's browser, the current site thinks that request is coming from a trusted user when in fact it is not. That's a forged request.

I have created multiple tokens in the administrative backend which check that the request is coming from this site itself, by invisibly sending a token each time the user does something to change state. If the token is missing or does not match, the backend saves all of that user's work (but deletes nothing), then logs them out.
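The token check can be sketched in Node roughly as follows. This is an illustration of the general technique, not this CMS's PHP code. The 'session' object, the 'saveWork' and 'applyChange' callbacks, and all function names are hypothetical.

```javascript
const crypto = require('crypto');

// Create an unpredictable token, remember it in the user's server-side
// session, and return it so it can be embedded in a hidden form field.
function issueCsrfToken(session) {
  const token = crypto.randomBytes(32).toString('hex');
  session.csrfToken = token;
  return token;
}

// Compare the submitted token with the stored one in constant time.
function isValidCsrfToken(session, submitted) {
  const expected = session.csrfToken;
  if (typeof expected !== 'string' || typeof submitted !== 'string') return false;
  const a = Buffer.from(expected);
  const b = Buffer.from(submitted);
  // timingSafeEqual throws on unequal lengths, so check the length first.
  return a.length === b.length && crypto.timingSafeEqual(a, b);
}

// Guard a state-changing request: on a bad token, save the user's work
// (deleting nothing) and log them out; otherwise apply the change.
function handleStateChange(session, submittedToken, { applyChange, saveWork }) {
  if (!isValidCsrfToken(session, submittedToken)) {
    saveWork();
    session.loggedIn = false;
    return 'rejected';
  }
  applyChange();
  return 'accepted';
}
```

A page on another domain can make the browser send the session cookie, but it cannot read the token embedded in this site's own forms, so a forged request arrives without a valid token and is rejected.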

Hack Me Ethically

I don't believe I've left any room for crackers to take down this site, besides the odd case, e.g., 0-day vulnerabilities in PHP. However, if you manage to hack or crack my site, please do be ethical about it! Send the vulnerability to admin at mgraichy.com