Changelog
- 2023-05-02: Wiki page initialized (likely earlier).
- 2024-07-03: Add flowchart for a more complete set of home network deployment instructions.
Home network instructions
Hopefully this will be a nice, exhaustive list of things to do from the ground up, in case my network goes bonkers.
Get a domain
Get a new domain from a domain registrar. Preferably one that has:
- WHOIS domain privacy
- Cheap renewal rates
- (Optional) Bundled DNS hosting, with:
  - Support for editing different types of DNS records (e.g. A, CNAME, MX, TXT, NS)
  - API access to modify DNS records (for ACME + DDNS)
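For the DDNS half of that requirement, the job boils down to detecting the current public address, comparing it with what was last published, and calling the registrar's API only when it changes. A minimal sketch, assuming `update_a_record` is a hypothetical wrapper around whatever API your registrar exposes (there is no common standard), and that the public IP is detected elsewhere (e.g. from the router):

```python
import ipaddress
from typing import Callable, Optional


def sync_ddns(
    hostname: str,
    detected_ip: str,
    last_published: Optional[str],
    update_a_record: Callable[[str, str], None],
) -> str:
    """Publish `detected_ip` as the A record for `hostname` only if it changed.

    `update_a_record(hostname, ip)` is a hypothetical, registrar-specific callable.
    Returns the address that is published after this call.
    """
    ip = ipaddress.ip_address(detected_ip.strip())  # ValueError on malformed input
    if ip.version != 4:
        raise ValueError(f"A records hold IPv4 addresses, got {ip}")
    if not ip.is_global:
        # A private or CGNAT address means detection looked at the wrong interface.
        raise ValueError(f"{ip} is not publicly routable")
    if str(ip) == last_published:
        return str(ip)  # unchanged: skip the API call
    update_a_record(hostname, str(ip))
    return str(ip)
```

Run it from a cron job or systemd timer and persist the returned address between runs; an AAAA record for IPv6 would follow the same pattern.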
TODO: DNS zone transfer
https://github.com/joohoi/acme-dns
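API access matters for ACME because the DNS-01 challenge (RFC 8555), which acme-dns exists to automate, is answered by publishing a TXT record. A sketch of what actually gets published, assuming the base64url-encoded JWK thumbprint of the ACME account key (RFC 7638) is already at hand; in practice Certbot or another ACME client computes all of this itself:

```python
import base64
import hashlib


def b64url(data: bytes) -> str:
    """Base64url without padding, as used throughout ACME (RFC 8555)."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def dns01_record(domain: str, token: str, jwk_thumbprint_b64: str) -> tuple[str, str]:
    """Return (record name, TXT value) for an ACME DNS-01 challenge.

    key authorization = token + "." + base64url(JWK thumbprint of the account key)
    TXT value         = base64url(SHA-256(key authorization))
    """
    key_authorization = f"{token}.{jwk_thumbprint_b64}"
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    # Wildcard certificates are validated on the base domain, so drop a leading "*.".
    name = f"_acme-challenge.{domain.removeprefix('*.')}"
    return name, b64url(digest)
```

The TXT value is always 43 characters (a 32-byte digest, base64url-encoded without padding).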
See below for some musings from last time; I should probably update this for next time.
The journey to self-hosting
It all started with the Raspberry Pi 3 I bought myself when university started around 2016/17, even though back then I couldn't tell an Arduino from a Raspberry Pi. Four years later I'm only just starting my journey to self-hosting, with more to come.
While I am by no means a data hoarder, the amount of stuff I had gathered was enough to fill several hard drives of around 250 GB each. Having the files scattered across drives made them hard to manage, and my first instinct was to introduce a RAID system that created a larger logical volume. This eventually happened in 2019, when I bought a QNAP NAS together with four 4 TB hard drives - RAID 0 for both storage and speed meant I had 8 TB to play around with.
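Usable space depends on both the RAID level and the number of drives in the array, so it is worth working out before buying disks. A rough calculator for arrays of identical drives (filesystem and NAS overhead ignored; the per-level formulas are the standard ones, the function itself is just illustrative):

```python
# Minimum drive count for each supported RAID level.
MIN_DRIVES = {"0": 2, "1": 2, "5": 3, "6": 4, "10": 4}


def usable_capacity_tb(level: str, drives: int, size_tb: float) -> float:
    """Usable capacity of a RAID array built from `drives` identical disks of `size_tb` TB."""
    if level not in MIN_DRIVES:
        raise ValueError(f"unsupported RAID level {level!r}")
    if drives < MIN_DRIVES[level]:
        raise ValueError(f"RAID {level} needs at least {MIN_DRIVES[level]} drives")
    if level == "0":   # striping: all space usable, no redundancy
        return drives * size_tb
    if level == "1":   # mirroring: every drive holds the same copy
        return size_tb
    if level == "5":   # one drive's worth of parity
        return (drives - 1) * size_tb
    if level == "6":   # two drives' worth of parity
        return (drives - 2) * size_tb
    if drives % 2:     # RAID 10: striped mirror pairs
        raise ValueError("RAID 10 needs an even number of drives")
    return drives // 2 * size_tb
```

For example, four 4 TB drives give 16 TB under RAID 0 and 8 TB under RAID 10, while two 4 TB drives under RAID 0 give 8 TB. Only the redundant levels survive a drive failure, and none of them is a substitute for a backup.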
Eventually this became problematic - I had no backup solution of my own in place, and data transfers and backups always went through the cloud via Dropbox. My website was also served by the Apache web server on the NAS, which was tedious to manage. Picking up networking skills also took a while.
Now the horizon is starting to come into view, and with it came a massive influx of things to work on. I suppose I appreciate the work of sysadmins more now. There are many more architectures and self-hosting capabilities to learn about, now that I have begun my very own (modest) homelab.
Operating system
The very first non-Windows system I experimented with was Raspbian OS (on the Raspberry Pi). Needless to say, my inexperience with Linux left me disappointed, as did the lack of games and of support for other commercial products (such as Microsoft Office).
I did have all the time I wanted to explore scripting - at first with Python, then batch scripting for startup scripts. A bit more familiarity with the command line (thanks to git and needing vim on Linux-based systems) gave me the confidence to try out Linux again. Occasionally dual booting or freshly installing Ubuntu remained a non-committal affair.
That lasted until I started using Linux tools in full force, especially during the computer networking module I took in late 2020, alongside the OpenSUSE environments in the quantum optics lab and all the shenanigans needed to get the QNAP to behave the way I wanted. I eventually migrated off the QNAP to get my Raspberry Pis up and running.
Choice of server OS
There were a couple of choices when it came to the host OS for servers.
- Ubuntu Server, often touted as beginner-friendly
- CentOS, the closest one can get to Red Hat Enterprise Linux (RHEL) without sacrificing stability as with Fedora
- OpenSUSE Leap, for being crusty and long-lived ("so they must be doing something right")
Thank goodness I didn't pay much attention to CentOS, whose long-term support was later cut short. OpenSUSE was a legitimate contender, especially given its use in the quantum optics lab and its rock-solid stability, but three factors raised some doubts:
- OpenSUSE uses rpm/zypper for package management. I'm not too familiar with this package management ecosystem, or with the stability / compatibility of packages on OpenSUSE.
- The desktop experience is considered stable, but I'm going to deploy a server with remote management (especially to reduce the overhead on the machine).
- Debian-based Ubuntu boasts wider compatibility with different software.
So here we are: Ubuntu Server 20.04 LTS.
Knowledge base
A good knowledge base is important for storing records of progress as well as attempted solutions. This is specifically in contrast to todo lists, which solve a very different, time-sensitive problem (some systems like "Getting Things Done" (GTD) encourage integration and a single source of truth, but they are tedious to maintain).
My very first attempt at a knowledge base was in Evernote, whose development team was known to be very resistant to user feedback; that, together with occasional data loss, incompatibility with other formats, and a more recent two-device restriction, led me off it. In the meantime, I've been consolidating notes within my own file system.
My first online knowledge base took the form of a website hosted on GitHub Pages, but this had severe limitations: it was unsuitable for sensitive content, and it could only serve static websites. I eventually shifted to a website built with Sphinx, with scheduled build jobs run by cron, and picked up reStructuredText as a result. The problem, however, lay in editing the pages (which required access to the filesystem via either SFTP or WebDAV) and in the delay before changes were reflected on the website, but it was a good enough solution to consolidate information.
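Part of that delay comes from how often cron fires, and a full rebuild on every tick is wasteful. One option is to have the cron job rebuild only when a source file is newer than the last output. A minimal sketch, assuming the usual `sphinx-build -b html SOURCEDIR OUTPUTDIR` invocation and that `index.html` in the output directory marks the last successful build; the helper names are made up:

```python
import subprocess
from pathlib import Path


def needs_rebuild(source_dir: Path, build_dir: Path) -> bool:
    """True if any .rst/.py source is newer than the last build's index.html."""
    marker = build_dir / "index.html"
    if not marker.exists():
        return True  # never built
    built_at = marker.stat().st_mtime
    build_root = build_dir.resolve()
    for path in source_dir.rglob("*"):
        if build_root in path.resolve().parents:
            continue  # skip the build output if it lives inside the source tree
        if path.suffix in {".rst", ".py"} and path.stat().st_mtime > built_at:
            return True
    return False


def rebuild(source_dir: Path, build_dir: Path) -> None:
    """Run Sphinx's HTML builder; raises CalledProcessError if the build fails."""
    subprocess.run(
        ["sphinx-build", "-b", "html", str(source_dir), str(build_dir)],
        check=True,
    )
```

The crontab entry can then fire every few minutes and call `rebuild(src, out)` only when `needs_rebuild(src, out)` is true, so a short schedule costs little when nothing changed.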
Migrating from Evernote to self-hosted Standard Notes was a great step forward for secure knowledge retention, but sharing of note collections remained limited. Image storage was missing as well, with no extensions available for publishing to one's own webserver. On top of that loomed a knowledge glut that I constantly fretted over, since notes could no longer be interlinked and had to be constantly pinned to get my attention.
Initial attempts to set up a MediaWiki on my QNAP NAS failed horribly, but the process was a lot easier on the Raspberry Pis. This was when I started seriously considering the use of wikis for knowledge management. Thus began the search for the right wiki software.
Choice of wiki
The decision to select wiki software for personal knowledge management came down to a few factors (ignoring those taken for granted, such as owning your own content; otherwise Medium and Notion would be viable contenders too). WikiMatrix is a good starting point for comparing wiki software.
- Mature software: Good stability and support (common usability and administration issues should already be dealt with). MoinMoin seems pretty new but has poor support for migration. XWiki would have been a good candidate, except that development doesn't seem very visible and there is little community uptake beyond feature requests (the forum itself does seem pretty active though). PmWiki development also seems fairly small-scale. DokuWiki has many stale open issues. (edit: Look where I am now :>)
- Free and open source: Avoids service lock-in and costly expenses. Confluence is a popular enterprise solution but requires a monthly subscription.
- Cater towards personal use with minimal bloat: Enterprise-level reliability is great, but tight integration with unneeded collaboration tools, not so much. This is especially critical since I'm running a webserver with very minimal specs. TWiki has very strong enterprise offerings but integrates many bug tracking / kanban features that aren't particularly necessary. Its look-and-feel is also pretty dated - one TWiki-based example site, Loop.ph, eventually migrated away. Tiki Wiki CMS Groupware swears by the "do-it-all" philosophy.
- Granular access controls: Rather than maintaining content across two separate wikis, it is a lot easier for both writer and readers to navigate a single site where access is granted according to group rights. This also makes the wiki easier to manage. This is the main drawback of MediaWiki for personal wikis. The same goes for TiddlyWiki, although it seems like a fantastic resource for non-linear learning and project management. Outline looks aesthetically pleasing but does not support public sharing of entire collections.
Wiki.js does have strong community support, is considered stable by a couple of users (1~2 years), and looks pretty, with a lot of exciting feature requests under review. It does have some disadvantages, such as the lack of concurrent editing support, no backlinking, and the need for a full database, which makes backups more tedious (though v2 of Wiki.js seems to have better migration support). Comments on Hacker News suggest that Wiki.js running on JavaScript is non-ideal, but we'll see how it goes moving forward.
In the end, stability and software maturity are very important factors. While Wiki.js is the new kid on the block with beautiful features, the small quality-of-life and performance features just aren't there, and the number of bugs is high. The complexity of maintaining and customizing the wiki also significantly affects usability.
A good reminder that dated software does not mean bad software! The eventual migration to DokuWiki proved effective.
Security best practices
Securing the server
- Keep root account disabled: By default, login to the root account should already be disabled on Ubuntu. Do not enable it - everyone knows there is an account named 'root'. Following the principle of least privilege, grant administrative access by adding privileged users to the sudo group instead.
- Minimize exposed ports: It is trivial to run a port scan and probe popular ports for common vulnerabilities. Under no circumstances should an SSH port, a database service, a web administration service, etc. be exposed to the internet. Keep these services within the local area network (LAN) to minimize the attack surface, and expose only the bare minimum of required services (e.g. HTTP(S), OpenVPN). This is easily achieved by limiting the port forwarding rules on the router, which typically serves as the gateway for the LAN.
- Limit read-access to content: Protect directories from being directly read, especially for situations where WebDAV is enabled. In a similar vein, do not host sensitive content on the directories where pages are served.
- Force HTTPS: Plain (unencrypted, unauthenticated) HTTP sessions leave the user vulnerable to man-in-the-middle attacks. Write rules in the webserver to redirect users to HTTPS. The required certificates can easily be obtained from Let's Encrypt using Certbot (for free!). If possible, also send HSTS headers so that browsers stick to HTTPS.
- Consider a denylist for malicious IPs: Some endpoints are commonly targeted for remote code execution attacks, e.g. Xdebug servers (used with PhpStorm) and Laravel Ignition. Automated tools such as Fail2Ban, which update firewall rules, can prove useful. Do, however, consider the risk of denying legitimate users.
- Regularly update software, with a pinch of salt: Malware can be introduced in later versions of software after ownership passes to unknown third-party developers, e.g. The Great Suspender, Event Stream, Nano AdBlocker. Regularly check for security patches, but make sure they are safe to apply.
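To check that root really is locked on a given machine, look at root's password field in `/etc/shadow` (readable only by root, e.g. via `sudo getent shadow root`). A minimal sketch of interpreting that field, plus listing the members of the admin group via Python's standard `grp` module; the group is `sudo` on Ubuntu/Debian but `wheel` on RHEL-family systems:

```python
def password_locked(shadow_line: str) -> bool:
    """Return True if an /etc/shadow entry's password field can never match.

    `passwd -l` prefixes the hash with '!', and '*' is the conventional
    "no password" marker. An empty field, by contrast, may allow login
    WITHOUT a password. A locked password does not block SSH key logins;
    `PermitRootLogin no` in sshd_config covers that.
    """
    fields = shadow_line.rstrip("\n").split(":")
    if len(fields) < 2:
        raise ValueError("not a shadow(5) entry")
    return fields[1].startswith(("!", "*"))


def sudo_members(group: str = "sudo") -> list[str]:
    """Supplementary members of `group` (users whose primary group it is are not listed)."""
    import grp  # Unix-only module, imported lazily so the file still loads elsewhere

    try:
        return list(grp.getgrnam(group).gr_mem)
    except KeyError:  # the group does not exist on this system
        return []
```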
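To verify which ports are actually reachable, probe the public address from outside the LAN (e.g. from a phone on mobile data, or a VPS), since a scan from inside says nothing about the router's port forwarding rules. A minimal TCP connect check using only the standard library (IPv4 only); a dedicated scanner such as nmap is far more thorough, and only scan hosts you own:

```python
import socket


def open_tcp_ports(host: str, ports: list[int], timeout: float = 0.5) -> list[int]:
    """Return the ports on `host` that complete a TCP handshake (a plain connect scan)."""
    reachable = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            try:
                if sock.connect_ex((host, port)) == 0:  # 0: connection accepted
                    reachable.append(port)
            except OSError:
                pass  # e.g. name resolution failure: treat as not reachable
    return reachable
```

For example, `open_tcp_ports("203.0.113.10", [22, 80, 443, 3306, 5432])` (203.0.113.10 being a documentation placeholder for your public IP) should ideally return only the web ports. UDP services such as OpenVPN on its default port 1194 need a different kind of check.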
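The core idea behind tools like Fail2Ban is a sliding window: ban an address after too many failures within a time window, for a fixed period. A simplified in-memory illustration (not Fail2Ban's actual implementation), with parameters named after its `maxretry`, `findtime` and `bantime` jail options; `maxretry` is also the knob that trades off locking out legitimate users who mistype a password:

```python
import time
from collections import defaultdict, deque
from typing import Optional


class FailureTracker:
    """Ban an IP after `maxretry` failures within `findtime` s, for `bantime` s."""

    def __init__(self, maxretry: int = 5, findtime: float = 600, bantime: float = 3600):
        self.maxretry, self.findtime, self.bantime = maxretry, findtime, bantime
        self.failures: dict[str, deque] = defaultdict(deque)  # recent failure timestamps
        self.banned_until: dict[str, float] = {}

    def is_banned(self, ip: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        until = self.banned_until.get(ip)
        if until is None:
            return False
        if now >= until:
            del self.banned_until[ip]  # ban has expired
            return False
        return True

    def record_failure(self, ip: str, now: Optional[float] = None) -> bool:
        """Log one failed attempt; return True if this attempt triggers a ban."""
        now = time.time() if now is None else now
        window = self.failures[ip]
        window.append(now)
        while window and now - window[0] > self.findtime:
            window.popleft()  # forget failures older than the window
        if len(window) >= self.maxretry:
            window.clear()
            self.banned_until[ip] = now + self.bantime
            return True
        return False
```

A real deployment would feed `record_failure` from log lines and turn bans into firewall rules, which is exactly the plumbing Fail2Ban provides.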
Browsing practices
No amount of software changes will improve privacy if safe browsing practices are not followed. I have always been bothered by Facebook's intrusiveness and its failure, time and time again, to live up to its "I will never snoop on you" messaging. More recently there is also Google's ability to read emails, background location tracking, etc. But convenience comes at a cost.
- Tor network: Your IP address can be used to trace your activity across websites. The Tor Browser is the single most important tool for safe browsing, and also works as a free-to-use anonymizing proxy (you can even set specific exit nodes in the configuration file using the respective ISO 3166 country codes). Media streaming is, however, inevitably slow. One may also be denied service when servers explicitly block known Tor nodes.
- VPN service: Where faster transfer speeds are required over a proxy, a paid virtual private network (VPN) service is a good option. Choose one with good user reviews, e.g. ExpressVPN as of 2021, and avoid those known to pay for good reviews. Using free VPNs (the Tor network aside) is strongly discouraged: businesses spend money on servers, domains and development time, so revenue has to be pulled from the users one way or another.
- Avoid pirated content: Free is good, until you crash and burn your computer or expose it to malware. Pirated content can be very appealing, especially for software with high licensing fees, but one will eventually end up with malware, or even a cease-and-desist through torrent honeypots. Do not install pirated software, period (and if you do, scrutinize the uploader's history and reputation down to the finest detail).
- Secure communication: Avoid sending sensitive content over insecure channels, period. That means no converting PDFs with "free" online services, no uploading files to "free" services, no sharing passwords over chats, etc.
- Use a password manager: A good password manager allows one to (1) generate secure passwords and (2) quickly access and enter login details across different websites and platforms, (3) with a strong focus on encryption. One good candidate is Bitwarden, which additionally supports automatic timeout logouts and self-hosting. This beats ad hoc password strategies and also secures access to your passwords (passwords stored in the Chrome browser can easily be uncovered). While a password manager can be a single point of failure, this can be mitigated with two-factor authentication (2FA) and, if more security is desired, even a "double-blind" scheme, e.g. appending an extra personal keycode to the end of each auto-generated password.
- Consider avoiding OAuth logins: Related to password management is the use of OAuth "social" logins. Using third-party authentication services leaks data to these social media platforms, and a compromised social media account invariably has knock-on effects on every website it logs into. Malicious applications can also be designed to phish OAuth tokens. Using a randomized password per site limits the scale of an attack.
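Point (1) of the password manager item boils down to drawing each character uniformly at random from a cryptographically secure source. A sketch of the same idea with Python's `secrets` module, assuming a 94-symbol printable-ASCII alphabet (some sites reject certain symbols), together with the usual entropy estimate of length × log2(alphabet size) bits for a uniformly random password:

```python
import math
import secrets
import string

# 52 letters + 10 digits + 32 punctuation characters = 94 symbols.
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_password(length: int = 20) -> str:
    """Draw each character independently from a CSPRNG (the `secrets` module)."""
    if length < 1:
        raise ValueError("length must be positive")
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def entropy_bits(length: int, alphabet_size: int = len(ALPHABET)) -> float:
    """Entropy of a uniformly random password: length * log2(alphabet size)."""
    return length * math.log2(alphabet_size)
```

At 20 characters that works out to roughly 131 bits. The estimate only holds for randomly generated passwords, not human-chosen ones.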
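Pinning Tor exit countries, as mentioned under the Tor item, is done with the `ExitNodes` option in Tor's `torrc`, where each ISO 3166-1 alpha-2 code is wrapped in braces (e.g. `{de}`). A tiny hypothetical helper that validates the codes' format and renders the line; only the `ExitNodes` syntax comes from the Tor manual, which also warns that listing too few nodes can degrade functionality:

```python
import re

# Exactly two ASCII letters; fullmatch() rejects trailing newlines and longer codes.
_COUNTRY_CODE = re.compile(r"[A-Za-z]{2}")


def exit_nodes_line(country_codes: list[str]) -> str:
    """Render a torrc `ExitNodes` line restricting exits to the given countries.

    Only the format is checked, not whether a code is actually assigned.
    """
    if not country_codes:
        raise ValueError("need at least one country code")
    wrapped = []
    for code in country_codes:
        if not _COUNTRY_CODE.fullmatch(code):
            raise ValueError(f"not an ISO 3166-1 alpha-2 code: {code!r}")
        wrapped.append("{" + code.lower() + "}")
    return "ExitNodes " + ",".join(wrapped)
```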