CS5321 Network Security

Week 12

The contents of this lecture are mainly by Dr. Muoi Tran.

Blockchains

While digital currencies with cryptography are not new (e.g. Chaum82, Chaum85, Camenisch05), Bitcoin's novelty comes from decentralization using a blockchain.

Bitcoin nodes can achieve chain agreement via:

All nodes can generate transactions (confirmed transactions are included in a block)
- Miners generate new blocks from unconfirmed transactions, which requires some form of power (e.g. computational or stake), and are incentivized by newly minted coins
Nodes connect via a peer-to-peer network, propagating new transactions/blocks via gossiping protocols and verification
- Networks can be permissionless or permissioned
- Consensus rules: (1) only valid blocks are stored, (2) only one block at any height is accepted

The security of the blockchain is composed of three layers: (1) the transaction layer, (2) the consensus/mining layer, and (3) the P2P network, with each layer depending on the one below it. Big research topics in blockchain network security include:

Block propagation
- Delayed delivery: late blocks can result in forks in the blockchain. This is often not an attack, e.g. it can be caused by network latency or large blocks.
- Selfish mining: miners do not release new blocks immediately, to gain an advantage in mining the next block
- Incentives: non-miner nodes have no incentive to propagate blocks, which leads to centralization
Transaction propagation
- DoS attacks: flooding of transactions
- Censorship attacks: miners can choose which transactions to propagate, e.g. censor transactions with high fees
- Privacy: transactions can be clustered together and linked to their origin, e.g. an IP address
Network P2P topology
- Peer discovery: discovering peers relies on central entities, e.g. the bootstrap nodes run by Bitcoin developers
- Mapping network: mapping the network topology allows attackers to conduct targeted attacks on influential nodes
- Eclipse attacks: splitting of the P2P network, resulting in a lack of consensus

Partitioning attacks

The goal is to isolate one or more nodes from the rest of the network. This enables the following attacks:

Double-spending attack: the attacker spends the same coins twice, overriding the first transaction with the second
51% attack: the attacker outcompetes each partitioned group of miners (notably, for an equal network split, an attacker with only 40% of the mining power has a majority over each 30% partition)
Other attacks like selfish-mining, transaction censoring, etc.
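The 40% figure is simple arithmetic. If the honest mining power (everything except the attacker's share $$ a $$) is split equally into $$ k $$ isolated partitions, the attacker outcompetes each partition whenever $$ a > (1 - a)/k $$, i.e. $$ a > 1/(k+1) $$. A minimal sketch, assuming an equal split and ignoring latency and orphaned blocks (the function names are illustrative):

```python
def partition_threshold(num_partitions: int) -> float:
    """Attacker share above which it beats every equal-sized honest partition.

    Solves a > (1 - a) / k for a, giving a > 1 / (k + 1).
    """
    return 1 / (num_partitions + 1)


def outcompetes_each(attacker: float, num_partitions: int) -> bool:
    """True if the attacker's share exceeds each partition's share of honest power."""
    honest_per_partition = (1 - attacker) / num_partitions
    return attacker > honest_per_partition


print(outcompetes_each(0.40, 2))  # 0.40 vs 0.30 per partition -> True
print(partition_threshold(2))     # about 0.333
```

With two partitions, anything above one third already suffices, so 40% leaves a comfortable margin; without the partition, the same attacker would face 60% honest power.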

Two main methods of partitioning:

Eclipse and Erebus attacks: influence victims to connect to adversary-controlled peers
Bitcoin hijacking attack: adversary intercepts all legitimate peering connections

Eclipse attack

"Eclipse Attacks on Bitcoin’s Peer-to-Peer Network," Heilman 2015.

Goal: Influence victim to connect to adversary-controlled botnet, assuming victim has a public IP address (Bitcoin version 0.9.3 or earlier).

Regular nodes can have up to 8 outgoing connections, selected from the internal peer database addrman
- Full nodes have a public IP and additionally up to 117 incoming connections
New nodes are typically bootstrapped by receiving a list of peers from DNS seeds (if no peer IP addresses are known yet); afterwards, peer addresses are learned via:
1. Solicited ADDR messages as a response to GETADDR requests
2. Unsolicited ADDR messages from IP advertisements
addrman stores each IP with a timestamp:
- ADDR messages (containing up to 1000 IP addresses) enter "New table"
- IP addresses of peers that the node has successfully connected to enter the "Tried" table, using a hash-bucket algorithm and an eviction policy to cycle IPs
- Outgoing connections are selected from either "New" or "Tried" table

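The attack fills both tables with adversary IPs: the "New" table via unsolicited ADDR flooding, and the "Tried" table via incoming connections from botnet IPs to the victim. Once the victim restarts, its outgoing connections go to adversary IPs, which effectively isolates it. This works since the adversary IPs have more recent timestamps, which the old selection logic preferred. Only 3000 botnet IPs are required for sufficient /16 prefix diversity.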
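A simplified way to see why filling the tables works: if a fraction $$ f $$ of the addresses the victim can pick are adversarial and each of the 8 outgoing connections is drawn independently, the victim is fully eclipsed with probability $$ f^8 $$. This is a toy model under that independence assumption, not Bitcoin Core's actual selection logic; the function names are hypothetical.

```python
import random

OUTGOING_SLOTS = 8  # outgoing connections per node, as described above


def eclipse_probability(adversary_fraction: float, slots: int = OUTGOING_SLOTS) -> float:
    """Probability that every outgoing slot picks an adversary address.

    Toy model: each slot independently picks an adversary address with
    probability adversary_fraction.
    """
    return adversary_fraction ** slots


def simulate(adversary_fraction: float, trials: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo estimate of the same probability under the same toy model."""
    rng = random.Random(seed)
    eclipsed = 0
    for _ in range(trials):
        if all(rng.random() < adversary_fraction for _ in range(OUTGOING_SLOTS)):
            eclipsed += 1
    return eclipsed / trials


print(eclipse_probability(0.9))  # about 0.43
print(simulate(0.9))             # close to the analytic value
```

Because selection favoured fresh timestamps, flooding with recent adversary IPs effectively pushed $$ f $$ towards 1, which is what the "no preference to newer IPs" countermeasure targets.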

The Eclipse attack is mitigated in later versions of Bitcoin Core by adopting the following countermeasures:

Random eviction: No preference to newer IPs
Test before evict: Do not evict existing IPs that are still reachable
Feeler connection: Periodically test and move reachable IPs from "New" to "Tried"
Larger table size: Increase "New" and "Tried" table size 4x
Remove direct IP insertion: Only IPs to which outgoing connections were made are stored in "Tried" (incoming connections are no longer inserted)
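Test-before-evict can be sketched as follows. The table is reduced to a dict from slot index to IP, the slot stands in for the position chosen by the hash-bucket function, and the reachability probe is modelled as a synchronous callback; all of these are simplifying assumptions, and the function name is hypothetical.

```python
from typing import Callable, Dict


def insert_tried(table: Dict[int, str], slot: int, new_ip: str,
                 is_reachable: Callable[[str], bool]) -> bool:
    """Insert new_ip into a toy "Tried" table with test-before-evict.

    An occupied slot is only overwritten if its current entry is unreachable.
    Returns True if new_ip was stored.
    """
    current = table.get(slot)
    if current is None:
        table[slot] = new_ip
        return True
    if is_reachable(current):
        return False  # keep the existing, still-reachable peer
    table[slot] = new_ip
    return True


online = {"198.51.100.7"}
tried = {0: "198.51.100.7"}
print(insert_tried(tried, 0, "203.0.113.9", lambda ip: ip in online))  # False
online.clear()  # the old peer goes offline
print(insert_tried(tried, 0, "203.0.113.9", lambda ip: ip in online))  # True
```

The effect is that an attacker can no longer push out long-lived, honest peers simply by presenting newer addresses.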

Bitcoin hijacking

"Hijacking Bitcoin: Routing Attacks on Cryptocurrencies," Apostolaki 2017.

This essentially uses BGP hijacking to intercept all Bitcoin connections of the victim. It is viable since Bitcoin is highly centralized from a network perspective, e.g. 60% of all Bitcoin connections cross only 3 ISPs. Victim IPs need to belong to prefixes shorter than /24 to be hijackable, because the attacker announces a more-specific sub-prefix, and announcements more specific than /24 are typically filtered.
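The /24 condition follows from longest-prefix matching: routers forward along the most specific matching prefix, so announcing the two halves of a victim's prefix wins, but a /24 cannot be split without being filtered. A minimal sketch using Python's `ipaddress` module; the prefixes, AS labels and the /24 filtering limit are assumptions for illustration.

```python
import ipaddress
from typing import Dict, List, Optional

Routes = Dict[ipaddress.IPv4Network, str]


def more_specifics(prefix: str, max_len: int = 24) -> List[ipaddress.IPv4Network]:
    """Sub-prefixes an attacker could announce, assuming anything longer than max_len is filtered."""
    net = ipaddress.ip_network(prefix)
    if net.prefixlen >= max_len:
        return []  # already a /24 (or longer): no accepted more-specific exists
    return list(net.subnets(prefixlen_diff=1))


def lookup(routes: Routes, addr: str) -> Optional[str]:
    """Longest-prefix match: return the origin of the most specific covering prefix."""
    ip = ipaddress.ip_address(addr)
    covering = [net for net in routes if ip in net]
    if not covering:
        return None
    return routes[max(covering, key=lambda net: net.prefixlen)]


routes: Routes = {ipaddress.ip_network("10.1.0.0/23"): "AS_victim"}
for sub in more_specifics("10.1.0.0/23"):
    routes[sub] = "AS_attacker"

print(lookup(routes, "10.1.1.7"))      # AS_attacker
print(more_specifics("192.0.2.0/24"))  # []
```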

Hijacking fewer than 100 prefixes can isolate up to 47% of the mining power. While effective, BGP hijacks are highly visible and hence easy to detect.

An alternative is a delay attack by a man-in-the-middle adversary that relays packets between two nodes but delivers them late. This causes the victim node to waste mining power (e.g. mining on an outdated chain).

Erebus attack

"A Stealthier Partitioning Attack against Bitcoin Peer-to-Peer Network," Tran 2020.

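This attack works even with the Eclipse countermeasures in place. The idea is for the attacker, a malicious AS, to exploit the fact that it is already on the path between the victim and many IP addresses ("shadow IPs"): by spoofing these shadow IPs, it makes the victim connect to addresses whose traffic necessarily crosses the attacker's AS. An attacker only needs to be among the top-100 ASes to have access to millions of shadow IPs, as shown in the graph below: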

Since shadow IPs are geographically well distributed, the attack is difficult to detect.

The strategy is to flood the "New" table with shadow IPs and wait for them to trickle down into the "Tried" table via feeler connections (~2 min per IP). This effectively isolates the victim node after 5-6 weeks.

See the Erebus attack website. Here's also a table that nicely summarizes the last three attacks:

Week 11

Anti-censorship

In short, here's a nice map of internet restrictions:

Naive anti-censorship solutions fail because the proxy servers are known to censors as well as to users. This allows censors either to DoS the proxies, or to use deep packet inspection to see whether a proxy is being accessed.

Some desired properties of an anti-censorship system:

Unobservability
Unblockability
Plausible deniability
Deployment feasibility
Scalability

Decoy routing

Refer to the slides: a steganographic tag is used to leak session secrets to the Telex station.
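The idea of a tag that only the station can recognise can be illustrated with a toy Diffie-Hellman construction: the client derives a secret from the station's public key and a fresh random value, and sends only $$ g^r $$ plus a short check value; the station recognises the tag with its private key and recovers the same secret. This is loosely in the spirit of Telex, not its actual tagging scheme. The group parameters, the 4-byte check value and the function names are assumptions, and a real system would use a proper elliptic-curve group and hide the tag inside the TLS handshake.

```python
import hashlib
import secrets
from typing import Optional, Tuple

# Toy group parameters (NOT secure): prime modulus 2^127 - 1 with base 3.
P = 2**127 - 1
G = 3


def _digest(alpha: int, shared: int) -> bytes:
    return hashlib.sha256(alpha.to_bytes(16, "big") + shared.to_bytes(16, "big")).digest()


def make_tag(station_pub: int) -> Tuple[int, bytes, bytes]:
    """Client: build a tag only the station can recognise, plus the secret it leaks."""
    r = secrets.randbelow(P - 2) + 1
    alpha = pow(G, r, P)                   # looks like a random group element
    digest = _digest(alpha, pow(station_pub, r, P))
    return alpha, digest[:4], digest[4:]   # (tag part, check value, session secret)


def station_check(station_priv: int, alpha: int, check: bytes) -> Optional[bytes]:
    """Station: recompute the shared value; return the secret if the tag is genuine."""
    digest = _digest(alpha, pow(alpha, station_priv, P))
    return digest[4:] if digest[:4] == check else None


station_priv = secrets.randbelow(P - 2) + 1
station_pub = pow(G, station_priv, P)
alpha, check, client_secret = make_tag(station_pub)
print(station_check(station_priv, alpha, check) == client_secret)  # True
```

A censor without the station's private key sees only a random-looking group element and a few random-looking bytes.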

Unobservable circumvention

Another approach to censorship circumvention is to hide information within imitations of popular protocols (a.k.a. a Parrot system). This paper shows that current implementations do not mimic their target protocols perfectly, and are hence easily distinguishable.

SkypeMorph (2012)

Shapes its traffic to mimic an actual Skype connection. Some problems:

Unlike a genuine Skype client, it does not respond properly to changes in bandwidth (packet rate controlled via TCP control channel)

Other failed tests:

StegoTorus (2012)

Dependencies between links are revealed when the censor intentionally increases latency on one link

Does not behave like a regular HTTP server

CensorSpoofer (2012)

Imitates SIP (VoIP signalling)...?

The main recommendation is to run the actual protocol itself, instead of trying to mimic other protocols.

Cover protocols

A cover protocol runs a popular protocol and encodes the communication data within it, instead of trying to imitate the protocol.

One example is FreeWave, which runs IP over VoIP since (1) the high number of users means blocking would generate large collateral damage, and (2) the protocol is encrypted. It uses Skype as the cover protocol, i.e. data is encoded as audio and ...

Problems arise when the traffic produced by the data-encoding scheme does not match that of the actual protocol, e.g. in packet lengths.

Week 10

Anonymity

Anonymity is defined as an action being unidentifiable within a set of subjects (the anonymity set). Some direct applications:

Privacy-preserving web browsing (from marketers, governments)
Untraceable electronic mail (whistleblowing, political dissidents)
Digital cash (emulate unlinkability of paper money)
Anonymous voting / survey
Censorship-resistant publishing

A simple implementation of anonymization uses a proxy that relays messages between nodes. A single proxy is susceptible to traffic analysis (correlating packet sizes and timing) along the inbound and outbound links of the anonymizer; expanding it into a network of proxies mitigates this issue.

Mixes

An early proposal for anonymous email is "Untraceable electronic mail, return addresses, and digital pseudonyms" (Chaum 1981), which uses a trusted re-mailer together with public-key crypto (denoted as a Mix server).

All senders encrypt their message with the Mix's public key (together with a nonce) and specify the recipient, e.g. $$ A \rightarrow mix: \{r_0, \{r_1, M\}_{pk(B)}, B\}_{pk(mix)} $$, after which the Mix strips the outer layer and forwards $$ \{r_1, M\}_{pk(B)} $$ to B.
For return addressing, A can attach $$ \{ K_1, A \}_{pk(mix)}, K_2 $$ to the message $$ M $$ sent to B.
- B sends his reply $$ M' $$ with another nonce as $$ \{r_2, M'\}_{K_2} $$, encrypted with the symmetric key $$ K_2 $$, together with the return address $$ \{ K_1, A \}_{pk(mix)} $$ to the Mix.
- The Mix decrypts the return address to learn A and $$ K_1 $$, and additionally encrypts the reply with this other symmetric key $$ K_1 $$ before forwarding it to A.
- Note that $$ K_2 $$ is for confidentiality, and $$ K_1 $$ is for anonymity.

These messages are sent through a sequence of mixes (network of mixes => "mixnet"), which preserves anonymity as long as at least one mix in the sequence is trustworthy.

This gives rise to the concept of onion routing (Reed, Syverson, Goldschlag, 1997), i.e. routing info is contained within the message, and encrypted with the public key of routers along the path (so each router only learns the identity of the next router)
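The layering can be sketched in a few lines. As an assumption for illustration, the per-router public-key encryption is replaced by pre-shared symmetric keys and a toy SHA-256 counter-mode XOR keystream (not a vetted cipher, and without the fresh per-message nonces a real design needs); router names and helper functions are hypothetical.

```python
import hashlib
import secrets
from typing import Dict, List, Tuple

HOP_FIELD = 16  # fixed-width next-hop field in each layer


def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 in counter mode (illustration only)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_crypt(key: bytes, data: bytes) -> bytes:
    """Encrypts or decrypts, since XOR with the same keystream is its own inverse."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


def wrap(route: List[str], keys: Dict[str, bytes], dest: str, message: bytes) -> bytes:
    """Build the onion from the inside out; each layer names only the next hop."""
    next_hops = route[1:] + [dest]
    onion = message
    for router, nxt in reversed(list(zip(route, next_hops))):
        layer = nxt.encode().ljust(HOP_FIELD, b"\0") + onion
        onion = xor_crypt(keys[router], layer)
    return onion


def peel(router: str, keys: Dict[str, bytes], onion: bytes) -> Tuple[str, bytes]:
    """A router strips its own layer and learns where to forward the rest."""
    layer = xor_crypt(keys[router], onion)
    return layer[:HOP_FIELD].rstrip(b"\0").decode(), layer[HOP_FIELD:]


keys = {r: secrets.token_bytes(32) for r in ("R1", "R2", "R3")}  # hypothetical per-router keys
onion = wrap(["R1", "R2", "R3"], keys, "Bob", b"hello")
hop, data = "R1", onion
while hop in keys:  # forward until the payload leaves the router set
    hop, data = peel(hop, keys, data)
print(hop, data)  # Bob b'hello'
```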

The main problem is that public-key encryption is computationally expensive, which results in high latency.

Tor

Tor is a second-generation onion routing network (Dingledine, Mathewson, Syverson, 2003).

The Tor circuit is set up by establishing symmetric session keys with each onion router (OR) one hop at a time, using DH key agreement. Authentication is unilateral (only the ORs are authenticated).

Alice establishes a TLS session with OR1 (assigned circuit ID 1 below)
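Alice sends her DH share $$ g^{x_1} $$, encrypted with OR1's public onion key, to OR1, which replies with $$ g^{y_1} $$ to complete the DH protocol, giving the shared key $$ K_1 = g^{x_1 y_1} $$
Alice then asks OR1 to relay an extension of the circuit to OR2, repeating the DH exchange with OR2 through OR1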
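The hop-by-hop key agreement can be sketched with a toy DH group (the modulus $$ 2^{127} - 1 $$ with base 3 is an assumption chosen for readability, not a secure group) and a hypothetical `OnionRouter` class. Relaying the request through earlier hops and authenticating the OR with its onion key are left out.

```python
import hashlib
import secrets
from typing import List

# Toy DH group (NOT secure): prime modulus 2^127 - 1, base 3.
P = 2**127 - 1
G = 3


def kdf(shared: int) -> bytes:
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()


class OnionRouter:
    """Hypothetical OR that answers one circuit-creation request."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.key = b""

    def handle_create(self, g_x: int) -> int:
        y = secrets.randbelow(P - 2) + 1
        self.key = kdf(pow(g_x, y, P))  # K_i = g^(x_i * y_i)
        return pow(G, y, P)             # g^(y_i) goes back to Alice


def build_circuit(routers: List[OnionRouter]) -> List[bytes]:
    """Alice agrees a fresh key with each OR, one hop at a time.

    In Tor, the request to hop i is relayed (encrypted) through hops 1..i-1;
    that relaying is omitted here.
    """
    keys = []
    for router in routers:
        x = secrets.randbelow(P - 2) + 1
        g_y = router.handle_create(pow(G, x, P))
        keys.append(kdf(pow(g_y, x, P)))
    return keys


ors = [OnionRouter(n) for n in ("OR1", "OR2", "OR3")]
alice_keys = build_circuit(ors)
print(all(k == r.key for k, r in zip(alice_keys, ors)))  # True
```

Each OR ends up knowing only its own key, while Alice holds all three and can apply them as nested layers.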

Main benefits:

Client-side, many applications can share one circuit
Tor routing does not need root privileges or kernel modification (democratization of routing => more routers => better anonymity)

For clients to learn about other ORs, first-generation onion routing used in-band network status updates (which incur expensive flooding, and possibly partitioning attacks?). Today, Tor uses directory servers.

Hidden services over Tor

Responder anonymity is achieved by advertising the hidden service (.onion) via introduction points on a set of ORs, with the actual communication taking place at a rendezvous point.

Bob sets up hidden service:
- Bob looks for multiple nodes willing to be Introduction Point
- Bob notifies Directory Server of service name and IP address of Introduction Point
Alice as client wants to connect to Bob's service:
- Alice contacts Directory Server to retrieve IP address of Introduction Point
- Alice looks for a node to be Rendezvous Point
- Alice sends IP address of Rendezvous Point to Introduction Point, which in turn forwards it to Bob
- Both Alice and Bob connect to Rendezvous Point to communicate

Note that all connections to the Introduction and Rendezvous Points are initiated and sustained by the client and server over Tor circuits, i.e. the IP addresses of client and server are not known to these ORs.

Attacks on Tor anonymity

Network attackers with partial/global network access (by tapping ISPs)
- Traffic correlation attacks: passive monitoring of cells, identification of client-server pairs
  - e.g. state ISP controls both entry and exit hops
  - Intersection attack, e.g. correlation of multiple content timestamps with online users
  - Prefix hijacking attack
- Traffic confirmation attacks: active marking of flows
Malicious exit nodes: can see destination and perhaps application layer protocol
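The intersection attack above is essentially set intersection over observations. A minimal sketch; the observation data and function name are hypothetical.

```python
from typing import Dict, List, Set


def intersection_attack(online_at: Dict[int, Set[str]], post_times: List[int]) -> Set[str]:
    """Users seen online at every time an anonymous post appeared."""
    observations = [online_at.get(t, set()) for t in post_times]
    return set.intersection(*observations) if observations else set()


# Hypothetical observations: who was online at each timestamp.
online_at = {
    1: {"alice", "bob", "carol"},
    2: {"alice", "carol", "dave"},
    3: {"carol", "erin"},
}
print(intersection_attack(online_at, [1, 2, 3]))  # {'carol'}
```

Each additional observation can only shrink the candidate set, which is why long-lived pseudonymous activity is especially exposed.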

Possible defenses include mixing, padding and traffic shaping, but these introduce latency. Note that the goals of Tor do not include:

Side channels (e.g. timing, packet size)
Traffic analysis
Securing of directory servers
Defending against malicious ORs
Hiding network connections
"Lack of anonymity set"

A short comment on Operation Onymous, which advertises a subset of controlled OR nodes.

Week 9

Denial of service (DoS)

Definition: "A group of authorized users of a specified service makes said service unavailable to another group of authorized users, for a period of time exceeding the intended/advertised service maximum waiting time."

See the distribution of DDoS attacks, obtained from this article. The article additionally mentions that the sizes of attacks follow a Zipfian distribution.

Week 8

BGP routing

Same high-level concepts as usually introduced:

Autonomous systems (AS), each with their own 16-bit (now extended to 32-bit) number (ASN) (notable ones include Starhub 4657, NUS 7472, Singtel 7473, Google 15169, 36561...)
BGP routers are edge routers sitting between ASes, running BGP, which is a policy-based path vector routing protocol
- Entire world sees routing advertisements
- Policies are driven by economics
IGP (Interior Gateway Protocol) handles routing within the AS

A bit more detail on BGP routing: the AS that originates the end-host's prefix advertises to other ASes that it can route to said end-host. Each AS that accepts the route prepends its own ASN and re-advertises it, so the announcement propagates iteratively.

Under the hood, the ASes managed by ISPs are organized hierarchically, with Tier 1 ISPs at the top. ISPs can act as peers and exchange traffic over peering links; where there are no such links, they rely on upper-level ISPs (providers) for routing. The choice of routes follows a rough precedence order:

Routes are imported from all three types of neighbors: customers, peers and higher-level providers. When exporting routes:

Since the customer pays the ISP for access, they get all the routes (i.e. routes advertised by other customers, peers and providers).
Peers and providers only obtain the minimal routing information that the ISP is beholden to advertise, i.e. routes to itself internally, as well as to its customers.

This leads to the Gao-Rexford model of BGP routes: each AS prefers customer routes over peer routes over provider routes, and a valley-free property emerges for traversal and advertisement, i.e. no BGP route goes down to a customer and then back up to a provider.
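The three Gao-Rexford ingredients described above (preference, export rule, valley-free paths) can be sketched as small functions. Relationships are encoded as strings, which is an assumption of this sketch; real BGP implementations express these rules through local-preference values and per-neighbour export filters.

```python
from typing import List, Tuple

# Relationship of the neighbour a route was learned from.
PREFERENCE = {"customer": 3, "peer": 2, "provider": 1}


def best_route(routes: List[Tuple[str, List[int]]]) -> Tuple[str, List[int]]:
    """Prefer customer > peer > provider routes, then the shorter AS path."""
    return max(routes, key=lambda r: (PREFERENCE[r[0]], -len(r[1])))


def should_export(learned_from: str, to: str) -> bool:
    """Own and customer routes go to everyone; peer/provider routes only to customers."""
    return learned_from in ("self", "customer") or to == "customer"


def is_valley_free(links: List[str]) -> bool:
    """Check a path of link types: "up" (customer->provider), "peer", "down".

    Valid paths look like: up* peer? down*.
    """
    phase = 0  # 0 = climbing, 1 = crossed the single peer link, 2 = descending
    for link in links:
        if link == "up":
            if phase > 0:
                return False  # climbing again after peer/down forms a valley
        elif link == "peer":
            if phase > 0:
                return False  # at most one peer link, only at the top
            phase = 1
        elif link == "down":
            phase = 2
        else:
            raise ValueError(f"unknown link type: {link}")
    return True


print(best_route([("provider", [64500]), ("customer", [64501, 64502])]))  # ('customer', [64501, 64502])
print(is_valley_free(["up", "peer", "down"]), is_valley_free(["down", "up"]))  # True False
```

The export rule is what produces the valley-free property: a route learned from a provider is never re-exported upwards, so no path can go down and then up again.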

Attacks on BGP

Prefix theft: an AS falsely advertises prefixes (i.e. injecting itself on the path to the destination) for routing
- This attack can be easily detected
AS path interception: either truncation or alteration
1. One-hop prefix hijacking resulting in an invalid next-hop

Whether an attack succeeds depends on which invalid routes an AS accepts, typically when the new route is cheaper (and shorter) than the original valid route, i.e. it is easier to misdirect routes going to customers than to providers (the latter requires more money).

Countermeasures:

Resource Public Key Infrastructure (RPKI): each AS can only advertise address blocks that it owns, i.e. prefixes are bound to the ASes that can originate them (via signed Route Origin Authorizations, ROAs) -> prevents prefix hijacking
BGPsec: each AS signs the AS path of the routing announcements it forwards -> prevents prefix hijacking, and interception to some extent
- Problems with real-time signatures and validation, as well as differences in BGPsec message formats (?)
- Next-AS attacks are possible if the adjacent AS does not adopt BGPsec [Lychev et al., SIGCOMM13]
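The origin check that RPKI enables can be sketched as follows, loosely following the valid/invalid/not-found outcomes of RPKI route origin validation (RFC 6811). The ROA contents, prefixes and private-use ASNs are hypothetical.

```python
import ipaddress
from typing import List, NamedTuple


class ROA(NamedTuple):
    """Route Origin Authorization: prefix, maximum announced length, authorized AS."""
    prefix: str
    max_length: int
    asn: int


def validate_origin(route_prefix: str, origin_asn: int, roas: List[ROA]) -> str:
    """Return "valid", "invalid" or "not-found" for a BGP announcement."""
    route = ipaddress.ip_network(route_prefix)
    covered = False
    for roa in roas:
        roa_net = ipaddress.ip_network(roa.prefix)
        if route.version != roa_net.version or not route.subnet_of(roa_net):
            continue
        covered = True
        if roa.asn == origin_asn and route.prefixlen <= roa.max_length:
            return "valid"
    return "invalid" if covered else "not-found"


roas = [ROA("10.1.0.0/23", 23, 64500)]  # hypothetical ROA
print(validate_origin("10.1.0.0/23", 64500, roas))   # valid
print(validate_origin("10.1.1.0/24", 64511, roas))   # invalid (sub-prefix hijack)
print(validate_origin("192.0.2.0/24", 64511, roas))  # not-found (no ROA)
```

Only the origin AS is checked: an attacker announcing the legitimate prefix with a forged path ending in AS 64500 would still be classified as valid, which is the gap BGPsec targets.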

SCION

SCION aims to achieve properties including availability (preventing DDoS attacks, no kill switch), sovereignty, transparency and secrecy. It tries to maintain the following architectural principles:

Stateless packet forwarding (inconsistent states are problematic)
Path-aware networking (sender embeds path information, with multipath capability)
Avoids relying on a single global root of trust due to its vulnerability, and instead relies on local roots of trust

Dependency analysis for software reliability! Avoid circular dependencies. Also formal verification of software.

The architecture groups ASes into Isolation Domains (ISDs), each managed by an ISD core formed by administrative (core) ASes. Within the ISD, the core ASes handle two kinds of advertisement:

Up-path segments disseminated by the core AS (essentially flooding clients)
Down-path segments chosen by the core AS to advertise to core AS in other ISDs

Since the clients have all up-, core- and down-segments, they can intersect routing information to create shorter paths. Some extensions are possible for direct peering links.
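A rough sketch of how an end host might stitch segments together, including a shortcut when the up- and down-segments share an AS. AS names are hypothetical, and each segment is reduced to a list of AS identifiers, ignoring the hop fields and cryptographic authenticators that real SCION segments carry.

```python
from typing import List


def combine_segments(up: List[str], core: List[str], down: List[str]) -> List[str]:
    """Join up-, core- and down-segments into one end-to-end AS-level path.

    up: source AS -> core AS, core: core AS -> core AS, down: core AS -> destination AS.
    If up and down share an AS, take a shortcut that skips the core.
    """
    shared = [asn for asn in up if asn in down]
    if shared:
        cut = shared[0]  # the shared AS closest to the source
        return up[: up.index(cut) + 1] + down[down.index(cut) + 1 :]
    if up[-1] != core[0] or core[-1] != down[0]:
        raise ValueError("segments must meet at core ASes")
    return up + core[1:] + down[1:]


print(combine_segments(["S", "A", "C1"], ["C1", "C2"], ["C2", "B", "D"]))
# ['S', 'A', 'C1', 'C2', 'B', 'D']
print(combine_segments(["S", "A", "C1"], ["C1"], ["C1", "A", "D"]))
# shortcut via A: ['S', 'A', 'D']
```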

Relatively high coverage in Switzerland, as well as some subset of networks in Singapore.

Disadvantages:

Q: How does this mitigate problems with advertising false paths? Reliance on core AS to correctly advertise down-segments - same problem as BGP. Avoid signing

Problems with RAPTOR go away when the Tor layer is run on top of SCION, to achieve private communication. In SCION, there is a possibility of using peering links that are only known to the host.
SCION documentation and education. Security from multiple levels: FPKI, TRC certificates, etc. Some slides, may be useful...?
Also has a domain that seems to be malicious lmao: http://sbasdemo.net/
Useful links: BGPStream, DigitalAttackMap

Week 7

DNS

Some keywords:

DNS is a distributed system
DNS namespace is organized as a tree whose nodes are labels
Namespaces are broken up into zones, each with its own authoritative name server, and authority over subtrees can be delegated, demarcating zone boundaries.

Main threats to DNS: (1) Availability: DDoS, (2) Integrity: Malicious open recursive resolvers, network MitM, cache poisoning, (3) Confidentiality: Pervasive monitoring, censorship

To defend against DDoS, operators can overprovision DNS servers and rely on anycast (typically implemented with BGP) for the root DNS.

DNSSEC

A possibly interesting overview of cache poisoning attacks, as well as a comparison between DNSCurve and DNSSEC (the former provides point-to-point security, but does not fit the goals of DNS). Cache poisoning is essentially achieved by an attacker forcing a nameserver to resolve the target domain to the attacker's own IP address (when the domain is not currently cached).

Before the adoption of the Bailiwick rule in 1993, which prevents nameservers from answering outside their authorized zones, DNS caches could be poisoned using arbitrary domains.
Alternatively, an attacker can trigger DNS queries and flood spoofed responses, guessing the random 2-byte query ID. The 2008 Kaminsky attack injects the malicious DNS-IP mapping into the "Additional" section of responses to queries for random subdomains: such subdomains are unlikely to be cached (so the attacker can retry immediately), and the injected records satisfy the Bailiwick rule. This is mitigated by also randomizing the UDP source port used for DNS queries.
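Why source-port randomization helps is easy to quantify. A minimal sketch, assuming uniformly random secrets, distinct guesses within each race, that the attacker's replies always arrive before the legitimate one, and that port randomization adds a full 16 bits (real resolvers may use a smaller port range, so this overstates the gain). The numbers of spoofed replies and races are illustrative.

```python
def poisoning_probability(spoofed_per_race: int, races: int, entropy_bits: int) -> float:
    """Chance that at least one spoofed reply matches the resolver's secret values."""
    space = 2 ** entropy_bits
    p_race = min(1.0, spoofed_per_race / space)
    return 1 - (1 - p_race) ** races


# Kaminsky-style: a fresh random subdomain per race lets the attacker retry at once.
print(poisoning_probability(100, 1000, 16))  # ~0.78 with a 16-bit ID only
print(poisoning_probability(100, 1000, 32))  # ~2.3e-05 with ID + 16-bit random port
```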

DNSSEC provides integrity of responses by signing DNS records along a chain of trust (the root zone vouches for each TLD's key, and so on). Responses are pre-signed and cached to minimize signing overhead. Denial of existence (proving that a queried name does not exist) is achieved by using a hash function and returning a signed hash range within which no valid names exist. Main adoption challenge: the large response size of DNSSEC can trigger TCP fallback, and some networks filter TCP responses.
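The hash-range idea behind denial of existence can be sketched as follows, loosely modelled on NSEC3 (RFC 5155): existing names are hashed and sorted once, and a query for a missing name is answered with the pair of neighbouring hashes that brackets its hash. The salt, iteration count and hashing of the text form of the name (rather than the wire format) are simplifying assumptions, and signing is omitted.

```python
import bisect
import hashlib
from typing import List, Tuple


def name_hash(name: str, salt: bytes = b"\xab\xcd", iterations: int = 2) -> str:
    """Simplified NSEC3-style hash: salted SHA-1, re-hashed `iterations` extra times."""
    digest = hashlib.sha1(name.lower().encode() + salt).digest()
    for _ in range(iterations):
        digest = hashlib.sha1(digest + salt).digest()
    return digest.hex()


def build_chain(names: List[str]) -> List[str]:
    """Sorted hashes of the names that exist in the zone (signed once, offline)."""
    return sorted(name_hash(n) for n in names)


def covering_range(chain: List[str], query: str) -> Tuple[str, str]:
    """Return the (hash, next hash) pair whose interval contains the query's hash.

    The last interval wraps around from the largest hash back to the smallest.
    """
    h = name_hash(query)
    if h in chain:
        raise ValueError("name exists; no denial of existence needed")
    i = bisect.bisect_left(chain, h)
    return chain[i - 1], chain[i % len(chain)]


chain = build_chain(["example.com", "www.example.com", "mail.example.com"])
print(covering_range(chain, "nope.example.com"))
```

Because the ranges are fixed by the zone contents, they can be signed in advance, which fits the pre-signing approach described above.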

DNS-over-Encryption

Problem: Pervasive monitoring of DNS traffic by state-sponsored actors (e.g. NSA's QuantumDNS (2014) and MoreCowBell (2015)) to perform user tracking [Kirchler 2016] and user behaviour analysis [Kim 2015].

Some notable protocols are listed below; a paper characterizes implementation fidelity and pervasiveness on server-side DoE, reachability and performance on the client side, and finally the usage of DoT and DoH.

DNS-over-TLS (DoT) [RFC7858] that wraps DNS lookups with TLS session
- Employs port 853 to distinguish from other TLS traffic
- Client-side support is mostly implemented in the Android/Linux OS resolver rather than in browsers (for obvious reasons), while server-side open DoT resolvers are mostly owned by large providers
- Similar RFC for DNS-over-DTLS, essentially TLS over UDP instead of TCP, but functions primarily as a backup proposal for DoT with no real-world implementation
DNS-over-HTTPS (DoH) [RFC8484], mixing both the HTTPS and DNS application layer protocols
- DNS query either in URI parameter or POST message body
- Mostly implemented by major browser vendors, e.g. Chromium, Firefox
- De-facto standard as of March 2020, though it faces reachability issues, e.g. China blocking DoH traffic
DNSCrypt (2011) has low uptake, and DNS-over-QUIC was too advanced for its time?
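To make the DoH GET form concrete: RFC 8484 sends the ordinary DNS wire-format query, base64url-encoded without padding, in the `dns` query parameter (with media type `application/dns-message`). A minimal sketch that only builds the URL; the endpoint `https://doh.example/dns-query` is hypothetical and no request is sent.

```python
import base64
import struct


def build_query(name: str, qtype: int = 1) -> bytes:
    """DNS wire-format query for `name` (qtype 1 = A, class IN), ID 0 with RD set."""
    header = struct.pack("!HHHHHH", 0, 0x0100, 1, 0, 0, 0)  # ID, flags, QD/AN/NS/AR counts
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii") for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def doh_get_url(endpoint: str, name: str) -> str:
    """RFC 8484 GET form: the query goes base64url-encoded, unpadded, in `dns`."""
    encoded = base64.urlsafe_b64encode(build_query(name)).rstrip(b"=").decode("ascii")
    return f"{endpoint}?dns={encoded}"


print(doh_get_url("https://doh.example/dns-query", "www.example.com"))
# https://doh.example/dns-query?dns=AAABAAABAAAAAAAAA3d3dwdleGFtcGxlA2NvbQAAAQAB
```

Using ID 0 keeps identical queries byte-identical, which makes responses HTTP-cacheable; on the wire the request then looks like any other HTTPS traffic to that server.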

Worth reading up on Encrypted SNI and its ties to TLS 1.3. This might also be something interesting to read up on: ycombinator. DoH is increasingly used by malware for command-and-control (C2), see article.

Week 6

Honeypots and threat analysis

The goal is to detect indications of upcoming attacks, and to collect data during an attack to derive intelligence, e.g. where? what tools? attack progression and vectors? patterns/trends? These are collectively used to evaluate the security of the real system by checking for vulnerabilities.

Two types of honeypots, see a larger list here:

Low-interaction: Produces minimal responses for services, mainly used for statistical evaluation
- Industrial Control Systems (ICS) honeypot that has TCP listeners for popular smart grid related ports and dummy servers
- CONPOT
High-interaction: Emulate realistic system to attack, consisting of dummy devices and network topology
- Emulation of the whole smart grid monitoring and control infrastructure, e.g. a honeypot that also includes virtualized workstations and doubles as a cyber range (for cyberattack experiments)
- Cowrie (SSH)
- Honeyd honeypot framework, including OS fingerprint spoofing

This contrasts with decoy networks, which are deployed alongside the real system with virtual devices to confuse attackers, e.g. DecIED (9-anonymity for example). Shodan is a good way to identify the websites.

Week 5

On the TCP/IP stack. Something about r-utilities for remote control/login, e.g. rlogin.

Accountable IP (AIP) performs self-certification from... fell asleep...

Need to look up the mechanism of AIP. Something about session-based keys, whose public key is then hashed to form an identifier.

The other topic on TCP.

Week 4

Public Key Infrastructure

More on digital certificates/signatures, and PKI in general:

How a digital certificate and certificate authority works
- Multiple CA is the typical PKI model
- Self-signed, domain-validated, extended validation (EV) certificate
Secure channel set up using certificate's public key

Trusted root CAs are typically pre-installed (e.g. in browser and OS trust stores) and function as pre-shared keys (see 2012 MSR SV PKI project, and others). There have been many CA compromise/breach incidents:

Misconfigured CAs
- 2015 eDellRoot certificate - Dell shipped laptops that trust a self-signed root CA, and also included the private key
CA breaches - e.g. Symantec
Certified lies
Compelled certificates

Certificates are invalidated via revocation, either by publishing a CRL (certificate revocation list) and pushing delta CRLs during updates, or via OCSP (online certificate status protocol). Notably, OCSP avoids issues with mass certificate revocation, e.g. during a CA compromise, where CRLs become very large. Something about OCSP stapling as well.

Other than the obvious traffic overhead of revocation queries in OCSP, there is also a potential privacy concern, since the CA learns which sites the user visits.

Some techniques attempted to increase security of PKI:

2015 HTTPS Public Key Pinning (HPKP), detailed in RFC 7469, where the certificate chain must include a pinned (trusted) public key. Long-lived pins of compromised CAs can compromise security.
2008 Perspectives where users trust a set of notary servers for majority vote of validation - each notary server contacts known SSL/TLS servers every day and stores entire history of observed certificates and support queries. This allows detection of MitM and compelled certificates (i.e. one misbehaving CA), but requires additional connections.
2011 Convergence enhances Perspectives to perform lookup via 2-step onion routing: 1st server redirects query to 2nd server which responds to check if key returned by domain is equal. Incurs additional latency.
2013 Certificate Transparency (CT) - uses public append-only logs of issued certificates, both authorized and unauthorized, providing proofs of inclusion.
- Uses a Merkle tree as the data structure, with a CT log server distributing the signed tree head (STH) $$ \{ TH \}_{K_{\text{log}}^{-1}} $$. Auditability comes from analysis of the logs, which are public: verify that new entries come after old ones, and that the new log tree contains the old log tree.
- There is a discussion summary thing by one Daniel W Lee? Can discuss over forums.
- Concerns: Logs need to be highly available -> geographic distribution -> need for a consensus mechanism that also has to deal with latency. Leakage of non-public domains, with subdomain enumeration possible.
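The Merkle tree hash that a CT log commits to can be sketched directly from its definition in RFC 6962: leaves are hashed with a 0x00 prefix, interior nodes with a 0x01 prefix, and the entries are split at the largest power of two smaller than their count. A signed tree head signs this root together with the tree size and a timestamp; signing is omitted here and the log entries are placeholders.

```python
import hashlib
from typing import List


def _sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_tree_hash(entries: List[bytes]) -> bytes:
    """Merkle Tree Hash as defined for Certificate Transparency (RFC 6962)."""
    n = len(entries)
    if n == 0:
        return _sha256(b"")
    if n == 1:
        return _sha256(b"\x00" + entries[0])  # leaf hash, 0x00 domain separation
    k = 1
    while k * 2 < n:
        k *= 2  # largest power of two strictly smaller than n
    return _sha256(b"\x01" + merkle_tree_hash(entries[:k]) + merkle_tree_hash(entries[k:]))


log = [b"cert-a", b"cert-b", b"cert-c"]
root_3 = merkle_tree_hash(log)
root_4 = merkle_tree_hash(log + [b"cert-d"])
print(root_3 != root_4)  # True: appending an entry changes the root
```

The leaf/node prefixes prevent a leaf from being passed off as an interior node, and the fixed split rule is what allows compact inclusion and consistency proofs between an old and a new tree.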

In CT, the certificate is sent to a number of logs, with each log issuing a signed certificate timestamp (SCT) that promises the certificate will be inserted within a maximum merge delay (MMD), e.g. 24 hours. This SCT is presented together with the domain certificate, either via an X.509 extension (embedded by the CA) or via TLS directly (handled by the server).

CT is more of a community-maintained protocol, since it relies on external parties to monitor the behaviour of log servers, on consensus between web clients, and on auditors to identify misbehaviour. Note that CT is not a revocation mechanism; it only helps identify misbehaviour.

As of 2016, two widespread PKIs in use: CA Baseline Requirements, and IETF PKIX.

Week 3

Key establishment/distribution

A secure session is established by sharing a symmetric secret key, typically a randomly chosen session key instead of a permanent key, which helps avoid replay attacks.

Instead of relying on PKI for public key distribution (to initiate sharing of session key), both parties can also rely on a trusted key distribution center (KDC) via the Needham-Schroeder protocol.

KDC issues pairwise secret session keys
KDC performs key distribution, revocation, etc. and thus must be online during key distribution
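For reference, the usual textbook form of the symmetric-key Needham-Schroeder exchange, where $$ K_A $$ and $$ K_B $$ are the long-term keys A and B share with the KDC, $$ K_{AB} $$ is the fresh session key and $$ N_A, N_B $$ are nonces (notation assumed here, not taken from the slides):

$$
\begin{aligned}
&1.\; A \rightarrow KDC: && A,\ B,\ N_A \\
&2.\; KDC \rightarrow A: && \{N_A,\ K_{AB},\ B,\ \{K_{AB},\ A\}_{K_B}\}_{K_A} \\
&3.\; A \rightarrow B: && \{K_{AB},\ A\}_{K_B} \\
&4.\; B \rightarrow A: && \{N_B\}_{K_{AB}} \\
&5.\; A \rightarrow B: && \{N_B - 1\}_{K_{AB}}
\end{aligned}
$$

Message 3 carries no freshness for B, so an attacker who later compromises an old $$ K_{AB} $$ can replay it (the Denning-Sacco attack); Kerberos adds timestamps to address this.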

Week 2

Asymmetric encryption

Covering asymmetric encryption primitives, namely how public-key encryption works: usage in authentication, digital signatures (non-repudiation), Diffie-Hellman key agreement (discrete log problem)

DH key agreement does not provide authentication, and is vulnerable to man-in-the-middle attacks. This is mitigated by authenticating the exchange, e.g. with RSA signatures. A rough overview. Public-key infrastructure (PKI) is defined in RFC 4949 as the set of resources for the management of digital certificates based on asymmetric cryptography.

Importantly, symmetric crypto is about 3 orders of magnitude faster than asymmetric crypto. This is a pretty useful comparison:

Hash functions

Hash functions are simply functions that map inputs of arbitrary length to outputs/digests of fixed length L, i.e. $$ H: \{0,1\}^* \rightarrow \{0,1\}^L $$. For a cryptographically secure hash function, three properties must be satisfied:

Collision resistance: Difficult to find $$ x $$ and $$x'$$ such that $$H(x) = H(x')$$.
Second-preimage resistance: Difficult to find $$ x' $$ such that $$H(x) = H(x')$$, given $$ x$$.
Preimage resistance: Difficult to find $$x$$ such that $$H(x) = y$$, given $$y$$.

For an n-bit digest (a search space of size $$ 2^n $$), a brute-force random search against preimage or second-preimage resistance needs on average half the search space, i.e. about $$ 2^{n-1} $$ trials. Breaking collision resistance is much easier, needing only about $$ 2^{(n/2)} $$ trials on average (the birthday paradox), since the attacker can compare each new output against every output previously seen.
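The birthday bound can be observed directly by truncating SHA-256 to a small number of bits so that a collision search finishes quickly. A minimal sketch; the 24-bit truncation and the choice of inputs (decimal counters) are arbitrary assumptions.

```python
import hashlib
import itertools
from typing import Dict, Tuple


def truncated_hash(data: bytes, bits: int) -> int:
    """Top `bits` bits of SHA-256, shortened so that a collision search is feasible."""
    full = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return full >> (256 - bits)


def find_collision(bits: int) -> Tuple[bytes, bytes, int]:
    """Birthday search: remember every digest seen until one repeats."""
    seen: Dict[int, bytes] = {}
    for i in itertools.count():
        msg = str(i).encode()
        digest = truncated_hash(msg, bits)
        if digest in seen:
            return seen[digest], msg, i + 1
        seen[digest] = msg
    raise AssertionError("unreachable")


x, x2, trials = find_collision(24)
print(trials)  # on the order of 2**12, far below the ~2**23 needed for a preimage
```

The dictionary of previously seen digests is exactly the "compare with any earlier output" step; it trades memory for the square-root speed-up.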

Probability of hash collision with only collision resistance

Common usage of hash functions:

Fingerprinting
S/KEY as a one-time password system
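- A hash chain is generated, with say $$ H^{100}(\text{secret}) $$ stored on the server. The user authenticates by supplying $$ H^{99}(\text{secret}) $$, which the server verifies by hashing it once and comparing with its record; the server then stores $$ H^{99}(\text{secret}) $$ for the next login.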
- Vulnerabilities include MitM as well as race conditions (attacker reading first N-1 values then test N values, though mitigated by wrapping in a secure shell session).
Merkle hash trees
Password hashing for password storage at rest
- Vulnerabilities in limited password space, and the use of rainbow tables (pre-computation of hashes). Mitigated by slowing hash functions (e.g. bcrypt) and use of a random salt to prevent pre-computation.
Commitment schemes, e.g. C(m) to hide "m" and bind sender to "m".
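The S/KEY hash chain described above can be sketched as follows. SHA-256 is used purely for illustration, and the class and function names are hypothetical.

```python
import hashlib


def h(value: bytes) -> bytes:
    return hashlib.sha256(value).digest()  # SHA-256 used here purely for illustration


def hash_chain(secret: bytes, n: int) -> bytes:
    """Compute H^n(secret) by hashing n times."""
    value = secret
    for _ in range(n):
        value = h(value)
    return value


class SKeyServer:
    """Stores only the last accepted chain value, never the secret itself."""

    def __init__(self, initial: bytes) -> None:
        self.current = initial  # e.g. H^100(secret)

    def login(self, otp: bytes) -> bool:
        if h(otp) == self.current:
            self.current = otp  # next login must present the previous chain element
            return True
        return False


secret = b"hypothetical user secret"
server = SKeyServer(hash_chain(secret, 100))
print(server.login(hash_chain(secret, 99)))  # True
print(server.login(hash_chain(secret, 99)))  # False: replaying a used OTP fails
print(server.login(hash_chain(secret, 98)))  # True
```

A stolen server record is useless for logging in, since producing the next OTP would require inverting the hash (preimage resistance).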

Week 1

Module information

Lecturer: Daisuke Mashima (mashima@comp.nus.edu.sg, https://www.mashikma.us/daisuke/index.html)
Office hours: Tue-Thurs 6-7pm (Zoom: coordinate on prior day via email) (Physical: CREATE TOWER #14-02 UTown)
Content: Module itself focuses on emerging systems in the market, as well as research opportunities. One or two research paper readings per lecture (focus on threat model and security assumptions).
Books: Kaufman, Network Security
Grading: 2 take-home exams 25% each, 6 biweekly Canvas online 30 min quizzes 25% from top 5, individual mini-project (announced week 7) 20%, participation attendance discussion summary 5%.
1. Alternative to mini-project is a cyberattack/defence mechanism paper from top-tier conferences, implement an attack/defence tool and show demonstration (5-10 page report denoting tool and usage)

Contact Prof Seth Gilbert regarding the CS3235 pre-requisite. [4236]

Introduction

Principles of information security:

Confidentiality (keep someone else's data secret)
Privacy (keep data about person secret)
Availability (keep data accessible)
Anonymity (keep identity of protocol participant secret)
Secrecy (keep data hidden from unintended participants)
Integrity (ensure stored data is correct)
Entity authentication (verify identity of protocol participant)
Data authentication (verify integrity of transmitted data and sender) - related to non-repudiation of origin using digital signatures

Some terminology notes:

Signature provides authentication + public verification (i.e. convince others of the origin of the data)
Authorization (allow entity to perform action) vs Auditability (enable forensic activities)

Symmetric encryption schemes

The one-time pad (OTP) is the model for stream ciphers, e.g. RC4 and AES in CTR mode, where the initialization vector (IV) and ciphertext have to be transmitted. Essentially, the shared key $k$ is expanded into a keystream using PRG($k$, IV), which is XOR-ed with the plaintext. Noted vulnerabilities:

Keystream reuse attack - avoid reuse of IV
Ciphertext modification attack
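Both vulnerabilities can be demonstrated with a toy PRG($k$, IV), here SHA-256 in counter mode as an assumption standing in for a real stream cipher; the key, IV and messages are made up.

```python
import hashlib


def keystream(key: bytes, iv: bytes, length: int) -> bytes:
    """Toy PRG(k, IV): SHA-256 in counter mode (illustration only)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + iv + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


key, iv = b"k" * 16, b"\x00" * 12  # the same IV is (wrongly) used twice below
p1, p2 = b"attack at dawn!!", b"retreat at noon!"
c1 = xor(p1, keystream(key, iv, len(p1)))
c2 = xor(p2, keystream(key, iv, len(p2)))

# Keystream reuse: the keystream cancels, so c1 XOR c2 == p1 XOR p2 without the key.
print(xor(c1, c2) == xor(p1, p2))  # True
print(xor(xor(c1, c2), p1))        # b'retreat at noon!' once p1 is known or guessed

# Ciphertext modification: flipping ciphertext bits flips the same plaintext bits.
delta = xor(b"attack", b"defend") + bytes(10)
forged = xor(c1, delta)
print(xor(forged, keystream(key, iv, len(forged))))  # b'defend at dawn!!'
```

The second attack needs no key at all, only a guess of the plaintext at the modified position, which is why a MAC is needed on top of encryption.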

Block ciphers (e.g. DES, RC5, AES) extend the encryption unit from individual bits to fixed-size blocks, so the key itself defines a mapping from plaintext block to ciphertext block. The key space should be at least 128-bit to prevent key enumeration attacks.

AES (i.e. Rijndael) is particularly efficient: it runs in 28 cycles/byte in software, and 3.5 cycles/byte with Intel's AES-NI
Different modes of operation: ECB, CBC are block cipher mode, while CFB, OFB and CTR are stream cipher mode.

AES block cipher modes

Electronic codebook (ECB) mode involves splitting into fixed blocks, encrypting, then concatenating. Simple to compute, but one-to-one mapping of plaintext to ciphertext can potentially leak information (e.g. by distribution of blocks), see the ECB Penguin example. This does not provide semantic security, i.e. indistinguishability in the context of a chosen-plaintext attack.

Cipher block chaining (CBC) mode XORs the previous ciphertext block with the next plaintext block before encryption. This provides semantic security, but encryption can no longer be parallelized. The IV cannot be reused, nor should it be predictable (e.g. SSL 2.0). CBC mode is vulnerable to a padding oracle attack when the decryption module indicates whether the padding is valid or not.

Counter (CTR) mode uses an incrementing counter that is encrypted, and XOR-ed with plaintext to generate the ciphertext. CTR mode is also vulnerable to IV reuse.

Example of CBC mode vulnerability to key reuse

Suppose a DB uses the same secret key K for encryption (i.e. $E(K, P \oplus IV)$), with a different IV for each user, and the column admits only a fixed enumeration of values, say "true" and "false".

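Eve, knowing $IV_A$ and $IV_E$, can submit "true" $\oplus IV_A \oplus IV_E$ as her own value. The DB then stores $E(K, \text{true} \oplus IV_A \oplus IV_E \oplus IV_E) = E(K, \text{true} \oplus IV_A)$, which equals the ciphertext seen for Alice exactly when Alice's value is "true".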
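The attack only relies on $E(K, \cdot)$ being deterministic, so it can be reproduced with HMAC-SHA256 standing in for the block cipher (an assumption made because Python's standard library has no AES). The single-block zero padding and the variable names are also illustrative.

```python
import hashlib
import hmac
import os

BLOCK = 16


def E(key: bytes, block: bytes) -> bytes:
    """Stand-in for the block cipher: any deterministic keyed function shows the leak."""
    return hmac.new(key, block, hashlib.sha256).digest()[:BLOCK]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def pad(value: str) -> bytes:
    return value.encode().ljust(BLOCK, b"\0")  # toy zero padding to a single block


def encrypt_field(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    """First CBC block, as stored by the DB: E(K, P XOR IV)."""
    return E(key, xor(plaintext, iv))


K = os.urandom(16)  # one key for every user's row
iv_alice, iv_eve = os.urandom(BLOCK), os.urandom(BLOCK)
alice_ct = encrypt_field(K, iv_alice, pad("true"))

# Eve submits a crafted value; the DB XORs it with Eve's own IV before encrypting.
crafted = xor(xor(pad("true"), iv_alice), iv_eve)
eve_ct = encrypt_field(K, iv_eve, crafted)
print(eve_ct == alice_ct)  # True, so Alice's value must be "true"
```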

Authentication

Remember that encryption does not provide authentication. The latter can be provided using Message Authentication Codes (MACs), which are essentially checksums for authentication. MACs are popularly implemented as hash-based MACs (HMAC), or as CBC-MAC with AES (replacing the IV with a constant 0, so that the final block, used as the MAC, is deterministic).

Authenticated encryption requires semantic security and ciphertext integrity. Can be done either via:

Encrypt-and-MAC (e.g. SSH): uses different secret keys for encryption and authentication, with the ciphertext and MAC concatenated. Not semantically secure, since the same MAC is generated for the same plaintext.
MAC-then-Encrypt (e.g. SSL/TLS): i.e. E(k, P||MAC(k2, P)). Proven secure against CPA attacker, but unknown against CCA.
Encrypt-then-MAC (e.g. IPSec): i.e. E(k, P) + MAC(k2, E(k, P)). Proven secure against both CPA and CCA.
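A minimal Encrypt-then-MAC sketch, using Python's standard `hmac` module for the MAC and, as an assumption, a toy SHA-256 counter-mode keystream standing in for a real cipher such as AES-CTR (which the standard library does not provide). The point is the ordering: the tag covers the ciphertext, and it is checked before anything is decrypted.

```python
import hashlib
import hmac
import os

NONCE_LEN, TAG_LEN = 16, 32


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy stream cipher keystream (SHA-256 counter mode); not a vetted cipher."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def seal(enc_key: bytes, mac_key: bytes, plaintext: bytes) -> bytes:
    """Encrypt-then-MAC: the HMAC covers the nonce and the ciphertext."""
    nonce = os.urandom(NONCE_LEN)
    ciphertext = nonce + xor(plaintext, keystream(enc_key, nonce, len(plaintext)))
    tag = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    return ciphertext + tag


def unseal(enc_key: bytes, mac_key: bytes, blob: bytes) -> bytes:
    """Verify the tag first; only authenticated ciphertexts are ever decrypted."""
    ciphertext, tag = blob[:-TAG_LEN], blob[-TAG_LEN:]
    expected = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    nonce, body = ciphertext[:NONCE_LEN], ciphertext[NONCE_LEN:]
    return xor(body, keystream(enc_key, nonce, len(body)))


enc_key, mac_key = os.urandom(32), os.urandom(32)  # independent keys, as in the scheme above
blob = seal(enc_key, mac_key, b"transfer 100 to bob")
print(unseal(enc_key, mac_key, blob))  # b'transfer 100 to bob'
```

Rejecting tampered ciphertexts before decryption is what denies a chosen-ciphertext attacker any decryption oracle, e.g. the bit-flipping shown earlier for stream ciphers.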

The threat model discussed above is as follows:

Chosen plaintext attack (CPA) allows attacker to obtain ciphertext for chosen plaintext.
Chosen ciphertext attack (CCA) allows attacker to additionally send ciphertext for decryption into plaintext.