The contents of this lecture are mainly by Dr. Muoi Tran.
While cryptographic digital currencies are not new (e.g. Chaum82, Chaum85, Camenisch05), Bitcoin's novelty comes from decentralization using a blockchain.
Bitcoin nodes can achieve chain agreement via:
The security of the blockchain is composed of three layers: (1) transaction layer, (2) consensus/mining layer, (3) P2P network, with each layer depending on the one after it. Big research topics in blockchain network security include:
The goal is to isolate one or more nodes from the rest of the network. This enables the following attacks:
Two main methods of fragmentation:
"Eclipse Attacks on Bitcoin’s Peer-to-Peer Network," Heilman 2015.
Goal: Influence victim to connect to adversary-controlled botnet, assuming victim has a public IP address (Bitcoin version 0.9.3 or earlier).
This effectively isolates the victim node, since the "New" table (filled via unsolicited ADDR flooding) and the "Tried" table (filled by connecting to the victim from the botnet) both end up containing adversary IPs. This works since the adversary IPs have more recent timestamps. Only 3000 botnet IPs are required for sufficient /16 prefix diversity.
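A rough sanity check (my own back-of-envelope, not from the paper): if the attacker controls a fraction f of the usable addresses in the victim's tables and peer selection were uniform, the probability that all 8 outgoing connections land on attacker IPs is f^8; the bias towards fresh timestamps only helps the attacker further.

```python
# Back-of-envelope eclipse probability, assuming uniform selection of the
# victim's 8 outgoing peers from its address tables (the real selection is
# biased towards fresh timestamps, which favours the attacker even more).
for f in (0.5, 0.8, 0.9, 0.99, 1.0):
    print(f"attacker fraction {f:.2f} -> eclipse probability ~ {f**8:.4f}")
```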
The Eclipse attack is mitigated in the latest Bitcoin Core, by adopting countermeasures:
"Hijacking Bitcoin: Routing Attacks on Cryptocurrencies," Apostolaki 2017.
Essentially using BGP hijacking to intercept all Bitcoin connections of the victim. This is viable since Bitcoin is highly centralized from a network perspective, e.g. 60% of all Bitcoin transactions flow through only 3 ISPs. Victim IPs need to belong to prefixes shorter than /24 to be hijackable (so that the attacker can announce a more-specific prefix).
Hijacking fewer than 100 prefixes can isolate up to 47% of the mining power. While effective, BGP hijacking attacks are very easy to detect.
An alternative is a delay attack: a MitM adversary delays the packets exchanged between two nodes, causing the victim node to waste mining power.
"A Stealthier Partitioning Attack against Bitcoin Peer-to-Peer Network," Tran 2020.
This attack works even with the Eclipse countermeasures. The idea is for the attacker AS to force itself on-path of the victim's peer connections, by feeding the victim "shadow" IPs whose routes traverse the attacker AS. The attacker only needs to be within the top-100 ASes to have access to millions of shadow IPs, as shown in the graph below:
Since shadow IPs are geographically well-distributed, the attack is difficult to detect.
The strategy is to flood the "New" table and wait for the trickle-down into "Tried" via feeler connections (~2 mins per IP). This effectively isolates the victim node after 5-6 weeks.
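A quick back-of-envelope for why this takes weeks (my own arithmetic; the feeler interval and the Tried table size are illustrative constants, not exact Bitcoin Core parameters):

```python
# Rough arithmetic for the trickle-down via feeler connections.
FEELER_INTERVAL_MIN = 2        # one feeler connection every ~2 minutes (from the notes)
TRIED_CAPACITY = 16_384        # assumed illustrative size of the "Tried" table

for weeks in (5, 6):
    minutes = weeks * 7 * 24 * 60
    migrated = minutes // FEELER_INTERVAL_MIN   # shadow IPs promoted to "Tried"
    print(f"{weeks} weeks -> ~{migrated} addresses promoted "
          f"(vs. assumed Tried capacity ~{TRIED_CAPACITY})")
```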
See the Erebus attack website. Here's also a table that nicely summarizes the last three attacks:
In short, here's a nice map of internet restrictions:
Naive anti-censorship solutions fail because the proxy servers are known to users and censors alike. This allows the censors to either DoS the proxies, or perform deep packet inspection to see whether a proxy is being accessed.
Some desired properties of an anti-censorship system:
Refer to the slides - a steganographic tag is used to leak session secrets to the Telex station.
Another approach to censorship circumvention is hiding information within imitations of popular protocols (a.k.a. a Parrot system). This paper shows that current implementations don't achieve perfect mimicry, and are hence highly distinguishable.
Uses traffic shaping from an actual Skype connection. Some problems:
Replication of SIP...?
Main recommendation is to run the actual protocol itself, instead of trying to mimic other protocols.
A cover protocol uses a popular protocol to encode the communication data, instead of trying to imitate the protocol.
One example is the FreeWave protocol, which runs IP over VoIP, since (1) the high number of users means blocking will generate large collateral damage, and (2) the protocol is encrypted. This uses the Skype protocol as a cover, i.e. data is encoded as audio (via an acoustic modem) and sent over the voice channel.
Problems arise when the traffic generated by the data-encoding protocol and the actual protocol do not match, e.g. in packet length.
Defined as a subject being not identifiable within a set of subjects (the anonymity set). A direct list of applications:
A simple implementation of anonymization is via a proxy (relaying messages between nodes). A single proxy is susceptible to traffic analysis (correlating packet sizes and timing) along the inbound and outbound links of the anonymizer - expanding into a network of proxies mitigates this issue.
An early proposal for anonymous email is "Untraceable electronic mail, return addresses, and digital pseudonyms" (Chaum 1981), which uses a trusted re-mailer together with public-key crypto (denoted as a Mix server).
This gives rise to the concept of onion routing (Reed, Syverson, Goldschlag, 1997), i.e. routing info is contained within the message, and encrypted with the public key of routers along the path (so each router only learns the identity of the next router)
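A minimal sketch of the layering idea, using symmetric Fernet keys from the `cryptography` package as a stand-in for each router's public key (real onion routing uses public-key crypto and fixed-size cells; the router names and message format are illustrative):

```python
import json
from cryptography.fernet import Fernet

# One stand-in key per router; real onion routing would use each router's public key.
path = ["R1", "R2", "R3"]          # entry -> middle -> exit
keys = {r: Fernet(Fernet.generate_key()) for r in path}

# Wrap from the inside out: the innermost layer (for the exit R3) carries the
# payload; each outer layer only names the next router on the path.
onion = b"hello, this is the payload"
for i, router in reversed(list(enumerate(path))):
    nxt = path[i + 1] if i + 1 < len(path) else "destination"
    onion = keys[router].encrypt(
        json.dumps({"next": nxt, "inner": onion.decode()}).encode()
    )

# Each router peels exactly one layer and learns only the identity of the next hop.
hop = onion
for router in path:
    layer = json.loads(keys[router].decrypt(hop))
    print(f"{router} only learns the next hop: {layer['next']}")
    hop = layer["inner"].encode()

print("delivered payload:", hop.decode())
```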
The main problem is that public-key encryption is computationally expensive -> high latency.
Tor is a second-generation onion-routing network (Dingledine, Mathewson, Syverson, 2003).
The Tor circuit is set up by establishing symmetric session keys with each onion router (OR) one hop at a time, using DH key agreement. Authentication is unilateral (only the ORs are authenticated).
Main benefits:
For clients to learn about other ORs, first-gen onion routing used in-band network status updates (this incurs expensive flooding, and relates to a partitioning attack?). Today, directory servers are used instead.
Responder anonymity is achieved by announcing the hidden service (.onion) via introduction points on a set of ORs, with the actual connection relayed through a rendezvous point.
Note that the connections to the rendezvous point are initiated and sustained by the client and the hidden server through their own circuits, i.e. the IP addresses of the client and server are not revealed to each other (nor to the rendezvous OR).
Solutions exist via mixing, padding and traffic shaping, but these introduce latency. Note that the goals of Tor do not include:
A short comment on Operation Onymous, which involved advertising a subset of adversary-controlled OR nodes.
Definition: "A group of authorized users of a specified service makes said service unavailable to another group of authorized users, for a period of time exceeding the intended/advertised service maximum waiting time."
See the distribution of DDoS attacks, obtained from this article. The article additionally mentions that the sizes of attacks follow a Zipfian distribution.
Same high-level concepts as usually introduced:
A bit more detail on BGP routing: the AS directly adjacent to the end-host advertises to other ASes that it can route to said end-host. This propagates iteratively.
Under the hood, the ASes managed by the ISPs are organized in a hierarchical manner, with Tier-1 ISPs sitting at the top. ISPs can act as peers exchanging routes over peering links, or, if no such links exist, refer to upper-level ISPs for routing. The choice of routes follows a rough precedence order:
Routes are imported from all three relationships: customers, peers, as well as the higher providers. When exporting routes, customer routes are advertised to everyone, while peer and provider routes are advertised only to customers.
This leads to the Gao-Rexford model of BGP routes: ASes prefer customer routes over peer routes over provider routes, and an emergent valley-free property holds for traversal and advertisement, i.e. no BGP route goes down to a customer and then back up to a provider.
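A small sketch of the valley-free check implied by this model (the relationship labels are illustrative inputs, not derived from real BGP data):

```python
# Valley-free: along a path, zero or more customer->provider ("up") hops,
# at most one peer->peer hop, then zero or more provider->customer ("down") hops.
def valley_free(hops):
    """hops: list of 'up', 'peer' or 'down' edges along an AS path."""
    phase = 0                      # 0 = climbing, 1 = after a peer edge, 2 = descending
    for h in hops:
        if h == "up":
            if phase != 0:
                return False       # going back up after peering/descending = valley
        elif h == "peer":
            if phase != 0:
                return False       # at most one peer edge, only at the top
            phase = 1
        elif h == "down":
            phase = 2
        else:
            raise ValueError(h)
    return True

print(valley_free(["up", "up", "peer", "down"]))   # True  (valid route)
print(valley_free(["down", "up"]))                 # False (customer -> provider valley)
```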
Successful attacks are conditioned on which invalid routes the AS accepts, typically when the new route is cheaper (and shorter) than the original valid route, i.e. it is easier to misdirect routes going through customers, as opposed to providers (the latter costs more money).
Countermeasures:
Achieves properties including availability (prevention of DDoS attacks, no kill switch), sovereignty, transparency and secrecy. It tries to maintain the following architectural principles:
Dependency analysis for software reliability! Avoid circular dependencies. Also formal verification of software.
The architecture is described by groupings of ASes known as Isolation Domains (ISDs), each managed by an ISD core formed by administrative (core) ASes. Within the ISD, the core ASes fulfil two types of advertisement:
Since clients have all the up-, core- and down-segments, they can intersect routing information to create shorter paths (shortcuts). Some extensions are possible for direct peering links.
Relatively high coverage in Switzerland, as well as some subset of networks in Singapore.
Disadvantages:
Q: How does this mitigate problems with advertising false paths? It relies on the core ASes to correctly advertise down-segments - same problem as BGP. Avoid signing
http://sbasdemo.net/
Some keywords:
Main threats to DNS: (1) Availability: DDoS, (2) Integrity: Malicious open recursive resolvers, network MitM, cache poisoning, (3) Confidentiality: Pervasive monitoring, censorship
To defend against DDoS, one can overprovision DNS servers and rely on anycast (typically implemented with BGP) for the root DNS.
A possibly interesting overview of cache poisoning attacks, as well as a comparison between DNSCurve and DNSSEC (the former provides point-to-point security, but does not fit the goals of DNS). Cache poisoning is essentially achieved by the attacker forcing the resolver to resolve the target domain and spoofing the response so that it maps to the attacker's own IP address (this works when the domain is not currently cached).
DNSSEC provides integrity of responses by signing DNS records (the root zone signs the TLD zone's keys, and so on). Responses are pre-signed and cached to minimize signing overhead. Denial of existence (answering authoritatively for names that do not exist, without signing on the fly) is achieved by hashing names and returning a signed hash range within which no valid domain exists. Main adoption challenge: the large response size of DNSSEC can trigger TCP fallback, and some networks filter TCP responses.
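A toy sketch of the denial-of-existence idea (NSEC3-style hashing of names into sorted gaps; this is a simplification of the mechanism, not the actual record format):

```python
import hashlib

def h(name: str) -> str:
    return hashlib.sha256(name.encode()).hexdigest()

# The zone pre-signs the gaps between the hashes of the names that do exist;
# a query for a non-existent name is answered with the signed gap covering its hash.
existing = sorted(h(n) for n in ["a.example.", "mail.example.", "www.example."])

def covering_gap(qname: str):
    """Return the (prev_hash, next_hash) gap proving qname does not exist."""
    q = h(qname)
    if q in existing:
        return None                              # name exists; answer normally
    for prev, nxt in zip(existing, existing[1:] + existing[:1]):
        if prev < q < nxt or (prev > nxt and (q > prev or q < nxt)):
            return prev, nxt                     # last pair wraps around

print(covering_gap("nope.example."))
```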
Problem: Pervasive monitoring of DNS traffic by state-sponsored actors (e.g. NSA's QuantumDNS (2014) and MoreCowBell (2015)) to perform user tracking [Kirchler 2016] and user behaviour analysis [Kim 2015].
Some notable protocols, with a paper characterizing implementation fidelity and pervasiveness on server-side DoE, as well as reachability and performance on client-side, and finally the usage of DoT and DoH.
Worth reading up on Encrypted SNI and its ties to TLS1.3. This might also be something interesting to read up on: ycombinator. DoH is increasingly used for C2 by malware, see article.
The goal is to detect indications of upcoming attacks, and collect data during an attack to derive intelligence, e.g. where? what tool? attack progression and vectors? patterns/trends? These are collectively used to evaluate the security of the real system by checking for vulnerabilities.
Two types of honeypots, see a larger list here:
This contrasts with decoy networks, which are deployed alongside the real system with virtual devices to confuse attackers, e.g. DecIED (k-anonymity, for example). Shodan is a good way to identify such exposed devices.
On the TCP/IP stack. Something about r-utilities for remote control/login, e.g. rlogin
Accountable IP (AIP) performs self-certification from... fell asleep...
Need to look up the mechanism of AIP. Something about session-based keys, whose public key is then hashed to form an identifier.
The other topic is on TCP.
More on digital certificates/signatures, and PKI in general:
Trusted CAs' root certificates are typically pre-installed in a list to function as pre-shared public keys (see the 2012 MSR SV PKI project, and others). There have been many CA compromise/breach incidents:
Certificate revocation invalidates certificates, either by publishing a CRL (certificate revocation list) and pushing delta CRLs during updates, or via OCSP (online certificate status protocol). Notably, OCSP avoids issues with mass certificate revocation, e.g. during a CA compromise. Something about OCSP stapling as well.
Other than the obvious traffic overhead for revocation querying in OCSP, there is also a potential privacy concern where CA learns user activity.
Some techniques attempted to increase security of PKI:
In CT, the certificate is sent to a number of logs, with each log issuing a signed certificate timestamp (SCT) with a promise that the certificate insertion will occur within a maximum merge delay (MMD), e.g. 24 hours. This SCT is served together with the domain certificate. Implementation-wise, this can be done either via an X.509 extension (embedded by the CA), or using a TLS extension directly (handled by the server).
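A toy Merkle-log sketch showing how an inclusion (audit) proof against the log's tree head is checked; the hashing here is simplified and assumes a power-of-two number of entries, not the exact RFC 6962 encoding:

```python
import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_levels(leaves):
    """All levels of a (power-of-two sized) Merkle tree, leaves first."""
    levels = [[H(leaf) for leaf in leaves]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([H(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels

def audit_path(levels, index):
    """Sibling hashes needed to recompute the root for the leaf at `index`."""
    path = []
    for level in levels[:-1]:
        sibling = index ^ 1                      # the other child of the same parent
        path.append((level[sibling], sibling < index))
        index //= 2
    return path

def verify(leaf, path, root):
    node = H(leaf)
    for sibling, sibling_is_left in path:
        node = H(sibling + node) if sibling_is_left else H(node + sibling)
    return node == root

certs = [b"cert-0", b"cert-1", b"cert-2", b"cert-3"]   # toy log entries
levels = build_levels(certs)
root = levels[-1][0]
proof = audit_path(levels, 2)
print(verify(b"cert-2", proof, root))                  # True: cert-2 is in the log
```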
This is more of a community-maintained protocol, since it is essential for external parties to monitor the behaviour of log servers, for web clients to check consensus, and for auditors to identify misbehaviour. Note that CT is not a revocation mechanism; it only identifies misbehaviour.
As of 2016, two widespread PKI standards are in use: the CA Baseline Requirements, and IETF PKIX.
A secure session is established by sharing symmetric secret keys, which are typically randomly chosen session keys instead of a permanent key - this avoids replay attacks.
Instead of relying on PKI for public key distribution (to initiate sharing of session key), both parties can also rely on a trusted key distribution center (KDC) via the Needham-Schroeder protocol.
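A rough sketch of the Needham-Schroeder symmetric-key flow, using Fernet from the `cryptography` package as a stand-in for the shared-key encryption {.}_K (the names, nonce formats and message layout are illustrative):

```python
import json, os
from cryptography.fernet import Fernet

# Long-term keys that A and B each share with the KDC (illustrative setup).
K_a, K_b = Fernet(Fernet.generate_key()), Fernet(Fernet.generate_key())

# (1) A -> KDC: "A, B, N_A"
N_a = os.urandom(8).hex()

# (2) KDC -> A: {N_A, B, K_ab, {K_ab, A}_Kb}_Ka
K_ab_key = Fernet.generate_key()
ticket_for_b = K_b.encrypt(json.dumps({"K_ab": K_ab_key.decode(), "peer": "A"}).encode())
msg2 = K_a.encrypt(json.dumps({
    "N_a": N_a, "peer": "B", "K_ab": K_ab_key.decode(),
    "ticket": ticket_for_b.decode(),
}).encode())

# A decrypts, checks its nonce, and now holds K_ab plus the ticket for B.
m2 = json.loads(K_a.decrypt(msg2))
assert m2["N_a"] == N_a and m2["peer"] == "B"
K_ab = Fernet(m2["K_ab"].encode())

# (3) A -> B: the ticket; B decrypts it with K_b and learns K_ab.
tb = json.loads(K_b.decrypt(m2["ticket"].encode()))
K_ab_at_b = Fernet(tb["K_ab"].encode())

# (4)/(5) nonce handshake under K_ab, proving both sides hold the session key.
N_b = int.from_bytes(os.urandom(4), "big")
challenge = K_ab_at_b.encrypt(str(N_b).encode())
response = K_ab.encrypt(str(int(K_ab.decrypt(challenge)) - 1).encode())
assert int(K_ab_at_b.decrypt(response)) == N_b - 1
print("A and B now share a session key")
```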
Covering asymmetric encryption primitives, namely how public-key encryption works: usage in authentication, digital signatures (non-repudiation), Diffie-Hellman key agreement (discrete log problem)
DH key agreement does not provide authentication, and is vulnerable to man-in-the-middle attacks. This is mitigated using the RSA algorithm (e.g. signing the exchanged values). A rough overview. Public-key infrastructure (PKI) is defined in RFC 4949, which describes the set of resources for managing digital certificates based on asymmetric cryptography.
Importantly, symmetric crypto is about 3 orders of magnitude faster than asymmetric crypto. This is a pretty useful comparison:
Hash functions are simply functions that map inputs of arbitrary length into outputs/digests of fixed length L, i.e. $$ H: \{0,1\}^* \rightarrow \{0,1\}^L $$. For a cryptographically-secure hash function, three properties must be satisfied:
The complexity of attacking a hash function with n-bit digests for either preimage resistance or second-preimage resistance, assuming a random search, is half the output search space on average, where the search space is of size $$ 2^n $$. For finding a collision, this drops to $$ 2^{n/2} $$ trials on average (the birthday paradox), since the attacker can compare against any previously seen output value.
Probability of hash collision with only collision resistance
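The standard birthday-bound approximation (my own addition, not from the slides): after $q$ random digests the collision probability is roughly
$$ P_{\text{collision}} \approx 1 - e^{-q(q-1)/2^{n+1}} $$
so at $q = 2^{n/2}$ trials this is already about $1 - e^{-1/2} \approx 0.39$, which is why a collision is expected after roughly $$ 2^{n/2} $$ hashes.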
Common usage of hash functions:
Contact Prof Seth Gilbert for CS3235 due to the pre-requisite. [4236]
Principles of information security:
Some terminology notes:
The OTP is the idealized model for stream ciphers, e.g. RC4 and AES in CTR mode, where the initialization vector (IV) and the ciphertext have to be transmitted. Essentially the shared key $k$ is expanded into a keystream using PRG($k$, IV). Noted vulnerabilities:
Block ciphers (e.g. DES, RC5, AES) extend the unit of encryption from individual bits to fixed-size blocks, so the key itself defines a mapping from plaintext blocks to ciphertext blocks. The key space should be at least 128 bits to prevent key enumeration attacks.
Electronic codebook (ECB) mode involves splitting into fixed blocks, encrypting, then concatenating. Simple to compute, but one-to-one mapping of plaintext to ciphertext can potentially leak information (e.g. by distribution of blocks), see the ECB Penguin example. This does not provide semantic security, i.e. indistinguishability in the context of a chosen-plaintext attack.
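A minimal demonstration of the pattern leak, using the `cryptography` package: identical plaintext blocks map to identical ciphertext blocks under ECB.

```python
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key = os.urandom(16)
ecb = Cipher(algorithms.AES(key), modes.ECB()).encryptor()

# Two identical 16-byte plaintext blocks...
plaintext = b"YELLOW SUBMARINE" * 2
ciphertext = ecb.update(plaintext) + ecb.finalize()

# ...produce two identical ciphertext blocks: the block structure of the
# plaintext remains visible even though each block is "encrypted".
print(ciphertext[:16] == ciphertext[16:32])   # True
```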
Cipher block chaining (CBC) mode XORs the previous ciphertext block with the next plaintext block before encryption. This provides semantic security, but encryption can no longer be parallelized. The IV cannot be reused, nor should the IV be predictable (e.g. SSL 2.0). CBC mode is vulnerable to a padding oracle attack when the decryption module indicates whether the padding is valid or not.
Counter (CTR) mode uses an incrementing counter that is encrypted, and XOR-ed with plaintext to generate the ciphertext. CTR mode is also vulnerable to IV reuse.
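A minimal illustration of that IV (nonce) reuse, using the `cryptography` package: with the same key and nonce the keystream cancels out, so the XOR of the two ciphertexts equals the XOR of the two plaintexts.

```python
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key, nonce = os.urandom(16), os.urandom(16)

def ctr_encrypt(msg: bytes) -> bytes:
    enc = Cipher(algorithms.AES(key), modes.CTR(nonce)).encryptor()
    return enc.update(msg) + enc.finalize()

p1, p2 = b"attack at dawn!!", b"retreat at nine!"
c1, c2 = ctr_encrypt(p1), ctr_encrypt(p2)      # same key AND same nonce: bad

# C1 xor C2 == P1 xor P2, so knowing (or guessing) one plaintext reveals the other.
xored = bytes(a ^ b for a, b in zip(c1, c2))
recovered = bytes(a ^ b for a, b in zip(xored, p1))
print(recovered == p2)                         # True
```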
Example of CBC mode vulnerability to key reuse
Suppose a DB uses the same secret key K for encryption (i.e. $E(K, P \oplus IV)$), with a different IV for each user, and the column admits only a fixed enumeration of values, say "true" and "false".
Eve, knowing IV_A and IV_E, can reproduce the ciphertext stored for Alice by submitting "true" XOR IV_A XOR IV_E as her own value: her stored ciphertext is then $E(K, \text{"true"} \oplus IV_A)$, which matches Alice's entry iff Alice's value is "true".
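A small sketch of this chosen-plaintext trick, modelling $E(K, \cdot)$ as one raw AES block operation (the "true" encoding and IV handling are illustrative):

```python
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def E(key: bytes, block: bytes) -> bytes:
    """One raw AES block encryption, modelling E(K, .) from the example."""
    enc = Cipher(algorithms.AES(key), modes.ECB()).encryptor()
    return enc.update(block) + enc.finalize()

K = os.urandom(16)
IV_A, IV_E = os.urandom(16), os.urandom(16)           # both known to Eve
TRUE = b'"true"'.ljust(16, b"\x00")                    # fixed-format field value

# What the DB stores for Alice (her real value is "true"):
c_alice = E(K, xor(TRUE, IV_A))

# Eve submits "true" xor IV_A xor IV_E as her own value:
crafted = xor(xor(TRUE, IV_A), IV_E)
c_eve = E(K, xor(crafted, IV_E))                       # = E(K, "true" xor IV_A)

print(c_eve == c_alice)   # True -> Eve learns Alice's value is "true"
```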
Remember that encryption does not provide authentication. The latter can be provided using Message Authentication Codes (MACs), essentially a keyed checksum for authentication. Popularly implemented as a hash-based MAC (HMAC), or using AES in CBC-MAC mode (replacing the IV with the constant 0, so that the final block, used as the MAC, is deterministic).
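A minimal HMAC example with Python's standard hmac module (the key and message are placeholders):

```python
import hmac, hashlib

key = b"shared-secret-key"
msg = b"transfer $100 to Bob"

tag = hmac.new(key, msg, hashlib.sha256).hexdigest()

# The verifier recomputes the tag and compares in constant time.
def verify(key: bytes, msg: bytes, tag: str) -> bool:
    expected = hmac.new(key, msg, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, tag)

print(verify(key, msg, tag))                          # True
print(verify(key, b"transfer $999 to Eve", tag))      # False: message was altered
```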
Authenticated encryption requires semantic security and ciphertext integrity. Can be done either via:
The threat model discussed above is as follows: