Attack Detection, Investigation, and Mitigation for Network Functions Virtualization (NFV)
By Canadian Institute for Cybersecurity (CIC)
Key Concepts
- Network Functions Virtualization (NFV): Virtualizing proprietary hardware network devices (like DNS servers, firewalls) into software running on standard servers, typically hosted on a cloud.
- Service Function Chains (SFCs): Chaining multiple virtualized network functions (e.g., firewall, IDS, DPI) together for packet processing.
- Cryptographic Trailer: A security mechanism involving adding a Message Authentication Code (MAC) to packets to ensure integrity and authenticity.
- Side Channel Information: Using indirect communication channels, like inter-packet delay, to convey information.
- Provenance Graph: A graph representation of logs and events, showing causal relationships to trace back the root cause of an incident.
- Waypoints: Additional points of interest identified in a provenance graph by mapping external information (like CVE details) to graph nodes.
- Syscalls (System Calls): The interface between an application and the operating system kernel, used to request services from the kernel.
- Seccomp (Secure Computing Mode): A Linux kernel feature used to restrict the system calls a process can make, thereby reducing the attack surface.
- Crowdsourcing: Leveraging a large group of people to contribute to a task; in this context, the task is identifying attack patterns.
- Fenix: A proposed tool for temporary attack mitigation by blocking specific sequences of syscalls.
Attack Detection: Virtualizing Cryptographic Trailers
This section addresses the challenge of detecting attacks in virtualized network environments, specifically on Service Function Chains (SFCs).
- Problem: In SFCs, where network functions are chained and run on virtual machines or switches, attackers can compromise virtual switches to manipulate packets. This manipulation (e.g., bypassing IDS, dropping packets, man-in-the-middle attacks) is invisible to tenants, and cloud providers may not have sufficient context to detect it.
- Traditional Solution & Its Limitation: Using cryptographic trailers (e.g., MACs) on every packet ensures integrity. However, for small packets (like DNS), this can lead to a significant communication overhead (up to 40%), halving network throughput, which is unacceptable.
- Proposed Solution: Virtualizing Cryptographic Trailers: The core idea is to encode the cryptographic trailer information within a side channel, specifically the inter-packet delay.
- Mechanism: Instead of adding a physical trailer, the delay between packets is manipulated to encode bits of the trailer. For example, shorter delays can represent '0' and longer delays can represent '1'. This allows for zero communication overhead.
- Encoding Strategy: To encode a 256-bit MAC, packets are grouped into frames, blocks, and super blocks.
- The first frame of a block encodes the block ID.
- The second frame encodes the flow ID.
- Subsequent frames encode parts of the MAC.
- Challenge and Solution: A key challenge is that the MAC for a super block can only be computed after receiving all its packets. However, the trailer (MAC) needs to be encoded in the inter-packet delays of those same packets. This creates a deadlock. The solution is to encode the trailer of a super block in the next super block. All packets of the current super block can then be forwarded immediately, while its MAC is computed and carried by the inter-packet delays of the super block that follows.
- Attack Detection: Two super blocks are analyzed as a pair. The trailer from the first super block is decoded from the second super block's inter-packet delays. The MAC is recomputed and compared with the decoded MAC. Mismatches indicate an attack.
- Capabilities: The solution can detect attacks and also locate them, identifying attacked packets and the type of attack (drop, modification, injection).
- Addressing Network Jitter: The paper mentions two solutions to mitigate the impact of natural network jitter on inter-packet delay measurements, detailed in the full paper.
- Evaluation:
- Deployed on Amazon EC2.
- 100% accuracy in watermark extraction.
- End-to-end latency: 2.1 ms (traditional) and 0.68 ms (5G Kubernetes).
- Attack classification accuracy: 70% (room for improvement, because less trailer data is embedded).
- Compared to physical trailers, it can free up 45% of communication overhead.
- Summary: Achieves a better trade-off between security, latency, and communication overhead compared to traditional physical trailer solutions.
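The delay encoding and the paired super-block check described above can be sketched roughly as follows. This is a minimal illustration, not the implementation from the talk: the delay values, the threshold, the use of HMAC-SHA256 as the 256-bit MAC, and all function names are assumptions, and the frame/block layout (block ID, flow ID) and the jitter countermeasures are left out.

```python
import hashlib
import hmac

# Assumed values for illustration only; the talk does not give concrete delays.
SHORT_DELAY_MS = 1.0   # inter-packet delay that encodes bit 0
LONG_DELAY_MS = 3.0    # inter-packet delay that encodes bit 1
THRESHOLD_MS = 2.0     # receiver-side decision boundary


def bytes_to_bits(data):
    """Expand bytes into a list of bits, most significant bit first."""
    return [(byte >> (7 - i)) & 1 for byte in data for i in range(8)]


def bits_to_bytes(bits):
    """Pack a list of bits (length divisible by 8) back into bytes."""
    out = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)


def encode_delays(trailer):
    """Sender side: one inter-packet delay per trailer bit."""
    return [LONG_DELAY_MS if bit else SHORT_DELAY_MS for bit in bytes_to_bits(trailer)]


def decode_delays(delays):
    """Receiver side: threshold each observed delay back into a bit."""
    return bits_to_bytes([1 if d > THRESHOLD_MS else 0 for d in delays])


def super_block_mac(key, packets):
    """256-bit MAC over all packets of one super block (HMAC-SHA256 assumed)."""
    mac = hmac.new(key, digestmod=hashlib.sha256)
    for packet in packets:
        mac.update(len(packet).to_bytes(4, "big"))  # length prefix keeps packet boundaries unambiguous
        mac.update(packet)
    return mac.digest()


def verify_pair(key, first_packets, second_delays):
    """Compare the MAC of the first super block with the one decoded from the second."""
    expected = super_block_mac(key, first_packets)
    decoded = decode_delays(second_delays[:256])
    return hmac.compare_digest(expected, decoded)


key = b"shared-secret"  # hypothetical key shared by sender and verifier
first_block = [b"query-1", b"query-2", b"query-3"]
delays = encode_delays(super_block_mac(key, first_block))
jittered = [d + 0.4 for d in delays]  # small jitter that stays under the threshold
intact = verify_pair(key, first_block, jittered)
tampered = verify_pair(key, [b"query-1", b"query-X", b"query-3"], jittered)
```

In this sketch the second super block needs at least 256 inter-packet gaps to carry the full MAC. A mismatch only signals that something changed; locating the affected packets and classifying the attack, as the talk describes, would need the finer-grained block structure that is omitted here.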
Attack Investigation: Enhancing Provenance Graph Analysis
This section focuses on improving the investigation of cloud-scale network attacks by making provenance graphs more manageable and insightful.
- Problem: Cloud-scale network attacks generate massive amounts of log data. Provenance graphs, used to trace root causes, can become gigantic and overwhelming for human analysts. Existing pruning solutions have limitations:
- Backward search: Can lead to dependency explosion.
- Anomaly-based scoring: Requires complete training data and can have false positives.
- Graph theory features (e.g., node degree): Ignores node semantics.
- Path length limits: Can miss relevant, distant events (false negatives).
- High cost: Many solutions are computationally expensive.
- Core Insight: Existing methods treat alerts as abstract graph nodes and fail to leverage the rich external information associated with them (e.g., IDS rules, CVE details).
- Proposed Solution: Leveraging External Information and Waypoints:
- Mechanism: Extract information from external sources (like CVE databases) and map it to nodes within the provenance graph. This identifies waypoints, which are additional points of interest.
- Curation: These waypoints and original points of interest are curated into a shape that includes all relevant nodes for the alert.
- System Components:
- System Lexicon: For Linux kernel attacks, a lexicon is built to identify relevant external information.
- Knowledge Base: Stores extracted information.
- Query Generation: Information is used to generate queries to map onto the provenance graph.
- Waypoint Pruning: Irrelevant waypoints are removed.
- Correlation: Explores various methods to correlate waypoints and points of interest, including:
- Shortest path.
- Vicinity (nearby nodes).
- Process dependency (parent-child relationships).
- Conditional expansion (stopping expansion to avoid dependency explosion).
- Evaluation:
- Tested on 20 Linux kernel CVEs.
- Achieved a 100% true positive rate for 19 out of 20 CVEs (96% for the remaining one).
- False positive rate below 0.6% for 16 out of 20 CVEs.
- Evaluated as a pre-processor for existing pruning solutions: Reduced false positives from tens or hundreds of thousands to less than 50.
- Real-world user study showed the tool is very useful.
- Summary: The approach of looking "outside the window" (external information) to enrich provenance graph analysis significantly improves attack investigation by making graphs more manageable and insightful.
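One of the correlation strategies listed above, shortest path between the point of interest and the waypoints, can be sketched as follows. The graph, the node labels, the keyword list standing in for information extracted from a CVE entry, and every function name are hypothetical; the real system's lexicon, knowledge base, and query generation are not modelled.

```python
from collections import deque


def find_waypoints(node_labels, keywords):
    """Map external keywords onto graph nodes whose label mentions them."""
    return {node for node, label in node_labels.items()
            if any(keyword in label for keyword in keywords)}


def shortest_path(edges, src, dst):
    """Breadth-first search over the graph, treating edges as undirected."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    previous = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in adjacency.get(node, ()):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None  # dst is not reachable from src


def curate(edges, node_labels, point_of_interest, keywords):
    """Keep the point of interest, the waypoints, and the paths connecting them."""
    keep = {point_of_interest}
    for waypoint in find_waypoints(node_labels, keywords):
        path = shortest_path(edges, point_of_interest, waypoint)
        if path:
            keep.update(path)
    return keep


# Hypothetical provenance graph: p* are processes, f* are files.
node_labels = {
    "p1": "bash",
    "p2": "process calling unshare",
    "p3": "cron",
    "p4": "sh",
    "f1": "write to /etc/passwd",
    "f2": "write to /var/log/syslog",
}
edges = [("p1", "p2"), ("p2", "p4"), ("p4", "f1"), ("p1", "p3"), ("p3", "f2")]
cve_keywords = ["unshare"]  # stand-in for terms extracted from a CVE description

kept_nodes = curate(edges, node_labels, "f1", cve_keywords)
```

Here the alert on f1 is connected to the waypoint p2 through p4, so those three nodes are kept, while the unrelated cron activity (p3, f2) and the bash ancestor (p1) are pruned. The other strategies (vicinity, process dependency, conditional expansion) would replace or extend the path step.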
Attack Mitigation: Temporary Patching with Fenix
This section presents a method for temporarily patching vulnerabilities that lack official vendor patches.
- Problem: The traditional patching process relies on software vendors releasing official patches, which can take a long time (average delay of 100 days). This leaves systems vulnerable, as exemplified by the Log4Shell vulnerability.
- Proposed Solution: Crowdsourced Temporary Patching with Fenix:
- Idea: Instead of waiting for a vendor patch, affected users crowdsource the identification of system call (syscall) sequences that, if blocked, would prevent exploitation of a vulnerability.
- Fenix Tool: This identified sequence is then fed into the Fenix tool, which acts as a temporary patch by blocking those specific syscalls.
- Mechanism: Selective Syscall Processing (Inspired by Visa Checks):
- Analogy: Similar to how border officers process citizens quickly but conduct more thorough checks on other travelers.
- Seccomp Limitations:
- Blocking unused syscalls reduces attack surface but can be bypassed if attackers use the same syscalls as legitimate processes.
- Blocking all syscalls indiscriminately breaks normal container functionality.
- Inspecting syscall arguments is accurate but inefficient.
- Fenix's Approach: Combines Seccomp and a more thorough inspection tool (like ptrace).
- Fast Path: If a syscall doesn't match the expected pattern for the vulnerability, it's allowed through quickly (like a citizen).
- Deep Inspection Path: If a syscall matches the expected pattern, it triggers a deeper inspection of its arguments using ptrace (like a thorough check for other travelers). If the arguments confirm it's malicious, it's blocked.
- Dynamic Updates: Fenix can dynamically update Seccomp filters to inspect sequences of syscalls, not just individual ones. For example, if an attack signature is "syscall A followed by syscall B," Fenix can first look for A, and upon finding it, update its filter to look for B.
- Implementation:
- Leverages crowdsourced attack investigation to identify malicious syscall sequences.
- Builds a finite state machine from these sequences to monitor incoming syscalls.
- Affected users only need to install Fenix, which then applies these sequences on their systems.
- Evaluation:
- Tested on 20 real-world CVEs, blocking all of them (better coverage than some existing solutions).
- Compared to existing stateful solutions, Fenix effectively blocks variations of attack sequences that can evade other tools, without false positives.
- Overhead is negligible, comparable to standard Seccomp, even with ptrace inspection.
- Summary: Fenix acts as a "Swiss Army knife" for security patching, allowing users to temporarily patch various vulnerabilities by identifying and blocking malicious syscall sequences through a crowdsourced and efficient mechanism.
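The fast path, deep inspection path, and sequence tracking described above can be illustrated with a small simulation. This is only a sketch of the state-machine idea in plain Python: it does not call Seccomp or ptrace, and the class, the signature format, and the example syscall sequence are all hypothetical rather than taken from Fenix.

```python
class SequenceMonitor:
    """Finite state machine over a syscall signature.

    signature: list of (syscall_name, argument_check) pairs, where
    argument_check is a function that returns True for malicious arguments.
    """

    def __init__(self, signature):
        self.signature = signature
        self.state = 0             # index of the signature step currently watched
        self.deep_inspections = 0  # how often the slow path was taken

    def watched_syscall(self):
        # Stands in for the currently installed filter: only this syscall is trapped.
        return self.signature[self.state][0]

    def on_syscall(self, name, args):
        if name != self.watched_syscall():
            return "allow"  # fast path: not the syscall the filter is looking for
        self.deep_inspections += 1  # deep path: arguments are inspected
        argument_check = self.signature[self.state][1]
        if not argument_check(args):
            return "allow"  # same syscall name, but benign arguments
        if self.state == len(self.signature) - 1:
            self.state = 0
            return "block"  # last step of the sequence confirmed
        self.state += 1  # "update the filter" to watch the next step
        return "allow"


# Hypothetical signature: open a sensitive path, then write to it.
signature = [
    ("openat", lambda a: a.get("path") == "/proc/self/exe"),
    ("write", lambda a: a.get("target") == "/proc/self/exe"),
]

monitor = SequenceMonitor(signature)
trace = [
    ("read", {}),
    ("openat", {"path": "/etc/hosts"}),
    ("openat", {"path": "/proc/self/exe"}),
    ("write", {"target": "/tmp/log"}),
    ("write", {"target": "/proc/self/exe"}),
]
decisions = [monitor.on_syscall(name, args) for name, args in trace]
```

Only the final write is blocked. The first read never reaches argument inspection, which mirrors the idea that most syscalls take the cheap path and only pattern matches pay for the thorough check.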
Conclusion
The presented research offers innovative solutions across three critical areas of cloud security: attack detection, investigation, and mitigation. By virtualizing cryptographic trailers using inter-packet delays, the research enhances attack detection with minimal overhead. For attack investigation, it leverages external information to create more manageable and insightful provenance graphs. Finally, it introduces Fenix, a novel crowdsourced approach for temporary attack mitigation, effectively patching vulnerabilities lacking official vendor support by dynamically blocking malicious syscall sequences. These works collectively aim to improve the security posture of cloud environments in the face of evolving threats.