My research focuses on the security of computer networks. In particular, I work on solutions which leverage recent advances in network programmability to make networks able to detect and mitigate attacks and to provide more security and privacy. Part of my work is funded by armasuisse and the Zurich Information Security & Privacy Center (ZISC).
Our paper "Towards an AI-powered Player in Cyber Defence Exercises" was accepted at CyCon 2021.
I'm looking forward to being part of the IEEE S&P 2020 Shadow PC.
The recording of my talk for the CyCon Student Award is now available:
Check it out: YouTube.
The new website of our group is online!
Check it out: nsg.ee.ethz.ch.
I feel honored to be awarded the NATO CCD COE Student Award 2017.
My talk at CyCon 2017 is available here.
The website about our iTAP-project is online: https://itap.ethz.ch.
Check it out to learn how programmable switches can provide anonymous communication!
“iTAP: In-network Traffic Analysis Prevention using Software-Defined Networks”
accepted at ACM SOSR 2017
I'm humbled to receive an ETH Medal for my Master's thesis “SDN-based Network Obfuscation”.
My talk at the Master's ceremony is available here.
Many large organizations operate dedicated wide area networks (WANs) distinct from the Internet to connect their data centers and remote sites through high-throughput links. While encryption generally protects these WANs well against content eavesdropping, they remain vulnerable to traffic analysis attacks that infer visited websites, watched videos or contents of VoIP calls from analysis of the traffic volume, packet sizes or timing information. Existing techniques to obfuscate Internet traffic are not well suited for WANs as they are either highly inefficient or require modifications to the communication protocols used by end hosts. This paper presents ditto, a traffic obfuscation system adapted to the requirements of WANs: achieving high-throughput traffic obfuscation at line rate without modifications of end hosts. ditto adds padding to packets and introduces chaff packets to make the resulting obfuscated traffic independent of production traffic with respect to packet sizes, timing and traffic volume. We evaluate a full implementation of ditto running on programmable switches in the network data plane. Our results show that ditto runs at 100 Gbps line rate and performs with negligible performance overhead up to a realistic traffic load of 70 Gbps per WAN link.
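As a rough illustration of the padding-and-chaff idea in the abstract above, the following Python sketch emits packets whose sizes follow a fixed, repeating pattern. A queued production packet is padded up to the size of the current slot if it fits; otherwise a chaff packet is sent. The function name `obfuscate`, the example pattern and the slot model are assumptions made for this sketch. They are not taken from the ditto implementation, which runs in the data plane of programmable switches.

```python
from collections import deque


def obfuscate(real_packets, pattern, num_slots):
    """Emit num_slots packets whose wire sizes follow `pattern` cyclically.

    real_packets: payload sizes (bytes) of queued production packets, in order.
    pattern:      repeating list of wire sizes (assumed; the largest entry
                  should be at least the MTU so every packet eventually fits).
    Returns a list of (wire_size, kind, payload_size) tuples.
    """
    queue = deque(real_packets)
    out = []
    for slot in range(num_slots):
        size = pattern[slot % len(pattern)]
        if queue and queue[0] <= size:
            # The head-of-line packet fits: pad it up to the slot size.
            payload = queue.popleft()
            out.append((size, "real", payload))
        else:
            # Nothing to send, or the packet is too large: send chaff.
            out.append((size, "chaff", 0))
    return out


pattern = [500, 1000, 1500]  # hypothetical size pattern
busy = obfuscate([1200, 300], pattern, num_slots=6)
idle = obfuscate([], pattern, num_slots=6)

# An observer only sees wire sizes, which are the same in both cases.
print([p[0] for p in busy])
print([p[0] for p in idle])
```

If the slots are sent at a constant rate, the observable sizes, timing and volume depend only on the pattern and not on the production traffic. The cost is added latency for packets that have to wait for a large enough slot.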
Cyber attacks are becoming increasingly frequent, sophisticated, and stealthy. This makes it harder for cyber defence teams to keep up, forcing them to automate their defence capabilities in order to improve their reactivity and efficiency. Therefore, we propose a fully automated cyber defence framework that no longer needs support from humans to detect and mitigate attacks within a complex infrastructure. We design our framework based on a real-world case – Locked Shields – the world’s largest cyber defence exercise. In this exercise, teams have to defend their networked infrastructure against attacks, while maintaining operational services for their users. Our framework architecture connects various cyber sensors with network, device, application, and user actuators through an artificial intelligence (AI)-powered automated team in order to dynamically secure the cyber environment. To the best of our knowledge, our framework is the first attempt towards a fully automated cyber defence team that aims at protecting complex environments from sophisticated attacks.
Traditional network control planes can be slow and require manual tinkering from operators to change their behavior. There is thus great interest in a faster, data-driven approach that uses signals from real-time traffic instead. However, the promise of fast and automatic reaction to data comes with new risks: malicious inputs designed towards negative outcomes for the network, service providers, users, and operators.
Adversarial inputs are a well-recognized problem in other areas; we show that networking applications are susceptible to them too. We characterize the attack surface of data-driven networks and examine how attackers with different privileges—from infected hosts to operator-level access—may target network infrastructure, applications, and protocols. To illustrate the problem, we present case studies with concrete attacks on recently proposed data-driven systems.
Our analysis urgently calls for a careful study of attacks and defenses in data-driven networking, with a view towards ensuring that their promise is not marred by oversights in robust design.
Remote shell sessions via protocols such as SSH are essential for managing systems, deploying applications, and running experiments. However, combined with weak passwords or flaws in the authentication process, remote shell access becomes a major security risk, as it allows an attacker to run arbitrary commands in the name of an impersonated user or even a system administrator. For example, remote shells of weakly protected systems are often exploited in order to build large botnets, to send spam emails, or to launch distributed denial of service attacks. Also, malicious insiders in organizations often use shell sessions to access and transfer restricted data. In this work, we tackle the problem of detecting malicious shell sessions based on session logs, i.e., recorded sequences of commands that were executed over time. Our approach is to classify sessions as benign or malicious by analyzing the sequence of commands that the shell users executed. We model such sequences of commands as n-grams and use them as features to train a supervised machine learning classifier. Our evaluation, based on freely available data and data from our own honeypot infrastructure, shows that the classifier reaches a true positive rate of 99.4% and a true negative rate of 99.7% after observing only four shell commands.
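To make the n-gram features mentioned in the abstract above more concrete, here is a minimal Python sketch that turns a recorded shell session into n-gram counts. Reducing each command to its first token and using bigrams are assumptions made for illustration. The abstract does not specify the exact tokenization, the value of n or the classifier, and the example session is invented.

```python
from collections import Counter


def command_ngrams(commands, n=2):
    """Count n-grams over the command names of one shell session.

    Each command line is reduced to its first token (an assumption of
    this sketch); arguments are ignored.
    """
    names = [c.split()[0] for c in commands if c.strip()]
    return Counter(tuple(names[i:i + n]) for i in range(len(names) - n + 1))


# Hypothetical four-command session.
session = [
    "wget http://example.invalid/x.sh",
    "chmod +x x.sh",
    "./x.sh",
    "rm x.sh",
]
features = command_ngrams(session, n=2)
for gram, count in features.items():
    print(gram, count)
```

Counts like these could serve as the feature vector of a session for a supervised classifier trained on sessions labelled benign or malicious.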
The diversity of applications and devices in enterprise networks, combined with large traffic volumes, makes it inherently challenging to quickly identify malicious traffic. When incidents occur, emergency response teams often lose precious time in reverse-engineering the network topology and configuration before they can focus on malicious activities and digital forensics. In this paper, we present a system that quickly and reliably identifies Command and Control (C&C) channels without prior network knowledge. The key idea is to train a classifier using network traffic from attacks that happened in the past and use it to identify C&C connections in the current traffic of other networks. Specifically, we leverage the fact that – while benign traffic differs – malicious traffic bears similarities across networks (e.g., devices participating in a botnet act in a similar manner irrespective of their location). To ensure performance and scalability, we use a random forest classifier based on a set of computationally efficient features tailored to the detection of C&C traffic. In order to prevent attackers from outwitting our classifier, we tune the model parameters to maximize robustness. We measure high resilience against possible attacks – e.g., attempts to camouflage C&C flows as benign traffic – and packet loss during the inference. We have implemented our approach and we show its practicality on a real use case: Locked Shields, the world’s largest cyber defense exercise. In Locked Shields, defenders have limited resources to protect a large, heterogeneous network against unknown attacks. Using recorded datasets (from 2017 and 2018) from a participating team, we show that our classifier is able to identify C&C channels with 99% precision and over 90% recall in near real time and with realistic resource requirements. If the team had used our system in 2018, it would have discovered 10 out of 12 C&C servers in the first hours of the exercise.
Simple path tracing tools such as traceroute allow malicious users to infer network topologies remotely and use that knowledge to craft advanced denial-of-service (DoS) attacks such as Link-Flooding Attacks (LFAs). Yet, despite the risk, most network operators still allow path tracing as it is an essential network debugging tool.
In this paper, we present NetHide, a network topology obfuscation framework that mitigates LFAs while preserving the practicality of path tracing tools. The key idea behind NetHide is to formulate network obfuscation as a multi-objective optimization problem that allows for a flexible tradeoff between security (encoded as hard constraints) and usability (encoded as soft constraints). While solving this problem exactly is hard, we show that NetHide can obfuscate topologies at scale by only considering a subset of the candidate solutions and without reducing obfuscation quality. In practice, NetHide obfuscates the topology by intercepting and modifying path tracing probes directly in the data plane. We show that this process can be done at line-rate, in a stateless fashion, by leveraging the latest generation of programmable network devices.
We fully implemented NetHide and evaluated it on realistic topologies. Our results show that NetHide is able to obfuscate large topologies (> 150 nodes) while preserving near-perfect debugging capabilities. In particular, we show that operators can still precisely trace back >90% of link failures despite obfuscation.
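The trade-off described above, with security as a hard constraint and usability as a soft one, can be sketched in a few lines of Python. This sketch assumes that security means "no link carries more than `capacity` flows" and that usability is the average node overlap (Jaccard similarity) between physical and virtual paths. Both metrics, all function names and the toy topology are assumptions for illustration, not NetHide's actual formulation or solver.

```python
def flow_density(paths):
    """Count how many paths traverse each undirected link."""
    density = {}
    for path in paths.values():
        for u, v in zip(path, path[1:]):
            link = tuple(sorted((u, v)))
            density[link] = density.get(link, 0) + 1
    return density


def path_similarity(physical, virtual):
    """Average Jaccard similarity of node sets, per flow (assumed metric)."""
    total = 0.0
    for flow, path in physical.items():
        a, b = set(path), set(virtual[flow])
        total += len(a & b) / len(a | b)
    return total / len(physical)


def pick_topology(physical, candidates, capacity):
    """Return the most similar candidate that satisfies the hard constraint."""
    best, best_score = None, -1.0
    for cand in candidates:
        if max(flow_density(cand).values(), default=0) > capacity:
            continue  # hard constraint violated: some link is too attractive
        score = path_similarity(physical, cand)  # soft objective
        if score > best_score:
            best, best_score = cand, score
    return best


# Toy example: three flows share the link r1-r2 in the physical topology.
physical = {
    ("a", "x"): ["a", "r1", "r2", "x"],
    ("b", "y"): ["b", "r1", "r2", "y"],
    ("c", "z"): ["c", "r1", "r2", "z"],
}
light = dict(physical)
light[("c", "z")] = ["c", "r1", "v", "z"]  # one flow shown via virtual node v
heavy = {
    ("a", "x"): ["a", "r1", "r2", "x"],
    ("b", "y"): ["b", "v1", "v2", "y"],
    ("c", "z"): ["c", "v3", "v4", "z"],
}
chosen = pick_topology(physical, [physical, light, heavy], capacity=2)
print(chosen[("c", "z")])
```

Only a small, explicit list of candidates is examined here, which mirrors the idea of considering a subset of the candidate solutions rather than solving the optimization problem exactly.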
Organizations increasingly rely on cyber threat intelligence feeds to protect their infrastructure from attacks. These feeds typically list IP addresses or domains associated with malicious activities such as spreading malware or participating in a botnet. Today, there is a rich ecosystem of commercial and free cyber threat intelligence feeds, making it difficult, yet essential, for network defenders to quantify the quality and to select the optimal set of feeds to follow. Selecting too many or low-quality feeds results in many false alerts, while considering too few feeds increases the risk of missing relevant threats. Naïve individual metrics like size and update rate give a rough overview of a feed, but they do not allow conclusions about its quality and they can easily be manipulated by feed providers.
In this paper, we present FeedRank, a novel ranking approach for cyber threat intelligence feeds. In contrast to individual metrics, FeedRank is robust against tampering attempts by feed providers. FeedRank’s key insight is to rank feeds according to the originality of their content and the reuse of entries by other feeds. Such correlations between feeds are modelled in a graph, which allows FeedRank to find temporal and spatial correlations without requiring any ground truth or an operator’s feedback.
We illustrate FeedRank’s usefulness with two characteristic examples: (i) selecting the best feeds that together contain as many distinct entries as possible; and (ii) selecting the best feeds that list new entries before they appear on other feeds. We evaluate FeedRank based on a large set of real feeds. The evaluation shows that FeedRank identifies dishonest feeds as outliers and that dishonest feeds do not achieve a better FeedRank score than the top-rated real feeds.
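One way to picture ranking feeds by originality and reuse is a PageRank-style computation on a graph of feeds. In this Python sketch, a feed that lists an entry later than another feed casts a vote for the earlier feed. This voting rule, the function `rank_feeds`, the damping factor and the toy feeds are assumptions made for illustration. The abstract does not spell out FeedRank's actual graph construction or scoring.

```python
def rank_feeds(feeds, damping=0.85, iterations=50):
    """Score feeds with a PageRank-style iteration (illustrative only).

    feeds: {feed_name: {entry: first_seen_timestamp}}
    A feed b votes for feed a once per entry that a listed before b.
    """
    names = sorted(feeds)
    votes = {b: {} for b in names}
    for b in names:
        for entry, t_b in feeds[b].items():
            for a in names:
                t_a = feeds[a].get(entry)
                if a != b and t_a is not None and t_a < t_b:
                    votes[b][a] = votes[b].get(a, 0) + 1

    n = len(names)
    score = {f: 1.0 / n for f in names}
    for _ in range(iterations):
        new = {f: (1.0 - damping) / n for f in names}
        for b in names:
            total = sum(votes[b].values())
            if total == 0:
                # Feed b reuses nothing: spread its score evenly.
                for f in names:
                    new[f] += damping * score[b] / n
            else:
                for a, weight in votes[b].items():
                    new[a] += damping * score[b] * weight / total
        score = new
    return score


# Hypothetical feeds: B lists A's entries later; C is unrelated.
feeds = {
    "A": {"198.51.100.7": 1, "evil.example": 1},
    "B": {"198.51.100.7": 2, "evil.example": 3},
    "C": {"203.0.113.9": 1},
}
scores = rank_feeds(feeds)
print(max(scores, key=scores.get))
```

Because a feed cannot vote for itself in this sketch, inflating its own size or update rate does not raise its score. That is in the spirit of the robustness property claimed above, but it is not a proof of it.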
Advances in layer 2 networking technologies have fostered the deployment of large, geographically distributed LANs. Due to their large diameter, such LANs provide many vantage points for wiretapping. As an example, Google's internal network was reportedly tapped by governmental agencies, forcing the Web giant to encrypt its internal traffic. While using encryption certainly helps, eavesdroppers can still access traffic metadata which often reveals sensitive information, such as who communicates with whom and which are the critical hubs in the infrastructure.
This paper presents iTAP, a system for providing strong anonymity guarantees within a network. iTAP is network-based and can be partially deployed. Akin to onion routing, iTAP rewrites packet headers at the network edges by leveraging SDN devices. As large LANs can see millions of flows, the key challenge is to rewrite headers in a way that guarantees strong anonymity while, at the same time, scaling the control-plane (number of events) and the data-plane (number of flow rules). iTAP addresses these challenges by adopting a hybrid rewriting scheme. Specifically, iTAP scales by reusing rewriting rules across distinct flows and by distributing them on multiple switches. As reusing headers leaks information, iTAP monitors this leakage and adapts the rewriting rules before any eavesdropper could provably de-anonymize any host.
We implemented iTAP and evaluated it using real network traffic traces. We show that iTAP works in practice, on existing hardware, and that deploying few SDN switches is enough to protect a large share of the network traffic.