My name is Motahhare Eslami, officially Motahareh EslamiMehdiabadi (I know, it's long!). I am a PhD candidate at the University of Illinois at Urbana-Champaign. I work in the Social Spaces Group with Karrie Karahalios.
My PhD work draws on human-computer interaction, social computing, and data mining techniques to investigate
users’ behavior around opaque algorithmic systems and to redesign these systems so that they communicate opaque
algorithmic processes to users and provide them with a more informed, satisfying, and engaging interaction.
My research has been discussed in the popular press, including
The Washington Post,
International Business Times,
New Scientist, and
MIT Technology Review.
Awareness of bias in algorithms is growing among scholars and users of algorithmic systems. But what can we observe about how users discover such biases, and how they communicate these biases? We developed a cross-platform audit technique that analyzed online ratings of more than 800 hotels across four hotel rating platforms and found that one site’s rating algorithm biased ratings of low-to-medium quality hotels 14-37% higher than others. Analyzing reviews of 166 users who independently also discovered this bias, we seek to understand if, how, and in what ways users perceive and manage this bias. Our analysis suggests some ways of describing common reactions to the discovery of bias on the rating platform, including efforts to inform others, efforts to correct the bias, and demonstrations of broken trust. We conclude with a discussion of how patterns in such behavior might inform design approaches that anticipate unexpected bias and provide reliable means for meaningful bias discovery and response.
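As a rough illustration of the kind of cross-platform comparison described above, the following Python sketch computes how much higher one platform rates each hotel than the average of the other platforms. The function name, the data layout, and the choice of the other platforms' mean as the baseline are assumptions made for illustration; they are not taken from the paper, and ratings are assumed to be already normalized to a common scale.

```python
from statistics import mean


def relative_inflation(ratings, platform):
    """Percent by which `platform` rates each hotel above the mean of the others.

    `ratings` is a hypothetical layout: {hotel: {platform_name: rating}},
    with all ratings on the same scale.
    """
    result = {}
    for hotel, by_platform in ratings.items():
        if platform not in by_platform:
            continue  # this hotel is not listed on the audited platform
        others = [r for p, r in by_platform.items() if p != platform]
        if not others:
            continue  # nothing to compare against
        baseline = mean(others)
        if baseline == 0:
            continue  # avoid division by zero
        result[hotel] = 100.0 * (by_platform[platform] - baseline) / baseline
    return result


# Toy values, invented purely to exercise the function.
toy = {
    "hotel_1": {"A": 4.0, "B": 3.0, "C": 3.0},
    "hotel_2": {"A": 3.0},
}
inflation = relative_inflation(toy, "A")
```

A positive value means the audited platform rates the hotel higher than the other platforms do on average; hotels that appear on only one platform are skipped.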
Search systems in online social media sites are frequently used to find information about ongoing events and people. For topics with multiple competing perspectives, such as political events or political candidates, bias in the top-ranked results significantly shapes public opinion. However, bias does not emerge from an algorithm alone. It is important to distinguish between the bias that arises from the data that serves as the input to the ranking algorithm and the bias that arises from the ranking algorithm itself. In this paper, we propose a framework to quantify these distinct biases and apply this framework to politics-related queries on Twitter. We found that both the input data and the ranking algorithm contribute significantly to the bias in the search results, in varying amounts and in different ways. We discuss the consequences of these biases and propose mechanisms to signal this bias in the interfaces of social search systems.
Our daily digital life is full of algorithmically selected content such as social media feeds, recommendations and personalized search results. These algorithms have great power to shape users' experiences yet users are often unaware of their presence. Whether it is useful to give users insight into these algorithms’ existence or functionality and how such insight might affect their experience are open questions. To address them, we conducted a user study with 40 Facebook users to examine their perceptions of the Facebook News Feed curation algorithm.
Surprisingly, more than half of the participants (62.5%) were not aware of the News Feed algorithm at all. Initial reactions for these previously unaware participants were surprise and anger. We developed a system, FeedVis, to reveal to users the difference between the algorithmically curated and an unadulterated News Feed, and used it to study how users perceive this difference. Participants were most upset when close friends and family were not shown; they had often inferred social meaning from the filtering of the feed. By the end of the study, however, participants were mostly satisfied with the content on their feeds. Following up with participants two to six months after the study, we found that for most, satisfaction levels remained similar before and after becoming aware of the algorithm. However, algorithmic awareness led users to engage more actively with Facebook and bolstered their overall feelings of control on the site.
Detecting groups or communities within social networks has attracted considerable attention as a way to analyze people's collective behavior. As a result, a large number of community detection, or clustering, algorithms have been proposed to find groups in social networks. However, the evaluation of these algorithms has not received enough consideration. The problem arises because evaluating community detection algorithms requires ground truth, which is hard or impossible to obtain for Big Data. To address this problem, this project uses a new evaluation approach that humanizes the community detection process. Applying three different community detection algorithms to the Facebook network, we develop a Community Detection Application (CDA) that asks people to evaluate how well the algorithms find the groups in their own Facebook networks. We believe this new approach is a promising step towards evaluating the community detection process in a different way.
The diffusion phenomenon has a remarkable impact on Online Social Networks (OSNs). Gathering diffusion data over these large networks poses many challenges, which can be alleviated by adopting a suitable sampling approach. The contribution of this project is twofold. First, we study sampling approaches over diffusion networks and, for the first time, classify these approaches into two categories: (1) Structure-based Sampling (SBS) and (2) Diffusion-based Sampling (DBS). The dependence of the former approach on the topological features of the network, and the unavailability of real diffusion paths in the latter, turn the choice of an appropriate sampling approach into a trade-off. Second, we formally define the diffusion network sampling problem and propose a number of new diffusion-based characteristics to evaluate the introduced sampling approaches. Our experiments on large-scale synthetic and real datasets show that although DBS performs much better than SBS at higher sampling rates (by 16% to 29% on average), their performance differs by only about 7% at lower sampling rates. Therefore, in real large-scale systems with low sampling rate requirements, SBS would be a better choice because of its lower time complexity in gathering data compared to DBS. Moreover, we show that the choice between the introduced sampling approaches (SBS and DBS) plays a more important role than the choice of graph exploration technique, such as Breadth-First Search (BFS) or Random Walk (RW), in the analysis of diffusion processes.
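For readers unfamiliar with the two graph exploration techniques named above, here is a minimal Python sketch of node sampling by Breadth-First Search and by Random Walk. It illustrates only generic BFS and RW exploration, not the SBS or DBS designs from the project; the function names, the adjacency-list layout (a dict mapping each node to a list of neighbors), and the `max_steps` and `seed` parameters are assumptions made for this example.

```python
import random
from collections import deque


def bfs_sample(graph, start, k):
    """Collect up to k distinct nodes (k >= 1) in breadth-first order."""
    seen = {start}
    order = [start]
    queue = deque([start])
    while queue and len(order) < k:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in seen:
                seen.add(nbr)
                order.append(nbr)
                queue.append(nbr)
                if len(order) == k:
                    break
    return order


def random_walk_sample(graph, start, k, max_steps=10_000, seed=0):
    """Collect up to k distinct nodes (k >= 1) visited by a simple random walk."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    visited = [start]
    seen = {start}
    node = start
    for _ in range(max_steps):
        if len(visited) >= k:
            break
        nbrs = graph[node]
        if not nbrs:
            break  # dead end: the walk cannot continue
        node = rng.choice(nbrs)
        if node not in seen:
            seen.add(node)
            visited.append(node)
    return visited


# Small toy graph, invented for illustration.
toy_graph = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
bfs_nodes = bfs_sample(toy_graph, 0, 3)
rw_nodes = random_walk_sample(toy_graph, 0, 3)
```

BFS explores the neighborhood of the start node exhaustively before moving outward, while the random walk follows a single path and may revisit nodes, which is why it is capped by `max_steps`.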
Partially observed data collected by sampling methods is often studied to obtain the characteristics of information diffusion networks on Online Social Networks (OSNs). However, these methods are usually applied without considering the behavior of the diffusion process. In this paper, we propose a novel two-step (sampling/estimation) measurement framework that utilizes diffusion process characteristics. To this end, we propose a link-tracing based sampling design that uses infection times as local information, without any knowledge of the latent structure of the diffusion network. To correct the bias of the sampled data, we introduce three estimators for different categories of characteristics: link-based, node-based, and cascade-based. To the best of our knowledge, this is the first study to introduce a complete measurement framework for diffusion networks. Our comprehensive empirical analysis over large synthetic and real datasets demonstrates that the proposed framework outperforms common sampling methods (BFS and RW) in terms of link-based characteristics by about 37% and 35% on average, respectively. We also show that an estimator plays an important role in correcting the bias of sampling from diffusion networks.
Many real-world systems and applications, such as the World Wide Web and social interactions, can be modeled as networks of interacting dynamical nodes. However, in many cases, the pattern of node-to-node interactions (i.e., edges), or the structure of the network, is unknown. We address this issue by studying the Network Reconstruction Problem: given a network with missing edges, how can the network structure be uncovered from certain observable quantities extracted from partial measurements? We propose a novel framework called CS-NetRec, based on a newly emerged paradigm in sparse signal recovery called Compressive Sensing (CS). The general idea of CS is that if the representation of information is sparse, then it can be recovered from a small number of linear measurements. In particular, we utilize the observed data of information cascades in the context of CS for network reconstruction. Our comprehensive empirical analysis over both synthetic and real datasets demonstrates that the proposed framework leads to an efficient and effective reconstruction. More specifically, the results demonstrate that our framework can perform accurately even with a low number of cascades (e.g., when the number of cascades is around half of the number of existing edges in the desired network). Furthermore, our framework is capable of near-perfect reconstruction of the desired network in the presence of 95% sparsity. In addition, we compared the performance of our framework with NetInf, one of the state-of-the-art methods for inferring diffusion networks. The results suggest that the proposed method outperforms NetInf by 10% on average in terms of the F-measure.
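As a small aid to reading the F-measure comparison above, the following Python sketch scores a reconstructed edge set against the true edge set. It assumes the balanced F-measure (F1, the harmonic mean of precision and recall) over directed edges; the abstract does not state which variant was used, and the function name and edge representation are assumptions for illustration.

```python
def f_measure(true_edges, inferred_edges):
    """Balanced F-measure (F1) of an inferred edge set against the true one.

    Edges are hashable pairs such as (source, target); direction matters here.
    """
    true_set = set(true_edges)
    inferred_set = set(inferred_edges)
    true_positives = len(true_set & inferred_set)
    if true_positives == 0:
        return 0.0  # also covers empty inputs
    precision = true_positives / len(inferred_set)
    recall = true_positives / len(true_set)
    return 2 * precision * recall / (precision + recall)


# Toy edge sets, invented for illustration.
true_edges = [(1, 2), (2, 3), (3, 4)]
inferred_edges = [(1, 2), (2, 3), (4, 5)]
score = f_measure(true_edges, inferred_edges)
```

Precision penalizes inferred edges that do not exist in the true network, recall penalizes true edges that were missed, and the F-measure balances the two.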
The spread of information cascades over social networks forms diffusion networks. The latent structure of diffusion networks makes the problem of extracting diffusion links difficult. As observing the sources of information is not usually possible, the only available prior knowledge is the infection times of individuals. We confront these challenges by proposing a new method called DNE to extract diffusion networks from time-series data. We model the diffusion process on information networks as a Markov random walk process and develop an algorithm to discover the most probable diffusion links. We validate our model on both synthetic and real data and show that our method depends only weakly on the number of cascades transmitted over the underlying networks. Moreover, the proposed model can speed up the extraction process by up to 300 times compared with the existing state-of-the-art method.
High reliability and low power consumption are among the major requirements in the design of Wireless Sensor Networks (WSNs). In this project, a multi-objective problem is formulated as a Joint Power consumption and data Reliability (JPR) optimization problem. For this purpose, a Connected Dominating Set (CDS)-based topology control approach is proposed. Our objective is to self-organize the network with minimum interference and power consumption. We consider power changes in a topology with a Minimum CDS (MCDS) infrastructure, subject to connectivity constraints. Since this problem is NP-hard, no polynomial-time exact algorithm is known for it. Therefore, we first present a genetic algorithm, called JPR Genetic Algorithm (JPR-GA), that takes problem-specific goals and constraints into consideration in an approximate manner. Second, a Hierarchical Sub-Chromosome Genetic Algorithm (HSC-GA) is proposed to obtain more accurate and faster solutions in large and dense networks. We evaluate these algorithms over different network topologies to analyze their efficiency. Comparing JPR-GA and HSC-GA in two different scenarios reveals that the proposed algorithms can efficiently balance the power consumption and data communication reliability of sensor nodes and also prolong the network lifetime in WSNs.
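To make the Connected Dominating Set notion concrete, the following Python sketch checks whether a candidate set of nodes is a CDS of an undirected graph: every node outside the set must have a neighbor inside it, and the set must induce a connected subgraph. This is only the standard definition expressed as a validity check, such as a feasibility test might use; it is not JPR-GA or HSC-GA. The function name and the adjacency-list layout are assumptions, and candidate nodes are assumed to be present in the graph.

```python
from collections import deque


def is_connected_dominating_set(graph, candidate):
    """Return True if `candidate` is a connected dominating set of `graph`.

    `graph` maps each node to a list of its neighbors (undirected, so each
    edge appears in both adjacency lists).
    """
    cds = set(candidate)
    if not cds:
        return False
    # Domination: every node outside the set has at least one neighbor in it.
    for node, nbrs in graph.items():
        if node not in cds and not cds.intersection(nbrs):
            return False
    # Connectivity: a BFS restricted to the set must reach all of its members.
    start = next(iter(cds))
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr in cds and nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return seen == cds


# Path graph 0-1-2-3-4, invented for illustration.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
backbone_ok = is_connected_dominating_set(path, {1, 2, 3})
backbone_split = is_connected_dominating_set(path, {1, 3})
```

On the path graph, `{1, 2, 3}` passes both conditions, whereas `{1, 3}` dominates every node but is rejected because nodes 1 and 3 are not connected to each other within the set.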