Our research uses data science methods and machine learning to predict human decisions on digital platforms. Current research projects address a broad spectrum of research questions of societal and economic relevance, including the analysis of social networks, financial markets, and methodological innovations in text analysis.

Research Focus


Online Harms on Social Media

Social media is a fertile ground for misinformation and anti-social behavior, including online harassment, cyberbullying, and hate speech. Our research uses data science methods combined with large-scale datasets to better understand these phenomena and develop effective countermeasures.

Data Science for Business Applications 

We use state-of-the-art quantitative methods to understand and predict the dissemination and economic impact of news, comments, and reviews in financial markets and electronic commerce. Furthermore, we engineer tools that allow decision-makers to replace gut decisions with data-driven practices.

Data Science Methods for Unstructured Online Data 

Our research relies upon the ability to accurately process unstructured online data in various forms (e.g., text, images). For this purpose, we actively develop state-of-the-art data science methods (e.g., in the area of text mining and sentiment analysis) to derive actionable insights from unstructured online data.

Ongoing Third-Party Funded Projects

Community-Based Fact-Checking on Social Media, Deutsche Forschungsgemeinschaft (DFG), 2022–2025.

Rumor Diffusion on Social Media During the COVID-19 Pandemic, Deutsche Forschungsgemeinschaft (DFG), 2021–2024.

Featured Research

Community-Based Fact-Checking on Twitter’s Birdwatch Platform

Twitter has recently introduced “Birdwatch,” a community-driven approach to address misinformation on Twitter. In this work, we empirically analyze how users interact with this new feature. Our empirical analysis yields the following main findings: (i) Users file Birdwatch notes more frequently for misleading than for not misleading tweets. These misleading tweets are primarily reported because of factual errors, a lack of important context, or unverified claims. (ii) Birdwatch notes are more helpful to other users if they link to trustworthy sources and if they embed a more positive sentiment. (iii) The helpfulness of Birdwatch notes depends on the social influence of the author of the fact-checked tweet. For influential users with many followers, Birdwatch notes yield a lower level of consensus among users, and community-created fact checks are more likely to be seen as incorrect. Altogether, our findings can help social media platforms formulate guidelines for users on how to write more helpful fact checks. At the same time, our analysis suggests that community-based fact-checking faces challenges regarding biased views and polarization among the user base.

Paper at ICWSM


Hate Speech in the Political Discourse on Social Media: Disparities Across Parties, Gender, and Ethnicity

Social media has become an indispensable channel for political communication. However, the political discourse is increasingly characterized by hate speech, which affects not only the reputation of individual politicians but also the functioning of society at large. In this work, we empirically analyze how the amount of hate speech in replies to posts from politicians on Twitter depends on personal characteristics, such as their party affiliation, gender, and ethnicity. For this purpose, we employ Twitter's Historical API to collect every tweet posted by members of the 117th U.S. Congress for an observation period of more than six months. Additionally, we gather replies for each tweet and use machine learning to predict the amount of hate speech they embed. Subsequently, we implement hierarchical regression models to analyze whether politicians with certain characteristics receive more hate speech. We find that tweets are particularly likely to receive hate speech in replies if they are authored by (i) persons of color from the Democratic party, (ii) white Republicans, and (iii) women. Furthermore, our analysis reveals that more negative sentiment (in the source tweet) is associated with more hate speech (in replies). However, the association varies across parties: negative sentiment attracts more hate speech for Democrats (vs. Republicans). Altogether, our empirical findings imply significant differences in how politicians are treated on social media depending on their party affiliation, gender, and ethnicity.

Paper at WWW


Floor Plans and Visual Analytics for Hedonic Rent Price Appraisal

Online real estate platforms have become significant marketplaces facilitating users’ search for an apartment or a house. Yet it remains challenging to accurately appraise a property’s value. Prior works have primarily studied real estate valuation based on hedonic price models that take structured data into account, while accompanying unstructured data is typically ignored. In this study, we investigate to what extent an automated visual analysis of apartment floor plans on online real estate platforms can enhance hedonic rent price appraisal. We propose a tailored deep learning approach to learn price-relevant aesthetics of floor plans from historical price data. Subsequently, we integrate the floor plan predictions into hedonic rent price models that account for both structural and locational characteristics of an apartment. Our empirical analysis based on a unique dataset of 9,174 real estate listings suggests that there is an underutilization of the available data in current hedonic models. We find that (1) the aesthetics of floor plans have significant explanatory power regarding rent prices – even after controlling for structural and locational apartment characteristics, and (2) harnessing floor plans results in an up to 10.56% lower out-of-sample prediction error. We further find that floor plans yield a particularly high gain in predictive performance for older and smaller apartments. Altogether, our empirical findings contribute to the existing body of research by establishing the link between visual aesthetics of floor plans and real estate prices. Moreover, our approach has important implications for online real estate platforms, which can use our findings to enhance user experience in their real estate listings.

Paper at WWW


Argumentation Mining of Online Consumer Reviews

Review helpfulness serves as a focal point in understanding customers’ purchase decision-making process on online retailer platforms. An overwhelming majority of previous works find longer reviews to be more helpful than short reviews. In this paper, we propose that longer reviews should not be assumed to be uniformly more helpful; instead, we argue that the effect depends on the line of argumentation in the review text. To test this idea, we use a large dataset of customer reviews from Amazon in combination with a state-of-the-art approach from natural language processing that allows us to study argumentation lines at the sentence level. Our empirical analysis suggests that the frequency of argumentation changes moderates the effect of review length on helpfulness. Altogether, we disprove the prevailing narrative that longer reviews are uniformly perceived as more helpful. Our findings allow retailer platforms to improve their customer feedback systems and to feature more useful product reviews.

Paper at Journal of Business Research


Package: ReinforcementLearning

This package performs model-free reinforcement learning in R. The implementation enables the learning of an optimal policy based on sample sequences consisting of states, actions and rewards. In addition, it supplies multiple predefined reinforcement learning algorithms, such as experience replay.
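
To illustrate the idea of learning a policy from sample sequences, here is a minimal sketch of tabular Q-learning with experience replay. It is written in Python rather than R and does not reproduce the package's API. The function names, the parameters `alpha`, `gamma`, and `iterations`, and the two-state toy environment are all assumptions made for this example.

```python
from collections import defaultdict


def q_learning_replay(samples, alpha=0.1, gamma=0.5, iterations=200):
    """Estimate Q-values from (state, action, reward, next_state) tuples.

    Hypothetical helper for illustration; not the R package's interface.
    """
    actions = sorted({a for _, a, _, _ in samples})
    q = defaultdict(lambda: {a: 0.0 for a in actions})
    # Experience replay: the same stored samples are replayed repeatedly.
    for _ in range(iterations):
        for s, a, r, s_new in samples:
            best_next = max(q[s_new].values())
            # Standard Q-learning update toward the bootstrapped target.
            q[s][a] += alpha * (r + gamma * best_next - q[s][a])
    return q


def greedy_policy(q):
    """Pick the action with the highest Q-value in every state."""
    return {s: max(values, key=values.get) for s, values in q.items()}


# Toy data (assumed): staying in state "B" is the only rewarded transition.
samples = [
    ("A", "move", 0, "B"),
    ("A", "stay", 0, "A"),
    ("B", "stay", 1, "B"),
    ("B", "move", 0, "A"),
]
q_values = q_learning_replay(samples)
policy = greedy_policy(q_values)
```

In this toy setting, the learned policy moves from "A" to "B" and then stays there, because that is where the reward is collected.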

ReinforcementLearning on CRAN


Package: SentimentAnalysis

This package performs a sentiment analysis of textual contents in R. The implementation utilizes various existing dictionaries, such as Harvard IV, or finance-specific dictionaries. Furthermore, it can also create customized dictionaries. The latter uses LASSO regularization as a statistical approach to select relevant terms based on an exogenous response variable.
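
The dictionary-based part of this approach can be sketched in a few lines. The example below is in Python rather than R and does not reproduce the package's API. The word lists are tiny placeholders invented for this illustration (not the Harvard IV or any finance-specific dictionary), and the scoring rule, positive minus negative matches divided by the number of tokens, is one common convention assumed here. The LASSO-based dictionary generation is not covered.

```python
import re

# Placeholder dictionaries (assumed for illustration only).
POSITIVE = {"good", "great", "excellent", "gain"}
NEGATIVE = {"bad", "poor", "terrible", "loss"}


def sentiment_score(text):
    """Return (positive - negative matches) / number of tokens, or 0.0 if empty."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / len(tokens)


positive_example = sentiment_score("Great product, excellent value")
negative_example = sentiment_score("Terrible and poor")
```

A score above zero indicates that positive terms outweigh negative ones, and a score below zero indicates the opposite; texts without any dictionary matches score zero.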

SentimentAnalysis on CRAN