![]() | Only 14 pages are available for public view |
Abstract In online social networks such as Twitter, tweeting allows users to share a variety of content with their followers. As tweets are retweeted from user to user, large cascades of tweet propagation are formed. The growth of a cascade over time signals the popularity, or lack thereof, of its subject matter. The k-core of an information graph is a common measure of a node's connectedness in diverse applications. The k-core decomposition algorithm categorizes nodes into k-shells based on their connectivity. Previous research claimed that the super-spreaders are those located at the k-core of a social graph, and that nodes become less important the further their k-shell lies from the k-core. A meme represents an idea or a topic that spreads among users of an online social network. Current research on modelling information diffusion in social media focuses on studying retweet cascades of individual tweets independently. However, as a meme spreads, it evolves, and users adopt the meme in varying manners. While retweet cascades can model the propagation of a single piece of information among users, they are not useful in studying the propagation of the whole meme. In this thesis, we aim to study information diffusion from a wider perspective, in which the propagation of a meme is tracked rather than that of individual tweets. We also investigate the influence effect of the super-spreaders, located at the k-core, on meme cascade growth. First, the growth of retweet cascades and the various features that govern it are studied. We pose the question of whether the same feature set can be used for cascade growth prediction on any Twitter dataset. Two types of growth prediction are addressed: structural and temporal. First, a definition of structural and temporal growth is devised. Then, an approach is proposed for selecting the best of these features for a given dataset to improve prediction accuracy. 
We present and discuss the most discriminating features for predicting cascade growth and provide evidence that pre-selecting features improved the accuracy of the prediction task on the datasets. Moreover, evidence is found that the features governing cascade growth vary from one dataset to another. Next, we generalize the modelling of retweet cascades to the modelling of the diffusion of a meme. To construct the meme adoption graph (MAG), messages related to a meme are identified from the social network stream. Then, a recent clustering algorithm is utilized to automatically extract and cluster tweets. Next, three epidemic cascade construction models are evaluated and compared for constructing the MAG and representing meme diffusion. Also, a set of structural characteristics derived from the MAG that describe the underlying meme adoption pattern is proposed. An empirical study, using four real-world Twitter datasets, is performed to demonstrate the effectiveness of the proposed MAG. Moreover, we work towards evaluating the influence span of the social media super-spreaders, located at the k-core, in terms of the number of k-shells that their influence is capable of reaching. Our methodology is based on the observation that the k-core size is directly correlated with the graph size under certain conditions. These conditions are explained and the correlation is utilized to assess the effectiveness of the k-core nodes for influence dissemination. The experimental results show a high correlation between the k-core size and the sizes of the inner k-shells in the examined datasets. However, the correlation starts to decrease in the outer k-shells. Further investigation showed that the less correlated k-shells exhibited a higher presence of spam accounts. Finally, the effectiveness of using the k-core nodes as seed nodes for influence maximisation is inspected. 
A measure is proposed to estimate the relative strength of the k-core as an influence source among the other sources of influence contributing to cascade development. We propose combining that measure with the correlation between the inner k-core size and the cascade size to determine the influence domination of the k-core nodes, and hence the effectiveness of targeting these specific nodes for influence maximisation. |
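As an illustration of the k-core decomposition referred to in the abstract, the following is a minimal sketch of the standard peeling procedure that assigns each node to a k-shell. It is not taken from the thesis: the function names `k_shell_decomposition` and `innermost_core`, the toy edge list, and the treatment of the graph as undirected are all assumptions made for this example.

```python
from collections import defaultdict


def k_shell_decomposition(edges):
    """Assign every node of an undirected graph to a k-shell by peeling.

    `edges` is an iterable of (u, v) pairs; self-loops are ignored.
    Returns a dict mapping node -> shell index k.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    degree = {node: len(neighbours) for node, neighbours in adj.items()}
    remaining = set(adj)
    shell = {}
    k = 0
    while remaining:
        # Nodes whose remaining degree is at most k belong to shell k.
        peel = [n for n in remaining if degree[n] <= k]
        if not peel:
            k += 1
            continue
        while peel:
            node = peel.pop()
            if node not in remaining:
                continue  # already removed via another neighbour
            remaining.discard(node)
            shell[node] = k
            # Removing a node lowers the degree of its remaining neighbours,
            # which may pull them into the same shell.
            for other in adj[node]:
                if other in remaining:
                    degree[other] -= 1
                    if degree[other] <= k:
                        peel.append(other)
    return shell


def innermost_core(shell):
    """Return the nodes in the highest-numbered shell (the innermost core)."""
    if not shell:
        return set()
    k_max = max(shell.values())
    return {node for node, k in shell.items() if k == k_max}


# Hypothetical toy graph: a triangle (a, b, c) with one pendant node d.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("a", "d")]
toy_shell = k_shell_decomposition(toy_edges)
print(sorted(toy_shell.items()))  # [('a', 2), ('b', 2), ('c', 2), ('d', 1)]
```

In this toy graph the triangle forms the innermost core (shell 2) and the pendant node falls in shell 1, which mirrors the inner versus outer k-shell distinction the abstract draws.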