Next Generation Internet

NGI Forward is the strategy and policy arm of the European Commission’s flagship Next Generation Internet initiative, which seeks to build a more democratic, inclusive, resilient, sustainable and trustworthy internet by 2030.

  • Programme

    Horizon 2020
  • Timeframe

    2019-2022
  • Consortium

    Nesta, DELab, Aarhus University, Edgeryders, Resonance Design, City of Amsterdam

Next Generation Internet - Forward

As part of NGI Forward, our main aim has been to support the Next Generation Internet initiative by providing data science tools to map and analyse the developments of the tech world.

We have focused on three goals:

  • To develop text-mining methodologies to extract insights on issues relevant to NGI
  • To prepare case studies highlighting key conclusions from the data-driven research
  • To publish the results in forms facilitating further use and research

DELab Projects

Here you will find our other projects

Research

Text mining work in the NGI field

Library

Here you will find all our reports, research papers and blog posts

  • The main goal of this report is to introduce an online tool that facilitates the exploration of key technology challenges and related policy issues. Based on a text-mining methodology, we have examined and identified the specific topics discussed in a wide range of written media shared on social media platforms.

    We focused on seven general umbrella topics:
    • Environment, Sustainability & Resilience
    • Decentralising Power & Building Alternatives
    • Public Space & Sociality
    • Privacy, Identity & Data Governance
    • Trustworthy Information Flows
    • Cybersecurity & Democracy
    • Access, Inclusion & Justice

    For each umbrella topic, interactive maps present clusters of articles covering related issues, enabling the discovery of problems, opinions and recommended solutions. Drawing on expert analysis, we have tagged and named the clusters on each map to support further analysis by users.

    In order to showcase the potential of the tool, we have prepared a deep dive for the umbrella topic Access, Inclusion & Justice.

    The report provides insights on the challenges and solutions related to:
    • Open Internet (access to the Internet, control over infrastructure, censorship and content moderation)
    • Inclusive Tech (gender and racial equality, inclusive education, legal tech)
    • Ethical Tech (algorithmic bias, military and surveillance application of AI, gig economy)
    Read full report
  • This report continues the analysis presented in “Towards a Human-Centric Internet: Challenges and Solutions”. As in our prior work, text documents shared on Twitter are examined for seven broad umbrella topics: Environment, Sustainability and Resilience; Decentralizing Power and Building Alternatives; Public Space and Sociality; Privacy, Identity and Data Governance; Trustworthy Information Flows; Cybersecurity and Democracy; and Access, Inclusion and Justice.

    Our main aim is to expand the exploration of key technology challenges and related policy issues by focusing on various languages beyond English. By collecting online articles in German, Polish, Portuguese and Spanish, we examine discussions across various European and Latin American countries, including:

    • German-speaking countries
    • Poland
    • Spain
    • Brazil
    • Spanish-speaking Latin American countries

    The previously presented text-mining methodology is implemented to identify the specific topics discussed in these countries and regions. For each region, interactive maps highlight clusters of articles covering related issues, enabling their exploration and analysis. Following the analysis of the maps, various case studies are presented for each region that summarize local perspectives on global technology issues, such as the spread of fake news, concerns about privacy or the use of facial recognition software.


    Read full report
  • This guide is a methodological companion to the report “Towards a Human-Centric Internet: Challenges and Solutions”. The main goal of this study is to develop a visualization tool enabling the exploration of key technology challenges and related policy issues. Based on a text-mining methodology, we have examined and identified the specific topics discussed in a wide range of written media shared on social media platforms.

    In this methodological guide, we describe various methods that can be used to generate topics automatically, optionally augmented with expert analysis. We then show how these methods can be benchmarked to find the one most suitable for our NGI dataset and its umbrella topics. The benchmark is based on a labeled news dataset, Reuters-21578. We examine how various unsupervised topic detection methods (Latent Dirichlet Allocation, Pachinko Allocation, t-SNE, doc2vec, SVD and bag-of-words, combined with suitable clustering algorithms such as k-means, Gaussian mixtures and HDBSCAN) perform on this dataset.

    We show the results and justify the choice of the model: t-SNE embeddings clustered with Gaussian mixtures. We also demonstrate that HDBSCAN clustering is a robust alternative to expert analysis, although with some demonstrable disadvantages. The main report presents a description of all narrow topics identified, as well as a deep dive into one umbrella topic. In this report, the descriptions of umbrella topics focus on the similarities and differences between the main and the alternative methods of assigning topics. The interactive results presenting both methods are available online: https://ngitopics.delabapps.eu.


    Read full report
  • This report provides an overview of the resources and output produced by the DELab team at the University of Warsaw during the Next Generation Internet (NGI) Forward project.

    The aim of this document is to describe and collect all pieces of work in one place, facilitating their continued use after the project ends. As part of NGI Forward, our main aim has been to support the Next Generation Internet initiative by providing data science tools to map and analyse the developments of the tech world. The outputs therefore come in different types: methodologies with documented Python code; reports summarizing insights; and interactive presentations with publicly available data.

    We briefly introduce and explain the two main text-mining methodologies (trend analysis and topic mapping) and the novel datasets prepared during the project, and we list the available reports, tutorials and other project results.


    Read full report
  • The report summarises the results of the first iteration of trend analysis.

    The report is based on the online news dataset and the working paper dataset for the period 2016.01-2019.04.


    Read full report
  • The report provides our preliminary analysis for topic mapping. The analysis is based on the same news media dataset as the first iteration of trend analysis (2016.01-2019.04).

    The presented methodology uses LDA and t-SNE to map latent topics in the news dataset. Seventeen broad umbrella topics were identified, of which five were selected for further analysis.


    Read full report
  • The report summarises the results of the second iteration of trend analysis and an additional deep dive focused on the COVID-19 pandemic.

    Two separate trend analyses are therefore presented: one for the period 2016.01-2019.12 and a second for the period 2020.01-2020.06.

    The COVID-19 analysis includes further explorations of open-source GitHub projects and Reddit discussions.


    Read full report
  • This blog post presents a short summary of the COVID-19 analysis.


    Read full blog post
  • This blog post presents a short summary of the report “Intermediary topic modelling analysis results”. It introduces LDA and explains how to interpret the interactive visualisations.


    Read full blog post
  • This blog post presents a short summary of the report “Intermediary topic modelling analysis results”.


    Read full blog post
  • This blog post presents an analysis of HackerNews discussions about privacy using sentiment analysis.

    The exercise is based on the tutorial “HackerNews analysis with BigQuery”. The post explains the potential of text-mining methods for extracting insights from social media posts and comments.


    Read full report
  • The post provides a brief introduction to trend analysis with a focus on AI and ML.


    Read full blog post
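The methodological guide listed above benchmarks unsupervised topic detection methods against a labeled news dataset. As a rough illustration of how such a comparison can be scored, the sketch below computes two common clustering-agreement measures, purity and normalized mutual information (NMI), in plain Python. The choice of these two metrics, the function names and the toy labels are assumptions made for this example; the guide itself should be consulted for the measures actually used.

```python
from collections import Counter
from math import log


def purity(labels, clusters):
    """Share of documents whose label matches the majority label of their cluster."""
    if len(labels) != len(clusters):
        raise ValueError("labels and clusters must have the same length")
    per_cluster = {}
    for label, cluster in zip(labels, clusters):
        per_cluster.setdefault(cluster, Counter())[label] += 1
    return sum(max(c.values()) for c in per_cluster.values()) / len(labels)


def entropy(assignments):
    """Shannon entropy (natural log) of a list of discrete assignments."""
    n = len(assignments)
    return -sum((c / n) * log(c / n) for c in Counter(assignments).values())


def nmi(labels, clusters):
    """Mutual information normalized by the mean of the two entropies."""
    if len(labels) != len(clusters):
        raise ValueError("labels and clusters must have the same length")
    n = len(labels)
    joint = Counter(zip(labels, clusters))
    label_counts = Counter(labels)
    cluster_counts = Counter(clusters)
    mutual_info = sum(
        (c / n) * log(c * n / (label_counts[l] * cluster_counts[k]))
        for (l, k), c in joint.items()
    )
    denom = (entropy(labels) + entropy(clusters)) / 2
    # Both partitions are trivial (one label, one cluster): treat as full agreement.
    return mutual_info / denom if denom > 0 else 1.0


def rank_methods(labels, assignments_by_method):
    """Return (method, purity, nmi) tuples sorted by NMI, best first."""
    scored = [
        (name, purity(labels, clusters), nmi(labels, clusters))
        for name, clusters in assignments_by_method.items()
    ]
    return sorted(scored, key=lambda row: row[2], reverse=True)


# Toy data only: six documents with hypothetical category labels and
# cluster assignments from two hypothetical methods.
toy_labels = ["earn", "earn", "acq", "acq", "grain", "grain"]
toy_assignments = {
    "method_a": [0, 0, 0, 0, 1, 1],  # merges two categories into one cluster
    "method_b": [0, 0, 1, 1, 2, 2],  # recovers the categories exactly
}
toy_ranking = rank_methods(toy_labels, toy_assignments)
```

In this sketch, `rank_methods` expects one cluster assignment per document for each method under comparison. Real cluster assignments would come from the topic models and clustering algorithms named in the guide, which are not reproduced here.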
Contact us
  • Call us

    +48 22 552 70 01
  • Address

    DELab UW – Digital Economy Lab, ul. Dobra 56/66, 00-312 Warszawa