Projekt:Wikispeech 2017/Ansökan till HIF elrha

Name of the project*:

Text-to-speech in a crisis

Project Summary*: (300 characters)

The project innovates around the use of text-to-speech (TTS) for making information available during a crisis. By using an open software platform, Wikispeech, along with crowdsourcing, automatically spoken info can be provided to those in need. Methodologies, strategies and tools will be explored.

Location*:

This project must take place in 1-3 locations. Please enter up to 3 locations.

Stockholm, Sweden

Start Date (DD/MM/YYYY)*:

Enter your project start date

31/03/2018

Duration*:

What is the duration? The duration can be 6-9 months. Please enter the number of months your proposed project will run; limit up to 9 months.

Please note that any proposed project must be completed by 31 December 2018.

9

Partners:

If working with other organisations on this project, please enter the partner type.

University

Partner Name & Address: (300 characters)

Enter the name and address of the partner.

KTH Royal Institute of Technology (KTH). Dept. of Speech, Music and Hearing, Lindstedtsvägen 24, SE-100 44 Stockholm, Sweden.

Södermalms talteknologiservice AB (STTS). Folkungagatan 122, SE-116 30 Stockholm, Sweden.

Total Funding (GBP)*: (1000 characters)

Total HIF and other contributions to this project (list any other funding contributions to the project).

HIF funding: 50,000 GBP

Wikimedia Sverige own funding: 1,993 GBP

STTS own funding: 1,752 GBP

KTH own funding: 1,083 GBP

Innovation Stage*: (drop down menu)

Recognition: of a specific problem, challenge, or opportunity to be seized, in relation to the provision of humanitarian aid. At this stage, a key focus is to make sure you are addressing an issue that really matters to those affected by crisis.

Targeting*: (drop down menu)

What area of humanitarian response are you targeting?

Information management, communication and technology

Core Challenges*: (600 characters)

What is the core challenge that you feel needs to be addressed?

Information is crucial in a crisis, but impaired reading capabilities limit people’s capacity to access written text. Already vulnerable groups, e.g. people with disabilities, are affected the hardest. Text-to-speech can alleviate the problem, but must be of sufficient quality and made available without costly hard- or software. We intend to explore the possibilities of using the new Wikispeech solution and its speech technology for improved accessibility of information. Tools, methodologies and ideas on how to combine resources and engage people are to be investigated.

Change*: (600 characters)

What change will your innovation lead to?

1 More people will be able to access information in a crisis as they can listen to it. As smartphone availability increases globally, usefulness increases over time.

2 Aid organizations can work together to improve the open source TTS and hence use resources more efficiently.

3 Public engagement. People can volunteer to improve the technology, which creates interest and awareness about current crises. As it is online-based, e.g. the diaspora can be involved.

4 The project design allows for both short-term improvements and strategic investments to improve info during crises.

Program Supplemental

We recommend you prepare your answers in a word document and then copy and paste into the relevant boxes below.

Questions

SECTION 1:*:

CONTEXT ANALYSIS AND RATIONALE

Existing Practices (2,000 characters)*:

- Provide a short case study of existing practice / literature review of work and research carried out to date in the area of the proposed innovation (cite relevant published literature as footnotes where applicable)

- Indicate clearly how your idea, if successful, could transform practice / address gaps, complement other initiatives and avoid duplications.

The general public has helped collect data quickly through online tools in crisis situations before, e.g. to create maps for emergency routing through the Humanitarian OpenStreetMap Team (Palen et al. 2015). The use of crowdsourcing to improve text-to-speech (TTS) is pioneered in Wikispeech (Andersson et al. 2016), which has an online editing tool for phonetic transcriptions that anyone can use to improve the rendering of a text. In order to provide comprehensible information through TTS, new, complex or important terms must receive a comprehensible pronunciation. Non-trivial texts include a significant proportion of words not found even in very large pronunciation dictionaries (Federico et al. 2004), and methods for managing unknown words are essential.

Wikispeech, an openly licensed scalable TTS solution for information-rich texts, is in a unique position to make crucial information available to the people who need it most during a crisis. Actors involved in aid interventions can work together with the general public (with privacy ensured) to improve the quality of the TTS through crowdsourcing. As Wikispeech is open source, anyone can build on it for their specific needs.

As language and reading skills vary between groups, TTS in a local language is key. Wikispeech is built with simple addition of more languages in mind. While the commercial solutions only cover the larger languages, and in particular languages from wealthy parts of the world, Wikispeech hopes to close this gap over time.

However, the language resources needed to create a TTS for a language are not always present or properly compiled. Currently there is no overview or clear understanding of what language resources to focus on to improve communication during a crisis. A tool providing such an overview can guide researchers, companies and investments to the language resources most needed to save lives.

Evidence and rationale for the innovation (1,000 characters)*:

Please provide details of the rationale behind the project:

- Evidence of the need/opportunity of such an innovation and/or;

- Evidence of a demand by users

In our work with Wikispeech, disability orgs have stated the need for effective dissemination of important information through TTS, and crises are an extreme example of when information must reach all affected fast. The DAISY consortium, which maintains the DAISY standard for markup of spoken books, has long noted the need (Kawamura 2010). Ensuring comprehensible TTS in a language spoken by those affected by a crisis is key, and the quality of the service must be high to avoid misunderstandings.

During the Ebola crisis, Wikipedia's articles about the topic were among the most used Internet sources in the affected area, but many complained about difficulties reading the advanced texts. TTS would have been a great aid.

There is currently no information available listing the resources most needed to develop TTS solutions to spread information in a crisis. This is hindering targeted efforts by the research and aid communities.

Potential impact (1,000 characters)*:

- Describe the potential impact of the innovation on humanitarian operations and outcomes, if it were to be successful.

- Describe the potential users

Information dissemination to affected parties is key to any humanitarian operation, in particular during a crisis. However, illiterate people have limited opportunities to get information from text. Making critical information readily available in spoken form can largely solve this issue. The proposed action will investigate which languages to focus on first and what tools are needed to make the TTS available in new areas.

The general usefulness is going to grow as more people around the world get internet-connected smartphones. A person with a disability or limited reading comprehension will be able to access text on their own terms and avoid dangerous situations.

To improve Wikispeech, the general population will be engaged through crowdsourcing, which is a great way to improve public support for and awareness of humanitarian missions. TTS researchers and language experts will be engaged to focus on the most crucial language resources needed to save lives.

SECTION 2:*:

DESCRIPTION OF APPROACH AND PLANNED ACTIVITIES

Conception of the innovation (1,000 characters)*:

- Describe what has been done to date: whether it be the initial state of recognising the challenge, or your first steps to inventing the solution.

- Indicate the level of engagement and involvement of stakeholders in the process

- Indicate how the users have been or / will be involved and consulted.

Development of Wikispeech started in 2016, and the basic functionality needed to add new languages and improve existing ones exists. However, it is not yet built for quickly adding new data or languages in a crisis situation, nor is it clear which languages to prioritize. It has been presented at conferences and has had more than 60 press mentions. The potential of Wikispeech has also been discussed with aid personnel during UNESCO MLW in Paris and at events in Stockholm.

Interest from the Wikimedia community is strong and we aim to engage volunteers from Wikipedia. We intend to approach aid personnel through direct contacts and at events. Through our vast Wikimedia network we will reach out to disability orgs, linguists and experts in affected regions.

The Swedish associations for people with visual impairments and for dyslexia have both supported us and have taken an interest in the possibilities for people from poorer areas of the world.

Approach (1,000 characters)*:

- Describe the methodology you plan to use. Your methodology should clearly show the steps needed to meet the project’s objective(s). – this is where we expect a justification of the approach used to further the innovation and how you will generate evidence.

The project has two investigative work packages:

WP1: Infrastructure for language resource prioritization

1 Theoretical investigation and model building for prioritization of language resources to be developed

2 Identify & combine available datasets (e.g. geographical distribution and number of speakers, available language resources, reading skills, disability support, catastrophic events and distribution)

3 Develop a simple mock-up as proof of concept, e.g. a heatmap showing high-priority languages on a map

4 Communication to stakeholders

WP2: Investigation of tools needed

1 Investigate specific needs of stakeholders (at different times after a crisis) from the aid sector, experts, people from lang. communities

2 Develop scenarios

3 Develop tech spec. to ensure usability (what exists, cost estimates, types of devices it needs to work on, internet connectivity, type of TTS)

4 Creation of development plan for new tools & methods

5 Network building for later testing & feedback

Planned activities (1,000 characters)*:

Please outline any planned activities not already mentioned above.

The work packages outlined above will run in parallel. We will organize a series of bimonthly meetings with stakeholders over the span of the project. This will include both physical and online meetings. At the meetings, stakeholders will be invited to provide feedback, support the conceptualization in other ways, and conduct alpha testing. During these meetings we will be ready to alter the project plan according to feedback. The project group will have biweekly or monthly meetings during the span of the entire project. We will participate in at least 5 conferences to network and present the project.

The final report, summarizing the work done and the current state of the two WPs, will be written during the last month of the project. It will form the foundation for applications to further develop the innovations.

We will work to integrate the tools on the Wikimedia platforms for stable, secure and free information dissemination without privacy issues.

SECTION 3:*:

RISK AND MITIGATION

Assumptions, project risk and mitigation (3,000 characters)*:

What are the main risks the project will face? How will you address them?

Provide a brief assessment of the main risks to the innovation and how these risks will be monitored and mitigated.

Text-to-speech enables a user to listen to a written text, read by a synthetic human voice. It helps people who, for various reasons, cannot read written text. However, there are a number of risks associated with such solutions that need to be addressed (P = Problem, S = Solution):

P1 Limited quality of the TTS introduces problems of understanding. S: We believe that the solution must look different for different time periods of the crisis. Differences will be outlined in the plans. As the TTS is open source, more people can identify the problems and investigate different parameters. We will ensure that all added components are open source.

P2 Hard to improve quickly because of limitations in tools and a lack of volunteers/language speakers. S: Will be mitigated in 4 ways: 1 Testing and discussions with stakeholders and crowdsourcing experts to include experiences from other initiatives. 2 Ensuring inclusion in Wikipedia in the future will allow us to tap into the active volunteer community. 3 Building up a dedicated network of supportive organizations and a pool of language speakers is an important part. 4 Through the development of infrastructure to improve prioritization in WP1 some work can be done in advance to prepare for crises that are expected.

P3 Hard and costly to deliver the solution. S: Wikimedia, running the enormously popular website Wikipedia, has an extremely strong and robust technical infrastructure which we aim to use. Wikipedia is nearly always found amongst the first search results. However, the software can be run on other websites if so preferred. We will continuously investigate how the TTS and supporting tools can improve efficiency.

P4 Number of languages that technically can be supported. S: Written text for all languages cannot currently be added online due to technical limitations. The MediaWiki software, which Wikispeech runs on, is the second most multilingual website in existence and is currently available in more than 300 languages (only surpassed by jw.org). Adding more languages happens frequently, but takes time. However, the 300 languages cover the vast majority of people in the world. Out of these 300 languages only a small share have a functioning TTS. Through WP1 we can help prioritize which languages to focus on next.

P5 Crowdsourcing can introduce problems and mistakes. S: We have much experience from Wikipedia and an understanding of what tools are needed. An important way to mitigate the effects is that different types of contributions will be allowed and focused on depending on what phase the crisis is in. Anti-vandalism tools will be investigated and outlined as part of the development plan.

P6 Delays between a crisis starting and the TTS being developed prevent use. S: Methods for how to communicate an emerging crisis to the people involved in the TTS work need to be established.

SECTION 4:*:

CAPACITY AND PARTNERSHIP

Team capacity, partnership and cooperation (3,000 characters)*:

Who is implementing the project? Is there any partnership planned?

- Describe the key members of the project (including any partners) and the knowledge, skills and experience they bring

- Describe any stakeholder groups/networks you hope to engage/collaborate with during the project

The project team consists of three partners that have worked closely over the last two years.

Wikimedia Sverige (WMSE), the Swedish chapter of the Wikimedia Foundation (which runs Wikipedia), is the project owner. WMSE initiated the Wikispeech project in 2015. The chapter has worked extensively with diversity and accessibility on the Wikimedia projects. WMSE’s expertise in crowdsourcing and development of online solutions will be utilized for the development. As part of a global movement with volunteer communities and organizations in around 120 countries, WMSE is well situated to find local partners in crisis-affected areas. WMSE’s extensive experience with data compilation and visualization ensures the success of the work to visualize prioritized areas.

The Speech group at KTH Royal Institute of Technology in Stockholm was founded in 1951, and is perhaps the oldest running speech technology lab in the world. The group has researched TTS since the very beginning, and has access to a very large international network of TTS developers, researchers and institutions. It is currently involved in a number of national and international projects involving TTS for purposes as varied as smart assistants, robotics, health care and diagnosis, and information dispersal. An area of special interest at KTH is the understudied evaluation of TTS, which is crucial for TTS used in critical applications, such as crises. The Speech group at KTH is well positioned to gather the available data needed for the heatmap.

STTS is a company in the speech technology business, with many years of experience developing resources such as electronic pronunciation lexica in a number of different languages. Among other things, STTS is developing the lexicon for the Wikispeech project, and it will prepare for tool development and investigate what resources are available.

We will work to engage disability organizations in Sweden for feedback on the tools. We aim to work with staff from the aid sector, such as the Red Cross, the Swedish International Development Cooperation Agency (Sida), the Folke Bernadotte Academy and similar organizations to compile knowledge from their work in crisis situations and what is most important in the tools envisioned.

We will also work to engage language communities, linguistic experts and TTS researchers to continue the development of the TTS and the language resources.

The team behind the “International Conference on Tsunami Preparedness of Persons with Disabilities in Thailand”, and any similar events, will be contacted to build on their conclusions around information dissemination to disabled persons.