375 hours of Danish dialect recordings released: Empowering Danish speech technology

Over the past two years, Danes from all over the country have donated their voices to a new speech dataset intended to improve Danish speech technology. The technology is growing globally and can improve voice-activated assistive technology and help streamline routine tasks such as note-taking.

Speech technology needs large datasets to work well, and Danish has lagged behind because it is a small language. The Alexandra Institute, in collaboration with several partners, has collected around 375 hours of Danish speech, part of a larger ambition to create a dataset of 1,000 hours. The goal is to make it the largest Danish speech dataset to date, with broad representation across gender, age and the many different dialects and accents in Denmark.

“One of the unique aspects of the dataset is that it has a broad representation of the entire country,” says Dan Saattrup Nielsen, Senior AI Specialist at the Alexandra Institute, in a press release.

The dataset can be used for many purposes, including transcription and hearing aid development.

 

Minimize bias in datasets

Previous datasets have been relatively small and dominated by young urban males, which has reduced the accuracy of speech recognition for people who speak a dialect, are older or are of a different gender.

“This means that the models trained on the dataset will be much better able to handle the different ways we speak out in the countryside, thus minimizing the bias of existing datasets,” explains Dan Saattrup Nielsen.

This will improve technologies like voicebots in customer service and automated note-taking in healthcare. Businesses will also benefit from more accurate automated meeting minutes. As part of the project, the Alexandra Institute has also developed a test dataset that makes it possible to test the accuracy of existing speech recognition systems from Google and Microsoft across different factors such as gender, age and dialects.

“With it, you can test exactly how good those systems are. It can help companies or the public sector make better decisions about which system to use,” says Dan Saattrup Nielsen.
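As a rough illustration of what testing accuracy across gender, age and dialect can look like, the sketch below computes word error rate (WER) per speaker group. It is a minimal example under stated assumptions, not the Alexandra Institute's evaluation code: the function names, the field names ("reference", "hypothesis", "dialect") and the sample sentences are all invented for illustration.

```python
from collections import defaultdict


def word_errors(reference: str, hypothesis: str) -> tuple[int, int]:
    """Return (word-level edit distance, number of reference words)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # Dynamic-programming Levenshtein distance computed over words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1], len(ref)


def wer_by_group(samples, key):
    """Aggregate WER per value of a metadata field (hypothetical schema)."""
    errors = defaultdict(int)
    words = defaultdict(int)
    for sample in samples:
        e, n = word_errors(sample["reference"], sample["hypothesis"])
        errors[sample[key]] += e
        words[sample[key]] += n
    # Skip groups without any reference words to avoid dividing by zero.
    return {group: errors[group] / words[group]
            for group in errors if words[group] > 0}


# Invented sample data: "hypothesis" stands in for a system's transcript.
samples = [
    {"dialect": "A", "reference": "jeg bor i aarhus",
     "hypothesis": "jeg bor i aarhus"},
    {"dialect": "B", "reference": "hvor kommer du fra",
     "hypothesis": "hvor kommer fra"},
]

print(wer_by_group(samples, "dialect"))
```

Grouping by a different metadata field, such as an age bracket or gender, only requires passing another key. A large gap in WER between groups is the kind of bias the article describes.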

The dataset now released is the first part of the project. During the fall, a second part will be released with two-person conversational data that reflects more natural conversations. The project aims to release up to 1,000 hours of data within the next year, including both read-aloud speech and conversation.

 

Facts about CoRal

CoRal is an initiative that has collected the dialects and accents of more than 2,000 Danes to create a comprehensive speech dataset.

The goal is to have a dataset with over 1,000 hours of Danish speech, representing all age groups, genders and regional variations.

The project is a collaboration between the Alexandra Institute, the Department of Computer Science at the University of Copenhagen, Alvenir, Corti and the Danish Agency for Digitization. It has a total budget of DKK 22 million, of which DKK 14 million comes from Innovation Fund Denmark.

The dataset can be downloaded here.

 

The post 375 hours of Danish dialect recordings released: Empowering Danish speech technology first appeared on TechSavvy.