The Authority set out to design and implement a system capable of:
- producing textual transcripts of media assets
- detecting speaker changes and identifying the speakers
- recognizing named entities
- determining the start and end times of spoken words with an accuracy of one hundredth of a second
In addition, all system functions must be accessible via an easy-to-use web interface.
These requirements serve several purposes. First, video and audio media assets become searchable in text form, which supports use cases such as content analysis of political news and current-affairs magazine programs. Second, checks such as monitoring advertisements and sponsorships, tracking product placements, verifying compliance with the protection of minors, examining age ratings, and supervising accessibility for hearing- and visually impaired audiences can be enabled or automated. Third, in official proceedings, investigation reports are expected to contain verbatim transcripts of the examined program segments.
Until now, transcripts were prepared entirely by staff, an extremely time- and labor-intensive process. The Authority therefore lacked adequate IT tools for carrying out these monitoring tasks and enforcing the related regulations efficiently.