Klaus Schöffmann

University of Klagenfurt

Medical Video Processing

Processing and analysis of medical videos is becoming increasingly important, as a growing number of videos and images are integrated into the daily routine of surgical and diagnostic work. While collecting medical multimedia data is not an issue, appropriate tools for using this data efficiently are missing. This includes management and inspection of data, optimized storage, visual analytics, as well as learning relevant semantics and using recognition results to optimize surgical and diagnostic processes. The characteristics and requirements of this interesting but challenging field differ from those of classic multimedia domains. This tutorial therefore gives a general introduction to the field, provides a broad overview of specific requirements and challenges, discusses existing work and open problems, and elaborates in detail how video and image processing and analysis can support surgeon training and retrospective analysis, among several other use cases. In particular, it covers domain-specific compression of medical videos in laparoscopy and cataract surgery, relevance filtering, tool and action classification, irregularity detection, as well as an overview of existing datasets.



Invited Speakers

Klaus Schöffmann

University of Klagenfurt

Relevant Content Detection in Cataract Surgery Videos

Cataract surgery is one of the most frequently performed surgeries in the world – its goal is to replace a human lens suffering from natural clouding with an artificial one, which can significantly improve or even restore a patient's vision. More and more surgeons in this field of ophthalmology record videos of these microscopic procedures, because such videos can greatly aid teaching and training. Moreover, when supported by image and video processing methods, they also allow for post-operative investigations. This talk will present our latest research results on automatic content analysis of cataract surgery videos with deep learning and image processing methods. I will talk about the detection of surgical tools with R-CNNs, the recognition of operation phases and actions with CNNs and RNNs, and the detection of irregularities, such as pupil reactions and lens unfolding delay, with a combination of different deep learning and image processing methods.

Bernhard Strobl

AIT – Austrian Institute of Technology GmbH


touchless4f: A dedicated device for contactless fingerprint scanning

Developed over the last two decades, contact-based fingerprint authentication has found its place in various scenarios such as police control, border control, access control, and forensics. To achieve high biometric recognition accuracy, fingerprint samples have to be of high quality. To find its way into a broader range of application scenarios, however, fingerprint acquisition must overcome shortcomings in hygienically safe capturing and acquisition speed. Improving speed and reducing hygienic concerns would also lead to greater convenience and a higher rate of public acceptance.

We present a special hardware-based approach: a dedicated device for contactless fingerprint scanning that is fast, small, and inexpensive, and that delivers images of high quality.

The presentation will address the acquisition process, the hardware setup and components used, challenges in algorithmic design, quality estimation, shortcomings, and potential improvements. At the end of the presentation there will be a chance for some hands-on experiments with a device.

Björn Ommer

TU Munich

Next Frontiers of Machine Vision & Learning

Recently, deep learning research has seen enormous progress that has tremendously propelled artificial intelligence and its applications. However, in light of the grand challenges of this field, present approaches still show significant limitations. The ultimate goal of artificial intelligence, and of computer vision in particular, is models that help to understand our (visual) world. Lately, deep generative models for visual synthesis have been making significant steps towards learning not only powerful but also semantically meaningful representations of the image space. Explainable AI extends this even further, seeking models that are also interpretable by a human user. The talk will discuss some of the recent breakthroughs in machine vision and learning, highlight future challenges, and propose ways to improve the accessibility of content. Which methodological advances are needed to fully leverage the potential of intelligent data analysis, and what is the next frontier? The talk will also showcase novel applications in the digital humanities.

Stefan Lang

University of Salzburg


Dirk Tiede

University of Salzburg


Explainable AI for humanitarian action

A never-ending series of conflicts around the globe renders millions of people homeless and forces them to flee. The UNHCR estimates the number of forcibly displaced people (FDP) at 82.4 million for the year 2020, with a large share of them (~60%) internally displaced, i.e. not granted official protection status because they have not crossed an international border. Unlike the war in Ukraine, which dominates the news and where the latest information technology is (still) abundant and makes the UN, other organisations including NGOs, and the public aware of the emerging tragedy (facing disinformation rather than none), the widespread protracted crises in South Sudan, the DRC, Syria, Yemen, and elsewhere receive little to no public attention or media coverage. Humanitarian actors such as Médecins Sans Frontières (Doctors Without Borders, MSF) therefore rely on independent information for logistics and planning, aid delivery, food and nutrition supply, as well as the maintenance of safety and health. The Christian Doppler Laboratory GEOHUM, established at the University of Salzburg in partnership with MSF, promotes cutting-edge technological advancements in satellite image analysis and geospatial tools to enable the uptake and proliferation of relevant information products in the context of humanitarian action. A most critical task is to estimate displaced populations based on reliable dwelling counts in camps and other ephemeral settlements, including in complex urban settings. To cope with this challenge, we use a mix of input data (different spatial resolution levels, temporal coverage, active vs. passive sensors, etc.) to maximize the effectiveness of information delivery. Timeliness and reliability are the key requirements in this domain, leading to ever-new attempts at automating information extraction and making image processing transferable.
Deep learning (DL) routines, spearheaded by convolutional neural networks (CNNs), are promising when fed with large amounts of dwelling representations whose quality and provenance are known. The latter, harvested from preceding semi-automatic analyses in this context, are currently documented and organised in a dedicated sample repository, along with systematic studies on the impact of quality parameters on DL performance. Together with established methods of object-based image analysis (OBIA), where image understanding is performed by expert system engineering using a-priori knowledge of the geometric and spectral properties of the target classes, we believe a significant step can be made towards making the information extraction process more reliable, rapid, and, not least, explainable. We illustrate the implementation of hybrid (explainable) AI via real-world examples from the humanitarian domain, ranging from well-structured camp settings, e.g. in Minawao (Cameroon), to highly complex urban settings such as the mega-city Khartoum (Sudan).
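To make the idea of CNN-based dwelling counting concrete, here is a toy sketch in PyTorch: a small patch classifier slides over a satellite image and counts patches labelled "dwelling". The architecture, patch size, and class layout are assumptions for illustration only and do not reflect the GEOHUM pipeline.

```python
import torch
import torch.nn as nn

# Toy classifier labelling 32x32 satellite-image patches as
# background (class 0) or dwelling (class 1). Illustrative only.
class DwellingPatchNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 8 * 8, 2)  # for 32x32 input patches

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))

def count_dwellings(image, patch=32, stride=32):
    """Slide over the image, classify each patch, and count 'dwelling' hits."""
    model = DwellingPatchNet().eval()  # untrained here; real use needs training
    count = 0
    with torch.no_grad():
        for y in range(0, image.shape[1] - patch + 1, stride):
            for x in range(0, image.shape[2] - patch + 1, stride):
                logits = model(image[:, y:y + patch, x:x + patch].unsqueeze(0))
                count += int(logits.argmax(dim=1).item() == 1)
    return count

# A random 3x128x128 tensor stands in for a satellite-image tile.
n = count_dwellings(torch.rand(3, 128, 128))
```

In a hybrid setup as described above, such CNN counts could be cross-checked against OBIA-derived dwelling delineations, lending the automated estimate an explainable, rule-based counterpart.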