Transkribus Review 2019

Filotas Liakos
Jul 22, 2019

In recent years, a massive number of historical artifacts from libraries, museums, and archives have become available online. Documents that were never before accessible to the public now amount to thousands of terabytes of digitized images waiting to be transcribed by scholars and history enthusiasts. Ancient manuscripts and medieval documents that most historians could not easily read are now being transcribed in their entirety. Transcription is the process of turning historical artifacts into editable text, in this case digitally editable text. Thanks to recent technological progress, HTR (Handwritten Text Recognition) offers the ability to explore the past like never before. Only a few years ago, today’s computational power belonged to the sphere of the imaginary. Computer systems have evolved tremendously and are now able not only to “read” historical scripts but also to automatically transcribe manuscripts and archival documents created in previous centuries.

Automated recognition of historical artifacts is a challenging task and demands a transdisciplinary approach. Handwritten documents are as unique and individual as their writers. In the last decade, the HTR landscape has changed so significantly that today we can identify the most promising factors that will make broader access to historical handwritten documents achievable. Pattern recognition, computer vision, and document image analysis are only some of the related fields that have made remarkable progress during that time. Additionally, powerful machine learning algorithms have driven the development of new extraction methods and document layout analysis algorithms, which have recently been applied successfully to the HTR field.

Another essential factor in the consolidation of HTR is the availability of digitized archival documents. Nowadays, more and more institutes perceive digitization as a natural component of their mission and invest significant resources in large-scale digitization initiatives. In addition, each year thousands of volunteers collaborate with institutes and genuinely contribute to improving the accessibility of digitized collections. Fortunately, all these developments have found common ground in one platform.

Transkribus is considered one of the most important initiatives for introducing HTR technology to the public. The software is a revolutionary tool based on the Java programming language together with a graphical widget toolkit. The platform was created as part of the University of Innsbruck’s contribution to the TranScriptorium e-Research Consortium (2013–2015), a project funded by the European Union that can be considered the alpha version of the software. Professor Günter Mühlberger, the head of the Digitization and Electronic Archiving group at the University of Innsbruck, is leading the development of this service platform together with his team; it is aimed explicitly at archival institutes and history specialists. The team’s financial support from the European Union began with the TranScriptorium project and continued through a new project named “Recognition and Enrichment of Archival Documents” (READ). This project combines groundbreaking research, humanities scholarship, digitization initiatives, and a crowdsourcing marketing strategy. Last but not least, the project aims to implement a virtual research environment where archivists, volunteers, scholars, and computer scientists can work together to enrich handwritten archival material using cutting-edge technology and achieve never-before-seen access to archival material with the support of HTR.

The Transkribus project, which according to its webpage aspires to be a personal learning network as well as a natural component of a successful crowdsourcing citizen-science ecosystem, seems to be a far-reaching tool for the digital transformation of historical research. The Transkribus interface is a platform-independent Java tool through which users reach the services offered by the platform. Users can download Transkribus free of charge from a comprehensive Wiki webpage, which also serves as a user guide. During the last six months, I had the opportunity to discover many aspects of this revolutionary software during an internship at the National Archives of the Netherlands. Transkribus was the reason I decided to explore the new digital perspectives in the archival field. The platform offers a chance to explore the past and the space to delve into the meaning of historical documents from multiple angles.

When I started to work with the platform as a user, I immediately realized that the software was not particularly user-friendly. At first glance, the interface looked peculiar and hard to understand. There were many strange-looking buttons and functions that users have probably never encountered before, which creates an intimidating first impression. In my opinion, the absence of any built-in introduction or tutorial to help users understand how to use the platform places Transkribus in a specific category of software: academic software.

The software is indeed well designed, but it is not aimed at the average user. It seems rather odd that a promising platform like Transkribus, which tries to introduce HTR technology to the academic world, was not designed around user-friendliness, as it cannot be assumed that academics have advanced experience with these types of platforms. This crowdsourcing ecosystem was clearly designed by academics for archival and historical experts. Average users must spend several hours, or maybe days, before they are able to use the software and work efficiently with it on their projects. I believe that the Transkribus team is aware of this issue, which is why they have created a series of downloadable manuals in PDF form that introduce users to the software’s features and functions. Each manual covers a different aspect of the system and explains how to operate it efficiently. These manuals are undisputedly helpful, but by modern standards some users may perceive this approach as old-fashioned. Most digital applications today include introductory instructions and hints in order to hold the users’ interest and make them feel capable enough to keep using the application. Confidence should be one of the first feelings that users experience while using software; with the Transkribus platform, confidence must be gained gradually through constant study and experimentation.

However, it would be unfair to claim that the manual-based approach of Transkribus is neither helpful nor educational, as users will eventually understand the core idea behind the platform and teach themselves how to operate it adequately. The only disadvantage from a functional perspective, in my opinion, is that reaching this level of knowledge requires a significant amount of time spent studying and reading manuals. This is not necessarily a bad thing, but by modern standards time-consuming processes are usually considered a negative characteristic of computer applications.

Interface challenges aside, Transkribus offers some unique and compelling tools for its users. The platform is well designed, but the interface remains, in my personal opinion, a serious issue. The major selling point of Transkribus seems to be its use of HTR technology, but as a crowdsourcing platform, the Transkribus team should consider that the platform is used not only by experts but also by scholars, students, and individual researchers. Hence, designing a more user-friendly interface seems essential for the software’s success.

The Transkribus project was established in 2010 under funding from the Arts and Humanities Research Council, and to this day the idea behind the project remains as pioneering and innovative as it was nine years ago. Transkribus’ target groups can be divided into four categories:

  1. Humanities scholars, who are high-level experts able to provide an accurate transcription of a document and who also want to manage scholarly digital editions of manuscripts.
  2. Archival institutes that want to analyze and retrieve information from a vast amount of digitized records and that are, at the same time, actively involved in crowdsourcing operations in order to enrich the data produced.
  3. Volunteers who know how to operate the platform efficiently and can take part in significant transcription projects like READ’s “Transcribe Bentham.”
  4. Computer scientists who aim to develop new algorithms and methods for information extraction and can contribute them to the technological progress of the platform.

Furthermore, the system’s developers are already planning to make Transkribus commercially available to users around the world. Major archival and historical institutes have already contacted Transkribus to express their interest. In 2018, the National Archives of the Netherlands used Transkribus as the central ecosystem for notable projects based on cutting-edge HTR technology. According to the READ project, which is funding Transkribus until 2019, “the main objective is the advance access to historical, handwritten documents from all over the world, regardless of their alphabet, language or the date of their creation.” Transkribus, as part of the READ project, follows the same promotional strategy and promises its users the ability to transcribe historical documents in a highly standardized, flexible, and reliable way. For the archival field in particular, Transkribus opens a path to new opportunities to access, enrich, and explore archival material like never before.

It can be argued that Transkribus meets the standards and expectations of its users. Among the unique features of the platform are the keyword spotting tool, automated transcription, advanced layout analysis, and the training of custom HTR models. Each feature has its advantages and disadvantages, but according to Transkribus, what counts most is the safety of the transcriptions. The Transkribus server ensures that users will never lose their transcriptions or the documents they have uploaded to the system. Besides safety, the platform also ensures long-term accessibility of historical and archival artifacts and contributes to the preservation of handwritten material through its server. Thus, accessibility combined with high-end HTR technology, provided through a well-organized technological ecosystem, makes this platform almost irresistible for its users.

Transkribus can simplify tasks that would otherwise take years of work, helping scholars with complex handwriting and unusual layouts. High-end technology combined with the unique features of the platform is the dominant characteristic of this system. The servers at the University of Innsbruck use machine learning algorithms to teach the system new writing styles. In principle, the system can be trained to transcribe text in any language and handwriting style. After a user transcribes part of the text manually, the software engine learns to identify the characters and then finishes the task automatically with impressive accuracy. Thus, the idea behind the platform is both exceptionally simple and pioneering. All the user needs to do is give the software an image and part of the corresponding text; based on this text, the software can learn the handwritten script and similar styles. However, for this to work properly, users must meet certain conditions under which their documents can finally be transcribed automatically.

Moreover, the Transkribus ecosystem is undeniably a big plus that contributes to the commercial success of the platform. Users have the opportunity to be part of a pioneering, technologically cutting-edge community that allows them to grow their capabilities and knowledge about handwritten documents and the application of HTR technology. All these services are provided free of charge by the READ project, combined with the expertise of the Transkribus team. The only standard requirement for using this ecosystem is an average mobile device or personal computer. The Scan Tent might be a brilliant idea that could capture the interest of potential users, but even without this piece of equipment, users are still able to produce adequate and reliable scans and text for their work on the platform.

On the other hand, automatic transcription might be one of the strongest selling points of Transkribus, but its success depends on the needs of each document. Each text has unique characteristics and requires individual treatment. Users must be able to provide the system with an accurate human-made transcription of the document they wish to transcribe, and then they must train a model that is capable of decoding the handwriting types the document contains. In short, the platform has not one but several selling points. The technology behind Transkribus provides advanced HTR for academic and research needs, while the increased accessibility of the transcribed documents through the servers at the University of Innsbruck ensures that the project will keep bringing together the scholarly and the technological worlds.

Beyond the platform’s advanced features, however, competition is an essential issue for every service platform today. So far, no other service worldwide offers services similar to those of Transkribus. At the same time, there are many major companies that could outcompete it. For example, in October of 2014, Google announced its plan to digitally convert fifteen million historical books and distribute them online for free by the end of 2015. If such a tech giant decided to enter the archival field and offer free digitization and provision of digital surrogates as Transkribus does, the commercial prospects of the platform would clearly change dramatically.

In short, this piece of machinery will give the scientific community the opportunity to develop further research methods and establish brand-new ways of working with and extracting information from handwritten archival manuscripts. What counts even more is that Transkribus is a powerful medium to which users have free access, and through it they can reach most of the modern technological advances in the HTR field. Transkribus offers a variety of advanced tools to its users, including:

  1. Archiving of text collections and associated scans or transcriptions
  2. Enrichment with metadata
  3. Automatic and manual segmentation of the text
  4. Tag setting, commenting and annotation
  5. Automatic transcription
  6. Use of automatic HTR functions
  7. Training your own HTR model for a specific typeface
  8. Error rate measurement of HTR and OCR
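
Item 8 in the list above, error rate measurement, is commonly expressed in HTR and OCR work as a character error rate (CER) or word error rate (WER): the edit distance between the automatic output and a human-made reference transcription, divided by the length of the reference. The following Python sketch illustrates that general idea only. It is not taken from Transkribus, and the function names and sample strings are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a symbol of the reference
                curr[j - 1] + 1,     # insert a symbol of the hypothesis
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference, hypothesis):
    """Character edits needed, relative to the reference length."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def word_error_rate(reference, hypothesis):
    """Word edits needed, relative to the number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


if __name__ == "__main__":
    # Hypothetical example: a human reference line and an automatic transcription.
    reference = "the old letter"
    hypothesis = "the old lettor"
    print("CER:", character_error_rate(reference, hypothesis))
    print("WER:", word_error_rate(reference, hypothesis))
```

Under this definition, a CER of 0.05 means that about five character edits per hundred reference characters are needed to correct the automatic transcription.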

Every tool offers users a different editing perspective, and all are equally compelling. To provide such advanced text recognition and analysis, Transkribus incorporates advanced machine learning algorithms and natural language processing. Furthermore, the system contains its own source code and engines that exploit the power of neural networks. This complex computational structure, combined with the vision of the READ project, is the primary reason why this system is such an ambitious initiative for the archival community.

Hence, I conclude that Transkribus is an exceptional piece of technology because it is built to high-end technical specifications and its ecosystem can accommodate almost every device available today. From a technical perspective, Transkribus might not be perfect yet, but the growth the platform has shown over the last decade bodes well for the future of the software. Last but not least, it is fair to acknowledge that the Transkribus project is indeed a powerful stepping stone that will inspire a new archival era, as it has already transformed the way most European archival institutions conduct research on handwritten material.
