Welcome to Hexatomic!
You are reading the user documentation for Hexatomic.
What is Hexatomic?
Hexatomic is an extensible OS-independent platform for deep multi-layer linguistic corpus annotation.
Research projects want to answer very specific research questions. When they use corpora, they may have to use specific software that can
- handle the format their corpus data is available in, or
- provide annotation functionality for the annotation types they want to use.
Projects may also need additional software to search their data. If their corpora are available in more than one format, they may need a separate set of tools for each format to answer their research question. And if the software they need doesn't exist yet, they have to implement a new tool from scratch.
Hexatomic aims to improve this situation by reducing the number of tools a project has to use (and install, maintain, and learn) to one.
How does it do that?
- It works with a generic graph-based data model that can handle many different types of annotation.
- It includes a converter framework, which can import corpora from a multitude of different formats, makes them available in the generic graph-based model for manipulation (corpus building, annotation, cleaning, etc.), and can export the corpus data to yet another multitude of corpus and other data formats.
- It includes powerful corpus search functionality that offers the usual free-text and regex-based search, but can also search across linguistic structures and build complex queries across layers.
What if Hexatomic cannot handle my specific annotation type/use case?
Hexatomic is built to be extensible through plugins. If it doesn't offer what you are looking for, you don't have to implement a new tool from scratch. Instead, you can build a plugin that does what you need. On top of that, your plugin gets the existing functionality, the data model, import and export, and search for free.
What kind of software is Hexatomic?
Hexatomic is software for the desktop. You download it to your computer. It does not need an internet connection to run, and therefore you can also use it in the field, or on the train en route to a conference.
Specifically, it is an Eclipse e4 application implemented in Java.
Hexatomic is free and open source under the Apache License, Version 2.0.
How can Hexatomic be used?
You can use Hexatomic to do, for example, any or all of the following:
- Build a corpus from scratch
- Merge different corpora into a new one
- Annotate an existing corpus
- Correct errors in a corpus
- Search a corpus
Hexatomic installation & start
Download
- Go to the download site for the latest release of Hexatomic: https://github.com/hexatomic/hexatomic/releases/latest.
- Download the .zip file for your operating system:
  - Linux: Download hexatomic-<version>-linux.gtk.x86_64.zip
  - Mac OS: Download hexatomic-<version>-macosx.cocoa.x86_64.zip
  - Windows: Download hexatomic-<version>-win32.win32.x86_64.zip
- Extract the downloaded .zip file to a directory of your choice (see the note on archive extraction software at the end of this section).
That's it. You can now run Hexatomic.
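If you automate the download, you can express the archive naming pattern listed above as a small helper. The following Java code is a hypothetical sketch and is not part of Hexatomic. It picks the archive name from the JVM's os.name property and expects you to supply <version> from the release page. It only covers the three x86_64 archives named above.
```java
import java.util.Locale;

// Hypothetical helper (not part of Hexatomic): maps an OS name to the release
// archive name from the download instructions. Only the three x86_64 archives
// listed in the docs are covered.
public class DownloadName {

    static String archiveFor(String osName, String version) {
        String os = osName.toLowerCase(Locale.ROOT);
        String platform;
        if (os.startsWith("linux")) {
            platform = "linux.gtk.x86_64";
        } else if (os.startsWith("mac")) {
            platform = "macosx.cocoa.x86_64";
        } else if (os.startsWith("windows")) {
            platform = "win32.win32.x86_64";
        } else {
            throw new IllegalArgumentException("No Hexatomic download listed for: " + osName);
        }
        return "hexatomic-" + version + "-" + platform + ".zip";
    }

    public static void main(String[] args) {
        // Pass the release version shown on the release page as the first argument.
        String version = args.length > 0 ? args[0] : "<version>";
        System.out.println(archiveFor(System.getProperty("os.name"), version));
    }
}
```
The sketch uses prefix checks instead of substring checks, because a substring test for "win" would also match names such as "Darwin".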
Run Hexatomic
- Go to the folder into which you have unzipped the Hexatomic download.
- Run Hexatomic by
  - double-clicking on the launcher file in a file manager, or
  - starting the launcher file from the command line.
The launcher file depends on your operating system:
- In Linux, the launcher file is called hexatomic and can be started from the command line with ./hexatomic
- In Windows, the launcher file is called hexatomic.exe and can be started from the command line with hexatomic
- For Mac OS X, we provide an .app file called hexatomic.app.
Note: Some archive extraction software may not extract the download correctly, and Hexatomic may then fail to start. In this case, try 7-Zip, which should work.
Usage
This section describes how to use the user interface of Hexatomic. See one of the following sub-sections for more details.
Working with projects
Hexatomic works on a single Salt project at any one time.
A project consists of a directory containing a project file (saltProject.salt) and a number of sub-directories containing the Salt document files.
To open a project in Hexatomic, click on File in the main menu and select the option Open Salt Project. A file dialog will open. Select the folder that contains the project file you want to open, and confirm your selection.
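You can check the project layout described above outside of Hexatomic, for example before opening a folder. The following Java code is a hypothetical helper and is not part of Hexatomic or the Salt library. It tests whether a directory contains saltProject.salt and lists the files in its sub-directories. It assumes that the document files also use the .salt extension. It does not parse or validate the files themselves.
```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper (not part of Hexatomic): inspects a directory laid out
// like a Salt project. Assumption: document files use the ".salt" extension.
public class SaltProjectLayout {

    static final String PROJECT_FILE = "saltProject.salt";

    /** True if the directory directly contains the project file. */
    static boolean isSaltProject(Path dir) {
        return Files.isRegularFile(dir.resolve(PROJECT_FILE));
    }

    /** Lists .salt files located in sub-directories of the project directory. */
    static List<Path> documentFiles(Path dir) throws IOException {
        try (Stream<Path> paths = Files.walk(dir)) {
            return paths
                    .filter(Files::isRegularFile)
                    // Skip top-level files such as saltProject.salt itself.
                    .filter(p -> !p.getParent().equals(dir))
                    .filter(p -> p.getFileName().toString().endsWith(".salt"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        if (!isSaltProject(dir)) {
            System.out.println("No " + PROJECT_FILE + " found in " + dir.toAbsolutePath());
            return;
        }
        for (Path doc : documentFiles(dir)) {
            System.out.println(dir.relativize(doc));
        }
    }
}
```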
Editing the corpus structure
Each corpus project can contain multiple documents, which are organized into
- corpus graphs,
- corpora, and
- sub-corpora.
Even in simple projects with only one document, this corpus structure exists and can be used to extend or re-organize existing corpora. The minimal structure for a corpus with one document is therefore: One corpus graph containing one corpus containing one document.
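The following simplified Java model makes the hierarchy concrete. The class names are hypothetical and do not correspond to the Salt API. The sketch only builds the minimal structure from the previous paragraph (one corpus graph containing one corpus containing one document) and prints the corpus as an indented tree. The names corpus1 and doc1 are placeholders.
```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the corpus structure.
// These classes are illustrations only, not the Salt API.
public class CorpusStructureSketch {

    static final class Document {
        final String name;
        Document(String name) { this.name = name; }
    }

    static final class Corpus {
        final String name;
        final List<Corpus> subCorpora = new ArrayList<>();
        final List<Document> documents = new ArrayList<>();
        Corpus(String name) { this.name = name; }
    }

    static final class CorpusGraph {
        final List<Corpus> corpora = new ArrayList<>();
    }

    /** Minimal structure: one corpus graph containing one corpus containing one document. */
    static CorpusGraph minimalProject() {
        Corpus corpus = new Corpus("corpus1");
        corpus.documents.add(new Document("doc1"));
        CorpusGraph graph = new CorpusGraph();
        graph.corpora.add(corpus);
        return graph;
    }

    /** Renders a corpus as indented text: sub-corpora first, then its documents. */
    static String render(Corpus corpus, String indent) {
        StringBuilder sb = new StringBuilder(indent).append(corpus.name).append('\n');
        for (Corpus sub : corpus.subCorpora) {
            sb.append(render(sub, indent + "  "));
        }
        for (Document doc : corpus.documents) {
            sb.append(indent).append("  ").append(doc.name).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        CorpusGraph graph = minimalProject();
        System.out.print(render(graph.corpora.get(0), ""));
    }
}
```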
In Hexatomic, the corpus structure is always visible in the special “Corpus Structure” editor.
Corpus graphs
Corpora, sub-corpora and documents are organized in a hierarchical structure, the so-called corpus graph. A project in Hexatomic can have more than one corpus graph, but for most projects a single corpus graph is sufficient. In the special case where you import different corpora from different annotation formats into the same Hexatomic project to merge them, you will need more than one corpus graph.
In an empty project, just click on the “Add” button to add a new corpus structure.
The default “Add” button is context-sensitive, and will add elements "intelligently", depending on which type of element is currently selected in the corpus structure. To explicitly choose the element to add, click on the small arrow on the right side of the button, and a drop-down menu with the different options will appear.
If you delete a corpus graph, all of its documents and corpora will also be deleted. To delete a (sub-) corpus, you must first delete all of its child elements.
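The following Java sketch restates these deletion rules as code. It is a hypothetical illustration of the documented behavior, not Hexatomic's implementation. A corpus graph can always be deleted and takes everything below it along. A (sub-)corpus can only be deleted once it has no child elements left.
```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the documented deletion rules for the corpus structure;
// not Hexatomic's actual implementation.
public class DeletionRules {

    enum Kind { CORPUS_GRAPH, CORPUS, DOCUMENT }

    static final class Element {
        final Kind kind;
        final List<Element> children = new ArrayList<>();

        Element(Kind kind) {
            this.kind = kind;
        }

        Element add(Element child) {
            children.add(child);
            return this;
        }
    }

    /** A corpus graph can always be deleted; a (sub-)corpus only once it has no children. */
    static boolean canDelete(Element element) {
        if (element.kind == Kind.CORPUS) {
            return element.children.isEmpty();
        }
        return true;
    }

    /** Number of elements removed when deleting this element, including everything below it. */
    static int elementsRemoved(Element element) {
        int count = 1;
        for (Element child : element.children) {
            count += elementsRemoved(child);
        }
        return count;
    }

    public static void main(String[] args) {
        Element corpus = new Element(Kind.CORPUS).add(new Element(Kind.DOCUMENT));
        Element graph = new Element(Kind.CORPUS_GRAPH).add(corpus);

        System.out.println(canDelete(corpus));      // false: delete the document first
        System.out.println(canDelete(graph));       // true
        System.out.println(elementsRemoved(graph)); // 3: graph, corpus, and document
    }
}
```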
Corpora and sub-corpora
Inside a corpus graph, the different corpora and sub-corpora are organized as a hierarchy. A corpus graph should only contain one top-level corpus, whose name is often used as the corpus name when exporting the corpus to a different format. To add a sub-corpus, select the parent corpus, click on the arrow on the right side of the “Add” button and choose “(Sub-) Corpus”. You can edit the name of a corpus by double-clicking on its entry and pressing Enter when finished.
Documents
When a corpus is selected, the default action for the "Add" button is to add a new document. When a document is selected, the "Add" button will create a new sibling document in the same parent corpus. Documents must have a corpus as a parent and contain the base text and linguistic annotations. You can move a document from one (sub-) corpus to another by dragging and dropping it.
You can apply a filter to show only documents whose names contain a certain string.
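You can summarize the default behavior of the context-sensitive “Add” button described in this section as a small decision table. The following Java code is a hypothetical model and not Hexatomic's implementation. It covers only the cases this page describes. For any other selection, such as a selected corpus graph, it returns an empty result rather than guessing. An element you choose explicitly from the drop-down menu always takes precedence over the default.
```java
import java.util.Optional;

// Hypothetical model of the "Add" button's default action, limited to the
// cases described in the documentation; not Hexatomic's actual implementation.
public class AddButtonDefaults {

    enum Selection { NOTHING_IN_EMPTY_PROJECT, CORPUS_GRAPH, CORPUS, DOCUMENT }

    enum Action { ADD_CORPUS_GRAPH, ADD_SUB_CORPUS, ADD_DOCUMENT, ADD_SIBLING_DOCUMENT }

    /** Default action for the current selection; empty where the docs do not say. */
    static Optional<Action> defaultAction(Selection selection) {
        switch (selection) {
            case NOTHING_IN_EMPTY_PROJECT:
                return Optional.of(Action.ADD_CORPUS_GRAPH);     // new corpus structure
            case CORPUS:
                return Optional.of(Action.ADD_DOCUMENT);         // document inside the selected corpus
            case DOCUMENT:
                return Optional.of(Action.ADD_SIBLING_DOCUMENT); // same parent corpus
            default:
                return Optional.empty();                         // e.g. a selected corpus graph
        }
    }

    /** An element chosen explicitly from the drop-down menu overrides the default. */
    static Optional<Action> resolve(Selection selection, Optional<Action> dropDownChoice) {
        return dropDownChoice.isPresent() ? dropDownChoice : defaultAction(selection);
    }

    public static void main(String[] args) {
        System.out.println(resolve(Selection.CORPUS, Optional.empty()));
        System.out.println(resolve(Selection.CORPUS, Optional.of(Action.ADD_SUB_CORPUS)));
    }
}
```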
Opening an editor
To open an editor for a document of the corpus, first select the document in the "Corpus Structure" editor on the left. Then, use the right mouse button to open the context menu for the selected document and click "Open with Text Viewer". This will open a new view that displays the text of the document.
Troubleshooting Hexatomic & getting help
Please check the sections on this page to see whether one of them answers your question. If you don't find what you are looking for here, please report the issue in Hexatomic's issue tracker. The section "Reporting issues" explains how to do this (it's quick and easy).
Sections
- How do I do X? Why doesn’t Y work? Where can I go to get help?
- The documentation didn't help me!
- I've found a bug!
- I'm looking for Hexatomic's source code!
- I need to contact the Hexatomic team!
How do I do X? Why doesn’t Y work? Where can I go to get help?
This user documentation is the one-stop source of information on using Hexatomic, and can hopefully help you.
If you have read through it and haven't found an answer to your question, you can join the Hexatomic User Mailing List (to be announced) and ask your question there.
If you have an idea how we could make the documentation better so that the next person with your question can get it answered in the documentation, please let us know! "The documentation didn't help me!" below shows you how to do this.
The documentation didn't help me!
This user documentation is the one-stop source of information on using Hexatomic.
If you find that something is missing from the documentation, or that it could be improved in any way, please let us know! You can do this by reporting an issue on Hexatomic's GitHub page (see "Reporting issues" below).
I've found a bug!
If you think you have found a bug, if Hexatomic does not work the way you expected after reading the documentation, or if Hexatomic doesn't work at all on your machine, please let us know! You can do this by reporting an issue on Hexatomic's GitHub page (see "Reporting issues" below).
I'm looking for Hexatomic's source code!
It's open source and can be found online at https://github.com/hexatomic/hexatomic.
I need to contact the Hexatomic team!
We'd love to hear from you. Please write us an email at hexatomic [at] corpus-tools.org.
Reporting issues
We use GitHub, a web-based platform for collaboration on software, to develop Hexatomic. The Hexatomic GitHub page is at https://github.com/hexatomic/hexatomic.
This is the place where you can report an issue (a bug, missing documentation, etc.) and suggest new features.
To do so, you need a GitHub user account. If you don't have one yet, you can register for one at https://github.com/join. It's free!
First of all, please read the Contributing guidelines. It's a quick read, and the guidelines contain important details.
Then, when you are logged in, go to https://github.com/hexatomic/hexatomic/issues/new to create a new issue.