
This document describes additional installation steps required to take advantage of the following YTEX features:

  • Semantic Similarity & Word Sense Disambiguation
  • Storing annotations in a relational database
  • Exporting annotations to machine learning tools

Prerequisites

Database Prerequisites

YTEX supports MS SQL Server 2008 and above, MySQL 5.x, and Oracle 10gR2 and above. Create a database user (and schema) for use with YTEX. See the platform-specific notes below.

Oracle

Your database must use the UTF-8 character set.

Make sure you use a tablespace with enough room; e.g. create the ytex user and schema like this:

Code Block
languagesql
create tablespace TBS_YTEX datafile 'C:/oracle/oradata/orcl/TBS_YTEX.dbf' size 1000M autoextend on online;
create user ytex identified by ytex default tablespace TBS_YTEX;
alter user ytex quota unlimited on TBS_YTEX;
grant connect, resource to ytex;
grant create materialized view to ytex;
grant create view to ytex;

If you have installed the UMLS locally, you must also grant ytex select permissions on umls tables; e.g. assuming that umls tables are in the umls schema:

Code Block
languagesql
grant select on umls.MRCONSO to ytex;
grant select on umls.MRSTY to ytex;
grant select on umls.MRREL to ytex;
 

MySQL

To create the MySQL user and database, log in to MySQL as root and run the following commands (change as necessary):

Code Block
languagesql
CREATE DATABASE ytex CHARACTER SET utf8;
CREATE USER 'ytex'@'localhost' IDENTIFIED BY 'ytex';
GRANT ALL PRIVILEGES ON ytex.* TO 'ytex'@'localhost';

On Mac, use 127.0.0.1 instead of localhost. If ytex connects to the MySQL server from a different machine, replace localhost with the host name or IP address of the machine you will connect from, or use the wildcard ('%'):

Code Block
languagesql
CREATE USER 'ytex'@'%' IDENTIFIED BY 'ytex';
GRANT ALL PRIVILEGES ON ytex.* TO 'ytex'@'%';

If you have installed UMLS in your database, you must give the ytex user select permission on these tables:

Code Block
languagesql
GRANT SELECT on umls.* to 'ytex'@'%';

The document table uses the text and blob datatypes for the doc_text column that holds the document text. If you are processing large documents, you may need to use the longtext datatype instead. You may also have to increase the maximum packet size (the MySQL max_allowed_packet setting).

SQL Server

You must have permission to create database objects in the YTEX database and schema. If you don't have these permissions, ask your DBA to add you to the db_ddladmin and db_datawriter roles for the YTEX database.

If you want to install the UMLS in your SQL Server, you may want to use a different database/schema from the YTEX database. If that is the case, you need permissions on the UMLS database/schema as well.

Installation

1) Install ctakes 'as usual'

Go through the standard ctakes installation for the distribution you just created; see https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.1+User+Install+Guide. For the rest of this document, we assume ctakes is installed in CTAKES_HOME.

1.5) Patch YTEX Distro (YTEX 3.2.0 only)

Not needed for YTEX 3.2.1. Some of the install scripts need to be patched (fixed in trunk). Download and unzip ytex-patch-3.2.0.zip 'over' your installation.

Linux users: set the shell scripts to executable:

Code Block
cd CTAKES_HOME/bin
chmod ug+x ant ctakes.profile *.sh

2) Unzip YTEX Libraries

Download and unzip ctakes-ytex-lib-3.1.2-SNAPSHOT.zip 'over' your installation. This contains libraries that are not Apache 2.0 license compliant:

  • Hibernate
  • Weka
  • MySQL JDBC Driver
  • MS SQL Server JDBC Driver

If you are using Oracle, download the Oracle JDBC driver ojdbc7_g and place it in your CTAKES_HOME\lib directory.

3) Unzip YTEX Resources (Optional - UTS login required)

Download and unzip ctakes-ytex-resources-3.1.2-SNAPSHOT.zip 'over' your installation. This contains:

  • Concept Graphs derived from the UMLS2013AA used to compute semantic similarity measures
  • Dictionary Lookup table derived from UMLS2013AA for named entity recognition.

If you do not install these files, Word Sense Disambiguation will be disabled, and the default ytex dictionary lookup will be limited to a small sample subset of the UMLS.

You can always create concept graphs for WSD from your UMLS installation. If you have the UMLS in your DB, YTEX will create a dictionary lookup table from the UMLS during the installation.

4) Edit environment batch/shell script

Fix the path references to match your environment.

  • Windows - no changes necessary; see CTAKES_HOME\bin\setenv.cmd
  • Linux -
    • move CTAKES_HOME/bin/ctakes.profile to ${HOME}/ctakes.profile
    • edit the CTAKES_HOME environment variable
    • make executable - chmod u+x ${HOME}/ctakes.profile

5) Create CTAKES_HOME\resources\org\apache\ctakes\ytex\ytex.properties

In this file, you specify the database connection parameters. You will find templates in CTAKES_HOME\lib\ctakes-ytex-res-[version].jar, under org\apache\ctakes\ytex\ytex.properties.<db type>.example. If you have UMLS installed on your database, specify the umls.schema and umls.catalog properties (see the properties file for an explanation of what these are).

Code Block
languagebash
titleWindows: extract and edit ytex.properties
cd %CTAKES_HOME%\resources
mkdir org\apache\ctakes\ytex
@REM extract the mysql example.  change mysql to mssql (for MS SQL Server) or orcl (for Oracle)
jar xf ..\lib\ctakes-ytex-res-*.jar org/apache/ctakes/ytex/ytex.properties.mysql.example
copy org\apache\ctakes\ytex\ytex.properties.mysql.example org\apache\ctakes\ytex\ytex.properties
@REM edit the properties file
notepad org\apache\ctakes\ytex\ytex.properties

 

 

Code Block
languagebash
titleLinux: extract and edit ytex.properties
cd $CTAKES_HOME/resources
mkdir -p org/apache/ctakes/ytex
# extract the mysql example.  change mysql to mssql (for MS SQL Server) or orcl (for Oracle)
jar xf ../lib/ctakes-ytex-res-*.jar org/apache/ctakes/ytex/ytex.properties.mysql.example
cp org/apache/ctakes/ytex/ytex.properties.mysql.example org/apache/ctakes/ytex/ytex.properties
# edit the properties file
vi org/apache/ctakes/ytex/ytex.properties
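
After editing, it can be useful to confirm that the file is where YTEX expects it and that the UMLS properties mentioned above are filled in. The following is a minimal sketch, not part of the YTEX distribution: the function names get_prop and report_umls_props are hypothetical, and the parser only understands plain key=value lines (no line continuations or escaped characters). The only property names it uses are umls.schema and umls.catalog, which this guide names.

```shell
#!/bin/sh
# Sketch only: read simple key=value entries from ytex.properties.
# Hypothetical helpers; not shipped with YTEX.

# Print the value of KEY from FILE; prints nothing if the key is absent
# or appears only in a commented-out line.
get_prop() {
    file="$1"
    key="$2"
    # Escape dots so that "umls.schema" is matched literally.
    key_re=$(printf '%s' "$key" | sed 's/\./\\./g')
    sed -n "s/^[[:space:]]*${key_re}[[:space:]]*=[[:space:]]*//p" "$file" | tail -n 1
}

# Report whether the UMLS-related properties named in this guide are set.
report_umls_props() {
    file="$1"
    if [ ! -f "$file" ]; then
        echo "not found: $file"
        return 1
    fi
    for key in umls.schema umls.catalog; do
        value=$(get_prop "$file" "$key")
        if [ -n "$value" ]; then
            echo "$key=$value"
        else
            echo "$key is not set"
        fi
    done
}
```

For example, report_umls_props "$CTAKES_HOME/resources/org/apache/ctakes/ytex/ytex.properties" prints one line per property. An unset value is only a problem if you have UMLS installed in your database.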

6) Install the UMLS in your database (Optional)

We strongly suggest that you install UMLS in your database.
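
Before moving on to the setup script, it may help to confirm that the files from the earlier steps are in place. The sketch below is an assumption-laden convenience check, not part of YTEX: the function name ytex_preflight is hypothetical, and the three relative paths are simply the ones this guide uses (the ant launcher, build-setup.xml, and ytex.properties). Adjust them if your layout differs.

```shell
#!/bin/sh
# Sketch only: confirm the files that the setup step relies on are present.
# Hypothetical helper; paths are the ones used elsewhere in this guide.
ytex_preflight() {
    ctakes_home="$1"
    status=0
    for path in \
        "bin/ant" \
        "bin/ctakes-ytex/scripts/build-setup.xml" \
        "resources/org/apache/ctakes/ytex/ytex.properties"
    do
        if [ -e "$ctakes_home/$path" ]; then
            echo "ok       $path"
        else
            echo "MISSING  $path"
            status=1
        fi
    done
    # Non-zero exit status if anything was missing.
    return $status
}
```

Run it as ytex_preflight "$CTAKES_HOME". A MISSING line points back to the step that creates that file.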

7) Execute the setup script

Windows: Open a command prompt, navigate to CTAKES_HOME, and execute the setup script:

Code Block
languagebash
cd /d %CTAKES_HOME%\bin\ctakes-ytex\scripts
..\..\ant.bat -f build-setup.xml all > setup.out 2>&1

Linux: From a shell, cd to the CTAKES_HOME directory, set the environment, make sure the necessary scripts are executable, and execute the ant script:

Code Block
languagebash
chmod u+x ${HOME}/ctakes.profile
. ${HOME}/ctakes.profile
cd ${CTAKES_HOME}/bin
chmod u+x ant
chmod u+x *.sh
cd ctakes-ytex/scripts
nohup ../../ant -f build-setup.xml all > setup.out 2>&1 &
tail -f setup.out

Check setup.out to make sure the setup was successful.
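
One way to check the log without reading all of it is to look for the summary line that Ant prints at the end of a run ("BUILD SUCCESSFUL" or "BUILD FAILED"). The sketch below assumes that standard Ant output; the function name check_setup_log and its status words are hypothetical. On a first installation the log may still contain errors about dropping objects that do not exist, so individual error lines are not on their own a sign of failure.

```shell
#!/bin/sh
# Sketch only: summarize an Ant run captured in a log file.
# Hypothetical helper; relies on Ant's "BUILD SUCCESSFUL" / "BUILD FAILED" lines.
check_setup_log() {
    log_file="${1:-setup.out}"
    if [ ! -f "$log_file" ]; then
        echo "missing"
        return 2
    fi
    if grep -q "BUILD FAILED" "$log_file"; then
        echo "failed"
        return 1
    fi
    if grep -q "BUILD SUCCESSFUL" "$log_file"; then
        echo "ok"
        return 0
    fi
    # Neither summary line yet: the build is probably still running.
    echo "incomplete"
    return 3
}
```

Because the Linux command above runs the build in the background, "incomplete" usually just means the setup has not finished yet.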

This will call the ant script build-setup.xml, which does the following:

  • Generates configuration files from templates
  • Sets up YTEX Database Objects

 

The installation executes SQL scripts located in the CTAKES_HOME\bin\ctakes-ytex\scripts\data directory. All YTEX database objects will be dropped and recreated. If this is the initial installation, ignore the errors about objects not existing when they are being dropped.

If you have installed the UMLS in your database and configured YTEX to use it, YTEX will create a dictionary lookup table with all concepts from the UMLS. The setup speed depends on the latency between the machine you are installing on and the database server. Creating the dictionary lookup table from the UMLS can take several hours.