This document describes additional installation steps required to take advantage of the following YTEX features:

  • Semantic Similarity & Word Sense Disambiguation
  • Storing annotations in a relational database
  • Exporting annotations to machine learning tools

Prerequisites

Database Prerequisites

YTEX supports MS SQL Server 2008 and above, MySQL version 5.x, and Oracle versions 10gR2 and above. Create a database user (and schema) for use with YTEX. See the platform-specific notes below.

Oracle

Your database must use the UTF-8 character set.
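
One way to confirm the character set is to query the NLS_DATABASE_PARAMETERS view. The sketch below only prints the query so that it can be piped to a client such as sqlplus. The function name and the connection string in the usage comment are illustrative and not part of YTEX. A UTF-8 database typically reports AL32UTF8.

```shell
# Print a query that reports the database character set.
# Illustrative helper; not part of YTEX.
oracle_charset_sql() {
  cat <<'SQL'
SELECT value FROM nls_database_parameters WHERE parameter = 'NLS_CHARACTERSET';
SQL
}

# Usage (assumes sqlplus is on the PATH; user/password@orcl is a placeholder
# for any account that can connect to your database):
#   oracle_charset_sql | sqlplus -S user/password@orcl
```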

Make sure you use a tablespace with enough room; e.g. create the ytex user and schema like this:

create tablespace TBS_YTEX datafile 'C:/oracle/oradata/orcl/TBS_YTEX.dbf' size 1000M autoextend on online;
create user ytex identified by ytex default tablespace TBS_YTEX;
alter user ytex quota unlimited on TBS_YTEX;
grant connect, resource to ytex;
grant create materialized view to ytex;
grant create view to ytex;

If you have installed the UMLS locally, you must also grant ytex select permissions on umls tables; e.g. assuming that umls tables are in the umls schema:

grant select on umls.MRCONSO to ytex;
grant select on umls.MRSTY to ytex;
grant select on umls.MRREL to ytex;
 

MySQL

To create the MySQL user and database, log in to MySQL as root and run the following commands (change as necessary):

CREATE DATABASE ytex CHARACTER SET utf8;
CREATE USER 'ytex'@'localhost' IDENTIFIED BY 'ytex';
GRANT ALL PRIVILEGES ON ytex.* TO 'ytex'@'localhost';

On Mac, use 127.0.0.1 instead of localhost. If ytex connects to the MySQL server from a different machine, replace localhost with the host name or IP address of the machine you will connect from, or use the wildcard ('%'):

CREATE USER 'ytex'@'%' IDENTIFIED BY 'ytex';
GRANT ALL PRIVILEGES ON ytex.* TO 'ytex'@'%';

If you have installed UMLS in your database, you must give the ytex user select permission on these tables:

GRANT SELECT on umls.* to 'ytex'@'%';
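
To check that the account and its privileges were created as intended, you can connect as the ytex user and ask MySQL for its grants. This is a minimal sketch: the function name is hypothetical, and the default host of 127.0.0.1 is only an assumption that suits a local server.

```shell
# Show the privileges MySQL reports for the account you connect as.
# Illustrative helper; not part of YTEX. The host defaults to 127.0.0.1.
show_ytex_grants() {
  host="${1:-127.0.0.1}"
  # -p prompts for the password; SHOW GRANTS lists grants for the current user.
  mysql -h "$host" -u ytex -p -e 'SHOW GRANTS;'
}

# Usage:
#   show_ytex_grants              # local server
#   show_ytex_grants db-host      # remote server (placeholder host name)
```

The output should include the privileges on ytex.* and, if you granted it, SELECT on umls.*.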

The document table uses the text and blob datatypes for the doc_text column, which holds the document text. If you are processing large documents, you may need to use the longtext datatype instead. You may also have to increase the maximum packet size (the MySQL max_allowed_packet setting).
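
The two adjustments could look roughly like the sketch below, which prints the SQL so that it can be piped to the mysql client. It rests on several assumptions: the table is named document in the ytex database, 64 MB is just an example limit, and the function name is hypothetical. Because the setup script drops and recreates all YTEX objects, the ALTER TABLE statement would have to be applied after setup has run.

```shell
# Print SQL for handling large documents in MySQL.
# Assumptions: the table is named "document"; 67108864 bytes (64 MB) is an example value.
ytex_large_doc_sql() {
  cat <<'SQL'
ALTER TABLE document MODIFY doc_text LONGTEXT;
SET GLOBAL max_allowed_packet = 67108864;
SQL
}

# Usage (run as a user allowed to change global variables):
#   ytex_large_doc_sql | mysql -u root -p ytex
```

SET GLOBAL only affects new connections and is lost when the server restarts; to make the change permanent, set max_allowed_packet in the [mysqld] section of the MySQL configuration file as well.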

SQL Server

You must have the permission to create database objects in the YTEX database and schema. If you don't have these permissions, ask your DBA to add you to the db_ddladmin & db_datawriter roles for the YTEX database.

If you want to install the UMLS in your SQL Server, you may want to use a different database/schema from the YTEX database. If that is the case, you need permissions on the UMLS database/schema as well.

Installation

1) Install ctakes 'as usual'

Go through the standard ctakes installation for the distribution you just created (see https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.1+User+Install+Guide ). For the rest of this document, we assume ctakes is installed in CTAKES_HOME.

1.5) Patch YTEX Distro

Some of the install scripts need to be patched (fixed in trunk). Download and unzip ytex-patch-3.2.0.zip 'over' your installation.

Linux users: set the shell scripts to executable:

cd CTAKES_HOME/bin
chmod ug+x ant ctakes.profile *.sh

2) Unzip YTEX Libraries

Download and unzip ctakes-ytex-lib-3.1.2-SNAPSHOT.zip 'over' your installation. This archive contains libraries whose licenses are not compatible with the Apache 2.0 license:

  • Hibernate
  • Weka
  • MySQL JDBC Driver
  • MS SQL Server JDBC Driver

If you are using Oracle, download the Oracle JDBC driver ojdbc7_g and place it in your CTAKES_HOME\lib directory.

3) Unzip YTEX Resources (Optional - UTS login required)

Download and unzip ctakes-ytex-resources-3.1.2-SNAPSHOT.zip 'over' your installation. This contains:

  • Concept Graphs derived from the UMLS2013AA used to compute semantic similarity measures
  • Dictionary Lookup table derived from UMLS2013AA for named entity recognition.

If you do not install these files, Word Sense Disambiguation will be disabled, and the default ytex dictionary lookup will be limited to a small sample subset of the UMLS.

You can always create concept graphs for WSD from your UMLS installation. If you have the UMLS in your DB, YTEX will create a dictionary lookup table from the UMLS during the installation.

4) Edit environment batch/shell script

Fix the path references to match your environment.

  • windows - no changes necessary; see CTAKES_HOME\bin\setenv.cmd
  • linux -
    • move CTAKES_HOME/bin/ctakes.profile to ${HOME}/ctakes.profile
    • edit the CTAKES_HOME environment variable
    • make executable - chmod u+x ${HOME}/ctakes.profile

5) Create CTAKES_HOME\resources\org\apache\ctakes\ytex\ytex.properties

In this file, you specify the database connection parameters. You will find templates in CTAKES_HOME\lib\ctakes-ytex-res-3.2.0.jar, under org\apache\ctakes\ytex\ytex.properties.<db type>.example. If you have UMLS installed on your database, specify the umls.schema and umls.catalog properties (see the properties file for an explanation of what these are).
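
On Linux, one way to find and copy a template is to read it directly from the jar with unzip. This is a sketch under assumptions: the helper names are hypothetical, unzip must be installed, and the mysql entry name in the usage comment is a guess at how <db type> is spelled, so list the templates first and use the name you actually see.

```shell
# List the ytex.properties templates packaged in the resources jar ($1).
list_ytex_templates() {
  unzip -l "$1" | grep 'ytex\.properties\.'
}

# Write one template to stdout; $2 is the archive entry name taken from the listing.
extract_ytex_template() {
  unzip -p "$1" "$2"
}

# Usage (paths follow the text above; the mysql entry name is an assumption):
#   jar="${CTAKES_HOME}/lib/ctakes-ytex-res-3.2.0.jar"
#   list_ytex_templates "$jar"
#   extract_ytex_template "$jar" org/apache/ctakes/ytex/ytex.properties.mysql.example \
#     > "${CTAKES_HOME}/resources/org/apache/ctakes/ytex/ytex.properties"
```

After copying, edit the file to match your database host, user, and password.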

6) Install the UMLS in your database (Optional)

We strongly suggest that you install UMLS in your database.

7) Execute the setup script

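Windows: Open a command prompt, navigate to the CTAKES_HOME\bin\ctakes-ytex\scripts directory, and execute the setup script (the example below assumes ctakes is installed in c:\java\apache-ctakes-3.2.0; replace CTAKES_HOME with your installation path):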

cd /d c:\java\apache-ctakes-3.2.0\bin\ctakes-ytex\scripts
..\..\ant.bat -f build-setup.xml -Dlog4j.conf=CTAKES_HOME\config\log4j.xml all > setup.out 2>&1

Linux: From a shell, set the environment, make sure the necessary scripts are executable, change to the scripts directory under CTAKES_HOME, and execute the ant script:

chmod u+x ${HOME}/ctakes.profile
. ${HOME}/ctakes.profile
cd ${CTAKES_HOME}/bin
chmod u+x ant
chmod u+x *.sh
cd ctakes-ytex/scripts
nohup ../../ant -f build-setup.xml -Dlog4j.conf=${CTAKES_HOME}/config/log4j.xml all > setup.out 2>&1 &
tail -f setup.out

Check setup.out to make sure the setup was successful.
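
Because the Linux command runs the build in the background, the log can be checked once the process has finished. The sketch below assumes that the bundled ant wrapper prints Ant's standard summary line ("BUILD SUCCESSFUL" or "BUILD FAILED"); the function name is hypothetical.

```shell
# Report the outcome recorded in the Ant build log (default: setup.out).
# Assumption: the log contains Ant's standard "BUILD SUCCESSFUL" / "BUILD FAILED" line.
check_setup_log() {
  log="${1:-setup.out}"
  if grep -q 'BUILD SUCCESSFUL' "$log"; then
    echo "setup succeeded"
  elif grep -q 'BUILD FAILED' "$log"; then
    echo "setup failed"
    return 1
  else
    # Neither marker found: the build may still be running.
    echo "setup still running or log incomplete"
    return 2
  fi
}

# Usage, from the directory that contains setup.out:
#   check_setup_log
```

A successful summary does not replace reading the log: review any SQL errors it lists as well.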

This will call the ant script build-setup.xml, which does the following:

  • Generates configuration files from templates
  • Sets up YTEX Database Objects

 

The installation executes SQL scripts located in the CTAKES_HOME\bin\ctakes-ytex\scripts\data directory. All YTEX database objects will be dropped and recreated. On an initial installation, ignore the errors about objects that do not exist when they are being dropped.

If you have installed the UMLS in your database and configured YTEX to use it, YTEX will create a dictionary lookup table with all concepts from the UMLS. Setup speed depends on the latency between the machine you are installing on and the database server. Creating the dictionary lookup table from the UMLS can take several hours.

 
