The State Scientific Institution “The United Institute for Informatics Problems of the National Academy of Sciences of Belarus”
S.F. Lipnitskij, tel. 284-21-68, e-mail: firstname.lastname@example.org
Product brief description:
The information-analytical system is built on intelligent document analysis algorithms. These algorithms use not only the statistical characteristics of texts, as current approaches do (TF and TF-IDF measures), but also knowledge of the application domain derived from subject-specific text corpora. Supported languages are English, Russian, Belarusian and German. New languages can be added by building a text corpus for each language, without changing the software. The minimum size of a language corpus is 1,000 full-text documents.
Search of text documents based on the results of indexing different information sources. Abstracting of Russian, Belarusian, English and German texts. Creation and editing of different types of abstracts. Creation of a keyword list and diagrams of keyword information value.
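The brief names TF and TF-IDF as the purely statistical measures used by conventional approaches; it does not describe how the system combines them with domain knowledge. As a reference point only, here is a minimal sketch of standard TF-IDF weighting over a small corpus, assuming naive whitespace tokenization and the common idf = log(N / df) variant. The function names and sample corpus are hypothetical and do not represent the system's own algorithm.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive lowercase whitespace tokenization (an assumption; a real system
    # would use language-specific tokenization and normalization).
    return text.lower().split()


def tf_idf(corpus: list[str]) -> list[dict[str, float]]:
    """Return a dict of TF-IDF weights for each document in the corpus."""
    docs = [tokenize(d) for d in corpus]
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            # tf = relative frequency in the document, idf = log(N / df)
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


corpus = [
    "search engine indexing",
    "indexing of text documents",
    "abstracting of text documents",
]
weights = tf_idf(corpus)
for doc_weights in weights:
    # Print the two highest-weighted terms of each document.
    print(sorted(doc_weights.items(), key=lambda kv: -kv[1])[:2])
```

A term that occurs in every document gets weight 0, which is why purely statistical weighting cannot distinguish terms that are common across the whole corpus but important within one subject domain.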
The system comprises an indexing subsystem and a search and abstracting subsystem.
Indexing subsystem features:
Indexing of texts from different information sources (the Internet, local networks and PC hard disks). Multithreaded indexing, which allows fast file processing, especially on PCs with multicore processors. Updating of indexed resources. Support for the most common text file formats (html, shtml, doc, rtf, docx, pdf, txt), with the possibility of adding formats such as ppt, xls, wpd, hlp, odt and xml. Storage of indexing results in MySQL 5.1, one of the most widely used DBMSs.
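The brief highlights multithreaded indexing. As a hedged illustration only, here is a minimal sketch of building an inverted index (term → set of document ids) with a thread pool from Python's standard library. Documents are passed in memory instead of being fetched from the Internet, a network or disks, and storage in MySQL is omitted; `index_document` and `build_index` are hypothetical names.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def index_document(doc_id: str, text: str) -> tuple[str, set[str]]:
    # Per-document work (tokenization), executed in a worker thread.
    return doc_id, set(text.lower().split())


def build_index(documents: dict[str, str], workers: int = 4) -> dict[str, set[str]]:
    """Build an inverted index mapping each term to the ids of documents containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Workers tokenize documents concurrently; results are merged here,
        # in the main thread, so the shared index needs no lock.
        for doc_id, terms in pool.map(index_document, documents.keys(), documents.values()):
            for term in terms:
                index[term].add(doc_id)
    return dict(index)


docs = {"a.txt": "Text indexing", "b.txt": "text search"}
index = build_index(docs, workers=2)
```

In CPython, threads mainly speed up I/O-bound steps such as reading files or downloading pages; CPU-heavy parsing would typically use `ProcessPoolExecutor` instead to take advantage of multiple cores.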
Search and abstracting subsystem features:
Search of text documents based on the results of indexing different information sources. Abstracting of texts from different information sources. Generation of general and subject-specific abstracts with user-defined parameters such as the abstract length as a percentage of the original text and a threshold value for information value (subject-specific abstracts are generated using the relevant text corpus). Editing and output of the generated abstract (formatting, printing, saving in pdf or html). Creation of a keyword list and diagrams of keyword information value.
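The brief states that abstract generation is controlled by a length given as a percentage of the original and by a threshold on information value, but it does not define how information value is computed. Here is a minimal sketch of a generic extractive approach with those two parameters, assuming each sentence is scored by the average in-document frequency of its words. The scoring heuristic, the sentence splitter and the function names are hypothetical, not the system's algorithm; a subject-specific variant could, for example, substitute corpus-based term weights for the raw frequencies.

```python
import re
from collections import Counter


def split_sentences(text: str) -> list[str]:
    # Naive splitter on ".", "!" or "?" followed by whitespace (an assumption).
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def make_abstract(text: str, ratio_percent: float, threshold: float) -> list[str]:
    """Select sentences for an extractive abstract.

    ratio_percent: abstract length as a percentage of the original sentence count.
    threshold: minimum sentence score (a stand-in for 'information value').
    """
    sentences = split_sentences(text)
    words = [re.findall(r"\w+", s.lower()) for s in sentences]
    freq = Counter(w for ws in words for w in ws)
    # Sentence score = mean frequency (within this text) of its words.
    scores = [sum(freq[w] for w in ws) / len(ws) if ws else 0.0 for ws in words]
    limit = max(1, round(len(sentences) * ratio_percent / 100))
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen = [i for i in ranked[:limit] if scores[i] >= threshold]
    # Restore the original order so the abstract reads naturally.
    return [sentences[i] for i in sorted(chosen)]


text = ("Indexing builds an index. The index supports search. "
        "Cats are nice. Search uses the index.")
abstract = make_abstract(text, ratio_percent=50, threshold=1.0)
```

The percentage caps how many sentences are kept, while the threshold can shorten the abstract further by dropping sentences whose score is too low, even if the percentage would allow them.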
The system adapts to different input languages without software changes. By the criteria of completeness (recall) and accuracy (precision), its search effectiveness is 10-12% higher than that of other commonly used systems.
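Completeness and accuracy correspond to the standard information retrieval measures recall and precision. The brief does not describe the evaluation setup behind the 10-12% figure; this sketch only shows how the two measures are computed for a single query from a set of retrieved and a set of relevant document ids (the ids and the function name are illustrative).

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for one query.

    precision (accuracy)  = |retrieved & relevant| / |retrieved|
    recall (completeness) = |retrieved & relevant| / |relevant|
    """
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


p, r = precision_recall({"d1", "d2", "d3", "d4"}, {"d1", "d3", "d5"})
```

Here two of the four retrieved documents are relevant (precision 0.5), and two of the three relevant documents were found (recall 2/3).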
Field of application:
The system can be used in scientific and technical libraries, information and analytical centers and by individual users.
Main lines of future development:
Keeping users informed about new publications in their subject fields, with the option of abstracting these publications in a multilingual environment. News digests on required topics. Information support for decision making (business and economic intelligence, trends in markets, goods and services, etc.). Identification of emotionally colored information (negative publicity, extremist materials, online gossip, insinuations, false reports and other unreliable information). Monitoring of Internet content in accordance with existing legislation.
The developers provide installation of the product and its adaptation to the customer's environment, user training, methodological and technical support, and participation in further development.