By Stephen Boyer, PhD
KNIME is an open-source, free, and powerful platform for data integration, analytics, and reporting.
In the “old days”, familiarity with tools such as Excel, PowerPoint, and Word was considered essential for professional success across disciplines. Going forward in science, these utilities alone are insufficient. Programming skills (Python, SQL, Visual Basic, Bash, awk, etc.) are now widely considered essential, and acquiring them is a challenge for many. It takes time, training, and practice; like learning to play the piano, programming skill does not come from reading a book or watching someone else play. Fortunately, there is help.
Following the pattern of ‘outsourcing’ math functions such as long division and square roots to the calculator, some data management programming once considered the purview of computer scientists is now accessible to the average person. Yes, some may regard such tools as ‘cheats’, but they are worth knowing about.
KNIME (https://www.knime.com) is a workflow engine in which users build data flows (pipelines) visually, dragging and dropping modules to assemble a sequence of functions that would otherwise require programming in a language such as SQL or Python. The core version includes hundreds of these modules (nodes) for:
- data integration (file I/O, nodes supporting common database management systems)
- data transformation (filter, converter, splitter, combiner, joiner)
- commonly used methods of statistics, data mining, analysis, and text analytics
- visualization, with support for free report designer extensions
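To give a sense of what the joiner and filter nodes spare the user from writing, here is a minimal Python sketch of the same two steps done by hand. The tables, column names (`id`, `ic50_nM`, `series`), and the potency cutoff are hypothetical, and the join assumes each ID appears at most once in the metadata table.

```python
# Hypothetical tables: assay results and compound metadata, keyed by compound ID
assays = [
    {"id": "C1", "ic50_nM": 45.0},
    {"id": "C2", "ic50_nM": 1200.0},
    {"id": "C3", "ic50_nM": 8.5},
]
metadata = [
    {"id": "C1", "series": "amide"},
    {"id": "C3", "series": "urea"},
    {"id": "C4", "series": "amide"},
]


def inner_join(left, right, key):
    """Keep left rows whose key also appears in right, merging the columns.

    Assumes keys in `right` are unique (a one-to-one lookup)."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


def row_filter(rows, predicate):
    """Keep only the rows for which predicate(row) is True."""
    return [row for row in rows if predicate(row)]


joined = inner_join(assays, metadata, "id")               # C1 and C3 survive the join
potent = row_filter(joined, lambda r: r["ic50_nM"] < 10)  # hypothetical cutoff
print([row["id"] for row in potent])  # ['C3']
```

In KNIME the same pipeline is two connected nodes configured through dialogs, which is the point of the comparison.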
KNIME workflows can be used to create report templates exportable to document formats such as DOC, PPT, XLS, and PDF. Many other capabilities of KNIME are described on the KNIME website. They are worth exploring, along with similar data-handling applications such as DataWarrior and Orange Data Mining, discussed in earlier Worth Knowing About columns.
One illustration of KNIME’s utility to chemists is the automated screening of drug compounds for desirable physicochemical properties such as solubility, pKa, and compliance with the Lipinski criteria. The KNIME website offers abundant tutorials as well as user forums.
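As a rough picture of the logic such a screening workflow encodes, the sketch below applies Lipinski’s rule of five to precomputed descriptors: molecular weight above 500 Da, calculated logP above 5, more than 5 hydrogen-bond donors, or more than 10 hydrogen-bond acceptors each count as one violation. Allowing at most one violation is a common convention, assumed here rather than taken from the column. The compound names and descriptor values are hypothetical; in practice the descriptors would come from a cheminformatics toolkit or KNIME node.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    name: str
    mol_weight: float  # molecular weight, Da
    logp: float        # calculated octanol/water partition coefficient
    h_donors: int      # hydrogen-bond donors
    h_acceptors: int   # hydrogen-bond acceptors


def lipinski_violations(c: Compound) -> int:
    """Count how many of the four rule-of-five thresholds are exceeded."""
    return sum([
        c.mol_weight > 500,
        c.logp > 5,
        c.h_donors > 5,
        c.h_acceptors > 10,
    ])


def passes_rule_of_five(c: Compound, max_violations: int = 1) -> bool:
    """Assumed convention: tolerate at most one violation."""
    return lipinski_violations(c) <= max_violations


# Hypothetical descriptor values, for illustration only
compounds = [
    Compound("cmpd-A", 320.4, 2.1, 2, 5),
    Compound("cmpd-B", 610.7, 6.3, 3, 9),
    Compound("cmpd-C", 540.2, 4.8, 6, 12),
]

hits = [c.name for c in compounds if passes_rule_of_five(c)]
print(hits)  # ['cmpd-A']
```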
KNIME has a learning curve, but it is modest compared with the value the tool provides, much like learning which buttons to push on a calculator. Continuing the calculator analogy, there is no substitute for learning math, but the calculator’s quick, accurate answers can’t be denied. Similarly, KNIME is a useful tool for exploring and deriving value from huge and ever-increasing data streams. In fact, KNIME is so powerful for data management and mining that its initial use in chemistry quickly spread to sports, finance, engineering, and overall project management.
Figure caption: A representative page from KNIME’s workflow analytics platform.