Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

I am looking into Log2Vec as a TD-IDF alternative to log vectorization. Primarily, I’ll be interested in consuming Sysmon logs later.

Log2Vec

View file
nameLog2Vec-icccn20.pdf

Abstract—Logs are one of the most valuable data sources for large-scale service management. Log representation, which converts unstructured texts to structured vectors or matrices, serves as the the first step towards automated log analysis. However, the current log representation methods neither represent domain-specific semantic information of logs, nor handle the outof-vocabulary (OOV) words of new types of logs at runtime. We propose Log2Vec, a semantic-aware representation framework for log analysis. Log2Vec combines a log-specific word embedding method to accurately extract the semantic information of logs, with an OOV word processor to embed OOV words into vectors at runtime. We present an analysis on the impact of OOV words and evaluate the performance of the OOV word processor. The evaluation experiments on four public production log datasets demonstrate that Log2Vec not only fixes the issue presented by OOV words, but also significantly improves the performance of two popular log-based service management tasks, including log classification and anomaly detection. We have packaged Log2Vec into an open-source toolkit and hope that it can be used for future research.

...