TransmogrifAI

Automated machine learning for structured data.

Built with Scala, Apache Spark, Apache Lucene, Apache Avro, Apache OpenNLP, Algebird, and Apache Tika.

About

TransmogrifAI (pronounced trans-mog-ri-phi) is an end-to-end AutoML library for structured data written in Scala that runs on top of Apache Spark. It was developed with a focus on accelerating machine learning developer productivity through machine learning automation, and an API that enforces compile-time type-safety, modularity, and reuse. Through automation, it achieves accuracies close to those of hand-tuned models with almost a 100x reduction in time.

Automation

TransmogrifAI has numerous Transformers and Estimators that make use of Feature abstractions to automate feature engineering, feature validation, and model selection.

Modularity and reuse

TransmogrifAI enforces a strict separation between ML workflow definitions and data manipulation, ensuring that code written using TransmogrifAI is inherently modular and reusable.
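
For example, a workflow declared once can be retrained on new data simply by swapping its input. The sketch below reuses the pred result feature and the passengersData DataFrame from the Titanic example further down, plus a hypothetical newPassengersData DataFrame, and is illustrative rather than a prescribed pattern:

// Illustrative sketch: features and workflows are declared independently of any
// concrete dataset, so the same definition can be retrained against new data.
val workflow = new OpWorkflow().setResultFeatures(pred)
val modelV1 = workflow.setInputDataset(passengersData).train()
val modelV2 = workflow.setInputDataset(newPassengersData).train() // same features, new data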

Compile-time type safety

Machine learning workflows built using TransmogrifAI are strongly typed. This means developers get to enjoy the many benefits of compile-time type safety, including code completion during development and fewer runtime errors.
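
Features can also be declared one at a time with explicit types through FeatureBuilder, as an alternative to the fromDataFrame shortcut used in the example below. The sketch assumes the Passenger case class from that example; an expression that mixes incompatible feature types simply does not compile:

// Illustrative sketch: each feature carries its type (RealNN, Real, PickList, ...),
// so mismatches are caught by the compiler rather than at runtime.
val survivedResponse = FeatureBuilder.RealNN[Passenger].extract(_.survived.toDouble.toRealNN).asResponse
val ageFeature = FeatureBuilder.Real[Passenger].extract(_.age.toReal).asPredictor
val sexFeature = FeatureBuilder.PickList[Passenger].extract(_.sex.toPickList).asPredictor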

Transparency

Model insights leverage stored feature metadata and lineage to help debug models while providing insights to the end user, making machine learning models less of a black box.
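
In addition to the pretty-printed summary shown in the example below, insights for a trained workflow model can be pulled programmatically. The sketch assumes the model and pred values from that example; the exact fields on the returned insights object may vary between library versions:

// Illustrative sketch: per-feature insights (contributions, correlations, etc.)
// derived from the metadata and lineage stored with each feature.
val insights = model.modelInsights(pred)
insights.features.foreach(f => println(f.featureName))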

Example

Predicting Titanic Survivors with TransmogrifAI

The Titanic dataset is an often-cited dataset in the machine learning community. The goal is to build a machine learning model that will predict survivors from the Titanic passenger manifest. See the docs site for full documentation, getting started, more examples and other information.

import com.salesforce.op._
import com.salesforce.op.readers._
import com.salesforce.op.features._
import com.salesforce.op.features.types._
import com.salesforce.op.stages.impl.classification._
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

implicit val spark = SparkSession.builder.config(new SparkConf()).getOrCreate()
import spark.implicits._
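
// NOTE (illustrative sketch): the snippet below assumes a Passenger case class
// describing the CSV schema and a pathToData value pointing at the file. The field
// names and types here are an assumption; adjust them to match your dataset.
case class Passenger(
  id: Int, survived: Int, pClass: Option[Int], name: Option[String], sex: Option[String],
  age: Option[Double], sibSp: Option[Int], parCh: Option[Int], ticket: Option[String],
  fare: Option[Double], cabin: Option[String], embarked: Option[String]
)
val pathToData = Option("/path/to/TitanicPassengers.csv")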

// Read Titanic data as a DataFrame
val passengersData = DataReaders.Simple.csvCase[Passenger](path = pathToData).readDataset().toDF()

// Extract response and predictor features
val (survived, predictors) = FeatureBuilder.fromDataFrame[RealNN](passengersData, response = "survived")

// Automated feature engineering
val featureVector = predictors.transmogrify()

// Automated feature validation and selection
val checkedFeatures = survived.sanityCheck(featureVector, removeBadFeatures = true)

// Automated model selection
val (pred, raw, prob) = BinaryClassificationModelSelector().setInput(survived, checkedFeatures).getOutput()

// Setting up a TransmogrifAI workflow and training the model
val model = new OpWorkflow().setInputDataset(passengersData).setResultFeatures(pred).train()

println("Model summary:\n" + model.summaryPretty())
                  
Evaluated Logistic Regression, Random Forest models with 3 folds and AuPR metric.
Evaluated 3 Logistic Regression models with AuPR between [0.6751930383321765, 0.7768725281794376]
Evaluated 16 Random Forest models with AuPR between [0.7781671467343991, 0.8104798040316159]

Selected model Random Forest classifier with parameters:
|-----------------------|--------------|
| Model Param           |     Value    |
|-----------------------|--------------|
| modelType             | RandomForest |
| featureSubsetStrategy |         auto |
| impurity              |         gini |
| maxBins               |           32 |
| maxDepth              |           12 |
| minInfoGain           |        0.001 |
| minInstancesPerNode   |           10 |
| numTrees              |           50 |
| subsamplingRate       |          1.0 |
|-----------------------|--------------|

Model evaluation metrics:
|-------------|--------------------|---------------------|
| Metric Name | Hold Out Set Value |  Training Set Value |
|-------------|--------------------|---------------------|
| Precision   |               0.85 |   0.773851590106007 |
| Recall      | 0.6538461538461539 |  0.6930379746835443 |
| F1          | 0.7391304347826088 |  0.7312186978297163 |
| AuROC       | 0.8821603927986905 |  0.8766642291593114 |
| AuPR        | 0.8225075757571668 |   0.850331080886535 |
| Error       | 0.1643835616438356 | 0.19682151589242053 |
| TP          |               17.0 |               219.0 |
| TN          |               44.0 |               438.0 |
| FP          |                3.0 |                64.0 |
| FN          |                9.0 |                97.0 |
|-------------|--------------------|---------------------|

Top model insights computed using correlation:
|-----------------------|----------------------|
| Top Positive Insights |      Correlation     |
|-----------------------|----------------------|
| sex = "female"        |   0.5177801026737666 |
| cabin = "OTHER"       |   0.3331391338844782 |
| pClass = 1            |   0.3059642953159715 |
|-----------------------|----------------------|
| Top Negative Insights |      Correlation     |
|-----------------------|----------------------|
| sex = "male"          |  -0.5100301587292186 |
| pClass = 3            |  -0.5075774968534326 |
| cabin = null          | -0.31463114463832633 |
|-----------------------|----------------------|

Top model insights computed using CramersV:
|-----------------------|----------------------|
|      Top Insights     |       CramersV       |
|-----------------------|----------------------|
| sex                   |    0.525557139885501 |
| embarked              |  0.31582347194683386 |
| age                   |  0.21582347194683386 |
|-----------------------|----------------------|
                

Other Examples

Iris Multi-class Classification

Perform multi-class classification over the classic Iris dataset.

Boston Regression

Use linear regression on the Boston housing dataset to predict house prices.

Aggregates and Joins

An example of data preparation for time series aggregates and joins.

Conditional Aggregation

Further examples of simplifying complex data preparation with TransmogrifAI.

Getting Started

Installation

  1. Install Java 1.8
  2. Get Spark 2.4.x
  3. Set an environment variable
    export SPARK_HOME=<SPARK_FOLDER>
  4. Add the TransmogrifAI libraries to your Gradle or sbt project (see the dependency sketch below)
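
    For an sbt build, the dependency can be added as in the sketch below (the version shown is illustrative; check for the latest release). Gradle users add the same coordinates (com.salesforce.transmogrifai:transmogrifai-core_2.11) to their dependencies block.

    // build.sbt -- illustrative version number
    libraryDependencies += "com.salesforce.transmogrifai" %% "transmogrifai-core" % "0.7.0"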

Build and Run on Spark

  1. Build your transmogrifier
  2. Run on Spark:

    Gradle (full project)

    ./gradlew sparkSubmit -Dmain=<MAIN_CLASS> -Dargs="<ARGS>"

    sbt using sbt-spark-submit (full project)

    ./sbt "sparkSubmit --class <MAIN_CLASS> -- <ARGS>"
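
    As a concrete illustration, with a hypothetical main class and data path substituted for the placeholders above:

    ./gradlew sparkSubmit -Dmain=com.example.OpTitanicSimple -Dargs="/path/to/TitanicPassengers.csv"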