
Structured data learning with Wide, Deep, and Cross networks

Author: Khalid Salama
Date created: 2020/12/31
Last modified: 2020/12/31
Description: Using Wide & Deep and Deep & Cross networks for structured data classification.



Introduction

This example demonstrates how to do structured data classification using two modeling techniques:

  1. Wide & Deep models
  2. Deep & Cross models

Note that this example should be run with TensorFlow 2.3 or higher.


The dataset

This example uses the Covertype dataset from the UCI Machine Learning Repository. The task is to predict forest cover type from cartographic variables. The dataset includes 581,012 instances with 12 input features: 10 numerical features and 2 categorical features. Each instance is categorized into 1 of 7 classes.


Setup

import math
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

Prepare the data

First, let's load the dataset from the UCI Machine Learning Repository into a Pandas DataFrame:

data_url = (
    "https://archive.ics.uci.edu/ml/machine-learning-databases/covtype/covtype.data.gz"
)
raw_data = pd.read_csv(data_url, header=None)
print(f"Dataset shape: {raw_data.shape}")
raw_data.head()
Dataset shape: (581012, 55)
0 1 2 3 4 5 6 7 8 9 ... 45 46 47 48 49 50 51 52 53 54
0 2596 51 3 258 0 510 221 232 148 6279 ... 0 0 0 0 0 0 0 0 0 5
1 2590 56 2 212 -6 390 220 235 151 6225 ... 0 0 0 0 0 0 0 0 0 5
2 2804 139 9 268 65 3180 234 238 135 6121 ... 0 0 0 0 0 0 0 0 0 2
3 2785 155 18 242 118 3090 238 238 122 6211 ... 0 0 0 0 0 0 0 0 0 2
4 2595 45 2 153 -1 391 220 234 150 6172 ... 0 0 0 0 0 0 0 0 0 5

5 rows × 55 columns

The two categorical features in the dataset are spread across multiple binary (one-hot encoded) columns. We will convert this dataset representation to the typical representation, where each categorical feature is represented by a single value (a string label such as soil_type_29).

soil_type_values = [f"soil_type_{idx+1}" for idx in range(40)]
wilderness_area_values = [f"area_type_{idx+1}" for idx in range(4)]

# Columns 14..53 one-hot encode the soil type; map the index of the hot column
# to its string label.
soil_type = raw_data.loc[:, 14:53].apply(
    lambda x: soil_type_values[x.to_numpy().nonzero()[0][0]], axis=1
)
# Columns 10..13 one-hot encode the wilderness area.
wilderness_area = raw_data.loc[:, 10:13].apply(
    lambda x: wilderness_area_values[x.to_numpy().nonzero()[0][0]], axis=1
)

CSV_HEADER = [
    "Elevation",
    "Aspect",
    "Slope",
    "Horizontal_Distance_To_Hydrology",
    "Vertical_Distance_To_Hydrology",
    "Horizontal_Distance_To_Roadways",
    "Hillshade_9am",
    "Hillshade_Noon",
    "Hillshade_3pm",
    "Horizontal_Distance_To_Fire_Points",
    "Wilderness_Area",
    "Soil_Type",
    "Cover_Type",
]

data = pd.concat(
    [raw_data.loc[:, 0:9], wilderness_area, soil_type, raw_data.loc[:, 54]],
    axis=1,
    ignore_index=True,
)
data.columns = CSV_HEADER

# Convert the target label indices into a range from 0 to 6 (there are 7 labels in total).
data["Cover_Type"] = data["Cover_Type"] - 1

print(f"Dataset shape: {data.shape}")
data.head().T
Dataset shape: (581012, 13)
0 1 2 3 4
Elevation 2596 2590 2804 2785 2595
Aspect 51 56 139 155 45
Slope 3 2 9 18 2
Horizontal_Distance_To_Hydrology 258 212 268 242 153
Vertical_Distance_To_Hydrology 0 -6 65 118 -1
Horizontal_Distance_To_Roadways 510 390 3180 3090 391
Hillshade_9am 221 220 234 238 220
Hillshade_Noon 232 235 238 238 234
Hillshade_3pm 148 151 135 122 150
Horizontal_Distance_To_Fire_Points 6279 6225 6121 6211 6172
Wilderness_Area area_type_1 area_type_1 area_type_1 area_type_1 area_type_1
Soil_Type soil_type_29 soil_type_29 soil_type_12 soil_type_30 soil_type_29
Cover_Type 4 4 1 1 4

The shape of the DataFrame shows there are 13 columns per sample (12 for the features and 1 for the target label).

Let's split the data into training (85%) and test (15%) sets.

train_splits = []
test_splits = []

for _, group_data in data.groupby("Cover_Type"):
    random_selection = np.random.rand(len(group_data.index)) <= 0.85
    train_splits.append(group_data[random_selection])
    test_splits.append(group_data[~random_selection])

train_data = pd.concat(train_splits).sample(frac=1).reset_index(drop=True)
test_data = pd.concat(test_splits).sample(frac=1).reset_index(drop=True)

print(f"Train split size: {len(train_data.index)}")
print(f"Test split size: {len(test_data.index)}")
Train split size: 493860
Test split size: 87152

Next, store the training and test data in separate CSV files.

train_data_file = "train_data.csv"
test_data_file = "test_data.csv"

train_data.to_csv(train_data_file, index=False)
test_data.to_csv(test_data_file, index=False)

Define dataset metadata

Here, we define the metadata of the dataset that will be useful for reading and parsing the data into input features, and encoding the input features with respect to their types.

TARGET_FEATURE_NAME = "Cover_Type"

TARGET_FEATURE_LABELS = ["0", "1", "2", "3", "4", "5", "6"]

NUMERIC_FEATURE_NAMES = [
    "Aspect",
    "Elevation",
    "Hillshade_3pm",
    "Hillshade_9am",
    "Hillshade_Noon",
    "Horizontal_Distance_To_Fire_Points",
    "Horizontal_Distance_To_Hydrology",
    "Horizontal_Distance_To_Roadways",
    "Slope",
    "Vertical_Distance_To_Hydrology",
]

CATEGORICAL_FEATURES_WITH_VOCABULARY = {
    "Soil_Type": list(data["Soil_Type"].unique()),
    "Wilderness_Area": list(data["Wilderness_Area"].unique()),
}

CATEGORICAL_FEATURE_NAMES = list(CATEGORICAL_FEATURES_WITH_VOCABULARY.keys())

FEATURE_NAMES = NUMERIC_FEATURE_NAMES + CATEGORICAL_FEATURE_NAMES

COLUMN_DEFAULTS = [
    [0] if feature_name in NUMERIC_FEATURE_NAMES + [TARGET_FEATURE_NAME] else ["NA"]
    for feature_name in CSV_HEADER
]

NUM_CLASSES = len(TARGET_FEATURE_LABELS)

Experiment setup

Next, let's define an input function that reads and parses the file, then converts the features and labels into a tf.data.Dataset for training or evaluation.

def get_dataset_from_csv(csv_file_path, batch_size, shuffle=False):

    dataset = tf.data.experimental.make_csv_dataset(
        csv_file_path,
        batch_size=batch_size,
        column_names=CSV_HEADER,
        column_defaults=COLUMN_DEFAULTS,
        label_name=TARGET_FEATURE_NAME,
        num_epochs=1,
        header=True,
        shuffle=shuffle,
    )
    return dataset.cache()
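
Each element of the returned dataset is a (features, label) tuple, where features is a dictionary mapping feature names to batched tensors. As a minimal sketch (assuming the CSV files created above), you can inspect one batch like this:

train_dataset = get_dataset_from_csv(train_data_file, batch_size=5)
features, labels = next(iter(train_dataset))
print(f"Label batch shape: {labels.shape}")
print(f"Elevation batch shape: {features['Elevation'].shape}")
print(f"Soil_Type dtype: {features['Soil_Type'].dtype}")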

Here we configure the parameters and implement the procedure for running a training and evaluation experiment given a model.

learning_rate = 0.001
dropout_rate = 0.1
batch_size = 265
num_epochs = 50

hidden_units = [32, 32]


def run_experiment(model):

    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=learning_rate),
        loss=keras.losses.SparseCategoricalCrossentropy(),
        metrics=[keras.metrics.SparseCategoricalAccuracy()],
    )

    train_dataset = get_dataset_from_csv(train_data_file, batch_size, shuffle=True)

    test_dataset = get_dataset_from_csv(test_data_file, batch_size)

    print("Start training the model...")
    history = model.fit(train_dataset, epochs=num_epochs)
    print("Model training finished")

    _, accuracy = model.evaluate(test_dataset, verbose=0)

    print(f"Test accuracy: {round(accuracy * 100, 2)}%")

Create model inputs

Now, define the inputs for the models as a dictionary, where the key is the feature name, and the value is a keras.layers.Input tensor with the corresponding feature shape and data type.

def create_model_inputs():
    inputs = {}
    for feature_name in FEATURE_NAMES:
        if feature_name in NUMERIC_FEATURE_NAMES:
            inputs[feature_name] = layers.Input(
                name=feature_name, shape=(), dtype=tf.float32
            )
        else:
            inputs[feature_name] = layers.Input(
                name=feature_name, shape=(), dtype=tf.string
            )
    return inputs

Encode features

We create two representations of our input features: sparse and dense:

  1. In the sparse representation, the categorical features are encoded with one-hot encoding using the CategoryEncoding layer. This representation can be useful for the model to memorize particular feature values to make certain predictions.
  2. In the dense representation, the categorical features are encoded with low-dimensional embeddings using the Embedding layer. This representation helps the model to generalize well to unseen feature combinations.

from tensorflow.keras.layers.experimental.preprocessing import CategoryEncoding
from tensorflow.keras.layers.experimental.preprocessing import StringLookup


def encode_inputs(inputs, use_embedding=False):
    encoded_features = []
    for feature_name in inputs:
        if feature_name in CATEGORICAL_FEATURE_NAMES:
            vocabulary = CATEGORICAL_FEATURES_WITH_VOCABULARY[feature_name]
            # Create a lookup to convert string values to integer indices.
            # Since we are not using a mask token, nor expecting any out-of-vocabulary
            # (OOV) tokens, we set mask_token to None and num_oov_indices to 0.
            index = StringLookup(
                vocabulary=vocabulary, mask_token=None, num_oov_indices=0
            )
            # Convert the string input values into integer indices.
            value_index = index(inputs[feature_name])
            if use_embedding:
                embedding_dims = int(math.sqrt(len(vocabulary)))
                # Create an embedding layer with the specified dimensions.
                embedding_encoder = layers.Embedding(
                    input_dim=len(vocabulary), output_dim=embedding_dims
                )
                # Convert the index values to embedding representations.
                encoded_feature = embedding_encoder(value_index)
            else:
                # Create a one-hot encoder.
                onehot_encoder = CategoryEncoding(output_mode="binary")
                onehot_encoder.adapt(index(vocabulary))
                # Convert the index values to a one-hot representation.
                encoded_feature = onehot_encoder(value_index)
        else:
            # Use the numerical features as-is.
            encoded_feature = tf.expand_dims(inputs[feature_name], -1)

        encoded_features.append(encoded_feature)

    all_features = layers.concatenate(encoded_features)
    return all_features
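
For this dataset, the sparse representation concatenates the 10 numeric features with 40 + 4 one-hot dimensions for Soil_Type and Wilderness_Area, while the dense representation replaces those one-hot vectors with embeddings of size int(sqrt(40)) = 6 and int(sqrt(4)) = 2.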

Experiment 1: a baseline model

In the first experiment, let's create a multi-layer feed-forward network, where the categorical features are one-hot encoded.

def create_baseline_model():
    inputs = create_model_inputs()
    features = encode_inputs(inputs)

    for units in hidden_units:
        features = layers.Dense(units)(features)
        features = layers.BatchNormalization()(features)
        features = layers.ReLU()(features)
        features = layers.Dropout(dropout_rate)(features)

    outputs = layers.Dense(units=NUM_CLASSES, activation="softmax")(features)
    model = keras.Model(inputs=inputs, outputs=outputs)
    return model


baseline_model = create_baseline_model()
keras.utils.plot_model(baseline_model, show_shapes=True, rankdir="LR")


Let's run it:

run_experiment(baseline_model)
Start training the model...
Epoch 1/50
1864/1864 [==============================] - 8s 4ms/step - loss: 0.9190 - sparse_categorical_accuracy: 0.6374
Epoch 2/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.6765 - sparse_categorical_accuracy: 0.7105
Epoch 3/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.6460 - sparse_categorical_accuracy: 0.7229
Epoch 4/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.6235 - sparse_categorical_accuracy: 0.7309
Epoch 5/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.6079 - sparse_categorical_accuracy: 0.7374
Epoch 6/50
1864/1864 [==============================] - 3s 1ms/step - loss: 0.5995 - sparse_categorical_accuracy: 0.7405
Epoch 7/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5925 - sparse_categorical_accuracy: 0.7443
Epoch 8/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5878 - sparse_categorical_accuracy: 0.7455
Epoch 9/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5827 - sparse_categorical_accuracy: 0.7489
Epoch 10/50
1864/1864 [==============================] - 3s 1ms/step - loss: 0.5777 - sparse_categorical_accuracy: 0.7501
Epoch 11/50
1864/1864 [==============================] - 3s 1ms/step - loss: 0.5766 - sparse_categorical_accuracy: 0.7512
Epoch 12/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5725 - sparse_categorical_accuracy: 0.7530
Epoch 13/50
1864/1864 [==============================] - 3s 1ms/step - loss: 0.5688 - sparse_categorical_accuracy: 0.7542
Epoch 14/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5662 - sparse_categorical_accuracy: 0.7550
Epoch 15/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5626 - sparse_categorical_accuracy: 0.7569
Epoch 16/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5605 - sparse_categorical_accuracy: 0.7580
Epoch 17/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5600 - sparse_categorical_accuracy: 0.7578
Epoch 18/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5575 - sparse_categorical_accuracy: 0.7602
Epoch 19/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5552 - sparse_categorical_accuracy: 0.7610
Epoch 20/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5554 - sparse_categorical_accuracy: 0.7606
Epoch 21/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5525 - sparse_categorical_accuracy: 0.7618
Epoch 22/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5506 - sparse_categorical_accuracy: 0.7628
Epoch 23/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5495 - sparse_categorical_accuracy: 0.7632
Epoch 24/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5501 - sparse_categorical_accuracy: 0.7640
Epoch 25/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5479 - sparse_categorical_accuracy: 0.7643
Epoch 26/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5466 - sparse_categorical_accuracy: 0.7656
Epoch 27/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5453 - sparse_categorical_accuracy: 0.7652
Epoch 28/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5446 - sparse_categorical_accuracy: 0.7662
Epoch 29/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5421 - sparse_categorical_accuracy: 0.7665
Epoch 30/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5430 - sparse_categorical_accuracy: 0.7666
Epoch 31/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5412 - sparse_categorical_accuracy: 0.7672
Epoch 32/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5395 - sparse_categorical_accuracy: 0.7685
Epoch 33/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5402 - sparse_categorical_accuracy: 0.7685
Epoch 34/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5395 - sparse_categorical_accuracy: 0.7672
Epoch 35/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5394 - sparse_categorical_accuracy: 0.7684
Epoch 36/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5370 - sparse_categorical_accuracy: 0.7687
Epoch 37/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5377 - sparse_categorical_accuracy: 0.7692
Epoch 38/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5363 - sparse_categorical_accuracy: 0.7687
Epoch 39/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5363 - sparse_categorical_accuracy: 0.7687
Epoch 40/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5355 - sparse_categorical_accuracy: 0.7701
Epoch 41/50
1525/1864 [=======================>......] - ETA: 0s - loss: 0.5359 - sparse_categorical_accuracy: 0.7687

The baseline model achieves ~76% test accuracy.


Experiment 2: Wide & Deep model

In the second experiment, we create a Wide & Deep model. The wide part of the model is a linear model, while the deep part is a multi-layer feed-forward network.

Use the sparse representation of the input features in the wide part of the model and the dense representation of the input features for the deep part of the model.

Note that every input feature contributes to both parts of the model with different representations.

def create_wide_and_deep_model():

    inputs = create_model_inputs()
    wide = encode_inputs(inputs)
    wide = layers.BatchNormalization()(wide)

    deep = encode_inputs(inputs, use_embedding=True)
    for units in hidden_units:
        deep = layers.Dense(units)(deep)
        deep = layers.BatchNormalization()(deep)
        deep = layers.ReLU()(deep)
        deep = layers.Dropout(dropout_rate)(deep)

    merged = layers.concatenate([wide, deep])
    outputs = layers.Dense(units=NUM_CLASSES, activation="softmax")(merged)
    model = keras.Model(inputs=inputs, outputs=outputs)
    return model


wide_and_deep_model = create_wide_and_deep_model()
keras.utils.plot_model(wide_and_deep_model, show_shapes=True, rankdir="LR")


Let's run it:

run_experiment(wide_and_deep_model)
Start training the model...
Epoch 1/50
1864/1864 [==============================] - 8s 4ms/step - loss: 0.8944 - sparse_categorical_accuracy: 0.6467
Epoch 2/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.6108 - sparse_categorical_accuracy: 0.7354
Epoch 3/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5916 - sparse_categorical_accuracy: 0.7427
Epoch 4/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5802 - sparse_categorical_accuracy: 0.7459
Epoch 5/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5706 - sparse_categorical_accuracy: 0.7500
Epoch 6/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5601 - sparse_categorical_accuracy: 0.7565
Epoch 7/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5524 - sparse_categorical_accuracy: 0.7604
Epoch 8/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5447 - sparse_categorical_accuracy: 0.7634
Epoch 9/50
1864/1864 [==============================] - 4s 2ms/step - loss: 0.5396 - sparse_categorical_accuracy: 0.7662
Epoch 10/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5355 - sparse_categorical_accuracy: 0.7685
Epoch 11/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5316 - sparse_categorical_accuracy: 0.7709
Epoch 12/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5282 - sparse_categorical_accuracy: 0.7723
Epoch 13/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5254 - sparse_categorical_accuracy: 0.7733
Epoch 14/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5219 - sparse_categorical_accuracy: 0.7743
Epoch 15/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5194 - sparse_categorical_accuracy: 0.7759
Epoch 16/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5184 - sparse_categorical_accuracy: 0.7767
Epoch 17/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5154 - sparse_categorical_accuracy: 0.7777
Epoch 18/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5143 - sparse_categorical_accuracy: 0.7789
Epoch 19/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5121 - sparse_categorical_accuracy: 0.7799
Epoch 20/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5109 - sparse_categorical_accuracy: 0.7804
Epoch 21/50
1864/1864 [==============================] - 4s 2ms/step - loss: 0.5098 - sparse_categorical_accuracy: 0.7807
Epoch 22/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5081 - sparse_categorical_accuracy: 0.7820
Epoch 23/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5070 - sparse_categorical_accuracy: 0.7823
Epoch 24/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5070 - sparse_categorical_accuracy: 0.7820
Epoch 25/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5039 - sparse_categorical_accuracy: 0.7845
Epoch 26/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5048 - sparse_categorical_accuracy: 0.7833
Epoch 27/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5040 - sparse_categorical_accuracy: 0.7844
Epoch 28/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5032 - sparse_categorical_accuracy: 0.7839
Epoch 29/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5033 - sparse_categorical_accuracy: 0.7841
Epoch 30/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5028 - sparse_categorical_accuracy: 0.7847
Epoch 31/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5015 - sparse_categorical_accuracy: 0.7853
Epoch 32/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5007 - sparse_categorical_accuracy: 0.7851
Epoch 33/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5005 - sparse_categorical_accuracy: 0.7858
Epoch 34/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5004 - sparse_categorical_accuracy: 0.7846
Epoch 35/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5005 - sparse_categorical_accuracy: 0.7858
Epoch 36/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.4986 - sparse_categorical_accuracy: 0.7864
Epoch 37/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.4985 - sparse_categorical_accuracy: 0.7868
Epoch 38/50
 838/1864 [============>.................] - ETA: 1s - loss: 0.4980 - sparse_categorical_accuracy: 0.7867

The Wide & Deep model achieves ~79% test accuracy.


Experiment 3: Deep & Cross model

In the third experiment, we create a Deep & Cross model. The deep part of this model is the same as the deep part created in the previous experiment. The key idea of the cross part is to apply explicit feature crossing in an efficient way, where the degree of cross features grows with layer depth.
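
Concretely, each cross layer computes the following recursion (a plain-text rendering, matching the code below):

    x_{l+1} = x_0 * (W_l x_l + b_l) + x_l

where x_0 is the encoded input feature vector, x_l is the output of the l-th cross layer, W_l and b_l are that layer's weights and bias, and * denotes element-wise multiplication. Stacking l cross layers produces feature interactions of degree up to l + 1.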

def create_deep_and_cross_model():

    inputs = create_model_inputs()
    x0 = encode_inputs(inputs, use_embedding=True)

    cross = x0
    for _ in hidden_units:
        units = cross.shape[-1]
        x = layers.Dense(units)(cross)
        cross = x0 * x + cross
    cross = layers.BatchNormalization()(cross)

    deep = x0
    for units in hidden_units:
        deep = layers.Dense(units)(deep)
        deep = layers.BatchNormalization()(deep)
        deep = layers.ReLU()(deep)
        deep = layers.Dropout(dropout_rate)(deep)

    merged = layers.concatenate([cross, deep])
    outputs = layers.Dense(units=NUM_CLASSES, activation="softmax")(merged)
    model = keras.Model(inputs=inputs, outputs=outputs)
    return model


deep_and_cross_model = create_deep_and_cross_model()
keras.utils.plot_model(deep_and_cross_model, show_shapes=True, rankdir="LR")


Let's run it:

run_experiment(deep_and_cross_model)
Start training the model...
Epoch 1/50
1864/1864 [==============================] - 8s 4ms/step - loss: 0.8766 - sparse_categorical_accuracy: 0.6535
Epoch 2/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5970 - sparse_categorical_accuracy: 0.7425
Epoch 3/50
1864/1864 [==============================] - 4s 2ms/step - loss: 0.5757 - sparse_categorical_accuracy: 0.7522
Epoch 4/50
1864/1864 [==============================] - 4s 2ms/step - loss: 0.5640 - sparse_categorical_accuracy: 0.7567
Epoch 5/50
1864/1864 [==============================] - 4s 2ms/step - loss: 0.5569 - sparse_categorical_accuracy: 0.7601
Epoch 6/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5507 - sparse_categorical_accuracy: 0.7622
Epoch 7/50
1864/1864 [==============================] - 4s 2ms/step - loss: 0.5454 - sparse_categorical_accuracy: 0.7652
Epoch 8/50
1864/1864 [==============================] - 4s 2ms/step - loss: 0.5410 - sparse_categorical_accuracy: 0.7663
Epoch 9/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5383 - sparse_categorical_accuracy: 0.7679
Epoch 10/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5342 - sparse_categorical_accuracy: 0.7698
Epoch 11/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5317 - sparse_categorical_accuracy: 0.7705
Epoch 12/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5284 - sparse_categorical_accuracy: 0.7728
Epoch 13/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5263 - sparse_categorical_accuracy: 0.7733
Epoch 14/50
1864/1864 [==============================] - 4s 2ms/step - loss: 0.5241 - sparse_categorical_accuracy: 0.7746
Epoch 15/50
1864/1864 [==============================] - 4s 2ms/step - loss: 0.5212 - sparse_categorical_accuracy: 0.7752
Epoch 16/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5190 - sparse_categorical_accuracy: 0.7766
Epoch 17/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5177 - sparse_categorical_accuracy: 0.7779
Epoch 18/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5157 - sparse_categorical_accuracy: 0.7778
Epoch 19/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5134 - sparse_categorical_accuracy: 0.7796
Epoch 20/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5118 - sparse_categorical_accuracy: 0.7800
Epoch 21/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5100 - sparse_categorical_accuracy: 0.7807
Epoch 22/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5087 - sparse_categorical_accuracy: 0.7815
Epoch 23/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5067 - sparse_categorical_accuracy: 0.7823
Epoch 24/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5052 - sparse_categorical_accuracy: 0.7833
Epoch 25/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5039 - sparse_categorical_accuracy: 0.7830
Epoch 26/50
1864/1864 [==============================] - 4s 2ms/step - loss: 0.5031 - sparse_categorical_accuracy: 0.7830
Epoch 27/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5018 - sparse_categorical_accuracy: 0.7849
Epoch 28/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.5002 - sparse_categorical_accuracy: 0.7854
Epoch 29/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.4994 - sparse_categorical_accuracy: 0.7853
Epoch 30/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.4982 - sparse_categorical_accuracy: 0.7857
Epoch 31/50
1864/1864 [==============================] - 4s 2ms/step - loss: 0.4967 - sparse_categorical_accuracy: 0.7859
Epoch 32/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.4961 - sparse_categorical_accuracy: 0.7862
Epoch 33/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.4958 - sparse_categorical_accuracy: 0.7862
Epoch 34/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.4941 - sparse_categorical_accuracy: 0.7868
Epoch 35/50
1864/1864 [==============================] - 3s 2ms/step - loss: 0.4937 - sparse_categorical_accuracy: 0.7870
Epoch 36/50
  55/1864 [..............................] - ETA: 3s - loss: 0.4931 - sparse_categorical_accuracy: 0.7814

The Deep & Cross model achieves ~81% test accuracy.


Conclusion

You can use Keras Preprocessing Layers to easily handle categorical features with different encoding mechanisms, including one-hot encoding and feature embedding. In addition, different model architectures — like wide, deep, and cross networks — have different advantages with respect to different dataset properties. You can explore using them independently or combining them to achieve the best result for your dataset, as in the sketch below.
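
As an illustrative combination (a minimal sketch reusing the building blocks defined above, not part of the original experiments), you could merge a wide (sparse) branch, a cross network, and a deep branch into a single model:

def create_wide_deep_and_cross_model():
    inputs = create_model_inputs()
    # Wide branch: sparse (one-hot) representation of the features.
    wide = layers.BatchNormalization()(encode_inputs(inputs))
    # Shared dense (embedding) representation for the cross and deep branches.
    x0 = encode_inputs(inputs, use_embedding=True)
    cross = x0
    for _ in hidden_units:
        x = layers.Dense(cross.shape[-1])(cross)
        cross = x0 * x + cross
    cross = layers.BatchNormalization()(cross)
    deep = x0
    for units in hidden_units:
        deep = layers.Dense(units)(deep)
        deep = layers.BatchNormalization()(deep)
        deep = layers.ReLU()(deep)
        deep = layers.Dropout(dropout_rate)(deep)
    merged = layers.concatenate([wide, cross, deep])
    outputs = layers.Dense(units=NUM_CLASSES, activation="softmax")(merged)
    return keras.Model(inputs=inputs, outputs=outputs)

You can train and evaluate it with the same procedure as the other models: run_experiment(create_wide_deep_and_cross_model()).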