set_distribution function
keras.distribution.set_distribution(value)
Set the distribution as the global distribution setting.
Arguments
value: a Distribution instance, e.g. DataParallel(), to set as the global distribution.
distribution function
keras.distribution.distribution()
Retrieve the current distribution from global context.
list_devices function
keras.distribution.list_devices(device_type=None)
Return all the available devices based on the device type.
Note: in a distributed setting, global devices are returned.
When device_type is not provided, devices of the default type are
returned. This function never returns a mix of device types, for
instance GPUs and CPUs.
Arguments
device_type: string, one of "cpu", "gpu" or "tpu". Defaults to "gpu" or "tpu" if available when device_type is not provided; otherwise the "cpu" devices are returned.
Returns
List of strings, the devices that are available for distributed computation. Each device is formatted as "device_type:id", for instance "gpu:1" or "cpu:0".
initialize function
keras.distribution.initialize(
    job_addresses=None, num_processes=None, process_id=None
)
Initialize the distribution system for a multi-host/process setting.
Calling initialize will prepare the backend for execution on multi-host
GPUs or TPUs. It should be called before any computations.
Note that the parameters can also be injected via environment variables,
which can be better controlled by the launch script at startup time. For
backends that also rely on environment variables for configuration,
Keras will properly forward them.
Arguments
job_addresses: string. Comma-separated addresses of all the jobs that form the cluster, e.g. "10.0.0.1:1234,10.0.0.2:2345". Defaults to None, and the backend will figure it out with the TPU environment variables. You can also configure this value via the environment variable KERAS_DISTRIBUTION_JOB_ADDRESSES.
num_processes: int. The number of worker/processes in the cluster. Defaults to None, and the backend will figure it out with the TPU environment variables. You can also configure this value via the environment variable KERAS_DISTRIBUTION_NUM_PROCESSES.
process_id: int. The ID of the current worker/process, ranging from 0 to num_processes - 1. 0 indicates the current worker/process is the master/coordinator job. You can also configure this value via the environment variable KERAS_DISTRIBUTION_PROCESS_ID.
Example
Suppose there are two GPU processes: process 0 is running at address
10.0.0.1:1234, and process 1 is running at address 10.0.0.2:2345. To
configure such a cluster, you can run:

On process 0:

keras.distribution.initialize(
    job_addresses="10.0.0.1:1234,10.0.0.2:2345",
    num_processes=2,
    process_id=0)

On process 1:

keras.distribution.initialize(
    job_addresses="10.0.0.1:1234,10.0.0.2:2345",
    num_processes=2,
    process_id=1)

Or via the environment variables:

On process 0:

os.environ["KERAS_DISTRIBUTION_JOB_ADDRESSES"] = "10.0.0.1:1234,10.0.0.2:2345"
os.environ["KERAS_DISTRIBUTION_NUM_PROCESSES"] = "2"
os.environ["KERAS_DISTRIBUTION_PROCESS_ID"] = "0"
keras.distribution.initialize()

On process 1:

os.environ["KERAS_DISTRIBUTION_JOB_ADDRESSES"] = "10.0.0.1:1234,10.0.0.2:2345"
os.environ["KERAS_DISTRIBUTION_NUM_PROCESSES"] = "2"
os.environ["KERAS_DISTRIBUTION_PROCESS_ID"] = "1"
keras.distribution.initialize()

Also note that for the JAX backend, job_addresses can be reduced to
just the master/coordinator address, which is 10.0.0.1:1234.
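A launch script can drive the environment-variable route by writing the three variables before Keras starts; a minimal sketch (the helper name and the address values are illustrative, taken from the example above, and the final initialize() call is left commented out so the snippet runs without an actual cluster):

```python
import os

def configure_cluster(job_addresses, num_processes, process_id):
    """Inject the cluster topology via the variables Keras reads at startup."""
    os.environ["KERAS_DISTRIBUTION_JOB_ADDRESSES"] = job_addresses
    os.environ["KERAS_DISTRIBUTION_NUM_PROCESSES"] = str(num_processes)
    os.environ["KERAS_DISTRIBUTION_PROCESS_ID"] = str(process_id)

# Values for process 0 of the two-process example above.
configure_cluster("10.0.0.1:1234,10.0.0.2:2345", num_processes=2, process_id=0)
# keras.distribution.initialize()  # would now pick these values up with no arguments
```

Centralizing this in the launcher keeps the per-process difference down to a single process_id argument.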
get_device_count function
keras.distribution.get_device_count(device_type=None)
Return the number of available devices based on the device type.
When device_type is not provided, the count of devices of the default
type is returned. This function never counts a mix of device types, for
instance GPUs and CPUs.
Arguments
device_type: string, one of "cpu", "gpu" or "tpu". Defaults to "gpu" or "tpu" if available when device_type is not provided; otherwise the "cpu" devices are counted.
Returns
Int, the number of available devices.