Discussion on: What is TensorFrames? TensorFlow + Apache Spark

Replies for: Hi Gavin, thank you for your comment. it this the one you mean - databricks.com/tensorflow/distribu... ? from their docs it seems like the graph ...

Yeah, that makes sense. At first I thought tf.distribute.MirroredStrategy worked across clusters of separate machines as well, but it looks like it only covers devices on the same machine, so all we get is parallel execution of sections of the graph.

That being said, you would think that they'd make data-level parallelism with tf.keras easier, wouldn't you?
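
To make "data-level parallelism" concrete, here is a toy sketch in plain Python rather than TensorFlow. It assumes a one-parameter model `y = w * x` with a mean squared error loss, and every function name in it is made up for illustration. It only shows the idea of synchronous data parallelism (each replica computes a gradient on its own shard of the batch, then the gradients are averaged); it is not how tf.distribute implements it.

```python
# Toy illustration of synchronous data parallelism (pure Python, no TensorFlow).
# Model: y = w * x, loss = mean squared error. All names here are hypothetical.

def grad_mse(w, xs, ys):
    """Gradient of mean((w*x - y)^2) with respect to w over one shard."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def split_into_shards(items, num_replicas):
    """Split a batch into equal contiguous shards, one per replica.

    Assumes len(items) is divisible by num_replicas.
    """
    size = len(items) // num_replicas
    return [items[i * size:(i + 1) * size] for i in range(num_replicas)]

def data_parallel_step(w, xs, ys, num_replicas, lr):
    """One training step: per-replica gradients, averaged, then one update."""
    x_shards = split_into_shards(xs, num_replicas)
    y_shards = split_into_shards(ys, num_replicas)
    # In a real setup each of these gradients would come from a different device.
    grads = [grad_mse(w, xs_r, ys_r) for xs_r, ys_r in zip(x_shards, y_shards)]
    avg_grad = sum(grads) / num_replicas  # stands in for the all-reduce step
    return w - lr * avg_grad

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # generated by y = 2x

# Single-device update on the whole batch, for comparison.
w_single = 0.0 - 0.01 * grad_mse(0.0, xs, ys)
# The same update computed as two replicas, each seeing half the batch.
w_parallel = data_parallel_step(0.0, xs, ys, num_replicas=2, lr=0.01)
```

With equally sized shards, the averaged gradient matches the full-batch gradient, so both updates land on the same `w`. The hard part in practice is moving the data and the gradients between devices or machines, which this sketch leaves out entirely.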

Adi Polak • Mar 26 '19

I would.
At the moment, tf.keras seems to be an implementation of the Keras API on top of TensorFlow.
But wait! We can develop in Keras without TensorFlow: Keras is an independent library for deep learning. There is also an interesting project that runs Keras on top of Apache Spark, named Elephas: Distributed Deep Learning with Keras & Spark.

Overall, judging from discussions and online forums, many data scientists say that Keras is the better choice for deep learning, since TensorFlow can be a bit complicated to start with.

Gavin Fernandes • Mar 26 '19

Yeah, I know Keras is an independent library as well, and yeah, it is simpler, but I started machine learning with the low-level TensorFlow API and only then learnt Keras. I do use plain Keras where I can, though.

Currently I'm working on a project that requires the sort of fine control over the training process that only TensorFlow can give me, although I haven't tried Theano or the rest yet, and it would be infeasible to move to another library with the time constraints we have.

Adi Polak • Mar 27 '19

Yeah, project and time constraints are super important. How do you find TensorFlow? From your perspective, how can one become proficient in it?

Gavin Fernandes • Mar 27 '19 • Edited

I like TensorFlow and all, but I can't say it's without its flaws. It feels like parts of the library are duplicated elsewhere within it, and some sections lack succinct documentation.

I was working with TFRecord a few weeks ago, and the long and short of it is there were two different ways of writing a TFRecord, and both gave you different output files, which were both valid TFRecords. Plus TFRecords aren't simple feature-label <rant> ... </rant>.
Jeez, I stuck to pandas after that.

I think tensorflow is going in the right direction though. They're working to bring keras and estimators closer together with tf 2.0, and in all fairness to them, some of the bumpy edges that I encountered were sections still in development.

Now, my perspective is probably not representative of the wider community here on dev.to. For one thing, I don't do JS/WebDev, and stick to C/C++ and Python 3, dabbling in Dart and Clojure a bit. For another, my aim isn't to be a data scientist / coder, and I am by no means proficient in TensorFlow. With that said, I feel like the best way to get better with tf is to use it more, whether that be in personal projects or by contributing to someone else's. If you really want to push yourself, and have the time to spare, you could try reimplementing bits of TensorFlow, say the convolutional layer, or the tanh activation, or maybe even an optimizer. When you're done, you can compare it with what the TensorFlow source code does as a benchmark.
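
As a rough idea of what that exercise could look like for the tanh activation, here is a minimal sketch in plain Python. The names `my_tanh` and `my_tanh_grad` are made up for illustration, and the standard library's `math.tanh` stands in as the reference here; in practice you would compare against TensorFlow's own tanh instead.

```python
import math

def my_tanh(x):
    """Hand-written tanh for a single float.

    Uses tanh(x) = sign(x) * (1 - e^(-2|x|)) / (1 + e^(-2|x|)),
    so the exponent is never positive and exp() cannot overflow.
    """
    e = math.exp(-2.0 * abs(x))
    t = (1.0 - e) / (1.0 + e)
    return t if x >= 0 else -t

def my_tanh_grad(x):
    """Derivative of tanh, 1 - tanh(x)^2, which backprop would need."""
    t = my_tanh(x)
    return 1.0 - t * t

# Compare against a trusted reference on a few points, including extremes.
samples = [-1000.0, -2.5, -0.1, 0.0, 0.1, 2.5, 1000.0]
max_error = max(abs(my_tanh(x) - math.tanh(x)) for x in samples)

# Sanity-check the gradient with a central finite difference.
h = 1e-5
numeric_grad = (my_tanh(0.5 + h) - my_tanh(0.5 - h)) / (2 * h)
grad_error = abs(numeric_grad - my_tanh_grad(0.5))
```

The naive formula `(e^x - e^-x) / (e^x + e^-x)` overflows for large inputs, which is exactly the kind of detail you only notice when you try to write it yourself and then check it against a real implementation.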