My thinking is that those parts would still need to be capable of type conversion of some sort, given that the X module might output, say, strings while Y gives away arrays
Neural networks work by having information used as a set of numbers, weights, and functions. They don't really output "numbers" or "strings" or "arrays" they almost always have numbers in, numbers out, then the numbers are converted into whatever meaning they should need.
For example, a network that identifies age from an image will take in numbers as R G B values across a massive array and use a big set of arrays and functions to narrow those down to a single number that slowly is "optimized" until it starts matching somewhat accurate examples.
So a network that deals with strings may just break that string down into characters and have a node for each character. A network that deals with images uses a node for each RGB value.
So if you have a network that was working with words in one type of sentence it will be used to having one region lighting up meaning one thing. However, if the sentence is of a new type then the network will have that meaning light up, but it will not mean what it should.
That's the sort of issue with different datatypes, not so much arrays or ints or doubles or anything of that sort.
Say you can have a sentence
This is a great time.
This is a horrible time.
And you train a neural network on those.
Now you input a sentence.
This time was a great one
The area that normally lights up in response to "great" is now staying dim with the input of "was", and will say that the sentence is not positive. It was trained in an environment where great or horrible always appear in the same place, and assumes that fact. It will mess up when given a different input, and will be useless in those cases.