The earliest parallel processing systems assumed that memory must be shared, at some level, between the processing elements working on a given problem. A shared-memory programming model is intuitive, but it is also subtle and harder to manage correctly than many programmers believe. More to the point, shared access to large memories exacerbates the problems of memory addressing, cross-sectional bandwidth, and latency that have plagued supercomputer designers from the beginning.
Massively parallel machines deploy so many processors that shared-memory semantics are impractical, and they make no attempt to disguise this limitation. Each node has one or a few processors (rarely more than four), memory local to those processors, and access to an explicit, program-directed communications network. This system architecture imposes severe constraints on the algorithms used. It is extremely inefficient for algorithms in which each parallel thread must touch every element of the data set, or must access elements at random, because any data held by another node can only be obtained through an explicit message. However, this architecture scales further than any other high-performance computer architecture paradigm, and that scalability has motivated the research and development of languages, tools, and algorithms adapted to massively parallel programming.
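To make the distributed-memory, message-passing model concrete, here is a minimal Python sketch that simulates a few nodes computing a global sum. It does not model any particular machine, and the names `node` and `distributed_sum` are hypothetical. Each simulated node is a thread that works only on its own slice of the data and shares partial results only by explicit messages placed in queues. Python threads can in fact share memory, so the separation here is enforced by convention rather than by hardware.

```python
import queue
import threading


def node(rank, local_data, inboxes, output):
    """One simulated node (hypothetical). It reads only its own local_data
    and communicates with other nodes only through explicit messages."""
    partial = sum(local_data)  # computation touches local memory only
    if rank == 0:
        # Node 0 collects one explicit message from every other node.
        total = partial
        for _ in range(len(inboxes) - 1):
            _sender, value = inboxes[0].get()  # explicit receive
            total += value
        output.put(total)
    else:
        inboxes[0].put((rank, partial))  # explicit send to node 0


def distributed_sum(data, num_nodes):
    # Scatter: each node receives its own copy of a disjoint slice,
    # standing in for that node's private local memory.
    chunks = [data[i::num_nodes] for i in range(num_nodes)]
    inboxes = [queue.Queue() for _ in range(num_nodes)]  # one mailbox per node
    output = queue.Queue()
    threads = [
        threading.Thread(target=node, args=(r, chunks[r], inboxes, output))
        for r in range(num_nodes)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return output.get()


print(distributed_sum(list(range(100)), 4))  # 4950
```

A sum suits this model because each node does almost all its work locally and sends a single message. An algorithm that needed elements held by other nodes at random would instead require a request and a reply message for each remote access, which is the inefficiency described above.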
By the start of the 21st century, massive parallelism had become the dominant paradigm for very-high-performance computing.