One of the biggest challenges I faced when scaling Elasticsearch was managing shard allocation effectively. Our data was unevenly distributed, which left some nodes overloaded while others sat underutilized. This caused major performance bottlenecks and slow query responses. To fix it, we had to dive deep into shard management, adjusting the number of primary shards and replicas for different indices based on usage patterns. We also tweaked the indexing strategy to reduce load during peak hours, for example by batching data more efficiently. Once we found the right balance, things ran a lot smoother.
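
To make that a bit more concrete, here is a rough Python sketch of the kind of arithmetic involved. It is not the exact tooling we used. The function names, the 30 GB target shard size, and the sample node counts are all assumptions for illustration, so tune them to your own cluster.

```python
import math
from itertools import islice
from typing import Dict, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def suggest_primary_shards(index_size_gb: float, target_shard_gb: float = 30.0) -> int:
    """Rough primary shard count so each shard lands near target_shard_gb.

    The 30 GB default is an assumed starting point, not a hard rule.
    """
    if index_size_gb <= 0 or target_shard_gb <= 0:
        raise ValueError("sizes must be positive")
    return max(1, math.ceil(index_size_gb / target_shard_gb))


def imbalance_ratio(shards_per_node: Dict[str, int]) -> float:
    """Busiest node's shard count divided by the mean count.

    1.0 means shards are spread evenly; higher values mean a hot node.
    """
    counts = list(shards_per_node.values())
    if not counts or sum(counts) == 0:
        raise ValueError("need at least one node holding shards")
    mean = sum(counts) / len(counts)
    return max(counts) / mean


def batched(docs: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield documents in fixed-size batches so they can be sent as bulk requests."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(docs)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    # Hypothetical numbers, purely for illustration.
    print(suggest_primary_shards(200))
    print(imbalance_ratio({"node-a": 12, "node-b": 4, "node-c": 2}))
    for batch in batched(range(5), 2):
        print(batch)
```

One thing to keep in mind: the replica count can be changed on a live index, but the primary shard count is fixed when the index is created, so changing it generally means reindexing or using the split/shrink APIs.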